There has been growing interest in higher-order quantum processes in which separate operations do not occur in a definite causal order (see, e.g., refs. 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22 for a selection). This property, called causal nonseparability3,6,7,23, was formalized within the process matrix framework3, which describes correlations between quantum nodes of intervention without assuming a predefined order between the nodes. Challenging conventional notions of causality, causally nonseparable processes have been shown to allow informational tasks that cannot be achieved with operations used in a definite order4,5,8,24. Such processes have been conjectured to be relevant in the context of quantum gravity1,2,3,25 and closed time-like curves2,3,11,22,26,27, but some are also known to admit realizations in standard quantum mechanics on time-delocalized systems18. A prominent example is the quantum SWITCH, which has been demonstrated experimentally28,29,30,31,32,33.

On a separate front, there is the recent development of the framework of quantum causal models34,35 (see, e.g., refs. 36,37,38,39,40,41,42,43,44,45,46 for related, previous work) as a fully quantum version of the classical framework of causal models47,48. It is formulated within the formalism of process matrices, but contains the classical causal models as special cases and generalizes many of the fundamental concepts and core theorems of the latter. Quantum causal models thus constitute a general framework for reasoning about quantum systems in causal terms, allowing the rigorous study of the empirical constraints imposed by quantum causal structures—however, only as far as causal structures are concerned that are expressible as directed acyclic graphs (DAGs), i.e., where there is a well-defined causal order. The central idea behind the approach in refs. 34,35 is that causal relations between quantum systems, as encoded in a DAG, correspond to influence through underlying unitary transformations. This facilitated, in particular, a justification of the quantum Markov condition relative to a DAG that underpins the definition of a quantum causal model—any such model can be thought of as arising from a unitary circuit fragment with a compatible causal structure by marginalizing over latent local disturbances35.

It is a natural question whether these hitherto separate lines of research can be merged to arrive at a causal model perspective on processes that are not compatible with a fixed order of the quantum nodes. While this direction of thought has been considered in earlier work (see, e.g., refs. 45,49,50), it was previously not clear how to take the idea forward due to various conceptual and technical obstacles—including, for example, how quantum nodes and the quantum Markov condition should be defined, how the notion of the autonomy of causal mechanisms should be understood49, and how to prevent paradoxes.

This work overcomes these obstacles by generalizing the approach to quantum causal models of refs. 34,35. A large class of processes that are not compatible with a fixed order of the nodes can then be understood to have a causal structure, albeit one that includes directed cycles. This may appear counterintuitive, but the process matrix framework guarantees that it is free of paradoxes. The motivation for entertaining such a proposal is twofold. First, in light of the puzzling nature of causally nonseparable processes and the open question of which ones are physically possible in nature, a conceptual clarification of causal structure is an important next step. Second, our approach yields mathematical tools facilitating new technical results, including a more fine-grained description of the compositional structure of a process that is implied by its causal properties. One of the implications of the latter derived in this work is a proof that all bipartite processes that admit a unitary extension51 are causally separable. We also prove that for unitary processes, causal nonseparability and cyclicity of their causal structure are equivalent.


The process formalism, causal order and signalling

Let us start by setting out some necessary background and essential concepts. In quantum theory, a system A is associated with a complex Hilbert space \({{\mathcal{H}}}_{A}\), and its state is a density operator \({\rho }_{A}\in {\mathcal{L}}({{\mathcal{H}}}_{A})\), where \({\mathcal{L}}({{\mathcal{H}}}_{A})\) is the space of linear operators over \({{\mathcal{H}}}_{A}\). The most general evolution of a system, assuming that it is initially uncorrelated with its environment, is given by a completely positive trace-preserving (CPTP) map \({\mathcal{E}}:{\mathcal{L}}({{\mathcal{H}}}_{A})\to {\mathcal{L}}({{\mathcal{H}}}_{B})\), where this notation allows the output system to be different from the input system. The most general operation that an agent can perform from an input system A to an output system B has a classical outcome k and specifies the transformation from A to B conditioned on each value of k being obtained. Mathematically, the operation corresponds to a quantum instrument, which is a collection of completely positive (CP) maps \(\{{{\mathcal{E}}}^{k}:{\mathcal{L}}({{\mathcal{H}}}_{A})\to {\mathcal{L}}({{\mathcal{H}}}_{B})\}\), such that \({\mathcal{E}}={\sum }_{k}{{\mathcal{E}}}^{k}\) is a trace-preserving CP map.

It is convenient to represent CP maps with operators, via a variant of the Choi-Jamiołkowski (CJ) isomorphism52,53, which to a given CP map \({\mathcal{E}}:{\mathcal{L}}({{\mathcal{H}}}_{A})\to {\mathcal{L}}({{\mathcal{H}}}_{B})\) associates the CJ operator \({\rho }_{B| A}^{{\mathcal{E}}}:= {\sum }_{i,j}\,{\mathcal{E}}({\left|i\right\rangle }_{A}\left\langle j\right|)\otimes {\left|i\right\rangle }_{{A}^{* }}\left\langle j\right|\), where \(\left\{{\left|i\right\rangle }_{A}\right\}\) is an orthonormal basis of \({{\mathcal{H}}}_{A}\), and \(\left\{{\left|i\right\rangle }_{{A}^{* }}\right\}\) the corresponding dual basis. The CJ operator for a CPTP map \({\mathcal{E}}\) satisfies \({{\rm{Tr}}}_{B}[{\rho }_{B| A}^{{\mathcal{E}}}]={{\mathbb{1}}}_{{A}^{* }}\). This variant of the CJ isomorphism is used in refs. 34,35, and has the advantage that the CJ operator is both positive semi-definite and independent of the basis used in its definition.
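As a minimal numerical sketch of this definition (not taken from refs. 34,35), the following pure-Python snippet builds the CJ operator of a channel specified by Kraus operators and checks the trace condition for a CPTP map. It works in a fixed basis, so the dual basis is simply identified with the computational basis. The function names `cj_operator` and `trace_out_output`, the index ordering (output index first) and the choice of an amplitude-damping channel with γ = 0.3 are illustrative assumptions.

```python
from itertools import product
from math import sqrt


def cj_operator(kraus_ops, d_in, d_out):
    """CJ operator sum_{i,j} E(|i><j|) ⊗ |i><j| as a nested list.

    Rows and columns are indexed by (b, i) -> b * d_in + i, with b the
    output index and i the input (dual-space) index.
    """
    dim = d_out * d_in
    rho = [[0.0] * dim for _ in range(dim)]
    for i, j in product(range(d_in), repeat=2):
        for b, bp in product(range(d_out), repeat=2):
            # <b| E(|i><j|) |b'> = sum_K K[b][i] * conj(K[b'][j])
            rho[b * d_in + i][bp * d_in + j] = sum(
                K[b][i] * K[bp][j].conjugate() for K in kraus_ops
            )
    return rho


def trace_out_output(rho, d_in, d_out):
    """Partial trace over the output system B, leaving an operator on A*."""
    return [
        [sum(rho[b * d_in + i][b * d_in + j] for b in range(d_out))
         for j in range(d_in)]
        for i in range(d_in)
    ]


# Illustrative channel (an assumption): qubit amplitude damping, gamma = 0.3.
gamma = 0.3
K0 = [[1.0, 0.0], [0.0, sqrt(1 - gamma)]]
K1 = [[0.0, sqrt(gamma)], [0.0, 0.0]]

rho = cj_operator([K0, K1], d_in=2, d_out=2)
marginal = trace_out_output(rho, d_in=2, d_out=2)
print(marginal)  # numerically the 2x2 identity, as required for a CPTP map
```

The same check fails for a collection of Kraus operators that is not trace preserving, for example `[K1]` alone.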

The idea behind the process formalism3 is that there is a fixed set of locations Ai, i = 1, …, n, which in this work we call quantum nodes, at each of which an agent can perform an operation on a quantum system. A quantum node Ai is associated with two Hilbert spaces, an input Hilbert space \({{\mathcal{H}}}_{{A}_{i}^{{\rm{in}}}}\) and an output Hilbert space \({{\mathcal{H}}}_{{A}_{i}^{{\rm{out}}}}\) (both here assumed finite-dimensional). The input Hilbert space carries the state of the input system just before the operation by the agent, and the output Hilbert space carries the state of the output system just after the operation. The operation itself corresponds to a quantum instrument \(\{{{\mathcal{E}}}_{A}^{{k}_{A}}:{\mathcal{L}}({{\mathcal{H}}}_{{A}^{{\rm{in}}}})\to {\mathcal{L}}({{\mathcal{H}}}_{{A}^{{\rm{out}}}})\}\). Conceptually, a quantum node is sometimes thought of as representing a small, localized laboratory in some region of spacetime, but may also be conceived more abstractly, for example as occupying a particular position in between the gates of a quantum circuit.

The aim of the process formalism is to describe the correlations between the outcomes of the operations that are performed at the separate quantum nodes. Given a set of instruments at the quantum nodes A1, ..., An, the joint probability for their outcomes is given by

$$P({k}_{{A}_{1}},\ldots ,{k}_{{A}_{n}})={\rm{Tr}}\left[{\sigma }_{{A}_{1}...{A}_{n}}\left(\mathop{\bigotimes }_{i}{\tau }_{{A}_{i}}^{{k}_{{A}_{i}}}\right)\right]\,,$$

where \({\tau }_{A}^{{k}_{A}}:= {\left({\rho }_{{A}^{{\rm{out}}}| {A}^{{\rm{in}}}}^{{{\mathcal{E}}}^{{k}_{A}}}\right)}^{T}\), and \({\sigma }_{{A}_{1}...{A}_{n}}\in {\mathcal{L}}({\bigotimes }_{i}{{\mathcal{H}}}_{{A}_{i}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{A}_{i}^{{\rm{out}}}}^{* })\) is called the process operator, which we will sometimes refer to more simply as the process.

A process operator \({\sigma }_{{A}_{1}...{A}_{n}}\), which up to a different convention of the CJ isomorphism is the same as a process matrix3, obeys constraints, designed to ensure that valid joint probabilities are returned by Eq. (1) for any possible choices of the operations performed by the agents, and that the same holds even when the agents have pre-shared entanglement. These constraints are3: \({\sigma }_{{A}_{1}...{A}_{n}}\ge 0\) and \({\rm{Tr}}[{\sigma }_{{A}_{1}...{A}_{n}}\,({\tau }_{{A}_{1}}\otimes \cdots \otimes {\tau }_{{A}_{n}})]=1\), for any set of CPTP maps \(\{{\tau }_{{A}_{i}}\}\) at the n nodes. Simple-to-check necessary and sufficient conditions for an operator in \({\mathcal{L}}({\bigotimes }_{i}{{\mathcal{H}}}_{{A}_{i}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{A}_{i}^{{\rm{out}}}}^{* })\) to be a valid process operator can be found in refs. 6,7. To avoid clutter when tracing over a node A we will write \({{\rm{Tr}}}_{A}[\,\text{\_}\!\text{\_}\,]:= {{\rm{Tr}}}_{{A}^{{\rm{in}}}{({A}^{{\rm{out}}})}^{* }}[\,\text{\_}\!\text{\_}\,]\).

A question of central interest in the study of the process formalism has been whether a given process operator is compatible with the existence of a definite causal order of its nodes. A closely related question concerns how this relates to the possibilities for signalling between different nodes. Let us first make these notions more precise.

Consider the sequence of quantum operations represented in the form of a circuit in Fig. 1.

Fig. 1: Simple circuit with two quantum nodes A and B.

For any state ρ and any CPTP map \({\mathcal{E}} \) this defines a process operator over A and B.

The gate in the circuit corresponds to an arbitrary CPTP map \({\mathcal{E}}\), and the initial preparation to an arbitrary bipartite state ρ. Quantum nodes A and B correspond to positions in the circuit in between gates, at which an agent can choose to perform a quantum instrument on the system at that position. The nodes are represented as broken wires, with it understood that the agent’s instrument mediates the two pieces. The lower piece of the wire corresponds to the input Hilbert space of the quantum node and the upper piece of the wire to the output Hilbert space. Any circuit with some wires broken defines a partial order over the quantum nodes, with a node N preceding node \(N^{\prime}\) in the partial order if and only if there is a path from N to \(N^{\prime}\) along the (broken or unbroken) wires of the circuit. We call this partial order the causal order. In the example, node A precedes node B in the causal order. The circuit defines joint probabilities for the outcomes of any quantum instruments that are performed at nodes A and B, hence defines a process operator over the nodes A and B.

An important concept is now that of causal separability, first introduced in ref. 3 for the bipartite case. A bipartite process σAB is called causally separable iff it can be seen to arise as a convex mixture of processes with a fixed causal order between A and B, i.e., \({\sigma }_{AB}=p\,{\sigma }_{AB}^{A {\not \preceq} B}+(1-p)\,{\sigma }_{AB}^{B{\not \preceq} A}\), with 0 ≤ p ≤ 1, where \({\sigma }_{AB}^{B {\not \preceq} A}\) is a process that can arise from a sequence of operations of the form of Fig. 1, and \({\sigma }_{AB}^{A{\not \preceq} B}\) is a process that can arise from a different sequence of operations, of the same form except that B precedes A. Otherwise σAB is causally nonseparable. The idea is that a causally separable process can be thought to describe a situation in which a well-defined, though possibly unknown, causal order of the nodes exists, whereas a causally nonseparable process is not compatible with such an interpretation.

The connection with signalling between the nodes is as follows. Given a bipartite process σAB, we say that there is no signalling from quantum node B to quantum node A if and only if for all quantum instruments \({\tau }_{A}^{{k}_{A}}\) at A and all deterministic quantum instruments τB at B, the probability distribution \(P({k}_{A})={\rm{Tr}}[{\sigma }_{AB}({\tau }_{A}^{{k}_{A}}\otimes {\tau }_{B})]\) is independent of τB. This condition is equivalent to \({\sigma }_{AB}={\sigma }_{A{B}^{{\rm{in}}}}\otimes {{\mathbb{1}}}_{{({B}^{{\rm{out}}})}^{* }}\)3. The connection between signalling and causal order is that in any sequence of operations of the form of Fig. 1, there is no signalling from B to A. Moreover, every bipartite process operator with no signalling from B to A is known to have a realisation as the process operator arising from a circuit of the form of Fig. 1 (ref. 54).
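The operator form of the no-signalling condition says that the process acts as the identity on one tensor factor. The following sketch shows one way such a condition could be tested numerically, assuming an operator on a bipartite space X ⊗ Y stored as a nested list with the Y index varying fastest. The helper names `trace_out_last` and `is_trivial_on_last` and the example state are hypothetical. The two test operators are the CJ operator of the qubit identity channel, which depends on its input, and the CJ operator of a channel that discards its input and prepares a fixed state.

```python
from itertools import product


def trace_out_last(M, dx, dy):
    """Partial trace over Y of an operator on X ⊗ Y (index = x * dy + y)."""
    return [
        [sum(M[x * dy + y][xp * dy + y] for y in range(dy))
         for xp in range(dx)]
        for x in range(dx)
    ]


def is_trivial_on_last(M, dx, dy, tol=1e-12):
    """True iff M = (Tr_Y[M] / dy) ⊗ 1_Y, i.e. M acts as the identity on Y."""
    reduced = trace_out_last(M, dx, dy)
    for x, y, xp, yp in product(range(dx), range(dy), range(dx), range(dy)):
        expected = reduced[x][xp] / dy if y == yp else 0.0
        if abs(M[x * dy + y][xp * dy + yp] - expected) > tol:
            return False
    return True


# CJ operator of the qubit identity channel: sum_ij |i><j| ⊗ |i><j|.
identity_cj = [
    [1.0 if (r in (0, 3) and c in (0, 3)) else 0.0 for c in range(4)]
    for r in range(4)
]

# CJ operator of "discard the input and prepare rho": rho ⊗ 1.
rho = [[0.75, 0.25], [0.25, 0.25]]
prepare_cj = [
    [rho[x][xp] if y == yp else 0.0 for xp in range(2) for yp in range(2)]
    for x in range(2) for y in range(2)
]

print(is_trivial_on_last(identity_cj, 2, 2))  # False
print(is_trivial_on_last(prepare_cj, 2, 2))   # True
```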

The formal definition of the multipartite generalization of causal separability is more intricate than in the bipartite case: beyond just convex mixtures of fixed causal orders, the definition allows for a dynamical causal order in which the causal order at later quantum nodes can depend on the events taking place at earlier quantum nodes. This definition is postponed to a later section dedicated to causal separability. See also refs. 7,23 for a detailed discussion. A discussion of signalling in a multipartite process is also more involved, since whether a subset of quantum nodes can signal to another subset of quantum nodes depends on the interventions performed at other quantum nodes not in the two subsets.

Causal influence vs signalling

This work is concerned with a notion of causal structure, which is distinct from the causal order defined by a circuit, and which also needs to be carefully distinguished from the possibilities for signalling afforded by a general process operator. In order to motivate the idea, consider a circuit of the form of Fig. 1, with each wire representing a qubit, with \(\rho ={\rho }_{{A}^{{\rm{in}}}}\otimes {\left|0\right\rangle }_{A^{\prime} }\left\langle 0\right|\), and with the channel \({\mathcal{E}}\) being a quantum Controlled-NOT gate with the control on the output wire of the A node. This circuit defines a process operator \({\sigma }_{AB}^{0}\) on the A and B nodes, which may easily be computed, and it can be verified that \({\sigma }_{AB}^{0}\) allows signalling from A to B. Similarly, in the same circuit except with \(\rho ={\rho }_{{A}^{{\rm{in}}}}\otimes {\left|1\right\rangle }_{A^{\prime} }\left\langle 1\right|\), the process operator \({\sigma }_{AB}^{1}\) is easily computed, and it can be verified that signalling is possible from A to B.

Now consider the same experiment, except with the preparation of the \(A^{\prime}\) system given by flipping a fair coin, and preparing \({\left|0\right\rangle }_{A^{\prime} }\left\langle 0\right|\) on heads and \({\left|1\right\rangle }_{A^{\prime} }\left\langle 1\right|\) on tails. If the outcome of the coin flip is unknown, then the state of the \(A^{\prime}\) system is the mixed state \({\mathbb{1}}/2\), and the corresponding process operator over A and B is

$${\sigma }_{AB}^{{\rm{mix}}}={{\mathbb{1}}}_{{({B}^{{\rm{out}}})}^{* }}\otimes (1/2){{\mathbb{1}}}_{{B}^{{\rm{in}}}}\otimes {{\mathbb{1}}}_{{({A}^{{\rm{out}}})}^{* }}\otimes {\rho }_{{A}^{{\rm{in}}}}.$$

In the process \({\sigma }_{AB}^{{\rm{mix}}}\), there is no signalling from A to B. Indeed the very same process operator could arise from a situation in which A and B are independent and spacelike separated.
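The washing-out of signalling can be checked directly on the state that arrives at B. The sketch below assumes that the input of B is the target wire of the Controlled-NOT (which is what the stated form of the mixed process requires) and computes the state at B for several states sent out of A. All function and variable names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product, with the index of B varying fastest."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def trace_out_control(M):
    """Partial trace over the first (control) qubit of a two-qubit operator."""
    return [[M[t][tp] + M[2 + t][2 + tp] for tp in range(2)] for t in range(2)]


# Control = output wire of A (first qubit), target = A' (second qubit).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def state_at_b(rho_a_out, rho_a_prime):
    """State on the target wire after the gate (assumed to be the input of B)."""
    joint = kron(rho_a_out, rho_a_prime)
    evolved = matmul(matmul(CNOT, joint), CNOT)  # CNOT is real and self-inverse
    return trace_out_control(evolved)


def close(A, B, tol=1e-12):
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


ZERO = [[1.0, 0.0], [0.0, 0.0]]
ONE = [[0.0, 0.0], [0.0, 1.0]]
PLUS = [[0.5, 0.5], [0.5, 0.5]]
MIXED = [[0.5, 0.0], [0.0, 0.5]]

# A' prepared in |0>: the state at B tracks what A sends, so A can signal to B.
print(close(state_at_b(ZERO, ZERO), ZERO), close(state_at_b(ONE, ZERO), ONE))
# A' maximally mixed: B receives the maximally mixed state whatever A sends.
print(all(close(state_at_b(rho, MIXED), MIXED) for rho in (ZERO, ONE, PLUS)))
```

Both lines print `True`: the gate is the same in the two cases, and only the preparation of the unobserved system differs.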

In the experiment with the coin flip, it is clear that A has a causal influence on B, since agents who know the value of the coin flip would be able to send signals from A to B. From the perspective of agents who do not know the value of the coin flip, however, signalling is washed out by the randomness of the unobserved system. A similar phenomenon is well understood in the literature on classical causal modelling. In a canonical example, A, B, and C are all classical bits, with A and C causes of B such that B is equal to the parity of A and C. If C is inaccessible, or hidden, and satisfies P(C = 0) = P(C = 1) = 1/2, then B is randomly distributed regardless of the value of A. Hence as long as C remains hidden, signals cannot be sent from A to B. (See Section 2.4 of ref. 35.)

The conclusion that should be drawn from the example with process \({\sigma }_{AB}^{{\rm{mix}}}\) is that causal influence between quantum nodes should not be defined in terms of the possibilities for signalling afforded by a process operator, at least not if there is a chance that unobserved systems (\(A^{\prime}\) in the example) are interacting with the systems under study (A and B in the example). Given only a process operator σAB and no other data, although signalling is sufficient for causal influence, it can happen that A has a causal influence upon B even though there is no signalling from A to B in σAB.

Quantum causal models

The framework of quantum causal models, introduced in refs. 34,35, is based on the idea that in an example like that just above, statements about causal influence can be defined in terms of signalling, but only once all relevant systems are included in the description. At this point, the description is of a closed system, and at least in standard quantum theory, evolution of a closed system is unitary. Hence quantum causal models define causal influence in terms of unitary transformations. Reference 35 shows that in the case of unitary circuits with broken wires representing quantum nodes, the causal relations between the quantum nodes can be summarized in the form of a DAG, where the DAG imposes constraints on the process operator over the quantum nodes.

This leaves open the question of what the pattern of causal influence might be in causally nonseparable processes described in the literature. Can it even be well defined or must one conclude that these processes are not amenable to causal explanation at all, or that all that can be discussed is signalling between the nodes? Our idea is that such processes can be understood in causal terms, if the framework of quantum causal modelling is extended to allow causal cycles. We will show that the resulting formalism can be used successfully to describe some of the much-studied instances of causally nonseparable processes from the literature. Later, we show the utility of this approach by using it to settle previously open questions concerning causally nonseparable processes.

The following definition generalizes that of refs. 34,35, by allowing cyclic graphs (along with a more minor generalization, which is that the input and output Hilbert spaces of a quantum node can here have different dimensions).

Definition 1 (Quantum causal model (QCM)—generalized) A QCM is given by:

  (1) a causal structure represented by a directed graph G with vertices corresponding to quantum nodes A1, ..., An,

  (2) for each Ai, a quantum channel \({\rho }_{{A}_{i}| Pa({A}_{i})}\in {\mathcal{L}}({{\mathcal{H}}}_{{A}_{i}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{Pa{({A}_{i})}^{{\rm{out}}}}^{* })\), where Pa(Ai) denotes the set of parents of Ai according to G, such that \([{\rho }_{{A}_{i}| Pa({A}_{i})}\,,\,{\rho }_{{A}_{j}| Pa({A}_{j})}]=0\) for all i, j and such that \({\sigma }_{{A}_{1}...{A}_{n}}={\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\) is a process operator over the quantum nodes A1, ..., An.

When writing products of the form \({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\), it is understood implicitly that each factor is padded with an identity operator in tensor product for all other spaces. A QCM is called cyclic iff its causal structure contains directed cycles, and acyclic otherwise.

It is useful to define a term to express the fact that a given process operator σ has the correct form with respect to a given causal structure to define a QCM.

Definition 2 (Quantum Markov condition—generalized) A process \({\sigma }_{{A}_{1}...{A}_{n}}\) is called Markov for a directed graph G with quantum nodes A1, …, An as its vertices iff it admits a factorization into pairwise commuting channels of the form \({\sigma }_{{A}_{1}...{A}_{n}}=\mathop{\prod }\nolimits_{i = 1}^{n}{\rho }_{{A}_{i}| Pa({A}_{i})}\).

Note that the Markov condition of classical causal models47,48 is a special case of Definition 2, obtained when the graph is acyclic and \({\sigma }_{{A}_{1}...{A}_{n}}\) is diagonal in a product basis, and encodes a classical probability distribution35. The following first sets out some further terminology and basic properties of Definition 1 and then turns to motivating and explaining Definition 1, making the link with unitary transformations, and showing why it is that for a particular directed graph, condition (2) should hold.

First, observe that not every cyclic graph supports a QCM in an interesting way. Consider, for example, the two-node cyclic graph of Fig. 2a. A QCM with such a causal structure would come with a process operator

$${\sigma }_{AB}={\rho }_{A| B}\,{\rho }_{B| A}\,.$$

Here and throughout, channels between the nodes on which a process is defined are written such that anything appearing to the right of the bar refers to the output Hilbert space of the node, and anything appearing to the left of the bar refers to the input Hilbert space of the node. By our conventions \({\rho }_{A| B}\,{\rho }_{B| A}={\rho }_{A| B}\otimes {\rho }_{B| A}\). However, this is not a valid process operator unless either \({\rho }_{A| B}={\rho }_{{A}^{{\rm{in}}}}\otimes {{\mathbb{1}}}_{{({B}^{{\rm{out}}})}^{* }}\), or \({\rho }_{B| A}={\rho }_{{B}^{{\rm{in}}}}\otimes {{\mathbb{1}}}_{{({A}^{{\rm{out}}})}^{* }}\). In other words, at least one of the channels \({\rho }_{A| B}\), \({\rho }_{B| A}\) carries no information, but simply ignores its input and prepares a fixed state on the output. Intuitively speaking, this is because there would otherwise be logical paradoxes for certain choices of interventions at A and B.
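The intuition about paradoxes can be illustrated with a classical analogue, offered here only as a sketch and not as the proof of the quantum statement. Assume that both channels are deterministic classical bit channels and that the agents apply deterministic single-bit operations. The normalization condition on a process operator then reduces to the requirement that every choice of local operations admits exactly one consistent assignment of values around the loop. All names below are hypothetical.

```python
from itertools import product

BITS = (0, 1)

# The four deterministic single-bit operations an agent could apply (input -> output).
LOCAL_OPS = {
    "const0": lambda x: 0,
    "const1": lambda x: 1,
    "id": lambda x: x,
    "not": lambda x: 1 - x,
}


def consistent_histories(wire_to_a, wire_to_b, f, g):
    """Count assignments (a_out, b_out) consistent with the wires and the local operations."""
    count = 0
    for a_out, b_out in product(BITS, repeat=2):
        a_in = wire_to_a(b_out)  # classical channel from the output of B to the input of A
        b_in = wire_to_b(a_out)  # classical channel from the output of A to the input of B
        if f(a_in) == a_out and g(b_in) == b_out:
            count += 1
    return count


def is_valid(wire_to_a, wire_to_b):
    """Valid iff every pair of local operations has exactly one consistent history."""
    return all(
        consistent_histories(wire_to_a, wire_to_b, f, g) == 1
        for f in LOCAL_OPS.values() for g in LOCAL_OPS.values()
    )


copy_bit = lambda x: x   # information-carrying wire
prepare0 = lambda x: 0   # ignores its input and prepares a fixed value

print(consistent_histories(copy_bit, copy_bit, LOCAL_OPS["id"], LOCAL_OPS["id"]))   # 2
print(consistent_histories(copy_bit, copy_bit, LOCAL_OPS["id"], LOCAL_OPS["not"]))  # 0
print(is_valid(copy_bit, copy_bit))   # False
print(is_valid(prepare0, copy_bit))   # True
```

With two information-carrying wires, the identity operations leave two consistent histories and a bit flip at B leaves none, which is the classical shadow of a grandfather-type paradox. Once one of the wires ignores its input, every choice of operations has a unique consistent history.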

Fig. 2: Examples of cyclic directed graphs.

a A cyclic directed graph that does not admit a faithful quantum causal model; b A cyclic directed graph that admits a faithful quantum causal model.

More generally, we will say that a QCM is faithful iff each of the channels \({\rho }_{{A}_{i}| Pa({A}_{i})}\) is signalling from \({A}_{j}^{{\rm{out}}}\) to \({A}_{i}^{{\rm{in}}}\) for every \({A}_{j}\in Pa({A}_{i})\), i.e.,

$${\rho }_{{A}_{i}| Pa({A}_{i})}\ne \frac{1}{{d}_{j}}{{\rm{Tr}}}_{{\left({A}_{j}^{{\rm{out}}}\right)}^{* }}({\rho }_{{A}_{i}| Pa({A}_{i})})\otimes {{\mathbb{1}}}_{{\left({A}_{j}^{{\rm{out}}}\right)}^{* }},$$

where dj is the dimension of \({A}_{j}^{{\rm{out}}}\). Our claim concerning the causal structure of Fig. 2a can be summarized as:

Proposition 1 There is no faithful cyclic quantum causal model with two nodes.

Proof See Methods.

Now consider the cyclic graph \(G^{\prime}\) in Fig. 2b. A QCM with \(G^{\prime}\) as its causal structure comes with the data

$${\sigma }_{ABC}={\rho }_{A| BC}\,{\rho }_{B| AC}\,{\rho }_{C}\,.$$

Equation (5), compared to Eq. (3), has the key difference that the commuting operators have non-trivial action on \({({C}^{{\rm{out}}})}^{* }\). As a result, it turns out that faithful cyclic QCMs of this form do exist. An example is described below.

Note that, given a cyclic graph such as that in Fig. 2b, even when a faithful QCM exists it is not in general the case that any set of commuting channels \({\rho }_{{A}_{i}| Pa({A}_{i})}\) defines a process operator. (See Methods for an explicit demonstration of this fact.) The constraint in the definition of a QCM that \({\sigma }_{{A}_{1}...{A}_{n}}={\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\) is a valid process operator is essential, and is what guarantees that grandfather-type paradoxes do not arise3. This is in contrast to the acyclic case, where, given an acyclic causal structure, it is not hard to argue that any product of commuting channels of the form \({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\) is a valid process operator35, hence in particular a faithful QCM with that causal structure can always be found.

Unitarity and causal structure

The definition of a QCM above is predicated on the idea that causal structure should be represented by a directed graph. This idea, however, along with the stipulation that the accompanying process is Markov for the graph, was presented without much justification or further comment. Why is causal structure represented by a directed graph, for example, as opposed to a different mathematical object, such as a partial order, or a preorder, or some kind of hypergraph? This section considers a subclass of processes—unitary processes, defined momentarily—and shows that a unitary process is associated with a causal structure, which can indeed be represented with a directed graph, and that the unitary process is Markov for that graph. In other words, a unitary process, along with its causal structure, defines a QCM.

In order to define a unitary process, observe that a process operator \({\sigma }_{{A}_{1}...{A}_{n}}\) has the mathematical form of the CJ operator for a channel \({\mathcal{P}}:{\mathcal{L}}({\bigotimes }_{i}{{\mathcal{H}}}_{{A}_{i}^{{\rm{out}}}})\to {\mathcal{L}}({\bigotimes }_{i}{{\mathcal{H}}}_{{A}_{i}^{{\rm{in}}}})\)3. Where it is convenient to emphasise this form, we will sometimes write \({\sigma }_{{A}_{1}...{A}_{n}}={\rho }_{{A}_{1}...{A}_{n}| {A}_{1}...{A}_{n}}^{{\mathcal{P}}}\), where it is understood implicitly that an Ai to the right of the bar stands for \({A}_{i}^{{\rm{out}}}\), while an Ai to the left of the bar stands for \({A}_{i}^{{\rm{in}}}\). A unitary process is a process (where some of the input or output spaces may be trivial, i.e., 1-dimensional) such that the channel \({\mathcal{P}}\) is a unitary channel.

The first step is to define a notion of causal structure that pertains to the inputs and outputs of a unitary channel.

Definition 3 (Causal structure of a unitary channel) Given a unitary channel \({\rho }_{CD| AB}^{{\mathcal{U}}}\), write \(A\nrightarrow D\) (A does not influence D), iff \({{\rm{Tr}}}_{C}[{\rho }_{CD| AB}^{{\mathcal{U}}}]={\rho }_{D| B}^{{\mathcal{M}}}\otimes {{\mathbb{1}}}_{{A}^{* }}\) for some marginal channel \({\mathcal{M}}\). If A can influence D, i.e., \(\neg (A\nrightarrow D)\), then A is a direct cause of D. For any unitary channel \({\rho }_{{C}_{1}...{C}_{l}| {B}_{1}...{B}_{k}}^{{\mathcal{U}}}\) with k input and l output subsystems its causal structure is then the set of causal relations between input and output subsystems and can be represented by a DAG with vertices B1, ..., Bk and C1, ..., Cl and an arrow Bj → Ci whenever Bj is a direct cause of Ci.

This definition (which, given the correspondence between unitary maps U and unitary channels \({\mathcal{U}}(\_)=U(\_){U}^{\dagger }\), we let refer to either) lifts naturally to the case of a unitary process, in such a way that causal relationships are defined between the nodes of the process, rather than between inputs and outputs of a channel.

Definition 4 (Causal structure of a unitary process) Given a unitary process \({\sigma }_{{A}_{1}...{A}_{n}}={\rho }_{{A}_{1}...{A}_{n}| {A}_{1}...{A}_{n}}^{{\mathcal{U}}}\), write \({A}_{j}\nrightarrow {A}_{i}\) (node Aj does not influence node Ai), iff \({A}_{j}^{{\rm{out}}}\) does not influence \({A}_{i}^{{\rm{in}}}\) in \({\mathcal{U}}\). If node Aj can influence node Ai, then Aj is a direct cause of Ai. The causal structure of the unitary process is the set of all causal relations between its quantum nodes, and is representable as the directed graph with vertices A1, ..., An and an arrow Aj → Ai, whenever Aj is a direct cause of Ai.

The fact that any unitary process is Markov for its causal structure, hence defines a QCM, is then immediate from the following theorem of refs. 34,35.

Theorem 1 (References 34,35) Given a unitary channel \({\rho }_{{C}_{1}...{C}_{l}| {B}_{1}...{B}_{k}}^{{\mathcal{U}}}\), let \({\{Pa({C}_{i})\}}_{i = 1}^{l}\) be the parental sets as defined by its causal structure. Then the CJ operator factorizes as \({\rho }_{{C}_{1}...{C}_{l}| {B}_{1}...{B}_{k}}^{{\mathcal{U}}}=\mathop{\prod }\nolimits_{i = 1}^{l}{\rho }_{{C}_{i}| Pa({C}_{i})}\), where the marginal channels commute pairwise, \([{\rho }_{{C}_{i}| Pa({C}_{i})}\,,\,{\rho }_{{C}_{j}| Pa({C}_{j})}]=0\) for all i,j.

The case of non-unitary processes and their relationship to causal structure is presented below. First, we describe a well-known example of a causally nonseparable process, the quantum SWITCH (ref. 2), and show explicitly that it defines a unitary process operator with cyclic causal structure, hence a cyclic QCM.

Example: the quantum SWITCH

The quantum SWITCH (ref. 2) was the first example described of a causally nonseparable process. The SWITCH is standardly defined as a higher-order map2,54,55 that takes as input two CP maps \({{\mathcal{F}}}_{A}:{\mathcal{L}}({{\mathcal{H}}}_{{A}^{{\rm{in}}}})\to {\mathcal{L}}({{\mathcal{H}}}_{{A}^{{\rm{out}}}})\) and \({{\mathcal{G}}}_{B}:{\mathcal{L}}({{\mathcal{H}}}_{{B}^{{\rm{in}}}})\to {\mathcal{L}}({{\mathcal{H}}}_{{B}^{{\rm{out}}}})\), where \({d}_{{A}^{{\rm{in}}}}={d}_{{A}^{{\rm{out}}}}={d}_{{B}^{{\rm{in}}}}={d}_{{B}^{{\rm{out}}}}=d\), and gives as an output a CP map \({\mathcal{E}}:{\mathcal{L}}({{\mathcal{H}}}_{Q}\otimes {{\mathcal{H}}}_{S})\to {\mathcal{L}}({{\mathcal{H}}}_{Q^{\prime} }\otimes {{\mathcal{H}}}_{S^{\prime} })\), where \({d}_{Q}={d}_{Q^{\prime} }=2\) and \({d}_{S}={d}_{S^{\prime} }=d\). Here, \({{\mathcal{H}}}_{Q}\) and \({{\mathcal{H}}}_{Q^{\prime} }\) are interpreted as the Hilbert spaces of a control qubit at some initial and some final time, respectively, and \({{\mathcal{H}}}_{S}\) and \({{\mathcal{H}}}_{S^{\prime} }\) as the Hilbert spaces of some target system at the same two times. Intuitively, the effect of the quantum SWITCH is to transform the target system from the initial to the final time by the sequential application of the CP maps \({{\mathcal{F}}}_{A}\) and \({{\mathcal{G}}}_{B}\), where the order in which the two CP maps are applied is conditioned coherently on the logical value of the control qubit.
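The coherent conditioning can be made concrete in the simplest setting. The sketch below assumes that the operations at A and B are unitaries acting on a pure qubit target, and that control value 0 means A is applied before B (the convention of the SWITCH process operator in this paper). It then computes the two branches of the output and their components along the control states |+⟩ and |−⟩. The function names and the particular gates and states are illustrative choices.

```python
from math import sqrt


def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def switch_branches(U_A, U_B, control, target):
    """Unnormalized target amplitudes attached to control values 0 and 1."""
    c0, c1 = control
    branch0 = [c0 * x for x in matvec(U_B, matvec(U_A, target))]  # A first, then B
    branch1 = [c1 * x for x in matvec(U_A, matvec(U_B, target))]  # B first, then A
    return branch0, branch1


def control_components(branch0, branch1):
    """Target amplitudes conditioned on finding the control in |+> or |->."""
    plus_part = [(x + y) / sqrt(2) for x, y in zip(branch0, branch1)]
    minus_part = [(x - y) / sqrt(2) for x, y in zip(branch0, branch1)]
    return plus_part, minus_part


def norm_sq(v):
    return sum(abs(x) ** 2 for x in v)


X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]
plus = (1 / sqrt(2), 1 / sqrt(2))  # control qubit prepared in |+>
psi = [0.6, 0.8]                   # arbitrary normalized target state

# Anticommuting pair (X at A, Z at B): the control always ends up in |->.
anti_plus, anti_minus = control_components(*switch_branches(X, Z, plus, psi))
# Commuting pair (X at A, X at B): the control always ends up in |+>.
comm_plus, comm_minus = control_components(*switch_branches(X, X, plus, psi))

print(norm_sq(anti_plus), norm_sq(anti_minus))
print(norm_sq(comm_plus), norm_sq(comm_minus))
```

The first line shows all of the weight on the |−⟩ outcome and the second all of it on |+⟩, so a single measurement of the control reveals whether the two gates commute or anticommute.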

To formulate this precisely, we will describe the quantum SWITCH directly as a 4-node process (see Fig. 3), which involves the nodes A and B, where \({{\mathcal{F}}}_{A}\) and \({{\mathcal{G}}}_{B}\) are inserted, a node P with \({P}^{{\rm{out}}}=QS\), where the control qubit and target system at the initial time are prepared in some state, and node F with \({F}^{{\rm{in}}}=Q^{\prime} S^{\prime}\), where the control qubit and the system at the final time are subject to some measurement. The SWITCH is then a unitary four-partite process with process operator \({\sigma }_{ABPF}^{{\rm{SWITCH}}}={\rho }_{ABF| ABP}^{{\mathcal{U}}}=\left|W\right\rangle \left\langle W\right|\), where

$$\left|W\right\rangle := \,{\left|0\right\rangle }_{{Q}^{* }}{\left|0\right\rangle }_{Q^{\prime} }{\left|{\phi }^{+}\right\rangle }_{{S}^{* }{A}^{{\rm{in}}}}{\left|{\phi }^{+}\right\rangle }_{{({A}^{{\rm{out}}})}^{* }{B}^{{\rm{in}}}}{\left|{\phi }^{+}\right\rangle }_{{({B}^{{\rm{out}}})}^{* }S^{\prime} }\\ \,+\, {\left|1\right\rangle }_{{Q}^{* }}{\left|1\right\rangle }_{Q^{\prime} }{\left|{\phi }^{+}\right\rangle }_{{S}^{* }{B}^{{\rm{in}}}}{\left|{\phi }^{+}\right\rangle }_{{({B}^{{\rm{out}}})}^{* }{A}^{{\rm{in}}}}{\left|{\phi }^{+}\right\rangle }_{{({A}^{{\rm{out}}})}^{* }S^{\prime} },$$

with \({\left|{\phi }^{+}\right\rangle }_{XY}:= {\sum }_{i}{\left|i\right\rangle }_{X}{\left|i\right\rangle }_{Y}\) and the appearance of the dual spaces due to our convention for the CJ isomorphism. It is straightforward to verify that the causal structure of \({\sigma }_{ABPF}^{{\rm{SWITCH}}}\) is the cyclic directed graph in Fig. 4. From Theorem 1, it follows that

$${\sigma }_{ABPF}^{{\rm{SWITCH}}}={\rho }_{F| ABP}\,{\rho }_{A| BP}\,{\rho }_{B| AP}\,{\rho }_{P}\,,$$

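where we have formally added \({\rho }_{P}\) to make the Markovianity of \({\sigma }_{ABPF}^{{\rm{SWITCH}}}\) for \({G}_{{\rm{SWITCH}}}\) explicit, but here \({\rho }_{P}\) is just the number 1, since \({P}^{{\rm{in}}}\) is trivial. Hence, the graph \({G}_{{\rm{SWITCH}}}\) together with \({\rho }_{F| ABP}\), \({\rho }_{A| BP}\), \({\rho }_{B| AP}\) and \({\rho }_{P}\) forms a faithful cyclic QCM.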
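Whether a QCM is cyclic depends only on its graph, which can be read off the parental sets appearing in the factorization. As a small sketch, the following snippet encodes the parental sets of the SWITCH factorization above, Pa(F) = {A, B, P}, Pa(A) = {B, P}, Pa(B) = {A, P} and Pa(P) = ∅, and tests for a directed cycle by depth-first search. The function name `has_directed_cycle` and the dictionary encoding are illustrative assumptions.

```python
def has_directed_cycle(parents):
    """Depth-first search for a directed cycle.

    `parents` maps each node to the set of its parents, i.e. there is an
    arrow p -> child for every p in parents[child].
    """
    children = {v: [] for v in parents}
    for child, pas in parents.items():
        for p in pas:
            children[p].append(child)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {v: UNSEEN for v in parents}

    def visit(v):
        state[v] = ACTIVE
        for w in children[v]:
            if state[w] == ACTIVE:  # back edge: w is on the current path
                return True
            if state[w] == UNSEEN and visit(w):
                return True
        state[v] = DONE
        return False

    return any(state[v] == UNSEEN and visit(v) for v in parents)


# Parental sets read off the factorization of the SWITCH process operator.
SWITCH_PARENTS = {
    "P": set(),
    "A": {"B", "P"},
    "B": {"A", "P"},
    "F": {"A", "B", "P"},
}

# Causal order of the two-node circuit of Fig. 1: A precedes B.
FIG1_PARENTS = {"A": set(), "B": {"A"}}

print(has_directed_cycle(SWITCH_PARENTS))  # True
print(has_directed_cycle(FIG1_PARENTS))    # False
```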

Fig. 3: A unitary process with nodes A and B, root node P (in the global past) and leaf node F (in the global future).

Here \({P}^{{\rm{out}}}=QS\) and \({F}^{{\rm{in}}}=Q^{\prime} S^{\prime}\); the quantum SWITCH is an example of such a process.

Fig. 4: The causal structure \({G}_{{\rm{SWITCH}}}\) of the quantum SWITCH.

Any unitary process of the form shown in Fig. 3 has a causal structure that is a subgraph of \({G}_{{\rm{SWITCH}}}\).

Compatibility vs Markovianity

This section extends the discussion of causal structure to non-unitary processes. Briefly, in a QCM involving a non-unitary process σ, the arrows of the graph are taken to represent facts about the causal structure of some underlying unitary process, with the property that σ is recovered from the unitary process when marginalizing over auxiliary systems. The auxiliary systems take the form of a final system F, along with uncorrelated local disturbances, where the latter are inputs to the unitary process in a direct product state, with the property that each of them is a direct cause of at most one of the nodes of σ. As we shall show, it then follows that the process σ is Markov for the graph.

The following was introduced in ref. 51 (there under the name purifiability), and will help make these ideas precise.

Definition 5 (Unitary extendibility) A process \({\sigma }_{{A}_{1}...{A}_{n}}\) is called unitarily extendible iff there exists a unitary process \({\sigma }_{{A}_{1}...{A}_{n}PF}={\rho }_{{A}_{1}...{A}_{n}F| {A}_{1}...{A}_{n}P}^{{\mathcal{U}}}\) on the quantum nodes A1, …, An, plus additional root node P and leaf node F, such that \({\sigma }_{{A}_{1}...{A}_{n}}={{\rm{Tr}}}_{FP}[{\sigma }_{{A}_{1}...{A}_{n}PF}\,{\tau }_{P}]\) for some state \({\tau }_{P}\in {\mathcal{L}}({{\mathcal{H}}}_{{P}^{{\rm{out}}}}^{* })\). The process \({\sigma }_{{A}_{1}...{A}_{n}PF}\) is called a unitary extension of \({\sigma }_{{A}_{1}...{A}_{n}}\).

It was found in ref. 51 that not all process operators are unitarily extendible. The reason for this is that, although for any process \({\sigma }_{{A}_{1}...{A}_{n}}={\rho }_{{A}_{1}...{A}_{n}| {A}_{1}...{A}_{n}}^{{\mathcal{P}}}\), corresponding to a channel \({\mathcal{P}}\), the channel \({\mathcal{P}}\) admits a dilation to a unitary channel, this unitary channel does not necessarily correspond to a valid process itself. Process operators that are not unitarily extendible are those for which no dilation exists such that the unitary channel corresponds to a valid process.

Now suppose that a process \({\sigma }_{{A}_{1}...{A}_{n}}\) does have a unitary extension \({\sigma }_{{A}_{1}...{A}_{n}PF}\), involving the additional root node P. As per Def. 4, the unitary extension \({\sigma }_{{A}_{1}...{A}_{n}PF}\) has a causal structure given by some directed graph G with nodes A1, ..., An, P, F. Let \(G^{\prime}\) be the subgraph with nodes A1, ... ,An, along with all arrows that connect only these nodes in G. In general, in the graph G, the node P will have arrows to several of the Ai, meaning that P is a common cause for these nodes. There will then, in general, be correlations in σ that are explained by the common cause P. This means that the graph \(G^{\prime}\), which omits P, is at best an incomplete causal explanation for the correlations in σ, since it does not explain those correlations due to P. In this case, there is no reason why σ should be Markov for the graph \(G^{\prime}\).

Consider now a unitary extension of \({\sigma }_{{A}_{1}...{A}_{n}}\) with the feature that the node P can be factored into uncorrelated local disturbances λi, such that each λi is a direct cause of at most one of the nodes Ai. In this case, the graph \(G^{\prime}\), obtained by omitting all of the λi and leaf node F, can be seen as a causal explanation for correlations described by the process \({\sigma }_{{A}_{1}...{A}_{n}}\), which omits only local disturbances and the final effect F, and which does not omit common causes. In this case, we will say that σ is compatible with the graph \(G^{\prime}\). In fact, it is more useful to define this term more broadly: we will say that σ is compatible with any graph, with nodes A1, ..., An, that contains \(G^{\prime}\) as a subgraph. The following definition makes this precise, generalizing that of ref. 35 to the cyclic case.

Definition 6 (Compatibility with a directed graph) A process \({\sigma }_{{A}_{1}...{A}_{n}}\) is compatible with a directed graph G with nodes A1, ..., An, iff \({\sigma }_{{A}_{1}...{A}_{n}}\) is extendible to a unitary process \({\sigma }_{{A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}F}\), with an extra root node λi for i = 1, ..., n and an extra leaf node F, such that:

  (1) there exists a product state \({\tau }_{{\lambda }_{1}}\otimes \cdots \otimes {\tau }_{{\lambda }_{n}}\) with \({\tau }_{{\lambda }_{i}}\in {\mathcal{L}}({{\mathcal{H}}}_{{\lambda }_{i}^{{\rm{out}}}}^{* })\) such that \({\sigma }_{{A}_{1}...{A}_{n}}={{\rm{Tr}}}_{{\lambda }_{1}...{\lambda }_{n}F}[\,{\sigma }_{{A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}F}\,({\tau }_{{\lambda }_{1}}\otimes \cdots \otimes {\tau }_{{\lambda }_{n}})]\),

  (2) \({\sigma }_{{A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}F}\) satisfies the following no-influence conditions (with Pa(Ai) referring to G): \({\{{A}_{j} \nrightarrow {A}_{i}\}}_{{A}_{j}\notin Pa({A}_{i})}\,,\,{\{{\lambda }_{j} \nrightarrow {A}_{i}\}}_{j\ne i}\).

The following then justifies the stipulation, as a part of the definition of a QCM, that the process accompanying a graph is Markov for the graph.

Theorem 2 If a process \({\sigma }_{{A}_{1}...{A}_{n}}\) is compatible with the directed graph G, then it is also Markov for G.

Proof Similarly to the acyclic case in ref. 35, the theorem follows essentially from Theorem 1: the unitary extension, asserted to exist by virtue of the assumed compatibility with G, has to factorize into pairwise commuting operators of the form \({\sigma }_{{A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}F}={\rho }_{F| {A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}}({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}})\). This yields \({\sigma }_{{A}_{1}\cdots {A}_{n}}={\prod }_{i}{{\rm{Tr}}}_{{\lambda }_{i}}[\,{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}}\,{\tau }_{{\lambda }_{i}}\,]\), where the factors \({\rho }_{{A}_{i}| Pa({A}_{i})}:= {{\rm{Tr}}}_{{\lambda }_{i}}[\,{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}}\,{\tau }_{{\lambda }_{i}}]\) are pairwise commuting operators.
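The marginalization step in the proof can be spelled out as follows. This is a sketch under two assumptions that we state explicitly: that \({\rho }_{F| {A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}}\) is normalized as the CJ operator of a channel, so that its partial trace over \({F}^{{\rm{in}}}\) is the identity, and that the shorthand \({\rm{Tr}}_{\lambda_i}\) denotes the trace over the output space of the root node \(\lambda_i\).

$$\begin{aligned}{\sigma }_{{A}_{1}\cdots {A}_{n}}&={{\rm{Tr}}}_{{\lambda }_{1}...{\lambda }_{n}F}\Big[\,{\rho }_{F| {A}_{1}...{A}_{n}{\lambda }_{1}...{\lambda }_{n}}\Big({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}}\Big)\,({\tau }_{{\lambda }_{1}}\otimes \cdots \otimes {\tau }_{{\lambda }_{n}})\Big]\\ &={{\rm{Tr}}}_{{\lambda }_{1}...{\lambda }_{n}}\Big[\Big({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}}\Big)\,({\tau }_{{\lambda }_{1}}\otimes \cdots \otimes {\tau }_{{\lambda }_{n}})\Big]\\ &={\prod }_{i}{{\rm{Tr}}}_{{\lambda }_{i}}\big[\,{\rho }_{{A}_{i}| Pa({A}_{i}){\lambda }_{i}}\,{\tau }_{{\lambda }_{i}}\,\big].\end{aligned}$$

The second line uses that \({F}^{{\rm{in}}}\) appears only in the first factor, and the third that each \({\lambda }_{i}\) appears only in the i-th factor of the product, so that the trace over \({\lambda }_{i}\) can be taken factor by factor.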

Reference 35 also establishes a converse to this result, for the case that G is acyclic. For a general directed graph G, however, the same proof does not suffice since, even though a dilation to a unitary channel with the required causal constraints can always be found35, it is not immediate whether this channel can be guaranteed to define a valid process. We pose this as a hypothesis:

Hypothesis 1 If a process \({\sigma }_{{A}_{1}...{A}_{n}}\) is Markov for a directed graph G, then it is compatible with G.

The hypothesis is satisfied by all examples that we have investigated, but we do not have a proof that it is true in general. Some consequences of the validity or otherwise of this hypothesis are discussed in the “Conclusions” section.

Example: a process that violates a causal inequality

While the quantum SWITCH is causally nonseparable, the correlations that can be established between operations at the nodes of the quantum SWITCH can always be obtained by a causally separable process with sufficiently large input and output dimensions at each node6,7. There are, however, causally nonseparable processes that can produce correlations violating causal inequalities3,7,9,12,56,57,58,59,60, which are incompatible with the existence of a definite order between the nodes irrespective of the types of systems or operations performed at those nodes4,7,61. In the literature such processes are called noncausal7.

An example of a tripartite noncausal process is the one which was found by Araújo and Feix (AF) and then published and further studied by Baumeler and Wolf in refs. 12,62. It is remarkable in that the process is both classical and deterministic (see below for further discussion of classical processes). Any classical process can be viewed as a quantum process, diagonal with respect to a product basis. The AF process, viewed as a quantum process on nodes A, B and C, each with two-dimensional input and output Hilbert spaces, is described by the process operator

$${\sigma }_{ABC}^{{\rm{AF}}}={\rho }_{A| BC}\,{\rho }_{B| CA}\,{\rho }_{C| AB},$$


$${\rho }_{A| BC}=\sum _{b,c = 0,1}\left|{\neg} b\wedge c\right\rangle {\left\langle {\neg} b\wedge c\right|}_{{A}^{{\rm{in}}}}\otimes \left| b,c\right\rangle {\left\langle b,c\right|}_{{({B}^{{\rm{out}}}{C}^{{\rm{out}}})}^{* }},$$
$${\rho }_{B| CA}=\sum _{c,a = 0,1}\left|{\neg} c\wedge a\right\rangle {\left\langle {\neg} c\wedge a\right|}_{{B}^{{\rm{in}}}}\otimes \left| c,a\right\rangle {\left\langle c,a\right|}_{{({C}^{{\rm{out}}}{A}^{{\rm{out}}})}^{* }},$$
$${\rho }_{C| AB}=\sum _{a,b = 0,1}\left|{\neg} a\wedge b\right\rangle {\left\langle {\neg} a\wedge b\right|}_{{C}^{{\rm{in}}}}\otimes \left| a,b\right\rangle {\left\langle a,b\right|}_{{({A}^{{\rm{out}}}{B}^{{\rm{out}}})}^{* }}.$$

As is explicit in this description, the AF process together with the causal structure in Fig. 5 defines a faithful cyclic QCM.

Fig. 5: The causal structure of the AF process.

The fully connected directed graph with three nodes A, B and C.

It was shown by Baumeler and Wolf (BW)62 that this process is unitarily extendible (also see refs. 27,51) with a unitary extension given by

$${\sigma }_{ABCFP}^{{\rm{BW}}}={\rho }_{ABCF| ABCP}^{U}\,,$$

where the output space of the root node P is a tensor product of three qubits \({{\mathcal{H}}}_{{P}^{{\rm{out}}}}={{\mathcal{H}}}_{{\lambda }_{A}}\otimes {{\mathcal{H}}}_{{\lambda }_{B}}\otimes {{\mathcal{H}}}_{{\lambda }_{C}}\) and the unitary U is defined by the following bijection of orthonormal bases:

$$U: \,{\left|a,b,c\right\rangle }_{{A}^{{\rm{out}}}{B}^{{\rm{out}}}{C}^{{\rm{out}}}}\,\otimes \,{\left|l,m,n\right\rangle }_{{\lambda }_{A}{\lambda }_{B}{\lambda }_{C}}\,\\ \mapsto \,{\left|l\oplus ({\neg}b\wedge c),m\oplus ({\neg}c\wedge a),n\oplus ({\neg}a\wedge b)\right\rangle }_{{A}^{{\rm{in}}}{B}^{{\rm{in}}}{C}^{{\rm{in}}}} \\ \otimes \,{\left|a,b,c\right\rangle }_{{F}^{{\rm{in}}}}\,.$$

The original AF process is recovered by marginalizing over F and feeding in the product state \(\left|0,0,0\right\rangle\) for λA, λB and λC. Formally letting the latter three define distinct root nodes λA, λB and λC, it is not too hard to show that this BW unitary extension also satisfies the corresponding causal constraints of Definition 13, which establishes \({\sigma }_{ABC}^{{\rm{AF}}}\) to be compatible with the graph of Fig. 5, in keeping with Hypothesis 1.
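The basis map of Eq. (13) is small enough to check by brute force. The following minimal Python sketch (function names such as `af`, `bw` and `fixed_points` are ours, purely illustrative) verifies that the map is a bijection on the 64 basis labels and that setting l = m = n = 0 recovers the AF function. It also counts fixed points for every choice of deterministic local operations; requiring exactly one fixed point is the logical-consistency criterion for deterministic classical processes from the classical-process literature, which is an assumption imported here and not stated in this text.

```python
from itertools import product

BITS = (0, 1)

def af(a, b, c):
    """AF process function: node outputs (a, b, c) -> node inputs (A_in, B_in, C_in)."""
    return ((1 - b) & c, (1 - c) & a, (1 - a) & b)

def bw(a, b, c, l, m, n):
    """Basis map read off Eq. (13): returns (A_in, B_in, C_in, F_in)."""
    x, y, z = af(a, b, c)
    return (l ^ x, m ^ y, n ^ z, (a, b, c))

def is_bijection():
    """True if the 64 input labels are mapped to 64 distinct output labels."""
    images = {bw(*bits) for bits in product(BITS, repeat=6)}
    return len(images) == 64

def fixed_points(fa, fb, fc):
    """Number of output triples (a, b, c) consistent with local functions fa, fb, fc."""
    return sum(
        (fa[x], fb[y], fc[z]) == (a, b, c)
        for a, b, c in product(BITS, repeat=3)
        for x, y, z in [af(a, b, c)]
    )

# The four deterministic functions bit -> bit, stored as lookup tables.
LOCAL_FUNCTIONS = [dict(zip(BITS, vals)) for vals in product(BITS, repeat=2)]
```

A call such as `fixed_points(f, g, h)` for `f, g, h` drawn from `LOCAL_FUNCTIONS` covers all 64 combinations of local operations. Note that this purely classical check says nothing about the quantum no-influence conditions, which concern coherent inputs as well.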

Cyclicity and extended circuit diagrams

An essential feature of the Markov condition in Definition 2 is the pairwise commutation relation of the operators of the form \({\rho }_{{A}_{i}| Pa({A}_{i})}\), where the parental sets in general overlap. That two commuting operators act non-trivially on the same Hilbert space has consequences for the algebraic structure of the operators and leads to an intimate link between causal and compositional structure.

To illustrate how fruitful it is to study this link, the following revisits the two examples from earlier.

The quantum SWITCH can be considered as a unitary process over 4 nodes, given by \({\sigma }_{ABPF}^{{\rm{SWITCH}}}={\rho }_{ABF| ABP}^{{\mathcal{U}}}=\left|W\right\rangle \left\langle W\right|\), where \(\left|W\right\rangle\) is defined in Eq. (6). The unitary channel \({\mathcal{U}}\) corresponds to a unitary map \(U:{{\mathcal{H}}}_{{A}^{{\rm{out}}}}\otimes {{\mathcal{H}}}_{{P}^{{\rm{out}}}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{out}}}}\to {{\mathcal{H}}}_{{A}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{F}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{in}}}}\), which is depicted in Fig. 6 together with its causal structure shown in blue. Observe in particular that in U, Aout does not influence Ain, and similarly Bout does not influence Bin, as must be the case for a well-defined process18.
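The no-influence statement can be probed numerically. The sketch below is a minimal illustration, assuming a qubit target system (d = 2) and reading the routing of each control branch off the expression for \(\left|W\right\rangle\) given earlier (control value 0: S to A, A to B, B to S'; control value 1: S to B, B to A, A to S'). The helper names and the tensor-factor ordering (A_out, Q, S, B_out) to (A_in, Q', S', B_in) are our own choices.

```python
import numpy as np

d = 2  # assumed dimension of the target system and of each node's input/output

IN_SHAPE = (d, 2, d, d)   # (A_out, Q, S, B_out)
OUT_SHAPE = (d, 2, d, d)  # (A_in, Q', S', B_in)

def switch_unitary():
    """Permutation matrix implementing the SWITCH routing on basis states."""
    U = np.zeros((2 * d**3, 2 * d**3))
    for a_out, q, s, b_out in np.ndindex(*IN_SHAPE):
        if q == 0:   # control 0: S -> A_in, A_out -> B_in, B_out -> S'
            a_in, b_in, s_fin = s, a_out, b_out
        else:        # control 1: S -> B_in, B_out -> A_in, A_out -> S'
            b_in, a_in, s_fin = s, b_out, a_out
        row = np.ravel_multi_index((a_in, q, s_fin, b_in), OUT_SHAPE)
        col = np.ravel_multi_index((a_out, q, s, b_out), IN_SHAPE)
        U[row, col] = 1.0
    return U

def reduced_a_in(U, psi_in):
    """Reduced density matrix on A_in after applying U to the pure state psi_in."""
    psi = (U @ psi_in).reshape(OUT_SHAPE)
    return np.einsum('aqsb,cqsb->ac', psi, psi.conj())

def random_state(dim, rng):
    """Random normalized complex vector of the given dimension."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)
```

Feeding two different states into A_out, with the same (possibly entangled) state on Q, S and B_out, should give the same reduced state on A_in, whereas changing the state on B_out can change it whenever the control has a component on 1. A random-state check of this kind is only evidence, not a proof of the no-influence condition.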

Fig. 6: Unitary map U that defines the quantum SWITCH.

The causal structure of U is indicated in blue.

Reference 63 shows that any unitary map U with three in- and output systems, and the causal constraints of Fig. 6, has a decomposition of the following form:

$$U=\left({{\mathbb{1}}}_{{B}^{{\rm{in}}}}\otimes T\otimes {{\mathbb{1}}}_{{A}^{{\rm{in}}}}\right)\left(\mathop{\bigoplus }_{i\in I}{V}_{i}\otimes {W}_{i}\right)\left({{\mathbb{1}}}_{{A}^{{\rm{out}}}}\otimes S\otimes {{\mathbb{1}}}_{{B}^{{\rm{out}}}}\right),$$

where S and T are unitaries, and \({\{{V}_{i}\}}_{i\in I}\) and \({\{{W}_{i}\}}_{i\in I}\) families of unitaries of the form

$$S\ :\,{{\mathcal{H}}}_{{P}^{{\rm{out}}}}\,\to \,\mathop{\bigoplus }_{i\in I}{{\mathcal{H}}}_{{P}_{i}^{L}}\otimes {{\mathcal{H}}}_{{P}_{i}^{R}}\,,$$
$${V}_{i}\ :\,{{\mathcal{H}}}_{{A}^{{\rm{out}}}}\otimes {{\mathcal{H}}}_{{P}_{i}^{L}}\,\to \,{{\mathcal{H}}}_{{B}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{F}_{i}^{L}}\,,$$
$${W}_{i}:\,{{\mathcal{H}}}_{{P}_{i}^{R}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{out}}}}\,\to \,{{\mathcal{H}}}_{{F}_{i}^{R}}\otimes {{\mathcal{H}}}_{{A}^{{\rm{in}}}}\,,$$
$$T\ :\mathop{\bigoplus }_{i\in I}{{\mathcal{H}}}_{{F}_{i}^{L}}\otimes {{\mathcal{H}}}_{{F}_{i}^{R}}\,\to \,{{\mathcal{H}}}_{{F}^{{\rm{in}}}}\,.$$

Such a compositional structure with direct sums over tensor products goes beyond what is expressible with ordinary circuit diagrams. Reference 63 therefore introduced extended circuit diagrams to give a graphical representation of such decompositions. Figure 7 arises from that extended circuit diagram representation of Eq. (14) by bending the wires corresponding to \({A}^{{\rm{in}}}\) and \({B}^{{\rm{in}}}\) down to re-identify the quantum nodes A and B, thereby filling the black box of the quantum SWITCH from Fig. 3. For details on this diagrammatic language, we refer the reader to ref. 63, but the essential idea is that individual wires with indices on them, such as those between the circle S and the circles \({V}_{i}\) and \({W}_{i}\), respectively, represent the families of Hilbert spaces \({\{{{\mathcal{H}}}_{{P}_{i}^{L}}\}}_{i\in I}\) and \({\{{{\mathcal{H}}}_{{P}_{i}^{R}}\}}_{i\in I}\), while the two parallel wires together represent \({\bigoplus }_{i\in I}{{\mathcal{H}}}_{{P}_{i}^{L}}\otimes {{\mathcal{H}}}_{{P}_{i}^{R}}\). An implicit summation over orthogonal subspaces indexed by i allows the representation of the intermediate unitary map \({\bigoplus }_{i\in I}{V}_{i}\otimes {W}_{i}\) from Eq. (14).

Fig. 7: Extended circuit diagram decomposition of the quantum SWITCH.

Additionally indicated in grey are the labels of the intermediate families of Hilbert spaces.

It is easy to see what this decomposition is concretely in the case of the quantum SWITCH: the index i takes two values, 0 and 1, corresponding to the logical values of the control qubit, i.e., \({{\mathcal{H}}}_{P}={{\mathcal{H}}}_{Q}\otimes {{\mathcal{H}}}_{S}\cong ({\mathbb{C}}\otimes {{\mathcal{H}}}_{S})\oplus ({{\mathcal{H}}}_{S}\otimes {\mathbb{C}})\) and the unitaries Vi and Wi are either the SWAP transformation on the respective systems or the identity depending on i. We see that even though the causal structure of the full process is cyclic, the process splits into a direct sum of processes in each of which causal influence and the flow of information follow acyclic paths.
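As a hedged illustration, the components can be read off from the expression for \(\left|W\right\rangle\) given earlier, assuming that the index value i = 0 labels the control value for which A precedes B, and writing SWAP for the map that exchanges the two tensor factors (this labelling is ours):

$$\begin{aligned}i=0:\quad &{{\mathcal{H}}}_{{P}_{0}^{L}}\cong {\mathbb{C}},\ {{\mathcal{H}}}_{{P}_{0}^{R}}\cong {{\mathcal{H}}}_{S},\quad {V}_{0}={\mathbb{1}}:{{\mathcal{H}}}_{{A}^{{\rm{out}}}}\to {{\mathcal{H}}}_{{B}^{{\rm{in}}}},\quad {W}_{0}={\rm{SWAP}}:{{\mathcal{H}}}_{S}\otimes {{\mathcal{H}}}_{{B}^{{\rm{out}}}}\to {{\mathcal{H}}}_{{F}_{0}^{R}}\otimes {{\mathcal{H}}}_{{A}^{{\rm{in}}}},\\ i=1:\quad &{{\mathcal{H}}}_{{P}_{1}^{L}}\cong {{\mathcal{H}}}_{S},\ {{\mathcal{H}}}_{{P}_{1}^{R}}\cong {\mathbb{C}},\quad {V}_{1}={\rm{SWAP}}:{{\mathcal{H}}}_{{A}^{{\rm{out}}}}\otimes {{\mathcal{H}}}_{S}\to {{\mathcal{H}}}_{{B}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{F}_{1}^{L}},\quad {W}_{1}={\mathbb{1}}:{{\mathcal{H}}}_{{B}^{{\rm{out}}}}\to {{\mathcal{H}}}_{{A}^{{\rm{in}}}}.\end{aligned}$$

In the summand i = 0 the target system thus travels from S to A, then to B, then to \(S^{\prime}\); in the summand i = 1 it travels from S to B, then to A, then to \(S^{\prime}\). Trivial one-dimensional factors are suppressed.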

This decomposition of the quantum SWITCH applies more generally: since any unitary process of the type depicted in Fig. 3, with a root node P, a leaf node F, and two nodes A and B in between, satisfies \({A}^{{\rm{out}}}\nrightarrow {A}^{{\rm{in}}}\) and \({B}^{{\rm{out}}}\nrightarrow {B}^{{\rm{in}}}\), it follows that any such unitary process has a decomposition as in Fig. 7. Note that the discussion below will furthermore establish (as a direct consequence of the proof of Theorem 3) that for each i the summand \({V}_{i}\otimes {W}_{i}\) of the corresponding decomposition has to have an acyclic causal structure. That is, any unitary process with nodes A, B, P, F, where P is a root node and F a leaf node, is a direct sum of unitary processes in which causal influences flow along acyclic paths.

The second example concerns the tripartite AF process and its BW unitary extension \({\rho }_{ABCF| ABCP}^{U}\) (see Eqs. (12)–(13)).

The root node P has as output space \({{\mathcal{H}}}_{{P}^{{\rm{out}}}}={{\mathcal{H}}}_{{\lambda }_{A}}\otimes {{\mathcal{H}}}_{{\lambda }_{B}}\otimes {{\mathcal{H}}}_{{\lambda }_{C}}\), where each λX influences only X and F for X = A, B, C. The associated unitary map U and its causal structure are depicted in Fig. 8.

Fig. 8: The unitary map U from Eq. (13) that defines the BW unitary extension of the AF process.
figure 8

Also depicted is its causal structure, where for better visibility, rather than direct cause relations, the no-influence conditions are shown as red dashed arrows.

The results from ref. 63 allow again the statement of an extended circuit decomposition of U, which is implied by its causal structure and which makes the pathways of causal influence through U graphically evident (the proof is completely analogous to that of Theorem 7 in ref. 63).

This decomposition of U is depicted in Fig. 9 and reads:

$$U= \,\left({{\mathbb{1}}}_{{C}^{{\rm{in}}}{B}^{{\rm{in}}}{A}^{{\rm{in}}}}\otimes W\right)\,\left(\mathop{\bigoplus }_{i,j,k}{P}_{ij}\otimes {Q}_{ik}\otimes {R}_{jk}\right)\\ \left({{\mathbb{1}}}_{{\lambda }_{C}}\otimes S\otimes {{\mathbb{1}}}_{{\lambda }_{B}}\otimes T\otimes V\otimes {{\mathbb{1}}}_{{\lambda }_{A}}\right)\,,$$

for (families of) unitary maps

$$S\ :\,{{\mathcal{H}}}_{{A}^{{\rm{out}}}}\,\to \,\mathop{\bigoplus }_{i}{{\mathcal{H}}}_{{X}_{i}^{L}}\otimes {{\mathcal{H}}}_{{X}_{i}^{R}}\,,$$
$$T\ :\,{{\mathcal{H}}}_{{B}^{{\rm{out}}}}\,\to \,\mathop{\bigoplus }_{j}{{\mathcal{H}}}_{{Y}_{j}^{L}}\otimes {{\mathcal{H}}}_{{Y}_{j}^{R}}\,,$$
$$V\ :\,{{\mathcal{H}}}_{{C}^{{\rm{out}}}}\,\to \,\mathop{\bigoplus }_{k}{{\mathcal{H}}}_{{Z}_{k}^{L}}\otimes {{\mathcal{H}}}_{{Z}_{k}^{R}}\,,$$
$$W\ :\,\mathop{\bigoplus }_{i,j,k}{{\mathcal{H}}}_{{G}_{ij}^{(1)}}\otimes {{\mathcal{H}}}_{{G}_{ik}^{(2)}}\otimes {{\mathcal{H}}}_{{G}_{jk}^{(3)}}\,\to \,{{\mathcal{H}}}_{{F}^{{\rm{in}}}}\,,$$
$${P}_{ij}\ :\,{{\mathcal{H}}}_{{\lambda }_{C}}\otimes {{\mathcal{H}}}_{{X}_{i}^{L}}\otimes {{\mathcal{H}}}_{{Y}_{j}^{L}}\,\to \,{{\mathcal{H}}}_{{C}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{G}_{ij}^{(1)}}\,,$$
$${Q}_{ik}\ :\,{{\mathcal{H}}}_{{X}_{i}^{R}}\otimes {{\mathcal{H}}}_{{\lambda }_{B}}\otimes {{\mathcal{H}}}_{{Z}_{k}^{L}}\,\to \,{{\mathcal{H}}}_{{B}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{G}_{ik}^{(2)}}\,,$$
$${R}_{jk}\ :\,{{\mathcal{H}}}_{{Y}_{j}^{R}}\otimes {{\mathcal{H}}}_{{Z}_{k}^{R}}\otimes {{\mathcal{H}}}_{{\lambda }_{A}}\,\to \,{{\mathcal{H}}}_{{A}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{G}_{jk}^{(3)}}\,.$$
Fig. 9: Causally faithful extended circuit decomposition of the unitary U from Fig. 8.

(Up to some swaps for better readability).

By appropriately bending the wires that correspond to Ain, Bin and Cin to re-identify the nodes A, B and C (and swapping some wires for better readability) one obtains Fig. 10, revealing a fine-grained compositional structure of the BW unitary extension.

Fig. 10: Extended circuit diagram decomposition of the BW unitary extension of the AF process.

Compared to Fig. 9 the wires for Ain, Bin and Cin are bent around (as well as some swaps inserted for better visibility).

Note that the stated decomposition is general in the sense that a decomposition of the form as in Fig. 9 exists for any unitary with a causal structure as in Fig. 8. However, in the concrete case of the BW unitary extension one can easily see what the components in Eq. (19) correspond to through a comparison with Eq. (13). All three indices i, j and k are binary and, via the unitaries S, T and V, can be seen to correspond to one-dimensional subspaces of \({A}^{{\rm{out}}}\), \({B}^{{\rm{out}}}\) and \({C}^{{\rm{out}}}\). Hence, each indexed space, i.e., each element of a family of Hilbert spaces associated with an indexed wire, is a trivial Hilbert space. For any fixed value (i, j, k), the unitary \({P}_{ij}\otimes {Q}_{ik}\otimes {R}_{jk}\) is of the type \({{\mathcal{H}}}_{{\lambda }_{C}}\otimes {{\mathcal{H}}}_{{\lambda }_{B}}\otimes {{\mathcal{H}}}_{{\lambda }_{A}}\to {{\mathcal{H}}}_{{C}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{A}^{{\rm{in}}}}\), where all spaces are qubits (suppressing all trivial spaces). The unitary \({P}_{ij}:{{\mathcal{H}}}_{{\lambda }_{C}}\to {{\mathcal{H}}}_{{C}^{{\rm{in}}}}\) maps \(\left|{\lambda }_{C}\right\rangle \,\mapsto\, \left|{\lambda }_{C}\oplus ({\neg}\ i\wedge j)\right\rangle\), i.e., \({P}_{ij}\) is the identity or the NOT gate depending on the values of i and j. The unitaries \({Q}_{ik}\) and \({R}_{jk}\) can similarly be identified through comparison with Eq. (13).
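The block structure described here can be made concrete with a short Python sketch. It assumes the identification of (i, j, k) with the basis labels (a, b, c) of the node outputs and, for convenience, orders the tensor factors as (A, B, C) rather than (C, B, A); the function names are ours. For each fixed (a, b, c), the block acting on the λ qubits is compared with a tensor product of identity and NOT gates.

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])  # NOT gate

def gate(bit):
    """NOT gate if bit is 1, identity otherwise."""
    return X if bit else I2

def bw_block(a, b, c):
    """8x8 block (lambda_A, lambda_B, lambda_C) -> (A_in, B_in, C_in), read off Eq. (13)."""
    M = np.zeros((8, 8))
    for l, m, n in product((0, 1), repeat=3):
        out = (l ^ ((1 - b) & c), m ^ ((1 - c) & a), n ^ ((1 - a) & b))
        row = np.ravel_multi_index(out, (2, 2, 2))
        col = np.ravel_multi_index((l, m, n), (2, 2, 2))
        M[row, col] = 1.0
    return M

def product_form(a, b, c):
    """Tensor product of single-qubit gates for fixed (a, b, c), in the order A, B, C."""
    g_a = gate((1 - b) & c)   # acts on lambda_A -> A_in
    g_b = gate((1 - c) & a)   # acts on lambda_B -> B_in
    g_c = gate((1 - a) & b)   # acts on lambda_C -> C_in
    return np.kron(np.kron(g_a, g_b), g_c)
```

Comparing `bw_block(a, b, c)` with `product_form(a, b, c)` for all eight values of (a, b, c) confirms that every summand factorizes into single-qubit gates with no interaction between the three λ qubits.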

One thus finds that the BW unitary extension is a direct sum over unitary processes each of which has an acyclic causal structure. Furthermore, it is natural to wonder whether knowing a decomposition of the form as in Fig. 8 might suggest a way in which the process could be implemented—a process, which we recall is one that violates a causal inequality.

What about unitary processes that are not of the two presented types? Reference 63 provides extended circuit decompositions for many classes of unitary transformations, where the decompositions are causally faithful, meaning that if A is an input to the unitary U and B an output, then there is a path from A to B in the extended circuit iff A can influence B through U (note this is distinct from the notion of faithfulness of a QCM). Consider now a unitary map U that corresponds to a unitary process, in the sense that the output Hilbert spaces of the nodes correspond to the inputs to U, and the input Hilbert spaces of the nodes correspond to the outputs of U. If U has a causally faithful extended circuit decomposition, then by appropriately bending the wires, as in the above examples, one can always obtain a fine-grained compositional structure of the corresponding unitary process. Reference 63 states the hypothesis that all finite-dimensional unitary transformations (over specified tensor products of input Hilbert spaces and output Hilbert spaces) have a causally faithful extended circuit decomposition. This would mean that all unitary processes, by bending the wires, would admit causally faithful decompositions in a similar manner. At the time of writing, however, the hypothesis remains unproven.

The bipartite unitarily extendible processes

Understanding which processes have a physical realization is a central open question in the field of indefinite causal order18,51. While causally nonseparable processes may have a realization in exotic scenarios involving both quantum systems and gravity, it seems clear that any present-day laboratory experiment admits a description in terms of a straightforward, definite, causal ordering of suitably defined parts of the experiment. Nevertheless, various experiments have been performed that are claimed as realizations of nonseparable processes such as the quantum SWITCH28,29,30,31,32,64. This has caused some debate18,49,65.

Behind much of this debate, however, lies merely a question of how the abstract mathematical description is assumed to map to physical phenomena. Each of the implementations claimed so far is of a process that involves coherent control over the time-ordering of nodes in a similar manner to the SWITCH, and which cannot therefore violate causal inequalities. Reference 18 shows that any such implementation can be seen as a valid implementation of a nonseparable process, if the process is understood as being defined over time-delocalized systems, where the input and output Hilbert spaces of the nodes of the process correspond to subsystems of tensor products of Hilbert spaces of systems associated with different times. This raises the question: which processes in general admit a laboratory implementation, at least in terms of time-delocalized systems? In particular, can a process violating causal inequalities be implemented?

There was some hope that a process violating causal inequalities could be implemented, because ref. 18 also shows that every unitary extension of a bipartite process has a realization in terms of time-delocalized systems. Hence if there were a unitarily extendible bipartite process violating causal inequalities, then it could be implemented, at least via time-delocalized systems. The following theorem, however, shows that there is no such possibility. Any bipartite unitarily extendible process is causally separable, hence, in particular, cannot violate causal inequalities, as conjectured in ref. 51; furthermore, all unitary extensions of bipartite processes are variations of the quantum SWITCH, realizable by coherent control of the times of the operations of A and B. The argument uses the existence of a faithful extended circuit decomposition of the form as in Eq. (14) that is implied by the causal constraints of Fig. 6.

Theorem 3 All unitarily extendible bipartite processes are causally separable. Given a bipartite process, if it is unitarily extendible, then the unitary extension has a realization in terms of coherent control of the order of the node operations.

Proof See Methods.

As one can see, e.g., from the AF process, being unitarily extendible does not imply causal separability in the general multipartite case. However, the decomposition from Fig. 8 of the BW unitary extension of the AF process proved insightful with regards to how the cyclicity of the causal structure comes from different contributions across the direct sum. More generally, suppose a causally faithful extended circuit decomposition of the unitary extension of some multipartite process is known. It is then natural to ask whether some kind of generalization of the constraints established as part of the proof of Theorem 3 could be derived, which in the bipartite case just happen to give causal separability, while in the general case constrain each summand of the decomposition. As is the case with the bipartite processes, to which Theorem 3 applies, one would expect that such constraints on summands of the unitary extension also manifest themselves in interesting ways for the non-unitary marginal process. We leave this question for future investigation.

Causal nonseparability

The definition of causal separability was given above only for bipartite processes and it was mentioned that the multipartite case, with more than two nodes, is more intricate. This section will first give the general definition, following refs. 7,23, and then present another main result.

Seeing as the idea of causal separability is to capture whether a process is consistent with our intuitions on causal order, it is natural to let it incorporate the following two features. First, in addition to probabilistic mixtures of fixed orders of nodes it allows for a dynamical causal order of them, that is, the overall causal order of some nodes need not be fixed, but may depend on what happens at some earlier nodes. Second, it demands that causal separability is preserved under extending the process with an arbitrary ancillary input state shared between the nodes (a property called extensibility7). A process is thus causally separable if, upon considering arbitrary shared entanglement between auxiliary input systems to all nodes, the resulting extended process can be seen to arise from a probabilistic mixture of particular processes: for each there is a node P in the past such that for all possible interventions at P the marginal process has a fixed causal order, or more generally, is itself again causally separable. Hence, one ends up with an iterative definition of the concept. This notion was originally called extensible causal separability in ref. 7 to distinguish it from the analogous concept without extensibility, but as it is undoubtedly the more natural concept, we here refer to it simply as causal separability, as in ref. 23. (Note, there have been two equivalent definitions of that notion23, which differ by whether extensibility is imposed at the level of the full process7 or at each level of the iteration23. For the present purposes, it is convenient to use the latter one.) Finally, making the concept precise relies on the following notion of no-signalling in a process, which, along with various equivalent statements, was given in ref. 7.

Definition 7 (No signalling in a process) Given a process \({\sigma }_{{A}_{1}...{A}_{n}}\), we say that there is no signalling from a subset \(S\subset \{{A}_{1},\ldots ,{A}_{n}\}\) of its nodes to the complementary subset \(\overline{S}:= \{{A}_{1},\ldots ,{A}_{n}\}\!\!\setminus \!\!S\), iff the probabilities \(P({k}_{\overline{S}})={\rm{Tr}}\left[{\sigma }_{{A}_{1}...{A}_{n}}\left({\tau }_{\overline{S}}^{{k}_{\overline{S}}}\otimes {\tau }_{S}\right)\right]\) for the outcomes of any operation \({\tau }_{\overline{S}}^{{k}_{\overline{S}}}={\bigotimes }_{A\in \overline{S}}{\tau }_{A}^{{k}_{A}}\) performed at \(\overline{S}\) are independent of the choice of trace-preserving operations \({\tau }_{S}={\bigotimes }_{A\in S}{\tau }_{A}\) performed at S.

Now let \({\tau }_{{A}_{j}}\) represent a CP map at the node Aj, which is not necessarily trace-preserving. If there is no signalling to a node Aj from \(\{{A}_{1},\ldots ,{A}_{n}\}\setminus \{{A}_{j}\}\), then for any \({\tau }_{{A}_{j}}\), the object \({{\rm{Tr}}}_{{A}_{j}}[{\sigma }_{{A}_{1}...{A}_{n}}{\tau }_{{A}_{j}}]\) is proportional to a process operator. In this case, let \(\sigma {| }_{{\tau }_{{A}_{j}}}\) be the corresponding correctly normalized process operator. We refer to \(\sigma {| }_{{\tau }_{{A}_{j}}}\) as a conditional process. We can now state the formal definition of causal separability.

Definition 8 (Causal separability23) Every single-node process is causally separable. For n ≥ 2, a process σ on n quantum nodes A1, …, An is said to be causally separable, iff, for any extension of each node Aj with an additional input system \({{\mathcal{H}}}_{{({A}_{j}^{\prime})}^{{\rm{in}}}}\) to a new node \({\tilde{A}}_{j}\), defined by \({{\mathcal{H}}}_{{\tilde{A}}_{j}^{{\rm{in}}}}:= {{\mathcal{H}}}_{{A}_{j}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{({A}_{j}^{\prime})}^{{\rm{in}}}}\) and \({{\mathcal{H}}}_{{\tilde{A}}_{j}^{{\rm{out}}}}:= {{\mathcal{H}}}_{{A}_{j}^{{\rm{out}}}}\), and any auxiliary quantum state \(\rho \in {\mathcal{L}}({{\mathcal{H}}}_{{({A}_{1}^{\prime})}^{{\rm{in}}}}\otimes \ldots \otimes {{\mathcal{H}}}_{{({A}_{n}^{\prime})}^{{\rm{in}}}})\), the process \(\sigma \otimes \rho\) on the quantum nodes \({\tilde{A}}_{1}\), …, \({\tilde{A}}_{n}\) decomposes as

$$\sigma \otimes \rho \,=\,\sum \limits_{k = 1}^{n}{q}_{k}\,{\sigma }_{(k)}^{\rho },$$

with \({q}_{k}\ge 0\), \({\sum }_{k}{q}_{k}=1\), where for each k, \({\sigma }_{(k)}^{\rho }\) is a process in which there can be no signalling to \({\tilde{A}}_{k}\) from the rest of the nodes, and where for any CP map \({\tau }_{{\tilde{A}}_{k}}\) that can take place at the node \({\tilde{A}}_{k}\), the conditional process on the remaining n − 1 nodes, \({\sigma }_{(k)}^{\rho }{| }_{{\tau }_{{\tilde{A}}_{k}}}\), is itself causally separable.

An important question then concerns the relation between causal nonseparability and cyclicity of causal structure. For a QCM that involves a generic (not necessarily unitary) process, the cyclicity of its directed graph does not in general imply causal nonseparability of the process, even if the QCM is faithful. Consider, for example, the quantum SWITCH with process operator \({\sigma }_{ABFP}^{{\rm{SWITCH}}}\). Tracing out the system \({F}^{{\rm{in}}}\), we obtain a reduced 3-node process that (relabelling C as P) is both faithful and Markov for the graph of Fig. 2b, having the form \({\sigma }_{ABP}={\rho }_{A| BP}\,{\rho }_{B| AP}\,{\rho }_{P}\). This process is causally separable, since it can be understood as describing a situation in which the order between A and B depends in an incoherent manner on the logical value of the control qubit prepared at the initial time. This process thus forms a faithful cyclic QCM and is a canonical example of a process with dynamical causal order (here between nodes A and B).

In fact, one and the same cyclic graph may appear in two distinct faithful QCMs, one involving a causally separable, the other a nonseparable process. An example of this can again be given using the quantum SWITCH. The latter is causally nonseparable and has the graph in Fig. 4 as causal structure, which however also is the causal structure of the classical SWITCH2, which in contrast is causally separable (see subsequent discussion of classical processes). What this points at is that two distinctions have to be kept apart: that between cyclicity and acyclicity on the one hand, and that between classical and quantum causal order on the other hand.

For the case of unitary processes things are, however, much simpler.

Theorem 4 A unitary process is causally nonseparable iff it has a cyclic causal structure.

Proof See Methods.

If a unitary process has a causal structure given by an acyclic graph, then it is a unitary comb54. Hence a unitary process is either a comb or is causally nonseparable—intermediate possibilities, such as dynamical causal order, cannot arise. Note that there is no classical analogue of Theorem 4, i.e., a classical deterministic process is not necessarily causally nonseparable if it has a cyclic causal structure. The classical SWITCH2 is again an example that establishes this claim. (See below for an introduction of classical deterministic processes).

Cyclicity and classical processes

If a process operator is diagonal in a basis that is a product of local bases for the input and output Hilbert spaces at each node, it is equivalent to a classical process3,12,56, where each node X is associated with a pair of classical variables Xin and Xout. Following ref. 35 we call such classical nodes classical split nodes. Classical processes are studied in detail in refs. 12,35,56. (See also refs. 11,22.) This section presents the main ideas, and defines (possibly cyclic) classical split-node causal models (CSM). For the most part the definitions are the obvious classical analogues of those for the quantum case. While cyclic classical causal models have sometimes been studied (see, e.g., refs. 66,67), for example to encompass the possibility of classical feedback loops, they are not of the split-node variety described here, and are not equivalent.

A classical process, defined over classical split-nodes X1, ..., Xn, corresponds to a map \({\kappa }_{{X}_{1}...{X}_{n}}:{X}_{1}^{{\rm{in}}}\times {X}_{1}^{{\rm{out}}}\times \cdots \times {X}_{n}^{{\rm{in}}}\times {X}_{n}^{{\rm{out}}}\to [0,1]\), such that \({\sum }_{{X}_{1}^{{\rm{in}}},{X}_{1}^{{\rm{out}}},...,{X}_{n}^{{\rm{in}}},{X}_{n}^{{\rm{out}}}}\left({\kappa }_{{X}_{1}...{X}_{n}}{\prod }_{i}P({X}_{i}^{{\rm{out}}}| {X}_{i}^{{\rm{in}}})\right)=1\), for any set of classical channels \(\{P({X}_{i}^{{\rm{out}}}| {X}_{i}^{{\rm{in}}})\}\). A local intervention at a node X, with outcome kX, corresponds to a classical instrument \(P({k}_{X},{X}^{{\rm{out}}}| {X}^{{\rm{in}}})\). Given a local intervention at each node, the joint probability distribution over the outcomes is

$$P({k}_{{X}_{1}},...,{k}_{{X}_{n}})= \sum \limits_{{X}_{1}^{{\rm{in}}},{X}_{1}^{{\rm{out}}},...,{X}_{n}^{{\rm{in}}},{X}_{n}^{{\rm{out}}}}\left({\kappa }_{{X}_{1}...{X}_{n}}\mathop{\prod }\limits_{i}P({k}_{{X}_{i}},{X}_{i}^{{\rm{out}}}| {X}_{i}^{{\rm{in}}})\right).$$

A special case of a classical process is a deterministic process \({\kappa }_{{X}_{1}...{X}_{n}}^{f}\), for which \(P({X}_{1}^{{\rm{in}}},...,{X}_{n}^{{\rm{in}}}| {X}_{1}^{{\rm{out}}},...,{X}_{n}^{{\rm{out}}})=\delta (({X}_{1}^{{\rm{in}}},...,{X}_{n}^{{\rm{in}}}),f({X}_{1}^{{\rm{out}}},...,{X}_{n}^{{\rm{out}}}))\), where \(f:{X}_{1}^{{\rm{out}}}\times ....\times {X}_{n}^{{\rm{out}}}\to {X}_{1}^{{\rm{in}}}\times ....\times {X}_{n}^{{\rm{in}}}\) is a function. When f is bijective, we call such a process reversible. It was shown in ref. 12 that the set of classical processes over nodes X1, ..., Xn forms a polytope, and that the deterministic polytope, defined as all convex mixtures of deterministic processes, is in general a strict subset of it. While all classical processes on two nodes are causally separable3, on three or more nodes there exist classical processes, including deterministic classical processes, that are causally nonseparable—the AF process from ref. 56, described above, is an example.
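The normalisation condition on a deterministic process can be checked by brute force for small examples. The following is a minimal sketch, assuming all variables are bits; the two functions `fixed_order` and `two_cycle` are hypothetical toy inputs and are not taken from the references. Since the condition is multilinear in the local channels and every classical channel is a convex mixture of deterministic ones, it suffices to test all tuples of deterministic local maps.

```python
from itertools import product

BITS = (0, 1)

def deterministic_process(f):
    """Return kappa^f, with kappa^f(ins, outs) = 1 if ins == f(outs), else 0."""
    def kappa(ins, outs):
        return 1.0 if tuple(f(outs)) == tuple(ins) else 0.0
    return kappa

def is_valid_process(kappa, n):
    """Check that sum_{ins,outs} kappa * prod_i P(out_i | in_i) equals 1 for
    every tuple of deterministic local channels out_i = g_i(in_i)."""
    local_maps = list(product(BITS, repeat=2))  # g encoded as (g(0), g(1))
    for maps in product(local_maps, repeat=n):
        total = 0.0
        for ins in product(BITS, repeat=n):
            # A deterministic channel fixes the outputs given the inputs.
            outs = tuple(g[x] for g, x in zip(maps, ins))
            total += kappa(ins, outs)
        if abs(total - 1.0) > 1e-12:
            return False
    return True

def fixed_order(outs):
    # Hypothetical example: A^in = 0 and B^in = A^out (A before B).
    return (0, outs[0])

def two_cycle(outs):
    # Hypothetical example: A^in = B^out and B^in = A^out (a two-node loop).
    return (outs[1], outs[0])

print(is_valid_process(deterministic_process(fixed_order), 2))
print(is_valid_process(deterministic_process(two_cycle), 2))
```

In this sketch the fixed-order function passes the check, whereas the two-node loop fails it: with identity channels at both nodes the sum counts two consistent assignments instead of one.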

Definition 9 (CSM—generalized) A CSM is given by:

  1. a causal structure represented by a directed graph G with vertices corresponding to classical split-nodes X1, …, Xn,

  2. for each Xi, a classical channel \(P({X}_{i}^{{\rm{in}}}| Pa{({X}_{i})}^{{\rm{out}}})\), where Pa(Xi) denotes the set of parents of Xi according to G, such that \({\kappa }_{{X}_{1}\cdots {X}_{n}}={\prod }_{i}P({X}_{i}^{{\rm{in}}}| Pa{({X}_{i})}^{{\rm{out}}})\) is a classical process over X1, …, Xn.

This definition generalizes that of ref. 35 to include the case of cyclic graphs, and classical split nodes where the input and output variables have different cardinalities. Reference 35 presents detailed discussion of the relationship between (acyclic) CSMs and standard classical causal models47,48.

In the classical case, causal structure (defined for unitary processes in the quantum case) can be defined for deterministic processes.

Definition 10 (Causal structure of a deterministic classical process) Given a deterministic process \({\kappa }_{{X}_{1}...{X}_{n}}^{f}\), the causal structure of the process is the directed graph with vertices X1,...,Xn and an arrow Xi → Xj, whenever \({X}_{j}^{{\rm{in}}}\) depends on \({X}_{i}^{{\rm{out}}}\) through the function f.
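For bit-valued variables, the dependence relation in Definition 10 can be read off by flipping one output at a time. The sketch below assumes all variables are bits and uses a hypothetical three-node function `three_bit_loop`, chosen only so that every input depends on the outputs of the other two nodes; it is an illustration, not a statement about any particular process from the references.

```python
from itertools import product

BITS = (0, 1)

def causal_structure(f, n):
    """Return the set of arrows (i, j), meaning X_i -> X_j, such that
    X_j^in changes for some assignment when only X_i^out is flipped."""
    edges = set()
    for outs in product(BITS, repeat=n):
        base = f(outs)
        for i in range(n):
            flipped = list(outs)
            flipped[i] ^= 1
            changed = f(tuple(flipped))
            for j in range(n):
                if base[j] != changed[j]:
                    edges.add((i, j))
    return edges

def three_bit_loop(outs):
    # Hypothetical function: each input is (NOT one neighbour) AND (other neighbour).
    a, b, c = outs
    return ((1 - b) & c, (1 - c) & a, (1 - a) & b)

edges = causal_structure(three_bit_loop, 3)
print(sorted(edges))
```

For this toy function the resulting graph contains an arrow between every ordered pair of distinct nodes and no self-loops, so it is cyclic.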

Definition 11 (Classical Markov condition—generalized) A process \({\kappa }_{{X}_{1}...{X}_{n}}\) is called Markov for a directed graph G with classical split-nodes X1, …, Xn as its vertices iff it admits a factorization of the form \({\kappa }_{{X}_{1}...{X}_{n}}=\mathop{\prod }\nolimits_{i = 1}^{n}P({X}_{i}^{{\rm{in}}}| Pa{({X}_{i})}^{{\rm{out}}})\), where Pa(Xi) denotes the set of parents of Xi according to G.
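Whether a given classical process is Markov for a given graph can be tested numerically in small cases. The sketch below assumes bit-valued variables and that `kappa` is already a valid classical process, stored as a dictionary keyed by `(ins, outs)`. Under that assumption, if a factorization of the required form exists, its factors coincide with the marginals of κ obtained by summing over all other inputs, so it suffices to check that these marginals depend only on the parents' outputs and multiply to κ. The processes `noisy` and `shared` are hypothetical examples.

```python
from itertools import product

BITS = (0, 1)

def marginal(kappa, n, i):
    """m[(x, outs)] = sum of kappa over all inputs except X_i^in, with X_i^in = x."""
    m = {}
    for (ins, outs), val in kappa.items():
        key = (ins[i], outs)
        m[key] = m.get(key, 0.0) + val
    return m

def is_markov(kappa, n, parents, tol=1e-12):
    margs = [marginal(kappa, n, i) for i in range(n)]
    # (a) The i-th marginal may depend only on outputs of the parents of X_i.
    for i, m in enumerate(margs):
        for (x, outs), val in m.items():
            for j in range(n):
                if j not in parents[i]:
                    flipped = outs[:j] + (1 - outs[j],) + outs[j + 1:]
                    if abs(val - m[(x, flipped)]) > tol:
                        return False
    # (b) kappa must equal the product of these marginals.
    for (ins, outs), val in kappa.items():
        factorised = 1.0
        for i in range(n):
            factorised *= margs[i][(ins[i], outs)]
        if abs(val - factorised) > tol:
            return False
    return True

def build(n, weight):
    return {(ins, outs): weight(ins, outs)
            for ins in product(BITS, repeat=n)
            for outs in product(BITS, repeat=n)}

# Hypothetical examples on two nodes (index 0 = A, index 1 = B).
# noisy: A^in uniform, B^in equals A^out with probability 0.9.
noisy = build(2, lambda ins, outs: 0.5 * (0.9 if ins[1] == outs[0] else 0.1))
# shared: perfectly correlated inputs, independent of all outputs.
shared = build(2, lambda ins, outs: 0.5 if ins[0] == ins[1] else 0.0)

print(is_markov(noisy, 2, {0: set(), 1: {0}}))
print(is_markov(shared, 2, {0: {0, 1}, 1: {0, 1}}))
```

In this sketch `noisy` is Markov for the graph A → B, while `shared` is not Markov for any graph on the two nodes, since its correlated inputs do not factorize.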

The following is immediate.

Proposition 2 Every deterministic classical process is Markov for its causal structure.

In the case of general—i.e., not necessarily deterministic—classical processes, an account of their relationship to causal structure can be given that again mirrors the quantum case. Let us adopt the provisional approach that causal structure always inheres in deterministic reversible processes (where reversibility here may not be essential, but is assumed to provide a closer analogue to the quantum case in which unitarity is assumed). Then compatibility with a given directed graph can be defined in terms of extension to a reversible deterministic process with latent local noise variables.

Definition 12 (Reversible extendibility) A process \({\kappa }_{{X}_{1}...{X}_{n}}\) is reversibly extendible iff there exists a reversible deterministic process \({\kappa }_{{X}_{1}\cdots {X}_{n}F\lambda }^{f}\) with an additional leaf node F and root node λ, such that \({\kappa }_{{X}_{1}\cdots {X}_{n}}={\sum }_{{F}^{{\rm{in}}},{\lambda }^{{\rm{out}}}}[{\kappa }_{{X}_{1}\cdots {X}_{n}F\lambda }^{f}P({\lambda }^{{\rm{out}}})]\) for some P(λout).
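As a simple illustration of Definition 12, consider a noisy bit channel from a node A to a node B. The sketch below is a hypothetical example under simplifying assumptions: Ain, Bout, λin and Fout are taken to be trivial (and are omitted), the remaining variables are bits, and the function `f` is chosen by hand. It checks that `f` is bijective and that marginalizing over Fin and λout returns a bit-flip channel.

```python
from itertools import product

BITS = (0, 1)

def f(a_out, lam_out):
    """Hypothetical reversible extension: returns (B^in, F^in)."""
    return (a_out ^ lam_out, lam_out)

def is_bijective(fn):
    images = {fn(a, lam) for a, lam in product(BITS, repeat=2)}
    return len(images) == 4  # four distinct images of four arguments

def marginal_process(p_flip):
    """kappa(B^in, A^out) = sum_{F^in, lam^out} kappa^f * P(lam^out)."""
    p_lam = {0: 1.0 - p_flip, 1: p_flip}
    kappa = {}
    for b_in, a_out in product(BITS, repeat=2):
        total = 0.0
        for f_in, lam in product(BITS, repeat=2):
            indicator = 1.0 if f(a_out, lam) == (b_in, f_in) else 0.0
            total += indicator * p_lam[lam]
        kappa[(b_in, a_out)] = total
    return kappa

kappa = marginal_process(0.1)
print(is_bijective(f), kappa)
```

With flip probability 0.1, the marginal process assigns weight 0.9 to Bin = Aout and 0.1 to Bin ≠ Aout, so this noisy channel is reversibly extendible in the sense of the definition, with the noise carried by the root node λ.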

Definition 13 (Compatibility with a directed graph) A process \({\kappa }_{{X}_{1}\cdots {X}_{n}}\) is compatible with a directed graph G with nodes X1,...,Xn, iff \({\kappa }_{{X}_{1}\cdots {X}_{n}}\) is reversibly extendible to a deterministic process \({\kappa }_{{X}_{1}\cdots {X}_{n}F{\lambda }_{1}...{\lambda }_{n}}^{f}\), with an additional leaf node F, root nodes λi, and a product distribution \({\prod }_{i}P({\lambda }_{i}^{{\rm{out}}})\), such that through f, \({X}_{i}^{{\rm{in}}}\) depends neither on \({\lambda }_{j}^{{\rm{out}}}\) for j ≠ i nor on \({X}_{j}^{{\rm{out}}}\) for \({X}_{j}\notin Pa({X}_{i})\) (with Pa(Xi) referring to G).

With Proposition 2, the following analogue of Thm. 2 is straightforward.

Theorem 5 If a classical process \({\kappa }_{{X}_{1}\cdots {X}_{n}}\) is compatible with a directed graph G, then it is also Markov for G.

As in the quantum case, we leave open whether the converse to Theorem 5 holds.

Hypothesis 2 If a process \({\kappa }_{{X}_{1}...{X}_{n}}\) is Markov for a directed graph G, then it is compatible with G.

We remark only that Hypothesis 2 is not obviously implied by its quantum counterpart, Hypothesis 1. First, it is not known whether reversible extendibility implies unitary extendibility for a classical process when seen as a special case of a quantum process. Second, even if this is the case, it is still conceivable that while a classical process that is Markov for a given graph may admit unitary extensions with the required no-influence properties when viewed as a quantum process, no such extension may be equivalent to a deterministic classical process for the given preferred basis.

We conclude with the following observation.

Theorem 6 Given a set of classical split nodes X1, ..., Xn, the set of reversibly extendible classical processes on X1, ..., Xn coincides with the deterministic polytope.

Proof See Methods.

If Hypothesis 2 holds, then Theorem 6 implies in particular that the process defined by a CSM must always belong to the deterministic polytope. An example of a classical process \({\kappa }_{{X}_{1}\cdots {X}_{n}}\) outside of the deterministic polytope is described in ref. 12 (and denoted \({\hat{E}}_{ex1}\) therein). It is not too hard to show that this process is not Markov for any directed graph, hence cannot be the process defined by a CSM, in keeping with Hypothesis 2.


This work presented an extension of the framework of quantum causal models from refs. 34,35 to include cyclic causal structures. We showed that the quantum SWITCH, and a process that violates causal inequalities, found by Araújo and Feix and described by Baumeler and Wolf, can be seen as the processes defined by cyclic quantum causal models. We also gave decompositions of any SWITCH-type process and of the unitary extension of the aforementioned process by Araújo and Feix, enabling diagrammatic representations that make the internal causal structures evident. Applications of these results included proofs that any unitarily extendible bipartite process is causally separable, and that any unitary process is cyclic if and only if it is causally nonseparable.

What technically comes as the natural generalization of the framework of acyclic quantum causal models is conceptually a substantial step—allowing causal structure to be cyclic. Taking this extended causal model perspective seriously then offers an alternative view of certain processes: a process that is incompatible with definite causal order may now also be seen to have a well-defined cyclic causal structure. This is to say, to admit of a partial order is not an essential property of being causal anymore. While processes that violate a causal inequality were previously referred to as noncausal processes, suggesting they cannot be understood causally, at least some of them then do admit a causal understanding.

Note that as far as acyclic causal structures are concerned there also is the earlier framework of QCMs by Costa and Shrapnel from ref. 45, which is related to, in fact strictly contained in that of refs. 34,35, which the current work extends. The Markov condition of ref. 45 is a special case of Definition 2, restricted to DAGs for which each node’s output space factorizes into as many subsystems as the node has children, with each subsystem only influencing the corresponding child. With this idea of a system per arrow, the process operator \({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\) becomes a tensor product. As a consequence—for essentially the same reason as why Proposition 1 holds—the notion of a QCM from ref. 45 does not admit a nontrivial extension to cyclic directed graphs. The extension of faithful QCMs to cyclic graphs relies on the particular nature of our Markov condition that allows the nontrivial action of pairwise commuting operators \({\rho }_{{A}_{i}| Pa({A}_{i})}\) to overlap on non-factorizing output spaces.

Although we do not provide the details, we note a further application of the generalized framework: it allows an extended version of the causal discovery algorithm sketched in ref. 35 (inspired in turn by the first of its kind in ref. 50). While the version in ref. 35 takes a process operator as input and outputs DAGs as candidate causal explanations, where possible at all, the extended version can discover and output cyclic causal structures. The basic steps of the algorithm in ref. 35 largely remain the same, but, for instance, the algorithm no longer halts when encountering a cyclic graph Gσ that encodes the direct signalling relations between pairs of nodes of the given process σ. Instead, Markovianity for such a cyclic Gσ can still be checked to establish whether Gσ is a plausible causal explanation.

One of the main questions left open is the validity of our hypothesis that Markovianity implies compatibility for cyclic graphs, which would generalize one of the main results established for the acyclic case in ref. 35. The validity of this hypothesis has consequences, which we spell out as follows.

Reference 51, in motivating the study of unitary extendibility of processes, includes the suggestion that unitary extendibility should be regarded as a necessary condition for a process to be realizable in nature. Here, the meaning of ‘realizable’ is a little vague, but might be taken, for example, to include exotic scenarios involving gravity as well as the time-delocalized sense discussed above in which some processes have been realized in the laboratory. (It does not include realization via postselection, since it is known that all processes can be realized under a suitable postselection10,13,27,68.) The suggestion would hold if all processes, once sufficient systems are included, are unitary at the most fundamental level.

Alternatively, under the assumption that the process operator framework provides the most general description of the possible correlation between quantum systems, in non-postselected scenarios, one may speculate that a necessary condition for a process to be realizable in nature is that it can arise from a QCM. Here, ‘arise’ means that there is a QCM with process \(\sigma ^{\prime}\) such that σ can be obtained from \(\sigma ^{\prime}\) by inserting channels at some of the nodes of \(\sigma ^{\prime}\) and marginalizing over them. The idea is that any correlations described by such a process admit a causal explanation, albeit one that may involve cycles. On the other hand, any process that cannot arise from a QCM in this manner describes correlations that are not amenable to an understanding in causal terms.

The connection with unitary extendibility is that any process that is unitarily extendible has the property that it can arise from a QCM. Furthermore, if Hypothesis 1 holds, then any process that is not unitarily extendible cannot arise from a QCM. Hence if Hypothesis 1 holds, the speculation above coincides with the suggestion of ref. 51.

If Hypothesis 1 fails, there is a peculiar class of cyclic quantum causal models, in which the process is Markov for the graph but not compatible with the graph. There then are two logically conceivable options: one may insist on the notion of compatibility as the essential concept for giving causal explanations, turning the Markov condition into a necessary but insufficient condition; alternatively, one could insist on the Markov condition as the essential concept for giving causal explanations, turning the current notion of compatibility into a sufficient but not necessary condition. We leave open the question whether any meaning can be given to the arrows of the graph in this case, given that there is no suitable unitary extension to define causal relations, and whether such processes might be realizable or not.

Beyond establishing the hypothesis, future work might study the extent to which other core results of the framework of quantum causal models in the acyclic case, such as the d-separation theorem35, can be generalized in an appropriate way to the cyclic case, as has been done for the classical framework (see, e.g., ref. 67).

Finally, one of the most promising avenues for future work is the general idea behind the above causal decompositions of our example processes together with Theorem 3: to derive further causal decompositions of unitary transformations U, as started in ref. 63, and then study the interplay between the discovered algebraic structure and the condition that U defines a valid unitary process when identifying in- and output spaces of U as the out- and input spaces of quantum nodes. We expect this mathematical tool to lead to insights into which unitarily extendible processes are causally nonseparable and how the cyclicity is distributed, mathematically speaking, across the process—with possible hints for the process’ physical realizability.


Characterisation of process operators

In order to state necessary and sufficient conditions for an operator to be a valid process operator, the following will be useful. Let \({\{{\eta }_{X}^{l}\}}_{l = 0}^{{d}_{X}^{2}-1}\) denote a Hilbert-Schmidt (HS) basis for \({\mathcal{L}}({{\mathcal{H}}}_{X})\), i.e., a set of operators such that they are orthonormal with respect to the HS inner product and, in addition, traceless for all \(l=1,...,{d}_{X}^{2}-1\), while \({\eta }_{X}^{0}=(1/\sqrt{{d}_{X}}){{\mathbb{1}}}_{X}\). Any \(\sigma \in {\mathcal{L}}({{\mathcal{H}}}_{{A}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{A}^{{\rm{out}}}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{in}}}}\otimes {{\mathcal{H}}}_{{B}^{{\rm{out}}}})\) can be expanded in a HS basis as \(\sigma ={\sum }_{{l}_{1},{l}_{2},{l}_{3},{l}_{4}}\,{\alpha }_{{l}_{1}{l}_{2}{l}_{3}{l}_{4}}\,{\eta }_{{A}^{{\rm{in}}}}^{{l}_{1}}\otimes {\eta }_{{A}^{{\rm{out}}}}^{{l}_{2}}\otimes {\eta }_{{B}^{{\rm{in}}}}^{{l}_{3}}\otimes {\eta }_{{B}^{{\rm{out}}}}^{{l}_{4}}\). A term of type Ain in the expansion is a summand with non-trivial action only on Ain, i.e., l1 ≠ 0 and l2 = l3 = l4 = 0. Similarly for types AinBout etc.

It was shown in ref. 3 that σ being a bipartite process operator is equivalent to σ ≥ 0, \({\rm{Tr}}[\sigma ]={d}_{{A}^{{\rm{out}}}}{d}_{{B}^{{\rm{out}}}}\) and that in a HS basis expansion, in addition to a term, which is proportional to the identity operator on all four spaces, only the coefficients of terms of the types Ain, Bin, AinBin, AinBout, AoutBin, AinAoutBin and AinBinBout, may be non-vanishing. These conditions were generalized to n numbers of nodes in ref. 7 and can easily be stated as (1) σ ≥ 0, (2) \({\rm{Tr}}[\sigma ]=\mathop{\prod }\nolimits_{i = 1}^{n}{d}_{{A}_{i}^{{\rm{out}}}}\) and (3) that in a HS basis expansion the only non-vanishing terms, apart from an overall identity operator, are of a type such that there must be at least one node, say Ai, on whose out-space, \({A}_{i}^{{\rm{out}}}\), the action is trivial, but on whose in-space, \({A}_{i}^{{\rm{in}}}\), the action is non-trivial. Equivalent conditions were presented in ref. 6 where the projector onto the linear subspace of process operators was defined explicitly, giving a basis-independent characterization.

Proof of Proposition 1 Suppose a bipartite cyclic QCM is given by the (unique) cyclic graph G with two nodes A and B from Fig. 2a and a process \({\sigma }_{AB}={\rho }_{A| B}\,{\rho }_{B| A}\), Markov for G. It follows that \({\sigma }_{AB}={\rho }_{B| A}\otimes {\rho }_{A| B}\), as both factors act on distinct Hilbert spaces. Now suppose that this is a faithful QCM, i.e., both channels \({\rho }_{A| B}\) and \({\rho }_{B| A}\) are signalling channels. One way to see that this contradicts the assumption that σAB is a valid process is by analyzing the non-vanishing types of terms in an expansion of σAB relative to a HS product basis (see above). If signalling from Bout to Ain is possible in \({\rho }_{A| B}\), then an expansion of just \({\rho }_{A| B}\) has to contain a non-vanishing term of type AinBout. Similarly, if signalling from Aout to Bin is possible in \({\rho }_{B| A}\), then an expansion of \({\rho }_{B| A}\) has to contain a non-vanishing term of type BinAout. Consequently, σAB has to contain a non-vanishing term of type AinBoutBinAout, which is forbidden for a process operator3.
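The term-type argument can be made concrete in the classical special case, where operators are diagonal in the computational basis and expand in tensor products of the identity and Pauli Z. The sketch below is an illustration under stated assumptions: all systems are bits, the basis used is the unnormalized one, the index convention (0 = Ain, 1 = Aout, 2 = Bin, 3 = Bout) is ours, and the channel values are merely illustrative. It lists all non-vanishing terms whose type is not permitted by the bipartite conditions.

```python
from itertools import combinations, product

BITS = (0, 1)
# Index convention (an assumption of this sketch): 0=A_in, 1=A_out, 2=B_in, 3=B_out.

def coefficient(kappa, subset):
    """Coefficient of the term acting as Z exactly on the systems in `subset`."""
    total = 0.0
    for bits in product(BITS, repeat=4):
        sign = (-1) ** sum(bits[k] for k in subset)
        total += sign * kappa(*bits)
    return total / 16.0

def is_allowed_type(subset):
    """Allowed iff some node has non-trivial action on its input but not its output."""
    a_ok = 0 in subset and 1 not in subset
    b_ok = 2 in subset and 3 not in subset
    return a_ok or b_ok

def forbidden_terms(kappa, tol=1e-12):
    found = []
    for size in range(1, 5):
        for subset in combinations(range(4), size):
            if not is_allowed_type(subset) and abs(coefficient(kappa, subset)) > tol:
                found.append(subset)
    return found

P_A0 = {0: 0.4, 1: 0.8}   # illustrative P(A_in = 0 | B_out)
P_B0 = {0: 0.5, 1: 0.25}  # illustrative P(B_in = 0 | A_out)

def chan(p0, x_in, y_out):
    return p0[y_out] if x_in == 0 else 1.0 - p0[y_out]

def two_way(a_in, a_out, b_in, b_out):
    # Product of two signalling channels, B -> A and A -> B.
    return chan(P_A0, a_in, b_out) * chan(P_B0, b_in, a_out)

def one_way(a_in, a_out, b_in, b_out):
    # Signalling only from B to A; B_in is uniformly random.
    return chan(P_A0, a_in, b_out) * 0.5

print(forbidden_terms(two_way), forbidden_terms(one_way))
```

For the product of two signalling channels the only forbidden term found is the one of type AinAoutBinBout, as in the proof; for the one-way example no forbidden term appears.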

Product of commuting operators not necessarily a process operator

As established by Proposition 1, not all cyclic graphs support a faithful cyclic QCM. Here we show that, given a cyclic graph G that does support a faithful cyclic QCM, it is not true that any product of commuting operators \({\prod }_{i}{\rho }_{{A}_{i}| Pa({A}_{i})}\), with parental sets as in G, constitutes a process operator. Consider for instance the graph G in Fig. 2b (and see the discussion below Definition 8 for an example of a faithful cyclic QCM over G). Letting the three nodes A, B and C be classical split nodes, with classical bits Ain, Aout, Bin, Bout, Cin and Cout, define classical channels as in Eqs. (29)–(30). It is easy to see that the signalling relations through the channels \(P({A}^{{\rm{in}}}| {B}^{{\rm{out}}},{C}^{{\rm{out}}})\) and \(P({B}^{{\rm{in}}}| {A}^{{\rm{out}}},{C}^{{\rm{out}}})\) are indeed as in Fig. 2b. At the same time, for any choice of probability distribution P(Cin), the product \(P({A}^{{\rm{in}}}| {B}^{{\rm{out}}},{C}^{{\rm{out}}})P({B}^{{\rm{in}}}| {A}^{{\rm{out}}},{C}^{{\rm{out}}})P({C}^{{\rm{in}}})\) cannot be a classical process: consider an intervention at C which fixes Cout to be 0; then \(P({A}^{{\rm{in}}}| {B}^{{\rm{out}}},0)P({B}^{{\rm{in}}}| {A}^{{\rm{out}}},0)\) is still a product of two signalling classical channels, which (seeing them as special cases of quantum channels) was already established in the proof of Proposition 1 to be in contradiction with being a process. This establishes the claim.

$$P({A}^{{\rm{in}}}| {B}^{{\rm{out}}},{C}^{{\rm{out}}}):= \left\{\begin{array}{llll}P(0| 0,0)\,=\,0.4,&P(0| 0,1)\,=\,0.3,&P(0| 1,0)\,=\,0.8,&P(0| 1,1)\,=\,0.3,\\ P(1| 0,0)\,=\,0.6,&P(1| 0,1)\,=\,0.7,&P(1| 1,0)\,=\,0.2,&P(1| 1,1)\,=\,0.7.\end{array}\right.$$
$$P({B}^{{\rm{in}}}| {A}^{{\rm{out}}},{C}^{{\rm{out}}}):= \left\{\begin{array}{llll}P(0| 0,0)\,=\,0.5,&P(0| 0,1)\,=\,0.3,&P(0| 1,0)\,=\,0.25,&P(0| 1,1)\,=\,0.1,\\ P(1| 0,0)\,=\,0.5,&P(1| 0,1)\,=\,0.7,&P(1| 1,0)\,=\,0.75,&P(1| 1,1)\,=\,0.9.\end{array}\right.$$
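The failure of the product of these channels to be a classical process can also be seen directly from the normalization condition. The following sketch uses the numerical values of Eqs. (29)–(30); the particular interventions (identity channels at A and B, a channel at C that fixes Cout) and the uniform P(Cin) are our own illustrative choices.

```python
from itertools import product

BITS = (0, 1)

# P(A_in = 0 | B_out, C_out) and P(B_in = 0 | A_out, C_out), from Eqs. (29)-(30).
P_A0 = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.3}
P_B0 = {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.25, (1, 1): 0.1}

def p_a(a_in, b_out, c_out):
    p0 = P_A0[(b_out, c_out)]
    return p0 if a_in == 0 else 1.0 - p0

def p_b(b_in, a_out, c_out):
    p0 = P_B0[(a_out, c_out)]
    return p0 if b_in == 0 else 1.0 - p0

def normalisation(p_c_in, chan_a, chan_b, c_out_fixed):
    """Sum of kappa times the local channels, for deterministic channels
    a_out = chan_a[a_in], b_out = chan_b[b_in] and C_out fixed by the intervention."""
    total = 0.0
    for a_in, b_in, c_in in product(BITS, repeat=3):
        a_out, b_out = chan_a[a_in], chan_b[b_in]
        total += (p_a(a_in, b_out, c_out_fixed)
                  * p_b(b_in, a_out, c_out_fixed)
                  * p_c_in[c_in])
    return total

identity = (0, 1)
uniform = {0: 0.5, 1: 0.5}
print(normalisation(uniform, identity, identity, 0))
print(normalisation(uniform, identity, identity, 1))
```

With Cout fixed to 0 and identity channels at A and B, the sum evaluates to 0.9 rather than 1, so the product is not a valid classical process. With Cout fixed to 1 the channel into Ain no longer depends on Bout, and the same sum equals 1.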

Proof of Theorem 3

Suppose the bipartite quantum process operator σAB is unitarily extendible. Consider an arbitrary unitary extension of it, \({\sigma }_{ABFP}={\rho }_{ABF| ABP}^{{\mathcal{U}}}\). From Eq. (14) it follows that the reduced process obtained by tracing out Fin has the form

$${\sigma }_{ABP}={{\rm{Tr}}}_{{F}^{{\rm{in}}}}[{\rho }_{ABF| ABP}^{{\mathcal{U}}}]=\sum _{i\in I}{\rho }_{A| B{P}_{i}^{L}}\otimes {\rho }_{B| {P}_{i}^{R}A}\,,$$

for the decomposition \({{\mathcal{H}}}_{{P}^{{\rm{out}}}}={\bigoplus }_{i\in I}{{\mathcal{H}}}_{{P}_{i}^{L}}\otimes {{\mathcal{H}}}_{{P}_{i}^{R}}\), identified by S, where \({\rho }_{A| B{P}_{i}^{L}}={{\rm{Tr}}}_{{F}_{i}^{L}}[{\rho }_{A{F}_{i}^{L}| B{P}_{i}^{L}}^{{V}_{i}}]\) and \({\rho }_{B| {P}_{i}^{R}A}={{\rm{Tr}}}_{{F}_{i}^{R}}[{\rho }_{{F}_{i}^{R}B| {P}_{i}^{R}A}^{{W}_{i}}]\), and where \({\rho }_{A| B{P}_{i}^{L}}\otimes {\rho }_{B| {P}_{i}^{R}A}\) is taken as an operator on the whole space, acting as zero map on all but the ith subspace. Note that from σABP being a process operator it follows that feeding in any \({\tau }_{P}\in {\mathcal{L}}({{\mathcal{H}}}_{{P}^{{\rm{out}}}}^{* })\) gives a quantum process operator on the nodes A and B. Let i ∈ I be some fixed index and suppose that, through the channel \({\rho }_{A| B{P}_{i}^{L}}\), system Bout can signal to Ain and similarly, through the channel \({\rho }_{B| A{P}_{i}^{R}}\), system Aout can signal to Bin. Then there exists an appropriate state τP, which has only support on the ith subspace, and which is of a product form \({\gamma }_{{P}_{i}^{L}}\otimes {\phi }_{{P}_{i}^{R}}\), such that in

$${{\rm{Tr}}}_{{({P}_{i}^{L})}^{* }}[{\rho }_{A| B{P}_{i}^{L}}\,{\gamma }_{{P}_{i}^{L}}]\,\,\otimes \,\,{{\rm{Tr}}}_{{({P}_{i}^{R})}^{* }}[{\rho }_{B| A{P}_{i}^{R}}\,{\phi }_{{P}_{i}^{R}}]\,,$$

both the marginal channel on the left is signalling from Bout to Ain and the one on the right is signalling from Aout to Bin. Since the expression in Eq. (32) has to give a process operator over A and B, this yields a contradiction due to Prop. 1. Hence, for each i at most one of the channels \({\rho }_{A| B{P}_{i}^{L}}\) and \({\rho }_{B| A{P}_{i}^{R}}\) allows signalling from Bout to Ain or from Aout to Bin, respectively. By assumption there exists an appropriate \({\tau }_{P}\in {\mathcal{L}}({{\mathcal{H}}}_{{P}^{{\rm{out}}}}^{* })\) such that

$${\sigma }_{AB}=\sum _{i}{{\rm{Tr}}}_{{({P}^{{\rm{out}}})}^{* }}\left[({\rho }_{A| B{P}_{i}^{L}}\otimes {\rho }_{B| {P}_{i}^{R}A})\,{\tau }_{P}\right]\,.$$

By the above analysis, it also follows that each summand in Eq. (33) has to be a process operator up to normalization. Since they sum up to a process operator, the inverses of the normalization constants have to form a probability distribution and one can therefore write \({\sigma }_{AB}={\sum }_{i}{p}_{i}\,{\sigma }_{AB}^{(i)}\), where each \({\sigma }_{AB}^{(i)}\) is a process operator with at most A signalling to B or vice versa. This is the form of a bipartite causally separable process operator.

Note further that if \({\rho }_{A| B{P}_{i}^{L}}\) is non-signalling from Bout to Ain, then in Vi there is no influence from Bout to Ain, and similarly, if \({\rho }_{B| {P}_{i}^{R}A}\) is non-signalling from Aout to Bin, then in Wi there is no influence from Aout to Bin. Therefore, the above constraints mean that each term \({V}_{i}\otimes {W}_{i}\) in Eq. (14) corresponds to a process over nodes including A and B that allows signalling in at most one direction between A and B. The latter always admits an implementation as a unitary circuit fragment with nodes A and B in a fixed order54. Since the full unitary U of the unitary extension is a direct sum of such fixed-order unitary processes taking place in the different orthogonal subspaces, and every operation at the nodes A and B can be dilated to a unitary, the full unitary process \({\sigma }_{ABFP}={\rho }_{ABF| ABP}^{U}\) can be realized by coherently conditioning which of the corresponding fixed-order unitary circuits takes place on the logical value of some control n-level quantum system, where n is the number of different subspaces. Note that since the systems involved in the fixed-order circuits may have different dimensions, this implementation in practice may require bringing in different systems depending on the control variable i, but this can always be seen as part of a process on a larger system of a fixed dimension. Moreover, the fixed-order processes in the different orthogonal subspaces can be grouped into two sets: one in which A is before B and another one in which B is before A. This allows embedding the process into another one where one of two possible circuits (in which A and B occur in different orders) is applied in a coherently controlled fashion based on the logical value of a control qubit, similarly to the quantum SWITCH. This yields another possible unitary extension \({\sigma }_{AB\tilde{F}\tilde{P}}\) of the original bipartite process, where \({\tilde{F}}^{{\rm{in}}}\) and \({\tilde{P}}^{{\rm{out}}}\) would contain Fin and Pout, respectively, as subspaces. The originally assumed unitary extension σABFP can then be seen to take place effectively as part of \({\sigma }_{AB\tilde{F}\tilde{P}}\).

Proof of Theorem 4

The below proof of Theorem 4 will use the following two concepts. First, generalizing the notion of a process being unitary, a process is called isometric if its induced channel from the output systems of all nodes to the input systems of all nodes arises from an isometry. Second, a quantum comb, as defined in ref. 54 (provided first input and last output system are trivial), is a special kind of quantum process: a process \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) over n quantum nodes for the given total order of its nodes A1, …, An is a quantum comb (an (n + 1)-comb) iff

$$\begin{array}{l}\,\forall l=1,\ldots ,n-1\,{{\rm{Tr}}}_{{A}_{l+1}\ldots {A}_{n}}[{\sigma }_{{A}_{1}\ldots {A}_{n}}]=\\ \frac{1}{{d}_{{A}_{l}^{{\rm{out}}}}}{{\rm{Tr}}}_{{({A}_{l}^{{\rm{out}}})}^{* }}\left[{{\rm{Tr}}}_{{A}_{l+1}\ldots {A}_{n}}[{\sigma }_{{A}_{1}\ldots {A}_{n}}]\right]\otimes {{\mathbb{1}}}_{{({A}_{l}^{{\rm{out}}})}^{* }}\,\end{array}$$
$$\wedge \,{\sigma }_{{A}_{1}\ldots {A}_{n}}=\frac{1}{{d}_{{A}_{n}^{{\rm{out}}}}}{{\rm{Tr}}}_{{({A}_{n}^{{\rm{out}}})}^{* }}[{\sigma }_{{A}_{1}\ldots {A}_{n}}]\otimes {{\mathbb{1}}}_{{({A}_{n}^{{\rm{out}}})}^{* }}.$$

Proof of Theorem 4 Let \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) be a unitary process. The following establishes a statement equivalent to Theorem 4, namely that acyclicity of its causal structure is equivalent to \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) being causally separable.

First, suppose \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) has an acyclic causal structure. There then exists a total order of the quantum nodes A1, …, An (appropriately relabelled) such that \({A}_{j}\nrightarrow {A}_{i}\) for all j ≥ i (see Def. 4). This implies that the conditions in Eqs. (34)–(35) are satisfied (note that \({d}_{{A}_{n}^{{\rm{out}}}}=1={d}_{{A}_{1}^{{\rm{in}}}}\)). Hence, \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) is a quantum comb54. Such a process is a special case of a causally separable process since in a quantum comb there can be no signalling from {Aj+1, …, An} to {A1, …, Aj} for any j = 1, …, n − 1, and this remains true under extending the process with arbitrary shared input ancillary states.

For the converse direction, suppose the unitary process \({\sigma }_{{A}_{1}\ldots {A}_{n}}\) is causally separable. In order to show that it then has an acyclic causal structure we will prove that it is a quantum comb. In fact we will prove the following more general statement concerning isometric processes, which gives the claim as a special case.

Lemma 1 Every causally separable isometric process is a quantum comb.

Proof of Lemma 1 The main idea of the following proof is the observation that the process operator of an isometric process is proportional to a rank-1 projector and hence cannot be written as a nontrivial convex mixture of different positive semi-definite operators. The proof proceeds by induction.

An isometric process over one single node is a 2-comb. Assume that all causally separable isometric processes on n nodes are quantum combs. Let \({\sigma }_{{A}_{1}\ldots {A}_{n+1}}\) be an isometric process over n + 1 nodes, which is causally separable. Let us extend it by adding auxiliary input systems for all n + 1 nodes with the following pure state shared among them:

$$\left|\Psi \right\rangle \,=\,\bigotimes _{i = 1,j = 2,i\,{<}\,j}^{i = n,j = n+1}\,{\left|{\phi }^{+}\right\rangle }_{ij}\,,$$

where each \({\left|{\phi }^{+}\right\rangle }_{ij}=\frac{1}{\sqrt{n!}}\mathop{\sum }\nolimits_{l = 1}^{n!}\left|l\right\rangle \left|l\right\rangle\) is a maximally entangled state, shared between node Ai and node Aj. Thus, \(\left|\Psi \right\rangle\) is a tensor product of \(\frac{1}{2}n(n+1)\) maximally entangled bipartite states, such that every pair of nodes indexed by (i, j) shares one such state of Schmidt rank n!. Using the notation of Definition 8, \(\tilde{\sigma }:= \sigma \otimes \left|\Psi \right\rangle \left\langle \Psi \right|\) is an extended process over the extended nodes \({\tilde{A}}_{1},\ldots ,{\tilde{A}}_{n+1}\), with \(\left|\Psi \right\rangle \in {\bigotimes }_{i = 1}^{n+1}{{\mathcal{H}}}_{{({A}_{i}^{\prime})}^{{\rm{in}}}}\), where each \({{\mathcal{H}}}_{{({A}_{i}^{\prime})}^{{\rm{in}}}}\) is an n-fold tensor product of (n!)-dimensional systems.

By assumption, \(\tilde{\sigma }\) is causally separable, too, while it also is proportional to a rank-1 projector. In the decomposition as in Eq. (27), implied by causal separability, there therefore is only one summand. Hence, there exists one node, let this be \({\tilde{A}}_{1}\) (for an appropriate relabelling), such that \({\tilde{A}}_{2},\ldots ,{\tilde{A}}_{n+1}\) cannot signal to \({\tilde{A}}_{1}\) and for all CP maps \({\tau }_{{\tilde{A}}_{1}}\) at that node the conditional process \(\tilde{\sigma }{| }_{{\tau }_{{\tilde{A}}_{1}}}\) is causally separable. Now consider a CP map such that \({\tau }_{{\tilde{A}}_{1}}=\left|\tau \right\rangle {\left\langle \tau \right|}_{{\tilde{A}}_{1}}\) itself is a rank-1 projector. The process operator \(\tilde{\sigma }{| }_{{\tau }_{{\tilde{A}}_{1}}}\) then still is proportional to a rank-1 projector and hence represents an isometric process on the remaining n nodes \({\tilde{A}}_{2},\ldots ,{\tilde{A}}_{n+1}\). As argued above it also is causally separable. By the induction hypothesis, such an isometric, causally separable process \(\tilde{\sigma }{| }_{{\tau }_{{\tilde{A}}_{1}}}\) on n nodes is then a quantum comb.

Notice first that if there is no signalling to \({\tilde{A}}_{1}\) from all other nodes in the extended process \(\tilde{\sigma }\), then there is no signalling to A1 from all other nodes in the original process σ. Consider \({\tau }_{{\tilde{A}}_{1}}=\left|\tau \right\rangle {\left\langle \tau \right|}_{{A}_{1}}\otimes \left|\phi \right\rangle \left\langle \phi \right|\), where \(\left|\phi \right\rangle \left\langle \phi \right|\) is some fixed projector on the ancillary input system \({({A}_{1}^{\prime})}^{{\rm{in}}}\) and \({\tau }_{{A}_{1}}=\left|\tau \right\rangle {\left\langle \tau \right|}_{{A}_{1}}\) has rank-1. Since projecting the ancillary systems via \(\left|\phi \right\rangle \left\langle \phi \right|\) leaves the ancillary systems on the remaining nodes in some pure state \(\left|{{\Phi }}\right\rangle \left\langle {{\Phi }}\right|\), the conditional process on the remaining nodes has the form \({\left.\sigma \right|}_{{\tau }_{{A}_{1}}}\otimes \left|{{\Phi }}\right\rangle \left\langle {{\Phi }}\right|\). Since the latter is a quantum comb for every \(\left|\tau \right\rangle {\left\langle \tau \right|}_{{A}_{1}}\), so must be \({\left.\sigma \right|}_{{\tau }_{{A}_{1}}}\).

There are n! different possible total orders of the remaining nodes, given by Aπ(2), …, Aπ(n+1) for π one of the n! different permutations of 2, …, n + 1. We will now show by contradiction that there exists an ordering Aπ(2), …, Aπ(n+1) with which the quantum comb \(\sigma {| }_{{\tau }_{{A}_{1}}}\) is compatible for any choice of \(\left|\tau \right\rangle {\left\langle \tau \right|}_{{A}_{1}}\). Suppose that no such total order exists. Then for every permutation π, there exists \({\tau }_{{A}_{1}}^{\pi }:= \left|{\tau }^{\pi }\right\rangle {\left\langle {\tau }^{\pi }\right|}_{{A}_{1}}\), such that the corresponding quantum comb \(\sigma {| }_{{\tau }_{{A}_{1}}^{\pi }}\) is incompatible with the total order of the remaining nodes defined by π. Let \({{\mathcal{C}}}_{l}^{\pi }(\sigma )=0\) for l = 1, …, n be the linear constraint corresponding to the lth condition in Eqs. (34)–(35) for a process operator σ over n nodes to be a valid quantum comb for the total order π.

Consider a process operator \(\bar{\sigma }:= \mathop{\sum }\nolimits_{\pi = 1}^{n!}{q}_{\pi }\,\sigma {| }_{{\tau }_{{A}_{1}}^{\pi }}\), where qπ ≥ 0 for all π and ∑πqπ = 1 (letting π denote both a permutation and an index enumerating the permutations). By construction, for every π at least one of the conditions in \({\{{{\mathcal{C}}}_{l}^{\pi }(\sigma {| }_{{\tau }_{{A}_{1}}^{\pi }}) = 0\}}_{l = 1}^{n}\) fails. Therefore, one can choose the weights qπ such that for every π the process operator \(\bar{\sigma }\) violates at least one of the constraints \({\{{{\mathcal{C}}}_{l}^{\pi }(\bar{\sigma }) = 0\}}_{l = 1}^{n}\), establishing that \(\bar{\sigma }\) is not a quantum comb for any possible order of the n nodes. More precisely, the condition that \(\bar{\sigma }\) respects the constraints \({\{{{\mathcal{C}}}_{l}^{\pi }(\bar{\sigma }) = 0\}}_{l = 1}^{n}\) for a given π can be written as \(\mathop{\sum }\nolimits_{\alpha = 1}^{n!}{q}_{\alpha }\,{{\mathcal{C}}}_{l}^{\pi }(\sigma {| }_{{\tau }_{{A}_{1}}^{\alpha }})=0\) for l = 1, …, n, which implies that (q1, …, qn!), viewed as a point in an (n!)-dimensional Euclidean space, must belong to a specific hyperplane in that space. Our assumption that at least one of the \({{\mathcal{C}}}_{l}^{\pi }(\sigma {| }_{{\tau }_{{A}_{1}}^{\pi }})\) is nonzero makes it a proper hyperplane. Then, in order for \(\bar{\sigma }\) to be compatible with the quantum-comb conditions for at least one π, the point (q1, …, qn!) must belong to the union of the hyperplanes corresponding to the different values of π. Since this is a finite set of hyperplanes, it is possible to find (a continuum of) points in the positive orthant that lie outside of this union. Since rescaling (q1, …, qn!) by a constant factor, which amounts to rescaling \(\bar{\sigma }\) by a constant factor, does not change whether any of the above constraints is violated, there exists a (q1, …, qn!) with the required properties, such that \(\bar{\sigma }\) is not a quantum comb for any total order π.
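The genericity step can be illustrated numerically. The following is a minimal sketch, not part of the proof: it assumes that each constraint has been reduced to a single scalar linear equation (for instance one nonvanishing matrix element of the operator-valued condition), so that every order π contributes one nonzero normal vector. The helper names `avoids_all` and `find_generic_weights` and the toy normals are hypothetical.

```python
import numpy as np

def avoids_all(q, normals, tol=1e-12):
    """True if q lies on none of the hyperplanes {x : normals[k] @ x = 0}."""
    return bool(np.all(np.abs(normals @ q) > tol))

def find_generic_weights(normals, rng, max_tries=100):
    """Sample strictly positive, normalised weights off every hyperplane."""
    dim = normals.shape[1]
    for _ in range(max_tries):
        q = rng.uniform(0.1, 1.0, size=dim)
        q /= q.sum()  # rescaling does not move q onto or off a hyperplane
        if avoids_all(q, normals):
            return q
    raise RuntimeError("no generic point found; are all normals nonzero?")

# Toy normals (hypothetical): the hyperplanes q1 = q2, q2 = q3 and q1 = q3.
normals = np.array([[1.0, -1.0, 0.0],
                    [0.0, 1.0, -1.0],
                    [1.0, 0.0, -1.0]])
q = find_generic_weights(normals, np.random.default_rng(0))
print(q, avoids_all(q, normals))
```

A finite union of proper hyperplanes has measure zero, so the sampling loop is expected to succeed on the first draw; the uniform point (1/3, 1/3, 1/3) is an example of a non-generic choice that lies on all three toy hyperplanes.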

We will now use this fact to construct the contradiction with the assumption that there is no single order π with which all isometric quantum combs \(\sigma {| }_{{\tau }_{{A}_{1}}}\) are compatible. To this end, we will first show that, starting from our extended process \(\tilde{\sigma }=\sigma \otimes \left|\Psi \right\rangle \left\langle \Psi \right|\), for any j ∈ {2, …, n + 1} it is possible to apply a suitable CP map \(\left|\tau \right\rangle {\left\langle \tau \right|}_{{\tilde{A}}_{1}}\) such that this yields a conditional process of the form \({\left.\tilde{\sigma }\right|}_{{\tau }_{{\tilde{A}}_{1}}}=|{\bar{\sigma }}_{j}\rangle \langle {\bar{\sigma }}_{j}| \otimes | {{\Phi }}\rangle {\left\langle {{\Phi }}\right|}_{res{t}_{{\rm{in}}}^{\prime}}\), where \(|{\bar{\sigma }}_{j}\rangle =\mathop{\sum }\nolimits_{\pi = 1}^{n!}\sqrt{{q}_{\pi }}\,{\left|\pi \right\rangle }_{{a}_{j}}|{\sigma }_{{\tau }_{{A}_{1}}^{\pi }}\rangle\) with, recalling Eq. (36), \({{\mathcal{H}}}_{{a}_{j}}\) the factor of \({{\mathcal{H}}}_{{({A}_{j}^{\prime})}^{{\rm{in}}}}\) sharing the state \({\left|{\phi }^{+}\right\rangle }_{1j}\) with \({{\mathcal{H}}}_{{a}_{1}}\) of the node A1 and \(|{\sigma }_{{\tau }_{{A}_{1}}^{\pi }}\rangle \langle {\sigma }_{{\tau }_{{A}_{1}}^{\pi }}|:= {\left.\sigma \right|}_{{\tau }_{{A}_{1}}^{\pi }}\), the conditional process on the remaining n of the original n + 1 nodes, and where \(\left|{{\Phi }}\right\rangle {\left\langle {{\Phi }}\right|}_{res{t}_{{\rm{in}}}^{\prime}}\) is some pure state on the remaining auxiliary input systems (i.e., \({\left|{{\Phi }}\right\rangle }_{res{t}_{{\rm{in}}}^{\prime}}\) is in \({\bigotimes }_{i\ne 1}{{\mathcal{H}}}_{{({A}_{i}^{\prime})}^{{\rm{in}}}}\) excluding the subfactor \({{\mathcal{H}}}_{{a}_{j}}\)).

To see this, let j ≠ 1. If we apply a CP map of the form \(\left|\tau \right\rangle {\left\langle \tau \right|}_{{\tilde{A}}_{1}}=\left|\chi \right\rangle {\left\langle \chi \right|}_{{a}_{1}{A}_{1}}\otimes \left|\phi \right\rangle {\left\langle \phi \right|}_{res{t}_{{\tilde{A}}_{1}}}\), where \(\left|\chi \right\rangle =\mathop{\sum }\nolimits_{\pi = 1}^{n!}\sqrt{{\epsilon }_{\pi }}\,{\left|\pi \right\rangle }_{{a}_{1}}{\left|{\tau }^{\pi }\right\rangle }_{{A}_{1}}\), and \(\left|\phi \right\rangle {\left\langle \phi \right|}_{res{t}_{{\tilde{A}}_{1}}}\) is some projector on the remaining ancillary input systems in \({({A}_{1}^{\prime})}^{{\rm{in}}}\), then we obtain a conditional process of the form \({\tilde{\sigma }|}_{{\tau }_{{\tilde{A}}_{1}}}=|{\sigma }_{j}\rangle \langle {\sigma }_{j}| \otimes | {{\Phi }}\rangle {\langle {{\Phi }}|}_{res{t}_{{\rm{in}}}^{\prime}}\), with \(|{\sigma }_{j}\rangle =\frac{1}{\sqrt{\mathop{\sum }\nolimits_{\pi = 1}^{n!}{\epsilon }_{\pi }{\gamma }_{\pi }}}\mathop{\sum }\nolimits_{\pi = 1}^{n!}\sqrt{{\epsilon }_{\pi }{\gamma }_{\pi }}{\left|\pi \right\rangle }_{{a}_{j}}|{\sigma }_{{\tau }_{{A}_{1}}^{\pi }}\rangle\), where \({\gamma }_{\pi }:= {\rm{Tr}}[(\left|{\tau }^{\pi }\right\rangle {\left\langle {\tau }^{\pi }\right|}_{{A}_{1}})\sigma ]\). Therefore, by choosing ϵπ = qπ/(cγπ), for some large enough constant c to ensure that \(\left|\tau \right\rangle {\left\langle \tau \right|}_{{\tilde{A}}_{1}}\) is appropriately normalised to represent a CP map, we can make \(|{\sigma }_{j}\rangle =|{\bar{\sigma }}_{j}\rangle\) as desired. (Note that γπ ≠ 0 for all π, since \({{\rm{Tr}}}_{{A}_{1}}[(\left|{\tau }^{\pi }\right\rangle {\left\langle {\tau }^{\pi }\right|}_{{A}_{1}})\sigma ]\) is proportional to a process operator on the remaining n nodes, the trace over which gives \(\mathop{\prod }\nolimits_{i = 2}^{n+1}{d}_{{A}_{i}^{{\rm{out}}}}\).)
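As a quick check of this choice, using only the quantities defined above (with π′ a dummy summation index): substituting ϵπ = qπ/(cγπ) gives ϵπγπ = qπ/c and, since ∑πqπ = 1, a normalization factor of 1/c, so that the amplitudes reduce to

$$\frac{\sqrt{{\epsilon }_{\pi }{\gamma }_{\pi }}}{\sqrt{\mathop{\sum }\nolimits_{\pi ^{\prime} = 1}^{n!}{\epsilon }_{\pi ^{\prime} }{\gamma }_{\pi ^{\prime} }}}=\frac{\sqrt{{q}_{\pi }/c}}{\sqrt{1/c}}=\sqrt{{q}_{\pi }}\,,$$

independently of c. The constant c therefore only affects the normalization of the CP map, not the resulting conditional process.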

By our main assumption, the n-node process \({\left.\tilde{\sigma }\right|}_{{\tau }_{{\tilde{A}}_{1}}}=|{\bar{\sigma }}_{j}\rangle \langle {\bar{\sigma }}_{j}| \otimes | {{\Phi }}\rangle {\langle {{\Phi }}|}_{res{t}_{{\rm{in}}}^{\prime}}\) must be a quantum comb, and since \(\left|{{\Phi }}\right\rangle {\left\langle {{\Phi }}\right|}_{res{t}_{{\rm{in}}}^{\prime}}\) is just a state on some input systems, \(|{\bar{\sigma }}_{j}\rangle \langle {\bar{\sigma }}_{j}|\) must also be a quantum comb (on the nodes Ai ≠ A1, i ≠ j, and the node Aj extended via the ancillary input system aj). But tracing out the system aj from the latter quantum comb must also yield a quantum comb on the nodes Ai ≠ A1, which can easily be seen from the quantum-comb conditions. However, by construction, \({{\rm{Tr}}}_{{a}_{j}}|{\bar{\sigma }}_{j}\rangle \langle {\bar{\sigma }}_{j}|=\bar{\sigma }\), and \(\bar{\sigma }\) was chosen above such that it is not a quantum comb for any total order, which is a contradiction.
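The identity \({{\rm{Tr}}}_{{a}_{j}}|{\bar{\sigma }}_{j}\rangle \langle {\bar{\sigma }}_{j}|=\bar{\sigma }\) used in the last step can be checked numerically. The sketch below is illustrative only: random complex vectors stand in for the vectors \(|{\sigma }_{{\tau }_{{A}_{1}}^{\pi }}\rangle\), and the dimensions `m` (playing the role of n!) and `d` as well as the weights `q` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 4  # m stands in for n!, d for the dimension carrying |sigma^pi>

# Hypothetical stand-ins for the vectors |sigma_{tau^pi}> (one row per pi).
sigmas = rng.normal(size=(m, d)) + 1j * rng.normal(size=(m, d))
q = np.array([0.5, 0.3, 0.2])

# |sigma_bar_j> = sum_pi sqrt(q_pi) |pi>_{a_j} |sigma^pi>, stored as an (m, d)
# array whose row index is the label pi of the ancilla a_j.
psi = np.sqrt(q)[:, None] * sigmas

# Partial trace over a_j of |psi><psi|: contract the shared ancilla index.
reduced = np.einsum("pi,pj->ij", psi, psi.conj())

# The mixture sigma_bar = sum_pi q_pi |sigma^pi><sigma^pi|.
mixture = sum(q[p] * np.outer(sigmas[p], sigmas[p].conj()) for p in range(m))

print(np.allclose(reduced, mixture))
```

The orthogonality of the ancilla basis states \({\left|\pi \right\rangle }_{{a}_{j}}\) is what removes the cross terms, so the pure state on the extended node reduces to the classical mixture with weights qπ.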

Therefore, there must exist a total order \(\bar{\pi }\), such that \(\sigma {| }_{{\tau }_{{A}_{1}}}\) is a quantum comb compatible with \(\bar{\pi }\) for every rank-1 \({\tau }_{{A}_{1}}\). By the convexity of the set of n-node operators that are quantum combs compatible with \(\bar{\pi }\), this automatically extends to all CP maps \({\tau }_{{A}_{1}}\).

So far we have shown that the process σ is such that there is a node A1 to which the rest of the nodes cannot signal, and the remaining nodes can be put in a total order A2, …, An+1, such that for every CP map \({\tau }_{{A}_{1}}\), the conditional process \(\sigma {| }_{{\tau }_{{A}_{1}}}\) is a quantum comb compatible with that order. Now observe that this implies that the full process σ is a quantum comb compatible with the total order A1, A2, …, An+1. Since for all possible CP maps \({\tau }_{{A}_{1}}\) it holds that \({{\mathcal{C}}}_{l}(\sigma {| }_{{\tau }_{{A}_{1}}})=0\) for l = 2, ..., n + 1, it follows from the linearity of these constraints that the corresponding quantum comb conditions hold for σ, i.e., \({{\mathcal{C}}}_{l}(\sigma )=0\) for l = 2, ..., n + 1. Finally, \({{\mathcal{C}}}_{1}(\sigma )=0\) follows simply from σ being a process, since this condition is equivalent to the requirement that tracing out all of the nodes A2, …, An+1 in σ leaves, up to normalization, a valid single-node process on A1 (ref. 3). Therefore, the isometric process σ on n + 1 nodes is a quantum comb, too, which completes the proof of Lemma 1 and thereby also that of Theorem 4.

Proof of Theorem 6. First, suppose \({\kappa }_{{X}_{1}...{X}_{n}}\) is a reversibly extendible process, that is, there exists a reversible deterministic process \({\kappa }_{{X}_{1}...{X}_{n}\lambda F}^{g}\) for some bijection \(g:{X}_{1}^{{\rm{out}}}\times ...\times {X}_{n}^{{\rm{out}}}\times {\lambda }^{{\rm{out}}}\to {X}_{1}^{{\rm{in}}}\times ...\times {X}_{n}^{{\rm{in}}}\times {F}^{{\rm{in}}}\), such that

$${\kappa }_{{X}_{1}...{X}_{n}}={\mathop{\sum }_{{\lambda }^{{\rm{out}}},{F}^{{\rm{in}}}}}{\kappa }_{{X}_{1}...{X}_{n}\lambda F}^{g}\,P({\lambda }^{{\rm{out}}})$$

for some probability distribution P(λout). It follows from the fact that \(\kappa_{X_{1}...X_{n}\lambda F}^{g}\) is a classical process that marginalization as in Eq. (37) has to yield a classical process over nodes X1, ..., Xn for arbitrary distributions P(λout), in particular for every point-distribution. Hence, for every value \(\lambda ^{\prime}\) of λout, the induced function \(g_{\lambda ^{\prime} }({\_}\!{\_}):= g({\_}\!{\_},\lambda ^{\prime} )\) has to define a deterministic process for n + 1 nodes and furthermore, also once marginalizing over F it still has to be a deterministic process for the n nodes X1, …, Xn. Hence, Eq. (37) can be read as establishing that the given \({\kappa }_{{X}_{1}...{X}_{n}}\) is a convex mixture of deterministic processes over the nodes X1, ..., Xn, i.e., \({\kappa }_{{X}_{1}...{X}_{n}}\) lies in the deterministic polytope.

Conversely, suppose \({\kappa }_{{X}_{1}...{X}_{n}}\) lies inside the deterministic polytope, that is, there exists a family of deterministic processes \({\{{\kappa }_{{X}_{1}...{X}_{n}}^{{f}_{i}}\}}_{i = 1}^{m}\), defined by the functions \({f}_{i}:{X}_{1}^{{\rm{out}}}\times ...\times {X}_{n}^{{\rm{out}}}\to {X}_{1}^{{\rm{in}}}\times ...\times {X}_{n}^{{\rm{in}}}\) such that \({\kappa }_{{X}_{1}...{X}_{n}}=\mathop{\sum }\nolimits_{i = 1}^{m}{q}_{i}\,{\kappa }_{{X}_{1}...{X}_{n}}^{{f}_{i}}\) for some probability distribution {qi}. The proof will proceed by first observing that such a process can be seen to arise from one single deterministic process on n + 2 nodes. Together with the fact that every deterministic process is reversibly extendible, proven in ref. 11, this establishes the claim. In order to see that indeed an appropriate deterministic process on n + 2 nodes exists, let λout and Fin be variables with cardinality m and define the function

$$f:{X}^{{\rm{out}}}\times {\lambda }^{{\rm{out}}}\,\to \,{X}^{{\rm{in}}}\times {F}^{{\rm{in}}}$$
$$(x,\,i)\,\mapsto\, ({f}_{i}(x),\,i)\,,$$

where \({X}^{{\rm{out}}}={X}_{1}^{{\rm{out}}}\times ...\times {X}_{n}^{{\rm{out}}}\) (similarly for Xin) and x = (x1, ..., xn). Together with setting P(λout = i) := qi, f defines a deterministic classical process over the nodes X1, ..., Xn, λ and F, which gives back \({\kappa }_{{X}_{1}...{X}_{n}}\) upon marginalization over λ and F. That f indeed defines a process follows from the fact that arbitrary variation of the distribution P(λout) corresponds to an arbitrary weighting {qi} in the originally given mixture, each case of which has to be a classical process. This concludes the proof.
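The marginalization step can be made concrete on a toy instance. The following sketch assumes n = 2 binary nodes and m = 2 hypothetical deterministic processes (one in which X1 precedes X2 and one with the opposite order); the functions `fs`, the weights `q`, and the helper names are illustrative choices and not taken from the text.

```python
from itertools import product

# Hypothetical toy instance: n = 2 binary nodes, m = 2 deterministic processes.
fs = [lambda x: (0, x[0]),   # f_1: X1 receives 0, X2 receives X1's output
      lambda x: (x[1], 0)]   # f_2: X2 receives 0, X1 receives X2's output
q = [0.25, 0.75]             # mixture weights, used as P(lambda_out = i)
m = len(fs)

def f(x, i):
    """Extended function (x, i) -> (f_i(x), i) over the nodes X1, X2, lambda, F."""
    return fs[i](x), i

def kappa_mixture(x_in, x_out):
    """Entry of the convex mixture sum_i q_i kappa^{f_i}."""
    return sum(q[i] for i in range(m) if fs[i](x_out) == x_in)

def kappa_marginal(x_in, x_out):
    """Entry obtained from kappa^f by weighting lambda_out with P and summing over F_in."""
    return sum(q[i]
               for i in range(m)   # value of lambda_out, weight q_i
               for k in range(m)   # value of F_in
               if f(x_out, i) == (x_in, k))

bits = list(product([0, 1], repeat=2))
print(all(abs(kappa_marginal(a, b) - kappa_mixture(a, b)) < 1e-12
          for a in bits for b in bits))
```

Because f copies the value of λout to Fin, summing over Fin picks out exactly one term for each i, so the marginal reproduces the mixture entry by entry.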