Introduction

Since the discovery of quantum algorithms that outperform all known classical ones in certain tasks,1 improving our understanding of the possibilities and limitations of quantum computation has become one of the central goals of quantum information theory. While it is notoriously difficult to prove unconditional separation of polynomial-time classical and quantum computation,2 an approach that is often regarded more tractable is to analyze how certain modifications of quantum computing affect its computational power. For instance, one may consider restrictions on the set of allowed quantum resources, and ask under which condition the possibility of universal quantum computation is preserved despite the restriction. Notable results along these lines, among many others, include the Gottesman–Knill theorem,3,4,5 insights on the necessity of contextuality as a resource for magic state distillation,6 or bounds on the noise threshold of quantum computers.7

In a complementary and in some sense more radical approach, going back to Abrams and Lloyd,8 one considers modifications of the quantum formalism itself and studies the impact of those modifications on the computational efficiency, resembling strategies of classical computer science such as the introduction of oracles.9 For example, it has been shown that availability of closed timelike curves leads to implausible computational power,10 that stronger-than-quantum nonlocality reduces the set of available transformations,11,12,13,14 that tomographic locality forces computations to be contained in a class called AWPP,15,16 and that in some theories (satisfying additional axioms) higher-order interference17 does not lead to a speed-up in Grover’s algorithm.18 Further examples can be found, e.g., in refs. 19,20,21,22.

In this paper, we consider a specific modification of the quantum formalism that is arguably among the simplest and most conservative possibilities. This modification dates back to ideas by Jordan et al.,23 and it has several independent motivations as we will explain further below. This generalization keeps all characteristic properties of quantum computation unchanged, but modifies a single aspect: namely, it allows the quantum bit to have any number of d ≥ 2 degrees of freedom, instead of standard quantum theory’s d = 3 (or the classical bit’s d = 1). It has been conjectured24 that the resulting theories allow for interesting “beyond quantum” reversible multipartite dynamics, which would make the corresponding models of computation highly relevant objects of study within the research program mentioned above. However, here we show that, quite on the contrary, these models are so constrained that they do not even allow for classical computation; hence, in Aaronson’s terminology, the d = 3 case of the standard qubit circuit model can be seen as an “island in theoryspace”.25

Results

The results of this paper are organized as follows:

We first motivate and explain the framework: In “Framework: Single gbits” section, we define single bits that generalize the qubit (“gbits”), and afterwards in “Framework: Gbit circuits” section, we give three postulates that allow us to reason about circuits that are constructed out of n of these gbits. We formulate the problem that is addressed in this work and describe how it relates to earlier results in the literature in “d = 3 equals quantum computation, and relation to earlier work” section. In “Main result” section, we state our main result: namely, while our principles uniquely determine quantum computation in the case that the single gbits have d = 3 degrees of freedom, any other value of d does not even allow for classical computation.

Framework: Single gbits

In both classical and quantum computation, we can restrict our attention to the circuit model (as in Fig. 1) where each of the wires (the single systems that enter and exit logical gates) corresponds to a two-level system. Quantum two-level systems (qubits) are different from classical ones (bits): they allow for a more complex behavior which encompasses phenomena like coherent superposition, interference, or uncertainty relations. Yet, both classical and quantum bits can be formalized in a unified way that we now describe (for both single and multiple bits, i.e., circuits, we follow the construction and notation from ref. 26).

Fig. 1

The circuit model that we consider in this paper. We have an arbitrary finite number n of wires (here n = 4), and each wire carries a “gbit” which is a state in a d-dimensional Bloch ball state space. Initially, a product state is prepared (encoding, for example, the classical input to the algorithm), then a finite number of gates Gi is applied, each acting on an arbitrary number of gbits, and finally local measurements are performed. We assume that the Gi are elements of an (arbitrary unspecified) closed connected matrix group, and that the global state of n wires is uniquely determined by the statistics and correlations of single-wire measurements (“tomographic locality”). If d = 3, i.e., if the gbits are qubits, it has been shown in ref. 26 that these assumptions uniquely characterize unitary quantum computation as the only computationally non-trivial theory. Here we analyze the case d ≠ 3, and prove that, despite conjectures to the contrary,24 the corresponding models do not allow for any non-trivial computation at all. We do not assume that wires can be swapped, or that all transformations can be composed out of two-gbit transformations. See the main text for details.

To any \(d \in {\Bbb N}\), we associate a “generalized bit” (gbit) that has the d-dimensional Bloch ball, \(B^d = \left\{ {\vec a \in {\Bbb R}^d|\left| {\vec a} \right| \le 1} \right\}\), as its state space. Every vector \(\vec a\) in the Bloch ball \(B^d\) corresponds to a possible state of the generalized bit. Two-outcome measurements are described by vectors \(\vec b \in {\Bbb R}^d\) with \(\left| {\vec b} \right| = 1\), such that the probability of the first outcome if performed on state \(\vec a\) is \(( {1 + \vec a \cdot \vec b}){\mathrm{/}}2\), and that of the second outcome is \(( {1 - \vec a \cdot \vec b} ){\mathrm{/}}2\). In the following, it will be convenient to use the notation \(v\left( {\vec a} \right) = \left( {1,\vec a} \right)^ \top \in {\Bbb R}^{d + 1}\), such that these two probabilities become \(\frac{1}{2}v( {\vec a} ) \cdot v( { \pm \vec b} )\). Reversible transformations of states are given by \(\vec a \mapsto R\vec a\), where \(R \in {\mathrm{SO}}(d)\) is a rotation matrix. These transformations map states to states and can be inverted (by applying \(R^{-1}\)), hence we can interpret them as closed-system time evolutions or, equivalently, reversible gates on single generalized bits.
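The single-gbit rules above can be checked numerically. The following is a minimal sketch in plain Python, assuming an illustrative d = 4 state and measurement; the helper names (`dot`, `v`, `outcome_probs`, `rotate_plane`) are hypothetical and not part of the paper.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equally long lists."""
    return sum(xi * yi for xi, yi in zip(x, y))

def v(a):
    """Embed a Bloch vector a in R^d as v(a) = (1, a) in R^(d+1)."""
    return [1.0] + list(a)

def outcome_probs(a, b):
    """Probabilities (1 + a.b)/2 and (1 - a.b)/2 of measurement b on state a."""
    return (1 + dot(a, b)) / 2, (1 - dot(a, b)) / 2

def rotate_plane(x, i, j, theta):
    """Rotate x by theta in the (i, j) coordinate plane: an element of SO(d)."""
    y = list(x)
    c, s = math.cos(theta), math.sin(theta)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y

# Illustrative d = 4 example (values chosen for this sketch only).
a = [0.3, 0.0, 0.4, 0.0]   # mixed state, |a| = 0.5 <= 1
b = [0.0, 0.0, 1.0, 0.0]   # measurement vector, |b| = 1

p_first, p_second = outcome_probs(a, b)

# Same probabilities in the v-notation: (1/2) v(a) . v(+b) and (1/2) v(a) . v(-b).
p_first_v = 0.5 * dot(v(a), v(b))
p_second_v = 0.5 * dot(v(a), v([-x for x in b]))

# Rotating state and measurement by the same R leaves the statistics unchanged.
p_rot, _ = outcome_probs(rotate_plane(a, 0, 2, 0.7), rotate_plane(b, 0, 2, 0.7))
```

The last line reflects that rotations preserve the inner product, which is why SO(d) maps states to states without changing measurement statistics in a co-rotated frame.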

For d = 3, this formalism recovers the qubit of standard quantum theory:5 as is well-known, every 2 × 2 density matrix ρ can be written in the form

$$\rho = ({\mathbf{1}} + \vec a_\rho \cdot \vec \sigma )/2,$$

where \(\vec \sigma = \left( {\sigma _x,\sigma _y,\sigma _z} \right)\) denotes the Pauli matrices. It is automatic in this representation that tr ρ = 1, and positivity ρ ≥ 0 is equivalent to \(| {\vec a_\rho } | \le 1\). Hence the set of states of a quantum bit can be represented by the Bloch ball \(B^3\). This representation has the important property that statistical mixtures correspond to convex combinations: if a state ρ is prepared with probability p and another state ρ′ is prepared with probability 1 − p, then the total state pρ + (1 − p)ρ′ corresponds to the Bloch vector \(\vec a_{p\rho + (1 - p)\rho \prime } = p\vec a_\rho + (1 - p)\vec a_{\rho \prime }\). This statistical interpretation of convex mixtures is also taken for balls of other dimensions d ≠ 3, hence these Bloch balls can be regarded as state spaces of generalized probabilistic theories.11
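As an illustration of the d = 3 correspondence, the following sketch builds ρ from a Bloch vector, reads the eigenvalues (1 ± |a|)/2 off the trace and determinant, and checks that mixtures of density matrices correspond to convex combinations of Bloch vectors. It uses only plain Python; all function names and numerical values are assumptions made for this example.

```python
import math

# Pauli matrices as nested lists.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
IDENT = [[1, 0], [0, 1]]

def density_matrix(a):
    """rho = (1 + a . sigma) / 2 for a Bloch vector a = (ax, ay, az)."""
    return [[(IDENT[r][c] + a[0] * SX[r][c] + a[1] * SY[r][c] + a[2] * SZ[r][c]) / 2
             for c in range(2)] for r in range(2)]

def eigenvalues(rho):
    """Eigenvalues of a Hermitian 2x2 matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr - disc) / 2, (tr + disc) / 2

def bloch_vector(rho):
    """Recover the Bloch vector via a_k = tr(rho sigma_k)."""
    return [2 * rho[0][1].real, -2 * rho[0][1].imag, (rho[0][0] - rho[1][1]).real]

# A pure state (|a| = 1) has eigenvalues 0 and 1.
rho_pure = density_matrix([0.6, 0.0, 0.8])
eig_pure = eigenvalues(rho_pure)

# A vector outside the ball (|a| = 1.2) gives a negative eigenvalue, so rho is not a state.
eig_outside = eigenvalues(density_matrix([1.2, 0.0, 0.0]))

# Mixing density matrices with weight p mixes the Bloch vectors with the same weight.
p = 0.25
rho_other = density_matrix([0.0, 0.0, -1.0])
rho_mix = [[p * rho_pure[r][c] + (1 - p) * rho_other[r][c] for c in range(2)]
           for r in range(2)]
a_mix = bloch_vector(rho_mix)   # expected: p*(0.6, 0, 0.8) + (1-p)*(0, 0, -1)
```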

In the d = 3 case, projective measurements are represented by unit vectors \(\vec b\), \(| {\vec b} | = 1\), with outcome probabilities \(( {1 \pm \vec a \cdot \vec b} ){\mathrm{/}}2\) as described above. Unitary transformations U on states, acting as \(\rho \mapsto U\rho U^\dagger\), are described in the Bloch ball picture by orthogonal maps RU, \(R_U^ \top R_U = {\mathbf{1}}\), such that \(\vec a_{U\rho U^\dagger } = R_U\vec a_\rho\). More general measurements (positive operator-valued measures) or transformations (completely positive maps) can also be described in the Bloch ball representation, but they are not needed in what follows and therefore omitted.

The simplest case of d = 1 corresponds to the classical bit: there are two possible configurations, \(\vec a = + 1\) and \(\vec a^\prime = - 1\), and further states that represent classical uncertainty about the configuration. Namely, if we have +1 with probability p (and thus −1 with probability 1 − p), this corresponds to the state \(p\vec a + (1 - p)\vec a^\prime\) in the interior of the one-dimensional “Bloch ball”.

There is one peculiarity in the d = 1 case: instead of SO(1) = {1}, we should allow the group O(1) = {−1, 1} as Bloch ball transformations such that also the bit flip is allowed.

What is the significance of the d-dimensional Bloch balls if d is neither one nor three? These gbits have appeared in various places in quantum information theory and the foundations of quantum mechanics. Historically, they have first shown up as precisely those two-level state spaces that can be described as (formally real, irreducible) Jordan algebras,23 a natural algebraic generalization of standard quantum theory. In fact, quantum theory with real amplitudes, i.e., over the field \({\Bbb R}\) instead of \({\Bbb C}\), has a (d = 2)-dimensional Bloch ball as its “quantum bit”, and the bits of quaternionic27 and octonionic28 quantum theory correspond to \(B^d\) for d = 5 and d = 9, respectively. Furthermore, the fact that a two-level system should have a Euclidean ball state space can be derived from a variety of different sets of natural assumptions. In many reconstructions of quantum theory from physical or information-theoretic principles,29,30,31,32,33,34,35,36,37,38,39,40 this fact is derived as a first step. For example, postulating that the group of reversible transformations acts transitively on the pure states implies that the pure states must all lie on the unit hypersphere of an invariant inner product. If some points on the sphere were not valid states, then there would exist additional measurements that would violate further natural postulates like Hardy’s29 “Subspaces” axiom. This argumentation, or others along similar lines,29,30,31,32,33,34,35,36,37,38,39,40 leads to Euclidean balls as the most natural state spaces of a generalized bit.

A more geometrical motivation can be found by considering spin-\(\frac{1}{2}\) particles (compare, e.g., to ref. 24): under rotations SO(3), they transform via SU(2). The density matrix transforms under the adjoint representation, which means that the Bloch vectors transform via the same rotation as in physical space. Therefore, the Bloch vector \(\vec b\) can be seen as defining an oriented axis in physical space. The model considered in this paper is a direct generalization of the Bloch ball and this interpretation to arbitrary spatial dimensions. Indeed, the possibility that space might have more than three dimensions has appeared in a large variety of physical theories.41,42,43,44,45,46 It has also been argued that these generalized bits can be interpreted as “information quasiparticles” in some sense.47 In summary, these gbits are among the simplest and most natural generalizations of the classical bit and the qubit of quantum mechanics.

Framework: Gbit circuits

To describe circuit computation, we need to define the state space, measurements, and transformations of several gbits. In standard quantum theory, where the gbits are qubits, there is a unique definition of these notions: the states of n qubits are exactly the \(2^n \times 2^n\) density matrices, the reversible transformations are the unitaries, and the measurements are described by collections of projection operators. Similar definitions apply to n classical bits. But if the gbits are Bloch balls of dimension \(d\not \in \{ 1,3\}\), then it is a priori unclear what the composite state space should be.

Since we would like to be as general as possible, we will not make any attempt to fix the composite state space from the outset. Instead, we will work with a small set of principles that the composite n-gbit system is supposed to satisfy. While these principles will constrain the n-gbit state space, it is by no means obvious that they determine it uniquely. However, we will show below that they are indeed constraining enough to allow us to derive the full set of states and transformations.

An important principle is the no-signaling principle:11 the outcome statistics of measurements on any group of gbits does not depend on any other operations (e.g., measurements) that are performed on the remaining gbits. This is a physically well-motivated constraint that lies at the heart of what we mean by “different wires” (i.e., subsystems) of the circuit in the first place.

This principle is satisfied by classical as well as quantum computation, and so is our second postulate of tomographic locality:29,48 every state on n gbits is uniquely characterized by the statistics and correlations of the local gbit measurements. In other words, a global n-gbit state is nothing but a catalog of probabilities for the outcomes of all the single-gbit measurements and their correlations.

It is not only classical and quantum theory that satisfy the principle of tomographic locality, but also more general probabilistic theories like boxworld.12 If this principle was violated, then a collection of gbits would in some counterintuitive sense be “more” than a composition of its building blocks. Even though this formulation makes tomographic locality sound very natural, there are simple examples of theories that violate it. One such example is given by quantum theory over the real numbers \({\Bbb R}\).49,50 This is because observables of two single real qubits do not linearly generate all observables of two real qubits. In particular, if \(\sigma_y\) is the Pauli matrix with purely imaginary entries, then \(\sigma_y\) is not a real qubit observable, but \(\sigma_y \otimes \sigma_y\) is a real two-qubit observable. Intuitively, it represents a novel “holistic” degree of freedom that cannot be constructed out of local degrees of freedom and their correlations.
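The real-qubit counterexample can be made concrete with a small calculation. The sketch below (plain Python; the helper `kron` and the variable names are assumptions of this example) forms the Kronecker product of the imaginary Pauli matrix with itself and compares the number of locally accessible parameters with the dimension of the space of real symmetric 4 × 4 matrices.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

# Pauli matrix with purely imaginary entries: not an observable over the reals.
SY = [[0, -1j], [1j, 0]]
sy_is_real = all(complex(x).imag == 0 for row in SY for x in row)

# Its tensor square has only real entries and is symmetric,
# so it is a valid observable of two real qubits.
SYSY = kron(SY, SY)
sysy_is_real = all(complex(x).imag == 0 for row in SYSY for x in row)
sysy_is_symmetric = all(SYSY[r][c] == SYSY[c][r] for r in range(4) for c in range(4))

# Parameter count for two real qubits (d = 2, n = 2):
# local measurements and their correlations span (d + 1)^n dimensions,
# while real symmetric 4x4 matrices form a space of dimension 4 * 5 / 2.
d, n = 2, 2
dim_local = (d + 1) ** n           # 9
dim_global = 4 * (4 + 1) // 2      # 10
```

The one missing dimension is the degree of freedom carried by \(\sigma_y \otimes \sigma_y\), which is why local statistics alone cannot fix a two-real-qubit state.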

Not only is the postulate of tomographic locality very intuitive, but it is also very powerful: it allows us to represent states of n gbits as tensors.11 That is, even if we do not know what the set of n-gbit states is, we know that every such state can be written as an element of the linear space \(\left( {{\Bbb R}^{d + 1}} \right)^{ \otimes n}\) (in the quantum case, where d = 3, this amounts to the \(4^n\)-dimensional real linear space of Hermitian \(2^n \times 2^n\) matrices; for classical bits, where d = 1, it is the \(2^n\)-dimensional space that contains the probability vectors over \(2^n\) configurations). In particular, an n-gbit product state with local Bloch vectors \(\vec a_1, \ldots ,\vec a_n\) is represented by

$$v\left( {\vec a_1, \ldots ,\vec a_n} \right): = \left( {1,\vec a_1} \right)^ \top \otimes \ldots \otimes \left( {1,\vec a_n} \right)^ \top ,$$

and all other states ω are vectors on the same space (but not of this product form). Tomographic locality then amounts to the fact that all these states are uniquely determined by the numbers

$$2^{ - n}v\left( {\vec b_1, \ldots ,\vec b_n} \right)^ \top \omega ,$$

which are the outcome probabilities of local gbit measurements corresponding to the Bloch vectors \(\vec b_1, \ldots ,\vec b_n\) on the state ω. This mathematical property has many intuitively appealing consequences that are not otherwise guaranteed, e.g., the property that products of pure states are pure. It is also the reason why the mathematical literature has focused almost entirely on this notion of composite state space (cf., e.g., ref. 51): it leads to notions of “tensor products” of ordered linear spaces that allow one to prove general statements that are otherwise unavailable. In the context of this paper, it would seem extremely difficult to make any meaningful statements whatsoever if not even the linear space on which the global states live could be fixed from the outset.
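To illustrate the tensor representation, the following sketch builds product vectors \(v(\vec a_1, \ldots ,\vec a_n)\) and evaluates local measurement probabilities for a product state and for a correlated mixture. It is written in plain Python for n = 2 and d = 2; the function names and the chosen states are hypothetical.

```python
from functools import reduce

def kron_vec(x, y):
    """Kronecker product of two vectors given as lists."""
    return [xi * yj for xi in x for yj in y]

def v(*bloch_vectors):
    """Product vector (1, a_1) tensor ... tensor (1, a_n)."""
    return reduce(kron_vec, ([1.0] + list(a) for a in bloch_vectors))

def local_prob(bs, omega):
    """2^(-n) v(b_1, ..., b_n)^T omega: probability that all n local outcomes are 'yes'."""
    vb = v(*bs)
    return sum(x * y for x, y in zip(vb, omega)) / 2 ** len(bs)

# Two gbits with d = 2 (illustrative values).
a1, a2 = [1.0, 0.0], [0.0, 1.0]
omega_prod = v(a1, a2)

# For product states the probability factorizes into prod_i (1 + a_i . b_i) / 2.
p_factor = local_prob([[1.0, 0.0], [0.6, 0.8]], omega_prod)   # (1+1)/2 * (1+0.8)/2
p_zero = local_prob([[1.0, 0.0], [0.0, -1.0]], omega_prod)    # second factor vanishes

# A classically correlated state: equal mixture of v(a1, a2) and v(-a1, -a2).
omega_corr = [0.5 * x + 0.5 * y
              for x, y in zip(v(a1, a2), v([-1.0, 0.0], [0.0, -1.0]))]
p_corr = local_prob([a1, a2], omega_corr)

dim = len(omega_prod)   # (d + 1)^n
```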

We need one further ingredient to arrive at a model of computation, namely a set of reversible transformations. In analogy to standard quantum computation (where these are the unitaries), we postulate that the transformations form a closed connected matrix group, and thus Lie group, \({\cal G}\): they form a group since they can be composed; they must be linear maps since if we prepare a state ω with probability p and ω′ with probability (1 − p), they must act on the components of the convex combination pω + (1 − p)ω′ individually, to be consistent with the probabilistic interpretation.11 Moreover, it is physically meaningful to model the group as closed since whenever we can approximate a transformation to arbitrary accuracy by gates, it makes sense to declare this transformation as in principle implementable.

This postulate is almost, but not quite, satisfied by classical computation, i.e., the d = 1 case. As Bennett has shown,52 classical computation can be made fully reversible, at only marginal cost of space or time resources. There are finite universal gate sets (including, e.g., Toffoli gates) that generate the full group of permutations of the \(2^n\) configurations of the n bits. These permutations therefore constitute the reversible transformations of the classical bits, and they form a closed matrix group of linear maps. This group, however, is discrete and not connected.
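As a small illustration of a reversible classical gate acting as a permutation of configurations, the sketch below enumerates the Toffoli gate on three bits. It only shows that this single gate is a self-inverse permutation of the 8 configurations; it does not attempt to demonstrate the universality statement. The names used are assumptions of this example.

```python
from itertools import product

def toffoli(bits):
    """Flip the third bit if and only if the first two bits are both 1."""
    a, b, c = bits
    return (a, b, c ^ (a & b))

configs = list(product((0, 1), repeat=3))        # the 2^3 classical configurations
image = [toffoli(x) for x in configs]

is_permutation = sorted(image) == configs         # every configuration is hit exactly once
is_self_inverse = all(toffoli(toffoli(x)) == x for x in configs)

# Permutation matrix acting on probability vectors: P[i][j] = 1 iff config j maps to config i.
P = [[1 if toffoli(configs[j]) == configs[i] else 0 for j in range(8)]
     for i in range(8)]
moved = [x for x in configs if toffoli(x) != x]   # only (1,1,0) and (1,1,1) are exchanged
```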

This discreteness is already reflected in the fact that the one-dimensional “Bloch ball” is discrete, i.e., has only a finite number (two) of pure states. Since the set of classical configurations (pure states) of n bits is discrete, the group of reversible transformations must also be discrete. In the case d ≥ 2 to which we thus restrict our attention in the following, however, even single bits (Bloch balls) contain a continuous manifold of pure states. In order to allow every pure state to evolve into every other (which we would expect to be crucial for the exploitation of the full computational potential), it is therefore necessary that the reversible transformations form a continuous group \({\cal G}\)—in more detail, that \({\cal G}\) is a matrix Lie group such that its connected component at the identity is non-trivial. It then makes sense to consider continuous time evolution that implements elements of this connected component (as it is the case in quantum theory), and to disregard the mathematical possibility of having additional disconnected components. This motivates the assumption that \({\cal G}\) is connected.

All gates in a circuit will be elements of \({\cal G}\). This group must in particular contain the local gbit rotations: for \(R \in {\mathrm{SO}}(d)\), write \(\hat R\left( {1,\vec a} \right)^ \top : = \left( {1,R\vec a} \right)^ \top\), then the subgroup of local transformations is

$${\cal G}_{{\mathrm{loc}}}: = \left\{ {\hat R_1 \otimes \hat R_2 \otimes \ldots \otimes \hat R_n|R_i \in {\mathrm{SO}}(d)} \right\}.$$
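The action of an element of \({\cal G}_{{\mathrm{loc}}}\) on product states can be checked directly. The following sketch (plain Python, n = 2 and d = 2, with hypothetical helper names and arbitrarily chosen angles) verifies that \(\hat R_1 \otimes \hat R_2\) maps \(v(\vec a_1,\vec a_2)\) to \(v(R_1\vec a_1,R_2\vec a_2)\).

```python
import math

def hat(R):
    """Embed a d x d rotation R as the (d+1) x (d+1) block matrix diag(1, R)."""
    d = len(R)
    return [[1.0] + [0.0] * d] + [[0.0] + list(row) for row in R]

def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def kron_vec(x, y):
    """Kronecker product of two vectors given as lists."""
    return [xi * yj for xi in x for yj in y]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def rot2(theta):
    """Rotation by theta in the plane, an element of SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

a1, a2 = [1.0, 0.0], [0.6, 0.8]
R1, R2 = rot2(0.3), rot2(-1.1)

# Left-hand side: the local gate applied to the product vector v(a1, a2).
lhs = matvec(kron(hat(R1), hat(R2)), kron_vec([1.0] + a1, [1.0] + a2))

# Right-hand side: the product vector of the individually rotated Bloch vectors.
rhs = kron_vec([1.0] + matvec(R1, a1), [1.0] + matvec(R2, a2))

max_diff = max(abs(x - y) for x, y in zip(lhs, rhs))
```

The first component of `lhs` stays equal to 1, reflecting that these maps preserve normalization.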

Note that we have used tomographic locality in deriving this prescription: since a local transformation acts like a product of transformations on the product states, it must act like this on all other states too since they live on the vector space that is spanned by the product states. Tomographic locality hence enforces that we can represent any linear map \(X:\left( {{\Bbb R}^{(d + 1)}} \right)^{ \otimes n} \to \left( {{\Bbb R}^{(d + 1)}} \right)^{ \otimes n}\) as a tensor with n upper and n lower indices; that is,

$$X_{\beta _1\beta _2 \ldots \beta _n}^{\alpha _1\alpha _2 \ldots \alpha _n}: = \left( {\vec e_{\beta _1} \otimes \ldots \otimes \vec e_{\beta _n}} \right)^ \top X\left( {\vec e_{\alpha _1} \otimes \ldots \otimes \vec e_{\alpha _n}} \right),$$

where 0 ≤ αi, βi ≤ d, and \(\vec e_\gamma\) denotes the γ-th unit vector, e.g., \(\vec e_0 = \left( {1,0, \ldots ,0} \right)^ \top\). This is in contrast to Bloch vectors \(\vec b \in {\Bbb R}^d\), where we use the notation \({\Bbb R}^d \ni \vec b = \vec e_1 = \left( {1,0, \ldots ,0} \right)^ \top\).

We demand that \({\cal G}_{{\mathrm{loc}}} \subseteq {\cal G}\), but do not make any further assumptions on \({\cal G}\). In particular, we do not assume that the n gbits have physically identical roles: our assumptions allow in principle composite state spaces of n gbits that are not symmetric with respect to permutations of the gbits. Hence we are also not assuming that gbits can be reversibly swapped, or that other natural choices of transformations such as extensions of classical reversible gates (like CNOT) can necessarily be implemented. Therefore, our framework does not rely on the same set of assumptions as the circuit framework of symmetric monoidal categories53 that is often used in the quantum foundations context.

For any Bloch ball dimension d, there is a trivial computational model: namely the choice that \({\cal G} = {\cal G}_{{\mathrm{loc}}}\). This describes a theory where the only possible reversible transformations are independent local transformations of the single gbits. Such a model does not even allow for classical gates like the CNOT; it only admits gates and computations that evolve the gbits independently from each other without ever correlating them, i.e., products of single-gbit gates. A state space that is compatible with this choice of global transformations is simply

$${\mathrm{conv}}\left\{ {\left( {1,\vec a_1} \right)^ \top \otimes \ldots \otimes \left( {1,\vec a_n} \right)^ \top |\vec a_i \in B^d} \right\},$$

i.e., all convex combinations of product states. This is a state space that does not contain entanglement.

d = 3 equals quantum computation, and relation to earlier work

For the case of the standard qubit, i.e., of d = 3, it has been proven in ref. 26 that there is only a single possible non-trivial \(\left( {\cal {G}}_{\mathrm{loc}}{\subsetneq} {\cal {G}} \right)\) theory that satisfies the assumptions from above: namely, standard quantum theory over n qubits, with the \(2^n \times 2^n\) density matrices as the states, and the projective unitary group \({\cal G} = {\mathrm{PU}}\left( {2^n} \right)\) of transformations. That is, the postulates on composition of gbits from above, together with the structure of the single qubit, are sufficient to determine qubit quantum computation uniquely.

While this result is interesting in its own right, it is also the main motivation for the present work: if quantum computation is characterized by such a simple list of principles, then maybe one obtains other interesting models of computation by slightly tweaking one of the postulates. Since large parts of the mathematical structure are determined by the postulates on composition (no-signaling and tomographic locality), the most promising road towards modifying the setup and also keeping important mathematical tools seems to be to modify the structure of the single qubit—and technically as well as conceptually (as explained in “Framework: Single gbits” subsection), the most natural way to do this is by changing the dimension of the Bloch ball d.

In the special case of n = 2 gbits, the consequences of the above postulates have been explored in refs. 54,55. There it has been proven that the only consistent choice of transformations for Bloch ball dimension d ≠ 3 is given by the trivial choice \({\cal G} = {\cal G}_{{\mathrm{loc}}}\). However, computation is typically taking place on a large number \(n \gg 2\) of gbits, and the techniques of refs. 54,55 cannot readily be generalized to n > 2.

In fact, it has been suggested in ref. 24 that it is essential for Bloch ball dimensions d ≥ 4 to allow for genuine m-partite interaction of the gbits, where m ≥ d − 1 ≥ 3. Without a conclusive proof or explicit construction of the state space, the authors conjectured that interesting multipartite reversible dynamics is possible for such systems. In contrast to quantum theory, this m-partite dynamics would not be decomposable into two-gbit interactions. While tomographic locality has not been assumed in ref. 24, it is an important first step to check their conjecture under this additional assumption. In fact, it has been argued in ref. 56 that in the context of spacetime physics (the Bloch balls are interpreted in ref. 24 as carrying some sort of d-dimensional spin degrees of freedom), tomographic locality is to be expected due to arguments from group representation theory.

This gives us another, independent motivation to ask the main question of this paper: if d ≠ 3 and n is any finite number of gbits, then what are the possible theories that satisfy the assumptions of “Framework: Gbit circuits” subsection?

Main result

The main result of this work is an answer to the question posed at the end of the previous section:

Theorem 1

Consider a theory of n gbits, where single gbits are described by a (d ≥ 2)-dimensional Bloch ball state space, subject to the single-gbit transformation group SO(d). As described above, let us assume no-signaling, tomographic locality, and that the global transformations form a closed connected matrix group \({\cal {G}}\). If d ≠ 3, then necessarily \({\cal G} = {\cal G}_{{\mathrm{loc}}}\), i.e., the only possible gates are (independent combinations of) single-gbit gates. No transformation can correlate gbits that are initially uncorrelated; hence not even classical computation is possible.

Theorem 1 will be proved in “Methods” section.

Discussion

Given a few simple properties that turn out to characterize qubit quantum computation, we have considered a natural modification: allowing the single bits to have more or less than the qubit’s d = 3 degrees of freedom. We have analyzed the set of possible reversible transformations in the resulting theories, under the conjecture24 (and in hopes) of discovering novel computational models that differ in interesting ways from quantum computation. Unfortunately, it turns out that the resulting models do not allow for any non-trivial reversible gates whatsoever. This reinforces earlier intuition25 that quantum theory, or in this context quantum computation, is an “island in theoryspace”.

While we have made an effort to be as careful and parsimonious in our assumptions as possible, it is still interesting to ask whether there are any remaining “loopholes” that could in principle leave some wiggle room for non-trivial beyond-quantum computation: can any of the assumptions of “Framework: Gbit circuits” subsection be dropped or weakened, while insisting that single bits are described by Bloch balls? We discuss several options in the Supplementary Material; in short, the most promising (but difficult) approaches would be to drop tomographic locality, and/or to drop reversibility or continuity of transformations. Both options present formidable mathematical challenges and are therefore deferred to future work.

The “rigidity” of quantum theory, i.e., the difficulty of modifying it in consistent ways, has been recognized in different contexts for a long time, see, e.g., Weinberg’s proposal of a nonlinear modification of quantum mechanics,57 and Gisin’s subsequent discovery58 that this modification allows for superluminal signaling. The research presented in this paper and in other work (like refs. 59,60) makes this intuition more rigorous by specifying which combinations of principles already enforce the familiar behavior of quantum theory. These insights also illuminate our understanding of quantum computation, since they tell us which physical principles enforce its properties, and/or which other theoretical models of computation are plausibly conceivable.

Finally, it is interesting to speculate that the result of this paper is indirectly related to spacetime physics. After all, it is the fact that a qubit is represented as a 3-ball \(B^3\), with SO(3) as its transformation group, which allows for spin-1/2 particles that couple to rotations in three-dimensional space. Given the popularity of approaches in which spacetime emerges in some way from an underlying quantum theory,61,62,63 this observation can perhaps be regarded as more than a coincidence. In fact, it has been argued more rigorously that the structures of quantum theory and spacetime mutually constrain each other.24,56,64,65,66 This suggests a slogan that also fits some other ideas from quantum information:67 the limits of computation are the limits of our world.

Methods

We will now prove this result for the case d ≥ 4. The proof in the d = 2 case uses similar techniques, but differs in several details for group-theoretic reasons. It will hence be deferred to the Supplementary Material.

Generator normal form for all dimensions d ≥ 2

As a first step, we will consider the generators of global transformations and show that there exists at least one that is of a certain normal form. This part of the proof is valid for all dimensions d ≥ 2. A large part of this first step follows the construction in ref. 26, and extends it to arbitrary dimensions.

Let \(G \in {\cal G}\) be a transformation of the composite system. Suppose we prepare n gbits initially in states with Bloch vectors \(\vec a_1, \ldots ,\vec a_n\), evolve the resulting product state via G, and perform a final local n-gbit measurement with Bloch vectors \(\vec b_1, \ldots ,\vec b_n\). The probability that all n outcomes on the n gbits are “yes” is

$$2^{ - n}v\left( {\vec b_1,\vec b_2, \ldots ,\vec b_n} \right)^ \top Gv\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) \in [0,1].$$

Let us consider a group element \(G = e^{\epsilon X}\) with \(X \in {\frak g}\) (the corresponding Lie algebra) and \(\epsilon \in {\Bbb R}\) and expand:

$$v\left( {\vec b_1, \ldots ,\vec b_n} \right)^ \top \left( {{\mathbf{1}} + \epsilon X + \frac{{\epsilon ^2}}{2}X^2 + {\cal O}\left( {\epsilon ^3} \right)} \right)v\left( {\vec a_1, \ldots ,\vec a_n} \right) \in [0,2^n].$$

From now on we restrict ourselves to unit length Bloch vectors, i.e., \(\left| {\vec a_i} \right| = \left| {\vec b_j} \right| = 1\) for all i, j. We obtain

$${\cal C}\left[ {\vec a_1} \right]: = v\left( { - \vec a_1,\vec b_2,...,\vec b_n} \right)^ \top Xv\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) = 0$$

since, for the choice \(\vec b_1 = - \vec a_1\), the zeroth-order term is zero, and probability zero is a local minimum as a function of \(\epsilon\) (see Fig. 2 for further explanation). Thus the second order contribution has to be non-negative:

$$v\left( { - \vec a_1,\vec b_2, \ldots ,\vec b_n} \right)^ \top X^2v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) \ge 0,$$

or more generally with the roles of gbits 1 and k exchanged,

$$v\left( {\vec b_1, \ldots ,\vec b_{k - 1}, - \vec a_k,\vec b_{k + 1}, \ldots ,\vec b_n} \right)^ \top X^2v\left( {\vec a_1, \ldots ,\vec a_n} \right) \ge 0.$$
(1)
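The step from the expansion to these constraints can be spelled out as follows. The shorthand \(p(\epsilon)\) is introduced here only for illustration and does not appear elsewhere in the text. For unit vectors and the measurement choice \(\vec b_1 = - \vec a_1\), define

$$p(\epsilon ): = 2^{ - n}v\left( { - \vec a_1,\vec b_2, \ldots ,\vec b_n} \right)^ \top e^{\epsilon X}v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) = p(0) + \epsilon \,p^\prime (0) + \frac{{\epsilon ^2}}{2}p^{\prime \prime }(0) + {\cal O}\left( {\epsilon ^3} \right).$$

Using \(v( {\vec b} ) \cdot v( {\vec a} ) = 1 + \vec a \cdot \vec b\) on every tensor factor, the three coefficients are

$$p(0) = 2^{ - n}\left( {1 - \left| {\vec a_1} \right|^2} \right)\mathop {\prod}\limits_{j = 2}^n \left( {1 + \vec a_j \cdot \vec b_j} \right) = 0,\quad p^\prime (0) = 2^{ - n}{\cal C}\left[ {\vec a_1} \right],\quad p^{\prime \prime }(0) = 2^{ - n}v\left( { - \vec a_1,\vec b_2, \ldots ,\vec b_n} \right)^ \top X^2v\left( {\vec a_1, \ldots ,\vec a_n} \right).$$

Since \(p(\epsilon ) \ge 0\) must hold for both positive and negative \(\epsilon\), the linear term has to vanish, \(p^\prime (0) = 0\), and the quadratic term has to satisfy \(p^{\prime \prime }(0) \ge 0\).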

Other first and second order constraints are

$$v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right)^ \top Xv\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) = 0,$$
(2)
$$v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right)^ \top X^2v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) \le 0$$
(3)

for analogous reasons as above (since \(\vec b_j = \vec a_j\) for all j yields probability one for \(\epsilon = 0\), which is the global and thus a local maximum). For fixed Bloch vectors \(\vec a_2, \ldots ,\vec a_n,\vec b_2, \ldots ,\vec b_n\), define \(W_\beta ^\alpha\) as

$$\left[ {\vec e_\beta \otimes \left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_2} \end{array}} \right) \otimes \ldots \otimes \left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_n} \end{array}} \right)} \right]^ \top X\left[ {\vec e_\alpha \otimes \left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_2} \end{array}} \right) \otimes \ldots \otimes \left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_n} \end{array}} \right)} \right].$$
(4)

The equation \({\cal C}\left[ {\vec e_i} \right] = 0\) implies \(W_0^0 + W_0^i - W_i^0 - W_i^i = 0\), and \({\cal C}\left[ { - \vec e_i} \right] = 0\) implies \(W_0^0 - W_0^i + W_i^0 - W_i^i = 0\). Thus, \(W_i^i = W_0^0\) and \(W_0^i = W_i^0\) for all i ≥ 1. Since the vectors \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a} \end{array}} \right)\) linearly span all of \({\Bbb R}^{d + 1}\), we get

$$X_{i\;\beta _2\; \ldots \;\beta _n}^{i\;\alpha _2\; \ldots \;\alpha _n} = X_{0\;\beta _2\; \ldots \;\beta _n}^{0\;\alpha _2\; \ldots \;\alpha _n},$$
(5)
$$X_{0\;\beta _2\; \ldots \;\beta _n}^{i\;\alpha _2\; \ldots \;\alpha _n} = X_{i\;\beta _2\; \ldots \;\beta _n}^{0\;\alpha _2\; \ldots \;\alpha _n}$$
(6)

for all i ≥ 1 and all α2, …, αn, β2, …, βn ≥ 0. Similarly, \({\cal C}\left[ {\frac{1}{{\sqrt 2 }}\left( {\vec e_i + \vec e_j} \right)} \right] = 0\) for i ≠ j, i, j ≥ 1 yields

$$W_0^0 + \frac{1}{{\sqrt 2 }}W_0^i + \frac{1}{{\sqrt 2 }}W_0^j - \frac{1}{{\sqrt 2 }}W_i^0 - \frac{1}{2}W_i^i - \frac{1}{2}W_i^j - \frac{1}{{\sqrt 2 }}W_j^0 - \frac{1}{2}W_j^i - \frac{1}{2}W_j^j = 0.$$

Using the results on \(W_i^i\) and \(W_i^0\) further above, this reduces to \(- \frac{1}{2}W_i^j - \frac{1}{2}W_j^i = 0\), and thus

$$X_{j\;\beta _2\; \ldots \;\beta _n}^{i\;\alpha _2\; \ldots \;\alpha _n} = - X_{i\;\beta _2\; \ldots \;\beta _n}^{j\;\alpha _2\; \ldots \;\alpha _n}$$
(7)

for all i, j ≥ 1 with i ≠ j and all α2, …, αn, β2, …, βn ≥ 0. While we have derived (5), (6), and (7) for the first gbit, analogous equations hold for all other gbits with labels 2, …, n.
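As a sanity check of identities (5)–(7), the following minimal Python sketch treats a single gbit (n = 1, so that W coincides with X). It builds a random matrix obeying these index identities and verifies numerically that \({\cal C}[\vec a] = v( - \vec a)^ \top Xv(\vec a)\) vanishes for random unit vectors. The helper names, the choice d = 4, the random seed and the tolerance are illustrative assumptions and not part of the derivation.

```python
import math
import random

def random_unit_vector(d, rng):
    """Return a random unit vector in R^d."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]

def random_constrained_matrix(d, rng):
    """Random (d+1)x(d+1) matrix obeying (5)-(7) for n = 1:
    X[i][i] == X[0][0], X[0][i] == X[i][0], X[i][j] == -X[j][i] (i != j; i, j >= 1)."""
    X = [[0.0] * (d + 1) for _ in range(d + 1)]
    c = rng.uniform(-1.0, 1.0)
    for i in range(d + 1):
        X[i][i] = c
    for i in range(1, d + 1):
        X[0][i] = X[i][0] = rng.uniform(-1.0, 1.0)
        for j in range(i + 1, d + 1):
            X[i][j] = rng.uniform(-1.0, 1.0)
            X[j][i] = -X[i][j]
    return X

def first_order(X, a):
    """C[a] = v(-a)^T X v(a), where v(a) = (1, a)."""
    bra = [1.0] + [-c for c in a]
    ket = [1.0] + list(a)
    size = len(ket)
    return sum(bra[r] * X[r][c] * ket[c] for r in range(size) for c in range(size))

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
d = 4
X = random_constrained_matrix(d, rng)
max_violation = max(abs(first_order(X, random_unit_vector(d, rng))) for _ in range(100))
print(max_violation < 1e-12)
```

The check only confirms that matrices of this form are compatible with the first-order constraint; the second-order constraints restrict the allowed generators further.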

Fig. 2

We are using configurations like this one to derive constraints on the generators \(X \in {\frak g}\). In the special case ε = 0, the transformation exp(εX) reduces to the identity. Hence, if we prepare the first wire in the (pure) state with Bloch vector \(\vec a_1\), and perform a final measurement of that wire with Bloch vector \(- \vec a_1\), the corresponding outcome will have probability zero, regardless of which local measurements we choose for the other wires. But probability zero is a local minimum, which implies that the derivative of this probability with respect to ε must be zero (yielding \({\cal C}[\vec a_1] = 0\)), and the second derivative must be non-negative (yielding constraint (1) in the case k = 1).

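Let us denote by \({\cal A}\) the antisymmetric (d + 1) × (d + 1)-matrices of the form

$$A_{\bar A}: = \left( {\begin{array}{*{20}{c}} 0 & {\vec 0^ \top } \\ {\vec 0} & {\bar A} \end{array}} \right),\quad \bar A \in {\Bbb R}^{d \times d},\;\bar A^ \top = - \bar A,$$

and by \({\cal B}\) the symmetric (d + 1) × (d + 1)-matrices of the form

$$B_{\vec b}: = \left( {\begin{array}{*{20}{c}} 0 & {\vec b^ \top } \\ {\vec b} & 0 \end{array}} \right),\quad \vec b \in {\Bbb R}^d.$$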

Furthermore, let \({\cal I}: = {\Bbb R} \cdot {\mathbf{1}}\), i.e., all multiples of the (d + 1) × (d + 1) identity matrix. The sets \({\cal A}\), \({\cal B}\) and \({\cal I}\) are real linear matrix subspaces. Note that these three spaces are pairwise orthogonal with respect to the Hilbert–Schmidt inner product 〈X, Y〉 := tr(XY). The matrix W defined in (4) must then be an element of \({\cal A} \oplus {\cal B} \oplus {\cal I}\) due to the identities for its components that we have derived above. More generally, since the same identities hold for every index i ∈ {1, …, n} of the tensor X, we obtain \(X \in ({\cal A} \oplus {\cal B} \oplus {\cal I})^{ \otimes n}\). Since \(X \in {\frak g}\) was arbitrary, this tells us that

$${\frak g} \subset \left( {{\cal A} \oplus {\cal B} \oplus {\cal I}} \right)^{ \otimes n}.$$

The Lie algebra of the local transformations is

$${\frak g}_{{\mathrm{loc}}} = {\cal A} \otimes {\mathbf{1}} \otimes \ldots \otimes {\mathbf{1}} + {\mathbf{1}} \otimes {\cal A} \otimes {\mathbf{1}} \otimes \ldots \otimes {\mathbf{1}} + \ldots + {\mathbf{1}} \otimes {\mathbf{1}} \otimes \ldots \otimes {\mathbf{1}} \otimes {\cal A},$$

writing “+” instead of “⊕” for readability. We can write the space \(({\cal A} \oplus {\cal B} \oplus {\cal I})^{ \otimes n}\) in a somewhat different form. To this end, consider strings of symbols \(x \in \{ A,B,I\} ^n\), for example, x = ABAI (if n = 4), and denote the corresponding tensor product matrix spaces by \(S_x\); for this example, \(S_x = {\cal A} \otimes {\cal B} \otimes {\cal A} \otimes {\cal I}\). Then \(S_x \bot S_y\) for x ≠ y (with respect to the Hilbert–Schmidt inner product), and

$$({\cal A} \oplus {\cal B} \oplus {\cal I})^{ \otimes n} = \mathop { \oplus }\limits_{x \in \{ A,B,I\} ^n} S_x.$$

Now let \(X \in {\frak g}{\mathrm{\backslash }}{\frak g}_{{\mathrm{loc}}}\) be an arbitrary generator which is not in the local Lie algebra (here we explicitly make the assumption that such an X exists). Since X ≠ 0, there must exist x such that Φx(X) ≠ 0 for the orthogonal projection Φx onto Sx, and since \(X\not \in {\frak g}_{{\mathrm{loc}}}\), at least one of those x must satisfy

$$x\not \in \{ AI \ldots I,IAI \ldots I, \ldots ,I \ldots IA\} .$$

Reordering the gbits, we may assume that \(x = A^{n_A}B^{n_B}I^{n_I}\), where nA + nB + nI = n and one of the following three cases applies:

  (i) \(n_A = 0\),

  (ii) \(n_A = 1\) and \(n_B \ge 1\),

  (iii) \(n_A \ge 2\).
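The case distinction above can be checked by brute force: every string over {A, B, I} either belongs to the local set {AI…I, IAI…I, …, I…IA} or falls into exactly one of the cases (i)–(iii). The short Python sketch below enumerates all strings for n = 4; the function name `classify` and the choice n = 4 are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def classify(x):
    """Return the case label of a string x over {A, B, I}, or 'local'
    if x contains exactly one A and otherwise only I."""
    n_A, n_B = x.count("A"), x.count("B")
    if n_A == 0:
        return "i"
    if n_A == 1:
        return "ii" if n_B >= 1 else "local"
    return "iii"

n = 4
labels = {"".join(x): classify(x) for x in product("ABI", repeat=n)}
local = sorted(x for x, label in labels.items() if label == "local")
counts = Counter(labels.values())

print(local)  # ['AIII', 'IAII', 'IIAI', 'IIIA']
print(counts["i"], counts["ii"], counts["iii"], counts["local"])  # 16 28 33 4
```

The four counts add up to 3^4 = 81, so no string is left unclassified.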

Since Sx has an orthonormal basis of matrices of the form \(A_{\bar A_1} \otimes \ldots \otimes A_{\bar A_{n_A}} \otimes B_1 \otimes \ldots \otimes B_{n_B} \otimes {\mathbf{1}}^{ \otimes n_I}\), where all \(A_{\bar A_i} \in {\cal A}\) and \(B_i \in {\cal B}\), there must exist some matrix \(\tilde M_x\) of that form (i.e., \(\tilde M_x \in S_x\)) such that \(\left\langle {X,\tilde M_x} \right\rangle \ne 0\). By moving constant scalar factors into the A-terms, we may assume that there are unit vectors \(\vec b_i\) such that \(B_i = B_{\vec b_i}\) for i = 1, …, nB. But since \(\hat RB_{\vec b}\hat R^ \top = B_{R\vec b}\) for all R ∈ SO(d), there are orthogonal matrices \(\hat R_i\) such that \(R_i\vec b_i = \vec e_1 = \left( {1,0, \ldots ,0} \right)^ \top\) for all i, and the local transformation \(T: = {\mathbf{1}}^{ \otimes n_A} \otimes \hat R_1 \otimes \ldots \otimes \hat R_{n_B} \otimes {\mathbf{1}}^{ \otimes n_I}\) satisfies

$$M_x^\prime : = T\tilde M_xT^{ - 1} = T\tilde M_xT^{\rm T} = A_{\bar A_1} \otimes \ldots \otimes A_{\bar A_{n_A}} \otimes B^{ \otimes n_B} \otimes {\mathbf{1}}^{ \otimes n_I},$$

where \(B: = B_{\vec e_1}\). Set \(X^\prime : = TXT^{ - 1}\); then, since \(T \in {\cal G}_{{\mathrm{loc}}} \subset {\cal G}\), and since the adjoint action of \({\cal G}_{{\mathrm{loc}}}\) preserves \({\frak g}_{{\mathrm{loc}}}\), we have \(X^\prime \in {\frak g}{\mathrm{\backslash }}{\frak g}_{{\mathrm{loc}}}\), and \(\left\langle {X{^\prime},M_x^\prime } \right\rangle = {\mathrm{tr}}\left( {TXT^{ - 1}T\tilde M_xT^{ - 1}} \right) = \left\langle {X,\tilde M_x} \right\rangle \ne 0\). A similar argument allows us to bring the \(A_{\bar A_i}\) into a standard form. Since the d × d-matrices \(\bar A_i\) are antisymmetric, one can infer from the results in refs. 17,27,28 that there are orthogonal transformations \(R_i \in {\mathrm{SO}}(d)\) such that

$$\begin{array}{c}R_i\bar A_iR_i^ \top = \left( {\begin{array}{*{20}{c}} 0 & {\lambda _1^{(i)}} & {} & {} & {} & {} & {} \\ { - \lambda _1^{(i)}} & 0 & {} & {} & {} & {} & {} \\ {} & {} & 0 & {\lambda _2^{(i)}} & {} & {} & {} \\ {} & {} & { - \lambda _2^{(i)}} & 0 & {} & {} & {} \\ {} & {} & {} & {} & \ddots & {} & {} \\ {} & {} & {} & {} & {} & 0 & {\lambda _{d/2}^{(i)}} \\ {} & {} & {} & {} & {} & { - \lambda _{d/2}^{(i)}} & 0 \end{array}} \right)(d\,{\mathrm{even}}),\\ \left( {\begin{array}{*{20}{c}} 0 & {} & {} & {} & {} & {} & {} & {} \\ {} & 0 & {\lambda _1^{(i)}} & {} & {} & {} & {} & {} \\ {} & { - \lambda _1^{(i)}} & 0 & {} & {} & {} & {} & {} \\ {} & {} & {} & 0 & {\lambda _2^{(i)}} & {} & {} & {} \\ {} & {} & {} & { - \lambda _2^{(i)}} & 0 & {} & {} & {} \\ {} & {} & {} & {} & {} & \ddots & {} & {} \\ {} & {} & {} & {} & {} & {} & 0 & {\lambda _{\frac{{d - 1}}{2}}^{(i)}} \\ {} & {} & {} & {} & {} & {} & { - \lambda _{\frac{{d - 1}}{2}}^{(i)}} & 0 \end{array}} \right)(d\,{\mathrm{odd}}).\end{array}$$

To save space, we will use the following notation in the remainder of the paper, where \(\sigma = \left( {\begin{array}{*{20}{c}} 0 & 1 \\ { - 1} & 0 \end{array}} \right)\):

$$R_i\bar A_iR_i^ \top = \left\{ {\begin{array}{*{20}{c}} {\lambda _1^{(i)}\sigma \oplus \lambda _2^{(i)}\sigma \oplus \ldots \oplus \lambda _{d/2}^{(i)}\sigma } & {(d\,{\mathrm{even}}),} \\ {0_{1 \times 1} \oplus \lambda _1^{(i)}\sigma \oplus \lambda _2^{(i)}\sigma \oplus \ldots \oplus \lambda _{\frac{{d - 1}}{2}}^{(i)}\sigma } & {(d\,{\mathrm{odd}}).} \end{array}} \right.$$

Now consider the corresponding (d + 1) × (d + 1)-matrices \(A_{R_i\bar A_iR_i^ \top }\), for which we will introduce the following notation. By \(A_j\) we denote the matrix for which only the jth block is non-zero, with \(\lambda _j = 1\). That is, for even d, we have the (d + 1) × (d + 1)-matrices

$$\begin{array}{c}A_1 = 0_{1 \times 1} \oplus \sigma \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2},\\ A_2 = 0_{1 \times 1} \oplus 0_{2 \times 2} \oplus \sigma \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2},\\ \vdots \\ A_{d/2} = 0_{1 \times 1} \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2} \oplus \sigma ,\end{array}$$

and for odd d, we have an extra initial zero, namely

$$\begin{array}{c}A_1 = 0_{2 \times 2} \oplus \sigma \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2},\\ A_2 = 0_{2 \times 2} \oplus 0_{2 \times 2} \oplus \sigma \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2},\\ \vdots \\ A_{(d - 1)/2} = 0_{2 \times 2} \oplus 0_{2 \times 2} \oplus \ldots \oplus 0_{2 \times 2} \oplus \sigma .\end{array}$$

The local transformation \(\tilde T: = \hat R_1 \otimes \ldots \otimes \hat R_{n_A} \otimes {\mathbf{1}}^{ \otimes n_B} \otimes {\mathbf{1}}^{ \otimes n_I}\) satisfies

$$\begin{array}{r}M_x: = \tilde TM_x^\prime \tilde T^{ - 1} = \tilde TM_x^\prime \tilde T^ \top \\ = \left( {\mathop {\sum}\limits_j {\kern 1pt} \lambda _j^{(1)}A_j} \right) \otimes \ldots \otimes \left( {\mathop {\sum}\limits_j {\kern 1pt} \lambda _j^{(n_A)}A_j} \right) \otimes B^{ \otimes n_B} \otimes {\mathbf{1}}^{ \otimes n_I},\end{array}$$
(8)

where the \(\lambda _j^{(i)}\) are real numbers. Set \(X^{\prime\prime} : = \tilde TX{^\prime}\tilde T^{ - 1}\), then since \(\tilde T \in {\cal G}_{{\mathrm{loc}}} \subset {\cal G}\), we have \(X^{\prime\prime} \in {\frak g}{\mathrm{\backslash }}{\frak g}_{{\mathrm{loc}}}\), and \(\left\langle {X^{\prime\prime} ,M_x} \right\rangle = {\mathrm{tr}}\left( {\tilde TX^\prime \tilde T^{ - 1}\tilde TM_x^\prime \tilde T^{ - 1}} \right) = \left\langle {X^\prime ,M_x^\prime } \right\rangle \ne 0\).

In summary, we have shown that if there exist any nonlocal generators at all, then there is one (denoted X″) that has non-zero overlap with a matrix \(M_x \in S_x\) of the simple form (8).

Next we will show that this implies that \({\frak g} = {\frak g}_{{\mathrm{loc}}}\) for all Bloch ball dimensions d ≥ 4.

Proof of Theorem 1 for d ≥ 4

We now use Schur’s Lemma to construct orthogonal projectors (with respect to the Hilbert–Schmidt inner product) onto the subspaces of \({\cal A} \oplus {\cal B} \oplus {\cal I}\). First, define

$${\mathrm{\Phi }}_I[M]: = {\int}_{{\mathrm{SO}}(d)} {\kern 1pt} \hat RM\hat R^{ - 1}dR\quad (M \in {\cal A} \oplus {\cal B} \oplus {\cal I}),$$

then ΦI[M] = 0 for all \(M \in {\cal A} \oplus {\cal B}\) and ΦI[M] = M for all \(M \in {\cal I}\). Since these subspaces are orthogonal with respect to the Hilbert–Schmidt inner product, ΦI is the orthogonal projector onto the subspace \({\cal I}\) of \({\cal A} \oplus {\cal B} \oplus {\cal I}\) (we are not interested in its action on matrices that are not in the space \({\cal A} \oplus {\cal B} \oplus {\cal I}\)).

For j = 1, …, d, consider the stabilizer subgroup

$${\cal G}_j: = \{ R \in {\mathrm{SO}}(d)|R\vec e_j = \vec e_j\} ,$$

where \(\vec e_j\) denotes the jth standard unit vector in \({\Bbb R}^d\). Every \({\cal G}_j\) is isomorphic to SO(d − 1) whose fundamental representation is irreducible (note that this is not true for d = 3; this causes the crucial difference to ref. 26). Set

$${\mathrm{\Phi }}_{\vec e_j}[M]: = {\int}_{{\cal G}_j} {\kern 1pt} \hat RM\hat R^{ - 1}{\kern 1pt} dR\quad \left( {M \in {\cal A} \oplus {\cal B} \oplus {\cal I}} \right),$$

then \({\mathrm{\Phi }}_{\vec e_1}[M] = {\int}_{{\mathrm{SO}}(d - 1)} {\left( {\begin{array}{*{20}{c}} {{\mathbf{1}}_2} & {} \\ {} & S \end{array}} \right)M\left( {\begin{array}{*{20}{c}} {{\mathbf{1}}_2} & {} \\ {} & {S^{ - 1}} \end{array}} \right){\kern 1pt} dS}\), and, similarly as above, Schur’s Lemma implies that \({\mathrm{\Phi }}_{\vec e_1}\) is the orthogonal projector onto \({\mathrm{span}}(B) \oplus {\cal I}\). Hence \({\mathrm{\Phi }}_B: = {\mathrm{\Phi }}_{\vec e_1} - {\mathrm{\Phi }}_I\) is the orthogonal projector onto span(B).

Finally, we will construct the orthogonal projector onto \({\cal A}_{{\mathrm{blocks}}}: = {\mathrm{span}}\{ A_1, \ldots ,A_z\}\), where z = d/2 if d is even and z = (d − 1)/2 if d is odd. To this end, define the SO(2)-matrix \(R(\theta ): = \left( {\begin{array}{*{20}{c}} {\cos {\kern 1pt} \theta } & {\sin {\kern 1pt} \theta } \\ { - \sin {\kern 1pt} \theta } & {\cos {\kern 1pt} \theta } \end{array}} \right)\), and set

$$\hat R\left( {\theta _1,\theta _2, \ldots ,\theta _z} \right): = \left( {\begin{array}{*{20}{c}} {{\mathbf{1}}_y} & {} & {} & {} \\ {} & {R(\theta _1)} & {} & {} \\ {} & {} & \ddots & {} \\ {} & {} & {} & {R(\theta _z)} \end{array}} \right),$$

where y = 1 if d is even and y = 2 if d is odd. Furthermore, define Φ′[M] as

$${\int}_0^{2\pi } \frac{{d\theta _1}}{{2\pi }}{\int}_0^{2\pi } \frac{{d\theta _2}}{{2\pi }} \ldots {\int}_0^{2\pi } \frac{{d\theta _z}}{{2\pi }}\hat R\left( {\theta _1, \ldots ,\theta _z} \right)M\hat R\left( {\theta _1, \ldots ,\theta _z} \right)^{ - 1}.$$

Using the identities

$$\begin{array}{*{20}{l}} {{\int}_0^{2\pi } R(\theta ){\kern 1pt} \frac{{d\theta }}{{2\pi }}} \hfill & = \hfill & {0,} \hfill \\ {{\int}_0^{2\pi } R(\theta )\left( {\begin{array}{*{20}{c}} {m_{11}} & {m_{12}} \\ {m_{21}} & {m_{22}} \end{array}} \right)R( - \theta )\frac{{d\theta }}{{2\pi }}} \hfill & = \hfill & {\frac{1}{2}\left( {\begin{array}{*{20}{c}} {m_{11} + m_{22}} & {m_{12} - m_{21}} \\ { - m_{12} + m_{21}} & {m_{11} + m_{22}} \end{array}} \right)} \hfill \\ {} \hfill & { = :} \hfill & {{\mathrm{\Psi }}\left[ {\left( {\begin{array}{*{20}{c}} {m_{11}} & {m_{12}} \\ {m_{21}} & {m_{22}} \end{array}} \right)} \right],} \hfill \end{array}$$

we can evaluate the action of Φ′ as follows. First, any given (d + 1) × (d + 1)-matrix M can be written in the block matrix form

$$M = \left( {\begin{array}{*{20}{c}} {M_{0,0}} & {} & {M_{0,z}} \\ \vdots & \ddots & \vdots \\ {M_{z,0}} & {} & {M_{z,z}} \end{array}} \right)$$

where M0,0 is a y × y-matrix, all Mi,j for i, j ≥ 1 are 2 × 2-matrices, and the other matrices are y × 2 and 2 × y-matrices. Then, the action of Φ′ becomes

$${\mathrm{\Phi }}^\prime [M] = \left( {\begin{array}{*{20}{c}} {M_{0,0}} & 0 & {} & 0 \\ 0 & {{\mathrm{\Psi }}[M_{1,1}]} & {} & \vdots \\ \vdots & {} & \ddots & 0 \\ 0 & {} & 0 & {{\mathrm{\Psi }}[M_{z,z}]} \end{array}} \right).$$

Hence Φ′ is an orthogonal projection that acts as the identity on \({\cal I}\) (i.e., Φ′(1) = 1), and it projects \({\cal A}\) into its subspace \({\cal A}_{{\mathrm{blocks}}}\). Furthermore, if d is even, then Φ′ annihilates \({\cal B}\), and if d is odd, then Φ′ projects \({\cal B}\) into its subspace span(B). Thus, for d even, the orthogonal projector onto \({\cal A}_{{\mathrm{blocks}}}\) is ΦA := Φ′ − ΦI, and for d odd, it is ΦA := Φ′ − ΦI − ΦB. Note that all these statements are only claimed to hold for the case that the maps are applied to operators in \({\cal A} \oplus {\cal B} \oplus {\cal I}\).
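The 2 × 2 averaging identity that defines Ψ can be verified numerically. The sketch below replaces the integral over θ by an average over equally spaced angles, which is exact here because the integrand is a trigonometric polynomial of degree two and more than two sample points are used. The function names, the test matrix and the number of steps are illustrative assumptions.

```python
import math

def rot(theta):
    """SO(2) matrix R(theta) in the convention used in the text."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def twirl(M, steps=8):
    """Average R(theta) M R(-theta) over `steps` equally spaced angles."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        T = matmul2(matmul2(rot(theta), M), rot(-theta))
        for r in range(2):
            for c in range(2):
                acc[r][c] += T[r][c] / steps
    return acc

def psi(M):
    """Closed form Psi[M] as stated in the text."""
    (m11, m12), (m21, m22) = M
    return [[(m11 + m22) / 2, (m12 - m21) / 2],
            [(-m12 + m21) / 2, (m11 + m22) / 2]]

M = [[1.0, 2.0], [3.0, 4.0]]
averaged, closed_form = twirl(M), psi(M)
error = max(abs(averaged[r][c] - closed_form[r][c]) for r in range(2) for c in range(2))
print(error < 1e-12)
```

The output of `psi` is always a combination of the 2 × 2 identity and σ, which is why Φ′ maps the diagonal 2 × 2 blocks into the span of these two matrices.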

The projectors ΦI, ΦB and ΦA map the Lie algebra \({\frak g}\) into itself, if we apply different products of those projectors to the n sites. For example, consider the special case n = 1. Then \(Z \in {\frak g}\) implies \(\Phi _I[Z] \in {\frak g}\) since \({\frak g}\) is closed with respect to conjugations by elements of \({\cal G}\) and integrals. Similarly, \({\mathrm{\Phi }}_{\vec e_1}[Z] \in {\frak g}\), and since \({\frak g}\) is a linear space, we also have \({\mathrm{\Phi }}_B[Z] = {\mathrm{\Phi }}_{\vec e_1}[Z] - {\mathrm{\Phi }}_I[Z] \in {\frak g}\), and similarly for the projector ΦA. If n ≥ 2, then we can successively apply the projectors to one of the sites, using the fact that tensoring local rotations with identities gives local transformations in \({\cal G}_{{\mathrm{loc}}}\). Thus, if we define

$${\mathrm{\Phi }}: = {\mathrm{\Phi }}_A^{ \otimes n_A} \otimes {\mathrm{\Phi }}_B^{ \otimes n_B} \otimes {\mathrm{\Phi }}_I^{ \otimes n_I},$$

then Y := Φ[X″] is another valid generator, \(Y \in {\frak g}\). Furthermore, Φ[Mx] = Mx, hence

$$0 \ne \left\langle {X^{\prime\prime} ,M_x} \right\rangle = \left\langle {X^{\prime\prime} ,{\mathrm{\Phi }}[M_x]} \right\rangle = \left\langle {{\mathrm{\Phi }}[X^{\prime\prime} ],M_x} \right\rangle = \left\langle {Y,M_x} \right\rangle$$
(9)

and thus Y ≠ 0 (we have used that Φ is an orthogonal projection and thus in particular self-adjoint with respect to the Hilbert–Schmidt inner product). In particular, \(Y \in {\mathrm{Im}}({\mathrm{\Phi }}) = {\cal A}_{{\mathrm{blocks}}}^{ \otimes n_A} \otimes {\mathrm{span}}(B)^{ \otimes n_B} \otimes {\cal I}^{ \otimes n_I}\). Consequently, there are real numbers \(\lambda _{j_1, \ldots ,j_{n_A}}\) such that

$$Y = \mathop {\sum}\limits_{j_1, \ldots ,j_{n_A} = 1}^z \lambda _{j_1, \ldots ,j_{n_A}}A_{j_1} \otimes \ldots \otimes A_{j_{n_A}} \otimes B^{ \otimes n_B} \otimes {\mathbf{1}}^{ \otimes n_I}.$$

Now we apply the identities \(A_jA_k = - \delta _{jk}P_j\) and \(B^2 = P_B\), where

$$\begin{array}{l}P_B = {\mathbf{1}}_{2 \times 2} \oplus 0_{(d - 1) \times (d - 1)},\\ P_1 = 0_{y \times y} \oplus {\mathbf{1}}_{2 \times 2} \oplus 0_{2(z - 1) \times 2(z - 1)},\\ P_2 = 0_{y \times y} \oplus 0_{2 \times 2} \oplus {\mathbf{1}}_{2 \times 2} \oplus 0_{2(z - 2) \times 2(z - 2)}\end{array}$$

and so on, up to Pz. This gives us

$$Y^2 = ( - 1)^{n_A}\mathop {\sum}\limits_{j_1, \ldots ,j_{n_A}} \lambda _{j_1, \ldots ,j_{n_A}}^2P_{j_1} \otimes \ldots \otimes P_{j_{n_A}} \otimes P_B^{ \otimes n_B} \otimes {\mathbf{1}}^{ \otimes n_I}.$$
(10)
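The two matrix identities used to arrive at (10) can be checked directly for small dimensions. The following Python sketch builds the block matrices \(A_j\) and the projectors \(P_j\), \(P_B\) for even and odd d and tests \(A_jA_k = - \delta _{jk}P_j\) and \(B^2 = P_B\). It assumes that \(B = B_{\vec e_1}\) is the symmetric matrix with ones at positions (0, 1) and (1, 0) and zeros elsewhere, which is the form consistent with \(B^2 = P_B\); the function names are likewise illustrative.

```python
def zeros(n):
    """n x n zero matrix as a nested list."""
    return [[0] * n for _ in range(n)]

def matmul(A, B):
    """Product of two square integer matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def block_data(d):
    """Return ([A_1..A_z], [P_1..P_z], B, P_B) as (d+1)x(d+1) matrices."""
    y = 1 if d % 2 == 0 else 2      # size of the leading zero block
    z = (d + 1 - y) // 2            # number of 2x2 sigma blocks
    A_list, P_list = [], []
    for j in range(z):
        r = y + 2 * j               # first row of the j-th 2x2 block
        A, P = zeros(d + 1), zeros(d + 1)
        A[r][r + 1], A[r + 1][r] = 1, -1
        P[r][r] = P[r + 1][r + 1] = 1
        A_list.append(A)
        P_list.append(P)
    B, P_B = zeros(d + 1), zeros(d + 1)
    B[0][1] = B[1][0] = 1           # assumed form of B = B_{e_1}
    P_B[0][0] = P_B[1][1] = 1
    return A_list, P_list, B, P_B

def check(d):
    """True if A_j A_k = -delta_jk P_j and B^2 = P_B hold for dimension d."""
    A_list, P_list, B, P_B = block_data(d)
    ok = matmul(B, B) == P_B
    for j, A_j in enumerate(A_list):
        for k, A_k in enumerate(A_list):
            expected = ([[-v for v in row] for row in P_list[j]]
                        if j == k else zeros(d + 1))
            ok = ok and matmul(A_j, A_k) == expected
    return ok

print(all(check(d) for d in (4, 5, 6, 7)))  # True
```

Because products of different blocks vanish, all cross terms drop out when Y is squared, leaving only the squared coefficients that appear in (10).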

Suppose that nA is even so that \(( - 1)^{n_A} = 1\). We will now show that constraint (3) gets violated. To this end, fix some \(j_1^0, \ldots ,j_{n_A}^0\) such that \(\lambda _{j_1^0, \ldots ,j_{n_A}^0} \ne 0\). For i = 1, …, nA, choose some unit vector \(\vec a_i \in {\Bbb R}^d\) such that \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right)^ \top P_{j_i^0}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) > 0\); for all other ji, we automatically get \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right)^ \top P_{j_i}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) \ge 0\). For i = nA + 1, …, nA + nB, set \(\vec a_i: = \vec e_1\), then \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right)^ \top P_B\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) = 2\). Finally, for i ≥ nA + nB + 1, choose \(\vec a_i\) arbitrarily such that \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right)^ \top {\mathbf{1}}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) = 2\). Altogether, we obtain

$$v\left( {\vec a_1, \ldots ,\vec a_n} \right)^ \top Y^2{\kern 1pt} v\left( {\vec a_1, \ldots ,\vec a_n} \right) > 0$$

which violates constraint (3). Thus nA must be odd, and \(( - 1)^{n_A} = - 1\).
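A concrete instance of this violation can be computed explicitly. The sketch below takes the hypothetical single-term generator \(Y = A_1 \otimes A_2\) for d = 4 and n = 2 (so nA = 2 and nB = nI = 0), together with the unit Bloch vectors \(\vec a_1 = \vec e_1\) and \(\vec a_2 = \vec e_3\), and evaluates \(v^ \top Y^2v\) using explicit Kronecker products. The specific generator, vectors and helper names are illustrative assumptions; a general Y is a sum of such terms.

```python
def kron_mat(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(u, w):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in w]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def block_generator(d, j):
    """A_j for even d: sigma in the j-th 2x2 block after the leading 1x1 zero."""
    A = [[0] * (d + 1) for _ in range(d + 1)]
    r = 1 + 2 * (j - 1)
    A[r][r + 1], A[r + 1][r] = 1, -1
    return A

d = 4
Y = kron_mat(block_generator(d, 1), block_generator(d, 2))

# v(a_1, a_2) = (1, a_1) (x) (1, a_2) with a_1 = e_1 and a_2 = e_3
v = kron_vec([1, 1, 0, 0, 0], [1, 0, 0, 1, 0])

value = sum(x * y for x, y in zip(v, matvec(Y, matvec(Y, v))))  # v^T Y^2 v
print(value)  # 1
```

The result is strictly positive, whereas constraint (3) demands a non-positive value, so this Y cannot be a valid generator.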

Recall constraint (1) in the special case k = 2:

$$v\left( {\vec b_1, - \vec a_2,\vec b_3, \ldots ,\vec b_n} \right)^ \top Y^2v\left( {\vec a_1,\vec a_2, \ldots ,\vec a_n} \right) \ge 0$$
(11)

for all unit vectors \(\vec a_i,\vec b_j \in {\Bbb R}^d\). For all \({i \in[n_{A}\,+\,n_{B}\,+\,1, n]{\mathrm{\backslash }}\{2\}}\), choose \(\vec a_i,\vec b_i\) such that \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_i} \end{array}} \right)^ \top {\mathbf{1}}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) > 0\) (simply avoid the choice \(\vec a_i = - \vec b_i\)). Similarly, for all \({i \in[n_{A}\,+\,1,n_{A}\,+\,n_{B}]{\mathrm{\backslash }}\{2\}}\), choose \(\vec a_i,\vec b_i\) such that \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_i} \end{array}} \right)^ \top P_B\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right) > 0\). We will now distinguish two cases for nA.

First, consider the case nA = 1. Since our original generator X was chosen nonlocal, it follows that nB ≥ 1, as explained in the subsection “Generator normal form for all dimensions d ≥ 2”. Thus, the second tensor factor in (10) must be PB. We will now choose \(\vec a_2 = \vec e_2\) which implies that \(\left( {\begin{array}{*{20}{c}} 1 \\ { - \vec a_2} \end{array}} \right)^ \top P_B\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_2} \end{array}} \right) = 1\). But then we may still choose \(\vec b_1,\vec a_1\) arbitrarily, and by choosing these two unit vectors suitably from the subspace \({\mathrm{Im}}\left( {P_{j_1}} \right)\), we may generate an arbitrary sign for \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_1} \end{array}} \right)^ \top P_{j_1}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_1} \end{array}} \right)\). Thus, we can break constraint (11) by a suitable choice of these two unit vectors, which yields a contradiction.
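The nA = 1 argument can be illustrated with the hypothetical single-term generator \(Y = A_1 \otimes B\) for d = 4 and n = 2. The Python sketch below evaluates the left-hand side of (11) for \(\vec a_2 = \vec e_2\) and \(\vec b_1 = \vec a_1 = \vec e_1\), using explicit Kronecker products. As before, it assumes that B has ones at positions (0, 1) and (1, 0) and zeros elsewhere; the chosen vectors and helper names are illustrative.

```python
def kron_mat(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def kron_vec(u, w):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in w]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

size = 5  # d + 1 with d = 4

A1 = [[0] * size for _ in range(size)]
A1[1][2], A1[2][1] = 1, -1          # sigma in the first 2x2 block

B = [[0] * size for _ in range(size)]
B[0][1] = B[1][0] = 1               # assumed form of B = B_{e_1}

Y = kron_mat(A1, B)                 # n_A = 1, n_B = 1

# ket = v(a_1, a_2) and bra = v(b_1, -a_2) with a_1 = b_1 = e_1, a_2 = e_2
ket = kron_vec([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
bra = kron_vec([1, 1, 0, 0, 0], [1, 0, -1, 0, 0])

value = sum(x * y for x, y in zip(bra, matvec(Y, matvec(Y, ket))))
print(value)  # -1
```

The negative value contradicts the non-negativity required by (11), in line with the argument in the text.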

Second, suppose that nA ≥ 3 (we already know that nA must be odd). Then we can choose \(\vec a_2\) such that \(\left( {\begin{array}{*{20}{c}} 1 \\ { - \vec a_2} \end{array}} \right)^ \top P_{j_2}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_2} \end{array}} \right) = - 1\). We have even more freedom than in the previous case: for all \({i \in[1,n_{A}]{\mathrm{\backslash }}\{2\}}\), we can choose \(\vec b_i,\vec a_i\) from the subspace \({\mathrm{Im}}\left( {P_{j_i}} \right)\) such that we get an arbitrary sign for every \(\left( {\begin{array}{*{20}{c}} 1 \\ {\vec b_i} \end{array}} \right)^ \top P_{j_i}\left( {\begin{array}{*{20}{c}} 1 \\ {\vec a_i} \end{array}} \right)\). This also leads to a violation of constraint (11), and we obtain a contradiction as well.

This means that our initial assumption must have been wrong—namely, that there exists a generator in \({\frak g}{\mathrm{\backslash }}{\frak g}_{{\mathrm{loc}}}\). We conclude that instead this set must be empty, hence \({\frak g} = {\frak g}_{{\mathrm{loc}}}\). But since \({\cal G}\) is compact and connected, it follows from ref. 68 [Theorem VII.2.2 (v)] that \({\cal G}\) cannot be larger than \({\cal G}_{{\mathrm{loc}}}\). This proves our main result, Theorem 1, for Bloch ball dimensions d ≥ 4. The proof for d = 2 is given in the Supplementary Material.