Recovering the quantum formalism from physically realist axioms

We present a heuristic derivation of Born’s rule and unitary transforms in Quantum Mechanics, from a simple set of axioms built upon a physical phenomenology of quantization. This approach naturally leads to the usual quantum formalism, within a new realistic conceptual framework that is discussed in detail. Physically, the structure of Quantum Mechanics appears as a result of the interplay between the quantized number of “modalities” accessible to a quantum system, and the continuum of “contexts” that are required to define these modalities. Mathematically, the Hilbert space structure appears as a consequence of a specific “extra-contextuality” of modalities, closely related to the hypotheses of Gleason’s theorem, and consistent with its conclusions.

A series of recent experimental tests of Bell's theorem [1][2][3] have been said to close the door on the quantum debate between Einstein and Bohr [4]. It is generally considered that Einstein lost the case, by advocating a notion of "local realism" incompatible with quantum mechanics (QM) [5]. However, Bohr [6] also presented himself as a realist as far as physics is concerned, and QM has no direct conflict with relativistic causality. One may thus wonder whether a deep (but philosophically sound) redefinition of physical reality might provide a way to reconcile the founding fathers of quantum physics. In refs 7, 8, 9 and 10 we argued that this can be done, under the condition that fully predictable physical properties (called "elements of physical reality" in ref. 5) are attached not to a system alone, but to a system within a given experimental context [6].
In this paper, we further exploit this idea, in order to present a heuristic derivation of the quantum formalism, understood as a non-classical way to calculate probabilities. An outstanding feature of our approach is that the superposition principle and Born's rule appear as consequences of the quantized number of states accessible to a quantum system, without any appeal to "wave functions", although we do recover projective measurements. Our approach bears some relationship with Gleason's theorem [11][12][13], as will be discussed below (see also Methods).
We shall start without formalism, but from a few definitions and hypotheses, presented here as axioms. These axioms are based on standard quantum phenomenology, and they have been introduced and discussed in refs 7 and 8 under the acronym "CSM", meaning Context, System, Modality. In the present article we will not repeat this discussion, but rather use the following axioms to summarize the main features of our approach. Though the formulation of the axioms contains very little mathematics, they have deep mathematical consequences that will be spelled out in the "Results" section below.
• Axiom 1 (modalities): (i) Given a physical system, a modality is defined as the values of a complete set of physical quantities that can be predicted with certainty and measured repeatedly on this system. (ii) Here a "complete set" means the largest possible set compatible with certainty and repeatability, for all possible modalities attached to this set. This complete set of physical quantities is called a context, and the modality is attributed jointly to the system and the context. (iii) Modalities cannot show up independently of a context, but the same modality may appear in different contexts, with the same conditions of repeatability and certainty.
• Axiom 2 (quantization): (i) For a given context, there exist N distinguishable modalities $\{u_i\}$, that are mutually exclusive: if one modality is true, or realized, the others are wrong, or not realized. (ii) The value of N, called the dimension, is a characteristic property of a given quantum system, and is the same in all relevant contexts.
• Axiom 3 (changing contexts): Given axioms 1 and 2, the different contexts relative to a given quantum system are related between themselves by continuous transformations g which are associative, have a neutral element (no change), and an inverse. Therefore the set of context transformations g has the structure of a continuous group $\mathcal{G}$.
For the sake of clarity, we note that, within the usual QM formalism (not used so far), a modality and a context correspond respectively to a pure quantum state, and to a complete set of commuting observables. The axioms are formulated for a finite N, but this restriction will be lifted below. Intuitively, as discussed in detail in ref. 7, a context can be seen as a given setting of the "knobs" of the measurement apparatus. We will not repeat this discussion here, but we want to consider the following question: it is postulated in Axiom 2 that there are N mutually exclusive modalities associated with each given context, but there are many more modalities, corresponding to all possible contexts, related according to Axiom 3. These modalities are generally not mutually exclusive, but are incompatible: this means that if one is true, one cannot tell whether the other one is true or wrong. How, then, can all these modalities be related to one another?
A first crucial result already established in ref. 7 is that this connection can only be a probabilistic one, otherwise the axioms would be violated; the argument is as follows. Let us consider a single system, two different contexts $C_u$ and $C_v$, and the associated modalities $u_i$ and $v_j$, where i and j go from 1 to N. The quantization principle (Axiom 2) forbids gathering all the modalities $u_i$ and $v_j$ in a single set of more than N mutually exclusive modalities, since their number is fixed to N. Therefore the only relevant question to be answered by the theory is: if the initial modality is $u_i$ in context $C_u$, what is the conditional probability for obtaining modality $v_j$ when the context is changed from $C_u$ to $C_v$? We emphasize that this probabilistic description is the unavoidable consequence of the impossibility to define a unique context making all modalities mutually exclusive, as would be done in classical physics. It appears therefore as a joint consequence of the above Axioms 1 and 2, i.e. that modalities are quantized, and require a context to be defined.
Now, according to Axiom 3, changing the context results from changing the measurement apparatus at the macroscopic level, that is, "turning knobs". A typical example is changing the orientations of a Stern-Gerlach magnet. These context transformations have the mathematical structure of a continuous group, denoted $\mathcal{G}$: the combination of several transformations is associative and gives a new transformation, there is a neutral element (the identity), and each transformation has an inverse. Generally this group is not commutative: for instance, the three-dimensional rotations associated with the orientations of a Stern-Gerlach magnet do not commute. For a given context, there is a given set of N mutually exclusive modalities, denoted $\{u_i\}$. By changing the context, one obtains N other mutually exclusive modalities, denoted $\{v_j\}$, and one needs to build up a mathematical formalism able to provide the probability that a given initial modality $u_i$ ends up in a new modality $v_j$.
The standard approach at this point is to postulate that each modality $u_i$ is associated with a vector $|u_i\rangle$ in an N-dimensional Hilbert space, and that the set of N mutually exclusive modalities in a given context is associated with a set of N orthonormal vectors. Rather than vectors $|u_i\rangle$ and $|v_j\rangle$, one can equivalently use rank-one projectors $P_{u_i}$ and $P_{v_j}$, and Born's rule giving the conditional probability $p(v_j|u_i)$ can be written as
$$p(v_j|u_i) = \mathrm{Tr}(P_{u_i} P_{v_j}). \qquad (1)$$
In this article, we will postulate neither Born's rule nor even Hilbert spaces, but we will derive them as the consequence of the previous Axioms. Then we will discuss the relation with Gleason's theorem, as well as some consequences of our approach.
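As a concrete illustration of the trace form of Born's rule quoted above, the following minimal sketch evaluates Tr(P_u P_v) for a two-level system. The choice of contexts (a Stern-Gerlach magnet along z, then along x) and all function names are assumptions made for this example only, not part of the derivation.

```python
import math

def outer(v):
    """Rank-one projector |v><v| for a normalized vector v (list of complex)."""
    return [[a * b.conjugate() for b in v] for a in v]

def trace_product(a, b):
    """Tr(A B) for two square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def born_matrix(us, vs):
    """Matrix of p(v_j|u_i) = Tr(P_ui P_vj); row i is the initial modality u_i."""
    return [[trace_product(outer(u), outer(v)).real for v in vs] for u in us]

s = 1 / math.sqrt(2)
# Assumed context C_u: magnet along z. Assumed context C_v: magnet along x.
z_basis = [[1 + 0j, 0j], [0j, 1 + 0j]]
x_basis = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]

pi_v_given_u = born_matrix(z_basis, x_basis)
pi_u_given_u = born_matrix(z_basis, z_basis)

for row in pi_v_given_u:
    print([round(p, 6) for p in row], "row sum =", round(sum(row), 6))
```

Each row sums to one, as expected for conditional probabilities, and the same function applied to a single context returns the identity matrix, reflecting the mutual exclusivity of the N modalities of that context.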

Results
In this part, we start from Axioms 1-3 and construct a consistent probability theory, by imposing some requirements on what it should describe. The first steps will thus be to translate the Axioms into mathematical constraints on probabilities relating modalities. This will lead us to manipulate N × N probability matrices, in a general way not restricted to the quantum formalism. Using our Axioms to obtain physically-based constraints, we will finally get Born's rule and unitary transforms.
Let us denote by $p_{v_j|u_i}$ the conditional probability of obtaining modality $v_j$ in context $C_v$ when starting from modality $u_i$ in context $C_u$, and arrange these probabilities in an N × N matrix $\Pi_{v|u}$. Since $0 \le p_{v_j|u_i} \le 1$ and $\sum_j p_{v_j|u_i} = 1$, the matrix $\Pi_{v|u}$ is said to be a stochastic matrix (see Methods for definitions).
For clarity, let us emphasize the interpretation of the conditional probability notation: in agreement with the definition of modalities as certainties, the meaning of $p_{v_j|u_i}$ is that "if we start (with certainty) from modality $u_i$ in the old context, then the probability to get modality $v_j$ in the new context is $p_{v_j|u_i}$". These probabilities provide the connection between theoretical predictions and experiments, and correspond to relative frequencies in repeated experiments starting from the same $u_i$. It is not critical whether they are interpreted in a frequentist or Bayesian sense, but it is critical to acknowledge that they are intrinsic consequences of our axioms on quantized modalities, and thus are not associated with any "missing information". For N = 3, one has for instance
$$\Pi_{v|u} = \begin{pmatrix} p_{v_1|u_1} & p_{v_2|u_1} & p_{v_3|u_1} \\ p_{v_1|u_2} & p_{v_2|u_2} & p_{v_3|u_2} \\ p_{v_1|u_3} & p_{v_2|u_3} & p_{v_3|u_3} \end{pmatrix}.$$
As we will see below, N ≥ 3 is required because some crucial properties of $\Pi_{v|u}$ do not show up for N = 2. Let us also define a "return" probability matrix $\Pi_{u|v}$, by exchanging the roles of the initial and final contexts. The matrix $\Pi_{u|v}$ is stochastic like $\Pi_{v|u}$, but these two matrices are a priori unrelated, whereas it is known that in standard QM they are transposes of each other.
Second consequence of the Axioms: the extra-contextuality of modalities. An essential ingredient for determining the mathematical structure of $\Pi_{v|u}$ is provided by a physical constraint on the probability $p_{v_j|u_i}$. This is found in Axiom 1 (iii) where it was stated that "Modalities cannot show up independently of a context, but the same modality may appear in different contexts, with the same conditions of repeatability and certainty". This means in particular that if $p_{v_j|u_i} = p_{u_i|v_j} = 1$, then $v_j$ and $u_i$ represent the same modality, within two different contexts. This claim may seem surprising since the measured physical quantities in the two contexts can be quite different (see example in Methods); but what matters here is that the certainty and reproducibility are transmitted from one context to the other, hence the idea that the modality is conserved. This also has a major mathematical consequence, which is that when $p_{v_j|u_i} = p_{u_i|v_j} = 1$, we will require that the same mathematical object is associated with the modality ($v_j$ or $u_i$) in the two contexts.
More generally, and again in agreement with the physical reality of modalities, we will require that the probability $p_{v_j|u_i}$ depends only on the particular modalities $u_i$ and $v_j$ being considered, and not on the whole contexts in which they are embedded. Importantly, to build our formalism, we shall apply this requirement not only to the value of $p_{v_j|u_i}$, but also to its mathematical expression; how to do that will be spelled out below. This property will be called "extra-contextuality" (see relation with other works in Methods), and it means also that a modality can be defined independently of the (N − 1) other modalities which appear in a given context. Such an extra-contextuality is fully compatible with contextual objectivity [7,9,10]: the latter states that a modality needs a context to be defined, whereas the former tells that the same modality can show up in several contexts (a simple example is given in Methods, as well as an interesting link with a proof by John Bell [14]).
The mathematical translation of the Axioms. Summarizing the previous discussions, we want to calculate probabilities relating physical events called modalities, occurring for a given physical system in a given physical context. Given the physical system, the rules are:
• For any given physical context, there are exactly N mutually exclusive modalities. As a consequence, the probabilities $p_{v_j|u_i}$ relating the modalities in two different contexts can be arranged in an N × N stochastic matrix $\Pi_{v|u}$. A similar "return" probability matrix $\Pi_{u|v}$ is defined by exchanging the roles of the initial and final contexts, and the set of context transformations has the structure of a continuous group.
• If $p_{v_j|u_i} = p_{u_i|v_j} = 1$, then $v_j$ and $u_i$ are the same modality, and will be associated with the same mathematical object. This rule applies within a given context, where one has $p_{u_j|u_i} = \delta_{ij}$; if all the probabilities are either zero or one between two different contexts, one will say that this is the same context, up to re-labelling the modalities.
These rules have been obtained from the Axioms, though not by a fully formal deduction. They may thus be considered as additional principles, deduced from the non-mathematical Axioms, and leading to exploitable mathematical consequences.
From there, the main idea of our derivation is the following: we will first write a general parametrization of stochastic matrices, which is mathematically and physically neutral, i.e., it is just a rewriting. Nevertheless, this parametrization provides a simple criterion for the stochastic matrix to be unistochastic, i.e. that its coefficients are the square moduli of the coefficients of a unitary matrix (see definitions in Methods). Then we will translate the extra-contextuality constraint into an equation, from which we will show that the matrices Π v|u and Π u|v are unistochastic. Finally the usual formalism of quantum mechanics (Born's rule, unitary transforms, link between Π v|u and Π u|v ) will follow automatically.
Mathematical lemmas on stochastic matrices. The theorems below are valid both for Π u|v and Π v|u , so u and v will be omitted whenever clarity allows.

Lemma 1:
The elements $p_{j|i}$ of an N × N stochastic matrix can always be written under the general form
$$p_{j|i} = \mathrm{Tr}(P'_i R P''_j R), \qquad (2)$$
where $\{P'_i\}$ and $\{P''_j\}$ are two sets of N hermitian projectors of dimension N × N, mutually orthogonal within each set, and where R is a real nonnegative diagonal matrix such that $\mathrm{Tr}(R^2) = N$, and $\mathrm{Tr}(P'_i R^2) = 1$ for all i.
Proof. Let us first introduce the orthogonal (N × N) projectors $P_i$, that are zero everywhere, except for the i-th term on the diagonal that is equal to 1; one has obviously $P_i P_j = P_i \delta_{ij}$. A useful operation is then to extract the particular probability $p_{v_j|u_i}$ from $\Pi_{v|u}$, or $p_{u_i|v_j}$ from $\Pi_{u|v}$, and one has the following identities:
$$p_{v_j|u_i} = \mathrm{Tr}(P_i \Sigma_{v|u} P_j \Sigma_{v|u}^{\dagger}), \qquad (3)$$
$$p_{u_i|v_j} = \mathrm{Tr}(P_j \Sigma_{u|v} P_i \Sigma_{u|v}^{\dagger}), \qquad (4)$$
where Tr is the trace, † is the Hermitian conjugate, and
$$\Sigma_{v|u} = \left[ e^{i\phi_{v_j|u_i}} \sqrt{p_{v_j|u_i}} \right], \qquad \Sigma_{u|v} = \left[ e^{i\phi_{u_i|v_j}} \sqrt{p_{u_i|v_j}} \right] \qquad (5)$$
are N × N matrices formed by square roots of the probabilities, and by arbitrary phase factors which are introduced here for the sake of generality, and cancel out when calculating the matrices $\Pi_{v|u}$ and $\Pi_{u|v}$. From Eqs (3-5) the elements $p_{j|i}$ of a general stochastic matrix Π can be written as (the subscripts u|v or v|u are omitted for simplicity):
$$p_{j|i} = \mathrm{Tr}(P_i \Sigma P_j \Sigma^{\dagger}). \qquad (6)$$
Now, according to the singular values theorem (see Methods), there must exist two unitary matrices U and V, and a real diagonal matrix R, such that
$$\Sigma = U R V^{\dagger}, \qquad (7)$$
where the diagonal values of R are the square roots of the (real) eigenvalues of $\Sigma\Sigma^{\dagger}$, equal to those of $\Sigma^{\dagger}\Sigma$, and are called the singular values of Σ (see proof in Methods). The value of $\mathrm{Tr}(R^2)$ is the sum of the square moduli of all the coefficients of Σ, and is therefore equal to N. Using Eqs (6 and 7), $p_{j|i}$ can now be written as:
$$p_{j|i} = \mathrm{Tr}(P_i U R V^{\dagger} P_j V R U^{\dagger}) = \mathrm{Tr}(P'_i R P''_j R), \qquad (8)$$
where $P'_i = U^{\dagger} P_i U$ and $P''_j = V^{\dagger} P_j V$ are two sets of projectors, mutually orthogonal within each set. Finally, the normalization condition $\sum_j p_{j|i} = 1$ implies that
$$\mathrm{Tr}(P'_i R^2) = 1, \qquad (9)$$
hence the diagonal elements of $R^2$ in the basis associated with the projectors $\{P'_i\}$ are all equal to one. ◻
Scientific RepoRts | 7:43365 | DOI: 10.1038/srep43365
Let us show now that very different situations occur, depending on whether the matrix R is (or is not) equal to the identity matrix 1. This is related to:
Lemma 2: The matrix Σ is unitary if and only if R = 1.
Proof. Eq. (7) shows that Σ is unitary if R = 1, and if Σ is unitary then $\Sigma^{\dagger}\Sigma = 1$ and $R^2 = 1$, so R = 1. ◻ An important corollary is that the matrix Π is unistochastic if R = 1. The reciprocal is not true, because Π being unistochastic does not imply that any matrix Σ defined by Eq. (5) is unitary (the phases may be wrong).
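The remark that "the phases may be wrong" can be made concrete with a small numerical sketch. It assumes that Σ has the same row and column layout as Π, takes the N = 2 matrix with all entries 1/2 as the example, and uses hypothetical helper names; it is an illustration, not the authors' construction.

```python
import math

def sigma_from_probabilities(probs, phases):
    """Sigma_ij = exp(i*phase_ij) * sqrt(p_ij), assumed to share the layout of Pi."""
    return [[complex(math.cos(ph), math.sin(ph)) * math.sqrt(p)
             for p, ph in zip(p_row, ph_row)]
            for p_row, ph_row in zip(probs, phases)]

def gram(sigma):
    """Sigma^dagger Sigma as nested lists."""
    n = len(sigma)
    return [[sum(sigma[k][i].conjugate() * sigma[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def is_unitary(sigma, tol=1e-12):
    """True when Sigma^dagger Sigma equals the identity within tol."""
    g = gram(sigma)
    n = len(g)
    return all(abs(g[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def trace_r_squared(sigma):
    """Tr(R^2) = Tr(Sigma^dagger Sigma), the sum of all |Sigma_ij|^2."""
    g = gram(sigma)
    return sum(g[i][i].real for i in range(len(g)))

probs = [[0.5, 0.5], [0.5, 0.5]]  # unistochastic for N = 2
good = sigma_from_probabilities(probs, [[0, 0], [0, math.pi]])
bad = sigma_from_probabilities(probs, [[0, 0], [0, 0]])

print(is_unitary(good), is_unitary(bad))
print(trace_r_squared(good), trace_r_squared(bad))
```

Both phase choices reproduce the same probabilities and give Tr(R²) = N = 2, but only the first yields a unitary Σ (hence R = 1). With all phases equal, Σ†Σ has eigenvalues 2 and 0, so R differs from the identity even though Π itself is unistochastic.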
An obvious consequence of Lemma 2 is that if R = 1 for all possible pairs of contexts, then the matrix Π is unistochastic for all pairs of contexts; we will show below that this corresponds to the usual quantum formalism. The opposite case is that R ≠ 1 for some pairs of contexts, but we will show that this contradicts our basic constraint that $p_{v_j|u_i}$ should depend only on the particular modalities $u_i$ and $v_j$ being considered. First let us establish the following mathematical Lemma:
Lemma 3: For a given set of projectors $\{P'_i\}$, the normalization conditions $\mathrm{Tr}(P'_i R^2) = 1$ admit a solution R ≠ 1 only if a determinant $\mathcal{D}$ is equal to zero.
Proof. With $P'_i = U^{\dagger} P_i U$, the normalization conditions read $\sum_k |U_{ik}|^2 (R^2)_{kk} = 1$ for all i. Since $\sum_k |U_{ik}|^2 = 1$, this is a homogeneous linear system for the diagonal elements of $R^2 - 1$, which has a non-zero solution only if its determinant $\mathcal{D}$ is equal to zero, and it is easy to check that $\mathcal{D}$ is the determinant of the unistochastic matrix obtained from U. ◻
Summarizing the previous results, Lemma 1 tells us that for any stochastic matrix Π, one can parametrize the probabilities $p_{j|i}$ by using the diagonal matrix R and two sets of projectors $\{P'_i\}$ and $\{P''_j\}$. Then according to Lemmas 2 and 3, two situations are possible:
- either R = 1 for all pairs of contexts, and the matrix Π is always unistochastic as shown in Lemma 2.
- or R ≠ 1 for some pairs of contexts, and a stochastic (but generally not unistochastic) matrix Π is obtained for appropriate projectors, with $\mathcal{D} = 0$ as shown in Lemma 3.
The Fundamental Theorem. We are now in a position to use the extra-contextuality constraint (ECC), which says that the expression of $p_{j|i} = \mathrm{Tr}(P'_i R P''_j R)$ should depend only on the particular modalities $u_i$ and $v_j$ being considered, and not on the whole contexts in which they are embedded. A first step is the following Lemma:
Lemma 4: Given an N-dimensional system, each context must be associated with a set of N mutually orthogonal projectors, each projector corresponding to one of the N mutually exclusive modalities.
Proof. In the case where the initial and final contexts are the same, then Π = 1, Σ is unitary and diagonal, and R = 1. From its definition V can be any unitary matrix, and U = ΣV, so the two sets of projectors $\{P'_i\}$ and $\{P''_j\}$ are identical, and are associated with the current context. In addition, since Lemma 2 gives $p_{j|i} = \mathrm{Tr}(P'_i P'_j) = \delta_{ij}$, each modality $u_k$ (k = i or j) must be associated with a projector $P'_k$ of the set $\{P'_k\}$ corresponding to the current context. ◻
For N ≥ 3, these N projectors may be part of other orthogonal sets, and the corresponding modalities may be part of other contexts. Again for consistency with the ECC, we will require that the same projector always corresponds to the same modality. This will extend first to all contexts containing one (or several) of the initial modalities, giving new projectors and new modalities, and then to the whole space of all N × N projectors, which will thus be associated with all possible modalities. This association has to be consistent when the contexts are changed; this will be discussed in eqs (14 and 15).
Let us emphasize that at this point we don't have QM yet; in some sense, we have justified the Hilbert space framework of Gleason's theorem, as the space of N × N projectors, but we still miss the main hypothesis and the result of the theorem, i.e. Born's law. More precisely, we have justified that $P'_i$ and $P''_j$ depend solely on $u_i$ and $v_j$ in eq. (2); however, it is still possible that R depends on the whole contexts $C_u$ and $C_v$ in which $u_i$ and $v_j$ are embedded, and not on these two modalities only.
So we will use again the ECC to require that not only $P'_i$ and $P''_j$ but also R depend solely on $u_i$ and $v_j$; more explicitly, this can be written:
$$\text{ECC:} \quad p_{v_j|u_i} = \mathrm{Tr}(P'_{u_i} R\, P''_{v_j} R) \qquad (11)$$
depends only on the two specific modalities $u_i$, $v_j$ associated with the projectors $P'_{u_i}$, $P''_{v_j}$, and not on the contexts in which they are embedded.

Fundamental Theorem:
If each modality is bijectively associated with a rank-one projector, and if $p_{j|i}$ is given by Eq. (11), then R = 1. Therefore $p_{j|i} = \mathrm{Tr}(P'_i P''_j)$ provides the unique way to express the coefficients $p_{j|i}$ of a stochastic matrix as a function of the sole modalities $u_i$ and $v_j$, satisfying the ECC as expressed by Eq. (11). As we will show now, Born's formula and unitary transforms directly follow from this result.
Unitary matrices and Born's formula. From now on we will take R = 1 according to the previous Theorem. Therefore the matrix $\Sigma_{v|u} = UV^{\dagger}$ is unitary, but one may wonder whether orthogonal (real) matrices might be enough. In order to justify that the full unitary set is required, we will use Axiom 3, telling that the change of contexts corresponds to a continuous group, to require that the set of matrices $\Sigma_{v|u}$ is connected in a topological sense, and contains the identity matrix. This set must contain permutation matrices, because they correspond simply to "relabelling" the modalities, i.e. to a trivial change of context. One has then
Lemma 5: If the set of matrices $\Sigma_{v|u}$, including permutation matrices, is connected in a topological sense, and contains the identity matrix, then the matrices $\Sigma_{v|u}$ must be complex unitary matrices.
Proof. The set of real orthogonal matrices is topologically disconnected in two parts with determinant +1 and −1, whereas permutation matrices may have determinant −1, and the identity has determinant +1. On the other hand, all (complex) unitary matrices are connected to the identity, hence the result (see also refs 15 and 16). ◻
We are thus led to the conclusion that $\Sigma_{v|u}$ must be a unitary matrix $S_{v|u}$, with $S_{v|u}^{\dagger} S_{v|u} = 1$. Then Eqs (3) for picking up a particular probability become:
$$p_{v_j|u_i} = \mathrm{Tr}(P_i S_{v|u} P_j S_{v|u}^{\dagger}). \qquad (12)$$
As said above $\Pi_{v|u}$ is unistochastic, and we can define
$$P'_i = S_{v|u}^{\dagger} P_i S_{v|u}, \qquad P''_j = S_{v|u} P_j S_{v|u}^{\dagger}.$$
It is clear that these operators are all Hermitian projectors, i.e. one has $P^{\dagger} = P$ and $P^2 = P$ for each of them, and also that all sets $\{P'_i\}$ and $\{P''_j\}$ have the same orthogonality properties as the initial set of projectors $\{P_i\}$, i.e. $P_i P_j = P_i \delta_{ij}$. One can thus rewrite Eq. (3) as:
$$p_{v_j|u_i} = \mathrm{Tr}(P'_i P_j) = \mathrm{Tr}(P_i P''_j), \qquad (13)$$
which is just Born's formula (Eq. 1).
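The connectivity argument of Lemma 5 can be illustrated for N = 2 with a minimal sketch. The particular path U(θ) below is an assumption chosen for simplicity (it is built from the spectral projectors of the swap matrix); it shows that a permutation with determinant −1 is reached continuously from the identity only by passing through genuinely complex unitary matrices.

```python
import cmath
import math

def path(theta):
    """U(theta) = P_plus + exp(i*theta) * P_minus for the 2x2 swap matrix X.

    P_plus = (1 + X)/2 and P_minus = (1 - X)/2 are the spectral projectors
    of X, so U(0) is the identity and U(pi) is the swap permutation.
    """
    e = cmath.exp(1j * theta)
    a, b = (1 + e) / 2, (1 - e) / 2
    return [[a, b], [b, a]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_unitary2(m, tol=1e-12):
    """Check that the rows of a 2x2 matrix are orthonormal."""
    for i in range(2):
        for j in range(2):
            dot = sum(m[i][k] * m[j][k].conjugate() for k in range(2))
            if abs(dot - (1 if i == j else 0)) > tol:
                return False
    return True

for step in range(5):
    theta = math.pi * step / 4
    u = path(theta)
    print(round(theta, 3), is_unitary2(u), det2(u))
```

Along this path the determinant equals exp(iθ): it moves on the unit circle from +1 to −1, which is impossible within real orthogonal matrices, whose determinant can only be +1 or −1.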
Eqs (12 and 13) are consistent with our initial requirement associating a projector with a modality in any context, but make clear that this association is up to a global unitary transform, related to the choice of a fiducial context. In particular, there are two possible choices for the matrix $\Pi_{v|u}$:
$$p_{v_j|u_i} = \mathrm{Tr}(P'_i P_j) \ \text{with} \ P'_i = S_{v|u}^{\dagger} P_i S_{v|u}, \quad \text{or} \quad p_{v_j|u_i} = \mathrm{Tr}(P_i P''_j) \ \text{with} \ P''_j = S_{v|u} P_j S_{v|u}^{\dagger}. \qquad (14)$$
One can now come back to the matrix $\Pi_{u|v}$, for which the same reasoning is valid, and leads to a unitary matrix $S_{u|v}$. By reverting the contexts one has thus:
$$p_{u_i|v_j} = \mathrm{Tr}(Q''_j P_i) \ \text{with} \ Q''_j = S_{u|v}^{\dagger} P_j S_{u|v}, \quad \text{or} \quad p_{u_i|v_j} = \mathrm{Tr}(P_j Q'_i) \ \text{with} \ Q'_i = S_{u|v} P_i S_{u|v}^{\dagger}. \qquad (15)$$
Again, the projectors should be the same for a given modality in a given context, i.e. one should have $P''_j = Q''_j$ (for the same $P_i$ in the other context), and $P'_i = Q'_i$ (for the same $P_j$ in the other context). This is obtained if $S_{u|v}$ is the inverse of $S_{v|u}$, leading to a last lemma:
Lemma 6: The unitary matrices $S_{u|v}$ and $S_{v|u}$ are inverse of each other, $S_{u|v} = S_{v|u}^{-1} = S_{v|u}^{\dagger}$, so that the stochastic matrices $\Pi_{u|v}$ and $\Pi_{v|u}$ are transposes of each other.
Proof. This follows from Eqs (14) and (15). ◻
Then the various points of view represented in the relations (14,15) are all consistent and give the same values for the probabilities, because each $S_{v|u}$ can be associated with an element of the group $\mathcal{G}$ of context transformations, and its inverse is $S_{u|v} = S_{v|u}^{\dagger}$. For the general consistency of the approach including Axiom 3, this set of matrices gives an N × N (projective) representation of the group of context transformations; this is fully consistent with the well known Wigner theorem [17]. This continuous unitary evolution will be essential to describe the evolution of the system (translation in time) [17]. Since we have now reached the starting point of most QM textbooks [18], it should be clear that the standard structure of QM can be obtained from this construction. In particular, one can associate the N orthogonal projectors $\{P_i\}$ with the N orthonormal vectors which are eigenstates of these projectors up to a phase factor, i.e., with rays in the Hilbert space. Similarly, the expected probability law for the measurement results $\{a_i\}$ will be obtained by writing any physical quantity A as an operator $A = \sum_i a_i P_i$; this is the usual spectral theorem.
The tensor product structure for composite systems can also be introduced in the usual way (see Methods).
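The statement that S u|v is the inverse of S v|u, so that the two probability matrices are transposes of each other, can be checked numerically. The sketch below uses an arbitrary 3 × 3 unitary built from two rotations and a diagonal phase matrix; the angles, phases and helper names are assumptions chosen only so that the resulting probability matrix is not symmetric.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(m):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def moduli_squared(m):
    """Entrywise |m_ij|^2, i.e. the unistochastic matrix obtained from m."""
    return [[abs(x) ** 2 for x in row] for row in m]

c1, s1 = math.cos(0.7), math.sin(0.7)
c2, s2 = math.cos(0.3), math.sin(0.3)
rot12 = [[c1, -s1, 0], [s1, c1, 0], [0, 0, 1]]
rot23 = [[1, 0, 0], [0, c2, -s2], [0, s2, c2]]
phases = [[1, 0, 0], [0, 1j, 0], [0, 0, -1]]

s_vu = matmul(matmul(rot12, phases), rot23)  # assumed S_{v|u}
s_uv = dagger(s_vu)                          # S_{u|v} = inverse of S_{v|u}

pi_vu = moduli_squared(s_vu)
pi_uv = moduli_squared(s_uv)
identity_check = matmul(s_uv, s_vu)

for row in pi_vu:
    print([round(p, 4) for p in row])
```

Here `pi_uv[j][i]` equals `pi_vu[i][j]` for every pair of indices, while `pi_vu` itself is not symmetric, so the transpose relation is a genuine constraint linking the two directions of the context change.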

Discussion
An interesting outcome of our derivation is that the usual Hilbert space structure (for N × N matrices) shows up, without any initial assumption of a superposition principle, interference effect, or wave function [19][20][21][22][23][24][25][26] . This structure comes directly from requirements on probabilities, implying that the matrices Π u|v and Π v|u belong to the unistochastic subset of stochastic matrices. This appears as the mathematical consequence of the joint physical requirements of contextuality of the theory (contexts are needed to define modalities), quantization of modalities (making probabilities necessary), and extra-contextuality of modalities (probabilities depend on modalities, that may belong to different contexts).
Extra-contextuality is also a crucial hypothesis for Gleason's theorem [11,12], which is deeply related to our derivation; however, the reasonings proceed in quite different ways. The Hilbert space structure is an assumption in Gleason's theorem, whereas in our case it appears more heuristically. Rather than reconstructing "from scratch" the Trace formula, as done by Gleason, we introduce it as a general parametrization of stochastic matrices; this avoids the heavy machinery of Gleason's theorem [11,12], in particular the demonstration of continuity (see Methods). Then we use extra-contextuality to restrict acceptable matrices to unistochastic ones, ending up again with Born's formula in finite dimension. Using Gleason's theorem explicitly is also possible [13], and has the advantage of lifting the restriction on a finite N (see Methods).
We emphasize that we do not need any additional "measurement postulate", since measurement is already included in Axiom 1, i.e. in the very definition of a modality [7][8][9][10]. Quantum superpositions are present as usual, but they are not spooky "dead-and-alive" concepts: they are rather the manifestation of a modality (i.e., a certainty) in another context. Entanglement is also present as linear superpositions of tensor product states, corresponding to modalities in a "joint" context, and the specific case of the two-particle Bell-EPR experiment is discussed in ref. 8. Since a modality requires both a context and a system, it embeds non-local features corresponding to quantum non-locality, but it is fully compatible with relativistic causality [7,8], and operationally agrees with no-signaling, just like QM does. From a foundational point of view, our approach also provides a clear distinction between the modality, which is a real physical phenomenon, or a physical event in the sense of probability theory, and the projector, which is a mathematical tool for calculating non-classical probabilities.
To conclude, let us emphasize that we discussed a very idealized version of QM, based on pure states and orthogonal measurements. Nevertheless, this idealized version does provide the basic quantum framework, and connects the experimental definition of a physical quantity and the measurement results in a consistent way, both physically and philosophically [7]. Adding more refined tools such as density matrices, imperfect measurements, POVM, open systems, or decoherence is of great practical interest and use, but this will not "soften" the basic ontology of the theory, as it is presented here. The present work, deeply rooted in ontology, is thus complementary to many recent related proposals [19][20][21][22][23][24][25][26][27][28][29][30][31][32][33].

Methods
Stochastic matrices. A stochastic matrix has real nonnegative coefficients, with all rows summing to 1. Bistochastic matrices are stochastic ones, with both rows and columns summing to 1. Orthostochastic and unistochastic matrices are obtained by taking the square moduli of the coefficients of respectively an orthogonal or a unitary matrix [34]. For N = 2, all bistochastic matrices are also orthostochastic and unistochastic, and for N ≥ 3, the set of unistochastic matrices is larger than the orthostochastic set, but smaller than the bistochastic set. For instance, the simple matrix
$$\frac{1}{2}\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$
is a well known example of a bistochastic matrix, which is neither orthostochastic, nor unistochastic; therefore it is not an acceptable (quantum) probability matrix $\Pi_{v|u}$.
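The gap between bistochastic and unistochastic matrices can be probed with a simple necessary condition. The sketch below is an illustration under stated assumptions: the candidate matrix is a standard textbook example (half of a sum of two permutation matrices), and the test used is the polygon inequality on pairs of rows, which is necessary but not sufficient for unistochasticity in general.

```python
import math

def is_bistochastic(m, tol=1e-12):
    """Nonnegative entries with every row and every column summing to 1."""
    n = len(m)
    rows_ok = all(abs(sum(row) - 1) < tol for row in m)
    cols_ok = all(abs(sum(m[i][j] for i in range(n)) - 1) < tol
                  for j in range(n))
    nonneg = all(x >= 0 for row in m for x in row)
    return rows_ok and cols_ok and nonneg

def rows_can_be_orthogonal(m, tol=1e-12):
    """Necessary condition for m_ij = |U_ij|^2 with U unitary.

    Rows i and k of U are orthogonal only if complex numbers with moduli
    sqrt(m_ij * m_kj) can sum to zero, i.e. the largest modulus does not
    exceed the sum of the others (polygon inequality).
    """
    n = len(m)
    for i in range(n):
        for k in range(i + 1, n):
            lengths = [math.sqrt(m[i][j] * m[k][j]) for j in range(n)]
            if max(lengths) > sum(lengths) - max(lengths) + tol:
                return False
    return True

# Assumed example: a standard bistochastic matrix that is not unistochastic.
candidate = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
# Uniform matrix, obtained from the 3x3 discrete Fourier (unitary) matrix.
uniform = [[1 / 3] * 3 for _ in range(3)]

print(is_bistochastic(candidate), rows_can_be_orthogonal(candidate))
print(is_bistochastic(uniform), rows_can_be_orthogonal(uniform))
```

For the candidate, any two rows overlap in exactly one column, so the inner product of the corresponding rows of a would-be unitary has modulus 1/2 whatever the phases, and orthogonality is impossible.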

Singular values theorem and the invariance of R.
To obtain the singular values decomposition, diagonalize the Hermitian matrix Σ † Σ , get the real diagonal matrix R and the unitary matrix V so that V † Σ † Σ V = R 2 . Then define another unitary matrix U such that UR = Σ V, and RU † = V † Σ † , so that U † Σ Σ † U = R 2 . One gets thus the decomposition Σ = URV † as expected.
In the demonstration of the fundamental theorem, we note that one might restrict the new projectors $\{\tilde{P}'_k\}$ to be such that $\tilde{\mathcal{D}} = 0$. Then R can be different from 1, but it still has to be modified to some $\tilde{R}$ to fulfill the normalization conditions with the $\{\tilde{P}'_k\}$. Since the hypothesis is that R should be constant, this case is excluded also. One may also wonder what would happen if no phase factors were included in the definition of Σ. Then the Lemmas are still valid, but Σ cannot be unitary, and not even orthogonal. Then according to Lemma 2, R cannot be the identity, and therefore the extra-contextuality constraint cannot be satisfied.

Extra-contextuality and Gleason's theorem.
Extra-contextuality is not a new concept, but it is a new name given to a known concept, called non-contextuality in articles dealing with Gleason's theorem [12], or "measurement non-contextuality" in ref. 31. Extra-contextuality is not the contrary of contextuality, and it avoids confusion arising when using "non-contextuality". In particular, contexts are needed to define modalities, and modalities are extra-contextual, without any contradiction with refs 7, 9, 10 and 13 or with the Kochen-Specker theorem. As a simple example of extra-contextuality, consider a system of two spin 1/2 particles, and define $\vec{S} = \vec{S}_1 + \vec{S}_2$. Using standard notations for coupled and uncoupled bases, the $|m_1 = 1/2, m_2 = 1/2\rangle$ modality in the context $\{S_{z1}, S_{z2}\}$ is the same as the $|S = 1, m_S = 1\rangle$ modality in the context $\{\vec{S}^2, S_z\}$, though other modalities in the same two contexts are different.
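The two-spin example can be checked with a short sketch that writes both contexts in the uncoupled basis and evaluates the Born probabilities between them. The amplitude ordering, the labels and the use of the standard singlet and triplet combinations are assumptions of this illustration.

```python
import math

s = 1 / math.sqrt(2)

# Amplitudes in the order |++>, |+->, |-+>, |--> (uncoupled basis).
context_sz1_sz2 = {
    "m1=+1/2, m2=+1/2": [1.0, 0.0, 0.0, 0.0],
    "m1=+1/2, m2=-1/2": [0.0, 1.0, 0.0, 0.0],
    "m1=-1/2, m2=+1/2": [0.0, 0.0, 1.0, 0.0],
    "m1=-1/2, m2=-1/2": [0.0, 0.0, 0.0, 1.0],
}
# Standard triplet and singlet combinations (coupled basis).
context_s2_sz = {
    "S=1, mS=+1": [1.0, 0.0, 0.0, 0.0],
    "S=1, mS=0": [0.0, s, s, 0.0],
    "S=0, mS=0": [0.0, s, -s, 0.0],
    "S=1, mS=-1": [0.0, 0.0, 0.0, 1.0],
}

def born(u, v):
    """|<v|u>|^2 for real amplitude vectors."""
    return sum(a * b for a, b in zip(u, v)) ** 2

table = {(ku, kv): born(u, v)
         for ku, u in context_sz1_sz2.items()
         for kv, v in context_s2_sz.items()}

# Pairs with probability one are the same modality seen in two contexts.
shared = [pair for pair, p in table.items() if abs(p - 1) < 1e-12]
print(shared)
```

The pair (m1 = +1/2, m2 = +1/2) and (S = 1, mS = +1) has probability one, as does the analogous pair with both spins down, whereas the remaining modalities are connected by probabilities of 1/2 and therefore differ between the two contexts.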
Demonstrating continuity of the probability formula is an essential step of Gleason's theorem. In our derivation continuity appears formally in Axiom 3, and is embedded in the matrix formalism that we are using. It is used in the Fundamental theorem to build a new set of projectors $\{\tilde{P}'_k\}$, keeping one of them constant, and in Lemma 5 to get complex unitary matrices. So it does play a role, but does not have to be demonstrated. The explicit use of Gleason's theorem for allowing the dimension N to be infinite is spelled out in ref. 13. This requires introducing an Axiom 4 associating modalities and projectors in a Hilbert space; such an Axiom is not formally required in the present heuristic derivation, but it provides a useful "back-up".
It is interesting to note that John Bell demonstrated explicitly in ref. 14 (Section V) that if extra-contextuality is accepted as it is done in the hypotheses of Gleason's theorem, then the impossibility of hidden variables (HV) automatically follows (more technically, Bell showed that there is no dispersion-free state). Then he wrote at the end of his proof: "It was tacitly assumed that measurement of an observable must yield the same value independently of what other (commuting) measurements may be made simultaneously. Thus as well as P(Φ 3 ) say (projector on vector Φ 3 ), one might measure either P(Φ 2 ) or P(Ψ 2 ), where Φ 2 and Ψ 2 are orthogonal to Φ 3 but not to one another. These different possibilities require different experiment arrangements; there is no a priori reason to believe that the results for P(Φ 3 ) should be the same." So Bell rejected the "tacit assumption", and therefore also the conclusion that there are no dispersion-free states. But extra-contextuality, seen as a consequence of the reality of modalities, may provide the missing "a priori reason" to accept the assumption, and thus also Bell's proof.
Obviously Bell's goal was very different from ours, since he was investigating the possibility of HV, whereas we want to recover the quantum formalism. Nevertheless, we conclude that if we accept extra-contextuality, then the quantum formalism follows (from the present article), and consistently with this result, HV are excluded (from ref. 14).
Relations with textbook quantum mechanics. In this section we outline various issues relating our approach to standard QM. First, we considered only pure states and orthogonal (projective) measurements. This fits with the usual view that mixed states (density matrices) and non-orthogonal measurements (POVM) correspond to more classical aspects of probabilities, and can be introduced at a later stage. This is possible because in each context, a classical probability distribution can be built upon the N mutually exclusive modalities.
In our approach entanglement appears naturally in the following way: let us consider two systems 1 and 2 with N 1 and N 2 mutually exclusive modalities. If both systems are considered together, but each one in its own context, there are clearly N = N 1 × N 2 mutually exclusive modalities. But from Axiom 2, the value of N does not depend on the context, so the global system must be described by N × N projectors and unitary matrices. Many of these projectors cannot be split into projectors acting separately on system 1 or system 2, and are associated to entangled states.
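For the smallest case N 1 = N 2 = 2, whether a rank-one projector of the global system splits into projectors on each subsystem can be tested on the state it projects onto. The sketch below relies on the standard criterion that a two-qubit pure state factorizes exactly when the determinant of its 2 × 2 amplitude matrix vanishes; the function name and the example states are assumptions of this illustration.

```python
import math

def is_product_state(amplitudes, tol=1e-12):
    """Two two-level systems: |psi> = sum a_kl |k>|l> factorizes iff det(a) = 0.

    The amplitudes are given in the order a00, a01, a10, a11.
    """
    a00, a01, a10, a11 = amplitudes
    return abs(a00 * a11 - a01 * a10) < tol

s = 1 / math.sqrt(2)
n1, n2 = 2, 2
n_total = n1 * n2  # number of mutually exclusive joint modalities

product_example = [s, 0.0, s, 0.0]    # (|0> + |1>)/sqrt(2) tensor |0>
entangled_example = [s, 0.0, 0.0, s]  # (|00> + |11>)/sqrt(2)

print(n_total, is_product_state(product_example),
      is_product_state(entangled_example))
```

Both example states are legitimate modalities of the four-dimensional global system, but only the first corresponds to a projector that can be written as a product of projectors acting separately on system 1 and system 2.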
In this article the postulate on time evolution is not spelled out, but it enters in the same framework, by including translations in time in the group $\mathcal{G}$. For instance, if $\mathcal{G}$ is the Galileo group, standard non-relativistic QM can be recovered, including Schrödinger's equation [17]. Also, we did not discuss the known connection between the physical quantities and the infinitesimal generators of $\mathcal{G}$, or the role of "projective" representations. Finally, we considered only non-relativistic quantum mechanics, and therefore "type I" von Neumann algebras, see e.g. ref. 35.