From contextuality of a single photon to realism of an electromagnetic wave

Violations of Bell inequalities have been an incontestable indicator of non-classicality since the seminal paper by John Bell. However, recent claims of Bell inequality violations with classical light have cast some doubt on their significance as hallmarks of non-classicality. Here, we challenge those claims. The crux of the problem is that such classical experiments simulate quantum probabilities with intensities of classical fields. However, field intensity measurements are radically different from single-photon detections, which are the primitives of any genuine Bell experiment. We show that this fundamental difference between field intensity measurements and single-photon detections shifts the classical bound of the relevant Bell inequalities to their algebraic limit, leaving no room for their violation.

The quantum-to-classical transition in optical interferometry can be observed in either single-particle or multi-particle interference [1]. Multi-particle interference is commonly regarded as more fundamental, since it includes purely quantum phenomena such as the Hong-Ou-Mandel (HOM) effect [2] and multi-photon violations of Bell inequalities [3]. The quantum-to-classical transition in these phenomena has different origins. In the HOM setting it can be attributed to the degree of particle indistinguishability [4], whereas multi-photon Bell inequality violations are tied to the strength of coherence between entangled photons or, in the limit of many particles, to the vanishing ability to reveal single-particle properties with multi-photon measurements [5].
A single-photon interference scenario is much simpler to describe. To illustrate it, let us consider the simplest case: Young's double-slit experiment [6]. Here the classical limit is achieved by increasing the average number of photons prepared in a coherent state or in a mixture of such states. In this limit the interference pattern does not change. What changes is the physical meaning of the mathematical formalism used to describe the experiment: the probability amplitudes of a single photon become amplitudes of a classical electromagnetic wave. Straightforward as it seems, the relatively recent 'discovery' of Bell inequality violations with classical light, dubbed 'classical entanglement', makes the whole quantum-to-classical transition less obvious.
Everything started in 1996 with a paper by Patrick Suppes et al. [7], who proposed an interferometric experiment to violate a Bell inequality with classical light. A year later Robert Spreeuw introduced the concept of classical entanglement between two different degrees of freedom of a single classical light beam [8]. Moreover, he showed that this entanglement leads to violations of the CHSH (Clauser-Horne-Shimony-Holt) inequality [9]. He subsequently generalized this idea to more degrees of freedom, demonstrating a classical version of the GHZ (Greenberger-Horne-Zeilinger) paradox [10].
Classical entanglement occurs between two or more properties of an individual system and as such, unlike the standard EPR scenario, it does not require spatial separation. Because of this, classical entanglement was largely dismissed by other researchers as a mere curiosity, irrelevant in the context of quantum non-locality [11]. However, a few years later a variety of papers appeared discussing a similar concept of intrasystem entanglement in different physical implementations. Eberly, Qian et al. and, further, Aiello et al., in a series of papers [12][13][14], developed a theory of "bipartite" entanglement between polarisation and position degrees of freedom in stochastic light beams [15]. Later, Eberly et al. performed an experiment achieving a strong violation of the CHSH inequality with entangled states of stochastic light fields [16]. A similar violation of the CHSH inequality with classical entanglement between fields in two optical resonators was proposed by Snoke [17]. Finally, Frustaglia et al. [18], following earlier ideas of Cerf et al. [19] and Spreeuw [10], derived a procedure that allows one to reconstruct the probability distributions of any quantum correlation test using classical optical circuits. As an illustration of their method, the authors of [18] performed an experiment with microwave circuits showing violations of the CHSH [9] and Mermin [20] type inequalities.
Experimental demonstrations of Bell inequality violations with classical light have profound physical implications. Snoke [17] and Qian et al. [16] hypothesised that a Bell inequality violation does not testify to the presence of quantum entanglement in a given physical system: it may as well be classical entanglement. Another hypothesis, by Frustaglia et al. [18], is that the bounds on the strength of quantum correlations (the so-called Tsirelson bounds) are not restricted to quantum physics, but arise naturally in classical systems that simulate quantum correlations. If these claims were true, we would have to reconsider the role of Bell inequalities in probing the quantum-to-classical transition.
In this paper we challenge these claims by showing that no violation is observed in the classical regime if the Bell inequalities are properly derived. More precisely, Bell inequalities test whether probability distributions of measurement results are contextual [21], a feature commonly accepted as an indicator of non-classicality. From the mathematical perspective, Bell inequalities are based on the properties of exclusivity relations between jointly measurable events [22]. A proper exclusivity structure implies the existence of a test that can distinguish between contextual (non-classical) and non-contextual (classical) probability distributions.
We present a non-contextual physical model based on the quantum-to-classical transition from single photons to classical waves. Our model proves that classical waves are not contextual and can therefore still be called classical. Moreover, we demonstrate that the proper classical bounds for Bell tests with classical light, i.e., the bounds respecting the correct exclusivity structure of detection events, are equal to the algebraic bounds on the strength of correlations. Therefore, there is no room for any classical contextuality in such systems.

I. RESULTS
A. Photon distribution in the classical limit
Consider a single photon in the polarisation state $\sqrt{p_H}|H\rangle + \sqrt{p_V}|V\rangle$, where $H$ and $V$ denote horizontal and vertical polarisations, respectively, and $p_H + p_V = 1$. When incident on a polarising beam splitter (PBS), the photon can either go through and become $H$ polarised, or be reflected and become $V$ polarised. If one decides to detect the photon after the PBS, these two possibilities occur randomly with probabilities $p_H$ and $p_V$. Denote these two outcomes by {1, 0} and {0, 1}, respectively, where the entries count the photons in the $H$ and $V$ output ports. This scenario is a physical implementation of a binary ±1 random variable $X$, where the outcome {1, 0} is associated with the value +1 and {0, 1} with −1.
Next, let us consider two indistinguishable photons in the above state entering the same PBS port. They are uncorrelated and therefore scatter on the PBS independently [23]. The photons cannot be distinguished, so there are only three exclusive outcomes: {2, 0}, {1, 1} and {0, 2}. Because of indistinguishability, these outcomes cannot be interpreted as products of two single-photon outcomes, i.e., the events {1, 0}×{1, 0}, {1, 0}×{0, 1}, {0, 1}×{1, 0} and {0, 1}×{0, 1} are meaningless. Moreover, unlike in the single-photon case, for two photons the statements "a photon is detected on the left" and "a photon is detected on the right" are not exclusive, because photons can be detected on both sides. Interestingly, the average number of photons in each output is proportional to the single-photon scattering probabilities, i.e., $\bar n_H = 2p_H$ and $\bar n_V = 2p_V$.
In general, for $N$ photons scattering on the PBS one can observe $N+1$ exclusive outcomes: {N, 0}, {N−1, 1}, etc. This is drastically fewer than the $2^N$ outcomes observable for distinguishable particles. For $N$ photons the single-photon random variable $X$ is ill-defined because of indistinguishability. However, it is possible to define a random variable $\mathcal{X} = (n_H - n_V)/N$, i.e., the difference between the photon numbers in the two output ports divided by the total number of photons. Note that $-1 \le \mathcal{X} \le 1$ and $\mathcal{X} = X$ for $N = 1$. Additionally, since each photon is transformed independently and according to the same rule, the average numbers of photons in the polarisation modes are $\bar n_H = Np_H$ and $\bar n_V = Np_V$. Because of this, $\langle\mathcal{X}\rangle = p_H - p_V$ does not depend on $N$, and the most probable outcome is approximately {$\bar n_H$, $\bar n_V$}.
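The counting argument above can be checked numerically. The sketch below assumes, as the text does, that the $N$ photons scatter independently with transmission probability $p_H$, so that $n_H$ is binomially distributed; the function name `pbs_statistics` is hypothetical. It enumerates the $N+1$ exclusive outcomes and shows that the mean of the $N$-photon variable $(n_H - n_V)/N$ stays at $p_H - p_V$, while its standard deviation, which for a binomial distribution equals $2\sqrt{p_Hp_V/N}$, shrinks with $N$.

```python
from math import comb, sqrt

def pbs_statistics(n_photons: int, p_h: float):
    """Statistics of N indistinguishable, uncorrelated photons on a PBS (hypothetical helper).

    Assumes each photon is transmitted (H) independently with probability p_h,
    so n_H is binomial. Returns the number of exclusive outcomes {n_H, n_V},
    and the mean and standard deviation of (n_H - n_V) / N.
    """
    p_v = 1.0 - p_h
    # Exclusive outcomes {n_H, n_V} with n_H + n_V = N: there are N + 1 of them.
    outcomes = [(n_h, n_photons - n_h) for n_h in range(n_photons + 1)]
    probs = [comb(n_photons, n_h) * p_h**n_h * p_v**n_v for n_h, n_v in outcomes]
    values = [(n_h - n_v) / n_photons for n_h, n_v in outcomes]
    mean = sum(p * x for p, x in zip(probs, values))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))
    return len(outcomes), mean, sqrt(var)

for n in (1, 2, 10, 1000):
    k, mean, std = pbs_statistics(n, p_h=0.7)
    print(f"N={n:5d}  outcomes={k:5d}  <X>={mean:.4f}  std={std:.4f}")
```

Because the binomial coefficients are converted to floats, this direct summation is only meant for moderate $N$ (up to roughly a thousand photons); the point is the trend, not large-$N$ numerics.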
Finally, let us discuss the classical limit. In this case the total number of photons is undetermined, but their average number is large ($\langle N\rangle \gg 1$). In quantum theory such situations are usually represented by a high-amplitude coherent state [24]. In the classical limit it is quite natural to treat the beam of light as a continuous object that can be split into portions in an arbitrary way. The PBS transforms a single beam with intensity $I$ into two beams: the $H$-polarised beam with intensity $I_H$ and the $V$-polarised one with intensity $I_V$. This is predicted by both classical and quantum theory. In the classical limit the average value of the random variable $\mathcal{X}$ becomes $(I_H - I_V)/I$. Since the intensities of the two beams are $I_H = Ip_H$ and $I_V = Ip_V$, we get $\langle\mathcal{X}\rangle = p_H - p_V$, as expected. Note that the fluctuations of $\mathcal{X}$ scale as $1/\sqrt{N}$; therefore, in the classical limit $\mathcal{X}$ can be treated as a deterministic variable.
The above scenarios are schematically represented in Fig. 1. Although we considered only a single PBS, the correspondence between classical intensities and probabilities generated by photonic distributions would also hold if one used an arbitrary number of linear optical devices (PBSs, standard beam splitters (BSs), phase shifters, etc.). In this case the whole setup is equivalent to a multiport corresponding to a more complex random variable or to a sequence of random variables.
To summarise, the same average value $\langle\mathcal{X}\rangle$ is predicted by both quantum and classical theory, since this value does not depend on $N$. Nevertheless, the underlying exclusivity structure of the outcomes of $\mathcal{X}$ strongly depends on $N$. This fact causes some bizarre interpretational difficulties in experimental Bell-type scenarios with classical light, which we discuss in more detail in the following sections.

B. Correlations in the classical limit
Next, we show that, unlike photonic distributions, the correlations between spatially separated photons strongly depend on $N$ and on the exclusivity structure of detection events. The problem of quantum correlations in the classical limit was discussed in detail before (see for example [5]), so here we provide only a simple example.
FIG. 1. Schematic representation of the classical limit in an experiment with uncorrelated photons. A single photon on a polarising beam splitter (PBS) can either go through or be reflected. There are two exclusive outcomes: the photon is registered either on the left with polarisation V or on the right with polarisation H. The corresponding probabilities are $p_V$ and $p_H$, respectively ($p_H + p_V = 1$). Two photons on the PBS can produce three exclusive outcomes: both on the left with probability $p_V^2$, one on the left and one on the right with probability $2p_Hp_V$, and both on the right with probability $p_H^2$. $N$ photons on the PBS can produce $N+1$ exclusive outcomes; however, the most probable events are those with approximately $Np_V$ photons on the left and $Np_H$ photons on the right. A classical beam of light of intensity $I$ is split on the PBS into two beams. In principle there is a continuum of outcomes; however, one always observes the one with the corresponding intensities $I_H$ and $I_V$, where $I_H = Ip_H$ and $I_V = Ip_V$.
Consider a pair of photons in the entangled polarisation state $\sqrt{p_H}|HH\rangle + \sqrt{p_V}|VV\rangle$. These two photons are shared between two spatially separated parties, Alice and Bob, who measure the polarisations of their photons with their respective PBSs. As before, we represent the two polarisation possibilities by {1, 0} and {0, 1}. Moreover, we use the ±1 random variables $X_A$ and $X_B$, defined in the same way as $X$ above, to represent the measurements of Alice and Bob. Alice and Bob register either {1, 0}×{1, 0} with probability $p_H$, or {0, 1}×{0, 1} with probability $p_V$. The average values and the corresponding correlation are $\langle X_A\rangle = \langle X_B\rangle = p_H - p_V$ and $\langle X_AX_B\rangle = 1$.
Next, let us consider $N$ such photonic pairs shared between Alice and Bob, who measure the polarisation of all pairs at the same time. As before, the local measurements are represented by the random variables $\mathcal{X}_A = (n_{AH} - n_{AV})/N$ and $\mathcal{X}_B = (n_{BH} - n_{BV})/N$. Again, the photonic pairs are correlated, giving $n_{AH} = n_{BH} = n_H$ and $n_{AV} = n_{BV} = n_V$; hence Alice registers the same photon distribution as Bob, and $\mathcal{X}_A = \mathcal{X}_B$. However, for $N$ entangled pairs the correlation $\langle\mathcal{X}_A\mathcal{X}_B\rangle$ is much weaker. Note that the outcome {n, N−n}×{n, N−n} happens with probability $\binom{N}{n}p_H^n p_V^{N-n}$, thus
$$\langle\mathcal{X}_A\mathcal{X}_B\rangle = \sum_{n=0}^{N}\binom{N}{n}p_H^n p_V^{N-n}\left(\frac{2n-N}{N}\right)^2 = (p_H - p_V)^2 + \frac{4p_Hp_V}{N}.$$
For example, for $p_H = p_V = 1/2$ one gets $\langle\mathcal{X}_A\mathcal{X}_B\rangle = 1/N$. Since $\langle\mathcal{X}_A\rangle = \langle\mathcal{X}_B\rangle = p_H - p_V$, in the limit of a large number of photons
$$\lim_{N\to\infty}\langle\mathcal{X}_A\mathcal{X}_B\rangle = \langle\mathcal{X}_A\rangle\langle\mathcal{X}_B\rangle.$$
To conclude, the values $\langle\mathcal{X}_A\rangle$ and $\langle\mathcal{X}_B\rangle$ do not depend on $N$, but the correlation $\langle\mathcal{X}_A\mathcal{X}_B\rangle$ does. As a consequence, in the classical limit of large $N$ the two random variables become practically uncorrelated. Therefore, the classical limit of Bell-type scenarios based on correlations between many particles can always be explained by a classical theory (for more details see [5] and the Methods section). In other words, correlations between individual photons cannot be used to mimic non-classical correlations in the limit of classical beams. The idea of simulating quantum correlations with classical beams relies on a different approach, and in the next section we focus on correlations between random variables defined for the same particle.
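The decay of the pair correlation with $N$ can be verified by direct summation. This is a minimal sketch, assuming (as in the text) that the split {n, N−n} seen by both parties is binomially distributed; `pair_correlation` and `closed_form` are hypothetical names. The first function sums over all splits, the second evaluates $(p_H - p_V)^2 + 4p_Hp_V/N$.

```python
from math import comb

def pair_correlation(n_pairs: int, p_h: float) -> float:
    """<X_A X_B> for N indistinguishable pairs, each in sqrt(p_H)|HH> + sqrt(p_V)|VV>.

    Alice and Bob always observe the same split {n, N - n}; it occurs with binomial
    probability, and in that run X_A = X_B = (2n - N) / N.
    """
    p_v = 1.0 - p_h
    total = 0.0
    for n in range(n_pairs + 1):
        prob = comb(n_pairs, n) * p_h**n * p_v**(n_pairs - n)
        x = (2 * n - n_pairs) / n_pairs
        total += prob * x * x
    return total

def closed_form(n_pairs: int, p_h: float) -> float:
    """(p_H - p_V)^2 + 4 p_H p_V / N: the product <X_A><X_B> plus a term vanishing as N grows."""
    p_v = 1.0 - p_h
    return (p_h - p_v) ** 2 + 4 * p_h * p_v / n_pairs

for n in (1, 10, 100, 1000):
    print(f"N={n:4d}  p_H=0.5: {pair_correlation(n, 0.5):.4f}"
          f"  p_H=0.7: {pair_correlation(n, 0.7):.4f}")
```

For $p_H = 0.7$ the correlation drops from 1 at $N = 1$ towards $(p_H - p_V)^2 = 0.16$, i.e., towards the uncorrelated product of the two averages.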

C. Bell inequalities in the classical limit
Let us consider the CHSH scenario [9], which is the simplest Bell test involving four ±1 binary random variables $A_0$, $A_1$, $B_0$ and $B_1$. In a classical theory these four random variables are jointly distributed and the following inequality must be satisfied:
$$|\langle A_0B_0\rangle - \langle A_0B_1\rangle - \langle A_1B_0\rangle - \langle A_1B_1\rangle| \le 2. \tag{3}$$
In quantum theory it is possible to find a set of binary observables represented by Hermitian matrices such that
$$[A_i, B_j] = 0, \qquad [A_0, A_1] \neq 0, \qquad [B_0, B_1] \neq 0.$$
This means that $A_i$ and $B_j$ can be jointly measured, but it is not possible to jointly measure $A_0$ and $A_1$, or $B_0$ and $B_1$. Interestingly, for quantum correlations $\langle A_iB_j\rangle$ the expression inside the absolute value in (3) can reach ±2√2 for an optimal choice of the state and observables. The violation implies that the measured correlations cannot be described by classical theories. The simplest quantum system in which such a scenario is possible has four levels. In the original Bell-type scenario we have two spatially separated systems, e.g. the two polarisation-entangled photons discussed in the previous section; see also Fig. 2 a). In this case $A_0$ and $A_1$ correspond to the polarisation properties of the first photon, whereas $B_0$ and $B_1$ correspond to those of the second photon. However, by the arguments of the previous section, a large number of indistinguishable entangled pairs would produce $\langle A_iB_j\rangle \approx \langle A_i\rangle\langle B_j\rangle$ in the classical limit. Thus, the CHSH inequality (3) would not be violated.
FIG. 2. a) … Each measurement produces a binary outcome (±1). Here, we present an instance in which Alice chooses setting 0 and Bob chooses setting 1. Alice's outcome is + and Bob's is −, hence they jointly register the event (+−|01). b) The same instance, but in a local scenario. Classical entanglement can only be tested in such scenarios. The measurement of A is performed before B. It is generally assumed that both properties are compatible (they commute in the quantum-mechanical sense), therefore the order of measurement is irrelevant. c) The exclusivity graph for the CHSH scenario. The vertices correspond to measurement events and the edges represent exclusivity relations. Orange edges correspond to exclusivity of measurement outcomes for the same settings, e.g. (++|ij) and (−−|ij). Grey edges correspond to exclusivity of measurement outcomes in which the second measurement has different settings, e.g. (++|i0) and (−−|i1). This exclusivity can be tested in the setting represented in b) by choosing B0 for the second left measuring device and B1 for the second right measuring device; see the detailed discussion in the text.
Let us now discuss another implementation of the CHSH scenario. This time the four-level system is a single photon that can occupy four modes, e.g., two polarisation modes ($H$ and $V$) and two spatial modes ($a$ and $b$). As a result, the photon can be in one of the four states $a_H$, $a_V$, $b_H$ and $b_V$, or in an arbitrary superposition of them. The properties $A_0$ and $A_1$ can be associated with the spatial modes, whereas $B_0$ and $B_1$ can be associated with polarisation. For example, $A_0$ can assign +1 to mode $a$ and −1 to mode $b$, whereas $A_1$ can assign ±1 to the orthogonal superpositions $|a\rangle \pm |b\rangle$. Similarly, $B_0$ can assign +1 to polarisation $H$ and −1 to $V$, whereas $B_1$ can assign +1 to right-handed circular polarisation and −1 to left-handed circular polarisation.
The Hilbert space of the system is a tensor product of two Hilbert spaces: one corresponding to the spatial modes and the other to polarisation. However, this time the system cannot be divided into parts that can be separated from each other. Still, it is possible to speak of entanglement between these two degrees of freedom, but this entanglement has nothing to do with nonlocality. Nevertheless, a violation of the CHSH inequality with $A_i$ and $B_j$ confirms the presence of entanglement between spatial modes and polarisation. This entanglement gives rise to non-classical correlations that can be attributed to contextuality rather than to nonlocality.
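A small numerical check of the single-photon CHSH scenario is sketched below with numpy. It represents the path and polarisation observables described above by Pauli matrices on the four-dimensional space (path ⊗ polarisation); the basis ordering, the identification of $(|H\rangle \pm i|V\rangle)/\sqrt{2}$ with right/left circular polarisation, and the helper name `joint` are assumptions made for illustration. The largest eigenvalue of the Bell operator for the sign pattern of (3) is the maximal quantum value, and the corresponding sum of the eight event probabilities follows from the relation $2 + \tfrac12(\text{CHSH expression})$.

```python
import numpy as np

# Pauli matrices. Assumed basis ordering: (a, b) for the path, (H, V) for polarisation.
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Path observables: A0 gives +1 for a and -1 for b; A1 gives +/-1 for (|a> +/- |b>)/sqrt(2).
A = [SZ, SX]
# Polarisation observables: B0 gives +1 for H and -1 for V; B1 gives +/-1 for
# (|H> +/- i|V>)/sqrt(2), taken here as right/left circular (handedness convention assumed).
B = [SZ, SY]

def joint(a_op, b_op):
    """Observable on the four-mode photon: a_op acts on the path, b_op on polarisation."""
    return np.kron(a_op, b_op)

# Bell operator for the sign pattern of inequality (3).
bell = joint(A[0], B[0]) - joint(A[0], B[1]) - joint(A[1], B[0]) - joint(A[1], B[1])
eigvals, eigvecs = np.linalg.eigh(bell)   # eigenvalues in ascending order
psi = eigvecs[:, -1]                      # state that maximises the Bell expression

corr = {(i, j): float(np.real(psi.conj() @ joint(A[i], B[j]) @ psi))
        for i in range(2) for j in range(2)}
chsh = corr[0, 0] - corr[0, 1] - corr[1, 0] - corr[1, 1]
prob_sum = 2 + chsh / 2                   # sum of the eight event probabilities

print("correlations:", {k: round(v, 4) for k, v in corr.items()})
print(f"CHSH value = {chsh:.4f}, sum of eight probabilities = {prob_sum:.4f}")
```

All four pairs $A_i \otimes I$ and $I \otimes B_j$ commute, while $A_0, A_1$ and $B_0, B_1$ do not, which is exactly the compatibility structure assumed in the CHSH scenario.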
The properties $A_i$ and $B_j$ can be measured sequentially, as in [18], and the measurement of one property does not disturb the measurement of the other. More precisely, such a measurement can be implemented in a setup in which the system goes through the measuring device corresponding to $A_i$ and then through one of the two measuring devices corresponding to $B_j$. The schematic representation of this setup is shown in Fig. 2 b). Because $A_i$ and $B_j$ commute, the results of the measurements do not depend on their order, i.e., $B_j$ can be measured before $A_i$. The measurements lead to four possible outcomes, which we denote by (++|ij), (+−|ij), (−+|ij) and (−−|ij). For example, the result (+−|ij) corresponds to $A_i = +1$ and $B_j = -1$.
A single run of the experiment makes one of the four detectors placed after the outputs click. The probabilities of these clicks are p(++|ij), p(+−|ij), p(−+|ij) and p(−−|ij). They can be estimated after many experimental runs and used to evaluate the correlations
$$\langle A_iB_j\rangle = p(++|ij) - p(+-|ij) - p(-+|ij) + p(--|ij).$$
One can observe a violation of the CHSH inequality if in each experimental run the photon is prepared in the same suitably chosen state and the measurements $A_i$ and $B_j$ are properly chosen. Although the setup is interpreted as a measurement of two random variables, it can also be viewed as a measurement of a single degenerate random variable $X_{ij}$ whose outcomes are products of the outcomes of $A_i$ and $B_j$. Therefore, $\langle A_iB_j\rangle = \langle X_{ij}\rangle$.
What would happen if in a single experimental run one used many identical photons or a classical beam of light? From our initial discussion we know that the average photon numbers at the outputs would be $Np(++|ij)$, $Np(+-|ij)$, $Np(-+|ij)$ and $Np(--|ij)$, where $N$ is the number of photons. In the classical limit one would deal with a beam of light whose output intensities would be $I(++|ij)$, $I(+-|ij)$, $I(-+|ij)$ and $I(--|ij)$, with $I(++|ij)/I = p(++|ij)$, etc., where $I$ is the input intensity.
In addition, one could consider the random variable
$$\mathcal{X}_{ij} = \frac{n(++|ij) - n(+-|ij) - n(-+|ij) + n(--|ij)}{N},$$
where $n(++|ij)$ is the number of photons in the output (++|ij), etc. For a single photon $A_iB_j = X_{ij} = \mathcal{X}_{ij}$. For $N > 1$ it is impossible to assign definite values of $A_i$ and $B_j$, or of $X_{ij}$, to individual photons. However, $\langle\mathcal{X}_{ij}\rangle$ can be evaluated, and in the classical limit one gets
$$\langle\mathcal{X}_{ij}\rangle = \frac{I(++|ij) - I(+-|ij) - I(-+|ij) + I(--|ij)}{I} = \langle A_iB_j\rangle. \tag{6}$$
Thus, it is possible to prepare a classical state of light such that
$$|\langle\mathcal{X}_{00}\rangle - \langle\mathcal{X}_{01}\rangle - \langle\mathcal{X}_{10}\rangle - \langle\mathcal{X}_{11}\rangle| = 2\sqrt{2} > 2.$$
This may lead to a discussion of whether classical light has some nonclassical properties [7,8,10,12-14,16-18]. In the following sections we show that for more than one photon the classical bound differs from 2. One needs to remember that although $\langle\mathcal{X}_{ij}\rangle$ does not depend on $N$, the random variable $\mathcal{X}_{ij}$ and the corresponding exclusivity structure of events strongly depend on $N$. Therefore, in order to understand what is really going on, it is better to examine the CHSH scenario from the point of view of events, not averages.
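The statement that $\langle\mathcal{X}_{ij}\rangle$ does not depend on $N$ can be illustrated with a Monte Carlo sketch. It assumes that the $N$ photons of a run fill the four outputs independently with the single-photon probabilities $p(ab|ij) = (1 + ab\,E_{ij})/4$, which holds when both marginals are uniform, and it assumes correlator values $E_{00} = 1/\sqrt{2}$, $E_{01} = E_{10} = E_{11} = -1/\sqrt{2}$, which saturate $2\sqrt{2}$ for the sign pattern used above. The names `sample_x` and `estimated_chsh` are hypothetical.

```python
import random
from math import sqrt

# Assumed correlators saturating |<A0B0> - <A0B1> - <A1B0> - <A1B1>| = 2*sqrt(2).
CORRELATORS = {(0, 0): 1 / sqrt(2), (0, 1): -1 / sqrt(2),
               (1, 0): -1 / sqrt(2), (1, 1): -1 / sqrt(2)}
OUTCOMES = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]   # (a, b) = outcomes of A_i and B_j

def output_probabilities(e_ij):
    """p(ab|ij) = (1 + a*b*E_ij) / 4, assuming uniform marginals for A_i and B_j."""
    return [(1 + a * b * e_ij) / 4 for a, b in OUTCOMES]

def sample_x(n_photons, probs, rng):
    """One run: N independent photons fill the four outputs; returns the N-photon variable X_ij."""
    hits = rng.choices(OUTCOMES, weights=probs, k=n_photons)
    return sum(a * b for a, b in hits) / n_photons

def estimated_chsh(n_photons, runs=2000, seed=1):
    """Average X_ij over many runs for each setting and combine them as in (3)."""
    rng = random.Random(seed)
    mean = {}
    for ij, e_ij in CORRELATORS.items():
        probs = output_probabilities(e_ij)
        mean[ij] = sum(sample_x(n_photons, probs, rng) for _ in range(runs)) / runs
    return mean[0, 0] - mean[0, 1] - mean[1, 0] - mean[1, 1]

for n in (1, 10, 100):
    print(f"N={n:3d}: estimated CHSH expression = {estimated_chsh(n):.3f}")
```

For every $N$ the estimate stays close to $2\sqrt{2}$, even though for $N > 1$ a single run no longer yields a ±1 value of $\mathcal{X}_{ij}$; the averages alone cannot reveal the change in the exclusivity structure.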

D. Exclusivity and classical bounds
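The CHSH inequality can be rewritten in terms of probabilities of detection events. Since the four probabilities for fixed settings sum to one, $\langle A_iB_j\rangle = 1 - 2[p(+-|ij) + p(-+|ij)] = 2[p(++|ij) + p(--|ij)] - 1$. Hence the sum of the eight probabilities below equals $2 + \frac{1}{2}(\langle A_0B_0\rangle - \langle A_0B_1\rangle - \langle A_1B_0\rangle - \langle A_1B_1\rangle)$, and inequality (3) implies
$$p(+-|11) + p(-+|11) + p(+-|01) + p(-+|01) + p(+-|10) + p(-+|10) + p(++|00) + p(--|00) \le 3. \tag{8}$$
This inequality can be derived in a completely different way. The upper bound equal to three comes from the exclusivity structure of events. Firstly, the events (++|ij), (−−|ij), (+−|ij) and (−+|ij) are pairwise exclusive, because they correspond to different outcomes of the same measurements. For example, (++|00) cannot happen together with (−−|00). In addition, two events are exclusive if they share a measurement setting and have different outcomes for that measurement. This means that (+#|ij) is exclusive to (−#|ik), and (#+|ij) is exclusive to (#−|kj), where # denotes an arbitrary outcome. For example, (+−|10) is exclusive to (−+|11), and (−−|00) is exclusive to (−+|10). Such an example can be realised in quantum theory by events corresponding to projections onto the states $|0\rangle\otimes|0\rangle$ and $|1\rangle\otimes(\alpha|0\rangle + \beta|1\rangle)$.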
Although $|0\rangle$ and $\alpha|0\rangle + \beta|1\rangle$ are in general nonorthogonal, exclusivity is guaranteed by the orthogonality of $|0\rangle$ and $|1\rangle$ in the first Hilbert space. This type of exclusivity can be verified in the sequential scenario represented in Fig. 2 b), in which the second left measuring device is set to $B_0$ and the second right one to $B_1$. The exclusivity structure of the eight events can be represented by the exclusivity graph [22], whose vertices correspond to events and whose edges correspond to exclusivity between two events; see Fig. 2 c). The upper bound of (8) is derived under the assumption that the eight events are jointly distributed [25]. The joint probability distribution (JPD) is constructed over all possible assignments of 1/0 (true/false) values to these events. In principle there are $2^8$ possible assignments; however, the value 1 cannot be simultaneously assigned to two exclusive events, which significantly reduces the number of possible assignments. The maximum value of the sum of the eight probabilities is given by the maximal number of events that can be assigned the value 1. Finding this number is equivalent to the graph-theoretical problem known as the maximum independent set [22]. An independent set of a graph is a set of pairwise non-adjacent vertices, and we are looking for such a set with the largest possible number of vertices. In general this problem is NP-hard, but for our eight-vertex graph it is easily solved. Note that the set of events assigned 1 must be an independent set of the exclusivity graph, since no two events from such a set can be exclusive. One readily finds that the maximum independent set of the graph in Fig. 2 c) contains three vertices. Therefore, the sum of the eight probabilities cannot be larger than three if these probabilities originate from some JPD. Not surprisingly, quantum theory can reach $2 + \sqrt{2}$, which cannot be modelled with any JPD.
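The independence-number argument is small enough to check by brute force. The sketch below encodes the eight events of (8) as tuples (a, b, i, j), builds the exclusivity edges from the rule stated above (shared setting, different outcome for it), and enumerates all $2^8$ true/false assignments. The names `EVENTS`, `exclusive` and `maximum_independent_set` are hypothetical.

```python
from itertools import combinations, product

# The eight events of inequality (8) as (a, b, i, j): outcome a of A_i, outcome b of B_j.
EVENTS = [("+", "-", 1, 1), ("-", "+", 1, 1), ("+", "-", 0, 1), ("-", "+", 0, 1),
          ("+", "-", 1, 0), ("-", "+", 1, 0), ("+", "+", 0, 0), ("-", "-", 0, 0)]

def exclusive(e, f):
    """Exclusive if the events share A_i with different a, or share B_j with different b."""
    a1, b1, i1, j1 = e
    a2, b2, i2, j2 = f
    return (i1 == i2 and a1 != a2) or (j1 == j2 and b1 != b2)

EDGES = [(u, v) for u, v in combinations(range(len(EVENTS)), 2)
         if exclusive(EVENTS[u], EVENTS[v])]

def maximum_independent_set():
    """Brute force over all 2^8 true/false assignments (fine for eight vertices)."""
    best = []
    for bits in product((0, 1), repeat=len(EVENTS)):
        if any(bits[u] and bits[v] for u, v in EDGES):
            continue  # value 1 on two exclusive events is forbidden
        chosen = [k for k, bit in enumerate(bits) if bit]
        if len(chosen) > len(best):
            best = chosen
    return best

mis = maximum_independent_set()
print("number of exclusivity edges:", len(EDGES))
print("independence number:", len(mis), [EVENTS[k] for k in mis])
```

The graph has twelve edges, every event is exclusive to exactly three others, and no admissible 0/1 assignment puts the value 1 on more than three events, reproducing the bound of three in (8).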

E. (Non-)contextuality of many indistinguishable particles and a proper classical bound
The 1/0 assignment corresponds to a deterministic non-contextual (NC) model. The photon is assigned at most one event from each set of pairwise exclusive events. Such a set forms a measurement context, i.e., a set of events that can be jointly measured. If a context is complete, i.e., it consists of all possible measurement outcomes, the photon is assigned exactly one event. However, in the scenario considered here none of the contexts is complete: each contains exactly two events.
To properly discuss the problem of non-classicality of correlations in Bell-type scenarios with classical light, we need to redefine the exclusivity graph model introduced above so that the transition from a single photon to a macroscopic electromagnetic wave becomes transparent. Instead of assigning events to a photon, one should assign a photon to an event. This is a subtle difference, but it leads to fundamental consequences once we deal with more than one photon. More precisely, a photon is assigned to at most one event in each measurement context, where 1 corresponds to a photon and 0 to a no-photon event. In this new picture the events can be regarded as modes and 1/0 as occupation numbers. The NC model assigns a well-defined occupation number to each mode. Exclusivity then expresses conservation of the particle number: since there is a single photon in the system, there can be at most a single photon in each context. If two exclusive events were assigned 1, there would exist a context containing two photons, which would contradict conservation of the particle number. This interpretation was first proposed in [26] and is discussed in detail in the Methods section. It should be emphasised that the model is very general and describes the single-photon-to-classical-wave transition in Bell-type scenarios irrespective of the specific physical implementation, which may take many different forms [7,10,14,16-19].
One can now rewrite inequality (8) as
$$n(+-|11) + n(-+|11) + n(+-|01) + n(-+|01) + n(+-|10) + n(-+|10) + n(++|00) + n(--|00) \le C, \tag{9}$$
where $n(++|ij)$, etc., are the occupation numbers of the corresponding events and $C$ is the NC bound on the sum of these numbers, which in the case of a single photon equals three. A single photon can violate this bound, since quantum mechanically the left-hand side averaged over runs reaches $2 + \sqrt{2}$. Now consider the same CHSH scenario, but this time inject two indistinguishable photons into the system. Exclusivity and particle-number conservation imply that there can be at most two photons per context, so the possible occupation numbers are 0, 1 or 2. Since each context consists of only two events, one can assign a single photon to each event, see Fig. 3. Therefore, for two photons $C = 8$, which is the maximal possible sum of non-contextually assigned occupation numbers over all events. We see that the bound depends on the number of particles. Dividing (9) by $N$, we rewrite it as
$$\frac{1}{N}\bigl[n(+-|11) + n(-+|11) + n(+-|01) + n(-+|01) + n(+-|10) + n(-+|10) + n(++|00) + n(--|00)\bigr] \le P_C(N), \tag{10}$$
where $P_C(N) = C/N$. For a single photon $P_C(1) = 3$, whereas for two photons $P_C(2) = 4$. Therefore, for $N = 2$ there is no violation and the measurements can be described by NC occupation-number assignments. As far as we know, the value $P_C(2) = 4$ cannot be reached in any experimental setup, although it is allowed in our model. This is because the maximal quantum value of $2 + \sqrt{2}$ attainable in the CHSH scenario does not depend on the physical implementation of the experiment. In particular, it does not depend on the dimension of the state space of the physical system, which in our case translates to independence of the particle number $N$.
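The occupation-number bound $C$ can be computed by exhaustive search for small $N$. This sketch takes, as the text states, each context to be a pair of exclusive events and requires the two occupation numbers in every context to sum to at most $N$; it then maximises the total occupation in (9). The names `CONTEXTS` and `nc_bound` are hypothetical, and the search over $(N+1)^8$ assignments is only practical for a few photons.

```python
from itertools import combinations, product

# Events (a, b, i, j) of inequality (9); a context is a pair of exclusive events.
EVENTS = [("+", "-", 1, 1), ("-", "+", 1, 1), ("+", "-", 0, 1), ("-", "+", 0, 1),
          ("+", "-", 1, 0), ("-", "+", 1, 0), ("+", "+", 0, 0), ("-", "-", 0, 0)]

def exclusive(e, f):
    """Events sharing A_i (or B_j) with different outcomes for it are exclusive."""
    a1, b1, i1, j1 = e
    a2, b2, i2, j2 = f
    return (i1 == i2 and a1 != a2) or (j1 == j2 and b1 != b2)

CONTEXTS = [(u, v) for u, v in combinations(range(len(EVENTS)), 2)
            if exclusive(EVENTS[u], EVENTS[v])]

def nc_bound(n_photons):
    """Max of sum_e n(e) over n(e) in {0..N} with n(e) + n(f) <= N in every context."""
    best = 0
    for occ in product(range(n_photons + 1), repeat=len(EVENTS)):
        if all(occ[u] + occ[v] <= n_photons for u, v in CONTEXTS):
            best = max(best, sum(occ))
    return best

for n in range(1, 5):
    c = nc_bound(n)
    print(f"N={n}: C={c}, P_C(N)=C/N={c / n:.4f}")
```

The search gives $C = 3, 8, 11, 16$ for $N = 1, \dots, 4$, i.e., $P_C(N) = 3, 4, 4 - 1/3, 4$, in line with the even/odd pattern stated for general $N$.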
Finally, let us consider the classical limit $\langle N\rangle \gg 1$. This time the system is described by a classical light beam of intensity $I$, for which inequality (10) reads
$$\frac{1}{I}\bigl[I(+-|11) + I(-+|11) + I(+-|01) + I(-+|01) + I(+-|10) + I(-+|10) + I(++|00) + I(--|00)\bigr] \le P_{cl}, \tag{11}$$
where $N$ in the denominator of (10) was replaced by $\langle N\rangle$ because of the particle-number uncertainty. The maximal experimentally attainable value of the left-hand side is still $2 + \sqrt{2}$, because a classical beam in any linear optical setup behaves in the same way as a single-photon probability amplitude. The right-hand side can be evaluated in two ways. Firstly, in the classical limit the total intensity $I$ that is distributed between the events can be treated as a continuous quantity. Therefore, in the NC model one can assign $I/2$ to each event, and as a result $P_{cl} = 4$. This is the main result of this section: the corresponding CHSH inequality (11) cannot be violated by classical light.
The other approach, which also confirms the above result, does not assume that the intensity is a continuous quantity. We consider two cases. First, let us take even $N$. The number of photons per context cannot be greater than $N$, and since each context contains two events, one simply assigns $N/2$ photons to each event. This leads to $P_C(N) = 4$. Next, consider odd $N$. In this case it is easy to show that one can assign $(N-1)/2$ photons to five events and $(N+1)/2$ photons to three events such that there are at most $N$ photons in each context, see Fig. 4. As a result, $P_C(N) = 4 - 1/N$. In the classical limit $N$ is undetermined; therefore $P_{cl} = \langle P_C(N)\rangle$. However, since $\langle N\rangle \gg 1$, the dominant terms in $\langle P_C(N)\rangle$ correspond to large values of $N$, and hence $P_{cl} \approx 4$. To conclude, in the experiments discussed above light beams, as expected, do not exhibit any quantum behaviour. Quantum behaviour is only possible for a single photon; already for $N > 1$ one observes non-contextual (classical) behaviour. This is because for $N \ge 2$ the non-contextual bound $P_C(N)$ is either 4 or $4 - 1/N$, which is always greater than the physically attainable value $2 + \sqrt{2} \approx 3.41$. We have only considered the CHSH scenario; however, in the Methods section we show that for an arbitrary contextuality scenario the bound $P_{cl}$ is always greater than or equal to what can be achieved in the classical limit, and therefore classical systems are always non-contextual and can never violate any Bell-type inequality.
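The averaging step $P_{cl} = \langle P_C(N)\rangle$ can be made concrete with a short numerical sketch. It assumes Poissonian photon-number statistics, as for the coherent state mentioned in the discussion of the classical limit, conditions on $N \ge 1$ (since $P_C$ is undefined for $N = 0$), and uses the even/odd formula for $P_C(N)$ quoted above. The helper names `p_c` and `poisson_averaged_bound` are hypothetical.

```python
from math import exp, lgamma, log, sqrt

def p_c(n: int) -> float:
    """NC bound per photon quoted in the text: 4 for even N, 4 - 1/N for odd N."""
    return 4.0 if n % 2 == 0 else 4.0 - 1.0 / n

def poisson_averaged_bound(mean_n: float) -> float:
    """<P_C(N)> over an assumed Poissonian photon number, conditioned on N >= 1."""
    spread = 12 * sqrt(mean_n) + 12            # window holding essentially all the weight
    lo, hi = max(1, int(mean_n - spread)), int(mean_n + spread)
    # Poisson weights computed in log space to avoid overflow for large <N>.
    weights = [exp(n * log(mean_n) - mean_n - lgamma(n + 1)) for n in range(lo, hi + 1)]
    values = [p_c(n) for n in range(lo, hi + 1)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

for mu in (10, 100, 10_000):
    print(f"<N>={mu:6d}: P_cl ~ {poisson_averaged_bound(mu):.5f}")
```

Since roughly half of the weight sits on odd $N$ with $1/N \approx 1/\langle N\rangle$, the average behaves roughly like $4 - 1/(2\langle N\rangle)$ and stays well above $2 + \sqrt{2}$ already for moderate mean photon numbers.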

II. DISCUSSION
It is commonly accepted that the violation of some Bell inequality is an indicator of non-classical behaviour. It turns out that these violations are strongly related to wave-particle duality. In quantum theory the same object can manifest wave-like behaviour in one experiment and particle-like behaviour in another. Wave-particle duality does not occur in classical theory, i.e., classical objects are either waves or particles, but never both. No one doubts that classical particles are localised and discrete, whereas classical waves are delocalised and continuous.
Bell inequalities are derived under the assumption that measured properties, such as position, are well defined and that the phenomenon of superposition does not occur. This is a typical particle-like approach, which can be partially justified in the quantum regime by the fact that at the end of each experiment one registers a single click of some detector. However, the violation of the corresponding inequality implies that this assumption needs to be reconsidered, because the measurements in Bell tests exploit the phenomenon of superposition; in this sense the violation is a manifestation of wave-like behaviour.
The situation is different in the classical theory in which there is no wave-particle duality. In particular, the classical light behaves as a wave all the time and instead of clicks one registers continuous intensities. Therefore, there is no justification to apply the standard Bell inequality to such system. In this case Bell inequalities need to be rederived using properly chosen assumptions. This is what we show in this work. Our results have the following implications on the previously discussed Bell-type tests with classical light. It was stressed by Snoke [17] and Qian et.al. [16] that violation of some Bell inequality by classical light is in fact a confirmation of the presence of some kind of entanglement. Here we show that the value of 2 + √ 2 obtainable by classical light can indicate some form of strong correlations, but these correlations are fully describable by a noncontextual model. Therefore, classical entanglement does not give rise to contextual behaviour, it does not have any features of quantum entanglement apart from mathematical analogy on the level of complex vector spaces.
Next, Frustaglia et al. [18] argued that the bounds on the strength of quantum correlations (the so-called Tsirelson bounds) are not restricted to quantum theory but arise naturally in classical systems that simulate quantum correlations. There have been attempts to derive these bounds using only the exclusivity structure of detection events [27,28]. However, we have shown that in the CHSH scenario discussed above the classical bound resulting from the exclusivity structure of detection events is $P_{cl} = 4$. Therefore, the value $2+\sqrt{2}$ in the classical regime must originate from additional physical constraints.
It is interesting to understand why quantum systems and classical light lead to the same value. The reason is that quantum probability amplitudes and classical electromagnetic amplitudes transform in the same way (apart from the nondeterministic quantum collapse induced by measurement). This is why classical wave theory can simulate some aspects of quantum theory. Nevertheless, the two kinds of amplitudes have completely different interpretations. The value $2+\sqrt{2}$ would be much more fundamental if it resulted both from the exclusivity structure of events and from the transformations allowed by the theory.
Finally, we note that our work leads to an open problem: is it possible to find a classical system, other than a classical wave, that would provide a value larger than $2+\sqrt{2}$? Perhaps some additional constraints prevent this from happening.

A. Nonlocality and contextuality
The original Bell inequalities are statistical tests that verify whether the statistics of outcomes of measurements performed on spatially separated systems fulfill the conditions of locality and realism [9]. Locality means that no action executed on one system affects the other. Realism means that each measurable property has a well-defined outcome, irrespective of whether this property is measured or not. Initially, Bell-type tests were derived from the properties of probability distributions for correlations of measurement outcomes [9]. Fine [25] recognised early on that satisfying the Bell inequalities is equivalent to the existence of a joint probability distribution (JPD) for all measurable properties.
The violation of Bell inequalities by quantum systems, typically referred to as quantum nonlocality, is an example of a broader class of phenomena: quantum contextuality [21]. Simply put, contextuality is a property of a physical system whereby the outcome of a measurement of some property A may depend on whether A is measured together with B or with C. In a typical contextuality scenario one considers a system with a number of measurable properties. The goal is to find a noncontextual (NC) assignment of outcomes to all measurements, i.e., to assign an outcome to each property in a way that does not depend on what is assigned to the other properties. Just as in the case of nonlocality, contextuality is equivalent to the lack of a JPD for all measurable properties [25]. Finally, note that contextuality scenarios do not require space-like separated measurements. Bell-type scenarios therefore constitute a subclass of contextuality scenarios, in which the commeasurable observables can be identified with spatially separated physical systems [29]. This is why we focus on the more general phenomenon of contextuality.
Contextuality scenarios underline the role of the exclusivity structure of events behind Bell-type inequalities, which is not explicit in the standard correlation-based approach. In a series of papers [22,27,30,31] Cabello et al. show that all contextuality tests can be derived from the exclusivity structure of measurement events. Assume that the set of all events $v_i$, whose precise meaning is defined separately in each physical scenario, can be decomposed into subsets $C_k = \{v_r\}$ called measurement contexts, such that all the events in a single context correspond to outcomes of some experiment. Therefore, the events in a single context can be jointly measured. Because of various constraints, some of the events cannot be simultaneously true, a property known in probability theory as exclusivity. We point out that exclusivity is a fundamental notion of Kolmogorovian probability theory, since elementary events, which constitute the sample space of any statistical model, must be mutually exclusive.
The exclusivity structure of the set of events for a contextuality test can be represented by the exclusivity graph [22], in which adjacent vertices represent exclusive events. An example of such a graph is presented in Fig. 2c). For commeasurable observables $A_i$ with outcomes $a_i$, each vertex represents an event that is a conjunction of all single detection events for a fixed experimental context, namely $v_k = (a_1, \dots, a_n | A_1, \dots, A_n)$. Here (with a slight abuse of notation) the set $\{a_i\}$ represents the actual measurement outcomes obtained for the fixed properties $\{A_i\}$ that constitute a measurement context. Two events $v_a = (a_1, \dots, a_n | A_1, \dots, A_n)$ and $v_b = (b_1, \dots, b_n | B_1, \dots, B_n)$ in a contextuality scenario are exclusive if and only if
$$\exists\, i, j: \quad A_i = B_j \quad \text{and} \quad a_i \neq b_j,$$
i.e., the two events assign different outcomes to the same observable. Having defined the exclusivity structure for a given scenario, one may attach different kinds of probabilistic structures to the set of events. We stress that the phrase "probabilistic structure" does not refer to a Kolmogorovian model. Instead, it can be understood within the framework of generalized probabilistic theories [32]. The models can be ordered with respect to the strength of allowed correlations, and this strength can be measured by the maximal value of $P = \sum_i p(v_i)$ allowed within the model. We discuss three classes of models. The most restrictive one is the classical Kolmogorovian model. It assumes that all events can be attributed to a single sample space (not necessarily as atomic events; compound events are allowed as well), and that a joint probability distribution exists for all $v_i$. The maximal value of $P$ is then denoted by $P_C$ and known as the noncontextual hidden variable (NCHV), or simply classical, bound. It is given by the vertex independence number of the exclusivity graph, i.e., the maximal number of pairwise non-adjacent vertices. To rephrase, the classical bound $P_C$ corresponds to the maximal number of events that can be assigned the truth value 1.
This mathematical model represents a physical situation where all observables are commeasurable.
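To make the classical bound concrete, the sketch below computes the vertex independence number by brute force for the eight-event CHSH exclusivity graph. We assume that this is the graph of Fig. 2c). Each event is encoded as a tuple $(a, b, x, y)$ with outcomes $a, b$ and settings $x, y$ satisfying $a \oplus b = xy$. The helpers `EVENTS`, `exclusive` and `independence_number` are hypothetical illustrations and are not taken from the original work.

```python
from itertools import combinations, product

# Assumed encoding of the CHSH exclusivity graph: the eight events (a, b, x, y)
# with outcomes a, b, settings x, y and (a XOR b) == (x AND y).
EVENTS = [(a, b, x, y) for x, y, a, b in product((0, 1), repeat=4) if (a ^ b) == (x & y)]

def exclusive(e, f):
    """Two events are exclusive if they share a setting on one side
    but report different outcomes for that setting."""
    a1, b1, x1, y1 = e
    a2, b2, x2, y2 = f
    return (x1 == x2 and a1 != a2) or (y1 == y2 and b1 != b2)

def independence_number(vertices, adjacent):
    """Brute-force vertex independence number (only 2^8 = 256 subsets here)."""
    best = 0
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            if all(not adjacent(u, v) for u, v in combinations(subset, 2)):
                best = r  # r only grows, so this is the largest size found so far
    return best

P_C = independence_number(EVENTS, exclusive)
print(len(EVENTS), P_C)  # 8 3
```

Brute force is only practical for small graphs. Here it returns 3, the classical bound $P_C$ quoted for this scenario.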
The second model, involving stronger correlations, is the quantum probabilistic model [33]. Here each event $v_i$ corresponds to a projector $P(v_i)$ and is assigned the probability $p(v_i) = \mathrm{Tr}[\rho P(v_i)]$, where $\rho$ represents a quantum state. For a context $C_k$ consisting of mutually exclusive events, the corresponding projectors form a mutually orthogonal set and therefore
$$\sum_{v_i \in C_k} P(v_i) \le \mathbb{1}.$$
This means that either (equality) the projectors within a context represent a von Neumann measurement or (strict inequality) they can be extended to a von Neumann measurement. The maximum value $P_Q$ of $P$ for a quantum model, known as Tsirelson's bound, is bounded by the Lovász number of the exclusivity graph [22]. This bound is often tight [30]. The quantum model is contextual if and only if $P_C < \sum_i p(v_i) \le P_Q$. This means that outcome probabilities can be assigned within each context separately, but the model cannot be extended to a single Kolmogorovian probabilistic model. The third model, leading to the strongest correlations, is defined by the sole demand that its probabilistic structure fulfills the exclusivity (E) principle in its final version [27]. Another variant of this principle, known as local orthogonality, was independently developed in [28]. The E principle states that the sum of probabilities corresponding to a subset of pairwise exclusive events is bounded by 1. This principle is a conjunction of two facts: (1) the principle that Cabello calls Specker's principle [31], which states that a set of pairwise exclusive events is jointly exclusive, and (2) the property of a Kolmogorovian model that the sum of probabilities of jointly exclusive events is at most 1. Therefore, a probabilistic model obeying the E principle can be defined as a set of numbers $0 \le p(v_i) \le 1$ fulfilling the normalization condition $\sum_{v_i \in C_k} p(v_i) \le 1$ for each context.
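As an illustration of the quantum model, the following sketch assigns $p(v_i) = \mathrm{Tr}[\rho P(v_i)]$ to the same eight CHSH events. It assumes a two-qubit maximally entangled state and the textbook CHSH-optimal measurement angles; both choices are ours and are not specified in the text. It also checks that the four joint projectors of each context sum to the identity, i.e., that they form a von Neumann measurement.

```python
import numpy as np
from itertools import product

def projector(theta, outcome):
    """Rank-1 real projector: outcome 0 projects onto (cos t, sin t),
    outcome 1 onto the orthogonal vector (-sin t, cos t)."""
    if outcome == 0:
        v = np.array([np.cos(theta), np.sin(theta)])
    else:
        v = np.array([-np.sin(theta), np.cos(theta)])
    return np.outer(v, v)

# Assumed settings: standard CHSH-optimal angles for |Phi+> = (|00> + |11>)/sqrt(2).
ALICE = {0: 0.0, 1: np.pi / 4}
BOB = {0: np.pi / 8, 1: -np.pi / 8}
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi)

total = 0.0
for x, y in product((0, 1), repeat=2):
    # The four joint projectors of one context form a von Neumann measurement.
    joint = {(a, b): np.kron(projector(ALICE[x], a), projector(BOB[y], b))
             for a, b in product((0, 1), repeat=2)}
    assert np.allclose(sum(joint.values()), np.eye(4))
    for (a, b), proj in joint.items():
        if (a ^ b) == (x & y):  # the eight events of the CHSH exclusivity graph
            total += np.trace(rho @ proj).real

print(total, 2 + np.sqrt(2))
```

The sum of the eight probabilities reaches $2+\sqrt{2} \approx 3.414$, the Tsirelson value of this scenario.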
Note that both the classical Kolmogorovian model and the quantum model obey the E principle. This does not mean that every model obeying the E principle is classical or quantum. There are models obeying the E principle that are more contextual than quantum theory, because the maximal value $P_E$ of $P$ for a model obeying the E principle can be greater than $P_Q$. One can derive $P_E$ from the properties of the exclusivity graph. The detailed derivation is given in the appendix of [34]. It shows that the maximal possible value is the fractional packing number of the graph, defined as
$$P_E = \max_{p} \sum_i p(v_i) \quad \text{subject to} \quad p(v_i) \ge 0, \quad \sum_{v_i \in S} p(v_i) \le 1 \ \text{for every clique } S.$$
The constraints run over all cliques $S$ of the graph. A clique is a subset of mutually adjacent vertices (pairwise exclusive events), so the cliques of an exclusivity graph include all the contexts $C_k$. The Lovász number is bounded from above by the fractional packing number, and in many cases the fractional packing number is strictly larger.
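For the CHSH graph assumed above, the fractional packing number can be certified without a linear-program solver. A feasible assignment gives a lower bound, and a partition of the vertices into cliques gives a matching upper bound. The sketch relies on two properties of this particular graph, which it checks itself: the graph is triangle-free, and its four contexts partition the vertex set. For a general graph one would instead solve the linear program directly. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Same assumed encoding of the eight CHSH events (a, b, x, y) with (a XOR b) == (x AND y).
EVENTS = [(a, b, x, y) for x, y, a, b in product((0, 1), repeat=4) if (a ^ b) == (x & y)]

def exclusive(e, f):
    """Exclusive: same setting on one side, different outcome for it."""
    return (e[2] == f[2] and e[0] != f[0]) or (e[3] == f[3] and e[1] != f[1])

def all_cliques(vertices, adjacent):
    """Every non-empty set of pairwise exclusive events (brute force over 2^8 subsets)."""
    return [s for r in range(1, len(vertices) + 1)
            for s in combinations(vertices, r)
            if all(adjacent(u, v) for u, v in combinations(s, 2))]

CLIQUES = all_cliques(EVENTS, exclusive)
omega = max(len(s) for s in CLIQUES)  # largest clique size

# Lower bound on P_E: p = 1/2 on every event satisfies every clique constraint.
p = {v: Fraction(1, 2) for v in EVENTS}
assert all(sum(p[v] for v in s) <= 1 for s in CLIQUES)
lower = sum(p.values())

# Upper bound on P_E: the four contexts (x, y) are disjoint cliques covering all
# eight events, so adding their constraints gives sum_i p(v_i) <= 4.
contexts = {}
for v in EVENTS:
    contexts.setdefault((v[2], v[3]), []).append(v)
assert all(tuple(c) in CLIQUES for c in contexts.values())
upper = len(contexts)

print(omega, lower, upper)  # 2 4 4
```

Because the two bounds coincide, $P_E = 4$ for this graph, which exceeds both $P_C = 3$ and $2+\sqrt{2}$.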
To conclude, the bounds for the three models obey the relation
$$P_C \le P_Q \le P_E.$$
For example, for the exclusivity graph presented in Fig. 2c) one has $P_C = 3$, $P_Q = 2+\sqrt{2}$ and $P_E = 4$. In general, if the sum of probabilities of events in the exclusivity graph is bounded by $P_C$, we say that the model is noncontextual. Otherwise it is contextual.

B. Contextuality and indistinguishability
Now we discuss events that occur in scenarios in which, instead of a single photon, one uses $N$ indistinguishable photons. As the number of photons $N$ increases, so does the number of possible detection events. This affects the structure of the exclusivity graph and the bounds $P_C$, $P_Q$ and $P_E$. Instead of changing the graph, one can keep the exclusivity graph in the same form and change the range of values assigned to each event, which warrants a slight redefinition of the entire scenario. Here we focus on the classical Kolmogorovian model because we want to find a criterion under which a system of $N$ indistinguishable photons is noncontextual. We show that within the Kolmogorovian model the bound $P_C$ depends on $N$, i.e., $P_C = P_C(N)$. Finally, we show that in the classical limit $P_{cl} = P_E$. Classical light, for which the ratios of output intensities to the initial intensity equal the probabilities generated by a single photon, is therefore always noncontextual.
For $N = 1$ one assigns truth values 1/0 to each event and tries to maximise the number of events assigned 1 under the exclusivity constraint. The value 1 corresponds to events that will be observed in an experiment, provided a proper measurement is performed, whereas 0 corresponds to events that will not be observed. Alternatively, 1 can be interpreted as the assignment of a photon to an event and 0 as the assignment of the absence of a photon. One can generalise this approach to the case $N > 1$. This time each event $v_i$ is assigned a value $n_i = 0, 1, \dots, N$, which can be interpreted as an occupation number. This approach was suggested in [26]. The exclusivity of $v_i$ and $v_j$ translates to $n_i + n_j \le N$. In general, the occupation numbers in each context obey $\sum_{v_i \in C_k} n_i \le N$, i.e., the number of particles per context cannot exceed the total number of particles. If the context consists of all measurement outcome events, then the number of particles in it must be equal to $N$, which corresponds to particle-number conservation. In full analogy to the original model, which assigns probabilities to the events, the strength of correlations for assigning particles to the events is represented by the expression
$$P(N) = \frac{1}{N} \sum_i n_i,$$
where the sum is taken over all events in the exclusivity graph. The Kolmogorovian model of the original scenario translates to an occupation-based model with a fixed global assignment of occupation numbers.
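The occupation-number model can be explored by brute force for small $N$. As a hypothetical test case we use the 5-cycle exclusivity graph of the KCBS test [36]. Its cliques are its edges, so the context constraints reduce to $n_i + n_j \le N$ for adjacent events. The helper `occupation_bound` is our own illustration, not code from the original work.

```python
from fractions import Fraction
from itertools import product

def occupation_bound(n_vertices, edges, N):
    """Brute-force P_C(N): maximise (1/N) * sum_i n_i over integers n_i in {0, ..., N}
    subject to n_i + n_j <= N for every pair of exclusive events.
    For a cycle of length >= 4 the cliques are exactly its edges, so these
    pairwise constraints are all the context constraints."""
    best = 0
    for n in product(range(N + 1), repeat=n_vertices):
        if all(n[i] + n[j] <= N for i, j in edges):
            best = max(best, sum(n))
    return Fraction(best, N)

# Hypothetical choice of graph: the 5-cycle of the KCBS test.
C5_EDGES = [(i, (i + 1) % 5) for i in range(5)]

for N in range(1, 6):
    print(N, occupation_bound(5, C5_EDGES, N))
```

The printed values are $\lfloor 5N/2 \rfloor / N$, namely 2, 5/2, 7/3, 5/2 and 12/5. For $N = 1$ this is the ordinary classical bound 2. Every even $N$ already reaches 5/2, the E-principle bound of this graph, and the odd values approach it as $N$ grows.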
We are looking for the maximal value $P_C(N)$ of the expression $P(N)$, optimized over all allowed NC global assignments. For $N = 1$ the NC occupation-based model has the same constraints as the deterministic model with assigned probabilities, thus $P_C(1) = P_C$. However, for $N > 1$ the constraints are different and in general $P_C(N) \neq P_C$. In particular, we are interested in the classical limit $\langle N \rangle \gg 1$. The system corresponding to a classical light beam is modelled by a collection of photons whose total number is undetermined but whose average number is large. We are looking for $P_{cl}$, the maximum of $\frac{1}{\langle N \rangle} \sum_i \langle n_i \rangle$. We write $I_i \equiv \langle n_i \rangle$ and call $I_i$ the intensity assigned to the event $v_i$. There is a fundamental difference between intensities and occupation numbers. In general $n_i \neq \langle n_i \rangle$. However, the classical limit is deterministic and therefore $I_i = \langle I_i \rangle$. In reality $I_i$ fluctuates as $\sqrt{\langle N \rangle}$, but we will come back to this problem in a moment. Moreover, unlike occupation numbers, intensities can be treated as continuous variables $I_i \in [0, I]$, where $I$ is the total intensity. In this case we are looking for
$$P_{cl} = \max \sum_i \frac{I_i}{I},$$
where $I = \langle N \rangle$ is the total intensity. The ratios $0 \le I_i/I \le 1$ are continuous, and the only constraint on them is that within each context $\sum_{v_i \in C_k} I_i/I \le 1$. As mentioned above, $I_i$ is not truly deterministic, but the ratio $I_i/I$ fluctuates as $1/\sqrt{\langle N \rangle}$. In the classical limit $\langle N \rangle \gg 1$ these fluctuations are therefore of no importance to us.
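The $1/\sqrt{\langle N \rangle}$ scaling of the relative-intensity fluctuations can be checked numerically. The sketch assumes a coherent beam split into two output modes with fraction $q$, so that the photon counts in the two modes are independent Poisson variables. The helper `ratio_spread` and its default parameters are hypothetical. The product of the spread and $\sqrt{\langle N \rangle}$ should stay close to $\sqrt{q(1-q)}$, which is 0.5 for $q = 1/2$.

```python
import numpy as np

def ratio_spread(mean_photons, fraction=0.5, samples=20000, seed=0):
    """Standard deviation of the detected ratio n_1 / (n_1 + n_2) for a coherent beam
    split into two modes; for a coherent state the mode counts are independent Poisson."""
    rng = np.random.default_rng(seed)
    n1 = rng.poisson(fraction * mean_photons, samples)
    n2 = rng.poisson((1 - fraction) * mean_photons, samples)
    total = np.maximum(n1 + n2, 1)  # guard against an (astronomically unlikely) empty run
    return np.std(n1 / total)

for N in (10**2, 10**4, 10**6):
    s = ratio_spread(N)
    print(N, s, s * np.sqrt(N))  # last column stays close to sqrt(q(1-q)) = 0.5
```

Increasing $\langle N \rangle$ by a factor of 100 shrinks the spread of $I_i/I$ by roughly a factor of 10.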
By taking $p_i = I_i/I$, the above model becomes equivalent to the third model of the original scenario, which only needs to obey the E principle. Therefore, the noncontextuality bound in the classical limit is $P_{cl} = P_E$. Since classical light passing through a linear optical setup evolves in the same way as single-particle quantum amplitudes, we have
$$\sum_i \frac{I_i}{I} = \sum_i p(v_i) \le P_Q \le P_E = P_{cl},$$
which proves that classical light is always noncontextual.
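The chain of inequalities can be illustrated with a purely classical field. The sketch assumes a beam with two binary degrees of freedom (say, polarization and path) prepared in a non-separable mode, in the spirit of the "classical entanglement" setups discussed above. Each local analyzer is modelled as a lossless real rotation of one degree of freedom. The relative intensities $I_i/I$ at the output ports then coincide with the single-photon probabilities, every context sums to 1 by energy conservation, and the eight CHSH events add up to $2+\sqrt{2}$, below $P_{cl} = 4$. The mode layout, angles and total intensity are our assumptions.

```python
import numpy as np
from itertools import product

def rotation(theta):
    """Lossless real 2x2 element (e.g. a rotated analyzer basis or a beam splitter)
    mapping the two modes of one degree of freedom onto two output ports."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

# Hypothetical non-separable classical mode: equal amplitude on (0, 0) and (1, 1).
I_total = 1.0e6  # arbitrary total intensity; only the ratios I_i / I matter
field = np.sqrt(I_total) * np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
ALICE = {0: 0.0, 1: np.pi / 4}
BOB = {0: np.pi / 8, 1: -np.pi / 8}

score = 0.0
for x, y in product((0, 1), repeat=2):
    out = np.kron(rotation(ALICE[x]), rotation(BOB[y])) @ field
    ratios = np.abs(out) ** 2 / I_total  # I_i / I at the four output ports (index 2a + b)
    assert np.isclose(ratios.sum(), 1.0)  # a context never exceeds the total intensity
    for a, b in product((0, 1), repeat=2):
        if (a ^ b) == (x & y):
            score += ratios[2 * a + b]

print(score)  # about 3.414, i.e. 2 + sqrt(2)
```

Only the linear, amplitude-level evolution of the field enters this calculation, which is why the classical value matches the quantum one rather than the exclusivity bound $P_{cl}$.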
C. Non-applicability of the methods used to derive the quantum bound $P_Q$ with the E principle
In [27] a general method for deriving the maximal quantum bound of a given Bell-type scenario was introduced. This method is based on the idea that the E principle can be applied not only to a single copy of a Bell-type test represented by some exclusivity graph $G$, but also to $k$ independent runs of identical Bell tests carried out on different physical systems. Such a compound experiment is described by an exclusivity graph given by the so-called disjunctive product $G^{k}$ of $k$ copies of the graph $G$ [35]. It is assumed that the E principle is also applicable to any clique (representing a measurement context) in the graph $G^{k}$. It might seem that such a procedure should not lead to any new conclusions, since the $k$ experiments are completely independent. However, it turns out that applying the E principle to a clique in $G^{k}$ restricts the possible assignments much more strongly than applying it to a single copy of $G$. This is because each vertex $V$ of $G^{k}$ represents a joint event, which is a conjunction of $k$ independent events corresponding to some $k$ vertices $\{v_i\}$ of $G$. The assigned probability therefore factorizes as $p(V) = p(v_1) \cdots p(v_k)$. It can easily be shown that applying the E principle to a clique in $G^{k}$ places stronger restrictions on the possible values of $p(v_i)$.
The simplest case is the contextuality test of Klyachko-Can-Binicioğlu-Shumovsky (KCBS) [36], in which the original exclusivity graph $G$ is a 5-cycle. The E principle therefore allows the assignment of a probability of at most $1/2$ to each vertex, leading to the bound $P_E = 5/2$. On the other hand, the graph $G^2$ representing the exclusivity structure of two independent KCBS experiments has a 5-vertex clique $K$. The E principle applied to $K$ allows the assignment of a probability of at most $1/5$ to each vertex of $K$. Note that the probability assigned to each vertex $V_i$ of $K$ is a product of probabilities corresponding to vertices of the two copies of $G$: $p(V_i) = p(v_i)p(u_i)$. Now, $K$ can be chosen such that its vertices, which are of the form $v_i \times u_j$, contain all the vertices of the elementary graphs $G$. Hence the assignments $p(v_i)$ and $p(u_j)$ are also assignments to the entire graph $G$ of a single KCBS test. Since $p(v_i)p(u_i) \le 1/5$, assigning the same probability $p$ to every vertex of both copies gives $p \le 1/\sqrt{5}$ and hence $P \le 5p = \sqrt{5}$, which exactly reproduces the quantum bound $P_Q$. It can now easily be seen that the above derivation cannot be applied to classical waves described by the NC occupation-based model defined in the previous section. This is because in the limiting case of the modified model the vertices of the graph are assigned relative intensities of light instead of probabilities of events (Eq. (16)). Taking two such independent experiments, one cannot meaningfully create a product graph whose vertices would correspond to relative intensities that are products of two intensities from the single experiments. This follows from the fact that the intensity of two independent waves is not the product of their intensities but their sum (assuming they propagate in a linear medium). By contrast, the joint probability of two independent events is the product of their probabilities. Consequently, the entire derivation of the bound cannot be performed for correlations of classical waves.
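The combinatorial step for KCBS can be checked directly. The sketch below defines adjacency in the disjunctive (OR) product of two 5-cycles, where joint events are exclusive if they are exclusive in at least one run. It verifies that the pairs $(i, 2i \bmod 5)$ form a 5-clique $K$ containing every vertex of $G$ once in each coordinate; this particular choice of $K$ is ours. It then evaluates the uniform assignment discussed in the text. It reproduces only this symmetric step, not the general argument of [27], and all helper names are hypothetical.

```python
import math
from itertools import combinations

def c5_adjacent(i, j):
    """Exclusivity in the KCBS graph: neighbours on a 5-cycle."""
    return (i - j) % 5 in (1, 4)

def or_product_adjacent(u, v):
    """Disjunctive (OR) product: joint events are exclusive if exclusive in at least one run."""
    return u != v and (c5_adjacent(u[0], v[0]) or c5_adjacent(u[1], v[1]))

# Candidate 5-clique of G x G built from the pairs (i, 2i mod 5).
K = [(i, (2 * i) % 5) for i in range(5)]
assert all(or_product_adjacent(u, v) for u, v in combinations(K, 2))
# Each vertex of G appears exactly once in each coordinate of K.
assert sorted(u[0] for u in K) == sorted(u[1] for u in K) == list(range(5))

# Uniform assignments: E on G alone allows p = 1/2; E on K requires 5 * p**2 <= 1.
p_single = 0.5
p_double = 1 / math.sqrt(5)
print(5 * p_single, 5 * p_double)  # 2.5 and about 2.236 (= sqrt(5))
```

The uniform value $p = 1/2$, admissible for a single copy, violates the constraint on $K$ because $5 \cdot (1/2)^2 > 1$. Only $p \le 1/\sqrt{5}$ survives, giving $\sqrt{5}$.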
Nevertheless, classical waves still obey the bound $P_Q$. As we have shown, this cannot be derived directly from the exclusivity structure of the experiment. It holds because classical waves follow the same evolution rules as quantum probability amplitudes.