## Introduction

As previously discussed by Richard P. Feynman, a negative probability, which relaxes the axiom of a non-negative probability of an event in Kolmogorov’s probability theory, sheds new light on our understanding of quantum phenomena1,2. The essence of his idea is that allowing negative probabilities leads to far fewer mathematical complications in the intermediate steps of the analysis of a given physical event. As an example, Feynman developed joint probability distributions for spin-1/2 systems to address Young’s double-slit experiment using a different approach, such that the probability distributions can take negative values1. This idea has been applied in many studies of various quantum phenomena3,4,5,6,7. In particular, the negative probability approach provides new insight into the disagreement between classical and quantum predictions in Bell’s theorem8. It has been experimentally verified that nature cannot be described by any classical theory of local realism, since such a theory obeys the Bell inequalities whereas quantum theory violates them9,10,11. Negative probability offers a different point of view: a local realistic theory endowed with negative probabilities can simulate violations of the Bell inequalities, as quantum theory does12,13,14. However, these negative probabilities are distinct from probability defined as the relative frequency of events, so an operational interpretation is not necessarily straightforward (for a historical review, see ref. 15).

On the other hand, negative values can play a role as an indicator. Let us retrace Feynman’s example on the trading of apples1: “A man starting a day with five apples who gives away ten and is given eight during the day has three left.” Feynman pointed out that the initial and final numbers of apples, which we denote by (5,3) and call a fundamental entity, can be calculated in the following two ways. The first reads “beginning with 5 apples at time $${t}_{1}$$, given 8 at $${t}_{2}$$, giving away 10 at $${t}_{3}$$, and having 3 left at $${t}_{4}$$”, or symbolically, $${5}_{{t}_{1}}+{8}_{{t}_{2}}-1{0}_{{t}_{3}}={3}_{{t}_{4}}$$ with a time ordering $${t}_{i} < {t}_{j}$$ for $$i < j$$. This can be regarded as natural, since the number of apples never becomes negative. Feynman also introduced an alternative model that no longer keeps the number of apples nonnegative but maintains the same fundamental entity $$({5}_{{t}_{1}},{3}_{{t}_{4}})$$; it reads $$5-10=-\,5$$, and $$-5+8=3$$. Note that the latter model disregards the chronological order $${t}_{i}$$, i.e., it changes the order of trading, so that a negative number of apples appears in the intermediate steps. Although such negative numbers are abstract, allowing negativity gives more freedom in the mathematical analysis without altering the fundamental entity. We say that such a model is free of time context16,17. In other words, the time context-free approach maintains the same fundamental entity at the expense of relaxing the assumption of nonnegative apples. Moreover, the negative apples in the intermediate steps introduce the idea of “debts” with respect to the trading of apples (the indicator).

In quantum optics, negative values have been widely used as an indicator of nonclassicality with respect to classical statistics. For instance, the Wigner function, one of the so-called quasiprobability distributions, is used to represent a joint distribution of position and momentum in phase space18. However, because of the uncertainty principle for such conjugate variables, the Wigner function takes negative values for some quantum states. This negativity witnesses the quantum character of the states. The Wigner function has been generalized to finite-dimensional systems such as quantum bits, and the quasiprobability approach has since been applied across a wide range of fields in quantum information science19,20,21,22,23,24,25,26,27,28,29,30.

However, such nonclassicality indicators with Wigner functions lack the operational formalism wherein preparation, operation, and measurement cooperate explicitly16,17. Negative values in one quasiprobability can be positive in another. This fact can be an obstacle to the operational interpretation of these negative values. Furthermore, their comparison to classical statistics reveals the subtlety that quasiprobabilities require different physical interpretations for the same form of functionals as their classical counterparts. This is described as “incommensurability” of quasiprobabilities31,32. Therefore, it is more natural to employ quasiprobability that compares quantum and classical statistics on the same footing.

The operational quasiprobability (OQ) introduced in31,32 allows the problems in question to be resolved. The OQ is commensurate since it evaluates the statistics of the mathematical functionals with the same physical interpretation in every model, regardless of whether a quantum or classical case is being considered. The OQ consists of selective and sequential measurements in time and is formulated as joint probability distributions that simultaneously describe multiple measurement setups. We consider the expectation values of multiple measurement setups as fundamental entities, i.e., the measurable quantities of interest. The OQ method allows the joint distributions to be independent of the measurement setups and to simultaneously describe the multiple-setup outcomes, even though the distributions can be negative. This method can be considered as an alternative way of describing quantum theory that depends on the setups. It allows a direct comparison between quantum and classical statistics and identifies nonclassicality in an operational way.

We focus on the specific feature that the moments vary depending on which measurement(s) are performed. This is called the measurement-selection context and is similar to the time context of Feynman’s apple example. The fundamental entity is the set of all moments in the single measurements. For two dichotomic observables, it is $$\{(\langle {A}^{n}\rangle ,\langle {B}^{m}\rangle )\}$$ with $$\langle {A}^{n}\rangle$$ being the $$n$$-th moment in one of the single measurements and similarly $$\langle {B}^{m}\rangle$$ in the other. The OQ is free of the measurement-selection context, in the sense that the local marginals of the joint distribution are equal to the probabilities of the single measurements. This is one of the most astonishing features that classical models presume, including a macrorealistic model33. Moreover, the context-free OQ can always be constructed in quantum theory, even though quantum theory is not context-free. Instead, the quantum OQ pays a tariff of negative probabilities. Such inevitable negatives can be understood as an indicator of nonclassicality in the context of measurement selection.

The macrorealistic model assumes no-signaling in time (NSIT) and arrow of time (AoT). NSIT implies that a later measurement produces a result that is not affected by whether or not an earlier measurement is performed, and AoT is the analogous condition with the roles of the later and earlier measurements exchanged. Roughly speaking, the two measurements are independent, and both the sequential measurement and the individual measurements leave the fundamental entity unchanged. The classical model is thus free of the measurement-selection context. More explicitly, when two measurements are performed sequentially at times $${t}_{1}$$ and $${t}_{2}$$ with $${t}_{1} < {t}_{2}$$, or are performed individually, NSIT and AoT are described by

$$\begin{array}{rcl}NSIT:\ {P}_{{t}_{2}}({a}_{2}) & = & {P}_{{t}_{1},{t}_{2}}({a}_{2}),\\ AoT:\ {P}_{{t}_{1}}({a}_{1}) & = & {P}_{{t}_{1},{t}_{2}}({a}_{1}).\end{array}$$

Here $${P}_{{t}_{i}}({a}_{i})$$ are the probabilities of the single measurements, whereas $${P}_{{t}_{1},{t}_{2}}({a}_{i})$$ are the marginals of the joint measurement $${P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$ such that $${P}_{{t}_{1},{t}_{2}}({a}_{i})={\sum }_{{a}_{j\ne i}}{P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$ for $$i,j=1,2$$. In contrast, quantum theory violates the macrorealistic assumptions and it is contextual with the measurement selection.
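The two conditions can be checked numerically. The following is a minimal sketch, assuming ideal projective H/V and D/A measurements on the pure state $$|\Psi (\theta ,\varphi )\rangle$$ used later in the text, with no loss or detector imperfections. All function names are illustrative and are not taken from the original work.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Basis vectors as (H amplitude, V amplitude); outcome index 0/1 = H/V or D/A.
HV = [(1, 0), (0, 1)]
DA = [(S, S), (S, -S)]


def state(theta, phi):
    """cos(theta/2)|H> + exp(i*phi) sin(theta/2)|V>, angles in radians."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))


def overlap2(b, psi):
    """|<b|psi>|^2 for two-component vectors."""
    amp = sum(complex(x).conjugate() * y for x, y in zip(b, psi))
    return abs(amp) ** 2


def single(psi, basis):
    """Outcome probabilities of one projective measurement (assumed ideal)."""
    return [overlap2(b, psi) for b in basis]


def sequential(psi, first, second):
    """Joint P(a1, a2): outcome a1 collapses the state onto first[a1]."""
    return [[overlap2(b1, psi) * overlap2(b2, b1) for b2 in second]
            for b1 in first]


def nsit_aot_gaps(theta, phi):
    """Return (P_t2 - marginal over a1, P_t1 - marginal over a2)."""
    psi = state(theta, phi)
    joint = sequential(psi, HV, DA)
    marg1 = [sum(row) for row in joint]        # P_{t1,t2}(a1)
    marg2 = [sum(col) for col in zip(*joint)]  # P_{t1,t2}(a2)
    nsit = [p - m for p, m in zip(single(psi, DA), marg2)]
    aot = [p - m for p, m in zip(single(psi, HV), marg1)]
    return nsit, aot


nsit_gap, aot_gap = nsit_aot_gaps(math.pi / 4, 0.0)
print("NSIT gap:", nsit_gap)  # nonzero: the D/A statistics depend on the H/V step
print("AoT gap:", aot_gap)    # zero up to rounding in this idealized model
```

In this idealized model the AoT gap vanishes, while the NSIT gap does not: the earlier H/V measurement changes the later D/A statistics.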

However, the OQ representation of quantum theory allows the joint distribution to be free of measurement-selection context, as in the macrorealistic model, at the price of the quantum tariff of negative probabilities. The OQ representation states that a quasiprobability distribution $${\mathscr{W}}$$ is operationally defined for both quantum and classical models by31,32:

$$\begin{array}{c}{\mathscr{W}}({a}_{1},{a}_{2})={P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})+\frac{1}{2}[{P}_{{t}_{1}}({a}_{1})-{P}_{{t}_{1},{t}_{2}}({a}_{1})]+\frac{1}{2}[{P}_{{t}_{2}}({a}_{2})-{P}_{{t}_{1},{t}_{2}}({a}_{2})].\end{array}$$
(1)

The $${\mathscr{W}}$$ has marginal probabilities equal to those of the single measurements. The presence of negative $${\mathscr{W}}({a}_{1},{a}_{2})$$ originates from the statistical difference between the single and sequential measurements. Note that $${\mathscr{W}}$$ will be nonnegative joint probabilities if NSIT and AoT are applicable. Thus, measuring OQ represents an evaluation of the context-free model for a given experiment on one hand, and also is a test of whether a given system violates the macrorealistic model based on its negative values on the other hand. Recently, experiments to test macrorealism have been conducted with various quantum states and measurement schemes34,35,36,37.
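Equation 1 and its marginal property can be written as a short sketch. It assumes two-outcome measurements, as in Eq. 1, and takes the three distributions as plain Python lists. The function name and the example numbers are illustrative assumptions, not data from the experiment. The numbers correspond to an ideal state with $$\cos \theta =0.8$$ and $$\sin \theta =0.6$$.

```python
def operational_quasiprobability(joint, p1, p2):
    """Eq. (1) for dichotomic outcomes.

    joint[a1][a2] : sequential-measurement probabilities P_{t1,t2}(a1, a2)
    p1[a1], p2[a2]: single-measurement probabilities P_{t1}(a1), P_{t2}(a2)
    """
    m1 = [sum(row) for row in joint]        # P_{t1,t2}(a1), marginal over a2
    m2 = [sum(col) for col in zip(*joint)]  # P_{t1,t2}(a2), marginal over a1
    return [[joint[i][j] + 0.5 * (p1[i] - m1[i]) + 0.5 * (p2[j] - m2[j])
             for j in range(2)] for i in range(2)]


# Illustrative (hypothetical) input: AoT holds, NSIT is violated.
joint = [[0.45, 0.45], [0.05, 0.05]]
p1 = [0.9, 0.1]
p2 = [0.8, 0.2]

w = operational_quasiprobability(joint, p1, p2)
row_marginals = [sum(row) for row in w]
col_marginals = [sum(col) for col in zip(*w)]
print(w)
print(row_marginals, col_marginals)
```

The row and column sums of `w` reproduce `p1` and `p2`, while one entry of `w` is negative because the single and sequential D/A statistics differ.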

It is worth noting that a negative OQ is sufficient, but not necessary, for the failure of the macrorealistic description. Even when the assumptions are violated, we can still observe a positive OQ, depending on the statistical differences. Such behaviour was reported when general measurements were employed32. This implies that macrorealism, or the conditions of NSIT and AoT, cannot fully capture the classicality reflected by the OQ positivity in the context of measurement selection.

We now experimentally illustrate the negative probability in an operational way by measuring the degrees of freedom of the polarization of single photons. By considering the negative quasiprobability together with the quantum nature of photons, nonclassicality was demonstrated in the laboratory. To this end, we selectively implemented two sets of polarizations at consecutive times; the horizontal/vertical (H/V) measurement at $${t}_{1}$$ and the diagonal/anti-diagonal (D/A) at $${t}_{2}$$. Let us denote the selection of measurements by a tuple $$({n}_{1},{n}_{2})$$. We can then perform the following four measurement setups: (i) no measurement by $$({n}_{1},{n}_{2})=(0,0)$$, (ii) H/V single measurement by $$({n}_{1},{n}_{2})=(1,0)$$, (iii) D/A single measurement by $$({n}_{1},{n}_{2})=(0,1)$$, and (iv) consecutive joint measurement of H/V and D/A by $$({n}_{1},{n}_{2})=(1,1)$$. In this way, each of the probabilities at times $${t}_{i}$$ in equation 1 is associated with the tuple. We represent the experimental results by the notation of times for convenience. For example, the joint probabilities $${P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$ are equal to $${P}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})$$ with $$({n}_{1},{n}_{2})=(1,1)$$. The negativity is defined by the sum of the negative components of $${\mathscr{W}}({a}_{1},{a}_{2})$$: $${\mathscr{N}}\equiv \frac{1}{2}{\sum }_{{a}_{1},{a}_{2}}\left[\left|{\mathscr{W}}({a}_{1},{a}_{2})\right|-{\mathscr{W}}({a}_{1},{a}_{2})\right].$$ The maximum of the negativity is $$(\sqrt{2}-1)/4\approx 0.104$$ in case of two measurements31. We investigated the negativity of OQ for two input sources: (i) heralded single photons generated by spontaneous parametric down-conversion (SPDC) and (ii) single photons emitted from a single molecule. Note that all inputs were set to generate a single photon.
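The negativity and its quoted maximum can be reproduced with a short sketch. It assumes ideal projective H/V and D/A measurements, for which the sequential joint probability is $$P({a}_{1})/2$$ and Eq. 1 reduces to the closed form in the code. The helper names `ideal_oq` and `negativity` are hypothetical.

```python
import math


def ideal_oq(theta_deg, phi_deg=0.0):
    """OQ for ideal H/V (t1) then D/A (t2) measurements on |Psi(theta, phi)>.

    Assumption: perfect projectors, so P_{t1,t2}(a1, a2) = P_{t1}(a1) / 2
    and the sequential marginal for a2 is 1/2.
    """
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    p1 = [math.cos(th / 2) ** 2, math.sin(th / 2) ** 2]  # P_t1(H), P_t1(V)
    c = math.sin(th) * math.cos(ph)
    p2 = [(1 + c) / 2, (1 - c) / 2]                      # P_t2(D), P_t2(A)
    # Eq. (1): the AoT bracket vanishes, the NSIT bracket is p2 - 1/2.
    return [[p1[i] / 2 + 0.5 * (p2[j] - 0.5) for j in range(2)]
            for i in range(2)]


def negativity(w):
    """N = (1/2) * sum(|W| - W): total magnitude of the negative entries."""
    return 0.5 * sum(abs(x) - x for row in w for x in row)


# Scan theta in whole degrees at phi = 0 and locate the largest negativity.
best_theta = max(range(0, 91), key=lambda t: negativity(ideal_oq(t)))
print(best_theta, negativity(ideal_oq(best_theta)), (math.sqrt(2) - 1) / 4)
```

Under these assumptions the scan peaks at $$\theta =4{5}^{\circ }$$, where the negativity equals $$(\sqrt{2}-1)/4$$.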

For the generation of heralded single photons, we exploited collinear type-II SPDC as shown in Fig. 1b. The signal was counted only when the trigger ($${D}_{trigger}$$) clicked. The input polarization state of a single photon is given by $$|\Psi (\theta ,\varphi )\rangle =\cos (\theta /2)|H\rangle +{e}^{i\varphi }\sin (\theta /2)|V\rangle$$ with $$\theta ,\varphi \in [0,9{0}^{\circ }]$$, where the angles of waveplates determine $$\theta$$ and $$\varphi$$ (see Methods). To experimentally implement the four measurement setups for OQ, which are described in Fig. 1a, we used the arrangement depicted in Fig. 1b. Three polarizing beam splitters (PBSs) and two half-wave plates (HWPs) are used to selectively measure the polarization states of photons. The PBS$${}_{1}$$ is used for the H/V polarization measurements and the two PBS$${}_{2}$$ and PBS$${}_{3}$$ with the HWPs are used for the D/A polarization measurements. Selective measurements can be performed by moving each PBS in and out of the path of the input beam. In the laboratory, we positioned two PBS$${}_{2}$$ and PBS$${}_{3}$$ at fixed positions to reduce experimental errors, and moved only the PBS$${}_{1}$$ to implement the four measurement setups. In the detection part, we counted the relative ratio of measurement outcomes as follows: $${P}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})={N}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})/{N}_{{n}_{1},{n}_{2}}^{tot}$$, where $${N}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})$$ denotes the sum of the counted photons at the detector $${D}_{{a}_{1},{a}_{2}}$$ and $${N}_{{n}_{1},{n}_{2}}^{tot}$$ is the total number of counted photons at all detectors for a given setup $$({n}_{1},{n}_{2})$$. See Methods for more details.

Figure 2 shows the negativity of the OQ for the heralded single photon as a function of $$\theta$$ and $$\varphi$$ in the state $$|\Psi (\theta ,\varphi )\rangle$$. For $$\varphi ={0}^{\circ }$$, we observed negativity over the whole range of $$\theta$$ (see Fig. 2a). The error bars are obtained by considering the functioning errors of the optics and devices used (see Supplementary Information). Figure 2b shows a contour plot of the negativity as $$\theta$$ and $$\varphi$$ are varied. The maximum negativity is $${\mathscr{N}}\approx 0.103$$ at $$\theta =4{5}^{\circ }$$ and $$\varphi ={0}^{\circ }$$, which reproduces the theoretical prediction well31. The negativity clearly indicates that no classical model that assumes the NSIT and AoT conditions can describe the nonclassicality of a single photon.

The quantum nature and the negativity of a single photon are reduced by decoherence. Figure 3a shows the theoretical values of the negativity on a cross-section of the Bloch sphere for $$\varphi ={0}^{\circ }$$. A pure state of the form $$|\Psi (\theta ,\varphi =0)\rangle =\cos (\theta /2)|H\rangle +\sin (\theta /2)|V\rangle$$ with $$0\le \theta \le 36{0}^{\circ }$$ lies on the rim of the plane. The inner points correspond to mixed states, which are probabilistic mixtures of the pure states: $$\hat{\varrho }({\theta }_{1},{\theta }_{2},\alpha )=\frac{1+\alpha }{2}|\Psi ({\theta }_{1})\rangle \langle \Psi ({\theta }_{1})|+\frac{1-\alpha }{2}|\Psi ({\theta }_{2})\rangle \langle \Psi ({\theta }_{2})|$$, where $$\alpha$$ determines the mixing ratio of the two pure states under the constraint $$\left|\alpha \right|\le 1$$. When $$\left|\alpha \right|=1$$, $$\hat{\varrho }$$ becomes a pure state. The central point of the circle indicates a completely depolarized state. The diamond-shaped dashed lines in Fig. 3a represent mixtures of two pure states for which the negativity becomes zero; the convex region inside the dashed lines therefore also has zero negativity. The experimental results in Fig. 3b are again in good agreement with the theoretical predictions.

The heralded single photons exhibit an anti-bunching feature (see Supplementary Information). However, discussions on the second-order correlation function of these photons have been previously reported38,39,40. As a demonstration with a deterministic single photon source, we performed similar measurements with photons emitted from a single molecule (terrylene)41 (see Methods for more experimental details). In this case, the photon statistics clearly show the anti-bunching nature without any detection schemes such as triggering (see Supplementary Information). A similar negativity is obtained compared to the SPDC case (see Fig. 3c).

Finally, we performed the same experiment with a weak-field light source (see Supplementary Information). As with the single photon sources, we observed negative values. In this experiment, we post-selected the raw data to evaluate the negativity, such that only single APD clicks were sampled and the remaining events, e.g., multiple simultaneous clicks, were discarded. In general, weak-field light is not regarded as a single-photon source, in the sense that it does not show anti-bunching. However, the negativity can be detected with a post-selection process. It was recently reported that a coherent state of the optical field can show nonclassicality32,42. We highlight that the operational quasiprobability reveals the negativity through an interplay between the given state and the measurement.

In conclusion, we experimentally explored the negativity of the operational quasiprobability by measuring the single photon polarizations. We introduced the context of measurement selection by constructing the quasiprobability such that by marginals it provides the same fundamental entity as that of the single measurements. As a result, the quasiprobability can reproduce the quantum predictions by allowing negative probabilities. The measured negatives highlight the discrepancy between the classical and quantum predictions in the context of measurement selection. In the case of the classical prediction, we investigated the macrorealistic model assuming the NSIT and AoT conditions. In this model, the operational quasiprobability becomes a legitimate joint probability distribution for the given measurement setups. Therefore, observing the negatives highlights the nonclassical property in the context of measurement selection. We note that negativity is merely a sufficient condition for violating the NSIT. That is, the operational quasiprobability can be nonnegative even if the NSIT is violated. Such a case is encountered if a general measurement is involved, which will be discussed in a forthcoming paper. From a fundamental perspective, the measured negativity provides an operational approach to unravel the nonclassicality of photons in the context of measurement selection.

## Methods

### Input preparation

Heralded single photons: The experimental schematic used to generate the heralded single photons is shown in Fig. 1b. Orthogonally polarized photon pairs are generated by a type-II SPDC process using a 401.5 nm continuous wave (CW) mode laser to irradiate a periodically poled KTiOPO$${}_{4}$$ (PPKTP) crystal. The resulting photon pair is separated using a PBS. The horizontally polarized photon (signal) is used as the input photon and the vertically polarized photon (idler) is used as the trigger. Simultaneous detection in the signal and idler channels excludes the contribution of the vacuum state of the SPDC source, and thus gives the anti-bunching property of the heralded single photons. We also examined this experimentally by measuring a second-order correlation function with multi-channel correlation measurements and obtained a value of $${g}^{2}(0)=0.036$$ (for more details, see Supplementary Information).

Single photons from a single molecule: The output of a CW-mode laser (532 nm) is focused using an oil objective (NA = 1.40) onto a single terrylene molecule embedded in a thin para-terphenyl crystal ($$\sim$$20 nm) in a total internal reflection geometry38. The emitted fluorescence signal transmitted through a long-pass filter (LPF) is collected by the same objective and diverted to the detection part. The measured value of $${g}^{2}(0)$$ is 0.14 (see Supplementary Information).

Input polarization state: The input photon polarization state was set to a pure state in the H/V basis of the form $$|\Psi (\theta ,\varphi )\rangle =\cos (\theta /2)|H\rangle +{e}^{i\varphi }\sin (\theta /2)|V\rangle$$, where $$\theta$$ and $$\varphi$$ are the polar and azimuthal angles on the unit sphere known as the Bloch sphere. To prepare such a state, the horizontally polarized photons are sequentially transmitted through the half-wave plate (HWP) and two quarter-wave plates (QWPs) as illustrated in Fig. 1b. The last QWP is fixed at the angle of $$\pi /4$$. The final polarization state of the input photon is obtained as follows:

$$\begin{array}{c}{T}_{QWP}(\frac{\pi }{4}){T}_{HWP}(p){T}_{QWP}(q)(\begin{array}{c}1\\ 0\end{array})={e}^{i(-2p+q+\pi /4)}(\begin{array}{c}\cos (\pi /4-q)\\ {e}^{i(4p-2q-\pi /2)}\sin (\pi /4-q)\end{array}),\end{array}$$
(2)

where $${T}_{Q(H)WP}$$ represents the transfer matrix of the corresponding wave-plate. The parameters of $$p,q$$, and $$\pi /4$$ are the rotating angles of the waveplates and follow the relations: $$p=(\pi +\varphi -\theta )/4$$ and $$q=(\pi /2-\theta )/2$$.
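As a consistency check, the relations for $$p$$ and $$q$$ can be substituted into the right-hand side of Eq. 2 and compared with the target state $$|\Psi (\theta ,\varphi )\rangle$$. The sketch below uses only the formulas given in the text and does not model the wave-plate transfer matrices themselves, whose sign and phase conventions are not specified here. The function names are hypothetical.

```python
import cmath
import math


def waveplate_angles(theta, phi):
    """Rotation angles (p, q) from the relations in the text, in radians."""
    p = (math.pi + phi - theta) / 4
    q = (math.pi / 2 - theta) / 2
    return p, q


def prepared_state(p, q):
    """Right-hand side of Eq. (2) as (H amplitude, V amplitude)."""
    g = cmath.exp(1j * (-2 * p + q + math.pi / 4))  # global phase factor
    a = math.pi / 4 - q
    rel = cmath.exp(1j * (4 * p - 2 * q - math.pi / 2))
    return (g * math.cos(a), g * rel * math.sin(a))


def target_state(theta, phi):
    """cos(theta/2)|H> + exp(i*phi) sin(theta/2)|V>."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))


def fidelity(u, v):
    """|<u|v>|^2, insensitive to a global phase."""
    return abs(sum(complex(x).conjugate() * y for x, y in zip(u, v))) ** 2


theta, phi = math.radians(45), math.radians(30)
p, q = waveplate_angles(theta, phi)
f = fidelity(target_state(theta, phi), prepared_state(p, q))
print(math.degrees(p), math.degrees(q), f)
```

With these relations, $$\pi /4-q=\theta /2$$ and $$4p-2q-\pi /2=\varphi$$, so the prepared state matches the target up to the global phase.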

### Collecting data

In the detection part, we counted the relative ratio of measurement outcomes as follows: $${P}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})={N}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})/{N}_{{n}_{1},{n}_{2}}^{tot}$$, where $${N}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})$$ denotes the sum of the counted photons at the detector $${D}_{{a}_{1},{a}_{2}}$$ and $${N}_{{n}_{1},{n}_{2}}^{tot}$$ is the total number of counted photons at all detectors for a given setup $$({n}_{1},{n}_{2})$$. We here describe in detail how to obtain each component in Eq. 1 in terms of the measurement setting $$({n}_{1},{n}_{2})$$ and the detectors (APDs) $${D}_{{a}_{1},{a}_{2}}$$.

1. $${P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$: we implement the consecutive measurement of H/V and D/A, which corresponds to $${P}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})$$ with $$({n}_{1},{n}_{2})=(1,1)$$. The data are then collected at all the APDs $${D}_{{a}_{1},{a}_{2}}$$.

2. $${P}_{{t}_{1}}({a}_{1})$$: this corresponds to the H/V single measurement, which can be written as $${P}_{{n}_{1},{n}_{2}}({a}_{1},0)$$ with $$({n}_{1},{n}_{2})=(1,0)$$. In the laboratory, we fixed the PBS$${}_{2,3}$$, so it is obtained as the marginal of $${P}_{{n}_{1},{n}_{2}}({a}_{1},{a}_{2})$$ with $$({n}_{1},{n}_{2})=(1,1)$$. That is, $${P}_{1,0}({a}_{1},0)={P}_{1,1}({a}_{1},0)+{P}_{1,1}({a}_{1},1)$$. In this case, the data are collected at $${D}_{{a}_{1},0}$$ for $${P}_{1,1}({a}_{1},0)$$ and at $${D}_{{a}_{1},1}$$ for $${P}_{1,1}({a}_{1},1)$$.

3. $${P}_{{t}_{1},{t}_{2}}({a}_{1})$$: this is simply the marginal of $${P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$ over $${a}_{2}$$.

4. $${P}_{{t}_{2}}({a}_{2})$$: this corresponds to the D/A single measurement, which is given by $${P}_{{n}_{1},{n}_{2}}(0,{a}_{2})$$ with $$({n}_{1},{n}_{2})=(0,1)$$. After the PBS$${}_{1}$$ is moved out, we collect the data at the APDs $${D}_{0,{a}_{2}}$$.

5. $${P}_{{t}_{1},{t}_{2}}({a}_{2})$$: this is simply the marginal of $${P}_{{t}_{1},{t}_{2}}({a}_{1},{a}_{2})$$ over $${a}_{1}$$.
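The five steps above can be sketched as a small routine that turns detector counts into the OQ of Eq. 1. The counts below are hypothetical placeholders chosen for illustration only. They are not measured data, and the function names are assumptions. Following step 2, the H/V single-measurement statistics are taken as the marginal of the $$(1,1)$$ setup.

```python
def oq_from_counts(n11, n01):
    """Build W(a1, a2) from raw counts.

    n11[a1][a2]: counts at detectors D_{a1,a2} in setup (n1, n2) = (1, 1)
    n01[a2]    : counts at detectors D_{0,a2} in setup (n1, n2) = (0, 1)
    """
    total11 = sum(sum(row) for row in n11)
    joint = [[c / total11 for c in row] for row in n11]  # step 1
    m1 = [sum(row) for row in joint]                     # step 3
    m2 = [sum(col) for col in zip(*joint)]               # step 5
    p1 = list(m1)                                        # step 2: (1,1) marginal
    total01 = sum(n01)
    p2 = [c / total01 for c in n01]                      # step 4
    return [[joint[i][j] + 0.5 * (p1[i] - m1[i]) + 0.5 * (p2[j] - m2[j])
             for j in range(2)] for i in range(2)]


def negativity(w):
    """N = (1/2) * sum(|W| - W)."""
    return 0.5 * sum(abs(v) - v for row in w for v in row)


# HYPOTHETICAL counts (about 10^4 per setup), for illustration only.
n11 = [[4260, 4270], [740, 730]]
n01 = [8500, 1500]

w = oq_from_counts(n11, n01)
print(w)
print(negativity(w))
```

Because the H/V statistics are read from the same $$(1,1)$$ data set, the AoT bracket in Eq. 1 vanishes by construction in this routine, and any negativity comes from the NSIT bracket.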

### Detection and data acquisition

The photon clicks of the single photon counting detectors (Perkin-Elmer, SPCM-AQ4C) are sent to a field-programmable gate array (FPGA, NI PXI-7841R) for the post-selection process. For data acquisition and processing, the FPGA operates at a 25 ns clock speed (40 MHz) and 125 ns processing cycle. Each detector has a slightly different detection efficiency; therefore, it is of crucial importance that their effective efficiencies are equalized. To this end, we measured the counts of the input polarization with the angle $$\theta =4{5}^{\circ }$$, where all four detectors are supposed to have the same input photon flux under the same detection efficiency. The detected counts are used as the references to normalize the signals of each detector.
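One way such an equalization could be carried out is sketched below. It assumes that each detector's relative efficiency is estimated from the reference run as its count divided by the mean reference count, and that raw counts are divided by this factor. The exact procedure used in the experiment is not specified in the text, so this is only an assumption. The reference counts are hypothetical.

```python
def relative_efficiencies(reference_counts):
    """Relative efficiency of each detector from an equal-flux reference run."""
    mean = sum(reference_counts) / len(reference_counts)
    return [c / mean for c in reference_counts]


def equalize(raw_counts, efficiencies):
    """Divide each detector's counts by its relative efficiency."""
    return [n / e for n, e in zip(raw_counts, efficiencies)]


# HYPOTHETICAL reference counts for the four APDs, for illustration only.
reference = [2600, 2500, 2450, 2450]
eff = relative_efficiencies(reference)

# Applying the correction to the reference run itself flattens the counts.
flattened = equalize(reference, eff)
print(eff)
print(flattened)
```

After the correction, the reference run gives the same count at every detector, which is the condition the equal-flux reference is meant to enforce.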

The sum of the detected counts at four APDs is about $$1{0}^{4}$$ on average, for all values of $$\theta$$ and $$\varphi$$. Table 1 shows some examples of the detected counts for each APD from Fig. 2a.