Introduction

Decision-making research is diverse and involves the study of optimal algorithms for making the best possible decisions in particular situations, with applications ranging from psychology1,2,3 and management4,5,6 to reinforcement learning7 and dynamic resource allocation8,9,10. Fundamentally, this type of research addresses how complete or partial information about a given environment is processed by a “user” (human or artificial) to achieve the best possible outcome.

The multi-arm bandit (MAB) problem is a typical example of decision making inspired by game theory, in which a user faces several choices with identical potential reward amounts (slot machines in the canonical example), whose reward probabilities are unknown to the user and can change without notice. In real-life situations such as mobile network connections11, computing10, or industrial process chains8, the user typically tries each machine and selects the one with the highest reward probability, while still trying the others from time to time to verify whether or not the current choice remains the best.

In many application instances, another dimension must be accounted for: the competition for rewards/resources between multiple users choosing simultaneously, such as for traffic regulation, telecom bandwidth allocation, or energy grid management. In these cases, users should not only search for the best choice, but also ensure that the reward allocation is both globally optimal and fair for all users. When the number of choices is limited and the number of users is high, the main focus of the problem becomes ensuring that the reward is equally shared rather than finding the best choice individually, such as in the case of mobile network connections in crowded environments.

For the MAB problem, previous studies have used photonic systems as a resource for decision making, including chaotic laser sources12,13, excitation transfer via near-field coupling14,15, or polarized single-photon sources16,17. Using already established algorithms18, these works have demonstrated the potential of physical systems to solve complex decision-making situations efficiently, in up to a 64-arm single-user case13.

A consistent extension of this work is to study situations in which several players must make choices simultaneously between multiple unknown, probabilistic choices, known as the competitive multi-arm bandit (CMAB) problem. Instead of using parallel, independent physical systems to solve this problem, another idea is to use particular collective states following quantum formalism, known as entangled states, to study collective strategies. Entanglement is a fundamental property of composite systems in quantum physics that has attracted interest in game theory for solving deterministic problems with payoff matrices by finding Nash equilibrium19 in competitive situations20,21,22. Some reports have also highlighted the benefits of using entanglement and a quantum approach in machine learning23,24,25. However, as the CMAB problem is probabilistic and the same kinds of solution algorithms cannot be followed, new approaches are needed.

We previously numerically and experimentally demonstrated that the polarization degree of freedom of photons can be used to allocate the rewards from two machines to two users efficiently by using polarization-entangled photon pair states26. In the present report, we extended this work to the use of N polarized photons in an N-photon quantum superposition state to solve the CMAB in more general situations with N ≥ 2 and two choices. The goal is to maximize both the reward output from the machines and the equality, or fairness, of the repartition of rewards among players, given that their possible actions are limited to a rotation of their own polarization measurement basis at every turn. More precisely, the intent is to obtain N-photon quantum superposition states with such properties that the optimal strategy for each player leads to an optimal situation for all players in terms of total outcome and fair repartition.

In this report, after defining the general situation under study, we identify the set of constraints on N-photon quantum superposition states to optimize the reward allocation for any N. From there, we present the derivation of the corresponding states for three, four, and five users and a comparison of the corresponding properties for the MAB solution. Finally, we discuss the convergence estimation of error-correction protocols with these states.

Results

Definitions and formulation of the problem

Problem description and hypotheses

The MAB problem is used to reproduce situations in which the available choices have outcomes with different probabilities and/or reward amounts, which are unknown to the user(s). In the CMAB problem, several users simultaneously face the same choices, also referred to as “machines” in the following, with or without communication with each other. If several users select the same machine, the potential reward is split between them. Figure 1 represents a typical example case of internet connection through relays, where the concepts of competition and limited communication between users are clearly involved.

Figure 1

Principle of the CMAB problem with N players and two machines, illustrated with internet connection. In every run, each player selects one machine through which to try to connect to the internet; the connection succeeds with an (unknown) probability \(P_{1,2}\). In case of connection, each player receives the same, shared bandwidth as the other players who selected the same machine.

According to the situation in Fig. 1, we defined four rules for our study:

  (i) Each machine has a fixed reward in case of success, equal to 1 for all, and 0 otherwise;

  (ii) If a machine is selected by k users and gives a reward, each user receives an individual reward of 1/k;

  (iii) The users know the reward amount given by each machine in case of success, as well as the total number of users, but not the probability of reward of the machines;

  (iv) The users do not communicate with each other.

Rule (i) is only for calculational convenience and does not cause loss of generality, as arbitrary and different rewards between machines produce the same results. The main requirements for Rule (ii) are that the total rewards of the machines remain the same regardless of how many users select them, and that some users are not favored over others. Without Rule (iii), users would have no means of deducing anything about the decisions of the other users and their consequences; this rule is thus necessary for devising the optimization algorithms. Rule (iv) restricts the kinds of algorithms usable here and emulates actual network architecture constraints.

Definitions and formalism

We now consider the CMAB problem with N ≥ 2 users and two machines, A and B. Let \(x_{i,A}\) and \(x_{i,B}\) be the reward amounts for turn \(i\) of machines A and B, respectively, and let \(r_{i,j}\) be the reward amount received by user \(j \in \{ 1, \ldots ,N\}\) in turn \(i\). For a given turn \(i\), user \(j\) makes a choice between A and B, with its reward obeying Rule (ii) such that

$$ r_{i,j} = \sum\limits_{k = A,B} {\frac{{\varepsilon_{i,j,k} }}{{\sum\nolimits_{{j^{\prime } = 1}}^{N} {\varepsilon_{{i,j^{\prime } ,k}} } }}x_{i,k} } ,\quad \varepsilon_{i,j,k} \in \left\{ {0,1} \right\}. $$
(1)

Here \(\varepsilon_{i,j,k}\) corresponds to the actual selection of machine \(k\) in turn \(i\) by user \(j\), with \(\sum\nolimits_{k = A,B} {\varepsilon_{i,j,k} = 1}\). After a given number of turns \(n_{t}\), we define \(R_{j}\) and \(R\) as

$$ R_{j} (n_{t} ) = \sum\limits_{i = 1}^{{n_{t} }} {r_{i,j} } ;\quad R(n_{t} ) = \sum\limits_{j = 1}^{N} {R_{j} (n_{t} )} . $$
(2)

\(R_{j}\) is the accumulated reward for user j, and R is the total accumulated reward. Since N ≥ 2, it is evident that the optimal strategy involves having each machine selected by at least one user in every turn, so that the total accumulated reward R is maximized.
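
As an illustration of Eqs. (1) and (2), the reward splitting of Rule (ii) can be sketched in a few lines of Python. This is only a minimal sketch: the function names `split_rewards` and `accumulate`, and the representation of a turn as a pair (choices, machine rewards), are assumptions made for this example and do not come from the original study.

```python
from collections import Counter

def split_rewards(choices, machine_rewards):
    """Per-user rewards r_{i,j} for one turn, following Eq. (1) and Rule (ii).

    choices: one machine label per user, e.g. ["A", "B", "A"].
    machine_rewards: dict mapping each machine label to x_{i,k} (0 or 1).
    """
    counts = Counter(choices)  # number of users who selected each machine
    return [machine_rewards[k] / counts[k] for k in choices]

def accumulate(turns):
    """Accumulated rewards R_j and total reward R of Eq. (2).

    turns: list of (choices, machine_rewards) pairs, one per turn.
    """
    n_users = len(turns[0][0])
    totals = [0.0] * n_users
    for choices, machine_rewards in turns:
        for j, r in enumerate(split_rewards(choices, machine_rewards)):
            totals[j] += r
    return totals, sum(totals)

# Three users, both machines pay out: the two users on A share its reward.
print(split_rewards(["A", "A", "B"], {"A": 1, "B": 1}))
```

When all users select the same machine, the reward of the other machine is lost for that turn, which is why a full conflict lowers the total accumulated reward.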

Fairness

Optimizing the total accumulated reward R does not imply anything about the repartitioning of the total reward among users. In the simplest case of two users and two machines with constant, different probabilities of giving the same reward, having each user always select the same machine, different from that of the other user, would result in a heavily unbalanced situation. In contrast, if both users recognize the best machine and always select it simultaneously, they receive the same reward amount, although the total accumulated reward R decreases.

To quantify this effect, a metric is needed that indicates in a straightforward manner whether or not the current distribution is fair between users. This notion of fairness has been particularly studied in socioeconomics, in the use of income inequality indices such as the Gini index27, Hoover index28, and Theil index29,30,31, like the information entropy used in telecommunication. Telecommunication technology also relies on these metrics as figures of merit for resource sharing, with several definitions having been proposed and studied32,33,34.

In this study, we decided to use the Jain index \(I_{J}\) to estimate the fairness between users for a given repartition \(\left\{ {R_{j} } \right\}_{j \in \{ 1, \ldots ,N\} }\) after a given number of trials, where \(I_{J}\) is given by

$$ I_{J} = \frac{{\left( {\sum\nolimits_{j = 1}^{N} {R_{j} } } \right)^{2} }}{{N \cdot \sum\nolimits_{j = 1}^{N} {\left( {R_{j} } \right)^{2} } }}. $$
(3)

The choice of this metric is justified by several important properties, following the discussion in ref. 34:

  • The metric should be continuous with respect to the reward variables;

  • The definition should not vary with N, explaining the normalization by N in the denominator;

  • Its value interval is between 0 and 1;

  • It is symmetric and sensitive to the variation in the reward of any user.

The first two properties are essential to establish a metric that is valid for any N and any reward distribution. Concerning the Jain index specifically, its main advantage is its intuitiveness: if only one user receives a reward while \(N - 1\) others do not, then \(I_{J} = \frac{1}{N}\), and for k equally rewarded users and \(N - k\) left-out ones, \(I_{J} = \frac{k}{N}\). This aspect explains why \(I_{J}\) is already used widely34 in telecommunication to estimate the fairness of bandwidth allocation between channels.
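
The behavior described above can be checked with a short Python sketch of Eq. (3). The function name `jain_index` and the choice to raise an error when all rewards are zero (where the index is undefined) are assumptions made for this example.

```python
def jain_index(rewards):
    """Jain fairness index I_J of Eq. (3) for a list of accumulated rewards R_j."""
    total = sum(rewards)
    squares = sum(r * r for r in rewards)
    if squares == 0:
        # Assumption of this sketch: refuse to evaluate 0/0 when nobody is rewarded.
        raise ValueError("Jain index is undefined when all rewards are zero")
    return total ** 2 / (len(rewards) * squares)

print(jain_index([1, 0, 0, 0]))  # one rewarded user out of N = 4
print(jain_index([2, 2, 0, 0]))  # k = 2 equally rewarded users out of N = 4
print(jain_index([3, 3, 3]))     # perfectly fair repartition
```

The three calls correspond to the cases \(I_{J} = 1/N\), \(I_{J} = k/N\), and \(I_{J} = 1\) discussed in the text.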

Finally, fairness can be understood in two inequivalent ways:

  • Fairness of the reward accumulated so far, or

  • Fairness of the expected value of rewards to be obtained in the next trial (regardless of the accumulated reward differences).

The first case corresponds to active fairness, in which it is attempted to correct any uneven accumulated repartition in every step. The second case is characterized by a passive pattern, in which it is attempted to make the subsequent trials fair, without trying to correct unfairness from past events. In the example of telecommunication, an algorithm following the first case will attempt to compensate for differences in the accumulated data amount in every trial, while the second one will only attempt to achieve an equal data transmission rate on average between users for the subsequent trials. Since the second situation corresponds to a sufficient condition for the first situation after a transition period, we will focus here only on passive or instantaneous fairness. Note that the individual rewards in every step do not need to be equal for the situation to be fair: it only matters that the average expected value is the same for all users.

Targets and performance

With these tools, we can formulate the problem to be solved in this CMAB situation.

How can the maximum total reward available for all users be obtained, while simultaneously guaranteeing instantaneous fairness between users?

Considering this objective, we used a modified version of the Jain index that involves both fairness and total reward performance. Let \(X_{k} (n_{t} ) = \sum\nolimits_{i = 1}^{{n_{t} }} {x_{i,k} }\) be the accumulated reward given by machine k after \(n_{t}\) trials and \(X(n_{t} ) = X_{A} (n_{t} ) + X_{B} (n_{t} )\) be the total reward available to users. The modified index is defined as

$$ I_{p} = I_{J} \cdot \frac{R}{X} = \frac{{\left( {\sum\nolimits_{j = 1}^{N} {R_{j} } } \right)^{2} }}{{N \cdot \sum\nolimits_{j = 1}^{N} {\left( {R_{j} } \right)^{2} } }} \times \left( {\frac{{\sum\nolimits_{j = 1}^{N} {R_{j} } }}{{\sum\nolimits_{k = A,B} {X_{k} } }}} \right). $$
(4)

This quantity corresponds to the fairness index weighted by the global performance, which we call the pondered index. It possesses the same advantages as the Jain index while also evaluating the efficiency of the global strategy: if all users make the same choice, the repartition will be fair but with poor performance, while a single user who always selects the best machine alone will generate a strongly unbalanced repartition. Thus, \(I_{p} \in \left[ {0,1} \right]\) and is equal to 1 if and only if the repartition is fair and the total available reward is obtained by the users. The strategies discussed herein are all intended to make \(I_{p}\) as close to 1 as possible.
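
A minimal Python sketch of Eq. (4) is given below, reading the pondered index as the Jain index multiplied by the fraction of the available reward that the users actually collected. The function name `pondered_index`, the convention of returning 0 when no reward is available or collected, and the numerical values in the examples are assumptions chosen for illustration only.

```python
def pondered_index(user_rewards, machine_rewards):
    """Pondered index I_p of Eq. (4).

    user_rewards: accumulated rewards R_j of the users.
    machine_rewards: accumulated rewards X_k given by the machines.
    """
    collected = sum(user_rewards)     # total reward obtained by the users
    available = sum(machine_rewards)  # total reward given by the machines
    squares = sum(r * r for r in user_rewards)
    if squares == 0 or available == 0:
        return 0.0  # assumption of this sketch: no reward means zero performance
    jain = collected ** 2 / (len(user_rewards) * squares)
    return jain * collected / available

# Hypothetical two-user example where machines A and B gave 80 and 40 rewards.
print(pondered_index([60, 60], [80, 40]))  # no conflict, equal shares
print(pondered_index([40, 40], [80, 40]))  # both users always on machine A
print(pondered_index([80, 40], [80, 40]))  # each user stays on its own machine
```

Only the first configuration, which is both fair and collects the whole available reward, reaches \(I_{p} = 1\); the second is fair but wasteful, and the third collects everything but unevenly.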

N = 2 case and constraints on the required quantum state

This section discusses the N = 2 case from the perspective of the fairness discussion introduced before and with the use of quantum superposition states employing the polarization of photons. The purpose is to apply the formalism to a well-known case so that situations with more than two users can be easily treated.

This situation is the simplest case of users having to share limited and uncertain resources among them, without direct communication between them. One possibility is to ensure, by construction, equality among users in the case of a sub-optimal reward produced by a selfish strategy. To maximize the total reward, each machine should be selected by one user in every step, and to maximize fairness, each user should select one machine as often as the other. In other words, users should avoid conflicts of decision between them. This principle leads to the difficult task of guaranteeing that users never select the same machine simultaneously when they cannot communicate with each other.

In our previous work26, we studied the situation in which users receive photon pairs in a quantum superposition of polarization states, while only being able to rotate a half-waveplate to optimize their rewards. Comparisons with other strategies such as independent random decision making and situations with selfish users are also provided. While the latter part will not be discussed here, we will describe the quantum state used and its properties in relation to the two-user CMAB.

Derivation of the state

We demonstrated previously26 that polarization-entangled photon pairs can solve this issue when they are in state \(\left| {\psi (\phi )} \right\rangle_{2}\) given by

$$ \left| {\psi (\phi )} \right\rangle_{2} = \frac{1}{\sqrt 2 }\left( {\left| {HV} \right\rangle + e^{i\phi } \left| {VH} \right\rangle } \right), $$
(5)

where \(\phi\) is a real number and \(\left| H \right\rangle\) and \(\left| V \right\rangle\) are the horizontal and vertical linear polarization states of single photons, respectively. In that setup, each user has a half waveplate, which transforms any input state according to the angle \(\theta /2\) of its fast axis with respect to the \(\left| H \right\rangle\) direction of the photon source, following

$$ \left( {\begin{array}{*{20}c} {a\left| H \right\rangle } \\ {b\left| V \right\rangle } \\ \end{array} } \right) \to \hat{r}(\theta )\left( {\begin{array}{*{20}c} {a\left| H \right\rangle } \\ {b\left| V \right\rangle } \\ \end{array} } \right),\quad \hat{r}(\theta ) = \left( {\begin{array}{*{20}c} {\cos \theta } & {\sin \theta } \\ { - \sin \theta } & {\cos \theta } \\ \end{array} } \right). $$
(6)

Generally speaking, state \(\left| {\psi (\phi )} \right\rangle_{2}\) only works if users have the same polarization measurement basis \(\left\{ {\left| H \right\rangle ,\left| V \right\rangle } \right\}\), or equivalently if \(\theta = 0\;[90^{ \circ } ]\) with respect to the photon source (there is then no mixing of states while projecting onto the orthogonal states of a polarizing beam splitter cube). In contrast, for \(\phi = \pi\),

$$ \left| \psi \right\rangle_{2} = \frac{1}{\sqrt 2 }\left( {\left| {HV} \right\rangle - \left| {VH} \right\rangle } \right), $$
(7)

which is invariant under any simultaneous rotation of the polarization measurement bases of the users. In this case, there is no need for direct communication between users to achieve an optimal situation, since only the relative angle between the measurement bases of the users matters.

Properties of the state

Next, assuming the state \(\left| \psi \right\rangle_{2}\) is sent, we discuss the evolution of fairness and total performance of the users when they modify their polarization bases independently by rotating their half-waveplates. It is assumed that this state is well controlled and reproducible and that the detection efficiency is 100%; in practice, limited detection efficiency can be overcome by post-selection or coincidence schemes, at the expense of lower fidelity of the state. Figure 2 shows the evolution of fairness and total performance with the rotation angles of the waveplates of both users after 1000 trials, averaged over 20 repetitions.

Figure 2

(a) Total reward, (b) fairness, and (c) modified Jain index \(I_{p}\) for two users and two machines, using \(\left| \psi \right\rangle_{2}\) as the input state, as functions of the measurement basis tilt of each user.

Figure 2b shows that the fairness remains constant regardless of the combination of angles, which means that both players always receive the same accumulated reward regardless of their own angles. In addition, Fig. 2a demonstrates that the total reward depends only on the relative angle between the waveplates of the users, in accordance with the global rotational invariance of the state. From the perspective of the users, improving the reward of one user is thus equivalent to improving the rewards of both users equally. Finally, Fig. 2c presents the modified Jain index \(I_{p}\), which is equivalent here to the evolution of the total reward due to the perfect fairness in every case between users.
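
The dependence on the relative angle alone can be reproduced with a small Python sketch that applies the rotation \(\hat{r}(\theta )\) of Eq. (6) to each photon of \(\left| \psi \right\rangle_{2}\) and computes the probability that both users obtain the same outcome. The sketch assumes ideal detection, that the outcome H corresponds to one machine and V to the other, and that \(\hat{r}(\theta )\) acts on the amplitude pair of each photon; the helper names are hypothetical.

```python
import math

def rotate_photon(state, q, n, theta):
    """Apply r(theta) of Eq. (6) to photon q of an n-photon amplitude list.

    Basis states are indexed in binary with H = 0 and V = 1, photon 0 first.
    """
    c, s = math.cos(theta), math.sin(theta)
    bit = 1 << (n - 1 - q)
    out = list(state)
    for idx in range(len(state)):
        if idx & bit == 0:  # idx has H on photon q; idx | bit has V
            h, v = state[idx], state[idx | bit]
            out[idx] = c * h + s * v
            out[idx | bit] = -s * h + c * v
    return out

def outcome_probabilities(state, thetas):
    """Outcome probabilities when user q rotates its basis by thetas[q]."""
    n = len(thetas)
    for q, theta in enumerate(thetas):
        state = rotate_photon(state, q, n, theta)
    return [abs(a) ** 2 for a in state]

# Amplitudes of Eq. (7) in the order HH, HV, VH, VV.
SINGLET = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

def conflict_probability(theta1, theta2):
    """Probability that both users obtain the same outcome (HH or VV)."""
    p = outcome_probabilities(SINGLET, [theta1, theta2])
    return p[0] + p[3]

print(conflict_probability(0.7, 0.2))              # relative angle 0.5 rad
print(conflict_probability(0.7 + 1.1, 0.2 + 1.1))  # same relative angle
```

Under these assumptions the conflict probability equals \(\sin^{2} (\theta_{1} - \theta_{2} )\), so it vanishes whenever the two rotation angles coincide modulo \(\pi\) and is unchanged when both users rotate by the same amount.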

Realignment algorithm convergence

This section discusses how users can actually align their polarization measurement bases with each other (modulo 180°). According to Fig. 2, for any given tilt angle of one user, the other one can find an optimal position by tuning only its own waveplate. In other words, the action of one user is sufficient to find an optimal configuration for both.

In the previous study26, only one user attempted to find a correct angle configuration in the realignment algorithm, without any information about the angle of the other user. However, this strategy implies agreement regarding which user should act, because if both users act simultaneously they may not find an equilibrium situation.

These observations motivated us to implement a new, simpler, and scalable strategy, in which a user selects a random waveplate position when too many conflict situations are recorded over time. Given that Rules (i), (ii), and (iii) are satisfied, all users can recognize simultaneously when there is a total conflict of decision, and thus when realignment is needed, from the reward amounts they receive (1/N in the case of success and full conflict). The exact algorithm is presented in section 1 of the Supplementary information.

To compare the performances of the two algorithms, we numerically studied a set of 100 random initial angles for the measurement basis of both users, with any value between 0° and 360° in increments of 5°, from which one player performed either the algorithm corresponding to the autonomous polarization-basis alignment of our previous study26 (under assumption (II) in that report) or the proposed memory-based random algorithm. Briefly, the first algorithm supposes that no information is available to either user about the measurement basis position of the other user, and only one user adjusts its basis tilt in small increments until conflict is avoided. Each set of initial angles is repeated 20 times, and the averaged \(I_{p}\) is estimated from the angle configuration in several time steps. We used identical parameters for both algorithms, with a memory capacity of eight reward-giving events, a conflict event threshold of 2 before the angle was changed, and an angle increment of 5° for the previous algorithm. These parameters correspond to a trade-off between sensitivity to conflicts and robustness against possible errors.
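
The memory-based random strategy can be sketched in Python as follows. The exact algorithm is given in the Supplementary information and is not reproduced here, so this is only one plausible reading of the description above: the class name `RandomRealigner`, the choice to clear the memory after each redraw, and the decision to ignore unrewarded turns are all assumptions. The default parameters (memory of eight reward-giving events, conflict threshold of 2, 5° angle grid) are those quoted in the text.

```python
import math
import random
from collections import deque

class RandomRealigner:
    """Hypothetical sketch of a memory-based random realignment for one user."""

    def __init__(self, n_users, memory=8, threshold=2, step_deg=5, rng=None):
        self.n_users = n_users
        self.memory = deque(maxlen=memory)  # last reward-giving events
        self.threshold = threshold          # conflicts tolerated in memory
        self.step_deg = step_deg            # discretization of the angles
        self.rng = rng if rng is not None else random.Random()
        self.angle_deg = self._random_angle()

    def _random_angle(self):
        # Uniform draw on the grid 0, step, 2*step, ..., below 360 degrees.
        return self.step_deg * self.rng.randrange(360 // self.step_deg)

    def observe(self, reward):
        """Record the reward of one turn; return True if the angle was redrawn."""
        if reward == 0:
            return False  # an unrewarded turn carries no information here
        # A reward of 1/N reveals that all N users selected the same machine.
        full_conflict = math.isclose(reward, 1 / self.n_users)
        self.memory.append(full_conflict)
        if sum(self.memory) >= self.threshold:
            self.angle_deg = self._random_angle()
            self.memory.clear()  # assumption: start afresh after a redraw
            return True
        return False

user = RandomRealigner(n_users=2, rng=random.Random(0))
for reward in (1.0, 0.5, 0.5):  # one clean success, then two full conflicts
    print(reward, user.observe(reward), user.angle_deg)
```

Because each user only inspects its own rewards, several users can run this procedure at the same time without any agreement on who should act.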

Figure 3 shows that both algorithms converge toward a stable, optimal configuration after several hundred plays, with the proposed algorithm converging faster. Using a coarser angle increment in the previous algorithm would probably make it converge faster, with the risk of skipping an optimal situation. As such, the proposed algorithm performs well for any initial configuration, and we chose to use it again for realignment estimation for higher numbers of players.

Figure 3

Modified Jain index \(I_{p}\) during realignment when one player uses the proposed random algorithm (green line) or the previous incremental algorithm (orange line), for 100 identical random initial angles. Each result is the average of 20 trials.

N = 3 case

Based on the considerations in the N = 2 case, this section presents the derivation of an appropriate superposition of three-photon polarization quantum states to maximize \(I_{p}\), with a similar ideal setup as in the N = 2 case.

Constraints on the quantum state

Several criteria on the quantum state to be sent to the users can be introduced following the observations made in the N = 2 case:

  • Invariance of the measurement probabilities with respect to simultaneous rotation of the measurement bases of all users;

  • Symmetry of the state with respect to all users;

  • No term with all users selecting the same choice simultaneously.

The first condition comes from the requirement that the state be device-independent; otherwise, the problem depends on the choice of the measurement basis of the photon source. The second condition means that with an odd number of users, such as three, the terms should have equal probabilities between an unbalanced situation (such as two users on A and one user on B) and its mirror (two users on B and one on A), as well as equal probabilities between all permutations of these terms. Finally, the last condition is associated with the search for the maximum total reward, which requires that all choices be attempted in every trial.

Derivation of the quantum state

Let us start from the three conditions given in the previous section to discuss whether or not there exists a family of compatible states for N = 3. This derivation is intended to demonstrate how these conditions constrain the form of the final quantum state obtained.

Global rotational invariance is best described using the density matrix formulation for the quantum states. Let \(\left| \psi \right\rangle_{3}\) be the entangled state under study and \(\rho_{3}\) the associated density matrix. Following the last criterion given in the previous section and using the polarization basis \(\left\{ {\left| H \right\rangle ,\left| V \right\rangle } \right\}\) of the photon source, the state in the three-photon Hilbert space can be written as

$$ \begin{aligned} \left| \psi \right\rangle_{3} & = 0\left| {HHH} \right\rangle + a_{1} \left| {HHV} \right\rangle + a_{2} \left| {HVH} \right\rangle + a_{3} \left| {VHH} \right\rangle \\ & \quad + b_{1} \left| {VVH} \right\rangle + b_{2} \left| {VHV} \right\rangle + b_{3} \left| {HVV} \right\rangle + 0\left| {VVV} \right\rangle \\ \end{aligned} $$
(8)

in its canonical basis, where the coefficients are all complex and include the normalization condition. From this equation, the necessary conditions on the coefficients can be derived such that the measurement probabilities of the state do not vary under a simultaneous rotation \(\theta \in {\mathbb{R}}\) of all measurement bases, denoted hereafter as \(\left\{ {\left| \theta \right\rangle ,\left| {\theta^{\prime } } \right\rangle } \right\}\) in the photon source reference frame. This condition involves the global rotation operator \(\hat{R}_{3} (\theta )\), defined in the three-photon Hilbert space \(H\) from the two-dimensional operator \(\hat{r}(\theta )\) as

$$ \hat{R}_{3} (\theta ) = \hat{r}(\theta ) \otimes \hat{r}(\theta ) \otimes \hat{r}(\theta ),\quad \hat{r}(\theta ) = \left( {\begin{array}{*{20}c} {\cos \theta } & {\sin \theta } \\ { - \sin \theta } & {\cos \theta } \\ \end{array} } \right). $$
(9)

Then, the condition of invariance of the measurement probabilities under global rotation can be expressed as

$$ \forall \theta \in {\mathbb{R}},\;\forall \left| \phi \right\rangle \in H,\;\left\langle \phi \right|\hat{R}_{3} (\theta )^{\dag } \rho_{3} \;\hat{R}_{3} (\theta )\left| \phi \right\rangle = \left\langle \phi \right|\rho_{3} \left| \phi \right\rangle . $$
(10)

After applying (10) to the canonical basis of the three-photon Hilbert space, the following non-redundant conditions can be obtained:

$$ a_{1} + a_{2} + a_{3} = 0 $$
(11)
$$ a_{1} = \pm \,ib_{1} ;\;a_{2} = \pm \,ib_{2} ;\;a_{3} = \pm \,ib_{3} $$
(12)
$$ \left| {a_{1} } \right| = \left| {a_{2} } \right| = \left| {a_{3} } \right| = \left| {b_{1} } \right| = \left| {b_{2} } \right| = \left| {b_{3} } \right| = \frac{1}{\sqrt 6 }. $$
(13)

From (11) and (13), one can deduce that the complex cube roots of unity are involved. In addition, as the global phase of the state is irrelevant, there are 11 unknown variables to find (the amplitudes and phases of the parameters), with only 10 equations ((13) summarizes six equations). Four states can solve the equations, corresponding to the permutation of the phase differences and the choice of \(\pm i\) in (12):

$$ \left| \psi \right\rangle_{3} = \frac{1}{\sqrt 6 }\left[ {\left| {HHV} \right\rangle + z\left| {HVH} \right\rangle + z^{2} \left| {VHH} \right\rangle \pm i\left( {\left| {VVH} \right\rangle + z\left| {VHV} \right\rangle + z^{2} \left| {HVV} \right\rangle } \right)} \right],\;z = e^{{ \pm \frac{2i\pi }{3}}} . $$
(14)

All states have the same properties that are relevant to the present analysis, with symmetry in the rotation parameter space of each player. In the following, we discuss only the state

$$ \left| \psi \right\rangle_{3} = \frac{1}{\sqrt 6 }\left[ {\left| {HHV} \right\rangle + i\left| {VVH} \right\rangle + e^{{\frac{2i\pi }{3}}} \left( {\left| {HVH} \right\rangle + i\left| {VHV} \right\rangle } \right) + e^{{\frac{4i\pi }{3}}} \left( {\left| {VHH} \right\rangle + i\left| {HVV} \right\rangle } \right)} \right]. $$
(15)
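
The properties required of this state can be verified numerically with a short Python sketch. It builds the amplitudes of Eq. (15), applies the rotation \(\hat{r}(\theta )\) of Eq. (9) to each photon, and evaluates the outcome probabilities. The helper names and the basis ordering (H = 0, V = 1, first user as the leading bit) are assumptions made for this example.

```python
import cmath
import math

def rotate_photon(state, q, n, theta):
    """Apply r(theta) of Eq. (9) to photon q of an n-photon amplitude list."""
    c, s = math.cos(theta), math.sin(theta)
    bit = 1 << (n - 1 - q)
    out = list(state)
    for idx in range(len(state)):
        if idx & bit == 0:  # idx has H on photon q; idx | bit has V
            h, v = state[idx], state[idx | bit]
            out[idx] = c * h + s * v
            out[idx | bit] = -s * h + c * v
    return out

# Basis labels HHH, HHV, ..., VVV in binary order.
LABELS = [format(i, "03b").replace("0", "H").replace("1", "V") for i in range(8)]

z = cmath.exp(2j * math.pi / 3)
AMPLITUDES = {  # coefficients of Eq. (15), before the 1/sqrt(6) normalization
    "HHV": 1, "VVH": 1j,
    "HVH": z, "VHV": 1j * z,
    "VHH": z ** 2, "HVV": 1j * z ** 2,
}
PSI3 = [AMPLITUDES.get(label, 0) / math.sqrt(6) for label in LABELS]

def probabilities(thetas):
    """Outcome probabilities when user q rotates its basis by thetas[q]."""
    state = PSI3
    for q, theta in enumerate(thetas):
        state = rotate_photon(state, q, 3, theta)
    return dict(zip(LABELS, (abs(a) ** 2 for a in state)))

def full_conflict_probability(thetas):
    """Probability that all three users obtain the same outcome."""
    p = probabilities(thetas)
    return p["HHH"] + p["VVV"]

print(probabilities([0.0, 0.0, 0.0]))
print(probabilities([0.8, 0.8, 0.8]))            # simultaneous rotation
print(full_conflict_probability([0.4, 0.0, 0.0]))  # only the first user tilts
```

In this sketch the probabilities are unchanged by a simultaneous rotation of all three bases and the full-conflict terms stay at zero, whereas tilting a single basis makes the full-conflict probability nonzero.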

State properties

As discussed for the constraints on the state, the total reward for all users is maximized when at least one user selects each machine in every turn. Thus, the total reward is reduced in conflict situations in which all users select one machine.

Figure 4 shows three-dimensional scatter plots of the total reward, fairness, and pondered index as functions of the angle of the polarization measurement basis of each user relative to the photon source basis, with 5° discretization. The color and size of each sphere are functions of the variable studied. The fairness exhibits the greatest difference from the N = 2 case here, as there exist situations in which both total reward and fairness are sub-optimal. The total rewards still vary considerably over the parameter space, and the combination of the variations explains the more significant variation of the pondered index with respect to the N = 2 case.

Figure 4

(a) Total reward, (b) fairness, and (c) pondered fairness \(I_{p}\) for \(N = 3\) users rotating their polarization measurement bases independently. Red circles in (c) indicate optimal angle combinations from which all others can be deduced by global rotation or by the \(\pi\)-periodicity.

Note that in addition to \(\pi\)-periodic optimal situations and invariance with respect to simultaneous rotation, additional optimal situations can be found for specific combinations of angles, indicated by the red circles in Fig. 4c and corresponding to rotations of \(\pi /3\) or \(2\pi /3\). This characteristic corresponds to the fact that permutations of the phases of several terms in \(\left| \psi \right\rangle_{3}\) yield the same performance. This feature leads to a parameter space in which many combinations of angles between the three users are favorable.

Realignment algorithm convergence

In accordance with the objective of achieving device-independent behavior, the users should be able to find the optimal situation without needing to communicate with each other. Supposing that their measurement bases are misaligned with each other, there should be a means of retrieving a favorable situation as quickly as possible. However, unlike in the N = 2 case, the actions of one user alone are not sufficient to reach the optimal angle combination in the general case, as can be seen in Fig. 4 along the tuning parameter axis of each user taken separately. Consequently, the random exploration strategy introduced in the “Realignment algorithm convergence” section of the N = 2 case is needed.

To estimate the efficiency and speed of convergence of this strategy, we selected 100 initial random angle combinations and repeated the calculations 20 times for each combination to obtain a statistical average. At different time steps, the angle combination was taken and attempted 1000 times to estimate the instantaneous \(I_{p}\). Figure 5a shows the evolution of \(I_{p}\) over time when one, two, or three users follow this strategy while the other(s) remain fixed, still with 5° discretization of the available angles for computational purposes. It is evident that when at least two users react to conflicts, the average \(I_{p}\) increases steadily until it reaches more than 0.98 after 10,000 events, while it remains bounded at 0.95 on average when only one user moves. At least two users are thus necessary to reach the optimal situation.

Figure 5

(a) Instantaneous pondered equality \(I_{p}\) after different numbers of time steps t (log scale), when one (green), two (blue), or three (black) users react to misalignment. Estimated upcoming rewards for (b) one moving user (blue) and two fixed users (red) and (c) two moving users (blue) and one fixed user (red).

We also tested the evolutionary stability35 of this strategy by checking whether users are at a disadvantage or an advantage if they use this strategy or if they remain passive, relying on other users to improve the situation. Figure 5b depicts the estimated reward given the current angle configuration for each user when only user 1 is moving, while Fig. 5c shows the situation when users 1 and 2 react to conflicts, but not user 3. As can be seen, a passive user always receives a lesser or equal reward compared to the active users, while all users always benefit from attempting to improve the situation. It is thus beneficial for users to follow this algorithm rather than remaining passive, which confirms the evolutionary stability of the algorithm. Moreover, if any user tries to increase its own outcome, the other users can easily recognize it and obtain an even higher reward than the selfish user, which pushes users to play for the common best situation.

N = 4 case

We have already identified the main difference between odd and even numbers of users, namely the necessary symmetry between terms that give unbalanced rewards (such as two users on machine A and one user on machine B), as well as the main similarity, namely invariance under simultaneous rotation of the measurement bases of the users. This part focuses on the \(N = 4\) case to identify the properties that are preserved when the situation is scaled up in the number of users.

Differences from the N = 2 and N = 3 cases

The same constraints apply to the state to be obtained with respect to the N = 3 case: invariance with respect to simultaneous rotation of the polarization measurement bases, symmetry between users, and avoidance of terms in which all users select the same machine simultaneously. However, this last constraint is fulfilled by two types of terms:

  • Asymmetric terms in which three users select one machine and one user selects the other, such as \(\left| {HHHV} \right\rangle\) and \(\left| {VVVH} \right\rangle\),

  • Symmetric terms in which users are split equally between machines, such as \(\left| {HHVV} \right\rangle\) and its permutations.

A priori, both kinds of terms are usable here as long as their relative amplitudes are equal between permutations of the photons of the users; other considerations are required to determine the expression of the state.

Derivation of the quantum state

The state can be derived using the same technique as for N = 3, starting from global rotation invariance of the density matrix of the target state. We chose to study separately the asymmetric and symmetric terms for clarity.

Let \(\left| S \right\rangle_{4}\) and \(\left| A \right\rangle_{4}\) respectively be the symmetric and asymmetric target states in the corresponding Hilbert space, with all coefficients being complex:

$$ \left| S \right\rangle_{4} = c_{1} \left| {HHVV} \right\rangle + c_{2} \left| {HVHV} \right\rangle + c_{3} \left| {HVVH} \right\rangle + c_{4} \left| {VHHV} \right\rangle + c_{5} \left| {VHVH} \right\rangle + c_{6} \left| {VVHH} \right\rangle $$
(16)
$$ \begin{aligned} \left| A \right\rangle_{4} & = a_{1} \left| {HHHV} \right\rangle + a_{2} \left| {HHVH} \right\rangle + a_{3} \left| {HVHH} \right\rangle + a_{4} \left| {VHHH} \right\rangle \\ & \quad + b_{1} \left| {VVVH} \right\rangle + b_{2} \left| {VVHV} \right\rangle + b_{3} \left| {VHVV} \right\rangle + b_{4} \left| {HVVV} \right\rangle . \\ \end{aligned} $$
(17)

Invariance under simultaneous rotation by any angle \(\theta\), described by the operator \(\hat{R}_{4} (\theta )\), is then imposed. For the symmetric state \(\left| S \right\rangle_{4}\), this yields the following conditions on the complex coefficients:

$$ c_{1} + c_{2} + c_{3} = 0 $$
(18)
$$ c_{1} = c_{6} \;,\quad c_{2} = c_{5} \;,\quad c_{3} = c_{4} $$
(19)
$$ \left| {c_{1} } \right| = \left| {c_{2} } \right| = \left| {c_{3} } \right| = \left| {c_{4} } \right| = \left| {c_{5} } \right| = \left| {c_{6} } \right| = \frac{1}{\sqrt 6 }. $$
(20)

As before, the global phase can be fixed at 0 without loss of generality, which gives 10 equations for 11 unknowns. The constraints are the same as for the N = 3 case, with the difference that there is no \(\pm i\) factor anymore between mirror terms. There are two possible states of this kind:

$$ \left| S \right\rangle_{4} = \frac{1}{\sqrt 6 }\left( {\left| {HHVV} \right\rangle + \left| {VVHH} \right\rangle + z\left( {\left| {HVHV} \right\rangle + \left| {VHVH} \right\rangle } \right) + z^{2} \left( {\left| {HVVH} \right\rangle + \left| {VHHV} \right\rangle } \right)} \right)\,,\;z = e^{{ \pm \frac{2i\pi }{3}}} . $$
(21)

In the following, we will discuss only the case \(z = e^{{\frac{2i\pi }{3}}}\), as both states produce the same results in this analysis.
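As a consistency check, the coefficients of (21) can be compared with conditions (18)–(20). The identification below is simply read off from (16) and (21), for either sign in the exponent of \(z\):

$$ c_{1} = c_{6} = \frac{1}{\sqrt 6 }\;,\quad c_{2} = c_{5} = \frac{z}{\sqrt 6 }\;,\quad c_{3} = c_{4} = \frac{z^{2} }{\sqrt 6 }, $$

so (19) holds by construction and (20) follows from \(\left| z \right| = 1\). Condition (18) reduces to

$$ c_{1} + c_{2} + c_{3} = \frac{1 + z + z^{2} }{\sqrt 6 } = 0, $$

which holds because \(z^{3} = 1\) with \(z \ne 1\), so that \((z - 1)(1 + z + z^{2} ) = z^{3} - 1 = 0\).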

For \(\left| A \right\rangle_{4}\), the global rotation invariance gives the conditions

$$ a_{1} + a_{2} + a_{3} + a_{4} = 0 $$
(22)
$$ a_{1} = - b_{1} \;,\quad a_{2} = - b_{2} \;,\quad a_{3} = - b_{3} \;,\quad a_{4} = - b_{4} $$
(23)
$$ \left| {a_{1} } \right| = \left| {a_{2} } \right| = \left| {a_{3} } \right| = \left| {a_{4} } \right| = \left| {b_{1} } \right| = \left| {b_{2} } \right| = \left| {b_{3} } \right| = \left| {b_{4} } \right| = \frac{1}{\sqrt 8 }. $$
(24)

Excluding the global phase, there are 15 unknowns for 13 equations, leaving the choice of one relative phase between \(a\) terms and the permutation order (for a given phase difference between, say, \(a_{1}\) and \(a_{2}\), we can have either \(a_{3} = - a_{1}\) or \(a_{4} = - a_{1}\)). This characteristic gives the family of states

$$ \begin{aligned} \left| A \right\rangle_{4} (\phi ) & = \frac{1}{\sqrt 8 }\left[ {\left( {\left| {HHHV} \right\rangle - \left| {VVVH} \right\rangle } \right) + e^{i\phi } \left( {\left| {HHVH} \right\rangle - \left| {VVHV} \right\rangle } \right)} \right. \\ & \quad \left. { - e^{{i\frac{\phi }{2}}} \left[ {e^{{ \pm i\frac{\phi }{2}}} \left( {\left| {HVHH} \right\rangle - \left| {VHVV} \right\rangle } \right) + e^{{ \mp i\frac{\phi }{2}}} \left( {\left| {VHHH} \right\rangle - \left| {HVVV} \right\rangle } \right)} \right]} \right], \\ \end{aligned} $$
(25)

where \(\phi \in \left[ {0,2\pi } \right]\). At this stage, the choice of \(\phi\) is not constrained, as all these states follow all the rules we have set.
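The invariance of the family (25) can be checked numerically. The following Python sketch is illustrative only and is not the Mathematica script of the Supplementary information. It assumes that \(\hat{R}_{4} (\theta )\) is the fourfold tensor product of a real 2×2 rotation matrix in the \(\{ \left| H \right\rangle ,\left| V \right\rangle \}\) basis; the helper names (`ket`, `asymmetric_state`, `rotation`, `is_rotation_invariant`) are hypothetical.

```python
import numpy as np
from functools import reduce


def ket(label):
    """Basis vector for a label such as 'HHHV' (H -> 0, V -> 1, user 1 first)."""
    index = int(label.replace("H", "0").replace("V", "1"), 2)
    vec = np.zeros(2 ** len(label), dtype=complex)
    vec[index] = 1.0
    return vec


def asymmetric_state(phi, upper_sign=True):
    """Four-photon state of Eq. (25); upper_sign selects the upper or lower sign."""
    s = 1.0 if upper_sign else -1.0
    a1 = 1.0
    a2 = np.exp(1j * phi)
    a3 = -np.exp(1j * phi / 2) * np.exp(1j * s * phi / 2)
    a4 = -np.exp(1j * phi / 2) * np.exp(-1j * s * phi / 2)
    state = (a1 * (ket("HHHV") - ket("VVVH"))
             + a2 * (ket("HHVH") - ket("VVHV"))
             + a3 * (ket("HVHH") - ket("VHVV"))
             + a4 * (ket("VHHH") - ket("HVVV")))
    return state / np.sqrt(8)


def rotation(theta):
    """Assumed single-photon rotation of the linear polarization basis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]], dtype=complex)


def is_rotation_invariant(state, n_users, n_angles=36):
    """True if the density matrix is unchanged by every sampled common rotation."""
    rho = np.outer(state, state.conj())
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        r_all = reduce(np.kron, [rotation(theta)] * n_users)
        rotated = r_all @ state
        if not np.allclose(np.outer(rotated, rotated.conj()), rho):
            return False
    return True
```

Under these assumptions, `is_rotation_invariant(asymmetric_state(phi, upper_sign), 4)` is expected to return `True` for any sampled \(\phi\) and for both sign choices, whereas a single product term such as `ket("HHHV")` fails the check.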

State properties

As before, the objective is to study how the performance of the states varies when the users vary their waveplate angles. As a four-dimensional representation of the parameter space is inconvenient to display and comprehend, we used the global rotation invariance and fixed the angle of one user at 0°, while the three other users varied their angles. The other configurations can be deduced by translation of the three-dimensional figure along the three-dimensional diagonal.
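The reduction to three free angles can be illustrated numerically with the symmetric state (21). The Python sketch below assumes that rotating the measurement basis of a user by \(\theta\) is equivalent to applying a real rotation matrix \(R( - \theta )\) to that user's photon; the angles are rotation angles of the measurement basis, which may differ from the physical waveplate angle by a constant factor. The probability that all four users select the same machine (outcomes HHHH or VVVV) is used only as a simple proxy for the conflict constraint and does not reproduce the total reward, the fairness, or \(I_{p}\). All function names are hypothetical.

```python
import numpy as np
from functools import reduce


def ket(label):
    """Basis vector for a label such as 'HHVV' (H -> 0, V -> 1, user 1 first)."""
    index = int(label.replace("H", "0").replace("V", "1"), 2)
    vec = np.zeros(2 ** len(label), dtype=complex)
    vec[index] = 1.0
    return vec


def symmetric_state():
    """Four-photon state of Eq. (21) with z = exp(2i*pi/3)."""
    z = np.exp(2j * np.pi / 3)
    state = (ket("HHVV") + ket("VVHH")
             + z * (ket("HVHV") + ket("VHVH"))
             + z ** 2 * (ket("HVVH") + ket("VHHV")))
    return state / np.sqrt(6)


def rotation(theta):
    """Assumed single-photon rotation of the linear polarization basis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]], dtype=complex)


def outcome_probabilities(state, angles):
    """Probabilities of the 2**N outcomes when user k measures at angles[k]."""
    # Rotating a measurement basis by theta is modeled as rotating the photon by -theta.
    u = reduce(np.kron, [rotation(-t) for t in angles])
    return np.abs(u @ state) ** 2


def all_same_probability(state, angles):
    """Probability that every user obtains the same outcome (all H or all V)."""
    p = outcome_probabilities(state, angles)
    return float(p[0] + p[-1])


if __name__ == "__main__":
    s4 = symmetric_state()
    angles = np.array([0.4, 1.3, 2.2, 0.0])   # user 4 fixed at 0
    shifted = angles + 0.7                    # same configuration, moved along the diagonal
    print(all_same_probability(s4, angles), all_same_probability(s4, shifted))
```

Under these assumptions, the two printed values should coincide, since adding a common offset to all four angles amounts to a simultaneous rotation that leaves the state unchanged. This is why fixing one angle at 0 loses no information.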

Figure 6 shows the total reward, fairness, and \(I_{p}\) for the symmetric state \(\left| S \right\rangle_{4}\) when users 1, 2, and 3 tilt their waveplates, with 5° discretization, and user 4 remains fixed at 0°. While the fairness remains almost optimal in any configuration, the conflict rate (and, thus, the total reward) changes significantly, and only particular angle combinations give optimal situations, typically with a period of 180°. Between them, conflicts and/or fairness are sub-optimal, and the density of optimal combinations in the four-dimensional space is relatively low. Regarding realignment, at least three users generally need to tune their half-waveplates to achieve an optimal situation. Figure 6c shows that when one user, for example user 3, is held at a fixed angle, there is no optimal configuration unless users 3 and 4 have the same angle (here, 0°). This finding is confirmed by the simulation results presented in Section 2 of the Supplementary information.

Figure 6

(a) Total reward, (b) fairness, and (c) pondered fairness \(I_{p}\) for N = 4 users and \(\left| S \right\rangle_{4}\), when three users tune their waveplates and the fourth remains static at 0°.

All of the asymmetric states \(\left| A \right\rangle_{4} (\phi )\) show fairness greater than 0.995 for any angle configuration. Thus, the variations of \(I_{p}\) only depend on the total reward. To understand the influence of \(\phi\) on the state performance, we studied the following states:

$$ \begin{aligned} \left| A \right\rangle_{4} (\phi ) & = \frac{1}{\sqrt 8 }\left[ {\left( {\left| {HHHV} \right\rangle - \left| {VVVH} \right\rangle } \right) + e^{i\phi } \left( {\left| {HHVH} \right\rangle - \left| {VVHV} \right\rangle } \right)} \right. \\ & \quad \left. { - \left[ {\left( {\left| {HVHH} \right\rangle - \left| {VHVV} \right\rangle } \right) + e^{i\phi } \left( {\left| {VHHH} \right\rangle - \left| {HVVV} \right\rangle } \right)} \right]} \right]. \\ \end{aligned} $$
(26)

Figure 7a–c show only \(I_{p}\) for the asymmetric states \(\left| A \right\rangle_{4} (0)\), \(\left| A \right\rangle_{4} \left( { + \frac{\pi }{2}} \right)\), and \(\left| A \right\rangle_{4} \left( \pi \right)\), respectively, with the same user configuration as for \(\left| S \right\rangle_{4}\). For \(\phi = 0\) and \(\phi = \pi\), there exist several planes of optimal situations, passing through points \((0\,[\pi ],0\,[\pi ],0\,[\pi ])\) that are not on the same edge of the cube of edge length \(\pi\). However, for \(\phi = \frac{\pi }{2}\), the optimal situations are only on lines corresponding to the intersections of the planes visible in (a) and (c).

Figure 7

Pondered fairness \(I_{p}\) for \(N = 4\) users and \(\left| A \right\rangle_{4} \left( \phi \right)\) for different values of \(\phi\), when three users tune their waveplates and the fourth remains static at 0°. (a) \(\phi = 0\), (b) \(\phi = \frac{\pi }{2}\), and (c) \(\phi = \pi\). The fairness \(I_{J}\) is maximized within the natural fluctuation level for all these configurations.

Optimal planes such as those in Fig. 7 can be obtained for any permutation of the terms in (26) as long as \(\phi = 0\;[\pi ]\). The locations of these planes can be explained by the choice of phase terms in (26) and the sign of the phase difference in the second group of terms. Different signs give different planes in the parameter space, although they always pass through points \((0\,[\pi ],0\,[\pi ],0\,[\pi ])\) that are not on the same edge of the cube of edge length \(\pi\) for a given plane. Furthermore, if user 4 has any fixed angle \(\theta_{4}\), the pattern in the parameter space of the three remaining users will simply be shifted, and the new optimal planes will pass through points \((\theta_{4} \,[\pi ],\theta_{4} \,[\pi ],\theta_{4} \,[\pi ])\).

Regarding the realignment procedure, in all cases and for any combination of fixed angles for two users, such as users 3 and 4, there always exists an optimal situation accessible to the two remaining users, as can be seen in Fig. 7. Thus, two users are sufficient to obtain optimal conditions, even though the other users remain fixed. In addition, fairness is always achieved; thus, it is in the best interest of each user to improve the situation for everyone, which is directly linked to the situation of the given user. This finding is also confirmed by the simulation results corresponding to this case, as presented in Section 2 of the Supplementary information. Moreover, when \(\phi = 0\;[\pi ]\), a single user can reach an optimal situation from any initial 4-angle configuration, as all lines parallel to the x, y, or z axes intersect an optimal plane: in those particular cases, the action of one user is enough to realign the whole system.
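The single-user realignment for \(\phi = 0\) can be illustrated with a one-dimensional scan. The Python sketch below is not the simulation of the Supplementary information. It assumes that rotating the measurement basis of a user by \(\theta\) is equivalent to applying a real rotation matrix \(R( - \theta )\) to that user's photon, and it uses the probability that all four users select the same machine as a proxy for a non-optimal configuration rather than computing \(I_{p}\). The function names and the starting angles are hypothetical.

```python
import numpy as np
from functools import reduce


def ket(label):
    """Basis vector for a label such as 'HHHV' (H -> 0, V -> 1, user 1 first)."""
    index = int(label.replace("H", "0").replace("V", "1"), 2)
    vec = np.zeros(2 ** len(label), dtype=complex)
    vec[index] = 1.0
    return vec


def asymmetric_state_26(phi):
    """Four-photon state of Eq. (26)."""
    e = np.exp(1j * phi)
    state = ((ket("HHHV") - ket("VVVH")) + e * (ket("HHVH") - ket("VVHV"))
             - ((ket("HVHH") - ket("VHVV")) + e * (ket("VHHH") - ket("HVVV"))))
    return state / np.sqrt(8)


def rotation(theta):
    """Assumed single-photon rotation of the linear polarization basis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]], dtype=complex)


def all_same_probability(state, angles):
    """Probability that every user obtains the same outcome (all H or all V)."""
    u = reduce(np.kron, [rotation(-t) for t in angles])
    p = np.abs(u @ state) ** 2
    return float(p[0] + p[-1])


def scan_user(state, angles, user, n_steps=180):
    """Scan one user's angle over [0, pi); return (best_angle, best_probability)."""
    trial = np.array(angles, dtype=float)
    best_angle, best_p = None, np.inf
    for theta in np.linspace(0.0, np.pi, n_steps, endpoint=False):
        trial[user] = theta
        p = all_same_probability(state, trial)
        if p < best_p:
            best_angle, best_p = theta, p
    return best_angle, best_p


if __name__ == "__main__":
    a0 = asymmetric_state_26(0.0)
    start = [0.3, 1.1, 2.0, 0.0]              # misaligned angles in radians (hypothetical)
    print("initial:", all_same_probability(a0, start))
    for user in range(4):
        print("user", user + 1, scan_user(a0, start, user))
```

Under these assumptions, each of the four scans is expected to find an angle at which the all-same probability is close to zero, whichever user moves. The proxy only tests the avoidance of all-same outcomes, so it should be read as a plausibility check of the statement above rather than as a reproduction of Fig. 7.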

Discussion

In the previous sections, we demonstrated the application of quantum superposition of the polarization states of two, three, and four users who must decide between two machines. Explicit expressions of the optimal states in each case were found, as well as expressions of the optimal states in the N = 5 case, whose derivation is presented in Section 3 of the Supplementary information. In addition, we developed a script using Wolfram Mathematica, presented in Section 4 of the Supplementary information, that can identify whether or not a given N-photon state is symmetric and invariant under simultaneous rotation of all users. A generalization of the theoretical work presented here to any N would be an interesting development, as entangled photon states with more than 10 photons have already been generated experimentally. Based on the empirical observations, we hypothesize that complex Nth roots of unity are involved in the phase coefficients for the optimal state formulation, regardless of N.

In addition, we observed differences between odd and even N: while maximum fairness is achieved for even N regardless of the combination of user waveplate angles, odd N does not exhibit this property. This fact remains to be explained.

As with any resource-sharing system, and as with all telecommunication systems, security is a major concern. The system described in this article relies on two fundamental features:

  • The properties are device independent, so any new user can be added with initial alignment verification.

  • Equality among users and resource allocation performance can be achieved without having to rely blindly on a remote central entity.

Regarding the first point, device independence is guaranteed by the rotation-independent property of the quantum states used in this study. Regarding the second point, the central entity corresponds to the N-photon source providing for all users. For this objective to be achieved, the users need a way to check whether the state received indeed corresponds to the optimal state expected (given the total number of simultaneous users and the expected total throughput). Such protocols have already been developed and verified for entangled photon pairs in the case of quantum key distribution36,37,38, including in a star-type network between an arbitrary number of users39, and several studies have proven the feasibility for more than two parties40,41,42, although limitations have been identified for large N43. All this may be of particular interest to secure applications of quantum information and resource sharing such as voting strategies using quantum states44.

Last but not least, this analysis relies on means of producing fully tunable quantum superpositions of N photons with their polarizations as the entangled degree of freedom. Recent works have shown ways to obtain a Greenberger–Horne–Zeilinger state for up to 10 photons45, while experiments have already succeeded in generating any 6-photon state46. As such, the production of N-photon states is less a theoretical issue than a technological one.

Conclusion

Solving the CMAB problem implies maximizing the total outcome from given choices, as well as ensuring a fair distribution of this outcome among all users. In this article, we theoretically and numerically demonstrated that carefully chosen polarization-entangled N-photon states can solve this problem for at least five players who must choose simultaneously between two choices or machines. Different behaviors, such as guaranteed fairness of the outcome distribution only for even N, were identified between even and odd numbers of players, due to the fundamental properties of the polarization degree of freedom of photons. Nonetheless, in every case, the properties of these states imply that the best strategy for each player is to actively attempt to reach a measurement configuration that corresponds to a common optimum for all players simultaneously, thus achieving evolutionary stability. These quantum states are derived from a few basic rules, such as global invariance under rotation of the measurement bases of the players and symmetry of the state under permutation of the players and/or choices. Although a detailed derivation of the favorable states for any N has not been performed, we developed an algorithm that can verify whether any entangled state follows the set of rules described in this report. In addition, we demonstrated that, under basic assumptions, a fairly simple one-sided algorithm enables the users to correct any misalignment from an optimal configuration autonomously, without the need to trust a central entity. By associating this kind of system with Bell-type verification of the N-photon-state source, decentralized, secure, and optimal resource sharing among users could be achieved.