Quantum formalism for the dynamics of cognitive psychology

The cognitive state of mind concerning a range of choices to be made can be modelled efficiently by use of an element of a high-dimensional Hilbert space. The dynamics of the state of mind resulting from information acquisition can be characterised by the von Neumann–Lüders projection postulate of quantum theory. This is shown to give rise to an uncertainty-minimising dynamical behaviour equivalent to Bayesian updating, hence providing an alternative approach to representing the dynamics of a cognitive state, consistent with the free energy principle in brain science. The quantum formalism, however, goes beyond the range of applicability of classical reasoning in explaining cognitive behaviour, thus opening up new and intriguing possibilities.


Decision making and the state of mind
Throughout the paper I shall be concerned with the cognitive process of decision making, which is a topic distinct from what is known as statistical decision theory, for which there are excellent treatises 20,21. Decision making occurs when a person is unsure which alternative to choose; but I will be using the term 'decision making' in a broad sense to include a person's uncertain point of view on a topic for which there are multiple views, and for which there may not be a need to choose one particular alternative. In any case, this uncertainty, which is largely due to a lack of sufficient information, can be modelled in the form of a set of probabilities that represents the likelihoods of different alternatives being selected. Suppose that a decision needs to be made to choose one out of N alternatives. The number of alternatives can be infinite, or even uncountable; the formalism extends straightforwardly to these cases, but for simplicity I consider the finite case here. That said, my own view is that the use of infinite or uncountable variables is merely a mathematical convenience, for the reduction of uncertainty in determining the value of a continuous variable would require infinite information acquisition, which is not available in the real world. At a given moment in time, let p_k denote the likelihood that the kth alternative is selected. If p_k = 1 for some value of k then the mind is in a definite state in relation to this decision. Hence for a given decision, the set of numbers (p_1, p_2, ..., p_N) represents the state of mind in relation to that decision.
It will be useful here to explain in more detail the meaning of these probabilities. For this purpose, consider the example of choosing an item from a menu in a restaurant. The likelihood that a person selects a given choice cannot, in fact, be inferred directly, because in general a repeated "measurement" is not permissible in cognitive psychology. This follows from the fact that in the course of an experiment, information is typically acquired that alters the state of mind of the person (e.g., "I have tried this dish and it was not so nice."). Hence in general one cannot reconstruct the state of mind of a person on a given choice. This is entirely analogous to the situation in quantum mechanics, whereby in general one cannot perform a measurement without altering the state of the system.
To overcome this issue, consider, hypothetically, a large number of identical "clones" of the person, all of whom are faced with the same decision at the same time. The fact that there are uncertainties in the choice means that different clones will make different choices, the results of which can then be used to infer the likelihoods of different alternatives being chosen. This is essentially the idea of an "ensemble" introduced by Gibbs. Of course, in reality there are no clones. In cognitive psychology the issue is handled by performing experiments instead on a large number of people. Each person is in a different state of mind, but if the group of people shares some level of commonality, then they can be viewed as clones of a hypothetical representative of the group.
At any moment in time one faces a multitude of decisions, not just one, some of which are intertwined with each other while others are independent. In probability theory, such a situation is modelled by means of a joint probability for the totality of decisions. Alternatively, the situation can be modelled on a Hilbert space by use of the square-root map ψ_k = √p_k. Here, by a Hilbert space H, I mean a real vector space endowed with a Euclidean inner product, such that each element of H has a finite norm. A typical element of H, i.e. a real vector of finite length, is denoted |ψ⟩, whose transposition is written ⟨ψ| in the usual Dirac notation. Thus the inner product of a pair of elements |ψ⟩ and |φ⟩ in H is written ⟨ψ|φ⟩ = Σ_k ψ_k φ_k. An element |ψ⟩ ∈ H, which is a vector, can also be expressed in matrix form as the column vector (ψ_1, ψ_2, ..., ψ_N)^T. It will be shown that the norm of a vector carries no psychologically relevant information. Thus, in what follows it will be assumed that all vectors have unit norm. Clearly, the vector with components {ψ_k}, in the basis |e_k⟩ = (0, 0, 0, ..., 1, 0, ..., 0)^T with only the kth element nonzero, is an element of an N-dimensional real Hilbert space H_N. Thus, in the Dirac notation the state can be expressed in the form of a superposition |ψ⟩ = Σ_k ψ_k |e_k⟩. If there is a second decision to be made out of M alternatives, then the state of a person's mind in relation to these two choices is represented by an element of the tensor product H_N ⊗ H_M. This tensor product structure arises solely from the statistical dependencies of the two decisions when modelled on a Hilbert space.
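As a minimal numerical sketch of the square-root map and the tensor-product construction (the probability values are hypothetical, chosen only for illustration):

```python
import numpy as np

# The square-root map sends a probability vector over N alternatives to a
# unit vector in a real N-dimensional Hilbert space (values hypothetical).
p = np.array([0.2, 0.3, 0.5])      # likelihoods of three alternatives
psi = np.sqrt(p)                   # square-root map: psi_k = sqrt(p_k)
print(float(np.dot(psi, psi)))     # unit norm, since the p_k sum to one

# A second, independent binary decision lives in the tensor product H_3 (x) H_2.
q = np.array([0.6, 0.4])
phi = np.sqrt(q)
joint = np.kron(psi, phi)          # product state of two independent decisions
print(np.allclose(joint**2, np.kron(p, q)))  # squared amplitudes = joint probs
```

The `np.kron` call realises the tensor product for independent decisions; statistically dependent decisions would require a joint state that does not factorise in this way.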
There are reasons for choosing to work with a Hilbert space H of square-root probability vectors, rather than the probabilities themselves. To this end, recall the example of a person who is attempting to guess the outcome of a coin toss. I have indicated that the physical state of a tossed coin is that of either heads or tails, and this is represented on H by what is called a mixed state. Writing |1⟩ for the 'heads' state and |0⟩ for the 'tails' state, and p for the bias of the coin, the 'either-or' state is given by the mixed-state density matrix ρ = p|1⟩⟨1| + (1 − p)|0⟩⟨0|. The state of mind of a person trying to guess the outcome, on the other hand, is neither heads nor tails, and this is represented on H by the linear superposition |ψ⟩ = √p|1⟩ + √(1 − p)|0⟩, which can equally be represented in the form of a pure-state density matrix |ψ⟩⟨ψ|. The two states ρ and |ψ⟩⟨ψ| are different. Working with the mixed state ρ is equivalent to working with classical probabilities, whereas in general the properties of a pure state |ψ⟩ cannot be fully captured using the language of classical probabilities. Of course, if the only decision at hand concerns guessing the outcome of a coin toss, then there are no psychological experiments one can perform that will distinguish the states ρ and |ψ⟩. That is, with merely one decision it is not possible to determine statistically, even if a large number of clones are available, whether the state of mind of a person is that of 'either-or' or 'neither-nor'. This makes it legitimate to work with classical probabilities when dealing with a single decision. However, if there are more decisions involved, and if some of the decisions are not compatible in the sense explained below (and there are ample empirical data suggesting some decisions are not compatible 22), then it is possible to experimentally distinguish the mixed state ρ from the pure state |ψ⟩. In this situation, the use of Hilbert space techniques becomes a necessity, because cognitive behaviour resulting from incompatible decisions cannot in general be modelled by use of classical probabilistic reasoning, but can be explained by use of quantum probabilities 11.
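The distinction between the 'either-or' and the 'neither-nor' states can be made concrete in a few lines. The second observable X below is a hypothetical incompatible choice, introduced only to show that the two states yield identical statistics for the coin-guess observable itself but different statistics once a non-commuting choice exists:

```python
import numpy as np

# Mixed 'either-or' state vs pure 'neither-nor' state for an unbiased coin.
p = 0.5
rho_mixed = np.diag([p, 1 - p])                 # p|1><1| + (1-p)|0><0|
psi = np.array([np.sqrt(p), np.sqrt(1 - p)])
rho_pure = np.outer(psi, psi)                   # |psi><psi|

Z = np.diag([1.0, -1.0])          # the coin-guess observable itself
X = np.array([[0.0, 1.0],         # a hypothetical second choice that does
              [1.0, 0.0]])        # not commute with Z

# Statistics of Z agree for both states; statistics of X do not.
print(np.trace(rho_mixed @ Z), np.trace(rho_pure @ Z))   # both ~0
print(np.trace(rho_mixed @ X), np.trace(rho_pure @ X))   # ~0 vs ~1
```

With only the single observable Z available, the two density matrices are empirically indistinguishable, exactly as the text asserts.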
The Hilbert-space construction outlined here extends to the case where an arbitrary number of decisions are to be made. With this in mind, I define the state of mind of a person facing a range of alternatives to consider, at any moment in time, to be an element of the tensor product H = ⊗_{l=1}^K H_l, where K is the number of distinct decisions. If two decisions can be made independently, then the component of the state vector belonging to the corresponding subspace of H will be in a product state. Otherwise, the state is entangled. As a simple example, consider a pair of binary decisions, for example, whether to take fish or meat for the main course, and whether to take red or white wine to accompany it. Writing |F⟩ and |M⟩ for the food choices, and similarly |R⟩ and |W⟩ for the wine selections, if the state of mind of a person is |ψ⟩ = c_1|FW⟩ + c_2|MR⟩, then the person will choose fish with white wine with probability c_1², and meat with red wine with probability c_2² = 1 − c_1²; no other option will be chosen. This is evidently an entangled state (a state that cannot be expressed in the form |XY⟩, where |X⟩ is an arbitrary linear combination of |F⟩ and |M⟩, and |Y⟩ is an arbitrary linear combination of |R⟩ and |W⟩), which collapses to one or the other alternative at the moment the waiter arrives and takes the order.
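Whether the food-and-wine state factorises can be checked mechanically: arranging the amplitudes as a matrix, the state is a product state exactly when the matrix has rank one (the numerical values of c_1 and c_2 are illustrative):

```python
import numpy as np

# |psi> = c1|FW> + c2|MR>, with hypothetical amplitudes.
c1, c2 = np.sqrt(0.7), np.sqrt(0.3)

# Amplitude matrix: rows index the food {F, M}, columns index the wine {R, W}.
A = np.array([[0.0, c1],    # F paired with W has amplitude c1
              [c2, 0.0]])   # M paired with R has amplitude c2

# Schmidt rank = matrix rank of A: rank 1 means product state, rank > 1 entangled.
svals = np.linalg.svd(A, compute_uv=False)
print(int(np.sum(svals > 1e-12)))   # 2: the state is entangled
```

The same singular-value test applies to any pair of decisions: the number of nonzero singular values (the Schmidt rank) is a basis-independent diagnostic of entanglement.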
In this Hilbert space formulation, a given choice can be modelled by a real symmetric matrix whose dimension is the number of alternatives. Such a matrix corresponds to an observable in quantum mechanics. I will assume, for now, that all such 'observables' or 'choices' are compatible in the sense that their matrix representations can be diagonalised simultaneously. What this means is that at any given time, an arbitrary number of decisions can be made simultaneously. It is then evident that no state of mind, whether entangled or not, can violate the laws of classical probability, and hence no state can violate, in particular, Bell's inequalities. Later in the paper, however, I will consider the case where choice observables are incompatible.
The eigenvalues of the choice observables then label the different alternatives. This is analogous to quantum observables insofar as it concerns labelling the outcomes of a single measurement. However, observables in quantum theory have a second role apart from representing measurement outcomes: they generate dynamics. As a consequence, the differences of observable eigenvalues have direct physical consequences, and hence they cannot be chosen arbitrarily. It appears, in contrast, that the differences of eigenvalues of the choice observables have no significance: the results of a selection, such as choosing a hand in the game of rock, scissors and paper, can be labelled by means of any three distinct numerical values, merely as placeholders so that statistical analysis can be applied. It will be shown below, however, that when it concerns the dynamics of the state of mind, the eigenvalue differences do play an important role, and hence, just as in quantum theory, they cannot be chosen arbitrarily. It may be helpful to remark here on the use of real Hilbert spaces in the present consideration, as opposed to the complex Hilbert spaces required for the characterisation of quantum systems. In quantum theory, an imaginary number (or, more precisely, a complex structure) plays two roles: one concerns the rotation of a plane wave by a right angle, and the other concerns the identification of the orientation of the time axis, connected to the unitary time evolution. To my knowledge, no phenomenon has been identified in cognitive psychology that suggests the requirement of a complex number, perhaps because this is not a question that has emerged in the past. Hence, although it seems plausible that complex numbers may ultimately be required for a more adequate characterisation of cognitive behaviour, for now I shall confine my analysis to real Hilbert space.

Dynamics
Having established the framework for representing the cognitive state of mind, let us explore how the state changes in time. To this end I shall be working under the hypothesis that a given state |ψ⟩ of a person's mind changes only by transfer of information. It is, of course, possible that an isometric motion analogous to the unitary motion of quantum theory, one that does not exchange information, can change the state (for example, starting to wonder about the feasibility of a choice immediately after making the choice, without any external information), and if so this will be given by an orthogonal transformation. However, without any clear physical or psychological evidence indicating the existence of such a symmetry, I shall not consider this possibility, and focus instead on the universally acknowledged empirical fact that information acquisition (or information loss) will inevitably change the state of mind. The question is, in which way?
To understand the dynamics, I shall be borrowing some ideas from communication theory. Focusing on a single decision to start with, let X denote the decision or choice observable, with eigenvalues {x_k}, k = 1, ..., N. These eigenvalues for now merely label different alternatives. The eigenstate |x_j⟩ of X, satisfying X|x_j⟩ = x_j|x_j⟩, thus represents the state of mind in which the jth alternative has been chosen. Prior to an alternative being chosen, the state is in a superposition |ψ⟩ = Σ_k c_k|x_k⟩. The state will change when the person acquires information relevant to the decision making. This information is rarely perfect. In communication theory, anything that obscures finding the value of the quantity of interest is modelled in terms of noise. Let ε denote this noise. Here, ε can take discrete values or, more commonly, continuous values. I will consider the latter case, so that ε acts on an infinite-dimensional Hilbert space H_∞ distinct from the state space H_N. The noise arises from external environments E. For simplicity I shall assume that the state of the noise is pure, and is given by |η⟩ with amplitude function η(y) ∈ H_∞, although a mixed state can equally be treated. Then initially the state of mind of a person attempting to make a decision and the state of the noise-inducing environment as perceived by the decision maker are disentangled, and when taken together can be represented by the product state |ψ⟩|η⟩.
Acquisition of partial information relevant to decision making can be modelled by observing the value of ξ = X ⊗ 1 + 1 ⊗ ε. Here, the sum is taken in the tensor-product space H_N ⊗ H_∞. To understand this composite sum, consider the case in which ε is finite and can take three values ε_1, ε_2, ε_3, while the decision is binary, represented by the values x_0 and x_1. Then ξ is a 6 × 6 matrix with the eigenvalues x_0 + ε_1, x_0 + ε_2, x_0 + ε_3, x_1 + ε_1, x_1 + ε_2, x_1 + ε_3. In general, the eigenvalues of ξ are highly (typically N-fold) degenerate. The form that ξ takes is of course nothing more than a signal-plus-noise decomposition in classical communication theory 17. The 'signal' term, more generally, will be a function F(X) of X, but for simplicity I shall assume the function to be linear, because the choice of F(x) is context dependent.
Once the value of the information-providing observable ξ is measured, the initially-disentangled state becomes an entangled state. In quantum mechanics, the transformation of the state after measurement is given by the von Neumann-Lüders projection postulate. That is, writing Π_ξ for the projection operator onto the subspace of H_N ⊗ H_∞ spanned by the eigenstates of ξ with the eigenvalue ξ, the projection postulate asserts that the state of the system after information acquisition is ρ_ξ = Π_ξ (|ψ⟩⟨ψ| ⊗ |η⟩⟨η|) Π_ξ / tr[Π_ξ (|ψ⟩⟨ψ| ⊗ |η⟩⟨η|)]. A short calculation then shows that this is given more explicitly by the pure state on H_N ⊗ H_∞ whose nonzero components, in the basis |x_k⟩|y = ξ − x_k⟩, are proportional to c_k η(ξ − x_k).
Two interesting observations follow. First, the density matrix by construction is a projection operator onto a pure state |ψ(ξ)⟩ whose coefficients in the basis {|x_k⟩} are {√(π_k(ξ))}, where ξ is the value of the detected signal. Second, these coefficients agree with the conditional probability of the choice given by the Bayes formula, π_k(ξ) = p_k η²(ξ − x_k) / Σ_m p_m η²(ξ − x_m). That is, π_k(ξ) is the probability that the kth alternative is chosen, conditional on observing the value ξ of the information-providing observable. It follows that the von Neumann-Lüders projection postulate of quantum theory not only gives the correct classical result (as already observed in 7,16 with a different construction) but also provides a simple geometric interpretation of the Bayes formula. This follows because the Lüders state |ψ(ξ)⟩ associated with a degenerate measurement outcome ξ is given by the orthogonal projection of the initial state onto the Hilbert subspace associated with this outcome. Hence |ψ(ξ)⟩ is the closest state on the constrained subspace, in terms of the Bhattacharyya distance 23, to the initial state |ψ⟩|η⟩.
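A minimal discrete sketch of this equivalence, with hypothetical noise values and amplitudes, projects the joint state onto the subspace of an observed (degenerate) eigenvalue and compares the resulting squared amplitudes with the Bayes formula:

```python
import numpy as np

# Binary decision x in {-1, +1} with prior p; discrete noise (values hypothetical).
x = np.array([-1.0, 1.0])
p = np.array([0.3, 0.7])
eps = np.array([-1.0, 0.0, 1.0])          # possible noise values
eta2 = np.array([0.1, 0.6, 0.3])          # noise probabilities eta^2
eta = np.sqrt(eta2)

psi = np.sqrt(p)
joint = np.kron(psi, eta)                 # initial product state |psi>|eta>
xi_vals = np.add.outer(x, eps).ravel()    # eigenvalues of xi = X + eps

xi = 0.0                                  # observed value (doubly degenerate here)
mask = np.isclose(xi_vals, xi)
proj = np.where(mask, joint, 0.0)         # von Neumann-Lüders projection
proj = proj / np.linalg.norm(proj)
post_luders = (proj.reshape(2, 3) ** 2).sum(axis=1)   # marginal over the noise

# Bayes: pi_k = p_k * eta^2(xi - x_k) / sum_m p_m * eta^2(xi - x_m)
lik = np.array([eta2[np.isclose(eps, xi - xk)][0] for xk in x])
post_bayes = p * lik / np.sum(p * lik)
print(np.allclose(post_luders, post_bayes))   # True
```

Both routes give the posterior (0.5625, 0.4375) for these numbers: projecting and renormalising the square-root amplitudes is the same operation as conditioning the probabilities.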
It is worth remarking that an alternative interpretation of the von Neumann-Lüders projection postulate can be given in terms of the so-called free energy principle 24,25. Intuitively, this principle asserts that the change in the state of mind follows a path that on average minimises elements of surprise. In the present context, the degree of surprise can be measured in terms of the level of uncertainty. Suppose that the state of mind after information acquisition becomes μ_ξ, different from the Lüders state ρ_ξ. Then the level of uncertainty associated with the choice observable X resulting from μ_ξ, conditional on the observed value ξ, is given by tr(X²μ_ξ) − (tr(Xμ_ξ))² = [tr(X²ρ_ξ) − (tr(Xρ_ξ))²] + tr(X²δ_ξ) − 2 tr(Xρ_ξ) tr(Xδ_ξ) − (tr(Xδ_ξ))², where I have written δ_ξ = μ_ξ − ρ_ξ for the deviation. Since the first two terms on the right-hand side together constitute the conditional variance of X, which is positive and independent of μ_ξ, to minimise the expected uncertainty, and hence the surprise, for all X and ξ, it has to be that δ_ξ = 0. It follows that among all the states consistent with the observation, the Lüders state is unique in that it minimises the expected level of future surprise, as measured by the uncertainty.
I might add parenthetically that a psychologist wishing to predict the statistics of the behaviour of a person who has acquired information relevant to decision making will a priori not know the observed value of ξ. To a psychologist, ξ is thus a random variable with the density p(y) = Σ_m p_m η²(y − x_m). That is, given the state |ψ⟩|η⟩, the probability of the measurement outcome of the observation of ξ lying in the interval [y, y + dy] is given by p(y)dy. Hence, in this case the density matrix ρ_ξ has to be averaged over ξ, but the denominator of ρ_ξ is just the density p(y) for ξ, so the averaged density matrix has the matrix elements √(p_k p_l) Λ(ω_kl), where ω_kl = x_k − x_l and Λ(ω) = ∫ η(y) η(y − ω) dy. Evidently, 0 ≤ Λ(ω_kl) ≤ 1 and Λ(ω_kk) = 1 for all k, l, but because the initial state of mind |ψ⟩⟨ψ| in this basis has the matrix elements {√(p_k p_l)}, it follows that an external observer (e.g., a psychologist) will perceive a decoherence effect whereby the off-diagonal elements of the reduced density matrix are damped. It is at this point that I wish to comment on the numerical values of the differences {ω_kl}. While there is no reason why Λ(ω) should be monotonic in ω (unless η(y) is unimodal), it will certainly be the case that the decoherence effect is more pronounced for large values of ω. That is, Λ(ω) ≪ 1 for ω ≫ 1. By the same token, the values of ω_kl will directly affect the conditional probabilities {π_k(ξ)}. Therefore, while the values of ω_kl, and hence those of x_k, can be chosen arbitrarily to describe the statistics of the initial state of mind, once the dynamics are taken into account (what happens after information acquisition), it becomes evident that they cannot be chosen arbitrarily.
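Taking the damping factor to be the overlap Λ(ω) = ∫η(y)η(y − ω)dy of the noise amplitude with its shifted copy (my reading of the averaged density matrix), a Gaussian noise model with standard deviation σ gives the closed form Λ(ω) = exp(−ω²/8σ²), which can be checked numerically:

```python
import numpy as np

sigma = 1.0                          # noise standard deviation (assumed)
y = np.arange(-30.0, 30.0, 0.001)    # integration grid

def eta_f(u):
    # square-root of a centred Gaussian density with variance sigma^2
    return (2 * np.pi * sigma**2) ** (-0.25) * np.exp(-u**2 / (4 * sigma**2))

def damping(omega):
    # overlap of the noise amplitude with its shifted copy (Riemann sum)
    return float(np.sum(eta_f(y) * eta_f(y - omega)) * 0.001)

for w in (0.0, 2.0, 4.0):
    print(round(damping(w), 4), round(float(np.exp(-w**2 / (8 * sigma**2))), 4))
```

The numerical overlap matches the Gaussian closed form, and the decay with ω makes the claim Λ(ω) ≪ 1 for ω ≫ 1 explicit for this noise model.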
A better intuition behind this observation can be gained by reverting to ideas of signal detection in communication theory. For this purpose, consider a binary decision. Suppose that the eigenvalues of X labelling the two decisions are chosen to be, say, ±0.1, and suppose that the noise distribution η²(y) is normal, centred at zero, with a small standard deviation. In this case, the observed outcomes of ξ will most likely take values close to zero. As a consequence, a single observation of ξ will on average reduce the initial uncertainty only by a very small amount. In contrast, suppose that the two eigenvalues of X are chosen to be ±10, but the noise is the same as before. Then the observation will almost certainly yield an outcome close to +10 or −10. Hence the uncertainty in this case is reduced to virtually zero after a single observation. This extreme example shows that it is not possible to label the different choice alternatives by arbitrary numerical values while at the same time adequately modelling the dynamics of the state of mind.
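This example is easy to reproduce by Monte Carlo: the expected entropy (in bits) of the Bayes posterior after a single observation, for the two labelling choices, with the noise standard deviation assumed to be 1:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.5])             # uniform prior over the two alternatives
sigma = 1.0                          # noise standard deviation (assumed)

def expected_entropy(x, n=100000):
    # Draw the chosen alternative, observe xi = x_k + noise, and average the
    # entropy (in bits) of the Bayes posterior over many observations.
    k = rng.choice(2, size=n, p=p)
    xi = x[k] + rng.normal(0.0, sigma, n)
    lik = np.exp(-(xi[:, None] - x[None, :])**2 / (2 * sigma**2))
    post = p * lik
    post /= post.sum(axis=1, keepdims=True)
    post = np.clip(post, 1e-300, 1.0)      # guard the logarithm
    return float((-(post * np.log2(post)).sum(axis=1)).mean())

h_small = expected_entropy(np.array([-0.1, 0.1]))
h_large = expected_entropy(np.array([-10.0, 10.0]))
print(h_small)   # close to 1 bit: almost nothing is learned
print(h_large)   # close to 0 bits: the decision is effectively resolved
```

The residual expected entropy is the quantitative version of the claim that the labels ±0.1 leave the uncertainty nearly untouched while ±10 removes it in a single observation.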
In the event that a model η(y) for the state of the noisy environment exists, it is possible in principle to estimate the eigenvalue differences {ω_ij} by studying how much a person's views have shifted from the acquisition of the noisy information. This is because the average reduction of uncertainty, as measured by the entropy change or the decoherence rate, is determined by the eigenvalue differences {ω_ij}. The implication of the choices of the eigenvalue differences {ω_ij} for voter behaviour in an electoral competition has recently been examined 26.

Sequential updating
I have illustrated how the cognitive state of mind of a person in relation to a given choice changes after a single acquisition of information. A more interesting, as well as more realistic, situation concerns the sequential updating of the state of mind as more and more noisy information is revealed. In this case the information-providing observable ξ_t is a time series. As a simple example that extends the previous one very naturally, I consider the time series ξ_t = X t + ε_t, where the noise term ε_t is modelled by a standard Brownian motion {B_t} multiplied by the identity operator of the Hilbert space H_∞. The 'signal' component, more generally, can be given by ∫_0^t F_s(X) ds, but again for simplicity I assume that the function F_t(x) is linear for all t. In fact, even more generally, the range of alternatives X itself can be time dependent, but I do not consider that case here.
In this example, what happens to the state of mind can be worked out by discretising the time interval [0, t] and taking the limit. Starting from time zero, over a small time increment dt the initial state |ψ⟩|η⟩ is projected to the Lüders state Π_{ξ_dt}|ψ⟩|η⟩, suitably normalised, in accordance with the projection postulate. The noise is normally distributed with mean zero and variance dt, so that η(y) is the square-root of the corresponding Gaussian density function. Then after another time interval dt we apply the projection operator again, normalise, and repeat the procedure until time t. Finally, taking the limit, a calculation shows that the Lüders state, after monitoring the observable ξ_t up to time t, is given by |ψ(ξ_t)⟩ = Φ_t^{-1/2} Σ_k √p_k exp((1/2) x_k ξ_t − (1/4) x_k² t) |x_k⟩, where ξ_t = X t + B_t and X is the random variable represented on the Hilbert space H_N by the operator X along with the initial state |ψ⟩; that is, X takes the value x_k with probability p_k, and Φ_t = Σ_k p_k exp(x_k ξ_t − (1/2) x_k² t) gives the normalisation.
Since we have an explicit expression that monitors the change in the state of mind as information is revealed, there is a priori no reason to identify the differential equation to which |ψ(ξ_t)⟩ is the solution. Nevertheless, the exercise of working out the dynamical equation provides several new insights worth discussing. The detailed mathematical steps required to work out the dynamics have been outlined in 27, so I shall not repeat them here. It suffices to say that the Lüders state is a function of t and ξ_t, where the latter is a Brownian motion with a random drift. Hence the relevant calculus to apply is that of Ito: one Taylor expands |ψ(ξ_t)⟩ in t and ξ_t, and retains the leading-order terms, bearing in mind that (dξ_t)² = dt. Then it follows that d|ψ(ξ_t)⟩ = −(1/8)(X − ⟨X⟩_t)²|ψ(ξ_t)⟩ dt + (1/2)(X − ⟨X⟩_t)|ψ(ξ_t)⟩ dW_t, where ⟨X⟩_t = ⟨ψ(ξ_t)|X|ψ(ξ_t)⟩/⟨ψ(ξ_t)|ψ(ξ_t)⟩ and where W_t = ξ_t − ∫_0^t ⟨X⟩_s ds. The process {W_t} defined in this way is in fact a standard Brownian motion, known as the innovations process 28. This process has the interpretation of revealing new information. That is, while the time series {ξ_t} contains new as well as previously known information about the impending choice to be made, the process {W_t} contains only information that was not known previously.
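A sketch of this filtering scheme, assuming the exponential form of the posterior weights p_k exp(x_k ξ_t − x_k² t/2) consistent with the normalisation Φ_t (all numerical values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([-1.0, 0.0, 2.0])   # hypothetical eigenvalues labelling alternatives
p = np.array([0.2, 0.5, 0.3])    # prior probabilities, psi_k = sqrt(p_k)
true_k = 2                       # the alternative that is in fact realised

T, n = 10.0, 10000
dt = T / n
B = np.cumsum(np.sqrt(dt) * rng.normal(size=n))   # Brownian noise path
t_grid = dt * np.arange(1, n + 1)
xi = x[true_k] * t_grid + B                       # observed series xi_t = Xt + B_t

for t_idx in (99, 999, n - 1):                    # t = 0.1, 1, 10
    t = t_grid[t_idx]
    log_w = np.log(p) + x * xi[t_idx] - 0.5 * x**2 * t
    post = np.exp(log_w - log_w.max())            # stable exponentiation
    post /= post.sum()
    print(round(t, 1), post.round(3))
```

As t grows the squared Lüders amplitudes concentrate on the realised alternative, illustrating the uncertainty-reducing character of the monitoring process.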
There are two important observations that follow. First, the evolution of the state of mind is not directly generated by the noise {B_t}, nor by the observation {ξ_t}. Rather, it is the innovations process that drives the dynamics. But this is the case only if the state of mind changes in such a way as to continuously minimise uncertainties. Since the expectation of the cumulative uncertainty is the entropy 27, it follows, according to the present framework, that the tendency towards low entropy states required in biology 24, which forms the basis of the free energy principle, emerges naturally. In particular, the implication here, based on the projection postulate, is that the state of mind changes only in accordance with the arrival of new information; it will not change spontaneously on its own. Second, while the analysis presented here can be deduced as a result of standard least-squares estimation theory 28,29, I have derived these results using the von Neumann-Lüders projection postulate of quantum theory. It follows that the informationally efficient dynamical behaviour of a system, efficient in the sense of minimising surprises, is applicable not only to states of mind but also to quantum systems. An analogous point of view, based on the free energy principle, has recently been proposed elsewhere 30.
It might be added, for the purpose of psychological modelling, that the averaged reduced density matrix ρ̄_t = E[ρ_{ξ_t}] can be seen to obey the dynamical equation dρ̄_t/dt = −(1/8)[X, [X, ρ̄_t]]. This, of course, is the Lindblad equation generated by the decision observable X.

Projecting down the dynamics
One advantage of working with the mathematical formalism of quantum theory in modelling psychological states of mind is the deeper insight that it can uncover (cf. 10). To this end, I note that although I have defined the state of mind as a vector in Hilbert space, what I really have in mind is a projective Hilbert space, consisting of rays through the origin of Hilbert space. The idea is as follows. In probability theory, one can say, for instance, that the likelihood of an event happening is 0.3, or three out of ten, or 30%; all of these statements convey the same idea. The total probability being equal to one is merely a convenient convention that does not carry any significance. Putting it differently, working with the convention that the expectation of any decision X in a state |ψ⟩ is given by the ratio ⟨X⟩ = ⟨ψ|X|ψ⟩/⟨ψ|ψ⟩, it is evident that the expectation values are independent of an overall scaling of the state |ψ⟩ by a nonzero constant. Hence the Hilbert space vector |ψ⟩ carries one psychologically irrelevant degree of freedom. When this degree of freedom is eliminated by the identification |ψ⟩ ∼ λ|ψ⟩ for any λ ≠ 0, one arrives at a projective Hilbert space, otherwise known as real projective space. This is a real manifold M of dimension N − 1, endowed with a Riemannian metric induced by the underlying probabilistic rules given by the von Neumann-Lüders projection postulate 31.
Let {ψ^a} denote local coordinates for points of M. A point of M thus represents a state of mind corresponding to a family of vectors λ|ψ⟩, λ ≠ 0, on Hilbert space. For any representative |ψ⟩ of the family corresponding to the point ψ ∈ M, consider the function on M defined through the expectation X(ψ) = ⟨ψ|X|ψ⟩/⟨ψ|ψ⟩. With this convention, and writing ∇^a for the gradient vector, the dynamical equation for the state of mind |ψ(ξ_t)⟩, when projected down to M, is given by dψ_t^a = −(1/8) ∇^a V_X(ψ_t) dt + (1/2) ∇^a X(ψ_t) dW_t, where V_X(ψ) = ⟨ψ|(X − X(ψ))²|ψ⟩/⟨ψ|ψ⟩ is the function on M that corresponds to the variance of X in the state |ψ⟩. The mathematical analysis leading to this result has essentially been provided in 32,33, and will not be repeated here. The important observation is that the drift term (the coefficient of dt) that generates a tendency flow on the state space M is given by the negative gradient of the variance (uncertainty). Therefore, on the state space there is a tendency to drive the state of mind into one of the states with no uncertainty; this is the flow that attempts to reduce surprises. It also follows that if the state of mind is in the vicinity of one of the definite states of no uncertainty, then it will be difficult to escape from that neighbourhood, a phenomenon that I have referred to as tenacious Bayesian behaviour 19. I shall have more to comment on this below. In Fig. 1 an example of the negative gradient flow of the uncertainty on the state space is sketched, along with the gradient flow ∇^a X of the mean.
It is worth remarking that in principle the properties of the dynamics just outlined can be inferred from the squared amplitudes of the coefficients of |ψ(ξ_t)⟩, which evidently contain as much information as the state |ψ(ξ_t)⟩ itself. Indeed, starting from the expression for π_kt, which can be deduced by use of the Bayes formula, one can apply the Ito calculus to deduce the dynamical equation satisfied by π_kt, known in communication theory as the Kushner equation 34. However, π_kt is a martingale, that is, on average a conserved process. In particular, the process has no drift (the coefficient of dt in the Kushner equation for π_kt is zero), so by simply examining the Kushner equation for π_kt it is difficult to infer key properties of the dynamics. In contrast, the surprise-minimising feature of the dynamics becomes immediately apparent once the process is projected to the state space M via Hilbert space.
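The martingale property E[π_kt] = p_k can be checked by simulation: averaging the posterior over many realisations of the observation recovers the prior, confirming the absence of drift (numerical values hypothetical, and the exponential posterior weights are assumed as in the single-path example above):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([-1.0, 1.0])        # hypothetical eigenvalues
p = np.array([0.3, 0.7])         # prior probabilities
t, n_paths = 1.0, 200000

# Sample the realised alternative from the prior, then observe xi_t = X t + B_t.
k = rng.choice(2, size=n_paths, p=p)
xi = x[k] * t + rng.normal(0.0, np.sqrt(t), n_paths)

# Posterior weights p_k exp(x_k xi_t - x_k^2 t / 2) for every path.
log_w = np.log(p)[None, :] + x[None, :] * xi[:, None] - 0.5 * x[None, :]**2 * t
w = np.exp(log_w)
pi = w / w.sum(axis=1, keepdims=True)
avg = pi.mean(axis=0)
print(avg.round(3))   # approximately the prior (0.3, 0.7): pi_kt has no drift
```

Individual paths fluctuate strongly, yet the ensemble average is pinned to the prior, which is exactly why the Kushner equation for π_kt reveals so little about the surprise-minimising drift visible on M.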

Difference between psychological and quantum states
I have thus far emphasised the similarities between the state of mind as represented by a Hilbert space vector (or a density matrix) and the physical state of a quantum system as represented by the same scheme. There are, however, some important differences. The most important one, in my view, can be described through the following example. Suppose that the state of a quantum system is very close to one of the eigenstates |x_k⟩ of an observable X, and that the measurement outcome yields the value x_l, l ≠ k, for which the probability would have been very small. In this case, the interpretation of the event is the obvious one: a rare event has occurred. Now suppose instead that the state of mind is very close to one of the 'certain' states |x_k⟩, in a situation where there is a correct choice to be made (for example, in deciding what had actually happened at an event in the past, as opposed to choices for which there need not be a 'correct' output). In this case, if the correct choice happens to be |x_l⟩, l ≠ k, and if the outcome x_k is nonetheless chosen (even though objectively the correct choice should have been x_l), this evidently does not mean that an unlikely event has occurred. Rather, it means that the initial state of mind was a misguided one. Putting the matter differently, while a state of a quantum system represents the physical reality of the system, a state of mind merely represents the person's perception of the state of the world. This difference between objective and subjective probabilities has important implications, discussed below. There is also the suggestion that the state of a quantum system itself is entirely subjective 35, but this idea will not be explored here.
The subjective nature of psychological states gives rise to the following challenge. In psychology, it is not uncommon for a group of people with varying dispositions to be given some information (e.g., an article to read or a video clip to watch) and for their responses to be examined. An example arises in the study of confirmation bias, a bias towards information that confirms one's views 36,37. The idea is to investigate how people holding diverging opinions respond differently to the 'same' information. The issue here is that the information content of the given message, such as an article or a video clip, differs for people with different opinions, even though it is an identical information source. To explain this more concretely, consider the simple example discussed above. Suppose that the preference, or opinion, of people on a topic is represented by the choice observable X. Then the information-providing observable representing an article discussing this topic is given by ξ = X + ε. However, if the first person has the state of mind |ψ⟩ and the second person |ϕ⟩, then the Lüders states resulting from observing ξ are different. Hence, just because two people are given, say, the same article to read, to assert that they are given the same information is factually false. That is, the mutual information contained in the article about the impending decision is different for different readers. Indeed, if the observed value of ξ is ξ, then a calculation shows that the mutual information difference can be expressed in terms of p_i = ⟨x_i|ψ⟩², q_i = ⟨x_i|ϕ⟩², and the distribution f(y) of the noise term. One important consequence in psychology is that the various conclusions drawn from such experiments, on how people's behaviour might deviate from rational Bayesian updating, require fundamental reexamination, because the participants are not in fact given the same information.
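One way to make this quantitative is to take the information gained by a reader to be the Kullback-Leibler divergence from prior to posterior (an assumed measure, not the paper's exact expression; the priors, noise model, and observed value below are all hypothetical):

```python
import numpy as np

x = np.array([-1.0, 1.0])        # two opinions on the topic (labels assumed)
p = np.array([0.9, 0.1])         # reader 1: nearly settled views
q = np.array([0.5, 0.5])         # reader 2: entirely undecided
sigma = 1.0                      # Gaussian noise standard deviation (assumed)
xi_obs = 1.0                     # the 'same' observed message value

def info_gain(prior, xi):
    # Kullback-Leibler divergence (in bits) from the prior to the Bayes posterior.
    lik = np.exp(-(xi - x)**2 / (2 * sigma**2))
    post = prior * lik
    post /= post.sum()
    return float(np.sum(post * np.log2(post / prior)))

g1, g2 = info_gain(p, xi_obs), info_gain(q, xi_obs)
print(round(g1, 3), round(g2, 3))   # the identical message yields different gains
```

The same observed value ξ moves the two readers by different amounts, making concrete the claim that an identical information source does not deliver identical information.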
The subjective nature of psychological states also gives rise to a mathematical challenge. In communication theory, one is typically concerned with well-established communication channels, where the signal transmitted is assumed to represent an objective reality. Therefore, there is no ambiguity in interpreting the information-carrying time series {ξ_t}. However, if different receivers were to interpret the 'same' message differently, and if there is a need to apply statistical analysis to the behaviour of different people, then the question arises as to which information process (called the 'filtration' in probability theory) one should be using for statistical analysis. To my knowledge, this situation has hardly ever been examined in the vast literature of probability and stochastic analysis.

Limitation of classical reasoning
Thus far I have assumed, for definiteness, that all decisions are compatible. What this means is that the quantum formalism advocated here, while effective, can be reduced, if necessary, to a purely classical probabilistic formulation. It seems to me that this assumption does not fully reflect reality, and that it is plausible that not all decisions can be made simultaneously by human brains, even if there are only a small number of decisions to be made. Indeed, there are empirical examples in behavioural psychology that strongly indicate that not all decisions or opinions are compatible 10,38,39. If so, the observables representing these choices will not commute.
The issue with the classical updating of likelihoods based on the Bayes formula is that it is not well suited to characterising changes of context, that is, changes in the sample space, represented, for example, by the arrival of information that reveals a previously unknown alternative (see 40-42 for discussions on contextuality in human decision making). In such a scenario, the prior probability of the new alternative is zero (because it was not even known), whereas the posterior can be nonzero. Hence, in the language of probability theory, the prior and the posterior are not absolutely continuous with respect to each other, prohibiting the direct use of the Bayes formula. In contrast, such a change of context can be modelled using incompatible observables, along with the von Neumann-Lüders projection postulate.
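To make the absolute-continuity obstruction concrete, here is a minimal numerical sketch (the prior and likelihood values are illustrative, not taken from the text): under the Bayes formula, a zero prior can never be revived, however strongly the evidence favours the previously unknown alternative.

```python
import numpy as np

# Illustrative three-alternative decision; the third alternative is
# initially unknown to the agent, so its prior weight is exactly zero.
prior = np.array([0.6, 0.4, 0.0])

# Evidence that in fact strongly favours the third alternative.
likelihood = np.array([0.2, 0.3, 0.9])

# Bayes formula: posterior proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()

# The zero prior remains zero: prior and posterior are not mutually
# absolutely continuous, so classical updating cannot reveal the new option.
assert posterior[2] == 0.0
```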
To see this, suppose that the prior state of mind is given by |ψ⟩ = Σ_k c_k |x_k⟩ when expanded in the eigenstates of X̂, where c_m = 0 for some m, and suppose that acquisition of information takes the form η = Ŷ + ε, where Ŷ cannot be diagonalised using the basis states {|x_k⟩}. Then it is possible that the Lüders state Π̂_η|ψ⟩/√(⟨ψ|Π̂_η|ψ⟩) resulting from information acquisition, when expanded in {|x_k⟩}, is such that c_m ≠ 0, thus circumventing the constraint of the classical Bayes formula. Therefore, in a situation whereby choice observables are not compatible, the quantum-mechanical formalism proposed here and elsewhere 11 becomes a necessity, because the modelling of the dynamical behaviour of a person cannot be achieved using the techniques of purely classical probability. As a simple example, consider two binary (yes/no) decisions that are represented by the choice observables

X̂ = (1, 0; 0, −1)  and  Ŷ = (cos φ, sin φ; sin φ, −cos φ)

(written row by row). Evidently, X̂ and Ŷ cannot be diagonalised simultaneously, unless φ = 0 (mod 2π). Suppose further that the initial state of mind of a person is represented by a Hilbert space vector

|ψ⟩ = cos(θ/2) |x_1⟩ + sin(θ/2) |x_2⟩

for some θ, where |x_1⟩ and |x_2⟩ denote the eigenstates of X̂ with eigenvalues +1 and −1. Then the probability that the person gives a 'yes' answer to question X̂ is cos²(θ/2); whereas if question Ŷ were asked instead, then the likelihood of giving an affirmative answer is cos²((θ − φ)/2). Note that, strictly speaking, according to the scheme introduced here, the state space for a pair of binary decisions is four-dimensional, if the two decisions (questions) are considered simultaneously. Here, I am interested in the effect of questions being asked sequentially, and for this purpose a two-dimensional representation suffices. Thus |ψ⟩ represents an abstract state of mind about which a range of binary questions may be asked. Now suppose that question Ŷ is asked first, and subsequently question X̂ is asked. Then from the projection postulate, the probability of giving a 'yes' answer to question X̂, irrespective of which answer was given to the first question, is ¼(2 + cos θ + cos(θ − 2φ)). For φ ≠ 0 this is different from the a priori probability cos²(θ/2) of answering 'yes' to question X̂. In other words, the so-called law of total probability in classical probability theory, that the unconditional expectation of a conditional expectation equals the unconditional expectation, is not applicable when one is dealing with incompatible propositions. Similarly, if question X̂ is asked before question Ŷ, then the probability of giving a 'yes' answer to question Ŷ is cos²(θ/2) cos²(φ/2) + sin²(θ/2) sin²(φ/2), which is different from cos²((θ − φ)/2) when φ ≠ 0. This example is perhaps the simplest one demonstrating that answers to questions can depend on the order in which the questions are asked, if the questions are not compatible (see 43, Appendix 2, for a discussion of the order dependence). A more elaborate construction of this kind in higher dimension is found in 39; similarly, the use of positive operator-valued measures to analyse such order dependence is explored in 44. In any case, the violation of the law of total probability shows that this empirical phenomenon of order dependence cannot be explained using compatible observables.
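The order dependence above is easy to verify numerically. The following sketch applies the projection postulate directly, using the eigenbasis conventions implied by the stated probabilities (the 'yes' eigenvector of X̂ taken as (1, 0) and that of Ŷ as (cos(φ/2), sin(φ/2)); the angle values are arbitrary illustrations):

```python
import numpy as np

theta, phi = 0.9, 0.4  # arbitrary illustrative angles, with phi != 0

# State of mind and 'yes'/'no' eigenvectors (a real Hilbert space suffices)
psi   = np.array([np.cos(theta / 2), np.sin(theta / 2)])
x_yes = np.array([1.0, 0.0])
y_yes = np.array([np.cos(phi / 2), np.sin(phi / 2)])
y_no  = np.array([-np.sin(phi / 2), np.cos(phi / 2)])

def prob(state, eigvec):
    """Born probability of the outcome labelled by eigvec."""
    return (state @ eigvec) ** 2

# A priori probability of 'yes' to question X
p_x = prob(psi, x_yes)

# Ask Y first: the Lüders projection leaves the state in y_yes or y_no;
# then ask X, summing over the two possible first answers.
p_x_after_y = sum(prob(psi, y) * prob(y, x_yes) for y in (y_yes, y_no))

# Agreement with the closed-form expressions quoted in the text
assert np.isclose(p_x, np.cos(theta / 2) ** 2)
assert np.isclose(p_x_after_y,
                  0.25 * (2 + np.cos(theta) + np.cos(theta - 2 * phi)))

# Violation of the law of total probability for incompatible questions
assert not np.isclose(p_x, p_x_after_y)
```

The same two-line computation, with the roles of the observables exchanged, reproduces the reversed-order expression cos²(θ/2)cos²(φ/2) + sin²(θ/2)sin²(φ/2) quoted in the text.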
For a pair of binary choices, an attempt is made in 10 to explain the experiment discussed in 45. The data presented in 45 show that when people are asked if Clinton is honest, about 50% answered 'yes', and if they are then asked if Gore is honest, some 60% answered 'yes'; whereas if the order of the questions is reversed, the figures become 68% 'yes' for Gore, followed by 60% 'yes' for Clinton. Note, however, that the explanation of this effect in 10 is incomplete, because conditional probabilities are considered therein, whereas the data in 45 concern total probabilities. The analysis of total probabilities considered here, on the other hand, shows that by setting θ ≈ 9π/20 and φ ≈ π/15, the phenomenon reported in 45 can be explained within a ±10% error margin.
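As a check, the four total probabilities implied by θ = 9π/20 and φ = π/15 can be computed from the formulas above. The identification of X̂ with the Clinton question and Ŷ with the Gore question is my assumption, not fixed by the text; with it, each figure lands within ten percentage points of the reported data:

```python
import numpy as np

theta, phi = 9 * np.pi / 20, np.pi / 15

# Assumed mapping: X = "Is Clinton honest?", Y = "Is Gore honest?"
p_clinton_first  = np.cos(theta / 2) ** 2
p_gore_second    = (np.cos(theta / 2) ** 2) * (np.cos(phi / 2) ** 2) \
                 + (np.sin(theta / 2) ** 2) * (np.sin(phi / 2) ** 2)
p_gore_first     = np.cos((theta - phi) / 2) ** 2
p_clinton_second = 0.25 * (2 + np.cos(theta) + np.cos(theta - 2 * phi))

# Reported survey figures: 50%, 60%, 68%, 60%
data  = [0.50, 0.60, 0.68, 0.60]
model = [p_clinton_first, p_gore_second, p_gore_first, p_clinton_second]

# Each prediction falls within the stated ±10% margin
assert all(abs(m - d) < 0.10 for m, d in zip(model, data))
```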

Discussion
I have illustrated how the Hilbert-space formalism used in quantum theory is highly effective in modelling cognitive psychology, in particular its dynamical aspects. I have shown how an important feature of the dynamics associated with Bayesian updating, or equivalently with the von Neumann-Lüders projection, namely the uncertainty-reducing trend, is made transparent in this formalism. This, in turn, provides an alternative information-theoretic perspective on the free energy principle, because of the close relation between entropy and variance in communication theory. That is, the Lüders state is the one that minimises the Kullback-Leibler divergence measure 46 from the initial state. One important consequence of the foregoing analysis is that states of low uncertainty are always the preferred ones, irrespective of whether they represent the correct choices. Therefore, if the state of mind happens to be close to one of the false choices, then under a rational updating it is difficult to escape from this neighbourhood, since to achieve this, entropy has to increase before it can be decreased again, and this runs counter to biological trends 47. In such a situation, it appears that only the accidental effect of noise, which otherwise tends to be seen as a nuisance, can rescue the person from the false choice within a reasonable timescale, at least when all the choices are compatible with each other. Hence noise can assist biological systems in conducting, in effect, a kind of simulated annealing; a closely related point of view has been put forward in the context of active inference 48.
The situation changes once we accept the thesis of 10 that real-world decisions are never compatible, thus making it a necessity to model cognitive behaviour within the quantum formalism. To see this, consider a pair of maximally incompatible (in the sense that their eigenvectors are maximally separated) binary decisions modelled by the pair

X̂ = (1, 0; 0, −1)  and  Ŷ = (0, 1; 1, 0)

(written row by row), and suppose that the state of mind is given by

|ψ⟩ = (|x_1⟩ + |x_2⟩)/√2,

or else a state very close to |ψ⟩. Since Ŷ|ψ⟩ = |ψ⟩, this means that the state of mind in relation to decision Ŷ is already fixed to the alternative labelled by the eigenvalue +1, and that the likelihood of choosing the other alternative, labelled by the eigenvalue −1, is zero, or else very close to zero. Suppose further that the 'correct' choice is the one labelled by the eigenvalue −1 (in a situation where a correct alternative exists). The tenacious classical Bayesian behaviour 19 then implies that partial information η = Ŷ + ε about the truth will have little impact. Instead, if the person is given information, not about the choice Ŷ, but about X̂ in the form ξ = X̂ + ε, where the magnitude of the noise ε is small, then after acquisition of this information the state will change into one of the two possible Lüders states that can result from the observations of ξ. These two states will be close to one of the two eigenstates of X̂. If partial information η = Ŷ + ε is subsequently provided, then irrespective of which Lüders state is chosen, the state of mind will now transform into one that is close to the truth with a high probability. Thus the quantum formalism opens up a new possibility that was unavailable with classical reasoning.
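The escape mechanism can be sketched numerically with the maximally incompatible pair above (eigenbasis conventions as in the earlier two-question example; the labelling of the 'correct' alternative is illustrative): before the X̂ measurement the truth has zero likelihood, while after either Lüders collapse it becomes reachable.

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, -1.0]])  # maximally incompatible pair
Y = np.array([[0.0, 1.0], [1.0, 0.0]])

# The mind is fixed on the Y = +1 alternative ...
psi = np.array([1.0, 1.0]) / np.sqrt(2)
assert np.allclose(Y @ psi, psi)

# ... but the 'correct' choice is the Y = -1 eigenstate.
y_minus = np.array([1.0, -1.0]) / np.sqrt(2)
assert np.isclose((psi @ y_minus) ** 2, 0.0)  # truth unreachable by Bayes

# Sharp information about X collapses the state, via the projection
# postulate, onto an eigenstate of X (either outcome will do):
x1 = np.array([1.0, 0.0])
assert np.isclose((x1 @ y_minus) ** 2, 0.5)   # truth now has probability 1/2
```

Subsequent partial information about Ŷ then acts on a state with substantial overlap with the truth, which is why it can now drive the mind towards the correct alternative.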
Let me conclude by speculating on possible implications for artificial intelligence. If one accepts the arguments presented, for example, in references 11,12, for the premise that the probability assignment rules of quantum theory can describe human thinking more adequately than their classical counterparts, as supported by many empirical examples, then it follows that machine learning tools based on classical probability will ultimately fail to replicate human behaviour. In principle, artificial quantum intelligence (here I use the phrase to mean an artificial intelligence tool based on the quantum probability rules of von Neumann and Lüders, as opposed to the more commonly adopted notion of 'quantum artificial intelligence', namely the use of a quantum computer to enhance classical machine learning) can be implemented on classical computers, that is, without the need to build a quantum computer. However, for such an architecture to be useful, more research is needed to uncover the meaning and the implications of incompatible decisions in cognitive psychology.

Figure 1. Flow of the negative gradient of the variance. In the case of a decision involving three alternatives, the corresponding state space M is a real projective plane. This two-dimensional manifold is not orientable and cannot be embedded straightforwardly in three dimensions. However, it can be interpreted as a sphere with antipodal points identified. Thus the flow on M can be captured by examining the flow on the positive octant of the sphere. Here, a threefold decision is modelled by a choice observable X̂ with eigenvalues x_1 = 1, x_2 = 2, and x_3 = 3. The three axes, corresponding to the three corners of the octant, represent the three eigenstates of X̂ with no uncertainty. The flow −∇_a V generated by the negative gradient of the uncertainty (shown in the left panel) takes an initial state into one of the states with no uncertainty: this is the tendency towards least surprise. The gradient ∇_a⟨X̂⟩ of the mean is shown in the right panel for comparison. This term is multiplied by the Brownian increment dW_t, which is normally distributed with mean zero and variance dt, so at any moment in time the direction of the flow generated by the Brownian fluctuation can go either way along the flow.