Does Nonlinear Neural Network Dynamics Explain Human Confidence in a Sequence of Perceptual Decisions?

Recently, single-neuron measurements during perceptual decision tasks in monkeys have linked the neural mechanisms of decision making to the establishment of a degree of confidence. These neural mechanisms have been investigated in the context of a spiking attractor network model. It has been shown that confidence about a decision under uncertainty can be computed using a simple neural signal in individual trials. However, it remains unclear whether a neural attractor network can reproduce the behavioral effects of confidence in humans. To answer this question, we designed an experiment in which participants were asked to perform an orientation discrimination task, followed by a confidence judgment. Here we show for the first time that an attractor neural network model, calibrated separately on each participant, accounts for full sequences of decision-making. Remarkably, the model is able to reproduce quantitatively the relations between accuracy, response times and confidence, as well as various sequential effects such as the influence of confidence on the subsequent trial. Our results suggest that a metacognitive process such as confidence in perceptual decisions can be based on the intrinsic dynamics of a nonlinear attractor neural network.


INTRODUCTION
A general understanding of the notion of confidence is that it quantifies the degree of belief we have in a decision Meyniel et al. [2015b], Mamassian [2015]. Many cognitive and psychology studies have tackled the problem of confidence estimation, either by directly requiring participants to provide an estimate of their confidence Peirce and Jastrow [1884], Zylberberg et al. [2012], Adler and Ma [2018], or by using postdecision wagering (subjects can choose a safe option, with low reward regardless of the correct choice) Vickers [1979 (reedited in 2014)], Kepecs and Mainen [2012], Fleming et al. [2010], Seth [2008], Massoni [2014]. Postdecision wagering has been used in behaving animals in order to study the neural basis of confidence estimation Smith et al. [2003], Kepecs et al. [2008], Kiani and Shadlen [2009], Komura et al. [2013], Lak et al. [2014]. In the context of two-alternative forced choice (2AFC) tasks, the process of confidence readout has been studied with various models: using Bayesian decision theory and signal detection theory Clarke et al. [1959], Galvin et al. [2003], Fleming et al. [2010], Kepecs and Mainen [2012], as a measure of precision Meyniel et al. [2015a], Yeung and Summerfield [2012], or using integration of evidence over time Kepecs et al. [2008], Drugowitsch et al. [2012], Pleskac and Busemeyer [2010], Smith and Vickers [1988]. It has been proposed that, in decision-making tasks, choice and confidence can be read out from the same neural representation Kepecs and Mainen [2012], Meyniel et al. [2015b]. In order to account for the uncertain option task in monkey experiments Kiani and Shadlen [2009], biophysically inspired models based on attractor neural networks have been proposed Wei and Wang [2015], Insabato et al. [2017].
In the uncertain option task, monkeys perform a two-alternative forced choice task, but after a certain delay, in some trials, a sure target is presented for a certain [...]

For the modelling of the neural correlates, we consider a decision-making recurrent neural network governed by local excitation and feedback inhibition, based on the biophysical models of spiking neurons introduced and studied in Compte et al. [2000] and Wang [2002]. We work with the reduced version derived in Wong and Wang [2006], which allows for large scale numerical simulations and is more amenable to analytical study. More precisely, we consider the model variant introduced in Berlemont and Nadal [2019], which takes into account a corollary discharge (see Fig. 1 E). The model consists of two competing units, each one representing an excitatory neuronal pool, selective to one of the two categories, here C (clockwise) or AC (anti-clockwise). The two units inhibit one another, while they are subject to self-excitation. On presentation of a stimulus, each pool receives an excitatory current whose strength depends on the stimulus orientation. The stimulus orientation is characterized by the parameter $c$, which represents the strength of the stimulus (the larger $c$, the less ambiguous the stimulus and the easier the task). The excitatory current is given by $I_{\mathrm{stim}} = J_{\mathrm{ext}} (1 \pm c)$. The decision, 'C' or 'AC', is made when one of the two units reaches a threshold ($z$). Once a decision is made (threshold is reached), an inhibitory current (the corollary discharge) is injected into the two neural pools, causing a relaxation of the neural activities towards a low activity, neutral state, therefore allowing the network to deal with consecutive sequences of trials, as illustrated in Fig. 1 F. For a biologically relevant range of parameters, relaxation is not complete at the beginning of a new trial, hence the decision made at this trial will depend on the one made at the previous trial.
Previously, we showed Berlemont and Nadal [2019] that the model accounts for the main sequential and post-error effects observed in perceptual decision making experiments in humans and monkeys.
Full details about the experiment and the model can be found in the Material and Methods Section.
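As an illustration of the stimulus current and of the threshold rule described above, the following minimal Python sketch computes the two input currents from $I_{\mathrm{stim}} = J_{\mathrm{ext}} (1 \pm c)$ and reads out a decision from the two unit activities. It is not the reduced Wong and Wang model itself: the function names, the numerical values, and the convention that the pool selective to the presented category receives the $(1 + c)$ current are assumptions made for the example.

```python
def stimulus_currents(c, j_ext, stimulus="C"):
    """Return the excitatory currents (I_C, I_AC) for a stimulus of strength c.

    Follows I_stim = J_ext * (1 +/- c). Assumption: the pool selective to the
    presented category receives the (1 + c) current, the other pool (1 - c).
    """
    favored, other = j_ext * (1.0 + c), j_ext * (1.0 - c)
    return (favored, other) if stimulus == "C" else (other, favored)


def read_decision(r_c, r_ac, z):
    """Return 'C' or 'AC' once a unit's activity reaches the threshold z, else None."""
    if max(r_c, r_ac) < z:
        return None
    return "C" if r_c >= r_ac else "AC"


# Hypothetical values, chosen only to make the arithmetic easy to follow.
i_c, i_ac = stimulus_currents(c=0.25, j_ext=2.0, stimulus="C")
print(i_c, i_ac)                       # 2.5 1.5
print(read_decision(12.0, 3.0, 10.0))  # C
```

With $c = 0$ both pools receive the same current, which corresponds to a fully ambiguous stimulus.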

FITTING THE BEHAVIORAL RESULTS
We first fit the model to the behavioral data. As detailed in the Material and Methods Section, we perform model calibration in order to reproduce both the mean response times and the accuracy (success rates). For each participant, this is done separately for the three types of blocks.
Pure block. We consider the response time and accuracy of each participant as a function of the absolute value of the orientation angle. All subjects exhibit improved accuracy and shorter response times for less difficult (larger orientation) stimuli, as classically reported in the literature Baranski and Petrusic [1994]. Response times and accuracies with respect to stimulus orientation are shown in Fig. 2. Error bars are bootstrapped 95% confidence intervals. As can be seen in Fig. 2, the model accurately reproduces the variety of observed behaviors.

Figure caption (partial): Structure of a trial: Following a fixation period, the circular grating appeared and participants performed the decision (grating oriented clockwise or counterclockwise). In confidence blocks, after a delay participants reported their confidence with respect to the choice, on a discrete scale with 10 levels. Participants were asked for the following kind of confidence judgment: one extreme of the scale is "pure guess", the other is "absolutely certain".

Confidence block. Accuracy is higher but response times are slower in confidence blocks than in pure blocks, an effect already reported in previous studies Martin et al. [0]. To test this effect on accuracy we ran a binomial regression of responses with fixed factors of orientation and type of block (pure or confidence), the interaction between these factors and a random participant intercept. The orientation coefficient was 2.15 (SD = 0.17, z = 12.44, $p < 10^{-16}$); there was no main effect of block type (p = 0.385). However, we found a significant orientation by block type interaction (value of 0.55, SD = 0.08, z = 6.97, $p = 3 \cdot 10^{-12}$), indicating that participants were more accurate in confidence blocks than in no-confidence blocks. In a similar way, we tested the effect on response times by using a mixed effect regression with the same factors and intercept as for the accuracy (only on the absolute value of the orientation). We found that the orientation coefficient (value of −0.08, SD = 0.013, p = 0.0006) and the block type coefficient (value of 0.095, SD = 0.028, p = 0.011) were significant, meaning that participants are slower in the confidence block. Moreover, the orientation by block type interaction was also significant (value of −0.028, SD = 0.010, p = 0.031), meaning that the difference between the two types of blocks is larger at low orientation.
In Figure 2 we present the results of the fitting procedure applied to confidence blocks. First we note that in this case too the model is able to correctly reproduce the behavioral results of the different participants. Second we compare the parameter values obtained for the pure and confidence blocks. We find that participants have a higher decision threshold (Signed Rank test Wilcoxon [1945], p = 0.03), a higher stimulus strength level by angle (Signed Rank test, p = 0.031) and higher mean nondecision times (Signed Rank test, p = 0.03). These results are analogous to those obtained with Drift Diffusion Models (DDM). Indeed, Martin et al. [0] find that, when fitting behavioral data with a DDM, nondecision time, drift rate and decision threshold are modified by the confidence context. Our results therefore reinforce the final conclusion of these authors Martin et al. [0], which is that the confidence evaluation context impacts first order perceptual decisions in at least three ways.

Figure 3 (caption, partial): The black curve is the density of response times of the simulated model, and the red one corresponds to the associated nondecision response times. (B) In blue we plot the histogram of the response times of each subject (for each panel) in the confidence block. The black curve is the density of response times of the simulated model, and the red one corresponds to the associated nondecision response times.

Feedback block. Surprisingly, we find that performance and response times across participants are identical for the feedback and pure blocks (no statistically significant difference). Our participants were highly trained on the orientation discrimination task. However, the confidence evaluation context has an impact on the decision-making process. The fact that the feedback context does not have any impact highlights the specificity of confidence in decision-making Martin et al. [0].
Non decision times. Our fitting procedure allows estimating the nondecision times (see the Material and Methods section). In Fig. 3 we represent the histogram of the response times across participants for the pure and confidence blocks. The red curve shows the distribution of nondecision time in the model, and the black curve the response times distribution. Note that with a fit only based on the mean response times and accuracies, the model is able to accurately account for the distributions of response times. We find that the minimum value of nondecision time is 75 ms for the no-confidence block, and 100 ms for the confidence block, and the average nondecision times are within the order of magnitude of saccadic latency Luce et al. [1986], Mazurek et al. [2003]. Finally we observe that the nondecision distributions clearly show a right skew for several participants, in agreement with Verdonck and Tuerlinckx [2016]. This justifies the modelling with an exponentially modified Gaussian distribution (EMG) Grushka [1972], instead of simply adding a constant nondecision time to every decision time.
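To make the role of the EMG distribution concrete, the sketch below draws nondecision times as the sum of a Gaussian and an exponential variate, which is the standard construction of an exponentially modified Gaussian, and adds them to a decision time. The parameter values (in seconds) and the function names are hypothetical and are not the fitted values of any participant.

```python
import random
import statistics


def sample_nondecision_time(mu, sigma, tau, rng):
    """One draw from an EMG: Normal(mu, sigma) plus Exponential with mean tau."""
    return rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)


def response_time(decision_time, mu, sigma, tau, rng):
    """Response time = decision time of the network + EMG nondecision time."""
    return decision_time + sample_nondecision_time(mu, sigma, tau, rng)


rng = random.Random(0)
# Hypothetical parameters: Gaussian part 200 ms +/- 20 ms, exponential tail 50 ms.
samples = [sample_nondecision_time(0.20, 0.02, 0.05, rng) for _ in range(20000)]

# The EMG mean is mu + tau; the exponential tail makes the mean exceed the median.
print(round(statistics.mean(samples), 2))
print(statistics.mean(samples) > statistics.median(samples))
```

A constant nondecision time would correspond to the limit where both `sigma` and `tau` vanish, which cannot produce the right skew mentioned above.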

CONFIDENCE MODELING
Recent studies have reported a choice-independent representation of a confidence signal in monkeys Ding and Gold [2011] and in rats Kepecs et al. [2008], as well as evidence for a close link between decision variable and confidence (in monkeys from LIP recordings Kiani and Shadlen [2009] and in humans from fMRI experiments Hebart et al. [2014]). In an experiment with monkeys, Kiani and Shadlen [2009] introduce a 'sure target' associated with a low reward, which can be chosen instead of the categorical targets. The probability of not choosing the sure target is then a proxy for the confidence level. Wei and Wang [2015] propose to model the neural correlates of confidence within the framework of attractor neural networks. They assume that the confidence level (as given by the probability of not choosing the sure target) is a sigmoidal function of the difference, at the time of decision, between the activities of the winning and losing neural pools. This hypothesis is in line with similar hypotheses in the framework of DDMs and other decision-making models Vickers [1979 (reedited in 2014)], Mamassian [2015], Drugowitsch et al. [2014]. They then show that the empirical dependencies of response times and accuracies on the confidence level are qualitatively reproduced in the simulations of the neural model. Following Wei and Wang [2015], we make here the hypothesis that the confidence in a decision is based on the difference ∆r between the neural activities of the winning and the losing neural pools, measured at the time of the decision: the larger the difference, the greater the confidence. In our experiment with humans, the measure of confidence is the one reported by the subjects on a discrete scale, and it is this reported confidence level that we want to model.
Within our framework, we quantitatively link this empirical confidence to the neural difference by matching the distribution of the neural evidence balance with the empirical histogram of the confidence levels, as illustrated in Fig. 10 of the Material and Methods section. In Fig. 4 we show for each participant the matching between the histogram of confidence levels, as reported by the participant, and the distribution of ∆r, as obtained in the model calibrated on the participant's performance (as explained in the previous Section). Note that the main difference between participants lies in the percentage of trials (and level on the confidence scale) for which a participant reports the highest confidence level. This allows a matching of the continuous distribution of ∆r onto the discrete one of the reported confidence levels. This last point can be at a very low value of ∆r, as is the case for, e.g., Participant 2 (see Fig. 4.B).
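One way to read the matching described above is as a quantile (histogram) matching: cut points are placed on the simulated ∆r distribution so that the fraction of simulated trials falling in each bin equals the fraction of trials on which the participant reported the corresponding confidence level. The Python sketch below implements this reading. It is an assumption about the procedure, whose details are given in the Material and Methods section; the function names and the toy data are hypothetical.

```python
from bisect import bisect_right


def confidence_thresholds(delta_r_samples, level_counts):
    """Cut points on delta_r matching the empirical confidence histogram.

    level_counts[k] is the number of trials reported at confidence level k
    (levels ordered from lowest to highest). Returns len(level_counts) - 1
    thresholds taken at the cumulative quantiles of the simulated delta_r.
    """
    ordered = sorted(delta_r_samples)
    n, total = len(ordered), sum(level_counts)
    thresholds, cumulative = [], 0
    for count in level_counts[:-1]:
        cumulative += count
        index = min(n - 1, (cumulative * n) // total)
        thresholds.append(ordered[index])
    return thresholds


def confidence_level(delta_r, thresholds):
    """Map a continuous delta_r onto a discrete level index (0 = lowest)."""
    return bisect_right(thresholds, delta_r)


# Toy data: 100 simulated delta_r values and a 3-level empirical histogram.
simulated = [float(i) for i in range(100)]
reported_counts = [10, 30, 60]
cuts = confidence_thresholds(simulated, reported_counts)
levels = [confidence_level(x, cuts) for x in simulated]
print(cuts)                                 # [10.0, 40.0]
print([levels.count(k) for k in range(3)])  # [10, 30, 60]
```

Because the cut points are inferred from the data, no particular shape of the mapping from ∆r to confidence is imposed by this construction.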
In our analysis, the shape of the mapping is not imposed but inferred from the experimental data. This is in contrast with previous studies in which the sigmoidal shape is imposed Beck et al. [2008], Kepecs and Mainen [2012], Kepecs et al. [2008], Wei and Wang [2015]. We find however that, for each participant, the empirical mapping is very well approximated by a sigmoidal function of the type $1/(1 + \exp(-\beta(\Delta r - \kappa)))$, with participant-specific parameters κ and β. In Fig. 4, we also plot the corresponding sigmoidal fits. In Wei and Wang [2015] the authors exhibit a link between a probabilistic measure of confidence and ∆r, in the form of a sigmoid function. The similarity of our results on the link between the reported confidence and ∆r suggests that the human reported confidence can be understood as a discretization of a probabilistic function.

It has been shown that confidence ratings are closely linked to response times Baranski and Petrusic [1994], Desender et al. [2018a] and choice accuracy Peirce and Jastrow [1884], Baranski and Petrusic [1994], Sanders et al. [2016], Urai et al. [2017]. The behavioral confidence in the model is assumed to be based on a simple neural quantity measured at the time of the decision. In what follows we study whether this hypothesis on the neural correlate of confidence can account for the links between the behavioral data: response times, accuracy and confidence. For the network we consider the parameters corresponding to the confidence block, as it is the only one with a report of confidence. In Fig. 5 we represent the response times (Fig. 5.A) and choice accuracy (Fig. 5.B) with respect to the reported confidence level for each participant. The data points show the experimental results (with the error bars as the bootstrapped 95% confidence interval), and the colored line the result of the simulation (with the light colored area the bootstrapped 95% confidence interval).
Response times decrease Baranski and Petrusic [1994], Desender et al. [2018a] and accuracies increase with confidence Geller and Whitman [1973], Vickers and Packer [1982], Sanders et al. [2016], Desender et al. [2018a]. We find a monotonic dependency between response times and confidence, and between accuracy and confidence, but with specific shapes for each participant.
The model is able to accurately reproduce the strong link between response times and confidence ratings, despite the large differences in response times between participants. We observe that our model is able to correctly reproduce the trend of the link between performance and confidence, and that the behavioral results are inside the 95% bootstrapped confidence interval of the model. Note that some values of confidence are only observed for a few trials, resulting in large error bars, especially for accuracies as we take the mean of a binary variable. The model is able to predict the psychometric and chronometric functions with respect to confidence for each participant.
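The effect of the number of trials on the error bars can be illustrated with a percentile bootstrap of the mean accuracy. The sketch below is a generic implementation, given as an assumption about how such intervals can be obtained rather than as the exact procedure used for the figures; the data are hypothetical.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = rng or random.Random()
    n = len(values)
    # Resample the trials with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


rng = random.Random(0)
# Hypothetical correct (1) / error (0) outcomes, both with 75% accuracy.
few_trials = [1, 1, 0, 1, 1, 0, 1, 1]   # a rarely used confidence level
many_trials = few_trials * 50           # a frequently used confidence level

ci_few = bootstrap_ci(few_trials, rng=rng)
ci_many = bootstrap_ci(many_trials, rng=rng)
print(ci_few, ci_many)
```

With only eight binary outcomes the interval spans a large part of the [0, 1] range, whereas with 400 trials at the same accuracy it is much narrower.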
It has been previously found that during a perceptual task reported confidence increases with stimulus strength for correct trials, but decreases for error trials Kepecs et al. [2008], Sanders et al. [2016], Desender et al. [2018a]. This effect of confidence has been correlated to patterns of firing rates in rat experiments Kepecs et al. [2008] and to the human feeling of confidence Sanders et al. [2016]. This effect is in accordance with a prediction of statistical confidence Ernst and Banks [2002], Griffin and Tversky [1992], Sanders et al. [2016]. In Fig. 6 we represent the mean confidence as a function of stimulus strength, for correct and error trials. We observe the same type of variations of confidence with respect to stimulus strength, both in the experimental results and in the model simulations. We thus see that our attractor network model reproduces a key feature of statistical confidence.
Next we show that the model successfully accounts for psychometric and chronometric aspects of confidence.

COGNITIVE EFFECTS OF CONFIDENCE
Perceptual decisions made by humans in behavioral experiments have been shown to depend not only on the current sensory input, but also on the choices made at previous trials. Various sequential effects have been reported Fernberger [1920], Laming [1979], Gold et al. [2008], Leopold et al. [2002], and different models have been proposed to account for them Cho et al. [2002], Angela and Cohen [2009], Glaze et al. [2015], Bonaiuto et al. [2016], Berlemont and Nadal [2019]. When the decision-maker does not receive feedback, confidence in one's decision might be important for controlling future behaviors Yeung and Summerfield [2012], Meyniel et al. [2015b]. Recently, the effect of confidence on history biases has been experimentally investigated Braun et al. [2018], Samaha et al. [2018]. It has been shown that decisions with high confidence confer stronger biases upon the following trials. Here we investigate the influence of confidence upon the next trial in the empirical data, and show that the results are well reproduced by the behavior of our dynamical neural model.

Confidence
First we do a statistical analysis of the effect of history biases on response times in the experimental data. For this, we classify each trial into low and high confidence by means of a within-participant median split: for each participant, a trial is considered as low confidence (resp. high confidence) if the reported confidence is below (resp. above) the participant's median. We analyze the history biases making use of linear mixed effects models (LMM) Gelman and Hill [2007]. The LMM we consider assumes that the logarithm of the response time at trial $n$, $RT_n$, is a linear combination of factors, as follows:

$$\ln(RT_n) = a_{0,p} + a_{1,p}\,|\theta| + a_2\, x_{\mathrm{repetition}} + a_{3,p} \ln(RT_{n-1}) + a_4\, \mathrm{Conf}_{n-1} \qquad (1)$$

with $x_{\mathrm{repetition}}$ a binary variable taking the value 1 if the correct choice for the current trial is a repetition of the previous choice (and 0 otherwise), $\theta$ the orientation of the circular grating (in degrees), $RT_{n-1}$ the response time of the previous trial, and $\mathrm{Conf}_{n-1}$ the confidence of the previous trial, coded as 0 for low and 1 for high. The subscript $p$ in a coefficient (e.g., $a_{0,p}$) indicates that for this parameter we allow for a random effect per participant (random intercept for $a_{0,p}$, random slope otherwise).
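The regressors entering Eq. (1) can be built from a participant's trial sequence as in the following Python sketch. It only prepares the variables; the mixed model itself would then be fitted with a dedicated package. The dictionary keys, the convention that a positive θ denotes a clockwise grating, and the choice to leave out trials whose previous confidence is exactly at the median are assumptions made for this illustration.

```python
import math
import statistics


def build_regressors(trials):
    """Build the variables of Eq. (1) for ONE participant.

    `trials` is a list of dicts in temporal order with hypothetical keys:
    'theta' (signed orientation, degrees), 'choice' ('C' or 'AC'),
    'rt' (seconds) and 'confidence' (reported level).
    """
    median_conf = statistics.median(t["confidence"] for t in trials)
    rows = []
    for prev, cur in zip(trials, trials[1:]):
        if prev["confidence"] == median_conf:
            continue  # assumption: ratings exactly at the median are unclassified
        correct_choice = "C" if cur["theta"] > 0 else "AC"  # assumed sign convention
        rows.append({
            "log_rt": math.log(cur["rt"]),
            "abs_theta": abs(cur["theta"]),
            "repetition": int(correct_choice == prev["choice"]),
            "log_rt_prev": math.log(prev["rt"]),
            "conf_prev": int(prev["confidence"] > median_conf),
        })
    return rows


# Hypothetical four-trial sequence.
trials = [
    {"theta": 1.0, "choice": "C", "rt": 0.5, "confidence": 8},
    {"theta": -0.5, "choice": "AC", "rt": 0.6, "confidence": 3},
    {"theta": -1.5, "choice": "AC", "rt": 0.4, "confidence": 9},
    {"theta": 0.8, "choice": "C", "rt": 0.7, "confidence": 2},
]
rows = build_regressors(trials)
print([(r["repetition"], r["conf_prev"]) for r in rows])  # [(0, 1), (1, 0), (0, 1)]
```

Each row corresponds to one trial $n$ together with the quantities of trial $n-1$, so a sequence of $N$ trials yields at most $N-1$ rows.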
We compared this LMM to other ones that do not include all these terms, using the ANOVA function (in R language, with the lme4 package Bates et al. [2015]) that performs model comparison based on the Akaike and Bayesian Information Criteria (AIC and BIC) Bates et al. [2014]. As can be seen in Table 1, the LMM of Eq. (1) is preferable in all cases, and from now on we only report the results obtained with this LMM.
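For reference, the two criteria are defined from the maximized log-likelihood $\ln L$, the number of parameters $k$ and the number of observations $n$ as $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k \ln n - 2 \ln L$, the model with the lowest value being preferred. The short sketch below applies these definitions to two candidate models; the log-likelihoods and parameter counts are hypothetical and are not the values of Table 1.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: a full model and a reduced model lacking one term.
candidates = {
    "full": {"log_likelihood": -100.0, "n_params": 8},
    "reduced": {"log_likelihood": -112.0, "n_params": 7},
}
n_obs = 1000
scores = {
    name: (aic(m["log_likelihood"], m["n_params"]),
           bic(m["log_likelihood"], m["n_params"], n_obs))
    for name, m in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)  # full full
```

The BIC penalizes additional parameters more strongly than the AIC as soon as $\ln n > 2$, so agreement between the two criteria is a stronger argument than either alone.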
The results of this linear mixed model are summarized in Table 2. As expected we find that the slope of the value of orientation is significant, as well as the repetition of response Cho et al. [2002]. In line with previous works Desender et al. [2018a], high confidence has the effect of speeding up the following trial (we find a significant negative slope). Finally the slope corresponding to the response time of the previous trial is significantly positive, meaning that the participants have the tendency to show sequences of fast (or slow) response times.
Next, we do the same type of statistical analysis of correlations between successive trials for the behavior of the neural model (with parameters fitted as explained previously), following a numerical protocol replicating the one of the experiment. We thus apply the same LMM (Eq. 1) to the results of the simulations. The results of this statistical analysis are summarized in Table 3.
First we note that the attractor neural network captures the variation of response times with respect to angle orientation, as expected from Wong and Wang [2006]. Second, the dependency on the choice history (through the repetition of responses) is correctly reproduced, in agreement with a previous study of these effects Berlemont and Nadal [2019]. Finally, the effect of confidence on response times is significant, with negative slopes as for the experiment and with the same order of magnitude. In Fig. 7 we illustrate how the nonlinear neural dynamics lead to confidence-specific sequential effects. The analysis is here similar to the one done in Berlemont and Nadal [2019] for the analysis of post-error effects in the same neural model. On each panel we compare the (mean) neural dynamics for post-low and post-high confidence trials (respectively red and blue lines). Without loss of generality we suppose that the previous decision was a C grating. We first note that the relaxation dynamics between two consecutive trials are different, resulting in different starting points for the next trial in post-low and post-high confidence trials. Panel (A) corresponds to the case where the new stimulus is also C oriented ("repeated" case), at low strength level. The ending points of the relaxations are deep into the basin of attraction, meaning that the non-linearity of the system plays an important role in the dynamics. Because the post-high confidence relaxation lies deeper into the basin of attraction than the one of post-low trials, the subsequent dynamics will be faster for post-high confidence trials in this case. In Panel (B) we represent the case, still at low stimulus strength, where the orientation of the new stimulus is the opposite ("alternated" case) to the one corresponding to the previous decision (hence an AC grating).
The dynamics lie close to the basin boundary of the two attractors, thus the dynamics are slow and there is no significant difference between post-low and post-high confidence trials. In panels (C) and (D) we represent the same situations as panels (A) and (B), respectively, but for high strength levels (easy trials). The ending points of the relaxations are far from the boundary of the basins of attraction, whatever the grating presented. The response times for post-high and post-low confidence trials are thus similar.
In order to investigate this effect in the data of the experiment, we first transform the reaction times of each participant using the z-score Kreyszig [1979]. Now that the reaction times are normalized we consider all the participants together. We group the reaction times following the same cases as previously: high and low stimulus strength, repeated or alternated. We compare post-high and post-low confidence trials in each subcase using a t-test Fay and Proschan [2010]. We find that the mean reaction times of post-low and post-high confidence trials differ in the low orientation, repeated case (t-test, p = 0.044), but do not differ significantly in the low orientation, alternated case, the high orientation, alternated case and the high orientation, repeated case (respectively p = 0.90, p = 0.70, p = 0.23). This is in accordance with the previous analysis of the non-linear dynamics.
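The normalization and comparison steps described above can be sketched as follows. The z-score is computed within each participant, after which trials can be pooled; the comparison of two groups is illustrated with the Welch form of the t statistic, which is an assumption since the variant of the t-test is not specified here. All data and names are hypothetical.

```python
import math
import statistics


def zscore_by_participant(rts_by_participant):
    """Z-score reaction times separately for each participant."""
    normalized = {}
    for participant, rts in rts_by_participant.items():
        mean, std = statistics.mean(rts), statistics.stdev(rts)
        normalized[participant] = [(rt - mean) / std for rt in rts]
    return normalized


def welch_t(sample_a, sample_b):
    """Welch t statistic for the difference of means of two samples."""
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error


# Hypothetical reaction times (s): participant p2 is globally slower than p1.
raw = {"p1": [0.4, 0.5, 0.6], "p2": [0.9, 1.0, 1.1]}
z = zscore_by_participant(raw)
print([round(v, 6) for v in z["p1"]])  # [-1.0, 0.0, 1.0]
print([round(v, 6) for v in z["p2"]])  # [-1.0, 0.0, 1.0]

# Hypothetical pooled z-scores for post-high and post-low confidence trials.
t_value = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t_value, 4))  # -1.2247
```

After normalization the two participants become directly comparable despite their different overall speeds. The p-value associated with the statistic can be obtained from the t distribution, for instance with `scipy.stats.ttest_ind(a, b, equal_var=False)`.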
The model reproduces sequential effects correlated with repetition and confidence, and we have shown that these effects result from the intrinsic nonlinear network dynamics. However it does not reproduce the correlations of the response time with the previous response time. This allows us to distinguish the effects that can be explained by the intrinsic dynamics of an attractor network from the ones that would require the implementation of other cognitive processes.

COMPARISON WITH OTHER MODELS
We now compare our modeling analysis with the ones that we obtain making use of two popular dynamical models of decision making for which confidence can be modelled in a very similar way.
First we consider another non-linear model with mutual inhibition, the Usher-McClelland model Usher and McClelland [2001] (more details in the Material and Methods section). Compared to attractor network models, this model allows for more intensive numerical simulations, with few parameters Cho et al. [2002], Botvinick et al. [2001], Gao et al. [2009]. In this model the decision is obtained through a competition between two units, until a threshold is reached. We can model confidence in this model as a function of the balance of evidence between the two units, in a similar way as in our model. Using the same optimization algorithm as described in the Material and Methods section for our model, we fit the Usher-McClelland model separately for each participant. In Fig. 8 we represent the response times and accuracies with respect to confidence. First we note that the model fits the response times with respect to confidence, but only at intermediate levels of confidence. For some participants, we note a strong divergence at high confidence (Participants 1, 4 and 5). This can be explained by the fact that, in this model, 'firing rate' variables can take negative values (in fact in the steady state without any input the firing rate variables have negative values). This leads to extreme values of confidence for long trials. Second, the trend in accuracy is not correct for some participants (Participants 1 and 4). Accuracy is an increasing function of confidence (except for Participant 5) but the experimental data do not fall within the bootstrapped confidence interval of the simulations. These limitations for response times and accuracy highlight the fact that this less biophysical model is able to capture the general trend of psychometric and chronometric functions with confidence, but not all the effects.
Second, we consider a model of the drift-diffusion family. Variations of response times and accuracy with respect to confidence have been captured by drift-diffusion models in previous studies Moreno-Bote [2010], Zylberberg et al. [2012] and in independent race models (IRM) Raab [1962], Bogacz et al. [2006], Vickers [1970], Merkle and Van Zandt [2006]. Here we investigate whether the IRM can reproduce confidence-specific sequential effects. The IRM is as follows. During the accumulation of evidence the equations of evolution are: [...] with $\nu_i(t)$ a white noise process, and $i \in \{C, AC\}$. The first race that reaches a threshold $z$ (or $-z$) is the winning race. The confidence in the decision is modelled as a monotonic function of the balance of evidence $|z - x_{\mathrm{losing}}|$ (Fig. 9) Vickers [1979 (reedited in 2014)], Mamassian [2015], Drugowitsch et al. [2014], Wei and Wang [2015]. Here we extend the IRM in order to deal with sequences of trials. To do so, we allow for a relaxation dynamics between trials, in a way analogous to the relaxation dynamics in the attractor network model. Hence, after a decision is made, both units receive a non-specific inhibitory input leading to a relaxation until the next stimulus is presented (see Fig. 9). Within this extended IRM framework we can study the sequential effects correlated with confidence.

Figure 9: Schematic dynamics of a race model with a relaxation mechanism. The upper and bottom dashed lines correspond to the two opposite decision thresholds. The blue trajectory is a typical winning race. The black rectangle on the x-axis denotes the beginning of the next stimulus, hence the end of the relaxation period. The green and orange trajectories are the losing races in two trials with different confidence outcomes. The green and orange dashed lines represent the mean dynamics of these two races (starting from the ending point of the relaxation) during the presentation of the next stimulus.
Because there is no interaction between the two races, the relaxation of the winning race is the same in both low and high confidence trials. However, the ending point of the relaxation of the losing race in a trial with high confidence is closer to the baseline (0 line) than in one with low confidence (Fig. 9). For the next trial, if the winning race is the same as previously, then the mean response times are identical in low and high confidence cases. However, if the opposite decision is made, the response time in the post-low confidence case is faster than the one in the post-high confidence case, as we can observe with the mean races shown in Fig. 9. This behavior is in contradiction with the data, in which we observed the opposite effect (Table 2). This conclusion applies more generally to any race-type model without interactions between units. Race models with interactions have also been considered Bogacz et al. [2006], but such models are more similar to attractor networks, yet with less biophysical foundation.
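The argument above can be checked with a small worked example on the mean dynamics only. The sketch below makes several simplifying assumptions that are not taken from the model equations: each race is measured toward its own threshold (so that both thresholds are at $z$), the relaxation is an exponential decay toward 0 with time constant `tau`, the stimulus-favored race has a constant drift, and noise and the competing race are ignored when computing the mean time to threshold. All numerical values are hypothetical.

```python
import math


def relax(x_end, interval, tau):
    """Level reached after relaxing from x_end toward 0 (assumed exponential decay)."""
    return x_end * math.exp(-interval / tau)


def mean_time_to_threshold(x_start, drift, z):
    """Mean time for a race with constant positive drift to go from x_start to z."""
    return (z - x_start) / drift


def next_trial_mean_times(x_loser_end, z=1.0, drift=2.0, interval=0.5, tau=0.5):
    """Mean decision times on the next trial, for repeated and alternated choices.

    x_loser_end is the level of the losing race when the previous decision was
    made; the balance of evidence z - x_loser_end sets the previous confidence.
    """
    winner_start = relax(z, interval, tau)           # previous winner ended at z
    loser_start = relax(x_loser_end, interval, tau)  # previous loser ended lower
    repeated = mean_time_to_threshold(winner_start, drift, z)
    alternated = mean_time_to_threshold(loser_start, drift, z)
    return repeated, alternated


post_high = next_trial_mean_times(x_loser_end=0.2)  # large balance: high confidence
post_low = next_trial_mean_times(x_loser_end=0.8)   # small balance: low confidence

print(post_high[0] == post_low[0])  # repeated case: identical mean times
print(post_low[1] < post_high[1])   # alternated case: faster after low confidence
```

Under these assumptions the previous confidence has no effect on repeated decisions and speeds up alternated decisions after low confidence, which is the pattern described in the text for race models without interaction.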

DISCUSSION
Modeling confidence. Dynamical models of decision making implement in different ways the same qualitative idea: decision between two categories is based on the competition between units collecting evidence in favor of one or the other category (or on a single unit whose activity represents the difference between the categorical evidences). Apart from very few works Rolls et al. [2010a,b], authors propose that behavioral confidence can be modeled as a function of the balance of evidence, with some variations in the models Vickers [1979 (reedited in 2014)], Pleskac and Busemeyer [2010]. We discuss the various approaches in light of our results, first by comparing the different models for confidence, then the specific effects of confidence on sequential decision-making. Bayesian inference models compute confidence using extensions to drift-diffusion models (DDM) based on decision variable balance Vickers [1979 (reedited in 2014)], Kepecs et al. [2008], Moreno-Bote [2010], Zylberberg et al. [2012], possibly with additional mechanisms: decision variable balance combined with response times Kiani et al. [2014], or post-decisional deliberation Pleskac and Busemeyer [2010] (the dynamics continue after the decision, thus updating the balance of evidence). These models successfully account for various psychometric and chronometric specificities of human confidence. In DDMs, confidence based on decision variable balance predicts that confidence should deterministically decrease as a function of response times Drugowitsch et al. [2012], Kiani and Shadlen [2009]. However, Ratcliff and Starns [2009] have shown that the response time distributions strongly overlap across confidence levels. Such a property can be recovered making use of additional processes, such as with a two-stage drift-diffusion model Pleskac and Busemeyer [2010]. Yet, other effects remain unexplained within the framework of DDMs.
This is the case of the early influence of sensory evidence on confidence Zylberberg et al. [2012], as well as the fact that confidence is mainly influenced by evidence in favor of the selected choice Zylberberg et al. [2012]. Considering experiments in monkeys Kiani and Shadlen [2009], Wei and Wang make use of a ring attractor neural network with confidence computed from the balance of evidence Wei and Wang [2015].
In the present paper, their approach has been extended to the case of a two-variable attractor network model Wong and Wang [2006], taking into account an inhibitory feedback allowing the network to engage in a sequence of trials Berlemont and Nadal [2019]. The reported confidence is modeled as a function of the difference in activity between the winning and losing populations at the time of decision. The asymmetries mentioned above in the influence of evidence on confidence automatically arise in the accumulation of evidence in attractor neural networks Wong et al. [2007]. We expect these asymmetries to arise in the same way for confidence, as we model confidence as a function of the balance of evidence.
Here we have shown, for the first time, that the network accounts for sequences of decisions and reproduces response times, accuracy and confidence individually for each participant. In addition, the model accounts for other effects reported in the literature, among which some effects that were believed to be specific signatures of Bayesian confidence Sanders et al. [2016], Adler and Ma [2018]: confidence increases with the orientation for correct trials but decreases for error trials.
Confidence and Serial dependence. The most common serial dependence effect in perceptual decision-making is the fact that the current stimulus appears more similar to recently seen stimuli than it really is Cho et al. [2002], Fecteau and Munoz [2003]. This effect can be observed for a large variety of perceptual features such as luminance Fründ et al. [2014], orientation, direction of motion Cho et al. [2002] or face identity Liberman et al. [2014]. It has recently been shown that the magnitude of history biases increases when previous trials were faster and correct. Within the Signal Detection Theory framework, making use of the correlations between confidence, response time and accuracy, this effect is interpreted as an impact of confidence on the next decision Braun et al. [2018]. By directly measuring the subjective confidence of the participants, recent studies confirm that confidence modulates the history biases Desender et al. [2018a], Samaha et al. [2018], Desender et al. [2018b].
On the theoretical side, the impact of confidence on the response times of subsequent trials has been investigated in the framework of DDMs Desender et al. [2018a]. The trials are divided into two categories, those following low confidence trials and those following high confidence trials, and a DDM is fitted separately on each type of trial. This amounts to assuming that the parameters (threshold and drift) change depending on the confidence level at the previous trial.
In our experiment, we observe that high confidence trials lead to faster subsequent choices, in agreement with the above-mentioned experimental studies. This effect is well reproduced by the attractor neural network. In contrast to the DDM analysis, for each participant we calibrate our attractor model globally, and reproduce the sequential effects with a unique set of model parameters. This behavior can be understood as a property of the intrinsic nonlinear dynamics, as discussed in the Results section. Hence, the attractor network model not only accounts for the relationship between confidence, response times and accuracy, but also reproduces the influence of confidence on serial dependence.
Decision and non decision times. Human studies commonly report right-skewed response time distributions Ratcliff [1978], Luce et al. [1986], Ratcliff and Rouder [1998]. This long right tail is well captured by drift-diffusion models Ratcliff and Rouder [1998], Usher and McClelland [2001]. However, it should be noted that, with trained subjects, the right skew is less pronounced and the response time distribution can be accurately reproduced by a Gaussian distribution Peirce. In contrast to human studies, experiments with monkeys do not show such long right tails in the response time histograms, which is inconsistent with drift-diffusion models Ditterich [2006]. When a uniform non-decision time is assumed, attractor neural networks cannot account for the right-skewed distribution, but accurately reproduce the shape of the distribution in monkey experiments Wang. In this work we show that, in the range of parameters we considered, the decision time can be approximated by a Gaussian distribution. In this case the long right tail results only from the non-decision time, and no longer from the decision time. However, even in the case of the drift-diffusion model, which naturally produces the correct distribution for human studies, the non-decision time is not necessarily uniform Verdonck and Tuerlinckx [2016]. Indeed, if the non-decision times are not constrained to a specific shape within the DDM, the estimated non-decision time densities are not uniform but show a strong right skew Verdonck and Tuerlinckx [2016]. Thus, even when a model is able to account for the right tail through the decision time alone, this specific shape is plausibly due to a combination of the decision time and the non-decision time.
To conclude, in this work we designed a specific experiment in order to study confidence with human participants. We fitted a neural attractor network to each participant in order to describe their behavioral results: response times, accuracy and confidence. Finally, we showed that the impact of confidence on sequential effects is described by the intrinsic nonlinear dynamics of the network.

PARTICIPANTS
Nine participants (7 females, mean age = 27.3, SD = 5.14) were recruited from the Laboratoire de Psychologie cognitive et de Psycholinguistique's database (LSCP, DEC, ENS-EHESS-CNRS, PSL, Paris, France). Every subject had normal or corrected-to-normal vision. We obtained written informed consent from every participant, who received a compensation of 15 euros for their participation. The participants performed three sessions on three distinct days in the same week, for a total duration of about 2h15. The experiment followed the ethics requirements of the Declaration of Helsinki (2008) and was approved by the local Ethics Committee. Three participants were excluded. Two of the excluded participants did not complete the experiment correctly and one exhibited substantially asymmetric performance (98% of correct responses for an angle of 0.2°, but 18% at -0.2°). We thus analyzed data from 6 participants.
Kleiner et al. [2007]. Trials began with the presentation of a black fixation point (duration = 200 ms). Then the stimulus for the primary decision task was presented, consisting of a circular grating (diameter = 4°, Tukey window, 2 cycles per degree, Michelson contrast = 89%, duration = 100 ms, phase randomly selected at each trial). The grating had eight possible orientations with respect to the vertical meridian, and participants were asked to categorize them as clockwise or anti-clockwise with respect to the vertical meridian by pressing the right-arrow or left-arrow key. Participants had been instructed to respond as follows: "You have to respond quickly but not at the expense of precision. After 1.5 s the message "Please answer" will appear on the screen. The ideal is really to respond before this message appears".
Trials were of three types, grouped in pure blocks, feedback blocks and confidence blocks (see below). Participants performed three sessions on three distinct days. Each session (45 min) consisted of three runs, each run being composed of one exemplar of each of the three types of block, in a random order. Before starting the experiment, participants performed a short training block of each type, with easier orientations than in the main experiment.
Pure block. In this block, after each decision participants waited 300 ms before the black fixation point appeared. The stimulus appeared 200 ms after this fixation point. The eight possible orientations for the circular grating were [-1.6°, -0.8°, -0.5°, -0.2°, 0.2°, 0.5°, 0.8°, 1.6°] and a stimulus was chosen randomly among them with the following weights:
Confidence block. In the confidence block, participants had to evaluate their confidence in the orientation task 200 ms after the decision. To perform this task they had to move a slider on a 10-point scale, from pure guessing to certain to be correct. Importantly, the initial position of the slider was chosen randomly for each trial. Participants moved the slider to the left by pressing the "q" key, and to the right with the "e" key. They confirmed the choice of the value of confidence by pressing the space bar. The participants also had the option to indicate that they had made a "motor mistake" during the orientation task. For this they had to press a key with a red sticker instead of responding on the confidence scale.
After the choice of confidence, the participants had to wait 300 ms before the black fixation dot appeared. The stimulus appeared 200 ms after the fixation dot. The orientations of the circular gratings were the same as in the feedback block.

MODEL
A decision-making attractor neural network. We consider a decision-making recurrent network governed by local excitation and feedback inhibition, based on the biophysical models of spiking neurons introduced and studied in Compte et al. [2000] and Wang [2002]. Within a mean-field approach, Wong and Wang [2006] have derived a reduced firing-rate model composed of two interacting neural pools which faithfully reproduces not only the behavior of the full model, but also the dynamics of the neural firing rates and of the output synaptic gating variables. The details can be found in Wong and Wang [2006] (main text and Supplementary Information). This model and its variants are used as proxies for simulating the full spiking network and for getting mathematical insights Wong and Wang [2006], Wang [2011], Miller and Katz [2013], Deco et al. [2013], Engel et al. [2015], Berlemont and Nadal [2019]. Here we make use of a model variant introduced in Berlemont and Nadal [2019], which takes into account a corollary discharge Sommer and Wurtz [2008], Crapse and Sommer [2009]. This results in an inhibitory current injected into the neural pools just after a decision is made, which makes the neural activities relax towards a low-activity, neutral state, therefore allowing the network to deal with consecutive sequences of decision-making trials. The full details can be found in Berlemont and Nadal [2019]. We recall here the equations and parameters, with notation adapted to the present study.
The model consists of two competing units, each one representing an excitatory neuronal pool, selective to one of the two categories, C or AC. The dynamics is described by a set of coupled equations for the synaptic activities $S_C$ and $S_{AC}$ of the two units C and AC:
$$\frac{dS_i}{dt} = -\frac{S_i}{\tau_S} + (1 - S_i)\,\gamma\, f(I_{i,tot}), \qquad i \in \{C, AC\},$$
with $\tau_S$ the synaptic time constant and $\gamma$ a rate parameter, as in Wong and Wang [2006]. The synaptic drive $S_i$ for pool $i \in \{C, AC\}$ corresponds to the fraction of activated NMDA conductance, and $I_{i,tot}$ is the total synaptic input current to unit $i$. The function $f$ is the effective single-cell input-output relation Abbott and Chance [2005], giving the firing rate as a function of the input current:
$$f(I_{i,tot}) = \frac{a I_{i,tot} - b}{1 - \exp\left[-d\,(a I_{i,tot} - b)\right]},$$
where $a$, $b$, $d$ are parameters whose values are obtained through numerical fit. The total synaptic input currents, taking into account the inhibition between populations, the self-excitation, the background current and the stimulus-selective current, can be written as:
$$I_{C,tot} = J_{C,C} S_C - J_{C,AC} S_{AC} + I_{stim,C} + I_{noise,C},$$
$$I_{AC,tot} = J_{AC,AC} S_{AC} - J_{AC,C} S_C + I_{stim,AC} + I_{noise,AC},$$
with $J_{i,j}$ the synaptic couplings. The minus signs in the equations make explicit the fact that the inter-unit connections are inhibitory (the synaptic parameters $J_{i,j}$ being thus positive or null). The term $I_{stim,i}$ is the stimulus-selective external input. The form of this stimulus-selective current is:
$$I_{stim,i} = J_{ext}\,(1 \pm c_\theta),$$
with $i = C, AC$. The sign, ±, is positive when the stimulus favors population C, negative in the other case. Here the parameter $J_{ext}$ combines a synaptic coupling variable and the global strength of the signal (which are parametrized separately in the original model Wong and Wang [2006], Berlemont and Nadal [2019]). The quantity $c_\theta$, between 0 and 1, characterizes the stimulus strength in favor of the actual category, here an increasing function of the absolute value of the stimulus orientation angle, $\theta$.
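In addition to the stimulus-selective part, each unit individually receives an extra noisy input, $I_{noise,i}$, fluctuating around the mean effective external input $I_0$:
$$\tau_{noise}\,\frac{dI_{noise,i}}{dt} = -\left(I_{noise,i}(t) - I_0\right) + \eta_i(t)\,\sqrt{\tau_{noise}}\,\sigma_{noise},$$
with $\tau_{noise}$ a synaptic time constant which filters the white noise $\eta_i(t)$, and $\sigma_{noise}$ the noise amplitude.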
After each decision, a corollary discharge in the form of an inhibitory input is sent to both units until the next stimulus is presented:
$$I_{CD}(t) = -I_{CD,max}\,\exp\left(-\frac{t - t_D}{\tau_{CD}}\right),$$
with $t_D$ the time of the decision. This inhibitory input, delivered between the time of decision and the presentation of the next stimulus, allows the network to escape from the current attractor and engage in a new decision task Berlemont and Nadal [2019].
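As an illustration of the dynamics described above, the following minimal sketch integrates the two-unit model for a single trial with a simple Euler scheme. The numerical values are assumptions: they are the commonly quoted values of the reduced model of Wong and Wang [2006], not the calibrated values of Tables 4 and 5. The function names, the initial condition, the threshold value and the time step are hypothetical choices made for the example.

```python
import math
import random

# ASSUMED parameter values (commonly quoted for the reduced Wong-Wang model),
# not the calibrated values used in this study.
A, B, D = 270.0, 108.0, 0.154            # Hz/nA, Hz, s
GAMMA, TAU_S = 0.641, 0.1                # dimensionless, s
J_SELF, J_CROSS = 0.2609, 0.0497         # nA
J_EXT = 0.0156                           # nA (coupling times global signal strength)
I_0, SIGMA_NOISE, TAU_NOISE = 0.3255, 0.02, 0.002  # nA, nA, s


def firing_rate(current):
    """Effective single-cell input-output relation f(I), in Hz."""
    x = A * current - B
    if abs(x) < 1e-9:                    # removable singularity at x = 0
        return 1.0 / D
    return x / (1.0 - math.exp(-D * x))


def corollary_discharge(t_since_decision, i_cd_max, tau_cd):
    """Inhibitory current added to both units after a decision."""
    return -i_cd_max * math.exp(-t_since_decision / tau_cd)


def simulate_trial(c_theta, favors_c=True, threshold=20.0,
                   dt=0.0005, t_max=5.0, rng=None):
    """Integrate one trial; return None if no unit reaches the threshold."""
    rng = rng or random.Random()
    sign = 1.0 if favors_c else -1.0
    s_c = s_ac = 0.1                     # hypothetical initial synaptic activities
    noise_c = noise_ac = I_0
    t = 0.0
    while t < t_max:
        i_c = J_SELF * s_c - J_CROSS * s_ac + J_EXT * (1 + sign * c_theta) + noise_c
        i_ac = J_SELF * s_ac - J_CROSS * s_c + J_EXT * (1 - sign * c_theta) + noise_ac
        r_c, r_ac = firing_rate(i_c), firing_rate(i_ac)
        if max(r_c, r_ac) >= threshold:
            return {"choice": "C" if r_c > r_ac else "AC",
                    "decision_time": t,
                    "delta_r": abs(r_c - r_ac)}   # balance of evidence
        s_c += dt * (-s_c / TAU_S + (1 - s_c) * GAMMA * r_c)
        s_ac += dt * (-s_ac / TAU_S + (1 - s_ac) * GAMMA * r_ac)
        # Filtered white noise (Ornstein-Uhlenbeck process around I_0)
        scale = math.sqrt(dt / TAU_NOISE) * SIGMA_NOISE
        noise_c += (dt / TAU_NOISE) * (I_0 - noise_c) + scale * rng.gauss(0, 1)
        noise_ac += (dt / TAU_NOISE) * (I_0 - noise_ac) + scale * rng.gauss(0, 1)
        t += dt
    return None


if __name__ == "__main__":
    print(simulate_trial(0.3, rng=random.Random(0)))
```

In a sequence of trials, the value returned by `corollary_discharge` would be added to both total currents between the decision and the next stimulus onset, so that the next trial starts from the partially relaxed state rather than from a fixed initial condition.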
Confidence modeling. Within the various decision-making modeling frameworks, similar proposals have been made to model the neural correlate of the behavioral confidence level. In race models Raab [1962], which have as many accumulation variables as stimulus categories, as in attractor network models, the balance of evidence at the time of the perceptual decision has been used to model the neural correlate of the behavioral confidence Vickers and Packer [1982], Smith and Vickers [1988], Wei and Wang [2015]. This balance of evidence is given by the absolute difference between the activities of the category-specific units at the time of decision. Here we follow Wei and Wang [2015], considering that confidence is obtained from the difference in the activities of the neural pools, $\Delta r = |r_C - r_{AC}|$.
In our experiment, the subjects expressed their confidence level by a number on a scale from 0 to 9. In order to match the neural balance of evidence with the confidence reported by the subject, we map the balance of evidence histogram onto the behavioral confidence histogram, a procedure called histogram matching Gonzalez et al. [2002]. This mapping, illustrated in Fig. 10, is nonparametric. Note that the mapping is here from a continuous variable to a discrete one (taking integer values from 0 to 9).

Parameter                                   Calibration
I CD,max                                    common to all participants
τ CD                                        common to all participants
threshold z                                 specific to each participant
c θ , θ = {0.2°, 0.5°, 0.8°, 1.6°}          specific to each participant
Table 5: List of parameters subject to calibration, separately for the pure and confidence blocks.
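The histogram matching between the balance of evidence and the reported confidence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for Fig. 10: the function names are hypothetical, and the thresholds are placed halfway between neighboring simulated values, which is one of several possible conventions.

```python
from bisect import bisect_right
from collections import Counter


def balance_of_evidence(r_c, r_ac):
    """Absolute difference between the two pool activities at decision time."""
    return abs(r_c - r_ac)


def fit_histogram_matching(delta_r_samples, reported_confidence, n_levels=10):
    """Return thresholds on delta_r such that the mapped confidence levels
    reproduce the empirical proportions of the reported levels (0..n_levels-1)."""
    counts = Counter(reported_confidence)
    n_reports = len(reported_confidence)
    sorted_dr = sorted(delta_r_samples)
    n_sim = len(sorted_dr)
    thresholds = []
    cumulative = 0
    for level in range(n_levels - 1):
        cumulative += counts.get(level, 0)
        # Number of simulated trials that should fall at or below this level
        k = round(cumulative / n_reports * n_sim)
        if k <= 0:
            thresholds.append(float("-inf"))
        elif k >= n_sim:
            thresholds.append(float("inf"))
        else:
            # ASSUMPTION: threshold halfway between neighboring sorted values
            thresholds.append(0.5 * (sorted_dr[k - 1] + sorted_dr[k]))
    return thresholds


def map_to_confidence(delta_r, thresholds):
    """Discrete confidence level = number of thresholds not exceeding delta_r."""
    return bisect_right(thresholds, delta_r)


if __name__ == "__main__":
    simulated = [float(v) for v in range(1, 11)]
    reports = [0, 0, 3, 3, 3, 9, 9, 9, 9, 9]
    thr = fit_histogram_matching(simulated, reports)
    print([map_to_confidence(v, thr) for v in simulated])
```

Because the mapping only uses the rank of each simulated balance of evidence, it is invariant under any monotonic rescaling of $\Delta r$, which is why no parametric form needs to be assumed.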
We perform a model calibration in order to fit the behavioral data of each participant. More precisely, we calibrate the model by fitting, for each participant, both the mean response times and the accuracies for each orientation, this separately for each block. We note that we only fit the means, which implies that the fits do not take into account the serial dependencies. Doing so, any sequential effects that will arise in the model will result from the intrinsic dynamics of the network, and not from a fitting procedure of these effects.
The model parameter values are those used in Berlemont and Nadal [2019], and reproduced in Table 4, except for the few parameters listed in Table 5, that is $I_{CD,max}$, $\tau_{CD}$, $c_\theta$ and $z$, which are chosen in order to fit the data. The two parameters $I_{CD,max}$ and $\tau_{CD}$ are constrained to be common to all participants (joint optimization). The parameters $c_\theta$ (one for each orientation value) and $z$ are optimized separately for each subject and each block.
these two times, the mean non decision time is thus independent of the orientation. For comparing data with model simulations (which only give a decision time) at any given orientation $\theta$, we first subtract from the mean response time the mean response time averaged over all orientations (this for both data and simulations). We calibrate the model parameters so as to fit these centered mean response times. This provides a fit of the mean response times (at each angle) up to a global constant, which is the mean non decision time (the modeling of the non decision time distribution is presented in the next Section). For each participant, and each block, we thus consider the cost function: where the sums are over the orientation values, $\theta = \{0.2°, 0.5°, 0.8°, 1.6°\}$, and the normalization factors $n$ (for response times) and $m$ (for the accuracy) are given by In these expressions, $RT_{data}(\theta)$ denotes the mean experimental response time obtained by averaging over all trials at the orientations $\pm\theta$, and $RT_{data}$ is the average over all orientations; $RT_{network}(\theta)$ and $RT_{network}$ are the corresponding averages obtained from the model simulations. The coefficient $\alpha$ denotes the relative weight given to the response time and accuracy cost terms. We present the results obtained when taking $\alpha = 2$, but it should be noted that the choice of this parameter does not drastically affect the fitted parameters. Finally, we add a soft constraint on $c_\theta$ so that this value does not diverge when the participant's accuracy is close to 100%.
Note that this constraint does not affect the results of the fitting procedure, but is necessary for computing the confidence interval of the fitted parameters.
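To make the structure of such a cost function concrete, the sketch below combines a term on the centered mean response times with a term on the accuracies. It is only an illustration under explicit assumptions: the normalization factors (sum of squared centered response times, sum of squared accuracies) and the placement of the weight `alpha` on the accuracy term are hypothetical choices, and the soft constraint on $c_\theta$ is omitted. The function name `calibration_cost` is likewise hypothetical.

```python
def calibration_cost(rt_data, rt_model, acc_data, acc_model, alpha=2.0):
    """Cost between data and model.

    rt_*  : dict mapping orientation (deg) -> mean response or decision time (s)
    acc_* : dict mapping orientation (deg) -> proportion of correct responses
    """
    thetas = sorted(rt_data)
    mean_rt_data = sum(rt_data[t] for t in thetas) / len(thetas)
    mean_rt_model = sum(rt_model[t] for t in thetas) / len(thetas)

    # Centered response times: fit up to a global constant (mean non-decision time)
    rt_term = sum(((rt_data[t] - mean_rt_data) - (rt_model[t] - mean_rt_model)) ** 2
                  for t in thetas)
    acc_term = sum((acc_data[t] - acc_model[t]) ** 2 for t in thetas)

    # ASSUMED normalization factors; the exact form is not reproduced here
    n = sum((rt_data[t] - mean_rt_data) ** 2 for t in thetas) or 1.0
    m = sum(acc_data[t] ** 2 for t in thetas) or 1.0

    # ASSUMPTION: alpha weights the accuracy term relative to the response time term
    return rt_term / n + alpha * acc_term / m


if __name__ == "__main__":
    rt = {0.2: 0.60, 0.5: 0.55, 0.8: 0.50, 1.6: 0.45}
    acc = {0.2: 0.60, 0.5: 0.75, 0.8: 0.90, 1.6: 0.98}
    decision_times = {t: v - 0.3 for t, v in rt.items()}  # constant offset only
    print(calibration_cost(rt, decision_times, acc, acc))
```

Since only centered response times enter the first term, adding the same constant to all model decision times leaves the cost unchanged, which is the property used in the text to separate the fit of the decision times from the mean non decision time.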
For each subject, we minimize this cost function with respect to the choice of $c_\theta$ and $z$, making use of a Monte Carlo Markov Chain fitting procedure, coupled to a subplex procedure Rowan [1990]. This method is adapted to handle simulation-based models with stochastic dynamics Bogacz and Cohen [2004]. Finally, $I_{CD,max}$ and $\tau_{CD}$ are fitted using a grid search algorithm, as they have less influence on the cost function. In the model, the parameter $c$ represents the stimulus ambiguity, which we expect here to be a monotonic function of the amplitude of the angle, $\theta$. When the $c_\theta$ are allowed to take independent values for each orientation, $\theta = \{0.2°, 0.5°, 0.8°, 1.6°\}$, we find that the $c_\theta$ values can be approximated by a linear or quadratic function of $\theta$, depending on the participant. We performed an AIC test Akaike [1992] between the linear and quadratic fits in order to choose which function to use for each participant. These approximations reduce the number of free parameters.
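The choice between a linear and a quadratic dependence of $c_\theta$ on $\theta$ can be illustrated with a short sketch. It assumes the least-squares (Gaussian residuals) form of the criterion, AIC = n ln(RSS/n) + 2k, which may differ from the exact test used here; the helper names and the example values are hypothetical.

```python
import math


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def polynomial_fit(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, residual sum of squares)."""
    n_coef = degree + 1
    normal = [[sum(xi ** (i + j) for xi in x) for j in range(n_coef)]
              for i in range(n_coef)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n_coef)]
    coef = solve_linear_system(normal, rhs)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    return coef, rss


def aic(rss, n_points, n_params):
    """ASSUMED least-squares form of the Akaike information criterion."""
    return n_points * math.log(max(rss, 1e-300) / n_points) + 2 * n_params


def select_model(theta, c_theta):
    """Return the name of the model with the lowest AIC and all scores."""
    scores = {}
    for name, degree in (("linear", 1), ("quadratic", 2)):
        _, rss = polynomial_fit(theta, c_theta, degree)
        scores[name] = aic(rss, len(theta), degree + 1)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    theta = [0.2, 0.5, 0.8, 1.6]
    c_values = [0.045, 0.245, 0.645, 2.555]   # hypothetical, roughly theta**2
    print(select_model(theta, c_values))
```

With only four orientations per participant, the penalty term 2k matters: the quadratic form is retained only when it reduces the residuals enough to offset its extra parameter.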
In order to obtain a confidence interval for the different parameters, we used the likelihood estimation of confidence intervals for Monte-Carlo Markov Chain methods Ionides et al. [2016]. The confidence interval on the parameters is thus the 70% confidence interval, assuming a Gaussian distribution of the cost function (see Material and Methods section). This provides an approximation of the reliability of the parameter values found. In order to assess the reliability of this method, we checked that the threshold $z$ and the stimulus strength $c_\theta$ parameters have an almost uncorrelated influence on the cost function.
The results of the calibrating procedure are summarized in Tables 6, 7
Table 7: Threshold parameter for each participant after fit of the mean accuracy and response times of the confidence block. The ranges ∆z correspond to a one-sigma deviation of the likelihood with respect to the corresponding parameter (see the Material and Methods Section).
The non-decision time is considered to be due to encoding and motor execution Luce et al. [1986]. Most model-based data analyses of response time distributions assume a constant non-decision time Ratcliff and Rouder [1998], Usher and McClelland [2001], Wong and Wang [2006], Brown and Heathcote [2008]. However, it has been shown that fitting data originating from a skewed distribution under the assumption of a non-skewed non-decision time distribution biases the parameter estimates Ratcliff [2013]. Recently, Verdonck and Tuerlinckx [2016] proposed a mathematical method to fit a non-parametric non-decision time distribution. Analyzing various experimental data with this method within the framework of drift-diffusion models, they find that strongly right-skewed non-decision time distributions are common. In this paper we make the hypothesis that the non-decision time distributions are ex-Gaussian distributions Grushka [1972], whose parameters are inferred from the data making use of the deconvolution method introduced in Verdonck and Tuerlinckx [2016] and detailed below.
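We thus consider that the non-decision time distribution, noted $\rho_{NDT}$, is described by an exponentially modified Gaussian (EMG) distribution Verdonck and Tuerlinckx [2016]:
$$\rho_{NDT}(t) = \frac{\lambda_{NDT}}{2}\,\exp\left[\frac{\lambda_{NDT}}{2}\left(2\mu_{NDT} + \lambda_{NDT}\sigma_{NDT}^2 - 2t\right)\right]\,\mathrm{erfc}\left(\frac{\mu_{NDT} + \lambda_{NDT}\sigma_{NDT}^2 - t}{\sqrt{2}\,\sigma_{NDT}}\right),$$
with erfc the complementary error function. For such a distribution, the mean non-decision time, $\overline{NDT}$, is given by
$$\overline{NDT} = \mu_{NDT} + \frac{1}{\lambda_{NDT}}.$$
As we assume no correlation between decision and non-decision times, the total (observed) response time distribution, $\rho_{data}$, can be written as the convolution of the decision time distribution, $\rho_{decision}$, with the non-decision time distribution, $\rho_{NDT}$:
$$\rho_{data} = \rho_{decision} * \rho_{NDT} \qquad (13)$$
(with $*$ standing for the convolution operation). If the decision time distribution is Gaussian, the resulting total distribution is an EMG distribution Grushka [1972].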
Using maximum likelihood estimation, for each subject we fit the empirical response time distribution (all orientations together) by an EMG distribution with parameters $\mu_{data}$, $\lambda_{data}$, $\sigma_{data}$ (and we thus have $RT_{data} = \mu_{data} + 1/\lambda_{data}$).
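For reference, the standard EMG density and the relation between its parameters and its mean can be checked numerically with a few lines of code. This is a minimal sketch: the function names and the parameter values are hypothetical, and the maximum likelihood fit itself is not reproduced.

```python
import math


def emg_pdf(t, mu, lam, sigma):
    """Exponentially modified Gaussian density (Gaussian convolved with an exponential)."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * t)
    argument = (mu + lam * sigma ** 2 - t) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(exponent) * math.erfc(argument)


def trapezoid_moments(pdf, t_min, t_max, n_steps=20000):
    """Return (integral of pdf, first moment) by the trapezoidal rule."""
    h = (t_max - t_min) / n_steps
    total = 0.0
    first_moment = 0.0
    for k in range(n_steps + 1):
        t = t_min + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        p = pdf(t)
        total += weight * p * h
        first_moment += weight * t * p * h
    return total, first_moment


if __name__ == "__main__":
    mu, lam, sigma = 0.3, 10.0, 0.05     # hypothetical values (s, 1/s, s)
    total, mean = trapezoid_moments(lambda t: emg_pdf(t, mu, lam, sigma), -0.5, 4.0)
    print(total, mean, mu + 1.0 / lam)
```

The numerical mean agrees with $\mu + 1/\lambda$, the relation used above for the mean response time.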
As for the model, we find that the decision time distribution of the attractor neural network is well fitted by a Gaussian distribution with parameters $RT_{network}$, $\sigma_{network}$. We thus model the decision time distribution by the one provided by the attractor network whose parameters have been calibrated as explained above. Hence we identify the mean and standard deviation of the decision time distribution with those of the network: $\mu_{decision} = RT_{network}$, $\sigma_{decision} = \sigma_{network}$.

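Taking the characteristic function of both sides of Equation 13, we get:
$$\left(1 - \frac{i\omega}{\lambda_{data}}\right)^{-1}\exp\left(i\mu_{data}\omega - \frac{\sigma_{data}^2\omega^2}{2}\right) = \exp\left(i\mu_{decision}\omega - \frac{\sigma_{decision}^2\omega^2}{2}\right)\left(1 - \frac{i\omega}{\lambda_{NDT}}\right)^{-1}\exp\left(i\mu_{NDT}\omega - \frac{\sigma_{NDT}^2\omega^2}{2}\right).$$
We can then identify the terms on both sides of this equation, which, given the use for the decision parameters of the ones of the attractor network, gives the equations
$$\lambda_{NDT} = \lambda_{data}, \qquad \mu_{NDT} = \mu_{data} - RT_{network}$$
and
$$\sigma_{NDT}^2 = \sigma_{data}^2 - \sigma_{network}^2,$$
from which we can compute the non-decision time distribution parameters, $\lambda_{NDT}$, $\mu_{NDT}$, $\sigma_{NDT}$.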
We present in Fig. 3 the fits of the response time distributions and the inferred non decision time distributions.
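The deconvolution step amounts to simple arithmetic once the EMG fit of the data and the Gaussian fit of the network decision times are available. The sketch below assumes exactly that setting (Gaussian decision times, EMG response times, independence of the two components), in which the Gaussian means and variances add under convolution while the exponential rate is unchanged. The function name and the numerical values are hypothetical.

```python
import math


def ndt_parameters(mu_data, lam_data, sigma_data, rt_network, sigma_network):
    """Non-decision time EMG parameters (mu, lambda, sigma).

    Assumes a Gaussian decision time distribution with mean rt_network and
    standard deviation sigma_network, independent of the non-decision time.
    """
    variance = sigma_data ** 2 - sigma_network ** 2
    if variance <= 0.0:
        raise ValueError("network decision times are more variable than the data")
    mu_ndt = mu_data - rt_network        # Gaussian means add under convolution
    lam_ndt = lam_data                   # the exponential rate is unchanged
    sigma_ndt = math.sqrt(variance)      # Gaussian variances add under convolution
    return mu_ndt, lam_ndt, sigma_ndt


if __name__ == "__main__":
    # Hypothetical fitted values, in seconds (lambda in 1/s)
    mu_ndt, lam_ndt, sigma_ndt = ndt_parameters(0.60, 8.0, 0.13, 0.35, 0.05)
    mean_ndt = mu_ndt + 1.0 / lam_ndt
    print(mu_ndt, lam_ndt, sigma_ndt, mean_ndt)
```

The explicit check on the variance is a design choice for the example: if the simulated decision times were more dispersed than the observed response times, no valid Gaussian component could be attributed to the non-decision time.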

OTHER MODELS USED FOR COMPARISON
We compare the attractor neural network to the Usher and McClelland model Usher and McClelland [2001]. The equations of the model are the following:
$$\tau\, dx_1 = -k x_1\, dt - \beta f(x_2)\, dt + I_1\, dt + \sigma \mu_1(t),$$
$$\tau\, dx_2 = -k x_2\, dt - \beta f(x_1)\, dt + I_2\, dt + \sigma \mu_2(t),$$
with $\mu_i(t)$ a white-noise process and $I_i$ the input current to the system. The external input is defined as $I_i = 0.5 \pm c_\theta$, with $c_\theta$ the strength per angle as in the attractor neural network. $\sigma = 0.4$ denotes the strength of the noise, $k$ the relaxation strength, $\tau = 0.1$ the relaxation time and $\beta$ the strength of the inhibition. Finally, the function $f$ is a sigmoidal function of gain $G = 0.4$ and half-activity offset $d = 0.5$, $f(x_i) = 1/[1 + \exp(-G(x_i - d))]$. The dynamics runs until a threshold $z$ is reached by one of the two units. It should be noted that, despite the non-linearity, the Usher-McClelland model is closer to drift-diffusion models than to biophysical attractor models (because the only non-linearity is in the interaction between units). Reductions to one-dimensional drift-diffusion models can be made in various ranges of parameters Bogacz et al. [2006]. In order to fit this model to the experiments we apply the same procedure as previously.
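A minimal simulation sketch of this comparison model is given below, using an Euler-Maruyama scheme. The values of $\sigma$, $\tau$, $G$ and $d$ are those quoted above; the default values of `k`, `beta` and `z`, the initial condition, the time step and the interpretation of the noise term as a Gaussian increment of variance `dt` are assumptions made for the example, and the function names are hypothetical.

```python
import math
import random

SIGMA, TAU = 0.4, 0.1        # noise strength and relaxation time (from the text)
GAIN, OFFSET = 0.4, 0.5      # sigmoid gain G and half-activity offset d (from the text)


def sigmoid(x):
    """Sigmoidal interaction function f."""
    return 1.0 / (1.0 + math.exp(-GAIN * (x - OFFSET)))


def simulate_um_trial(c_theta, k=1.0, beta=1.0, z=1.5,
                      dt=0.001, t_max=5.0, rng=None):
    """One trial of the two-unit leaky competing accumulator.

    k, beta and z are HYPOTHETICAL defaults; in the text they are fitted.
    Returns None if no unit reaches the threshold before t_max.
    """
    rng = rng or random.Random()
    x1 = x2 = 0.0                                  # assumed initial condition
    i1, i2 = 0.5 + c_theta, 0.5 - c_theta
    n_steps = int(t_max / dt)
    for step in range(1, n_steps + 1):
        # ASSUMPTION: white noise discretized as sigma * sqrt(dt) * N(0, 1)
        noise1 = SIGMA * math.sqrt(dt) * rng.gauss(0, 1)
        noise2 = SIGMA * math.sqrt(dt) * rng.gauss(0, 1)
        dx1 = ((-k * x1 - beta * sigmoid(x2) + i1) * dt + noise1) / TAU
        dx2 = ((-k * x2 - beta * sigmoid(x1) + i2) * dt + noise2) / TAU
        x1 += dx1
        x2 += dx2
        if max(x1, x2) >= z:
            return {"choice": 1 if x1 > x2 else 2,
                    "decision_time": step * dt,
                    "balance": abs(x1 - x2)}
    return None


if __name__ == "__main__":
    print(simulate_um_trial(0.2, rng=random.Random(1)))
```

Both units are updated from the values at the previous time step, so the mutual inhibition enters symmetrically, as in the equations above.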