Confidence through consensus: a neural mechanism for uncertainty monitoring

Models that integrate sensory evidence to a threshold can explain task accuracy, response times and confidence, yet it is still unclear how confidence is encoded in the brain. Classic models assume that confidence is encoded in some form of balance between the evidence integrated in favor of and against the selected option. However, recent experiments that measure the sensory evidence's influence on choice and confidence contradict these classic models. We propose that the decision is taken by many loosely coupled modules, each of which represents a stochastic sample of the sensory evidence integral. Confidence is then encoded in the dispersion between modules. We show that our proposal can account for the well-established relations between confidence, stimulus discriminability and reaction times, as well as the influence of sensory fluctuations on choice and confidence.


A Variance of the state of a diffusion process depends on time
The temporal evolution of variables corrupted by noise is described by stochastic dynamical equations of the form

$$\frac{dx}{dt} = a(x,t) + b(x,t)\,\xi(t),$$

where a and b are deterministic functions and ξ is a stochastic variable. Depending on the distribution of the stochastic force (also called noise), the probability density p(x,t|x_0,t_0) evolves differently. In particular, if ξ(t) is normally distributed with independent samples at different times, the probability density evolves following the Fokker-Planck equation

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left[A(x,t)\,p\right] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\left[B(x,t)\,p\right],$$

where A controls drift and B diffusion. The case without drift and with constant diffusion is simply a Wiener process. The probability density p(x,t|x_0,t_0) then takes the form of a Gaussian distribution with variance proportional to t − t_0. Adding a constant drift does not change the variance's evolution (Fig. S1). If A or B depend on x, the probability density follows other dynamics such as Ornstein-Uhlenbeck or non-linear diffusion. Both of these present correlations between elapsed time and the dispersion of the probability distribution. For a detailed deduction refer to 1, chapter 3.

Figure S2. A. [...] non selective (white) and inhibitory (red). The progression shows the dimensionally reduced meanfield model. B. Firing rate threshold crossing detection mechanism from 7. Populations A and B compete; when one crosses a threshold of activity, the first layer of interneurons is activated. This first layer inhibits the second layer and thus releases population DA or DB in order for it to become active. Note that in 7 DA and DB receive direct connections from the competing layer, whilst we propose another mechanism for DA and DB to become active once A and B surpass the threshold. The interneuron populations that mediate the detection are proposed to be in the Caudate Nucleus and the Superior Colliculus in the Basal Ganglia.
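The statement in section A that the variance of a constant-diffusion process grows in proportion to elapsed time, with or without a constant drift, can be checked numerically. The following is a minimal sketch, not the authors' code: the function name, the drift value 2.0, the diffusion value 1.0 and the step sizes are all illustrative assumptions.

```python
import numpy as np

def simulate_diffusion(drift, diffusion, n_paths=20000, n_steps=200, dt=0.005, seed=0):
    """Paths of dx = drift*dt + sqrt(diffusion)*dW starting at x0 = 0.

    For constant drift and diffusion the Euler-Maruyama step is exact.
    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each Wiener increment is Gaussian with zero mean and variance dt.
    noise = np.sqrt(diffusion * dt) * rng.standard_normal((n_paths, n_steps))
    return np.cumsum(drift * dt + noise, axis=1)  # shape (n_paths, n_steps)

dt, n_steps = 0.005, 200
t = dt * np.arange(1, n_steps + 1)

# Variance across paths at every time step, without and with constant drift.
var_no_drift = simulate_diffusion(0.0, 1.0, seed=0).var(axis=0)
var_with_drift = simulate_diffusion(2.0, 1.0, seed=1).var(axis=0)

# Least-squares slope of variance against time; both should be close to
# the diffusion coefficient (1.0 in this sketch).
slope_no_drift = np.polyfit(t, var_no_drift, 1)[0]
slope_with_drift = np.polyfit(t, var_with_drift, 1)[0]
print(slope_no_drift, slope_with_drift)
```

Both slopes estimate the diffusion coefficient, so the drift shifts the mean of the distribution but leaves the growth of its dispersion untouched.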

B Neural implementation
We base our neural decision model on an extensively studied model of perceptual decision making in which four populations of neurons interact 2. Two pools of pyramidal neurons are sensitive to the sensory input of the decision process, while the others, one pool of pyramidal neurons and one pool of interneurons, are non-selective. All pools are fully connected with each other through AMPA, NMDA and GABA_A mediated synapses (Fig. S2.A). The outward connections of the pyramidal neurons target AMPA and NMDA receptors, while those of the interneurons target GABA_A receptors. Furthermore, all pools of neurons receive background Poisson spike trains that target AMPA receptors and yield a baseline of synaptic input current. The slow NMDA channel dynamics, along with balanced inhibition and positive recurrent connections, lead the whole system to a winner-take-all state where one of the selective pools of neurons becomes active and the other is inhibited. The network of single neurons and their synapses can be reduced to an effective two-dimensional meanfield model where the fractions of open NMDA channels drive the decision process 3,4 (Fig. S2.A). The complete deduction can be found in 3. Note that the four populations with three neurotransmitter channel dynamics were reduced to two populations with a single effective NMDA channel opening dynamic. Thus these populations are a mixture of pyramidal neurons and interneurons, and as such follow an altered input-output relation (i.e. output firing rate as a function of input current). Furthermore, the background synaptic input, which is a vital part of the model, is included in the meanfield as a dynamic Ornstein-Uhlenbeck process 5 plus a fixed value. In the numerical simulations, an exact update method is used to sample the Ornstein-Uhlenbeck noise (refer to Eq. E-12b in Gillespie 1992 6). If, after the stimulus presentation, population 1 has an increased firing rate while population 2 has baseline activity, then option 1 is selected.
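An exact update for Ornstein-Uhlenbeck noise can be sketched as follows. This uses the standard exact one-step update for a zero-mean Ornstein-Uhlenbeck process; whether it coincides term by term with the cited equation is an assumption. The function name and the values of the time constant (2 ms) and stationary standard deviation (0.02 nA) are illustrative and not taken from the text.

```python
import numpy as np

def ou_exact_path(n_steps, dt, tau, sigma, x0=0.0, seed=0):
    """Zero-mean Ornstein-Uhlenbeck path sampled with the exact update.

    tau   : correlation time (same units as dt)
    sigma : stationary standard deviation
    The update is exact for any dt, unlike an Euler step.
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-dt / tau)                 # deterministic relaxation factor
    kick = sigma * np.sqrt(1.0 - decay ** 2)  # std of the random increment
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = x[i] * decay + kick * rng.standard_normal()
    return x

# Illustrative (assumed) values: tau = 2 ms, sigma = 0.02 nA.
fine = ou_exact_path(200_000, dt=0.0005, tau=0.002, sigma=0.02, seed=0)
coarse = ou_exact_path(50_000, dt=0.02, tau=0.002, sigma=0.02, seed=1)
print(fine.std(), coarse.std())
```

Because the update is exact, the stationary standard deviation is recovered even when the time step is ten times longer than the correlation time, which is the practical reason for preferring it over an Euler scheme.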
The firing rate of each population can be interpreted as the accumulated evidence in favor of each option, and the selection takes place after an activity threshold is crossed. The threshold crossing can be signaled by a separate bursting population that only fires once one of the competing populations surpasses a certain level of activity. Lo and Wang 7 proposed a possible implementation, schematically shown in Fig. S2.B. In Lo's implementation, populations A and B integrate sensory evidence and compete with each other. If, for example, A surpasses a threshold of activity, it activates population A1. A1 in turn inhibits A2, which is normally active due to background input. As A2 is silenced, inhibition on DA is released, allowing the direct connection from A to produce a burst of activity in DA. We use a simplified implementation, where DA and DB become active due to background input, and not by direct connections from the competition layer, once the inhibition is released.

Figure S3. Extended neuronal implementation with the competition (A and B), threshold crossing detection (L and U) and counting layers (D, H and C). All modules are interconnected although the connections are not depicted here. We propose that all populations receive different levels of background synaptic input. This background allows the second layer of the modules to have high baseline activity, and also allows the third layer of each module to have high activity once the inhibition from the second layer disappears.

We propose an ensemble network model of N = 100 modules of these decision pools that are weakly connected with each other. A crucial aspect of the model is how it determines the selected option. In the single-module case, each population's firing rate could be interpreted as the evidence in favor of an alternative. In the multi-module case, the mean across modules $\langle \rho_i^k(t) \rangle_k$ could serve as the evidence in favor of option i. It is presently unclear how this computation could take place, but there is evidence supporting normalization as a possible neural computation 8. Alternatively, the decision could be signaled once more than half the modules have crossed a threshold of activity. The simulations shown in the main text use the latter criterion, but both criteria yield almost the same results.
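The majority criterion described above can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: the array layout, the function name and the toy firing rates are assumptions, and the 15 Hz default is the decision threshold quoted later in the text.

```python
import numpy as np

def majority_decision(rates, threshold=15.0):
    """Return (choice, time_index) for the first time step at which more than
    half of the modules have one population at or above threshold.

    rates : array of shape (n_modules, 2, n_times), firing rates in Hz
            (hypothetical layout: axis 1 indexes the two options).
    Returns (None, None) if no majority is ever reached.
    """
    n_modules = rates.shape[0]
    counts = (rates >= threshold).sum(axis=0)   # shape (2, n_times)
    majority = counts > n_modules / 2           # strict majority per option
    hit_times = np.flatnonzero(majority.any(axis=0))
    if hit_times.size == 0:
        return None, None
    t = int(hit_times[0])
    return int(np.argmax(counts[:, t])), t

# Toy example: 5 modules, option 0 ramps up at 3 Hz per step with a
# module-specific offset, option 1 stays at 1 Hz.
n_modules, n_times = 5, 10
rates = np.ones((n_modules, 2, n_times))
for k in range(n_modules):
    rates[k, 0, :] = 3.0 * np.arange(n_times) + k

choice, t_decision = majority_decision(rates)
print(choice, t_decision)
```

In the toy example only two of the five modules have crossed 15 Hz at step 4, so the decision is signaled at step 5, when all five have.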
The neural implementation is divided into:
1. The competition layer
2. The threshold crossing detection layer
3. The counting layer
The competition layer is made up of the neural populations that receive sensory input and compete in an attractor neural network (ANN) fashion (Fig. 1.A and Fig. S2.A). This layer is the heart of the entire decision model. The activity of each population can be read artificially (i.e. without a neural mechanism) in order to detect the selected option and the inter-module dispersion, σ_dv. We go further by addressing the question of how the decision could be detected and σ_dv be read out with neurons. This is done by extending the competition layer with the threshold crossing detection and the counting layers (Fig. S3).
The dynamics that the competition layer obeys are detailed in Table 1.

D Neuron and Synapse Model
Name: Meanfield competition approximation. Type: Firing rate, NMDA channel.
Name: Interneuron and pyramidal meanfield approximation. Type: Firing rate, instantaneous GABA channel if [...] pyramidal neuron. Dynamic equations: $I_i^{\mathrm{ext}}(t) = I_{\mathrm{Background}} + I_{i,\mathrm{noise}}^{k}(t)$

The threshold crossing detection and counting layers follow similar dynamics. The populations in each layer are meanfield approximations of groups of integrate-and-fire neurons. However, the neurons inside each population are either pyramidal neurons or interneurons, and not mixtures as in the competition layer. Thus, the input-output relation and the dynamics of the synapses change.
The detailed description can be found in Tables 2 and 3. The main synaptic differences are that: 1. The interneurons only target GABA receptors. The GABA channel dynamics are approximated as instantaneous, thus the postsynaptic current is proportional to the interneuron's firing rate.
2. The counting layer's excitatory connections are mediated only by AMPA. This implies that, as with the interneurons, the postsynaptic current is proportional to the presynaptic firing rate.
The proposed threshold crossing detection layer is based on the mechanism proposed by Lo and Wang 7 to detect when a neural population surpasses a given firing rate value. It relies on the widely used mechanism of disinhibition 9-13. Each decision module in Fig. 1.A is connected to a module that detects when one of the two decision pools crosses the threshold for the decision, as shown in Fig. S3. The threshold module is composed of three layers and each layer has two branches, L and U. Branch L is needed to signal the crossing of the lower decision threshold λ, while branch U is needed to detect whether the module is in a given interval near the threshold or has already crossed the upper threshold λ + Δλ. The decision threshold mechanism works as follows. The second and third layers receive background input that would drive them to spike at a constant firing rate. However, the second layer constantly inhibits the third layer, so the third layer is silent. When one of the two decision pools (e.g. A) increases its activity due to the decision process and crosses the decision threshold (15 Hz), the corresponding pool from layer one (LA1) of the threshold module is activated. This pool, in turn, inhibits its corresponding layer 2 population (LA2), thus releasing inhibition from the corresponding third layer population (LA3). Hence LA3 becomes active due to the background input if A crosses the decision threshold. This implies that the third layer works as a binary indicator: it is firing ("on") if the competition population has surpassed a certain threshold and is silent ("off") otherwise. The upper threshold mechanism (branch U) works in the same way, but its threshold is set to a slightly higher value (20 Hz). The global pools DA, DB, HA and HB receive input from all modules, thereby counting how many modules have crossed the lower (DA and DB) and the upper threshold (HA and HB). The final commitment to a choice is based on the activity of pools DA and DB. When DA or DB surpasses a level of activity that corresponds to more than half the modules having committed to the same decision (97.68 Hz for the parameters we used), the decision is taken.
In order to form a confidence judgment, the number of modules near the threshold (FMC) is estimated by the confidence pools (CA and CB), which calculate the difference between the activity of LA and HA for decision A (or LB and HB for decision B). The firing rate of CA, if option A is selected, is used as input to a sigmoid that determines the probability of high confidence, P_High. The parameters of the sigmoid control the bias and the slope of the transition from low to high confidence, and were determined by fitting the subjects' accuracy split by confidence.
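The counting and confidence readout can be illustrated with an artificial (non-neural) version. The sketch below counts modules directly instead of using firing rates of the counting pools, and it assumes a logistic form for the sigmoid; the function names and the bias and slope values are hypothetical placeholders, not the fitted parameters.

```python
import numpy as np

LOWER_HZ = 15.0  # lower decision threshold (lambda) quoted in the text
UPPER_HZ = 20.0  # upper threshold (lambda + delta lambda) quoted in the text

def count_near_threshold(winner_rates):
    """Count modules whose winning population crossed the lower and the upper
    threshold; their difference is the number of modules near the threshold."""
    rates = np.asarray(winner_rates, dtype=float)
    n_lower = int(np.sum(rates >= LOWER_HZ))
    n_upper = int(np.sum(rates >= UPPER_HZ))
    return n_lower, n_upper, n_lower - n_upper

def p_high_confidence(near_count, bias=50.0, slope=10.0):
    """Assumed logistic sigmoid; bias and slope are hypothetical values."""
    return 1.0 / (1.0 + np.exp(-(near_count - bias) / slope))

# Toy firing rates (Hz) of the winning population in six modules.
n_lower, n_upper, near = count_near_threshold([10, 16, 18, 25, 30, 19])
print(n_lower, n_upper, near, p_high_confidence(near))
```

In the toy example five modules have crossed 15 Hz and two have crossed 20 Hz, so three modules sit in the interval between the two thresholds.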

B.1 Description of model dynamics
An example of the network's dynamics during a single trial is shown in Fig. S4. During the shown trial, the target's mean luminance was set to 52 cd/m² and the distractor's to 50 cd/m². The A populations were sensitive to the target patch while the B populations were sensitive to the distractor. During the presented simulation, IC = 0. The competing layer integrates sensory evidence for each option. A single module's pair of competing A_k and B_k populations is shown (Fig. S4.A). These competing populations drive the firing rates of the first layers of the threshold detection portion of the network (Fig. S4.B). These, in turn, release the inhibition placed on the third layer populations LA3, LB3, UA3 and UB3 (Fig. S4.C). This layer works as a binary indicator. The counting layer simply counts the number of active third layer populations (Fig. S4.D). Populations LA1, LB1, UA1 and UB1 receive synaptic input from competing neurons in each module, and become "active" (fire enough spikes to inhibit populations LA2, LB2, UA2 or UB2) after receiving more than I_th current. The value of I_th is chosen so that the counting is correct.
When population A has a constant firing rate, $\rho_A$, its NMDA gating variable is at a steady state, $s_A = (1 + 1/(\rho_A \gamma \tau_{NMDA}))^{-1}$. The synaptic current sent to populations LA1 and UA1 will be equal to $I_{L/U} = w_{L/U}\, s_A$, where $w_{L/U}$ is the connection weight to populations LA1 or UA1. Assuming a value of $I_{th}$, the connection weights can be calculated in order to detect the crossing at a given firing rate value $\rho_A$: $w_L = I_{th}(1 + 1/(15\,\mathrm{Hz}\,\gamma \tau_{NMDA}))$ and $w_U = I_{th}(1 + 1/(20\,\mathrm{Hz}\,\gamma \tau_{NMDA}))$. The best value for $I_{th}$ is the one that yields the best linear dependence between the number of competing populations that have crossed the threshold and the firing rates of DA, DB, HA and HB. However, synaptic noise in the threshold crossing detection and counting layers induces variability that complicates measuring the best $I_{th}$ value. In order to measure $I_{th}$, we perform simulations fixing the noise to zero in the threshold crossing detection and counting layers (Fig. S5). The observed best linear dependence occurs at $I_{th}$ = 0.29 nA. The resulting network yields the same behavior as the model with artificial counting.
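As a worked example of the weight calculation, the sketch below evaluates the steady-state gating variable and the two detection weights, reading the second weight as the one for the upper branch U. The values of γ and τ_NMDA are not given in this section; the ones used here (0.641 and 100 ms) are assumptions chosen only for illustration, as are the function names.

```python
GAMMA = 0.641    # assumed value, not stated in this section
TAU_NMDA = 0.1   # seconds; assumed value, not stated in this section
I_TH = 0.29      # nA, threshold current reported in the text

def steady_state_gating(rate_hz, gamma=GAMMA, tau=TAU_NMDA):
    """Steady-state NMDA gating variable s = (1 + 1/(rho*gamma*tau))^-1."""
    return 1.0 / (1.0 + 1.0 / (rate_hz * gamma * tau))

def detection_weight(rate_hz, i_th=I_TH, gamma=GAMMA, tau=TAU_NMDA):
    """Weight w such that w * s(rate_hz) equals the threshold current i_th."""
    return i_th * (1.0 + 1.0 / (rate_hz * gamma * tau))

w_lower = detection_weight(15.0)  # branch L, detects crossing of 15 Hz
w_upper = detection_weight(20.0)  # branch U, detects crossing of 20 Hz

# The current delivered to LA1 reaches I_TH exactly when A fires at 15 Hz,
# and stays below it for lower rates.
current_at_threshold = w_lower * steady_state_gating(15.0)
current_below = w_lower * steady_state_gating(10.0)
print(w_lower, w_upper, current_at_threshold, current_below)
```

Because the gating variable grows with firing rate, the upper branch needs a smaller weight than the lower branch to reach the same threshold current at a higher rate.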

C Detailed correlations within discriminability
In Figures 2 and 3 of the main text we show correlations between our model's RT and σ_dv, and between σ_dv and FMC. These correlations were shown for averaged RT, σ_dv and FMC values. The averages were computed separately for each discriminability (d_i) between the target and distractor patches, each over 2000 simulated trials. The correlations between RT and σ_dv, and between σ_dv and FMC, also hold within the same discriminability (Fig. S6 and S7 respectively). Even for the same discriminability, our model's dispersion is strongly correlated with the RT, and FMC is a valid estimate of this dispersion.

D Response time distribution overlap
It is important to clarify that our model does not explicitly rely on response time to compute confidence. Confidence is read out from an estimate of dispersion, which is only correlated with RT. In fact, if subjects relied only on RT to determine confidence, the RT distributions would not have a large overlap. However, we observe, both for simulations and for subjects' data, that RT distributions split by confidence overlap strongly, and that low confidence reports are usually associated with longer RTs (Fig. S8). The simulation's RT distribution differs markedly from the subjects' distribution, which is expected because we force the model to decide in less than 1 s. This was done because, in this work, we are not interested in reproducing the detailed RT distribution. For our analyses it was sufficient to look only at the task accuracy and the influence of sensory fluctuations. To account for the RT distribution, which is by itself a very difficult task, the merit function described in section "Data Fitting" of the main text should be adapted to include the RT's goodness of fit, and the background input or network connection weights would probably need to be changed. We deemed this too computationally expensive and beyond our intended analysis.

E Fixed delay task
The model was only tested on reaction time trials, where it had to decide when it was ready and could sample the stimuli for as long as it needed. This was because the behavioral data used to fit the psychophysical kernels was taken with reaction time trials. However, many experiments use the fixed delay paradigm, where the stimulus is presented for a fixed time and the subject must then respond 14,15. Zylberberg et al. 16 also studied a task of this nature (random dot motion) and found that the decision and confidence kernels were qualitatively similar to those measured with the reaction time luminance discrimination task. However, they only tested one stimulus duration, which was relatively long. When the stimuli are presented for very brief periods of time, subjective reports tend to be of low confidence. As the stimulus duration increases, so does the rate of high confidence reports 17-20. Our model is also able to capture this fact (Fig. S9). However, care must be taken when forcing our proposed network to decide upon a limited amount of evidence (short stimulus durations). If the sensory input disappears too soon, the network may not yet have committed to a choice. This implies that some of the modules may even revert to a low firing rate steady state and never select amongst the alternatives. We propose that once stimulation vanishes, if the network has not committed to a choice, it queries the modules (i.e. forces them to cast a vote). We propose this is accomplished by increasing the background task non-specific synaptic input (I_Background) to a level that forces the ANNs to choose one of the two alternatives. We test our model's ability to explain the dependence of confidence on stimulus duration by presenting it with two fluctuating sensory inputs. Both inputs are resampled from a normal distribution every 40 ms. The target mean luminance is 52 cd/m², the distractor mean is 50 cd/m² and the standard deviation is 5 cd/m².
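The fluctuating stimuli described above can be generated as piecewise-constant traces. The following is a minimal sketch under stated assumptions: the function name, the 1 ms simulation step and the 400 ms duration are illustrative choices, while the means, the standard deviation and the 40 ms resampling period come from the text.

```python
import numpy as np

def fluctuating_luminance(mean, duration_ms, sd=5.0, resample_ms=40, dt_ms=1, rng=None):
    """Piecewise-constant luminance trace (cd/m^2): a new value is drawn from a
    normal distribution every resample_ms and held until the next draw.

    dt_ms is an assumed simulation step; durations are integers in ms.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_steps = duration_ms // dt_ms
    steps_per_frame = resample_ms // dt_ms
    n_frames = -(-n_steps // steps_per_frame)  # ceiling division
    frame_values = rng.normal(mean, sd, size=n_frames)
    # Hold each drawn value for one frame, then trim to the requested length.
    return np.repeat(frame_values, steps_per_frame)[:n_steps]

rng = np.random.default_rng(0)
target = fluctuating_luminance(52.0, 400, rng=rng)      # target patch
distractor = fluctuating_luminance(50.0, 400, rng=rng)  # distractor patch
print(target.shape, distractor.shape)
```

Varying `duration_ms` reproduces the manipulation of stimulus duration; the forced choice after stimulus offset is a property of the network and is not part of this sketch.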
The stimuli are presented at t = 1 s and last for different durations. After the stimuli are turned off, the network is forced to select an option by increasing the mean background synaptic input to force a winner-take-all situation (I_Background = 0.3455 nA). At the decision time, σ_dv and FMC are measured (Fig. S9.A and B). The FMC increases (σ_dv decreases) with stimulus duration and thus the probability of high confidence reports also increases. Accuracy also increases with stimulus duration until it reaches a stable value (Fig. S9.C). This test is sufficient to say that when the environment controls stimulus duration, it modulates our model's confidence. However, if the model were aware of the length of the stimulation beforehand, it could shift its criterion to favor speed or accuracy. This implies that the model could adopt a level of I_Background that yields slow or fast responses, thus tuning its speed/accuracy tradeoff 21,22. We simulate 2000 trials with the same stimulation protocol as above but without turning off the sensory inputs, and for different I_Background values. We observe, as expected, that high I_Background yields faster and less accurate responses, while low I_Background produces slower, more accurate decisions (Fig. S10.A and B). However, σ_dv and FMC have a non-monotonic dependence on I_Background (Fig. S10.C and D). Experimental observations show that subjects respond with greater confidence when favoring accurate responses 17,19. To reproduce this, σ_dv should monotonically increase with I_Background, which is clearly not the case. It is worth noting that for I_Background values above 0.3255 nA (the value used for all simulations), σ_dv (FMC) is monotonically increasing (decreasing), which could be a sign that our model can account for the speed/accuracy tradeoff only in a limited range of background inputs.
The non-monotonic dependence of σ_dv on I_Background appears to contradict experimental observations. However, our model is constructed assuming that the balance of speed and accuracy, i.e. the decision policy, is constant. Thus we do not study the problem of confidence calibration 23. Calibration is the process through which a summary statistic (in our case σ_dv or FMC) is transformed into a certainty level or a confidence report (which we assume occurs in a separate layer). This problem requires feedback connections and parameter tuning to actively learn the proper calibration for a variable decision policy. It also implies that the parameters that determine the probability of high confidence should change with I_Background. This interesting problem is well beyond what we studied in the present work.