# Regulation of evidence accumulation by pupil-linked arousal processes

Nature Human Behaviour (2019)

## Abstract

Effective decision-making requires integrating evidence over time. For simple perceptual decisions, previous work suggests that humans and animals can integrate evidence over time, but not optimally. This suboptimality could arise from sources including neuronal noise, weighting evidence unequally over time (that is, the ‘integration kernel’), previous trial effects and an overall bias. Here, using an auditory evidence accumulation task in humans, we report that people exhibit all four suboptimalities, some of which covary across the population. Pupillometry shows that only noise and the integration kernel are related to the change in pupil response. Moreover, these two different suboptimalities were related to different aspects of the pupil signal, with the individual differences in pupil response associated with individual differences in the integration kernel, while trial-by-trial fluctuations in pupil response were associated with trial-by-trial fluctuations in noise. These results suggest that different suboptimalities relate to distinct pupil-linked processes, possibly related to tonic and phasic norepinephrine activity.

## Main

The ability to integrate evidence over time is a crucial component of perceptual decision-making. This is true whether we are integrating visual information from saccade to saccade as we scan a scene, or integrating auditory information from word to word as we listen to someone talk. In recent years, much work has been devoted to understanding how humans and animals perform evidence integration over short time scales (on the order of one second) in simple perceptual tasks1,2,3,4. In a classic paradigm from this literature, known as the random dot motion task, participants are presented with a movie of randomly moving dots that have a weak tendency to drift in a particular direction (for example, left or right) and they must decide which way the dots are drifting5. The optimal strategy in this task is to count (that is, integrate) the number of dots moving to the left and right over the time course of the stimulus and choose the side that had the most dots moving in that direction. Amazingly, this optimal strategy can account for many of the qualitative properties of human and animal behaviour, and neural correlates of integrated evidence can be found in several areas of the brain2,3,6,7.

Despite the ability of the optimal model to qualitatively account for a number of experimental findings, the quantitative performance of even highly trained humans and animals is suboptimal1,8. This suboptimality is thought to arise from at least four different sources: (1) neuronal noise; (2) unequal weighting of evidence over time; (3) order effects from previous trials; and (4) overall side biases.

The first source of suboptimality is neuronal noise. While the exact cause of neuronal noise is subject to debate8,9,10,11, it is thought that variability in neural firing impacts perceptual decision-making in one of two ways. First, noisy sensory information reduces the quality of the evidence going into the accumulator in the first place1,12,13,14. Second, noisy action selection causes mistakes to be made even after the integration process is complete15,16,17.

The second source of suboptimality comes from the unequal weighting of evidence over time, which we call here the ‘integration kernel’. In particular, while the optimal kernel in most perceptual decision-making tasks is flat (that is, all information is weighed equally over time), a number of studies have shown that humans and animals can have quite suboptimal kernels. For example, in the random dot motion task, monkeys exhibit a ‘primacy’ kernel, putting more weight on the early parts of the stimulus relative to the later parts of the stimulus4. Conversely, in a slightly different integration task, humans exhibit the opposite ‘recency’ kernel, weighing later information more than early information18,19. Finally, in some experiments, this second source of suboptimality appears to be absent, with a ‘flat’ integration kernel being found in both rats and highly trained humans1.

The third source of suboptimality reflects the tendency to let previous decisions and outcomes interfere with the present choice. Thus, when making multiple perceptual decisions, the current decision is influenced by the choice we just made; for example, by repeating an action when it is rewarded and choosing something else when it is not (a reinforcement learning effect8,15,16,20), or simply repeating a choice regardless of the outcome associated with it (a choice kernel effect8,20,21). Such sequential dependence can be advantageous when there are temporal correlations between trials, as is the case in many reinforcement learning tasks15,16, but is suboptimal in most perceptual decision-making tasks when each trial is independent of the past8,20,22,23.

Finally, the fourth suboptimality is an overall side bias where both humans and animals develop a preference for one option (for example, left) even though that leads to more errors overall20.

Evidence from a number of studies suggests that pupil-linked arousal processes, putatively driven by the locus coeruleus norepinephrine system24,25,26,27,28, are well placed to modulate all four of these different sources of suboptimality. With regard to noise, increased pupil response has been associated with a number of different cognitive processes, such as effort, arousal, mood, attention and memory, all of which might influence noise26,29. In the specific case of perceptual decisions, previous work suggests that pupil-linked arousal systems modulate the overall neuronal noise (that is, the signal-to-noise ratio (SNR) of sensory cues) in the evidence accumulation process30,31. With regard to kernel and side bias, pupil response has been associated with a change in the ‘gain’ of other neural systems, which in turn is thought to modulate the strength of internal and external cognitive biases on decision-making26,28,32,33,34,35. In addition, recent empirical and theoretical work has suggested that norepinephrine (a putative driver of pupil dilation) modulates the urgency of decision-making in a sequential sampling task, such that the higher the norepinephrine level, the more urgently a decision is made36,37,38. Taken together, these studies point to the possibility that pupil-linked norepinephrine systems modulate the integration kernel and side bias by changing the strength of pre-existing biases during the integration process. Finally, with regard to sequential effects, pupil changes have been related to how humans integrate relevant information from previous trials to infer uncertainty and expectation21,39,40, suggesting a role for pupil-linked arousal systems in modulating sequential effects.

In this work, we investigated all four sources of suboptimal perceptual decision-making and their relationship with pupil-linked arousal processes in a single task. By quantifying all four sources of suboptimality in the same task, we were able to assess the relationships between the suboptimalities and determine the extent to which pupil-linked arousal processes were related to each.

## Results

To study the effects of pupil response on evidence accumulation, we designed an auditory discrimination task based on the Poisson clicks task1 (see Methods). In this task, participants listened to two trains of clicks in the left and right ears, and were instructed to indicate which side they thought had more clicks (Fig. 1a). Clicks in our task were generated according to a Bernoulli process, such that there was always a click every 50 ms that was either on the left, with probability Pleft, or the right. This process meant that the total number of clicks was always fixed at 20 clicks and the clicks occurred at a fixed frequency of 20 Hz. This generative process for the clicks represented a slight departure from ref. 1, in which clicks were generated by a Poisson process with a refractory period of 20 ms. The main reason for using a Bernoulli process was to simplify the subsequent logistic regression analysis for quantifying the different sources of suboptimality, without imposing too many a priori assumptions. To indicate this difference, we refer to our task as the Bernoulli clicks task.
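The generative process described above can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus code: the value `p_left=0.6`, the seed and the function names are assumptions made for the example, and clicks are coded +1 for left and −1 for right.

```python
import random

N_CLICKS = 20             # fixed number of clicks per trial (from the text)
CLICK_INTERVAL_S = 0.05   # one click every 50 ms, i.e. 20 Hz

def generate_clicks(p_left, rng):
    """Draw one trial: each click is left (+1) with probability p_left, else right (-1)."""
    return [1 if rng.random() < p_left else -1 for _ in range(N_CLICKS)]

def delta_click(clicks):
    """Net evidence: number of left clicks minus number of right clicks."""
    return sum(clicks)

rng = random.Random(0)  # seeded only so the sketch is reproducible
clicks = generate_clicks(p_left=0.6, rng=rng)  # p_left=0.6 is a hypothetical value
onsets = [i * CLICK_INTERVAL_S for i in range(N_CLICKS)]  # click onset times in seconds
```

Because every 50 ms slot contains exactly one click, each trial yields a fixed-length vector of 20 regressors, which is what makes the later per-click logistic regression straightforward.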

### Psychometric and chronometric functions

A total of 108 participants each performed between 666 and 938 trials (mean: 760.7) of the Bernoulli clicks task. Basic behaviour was consistent with behaviours in similar pulsed-accumulation tasks1,4. Choice exhibited a sigmoidal dependence on the net difference in evidence strength (that is, the difference in the number of clicks between the right and left, Δclick) (Fig. 1b). A simple logistic regression of the form:

$${\mathrm{logit}}[P_{{\mathrm{left}}}\,{\mathrm{at}}\,{\mathrm{trial}}\,{t}] = \beta _0 + \beta _{\Delta {\mathrm{click}}}\Delta {\mathrm{click}}$$
(1)

revealed a significant effect of Δclick (two-tailed, one-sample t107 = 29.19; P < 0.001; Cohen’s d = 2.81; 95% confidence interval (CI) = 0.33 to 0.37). Reaction times were also modulated by net evidence strength (Fig. 1c), and linear regression of the form:

$${\mathrm{RT}}\,{\mathrm{at}}\,{\mathrm{trial}}\,t\,{\mathrm{in}}\,{\mathrm{seconds}} = \beta _0 + \beta _{{\mathrm{\Delta click}}}\left|\Delta {\mathrm{click}}\right|$$
(2)

found a significant effect of the absolute value of Δclick on reaction time (two-tailed one-sample t107 = −16.29; P < 0.001; Cohen’s d = −1.57; 95% CI = −0.02 to −0.01). These results indicated that participants were faster and more accurate when the difference in the number of clicks was large (easy trials), and less accurate and slower when that difference was small (difficult trials).
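To illustrate the form of equation (1), the sketch below fits a two-parameter logistic regression by Newton's method to simulated choices. Everything about the simulation is an assumption for the example (the slope of 0.35, the 2,000 trials, the seed), and the fitting routine is a generic one; the text does not state which software was used.

```python
import math
import random

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton's method; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated participant (hypothetical): chooses left with
# probability sigmoid(TRUE_SLOPE * delta_click), no bias.
TRUE_SLOPE = 0.35
rng = random.Random(1)
x, y = [], []
for _ in range(2000):
    dc = sum(1 if rng.random() < 0.5 else -1 for _ in range(20))
    p_left = 1.0 / (1.0 + math.exp(-TRUE_SLOPE * dc))
    x.append(dc)
    y.append(1 if rng.random() < p_left else 0)

b0, b1 = fit_logistic(x, y)  # b1 estimates the weight on delta_click
```

In the analysis reported here, one such regression is fitted per participant and the resulting slopes are tested against zero across participants.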

### Humans exhibited all four suboptimalities in the Bernoulli clicks task

We used a logistic regression model to characterize the four different types of suboptimalities in human decision-making in our task. This model quantified the impact of each click, the reinforcement learning and choice kernel effects from the five previous trials and the side bias on participants’ choices. In particular, we assumed that the probability of choosing left on trial t was given by:

$${\mathrm{logit}}[P_{{\mathrm{left}}}\,{\mathrm{at}}\,{\mathrm{trial}}\,{t}] = \underbrace {\mathop {\sum}\limits_{i = 1}^{20} \beta _i^{{\mathrm{click}}}c_i}_{\begin{array}{*{20}{c}} {{\mathrm{Integration}}} \\ {{\mathrm{kernel}}} \end{array}} + \underbrace {\mathop {\sum}\limits_{j = 1}^5 \beta _j^{{\mathrm{RL}}}{a}_{t - j}{r}_{t - j}}_{\begin{array}{*{20}{c}} {{\mathrm{Reinforcement}}} \\ {{\mathrm{learning}}} \end{array}} + \underbrace {\mathop {\sum}\limits_{j = 1}^5 \beta _j^{{\mathrm{CK}}}{a}_{t - j}}_{\begin{array}{*{20}{c}} {{\mathrm{Choice}}{\kern 1pt} } \\ {{\mathrm{kernel}}{\kern 1pt} } \end{array}} + \underbrace {\beta ^{{\mathrm{side}}}}_{\begin{array}{*{20}{c}} {{\mathrm{Side}}} \\ {{\mathrm{bias}}} \end{array}}$$
(3)

where $$c_i$$ is the ith click (+1 for a left click and −1 for a right click), $$a_{t - j}$$ is the choice made on the (t − j)th trial (+1 for a left choice and −1 for a right choice) and $$r_{t - j}$$ is the ‘reward’ on the (t − j)th trial (+1 for correct and −1 for incorrect). Therefore, $$a_{t - j}r_{t - j}$$ indicates the correct side on the (t − j)th trial (+1 when left is correct and −1 when right is correct). The relative effect of each of these terms on the decision is determined by the regression weights: $$\beta _i^{{\mathrm{click}}}$$ (the effect of each click), $$\beta _j^{{\mathrm{RL}}}$$ (the reinforcement learning effect; that is, the effect of the previous correct side), $$\beta _j^{{\mathrm{CK}}}$$ (the choice kernel effect; that is, the effect of the previous choice) and $$\beta ^{{\mathrm{side}}}$$ (an overall side bias).
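The regressors of equation (3) can be laid out as one row per trial. The sketch below is an illustration under stated assumptions: the function name and toy data are hypothetical, and trials with fewer than five predecessors are simply not passed in, since the text does not say how the first trials of a session were handled.

```python
def design_row(clicks_t, choices, rewards, t, n_back=5):
    """Regressors for trial t: 20 clicks, 5 RL terms, 5 CK terms, 1 intercept.

    clicks_t : 20 values, +1 (left click) or -1 (right click)
    choices  : past choices, +1 (left) or -1 (right), indexed by trial
    rewards  : past outcomes, +1 (correct) or -1 (incorrect), indexed by trial
    Requires t >= n_back.
    """
    rl = [choices[t - j] * rewards[t - j] for j in range(1, n_back + 1)]  # previous correct side
    ck = [choices[t - j] for j in range(1, n_back + 1)]                   # previous choice
    return list(clicks_t) + rl + ck + [1]                                 # trailing 1 = side bias

# Toy history for six trials (made-up values); build the row for trial index 5.
choices = [1, -1, 1, 1, -1, 1]
rewards = [1, 1, -1, 1, -1, 1]
clicks_t = [1] * 12 + [-1] * 8
row = design_row(clicks_t, choices, rewards, t=5)
```

Each row has 31 entries, matching the 20 + 5 + 5 + 1 weights in equation (3); the product of choice and reward recovers the correct side because an incorrect left choice (+1 × −1) implies that right was correct.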

Each of the four suboptimalities could be quantified using different parameters from this model (Fig. 2a–d). First, the SNR—corresponding to suboptimality arising from neuronal noise—was quantified as the average weight given to all clicks $$\left( {\frac{1}{{20}}\mathop {\sum}\nolimits_{i = 1}^{20} \beta _i^{{\mathrm{click}}}} \right)$$. The higher the average click weight, the higher the SNR or, equivalently, the lower the relative level of the noise. This average was significantly different from zero (two-tailed, one-sample t107 = 27.65; false discovery rate (FDR)-corrected for multiple comparisons P < 0.001; Cohen’s d = 2.66; 95% CI = 0.37 to 0.43) (Fig. 2a), indicating that participants based their decision on (at least some of) the clicks and that each click increased the log odds of ultimately choosing that direction by about 0.4.

The second suboptimality (that is, deviations from a flat integration kernel) was quantified as the deviation of the click weights from the average (that is, $$\beta _i^{{\mathrm{click}}} - \frac{1}{{20}}\mathop {\sum}\nolimits_{j = 1}^{20} \beta _j^{{\mathrm{click}}}$$) (Fig. 2b). Here, we found that participants did not weigh all of the clicks equally (repeated-measures analysis of variance (ANOVA), F19, 2,033 = 28.21; P < 0.001; effect size measure partial η2 = 0.21). This was not consistent with previous reports of a similar task in which all clicks received equal weighting on average1.
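The two quantities defined above (the mean click weight as the SNR measure, and the mean-subtracted weights as the kernel shape) amount to a simple decomposition of the 20 fitted click weights. A minimal sketch, using made-up weights with a ‘bump’ profile chosen only for illustration:

```python
def snr_and_kernel(click_weights):
    """Split click weights into their mean (SNR measure) and deviations (kernel shape)."""
    mean_w = sum(click_weights) / len(click_weights)
    kernel = [w - mean_w for w in click_weights]
    return mean_w, kernel

# Hypothetical weights: lower at the start and end, higher in the middle.
weights = [0.3] * 5 + [0.5] * 10 + [0.3] * 5
snr, kernel = snr_and_kernel(weights)
```

By construction the kernel deviations sum to zero, so the two measures are separable: scaling all weights changes the SNR measure, whereas redistributing weight across clicks changes only the kernel shape.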

Sequential effects (the third suboptimality) were captured by the effects from previous trials. Specifically, the terms $$\beta _j^{{\mathrm{RL}}}$$ and $$\beta _j^{{\mathrm{CK}}}$$ quantified the reinforcement learning effect (the effect of the past correct side) and choice kernel effect (the effect of the past choice) for the past five trials on the current choice. In line with earlier work20, we found that previous trials had both significant reinforcement learning and choice kernel effects on participants’ choices (Fig. 2c). Notably, the positive reinforcement learning regression weight demonstrated a positive reinforcement learning effect, in that participants tended to choose whichever side was shown to be correct on the previous trial (two-tailed, one-sample t107 = 14.40; FDR-corrected for multiple comparisons P < 0.001; Cohen’s d = 1.39; 95% CI = 0.32 to 0.43). The negative choice kernel regression weight indicated an alternating choice kernel: participants tended to choose the opposite of what they had chosen on the previous trial (two-tailed, one-sample t107 = −10.45; FDR-corrected for multiple comparisons P < 0.001; Cohen’s d = −1.01; 95% CI = −0.33 to −0.23).

Finally, the side bias was quantified by the intercept term βside in the model (Fig. 2d). This term quantified the extent to which a participant chose the left side on all trials regardless of which side was the correct side. Here, we saw a significant right bias indicated by a significantly negative regression weight (two-tailed, one-sample t107 = −4.12; FDR-corrected for multiple comparisons P < 0.001; Cohen’s d = −0.40; 95% CI = −0.19 to −0.07).

### Sequential effects and SNR covary across participants

We then inspected how these suboptimalities correlated with each other across participants. We used a three-way mixed ANOVA to inspect the effect of previous trials on both SNR and kernel shape. The three factors we investigated were reinforcement learning regression weights, choice kernel regression weights and time. The ANOVA was set up to investigate the effect of these three factors on the regression weights of clicks. In this ANOVA, the main effects of either reinforcement learning or choice kernel on kernel weights told us whether reinforcement learning or choice kernel correlated with the overall SNR. The interaction effect between either reinforcement learning or choice kernel and time on kernel weights told us whether reinforcement learning or choice kernel correlated with the kernel shape.

We found that both reinforcement learning and choice kernel had significant main effects on SNR (reinforcement learning: F1, 2,080 = 50.66; P < 0.001; choice kernel: F1, 2,080 = 56.82; P < 0.001), but not kernel shape (reinforcement learning × time: F19, 2,080 = 0.80; P = 0.71; choice kernel × time: F19, 2,080 = 0.19; P = 0.99). Specifically, reinforcement learning is negatively correlated with SNR (Pearson’s r = −0.28; P = 0.003) (Fig. 3a), while choice kernel is positively correlated with SNR (Pearson’s r = 0.22; P = 0.03) (Fig. 3b).

Importantly, since the reinforcement learning effect was positive (Fig. 2c; that is, participants tended to choose, on the current trial, whichever side was correct in the previous trial), a negative correlation indicated that participants who relied more on feedback from the previous trial tended to rely less on information on the current trial. Conversely, since the choice kernel effect was negative (that is, participants tended to alternate their choices of sides between trials; Fig. 2c), a positive correlation indicated that the more participants alternated their choices (that is, relied on past choice history), again the less they relied on evidence from the current trial.

Together, these results suggest a ‘subtractive’ effect between choice history and SNR on the current trial: participants who rely more on history (reinforcement learning and choice kernel) tend to rely less on evidence from the current trial. This result could also be interpreted as participants who were worse at making decisions based on evidence from the current trial tending to rely more on previous history. Interestingly, we also found a similar small but significant relationship between sequential effects and SNR at the within-participant level (Supplementary Note 1 and Supplementary Fig. 1).

In addition, we also saw a negative correlation between reinforcement learning and choice kernel across participants (Pearson’s r = −0.46; P < 0.001) (Fig. 3c), which indicated that participants who relied more on past feedback also relied more on past choice (stronger alternating effect).

### Individual differences in pupil change correlate with individual differences in integration kernel

To examine the interaction between individual differences in pupil response and integration behaviour, we first computed the pupil diameter change during the presentation of click stimuli. We time-locked the pupillary response to the onset of the click stimulus, and averaged the pupil diameter for each participant. We then took the difference between the peak and the trough of the pupil diameter within the click stimuli, which we called the ‘pupil change’ for each participant (Fig. 4a). As shown by a median split in Fig. 4b, there were considerable individual differences in the pupil change, with some participants showing almost no change while others showed a much larger change during the stimulus. The difference in the magnitude of pupil change between the two median-split groups was significant (two-tailed, two-sample t106 = 9.54; P < 0.001; Cohen’s d = 1.30; 95% CI = 0.18 to 0.27).
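The participant-level pupil change measure described above can be sketched as follows. This is an assumption-laden illustration: the traces are taken to be already time-locked to click onset and restricted to the stimulus window, ‘peak minus trough’ is implemented as maximum minus minimum of the trial-averaged trace, and preprocessing steps (for example, blink handling or baseline correction) are not shown because they are not described in this section.

```python
def mean_trace(trial_traces):
    """Average stimulus-locked pupil traces across trials, sample by sample."""
    n_trials = len(trial_traces)
    return [sum(samples) / n_trials for samples in zip(*trial_traces)]

def pupil_change(trace):
    """Peak minus trough of a pupil trace within the click stimulus window."""
    return max(trace) - min(trace)

# Two toy trials with four samples each (made-up values, arbitrary units).
trials = [
    [0.0, 0.1, 0.3, 0.2],
    [0.0, -0.1, 0.1, 0.2],
]
avg = mean_trace(trials)
change = pupil_change(avg)
```

Applying `pupil_change` to the averaged trace gives one value per participant (the individual-difference measure), whereas applying it to each single-trial trace would give a trial-by-trial measure.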

To examine the relationship between pupil change and the overall SNR and integration kernel, we used a two-way mixed ANOVA to compare the effects of pupil change (coded as a continuous variable) and time on participants’ regression weights ($$\beta _i^{{\mathrm{click}}}$$ from equation (3)). If pupil change had an effect on the overall SNR, we should see a main effect of pupil change on regression weight. Conversely, if pupil change had an effect on the integration kernel, we should see a significant interaction effect between pupil change and time on regression weight. Only the interaction effect was significant (interaction: F19, 2,014 = 2.225; P = 0.0018; partial η2 = 0.02; main effect: F1, 106 = 2.761; P = 0.10). Moreover, these results were robust to a number of different assumptions in the analysis, such as the size and location of the window for computing pupil change (Supplementary Note 2 and Supplementary Figs. 2 and 3) and whether we performed the analysis on the raw regression weights or on the top two principal components (Supplementary Fig. 4). Taken together, these findings suggest that individual differences in pupil change affected the shape of the integration kernel but not the overall SNR (illustrated using a median split in the left two panels of Fig. 4c).

To understand which click weights were driving this interaction effect, we performed a correlation analysis between individual differences in the regression weights for each click and individual differences in the pupil change. These post-hoc tests suggested that the main change occurred in the second and third clicks, whose weights were increased in participants with high pupil change. Specifically, pupil change was significantly correlated with second (Pearson’s r = 0.29; FDR-corrected for multiple comparisons P = 0.02) and third (Pearson’s r = 0.30; FDR-corrected for multiple comparisons P = 0.02) kernel weights (Supplementary Fig. 5).

To examine the relationship between pupil change and sequential effects and side bias, we looked at the correlation between pupil change and regression weights for the previous trials (reinforcement learning and choice kernel) and side bias. We found no significant relationship between pupil change and either sequential effects (reinforcement learning: Pearson’s r = −0.06; FDR-corrected for multiple comparisons P = 0.92; choice kernel: Pearson’s r = 0.04; FDR-corrected for multiple comparisons P = 0.94) or side bias (Pearson’s r = −0.01; FDR-corrected for multiple comparisons P = 0.94) (illustrated using a median split in the right two panels in Fig. 4c).

Combined, these results suggest that individual differences in pupil change were associated with individual differences in only one of the four suboptimalities (that is, the kernel of integration), such that participants with larger pupil change had more uneven integration kernels.

### Trial-by-trial variability in pupil change correlates with trial-by-trial variability in SNR

To quantify how trial-by-trial pupil change relates to the four suboptimalities in evidence accumulation, we modified the regression model (equation (3)) to include interaction terms between clicks, previous trials and trial-by-trial fluctuations in the pupil:

$$\begin{array}{*{20}{l}} {{\mathrm{logit}}[P_{{\mathrm{left}}}\,{\mathrm{at}}\,{\mathrm{trial}}\,{t}]} \hfill & = \hfill & {\mathop {\sum}\limits_{i = 1}^{20} \beta _i^{{\mathrm{click}}}c_i + \beta _1^{{\mathrm{RL}}}{a}_{t - 1}{r}_{t - 1} + \beta _1^{{\mathrm{CK}}}{a}_{t - 1} + \beta ^{{\mathrm{side}}}} \hfill \\ {} \hfill & {} \hfill & { + \underbrace {\beta ^{{\mathrm{\Delta click}} \times {\mathrm{\Phi }}_t}{\Delta c\Phi }_t}_{{\mathrm{SNR}}\, \times\,{\mathrm{pupil}}} + \underbrace {\beta _1^{{\mathrm{RL}} \times {\mathrm{\Phi }}_t}{a}_{t - 1}{r}_{t - 1}{\mathrm{\Phi }}_t}_{\begin{array}{c}{\mathrm{Reinforcement}}{\kern 1pt} {\mathrm{learning}}\, \times\,{\mathrm{pupil}}\end{array}} + \underbrace {\beta _1^{{\mathrm{CK}} \times {\mathrm{\Phi }}_t}{a}_{t - 1}{\mathrm{\Phi }}_t}_{\begin{array}{c}{\mathrm{Choice}}\,{\mathrm{kernel}}\,{\mathrm{ \times }}\,{\mathrm{pupil}}\end{array}} + \underbrace {\beta ^{{\mathrm{side}} \times {\mathrm{\Phi }}_t}{\mathrm{\Phi }}_t}_{\begin{array}{c}{\kern 1pt} {\mathrm{Side}}\,{\mathrm{bias}}\, \times\,{\mathrm{pupil}}\end{array}}} \hfill \end{array}$$
(4)

where Δc is the number of clicks on the left minus the number of clicks on the right and Φt is the pupil change measure at trial t. The Δc regressor plays the role of the mean click regression weight in Fig. 2a (the average SNR), so its interaction with Φt captures pupil-linked changes in SNR. It is also worth noting that, here, the interaction between side bias and pupil change is equivalent to a main effect of pupil change, since the regressor of the interaction between side bias and pupil change is the same as the regressor of pupil change alone.
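The regressors of equation (4) extend the single-trial row with four pupil interaction terms. The sketch below is illustrative only: the function name and values are hypothetical, and `phi` is used as given, since this section does not say whether the trial-wise pupil measure was normalized before entering the regression.

```python
def design_row_pupil(clicks_t, prev_choice, prev_reward, phi):
    """Regressors for equation (4): base terms followed by pupil interactions.

    clicks_t    : 20 values, +1 (left click) or -1 (right click)
    prev_choice : a_{t-1}, +1 (left) or -1 (right)
    prev_reward : r_{t-1}, +1 (correct) or -1 (incorrect)
    phi         : pupil change on the current trial
    """
    delta_c = sum(clicks_t)               # left minus right clicks
    prev_correct = prev_choice * prev_reward
    base = list(clicks_t) + [prev_correct, prev_choice, 1]
    interactions = [
        delta_c * phi,       # SNR x pupil
        prev_correct * phi,  # reinforcement learning x pupil
        prev_choice * phi,   # choice kernel x pupil
        phi,                 # side bias x pupil (same as a main effect of pupil)
    ]
    return base + interactions

# Made-up trial: 12 left and 8 right clicks, previous choice right and correct.
row = design_row_pupil([1] * 12 + [-1] * 8, prev_choice=-1, prev_reward=1, phi=0.5)
```

The final entry makes the equivalence noted in the text concrete: multiplying the constant side-bias regressor by the pupil measure leaves the pupil measure itself.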

We found that trial-by-trial pupil change interacted significantly with Δclick (that is, the interaction term between SNR and pupil change was significantly different from zero; two-tailed, one-sample t107 = −3.46; FDR-corrected P < 0.001; Cohen’s d = −0.33; 95% CI = −0.03 to −0.01), but not with side bias (two-tailed, one-sample t107 = −0.80; FDR-corrected P = 0.46), reinforcement learning (two-tailed, one-sample t107 = 0.65; FDR-corrected P = 0.64) or choice kernel (two-tailed, one-sample t107 = −0.59; FDR-corrected P = 0.57) (Fig. 5). We showed that the result is robust after repeating the same analysis with trial-by-trial pupil changes residualized from previous-trial pupil change, ensuring that the result we saw was not driven by pupil signal bled over from previous trials (Supplementary Fig. 6).

We then tested whether there was an interaction between pupil change and integration kernel shape with a slightly modified version of equation (4):

$$\begin{array}{*{20}{l}} {{\mathrm{logit}}[P_{{\mathrm{left}}}\,{\mathrm{at}}\,{\mathrm{trial}}\,t]} \hfill & = \hfill & {\mathop {\sum}\limits_{i = 1}^{20} \beta _i^{{\mathrm{click}}}c_i + \beta _1^{{\mathrm{RL}}}{a}_{t - 1}{r}_{t - 1} + \beta _1^{{\mathrm{CK}}}{a}_{t - 1} + \beta ^{{\mathrm{side}}}} \hfill \\ {} \hfill & {} \hfill & { + \underbrace {\mathop {\sum}\limits_{i = 1}^{20} \beta _i^{{\mathrm{click}} \times {\mathrm{\Phi }}_t}c_i{\mathrm{\Phi }}_t}_{{\mathrm{Kernel}}\, \times\,{\mathrm{pupil}}} + \underbrace {\beta _1^{{\mathrm{RL}} \times {\mathrm{\Phi }}_t}{a}_{t - 1}{r}_{t - 1}{\mathrm{\Phi }}_t}_{\begin{array}{*{20}{c}} {{\mathrm{Reinforcement}}} \\ {{\mathrm{learning}}\, \times\,{\mathrm{pupil}}} \end{array}} + \underbrace {\beta _1^{{\mathrm{CK}} \times {\mathrm{\Phi }}_t}{a}_{t - 1}{\mathrm{\Phi }}_t}_{\begin{array}{*{20}{c}} {{\mathrm{Choice}}\,{\mathrm{kernel}}}\, \\ { \times\,{\mathrm{pupil}}} \end{array}} + \underbrace {\beta ^{{\mathrm{side}} \times {\mathrm{\Phi }}_t}{\mathrm{\Phi }}_t}_{\begin{array}{*{20}{c}} {{\mathrm{Side}}\,{\mathrm{bias}}} \\ { \times\,{\mathrm{pupil}}} \end{array}}} \hfill \end{array}$$
(5)

where Φt is the pupil change measure at trial t. The first four terms in this model are the same as in equation (3), with sequential effects restricted to the previous trial. The last four terms are the respective interaction terms of clicks (integration kernel), previous correct side (reinforcement learning), previous choice (choice kernel) and side bias with pupil change. With repeated-measures ANOVA, we did not find a significant effect of time on $$\beta _i^{{\mathrm{click}} \times {\mathrm{\Phi }}_t}$$ (F19, 2,033 = 0.72; P = 0.81), suggesting that pupil change did not modulate the integration kernel on a trial-by-trial level. We did not find a significant interaction effect between pupil change and reinforcement learning (two-tailed, one-sample t107 = 1.78; FDR-corrected P = 0.14), choice kernel (two-tailed, one-sample t107 = −0.78; FDR-corrected P = 0.62) or side bias (two-tailed, one-sample t107 = −1.63; FDR-corrected P = 0.19) either. We also performed variance inflation factor analysis to assuage concerns of collinearity in our regression analyses (Supplementary Note 3 and Supplementary Figs. 7–11). These results combined suggest that pupil change on a trial-by-trial level specifically modulated the overall SNR, and not integration kernel or sequential effects.

### Drift diffusion model bound corresponds to individual pupil change

To further examine the potential computations underlying the relationship between the accumulation process and pupil change, we fit a canonical model for this type of perceptual decision-making task—the drift-diffusion model (DDM)—to participants’ choice data. This model proposes that evidence is integrated over time and a decision is made when the evidence passes a bound. In an interrogation task paradigm (such as ours in which participants can only respond after the train of clicks ends), the DDM with no bound provides an optimal solution to the task. Brunton and colleagues extended this standard DDM to include several different suboptimalities1.

First, their model includes three types of noise that describe variability in the initial state of the accumulator, σi, the integration of evidence over time, σa, and the encoding of sensory stimuli, σs.

Second, they include a ‘memory parameter’, λ, to describe the extent to which the model is ‘forgetful’ or ‘impulsive’. In particular, a forgetful accumulator (λ < 0) forgets previous evidence and exhibits the recency effect, while an impulsive accumulator (λ > 0) overweights early evidence and exhibits a primacy effect. When there is no memory noise (λ = 0), the integration kernel is flat.

Third, the bound, B, describes the threshold of evidence at which the model makes a decision. In the context of an interrogation paradigm, evidence coming after the bound has been crossed is ignored.

Fourth, a sensory adaptation process controls the impact of successive clicks on the same side. This process is controlled by two adaptation parameters: (1) the direction of adaptation, ϕ, which dictates whether the impact of a click on one side either increases (ϕ > 1) or decreases (ϕ < 1) with the number of clicks that were previously on the same side; and (2) a time constant τϕ, which determines how quickly the adapted impact recovers to 1.

Finally, the bias describes an overall bias, and the lapse rate describes the probability of a random response being made.
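To make the roles of the memory parameter λ and the bound B concrete, the sketch below implements a deliberately stripped-down, noise-free accumulator. It is not the fitted model: the three noise terms, sensory adaptation, bias and lapse rate are all omitted, and the discrete update (scale the accumulator by exp(λ·dt) between clicks, then add the click) is an assumption chosen for simplicity.

```python
import math

def accumulate(clicks, lam=0.0, bound=math.inf, dt=0.05):
    """Noise-free accumulator with memory parameter lam and a sticky bound.

    clicks : sequence of +1 (left) / -1 (right)
    lam    : <0 forgetful (recency), >0 impulsive (primacy), 0 perfect memory
    bound  : once |a| reaches the bound, later clicks are ignored
    dt     : time between clicks in seconds (50 ms in this task)
    """
    a = 0.0
    for c in clicks:
        a = a * math.exp(lam * dt) + c   # old evidence decays or grows, then add click
        if abs(a) >= bound:
            break                        # evidence after bound crossing is ignored
    return a

# Three early left clicks followed by 17 right clicks (made-up stimulus).
stim = [1] * 3 + [-1] * 17
unbounded = accumulate(stim)             # perfect integration favours right
bounded = accumulate(stim, bound=3)      # low bound commits to left early
```

With a low bound the accumulator commits after the first three clicks and ignores the rest, which is one way a lower bound could reduce the weight of late evidence; a positive or negative `lam` instead tilts the kernel towards early or late clicks, respectively.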

Overall, the Brunton model has nine free parameters—three types of neuronal noise, memory noise, bound, two parameters controlling sensory adaptation, bias and lapse rate. We fit these parameters using the maximum-likelihood procedure described previously1 and following code from ref. 41. We generated choices for each participant using the best-fitting parameters, and computed an integration kernel for each participant using these model-generated choices.

As shown in Fig. 6, the model provides a good fit to the main aspects of the data, including the integration kernel (Fig. 6a) and the psychometric curve (Fig. 6b). In addition, the parameter values we found from these fits were comparable to those found in Brunton et al.1 for the human version of their task (Supplementary Fig. 12). This provides a good replication of their findings in a larger cohort of participants.

Looking in more detail at the parameters, we find that pupil change is marginally significantly negatively correlated with bound (Pearson’s r = −0.28; FDR-corrected for multiple comparisons P = 0.05; Fig. 6c); that is, participants with high pupil change have a lower bound in the accumulator. Pupil change is not significantly correlated with the other eight parameters. This result suggests that pupil change affects the integration kernel shape, potentially by changing the height of the bound.

## Discussion

In this paper, we investigated four sources of suboptimality in human evidence integration (neuronal noise, as reflected in the SNR; uneven integration kernel; sequential effects; and side bias) and their relationship with pupil diameter at the across-participant and within-participant levels. We showed that all four types of suboptimality were at play in our perceptual decision-making task. These included variance that could not be explained by another source (that is, ‘noise’), a predominantly ‘bump’ integration kernel, sequential effects in the form of a positive reinforcement learning effect (choosing the previous correct answer) and an alternation choice kernel effect (choosing the opposite of the previous choice), and an overall side bias (Fig. 2). In addition, across the population, participants with stronger sequential effects (reinforcement learning and choice kernel) tended to rely less on evidence from the current trial (smaller SNR), and participants with one kind of sequential effect (reinforcement learning) also tended to have the other kind (choice kernel) (Fig. 3). At the physiological level, two of the four suboptimalities were associated with pupil dilation, at the trial-by-trial and individual difference levels, respectively. At the individual difference level, only the integration kernel was associated with pupil change, with a more uneven profile of integration being associated with larger pupil change during stimulus presentation (Fig. 4). Conversely, at the trial-by-trial level, only noise was associated with pupil change, with a smaller SNR being associated with larger pupil change on that trial (Fig. 5). Our work adds to the growing literature on the suboptimalities in evidence accumulation and perceptual decision-making and their relationship with pupil dilation. Below, we discuss the implications of our behavioural and pupillometric findings.

At the behavioural level, our findings are consistent with a number of previous results showing the presence of noise in the integration process1,8, uneven weighting of information over time4,18,19, and the presence of order effects20,21,23 and side biases1. In addition, by running our task in a large number of participants (something not traditionally done in the animal literature), we were able to expose the relationships between the suboptimalities at the individual difference level. Intriguingly, this analysis suggests an antagonistic relationship between the use of information from the past trial (that is, reinforcement learning and choice kernel effects) and processing of the current stimulus, as reflected in SNR. Such an effect may reflect a kind of compensatory process in low performers. That is, people who are less able to process the stimulus correctly (low SNR) may rely more on sequential effects to (either explicitly or implicitly) try to compensate. While such a strategy is not adaptive for this task, this approach would pay off if there was autocorrelation in the task.

In addition to the correlations between suboptimalities, one unexpected behavioural finding is the shape of the uneven integration kernel—specifically, the ‘bump’ kernel, where clicks in the middle are weighed more than those at the beginning or the end. This contrasts with previous work on perceptual decision-making by Brunton and colleagues1, who found the integration kernel of rats and well-trained humans to be flat, Yates and colleagues4, who showed a purely primacy-driven integration kernel in monkeys, and several studies showing that humans have a recency kernel18,19. Given this difference in results, one obvious question is whether the bump kernel is a genuine feature of the integration process or some artefact of either the analysis pipeline or the task.

With respect to the analysis, one possibility is that the bump may result from a mixture of participants with primacy and recency kernels, which average together to form the bump. To test this, we categorized the integration kernel for each participant into one of the following four shapes: bump, primacy, recency or flat (for the categorization method, see Supplementary Note 4). All 108 of these are plotted in Supplementary Fig. 13. From here, it is easy to see that a large number of participants (49%) exhibit the bump kernel. This suggests that, at least on the level of individual participants, the bump kernel is a feature of the integration process, and not just an artefact of averaging. Of course, the possibility remains that this bump kernel is a result of mixing primacy and recency kernels within subject (for example, some trials have primacy kernels and some have recency kernels). More detailed modelling work will be needed to tease these interpretations apart.

With respect to the task, another possible cause for the bump kernel comes from the number of clicks in each stimulus being fixed. This fixed number of clicks in each stimulus means that an ideal observer, who is aware that there are only 20 clicks in each stimulus, could safely stop integrating clicks if the excess number of clicks favouring one side exceeds the number of remaining clicks. That is, by fixing the number of clicks, we may be implicitly favouring a bounded integration process (with a collapsing bound). Such bound crossing would cause the later clicks to be down-weighted on average, as we see in the later part of the bump profile. Of course, bound crossing would not account for the initial rise in weights for the bump profile, which would need some additional mechanism (perhaps a recency effect combined with a bound) as an explanation. Incidentally, this account would fit with the recency bias found in perceptual categorization in previous studies18,19. An important direction for future research would be to test whether this account fully explains the bump profile, with more detailed modelling in addition to more experiments in which the total number of clicks in each stimulus is not fixed.
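The stopping rule available to such an ideal observer can be sketched as follows. This is an illustration under stated assumptions, not a model fitted in this study: clicks are coded as +1 (right) and −1 (left), and the function name is hypothetical.

```python
def decisive_click_index(clicks):
    """Return the 1-based index of the click at which the outcome is settled.

    `clicks` is a sequence of +1 (right) and -1 (left); this coding is an
    assumption made for illustration. The outcome is settled once the
    absolute running excess exceeds the number of clicks still to come.
    Returns None if the outcome is never settled (for example, a tie).
    """
    total = len(clicks)
    excess = 0
    for i, click in enumerate(clicks, start=1):
        excess += click
        remaining = total - i
        # The "bound" at step i is the number of remaining clicks,
        # so it shrinks (collapses) as the stimulus unfolds.
        if abs(excess) > remaining:
            return i
    return None


# Eleven right clicks followed by nine left clicks: the majority is
# settled at click 11, so the last nine clicks cannot change the answer.
settled_at = decisive_click_index([1] * 11 + [-1] * 9)
```

Under this rule no stimulus of 20 clicks can be settled before the eleventh click, so on its own it would down-weight only the second half of the stimulus.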

At the physiological level, our results add to the rapidly growing literature on the relationship between pupil dilation and decision-making. In particular, this literature has reported associations between pupil dilation and a number of suboptimalities, including noise18,21, reinforcement learning effects39, choice kernel effects21 and pre-existing biases28,33. Our work builds on previous work by examining the relationship between pupil dilation and all of these suboptimalities in a single task and a single cohort of participants. Our work also extends previous work by looking at the relationship between pupil dilation and the shape of the integration kernel. Below, we situate our results with respect to this previous literature, considering each of the suboptimalities in turn.

With regard to SNR, we found that increased pupil change is associated with lower SNR on each trial. This finding is consistent with much of the previous literature. For example, in the dot motion paradigm, Murphy and colleagues31 showed that trial-by-trial variability in the evidence accumulation process was associated with increased pupil dilation. Likewise, in other perceptual decision-making tasks, several authors have observed an association between increased pupil dilation and noise in behaviour18,21. Outside perceptual decision-making, Jepma and Nieuwenhuis42 observed the same relationship between pupil change and decision noise in a reinforcement learning-based explore–exploit task.

Of course, while the finding that trial-to-trial pupil dilation is associated with trial-to-trial behavioural variability is robust across studies, exactly what this finding means is open to interpretation. In this paper, we have related it to SNR, with the interpretation that changes in the pupil reflect changes in SNR, which cause poor performance. If one takes pupil change as an index of activity in the locus coeruleus, our interpretation is consistent with the adaptive gain theory of norepinephrine function, such that increased locus coeruleus activity causes more variability in behaviour via changes in neural gain26,43.

An alternate interpretation, put forth by Urai and colleagues21, is that pupil change reflects subjective uncertainty and that participants are more uncertain on trials in which they perform poorly. In this interpretation, the direction of causality is reversed: it is poor performance that leads to changes in pupil response, via its effect on uncertainty (which, incidentally, may also be related to the locus coeruleus44). Distinguishing between these accounts, which predict almost identical relationships between pupil change and behavioural variability, will be difficult with correlational experiments such as ours, and future work using pharmacological and other causal interventions will be necessary to determine the direction of the relationship between pupil change (putatively the locus coeruleus) and noise.

With regard to sequential effects, we found no relationship between pupil change and either the reinforcement learning or the choice kernel effect, in contrast with a number of studies that have found relationships between pupil change and sequential effects. For example, Nassar and colleagues39 showed that both baseline pupil size and pupil change modulate how information from previous trials affects current choice, a result that was recently replicated in a different version of the task40. Similarly, in a perceptual decision-making task, Urai and colleagues21 showed that pupil dilation on the previous trial modulated the extent to which that trial influenced the current choice.

One possible cause of the difference between our results and this previous work is the overall magnitude of the sequential effects in the respective tasks. Specifically, in our task, the sequential effects were small, with the combined effect of reinforcement learning and choice kernel equating to about 2 clicks, or 10% of the variance in the response. Conversely, in ref. 21, the previous trial effects accounted for almost 100% of the variance when evidence on the current trial was weak. Likewise, in refs. 39,40, successful performance of the tasks required the use of sequential effects, so the sequential effects observed were huge. This difference in the overall magnitude of the sequential effects could simply have made modulation of these sequential effects by pupil change too small to observe in our task.

Another possible cause of the difference in results is the timing of the pupil signal that we focused on. Specifically, our task was optimized to look at the pupil response during presentation of the click stimuli and not at pupil response at other points in the task, such as at baseline or following choice and feedback, which can have very different behavioural and computational correlates45. This difference in timing is especially important for the Urai et al.21 results, where the pupil signal modulating sequential effects was computed 250 ms before feedback, which was at least 2,800 ms after stimulus onset. Such a time lag would be well into the inter-trial interval and possibly even the next trial in our task, making the corresponding pupil signal difficult for us to compute. Indeed, when we looked at the signal at these later times, there was no association between pupil change and sequential effects (Supplementary Note 5 and Supplementary Figs. 14 and 15). Clearly, future experiments with additional delays will be necessary to determine whether pupil change modulates sequential effects in our task.

Finally, a number of other authors have related individual differences in pupil dilation to a number of other biases, including risk aversion46, learning styles28 and the framing effect33. While these biases are not directly related to integrating evidence over time, the more general point that individuals with large pupil change have more bias across a range of tasks is consistent with our result that individual differences in pupil change modulate the integration kernel. In particular, we find that people with greater pupil change show more deviation (that is, more bias) from the ideal, flat, integration kernel. Taken together, these results suggest that (at least some) deviations from optimality are modulated by pupil change, possibly via its association with the locus coeruleus.

More generally, the difference between the between- and within-participant pupil response results is intriguing. On the one hand, individual differences in pupil change correlate with kernel shape. On the other hand, trial-by-trial fluctuations in pupil change correlate with SNR. Why exactly would the individual differences and trial-by-trial correlates of the same signal be so different?

One possibility, originally raised in ref. 28, is that these slightly different measures of pupil diameter—individual differences versus trial-by-trial fluctuation—may represent different neural measures, with the average of pupil dilations representing baseline or tonic locus coeruleus signals, and the trial-to-trial fluctuations of pupil response reflecting transient, or phasic, locus coeruleus firing. At the individual difference level, Eldar and colleagues28,33 have suggested that the mean pupil response within a participant is a measure of tonic locus coeruleus activity. In contrast, at the trial-by-trial level, work in monkeys and in mice has suggested that moment-to-moment pupil diameter changes track phasic locus coeruleus firing24,25. Applying these interpretations to the present findings suggests that tonic locus coeruleus activity changes the kernel of integration while phasic locus coeruleus activity decreases the SNR.

The interpretation that tonic locus coeruleus activity modulates the integration kernel between participants is consistent with previous work showing that individual differences in pupil change correlate with individual differences in susceptibility to a variety of cognitive and decision biases28,33. Importantly, theoretical work has shown with a biophysically based neural network model that high tonic locus coeruleus activity acts to amplify attractor dynamics, essentially causing the storage of impulsive decisions38. This can serve as a partial explanation for why our results revealed a positive correlation between individual pupil change (a proxy for tonic locus coeruleus activity) and early kernel weight. Furthermore, empirical work has shown that pupil-linked arousal (associated with locus coeruleus–norepinephrine activity and neural gain) is related to time-varying changes in the decision bound36. Specifically, a higher pupil response was found to reflect a stronger urgency signal (lower decision bound). While this relationship between pupil response and urgency was only found for the case in which participants faced a decision deadline, our result that pupil change was correlated with the bound in DDM (Fig. 6) reinforces this account of pupil change reflecting a change in the decision bound, and thus effecting a change in the shape of the integration kernel. Clearly, more detailed experimental work will be needed to further test this result.

The interpretation that phasic locus coeruleus activity, as indexed by trial-by-trial pupil change, modulates SNR is consistent with a number of pupil findings, as outlined above18,21,31. However, it is at odds with a number of findings from direct locus coeruleus recordings in monkeys, where enhanced phasic locus coeruleus activity is associated with better task performance47. Understanding these results in more detail, with experiments in animals and neuroimaging in humans, will be important if we are to fully understand the part that the locus coeruleus plays in these decisions.

## Methods

### Participants

A total of 188 healthy participants (University of Arizona undergraduate students) took part in the experiment for course credit. Data from 108 participants were used in the analysis. We excluded 19 participants due to poor performance (accuracy lower than 60%), 6 due to missing behavioural data files, 11 due to missing pupillometry data files, and another 44 due to poor eye-tracking data (see the section ‘Eye tracking’ below). All participants provided informed written consent before participating in the study. All procedures conformed to the human subject ethical regulations. All study procedures and consent were approved by the University of Arizona Institutional Review Board.

### Bernoulli clicks task

Participants made a series of auditory perceptual decisions. On each trial, they listened to a series of 20 auditory ‘clicks’ presented over the course of 1 s. Clicks could be either ‘left’ or ‘right’ clicks, presented in the left or right ear. Participants decided which ear received the most clicks. In contrast with the Poisson clicks task1, in which the click timing was random, clicks in our task were presented every 50 ms with a fixed probability (P = 0.55) of occurring in the ‘correct’ ear. The correct side was determined with a fixed 50% probability. Feedback appeared 500 ms after response, followed by a 1-s fixation delay before the next trial.
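The stimulus statistics described above can be sketched in a few lines. This is a minimal illustration, not the Psychtoolbox-3 experiment code: the function name, the ±1 side coding and the assumption that the first click falls at 0 ms are all hypothetical.

```python
import random

N_CLICKS = 20            # clicks per trial (from the task description)
CLICK_INTERVAL_S = 0.05  # one click every 50 ms
P_CORRECT = 0.55         # probability that a click falls in the 'correct' ear


def make_trial(rng):
    """Generate one Bernoulli clicks trial.

    Returns (correct_side, clicks, onset_times), with sides coded as
    -1 (left) and +1 (right). The coding and the onset of the first click
    at time 0 are assumptions made for this sketch.
    """
    correct_side = rng.choice([-1, 1])  # correct side chosen with 50% probability
    clicks = [
        correct_side if rng.random() < P_CORRECT else -correct_side
        for _ in range(N_CLICKS)
    ]
    onset_times = [i * CLICK_INTERVAL_S for i in range(N_CLICKS)]
    return correct_side, clicks, onset_times


rng = random.Random(0)  # seeded only so the sketch is reproducible
correct_side, clicks, onset_times = make_trial(rng)
excess_right = sum(clicks)  # positive when more clicks fell in the right ear
```

Because each click is drawn independently, the ear that receives the most clicks on a given trial can differ from the side that was designated as correct.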

Participants performed the task on a desktop computer, while wearing headphones, and were positioned in chin rests to facilitate eye tracking and pupillometry. They were instructed to fixate on a symbol displayed in the centre of the screen, where response and outcome feedback was also displayed during trials, and communicated their responses using a standard keyboard. Participants played until they gave 500 correct responses, or until 50 min of total experiment time had elapsed.

### Behavioural analyses

We modelled the choice with logistic regression using equation (3). In particular, we assumed that the probability of choosing left on trial t is a sigmoidal function of the impact from each click, the impact from five previous trial correct sides, the impact from five previous trial choices, and an overall side bias. In this model, by giving the ith click its own weight, we could account for the overall integration kernel.
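The structure of this model can be sketched as a single function that maps the regressors for one trial onto a choice probability. This is an illustration only: the function and argument names are hypothetical, the coding of left as +1 and right as −1 is an assumption, and the weights shown in the example are not fitted values.

```python
import math


def probability_left(clicks, prev_correct, prev_choices,
                     w_click, w_correct, w_choice, bias):
    """Probability of choosing left on one trial under a logistic model.

    clicks       : 20 values, +1 (left) or -1 (right)   [coding is assumed]
    prev_correct : correct sides of the five previous trials, same coding
    prev_choices : choices on the five previous trials, same coding
    w_*          : one weight per regressor; w_click is the integration kernel
    bias         : overall side bias
    """
    if not (len(clicks) == len(w_click)
            and len(prev_correct) == len(w_correct)
            and len(prev_choices) == len(w_choice)):
        raise ValueError("each regressor needs exactly one weight")
    drive = bias
    drive += sum(w * x for w, x in zip(w_click, clicks))
    drive += sum(w * x for w, x in zip(w_correct, prev_correct))
    drive += sum(w * x for w, x in zip(w_choice, prev_choices))
    return 1.0 / (1.0 + math.exp(-drive))


# Hypothetical flat kernel with no sequential effects and no bias.
p_left = probability_left(
    clicks=[1] * 12 + [-1] * 8,
    prev_correct=[1, -1, 1, 1, -1],
    prev_choices=[1, 1, -1, 1, -1],
    w_click=[0.3] * 20,
    w_correct=[0.0] * 5,
    w_choice=[0.0] * 5,
    bias=0.0,
)
```

A flat `w_click` corresponds to the ideal, evenly weighted kernel; primacy, recency and bump kernels correspond to weights that are largest for early, late and middle clicks, respectively.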

### Eye tracking

A desk-mounted EyeTribe eye tracker was used to measure participants’ pupil diameter from both eyes at a rate of 30 samples per second while they were performing the behavioural task with their head fixed on a chin rest. Pupil diameter data were preprocessed to detect and remove blinks and other artefacts. Pupil diameter was z-scored across the entire experiment before analysis. For each trial, pupil response was extracted time-locked to the start of the trial (Fig. 4a). Change in pupil response was computed as the difference between the peak diameter and the minimum diameter during the 1 s following trial onset. Pupil response measurements in which more than one-third of the samples contained artefacts were considered invalid and excluded from the analysis. Only participants with at least 200 valid trials were included in analysis (n = 108).
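The pupil change measure can be sketched as follows. This is a minimal illustration rather than the analysis code: the use of `None` to mark artefact samples, the population standard deviation in the z-score, and the application of the one-third criterion to the 1 s window are all assumptions, and the function names are hypothetical.

```python
import statistics

SAMPLE_RATE_HZ = 30  # eye tracker sampling rate


def zscore(samples):
    """Z-score a recording, leaving artefact samples (None) in place.

    Assumption: the population standard deviation is used; the text does
    not specify which estimator was applied.
    """
    valid = [s for s in samples if s is not None]
    mean = statistics.fmean(valid)
    sd = statistics.pstdev(valid)
    return [None if s is None else (s - mean) / sd for s in samples]


def pupil_change(trial_samples, window_s=1.0):
    """Peak minus minimum pupil diameter in the window after trial onset.

    `trial_samples` starts at trial onset. Returns None when more than
    one-third of the samples in the window are artefacts.
    """
    n = int(round(window_s * SAMPLE_RATE_HZ))
    window = trial_samples[:n]
    n_bad = sum(1 for s in window if s is None)
    # Integer comparison avoids floating-point issues at exactly one-third.
    if not window or 3 * n_bad > len(window):
        return None
    valid = [s for s in window if s is not None]
    return max(valid) - min(valid)


# Hypothetical trial: a steadily dilating pupil with two blink samples.
trial = [0.1 * i for i in range(30)]
trial[5] = None
trial[6] = None
change = pupil_change(trial)
```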

### Across-participant pupil analysis

For each participant, we took the mean pupil response across trials and computed the change in pupil diameter, as described above. We then compared this pupil change measurement with regression weights from equation (3). Specifically, we performed a two-way mixed ANOVA in which pupil change was a between-subject variable, time was a within-subject variable and regression weight was the dependent variable. We inspected the main effect of pupil change, which informed whether the average regression weight changed with pupil change across participants. We also inspected the interaction effect between pupil change and time, which informed whether pupil change modulated the effect of time on regression weights (that is, the integration kernel).

### Trial-by-trial within-participant pupil analysis

For each trial, we took the pupil response and computed the change in pupil diameter. We then modelled participants’ choices with the logistic model in equations (4) and (5) to parse out trial-by-trial effects of pupil response on integration. The first three terms in both equations were similar to equation (3). However, in addition, we assumed that choice was also a function of the interaction between trial-by-trial pupil change and clicks, previous correct side and previous choice.
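One way to read these interaction terms is that pupil change shifts the effective weight a regressor receives on a given trial. The sketch below illustrates this for a single regressor. It is not the form of equations (4) and (5) themselves: the function names, the coding of left as +1 and the numerical values are hypothetical.

```python
import math


def effective_weight(base_weight, interaction_weight, pupil_change):
    """Weight on a regressor once the pupil interaction is included."""
    return base_weight + interaction_weight * pupil_change


def probability_left_single(x, base_weight, interaction_weight, pupil_change):
    """Choice probability driven by one regressor x (+1 left, -1 right).

    Assumption: a single regressor and no bias term, for illustration only.
    """
    w = effective_weight(base_weight, interaction_weight, pupil_change)
    return 1.0 / (1.0 + math.exp(-w * x))


# Hypothetical values: a negative interaction weight means that a larger
# pupil change reduces the influence of the click on choice.
p_small_pupil = probability_left_single(1, 0.3, -0.1, pupil_change=0.0)
p_large_pupil = probability_left_single(1, 0.3, -0.1, pupil_change=1.0)
```

In this reading, a negative interaction weight on the clicks corresponds to a lower SNR on trials with larger pupil change, while an interaction weight of zero leaves the regressor unaffected by the pupil.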

### DDM with sticky bound analysis

We fitted the nine-parameter DDM with sticky bound originally described by Brunton et al.1,41 (see ‘Code availability’ statement). Since the unit for the noise is set up in the code to be variance per second instead of variance per click, the noise depends on the click rate. Therefore, we changed the total sampling rate in the code from the default 40 Hz (which is the sampling rate used in Brunton et al.’s rat version of the task) to 20 Hz (the sampling rate used in our task and in Brunton et al.’s human auditory version of the task). We fitted the model on our human choice data using a maximum-likelihood approach.

### Statistics

All data analyses and statistics were performed in MATLAB and R. Repeated-measures ANOVA, two-way mixed ANOVA and the corresponding post-hoc tests were done in R. All other analyses and statistical tests were done in MATLAB.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Code availability

Experiment code was created with Psychtoolbox-3 and custom MATLAB code. All behavioural and pupil analyses were created with custom MATLAB and R code. All code can be found at https://github.com/janekeung129/clicks-pupil. Code for fitting the DDM with sticky bound1,41 is provided at http://github.com/misun6312/PBupsModel.jl.

## Data availability

The datasets generated and analysed during the current study are available from the corresponding author upon reasonable request.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Brunton, B. W., Botvinick, M. M. & Brody, C. D. Rats and humans can optimally accumulate evidence for decision-making. Science 340, 95–98 (2013).

2. Erlich, J. C., Brunton, B. W., Duan, C. A., Hanks, T. D. & Brody, C. D. Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. eLife 4, e05457 (2015).

3. Katz, L. N., Yates, J. L., Pillow, J. W. & Huk, A. C. Dissociated functional significance of decision-related activity in the primate dorsal stream. Nature 535, 285–288 (2016).

4. Yates, J. L., Park, I. M., Katz, L. N., Pillow, J. W. & Huk, A. C. Functional dissection of signal and noise in MT and LIP during decision-making. Nat. Neurosci. 20, 1285–1292 (2017).

5. Newsome, W. T. & Pare, E. B. A selective impairment of motion perception following lesions of the middle temporal visual area (MT). J. Neurosci. 8, 2201–2211 (1988).

6. Hanks, T. D. et al. Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520, 220–223 (2015).

7. Gold, J. I. & Shadlen, M. N. Neural computations that underlie decisions about sensory stimuli. Trends Cogn. Sci. 5, 10–16 (2001).

8. Drugowitsch, J., Wyart, V., Devauchelle, A.-D. & Koechlin, E. Computational precision of mental inference as critical source of human choice suboptimality. Neuron 92, 1398–1411 (2016).

9. Faisal, A. A., Selen, L. P. & Wolpert, D. M. Noise in the nervous system. Nat. Rev. Neurosci. 9, 292–303 (2008).

10. Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438 (2006).

11. Beck, J. M., Ma, W. J., Pitkow, X., Latham, P. E. & Pouget, A. Not noisy, just wrong: the role of suboptimal inference in behavioral variability. Neuron 74, 30–39 (2012).

12. Smith, P. L. & Ratcliff, R. Psychology and neurobiology of simple decisions. Trends Neurosci. 27, 161–168 (2004).

13. Osborne, L. C., Lisberger, S. G. & Bialek, W. A sensory source for motor variation. Nature 437, 412–416 (2005).

14. Kaufman, M. T. & Churchland, A. K. Cognitive neuroscience: sensory noise drives bad decisions. Nature 496, 172–173 (2013).

15. Sutton, R. S. & Barto, A. G. Introduction to Reinforcement Learning (MIT Press, Cambridge, 1998).

16. Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).

17. Griffiths, T. L. & Tenenbaum, J. B. Optimal predictions in everyday cognition. Psychol. Sci. 17, 767–773 (2006).

18. Cheadle, S. et al. Adaptive gain control during human perceptual choice. Neuron 81, 1429–1441 (2014).

19. Wyart, V., Myers, N. E. & Summerfield, C. Neural mechanisms of human perceptual choice under focused and divided attention. J. Neurosci. 35, 3485–3498 (2015).

20. Abrahamyan, A., Silva, L. L., Dakin, S. C., Carandini, M. & Gardner, J. L. Adaptable history biases in human perceptual decisions. Proc. Natl Acad. Sci. USA 113, E3548–E3557 (2016).

21. Urai, A. E., Braun, A. & Donner, T. H. Pupil-linked arousal is driven by decision uncertainty and alters serial choice bias. Nat. Commun. 8, 14637 (2017).

22. Barraclough, D. J., Conroy, M. L. & Lee, D. Prefrontal cortex and decision making in a mixed-strategy game. Nat. Neurosci. 7, 404–410 (2004).

23. Akrami, A., Kopec, C. D., Diamond, M. E. & Brody, C. D. Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554, 368–372 (2018).

24. Joshi, S., Li, Y., Kalwani, R. M. & Gold, J. I. Relationships between pupil diameter and neuronal activity in the locus coeruleus, colliculi, and cingulate cortex. Neuron 89, 221–234 (2016).

25. Reimer, J. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat. Commun. 7, 13289 (2016).

26. Aston-Jones, G. & Cohen, J. D. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 28, 403–450 (2005).

27. Rajkowski, J. Correlations between locus coeruleus (LC) neural activity, pupil diameter and behavior in monkey support a role of LC in attention. Soc. Neurosci. abstr. 19, 974 (1993).

28. Eldar, E., Cohen, J. D. & Niv, Y. The effects of neural gain on attention and learning. Nat. Neurosci. 16, 1146–1153 (2013).

29. Sara, S. J. The locus coeruleus and noradrenergic modulation of cognition. Nat. Rev. Neurosci. 10, 211–223 (2009).

30. Cavanagh, J. F., Wiecki, T. V., Kochar, A. & Frank, M. J. Eye tracking and pupillometry are indicators of dissociable latent decision processes. J. Exp. Psychol. Gen. 143, 1476–1488 (2014).

31. Murphy, P. R., Vandekerckhove, J. & Nieuwenhuis, S. Pupil-linked arousal determines variability in perceptual decision making. PLoS Comput. Biol. 10, e1003854 (2014).

32. Mather, M., Clewett, D., Sakaki, M. & Harley, C. W. Norepinephrine ignites local hotspots of neuronal excitation: how arousal amplifies selectivity in perception and memory. Behav. Brain Sci. 39, e200 (2016).

33. Eldar, E., Felso, V., Cohen, J. D. & Niv, Y. A pupillary index of susceptibility to decision biases. Preprint at https://www.biorxiv.org/content/10.1101/247890v1 (2018).

34. De Gee, J. W., Knapen, T. & Donner, T. H. Decision-related pupil dilation reflects upcoming choice and individual bias. Proc. Natl Acad. Sci. USA 111, E618–E625 (2014).

35. De Gee, J. W. et al. Dynamic modulation of decision biases by brainstem arousal systems. eLife 6, e23232 (2017).

36. Murphy, P. R., Boonstra, E. & Nieuwenhuis, S. Global gain modulation generates time-dependent urgency during perceptual choice in humans. Nat. Commun. 7, 13526 (2016).

37. Hauser, T. U., Moutoussis, M., Purg, N., Dayan, P. & Dolan, R. J. Noradrenaline modulates decision urgency during sequential information gathering. Preprint at https://www.biorxiv.org/content/10.1101/252932v1 (2018).

38. Eckhoff, P., Wong-Lin, K. & Holmes, P. Optimality and robustness of a biophysical decision-making model under norepinephrine modulation. J. Neurosci. 29, 4301–4311 (2009).

39. Nassar, M. R. et al. Rational regulation of learning dynamics by pupil-linked arousal systems. Nat. Neurosci. 15, 1040–1046 (2012).

40. Krishnamurthy, K., Nassar, M. R., Sarode, S. & Gold, J. I. Arousal-related adjustments of perceptual biases optimize perception in dynamic environments. Nat. Hum. Behav. 1, 0107 (2017).

41. Yartsev, M. M., Hanks, T. D., Yoon, A. M. & Brody, C. D. Causal contribution and dynamical encoding in the striatum during evidence accumulation. eLife 7, e34929 (2018).

42. Jepma, M. & Nieuwenhuis, S. Pupil diameter predicts changes in the exploration–exploitation trade-off: evidence for the adaptive gain theory. J. Cogn. Neurosci. 23, 1587–1596 (2011).

43. Servan-Schreiber, D., Printz, H. & Cohen, J. D. A network model of catecholamine effects: gain, signal-to-noise ratio, and behavior. Science 249, 892–895 (1990).

44. Yu, A. J. & Dayan, P. Uncertainty, neuromodulation, and attention. Neuron 46, 681–692 (2005).

45. O’Reilly, J. X. et al. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc. Natl Acad. Sci. USA 110, E3660–E3669 (2013).

46. Yechiam, E. & Telpaz, A. To take risk is to face loss: a tonic pupillometry study. Front. Psychol. 2, 344 (2011).

47. Aston-Jones, G., Rajkowski, J., Kubiak, P. & Alexinsky, T. Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task. J. Neurosci. 14, 4467–4480 (1994).

## Acknowledgements

The authors received no specific funding for this work. We thank M. Alberhasky, C. Andrade, D. Carrera, K. Chung, M. de Leon, Z. Dzhalilova, A. Esprit, A. Foley, E. Giron, B. Gonzalez, A. Haddad, L. Hall, M. Higgs, M. Jacobs, M.-H. Kang, K. Kellohen, N. Kwatra, H. Kyllo, A. Lawwill, S. Low, C. Lynch, A. Ornelas, G. Patterson, F. Santos, S. Savita, C. Sikora, V. Thornton, G. Vargas, C. West and C. Wong for help with running the experiments. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

## Author information

### Author notes

1. These authors contributed equally: Waitsang Keung, Todd A. Hagen.

### Affiliations

1. Department of Psychology, University of Arizona, Tucson, AZ, USA: Waitsang Keung, Todd A. Hagen & Robert C. Wilson
2. Cognitive Science Program, University of Arizona, Tucson, AZ, USA: Robert C. Wilson

### Contributions

W.K. analysed the data. T.A.H. collected and preprocessed the data. T.A.H. and R.C.W. designed the experiment. W.K. and R.C.W. wrote the manuscript. All three authors contributed to interpretation of the results and critical discussion.

### Competing interests

The authors declare no competing interests.

### Corresponding author

Correspondence to Waitsang Keung.

## Supplementary information

1. ### Supplementary Information

Supplementary Notes 1–5, Supplementary Figures 1–15, and Supplementary References.

3. ### Supplementary Software

Description: Custom code that implements the major analyses described in the paper