Environments furnish multiple information sources for making predictions about future events. Here we use behavioural modelling and functional magnetic resonance imaging to describe how humans select predictors that might be most relevant. First, during early encounters with potential predictors, participants’ selections were explorative and directed towards subjectively uncertain predictors (positive uncertainty effect). This was particularly the case when many future opportunities remained to exploit knowledge gained. Then, preferences for accurate predictors increased over time, while uncertain predictors were avoided (negative uncertainty effect). The behavioural transition from positive to negative uncertainty-driven selections was accompanied by changes in the representations of belief uncertainty in ventromedial prefrontal cortex (vmPFC). The polarity of uncertainty representations (positive or negative encoding of uncertainty) changed between exploration and exploitation periods. Moreover, the two periods were separated by a third transitional period in which beliefs about predictors’ accuracy predominated. The vmPFC signals a multiplicity of decision variables, the strength and polarity of which vary with behavioural context.
We have deposited all raw choice data used for the analyses in the OSF repository at https://osf.io/d5qzw/?view_only=037ea3b875914623a06999cef97ac57f, and unthresholded fMRI maps of all contrasts depicted in the manuscript on NeuroVault at https://identifiers.org/neurovault.collection:8073. Source data are provided with this paper.
The OSF repository above includes the full Bayesian modelling pipeline, from which the relevant behavioural and neural regressors were derived. We also provide the code for the behavioural GLMs shown in Fig. 3. See the README file inside the repository for usage details: https://osf.io/d5qzw/?view_only=037ea3b875914623a06999cef97ac57f.
N.T. was funded by a DTC ESRC studentship (ES/J500112/1); J.S. was supported by an MRC Skills Development Fellowship (MR/NO14448/1); M.C.K.-F. by a Sir Henry Wellcome Fellowship (103184/Z/13/Z); M.F.S.R. by a Wellcome Senior Investigator Award (WT100973AIA); and E.F. by a UKRI Future Leaders Fellowship (MR/T023007/1). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank all members of the Rushworth lab for great discussions on this project.
The authors declare no competing interests.
Peer review information Primary Handling Editor: Marike Schiffer
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Participants might form expectations about possible sigma values across time. If so, prior beliefs at block start should reflect information gathered from past observations. We constructed a Bayesian model with block-wise priors that reflect the previous history of all observations irrespective of predictor (adaptive model; green) and compared it with the Bayesian model using uniform priors (original model; grey). b, All effects of interest were replicated when belief estimates were derived from the adaptive Bayesian model (accuracy, t(23) = 4.7, p < 0.001, d = 0.96, 95% CI = [0.91, 2.3]; uncertainty, t(23) = −1.2, p = 0.25, d = −0.24, 95% CI = [−0.77, 0.21]; uncertainty × block time, t(23) = 6, p < 0.001, d = 1.2, 95% CI = [0.83, 1.73]; accuracy × block time, t(23) = −2.6, p = 0.015, d = −0.54, 95% CI = [−1.1, −0.13]). c, The original Bayesian model fitted better than the adaptive Bayesian model in all behavioural analyses (only the analysis across all trials is shown here). One reason might be that a uniform prior gives estimates more flexibility to converge towards their true values over time. d, Importantly, the variables in both the behavioural and the neural analyses are constructed in relative terms (for example, the difference between the left and right predictors). Changing the prior distribution therefore affects only absolute values, whereas relative values keep the same proportions. For this reason, the key results remain unchanged when the initial block-wise priors are modified. e, Next, we used the confidence judgement (that is, interval size) at each block start, averaged across the four predictors, as an index of prior beliefs and compared it across all six blocks. Confidence judgements are shown at the start (red line) and at the end (blue line) of each block. Participants reset their prior beliefs from one block to the next (blue line).
Analysis of the interval size showed no credible evidence for a change in confidence judgements at the first encounters across blocks when blocks affected by practice trials were excluded. For the red line, Mauchly's test indicated a violation of sphericity (χ²(9) = 23, p = 0.006), so the Greenhouse–Geisser correction was applied (F(2.6, 92) = 2.5, p = 0.08, η² = 0.71, BF10 = 0.91, error% = 0.36). See Supplementary Methods, Section 2, for detailed model construction and the confidence judgement analysis (n = 24; error bars are SEM across participants). Source data
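Panels a–d contrast the original model (uniform prior) with the adaptive model (prior built from past observations). That contrast can be sketched with a grid posterior over one predictor's noise level. This is a minimal illustration under assumptions the legend does not state:

- prediction errors are zero-mean Gaussian with unknown sigma;
- the grid `SIGMA_GRID` and its range are hypothetical;
- the adaptive prior is formed here by updating a flat prior with all earlier errors, pooled across predictors.

The authors' actual model is described in Supplementary Methods, Section 2.

```python
import numpy as np

# Hypothetical grid of candidate noise levels (sigma) for one predictor.
SIGMA_GRID = np.linspace(1.0, 50.0, 200)


def uniform_prior(grid=SIGMA_GRID):
    """Flat prior over sigma, as in the 'original' model at each block start."""
    p = np.ones_like(grid)
    return p / p.sum()


def update(prior, errors, grid=SIGMA_GRID):
    """Posterior over sigma after observing zero-mean Gaussian prediction errors."""
    errors = np.asarray(errors, dtype=float)
    # Log-likelihood of all errors under N(0, sigma^2), up to a constant.
    loglik = -errors.size * np.log(grid) - (errors ** 2).sum() / (2 * grid ** 2)
    # Clip so that grid points with (numerically) zero prior mass stay finite.
    logpost = np.log(np.clip(prior, 1e-300, None)) + loglik
    logpost -= logpost.max()  # guard against underflow before exponentiating
    post = np.exp(logpost)
    return post / post.sum()


def adaptive_prior(past_errors, grid=SIGMA_GRID):
    """Hypothetical 'adaptive' prior: a flat prior updated with all earlier
    errors, pooled irrespective of predictor."""
    return update(uniform_prior(grid), past_errors, grid)


def summarize(post, grid=SIGMA_GRID):
    """Posterior mean of sigma and posterior SD (belief uncertainty)."""
    mean = float((post * grid).sum())
    sd = float(np.sqrt((post * (grid - mean) ** 2).sum()))
    return mean, sd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    past = rng.normal(0.0, 10.0, size=200)  # simulated errors from earlier blocks
    first_errors = [4.0, -6.0]              # first two errors in a new block
    for name, prior in [("uniform", uniform_prior()),
                        ("adaptive", adaptive_prior(past))]:
        mean, sd = summarize(update(prior, first_errors))
        print(f"{name:>8}: mean sigma = {mean:.1f}, sd = {sd:.1f}")
```

Under these assumptions, the posterior mean of sigma (lower means a more accurate predictor) and its posterior SD (belief uncertainty) are the kind of quantities that would enter left-minus-right regressors. With a concentrated adaptive prior, the first few observations in a block move the estimate less than with a flat prior.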
The payoff scheme reflects participants’ beliefs in the accuracies and certainties associated with the selected predictors; however, it may itself exert an additional, independent effect on behaviour and neural activity. We therefore constructed a reinforcement learning (RL) model that tracked each predictor’s payoff history in a recency-weighted way and compared it with the Bayesian model using uniform priors (original model).

a, The behavioural effects of interest (Fig. 3a) were replicated when controlling for the RL-derived value difference (yellow):
- accuracy, t(23) = 5.5, p < 0.001, d = 1.1, 95% CI = [0.48, 1.1];
- uncertainty, t(23) = −3.1, p = 0.0049, d = −0.63, 95% CI = [−0.75, −0.15];
- uncertainty × block time, t(23) = 5.2, p < 0.001, d = 1.1, 95% CI = [0.49, 1.13];
- accuracy × block time, t(23) = −6.8, p < 0.001, d = −1.4, 95% CI = [−0.84, −0.44];
- RL value difference, t(23) = 11.9, p < 0.001, d = 2.43, 95% CI = [0.7, 0.99].

b, This is consistent with the relatively weak correlations between the variables derived from the Bayesian model and the RL-derived value difference. c, A combination (red bar) of the RL value model (yellow bar) and the original Bayesian model (grey bar) fitted choice behaviour best. This supports the relevance of both value-based and information-based variables for explaining choices. d, We repeated the previous whole-brain analysis (fMRI-GLM1) with the RL value difference as an additional regressor. The domain-general prediction difference in vmPFC was replicated (upper panel). The RL-derived value difference produced no cluster-significant activation (lower panel), although its strongest subthreshold activation was located within vmPFC. In conclusion, RL value terms complement, but do not substitute for, the Bayesian model terms as an explanation of behaviour (n = 24; error bars are SEM across participants; whole-brain effects family-wise error cluster corrected with z > 2.3 and p < 0.05). Source data
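Recency-weighted payoff tracking of the kind described above is commonly implemented with a delta rule, V ← V + α(r − V). The sketch below assumes that form, plus several details the legend does not specify:

- a hypothetical learning rate `alpha`;
- four predictors;
- only the chosen predictor's value is updated.

```python
import numpy as np


def rl_values(chosen, payoffs, n_predictors=4, alpha=0.3, v0=0.0):
    """Delta-rule (recency-weighted) value of each predictor.

    chosen[t]  : index of the predictor selected on trial t
    payoffs[t] : payoff received on trial t
    Returns an array of shape (T, n_predictors) with the values held *before*
    each trial's outcome, i.e. the values available at decision time.
    """
    values = np.full(n_predictors, v0, dtype=float)
    history = np.empty((len(chosen), n_predictors))
    for t, (c, r) in enumerate(zip(chosen, payoffs)):
        history[t] = values                   # copy of pre-outcome values
        values[c] += alpha * (r - values[c])  # update only the chosen predictor
    return history


def rl_value_difference(history, left, right):
    """Trial-wise RL value difference, left minus right option."""
    rows = np.arange(len(left))
    return history[rows, left] - history[rows, right]
```

Unrolling the update shows the recency weighting. A payoff received k choices of that predictor ago (k = 0 being the most recent) carries weight α(1 − α)^k, so older payoffs decay geometrically.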
a, Model fits were better for the second than for the first block half (paired t-test: t(23) = 8.5, p < 0.001, d = 1.74, 95% confidence interval (CI) = [28.5, 46.8]). We tested whether the main effects of interest could be replicated when model fits were equalized across block halves. b, c, We used choice residuals, which provide an index of model fit but, unlike BIC measures, are specific to each trial and do not depend on the number of trials or parameters. b, Absolute choice residuals and block time for one example participant across all trials (‘short’, ‘medium’ and ‘long’ refer to the horizon length). Choice residuals were only weakly correlated with the block time variable across trials (inset shows the correlation across all participants; r = 0.27, 7% shared variance, 95% CI = [−0.15, 0.7]). This makes it very unlikely that the results are driven by linear changes in model fit alone. c, We tested this empirically. First, we extracted absolute residuals from the main GLM across all trials and separated them into first and second block halves. c-i, In accord with the model fit results, there was more residual variance in the first than in the second block half (paired t-test: t(23) = 7.7, p < 0.001, d = 1.6, 95% CI = [0.08, 0.14]). c-ii, Next, we excluded trials on the basis of their trial-wise choice residuals until there was no credible evidence that the block halves differed in residual variance. In effect, trials with residuals above 0.6 had to be excluded (paired t-test: t(23) = 1.35, p = 0.19, d = 0.28, 95% CI = [−0.004, 0.02], BF10 = 0.48, error% = 1.164 × 10⁻⁴). d, Trials were collapsed back into one category and the main GLM (Fig. 3a) was applied to the new subset of trials.
We replicated all effects of interest (accuracy: t(23) = 13.1, p < 0.001, d = 2.67, 95% CI = [2.8, 3.8]; uncertainty: t(23) = −1.14, p = 0.26, d = −0.23, 95% CI = [−1.6, 0.45]; uncertainty × block time: t(23) = 9.7, p < 0.001, d = 1.98, 95% CI = [2, 3.1]; accuracy × block time: t(23) = −6.81, p < 0.001, d = −1.39, 95% CI = [−2.8, −1.5]) (n = 24; error bars are SEM across participants). Source data
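The exclusion procedure in panel c can be sketched as follows. The sketch rests on these assumptions:

- residuals are taken as |choice − predicted probability| from the fitted choice GLM;
- for each candidate threshold, trials with larger residuals are dropped;
- the per-participant mean absolute residuals of the two block halves are then compared with a paired t-test;
- as a simple stand-in for the Bayes-factor criterion in the legend, the search stops once |t| falls below `t_cut` = 2.07, roughly the two-tailed 5% critical value for 23 degrees of freedom.

All function names are hypothetical.

```python
import numpy as np


def abs_choice_residuals(choices, p_hat):
    """|observed choice (0 or 1) - model-predicted choice probability|."""
    return np.abs(np.asarray(choices, float) - np.asarray(p_hat, float))


def paired_t(a, b):
    """Paired t statistic for per-participant values a and b."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))


def equalize_halves(resids, second_half, thresholds, t_cut=2.07):
    """Lower the exclusion threshold until block halves no longer differ.

    resids[i]      : absolute residuals of participant i (one per trial)
    second_half[i] : boolean mask, True for trials in the second block half
    Returns (threshold, t) for the first threshold meeting the criterion,
    or (None, None) if none does. Assumes each half keeps at least one trial.
    """
    for thr in sorted(thresholds, reverse=True):
        first_means, second_means = [], []
        for r, h in zip(resids, second_half):
            keep = r <= thr  # drop trials with residuals above the threshold
            first_means.append(r[keep & ~h].mean())
            second_means.append(r[keep & h].mean())
        t = paired_t(first_means, second_means)
        if abs(t) < t_cut:
            return thr, t
    return None, None
```

Scanning thresholds from high to low keeps as many trials as possible. The search stops at the first threshold at which the halves no longer differ by this criterion.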
Extended Data Fig. 4 Trial classification into exploration/exploitation according to individual choices.
A-i, We classified trials into exploration and exploitation according to participants’ choices. To this end, we compared the accuracy and uncertainty of the chosen and unchosen predictors, yielding the prediction difference (chosen minus unchosen). Explorative choices (Ai-1) were defined as those directed towards more uncertain (positive uncertainty prediction difference) and less accurate (negative accuracy prediction difference) predictors (approximately 18% of trials). Exploitative choices (Ai-2) were defined as choices of predictors with higher accuracy (positive accuracy prediction difference) and lower uncertainty (negative uncertainty prediction difference) than the unchosen predictor (approximately 52% of trials). A-ii, When participants chose predictors that were both more accurate and more uncertain (Ai-3), we compared the magnitudes of the accuracy prediction difference and the uncertainty prediction difference. If the accuracy prediction difference was greater, the trial was allocated to the exploitative bin; if the uncertainty prediction difference was greater, the trial was allocated to the exploratory bin. However, if the two decision variables differed only slightly in size (<5), the trial was assigned to both the exploration and the exploitation bins (5% of trials). The remaining 25% of trials (white area in panel i) were assigned to neither bin, because these choices were guided by neither uncertainty nor accuracy. A-iii, Example of an exploitative choice: the chosen predictor is more accurate and less uncertain than the unchosen predictor. B, As a manipulation check, we plot the prediction differences for accuracy and uncertainty separately for exploration and exploitation.
As expected, exploratory trials were characterized by a positive uncertainty prediction difference: the chosen predictor was associated with greater uncertainty than the unchosen predictor. Exploitative trials were characterized by two features. They showed a positive accuracy prediction difference (the chosen predictor was associated with greater predictive accuracy than the unchosen predictor). They also showed a negative uncertainty prediction difference (the chosen predictor was associated with lower uncertainty than the unchosen predictor). For the robustness of the trial classification, see Supplementary Fig. 8 (n = 24).
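The binning rules in panels A-i and A-ii translate directly into a small decision function. In this sketch:

- `acc_pd` and `unc_pd` are the chosen-minus-unchosen accuracy and uncertainty prediction differences;
- `tie_margin` = 5 is the legend's threshold for assigning a trial to both bins;
- treating exact zeros as "neither" is an assumption, since the legend does not say how ties at zero were handled.

```python
from collections import Counter
from enum import Enum


class Bin(Enum):
    EXPLORE = "explore"
    EXPLOIT = "exploit"
    BOTH = "both"        # assigned to both bins
    NEITHER = "neither"  # assigned to neither bin


def classify_trial(acc_pd: float, unc_pd: float, tie_margin: float = 5.0) -> Bin:
    """Bin one choice from its accuracy and uncertainty prediction differences
    (chosen minus unchosen predictor)."""
    if unc_pd > 0 and acc_pd < 0:      # more uncertain, less accurate
        return Bin.EXPLORE
    if acc_pd > 0 and unc_pd < 0:      # more accurate, less uncertain
        return Bin.EXPLOIT
    if acc_pd > 0 and unc_pd > 0:      # more accurate AND more uncertain
        if abs(acc_pd - unc_pd) < tie_margin:
            return Bin.BOTH
        return Bin.EXPLOIT if acc_pd > unc_pd else Bin.EXPLORE
    return Bin.NEITHER                 # less accurate and less uncertain, or a zero


if __name__ == "__main__":
    trials = [(-3, 10), (10, -3), (20, 5), (5, 20), (10, 8), (-2, -2)]
    counts = Counter(classify_trial(a, u) for a, u in trials)
    # prints {'explore': 2, 'exploit': 2, 'both': 1, 'neither': 1}
    print({b.value: counts[b] for b in Bin})
```

Counting the bins over all trials of a participant gives proportions comparable to the approximate 18%, 52%, 5% and 25% split reported above.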
Extended Data Fig. 5 Uncertainty-related signals in subcortical regions during exploration and exploitation.
We show subcortical activation associated with the uncertainty prediction difference during exploration, during exploitation, and for their difference. Activation is shown for the decision phase and, where relevant, for the outcome phase. We used bilateral masks and averaged the results over both hemispheres for each ROI. a, The amygdala represented the uncertainty prediction difference more strongly during exploration than during exploitation (paired t-test, uncertainty prediction difference, explore vs exploit: t(23) = 3.5, p = 0.002, d = 0.71, 95% confidence interval (CI) = [6, 23.5]). b, Activation patterns in the ventral striatum (VS) during the decision (left panel) and outcome (right panel) phases suggest that it is primarily involved during exploitation. During exploitation, VS represented a negative uncertainty prediction difference in the decision phase (t(23) = −2.4, p = 0.02, d = −0.49, 95% CI = [−15.8, −1.3]). In the outcome phase, VS represented the payoff during both exploration and exploitation, but more strongly during exploitation (paired t-test, payoff, explore vs exploit: t(23) = −2.3, p = 0.033, d = −0.47, 95% CI = [−21.2, −0.96]). c, Finally, activity in the ventral tegmental area (VTA) reflected the uncertainty prediction difference in the decision phase during both exploration and exploitation. The effect was positive during exploration (t(23) = 2.3, p = 0.03, d = 0.47, 95% CI = [1.94, 40]) and negative during exploitation (t(23) = −3, p = 0.007, d = −0.6, 95% CI = [−25.3, −4.4]) (n = 24; error bars are SEM across participants). Source data
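Across these legends, the reported effect sizes are numerically consistent with Cohen's d_z for a one-sample or paired t-test, d_z = t/√n with n = 24. For example, 3.5/√24 ≈ 0.71 for the amygdala contrast in panel a. This is an inference from the reported numbers rather than a method stated in the legends. The snippet below checks it against the t and d values from this legend.

```python
import math


def cohens_dz_from_t(t: float, n: int) -> float:
    """Cohen's d_z for a one-sample or paired t-test with n observations."""
    return t / math.sqrt(n)


def cohens_dz_from_diffs(diffs) -> float:
    """Equivalent form from raw paired differences: mean / SD (ddof = 1)."""
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in diffs) / (n - 1))
    return mean / sd


# (t, reported d) pairs from the legend above; n = 24 participants.
REPORTED = [(3.5, 0.71), (-2.4, -0.49), (-2.3, -0.47), (2.3, 0.47), (-3.0, -0.6)]

if __name__ == "__main__":
    for t, d_reported in REPORTED:
        d = cohens_dz_from_t(t, 24)
        print(f"t = {t:+.1f} -> d_z = {d:+.3f} (reported {d_reported:+.2f})")
```

Because d_z is t rescaled by a positive constant, the sign of d must always match the sign of t. This gives a quick consistency check on reported statistics.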
Supplementary Methods (1, details on task versions; 2, alternative computational models), Supplementary Figs. 1–9 (related to: 3, methods/experimental design; 4, neural results), Supplementary Tables 1–3 (6, peak coordinates of cluster-corrected whole-brain effects) and Supplementary References.
Statistical source data for Fig. 3.
Statistical source data for Fig. 6.
Statistical source data for Fig. 7.
Statistical source data for Extended Data Fig. 1.
Statistical source data for Extended Data Fig. 2.
Statistical source data for Extended Data Fig. 3.
Statistical source data for Extended Data Fig. 5.
About this article
Cite this article
Trudel, N., Scholl, J., Klein-Flügge, M.C. et al. Polarity of uncertainty representation during exploration and exploitation in ventromedial prefrontal cortex. Nat Hum Behav (2020). https://doi.org/10.1038/s41562-020-0929-3