Deciding how long to keep waiting for future rewards is a nontrivial problem, especially when the timing of rewards is uncertain. We carried out an experiment in which human decision makers waited for rewards in two environments in which reward-timing statistics favored either a greater or lesser degree of behavioral persistence. We found that decision makers adaptively calibrated their level of persistence for each environment. Functional neuroimaging revealed signals that evolved differently during physically identical delays in the two environments, consistent with a dynamic and context-sensitive reappraisal of subjective value. This effect was observed in a region of ventromedial prefrontal cortex that is sensitive to subjective value in other contexts, demonstrating continuity between valuation mechanisms involved in discrete choice and in temporally extended decisions analogous to foraging. Our findings support a model in which voluntary persistence emerges from dynamic cost/benefit evaluation rather than from a control process that overrides valuation mechanisms.
Mischel, W. & Ebbesen, E.B. Attention in delay of gratification. J. Pers. Soc. Psychol. 16, 329–337 (1970).
Baumeister, R.F., Vohs, K.D. & Tice, D.M. The strength model of self-control. Curr. Dir. Psychol. Sci. 16, 351–355 (2007).
Bartra, O., McGuire, J.T. & Kable, J.W. The valuation system: A coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
Clithero, J.A. & Rangel, A. Informatic parcellation of the network involved in the computation of subjective value. Soc. Cogn. Affect. Neurosci. 9, 1289–1302 (2013).
Liu, X., Hairston, J., Schrier, M. & Fan, J. Common and distinct networks underlying reward valence and processing stages: a meta-analysis of functional neuroimaging studies. Neurosci. Biobehav. Rev. 35, 1219–1236 (2011).
Levy, D.J. & Glimcher, P.W. The root of all value: A neural common currency for choice. Curr. Opin. Neurobiol. 22, 1027–1038 (2012).
Kable, J.W. & Glimcher, P.W. The neural correlates of subjective value during intertemporal choice. Nat. Neurosci. 10, 1625–1633 (2007).
Kable, J.W. & Glimcher, P.W. An “as soon as possible” effect in human intertemporal decision making: Behavioral evidence and neural mechanisms. J. Neurophysiol. 103, 2513–2531 (2010).
Hare, T.A., Camerer, C.F. & Rangel, A. Self-control in decision-making involves modulation of the vmPFC valuation system. Science 324, 646–648 (2009).
Mischel, W., Ayduk, O. & Mendoza-Denton, R. Sustaining delay of gratification over time: a hot-cool systems perspective. in Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice (eds. Loewenstein, G., Read, D. & Baumeister, R.F.) 175–200 (Russell Sage Foundation, New York, 2003).
Metcalfe, J. & Mischel, W. A hot/cool-system analysis of delay of gratification: dynamics of willpower. Psychol. Rev. 106, 3–19 (1999).
McGuire, J.T. & Kable, J.W. Rational temporal predictions can underlie apparent failures to delay gratification. Psychol. Rev. 120, 395–410 (2013).
McGuire, J.T. & Kable, J.W. Decision makers calibrate behavioral persistence on the basis of time-interval experience. Cognition 124, 216–226 (2012).
Rachlin, H. The Science of Self Control (Harvard University Press, 2000).
Dasgupta, P. & Maskin, E. Uncertainty and hyperbolic discounting. Am. Econ. Rev. 95, 1290–1299 (2005).
Kim, H., Shimojo, S. & O'Doherty, J.P. Overlapping responses for the expectation of juice and money rewards in human ventromedial prefrontal cortex. Cereb. Cortex 21, 769–776 (2011).
Hare, T.A., Malmaud, J. & Rangel, A. Focusing attention on the health aspects of foods changes value signals in vmPFC and improves dietary choice. J. Neurosci. 31, 11077–11087 (2011).
Hampton, A.N., Bossaerts, P. & O'Doherty, J.P. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J. Neurosci. 26, 8360–8367 (2006).
Daw, N.D., Gershman, S.J., Seymour, B., Dayan, P. & Dolan, R.J. Model-based influences on humans' choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Hutcherson, C.A., Plassmann, H., Gross, J.J. & Rangel, A. Cognitive regulation during decision making shifts behavioral control between ventromedial and dorsolateral prefrontal value systems. J. Neurosci. 32, 13543–13554 (2012).
Casey, B.J. et al. Behavioral and neural correlates of delay of gratification 40 years later. Proc. Natl. Acad. Sci. USA 108, 14998–15003 (2011).
Figner, B. et al. Lateral prefrontal cortex and self-control in intertemporal choice. Nat. Neurosci. 13, 538–539 (2010).
Heatherton, T.F. & Wagner, D.D. Cognitive neuroscience of self-regulation failure. Trends Cogn. Sci. 15, 132–139 (2011).
Charnov, E.L. Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, 129–136 (1976).
McNamara, J. Optimal patch use in a stochastic environment. Theor. Popul. Biol. 21, 269–288 (1982).
Hayden, B.Y., Pearson, J.M. & Platt, M.L. Neuronal basis of sequential foraging decisions in a patchy environment. Nat. Neurosci. 14, 933–939 (2011).
Fawcett, T.W., McNamara, J.M. & Houston, A.I. When is it adaptive to be patient? A general framework for evaluating delayed rewards. Behav. Processes 89, 128–136 (2012).
Nickerson, R.S. Response time to the second of two successive signals as a function of absolute and relative duration of intersignal interval. Percept. Mot. Skills 21, 3–10 (1965).
Griffiths, T.L. & Tenenbaum, J.B. Optimal predictions in everyday cognition. Psychol. Sci. 17, 767–773 (2006).
Montague, P.R., Dayan, P. & Sejnowski, T.J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
Fiorillo, C.D., Newsome, W.T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).
Cui, X., Stetson, C., Montague, P.R. & Eagleman, D.M. Ready…go: amplitude of the fMRI signal encodes expectation of cue arrival time. PLoS Biol. 7, e1000167 (2009).
Bueti, D., Bahrami, B., Walsh, V. & Rees, G. Encoding of temporal probabilities in the human brain. J. Neurosci. 30, 4343–4352 (2010).
Tootell, R.B.H. et al. The retinotopy of visual spatial attention. Neuron 21, 1409–1422 (1998).
Lacey, J.I. & Lacey, B.C. Some autonomic-central nervous system interrelationships. in Physiological Correlates of Emotion (ed. Black, P.) 205–227 (Academic Press, New York, 1970).
Schweighofer, N. et al. Humans can adopt optimal discounting strategy under real-time constraints. PLoS Comput. Biol. 2, e152 (2006).
Loewenstein, G. Anticipation and the valuation of delayed consumption. Econ. J. 97, 666–684 (1987).
Duckworth, A.L., Gendler, T.S. & Gross, J.J. Self-control in school-age children. Educ. Psychol. 49, 199–217 (2014).
Jimura, K., Chushak, M.S. & Braver, T.S. Impulsivity and self-control during intertemporal decision making linked to the neural dynamics of reward value representation. J. Neurosci. 33, 344–357 (2013).
Helfinstein, S.M. et al. Predicting risky choices from brain activity patterns. Proc. Natl. Acad. Sci. USA 111, 2470–2475 (2014).
Rushworth, M.F.S., Kolling, N., Sallet, J. & Mars, R.B. Valuation and decision-making in frontal cortex: one or many serial or parallel systems? Curr. Opin. Neurobiol. 22, 946–955 (2012).
Shenhav, A., Straccia, M.A., Cohen, J.D. & Botvinick, M.M. Anterior cingulate engagement in a foraging context reflects choice difficulty, not foraging value. Nat. Neurosci. 17, 1249–1254 (2014).
Blanchard, T.C. & Hayden, B.Y. Neurons in dorsal anterior cingulate cortex signal postdecisional variables in a foraging task. J. Neurosci. 34, 646–655 (2014).
McClure, S.M., Berns, G.S. & Montague, P.R. Temporal prediction errors in a passive learning task activate human striatum. Neuron 38, 339–346 (2003).
Hollerman, J.R. & Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309 (1998).
Berns, G.S., McClure, S.M., Pagnoni, G. & Montague, P.R. Predictability modulates human brain response to reward. J. Neurosci. 21, 2793–2798 (2001).
Hare, T.A., O'Doherty, J., Camerer, C.F., Schultz, W. & Rangel, A. Dissociating the role of the orbitofrontal cortex and the striatum in the computation of goal values and prediction errors. J. Neurosci. 28, 5623–5630 (2008).
Howe, M.W., Tierney, P.L., Sandberg, S.G., Phillips, P.E.M. & Graybiel, A.M. Prolonged dopamine signaling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
Miyazaki, K.W. et al. Optogenetic activation of dorsal raphe serotonin neurons enhances patience for future rewards. Curr. Biol. 24, 2033–2040 (2014).
Janssen, P. & Shadlen, M.N. A representation of the hazard rate of elapsed time in macaque area LIP. Nat. Neurosci. 8, 234–241 (2005).
Brainard, D.H. The psychophysics toolbox. Spat. Vis. 10, 433–436 (1997).
Pelli, D.G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10, 437–442 (1997).
Luhmann, C.C., Chun, M.M., Yi, D.-J., Lee, D. & Wang, X.-J. Neural dissociation of delay and uncertainty in intertemporal choice. J. Neurosci. 28, 14459–14466 (2008).
Ungemach, C., Chater, N. & Stewart, N. Are probabilities overweighted or underweighted when rare outcomes are experienced (rarely)? Psychol. Sci. 20, 473–479 (2009).
Fitzgerald, T.H., Seymour, B., Bach, D.R. & Dolan, R.J. Differentiable neural substrates for learned and described value and risk. Curr. Biol. 20, 1823–1829 (2010).
Hertwig, R., Barron, G., Weber, E.U. & Erev, I. Decisions from experience and the effect of rare events in risky choice. Psychol. Sci. 15, 534–539 (2004).
Kaplan, E.L. & Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958).
Gibbon, J. Scalar expectancy theory and Weber's law in animal timing. Psychol. Rev. 84, 279–325 (1977).
Rakitin, B.C. et al. Scalar expectancy theory and peak-interval timing in humans. J. Exp. Psychol. Anim. Behav. Process. 24, 15–33 (1998).
Jenkinson, M. & Smith, S. A global optimisation method for robust affine registration of brain images. Med. Image Anal. 5, 143–156 (2001).
Smith, S.M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23, S208–S219 (2004).
Jenkinson, M., Beckmann, C.F., Behrens, T.E.J., Woolrich, M.W. & Smith, S.M. FSL. Neuroimage 62, 782–790 (2012).
Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17, 825–841 (2002).
Cox, R.W. AFNI: What a long strange trip it's been. Neuroimage 62, 743–747 (2012).
Cox, R.W. AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29, 162–173 (1996).
Greve, D.N. & Fischl, B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage 48, 63–72 (2009).
Nichols, T.E. & Holmes, A.P. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum. Brain Mapp. 15, 1–25 (2002).
This research was supported by NIH grants DA030870 to J.T.M. and DA029149 to J.W.K.
The authors declare no competing financial interests.
Integrated supplementary information
A secondary behavioral analysis assessed the trajectory of persistence behavior over the course of the experiment in each condition. We estimated the timecourse of willingness to wait (WTW) in each condition across preliminary training, pre-scan practice, and fMRI runs (plot shows mean ± SEM). Within each pair of adjacent runs (e.g., training runs 1–2), participants differed in whether the HP or LP condition was presented first. Cumulative minutes 0–20 are the preliminary training session, minutes 20–45 are from the day of scanning, and minutes 25–45 represent data collected in the scanner (the data used in all other analyses in the paper).
WTW timecourses were estimated using a nonparametric procedure described previously (McGuire & Kable, 2012). WTW at each point in the experiment was estimated as the longest time waited since the last quit trial. The estimate is necessarily only an approximation; we lack full moment-by-moment information about WTW because reward delivery events censor our observation of participants’ waiting times. This means there can be a lag before increases in WTW are reflected in the estimated timecourse (in particular, the gradual rise at the beginning of individual runs may be an artifact of the estimator).
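The estimation rule described above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the `Trial` record and function name are hypothetical, and run boundaries are ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    waited: float  # seconds elapsed before the trial ended
    quit: bool     # True if the participant quit; False if rewarded

def wtw_timecourse(trials):
    """Approximate willingness to wait (WTW) after each trial.

    Rule: WTW at each point is the longest time waited since the last
    quit trial. A quit at time t reveals the participant's limit
    directly; a reward only censors the wait, so it can merely raise
    the running lower bound.
    """
    estimates = []
    longest_since_quit = 0.0
    for tr in trials:
        if tr.quit:
            # A quit shows exactly how long the participant would wait.
            longest_since_quit = tr.waited
        else:
            # A reward censors the wait; it can only raise the bound.
            longest_since_quit = max(longest_since_quit, tr.waited)
        estimates.append(longest_since_quit)
    return estimates
```

The censoring noted in the text is visible here: after a quit, a long rewarded wait immediately raises the estimate, but increases in underlying WTW that happen to be interrupted by early rewards surface only with a lag.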
Mean behavior was stable during scanning. We estimated subject-wise linear slopes in each condition across the fMRI runs (25–45 min); neither differed significantly from zero (HP: median slope = 0.00, IQR -0.22 to 0.13, signed-rank p = 0.970; LP: median slope = -0.07, IQR -0.62 to 0.19, signed-rank p = 0.391). We also estimated slopes for each run and condition individually (10 tests); there was a significant negative slope in the first training run of the LP condition (signed-rank p < 0.001) but not in any of the other 9 condition × run combinations (0.06 < p < 0.97).
We further confirmed the stability of behavior within the fMRI experiment by calculating AUC for each run individually (cf. Fig. 2b). We observed strong Spearman correlations between an individual's two HP runs (ρ = 0.81, n = 20, p < 0.001) and between the two LP runs (ρ = 0.80, n = 20, p < 0.001).
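Because reward deliveries censor waiting times, a persistence summary of this kind is naturally computed as the area under a Kaplan-Meier survival curve of waiting times (Kaplan & Meier, 1958). A minimal sketch, assuming per-trial waiting times and quit indicators (function name and tie handling are simplifications, not the authors' code):

```python
def km_survival_auc(waits, quit, t_max):
    """Area under the Kaplan-Meier survival curve of waiting times,
    truncated at t_max, as a summary of overall persistence.

    waits: waiting time on each trial (seconds)
    quit:  True if the trial ended in a quit (event observed);
           False if reward delivery censored the wait
    """
    data = sorted(zip(waits, quit))
    auc, t_prev, surv = 0.0, 0.0, 1.0
    at_risk = len(data)
    for t, is_quit in data:
        t_clip = min(t, t_max)
        auc += surv * (t_clip - t_prev)  # integrate the step function
        t_prev = t_clip
        if is_quit and at_risk > 0:
            surv *= 1.0 - 1.0 / at_risk  # KM step-down at observed quits
        at_risk -= 1
        if t_prev >= t_max:
            break
    return auc + surv * max(0.0, t_max - t_prev)
```

Censored (rewarded) trials reduce the risk set without stepping the curve down, so long rewarded waits properly raise the AUC estimate rather than being discarded.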
The plot suggests a possible discontinuity between the training session and the fMRI session (at the 20-min point). One possible interpretation is that participants adopted a strategy of exploratory information-gathering during training, when tokens were less valuable, and shifted to a more exploitative strategy when token values increased from 10¢ to 30¢ on the day of the fMRI session. A second possibility is that it took several exposures for participants to learn to disambiguate the two environments.
Our theoretical model computes an awaited reward’s subjective value (SV) as the sum of two terms: expected reward value (EV) and expected time cost (Panels A–C; see Methods, Eq. 2). The EV term factors in the subjective probability of obtaining the reward on the current trial (rather than quitting). The time cost term factors in the number of additional seconds the agent expects to spend waiting on the current trial (before either giving up or receiving the reward). Both terms vary as a function of elapsed delay time, and depend on the agent's temporal expectations and intended giving-up time. The values plotted in Panels A–C assume the ideal giving-up time of 20s in the LP condition. This ideal strategy can be identified by maximizing total SV (Methods, Eq. 3), but cannot be identified by maximizing the terms in B or C individually. (A strategy of always waiting would maximize EV, whereas a strategy of never waiting would minimize the time cost.)
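The maximization over giving-up times can be illustrated with a simple grid search over candidate policies. This is a sketch under assumed inputs (a sample of scheduled delays, an illustrative reward magnitude, and a hypothetical inter-trial interval), not the task's actual parameters:

```python
import numpy as np

def optimal_giving_up_time(delays, reward, candidates, iti=2.0):
    """Grid-search the giving-up time T that maximizes long-run reward rate.

    delays:     sample of scheduled reward delays (seconds)
    reward:     payoff per obtained reward (illustrative units)
    candidates: giving-up times to evaluate
    iti:        assumed inter-trial interval (hypothetical value)
    """
    delays = np.asarray(delays, dtype=float)
    best_T, best_rate = None, float("-inf")
    for T in candidates:
        obtained = delays <= T  # reward arrives before the agent quits
        exp_reward = reward * obtained.mean()  # expected reward per trial
        # Expected time per trial: the delay if reward arrives first,
        # otherwise the full giving-up time, plus the inter-trial interval.
        exp_time = np.where(obtained, delays, T).mean() + iti
        rate = exp_reward / exp_time
        if rate > best_rate:
            best_T, best_rate = T, rate
    return best_T, best_rate
```

With delays concentrated at short values (HP-like), the rate keeps improving as T grows, so indefinite waiting is best; with a heavy-tailed delay distribution (LP-like), the rate peaks at an intermediate T, reproducing the qualitative point that limited persistence can be optimal.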
In our paradigm, the different SV trajectories between the HP and LP environments were mainly driven by the EV term. As time passed in the LP environment, it became more likely that the scheduled delay would exceed the agent’s giving-up time and the reward would not be obtained. The dynamic estimate of time cost alone did not follow markedly different trajectories in the two environments.
Accordingly, the SV-related brain responses that we observed in VMPFC could be alternatively described as encoding an EV estimate that evolved dynamically on the basis of temporal expectations. To verify this, we extracted trial-onset-locked timecourses for HRF-convolved EV using the same methods as our main analysis (Panels D–E; see Methods; equivalent results for time cost are shown for reference in Panel F). Although the SV and EV timecourses look dissimilar, the predicted difference between the HP and LP conditions was nearly identical between SV and EV (Panels G–H; median r2 = 0.85, IQR 0.83 to 0.88; time cost results shown for reference in Panel I). A whole-brain analysis of EV effects identified a significant VMPFC cluster similar to the SV-related cluster shown in Fig. 4a (257 voxels, corrected p = 0.020).
Therefore, although the responsiveness of VMPFC to delay-related costs is well established in other tasks, it remains for future work to establish definitively whether VMPFC encodes time costs per se in the willingness-to-wait paradigm. However, an effect either of total SV or of EV alone would support our main conclusion that VMPFC encodes a dynamic and context-dependent value signal in a foraging-like decision context.
Amount of data available at various lags from trial onset. A: Median number of trials available per subject for each timepoint (with IQR). Dashed line marks the 30s window analyzed. B: Number of subjects with data in both environments for each timepoint.
Although our behavioral results demonstrated a direct relationship between persistence and the subjective value of the awaited reward, alternative theoretical frameworks might posit that persistence depends on control processes whose engagement varies inversely with value. For example, it might be necessary to engage control processes in order to sustain persistence when an awaited reward’s value is in doubt.
Our two-tailed model-based contrast (Fig. 4) could in principle have detected BOLD effects negatively related to subjective value, but no such effects were found. The strongest sub-threshold negative cluster was in left superior occipital cortex (37 voxels; corrected p = 0.226).
It is possible that such an analysis would be more sensitive if it were restricted to timepoints when participants went on to continue waiting. For example, control processes might have become more strongly engaged as the awaited reward’s subjective value decreased, provided that persistence was indeed sustained further. To investigate this possibility we conducted a secondary analysis in which BOLD timecourses were estimated only from data that preceded the end of a trial by a margin of 5s or more (in contrast to the margin of 1s used in our main analyses). For example, the timepoint coefficient at 30s was estimated only from trials that lasted at least 35s. If control processes were engaged to sustain persistence as subjective value decreased, then cognitive-control-related activations might have emerged as negative BOLD effects of subjective value in this analysis.
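The trial-selection rule for this secondary analysis can be expressed compactly. The helper below is a hypothetical illustration (not the authors' code): at each post-onset timepoint it counts the trials whose durations exceed the timepoint by the required margin.

```python
def trials_available(durations, timepoints, margin=5.0):
    """For each post-onset timepoint t, count the trials whose total
    duration is at least t + margin, so that the estimate at t is not
    contaminated by trial-ending events (reward delivery or quitting)."""
    return {t: sum(d >= t + margin for d in durations) for t in timepoints}
```

Raising the margin from 1s to 5s shifts this threshold for every timepoint, which is why late timepoints lose a disproportionate share of trials.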
A disadvantage of this analysis strategy was that it severely curtailed the amount of available data. The number of available trials at later timepoints was reduced by more than 30% for the median participant (compare Panel A to Supplementary Fig. 3a), and some participants were eliminated entirely (compare Panel B to Supplementary Fig. 3b). Accordingly, neither positive nor negative effects were significant when timecourses estimated in this manner were submitted to a two-tailed model-based contrast (cf. Fig. 4). As in our primary analysis, the strongest sub-threshold negative effect was in a left superior occipital cluster (27 voxels, corrected p = 0.29).
The paucity of data for this analysis was a direct consequence of participants’ overall pattern of successful, value-sensitive behavioral calibration. Participants tended not to persist very long in the absence of a valuable future prospect. An internal control process that enforced persistence in such circumstances would not have been beneficial, at least in the present experimental task.
Occipitoparietal cluster in which the amplitude of the reward-related brain response was positively modulated by the duration of the preceding delay in the HP condition (398 voxels; local peaks in left [-9,-81,12] and right [12,-78,9] calcarine sulci and posterior parietal cortex [12,-78,45]). This region showed a higher-amplitude response to rewards that arrived after longer delays. Rewards at longer delays theoretically involved higher levels of expectancy (Fig. 2c), and were also associated with faster reaction times (Fig. 2d) and larger changes in heart rate (Fig. 7b). The analysis did not find any regions in which reward-related BOLD amplitude decreased as a function of expectancy, a pattern that would be characteristic of a reward prediction error signal.
We recalculated the predictions of our theoretical model using subject-specific empirical estimates of the richness of the environment. Environmental richness is used in our model to define the opportunity cost of time (i.e., the gains one might expect to attain by quitting, akin to abandoning a food patch to forage elsewhere). Because participants’ behavior fell short of optimality, it is reasonable to suppose they had a lower-than-optimal estimate of the richness of the environment.
Actual rates of reward were calculated from the fMRI sessions for each subject in each condition. Median reward rate was 1.10¢/s in the HP environment (range 0.80 to 1.20; optimal=1.22) and 0.63¢/s in the LP environment (range 0.40 to 0.73; optimal=0.82). We calculated performance-based theoretical subjective value trajectories for each subject and condition by using these observed reward rates to define the opportunity cost of time.
Plotted is the median performance-based subjective value trajectory in each condition (with IQR; dotted lines represent optimal trajectories from Fig. 3a). Incorporating the lower-than-optimal empirical reward rates tended to increase the subjective value of waiting, especially early in the delay, because delay time was treated as incurring a smaller opportunity cost. Individual subjects’ performance-based subjective value trajectories across 0–30s were nonetheless highly correlated with the optimal trajectories (HP: median r2=1.00, IQR 1.00 to 1.00; LP: median r2=0.91, IQR 0.84 to 0.92).
The modified procedure for calculating subjective value did not change the results of our fMRI analyses. For each subject we generated synthetic BOLD timecourses encoding performance-based subjective value and passed these through our fMRI analysis to obtain predicted trial-onset-locked BOLD trajectories (analogous to Fig. 3d; see Methods for details). The resulting performance-based difference timecourses (HP minus LP) were highly correlated with the original difference timecourses (median r2=1.00, IQR 0.99 to 1.00), and using this version of the model-derived regressor yielded the same pattern of whole-brain and ROI results described in the main text.
Using performance-based subjective value also did not alter the results for the stochastic behavioral choice model (Fig. 3b). The two variants of the model yielded equivalent fits to the data (difference of model deviances: median=-3.50, IQR -8.22 to 10.72, signed-rank p=0.852).
McGuire, J., Kable, J. Medial prefrontal cortical activity reflects dynamic re-evaluation during voluntary persistence. Nat Neurosci 18, 760–766 (2015). https://doi.org/10.1038/nn.3994