Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops

Abstract

Evaluation of both immediate and future outcomes of one's actions is a critical requirement for intelligent behavior. Using functional magnetic resonance imaging (fMRI), we investigated brain mechanisms for reward prediction at different time scales in a Markov decision task. When human subjects learned actions on the basis of immediate rewards, significant activity was seen in the lateral orbitofrontal cortex and the striatum. When subjects learned to act in order to obtain large future rewards while incurring small immediate losses, the dorsolateral prefrontal cortex, inferior parietal cortex, dorsal raphe nucleus and cerebellum were also activated. Computational model–based regression analysis using the predicted future rewards and prediction errors estimated from subjects' performance data revealed graded maps of time scale within the insula and the striatum: ventroanterior regions were involved in predicting immediate rewards and dorsoposterior regions were involved in predicting future rewards. These results suggest differential involvement of the cortico-basal ganglia loops in reward prediction at different time scales.
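As a rough illustration of how reward prediction and prediction error depend on the time scale, the following minimal sketch runs tabular temporal-difference learning on a toy three-state cycle with two settings of the discount factor γ. The function name `td_value_learning`, the reward schedule (two small losses followed by one large gain), the learning rate and the number of sweeps are all assumptions made for illustration; this is not the task or the model fitting procedure used in the study.

```python
def td_value_learning(rewards, gamma, alpha=0.1, sweeps=2000):
    """Tabular TD(0) on a deterministic cycle of states 0 -> 1 -> ... -> 0.

    rewards[s] is the (hypothetical) reward received on leaving state s.
    Returns the learned state values and the last prediction error per state.
    """
    n = len(rewards)
    values = [0.0] * n
    deltas = [0.0] * n
    for _ in range(sweeps):
        for s in range(n):
            s_next = (s + 1) % n
            # Prediction error: reward plus discounted next value minus current value.
            deltas[s] = rewards[s] + gamma * values[s_next] - values[s]
            # Move the current prediction a small step toward the target.
            values[s] += alpha * deltas[s]
    return values, deltas


# Hypothetical schedule: two small immediate losses, then one large gain.
rewards = [-20.0, -20.0, 100.0]

for gamma in (0.0, 0.9):
    values, _ = td_value_learning(rewards, gamma)
    print(gamma, [round(v, 1) for v in values])
```

With γ = 0 the learned value of each state simply mirrors its immediate reward, so the first state looks unattractive. With γ = 0.9 the same state acquires a positive value because the prediction now includes the large reward two steps ahead.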

Figure 1: Experimental design.
Figure 2: Task schedule and behavioral results.
Figure 3: Brain areas activated in the SHORT versus NO contrast (P < 0.001, uncorrected; extent threshold of four voxels).
Figure 4: Brain areas activated in the LONG versus SHORT contrast (P < 0.0001, uncorrected; extent threshold of four voxels for illustration purposes).
Figure 5: Comparison of brain areas activated in the SHORT versus NO contrast (red) and the LONG versus SHORT contrast (blue).
Figure 6: Voxels with a significant correlation (height threshold P < 0.001, uncorrected; extent threshold of four voxels) with reward prediction V(t) and prediction error δ(t) are shown in different colors for different settings of the discount factor γ.
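
For readers unfamiliar with the notation in the Figure 6 caption, the quantities can be written in the usual temporal-difference form. This is the standard textbook formulation, given here as an assumption about the notation; the exact time indexing used in the full article may differ.

```latex
% Reward prediction: expected sum of future rewards r, discounted by \gamma (0 \le \gamma < 1)
V(t) = \mathbb{E}\left[ r(t+1) + \gamma\, r(t+2) + \gamma^{2} r(t+3) + \cdots \right]

% Prediction error: mismatch between the previous prediction and the
% reward received plus the discounted current prediction
\delta(t) = r(t) + \gamma\, V(t) - V(t-1)
```

A small γ makes V(t) depend almost entirely on the next reward, whereas a γ close to 1 gives substantial weight to rewards many steps ahead.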

Acknowledgements

We thank K. Samejima, N. Schweighofer, M. Haruno, H. Imamizu, S. Higuchi, T. Yoshioka, T. Chaminade and M. Kawato for helpful discussions and technical advice. This research was funded by 'Creating the Brain,' Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency.

Author information

Corresponding author

Correspondence to Kenji Doya.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figure 1

An example of the time series of the explanatory variables for one subject. (PDF 34 kb)

Supplementary Figure 2

A schematic diagram of the brain areas involved in reward prediction at different time scales. The dotted lines indicate cortico-cortical connections, and the green arrows indicate the serotonergic pathways from the dorsal raphe. The 'limbic loop' (including lateral OFC and ventral striatum) is involved in short-term reward prediction. The 'cognitive and motor loops' (including DLPFC, PMd and dorsal striatum) are involved in long-term reward prediction. Ventroanterior-to-dorsoposterior topographical projections from the insula to the striatum are involved in short-to-long-term reward prediction (rainbow-colored arrow). The mPFC and dorsal raphe, which are reciprocally connected, may regulate these loops by cortico-cortical and cortico-striatal projections from mPFC and serotonergic projections from dorsal raphe. SNr, substantia nigra pars reticulata. (PDF 415 kb)

Supplementary Table 1

Areas significantly activated in the block-design analysis. (PDF 11 kb)

Supplementary Table 2

Areas with significant correlation with reward prediction V(t) estimated with different discount factors γ. (PDF 15 kb)

Supplementary Table 3

Voxels with significant correlation with reward prediction error δ(t) estimated with different discount factors γ. (PDF 15 kb)

About this article

Cite this article

Tanaka, S., Doya, K., Okada, G. et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nat Neurosci 7, 887–893 (2004). https://doi.org/10.1038/nn1279
