Mapping value based planning and extensively trained choice in the human brain

Abstract

Investigations of the underlying mechanisms of choice in humans have focused on learning from prediction errors, leaving the computational structure of value based planning comparatively underexplored. Using behavioral and neuroimaging analyses of a minimax decision task, we found that the computational processes underlying forward planning are expressed in the anterior caudate nucleus as values of individual branching steps in a decision tree. In contrast, values represented in the putamen pertain solely to values learned during extensive training. During actual choice, both striatal areas showed a functional coupling to ventromedial prefrontal cortex, consistent with this region acting as a value comparator. Our findings point toward an architecture of choice in which segregated value systems operate in parallel in the striatum for planning and extensively trained choices, with medial prefrontal cortex integrating their outputs.
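As an illustration of what "values of individual branching steps in a decision tree" can mean computationally, the following is a minimal sketch of a minimax backup. It assumes a hypothetical three-level tree in which the chooser's moves alternate with an adversarial move; the tree layout, the reward numbers, and the function names `minimax_value` and `branch_values` are illustrative assumptions and are not taken from the study.

```python
from typing import List, Union

# A node is either a leaf reward (a number) or a list of child nodes.
# This representation is a hypothetical choice made for illustration only.
Tree = Union[float, List["Tree"]]


def minimax_value(node: Tree, maximizing: bool = True) -> float:
    """Return the minimax value of a node by recursive backup from the leaves."""
    if isinstance(node, (int, float)):
        return float(node)  # leaf: the value is the reward itself
    child_values = [minimax_value(child, not maximizing) for child in node]
    # The chooser takes the best child; the adversary takes the worst.
    return max(child_values) if maximizing else min(child_values)


def branch_values(node: List[Tree], maximizing: bool = True) -> List[float]:
    """Return the backed-up value of each branch available at this node."""
    return [minimax_value(child, not maximizing) for child in node]


# Hypothetical tree: chooser moves, adversary moves, chooser moves, then reward.
tree = [
    [[0.2, 0.8], [0.5, 0.4]],  # left branch at the root
    [[0.9, 0.1], [0.3, 0.6]],  # right branch at the root
]

root_branches = branch_values(tree)  # one value per branch at the root
root_value = minimax_value(tree)     # value of the best root branch
```

In this sketch each branch at a choice point carries its own backed-up value, and the choice at the root is a comparison between those branch values. A cached value acquired through extensive training would, by contrast, be read out directly without traversing the tree.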

Figure 1: Task and behavioral results.
Figure 2: Neural correlates of planning versus extensively trained choices.
Figure 3: Comparing values from planning and values from extensively trained mazes.
Figure 4: Functional coupling between caudate-vmPFC and putamen-vmPFC is significantly increased during choice in mixed trials.

Acknowledgements

We thank W. Yoshida and J. Oberg for help with data acquisition, and N. Daw and M. Guitart Masip for their valuable and insightful comments on the manuscript. This study was supported by a Wellcome Trust Program Grant and Max Planck Award (R.J.D. and K.W.) and the Gatsby Charitable Foundation (P.D.). The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (091593/Z/10/Z).

Author information

Contributions

K.W. and P.D. conceived the study. K.W. designed the task, performed the experiments and analyzed the data. K.W., P.D. and R.J.D. wrote the paper.

Corresponding author

Correspondence to Klaus Wunderlich.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Tables 1–5 (PDF 1000 kb)

About this article

Cite this article

Wunderlich, K., Dayan, P. & Dolan, R. Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15, 786–791 (2012). https://doi.org/10.1038/nn.3068

  • DOI: https://doi.org/10.1038/nn.3068
