Mapping value based planning and extensively trained choice in the human brain

Wunderlich, Klaus; Dayan, Peter; Dolan, Raymond J

doi:10.1038/nn.3068

Article
Published: 11 March 2012

Nature Neuroscience volume 15, pages 786–791 (2012)

Abstract

Investigations of the underlying mechanisms of choice in humans have focused on learning from prediction errors, leaving the computational structure of value based planning comparatively underexplored. Using behavioral and neuroimaging analyses of a minimax decision task, we found that the computational processes underlying forward planning are expressed in the anterior caudate nucleus as values of individual branching steps in a decision tree. In contrast, values represented in the putamen pertain solely to values learned during extensive training. During actual choice, both striatal areas showed a functional coupling to ventromedial prefrontal cortex, consistent with this region acting as a value comparator. Our findings point toward an architecture of choice in which segregated value systems operate in parallel in the striatum for planning and extensively trained choices, with medial prefrontal cortex integrating their outputs.
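As a rough illustration of what "values of individual branching steps in a decision tree" can mean under a minimax rule, the sketch below runs backward induction over a small hypothetical tree. The tree shape, the reward numbers, and the function names `minimax` and `branch_values` are assumptions made for illustration only; this is not the task or the model used in the study.

```python
from typing import List, Union

# A tree is either a terminal reward (a number) or a list of subtrees.
Tree = Union[float, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> float:
    """Return the minimax value of a node by backward induction.

    Layers alternate between a maximizing chooser and a minimizing one.
    """
    if isinstance(node, (int, float)):
        return float(node)  # terminal reward
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def branch_values(node: List[Tree], maximizing: bool = True) -> List[float]:
    """Return the value of each branch available at a non-terminal node."""
    return [minimax(child, not maximizing) for child in node]


# Hypothetical two-layer tree with made-up rewards:
# the first layer maximizes, the second layer minimizes.
tree: Tree = [[3.0, 5.0], [2.0, 9.0]]

root_branches = branch_values(tree)  # value of each first-layer branch
root_value = minimax(tree)           # value of the best first-layer branch
```

Here each first-layer branch is worth the minimum of its outcomes, and the planner picks the larger of those two branch values. Storing per-branch values separately from the root value mirrors the distinction between the value of an individual branching step and the value of the chosen plan.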

**Figure 1: Task and behavioral results.**

**Figure 2: Neural correlates of planning versus extensively trained choices.**

**Figure 3: Comparing values from planning and values from extensively trained mazes.**

**Figure 4: Functional coupling between caudate-vmPFC and putamen-vmPFC is significantly increased during choice in mixed trials.**

Acknowledgements

We thank W. Yoshida and J. Oberg for help with data acquisition, and N. Daw and M. Guitart Masip for their valuable and insightful comments on the manuscript. This study was supported by a Wellcome Trust Program Grant and Max Planck Award (R.J.D. and K.W.) and the Gatsby Charitable Foundation (P.D.). The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (091593/Z/10/Z).

Author information

Authors and Affiliations

Wellcome Trust Centre for Neuroimaging, University College London, London, UK
Klaus Wunderlich & Raymond J Dolan
Gatsby Computational Neuroscience Unit, University College London, London, UK
Peter Dayan

Authors

Klaus Wunderlich
Peter Dayan
Raymond J Dolan

Contributions

K.W. and P.D. conceived the study. K.W. designed the task, performed the experiments and analyzed the data. K.W., P.D. and R.J.D. wrote the paper.

Corresponding author

Correspondence to Klaus Wunderlich.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Tables 1–5 (PDF 1000 kb)

About this article

Cite this article

Wunderlich, K., Dayan, P. & Dolan, R. Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15, 786–791 (2012). https://doi.org/10.1038/nn.3068

Received: 01 December 2011
Accepted: 14 February 2012
Published: 11 March 2012
Issue Date: May 2012
DOI: https://doi.org/10.1038/nn.3068
