Abstract
Investigations of the underlying mechanisms of choice in humans have focused on learning from prediction errors, leaving the computational structure of value-based planning comparatively underexplored. Using behavioral and neuroimaging analyses of a minimax decision task, we found that the computational processes underlying forward planning are expressed in the anterior caudate nucleus as values of individual branching steps in a decision tree. In contrast, values represented in the putamen pertain solely to values learned during extensive training. During actual choice, both striatal areas showed a functional coupling to ventromedial prefrontal cortex, consistent with this region acting as a value comparator. Our findings point toward an architecture of choice in which segregated value systems operate in parallel in the striatum for planning and extensively trained choices, with medial prefrontal cortex integrating their outputs.
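To make the notion of "values of individual branching steps in a decision tree" concrete, the following is a minimal sketch of minimax backward induction on a toy tree. It is an illustration only, not the authors' task or model: the tree layout, the payoff numbers, and the function names `minimax_value` and `branch_values` are all hypothetical, and the sketch assumes deterministic payoffs with strictly alternating maximizing and minimizing levels.

```python
from typing import Union

# Assumption for illustration: a leaf is a numeric payoff, and an
# internal node is a list of child subtrees.
Tree = Union[float, list]


def minimax_value(node: Tree, maximizing: bool = True) -> float:
    """Return the minimax value of `node` by backward induction.

    Levels alternate between a maximizing chooser and a minimizing one.
    """
    if isinstance(node, (int, float)):
        return float(node)  # leaf: the payoff is the value
    child_values = [minimax_value(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


def branch_values(node: list, maximizing: bool = True) -> list:
    """Return the planned value of each branch available at `node`.

    Each branch is evaluated from the point of view of the next level,
    which is controlled by the other chooser.
    """
    return [minimax_value(child, not maximizing) for child in node]


# Hypothetical two-level tree: the planner picks a branch (maximizing),
# then an opponent picks the worst leaf within that branch (minimizing).
tree = [[0.8, 0.2], [0.5, 0.6]]

if __name__ == "__main__":
    print(branch_values(tree))   # value of each root branch
    print(minimax_value(tree))   # value of the best root branch
```

In this toy tree the left branch is worth 0.2 and the right branch 0.5, so a planner following minimax would choose the right branch even though the single largest payoff (0.8) sits on the left.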
Acknowledgements
We thank W. Yoshida and J. Oberg for help with data acquisition, and N. Daw and M. Guitart Masip for their valuable and insightful comments on the manuscript. This study was supported by a Wellcome Trust Program Grant and Max Planck Award (R.J.D. and K.W.) and the Gatsby Charitable Foundation (P.D.). The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (091593/Z/10/Z).
Contributions
K.W. and P.D. conceived the study. K.W. designed the task, performed the experiments and analyzed the data. K.W., P.D. and R.J.D. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4 and Supplementary Tables 1–5 (PDF 1000 kb)
Cite this article
Wunderlich, K., Dayan, P. & Dolan, R. Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci 15, 786–791 (2012). https://doi.org/10.1038/nn.3068
DOI: https://doi.org/10.1038/nn.3068