Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control

Daw, Nathaniel D; Niv, Yael; Dayan, Peter

doi:10.1038/nn1560

Article
Published: 06 November 2005

Nature Neuroscience volume 8, pages 1704–1711 (2005)

Abstract

A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
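
As a rough illustration of the arbitration principle described in the abstract, the sketch below gives each controller a value estimate and an uncertainty for every action, and uses, per action, the estimate from whichever controller is less uncertain. The names `ValueEstimate` and `arbitrate`, the action labels, the greedy choice rule, and all numbers are assumptions made for illustration; this is not the authors' model or its parameters.

```python
from dataclasses import dataclass


@dataclass
class ValueEstimate:
    mean: float      # estimated long-run value of an action
    variance: float  # uncertainty about that estimate


def arbitrate(tree, cache):
    """Per action, keep the estimate from the less uncertain controller,
    then pick the action with the highest selected value (greedy choice,
    a simplification chosen for this sketch)."""
    chosen = {}
    for action in tree:
        t, c = tree[action], cache[action]
        if t.variance <= c.variance:
            chosen[action] = ("tree_search", t.mean)
        else:
            chosen[action] = ("cache", c.mean)
    best = max(chosen, key=lambda a: chosen[a][1])
    return best, chosen


# Hypothetical post-devaluation scenario: the tree-search controller has
# recomputed a low value for pressing, the cache still stores the old value.
tree = {"press": ValueEstimate(0.0, 0.05), "rest": ValueEstimate(0.1, 0.05)}

# Moderate training: cached values are still uncertain, so tree search wins.
cache_moderate = {"press": ValueEstimate(1.0, 0.2), "rest": ValueEstimate(0.1, 0.2)}
# Extensive training: cached values are now the more certain ones.
cache_extensive = {"press": ValueEstimate(1.0, 0.01), "rest": ValueEstimate(0.1, 0.01)}

print(arbitrate(tree, cache_moderate)[0])   # action chosen after moderate training
print(arbitrate(tree, cache_extensive)[0])  # action chosen after extensive training
```

With these made-up numbers, the choice follows the tree-search estimate while the cache is still uncertain and switches to the stale cached value once the cache becomes the more certain controller, which is the qualitative pattern the uncertainty-based account is meant to capture.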

**Figure 1: Task representations used by tree-search and caching reinforcement learning methods in a discrete-choice, discrete-trial representation of a standard instrumental conditioning task.**

**Figure 2: Behavioral results from reward devaluation experiments in rats.**

**Figure 3: Stylized tree representation of an instrumental conditioning task with two actions (a lever press and a chain pull) for two rewards.**

**Figure 4: Tree estimation at two stages of learning by the tree-search system on the task of Figure 1a.**

**Figure 5: Simulation of the dual-controller reinforcement learning model in the task of Figure 1a.**

**Figure 6: Simulation of the dual-controller reinforcement learning model in the task of Figure 3, in which two different actions produced two different rewards.**

Acknowledgements

We are grateful to B. Balleine, A. Courville, A. Dickinson, P. Holland, D. Joel, S. McClure and M. Sahani for discussions. The authors are supported by the Gatsby Foundation, the EU Bayesian Inspired Brain and Artefacts (BIBA) project (P.D., N.D.), a Royal Society USA Research Fellowship (N.D.) and a Dan David Fellowship (Y.N.).

Author information

Authors and Affiliations

Gatsby Computational Neuroscience Unit, University College London, Alexandra House, 17 Queen Square, London, WC1N 3AR, UK
Nathaniel D Daw, Yael Niv & Peter Dayan
Interdisciplinary Center for Neural Computation, Hebrew University, P.O. Box 1255, Jerusalem, 91904, Israel
Yael Niv

Corresponding author

Correspondence to Nathaniel D Daw.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Value propagation in tree search, after 50 steps of learning the task in Figure 1a. (PDF 248 kb)

Supplementary Fig. 2

Example of learning in the cache algorithm, following a single transition from state s to s′ having taken action a. (PDF 306 kb)

Supplementary Methods (PDF 117 kb)

About this article

Cite this article

Daw, N., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8, 1704–1711 (2005). https://doi.org/10.1038/nn1560

Received: 15 April 2005
Accepted: 12 September 2005
Published: 06 November 2005
Issue Date: December 2005
DOI: https://doi.org/10.1038/nn1560
