Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control

Abstract

A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
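The arbitration principle can be illustrated with a minimal sketch. It assumes each controller reports, for every action, a value estimate together with a variance that summarizes its uncertainty, and that the less uncertain estimate is used for each action. The names `Estimate`, `arbitrate` and `choose_action`, the labels "tree" and "cache", and all numbers are hypothetical and chosen for illustration. How each system would compute its uncertainty is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Estimate:
    mean: float      # estimated long-run value of an action
    variance: float  # uncertainty about that estimate (assumed given)


def arbitrate(tree: Dict[str, Estimate],
              cache: Dict[str, Estimate]) -> Dict[str, Tuple[str, float]]:
    """For each action, keep the value from the less uncertain controller."""
    chosen = {}
    for action in tree:
        t, c = tree[action], cache[action]
        if t.variance <= c.variance:
            chosen[action] = ("tree", t.mean)
        else:
            chosen[action] = ("cache", c.mean)
    return chosen


def choose_action(chosen: Dict[str, Tuple[str, float]]) -> str:
    """Greedy choice over the arbitrated values (an illustrative simplification)."""
    return max(chosen, key=lambda a: chosen[a][1])


# Hypothetical numbers: the cached values start out very uncertain and
# become more reliable than the tree-search values after long training.
tree = {"press": Estimate(0.8, 0.05), "pull": Estimate(0.3, 0.05)}
early_cache = {"press": Estimate(0.2, 0.50), "pull": Estimate(0.4, 0.50)}
late_cache = {"press": Estimate(0.9, 0.01), "pull": Estimate(0.1, 0.01)}

early = arbitrate(tree, early_cache)  # tree-search values win for both actions
late = arbitrate(tree, late_cache)    # cached values win for both actions
```

In this sketch control passes from one system to the other only because the relative uncertainties change, not because either system's values are intrinsically preferred.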

Figure 1: Task representations used by tree-search and caching reinforcement learning methods in a discrete-choice, discrete-trial representation of a standard instrumental conditioning task.
Figure 2: Behavioral results from reward devaluation experiments in rats.
Figure 3: Stylized tree representation of an instrumental conditioning task with two actions (a lever press and a chain pull) for two rewards.
Figure 4: Tree estimation at two stages of learning by the tree-search system on the task of Figure 1a.
Figure 5: Simulation of the dual-controller reinforcement learning model in the task of Figure 1a.
Figure 6: Simulation of the dual-controller reinforcement learning model in the task of Figure 3, in which two different actions produced two different rewards.

Acknowledgements

We are grateful to B. Balleine, A. Courville, A. Dickinson, P. Holland, D. Joel, S. McClure and M. Sahani for discussions. The authors are supported by the Gatsby Foundation, the EU Bayesian Inspired Brain and Artefacts (BIBA) project (P.D., N.D.), a Royal Society USA Research Fellowship (N.D.) and a Dan David Fellowship (Y.N.).

Author information

Corresponding author

Correspondence to Nathaniel D Daw.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Value propagation in tree search, after 50 steps of learning the task in Figure 1a. (PDF 248 kb)

Supplementary Fig. 2

Example of learning in the cache algorithm, following a single transition from state s to s′ having taken action a. (PDF 306 kb)

Supplementary Methods (PDF 117 kb)

About this article

Cite this article

Daw, N., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8, 1704–1711 (2005). https://doi.org/10.1038/nn1560
