Motivational neural circuits underlying reinforcement learning

Abstract

Reinforcement learning (RL) is the behavioral process of learning the values of actions and objects. Most models of RL assume that the dopaminergic prediction error signal drives plasticity in frontal–striatal circuits. The striatum then encodes value representations that drive decision processes. However, the amygdala has also been shown to play an important role in forming Pavlovian stimulus–outcome associations. These Pavlovian associations can drive motivated behavior via the amygdala projections to the ventral striatum or the ventral tegmental area. The amygdala may, therefore, play a central role in RL. Here we compare the contributions of the amygdala and the striatum to RL and show that both the amygdala and striatum learn and represent expected values in RL tasks. Furthermore, value representations in the striatum may be inherited, to some extent, from the amygdala. The striatum may, therefore, play less of a primary role in learning stimulus–outcome associations in RL than previously suggested.
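To make the notion of a prediction error driving value learning concrete, the following is a minimal sketch of a delta-rule (Rescorla–Wagner style) update. It is an illustration only, not the model used in the article: the function names, the learning rate, the reward probabilities and the random choice policy are all assumptions introduced here.

```python
import random


def update_value(value, reward, learning_rate=0.1):
    """Delta-rule update (illustrative; not the article's model).

    The prediction error is the difference between the received reward
    and the current value estimate. The estimate moves a fraction
    (learning_rate) of the way toward the reward.
    """
    prediction_error = reward - value
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


def learn_values(reward_probs, n_trials=500, learning_rate=0.1, seed=0):
    """Learn one value per option from probabilistic binary rewards.

    reward_probs: hypothetical reward probability for each option.
    Options are sampled uniformly at random (an assumed policy), so
    every option is visited and its value estimate can converge.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        choice = rng.randrange(len(reward_probs))
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        values[choice], _ = update_value(values[choice], reward, learning_rate)
    return values


if __name__ == "__main__":
    # Two hypothetical options rewarded on 80% and 20% of trials.
    print(learn_values([0.8, 0.2]))
```

With a fixed learning rate, each estimate is a recency-weighted average of the rewards obtained from that option, so it fluctuates around the option's reward probability rather than settling on it exactly.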

Figure 1: Comparison of neural encoding of value between the lateral prefrontal cortex (lPFC) and the DS.
Figure 2: Choice probability during learning.
Figure 3: Learning in amygdala-lesioned, VS-lesioned and control animals.
Figure 4: Schematic diagram of the interactions between the amygdala and striatum as well as the roles of each structure in RL.

Kim Caesar/Springer Nature

Author information

Correspondence to Bruno B Averbeck.

Cite this article

Averbeck, B. & Costa, V. Motivational neural circuits underlying reinforcement learning. Nat. Neurosci. 20, 505–512 (2017). doi:10.1038/nn.4506
