Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

OPINION

Believing in dopamine

Abstract

Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex–basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.

Your institute does not have access to this article

Relevant articles

Open Access articles citing this article.

Access options

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Fig. 1: The neural architecture for reinforcement learning under state uncertainty.
Fig. 2: Experimental evidence for reflections of state uncertainty in dopamine signals.
Fig. 3: Experimental evidence for uncertainty-dependent dopamine signals in a perceptual decision making task.
Fig. 4: Two forms of uncertainty have distinct effects on exploratory choice and are governed by distinct dopamine afferents.

References

  1. Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).

    Article  CAS  PubMed  Google Scholar 

  3. Courville, A. C., Daw, N. D. & Touretzky, D. S. Bayesian theories of conditioning in a changing world. Trends Cogn. Sci. 10, 294–300 (2006).

    Article  PubMed  Google Scholar 

  4. Gershman, S. J., Blei, D. M. & Niv, Y. Context, learning, and extinction. Psychol. Rev. 117, 197–209 (2010).

    Article  PubMed  Google Scholar 

  5. Gershman, S. J. A Unifying probabilistic view of associative learning. PLOS Comput. Biol. 11, e1004567 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Kakade, S. & Dayan, P. Acquisition and extinction in autoshaping. Psychol. Rev. 109, 533–544 (2002).

    Article  PubMed  Google Scholar 

  7. Friston, K. et al. Active inference and epistemic value. Cogn. Neurosci. 6, 187–214 (2015).

    Article  PubMed  Google Scholar 

  8. Gershman, S. J. Deconstructing the human algorithms for exploration. Cognition 173, 34–42 (2018).

    Article  PubMed  Google Scholar 

  9. Speekenbrink, M. & Konstantinidis, E. Uncertainty and exploration in a restless bandit problem. Top. Cogn. Sci. 7, 351–367 (2015).

    Article  PubMed  Google Scholar 

  10. Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A. & Cohen, J. D. Humans use directed and random exploration to solve the explore-exploit dilemma. J. Exp. Psychol. Gen. 143, 2074–2081 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  11. Ma, W. J. & Jazayeri, M. Neural coding of uncertainty and probability. Annu. Rev. Neurosci. 37, 205–220 (2014).

    Article  CAS  PubMed  Google Scholar 

  12. Rao, R. P. N. Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Front. Comput. Neurosci. 4, 146 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  13. Daw, N. D., Courville, A. C. & Touretzky, D. S. Representation and timing in theories of the dopamine system. Neural Comput. 18, 1637–1677 (2006).

    Article  PubMed  Google Scholar 

  14. Gershman, S. J. & Daw, N. D. Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu. Rev. Psychol. 68, 101–128 (2017).

    Article  PubMed  Google Scholar 

  15. Jazayeri, M. & Movshon, J. A. Optimal representation of sensory information by neural populations. Nat. Neurosci. 9, 690–696 (2006).

    Article  CAS  PubMed  Google Scholar 

  16. Grabska-Barwińska, A. et al. A probabilistic approach to demixing odors. Nat. Neurosci. 20, 98–106 (2017).

    Article  CAS  PubMed  Google Scholar 

  17. Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P. & Pezzulo, G. Active inference: a process theory. Neural Comput. 29, 1–49 (2017).

    Article  PubMed  Google Scholar 

  18. Buesing, L., Bill, J., Nessler, B. & Maass, W. Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLOS Comput. Biol. 7, e1002211 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Pecevski, D., Buesing, L. & Maass, W. Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons. PLOS Comput. Biol. 7, e1002294 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Haefner, R. M., Berkes, P. & Fiser, J. Perceptual decision-making as probabilistic inference by neural sampling. Neuron 90, 649–660 (2016).

    Article  CAS  PubMed  Google Scholar 

  21. Orbán, G., Berkes, P., Fiser, J. & Lengyel, M. Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92, 530–543 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Ting, C.-C., Yu, C.-C., Maloney, L. T. & Wu, S.-W. Neural mechanisms for integrating prior knowledge and likelihood in value-based probabilistic inference. J. Neurosci. 35, 1792–1805 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Yoshida, W. & Ishii, S. Resolution of uncertainty in prefrontal cortex. Neuron 50, 781–789 (2006).

    Article  CAS  PubMed  Google Scholar 

  24. Yoshida, W., Seymour, B., Friston, K. J. & Dolan, R. J. Neural mechanisms of belief inference during cooperative games. J. Neurosci. 30, 10744–10751 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Fleming, S. M., van der Putten, E. J. & Daw, N. D. Neural mediators of changes of mind about perceptual decisions. Nat. Neurosci. 21, 617–624 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Kumaran, D., Banino, A., Blundell, C., Hassabis, D. & Dayan, P. Computations underlying social hierarchy learning: distinct neural mechanisms for updating and representing self-relevant information. Neuron 92, 1135–1147 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Turner, M. S., Cipolotti, L., Yousry, T. A. & Shallice, T. Confabulation: damage to a specific inferior medial prefrontal system. Cortex 44, 637–648 (2008).

    Article  PubMed  Google Scholar 

  28. Karlsson, M. P., Tervo, D. G. R. & Karpova, A. Y. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 338, 135–139 (2012).

    Article  CAS  PubMed  Google Scholar 

  29. Fuhs, M. C. & Touretzky, D. S. Context learning in the rodent hippocampus. Neural Comput. 19, 3173–3215 (2007).

    Article  PubMed  Google Scholar 

  30. Dufort, R. H., Guttman, N. & Kimble, G. A. One-trial discrimination reversal in the white rat. J. Comp. Physiol. Psychol. 47, 248–249 (1954).

    Article  CAS  PubMed  Google Scholar 

  31. Pubols, B. H. Jr. Serial reversal learning as a function of the number of trials per reversal. J. Comp. Physiol. Psychol. 55, 66–68 (1962).

    Article  PubMed  Google Scholar 

  32. Bromberg-Martin, E. S., Matsumoto, M., Hong, S. & Hikosaka, O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J. Neurophysiol. 104, 1068–1076 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  33. Gallistel, C. R., Mark, T. A., King, A. P. & Latham, P. E. The rat approximates an ideal detector of changes in rates of reward: implications for the law of effect. J. Exp. Psychol. Anim. Behav. Process. 27, 354–372 (2001).

    Article  CAS  PubMed  Google Scholar 

  34. Jang, A. I. et al. The role of frontal cortical and medial-temporal lobe brain areas in learning a Bayesian prior belief on reversals. J. Neurosci. 35, 11751–11760 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Hampton, A. N., Bossaerts, P. & O’Doherty, J. P. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J. Neurosci. 26, 8360–8367 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Mondragón, E., Alonso, E. & Kokkola, N. Associative learning should go deep. Trends Cogn. Sci. 21, 822–825 (2017).

    Article  PubMed  Google Scholar 

  37. Gibbon, J. Scalar expectancy theory and weber’s law in animal timing. Psychol. Rev. 84, 279–325 (1977).

    Article  Google Scholar 

  38. Gibbon, J., Church, R. M. & Meck, W. H. Scalar timing in memory. Ann. NY Acad. Sci. 423, 52–77 (1984).

    Article  CAS  PubMed  Google Scholar 

  39. Shi, Z., Church, R. M. & Meck, W. H. Bayesian optimization of time perception. Trends Cogn. Sci. 17, 556–564 (2013).

    Article  PubMed  Google Scholar 

  40. Petter, E. A., Gershman, S. J. & Meck, W. H. Integrating models of interval timing and reinforcement learning. Trends Cogn. Sci. 22, 911–922 (2018).

    Article  PubMed  Google Scholar 

  41. Ma, W. J., Beck, J. M., Latham, P. E. & Pouget, A. Bayesian inference with probabilistic population codes. Nat. Neurosci. 9, 1432–1438 (2006).

    Article  CAS  PubMed  Google Scholar 

  42. Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054 (2008).

    Article  PubMed  Google Scholar 

  43. Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Evaluating the TD model of classical conditioning. Learn. Behav. 40, 305–319 (2012).

    Article  PubMed  Google Scholar 

  44. Gershman, S. J., Moustafa, A. A. & Ludvig, E. A. Time representation in reinforcement learning models of the basal ganglia. Front. Comput. Neurosci. 7, 194 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Mello, G. B. M., Soares, S. & Paton, J. J. A scalable population code for time in the striatum. Curr. Biol. 25, 1113–1122 (2015).

    Article  CAS  PubMed  Google Scholar 

  46. Akhlaghpour, H. et al. Dissociated sequential activity and stimulus encoding in the dorsomedial striatum during spatial working memory. eLife 5, e19507 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  47. Kim, J., Kim, D. & Jung, M. W. Distinct dynamics of striatal and prefrontal neural activity during temporal discrimination. Front. Integr. Neurosci. 12, 34 (2018).

  48. Bakhurin, K. I. et al. Differential encoding of time by prefrontal and striatal network dynamics. J. Neurosci. 37, 854–870 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  49. Adler, A. et al. Temporal convergence of dynamic cell assemblies in the striato-pallidal network. J. Neurosci. 32, 2473–2484 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Emmons, E. B. et al. Rodent medial frontal control of temporal processing in the dorsomedial striatum. J. Neurosci. 37, 8718–8733 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Gouvêa, T. S. et al. Striatal dynamics explain duration judgments. eLife 4, e11386 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  52. Takahashi, Y. K., Langdon, A. J., Niv, Y. & Schoenbaum, G. Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum. Neuron 91, 182–193 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Wiener, S. I. Spatial and behavioral correlates of striatal neurons in rats performing a self-initiated navigation task. J. Neurosci. 13, 3802–3817 (1993).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Lavoie, A. M. & Mizumori, S. J. Spatial, movement- and reward-sensitive discharge by medial ventral striatum neurons of rats. Brain Res. 638, 157–168 (1994).

    Article  CAS  PubMed  Google Scholar 

  55. Caan, W., Perrett, D. I. & Rolls, E. T. Responses of striatal neurons in the behaving monkey. 2. Visual processing in the caudal neostriatum. Brain Res. 290, 53–65 (1984).

    Article  CAS  PubMed  Google Scholar 

  56. Brown, V. J., Desimone, R. & Mishkin, M. Responses of cells in the tail of the caudate nucleus during visual discrimination learning. J. Neurophysiol. 74, 1083–1094 (1995).

    Article  CAS  PubMed  Google Scholar 

  57. Kakade, S. & Dayan, P. Dopamine: generalization and bonuses. Neural Netw. 15, 549–559 (2002).

    Article  PubMed  Google Scholar 

  58. Schultz, W. & Romo, R. Dopamine neurons of the monkey midbrain: contingencies of responses to stimuli eliciting immediate behavioral reactions. J. Neurophysiol. 63, 607–624 (1990).

    Article  CAS  PubMed  Google Scholar 

  59. Kobayashi, S. & Schultz, W. Reward contexts extend dopamine signals to unrewarded stimuli. Curr. Biol. 24, 56–62 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Matsumoto, H., Tian, J., Uchida, N. & Watabe-Uchida, M. Midbrain dopamine neurons signal aversion in a reward-context-dependent manner. eLife 5, e17328 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  61. Hollerman, J. R. & Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309 (1998).

    Article  CAS  PubMed  Google Scholar 

  62. Starkweather, C. K., Babayan, B. M., Uchida, N. & Gershman, S. J. Dopamine reward prediction errors reflect hidden-state inference across time. Nat. Neurosci. 20, 581–589 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Fiorillo, C. D., Newsome, W. T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).

    Article  CAS  PubMed  Google Scholar 

  64. Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y. & Hikosaka, O. Dopamine neurons can represent context-dependent prediction error. Neuron 41, 269–280 (2004).

    Article  CAS  PubMed  Google Scholar 

  65. Starkweather, C. K., Gershman, S. J. & Uchida, N. The medial prefrontal cortex shapes dopamine reward prediction errors under state uncertainty. Neuron 98, 616–629.e6 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Babayan, B. M., Uchida, N. & Gershman, S. J. Belief state representation in the dopamine system. Nat. Commun. 9, 1891 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Nomoto, K., Schultz, W., Watanabe, T. & Sakagami, M. Temporally extended dopamine responses to perceptually demanding reward-predictive stimuli. J. Neurosci. 30, 10692–10702 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Lak, A., Nomoto, K., Keramati, M., Sakagami, M. & Kepecs, A. Midbrain dopamine neurons signal belief in choice accuracy during a perceptual decision. Curr. Biol. 27, 821–832 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Sarno, S., de Lafuente, V., Romo, R. & Parga, N. Dopamine reward prediction error signal codes the temporal evaluation of a perceptual decision report. Proc. Natl Acad. Sci. USA 114, E10494–E10503 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Ghavamzadeh, M., Mannor, S., Pineau, J. & Tamar, A. Bayesian Reinforcement Learning: A Survey (Now Publishers, 2015).

  71. Gershman, S. J. Dopamine, inference, and uncertainty. Neural Comput. 29, 3311–3326 (2017).

    Article  PubMed  Google Scholar 

  72. Kamin, L. J. in Punishment and Aversive Behavior (eds Campbell, B. A. & Church, R. M.) 279–296 (Appleton-Century-Crofts, 1969).

  73. Rescorla, R. A. & Wagner, A. R. in Classical Conditioning II: Recent Research and Theory (eds Black, A. H. & Prokasy, W. F.) 64–99 (Appleton-Century-Crofts, 1972).

  74. Waelti, P., Dickinson, A. & Schultz, W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48 (2001).

    Article  CAS  PubMed  Google Scholar 

  75. Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Miller, R. R. & Matute, H. Biological significance in forward and backward blocking: resolution of a discrepancy between animal conditioning and human causal judgment. J. Exp. Psychol. Gen. 125, 370–386 (1996).

    Article  CAS  PubMed  Google Scholar 

  77. Urushihara, K. & Miller, R. R. Backward blocking in first-order conditioning. PsycEXTRA Dataset https://doi.org/10.1037/e527342012-212 (2007).

    Article  Google Scholar 

  78. Blaisdell, A. P., Gunther, L. M. & Miller, R. R. Recovery from blocking achieved by extinguishing the blocking CS. Anim. Learn. Behav. 27, 63–76 (1999).

    Article  Google Scholar 

  79. Dayan, P. & Kakade, S. Explaining away in weight space. Adv. Neural Inf. Process. Syst. 13, 451–457 (2001).

    Google Scholar 

  80. Miller, R. R. & Witnauer, J. E. Retrospective revaluation: the phenomenon and its theoretical implications. Behav. Process. 123, 15–25 (2016).

    Article  Google Scholar 

  81. Lubow, R. E. Latent inhibition. Psychol. Bull. 79, 398–407 (1973).

    Article  CAS  PubMed  Google Scholar 

  82. Aguado, L., Symonds, M. & Hall, G. Interval between preexposure and test determines the magnitude of latent inhibition: Implications for an interference account. Anim. Learn. Behav. 22, 188–194 (1994).

    Article  Google Scholar 

  83. Sadacca, B. F., Jones, J. L. & Schoenbaum, G. Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework. eLife 5, e13665 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Young, A. M., Joseph, M. H. & Gray, J. A. Latent inhibition of conditioned dopamine release in rat nucleus accumbens. Neuroscience 54, 5–9 (1993).

    Article  CAS  PubMed  Google Scholar 

  85. Frank, M. J. & Claus, E. D. Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol. Rev. 113, 300–326 (2006).

    Article  PubMed  Google Scholar 

  86. Deco, G. & Rolls, E. T. Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex. Cereb. Cortex 15, 15–30 (2005).

    Article  PubMed  Google Scholar 

  87. Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Sadacca, B. F. et al. Orbitofrontal neurons signal sensory associations underlying model-based inference in a sensory preconditioning task. eLife 7, e30373 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  89. Jones, J. L. et al. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 338, 953–956 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Payzan-LeNestour, É. & Bossaerts, P. Do not bet on the unknown versus try to find out more: estimation uncertainty and ‘unexpected uncertainty’ both modulate exploration. Front. Neurosci. 6, 150 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Schulz, E., Konstantinidis, E. & Speekenbrink, M. Putting bandits into context: How function learning supports decision making. J. Exp. Psychol. Learn. Mem. Cogn. 44, 927–943 (2018).

    Article  PubMed  Google Scholar 

  92. Myers, J. L. & Sadler, E. Effects of range of payoffs as a variable in risk taking. J. Exp. Psychol. 60, 306–309 (1960).

    Article  CAS  PubMed  Google Scholar 

  93. Busemeyer, J. R. & Townsend, J. T. Decision field theory: a dynamic-cognitive approach to decision making in an uncertain environment. Psychol. Rev. 100, 432–459 (1993).

    Article  CAS  PubMed  Google Scholar 

  94. Gershman, S. J. Uncertainty and exploration. Decision 6, 277–286 (2019).

    Article  PubMed  Google Scholar 

  95. Frank, M. J., Doll, B. B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Humphries, M. D., Khamassi, M. & Gurney, K. Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front. Neurosci. 6, 9 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  97. Pezzulo, G., Rigoli, F. & Friston, K. J. Hierarchical active inference: a theory of motivated control. Trends Cogn. Sci. 22, 294–306 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  98. Botvinick, M. & Toussaint, M. Planning as inference. Trends Cogn. Sci. 16, 485–488 (2012).

    Article  PubMed  Google Scholar 

  99. FitzGerald, T. H. B., Dolan, R. J. & Friston, K. Dopamine, reward learning, and active inference. Front. Comput. Neurosci. 9, 136 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  100. Friston, K. J. et al. Dopamine, affordance and active inference. PLOS Comput. Biol. 8, e1002327 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Weele, C. M. V. et al. Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli. Nature 563, 397–401 (2018).

    Article  CAS  Google Scholar 

  102. Thurley, K., Senn, W. & Lüscher, H.-R. Dopamine increases the gain of the input-output response of rat prefrontal pyramidal neurons. J. Neurophysiol. 99, 2985–2997 (2008).

    Article  PubMed  Google Scholar 

  103. Gershman, S. J., Norman, K. A. & Niv, Y. Discovering latent causes in reinforcement learning. Curr. Opin. Behav. Sci. 5, 43–50 (2015).

    Article  Google Scholar 

  104. Gershman, S. J., Monfils, M.-H., Norman, K. A. & Niv, Y. The computational nature of memory modification. eLife 6, e23763 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  105. Redish, A. D., Jensen, S., Johnson, A. & Kurth-Nelson, Z. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol. Rev. 114, 784–805 (2007).

    Article  PubMed  Google Scholar 

  106. Gardner, M. P. H., Schoenbaum, G. & Gershman, S. J. Rethinking dopamine as generalized prediction error. Proc. Biol. Sci. 285, 20181645 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  107. Gershman, S. J. The successor representation: its computational logic and neural substrates. J. Neurosci. 38, 7193–7200 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Le Bouc, R. et al. Computational dissection of dopamine motor and motivational functions in humans. J. Neurosci. 36, 6623–6633 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Walton, M. E. & Bouret, S. What is the relationship between dopamine and effort? Trends Neurosci. 42, 79–91 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Westbrook, A. & Braver, T. S. Dopamine does double duty in motivating cognitive effort. Neuron 91, 708 (2016).

    Article  CAS  PubMed  Google Scholar 

  111. Niv, Y., Daw, N. D., Joel, D. & Dayan, P. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology 191, 507–520 (2007).

    Article  CAS  PubMed  Google Scholar 

  112. Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).

    Google Scholar 

  113. Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artif. Intell. 101, 99–134 (1998).

    Article  Google Scholar 

  114. Pan, W.-X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  116. Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005).

    Article  CAS  PubMed  Google Scholar 

  117. Behrens, T. E. J., Woolrich, M. W., Walton, M. E. & Rushworth, M. F. S. Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007).

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

Author contributions

The authors contributed equally to all aspects of the article.

Reviewer information

Nature Reviews Neuroscience thanks J. Pearson and the other, anonymous, reviewers for their contribution to the peer review of this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Samuel J. Gershman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Active inference

The hypothesis that biological agents will take actions to reduce expected surprise.

Free-energy principle

The hypothesis that the objective of brain function is to minimize expected (average) surprise.

Posterior probability distribution

The conditional probability of latent variables (for example, hidden states) conditional on observed variables (for example, sensory data).

Sufficient statistic

A function of a data sample that completely summarizes the information contained in the data about the parameters of a probability distribution.

Value function

The mapping from states to long-term expected future rewards (typically discounted to reflect a preference for sooner over later rewards).

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Gershman, S.J., Uchida, N. Believing in dopamine. Nat Rev Neurosci 20, 703–714 (2019). https://doi.org/10.1038/s41583-019-0220-7

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41583-019-0220-7

Further reading

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing