
A feature-specific prediction error model explains dopaminergic heterogeneity

Abstract

The hypothesis that midbrain dopamine (DA) neurons broadcast a reward prediction error (RPE) is among the great successes of computational neuroscience. However, recent results contradict a core aspect of this theory: specifically, that the neurons convey a scalar, homogeneous signal. While the predominant family of extensions to the RPE model replicates the classic model in multiple parallel circuits, we argue that these models are ill-suited to explain reports of heterogeneity in task-variable encoding across DA neurons. Instead, we introduce a complementary ‘feature-specific RPE’ model, positing that individual ventral tegmental area DA neurons report RPEs for different aspects of an animal’s moment-to-moment situation. Further, we show how our framework can be extended to explain the patterns of heterogeneity in action responses reported among substantia nigra pars compacta DA neurons. This theory reconciles new observations of DA heterogeneity with classic ideas about RPE coding while also providing a new perspective on how the brain performs reinforcement learning in high-dimensional environments.
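To make the central idea concrete, below is a minimal sketch (not the paper's implementation) of how a classic scalar temporal-difference (TD) error can be decomposed into feature-specific components when value is linear in a feature representation. The even split of the reward term across feature channels is an illustrative assumption; the key property is that the channel RPEs sum to the classic scalar RPE.

```python
import numpy as np

def feature_specific_rpes(phi, phi_next, w, r, gamma=0.99):
    """Decompose the scalar TD error into one RPE per feature channel.

    Value is assumed linear in features, V(s) = w @ phi(s). Each channel
    carries its own TD term; splitting the reward r evenly across channels
    is an illustrative assumption. The channel RPEs sum to the classic
    scalar RPE: delta = r + gamma * V(s') - V(s).
    """
    n = len(w)
    delta_vec = r / n + gamma * w * phi_next - w * phi  # one RPE per feature
    scalar = r + gamma * (w @ phi_next) - (w @ phi)
    assert np.isclose(delta_vec.sum(), scalar)  # decomposition is exact
    return delta_vec

# toy usage: three feature channels yield heterogeneous per-channel errors
w = np.array([0.2, 0.5, 0.1])
phi, phi_next = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(feature_specific_rpes(phi, phi_next, w, r=1.0))
```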


Fig. 1: Feature-specific PE model updates classic TD learning to produce heterogeneous DA signals that reflect the state representation.
Fig. 2: Feature-specific RPEs derived from a deep RL model trained on a VR evidence accumulation task are heterogeneously modulated by behavioral variables during the cue period, similar to DA neurons.
Fig. 3: Feature-specific RPEs reflect incidental high-dimensional visual inputs.
Fig. 4: Cue responses in model and DAergic neurons reflect RPEs with respect to cues rather than simply their presence.
Fig. 5: Feature-specific RPE units and DAergic neurons consistently respond to reward but show heterogeneous modulation by reward expectation.
Fig. 6: Sensory PEs from the SR model are heterogeneous during the cue period but are not responsive to confirmatory cues and reward.
Fig. 7: Expectile PEs from a distributional RL model do not fully capture cue period heterogeneity.
Fig. 8: Feature-specific APEs provide a potential explanation for movement-related heterogeneity in SNc DA neurons.


Data availability

Both the model and neural data that support the findings of this study are available on Figshare at https://doi.org/10.6084/m9.figshare.25752450 (ref. 83). A description of the data can be found with the code at https://github.com/ndawlab/vectorRPE/. Source data are provided with this paper.

Code availability

Code used for the deep RL model, VR environment and analysis of the data to reproduce the figures can be found at https://github.com/ndawlab/vectorRPE/.

References

  1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
  2. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).
  3. Arbuthnott, G. W. & Wickens, J. Space, time and dopamine. Trends Neurosci. 30, 62–69 (2007).
  4. Matsuda, W. et al. Single nigrostriatal dopaminergic neurons form widely spread and highly dense axonal arborizations in the neostriatum. J. Neurosci. 29, 444–453 (2009).
  5. Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
  6. Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
  7. Lee, R. S., Mattar, M. G., Parker, N. F., Witten, I. B. & Daw, N. D. Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons. eLife 8, e42992 (2019).
  8. Choi, J. Y. et al. A comparison of dopaminergic and cholinergic populations reveals unique contributions of VTA dopamine neurons to short-term memory. Cell Rep. 33, 108492 (2020).
  9. Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).
  10. Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).
  11. Collins, A. L. & Saunders, B. T. Heterogeneity in striatal dopamine circuits: form and function in dynamic reward seeking. J. Neurosci. Res. 98, 1046–1069 (2020).
  12. Verharen, J. P. H., Zhu, Y. & Lammel, S. Aversion hot spots in the dopamine system. Curr. Opin. Neurobiol. 64, 46–52 (2020).
  13. Hassan, A. & Benarroch, E. E. Heterogeneity of the midbrain dopamine system. Neurology 85, 1795–1805 (2015).
  14. Marinelli, M. & McCutcheon, J. E. Heterogeneity of dopamine neuron activity across traits and states. Neuroscience 282, 176–197 (2014).
  15. Kremer, Y., Flakowski, J., Rohner, C. & Lüscher, C. Context-dependent multiplexing by individual VTA dopamine neurons. J. Neurosci. 40, 7489–7509 (2020).
  16. Howe, M. W. & Dombeck, D. A. Rapid signalling in distinct dopaminergic axons during locomotion and reward. Nature 535, 505–510 (2016).
  17. Anderegg, A., Poulin, J.-F. & Awatramani, R. Molecular heterogeneity of midbrain dopaminergic neurons—moving toward single cell resolution. FEBS Lett. 589, 3714–3726 (2015).
  18. Barter, J. W. et al. Beyond reward prediction errors: the role of dopamine in movement kinematics. Front. Integr. Neurosci. 9, 39 (2015).
  19. Cai, L. X. et al. Distinct signals in medial and lateral VTA dopamine neurons modulate fear extinction at different times. eLife 9, e54936 (2020).
  20. Hamid, A. A., Frank, M. J. & Moore, C. I. Wave-like dopamine dynamics as a mechanism for spatiotemporal credit assignment. Cell 184, 2733–2749.e16 (2021).
  21. Mohebi, A., Wei, W., Pelattini, L., Kim, K. & Berke, J. D. Dopamine transients follow a striatal gradient of reward time horizons. Nat. Neurosci. 27, 737–746 (2024).
  22. Zolin, A. et al. Context-dependent representations of movement in Drosophila dopaminergic reinforcement pathways. Nat. Neurosci. 24, 1555–1566 (2021).
  23. Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).
  24. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
  25. Menegas, W., Akiti, K., Amo, R., Uchida, N. & Watabe-Uchida, M. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nat. Neurosci. 21, 1421–1430 (2018).
  26. Jin, X. & Costa, R. M. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466, 457–462 (2010).
  27. Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).
  28. Greenstreet, F. et al. Action prediction error: a value-free dopaminergic teaching signal that drives stable learning. Preprint at bioRxiv https://doi.org/10.1101/2022.09.12.507572 (2024).
  29. Bogacz, R. Dopamine role in learning and action inference. eLife 9, e53262 (2020).
  30. Lindsey, J. & Litwin-Kumar, A. Action-modulated midbrain dopamine activity arises from distributed control policies. In Proc. 36th International Conference on Neural Information Processing Systems (eds. Koyejo, S. et al.) 5535–5548 (2022).
  31. Gardner, M. P. H., Schoenbaum, G. & Gershman, S. J. Rethinking dopamine as generalized prediction error. Proc. Biol. Sci. 285, 20181645 (2018).
  32. Alexander, G. E., DeLong, M. R. & Strick, P. L. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381 (1986).
  33. Lau, B., Monteiro, T. & Paton, J. J. The many worlds hypothesis of dopamine prediction error: implications of a parallel circuit architecture in the basal ganglia. Curr. Opin. Neurobiol. 46, 241–247 (2017).
  34. Haber, S. N., Fudge, J. L. & McFarland, N. R. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20, 2369–2382 (2000).
  35. Hintiryan, H. et al. The mouse cortico–striatal projectome. Nat. Neurosci. 19, 1100–1114 (2016).
  36. Hunnicutt, B. J. et al. A comprehensive excitatory input map of the striatum reveals novel functional organization. eLife 5, e19103 (2016).
  37. Pan, W. X., Mao, T. & Dudman, J. T. Inputs to the dorsal striatum of the mouse reflect the parallel circuit architecture of the forebrain. Front. Neuroanat. 4, 147 (2010).
  38. Cox, J. & Witten, I. B. Striatal circuits for reward learning and decision-making. Nat. Rev. Neurosci. 20, 482–494 (2019).
  39. Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. 33rd International Conference on Machine Learning (eds. Balcan, M. F. & Weinberger, K. Q.) 1928–1937 (jmlr.org, 2016).
  40. Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).
  41. Daw, N. D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616 (2002).
  42. Lloyd, K. & Dayan, P. Safety out of control: dopamine and defence. Behav. Brain Funct. 12, 15 (2016).
  43. Lak, A., Nomoto, K., Keramati, M., Sakagami, M. & Kepecs, A. Midbrain dopamine neurons signal belief in choice accuracy during a perceptual decision. Curr. Biol. 27, 821–832 (2017).
  44. Daw, N. D., Courville, A. C. & Touretzky, D. S. Timing and partial observability in the dopamine system. In Proc. 15th International Conference on Neural Information Processing Systems (eds. Becker, S. et al.) 99–106 (MIT Press, 2003).
  45. Kurth-Nelson, Z. & Redish, A. D. Temporal-difference reinforcement learning with distributed representations. PLoS ONE 4, e7362 (2009).
  46. Gershman, S. J., Pesaran, B. & Daw, N. D. Human reinforcement learning subdivides structured action spaces by learning effector-specific values. J. Neurosci. 29, 13524–13531 (2009).
  47. Voorn, P., Vanderschuren, L. J. M. J., Groenewegen, H. J., Robbins, T. W. & Pennartz, C. M. A. Putting a spin on the dorsal–ventral divide of the striatum. Trends Neurosci. 27, 468–474 (2004).
  48. Rueda-Orozco, P. E. & Robbe, D. The striatum multiplexes contextual and kinematic information to constrain motor habits execution. Nat. Neurosci. 18, 453–460 (2015).
  49. Parker, N. F. et al. Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning. Cell Rep. 39, 110756 (2022).
  50. Matsumoto, N., Minamimoto, T., Graybiel, A. M. & Kimura, M. Neurons in the thalamic CM–Pf complex supply striatal neurons with information about behaviorally significant sensory events. J. Neurophysiol. 85, 960–976 (2001).
  51. Choi, K. et al. Distributed processing for value-based choice by prelimbic circuits targeting anterior–posterior dorsal striatal subregions in male mice. Nat. Commun. 14, 1920 (2023).
  52. da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
  53. Dodson, P. D. et al. Representation of spontaneous movement by dopaminergic neurons is cell-type selective and disrupted in parkinsonism. Proc. Natl Acad. Sci. USA 113, E2180–E2188 (2016).
  54. Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).
  55. Jog, M. S., Kubota, Y., Connolly, C. I., Hillegaart, V. & Graybiel, A. M. Building neural representations of habits. Science 286, 1745–1749 (1999).
  56. Ribas-Fernandes, J. J. F. et al. A neural signature of hierarchical reinforcement learning. Neuron 71, 370–379 (2011).
  57. Jiang, L. & Litwin-Kumar, A. Models of heterogeneous dopamine signaling in an insect learning and memory center. PLoS Comput. Biol. 17, e1009205 (2021).
  58. Matsumoto, H., Tian, J., Uchida, N. & Watabe-Uchida, M. Midbrain dopamine neurons signal aversion in a reward-context-dependent manner. eLife 5, e17328 (2016).
  59. de Jong, J. W. et al. A neural circuit mechanism for encoding aversive stimuli in the mesolimbic dopamine system. Neuron 101, 133–151 (2019).
  60. Matsumoto, M. & Hikosaka, O. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841 (2009).
  61. Lammel, S. et al. Input-specific control of reward and aversion in the ventral tegmental area. Nature 491, 212–217 (2012).
  62. Syed, E. C. J. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat. Neurosci. 19, 34–36 (2016).
  63. O’Doherty, J. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004).
  64. Moss, M. M., Zatka-Haas, P., Harris, K. D., Carandini, M. & Lak, A. Dopamine axons in dorsal striatum encode contralateral visual stimuli and choices. J. Neurosci. 41, 7197–7205 (2021).
  65. Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).
  66. Mikhael, J. G., Kim, H. R., Uchida, N. & Gershman, S. J. The role of state uncertainty in the dynamics of dopamine. Curr. Biol. 32, 1077–1087.e9 (2022).
  67. Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 9, e62390 (2020).
  68. Avvisati, R. et al. Distributional coding of associative learning in discrete populations of midbrain dopamine neurons. Cell Rep. 43, 114080 (2024).
  69. Gonon, F. et al. Geometry and kinetics of dopaminergic transmission in the rat striatum and in mice lacking the dopamine transporter. Prog. Brain Res. 125, 291–302 (2000).
  70. Akiti, K. et al. Striatal dopamine explains novelty-induced behavioral dynamics and individual variability in threat prediction. Neuron 110, 3789–3804.e9 (2022).
  71. Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory (eds. Black, A. H. & Prokasy, W. F.) 64–99 (Appleton-Century-Crofts, 1972).
  72. Kamin, L. J. ‘Attention-like’ processes in classical conditioning. In Miami Symposium on the Prediction of Behavior: Aversive Stimulation (ed. Jones, M. R.) 9–31 (Univ. Miami Press, 1968).
  73. Gershman, S. J., Norman, K. A. & Niv, Y. Discovering latent causes in reinforcement learning. Curr. Opin. Behav. Sci. 5, 43–50 (2015).
  74. Russek, E. M., Momennejad, I., Botvinick, M. M., Gershman, S. J. & Daw, N. D. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput. Biol. 13, e1005768 (2017).
  75. Stachenfeld, K. L., Botvinick, M. M. & Gershman, S. J. The hippocampus as a predictive map. Nat. Neurosci. 20, 1643–1653 (2017).
  76. Niv, Y. Learning task-state representations. Nat. Neurosci. 22, 1544–1553 (2019).
  77. Pinto, L. et al. An accumulation-of-evidence task using visual pulses for mice navigating in virtual reality. Front. Behav. Neurosci. 12, 36 (2018).
  78. Aronov, D. & Tank, D. W. Engagement of neural circuits underlying 2D spatial navigation in a rodent virtual reality system. Neuron 84, 442–456 (2014).
  79. Brockman, G. et al. OpenAI Gym. Preprint at https://arxiv.org/abs/1606.01540 (2016).
  80. Hill, A. et al. Stable Baselines. GitHub https://github.com/hill-a/stable-baselines (2018).
  81. Barreto, A. et al. Successor features for transfer in reinforcement learning. In Proc. 31st Conference on Neural Information Processing Systems (eds. Guyon, I. et al.) 4055–4065 (Curran Associates, Inc., 2017).
  82. Rowland, M. et al. Statistics and samples in distributional reinforcement learning. In Proc. 36th International Conference on Machine Learning, Vol. 97 (eds. Chaudhuri, K. & Salakhutdinov, R.) 5528–5536 (PMLR, 2019).
  83. Lee, R. S., Sagiv, Y., Engelhard, B., Witten, I. B. & Daw, N. D. A feature-specific prediction error model explains dopaminergic heterogeneity. Figshare https://doi.org/10.6084/m9.figshare.25752450 (2024).


Acknowledgements

We thank A. Luna and J. Lopez for their help with the VR software system; W. Dabney for discussion on the distributional RL model and providing the imputation function for the distributional RL model; M. Lee and E. Grant for help with training the deep RL network; P. Dayan, A. Kahn and L. Brown for comments on this work; and the BRAIN CoGS team and the laboratories of N.D.D. and I.B.W. for their help. This work was supported by an NSF GRFP (R.S.L.), 1K99MH122657 (B.E.), National Institutes of Health R01 DA047869 (I.B.W.), U19 NS104648-01 (I.B.W.), ARO W911NF-16-1-0474 (N.D.D.), ARO W911NF1710554 (I.B.W.), Brain Research Foundation (I.B.W.), Simons Collaboration on the Global Brain (I.B.W.) and the New York Stem Cell Foundation (I.B.W.).

Author information


Contributions

R.S.L., I.B.W. and N.D.D. conceived the project. B.E. and I.B.W. conducted the original neural recording experiments, and B.E. provided software, interpretation and analysis. R.S.L. and N.D.D. developed the model, and Y.S. extended it. R.S.L. wrote software, trained the deep network and conducted data analyses. R.S.L. and Y.S. conducted model simulations. R.S.L., Y.S., I.B.W. and N.D.D. wrote the manuscript.

Corresponding authors

Correspondence to Ilana B. Witten or Nathaniel D. Daw.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Talia Lerner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Deep RL network retrained with the same task and a different seed.

(a) Psychometric curve showing the retrained agent’s performance, with error bars showing ±1 s.e.m. (N = 1,000 trials). (b) The retrained agent’s scalar value during the cue period decreased as a function of trial difficulty (defined as the absolute tower difference, blue gradient). (c) The retrained agent’s feature-specific RPE units’ responses to confirmatory (purple) and disconfirmatory (gray) cues. Responses are averaged across cue-sensitive units only (N = 47/64), with error bars indicating ±1 s.e.m. (d) The retrained agent’s feature-specific RPE units averaged across trials and plotted with respect to view angle. Each row represents one unit’s peak-normalized response to the view angle. (e, f) Same as (d) but for the agent’s (e) position and (f) left (red) and right (blue) cues. (g) The retrained agent’s scalar RPE time-locked to reward time for rewarded (magenta) and unrewarded (gray) trials. (h) Same as (g) but for rewarded trials of different difficulty, with hard trials (light blue) and easy trials (dark blue) defined as in Fig. 5d. (i) Histogram of the retrained agent’s feature-specific RPE units’ reward-minus-omission responses at reward time (P = 4.06e-12, two-sided Wilcoxon signed-rank test, N = 64). The yellow line indicates the median. (j) Same as (i) but for rewarded trials plotted against trial difficulty (P = 0.028, two-sided Wilcoxon signed-rank test, N = 64). In (j), there is an outlier at 0.16 for a feature-specific RPE unit showing strong reward expectation modulation.
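For readers reproducing panels (i) and (j), the reported statistics are two-sided Wilcoxon signed-rank tests across the N = 64 units. A minimal sketch with SciPy, using placeholder data (the variable names and values below are hypothetical):

```python
import numpy as np
from scipy.stats import wilcoxon

# reward_resp, omission_resp: hypothetical (64,) arrays of per-unit mean
# responses at reward time on rewarded vs. omission trials
rng = np.random.default_rng(0)
reward_resp = rng.normal(0.5, 0.2, 64)    # placeholder data
omission_resp = rng.normal(0.0, 0.2, 64)  # placeholder data

# two-sided Wilcoxon signed-rank test on the paired differences,
# as reported for the feature-specific RPE units
stat, p = wilcoxon(reward_resp - omission_resp, alternative='two-sided')
print(f'W = {stat:.1f}, P = {p:.3g}')
```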

Source data

Extended Data Fig. 2 Tuning of 64 LSTM feature units to position and evidence.

(a) Each panel shows an individual feature unit and how it is tuned to the agent’s position in the maze and the cumulative tower difference at that position.

Source data

Extended Data Fig. 3 Feature-specific RPEs clustered with respect to behavioral features of the task, matching the clustering analysis of Engelhard et al. (ref. 9).

(a) The optimal number of clusters was 3, selected by minimizing the AIC across models with different numbers of clusters. (b) Relative contributions of the behavioral features (cues, position, choice and reward response) for the 64 units, sorted by each unit’s highest cluster-membership probability, with colored lines on top indicating cluster identity. The relative contribution of a variable is defined as the percentage of explained variance lost when comparing the partial model excluding that variable with the full model.
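A minimal sketch of the two analyses described in this caption. The choice of a Gaussian mixture as the clustering model is an assumption for illustration, and the relative-contribution function encodes one plausible reading of the definition above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_n_clusters(X, max_k=10, seed=0):
    """Fit mixtures with 1..max_k clusters and pick the AIC minimizer,
    as in panel (a). X: (n_units, n_features) profile matrix."""
    fits = [GaussianMixture(n_components=k, random_state=seed).fit(X)
            for k in range(1, max_k + 1)]
    aics = np.array([m.aic(X) for m in fits])
    return fits[int(aics.argmin())], aics

def relative_contribution(ev_full, ev_partial):
    """Panel (b), one plausible reading: percentage of explained variance
    lost when the variable is dropped (partial vs. full encoding model)."""
    return 100.0 * (ev_full - ev_partial) / ev_full
```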

Source data

Extended Data Fig. 4 Scalar RPE signal does not reflect the incidental high-dimensional features.

(a) The scalar RPE signal autocorrelated across positions (similarity defined as the cosine similarity between position-lagged copies of the scalar RPE response) shows no peak at the 43 cm position lag corresponding to the wall-pattern repetition.
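A minimal sketch of the similarity measure described here (the function and variable names are hypothetical):

```python
import numpy as np

def position_lagged_cosine(rpe, max_lag):
    """Cosine similarity between a position-binned RPE trace and
    position-shifted copies of itself, one value per lag (in bins)."""
    sims = []
    for lag in range(1, max_lag + 1):
        a, b = rpe[:-lag], rpe[lag:]
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.array(sims)

# a peak at the lag corresponding to 43 cm would indicate sensitivity to
# the repeating wall pattern; per this figure, the scalar RPE shows none
```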

Source data

Extended Data Fig. 5 Cue responses in model and DAergic neurons represent a vector code of lateralized response and RPE.

(a) Average responses of the contralateral-cue-responsive DA neurons (ref. 9) recorded from the left hemisphere only (N = 54/62 neurons, a subset of the contralateral-cue-responsive neurons from Fig. 4c) to confirmatory (red) and disconfirmatory (gray) contralateral cues. Colored fringes represent ±1 s.e.m. for kernel amplitudes. (b) Same as (a) but for DA neurons recorded in the right hemisphere (N = 8/62) responding to confirmatory (blue) and disconfirmatory (gray) contralateral cues. (c, d) Same as (a, b), but for feature-specific RPE model units responding specifically to (c) left cues (N = 27/44, the subset of cue-responsive units from Fig. 4b modulated by left cues only) and (d) right cues (N = 17/44). Error bars indicate ±1 s.e.m.

Source data

Extended Data Fig. 6 Scalar RPE shows fine-grained reward expectation modulation.

(a) The scalar RPE response is modulated by reward expectation, given by trial difficulty, defined as the absolute value of the trial’s final tower difference (blue gradient).

Source data

Extended Data Fig. 7 State diagrams for simulations of the Parker et al. (ref. 6) and Jin and Costa (ref. 26) tasks in Fig. 8.

(a) State diagram for the simulation of the Parker et al. (ref. 6) task. (b) State diagram for the simulation of the Jin and Costa (ref. 26) task. In both panels, arrows indicate probabilistic transitions between states, with probabilities given by the arrow labels. Unlabeled arrows carry the remaining probability, so that the outgoing probabilities of each state sum to 1. π(x) denotes the probability of executing action x under the agent’s behavioral policy π.
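A minimal sketch of how such a state diagram can be simulated. The state names and probabilities below are hypothetical placeholders illustrating the convention that unlabeled arrows carry the remaining probability; they are not the paper’s actual task values:

```python
import numpy as np

# each state maps to (next_state, probability) pairs; None marks the
# "unlabeled arrow", which receives whatever probability is left over
transitions = {
    'cue':    [('press', 0.7), ('cue', None)],   # None -> remainder (0.3)
    'press':  [('reward', 0.9), ('cue', None)],  # None -> remainder (0.1)
    'reward': [('cue', 1.0)],
}

def step(state, rng):
    """Sample the next state, filling in the remaining probability."""
    nexts, probs = zip(*transitions[state])
    labeled = sum(p for p in probs if p is not None)
    probs = [p if p is not None else 1.0 - labeled for p in probs]
    return nexts[rng.choice(len(nexts), p=probs)]

rng = np.random.default_rng(0)
state = 'cue'
for _ in range(5):
    state = step(state, rng)
    print(state)
```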

Supplementary information

Supplementary Information

Supplementary Table 1: Hyperparameters for the A2C algorithm.
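For readers relating the table to the code, below is a minimal sketch of configuring an A2C agent with the Stable Baselines library cited above (refs. 79, 80). The environment and all hyperparameter values shown are illustrative placeholders; the values actually used are those listed in Supplementary Table 1:

```python
# Sketch of A2C setup with Stable Baselines (ref. 80); the environment
# 'CartPole-v1' and all hyperparameter values below are placeholders,
# not the values from Supplementary Table 1.
import gym
from stable_baselines import A2C
from stable_baselines.common.policies import MlpLstmPolicy
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])  # placeholder env
model = A2C(MlpLstmPolicy, env,
            gamma=0.99,          # discount factor (placeholder)
            n_steps=5,           # rollout length (placeholder)
            ent_coef=0.01,       # entropy bonus (placeholder)
            learning_rate=7e-4,  # placeholder
            verbose=1)
model.learn(total_timesteps=100_000)  # placeholder training budget
```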

Reporting Summary

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 7

Statistical source data.

Source Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lee, R.S., Sagiv, Y., Engelhard, B. et al. A feature-specific prediction error model explains dopaminergic heterogeneity. Nat Neurosci 27, 1574–1586 (2024). https://doi.org/10.1038/s41593-024-01689-1

