A distributional code for value in dopamine-based reinforcement learning

Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain1,2,3. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning4,5,6. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
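
To make the hypothesis concrete, here is a minimal Python sketch (illustrative only, not the code released with the paper) in which a small population of value predictors learns from a toy Bernoulli reward, each predictor scaling positive and negative prediction errors differently; the population settles on a family of expectiles that together describe the reward distribution, which is the core computational idea tested in the paper. All parameter values and the toy reward are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic reward: 1.0 with probability 0.5, otherwise 0.0 (illustrative).
def sample_reward():
    return float(rng.random() < 0.5)

n_cells = 20                                   # value predictors ("channels")
alpha_pos = np.linspace(0.02, 0.18, n_cells)   # scaling of positive prediction errors
alpha_neg = alpha_pos[::-1]                    # scaling of negative prediction errors
V = np.zeros(n_cells)                          # each predictor's reward prediction

for _ in range(20000):
    r = sample_reward()
    delta = r - V                              # per-predictor prediction errors
    V += np.where(delta > 0, alpha_pos, alpha_neg) * delta

# Each V_i settles near the tau_i-th expectile of the reward distribution,
# where tau_i = alpha_pos_i / (alpha_pos_i + alpha_neg_i); for this Bernoulli
# reward the tau-th expectile is simply tau.
tau = alpha_pos / (alpha_pos + alpha_neg)
print(np.round(tau, 2))
print(np.round(V, 2))
```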

Fig. 1: Distributional value coding arises from a diversity of relative scaling of positive and negative prediction errors.
Fig. 2: Different dopamine neurons consistently reverse from positive to negative responses at different reward magnitudes.
Fig. 3: Optimistic and pessimistic probability coding occur concurrently in dopamine and VTA GABAergic neurons.
Fig. 4: Relative scaling of positive and negative dopamine responses predicts reversal point.
Fig. 5: Decoding reward distributions from neural responses.

Data availability

The neuronal data analysed in this work are available at https://doi.org/10.17605/OSF.IO/UX5RG.

Code availability

The analysis code for our value-distribution decoding and the code used to generate model predictions for distributional TD are available at https://doi.org/10.17605/OSF.IO/UX5RG.

References

  1. Schultz, W., Stauffer, W. R. & Lak, A. The phasic dopamine signal maturing: from reward via behavioural activation to formal economic utility. Curr. Opin. Neurobiol. 43, 139–148 (2017).

  2. Glimcher, P. W. Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc. Natl Acad. Sci. USA 108, 15647–15654 (2011).

  3. Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).

  4. Morimura, T., Sugiyama, M., Kashima, H., Hachiya, H. & Tanaka, T. Parametric return density estimation for reinforcement learning. In Proc. 26th Conference on Uncertainty in Artificial Intelligence (eds Grunwald, P. & Spirtes, P.) http://dl.acm.org/citation.cfm?id=3023549.3023592 (2010).

  5. Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 449–458 (2017).

  6. Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. Distributional reinforcement learning with quantile regression. In AAAI Conference on Artificial Intelligence (2018).

  7. Sutton, R. S. & Barto, A. G. Reinforcement Learning: an Introduction Vol. 1 (MIT Press, 1998).

  8. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  9. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  10. Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In 32nd AAAI Conference on Artificial Intelligence (2018).

  11. Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).

  12. Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860–868 (2018).

  13. Song, H. F., Yang, G. R. & Wang, X. J. Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife 6, e21492 (2017).

  14. Barth-Maron, G. et al. Distributed distributional deterministic policy gradients. In International Conference on Learning Representations https://openreview.net/forum?id=SyZipzbCb (2018).

  15. Dabney, W., Ostrovski, G., Silver, D. & Munos, R. Implicit quantile networks for distributional reinforcement learning. In International Conference on Machine Learning (2018).

  16. Pouget, A., Beck, J. M., Ma, W. J. & Latham, P. E. Probabilistic brains: knowns and unknowns. Nat. Neurosci. 16, 1170–1178 (2013).

  17. Lammel, S., Lim, B. K. & Malenka, R. C. Reward and aversion in a heterogeneous midbrain dopamine system. Neuropharmacology 76, 351–359 (2014).

  18. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).

  19. Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).

  20. Rowland, M. et al. Statistics and samples in distributional reinforcement learning. In International Conference on Machine Learning (2019).

  21. Frank, M. J., Seeberger, L. C. & O’Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science 306, 1940–1943 (2004).

  22. Hirvonen, J. et al. Striatal dopamine D1 and D2 receptor balance in twins at increased genetic risk for schizophrenia. Psychiatry Res. Neuroimaging 146, 13–20 (2006).

  23. Piggott, M. A. et al. Dopaminergic activities in the human striatum: rostrocaudal gradients of uptake sites and of D1 and D2 but not of D3 receptor binding or dopamine. Neuroscience 90, 433–445 (1999).

  24. Rosa-Neto, P., Doudet, D. J. & Cumming, P. Gradients of dopamine D1- and D2/3-binding sites in the basal ganglia of pig and monkey measured by PET. Neuroimage 22, 1076–1083 (2004).

  25. Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLOS Comput. Biol. 12, e1005062 (2016).

  26. Rutledge, R. B., Skandali, N., Dayan, P. & Dolan, R. J. A computational and neural model of momentary subjective well-being. Proc. Natl Acad. Sci. USA 111, 12252–12257 (2014).

  27. Huys, Q. J., Daw, N. D. & Dayan, P. Depression: a decision-theoretic analysis. Annu. Rev. Neurosci. 38, 1–23 (2015).

  28. Bennett, D. & Niv, Y. Opening Burton’s clock: psychiatric insights from computational cognitive models. Preprint at https://doi.org/10.31234/osf.io/y2vzu (2018).

  29. Tian, J. & Uchida, N. Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors. Neuron 87, 1304–1316 (2015).

  30. Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).

  31. Newey, W. K. & Powell, J. L. Asymmetric least squares estimation and testing. Econometrica 55, 819–847 (1987).

  32. Jones, M. C. Expectiles and M-quantiles are quantiles. Stat. Probab. Lett. 20, 149–153 (1994).

  33. Ziegel, J. F. Coherence and elicitability. Math. Finance 26, 901–918 (2016).

  34. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  35. Heess, N. et al. Emergence of locomotion behaviours in rich environments. Preprint at https://arxiv.org/abs/1707.02286 (2017).

  36. Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006).

  37. Cohen, J. Y. et al. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).

  38. Stauffer, W. R., Lak, A. & Schultz, W. Dopamine reward prediction error responses reflect marginal utility. Curr. Biol. 24, 2491–2500 (2014).

  39. Fiorillo, C. D., Song, M. R. & Yun, S. R. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J. Neurosci. 33, 4710–4725 (2013).

  40. Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).

  41. Van Hasselt, H., Guez, A. & Silver, D. Deep reinforcement learning with double Q-learning. In AAAI Conference on Artificial Intelligence (2016).

  42. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images (Univ. of Toronto, 2009).

Acknowledgements

We thank K. Miller, P. Dayan, T. Stepleton, J. Paton, M. Frank, C. Clopath, T. Behrens and the members of the Uchida laboratory for comments on the manuscript; and N. Eshel, J. Tian, M. Bukwich and M. Watabe-Uchida for providing data.

Author information

Authors and Affiliations

Authors

Contributions

W.D. conceived the project. W.D., Z.K.-N. and M.B. contributed ideas for experiments and analysis. W.D. and Z.K.-N. performed simulation experiments and analysis. N.U. and C.K.S. provided neuronal data for analysis. W.D., Z.K.-N. and M.B. managed the project. M.B., N.U., R.M. and D.H. advised on the project. M.B., W.D. and Z.K.-N. wrote the paper. W.D., Z.K.-N., M.B., N.U., C.K.S., D.H. and R.M. provided revisions to the paper.

Corresponding author

Correspondence to Will Dabney.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Rui Costa, Michael Littman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Mechanism of distributional TD.

a, The degree of asymmetry between positive and negative scaling determines the equilibrium at which positive and negative errors balance. Equal scaling equilibrates at the mean, whereas larger positive (negative) scaling produces an equilibrium above (below) the mean. b, Distributional prediction emerges through experience. The quantile (sign-function) version is displayed here for clarity; the model is trained on an arbitrary task with a trimodal reward distribution. c, Same as b, viewed in terms of the cumulative distribution (left) or the learned value for each predictor (quantile function) (right).
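
A runnable sketch of the quantile (sign-function) mechanism summarized in panels a and b, under an assumed trimodal reward distribution (the mixture below is invented for illustration): a single predictor whose updates scale the sign of the error by α⁺ or α⁻ settles at the α⁺/(α⁺ + α⁻) quantile.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative trimodal reward distribution (equal mixture of three Gaussians).
def sample_reward():
    mode = rng.choice([0.0, 3.0, 6.0])
    return mode + 0.3 * rng.standard_normal()

alpha_pos, alpha_neg = 0.03, 0.01           # asymmetric scaling of the error's sign
tau = alpha_pos / (alpha_pos + alpha_neg)   # predicted equilibrium: the tau-th quantile
V = 0.0

for _ in range(50000):
    delta = sample_reward() - V
    V += alpha_pos * (delta > 0) - alpha_neg * (delta < 0)   # sign-function update

print(tau, V)   # V hovers close to the 0.75 quantile of this mixture (about 5.8)
```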

Extended Data Fig. 2 Learning the distribution of returns improves performance of deep RL agents across multiple domains.

a, DQN and distributional TD share identical nonlinear network structures. b, c, After training classical or distributional DQN on MsPacman, we freeze the agent and then train a separate linear decoder to reconstruct frames from the agent’s final layer representation. For each agent, reconstructions are shown. The distributional model’s representation allows substantially better reconstruction. d, At a single frame of MsPacman (not shown), the agent’s value predictions together represent a probability distribution over future rewards. Reward predictions of individual RPE channels shown as tick marks ranging from pessimistic (blue) to optimistic (red), and kernel density estimate shown in black. e, Atari-57 experiments with single runs of prioritized experience replay40 and double DQN41 agents for reference. Benefits of distributional learning exceed other popular innovations. f, g, The performance pay-off of distributional RL can be seen across a wide diversity of tasks. Here we give another example, a humanoid motor-control task in the MuJoCo physics simulator. Prioritized experience replay agent is shown for reference14. Traces show individual runs; averages are in bold.
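
The kernel density estimate in panel d can be illustrated in a few lines; the per-channel reward predictions below are invented numbers, and SciPy's gaussian_kde stands in for whatever smoothing the authors used.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical reward predictions from individual RPE channels,
# ordered from pessimistic to optimistic (illustrative values).
channel_predictions = np.array([0.1, 0.3, 0.4, 0.6, 0.9, 1.4, 2.0, 3.1])

# Treat the set of predictions as samples and smooth them into a density,
# analogous to the black kernel density estimate in panel d.
density = gaussian_kde(channel_predictions)
grid = np.linspace(0.0, 4.0, 200)
print(grid[np.argmax(density(grid))])   # mode of the decoded distribution
```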

Extended Data Fig. 3 Simulation experiment to examine the role of representation learning in distributional RL.

a, Illustration of tasks 1 and 2. b, Example images for each class used in our experiment42 c, Experimental results, where each of ten random seeds yields an individual run shown with traces; average over seeds is shown in bold. d, Same as c, but for control experiment. e, Bird–dog t-SNE visualization of final hidden layer of network, given different input images (blue, bird; red, dog). Left, classical TD; right, distributional TD; top row, representation after training on task 1; bottom row, representation after training on task 2.

Extended Data Fig. 4 Null models.

a, Classical TD plus noise does not give rise to the pattern of results observed in real dopamine data in the variable-magnitude task. When reversal points were estimated in two independent partitions there was no correlation between the two (P = 0.32 by linear regression). b, We then estimated asymmetric scaling of responses and found no correlation between this and reversal point (P = 0.78 by linear regression). c, Model comparison between ‘same’, a single reversal point, and ‘diverse’, separate reversal points. In both, the model is used to predict whether a held-out trial has a positive or negative response. d, Simulated baseline-subtracted RPEs, colour-coded according to the ground-truth value of bias added to that cell’s RPEs. e, Across all simulated cells, there was a strong positive relationship between pre-stimulus baseline firing and the estimated reversal point. f, Two independent measurements of the reversal point were strongly correlated. g, Proportion of simulated cells with significantly positive (blue) or significantly negative (red) responses: no reward magnitude elicited significantly positive responses in some cells and significantly negative responses in others. h, In the simulation, there was a significant negative relationship between the estimated asymmetry of each cell and its estimated reversal point (opposite to that observed in neural data). i, Diagram illustrating a Gaussian-weighted topological mapping between RPEs and value predictors. j, Varying the standard deviation of this Gaussian modulates the degree of coupling. k, In a task with equal chance of a reward 1.0 or 0.0, distributional TD with different levels of coupling shows robustness to the degree of coupling. l, When there is no coupling, a distributional code is not learned, but asymmetric scaling can cause spurious detection of diverse reversal points. m, Even though every cell has the same reward prediction, the cells appear to have different reversal points. n, With this model, some cells may have significantly positive responses, and others significantly negative responses, to the same reward. o, However, this model is unable to explain a positive correlation between asymmetric scaling and reversal points. p, Simulation of ‘synaptic’ distributional RL, in which learning rates but not firing rates are asymmetrically scaled. This model predicts diversity in reversal points between dopamine neurons. q, The model predicts no correlation between asymmetric scaling of firing rates and reversal point.
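
One way to read the Gaussian-weighted coupling sketched in panels i–k is that each value predictor pools asymmetrically scaled RPE channels through a Gaussian weight over an ordered population. The sketch below implements that literal reading; the coupling width, the task and all parameters are assumptions, and the paper's own coupling model may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 20
alpha_pos = np.linspace(0.02, 0.18, n)         # per-channel positive scaling
alpha_neg = alpha_pos[::-1]                    # per-channel negative scaling
V = np.full(n, 0.5)                            # value predictors, ordered by optimism

# Gaussian-weighted topological mapping: predictor i is trained mostly by
# nearby RPE channels j; sigma controls the degree of coupling (cf. panel j).
sigma = 2.0
idx = np.arange(n)
W = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
W /= W.sum(axis=1, keepdims=True)

for _ in range(20000):
    r = float(rng.random() < 0.5)              # equal chance of reward 1.0 or 0.0
    delta = r - V                              # each channel's RPE against its predictor
    scaled = np.where(delta > 0, alpha_pos, alpha_neg) * delta
    V += W @ scaled                            # pool scaled RPEs through the coupling

print(np.round(V, 2))   # moderate coupling still leaves a spread of learned values
```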

Extended Data Fig. 5 Asymmetry and reversal.

a, Left, all data points (trials) from an example cell. The solid lines are linear fits to the positive and negative domains, and the shaded areas show 95% confidence intervals calculated with Bayesian regression. Right, the same cell plotted in the format of Fig. 4b. b, Cross-validated model comparison on the dopamine data favours allowing each cell to have its own asymmetric scaling (P = 1.4 × 10⁻¹¹ by paired t-test). The standard error of the mean appears large relative to the P value because the P value is computed using a paired test. c, Although the difference between single-asymmetry and diverse-asymmetry models was small in firing-rate space, such small differences correspond to large differences in decoded distribution space (more details in Supplementary Information). Each point is a TD simulation; colour indicates the degree of diversity in asymmetric scaling within that simulation. d, We were interested in whether an apparent correlation between reversal point and asymmetry could arise as an artefact, owing to a mismatch between the shape of the actual dopamine response function and the function used to fit it. Here we simulate the variable-magnitude task using a TD model without a true correlation between asymmetric scaling and reversal point. We then apply the same analysis pipeline as in the main paper, to measure the correlation (colour axis) between asymmetric scaling and reversal point. We repeat this procedure 20 times with different dopamine response functions in the simulation, and different functions used to fit the positive and negative domains of the simulated data. The functions are sorted in increasing order of concavity. An artefact can emerge if the response function used to fit the data is less concave than the response function used to generate the data. For example, when generating data with a Hill function but fitting with a linear function, a positive correlation can be spuriously measured. e, When simulating data from the distributional TD model, where a true correlation exists between asymmetric scaling and reversal point, it is always possible to detect this positive correlation, even if the fitting response function is more concave than the generating response function. The black rectangle highlights the function used to fit real neural data in c. f, Here we analyse the real dopamine cell data identically to Fig. 4d, but using Hill functions instead of linear functions to fit the positive and negative domains. Because the correlation between asymmetric scaling and reversal point still appears under these adversarial conditions, we can be confident it is not driven by this artefact. g, Same as Fig. 4d, but using linear response function and linear utility function (instead of empirical utility).
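
As an illustration of the quantities in panels a–c, the sketch below fits separate slopes to the positive and negative domains of simulated single-cell responses and converts them into a reversal point and an asymmetry τ = α⁺/(α⁺ + α⁻). The plain grid-search least-squares fit is a stand-in for the Bayesian regression and cross-validated comparisons used in the paper, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated single-cell data: the response scales differently above and below
# the cell's reversal point (ground truth chosen arbitrarily).
reversal_true, a_pos, a_neg = 2.0, 1.5, 0.5
reward = rng.uniform(0.0, 6.0, size=400)
err = reward - reversal_true
response = np.where(err > 0, a_pos, a_neg) * err + 0.3 * rng.standard_normal(400)

# Grid-search the reversal point; at each candidate, fit separate slopes
# (through the reversal point) to the two domains and score the residuals.
best = None
for rev in np.linspace(0.5, 5.5, 201):
    x = reward - rev
    pos, neg = x > 0, x < 0
    slope_pos = np.sum(x[pos] * response[pos]) / np.sum(x[pos] ** 2)
    slope_neg = np.sum(x[neg] * response[neg]) / np.sum(x[neg] ** 2)
    sse = np.sum((response - np.where(pos, slope_pos, slope_neg) * x) ** 2)
    if best is None or sse < best[0]:
        best = (sse, rev, slope_pos, slope_neg)

_, reversal_hat, slope_pos, slope_neg = best
tau_hat = slope_pos / (slope_pos + slope_neg)      # asymmetry, as in Fig. 4
print(round(reversal_hat, 2), round(tau_hat, 2))   # close to 2.0 and 0.75
```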

Extended Data Fig. 6 Cue responses versus outcome responses, and more evidence for diversity.

a, In the variable-probability task: firing at cue, versus firing at reward (left) or omission (right). Colour brightness denotes asymmetry. b, Same as a, but showing RPEs from distributional TD simulation. c, Data from ref. 30 also included unpredicted rewards and unpredicted airpuffs. Top two panels show responses for all the cells recorded in one animal and bottom two panels show responses for all the cells of another animal. Left, the x axis is the baseline-subtracted response to free reward and the y axis is the baseline-subtracted response to airpuff. Dots with black outlines are per-cell means, and un-outlined dots are means of disjoint subsets of trials indicating consistency of asymmetry. Right, the same data plotted in a different way, with cells sorted along the x axis by response to airpuff. Response to reward is shown in greyscale dots. Asterisks indicate significant difference in firing rates from one or both neighbouring cells. d, Simulations for distributional but not classical TD produce diversity in relative response.

Extended Data Fig. 7 More details of data in variable-probability task.

a, Details of analysis method. Of the four possible outcomes of the two Mann–Whitney tests (Methods), two outcomes correspond to interpolation (middle) and one each to the pessimistic (left) and optimistic (right) groups. b, Simulation results for the classical TD and distributional TD models. y axis shows the average firing-rate change, normalized to mean zero and unit variance, in response to each of the three cues. Each curve is one cell. The cells are split into panels according to a statistical test for type of probability coding (see Methods for details). Colour indicates the degree of optimism or pessimism. Distributional TD predicts simultaneous optimistic and pessimistic coding of probability, whereas classical TD predicts all cells have the same coding. c, Same as b, but using data from real dopamine neurons. The pattern of results closely matches the predictions from the distributional TD model. d, Same as b, using data from putative VTA GABAergic interneurons.
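
A sketch of the two-test classification described in panel a, using SciPy's Mann–Whitney U test on invented trial data; the exact decision rule in the paper's Methods may differ, so the mapping of the four outcomes below is a plausible reading rather than the authors' precise criterion.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)

# Illustrative per-trial cue responses (spikes/s change) of one cell in the
# variable-probability task: cues signal 10%, 50% and 90% reward probability.
resp_10 = rng.normal(0.5, 1.0, 40)
resp_50 = rng.normal(2.8, 1.0, 40)     # this toy cell sits close to the 90% cue
resp_90 = rng.normal(3.0, 1.0, 40)

# Two one-sided Mann-Whitney tests: is the 50% response above the 10% response,
# and below the 90% response?  One plausible mapping of the four outcomes:
p_above_10 = mannwhitneyu(resp_50, resp_10, alternative='greater').pvalue
p_below_90 = mannwhitneyu(resp_50, resp_90, alternative='less').pvalue

above_10, below_90 = p_above_10 < 0.05, p_below_90 < 0.05
if above_10 and not below_90:
    label = 'optimistic'
elif below_90 and not above_10:
    label = 'pessimistic'
else:
    label = 'interpolating'            # the two remaining outcomes
print(label)
```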

Extended Data Fig. 8 Further distribution decoding analysis.

This figure pertains to the variable-magnitude experiment. a–c, In the decoding shown in the main text, we constrained the support of the distribution to the range of the rewards in the task. Here, we applied the decoding analysis without constraining the output values. We find similar results, although with increased variance. d, We compare the quality of the decoded distribution against several controls. The real decoding is shown as black dots. In coloured lines are reference distributions (uniform and Gaussian with the same mean and variance as the ground truth; and the ground truth mirrored). Black traces shift or scale the ground-truth distribution by varying amounts. e, Nonlinear functions used to shift asymmetries, to measure degradation of the decoded distribution. The normal cumulative distribution function ϕ is used to transform asymmetry τ. This is shifted by some value s and transformed back through the normal quantile function ϕ⁻¹. Positive values of s increase the value of τ and negative values decrease it. f, Decoded distributions under different shifts, s. g, Plot of shifted asymmetries for the values of s used. h, Quantification of the match between decoded and ground-truth distribution, for each s. i, j, Same as Fig. 5d, e, but for putative GABAergic cells rather than dopamine cells.
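
The shift in panels e–g is simply τ′ = ϕ(ϕ⁻¹(τ) + s); a short check using SciPy's standard normal CDF and quantile functions (the τ values below are illustrative):

```python
import numpy as np
from scipy.stats import norm

tau = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # asymmetries of the RPE channels (illustrative)
s = 0.5                                      # shift applied in probit space
tau_shifted = norm.cdf(norm.ppf(tau) + s)    # tau' = phi(phi^-1(tau) + s)
print(np.round(tau_shifted, 2))              # positive s pushes every tau upwards
```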

Extended Data Fig. 9 Simultaneous diversity.

a, b, Variable-probability task. Mean spiking (a) and licking (b) activity in response to each of the three cues (indicating 10%, 50% or 90% probability of reward) at time 0, and in response to the outcome (reward or no reward) at time 2,000 ms. c, Trial-to-trial variations in lick rates were strongly correlated with trial-to-trial variations in dopamine firing rates. Mean of each cell is subtracted from each axis, and the x axis is binned for ease of visualization. d, Dopaminergic coding of the 50% cue relative to the 10% and 90% cues (as shown in b) was not correlated with the same measure computed on lick rates. Therefore, between-session differences in cue preference, measured by anticipatory licking, cannot explain between-cell differences in optimism. e, Four simultaneously recorded dopamine neurons. These are the same four cells whose time courses are shown in Fig. 3c. f, Variable-magnitude task. Across cells, there was no relationship between asymmetric scaling of positive versus negative prediction errors, and baseline firing rates (R = 0.18, P = 0.29). Each point is a cell. These data are from dopamine neurons at reward delivery time. g, t-statistics of response to 5 μl reward compared with baseline firing rate, for all 16 cells from animal D. Some cells respond significantly above baseline and others significantly below. Cells are sorted by t-statistic. h, Spike rasters showing all trials in which the 5 μl reward was delivered. The two panels are two example cells from the same animal with rasters shown in Fig. 2.

Extended Data Fig. 10 Relationship of results to original analysis.

Here we reproduce results for the variable-magnitude task in ref. 30 with two different time windows. a, Change in firing rate in response to cued reward delivery, averaged over all cells. b, Comparison of the Hill-function fit and the response averaged over all cells, for expected (cued) and unexpected reward delivery. c, Correlation between the response predicted by the scaled common response function and the actual response to expected reward delivery. d, Zooming in on c shows that the correlation is driven primarily by larger reward magnitudes. e–h, Repeating the above analysis for a window of 200–600 ms.
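
Panels b and c rest on fitting a saturating Hill function to response versus reward magnitude. Below is a minimal curve-fitting sketch; the particular parameterization a·r^n / (r^n + k^n) and the toy data are assumptions, not the exact form used in ref. 30.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(r, a, k, n):
    """Saturating Hill response: a * r^n / (r^n + k^n)."""
    return a * r**n / (r**n + k**n)

# Toy magnitude-response data (reward in microlitres, response in spikes/s).
magnitude = np.array([0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0])
response = np.array([0.4, 1.1, 3.0, 4.6, 6.1, 7.2, 7.8])

params, _ = curve_fit(hill, magnitude, response, p0=[8.0, 2.0, 1.0])
print(np.round(params, 2))   # fitted amplitude, half-saturation point and exponent
```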

Supplementary information

Supplementary Information

This material contains six sections: Section 1 covers mechanisms underlying distributional RL; Section 2 considers alternative models; Section 3 tests robustness to modelling assumptions; Section 4 presents supplementary results; Section 5 discusses relations to previous work; and Section 6 gives further predictions of the theory.

Reporting Summary

Source data

About this article

Cite this article

Dabney, W., Kurth-Nelson, Z., Uchida, N. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020). https://doi.org/10.1038/s41586-019-1924-6
