A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning

Abstract

A large body of evidence has indicated that the phasic responses of midbrain dopamine neurons show a remarkable similarity to a type of teaching signal (temporal difference (TD) error) used in machine learning. However, previous studies failed to observe a key prediction of this algorithm: that when an agent associates a cue and a reward that are separated in time, the timing of dopamine signals should gradually move backward in time from the time of the reward to the time of the cue over multiple trials. Here we demonstrate that such a gradual shift occurs both at the level of dopaminergic cellular activity and dopamine release in the ventral striatum in mice. Our results establish a long-sought link between dopaminergic activity and the TD learning algorithm, providing fundamental insights into how the brain associates cues and rewards that are separated in time.
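
The backward shift predicted by the TD algorithm can be reproduced with a minimal tabular TD(0) simulation using a complete serial compound representation (one state per post-cue time bin). The sketch below is purely illustrative and is not the authors' model (which is provided as Supplementary Data); the number of time bins, the cue and reward times, the learning rate and the discount factor are arbitrary choices.

```python
import numpy as np

n_bins, cue_t, reward_t = 30, 5, 20     # within-trial time bins (arbitrary values)
alpha, gamma = 0.1, 0.98                # learning rate and discount factor (arbitrary)
V = np.zeros(n_bins + 1)                # learned value of each within-trial time bin

for trial in range(1, 201):
    r = np.zeros(n_bins)
    r[reward_t] = 1.0                   # reward arrives at a fixed delay after the cue
    delta = np.zeros(n_bins)
    for t in range(n_bins):
        delta[t] = r[t] + gamma * V[t + 1] - V[t]   # TD error for the t -> t+1 transition
        if t >= cue_t:                  # only post-cue states are updated;
            V[t] += alpha * delta[t]    # pre-cue (ITI) bins keep V = 0
    if trial == 1 or trial % 40 == 0:
        # the peak sits at the reward bin early in training and migrates backward;
        # a peak at bin cue_t - 1 is the error at the ITI-to-cue transition,
        # that is, the 'cue response'
        print(f"trial {trial:3d}: TD error peaks at bin {int(np.argmax(delta))}")
```

Early in training the largest TD error occurs at the reward bin; across trials the peak migrates backward through the delay and finally settles at the cue, which is the signature examined in this study.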


Fig. 1: Temporal shift of TD error.
Fig. 2: Dopamine release in the VS during first-time classical conditioning.
Fig. 3: Dopamine axon activity in reversal learning.
Fig. 4: Dopamine axon activity in repeated associative learning.
Fig. 5: Two-photon imaging of dopamine activity in reversal learning.
Fig. 6: Dynamics of prediction error signals in models with different update rules.


Data availability

The fluorometry and two-photon imaging data have been deposited in a public repository (https://datadryad.org/stash/dataset/doi:10.5061/dryad.hhmgqnkjw). Source data are provided with this paper.

Code availability

The model code is provided as Supplementary Data. All other custom code used to obtain the results is available from a public repository (https://github.com/VTA-SNc/Amo2022).

References

  1. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996).


  2. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).


  3. Rescorla, R. A. & Wagner, A. R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. Class. Cond. II Curr. Res. Theory 2, 64–99 (1972).


  4. Sutton, R. S. & Barto, A. G. A temporal-difference model of classical conditioning. In: Proceedings of the Ninth Annual Conference of the Cognitive Science Society. 355–378 (1987).

  5. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).

  6. Hollerman, J. R. & Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309 (1998).


  7. Flagel, S. B. et al. A selective role for dopamine in stimulus–reward learning. Nature 469, 53–57 (2011).


  8. Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).


  9. Day, J. J., Roitman, M. F., Wightman, R. M. & Carelli, R. M. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028 (2007).


  10. Clark, J. J., Collins, A. L., Sanford, C. A. & Phillips, P. E. M. Dopamine encoding of Pavlovian incentive stimuli diminishes with extended training. J. Neurosci. 33, 3526–3532 (2013).


  11. Pan, W.-X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward–learning network. J. Neurosci. 25, 6235–6242 (2005).


  12. Brown, J., Bullock, D. & Grossberg, S. How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J. Neurosci. 19, 10502–10511 (1999).


  13. Mollick, J. A. et al. A systems-neuroscience model of phasic dopamine. Psychol. Rev. 127, 972–1021 (2020).


  14. O’Reilly, R. C., Frank, M. J., Hazy, T. E. & Watz, B. PVLV: the primary value and learned value Pavlovian learning algorithm. Behav. Neurosci. 121, 31–49 (2007).


  15. Tan, C. O. & Bullock, D. A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward. J. Neurosci. 28, 10062–10074 (2008).


  16. Maes, E. J. P. et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat. Neurosci. 23, 176–178 (2020).


  17. Roesch, M. R., Calu, D. J. & Schoenbaum, G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624 (2007).


  18. Sutton, R. S., Precup, D. & Singh, S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181–211 (1999).


  19. Mohebi, A. et al. Dissociable dopamine dynamics for learning and motivation. Nature 570, 65–70 (2019).


  20. Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).


  21. Tsutsui-Kimura, I. et al. Distinct temporal difference error signals in dopamine axons in three regions of the striatum in a decision-making task. eLife 9, e62390 (2020).


  22. Li, L., Walsh, T. J. & Littman, M. L. Towards a unified theory of state abstraction for MDPs. In: Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.1229 (2006).

  23. Botvinick, M. M., Niv, Y. & Barto, A. G. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280 (2009).


  24. Bromberg-Martin, E. S., Matsumoto, M., Hong, S. & Hikosaka, O. A pallidus-habenula-dopamine pathway signals inferred stimulus values. J. Neurophysiol. 104, 1068–1076 (2010).


  25. Coddington, L. T. & Dudman, J. T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat. Neurosci. 21, 1563–1573 (2018).


  26. Zhong, W., Li, Y., Feng, Q. & Luo, M. Learning and stress shape the reward response patterns of serotonin neurons. J. Neurosci. 37, 8863–8875 (2017).


  27. Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).


  28. Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat. Methods 17, 1156–1166 (2020).


  29. Kakade, S. & Dayan, P. Dopamine: generalization and bonuses. Neural Netw. 15, 549–559 (2002).


  30. Morrens, J., Aydin, Ç., Janse van Rensburg, A., Esquivelzeta Rabell, J. & Haesler, S. Cue-evoked dopamine promotes conditioned responding during learning. Neuron 106, 142–153 (2020).


  31. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).


  32. Tian, J. & Uchida, N. Habenula lesions reveal that multiple mechanisms underlie dopamine prediction errors. Neuron 87, 1304–1316 (2015).


  33. Niv, Y., Duff, M. O. & Dayan, P. Dopamine, uncertainty and TD learning. Behav. Brain Funct. 1, 6 (2005).


  34. Schultz, W., Apicella, P. & Ljungberg, T. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900–913 (1993).


  35. Kobayashi, S. & Schultz, W. Reward contexts extend dopamine signals to unrewarded stimuli. Curr. Biol. 24, 56–62 (2014).


  36. Matsumoto, H., Tian, J., Uchida, N. & Watabe-Uchida, M. Midbrain dopamine neurons signal aversion in a reward-context-dependent manner. eLife 5, e17328 (2016).


  37. Engelhard, B. et al. Specialized coding of sensory, motor and cognitive variables in VTA dopamine neurons. Nature 570, 509–513 (2019).


  38. Menegas, W. et al. Dopamine neurons projecting to the posterior striatum form an anatomically distinct subclass. eLife 4, e10032 (2015).


  39. Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).


  40. Sutton, R. S. & Barto, A. G. Reinforcement Learning, Second Edition: An Introduction (MIT Press, 2018).

  41. Babayan, B. M., Uchida, N. & Gershman, S. J. Belief state representation in the dopamine system. Nat. Commun. 9, 1891 (2018).


  42. Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).


  43. Lee, R. S., Mattar, M. G., Parker, N. F., Witten, I. B. & Daw, N. D. Reward prediction error does not explain movement selectivity in DMS-projecting dopamine neurons. eLife 8, e42992 (2019).


  44. Watabe-Uchida, M., Eshel, N. & Uchida, N. Neural circuitry of reward prediction error. Annu. Rev. Neurosci. 40, 373–394 (2017).


  45. Kawato, M. & Samejima, K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr. Opin. Neurobiol. 17, 205–212 (2007).


  46. Tian, J. et al. Distributed and mixed information in monosynaptic inputs to dopamine neurons. Neuron 91, 1374–1389 (2016).


  47. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).


  48. Botvinick, M., Wang, J. X., Dabney, W., Miller, K. J. & Kurth-Nelson, Z. Deep reinforcement learning and its neuroscientific implications. Neuron 107, 603–616 (2020).


  49. Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006).


  50. Tong, Q. et al. Synaptic glutamate release by ventromedial hypothalamic neurons is part of the neurocircuitry that prevents hypoglycemia. Cell Metab. 5, 383–393 (2007).


  51. Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140 (2010).


  52. Daigle, T. L. et al. A suite of transgenic driver and reporter mouse lines with enhanced brain-cell-type targeting and functionality. Cell 174, 465–480 (2018).


  53. Tsutsui-Kimura, I. et al. Dysfunction of ventrolateral striatal dopamine receptor type 2-expressing medium spiny neurons impairs instrumental motivation. Nat. Commun. 8, 14304 (2017).


  54. Zhang, F. et al. Optogenetic interrogation of neural circuits: technology for probing mammalian brain structures. Nat. Protoc. 5, 439–456 (2010).


  55. Inutsuka, A. et al. The integrative role of orexin/hypocretin neurons in nociceptive perception and analgesic regulation. Sci. Rep. 6, 29480 (2016).


  56. Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).


  57. Dana, H. et al. High-performance calcium sensors for imaging activity in neuronal populations and microcompartments. Nat. Methods 16, 649–657 (2019).


  58. Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).


  59. Pachitariu, M. et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. Preprint at https://www.biorxiv.org/content/10.1101/061507v2 (2017).

  60. Keemink, S. W. et al. FISSA: a neuropil decontamination toolbox for calcium imaging signals. Sci. Rep. 8, 3493 (2018).


  61. Paxinos, G. & Franklin, K. B. J. Paxinos and Franklin’s the Mouse Brain in Stereotaxic Coordinates (Academic Press, 2019).


Acknowledgements

We thank I. Tsutsui-Kimura, H.-G. Kim and B. Babayan for technical assistance; V. Roser and S. Ikeda for assistance in animal training; H. Matsumoto for sharing data; A. Lowet, M. Bukwich and all laboratory members for discussion. We thank M. Mathis, E. Soucy, V. Murthy, M. Andermann and members of their laboratories for advice on establishing two-photon imaging of deep structures. We thank C. Dulac for sharing reagents and equipment. We thank D. Kim and the GENIE Project, Janelia Farm Research Campus, Howard Hughes Medical Institute, for pGP-CMV-GCaMP6f and pGP-AAV-CAG-FLEX-jGCaMP7f-WPRE plasmids; E. Boyden, Media Lab, Massachusetts Institute of Technology, for AAV5-CAG-FLEX-tdTomato and AAV5-CAG-tdTomato; K. Deisseroth, Stanford University, for pAAV-EF1a-DIO-hChR2(H134R)-EYFP-WPRE; and Y. Li, State Key Laboratory of Membrane Biology, Peking University, for AAV9-hSyn-DA2m. This work was supported by grants from the National Institute of Mental Health (R01MH125162, to M.W.-U.), the National Institutes of Health (U19 NS113201 and NS108740, to N.U.), the Simons Collaboration on Global Brain (to N.U.), the Japan Society for the Promotion of Science, Japan Science and Technology Agency (to R.A.), the Human Frontier Science Program (LT000801/2018, to S.M.), the Harvard Brain Science Initiative (HBI Young Scientist Transitions Award, to S.M.) and Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS) by AMED (JP20dm0207069, to K.F.T.).

Author information

Authors and Affiliations

Authors

Contributions

R.A. and M.W.-U. designed experiments and analyzed data. R.A. and S.M. collected data. R.A. and A.Y. made constructs. K.F.T. made transgenic mice. The results were discussed and interpreted by R.A., S.M., N.U. and M.W.-U. R.A., N.U. and M.W.-U. wrote the paper.

Corresponding author

Correspondence to Mitsuko Watabe-Uchida.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Erin Calipari and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Recording sites for fiber-fluorometry and example of fiber-fluorometry signals.

a, Recording sites for each animal are shown in coronal views (Paxinos and Franklin, ref. 61). b, Example coronal section of a recording site and DA2m (green) expression in the VS. c, Example coronal section of a recording site and GCaMP7f (green) and tdTomato (red) expression in the VS. d, Example coronal section of a recording site and GCaMP7f (green) and tdTomato (red) expression in the VTA. Asterisks indicate fiber tip locations. The other animals (n = 7 for DA2m, n = 9 for GCaMP in the VS and n = 4 for GCaMP in the VTA) showed similar results, as summarized in (a). Scale bars, 1 mm. e, Raw GCaMP7f (upper panel) and DA2m (GrabDA; lower panel) signals in the VS. f, Comparison of free-reward responses between electrophysiology and fiber-fluorometry: electrophysiology of opto-tagged dopamine neurons (upper; data from Matsumoto et al., 2016), fiber-fluorometry of GCaMP signals in dopamine axons in the VS (middle) and fiber-fluorometry of DA2m signals in the VS (bottom).

Extended Data Fig. 2 Dopamine release in the ventral striatum during first-time classical conditioning.

a, Time course of dopamine sensor responses to cued water (left) and to free water (right) in an example mouse. Each dot represents the response in a single trial, and the line shows a 20-trial moving average. b, The dopamine response to the reward-associated cue in the late phase (2–3 s after cue onset) was significantly higher than in the early phase (0–1 s after cue onset) during the first third of the learning phase (t = 5.0, p = 0.22 × 10⁻²; two-sided t-test). c, Dopamine sensor signal peaks during delay periods (red) overlaid on a heatmap of dopamine sensor signals in cued water trials. n = 7 animals. d, Dopamine sensor response onsets (purple) overlaid on a heatmap of dopamine sensor signals. e, Linear regression of dopamine excitation onset against trial number. f, Each panel shows correlation coefficients between activity peak timing and trial number for trial-order-shuffled data (n = 500) in each animal (7 animals). The 95th-percentile area is marked in red, and the correlation coefficient of the original data is shown as a blue line. Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. **p < 0.01.
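
As a rough illustration of the trial-order shuffle control in panel f, the sketch below compares the observed correlation between peak timing and trial number to a null distribution built from 500 shuffles. This is a hypothetical re-implementation, not the deposited analysis code; the function name, the drift of the simulated peak times and the use of a central 95% interval are assumptions (the exact sidedness is specified in the Methods).

```python
import numpy as np

def shuffle_correlation_test(peak_times, n_shuffles=500, seed=0):
    """Correlation between peak timing and trial number, compared with a null
    distribution obtained by shuffling trial order (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    trials = np.arange(len(peak_times))
    observed = np.corrcoef(trials, peak_times)[0, 1]
    null = np.array([np.corrcoef(trials, rng.permutation(peak_times))[0, 1]
                     for _ in range(n_shuffles)])
    lo, hi = np.percentile(null, [2.5, 97.5])   # central 95% area of the null
    return observed, (lo, hi)

# hypothetical example: peak times that drift earlier (toward the cue) over 60 trials
rng = np.random.default_rng(1)
peaks = np.clip(2.5 - 0.03 * np.arange(60) + rng.normal(0, 0.3, 60), 0.0, 3.0)
obs, (lo, hi) = shuffle_correlation_test(peaks)
print(f"observed r = {obs:.2f}; 95% of shuffled r in [{lo:.2f}, {hi:.2f}]")
```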

Source data

Extended Data Fig. 3 Test for monotonic shift of dopamine activity and activity center of mass during first-time learning.

a, Schematic of dopamine activity peaks from three randomly sampled trials (see Methods). For a temporal shift (left), the probability of a monotonic relationship is higher than chance, whereas this is not the case for an amplitude shift (right). b, Probability of a monotonic order among three randomly sampled activity peaks. Sampling of 100 sets was repeated 500 times for each animal (see Methods). Each panel shows the result from a single animal (7 animals). Blue, actual data; red, control. The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of a monotonic shift in 6 out of 7 animals, which is significantly more than expected by chance (p = 0.78 × 10⁻²; binomial cumulative function). c, Contour plot of the activity pattern (left) and the time course of the center of mass (0–3 s after cue onset) over training (right) in an example animal. d, Left, time course of the center of mass over training. Right, the average centers of mass during the first third of the learning period were significantly later than the midpoint (1.5 s) in all animals (t = −2.8, p = 0.029; two-sided t-test). n = 7 animals. Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. *p < 0.05, ***p < 0.001.
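
The monotonicity test in panels a and b asks how often the peaks of three randomly chosen trials are ordered consistently with trial order; if monotonic runs in either direction are counted, chance is 1/3. The sketch below, together with a simple center-of-mass measure as used in panels c and d, is a hypothetical re-implementation; the function names, the either-direction convention and the clipping of negative deflections are assumptions rather than the deposited code.

```python
import numpy as np

def monotonic_shift_probability(peak_times, n_samples=100, seed=0):
    """Fraction of random trial triplets whose peak times are ordered
    monotonically with trial number (either direction counted; chance = 1/3)."""
    rng = np.random.default_rng(seed)
    n = len(peak_times)
    hits = 0
    for _ in range(n_samples):
        i, j, k = np.sort(rng.choice(n, size=3, replace=False))
        a, b, c = peak_times[i], peak_times[j], peak_times[k]
        hits += int((a < b < c) or (a > b > c))
    return hits / n_samples

def center_of_mass(trace, t, window=(0.0, 3.0)):
    """Activity-weighted mean time within the 0-3 s delay window; negative
    deflections are clipped to zero here (an assumption of this sketch),
    and some positive signal in the window is assumed."""
    m = (t >= window[0]) & (t <= window[1])
    w = np.clip(trace[m], 0.0, None)
    return float(np.sum(t[m] * w) / np.sum(w))
```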

Source data

Extended Data Fig. 4 Temporal shift of activity during reversal learning.

a, Regression coefficients (±95% confidence intervals) between activity peak timing and trial number for each animal under different experimental conditions (n = 2 animals for GCaMP6f, n = 3 animals for DA2m; mean ± 95% confidence intervals). Red circles, significant slopes (p ≤ 0.05; two-sided F-test, no adjustment for multiple comparisons). b, Average dopamine activity (normalized to the free water response) in response to a reward-predicting cue in the first session of reversal from nothing to reward (left and right top; n = 3 animals with the DA sensor). Each line shows the mean population activity over 8 trials across the session (mean ± sem). Right bottom, linear regression of the peak timing of the average activity against trial number during reversal from nothing to reward with the dopamine sensor (n = 3 animals; regression coefficient 31.8 ms per trial, F = 22, p = 1.3 × 10⁻⁴). c, Probability of a monotonic order among three randomly sampled activity peaks in the actual data (blue) and control (red) (7 animals; see Methods). The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of a monotonic shift in 6 out of 7 animals in the nothing-to-reward reversal (p = 0.78 × 10⁻²; binomial cumulative function) and in 6 out of 7 animals in the airpuff-to-reward reversal (p = 0.78 × 10⁻²; binomial cumulative function). ***p < 0.001.
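
The slope statistics reported here (a coefficient in ms per trial with an F statistic and p-value) follow from an ordinary linear regression of peak timing on trial number; with a single predictor, the F statistic equals the squared t statistic of the slope. The sketch below is a generic illustration with invented data, not the deposited analysis code.

```python
import numpy as np
from scipy import stats

def peak_timing_slope(peak_times_s, trials=None):
    """Slope (ms per trial) of peak timing vs. trial number, with the
    significance of the slope; F = t**2 for a single-predictor regression."""
    if trials is None:
        trials = np.arange(len(peak_times_s))
    res = stats.linregress(trials, np.asarray(peak_times_s) * 1000.0)  # convert s to ms
    t_stat = res.slope / res.stderr
    return res.slope, t_stat**2, res.pvalue   # slope (ms/trial), F, p

# hypothetical usage with invented peak times (in seconds)
rng = np.random.default_rng(2)
peaks = 2.5 - 0.03 * np.arange(60) + rng.normal(0, 0.3, 60)
slope, F, p = peak_timing_slope(peaks)
print(f"slope = {slope:.1f} ms/trial, F = {F:.1f}, p = {p:.2g}")
```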

Source data

Extended Data Fig. 5 Center of mass and time-course of cue responses during reversal learning.

a, Time course of the center of mass of activity (0–3 s after cue onset) over training, and the average center of mass during the first third of the learning period in all animals. The average centers of mass were significantly later than the midpoint (1.5 s) (left, nothing to reward, t = −4.8, p = 0.28 × 10⁻²; right, airpuff to reward, t = −5.2, p = 0.18 × 10⁻²; two-sided t-test). b, Responses to a reward-predicting odor in all animals (mean ± sem). Early: 0–1 s from odor onset (green); late: 2–3 s from odor onset (magenta). Middle, difference between early and late odor responses (grey, each animal; orange, mean ± sem). Dopamine activity in the late phase was significantly higher than in the early phase during the first third of the learning phase (t = 3.8, p = 0.85 × 10⁻²; two-sided t-test). Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. **p < 0.01.

Source data

Extended Data Fig. 6 Simultaneous fiber fluorometry recording of dopamine neuron activity in the VTA and VS.

a, GCaMP7f was expressed in VTA dopamine neurons, and optical fibers were targeted to both the VTA and the VS. b, GCaMP signal of dopamine axons in the VS of an example animal (left, mean ± sem, and right top), and the activity peak for each trial (right bottom, grey circles). Activity peaks were fitted by linear regression against trial number; the fitted line is shown in red. c, GCaMP activity of dopamine neurons in the VTA of an example animal, recorded simultaneously with (b) (left, mean ± sem, and right top). Activity peaks (right bottom, grey circles) were fitted by linear regression against trial number; the fitted line is shown in red. d, Comparison of Pearson's correlation coefficients between activity peak timing and trial number for VS recordings (p = 8.6 × 10⁻⁶, two-sided t-test) and VTA recordings (p = 1.2 × 10⁻², two-sided t-test) (p = 5.2 × 10⁻³, two-sided t-test after Fisher's Z-transformation). Filled circles, reversal from nothing to reward; open circles, reversal from airpuff to reward. Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). n = 8 sessions (airpuff-to-reward and nothing-to-reward sessions from 4 animals). Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. *p < 0.05, **p < 0.01, ***p < 0.001.
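
The comparison of correlation coefficients in panel d relies on Fisher's Z-transformation (arctanh) to make session-wise correlation values approximately normal before a two-sided t-test. The sketch below is a generic illustration only; the paired form of the test, the per-session correlation values and the variable names are assumptions rather than the deposited code.

```python
import numpy as np
from scipy import stats

def compare_session_correlations(r_vs, r_vta):
    """Two-sided paired t-test on session-wise correlation coefficients
    after Fisher's Z-transformation (variance-stabilizing arctanh)."""
    z_vs = np.arctanh(np.asarray(r_vs))
    z_vta = np.arctanh(np.asarray(r_vta))
    return stats.ttest_rel(z_vs, z_vta)

# hypothetical per-session correlations (8 sessions, invented values)
r_vs  = [-0.62, -0.55, -0.70, -0.48, -0.66, -0.58, -0.52, -0.60]
r_vta = [-0.30, -0.10, -0.25, -0.05, -0.35, -0.20, -0.15, -0.28]
print(compare_session_correlations(r_vs, r_vta))
```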

Source data

Extended Data Fig. 7 Temporal shift of dopamine inhibitory activity in reversal learning.

a, Lick counts during the delay period (0–3 s after odor onset) in reversal training from reward to airpuff (n = 7 animals). Mean ± sem for each trial. b,b', GCaMP signals from example animals during reversal. Left, session mean ± sem. The white horizontal lines (top) show session boundaries. c,c', Dopamine activity troughs (grey circles) and linear regression against trial number ((c) regression coefficient 7.6 ms per trial, p = 5.9 × 10⁻¹³; (c') regression coefficient 7.9 ms per trial, p = 3.8 × 10⁻³). d, Responses to the airpuff-predicting odor in an example animal (left) and in all animals (right, mean ± sem). Early: 0.75–1.75 s from odor onset (green); the first 0.75 s was excluded to minimize contamination by the remaining positive cue response. Late: 2–3 s from odor onset (magenta). e, Difference between early and late odor responses (grey, each animal; orange, mean ± sem). Dopamine activity in the late phase was significantly higher than in the early phase during the first 2–20 trials (t = −3.0, p = 2.3 × 10⁻³; two-sided t-test). f, Regression coefficients (±95% confidence intervals) between activity peak timing and trial number for each animal in different experimental conditions (mean ± 95% confidence intervals). Red circles, significant slopes (p ≤ 0.05; F-test, no adjustment for multiple comparisons). n = 7 animals for each condition. Data from one animal in the reversal from reward to nothing were removed because of an insufficient number of trials with detected troughs. *p < 0.05.

Source data

Extended Data Fig. 8 Comparison of dopamine axon GCaMP signal, control fluorescence signal, and licking.

a, GCaMP signals (top, green), tdTomato signals (middle, red) and lick counts (bottom, blue) recorded simultaneously in the first reversal session from airpuff to reward (mean ± sem). b, Percentage of animals showing anticipatory licking during delay periods versus trial number. Regression coefficient 1.3% per trial, F = 33, p = 1.2 × 10⁻⁶, F-test. c, Average lick counts during delay periods versus trial number. Regression coefficient 0.11 licks per trial, F = 35, p = 7.8 × 10⁻⁷, F-test. d, First-lick timing versus trial number. Regression coefficient −2.3 ms per trial, F = 0.19, p = 0.66, F-test. e, Relation between the first lick and GCaMP signals during the delay period in an example animal. Right, comparison between the timing of the GCaMP peak and the timing of the first lick. f, Linear regression coefficients for the timing of the GCaMP peak, tdTomato peak, lick peak and first lick against trial number (t = 8.9, p = 8.8 × 10⁻⁴ for GCaMP peak; t = −0.058, p = 0.96 for tdTomato peak; t = 0.038, p = 0.97 for lick peak; and t = 0.98, p = 0.38 for first lick; two-sided t-test). Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). g, Latency between GCaMP responses and the first lick (GCaMP peak to first lick, 427 ± 241 ms; GCaMP response onset to first lick, 989 ± 154 ms; mean ± sem). h, Correlation coefficients between the timing of the GCaMP response peak and the lick peak (t = 1.3, p = 0.27, two-sided t-test) and lick onset (first lick, t = 0.53, p = 0.62, two-sided t-test; second lick, t = 1.6, p = 0.19, two-sided t-test). Red circles, significant (p ≤ 0.05, F-test, no adjustment for multiple comparisons). n = 5 animals. Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. **p < 0.01.

Source data

Extended Data Fig. 9 Dopamine cue responses during repeated learning in well-trained animals.

a, Probability of a monotonic order among three randomly sampled activity peaks in the actual data (blue) and control (red) (5 animals). The data showed a significantly higher probability (p < 0.05; two-sided t-test, no adjustment for multiple comparisons) of a monotonic shift in 3 out of 5 animals (p = 0.18; binomial cumulative function). b, Responses to a reward-predicting odor in all animals (center, mean ± sem). Early: 0–1 s from odor onset (green); late: 2–3 s from odor onset (magenta). Middle, difference between early and late odor responses (grey, each animal; orange, mean ± sem). Right, dopamine activity in the late phase was not significantly higher than in the early phase during the first 2–10 trials (t = −1.3, p = 0.25; two-sided t-test). n = 5 animals. c, Left, time course of the center of mass during delay periods (0–3 s after cue onset) over training. Right, the average centers of mass during trials 2–10 in all animals were not significantly later than the midpoint (1.5 s) (nothing to reward, t = 1.6, p = 0.17; two-sided t-test). n = 5 animals. Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points. ***p < 0.001.

Source data

Extended Data Fig. 10 Recording sites and dopamine cue responses in deep 2-photon imaging.

a, Recording sites for each animal are shown in coronal views (Paxinos and Franklin, ref. 61). b, Example coronal section of a recording site and GCaMP expression in the VTA. An asterisk indicates the fiber tip location. The other 4 animals showed similar results, as shown in (a). Scale bar, 1 mm. c, Probability of a monotonic order among three randomly sampled activity peaks in the actual data (blue) and control (red). Neurons with a sufficient number of detected peaks were used (21/36 neurons for nothing to reward; 16/36 neurons for airpuff to reward; see Methods). Sampling was repeated 500 times for each neuron. The pie chart summarizes the number of neurons with a significantly higher probability of a monotonic shift (p < 0.05; two-sided t-test, no adjustment for multiple comparisons; Data > Ctrl; blue), a significantly lower probability of a monotonic shift (p < 0.05; two-sided t-test, no adjustment for multiple comparisons; Data < Ctrl; orange) and no significant difference (n.s.; two-sided t-test, no adjustment for multiple comparisons; grey) compared with control (left, nothing to reward, p = 0.039; right, airpuff to reward, p = 0.038; binomial cumulative function). d,e, Responses to a reward-predicting odor in an example neuron (left column of b) and in all neurons (left column of c, n = 36, mean ± sem). Early: 0–1 s from odor onset (green); late: 2–3 s from odor onset (magenta). Right column of b and middle column of c, difference between early and late odor responses (mean ± sem for c). Right column of c, dopamine activity in the late phase was significantly higher than in the early phase during the first third of the learning phase for nothing to reward (t = 2.5, p = 0.015; two-sided t-test) and airpuff to reward (t = 3.1, p = 0.35 × 10⁻²; two-sided t-test). f, Time course of the center of mass (0–3 s after cue onset) over training and the average centers of mass during the first third of the learning phase in all neurons used for the linear fitting analysis (left, nothing to reward, n = 35, t = −1.0, p = 0.30; middle, airpuff to reward, n = 36, t = −2.0, p = 0.048; right, both types pooled, t = −2.2, p = 0.030; two-sided t-test). Box plot centers show the median, edges the 25th and 75th percentiles, and whiskers the most extreme data points not considered outliers. *p < 0.05, **p < 0.01.

Source data

Supplementary information

Supplementary Information

Supplementary Figs. 1–3.

Reporting Summary

Supplementary Data 1

Temporal difference model code.

Source data

Source Data Fig. 2

Statistical Source Data.

Source Data Fig. 3

Statistical Source Data.

Source Data Fig. 4

Statistical Source Data.

Source Data Fig. 5

Statistical Source Data.

Source Data Extended Data Fig. 2

Statistical Source Data.

Source Data Extended Data Fig. 3

Statistical Source Data.

Source Data Extended Data Fig. 4

Statistical Source Data.

Source Data Extended Data Fig. 5

Statistical Source Data.

Source Data Extended Data Fig. 6

Statistical Source Data.

Source Data Extended Data Fig. 7

Statistical Source Data.

Source Data Extended Data Fig. 8

Statistical Source Data.

Source Data Extended Data Fig. 9

Statistical Source Data.

Source Data Extended Data Fig. 10

Statistical Source Data.


About this article


Cite this article

Amo, R., Matias, S., Yamanaka, A. et al. A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning. Nat Neurosci 25, 1082–1092 (2022). https://doi.org/10.1038/s41593-022-01109-2

