Associative learning is driven by prediction errors. Dopamine transients correlate with these errors, which current interpretations limit to endowing cues with a scalar quantity reflecting the value of future rewards. We tested whether dopamine might act more broadly to support learning of an associative model of the environment. Using sensory preconditioning, we show that prediction errors underlying stimulus–stimulus learning can be blocked behaviorally and reinstated by optogenetically activating dopamine neurons. We further show that suppressing the firing of these neurons across the transition prevents normal stimulus–stimulus learning. These results establish that the acquisition of model-based information about transitions between nonrewarding events is also driven by prediction errors and that, contrary to existing canon, dopamine transients are both sufficient and necessary to support this type of learning. Our findings open new possibilities for how these biological signals might support associative learning in the mammalian brain in these and other contexts.
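The prediction-error account at the heart of this work can be sketched as a simple error-driven update rule. The following toy implementation is illustrative only (the function name, learning rate, and outcome values are our own, not from the study); it shows how an association strengthens in proportion to the discrepancy between expected and actual outcomes, the quantity that dopamine transients are thought to track:

```python
# Minimal sketch of prediction-error-driven learning
# (Rescorla-Wagner-style update; all names and parameters are illustrative).

def update(value, outcome, alpha=0.1):
    """One learning step: adjust the value in proportion to the prediction error."""
    prediction_error = outcome - value   # delta = actual - expected
    return value + alpha * prediction_error

# A cue repeatedly paired with an outcome of magnitude 1.0:
v = 0.0
for _ in range(100):
    v = update(v, outcome=1.0)
# v approaches 1.0, and the prediction error shrinks toward zero
```

In this scheme, learning stops when predictions match outcomes; the study's central claim is that the same error signal also drives learning of stimulus–stimulus (model-based) associations, not just cue values.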
The authors thank K. Deisseroth and the Gene Therapy Center at the University of North Carolina at Chapel Hill for providing viral reagents and G. Stuber for technical advice on their use. We also thank B. Harvey and the NIDA Optogenetic and Transgenic Core, M. Morales and the NIDA Histology Core for their assistance, and P. Dayan and N. Daw for their comments. This work was supported by R01-MH098861 (to Y.N.) and by the Intramural Research Program at NIDA ZIA-DA000587 (to G.S.). The opinions expressed in this article are the authors' own and do not reflect the view of the NIH/DHHS.
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1
Plots show the number of magazine entries occurring during all phases of the blocking of sensory preconditioning task with wild-type rats: preconditioning (A), conditioning (B) and the probe test (C). A two-factor ANOVA (cue × group) revealed a significant difference in responding to cue F relative to cue D (F(1,13) = 5.845, p = 0.031), whereas the same analysis revealed no difference in responding to D and C (F(1,13) = 0.013, p = 0.911). ** indicates significance at p < 0.05.
We have interpreted our basic sensory preconditioning effect in terms of an associative chaining or value-inference mechanism. An alternative account, which has been employed in other recent studies using similar procedures1,2, is that conditioned responding to the preconditioned cue results from mediated learning during the conditioning phase of the experimental procedure3. Briefly, this account argues that, during conditioning, presentations of X also activate a memory representation of any associated preconditioned cue in relatively close temporal contiguity with delivery of the sucrose pellets, allowing that representation to become directly associated with the reward. If this were to occur, then conditioned responding to the preconditioned cue at test might reflect a direct association with sucrose, rather than requiring X to bridge the experiences of this cue and sucrose. While there is significant evidence in the literature for the phenomenon of mediated learning4,5, several features of our behavioral design were chosen to bias strongly against the operation of this mechanism. First, we used forward rather than simultaneous or backward pairings of the preconditioned and conditioned cues. This is important because mediated learning in rodents has been suggested to operate primarily when the constituent elements are presented simultaneously3 or in reverse order (i.e., backward sensory preconditioning)4.
The reason is intuitive: either of these temporal arrangements maximizes the chances that B will evoke a representation of A during the conditioning phase, concurrent with reward delivery, an arrangement with obvious benefits for maximizing the ability of an evoked representation of A to become directly associated with reward. Our design avoids this issue by using forward pairing of the preconditioned and to-be-conditioned cues in the initial phase of training. This treatment is expected to render X relatively ineffective at subsequently conjuring up a memory of any of the preceding cues, making the contribution of mediated learning insubstantial6. Second, the amount of training given in conditioning, with X–reward pairings, was also designed to discourage mediated learning. As noted above, presentation of X in conditioning could produce mediated learning to the extent that it activates a memory representation of a preconditioned cue. However, with repeated presentations of X without the other cues, the ability of X to evoke a representation of those cues will extinguish. Conditioning consisted of 4 days of AM and PM training in which X was presented without other cues; this extensive training should further undermine the likelihood of mediated learning. In conclusion, we believe our specific behavioral parameters largely eliminate any potential contribution of mediated learning to the sensory preconditioning effect in our particular design, and favor the parsimonious interpretation of the effect in terms of an associative chaining or inference mechanism.
We would note that this interpretation is supported by our own prior report that OFC inactivation at the probe test in this exact paradigm abolishes responding to the preconditioned cue while having no effect on responding to the conditioned cue7. Mediated learning is essentially simple conditioning, and OFC manipulations typically have no effect on the expression of previously acquired conditioned responding.
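The associative-chaining account discussed above can be illustrated with a toy two-phase simulation. This is a sketch under our own assumptions (the learning rate, trial counts, and weight variables are illustrative, not fitted to the behavioral data); it shows how responding to a preconditioned cue can emerge by chaining a stimulus–stimulus weight through a cached value, without the preconditioned cue ever being paired with reward:

```python
# Toy sketch of the associative-chaining account of sensory preconditioning
# (structure and parameter values are illustrative).

alpha = 0.2  # hypothetical learning rate

# Phase 1 (preconditioning): forward A -> X pairings build a stimulus-stimulus weight.
w_ax = 0.0
for _ in range(50):
    w_ax += alpha * (1.0 - w_ax)          # A comes to predict X

# Phase 2 (conditioning): X -> reward pairings build a cached value for X.
v_x = 0.0
for _ in range(50):
    v_x += alpha * (1.0 - v_x)            # X comes to predict reward

# Probe test: the value of A is inferred by chaining through X,
# even though A itself was never paired with reward.
v_a_inferred = w_ax * v_x
```

Under the mediated-learning alternative, by contrast, A would acquire value directly during phase 2; the design choices described above (forward pairings, extended X-alone conditioning) are intended to rule that route out.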
1. Kurth-Nelson, Z., Barnes, G., Sejdinovic, D., Dolan, R. & Dayan, P. Temporal structure in associative retrieval. eLife 4 (2015).
2. Wimmer, G.E., Daw, N.D. & Shohamy, D. Generalization of value in reinforcement learning by humans. Eur. J. Neurosci. 35, 1092–1104 (2012).
3. Rescorla, R.A. & Freberg, L. The extinction of within-compound flavor associations. Learn. Motiv. 9, 411–427 (1978).
4. Ward-Robinson, J. & Hall, G. Backward sensory preconditioning. J. Exp. Psychol. Anim. Behav. Process. 22, 395–404 (1996).
5. Holland, P.C. Event representation in Pavlovian conditioning: image and action. Cognition 37, 105–131 (1990).
6. Hall, G. Learning about associatively activated representations: implications for acquired equivalence and perceptual learning. Anim. Learn. Behav. 24, 233–255 (1996).
7. Jones, J.L. et al. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 338, 953–956 (2012).
Supplementary Figure 2 Brief optogenetic inhibition of dopamine neurons reduces the strength of associations between cues when analyzing entries made into the food port.
Plots show the number of entries made into the magazine during cue presentation across all phases of the sensory preconditioning task: preconditioning (A), conditioning (B) and the probe test (C). The top panel shows data from the eYFP control group; the bottom panel shows data from the experimental NpHR group. VTA dopamine neurons were inhibited by light delivery (yellow symbol) in the 500 ms before the offset of A, carried through the first 2 s of X. Error bars = SEM. Because the second experiment involved a much higher amount of reward than experiment 1 (approximately double), the nature of the conditioned response changed: rather than checking briefly many times for reward, the rats were more certain that reward was coming and therefore made fewer entries and spent more time in the food cup. As a result, in the main manuscript we plot conditioned responding as the amount of time spent in the food cup rather than the number of entries. Both measures, of course, reflect a prediction that food is coming; the shift in the form of the response is expected given the differences in reward between the two designs, and differences between these measures have been reported previously8–12. Importantly, this figure shows that we see the same overall pattern and direction of effects whether we analyze the data as time spent in, or number of entries made into, the food port. To confirm this statistically, we conducted a multivariate analysis of variance (MANOVA) that included both measures as dependent variables in a single analysis and assessed their significance as we have done previously. This multivariate analysis revealed a significant interaction between cue and group across both measures (F(2,38) = 3.5, p = 0.04), which was due to a significant difference between A and B in the NpHR group (F(2,38) = 5.0, p = 0.01) but not in the control eYFP group (F(2,38) = 0.742, p = 0.483).
Thus, we obtained the same results when including both the number and percent measures as dependent variables in the analyses. We also conducted a linear regression analysis showing that percent responding to either cue significantly predicted the number of entries made toward that cue in the same animal (F(1,80) = 48.10, p < 0.001). The correlation was 0.65, so percent responding accounted for ~40% of the total variability in the number of responses made toward the cues. Further, when we normalized the numbers of responses according to the coefficients obtained in the linear regression to equate them with the percent data, including response measure as a factor in our repeated-measures ANOVA did not produce any interaction between this factor and cue, group, or our critical cue × group interaction (data not shown). Thus, the two response measures were significantly correlated.
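The relationship reported above between the correlation coefficient and variance explained (r = 0.65 implying ~40% of variance) can be checked with a small sketch. The data here are synthetic and the slope and noise parameters are our own illustrative choices; the real analysis used the behavioral measurements themselves:

```python
# Sketch of the regression check relating the two response measures:
# percent time in the food cup vs. number of entries.
# Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
percent = rng.uniform(0, 100, size=82)                # 82 cue observations (cf. F(1,80))
entries = 0.3 * percent + rng.normal(0, 8, size=82)   # correlated, noisy count measure

r = np.corrcoef(percent, entries)[0, 1]               # Pearson correlation
variance_explained = r ** 2                           # r = 0.65 implies ~42% explained
```

With any reasonably strong linear relationship, squaring the correlation gives the proportion of variance in one measure predictable from the other, which is the logic behind the ~40% figure in the text.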
8. Holland, P.C. & Gallagher, M. Effects of amygdala central nucleus lesions on blocking and unblocking. Behav. Neurosci. 107, 235–245 (1993).
9. Holland, P.C. & Kenmuir, C. Variations in unconditioned stimulus processing in unblocking. J. Exp. Psychol. Anim. Behav. Process. 31, 155–171 (2005).
10. Sharpe, M.J. & Killcross, S. The prelimbic cortex contributes to the down-regulation of attention toward redundant cues. Cereb. Cortex 24, 1066–1074 (2014).
11. McDannald, M.A., Lucantonio, F., Burke, K.A., Niv, Y. & Schoenbaum, G. Ventral striatum and orbitofrontal cortex are both required for model-based, but not model-free, reinforcement learning. J. Neurosci. 31, 2700–2705 (2011).
12. Burke, K.A., Franz, T.M., Miller, D.N. & Schoenbaum, G. The role of the orbitofrontal cortex in the pursuit of happiness and more specific rewards. Nature 454, 340–344 (2008).
Supplementary Figure 3 The difference between responding to cue A and cue B in the NpHR group in Experiment 2 is not caused by frequent responses to cue A.
A. Plots show percent time spent in the magazine during the final probe test in Experiment 2. Consecutively removing the highest responders to cue A in the NpHR group did not reduce the magnitude of the difference between cues A and B. While the NpHR group showed lower levels of responding to cue B, it also exhibited elevated levels of responding to cue A. This likely reflects the distribution of learning (and responding) across available cues in a within-subject design: if learning (and responding) about one cue is compromised, it is sometimes elevated toward the other available cues. In support of this suggestion, the critical difference in learning about cues A and B is not driven by heightened responding to cue A. This is illustrated in this figure, which shows that consecutively removing the highest responders in the NpHR group does not affect the magnitude of the difference; as responding to A goes down, responding to B also decreases. Accordingly, a split-mean analysis including high and low responding to A as a factor in an ANOVA on data from the NpHR group in the probe test revealed a main effect of cue (F(1,15) = 10.6, p = 0.006) but no interaction with the level of responding (F(1,15) = 1.9, p = 0.189). Thus, high responding to A is not responsible for the difference between cues A and B in Experiment 2 in the NpHR rats.
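The logic of the highest-responder removal check can be sketched with synthetic data. All values below are illustrative (group size, means and noise levels are our own assumptions); the point is that when the B measure tracks the A measure with a roughly constant offset, dropping the top-A animals leaves the A–B difference largely unchanged:

```python
# Sketch of the "remove highest responders" check (synthetic data for illustration).
import numpy as np

rng = np.random.default_rng(1)
resp_a = rng.normal(60, 10, size=16)                 # NpHR group, cue A (percent time)
resp_b = resp_a - 20 + rng.normal(0, 5, size=16)     # cue B tracks A, offset downward

order = np.argsort(resp_a)[::-1]                     # animals sorted by responding to A
diffs = []
for n_removed in range(5):
    keep = order[n_removed:]
    diffs.append(resp_a[keep].mean() - resp_b[keep].mean())
# if the effect is not carried by a few high-A animals,
# the A-B difference persists as the top responders are removed
```

If, instead, the difference were produced by a handful of extreme A responders, the successive differences in `diffs` would shrink rapidly toward zero.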
Supplementary Figure 4 Stimulation or inhibition of VTA dopamine neurons during preconditioning does not cause rats to enter or avoid the magazine.
Left: panels show data from the preconditioning phase of our blocking of sensory preconditioning procedure, in which we stimulated dopamine neurons in our ChR2 group (left, bottom) at the beginning of X when preceded by AC trials. Rates of responding are represented as mean magazine entries (±SEM). Stimulation of dopamine neurons did not alter rates of responding in the magazine: a repeated-measures ANOVA revealed no cue × group interaction (F(4,140) = 0.180, p = 0.948), no cue × session interaction (F(1,35) = 1.854, p = 0.182), and no three-way interaction between these terms (F(4,14) = 0.887, p = 0.474). Right: panels show data from preconditioning during our basic sensory preconditioning procedure, in which we inhibited VTA dopamine neurons in our NpHR group at the transition between B and Y. Note again that inhibition of dopamine neurons did not change the amount of responding in the magazine: a repeated-measures ANOVA revealed no cue × group interaction (F(3,117) = 0.425, p = 0.736), no cue × session interaction (F(1,39) = 0.292, p = 0.831), and no three-way interaction between these terms (F(3,117) = 0.591, p = 0.622).
Sharpe, M., Chang, C., Liu, M. et al. Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nat Neurosci 20, 735–742 (2017). https://doi.org/10.1038/nn.4538