Learning to predict rewards based on environmental cues is essential for survival. The orbitofrontal cortex (OFC) contributes to such learning by conveying reward-related information to brain areas such as the ventral tegmental area (VTA). Despite this, how cue–reward memory representations form in individual OFC neurons and are modified based on new information is unknown. To address this, using in vivo two-photon calcium imaging in mice, we tracked the response evolution of thousands of OFC output neurons, including those projecting to VTA, through multiple days and stages of cue–reward learning. Collectively, we show that OFC contains several functional clusters of neurons distinctly encoding cue–reward memory representations, with only select responses routed downstream to VTA. Unexpectedly, these representations were stably maintained by the same neurons even after extinction of the cue–reward pairing, and supported behavioral learning and memory. Thus, OFC neuronal activity represents a long-term cue–reward associative memory to support behavioral adaptation.
The data that support the findings of this study are available from the corresponding author on request.
All of the behavioral data were collected using custom MATLAB and Arduino scripts written by V.M.K.N. These are available on request from the corresponding author. All of the analyses were done in Python using custom code written by V.M.K.N. This code will be uploaded to the Stuber laboratory GitHub page (https://github.com/stuberlab) and/or will be available on request from the corresponding author.
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
Stalnaker, T. A., Cooch, N. K. & Schoenbaum, G. What the orbitofrontal cortex does not do. Nat. Neurosci. 18, 620–627 (2015).
Wallis, J. D. Cross-species studies of orbitofrontal cortex and value-based decision-making. Nat. Neurosci. 15, 13–19 (2011).
Izquierdo, A. Functional heterogeneity within rat orbitofrontal cortex in reward learning and decision making. J. Neurosci. 37, 10529–10540 (2017).
Rudebeck, P. H. & Murray, E. A. The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes. Neuron 84, 1143–1156 (2014).
Izquierdo, A., Suda, R. K. & Murray, E. A. Bilateral orbital prefrontal cortex lesions in rhesus monkeys disrupt choices guided by both reward value and reward contingency. J. Neurosci. 24, 7540–7548 (2004).
Schoenbaum, G., Setlow, B., Nugent, S. L., Saddoris, M. P. & Gallagher, M. Lesions of orbitofrontal cortex and basolateral amygdala complex disrupt acquisition of odor-guided discriminations and reversals. Learn. Mem. 10, 129–140 (2003).
Noonan, M. P. et al. Separate value comparison and learning mechanisms in macaque medial and lateral orbitofrontal cortex. Proc. Natl Acad. Sci. USA 107, 20547–20552 (2010).
Walton, M. E., Behrens, T. E. J., Buckley, M. J., Rudebeck, P. H. & Rushworth, M. F. S. Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning. Neuron 65, 927–939 (2010).
Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
Rich, E. L. & Wallis, J. D. Decoding subjective decisions from orbitofrontal cortex. Nat. Neurosci. 19, 973–980 (2016).
Schuck, N. W., Wilson, R. C. & Niv, Y. A state representation for reinforcement learning and decision-making in the orbitofrontal cortex. in Goal-Directed Decision Making (eds. Morris, R., Bornstein, A. & Shenhav, A.) 259–278 (Academic Press, 2018).
Bradfield, L. A., Dezfouli, A., van Holstein, M., Chieng, B. & Balleine, B. W. Medial orbitofrontal cortex mediates outcome retrieval in partially observable task situations. Neuron 88, 1268–1280 (2015).
Kepecs, A., Uchida, N., Zariwala, H. A. & Mainen, Z. F. Neural correlates, computation and behavioural impact of decision confidence. Nature 455, 227–231 (2008).
Hirokawa, J., Vaughan, A. & Kepecs, A. Categorical representations of decision-variables in orbitofrontal cortex. Preprint at bioRxiv https://doi.org/10.1101/135707 (2017).
Moorman, D. E. & Aston-Jones, G. Orbitofrontal cortical neurons encode expectation-driven initiation of reward-seeking. J. Neurosci. 34, 10234–10246 (2014).
Lichtenberg, N. T. et al. Basolateral amygdala to orbitofrontal cortex projections enable cue-triggered reward expectations. J. Neurosci. 37, 8374–8384 (2017).
Lucantonio, F. et al. Orbitofrontal activation restores insight lost after cocaine use. Nat. Neurosci. 17, 1092–1099 (2014).
Takahashi, Y. K. et al. Neural estimates of imagined outcomes in the orbitofrontal cortex drive behavior and learning. Neuron 80, 507–518 (2013).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Poort, J. et al. Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron 86, 1478–1490 (2015).
Otis, J. M. et al. Prefrontal cortex output circuits guide reward seeking through divergent cue encoding. Nature 543, 103–107 (2017).
Hoover, W. B. & Vertes, R. P. Projections of the medial orbital and ventral orbital cortex in the rat. J. Comp. Neurol. 519, 3766–3801 (2011).
Chen, T.-W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).
Gutierrez, R., Carmena, J. M., Nicolelis, M. A. L. & Simon, S. A. Orbitofrontal ensemble activity monitors licking and distinguishes among natural rewards. J. Neurophysiol. 95, 119–133 (2006).
Schoenbaum, G., Setlow, B., Saddoris, M. P. & Gallagher, M. Encoding predicted outcome and acquired value in orbitofrontal cortex during cue sampling depends upon input from basolateral amygdala. Neuron 39, 855–867 (2003).
Lopatina, N. et al. Ensembles in medial and lateral orbitofrontal cortex construct cognitive maps emphasizing different features of the behavioral landscape. Behav. Neurosci. 131, 201–212 (2017).
Schoenbaum, G., Roesch, M. R., Stalnaker, T. A. & Takahashi, Y. K. A new perspective on the role of the orbitofrontal cortex in adaptive behaviour. Nat. Rev. Neurosci. 10, 885–892 (2009).
Morrison, S. E., Saez, A., Lau, B. & Salzman, C. D. Different time courses for learning-related changes in amygdala and orbitofrontal cortex. Neuron 71, 1127–1140 (2011).
Friedrich, J., Zhou, P. & Paninski, L. Fast online deconvolution of calcium imaging data. PLoS Comput. Biol. 13, e1005423 (2017).
Takahashi, Y. K. et al. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nat. Neurosci. 14, 1590–1597 (2011).
Takahashi, Y. K., Stalnaker, T. A., Roesch, M. R. & Schoenbaum, G. Effects of inference on dopaminergic prediction errors depend on orbitofrontal processing. Behav. Neurosci. 131, 127–134 (2017).
Jo, Y. S. & Mizumori, S. J. Y. Prefrontal regulation of neuronal activity in the ventral tegmental area. Cereb. Cortex 26, 4057–4068 (2016).
Takahashi, Y. K. et al. The orbitofrontal cortex and ventral tegmental area are necessary for learning from unexpected outcomes. Neuron 62, 269–280 (2009).
Delamater, A. R. Outcome-selective effects of intertrial reinforcement in a Pavlovian appetitive conditioning paradigm with rats. Anim. Learn. Behav. 23, 31–39 (1995).
Rescorla, R. A. Pavlovian conditioning and its proper control procedures. Psychol. Rev. 74, 71–80 (1967).
Bouton, M. E. Context and behavioral processes in extinction. Learn. Mem. 11, 485–494 (2004).
Pan, W.-X., Brown, J. & Dudman, J. T. Neural signals of extinction in the inhibitory microcircuit of the ventral midbrain. Nat. Neurosci. 16, 71–78 (2013).
Milad, M. R. & Quirk, G. J. Neurons in medial prefrontal cortex signal memory for fear extinction. Nature 420, 70–74 (2002).
Gallagher, M., McMahan, R. W. & Schoenbaum, G. Orbitofrontal cortex and representation of incentive value in associative learning. J. Neurosci. 19, 6610–6614 (1999).
Ostlund, S. B. & Balleine, B. W. Orbitofrontal cortex mediates outcome encoding in Pavlovian but not instrumental conditioning. J. Neurosci. 27, 4819–4825 (2007).
Guo, J.-Z. et al. Cortex commands the performance of skilled movement. eLife 4, e10774 (2015).
Vartak, D., Jeurissen, D., Self, M. W. & Roelfsema, P. R. The influence of attention and reward on the learning of stimulus-response associations. Sci. Rep. 7, 9036 (2017).
Nguyen, D. P. & Lin, S.-C. A frontal cortex event-related potential driven by the basal forebrain. eLife 3, e02148 (2014).
Jennings, J. H. et al. Interacting neural ensembles in orbitofrontal cortex for social and feeding behaviour. Nature 565, 645–649 (2019).
Driscoll, L. N., Pettit, N. L., Minderer, M., Chettih, S. N. & Harvey, C. D. Dynamic reorganization of neuronal activity patterns in parietal cortex. Cell 170, 986–999.e16 (2017).
Namboodiri, V. M. K., Mihalas, S., Marton, T. M. & Hussain Shuler, M. G. A general theory of intertemporal decision-making and the perception of time. Front. Behav. Neurosci. 8, 61 (2014).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Keiflin, R., Reese, R. M., Woods, C. A. & Janak, P. H. The orbitofrontal cortex as part of a hierarchical neural system mediating choice between two good options. J. Neurosci. 33, 15989–15998 (2013).
Grewe, B. F. et al. Neural ensemble dynamics underlying a long-term associative memory. Nature 543, 670–675 (2017).
Resendez, S. L. et al. Visualization of cortical, subcortical and deep brain neural circuit dynamics during naturalistic mammalian behavior with head-mounted microscopes and chronically implanted lenses. Nat. Protoc. 11, 566–597 (2016).
Kaifosh, P., Zaremba, J. D., Danielson, N. B. & Losonczy, A. SIMA: Python software for analysis of dynamic fluorescence imaging data. Front. Neuroinform. 8, 80 (2014).
Pachitariu, M. et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. Preprint at bioRxiv https://doi.org/10.1101/061507 (2016).
Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Springer, 2002).
Fisher, R. A. On the probable error of a coefficient of correlation deduced from a small sample. Metron 1, 3–32 (1921).
Steiger, J. H. Tests for comparing elements of a correlation matrix. Psychol. Bull. 87, 245 (1980).
Sparta, D. R. et al. Construction of implantable optical fibers for long-term optogenetic manipulation of neural circuits. Nat. Protoc. 7, 12–23 (2012).
We thank S. Smith, H. Kato, J. Stirman and M. Andermann for helpful discussions. This study was funded by grants from the National Institutes of Health (NIDA: grant no. F32-DA041184, J.M.O.; grant no. R01-DA032750, G.D.S.; grant no. R01-DA038168, G.D.S.; NIMH: grant no. F32-MH113327, J.R.R.), the Brain and Behavior Research Foundation (NARSAD Independent Investigator Award to G.D.S., NARSAD Young Investigator Award to V.M.K.N. and J.M.O.), the Yang Family Biomedical Scholars Award (G.D.S.), the Foundation of Hope (G.D.S.), the UNC Neuroscience Center (Helen Lyng White Fellowship, V.M.K.N.), the UNC Neuroscience Center Microscopy Core (grant no. P30 NS045892) and the UNC Department of Psychiatry (G.D.S.). We also thank members of the Stuber laboratory, especially L. Eckman, O. Kosyk, S. Resendez, C. Zhu, A. Chen and C. Cook, for their assistance. We thank K. Deisseroth (Stanford University), the GENIE project at Janelia Research Campus and E. Kremer (Institut de Génétique Moléculaire de Montpellier) for viral constructs.
The authors declare no competing interests.
Journal peer review information: Nature Neuroscience thanks Kay Tye and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
a. Schematic of the experiment in which patch-clamp electrophysiology and imaging of GCaMP6S-containing OFC-CaMKII neurons were conducted. This experiment was performed to measure fluorescence changes in response to known firing patterns. b. The normalized measured fluorescence signal is plotted in black and the denoised signal after deconvolution is plotted in red. Inferred spikes following deconvolution (Methods) are plotted in the cyan trace underneath the fluorescence signals, with the true moments of spiking indicated by the black tick marks below the cyan line. Each individual set of graphs shows results of recordings from one specific pattern of action potentials for one neuron. In all, 4 recordings were made from each of 7 neurons, resulting in the 28 cases shown (n = 2 mice). The 4 recordings per neuron consisted of bursts of 1, 2, 4 or 8 action potentials (20 Hz), with each burst separated by 100 ms, 500 ms, 1000 ms or 5000 ms. Overall, spike inference is fairly accurate. c. The same curves as above are shown for recordings during suppression from a baseline firing rate (8 Hz), with spiking of each neuron paused briefly for 3 s and one brief (1 s) burst of spiking to 16 Hz. The 7 cases correspond to 7 individual neurons recorded across 2 mice. In this case, the deconvolution algorithm overfits the fluorescence signal, so the raw fluorescence signal is no longer visible because it is perfectly overlaid by the denoised signal. The inferred spikes no longer reflect the magnitude of suppression, suggesting that the deconvolution algorithm would underrepresent the magnitude of suppression in firing from rates around or above 8 Hz.
An example behavioral session is shown (same as the one in which the neuron shown in Fig. 1h was recorded) from which individual trials of CS+ and CS- presentation were sorted based on the delay to first lick after cue presentation (left grayscale graph). Neuronal responses are plotted for 11 example neurons with the order of trials maintained based on the delay to first lick (also true for the neuron in Fig. 1h). A total of 20 s around the cue is plotted, including a 3 s baseline period prior to trial start. Any overlap between trials is removed, resulting in the trailing white space on some trials. Comparison of neuronal responses to the licking behavior shows that neuronal responses are not merely a consequence of licking behavior. Further, it can be seen that there is considerable trial-to-trial variability in neuronal responses.
a. Schematic of the clustering analysis (Methods). b. Scree plot of the percentage explained variance per principal component, showing the number of principal components retained (Methods). c. Individual retained principal components, showing response vectors to both CS+ and CS-. d. Clustering stability analysis by subsampling different fractions of trials to recalculate the PSTHs used for clustering. Adjusted Rand Index provides an index of stability for the clustering, and also reflects trial-to-trial variability in responses. Even a 100% subsampling shows instability due to the stochastic nature of spectral clustering. Measure of center is the mean and error bars represent standard error of the mean. n = 20 randomization repeats were run in each case. e. Locations of individual cells for each cluster. Locations of all cells are shown in black behind the locations of cells within each cluster. The dorsal-ventral axis was split in half into a “dorsal plane” and a “ventral plane” for display purposes. The shift in mean location for each cluster with respect to the mean location of all cells is shown in Fig. 1l.
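As an illustration of the stability index in panel d, the following is a minimal sketch of how an Adjusted Rand Index compares two cluster assignments of the same neurons. It is not the authors' code: the function name and the toy label lists are hypothetical, and the clustering step itself (spectral clustering on subsampled PSTHs) is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Count item pairs that share a contingency cell, a row, or a column.
    cells = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = sum_a * sum_b / total_pairs  # chance-level agreement
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case, e.g. a single cluster in both labelings.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: clustering on all trials vs. on a subsample of trials.
full_data_labels = [0, 0, 1, 1, 2, 2]
relabelled = [2, 2, 0, 0, 1, 1]  # same partition, different label names
one_moved = [0, 0, 1, 1, 2, 0]   # one neuron assigned to another cluster
print(adjusted_rand_index(full_data_labels, relabelled))
print(adjusted_rand_index(full_data_labels, one_moved))
```

The index ignores label names, so it is suited to comparing repeated runs of a stochastic clustering algorithm, where cluster numbering is arbitrary from run to run.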
Neurons belonging to each pair of clusters are shown in a 2D t-distributed Stochastic Neighbor Embedding (Maaten, L. van der & Hinton, G. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008)) of the eight-dimensional PCA subspace (Supplementary Fig 3c) to show that each cluster is separable from other clusters. Perplexity was set to 100 for this visualization (Scikit-learn tsne function). Sample sizes for each cluster are the same as in Fig. 1j, l.
a. “Inferred spikes” or deconvolved signal estimated from individual neurons’ fluorescence traces. The fluorescence signals from neurons 1–11 are shown in Supplementary Fig 2 and neuron 12 is shown in Fig. 1h. b. The GLM t scores for each of these neurons for 5 behavioral variables. c. The cumulative distribution function for the R2 values calculated based on PSTH fits are shown. The model used here (Methods) was designed to get interpretable estimates of responses to behavioral variables and thus produced lower R2 values than more complete models would.
a. Empirical cumulative distribution functions of GLM t scores for each cluster on Day 1 and Trained for the behavioral variables. b. Mean GLM t scores for each cluster on Day 1 and Trained for different epochs. c. Fractional variance of fluorescence explained per cluster during both the cue onset and trace interval periods. The variance per cluster was first calculated as the net variance (mean of squared deviations) in fluorescence during these periods compared to the mean baseline (3 s before cue) fluorescence. The fractional variance was then calculated as the variance of one cluster divided by the total variance across all clusters. d. (Left) Decoding of behavior in each OFC-CaMKII animal using evolution of neural activity during cue onset, late and trace epochs (Methods), showing that neural activity evolution happens in a manner predictive of behavior. p value shown in figure represents a two-tailed p value for the Pearson’s r. See Supplementary Table 1 for r2 split per animal. (Right) Average weight contributed to behavioral decoding during each temporal epoch, calculated across every neuron within a cluster. e. Same as d for OFC-VTA animals. r represents Pearson’s r. * represents p < 0.05 (see Supplementary Table 1 for all sample sizes, p values and comparisons).
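The fractional variance in panel c can be sketched as follows. This is only an illustration under stated assumptions: it takes one cluster-averaged fluorescence trace per cluster as input (the caption does not specify whether the variance is computed on cluster means or on individual neurons), and the function name, argument names and toy traces are hypothetical.

```python
import numpy as np


def fractional_variance(cluster_traces, times, epoch, baseline=(-3.0, 0.0)):
    """Share of baseline-referenced variance contributed by each cluster.

    cluster_traces: array (n_clusters, n_timepoints), one mean trace per cluster.
    times: array (n_timepoints,), seconds relative to cue onset.
    epoch, baseline: (start, stop) windows in seconds, start inclusive.
    """
    traces = np.asarray(cluster_traces, dtype=float)
    times = np.asarray(times, dtype=float)
    in_base = (times >= baseline[0]) & (times < baseline[1])
    in_epoch = (times >= epoch[0]) & (times < epoch[1])
    base_mean = traces[:, in_base].mean(axis=1, keepdims=True)
    # Net variance: mean of squared deviations from the mean baseline.
    net_var = ((traces[:, in_epoch] - base_mean) ** 2).mean(axis=1)
    return net_var / net_var.sum()


# Toy example: three clusters sampled every 0.5 s from -3 s to 2.5 s.
times = np.arange(-3.0, 3.0, 0.5)
traces = np.vstack([
    np.where(times >= 0, 1.0, 0.0),   # excited by the cue
    np.where(times >= 0, -2.0, 0.0),  # suppressed by the cue
    np.full(times.shape, 0.5),        # unresponsive
])
fractions = fractional_variance(traces, times, epoch=(0.0, 1.0))
print(fractions)
```

Because deviations are squared, excitation and suppression both add to a cluster's share, so a strongly suppressed cluster can account for most of the variance.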
Supplementary Figure 7 Optogenetic fiber placements and optogenetic effect on reward seeking after learning.
a, b. Histology from example animals showing expression of opsin in OFC-CaMKII (a) and OFC-VTA (b) animals. c, d. Fiber placements for OFC-CaMKII (c) and OFC-VTA (d) animals. e. Test of optogenetic effect on expression of learned behavior with one-second inhibition epochs (Methods). No group showed a significant effect of laser on behavioral expression. f. Same as in e but with the inhibition lasting from cue onset until reward delivery or omission. No group showed a significant effect of laser on behavioral expression. Measure of center is the mean and error bars represent standard error of the mean. See Supplementary Table 1 for exact p values, sample sizes and tests.
a. Animals (n=2) were injected with retrogradely transported AAV2 expressing a static fluorophore (AAV2retro-CAG-tdTomato or AAV2retro-hSyn-eYFP) so as to map out locations of cells in vmOFC that project to VTA. A representative image of eYFP expression in vmOFC is shown at the top, with a zoomed in region (marked by orange square) shown on the right. b. The location of each cell was calculated as the distance from the closest boundary of the cortex. For instance, distance from the ventral most surface of cortex was calculated for cells that were ventral, and distance from the medial most surface for medial cells. This was done so as to estimate the cortical laminarity of cells, as the cortex forms a shell all around the brain (excluding olfactory bulb) at this location on the anterior-posterior axis. These results suggest that OFC-VTA cells are generally found in deeper layers and not the superficial layers 1 or 2/3.
a. CS+ trace response evolution of all neurons, including those with sigmoidal evolution, over acquisition. (Top row) The response evolution for every longitudinally-tracked neuron across behavioral acquisition is plotted for the learning-related clusters. Since the number of sessions until acquisition was different for different animals, the x-axis represents normalized trial block. In other words, the total number of trial blocks during acquisition (similar to Fig. 4a) was rescaled to contain the 15 normalized blocks on the x-axis. This rescaling was done purely for visualization and not for analysis. (Bottom row) Same as the top row but for neurons whose response evolution fits a sigmoidal model (Methods, Fig. 4c). b. Same plot as in Fig. 4b for reward responses, showing that even reward responses exhibit considerable evolution during acquisition. c. Same as Fig. 4g for reward responses, showing that cluster 1 shows earlier reward response evolution for both OFC-CaMKII and OFC-VTA neurons. d. Mean lag of all neurons from a cluster recorded within each individual animal for onset, trace and reward responses. The size of each dot is proportional to the number of neurons recorded within that animal from the given cluster. Hence, OFC-VTA dots are smaller due to the overall lower numbers of neurons recorded. Although the OFC-CaMKII and OFC-VTA groups are completely independent, the distribution of means per cluster is quite similar between them. Since the number of neurons per cluster is not a priori expected to be identical across animals due to possible anatomical variation, some animals could in principle contribute more than others to the pooled data. Nevertheless, the independent replication between OFC-CaMKII and OFC-VTA groups of mean lags per animal suggests that the patterns uncovered here are robust.
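The rescaling to 15 normalized trial blocks described in panel a could be done in several ways, and the caption does not say which was used. The sketch below assumes simple linear interpolation onto a common axis. The function name and the example responses are hypothetical.

```python
import numpy as np


def rescale_to_normalized_blocks(responses, n_blocks=15):
    """Resample a per-block response series onto n_blocks evenly spaced points.

    Assumption: linear interpolation, with the first and last trial blocks
    mapped to the first and last normalized blocks.
    """
    responses = np.asarray(responses, dtype=float)
    if responses.size < 2:
        raise ValueError("need at least two trial blocks")
    original_pos = np.linspace(0.0, 1.0, responses.size)
    target_pos = np.linspace(0.0, 1.0, n_blocks)
    return np.interp(target_pos, original_pos, responses)


# Two hypothetical animals that took different numbers of blocks to acquire.
fast_learner = np.linspace(0.0, 1.0, 8)    # 8 trial blocks
slow_learner = np.linspace(0.0, 1.0, 22)   # 22 trial blocks
fast_rescaled = rescale_to_normalized_blocks(fast_learner)
slow_rescaled = rescale_to_normalized_blocks(slow_learner)
print(fast_rescaled.shape, slow_rescaled.shape)
```

After rescaling, neurons from animals with different acquisition lengths share a common x-axis and can be stacked in a single plot.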
a. (left) Mean (and s.e.m.) CS+ trace interval GLM coefficients are plotted for all OFC-CaMKII clusters on the Trained session. This is reproduced from Supplementary Fig 6b. The mean change in GLM coefficients due to the 50% session (middle) and the Background session (right) are shown for both CS+ Trace and CS+ Onset responses. These show that the change due to contingency degradation is generally opposite in sign to the mean coefficient on the Trained session, that is, the response encoding strength generally reduces for both positively-responding and negatively-responding clusters. b. Same as a for OFC-VTA clusters. The variability for OFC-VTA clusters is much larger due to the lower number of projection-defined neurons. However, the magnitude and direction of effects are generally consistent with the effects in the OFC-CaMKII group. Measure of center is the mean and error bars represent standard error of the mean. * represents p < 0.05 (see Supplementary Table 1 for exact p values, sample sizes and tests).
If clusters encode value of expected reward, they should be activated by the expectation of reward (for instance, during the trace interval), should be activated at a level proportional to the expectation of reward (for instance, activation should reduce if probability of reward decreases), and should change on a trial-by-trial basis depending on the instantaneous expected reward. Merely satisfying these criteria may not be sufficient as other alternative explanations (for example movement, general motivation, task engagement etc) may also satisfy these. But if the above criteria are not met, neural encoding would not be consistent with value. If the animals were updating their estimates of cue value experientially instead of learning a model of the context (that is current context has a fixed 50% probability of reward), it would be predicted that their reward seeking behavior on any trial will update based on whether the previous trial was rewarded or unrewarded. a. Reward-seeking behavior updates on a trial-by-trial basis dependent on whether the previous trial was rewarded. This is similar to the plot in Fig. 8b, but for OFC-CaMKII imaging animals. Thus, value of expected reward likely fluctuates on a trial-by-trial basis. b. However, CS+ trace responses of clusters (except for 3 and 8) did not show trial-by-trial updating based on reward history (measured by the t score of the interaction between trial reward history and the CS+ trace interval response, Methods). Nevertheless, the average response per cluster to previous trial reward history was indeed found to be significant for two clusters, viz. 3 and 8. The sign of this response was negative, indicating that these clusters suppressed their responses when the previous trial was rewarded. If this was a value signal, it would be predicted that the CS+ trace responses of these clusters would be negative after acquisition as it seems that these clusters encode higher value with lower activity. 
This is indeed what we found (c, reproduced from Supplementary Fig 6b). However, if these neurons monotonically encoded value with suppression of activity, with zero value corresponding to baseline activity, one would predict that when the probability of reward is reduced to 50%, their responses would be scaled down but still negative. Instead, we found that their responses were actually positive to CS+ trace when reward probability was reduced to 50% (d). Overall, this suggests that the activity of these clusters does not simply reflect a monotonic value representation with zero value corresponding to baseline activity. It is nevertheless possible that these neurons are encoding negative value but with a shifted zero point, that is zero value causes positive response in these neurons and higher value causes suppression from this positive response. If this were true, it would be predicted that their responses to CS- trace during 50% probability would be higher than their responses to CS+ trace since CS+ predicts 50% probability of reward while CS- predicts 0% probability of reward. Instead, we found that the difference between CS+ trace and CS- trace was positive (e), demonstrating that the cue with zero value produced a response much less positive than the cue predicting reward with 50% probability. Thus, overall, the responses of clusters are not consistent with a monotonic value coding within this task. Further, value of the cue also increases during acquisition of the CS+-reward association, just like trial-by-trial increases during 50% probability sessions based on reward history (f). Thus, if neural responses were stably reflective of value, the change in CS+ trace response of a neuron over initial learning (slope across days) and its change in CS+ trace response due to the previous CS+ presentation being rewarded should be correlated. 
This within-neuron correlation is plotted in g, showing that the clusters that change their responses depending upon trial reward history do not show any correlation between change in responses during acquisition and change in responses due to trial reward history. Overall, these data show that within this task, activity of vmOFC clusters does not reflect a general value signal of the kind used in temporal difference learning algorithms. Measure of center is the mean and error bars represent standard error of the mean. * represents p < 0.05 (see Supplementary Table 1 for exact p values, sample sizes and tests).
It has previously been argued using functional neuroimaging in humans (Tobler, P. N., O’Doherty, J. P., Dolan, R. J. & Schultz, W. Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol. 95, 301–310 (2006)) that OFC represents reward prediction error. Based on longitudinal tracking of the same neurons, we showed that the reward response of cluster 1 is positive on Day 1 but becomes negative on the Trained session (Fig. 2d, Supplementary Fig 7), which may be consistent with a reward prediction error signal on top of a negative baseline response after reward. However, if vmOFC clusters, including cluster 1, represented reward prediction errors in this context, we would expect suppression of responses when predicted rewards were omitted (see Fig. 6b for mean PSTH of cluster 1 showing higher reward omission responses compared to reward receipt). Results in the current figure quantify mean reward and omission responses per cluster when the probability of reward was reduced to 50%. GLM explanatory variables were Reward Receipt and Reward Omission (0–3 s after the first lick after reward receipt or omission respectively), and Reward Receipt Late and Reward Omission Late (3–6 s after the first lick after reward receipt or omission respectively). a. Reward Omission and Reward Receipt responses of OFC-CaMKII neurons, showing that omission of rewards does not cause suppression in responses. b. These same responses were measured at a longer latency (3–6 s after delivery or omission), which shows a slight but significant suppression in cluster 2, but none in the other clusters during omission trials. c. The difference in magnitude of responses to omission and receipt of reward, showing that clusters 1 and 3 show significant reduction in responses due to the receipt of reward compared to the omission of rewards. d–f. Same as a–c but for OFC-VTA neurons. 
Thus, even though all clusters showed positive responses to rewards early in learning (unexpected rewards), we did not observe suppression in activity after reward omission, except seconds after omission in cluster 2 (as shown in b). Further, after initial learning, reward responses were not absent, even though the RPE should be zero for a fully predicted reward (Fig. 1). However, since the reward was delayed after the CS+ cue, responses to reward after initial learning may just be due to the uncertainty in timing. If this were the case, one would predict that the reward responses would reduce over initial learning but remain non-zero. Instead, in clusters 2, 5, and 6—the clusters showing large positive reward responses—reward responses increased after initial learning, instead of decreasing (Supplementary Fig 6b). Clusters 1, 3 and 7, on the other hand, get inhibited by reward after learning, instead of showing no response. Overall, no cluster strictly abided by the criteria for RPE encoding. Measure of center is the mean and error bars represent standard error of the mean. * represents p < 0.05 (see Supplementary Table 1 for exact p values, sample sizes and tests).
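The early (0–3 s) and late (3–6 s) explanatory variables described above can be pictured as boxcar regressors aligned to the first lick after reward receipt or omission. The sketch below is only an illustration of that windowing: the frame rate, session length, lick time and function name are hypothetical, and the actual GLM specification is given in the paper's Methods, not here.

```python
import numpy as np


def boxcar(n_frames, frame_rate_hz, event_time_s, start_s, stop_s):
    """Return 1.0 for frames in [event + start, event + stop), else 0.0."""
    frame_times = np.arange(n_frames) / frame_rate_hz
    in_window = (frame_times >= event_time_s + start_s) & (
        frame_times < event_time_s + stop_s
    )
    return in_window.astype(float)


# Hypothetical session: 20 s imaged at 5 Hz, first lick after reward at 4 s.
n_frames, frame_rate_hz, first_lick_s = 100, 5.0, 4.0
reward_receipt = boxcar(n_frames, frame_rate_hz, first_lick_s, 0.0, 3.0)
reward_receipt_late = boxcar(n_frames, frame_rate_hz, first_lick_s, 3.0, 6.0)

# Columns of a design matrix; omission regressors would be built the same
# way from the first lick after an omitted reward.
design = np.column_stack([reward_receipt, reward_receipt_late])
print(design.sum(axis=0))
```

Using half-open windows keeps the early and late regressors from overlapping, so a frame at exactly 3 s after the lick is assigned only to the late window.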
a. PSTHs of longitudinally-tracked neurons (y-axis) within the learning-related clusters on CS+ trials during Day before Extinction, First day of extinction, Last day of extinction and Reinstatement. The vertical lines show the cue onset and reward delivery/omission respectively. Time is on the x-axis, with 3 s between the vertical lines. Each row represents the same neuron across all four sessions, showing that neuronal responses generally remain stable within the ensemble. b. Decoding results for all learning-related clusters. Results from clusters 1 and 5 are reproduced in Fig. 7f. In all cases, there is significant decoding after extinction on the Reinstatement session. c. Behavior of individual animals, showing that one animal is relatively resistant to extinction in both the OFC-CaMKII and the OFC-VTA groups (orange circle). d. In order to rule out apparent stability of encoding due to these outlier animals that did not appropriately learn extinction, we redid the decoding analyses after removing all neurons recorded from these animals. These results show that cluster 2 no longer has significant decoding. Measure of center in all panels is the mean and error bars represent standard error of the mean. * represents p < 0.05 (see Supplementary Table 1 for exact p values, sample sizes and tests).
a. A plot similar to Supplementary Fig 13a is shown for longitudinally-tracked neurons across Day 1, Day before Trained, Trained, 50%, Background and After Background (same as Day before extinction) sessions. Comparison across clusters shows that neurons within a cluster retain the same response profile across sessions after learning, and that this profile differs from the responses of all other clusters. b. Similar decoding analyses as in Fig. 7b done by training the decoder on the Trained session and testing on other sessions. c. An alternative analysis to the decoding stability analysis is to test if there is significant correlation in neuronal responses between different pairs of sessions. The Pearson’s correlation is shown for neuronal responses across all neurons (pooled across clusters) for different session pairs for both OFC-CaMKII and OFC-VTA neurons. These show that the correlation between session pairs becomes high only after learning. To separate within-animal variability from between-animal variability in this correlation, the responses within every animal were whitened (mean-subtracted and divided by standard deviation) prior to pooling between animals. This prevents issues similar to Simpson’s paradox. Measure of center in all panels is the mean and error bars represent standard error of the mean. * represents p < 0.05 (see Supplementary Table 1 for exact p values, sample sizes and tests).
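The within-animal whitening in panel c, and the reason it guards against a Simpson's-paradox-like artifact, can be illustrated with a small sketch. The data below are invented for the example (two hypothetical animals with three neurons each), and the function name is an assumption, not the authors' code.

```python
import numpy as np


def whiten_within_animal(values, animal_ids):
    """Mean-subtract and divide by the standard deviation within each animal."""
    values = np.asarray(values, dtype=float)
    animal_ids = np.asarray(animal_ids)
    whitened = np.empty_like(values)
    for animal in np.unique(animal_ids):
        mask = animal_ids == animal
        # Note: an animal with zero variance would produce NaN here.
        whitened[mask] = (values[mask] - values[mask].mean()) / values[mask].std()
    return whitened


# Invented responses of six neurons on two sessions. Within each animal the
# sessions agree perfectly, but the animals differ in their overall offsets.
animal_ids = np.array(["A", "A", "A", "B", "B", "B"])
session_1 = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
session_2 = np.array([10.0, 11.0, 12.0, 0.0, 1.0, 2.0])

raw_r = np.corrcoef(session_1, session_2)[0, 1]
whitened_r = np.corrcoef(
    whiten_within_animal(session_1, animal_ids),
    whiten_within_animal(session_2, animal_ids),
)[0, 1]
print(raw_r, whitened_r)
```

In this toy case the pooled raw correlation is strongly negative even though every animal shows a perfect positive relationship. Whitening removes the between-animal offsets, so the pooled correlation reflects the within-animal relationship.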
Supplementary Figure 15 Mice do not devalue sucrose in the Background session and OFC encoding results remain if GLM coefficients are used to measure neural activity instead of t scores.
a. Licking PSTH on Trained, 50% and Background sessions. Plot shows the average across all animals (shading is the confidence interval as calculated in Python by the tsplot function in Seaborn). If multiple sessions of one contingency were run for one animal (so as to image different planes), these were averaged prior to averaging between animals. The apparent reduction in licking after 3 s in the 50% session is due to the presence of unrewarded trials. Licking on CS+ trials decreases during the anticipatory period between CS+ onset and reward in both the 50% and Background sessions. This is not due to devaluation of the reward in the Background session, as the animals lick to consume rewards equally vigorously (middle). Licking on CS- trials (right) shows that animals reduce licking with respect to baseline after CS- onset, suggesting that they learned that CS- trials predict a temporary decrease in reward rate. These results are qualitatively consistent with the TIMERR theory of decision-making (ref. 47). b–e. OFC neural encoding results remain similar when GLM coefficients, instead of GLM t scores, are used as the measure of neural response: replication of Fig. 2e (b), Fig. 4c (c), Fig. 5g (d) and Supplementary Fig. 9c (e) using GLM coefficients (Methods).
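The two-stage averaging described for panel a (sessions averaged within an animal before averaging between animals) can be sketched as below. This is an illustrative example with made-up numbers, not the authors' code; the function name `mean_across_animals` and the array layout (one row per session, one column per time bin) are assumptions.

```python
import numpy as np


def mean_across_animals(session_psths, animal_ids):
    """Average sessions within each animal, then average across animals.

    session_psths: array of shape (n_sessions, n_timebins)
    animal_ids:    array of shape (n_sessions,) naming each session's animal
    Returns the grand mean and the per-animal means.
    """
    session_psths = np.asarray(session_psths, dtype=float)
    animal_ids = np.asarray(animal_ids)
    per_animal = np.stack(
        [session_psths[animal_ids == animal].mean(axis=0)
         for animal in np.unique(animal_ids)]
    )
    return per_animal.mean(axis=0), per_animal


# Toy data: animal "A" was imaged in two sessions, animal "B" in one.
psths = [[2.0, 4.0],    # animal A, session 1
         [4.0, 6.0],    # animal A, session 2
         [10.0, 10.0]]  # animal B, session 1
ids = ["A", "A", "B"]

grand_mean, per_animal = mean_across_animals(psths, ids)
naive_mean = np.mean(psths, axis=0)  # would weight animal A twice
print(grand_mean, naive_mean)
```

Averaging within animals first gives each animal equal weight, so an animal imaged in several planes does not dominate the group average.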
OFC-CaMKII recordings: Example video showing calcium activity recorded from OFC-CaMKII cells. This was recorded using a galvanometer scan at approximately 1 Hz. Playback framerate is 20 frames per second (that is ~20× sped up). Same field of view as shown in Fig. 1f.
Longitudinal tracking of OFC-CaMKII neurons: Video showing all tracked ROIs overlaid on the field of view during four sessions of acquisition (same animal as in Supplementary Video 1). Since the contrast of the standard deviation projections is not the same every day, primarily due to variability in neuronal activity across days, some ROI outlines might seem particularly dim or even hard to see on some days. The last few frames of the video show all ROI outlines on a plain white background to avoid this issue. ROI shapes do not cover the entirety of visible neurons so as to prevent contamination from overlapping neurons or dendrites (Methods). Therefore, some ROIs appear at slightly different locations across days since different parts of these neurons were drawn across these days.
OFC-VTA recordings: Example video showing calcium activity recorded from OFC-VTA cells using a resonant scan. Effective frame rate of acquisition was 5 Hz with online averaging. Playback framerate is 50 frames per second (that is ~10× sped up).
Longitudinal tracking of OFC-VTA neurons: Similar to Supplementary Video 2 but for OFC-VTA neurons. In this case, ROI outlines on a plain white background are not included since all ROIs are visible due to the much lower density of fluorescent neurons.
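The playback speed-up factors quoted for the OFC-CaMKII and OFC-VTA recordings follow from dividing the playback frame rate by the acquisition frame rate. A minimal check, using a hypothetical helper named `playback_speedup`:

```python
def playback_speedup(playback_fps, acquisition_hz):
    """Return how many times faster than real time a video is played back."""
    if acquisition_hz <= 0:
        raise ValueError("acquisition rate must be positive")
    return playback_fps / acquisition_hz


# Values taken from the video descriptions above.
camkii_speedup = playback_speedup(playback_fps=20, acquisition_hz=1)  # galvanometer scan
vta_speedup = playback_speedup(playback_fps=50, acquisition_hz=5)     # resonant scan
print(camkii_speedup, vta_speedup)
```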