Adaptive behaviour crucially depends on flexible decision-making, which in mammals relies on the frontal cortex, specifically the orbitofrontal cortex (OFC) [1–9]. How OFC encodes decision variables and instructs sensory areas to guide adaptive behaviour are key open questions. Here we developed a reversal learning task for head-fixed mice, monitored the activity of neurons in the lateral OFC using two-photon calcium imaging and investigated how OFC dynamically interacts with primary somatosensory cortex (S1). Mice learned to discriminate ‘go’ from ‘no-go’ tactile stimuli [10,11] and to adapt their behaviour upon reversal of the stimulus–reward contingency (‘rule switch’). Imaging individual neurons longitudinally across all behavioural phases revealed distinct engagement of S1 and lateral OFC: S1 neural activity reflected initial task learning, whereas lateral OFC neurons responded saliently and transiently to the rule switch. We identified direct long-range projections from lateral OFC to S1 that can feed this activity back to S1 as a value prediction error. This top-down signal updated sensory representations in S1 by functionally remapping responses in a subpopulation of neurons that was sensitive to reward history. Functional remapping crucially depended on top-down feedback, as chemogenetic silencing of lateral OFC neurons disrupted both reversal learning and plasticity in S1. The dynamic interaction of lateral OFC with sensory cortex thus implements computations critical for value prediction that are history dependent and error based, providing plasticity essential for flexible decision-making.
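The abstract describes the top-down signal as a value prediction error. As a generic illustration only (a textbook Rescorla–Wagner delta rule, not the model used in the study), the sketch below shows why such an error is large immediately after a contingency reversal and then fades as values are relearned. The two textures "A" and "B", the learning rate `alpha` and the trial counts are arbitrary assumptions.

```python
def simulate_reversal(n_trials_per_phase=200, alpha=0.1):
    """Toy Rescorla-Wagner learner on a two-texture go/no-go task.

    Illustrative only (not the study's model). Texture "A" is rewarded
    (r = 1) before the rule switch and texture "B" after it; the other
    texture gives r = 0. Trials simply alternate A, B, A, B, ...
    Returns the final values and a list of (phase, texture, error) tuples.
    """
    values = {"A": 0.0, "B": 0.0}
    history = []
    for phase in ("learning", "reversal"):
        rewarded = "A" if phase == "learning" else "B"
        for t in range(n_trials_per_phase):
            texture = "A" if t % 2 == 0 else "B"
            reward = 1.0 if texture == rewarded else 0.0
            delta = reward - values[texture]   # value prediction error
            values[texture] += alpha * delta   # delta-rule update
            history.append((phase, texture, delta))
    return values, history


values, history = simulate_reversal()
reversal_errors = [abs(d) for phase, _, d in history if phase == "reversal"]
print(reversal_errors[:2])          # large errors on the first trials after the switch
print(max(reversal_errors[-10:]))   # small once values have been relearned
```

In this toy model the error is driven purely by the mismatch between expected and received reward, so it peaks whenever the contingency changes and decays with a time constant set by `alpha`.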
The data that support the findings of this study are available upon reasonable request from the corresponding author.
Fettes, P., Schulze, L. & Downar, J. Cortico-striatal-thalamic loop circuits of the orbitofrontal cortex: promising therapeutic targets in psychiatric illness. Front. Syst. Neurosci. 11, 25 (2017).
Miller, E. K. The prefrontal cortex and cognitive control. Nat. Rev. Neurosci. 1, 59–65 (2000).
Fuster, J. M. The prefrontal cortex—an update: time is of the essence. Neuron 30, 319–333 (2001).
Rolls, E. T. The orbitofrontal cortex and reward. Cereb. Cortex 10, 284–294 (2000).
Izquierdo, A. Functional heterogeneity within rat orbitofrontal cortex in reward learning and decision making. J. Neurosci. 37, 10529–10540 (2017).
Rudebeck, P. H. & Murray, E. A. The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes. Neuron 84, 1143–1156 (2014).
Rushworth, M. F. S., Noonan, M. P., Boorman, E. D., Walton, M. E. & Behrens, T. E. Frontal cortex and reward-guided learning and decision-making. Neuron 70, 1054–1069 (2011).
Wallis, J. D. Orbitofrontal cortex and its contribution to decision-making. Annu. Rev. Neurosci. 30, 31–56 (2007).
Carlén, M. What constitutes the prefrontal cortex? Science 358, 478–482 (2017).
Chen, J. L., Carta, S., Soldado-Magraner, J., Schneider, B. L. & Helmchen, F. Behaviour-dependent recruitment of long-range projection neurons in somatosensory cortex. Nature 499, 336–340 (2013).
Chen, J. L. et al. Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108 (2015).
Petersen, C. C. H. Sensorimotor processing in the rodent barrel cortex. Nat. Rev. Neurosci. 20, 533–546 (2019).
Bissonette, G. B., Schoenbaum, G., Roesch, M. R. & Powell, E. M. Interneurons are necessary for coordinated activity during reversal learning in orbitofrontal cortex. Biol. Psychiatry 77, 454–464 (2015).
Jennings, J. H. et al. Interacting neural ensembles in orbitofrontal cortex for social and feeding behaviour. Nature 565, 645–649 (2019).
Pho, G. N., Goard, M. J., Woodson, J., Crawford, B. & Sur, M. Task-dependent representations of stimulus and choice in mouse parietal cortex. Nat. Commun. 9, 2596 (2018).
Ramesh, R. N., Burgess, C. R., Sugden, A. U., Gyetvan, M. & Andermann, M. L. Intermingled ensembles in visual association cortex encode stimulus identity or predicted outcome. Neuron 100, 900–915.e9 (2018).
Voigt, F. F. et al. The mesoSPIM initiative: open-source light-sheet microscopes for imaging cleared tissue. Nat. Methods 16, 1105–1108 (2019).
Bastos, A. M. et al. Canonical microcircuits for predictive coding. Neuron 76, 695–711 (2012).
Schoenbaum, G., Roesch, M. R., Stalnaker, T. A. & Takahashi, Y. K. A new perspective on the role of the orbitofrontal cortex in adaptive behaviour. Nat. Rev. Neurosci. 10, 885–892 (2009).
Schultz, W. & Dickinson, A. Neuronal coding of prediction errors. Annu. Rev. Neurosci. 23, 473–500 (2000).
Sul, J. H., Kim, H., Huh, N., Lee, D. & Jung, M. W. Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making. Neuron 66, 449–460 (2010).
Chudasama, Y. & Robbins, T. W. Dissociable contributions of the orbitofrontal and infralimbic cortex to Pavlovian autoshaping and discrimination reversal learning: further evidence for the functional heterogeneity of the rodent frontal cortex. J. Neurosci. 23, 8771–8780 (2003).
Groman, S. M. et al. Orbitofrontal circuits control multiple reinforcement-learning processes. Neuron 103, 734–746.e3 (2019).
Hattori, R., Danskin, B., Babic, Z., Mlynaryk, N. & Komiyama, T. Area-specificity and plasticity of history-dependent value coding during learning. Cell 177, 1858–1872 (2019).
Saez, R. A., Saez, A., Paton, J. J., Lau, B. & Salzman, C. D. Distinct roles for the amygdala and orbitofrontal cortex in representing the relative amount of expected reward. Neuron 95, 70–77.e3 (2017).
Rikhye, R. V., Gilra, A. & Halassa, M. M. Thalamic regulation of switching between cortical representations enables cognitive flexibility. Nat. Neurosci. 21, 1753–1763 (2018).
Shuler, M. G. & Bear, M. F. Reward timing in the primary visual cortex. Science 311, 1606–1609 (2006).
Chéreau, R. et al. Dynamic perceptual feature selectivity in primary somatosensory cortex upon reversal learning. Nat. Commun. 11, 3245 (2020).
Bari, A. et al. Serotonin modulates sensitivity to reward and negative feedback in a probabilistic reversal learning task in rats. Neuropsychopharmacology 35, 1290–1301 (2010).
Isaacson, J. S. & Scanziani, M. How inhibition shapes cortical activity. Neuron 72, 231–243 (2011).
Neftci, E. O. & Averbeck, B. B. Reinforcement learning in artificial and biological systems. Nat. Mach. Intell. 1, 133–143 (2019).
Mayford, M. et al. Control of memory formation through regulated expression of a CaMKII transgene. Science 274, 1678–1683 (1996).
Madisen, L. et al. Transgenic mice for intersectional targeting of neural sensors and effectors with high specificity and performance. Neuron 85, 942–958 (2015).
Harris, J. A. et al. Anatomical characterization of Cre driver mice for neural circuit mapping and manipulation. Front. Neural Circuits 8, 76 (2014).
Carandini, M. & Churchland, A. K. Probing perceptual decisions in rodents. Nat. Neurosci. 16, 824–831 (2013).
Bailey, K. R. & Crawley, J. N. in Methods in Behavioral Analysis in Neuroscience (eds Bailey, K. R., Crawley, J. N. & Buccafusco, J. J.) Ch. 5 (CRC, 2009).
Farr, T. D., Liu, L., Colwell, K. L., Whishaw, I. Q. & Metz, G. A. Bilateral alteration in stepping pattern after unilateral motor cortex injury: a new test strategy for analysis of skilled limb movements in neurological mouse models. J. Neurosci. Methods 153, 104–113 (2006).
Banerjee, A. et al. Jointly reduced inhibition and excitation underlies circuit-wide changes in cortical processing in Rett syndrome. Proc. Natl Acad. Sci. USA 113, E7287–E7296 (2016).
Yang, B. et al. Single-cell phenotyping within transparent intact tissue through whole-body clearing. Cell 158, 945–958 (2014).
Chung, K. et al. Structural and molecular interrogation of intact biological systems. Nature 497, 332–337 (2013).
Gomez, J. L. et al. Chemogenetics revealed: DREADD occupancy and activation via converted clozapine. Science 357, 503–507 (2017).
Gilad, A., Gallero-Salas, Y., Groos, D. & Helmchen, F. Behavioral strategy determines frontal or posterior location of short-term memory in neocortex. Neuron 99, 814–828.e7 (2018).
Langer, D. et al. HelioScan: a software framework for controlling in vivo microscopy setups with high hardware flexibility, functional diversity and extendibility. J. Neurosci. Methods 215, 38–52 (2013).
Guo, Z. V. et al. Flow of cortical activity underlying a tactile decision in mice. Neuron 81, 179–194 (2014).
Sreenivasan, V. et al. Movement initiation signals in mouse whisker motor cortex. Neuron 92, 1368–1382 (2016).
Huber, D. et al. Multiple dynamic representations in the motor cortex during sensorimotor learning. Nature 484, 473–478 (2012).
This work was supported by an H2020 Marie Skłodowska-Curie fellowship (CIRCDYN, grant 709288) and a NARSAD Young Investigator award (grant 24941) from the Brain & Behavior Research Foundation to A.B., and by grants from the Swiss National Science Foundation (310030B_170269), a Sinergia SNF grant (CRSII5_180316) and the European Research Council (ERC Advanced Grant BRAINCOMPATH, 670757) to F.H. We thank B. Grewe for showing us the preparation for GRIN lens imaging; M. E. Schwab for the use of equipment for the open-field and ladder-rung tests; S. Carta, L. Shumanovski, D. Göckeritz, L. Egolf and C. Rickenbach for various assistance; and W. Senn, F. Lucantonio, M. Goard, M. Pignatelli and B. Scholl for discussions of the manuscript.
The authors declare no competing interests.
Peer review information Nature thanks Cornelius Schwarz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1
a, Time course of task performance (discriminability index, d′) of individual mice, revealing the dynamics of learning and of reversal learning upon rule switch. Each line (shades of blue) represents a single mouse (11 mice in total). b, Percentage of correct decisions, (hit + CR)/all trials, plotted as ‘outcome rate’ during the four salient behavioural phases of learning (learning naive, LN; learning expert, LE) and reversal (reversal naive, RN; reversal expert, RE) (n = 11 mice); CR, correct rejection. c, Reversal performance remained stable and high when mice trained on the reversed reward contingency (P1200 as the go texture, RE) were tested 6 weeks later (n = 2 mice). d, Reversal learning is independent of the initial texture training (fine-grit P1200 sandpaper as the initial go texture; n = 2 mice). e, Texture discrimination depends on sensory input. Left, keeping textures out of reach impaired the performance of expert mice after reversal (RE) (n = 3 sessions in 2 mice). Right, clipping whiskers in expert mice similarly impaired performance (low d′), indicating that sensory input is essential for correct execution of the task (n = 3 mice, studied longitudinally before and after whisker clipping). Data are presented as mean ± s.e.m.; ***P < 0.001, two-sided Wilcoxon rank-sum test.
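Panel a uses the standard signal-detection index d′ = z(hit rate) − z(false-alarm rate), and panel b uses the outcome rate (hit + CR)/all trials. A minimal sketch of both follows, assuming per-session trial counts as input. The log-linear correction that keeps z finite when a rate is 0 or 1 is an assumption, because the caption does not say how such sessions were handled.

```python
from statistics import NormalDist


def _corrected_rate(count, total):
    # Log-linear correction (assumed): keeps the rate strictly inside (0, 1)
    # so that the inverse normal CDF stays finite.
    return (count + 0.5) / (total + 1)


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Discriminability index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    hit_rate = _corrected_rate(hits, hits + misses)
    fa_rate = _corrected_rate(false_alarms, false_alarms + correct_rejections)
    return z(hit_rate) - z(fa_rate)


def outcome_rate(hits, misses, false_alarms, correct_rejections):
    """Fraction of correct decisions: (hit + CR) / all trials."""
    total = hits + misses + false_alarms + correct_rejections
    return (hits + correct_rejections) / total


# Hypothetical expert session: 90 hits, 10 misses, 15 FA, 85 CR.
print(round(d_prime(90, 10, 15, 85), 2), outcome_rate(90, 10, 15, 85))
```

Unlike the outcome rate, d′ separates discrimination ability from an overall bias to lick, which is why a mouse that licks on every trial scores near 0 on d′ even though half of its decisions are correct.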
Extended Data Fig. 2
a, Upper row, time course of the whisking amplitude envelope aligned to first touch during go (left) and no-go (right) trials across two salient periods of initial learning (learning naive, LN; learning expert, LE). Naive (LN) mice showed low-amplitude whisking throughout most of the trial. In expert mice (LE), whisking became time-locked to the arrival of the texture. Lower row, equivalent whisking traces for the periods after rule switch (reversal naive, RN; reversal expert, RE; right). In both RN and RE periods, mice showed stimulus-locked whisking amplitude (n = 3 mice). Note that the amplitudes and temporal profiles of the whisking envelope were similar for the smooth P1200 and the rough P100 textures, independent of the stimulus–outcome association. b, Same analysis as in a but for mean whisking velocity. No significant difference in the velocity profile was found between the two textures in the stimulus-presentation window. c, Time course of average lick rates during go trials across two salient phases of initial learning (left) and reversal learning (right) (n = 11 mice). Expert mice (LE and RE) showed both an increase in licking during the report window (grey) and a decrease in early licks (B, baseline; S, stimulus presentation; R, reward). Data are presented as mean (solid line) ± s.e.m. (shaded area).
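Panel c plots trial-averaged lick rates over time. One straightforward way to compute such a curve is to bin lick timestamps and divide each bin count by the number of trials and the bin width. The sketch assumes lick times (in seconds) are already aligned to a common trial event; the input format, window and bin width are hypothetical.

```python
def lick_rate_psth(lick_times_per_trial, t_start, t_stop, bin_width):
    """Trial-averaged lick rate (licks/s) in consecutive time bins.

    lick_times_per_trial: one list of lick timestamps (s) per trial, all
    aligned to the same trial event (hypothetical input format).
    Returns (bin_centres, rates).
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for licks in lick_times_per_trial:
        for t in licks:
            if t_start <= t < t_stop:
                idx = int((t - t_start) // bin_width)
                counts[min(idx, n_bins - 1)] += 1  # guard float rounding at the edge
    n_trials = len(lick_times_per_trial)
    rates = [c / (n_trials * bin_width) for c in counts]
    centres = [t_start + (i + 0.5) * bin_width for i in range(n_bins)]
    return centres, rates


# Two hypothetical trials, 1-s bins over 0-2 s.
centres, rates = lick_rate_psth([[0.1, 0.2, 1.5], [1.6]], 0.0, 2.0, 1.0)
print(centres, rates)  # -> [0.5, 1.5] [1.0, 1.0]
```

Computing one such curve per mouse and then taking the mean and s.e.m. across mice would give traces in the mean ± s.e.m. format used in the panel.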
Extended Data Fig. 3 Immunohistochemical and behavioural validation of pharmacogenetic silencing using hM4Di.
a, Neuronal silencing was achieved by viral injection of an inhibitory DREADD (AAV-hM4Di-mCherry) into S1 and/or lOFC, followed by systemic CNO application. S1 injection (top) was bilateral, and lOFC (LO) injection (bottom) was unilateral, on the same side as the barrel field. b, hM4Di injection in lOFC combined with systemic (i.p.) administration of clozapine (1–5 mg/kg) after rule switch (RN and RE) selectively impaired reversal learning (n = 3 mice). c, Mice with hM4Di injected in lOFC and treated with CNO made more perseverative errors (false alarms, FA) in RE than in LE (n = 4 mice). d, e, Silencing medial OFC (MO) by unilateral hM4Di injection into MO, followed by daily systemic CNO application after rule switch (RN through RE), had no effect on reversal learning. *P < 0.05, **P < 0.01, ***P < 0.001, two-sided Wilcoxon rank-sum test. Data are presented as mean ± s.e.m.
Extended Data Fig. 4
a, Timeline of the experimental sequence for validating lOFC (LO) silencing (top). Schematic of acute electrophysiological recording from frontal cortex (bottom), with a DAPI-stained slice imaged with a confocal microscope showing red fluorescence from DiD marking the probe location. Example traces from three electrode contacts in one recording session before and after CNO injection (middle). Box plots show the change in firing rate (% change relative to baseline) for electrode contacts above, in or below lOFC; they show the median, the 25th and 75th percentiles as box edges, and the 5th and 95th percentiles as whiskers. Right, example waveforms from units significantly modulated by CNO. *P < 0.05, t-test.
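The box plots express each contact's firing rate as a percentage change from baseline and draw whiskers at the 5th and 95th percentiles. A small sketch of both quantities follows; the percentile interpolation (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, and the firing rates in the usage example are hypothetical. The same summary applies to the other box plots in these captions that use 5th/95th-percentile whiskers.

```python
from statistics import quantiles


def percent_change(baseline, post):
    """Change relative to baseline, in percent (e.g. firing rate after CNO)."""
    return 100.0 * (post - baseline) / baseline


def box_summary(values):
    """Median, 25th/75th-percentile box edges and 5th/95th-percentile whiskers.

    quantiles(..., n=100) returns the 99 cut points P1..P99, so the k-th
    percentile is at index k - 1. The 'inclusive' interpolation is assumed.
    """
    p = quantiles(values, n=100, method="inclusive")
    return {"p5": p[4], "q1": p[24], "median": p[49], "q3": p[74], "p95": p[94]}


# Hypothetical per-contact firing rates (Hz) before and after CNO.
before = [12.0, 8.0, 10.0, 15.0, 9.0]
after = [6.0, 5.0, 7.0, 9.0, 6.0]
changes = [percent_change(b, a) for b, a in zip(before, after)]
print(box_summary(changes))
```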
Extended Data Fig. 5
a, Schematic diagram and whole-brain image showing the location of cannula implantation in OFC. Coloured regions on the schematic indicate premotor and motor areas as described in previous studies [42,44–46] (left hemisphere) or regions according to the Allen Institute Common Coordinate Framework (right hemisphere). b, Schematic based on the Allen Brain Atlas, with light-microscopic and confocal views, showing GCaMP6f expression in lOFC (LO) and cannula placement above the virus injection site. c, Whisking behaviour is preserved in mice implanted with OFC cannulas. Whisking amplitude envelope (top) and whisking velocity (bottom) in expert mice (RE), centred on texture approach (n = 2 mice). d, An open-field test showed normal locomotor function in wild-type (WT) non-implanted mice and in OFC cannula-implanted GCaMP6f-expressing mice (n = 4 WT and n = 2 OFC cannula-implanted mice). Representative locomotor track (top) and heat map (bottom) of an OFC cannula-implanted mouse. Total distance covered (cm) and mean velocity (cm/s) are plotted. Scale bar, 5 cm. e, A horizontal ladder-rung test showed normal locomotor function in WT (n = 4) and OFC cannula-implanted mice (n = 2). Representative picture of the paw placement of a mouse on the irregular horizontal ladder. f, Paw placement of the limb contralateral to the cannula-implanted side did not differ significantly between WT and OFC cannula-implanted mice. g, No differences were seen in paw placement of the limb ipsilateral or contralateral to the cannula-implanted side in OFC cannula-implanted or control WT mice. Data are presented as mean ± s.e.m.
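Panel d reports total distance and mean velocity in the open field. Assuming the tracking output is a sequence of x, y positions already converted to centimetres and sampled at a constant frame rate (neither the tracking software nor the frame rate is stated), both follow from summing the step lengths along the path:

```python
import math


def open_field_metrics(xs_cm, ys_cm, fps):
    """Total path length (cm) and mean velocity (cm/s) of a tracked mouse.

    Assumes positions are already in centimetres and sampled at a constant
    frame rate `fps` (both hypothetical; not stated in the caption).
    """
    steps = zip(xs_cm, ys_cm, xs_cm[1:], ys_cm[1:])
    distance = sum(math.hypot(x1 - x0, y1 - y0) for x0, y0, x1, y1 in steps)
    duration_s = (len(xs_cm) - 1) / fps
    return distance, distance / duration_s


print(open_field_metrics([0.0, 3.0, 3.0], [0.0, 4.0, 4.0], fps=1.0))  # -> (5.0, 2.5)
```

Here mean velocity is total distance divided by total session time, so periods of immobility lower it; a speed averaged only over moving frames would be a different, equally plausible choice.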
Extended Data Fig. 6 Re-learning task with neutral context and in vivo Ca²⁺ imaging of lOFC neurons.
a, Schematic of the stimulus–outcome associations in a three-texture task with positive (large reward), neutral (small reward) and negative (punishment) contexts. The same coarse P100 and smooth P1200 sandpapers were used, but an additional sandpaper of intermediate coarseness (P600) was introduced as a go-neutral context (gonc) associated with a small reward that did not change upon reversal. b, Average Ca²⁺ transient amplitude in the reward-outcome window for lOFC neurons in hit, hitnc and CR trials (n = 63 active neurons out of 228 recorded neurons in three mice; n = 15 sessions), showing increased hit responses upon rule switch but no significant changes in hitnc trials. Across-trial average Ca²⁺ transients for each behavioural period are shown above. All box plots show the median, the 25th and 75th percentiles as box edges, and the 5th and 95th percentiles as whiskers.
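The captions summarize neuronal responses as the average Ca²⁺ transient amplitude (ΔF/F) within an analysis window. A minimal sketch of that computation follows, assuming ΔF/F = (F − F0)/F0 with F0 taken as the mean of pre-event baseline frames; the actual F0 definition, frame indices and window bounds are not given in the caption and are hypothetical here.

```python
from statistics import mean


def delta_f_over_f(trace, baseline_frames):
    """dF/F = (F - F0) / F0, with F0 the mean over baseline frames (assumed)."""
    f0 = mean(trace[i] for i in baseline_frames)
    return [(f - f0) / f0 for f in trace]


def window_amplitude(dff, window_frames):
    """Mean dF/F inside an analysis window, e.g. a reward-outcome window."""
    return mean(dff[i] for i in window_frames)


def trial_average(dff_trials):
    """Across-trial average trace; all trials must have the same length."""
    return [mean(samples) for samples in zip(*dff_trials)]


# Hypothetical raw fluorescence of one neuron on one trial (6 frames).
trace = [100.0, 100.0, 100.0, 150.0, 200.0, 100.0]
dff = delta_f_over_f(trace, baseline_frames=range(0, 3))
print(window_amplitude(dff, window_frames=range(3, 5)))  # -> 0.75
```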
Extended Data Fig. 7 Task-related functional dynamics in S1→lOFC projecting neurons during reversal learning.
a, Retrograde AAV-retro/2-tdTomato injections into the lOFC in vivo, followed by brain clearing with CLARITY and whole-brain light-sheet microscopy, revealed feedforward S1→OFC projections from both deep (L5 and L6) and superficial (L2/3) layers of S1 (n = 2 mice). Labelling is weaker in the hemisphere contralateral to the injection site. b, S1→lOFC projecting neurons were labelled with GCaMP6f using a dual-viral strategy, with retrograde AAV2-retro/2-Cre injected in lOFC and Cre-dependent AAV-DIO-GCaMP6f in S1. Inset, L2/3 neurons in S1 labelled with this strategy. c, Average Ca²⁺ transient amplitude in the reward-outcome window shows a significant increase in response amplitude during the expert phases of training (LE and RE) (n = 96 active neurons out of n = 135 recorded neurons in 2 mice, n = 5 sessions per phase). d, Top, S1→lOFC projecting neurons were labelled using a dual-viral strategy with retrograde AAV2-retro/2-Cre injected in lOFC and Cre-dependent AAV-DIO-GCaMP6f in S1. Bottom, peak reward-related responses of S1→lOFC projection neurons averaged across hit (left) and CR (right) trials, measured longitudinally across the four salient periods (n = 96 neurons from n = 2 mice, n = 5 sessions per phase). Box plots (median, red line; 25th and 75th percentiles, box edges; most extreme non-outliers, whiskers; outliers, red crosses; zero, dashed grey line) are also shown (inset). e, Scatter plot and histogram comparing the selectivity index (SI) of S1→lOFC projecting neurons between the learning expert (LE) and reversal naive (RN) phases (n = 39 active neurons out of n = 46 neurons from n = 2 mice, n = 5 sessions per phase). f, Scatter plot and histogram comparing the SI of S1→lOFC projecting neurons between the LE and reversal expert (RE) phases (n = 61 active neurons out of n = 73 neurons from n = 2 mice, n = 5 sessions per phase). All box plots show the median, the 25th and 75th percentiles as box edges, and the 5th and 95th percentiles as whiskers. Data are presented as mean ± s.e.m.; *P < 0.05, **P < 0.01, two-sided Wilcoxon rank-sum test.
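Most comparisons in these captions use a two-sided Wilcoxon rank-sum test. To show what the test computes, here is a minimal standard-library sketch using the large-sample normal approximation with a tie-corrected variance. It is not the authors' analysis code; a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used, and with small samples an exact test may differ from this approximation.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (W, z, p), where W is the rank sum of sample x. The variance is
    corrected for ties (undefined if every value is tied); no continuity
    correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical selectivity indices in two phases.
print(rank_sum_test([0.1, 0.2, 0.3, 0.4, 0.5], [0.6, 0.7, 0.8, 0.9, 1.0]))
```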
Extended Data Fig. 8
a, Schematic of the step-by-step derivation of the selectivity index (SI) from ROC curves. b, Selectivity indices of longitudinally tracked lOFC neurons across the salient task periods LE, RN and RE. Marker colours for RN and RE indicate the classes assigned in the LE→RN and LE→→RE comparisons, respectively. Plots are shown separately for each LE→RN class. c, Fate mapping of longitudinally tracked lOFC neurons. For each LE→RN class, the distribution of these neurons across classes in the LE→→RE comparison is shown as a coloured bar on the right. d, Same as b but for S1 neurons. e, Same as c but for S1 neurons. f, Same as b but for S1 neurons in lOFC-silenced mice. g, Same as c but for S1 neurons in lOFC-silenced mice. Inset in e, among S1 neurons that were non-selective in LE→RN, a significantly smaller fraction acquired selectivity for the newly rewarded go texture in the RE phase when lOFC was silenced (22% versus 60%, one-tailed χ² test). Note that the fate-mapping plots include additional neurons compared with b, d and f, because these neurons were not assigned an SI value in every phase but were still classified.
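Panel a derives the selectivity index from ROC curves. The sketch below follows a common convention that is assumed here, since the caption does not give the formula: sweep a threshold over single-trial responses to build the ROC curve, take the area under it (AUC), and rescale to SI = 2 × (AUC − 0.5), so that +1 means a complete preference for condition A, −1 for condition B, and 0 no preference. Which trial types serve as A and B, and how neurons are then assigned to classes, are not shown.

```python
def roc_curve(responses_a, responses_b):
    """ROC points (FPR, TPR) obtained by sweeping a threshold from high to low.

    A trial counts as 'positive' when its response is >= the threshold;
    responses_a are treated as the signal class and responses_b as noise.
    """
    thresholds = sorted(set(responses_a) | set(responses_b), reverse=True)
    points = [(0.0, 0.0)]
    for th in thresholds:
        tpr = sum(r >= th for r in responses_a) / len(responses_a)
        fpr = sum(r >= th for r in responses_b) / len(responses_b)
        points.append((fpr, tpr))
    return points  # ends at (1.0, 1.0)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def selectivity_index(responses_a, responses_b):
    """SI = 2 * (AUC - 0.5): +1 prefers A, -1 prefers B, 0 no preference (assumed)."""
    return 2.0 * (auc(roc_curve(responses_a, responses_b)) - 0.5)


# Hypothetical single-trial responses of one neuron to two textures.
go_responses = [0.9, 1.2, 0.7, 1.5]
nogo_responses = [0.2, 0.8, 0.1, 0.3]
print(selectivity_index(go_responses, nogo_responses))
```

The trapezoidal AUC equals the probability that a random A response exceeds a random B response (ties counted as one half), which makes the SI insensitive to monotonic rescaling of the responses.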
Extended Data Fig. 9
a, Average Ca²⁺ transient amplitude (ΔF/F) in the stimulus-presentation window for S1 neurons (n = 142 neurons in n = 3 mice, n = 2 sessions per phase). b, Scatter plot and histogram comparing the texture-touch-related selectivity index (SI) in the stimulus-presentation window for S1 neurons between the learning expert (LE) and reversal naive (RN) phases (n = 218 neurons from n = 3 mice, n = 28 sessions). c, Scatter plot and histogram comparing the SI of S1 neurons between the LE and reversal expert (RE) phases (n = 218 neurons from n = 3 mice, n = 28 sessions). d, Average Ca²⁺ transient amplitude (ΔF/F) in the stimulus-presentation window for S1 neurons in lOFC-silenced mice (n = 87 neurons in n = 2 mice, n = 2 sessions per phase). e, Scatter plot and histogram comparing the texture-touch-related SI of S1 neurons between the LE and RN phases in lOFC-silenced mice (n = 165 neurons, n = 25 sessions per phase). f, Scatter plot and histogram comparing the touch-related SI of S1 neurons in lOFC-silenced mice between the LE and RE phases (n = 210 neurons in n = 3 mice, n = 28 sessions). g, Comparison of SI marginal distributions for the three salient periods LE, RN and RE for lOFC neurons (2D scatter plots not shown), S1 neurons (c, d) and S1 neurons in lOFC-silenced mice (e, f). All box plots show the median, the 25th and 75th percentiles as box edges, and the 5th and 95th percentiles as whiskers. *P < 0.05, two-sided Wilcoxon rank-sum test.
Extended Data Fig. 10 Differential modulation of task variable-relevant events in neuronal responses.
a, Schematic of a generalized linear model (GLM; Poisson regression) used to predict neural activity from behavioural task variables. Each event was expanded into a series of evenly spaced Gaussian filters. b, GLM prediction of the deconvolved activity of an example S1 outcome-selective neuron from task variables. c, Separate components contributing to the average response of this neuron reveal major sensory modulation together with reward-evoked activity. B, baseline; T, texture touch; R, reward. d, To quantify the contribution of each task variable, the relative fraction of deviance explained was calculated and normalized by the total deviance explained for each neuron, both before and after reversal. The reward component in lOFC outcome-selective neurons is significantly greater than the touch-related component. e, The fraction of deviance explained by each component in separate subsets of S1 neurons reveals distinct modulation by specific task-related events. Notably, the responses of outcome-selective S1 neurons are mostly explained by the reward component. Licking appears to modulate S1 neural responses less than reward does in each subset. The neurons analysed with the GLM are the same as in Fig. 3. Data are presented as mean ± s.e.m.; *P < 0.05, **P < 0.01, two-sided Wilcoxon rank-sum test. f, Reward-history modulation index (RHMI) for functional subclasses of lOFC neurons and of S1 neurons in OFC-intact control mice and lOFC-silenced mice (neurons analysed are from Fig. 4b; ns, P > 0.05; bootstrap-permutation test; s.e.m. of RHMI with permuted indices shown as grey bars).
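Panels a and d describe two computations: expanding each task event into evenly spaced Gaussian filters, and scoring a Poisson GLM by the fraction of deviance it explains. The sketch below shows both under stated assumptions. The number, spacing and width of the filters are illustrative, fitting the GLM itself (for example with a Poisson-regression routine from a statistics library) is not shown, and `relative_contributions` is a hypothetical leave-one-component-out scheme, not necessarily the normalization used in the study.

```python
import math


def gaussian_basis(event_frames, n_frames, n_basis=5, spacing=4, width=2.0):
    """Expand event times into evenly spaced Gaussian-filter regressors.

    Column k peaks k * spacing frames after each event. n_basis, spacing and
    width (in frames) are illustrative assumptions, not the study's values.
    """
    columns = []
    for k in range(n_basis):
        lag = k * spacing
        col = [0.0] * n_frames
        for e in event_frames:
            for t in range(n_frames):
                col[t] += math.exp(-0.5 * ((t - e - lag) / width) ** 2)
        columns.append(col)
    return columns


def poisson_deviance(y, mu):
    """D = 2 * sum(y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 at y = 0."""
    d = 0.0
    for yi, mi in zip(y, mu):
        d += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * d


def fraction_deviance_explained(y, mu_model):
    """1 - D_model / D_null, where the null model predicts the mean of y."""
    y_mean = sum(y) / len(y)
    d_null = poisson_deviance(y, [y_mean] * len(y))
    return 1.0 - poisson_deviance(y, mu_model) / d_null


def relative_contributions(fde_full, fde_without):
    """Drop in explained deviance when each component is left out of the
    model, normalized so the shares sum to 1 (hypothetical scheme)."""
    drops = {name: max(fde_full - f, 0.0) for name, f in fde_without.items()}
    total = sum(drops.values())
    if total == 0:
        return {name: 0.0 for name in drops}
    return {name: d / total for name, d in drops.items()}


# Hypothetical fractions of deviance explained by the full and reduced models.
print(relative_contributions(0.5, {"touch": 0.4, "reward": 0.2, "lick": 0.5}))
```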
About this article
Cite this article
Banerjee, A., Parente, G., Teutsch, J. et al. Value-guided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585, 245–250 (2020). https://doi.org/10.1038/s41586-020-2704-z