Abstract
In any given situation, the environment can be parsed in different ways to yield decision variables (DVs) defining strategies useful for different tasks. It is generally presumed that the brain computes only a single DV defining the current behavioral strategy. Here, to test this assumption, we recorded neural ensembles in the frontal cortex of mice performing a foraging task admitting multiple DVs. Methods developed to uncover the currently employed DV revealed the use of multiple strategies and occasional switches in strategy within sessions. Optogenetic manipulations showed that the secondary motor cortex (M2) is needed for mice to use the different DVs in the task. Surprisingly, we found that regardless of which DV best explained the current behavior, M2 activity concurrently encoded a full basis set of computations defining a reservoir of DVs appropriate for alternative tasks. This form of neural multiplexing may confer considerable advantages for learning and adaptive behavior.
Data availability
The preprocessed electrophysiological and behavioral data collected for this study are publicly available and can be accessed at: https://doi.org/10.6084/m9.figshare.20449089.
Raw electrophysiological data are too large to be shared on a publicly available repository and are therefore available from the authors upon reasonable request.
The Allen Mouse Brain Atlas used in this study is publicly available: https://alleninstitute.github.io/AllenSDK/reference_space.html.
Code availability
All analyses were performed using custom code written in MATLAB and available upon request.
The code used for the central GLM analyses is publicly available at: https://hastie.su.domains/glmnet_matlab/.
The code developed for the LM-HMM can be accessed at: https://github.com/mazzulab/ssm/blob/master/notebooks/2c%20Input-driven%20linear%20model%20(LM-HMM).ipynb.
Acknowledgements
We thank P. Vertechi for insightful discussions about the project and the model and D. Reato for support with analyses. We also thank M. Beckert for assistance with the illustrations. This work was supported by an EMBO long-term fellowship (F.C.; ALTF 461-2016), an AXA postdoctoral fellowship (F.C.), the National Institute of Neurological Disorders and Stroke grant R01-NS118461 (BRAIN Initiative, L.M.), the MEXT Grant-in-Aid for Scientific Research (19H05208, 19H05310 and 19K06882 (M.M.)), the Takeda Science Foundation (M.M.), Fundação para a Ciência e a Tecnologia (PTDC/MED_NEU/32068/2017, M.M., Z.F.M.; and LISBOA-01-0145-FEDER-032077, A.R.), the European Research Council Advanced Grant (671251, Z.F.M.), Simons Foundation (SCGB 543011, Z.F.M.) and Champalimaud Foundation (Z.F.M., A.R.). This work was also supported by Portuguese national funds, through FCT—Fundação para a Ciência e a Tecnologia—in the context of the project UIDB/04443/2020 and by the research infrastructure CONGENTO, cofinanced by Lisboa Regional Operational Programme (Lisboa2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF) and Fundação para a Ciência e Tecnologia (Portugal) under the projects LISBOA-01-0145-FEDER-02217 and LISBOA-01-0145-FEDER-022122.
Author information
Authors and Affiliations
Contributions
F.C. and Z.F.M. conceived the project. F.C. and M.M. designed and performed behavioral experiments. J.P.M. and E.A. helped with surgery and behavioral training. F.C. designed and performed electrophysiological experiments. F.C. curated the data. F.C. and A.R. designed and performed the analyses. L.M. designed the LM-HMM. F.C., A.R. and Z.F.M. wrote the manuscript. All authors reviewed the manuscript.
Corresponding authors
Ethics declarations
Competing interests
All authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks Alex Kwan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Task apparatus and behavioral properties.
(a) The behavioral apparatus consists of a treadmill coupled to two motors. Rotating the treadmill drives the movement of the arms via the motors in a closed-loop fashion. A mouse placed on the treadmill with its head fixed can lick at the spout from the arm in front. A camera placed on the side of the animal allows online video detection of licks. (b) View from the lick-detection camera. A region of interest is defined around the tongue of the animal. To detect licks, a threshold is applied to the signal within the region of interest (a sketch of this step follows below). (c) The task consists of behavioral bouts and traveling epochs. Within a behavioral bout, lick outcomes are classified into three types: reward, failure and invalid. Rewards and failures occur when the mouse slows its running speed below an arbitrary threshold after the ‘STOP event’. The ‘STOP event’ is signaled by an auditory tone when an arm comes into place. Any lick above the running threshold is considered invalid and is always unrewarded. The traveling epoch starts after the ‘LEAVE event’, when the mouse initiates the run. (d, e, f) The licking behavior of the animals is stereotyped. (d) Histogram of the time between consecutive licks. (e) Example lick rasters for consecutive failures (top) and consecutive rewards (bottom). Licks are aligned to the onset of a rewarded lick and sorted based on the following events. (f) Licking frequency corresponding to the two examples in (e) (series of consecutive rewards in green and series of consecutive failures in purple). (g, h, i, j) Time distributions of different behavioral events (mean ± s.e.m.; n = 21 mice). The time spent licking was much greater than the time to initiate licking (between the STOP event and the first lick) or the time to initiate running (between the last lick and the LEAVE event). Notably, engaged mice took less than half a second after the last lick to leave the site in most bouts (median time to run = 0.46 s). The running time is comparable to the licking time. (k) Monotonic relationship between the number of consecutive failures after the last reward and the time spent licking after the last reward (each dot represents the mean across bouts for one session).
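The lick detection in panel (b) amounts to thresholding the mean pixel intensity within the region of interest around the tongue. The following is a minimal sketch of that kind of on-line detection; the frame rate, ROI coordinates and threshold are illustrative placeholders, not the values used in the study.

```python
import numpy as np

def detect_licks(frames, roi, threshold, fps=120.0):
    """Detect lick times by thresholding mean intensity within an ROI.

    Illustrative sketch only: ROI, threshold and fps are hypothetical values.
    frames : array of shape (n_frames, height, width)
    roi    : (row_min, row_max, col_min, col_max) around the tongue
    """
    r0, r1, c0, c1 = roi
    signal = frames[:, r0:r1, c0:c1].mean(axis=(1, 2))       # mean ROI intensity per frame
    above = signal > threshold                                # frames where the tongue is visible
    onsets = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    return onsets / fps                                       # lick onset times in seconds
```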
Extended Data Fig. 2 Ground truth model.
(a, b) The slope (a) and intercept (b) estimates as a function of the ground truth for simulated sessions in which the number of bouts matched that of real sessions. The ground truth can be recovered (R2 = 0.99 for the slope; R2 = 0.91 for the intercept) from the logistic regression. (c, d) The slope (c) and intercept (d) estimates as a function of the ground truth for simulated sessions with varying numbers of bouts. Overall, the ground truth can be precisely recovered for sessions with more than 100 bouts. (e) Deviance explained from a logistic regression model that fits simulated sessions of an inference-based agent using the correct model (‘Consecutive failures’), a wrong but correlated model (‘Negative value’) and a random model (where both rewards and failures are arbitrarily accumulated or reset). The deviance explained by the consecutive failures represents the upper bound of the model performance. The deviance explained by the consecutive failures being smaller than 1 indicates that, although the ground truth can be recovered, the switching decision is not deterministic and involves some stochasticity (here the variability was matched to that of the data). However, the deviance explained by the consecutive failures is significantly greater than the deviance explained by the correlated model and the random model (two-sided Wilcoxon signed-rank test; three stars indicate p < 10^−3; p = 0.00005 between Consec. fail. and Neg. value; p < 10^−7 between Consec. fail. and Random). On each box, the central mark indicates the median across simulated sessions (n = 42 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. (f) Illustration of a logistic regression model for predicting the switching decision of an inference-based simulated agent from the two different DVs (‘Consecutive failures’ and ‘Negative value’) simultaneously. (g) Deviance explained from the model in (f) as a function of the number of bouts in each session. (h) For all simulated sessions in (e), the variance explained by the ‘consecutive failures’ DV was greater than the variance explained by the ‘negative value’ DV, indicating that the model inferred the true DV.
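For concreteness, the sketch below shows one way to compute deviance explained for a logistic regression of switch decisions on the two DVs, relative to a constant-rate null model. The study's GLM analyses were performed in MATLAB with glmnet, so the estimator and the per-lick design matrix used here are illustrative assumptions, not the published pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def deviance_explained(y, X):
    """Fraction of null-model deviance explained by a logistic regression.

    Illustrative sketch: y is 1 for licks followed by leaving and 0 otherwise;
    X is a per-lick design matrix, e.g. [consecutive_failures, negative_value].
    """
    y = np.asarray(y, dtype=float)
    model = LogisticRegression().fit(X, y)
    dev_model = 2 * len(y) * log_loss(y, model.predict_proba(X)[:, 1])
    p_null = np.full_like(y, y.mean())                 # null model: constant switch rate
    dev_null = 2 * len(y) * log_loss(y, p_null)
    return 1.0 - dev_model / dev_null
```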
Extended Data Fig. 3 Testing alternative foraging strategies.
(a) Illustration of the logistic regression model for predicting the switching decision of mice using a combination of the two main DVs, ‘Consecutive failures’ and ‘Negative value’, as well as additional DVs. Specifically, we tested three classes of additional DVs: (1) those relying on absolute time, (2) those relying on average reward rates and (3) those that weigh recent evidence more strongly. The design matrix of the model thus consisted of the two main DVs, the time of each lick relative to the first lick of each bout (class 1), the average reward rate over 1, 3 and 10 previous bouts (class 2) and a version of the negative value DV that weighs recent evidence more heavily than past evidence (class 3), such that \(x_{t+1} = (1 - \alpha)\,g(o_{t+1})\,x_t + \alpha\,c(o_{t+1})\), with α = 0.8. (b) Deviance explained from a logistic regression model that predicts choice behavior based only on the two main DVs (left) and from the full model that also includes the additional DVs in (a). The central mark indicates the median across behavioral sessions (n = 42 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. There was no significant difference between the deviance explained of the two models (two-sided Wilcoxon signed-rank test: p = 0.22), indicating that the additional DVs do not improve the performance of the model. (c) Relative variance explained by each predictor of the full model for each behavioral session (n = 42 sessions across 21 mice, 2 sessions per mouse). The dominant DV (the one with the maximum relative variance explained) was most often the ‘Consecutive failures’ (18 sessions), followed by the ‘Negative value’ (17 sessions), and finally the additional DVs (2 sessions for the absolute time, 2 sessions for the average reward rate, 3 sessions for the weighted negative value).
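The class-3 predictor is an exponential recency weighting of the negative-value update. A minimal sketch of that recursion is given below; the gating function g and outcome function c are defined in the Methods, so the reward-reset and failure-cost defaults used here are placeholders for illustration only.

```python
import numpy as np

def recency_weighted_dv(outcomes, alpha=0.8, g=None, c=None):
    """Compute x_{t+1} = (1 - alpha) * g(o_{t+1}) * x_t + alpha * c(o_{t+1}).

    Illustrative sketch: the default g (reset on reward) and c (unit cost on
    failure) are placeholders, not the paper's exact definitions.
    outcomes : sequence of per-lick outcomes, e.g. 'reward' or 'failure'
    """
    g = g or (lambda o: 0.0 if o == 'reward' else 1.0)   # placeholder gate
    c = c or (lambda o: 1.0 if o == 'failure' else 0.0)  # placeholder cost
    x, trace = 0.0, []
    for o in outcomes:
        x = (1 - alpha) * g(o) * x + alpha * c(o)
        trace.append(x)
    return np.array(trace)
```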
Extended Data Fig. 4 Pipeline for extracellular electrophysiology, data processing and cluster mapping.
(a) Data collection from the Neuropixels probe. (b) Kilosort2 is used to automatically match spike templates to the raw data. (c) Example of voltage data input to Kilosort2. Prior to the automatic sorting, the raw data are preprocessed with offset-subtraction, median-subtraction and whitening steps. (d) Manual quality control is performed on the outputs of Kilosort2 using PHY to remove units with nonphysiological waveforms (e), contaminated refractory periods, low amplitudes (less than 50 µV) or low firing rates (less than 0.5 spikes·s^−1). (f) For further quality control, visualizations of peri-event spike histograms (g, top; example histogram aligned to the first lick) or scatter plots (g, bottom; example scatter plot aligned to the first lick) of single neurons are made with custom-written scripts in MATLAB. (h, i) Example scatter plot of all neurons recorded simultaneously along the shank of the probe. This visualization helps delineate landmarks based on electrophysiological signatures to map cluster locations. (j, k, l) Landmarks derived from electrophysiological responses are validated against estimates from histology using open-source software (SHARP-Track).
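The unit-level criteria in panels (d, e) can be summarized as a simple filter over sorted units, sketched below. The amplitude and firing-rate thresholds come from the legend; the refractory-period check is an illustrative stand-in for the manual PHY curation, not the study's exact criterion.

```python
import numpy as np

def pass_quality_control(spike_times, waveform_uv, session_duration_s,
                         min_amplitude_uv=50.0, min_rate_hz=0.5,
                         max_isi_violation=0.01):
    """Return True if a sorted unit passes illustrative quality criteria.

    spike_times : spike times of the unit (s); waveform_uv : template waveform (µV).
    The ISI-violation threshold is an assumption standing in for manual curation.
    """
    rate = len(spike_times) / session_duration_s
    amplitude = np.abs(waveform_uv).max()                    # peak waveform amplitude
    isi = np.diff(np.sort(spike_times))
    violations = np.mean(isi < 0.002) if isi.size else 0.0   # fraction of ISIs < 2 ms
    return (amplitude >= min_amplitude_uv
            and rate >= min_rate_hz
            and violations <= max_isi_violation)
```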
Extended Data Fig. 5 Optogenetic effect on action timing.
(a) Illustration of the different action timings during a behavioral bout. (b) We used generalized linear mixed-effects models to evaluate the effect of stimulation (‘Laser’ predictor) on each action timing (see Methods). The models were fit separately for inactivated and control mice (number of observations: Inactivated = 68; Control = 20). (c–f) Median timing across bouts in the Laser OFF vs. Laser ON condition for each session (dots) of inactivated mice (violet) and control mice (red). The p-value corresponding to the t-statistic for a two-sided null hypothesis test that the coefficient of the ‘Laser’ predictor is equal to 0 (p_laser) is reported for each group of mice (color coded). (c) Fixed effect of stimulation (‘Laser’ predictor) on the inter-lick interval: Inactivated: −0.003 ± 0.0009, p = 0.001; Control: 0.005 ± 0.004, p = 0.24. (d) Fixed effect of stimulation (‘Laser’ predictor) on the time licking: Inactivated: 0.45 ± 0.26, p = 0.08; Control: −0.078 ± 0.22, p = 0.72. (e) Fixed effect of stimulation (‘Laser’ predictor) on the time to run: Inactivated: −0.075 ± 0.25, p = 0.76; Control: 0.014 ± 0.14, p = 0.92. (f) Fixed effect of stimulation (‘Laser’ predictor) on the time running: Inactivated: −0.079 ± 0.063, p = 0.22; Control: −0.061 ± 0.052, p = 0.28.
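The mixed-effects test in panel (b) can be sketched as below, assuming a per-bout table with a binary laser regressor and session as the grouping factor. The column names and the exact random-effects structure are assumptions for illustration; the study's model specification is in the Methods.

```python
import pandas as pd
import statsmodels.formula.api as smf

def laser_effect(df: pd.DataFrame, timing: str = "inter_lick_interval"):
    """Fixed effect of the 'Laser' predictor on one action timing.

    Illustrative sketch: `df` is assumed to have columns `timing` (s),
    'laser' (0/1) and 'session' (grouping factor for random intercepts).
    """
    model = smf.mixedlm(f"{timing} ~ laser", data=df, groups=df["session"])
    result = model.fit()
    # coefficient of the laser regressor and its two-sided p-value
    return result.params["laser"], result.pvalues["laser"]
```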
Extended Data Fig. 6 Properties of decision variables in M2.
(a) Illustration of a model to estimate the time constant of the reset at the end of the bout from M2 neurons. Example consecutive failures (pink) and neural projections (black, right) of the neural activity (left, example neural traces), including the activity during the 2 s after the end of each bout (dashed line). The projection of the neural activity on the decoding weights for the consecutive failures slowly ramps down until the beginning of the next bout. (b) To quantify the time constant of the reset at the end of the bout, the consecutive failures with an additional reset at the end of the bout were decoded from the neural activity. We considered the decoding projection at different times after the end of the last lick of bout ‘n’ and before the start of bout ‘n + 1’ and plotted the difference between the number of consecutive failures (dashed pink) and the neural projection (dashed black) at the end of each bout across recording sessions (median ± MAD; n = 11) as a function of the time after the last lick. The neural activity can reset at the end of the bouts with a time constant of around 200 ms. (c) Deviance explained across sessions (n = 11 sessions, median ± 25th and 75th percentiles; the whiskers extend to the most extreme data points) predicted from M2 neurons for ‘Consecutive failures’ (left) and ‘Negative value’ (right) on ipsilateral vs. contralateral bouts. If the recording is performed in the right hemisphere, ipsilateral bouts are those in which mice exploit the right foraging site (the right motorized arm), while contralateral bouts are those in which mice exploit the left foraging site (and vice versa for recordings in the left hemisphere). We observed no significant difference in model performance as a function of the side of the DVs (two-sided Wilcoxon signed-rank test; p > 0.05). (d) This panel shows the deviance explained across sessions (n = 11 sessions, median ± MAD) for the DVs (pink: ‘consecutive failures’; blue: ‘negative value’) as a function of window size. In all previous analyses, the window used to count the spikes was 200 ms centered on each lick (indicated by the black rectangle), which was a good tradeoff between including a significant number of spikes and mainly considering signals related to a single lick (since the average time between licks was around 150 ms; Fig. 2b & Extended Data Fig. 1d). Yet, a few spikes linked to the preceding or following events could still be included in the 200-ms window, making it more difficult to evaluate the contribution of momentary evidence. Therefore, we tested whether both DVs remained decodable in M2 even when we strictly excluded all spikes from neighboring events by using smaller analysis windows. We found that the decodability of the DVs in M2 did not depend on the size of the window for widths larger than 20 ms (one-way ANOVA followed by multiple pairwise comparison tests, all p-values > 0.05 for window sizes > 20 ms, both for ‘consecutive failures’ and ‘negative value’), indicating that the results are not overly sensitive to the choice of parameters.
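The windowed decoding in panel (d) boils down to counting spikes from every unit in a window centered on each lick and regressing the DV onto those counts. The sketch below illustrates this pipeline with cross-validated ridge regression; the paper used regularized GLMs, so the estimator and the in-sample score reported here are illustrative substitutes.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def decode_dv_from_m2(spike_times_per_unit, lick_times, dv, window_s=0.2):
    """Decode a DV from population spike counts around each lick (sketch).

    spike_times_per_unit : list of arrays of spike times (s), one per unit
    lick_times           : array of lick times (s)
    dv                   : DV value at each lick (e.g. consecutive failures)
    """
    half = window_s / 2.0
    X = np.stack([
        [np.sum((st >= t - half) & (st < t + half)) for t in lick_times]
        for st in spike_times_per_unit
    ], axis=1)                                    # licks x units count matrix
    model = RidgeCV().fit(X, dv)
    return model, model.score(X, dv)              # in-sample R^2, for illustration
```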
Extended Data Fig. 7 LM-HMM analysis of switch decision.
(a) To determine the number of states that best captures the decision-making of mice, we fit the LM-HMM with a varying number of states and then performed model comparison using cross-validation (see Methods for details). Training- and test-set maximum a posteriori (MAP) log probabilities (with a Gaussian prior on the weights and a Dirichlet prior on the transition probabilities) are reported in units of bits per bout (median ± MAD). The dashed rectangle highlights the log probability for the three-state model, which we used for all subsequent analyses. A single model was fit to all mice, where for each session the consecutive failures and prior rewards were min-max normalized (that is, divided by their maxima \(F_m^{max},\;R_m^{max}\)), yielding normalized weights \(w^{(k)}\) and biases \(b^{(k)}\). Single-session weights and biases were then obtained from these normalized parameters as \(w_m^{(k)} = w^{(k)}F_m^{max}/R_m^{max}\), \(b_m^{(k)} = b^{(k)}F_m^{max}\). (b) Weights \(w_m^{(k)}\) on total reward (left) and biases \(b_m^{(k)}\) (right) across sessions m (n = 11 sessions, median ± 25th and 75th percentiles; the whiskers extend to the most extreme data points) in the different states k = 1, 2, 3. (c) Consecutive failures before leaving as a function of total reward number across behavioral bouts (median ± MAD) in an example session from two different states (state 1, blue; state 2, pink). The slope coefficients of a linear regression model that predicted the number of consecutive failures before leaving as a function of the number of prior rewards in each state are shown on the right (n = 6 sessions for state 1, n = 7 sessions for state 2; median ± 25th and 75th percentiles across sessions, the whiskers extend to the most extreme data points). This result is consistent with the classification of stimulus-bound and inference-based strategies used in Fig. 1. (d) Posterior state probabilities for each recording session. Mice often started the session with the stimulus-bound strategy and later switched to an inference-based strategy (in 6 of 11 sessions).
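For concreteness, the conversion from normalized to per-session LM-HMM parameters described above is just the rescaling below, a direct transcription of the two formulas with F_max and R_max denoting the session maxima used for min-max scaling.

```python
import numpy as np

def session_parameters(w_norm, b_norm, f_max, r_max):
    """Recover per-session weights and biases from normalized LM-HMM fits.

    Implements w_m^(k) = w^(k) * F_max / R_max and b_m^(k) = b^(k) * F_max.
    Inputs may be scalars or arrays over states k.
    """
    w_session = np.asarray(w_norm) * f_max / r_max
    b_session = np.asarray(b_norm) * f_max
    return w_session, b_session
```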
Extended Data Fig. 8 M2 does not represent arbitrary sequences.
(a) A ‘near-universal’ representational capacity is a feature of a computational framework known as ‘reservoir computing’, which exploits a potential functional capacity of recurrent networks to represent combinations of current inputs with previous evidence, even arbitrary ones. Thus, to test whether M2 also represented arbitrary signals, we examined whether sequences with a temporal structure similar to that of the DVs but with no obvious relevance to the task could be decoded from M2. Shown are examples of random sequences (gray) generated from one of the DVs (pink, here consecutive failures). The DV can lead to a shifted version (top right), a flipped version (middle right) or a random signal with a matched power spectrum. Each random signal is then decoded from M2 population activity (black traces). (b) Deviance explained (ordinate) by M2 neurons from decoding the DVs shifted by a given number of licks (abscissa). On each box, the central mark indicates the median across recording sessions (n = 11 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. The dashed black line indicates chance level (Dev. Exp. = 0). Shifting the DVs by a delay greater than their temporal autocorrelation greatly impaired their decodability (one-way ANOVA, F = 62.81, p < 10^−26). (c) Same as in (b) but for DVs flipped across sessions. None of the flipped signals were decodable from M2 population activity. (d) Same as in (c) but for random signals with power spectra matched to each DV. None of the random signals were decodable from M2 population activity. (e) Since any signal can be approximated by sums of periodic functions (Fourier analysis), we also probed the capacity of M2 to represent arbitrary temporal sequences by testing whether we could decode from M2 a basis set of cosine functions with wavelengths in the dynamic range of what we observed with integration and reset of rewards (example top gray trace; wavelength = 4 licks, phase = 0 rad). Overall, the decoding quality of the periodic functions (example neural projection, top trace in black, Dev. Exp. = −0.002) was close to chance level (Dev. Exp. = 0.024 ± 0.028, median ± MAD), as seen in the matrix of deviance explained from decoding sequences with different wavelengths and phases from M2 population activity.
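One standard way to build a spectrum-matched random control like the one in panel (d) is Fourier phase randomization, sketched below: amplitudes are kept, phases are drawn at random. Whether the study used exactly this surrogate procedure is an assumption here; the sketch is illustrative only.

```python
import numpy as np

def spectrum_matched_control(dv, rng=None):
    """Random surrogate with the same power spectrum as a DV (sketch).

    Randomizes Fourier phases while preserving amplitudes; this is an assumed
    stand-in for the control-signal generation, not the published procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(np.asarray(dv, dtype=float))
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    phases[0] = 0.0                                 # keep the DC component real
    surrogate = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(surrogate, n=len(dv))
```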
Supplementary information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cazettes, F., Mazzucato, L., Murakami, M. et al. A reservoir of foraging decision variables in the mouse brain. Nat Neurosci 26, 840–849 (2023). https://doi.org/10.1038/s41593-023-01305-8
This article is cited by
- Physical reservoir computing with emerging electronics. Nature Electronics (2024)
- Parallel processing of alternative approaches. Nature Reviews Neuroscience (2023)
- The rat frontal orienting field dynamically encodes value for economic decisions under risk. Nature Neuroscience (2023)