A reservoir of foraging decision variables in the mouse brain

Abstract

In any given situation, the environment can be parsed in different ways to yield decision variables (DVs) defining strategies useful for different tasks. It is generally presumed that the brain only computes a single DV defining the current behavioral strategy. Here, to test this assumption, we recorded neural ensembles in the frontal cortex of mice performing a foraging task admitting multiple DVs. Methods developed to uncover the currently employed DV revealed the use of multiple strategies and occasional switches in strategy within sessions. Optogenetic manipulations showed that the secondary motor cortex (M2) is needed for mice to use the different DVs in the task. Surprisingly, we found that regardless of which DV best explained the current behavior, M2 activity concurrently encoded a full basis set of computations defining a reservoir of DVs appropriate for alternative tasks. This form of neural multiplexing may confer considerable advantages for learning and adaptive behavior.

Fig. 1: Multiple DVs predict foraging behavior.
Fig. 2: Neural activity related to the switching decision.
Fig. 3: Switching decision and running initiation are dissociable.
Fig. 4: M2 is involved in the switching decision.
Fig. 5: Neural representation of DVs.
Fig. 6: Independent representations of DVs.
Fig. 7: Simultaneous representations of DVs.
Fig. 8: M2 represents foraging algorithms.

Data availability

The preprocessed electrophysiological and behavioral data collected for this study are publicly available and can be accessed at: https://doi.org/10.6084/m9.figshare.20449089.

Raw electrophysiological data are too large to be shared on a publicly available repository and are therefore available from the authors upon reasonable request.

The Allen Mouse Brain Atlas used in this study is publicly available: https://alleninstitute.github.io/AllenSDK/reference_space.html.

Code availability

All analyses were performed using custom code written in MATLAB and available upon request.

The code used for the central GLM analyses is publicly available at: https://hastie.su.domains/glmnet_matlab/.

The code developed for the LM-HMM can be accessed at: https://github.com/mazzulab/ssm/blob/master/notebooks/2c%20Input-driven%20linear%20model%20(LM-HMM).ipynb.

Acknowledgements

We thank P. Vertechi for insightful discussions about the project and the model and D. Reato for support with analyses. We also thank M. Beckert for assistance with the illustrations. This work was supported by an EMBO long-term fellowship (F.C.; ALTF 461-2016), an AXA postdoctoral fellowship (F.C.), the National Institute of Neurological Disorders and Stroke grant R01-NS118461 (BRAIN Initiative, L.M.), the MEXT Grant-in-Aid for Scientific Research (19H05208, 19H05310 and 19K06882 (M.M.)), the Takeda Science Foundation (M.M.), Fundação para a Ciência e a Tecnologia (PTDC/MED_NEU/32068/2017, M.M., Z.F.M.; and LISBOA-01-0145-FEDER-032077, A.R.), the European Research Council Advanced Grant (671251, Z.F.M.), Simons Foundation (SCGB 543011, Z.F.M.) and Champalimaud Foundation (Z.F.M., A.R.). This work was also supported by Portuguese national funds, through FCT (Fundação para a Ciência e a Tecnologia), in the context of the project UIDB/04443/2020 and by the research infrastructure CONGENTO, cofinanced by Lisboa Regional Operational Programme (Lisboa2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF) and Fundação para a Ciência e Tecnologia (Portugal) under the projects LISBOA-01-0145-FEDER-02217 and LISBOA-01-0145-FEDER-022122.

Author information

Contributions

F.C. and Z.F.M. conceived the project. F.C. and M.M. designed and performed behavioral experiments. J.P.M. and E.A. helped with surgery and behavioral training. F.C. designed and performed electrophysiological experiments. F.C. curated the data. F.C. and A.R. designed and performed the analyses. L.M. designed the LM-HMM. F.C., A.R. and Z.F.M. wrote the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Fanny Cazettes, Alfonso Renart or Zachary F. Mainen.

Ethics declarations

Competing interests

All authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Alex Kwan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Task apparatus and behavioral properties.

(a) The behavioral apparatus consists of a treadmill coupled to two motors. Rotating the treadmill drives the movement of the arms via the motors in a closed-loop fashion. A mouse placed on the treadmill with its head fixed can lick at the spout from the arm in front. A camera placed on the side of the animal allows on-line video detection of the licks. (b) View from the lick detector camera. A region of interest is defined around the tongue of the animal. To detect the licks, a threshold is applied to the signal within the region of interest. (c) The task consists of behavioral bouts and traveling epochs. Within a behavioral bout, the outcomes of the licks are classified into three types: reward, failure and invalid. Rewards and failures occur when the mouse slows down its running speed below an arbitrary threshold after the ‘STOP event’. The ‘STOP event’ is signaled by an auditory tone when an arm comes into place. Any lick above the running threshold is considered invalid and is always unrewarded. The traveling epoch starts after the ‘LEAVE event’ when the mouse initiates the run. (d, e, f) The licking behavior of the animals is stereotyped. (d) Histogram of the time between consecutive licks. (e) Example lick rasters of consecutive failures (top) and consecutive rewards (bottom). Licks are aligned at the onset of a rewarded lick and sorted based on the following events. (f) The licking frequency that corresponds to the two different examples in (e) (series of consecutive rewards in green and series of consecutive failures in purple). (g, h, i, j) Time distributions of different behavioral events (mean ± s.e.m.; n = 21 mice). The time spent licking was much greater than the time to initiate licking (between STOP event and first lick) or the time to initiate running (between the last lick and LEAVE event). Notably, engaged mice took less than half a second after the last licks to leave the site in most bouts (median time to run = 0.46 s). The running time is comparable to the licking time. (k) Monotonic relationship between the number of consecutive failures after the last reward and the time licking after the last reward (each dot represents the mean across bouts for each session).

Extended Data Fig. 2 Ground truth model.

(a, b) The slope (a) and intercept (b) estimates as a function of the ground truth for simulated sessions where the number of bouts matched that of real sessions. The ground truth can be recovered (R² = 0.99 for the slope; R² = 0.91 for the intercept) from the logistic regression. (c, d) The slope (c) and intercept (d) estimates as a function of the ground truth for simulated sessions with varying number of bouts. Overall, the ground truth can be precisely recovered for sessions with more than 100 bouts. (e) Deviance explained from a logistic regression model that fits simulated sessions of an inference-based agent using the correct model (‘Consecutive failures’), a wrong but correlated model (‘Negative value’) and a random model (where both rewards and failures are arbitrarily accumulated or reset). The deviance explained by the consecutive failures represents the upper-bound of the model performance. The deviance explained by the consecutive failures being smaller than 1 indicates that, although the ground truth can be recovered, the switching decision is not deterministic and involves some stochasticity (here the variability was matched to that of the data). However, the deviance explained by the consecutive failures is significantly greater than the deviance explained by the correlated model and the random model (two-sided Wilcoxon signed rank test, 3 stars indicate p < 10⁻³, p = 0.00005 between Consec. fail. and Neg. value; p < 10⁻⁷ between Consec. fail. and Random). On each box the central mark indicates the median across simulated sessions (n = 42 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. (f) Illustration of a logistic regression model for predicting the switching decision of an inference-based simulated agent from the two different DVs (‘Consecutive failures’ and ‘Negative value’) simultaneously. (g) Deviance explained from the model in (f) as a function of the number of bouts in each session. (h) For all simulated sessions in (e), the variance explained by the ‘consecutive failures’ DV was greater than the variance explained by the ‘negative value’ DV, indicating that the model inferred the true DV.
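The ground-truth recovery described in panels (a–d) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the simulated agent, its reward probability, and the names `simulate_agent`, `fit_logistic` and `deviance_explained` are assumptions introduced here for illustration. The agent leaves with a probability that is a logistic function of its consecutive failures, and a logistic regression fitted by Newton's method is then used to check that the generating slope and intercept can be recovered. Deviance explained is computed with a common definition (one minus the ratio of model to null log-likelihood), which may differ in detail from the one used in the study.

```python
import numpy as np

def simulate_agent(n_licks, true_intercept, true_slope, p_reward=0.3, seed=0):
    """Hypothetical agent: P(leave) is logistic in the consecutive-failure count."""
    rng = np.random.default_rng(seed)
    dv, leave = [], []
    fails = 0
    for _ in range(n_licks):
        # A reward resets the count, a failure increments it.
        fails = 0 if rng.random() < p_reward else fails + 1
        p_leave = 1.0 / (1.0 + np.exp(-(true_intercept + true_slope * fails)))
        left = rng.random() < p_leave
        dv.append(fails)
        leave.append(float(left))
        if left:
            fails = 0  # a new bout starts from scratch
    return np.array(dv, dtype=float), np.array(leave)

def fit_logistic(x, y, n_iter=50):
    """Fit P(y=1) = sigmoid(intercept + slope * x) by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-9 * np.eye(2)
        beta = beta + np.linalg.solve(hess, grad)
    return beta  # [intercept, slope]

def deviance_explained(x, y, beta):
    """1 - loglik(model) / loglik(intercept-only model) for binary outcomes."""
    X = np.column_stack([np.ones_like(x), x])
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    p0 = np.clip(np.mean(y), 1e-12, 1 - 1e-12)
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    return 1.0 - ll_model / ll_null

x, y = simulate_agent(20000, true_intercept=-4.0, true_slope=1.5, seed=1)
beta_hat = fit_logistic(x, y)
print("intercept, slope:", beta_hat)
print("deviance explained:", deviance_explained(x, y, beta_hat))
```

Because the simulated decision is stochastic, the deviance explained stays below 1 even when the fitted model is the generating one, which mirrors the point made for panel (e).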

Extended Data Fig. 3 Testing alternative foraging strategies.

(a) Illustration of the logistic regression model for predicting the switching decision of mice using a combination of the two main DVs, ‘Consecutive failures’ and ‘Negative value’, as well as additional DVs. Specifically, we tested 3 classes of additional DVs: 1) those relying on absolute time, 2) those relying on average reward rates, and 3) those that weigh recent evidence more strongly. The design matrix of the model thus consisted of the two main DVs, the time of each lick relative to the first lick of each bout (class 1), the average reward rate over 1, 3 and 10 previous bouts (class 2) and a version of the negative value DV that weighs recent evidence more heavily than past evidence (class 3), such that \(x_{t+1} = (1 - \alpha)\,g(o_{t+1})\,x_t + \alpha\,c(o_{t+1})\), with \(\alpha = 0.8\). (b) Deviance explained from a logistic regression model that predicts choice behavior based only on the 2 main DVs (left) and from the full model that also includes the additional DVs in (a). The central mark indicates the median across behavioral sessions (n = 42 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. There was no significant difference between the deviance explained of the two models (two-sided Wilcoxon signed rank test: p = 0.22), indicating that the additional DVs do not improve the performance of the model. (c) Relative variance explained by each predictor of the full model for each behavioral session (n = 42 sessions across 21 mice, 2 sessions per mouse). The dominant DV (the one with the maximum relative variance explained) was most often the ‘Consecutive failures’ (18 sessions), followed by the ‘Negative value’ (17 sessions), and finally the additional DVs (2 sessions for the absolute time, 2 sessions for average reward rate, 3 sessions for the weighted negative value).
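The recursive update in (a) can be sketched in Python. The caption does not define g and c, so the lookup tables below are assumptions: they are chosen so that the unweighted update x ← g(o)·x + c(o) counts consecutive failures when a reward resets the variable (g = 0) and yields a running ‘negative value’ when a reward subtracts one (c = −1). The names `update_dv`, `run_dv`, `CONSEC_FAIL` and `NEG_VALUE` are likewise hypothetical.

```python
def update_dv(x, outcome, g, c, alpha=None):
    """One update step.

    alpha=None : x_{t+1} = g(o) * x_t + c(o)
    alpha given: x_{t+1} = (1 - alpha) * g(o) * x_t + alpha * c(o)
    """
    if alpha is None:
        return g[outcome] * x + c[outcome]
    return (1.0 - alpha) * g[outcome] * x + alpha * c[outcome]

# Assumed definitions of g and c (not given in the caption).
CONSEC_FAIL = {
    "g": {"reward": 0.0, "failure": 1.0},   # a reward resets the count
    "c": {"reward": 0.0, "failure": 1.0},   # a failure adds one
}
NEG_VALUE = {
    "g": {"reward": 1.0, "failure": 1.0},   # never reset
    "c": {"reward": -1.0, "failure": 1.0},  # rewards subtract, failures add
}

def run_dv(outcomes, spec, alpha=None):
    """Return the DV trace for a sequence of lick outcomes, starting at 0."""
    x, trace = 0.0, []
    for o in outcomes:
        x = update_dv(x, o, spec["g"], spec["c"], alpha)
        trace.append(x)
    return trace

outcomes = ["failure", "failure", "reward", "failure"]
print(run_dv(outcomes, CONSEC_FAIL))           # consecutive failures
print(run_dv(outcomes, NEG_VALUE))             # negative value
print(run_dv(outcomes, NEG_VALUE, alpha=0.8))  # recency-weighted variant
```

With α = 0.8 each new outcome carries four times the weight of the decayed history, so the weighted variable tracks the most recent licks far more closely than the unweighted one.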

Extended Data Fig. 4 Pipeline for extracellular electrophysiology, data processing and cluster mapping.

(a) Data collection from the Neuropixels probe. (b) Kilosort2 is used to automatically match spike templates to raw data. (c) Example of voltage data input to Kilosort2. Prior to the automatic sorting, the raw data are preprocessed with offset subtraction, median subtraction, and whitening steps. (d) Manual quality control is done on the outputs of Kilosort2 using PHY to remove units with nonphysiological waveforms (e), contaminated refractory periods, low amplitude (less than 50 µV) or low spike rates (less than 0.5 spikes·s⁻¹). (f) For further quality control, peri-event spike histograms (g, top; example histogram aligned to first lick) or scatter plots (g, bottom; example scatter plot aligned to first lick) of single neurons are visualized with a custom-written MATLAB script. (h, i) Example scatter plot of all neurons recorded simultaneously along the shank of the probe. This visualization helps delineate landmarks based on electrophysiological signatures to map cluster locations. (j, k, l) Landmarks derived from electrophysiological responses are validated with estimates from histology using open-source software (SHARP-Track).
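The two numerical criteria in (d) can be written as a simple filter. The sketch below is only an illustration in Python: the `Unit` container and `passes_qc` function are hypothetical, and the waveform-shape and refractory-period checks, which the caption describes as manual steps, are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    unit_id: int
    amplitude_uv: float  # spike amplitude in microvolts
    n_spikes: int        # spikes over the whole recording

def passes_qc(unit, duration_s, min_amp_uv=50.0, min_rate_hz=0.5):
    """Keep units at or above 50 uV and 0.5 spikes per second (thresholds from the caption)."""
    rate_hz = unit.n_spikes / duration_s
    return unit.amplitude_uv >= min_amp_uv and rate_hz >= min_rate_hz

units = [Unit(1, 80.0, 1200), Unit(2, 40.0, 1200), Unit(3, 80.0, 200)]
kept = [u.unit_id for u in units if passes_qc(u, duration_s=600.0)]
print(kept)
```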

Extended Data Fig. 5 Optogenetic effect on action timing.

(a) Illustration of the different action timings during a behavioral bout. (b) We used generalized linear mixed effect models to evaluate the effect of stimulation (‘Laser’ predictor) on each action timing (see Methods). The models were fit separately for inactivated and control mice (number of observations: Inactivated = 68; Control = 20). (c–f) Median timing across bouts in Laser OFF vs. Laser ON condition for each session (dots) of inactivated mice (violet) and control mice (red). The p-value corresponding to the t-statistic for a two-sided null hypothesis test that the coefficient of the ‘Laser’ predictor is equal to 0 (plaser) is reported for each group of mice (color coded). (c) Fixed effect of stimulation (‘Laser’ predictor) on the inter-lick interval: Inactivated: −0.003 ± 0.0009, p = 0.001; Control: 0.005 ± 0.004, p = 0.24. (d) Fixed effect of stimulation (‘Laser’ predictor) on the time licking: Inactivated: 0.45 ± 0.26, p = 0.08; Control: −0.078 ± 0.22, p = 0.72. (e) Fixed effect of stimulation (‘Laser’ predictor) on the time to run: Inactivated: −0.075 ± 0.25, p = 0.76; Control: 0.014 ± 0.14, p = 0.92. (f) Fixed effect of stimulation (‘Laser’ predictor) on the time running: Inactivated: −0.079 ± 0.063, p = 0.22; Control: −0.061 ± 0.052, p = 0.28.

Extended Data Fig. 6 Properties of decision variables in M2.

(a) Illustration of a model to estimate the time constant of the reset at the end of the bout from M2 neurons. Example consecutive failures (pink) and neural projections (black right) of the neural activity (left, example neural traces) including the activity during 2 s after the end of each bout (dashed line). The projection of the neural activity on the decoding weights for the consecutive failure slowly ramps down until the beginning of the next bout. (b) To quantify the time constant of the reset at the end of the bout, the consecutive failures with an additional reset at the end of the bout were decoded from the neural activity. We considered the decoding projection at different times after the end of the last lick of bout ‘n’ and before the start of bout ‘n + 1’ and plotted the difference between the number of the consecutive failures (dashed pink) and the neural projection (dashed black) at the end of each bout across recording sessions (median ± MAD; n = 11) as a function of the time after the last lick. The neural activity can reset at the end of the bouts with a time constant of around 200 ms. (c) Deviance explained across sessions (n = 11 sessions, median ± 25th and 75th percentiles, the whiskers extend to the most extreme data points) predicted from M2 neurons for ‘Consecutive failures’ (left) and ‘Negative value’ (right) on ipsilateral vs. contralateral bouts. If the recording is performed in the right hemisphere, ipsilateral bouts are those when mice exploit the right foraging site (the right motorized arm), while contralateral bouts are those when mice exploit the left foraging site (and vice versa for recordings in the left hemisphere). We observed no significant differences in the model performance as a function of the side of the DVs (two-sided Wilcoxon signed rank test; p > 0.05). 
(d) This panel shows the deviance explained across sessions (n = 11 sessions, median ± MAD) for DVs (Pink: ‘consecutive failures’; Blue: ‘negative value’) as a function of window sizes. In all previous analyses, the window used to count the spikes was 200 ms centered around each lick (indicated by the black rectangle), which was a good tradeoff for including a significant number of spikes while mainly considering signals related to a single lick (since the average time between each lick was around 150 ms; Fig. 2b & Extended Data Fig. 1d). Yet, a few spikes linked to the preceding or the following events could still be included in the 200 ms window, making it more difficult to evaluate the contribution of momentary evidence. Therefore, we tested whether both DVs remained decodable in M2 even when we strictly excluded all spikes from neighboring events by using smaller analysis windows. We found that the decodability of the DVs in M2 did not depend on the size of the window for widths larger than 20 ms (one-way ANOVA followed by multiple pairwise comparison tests, all p-values > 0.05 for windows size > 20 ms, both for ‘consecutive failures’ and ‘negative value’), indicating that the results are not overly sensitive to the choice of parameters.
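The window analysis in (d) rests on counting spikes in a window centered on each lick. A minimal Python sketch of that counting step follows; the function name `count_spikes_around_events` and the toy spike and lick times are assumptions, and the decoding step itself is not shown.

```python
import numpy as np

def count_spikes_around_events(spike_times, event_times, window_s=0.2):
    """Count spikes in [t - window/2, t + window/2] for each event time t (seconds)."""
    spike_times = np.sort(np.asarray(spike_times, dtype=float))
    event_times = np.asarray(event_times, dtype=float)
    half = window_s / 2.0
    # Binary search gives the index range of spikes falling inside each window.
    lo = np.searchsorted(spike_times, event_times - half, side="left")
    hi = np.searchsorted(spike_times, event_times + half, side="right")
    return hi - lo

spikes = [0.05, 0.10, 0.19, 0.305, 0.50]  # toy spike times
licks = [0.10, 0.30]                      # toy lick times
print(count_spikes_around_events(spikes, licks, window_s=0.2))   # 200 ms window
print(count_spikes_around_events(spikes, licks, window_s=0.02))  # 20 ms window
```

Shrinking the window excludes spikes that belong to neighboring licks, at the cost of fewer spikes per count, which is the tradeoff the caption describes.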

Extended Data Fig. 7 LM-HMM analysis of switch decision.

(a) To determine the number of states that best capture the decision-making of mice, we fit the LM-HMM with a varying number of states and then performed model comparison using cross-validation (see Methods for details). Training and test sets maximum a posteriori (MAP, with Gaussian prior on the weights and Dirichlet prior on transition probabilities) are reported in units of bits per bout (median ± MAD). The dashed rectangle highlights the log probability for the three-state model, which we used for all subsequent analyses. A single model was fit to all mice, where for each session the consecutive failures and prior rewards were normalized by their maxima (that is, divided by \(F_m^{max},\;R_m^{max}\)), obtaining normalized weights \(w^{(k)}\) and biases \(b^{(k)}\). Single-session weights and biases were then obtained from these normalized parameters as \(w_m^{(k)} = w^{(k)}F_m^{max}/R_m^{max}\), \(b_m^{(k)} = b^{(k)}F_m^{max}\). (b) Weights \(w_m^{(k)}\) on total reward (left) and biases \(b_m^{(k)}\) (right) across sessions m (n = 11 sessions, median ± 25th and 75th percentiles, the whiskers extend to the most extreme data points) in the different states k = 1, 2, 3. (c) Consecutive failures before leaving as a function of total reward number across behavioral bouts (median ± MAD) in an example session from two different states (state 1, blue; state 2, pink). The slope coefficients of a linear regression model that predicted the number of consecutive failures before leaving as a function of the number of prior rewards in each state are shown on the right (n = 6 sessions for state 1, n = 7 sessions for state 2, median ± 25th and 75th percentiles across sessions, the whiskers extend to the most extreme data points). This result is consistent with the classification of stimulus-bound and inference-based strategies used in Fig. 1. (d) Posterior state probabilities for each recording session. Mice often start off the session with the stimulus-bound strategy and later switch to the inference-based strategies (in 6 out of 11 sessions).
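The rescaling in (a) follows from a short argument, sketched here under the assumption that each state's linear model predicts normalized consecutive failures from normalized prior rewards, \(F/F_m^{max} = w^{(k)}\,R/R_m^{max} + b^{(k)}\). Multiplying both sides by \(F_m^{max}\) gives \(F = (w^{(k)}F_m^{max}/R_m^{max})\,R + b^{(k)}F_m^{max}\), which matches the stated session-level weight and bias. The Python snippet below checks this numerically; the function names and the numbers are hypothetical.

```python
def denormalize(w_norm, b_norm, f_max, r_max):
    """Map normalized weight and bias back to session units."""
    w_session = w_norm * f_max / r_max
    b_session = b_norm * f_max
    return w_session, b_session

def predict_normalized(rewards, w_norm, b_norm, f_max, r_max):
    """Prediction made in normalized units, then scaled back to failure counts."""
    return f_max * (w_norm * rewards / r_max + b_norm)

def predict_session(rewards, w_session, b_session):
    """Prediction made directly in session units."""
    return w_session * rewards + b_session

w_m, b_m = denormalize(w_norm=0.5, b_norm=0.2, f_max=8.0, r_max=4.0)
for r in (0.0, 1.0, 3.0):
    a = predict_normalized(r, 0.5, 0.2, 8.0, 4.0)
    b = predict_session(r, w_m, b_m)
    print(r, a, b)  # the two predictions should agree
```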

Extended Data Fig. 8 M2 does not represent arbitrary sequences.

(a) A ‘near universal’ representational capacity is a feature of a computational framework known as ‘reservoir computing’ that exploits a potential functional capacity of recurrent networks to represent combinations of current inputs with previous evidence, even arbitrary ones. Thus, to test whether M2 also represented arbitrary signals, we examined whether sequences with similar temporal structure as the DVs but with no obvious relevance to the task could be decoded from M2. Here are examples of random sequences (gray) generated from one of the DVs (pink, here consecutive failures). From the DV, we generated a shifted version (top right), a flipped version (middle right) or a random signal with an equal power spectrum. Each random signal is then decoded from M2 population activity (black traces). (b) Deviance explained (ordinate) by M2 neurons from decoding the DVs shifted by a given number of licks (abscissa). On each box, the central mark indicates the median across recording sessions (n = 11 sessions), and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points. The dashed black line indicates chance level (Dev. Exp. = 0). Shifting the DVs by a delay greater than their temporal autocorrelation greatly impaired their decodability (one-way ANOVA, F = 62.81, p < 10⁻²⁶). (c) Same as in (b) but for DVs flipped across sessions. None of the flipped signals were decodable from M2 population activity. (d) Same as in (c) but for random signals with power spectra that match each DV. None of the random signals were decodable from M2 population activity. (e) Since any signal can be approximated by sums of periodic functions (Fourier analysis), we also probed the capacity of M2 to represent arbitrary temporal sequences by testing whether we could decode from M2 a basis set of cosine functions with wavelengths in the dynamic range of what we observed with integration and reset of rewards (example top gray trace; wavelength = 4 licks, phase = 0 rad). Overall, the decoding quality of the periodic function (example neural projection, top trace in black, Dev. Exp. = −0.002) was close to chance level (Dev. Exp. = 0.024 ± 0.028, median ± MAD) as seen in the matrix of deviance explained from decoding sequences with different wavelengths and phases with M2 population activity.
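Control sequences of the kind described in (a), (d) and (e) can be generated as in the following Python sketch. It is an illustration under stated assumptions rather than the authors' procedure: the shift is implemented as a circular shift, the matched-spectrum surrogate uses Fourier phase randomization (one standard way to preserve a power spectrum), and the function names and the toy DV are hypothetical.

```python
import numpy as np

def shifted(signal, n_licks):
    """Circularly shift a DV trace by n_licks samples (assumed form of the shift)."""
    return np.roll(np.asarray(signal), n_licks)

def phase_randomized(signal, seed=0):
    """Surrogate with the same power spectrum but randomized Fourier phases."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0            # leave the mean (DC bin) untouched
    if signal.size % 2 == 0:
        phases[-1] = 0.0       # the Nyquist bin must stay real
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=signal.size)

def cosine_sequence(n_licks, wavelength, phase=0.0):
    """Cosine indexed by lick number, with wavelength in licks and phase in radians."""
    t = np.arange(n_licks)
    return np.cos(2.0 * np.pi * t / wavelength + phase)

rng = np.random.default_rng(0)
dv = np.cumsum(rng.normal(size=64))  # toy DV-like trace
surrogate = phase_randomized(dv, seed=1)
same_power = np.allclose(np.abs(np.fft.rfft(dv)), np.abs(np.fft.rfft(surrogate)))
print("power spectrum preserved:", same_power)
print(cosine_sequence(4, wavelength=4))
```

Each surrogate keeps the temporal statistics of the DV while breaking its relationship to the actual sequence of rewards and failures, which is why failure to decode it argues against a representation of arbitrary sequences.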

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cazettes, F., Mazzucato, L., Murakami, M. et al. A reservoir of foraging decision variables in the mouse brain. Nat Neurosci 26, 840–849 (2023). https://doi.org/10.1038/s41593-023-01305-8

  • DOI: https://doi.org/10.1038/s41593-023-01305-8
