Dorsal hippocampus contributes to model-based planning

Miller, Kevin J; Botvinick, Matthew M; Brody, Carlos D

doi:10.1038/nn.4613

Published: 31 July 2017

Nature Neuroscience volume 20, pages 1269–1276 (2017)

An Author Correction to this article was published on 17 November 2017

This article has been updated

Abstract

Planning can be defined as action selection that leverages an internal model of the outcomes likely to follow each possible action. Its neural mechanisms remain poorly understood. Here we adapt recent advances from human research for rats, presenting for the first time an animal task that produces many trials of planned behavior per session, making multitrial rodent experimental tools available to study planning. We use part of this toolkit to address a perennially controversial issue in planning: the role of the dorsal hippocampus. Although prospective hippocampal representations have been proposed to support planning, intact planning in animals with damaged hippocampi has been repeatedly observed. Combining formal algorithmic behavioral analysis with muscimol inactivation, we provide causal evidence directly linking dorsal hippocampus with planning behavior. Our results and methods open the door to new and more detailed investigations of the neural mechanisms of planning in the hippocampus and throughout the brain.

**Figure 1: Two-step decision task for rats.**

**Figure 2: Multitrial history regression analysis.**

**Figure 4: Effects of muscimol inactivation.**

**Figure 5: Effects of muscimol inactivation on mixture model fits.**

Change history

17 November 2017
In the version of this article initially published, the green label in Fig. 1c read "rightward choices" instead of "leftward choices." The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

We thank J. Erlich, C. Kopec, C.A. Duan, T. Hanks and A. Begelfer for training K.J.M. in the techniques necessary to carry out these experiments, as well as for comments and advice on the project. We thank N. Daw, I. Witten, Y. Niv, B. Wilson, T. Akam, A. Akrami and A. Solway for comments and advice on the project, and we thank J. Teran, K. Osorio, A. Sirko, R. LaTourette, L. Teachen and S. Stein for assistance in carrying out behavioral experiments. We especially thank T. Akam for suggestions on the physical layout of the behavior box and other experimental details. We thank A. Bornstein, B. Scott, A. Piet and L. Hunter for comments on the manuscript. K.J.M. was supported by training grant NIH T-32 MH065214 and by a Harold W. Dodds fellowship from Princeton University.

Author information

Authors and Affiliations

Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, USA
Kevin J Miller, Matthew M Botvinick & Carlos D Brody
Gatsby Computational Neuroscience Unit, University College London, London, UK
Matthew M Botvinick
Google DeepMind, London, UK
Matthew M Botvinick
Howard Hughes Medical Institute and Department of Molecular Biology, Princeton University, Princeton, New Jersey, USA
Carlos D Brody

Authors

Kevin J Miller
Matthew M Botvinick
Carlos D Brody

Contributions

K.J.M., M.M.B. and C.D.B. conceived the project. K.J.M. designed and carried out the experiments and the data analysis, with supervision from M.M.B. and C.D.B. K.J.M., M.M.B. and C.D.B. wrote the paper, starting from an initial draft by K.J.M.

Corresponding authors

Correspondence to Matthew M Botvinick or Carlos D Brody.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Reward rates of model-based and model-free agents.

Reward rates achieved by synthetic datasets generated by a hybrid model-based/model-free agent. Data were generated under the constraints α_plan = α_MF and β_plan + β_MF = 5, with λ, α_T, and all other β parameters set to zero. The highest reward rates are achieved by purely model-based agents, but the best purely model-free agents still outperformed the average rat, earning rewards on around 58% of trials (rats' reward rate: mean 56.8%, s.e.m. 0.4%, s.d. 2%).
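The constraints in this caption can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes a two-step task with 80%/20% common/uncommon transitions, outcome-port reward probabilities of 0.8 and 0.2 that swap with a 2% chance per trial, and a simplified hybrid agent in which a planning component and a TD(0) model-free component share one learning rate and are mixed by weights `beta_plan` and `beta_mf`. The function name, task parameters, and update rules are all illustrative assumptions.

```python
import math
import random


def simulate_hybrid_agent(beta_plan, beta_mf, alpha=0.5, n_trials=20000, seed=0):
    """Return the fraction of rewarded trials for a simplified hybrid agent.

    Assumed task (not taken from the source): choice a leads to outcome
    port a with probability 0.8 and to the other port otherwise; the two
    outcome ports pay off with probability 0.8 and 0.2, swapping at random.
    """
    rng = random.Random(seed)
    p_common = 0.8              # assumed common-transition probability
    flip_prob = 0.02            # assumed per-trial chance of a reward flip
    reward_prob = [0.8, 0.2]    # assumed reward probability per outcome port
    v_plan = [0.5, 0.5]         # planning agent: learned outcome-port values
    q_mf = [0.5, 0.5]           # model-free agent: choice-port values
    v_mf = [0.5, 0.5]           # model-free agent: outcome-port values
    n_rewards = 0

    for _ in range(n_trials):
        if rng.random() < flip_prob:
            reward_prob.reverse()

        # Planning: expected outcome value under the fixed transition model
        # (the transition model is never updated, mirroring alpha_T = 0).
        q_plan = [p_common * v_plan[a] + (1 - p_common) * v_plan[1 - a]
                  for a in (0, 1)]

        # Weighted mixture of the two agents, passed through a logistic choice rule.
        logit = (beta_plan * (q_plan[1] - q_plan[0])
                 + beta_mf * (q_mf[1] - q_mf[0]))
        choice = 1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0

        outcome = choice if rng.random() < p_common else 1 - choice
        reward = 1 if rng.random() < reward_prob[outcome] else 0
        n_rewards += reward

        # TD(0) (lambda = 0): the choice value bootstraps from the current
        # outcome-port value, which is then updated toward the reward.
        q_mf[choice] += alpha * (v_mf[outcome] - q_mf[choice])
        v_mf[outcome] += alpha * (reward - v_mf[outcome])
        v_plan[outcome] += alpha * (reward - v_plan[outcome])

    return n_rewards / n_trials


if __name__ == "__main__":
    # Sweep the mixture under the constraint beta_plan + beta_mf = 5.
    for beta_plan in (0.0, 2.5, 5.0):
        rate = simulate_hybrid_agent(beta_plan, 5.0 - beta_plan)
        print(f"beta_plan={beta_plan:.1f}  reward rate={rate:.3f}")
```

Under these assumed task parameters an agent choosing at random earns rewards on about half of trials, which gives a baseline for reading the sweep.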

Supplementary Figure 2 Results of one-trial-back analysis applied to the behavioral dataset.

Above: Average and standard error of the stay probability across rats. Below: Stay probability for each rat, with binomial 95% confidence intervals.
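A one-trial-back analysis of this kind can be sketched as follows. The caption does not say which binomial interval was used, so the Wilson score interval here is an assumption, as are the function names and the input encoding (parallel lists of choices, common-transition flags, and reward flags).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half, center + half)


def stay_probabilities(choices, common, rewarded):
    """Probability of repeating the previous choice, split by trial type.

    Trial t is classified by its transition (common/uncommon) and its
    outcome (reward/omission); a "stay" means choices[t + 1] == choices[t].
    Returns {(transition, outcome): (stay_probability, (ci_low, ci_high))}.
    """
    counts = {}
    for t in range(len(choices) - 1):
        key = ("common" if common[t] else "uncommon",
               "reward" if rewarded[t] else "omission")
        stays, total = counts.get(key, (0, 0))
        counts[key] = (stays + int(choices[t + 1] == choices[t]), total + 1)
    return {key: (stays / total, wilson_interval(stays, total))
            for key, (stays, total) in counts.items()}
```

In two-step tasks, a planning agent is typically expected to stay more after common-transition rewards and uncommon-transition omissions, whereas a purely model-free agent's tendency to stay depends on reward alone.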

Supplementary Figure 3 Results of logistic regression analysis applied to each rat, as well as simulated data generated from a fit of the mixture model to that rat’s dataset.

Rats are ordered by the quality of fit of the mixture model relative to the regression model: datasets from earlier rats are better explained by the mixture model, while datasets from later rats are better explained by the regression model.
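The predictor coding for the trial-history regression is not given on this page. As a sketch of one plausible coding, the function below builds a design matrix with four predictors per lag (common-reward, uncommon-reward, common-omission, uncommon-omission). Each predictor holds the signed past choice when that past trial was of the matching type and zero otherwise. The function name, the ±1 choice encoding, and the predictor order are assumptions.

```python
def history_design_matrix(choices, common, rewarded, n_back=5):
    """Build (X, y) for a logistic regression of choice on trial history.

    choices  : list of +1 / -1 (e.g. left / right)
    common   : list of bool, True if the trial had a common transition
    rewarded : list of bool, True if the trial was rewarded
    Each row of X has 4 * n_back entries, ordered lag 1 first; within a lag
    the order is common-reward, uncommon-reward, common-omission,
    uncommon-omission. y is 1 where the upcoming choice is +1, else 0.
    """
    trial_types = [(True, True), (False, True), (True, False), (False, False)]
    X, y = [], []
    for t in range(n_back, len(choices)):
        row = []
        for lag in range(1, n_back + 1):
            past = t - lag
            for is_common, is_rewarded in trial_types:
                matches = (common[past] == is_common
                           and rewarded[past] == is_rewarded)
                row.append(choices[past] if matches else 0)
        X.append(row)
        y.append(1 if choices[t] == 1 else 0)
    return X, y
```

The resulting `X` and `y` can be passed to any standard logistic regression routine; the fitted weights then describe how each past trial type, at each lag, pushes the animal toward repeating or switching its choice.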

Supplementary Figure 4 Movement times are faster following common transition trials.

Median movement time, in seconds, from the bottom center port to the reward port for common and uncommon transition trials, broken down by whether the movement was towards (right panel) or away from (left panel) the port with the higher reward probability.

Supplementary Figure 5 Placement of cannula in individual rats.

Purple points indicate OFC cannula tips, green points indicate PL cannula tips, and orange points indicate dH cannula tips.

Supplementary Figure 6 Results of logistic regression analysis applied to the inactivation dataset.

Above: Regression coefficients for the Saline, Control, dH, and OFC conditions. Points are averages across rats, and error bars are standard errors. Below: Differences between regression coefficients for different conditions.

Supplementary Figure 7 Results of logistic regression analysis applied to each rat in the inactivation dataset.

Note that rat #6 did not complete any saline sessions.

Supplementary Figure 8 Results of logistic regression analysis applied to simulated data generated by the reduced model fit to each rat in the inactivation dataset

Supplementary Figure 9 Rat performances compared between inactivation and control sessions.

Top: Fraction of times each rat selected the choice port with the greatest probability of leading to the reward port with the greatest probability of reward, for control vs. OFC sessions (Left) and for control vs. dH sessions (Right). Bottom: Fraction of times the better port was selected, as a function of the number of trials since the last reward probability flip.

Supplementary Figure 10 Results of one-trial-back analysis applied to the inactivation dataset.

Above: Average stay probability by trial type for the Control, dH, and OFC conditions. Bar height is the average across rats, and error bars are standard errors. Below: Differences between stay probabilities for the different conditions.

Supplementary Figure 11 Results of one-trial-back stay/switch analysis applied to each rat in the inactivation dataset

Supplementary Figure 12 Results of fitting the multiagent model jointly to the OFC inactivation and saline datasets.

Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only β_plan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on OFC (purple) and Saline (blue) sessions. Only β_plan is affected by inactivation in a way that is consistent across animals.

Supplementary Figure 13 Results of fitting the multiagent model jointly to the dH inactivation and saline datasets.

Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only β_plan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on dH (orange) and Saline (blue) sessions. Only β_plan is affected by inactivation in a way that is consistent across animals.

Supplementary Figure 14 Plots of posterior density projected onto planes defined by the parameter governing change in model-based weight and other population parameters for hippocampus (top, orange) and OFC (bottom, purple) inactivation datasets.

Supplementary Figure 15 Normalized cross-validated likelihood for logistic regression models (Online Methods), as a function of the number of previous trials used to predict the upcoming choice.

Including more than five previous trials in the model results in negligible improvements in quality of model fit.
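The exact normalization is defined in the Online Methods, which are not reproduced here. One common convention, assumed in the sketch below, is the geometric mean of the probabilities a model assigns to the held-out choices, so that a model guessing at chance on a binary choice scores 0.5. The function name is hypothetical.

```python
import math


def normalized_likelihood(held_out_probs):
    """Geometric mean of the per-trial probabilities of the observed choices.

    held_out_probs: probabilities (each in (0, 1]) that a fitted model
    assigned to the choices actually made on held-out trials.
    """
    if not held_out_probs:
        raise ValueError("need at least one held-out trial")
    total_log = sum(math.log(p) for p in held_out_probs)
    return math.exp(total_log / len(held_out_probs))
```

Computing this score for regression models with 1, 2, 3, ... previous trials and plotting it against the number of lags gives a curve of the kind the caption describes, flattening once extra lags stop improving prediction.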

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Discussion (PDF 4128 kb)

Life Sciences Reporting Summary (PDF 73 kb)

About this article

Cite this article

Miller, K., Botvinick, M. & Brody, C. Dorsal hippocampus contributes to model-based planning. Nat Neurosci 20, 1269–1276 (2017). https://doi.org/10.1038/nn.4613

Received: 28 December 2016
Accepted: 20 June 2017
Published: 31 July 2017
Issue Date: 01 September 2017
DOI: https://doi.org/10.1038/nn.4613
