Planning can be defined as action selection that leverages an internal model of the outcomes likely to follow each possible action. Its neural mechanisms remain poorly understood. Here we adapt recent advances from human research for rats, presenting for the first time an animal task that produces many trials of planned behavior per session, making multitrial rodent experimental tools available to study planning. We use part of this toolkit to address a perennially controversial issue in planning: the role of the dorsal hippocampus. Although prospective hippocampal representations have been proposed to support planning, intact planning in animals with damaged hippocampi has been repeatedly observed. Combining formal algorithmic behavioral analysis with muscimol inactivation, we provide causal evidence directly linking dorsal hippocampus with planning behavior. Our results and methods open the door to new and more detailed investigations of the neural mechanisms of planning in the hippocampus and throughout the brain.
We thank J. Erlich, C. Kopec, C.A. Duan, T. Hanks and A. Begelfer for training K.J.M. in the techniques necessary to carry out these experiments, as well as for comments and advice on the project. We thank N. Daw, I. Witten, Y. Niv, B. Wilson, T. Akam, A. Akrami and A. Solway for comments and advice on the project, and we thank J. Teran, K. Osorio, A. Sirko, R. LaTourette, L. Teachen and S. Stein for assistance in carrying out behavioral experiments. We especially thank T. Akam for suggestions on the physical layout of the behavior box and other experimental details. We thank A. Bornstein, B. Scott, A. Piet and L. Hunter for comments on the manuscript. K.J.M. was supported by training grant NIH T-32 MH065214 and by a Harold W. Dodds fellowship from Princeton University.
The authors declare no competing financial interests.
Integrated supplementary information
Reward rates achieved by synthetic datasets generated by a hybrid model-based/model-free agent. Data were generated under the constraints αplan = αMF and βplan + βMF = 5, with λ, αT, and all other β parameters set to zero. The highest reward rates were achieved by purely model-based agents, but the best purely model-free agents still outperformed the average rat, earning rewards on around 58% of trials (rats' reward rate: mean 56.8%, s.e.m. 0.4%, s.d. 2%).
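The constraint on the two inverse temperatures in the caption above can be illustrated with a minimal Python sketch. It assumes, as is common for mixture-of-agents models but is not stated in the caption, that the two agents' action values are combined as a weighted sum and passed through a softmax. The function names and the example values are hypothetical.

```python
import math

def beta_sweep(total=5.0, steps=6):
    """Return (beta_plan, beta_mf) pairs that satisfy beta_plan + beta_mf = total.

    The first pair is purely model-free, the last purely model-based.
    """
    pairs = []
    for i in range(steps):
        beta_plan = total * i / (steps - 1)
        pairs.append((beta_plan, total - beta_plan))
    return pairs

def choice_probability(q_plan, q_mf, beta_plan, beta_mf):
    """Probability of choosing action 0 in a two-action softmax.

    ASSUMPTION: net value = beta_plan * Q_plan + beta_mf * Q_MF.
    q_plan and q_mf are (value of action 0, value of action 1).
    """
    net0 = beta_plan * q_plan[0] + beta_mf * q_mf[0]
    net1 = beta_plan * q_plan[1] + beta_mf * q_mf[1]
    # Two-action softmax reduces to a logistic function of the difference.
    return 1.0 / (1.0 + math.exp(-(net0 - net1)))

if __name__ == "__main__":
    # Hypothetical values: the planner prefers action 0, the
    # model-free agent is indifferent.
    for beta_plan, beta_mf in beta_sweep():
        p = choice_probability((0.8, 0.2), (0.5, 0.5), beta_plan, beta_mf)
        print(f"beta_plan={beta_plan:.1f} beta_mf={beta_mf:.1f} P(action 0)={p:.3f}")
```

Sweeping along this line keeps the overall degree of choice determinism fixed while shifting control between the two agents, which is one way to compare reward rates across otherwise matched agents.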
Above: Average and standard error of the stay probability across rats. Below: Stay probability for each rat, with binomial 95% confidence intervals.
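A one-trial-back stay probability of the kind plotted here could be computed roughly as follows. This is a sketch under stated assumptions: trials are grouped by whether the previous transition was common and whether the previous trial was rewarded, and the 95% binomial interval uses the Wilson score method. The caption does not say which interval method was used, and the function and argument names are hypothetical.

```python
import math
from collections import defaultdict

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% binomial confidence interval (Wilson score).

    ASSUMPTION: the interval method is our choice, not the source's.
    """
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (center - half, center + half)

def stay_probabilities(choices, common, rewarded):
    """Stay probability grouped by the previous trial's type.

    choices[t]  : port chosen on trial t (any hashable label)
    common[t]   : True if trial t had a common transition
    rewarded[t] : True if trial t was rewarded
    Returns {(common, rewarded): (p_stay, (ci_low, ci_high))}.
    """
    counts = defaultdict(lambda: [0, 0])  # [stays, total]
    for t in range(1, len(choices)):
        key = (bool(common[t - 1]), bool(rewarded[t - 1]))
        counts[key][1] += 1
        if choices[t] == choices[t - 1]:
            counts[key][0] += 1
    return {k: (s / n, wilson_interval(s, n)) for k, (s, n) in counts.items()}
```

Applied to one rat's session data, the result holds one entry per trial type observed; averaging the point estimates across rats would give the kind of summary shown in the upper panel.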
Supplementary Figure 3 Results of logistic regression analysis applied to each rat, as well as simulated data generated from a fit of the mixture model to that rat’s dataset.
Rats are ordered by the relative quality of fit of the mixture model with respect to the regression model: earlier rats' datasets are better explained by the mixture model than by the regression model, while later rats' datasets are better explained by the regression model.
Median movement time, in seconds, from the bottom center port to the reward port for common and uncommon transition trials, broken down by whether the movement was towards (right panel) or away from (left panel) the port with the higher reward probability.
Purple points indicate OFC cannula tips, green points indicate PL cannula tips, and orange points indicate dH cannula tips.
Above: Regression coefficients for the Saline, Control, dH, and OFC conditions. Points are averages across rats, and error bars are standard errors. Below: Differences between regression coefficients for different conditions.
Supplementary Figure 7 Results of logistic regression analysis applied to each rat in the inactivation dataset.
Note that rat #6 did not complete any saline sessions.
Supplementary Figure 8 Results of logistic regression analysis applied to simulated data generated by the reduced model fit to each rat in the inactivation dataset
Top: Fraction of times each rat selected the choice port with the greatest probability of leading to the reward port with the greatest probability of reward, for control vs. OFC sessions (Left) and for control vs. dH sessions (Right). Bottom: Fraction of times the better port was selected, as a function of the number of trials since the last reward probability flip.
Above: Average stay probability by trial type for the Control, dH, and OFC conditions. Bar height is the average across rats, and error bars are standard errors. Below: Differences between stay probabilities for the different conditions.
Supplementary Figure 11 Results of one-trial-back stay/switch analysis applied to each rat in the inactivation dataset
Supplementary Figure 12 Results of fitting the multiagent model jointly to the OFC inactivation and saline datasets.
Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on OFC (purple) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.
Supplementary Figure 13 Results of fitting the multiagent model jointly to the dH inactivation and saline datasets.
Top Row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on dH (orange) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.
Supplementary Figure 14 Plots of posterior density projected onto planes defined by the parameter governing change in model-based weight and other population parameters for hippocampus (top, orange) and OFC (bottom, purple) inactivation datasets.
Supplementary Figure 15 Normalized cross-validated likelihood for logistic regression models (Online Methods), as a function of the number of previous trials used to predict the upcoming choice.
Including more than five previous trials in the model results in negligible improvements in quality of model fit.
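The comparison across numbers of previous trials can be sketched as below. Several things here are assumptions rather than the authors' procedure: each lag contributes four regressors (common or uncommon transition crossed with reward or omission, signed by the choice made), the fit uses a small ridge penalty with Newton's method, folds are assigned by interleaving trials, and "normalized likelihood" is taken to mean the exponential of the mean held-out log-likelihood per trial. All function names are hypothetical.

```python
import numpy as np

def lagged_design(choices, common, rewarded, n_lags):
    """Build lagged regressors for predicting the upcoming choice.

    choices  : array of +1 / -1 (e.g. left / right)
    common   : boolean array, True for common transitions
    rewarded : boolean array, True for rewarded trials
    Returns X with shape (T - n_lags, 4 * n_lags) and y in {0, 1}.
    """
    choices = np.asarray(choices, dtype=float)
    common = np.asarray(common, dtype=bool)
    rewarded = np.asarray(rewarded, dtype=bool)
    types = [common & rewarded, common & ~rewarded,
             ~common & rewarded, ~common & ~rewarded]
    # One signed column per trial type: +1/-1 for the choice, 0 otherwise.
    signed = np.stack([choices * m for m in types], axis=1)
    rows = [np.concatenate([signed[t - lag] for lag in range(1, n_lags + 1)])
            for t in range(n_lags, len(choices))]
    return np.array(rows), (choices[n_lags:] > 0).astype(float)

def fit_logistic(X, y, l2=1.0, n_iter=50):
    """Ridge-penalized logistic regression fitted by Newton's method."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (y - p) - l2 * w
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + l2 * np.eye(len(w))
        w = w + np.linalg.solve(hess, grad)
    return w

def normalized_cv_likelihood(X, y, n_folds=5, l2=1.0):
    """exp(mean held-out log-likelihood per trial); 0.5 is chance."""
    folds = np.arange(len(y)) % n_folds  # interleaved fold assignment
    total = 0.0
    for k in range(n_folds):
        train, test = folds != k, folds == k
        w = fit_logistic(X[train], y[train], l2)
        Xb = np.hstack([np.ones((int(test.sum()), 1)), X[test]])
        p = np.clip(1.0 / (1.0 + np.exp(-Xb @ w)), 1e-12, 1 - 1e-12)
        total += np.sum(y[test] * np.log(p) + (1 - y[test]) * np.log(1 - p))
    return float(np.exp(total / len(y)))
```

Calling `normalized_cv_likelihood(*lagged_design(choices, common, rewarded, n))` for n = 1, 2, 3 and so on would trace out a curve like the one described. For a strict comparison the set of predicted trials should be held fixed across values of n, which this sketch does not do.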
About this article
Cite this article
Miller, K., Botvinick, M. & Brody, C. Dorsal hippocampus contributes to model-based planning. Nat Neurosci 20, 1269–1276 (2017). https://doi.org/10.1038/nn.4613