Dorsal hippocampus contributes to model-based planning

Abstract

Planning can be defined as action selection that leverages an internal model of the outcomes likely to follow each possible action. Its neural mechanisms remain poorly understood. Here we adapt recent advances from human research for rats, presenting for the first time an animal task that produces many trials of planned behavior per session, making multitrial rodent experimental tools available to study planning. We use part of this toolkit to address a perennially controversial issue in planning: the role of the dorsal hippocampus. Although prospective hippocampal representations have been proposed to support planning, intact planning in animals with damaged hippocampi has been repeatedly observed. Combining formal algorithmic behavioral analysis with muscimol inactivation, we provide causal evidence directly linking dorsal hippocampus with planning behavior. Our results and methods open the door to new and more detailed investigations of the neural mechanisms of planning in the hippocampus and throughout the brain.


Figure 1: Two-step decision task for rats.
Figure 2: Multitrial history regression analysis.
Figure 3: Model-fitting analysis.
Figure 4: Effects of muscimol inactivation.
Figure 5: Effects of muscimol inactivation on mixture model fits.

Change history

  • 17 November 2017

    In the version of this article initially published, the green label in Fig. 1c read "rightward choices" instead of "leftward choices." The error has been corrected in the HTML and PDF versions of the article.


Acknowledgements

We thank J. Erlich, C. Kopec, C.A. Duan, T. Hanks and A. Begelfer for training K.J.M. in the techniques necessary to carry out these experiments, as well as for comments and advice on the project. We thank N. Daw, I. Witten, Y. Niv, B. Wilson, T. Akam, A. Akrami and A. Solway for comments and advice on the project, and we thank J. Teran, K. Osorio, A. Sirko, R. LaTourette, L. Teachen and S. Stein for assistance in carrying out behavioral experiments. We especially thank T. Akam for suggestions on the physical layout of the behavior box and other experimental details. We thank A. Bornstein, B. Scott, A. Piet and L. Hunter for comments on the manuscript. K.J.M. was supported by training grant NIH T-32 MH065214 and by a Harold W. Dodds fellowship from Princeton University.

Author information

Contributions

K.J.M., M.M.B. and C.D.B. conceived the project. K.J.M. designed and carried out the experiments and the data analysis, with supervision from M.M.B. and C.D.B. K.J.M., M.M.B. and C.D.B. wrote the paper, starting from an initial draft by K.J.M.

Corresponding authors

Correspondence to Matthew M Botvinick or Carlos D Brody.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Reward rates of model-based and model-free agents.

Reward rates achieved by synthetic datasets generated by a hybrid model-based/model-free agent. Data were generated under the constraints αplan = αMF and βplan + βMF = 5, with λ, αT, and all other betas set to zero. The highest reward rates are achieved by purely model-based agents, but even the best purely model-free agents outperformed the average rat, earning rewards on around 58% of trials (rats' reward rate: mean 56.8%, s.e.m. 0.4%, s.d. 2%).
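The hybrid agent behind these synthetic datasets chooses by a softmax over a weighted sum of planning and model-free values. The following is a minimal sketch of such a simulation in Python under loudly stated assumptions: the two-choice/two-outcome layout, the common-transition probability `p_common`, the reward probabilities `p_high` and their flip rate `p_flip`, the initial values of 0.5, and the function name `simulate_hybrid_agent` are hypothetical illustrations, not the task parameters or code used in the study. Following the caption, learning rates are shared (αplan = αMF), λ = 0, the transition model is fixed (αT = 0), and no other choice terms are included.

```python
import numpy as np


def simulate_hybrid_agent(beta_plan, beta_mf, alpha=0.5, n_trials=5000,
                          p_common=0.8, p_high=0.8, p_flip=0.02, seed=0):
    """Return the fraction of rewarded trials for a hypothetical hybrid agent.

    Task parameters (p_common, p_high, p_flip) are illustrative assumptions.
    Choice port a commonly leads to outcome port a and rarely to the other one.
    """
    rng = np.random.default_rng(seed)
    reward_probs = np.array([p_high, 1.0 - p_high])
    # With alpha_plan = alpha_MF, the planner's reward estimates and the
    # model-free outcome-port values follow identical updates, so one array serves both.
    port_value = np.full(2, 0.5)
    q_mf = np.full(2, 0.5)  # model-free values of the two choice ports
    n_rewards = 0.0
    for _ in range(n_trials):
        # Planner: expected reward through the fixed (alpha_T = 0) transition model.
        q_plan = p_common * port_value + (1.0 - p_common) * port_value[::-1]
        logits = beta_plan * q_plan + beta_mf * q_mf
        p_choice = np.exp(logits - logits.max())
        p_choice /= p_choice.sum()
        choice = rng.choice(2, p=p_choice)
        outcome = choice if rng.random() < p_common else 1 - choice
        reward = float(rng.random() < reward_probs[outcome])
        n_rewards += reward
        # TD(0) (lambda = 0): the choice value moves toward the outcome port's value...
        q_mf[choice] += alpha * (port_value[outcome] - q_mf[choice])
        # ...and the outcome port's value moves toward the obtained reward.
        port_value[outcome] += alpha * (reward - port_value[outcome])
        # The reward probabilities of the two outcome ports occasionally swap.
        if rng.random() < p_flip:
            reward_probs = reward_probs[::-1]
    return n_rewards / n_trials
```

Sweeping `beta_plan` from 0 to 5 with `beta_mf = 5 - beta_plan` traces the model-based/model-free mixture along the constraint stated in the caption.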

Supplementary Figure 2 Results of one-trial-back analysis applied to the behavioral dataset.

Above: Average and standard error of the stay probability across rats. Below: Stay probability for each rat, with binomial 95% confidence intervals.
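A one-trial-back analysis conditions the probability of repeating a choice on the previous trial's transition type (common or uncommon) and outcome (reward or omission). The sketch below assumes per-trial arrays of choices, transition types, and outcomes; the function names are hypothetical, and it uses the Wilson score interval as one common choice of binomial 95% confidence interval, since the caption does not say which interval was used.

```python
import math

import numpy as np


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for k successes in n binomial trials."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)


def stay_probabilities(choices, common, rewarded):
    """Stay probability (with CI) after each (transition, outcome) trial type."""
    choices = np.asarray(choices)
    common = np.asarray(common, dtype=bool)
    rewarded = np.asarray(rewarded, dtype=bool)
    stay = choices[1:] == choices[:-1]  # did trial t+1 repeat the choice on trial t?
    prev_common, prev_rewarded = common[:-1], rewarded[:-1]
    result = {}
    for t_label, c in (("common", True), ("uncommon", False)):
        for o_label, r in (("rewarded", True), ("omission", False)):
            mask = (prev_common == c) & (prev_rewarded == r)
            n = int(mask.sum())
            k = int(stay[mask].sum())
            p = k / n if n else float("nan")
            result[(t_label, o_label)] = (p, wilson_interval(k, n))
    return result
```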

Supplementary Figure 3 Results of logistic regression analysis applied to each rat, as well as simulated data generated from a fit of the mixture model to that rat’s dataset.

Rats are ordered by how well the mixture model fits their data relative to the regression model: earlier rats' datasets are better explained by the mixture model than by the regression model, while later rats' datasets are better explained by the regression model.
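The logistic regression model referred to here predicts each choice from the history of previous trials. A minimal sketch follows, assuming one regressor per (transition, outcome) trial type and lag, coded +1 if that past trial's choice was left (`choices == 1`), -1 if right, and 0 if the past trial was of a different type. The coding, the five-lag default (chosen to match Supplementary Figure 15), the small ridge penalty, and the names `history_design_matrix` and `fit_logistic` are illustrative assumptions, not the exact published specification.

```python
import numpy as np

# (common transition?, rewarded?) for the four trial types
TRIAL_TYPES = [(True, True), (True, False), (False, True), (False, False)]


def history_design_matrix(choices, common, rewarded, n_lags=5):
    """Build trial-history regressors; choices coded 1 = left, 0 = right."""
    choices = np.asarray(choices)
    common = np.asarray(common, dtype=bool)
    rewarded = np.asarray(rewarded, dtype=bool)
    signed = np.where(choices == 1, 1.0, -1.0)  # +1 for left, -1 for right
    n = len(choices)
    X = np.zeros((n - n_lags, len(TRIAL_TYPES) * n_lags))
    for row, t in enumerate(range(n_lags, n)):
        for lag in range(1, n_lags + 1):
            past = t - lag
            for k, (c, r) in enumerate(TRIAL_TYPES):
                if common[past] == c and rewarded[past] == r:
                    X[row, k * n_lags + lag - 1] = signed[past]
    y = (choices[n_lags:] == 1).astype(float)  # 1 if the current choice is left
    return X, y


def fit_logistic(X, y, ridge=1e-3, n_iter=50):
    """Newton's method for L2-penalized logistic regression; w[0] is a side bias."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (y - p) - ridge * w
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(w))
        w = w + np.linalg.solve(hess, grad)
    return w
```

In this layout, weight `w[k * n_lags + lag]` (after the bias term) belongs to trial type `TRIAL_TYPES[k]` at that lag, so the weights can be plotted as four curves against lag.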

Supplementary Figure 4 Movement times are faster following common transition trials.

Median movement time, in seconds, from the bottom center port to the reward port for common and uncommon transition trials, broken down by whether the movement was towards (right panel) or away from (left panel) the port with the higher reward probability.

Supplementary Figure 5 Placement of cannulae in individual rats.

Purple points indicate orbitofrontal cortex (OFC) cannula tips, green points indicate prelimbic cortex (PL) cannula tips, and orange points indicate dorsal hippocampus (dH) cannula tips.

Supplementary Figure 6 Results of logistic regression analysis applied to the inactivation dataset.

Above: Regression coefficients for the Saline, Control, dH, and OFC conditions. Points are averages across rats, and error bars are standard errors. Below: Differences between regression coefficients for different conditions.

Supplementary Figure 7 Results of logistic regression analysis applied to each rat in the inactivation dataset.

Note that rat #6 did not complete any saline sessions.
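The condition differences in Supplementary Figure 6 (and, for stay probabilities, Supplementary Figure 10) are summarized as averages across rats with standard errors. One natural reading, sketched below as an assumption, is a paired difference computed within each rat and then averaged; the function name and the convention of marking a missing condition with NaN are hypothetical. In this sketch, a rat missing a condition, such as rat #6 with no saline sessions, is simply excluded from that comparison.

```python
import numpy as np


def paired_condition_difference(cond_a, cond_b):
    """Mean and SEM across rats of (cond_a - cond_b), skipping rats with NaNs.

    cond_a, cond_b: arrays of shape (n_rats, n_measures), e.g. regression
    coefficients or stay probabilities, with NaN where a rat lacks a condition.
    """
    diff = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    complete = ~np.isnan(diff).any(axis=1)  # drop rats missing either condition
    diff = diff[complete]
    n = diff.shape[0]
    mean = diff.mean(axis=0)
    sem = diff.std(axis=0, ddof=1) / np.sqrt(n)
    return mean, sem, n
```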

Supplementary Figure 8 Results of logistic regression analysis applied to simulated data generated by the reduced model fit to each rat in the inactivation dataset.

Supplementary Figure 9 Rat performances compared between inactivation and control sessions.

Top: Fraction of trials on which each rat selected the choice port with the higher probability of leading to the reward port with the higher reward probability, for control vs. OFC sessions (left) and for control vs. dH sessions (right). Bottom: Fraction of trials on which the better port was selected, as a function of the number of trials since the last reward probability flip.
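The bottom panels bin performance by the number of trials since the reward probabilities last flipped. A minimal sketch, assuming a per-trial array that labels which choice port was the better one on each trial (so that a flip shows up as a change in that label); treating the first trial of a session as lag 0, the `max_lag` cutoff, and the function name are all hypothetical choices.

```python
import numpy as np


def accuracy_since_flip(choices, better_choice, max_lag=20):
    """Fraction of better-port choices as a function of trials since the last flip."""
    choices = np.asarray(choices)
    better = np.asarray(better_choice)
    since = np.zeros(len(better), dtype=int)  # first trial of the session counts as lag 0
    for t in range(1, len(better)):
        # Reset the counter whenever the better port changes (a probability flip).
        since[t] = 0 if better[t] != better[t - 1] else since[t - 1] + 1
    correct = choices == better
    accuracy = np.full(max_lag + 1, np.nan)  # NaN where no trials fall at that lag
    for lag in range(max_lag + 1):
        mask = since == lag
        if mask.any():
            accuracy[lag] = correct[mask].mean()
    return accuracy
```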

Supplementary Figure 10 Results of one-trial-back analysis applied to the inactivation dataset.

Above: Average stay probability by trial type for the Control, dH, and OFC conditions. Bar height is the average across rats, and error bars are standard errors. Below: Differences between stay probabilities for the different conditions.

Supplementary Figure 11 Results of one-trial-back stay/switch analysis applied to each rat in the inactivation dataset.

Supplementary Figure 12 Results of fitting the multiagent model jointly to the OFC inactivation and saline datasets.

Top row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on OFC (purple) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.

Supplementary Figure 13 Results of fitting the multiagent model jointly to the dH inactivation and saline datasets.

Top row: Posterior belief distributions over parameters governing the effect of inactivation on performance across the population. Only βplan is significantly affected by the inactivation. Below: Posterior belief distributions over parameters governing behavior on dH (orange) and Saline (blue) sessions. Only βplan is affected by inactivation in a way that is consistent across animals.
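In a Bayesian fit like those in Supplementary Figures 12 and 13, calling a population-level effect "significant" usually means that most of the posterior mass for the change parameter (for example, the change in βplan) lies on one side of zero. The sketch below summarizes posterior samples of such a parameter, for instance draws from an MCMC sampler; the 95% equal-tailed credible-interval criterion and the function name are assumptions, not necessarily the criterion used in the paper.

```python
import numpy as np


def summarize_change(samples, level=0.95):
    """Summarize posterior samples of an inactivation-induced parameter change."""
    samples = np.asarray(samples, dtype=float)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])  # equal-tailed credible interval
    return {
        "mean": float(samples.mean()),
        "interval": (float(lo), float(hi)),
        "p_decrease": float(np.mean(samples < 0)),  # posterior mass below zero
        "excludes_zero": bool(lo > 0 or hi < 0),
    }
```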

Supplementary Figure 14 Plots of posterior density projected onto planes defined by the parameter governing change in model-based weight and other population parameters for hippocampus (top, orange) and OFC (bottom, purple) inactivation datasets.

Supplementary Figure 15 Normalized cross-validated likelihood for logistic regression models (Online Methods), as a function of the number of previous trials used to predict the upcoming choice.

Including more than five previous trials in the model results in negligible improvements in quality of model fit.
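Supplementary Figure 15 scores regression models by normalized cross-validated likelihood. A common definition, assumed here rather than taken from the Online Methods, is the exponentiated mean held-out log-likelihood per trial: the geometric mean of the probabilities a model assigns to the rat's actual choices, so that 0.5 corresponds to chance on a two-alternative choice. In the sketch, the contiguous-block fold scheme, the `fit`/`predict` callables, and the function names are hypothetical.

```python
import numpy as np


def normalized_likelihood(y, p_left):
    """Geometric-mean probability assigned to the observed choices (1 = left)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p_left, dtype=float), 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.exp(log_lik / len(y)))


def cross_validated_normalized_likelihood(X, y, fit, predict, n_folds=5):
    """Pool held-out log-likelihood over contiguous folds, then normalize per trial."""
    X, y = np.asarray(X), np.asarray(y, dtype=float)
    all_idx = np.arange(len(y))
    log_lik = 0.0
    for test_idx in np.array_split(all_idx, n_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        model = fit(X[train_idx], y[train_idx])
        p = np.clip(predict(model, X[test_idx]), 1e-12, 1 - 1e-12)
        y_test = y[test_idx]
        log_lik += np.sum(y_test * np.log(p) + (1 - y_test) * np.log(1 - p))
    return float(np.exp(log_lik / len(y)))
```

Evaluating this score for design matrices built with 1, 2, 3, ... previous trials gives a curve of the kind described, which flattens once additional lags stop adding predictive information.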

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 and Supplementary Discussion (PDF 4128 kb)

Life Sciences Reporting Summary (PDF 73 kb)

Cite this article

Miller, K., Botvinick, M. & Brody, C. Dorsal hippocampus contributes to model-based planning. Nat Neurosci 20, 1269–1276 (2017). https://doi.org/10.1038/nn.4613
