Model-based choices involve prospective neural activity

Doll, Bradley B; Duncan, Katherine D; Simon, Dylan A; Shohamy, Daphna; Daw, Nathaniel D

doi:10.1038/nn.3981

Article
Published: 23 March 2015

Bradley B Doll^1,2, Katherine D Duncan^2, Dylan A Simon^3, Daphna Shohamy^2,4 & Nathaniel D Daw^1,3

Nature Neuroscience volume 18, pages 767–772 (2015)

Abstract

Decisions may arise via 'model-free' repetition of previously reinforced actions or by 'model-based' evaluation, which is widely thought to follow from prospective anticipation of action consequences using a learned map or model. While choices and neural correlates of decision variables sometimes reflect knowledge of their consequences, it remains unclear whether this actually arises from prospective evaluation. Using functional magnetic resonance imaging and a sequential reward-learning task in which paths contained decodable object categories, we found that humans' model-based choices were associated with neural signatures of future paths observed at decision time, suggesting a prospective mechanism for choice. Prospection also covaried with the degree of model-based influences on neural correlates of decision variables and was inversely related to prediction error signals thought to underlie model-free learning. These results dissociate separate mechanisms underlying model-based and model-free evaluation and support the hypothesis that model-based influences on choices and neural decision variables result from prospection.
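
The distinction drawn in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the toy task layout (two start states whose actions lead deterministically to shared outcome states), the learning rate, and every name below are assumptions made for illustration. The model-free learner caches a value per start state and action and updates it with a prediction error; the model-based learner evaluates an action prospectively by looking up where it leads in a learned map.

```python
# Toy illustration only: the task layout, parameters and names are assumptions,
# not taken from the paper's Online Methods.

ALPHA = 0.5  # hypothetical learning rate

# Learned "map": (start_state, action) -> outcome state (assumed deterministic).
TRANSITIONS = {
    ("faces", 0): "scenes", ("faces", 1): "body_parts",
    ("tools", 0): "scenes", ("tools", 1): "body_parts",
}


def new_learner():
    """Create a learner holding both cached and outcome-state values."""
    return {
        "q_mf": {sa: 0.0 for sa in TRANSITIONS},           # model-free cache
        "v_outcome": {"scenes": 0.0, "body_parts": 0.0},   # outcome values
    }


def update(learner, start, action, reward):
    """Apply one trial of experience; return the model-free prediction error."""
    outcome = TRANSITIONS[(start, action)]
    delta = reward - learner["q_mf"][(start, action)]
    learner["q_mf"][(start, action)] += ALPHA * delta
    learner["v_outcome"][outcome] += ALPHA * (reward - learner["v_outcome"][outcome])
    return delta


def model_free_value(learner, start, action):
    """Cached value: reflects only what was reinforced from this start state."""
    return learner["q_mf"][(start, action)]


def model_based_value(learner, start, action):
    """Prospective value: look ahead to the outcome state the action leads to."""
    return learner["v_outcome"][TRANSITIONS[(start, action)]]


learner = new_learner()
update(learner, "faces", 0, reward=1.0)        # rewarded once from "faces"
print(model_free_value(learner, "tools", 0))   # 0.0 (no transfer across start states)
print(model_based_value(learner, "tools", 0))  # 0.5 (transfer via the shared outcome)
```

In this sketch, a reward earned from one start state changes the model-based value of the equivalent action in the other start state, while the cached model-free value stays untouched.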

**Figure 2: Model behavioral predictions and data.**

**Figure 3: Neural evidence of prospective activation correlates with model-based behavior.**

**Figure 4: Correlates of choice probabilities derived from chosen minus unchosen values estimated by model-free and model-based learning at the task's first stage.**

**Figure 5: Neural evidence of model-free prediction errors and correlates of prediction error with model-free behavior.**

Acknowledgements

We thank S.M. Fleming and L.Y. Atlas for helpful discussions. This work was supported by NINDS grant R01NS078784.

Author information

Authors and Affiliations

Center for Neural Science, New York University, New York, New York, USA
Bradley B Doll & Nathaniel D Daw
Department of Psychology, Columbia University, New York, New York, USA
Bradley B Doll, Katherine D Duncan & Daphna Shohamy
Department of Psychology, New York University, New York, New York, USA
Dylan A Simon & Nathaniel D Daw
Kavli Institute for Brain Science, Columbia University, New York, New York, USA
Daphna Shohamy

Contributions

All authors designed the experiment and analyses. B.B.D. and K.D.D. performed the experiment. B.B.D. analyzed the data. B.B.D., N.D.D. and D.S. wrote the paper.

Corresponding author

Correspondence to Bradley B Doll.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Inferior frontal gyrus activation and model-free behavior

Relationship between inferior frontal gyrus (IFG) activation and model-free behavior (Online Methods, GLM4). A prospective model-based learner is indifferent to changes in start states, facing the same prospective problem on each trial. In contrast, a model-free learner who maintains a separate set of expected values for each start state may face additional processing demands (e.g., retrieval) when start states change. To test this possibility, we sought regions where such a switch cost might be reflected in the BOLD response, via greater activation when start states differed from one trial to the next relative to when they remained the same.

a. Contrast of task start states (faces, tools) that differed from the previous trial, relative to those that matched. Effect plotted at P = 0.001 uncorrected for display purposes. (Peak voxel: −48 16 22; P = 1.1 × 10⁻⁷, cluster family-wise error corrected for whole-brain comparisons. Cluster size: 833 voxels. Peak t(19) = 6.27. No other clusters survived correction.)

b. IFG activation correlates with model-free behavior. Individual values reflect average activation of the cluster identified from the group-level contrast. IFG activation correlates negatively with model-based behavior (estimate = −0.65, χ²(1) = 11.91, P = 0.0006). Lines depict group-level linear effects and 95% confidence curves.
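
The switch-versus-stay comparison described in this caption depends on labeling each trial by whether its start state differs from the previous trial's. A minimal sketch of that labeling follows. The function name, the example sequence, and the treatment of the first trial are assumptions; how the labels enter GLM4 is specified in the Online Methods, not here.

```python
def switch_indicator(start_states):
    """Label each trial 1 if its start state differs from the previous trial's,
    0 if it matches, and None for the first trial (it has no predecessor)."""
    if not start_states:
        return []
    flags = [None]
    for prev, cur in zip(start_states, start_states[1:]):
        flags.append(1 if cur != prev else 0)
    return flags


# Hypothetical sequence of start-state categories across five trials.
trials = ["faces", "faces", "tools", "tools", "faces"]
print(switch_indicator(trials))  # [None, 0, 1, 0, 1]
```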

Supplementary Figure 2 Group level depiction of category-specific activation

Group-level depiction of category-specific activation used to create functional ROIs from localizer data (ROIs for analysis were created in native space for each subject). Each category ROI was constructed from the intersection of contrasts with all other categories (e.g., scenes ROI: scenes > body parts ∩ scenes > faces ∩ scenes > tools), which prevents any overlap between ROIs (here, the conjunction of these group-level contrasts is presented). Each contrast was thresholded at P < 0.001, uncorrected. Peaks of clusters surviving family-wise error correction for whole-brain multiple comparisons: body parts: 50 −78 8, t(19) = 9.23, cluster P = 2 × 10⁻⁶; −48 −76 12, t(19) = 6.48, cluster P = 0.008; scenes: −26 −46 −10, t(19) = 11.71, cluster P = 6.8 × 10⁻⁵; 24 −34 −16, t(19) = 9.42, P = 1.7 × 10⁻⁵; −12 −98 0, t(19) = 8.7, P = 2.8 × 10⁻⁹; tools: −8 −78 6, t(19) = 10.22, P = 9.3 × 10⁻¹⁴. No clusters survived correction for the faces category (peak: 34 −90 −12, t(19) = 4.28, P = 0.992).
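
The intersection rule in this caption can be sketched as a set operation over voxels. The example below is a toy under stated assumptions, not the authors' pipeline: the voxel responses are made-up numbers, the contrast statistic is a plain difference rather than a t map, and `T_CUTOFF` is a hypothetical stand-in for the P < 0.001 voxel threshold. A real analysis would operate on thresholded statistical maps in each subject's native space.

```python
CATEGORIES = ["scenes", "body_parts", "faces", "tools"]
T_CUTOFF = 3.5  # hypothetical stand-in for the P < 0.001 voxel threshold

# Made-up responses of four voxels to each category (illustration only).
RESPONSE = {
    "scenes":     [9.0, 1.0, 1.0, 5.0],
    "body_parts": [1.0, 9.0, 1.0, 5.0],
    "faces":      [1.0, 1.0, 3.0, 5.0],
    "tools":      [1.0, 1.0, 1.0, 5.0],
}


def contrast(a, b):
    """Toy statistic for the contrast a > b at every voxel."""
    return [x - y for x, y in zip(RESPONSE[a], RESPONSE[b])]


def category_roi(target):
    """Voxels where target exceeds every other category above the cutoff,
    i.e. the intersection of the thresholded pairwise contrasts."""
    roi = set(range(len(RESPONSE[target])))
    for other in CATEGORIES:
        if other != target:
            stat = contrast(target, other)
            roi &= {v for v, t in enumerate(stat) if t > T_CUTOFF}
    return roi


rois = {c: category_roi(c) for c in CATEGORIES}
print(rois)  # {'scenes': {0}, 'body_parts': {1}, 'faces': set(), 'tools': set()}
```

Because a voxel cannot satisfy both a > b and b > a above a positive cutoff, ROIs built this way are disjoint by construction. A category whose advantage never clears the cutoff ends up with an empty ROI.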

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2 and Supplementary Tables 1–4 (PDF 295 kb)

Supplementary Methods Checklist (PDF 175 kb)

About this article

Cite this article

Doll, B., Duncan, K., Simon, D. et al. Model-based choices involve prospective neural activity. Nat Neurosci 18, 767–772 (2015). https://doi.org/10.1038/nn.3981

Received: 05 December 2014
Accepted: 15 February 2015
Published: 23 March 2015
Issue Date: May 2015
DOI: https://doi.org/10.1038/nn.3981
