Abstract
Distinct model-free and model-based learning processes are thought to drive both typical and dysfunctional behaviours. Data from two-stage decision tasks have seemingly shown that human behaviour is driven by both processes operating in parallel. However, in this study, we show that more detailed task instructions lead participants to make primarily model-based choices that have little, if any, simple model-free influence. We also demonstrate that behaviour in the two-stage task may falsely appear to be driven by a combination of simple model-free and model-based learning if purely model-based agents form inaccurate models of the task because of misconceptions. Furthermore, we report evidence that many participants do misconceive the task in important ways. Overall, we argue that humans formulate a wide variety of learning models. Consequently, the simple dichotomy of model-free versus model-based learning is inadequate to explain behaviour in the two-stage task and connections between reward learning, habit formation and compulsivity.
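To make the contrast under discussion concrete, the sketch below simulates the kind of hybrid agent commonly fit to two-stage task data (in the spirit of the Daw et al. hybrid model), in which a weight w mixes model-free and model-based action values. This is a minimal illustration only, under assumed parameter names and values (ALPHA, BETA, W, and the transition and reward probabilities); it is not the authors' fitted model and omits the eligibility traces and perseveration terms used in full analyses.

```python
# Minimal sketch of a hybrid model-free/model-based agent on a two-stage
# task. All parameters and probabilities are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

ALPHA = 0.5      # learning rate (assumed)
BETA = 5.0       # softmax inverse temperature (assumed)
W = 0.5          # mixing weight: 0 = purely model-free, 1 = purely model-based
P_COMMON = 0.7   # probability of the common transition
N_TRIALS = 200

q_mf = np.zeros(2)               # model-free values of the two first-stage actions
v_stage2 = np.zeros(2)           # learned values of the two second-stage states
p_reward = np.array([0.6, 0.4])  # assumed (fixed) second-stage reward probabilities

for _ in range(N_TRIALS):
    # Model-based values: expectation over the known transition structure.
    q_mb = P_COMMON * v_stage2 + (1 - P_COMMON) * v_stage2[::-1]
    q_net = W * q_mb + (1 - W) * q_mf

    # Softmax choice between the two first-stage actions.
    p_choice = np.exp(BETA * q_net)
    p_choice /= p_choice.sum()
    a = rng.choice(2, p=p_choice)

    # Common transition takes action a to state a; rare, to the other state.
    s = a if rng.random() < P_COMMON else 1 - a
    r = float(rng.random() < p_reward[s])

    # Temporal-difference updates: second-stage value, then first-stage
    # model-free value (simplified one-step backup).
    v_stage2[s] += ALPHA * (r - v_stage2[s])
    q_mf[a] += ALPHA * (v_stage2[s] - q_mf[a])

print("model-free Q:", q_mf)
print("second-stage V:", v_stage2)
```

Fitting w in a model family like this to choice data is what produces the apparent mixture of processes; the paper's argument is that such a mixture can arise spuriously when purely model-based learners hold a mistaken model of the task.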
Data availability
The data obtained from human participants are available at https://github.com/carolfs/muddled_models
Code availability
All the code used to perform the simulations, run the magic carpet and spaceship tasks, and analyse the results is available at https://github.com/carolfs/muddled_models
Acknowledgements
We thank G. M. Parente for the wonderful illustrations used in the spaceship and magic carpet tasks, S. Gobbi, G. Lombardi and M. Edelson for many helpful discussions and ideas, the participants at the NYU Neuroeconomics Colloquium for useful feedback and P. Dayan, W. Kool, A. Konovalov, I. Krajbich and S. Nebe for helpful comments on early drafts of this manuscript. Our acknowledgement of their feedback does not imply that these individuals fully agree with our conclusions or opinions in this paper. We also acknowledge W. Kool, F. Cushman and S. Gershman for making the data from their 2016 paper openly available at https://github.com/wkool/tradeoffs. This work was supported by the CAPES Foundation (https://www.capes.gov.br) grant no. 88881.119317/2016-01 and the European Union’s Seventh Framework programme for research, technological development and demonstration under grant no. 607310 (Nudge-it). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
C.F.S. and T.A.H. designed the tasks and computational models. C.F.S. programmed the tasks, collected the data and performed the analyses with input from T.A.H. Both authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary Handling Editor: Marike Schiffer.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Timelines of the spaceship and magic carpet tasks.
Each box depicts an event within the spaceship or magic carpet tasks. The duration of each event is given in seconds on the left. A) In the spaceship task, the 1st screen simply indicated that a new trial had begun. The 2nd and 3rd screens represented the initial state. At the 4th screen, the participant had up to 2 s to indicate her choice. The common or rare transition was shown on the 5th screen. The second-stage state was indicated by the background colour (black, red) and the choice by the green left and right arrows on the 6th screen. The 7th and final screen in a trial revealed whether or not a reward was delivered. After feedback, the task advanced directly to the next trial. B) The magic carpet task was designed to closely mimic the original, abstract version of the two-stage task while still allowing for story-based instructions that included causes and effects for all task events. Thus, we used the same Tibetan characters as in the original task, but made them labels for magic carpets and genies rather than mere identifiers on coloured squares. In the magic carpet task, the 1st screen represented the initial state and first-stage choice. Participants had up to 2 s to make this choice. On the 2nd screen, the chosen option was highlighted for 3 s. Next, a ‘nap’ screen was shown for 1 s while the magic carpet automatically took the participant to one of the two mountains. Although participants saw the common or rare transition screens depicted in Fig. 1d during the practice trials, the transitions were not shown during the main task to make it more comparable with previous versions. The second-stage state (blue, pink) and choice were indicated by the pink or blue lamps on the right and left sides of the 4th screen. The participant had up to 2 s to make her choice. The 5th screen highlighted the chosen lamp/genie for 3 s. The 6th and final screen in a trial revealed whether or not a reward was delivered. After reward feedback, there was a blank screen for 0.7–1.3 s before the next trial began.
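As a compact restatement of the magic carpet timeline described in the caption above, the hypothetical event list below encodes the screen sequence and durations. It is a reading aid only, not the task implementation; the actual task code is available in the repository listed under Code availability.

```python
# Hypothetical summary of the magic carpet trial timeline from the caption
# above. The real task code lives at https://github.com/carolfs/muddled_models.
MAGIC_CARPET_TRIAL = [
    ("first-stage choice (carpets)",        "up to 2 s"),
    ("chosen carpet highlighted",           "3 s"),
    ("'nap' screen (transition hidden)",    "1 s"),
    ("second-stage choice (lamps/genies)",  "up to 2 s"),
    ("chosen lamp highlighted",             "3 s"),
    ("reward feedback",                     "not specified in caption"),
    ("blank intertrial screen",             "0.7-1.3 s"),
]

for event, duration in MAGIC_CARPET_TRIAL:
    print(f"{event:40s}{duration}")
```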
Supplementary information
Supplementary Information
Supplementary methods, results and discussion.
About this article
Cite this article
Feher da Silva, C., Hare, T.A. Humans primarily use model-based inference in the two-stage task. Nat Hum Behav 4, 1053–1066 (2020). https://doi.org/10.1038/s41562-020-0905-y
This article is cited by
- Memory for rewards guides retrieval. Communications Psychology (2024)
- Bridging minds and policies: supporting early career researchers in translating computational psychiatry research. Neuropsychopharmacology (2024)
- A multi-stage anticipated surprise model with dynamic expectation for economic decision-making. Scientific Reports (2024)
- Distinct reinforcement learning profiles distinguish between language and attentional neurodevelopmental disorders. Behavioral and Brain Functions (2023)
- Using smartphones to optimise and scale-up the assessment of model-based planning. Communications Psychology (2023)