
Humans primarily use model-based inference in the two-stage task

Abstract

Distinct model-free and model-based learning processes are thought to drive both typical and dysfunctional behaviours. Data from two-stage decision tasks have seemingly shown that human behaviour is driven by both processes operating in parallel. However, in this study, we show that more detailed task instructions lead participants to make primarily model-based choices that have little, if any, simple model-free influence. We also demonstrate that behaviour in the two-stage task may falsely appear to be driven by a combination of simple model-free and model-based learning if purely model-based agents form inaccurate models of the task because of misconceptions. Furthermore, we report evidence that many participants do misconceive the task in important ways. Overall, we argue that humans formulate a wide variety of learning models. Consequently, the simple dichotomy of model-free versus model-based learning is inadequate to explain behaviour in the two-stage task and connections between reward learning, habit formation and compulsivity.
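The standard analysis behind such claims examines 'stay' behaviour: whether a participant repeats their previous first-stage choice as a function of the previous reward and transition type. Simple model-free learning predicts a main effect of reward on staying, whereas model-based learning predicts a reward-by-transition interaction. The sketch below is illustrative only, not the authors' analysis code; the learning rate, inverse temperature and reward-drift settings are assumed values. It simulates a purely model-based agent on the two-stage task and tabulates its stay probabilities, which show the characteristic interaction pattern.

```python
# Minimal sketch (illustrative, not the authors' code) of a purely model-based
# agent in the two-stage task. Parameters alpha, beta and the drift settings
# are assumed values chosen for demonstration.
import numpy as np

rng = np.random.default_rng(0)
P_COMMON = 0.7                       # probability of the common transition
alpha, beta = 0.5, 5.0               # learning rate, inverse temperature (assumed)

# Slowly drifting reward probabilities for the two options in each
# second-stage state, and the agent's learned second-stage values.
p_reward = rng.uniform(0.25, 0.75, size=(2, 2))
q2 = np.zeros((2, 2))

records = []                         # (prev choice, reward, common?, next choice)
prev = None
for _ in range(10_000):
    # Model-based first-stage values: expected value of the best second-stage
    # option under the (assumed known) transition model.
    v_state = q2.max(axis=1)
    q1 = np.array([P_COMMON * v_state[0] + (1 - P_COMMON) * v_state[1],
                   P_COMMON * v_state[1] + (1 - P_COMMON) * v_state[0]])
    p_choose_0 = 1 / (1 + np.exp(-beta * (q1[0] - q1[1])))  # softmax, 2 options
    c1 = 0 if rng.random() < p_choose_0 else 1

    common = rng.random() < P_COMMON
    state = c1 if common else 1 - c1          # common transition keeps the mapping
    c2 = int(q2[state].argmax())              # greedy second-stage choice
    reward = float(rng.random() < p_reward[state, c2])
    q2[state, c2] += alpha * (reward - q2[state, c2])  # delta-rule update

    if prev is not None:
        records.append((*prev, c1))
    prev = (c1, reward, common)
    # Gaussian drift of reward probabilities, clipped to [0.25, 0.75].
    p_reward = np.clip(p_reward + rng.normal(0, 0.025, (2, 2)), 0.25, 0.75)

# A purely model-based agent stays more after rewarded-common and
# unrewarded-rare trials: an interaction, not a main effect of reward.
for rew in (1.0, 0.0):
    for com in (True, False):
        stays = [pc == nc for pc, r, c, nc in records if r == rew and c == com]
        print(f"reward={int(rew)} common={com}: P(stay)={np.mean(stays):.2f}")
```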


Fig. 1: The stimuli used in the three versions of the two-stage task.
Fig. 2: Stay probabilities for human participants; stay probabilities and model-based weights for simulated agents.
Fig. 3: Example of a reward main effect that cannot be driven by model-free learning.
Fig. 4: Model-based weights and logistic regression coefficients for different empirical datasets.
Fig. 5: Simulated behaviour of agents and real participants can be influenced by irrelevant changes in stimulus position.
Fig. 6: Simplified diagrams representing the strategy space in the two-stage task.

Data availability

The data obtained from human participants are available at https://github.com/carolfs/muddled_models

Code availability

All the code used to perform the simulations, run the magic carpet and spaceship tasks, and analyse the results is available at https://github.com/carolfs/muddled_models

Acknowledgements

We thank G. M. Parente for the wonderful illustrations used in the spaceship and magic carpet tasks, S. Gobbi, G. Lombardi and M. Edelson for many helpful discussions and ideas, the participants at the NYU Neuroeconomics Colloquium for useful feedback and P. Dayan, W. Kool, A. Konovalov, I. Krajbich and S. Nebe for helpful comments on early drafts of this manuscript. Our acknowledgement of their feedback does not imply that these individuals fully agree with our conclusions or opinions in this paper. We also acknowledge W. Kool, F. Cushman and S. Gershman for making the data from their 2016 paper openly available at https://github.com/wkool/tradeoffs. This work was supported by the CAPES Foundation (https://www.capes.gov.br) grant no. 88881.119317/2016-01 and the European Union’s Seventh Framework programme for research, technological development and demonstration under grant no. 607310 (Nudge-it). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

C.F.S. and T.A.H. designed the tasks and computational models. C.F.S. programmed the tasks, collected the data and performed the analyses with input from T.A.H. Both authors wrote the manuscript.

Corresponding authors

Correspondence to Carolina Feher da Silva or Todd A. Hare.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Marike Schiffer.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Timelines of the spaceship and magic carpet task.

Each box depicts an event within the spaceship or magic carpet tasks. The duration of each event is given in seconds on the left. A) In the spaceship task, the 1st screen simply indicated that a new trial had begun. The 2nd and 3rd screens represented the initial state. At the 4th screen, the participant had up to 2 s to indicate her choice. The common or rare transition was shown on the 5th screen. On the 6th screen, the second-stage state was indicated by the background colour (black, red) and the choice by the green left and right arrows. The 7th and final screen in a trial revealed whether or not a reward was delivered. After feedback, the task advanced directly to the next trial. B) The magic carpet task was designed to closely mimic the original, abstract version of the two-stage task while still allowing for story-based instructions that included causes and effects for all task events. Thus, we used the same Tibetan characters as in the original task, but made them into labels for magic carpets and genies rather than simply identifiers for coloured squares. In the magic carpet task, the 1st screen represented the initial state and first-stage choice. Participants had up to 2 s to make this choice. On the 2nd screen, the chosen option was highlighted for 3 s. Next, a 'nap' screen was shown for 1 s while the magic carpet automatically took the participant to one of the two mountains. Although participants saw the common or rare transition screens depicted in Fig. 1d during the practice trials, the transitions were not shown during the main task to make it more comparable with previous versions. The second-stage state (blue, pink) and choice were indicated by the pink or blue lamps on the right and left sides of the 4th screen. The participant had up to 2 s to make her choice. The 5th screen highlighted the chosen lamp/genie for 3 s. The 6th and final screen in a trial revealed whether or not a reward was delivered. After reward feedback, there was a blank screen for 0.7–1.3 s before the next trial began.
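For reference, the trial timelines described in this caption can be summarized as plain data. The sketch below is an assumed encoding, not the authors' task code (which is available in the repository above); durations are taken directly from the caption, and event names are paraphrased.

```python
# Sketch of the trial timelines from Extended Data Fig. 1, encoded as data.
# Not the authors' task code; durations come from the figure caption.
import random

SPACESHIP_TRIAL = [
    ("new-trial indicator",            "fixed duration"),
    ("initial state (2 screens)",      "fixed duration"),
    ("first-stage choice",             "response window, up to 2 s"),
    ("common or rare transition",      "fixed duration"),
    ("second-stage state and choice",  "background colour + green arrows"),
    ("reward feedback",                "then directly to next trial"),
]

MAGIC_CARPET_TRIAL = [
    ("first-stage choice (carpets)",   "response window, up to 2 s"),
    ("chosen carpet highlighted",      "3 s"),
    ("'nap' screen during flight",     "1 s"),
    ("second-stage choice (genies)",   "response window, up to 2 s"),
    ("chosen lamp/genie highlighted",  "3 s"),
    ("reward feedback",                "fixed duration"),
]

def magic_carpet_iti() -> float:
    """Jittered blank screen between magic carpet trials (0.7-1.3 s)."""
    return random.uniform(0.7, 1.3)
```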

Supplementary information

Supplementary Information

Supplementary methods, results and discussion.

Reporting Summary

About this article

Cite this article

Feher da Silva, C., Hare, T.A. Humans primarily use model-based inference in the two-stage task. Nat Hum Behav 4, 1053–1066 (2020). https://doi.org/10.1038/s41562-020-0905-y
