Explicit knowledge of task structure is a primary determinant of human model-based action

Abstract

Explicit information obtained through instruction profoundly shapes human choice behaviour. However, this has been studied in computationally simple tasks, and it is unknown how model-based and model-free systems, respectively generating goal-directed and habitual actions, are affected by the absence or presence of instructions. We assessed behaviour in a variant of a computationally more complex decision-making task, before and after providing information about task structure, both in healthy volunteers and in individuals suffering from obsessive-compulsive or other disorders. Initial behaviour was model-free, with rewards directly reinforcing preceding actions. Model-based control, employing predictions of states resulting from each action, emerged with experience in a minority of participants, and less in those with obsessive-compulsive disorder. Providing task structure information strongly increased model-based control, similarly across all groups. Thus, in humans, explicit task structural knowledge is a primary determinant of model-based reinforcement learning and is most readily acquired from instruction rather than experience.
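As a rough illustration of the distinction drawn in the abstract, the sketch below contrasts a model-free update (reward directly reinforces the preceding action) with a model-based evaluation (action values derived from predicted resulting states) on a single trial. It is not the authors' analysis code, which is in the repository listed under Code availability. The function names, the learning rate of 0.5 and the 0.8/0.2 transition probabilities are assumptions chosen purely for illustration.

```python
from typing import List


def delta_update(values: List[float], index: int, reward: float,
                 alpha: float = 0.5) -> List[float]:
    """Move values[index] a fraction alpha towards the observed reward."""
    updated = list(values)
    updated[index] += alpha * (reward - updated[index])
    return updated


def model_based_values(transition_probs: List[List[float]],
                       state_values: List[float]) -> List[float]:
    """Value of each action = expected value of the state it leads to."""
    return [sum(p * v for p, v in zip(row, state_values))
            for row in transition_probs]


# Hypothetical task: action 0 usually leads to state 0, action 1 to state 1.
transition_probs = [[0.8, 0.2],
                    [0.2, 0.8]]

# One trial: action 0 is chosen, a rare transition leads to state 1,
# and a reward of 1 is received there.
action, state, reward = 0, 1, 1.0

# Model-free: the reward reinforces the action that was just taken.
q_mf = delta_update([0.0, 0.0], action, reward)

# Model-based: the reward updates the value of the state reached,
# and action values are recomputed from the transition model.
state_values = delta_update([0.0, 0.0], state, reward)
q_mb = model_based_values(transition_probs, state_values)
```

Under these assumptions, the model-free values favour repeating action 0, whereas the model-based values favour action 1, because that action is the one most likely to reach the rewarded state. This opposite response to a rewarded rare transition is what lets the two strategies be told apart from choices.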

Fig. 1: Behavioural task.
Fig. 2: Uninstructed behaviour is predominantly model-free.
Fig. 3: Impaired learning of model-based control from experience in OCD.
Fig. 4: Explicit knowledge increases model-based control.
Fig. 5: Explicit knowledge increases model-based control in OCD.

Data availability

The data used in the study are available from https://github.com/ThomasAkam/Two-step_explicit_knowledge.

Code availability

The two-step task analysis code is available from https://github.com/ThomasAkam/Two-step_explicit_knowledge.

Acknowledgements

P.C.-R. was supported by a doctoral fellowship (reference no. SFRH/SINTD/94350/2013) from Fundação para a Ciência e Tecnologia and by a Fulbright Research Grant from the Bureau of Educational and Cultural Affairs of the US Department of State. T.A. was funded by Wellcome Trust grants no. WT096193AIA, no. 202831/Z/16/Z and no. 214314/Z/18/Z. A.M. was supported by a doctoral fellowship (reference no. SFRH/BD/144508/2019) from Fundação para a Ciência e Tecnologia. J.B.B.-C. was supported by grant no. PTDC/MEC-PSQ/30302/2017-IC&DT-LISBOA-01-0145-FEDER, funded by national funds from FCT/MCTES and co-funded by FEDER, under the Partnership Agreement Lisboa 2020—Programa Operacional Regional de Lisboa. P.D. was supported by the Max-Planck-Gesellschaft (Max Planck Society) and the Alexander von Humboldt-Stiftung (Alexander von Humboldt Foundation). A.J.O.-M. was supported by grant no. PTDC/MEC-PSQ/30302/2017-IC&DT-LISBOA-01-0145-FEDER, funded by national funds from FCT/MCTES and co-funded by FEDER, under the Partnership Agreement Lisboa 2020—Programa Operacional Regional de Lisboa, by grant no. PTDC/MED-NEU/31331/2017 from Fundação para a Ciência e Tecnologia, and by a Starting Grant from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 950357). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

P.C.-R., T.A., J.B.B.-C., H.B.S., R.M.C. and A.J.O.-M. conceived and designed the experiments. P.C.-R., I.S., M.C. and A.M. performed the experiments. P.C.-R. and T.A. analysed the data. T.A., V.P., P.D., R.M.C. and A.J.O.-M. contributed the materials and analysis tools. P.C.-R., T.A. and A.J.O.-M. wrote the paper.

Corresponding author

Correspondence to Albino J. Oliveira-Maia.

Ethics declarations

Competing interests

J.B.B.-C. received honoraria from Janssen-Cilag, Ltd, as a member of a local Advisory Board. H.B.S. has received research support for an industry-sponsored clinical trial from Biohaven Pharmaceuticals, royalties from UpToDate Inc. and a stipend from the American Medical Association for her role as Associate Editor of JAMA Psychiatry. A.J.O.-M. was the national coordinator for Portugal of a non-interventional study (EDMS-ERI-143085581, 4.0) to characterize a Treatment-Resistant Depression Cohort in Europe, sponsored by Janssen-Cilag, Ltd (2019–2020) and of a trial of psilocybin therapy for treatment-resistant depression, sponsored by Compass Pathways, Ltd (EudraCT nos. 2017-003288-36; 2020–2021); is recipient of a grant from Schuhfried GmBH for norming and validation of cognitive tests; and is national coordinator for Portugal of a trial of esketamine for treatment-resistant depression, sponsored by Janssen-Cilag, Ltd (EudraCT no. 2019-002992-33). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Laurence Hunt, Alireza Soltani and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Results, Tables 1–5 and Figs. 1–10.

Reporting Summary

Peer Review File

About this article

Cite this article

Castro-Rodrigues, P., Akam, T., Snorasson, I. et al. Explicit knowledge of task structure is a primary determinant of human model-based action. Nat Hum Behav 6, 1126–1141 (2022). https://doi.org/10.1038/s41562-022-01346-2

  • DOI: https://doi.org/10.1038/s41562-022-01346-2
