Abstract
Explicit information obtained through instruction profoundly shapes human choice behaviour. However, this has been studied in computationally simple tasks, and it is unknown how model-based and model-free systems, respectively generating goal-directed and habitual actions, are affected by the absence or presence of instructions. We assessed behaviour in a variant of a computationally more complex decision-making task, before and after providing information about task structure, both in healthy volunteers and in individuals suffering from obsessive-compulsive or other disorders. Initial behaviour was model-free, with rewards directly reinforcing preceding actions. Model-based control, employing predictions of states resulting from each action, emerged with experience in a minority of participants, and less in those with obsessive-compulsive disorder. Providing task structure information strongly increased model-based control, similarly across all groups. Thus, in humans, explicit task structural knowledge is a primary determinant of model-based reinforcement learning and is most readily acquired from instruction rather than experience.
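The contrast drawn above between model-free and model-based control can be made concrete with a minimal sketch. It is illustrative only and is not the authors' analysis code or fitted model. The learning rate, the 0.8/0.2 transition probabilities and all function names are assumptions introduced here. After a rewarded rare transition, the model-free learner credits the action that was taken, whereas the model-based learner credits the action that most often leads to the rewarded state.

```python
# Illustrative sketch; ALPHA, P and all function names are assumptions,
# not values or code taken from the paper.
ALPHA = 0.5  # learning rate (assumed)

# P[action][state]: assumed probability that a first-step action
# leads to a given second-step state (common = 0.8, rare = 0.2).
P = [[0.8, 0.2],
     [0.2, 0.8]]


def model_free_update(q_mf, action, reward, alpha=ALPHA):
    """Model-free: the reward directly reinforces the preceding action."""
    q = list(q_mf)
    q[action] += alpha * (reward - q[action])
    return q


def update_state_value(v, state, reward, alpha=ALPHA):
    """Learn the value of the second-step state where the reward was received."""
    v = list(v)
    v[state] += alpha * (reward - v[state])
    return v


def model_based_values(v, transitions=P):
    """Model-based: action value = expected value of the states it is predicted to reach."""
    return [sum(p * v_s for p, v_s in zip(row, v)) for row in transitions]


# One trial: action 0 is chosen, a rare transition leads to state 1, reward is delivered.
action, state, reward = 0, 1, 1.0

q_mf = model_free_update([0.0, 0.0], action, reward)
v = update_state_value([0.0, 0.0], state, reward)
q_mb = model_based_values(v)

print("model-free action values: ", q_mf)  # favours the action just taken (action 0)
print("model-based action values:", q_mb)  # favours action 1, which commonly reaches state 1
```

In this toy trial the two systems disagree about which first-step action to repeat, which is the kind of behavioural signature that two-step tasks are designed to expose.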
Data availability
The data used in the study are available from https://github.com/ThomasAkam/Two-step_explicit_knowledge.
Code availability
The two-step task analysis code is available from https://github.com/ThomasAkam/Two-step_explicit_knowledge.
Acknowledgements
P.C.-R. was supported by a doctoral fellowship (reference no. SFRH/SINTD/94350/2013) from Fundação para a Ciência e Tecnologia and by a Fulbright Research Grant from the Bureau of Educational and Cultural Affairs of the US Department of State. T.A. was funded by Wellcome Trust grants no. WT096193AIA, no. 202831/Z/16/Z and no. 214314/Z/18/Z. A.M. was supported by a doctoral fellowship (reference no. SFRH/BD/144508/2019) from Fundação para a Ciência e Tecnologia. J.B.B.-C. was supported by grant no. PTDC/MEC-PSQ/30302/2017-IC&DT-LISBOA-01-0145-FEDER, funded by national funds from FCT/MCTES and co-funded by FEDER, under the Partnership Agreement Lisboa 2020—Programa Operacional Regional de Lisboa. P.D. was supported by the Max-Planck-Gesellschaft (Max Planck Society) and the Alexander von Humboldt-Stiftung (Alexander von Humboldt Foundation). A.J.O.-M. was supported by grant no. PTDC/MEC-PSQ/30302/2017-IC&DT-LISBOA-01-0145-FEDER, funded by national funds from FCT/MCTES and co-funded by FEDER, under the Partnership Agreement Lisboa 2020—Programa Operacional Regional de Lisboa, by grant no. PTDC/MED-NEU/31331/2017 from Fundação para a Ciência e Tecnologia, and by a Starting Grant from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 950357). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
P.C.-R., T.A., J.B.B.-C., H.B.S., R.M.C. and A.J.O.-M. conceived and designed the experiments. P.C.-R., I.S., M.C. and A.M. performed the experiments. P.C.-R. and T.A. analysed the data. T.A., V.P., P.D., R.M.C. and A.J.O.-M. contributed the materials and analysis tools. P.C.-R., T.A. and A.J.O.-M. wrote the paper.
Ethics declarations
Competing interests
J.B.B.-C. received honoraria from Janssen-Cilag, Ltd, as a member of a local Advisory Board. H.B.S. has received research support for an industry-sponsored clinical trial from Biohaven Pharmaceuticals, royalties from UpToDate Inc. and a stipend from the American Medical Association for her role as Associate Editor of JAMA Psychiatry. A.J.O.-M. was the national coordinator for Portugal of a non-interventional study (EDMS-ERI-143085581, 4.0) to characterize a Treatment-Resistant Depression Cohort in Europe, sponsored by Janssen-Cilag, Ltd (2019–2020) and of a trial of psilocybin therapy for treatment-resistant depression, sponsored by Compass Pathways, Ltd (EudraCT nos. 2017-003288-36; 2020–2021); is the recipient of a grant from Schuhfried GmbH for norming and validation of cognitive tests; and is national coordinator for Portugal of a trial of esketamine for treatment-resistant depression, sponsored by Janssen-Cilag, Ltd (EudraCT no. 2019-002992-33). The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Laurence Hunt, Alireza Soltani and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Methods, Results, Tables 1–5 and Figs. 1–10.
About this article
Cite this article
Castro-Rodrigues, P., Akam, T., Snorasson, I. et al. Explicit knowledge of task structure is a primary determinant of human model-based action. Nat Hum Behav 6, 1126–1141 (2022). https://doi.org/10.1038/s41562-022-01346-2
DOI: https://doi.org/10.1038/s41562-022-01346-2