Abstract
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation ‘the car approaching the crosswalk was red’ or with ‘the approaching car was slowing down’? In this Perspective, we summarize our recent research into the computational and neural underpinnings of ‘representation learning’—how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents ‘task states’, deploying them for decision-making and learning elsewhere in the brain.
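To make the first of these problems concrete, the sketch below (a hypothetical Python illustration; the task setup, attention-update rule, variable names and parameter values are all assumptions for exposition, not any published model) shows attention-weighted feature reinforcement learning in a multidimensional task: each option is a bundle of features, only one stimulus dimension predicts reward, and an attention vector learned alongside the feature values controls both which features are valued and which are updated, so that irrelevant dimensions are gradually ignored and experience generalizes across them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative multidimensional bandit: each option has one feature per
# dimension (e.g., color, shape, texture); only one dimension predicts
# reward. All names and parameter values are illustrative assumptions.
N_DIMS, N_FEATURES = 3, 3            # 3 dimensions, 3 features per dimension
RELEVANT_DIM, TARGET_FEATURE = 0, 2  # ground truth, hidden from the learner
ALPHA, BETA, ETA = 0.3, 5.0, 0.3     # value lr, softmax inverse temp., attention lr

V = np.zeros((N_DIMS, N_FEATURES))   # learned value of every feature
phi = np.ones(N_DIMS) / N_DIMS       # attention weights over dimensions

def option_value(stim):
    """Attention-weighted sum of the values of an option's features."""
    return sum(phi[d] * V[d, stim[d]] for d in range(N_DIMS))

for trial in range(500):
    # Three options, each built from one random feature per dimension.
    options = [rng.integers(N_FEATURES, size=N_DIMS) for _ in range(3)]
    q = np.array([option_value(s) for s in options])
    p = np.exp(BETA * q)
    p /= p.sum()                                  # softmax choice rule
    chosen = options[rng.choice(3, p=p)]

    # Reward depends only on the relevant dimension's feature.
    reward = float(chosen[RELEVANT_DIM] == TARGET_FEATURE)

    # Prediction-error update, gated by attention: features on less-attended
    # dimensions are updated less, so experience generalizes across them.
    delta = reward - option_value(chosen)
    for d in range(N_DIMS):
        V[d, chosen[d]] += ALPHA * phi[d] * delta

    # Shift attention toward dimensions whose feature values have diverged
    # most (a crude stand-in for learned attention, chosen for brevity).
    spread = V.max(axis=1) - V.min(axis=1) + 1e-6
    phi = (1 - ETA) * phi + ETA * spread / spread.sum()

print("attention weights per dimension:", np.round(phi, 2))
```

After a few hundred trials the attention weights typically come to favor the reward-relevant dimension, so the learner effectively treats the other dimensions as noise; the specific attention rule used here is only one of many possible choices.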
Acknowledgements
I am grateful to my lab members, past and present, for their creative, methodical and incredibly revealing work on representation learning in the brain. I thank A. Langdon, N. Rouhani, and J. Zarate for helpful comments on a previous draft. This work was funded by grant W911NF-14-1-0101 from the Army Research Office and grant R01DA042065 from the National Institute on Drug Abuse.
Ethics declarations
Competing interests
The author declares no competing interests.
About this article
Cite this article
Niv, Y. Learning task-state representations. Nat Neurosci 22, 1544–1553 (2019). https://doi.org/10.1038/s41593-019-0470-8