  • Letter
Cortical substrates for exploratory decisions in humans


Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this ‘exploration–exploitation’ dilemma [1], a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning [2] (RL) theory, indicates that a dopaminergic [3,4], striatal [5–9] and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical [1] and ethological [10] perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
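
As a rough illustration of the dilemma described above, the sketch below simulates a multi-armed gambling task in Python. The abstract does not name the choice strategy, so a softmax rule over learned value estimates is assumed here purely for illustration. The function names, the parameters `beta` and `alpha`, the payoff noise and the starting values are all hypothetical and are not taken from the paper. A choice is labelled exploitative when it picks an option with the highest current estimate, and exploratory otherwise.

```python
import math
import random


def softmax_probs(values, beta):
    """Choice probabilities proportional to exp(beta * value)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choose(values, beta, rng):
    """Sample one option and label the choice as exploit or explore."""
    probs = softmax_probs(values, beta)
    arm = rng.choices(range(len(values)), weights=probs, k=1)[0]
    # Exploit: the chosen option has the highest current value estimate.
    label = "exploit" if values[arm] == max(values) else "explore"
    return arm, label


def run_bandit(true_means, n_trials=200, beta=0.2, alpha=0.3, seed=0):
    """Simulate a gambler learning option values by trial and error.

    All numeric settings are illustrative assumptions, not the paper's.
    """
    rng = random.Random(seed)
    values = [50.0] * len(true_means)  # assumed initial estimates
    history = []
    for _ in range(n_trials):
        arm, label = choose(values, beta, rng)
        reward = rng.gauss(true_means[arm], 4.0)  # assumed noisy payoff
        # Simple delta-rule update of the chosen option's estimate.
        values[arm] += alpha * (reward - values[arm])
        history.append((arm, label))
    return values, history


if __name__ == "__main__":
    values, history = run_bandit([30.0, 60.0, 45.0])
    n_explore = sum(1 for _, label in history if label == "explore")
    print("final estimates:", [round(v, 1) for v in values])
    print("exploratory choices:", n_explore, "of", len(history))
```

Under this assumed rule, a larger `beta` concentrates choices on the highest-valued option (more exploitation), while a smaller `beta` spreads them across options (more exploration).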

Figure 1: Task design.
Figure 2: Reward-related activations.
Figure 3: Exploration-related activity in frontopolar cortex.
Figure 4: Exploration-related activity in intraparietal sulcus.

References

  1. Gittins, J. C. & Jones, D. in Progress in Statistics (ed. Gani, J.) 241–266 (North-Holland, Amsterdam, 1974)
  2. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, Massachusetts, 1998)
  3. Montague, P. R., Dayan, P. & Sejnowski, T. J. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947 (1996)
  4. Bayer, H. M. & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005)
  5. Delgado, M. R., Nystrom, L. E., Fissell, C., Noll, D. C. & Fiez, J. A. Tracking the hemodynamic responses to reward and punishment in the striatum. J. Neurophysiol. 84, 3072–3077 (2000)
  6. Knutson, B., Westdorp, A., Kaiser, E. & Hommer, D. fMRI visualization of brain activity during a monetary incentive delay task. Neuroimage 12, 20–27 (2000)
  7. McClure, S. M., Berns, G. S. & Montague, P. R. Temporal prediction errors in a passive learning task activate human striatum. Neuron 38, 339–346 (2003)
  8. O'Doherty, J. P., Dayan, P., Friston, K., Critchley, H. & Dolan, R. J. Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337 (2003)
  9. O'Doherty, J. P. et al. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454 (2004)
  10. Charnov, E. L. Optimal foraging: The marginal value theorem. Theor. Popul. Biol. 9, 129–136 (1976)
  11. Owen, A. M. Cognitive planning in humans: Neuropsychological, neuroanatomical and neuropharmacological perspectives. Prog. Neurobiol. 53, 431–450 (1997)
  12. Daw, N. D., Niv, Y. & Dayan, P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioural control. Nature Neurosci. 8, 1704–1711 (2005)
  13. Kakade, S. & Dayan, P. Dopamine: Generalization and bonuses. Neural Netw. 15, 549–559 (2002)
  14. Kaelbling, L. P. Learning in Embedded Systems (MIT Press, Cambridge, Massachusetts, 1993)
  15. McClure, S. M., Laibson, D. I., Loewenstein, G. & Cohen, J. D. Separate neural systems value immediate and delayed monetary rewards. Science 306, 503–507 (2004)
  16. O'Doherty, J., Kringelbach, M. L., Rolls, E. T., Hornak, J. & Andrews, C. Abstract reward and punishment representations in the human orbitofrontal cortex. Nature Neurosci. 4, 95–102 (2001)
  17. O'Doherty, J. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Curr. Opin. Neurobiol. 14, 769–776 (2004)
  18. Gottfried, J. A., O'Doherty, J. & Dolan, R. J. Encoding predictive reward value in human amygdala and orbitofrontal cortex. Science 301, 1104–1107 (2003)
  19. Tanaka, S. C. et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neurosci. 7, 887–893 (2004)
  20. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001)
  21. Ramnani, N. & Owen, A. M. Anterior prefrontal cortex: Insights into function from anatomy and neuroimaging. Nature Rev. Neurosci. 5, 184–194 (2004)
  22. Koechlin, E., Ody, C. & Kouneiher, F. A. The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181–1185 (2003)
  23. Braver, T. S. & Bongiolatti, S. R. The role of frontopolar cortex in subgoal processing during working memory. Neuroimage 15, 523–536 (2002)
  24. Platt, M. L. & Glimcher, P. W. Neural correlates of decision variables in parietal cortex. Nature 400, 233–238 (1999)
  25. Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Matching behaviour and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004)
  26. Dorris, M. C. & Glimcher, P. W. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44, 365–378 (2004)
  27. Grefkes, C. & Fink, G. R. The functional organization of the intraparietal sulcus in humans and monkeys. J. Anat. 207, 3–17 (2005)
  28. Burgess, P. W., Veitch, E., de Lacy Costello, A. & Shallice, T. The cognitive and neuroanatomical correlates of multitasking. Neuropsychologia 38, 848–863 (2000)
  29. Usher, M., Cohen, J. D., Servan-Schreiber, D., Rajkowski, J. & Aston-Jones, G. The role of locus coeruleus in the regulation of cognitive performance. Science 283, 549–554 (1999)
  30. Doya, K. Metalearning and neuromodulation. Neural Netw. 15, 495–506 (2002)


We thank J. Li, S. McClure, B. King-Casas and P. R. Montague for sharing their unpublished data on exploration, and Y. Niv, Z. Ghahramani and C. Camerer for discussions. Funding was from a Royal Society USA Research Fellowship (N.D.), the Gatsby Foundation (N.D., P.D.), the EU BIBA project (N.D., P.D.), and a Wellcome Trust Programme Grant (J.O.D., R.D.).

Author information

Corresponding authors

Correspondence to Nathaniel D. Daw or John P. O'Doherty.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Notes

This file contains Supplementary Methods, Supplementary Discussion and Supplementary Tables 1–5. (PDF 371 kb)

About this article

Cite this article

Daw, N., O'Doherty, J., Dayan, P. et al. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
