Sources of suboptimality in a minimalistic explore–exploit task

A Publisher Correction to this article was published on 11 March 2019

This article has been updated

Abstract

People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration) [1,2]. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions [3–7]. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.
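
The threshold rule described in the abstract can be illustrated with a short Python sketch. This is not the authors' model code (that is in the repository listed under Code availability); it is a minimal illustration under stated assumptions. The function names `linear_threshold` and `simulate_threshold_rule`, the parameters `intercept` and `slope`, and the reward values are all hypothetical. The sketch further assumes that the first decision is always exploration, that exploitation returns the highest reward so far without noise, and that the proportion of the sequence remaining counts the current day. The slope is chosen so that the threshold falls as fewer days remain, which is one reading of the abstract and not a fitted estimate.

```python
def linear_threshold(prop_left, intercept, slope):
    """Threshold as a linear function of the proportion of the sequence left.

    `intercept` and `slope` are hypothetical parameters, not fitted values.
    """
    return intercept + slope * prop_left


def simulate_threshold_rule(new_rewards, intercept, slope):
    """Simulate one sequence of explore/exploit choices under a threshold rule.

    new_rewards[d] is the reward that exploring on day d would yield
    (illustrative values; the real reward distribution is not assumed here).
    Returns the list of choices and the total reward earned.
    """
    total_days = len(new_rewards)
    best = None  # highest reward observed so far
    choices, earned = [], []
    for day, new_reward in enumerate(new_rewards):
        # Assumption: the current day counts as part of what remains.
        prop_left = (total_days - day) / total_days
        if best is not None and best > linear_threshold(prop_left, intercept, slope):
            # The best option so far exceeds the threshold: exploit it.
            choices.append("exploit")
            earned.append(best)
        else:
            # Otherwise try a new option and update the best reward seen.
            choices.append("explore")
            earned.append(new_reward)
            best = new_reward if best is None else max(best, new_reward)
    return choices, sum(earned)


if __name__ == "__main__":
    choices, total = simulate_threshold_rule([2, 4, 3, 5, 1], intercept=1.0, slope=4.0)
    print(choices, total)
    # prints ['explore', 'explore', 'exploit', 'exploit', 'exploit'] 18
```

With these example values the simulated decision-maker explores twice, then exploits the reward of 4 for the rest of the sequence. Because the assumed threshold only falls as the sequence progresses, the rule switches from exploration to exploitation at most once. It also misses the reward of 5 that further exploration would have found, which gives a flavour of how a simple threshold rule can be suboptimal.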

Fig. 1: Experimental design, optimal policy and summary statistics.
Fig. 2: Fits of the Opt model to selected summary statistics.
Fig. 3: Evidence of a threshold rule depending on the proportion of days left.
Fig. 4: Sequence-level variability as implemented through variable-threshold models (Num-V and Prop-V).

Code availability

All experimental and analysis codes used in this paper are available at https://github.com/mingyus/explore-exploit.

Data availability

All data that support the findings of this paper are available at https://github.com/mingyus/explore-exploit.

Change history

  • 11 March 2019

    The original and corrected figures and equations are shown in the accompanying Publisher Correction.

References

  1. Cohen, J. D., McClure, S. M. & Yu, A. J. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Phil. Trans. R. Soc. Lond. B 362, 933–942 (2007).
  2. Mehlhorn, K. et al. Unpacking the exploration–exploitation tradeoff: a synthesis of human and animal literatures. Decision 2, 191–215 (2015).
  3. Acuna, D. & Schrater, P. Bayesian modeling of human sequential decision-making on the multi-armed bandit problem. In Proc. 30th Annual Conference of the Cognitive Science Society 2065–2070 (Cognitive Science Society, 2008).
  4. Constantino, S. M. & Daw, N. D. Learning the opportunity cost of time in a patch-foraging task. Cogn. Affect. Behav. Neurosci. 15, 837–853 (2015).
  5. Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006).
  6. Knox, W. B., Otto, A. R., Stone, P. & Love, B. The nature of belief-directed exploratory choice in human decision-making. Front. Psychol. 2, 398 (2012).
  7. Steyvers, M., Lee, M. D. & Wagenmakers, E.-J. A Bayesian analysis of human decision-making on bandit problems. J. Math. Psychol. 53, 168–179 (2009).
  8. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA, 1998).
  9. Seale, D. A. & Rapoport, A. Optimal stopping behavior with relative ranks: the secretary problem with unknown population size. J. Behav. Decis. Mak. 13, 391–411 (2000).
  10. Bellman, R. Dynamic Programming 1st edn (Princeton Univ. Press, Princeton, 1957).
  11. Lee, M. D., Zhang, S., Munro, M. & Steyvers, M. Psychological models of human and optimal performance in bandit problems. Cogn. Syst. Res. 12, 164–174 (2011).
  12. McFadden, D. et al. in Frontiers in Econometrics (ed. Zarembka, P.) 105–142 (Academic Press, New York, 1973).
  13. Gigerenzer, G. & Gaissmaier, W. Heuristic decision making. Annu. Rev. Psychol. 62, 451–482 (2011).
  14. Simon, H. A. Rational choice and the structure of the environment. Psychol. Rev. 63, 129–138 (1956).
  15. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19, 716–723 (1974).
  16. Cavanaugh, J. E. et al. Unifying the derivations for the Akaike and corrected Akaike information criteria. Stat. Probabil. Lett. 33, 201–208 (1997).
  17. Schwarz, G. et al. Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978).
  18. Kello, C. T. et al. Scaling laws in cognitive sciences. Trends Cogn. Sci. 14, 223–232 (2010).
  19. Rigoux, L., Stephan, K. E., Friston, K. J. & Daunizeau, J. Bayesian model selection for group studies—revisited. NeuroImage 84, 971–985 (2014).
  20. Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J. & Friston, K. J. Bayesian model selection for group studies. NeuroImage 46, 1004–1017 (2009).
  21. Lau, B. & Glimcher, P. W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555–579 (2005).
  22. Ito, M. & Doya, K. Validation of decision-making models and analysis of decision variables in the rat basal ganglia. J. Neurosci. 29, 9861–9874 (2009).
  23. Boehner, P. Ockham: Philosophical Writings (Nelson, Canada, 1957).
  24. Chater, N. & Vitányi, P. Simplicity: a unifying principle in cognitive science? Trends Cogn. Sci. 7, 19–22 (2003).
  25. Buhusi, C. V. & Meck, W. H. What makes us tick? Functional and neural mechanisms of interval timing. Nat. Rev. Neurosci. 6, 755–765 (2005).
  26. Gibbon, J. Scalar expectancy theory and Weber’s law in animal timing. Psychol. Rev. 84, 279–325 (1977).
  27. Brown, G. D. A., Neath, I. & Chater, N. A temporal ratio model of memory. Psychol. Rev. 114, 539–576 (2007).
  28. Robbins, H. Some aspects of the sequential design of experiments. Bull. Am. Math. Soc. 58, 527–535 (1952).
  29. Charnov, E. Optimal foraging: the marginal value theorem. Theor. Popul. Biol. 9, 129–136 (1976).
  30. Seale, D. A. & Rapoport, A. Sequential decision making with relative ranks: an experimental investigation of the “secretary problem”. Organ. Behav. Hum. Decis. Process. 69, 221–236 (1997).
  31. Van Opheusden, B., Galbiati, G., Bnaya, Z., Li, Y. & Ma, W. J. A computational model for decision tree search. In Proc. 39th Annual Conference of the Cognitive Science Society 1254–1259 (Cognitive Science Society, 2017).
  32. MacGregor, J. N. & Ormerod, T. Human performance on the traveling salesman problem. Percept. Psychophys. 58, 527–539 (1996).
  33. Sang, K. Modeling Exploration/Exploitation Behavior and the Effect of Individual Differences. PhD thesis, Indiana Univ. (2017).
  34. Sang, K., Todd, P. & Goldstone, R. Learning near-optimal search in a minimal explore/exploit task. In Proc. 33rd Annual Conference of the Cognitive Science Society 2800–2805 (Cognitive Science Society, 2011).
  35. Sang, K., Todd, P. M., Goldstone, R. & Hills, T. T. Explore/exploit tradeoff strategies in a resource accumulation search task. Preprint at https://psyarxiv.com/zw3s8 (2018).
  36. Hills, T. T., Todd, P. M. & Goldstone, R. L. The central executive as a search process: priming exploration and exploitation across domains. J. Exp. Psychol. Gen. 139, 590–609 (2010).
  37. Navarro, D. J., Newell, B. R. & Schulze, C. Learning and choosing in an uncertain world: an investigation of the explore–exploit dilemma in static and dynamic environments. Cogn. Psychol. 85, 43–77 (2016).
  38. Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A. & Cohen, J. D. Humans use directed and random exploration to solve the explore–exploit dilemma. J. Exp. Psychol. Gen. 143, 2074–2081 (2014).
  39. Stoll, F. M., Fontanier, V. & Procyk, E. Specific frontal neural dynamics contribute to decisions to check. Nat. Commun. 7, 11990 (2016).
  40. Kolling, N., Wittmann, M. & Rushworth, M. F. S. Multiple neural mechanisms of decision making and their competition under changing risk pressure. Neuron 81, 1190–1202 (2014).
  41. Mai, J.-E. Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior (Emerald Group Publishing, UK, 2016).
  42. Badre, D., Doll, B. B., Long, N. M. & Frank, M. J. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron 73, 595–607 (2012).
  43. Boorman, E. D., Behrens, T. E. J., Woolrich, M. W. & Rushworth, M. F. S. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron 62, 733–743 (2009).
  44. Barraclough, D. J., Conroy, M. L. & Lee, D. Prefrontal cortex and decision making in a mixed-strategy game. Nat. Neurosci. 7, 404–410 (2004).
  45. Miller, E. K. & Cohen, J. D. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001).
  46. Wallis, J. D. & Miller, E. K. Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur. J. Neurosci. 18, 2069–2081 (2003).
  47. Watanabe, M. Reward expectancy in primate prefrontal neurons. Nature 382, 629–632 (1996).
  48. Rich, A. S. & Gureckis, T. M. Exploratory choice reflects the future value of information. Decision 5, 177–192 (2018).
  49. Gureckis, T. M. et al. psiTurk: an open-source framework for conducting replicable behavioral experiments online. Behav. Res. Methods 48, 829–842 (2016).
  50. Glimcher, P. & Fehr, E. Neuroeconomics 2nd edn (Academic Press, 2014).

Acknowledgements

The authors thank R. Polonia for helpful comments on the manuscript, and people in W.J.M.’s laboratory for helpful discussions. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

All authors designed the study, developed the models, interpreted the results, and wrote the paper. M.S. and Z.B. collected the data and performed the analyses.

Corresponding author

Correspondence to Wei Ji Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–16, Supplementary Tables 1–3, Supplementary Methods 1–3, Supplementary Results 1 and 2, and Supplementary References

Reporting Summary

About this article

Cite this article

Song, M., Bnaya, Z. & Ma, W.J. Sources of suboptimality in a minimalistic explore–exploit task. Nat Hum Behav 3, 361–368 (2019). https://doi.org/10.1038/s41562-018-0526-x
