Expertise increases planning depth in human gameplay

van Opheusden, Bas; Kuperwajs, Ionatan; Galbiati, Gianni; Bnaya, Zahy; Li, Yunqi; Ma, Wei Ji

doi:10.1038/s41586-023-06124-2

Article
Published: 31 May 2023

Nature volume 618, pages 1000–1005 (2023)

Abstract

A hallmark of human intelligence is the ability to plan multiple steps into the future^1,2. Despite decades of research^3,4,5, it is still debated whether skilled decision-makers plan more steps ahead than novices^6,7,8. Traditionally, the study of expertise in planning has used board games such as chess, but the complexity of these games poses a barrier to quantitative estimates of planning depth. Conversely, common planning tasks in cognitive science often have a lower complexity^9,10 and impose a ceiling for the depth to which any player can plan. Here we investigate expertise in a complex board game that offers ample opportunity for skilled players to plan deeply. We use model fitting methods to show that human behaviour can be captured using a computational cognitive model based on heuristic search. To validate this model, we predict human choices, response times and eye movements. We also perform a Turing test and a reconstruction experiment. Using the model, we find robust evidence for increased planning depth with expertise in both laboratory and large-scale mobile data. Experts memorize and reconstruct board features more accurately. Using complex tasks combined with precise behavioural modelling might expand our understanding of human planning and help to bridge the gap with progress in artificial intelligence.

**Fig. 1: Task and computational model.**

**Fig. 2: The model accounts for multivariate data and generalizes to unseen data.**

**Fig. 3: The effects of expertise and time pressure on planning.**

**Fig. 4: The effects of expertise on planning in mobile data.**

Data availability

Data supporting the findings of this study are publicly available at the Open Science Framework (https://osf.io/n2xjm/).

Code availability

Code used in this study is publicly available at the Open Science Framework (https://osf.io/n2xjm/).

References

Miller, K. J. & Venditto, S. J. C. Multi-step planning in the brain. Curr. Opin. Behav. Sci. 38, 29–39 (2021).
Mattar, M. G. & Lengyel, M. Planning in the brain. Neuron 110, 914–934 (2022).
de Groot, A. D. Het Denken van den Schaker (Noord-Holland. Uitgev. Maatschappij, 1946).
Charness, N. in Toward a General Theory of Expertise: Prospects and Limits (eds Ericsson, K. A. & Smith, J.) 39–63 (Cambridge University Press, 1991).
Holding, D. H. Theories of chess skill. Psychol. Res. 54, 10–16 (1992).
Gobet, F. A pattern-recognition theory of search in expert problem solving. Think. Reasoning 3, 291–313 (1997).
Campitelli, G. & Gobet, F. Adaptive expert decision making: Skilled chess players search more and deeper. J. Int. Comput. Games Assoc. 27, 209–216 (2004).
Linhares, A., Freitas, A. E. T., Mendes, A. & Silva, J. S. Entanglement of perception and reasoning in the combinatorial game of chess: differential errors of strategic reconstruction. Cogn. Syst. Res. 13, 72–86 (2012).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Huys, Q. J. et al. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput. Biol. 8, e1002410 (2012).
Chase, W. G. & Simon, H. A. Perception in chess. Cogn. Psychol. 4, 55–81 (1973).
Van Harreveld, F., Wagenmakers, E.-J. & Van Der Maas, H. L. The effects of time pressure on chess skill: an investigation into fast and slow processes underlying expert performance. Psychol. Res. 71, 591–597 (2007).
Sheridan, H. & Reingold, E. M. Chess players’ eye movements reveal rapid recognition of complex visual patterns: evidence from a chess-related visual search task. J. Vis. 17, 4 (2017).
Gobet, F. & Simon, H. A. Expert chess memory: revisiting the chunking hypothesis. Memory 6, 225–255 (1998).
Bilalić, M., Langner, R., Erb, M. & Grodd, W. Mechanisms and neural basis of object and pattern recognition: a study with chess experts. J. Exp. Psychol. Gen. 139, 728–742 (2010).
Saariluoma, P. Visuospatial and articulatory interference in chess players’ information intake. Appl. Cogn. Psychol. 6, 77–89 (1992).
Holding, D. H. The Psychology of Chess Skill (Lawrence Erlbaum, 1985).
Holding, D. H. Evaluation factors in human tree search. Am. J. Psychol. 102, 103–108 (1989).
Gobet, F. & Jansen, P. Towards a chess program based on a model of human memory. Adv. Comput. Chess 7, 35–60 (1994).
Holding, D. H. Counting backward during chess move choice. Bull. Psychon. Soc. 27, 421–424 (1989).
Charness, N. in Complex Information Processing 203–228 (Psychology Press, 2013).
Huys, Q. J. et al. Interplay of approximate planning strategies. Proc. Natl Acad. Sci. USA 112, 3098–3103 (2015).
Snider, J., Lee, D., Poizner, H. & Gepshtein, S. Prospective optimization with limited resources. PLoS Comput. Biol. 11, e1004501 (2015).
Kolling, N., Scholl, J., Chekroud, A., Trier, H. A. & Rushworth, M. F. Prospection, perseverance, and insight in sequential behavior. Neuron 99, 1069–1082 (2018).
Pfeiffer, B. E. & Foster, D. J. Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497, 74–79 (2013).
Redish, A. D. Vicarious trial and error. Nat. Rev. Neurosci. 17, 147–159 (2016).
Pezzulo, G., Donnarumma, F., Maisto, D. & Stoianov, I. Planning at decision time and in the background during spatial navigation. Curr. Opin. Behav. Sci. 29, 69–76 (2019).
Miller, K. J., Botvinick, M. M. & Brody, C. D. Dorsal hippocampus contributes to model-based planning. Nat. Neurosci. 20, 1269 (2017).
Groman, S. M., Rich, K. M., Smith, N. J., Lee, D. & Taylor, J. R. Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology 43, 770–780 (2018).
Akam, T. et al. The anterior cingulate cortex predicts future states to mediate model-based action selection. Neuron 109, 149–163 (2020).
Beck, J. Combinatorial Games: Tic-Tac-Toe Theory Vol. 114 (Cambridge Univ. Press, 2008).
van Opheusden, B. & Ma, W. J. Tasks for aligning human and machine planning. Curr. Opin. Behav. Sci. 29, 127–133 (2019).
Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solving (Addison-Wesley Longman Publishing Co., Inc., 1984).
Bonet, B. & Geffner, H. Planning as heuristic search. Artif. Intell. 129, 5–33 (2001).
Dechter, R. & Pearl, J. Generalized best-first search strategies and the optimality of A*. J. ACM 32, 505–536 (1985).
Callaway, F. et al. Rational use of cognitive resources in human planning. Nat. Hum. Behav. 6, 1112–1125 (2022).
Treisman, A. M. & Gelade, G. A feature-integration theory of attention. Cogn. Psychol. 12, 97–136 (1980).
van Opheusden, B., Acerbi, L. & Ma, W. J. Unbiased and efficient log-likelihood estimation with inverse binomial sampling. PLOS Comput. Biol. 16, e1008483 (2020).
Acerbi, L. & Ma, W. J. Practical Bayesian optimization for model fitting with Bayesian adaptive direct search. Proceedings of the 31st International Conference on Neural Information Processing Systems 1834–1844 (2017).
Turing, A. Computing machinery and intelligence. Mind 59, 433–460 (1950).
Elo, A. E. The Rating of Chessplayers, Past and Present (Arco Pub., 1978).
Chabris, C. F. & Hearst, E. S. Visualization, pattern recognition, and forward search: Effects of playing speed and sight of the position on grandmaster chess errors. Cogn. Sci. 27, 637–648 (2003).
Calderwood, R., Klein, G. A. & Crandall, B. W. Time pressure, skill, and move quality in chess. Am. J. Psychol. 101, 481–493 (1988).
Krusche, M. J., Schulz, E., Guez, A. & Speekenbrink, M. Adaptive planning in human search. Preprint at BioRxiv https://doi.org/10.1101/268938 (2018).
Huang, J., Velarde, I., Ma, W. J. & Baldassano, C. Schema-based predictive eye movements support sequential memory encoding. eLife 12, e82599 (2023).
Dubey, R., Agrawal, P., Pathak, D., Griffiths, T. L. & Efros, A. A. Investigating human priors for playing video games. In Proc. International Conference on Machine Learning (ICML) (2018).
Charness, N., Tuffiash, M., Krampe, R., Reingold, E. & Vasyukova, E. The role of deliberate practice in chess expertise. Appl. Cogn. Psychol. 19, 151–165 (2005).
Brown, N. & Sandholm, T. Superhuman AI for multiplayer poker. Science 365, 885–890 (2019).
Meta Fundamental AI Research Diplomacy Team (FAIR) et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science 378, 1067–1074 (2022).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362, 1140–1144 (2018).
Hamrick, J. B. et al. Combining q-learning and search with amortized value estimates. In Proc. International Conference on Learning Representations (ICLR) (2020).
Ma, I., Phaneuf, C., van Opheusden, B., Ma, W. J. & Hartley, C. The component processes of complex planning follow distinct developmental trajectories. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/d62rw (2022).
Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
Cornelissen, F. W., Peters, E. M. & Palmer, J. The eyelink toolbox: eye tracking with MATLAB and the psychophysics toolbox. Behav. Res. Methods Instr. Comput. 34, 613–617 (2002).
Zermelo, E. Die berechnung der turnier-ergebnisse als ein maximumproblem der wahrscheinlichkeitsrechnung. Math. Z. 29, 436–460 (1929).
Hunter, D. R. MM algorithms for generalized Bradley-Terry models. Ann. Stat. 32, 384–406 (2004).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction Vol. 1 (MIT Press, 1998).
Sutton, R. S., McAllester, D. A., Singh, S. P. & Mansour, Y. in Advances in Neural Information Processing Systems 1057–1063 (2000).
Dawson, R. Unbiased Tests, Unbiased Estimators, and Randomized Similar Regions. PhD thesis, Harvard Univ. (1953).
de Groot, M. H. Unbiased sequential estimation for binomial populations. Ann. Math. Stat. 30, 80–101 (1959).
Huyer, W. & Neumaier, A. Global optimization by multilevel coordinate search. J. Glob. Optim. 14, 331–355 (1999).

Acknowledgements

We thank Z. Shu for piloting an early version of the experiment; F. Khalidi for assistance with data collection; and A. Mihali, A. Yoo, M. Honig, L. Acerbi, W. Adler, F. Callaway, T. Griffiths and M. Mattar, and the other current members and alumni of the Ma laboratory for discussions. This work was supported by grant number IIS-1344256 to W.J.M. and by Graduate Research Fellowship number DGE1839302 to I.K. from the National Science Foundation.

Author information

Gianni Galbiati
Present address: Vidrovr, New York, NY, USA

Authors and Affiliations

Center for Neural Science and Department of Psychology, New York University, New York, NY, USA
Bas van Opheusden, Ionatan Kuperwajs, Gianni Galbiati, Zahy Bnaya, Yunqi Li & Wei Ji Ma
Department of Computer Science, Princeton University, Princeton, NJ, USA
Bas van Opheusden

Contributions

All of the authors contributed to conceptualization of the research. B.v.O., G.G., I.K. and Y.L. collected data. B.v.O., I.K., G.G., Y.L. and Z.B. developed software, methodology and performed analysis. B.v.O., I.K. and W.J.M. wrote the paper. W.J.M. supervised the project and acquired funding.

Corresponding author

Correspondence to Bas van Opheusden.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Quentin Huys and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Model comparison.

We validate our main model specification by comparing to alternatives in three categories: lesions generated by removing model components (red), extensions generated by adding new model components (blue) and modifications generated by replacing a model component with a similar implementation (green). A. Cross-validated log-likelihood per move, across all participants in the laboratory experiments. Error bars indicate mean and s.e.m. of the difference in log-likelihood with the main model. B–F. Same as A., for participants in the human-vs-human, generalization, eye tracking, learning and time pressure experiments.
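The comparison in this caption rests on a paired quantity: for each participant, the difference in cross-validated log-likelihood per move between an alternative model and the main model, summarized by its mean and s.e.m. A minimal sketch of that summary follows. The function name and the numbers are hypothetical and are not taken from the study's code or data.

```python
import math
from statistics import mean, stdev


def paired_difference_summary(main_ll, alt_ll):
    """Mean and s.e.m. of per-participant differences (alternative - main)."""
    if len(main_ll) != len(alt_ll) or len(main_ll) < 2:
        raise ValueError("need two equal-length lists with at least 2 participants")
    diffs = [alt - main for main, alt in zip(main_ll, alt_ll)]
    # s.e.m. = sample standard deviation / sqrt(number of participants)
    sem = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs), sem


# Hypothetical cross-validated log-likelihoods per move, one value per participant.
main_model = [-2.00, -2.10, -1.90, -2.05]
lesion_model = [-2.10, -2.30, -2.00, -2.15]

diff_mean, diff_sem = paired_difference_summary(main_model, lesion_model)
print(f"log-likelihood difference: {diff_mean:.3f} +/- {diff_sem:.3f}")
```

A negative mean difference that is several s.e.m. away from zero would indicate that the alternative fits held-out moves worse than the main model.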

Extended Data Fig. 2 Parameter validation.

Because model fitting is too computationally expensive for parameter recovery, we assess the reliability of the parameter estimates using less computationally expensive methods. A. Pearson correlation across participants between model parameters estimated in two independent fits. Error bars indicate the confidence interval. B. Same as A., for different sessions in the learning experiment. Error bars indicate s.e.m. across participants. C-D. Same as A-B., for the derived metrics. E. 2-sample Kolmogorov-Smirnov test statistic between the distribution of \({\hat{\theta }}_{j}^{{\rm{lesion}}\,i}\) and \({\hat{\theta }}_{j}^{{\rm{full}}}\) for each pair of parameters. In all panels, we indicate tests that are significant after correcting for multiple comparisons using false discovery rate by ^*: α = 0.05, ^**: α = 0.01, ^***: α = 0.001. For significant tests, we additionally report uncorrected two-sided p-values. F. Trade-offs between model parameters using a Pearson correlation between \({\hat{\theta }}_{i}^{{\rm{full}}}\) and \({\hat{\theta }}_{j}^{{\rm{full}}}-{\hat{\theta }}_{j}^{{\rm{lesion}}\,i}\) for each pair of model parameters. G-H. Same as E-F., for the derived metrics.
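The caption does not say which false discovery rate procedure was used. As an illustration only, the sketch below assumes the common Benjamini-Hochberg step-up procedure. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests that are significant at FDR level alpha (Benjamini-Hochberg).

    Assumption: the source only says "false discovery rate"; the
    Benjamini-Hochberg procedure is used here as one standard choice.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical uncorrected two-sided p-values for five tests.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20]
flags = benjamini_hochberg(p_vals, alpha=0.05)
print(flags)
```

Running the function once per level (α = 0.05, 0.01, 0.001) would give the three tiers of asterisks described in the caption.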

Extended Data Fig. 3 Summary statistics.

Comparing our main model directly to human choices is challenging because the data is high-dimensional and discrete. Instead, we compute summary statistics as a function of number of pieces on the board, to probe for systematic patterns in the time course of people’s games, such as a tendency to start playing near the centre of the board and gradually expand outwards. We compare moves made in human-vs-human games (green solid lines), the behavioural model with inferred parameters on the same positions (blue solid lines) or random moves (black dashed lines). For all summary statistics, people deviate considerably from random, and the main model closely matches the human data. All panels depict cross-validated predictions.

Extended Data Fig. 4 Individual differences across summary statistics.

Each panel shows a scatterplot for the same set of summary statistics as in Extended Data Fig. 3, where each point represents a participant in the human-vs-human experiment, the horizontal coordinate the statistic computed on that participant’s moves, and the vertical coordinate the statistic computed on moves made by the model, with parameters inferred for that participant on out-of-sample choices. The Pearson correlation coefficient and two-sided p-value are reported within each panel. The model accurately predicts individual differences between participants.

Extended Data Fig. 5 Example board positions illustrating model components.

To investigate which patterns in the data are explained by tree search and feature dropping, we compare the distribution of choices predicted by the main model against lesion models. A. Example positions from human-vs-human games in which the model with (right column) and without tree search (left column) make highly different predictions (red shade), as quantified by Jensen-Shannon divergence. In each position, we also show the models’ preferred move (with an x) and the move made by the human participant (open circle). These predictions are averaged across simulations with 200 different parameter vectors from fits to human data, to capture positions with robust differences between planning and no planning. Upon inspection, we recognize these positions as ones where the player to move has multiple reasonable options, but to evaluate their quality one has to calculate many moves ahead. For example, in the second position, the move preferred by the No tree model is losing and the one by the main model is drawn, but this relies on a specific 10-move forced sequence that can only be found through explicit search. B. Same as A., but lesioning the feature drop mechanism, and using the ratio of the predicted probability of the human move as the metric for selecting positions. The feature drop mechanism is primarily necessary to account for people’s tendency to overlook possibilities to immediately make four-in-a-row, or block immediate four-in-a-row threats by the opponent.
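Panel A selects positions by the Jensen-Shannon divergence between the move distributions of two models. A minimal sketch of that quantity for two discrete distributions over the same set of candidate moves follows. The base-2 logarithm (which bounds the result between 0 and 1) is an assumption, and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two distributions over the same moves."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same set of moves")
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical predicted move probabilities over four candidate squares.
with_tree_search = [0.70, 0.20, 0.05, 0.05]
without_tree_search = [0.10, 0.15, 0.70, 0.05]
print(jensen_shannon_divergence(with_tree_search, without_tree_search))
```

Positions where the two models agree give a value near 0, and positions where they put their probability mass on different squares give a value near 1.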

Extended Data Fig. 6 Turing test.

In the Turing test, we showed participants video segments of sequences of moves, on average 9.38 moves long. A. Classification accuracy in the Turing test as a function of video length. Error bars indicate s.e.m. Participants are at chance level for classification of one-move videos (of which there were 8), and their accuracy only substantially exceeds 50% for sequences longer than 10 moves. A mixed effects linear regression with accuracy as dependent variable and observer-specific random intercepts estimates the increase in accuracy per observed move as only 0.33 ± 0.10%. B. Histogram of the percentage of observers classifying a given video as human-vs-human or computer-vs-computer, for either human games (pink), or computer-generated games (grey). While human games are on average more likely to be classified as human and computer games as computer, there are no videos for which all 30 observers agree, and there is a considerable fraction of videos (63 out of 180) for which a majority of observers respond incorrectly.

Extended Data Fig. 7 Eye tracking.

A. Coefficients in a linear regression predicting participants’ attentional distribution from the distribution of squares that the model includes in its principal variation at each depth. The regression coefficients are significantly greater than zero (one-sample t-test across participants) for depth up to 7, and highest for depth closer to 1. Error bars indicate s.e.m. across participants. B. Example positions from the eye tracking data in which the No feature drop model assigns low probability to the participant’s move. The right column shows the eye movements while the participant contemplates their move. In most positions, the participant spends no time whatsoever looking at the square preferred by the model, suggesting they indeed dropped the relevant four-in-a-row feature.

Extended Data Fig. 8 Playing strength correlations and response times.

A. Planning depth vs Elo rating of all participants in the learning (green) and time pressure experiments (purple). Playing strength correlates with planning depth (ρ = 0.62, p < 0.001). B. Same as A., for feature drop rate (ρ = −0.73, p < 0.001). C. Same as A., for heuristic quality, which does not correlate significantly with playing strength (ρ = 0.11, p = 0.088). D. Response times for participants in each session of the learning experiment. Error bars indicate s.e.m. across participants. Participants play slightly faster in later sessions. Therefore, our finding of increased planning in later sessions is not confounded by an increase in thinking time. Instead, people plan more while using less time. E. Same as D., for the time pressure experiment. The time limit manipulation is effective at increasing participants’ response times, even though they use only a fraction of the available time on average.
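The caption reports correlations ρ between Elo rating and each model-derived metric but does not state which coefficient ρ denotes. The sketch below therefore computes both a Pearson and a Spearman (rank) coefficient. All function names and data values are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Elo ratings and fitted planning depths for six participants.
elo = [-150, -40, 10, 60, 120, 200]
planning_depth = [2.1, 3.0, 2.8, 4.2, 4.0, 5.5]
print(f"Pearson r = {pearson(elo, planning_depth):.2f}")
print(f"Spearman rho = {spearman(elo, planning_depth):.2f}")
```

The rank-based version is insensitive to monotone rescaling of either variable, which can matter for a quantity such as Elo whose scale is a convention.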

Extended Data Fig. 9 Memory and reconstruction experiment.

A. Error rates in the memory and reconstruction experiment. Although experts are slightly worse than novices in the extra piece error rate (β = 0.0071 ± 0.0031, p = 0.049), experts substantially outperform novices in the missed piece (β = 0.037 ± 0.006, p < 0.001) and the wrong colour rate (β = 0.019 ± 0.003, p < 0.001). B. Scatterplot of total reconstruction time for experts and novices. Each point represents a board position in the memory and reconstruction experiment, the x-coordinate the average time that experts take to finish their reconstruction, and the y-coordinate the same but for novices. Positions from games are coloured pink, randomly scrambled positions in grey. Experts take more time to reconstruct pieces (β = 2.73 ± 0.57, p < 0.001), meaning that the error rate result could reflect a speed-accuracy trade-off as opposed to an overall improvement. However, experts reconstruct game-relevant features such as 3-in-a-row more accurately in the same amount of time. C. Example position of the memory and reconstruction experiment. The original board contains a 3-in-a-row feature on the bottom row (yellow shading). In the reconstructions, each circle indicates the distribution of pieces placed by different observers, with the angles of the grey, black and white wedges indicating the probability for that square to be empty, contain a black or contain a white piece, respectively. Novices correctly reconstruct the 3-in-a-row feature 42.1% of the time, but experts 84.2%. Together, these results suggest that players represent boards in memory in terms of game-relevant features.
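The three error types in panel A can be read as a square-by-square comparison between the original board and a participant's reconstruction. The sketch below is one plausible reading, not the study's scoring code. The board encoding, the function name and the per-square definitions are assumptions made for illustration.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def reconstruction_errors(original, reconstruction):
    """Count extra-piece, missed-piece and wrong-colour errors.

    Assumed definitions (not stated in the caption):
      extra        - a piece was placed on a square that was empty
      missed       - a square that held a piece was left empty
      wrong_colour - a piece was placed, but in the opposite colour
    """
    if len(original) != len(reconstruction):
        raise ValueError("boards must have the same number of squares")
    counts = {"extra": 0, "missed": 0, "wrong_colour": 0}
    for true_cell, placed_cell in zip(original, reconstruction):
        if true_cell == placed_cell:
            continue  # square reconstructed correctly
        if true_cell == EMPTY:
            counts["extra"] += 1
        elif placed_cell == EMPTY:
            counts["missed"] += 1
        else:
            counts["wrong_colour"] += 1
    return counts


# Hypothetical 8-square strip of a board and one participant's reconstruction.
original_board = "BW..B.W."
reconstructed = "BB...WW."
errors = reconstruction_errors(original_board, reconstructed)
print(errors)
```

Dividing each count by the number of squares, or by the number of pieces, would turn the counts into rates. Which normalization the study used is not stated in the caption.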

Extended Data Table 1 Robustness analysis

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

van Opheusden, B., Kuperwajs, I., Galbiati, G. et al. Expertise increases planning depth in human gameplay. Nature 618, 1000–1005 (2023). https://doi.org/10.1038/s41586-023-06124-2

Received: 03 June 2021
Accepted: 24 April 2023
Published: 31 May 2023
Issue Date: 29 June 2023
DOI: https://doi.org/10.1038/s41586-023-06124-2
