Abstract
A hallmark of human intelligence is the ability to plan multiple steps into the future1,2. Despite decades of research3,4,5, it is still debated whether skilled decision-makers plan more steps ahead than novices6,7,8. Traditionally, the study of expertise in planning has used board games such as chess, but the complexity of these games poses a barrier to quantitative estimates of planning depth. Conversely, common planning tasks in cognitive science often have a lower complexity9,10 and impose a ceiling for the depth to which any player can plan. Here we investigate expertise in a complex board game that offers ample opportunity for skilled players to plan deeply. We use model fitting methods to show that human behaviour can be captured using a computational cognitive model based on heuristic search. To validate this model, we predict human choices, response times and eye movements. We also perform a Turing test and a reconstruction experiment. Using the model, we find robust evidence for increased planning depth with expertise in both laboratory and large-scale mobile data. Experts memorize and reconstruct board features more accurately. Using complex tasks combined with precise behavioural modelling might expand our understanding of human planning and help to bridge the gap with progress in artificial intelligence.
Data availability
Data supporting the findings of this study are publicly available at the Open Science Framework (https://osf.io/n2xjm/).
Code availability
Code used in this study is publicly available at the Open Science Framework (https://osf.io/n2xjm/).
References
Miller, K. J. & Venditto, S. J. C. Multi-step planning in the brain. Curr. Opin. Behav. Sci. 38, 29–39 (2021).
Mattar, M. G. & Lengyel, M. Planning in the brain. Neuron 110, 914–934 (2022).
de Groot, A. D. Het Denken van den Schaker (Noord-Holland. Uitgev. Maatschappij, 1946).
Charness, N. in Toward a General Theory of Expertise: Prospects and Limits (eds Ericsson, K. A. & Smith, J.) 39–63 (Cambridge University Press, 1991).
Holding, D. H. Theories of chess skill. Psychol. Res. 54, 10–16 (1992).
Gobet, F. A pattern-recognition theory of search in expert problem solving. Think. Reasoning 3, 291–313 (1997).
Campitelli, G. & Gobet, F. Adaptive expert decision making: Skilled chess players search more and deeper. J. Int. Comput. Games Assoc. 27, 209–216 (2004).
Linhares, A., Freitas, A. E. T., Mendes, A. & Silva, J. S. Entanglement of perception and reasoning in the combinatorial game of chess: differential errors of strategic reconstruction. Cogn. Syst. Res. 13, 72–86 (2012).
Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).
Huys, Q. J. et al. Bonsai trees in your head: how the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Comput. Biol. 8, e1002410 (2012).
Chase, W. G. & Simon, H. A. Perception in chess. Cogn. Psychol. 4, 55–81 (1973).
Van Harreveld, F., Wagenmakers, E.-J. & Van Der Maas, H. L. The effects of time pressure on chess skill: an investigation into fast and slow processes underlying expert performance. Psychol. Res. 71, 591–597 (2007).
Sheridan, H. & Reingold, E. M. Chess players’ eye movements reveal rapid recognition of complex visual patterns: evidence from a chess-related visual search task. J. Vis. 17, 4 (2017).
Gobet, F. & Simon, H. A. Expert chess memory: revisiting the chunking hypothesis. Memory 6, 225–255 (1998).
Bilalić, M., Langner, R., Erb, M. & Grodd, W. Mechanisms and neural basis of object and pattern recognition: a study with chess experts. J. Exp. Psychol. Gen. 139, 728–742 (2010).
Saariluoma, P. Visuospatial and articulatory interference in chess players’ information intake. Appl. Cogn. Psychol. 6, 77–89 (1992).
Holding, D. H. The Psychology of Chess Skill (Lawrence Erlbaum, 1985).
Holding, D. H. Evaluation factors in human tree search. Am. J. Psychol. 102, 103–108 (1989).
Gobet, F. & Jansen, P. Towards a chess program based on a model of human memory. Adv. Comput. Chess 7, 35–60 (1994).
Holding, D. H. Counting backward during chess move choice. Bull. Psychon. Soc. 27, 421–424 (1989).
Charness, N. in Complex Information Processing 203–228 (Psychology Press, 2013).
Huys, Q. J. et al. Interplay of approximate planning strategies. Proc. Natl Acad. Sci. USA 112, 3098–3103 (2015).
Snider, J., Lee, D., Poizner, H. & Gepshtein, S. Prospective optimization with limited resources. PLoS Comput. Biol. 11, e1004501 (2015).
Kolling, N., Scholl, J., Chekroud, A., Trier, H. A. & Rushworth, M. F. Prospection, perseverance, and insight in sequential behavior. Neuron 99, 1069–1082 (2018).
Pfeiffer, B. E. & Foster, D. J. Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497, 74–79 (2013).
Redish, A. D. Vicarious trial and error. Nat. Rev. Neurosci. 17, 147–159 (2016).
Pezzulo, G., Donnarumma, F., Maisto, D. & Stoianov, I. Planning at decision time and in the background during spatial navigation. Curr. Opin. Behav. Sci. 29, 69–76 (2019).
Miller, K. J., Botvinick, M. M. & Brody, C. D. Dorsal hippocampus contributes to model-based planning. Nat. Neurosci. 20, 1269 (2017).
Groman, S. M., Rich, K. M., Smith, N. J., Lee, D. & Taylor, J. R. Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology 43, 770–780 (2018).
Akam, T. et al. The anterior cingulate cortex predicts future states to mediate model-based action selection. Neuron 109, 149–163 (2020).
Beck, J. Combinatorial Games: Tic-Tac-Toe Theory Vol. 114 (Cambridge Univ. Press, 2008).
van Opheusden, B. & Ma, W. J. Tasks for aligning human and machine planning. Curr. Opin. Behav. Sci. 29, 127–133 (2019).
Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solving (Addison-Wesley Longman Publishing Co., Inc., 1984).
Bonet, B. & Geffner, H. Planning as heuristic search. Artif. Intell. 129, 5–33 (2001).
Dechter, R. & Pearl, J. Generalized best-first search strategies and the optimality of A*. J. ACM 32, 505–536 (1985).
Callaway, F. et al. Rational use of cognitive resources in human planning. Nat. Hum. Behav. 6, 1112–1125 (2022).
Treisman, A. M. & Gelade, G. A feature-integration theory of attention. Cogn. Psychol. 12, 97–136 (1980).
van Opheusden, B., Acerbi, L. & Ma, W. J. Unbiased and efficient log-likelihood estimation with inverse binomial sampling. PLOS Comput. Biol. 16, e1008483 (2020).
Acerbi, L. & Ma, W. J. Practical Bayesian optimization for model fitting with Bayesian adaptive direct search. Proceedings of the 31st International Conference on Neural Information Processing Systems 1834–1844 (2017).
Turing, A. Computing machinery and intelligence. Mind 59, 433–460 (1950).
Elo, A. E. The Rating of Chessplayers, Past and Present (Arco Pub., 1978).
Chabris, C. F. & Hearst, E. S. Visualization, pattern recognition, and forward search: Effects of playing speed and sight of the position on grandmaster chess errors. Cogn. Sci. 27, 637–648 (2003).
Calderwood, R., Klein, G. A. & Crandall, B. W. Time pressure, skill, and move quality in chess. Am. J. Psychol. 101, 481–493 (1988).
Krusche, M. J., Schulz, E., Guez, A. & Speekenbrink, M. Adaptive planning in human search. Preprint at bioRxiv https://doi.org/10.1101/268938 (2018).
Huang, J., Velarde, I., Ma, W. J. & Baldassano, C. Schema-based predictive eye movements support sequential memory encoding. eLife 12, e82599 (2023).
Dubey, R., Agrawal, P., Pathak, D., Griffiths, T. L. & Efros, A. A. Investigating human priors for playing video games. In Proc. International Conference on Machine Learning (ICML) (2018).
Charness, N., Tuffiash, M., Krampe, R., Reingold, E. & Vasyukova, E. The role of deliberate practice in chess expertise. Appl. Cogn. Psychol. 19, 151–165 (2005).
Brown, N. & Sandholm, T. Superhuman AI for multiplayer poker. Science 365, 885–890 (2019).
Meta Fundamental AI Research Diplomacy Team (FAIR) et al. Human-level play in the game of Diplomacy by combining language models with strategic reasoning. Science 378, 1067–1074 (2022).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362, 1140–1144 (2018).
Hamrick, J. B. et al. Combining q-learning and search with amortized value estimates. In Proc. International Conference on Learning Representations (ICLR) (2020).
Ma, I., Phaneuf, C., van Opheusden, B., Ma, W. J. & Hartley, C. The component processes of complex planning follow distinct developmental trajectories. Preprint at PsyArXiv https://doi.org/10.31234/osf.io/d62rw (2022).
Padoa-Schioppa, C. & Assad, J. A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
Cornelissen, F. W., Peters, E. M. & Palmer, J. The Eyelink Toolbox: eye tracking with MATLAB and the Psychophysics Toolbox. Behav. Res. Methods Instr. Comput. 34, 613–617 (2002).
Zermelo, E. Die berechnung der turnier-ergebnisse als ein maximumproblem der wahrscheinlichkeitsrechnung. Math. Z. 29, 436–460 (1929).
Hunter, D. R. MM algorithms for generalized Bradley-Terry models. Ann. Stat. 32, 384–406 (2004).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction Vol. 1 (MIT Press, 1998).
Sutton, R. S., McAllester, D. A., Singh, S. P. & Mansour, Y. in Advances in Neural Information Processing Systems 1057–1063 (2000).
Dawson, R. Unbiased Tests, Unbiased Estimators, and Randomized Similar Regions. PhD thesis, Harvard Univ. (1953).
de Groot, M. H. Unbiased sequential estimation for binomial populations. Ann. Math. Stat. 30, 80–101 (1959).
Huyer, W. & Neumaier, A. Global optimization by multilevel coordinate search. J. Glob. Optim. 14, 331–355 (1999).
Acknowledgements
We thank Z. Shu for piloting an early version of the experiment; F. Khalidi for assistance with data collection; and A. Mihali, A. Yoo, M. Honig, L. Acerbi, W. Adler, F. Callaway, T. Griffiths and M. Mattar, and the other current members and alumni of the Ma laboratory for discussions. This work was supported by grant number IIS-1344256 to W.J.M. and by Graduate Research Fellowship number DGE1839302 to I.K. from the National Science Foundation.
Author information
Authors and Affiliations
Contributions
All of the authors contributed to conceptualization of the research. B.v.O., G.G., I.K. and Y.L. collected data. B.v.O., I.K., G.G., Y.L. and Z.B. developed software, methodology and performed analysis. B.v.O., I.K. and W.J.M. wrote the paper. W.J.M. supervised the project and acquired funding.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Quentin Huys and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Model comparison.
We validate our main model specification by comparing to alternatives in three categories: lesions generated by removing model components (red), extensions generated by adding new model components (blue) and modifications generated by replacing a model component with a similar implementation (green). A. Cross-validated log-likelihood per move, across all participants in the laboratory experiments. Error bars indicate mean and s.e.m. of the difference in log-likelihood with the main model. B–F. Same as A., for participants in the human-vs-human, generalization, eye tracking, learning and time pressure experiments.
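The comparison underlying these panels reduces to averaging, across participants, the difference in cross-validated log-likelihood per move between each alternative model and the main model. A minimal sketch is given below; the per-participant log-likelihood arrays are simulated placeholders, not the released data format.

```python
# Minimal sketch: mean and s.e.m. of the per-participant difference in
# cross-validated log-likelihood per move between an alternative model and the main model.
import numpy as np

rng = np.random.default_rng(0)
n_participants = 40

# Simulated placeholder: cross-validated log-likelihood per move, one value per participant.
ll_main = rng.normal(-2.0, 0.2, n_participants)                       # main model
ll_lesion = ll_main - np.abs(rng.normal(0.1, 0.05, n_participants))   # a lesioned alternative

diff = ll_lesion - ll_main                       # negative values favour the main model
mean_diff = diff.mean()
sem_diff = diff.std(ddof=1) / np.sqrt(n_participants)
print(f"difference in log-likelihood per move: {mean_diff:.3f} +/- {sem_diff:.3f} (s.e.m.)")
```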
Extended Data Fig. 2 Parameter validation.
Because model fitting is too computationally expensive for parameter recovery, we assess the reliability of the parameter estimates using less computationally expensive methods. A. Pearson correlation across participants between model parameters estimated in two independent fits. Error bars indicate the confidence interval. B. Same as A., for different sessions in the learning experiment. Error bars indicate s.e.m. across participants. C-D. Same as A-B., for the derived metrics. E. Two-sample Kolmogorov-Smirnov test statistic between the distribution of \({\hat{\theta }}_{j}^{{\rm{lesion}}\,i}\) and \({\hat{\theta }}_{j}^{{\rm{full}}}\) for each pair of parameters. In all panels, we indicate tests that are significant after correcting for multiple comparisons using the false discovery rate by *: α = 0.05, **: α = 0.01, ***: α = 0.001. For significant tests, we additionally report uncorrected two-sided p-values. F. Trade-offs between model parameters, quantified as the Pearson correlation between \({\hat{\theta }}_{i}^{{\rm{full}}}\) and \({\hat{\theta }}_{j}^{{\rm{full}}}-{\hat{\theta }}_{j}^{{\rm{lesion}}\,i}\) for each pair of model parameters. G-H. Same as E-F., for the derived metrics.
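The reliability checks described above combine standard tests. The sketch below illustrates them under assumed data shapes: `fit1` and `fit2` stand for parameter estimates from two independent fits of the full model and `fit_lesion` for estimates after removing one model component; all arrays are simulated and the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import pearsonr, ks_2samp
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_subj, n_par = 40, 5

# Simulated parameter estimates: two independent fits of the full model, plus a lesion fit.
fit1 = rng.normal(size=(n_subj, n_par))
fit2 = fit1 + rng.normal(scale=0.3, size=(n_subj, n_par))
fit_lesion = fit1 + rng.normal(scale=0.5, size=(n_subj, n_par))

# A. Test-retest reliability of each parameter across participants.
r_p = [pearsonr(fit1[:, j], fit2[:, j]) for j in range(n_par)]

# E. Two-sample Kolmogorov-Smirnov test between full-model and lesion-model estimates.
ks_p = [ks_2samp(fit1[:, j], fit_lesion[:, j]).pvalue for j in range(n_par)]

# Correct all p-values for multiple comparisons using the false discovery rate.
pvals = [p for _, p in r_p] + ks_p
rejected, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(rejected, np.round(p_adj, 3))
```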
Extended Data Fig. 3 Summary statistics.
Comparing our main model directly to human choices is challenging because the data is high-dimensional and discrete. Instead, we compute summary statistics as a function of number of pieces on the board, to probe for systematic patterns in the time course of people’s games, such as a tendency to start playing near the centre of the board and gradually expand outwards. We compare moves made in human-vs-human games (green solid lines), the behavioural model with inferred parameters on the same positions (blue solid lines) or random moves (black dashed lines). For all summary statistics, people deviate considerably from random, and the main model closely matches the human data. All panels depict cross-validated predictions.
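As an illustration of one such summary statistic, the sketch below computes the mean distance of chosen squares from the board centre as a function of the number of pieces already on the board, assuming the 4 × 9 board used in the task; the move records and their field layout are hypothetical placeholders rather than the released data format.

```python
import numpy as np

N_ROWS, N_COLS = 4, 9                                    # board size assumed from the task
CENTRE = np.array([(N_ROWS - 1) / 2, (N_COLS - 1) / 2])

def distance_from_centre(moves):
    """Mean distance of chosen squares from the board centre,
    binned by the number of pieces already on the board.
    Each row of `moves` is assumed to be (n_pieces_on_board, row, col)."""
    moves = np.asarray(moves, dtype=float)
    dists = np.linalg.norm(moves[:, 1:3] - CENTRE, axis=1)
    return {int(n): dists[moves[:, 0] == n].mean()
            for n in np.unique(moves[:, 0])}

# Random moves as a baseline; human and model moves would be passed in the same format.
rng = np.random.default_rng(2)
random_moves = np.column_stack([
    rng.integers(0, 36, 500),        # pieces already on the board
    rng.integers(0, N_ROWS, 500),    # row of the chosen square
    rng.integers(0, N_COLS, 500),    # column of the chosen square
])
print(distance_from_centre(random_moves))
```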
Extended Data Fig. 4 Individual differences across summary statistics.
Each panel shows a scatterplot for the same set of summary statistics as in Extended Data Fig. 3, where each point represents a participant in the human-vs-human experiment, the horizontal coordinate the statistic computed on that participant’s moves, and the vertical coordinate the statistic computed on moves made by the model, with parameters inferred for that participant on out-of-sample choices. The Pearson correlation coefficient and two-sided p-value are reported within each panel. The model accurately predicts individual differences between participants.
Extended Data Fig. 5 Example board positions illustrating model components.
To investigate which patterns in the data are explained by tree search and feature dropping, we compare the distribution of choices predicted by the main model against lesion models. A. Example positions from human-vs-human games in which the model with (right column) and without tree search (left column) make highly different predictions (red shade), as quantified by Jensen-Shannon divergence. In each position, we also show the models’ preferred move (with an x) and the move made by the human participant (open circle). These predictions are averaged across simulations with 200 different parameter vectors from fits to human data, to capture positions with robust differences between planning and no planning. Upon inspection, we recognize these positions as ones where the player to move has multiple reasonable options, but evaluating their quality requires calculating many moves ahead. For example, in the second position, the move preferred by the No tree model loses, whereas the move preferred by the main model leads to a draw, but establishing this relies on a specific 10-move forced sequence that can only be found through explicit search. B. Same as A., but lesioning the feature drop mechanism, and using the ratio of the predicted probabilities of the human move as the metric for selecting positions. The feature drop mechanism is primarily needed to account for people’s tendency to overlook opportunities to immediately make four-in-a-row, or to block immediate four-in-a-row threats by the opponent.
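The selection criterion in panel A is the Jensen-Shannon divergence between the two models’ predicted move distributions. A minimal sketch, assuming each model outputs a probability for every candidate move (the example distributions are made up), is:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two move distributions;
    scipy returns the distance, i.e. the square root of the divergence."""
    return jensenshannon(p, q, base=2) ** 2

# Made-up predicted probabilities over four candidate moves in one position.
p_main = np.array([0.60, 0.25, 0.10, 0.05])      # main model
p_no_tree = np.array([0.10, 0.15, 0.05, 0.70])   # No tree model
# In the analysis, this quantity is averaged across predictions from 200 parameter vectors.
print(f"JS divergence: {js_divergence(p_main, p_no_tree):.3f}")
```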
Extended Data Fig. 6 Turing test.
In the Turing test, we showed participants video segments of sequences of moves, on average 9.38 moves long. A. Classification accuracy in the Turing test as a function of video length. Error bars indicate s.e.m. Participants are at chance level for classification of one-move videos (of which there were 8), and their accuracy only substantially exceeds 50% for sequences longer than 10 moves. A mixed effects linear regression with accuracy as the dependent variable and observer-specific random intercepts estimates the increase in accuracy per observed move at only 0.33 ± 0.10%. B. Histogram of the percentage of observers classifying a given video as human-vs-human or computer-vs-computer, for either human games (pink) or computer-generated games (grey). Although human games are on average more likely to be classified as human and computer games as computer-generated, there are no videos for which all 30 observers agree, and there is a considerable fraction of videos (63 out of 180) for which a majority of observers respond incorrectly.
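The regression reported in panel A can be sketched as follows, assuming one row per observer-video pair with a 0/1 correctness score; the data frame is simulated and the effect size is set near the reported value purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_observers, n_videos = 30, 60
video_length = rng.integers(1, 30, n_videos)           # number of moves per video

observer = np.repeat(np.arange(n_observers), n_videos)
n_moves = np.tile(video_length, n_observers)
p_correct = np.clip(0.5 + 0.0033 * n_moves, 0, 1)      # weak increase with video length
correct = rng.binomial(1, p_correct)

df = pd.DataFrame({"observer": observer, "n_moves": n_moves, "correct": correct})

# Linear mixed-effects model: accuracy as a function of video length,
# with observer-specific random intercepts.
fit = smf.mixedlm("correct ~ n_moves", data=df, groups=df["observer"]).fit()
print(fit.summary())
```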
Extended Data Fig. 7 Eye tracking.
A. Coefficients in a linear regression predicting participants’ attentional distribution from the distribution of squares that the model includes in its principal variation at each depth. The regression coefficients are significantly greater than zero (one-sample t-test across participants) for depths up to 7, and highest for depths close to 1. Error bars indicate s.e.m. across participants. B. Example positions from the eye tracking data in which the No feature drop model assigns low probability to the participant’s move. The right column shows the eye movements while the participant contemplates their move. In most positions, the participant spends no time whatsoever looking at the square preferred by the model, suggesting that they indeed dropped the relevant four-in-a-row feature.
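The panel A analysis can be sketched as a per-participant regression of the gaze distribution over squares on indicators of principal-variation membership at each depth, followed by a one-sample t-test on the coefficients across participants. All arrays below are simulated, and the preprocessing of the gaze data is an assumption of the sketch.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(4)
n_subj, n_squares, max_depth = 20, 36, 10

betas = np.zeros((n_subj, max_depth))
for s in range(n_subj):
    # Indicator of whether each square appears in the principal variation at each depth.
    pv = rng.binomial(1, 0.1, size=(n_squares, max_depth)).astype(float)
    # Simulated attention distribution: more gaze time on shallow principal-variation squares.
    gaze = pv @ (1.0 / np.arange(1, max_depth + 1)) + rng.normal(0, 0.2, n_squares)
    gaze = np.clip(gaze, 0, None)
    gaze /= gaze.sum()
    # Per-participant regression of the gaze distribution on the depth-wise indicators.
    X = np.column_stack([np.ones(n_squares), pv])
    coef, *_ = np.linalg.lstsq(X, gaze, rcond=None)
    betas[s] = coef[1:]

# One-sample t-test of the regression coefficients against zero, at each depth.
t_stat, p_val = ttest_1samp(betas, 0.0, axis=0)
print(np.round(p_val, 4))
```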
Extended Data Fig. 8 Playing strength correlations and response times.
A. Planning depth vs Elo rating of all participants in the learning (green) and time pressure (purple) experiments. Playing strength correlates with planning depth (ρ = 0.62, p < 0.001). B. Same as A., for feature drop rate (ρ = −0.73, p < 0.001). C. Same as A., for heuristic quality, which does not significantly correlate with playing strength (ρ = 0.11, p = 0.088). D. Response times for participants in each session of the learning experiment. Error bars indicate s.e.m. across participants. Participants play slightly faster in later sessions, so our finding of increased planning in later sessions is not confounded by an increase in thinking time; instead, people plan more while using less time. E. Same as D., for the time pressure experiment. The time limit manipulation is effective at increasing participants’ response times, even though they use only a fraction of the available time on average.
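A sketch of the correlations in panels A–C, reading ρ as a Spearman rank correlation and assuming one planning-depth estimate and one Elo rating per participant; the values are simulated, not the study data.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n = 90
elo = rng.normal(0, 100, n)                                       # rating relative to the population mean
depth = np.clip(2 + 0.01 * elo + rng.normal(0, 1, n), 1, None)    # deeper planning for stronger players

rho, p = spearmanr(elo, depth)
print(f"rho = {rho:.2f}, p = {p:.3g}")
```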
Extended Data Fig. 9 Memory and reconstruction experiment.
A. Error rates in the memory and reconstruction experiment. Although experts are slightly worse than novices in the extra piece error rate (β = 0.0071 ± 0.0031, p = 0.049), experts substantially outperform novices in the missed piece rate (β = 0.037 ± 0.006, p < 0.001) and the wrong colour rate (β = 0.019 ± 0.003, p < 0.001). B. Scatterplot of total reconstruction time for experts and novices. Each point represents a board position in the memory and reconstruction experiment, the x-coordinate the average time that experts take to finish their reconstruction, and the y-coordinate the same for novices. Positions from games are coloured pink, randomly scrambled positions grey. Experts take more time to reconstruct pieces (β = 2.73 ± 0.57, p < 0.001), meaning that the error rate result could reflect a speed-accuracy trade-off rather than an overall improvement. However, experts reconstruct game-relevant features such as 3-in-a-row more accurately in the same amount of time. C. Example position from the memory and reconstruction experiment. The original board contains a 3-in-a-row feature on the bottom row (yellow shading). In the reconstructions, each circle indicates the distribution of pieces placed by different observers, with the angles of the grey, black and white wedges indicating the probability for that square to be empty, contain a black piece or contain a white piece, respectively. Novices correctly reconstruct the 3-in-a-row feature 42.1% of the time, whereas experts do so 84.2% of the time. Together, these results suggest that players represent boards in memory in terms of game-relevant features.
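The three error types in panel A can be computed by comparing each original board with its reconstruction. The sketch below assumes boards coded as 0 = empty, 1 = black, 2 = white and reports simple per-square rates, which may differ from the exact normalization used in the analysis; the example boards are made up.

```python
import numpy as np

def reconstruction_errors(original, reconstruction):
    """Per-square rates of the three reconstruction error types."""
    original = np.asarray(original)
    reconstruction = np.asarray(reconstruction)
    extra = np.mean((original == 0) & (reconstruction != 0))    # piece added on an empty square
    missed = np.mean((original != 0) & (reconstruction == 0))   # occupied square left empty
    wrong = np.mean((original != 0) & (reconstruction != 0)
                    & (original != reconstruction))             # piece kept but colour flipped
    return extra, missed, wrong

original = np.array([[1, 2, 0], [0, 1, 0], [2, 0, 1]])
reconstruction = np.array([[1, 0, 0], [2, 1, 0], [1, 0, 1]])
print(reconstruction_errors(original, reconstruction))
```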
Supplementary information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
van Opheusden, B., Kuperwajs, I., Galbiati, G. et al. Expertise increases planning depth in human gameplay. Nature 618, 1000–1005 (2023). https://doi.org/10.1038/s41586-023-06124-2