Abstract
Classical reinforcement learning (CRL) has been widely applied in neuroscience and psychology; however, quantum reinforcement learning (QRL), which shows superior performance in computer simulations, has never been empirically tested on human decision-making. Moreover, all current successful quantum models for human cognition lack connections to neuroscience. Here we studied whether QRL can properly explain value-based decision-making. We compared 2 QRL and 12 CRL models by using behavioural and functional magnetic resonance imaging data from healthy and cigarette-smoking subjects performing the Iowa Gambling Task. In all groups, the QRL models performed well when compared with the best CRL models and further revealed the representation of quantum-like internal-state-related variables in the medial frontal gyrus in both healthy subjects and smokers, suggesting that value-based decision-making can be illustrated by QRL at both the behavioural and neural levels.
Data availability
All data are available from the corresponding author on reasonable request.
Code availability
All code used to generate the results central to the main claims in this study is available from the corresponding author on reasonable request.
Acknowledgements
We thank Y. Yang, R. Zha, J. Busemeyer and N. Ma for their inspirational comments. We thank L. Acerbi, G. R. Yang, C. Gneiting, A. Miranowicz, X. Li, Z. Jin and X. Li for their helpful suggestions. This work was supported by grants from the National Key Basic Research Programme (grant nos. 2016YFA0400900 and 2018YFC0831101), the National Natural Science Foundation of China (grant nos. 31471071, 31771221, 61773360, 71671115, 71874170 and 71942003), the Fundamental Research Funds for the Central Universities of China, the MURI Center for Dynamic Magneto-Optics via the Air Force Office of Scientific Research (AFOSR; grant no. FA9550-14-1-0040), the Army Research Office (ARO; grant no. W911NF-18-1-0358), the Asian Office of Aerospace Research and Development (AOARD; grant no. FA2386-18-1-4045), the Japan Science and Technology Agency (JST; via the Q-LEAP programme and CREST grant no. JPMJCR1676), the Japan Society for the Promotion of Science (JSPS; JSPS–RFBR grant no. 17-52-50023 and JSPS–FWO grant no. VS.059.18N), the RIKEN–AIST Challenge Research Fund, the Templeton Foundation, the Foundational Questions Institute (FQXi) and the NTT PHI Laboratory, the Australian Research Council’s Discovery Projects funding scheme under Project DP190101566, the Alexander von Humboldt Foundation and the US Office of Naval Research. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the Bioinformatics Centre of the University of Science and Technology of China, School of Life Science for providing supercomputing resources for this project.
Author information
Contributions
L.J.-A., Y.P. and X.Z. conceived the study. Y.L. and Z.W. provided the devices and collected the data. L.J.-A. built the models. L.J.-A. and Z.W. analysed the data. All authors participated in discussions. L.J.-A., D.D., Y.P., F.N. and X.Z. wrote the paper. X.Z. supervised the project and acquired funding.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Primary Handling Editor: Stavroula Kousta.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Methods, Supplementary Results, Supplementary Discussion, Supplementary Figs. 1–13, Supplementary Tables 1–6 and Supplementary References.
About this article
Cite this article
Li, JA., Dong, D., Wei, Z. et al. Quantum reinforcement learning during human decision-making. Nat Hum Behav 4, 294–307 (2020). https://doi.org/10.1038/s41562-019-0804-2
DOI: https://doi.org/10.1038/s41562-019-0804-2