Quantum reinforcement learning during human decision-making


Classical reinforcement learning (CRL) has been widely applied in neuroscience and psychology; however, quantum reinforcement learning (QRL), which shows superior performance in computer simulations, has never been empirically tested on human decision-making. Moreover, all current successful quantum models for human cognition lack connections to neuroscience. Here we studied whether QRL can properly explain value-based decision-making. We compared 2 QRL and 12 CRL models by using behavioural and functional magnetic resonance imaging data from healthy and cigarette-smoking subjects performing the Iowa Gambling Task. In all groups, the QRL models performed well when compared with the best CRL models and further revealed the representation of quantum-like internal-state-related variables in the medial frontal gyrus in both healthy subjects and smokers, suggesting that value-based decision-making can be illustrated by QRL at both the behavioural and neural levels.
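The abstract contrasts classical value updates with quantum reinforcement learning. The QRL literature this work builds on, for example Dong et al.'s 2008 QRL algorithm, represents the available actions as amplitudes of a superposition state. It reinforces a chosen action by Grover-style amplitude amplification, and choice probabilities then follow the Born rule (squared amplitude magnitudes).

The Python sketch below illustrates only that primitive on four decks. It assumes a standard, non-generalized Grover iteration. The helper names and the outcome-to-iteration rule in `reinforce` (its `k` and `max_iter` parameters) are hypothetical. They are not the update rules of the paper's QRL models, which this excerpt does not specify.

```python
import numpy as np

N_DECKS = 4  # Iowa Gambling Task decks A-D


def uniform_state(n: int = N_DECKS) -> np.ndarray:
    """Equal superposition over n actions: every amplitude is 1/sqrt(n)."""
    return np.full(n, 1.0 / np.sqrt(n), dtype=complex)


def grover_iteration(psi: np.ndarray, action: int) -> np.ndarray:
    """One standard Grover iteration G = U_s U_a, rotating psi toward |action>."""
    s = uniform_state(psi.size)
    # Oracle U_a = I - 2|a><a|: flip the phase of the chosen action's amplitude.
    out = psi.copy()
    out[action] = -out[action]
    # Diffusion U_s = 2|s><s| - I: reflect about the equal superposition.
    return 2 * s * np.vdot(s, out) - out


def born_probabilities(psi: np.ndarray) -> np.ndarray:
    """Choice probabilities are the squared magnitudes of the amplitudes."""
    p = np.abs(psi) ** 2
    return p / p.sum()


def reinforce(psi: np.ndarray, action: int, outcome: float,
              k: float = 0.05, max_iter: int = 1) -> np.ndarray:
    """Hypothetical rule: more Grover iterations for larger positive outcomes.

    Non-positive outcomes leave the state unchanged in this toy version.
    """
    n_iter = min(int(k * max(outcome, 0.0)), max_iter)
    for _ in range(n_iter):
        psi = grover_iteration(psi, action)
    return psi


psi = reinforce(uniform_state(), action=2, outcome=100.0)
print(np.round(born_probabilities(psi), 3))  # deck index 2 now carries all the probability
```

With four decks and an equal superposition, a single Grover iteration already rotates the state fully onto the chosen deck. Any model of gradual learning therefore has to scale or restrict the amount of amplification, and the cap in `reinforce` is only a crude stand-in for that.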


Fig. 1: Task diagram and task performance.
Fig. 2: Diagrams of model architecture.
Fig. 3: The AICc and BIC of each model, computed separately for each group.
Fig. 4: The inferred model probability of each model, computed separately for each group.
Fig. 5: The simulation results of each model, computed separately for each group.
Fig. 6: Activity related to the generalized quantum distance (computed by the QSPP model) in the control group.
Fig. 7: fMRI results of the uncertainty × penalty/reward interaction.
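Fig. 3 ranks models by the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC). The Python sketch below shows the textbook formulas for both, computed from a model's maximized log-likelihood, as a reminder of how each penalizes extra parameters.

The function names are illustrative. Two further choices are our assumptions, because the excerpt does not say how the paper aggregated the scores:

- n is the number of choices a subject made.
- Per-subject scores are summed within a group (`group_total`).

```python
import math
from typing import Iterable, Tuple


def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion (lower is better).

    log_lik: maximized log-likelihood; k: free parameters; n: observations.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1 observations")
    aic = 2 * k - 2 * log_lik
    return aic + 2 * k * (k + 1) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def group_total(fits: Iterable[Tuple[float, int, int]], criterion=bic) -> float:
    """Hypothetical aggregation: sum a criterion over (log_lik, k, n) per subject."""
    return sum(criterion(ll, k, n) for ll, k, n in fits)


# Illustrative numbers only: two subjects fitted by a 4-parameter model.
fits = [(-100.0, 4, 100), (-120.0, 4, 100)]
print(round(aicc(*fits[0]), 3), round(bic(*fits[0]), 3), round(group_total(fits), 3))
```

AICc's correction term shrinks as n grows relative to k, so it approaches plain AIC for long sessions. BIC's penalty of ln n per parameter exceeds AIC's penalty of 2 once n is above e² ≈ 7.4 observations, and it keeps growing with n. BIC therefore favours simpler models more strongly in typical task-length datasets.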

Data availability

All data are available from the corresponding author on reasonable request.

Code availability

All code used to generate the results central to the main claims in this study is available from the corresponding author on reasonable request.




Acknowledgements

We thank Y. Yang, R. Zha, J. Busemeyer and N. Ma for their inspirational comments. We thank L. Acerbi, G. R. Yang, C. Gneiting, A. Miranowicz, X. Li, Z. Jin and X. Li for their helpful suggestions. This work was supported by grants from the National Key Basic Research Programme (grant nos. 2016YFA0400900 and 2018YFC0831101), the National Natural Science Foundation of China (grant nos. 31471071, 31771221, 61773360, 71671115, 71874170 and 71942003), the Fundamental Research Funds for the Central Universities of China, the MURI Center for Dynamic Magneto-Optics via the Air Force Office of Scientific Research (AFOSR; grant no. FA9550-14-1-0040), the Army Research Office (ARO; grant no. W911NF-18-1-0358), the Asian Office of Aerospace Research and Development (AOARD; grant no. FA2386-18-1-4045), the Japan Science and Technology Agency (JST; via the Q-LEAP programme and CREST grant no. JPMJCR1676), the Japan Society for the Promotion of Science (JSPS; JSPS–RFBR grant no. 17-52-50023 and JSPS–FWO grant no. VS.059.18N), the RIKEN–AIST Challenge Research Fund, the Templeton Foundation, the Foundational Questions Institute (FQXi), the NTT PHI Laboratory, the Australian Research Council’s Discovery Projects funding scheme (project no. DP190101566), the Alexander von Humboldt Foundation and the US Office of Naval Research. The funders had no role in study design, data collection and analysis, the decision to publish or the preparation of the manuscript. We thank the Bioinformatics Centre of the School of Life Science, University of Science and Technology of China, for providing supercomputing resources for this project.

Author information

L.J.-A., Y.P. and X.Z. conceived the study. Y.L. and Z.W. provided the devices and collected the data. L.J.-A. built the models. L.J.-A. and Z.W. analysed the data. All authors participated in discussions. L.J.-A., D.D., Y.P., F.N. and X.Z. wrote the paper. X.Z. supervised the project and acquired funding.

Correspondence to Xiaochu Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Primary Handling Editor: Stavroula Kousta.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Results, Supplementary Discussion, Supplementary Figs. 1–13, Supplementary Tables 1–6 and Supplementary References.

Reporting Summary

About this article

Cite this article

Li, J., Dong, D., Wei, Z. et al. Quantum reinforcement learning during human decision-making. Nat Hum Behav 4, 294–307 (2020). https://doi.org/10.1038/s41562-019-0804-2
