Experimental quantum speed-up in reinforcement learning agents


As the field of artificial intelligence advances, the demand for algorithms that can learn quickly and efficiently increases. An important paradigm within artificial intelligence is reinforcement learning [1], where decision-making entities called agents interact with environments and learn by updating their behaviour on the basis of the obtained feedback. The crucial question for practical applications is how fast agents learn [2]. Although various studies have made use of quantum mechanics to speed up the agent’s decision-making process [3,4], a reduction in learning time has not yet been demonstrated. Here we present a reinforcement learning experiment in which the learning process of an agent is sped up by using a quantum communication channel with the environment. We further show that combining this scenario with classical communication enables the evaluation of this improvement and allows optimal control of the learning progress. We implement this learning protocol on a compact and fully tunable integrated nanophotonic processor. The device interfaces with telecommunication-wavelength photons and features a fast active-feedback mechanism, demonstrating the agent’s systematic quantum advantage in a setup that could readily be integrated within future large-scale quantum communication networks.
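
The agent-environment loop described above (act, receive feedback, update behaviour) can be illustrated with a purely classical toy model. The sketch below is not the protocol used in the experiment: the number of actions, the 0/1 reward rule, the additive weight update and all function names are assumptions chosen only to show how an average reward can rise as an agent learns.

```python
import random


def run_agent(n_actions=4, rewarded_action=2, epochs=2000, seed=0):
    """Toy classical agent (illustrative assumption, not the paper's protocol).

    Each epoch the agent samples an action with probability proportional
    to its weight, receives a reward of 1 for the rewarded action and 0
    otherwise, and adds the reward to the weight of the action it took.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_actions  # all actions equally likely at the start
    rewards = []
    for _ in range(epochs):
        action = rng.choices(range(n_actions), weights=weights)[0]
        reward = 1 if action == rewarded_action else 0
        weights[action] += reward  # reinforce only rewarded behaviour
        rewards.append(reward)
    return rewards


def average_reward(rewards, window=100):
    """Average reward over consecutive, non-overlapping windows of epochs."""
    return [
        sum(rewards[i:i + window]) / window
        for i in range(0, len(rewards) - window + 1, window)
    ]


if __name__ == "__main__":
    eta = average_reward(run_agent())
    print("first window:", eta[0], "last window:", eta[-1])
```

In this toy model the learning time is the number of epochs needed for the windowed average reward to reach a chosen threshold, which is the kind of quantity a faster-learning agent would reduce.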

Fig. 1: Schematic of a learning agent.
Fig. 2: Experimental setup.
Fig. 3: Circuit implementation.
Fig. 4: Behaviour of the average reward η for different learning strategies.

Data availability

All the datasets used in the current work are available on Zenodo at


  1. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).

  2. Dunjko, V., Taylor, J. M. & Briegel, H. J. Quantum-enhanced machine learning. Phys. Rev. Lett. 117, 130501 (2016).

  3. Paparo, G. D., Dunjko, V., Makmal, A., Martin-Delgado, M. A. & Briegel, H. J. Quantum speedup for active learning agents. Phys. Rev. X 4, 031002 (2014).

  4. Sriarunothai, T. et al. Speeding-up the decision making of a learning agent using an ion trap quantum processor. Quantum Sci. Technol. 4, 015014 (2019).

  5. Johannink, T. et al. Residual reinforcement learning for robot control. In 2019 International Conference on Robotics and Automation (ICRA) 6023–6029 (IEEE, 2019).

  6. Tjandra, A., Sakti, S. & Nakamura, S. Sequence-to-sequence ASR optimization via reinforcement learning. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5829–5833 (IEEE, 2018).

  7. Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716–1720 (2018).

  8. Thakur, C. S. et al. Large-scale neuromorphic spiking array processors: a quest to mimic the brain. Front. Neurosci. 12, 891 (2018).

  9. Steinbrecher, G. R., Olson, J. P., Englund, D. & Carolan, J. Quantum optical neural networks. npj Quantum Inf. 5, 60 (2019).

  10. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  11. Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).

  12. Dong, D., Chen, C., Li, H. & Tarn, T.-J. Quantum reinforcement learning. IEEE Trans. Syst. Man Cybern. B 38, 1207–1220 (2008).

  13. Dunjko, V. & Briegel, H. J. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep. Prog. Phys. 81, 074001 (2018).

  14. Baireuther, P., O’Brien, T. E., Tarasinski, B. & Beenakker, C. W. J. Machine-learning-assisted correction of correlated qubit errors in a topological code. Quantum 2, 48 (2018).

  15. Breuckmann, N. P. & Ni, X. Scalable neural network decoders for higher dimensional quantum codes. Quantum 2, 68–92 (2018).

  16. Chamberland, C. & Ronagh, P. Deep neural decoders for near term fault-tolerant experiments. Quantum Sci. Technol. 3, 044002 (2018).

  17. Fösel, T., Tighineanu, P., Weiss, T. & Marquardt, F. Reinforcement learning with neural networks for quantum feedback. Phys. Rev. X 8, 031084 (2018).

  18. Poulsen Nautrup, H., Delfosse, N., Dunjko, V., Briegel, H. J. & Friis, N. Optimizing quantum error correction codes with reinforcement learning. Quantum 3, 215 (2019).

  19. Yu, S. et al. Reconstruction of a photonic qubit state with reinforcement learning. Adv. Quantum Technol. 2, 1800074 (2019).

  20. Krenn, M., Malik, M., Fickler, R., Lapkiewicz, R. & Zeilinger, A. Automated search for new quantum experiments. Phys. Rev. Lett. 116, 090405 (2016).

  21. Melnikov, A. A. et al. Active learning machine learns to create new quantum experiments. Proc. Natl Acad. Sci. USA 115, 1221–1226 (2018).

  22. Dunjko, V., Friis, N. & Briegel, H. J. Quantum-enhanced deliberation of learning agents using trapped ions. New J. Phys. 17, 023006 (2015).

  23. Jerbi, S., Poulsen Nautrup, H., Trenkwalder, L. M., Briegel, H. J. & Dunjko, V. A framework for deep energy-based reinforcement learning with quantum speed-up. Preprint at (2019).

  24. Kimble, H. J. The quantum internet. Nature 453, 1023–1030 (2008).

  25. Cacciapuoti, A. S. et al. Quantum internet: networking challenges in distributed quantum computing. IEEE Netw. 34, 137–143 (2020).

  26. Briegel, H. J. & De las Cuevas, G. Projective simulation for artificial intelligence. Sci. Rep. 2, 400 (2012).

  27. Grover, L. K. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79, 325–328 (1997).

  28. Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).

  29. Flamini, F. et al. Photonic architecture for reinforcement learning. New J. Phys. 22, 045002 (2020).

  30. Harris, N. C. et al. Quantum transport simulations in a programmable nanophotonic processor. Nat. Photon. 11, 447–452 (2017).

  31. Boyer, M., Brassard, G., Høyer, P. & Tapp, A. Tight bounds on quantum searching. Fortschr. Phys. 46, 493–505 (1998).

  32. Senellart, P., Solomon, G. & White, A. High-performance semiconductor quantum-dot single-photon sources. Nat. Nanotechnol. 12, 1026–1039 (2017).

  33. Wan, N. H. et al. Large-scale integration of artificial atoms in hybrid photonic circuits. Nature 583, 226–231 (2020).

  34. Northup, T. E. & Blatt, R. Quantum information transfer using photons. Nat. Photon. 8, 356–363 (2014).

  35. Denil, M. et al. Learning to perform physics experiments via deep reinforcement learning. Proc. Int. Conf. on Learning Representations (2017).

  36. Bukov, M. et al. Reinforcement learning in different phases of quantum control. Phys. Rev. X 8, 031086 (2018).

  37. Poulsen Nautrup, H. et al. Operationally meaningful representations of physical systems in neural networks. Preprint at (2020).

  38. Yoder, T. J., Low, G. H. & Chuang, I. L. Fixed-point quantum search with an optimal number of queries. Phys. Rev. Lett. 113, 210501 (2014).

  39. Kim, T., Fiorentino, M. & Wong, F. N. C. Phase-stable source of polarization-entangled photons using a polarization Sagnac interferometer. Phys. Rev. A 73, 012316 (2006).

  40. Saggio, V. et al. Experimental few-copy multipartite entanglement detection. Nat. Phys. 15, 935–940 (2019).

  41. Marsili, F. et al. Detecting single infrared photons with 93% system efficiency. Nat. Photon. 7, 210–214 (2013).


Acknowledgements

We thank L. A. Rozema, I. Alonso Calafell and P. Jenke for help with the detectors. A.H. acknowledges support from the Austrian Science Fund (FWF) through the project P 30937-N27. V.D. acknowledges support from the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037). N.F. acknowledges support from the Austrian Science Fund (FWF) through the project P 31339-N27. H.J.B. acknowledges support from the Austrian Science Fund (FWF) through SFB BeyondC F7102, the Ministerium für Wissenschaft, Forschung, und Kunst Baden-Württemberg (Az. 33-7533-30-10/41/1) and the Volkswagen Foundation (Az. 97721). P.W. acknowledges support from the research platform TURIS, the European Commission through ErBeStA (no. 800942), HiPhoP (no. 731473), UNIQORN (no. 820474), EPIQUS (no. 899368), and AppQInfo (no. 956071), from the Austrian Science Fund (FWF) through CoQuS (W1210-N25), BeyondC (F 7113) and Research Group (FG 5), and Red Bull GmbH. The MIT portion of the work was supported in part by AFOSR award FA9550-16-1-0391 and NTT Research.

Author information


Contributions

V.S. and B.E.A. implemented the experiment and performed data analysis. A.H., V.D., N.F., S.W. and H.J.B. developed the theoretical idea. T.S. and P.S. provided help with the experimental implementation. N.C.H., M.H. and D.E. designed the nanophotonic processor. V.S., S.W. and P.W. supervised the project. All the authors contributed to writing the paper.

Corresponding authors

Correspondence to V. Saggio or P. Walther.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Vojtěch Havlíček, Lucas Lamata and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saggio, V., Asenbeck, B.E., Hamann, A. et al. Experimental quantum speed-up in reinforcement learning agents. Nature 591, 229–233 (2021).
