Outracing champion Gran Turismo drivers with deep reinforcement learning

Abstract

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits¹. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
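
The abstract describes a reward function that balances raw competitiveness against racing's sportsmanship norms, and Extended Data Table 1 lists the reward weights used. As a minimal, hedged sketch of that idea, the snippet below shows a weighted composite per-step reward; the component names, weights and interface are illustrative assumptions, not the values or terms reported in the paper.

```python
# Illustrative sketch only: a weighted composite racing reward of the kind the
# abstract describes. Component names and weights are hypothetical placeholders,
# not the values from Extended Data Table 1.
from dataclasses import dataclass

@dataclass
class StepMetrics:
    course_progress_m: float    # metres of track progress gained this step
    off_course_time_s: float    # time spent outside track limits this step
    collision_magnitude: float  # severity of any car-to-car contact caused
    wall_contact_time_s: float  # time in contact with a wall this step

# Hypothetical weights trading speed against clean, sportsmanlike driving.
WEIGHTS = {
    "progress": 1.0,
    "off_course": -5.0,
    "collision": -10.0,
    "wall": -2.0,
}

def composite_reward(m: StepMetrics) -> float:
    """Sum of weighted shaping terms: reward progress, penalise leaving the
    track, hitting walls, or causing contact with opponents."""
    return (
        WEIGHTS["progress"] * m.course_progress_m
        + WEIGHTS["off_course"] * m.off_course_time_s
        + WEIGHTS["collision"] * m.collision_magnitude
        + WEIGHTS["wall"] * m.wall_contact_time_s
    )
```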

Fig. 1: Training.
Fig. 2: Ablations.
Fig. 3: Results.

Data availability

There are no static data associated with this project. All data are generated from scratch by the agent each time it learns. Videos of the races are available at https://sonyai.github.io/gt_sophy_public.

Code availability

Pseudocode detailing the training process and algorithms used is available as a supplement to this article. The agent interface in GT is not enabled in commercial versions of the game; however, Polyphony Digital has provided a small number of universities and research facilities outside Sony access to the API and is considering working with other groups.

References

  1. Milliken, W. F. et al. Race Car Vehicle Dynamics Vol. 400 (Society of Automotive Engineers, 1995).

  2. Mnih, V. et al. Playing Atari with deep reinforcement learning. Preprint at https://arxiv.org/abs/1312.5602 (2013).

  3. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  4. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  5. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  6. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).

  7. Laurense, V. A., Goh, J. Y. & Gerdes, J. C. In 2017 American Control Conference (ACC) 5586–5591 (IEEE, 2017).

  8. Spielberg, N. A., Brown, M., Kapania, N. R., Kegelman, J. C. & Gerdes, J. C. Neural network vehicle models for high-performance automated driving. Sci. Robot. 4, eaaw1975 (2019).

  9. Burke, K. Data makes it beta: Roborace returns for second season with updateable self-driving vehicles powered by NVIDIA DRIVE. The Official NVIDIA Blog https://blogs.nvidia.com/blog/2020/10/29/roborace-second-season-nvidia-drive/ (2020).

  10. Leporati, G. No driver? no problem—this is the Indy Autonomous Challenge. Ars Technica https://arstechnica.com/cars/2021/07/a-science-fair-or-the-future-of-racing-the-indy-autonomous-challenge/ (2021).

  11. Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. In 2016 IEEE International Conference on Robotics and Automation (ICRA) 1433–1440 (IEEE, 2016).

  12. Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Trans. Robot. 34, 1603–1622 (2018).

  13. Pan, Y. et al. In Proc. Robotics: Science and Systems XIV (eds Kress-Gazit, H., Srinivasa, S., Howard, T. & Atanasov, N.) https://doi.org/10.15607/RSS.2018.XIV.056 (Carnegie Mellon Univ., 2018).

  14. Pan, Y. et al. Imitation learning for agile autonomous driving. Int. J. Robot. Res. 39, 286–302 (2020).

  15. Amazon Web Services. AWS DeepRacer League. https://aws.amazon.com/deepracer/league/ (2019).

  16. Pyeatt, L. D. & Howe, A. E. Learning to race: experiments with a simulated race car. In Proc. Eleventh International FLAIRS Conference 357–361 (AAAI, 1998).

  17. Chaperot, B. & Fyfe, C. In 2006 IEEE Symposium on Computational Intelligence and Games 181–186 (IEEE, 2006).

  18. Cardamone, L., Loiacono, D. & Lanzi, P. L. In Proc. 11th Annual Conference on Genetic and Evolutionary Computation 1179–1186 (ACM, 2009).

  19. Cardamone, L., Loiacono, D. & Lanzi, P. L. In 2009 IEEE Congress on Evolutionary Computation 2622–2629 (IEEE, 2009).

  20. Loiacono, D., Prete, A., Lanzi, L. & Cardamone, L. In IEEE Congress on Evolutionary Computation 1–8 (IEEE, 2010).

  21. Jaritz, M., de Charette, R., Toromanoff, M., Perot, E. & Nashashibi, F. In 2018 IEEE International Conference on Robotics and Automation (ICRA) 2070–2075 (IEEE, 2018).

  22. Weiss, T. & Behl, M. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) 1163–1168 (IEEE, 2020).

  23. Weiss, T., Babu, V. S. & Behl, M. In NeurIPS 2020 Workshop on Machine Learning for Autonomous Driving (NeurIPS, 2020).

  24. Fuchs, F., Song, Y., Kaufmann, E., Scaramuzza, D. & Dürr, P. Super-human performance in Gran Turismo Sport using deep reinforcement learning. IEEE Robot. Autom. Lett. 6, 4257–4264 (2021).

  25. Song, Y., Lin, H., Kaufmann, E., Dürr, P. & Scaramuzza, D. In Proc. IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2021).

  26. Theodosis, P. A. & Gerdes, J. C. In Dynamic Systems and Control Conference Vol. 45295, 235–241 (American Society of Mechanical Engineers, 2012).

  27. Funke, J. et al. In 2012 IEEE Intelligent Vehicles Symposium 541–547 (IEEE, 2012).

  28. Kritayakirana, K. & Gerdes, J. C. Autonomous vehicle control at the limits of handling. Int. J. Veh. Auton. Syst. 10, 271–296 (2012).

  29. Bonkowski, J. Here’s what you missed from the Indy Autonomous Challenge main event. Autoweek https://www.autoweek.com/racing/more-racing/a38069263/what-missed-indy-autonomous-challenge-main-event/ (2021).

  30. Rutherford, S. J. & Cole, D. J. Modelling nonlinear vehicle dynamics with neural networks. Int. J. Veh. Des. 53, 260–287 (2010).

  31. Pomerleau, D. A. In Robot Learning (eds Connell, J. H. & Mahadevan, S.) 19–43 (Springer, 1993).

  32. Togelius, J. & Lucas, S. M. In 2006 IEEE International Conference on Evolutionary Computation 1187–1194 (IEEE, 2006).

  33. Schwarting, W. et al. Deep latent competition: learning to race using visual control policies in latent space. Preprint at https://arxiv.org/abs/2102.09812 (2021).

  34. Gozli, D. G., Bavelier, D. & Pratt, J. The effect of action video game playing on sensorimotor learning: evidence from a movement tracking task. Hum. Mov. Sci. 38, 152–162 (2014).

  35. Davids, K., Williams, A. M. & Williams, J. G. Visual Perception and Action in Sport (Routledge, 2005).

  36. Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. In Proc. 35th International Conference on Machine Learning 1856–1865 (PMLR, 2018).

  37. Haarnoja, T. et al. Soft actor-critic algorithms and applications. Preprint at https://arxiv.org/abs/1812.05905 (2018).

  38. Mnih, V. et al. In Proc. 33rd International Conference on Machine Learning 1928–1937 (PMLR, 2016).

  39. Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. In 32nd AAAI Conference on Artificial Intelligence (AAAI, 2018).

  40. Lin, L.-J. Reinforcement Learning for Robots Using Neural Networks. Dissertation, Carnegie Mellon Univ. (1993).

  41. Siu, H. C. et al. Evaluation of human-AI teams for learned and rule-based agents in Hanabi. Preprint at https://arxiv.org/abs/2107.07630 (2021).

  42. Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).

  43. Devore, J. L. Probability and Statistics for Engineering and the Sciences 6th edn (Brooks/Cole, 2004).

  44. Xia, L., Zhou, Z., Yang, J. & Zhao, Q. DSAC: distributional soft actor critic for risk-sensitive reinforcement learning. Preprint at https://arxiv.org/abs/2004.14547 (2020).

  45. Fujimoto, S., van Hoof, H. & Meger, D. In Proc. 35th International Conference on Machine Learning 1587–1596 (PMLR, 2018).

  46. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  47. Liu, Z., Li, X., Kang, B. & Darrell, T. In International Conference on Learning Representations (ICLR, 2021).

  48. Kingma, D. P. & Ba, J. In International Conference on Learning Representations (ICLR, 2015).

  49. Cassirer, A. et al. Reverb: a framework for experience replay. Preprint at https://arxiv.org/abs/2102.04736 (2021).

  50. Narvekar, S., Sinapov, J., Leonetti, M. & Stone, P. In Proc. 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016) (2016).

Acknowledgements

We thank K. Yamauchi, S. Takano, A. Hayashi, C. Ferreira, N. Nozawa, T. Teramoto, M. Hakim, K. Yamada, S. Sakamoto, T. Ueda, A. Yago, J. Nakata and H. Imanishi at Polyphony Digital for making the Gran Turismo franchise, providing support throughout the project and organizing the Race Together events on 2 July 2021 and 21 October 2021. We also thank U. Gallizzi, J. Beltran, G. Albowicz, R. Abdul-ahad and the staff at CGEI for access to their PlayStation Now network to train agents and their help building the infrastructure for our experiments. We benefited from the advice of T. Grossenbacher, a retired competitive GT driver. Finally, we thank E. Kato Marcus and E. Ohshima of Sony AI, who managed the partnership activities with Polyphony Digital and Sony Interactive Entertainment.

Author information

Contributions

P.R.W. managed the project. S.B., K.K., P.K., J.M., K.S. and T.J.W. led the research and development efforts. R.C., A.D., F.E., F.F., L.G., V.K., H.L., P.M., D.O., C.S., T.S. and M.D.T. participated in the research and the development of GT Sophy and the AI libraries. H.A., L.B., R.D. and D.W. built the research platform that connected to CGEI’s PlayStation network. P.S. provided executive support and technical and research advice and P.D. provided executive support and technical advice. H.K. and M.S. conceived and set up the project, provided executive support, resources and technical advice and managed stakeholders.

Corresponding author

Correspondence to Peter R. Wurman.

Ethics declarations

Competing interests

 P.R.W. and other team members have submitted US provisional patent application 63/267,136 covering aspects of the scenario training techniques described in this paper.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Seaside and Sarthe training.

Kudos Prime data from global time-trial challenges on Seaside (a and b) and Sarthe (c and d), with the cars used in the competition. Note that these histograms represent the single best lap time for more than 12,000 individual players on Seaside and almost 9,000 on Sarthe. In both cases, the secondary diagrams compare the top five human times to a histogram of 100 laps by the 2 July 2021 time-trial version of GT Sophy. In both cases, the data show that GT Sophy was reliably superhuman, with all 100 laps better than the best human laps. Not surprisingly, it takes longer for the agent to train on the much longer Sarthe course, taking 48 h to reach the 99th percentile of human performance. e, Histogram of a snapshot of the ERB (experience replay buffer) during training on Sarthe on the basis of the scenario breakdown in Fig. 1f. The x axis is the course position and the stacked colours represent the number of samples that were collected in that region from each scenario. In a more condensed format than Fig. 1f, f and g show the sections of Seaside and Maggiore that were used for skill training.
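
For readers wanting to reproduce this style of comparison, the sketch below shows one way to place a batch of agent lap times against a distribution of human best laps, answering "what fraction of human drivers' best laps are slower than the agent's slowest lap?". The arrays are synthetic placeholders, not the Kudos Prime data used in the figure.

```python
# Sketch of the lap-time comparison in the caption: what fraction of human best
# laps does the agent's slowest lap still beat? Data below are placeholders.
import numpy as np

def fraction_beaten(agent_laps_s: np.ndarray, human_best_laps_s: np.ndarray) -> float:
    """Fraction of human drivers whose best lap is slower than the agent's
    slowest lap (lower lap times are better)."""
    worst_agent_lap = agent_laps_s.max()
    return float((human_best_laps_s > worst_agent_lap).mean())

# Hypothetical example: 100 agent laps vs. 12,000 human personal-best laps.
rng = np.random.default_rng(0)
agent = rng.normal(loc=118.0, scale=0.2, size=100)      # seconds
humans = rng.normal(loc=124.0, scale=3.0, size=12_000)  # seconds
print(f"Agent's slowest lap beats {fraction_beaten(agent, humans):.1%} of human best laps")
```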

Extended Data Fig. 2 Time trial on Sarthe.

An analysis of Igor Fraga’s best lap in the time-trial test compared with GT Sophy’s lap. a, Areas of the track where Igor lost time with respect to GT Sophy. Corner 20, highlighted in yellow, shows an interesting effect common to the other corners in that Igor seems to catch up a little by braking later, but then loses time because he has to brake longer and comes out of the corner slower. Igor’s steering controls (b) and Igor’s throttle and braking (c) compared with GT Sophy on corner 20. Through the steering wheel and brake pedal, Igor is able to give smooth, 60-Hz signals compared with GT Sophy’s 10-Hz action rate.
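
The caption contrasts the driver's smooth 60-Hz inputs with GT Sophy's 10-Hz action rate. As a generic illustration of how a low-rate policy interacts with a higher-rate simulation, the sketch below holds each policy action for six frames (a zero-order hold); the `env` and `policy` interfaces are hypothetical stand-ins, not the Gran Turismo API used in the paper.

```python
# Sketch of acting at 10 Hz in a 60 Hz simulation by holding each action for
# six frames. The env/policy interfaces are illustrative assumptions.
FRAMES_PER_ACTION = 6  # 60 Hz simulation / 10 Hz policy decisions

def run_episode(env, policy, max_frames: int = 10_000) -> float:
    obs = env.reset()
    total_reward, done, frame = 0.0, False, 0
    action = None
    while not done and frame < max_frames:
        if frame % FRAMES_PER_ACTION == 0:
            action = policy(obs)               # new decision every sixth frame
        obs, reward, done = env.step(action)   # held (repeated) action in between
        total_reward += reward
        frame += 1
    return total_reward
```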

Extended Data Fig. 3 Policy selection.

An illustration of the process by which policies were selected to run in the final race. Starting on the left side of the diagram, thousands of policies were generated and saved during the experiments. They were first filtered within each experiment to select the subset on the Pareto frontier of simple evaluation criteria trading off lap time against off-course and collision metrics. The selected policies were run through a series of tests evaluating their overall racing performance against a common set of opponents and their performance on a variety of hand-crafted skill tests. The results were ranked and human judgement was applied to select a small number of candidate policies. These policies were matched up in round-robin, policy-versus-policy competitions. The results were again analysed by the human committee for overall team scores and collision metrics. The best candidate policies were run in short races against test drivers at Polyphony Digital. Their subjective evaluations were included in the final decisions on which policies to run in the October 2021 event.
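
As a rough illustration of the first filtering step in this pipeline, the sketch below computes the Pareto frontier of candidate policies over lap time, off-course count and collision count, all to be minimised. The metric fields and example values are placeholders, not the committee's actual evaluation criteria.

```python
# Sketch of Pareto-frontier filtering over policy evaluation metrics, all of
# which are minimised. Metric names and values are illustrative placeholders.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PolicyEval:
    name: str
    lap_time_s: float
    off_course_count: int
    collision_count: int

    def metrics(self) -> Tuple[float, int, int]:
        return (self.lap_time_s, self.off_course_count, self.collision_count)

def dominates(a: PolicyEval, b: PolicyEval) -> bool:
    """a dominates b if it is no worse on every metric and better on at least one."""
    am, bm = a.metrics(), b.metrics()
    return all(x <= y for x, y in zip(am, bm)) and any(x < y for x, y in zip(am, bm))

def pareto_frontier(policies: List[PolicyEval]) -> List[PolicyEval]:
    """Keep only policies that no other policy dominates."""
    return [p for p in policies
            if not any(dominates(q, p) for q in policies if q is not p)]

# Hypothetical candidates: faster laps traded against cleaner driving.
candidates = [
    PolicyEval("A", 117.8, 2, 1),
    PolicyEval("B", 118.1, 0, 0),
    PolicyEval("C", 118.0, 3, 2),  # dominated by A on all three metrics
]
print([p.name for p in pareto_frontier(candidates)])  # ['A', 'B']
```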

Extended Data Table 1 Reward weights

Supplementary information

Supplementary Information

This file contains more details about the training procedures and algorithms in the form of pseudocode. It also contains several tables that detail the hyperparameters used in training.

Peer Review File

About this article

Cite this article

Wurman, P.R., Barrett, S., Kawamoto, K. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022). https://doi.org/10.1038/s41586-021-04357-7
