Outracing champion Gran Turismo drivers with deep reinforcement learning

Abstract

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits¹. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
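
The abstract describes a reward function that balances raw competitiveness against racing's sportsmanship norms, and Extended Data Table 1 lists the reward weights used. As a minimal, hedged sketch of that idea, the snippet below shows a weighted composite per-step reward; the component names, weights and interface are illustrative assumptions, not the values or terms reported in the paper.

```python
# Illustrative sketch only: a weighted composite racing reward of the kind the
# abstract describes. Component names and weights are hypothetical placeholders,
# not the values from Extended Data Table 1.
from dataclasses import dataclass

@dataclass
class StepMetrics:
    course_progress_m: float    # metres of track progress gained this step
    off_course_time_s: float    # time spent outside track limits this step
    collision_magnitude: float  # severity of any car-to-car contact caused
    wall_contact_time_s: float  # time in contact with a wall this step

# Hypothetical weights trading speed against clean, sportsmanlike driving.
WEIGHTS = {
    "progress": 1.0,
    "off_course": -5.0,
    "collision": -10.0,
    "wall": -2.0,
}

def composite_reward(m: StepMetrics) -> float:
    """Sum of weighted shaping terms: reward progress, penalise leaving the
    track, hitting walls, or causing contact with opponents."""
    return (
        WEIGHTS["progress"] * m.course_progress_m
        + WEIGHTS["off_course"] * m.off_course_time_s
        + WEIGHTS["collision"] * m.collision_magnitude
        + WEIGHTS["wall"] * m.wall_contact_time_s
    )
```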

Fig. 1: Training.
Fig. 2: Ablations.
Fig. 3: Results.

Data availability

There are no static data associated with this project. All data are generated from scratch by the agent each time it learns. Videos of the races are available at https://sonyai.github.io/gt_sophy_public.

Code availability

Pseudocode detailing the training process and algorithms used is available as a supplement to this article. The agent interface in GT is not enabled in commercial versions of the game; however, Polyphony Digital has provided a small number of universities and research facilities outside Sony access to the API and is considering working with other groups.

References

  1. Milliken, W. F. et al. Race Car Vehicle Dynamics Vol. 400 (Society of Automotive Engineers, 1995).

  2. Mnih, V. et al. Playing Atari with deep reinforcement learning. Preprint at https://arxiv.org/abs/1312.5602 (2013).

  3. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  4. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  5. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  6. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).

  7. Laurense, V. A., Goh, J. Y. & Gerdes, J. C. In 2017 American Control Conference (ACC) 5586–5591 (IEEE, 2017).

  8. Spielberg, N. A., Brown, M., Kapania, N. R., Kegelman, J. C. & Gerdes, J. C. Neural network vehicle models for high-performance automated driving. Sci. Robot. 4, eaaw1975 (2019).

  9. Burke, K. Data makes it beta: Roborace returns for second season with updateable self-driving vehicles powered by NVIDIA DRIVE. The Official NVIDIA Blog https://blogs.nvidia.com/blog/2020/10/29/roborace-second-season-nvidia-drive/ (2020).

  10. Leporati, G. No driver? no problem—this is the Indy Autonomous Challenge. Ars Technica https://arstechnica.com/cars/2021/07/a-science-fair-or-the-future-of-racing-the-indy-autonomous-challenge/ (2021).

  11. Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. In 2016 IEEE International Conference on Robotics and Automation (ICRA) 1433–1440 (IEEE, 2016).

  12. Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Trans. Robot. 34, 1603–1622 (2018).

  13. Pan, Y. et al. In Proc. Robotics: Science and Systems XIV (eds Kress-Gazit, H., Srinivasa, S., Howard, T. & Atanasov, N.) https://doi.org/10.15607/RSS.2018.XIV.056 (Carnegie Mellon Univ., 2018).

  14. Pan, Y. et al. Imitation learning for agile autonomous driving. Int. J. Robot. Res. 39, 286–302 (2020).

  15. Amazon Web Services. AWS DeepRacer League. https://aws.amazon.com/deepracer/league/ (2019).

  16. Pyeatt, L. D. & Howe, A. E. Learning to race: experiments with a simulated race car. In Proc. Eleventh International FLAIRS Conference 357–361 (AAAI, 1998).

  17. Chaperot, B. & Fyfe, C. In 2006 IEEE Symposium on Computational Intelligence and Games 181–186 (IEEE, 2006).

  18. Cardamone, L., Loiacono, D. & Lanzi, P. L. In Proc. 11th Annual Conference on Genetic and Evolutionary Computation 1179–1186 (ACM, 2009).

  19. Cardamone, L., Loiacono, D. & Lanzi, P. L. In 2009 IEEE Congress on Evolutionary Computation 2622–2629 (IEEE, 2009).

  20. Loiacono, D., Prete, A., Lanzi, L. & Cardamone, L. In IEEE Congress on Evolutionary Computation 1–8 (IEEE, 2010).

  21. Jaritz, M., de Charette, R., Toromanoff, M., Perot, E. & Nashashibi, F. In 2018 IEEE International Conference on Robotics and Automation (ICRA) 2070–2075 (IEEE, 2018).

  22. Weiss, T. & Behl, M. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) 1163–1168 (IEEE, 2020).

  23. Weiss, T., Babu, V. S. & Behl, M. In NeurIPS 2020 Workshop on Machine Learning for Autonomous Driving (NeurIPS, 2020).

  24. Fuchs, F., Song, Y., Kaufmann, E., Scaramuzza, D. & Dürr, P. Super-human performance in Gran Turismo Sport using deep reinforcement learning. IEEE Robot. Autom. Lett. 6, 4257–4264 (2021).

  25. Song, Y., Lin, H., Kaufmann, E., Dürr, P. & Scaramuzza, D. In Proc. IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2021).

  26. Theodosis, P. A. & Gerdes, J. C. In Dynamic Systems and Control Conference Vol. 45295, 235–241 (American Society of Mechanical Engineers, 2012).

  27. Funke, J. et al. In 2012 IEEE Intelligent Vehicles Symposium 541–547 (IEEE, 2012).

  28. Kritayakirana, K. & Gerdes, J. C. Autonomous vehicle control at the limits of handling. Int. J. Veh. Auton. Syst. 10, 271–296 (2012).

  29. Bonkowski, J. Here’s what you missed from the Indy Autonomous Challenge main event. Autoweek https://www.autoweek.com/racing/more-racing/a38069263/what-missed-indy-autonomous-challenge-main-event/ (2021).

  30. Rutherford, S. J. & Cole, D. J. Modelling nonlinear vehicle dynamics with neural networks. Int. J. Veh. Des. 53, 260–287 (2010).

  31. Pomerleau, D. A. In Robot Learning (eds Connell, J. H. & Mahadevan, S.) 19–43 (Springer, 1993).

  32. Togelius, J. & Lucas, S. M. In 2006 IEEE International Conference on Evolutionary Computation 1187–1194 (IEEE, 2006).

  33. Schwarting, W. et al. Deep latent competition: learning to race using visual control policies in latent space. Preprint at https://arxiv.org/abs/2102.09812 (2021).

  34. Gozli, D. G., Bavelier, D. & Pratt, J. The effect of action video game playing on sensorimotor learning: evidence from a movement tracking task. Hum. Mov. Sci. 38, 152–162 (2014).

  35. Davids, K., Williams, A. M. & Williams, J. G. Visual Perception and Action in Sport (Routledge, 2005).

  36. Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. In Proc. 35th International Conference on Machine Learning 1856–1865 (PMLR, 2018).

  37. Haarnoja, T. et al. Soft actor-critic algorithms and applications. Preprint at https://arxiv.org/abs/1812.05905 (2018).

  38. Mnih, V. et al. In Proc. 33rd International Conference on Machine Learning 1928–1937 (PMLR, 2016).

  39. Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. In 32nd AAAI Conference on Artificial Intelligence (AAAI, 2018).

  40. Lin, L.-J. Reinforcement Learning for Robots Using Neural Networks. Dissertation, Carnegie Mellon Univ. (1993).

  41. Siu, H. C. et al. Evaluation of human-AI teams for learned and rule-based agents in Hanabi. Preprint at https://arxiv.org/abs/2107.07630 (2021).

  42. Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).

  43. Devore, J. L. Probability and Statistics for Engineering and the Sciences 6th edn (Brooks/Cole, 2004).

  44. Xia, L., Zhou, Z., Yang, J. & Zhao, Q. DSAC: distributional soft actor critic for risk-sensitive reinforcement learning. Preprint at https://arxiv.org/abs/2004.14547 (2020).

  45. Fujimoto, S., van Hoof, H. & Meger, D. In Proc. 35th International Conference on Machine Learning 1587–1596 (PMLR, 2018).

  46. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

  47. Liu, Z., Li, X., Kang, B. & Darrell, T. In International Conference on Learning Representations (ICLR, 2021).

  48. Kingma, D. P. & Ba, J. In International Conference on Learning Representations (ICLR, 2015).

  49. Cassirer, A. et al. Reverb: a framework for experience replay. Preprint at https://arxiv.org/abs/2102.04736 (2021).

  50. Narvekar, S., Sinapov, J., Leonetti, M. & Stone, P. In Proc. 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016) (2016).

Acknowledgements

We thank K. Yamauchi, S. Takano, A. Hayashi, C. Ferreira, N. Nozawa, T. Teramoto, M. Hakim, K. Yamada, S. Sakamoto, T. Ueda, A. Yago, J. Nakata and H. Imanishi at Polyphony Digital for making the Gran Turismo franchise, providing support throughout the project and organizing the Race Together events on 2 July 2021 and 21 October 2021. We also thank U. Gallizzi, J. Beltran, G. Albowicz, R. Abdul-ahad and the staff at CGEI for access to their PlayStation Now network to train agents and their help building the infrastructure for our experiments. We benefited from the advice of T. Grossenbacher, a retired competitive GT driver. Finally, we thank E. Kato Marcus and E. Ohshima of Sony AI, who managed the partnership activities with Polyphony Digital and Sony Interactive Entertainment.

Author information

Contributions

P.R.W. managed the project. S.B., K.K., P.K., J.M., K.S. and T.J.W. led the research and development efforts. R.C., A.D., F.E., F.F., L.G., V.K., H.L., P.M., D.O., C.S., T.S. and M.D.T. participated in the research and the development of GT Sophy and the AI libraries. H.A., L.B., R.D. and D.W. built the research platform that connected to CGEI’s PlayStation network. P.S. provided executive support and technical and research advice and P.D. provided executive support and technical advice. H.K. and M.S. conceived and set up the project, provided executive support, resources and technical advice and managed stakeholders.

Corresponding author

Correspondence to Peter R. Wurman.

Ethics declarations

Competing interests

 P.R.W. and other team members have submitted US provisional patent application 63/267,136 covering aspects of the scenario training techniques described in this paper.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Seaside and Sarthe training.

Kudos Prime data from global time-trial challenges on Seaside (a and b) and Sarthe (c and d), with the cars used in the competition. Note that these histograms represent the single best lap time for more than 12,000 individual players on Seaside and almost 9,000 on Sarthe. In both cases, the secondary diagrams compare the top five human times to a histogram of 100 laps by the 2 July 2021 time-trial version of GT Sophy. In both cases, the data show that GT Sophy was reliably superhuman, with all 100 laps better than the best human laps. Not surprisingly, it takes longer for the agent to train on the much longer Sarthe course, taking 48 h to reach the 99th percentile of human performance. e, Histogram of a snapshot of the ERB (experience replay buffer) during training on Sarthe on the basis of the scenario breakdown in Fig. 1f. The x axis is the course position and the stacked colours represent the number of samples that were collected in that region from each scenario. In a more condensed format than Fig. 1f, f and g show the sections of Seaside and Maggiore that were used for skill training.
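
For readers wanting to reproduce this style of comparison, the sketch below shows one way to place a batch of agent lap times against a distribution of human best laps, answering "what fraction of human drivers' best laps are slower than the agent's slowest lap?". The arrays are synthetic placeholders, not the Kudos Prime data used in the figure.

```python
# Sketch of the lap-time comparison in the caption: what fraction of human best
# laps does the agent's slowest lap still beat? Data below are placeholders.
import numpy as np

def fraction_beaten(agent_laps_s: np.ndarray, human_best_laps_s: np.ndarray) -> float:
    """Fraction of human drivers whose best lap is slower than the agent's
    slowest lap (lower lap times are better)."""
    worst_agent_lap = agent_laps_s.max()
    return float((human_best_laps_s > worst_agent_lap).mean())

# Hypothetical example: 100 agent laps vs. 12,000 human personal-best laps.
rng = np.random.default_rng(0)
agent = rng.normal(loc=118.0, scale=0.2, size=100)      # seconds
humans = rng.normal(loc=124.0, scale=3.0, size=12_000)  # seconds
print(f"Agent's slowest lap beats {fraction_beaten(agent, humans):.1%} of human best laps")
```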

Extended Data Fig. 2 Time trial on Sarthe.

An analysis of Igor Fraga’s best lap in the time-trial test compared with GT Sophy’s lap. a, Areas of the track where Igor lost time with respect to GT Sophy. Corner 20, highlighted in yellow, shows an interesting effect common to the other corners in that Igor seems to catch up a little by braking later, but then loses time because he has to brake longer and comes out of the corner slower. Igor’s steering controls (b) and Igor’s throttle and braking (c) compared with GT Sophy on corner 20. Through the steering wheel and brake pedal, Igor is able to give smooth, 60-Hz signals compared with GT Sophy’s 10-Hz action rate.
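
The caption contrasts the driver's smooth 60-Hz inputs with GT Sophy's 10-Hz action rate. As a generic illustration of how a low-rate policy interacts with a higher-rate simulation, the sketch below holds each policy action for six frames (a zero-order hold); the `env` and `policy` interfaces are hypothetical stand-ins, not the Gran Turismo API used in the paper.

```python
# Sketch of acting at 10 Hz in a 60 Hz simulation by holding each action for
# six frames. The env/policy interfaces are illustrative assumptions.
FRAMES_PER_ACTION = 6  # 60 Hz simulation / 10 Hz policy decisions

def run_episode(env, policy, max_frames: int = 10_000) -> float:
    obs = env.reset()
    total_reward, done, frame = 0.0, False, 0
    action = None
    while not done and frame < max_frames:
        if frame % FRAMES_PER_ACTION == 0:
            action = policy(obs)               # new decision every sixth frame
        obs, reward, done = env.step(action)   # held (repeated) action in between
        total_reward += reward
        frame += 1
    return total_reward
```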

Extended Data Fig. 3 Policy selection.

An illustration of the process by which policies were selected to run in the final race. Starting on the left side of the diagram, thousands of policies were generated and saved during the experiments. They were first filtered within each experiment to select the subset on the Pareto frontier of simple evaluation criteria trading off lap time against off-course and collision metrics. The selected policies were run through a series of tests evaluating their overall racing performance against a common set of opponents and their performance on a variety of hand-crafted skill tests. The results were ranked and human judgement was applied to select a small number of candidate policies. These policies were matched up in round-robin, policy-versus-policy competitions. The results were again analysed by the human committee for overall team scores and collision metrics. The best candidate policies were run in short races against test drivers at Polyphony Digital. Their subjective evaluations were included in the final decisions on which policies to run in the October 2021 event.
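
As a rough illustration of the first filtering step in this pipeline, the sketch below computes the Pareto frontier of candidate policies over lap time, off-course count and collision count, all to be minimised. The metric fields and example values are placeholders, not the committee's actual evaluation criteria.

```python
# Sketch of Pareto-frontier filtering over policy evaluation metrics, all of
# which are minimised. Metric names and values are illustrative placeholders.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PolicyEval:
    name: str
    lap_time_s: float
    off_course_count: int
    collision_count: int

    def metrics(self) -> Tuple[float, int, int]:
        return (self.lap_time_s, self.off_course_count, self.collision_count)

def dominates(a: PolicyEval, b: PolicyEval) -> bool:
    """a dominates b if it is no worse on every metric and better on at least one."""
    am, bm = a.metrics(), b.metrics()
    return all(x <= y for x, y in zip(am, bm)) and any(x < y for x, y in zip(am, bm))

def pareto_frontier(policies: List[PolicyEval]) -> List[PolicyEval]:
    """Keep only policies that no other policy dominates."""
    return [p for p in policies
            if not any(dominates(q, p) for q in policies if q is not p)]

# Hypothetical candidates: faster laps traded against cleaner driving.
candidates = [
    PolicyEval("A", 117.8, 2, 1),
    PolicyEval("B", 118.1, 0, 0),
    PolicyEval("C", 118.0, 3, 2),  # dominated by A on all three metrics
]
print([p.name for p in pareto_frontier(candidates)])  # ['A', 'B']
```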

Extended Data Table 1 Reward weights

Supplementary information

Supplementary Information

This file contains more details about the training procedures and algorithms in the form of pseudocode. It also contains several tables that detail the hyperparameters used in training.

Peer Review File

About this article

Cite this article

Wurman, P.R., Barrett, S., Kawamoto, K. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022). https://doi.org/10.1038/s41586-021-04357-7
