Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits¹. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
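The abstract's idea of a reward that balances competitiveness against sportsmanship can be illustrated with a minimal sketch: reward course progress and subtract penalties for off-course excursions, at-fault contact and wall hits. All weights and term names below are hypothetical illustrations, not the paper's actual reward function, which is detailed in the Methods.

```python
def shaped_reward(progress_m: float,
                  off_course: bool,
                  caused_collision: bool,
                  wall_contact: bool) -> float:
    """Course progress minus track-limit and sportsmanship penalties.

    All weights here are hypothetical, for illustration only.
    """
    r = progress_m      # metres gained along the track since the last step
    if off_course:
        r -= 5.0        # penalty for exceeding track limits
    if caused_collision:
        r -= 10.0       # at-fault contact penalty (the sportsmanship term)
    if wall_contact:
        r -= 2.0        # penalty for scraping or hitting a wall
    return r
```

The design tension the paper highlights is in the collision weight: too small and the agent bullies opponents, too large and it yields positions it could legitimately defend.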
Data availability
There are no static data associated with this project. All data are generated from scratch by the agent each time it learns. Videos of the races are available at https://sonyai.github.io/gt_sophy_public.

Code availability
Pseudocode detailing the training process and algorithms used is available as a supplement to this article. The agent interface in GT is not enabled in commercial versions of the game; however, Polyphony Digital has provided a small number of universities and research facilities outside Sony with access to the API and is considering working with other groups.
Milliken, W. F. et al. Race Car Vehicle Dynamics Vol. 400 (Society of Automotive Engineers, 1995).
Mnih, V. et al. Playing Atari with deep reinforcement learning. Preprint at https://arxiv.org/abs/1312.5602 (2013).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).
Laurense, V. A., Goh, J. Y. & Gerdes, J. C. In 2017 American Control Conference (ACC) 5586–5591 (IEEE, 2017).
Spielberg, N. A., Brown, M., Kapania, N. R., Kegelman, J. C. & Gerdes, J. C. Neural network vehicle models for high-performance automated driving. Sci. Robot. 4, eaaw1975 (2019).
Burke, K. Data makes it beta: Roborace returns for second season with updateable self-driving vehicles powered by NVIDIA DRIVE. The Official NVIDIA Blog https://blogs.nvidia.com/blog/2020/10/29/roborace-second-season-nvidia-drive/ (2020).
Leporati, G. No driver? No problem—this is the Indy Autonomous Challenge. Ars Technica https://arstechnica.com/cars/2021/07/a-science-fair-or-the-future-of-racing-the-indy-autonomous-challenge/ (2021).
Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. In 2016 IEEE International Conference on Robotics and Automation (ICRA) 1433–1440 (IEEE, 2016).
Williams, G., Drews, P., Goldfain, B., Rehg, J. M. & Theodorou, E. A. Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Trans. Robot. 34, 1603–1622 (2018).
Pan, Y. et al. In Proc. Robotics: Science and Systems XIV (eds Kress-Gazit, H., Srinivasa, S., Howard, T. & Atanasov, N.) https://doi.org/10.15607/RSS.2018.XIV.056 (Carnegie Mellon Univ., 2018).
Pan, Y. et al. Imitation learning for agile autonomous driving. Int. J. Robot. Res. 39, 286–302 (2020).
Amazon Web Services. AWS DeepRacer League. https://aws.amazon.com/deepracer/league/ (2019).
Pyeatt, L. D. & Howe, A. E. Learning to race: experiments with a simulated race car. In Proc. Eleventh International FLAIRS Conference 357–361 (AAAI, 1998).
Chaperot, B. & Fyfe, C. In 2006 IEEE Symposium on Computational Intelligence and Games 181–186 (IEEE, 2006).
Cardamone, L., Loiacono, D. & Lanzi, P. L. In Proc. 11th Annual Conference on Genetic and Evolutionary Computation 1179–1186 (ACM, 2009).
Cardamone, L., Loiacono, D. & Lanzi, P. L. In 2009 IEEE Congress on Evolutionary Computation 2622–2629 (IEEE, 2009).
Loiacono, D., Prete, A., Lanzi, L. & Cardamone, L. In IEEE Congress on Evolutionary Computation 1–8 (IEEE, 2010).
Jaritz, M., de Charette, R., Toromanoff, M., Perot, E. & Nashashibi, F. In 2018 IEEE International Conference on Robotics and Automation (ICRA) 2070–2075 (IEEE, 2018).
Weiss, T. & Behl, M. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) 1163–1168 (IEEE, 2020).
Weiss, T., Babu, V. S. & Behl, M. In NeurIPS 2020 Workshop on Machine Learning for Autonomous Driving (NeurIPS, 2020).
Fuchs, F., Song, Y., Kaufmann, E., Scaramuzza, D. & Dürr, P. Super-human performance in Gran Turismo Sport using deep reinforcement learning. IEEE Robot. Autom. Lett. 6, 4257–4264 (2021).
Song, Y., Lin, H., Kaufmann, E., Dürr, P. & Scaramuzza, D. In Proc. IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2021).
Theodosis, P. A. & Gerdes, J. C. In Dynamic Systems and Control Conference Vol. 45295, 235–241 (American Society of Mechanical Engineers, 2012).
Funke, J. et al. In 2012 IEEE Intelligent Vehicles Symposium 541–547 (IEEE, 2012).
Kritayakirana, K. & Gerdes, J. C. Autonomous vehicle control at the limits of handling. Int. J. Veh. Auton. Syst. 10, 271–296 (2012).
Bonkowski, J. Here’s what you missed from the Indy Autonomous Challenge main event. Autoweek https://www.autoweek.com/racing/more-racing/a38069263/what-missed-indy-autonomous-challenge-main-event/ (2021).
Rutherford, S. J. & Cole, D. J. Modelling nonlinear vehicle dynamics with neural networks. Int. J. Veh. Des. 53, 260–287 (2010).
Pomerleau, D. A. In Robot Learning (eds Connell, J. H. & Mahadevan, S.) 19–43 (Springer, 1993).
Togelius, J. & Lucas, S. M. In 2006 IEEE International Conference on Evolutionary Computation 1187–1194 (IEEE, 2006).
Schwarting, W. et al. Deep latent competition: learning to race using visual control policies in latent space. Preprint at https://arxiv.org/abs/2102.09812 (2021).
Gozli, D. G., Bavelier, D. & Pratt, J. The effect of action video game playing on sensorimotor learning: evidence from a movement tracking task. Hum. Mov. Sci. 38, 152–162 (2014).
Davids, K., Williams, A. M. & Williams, J. G. Visual Perception and Action in Sport (Routledge, 2005).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. In Proc. 35th International Conference on Machine Learning 1856–1865 (PMLR, 2018).
Haarnoja, T. et al. Soft actor-critic algorithms and applications. Preprint at https://arxiv.org/abs/1812.05905 (2018).
Mnih, V. et al. In Proc. 33rd International Conference on Machine Learning 1928–1937 (PMLR, 2016).
Dabney, W., Rowland, M., Bellemare, M. G. & Munos, R. In 32nd AAAI Conference on Artificial Intelligence (AAAI, 2018).
Lin, L.-J. Reinforcement Learning for Robots Using Neural Networks. Dissertation, Carnegie Mellon Univ. (1993).
Siu, H. C. et al. Evaluation of human-AI teams for learned and rule-based agents in Hanabi. Preprint at https://arxiv.org/abs/2107.07630 (2021).
Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).
Devore, J. L. Probability and Statistics for Engineering and the Sciences 6th edn (Brooks/Cole, 2004).
Xia, L., Zhou, Z., Yang, J. & Zhao, Q. DSAC: distributional soft actor critic for risk-sensitive reinforcement learning. Preprint at https://arxiv.org/abs/2004.14547 (2020).
Fujimoto, S., van Hoof, H. & Meger, D. In Proc. 35th International Conference on Machine Learning 1587–1596 (PMLR, 2018).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Liu, Z., Li, X., Kang, B. & Darrell, T. In International Conference on Learning Representations (ICLR, 2021).
Kingma, D. P. & Ba, J. In International Conference on Learning Representations (ICLR, 2015).
Cassirer, A. et al. Reverb: a framework for experience replay. Preprint at https://arxiv.org/abs/2102.04736 (2021).
Narvekar, S., Sinapov, J., Leonetti, M. & Stone, P. In Proc. 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016) (2016).
We thank K. Yamauchi, S. Takano, A. Hayashi, C. Ferreira, N. Nozawa, T. Teramoto, M. Hakim, K. Yamada, S. Sakamoto, T. Ueda, A. Yago, J. Nakata and H. Imanishi at Polyphony Digital for making the Gran Turismo franchise, providing support throughout the project and organizing the Race Together events on 2 July 2021 and 21 October 2021. We also thank U. Gallizzi, J. Beltran, G. Albowicz, R. Abdul-ahad and the staff at CGEI for access to their PlayStation Now network to train agents and their help building the infrastructure for our experiments. We benefited from the advice of T. Grossenbacher, a retired competitive GT driver. Finally, we thank E. Kato Marcus and E. Ohshima of Sony AI, who managed the partnership activities with Polyphony Digital and Sony Interactive Entertainment.
P.R.W. and other team members have submitted US provisional patent application 63/267,136 covering aspects of the scenario training techniques described in this paper.
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Kudos Prime data from global time-trial challenges on Seaside (a and b) and Sarthe (c and d), with the cars used in the competition. Note that these histograms represent the single best lap time for more than 12,000 individual players on Seaside and almost 9,000 on Sarthe. In both cases, the secondary diagrams compare the top five human times to a histogram of 100 laps by the 2 July 2021 time-trial version of GT Sophy. In both cases, the data show that GT Sophy was reliably superhuman, with all 100 laps better than the best human laps. Not surprisingly, it takes longer for the agent to train on the much longer Sarthe course, taking 48 h to reach the 99th percentile of human performance. e, Histogram of a snapshot of the ERB during training on Sarthe on the basis of the scenario breakdown in Fig. 1f. The x axis is the course position and the stacked colours represent the number of samples that were collected in that region from each scenario. In a more condensed format than Fig. 1f, f and g show the sections of Seaside and Maggiore that were used for skill training.
An analysis of Igor Fraga’s best lap in the time-trial test compared with GT Sophy’s lap. a, Areas of the track where Igor lost time with respect to GT Sophy. Corner 20, highlighted in yellow, shows an effect common to the other corners: Igor appears to gain a little time by braking later, but then loses it because he has to brake for longer and exits the corner slower. Igor’s steering controls (b) and his throttle and braking (c) compared with GT Sophy on corner 20. Through the steering wheel and brake pedal, Igor can deliver smooth, 60-Hz control signals, compared with GT Sophy’s 10-Hz action rate.
An illustration of the process by which policies were selected to run in the final race. Starting on the left side of the diagram, thousands of policies were generated and saved during the experiments. They were first filtered to select the subset on the Pareto frontier of simple evaluation criteria trading off lap time against off-course and collision metrics. The selected policies were run through a series of tests evaluating their overall racing performance against a common set of opponents and their performance on a variety of hand-crafted skill tests. The results were ranked, and human judgement was applied to select a small number of candidate policies. These policies were matched up in round-robin, policy-versus-policy competitions. The results were again analysed by the human committee for overall team scores and collision metrics. The best candidate policies were run in short races against test drivers at Polyphony Digital, whose subjective evaluations were included in the final decisions on which policies to run in the October 2021 event.
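The first filtering step described above, keeping only policies on the Pareto frontier of lap time versus off-course and collision metrics, can be sketched as follows. The `PolicyStats` structure and metric names are hypothetical stand-ins for whatever the evaluation pipeline actually recorded; the dominance logic is the standard multi-objective definition, not the paper's exact implementation.

```python
from dataclasses import dataclass

@dataclass
class PolicyStats:
    name: str
    lap_time: float    # seconds per lap; lower is better
    off_course: float  # off-course incidents per lap; lower is better
    collisions: float  # collision penalties per lap; lower is better

def dominates(a: PolicyStats, b: PolicyStats) -> bool:
    """a dominates b if a is no worse on every metric and strictly better on at least one."""
    no_worse = (a.lap_time <= b.lap_time
                and a.off_course <= b.off_course
                and a.collisions <= b.collisions)
    strictly_better = (a.lap_time < b.lap_time
                       or a.off_course < b.off_course
                       or a.collisions < b.collisions)
    return no_worse and strictly_better

def pareto_frontier(policies: list[PolicyStats]) -> list[PolicyStats]:
    """Keep only the policies that no other policy dominates."""
    return [p for p in policies
            if not any(dominates(q, p) for q in policies)]
```

For example, a policy that is slower than another and also goes off course more often is dominated and dropped, while policies that trade speed for cleanliness in different ways all survive to the next round of skill tests.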
Cite this article
Wurman, P.R., Barrett, S., Kawamoto, K. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022). https://doi.org/10.1038/s41586-021-04357-7