Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models

A preprint version of the article is available at arXiv.

Abstract

Knowledge from animals and humans inspires robotic innovations. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards that accurately describe a specific system, rather than on the kind of generalized understanding that animals possess. Here we propose a hierarchical framework that constructs primitive-, environmental- and strategic-level knowledge for legged robots, all of which is pre-trainable, reusable and enrichable. The primitive module summarizes knowledge from animal motion data: inspired by large pre-trained models in language and image understanding, we introduce deep generative models that produce motor control signals driving legged robots to act like real animals. Then, at a higher level, we shape various traversal capabilities that adapt to the environment by reusing the primitive module. Finally, a strategic module is trained on complex downstream tasks by reusing the knowledge from the previous levels. We apply the trained hierarchical controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a challenging multi-agent chase tag game that we designed, in which lifelike agility and strategy emerge in the robots.
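
The reuse pattern described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: the Level class, the single linear map standing in for each network, the vector sizes and the assumption that each level produces the input of the level below it are all hypothetical choices, made only to show how pre-trained lower levels can be frozen while a higher level is trained on top of them.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for a trained network (hypothetical)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


@dataclass
class Level:
    """One level of the hierarchy, reduced to a single bias-free linear map."""
    name: str
    weights: Matrix
    trainable: bool

    def __call__(self, x: Vector) -> Vector:
        # y_i = sum_j W_ij * x_j
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def build_hierarchy(obs_dim: int = 6, latent_dim: int = 4, command_dim: int = 3,
                    action_dim: int = 12, seed: int = 0) -> Tuple[Level, Level, Level]:
    """Set-up for the final (strategic) training stage: lower levels are frozen."""
    rng = random.Random(seed)
    # Primitive level: (proprioception, latent) -> joint targets,
    # assumed pre-trained on animal motion data and then frozen.
    primitive = Level("primitive", random_matrix(action_dim, obs_dim + latent_dim, rng), False)
    # Environmental level: (observation, command) -> latent for the primitive level.
    environmental = Level("environmental", random_matrix(latent_dim, obs_dim + command_dim, rng), False)
    # Strategic level: observation -> command; the only level updated here.
    strategic = Level("strategic", random_matrix(command_dim, obs_dim, rng), True)
    return primitive, environmental, strategic


def act(obs: Vector, primitive: Level, environmental: Level, strategic: Level) -> Vector:
    """Top-down pass: each level produces the input of the level below it."""
    command = strategic(obs)
    latent = environmental(obs + command)  # list concatenation, not addition
    return primitive(obs + latent)


levels = build_hierarchy()
action = act([0.1] * 6, *levels)
print(len(action))  # 12 joint targets
```

In this sketch only the strategic level is marked trainable, mirroring a stage in which the strategic module reuses the previous levels; when training the environmental level, one would instead freeze only the primitive level.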

Fig. 1: A framework overview of the proposed method.
Fig. 2: Evaluation of the primitive motor controllers.
Fig. 3: Performance evaluation of the environmental-primitive motor controllers.
Fig. 4: Snapshots in the chase tag game.

Data availability

The full motion data from the Labrador retriever, together with the retargeted data for the MAX robot, are available from Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/. The raw motion clips are in .bvh format, and the retargeted data are organized in .txt files.
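
For readers inspecting the raw clips, the following is a minimal sketch of a reader for the standard BVH layout: a HIERARCHY block that declares ROOT/JOINT names and their CHANNELS, followed by a MOTION block with Frames:, Frame Time: and one line of channel values per frame. It is a hypothetical helper that assumes well-formed files; it is not part of the released code, and it does not handle the retargeted .txt files, whose column layout is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BvhClip:
    joints: List[str]                # ROOT/JOINT names in file order
    channels: List[Tuple[str, str]]  # (joint, channel) for each motion column
    frame_time: float                # seconds per frame
    frames: List[List[float]]        # one row of channel values per frame

    @property
    def duration(self) -> float:
        return len(self.frames) * self.frame_time


def parse_bvh(text: str) -> BvhClip:
    """Parse a well-formed BVH string (hypothetical helper, not the authors' code)."""
    joints: List[str] = []
    channels: List[Tuple[str, str]] = []
    frames: List[List[float]] = []
    declared_frames: Optional[int] = None
    frame_time: Optional[float] = None
    current: Optional[str] = None
    in_motion = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not in_motion:
            tokens = line.split()
            if tokens[0] in ("ROOT", "JOINT"):
                current = tokens[1]
                joints.append(current)
            elif tokens[0] == "CHANNELS":
                count = int(tokens[1])
                channels.extend((current, name) for name in tokens[2:2 + count])
            elif tokens[0] == "MOTION":
                in_motion = True
            # OFFSET lines, braces and "End Site" blocks carry no channels; skip them.
        elif line.startswith("Frames:"):
            declared_frames = int(line.split(":", 1)[1])
        elif line.startswith("Frame Time:"):
            frame_time = float(line.split(":", 1)[1])
        else:
            values = [float(v) for v in line.split()]
            if len(values) != len(channels):
                raise ValueError(f"expected {len(channels)} values, got {len(values)}")
            frames.append(values)

    if frame_time is None or declared_frames != len(frames):
        raise ValueError("missing or inconsistent MOTION header")
    return BvhClip(joints, channels, frame_time, frames)


SAMPLE_BVH = """\
HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 5.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.0333333
0 0 0 0 0 0 0 0 0
1 2 3 4 5 6 7 8 9
"""

clip = parse_bvh(SAMPLE_BVH)
print(clip.joints, len(clip.channels), len(clip.frames))
```

Running it on the toy clip prints ['Hips', 'Spine'] 9 2; real clips differ only in having more joints and frames.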

Code availability

The code is available from Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/.

References

  1. Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems Vol. XIV (MIT Press Journals, 2018).

  2. Haarnoja, T., Hartikainen, K., Abbeel, P. & Levine, S. Latent space policies for hierarchical reinforcement learning. In Proc. 35th International Conference on Machine Learning 1851–1860 (PMLR, 2018).

  3. Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Rob. 4, eaau5872 (2019).

  4. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Rob. 5, eabc5986 (2020).

  5. Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Rob. 7, eabk2822 (2022).

  6. Kumar, A., Fu, Z., Pathak, D. & Malik, J. RMA: rapid motor adaptation for legged robots. In Proc. Robotics: Science and Systems Vol. XVII (2021).

  7. Cheng, X., Shi, K., Agarwal, A. & Pathak, D. Extreme parkour with legged robots. In Conference on Robot Learning (2023).

  8. Zhuang, Z. et al. Robot parkour learning. In Conference on Robot Learning (2023).

  9. Hoeller, D., Rudin, N., Sako, D. & Hutter, M. ANYmal parkour: learning agile navigation for quadrupedal robots. Sci. Rob. 9, eadi7566 (2024).

  10. Yang, Y. et al. CAJun: continuous adaptive jumping using a learned centroidal controller. In Proc. 7th Conference on Robot Learning Vol. 229, 2791–2806 (PMLR, 2023).

  11. Caluwaerts, K. et al. Barkour: benchmarking animal-level agility with quadruped robots. Preprint at https://doi.org/10.48550/arXiv.2305.14654 (2023).

  12. Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci. Rob. 8, eade2256 (2023).

  13. Yang, C., Yuan, K., Zhu, Q., Yu, W. & Li, Z. Multi-expert learning of adaptive legged locomotion. Sci. Rob. 5, eabb2174 (2020).

  14. Peng, X. B. et al. Learning agile robotic locomotion skills by imitating animals. In Proc. Robotics: Science and Systems (2020).

  15. Bohez, S. et al. Imitate and repurpose: learning reusable robot movement skills from human and animal behaviors. Preprint at https://doi.org/10.48550/arXiv.2203.17138 (2022).

  16. Levine, S., Wang, J. M., Haraux, A., Popović, Z. & Koltun, V. Continuous character control with low-dimensional embeddings. ACM Trans. Graph. 31, 28 (2012).

  17. Ling, H. Y., Zinno, F., Cheng, G. & Van De Panne, M. Character controllers using motion VAEs. ACM Trans. Graph. 39, 40 (2020).

  18. Tirumala, D. et al. Behavior priors for efficient reinforcement learning. J. Mach. Learn. Res. 23, 9989–10056 (2022).

  19. Heess, N. et al. Learning and transfer of modulated locomotor controllers. Preprint at https://doi.org/10.48550/arXiv.1610.05182 (2016).

  20. Merel, J. et al. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations 4647–4660 (Curran Assoc., 2019).

  21. Hasenclever, L., Pardo, F., Hadsell, R., Heess, N. & Merel, J. CoMic: complementary task learning & mimicry for reusable skills. In Proc. 37th International Conference on Machine Learning Vol. 119, 4105–4115 (PMLR, 2020).

  22. Liu, S. et al. From motor control to team play in simulated humanoid football. Sci. Rob. 7, eabo0235 (2022).

  23. Zhu, Q., Zhang, H., Lan, M. & Han, L. Neural categorical priors for physics-based character control. ACM Trans. Graph. 42, 178 (2023).

  24. Ji, Y. et al. Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems 1479–1486 (IEEE, 2022).

  25. van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems (2017).

  26. Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning Vol. 139, 8821–8831 (PMLR, 2021).

  27. Roy, A., Vaswani, A., Neelakantan, A. & Parmar, N. Theory and experiments on vector quantized autoencoders. Preprint at https://doi.org/10.48550/arXiv.1805.11063 (2018).

  28. Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).

  29. Chi, W., Jiang, X. & Zheng, Y. A linearization of centroidal dynamics for the model-predictive control of quadruped robots. In 2022 International Conference on Robotics and Automation 4656–4663 (IEEE, 2022).

  30. Zhou, Q. et al. MAX: a wheeled-legged quadruped robot for multimodal agile locomotion. IEEE Trans. Autom. Sci. Eng. 1–21 (2024).

  31. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://doi.org/10.48550/arXiv.1707.06347 (2017).

  32. Sun, P. et al. TLeague: a framework for competitive self-play based distributed multi-agent reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.2011.12895 (2020).

  33. Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (2017).

  34. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  35. Gouelle, A., Mégrot, F. & Müller, B. Interpreting spatiotemporal parameters, symmetry, and variability in clinical gait analysis. Handb. Hum. Motion 689–707 (2018).

  36. Jarvis, S. L. et al. Kinematic and kinetic analysis of dogs during trotting after amputation of a thoracic limb. Am. J. Vet. Res. 74, 1155–1163 (2013).

  37. Pálya, Z., Rácz, K., Nagymáté, G. & Kiss, R. M. Development of a detailed canine gait analysis method for evaluating harnesses: a pilot study. PLoS ONE 17, e0264299 (2022).

  38. World Chase Tag. Wikipedia https://en.wikipedia.org/wiki/World_Chase_Tag (accessed 23 March 2023).

  39. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  40. Han, L. et al. TStarBot-X: an open-sourced and comprehensive study for efficient league training in StarCraft II full game. Preprint at https://doi.org/10.48550/arXiv.2011.13729 (2020).

  41. Coulom, R. Bayesian Elo rating (2005).

  42. Xie, Z. et al. Learning locomotion skills for Cassie: iterative design and sim-to-real. In Proc. Conference on Robot Learning Vol. 100, 317–329 (PMLR, 2020).

  43. Peng, X. B., Kanazawa, A., Malik, J., Abbeel, P. & Levine, S. SFV: reinforcement learning of physical skills from videos. ACM Trans. Graph. 37, 178 (2018).

  44. Zhang, H. et al. Learning physically simulated tennis skills from broadcast videos. ACM Trans. Graph. 42, 95 (2023).

  45. Gleicher, M. Retargetting motion to new characters. In Proc. 25th Annual Conference on Computer Graphics and Interactive Techniques 33–42 (Association for Computing Machinery, 1998).

  46. Peng, X. B. & Van De Panne, M. Learning locomotion skills using DeepRL: does the choice of action space matter? In Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation 12:1–12:13 (Association for Computing Machinery, 2017).

  47. Ho, J. & Ermon, S. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems Vol. 29 (2016).

  48. Agarwal, A., Kumar, A., Malik, J. & Pathak, D. Legged locomotion in challenging terrains using egocentric vision. In Proc. 6th Conference on Robot Learning (eds Liu, K. et al.) 403–415 (PMLR, 2023).

  49. Li, T. et al. Learning terrain-adaptive locomotion with agile behaviors by imitating animals. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems 339–345 (2023).

  50. Rusu, A. A. et al. Policy distillation. Preprint at https://doi.org/10.48550/arXiv.1511.06295 (2015).

  51. Han, L. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Code Ocean https://doi.org/10.24433/CO.8441152.v3 (2024).

Acknowledgements

We would like to thank S. Li for his early contributions to motion retargeting. We would like to thank our colleagues in Tencent Robotics X and Tencent Cloud for providing constructive discussions and computing resources. We would like to thank the Labrador who wore the motion capture markers and moved for motion data collection.

Author information

Contributions

L.H. organized the research project. L.H., Q.Z., C. Zhang, T.L. and H.Z. designed, implemented and experimented with the environmental settings, neural network architectures, algorithms and other components. C. Zhou, T.L. and C. Zhang collected the animal motion dataset. L.H. and Yizheng Zhang iterated over multiple versions of the physics-based simulator and its settings. J.S., Y.L., Yizheng Zhang, T.L., Q.Z. and L.H. completed the real robot experiments. Q.Z., R.Z. and C. Zhou contributed to improving the training infrastructure. Y.L., J.L., Yufeng Zhang, R.W., W.C., X.L., Y. Zhu, L.X. and X.T. maintained the robot hardware and software during the project. L.H. wrote the paper with contributions from H.Z., C. Zhang, Q.Z., T.L. and J.S.; Z.Z. provided general scope advice and consistently supported the team.

Corresponding authors

Correspondence to Lei Han or Qingxu Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Ken Caluwaerts, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 6.1–6.5, Tables 1–4 and Figs. 1–4.

Reporting Summary

Supplementary Video 1

Main movie for the primitive motor controller (PMC) model.

Supplementary Video 2

Main movie for the environmental-primitive motor controller (EPMC) model.

Supplementary Video 3

Main movie for the strategic-environmental-primitive motor controller (SEPMC) model.

Supplementary Video 4

The performance of all the trained policies in simulation.

Supplementary Video 5

The performance of the fall recovery model in real-world experiments.

Supplementary Video 6

The performance of the student environment-level network using an onboard depth camera in real-world experiments.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, L., Zhu, Q., Sheng, J. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Nat Mach Intell 6, 787–798 (2024). https://doi.org/10.1038/s42256-024-00861-3

  • DOI: https://doi.org/10.1038/s42256-024-00861-3
