Abstract
Knowledge from animals and humans inspires robotic innovations. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards to accurately describe the specific system, rather than on a generalized understanding as animals do. Here we propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that is pre-trainable, reusable and enrichable for legged robots. The primitive module summarizes knowledge from animal motion data: inspired by large pre-trained models in language and image understanding, we introduce deep generative models to produce motor control signals that stimulate legged robots to act like real animals. We then shape various traversing capabilities at a higher level to align with the environment by reusing the primitive module. Finally, a strategic module is trained to focus on complex downstream tasks by reusing knowledge from the previous levels. We apply the trained hierarchical controllers to MAX, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a designed challenging multi-agent chase tag game, in which lifelike agility and strategy emerge in the robots.
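As a rough illustration of the control stack described above, the sketch below wires a frozen, pre-trained primitive-level decoder to a higher-level environmental policy, with the strategic level indicated only as a comment. All module names, interfaces and dimensions here are assumptions made for this example, not the released implementation; see the code at ref. 51 for the actual architecture.

```python
# Minimal sketch of the hierarchical controller, assuming invented
# module names and dimensions. The real implementation is at ref. 51.
import torch
import torch.nn as nn

class PrimitiveDecoder(nn.Module):
    """Primitive level: a generative decoder pre-trained on animal motion
    data; maps a latent skill code plus proprioception to joint commands."""
    def __init__(self, latent_dim=32, proprio_dim=60, action_dim=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, z, proprio):
        return self.net(torch.cat([z, proprio], dim=-1))

class EnvironmentalPolicy(nn.Module):
    """Environmental level: maps terrain perception and proprioception to
    a latent skill code consumed by the frozen primitive decoder."""
    def __init__(self, percept_dim=128, proprio_dim=60, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(percept_dim + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, percept, proprio):
        return self.net(torch.cat([percept, proprio], dim=-1))

# Strategic level (omitted): maps game state to commands that modulate the
# environmental policy, e.g. during the multi-agent chase tag game.

primitive = PrimitiveDecoder()       # pre-trained on retargeted dog motions
env_policy = EnvironmentalPolicy()   # trained with the primitive reused
for p in primitive.parameters():
    p.requires_grad = False          # reuse the primitive-level knowledge

percept = torch.randn(1, 128)        # e.g. terrain/depth features
proprio = torch.randn(1, 60)         # joint angles, velocities, IMU, ...
action = primitive(env_policy(percept, proprio), proprio)  # joint targets
```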
Data availability
The full motion data from the Labrador retriever together with the retargeted data for the MAX robot are available from Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/. The raw motion clips are in .bvh format, and the retargeted data are organized in .txt files.
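For illustration, retargeted clips stored as plain-text numeric arrays can typically be loaded as below; the file name and the assumption of whitespace-delimited rows (one frame per row) are hypothetical, and the actual schema is documented in the repository.

```python
# Hypothetical loader for a retargeted motion .txt file; the file name and
# layout assumed here are illustrative only -- consult the repository.
import numpy as np

frames = np.loadtxt("retargeted_clip.txt")  # shape: (num_frames, num_channels)
print(frames.shape)
```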
Code availability
The code is available from Code Ocean at https://doi.org/10.24433/CO.8441152.v3 (ref. 51) and GitHub at https://tencent-roboticsx.github.io/lifelike-agility-and-play/.
References
Tan, J. et al. Sim-to-real: learning agile locomotion for quadruped robots. In Proc. Robotics: Science and Systems Vol. XIV (MIT Press Journals, 2018).
Haarnoja, T., Hartikainen, K., Abbeel, P. & Levine, S. Latent space policies for hierarchical reinforcement learning. In Proc. 35th International Conference on Machine Learning 1851–1860 (PMLR, 2018).
Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Rob. 4, eaau5872 (2019).
Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Rob. 5, eabc5986 (2020).
Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Rob. 7, eabk2822 (2022).
Kumar, A., Fu, Z., Pathak, D. & Malik, J. RMA: rapid motor adaptation for legged robots. In Proc. Robotics: Science and Systems Vol. XVII (2021).
Cheng, X., Shi, K., Agarwal, A. & Pathak, D. Extreme parkour with legged robots. In Conference on Robot Learning (2023).
Zhuang, Z. et al. Robot parkour learning. In Conference on Robot Learning (2023).
Hoeller, D., Rudin, N., Sako, D. & Hutter, M. ANYmal parkour: learning agile navigation for quadrupedal robots. Sci. Rob. 9, eadi7566 (2024).
Yang, Y. et al. CAJun: continuous adaptive jumping using a learned centroidal controller. In Proc. 7th Conference on Robot Learning Vol. 229, 2791–2806 (PMLR, 2023).
Caluwaerts, K. et al. Barkour: benchmarking animal-level agility with quadruped robots. Preprint at https://doi.org/10.48550/arXiv.2305.14654 (2023).
Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci. Rob. 8, eade2256 (2023).
Yang, C., Yuan, K., Zhu, Q., Yu, W. & Li, Z. Multi-expert learning of adaptive legged locomotion. Sci. Rob. 5, eabb2174 (2020).
Peng, X. B. et al. Learning agile robotic locomotion skills by imitating animals. In Proc. Robotics: Science and Systems (2020).
Bohez, S. et al. Imitate and repurpose: learning reusable robot movement skills from human and animal behaviors. Preprint at https://doi.org/10.48550/arXiv.2203.17138 (2022).
Levine, S., Wang, J. M., Haraux, A., Popović, Z. & Koltun, V. Continuous character control with low-dimensional embeddings. ACM Trans. Graph. 31, 28 (2012).
Ling, H. Y., Zinno, F., Cheng, G. & Van De Panne, M. Character controllers using motion VAEs. ACM Trans. Graph. 39, 40 (2020).
Tirumala, D. et al. Behavior priors for efficient reinforcement learning. J. Mach. Learn. Res. 23, 9989–10056 (2022).
Heess, N. et al. Learning and transfer of modulated locomotor controllers. Preprint at https://doi.org/10.48550/arXiv.1610.05182 (2016).
Merel, J. et al. Neural probabilistic motor primitives for humanoid control. In International Conference on Learning Representations 4647–4660 (Curran Assoc., 2019).
Hasenclever, L., Pardo, F., Hadsell, R., Heess, N. & Merel, J. CoMic: complementary task learning & mimicry for reusable skills. In Proc. 37th International Conference on Machine Learning Vol. 119, 4105–4115 (PMLR, 2020).
Liu, S. et al. From motor control to team play in simulated humanoid football. Sci. Rob. 7, eabo0235 (2022).
Zhu, Q., Zhang, H., Lan, M. & Han, L. Neural categorical priors for physics-based character control. ACM Trans. Graph. 42, 178 (2023).
Ji, Y. et al. Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems 1479–1486 (IEEE, 2022).
van den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In Advances in Neural Information Processing Systems (2017).
Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning Vol. 139, 8821–8831 (PMLR, 2021).
Roy, A., Vaswani, A., Neelakantan, A. & Parmar, N. Theory and experiments on vector quantized autoencoders. Preprint at https://doi.org/10.48550/arXiv.1805.11063 (2018).
Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning Vol. 4 (Springer, 2006).
Chi, W., Jiang, X. & Zheng, Y. A linearization of centroidal dynamics for the model-predictive control of quadruped robots. In 2022 International Conference on Robotics and Automation 4656–4663 (IEEE, 2022).
Zhou, Q. et al. Max: a wheeled-legged quadruped robot for multimodal agile locomotion. IEEE Trans. Autom. Sci. Eng. 1–21 (2024).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://doi.org/10.48550/arXiv.1707.06347 (2017).
Sun, P. et al. TLeague: a framework for competitive self-play based distributed multi-agent reinforcement learning. Preprint at https://doi.org/10.48550/arXiv.2011.12895 (2020).
Higgins, I. et al. beta-VAE: learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (2017).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Gouelle, A., Mégrot, F. & Müller, B. Interpreting spatiotemporal parameters, symmetry, and variability in clinical gait analysis. Handb. Hum. Motion 689–707 (2018).
Jarvis, S. L. et al. Kinematic and kinetic analysis of dogs during trotting after amputation of a thoracic limb. Am. J. Vet. Res. 74, 1155–1163 (2013).
Pálya, Z., Rácz, K., Nagymáté, G. & Kiss, R. M. Development of a detailed canine gait analysis method for evaluating harnesses: a pilot study. PLoS ONE 17, e0264299 (2022).
World Chase Tag. Wikipedia https://en.wikipedia.org/wiki/World_Chase_Tag (accessed 23 March 2023).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Han, L. et al. TStarBot-X: an open-sourced and comprehensive study for efficient league training in StarCraft II full game. Preprint at https://doi.org/10.48550/arXiv.2011.13729 (2020).
Coulom, R. Bayesian Elo rating (2005).
Xie, Z. et al. Learning locomotion skills for Cassie: iterative design and sim-to-real. In Proc. Conference on Robot Learning Vol. 100, 317–329 (PMLR, 2020).
Peng, X. B., Kanazawa, A., Malik, J., Abbeel, P. & Levine, S. SFV: reinforcement learning of physical skills from videos. ACM Trans. Graph. 37, 178 (2018).
Zhang, H. et al. Learning physically simulated tennis skills from broadcast videos. ACM Trans. Graph. 42, 95 (2023).
Gleicher, M. Retargetting motion to new characters. In Proc. 25th Annual Conference on Computer Graphics and Interactive Techniques 33–42 (Association for Computing Machinery, 1998).
Peng, X. B. & Van De Panne, M. Learning locomotion skills using DeepRL: does the choice of action space matter? In Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation 12:1–12:13 (Association for Computing Machinery, 2017).
Ho, J. & Ermon, S. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems Vol. 29 (2016).
Agarwal, A., Kumar, A., Malik, J. & Pathak, D. Legged locomotion in challenging terrains using egocentric vision. In Proc. 6th Conference on Robot Learning (eds Liu, K. et al.) 403–415 (PMLR, 2023).
Li, T. et al. Learning terrain-adaptive locomotion with agile behaviors by imitating animals. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems 339–345 (IEEE, 2023).
Rusu, A. A. et al. Policy distillation. Preprint at https://doi.org/10.48550/arXiv.1511.06295 (2015).
Han, L. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Code Ocean https://doi.org/10.24433/CO.8441152.v3 (2024).
Acknowledgements
We would like to thank S. Li for his early contributions to motion retargeting. We would like to thank our colleagues in Tencent Robotics X and Tencent Cloud for providing constructive discussions and computing resources. We would like to thank the Labrador retriever that wore the motion capture markers and performed the motions for data collection.
Author information
Contributions
L.H. organized the research project. L.H., Q.Z., C. Zhang, T.L. and H.Z. designed, implemented and experimented with various environmental settings, neural network architectures, algorithms and so on. C. Zhou, T.L. and C. Zhang collected the animal motion dataset. L.H. and Yizheng Zhang iterated over multiple versions of the physics-based simulator and its settings. J.S., Y.L., Yizheng Zhang, T.L., Q.Z. and L.H. completed the real robot experiments. Q.Z., R.Z. and C. Zhou contributed to improving the training infrastructure. Y.L., J.L., Yufeng Zhang, R.W., W.C., X.L., Y. Zhu, L.X. and X.T. maintained the robot hardware and software during the project. L.H. wrote the paper with contributions from H.Z., C. Zhang, Q.Z., T.L. and J.S.; Z.Z. provided general scope advice and consistently supported the team.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Ken Caluwaerts and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Sections 6.1–6.5, Tables 1–4 and Figs. 1–4.
Supplementary Video 1
Main movie for the PMC model.
Supplementary Video 2
Main movie for the EPMC model.
Supplementary Video 3
Main movie for the SEPMC model.
Supplementary Video 4
The performance of all the trained policies in simulation.
Supplementary Video 5
The performance of the fall recovery model in a real-world experiment.
Supplementary Video 6
The performance of the student environment-level network using an onboard depth camera in a real-world experiment.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Han, L., Zhu, Q., Sheng, J. et al. Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Nat Mach Intell 6, 787–798 (2024). https://doi.org/10.1038/s42256-024-00861-3