High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning

Abstract

Fast and stable locomotion of legged robots involves demanding and contradictory requirements, in particular a high control frequency and an accurate dynamics model. Benefiting from the universal approximation ability of neural networks and from offline optimization, reinforcement learning has been used to solve various challenging problems in legged robot locomotion. However, the optimal control of a quadruped robot requires optimizing multiple objectives, such as keeping balance, improving efficiency, realizing a periodic gait and following commands, and these objectives cannot always be achieved simultaneously, especially at high speed. Here we introduce an imitation-relaxation reinforcement learning (IRRL) method that optimizes the objectives in stages. To bridge the gap between simulation and reality, we further introduce the concept of stochastic stability into the system robustness analysis. The decreasing rate of the state-space entropy serves as a quantitative metric that can sharply capture the occurrence of period-doubling bifurcation and possible chaos. By employing IRRL in training together with the stochastic stability analysis, we demonstrate a stable running speed of 5.0 m s–1 for an MIT-MiniCheetah-like robot.
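To make the staged optimization concrete, the following minimal Python sketch shows one way an imitation-then-relaxation reward schedule could be organized. The stage names, weight values and reward terms are illustrative assumptions, not the reward design used in the paper.

```python
import numpy as np

# Hypothetical two-stage schedule: the imitation term dominates early
# training and is then relaxed in favour of command tracking and
# efficiency. All names and weights are illustrative, not the paper's.
STAGES = {
    "imitation":  {"imitate": 1.0, "track": 0.2, "energy": 0.0},
    "relaxation": {"imitate": 0.1, "track": 1.0, "energy": 0.3},
}

def reward(stage, imitate_err, cmd_err, torques):
    """Weighted sum of competing locomotion objectives."""
    w = STAGES[stage]
    r_imitate = np.exp(-imitate_err**2)      # stay close to the reference gait
    r_track = np.exp(-cmd_err**2)            # follow the velocity command
    r_energy = -float(np.sum(torques**2))    # penalise actuation effort
    return w["imitate"]*r_imitate + w["track"]*r_track + w["energy"]*r_energy
```

The entropy-based robustness metric can be sketched in the same spirit: sample an ensemble of perturbed rollouts on a Poincaré section at successive strides, estimate the Shannon entropy of the state cloud with a histogram, and fit its decay rate. A clearly positive rate indicates contraction toward a stable limit cycle, whereas a rate near zero flags period doubling or possible chaos. Again, this is an illustrative approximation of the idea, not the authors' implementation.

```python
import numpy as np

def state_entropy(states, bins=20):
    """Shannon entropy of a state ensemble of shape (n_rollouts, state_dim),
    sampled on a low-dimensional Poincare section."""
    hist, _ = np.histogramdd(states, bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]                              # drop empty histogram cells
    return float(-np.sum(p * np.log(p)))

def entropy_decrease_rate(ensembles):
    """Linear decay rate of ensemble entropy over successive strides."""
    h = [state_entropy(s) for s in ensembles]
    slope = np.polyfit(np.arange(len(h)), h, 1)[0]
    return -slope   # positive: perturbed states converge to the gait cycle
```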


Fig. 1: Statistics of the maximum speed and body mass of mammals and quadrupedal robots in logarithmic scales.
Fig. 2: Concept and validation of IRRL method.
Fig. 3: Entropy-based system stability analysis.
Fig. 4: Robustness analysis based on stochastic stability.
Fig. 5: High-speed locomotion results from deploying the long short-term memory neural network controller on the MiniCheetah-like robot.
Fig. 6: Controller deployment in outdoor scenarios.


Data availability

The experimental data are available at https://github.com/WoodenJin/High_Speed_Quadrupedal_Locomotion_by_IRRL. The mammal running data come from work by Hirt and colleagues (ref. 2).

Code availability

The code and a demo are available at https://github.com/WoodenJin/High_Speed_Quadrupedal_Locomotion_by_IRRL

References

  1. Pfeifer, R., Lungarella, M. & Iida, F. Self-organization, embodiment, and biologically inspired robotics. Science 318, 1088–1093 (2007).

  2. Hirt, M. R., Jetz, W., Rall, B. C. & Brose, U. A general scaling law reveals why the largest animals are not the fastest. Nat. Ecol. Evol. 1, 1116–1122 (2017).

  3. Wensing, P. M. et al. Proprioceptive actuator design in the MIT Cheetah: impact mitigation and high-bandwidth physical interaction for dynamic legged robots. IEEE Trans. Robot. 33, 509–522 (2017).

  4. Katz, B., Di Carlo, J. & Kim, S. Mini Cheetah: a platform for pushing the limits of dynamic quadruped control. Proc. IEEE Int. Conf. Robot. Autom. 2019, 6295–6301 (2019).

  5. Kim, D., Di Carlo, J., Katz, B., Bledt, G. & Kim, S. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. Preprint at https://arxiv.org/abs/1909.06586 (2019).

  6. Park, H. W., Wensing, P. M. & Kim, S. High-speed bounding with the MIT Cheetah 2: control design and experiments. Int. J. Rob. Res. 36, 167–192 (2017).

  7. Bledt, G. et al. MIT Cheetah 3: design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems 2245–2252 (IEEE, 2018); https://doi.org/10.1109/IROS.2018.8593885

  8. Di Carlo, J., Katz, B., Kim, S., Wensing, P. M. & Bledt, G. Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems 1–9 (IEEE, 2019); https://doi.org/10.1109/iros.2018.8594448

  9. Bledt, G. & Kim, S. Extracting legged locomotion heuristics with regularized predictive control. In Proc. IEEE International Conference on Robotics and Automation 406–412 (IEEE, 2020); https://doi.org/10.1109/ICRA40945.2020.9197488

  10. Bledt, G. & Kim, S. Implementing regularized predictive control for simultaneous real-time footstep and ground reaction force optimization. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems 6316–6323 (IEEE, 2019); https://doi.org/10.1109/IROS40897.2019.8968031

  11. Bledt, G., Wensing, P. M. & Kim, S. Policy-regularized model predictive control to stabilize diverse quadrupedal gaits for the MIT Cheetah. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems 4102–4109 (IEEE, 2017); https://doi.org/10.1109/IROS.2017.8206268

  12. Ding, Y., Pandala, A. & Park, H. W. Real-time model predictive control for versatile dynamic motions in quadrupedal robots. Proc. IEEE Int. Conf. Robot. Autom. 2019, 8484–8490 (2019).

  13. Hong, S., Kim, J. H. & Park, H. W. Real-time constrained nonlinear model predictive control on SO(3) for dynamic legged locomotion. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems 3982–3989 (IEEE, 2020); https://doi.org/10.1109/IROS45743.2020.9341447

  14. Chignoli, M., Kim, D., Stanger-Jones, E. & Kim, S. The MIT humanoid robot: design, motion planning, and control for acrobatic behaviors. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids) 1–8 (IEEE, 2021); https://doi.org/10.1109/HUMANOIDS47582.2021.9555782

  15. Peng, X. B., Ma, Z., Abbeel, P., Levine, S. & Kanazawa, A. AMP: adversarial motion priors for stylized physics-based character control. ACM Trans. Graph. 40, 1–20 (2021).

  16. Peng, X. B., Abbeel, P., Levine, S. & van de Panne, M. DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37, 1–14 (2018).

  17. Lee, S., Lee, S., Lee, Y. & Lee, J. Learning a family of motor skills from a single motion clip. ACM Trans. Graph. 40, 1–13 (2021).

  18. Siekmann, J., Green, K., Warila, J., Fern, A. & Hurst, J. Blind bipedal stair traversal via sim-to-real reinforcement learning. In Conference on Robotics: Science and Systems (RSS Foundation, 2021); https://www.webofscience.com/wos/alldb/full-record/WOS:000684604200061

  19. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5, 1–49 (2020).

  20. Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 7, eabk2822 (2022).

  21. Lee, J., Hwangbo, J. & Hutter, M. Robust recovery controller for a quadrupedal robot using deep reinforcement learning. Preprint at https://arxiv.org/abs/1901.07517 (2019).

  22. Yang, C., Yuan, K., Zhu, Q., Yu, W. & Li, Z. Multi-expert learning of adaptive legged locomotion. Sci. Robot. 5, 1–14 (2020).

  23. Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4, 1–14 (2019).

  24. Tsounis, V., Alge, M., Lee, J., Farshidian, F. & Hutter, M. DeepGait: planning and control of quadrupedal gaits using deep reinforcement learning. IEEE Robot. Autom. Lett. 5, 3699–3706 (2020).

  25. Siekmann, J., Godse, Y., Fern, A. & Hurst, J. Sim-to-real learning of all common bipedal gaits via periodic reward composition. In 2021 IEEE International Conference on Robotics and Automation 7309–7315 (IEEE, 2021); https://doi.org/10.1109/ICRA48506.2021.9561814

  26. Ji, G., Mun, J., Kim, H. & Hwangbo, J. Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion. IEEE Robot. Autom. Lett. 7, 4630–4637 (2022).

  27. Margolis, G. B., Yang, G., Paigwar, K., Chen, T. & Agrawal, P. Rapid locomotion via reinforcement learning. In Conference on Robotics: Science and Systems (RSS Foundation, 2022); https://www.webofscience.com/wos/alldb/full-record/WOS:000827625700022

  28. Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).

  29. Lee, J., Hyun, D. J., Ahn, J., Kim, S. & Hogan, N. On the dynamics of a quadruped robot model with impedance control: self-stabilizing high speed trot-running and period-doubling bifurcations. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems 4907–4913 (IEEE, 2014); https://doi.org/10.1109/IROS.2014.6943260

  30. Peng, X. B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Sim-to-real transfer of robotic control with dynamics randomization. In Proc. IEEE International Conference on Robotics and Automation 3803–3810 (IEEE, 2018); https://doi.org/10.1109/ICRA.2018.8460528

  31. Peng, X. B. et al. Learning agile robotic locomotion skills by imitating animals. In 16th Conference on Robotics: Science and Systems (RSS Foundation, 2020); https://www.webofscience.com/wos/alldb/full-record/WOS:000570976900064

  32. Raffalt, P. C., Kent, J. A., Wurdeman, S. R. & Stergiou, N. To walk or to run—a question of movement attractor stability. J. Exp. Biol. 223, 1–11 (2020).

  33. Bruijn, S. M., Bregman, D. J. J., Meijer, O. G., Beek, P. J. & van Dieën, J. H. Maximum Lyapunov exponents as predictors of global gait stability: a modelling approach. Med. Eng. Phys. 34, 428–436 (2012).

  34. Heim, S. & Spröwitz, A. Beyond basins of attraction: quantifying robustness of natural dynamics. IEEE Trans. Robot. 35, 939–952 (2019).

  35. Zaytsev, P., Wolfslag, W. & Ruina, A. The boundaries of walking stability: viability and controllability of simple models. IEEE Trans. Robot. 34, 336–352 (2018).

  36. Lee, Y. et al. Push-recovery stability of biped locomotion. ACM Trans. Graph. 34, 1–9 (2015).

  37. Park, H., Yu, R., Lee, Y., Lee, K. & Lee, J. Understanding the stability of deep control policies for biped locomotion. Vis. Comput. https://doi.org/10.1007/s00371-021-02342-9 (2022).

  38. Joshi, V. & Srinivasan, M. A controller for walking derived from how humans recover from perturbations. J. R. Soc. Interface 16, 20190027 (2019).

  39. Khadiv, M., Herzog, A., Moosavian, S. A. A. & Righetti, L. Walking control based on step timing adaptation. IEEE Trans. Robot. 36, 629–643 (2020).

  40. Luo, Y.-S., Soeseno, J. H., Chen, T. P.-C. & Chen, W.-C. CARL: controllable agent with reinforcement learning for quadruped locomotion. ACM Trans. Graph. 39, 38:1–38:10 (2020).

  41. Phillis, Y. A. Entropy stability of continuous dynamic systems. Int. J. Control 35, 323–340 (1982).

  42. Phillis, Y. A. Entropy stability of discrete dynamic systems. Int. J. Control 34, 159–171 (1981).

  43. Latora, V. & Baranger, M. Kolmogorov–Sinai entropy rate versus physical entropy. Phys. Rev. Lett. 82, 520–523 (1999).

  44. Seok, S. et al. Design principles for energy-efficient legged locomotion and implementation on the MIT Cheetah robot. IEEE/ASME Trans. Mechatron. 20, 1117–1129 (2015).

  45. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  46. Siekmann, J. et al. Learning memory-based control for human-scale bipedal locomotion. In 16th Conference on Robotics: Science and Systems (RSS Foundation, 2020); https://www.webofscience.com/wos/alldb/full-record/WOS:000570976900031

  47. Hwangbo, J., Lee, J. & Hutter, M. Per-contact iteration method for solving contact dynamics. IEEE Robot. Autom. Lett. 3, 895–902 (2018).

  48. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  49. Byl, K. & Tedrake, R. Metastable walking machines. Int. J. Rob. Res. 28, 1040–1064 (2009).

  50. He, J. & Gao, F. Mechanism, actuation, perception, and control of highly dynamic multilegged robots: a review. Chin. J. Mech. Eng. 33, 79 (2020).

  51. Kau, N., Schultz, A., Ferrante, N. & Slade, P. Stanford Doggo: an open-source, quasi-direct-drive quadruped. Proc. IEEE Int. Conf. Robot. Autom. 2019, 6309–6315 (2019).

  52. Kenneally, G., De, A. & Koditschek, D. E. Design principles for a family of direct-drive legged robots. IEEE Robot. Autom. Lett. 1, 900–907 (2016).

  53. De, A. & Koditschek, D. E. Vertical hopper compositions for preflexive and feedback-stabilized quadrupedal bounding, pacing, pronking, and trotting. Int. J. Rob. Res. 37, 743–778 (2018).

  54. Ding, Y., Pandala, A., Li, C., Shin, Y.-H. & Park, H.-W. Representation-free model predictive control for dynamic motions in quadrupeds. IEEE Trans. Robot. 37, 1154–1171 (2021).

  55. Unitree A1 (Unitree, 2022); https://www.unitree.com/products/a1/

  56. Hutter, M. et al. ANYmal—a highly mobile and dynamic quadrupedal robot. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems 38–44 (IEEE, 2016).

  57. Biswal, P. & Mohanty, P. K. Development of quadruped walking robots: a review. Ain Shams Eng. J. 12, 2017–2031 (2021).

  58. Raibert, M. H. Trotting, pacing and bounding by a quadruped robot. J. Biomech. 23, 79–98 (1990).

  59. Estremera, J. & Waldron, K. J. Thrust control, stabilization and energetics of a quadruped running robot. Int. J. Rob. Res. 27, 1135–1151 (2008).


Acknowledgements

We would like to thank S. Kim and his group at MIT for open-sourcing the Mini Cheetah robot, and J. Hwangbo and his team for the free academic license of RaiSim. This work was supported in part by the State Key Laboratory of Fluid Power and Mechatronic Systems (Zhejiang University).

Author information

Contributions

W.Y. and H.T.W. initiated the project. H.T.W. and Y.B.J. created the experimental protocols. Y.B.J. wrote the code for controller training, deployment and robustness analysis. Y.B.J. and Y.C.S. conceived the characteristic hyperplane to analyse the reward. Y.B.J. and X.W.L. assembled the robots and performed the experiments. Y.B.J., H.T.W. and W.Y. wrote the manuscript, and all of the authors contributed to the discussions on, and revisions to, the manuscript.

Corresponding authors

Correspondence to Hongtao Wang or Wei Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Steve Heim and Hae-Won Park for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Section 1, Tables 1–3 and Figs. 1–5 (citing refs. 50–59).

Reporting Summary

Supplementary Video 1

Design, evaluation and outdoor testing of the high-speed locomotion controller.

Supplementary Video 2

Comparison of command tracking performance among different trot gait controllers.

Supplementary Video 3

Visualization of the cumulative reward surface in rotating 3D form and of its transformation as the weight coefficients are varied.

Supplementary Video 4

Comparison of command tracking performance among different bounding gait controllers.

Supplementary Video 5

Generality test: IRRL for 15-DoF and 22-DoF bipedal robot locomotion controllers.

Supplementary Video 6

Comparison of motion trajectories with different friction coefficients and gaits.

Supplementary Video 7

LSTM neural network controller deployment on the physical platform.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jin, Y., Liu, X., Shao, Y. et al. High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning. Nat Mach Intell 4, 1198–1208 (2022). https://doi.org/10.1038/s42256-022-00576-3

