High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning

Abstract

Fast and stable locomotion of legged robots involves demanding and contradictory requirements, in particular a high control frequency and an accurate dynamics model. Benefiting from the universal approximation ability of neural networks and from offline optimization, reinforcement learning has been used to solve various challenging problems in legged robot locomotion. However, the optimal control of a quadruped robot requires optimizing multiple objectives, such as keeping balance, improving efficiency, realizing a periodic gait and following commands, and these objectives cannot always be achieved simultaneously, especially at high speed. Here we introduce an imitation-relaxation reinforcement learning (IRRL) method that optimizes the objectives in stages. To bridge the gap between simulation and reality, we further introduce the concept of stochastic stability into the system robustness analysis. The decreasing rate of the state-space entropy serves as a quantitative metric that can sharply capture the occurrence of period-doubling bifurcation and possible chaos. By employing IRRL in training together with the stochastic stability analysis, we demonstrate a stable running speed of 5.0 m s–1 for an MIT-MiniCheetah-like robot.
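To make the staged optimization concrete, the following minimal Python sketch shows one way an imitation-then-relaxation reward schedule could be organized. The stage names, weight values and reward terms are illustrative assumptions, not the reward design used in the paper.

```python
import numpy as np

# Hypothetical two-stage schedule: the imitation term dominates early
# training and is then relaxed in favour of command tracking and
# efficiency. All names and weights are illustrative, not the paper's.
STAGES = {
    "imitation":  {"imitate": 1.0, "track": 0.2, "energy": 0.0},
    "relaxation": {"imitate": 0.1, "track": 1.0, "energy": 0.3},
}

def reward(stage, imitate_err, cmd_err, torques):
    """Weighted sum of competing locomotion objectives."""
    w = STAGES[stage]
    r_imitate = np.exp(-imitate_err**2)      # stay close to the reference gait
    r_track = np.exp(-cmd_err**2)            # follow the velocity command
    r_energy = -float(np.sum(torques**2))    # penalise actuation effort
    return w["imitate"]*r_imitate + w["track"]*r_track + w["energy"]*r_energy
```

The entropy-based robustness metric can be sketched in the same spirit: sample an ensemble of perturbed rollouts on a Poincaré section at successive strides, estimate the Shannon entropy of the state cloud with a histogram, and fit its decay rate. A clearly positive rate indicates contraction toward a stable limit cycle, whereas a rate near zero flags period doubling or possible chaos. Again, this is an illustrative approximation of the idea, not the authors' implementation.

```python
import numpy as np

def state_entropy(states, bins=20):
    """Shannon entropy of a state ensemble of shape (n_rollouts, state_dim),
    sampled on a low-dimensional Poincare section."""
    hist, _ = np.histogramdd(states, bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]                              # drop empty histogram cells
    return float(-np.sum(p * np.log(p)))

def entropy_decrease_rate(ensembles):
    """Linear decay rate of ensemble entropy over successive strides."""
    h = [state_entropy(s) for s in ensembles]
    slope = np.polyfit(np.arange(len(h)), h, 1)[0]
    return -slope   # positive: perturbed states converge to the gait cycle
```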


Fig. 1: Statistics of the maximum speed and body mass of mammals and quadrupedal robots in logarithmic scales.
Fig. 2: Concept and validation of IRRL method.
Fig. 3: Entropy-based system stability analysis.
Fig. 4: Robustness analysis based on stochastic stability.
Fig. 5: High-speed locomotion results from deploying the long short-term memory neural network controller on the MiniCheetah-like robot.
Fig. 6: Controller deployment in outdoor scenarios.


Data availability

The experimental data are available at https://github.com/WoodenJin/High_Speed_Quadrupedal_Locomotion_by_IRRL. The mammal running data come from work by Hirt and colleagues (ref. 2).

Code availability

The code and a demo are available at https://github.com/WoodenJin/High_Speed_Quadrupedal_Locomotion_by_IRRL

References

  1. Pfeifer, R., Lungarella, M. & Iida, F. Self-organization, embodiment, and biologically inspired robotics. Science 318, 1088–1093 (2007).

  2. Hirt, M. R., Jetz, W., Rall, B. C. & Brose, U. A general scaling law reveals why the largest animals are not the fastest. Nat. Ecol. Evol. 1, 1116–1122 (2017).

  3. Wensing, P. M. et al. Proprioceptive actuator design in the MIT Cheetah: impact mitigation and high-bandwidth physical interaction for dynamic legged robots. IEEE Trans. Robot. 33, 509–522 (2017).

  4. Katz, B., Di Carlo, J. & Kim, S. Mini Cheetah: a platform for pushing the limits of dynamic quadruped control. Proc. IEEE Int. Conf. Robot. Autom. 2019, 6295–6301 (2019).

  5. Kim, D., Di Carlo, J., Katz, B., Bledt, G. & Kim, S. Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. Preprint at https://arxiv.org/abs/1909.06586 (2019).

  6. Park, H. W., Wensing, P. M. & Kim, S. High-speed bounding with the MIT Cheetah 2: control design and experiments. Int. J. Rob. Res. 36, 167–192 (2017).

  7. Bledt, G. et al. MIT Cheetah 3: design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems 2245–2252 (IEEE, 2018); https://doi.org/10.1109/IROS.2018.8593885

  8. Di Carlo, J., Katz, B., Kim, S., Wensing, P. M. & Bledt, G. Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems 1–9 (IEEE, 2019); https://doi.org/10.1109/iros.2018.8594448

  9. Bledt, G. & Kim, S. Extracting legged locomotion heuristics with regularized predictive control. In Proc. IEEE International Conference on Robotics and Automation 406–412 (IEEE, 2020); https://doi.org/10.1109/ICRA40945.2020.9197488

  10. Bledt, G. & Kim, S. Implementing regularized predictive control for simultaneous real-time footstep and ground reaction force optimization. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems 6316–6323 (IEEE, 2019); https://doi.org/10.1109/IROS40897.2019.8968031

  11. Bledt, G., Wensing, P. M. & Kim, S. Policy-regularized model predictive control to stabilize diverse quadrupedal gaits for the MIT Cheetah. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems 4102–4109 (IEEE, 2017); https://doi.org/10.1109/IROS.2017.8206268

  12. Ding, Y., Pandala, A. & Park, H. W. Real-time model predictive control for versatile dynamic motions in quadrupedal robots. Proc. IEEE Int. Conf. Robot. Autom. 2019, 8484–8490 (2019).

  13. Hong, S., Kim, J. H. & Park, H. W. Real-time constrained nonlinear model predictive control on SO(3) for dynamic legged locomotion. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems 3982–3989 (IEEE, 2020); https://doi.org/10.1109/IROS45743.2020.9341447

  14. Chignoli, M., Kim, D., Stanger-Jones, E. & Kim, S. The MIT humanoid robot: design, motion planning, and control for acrobatic behaviors. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids) 1–8 (IEEE, 2021); https://doi.org/10.1109/HUMANOIDS47582.2021.9555782

  15. Peng, X. B., Ma, Z., Abbeel, P., Levine, S. & Kanazawa, A. AMP: adversarial motion priors for stylized physics-based character control. ACM Trans. Graph. 40, 1–20 (2021).

  16. Peng, X. B., Abbeel, P., Levine, S. & van de Panne, M. DeepMimic: example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 37, 1–14 (2018).

  17. Lee, S., Lee, S., Lee, Y. & Lee, J. Learning a family of motor skills from a single motion clip. ACM Trans. Graph. 40, 1–13 (2021).

  18. Siekmann, J., Green, K., Warila, J., Fern, A. & Hurst, J. Blind bipedal stair traversal via sim-to-real reinforcement learning. In Conference on Robotics: Science and Systems (RSS Foundation, 2021); https://www.webofscience.com/wos/alldb/full-record/WOS:000684604200061

  19. Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V. & Hutter, M. Learning quadrupedal locomotion over challenging terrain. Sci. Robot. 5, 1–49 (2020).

  20. Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 7, eabk2822 (2022).

  21. Lee, J., Hwangbo, J. & Hutter, M. Robust recovery controller for a quadrupedal robot using deep reinforcement learning. Preprint at https://arxiv.org/abs/1901.07517 (2019).

  22. Yang, C., Yuan, K., Zhu, Q., Yu, W. & Li, Z. Multi-expert learning of adaptive legged locomotion. Sci. Robot. 5, 1–14 (2020).

  23. Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4, 1–14 (2019).

  24. Tsounis, V., Alge, M., Lee, J., Farshidian, F. & Hutter, M. DeepGait: planning and control of quadrupedal gaits using deep reinforcement learning. IEEE Robot. Autom. Lett. 5, 3699–3706 (2020).

  25. Siekmann, J., Godse, Y., Fern, A. & Hurst, J. Sim-to-real learning of all common bipedal gaits via periodic reward composition. In 2021 IEEE International Conference on Robotics and Automation 7309–7315 (IEEE, 2021); https://doi.org/10.1109/ICRA48506.2021.9561814

  26. Ji, G., Mun, J., Kim, H. & Hwangbo, J. Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion. IEEE Robot. Autom. Lett. 7, 4630–4637 (2022).

  27. Margolis, G. B., Yang, G., Paigwar, K., Chen, T. & Agrawal, P. Rapid locomotion via reinforcement learning. In Conference on Robotics: Science and Systems (RSS Foundation, 2022); https://www.webofscience.com/wos/alldb/full-record/WOS:000827625700022

  28. Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).

  29. Lee, J., Hyun, D. J., Ahn, J., Kim, S. & Hogan, N. On the dynamics of a quadruped robot model with impedance control: self-stabilizing high speed trot-running and period-doubling bifurcations. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems 4907–4913 (IEEE, 2014); https://doi.org/10.1109/IROS.2014.6943260

  30. Peng, X. B., Andrychowicz, M., Zaremba, W. & Abbeel, P. Sim-to-real transfer of robotic control with dynamics randomization. In Proc. IEEE International Conference on Robotics and Automation 3803–3810 (IEEE, 2018); https://doi.org/10.1109/ICRA.2018.8460528

  31. Peng, X. B. et al. Learning agile robotic locomotion skills by imitating animals. In 16th Conference on Robotics: Science and Systems (RSS Foundation, 2020); https://www.webofscience.com/wos/alldb/full-record/WOS:000570976900064

  32. Raffalt, P. C., Kent, J. A., Wurdeman, S. R. & Stergiou, N. To walk or to run—a question of movement attractor stability. J. Exp. Biol. 223, 1–11 (2020).

  33. Bruijn, S. M., Bregman, D. J. J., Meijer, O. G., Beek, P. J. & van Dieën, J. H. Maximum Lyapunov exponents as predictors of global gait stability: a modelling approach. Med. Eng. Phys. 34, 428–436 (2012).

  34. Heim, S. & Spröwitz, A. Beyond basins of attraction: quantifying robustness of natural dynamics. IEEE Trans. Robot. 35, 939–952 (2019).

  35. Zaytsev, P., Wolfslag, W. & Ruina, A. The boundaries of walking stability: viability and controllability of simple models. IEEE Trans. Robot. 34, 336–352 (2018).

  36. Lee, Y. et al. Push-recovery stability of biped locomotion. ACM Trans. Graph. 34, 1–9 (2015).

  37. Park, H., Yu, R., Lee, Y., Lee, K. & Lee, J. Understanding the stability of deep control policies for biped locomotion. Vis. Comput. https://doi.org/10.1007/s00371-021-02342-9 (2022).

  38. Joshi, V. & Srinivasan, M. A controller for walking derived from how humans recover from perturbations. J. R. Soc. Interface 16, 20190027 (2019).

  39. Khadiv, M., Herzog, A., Moosavian, S. A. A. & Righetti, L. Walking control based on step timing adaptation. IEEE Trans. Robot. 36, 629–643 (2020).

  40. Luo, Y.-S., Soeseno, J. H., Chen, T. P.-C. & Chen, W.-C. CARL: controllable agent with reinforcement learning for quadruped locomotion. ACM Trans. Graph. 39, 38:1–38:10 (2020).

  41. Phillis, Y. A. Entropy stability of continuous dynamic systems. Int. J. Control 35, 323–340 (1982).

  42. Phillis, Y. A. Entropy stability of discrete dynamic systems. Int. J. Control 34, 159–171 (1981).

  43. Latora, V. & Baranger, M. Kolmogorov–Sinai entropy rate versus physical entropy. Phys. Rev. Lett. 82, 520–523 (1999).

  44. Seok, S. et al. Design principles for energy-efficient legged locomotion and implementation on the MIT Cheetah robot. IEEE/ASME Trans. Mechatron. 20, 1117–1129 (2015).

  45. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  46. Siekmann, J. et al. Learning memory-based control for human-scale bipedal locomotion. In 16th Conference on Robotics: Science and Systems (RSS Foundation, 2020); https://www.webofscience.com/wos/alldb/full-record/WOS:000570976900031

  47. Hwangbo, J., Lee, J. & Hutter, M. Per-contact iteration method for solving contact dynamics. IEEE Robot. Autom. Lett. 3, 895–902 (2018).

  48. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  49. Byl, K. & Tedrake, R. Metastable walking machines. Int. J. Rob. Res. 28, 1040–1064 (2009).

  50. He, J. & Gao, F. Mechanism, actuation, perception, and control of highly dynamic multilegged robots: a review. Chin. J. Mech. Eng. 33, 79 (2020).

  51. Kau, N., Schultz, A., Ferrante, N. & Slade, P. Stanford Doggo: an open-source, quasi-direct-drive quadruped. Proc. IEEE Int. Conf. Robot. Autom. 2019, 6309–6315 (2019).

  52. Kenneally, G., De, A. & Koditschek, D. E. Design principles for a family of direct-drive legged robots. IEEE Robot. Autom. Lett. 1, 900–907 (2016).

  53. De, A. & Koditschek, D. E. Vertical hopper compositions for preflexive and feedback-stabilized quadrupedal bounding, pacing, pronking, and trotting. Int. J. Rob. Res. 37, 743–778 (2018).

  54. Ding, Y., Pandala, A., Li, C., Shin, Y.-H. & Park, H.-W. Representation-free model predictive control for dynamic motions in quadrupeds. IEEE Trans. Robot. 37, 1154–1171 (2021).

  55. Unitree A1 (Unitree, 2022); https://www.unitree.com/products/a1/

  56. Hutter, M. et al. ANYmal—a highly mobile and dynamic quadrupedal robot. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems 38–44 (IEEE, 2016).

  57. Biswal, P. & Mohanty, P. K. Development of quadruped walking robots: a review. Ain Shams Eng. J. 12, 2017–2031 (2021).

  58. Raibert, M. H. Trotting, pacing and bounding by a quadruped robot. J. Biomech. 23, 79–98 (1990).

  59. Estremera, J. & Waldron, K. J. Thrust control, stabilization and energetics of a quadruped running robot. Int. J. Rob. Res. 27, 1135–1151 (2008).


Acknowledgements

We would like to thank S. Kim and his group at MIT for open-sourcing the Mini Cheetah robot, and J. Hwangbo and his team for the free academic license of RaiSim. This work was supported in part by the State Key Laboratory of Fluid Power and Mechatronic Systems (Zhejiang University).

Author information

Contributions

W.Y. and H.T.W. initiated the project. H.T.W. and Y.B.J. created the experimental protocols. Y.B.J. wrote the code for controller training, deployment and robustness analysis. Y.B.J. and Y.C.S. conceived the characteristic hyperplane to analyse the reward. Y.B.J. and X.W.L. assembled the robots and performed the experiments. Y.B.J., H.T.W. and W.Y. wrote the manuscript, and all of the authors contributed to the discussions on, and revisions to, the manuscript.

Corresponding authors

Correspondence to Hongtao Wang or Wei Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Steve Heim and Hae-Won Park for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Section 1, Tables 1–3 and Figs. 1–5 (citing refs. 50–59).

Reporting Summary

Supplementary Video 1

Design, evaluation and outdoor testing of the high-speed locomotion controller.

Supplementary Video 2

Comparison of command tracking performance among different trot gait controllers.

Supplementary Video 3

Visualization of the cumulative reward surface in rotating 3D form and of its transformation as the weight coefficients are varied.

Supplementary Video 4

Comparison of command tracking performance among different bounding gait controllers.

Supplementary Video 5

Generality test: IRRL for 15-DoF and 22-DoF bipedal robot locomotion controllers.

Supplementary Video 6

Comparison of motion trajectories with different friction coefficients and gaits.

Supplementary Video 7

LSTM neural network controller deployment on the physical platform.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jin, Y., Liu, X., Shao, Y. et al. High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning. Nat Mach Intell 4, 1198–1208 (2022). https://doi.org/10.1038/s42256-022-00576-3

