Learning Neuroplastic Matching of Robot Dynamics in Closed-loop CPGs

Legged robots have the potential to show locomotion performance with reduced control effort and improved energy efficiency by leveraging elastic structures inspired by animals' elastic tendons and muscles. However, it remains a challenge to match the natural dynamics of complex legged robots and their control task dynamics. Here we present a framework to match control task dynamics and natural dynamics based on the neuroelasticity and neuroplasticity concept. Inspired by animals, we design the quadruped robot Morti with strong natural dynamics as a testing platform. It is controlled through a bioinspired closed-loop central pattern generator (CPG) that is designed to neuroelastically mitigate short-term perturbations using sparse contact feedback. We use the amount of neuroelastic activity as a proxy to quantify the dynamics' mismatching. By minimizing neuroelastic activity, we neuroplastically match the control task dynamics to the robot's natural dynamics. Through matching, the robot learns to walk within one hour with only sparse feedback and improves its energy efficiency without explicitly minimizing it in the cost function.

Dynamic Locomotion Group, Max Planck Institute for Intelligent Systems, Stuttgart, Germany. Correspondence: ruppert@is.mpg.de

requires additional energy to enforce a desired behavior (see Toy example, section S5). Yet so far, no formulation exists for matching the control task to a given robot's dynamics, especially for robots with engineered natural dynamics. Previous work focused on designing specific aspects of natural dynamics to fit a given control scheme [22]-[24]. In this work, we focus on quantifying the match between control task dynamics and natural dynamics, and on how to improve and learn matching.

In animals, the neural structure and neuromuscular pathways evolved over many generations and are inherent to each individual at birth [25]. In robotics, the control laws and electrical connections are hardcoded in the design phase before production. The timing and intensity of muscle activity patterns in animals and of motor controller activity in robots, however, have to be matched to the system's natural dynamics, as a lifelong learning task in animals [26], [27] or during the development of a robot.

In this study, we implement a quadrupedal robot with engineered natural dynamics that is controlled by a central pattern generator (CPG). CPGs are neural networks found in animals that produce rhythmic output signals from non-rhythmic inputs [28], [

In our quadruped robot, we implement feedback and reflexes and observe the robot's behavior measured through sparse feedback from contact sensors on the robot's feet. This neuroelastic activity aims to correct discrepancies between desired and measured robot behavior (Figure 2a). We transfer the concept of neuroelasticity from neuroscience, where it describes the handling of stochastic short-term perturbations while interacting with the environment [37], into locomotion control. To quantify the matching of the robot's natural dynamics and the control task dynamics, we use the neuroelastic activity as a proxy. If the dynamics do not match, the feedback mechanisms constantly have to intervene to correct for the discrepancy between the commanded and measured behavior of the robot.

We optimize the CPG parameters that describe the control task dynamics by minimizing the amount of neuroelastic

To optimize and tune the control tasks, different methods have been used, such as optimization [41]-[43], self-modeling [44], adaptive CPGs [45]-[47], and machine learning techniques [21], [47]-[52]. For this study, we apply Bayesian optimization [53], [54] to minimize the amount of neuroelastic activity and thereby neuroplastically adapt the CPG parameters.

In previous work, Owaki et al. [55] presented a control approach that shows spontaneous gait transition based on mechanical coupling ('physical communication') of the CPG. Buchli et al. [47] presented an adaptive oscillator that adapts its frequency to the natural frequency of a SLIP-like simulation model. In their adaptive frequency oscillator approach, the matching of control frequency and natural frequency leads to performance improvements and a reduction in energy requirements. Fukuoka et al. [45] implemented short-term reflexes that adapt the robot's controller to the motion of the robot induced by external perturbations. Through a closed-loop CPG that incorporates the 'rolling body motion', the robot can actively adapt to its surroundings. Thandiackal et al. [35] showed that feedback from hydrodynamic pressure in CPGs can lead to self-organized undulatory swimming. Yet so far, no approaches exist for long-term neuroplastic matching of control task dynamics and a complex walking system's natural dynamics in passive elastic robots.

To reduce the experimentation cost in terms of wear, critical failure and time, the control task dynamics are first optimized on a simulated robot. After successful optimization in simulation, we transfer the optimized CPG parameter set to hardware, measure the performance of the real robot, and validate the effectiveness of our approach by evaluating a performance measure.

While optimization and learning in simulation are efficient and cheap, the transfer of control policies can be difficult due to the sim2real gap [20], [21], [52]. We examine the transferability of our approach by quantifying the sim2real gap, comparing simulation and hardware experiments.

The novelty of this study is twofold:

The best simulation result is 41% lower than the lowest hardware result.

To validate the performance as well as the differences between simulation and hardware rollouts in detail, we investigate the individual reward factors (Table S3) for both simulation and hardware rollouts (Figure 5a). We find that no single reward factor is responsible for the higher returned reward. Rather, all reward factors are slightly higher, and their summation leads to the higher reward returned for the hardware results. The distance reward term ($J_{\text{distance}}$) and the feedback reward term ($J_{\text{feedback}}$) contribute the largest differences between simulation and hardware reward values. $J_{\text{distance}}$ has a mean hardware reward of 2.13 ± 0.36 compared to a simulation reward of 1.67. $J_{\text{feedback}}$ has a mean hardware reward of 0.43 ± 0.06 compared to the simulation reward of 0.13. We assume the difference is due to modeling assumptions that were made in the simulation. The hardware robot shows a lower speed due to contact losses, gearbox backlash, friction and elasticity in the FootTile sensors. During touchdown, imperfect contact of the feet leads to higher feedback activity, which is penalized by the feedback reward term. The body pitch reward term $J_{\text{pitch}}$ is in the range of the simulated reward: the mean hardware reward is 0.85 ± 0.22 compared to 0.80 in simulation. Both during the optimization shown here and in initial tests, the simulated robot showed more body pitch for untuned CPG parameters, and the robot flipped over during several rollouts. This never happened on the hardware, and even in early experiments the hardware robot never pitched more than 30°. The periodicity reward term ($J_{\text{periodicity}}$) (hardware: 0.15 ± 0.33, simulation: 0.0) and the contact reward term ($J_{\text{contact}}$) (hardware: 0.12 ± 0.09, simulation: 0.03) behave similarly in simulation and hardware rollouts. The similarity is expected since both simulation and hardware gaits converge to the desired gait, and the latter three reward terms were introduced to guide the optimizer in finding natural gaits, mostly during the first rollouts.

At the core of our approach, we hypothesize that matching dynamics improves energy efficiency. We therefore explicitly do not incorporate energy efficiency into the cost function. To quantify how matching dynamics improves energy efficiency, we calculate a normalized torque as a measure of performance:

where $n$ is the leg index, $\tau_{\text{knee}}$ and $\tau_{\text{hip}}$ are the mean knee and hip torque per rollout per leg, and $v_{\text{body}}$ is the mean body velocity of the respective rollout.
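As an illustration of this measure, the following minimal sketch assumes that the normalized torque sums the mean knee and hip torques over all legs and divides by the mean body velocity. This form, the function name `normalized_torque` and the example numbers are assumptions made for illustration; they are not taken from the experiments.

```python
from typing import Sequence


def normalized_torque(tau_knee: Sequence[float],
                      tau_hip: Sequence[float],
                      v_body: float) -> float:
    """Assumed form: sum_n (tau_knee[n] + tau_hip[n]) / v_body.

    tau_knee, tau_hip: mean torque per rollout for each leg n.
    v_body: mean body velocity of the same rollout.
    """
    if len(tau_knee) != len(tau_hip):
        raise ValueError("need one knee and one hip torque per leg")
    if v_body <= 0.0:
        raise ValueError("mean body velocity must be positive")
    return sum(k + h for k, h in zip(tau_knee, tau_hip)) / v_body


# Made-up per-leg mean torques (Nm) and mean body velocity (m/s).
example = normalized_torque([0.1, 0.1, 0.1, 0.1],
                            [0.15, 0.15, 0.15, 0.15],
                            0.5)
```

Dividing by the body velocity means that a rollout is only rated as efficient if low torques coincide with effective forward motion.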

The initial normalized torque is 2.52, the final value is 1.02. The mean normalized torque is 1.7 ± 0.5, the median normalized torque is 1.55 (Figure 6). As expected, the normalized torque reduces over the optimization by 42% (compare section S5). The reduction in normalized torque as an efficiency measure confirms our hypothesis that matching the control task dynamics to the system's natural dynamics has beneficial effects on energy requirements.

Fig. 2: Schematic depiction of the neuroelasticity and neuroplasticity framework. a, Schematic depiction of short-term neuroelasticity and long-term neuroplasticity. Neuroelasticity (green) mitigates stochastic short-term perturbations (red), like a pothole, that disturb the system (spring) from its desired state (dashed line). Neuroelastic activity is reversible and only active when a perturbation is present, just like a spring that only deflects as long as an external force is active and then returns to its initial state. Neuroplasticity (yellow) changes the system behavior permanently to adapt to long-term active stimuli from the environment. If the same perturbation is frequently present, the system adapts to the perturbation. In our example, the spring adapts its set point (spring length) and stiffness (spring thickness). This way, an initial desired system state that might be encoded in the initial control design can be adapted to better deal with perturbations throughout its life span as well as with changing environments. After the neuroplastic adaptation, the spring deflects less (bottom right, green). In this study, we measure the amount of neuroelastic activity during level walking of a quadruped robot. We quantify the mismatch between the robot's natural dynamics and the control task dynamics based on the amount of active feedback. By minimizing the neuroelastic activity through optimization, we neuroplastically change the control dynamics to increase dynamics matching. b, Control structure of quadruped robot Morti. c, Flowchart of the matching approach. The neuroelastic activity mitigates short-term perturbations through sparse contact feedback from the FootTile contact sensors. We measure the amount of neuroelastic activity as a proxy for the mismatching of dynamics. Over a longer time window, the optimizer minimizes the neuroelastic activity to neuroplastically match the control task dynamics of the CPG to the robot's natural dynamics. d, Diagram of a step cycle in phase space. Colored sections mark the feedback mechanisms: late touchdown (red), later than the desired touchdown time TD ($\delta_{\text{overSwing}}$); late toe-off (yellow), later than the desired toe-off time TO ($\delta_{\varphi,\text{knee}}$); early toe-off (green); early touchdown (purple). Stance phase from touchdown to toe-off is shaded blue.

In this paper, we develop an approach to measure and improve the matching between control and natural dynamics. We measure the neuroelastic activity of feedback to correct for

machine learning approaches [20], [21]. We also find that matching dynamics is beneficial for energy efficiency. We calculate a normalized performance measure which shows a decrease in power requirements.

The designed passive behavior of our robot Morti enables a simple matched CPG control structure to leverage the natural dynamics of the leg design. Through sparse, binary feedback from touch sensors, the controller is able to neuroelastically mitigate the perturbations stemming from initial mismatching. Through the interplay of natural dynamics and the matched CPG, Morti can achieve convincing locomotion performance on inexpensive hardware with lower computation power and with lower control and sensor bandwidth compared to state-of-the-art model-based locomotion controllers.

Fig. 4: CPG parameters and neuroelastic activity. a, Example CPG output for four coupled oscillators and the generated trajectories. Top are the coupled phases; middle and bottom are the hip and knee joint trajectories for one oscillator with their respective CPG parameters $p_m$ (Table S2). Parameters here are $D = 0.35$, $\delta_{\varphi,\text{knee}} = 0.3$, $\delta_{\text{overSwing}} = 0.2$, $f = 1$. b, Simulation results showing the four feedback mechanisms (same color coding as Figure S1). Data is shown for the front left (blue) and front right (orange) leg. Late touchdown (red) on the front left leg phase shows the phase delay to wait for touchdown. Early touchdown (purple) on the right leg shows the knee pull-up reflex. Late toe-off (yellow) is shown on the left leg. Early toe-off (green) is shown on the right leg. Stance phase is shaded gray.

For both experimentation and simulation, we design and implement the quadruped robot Morti. Morti has a monoarticular knee spring and a biarticular spring between hip and foot that provides series elastic behavior [8]. The robot is controlled by a closed-loop CPG. Through reflex-like feedback mechanisms, the robot can neuroelastically mitigate short-term perturbations. To minimize the neuroelastic activity, we implement a Bayesian optimizer that neuroplastically matches the control task dynamics to the robot's natural dynamics.

The robot consists of four 'biarticular legs' [8, Fig. 1B] mounted to a carbon-fiber body. Each leg has three segments: femur, shank and foot segment. Femur and foot segment are connected through a spring-loaded parallel mechanism mimicking the biarticular muscle-tendon structure formed by the gastrocnemius muscle-tendon group in quadruped animals [60]. A knee spring inspired by the patellar tendon in animals provides passive elasticity of the knee joint.

The robot walks on a treadmill and is constrained to the sagittal plane by a linear rail that allows body pitch (Figure 1b). The robot is instrumented with joint angle sensors, position sensors and the treadmill speed sensor. To measure ground contact, four FootTile sensors [61] are mounted on the robot's feet. Through a threshold, these analog pressure sensors can be used to determine if the robot established ground contact. Detailed descriptions of the experimental setup can be found in section S1.

We implement the simulation in PyBullet [56], a multibody simulator based on the Bullet physics engine (Figure 3a). The robot mechanics are derived from the mechanical robot and its CAD model (Table S1). To increase the match between robot hardware and simulation, we implement motor limits and the motor controller to resemble the real actuator limits [20]. The simulation runs at 1 kHz, the CPG control loop runs at 500 Hz, and ground contacts are polled at 250 Hz to resemble the hardware implementation.

C. CPG

The CPG used in this work is a modified Hopf oscillator [34] that is modeled in phase space. Similar to its biological counterpart, it can be entrained through feedback from external sensory input or from internal coupling to neighboring nodes. Based on the desired phase shifts between oscillator nodes, a variety of gaits can be implemented by adapting the phase difference matrix while keeping the network dynamics identical (section S6). The joint trajectories generated by the CPG are described by eight parameters (Figure 4a

As the CPG implemented here is written as a model-free feed-forward network, it can be difficult to find parameters that lead to viable gaits with given robot dynamics. Essentially, the CPG commands desired trajectories without knowledge of the robot's natural dynamics. In the worst case, the CPG could command behavior the robot cannot fulfill because of its own natural dynamics and mechanical limitations like inertia, motor speed, or torque limitations. To address this shortcoming, feedback can be used to mitigate the differences between desired and measured behavior. One possible feedback signal that has been shown to aid entrainment and can mitigate perturbations is foot contact information [32]. This contact information can be integrated into the CPG to measure timing differences between the desired and measured trajectories.
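The timing comparison described above can be sketched as follows. This is a minimal illustration, assuming that touchdown and toe-off are detected as edges in a binary contact signal polled at 250 Hz and compared with the event times commanded by the CPG. The function names, the tolerance and the sample data are hypothetical.

```python
from typing import List, Optional


def first_edge(contact: List[bool], rising: bool, dt: float) -> Optional[float]:
    """Time of the first touchdown (rising) or toe-off (falling) edge, if any."""
    for i in range(1, len(contact)):
        if contact[i] != contact[i - 1] and contact[i] == rising:
            return i * dt
    return None


def classify(measured: Optional[float], desired: float, event: str,
             tol: float = 1e-9) -> str:
    """Label a contact event as early, late or on time relative to the CPG."""
    if measured is None:
        return f"missing {event}"
    if measured > desired + tol:
        return f"late {event}"
    if measured < desired - tol:
        return f"early {event}"
    return f"on-time {event}"


dt = 1.0 / 250.0  # contact polling period (250 Hz)
# Made-up contact trace: swing, stance, swing.
contact = [False] * 30 + [True] * 70 + [False] * 25

td = first_edge(contact, rising=True, dt=dt)   # measured touchdown time
to = first_edge(contact, rising=False, dt=dt)  # measured toe-off time

# Hypothetical desired event times from the CPG, in seconds.
td_label = classify(td, desired=0.10, event="touchdown")
to_label = classify(to, desired=0.45, event="toe-off")
```

The sign of each timing difference selects which corrective mechanism would apply, and its magnitude indicates how much correction is needed.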

Feedback can influence the trajectories created by the CPG either by changing the CPG dynamics, that is, accelerating or decelerating the CPG's phases, or by changing the generated joint angle trajectories directly. During a step cycle (Figure 2d), contact signals can be used for several feedback mechanisms (Figure S1). The feedback mechanisms react to timing discrepancies for the touchdown and toe-off events and correct the CPG trajectories if the robot establishes or loses ground contact earlier or later than instructed by the CPG. In this work, we implement a phase deceleration for delayed touchdown, early knee flexion when ground contact is lost too early, a phase deceleration when knee flexion is delayed, and a knee pull-up reflex in

To achieve long-term (close to) optimal behavior, we use Bayesian optimization for its global optimization capabilities, data efficiency and robustness to noise [53], [54].

2) Cost function: We evaluate the robot based on a cost function comprised of three major components. The first component influences the matching behavior ($J_{\text{feedback}}$), specifically the amount of neuroelastic activity that the robot uses during a rollout. The second component measures effective forward locomotion ($J_{\text{distance}}$) to provide meaningful results. The third component ensures a gait comparable to the gaits observed in animals and serves mostly as a penalty for 'unnatural gait characteristics'. It enforces that the robot moves with little body pitch ($J_{\text{pitch}}$), has only one contact phase per leg and step ($J_{\text{contact}}$) to prevent dragging and skipping, and takes only one step per stride and leg ($J_{\text{periodicity}}$). Further description can be found in section S4.

To validate the optimal set of CPG parameters $p_m$ from simulation, we test the same parameters on the hardware robot. The hardware controller has the same neuroelastic mechanisms that were previously described in subsection III-D. We test 10 parameter sets and randomly vary the CPG parameters obtained from simulation by ≤ 10% to validate the hardware reward function around the optimal point found in simulation. We then evaluate the robot performance with the same reward function used for the simulation. As in simulation, the robot CPG is entrained in air, and the performance is only measured for 10 s after the robot has converged to a stable gait. Videos of the robot walking can be found in the supplementary material.
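The random variation around the simulation optimum can be sketched as below. The sketch assumes an independent, uniformly distributed relative perturbation of at most 10% per parameter; the sampling distribution, the seed and the parameter names are assumptions, and the values are the illustrative ones from the Fig. 4 example rather than the optimized set.

```python
import random
from typing import Dict, List


def perturb(params: Dict[str, float], max_rel: float,
            rng: random.Random) -> Dict[str, float]:
    """Scale every parameter by an independent factor in [1 - max_rel, 1 + max_rel]."""
    return {name: value * (1.0 + rng.uniform(-max_rel, max_rel))
            for name, value in params.items()}


# Illustrative values only (not the optimized parameter set).
p_sim = {"D": 0.35, "delta_phi_knee": 0.3, "delta_overSwing": 0.2, "f": 1.0}

rng = random.Random(0)  # fixed seed so the test sets are reproducible
test_sets: List[Dict[str, float]] = [perturb(p_sim, 0.10, rng) for _ in range(10)]
```

Each entry of `test_sets` would then be run as one hardware rollout and scored with the same reward function as in simulation.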

We thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting the academic development of Felix Ruppert. This work was made possible thanks to a Max Planck Group Leader grant awarded to ABS by the Max Planck Society. The authors thank ZWE robotics for support with 3D printing, the MPI machine shop for support with metal manufacturing, Majid Khadiv and Ludovic Righetti for the discussions concerning Bayesian optimization, Alborz Sarvestani for the many fruitful discussions and Robin Petereit for his input concerning Morti's firmware.

The robot consists of four 'biarticular legs' [8, Fig. 1B] mounted to a carbon fiber compound plate (CFK sandwich, carbon-vertrieb). The body structure provides high bending and torsion stiffness at low weight. Each leg is articulated by two brushless outrunner motors (MN7005, tmotors). The hip motor is geared through a 1:5 planetary gearbox (RS3505S, Matex); the knee motor is geared through a 1:12 one-directional cable drive mechanism to flex the knee joint. The knee extends through the knee spring that tensions during flexion. The knee and ankle joints of the robot are instrumented with rotary encoders (AEAT8800, Broadcom). A hall-effect switch (DRV5023, Texas Instruments) between the body and femur segment provides a reference to initialize the angle measurements of the hip joints. Hard stops at the robot's knee and ankle joints prevent overextension during stance phase. Four hall-effect current sensors (ACS723, Allegro MicroSystems) measure input currents to the motor driver boards. The robot is controlled by a Raspberry Pi 4 through a custom-made shield. The shield consists of an SPI GPIO expander (MCP23017, Microchip) and four tri-state buffers (74HC125, Nexperia). The shield connects four SPI-controlled custom brushless motor drivers [40], the hall switches and the joint encoders to the computer. Two external 12 V 80 A lead batteries supply motors and the computer with power.

Here we focus on motion in the sagittal plane. The robot is therefore constrained to motion in the sagittal plane by a linear rail and lever mechanism that also allows body pitch around the robot's center of mass (COM). The robot walks on an off-the-shelf recreational treadmill (TM500S, Christopeit) that is retrofitted with a motor controller (DPCANIE, amc). Treadmill speed is measured with a rotary encoder (AEAT8800, Broadcom). The robot is connected to the treadmill with a linear rail (SSEB, Misumi) and a lever mechanism. The linear rail is equipped with a linear encoder (AS5311, ams). The experimental setup can be seen in Figure 1b.

To measure ground contact, each foot is equipped with a FootTile sensor [61] connected to an I²C multiplexer (TCA9548A, Texas Instruments). The sensor dome is a half-cylinder with a half-cylinder air cavity that houses the sensor. We do not expect forces in the lateral direction due to the guiding mechanism. Therefore, the sensor domes are laterally symmetric (Figure 1b). As the FootTile sensors are analog pressure sensors, we define a threshold to measure when the sensors are in contact with the ground. The sensors read values between 98.5 and 99 kPa when not in contact with the ground. We define 100 kPa as the contact threshold.
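The thresholding step can be illustrated with a short sketch. The 100 kPa threshold and the 98.5 to 99 kPa no-contact range come from the text; the function name, the choice of a greater-or-equal comparison and the sample readings are assumptions.

```python
from typing import List

CONTACT_THRESHOLD_KPA = 100.0  # contact threshold stated in the text


def in_contact(pressure_kpa: float,
               threshold_kpa: float = CONTACT_THRESHOLD_KPA) -> bool:
    """Binary contact signal from one analog pressure reading.

    Assumption: a reading at or above the threshold counts as contact.
    """
    return pressure_kpa >= threshold_kpa


# Made-up readings in kPa: unloaded values lie between 98.5 and 99 kPa.
readings: List[float] = [98.7, 98.9, 101.3, 104.0, 99.0]
flags = [in_contact(p) for p in readings]
```

The margin between the unloaded readings and the threshold keeps sensor noise from producing false contact events.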

All data on the robot is sampled at 500 Hz, except for the FootTile data, which is sampled at 250 Hz because of limitations in the pressure sensor hardware.
where $\varphi$ is the oscillator phase vector and $\Omega$ is the angular velocity vector.
where $f$ is the frequency, $\alpha_{\text{dyn},j,k}$ is the conversion constant of the network dynamics between nodes $j$ and $k$, $C_{jk}$ is the coupling matrix weight between nodes $j$ and $k$, and $\Phi_{jk}$ is the desired phase difference matrix value between nodes $j$ and $k$.
where $\varphi_j$ is the $j$-th end-effector phase, $D$ is the duty factor, $t_{\text{stance}}$ is the duration of stance phase and $t_{\text{flight}}$ is the duration of flight phase.
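To make the roles of these symbols concrete, the sketch below integrates a commonly used coupled phase oscillator, $\dot{\varphi}_j = 2\pi f + \sum_k \alpha\, C_{jk} \sin(\varphi_k - \varphi_j - \Phi_{jk})$. This form, the single scalar gain `alpha`, the all-to-all coupling, the leg ordering and all numeric values are assumptions chosen for illustration. The equations of the modified Hopf oscillator are not reproduced in this text and may differ.

```python
import math
from typing import List


def cpg_step(phi: List[float], f: float, alpha: float,
             C: List[List[float]], Phi: List[List[float]],
             dt: float) -> List[float]:
    """One explicit Euler step of the assumed coupled phase dynamics."""
    n = len(phi)
    new_phi = []
    for j in range(n):
        coupling = sum(alpha * C[j][k] * math.sin(phi[k] - phi[j] - Phi[j][k])
                       for k in range(n))
        new_phi.append(phi[j] + (2.0 * math.pi * f + coupling) * dt)
    return new_phi


def wrap(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


# Hypothetical trot pattern: front-left and hind-right in phase,
# front-right and hind-left half a cycle apart.
# Assumed leg order: FL, FR, HL, HR.
offsets = [0.0, math.pi, math.pi, 0.0]
n = len(offsets)
Phi = [[offsets[k] - offsets[j] for k in range(n)] for j in range(n)]
C = [[0.0 if j == k else 1.0 for k in range(n)] for j in range(n)]

phi = [0.0, 2.0, 2.5, 0.5]      # arbitrary initial phases
for _ in range(2500):           # 5 s at a 500 Hz control loop
    phi = cpg_step(phi, f=1.0, alpha=5.0, C=C, Phi=Phi, dt=0.002)

fl_fr = wrap(phi[1] - phi[0])   # expected to approach pi (in magnitude)
fl_hr = wrap(phi[3] - phi[0])   # expected to approach 0
```

Changing only `offsets`, and therefore $\Phi$, selects a different gait while the network dynamics stay the same.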

Depending on the gait symmetry, $\Theta_{\text{offset}}$ and $\Theta_{\text{hipAmplitude}}$ are also only equal in legs that share the same gait symmetry.
where $i_{\text{hipMotor}}$ is the desired hip motor current, $k_p$, $k_i$, $k_d$ are the controller gains and $\Theta_{\text{hip}}$ is the hip angle.
where $i_{\text{kneeMotor}}$ is the desired knee motor current, $k_p$, $k_i$, $k_d$ are PID controller gains and $\Theta_{\text{knee}}$ is the knee angle.

Additionally, the desired knee current contains a feed-forward term, $i_{\text{feedForward}}$, that calculates the required static current:

The same mechanism for late touchdown can also be applied for late toe-off. At the end of stance phase, the leg waits until the end of ground contact before initiating swing phase. This waiting period effectively also closes the loop on the timing of the knee trajectory, as the knee and hip trajectories are directly coupled through the same phase reference. The late toe-off mechanism described here, however, only works for gaits with flight phase, as the legs otherwise never lift off the ground because of double support during stance phase. In this case, swing phase has to be

shown in the results section, the late toeoff mechanism

where $J_{\text{contact}}$ is the mean amount of flight-stance changes per step, $n$ is the number of legs, $t$ is time, $T$ is the evaluation duration and $\text{contact}$ is the contact sensor data matrix for all four legs.

The periodicity term minimizes non-periodic behavior of the robot to make sure the performance of the robot does not come from undesired behavior like non-periodic jumps or skips. To do so, we calculate the average distance of the maxima from the autocorrelation of the pitch angle. The pitch angle is convolved with itself to obtain the frequency spectrum of the body pitch angle. We then compare this frequency with the actual CPG frequency by calculating the standard deviation, to determine how well the CPG and the robot's passive dynamics match. The standard deviation provides a good measure of the variation in the oscillatory behavior and is used to characterize how well the CPG dynamics fit the mechanical dynamics of the robot.
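The autocorrelation step can be sketched as follows. The sketch assumes a biased autocorrelation estimate, a simple local-maximum search, and a root-mean-square deviation of the per-interval frequencies from the commanded CPG frequency as the periodicity score. These choices, the function names and the synthetic pitch signal are assumptions; the exact definition of $J_{\text{periodicity}}$ is not reproduced here.

```python
import math
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation of the mean-free signal for lags 0..max_lag."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    n = len(c)
    return [sum(c[i] * c[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def peak_lags(r: List[float]) -> List[int]:
    """Lags of positive local maxima, with lag 0 as the reference maximum."""
    peaks = [0]
    for k in range(1, len(r) - 1):
        if r[k] > 0.0 and r[k] > r[k - 1] and r[k] >= r[k + 1]:
            peaks.append(k)
    return peaks


fs = 500.0      # sampling rate in Hz
f_cpg = 1.0     # commanded CPG frequency in Hz (made-up value)
# Synthetic, perfectly periodic pitch signal over 4 s.
pitch = [math.sin(2.0 * math.pi * f_cpg * i / fs) for i in range(2000)]

r = autocorrelation(pitch, max_lag=1200)
peaks = peak_lags(r)
intervals = [b - a for a, b in zip(peaks, peaks[1:])]

# Mean peak spacing gives the estimated body pitch frequency.
f_body = fs / (sum(intervals) / len(intervals))
# Assumed score: RMS deviation of per-interval frequencies from f_cpg.
deviation = math.sqrt(sum((fs / d - f_cpg) ** 2 for d in intervals)
                      / len(intervals))
```

For a gait whose body pitch oscillates at the commanded frequency, `f_body` is close to `f_cpg` and `deviation` is close to zero; irregular jumps or skips would spread the peak spacing and increase the score.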

$$S_{\text{pitch}} = \alpha_{\text{pitch}} * \alpha_{\text{pitch}}$$

where $S_{\text{pitch}}$ is the frequency spectrum of the body pitch signal $\alpha_{\text{pitch}}$, $f_{\text{bodyPitch}}$ is the frequency of the body pitch measurement, $J_{\text{periodicity}}$ is the standard deviation of the periodicity measure and $f_{\text{cpg}}$ is the commanded CPG frequency.

The body pitch term minimizes the body rotation of the robot during locomotion and ensures stable gaits and energy-efficient behavior.

where $J_{\text{pitch}}$ is the mean body pitch amplitude of the robot. It is calculated as the difference between mean minimum and mean maximum pitch angle of all strides during one iteration.

The reward function is then calculated as the weighted sum of all the reward terms shown in Table S3. Should the robot fall before the entrainment period is over, or move backwards, it is assigned a high penalty (100) for failure. If the robot falls after the entrainment time, the performance until that point is evaluated. Here the distance reward $J_{\text{distance}}$ is

where $\alpha$ is the pendulum angle, $\tau$ is the motor torque, $m$ is mass, $g$ is gravitational acceleration, $l$ is the pendulum length, $I$ is the pendulum inertia, $\alpha_{\text{desired}}$ is the desired pendulum angle, $\alpha_0$ is the oscillation amplitude, $f$ is the oscillation frequency, $t$ is time, $k_p$ and $k_d$ are the PD controller gains and $P$ is motor power. Because this matching is non-trivial in our robotic walking machine due to highly nonlinear impedance behavior leading to nonlinear eigenmodes and underactuation, a mathematical formulation is not possible. Alternatively, a data-driven approach can be used to approximate the performance landscape of the frequency relationship.
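The pendulum toy example can be explored numerically with the sketch below. It assumes the dynamics $I\ddot{\alpha} = -m g l \sin\alpha + \tau$ with $I = m l^2$, a desired angle $\alpha_{\text{desired}} = \alpha_0 \sin(2\pi f t)$, a PD tracking controller, and the mean of $|\tau\,\dot{\alpha}|$ as the power measure. These equations, the function name and all numeric values are assumptions for illustration and are not taken from the supplementary material.

```python
import math


def mean_abs_power(f: float, m: float = 1.0, l: float = 0.5, g: float = 9.81,
                   alpha0: float = 0.2, kp: float = 50.0, kd: float = 5.0,
                   dt: float = 0.001, duration: float = 20.0) -> float:
    """Mean |tau * alpha_dot| while a PD controller tracks a sine at frequency f."""
    inertia = m * l * l                  # point-mass pendulum (assumption)
    w = 2.0 * math.pi * f
    alpha, alpha_dot = 0.0, alpha0 * w   # start on the desired trajectory
    steps = int(duration / dt)
    total, count = 0.0, 0
    for i in range(steps):
        t = i * dt
        a_des = alpha0 * math.sin(w * t)
        a_des_dot = alpha0 * w * math.cos(w * t)
        tau = kp * (a_des - alpha) + kd * (a_des_dot - alpha_dot)
        alpha_ddot = (tau - m * g * l * math.sin(alpha)) / inertia
        alpha_dot += alpha_ddot * dt     # semi-implicit Euler integration
        alpha += alpha_dot * dt
        if i >= steps // 2:              # skip the initial transient
            total += abs(tau * alpha_dot)
            count += 1
    return total / count


# Natural frequency of the small-angle pendulum in Hz.
f_nat = math.sqrt(9.81 / 0.5) / (2.0 * math.pi)
p_half, p_nat, p_double = (mean_abs_power(k * f_nat) for k in (0.5, 1.0, 2.0))
```

Under these assumptions, driving the pendulum at its natural frequency requires far less power than driving it at half or twice that frequency, which is the frequency matching effect that the toy example illustrates.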

S6. CPG MATRICES
Table S4 shows the coupling matrix $C$ that defines the connections between CPG nodes for $\varphi$ as in Equation S1. $\Phi$ describes the desired phase differences between CPG nodes. Table S5 describes the conversion factors used in Equation S2 and Equation S13. With these factors, the convergence of the smooth transitions can be accelerated when a CPG parameter is changed. The CPG conversion factors are chosen in a way that ensures changes in trajectory-related parameters take effect within one stride of the robot. Factors for frequency ($\alpha_f$) and phase difference ($\alpha_{\text{phaseDifference}}$) are lower, so that

Figure S3 shows the gait pattern of the most successful

Emerging gait pattern after optimization in hardware. The gait that emerged is a trot gait. Data shown here is averaged over 10 steps. In both hind legs, a small fraction of time is visible where the legs lose contact with the ground due to the pitching body.