Learning plastic matching of robot dynamics in closed-loop central pattern generators

Animals achieve agile locomotion performance with reduced control effort and energy efficiency by leveraging compliance in their muscles and tendons. However, it is not known how biological locomotion controllers learn to leverage the intelligence embodied in their leg mechanics. Here we present a framework to match control patterns and mechanics based on the concept of short-term elasticity and long-term plasticity. Inspired by animals, we design a robot, Morti, with passive elastic legs. The quadruped robot Morti is controlled by a bioinspired closed-loop central pattern generator that is designed to elastically mitigate short-term perturbations using sparse contact feedback. By minimizing the amount of corrective feedback on the long term, Morti learns to match the controller to its mechanics and learns to walk within 1 h. By leveraging the advantages of its mechanics, Morti improves its energy efficiency by 42% without explicit minimization in the cost function. Using the natural dynamics of a legged robot for locomotion is challenging and can be computationally complex. A newly designed quadruped robot called Morti uses a central pattern generator inside two feedback loops as an adaptive method so that it efficiently uses the passive elasticity of its legs and can learn to walk within 1 h.

Animals can locomote with grace and efficiency due to intelligence embodied in their leg designs 1 . Owing to compliant mechanisms in their leg designs, animals can safely traverse rough and unstructured terrain 2,3 in the presence of neural delays and limited actuator power and bandwidth 4,5 . These compliant mechanisms are important components of the natural dynamics of a system. Natural, or passive, dynamics 6 describes the system's passive dynamic behaviour governed by its mechanical characteristics, such as impedance or inertia. More specifically, it describes the dynamics of the unactuated plant transfer function 7 .
Compliant mechanisms help to mitigate the interaction forces between walking systems and the environment that are hard to model and are defined by a high degree of uncertainty 8 .
To gain a better understanding of the underlying mechanics, researchers have investigated bioinspired robots with passive compliant structures that provide the same advantages and simplify the control task 2,9-11 . By designing mechanical properties such as impedance 12-14 and spring-loaded inverted pendulum behaviour 3,15,16 , the natural dynamics can be designed to achieve viable behaviour with no or reduced control effort, improved energy efficiency and robustness 17-19 comparable to nature.
In a system with strong natural dynamics, the mechanical elements produce forces comparable to the actuators. The challenge of how a controller learns to leverage those natural dynamics then arises. How can animals and bioinspired robots learn to match the control patterns (meaning the desired muscle or motor activation patterns) they produce to their natural dynamics to leverage advantageous passive characteristics?
If the control patterns do not match the natural dynamics, the controller requires additional energy to enforce a desired behaviour (see Supplementary Section 5) as it has to overcome the forces and torques produced by the passive mechanical elements. There is a lack of model-free learning formulations for the matching of control patterns to a given robot's dynamics, especially for robots with strong engineered passive compliant elements. Previous work focused on designing specific aspects of natural dynamics to fit a given control scheme 9,20,21 . In this work we focus on quantifying the match between control patterns and natural dynamics, and how to improve and learn matching in a bioinspired quadruped robot (Fig. 1a).
The neural structure and neuromuscular pathways of animals evolved over many generations and are inherent to each individual at birth 22 . In robotics, the control approach and electrical connections are hardcoded in the design phase before deployment. The timing and intensity of muscle activity patterns in animals have to be matched to the system's natural dynamics as a lifelong learning task 23-25 , whereas in robotics the controller has to be learned or tuned for optimal performance during the testing phase or during the robot's lifetime 26-28 .
To learn matching the control patterns to the natural dynamics, we separated system perturbations by their time horizon. A one-time stochastic perturbation, like stumbling, should not trigger a long-term adaptation. However, if stumbling occurs frequently, the system should adapt to this systematic discrepancy between the desired control patterns and the system's behaviour governed by the natural dynamics.
To implement this approach we took inspiration from the concept of neuroelasticity and long-term neuroplasticity from neuroscience 29 , as well as the concept of elasticity and plasticity in mechanics 30 , which describes the reaction to environmental stimuli based on their intensity. A one-time stimulus with low intensity will be mitigated and the control pattern elastically returns to its initial state (Fig. 2a, top). Permanent or frequent stimuli will plastically adapt the control pattern to remove the discrepancy between desired control pattern and natural dynamics behaviour (Fig. 2a, bottom).
In this study we implemented a quadruped robot, Morti, with engineered natural dynamics that is controlled by a central pattern generator (CPG). CPGs are neural networks found in animals that produce rhythmic output signals from non-rhythmic inputs 31,32 for tasks such as chewing, breathing and legged locomotion 33,34 . In robotics CPGs are used as joint trajectory generators 13,35,36 or bioinspired muscle activation pattern generators 37,38 . Feedforward CPGs dictate control and coordination of motor or muscle activation without knowledge of the system's dynamics. These model-free feedforward patterns work well in combination with passively compliant leg designs that provide passive stability and robustness 13,37,38 . By closing feedback loops in CPGs, the system can actively react to unforeseen influences from its environment and mitigate perturbations 32,35,39 such as unstructured terrain.

Felix Ruppert ✉ and Alexander Badri-Spröwitz
In our quadruped robot Morti, we implemented feedback loops based on continuous sensor data and feedforward reflexes triggered by discrete perturbation events. We then observed Morti's behaviour as measured using sparse feedback from contact sensors on its feet. This short-term feedback acts as an elastic mechanism for mitigating short-term perturbations.
To quantify how well the control pattern matched Morti's natural dynamics, we used the elastic feedback activity as a proxy. If the dynamics did not match, the feedback mechanisms constantly had to intervene to correct for the discrepancy between the commanded and measured behaviour of Morti. The matching of the control patterns needed to be increased plastically.
To improve the matching plastically, we optimized the CPG parameters that generate the control patterns by minimizing the amount of elastic feedback activity (Fig. 2b,c).
In previous work, Owaki and Ishiguro 54 presented a control approach that showed spontaneous gait transitions, with the CPG coupled purely mechanically ('physical communication'). Buchli et al. 45 presented an adaptive oscillator that adapted its frequency to the natural frequency of a spring-loaded inverted pendulum-like simulation model. In their adaptive frequency oscillator approach, the matching of control frequency and natural frequency led to performance improvements and a reduction in energy requirements. Fukuoka et al. 43 implemented short-term reflexes that adapted the robot's controller to the motion of the robot induced by external perturbations. Through a closed-loop CPG that incorporated the 'rolling body motion', the robot could actively adapt to its surroundings. Thandiackal et al. 38 showed that feedback from hydrodynamic pressure in CPGs can lead to self-organized undulatory swimming. However, there are at present no approaches for long-term plastic matching of control patterns and the natural dynamics of complex walking systems in passive elastic robots.
As learning and exploration in hardware are prone to critical failure, the control patterns were first optimized in simulation (Fig. 3a), as is common practice in robotics 47,50,51 . After successful optimization, we transferred the optimal CPG parameter set to hardware (Fig. 3b) to measure the performance of the real robot and to validate the effectiveness of our approach by evaluating a performance measure.
Although optimization and learning in simulation are efficient and cheap, the transfer of control policies can be difficult due to the sim2real gap 50,51,55 . We examined the transferability of our approach by comparing simulation and hardware experiments to quantify the sim2real gap.
Here we implemented elastic CPG feedback pathways triggered by foot contact. We utilized this elastic feedback activity to mitigate short-term perturbations. Over the long term, we used the feedback activity as a proxy for the mismatching between Morti's natural dynamics and the control pattern. We plastically minimized the required elastic feedback activity through model-free Bayesian optimization. Our approach enabled Morti to learn a trot gait at 0.3 m s⁻¹ within 1 h. Matching improved energy efficiency without explicit formulation in the cost function. The improved energy efficiency is evidence of increased matching.

Results
We first examined the performance of the feedback mechanisms in simulation (Fig. 4). The feedback mechanism for late touchdown (r LTD ), shown in red, decelerated the phase of the front left leg to wait until ground contact was established. The deceleration was visible in the flatter gradient of the oscillator phase when the mechanism was active.
The early touchdown mechanism (r ETD ) triggered a knee pull-up reflex (purple line) to shorten the leg to prevent further impact. In the event of early toeoff (r ETO ), shown in yellow, the knee flexion started earlier than instructed by the feedforward CPG. The late toeoff mechanism (r LTO ) measured the mismatching of control task and natural dynamics but did not trigger a feedback mechanism. The feedback mechanisms helped Morti to mitigate perturbations stemming from dynamics mismatching. This mitigation effect was especially important in the first rollouts of the optimization, where good dynamics matching was not yet achieved. In this rollout, the late touchdown mechanism was active for 7% of the step cycle, the early touchdown mechanism was active for 5% of the step cycle, the late toeoff mechanism was active for 8% of the step cycle and the early toeoff mechanism was active for 9% of the step cycle.
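The per-mechanism percentages above can be read as the fraction of samples in which a mechanism was active. The following is a minimal sketch of that bookkeeping, assuming each mechanism is logged as a boolean trace at a fixed sample rate; the function name `activity_fraction` and the synthetic trace are illustrative and not taken from the authors' implementation.

```python
import numpy as np


def activity_fraction(active) -> float:
    """Fraction of samples in which a feedback mechanism was active."""
    active = np.asarray(active, dtype=bool)
    return float(active.mean()) if active.size else 0.0


# Synthetic example (made-up data): one step cycle sampled 1000 times,
# with a hypothetical late-touchdown mechanism active for 70 samples.
late_touchdown = np.zeros(1000, dtype=bool)
late_touchdown[100:170] = True

print(f"late touchdown active: {activity_fraction(late_touchdown):.0%}")
```

Averaging such fractions over a whole rollout, rather than a single step, is one plausible way to obtain a long-term measure that single stumbles barely affect.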
We found that 150 rollouts in simulation (Fig. 5a) were sufficient to learn a gait at a speed of 0.3 m s⁻¹ . Each rollout in simulation took an average of 23 s for 20 s of simulation runtime on an Intel i7 CPU, making the whole optimization duration roughly 1 h. The hardware rollouts were roughly 1 min long to ensure stable locomotion. When Morti reached stable behaviour, 10 s were evaluated, as in the simulation rollouts.
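As a rough check of the stated optimization duration, using only the rollout count and the average wall-clock time per simulated rollout given above:

```latex
150 \times 23\,\mathrm{s} = 3450\,\mathrm{s} \approx 57.5\,\mathrm{min} \approx 1\,\mathrm{h}
```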
After initialization of the Gaussian kernel with 15 rollouts with random CPG parameters, the optimizer started to approximate the cost function and performance converged toward the optimum point.
During the whole optimization, Morti fell 16 times (11% of rollouts). Nine of the failed rollouts occurred during the first 15 rollouts with random CPG parameters.
Fig. 2: a, Elastic feedback (green) mitigates stochastic short-term perturbations (red), such as pot holes, that disturb the system (spring) from its desired state (dashed line). Elastic activity is reversible and only active when a perturbation is present, just as a spring only deflects as long as an external force is active and then returns to its initial state. Plasticity (yellow) changes the system behaviour permanently to adapt to long-term active stimuli from the environment. If the same perturbation is frequently present, the system adapts to the perturbation. In our example the spring adapts its set point (spring length, dashed line) and stiffness (spring thickness). In this way, an initial desired system state that might be encoded in the initial control design can be adapted to better deal with perturbations throughout its life span, as well as changing environments. After plastic adaptation the spring deflects less (green, bottom right). b, Control structure of Morti. k p , k d and k i are the joint controller gains; G(s) is the plant transfer function in Laplace space s. c, Flowchart of the matching approach. The elastic feedback activity mitigates short-term perturbations through sparse contact feedback from the FootTile contact sensors. We measured the amount of elastic feedback activity as a proxy for the mismatching of dynamics. Over a longer time window, the optimizer minimizes the elastic feedback activity to plastically match the control pattern of the CPG to Morti's natural dynamics. F, front; L, left; H, hind; R, right. Colour representations are similar to those in Fig. 5b. d, Diagram of a step cycle in phase space (ϕ). The segments are colour-coded by feedback mechanism: late touchdown (red) later than the desired touchdown time (δ overSwing ), late toeoff (yellow) later than the desired toeoff time (δ ϕ,knee ), early toeoff (green) and early touchdown (purple). The stance phase from touchdown to toeoff is shaded blue.
Through optimization, the simulated robot increased its performance from the least-performing rollout (rollout 107, cost 5.62) to the optimal rollout (rollout 109, cost 2.59) by 215%. In comparison, the simulation results transferred to hardware scored a cost between 4.41 and 5.65. The mean simulation cost was 3.49 ± 0.66 and the median simulation cost was 3.34. The mean hardware cost was 4.96 ± 0.38. The best simulation result was 41% lower than the lowest hardware result.
To validate the performance, as well as the differences between simulation and hardware rollouts in detail, we investigated the individual cost factors (Supplementary Table 3) for both simulation and hardware rollouts (Fig. 5a). We found that no single cost factor was responsible for the higher returned cost. Instead, all cost factors were slightly higher and their summation led to the higher cost returned for the hardware results. The distance cost term (J distance ) and the feedback cost term (J feedback ) contributed the highest difference between the simulation and hardware cost values: J distance had a mean hardware cost of 2.13 ± 0.36 compared with a simulation cost of 1.67, and J feedback had a mean hardware cost of 0.43 ± 0.06 compared with a simulation cost of 0.13. We assumed that the difference was due to modelling assumptions that were made in the simulation. The hardware robot showed a lower speed due to contact losses, gearbox backlash, friction and elasticity in the FootTile sensors. During touchdown, imperfect contact of the feet led to higher feedback activity, which imposed a penalty via the feedback cost term. The body pitch cost term (J pitch ) was in the range of the simulated cost; the mean hardware cost was 0.85 ± 0.22 compared with 0.80 in simulation. In simulation, Morti showed more body pitch, both during the optimization shown here and in initial tests with untuned CPG parameters, and it flipped over during several rollouts. This did not happen in hardware: even in early experiments, the hardware robot never pitched more than 30°. The periodicity cost term (J periodicity ) (hardware: 0.15 ± 0.33; simulation: 0.0) and the contact cost term (J contact ) (hardware: 0.12 ± 0.09; simulation: 0.03) behaved similarly in simulation and hardware rollouts. This similarity was expected as both simulation and hardware gaits converged to the desired gait, and the latter three cost terms were introduced to guide the optimizer to find gaits similar to the desired CPG patterns, mostly during the first rollouts.
At the core of our approach, we hypothesized that matching dynamics improves energy efficiency. We therefore explicitly did not incorporate energy efficiency into the cost function. To quantify how matching dynamics improved energy efficiency we calculated a normalized torque as a measure of performance. We chose a normalized mean torque as we showed in previous work 12 that the torque signal has no major oscillations (Supplementary Fig. 5). It is therefore a sufficient representation of the system's energy requirement and is simple to measure both in simulation and in the hardware robot.
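One form of the normalized torque consistent with the symbol definitions in the text would be the following. The plain sum over the four legs, the overbars denoting per-rollout means and the absence of any further scaling are assumptions here and are not confirmed by this text:

```latex
\tau_{\mathrm{normal}} = \frac{1}{\bar{v}_{\mathrm{body}}} \sum_{n=1}^{4} \left( \bar{\tau}_{\mathrm{knee},n} + \bar{\tau}_{\mathrm{hip},n} \right)
```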
where n is the leg index, τ knee and τ hip are the mean knee and hip torque per rollout per leg and v body is the mean body velocity of the respective rollout. The initial normalized torque was 2.52, and the final value 1.02. The mean normalized torque was 1.7 ± 0.5 and the median normalized torque was 1.55 (Fig. 6). As expected, the normalized torque reduced over the optimization by 42% from plastically unmatched initial conditions (compare with Supplementary Section 5). The reduction in normalized torque as an efficiency measure confirmed our hypothesis, that matching the control pattern to the system's natural dynamics has beneficial effects on energy requirements.

Discussion
We suggested that enabling a locomotion controller to leverage the passively compliant leg structures could increase the energy efficiency indirectly. By minimizing the required elastic feedback activity, the controller learns to increase the matching between its control pattern and the natural dynamics. We showed that 150 optimization rollouts sufficed to learn a stable trot gait on flat ground at a speed of 0.3 m s⁻¹ from random initial conditions with an optimization duration of 1 h. In our experiment, we showed that matching dynamics is indeed beneficial for energy-efficient locomotion. We calculated a normalized performance measure that showed a decrease in power requirements.
In the normalized torque measure (τ normal ), Morti benefited from the increase in distance cost and a reduction in the required torque. Even though Morti increased its speed more than two-fold, the required normalized torque did not increase. Instead, the normalized torque decreased with a trend comparable to the cost function. The improved control pattern matching enabled the controller to leverage the natural dynamics to achieve better performance (Fig. 6).
The designed passive behaviour of Morti enabled a simple matched CPG control structure to leverage the natural dynamics of the leg design. Through sparse binary feedback from touch sensors, the controller was able to elastically mitigate the perturbations stemming from initial mismatching. Through synergy of the natural dynamics and matched CPG, Morti learned to walk on inexpensive hardware (<€4,000) with low computational power (5 W Raspberry Pi) and with lower control (500 Hz control loop) and sensor (250 Hz binary sensor signal) frequencies than state-of-the-art model-based locomotion controllers that require high-bandwidth computation and high control frequencies (>2 kHz control frequency, >17 W processor power) 50,56 .
Closely examining the cost (Fig. 5b) showed that through dynamics matching and the minimization of J feedback , Morti travelled longer distances in the given time, as shown by the improved J distance . There was little change in J pitch over the optimization, which was expected because the CPG cannot actively control body pitch. The J periodicity and J contact terms were used as penalty terms for undesired gait characteristics. They are an order of magnitude lower than the remaining cost terms, and only peaked for less performant rollouts.
Fig. 4 (caption fragment): ... Table 2). Parameters here are D = 0.35, δ ϕ,knee = 0.3, δ overSwing = 0.2, f = 1. Right: simulation results showing the four feedback mechanisms (same colour coding as Supplementary Fig. 2). Data are shown for the front left and front right legs. Late touchdown (red) on the front left leg phase shows the phase delay to wait for touchdown. Early touchdown (purple) on the right leg shows the knee pull-up reflex. Late toeoff (yellow) is shown on the left leg. Early toeoff (green) is shown on the right leg. The stance phase is shaded grey. Θ hipAmplitude , hip amplitude; Θ kneeAmplitude , knee amplitude; Θ hipOffset , hip offset; f, frequency; δ knee , knee phase shift; δ overSwing , knee overswing; D, duty factor as described in Supplementary Section 2.
Although the gait learned in this work was simpler than state-of-the-art full-body control approaches, we provide evidence that minimizing feedback activity stemming from systematic mismatching between the control pattern and the system's natural dynamics provides an alternative learning approach.
Compared with end-to-end learning approaches 50,51 , our method requires fewer rollouts. As the underlying control structure (the CPG network) was predefined, it required no approximation in the learning approach first. On the other hand, the versatility of CPG-based locomotion hinges on the complexity of the chosen CPG model. In this proof of concept, we chose a simplified CPG to limit the complexity of the underlying model and thus the technical implementation of our approach. However, we believe that our approach of minimizing elastic feedback activity to increase the matching between control pattern and natural dynamics is not limited by the choice of pattern generator, and could be transferred to other locomotion controllers that possess a metric for the amount of required elastic feedback.
The discrete feedback events (Fig. 2d) allowed the amount of feedback activity to be measured easily in our example. In systems with continuous feedback such as whole-body control 16,56,57 , we believe control effort 58 could be an alternative measure. Our approach is also not limited to Bayesian optimization approaches, and we believe it could be used as part of the cost/reward function for different optimization or learning approaches. In this way, our model-free matching approach could scale to different compliant robots. More generally, our approach could be adapted for more versatile control approaches such as model-based full-body control 56,57 or CPGs with more versatile behaviours 43 .
Although other studies reported problems with transferring simulation results to hardware (the sim2real gap) 59-61 , we successfully transferred our simulation results to the Morti hardware without post-transfer modifications. The hardware performance was comparable both quantitatively (Fig. 5b) and in qualitative observation of the resultant gaits (Supplementary Videos 1 and 2). We believe that because the joint torques of Morti were not calculated from potentially inaccurate model parameters, the sim2real gap is not as evident here. Learned controllers that directly influence joint torques and leg forces might suffer more from the sim2real gap because smaller inaccuracies between model and hardware behaviour can have a direct effect on the forces exerted on Morti. However, more research will be required to understand the transferability of results for underactuated robots with strong natural dynamics.
Fig. 5 (caption fragment): ... Table 3). The individual cost terms show similar results between the simulation results (lines) and the hardware samples (dots). The mean hardware cost values are similar to the optimal costs from the simulation.
In future work, we intend to extend the CPG, taking body pitch into account when generating the hip trajectories as done by ref. 39 . With an inertial measurement unit the body pitch could be fed back into the CPG. In the current formulation, the CPG assumes no body pitch and relies on the robustness the passive elasticity adds to the system to compensate the existing body pitch. Abduction/adduction degrees of freedom with their respective feedback loops 39,62 could also be added to Morti to enable 3D locomotion without a guiding mechanism. The optimization loop could be implemented to run online on the hardware robot's computer. With online optimization and 3D locomotion, it would become possible to investigate the lifelong adaptation of the CPG control patterns to changing ground conditions and surface properties over extended time windows, as well as adaptations to wear and tear throughout Morti's lifetime.
In this Article, we examined how a walking system with limited control and sensor bandwidth could learn to leverage the intelligence embodied in its leg mechanics. Energy efficiency and speed are often used as criteria to evaluate performance in robotic systems. Here we proposed an additional measure that focuses on the synergy of passive mechanical structures and neural control. By separating feedback by its time horizon, we achieved perturbation mitigation in the short term and at the same time quantified the mismatching of control patterns and natural dynamics. We optimized the long-term performance of the system and adapted the controller to its mechanical system. Although investigated in a robotic surrogate, our findings could provide a new perspective on how learning in biological systems might happen in the presence of neural limitations and sparse feedback. Matching is probably not the sole driving factor in animal learning, but our study suggests that a quantitative measure for 'long-term learning from failure' could in part be influenced by the goal of maximizing the synergy between locomotion control and the robot's or animal's mechanical walking system. In contrast to task-specific cost functions such as speed or energy efficiency, our matching approach provides an intrinsic motivation to leverage the embodied intelligence in the natural dynamics as much as possible.

Methods
For both the experiments and the simulation, we designed and implemented the quadruped robot Morti. Morti has a monoarticular knee spring and a biarticular spring between hip and foot that provide series elastic behaviour 12 . It was controlled by a closed-loop CPG. Through reflex-like feedback mechanisms, Morti could elastically mitigate short-term perturbations. To minimize the elastic activity, we implemented a Bayesian optimizer that plastically matched the control pattern to Morti's natural dynamics.

Robot mechanics.
Morti consists of four 'biarticular legs' (Fig. 1b; ref. 12 ) mounted on a carbon fibre body. Each leg has three segments: femur, shank and foot. The femur and foot segments are connected by a spring-loaded parallel mechanism that mimics the biarticular muscle-tendon structure formed by the gastrocnemius muscle-tendon group in quadruped animals 63 . A knee spring inspired by the patellar tendon in animals provided passive elasticity of the knee joint.
Morti walked on a treadmill and was constrained to the sagittal plane by a linear rail that allowed body pitch (Fig. 1b). It was instrumented with joint angle sensors, position sensors and a treadmill speed sensor. To measure ground contact, four FootTile sensors 64 were mounted on Morti's feet. By applying a threshold to these analogue pressure sensors, we could determine whether a foot had established ground contact. Detailed descriptions of the experimental set-up can be found in Supplementary Section 1.
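The thresholding step can be illustrated with a short sketch. The threshold value, the normalized readings and the function name `detect_contact` are all hypothetical; the actual sensor scaling and threshold are not given in this text.

```python
import numpy as np


def detect_contact(pressure, threshold):
    """Return a boolean array that is True where the reading exceeds the threshold."""
    return np.asarray(pressure, dtype=float) > threshold


# Made-up normalized pressure readings for one foot over a short window.
readings = [0.02, 0.05, 0.40, 0.75, 0.30, 0.04]
contact = detect_contact(readings, threshold=0.2)  # hypothetical threshold
print(contact.tolist())
```

The result is the sparse binary contact signal that the feedback mechanisms consume; a real implementation might add hysteresis or debouncing, which is omitted here.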

Simulation.
We implemented the simulation in PyBullet 59 , a multibody simulator based on the Bullet physics engine (Fig. 3a). The robot mechanics were derived from the mechanical robot and its computer-aided design model (Supplementary Table 1). To increase the matching between the robot hardware and simulation, we imposed motor limits and set the motor controller to resemble the real actuator limits 55 . The simulation ran at 1 kHz, the CPG control loop ran at 500 Hz and ground contacts were polled at 250 Hz to resemble the hardware implementation. The control frequency was chosen for technical reasons to guarantee stable position control and fast data acquisition. It could be lower, as shown in similar systems 5,13 , to more closely resemble the neural delays and low technical complexity in animals.
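The three rates above nest evenly, so one way to schedule them is to derive the slower loops from the physics step counter. The sketch below only counts updates and does not call a simulator; the function name and loop structure are assumptions for illustration, not the authors' code.

```python
SIM_HZ, CONTROL_HZ, CONTACT_HZ = 1000, 500, 250


def count_updates(duration_s: float) -> dict:
    """Count how often each loop would run when driven by the physics step counter."""
    control_every = SIM_HZ // CONTROL_HZ  # CPG update every 2nd physics step
    contact_every = SIM_HZ // CONTACT_HZ  # contact poll every 4th physics step
    counts = {"sim": 0, "control": 0, "contact": 0}
    for step in range(round(duration_s * SIM_HZ)):
        if step % contact_every == 0:
            counts["contact"] += 1  # read the thresholded foot contact signals here
        if step % control_every == 0:
            counts["control"] += 1  # advance the CPG and send joint targets here
        counts["sim"] += 1  # advance the physics by one step here
    return counts


print(count_updates(1.0))
```

Deriving every rate from a single counter keeps the loops phase-locked, which avoids drift between contact polling and control updates.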

CPG.
The CPG used in this work was a modified Hopf oscillator 13,35,62 that was modelled in phase space. More biologically accurate and biomimetic CPG models do exist 37,43,54 ; we chose this representation because of its reduced parameter space while retaining the functionality required to generate joint trajectories for locomotion. Similar to their biological counterparts, CPGs can be entrained through feedback from external sensory input 38 or from internal coupling to neighbouring nodes 54 . The CPG in this work consisted of four coupled nodes, representing the four legs (see Supplementary Fig. 1). The hip and knee of each leg were coupled through a variable phase shift. Depending on the desired phase shifts in between oscillator nodes (legs), a variety of gaits can be implemented by adapting the phase difference matrix while keeping the network dynamics identical (Supplementary Section 6). The joint trajectories generated by the CPG are described by eight parameters (Supplementary Table 6): the hip offset (Θ hipOffset ) and hip amplitude (Θ hipAmplitude ) describe the hip trajectory, the knee offset amplitude (Θ kneeOffset ) describes the knee flexion, the frequency f determines the robot's overall speed, duty factors (D) describe the ratio of stance phase to flight phase, the knee phase shift (δ ϕ,knee ) describes the phase shift between hip protraction and knee flexion and overswing (δ overSwing ) describes the amount of swing leg retraction 65 . The mathematical description of the CPG dynamics can be found in Supplementary Section 2.
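To illustrate how a phase-difference matrix can select a gait while the network dynamics stay the same, the sketch below integrates four generic sine-coupled phase oscillators towards a trot pattern. This is a textbook coupling scheme used as a stand-in, not the modified Hopf oscillator of Supplementary Section 2; the coupling gain, time step, initial phases and all names are assumptions.

```python
import numpy as np

LEGS = ["FL", "FR", "HL", "HR"]
# Trot: diagonal leg pairs move in phase, the two pairs half a cycle apart.
TROT_OFFSETS = np.array([0.0, np.pi, np.pi, 0.0])


def simulate_cpg(phases0, offsets=TROT_OFFSETS, freq_hz=1.0,
                 coupling=5.0, dt=0.002, duration_s=10.0):
    """Euler-integrate four coupled phase oscillators towards the given offsets."""
    phases = np.array(phases0, dtype=float)
    # desired[i, j] is the target value of phases[j] - phases[i].
    desired = offsets[None, :] - offsets[:, None]
    for _ in range(int(round(duration_s / dt))):
        diff = phases[None, :] - phases[:, None]  # diff[i, j] = phases[j] - phases[i]
        pull = np.sin(diff - desired).sum(axis=1)  # zero once the offsets are met
        phases += dt * (2.0 * np.pi * freq_hz + coupling * pull)
    return phases


final = simulate_cpg([0.0, 0.5, 1.0, 2.0])
# +1 means in phase with the front left leg, -1 means half a cycle apart.
rel = np.cos(final - final[0])
print(dict(zip(LEGS, np.round(rel, 3))))
```

Swapping `TROT_OFFSETS` for a different offset vector changes the gait without touching the integration loop, which mirrors the role of the phase difference matrix described above.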

Elasticity.
As the CPG implemented here was written as a model-free feedforward network, it could be difficult to find parameters that lead to viable gaits with given robot dynamics. Essentially, the CPG commands desired trajectories without knowledge of the robot's natural dynamics. In the worst-case scenario the CPG would command behaviour that the robot cannot fulfil because of its own natural dynamics and mechanical limitations such as inertia, motor speed or torque limitations. To address this shortcoming, feedback can be used to mitigate the differences between desired and measured behaviour.
The feedback implemented here was an adaptation from Righetti et al. 35 that has been shown to aid in entrainment and can mitigate perturbations in foot contact information. This contact information can be integrated into the CPG to measure timing differences between the desired and measured trajectories.
The trajectories created by the CPG can be influenced through feedback by changing the CPG dynamics (that is, accelerating or decelerating the CPG's phases). Alternatively, feedback can influence the generated joint angle trajectories directly. During a step cycle (Fig. 2d), contact signals were used for several feedback mechanisms (Supplementary Fig. 2). The feedback mechanisms reacted to timing discrepancies in the touchdown and toeoff events and corrected the CPG trajectories if Morti established or lost ground contact earlier or later than instructed by the CPG. Righetti et al. 35 showed how these feedback mechanisms can actively stabilize a CPG-controlled robot. We adapted these mechanisms to a phase-space CPG formulation and robot hardware to achieve similar performance in a different class of robot.
Using the feedback mechanisms implemented in the CPG, we corrected the timing of discrete events. If touchdown and toeoff happened earlier or later than commanded by the feedforward control pattern, the individual phases of each leg (CPG node) could be accelerated or decelerated to correct Morti's behaviour. We implemented a phase deceleration when touchdown was delayed (Fig. 2d, red). We accelerated knee flexion when a foot lost ground contact too early (Fig. 2d, orange), in addition to a phase deceleration when toeoff occurred later than commanded (Fig. 2d, green). We combined these feedback mechanisms with a knee pull-up reflex, adapted from ref. 66 to mimic a patellar reflex, that was triggered when a leg hit the ground too early. If a leg hit the ground during the swing phase (Fig. 2d, purple), the knee flexed in a predefined trajectory to generate more ground clearance. In addition to the mechanism in ref. 66 , we disabled the hip motor so that it did not interfere with the passive impact mitigation of the mechanical leg springs. An in-depth description of the feedback mechanisms can be found in Supplementary Section 3 and the CPG output with active feedback can be seen in Fig. 4.
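The four timing discrepancies described above can be summarized as a small decision rule. The Python sketch below is only an illustration: the function name, the `progress` variable and the rule that splits each commanded phase at its midpoint to tell a late event from an early one are assumptions, not the implementation from Supplementary Section 3.

```python
# Responses paraphrased from the text; the dictionary keys are hypothetical labels.
RESPONSES = {
    "none": "follow the feedforward pattern",
    "late_touchdown": "decelerate the leg phase",
    "early_toeoff": "accelerate knee flexion",
    "late_toeoff": "decelerate the leg phase",
    "early_touchdown": "knee pull-up reflex, hip motor disabled",
}


def classify_contact_event(commanded_stance, in_contact, progress):
    """Name the timing discrepancy, if any, for one leg at one instant.

    commanded_stance: True if the CPG currently commands stance for this leg.
    in_contact: measured foot contact signal.
    progress: fraction (0..1) of the current commanded stance or swing phase.
    """
    if commanded_stance == in_contact:
        return "none"  # measurement agrees with the commanded pattern
    first_half = progress < 0.5  # assumed split between late and early events
    if commanded_stance:
        # The foot should be on the ground but is in the air.
        return "late_touchdown" if first_half else "early_toeoff"
    # The foot should be in the air but is on the ground.
    return "late_toeoff" if first_half else "early_touchdown"
```

For example, `RESPONSES[classify_contact_event(False, True, 0.8)]` selects the knee pull-up reflex for a foot that touches the ground late in its commanded swing phase.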

Plasticity.
To match the CPG to the robot dynamics, we tuned the CPG parameters p m to achieve optimal performance. To do so, we evaluated the performance of Morti over a number of steps. The optimization was designed to act on a much longer timescale than the elastic feedback mechanisms (update rates of ≤0.1 Hz versus ≥100 Hz). Consequently, the influence of individual elastic feedback events on the optimization was minimized: small perturbations within one step were not captured by the plastic optimization, which only improved long-term performance.
To achieve long-term (close to) optimal behaviour we used Bayesian optimization for its global optimization capabilities, data efficiency and robustness to noise 52,53 .
Bayesian optimization. Bayesian optimization is a black-box optimization approach that uses Gaussian kernels for function approximation. It is model-free and derivative-free, and has been used successfully in many robotic optimization approaches 67–69 . Bayesian optimization is favoured over other data-driven optimization and learning approaches because of its data efficiency for ten or fewer parameters.
We implemented a Bayesian optimizer based on skopt gp_minimize (ref. 70 ). In each rollout, Morti first walked for 10 s in the PyBullet simulation to entrain the CPG from its initial condition (standing still; see Fig. 4, top) and reach steady-state behaviour. The optimizer then evaluated the following 10 s (approximately ten step cycles) with a cost function. One complete rollout therefore took 20 s. We sampled 15 rollouts with random CPG parameters before approximating the cost function. We then optimized for 135 rollouts with the gp_hedge acquisition function, which chooses probabilistically among the lower confidence bound, negative expected improvement and negative probability of improvement.
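The rollout budget implied by these numbers can be checked with a few lines of arithmetic. The Python sketch below uses only the durations and rollout counts stated above; the constant and function names are hypothetical. In skopt terms, such a schedule would plausibly correspond to `n_initial_points=15`, `n_calls=150` and `acq_func="gp_hedge"`, although the exact call is an assumption and is not given in the text.

```python
ENTRAIN_S = 10.0       # walking time before the evaluation starts
EVALUATE_S = 10.0      # evaluated walking time per rollout
RANDOM_ROLLOUTS = 15   # rollouts with random CPG parameters
GUIDED_ROLLOUTS = 135  # rollouts proposed by the acquisition function


def rollout_schedule():
    """Label each rollout by how its CPG parameters are chosen."""
    return ["random"] * RANDOM_ROLLOUTS + ["gp_hedge"] * GUIDED_ROLLOUTS


def total_walking_minutes():
    """Total simulated walking time over all rollouts, in minutes."""
    n_rollouts = RANDOM_ROLLOUTS + GUIDED_ROLLOUTS
    return n_rollouts * (ENTRAIN_S + EVALUATE_S) / 60.0


print(len(rollout_schedule()), total_walking_minutes())  # 150 50.0
```

The resulting 150 rollouts amount to 50 min of walking, which appears consistent with the statement that Morti learns to walk within 1 h.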
To reduce complexity we limited the parameter space to six parameters: Θ hipOffset , Θ hipAmplitude , D front , D hind , δ ϕ,knee and δ overSwing (Fig. 4). More parameters would probably improve performance further, but would also lead to more corner cases in which the optimizer could exploit the selected cost function, resulting in undesired gait characteristics (such as skipping gaits) or gaits where the feet drag on the ground. For this proof of concept, we chose independent duty factors D front and D hind to allow some front-hind asymmetry that could help the optimizer find gaits that reduce body pitch. Where only one CPG parameter was selected, the parameter was used for all legs. For simplicity, and to reduce experimental cost in terms of hardware wear from violent motions at high speed, we fixed the frequency to f = 1 Hz. The knee amplitude Θ kneeAmplitude was set to 30° to ensure adequate ground clearance.
Cost function. We evaluated Morti using a cost function comprising three major components. The first component, J feedback , measured the percentage of a step cycle during which any of the elastic feedback mechanisms was active. It was computed from the summed feedback activity r ETO + r ETD + r LTO + r LTD (equation (2)), where J feedback is the average percentage of active feedback per step and leg, T is the evaluation time and r ETO , r ETD , r LTO and r LTD are the time vectors when the specific feedback was active for each leg. The second component, J distance (1/m), evaluated the distance travelled to encourage forward locomotion.
Here, x body is the centre-of-mass position in the walking direction. The third component penalized deviations from the commanded gait characteristics. It ensured that Morti moved with low body pitch, quantified by J pitch : J pitch = ∥max(α pitch ) − min(α pitch )∥ (4), which was calculated as the difference between the mean maximum and the mean minimum of the body pitch angle (α pitch ) over all strides during one rollout. It also imposed a penalty if more than one ground contact phase per foot and step (J contact [% of step cycle]) occurred, as would take place during stumbling or dragging of the feet.
Here, J contact is the mean number of flight-stance changes per step, t is time, T is the evaluation time and contact is the contact sensor data matrix for all four legs. The third component also penalized differences between the desired gait frequency and the measured gait frequency (J periodicity (Hz)) to prevent non-periodic gaits or multi-step gaits.
Here, S pitch is the frequency spectrum of α pitch , f bodyPitch is the frequency of the body pitch measurement, J periodicity is the standard deviation of the periodicity measure and f cpg is the commanded CPG frequency. Detailed descriptions can be found in Supplementary Section 4.
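To make three of the cost terms more concrete, the Python sketch below computes simplified versions of the feedback, pitch and contact measures from sampled data. The exact definitions and the relative weighting of the terms are given in Supplementary Section 4 and are not reproduced here; the function names, the input layouts and the normalization choices in this sketch are assumptions.

```python
def feedback_cost(active_flags):
    """Percentage of samples with active feedback, averaged over legs.

    active_flags[leg][k] is True if any elastic feedback mechanism
    was active for that leg at sample k.
    """
    per_leg = [100.0 * sum(flags) / len(flags) for flags in active_flags]
    return sum(per_leg) / len(per_leg)


def pitch_cost(pitch_per_stride):
    """Mean stride maximum minus mean stride minimum of the body pitch."""
    maxima = [max(stride) for stride in pitch_per_stride]
    minima = [min(stride) for stride in pitch_per_stride]
    return abs(sum(maxima) / len(maxima) - sum(minima) / len(minima))


def contact_cost(contact, n_steps):
    """Mean number of flight-to-stance changes per step and leg.

    contact[leg][k] is the binary contact sensor reading at sample k.
    A clean gait yields 1.0; stumbling or dragging yields more.
    """
    touchdowns = 0
    for leg_signal in contact:
        for previous, current in zip(leg_signal, leg_signal[1:]):
            if current and not previous:
                touchdowns += 1
    return touchdowns / (n_steps * len(contact))
```

For instance, a leg whose contact signal reads `[0, 1, 0, 1, 0, 1, 1, 0]` over two steps produces three touchdowns, so `contact_cost` returns 1.5 for that leg and flags the extra ground contact.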

Hardware rollouts.
To validate the optimal set of CPG parameters from simulation, we tested the same parameters on the hardware robot. The hardware controller had the same elastic mechanisms described in the 'Elasticity' section. We tested ten parameter sets and randomly varied the CPG parameters obtained from simulation by ≤10% to validate the hardware cost function around the optimal point found in simulation. We then evaluated Morti's performance with the same cost function used for the simulation. As in the simulation, Morti ran for 10 s to entrain itself. The performance was measured for 10 s after Morti converged on a stable gait. Videos of Morti walking are available as Supplementary Videos 1 and 2.
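One way to generate such perturbed parameter sets is sketched below in Python. The uniform sampling, the independent perturbation of each parameter, the function names and the numerical values in the usage note are assumptions made for illustration; the text only states that the parameters were varied randomly by ≤10%.

```python
import random


def perturb_parameters(optimum, max_rel=0.10, rng=None):
    """Scale every parameter by a random factor within +/- max_rel."""
    rng = rng or random.Random()
    return {
        name: value * (1.0 + rng.uniform(-max_rel, max_rel))
        for name, value in optimum.items()
    }


def hardware_candidates(optimum, n_sets=10, seed=0):
    """Draw n_sets perturbed parameter sets around the simulation optimum."""
    rng = random.Random(seed)  # fixed seed so the sets are reproducible
    return [perturb_parameters(optimum, rng=rng) for _ in range(n_sets)]
```

With a placeholder optimum such as `{"hip_offset": 0.2, "duty_front": 0.6}` (values invented for this example), `hardware_candidates` returns ten parameter sets that each stay within 10% of the corresponding simulation value.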

Data availability
The experimental data are available at https://doi.org/10.17617/3.XDOQNW (ref. 71 ). The robot model and CAD design are available for non-commercial use at the same link.