Introduction

A combination of two types of neural circuitry appears responsible for the basic locomotory motor pattern. One type is the central pattern generator (CPG; Fig. 1A), which generates pre-programmed, rhythmically timed, motor commands1,2,3. The other is the reflex circuit, which produces motor patterns triggered by sensory feedback (Fig. 1C). Although they normally work together, each is also capable of independent action. The intrinsic CPG rhythm patterns can be sustained with no sensory feedback and only a tonic, descending input, as demonstrated by observations of fictive locomotion4,5. Reflex loops alone also appear capable of controlling locomotion1, particularly with a hierarchy of loops integrating multiple sensory modalities for complex behaviors such as stepping and standing control6,7. We refer to the independent extremes as pure feedforward control and pure feedback control. Of course, within the intact animal, both types of circuitry work together for normal locomotion (Fig. 1B)8. However, this cooperation also presents a dilemma, of how authority should optimally be shared between the two9.

Figure 1
figure 1

Three ways to control bipedal walking. (A) The central pattern generator (CPG) comprises neural oscillators that can produce rhythmic motor commands, even in the absence of sensory feedback. Rhythm can be produced by mutually inhibiting neural half-center oscillators (shaded circles). (B) In normal animal locomotion, the CPG is thought to combine an intrinsic rhythm with sensory feedback, so that the periphery can influence the motor rhythm. (C) In principle, sensory feedback can also control and stabilize locomotion through reflexes, without need for neural oscillators. The extreme of (A) CPG control without feedback is referred to here as pure feedforward control, and the opposite extreme (C) with no oscillators as pure feedback control. Any of these schemes could potentially produce the same nominal locomotion pattern, but some (B) combination of feedforward and feedback appears advantageous.

The combination of central pattern generators with sensory feedback has been explored in computational models. For example, some models have added feedback10,11,12,13 to biologically-inspired neural oscillators e.g.,14, which employ networks of mutually inhibiting neurons to intrinsically produce alternating bursts of activity. Sensory input to the neurons can change network behavior based on system state, such as foot contact and limb or body orientation, to help respond to disturbances. The gain or weight of sensory input determines whether it slowly entrains the CPG15, or whether it resets the phase entirely16,17. Controllers of this type have demonstrated legged locomotion in bipedal18 and quadrupedal robots19,20, and even swimming and other behaviors21. A general observation is that feedback improves robustness such as against uneven terrain22. And the addition of feedforward into feedback-based control has been used to vary walking speed23, adjust interlimb coordination24, or enhance stability25. However, a disadvantage is that the means of combining CPG and feedback is often designed ad hoc. This makes it challenging to extend the findings from one CPG model to another or to other gaits or movement tasks.

Optimization principles offer a means for a model to be uniquely defined by quantitative and objective performance measures26. Indeed, CPG models have long used optimization to determine parameter values23,27,28. However, the most robust and capable models to date have tended not to use CPGs or intrinsic timing rhythms. For example, human-like optimization models can traverse highly uneven terrain29,30,31 using only state-based control, where the control command is a function of system state (e.g., positions and velocities of limbs). In fact, reinforcement learning and other robust optimization approaches (e.g., dynamic programming32,33) are typically expressed solely in terms of state, and do not even have provision for time as an explicit input. They have no need for, nor even benefit from, an internally generated rhythm. But feedforward is clearly important in biological CPGs, suggesting that some insight is missing from these optimal control models.

There may be a principled reason for a biological controller not to rely on state, as measured, alone. Realistically, a system's state can only be known imperfectly, due to noisy and imperfect sensors. The solution is state estimation33, in which an internal model of the body is used to predict the expected state and sensory information, and feedback from actual sensors is used to correct the state estimate. The sensory feedback gains may be optimized for minimum estimation error separate from control. The separation principle33 of control systems shows that state-based control may be optimized for control performance without regard to noisy sensors, and nevertheless combine well with state estimation33. In practice, actual robots (e.g., bipedal Atlas34 and quadrupedal BigDog35) gain high performance and robustness through such a combination of state estimation driving state-(estimator)-based control. In fact, state-estimator control may be optimized for a noisy environment33, and has been proposed as a model for biological systems36,37,38. The internal model for state estimation is not usually regarded as relevant to the biological CPG’s feedforward, internal rhythm, but we have proposed that it can potentially produce a CPG-like rhythm under conditions simulating fictive locomotion39. Here the rhythmic output is interpreted not as the motor command per se, but as a state estimate that drives the motor command. We demonstrated this concept with a simple model of rhythmic leg motions39 and a preliminary walking model40. This suggests that a walking model designed objectively with state estimator-based control might produce CPG-like rhythms that objectively contribute to locomotion performance.

The purpose of the present study was to test an estimator-based CPG controller with a dynamic walking model. We devised a simple state-based control scheme to produce a stable and economical nominal gait, producing stance and swing leg torques as a function of the leg states. Assuming a noisy system and environment, we devised a state estimator for leg states. A departure from other CPG models is that we optimize sensory feedback and associated gains not for walking performance, but for accurate state estimation. The combination of control and estimation define our interpretation of a CPG controller that incorporates sensory feedback in a noisy environment. Moreover, this same controller may be realized in the form of a biologically-inspired half center oscillator14, with neuron-like dynamics. Because the control scheme depends on accurate state information for its stability and economy, we expected that minimizing state estimation error (and not explicitly walking performance) would nonetheless allow this model to achieve better walking performance. Scaling the sensory feedback either higher or lower than theoretically optimal would be expected to yield poorer performance. Such a model may conceptually explain how CPGs could optimally incorporate sensory feedback.

Results

Central pattern generator controls a dynamic walking model

The CPG controller produced a periodic gait with a model of human-like dynamic walking (Fig. 2A). The model was inspired by the mechanics and energetics of humans41, whose legs have pendulum-like passive dynamics42 (the swing leg as a pendulum, the stance leg as inverted pendulum), modulated by active control. Model parameters such as mass distribution and foot curvature, and the resultant walking speed and step length were similar to human walking. The nominal step length for a given walking speed also minimized the model’s energy expenditure, as also observed in humans43. Acting on the legs were torque commands (\(T_{1}\) and \(T_{2}\), Fig. 2B) from the CPG, designed to yield a periodic gait (Fig. 2C), by restoring energy dissipated with each step’s ground contact collision44,45. The leg angles and the ground contact condition (“GC”, 1 for contact, 0 otherwise; Fig. 2C) were treated as measurements to be fed back to the CPG. Each leg’s states (\({\varvec{x}}_{i} \triangleq \left[ {\theta_{i} ,\dot{\theta }_{i} } \right]^{T}\)) described a periodic orbit or limit cycle (Fig. 2D), which was locally stable for zero or mild disturbances, but could easily be perturbed enough to make it fall (Fig. 2E).

Figure 2
figure 2

Dynamic walking model controlled by CPG controller with feedback. (A) Pendulum-like legs are controlled by motor commands for hip torques \({T}_{1}\) and \({T}_{2}\), with sensory feedback of leg angle and ground contact “GC” relayed back to controller. (B) Controller produces alternating motor commands versus time, which drive (C) leg movement \({\uptheta }\). Sensory measurements of leg angle and ground contact in turn drive the CPG. (D) Resulting motion is a nominal periodic gait (termed a “limit cycle”) plotted in state space \({\dot{\theta }}\) versus \({\uptheta }\). (E) Discrete perturbation to the limit cycle can cause model to fall.

The resulting nominal (undisturbed) gait had approximately human-like walking speed and step length. The nominal walking speed was equivalent to 1.25 m/s with step length 0.55 m (or normalized 0.4 \(\left( {gl} \right)^{0.5}\) and 0.55 \(l\), respectively; \(g\) is gravitational constant, \(l\) is leg length). The corresponding mechanical cost of transport was 0.053, comparable to other passive and active dynamic walking models (e.g.,43,46,47).

This controller had four important features for the analyses that follow. First, the gait had dynamic, pendulum-like leg behavior similar to humans41,42. Second, the controller stabilized walking, meaning ability to withstand minor perturbations due to its state-based control. Third, the information driving control was produced by the controller including CPG, whose entire dynamics and feedback gains were objectively designed by optimal estimation principles, with no ad hoc design. And fourth, the overall amount of feedback could be varied continuously between either extreme of pure feedforward and pure feedback (sensory feedback gain \(L\) ranging zero to infinity), while always producing the same nominal gait under noiseless conditions. This was to facilitate study of how parametric variation of sensory feedback affects performance, particularly under noisy conditions.

Pure feedforward and pure feedback are both susceptible to noise

The critical importance of sensory feedback was demonstrated with a disturbance acting on the legs (Fig. 3A). Termed process noise, it represents not only disturbances but also any uncertainty in the environment or internal model. For this demonstration, the disturbance consisted of a single impulsive swing leg angular acceleration (amplitude 5 \(\left( {g/l} \right)\) at 15% of nominal stride time). The pure feedforward controller failed to recover (Fig. 3A left), and would fall within about two steps. Its perturbed leg and ground contact states became mismatched to the nominal rhythm, which in pure feedforward does not respond to state deviations. In contrast, the feedback controller could recover from the same perturbation (Fig. 3A right) and return to the nominal gait. Feedback control is driven by system state, and therefore automatically alters the motor command in response to perturbations. Our expectation is that even if a feedforward control is stable under nominal conditions (with zero or mild disturbances), a feedback controller could generally be designed to be more robust.

Figure 3
figure 3

Pure feedforward and pure feedback (left and right columns, respectively) are adversely affected by (A) process and (B) sensor noise. Process noise refers to disturbances from the environment or imperfect actuation, and sensor noise refers to imperfect sensing. Plots show ground contact condition, leg angles, commanded leg torques, and noise levels versus time, including both the nominal condition without noise (dashed lines), and the perturbed condition with noise (solid lines). With an impulsive, process noise disturbance, pure feedforward control tended to fall, whereas pure feedback was quite stable. With sensor noise alone, pure feedforward was unaffected, but pure feedback tended to fall.

We also applied an analogous demonstration with sensor noise (Fig. 3B). Adding continuous sensor noise (standard deviation of \(0.1\) for each independent leg) to sensory measurements had no effect on pure feedforward control (Fig. 3B left), which ignores sensory signals entirely. But pure feedback was found to be sensitive to noise-corrupted measurements, and would fall within a few steps (Fig. 3B right). This is because erroneous feedback would trigger erroneous motor commands not in accordance with actual limb state. The combined result was that both pure feedforward and pure feedback control had complementary weaknesses. They performed identically without noise, but each was unable to compensate for its particular weakness, either process noise or sensor noise. Feedback control can be robust, but it needs accurate state information.

Neural half-center oscillators are re-interpreted as state estimator

We determined two apparently different representations for the same CPG model. This first was a biologically inspired, neural oscillator (Fig. 4A) representation, intended to resemble previous CPG models10,11,12,13 demonstrating incorporation of sensory feedback. As with such models, the intrinsic rhythm was produced with two mutually inhibiting half-center oscillators, one driving each leg (\(i = 1\) for left leg, \(i = 2\) for right leg). Each half-center had a total of three neurons, one a primary neuron with standard second-order dynamics (states \(u\) and \(v\)). Its output drove the second neuron (\(\alpha\)) producing the motor command to the ipsilateral leg. The third neuron was responsible for relaying ground contact (“\(c\)”) sensory information, to both excite the ipsilateral primary neuron and inhibit the contralateral one. Although this architecture is superficially similar to previous ad hoc models, the present CPG rhythmic dynamics were determined objectively by optimal estimation principles.

Figure 4
figure 4

Locomotion control circuit interpreted in two representations: (A) Neural central pattern generator with mutually inhibiting half-center oscillators, and as (B) state estimator with feedback control. Each half-center has a primary neuron with two states (\(u\) and \(v\), respectively), an auxiliary neuron \(c\) for registering ground contact, and an alpha motoneuron \(\alpha\) driving leg torque commands. Inputs include a tonic descending drive, and afferent sensory data with gain \(L\). State estimator acts as second-order internal model of leg dynamics to estimate leg states \(\hat{\theta }\) (hat symbol denotes estimate) and ground contact \(\widehat{{{\text{GC}}}}\), which drive state-based command \(T\). The estimator dynamics and estimator parameters including sensory feedback \(L\), and thus the corresponding neural connections and weights, are designed for minimum mean-square estimation error. Leg dynamics have nonlinear terms (see “Methods” section) of small magnitude (thin grayed lines).

The same CPG architecture was then re-interpreted in a second, control systems and estimation framework (Fig. 4B), while changing none of the neural circuitry. Here, the structure was not treated as half-center oscillators, but rather as three neural stages from afferent to efferent. The first stage receiving sensory feedback signal was interpreted as a feedback gain \(L\) (upper rectangular block, Fig. 4B), modulating the behavior of the second stage, interpreted as a state estimator (middle rectangular block, Fig. 4B) acting as an internal model of leg dynamics. Its output was interpreted as the state estimate, which was fed into the third, state-based motor command stage (lower rectangular block, Fig. 4B). In this interpretation, the three stages correspond with a standard control systems architecture for a state estimator driving state feedback control. In fact, the neural connections and weights of the half-center oscillators were determined by, and are therefore specifically equivalent to, a state estimator driving motor commands to the legs.

The two representations provide complementary insights. The half-center model shows how sensory information can be incorporated into and modulate a CPG rhythm. Half-center models have previously been designed ad hoc and tuned for a desired behavior, and have generally lacked an objective and unique means to determine the architecture (e.g., number of neurons and interconnections) and neural weights. The state estimator-based model offers a means to determine the architecture, neural weights, and parameters for the best performance. This half-center model with feedback could thus be regarded as optimal, for producing accurate state estimates despite the presence of noise.

Optimized sensory feedback gain L yields accurate state estimates

We next examined walking performance in the presence of both process and sensor noise, while varying sensory feedback gain \(L\) above and below optimal (Fig. 5; result from 20 simulation trials per condition, 100 steps per trial). We intentionally applied a substantial amount of noise (with fixed covariances), sufficient to topple the model. This was to demonstrate how walking performance can be improved with appropriate sensory feedback gain \(L\), unlike the noiseless case where the model always walks perfectly.

Figure 5
figure 5

State estimation accuracy and walking performance under noisy conditions, as a function of sensory feedback gain. The theoretically optimal sensory feedback gain (normalized gain of 1) yielded best performance, in terms of mechanical cost of transport (mCOT), step length variability, mean time between falls (MBTF), and state estimator error. Normalized sensory feedback gain varies between extremes of pure feedforward (to the left) and pure feedback (to the right), with 1 corresponding to theoretically predicted optimum \(L_{{{\text{lqe}}}}^{*}\). Formally, normalized gain is defined as \(\left| L \right|/\left| {L_{{{\text{lqe}}}}^{*} } \right|,\) where \(\left| \cdot \right|\) denotes matrix norm. Vertical arrow indicates best performance (minimum for all measures except maximum for MTBF). For all gains, model was simulated with a fixed combination of process and sensor noise as input to multiple trials, yielding ensemble average measures. Each data point is an average of 20 trials of 100 steps each, and errorbar indicates standard deviation of the trials. Mechanical cost of transport (mCOT) was defined as positive work divided by body weight and distance travelled, and step variability as root-mean-square (RMS) variability of step length. Falling takes time and dissipates mechanical energy, and so mCOT was computed both including and excluding losses from falls (work, time, distance).

As expected of optimal estimation, best estimation performance was achieved for the gain \(L\) equal to theoretically predicted optimum \(L_{{{\text{lqe}}}}^{*}\) (Fig. 5, normalized sensory feedback gain of 1). We had designed \(L_{{{\text{lqe}}}}^{*}\) with linear quadratic estimation (LQE) based on the covariances of process and sensor noise. Applying that gain in nonlinear simulation with added noise resulted in minimum estimation error (Fig. 5, bottom). This suggests that a linear gain was sufficient to yield a good state estimate despite system nonlinearities.

The optimal sensory feedback gain also yielded best walking performance. Even though the same state-based motor command function (\(\alpha\) in Fig. 4) was applied in all conditions, that function was dependent on accurate state information. As a result, the optimal gain \(L_{{{\text{lqe}}}}^{*}\) yielded greater economy, less step length variability, and fewer falls (Fig. 5). This was actually a side-effect of optimal state estimation, because our state-based motor command was not explicitly designed to optimize any of these performance measures. The minimal mechanical cost of transport was 0.077 under noisy conditions, somewhat higher than the nominal 0.053 without noise. Step length variability was 0.046 \(l\), and the model experienced occasional falls, with MTBF (mean time between falls) of about 9.61 \(g^{ - 0.5} l^{0.5}\) (or about 7.1 steps). This optimal case served as a basis for comparisons with other values for gain \(L\).

Accurate state estimates yield good walking performance

Applying sensory feedback gains either lower or higher than theoretically predicted optimum generally resulted in poorer walking performance (Fig. 5). We expected that any state-based motor command would be adversely affected by poorer state estimates. The effect of reducing sensory feedback gain (normalized gain less than 1) was to make the system (and particularly its state estimate) more reliant on its feedforward rhythm, and less responsive to external perturbations. The effect of increasing sensory feedback gain was to make the system more reliant and responsive to noisy feedback. Indeed, the direct effect of selecting either too low or too high a sensory feedback gain was an increased error in state estimate. The consequences of control based on less accurate state information were more falls, more step length variability, and greater cost of transport. Over the range of gains examined (normalized sensory feedback gain \(\left| L \right|/\left| {L_{{{\text{lqe}}}}^{*} } \right|\) ranging 0.82–1.44), the performance measures worsened on the order of about 10% (Fig. 5). This suggests that, in a noisy environment, a combination of feedforward and feedback is important for achieving precise and economical walking, and for avoiding falls. Moreover, the optimal combination can be designed using control and estimation principles. This testing condition was referred as “reference condition” for the following demonstration.

Amount of noise determines optimal sensory feedback gain L

We next evaluated how walking performance and feedback gain would change with different amounts of noise. The theoretically calculated optimum sensory feedback gain \(L\) depends on the ratio of process noise to sensor noise covariances48. Relatively more process noise favors higher sensory feedback gain, and relatively more sensor noise favors a lower sensory feedback gain and thus greater reliance on the feedforward internal model. We demonstrated this by applying different amounts of process noise (low, medium, and high) with a fixed amount of sensor noise, evaluating the optimal sensory feedback gain, and performing walking simulations with gains varied about the optimum. Noise covariances were set to multiples of the reference conditions (Fig. 5), of 0.36, 1.15, and 2.06 respectively for low, medium, high process noise; sensor noise was 1.15 of reference condition. The ratio of process to sensor noise covariance was thus smaller (low), the same (medium), and larger (high) compared to the reference condition, as was the theoretically optimal gain obtained using linear quadratic estimation (LQE) equation (Fig. 6).

Figure 6
figure 6

Theoretically optimal sensory feedback gains increase with greater process noise. Effect of three conditions of increasing process noise (L low, M medium, H high) on walking performance as a function of sensory feedback gain. The theoretically optimal gains (vertical lines) led to best performance, as quantified by mechanical cost of transport (mCOT, including falls), step length variability, mean time between falls (MBTF), and state estimator error. (An exception was step length variability, which had a broad and indistinct minimum.) The predicted optimal sensory feedback gains for each noise condition are indicated with vertical lines. Arrows indicate best performance for each measure for each noise condition. Performance is plotted with normalized sensory feedback gain ranging between extremes of pure feedforward (to the left) and pure feedback (to the right), with 1 corresponding to theoretical optimum \(L_{{{\text{lqe}}}}^{*}\) of the previous testing condition (Fig. 5). The process noise covariance was set to multiples of the previous reference values: 0.36 for L, 1.15 for M, and 2.06 for H. Sensor noise covariance was set to 1.15 of previous value.

With varying noise, good performance was still achieved with the corresponding, theoretically optimal gain (Fig. 6). The theoretically predicted optimal gain increased with greater process noise, and simulation trials yielded minimum state estimation error at that gain (Fig. 6, Estimation RMS error). Accurate state estimation also contributed to walking performance, with a trend of minimum cost of transport and step variability, and maximum time between falls at the corresponding theoretical optimum. An exception was step length variability in the high process noise condition, which had a broad minimum, with simulation optimum at a slightly lower gain than theoretically predicted. Overall, the linear state estimator predicted the best performing gain well, in terms of state estimation error, energy economy, and robustness to falls. These results are consistent with the expectation that accurate state estimation contributes to improved control.

Removal of sensory feedback causes emergence of fictive locomotion

Although the CPG model normally interacts with the body, it was also found to produce fictive locomotion with peripheral feedback removed (Fig. 7). Here we considered two types of biological sensors, referred to as “error feedback” and “measurement feedback” sensors. Error feedback refers to sensors that can distinguish unexpected perturbations from intended movements49. For example, some muscle spindles and fish lateral lines50 receive corollary efferent signals (e.g. gamma motor neurons in mammals, alpha in invertebrates51) that signify intended movements, and could be interpreted as effectively computing an error signal within the sensor itself50. Measurement feedback sensors refers to those without efferent inputs (e.g., nociceptors, golgi tendon organs, cutaneous skin receptors, and other muscle spindles52), that provide information more directly related to body movement. Both types of sensors are considered important for locomotion, and so we examined the consequences of removing either type.

Figure 7
figure 7

Emergence of fictive locomotion from CPG model. (A) Block diagram of intact control loop, where sensory measurements \({\text{y}}\) and estimation error \(e\) are fed into internal model. Motor command \(T\) drives the legs and (through efference copy) the internal model of legs. (B) Two models of fictive locomotion, starting with the intact system but with sensory feedback removed in two ways. Error feedback refers to sensors that receive efferent copy as inhibitory drive (e.g., some muscle spindles). Removal of error (dashed line) results in sustained fictive rhythm, due to feedback between internal model and state-based command. Measurement feedback refers to other, more direct sensors of limb state \({\varvec{x}}\). Removal of such feedback can also produce sustained rhythm from internal model of legs and sensors, interacting with state-based command. (C) Simulated motor spike trains show how fictive locomotion can resemble intact. Measurement feedback case produces slower and weaker rhythm than error feedback.

These cases were modeled by disconnecting the periphery in two different ways. This is best illustrated by redrawing the CPG (Fig. 4) more explicitly as a traditional state estimator block diagram (Fig. 7A). The state estimator block diagram could be rearranged into two equivalent forms (Fig. 7B) by relocating the point where the error signal is calculated. In the Error feedback model (Fig. 7B, top), the error is treated as a peripheral comparison associated with the sensor, and in the Measurement feedback model (Fig. 7B, bottom), as a more central comparison. The two block diagrams are logically equivalent when intact, but they differ in behavior with the periphery disconnected.

The case of fictive locomotion with Error feedback sensors (Fig. 7B, top) was modeled by disconnecting error signal \(e\), so that the estimator would run in an open-loop fashion, as if the state estimate were always correct. Despite this disconnection, there remained an internal loop between the estimator internal model and the state-based command generator, that could potentially sustain rhythmic oscillations. The case of fictive locomotion with measurement feedback sensors (Fig. 7C Measurement feedback) was modeled by disconnecting afferent signal y, and reducing estimator gain by about half, as a crude representation of highly disturbed conditions. There remained an internal loop, also potentially capable of sustained oscillations. We tested whether either case would yield a sustained fictive rhythm, illustrated by transforming the motor command \(T\) into neural firing rates using a Poisson process.

We found that removal of both types of sensors still yielded sustained neural oscillations (Fig. 7C), equivalent to fictive locomotion. In the case of Error feedback (Fig. 7B), the motor commands from the isolated CPG were equivalent to the intact case without noise in terms of frequency and amplitude. In the case of Measurement feedback (Fig. 7B), the state estimator tended to drive estimate \(\hat{\theta }\) toward zero, and simulations still produced periodic oscillations, albeit with slower frequency and reduced amplitude compared to intact. This is not unlike observations of fictive locomotion in animals53, although our model’s response depends on manner of disconnection. Here, with both types of disconnection, the resulting fictive locomotion should not be interpreted as evidence of an intrinsic rhythm, but rather a side effect of incorrect state estimation.

Discussion

We have examined how central pattern generators could optimally integrate sensory information to control locomotion. Our CPG model offers an adjustable gain on sensory feedback, to allow for continuous adjustment between pure feedback control to pure feedforward control, all with the same nominal gait under perfect conditions. The model is compatible with previous neural oscillator models, while also being designed through optimal state estimation principles. Simulations reveal how sensory feedback becomes critical under noisy conditions, although not to the exclusion of intrinsic, neural dynamics. In fact, a combination of feedforward and feedback is generally favorable, and the optimal combination can be designed through standard estimation principles. Estimation principles apply quite broadly, and could be readily applied to other models, including ones far more complex than examined here. The state estimation approach also suggests new interpretations for the role of CPGs in animal or robot locomotion.

Feedforward and feedback may be combined optimally for state estimation

One of our most basic findings was that the extremes of pure feedforward or pure feedback control each performed relatively poorly in the presence of noise (Fig. 3). Pure feedforward control, driven solely by an open-loop rhythm, was highly susceptible to falling as a result of process noise. The general problem with a feedforward or time-based rhythm is that a noisy environment can disturb the legs from their nominal motion, so that the nominal command pattern is mismatched for the perturbed state. Under noisy conditions, it is better to trigger motor commands based on feedback of actual limb state, rather than time. But feedback also has its weaknesses, in that noisy sensory information can lead to noisy commands.

Better than these extremes is to combine both feedforward and feedback together, modulated by sensory feedback gain \(L\). The relative trade-off between process and measurement noise, described by the ratio of their covariances, determines the theoretically optimal gain. That gain was found to yield the least estimation error in simulation (normalized sensory feedback gain = 1 in Fig. 5). Moreover, variations in covariance produced predictable shifts in the theoretical optimum, which again yielded least estimation error in simulation (Fig. 6). We demonstrated this using relatively simple, but rigorously defined33 linear estimator dynamics, which worked well despite nonlinearities in the motor command and walking dynamics. Still better performance would be expected with nonlinear estimation techniques such as extended Kalman filters and particle filters54.

State estimation may be separated from state-based control

A unique aspect of our approach is the separation of sensory processing from control. We treat sensory information as inherently noisy, and treat the CPG as the optimal filter of noisy sensory information yielding the best state estimate. The control is then driven by that estimate, and may be designed independently of sensors and for arbitrary objectives. In fact, the present state-based motor command (Eq. 6) was designed ad hoc for reasonable performance, without explicitly optimizing any performance measures. Nevertheless, measures such as cost of transport and robustness against falls showed best performance with the theoretically optimal sensory feedback gain \(L\) (Figs. 5, 6).

There are several advantages to this combination of control and estimation. First, accurate state estimation contributes to good state-based control, and poor estimation to poor control. For example, imprecise visual information can induce variability in foot placement55. Increased variability in walking has been associated with poorer walking economy56 and increased fall risk57. Second, this is manifested more rigorously as “the separation principle” of control systems design, where state-based control and state estimation may be optimized separately and then combined for good performance33,48. Third, the same state estimator (and therefore most of the CPG) can be paired with a variety of different state-based motor control schemes, such as commands for different gaits, for non-rhythmic movements including gait transitions, or to achieve different objectives such as balance and agility. This differs from ad hoc approaches, where different tasks are generally expected to require re-design of the entire CPG.

Central pattern generators may be re-interpreted as state estimators

Our model also explains how neural oscillators can be interpreted as state estimators (Fig. 4). Previous CPG oscillator models have incorporated sensory feedback for locomotion15,16,17,18,20,21,22,58, but have not generally defined an optimal feedback gain based on mechanistic control principles. We have re-interpreted neural oscillator circuits in terms of state estimation (Fig. 4), and shown how the gain can be determined in a principled manner, to minimize estimation error (Fig. 5) in the presence of noise. Here, the entire optimal state estimator architecture is defined objectively by the dynamics of the body and environment (including noise parameters), with no ad hoc parameters or architecture. This makes it possible to predict how feedback gains should be altered for different disturbance characteristics (Fig. 6). The nervous system has long been interpreted in terms of internal models, for example in central motor planning and control59,60,61 and in peripheral sensors49. Here we apply internal model concepts to CPGs, for better locomotion performance.

This interpretation also explains fictive locomotion as an emergent behavior. We observed persistent CPG activity despite removal of sensors (and either error or measurement feedback; Fig. 7), but this was not because the CPG was in any way intended to produce rhythmic timing. Rather, fictive locomotion was a side effect of a state-based motor command, in an internal feedback loop with a state estimator, resulting in an apparently time-based rhythm (Fig. 7B). Others have cautioned that CPGs should not be interpreted as generating decisive timing cues62,63,64, especially given the critical role of peripheral feedback in timing9,65,66. In normal locomotion, central circuits and periphery act together in a feedback loop, and so neither can be assigned primacy. The present model operationalizes this interaction, demonstrates its optimality for performance, and shows how it can yield both normal and fictive locomotion.

Optimal control principles may be compatible with neural control

This study argues that it is better to control with state rather than time. The kinematics and muscle forces of locomotion might appear to be time-based trajectories driven by an internal clock. But another view is that the body and legs comprise a dynamical system dependent on state (described e.g. by phase-plane diagrams, Fig. 2), such that the motor command should also be a function of state. That control could be continuous as examined here, or include discrete transitions between circuits (e.g.,8), and could be optimized or adapted through a variety of approaches, such as optimal control33, dynamic programming32, iterative linear quadratic regulators67, and deep reinforcement learning31). Such state-based control is capable of quite complex tasks, including learning different gaits and their transitions, avoiding or climbing over obstacles, and kicking balls31,68. But with noisy sensors34,35, state-based control typically also requires state estimation34,54, which introduces intrinsic dynamics and the possibility of sustained internal oscillations. We therefore suggest that models should be controlled by state-based control (e.g. deep reinforcement learning) coupled with state estimation for noisy environments to achieve advanced capability and performance. The resulting combination of state-driven control and estimation might also exhibit CPG-like fictive behavior, despite having no explicit time-dependent controls.

State estimation may also be applicable to movements other than locomotion. The same circuitry employed here (Fig. 4) could easily contribute a state estimate \(\hat{x}\) for any state-dependent movements. For example, a different rhythmic39 movement might be produced with a different state-based motor command; a non-rhythmic postural stabilization might employ a reflex-like (proportional-derivative) control; and a point-to-point movement might be produced by a descending command, supplemented by local stabilization. All of these would be equivalent to substituting different gains and interconnections to the state-based motor command (\(\alpha\) or \(T\) in Fig. 4), while still relying on all of the remaining circuitry (of Fig. 4). In our view, persistent oscillations could be the outcome of state estimation with an appropriate state-based command for the \(\alpha\) motoneuron (see “Methods” section). But the same half-center circuitry could be active and contribute to other movements that use non-locomotory, state-based commands. It is certainly possible that biological CPGs are indeed specialized purely for locomotion alone, but the state estimation interpretation suggests the possibility of a more general, and perhaps previously unrecognized, role in other movements.

The present optimization approach may offer insight on neural adaptation. Although we have explicitly designed a state estimator here, we would also expect a generic neural network, given an appropriate objective function, to be able to learn the equivalent of state estimation. The learning objective could be to minimize error of predicted sensory information, or simply locomotion performance such as cost of transport. Moreover, our results suggest that the eventual performance and control behavior should ultimately depend on body dynamics and noise. A neural system adapting to relatively low process noise (and high sensor noise) would be expected to learn and rely heavily on an internal model. Conversely, relatively high process noise (and low sensor noise) would rely more heavily on sensory feedback. A limitation of our model is that it places few constraints on neural representation, because there are many ways (or “state realizations”48) to achieve the same input–output function for estimation. But the importance and effects of noise on adaptation are hypotheses that might be testable with artificial neural networks or animal preparations.

Limitations of the study

There are, however, cases where state estimation is less applicable. State estimation applies best to systems with inertial dynamics or momentum. Examples include inverted pendulum gaits with limited inherent (or passive dynamic46) stability and pendulum-like leg motions39. The perturbation sensitivity of such dynamics makes state estimation more critical. But other organisms and models may have well-damped limb dynamics and inherently stable body postures, and thus benefit less from state estimation. Others have proposed that intrinsic CPG rhythms may have greater importance in lower than higher vertebrates, potentially because of differences in inherent stability69. There may also be task requirements that call for fast reactions with short synaptic delays, or organismal, energetic, or developmental considerations that limit the complexity of neural circuitry. Such concerns might call for reduced-order internal models48, or even their elimination altogether, in favor of faster and simpler pure feedforward or feedback. At the same time, actual neural circuitry is considerably more complex than the half-center model depicted here, and animals have far more degrees of freedom than considered here. There are some aspects of animal CPGs that could be simpler than full state estimation, and others that encompass very complex dynamics. A more holistic view would balance the principled benefits of internal models and state estimation against the practicality, complexity, and organismal costs.

There are a number of other limitations to this study. The “Anthropomorphic” walking model does not capture three-dimensional motion and multiple degrees of freedom in real animals. We used such a simple model because it is unlikely to have hidden features that could produce the same results for unexpected reasons. We also modeled extremely simple sensors, without representing the complexities of actual biological sensors. The estimator also used a constant, linear gain, and could be improved with nonlinear estimator variants. However, more complex body dynamics could be incorporated quite readily, because the state estimator (Fig. 4) consists of an internal model of body dynamics (Eq. 7) and a feedback loop with appropriate gain (\(L\) designed by estimation principles), and no ad hoc parameters. Nonlinearities might also be accommodated by methods of Bayesian estimation70, of which optimal state estimation or Kalman filtering is a special case.

Another limitation is that we used a particularly simple, state-based command law, which was designed more for robustness than for economy. Better economy could be achieved by powering gait with precisely-triggered, trailing-leg push-off39, rather than the simple hip torque applied here. However, the timing is so critical that feedforward conditions (low sensory feedback gains, Fig. 5) would fall too frequently to yield meaningful economy or step variability measures. We therefore elected for more robust control to allow a range of feedforward through feedback to be compared (Fig. 5). But even with more economical control or other control objectives, we would still expect best performance to correspond with optimal sensory gain, due to the advantages of accurate state information. We also used a relatively simple model of walking dynamics, for which more complex models could readily be substituted (Eq. 7) to yield an appropriate estimator. But more complex dynamics also imply more complex state-based locomotion control, for which there are few principled approaches. The present study interprets the CPG as a means to produce an accurate state estimate, which could be considered helpful for state-based control of any complexity.

Conclusion

Our principal contribution has been to reconcile optimal control and estimation with biological CPGs. Evidence of fictive locomotion has long shown that neural oscillators produce timing and amplitude cues. But pre-determined timing is also problematic for optimal control in unpredictable situations63, leading some to question why CPG oscillators should dictate timing62,64. To our knowledge, previous CPG models have not included process or sensor noise in control design. Such noise is simply a reality of non-uniform environments and imperfect sensors. But it also yields an objective criterion for uniquely defining control and estimation parameters. The resulting neural circuits resemble previous oscillator models and can produce and explain nominal, noisy, or fictive locomotion. In our interpretation, there is no issue of primacy between CPG oscillators and sensory feedback, because they interact optimally to deal with a noisy world.

Method

Details of the model and testing are as follows. The CPG model is first described in terms of neural, half-center circuitry, which is then paired with a walking model with pendulum-like leg dynamics. The walking gait is produced by a state-based command generator, which governs how state information is used to drive motor neurons. The model is subjected to process and sensor noise, which tend to cause the gait to be imprecise and subject to falling. The CPG is then re-interpreted as an optimal state estimator, for which sensory feedback gain and internal model parameters may be designed, as a function of noise characteristics. The model is then simulated over multiple trials to computationally evaluate its walking performance as a function of sensory feedback gain. It is also simulated without sensory feedback, to test whether it produces fictive locomotion.

CPG architecture based on Matsuoka oscillator

The CPG consists of two, mutually-inhibiting half-center oscillators, receiving a tonic descending input (Fig. 4A). Each half-center has second-order dynamics, described by states \(u_{i}\) and \(v_{i}\). This is equivalent to a primary Matsuoka neuron with states for a membrane potential and adaptation or fatigue14. Locomotion requires relatively longer time constants than is realistic for a single biological neuron, and so each model neuron here should be regarded as shorthand for a network of biological neurons with adaptable time constants and synaptic weights, that in aggregate produce first- or second-order dynamics of appropriate time scale. The state \(u_{i}\) produces an output \(q_{i}\) that can be fed to other neurons. In addition, we included two types of auxiliary neurons (for a total of three neurons per half-center): one for accepting the ground contact input (\(c_{i}\), with value 1 when in ground contact and 0 otherwise for leg \(i\)), and the other to act as an alpha (\(\alpha_{i}\)) motoneuron to drive the leg. We used a single motoneuron to generate both positive and negative (extensor and flexor) hip torques, as a simplifying alternative to including separate rectifying motoneurons.

Each half-center receives a descending command and two types of sensory feedback. The descending command is a tonic input \(s\), which determines the walking speed. Sensory input from the corresponding leg includes continuous and discrete information. The continuous feedback contains information about leg angle from muscle spindles and other proprioceptors71, which could be modeled as leg angle \(y_{i}\) for measurement feedback, or error \(e_{i}\) for error feedback sensors. The discrete information is about ground contact \(c_{i}\) sent from cutaneous afferents72.

The primary neuron’s second-order dynamics are described by two states. The state \(u_{i}\) is mainly affected by its own adaptation, a mutually inhibiting connection from other neurons, sensory input, and efference copy of the motor commands. The second state \(v_{i}\) has a decay term, and is driven by the same neuron’s \(u_{i}\) as well as sensory input. This is described by the following equations, inspired by14 and previous robot controllers designed for rhythmic arm movements (e.g.,73 and walking22):

$$\dot{u}_{i} + a_{i} u_{i} = - b_{i} v_{i} + \mathop \sum \limits_{j = 1}^{2} - w_{ij} q_{j} + \mathop \sum \limits_{j = 1}^{2} h_{ij} e_{j} + \mathop \sum \limits_{j = 1}^{2} r_{ij} \alpha_{j} \left( {s,v_{j} ,c_{j} } \right) + f_{i} \left( {{\varvec{u}},\user2{ v},{\varvec{c}}} \right)$$
(1)
$$q_{i} = g\left( {u_{i} } \right)$$
(2)
$$\dot{v}_{i} + a_{i}^{{\prime}} v_{i} = q_{i} + \mathop \sum \limits_{j = 1}^{2} h_{ij}^{{\prime}} e_{j}$$
(3)

where there are several synaptic weightings: decays \(a_{i}\) and \(a_{i}^{{\prime}}\), adaptation gain \(b_{i}\), mutual inhibition strength \(w_{ij}\) (weighting of neuron \(i\)’s input from neuron \(j\)’s output, where \(w_{ii} = 0\)), an output function \(g\left( {u_{i} } \right)\) (set to identity here), sensory input gains \(h_{ij}\) and \(h_{ij}^{{\prime}}\), and efference copy strength \(r_{ij}\). The neuron also receives efference copy of its associated motor command \(\alpha_{j} \left( {s, v_{j} ,c_{j} } \right)\), which depends on neuron state, descending drive and ground contact. There are also secondary, higher-order influences summarized by the function \(f_{i} \left( {{\varvec{u}},\user2{ v},\user2{ c}} \right)\), which have a relatively small effect on neural dynamics but are part of the state estimator as described below (see “Theoretical equivalence” section). The network parameters for such CPG oscillators are traditionally set through a combination of design rules of thumb and hand-tuning, but here nearly all of the parameters will be determined from an optimal state estimator, as described below.

Walking model with pendulum dynamics

The system being controlled is a simple bipedal model walking in the sagittal plane (Fig. 2A). The model features pendulum-like leg dynamics46, with 16% of body mass at each leg (of length \(l\)) and 68% to pelvis/torso, which is modeled as a point mass. The leg’s center of mass was located at 0.645 \(l\) from the feet, radius of gyration of 0.326 \(l\), and curved foot with radius 0.3 \(l\). The legs are also actively actuated by added torque inputs (the "Anthropomorphic Model,"43), and energy is dissipated mainly with the collision of leg with ground in the step-to-step transition. The dissipation determines the amount of positive work required each step. In humans, muscles perform much of that work, which in turn accounts for much of the energetic cost of walking44.

The walking model is described mathematically as follows. The equations of motion may be written in terms of vector \({\varvec{\theta}} \triangleq \left[ {\theta_{1} ,\theta_{2} } \right]^{T}\) as

$${\text{M}}\left( {{\varvec{\theta}},{ }{\mathbf{GC}}} \right)\user2{\ddot{\theta }} + {\text{C}}\left( {{\varvec{\theta}},{ }\dot{\user2{\theta }},{\mathbf{GC}}} \right)\dot{\user2{\theta }} + {\text{G}}\left( {{\varvec{\theta}},{ }{\mathbf{GC}}} \right) = {\text{T}}\left( {{\text{s}},{ }{\varvec{\theta}},{ }{\mathbf{GC}}} \right)$$
(4)

where M is the mass matrix, C describes centripetal and Coriolis effects, G contains position-dependent moments such as from gravity, \({\mathbf{GC}} \triangleq \left[ {{\text{GC}}_{1} ,{\text{GC}}_{2} } \right]^{T}\) contains ground contact, and \({\text{T}} \triangleq \left[ {T_{1} ,T_{2} } \right]^{T}\) contains hip torques exerted on the legs (State-based control, below). The equations of motion depend on ground contact because each leg alternates between stance and swing leg behaviors, inverted pendulum and hanging pendulum, respectively. We define each matrix to switch the order of elements at heel-strikes, so that equation of motion can be expressed in the same form.

At heelstrike, the model experiences a collision with ground affecting the angular velocities. This is modeled as a perfectly inelastic collision. Using impulse-momentum, the effect may be summarized as the linear transformation

$$\left[ {\begin{array}{*{20}c} {\dot{\theta }_{1}^{ + } } \\ {\dot{\theta }_{2}^{ + } } \\ {{\text{GC}}_{1}^{ + } } \\ {{\text{GC}}_{2}^{ + } } \\ \end{array} } \right] = {\text{S}}\left( {{\varvec{\theta}},{\mathbf{GC}}} \right) \cdot \left[ {\begin{array}{*{20}c} {\dot{\theta }_{1}^{ - } } \\ {\dot{\theta }_{2}^{ - } } \\ {{\text{GC}}_{1}^{ - } } \\ {{\text{GC}}_{2}^{ - } } \\ \end{array} } \right]$$
(5)

where the plus and minus signs (‘ + ’ and ‘–’) denote just after and before impact, respectively. The ground contact states are switched such that the previous stance leg becomes the swing leg, and vice versa. The simulation avoided inadvertent ground contact of the swing legs by ignoring collisions until the stance leg reached a threshold (set as 10% of nominal stance leg angle at heel strike), as is common in simple 2D walking models with rigid legs46.

The resulting gait has several characteristics relevant to the CPG. First, the legs have pendulum-like inertial dynamics, which allow much of the gait to occur passively. For example, each leg swings passively, and its collision with ground automatically induces the next step of a periodic cycle46. Second, inertial dynamics integrate forces over time, such that disturbances can disrupt timing and cause falls. This sensitivity could be reduced with overdamped joints and low-level control, but humans are thought to have significant inertial dynamics42. And third, inertial dynamics are retained in most alternative models with more degrees of freedom and more complexity (e.g.,29,31). We believe most other dynamical models would also benefit from feedback control similar in concept to presented here.

State-based motor command generator

The model produces state-dependent hip torque commands to the legs. Of the many ways to power a dynamic walking model (e.g.,43,74,75,76), we apply a constant extensor hip torque against the stance leg, for its parametric simplicity and robustness to perturbations. The torque normally performs positive work (Fig. 2B) to make up for collision losses, and could be produced in reaction to a torso leaned forward (not modeled explicitly here;46). The swing leg experiences a hip torque proportional to swing leg angle (Fig. 2B,C), with the effect of tuning the swing frequency39. This control scheme is actually suboptimal for economy, and is selected for its robustness. Optimal economy actually requires perfectly-timed, impulsive forces from the legs45, and has poor robustness to the noisy conditions examined here. The present non-impulsive control is much more robust, and can still have its performance optimized by appropriate state estimation.

The overall torque command \(T_{i}\) for leg \(i\) is used as the motor command \(\alpha_{i}\), and may be summarized as

$$T_{i} \left( {s,\theta_{i} , {\text{GC}}_{i} } \right) = - { }\left( {k_{{{\text{st}}}} + \mu_{{{\text{st}}}} { }s{ }} \right) \cdot {\text{ GC}}_{i} - \left( {k_{{{\text{sw}}}} \theta_{i} } \right) \cdot \left( {1 - {\text{GC}}_{i} } \right)$$
(6)

where the stance phase torque is increased from the initial value \(k_{st}\) by the amount proportional to the descending command \(s\) with gain \(\mu_{st}\). The swing phase torque has gain \(k_{sw}\) for the proportionality to leg angle \(\theta_{i}\).

There are also two higher level types of control acting on the system. One is to regulate walking speed, by slowly modulating the tonic, descending command \(s\) (Eq. 6). An integral control is applied on \(s\), so keep attain the same average walking speed despite noise, which would otherwise reduce average speed. The second type of high-level control is to restart the simulation after falling. When falling is detected (as a horizontal stance leg angle), the walking model is reset to its nominal initial condition, except advanced one nominal step length forward from the previous footfall location. No penalty is assessed for this re-set process, other than additional energy and time wasted in the fall itself. We quantify the susceptibility to falling with a mean time between falls (MTBF), and report overall energetic cost in two ways, including and excluding failed steps. The wasted energy of failed steps is ignored in the latter case, resulting in lower reported energy cost.

Noise model with process and sensor noise

The walking dynamics are subject to two types of disturbances, process and sensor noise (Fig. 3). Both are modeled as zero-mean, Gaussian white noise. Process noise \(n_{x}\) (with covariance \(N_{x}\)) acts as an unpredictable disturbance to the states, due to external perturbations or noisy motor commands. Sensor or measurement noise \(n_{y}\) (with covariance \(N_{y}\)) models imperfect sensors, and acts additively to the sensory measurements \(y\). The errors induced by both types of noise are unknown to the central nervous system controller, and so both tend to reduce performance.

The noise covariances were set so that the model would be significantly affected by both types of noise. We sought levels sufficient to cause significant risk of falling, so that good control would be necessary to avoid falling while also achieving good economy. Process noise was described by covariance matrix \(N_{x}\), with diagonals filled with variances of noisy angular accelerations, which had standard deviations of \(0.015 \left( {g/l} \right)\) for stance leg, \(0.16 \left( {g/l} \right)\) for swing leg for the reference testing condition. Sensor noise covariance \(N_{y}\) was also set as a diagonal matrix with both entries of standard deviation \(0.1\) for the reference testing condition. For the later demonstrations to test the effect of the different amount of noise, we multiplied 1.15 to the sensor noise covariance, and 0.36, 1.15, 2.06 to the process noise covariance. Noise was implemented as a spline interpolation of discrete white noise sampled at frequency of 16 \(\left( {g/l} \right)^{0.5}\) (well above pendulum bandwidth) and truncated to no more than ± 3 standard deviations.

State estimator with internal model of dynamics

A state estimator is formed from an internal model of the leg dynamics being controlled (see block diagram in Fig. 4), to produce a prediction of the expected state \(\hat{x}\) and sensory measurements \(\hat{y}\) (with the hat symbol ‘^’ denoting an internal model estimate). Although the actual state is unknown, the actual sensory feedback \(y\) is known, and the expectation error \(e = y - \hat{y}\) may be fed back to the internal model with negative feedback (gain \(L\)) to correct the state estimate. Estimation theory shows that regulating error \(e\) toward zero also tends to drive the state estimate towards actual state (assuming system observability, as is the case here; e.g.,48). This may be formulated as an optimization problem, where gain \(L\) is selected to minimize the mean-square estimation error. Here we interpret the half-center oscillator network as such an optimal state estimator, the design of which will determine the network parameters.

The estimator equations may be described in state space. The estimator states are governed by the same equations of motion as the walking model (Eqs. 4, 5), with the addition of the feedback correction. Again using hat notation for state estimates, the nonlinear state estimate equations are

$$\left[ {\begin{array}{*{20}c} {\user2{\dot{\hat{\theta }}}} \\ {\user2{\ddot{\hat{\theta }}}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\user2{\dot{\hat{\theta }}}} \\ {{\text{M}}^{ - 1} \left( { - {\text{C}}\user2{\dot{\hat{\theta }}} - {\text{G}} + {\text{T}}} \right)} \\ \end{array} } \right] + {\text{L}}\left( {{\varvec{\theta}} - \hat{\user2{\theta }}} \right)$$
(7)

We used standard state estimator equations to determine a constant sensory feedback gain \(L\). This was done by linearizing the dynamics about a nominal state, and then designing an optimal estimator based on process and sensor noise covariances (\(N_{x}\) and \(N_{y}\)) using standard procedures (“lqe” command in Matlab, The MathWorks, Natick, MA). This yields a set of gains that minimize mean-square estimation error (\({\varvec{x}} - \hat{\user2{x}}\)), for an infinite horizon and linear dynamics. The constant gain was then applied to the nonlinear system in simulation, with the assumption that the resulting estimator would still be nearly optimal in behavior. Another sensory input to the system is ground contact \({\text{GC}}_{{\text{i}}}\), a boolean variable. The state estimator ignores measured \({\text{GC}}_{{\text{i}}}\) for pure feedforward control (zero feedback gain \(L\)), but for all other conditions (non-zero \(L\)), any sensed change in ground contact overrides the estimated ground contact \(\widehat{{{\text{GC}}}}_{{\text{i}}}\). When the estimated ground contact state changes, the estimated angular velocities are updated according to the same collision dynamics as the walking model (Eq. 5 except with estimated variables).

The state estimate is applied to the state-based motor command (Eq. 6). Although the walking control was designed for actual state information (\(\theta_{i} , {\text{GC}}_{i}\)), for walking simulations it uses the state estimate instead:

$$T_{i} \left( {{\text{s}},\hat{\theta }_{i} , \widehat{{{\text{GC}}}}_{{\text{i}}} } \right) = - { }\left( {k_{{{\text{st}}}} + \mu_{{{\text{st}}}} { }s{ }} \right) \cdot { }\widehat{{{\text{GC}}}}_{{\text{i}}} - \left( {k_{{{\text{sw}}}} \hat{\theta }_{i} } \right) \cdot \left( {1 - \widehat{{{\text{GC}}}}_{{\text{i}}} } \right)$$
(8)

As with the estimator gain, this also requires an assumption. In the present nonlinear system, we assume that the state estimate may replace the state without ill effect, a proven fact only for linear systems (certainty-equivalence principle,33,77). Both assumptions, regarding gain \(L\) and use of state estimate, are tested in simulation below.

Theoretical equivalence between neural oscillator and state estimator

Having fully described the walking model in terms of control systems principles, the equivalent half-center oscillator model may be determined (Fig. 4B). The identical behavior is obtained by re-interpreting the neural states in terms of the dynamic walking model states,

$$u_{1} \triangleq \dot{\hat{\theta }}_{1} , v_{2} \triangleq \hat{\theta }_{1} , u_{2} \triangleq \dot{\hat{\theta }}_{2} , v_{2} \triangleq \hat{\theta }_{2}$$
(9)

along with the neural output function defined as identity,

$$q_{i} = g\left( {u_{i} } \right) \triangleq u_{i} .$$
(10)

In addition, motor command and ground contact state are defined to match state-based variables (Eq. 6):

$$\alpha_{1} \triangleq T_{1} ,{ }\alpha_{2} \triangleq T_{2} ,{ }c_{1} \triangleq \widehat{{{\text{GC}}}}_{1} ,c_{2} \triangleq \widehat{{{\text{GC}}}}_{2} .$$
(11)

The synaptic weights and higher-order functions (Eqs. 13) are defined according to the internal model equations of motion (Eq. 7),

$$\left[ {\begin{array}{*{20}c} {a_{1} } & {w_{12} } \\ {w_{21} } & {a_{2} } \\ \end{array} } \right] = {\text{M}}^{ - 1} {\text{C}}$$
(12)
$$\left[ {\begin{array}{*{20}c} {b_{1} \hat{\theta }_{1} - f_{1} \left( {\user2{\hat{\theta }},\user2{\dot{\hat{\theta }}},{ }\widehat{{{\mathbf{GC}}}}} \right)} \\ {b_{2} \hat{\theta }_{2} - f_{2} \left( {\user2{\hat{\theta }},\user2{\dot{\hat{\theta }}},{ }\widehat{{{\mathbf{GC}}}}} \right)} \\ \end{array} } \right] = {\text{M}}^{ - 1} {\text{G }}$$
(13)
$$\left[ {\begin{array}{*{20}c} {h{^{\prime}}_{11} } & {h{^{\prime}}_{12} } \\ {h^{\prime}_{21} } & {h^{\prime}_{22} } \\ {h_{11} } & {h_{12} } \\ {h_{21} } & {h_{22} } \\ \end{array} } \right] = {\text{L }}$$
(14)
$$\left[ {\begin{array}{*{20}c} {r_{11} } & {r_{12} } \\ {r_{21} } & {r_{22} } \\ \end{array} } \right] = {\text{M}}^{ - 1}$$
(15)
$$a{^{\prime}}_{1} = a_{2}^{{\prime}} = 0{ }$$
(16)

Because the mass matrix and other variables are state dependent, the weightings above are state dependent as well. The functions \(f_{1}\) and \(f_{2}\) are higher-order terms, which could be considered optional; omitting them would effectively yield a reduced-order estimator.

The result of these definitions is that the half-center neuron equations (Eqs. 13) may be rewritten in terms of \(\hat{\theta }_{i}\) and \(\dot{\hat{\theta }}_{i}\), to illustrate how the network models the leg dynamics and receives inputs from sensory feedback and efference copy:

$$\ddot{\hat{\theta }}_{i} + a_{i} \dot{\hat{\theta }}_{i} = - b_{i} \hat{\theta }_{i} + \mathop \sum \limits_{j = 1}^{2} - w_{ij} \dot{\hat{\theta }}_{j} + \mathop \sum \limits_{j = 1}^{2} h_{ij} e_{j} + \mathop \sum \limits_{{{\text{j}} = 1}}^{2} r_{ij} \alpha \left( {s,\hat{\theta }_{j} , \widehat{{{\text{GC}}}}_{j} } \right) + f_{i} \left( {\user2{\hat{\theta }},\user2{\dot{\hat{\theta}}}, \widehat{{{\mathbf{GC}}}}} \right)$$
(17)
$$\dot{\hat{\theta }}_{i} + a_{i}^{{\prime}} \hat{\theta }_{i} = \dot{\hat{\theta }}_{i} + \mathop \sum \limits_{j = 1}^{2} h_{ij} e_{j}$$
(18)

The above may be interpreted as an internal model of the stance and swing leg as pendulums, with pendulum phasing modulated by error feedback \(e_{j}\) and efference copy of the motor command (plus small nonlinearities due to inertial coupling of the two pendulums).

The result is that the entire neural circuitry and parameters are fully specified by the control systems model. In Eqs. (1718), all of the quantities except motor command \(\alpha\) are determined by a state estimator (Eq. 7) with optimal gains (determined by single Matlab command ‘lqe’.) For example, higher-order terms \(f_{i} \left( {\user2{\hat{\theta }}, \user2{\dot{\hat{\theta }}}, \widehat{{{\mathbf{GC}}}}} \right)\) are defined by Eq. (13). The only aspect of the system not determined by optimal estimation was the motor command \(\alpha\), equal to the state-based motor command (Eq. 8). This was designed ad hoc to produce alternating stance and swing phases with high robustness to perturbations.

Parametric effect of varying sensory feedback gain L

The sensory feedback gain is selected using state estimation theory, according to the amount of process noise and sensor noise. High process noise, or uncertainty about the dynamics and environment, favors a higher feedback gain, whereas high sensor noise favors a lower feedback gain. The ratio between the noise levels determines the optimal linear quadratic estimator gain \(L_{{{\text{lqe}}}}^{*}\) (Matlab function “lqe”). A constant gain was determined based on a linear approximation for the leg dynamics, an infinite horizon for estimation, and a stationarity assumption for noise. In simulation, the state estimator was implemented with nonlinear dynamics, assuming this would yield near-optimal performance.

It is thus instructive to evaluate walking performance for a range of feedback gains. Setting \(L\) too low or too high would be expected to yield poor performance. Setting \(L\) equal to the optimal LQE gain \(L_{{{\text{lqe}}}}^{*}\) would be expected to yield approximately the least estimation error, and therefore the most precise control (e.g.78). In terms of gait, more precise control would be expected to reduce step variability and mechanical work, both of which are related to metabolic energy expenditure in humans (e.g.,56). The walking model is also prone to falling when disturbed by noise, and optimal state estimation would be expected to reduce the frequency of falling.

We performed a series of walking simulations to test the effect of varying the feedback gain. The model was tested with 20 trials of 100 steps each, subjected to pseudorandom process and sensor noise of fixed covariance (\(W\) and \(V\), respectively). In each trial, walking performance was assessed with mechanical cost of transport (mCOT, defined as positive mechanical work per body weight and distance travelled; e.g.,47), step length variability, and mean time between falls (MTBF) as a measure of walking robustness (also referred to as Mean First Passage Time79). The sensory feedback gain \(L_{{{\text{lqe}}}}^{*}\) was first designed in accordance with the experimental noise parameters, and then the corresponding walking performance was evaluated. Additional trials were performed, varying sensory feedback gain \(L\) with lower and higher than optimal values to test for a possible performance penalty. These sub-optimal gains were determined by re-designing the estimator with process noise \(\rho W\) (\(\rho\) between 10^-4 and 10^0.8, with smaller values tending toward pure feedforward and larger toward pure feedback). This procedure guarantees stable closed-loop estimator dynamics, which would not be the case if the matrix \({ }L_{{{\text{lqe}}}}^{*}\) were simply scaled higher or lower. For all trials, the redesigned \(L\) was tested in simulations using the fixed process and sensor noise levels. The overall sensory feedback gain was quantified with a scalar, defined as the L2 norm (largest singular value) of matrix \(L\), normalized by the L2 norm of \(L_{{{\text{lqe}}}}^{*}\).

We expected that optimal performance in simulation would be achieved with gain \(L\) close to the theoretically optimal LQE gain, \(L_{{{\text{lqe}}}}^{*}\). With too low a gain (\(L = 0\), feedforward Fig. 1A), the model would perform poorly due to sensitivity to process noise, and with too high a gain (\(L \to \infty\), feedback Fig. 1C), it would perform poorly due to sensor noise. And for intermediate gains, we expected performance to have an approximately convex bowl shape, centered about a minimum at or near \(L_{{{\text{lqe}}}}^{*}\). These differences were expected from noise alone, as the model was designed to yield the same nominal gait regardless of gain \(L\). Simulations were necessary to test the model, because its nonlinearities do not admit analytical calculation of performance statistics.

Evaluation of fictive locomotion

We tested whether the model would produce fictive locomotion with removal of sensory feedback. Disconnection of feedback in a closed-loop control system would normally be expected to eliminate any persistent oscillations. But estimator-based control actually contains two types of inner loops (Fig. 7A), both of which could potentially allow for sustained oscillations in the absence of sensory feedback. However, the emergence of fictive locomotion and its characteristics depend on what kind of sensory signal is removed. We considered two broad classes of sensors, referred to producing error feedback and measurement feedback, with different expectations for the effects of their removal.

Some proprioceptors relevant to locomotion, including some muscle spindles and fish lateral lines50, could be regarded as producing error feedback. They receive corollary discharge of motor commands, and appear to predict intended movements, so that the afferents are most sensitive to unexpected perturbations. The comparison between expected and actual sensory output largely occurs within the sensor itself, yielding error signal \(e\) (Fig. 7B). Disconnecting the sensor would therefore disconnect error signal \(e\), and would isolate an inner loop between state-based command and internal model. The motor command normally sustains rhythmic movement of the legs for locomotion, and would also be expected to sustain rhythmic oscillations within the internal model. Fictive locomotion in this case would be expected to resemble the nominal motor pattern.

Sensors that do not receive corollary discharge could be regarded as direct sensors, in that they relay measurement feedback related to state. In this case, disconnecting the sensor would be equivalent to removing measurement \(y\). This isolates two inner loops, both the command-and-internal-model loop above, as well as a sensory prediction loop between sensor model and internal model. The interaction of these loops would be expected to yield a more complex response, highly dependent on parameter values. Nonetheless, we would expect that removal of \(y\) would substantially weaken the sensory input to the internal model, and generally result in a weaker or slower fictive rhythm.

We tested for the existence of sustained rhythms for both extremes of error feedback and measurement feedback. Of course, actual biological sensors within animals are vastly more diverse and complex than this model. But the existence of sustained oscillations in extreme cases would also indicate whether fictive locomotion would be possible with some combination of different sensors within these extremes.