Online learning for orientation estimation during translation in an insect ring attractor network

Insect neural systems are a promising source of inspiration for new navigation algorithms, especially on low size, weight, and power platforms. There have been unprecedented recent neuroscience breakthroughs with Drosophila in behavioral and neural imaging experiments as well as the mapping of detailed connectivity of neural structures. General mechanisms for learning orientation in the central complex (CX) of Drosophila have been investigated previously; however, it is unclear how these underlying mechanisms extend to cases where there is translation through an environment (beyond only rotation), which is critical for navigation in robotic systems. Here, we develop a CX neural connectivity-constrained model that performs sensor fusion, as well as unsupervised learning of visual features for path integration; we demonstrate the viability of this circuit for use in robotic systems in simulated and physical environments. Furthermore, we propose a theoretical understanding of how distributed online unsupervised network weight modification can be leveraged for learning in a trajectory through an environment by minimizing orientation estimation error. Overall, our results may enable a new class of CX-derived low power robotic navigation algorithms and lead to testable predictions to inform future neuroscience experiments.

www.nature.com/scientificreports/ coordinate system. Theoretical models have been proposed for how Hebbian unsupervised learning can strengthen connections between co-active neurons receptive to visual features and heading direction ring attractor neurons in order to learn the mapping between landmark cues and an orientation estimate 42,71. Hebbian learning is a general type of synaptic mechanism for associative learning that is well studied in neuroscience, and recent experimental evidence in Drosophila supports a role for Hebbian learning in the mapping between neurons representing visual features and the compass neurons 66,72; however, the performance of such Hebbian learning is unknown under changes of position, where the relative landmark angles change over time.
While there has been compelling insight into the Drosophila heading direction system at a cell-type level, an outstanding question is how amenable this approach may be for robotic applications that include a trajectory through an environment. In particular, the performance of the heading direction system has not been quantified when using cell-type specific connectivity patterns. Additionally, due to constraints of experimental approaches, both empirical observations and existing computational models have been limited to single-location trajectories where there is rotation without translation. As a result, the coordinated learning of a visual feature mapping while traversing a trajectory through an environment, a necessary and challenging function for such a network, is not currently understood.
The key contributions of this work are to develop and evaluate a model for how a Drosophila connectivity-constrained network can perform both sensor fusion and online learning for estimating orientation in a trajectory through an environment, both to better understand the mechanisms employed in biology and to enable neuromorphic visually guided robotic navigation. We evaluate the model through trajectories with noisy inputs in a simulated environment, investigate performance with measurements from a robotic platform, and quantify the potential power efficiency gains of neuromorphic implementation with such a compact network model. Furthermore, we propose a theoretical understanding of how distributed online unsupervised learning, in coordination with a ring attractor network, can be leveraged for learning in a trajectory through an environment, addressing current challenges in deployment-time training of neural networks as part of a navigation algorithm.

Results
A connectivity-constrained network model with online learning is developed and evaluated with a series of targeted experiments to investigate (1) the coordination of insect cell-type connectivity with online learning, (2) the learning challenge where the relative orientation of visual cues changes over time with changing position throughout an environment, and (3) network performance when utilized on a robotic platform. Finally, we analyze the network's online learning from the perspective of objective function minimization and propose an updated online rule to enhance heading direction estimation accuracy which can be investigated further in behavioral experiments.
Connectivity-constrained network for sensor fusion and online learning. Our model transforms angular velocity and visual features into a fused representation of orientation using a total of 141 neurons distributed across five populations, as specified in Fig. 1A. The populations are constrained by connectivity patterns of neuron types observed in Drosophila across glomeruli, with plastic synaptic connections that enable the learning of visual landmarks. The five types of modeled neurons all synapse in either the protocerebral bridge (PB) or ellipsoid body (EB) regions of the CX and include: (1) ring neurons, which are receptive to visual inputs, (2) PB-EB-Noduli (P-EN) neurons, which receive angular velocity inputs, (3) EB-PB-Gall (E-PG) neurons, (4) PB-EB-Gall (P-EG) neurons, and (5) intrinsic neurons of the PB (PIntr), also referred to as Δ7 neurons. Orientation is encoded in a population of E-PG neurons or "compass" neurons, which have been observed in experimental studies to have maximal activation at a position along the anatomical circumference of the ellipsoid body that rotates in correspondence with the orientation of the fly. The modeled population of 18 E-PG neurons is divided into left and right hemispheric groupings depending on whether the neuron projects to the left or right hemisphere of the PB. We utilize an orientation representation scheme where each of the 9 E-PG neurons per hemisphere is maximally active at a distributed preferred angle, as demonstrated in Fig. 1C. The architecture of our model is constrained by the nine glomeruli anatomically observed per hemisphere in the PB, representing a discretization in the heading direction system uniquely observed in insect systems 63.
A stable bump of activity in E-PG neurons can persist given the recurrent excitation to keep the bump active along with inhibition provided by PIntr neurons to prevent runaway excitation. Local recurrent excitation for maintaining bump activation in E-PG neurons is mediated by both P-EN and P-EG neurons. Notably, the PIntr neurons do not directly inhibit E-PG neurons, but rather inhibit the P-EN and P-EG as well as themselves in a more complex pattern of inhibition than would be strictly necessary to prevent runaway excitation among the E-PG neuron population. Positive angular velocity is encoded in the P-EN neurons in the right hemisphere, which when active, shifts the activity bump in the counter-clockwise direction. Conversely, negative angular velocity is encoded by P-EN neurons in the left hemisphere which shifts the activity bump in the clockwise direction (Fig. 1B).
Sensed visual features represented in ring neuron activation patterns can update the activity bump in E-PG neurons according to their connectivity weights (Fig. 1D). The intuition behind the online learning of visual features is that the activity bump is initially driven by angular velocity signals and serves as a noisy teaching signal to update a matrix of weights, W_{r→c}, between ring neurons and "compass" E-PG neurons. Each individual element, w_{nm}, is the weight between the mth ring neuron and the nth E-PG neuron. The mechanism utilized for learning is a Hebbian learning rule, where the co-activation of neurons increases the effective connection weights between ring and E-PG neurons. In particular, a presynaptically-gated Hebbian learning rule (Eq. 1) is used to modify weights 72, where r_m is the activation of the mth ring neuron and c_n is the activation trace of the nth E-PG neuron. The rule is parameterized with a learning rate, η, as well as additional constants a and b. All utilized w_{nm} weights are assumed to be negative, as in previous computational studies 68, given the inhibitory characteristics of ring neurons 66,73, so the maximum effective weight is zero, which can activate E-PG neurons through disinhibition. When the network model is deployed in a simulated or physical environment, all weights other than W_{r→c} are held fixed. The model's orientation estimate is calculated as the center of the activity bump, utilizing the preferred direction of each E-PG neuron to create a linear decoder applied to filtered spiking events. Additional details for model implementation can be found in the methods section.
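The decoding step described above can be illustrated with a minimal sketch. It assumes evenly spaced preferred directions that are shared between the two hemispheres and uses a population-vector readout; the function names, the toy firing rates, and the exact angle layout are illustrative assumptions and are not taken from the model implementation.

```python
import math

def preferred_angles(n_per_hemisphere=9):
    """Evenly spaced preferred directions in radians (assumed layout)."""
    return [2.0 * math.pi * k / n_per_hemisphere for k in range(n_per_hemisphere)]

def decode_orientation(rates, angles):
    """Population-vector readout: angle of the rate-weighted sum of unit vectors."""
    x = sum(r * math.cos(a) for r, a in zip(rates, angles))
    y = sum(r * math.sin(a) for r, a in zip(rates, angles))
    return math.atan2(y, x) % (2.0 * math.pi)

# 18 E-PG neurons: both hemispheres reuse the same nine preferred angles (assumption).
angles = preferred_angles() * 2
# Toy filtered rates with a bump centred on index 2 of each hemisphere.
rates = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0] * 2
estimate = decode_orientation(rates, angles)
```

Because the readout is linear in the cosine and sine components of the activity, a symmetric bump decodes to the preferred angle at its center, here the angle assigned to index 2.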
Model performance with rotation. An initial evaluation of the network is performed in the case of a simple rotation in an environment without translation (Fig. 2). Ring neurons encode visual input, with a separate set of ring neurons responsive to each landmark and with the receptive fields of individual ring neurons tiled across a 270° field of view. As the simulated agent and field of view rotate, the index of the most active ring neuron in each sub-population shifts (Fig. 2B). A corresponding shift in a bump of activity is observed in E-PG, P-EN, and P-EG neurons (Fig. 2C-E), which appears as two bumps given the separate indexing of left and right hemisphere neurons. Angular velocity input is provided to the model by injecting current in the P-EN neurons, where a counter-clockwise rotation in the simulation corresponds to an observed increase in activation of right P-EN neurons (Fig. 2E). Throughout the trial, there is consistent activation of PIntr neurons (Fig. 2F), which provide inhibition to the network and prevent the model from going into a state of runaway excitation. Three model configurations are compared in order to evaluate sensor fusion and online learning: (1) initialization of W_{r→c} weights to an optimal set of offline calculated weights, (2) online learning of modeled weights according to Eq. (1) after random initialization, and (3) setting all W_{r→c} weights to zero such that angular velocity is the only input into the model. Due to the stochasticity of the neuron model, a total of 50 simulation seeds are used to evaluate performance. An example of the decoded orientation from all three model configurations is shown in Fig. 2G, where all of the model configurations are able to track the orientation over time (with the offline calculated optimal weights having the best performance). The optimized weights (Fig. 2H) are calculated with regularization, which effectively selects a subset of model ring neurons to utilize for angle estimation. The weights that are learned over the course of the trial (Fig. 2I) have a similar banded structure to the optimized weights and increase in correlation over the course of the trial to an average value of 0.36 (Fig. 2J).
Factors limiting higher correlation include (1) the effective "teaching" signal in the online training, which comes from integrating inherently noisy angular velocity cues, and (2) the inherent differences between the training approaches. Estimated orientation error accumulates over the course of the trial in the angular velocity only case (Fig. 2L), and this accumulation is reduced with online learning. Overall, the average orientation root mean squared error (RMSE) is significantly reduced by 28% (1.03 to 0.74 radians, P = 0.0023, Mann-Whitney U-test) with online learning versus angular velocity alone, compared to an 89% reduction with the set of optimized weights. Sources of error in the weight optimization process include: (1) W_{r→c} is optimized with regard to the feedforward visual input alone and target E-PG activity which does not explicitly compensate for the recurrent
Model performance with translation. The network is next evaluated in a more challenging trajectory that involves translation through the simulated environment, where there is no longer an invariant transformation between egocentric visual landmark features referenced to a sensor frame and an orientation referenced to an allocentric world frame coordinate system (Fig. 3). The bump of E-PG activation follows the time-varying orientation in the trajectory and can be decoded into an accurate orientation estimate over time (Fig. 3A-B). In the trajectory through the environment, there are two distant landmarks, which individually have a relatively invariant transformation between egocentric and allocentric representations, along with a proximal landmark, which has a time-varying transformation between egocentric and allocentric representations (Fig. 3C). The set of offline optimized weights for the network (Fig. 3D) preferentially selects visual features corresponding to the distant landmarks.
The evaluated network configuration with online plasticity learns weights with increasing correlation over time to the optimal weights (Fig. 3E-G). General trends for orientation estimation accuracy across network configurations are similar to those in the rotation-only trajectories (Fig. 3H, J), where the orientation RMSE for online learning lies between the angular velocity only case and the optimal set of weights. The RMSE of the angular velocity only network configuration in this trajectory with translation is similar to the simple rotation case (1.00 vs. 1.03 radians), but is compensated further by online learning (38% vs. 28% reduction in error).
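The error figures quoted for the rotation-only and robotic trials can be checked with a short sketch of the two quantities involved: an orientation RMSE and a percent reduction relative to the angular velocity only baseline. Wrapping angular differences into [-π, π) before squaring is an assumption made here so that estimates near 0 and 2π are not penalized as a full turn; the function names are illustrative.

```python
import math

def wrap_angle(delta):
    """Wrap an angle difference into the interval [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi

def orientation_rmse(estimated, truth):
    """Root mean squared error between two angle sequences, in radians."""
    errors = [wrap_angle(e - t) for e, t in zip(estimated, truth)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))

def percent_reduction(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# An estimate of 0.1 rad against a truth just below 2*pi is only 0.2 rad off.
rmse_wrap = orientation_rmse([0.1], [2.0 * math.pi - 0.1])

# RMSE pairs reported in the text (angular velocity alone vs. online learning).
reduction_rotation = percent_reduction(1.03, 0.74)  # rotation-only simulation
reduction_robot = percent_reduction(0.93, 0.80)     # robotic platform
```

Rounded to whole percentages, the two reductions reproduce the 28% and 14% values stated for the rotation-only simulation and the robotic platform.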
Position estimation with path integration is more challenging than orientation estimation because position error accumulates as a consequence of orientation estimation error. Similar trends are observed with average position estimation error (as a percentage of path length), where online learning has a position estimation error between the angular velocity alone and the optimized weights network configurations (6.7% vs. 8.6% and 4.8%, Fig. 3I). The decrease in average position error with online learning vs. angular velocity alone is 22% (P = 0.0021, Mann-Whitney U-test). Sources of error in position estimation for path integration include the aforementioned error accumulation from orientation estimation, which could be further reduced with neuro-inspired 74 or traditional approaches 75.
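How a heading error propagates into position error can be illustrated with a small dead-reckoning sketch. It assumes a known linear speed, simple Euler integration, and a mean Euclidean error normalized by the true path length; the exact error definition used for the reported percentages is not given in this section, so that normalization and all names below are assumptions.

```python
import math

def integrate_path(headings, speeds, dt):
    """Dead-reckon 2D positions from heading (rad) and linear speed samples."""
    x, y = 0.0, 0.0
    path = [(x, y)]
    for theta, v in zip(headings, speeds):
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        path.append((x, y))
    return path

def mean_position_error_percent(estimated_path, true_path):
    """Mean Euclidean position error as a percentage of the true path length."""
    length = sum(math.dist(a, b) for a, b in zip(true_path, true_path[1:]))
    mean_err = sum(math.dist(e, t) for e, t in zip(estimated_path, true_path))
    mean_err /= len(true_path)
    return 100.0 * mean_err / length

speeds = [1.0] * 10                # known linear speed, as assumed in the text
true_headings = [0.0] * 10         # straight-line motion
biased_headings = [0.1] * 10       # hypothetical constant 0.1 rad heading error

true_path = integrate_path(true_headings, speeds, dt=0.1)
estimated_path = integrate_path(biased_headings, speeds, dt=0.1)
error_pct = mean_position_error_percent(estimated_path, true_path)
```

In this toy case a constant heading bias produces a position error that grows linearly along the path, which is the accumulation effect the paragraph above describes.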
Overall, network simulation results with translation demonstrate that an accurate allocentric estimation of orientation referenced to a world frame can be estimated with egocentric visual features despite the challenges of transforming information over time from proximal landmarks. The model configuration with online learning is correlated with the optimized set of weights and improves orientation and position estimation by an average of 38% and 22% respectively versus a configuration with angular integration alone. Simulation results suggest that improvements to the online learning rule (Eq. 1) may be beneficial to effectively select visual features from distal landmarks to further increase accuracy.
Model performance on a robotic platform. The model for orientation estimation and position estimation is extended from simulated environments to measurements from a wheeled robotic platform (Fig. 4) in an arena with colored landmarks (Fig. 4B). Visual inputs are measured from a camera with two separated sensors offset at 180°, where the relative angle offset of landmarks is calculated from blob detection on color-masked images (Fig. 4C). The relative angle offsets detected for the centers of the green, yellow, and red landmarks are used to drive the activation of ring neurons (Fig. 4D) according to their receptive fields, which are mapped equivalently to the three populations of ring neurons mapped to simulated landmarks across a 270° field of view. Given the less than 360° field of view of the camera sensors, some ring neurons are selective to angles outside the camera's field of view and will never activate. While several alternative neural visual encoding schemes are possible and would likely improve performance, mirroring the simplified visual encoding from simulations enables straightforward comparison between simulated and physically measured model performance. An added source of noise in the visual feature measurements is the intermittent dropping of recorded image frames due to maximum disk-writing speeds. This is apparent as intermittent periods without ring neuron activity and further tests the network's sensor fusion performance. The angular velocity measurements that drive the activation of P-EN neurons (Fig. 4E) are derived from an on-board IMU sensor. The center of the bump of activity in the P-EG and E-PG neurons shifts over time (Fig. 4F-G), which leads to a decoded orientation that follows the ground truth orientation (Fig. 4I) and a position estimate utilizing ground truth linear velocity for path integration (Fig. 5).
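The mapping from a detected landmark angle to ring neuron drive can be sketched as a bank of tuning curves tiled across the 270° field of view. The Gaussian tuning shape, the number of neurons per landmark, and the tuning width below are hypothetical values chosen for illustration; only the 270° tiling and the absence of activity during dropped frames come from the text.

```python
import math

def ring_activations(landmark_angle_deg, n_neurons=16, fov_deg=270.0, width_deg=20.0):
    """Activation of one landmark's ring neuron population.

    Receptive field centres are tiled evenly across the field of view.
    `landmark_angle_deg` is None when no landmark was detected, for example
    during a dropped frame, in which case the population stays silent.
    """
    if landmark_angle_deg is None:
        return [0.0] * n_neurons
    spacing = fov_deg / n_neurons
    centres = [-fov_deg / 2.0 + spacing * (i + 0.5) for i in range(n_neurons)]
    return [
        math.exp(-0.5 * ((landmark_angle_deg - c) / width_deg) ** 2)
        for c in centres
    ]

def peak_index(activations):
    """Index of the most active ring neuron."""
    return activations.index(max(activations))

left_peak = peak_index(ring_activations(-100.0))   # landmark to one side
right_peak = peak_index(ring_activations(100.0))   # landmark to the other side
dropped = ring_activations(None)                   # frame with no detection
```

As the relative landmark angle sweeps across the field of view, the index of the most active neuron shifts accordingly, mirroring the behavior described for the simulated ring neurons.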
Model performance is compared across the plasticity model, the angular velocity only model, and the optimized weights (Fig. 6). An example of the comparison between the optimized weights and the online learning of weights is shown in Fig. 6A-B, which again share a banded structure. The correlation of the optimized and online learned weights monotonically increases over time to a maximum average value of 0.29 (Fig. 6C-D). The orientation error accumulates over time fastest in the angular velocity only case, with a slight decrease in error accumulation with online learning (Fig. 6G). Overall, the orientation RMSE is 14% less with online learning vs. angular velocity alone (average 0.80 vs. 0.93 radians, P = 0.014, Mann-Whitney U-test); however, the optimized weights have the lowest error. The position error similarly accumulates over time in all comparisons, with a notable increase in position error at 1.5 s in the weight optimized case due to a prolonged period of dropout of visual features (Fig. 6H). A minor decrease in the average position error with online learning versus the angular velocity only case is observed.
To analyze the potential power efficiency impact of neuromorphic implementation, we estimate the amount of power required for the algorithmic computation of our model based on reported estimates from SPICE simulations of Intel's Loihi neuromorphic chip 25. Utilizing figures for energy per synaptic spike operation, synaptic update, neuron update, and within-tile spike energy, we approximate the power utilization necessary for the algorithmic computation of our model on neuromorphic hardware as 18.58 μW (11.42 μW for neuron update, 4.38 μW for communication, and 2.77 μW for plasticity).
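As a quick arithmetic check on this budget, the three reported components can be summed and expressed as shares of the total. The variable names are illustrative; the component values are those quoted above.

```python
# Reported power components for the model on Loihi, in microwatts.
components_uw = {
    "neuron_update": 11.42,
    "communication": 4.38,
    "plasticity": 2.77,
}

total_uw = sum(components_uw.values())

# Fraction of the total attributable to each component, in percent.
shares_pct = {name: 100.0 * value / total_uw for name, value in components_uw.items()}
```

The rounded components sum to 18.57 μW, within 0.01 μW of the reported 18.58 μW total, a gap consistent with rounding of the individual terms. Neuron updates account for roughly three fifths of the estimate.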

Online learning analysis. In all experiments, weights derived from online learning are correlated with an optimal solution and improve orientation accuracy after a single trajectory. A challenge for robotic translation is how to enhance estimation accuracy to further minimize error accumulation.
One approach is to identify the objective function that the utilized learning rule is minimizing. Upon further inspection (see "Methods" Eqs. 5-6), it follows that Eq. 1 minimizes an overall objective function (Eq. 2). While the learning rule (Eq. 1) effectively maximizes the overlap between the input current from ring neurons and the E-PG neuron activation, it does not directly optimize an objective function that minimizes the squared error of the orientation estimate. An example of this is how the online learning model (Eq. 1) did not preferentially select distal over proximal landmark features. The success of a scaled version of the offline optimized weights solution in preferentially selecting distal landmark visual features in simulation (and in increasing accuracy overall in simulations and on the robotic platform) lends support to using a set of directly optimized weights that minimize squared orientation error in future approaches. In order to directly minimize the scaled squared orientation estimation difference, we can define an objective function for each E-PG neuron with a scaling factor, β (Eq. 3). Upon further inspection, it follows that in order to minimize the objective function, ϕ (see "Methods" Eqs. 7-8), a weight update rule is given by Eq. 4. This rule has a commonality with the previously utilized rule (Eq. 1), in that weight modifications are presynaptically gated; however, it utilizes an additional term, I_{r→n}. We propose this learning rule (Eq. 4) for utilization in future online robotic applications to enable an improvement in accuracy in learning an environmental mapping. Each of the terms utilized in Eq. (4) is biologically plausible, which could motivate future experimental evaluation. Furthermore, each term is a local variable specific to pairs of synaptically coupled neurons, which would be amenable to implementation in distributed learning on neuromorphic hardware.
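The general idea of an error-driven, presynaptically gated update can be illustrated with a generic delta rule. This sketch is not the paper's Eq. 4, which is not reproduced in this text. It assumes a per-neuron objective ϕ_n = ½(β c_n − I_{r→n})² with I_{r→n} = Σ_m w_{nm} r_m, for which gradient descent gives Δw_{nm} = η r_m (β c_n − I_{r→n}). The sign convention, the omission of the inhibitory weight constraint, and all numerical values are assumptions made for illustration.

```python
def input_current(weights_n, ring):
    """Total ring-neuron input to one E-PG neuron (assumed linear sum)."""
    return sum(w * r for w, r in zip(weights_n, ring))

def objective(weights_n, ring, c_n, beta):
    """Assumed per-neuron objective: half the squared scaled difference."""
    return 0.5 * (beta * c_n - input_current(weights_n, ring)) ** 2

def delta_rule_step(weights_n, ring, c_n, beta, eta):
    """One gradient step; each update is gated by presynaptic activity r_m."""
    error = beta * c_n - input_current(weights_n, ring)
    return [w + eta * r * error for w, r in zip(weights_n, ring)]

ring = [1.0, 0.5, 0.0]       # hypothetical ring neuron activations
weights = [0.0, 0.0, 0.0]    # weights onto a single E-PG neuron
before = objective(weights, ring, c_n=1.0, beta=0.8)
for _ in range(50):
    weights = delta_rule_step(weights, ring, c_n=1.0, beta=0.8, eta=0.1)
after = objective(weights, ring, c_n=1.0, beta=0.8)
```

Every quantity in the update is available at the synapse or at the postsynaptic neuron, which is the locality property highlighted above. The weight from the silent third ring neuron is left unchanged, showing the presynaptic gating.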

Discussion
Our motivation is to investigate the potential to leverage details from Drosophila neurobiology and neuroanatomy for sensor fusion and online learning for orientation estimation as a basis for future low SWaP neuromorphic robotic navigation approaches. Central to our analysis is understanding how online learning in the underlying insect neural circuit can incorporate visual features during changing positions in complex environments, a necessary functionality for both robotic and biological systems. We develop a compact model for Drosophila sensor fusion and online learning in a cell-type connectivity-constrained model for orientation estimation and environmental learning that integrates angular velocity measurements and visual features. Through a series of experiments in simulated environments, we demonstrate that improvement in orientation and position accuracy estimation is possible with online learning of visual features (versus angular velocity alone) over a single trajectory and that online learned weights are correlated with a set of offline calculated optimal weights. The network model is adapted for use with sensors from a robotic platform and is demonstrated to have increased accuracy with online learning over a single trajectory. Finally, a theoretical understanding and weight update rule for distributed online learning with local variables is proposed that can be utilized to minimize estimation error.
There are many neuroscience-inspired robotic approaches for navigation that are largely inspired by the cell types observed in mammalian systems, e.g. 33, with a subset of these utilizing a ring attractor to represent heading direction, e.g. 53. In comparison to robotic approaches that utilize well-defined external cues 36,52,57, or projections to mammalian-inspired place cells 38, to reduce accumulation of error with angular velocity integration in a heading direction network, we demonstrate learning of a mapping from neurons encoding visual features over receptive fields to neurons encoding heading direction, a mapping which has been experimentally observed. We demonstrate that online learning can decrease heading direction estimation error by 14-38% in a single trajectory through the environment by emulating learning that has been observed in Drosophila at ring neuron to E-PG neuron synapses 66,72. A salient difference between insect and mammalian heading direction systems is the far smaller number of neurons that insects have in their heading direction system. Additionally, insect models benefit from the greater level of cell-type connectivity information available, as compared to mammalian systems. Models of ring attractor networks developed for mammalian heading direction estimation typically employ (1) a mechanism for self-sustained activation, (2) direct, non-recurrent inhibition, and (3) representations of heading direction based on large numbers of neurons that approach continuous approximations 42-48.
In insect systems, however, network architectures identified in the heading direction system are more compact but more complex in network details, which include (1) additional paths of recurrent excitation facilitated by an additional class of neurons (P-EG), (2) a distributed population of inhibitory neurons, which do not directly inhibit the compass neurons, and (3) a discretization of the compass neurons into nine glomeruli per hemisphere 40 . Computational studies have been performed outlining how this compact set of Drosophila cell types can have properties as a ring attractor 68,70 , but we demonstrate here how the network can be operationalized for sensor fusion and orientation estimation for robotic navigation. Considering that features of this Drosophila network are preserved across insects 70,76 , which systemically differ from conventional rodent-based ring attractor networks, the navigation implementation proof of principle established here can inform future studies into enhanced mechanistic understanding and performance of this system.
We investigate how a heading direction system can learn sensory cues to maintain accuracy with a changing position across an environment from the perspective of objective function minimization. Previous computational modeling, focused on a fixed location 42,71,72, investigated Hebbian plasticity rule formulations that increase the effective connectivity weights between co-active visual input feature neurons and compass neurons, but without minimizing an objective function for orientation estimation error. At a single location, our results show that a Hebbian rule is sufficient to learn mappings between landmark features; however, when the spatial location is allowed to vary, a Hebbian rule formulation struggles to ignore uninformative landmark features. Indeed, at multiple spatial locations or when there are uninformative landmark features, estimating heading direction from a set of visual features can be challenging 58 without a set of network weights that can perfectly perform this mapping. Nevertheless, a set of network weights can be identified which minimize an objective function for heading direction estimation from visual features. Through analysis of optimized visual feature weights, we find that landmarks at a distance help anchor the heading system better than nearby landmarks, which is intuitive because a distant landmark provides a more consistent signal at different positions in an environment. A learning rule (Eq. 4) is proposed to directly minimize orientation estimation error in the network. This rule implicitly ignores visual features corresponding to less informative or less reliable landmark features, such as non-unique features or features corresponding to local clutter, and can form the basis for performant distributed online learning in future robotic investigations.
One of the motivating factors for neural-inspired algorithm development for robotic applications is the potential for power savings and resource efficiency for neuromorphic implementations. Given recent innovations in neuromorphic hardware development with Intel's Loihi platform 25, which is able to implement networks with online learning with considerable efficiency, a central challenge in the development of low power neuromorphic robotic applications is to identify performant algorithms utilizing online learning with a compact neural circuit. Conventional algorithms which perform online learning and loop closure with SLAM are power-intensive, with special-purpose FPGA implementations still requiring approximately 2 or more watts 77,78. By contrast, conventional approaches without online learning with loop closure, such as VIO, have reduced power consumption of as little as 2 mW on special-purpose hardware accelerators 79. Neuromorphic approaches for robotic navigation show potential for reduced power utilization, such as the 9 mW in dynamic power consumption demonstrated for a mammalian-inspired network to perform uni-dimensional SLAM with 15,162 compartments 80. In contrast, our approach utilizes an insect connectivity-constrained network with 141 compartments (0.1% of the maximum allowable units on a Loihi chip), and we estimate computation power utilization of 18.6 μW, roughly five orders of magnitude less power than SLAM implementations and two orders of magnitude less than VIO implementations. While these power estimates are based on published measurements from Loihi SPICE simulations and do not consider system elements such as static power requirements, sensor communication, and low-level visual processing, they nevertheless demonstrate the potential for drastic power savings utilizing insect-inspired neuromorphic approaches for navigation.
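The order-of-magnitude comparisons above follow directly from the quoted power figures, as the short calculation below shows. The Loihi capacity of 128 cores with 1,024 compartments each is an assumption based on the publicly described chip architecture and is not stated in this text; variable names are illustrative.

```python
import math

model_power_w = 18.6e-6    # estimated computation power of the model
slam_fpga_w = 2.0          # FPGA SLAM implementations, approximate lower bound
vio_accel_w = 2.0e-3       # VIO on a special-purpose accelerator
loihi_slam_w = 9.0e-3      # mammalian-inspired 1D SLAM on Loihi (dynamic power)

# Orders of magnitude separating each reference system from the model.
orders_vs_slam = math.log10(slam_fpga_w / model_power_w)
orders_vs_vio = math.log10(vio_accel_w / model_power_w)
orders_vs_loihi_slam = math.log10(loihi_slam_w / model_power_w)

# Fraction of a Loihi chip occupied by the 141-neuron model (assumed capacity).
assumed_loihi_compartments = 128 * 1024
compartment_share_pct = 100.0 * 141 / assumed_loihi_compartments
```

Under these figures the model sits about five orders of magnitude below the FPGA SLAM estimate and about two below the VIO accelerator, and it occupies roughly 0.1% of the assumed chip capacity, in line with the statements in the text.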
Our modeling results can inform future experiments to better understand the computational mechanisms of heading direction estimation and visually guided navigation in Drosophila and the CX more generally. While a hallmark of a typical mammalian head-direction cell is that the preferred direction of the cell is the same regardless of the animal's location, it is unknown whether a location-invariant heading direction representation is computed in Drosophila. In order to perform path integration, a navigation strategy widely observed across insect species 81 and implicated for Drosophila 82, a location-invariant representation of heading direction relative to a world frame is computationally more robust 81. Our modeling results demonstrate that online learning with a local learning rule can be used to improve heading direction estimation and path integration error relative to a world frame, even in the case of position changes through the environment where relative landmark angles change. While the CX-dependent ability to utilize visual features whose relative orientations change during movement for 2D navigation in an arena has been demonstrated experimentally in Drosophila 83, the extent to which those visual features are converted to a location-invariant heading direction representation has yet to be tested experimentally. We hypothesize that a local learning rule that forms a location-invariant heading estimate from visual cues will be a function of input current from multiple neurons, and not only a function of pre- and post-synaptic activity. Additionally, assuming a plasticity rule that drives learning through orientation estimation error rather than co-occurrence of landmark features and heading direction, we predict that new landmarks added after an accurate mapping has been learned should be effectively ignored.
There are several assumptions and simplifications utilized in the presented results. One simplification is that path integration is performed with a known linear velocity. Insect-inspired approaches for path integration in the CX that operate downstream from orientation estimation 74 are not included in our model. Furthermore, recently released synapse-level neural connectivity data 59,61 is not incorporated into our model. Another simplification utilized in the network model is the processing of visual features by ring neurons with landmark specific tuning curves. We expect our findings to generalize across more visual feature encoding schemes such as more detailed models of the insect optic lobes, deep networks, or incorporation of processed dynamic vision sensor data which could be investigated in future work. Such encoding schemes could capture visual features that match the expected statistics of landmarks across more realistic environments and provide a richer substrate for learning an orientation mapping.
We present a critical proof of principle for translating insect-inspired approaches to robotic navigation, enabling a future class of low SWaP algorithms that perform online learning and heading representation constrained in detail by biology. Given that all neurons are modeled as dynamic integrate-and-fire neurons, the model is amenable to incorporating event-driven low latency sensors such as dynamic vision sensors to enable updating estimates with visual features detected during high velocity movement. One of the key model features is the ability to utilize previously encountered visual features to update an estimate of orientation (loop closure for orientation estimation), which is not possible in low power navigation approaches utilizing VIO. While this falls short of the full-pose loop closure available in complete SLAM systems, it is still a promising functionality because the model can be implemented as a parallelized distributed network on neuromorphic hardware. Additionally, compared to deep neural network approaches utilized in visual navigation, where networks are trained offline due to computational resource and large data size requirements, the weights in the network model are learned online and are used over a single trajectory to increase estimation accuracy.
In conclusion, we present a critical proof-of-concept for a low SWaP robotics navigation algorithm utilizing orientation estimation in a ring attractor network constrained using circuit details from Drosophila with online distributed learning amenable for neuromorphic implementation. By focusing on the objective function minimization necessary for a robotics implementation with a changing position, we establish a formalism for common computational goals underlying both biological and artificial systems and identify testable predictions and areas of focus for future neuroscience experiments.

Methods
All experiments are performed in a simulated or physical environment utilizing the same connectivity-constrained network model for performing orientation estimation.

Network model input.
A total of 81 ring neurons are simulated which are selectively tuned to the position of landmarks in the visual field in order to abstract the initial visual processing in the optic lobes. Specifically, sets of 27 ring neurons are selective to each of three landmarks with a Gaussian tuning curve with a standard deviation of 6.44° and a maximum response offset by 10° across a 270° field of view (Fig. 7A). The standard deviation of the Gaussian is selected such that adjacent neurons have overlapping tuning curves starting at half maximum values. The current provided to each simulated E-PG neuron in each time step is determined by the visual ring neuron activation multiplied by W r→c . The 16 P-EN neurons are split into two hemispheres (right and left), such that the 8 right P-EN neurons encode positive angular velocities, and the 8 left neurons encode negative velocities. The current provided to each neuron as input is calculated such that, with no other input, each P-EN neuron has a steady state maximum firing rate of 250 Hz at 10 radians per second (Fig. 7B). In simulations, zero mean Gaussian noise with a standard deviation of 0.1 radians per second is added to the angular velocity.
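The landmark encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the preferred angles are assumed to be centred on the field of view (−130° to +130° in 10° steps), and the response is normalised to a peak of 1.0 before any scaling to input current.

```python
import numpy as np

N_PER_LANDMARK = 27   # ring neurons per landmark (from the text)
SIGMA_DEG = 6.44      # tuning-curve standard deviation (from the text)
SPACING_DEG = 10.0    # offset between adjacent preferred angles (from the text)

# Assumption: preferred angles are centred on the 270-degree field of view,
# giving -130, -120, ..., +130 degrees.
PREFERRED_DEG = (np.arange(N_PER_LANDMARK) - (N_PER_LANDMARK - 1) / 2) * SPACING_DEG


def ring_activation(landmark_angle_deg):
    """Normalised Gaussian response (peak 1.0) of the 27 ring neurons for one landmark."""
    delta = landmark_angle_deg - PREFERRED_DEG
    return np.exp(-0.5 * (delta / SIGMA_DEG) ** 2)


def all_ring_activations(landmark_angles_deg):
    """Concatenate the responses to three landmarks into one 81-element vector."""
    return np.concatenate([ring_activation(a) for a in landmark_angles_deg])
```

Calling `all_ring_activations([0.0, 40.0, -90.0])` returns the 81 activations for three landmarks at those relative angles. The sketch does not clip landmarks that fall outside the field of view; a full model would need to decide how to handle that case.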
Network model and configurations. Each ring, E-PG, P-EG, P-EN, and Pintr neuron is modeled as a leaky integrate-and-fire neuron utilizing the Nengo software package 84 with a timestep of 1 ms. The external inputs driving the network activity are input currents from visual encoding to E-PG neurons and angular velocity encoding to P-EN neurons as described above. The connectivity pattern between E-PG, P-EG, P-EN, and Pintr neurons is based on previously reported biologically-constrained connectivity patterns 68 . The network weights between neural subpopulations are 20 for all excitatory connections (E-PG → P-EN, E-PG → P-EG, E-PG → Pintr, P-EN → E-PG, P-EG → E-PG), −15 for all inhibitory connections to excitatory neurons (Pintr → P-EG, Pintr → P-EN), and −20 for all Pintr → Pintr connections. Stochasticity is introduced to the network with zero mean Gaussian noise with a standard deviation of 0.1 added to P-EG and Pintr neurons.
Three model configurations are used whose only difference is the weight of the ring neuron to E-PG connections. For the angular velocity only case, there is no visual input (the effective W r→c is 0). For the offline optimal comparison, a supervised set of scaled W r→c weights is solved for using lasso-regularized linear regression with a positivity constraint to minimize, for each compass neuron, the objective function $\sum_t \left( c_n(t) - \left( \sum_m \left( -r_m(t)\,\hat{w}_{n,m} \right) + \alpha_n \right) \right)^2 + \sum_m \hat{w}_{n,m}$, where $c_n(t)$ is a target set of compass neuron activation at each simulation timestep generated with the preferred angle of each compass neuron. In order to enforce negative weights for $w_{n,m}$, the weights used in simulation are a scaled version of the regression weights, $w_{n,m} = -\beta \hat{w}_{n,m}$ with β = 0.0025, chosen to optimize estimation accuracy. For the online learning rule comparison, the weights between ring neurons and E-PG neurons are updated according to Eq. 1 with parameter values of 1.7e−6, 1.7e−4, and 0.29 for a, b, and η, respectively.
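As a rough illustration of the online configuration, the sketch below applies the discrete update reported in the Methods (Eq. 6), Δw_nm = η r_m (a c_n + b − w_nm), using the parameter values quoted above. The function name, array shapes, and activity units are assumptions made for illustration, and the sign convention with which w enters the E-PG input current is not modelled here.

```python
import numpy as np

A, B, ETA = 1.7e-6, 1.7e-4, 0.29   # a, b, eta as reported in the text


def update_weights(w, r, c, a=A, b=B, eta=ETA):
    """One step of delta_w[n, m] = eta * r[m] * (a * c[n] + b - w[n, m]).

    w: (n_compass, n_ring) weights (hypothetical layout)
    r: (n_ring,) ring neuron activity
    c: (n_compass,) compass (E-PG) neuron activity
    """
    target = a * c[:, None] + b               # value each active synapse relaxes toward
    return w + eta * r[None, :] * (target - w)
```

Under this form, a synapse changes only while its ring neuron is active (r_m > 0), and with sustained input it relaxes toward a c_n + b. The rule is local in the sense that each update uses only r_m, c_n, and the current weight.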
Learning rule objective functions. In order to perform gradient descent on the objective function φ as defined in (Eq. 2) over time by modifying the synaptic weights, $w_{nm}$, it follows from the chain rule that
$$\frac{\partial w_{nm}}{\partial t} = -\frac{\partial \varphi}{\partial w_{nm}} = a c_n \frac{\partial I_{r \to n}}{\partial w_{nm}} - r_m (b - w_{nm}) = r_m (a c_n + b - w_{nm}) \tag{5}$$
if we ignore the implicit dependence of $c_n$ on $w_{nm}$, assuming that the bump here is determined mostly by the recurrent weights. The discrete form of the gradient descent is
$$\Delta w_{nm} = \eta \frac{\partial w_{nm}}{\partial t} = \eta r_m (a c_n + b - w_{nm}). \tag{6}$$
Similarly, to minimize the orientation error objective function, ϕ, as defined in (Eq. 3), it follows from the chain rule that with a discrete form of with a scalar learning rate parameter, η = 2βη.
Hardware translation. The robotic platform utilized is a modified Robotis Turtlebot3 "Burger," equipped with an Nvidia Jetson TX2 and a Ricoh Theta S 360° camera. The Turtlebot has an OpenCR embedded motor controller and sensor suite, as well as two Dynamixel XL430-W250 servos for wheel control.
The relative angles of colored landmarks in the arena are detected from 360° camera images utilizing color masks and blob detection prior to activation of visual ring neurons as described above, with Gaussian receptive fields with respect to the relative angle of each landmark. All sensor data was recorded in ROS and used as input to the network model. We accelerated the input stream by a factor of ten to facilitate an evaluation of time courses corresponding to those used in simulated environments. The robot was run in an arena with an Optitrack system. IR-reflective markers were attached to the robot such that the position and orientation of the robot were tracked by the system. Red, green, blue, and yellow landmarks were made out of posterboard and placed in the environment with their own IR markers. Data from the Optitrack system was used for comparison.
Power estimation. The estimated power consumption for algorithmic computation utilizes published energy per operation values 25 and is calculated separately for the neuron update ($e_{\text{neuron}} = n_{\text{neuron}} \times 81\,\text{pJ}$), spike communication ($e_{\text{comm}} = \frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{n_{\text{neuron}}} \sum_{i=1}^{n_{\text{neuron}}} y_i(t)\, w^{+}_{i,j} \times 1.7\,\text{pJ}$), and plasticity ($e_{\text{plasticity}} = n_{\text{synRC}} \times 120\,\text{pJ} / \Delta p$), where $n_{\text{neuron}}$ is the total number of neurons in the network, $T$ is the total number of timesteps, $y_i(t)$ is the binary spike output for each neuron in each time step $t$, $w^{+}_{i,j}$ is a binary variable equal to 1 if there is a non-zero weight from neuron $i$ to neuron $j$, $n_{\text{synRC}}$ is the total number of synapses between ring and E-PG neurons, and $\Delta p$ is the number of time steps between synaptic updates, assumed to be 63. The final power estimates assume 1000 time steps per second.
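The three energy terms above can be evaluated directly from a recorded spike raster and a weight matrix. The following is a minimal sketch under stated assumptions: the function name and array layouts are hypothetical, and reporting the sum of the three terms as a single total is an assumption, since the text only states that they are calculated separately.

```python
import numpy as np

E_NEURON_PJ = 81.0        # energy per neuron update (from the text)
E_SPIKE_PJ = 1.7          # energy per spike delivered over one non-zero connection
E_PLASTICITY_PJ = 120.0   # energy per synapse per plasticity update
STEPS_PER_SECOND = 1000   # time steps per second (from the text)


def estimate_power_nw(spikes, weights, n_syn_rc, update_interval=63):
    """Return per-term and total power in nanowatts.

    spikes:  (T, n_neuron) binary spike raster y_i(t)
    weights: (n_neuron, n_neuron) matrix, weights[i, j] is the weight from i to j
    """
    spikes = np.asarray(spikes, dtype=float)
    n_neuron = spikes.shape[1]
    fan_out = (np.asarray(weights) != 0).sum(axis=1)      # sum over j of w+_{i,j}
    per_step_pj = {
        "neuron": n_neuron * E_NEURON_PJ,
        "comm": (spikes @ fan_out).mean() * E_SPIKE_PJ,   # time average over T steps
        "plasticity": n_syn_rc * E_PLASTICITY_PJ / update_interval,
    }
    # pJ per step * steps per second = pJ/s; 1 pJ/s = 1e-3 nW.
    power = {k: v * STEPS_PER_SECOND * 1e-3 for k, v in per_step_pj.items()}
    power["total"] = sum(power.values())                  # assumption: terms are additive
    return power
```

With 1000 steps per second, each picojoule spent per time step corresponds to one nanowatt, so the per-step energies and the reported powers are numerically equal in these units.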

Data availability
All data needed to support the conclusions in the paper are available via sources described in the paper or upon reasonable request to the authors.