A hybrid and scalable brain-inspired robotic platform

Recent years have witnessed tremendous progress in intelligent robots brought about by mimicking human intelligence. However, current robots are still far from being able to handle multiple tasks in a dynamic environment as efficiently as humans. To cope with complexity and variability, further progress toward scalability and adaptability is essential for intelligent robots. Here, we report a brain-inspired robotic platform, implemented on an unmanned bicycle, that exhibits scalability in network scale, quantity, and diversity to handle the changing needs of different scenarios. The platform adopts rich coding schemes and a trainable, scalable neural state machine, enabling flexible cooperation of hybrid networks. In addition, an embedded system is developed using a cross-paradigm neuromorphic chip to facilitate the implementation of diverse neural networks in spike or non-spike form. The platform performed various real-time tasks concurrently in different real-world scenarios, providing a new pathway to enhance robots' intelligence.


Supplementary Methods
Notation of CNN
The output of the CNN contains only one grid cell and is encoded as a (4+1)×2 tensor denoting the central coordinates (x, y), the width (w), the height (h), and the confidence (c) for both the human and the obstacle. To stabilize training and avoid numerical problems, the output values w and h are scaling factors for the two anchors, while the central coordinates (x, y) are bias ratios adjusted by a sigmoid nonlinearity; the default anchor (width, height) for the human and the obstacle are the average values in the training dataset. The network is jointly trained with a weighted sum of per-object losses, where obj ∈ {human, obstacle} indicates whether an object appears, and each loss term carries its own weight. We find that decreasing the weight of the confidence loss helps training, and since the tracking task mostly focuses on the central position, we set a smaller weight for the confidence loss.

Auditory data pre-processing
The voice recognition is a six-instruction task (including the 'left', 'right', 'straight', 'speed up', 'slow down', and 'follow me' instructions). To obtain enough training data, we collect these commands announced by several people multiple times in the playground, a real-world scenario. After careful curation (e.g., removing recordings that were too noisy), we retain a total of 690 instructions as the dataset. For training and testing, we randomly partition them into training and test subsets at a ratio of 5:1.
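As a rough sketch of the box decoding described above, the following assumes a YOLO-like layout for the (4+1)×2 output tensor: sigmoid-adjusted center ratios and width/height as direct scaling factors on per-class anchors. The anchor values, image size, and function name are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def decode_cnn_output(raw, anchors, img_w, img_h):
    """Decode a (4+1)x2 raw network output into one box per class.

    raw     : array of shape (5, 2) holding (tx, ty, tw, th, tc) for
              the human (column 0) and the obstacle (column 1)
    anchors : two (width, height) anchor sizes; the paper derives these
              from dataset averages, the values used here are made up
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    boxes = {}
    for i, cls in enumerate(("human", "obstacle")):
        tx, ty, tw, th, tc = raw[:, i]
        # centers: sigmoid-adjusted bias ratio, scaled to the image
        x, y = sigmoid(tx) * img_w, sigmoid(ty) * img_h
        # width/height: scaling factors applied to the class anchor
        w, h = anchors[i][0] * tw, anchors[i][1] * th
        c = sigmoid(tc)  # detection confidence in (0, 1)
        boxes[cls] = (x, y, w, h, c)
    return boxes
```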
During data pre-processing, an end-point detection method based on short-term energy was used to detect and extract the useful speaker information. After obtaining segments from the raw audio stream, we apply MFCC, a widely used speech-signal-processing method inspired by the processing mechanism of the cochlea, to extract features in different frequency bands.
Specifically, for each MFCC feature, we use a Gaussian population consisting of 10 neurons with different receptive fields to encode it into spike trains.
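The Gaussian population coding above can be sketched as follows. The receptive-field width, value range, and the mapping from response strength to first-spike time are illustrative assumptions; the paper only specifies a population of 10 neurons with different receptive fields.

```python
import numpy as np

def gaussian_population_encode(value, n_neurons=10, v_min=0.0, v_max=1.0,
                               t_max=100.0):
    """Encode a scalar MFCC feature as first-spike times of a population.

    Each neuron has a Gaussian receptive field with evenly spaced centers;
    a stronger response fires an earlier spike.  Sigma and the
    response-to-spike-time mapping are assumed, not taken from the paper.
    """
    centers = np.linspace(v_min, v_max, n_neurons)
    sigma = (v_max - v_min) / (n_neurons - 1)          # assumed field width
    response = np.exp(-0.5 * ((value - centers) / sigma) ** 2)  # in (0, 1]
    spike_times = t_max * (1.0 - response)  # high response -> early spike
    return spike_times
```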

Sub-parts of motion module
The motion control module needs to generate the steering angle that helps the bicycle maintain balance and follow desired trajectories. This mainly involves two steps: generating sequential action signals, and integrating all sensor signals into a target rotation angle.
In step one, the action instructions can be set manually or generated by the high-level module.
We design three motion functions that contain certain motion patterns to generate action sequences from a decision: (1) Turning command: The turning core transforms one spiking turn command into a smooth angle sequence. In the turning curve y(t) we apply, k is a coefficient, A is the sign of the signal representing the direction (negative indicates turning left, positive indicates turning right), and t denotes the time step. For a single turning command, t_max can be set to less than N/2, and when t > t_max, y is held at y(t_max). For the 'force turn' instruction, N is 100 and t_max equals N, which means this core produces a sequential value over the following 100 time steps, indicating a specialized turning pattern with reset.
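The turning core above can be sketched as follows. The exact analytic form of the turning curve y(t) is not reproduced here, so a smooth saturating ramp stands in for it; k, the angle cap, and the function name are illustrative assumptions.

```python
def turning_sequence(A, N=100, k=0.1, y_max=15.0):
    """Generate a smooth steering-angle sequence from one turn command.

    A      : direction sign (negative = turn left, positive = turn right)
    N      : number of time steps (100 for a 'force turn')
    k, y_max : slope coefficient and angle cap -- assumed values; the
               paper's exact curve y(t) is not given here, so a smooth
               saturating ramp toward y_max is used as a stand-in.
    """
    seq = []
    y = 0.0
    for t in range(N):
        y = min(y + k * (y_max - y), y_max)  # smooth approach to the cap
        seq.append(A * y)
    return seq
```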
(2) Tracking function: The tracking core continuously receives the detection results from the CNN. The tracking error is smoothed by an exponential moving average, s_t = α·s_{t−1} + (1 − α)·x_t, where α is a coefficient set to 0.8 and x_t is the tracking result, which ranges in [0, 1].
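The smoothing step above can be sketched as a standard exponential moving average with α = 0.8; the function name and the choice of which side α weights are assumptions consistent with the recurrence stated in the text.

```python
def ema_smooth(errors, alpha=0.8):
    """Exponentially smooth a stream of tracking errors.

    Implements s_t = alpha * s_{t-1} + (1 - alpha) * x_t, the standard
    EMA recurrence, with alpha = 0.8 as stated in the text.
    """
    s = errors[0]           # initialize with the first observation
    out = [s]
    for x in errors[1:]:
        s = alpha * s + (1.0 - alpha) * x
        out.append(s)
    return out
```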
Then the averaged error and the turning result are summed as the final target inclination angle. (3) Speed command: each speed instruction changes the target speed by a fixed step Δ, where Δ denotes the value of a single adjustment. We set Δ to 0.4, so the bicycle has three speed levels.
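The stepped speed control described above can be sketched as follows. The base speed, the level-to-speed mapping, and the clamping behavior are illustrative assumptions; the text only specifies a fixed adjustment of 0.4 and three speed levels.

```python
def adjust_speed(level, command, delta=0.4, n_levels=3, base=0.4):
    """Step the target speed by a fixed increment per voice command.

    level   : current speed level in {0, ..., n_levels - 1}
    command : 'speed up' or 'slow down' (other commands leave speed alone)
    delta   : per-adjustment step, 0.4 as in the text
    base    : speed at level 0 -- an assumed value for illustration
    """
    if command == "speed up":
        level = min(level + 1, n_levels - 1)   # clamp at the top level
    elif command == "slow down":
        level = max(level - 1, 0)              # clamp at the bottom level
    return level, base + level * delta
```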