Convolutional spiking neural networks for intent detection based on anticipatory brain potentials using electroencephalogram

Spiking neural networks (SNNs) are receiving increased attention because they mimic synaptic connections in biological systems and produce spike trains, which can be approximated by binary values for computational efficiency. Recently, the addition of convolutional layers to combine the feature extraction power of convolutional networks with the computational efficiency of SNNs has been introduced. This paper studies the feasibility of using a convolutional spiking neural network (CSNN) to detect anticipatory slow cortical potentials (SCPs) related to braking intention in human participants using an electroencephalogram (EEG). Data was collected during an experiment wherein participants operated a remote-controlled vehicle on a testbed designed to simulate an urban environment. Participants were alerted to an incoming braking event via an audio countdown to elicit anticipatory potentials that were measured using an EEG. The CSNN’s performance was compared to a standard CNN, EEGNet and three graph neural networks via 10-fold cross-validation. The CSNN outperformed all the other neural networks, and had a predictive accuracy of 99.06% with a true positive rate of 98.50%, a true negative rate of 99.20% and an F1-score of 0.98. Performance of the CSNN was comparable to the CNN in an ablation study using a subset of EEG channels that localized SCPs. Classification performance of the CSNN degraded only slightly when the floating-point EEG data were converted into spike trains via delta modulation to mimic synaptic connections.


Introduction
Significant advancements in computing hardware, such as graphics processing units and field-programmable gate arrays, along with the availability of large datasets, have enabled researchers to develop highly effective neural networks in the last decade. However, training and using these networks often consumes a large amount of energy, which restricts the deployment of neural networks in data-limited and/or energy-limited settings, typically applications in dynamic or mobile environments. In contrast, biology-inspired neural networks need only very few or even only one data point to perform at a competitive level compared to "traditional" neural networks (see page 54 in ref. 1). Therefore, machine learning architectures that more closely resemble biological neural networks are quickly gaining in popularity. One such example is the spiking neural network (SNN) [2][3][4], which mimics biological neural networks through its layers composed of spiking neurons. These neurons more closely resemble the synaptic connections between neurons in biological neural networks because they emit aperiodic spikes, as opposed to the floating-point numbers produced by the traditional artificial neuron. This sparse and discrete behavior of SNNs has been shown to reduce energy consumption by orders of magnitude when implemented on emerging neuromorphic hardware 5. However, shallow SNNs, like standard multi-layer perceptrons, can be insufficient to detect patterns that occur at random times or locations in tasks such as object detection and segmentation. This has inspired the development of hybrid convolutional and spiking neural networks, referred to as convolutional spiking neural networks (CSNNs) [6][7][8], which combine the convolutional layer's power of extracting spatio-temporal features with the energy efficiency of spiking neuron layers. In the past few years, CSNNs have received increased attention in diverse applications such as computer vision 9,10, speech recognition 11, hand-gesture recognition 12 and detection of Alzheimer's disease 13, reinforcing their utility in deciphering complex and multi-dimensional data.
The main contribution of this study is to evaluate the use of CSNNs in advanced driver-assist systems (ADAS), specifically those approaches that utilize electroencephalograms (EEGs). ADAS can be summarized as a group of assistive technologies designed to decrease the cognitive load associated with the driving task by assisting with driving and/or parking decisions, thus aiding the driver in safely operating their vehicle. This technology has been rapidly introduced in modern vehicles and has been shown to greatly improve road safety and reduce traffic accidents 14. EEG is a method of measuring and recording electrical potentials from various points across the human brain, thus serving as a primary method of discerning a person's current cognitive activity. EEG-based applications are commonly explored in the field of brain-computer interfaces (BCI) 15, which has contributed to the development of machine learning models dedicated specifically to the analysis and interpretation of EEG signals, for example "EEGNet" 16. The inclusion of EEG as an auxiliary input source effectively fuses the fields of ADAS and BCI. It gives subsequently developed technologies the advantage of an accurate real-time measure of otherwise unknown aspects of the driver state [17][18][19][20], and it also allows for the prediction of a driver's intended action (e.g. braking) [21][22][23] before it occurs. The literature has reported anticipatory potentials being observed as early as 130 ms 24 and 320 ± 200 ms 25 before action onset. The present study focuses on the latter advantage of EEG and seeks to train a CSNN as the predictive classifier to detect these anticipatory brain potentials and thus predict braking intention.
Although some initial studies have demonstrated the effectiveness of shallow SNNs in typical BCI applications [26][27][28][29][30][31][32], the proliferation of other convolutional networks in the realm of BCI (e.g. EEGNet) and the reported success of deep learning methods in EEG decoding problems 33 imply that the inclusion of deep learning methods, such as the addition of convolutional layers, leads to a performance gain in classification tasks involving EEG data. Furthermore, the relative ease with which SNNs and their deep learning counterparts, CSNNs, can be mapped to emerging high-efficiency neuromorphic-computing hardware 5,[34][35][36][37] makes them ideally suited for deployment in mobile, energy-limited applications. The use of energy-efficient neuromorphic hardware becomes even more advantageous when implementing various learning methods for online continuous learning or one-shot learning 38 in energy-constrained applications.
To the authors' knowledge, the potential of CSNNs for EEG-based ADAS has not yet been explored, making this a novel contribution. To achieve a fairer juxtaposition than directly comparing the CSNN's performance in this study to other methods in the literature, additional neural network models were trained on the same dataset to provide clearer context. These models include: i) a CNN of similar architecture; ii) EEGNet; and iii) three graph neural networks (GNNs). The CNN was chosen as a direct comparison of the spiking architecture to a non-spiking architecture. EEGNet was chosen as the "state of the art" benchmark model because it has previously generalized better across different BCI paradigms and achieved higher performance than existing CNNs and traditional approaches 16. Lastly, GNNs were included as an alternative to standard CNNs because of their similar performance on adjacent EEG decoding tasks [39][40][41][42].
Related Work. Previous studies on BCI-based driver intent detection present a gamut of technical approaches that differ mainly in the pre-processing strategies and classifiers used. A popular family of classifiers relies on a powerful technique called "discriminant analysis", where a predictive function of a certain family (linear, quadratic, etc.) is created using independent variables and regression coefficients and used to predict a dependent variable. For instance, Teng et al. 43 used the sequential forward-floating search method to define a feature set from powers of frequency points across 16 EEG channels. These features were then used as input to a regularized linear discriminant analysis (RLDA) classifier to distinguish braking intention from normal driving with a reported accuracy over 94%. In another study, a modality combination consisting of EEG, tibialis anterior electromyography (EMG) and brake pedal signal was used as input to an RLDA classifier for braking intent detection 44. Khaliliardali et al. 25,45 used a low-frequency bandpass filter ranging from 0.1 Hz to 1 Hz and a quadratic discriminant analysis (QDA) classifier to classify braking intention. Haufe et al. 24 compared EEG, EMG and brake pedal response to determine the input feature that predicts braking intention the fastest, again using RLDA as the classifier.
As a competitor to the "discriminant analysis" methods, the other popular classification methods in the literature use neural networks and deep neural networks. For example, Hernandez et al. 46 conducted a braking intention study using support vector machines and convolutional neural networks (CNNs) to differentiate normal driving and braking intention EEG signals, achieving a reported average accuracy of 71% and 72% for support vector machines and CNNs, respectively. Nguyen et al. 47 compared EEG band power-based and autoregressive-based feature selection methods for braking intent detection using EEG signals as input to a multilayer perceptron neural network, reporting a better accuracy of 91% with the autoregressive-based method. Lee et al. 48 used recurrent convolutional neural networks (RCNNs) to predict braking intention from EEG data, achieving an AUC score of 0.86. It is evident from the literature that a variety of methods have been used with mixed results. Although there are some examples of neural network usage, the use of SNNs for the braking intention EEG decoding problem is noticeably absent.
The EEG pattern studied here is the contingent negative variation (CNV), which is a type of slow cortical potential (SCP) that occurs prior to movement in the central region of the brain. The CNV, in particular, manifests when a subject is given a warning stimulus followed closely by an imperative stimulus, that is, a stimulus requiring an action. It is featured in previous movement intention literature that also focused on driver braking intent detection 21,25,45. However, the CNV is not the only EEG pattern used for intention detection in the literature. Event-related desynchronization (ERD) is an EEG phenomenon occurring in the mu and beta frequency bands up to two seconds before movement is realized. It is marked by a decrease in the spectral power of EEG within those bands that is not restored until after the movement is completed. Planelles et al. 49 conducted a study to find a suitable classifier for ERD stemming from a self-determined reaching movement in healthy patients, reporting 72% accuracy using an SVM classifier. Chamanzar et al. 50 developed a novel algorithm for using ERDs to detect hand movement intention using adaptive wavelet transform. They reported a one second detection delay, a sensitivity of 88% and a selectivity of 78%. ERDs are also used for motor imagery decoding problems. Song et al. 51 used a two-phase classifier design to reduce false positives in a motor imagery for rehabilitation application using ERDs as the EEG input. The reported results included a sensitivity of 61% and a selectivity of 78%. Indeed, ERD is a popular and useful EEG-related signal that could serve as an alternative to slow cortical potentials for movement intention related applications, and the potential of CSNNs for decoding ERD, or the CNV in general, has not yet been explored.
The use of Bereitschaftspotentials (readiness potentials) has also received attention within the field of EEG-related BCI. The readiness potential is a slowly building neural signal occurring 1-2 seconds before movement onset 52. To better understand how the readiness potential is connected with areas of the brain responsible for the motor preparation process, Nguyen et al. 52 conducted a study integrating simultaneously acquired EEG and fMRI through computational modeling and determined that reciprocal connections between the supplementary motor area (SMA) and anterior mid-cingulate cortex (aMCC) are important to maintain the sustained activity of the readiness potential before movement. Other works have used the readiness potential for movement intent detection. Mirzabagherian et al. 53 developed two convolutional neural networks composed of temporal-spatial, separable and depth-wise layers and used these networks to detect movement-related cortical potentials (MRCPs) indicating five different hand movements performed by patients with cervical spinal cord injury. They reported classification accuracies of 71% and 65% for the Temporal-Spatial Convolutional Iterative Residual Network and Temporal-Spatial Convolutional Residual Network, respectively, for the EEG_All dataset and accuracies of 58% and 68% for the EEG_Low frequency dataset. Gatti et al. 54 also studied harnessing MRCPs for movement speed and force intent detection. A four-class dataset was created by having subjects conduct a right hand palmar grasp task using 20% and 60% of the maximum voluntary contraction and either a 3 second slow grasp or a 0.5 second fast grasp for each amount of force. Convolutional neural networks were used as the classification model, achieving an overall accuracy of 84%. Mussini and Di Russo 55 investigated how anxiety can affect anticipatory brain functions by observing its effect on pre-stimulus ERP and the Bereitschaftspotential when performing tasks with and without feedback. Their results showed that high anxiety can diminish the presence of these signals but the addition of feedback can restore them to low anxiety levels.

Identification of Braking Intention Signature in EEG Signals
An example of the pre-processed EEG signals from 19 channels with data markers signifying the temporal locations of the audible countdown commands is shown in Figure 1(a). Each marker is denoted by its associated countdown number from 5 to 1 and ending with "STOP" when the stop command was given. The Cz grand average of the pre-processed data is shown in Figure 1(b) along with scalp plots visualizing how the channel grand averages changed over time. The grand averages were calculated by averaging the Cz electrode signal from all participants and across all trials, similar to the procedure followed by Khaliliardali, Chavarriaga, Gheorghe and Millan 25. The following observations can be made from Figure 1.
1. The negative EEG potential, termed contingent negative variation (CNV) potential, started after the "2" count marker and reached the maximum negative value between the "1" count marker and "Stop" command.
2. The negativity rate sharply increased at the "1" count marker and the potential rate became sharply positive midway between the "1" count marker and "Stop" command.
3. Anticipatory potentials were clearly observed before the actual braking action.
4. The more negative potentials were spatially localized in the centro-medial electrodes.
The results obtained are consistent with other past studies on CNV 21,25,45 .
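The grand-average step described above can be sketched in a few lines. This is a minimal illustration, assuming every Cz trial has already been epoched to the same length; the function name `grand_average` and the toy values in `cz_trials` are hypothetical and are not taken from the study's data.

```python
from statistics import fmean


def grand_average(trials):
    """Average equal-length single-channel trials sample by sample."""
    if not trials:
        raise ValueError("need at least one trial")
    n_samples = len(trials[0])
    if any(len(trial) != n_samples for trial in trials):
        raise ValueError("all trials must have the same length")
    return [fmean(trial[i] for trial in trials) for i in range(n_samples)]


# Hypothetical Cz epochs (microvolts) pooled across participants and trials.
cz_trials = [
    [0.0, -1.0, -4.0, -2.0],
    [1.0, -2.0, -6.0, -1.0],
    [-1.0, 0.0, -5.0, 0.0],
]
cz_grand_average = grand_average(cz_trials)

# Index of the most negative sample, i.e. the peak of the negativity.
most_negative_index = min(
    range(len(cz_grand_average)), key=cz_grand_average.__getitem__
)
```

Averaging across trials suppresses activity that is not time-locked to the countdown markers, which is why the slow negativity becomes visible in the grand average even when single trials are noisy.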

Classification Performance - Case 1: 32-bit Single Precision Floating-Point EEG Input Data
The final pre-processed dataset used for training the models included 10702 data segments collected from 15 participants in an experiment described in the Experimental Design section and preprocessed according to the procedure outlined in the Data Preprocessing section: 8573 data segments labelled as class '0' (no intention signal) and 2129 data segments labelled as class '1' (intention signal). The models were evaluated by means of 10-fold stratified cross-validation where the training and testing partitions of each fold maintained the original class distribution. The data segments in each fold were shuffled before training and testing. To mitigate the skewed distribution of the classes, wherein approximately 80% of the dataset was negative class and approximately 20% was positive class, the samples belonging to each class were given a weight used in the training loss function. The class weights were calculated from $N_D$ and $N_{c,i}$, the total number of data points in the training partition and the number of class $i$ data points in the training partition, respectively. The convolutional layers in the CSNN and CNN used a kernel size of 5 × 5, a stride of 1, a padding of 0, and 12 and 64 filters for the first and second convolutional layers, respectively. All max pooling layers of both architectures used a 2 × 2 pooling region with a stride of 2. The sigmoid surrogate function smoothness value, k, for the surrogate backpropagation of the CSNN was chosen as 0.25. The GCN and GCS graph convolutional layers had output sizes of 115, 28, 14 and 3. The GIN had graph convolutional layers with output sizes of 924, 462 and 231. The multi-layer perceptrons used in the GIN contained 5 hidden layers, each with 256 hidden neurons. All GNN architectures utilized in this work were implemented using the Spektral Python package 56. The CSNN and CNN models were implemented using PyTorch 57. EEGNet was implemented using TensorFlow 58 with default values, with the exception of the length of the 2D convolutional kernel, which was set to 250.
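To make the class-weighting idea concrete, the sketch below uses one common inverse-frequency rule, w_i = N_D / (K · N_c,i) for K classes. This particular formula is an assumption chosen for illustration and may differ from the weighting actually used in the study; the counts are the full-dataset class counts quoted above, whereas the study computed weights on each training partition.

```python
def inverse_frequency_weights(class_counts):
    """Return {class: N_D / (K * N_c)} for K classes (assumed weighting rule)."""
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_total / (n_classes * count)
            for label, count in class_counts.items()}


# Class counts quoted in the text: 8573 negative and 2129 positive segments.
full_counts = {0: 8573, 1: 2129}
weights = inverse_frequency_weights(full_counts)

# With this rule each class contributes the same total weight to the loss.
weighted_mass = {label: weights[label] * full_counts[label] for label in full_counts}
```

Under this rule the minority class '1' receives roughly four times the weight of class '0', so misclassified intention segments are penalized more heavily during training.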
All architectures were trained for up to 1000 epochs per fold with a batch size of 8, and with early termination if the loss did not improve for 50 subsequent epochs. The architectures had the same respective weight initialization for each fold. The CSNN Leaky-Integrate-and-Fire (LIF) layer had a memory decay rate of 0.5, a spiking threshold of 0.5 and used 25 input steps. All models used the 'ADAM' optimizer with learning rate $\gamma = 5 \times 10^{-4}$, running average coefficients $\beta_1 = 0.9$ and $\beta_2 = 0.999$, stability parameter $\epsilon = 10^{-8}$, and binary or categorical (for EEGNet) cross entropy as the loss function.
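The early-termination rule above (at most 1000 epochs, stop after 50 epochs without improvement) can be sketched independently of any deep learning framework. The names `train_with_early_stopping`, `step_fn` and `toy_loss` are hypothetical; `step_fn` stands in for one epoch of training that returns the monitored loss, and which loss the study monitored is not stated.

```python
def train_with_early_stopping(step_fn, max_epochs=1000, patience=50):
    """Run step_fn(epoch) until max_epochs or until the loss stalls.

    Returns (epochs_run, best_loss). Training stops once the loss has not
    improved for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(max_epochs):
        loss = step_fn(epoch)
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_loss


def toy_loss(epoch):
    """Hypothetical loss curve: improves for ten epochs, then plateaus."""
    return float(max(1, 10 - epoch))


epochs_run, best_loss = train_with_early_stopping(toy_loss)
```

With the toy curve, the loss last improves at the tenth epoch, so training stops fifty epochs later instead of running all 1000 epochs.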
Table 1 shows the mean and standard deviation for predictive accuracy (Acc), true positive rate (TPR), true negative rate (TNR), F1-score, number of epochs trained, and total training time for each model. The table also shows each model's inference time for a single data point. The p-values from two-sample t-tests comparing the CSNN with the other models are shown for each classification metric. No attempt was made to optimize the learning parameters of each network. The results indicate that the CSNN outperformed all the other models in every classification metric category, albeit closely followed by EEGNet. The CSNN also showed small standard deviations, showcasing the consistency in results across folds. The GNNs exhibited a tendency to get stuck in local minima, evidenced by large standard deviations in all the performance metrics. However, the CSNN had the largest average training time and largest inference time out of all of the models.
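For reference, the four classification metrics reported in Table 1 follow from the confusion-matrix counts of a fold. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy labels are hypothetical and do not reproduce any result from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, TPR, TNR and F1 for binary labels (1 = intention, 0 = none)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "acc": ratio(tp + tn, len(pairs)),
        "tpr": ratio(tp, tp + fn),
        "tnr": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical fold: 3 intention segments and 7 non-intention segments.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Because the dataset is roughly 80% negative, accuracy alone can look high even when the TPR is poor, which is why TPR, TNR and F1 are reported alongside it.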

Classification Performance - Case 2: Ablation Study with 32-bit Single Precision Floating-Point EEG Input Data from Five Channels
Laboratory experiments offer the privilege of using state-of-the-art equipment that is typically unhindered in its data collection methods. However, real-world scenarios may present unique challenges where a full 20-channel EEG headset is not possible or practical. To study how a reduced number of available channels would affect the performance of the classifier, a five-channel analysis was performed. The Cz channel and four surrounding channels (Pz, C3, C4 and Fz) were considered for the ablation study. Table 2 shows results for the classification performance with a reduced number of channels.
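The channel reduction itself is a simple row selection. The sketch below assumes each data segment is stored as a list of per-channel sample lists and that the 19 channels follow a conventional 10-20 ordering; the list `CHANNELS_19` is an assumed montage for illustration, since the study's exact channel order is not given here.

```python
# Assumed 19-channel 10-20 montage order (hypothetical, for illustration only).
CHANNELS_19 = [
    "Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T3", "C3", "Cz",
    "C4", "T4", "T5", "P3", "Pz", "P4", "T6", "O1", "O2",
]
# The five channels named in the ablation study.
ABLATION_CHANNELS = ["Cz", "Pz", "C3", "C4", "Fz"]


def select_channels(segment, channel_names, keep):
    """Return the rows of `segment` for the channels in `keep`, in that order."""
    index = {name: i for i, name in enumerate(channel_names)}
    return [segment[index[name]] for name in keep]


# Toy segment: channel i holds the constant value i over four samples.
segment = [[float(i)] * 4 for i in range(len(CHANNELS_19))]
reduced = select_channels(segment, CHANNELS_19, ABLATION_CHANNELS)
```

The reduced segment has five rows instead of 19, so the input dimensions of the first convolutional layer change accordingly.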

Classification Performance - Case 3: Delta-Modulated Spike Train Input Data
The effect of converting the input EEG data from all 19 channels into spike train data before passing it to the CSNN was also studied, specifically its impact on the classification performance of the network. The filtered and segmented EEG data, normalized to the range [0, 1], was transformed into a 19-channel array of spike train data by monitoring the change in value of successive data points in each channel. If the value change was greater than a threshold value, a spike '1' was recorded. If not, a '0' was recorded. The result was a binary array of the same dimensions as the floating-point input. The threshold value was varied from 0.05 to 1, which resulted in spike trains of differing densities. The CSNN model used had the same parameters as in Case 1.
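The delta-modulation step described above can be sketched as follows. Two details are assumptions: only positive changes larger than the threshold produce a spike, and the first sample of each channel, which has no predecessor, is encoded as 0 so that the output keeps the input dimensions. The function name `delta_modulate` and the toy segment are hypothetical.

```python
def delta_modulate(segment, threshold):
    """Convert a [channels][samples] array in [0, 1] into a binary spike array.

    A spike (1) is emitted when the increase from the previous sample exceeds
    `threshold`; otherwise 0. The first sample of each channel is set to 0.
    """
    spikes = []
    for channel in segment:
        if not channel:
            spikes.append([])
            continue
        row = [0]
        for previous, current in zip(channel, channel[1:]):
            row.append(1 if (current - previous) > threshold else 0)
        spikes.append(row)
    return spikes


# Toy two-channel segment, already normalized to [0, 1].
segment = [
    [0.0, 0.6, 0.7, 0.1, 0.9],
    [0.5, 0.5, 1.0, 0.0, 0.2],
]
sparse_spikes = delta_modulate(segment, threshold=0.5)
dense_spikes = delta_modulate(segment, threshold=0.05)
```

A lower threshold yields a denser spike train and a higher threshold a sparser one, which is the trade-off explored by sweeping the threshold.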
Table 3 shows the 10-fold classification results for Case 3. The results indicate that good predictive performance can still be achieved when the floating-point EEG data are converted into spike trains prior to being input to the network, provided a suitable threshold value is selected. Spike train conversion thresholds that are too small or too large constrain the ability of the CSNN to learn effectively, most likely because features important to the correct classification of data are obscured. If the threshold is too small, the spike train becomes saturated, making it difficult for the network to determine the important features. If the threshold is too large, important features may not be captured at all.
The best results were obtained with a threshold of 0.5. The sensitivity of the classification results to the threshold value was studied using two-sample t-tests comparing the classification measures corresponding to the 0.5 threshold with the measures corresponding to the other thresholds. The t-test results shown in Table 3 indicate significant performance degradation above a threshold of 0.625 and below 0.375. This implies that a range of threshold values could be used to obtain statistically similar results, offering flexibility in the threshold selection. The performance of the CSNN using a threshold value of 0.5 was comparable to the EEGNet results when trained on the floating-point data. A two-sample t-test showed that TNR for the CSNN was statistically better (p-value was 0.041), but the accuracy and F1-score metrics were statistically similar (p-values were 0.700 and 0.537, respectively). On the other hand, the TPR for EEGNet was significantly better (p-value was 0.028).
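As an illustration of the two-sample comparisons used here, the sketch below computes a pooled-variance (Student) t statistic and its degrees of freedom for two sets of per-fold scores. Whether the study used the pooled or the Welch variant is not stated, so the pooled form is an assumption; the function name and toy scores are hypothetical. Converting the statistic to a p-value requires a t-distribution CDF, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom). A p-value would additionally require
    the CDF of the t-distribution, e.g. from a statistics package.
    """
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (fmean(a) - fmean(b)) / standard_error, dof


# Toy per-fold scores for two configurations (not results from the study).
scores_a = [1.0, 2.0, 3.0, 4.0, 5.0]
scores_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = pooled_t_statistic(scores_a, scores_b)
```

With ten folds per configuration, as in the study, the pooled test would have 18 degrees of freedom.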

Discussion
The results presented here show that the CSNN can be used as a classifier for detecting features in EEG data that predict braking intention, which occurs before the actual physical activity. To benchmark the CSNN performance, results were compared to a standard CNN, EEGNet and three GNN models using a 10-fold cross-validation scheme, with the CSNN achieving the highest performance and with more consistency. The p-values from two-sample t-tests in Table 1 show a significantly higher performance of the CSNN over the GNNs in almost every metric category (except TNR of the GCN network, where the p-value was slightly above 0.05). This result is not surprising when the means of the metrics are compared. It is in stark contrast to the p-values for the CNN and EEGNet: the CNN has noticeably lower mean values than the CSNN yet is statistically similar, whereas EEGNet is the closest of the other models yet is statistically different. This can be explained by the fact that the CNN had enough folds containing results that were nearly identical to the CSNN performance, but also had a couple of folds with poor performance that biased the grand averages. As a result, the overall model performance was not significantly different from the CSNN. On the other hand, although EEGNet performance was competitive and consistent, it did not quite match the CSNN performance on any fold. Therefore, the p-values indicated statistically significant higher performance of the CSNN compared with EEGNet. The authors hypothesize that a possible explanation for the CSNN's success is that converting the floating-point numbers to spike trains allows it to filter more efficiently, passing the most important feature maps to the next layer.
Despite the CSNN's success in classification, it had the longest average training time and largest inference time by a large margin. While computational efficiency could be improved by deploying CSNNs on neuromorphic hardware, those gains were not realized in this study, where only a von Neumann computer was utilized. Neuromorphic chips, such as the Intel Loihi chip, have been shown to produce faster training times than von Neumann processors and deep learning accelerators 5.
Results of the five-channel ablation study indicate that a few strategically chosen channels may be sufficient, with the CSNN's classification performance being nearly identical to that attained with all 19 channels. It can also be seen that the CNN performance increased considerably compared to the results shown in Table 1 with all 19 channels. Although the mean performance of the CNN was substantially lower when all 19 channels were used, the p-values in Table 1 indicate that the results were not significantly different from the CSNN. As noted previously, a couple of folds with poor performance biased the CNN grand average. Therefore, the boost in CNN performance to the level of the CSNN in the ablation study is not altogether surprising and could simply have resulted from more consistent performance across folds when using only five channels.
The spike train conversion results shown in Table 3 have significant implications because converting the floating-point EEG data to spike trains at the outset could allow for additional energy savings, thereby taking full advantage of any neuromorphic computing hardware used to implement the CSNN for large, complex datasets and/or in real-time applications. Findings from this study can be exploited in future work to implement the CSNN on a neuromorphic platform to study the actual energy efficiency and feasibility for on-line learning in real-time applications.
The Cz grand average calculation and the analysis itself are based on the assumption of zero phase shift of the CNV pattern relative to the external stimuli. Lew et al. 21 and Khaliliardali et al. 25 discuss SCPs, of which CNVs are a subset, in great detail. In these studies, the SCP was determined to begin as early as 1.5 seconds before the onset of movement, and therefore the crucial aspects of the CNV should be contained within the data between the "1" count marker and the stop marker. Variations in the exact timing of the pattern will occur between trials but, as mentioned above, the grand average pattern obtained in this study is consistent with the results reported in the literature. Also, it is well known that EEG has poor spatial resolution. For this reason, the CNV pattern in Figure 1 appears to occur across the entire brain instead of only in the central area as expected. Studies using more channels than the 19 considered here have reported greater regional localization 25. Furthermore, the ablation study results presented in Table 2 confirm that the CNV is occurring in the central area.
The EEG data was collected from 15 participants, with the dataset containing a total of 3244 trials that were cleaned and segmented into 10,702 data segments. Although the number of participants in this study is relatively small, it is on par with other studies in the literature. The classification experiment conducted in-house involved using a simulated-realistic testbed with the participant operating a remote-controlled vehicle using a live video feed under ideal conditions. It would be interesting to study the performance of the CSNN when the participants are cognitively stressed, for example, when under fatigue or in the presence of distractions. Also of interest would be to study the abilities of the CSNN in other EEG-BCI applications such as P300, motor imagery, motor-related cortical potentials and steady-state evoked potentials, or to explore the use of CSNNs for movement intention detection using Bereitschaftspotentials, which are well-known as an indicator of movement preparation.

Convolutional Spiking Neural Networks
CSNNs are deep networks composed of standard convolutional layers that extract feature maps from the input data before passing these feature maps to subsequent spiking layers. These combination layers are referred to here as "convolutional-spiking" layers. For classification, the output layer is composed of a fully connected layer with linear activation function followed by a spiking layer. Figure 2(a) shows a schematic of the specific CSNN used in this study. It is composed of two convolutional-spiking layers followed by the output layer. Each convolutional-spiking layer consists of a two-dimensional convolution layer followed by a two-dimensional max pooling layer and ending with an output spiking layer. The spiking layer in each convolutional-spiking layer is composed of a tensor of LIF neurons having the same shape as the input to the layer. In the output layer, the fully connected layer and the subsequent output LIF spiking layer both have two neurons for the two classes in the EEG dataset. The predicted output of the CSNN was determined by counting the number of spikes output by these two neurons and setting the predicted label to the class represented by the neuron that produced the most spikes. A tie would result in a predicted class of '0'.
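The output decoding described above can be illustrated with a single LIF neuron per class. The sketch assumes a common discrete-time LIF update with reset by subtraction, U[t] = βU[t-1] + I[t] - S[t-1]θ, using the decay rate (0.5), threshold (0.5) and number of steps (25) quoted earlier; the exact neuron model and reset mechanism of the study are not specified here, and the constant input currents are hypothetical.

```python
def lif_spike_train(input_current, beta=0.5, threshold=0.5, num_steps=25):
    """Simulate one LIF neuron driven by a constant input current.

    Assumed update: mem = beta * mem + input - (threshold if it spiked on the
    previous step). A spike is emitted whenever mem exceeds the threshold.
    """
    mem = 0.0
    spikes = []
    for _ in range(num_steps):
        reset = threshold if spikes and spikes[-1] == 1 else 0.0
        mem = beta * mem + input_current - reset
        spikes.append(1 if mem > threshold else 0)
    return spikes


def predict_from_spike_counts(spikes_class0, spikes_class1):
    """Pick the class whose output neuron spiked most; a tie yields class 0."""
    return 1 if sum(spikes_class1) > sum(spikes_class0) else 0


# Hypothetical inputs to the two output neurons for one data segment.
weak_drive = lif_spike_train(0.2)    # membrane settles below the threshold
strong_drive = lif_spike_train(0.6)  # membrane repeatedly crosses the threshold
predicted_label = predict_from_spike_counts(weak_drive, strong_drive)
```

With a decay rate of 0.5, a constant input of 0.2 drives the membrane potential toward 0.4, which stays below the 0.5 threshold, so that neuron never spikes and the prediction is decided by the more strongly driven neuron.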
Due to the non-differentiable nature of the output of spiking neurons, training SNNs is difficult and requires special approaches beyond standard backpropagation. If the spiking behavior of a neuron is represented as:

$$S[t] = \Theta(U_{mem}[t] - \theta) \qquad (2)$$

where $\Theta(\cdot)$ is the Heaviside function, $U_{mem}[t]$ is the membrane potential of the neuron and $\theta$ is the spiking threshold, the derivative of (2) with respect to $U_{mem}$ is the Dirac delta function:

$$\frac{\partial S}{\partial U_{mem}} = \delta(U_{mem} - \theta) \qquad (3)$$

which is zero everywhere except at $U_{mem} = \theta$, where it is infinite. This leads to the "dead neuron" problem for training using backpropagation. To mitigate this, the surrogate gradient approach is employed: during the backward pass, when the gradient of the loss function with respect to the network parameters is being computed, the Heaviside function is approximated by a sigmoidal function, thereby creating a readily differentiable function. The exact function used as the surrogate in this paper is:

$$\tilde{S} = \frac{1}{1 + e^{-k(U_{mem} - \theta)}} \qquad (4)$$

where $k$ is known as the 'slope' and determines the smoothness of the surrogate function. The derivative of (4) is then obtained as:

$$\frac{\partial \tilde{S}}{\partial U_{mem}} = \frac{k\, e^{-k(U_{mem} - \theta)}}{\left(1 + e^{-k(U_{mem} - \theta)}\right)^{2}} \qquad (5)$$

From (5) it can be seen that as $k$ increases, (5) converges to (3). For a more detailed explanation of SNNs and their training, the reader is referred to 59.
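A small numerical sketch can make the surrogate gradient idea tangible. It assumes the sigmoid surrogate is centered on the threshold, i.e. a sigmoid of k(U_mem - θ), whose derivative can be written as k·s·(1 - s) with s the sigmoid value; the function names are hypothetical and the threshold of 0.5 and slope of 0.25 are the values quoted earlier in the text.

```python
import math


def heaviside_spike(u_mem, theta):
    """Forward pass: emit a spike when the membrane potential exceeds theta."""
    return 1.0 if u_mem > theta else 0.0


def sigmoid_surrogate(u_mem, theta, k):
    """Smooth stand-in for the Heaviside step, centered on the threshold."""
    return 1.0 / (1.0 + math.exp(-k * (u_mem - theta)))


def sigmoid_surrogate_grad(u_mem, theta, k):
    """Derivative of the surrogate, used only in the backward pass."""
    s = sigmoid_surrogate(u_mem, theta, k)
    return k * s * (1.0 - s)


theta = 0.5
# At the threshold the surrogate gradient equals k / 4.
grad_at_threshold = sigmoid_surrogate_grad(theta, theta, k=0.25)
# Away from the threshold, a steep surrogate has a vanishing gradient.
grad_far_smooth = sigmoid_surrogate_grad(1.5, theta, k=0.25)
grad_far_steep = sigmoid_surrogate_grad(1.5, theta, k=100.0)
```

As k grows, the gradient concentrates ever more sharply around the threshold, approaching the delta-like behavior of the true derivative, while a small k spreads nonzero gradient over a wide range of membrane potentials and so helps avoid dead neurons.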

Convolutional Neural Network
For the sake of a direct spiking vs. non-spiking comparison, a CNN with an architecture largely identical to that of the CSNN is considered in order to quantify any differences in performance that may arise from the addition of the spiking layers. As shown in Figure 2(b), the CNN has two convolutional layers, each including a max pooling layer, followed by a fully connected linear layer and ending with a logistic sigmoid output layer for class prediction.

EEGNet
EEGNet is a single CNN architecture designed for classification tasks across multiple EEG-based BCI domains (P300 visual-evoked potentials, error-related negativity responses, movement-related cortical potentials and sensory motor rhythms) 16. It consists of two blocks of convolutional layers followed by a dense layer and finally a softmax layer. It is compact in terms of the number of model parameters (see Figure 3). In the first block, two convolutional operations are done in sequence. This block starts with a temporal convolution to learn frequency filters followed by a "depthwise" convolution to learn frequency-specific spatial filters. The second block also includes two convolutional operations. The first is another "depthwise" convolution to individually learn the temporal feature map and the second is a "pointwise" convolution to optimally combine the feature maps. These two convolutions are combined into one layer, termed "Separable 2D Convolution". The output of the second block of layers is then flattened and passed to the dense and softmax layer for generation of the predicted class.
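The compactness that comes from separable convolutions can be checked with simple parameter arithmetic. The sketch below compares a standard one-dimensional temporal convolution with a depthwise convolution followed by a pointwise convolution, ignoring biases; the channel and kernel sizes are illustrative values chosen for the example and are not taken from the configuration used in this study.

```python
def standard_conv_params(c_in, c_out, kernel_len):
    """Weights in a standard convolution: every output sees every input map."""
    return c_in * c_out * kernel_len


def separable_conv_params(c_in, c_out, kernel_len):
    """Weights in a depthwise (per-map) plus pointwise (1x1) convolution."""
    depthwise = c_in * kernel_len  # one temporal kernel per input feature map
    pointwise = c_in * c_out       # 1x1 mixing of the feature maps
    return depthwise + pointwise


# Illustrative sizes (hypothetical): 16 input maps, 16 output maps, kernel 16.
standard = standard_conv_params(16, 16, 16)
separable = separable_conv_params(16, 16, 16)
reduction_factor = standard / separable
```

For these sizes the separable form needs one eighth of the weights, which illustrates how decoupling the temporal filtering from the mixing of feature maps keeps the parameter count low.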

Graph Neural Networks
GNNs are a specialized version of neural networks designed to operate on graph data. A graph is a grouping of data with defined internal relationships (edges) between objects (nodes), where these relationships may or may not be Euclidean in nature. Mathematically, a graph is typically represented as $G = (V, E, A)$, where $V$ represents a finite set of nodes with $|V| = N$, $E$ is a set of edges between the nodes, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix containing the edge weights. Graph data is input to a GNN as a matrix of node feature vectors, $X \in \mathbb{R}^{N \times n}$, where $N$ is the number of nodes and $n$ is the number of node features, along with its adjacency matrix, $A$, and sometimes a set of edge features, $E$. Some graph operations also include the diagonal degree matrix $D$, where $D_{ii} = \sum_j A_{ij}$. For more details on graph theory and a detailed survey providing a comprehensive overview of GNNs, the reader is referred to [60,61]. Because of the spatial relationship between electrodes, EEG data can naturally be represented as graphs, with each data input sharing the same node and edge structure and thereby differing only in node data. Expressing the data as graphs allows the non-Euclidean spatial relationships to be exploited as extra information available to the classifier. For this reason, EEG-BCI applications have used GNNs previously, to notable effect. As shown in Figure 4, the three GNN architectures used in this work differ only in their initial graph processing layers, whilst sharing the last three layers. Each network possessed a global attention sum pool graph aggregation layer, which computes $\alpha = \mathrm{softmax}(Xa)$ and $X' = \sum_{i=1}^{N} \alpha_i X_i$, where $a$ is a trainable weight vector, $X$ is the layer input tensor, $N$ is the number of nodes in the input graph, and the softmax operation is applied over nodes instead of features. As shown in Figure 4a, the first architecture, the GCNConv network [62], consists of four graph convolutional (GCN) layers. The GCN layers perform the operation $Y = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} X W + b$, where $Y$ is the output of the layer, $\hat{A} = A + I$ is the adjacency matrix of the input graph plus the identity matrix of appropriate shape, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ is the degree matrix, $X$ is the layer input tensor, $W$ is the layer weights, and $b$ is the layer bias.
Figure 4b shows the second architecture, which is the GCSConv network. This consists of four GCS layers, which are GCN layers with an added, trainable skip connection. The GCS layer operation is described by $Y = D^{-1/2} A D^{-1/2} X W_1 + X W_2 + b$, where $Y$ is the output of the layer, $D$ is the degree matrix, $A$ is the adjacency matrix, $X$ is the node feature matrix, $W_1$ and $W_2$ are the two sets of layer weights, and $b$ is the layer bias. The third architecture is the GIN architecture [63] and is shown in Figure 4c. This architecture contains three graph isomorphism network (GIN) layers, where each layer performs the following operation for each node in the input matrix: $x_i' = \mathrm{MLP}\big((1 + \varepsilon)\, x_i + \sum_{j \in \mathcal{N}(i)} x_j\big)$, where $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron, $\varepsilon$ is a learned parameter, $x_i$ is the $i$-th node of the input matrix and $\mathcal{N}(i)$ is the set of neighbors of node $i$. The adjacency matrix, which would be common to all data samples, for all three GNNs, was calculated as follows: where $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $|P| \in \mathbb{R}_{+}^{N \times N}$ is the absolute value of the Pearson's correlation coefficient of the dataset and $I_{N \times N}$ is the identity matrix.
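A minimal sketch of how such a shared adjacency matrix and the GCN propagation matrix could be built from channel data is given below. It assumes that the identity is subtracted from the absolute Pearson correlation (A = |P| - I) so that self-correlations are removed before the GCN layer adds self-loops back; that form, the toy data and all function names are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency_from_channels(channels: Matrix) -> Matrix:
    """Assumed form A = |P| - I: absolute correlation with a zero diagonal."""
    n = len(channels)
    return [
        [0.0 if i == j else abs(pearson(channels[i], channels[j])) for j in range(n)]
        for i in range(n)
    ]


def gcn_normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with D the degrees of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


# Three toy "channels" standing in for EEG electrodes.
channels = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [4.0, 3.0, 2.0, 1.0]]
A = adjacency_from_channels(channels)
print(gcn_normalize(A))
```

Because every sample shares the same electrode layout, the matrix would be computed once over the dataset and reused for all graphs.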

Experimental Design
The total number of participants in the experiment was 15 (13 male, 2 female). They consisted of Missouri University of Science and Technology students and professors, all in good health. The participants had normal or corrected-to-normal vision and had normal hearing. The experiment received approval from the University of Missouri Institutional Review Board, and all experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained from all subjects and/or their legal guardian(s) prior to their participation. Further, written informed consent was obtained for publication of identifying information/images in an online open-access publication.
The objective of the experiment was to induce a predictable response in the participants such that any anticipatory signals that may occur can be reliably measured and recorded using an EEG. The experiment simulated a real-world driving environment wherein the participants operated an open-source remote-controlled robot called JetBot (built using Waveshare's Jetbot AI Kit and Nvidia's 4GB Jetson-Nano) on a novel testbed designed to simulate urban roadways (see picture inserted in Figure 5). The testbed boundary was marked with standard masking tape and the track material was Delxo's anti-slip tape (with 80-grit granularity) to provide additional traction to the JetBot wheels. The participants navigated the JetBot in the testbed lanes using a Logitech G29 Driving Force racing wheel and pedal setup, while watching a live video feed cast to a computer monitor from an on-board camera. The JetBot was programmed to drive at a constant speed without the participant pressing the acceleration pedal, necessitating only the use of the steering wheel and brake pedal for full control. There were no other 'vehicles' or obstructions on the testbed lanes and participants were free to navigate anywhere within the testbed boundaries.
The EEG signals of the participants were recorded using a Neuroelectrics ENOBIO 20 EEG headset. The electrode setup used was the international 10-20 standard and the sampling frequency during the experiment was 500 Hz. Data was collected from 19 channels, with high-conductivity Signagel saline gel applied to the electrodes to increase the quality of data capture. The data acquisition software used was the Neuroelectrics NIC2 software, which featured its own EEG signal quality monitor. The quality monitor assessed the EEG signal by computing a quality index (QI) that was dependent on: i) line noise, which was defined as electrical noise originating from surrounding power lines; ii) main noise, which was defined as the signal power of the standard EEG band; and iii) offset, which was the mean value of the waveform. Specifically, QI was calculated as described in [64]. A green indicator meant that the signal had a QI between 0 and 0.5, an orange indicator meant a QI between 0.5 and 0.8, and a red indicator meant a QI of 0.8 to 1. For this experiment, green indicators for all channels were the standard; however, brief periods of orange indicators were considered acceptable. The data was filtered using a 60 Hz filter during capture to help reduce electrical noise, and all channels were captured with reference to the Common Mode Sense channel (20th channel), which was fixed to the participant's right ear lobe.
The experimental design was based, in part, on the experiment conducted in [25] and is illustrated in Figure 5. Each participant underwent 8 sets of 30 trials each for a total of 240 trials, with short 5-10 minute breaks between sets. Each trial consisted of a set of audible commands issued by MATLAB: a "Start" command, upon hearing which the participant would release the brake, allowing the JetBot to move, followed by a countdown from 5 to 1 and ending with a "Stop" command, at which the participant would immediately stop the JetBot by pressing the brake. To ensure that the participants responded in a timely fashion, activity at the brake pedal was monitored, and any trial where the brake pedal depression did not register a reading higher than 0.05% of its total depression range within 0.25 seconds of the issuance of the "Stop" command was marked for removal. Trials where the participant braked too early were manually marked for removal as well. The EEG recording began concurrently with the issuance of the "Start" command, markers corresponding to the countdown numbers were applied to the data concurrently with each audio count, and the EEG recording stopped at detection of braking action or after the 0.25 second delay used to check brake pedal depression.
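The timeliness check on the brake pedal can be sketched as follows. The sample format (seconds since the "Stop" command, pedal depression as a fraction of full range) and all names are hypothetical; only the 0.05% threshold and the 0.25 second window come from the description above, and the manual removal of early-braking trials is not modeled.

```python
from typing import Sequence, Tuple

BRAKE_THRESHOLD = 0.0005   # 0.05% of the full depression range, as a fraction
RESPONSE_WINDOW_S = 0.25   # allowed delay after the "Stop" command, in seconds


def braked_in_time(samples: Sequence[Tuple[float, float]]) -> bool:
    """Return True if any pedal reading inside the response window exceeds the threshold.

    samples: (seconds since "Stop", pedal depression as a fraction in [0, 1]).
    """
    return any(
        0.0 <= t <= RESPONSE_WINDOW_S and depression > BRAKE_THRESHOLD
        for t, depression in samples
    )


# A trial with a brake press at 0.20 s is kept; one at 0.30 s is marked for removal.
on_time = [(0.05, 0.0), (0.20, 0.30)]
too_late = [(0.10, 0.0), (0.30, 0.50)]
print(braked_in_time(on_time), braked_in_time(too_late))
```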

Data Preprocessing
The data was processed in several steps using the open-source EEG toolkit EEGLAB [65] along with the TBT plugin [66]; only the trials that were not marked for removal during the experiments were processed (see Figure 6).

1. Each trial was:
(a) Spectrally filtered using a FIR bandpass filter from 0.1 Hz to 1 Hz as suggested in [22].
(b) Cleaned using EEGLAB's built-in automated cleaning function "Clean_RawData and ASR" (Artifact Subspace Reconstruction) to remove bad channels. A channel was removed if it: i) was flat for more than five seconds; ii) correlated at less than 0.8 with an estimate based on nearby channels; or iii) contained more than four standard deviations of line noise relative to its signal. The ASR attempted to correct bad data periods containing artifacts, and its maximum acceptable 0.5 second window standard deviation limit was set to a conservative 20 standard deviations. The values used in this step were the standard default values in EEGLAB and were also used as part of a pre-processing scheme in a previous study [67].
(c) Segmented by slicing the data according to the markers corresponding to the countdown numbers or the "Stop" command. For example, as shown in Figure 5, the first segment consisted of only the data that occurred between the "5" count marker and the "4" count marker. The segments from 5 to 1 were given a "0" label and were regarded as not containing an intention signal, while the segment between "1" and "Stop" was labelled "1" and was regarded as containing the signal of interest. Each segment was then baseline corrected by subtracting the mean value of the segment from every value in the segment.
2. Each segment was then further cleaned using the TBT plugin to remove high-amplitude noise. Channels were removed from the segment if either of the following two criteria was met for a data period duration of 10% of a segment or more [67]: i) if they exceeded ±100 µV in magnitude, or ii) if the joint probabilities (i.e., probabilities of activity) exceeded 3 standard deviations for local or global thresholds. If either criterion was met for less than 10% of a segment, then the offending data period was removed and subsequently interpolated. Segments with more than 50% of channels removed (i.e., 10 channels) were omitted entirely. Any removed channels were interpolated after the cleaning was finished for consistency in input dimension.
3. Every segment was padded to a uniform length of 1848, the length of the longest segment in a trial, by appending zeros to the end. Padding data should not significantly change the prediction results and is commonly done when using machine learning algorithms on datasets containing images of different sizes. Furthermore, padding should alter the dataset less than truncating the data, which risks deleting key features.
4. Finally, each channel was normalized such that the data lay within a range of 0 to 1. The particular equation used was $X_i^{norm} = \dfrac{X_i - \min(X_i)}{\max(X_i) - \min(X_i)}$, where $i$ indexes the data channel, $X_i$ is the data vector corresponding to that channel, and $\max(\cdot)$ and $\min(\cdot)$ are the maximum and minimum channel values per segment, respectively.
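The per-segment labelling, baseline correction, zero padding and min-max normalization described in the steps above can be sketched for a single channel as follows. The filtering and the EEGLAB/TBT cleaning stages are not reproduced, the function names are hypothetical, and the handling of a flat channel is an assumption.

```python
from typing import List

TARGET_LEN = 1848  # uniform segment length stated in step 3


def label_for_segment(start_marker: str) -> int:
    """The segment from the "1" marker to "Stop" is the positive class; all others are 0."""
    return 1 if start_marker == "1" else 0


def baseline_correct(x: List[float]) -> List[float]:
    """Subtract the segment mean from every value in the segment."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def zero_pad(x: List[float], length: int = TARGET_LEN) -> List[float]:
    """Append zeros so that the segment reaches the uniform length."""
    if len(x) > length:
        raise ValueError("segment is longer than the target length")
    return x + [0.0] * (length - len(x))


def min_max(x: List[float]) -> List[float]:
    """Scale one channel of one segment into the range [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)  # assumption: a flat channel maps to all zeros
    return [(v - lo) / (hi - lo) for v in x]


def preprocess_channel(x: List[float]) -> List[float]:
    """Baseline-correct, pad, then normalize, following the order of the steps."""
    return min_max(zero_pad(baseline_correct(x)))


out = preprocess_channel([1.0, 2.0, 3.0])
print(len(out), out[:4], label_for_segment("1"))
```

Because each segment is mean-centered before padding, the appended zeros always lie between the channel minimum and maximum, so they do not change the normalization bounds; after scaling they sit at the normalized value of the segment mean rather than at 0.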

Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Figure 1.Pre-Processed EEG Signals.(a) Channel potentials with associated countdown and "Stop" command markers and scale.(b) Cz grand average signal with scalp maps representing the grand average at the midpoint between two neighboring markers and color bar on the right displaying the potential in µV .

Figure 5. Experimental design illustration of a trial. Photo by Micheal Pierce/Missouri S&T

In the QI calculation, $\zeta_L(t)$ and $W_{\zeta_L}$ denote the line noise and line noise normalizing weight (= 100 µV), respectively, $\zeta_m(t)$ and $W_{\zeta_m}$ denote the main noise and main noise normalizing weight (= 250 µV), respectively, and $O(t)$ and $W_O$ denote the offset and offset normalizing weight (= 280 mV), respectively. The NIC2 software indicators used a color scheme to indicate different levels of QI.

Table 1. Classification performance with floating-point EEG input data (best performance in each classification measure highlighted in bold font)

Table 2. Five-channel ablation study with floating-point EEG input data (best performance in each classification measure highlighted in bold font)

Table 3. Classification performance with delta modulated spike train input data (best performance in each classification measure highlighted in bold font)