Visual explanations from spiking neural networks using inter-spike intervals

By emulating biological features of the brain, Spiking Neural Networks (SNNs) offer an energy-efficient alternative to conventional deep learning. To make SNNs ubiquitous, a ‘visual explanation’ technique for analysing and explaining the internal spike behavior of such temporal deep SNNs is crucial. Explaining SNNs visually makes the network more transparent, giving the end-user a tool to understand how SNNs make temporal predictions and why they make a certain decision. In this paper, we propose a bio-plausible visual explanation tool for SNNs, called Spike Activation Map (SAM). SAM yields a heatmap (i.e., localization map) for each time-step of input data by highlighting neurons with short inter-spike-interval activity. Interestingly, without the use of gradients and ground truth, SAM produces a temporal localization map highlighting the region of interest in an image attributed to an SNN’s prediction at each time-step. Overall, SAM marks the beginning of a new research area, ‘explainable neuromorphic computing’, that will ultimately allow end-users to establish appropriate trust in predictions from SNNs.


Introduction
Artificial Neural Networks (ANNs) [17,49,21] have shown human-level performance on a wide variety of tasks but incur huge computational cost. For instance, while ResNet-50 [17] reduces the top-5 error by 11.1% on the ImageNet dataset compared to AlexNet, it requires about 5× more energy to classify one image [48]. However, in many real-world applications, neural networks are required to be implemented on resource-constrained platforms. Spiking Neural Networks (SNNs) [37,33,7,11,8] offer an alternative way of enabling low-power artificial intelligence. SNNs emulate biological neuronal functionality by processing visual information with binary events (i.e., spikes) over multiple time-steps. This discrete spiking behavior of SNNs has been shown to yield high energy efficiency on emerging neuromorphic hardware [14,3,9].
Optimization methods for SNNs have made great strides on image classification tasks over the recent past. Conversion methods [41,16,12,38] convert a pre-trained ANN to an SNN by normalizing firing thresholds or weights to transfer ReLU activations to Integrate-and-Fire (IF) spiking activity. So far, conversion techniques have achieved accuracy competitive with ANN counterparts on large-scale architectures and datasets, but incur large latency (i.e., many time-steps) for processing. On the other hand, surrogate gradient descent methods [29,16,27] train SNNs using an approximated gradient function to overcome the non-differentiability of the Leaky-Integrate-and-Fire (LIF) spiking neuron [23]. Such methods enable SNNs to be trained from scratch with lower latency on conventional deep learning frameworks (e.g., TensorFlow [1]) with reasonable classification accuracy.
Despite improvements in optimization techniques, there is a lack of understanding of the internal spike behavior of SNNs compared to conventional ANNs. Neural networks have long been conceived as "black boxes". However, with the ubiquitous usage of neural networks, there is a need to understand what happens when a network predicts or makes a decision. On the ANN front, several interpretation tools have been proposed [51,13,56,40] and have found practical usage for obtaining visual explanations and understanding network predictions. On similar lines, an SNN interpretation tool is also highly crucial because low-power SNNs are increasingly becoming viable candidates for deployment in real-world applications such as medical robots [6], self-driving cars [20], and drones [39], where explainability in addition to performance is critical. In this work, we aim to shed light on the explainability of SNNs.
The naïve approach to explainability is to exploit widely used visualization tools from the ANN domain. Among them, Grad-CAM [40] offers great flexibility in terms of application and is also used by state-of-the-art interpretation algorithms [19]. The authors of Grad-CAM show that the contribution of a neuron, from shallow layers to deep layers, towards any target class can be quantified by calculating the gradient with backpropagation. However, SNNs cannot compute exact gradients (i.e., contributions) because of the non-differentiable integrate-and-fire behavior of an LIF neuron (see Section 3), as shown in Fig. 1. Therefore, a new concept of visualization for SNNs is required.
In this study, we propose a novel visualization tool for SNNs, called Spike Activation Map (SAM), which does not require any backpropagation. Instead, we calculate an attention map by monitoring neurons that carry more information (i.e., spikes) over different time-steps during forward propagation. We exploit the biological observation that spikes with short Inter-Spike Intervals (ISI) carry more information in a neurological system [36,46,45], because these spikes are more likely to induce post-synaptic spikes by increasing the membrane potential of the neuron. Specifically, for each neuron, we compute a neuronal contribution score (NCS) for the prediction. The NCS is defined as the sum of temporal spike contribution scores (TSCS) weighted with an exponential kernel. The TSCS assigns a high value to spikes fired within a short time window and a low value otherwise. Then, we add the NCS values across the channel axis to get a 2D spatial heatmap. We highlight that, unlike conventional visualization tools [55,40], our SAM does not require a target class label to find a contribution or visual explanation.
Further, using SAM, we investigate various configurations of SNNs. First, we compare the internal spiking behavior of two different SNN training methods, surrogate gradient based training [27] and ANN-SNN conversion [41], on a non-trivial image dataset (i.e., Tiny-ImageNet). Then, we observe the spike representation of each layer across different time-steps to understand the temporal characteristics of SNNs. We also analyze the effect of varying factors, such as the leak rate and related hyperparameters, on SAM and the overall prediction. Finally, we provide a visual understanding of the previously observed result [43] that SNNs are more robust to adversarial attacks [15]: we measure the difference between the heatmaps of clean and adversarial samples using SAM to highlight the robustness of SNNs with respect to ANNs.
In summary, our key contributions are as follows: (i) For the first time, we introduce a novel visualization technique for SNNs, called Spike Activation Map (SAM). We circumvent the non-differentiability problem of the LIF neuron by calculating an attention map based on short-ISI spikes. (ii) Interestingly, we find that SAM shows reliable visualization results without any ground-truth class labels. (iii) Using SAM, we visualize and analyze the temporal characteristics and internal spike behavior of SNNs across various configurations, such as training schemes, temporal parameters, and adversarial inputs. Overall, our proposed SAM opens up the possibility of interpretable and reliable neuromorphic computing.

Spiking Neural Networks
Spiking Neural Networks (SNNs) have recently emerged as the next generation of AI due to their huge energy-efficiency benefits on asynchronous neuromorphic hardware. Following the recent development of neuromorphic computing architectures such as TrueNorth [3] and Loihi [9], training algorithms for SNNs have received huge attention. One intriguing learning algorithm is spike-timing-dependent plasticity (STDP) [5] with a bio-plausible Hebbian learning rule [18]. This algorithm is based on local learning using the correlation of pre-synaptic and post-synaptic spikes. So far, STDP-based learning has been confined to shallow networks on small-scale datasets due to the absence of a global optimization rule. Another widely-used method is ANN-SNN conversion [41,16,12,38], which converts a pre-trained ANN to an SNN. Since the networks are trained in the ANN domain, the training complexity is significantly reduced. With careful threshold (or weight) balancing [12], ANN-SNN conversion shows good performance on large-scale datasets. It is worth mentioning that temporal dynamics are not considered in the training process of converted SNNs. Recently, training SNNs with backpropagation [32,28,53,27] has been studied because it can take temporal neuronal dynamics into account during surrogate gradient descent. Despite the huge progress in training methods for SNNs, little attention has been given to the internal spike behavior of SNNs. Therefore, in this paper, we focus on SNN interpretability. Our results show that surrogate methods, which have explicit temporal dependence during training, are more interpretable than conversion.

Visualization Tools for ANNs
The interpretation of predictions in neural networks has received considerable attention due to its practicality in real-world scenarios. Class Activation Map (CAM) [55] highlights the discriminative region of an image by using a global average pooling layer at the end of the feature extractor. The CAM heatmap is obtained by summing the feature maps at the last convolutional layer. Several variations of CAM have been proposed [54,52,44]. However, the necessity of the global average pooling layer in CAM limits its usage. To address this issue, Selvaraju et al. proposed Grad-CAM [40], a generalized version of CAM. Grad-CAM computes backward gradients from the classifier to a given intermediate layer where a visual explanation is required. Thus, the contribution of each neuron to the classification result can be quantified with the corresponding gradient value. Then, a 2D heatmap is obtained as the weighted sum of the activations across the channel axis based on the gradient values. In this work, we show that directly applying Grad-CAM to calculate visual explanations in SNNs does not yield accurate results due to the non-differentiable nature of the LIF neuron as well as its non-dependence on temporal dynamics.

Background
Poisson Rate Coding: To convert a static image into multiple binary spikes, we use Poisson rate coding, or rate-based coding. This is motivated by the human visual system [2] and shows outstanding performance among various spike coding schemes such as temporal [31], phase [26], and burst [34] coding. Poisson coding generates a spike train over multiple time-steps where the number of spikes is approximately proportional to the pixel intensity of the input image. In practice, we compare each pixel value with a random number in [0, 255] at every time-step. If the generated random number is less than the pixel intensity, the Poisson spike generator produces a spike with amplitude 1; otherwise, no spike is generated. The generated spikes are then passed through the SNN.
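As a concrete illustration, the following minimal sketch (not the authors' implementation) generates a Poisson spike train from an image whose pixel values are assumed to be normalized to [0, 1]:

```python
import torch

def poisson_encode(image: torch.Tensor, time_steps: int) -> torch.Tensor:
    """Convert a static image (pixels normalized to [0, 1]) into a binary
    spike train of shape [time_steps, *image.shape] via Poisson rate coding."""
    # At each time-step a pixel fires iff a uniform random draw falls below
    # its intensity, so spike counts are roughly proportional to intensity.
    return (torch.rand(time_steps, *image.shape) < image).float()
```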
Leaky-Integrate-and-Fire Neuron: The Leaky-Integrate-and-Fire (LIF) neuron is the main component of SNNs. The internal state of an LIF neuron is represented by a membrane potential $U_m$. As time goes on, the membrane potential decays with time constant $\tau_m$. Given an input signal $I(t)$ and an input resistance $R$ at time $t$, the differential equation of the LIF neuron can be formulated as:

$$\tau_m \frac{dU_m}{dt} = -U_m + RI(t). \qquad (1)$$

This continuous dynamic equation is converted into a discrete equation for digital simulation. More concretely, we formulate the membrane potential $u_i^t$ of a single neuron $i$ as:

$$u_i^t = \lambda u_i^{t-1} + \sum_j w_{ij} o_j^t, \qquad (2)$$

where $\lambda$ is a leak factor and $w_{ij}$ is the weight of the connection between pre-synaptic neuron $j$ and post-synaptic neuron $i$.
If the membrane potential $u_i^{t-1}$ exceeds a firing threshold $\theta$, neuron $i$ generates a spike $o_i^{t-1}$, which can be formulated as:

$$o_i^{t-1} = \begin{cases} 1, & \text{if } u_i^{t-1} > \theta, \\ 0, & \text{otherwise.} \end{cases} \qquad (3)$$

After the neuron fires, we perform a soft reset, where the membrane potential value is lowered by the threshold $\theta$. Because of this non-differentiable firing behavior, training SNNs with gradient learning is a huge challenge [37]. To address this issue, previous studies [32,28] approximate the backward gradient function (e.g., piecewise linear and exponential) to implement gradient learning. Fig. 1 illustrates the membrane potential dynamics of an LIF neuron.
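For concreteness, one discrete LIF update (Eqs. 2-3) with soft reset can be sketched as below; this is an illustrative implementation rather than the authors' code, and the leak and threshold values are assumptions:

```python
import torch

def lif_step(u_prev, spikes_in, weight, leak=0.99, threshold=1.0):
    """One discrete LIF time-step with soft reset (a sketch of Eqs. 2-3).

    u_prev:    [N_out] membrane potentials from the previous time-step
    spikes_in: [N_in]  binary pre-synaptic spikes at the current time-step
    weight:    [N_out, N_in] synaptic weights w_ij
    """
    u = leak * u_prev + weight @ spikes_in  # leak and integrate (Eq. 2)
    out = (u > threshold).float()           # fire when u exceeds theta (Eq. 3)
    u = u - threshold * out                 # soft reset: subtract theta
    return u, out
```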

Methodology
In this paper, we visualize the internal spike behavior of two representative and widely-used training methods: surrogate gradient training [27] and ANN-SNN conversion [41]. Since ANNs can be trained with well-established optimization methods and frameworks, SNNs obtained from ANN-SNN conversion show reliable performance on very large-scale datasets (e.g., ImageNet). In contrast, most surrogate gradient training methods are limited to small datasets (e.g., MNIST and CIFAR10) due to the approximated backward gradients. These simple datasets are too small to be analyzed through heatmap visualization. However, the authors of [27] recently proposed temporal adaptive batch normalization (BN) for surrogate gradient learning, enabling training on larger datasets such as CIFAR100 and Tiny-ImageNet. We exploit this algorithm as the case study for surrogate gradient training, comparing it with ANN-SNN conversion on the Tiny-ImageNet dataset.

Surrogate Gradient Backpropagation
The SNN-crafted BN layer [27], called Batch Normalization Through Time (BNTT), improves training stability and reduces latency while preserving classification accuracy. We add the BNTT layer before an LIF neuron. Therefore, the weighted pre-synaptic input spikes are normalized as:

$$u_i^t = \lambda u_i^{t-1} + \gamma_i^t \left( \frac{\sum_j w_{ij} o_j^t - \mu_i^t}{\sqrt{(\sigma_i^t)^2 + \epsilon}} \right), \qquad (4)$$

where $\gamma_i^t$ is a learnable parameter in the BNTT layer, $\epsilon$ is a small constant for numerical stability, and the mean $\mu_i^t$ and variance $\sigma_i^t$ are calculated from the samples in a mini-batch for each time-step $t$. We append every intermediate layer of the SNN with a BNTT layer. At the output layer, we set the number of output neurons to the number of classes $C$. To prevent information loss from the leakage of a neuron, we accumulate the spikes over all time-steps by fixing the leak parameter $\lambda$ (Eq. 2) as one. This stacked voltage is converted into a probability distribution using a softmax layer. Finally, we compute the cross-entropy loss as:

$$L = -\sum_i y_i \log \left( \frac{e^{u_i^T}}{\sum_k e^{u_k^T}} \right), \qquad (5)$$

where $y_i$ represents the ground-truth label and $T$ is the total number of time-steps. Then, we accumulate the backward gradients over all time-steps, which is called back-propagation through time (BPTT) [32]. The accumulated gradients at the hidden layers and the output layer can be represented as:

$$\Delta W_l = \sum_t \frac{\partial L}{\partial W_l^t} = \begin{cases} \sum_t \frac{\partial L}{\partial O_l^t} \frac{\partial O_l^t}{\partial U_l^t} \frac{\partial U_l^t}{\partial W_l}, & l: \text{hidden layer} \\ \sum_t \frac{\partial L}{\partial U_l^t} \frac{\partial U_l^t}{\partial W_l}, & l: \text{output layer} \end{cases} \qquad (6)$$

where $W_l$, $O_l$, and $U_l$ stand for the weight matrix, output spike matrix, and membrane potential matrix at layer $l$, respectively. As the output layer does not generate spikes, we compute the exact derivative of the loss $L$ with respect to the membrane potential $u_i^T$:

$$\frac{\partial L}{\partial u_i^T} = p_i - y_i, \qquad (7)$$

where $p_i$ is the softmax probability of class $i$. However, for a hidden layer, the gradient term is not differentiable due to the firing behavior of an LIF neuron. Therefore, $\frac{\partial o_i^t}{\partial u_i^t}$ should be formulated with an approximated continuous function (Fig. 1). To this end, we use a piecewise linear function:

$$\frac{\partial o_i^t}{\partial u_i^t} = \beta \max\{0,\, 1 - |u_i^t - \theta|\}, \qquad (8)$$

where $\beta$ is a scaling factor for the gradient value. We set $\beta$ to 0.3 to prevent a gradient exploding problem. Based on the gradient values, the weights of the SNN are updated.
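A minimal PyTorch sketch of the piecewise linear surrogate (Eq. 8) is shown below; the forward pass is the non-differentiable Heaviside step, while the backward pass substitutes the triangular function. The threshold value here is an assumption:

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Heaviside spike with a piecewise linear surrogate gradient (Eq. 8)."""
    beta = 0.3    # gradient scaling factor, as in the text
    theta = 1.0   # firing threshold (assumed value)

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u > SpikeFunction.theta).float()

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # Triangle-shaped surrogate centered at the firing threshold
        surrogate = SpikeFunction.beta * torch.clamp(
            1 - (u - SpikeFunction.theta).abs(), min=0)
        return grad_output * surrogate
```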

ANN-SNN Conversion
We use [41] for implementing the ANN-SNN conversion method. They normalize the weights or the firing threshold (θ in Eq. 3) to take into account the actual SNN operation in the conversion process. The overall algorithm for the conversion method is shown in Algorithm 1. First, we copy the weight parameters of a pre-trained ANN to an SNN. Then, for every layer, we compute the maximum activation across all time-steps and set the firing threshold to the maximum activation value. The conversion process starts from the first layer and sequentially proceeds through deeper layers. Note that we do not use BN [22] since all input spikes have zero-mean values. Also, following previous works [16,41,12], we use Dropout [47] for both ANNs and SNNs.

Algorithm 1 ANN-SNN Conversion
(Algorithm 1 performs layer-wise threshold balancing: copy the pre-trained ANN weights to the SNN, then, proceeding from the first layer to the last, record the maximum activation each layer receives across all time-steps and set that layer's firing threshold to this maximum.)
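The following is a simplified sketch of the threshold-balancing idea, operating on ANN pre-activations over calibration data; the layer structure and names are illustrative assumptions, not the paper's exact procedure:

```python
import torch

@torch.no_grad()
def balance_thresholds(ann_layers, calibration_inputs):
    """Set each SNN layer's firing threshold to the maximum pre-activation
    observed at the corresponding ANN layer (a sketch of Algorithm 1's idea)."""
    thresholds = []
    x = calibration_inputs
    for layer in ann_layers:          # e.g., nn.Conv2d / nn.Linear modules
        x = layer(x)                  # pre-activation seen by this layer
        thresholds.append(x.max().item())
        x = torch.relu(x)             # ANN activation feeding the next layer
    return thresholds
```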

SNN-crafted Grad-CAM
Grad-CAM [40] highlights the region of the image that contributes most to the classification result. Grad-CAM computes a backward gradient from the classifier logit to a pre-defined target layer. After that, a channel-wise attention value is obtained using global average pooling. Based on this, the final heatmap is defined as the weighted sum of feature maps. Different from conventional deep neural networks, SNNs take spike trains across multiple time-steps. Therefore, we can compute multiple SNN-crafted Grad-CAMs across the total number of time-steps $T$. Similar to Grad-CAM, we quantify the contribution of each channel by accumulating gradients across all time-steps:

$$\alpha_k^c = \frac{1}{N} \sum_t \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij,t}^k}. \qquad (9)$$

Here, $N$ is a normalization factor, $A_{ij,t}^k$ is the activation value of the $k$th channel at time-step $t$, and $(i, j)$ is the pixel location. Note that we use the ground-truth label $c$ for a given image to compute the heatmap. The channel-wise weighted sum of spike activations can then be calculated as:

$$G_{ij,t}^c = \max\Big(0,\, \sum_k \alpha_k^c A_{ij,t}^k\Big). \qquad (10)$$

For a clear comparison with conventional Grad-CAM, we refer to $G_{ij,t}^c$ as "SNN-crafted Grad-CAM" hereafter. However, SNN-crafted Grad-CAM suffers from what we term a "heatmap smoothing effect" caused by the approximated backward gradient function. To visualize the heatmap at shallow/initial layers, the gradients need to pass through multiple layers using the approximated backward function (Eq. 8). The accumulated approximation error induces non-discriminative heatmaps, as shown in Fig. 2. Note that the beginning and end of the time window have little spike activity [27], resulting in heatmaps with zero values (see Fig. 2). To validate the "heatmap smoothing effect" quantitatively, we compute the pixel-wise variance of the heatmap; a heatmap containing non-discriminative information (i.e., similar pixel values) should have lower variance. In Fig. 3, SNN-crafted Grad-CAM shows lower variance compared to our proposed SAM (discussed in the next section). Note that there are multiple heatmaps (one per time-step) in SNN visualization, so we use the maximum variance value across all time-steps in Fig. 3. Further, we note that the visualization from both SAM and SNN-crafted Grad-CAM in Fig. 2 varies across time-steps, underlining the fact that the SNN looks at different regions of the same input over time to make a prediction. Overall, a visualization tool for SNNs requires a new perspective that can circumvent the error accumulation problem in backpropagation.
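For reference, Eqs. 9-10 can be sketched as follows, assuming the per-time-step activations and their gradients have already been collected (the shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def snn_grad_cam(activations, gradients):
    """SNN-crafted Grad-CAM sketch (Eqs. 9-10).

    activations: [T, K, H, W] spike activations A^k_{ij,t} of the target layer
    gradients:   [T, K, H, W] gradients dy^c / dA^k_{ij,t} for class c
    Returns one heatmap per time-step, shape [T, H, W].
    """
    T, K, H, W = gradients.shape
    # Channel weights: gradients accumulated over time and space (Eq. 9)
    alpha = gradients.sum(dim=(0, 2, 3)) / (T * H * W)             # [K]
    # Channel-wise weighted sum of activations, clamped at zero (Eq. 10)
    return F.relu((alpha[None, :, None, None] * activations).sum(dim=1))
```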

Spike Activation Map (SAM)
We present a new paradigm for the visualization of SNNs. We do not use any class label for backpropagation and only use the spike activity in forward propagation. Thus, the heatmap is not tied to a specific class but highlights the regions that the network focuses on for any given image. Surprisingly, we observe that SAM shows meaningful visualization even without any ground-truth labels (see Section 5.2). Mathematically, our objective is to find a mapping function $f(\cdot)$ such that

$$M_t = f(S_1, S_2, ..., S_t), \qquad (11)$$

where $M_t$ is the SAM heatmap and $S_t$ is the spike activity at time-step $t$.

Figure 3. Pixel-level variance in heatmaps obtained from (a) surrogate gradient learning and (b) conversion. We report the average variance over all samples in Tiny-ImageNet. In all scenarios, SAM shows a higher heatmap variance compared to SNN-crafted Grad-CAM, which suffers from the approximated backward gradient.
In this paper, we use the biological observation that spikes with short inter-spike intervals (ISI) contribute strongly to the neural decision process [36,46,45]. This is because short-ISI spikes are more likely to stimulate post-synaptic neurons, conveying more information [4,30,46]. To apply this observation to our visualization method, we first define the temporal spike contribution score (TSCS), which evaluates the contribution of a previous spike at time $t'$ to the current time $t$ in the same neuron. The TSCS value can be formulated as:

$$T(t', t) = \exp(-\gamma (t - t')), \qquad (12)$$

where $\gamma$ is a hyperparameter which controls the steepness of the exponential kernel function.
To consider multiple previous spikes, we define a set $P_{ij}^k$ that consists of the previous firing times of the neuron at location $(i, j)$ in the $k$th channel. For every time-step, we compute a neuronal contribution score (NCS) $N_{ij,t}^k$ at time-step $t$ by adding the TSCS of all spikes in $P_{ij}^k$:

$$N_{ij,t}^k = \sum_{t' \in P_{ij}^k} T(t', t). \qquad (13)$$

Thus, a neuron has a high NCS if a large number of spikes fire within a short time interval, and vice versa. Finally, we calculate the SAM heatmap $M_{ij,t}$ at time-step $t$ and location $(i, j)$ by multiplying the spike activity $S_{ij,t}^k$ with the NCS value $N_{ij,t}^k$ and summing across the channel axis:

$$M_{ij,t} = \sum_k N_{ij,t}^k S_{ij,t}^k. \qquad (14)$$

We illustrate the overall flow of SAM in Fig. 4. For every neuron, we compute the NCS and add the values across the channel axis to obtain SAM. To elaborate, we depict two examples (case A and case B) for calculating NCS. In case A, the previous spikes fire at time-steps $t_{p1}$ and $t_{p2}$, long before the current spike time $t$; as a result, the contribution of the previous spikes is small due to the exponential kernel. In case B, $t_{p1}$ and $t_{p2}$ are close to the current spike time $t$, so the neuron has a high NCS value.
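Since the kernel sum in Eq. 13 factorizes recursively, the full SAM computation can be sketched with a running exponential decay; this is an illustrative implementation with an assumed γ value, not the authors' code:

```python
import math
import torch

def spike_activation_map(spikes, gamma=0.5):
    """Compute SAM heatmaps from a spike train (a sketch of Eqs. 12-14).

    spikes: [T, K, H, W] binary spike activity S^k_{ij,t}
    gamma:  steepness of the exponential TSCS kernel (assumed value)
    Returns heatmaps M of shape [T, H, W].
    """
    T, K, H, W = spikes.shape
    ncs = torch.zeros(K, H, W)       # running per-neuron NCS (Eq. 13)
    heatmaps = torch.zeros(T, H, W)
    decay = math.exp(-gamma)
    for t in range(T):
        ncs = ncs * decay            # sum_{t'} exp(-gamma (t - t')) over past spikes
        heatmaps[t] = (ncs * spikes[t]).sum(dim=0)  # mask by current spikes (Eq. 14)
        ncs = ncs + spikes[t]        # record this time-step's spikes
    return heatmaps
```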
Discussion: Overall, without any requirement for backpropagation or ground-truth labels, we can visualize the discriminative region using the concept of the inter-spike interval. We note that SAM cannot be applied to ANNs due to the real-valued static nature of input processing in ANNs. So far, SNNs have been explored as an energy-efficient alternative to ANNs; with SAM, for the first time, we bring out the interpretability advantage of the temporal dynamics in SNNs over static ANNs. We assert that the proposed SAM is hardware-friendly since all computations are in forward propagation. Therefore, SAM can serve as a practical interpretation tool for future neuromorphic computing applications.

Experimental Setup
Dataset and Network: To conduct a comprehensive analysis, we carefully select the dataset for our experiments. Smaller datasets such as MNIST, CIFAR10, and CIFAR100 have too low a resolution (e.g., 28 × 28 or 32 × 32) for meaningful visualization. The ImageNet dataset has a high image resolution, but directly training SNNs on it with surrogate gradients is difficult. Therefore, we conduct a case study on Tiny-ImageNet, a modified subset of the original ImageNet dataset [10]. Tiny-ImageNet consists of 200 classes with 100,000 training and 10,000 validation images at a resolution of 64 × 64 pixels. Our implementation is based on PyTorch [35]. We adopt a VGG11 architecture for both ANNs and SNNs. For the ANN-SNN conversion method, we use 500 time-steps with firing threshold scaling [16]. For surrogate gradient training, we train the networks with standard SGD with momentum 0.9, weight decay 0.0005, and 30 time-steps. The base learning rate is set to 0.1. We use step-wise learning rate scheduling with a decay factor of 10 at [0.5, 0.7, 0.9] of the total number of epochs. We set the total number of epochs to 90. We set the leak factor to 0.99 for the SNN with surrogate gradient learning and to 1 for conversion. For visualization, we uniformly sample 10 images for both surrogate gradient learning and conversion.
Evaluation Metric: To quantitatively compare the SAM visualizations of conversion and surrogate gradient learning, we use the Grad-CAM obtained from an ANN as a reference. To quantify the error between SAM and Grad-CAM, we compute the cross-entropy between the predicted SAMs (one SAM per time-step) and the ANN's Grad-CAM at every time-step. We then take the minimum error across all time-steps and define this minimum value as the localization error.
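A sketch of this metric follows, assuming both heatmaps are normalized into probability distributions over pixels (the normalization scheme is our assumption):

```python
import torch

def localization_error(sams, grad_cam, eps=1e-8):
    """Minimum cross-entropy between per-time-step SAMs and an ANN Grad-CAM.

    sams:     [T, H, W] SAM heatmaps (one per time-step)
    grad_cam: [H, W]    reference Grad-CAM heatmap from the ANN
    """
    p = grad_cam / (grad_cam.sum() + eps)                  # reference distribution
    q = sams / (sams.sum(dim=(1, 2), keepdim=True) + eps)  # one distribution per t
    errors = -(p * torch.log(q + eps)).sum(dim=(1, 2))     # H(p, q_t) for each t
    return errors.min()                                    # minimum over time-steps
```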

SAM: Unsupervised Visualization Tool
In Fig. 5, we visualize the qualitative results of SAM on SNNs trained with surrogate learning as well as ANN-SNN conversion. We also show the Grad-CAM visualization obtained from a corresponding ANN for reference. Note that SAM does not require any class label (i.e., it is unsupervised), unlike Grad-CAM, which uses ground-truth class labels. Interestingly, the heatmaps obtained from SAM across different time-steps on SNNs show results similar to Grad-CAM on ANNs, where the region of interest is highlighted in a discriminative fashion. This supports our assertion that SAM is an effective visualization tool for SNNs. Moreover, the results imply that ISI and temporal dynamics can yield interpretability for deep SNNs.

Surrogate Gradient Learning vs. Conversion
We compare the SAM visualization results of surrogate gradient learning (Fig. 5(c)) and conversion (Fig. 5(d)). From the figure, we observe a trend in the heatmap visualization of surrogate gradient learning: zero activity at early time-steps, leading to discriminative activity in the mid-range, followed by zero activity again towards the end. In contrast, conversion maintains similar heatmaps during the entire time period. This is related to the variation in spike activity at each time-step, as shown in Fig. 6(b). Since surrogate gradient learning considers temporal dynamics during training [27,37], each layer passes information (i.e., the number of spikes) consecutively. On the other hand, conversion does not show any temporal propagation. Moreover, we observe that surrogate gradient learning yields more accurate heatmaps (i.e., similar to the Grad-CAM from the ANN), highlighting the region of interest across all layers. Notably, the conversion method highlights only partial regions of the object (e.g., lemon) and, in some cases (e.g., bird), the wrong region. This observation is supported by the localization error comparison in Fig. 6(a): for all layers, surrogate gradient learning shows lower localization error. It is well known that conversion methods do not account for any temporal dynamics during training [37]. We believe this missing temporal dependence accounts for the lower interpretability. Thus, we assert that SNNs obtained with surrogate gradient learning (incorporating temporal dynamics) are more interpretable. Therefore, all visualization analyses in the next sections focus on the surrogate gradient learning method.

Intermediate Layers of SNN
So far, no studies have analysed the underlying information learnt in different layers of an SNN. It has always been assumed that SNNs, like ANNs, learn features in a generic-to-specific manner as we go deeper. For the first time, we visualize the explanations at intermediate layers of an SNN using SAM to support this assumption. In Fig. 5 (see SAM-Surrogate results), the SAM visualization shows that shallow layers of SNNs represent low-level structure while deep layers focus on semantic information. For example, layer 4 highlights the edges or blobs of the lion, such as its eyes and nose, whereas layer 8 highlights the full face of the lion.

Effect of Leak in SNN
We analyze the effect of the leak factor λ (Eq. 2), one of the important parameters in SNNs. The leak parameter λ (0 < λ ≤ 1) controls the forgetting behavior of LIF neurons, similar to the human brain; a high λ means less forgetting. To explain the effect of leak on visualization, we measure the localization error for different leak values [0.7, 0.8, 0.9]. Table 1 shows that a high leak parameter λ achieves low localization error. This is because a low λ forgets the stored voltage in a neuron (i.e., information) within a few time-steps and thus cannot produce any reasonable spike activity or visualization. We also report the classification accuracy on Tiny-ImageNet in Table 1. The results show that a low λ induces a drastic accuracy drop due to the excessive forgetting behavior. Overall, appropriate leak selection is important to achieve accurate localization/visualization as well as good performance.

Effect of Hyperparameter γ
We conduct ablation studies to understand the effect of the hyperparameter γ on SAM in Eq. 12. The γ value decides the steepness of the exponential kernel function in TSCS. A kernel with high γ takes into account only the recent spike trajectory, whereas a low γ considers a longer spike history. In Fig. 8, we visualize the localization error with respect to γ for different layers in VGG11 for the conversion and surrogate gradient methods. For both methods, γ = 0 shows the highest localization error since the kernel does not filter out redundant long-ISI spikes. Another interesting observation is that the localization error also increases for large γ values (e.g., 1.0). This is because a high γ limits reliable visualization by considering only very recent spikes, ignoring the spike history to a great extent.

Adversarial Robustness of SNN
Previous studies [43,42] have shown that SNNs are more robust to adversarial inputs than ANNs. To observe the effectiveness of SNNs under attack, we conduct a qualitative and quantitative comparison between Grad-CAM and SAM. We attack the ANN using the FGSM attack [15] and the SNN using the SNN-crafted FGSM attack [43], both with ε = 4/255. In Fig. 7, we observe that Grad-CAM shows a large change before/after the attack, whereas SAM shows almost identical results. Moreover, in Fig. 9, we show the classification accuracy with respect to the attack intensity and the normalized L1 distance between the heatmaps of clean and adversarial images at ε = 4/255. The results show that the SNN is more robust than the ANN in terms of both accuracy and visualization. Therefore, using an SNN with SAM in a security-critical system (e.g., military defense) will be a huge advantage in terms of robust interpretation.
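For reference, the standard FGSM attack [15] used on the ANN side can be sketched as follows; the SNN-crafted variant [43] additionally accounts for spike-based inputs and is not reproduced here:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, eps=4 / 255):
    """One-step FGSM: perturb inputs along the sign of the input gradient."""
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()  # gradient-sign perturbation
    return adv.clamp(0, 1).detach()          # keep pixels in the valid range
```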

Sensory Suppression Behavior of SNN
Neuroscience studies have suggested that the human brain exhibits "sensory suppression" [24,25,50]: the brain focuses on one of multiple objects when these objects are presented at the same time. Coincidentally, with SAM, we observe that SNNs also emulate sensory suppression when presented with multiple objects. To show this, we concatenate two randomly chosen images from the Tiny-ImageNet dataset and pass the concatenated image into the SNN trained with surrogate gradient learning. Interestingly, as shown in Fig. 10, the results show that neurons compete in the earlier time-steps for attending to both objects and finally focus on only one of the objects at later time-steps. Note that, for each image, the final prediction from the SNN matches the final attention shown by SAM. These results reveal the bio-plausible characteristics of SNNs and also establish SAM as a suitable interpretation tool.

Conclusion
In this paper, we propose a visualization tool for SNNs, called SAM. Unlike conventional ANN visualization tools, SAM requires neither target labels nor backpropagated gradients. Instead, we use the temporal dynamics of SNNs to compute a neuronal contribution score in forward propagation based on the history of previous spikes. Without any label, SAM highlights the discriminative region for prediction. Through extensive experiments, we show the functionality of SAM in various configurations of SNNs. Overall, SAM opens up the possibility of interpretable neuromorphic computing.

Figure 1 .
Figure 1. Illustration of forward propagation (blue arrow) and backward propagation (red arrow) of an LIF neuron. During forward propagation, the membrane potential increases according to the pre-synaptic spike input. If the membrane potential exceeds the firing threshold, the LIF neuron generates a post-synaptic spike and the membrane potential is reset. This integrate-and-fire behavior makes the membrane potential non-differentiable. Therefore, surrogate gradient functions are used to implement the backward gradient.

Figure 2 .
Figure 2. Visualization of SNN-crafted Grad-CAM and SAM at Conv4 in VGG11 on the Tiny-ImageNet dataset. We use surrogate gradient training (the conversion method shows similar results). The approximated backward gradient function in SNN-crafted Grad-CAM induces the "heatmap smoothing effect". In contrast, the proposed SAM visualization highlights the discriminative region of the image.

Figure 4 .
Figure 4. Illustration of the spike activation map (SAM). For each channel, we compute a neuron-wise contribution score. After that, we sum all neuronal contribution score (NCS) maps along the channel axis. The NCS for each neuron is based on the previous spike trajectory. For every spike, we define a temporal spike contribution score (TSCS) with an exponential kernel. We take TSCS from previous spikes into account to compute NCS.

Figure 5 .
Figure 5. Visualization of the internal spike representation of VGG11 using SAM at layer 4, layer 6, and layer 8. We show the visualization for 10 uniformly sampled time-steps. It is worth mentioning that Grad-CAM exploits ground-truth labels, but our SAM can be obtained without any label information.

Figure 6 .
Figure 6. (a) Localization error at each layer and (b) spike activity across time-steps for surrogate gradient learning and conversion.

Figure 7 .
Figure 7. Visualization of the robustness of SAM. (a) Original image. (b) Grad-CAM from ANN on clean input. (c) Grad-CAM from ANN on adversarial input. SAM from SNN trained with surrogate gradient learning on (d) clean inputs and (e) adversarial inputs.

Figure 9 .
Figure 9. Classification accuracy with respect to the FGSM intensity. We also compute the normalized L1 distance between heatmaps for clean and adversarial inputs at ε = 4/255. For the SNN, we report the maximum difference among multiple time-steps.

Figure 10 .
Figure 10. Visualization of SAM for multi-object images.

Table 1 .
Ablation studies on leak factor λ. We show localization error and classification accuracy with respect to λ.