Abstract
Extracellular data analysis has become a quintessential method for understanding the neurophysiological responses to stimuli. This demands stringent techniques owing to the complicated nature of the recording environment. In this paper, we highlight the challenges in extracellular multielectrode recording and data analysis as well as the limitations pertaining to some of the currently employed methodologies. To address some of the challenges, we present a unified algorithm in the form of selective sorting. Selective sorting is modelled around hypothesized generative model, which addresses the natural phenomena of spikes triggered by an intricate neuronal population. The algorithm incorporates Cepstrum of Bispectrum, ad hoc clustering algorithms, wavelet transforms, least square and correlation concepts which strategically tailors a sequence to characterize and form distinctive clusters. Additionally, we demonstrate the influence of noise modelled wavelets to sort overlapping spikes. The algorithm is evaluated using both raw and synthesized data sets with different levels of complexity and the performances are tabulated for comparison using widely accepted qualitative and quantitative indicators.
Introduction
Neurophysiological studies are of paramount importance in revealing the underlying behaviours and properties of neurons and eventually providing a good understanding of the human nervous system. The studies have proved to be very important in the development of neuroprosthetics and Brain Machine Interface (BMI) devices. As an example, intraneuronal recordings from the primary motor cortex have been investigated to develop neural decoders that can eventually drive artificial prostheses or machines^{1}. Further, the contribution of these studies in understanding neurological disorders are extremely valued, especially, the use of intracranial electrodes to gather information pertaining to epileptic patients^{2}. Indeed, MEA’s have been employed to understand the influence of gammaprotocadherine, which regulates the endurance of a neural network and the generation of new synapses^{3}.
One of the key aspects of neurophysiological studies involves the tapping of intraneuronal signals, so as to decipher the neural networks collective behaviours without disrupting their natural functioning. Extracellular recordings are the preferred techniques to aid in neurophysiological studies, and the recordings can be mainly grouped into two categories, that is: invivo (invasive) and invitro (noninvasive). Invivo recording techniques use a microelectrode like probe or a tetrode (probe with four electrodes) to be surgically implanted onto a region of observation, in which intraneuronal activities are recorded^{4}. In contrast, invitro recording techniques use a microelectrode array (MEA) with the cell samples cultured in a petri dish^{5}. Similarly, active cell specimens from animals are collected and placed on the microelectrodes from which the intraneuronal activities are recorded^{6}.
Problem Statement
Irrespective of the recording techniques used, the intricate nature of the nervous system poses major problems during tapping and processing of intraneuronal signals. The main attribute of any intraneuronal activity is the pattern made up of action potential followed by a refractory period, which is referred to as a neuronal spike^{7,8}. A major problem associated with the processing of any intraneuronal recording is that each electrode is subjected to more than one neuronal activity at any instance^{9}. The electrode closer to a neuron renders stronger signals to be picked up by the channel, whilst action potential from neighbouring neurons superimposes upon the stronger ones contributing as noise to the channel. Additionally, noise may also be contributed by the recording unit, processing system and the surrounding environment^{10} resulting in uncharacteristic spike events. To interpret the collective behaviour it becomes imperative to distinguish each neurons activity both in time and space under reduced influence of noise.
Existing Procedures and Drawbacks
The process of identifying the number of neurons and the spike times associated with each neuron is referred to as spike sorting. The major challenges in our sequential spike sorting algorithm can be broadly summarized as follows^{4,9}.
Detecting the instances at which the spiking activity appears in each voltage channel of the recording device;
Extracting the spike shapes for clustering; and
Estimating overlapping spikes.
Each of the aforementioned steps depend on the results of its preceding steps. The precision of results at each step is of utmost importance as any error accumulates through every step, thus degrading the performance of the algorithm. Simple spike detection techniques employ a hard thresholding, which is a straightforward technique. Each channel voltage is examined and through visualisation a statistical estimator such as: standard deviation of the channel signal^{8,11}, root mean square^{4,12} or standard deviation of the background noise^{13,14} is employed to identify a spike event. Generally, the performance of these simple techniques degrades under low SignaltoNoise Ratio (SNR)^{15}. A more comprehensive method is described in ref. 16 where a windowbased spike detector is proposed. Specifically, a preset window of defined duration scans for a positive peak followed by negative peak and corresponds the span to be a spike waveform. Nevertheless, the extracted spike duration represents only one spike and any secondary positive peaks will be ignored. Overlapped spike shapes with their unrepresentative appearance, as compared with typical spike shapes are considered as noisy or distorted waveforms.
Assuming that the spikes were detected comprehensively, appropriate feature vectors are required to be identified prior to clustering. Initially, spike shapes are extracted using windowed discriminators, as discussed in the previous paragraph. And to highlight the features of interest, the extracted spikes are subjected to a suitable transformation. Principle component analysis has been widely adopted in selecting the prominent features^{4,17}. Alternatively, a random number of features that display higher deviation from normality, following a normality test, are chosen as the inputs for clustering^{13,18}. It has been observed that the deviation results are dependent on the transformation method used on the data set and no standard is defined to test the quality of the selected features. Besides that, a poor data transformation could result in poor selection of features and could hinder the performance of clustering.
Standard clustering algorithms including kmeans, partition around medoids (PAM) and hierarchical clustering require prior knowledge of number of clusters, that is the K value^{19}. The lack of ground truth forces one to use bruteforce experiments to approximate the Kvalue. This dilemma of choosing an optimal K value leaves the general clustering algorithms not well disposed to sort neuronal spikes. Besides, many shortcomings of kmeans are described in ref. 20, making it a even less favourable spike sorting method. To overcome these shortcomings, many novel clustering algorithms have been proposed, such as Wave_clus, which is an opensource spike sorting program that incorporates the super paramagnetic clustering (SPC) algorithm^{13,21}. Klustakwik is another opensource software based on genome clustering and CEM which uses the idea of classification by expectation maximization^{4}. The Ordering Points To Identify Clustering Structure (OPTICS) is another algorithm^{22} developed to compensate the complex feature selection processes^{8}. Despite the availability of such power clustering procedures, they are often vulnerable under overlapping spike shapes.
Traditional approach employ Mclust, a manual procedure of forming clusters. Further, the clusters are visually examined and a clear uncontaminated spike shape is identified to represent the subset of a spike waveforms^{11}. The representative spike waveforms were subtracted from an overlapping event and the one which resulted in the least channel voltage error was considered as the best match. This manual clustering and the arbitrary matching procedure based on spike waveforms voltage distribution across a channel, would not yield a good result under complex neuron populations and long recording durations. This shortcoming is addressed in ref. 23, where the background noise covariance is used to enhance the principal spike component in a channel, by targeting some specific spikes shapes. The model proposed in ref. 8 is based on the maximum likelihood estimation incorporating noise covariance characterization into their matching process. However, the algorithms lacks a definitive generative model, does not address any previous ground truth estimation processes and, as such the model is less attractive.
A more sophisticated model is proposed in refs 24 and 25, which assumes the channel voltage to be the result of a convolution between impulse spike train and spike shapes. The process also considers background noise distinction and greedy matching procedures to segregate overlapping spikes. The basic notion of the greedy matching procedure simply identifies peaks as spike events and tends to recognize any detected events into a predefined groups formed by clustering algorithms. The lack of a comprehensive thresholding technique means that the greedy matching procedure identifies far too many false positives.
One common drawback of all the aforementioned models is that they fail to rationally address the spike event detection. Therefore, when the channel SNR is low these models perform ineffectively.
Proposed Approach and Improvements
The proposed selective sorting model addresses the problem of spike detection in novel way. We have adapted the concept of Cepstrum of Bispectrum (CoB) based spike detection owing to its effective results even at a low SNR^{26}. Windowbased spike waveform extractors are used to extract spike shapes in the detected region^{16}. Instances at which spikes are detected are stored as index information, which is later used during statistical estimation. The spikes are subjected to wavelet transformation and a test of normality is employed to choose the best features for clustering^{18}.
A novel probability density function (pdf) based technique is introduced to visually examine the quality of the chosen features. The chosen features are subjected to three different clustering procedures including SPC^{21}, Klustakwik^{25} and OPTICS^{22,27,28}. The results are statistically compared, and the one with a better overall score is chosen for estimation of overlapping spikes. The statistical estimation introduced here incorporates the principal spike shape employing linear regression, noise distinction based filtration technique, which is followed by matching and iterative elimination through a model formulated in refs 8 and 23. The advantage of this algorithm is, instead of greedily choosing the spikes with the maximum peak^{8,24,25} the algorithm is restricted to only those indices detected during spike detection.
Selective Sorting Algorithm
Figures 1 and 2 summarizes the Smith and Mtetwa’s^{29} model for generating extracellular signals which is used as a basis for tailoring the proposed algorithm. An overview of the selective spike sorting algorithm is described in Fig. 3. Accordingly, the voltage information v(t) on a single channel can be formulated using (1) as
where n is the number of neurons assumed to be spiking over discrete time t, N is total number of neurons under consideration, s_{n} is the spike train of the nth neuron and η(t) is noise. Spike train s_{n} can be realized as representing the voltage information for any nth neuron without system noise η as
where w_{n}(τ) is a distinct spike waveform associated with the nth neuron, τ is the length of each spike waveform, which is approximately 2.5 ms or 60 samples at 24 kHz sampling rate. The proposed algorithm carefully analyses the synthesizing process described by 1 and 2. At every step, we aim to distinguish the input impulse sequence δ_{n}(t) from the extracellular voltage ν(t). To facilitate this process establishing logical ground truths is very important, which can be conveyed by an approximate estimate of the number of neurons and their spiking events δ_{n}(t).
With ground truth estimation as our foremost motivation and, considering the fact that ν(t) is made up of a combination of spike train s_{n}(t) from n neurons and system noise η(t), as described in 1. We examine the spike sorting process by assuming an ideal case i.e. in a noisefree environment where only a single neuron is actively spiking during the recording process. As such, for n = 1 and η(t) = 0 in 1, ν(t) = s_{n}(t) = s(t) and . Extending this method for real case scenarios, where more than one neurons have contributed to the channel voltage, represents a combined input sequence, i.e. if n ≥ 1 then .
Given , one simple method to segregate the combined sequence into n neuronal sequence δ_{n}(t) is to identify all M spike shapes and group the similar ones based on a clustering procedure. It should be noted that not all spike shapes can be grouped as they include many overlapped spike shapes. The defined groups are used as a base to approximate the n value that is δ_{n}(t) from 2. This is under the assumption that represents the mean or an average spike shape for each group similar to w_{n}. This leads to the probability of finding any nth spike waveform w_{n} given which follows a simple likelihood principle as
For a valid the chances of finding the nth neuron follow the Bernoulli’s principle as
where f refers to the transfer function defined by 2. By estimating the likelihood of all n neurons at all , the one with the highest probability constitutes the ideal choice, which reflects on the respective δ_{n}(t) accordingly.
Methods
We summarize the concept of CoBbased spike detection from^{26}. The CoB technique assumes a model where the channel voltage ν(t) is the resultant of a binomial incoming process δ_{n}(t) filtered through f(t), filter transfer function made up of the intraneuronal spike shape w_{n}(τ) and the spike transfer characteristic defined in 2, summed with noise η(t). By filtering ν(t) through an inverse filter f^{−1}(t), the input sequence can be recovered along with its noisy component.
Estimating the combined impulse sequence
If f(t) represents the filter transfer function in the time domain whereas F(n) in the frequency domain obtained by taking the Fourier transform of f(t). Bispectrum B_{ν}(n, m) of voltage ν(t) is estimated by taking the third moment of 2dimensional Fourier transformation at frequency components n and m using 5 as
Ceptrum can be computed by taking an inverse Fourier transform in logarithm of 5 at frequency m as
or by assuming V to be result of 2 in which case 6 can be rewritten as
where ξ represents the skewness of δ(t). The term F(n) can be computed by solving 7 for log as
Now f^{−1}(t) can be computed by taking the inverse Fourier transform of F^{−1}(n). To recover the input sequence δ_{n}(t) from its noise term, the filtered signal is further subjected to a stationary discrete wavelet transform using the coiflet wavelet δ_{n}(t) owing to the fact that CoB just identifies the spike times but does not segregate it to their respective neurons.
As an example, we synthesize a single channel voltage data v(t) using the model described in ref. 29. For each of the distribution pattern shown in Fig. 4(a) our synthesizing model uses two distinct spike shapes displayed in Fig. 4(b). The model establishes the stringent case of overlapping and the capability of CoB to identify two overlapping spike events with a near synchronous overlap as demonstrated in Fig. 4(c). The resulting voltage v(t) resembles as shown in Fig. 4(d).
Establishing and approximating the ground truths
From 2, the recovered sequence is modelled such that the spiked instance t is set to 1 as presence and 0 as absence. The following procedure relays the methodology to distinguish this sequence into their respective neurons δ_{n}(t).
Spike waveforms extraction
The windowed discriminator technique similar to the one described in ref. 16 is used to extract spike waveforms. For each instance of a waveform of approximately samples are extracted where τ ranges between 1 and . Secondary spike found within are neglected and only the maximum peak for each waveform is considered. For uniformity and simplification of the feature selection procedure the waveforms are organised with all their peaks lined up as shown in Fig. 5.
Construction of feature set
Feature selection follows the same procedure as described in ref. 13, where all M number of extracted spike waveforms, , are decomposed using Haar wavelets. The feature set is constructed by identifying 10 best features, which have a better deviation from normality. It is also possible that while some features are identified with better results in KStest, they do not favour clustering. Therefore, each identified feature set is cross verified by plotting against their respective pdfs using the following expression
where, ψ_{τ} represents the wavelet transform at any τ over the dimension M, while and are the respective standard deviation.
Clustering
Three major clustering algorithms specially designed to enhance spike sorting are described. It is worth pointing out that real spike data always differ from synthetic spike data in terms of SNR, ground truths, and their unknown statistical distribution. The features selected in the previous step are subject to all the following clustering algorithms to better approximate the ground truths.
Superparamagnetic clustering
Superparamagnetic clustering (SPC) is devised around the concept of the ising Model^{27,30} as in a chemical bonding of any lattice structure. Instead of restricting the number of states to just two (+ or −) qstates are introduced as in the potts model^{31}. Each of the M waveforms is initially assigned to one of the qstates. The Euclidean distance e_{i,j} of all M spikes are estimated and the shortest path is derived by assuming that the neighbours are formed within a specified K value as
where i, j = 1, 2, ··· M. The interaction strength J_{i,j} is evaluated as
where is the average neighbours identified for each i and j and a is the average of e_{ij}^{27}. The main feature of SPC is that by varying temperature T from low to high value over several Monte Carlo simulations, the system undergoes magnetisation changes traversing from the ferromagnetic state to the paramagnetic state. The states simultaneously flip and take up a different q–value and the new states are defined at the superparamagnetic phase forming the required clusters^{30,31}. The probability of two neighbouring features sets change their states s_{i} and s_{j} to a new state, which is determined by the thermal average of pointtopoint correlation function as is defined in refs 12 and 13. This probability is expressed as
Klustakwik
Klustakwik has been an integral part of the spike sorting algorithm proposed in refs 12 and 32. The CEM algorithm by Celeux and Govaert^{33} is used as a platform to construct Klustakwik ver–1.5 an unsupervised clustering algorithm. The Expectation Maximisation method is adopted to estimate the maximum likelihood by incorporating classification between the expectation and maximisation method.
Klustakwik partitions a feature set into K partitions by iteratively splitting a defined cluster or deleting and reassigning points from a cluster and simultaneously monitoring whether any of the actions improve the performance. Many versions of Klustakwik have been developed with its predecessor constructed on the genome sequence clustering (gclust) platform and are publicly accessible^{4,28}.
OPTICS
The OPTICS algorithm was developed to abstain from using the tedious feature selection processes^{8}. This algorithm partitions the data solely on the basis of set theory and Euclidean distance between the waveforms w_{n}(τ). A prespecified number of the minimum data samples K to be present in any group is used to estimate the boundary. The border samples which do not comprehensively satisfy the boundary conditions are not grouped^{22}.
Statistical estimation
From 3 to estimate the likelihood, we need to establish a subject which clearly distinguishes noise from the real data by isolating the overall noise characteristics from the clustered information. Putative spike waveforms provide a platform to extract the noise attributes and by filtering ν(t) through a transfer function modelled using the covariance of noise. It is possible to enhance the components of in ν. The likelihood L_{n} is computed for each n neuron. Following the Bernoulli’s principle an n^{th} L with better probability is considered as a valid match and its respective sequence δ_{n}(t) is set to 1 to acknowledge the presence of spike. The statistical estimation steps for sorting overlapped spikes can be summarized as
Estimating putative spike waveforms .
Estimating noise ν_{Δ}(t) and its covariance.
Computing the coiflet type filter transfer function and filtering the channel voltage .
Identifying the nonclustered index from and performing a likelihood estimation for the probability of finding the nth waveform and updating the respective input sequence δ_{n}(t).
Estimating putative spike waveforms
From 2, it can be assumed that a spike train s_{n}(t) is associated with a distinct spike shape w_{n}(τ) representing an important characteristic of a neuron. Therefore, the putative spike waveforms estimated by taking an average of spike shapes from each cluster approximate closely to the distinct spike shape. The clustering information is used to calculate spike impulse response δ_{n}(t) of each neuron as
Each cluster is assumed to be originating from a particular neuron. As such, the indices of clustered spike waveforms are set to 1 and the rest to 0 over a complete t axis. The least squares linear regression method^{24} is adopted to estimate the putative spike shapes as presented in 13. This is achieved to accommodate the transfer function described in 2 as opposed to the convolution model.
where is the cross correlation between impulse response δ_{n(τ)} and the channel voltage ν(t) for any nth neuron, is the autocorrelation of δ_{n}(τ), + indicates the toeplitz matrix and\is the pseudoinverse function. The indices detected correspond to the peak of the spike waveforms action potential and not the start of the spike waveform. Therefore the length of spike waveform τ has to be carefully adjusted between −ve and +ve lags such that the peak is at the 0.
Noise covariance estimation
All the unwanted data from the channel voltage ν(t) are considered as noise. This includes system noise η(t) and the unclustered spike waveforms. A simple way to extract noise is to target those clustered indices and eliminate their spike shapes from ν(t). The residual ν_{Δ}(t) contains noise, the unclustered spike waveforms along with difference between and at each clustered index. Auto covariance of the residual noise ν_{Δ} represents the noise covariance of any particular channel as
Estimating coiflet type filter transfer function
Coiflet type wavelets are generally used to compute wavelet transform^{34,35} of a signal which can enhance the signal strength over its noise component. By filtering the channel voltage ν(t) through the coiflet filter constructed using noise covariance of a finite length m, it is possible to accentuate the presence of putative spike shape features. The Toeplitz matrix of is initially computed and the centre column from the matrix defines the coiflet transfer function f(m). Figure 6(b) shows the spike shapes extracted from filtered and Fig. 6(c) displays the estimated wavelet resembling noise characteristic filter. The filtering process has an overall effect on ν(t) therefore the putative spike shapes are reestimated to produce .
Prediction and elimination
With the new putative spike shapes , filtered voltage and target instances , the maximum likelihood estimation described in 3 can be rewritten to accommodate for L_{n}, i.e. the likelihood estimated for each n neurons as
L_{n} maximises the chances of finding from the filtered voltage at any tth instance of .
Note that δ_{n} described in 1 and 2 follows the Bernoullis principle where at any t the presence or absence of is indicated by δ_{n}(t) equal to 1 or 0. With this assumption 16 becomes
Assuming the condition for δ_{n} = 1 and substituting 4 into 17 and considering the log likelihood, we obtain
Neglecting the scaling factor of in 19 we have
It should be noted that the entire operation is performed only for all and assuming that δ_{n} = 1 for each n neurons. According to the Bernoulli principle, the matching process follows a Bayesian distribution according to which there is either a chance of being present at that instance or not. At any the maximum of the nth L wins the prediction and its respective δ_{n}(t) is set to 1.
Data
Two categories of data sets are used to test the performance of the proposed system. The data sets were chosen such that the degree of complexity could be augmented upon successful and promising results.
Synthetic data
A synthetic data set C_Difficult1_noise(xx) (abbreviated as D1n(XX)) from^{36} was used to test various phases of our algorithm. The ubiquitous nature of the availability of this data set with ground truth information makes it a popular choice for performance evaluation and comparison. Note that “XX” corresponds to the noise levels relative to the amplitude of the spike classes. Four data sets with noise levels 0.05, 0.1, 0.15 and 0.2 were employed for comparative performance analysis. The data set is constructed with three distinct spike shapes sampled at 24 kHz with a known number of neurons (3 in this case) and overlapped spikes.
Extracellular data
It is observed that invivo recordings are not stable; therefore invitro recordings are preferred in testing and development of any such algorithms^{37}. Henceforth, we chose to evaluate the algorithm using publicly accessible data sets^{36} as well as multielectrode raw data from an amphibian retina^{6,38}. The eye of an amphibian animal is enucleated and highly precision surgical equipments are used to isolate the lens of the eye and the cornea from its posterior half. The eyecup is now filtered to extract the retina specimen form the surrounding vitreous. And finally, through careful dissection the pigment epithelium is removed from the eyecup resulting in a retina specimen of approximately 1.5mm radius. The dissection was performed in ringer’s solution which will be transferred along with the extracted specimen on to the electrode array recording bin.
The recordings were performed by Meister et al.^{6,38} using a dense electrode array comprising of 61 electrodes. The electrode array was fabricated by Regehr et al.^{39}, designed by Pine and Gilbert^{40} and the development was supported by Caltech and Stanford Center for Integrated Systems. The dimension of each electrode was approximately 5 μm radius and the spacing between the electrodes was 70 μm. Figure 7 is an overview of arrangement of electrodes in the MEA. During the recording process, the isolated cells are kept alive through an oxygenated solution and the retina cells are subjected to different coloured frames projected using an RGB display monitor. The electrical activity due to the stimuli response for each frame is recorded across an MEA comprising of 61 electrodes^{6,38,40}.
To establish the data set, we examine the spiking activity of 7 cells across all 61 electrodes, the recording comprises of 1.5 million samples. To narrow the importance of our algorithm on superseding spikes we observe the activity of cells 4 and 5 as shown in Fig. 7. The traces of spikes for cells 4 and 5 have equal chances of being found on a certain channel indicated by spikes with the highest amplitude. Therefore, this channel (identified as the 28th channel) presents sufficient chances of spike shapes from either cell to supersede one another.
Results and Performance Evaluation
The algorithm is examined through both quantitative and qualitative analyses and the results are compared to evaluate the performance of the algorithm. The synthetic data sets are the preferred choice for preliminary evaluation due to the availability of the ground truth information required to stepwise testing and calibration of the algorithm. The general quantitative analyses mentioned in refs 8, 25 and 26 include calculating the number of true spikes, number of false spikes, number of neurons approximated, number of correctly identified overlapped spikes and number of correctly sorted overlapped spikes. Qualitative analysis is the best way to compare the recovered spike shapes and spike trains especially with the raw data when there is inadequate information. The raw data relies on approximated generative information at every stage of the analysis procedure. Albeit, the raw data set used in this context provides partial generative information and initial data analysis results. We have compared our results with the original results which show substantial and effective improvement over its original predecessor.
The results of spikes detected by selective sorting is compared with a number of popularly adopted techniques including noise standard deviation, root mean square of channel voltage and standard deviation of the channel^{4,8,12,13,14}. Figure 8(a) demonstrates the superiority of selective sorting to identify a larger number of spikes. Figure 8(c) shows the number of falsely identified spike events, which validates the accuracy of selective sorting to correctly identify spike events. To maintain uniformity Fig. 8(a,c) are constructed using the same data set. Figure 8(b) shows the number of spike events detected for the raw extracellular data set. Although, the noise standard deviation displays an improvement in the number of spikes detected, the higher false positive readings in Fig. 8(c) reduces the performance of the technique, making it less favourable. Additionally, Fig. 8(c) is accountable owing to its known ground truth information, which is not available for the raw data set.
Table 1 demonstrates a summary of results obtained by various ad hoc clustering approaches and the clustering procedure employed in selective sorting, along with the original information used to simulate the data set. Ideally, the spike sorting algorithms should aim to achieve the results similar to the ground truth information. The results in Table 1 compare the number of spikes allocated into any cluster and the number of clusters formed by any clustering approach for a particular data sets. For example, for the data set D1n − 0.1, Wave_clus produces 5 clusters with 639, 470, 653, 424 and, 1059 spikes respectively and, the * indicates an overlap spike cluster. The results for Wave_clus and Klustakwik were generated using the online portal^{41}, which uses features selection via the KStest. The result for OPTICS were generated using the conventional method which uses no feature selection processes^{8}. It is evident from the results in Table 1 and Fig. 9 that the clustering process employed in selective spike sorting is very effective in identifying appropriate features, defining clusters corresponding to the right number of neurons and displays better sensitivity to overlapping spikes.
The greedy pursuit algorithms: Continuous basis pursuit (CBP), binary pursuit and Bayesian algorithm based greedy algorithms fairly share the same basic principle in their methodology, albeit, the results vary marginally depending on the algorithm used. For comparison purposes, we implemented the greedy pursuit method as described in ref. 25. Figure 8(d) highlights the final results in terms of number of spikes after overall computation of selective spike sorting, clustering algorithms and greedy pursuit. The superior advantage offered by greedy pursuit and selective spike sorting to sort overlapping spikes over other clustering algorithms could be visualised from the graph in Fig. 8(d). Further, it can be inferred from the results summarized in Table 2 that, the overall performance of selective sorting is very effective owing to its impressive number of sorted spikes to number of false positive’s ratio and rightfully targeting the genuine spikes as compared to greedy pursuit.
For the raw data set, the spike detection technique employed in selective spike sorting identified 38,484 spike events and all the spike waveforms were extracted using a window of 31 samples with their peaks aligned at the 12th index, as shown in Fig. 10(a). Interpolation was employed to improve resolution of each spike waveform and appropriate attributes were extracted for clustering. Figure 10(b) shows three clusters and the overlapped spikes window, following a full exploitation of super paramagnetic clustering and feature selection. Figure 6(a) shows the putative spike shapes, estimated for each cluster using the simple regression method. The background noise was tailored to represent a coiflet type wavelet as shown in Fig. 6(c). The original data was optimised by stationary decomposition of the original data through the estimated wavelet. The spike shapes extracted using the decomposed original data are shown in Fig. 6(b).
The prediction and elimination algorithm statistically incorporates the spike rates 0.0015, 0.0007 and 0.0001 estimated for each cluster, respectively, to individually target the correlated spike shapes. The effect of the likelihood principle is displayed in Fig. 11. Figure 11(a,b) display a strong similarity with spikes in their respective clustered groups while Fig. 11(c) depicts the effect of overlapping, where the spike shapes of clusters1 and clusters2 establishes a strong correlation with those of cluster3.
The significance of the proposed algorithm can be realized through following analysis. We use the same procedure as in refs 3 and 38 which involves approximating the stimuli response of each cell to an evoked frame on the display, so as to locate the spatial and temporal response fields of the retina. Figure 12 shows two conditions of the recovered spatiotemporal stimulus response of cell 4. The image quality in Fig. 12 (top) is high as compared to that in Fig. 12 (bottom), owing to fact that Fig. 12 (Top) is constructed using 16000 stimuli responses while Fig. 12 (bottom) is constructed with just 4000 stimuli responses. The enhanced image quality in Fig. 12 (Top) was a result of algorithms capability to successfully identify and sort the overlapping and distorted spikes. This increase in true spikes to false positives ratio added extra pixels to the image, thereby receptive fields were more distinct.
One of important attributes of the algorithm is its ability to address correlated spike waveforms. A commonly accepted qualitative evaluation method in the absence of ground truth is by percentage similarity estimation i.e. coefficient of determination and another important technique is to employ correlation analysis^{24,42}. To be adaptable for either of the analysis methods, we generate a pseudo temporal voltage information similar to synthetic data generation technique described in^{36} to compete with the original data. For each identified neuron, their individual voltage contribution is estimated, in an effort that the sum of individual voltage’s should resemble the original data V_{m}. To do so, a temporal binary impulse response map δ(n, t) of each neurons spiking activity on an electrode is created. The estimated putative spike shape of each cluster on any channel as shown in 11a is fit into impulse area of temporal region such that
where m represents the channel identity or electrode number, n is the approximated neuron number and t is the time sample in integer. A channel voltage matrix at any spiking event is prepared to observe the correlation activity as shown in Fig. 10(a) for cell 4 and 10(b) for cell 5. As an example, Fig. 10 displays peak spikes on channel 28 for both cell 4 and cell 5 indicating a correlation activity. This information is in turn used to estimate their respective impulse response and eventually to calculate individual temporal spike train or voltage response using 21.
Comparing before and after processing the data V_{m} through the algorithm, correlation index is estimated for individual channels by identifying correlated channel and the respective neurons contributing to the channel. The results for the identified correlated channel 28 is shown in Fig. 13 where, identified correlation times to the right of peak in every cluster is shown in 13B and the resolved secondary amplitude is shown in 13C. Figure 13(A) also shows the algorithms ability to clearly isolate the correlated spike waveforms. Additionally, coefficient of determination was estimated for and channel data V_{m} by taking square of pearson’s correlation index. Ultimately, the coefficient of determination for correlated channels ranged between 95% to 98% similarity. The algorithm was capable of identifying additional 4682 spike events in channel 28.
Discussion and Conclusion
The ground truth information provided by the synthetic data sets has been effectively exploited to construct an efficient spike sorting algorithm. With the assistance of the generative model it is possible to explore the unknown distribution of spikes, spike train, and background noise resulting in the introduction of many key improvements which have not been possible with unknown synthetic data. The results for the synthetic and raw data clearly indicates that SPC performs better in clustering the spikes and could be employed as a default approach for establishing the partial ground truth. The selective spike sorting algorithm is convincingly effective because dependency of the performance is equally distributed throughout the unified sequence of the unit. The algorithm does not demand full identification at every ramification but rather depends on accuracy of the output. This motivation has resulted in reduced false identification, greatly reflecting on the improved efficiency of clustering algorithms. These results establish a strong rationale for the prediction and elimination methodology to work through the raw voltage data and sort the identified spike events.
Another significant advantage of selective spike sorting is that it provides the flexibility to analyse every data channel individually, irrespective of MEA or tetrode based recording. The ambiguity in selecting the appropriate cluster procedure is eliminated by the flexibility of formatted clusters offered in this algorithm. We incorporate threshold techniques to identify the spike shapes, and normalise the data as a preliminary step to improve the performance of CoB. The performance of COB is depends on amplitude of the neuron and response of spikes for bispectrum. This is really important for spike detection as its functioning is independent of shape of spike and depends on the frequency of spiking interval. The novel feature selection process and the clustering procedure bolster the spike detection procedure, establishing sufficient ground truths, and does not have to just rely on probabilistic models. The statistical estimation method exploits the wavelet featured background noise to decompose any overlapping spikes and identifies the spikes on the basis of its shape and, spike rate at any preestablished spike event. Additionally, the inter spike interval sets a quantitative threshold on the length of spike shape; therefore any secondary spikes from the same cluster appearing inside the interval are neglected. With SPC, it is also possible to isolate uncharacteristic overlapping spikes which pose a problem with Klustakwik.
The performance of selective spike sorting is not limited to processing of individual data channel. The extracellular data using the tetrode electrode, also provides intracellular spike shapes to match the spike events^{43}, which could be used as references in designing the spatiotemporal filter. Multiple channel information, as in tetrode electrodes where the number of channels is fewer than that in an MEA, it is possible to tailor the noise covariance filter to be effective both in space and time. The real time realization of the algorithm combined with wireless in vivo monitoring techniques^{44} will lead to a state of the art system. This would be a remarkable achievement and open up neurophysiological studies to whole new exploring environment.
Additional Information
How to cite this article: Veerabhadrappa, R. et al. Unified selective sorting approach to analyse multielectrode extracellular data. Sci. Rep. 6, 28533; doi: 10.1038/srep28533 (2016).
References
 1.
MussaIvaldi, F. A. & Miller, L. E. Brain machine interfaces: computational demands and clinical needs meet basic neuroscience. Trends Neurosci. 26(6), 329–334, doi: 10.1016/S0166–2236(03)00121–8 (2003).
 2.
Quian Quiroga, R. Concept cells: the building blocks of declarative memory functions. Nature Reviews Neuroscience, 13, 587–597, doi: 10.1038/nrn3251 (2012).
 3.
Lefebvre, J. L., Zhang, Y., Meister, M., Wang, X. & Sanes, J. R. GammaProtocadherins regulate neuronal survival but are dispensable for circuit formation in retina. Development. 135(24), 4141–4151, doi: 10.1242/dev.027912 (2008).
 4.
Harris, K. D., Henze, D. A., Csicsvari, J., Hirase, H. & Buzsáki, G. Accuracy of Tetrode Spike Separation as Determined by Simultaneous Intracellular and Extracellular Measurements. Journal of Neurophysiology, 84, 401–414 (2000).
 5.
Hottowy, P. et al. 512electrode mea system for spatiotemporal distributed stimulation and recording of neural activity, 7th International Meeting on SubstrateIntegrated Microelectrode Arrays Reutlingen Germany. Stuttgart: BIOPRO (2010, June 29–July 2).
 6.
Meister, M., Pine, J. & Baylor, D. A. Multineuronal signals from the retina: acquisition and analysis. J. Neurosci. Meth. 51, 95–106, doi: 10.1016/0165–0270(94)90030–2 (1994).
 7.
Litke, A. et al. What does the eye tell the brain?: Development of a system for the large scale recording of retinal output activity. Nuclear Science, IEEE Transactions On. 51, 1434–1440, doi: 10.1109/TNS.2004.832706 (2004).
 8.
Jason, S. P. et al. Fast, scalable, bayesian spike identification for multielectrode arrays. PLoS ONE, 6, e19884, doi: 10.1371/journal.pone.0019884 (2011).
 9.
Hulata, E., Segev, R. & BenJacob, E. A method for spike sorting and detection based on wavelet packets and shannon’s mutual information. J. Neurosci. Meth. 117(1), 1–12, doi: 10.1371/journal.pone.0019884 (2002).
 10.
Bierer, S. M. & Anderson, D. J. Multichannel spike detection and sorting using an array processing technique. Neurocomputing, 26/27, 947–956, doi: 10.1016/S0925–2312(99)00090–9 (1999).
 11.
Segev, R., Goodhouse, J., Puchalla, J. & Berry, M. J. Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nat. Neurosci. 7, 1154–1161, doi: 10.1038/nn1323 (2004).
 12.
Takahashi, S., Anzai, Y. & Sakurai, Y. Automatic sorting for multineuronal activity recorded with tetrodes in the presence of overlapping spikes. Journal of Neurophysiology, 89, 2245–2258, doi: 10.1152/jn.00827.2002 (2003).
 13.
Quiroga, R. Q., Nadasdy, Z. & BenShaul, Y. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput. 16, 1661–1687 (2004).
 14.
Donoho, D., Johnstone, I. & Johnstone, I. M. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1993).
 15.
Smith, L. S., Shahid, S., Vernier, A., Mtetwa, N. & Franchecomté, U. D. Finding events in noisy signals. In IET Irish Signals and Systems conference, Derry Northern Ireland. Derry: IET. (2007, September 13–14).
 16.
Borghi, T., Gusmeroli, R., Spinelli, A. & Baranauskas, G. A simple method for efficient spike detection in multiunit recordings. J. Neurosci. Meth. 163, 176–180, doi: 10.1016/j.jneumeth.2007.02.014 (2007).
 17.
Wood, F. & Black, M. J. A nonparametric bayesian alternative to spike sorting. J. Neurosci. Meth. 173, 1–12 (2008).
 18.
Drezner, Z. et al. A modified kolmogorovsmirnov test for normality. Communications in Statistics  Simulation and Computation, 39, 693–704, doi: 10.1080/03610911003615816 (2010).
 19.
Wang, K., Wang, B. & Peng, L. Cvap: Validation for cluster analyses. Data Science Journal, 8, 88–93, doi: 10.2481/dsj.007020 (2009).
 20.
Ortega, J. P., Del, M., Rojas, R. B. & Somodevilla, M. J. Research issues on kmeans algorithm: An experimental trial using matlab, In CEUR Workshop Proceedings: Semantic Web and New Technologies, Puebla Mexico. Puebla: CEUR. (2009, March 23).
 21.
Blatt, M., Wiseman, S. & Domany, E. Superparamagnetic Clustering of Data. Physical Review Letters, 76, 3251–3254, doi: 10.1103/physrevlett.76.3251 (1996).
 22.
Ankerst, M., Breunig, M. M., Kriegel, H.P. & Sander, J. Optics: Ordering points to identify the clustering structure, SIGMOD Rec. 28, 49–60, doi: 10.1145/304181.304187 (1999).
 23.
Pouzat, C., Mazor, O. & Laurent, G. Using noise signature to optimize spikesorting and to assess neuronal classification quality. J. Neurosci. Meth. 122, 43–57 (2002).
 24.
Pillow, J. W., Shlens, J., Chichilnisky, E. J. & Simoncelli, E. P. A modelbased spike sorting algorithm for removing correlation artifacts in multineuron recordings. PLoS ONE, 8, e62123, doi: 10.1371/journal.pone.0062123 (2013).
 25.
Ekanadham, C., Tranchina, D. & Simoncelli, E. P. A unified framework and method for automatic neural spike identification. J. Neurosci. Meth. 222, 47–55, doi: 10.1016/j.jneumeth.2013.10.001 (2014).
 26.
Shahid, S., Walker, J. & Smith, L. A new spike detection algorithm for extracellular neural recordings. IEEE Transactions on Biomedical Engineering, 57, 853–866, doi: 10.1109/TBME.2009.2026734 (2010).
 27.
Blatt, M., Wiseman, S. & Domany, E. In Advances in Neural Information Processing Systems, Vol. 8 (eds Touretzky, D. S., Mozer, M. C. & Hasselmo, M. E.) Ch. Clustering data through an analogy to the potts model, 416–422 (MIT Press, 1995).
 28.
Kadir, S. N., Goodman, D. F. & Harris, K. D. Highdimensional cluster analysis with the masked EM algorithm. Klstakwik. Neural Comput. 26(11), 2379–2394, doi: 10.1162/NECOa00661 (2014).
 29.
Smith, L. S. & Mtetwa, N. A tool for synthesizing spike trains with realistic interference. J. Neurosci. Meth. 159, 170–180, doi: 10.1016/j.jneumeth.2006.06.019 (2007).
 30.
Binder, K. & Heermann, D. W. In Monte Carlo simulation in statistical physics: an introduction, Ch. Guide to Practical Work with the Monte Carlo Method, 69–110 (Springer, 1988).
 31.
Blatt, M., Wiseman, S. & Domany, E. Data clustering using a model granular magnet. Neural Comput. 9(8), 1805–1842, doi: 10.1162/neco.1997.9.8.1805 (1997).
 32.
Hazan, L., Zugaro, M. & Buzsaki, G. Klusters, neuroscope, ndmanager: A free software suite for neurophysiological data processing and visualization. J. Neurosci. Meth. 155, 207–216, doi: 10.1016/j.jneumeth.2006.01.017 (2006).
 33.
Celeux, G. & Govaert, G. A classification em algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14, 315–332, doi: 10.1016/0167–9473(92)90042–E (1992).
 34.
Bhatti, A. & Nahavandi, S. Stereo Correspondence Estimation Based on Wavelets and Multiwavelets Analysis in Stereo Vision (ed. Bhatti, A.), 27–48 (Intech, 2008).
 35.
Ozkaramanli, H., Bhatti, A. & Bilgehan, B. Multiwavelets from bspline superfunctions with approximation order, signal processing, 82(8), 1029–1046 (2002).
 36.
Quirogo, R. Q. Spike sorting. Scholarpedia, 2(12), 3583 (revision # 137442) (2007).
 37.
Goris, R. L., Movshon, J. A. & Simoncelli, E. P. Partitioning neuronal variability. Nature Neuroscience, 17, 858–865, doi: 10.1038/nn.3711 (2014).
 38.
Sim, S. L. et al. Simultaneous recording of mouse retinal ganglion cells during epiretinal or subretinal stimulation. Vision Research, 101(8), 41–50, doi: 10.1016/j.visres.2014.05.005 (2014).
 39.
Regehr, W. G., Pine, J., Cohan, C. S., Mischke, M. D. & Tank, D. W. Sealing cultured invertebrate neurons to embedded dish electrodes facilitates longterm stimulation and recording. J. Neurosci. Meth. 30, 91–106, doi: 10.1016/0165–0270(89)90055–1 (1989).
 40.
Pine, J. & Gilbert, J. Studies of cultured cells in dishes incorporating integral microcircuit electrodes. Soc. Neurosci. Abst. 8, 670 (1982).
 41.
Gibson, F. et al. The CARMEN Virtual Laboratory: WebBased Paradigms for Collaboration in Neurophysiology, 6th International Meeting on SubstrateIntegrated Micro Electrode Arrays, 117–120, Reutlingen Germany. Stuttgart: BIOPRO (2008, July 8–11).
 42.
Muthmann, J.O. et al. Spike detection for large neural populations using high density multielectrode arrays. Front. Neuroinform. 9(25), 1–21, doi: 10.3389/fninf.2015.00028 (2015).
 43.
Borhegyi, Z., Csicsvari, J., Mamiya, A., Harris, K. D. & Buzsáki, G. Intracellular features predicted by extracellular recordings in the hippocampus in vivo. J Neurophysiol. 84(1), 390–400 (2000).
 44.
Hasegawa, T. et al. A wireless neural recording system with a precision motorized microdrive for freely behaving animals. Scientific Reports, 5, 7853, doi: 10.1038/srep07853 (2015).
Acknowledgements
This work is fully funded by Deakin University Grants (RM25986 and RM25313) and Institute for Intelligent Systems Research and Innovation, Deakin University. We would also like to Acknowledge Prof Hiroki Asari, Markus Mister Lab, California Institute of Technology for providing multielectrode raw data of amphibian retina for our analysis.
Author information
Affiliations
Institute for Intelligent Systems Research and Innovation, Deakin University, Vic 3216, Australia
 R. Veerabhadrappa
 , C. P. Lim
 , T. T. Nguyen
 , S. Nahavandi
 & A. Bhatti
IMPACT Strategic Research Centre, Barwon Health, Deakin University, Vic 3216, Australia
 M. Berk
Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN 55905, USA
 S. J. Tye
Australian Animal Health Laboratory, CSIRO, Vic 3219, Australia
 P. Monaghan
Authors
Search for R. Veerabhadrappa in:
Search for C. P. Lim in:
Search for T. T. Nguyen in:
Search for M. Berk in:
Search for S. J. Tye in:
Search for P. Monaghan in:
Search for S. Nahavandi in:
Search for A. Bhatti in:
Contributions
A.B. and S.N. conceived the idea and the algorithm, R.V. developed the algorithm, C.P.L. and T.T.N. analysed the results, and M.B., J.T. and P.M. conducted the experiments. All authors equally participated in writing and reviewing of the manuscript.
Competing interests
The authors declare no competing financial interests.
Corresponding author
Correspondence to A. Bhatti.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Further reading

Zika virusinduced hyper excitation precedes death of mouse primary neuron
Virology Journal (2018)

Rational design of silicon structures for optically controlled multiscale biointerfaces
Nature Biomedical Engineering (2018)

Electrophysiological evidence of RML12 mosquito cell line towards neuronal differentiation by 20hydroxyecdysdone
Scientific Reports (2018)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.