The impact of spike timing precision and spike emission reliability on decoding accuracy

Precisely timed and reliably emitted spikes are hypothesized to serve multiple functions, including improving the accuracy and reproducibility with which stimuli, memories, or behaviours are encoded across trials. When these spikes occur as a repeating sequence, they can be used to encode and decode a time series. Here, we show both analytically and in simulations that the error incurred in approximating a time series with precisely timed and reliably emitted spikes decreases linearly with the number of neurons or spikes used in the decoding. We verified this numerically with synthetically generated patterns of spikes. Further, we found that if spikes were imprecise in their timing or unreliable in their emission, the decoding error would decrease sub-linearly with network size. However, if the spike precision or spike reliability increased with network size, the error incurred in decoding a time series with sequences of spikes maintained a linear decrease with network size: the spike precision had to increase linearly with network size, while the probability of spike failure had to decrease with the square of the network size. Finally, we identified a candidate circuit to test this scaling relationship: the repeating sequences of spikes with sub-millisecond precision in area HVC (proper name) of the zebra finch. This scaling relationship can be tested using both neural data and song-spectrogram recordings while taking advantage of the natural fluctuation in HVC network size due to neurogenesis.

RMSE Scaling for Precisely Timed Spikes

Here we derive the scaling relationship

$$\mathrm{RMSE}\big(x(t), \hat{x}(t)\big) = \sqrt{\frac{1}{T}\int_0^T \big(x(t) - \hat{x}(t)\big)^2\, dt} = O\!\left(\frac{1}{N}\right), \qquad (1)$$

where $\hat{x}(t)$ is the neurally decoded approximation to $x(t)$, given by the following:

$$\hat{x}(t) = \sum_{i=1}^{N} \phi_i^x\, r_i(t),$$

where $\phi_i^x$ is the optimal decoder for $x(t)$. The term $r_i(t)$ is the filtered sequence of spike times for neuron $i$:

$$r_i(t) = \sum_{k} K\big(t - t_i^k\big),$$

where $K(t)$ is some filtering function and $t_i^k$ is the $k$th spike time of neuron $i$. For the sets of simulations considered here, the filtering function $K(t)$ is the single-exponential synaptic filter

$$K(t) = \begin{cases} e^{-t/\tau_s}, & t \ge 0 \\ 0, & t < 0, \end{cases}$$

where $\tau_s = 10$ ms. The derivation of (1) will be broken into two steps. In the first step, we will prove (1) for the case where each neuron fires a single spike and the spikes are uniformly spread over the interval $[0, T]$. In the second step, we will prove that the same result holds for more general spike rasters by using linear transformations to determine when a general spike raster can be transformed into the evenly distributed one. The following derivation draws largely on classical approaches from functional analysis, the theory of function approximation, and the Simple Function Approximation Theorem [1,2]. The $1/N$ in the RMSE scaling is indeed the "unavoidable discretization error" stated by [3] and numerically demonstrated there (Figure 1a of [3], the regular-rate code).
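To make this decoding scheme concrete, the following minimal Python sketch builds exponentially filtered spike trains and fits the optimal linear decoder by least squares. The test signal, time step, and raster are illustrative assumptions; only $\tau_s = 10$ ms is taken from the text above.

```python
import numpy as np

# Minimal sketch of the decoding scheme: one exponentially filtered spike per
# neuron, optimal linear decoder phi fit by least squares. The test signal and
# discretization are illustrative assumptions; tau_s = 10 ms is from the text.
T, dt, tau_s = 1.0, 1e-4, 0.01
t = np.arange(0, T, dt)
x = np.sin(2 * np.pi * 3 * t)                    # example target signal x(t)

def filtered_trains(spike_times):
    """r_i(t): each spike convolved with the single-exponential filter K."""
    return np.array([np.where(t >= s, np.exp(-(t - s) / tau_s), 0.0)
                     for s in spike_times])

N = 50
spikes = (np.arange(N) + 0.5) * T / N            # evenly spread spikes on [0, T]
R = filtered_trains(spikes)

phi, *_ = np.linalg.lstsq(R.T, x, rcond=None)    # optimal decoders phi_i^x
x_hat = phi @ R
print("RMSE:", np.sqrt(np.mean((x - x_hat) ** 2)))
```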
Step 1: Evenly Distributed Spikes

Suppose the spike train fired by the $N$ neurons is evenly distributed over an interval $[0, T]$, reminiscent of the RA-projecting neurons of HVC [4]. Thus, the $N$ neurons fire at successive intervals of $\Delta = T/N$, where $T$ is the total duration of the signal to be approximated, $x(t)$: each neuron $j$ fires a single spike at time $t_j = t_{j-1} + \Delta$, with $t_1 = 0$. Suppose furthermore that we decode each spike with a box filter:

$$r_j(t) = \begin{cases} 1, & t \in [t_j, t_j + \Delta) \\ 0, & \text{otherwise.} \end{cases}$$

In order to approximate the signal $x(t)$ on an interval $[0, T]$, we need to determine the decoders $\phi_j^x$ for each neuron $j$. The decoders are easily resolved, as the intervals $[t_j, t_j + \Delta)$ are non-overlapping and, with box filtering, the decoded spikes $r_j(t)$ are orthogonal. This immediately yields the following decomposition of the approximant:

$$\hat{x}(t) = \sum_{j=1}^{N} \phi_j^x\, r_j(t), \qquad \phi_j^x = \frac{1}{\Delta}\int_{t_j}^{t_j+\Delta} x(s)\, ds.$$

The formula for the decoders is easily derived when one considers the orthogonality of the spike train. Note that if we determine the order of the error in $\Delta$, then with $\Delta = T/N$ we determine how the error scales with the network size. The squared error is thus:

$$\int_0^T \big(x(t) - \hat{x}(t)\big)^2\, dt = \sum_{j=1}^{N} \int_{t_j}^{t_j+\Delta} \big(x(t) - \phi_j^x\big)^2\, dt. \qquad (4)$$

Now, if we can analytically determine or bound the integral

$$\int_{t_j}^{t_j+\Delta} \big(x(t) - \phi_j^x\big)^2\, dt,$$

then we can bound the error in equation (4). First, note that by the mean-value theorem for integrals, there exists a $c_j \in [t_j, t_j + \Delta]$ such that

$$\int_{t_j}^{t_j+\Delta} \big(x(t) - \phi_j^x\big)^2\, dt = \Delta\,\big(x(c_j) - \phi_j^x\big)^2.$$

As $\phi_j^x$ is, by definition, the mean value of $x(t)$ over $[t_j, t_j + \Delta]$, the intermediate value theorem guarantees that there exists some $c_j^* \in [t_j, t_j + \Delta]$ such that $x(c_j^*) = \phi_j^x$. We then use the mean-value theorem for derivatives on the smaller interval $[c_j^*, c_j]$. We can assume without loss of generality that $c_j^* < c_j$, as the opposite case is entirely identical. If we assume that $x(t)$ is differentiable, the mean-value theorem tells us there exists some $d \in [c_j^*, c_j]$ such that

$$x(c_j) - x(c_j^*) = x'(d)\,(c_j - c_j^*),$$

and thus

$$\big|x(c_j) - \phi_j^x\big| \le \max_{t \in [0,T]} |x'(t)|\;(c_j - c_j^*) \le \max_{t \in [0,T]} |x'(t)|\;\Delta,$$

where the inequality comes from the fact that the interval $[c_j^*, c_j]$ lies within $[t_j, t_j + \Delta]$, and thus $c_j - c_j^* \le \Delta$. This yields the following:

$$\int_{t_j}^{t_j+\Delta} \big(x(t) - \phi_j^x\big)^2\, dt \le \Delta^3 \max_{t \in [0,T]} |x'(t)|^2.$$

Thus, we have the following:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{j=1}^{N}\int_{t_j}^{t_j+\Delta} \big(x(t)-\phi_j^x\big)^2\, dt} \;\le\; \sqrt{\frac{N\,\Delta^3}{T}}\,\max_{t\in[0,T]}|x'(t)| \;=\; \frac{T \max_{t\in[0,T]}|x'(t)|}{N}. \qquad (5)$$

Result (5) implies that for uniformly distributed, precisely timed spikes, the RMSE in approximating a function is inversely proportional to the network size: doubling the network size halves the error, unlike in a conventional rate code, where the network must quadruple in size to halve the RMSE. We also note that the condition that $x(t)$ be differentiable can be relaxed to $x(t)$ being Lipschitz continuous with constant $L$, as then

$$\big|x(c_j) - x(c_j^*)\big| \le L\,\big|c_j - c_j^*\big| \le L\,\Delta,$$

with the rest of the derivation following along similar lines.
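The scaling in result (5) is straightforward to check numerically. The sketch below (a hedged illustration; the smooth test signal is an assumption) computes the box-filter decoders $\phi_j^x$ as interval means and reports $N \times \mathrm{RMSE}$, which should stay roughly constant if the error falls like $1/N$:

```python
import numpy as np

# Check of result (5): evenly spaced box filters, optimal decoders = interval
# means, RMSE expected to fall like 1/N for a differentiable signal.
T, dt = 1.0, 1e-5
t = np.arange(0, T, dt)
x = np.sin(2 * np.pi * 5 * t)                    # illustrative smooth signal

for N in [10, 20, 40, 80]:
    idx = np.minimum((t * N / T).astype(int), N - 1)         # interval of each t
    phi = np.array([x[idx == j].mean() for j in range(N)])   # phi_j^x
    rmse = np.sqrt(np.mean((x - phi[idx]) ** 2))
    print(f"N = {N:3d}, RMSE = {rmse:.5f}, N * RMSE = {N * rmse:.3f}")
```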
Step 2: Use Step 1 to Derive the Result for More General Spike Trains

More generally, we will assume that the spikes are not uniformly distributed; however, the time intervals defined above, $[t_j, t_j + \Delta]$, $j = 1, 2, \ldots, N$, remain, and we still consider a network of $N$ neurons. Thus, the following matrix emerges:

$$r_{ij} = \begin{cases} 1, & \text{if neuron } j \text{ fires a spike } (t^*) \text{ in the } i\text{th time interval} \\ 0, & \text{otherwise.} \end{cases}$$

Finally, we will assume that $r$ is an invertible matrix or, equivalently, that the rank of $r$ is $N$. Note that the matrix generated for the uniformly distributed spiking case above is $r = I_N$, where $I_N$ is the $N \times N$ identity matrix.
Then, if the matrix $r$ is invertible, consider the decoder defined by

$$\psi^x = r^{-1} \phi^x,$$

where $\phi^x$ is the same decoder as in the uniformly distributed spiking case considered above. Applying $\psi^x$ to $r$ restores the uniformly distributed spiking approximation in Step 1. Now, consider the optimal linear decoder for the spike train $r$, given by $\hat{\psi}^x$. Then, we have the following:

$$\mathrm{RMSE}\big(x, (\hat{\psi}^x)^\top r(t)\big) \;\le\; \mathrm{RMSE}\big(x, (\psi^x)^\top r(t)\big) \;=\; O\!\left(N^{-1}\right). \qquad (6)$$

Thus, result (6) shows that the error of any invertible spike train is bounded by $O(N^{-1})$. In principle, this covers most randomly generated spike trains for sufficiently large $N$, as these matrices are highly likely to be full rank (see, for example, [5,6]). As a final comment, we note that this result has an implication for trained spiking neural networks with linear decoder-construction-based approaches [7-16]: if the timing of spikes is stabilized to be precisely reproducible between training and testing phases, then $O(N^{-1})$ scaling is the expected result. This is not, however, a necessary criterion, as under error-correcting spike-based codes (such as [3,17,18]), $O(N^{-1})$ convergence can still be achieved without precisely repeating spikes.
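The full-rank assumption is mild: binary matrices with independent Bernoulli entries are nonsingular with high probability for large $N$ [5,6], in which case the transformed decoder $\psi^x = r^{-1}\phi^x$ is well defined. The sketch below illustrates this under the assumption (ours, for illustration) that each neuron fires in each interval with probability $1/2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random binary interval-by-neuron matrices are almost always full rank, so
# psi = r^{-1} phi exists. The Bernoulli(1/2) raster is an illustrative model.
N = 100
r = (rng.random((N, N)) < 0.5).astype(float)

if np.linalg.matrix_rank(r) == N:
    phi = rng.standard_normal(N)        # stand-in for the Step 1 decoder phi^x
    psi = np.linalg.solve(r, phi)       # psi^x = r^{-1} phi^x
    print("full rank; residual:", np.abs(r @ psi - phi).max())
else:
    print("r is singular for this raster; decoder transform undefined")
```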

RMSE Scaling with Spike Failure
Given a signal $x(t)$, the optimal linearly decoded approximation $\hat{x}(t)$ is defined as the linear approximation to $x(t)$ using a basis of $N$ evenly distributed spikes with box filters on $[0, T]$:

$$\hat{x}(t) = \sum_{j=1}^{N} \phi_j^x\, r_j(t), \qquad \phi_j^x = \frac{1}{\Delta}\int_{t_j}^{t_j+\Delta} x(s)\, ds. \qquad (7)$$

It is assumed that each spike fails with probability $p_F$, which can be modelled by a Bernoulli random variable $B_j \sim B(1, 1 - p_F)$ for each spike. Then, the estimated signal under spike failure is

$$\hat{x}_F(t) = \sum_{j=1}^{N} B_j\, \phi_j^x\, r_j(t).$$

Next, we compute the root mean squared error (RMSE) for a given failure rate. The mean squared error (MSE) is given by:

$$\mathrm{MSE} = \frac{1}{T}\,\mathbb{E}\left[\int_0^T \big(x(t) - \hat{x}_F(t)\big)^2\, dt\right] = \frac{1}{T}\int_0^T \big(x(t) - \hat{x}(t)\big)^2\, dt + \frac{1}{T}\,\mathbb{E}\left[\int_0^T \big(\hat{x}(t) - \hat{x}_F(t)\big)^2\, dt\right] + \text{cross term}.$$

We have already established that the bias scales like $N^{-1}$ (see S1). The variance is given by:

$$\frac{1}{T}\,\mathbb{E}\left[\int_0^T \big(\hat{x}(t) - \hat{x}_F(t)\big)^2\, dt\right] = \frac{1}{T}\sum_{j=1}^{N} (\phi_j^x)^2\,\Delta\;\mathbb{E}\big[(1 - B_j)^2\big] = \frac{p_F}{T}\sum_{j=1}^{N} (\phi_j^x)^2\,\Delta,$$

where the cross terms between different $j$ vanish because the box filters do not overlap. As each decoder $\phi_j^x$ is the mean value of the function $x(s)$ over the interval $[t_j, t_j + \Delta]$, by the mean value theorem for integrals, for each interval $[t_j, t_j + \Delta]$ there exists an $s_j \in [t_j, t_j + \Delta]$ where $\phi_j^x = x(s_j)$. Further, the variance can be bounded by assuming that $|x(s)|$ has a maximum, $M$, on $[0, T]$ (a consequence of the extreme value theorem). Thus:

$$\frac{p_F}{T}\sum_{j=1}^{N} (\phi_j^x)^2\,\Delta \le p_F\, M^2.$$

For the cross term, using $\mathbb{E}\big[\hat{x}(t) - \hat{x}_F(t)\big] = p_F\,\hat{x}(t)$ and the Cauchy-Schwarz inequality:

$$\frac{2}{T}\int_0^T \big(x(t) - \hat{x}(t)\big)\,\mathbb{E}\big[\hat{x}(t) - \hat{x}_F(t)\big]\, dt \le 2\, p_F\, M\, O\!\big(N^{-1}\big).$$

Thus, we have

$$\mathrm{MSE} \le O\!\big(N^{-2}\big) + p_F\, M^2 + 2\, p_F\, M\, O\!\big(N^{-1}\big).$$

As the bias scales like $N^{-1}$, in order for the RMSE to also scale like $N^{-1}$, it is sufficient for the probability of spike failure, $p_F$, to scale like $N^{-2}$ as $N \to \infty$. Note that in this derivation there is no assumed spike redundancy: if a single spike fails, there are no other spikes in the interval $[t_j, t_j + \Delta]$ to serve in place of the failed spike.
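As a numerical sanity check on this sufficiency condition, the sketch below fails each spike independently with probability $p_F \propto N^{-2}$ and measures the trial-averaged RMSE; the proportionality constant and the test signal are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Spike failure: B_j = 0 drops spike j. With p_F ~ 1/N^2, the RMSE should
# keep falling like 1/N despite the failures.
T, dt = 1.0, 1e-4
t = np.arange(0, T, dt)
x = np.sin(2 * np.pi * 5 * t)

for N in [25, 50, 100, 200]:
    idx = np.minimum((t * N / T).astype(int), N - 1)
    phi = np.array([x[idx == j].mean() for j in range(N)])   # decoders from (7)
    p_F = 10.0 / N**2                                        # failure probability
    rmses = [np.sqrt(np.mean((x - (phi * (rng.random(N) >= p_F))[idx]) ** 2))
             for _ in range(200)]                            # average over trials
    print(f"N = {N:3d}, p_F = {p_F:.1e}, mean RMSE = {np.mean(rmses):.5f}")
```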

RMSE Scaling for Differentiable Basis with Spike Jitter
Given a signal $x(t)$ on an interval $[0, T]$ and a basis of $N$ differentiable functions $r_j(t)$, constructed from filtering spike times, the approximant $\hat{x}(t)$ is defined as:

$$\hat{x}(t) = \sum_{j=1}^{N} \phi_j^x\, r_j(t).$$

We then consider the effect of random jitter applied to each of the basis elements on the mean squared error (MSE) scaling. In this derivation, it is assumed that the spikes fired by a single neuron are not jittered independently. Each random jitter $\delta_j$ in the $j$th basis element is assumed to be normally distributed with probability density $\mathcal{N}(0, \sigma^2)$. Then, define the jittered basis $\tilde{r}_j(t)$ and decoded signal $\tilde{x}(t)$ to be:

$$\tilde{r}_j(t) = r_j(t + \delta_j), \qquad \tilde{x}(t) = \sum_{j=1}^{N} \phi_j^x\, \tilde{r}_j(t).$$

The mean squared error (MSE) can be decomposed into the bias squared, the variance, and a cross-term:

$$\mathrm{MSE} = \frac{1}{T}\,\mathbb{E}_\delta\!\left[\int_0^T \big(x(t) - \tilde{x}(t)\big)^2\, dt\right] = \frac{1}{T}\int_0^T \big(x(t) - \hat{x}(t)\big)^2\, dt + \frac{1}{T}\,\mathbb{E}_\delta\!\left[\int_0^T \big(\hat{x}(t) - \tilde{x}(t)\big)^2\, dt\right] + \text{cross-term}.$$

We have established that the bias scales like $N^{-1}$ (see S1). Next, we investigate the scaling of the variance and the cross-term. First, we consider the effect of a single perturbation $\delta_j$ on a single basis element. If $r_j(t)$ is differentiable and $|r_j'(t)|$ has a maximum $M$, then we can apply a Taylor expansion to $r_j(t + \delta_j)$ centred at $t$:

$$\big|r_j(t + \delta_j) - r_j(t)\big| = \big|r_j'(t)\,\delta_j + o(\delta_j)\big| \le M\,|\delta_j| + o(\delta_j),$$

where $M$ is the maximum of the derivative of the filtered spikes, $r_j(t)$. We can then consider the impact of the $N$ perturbations $\delta_1, \ldots, \delta_N$ on the variance:

$$\mathbb{E}_\delta\!\left[\big(\hat{x}(t) - \tilde{x}(t)\big)^2\right] \le M^2 \sum_{j=1}^{N}\sum_{k=1}^{N} \big|\phi_j^x\big|\,\big|\phi_k^x\big|\;\mathbb{E}_\delta\big[|\delta_j|\,|\delta_k|\big].$$

We can now use the facts that $\mathbb{E}_\delta[\delta^2] = \sigma^2$ and $\mathbb{E}_\delta[|\delta|] = \sigma\sqrt{2/\pi} \le \sigma$ to get:

$$\mathbb{E}_\delta\!\left[\big(\hat{x}(t) - \tilde{x}(t)\big)^2\right] \le \sigma^2 M^2 \left(\sum_{j=1}^{N} \big|\phi_j^x\big|\right)^{\!2}.$$

For the cross-term, we can apply Hölder's inequality to get

$$\frac{2}{T}\,\mathbb{E}_\delta\!\left[\int_0^T \big(x(t) - \hat{x}(t)\big)\big(\hat{x}(t) - \tilde{x}(t)\big)\, dt\right] \le O\!\big(N^{-1}\big)\;\sigma M \sum_{j=1}^{N}\big|\phi_j^x\big|.$$

Thus, overall we get

$$\mathrm{MSE} \le O\!\big(N^{-2}\big) + \sigma^2 M^2 \left(\sum_{j=1}^{N}\big|\phi_j^x\big|\right)^{\!2} + O\!\big(N^{-1}\big)\,\sigma M \sum_{j=1}^{N}\big|\phi_j^x\big|.$$

Thus, if the quantity

$$\sum_{j=1}^{N} \big|\phi_j^x\big| \qquad (10)$$

remains bounded as $N \to \infty$, and if $\sigma \propto N^{-1}$, then the RMSE will scale like $N^{-1}$, as all terms in the bias-variance decomposition will individually contribute $O(N^{-1})$ to the RMSE. Note that condition (10) need not be satisfied by every possible spike train and supervisor $x(t)$ generated. For example, if the filtered spike trains of two neurons, $m$ and $n$, are linearly dependent, then the optimal decoder is not uniquely specified: if $r_m(t) = r_n(t)$, then

$$\phi_m r_m(t) + \phi_n r_n(t) = (\phi_m + \Psi)\, r_m(t) + (\phi_n - \Psi)\, r_n(t),$$

where $\Psi$ is any real number. Thus, the decoders in this scenario need not be bounded; $\phi_m$ and $\phi_n$ can become arbitrarily large. In general, condition (10) will likely depend on the basis elements/spike trains generated.
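The $\sigma \propto N^{-1}$ condition can be probed numerically in the same way. In the sketch below, each exponentially filtered basis element is shifted by a single shared Gaussian jitter $\delta_j$ with $\sigma = 0.5/N$ (an illustrative constant), while the decoders are fit on the unjittered basis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Jitter: each basis element r_j is shifted by one Gaussian offset delta_j.
# With sigma ~ 1/N, the decoded RMSE should retain its ~1/N trend.
T, dt, tau_s = 1.0, 1e-4, 0.01
t = np.arange(0, T, dt)
x = np.sin(2 * np.pi * 3 * t)

def basis(spike_times):
    return np.array([np.where(t >= s, np.exp(-(t - s) / tau_s), 0.0)
                     for s in spike_times])

for N in [25, 50, 100]:
    spikes = (np.arange(N) + 0.5) * T / N
    phi, *_ = np.linalg.lstsq(basis(spikes).T, x, rcond=None)  # unjittered fit
    sigma = 0.5 / N                                            # jitter SD ~ 1/N
    rmses = [np.sqrt(np.mean((x - phi @ basis(spikes + rng.normal(0, sigma, N))) ** 2))
             for _ in range(50)]
    print(f"N = {N:3d}, sigma = {sigma:.4f}, mean RMSE = {np.mean(rmses):.5f}")
```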

RMSE Scaling for Non-Smooth Functions
Consider the step function $H(t)$, where

$$H(t) = \begin{cases} 0, & t < t^* \\ 1, & t \ge t^*, \end{cases}$$

for some $t^* \in (0, 1)$. Now, consider the uniformly distributed, box-filtered basis set defined by

$$r_j(t) = \begin{cases} 1, & t \in [t_j, t_j + \Delta) \\ 0, & \text{otherwise,} \end{cases}$$

with the linear decoder $\phi^H$, where $\hat{H}(t) = (\phi^H)^\top r(t)$. This decoder is determined by minimizing the following:

$$\int_0^1 \big(H(t) - \hat{H}(t)\big)^2\, dt = \sum_{t_j < t_i} \int_{t_j}^{t_{j+1}} \phi_j^2\, dt + \sum_{t_j \ge t_{i+1}} \int_{t_j}^{t_{j+1}} \big(1 - \phi_j\big)^2\, dt + \int_{t_i}^{t_{i+1}} \big(H(t) - \phi_i\big)^2\, dt,$$

where $t_i$ is the unique interval start-point that satisfies $t_i \le t^* < t_{i+1}$. The first two terms are minimized, and equal to 0, by setting $\phi_j = 0$ for $t_j < t_i$ and $\phi_j = 1$ for $t_j \ge t_{i+1}$. Then, the loss is

$$\int_{t_i}^{t_{i+1}} \big(H(t) - \phi_i\big)^2\, dt = \big(t^* - t_i\big)\,\phi_i^2 + \big(t_{i+1} - t^*\big)\big(1 - \phi_i\big)^2,$$

which is minimized by $\phi_i = (t_{i+1} - t^*)/\Delta$, giving a minimal loss of $(t^* - t_i)(t_{i+1} - t^*)/\Delta \le \Delta/4$. The squared error is therefore $O(\Delta) = O(N^{-1})$, so for this non-smooth function the RMSE scales like $N^{-1/2}$: the discontinuity returns the decoding error to the square-root scaling of a conventional rate code.
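A short numerical check (with an illustrative, grid-misaligned choice of $t^*$) confirms the square-root scaling: the single interval straddling the discontinuity dominates the loss, so $\sqrt{N} \times \mathrm{RMSE}$ stays roughly constant while $N \times \mathrm{RMSE}$ grows:

```python
import numpy as np

# Optimal box-filter decoding of the step H(t): the straddling interval
# contributes O(Delta) squared error, so RMSE ~ 1/sqrt(N), not 1/N.
t_star = 1 / np.sqrt(2)          # discontinuity; irrational to avoid the grid
dt = 1e-5
t = np.arange(0, 1, dt)
H = (t >= t_star).astype(float)

for N in [10, 40, 160, 640]:
    idx = np.minimum((t * N).astype(int), N - 1)
    phi = np.array([H[idx == j].mean() for j in range(N)])   # optimal decoders
    rmse = np.sqrt(np.mean((H - phi[idx]) ** 2))
    print(f"N = {N:3d}, RMSE = {rmse:.5f}, sqrt(N)*RMSE = {np.sqrt(N)*rmse:.3f}")
```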