Accurate spike estimation from noisy calcium signals for ultrafast three-dimensional imaging of large neuronal populations in vivo

Extracting neuronal spiking activity from large-scale two-photon recordings remains challenging, especially in mammals in vivo, where large noises often contaminate the signals. We propose a method, MLspike, which returns the most likely spike train underlying the measured calcium fluorescence. It relies on a physiological model including baseline fluctuations and distinct nonlinearities for synthetic and genetically encoded indicators. Model parameters can be either provided by the user or estimated from the data themselves. MLspike is computationally efficient thanks to its original discretization of probability representations; moreover, it can also return spike probabilities or samples. Benchmarked on extensive simulations and real data from seven different preparations, it outperformed state-of-the-art algorithms. Combined with the finding obtained from systematic data investigation (noise level, spiking rate and so on) that photonic noise is not necessarily the main limiting factor, our method allows spike extraction from large-scale recordings, as demonstrated on acousto-optical three-dimensional recordings of over 1,000 neurons in vivo.

noise level), while temporal accuracy at high SNR is limited by frame rate (the SNR=4,8 curves overlap when plotted as function of frame rate). d: Dependency of both MLspike and Peeling on its most important parameters (graphical conventions as in Supp. Fig. 1). Curves in (a-c) were obtained using optimal parameter settings (white stars in (d)).

Supplementary Figure 3:
Parameter dependencies; temporal accuracy. Case 2: different types of noise. As in Supp. Fig. 1, panel a shows all estimation errors for different conditions (columns: noise type, noise level and spiking rate) and different MLspike parameter settings (rows: drift, spikerate and sigma, see Supp. Fig. 1 for their meaning). It expands Fig. 3c. As expected, the most critical parameter was sigma, higher noise levels calling for larger values (note also its smaller range as compared to Supp. Fig. 1) in order to limit the number of false detections (b), therefore restricting the range of good parameter values. b: Same as (a), but displaying miss-and false detection rates separately. c: Error rate and spike time error. Same simulations as in Fig. 3c, but plotted using three different noise quantifications. Parameters were set to optimal values (white stars in (a),(b)). Once more, quantifying noise as "noise level" allows to strongly reducing MLspike's performance dependency on the type of noise. Frame rate: 100Hz in all panels. Figure 4: Performance comparison of MLspike with Peeling on different noise types; parameter dependencies. These simulations are similar to those in Fig.  3c and S3, but without any calcium saturation (allowing to use the linear version of Peeling, which in our hands is more stable). Moreover, here, the baseline fluctuations have an additive rather than multiplicative effect, i.e., the signal increase for one spike is fixed to a constant value -as required by the Peeling algorithm. On the contrary, in all other simulations the signal increase was scaled by the baseline value. This is more rigorous but also more delicate to handle, in particular at high spiking rates where exact baseline estimation is difficult. a: Dependency on tuning parameters. b: Estimation on example traces; as in Fig. 3, blue traces (simulated fluorescence signals) are the sum of red (noise-free fluorescence signals) and grey (noise) traces.

Supplementary
Three estimation examples (same graphical conventions as in Fig. 2,3) are displayed in black (MLspike) and green (Peeling). Their noise levels and ERs are marked by green/black circles in (c). Note that the difference in performance between the two algorithms is mostly due to a better ability of MLspike to discriminate between calcium transients due to spikes and baseline fluctuations. c: ER and the average delay between estimated and true spike times were plotted as a function of noise level and SNR. Frame rate = 100Hz in all panels. Figure 5: Illustration of the autocalibration algorithm on one example of real data. Data consisted of 75 trials of 25s. a: The six steps of the autocalibration (for details see: Methods). Steps 1-2-3 are aimed at estimating the amplitudes of isolated Ca 2+ fluorescence events. These amplitudes are indicated as percentages in black on four example trials (N.B.: electrically recorded spikes are also shown in red to compare the results to the "ground truth", but are obviously not used by the method). Steps 4-5 are aimed at assigning a number of spikes to each of these events, based on the histogram of event amplitudes (more precisely, on specific functions derived from the histogram).

Supplementary
Step 6 finally estimates parameters A and τ. b: Spike estimation by MLspike, using the autocalibrated parameter. Figure 6: Calibration report: model parameters obtained using simultaneous imaging and electrophysiology. a: Table summarizing "calibrated" model parameters, i.e., minimizing the discrepancy between the measured fluorescence and that predicted by the electrically recorded spikes. Our estimations of parameter A for GECIs are slightly lower than those reported in 2 , due to our different normalization convention (division of the background-subtracted signal by baseline alone, rather than by (baseline-0.7xbackground)). b: Plotting single-exponential decay constant τ against unitary calcium transient A ("calibrated" parameter values) allows comparing the characteristics of the different indicators (see Supplementary Methods and Supplementary Fig. 9 for more details, including on different response models). Numbers near points correspond to example traces in Fig. 5a,b. c: Amplitude of fluorescence transients as function of spikes recorded from cells in (a). Non-linearities in the response dynamics were accounted for by either a cubic polynomial based on heuristics 3 , or by a more physiological model based on Ca 2+ dynamics as predicted by the Hill equation 4 . In both cases, the model parameters were obtained through calibration as described above. Top: mean. Bottom: same data as (top), but plotted separately for parameters obtained from each neuron in (b). Some variability was encountered among cells. In addition, in some GCaMP6 cases the response amplitude predicted by the polynomial model re-decreased to near-zero values for large spike numbers, underscoring the limitations of this non-physiology based model.  Fig. 5b,c, in all panels). Bottom: mean spike time error shown as a function of noise level. b: Same as (a), but for autocalibrated parameters (no electrophysiology data were used). Note that estimations using optimized parameters (a) are overall more equilibrated in terms of misses and false detections than in (b), where some points are largely imbalanced due to errors in parameter estimation by the autocalibration ((a) and (b) top are differently scaled). Average timing error, however, was the same in the two cases. c: Correlation between autocalibrated parameter values and those obtained by calibration using simultaneous electrophysiological recordings (see Methods). Two models were tried for the GECIs: a cubic polynomial with instantaneous risetime (left) and the "Hill exponent" (right), i.e., finite risetime and amplitude derived from the Hill equation 4 . Same color conventions as in (a). d-e: Correlation values between ER and noise RMS/A in various narrow frequency bands, for synthetic Ca 2+ indicator OGB (d) and GECIs (e). In the OGB case, a correlation peak is clearly seen around 1Hz, despite strong correlation of the noise power among different bands. The latter is due to global-effect factors such as: quality of staining, physiological condition of the preparation, photonic noise, etc. In the case of GECIs, the correlation between ER and noise level was altogether weaker: despite the small value of some of the cells' parameter A (leading to low SNR values and hampering estimations of transients evoked by single spikes), the SNR and estimation accuracy were excellent for bursts, thanks to the probe's strong supralinearity. Both for OGB and GECIs, the precise frequency band maximizing correlation differed slightly from the 0.1-3Hz band used throughout our study (e.g., yielding ρ = 0.76 vs. 0.69 in the case of OGB (d)). This is possibly due to small differences in the power frequency spectrum of actual vs. simulated unitary fluorescence transients. However, both for OGB and GECIs, correlation between ER and noise power dropped above 10Hz independently of the precise band chosen to quantify noise level (d), thus underscoring the stronger impact of low-frequency with respect to high-frequency noise on spike estimation accuracy.  Fig. 6a for details), which illustrate the variety of signals that MLspike can handle. a: This OGB in vivo recording shows MLspike's better handling of several kinds of noises than the other algorithms, including slow drifts and temporally correlated faster fluctuations. In particular, these noises strongly confuse parameter estimation for the MCMC algorithm (A being underestimated, leading respectively to many false detections). b: Example from the GCaMP6s dataset: although the probe's strong nonlinearities pose a strong challenge also to MLspike, it handles them better than the other algorithms. Peeling underestimates small transients (resulting in many misses) but overestimates large transients (resulting in many false positives); MCMC overestimates A, leading to many misses.

Supplementary Figure 9: Comparing three models for the non-linearities of GECIs
Three different models were tested to account for the non-linear response dynamics of GCaMP5 and GCaMP6: (i) "Hill exponent": instantaneous risetime but amplitude derived from the Hill equation 4 , (ii) "Hill + risetime": same as (i) but with finite risetime, and (iii) "polynomial": instantaneous risetime and amplitude modeled heuristically via a cubic polynomial 3 see Methods). a: Comparison of estimation accuracy between the three non-linearity models applied on the three different GECIs, for optimized and autocalibrated (the latter failed for GCaMP5) parameter sets. Top: spike estimation error, bottom: spike timing error (one-sided Wilcoxon signed ranked test, *: p<0.05, **: p<0.01, ***: p<0.001). The poorer performance of the finite as compared to the instantaneous risetime model in the autocalibration case might result from some cases where the true risetime was shorter than the one used in the model (which was fixed to the values specified in Supplementary Fig. 6a), thus inducing large errors in the estimation of the other model parameters. b: Example spike estimation using the three different models. In one case, an isolated spike is correctly detected (arrow) only by the model with finite rise time. c: same as (b), but during a period displaying both an isolated spike and a burst. Only the polynomial model succeeds in estimating the correct number of spikes in both cases (arrow). Figure 10: MLspike used to return either the unique MAP estimate, spike probabilities, or sample spike trains. The original equations of MLspike can be modified so as to return either spike probabilities or sample spike trains (see Methods). The latter provide additional information on the uncertainty of the estimations. Here, the three different estimations have been applied to simulated data with two different levels of noise. a: When noise is low, estimation uncertainty is low too: as a consequence, spiking probabilities are very focused in time, and sample estimates are very similar one to another (3 examples are shown). b: When noise is high, spiking probabilities are more scattered, and sample estimates display large variability. All graphical conventions are the same as in Fig. 6 and Supplementary Fig. 8.