Abstract
Neural activity spans multiple time scales, from milliseconds to months. Its evolution can be recorded with chronic high-density arrays such as Neuropixels probes, which can measure each spike at tens of sites and record hundreds of neurons. These probes produce vast amounts of data that require different approaches for tracking neurons across recordings. Here, to meet this need, we developed UnitMatch, a pipeline that operates after spike sorting, based only on each unit’s average spike waveform. We tested UnitMatch in Neuropixels recordings from the mouse brain, where it tracked neurons across weeks. Across the brain, neurons had distinctive inter-spike interval distributions. Their correlations with other neurons remained stable over weeks. In the visual cortex, the neurons’ selectivity for visual stimuli remained similarly stable. In the striatum, however, neuronal responses changed across days during learning of a task. UnitMatch is thus a promising tool to reveal both invariance and plasticity in neural activity across days.
Main
Neural activity spans a multitude of time scales, from the milliseconds that separate spikes to the hours, days or months that characterize learning, memory or aging. Changes at these longer time scales can be studied with two-photon imaging, where the same neurons can be visually tracked across days1,2,3,4,5. However, imaging methods lack the fast time scales and are hard to deploy in deep brain regions. To cover all time scales in all brain regions, the ideal method is chronic electrophysiology.
Recordings with chronic electrodes reveal units (putative neurons) with consistent spike waveforms across days6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. This constancy indicates that the units track the same neurons over time, particularly when the spikes are measured at multiple locations with stereotrodes15, tetrodes13,14,18,22,23,24,25,26,27, microwire bundles28,29, silicon probes19,30, polymer arrays20 or Neuropixels probes31. The latter are readily implanted chronically31,32,33,34,35,36 and yield hundreds of potentially matchable neurons across days. In addition, their geometry and density allow for correction of electrode drift31,37.
The current methods for matching neurons across days, however, cannot process the vast amounts of data produced by sequences of recordings with high-density probes such as Neuropixels. For example, an established method relies on concatenating two recordings and spike sorting the resulting file30,31. This method can work well for pairs of recordings but becomes unwieldy for longer sequences. It does not scale to the dozens of recordings that may be obtained across weeks or months.
To solve this problem, we developed a pipeline called UnitMatch, which operates after spike sorting. Before applying UnitMatch, the user spike sorts each recording independently using their preferred algorithm. UnitMatch then deploys a naive Bayes classifier on the units’ average waveform in each recording and tracks units across recordings, assigning a probability to each match.
We tested UnitMatch on sequences of Neuropixels recordings from multiple regions of the mouse brain and found that it reliably tracked neurons across weeks. Its performance compares well to the concatenated method and to curation by human experts, while being much faster and applicable to longer sequences of recordings.
Because UnitMatch relies only on each unit’s spike waveform, and not on any functional properties, it can be used to test whether these properties change over time. Indeed, while units can maintain firing properties such as inter-spike interval (ISI) distribution10,11,12,19,20,28,29 and sensory, cognitive or motor correlates11,13,14,15,24,28,29,31,38, the stability of these properties cannot be assumed. In fact, it is often the question being investigated6,7,19,21,22,23,25,27,28,38,39,40.
We examined properties of neurons such as ISI distributions, correlations with other neurons and responses to visual stimuli (for neurons in visual cortex). These distinctive properties remained remarkably stable. We also used UnitMatch to characterize the changes of neural representations in the striatum during learning. These results indicate that UnitMatch can track neural activity in multiple brain regions across long sequences of recordings.
Results
UnitMatch takes as input the spike waveforms of units that have been spike-sorted independently across recording sessions, averaged across each half of each session (Fig. 1a), and operates in two stages. The first stage, ‘matching’, produces the probability that each unit in a recording matches a unit in another recording (Fig. 1b). The second stage, ‘tracking’, produces a matrix of indices that track a unit across recordings (Fig. 1c). Below we describe these stages and illustrate them on a large body of data obtained in our laboratory. Our description here is qualitative; the relevant equations are referenced and provided in Methods.
Preprocessing
Before running UnitMatch, the users record neural activity in multiple sessions, and use their preferred software to spike-sort each recording independently. For each recording, the output of the spike-sorting software is then used to extract for each unit a file with the average spatiotemporal spike waveform in the first and in the second half of each recording. These files have no information on individual spikes.
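As an illustration, the per-half averaging can be sketched as follows (hypothetical variable names and array shapes; in practice these averages are written to files by the user's post-sorting extraction step, not recomputed by UnitMatch):

```python
import numpy as np

def halfway_average_waveforms(spike_times, spike_waveforms, t_end):
    """Average a unit's spatiotemporal waveform separately over the
    first and second half of a recording.

    spike_times     : (n_spikes,) spike times in seconds
    spike_waveforms : (n_spikes, n_samples, n_sites) waveform snippets
    t_end           : recording duration in seconds
    Returns (avg_first_half, avg_second_half), each (n_samples, n_sites).
    """
    first = spike_times < t_end / 2
    avg_first = spike_waveforms[first].mean(axis=0)
    avg_second = spike_waveforms[~first].mean(axis=0)
    return avg_first, avg_second
```

Only these two averages per unit are kept; no information on individual spikes is passed downstream.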
To develop and test UnitMatch, we used 1,350 recordings performed over multiple days (up to 235 days from a single probe) in mice implanted with chronic Neuropixels probes31,35,41 in multiple brain regions including cortex, hippocampus, striatum and superior colliculus (Extended Data Table 1). Each recording session was individually spike-sorted with Kilosort42, which provides drift correction within each session31. After spike sorting, we used a set of quality measures43 to select 25.2 ± 10.2% (mean ± standard deviation, n = 1,350 recording sessions across 25 mice) of units that were well isolated and distinct from noise (Extended Data Fig. 1).
Extraction of waveform parameters
High-density recording arrays such as Neuropixels probes sample the spikes of a unit at many recording sites (Fig. 2a), revealing the unit’s characteristic spatiotemporal waveform (Fig. 2b). The amplitude of the waveform peaks at a maximum site and decays with distance from that site (Fig. 2b,c). UnitMatch fits this decay with an exponential function and obtains the distance d10 at which the amplitude reaches 10% of the maximum (Fig. 2c). In the example recordings, this value ranged between 30 and 95 μm (95% confidence interval; Fig. 2d). For each unit, UnitMatch considers the recording sites closer than d10 (but at most 150 μm away) from the maximum site. In our data, this typically resulted in 6–24 sites arranged in two columns (for example, Fig. 2b).
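The d10 estimate can be sketched as a least-squares exponential fit in log space (an illustrative reconstruction; the pipeline's exact fitting routine may differ):

```python
import numpy as np

def d10_from_amplitudes(distances, amplitudes):
    """Fit A(r) = A0 * exp(-r / lam) to per-site peak amplitudes and
    return d10, the distance at which the amplitude drops to 10% of A0.

    distances  : (n_sites,) distance of each site from the maximum site
    amplitudes : (n_sites,) peak amplitude at each site
    """
    # linear fit of log-amplitude vs distance: log A = log A0 - r / lam
    slope, _ = np.polyfit(distances, np.log(amplitudes), 1)
    lam = -1.0 / slope              # exponential length constant
    return lam * np.log(10.0)       # A(d10) = 0.1 * A0
```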
For each unit and each of its two averaged waveforms, UnitMatch uses the spatiotemporal spike waveform measured at the selected recording sites to extract seven attributes:

- The spatial decay of the amplitude with distance from the maximum site (Fig. 2c).
- The weighted-average waveform (Fig. 2e and equation (9)), obtained by averaging across sites, weighted by the proximity of each site to the maximum site.
- The amplitude of the weighted-average waveform (Fig. 2e and equation (10)).
- The average centroid (Fig. 2f and equation (6)), defined as the average position weighted by the maximum amplitude on each recording site.
- The trajectory of the spatial centroid from 0.2 ms before the peak to 0.5 ms after the peak (Fig. 2f and equation (4)).
- The distance traveled by the centroid at each time point (Fig. 2f).
- The travel direction of the spatial centroid at each time point (Fig. 2f and equation (5)).
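The centroid-based attributes can be sketched as follows (illustrative Python with assumed array shapes; in the pipeline the trajectory is restricted to the window around the spike peak):

```python
import numpy as np

def centroid_attributes(waveform, site_positions):
    """Amplitude-weighted spatial centroid of a unit.

    waveform       : (n_samples, n_sites) average spike waveform
    site_positions : (n_sites, 2) x/y coordinates of the selected sites
    Returns:
      avg_centroid : (2,) position weighted by each site's peak amplitude
      trajectory   : (n_samples, 2) centroid position at each time point
    """
    peak_amp = np.abs(waveform).max(axis=0)                   # (n_sites,)
    avg_centroid = peak_amp @ site_positions / peak_amp.sum()
    w = np.abs(waveform)                                      # per-time weights
    trajectory = (w @ site_positions) / w.sum(axis=1, keepdims=True)
    return avg_centroid, trajectory
```

The per-time distance traveled and travel direction then follow from differencing consecutive rows of `trajectory`.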
Computation of similarity scores
After extracting these spatiotemporal waveform parameters, UnitMatch compares them for every pair of waveforms within and across all recordings, to obtain six similarity scores:

- Decay similarity (D; equation (14));
- Waveform similarity (W; equation (18));
- Amplitude similarity (A; equation (13));
- Centroid similarity (C; equation (20));
- Volatility similarity (V; stability of the difference between centroids, equation (23));
- Route similarity (R; similarity of the trajectory, equation (24)).
Each similarity score is scaled between 0 and 1, with 1 indicating the highest similarity. Finally, we also average the individual scores to compute a total similarity score T.
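A minimal sketch of this scaling and averaging (min–max scaling here is purely illustrative; equations (13)–(24) define the exact per-score normalizations):

```python
import numpy as np

def to_similarity(raw_dissimilarity):
    """Map raw dissimilarities (e.g. centroid distances) onto [0, 1],
    with 1 indicating the highest similarity. Illustrative min-max
    scaling over the population of pairs."""
    d = np.asarray(raw_dissimilarity, dtype=float)
    return 1.0 - (d - d.min()) / (d.max() - d.min())

def total_score(scores):
    """Average the six similarity scores (D, W, A, C, V, R) into T.
    scores : (6, n_pairs) array of per-pair similarity scores."""
    return np.mean(scores, axis=0)
```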
To gain an intuition for these scores, consider their values for two example pairs of units. The first example involves two neighboring but distinct units recorded on the same day (Fig. 3a,b). Because they are neighbors, they have high centroid similarity C. However, their spike waveforms differ (low value of W), and so do their spatial decays (low value of D) and routes (low value of R). As a result, the total similarity score T is well below 1 (Fig. 3c). Conversely, for a single unit recorded on two different days, we observed similar waveforms and trajectories (Fig. 3d,e), with high values of most similarity scores and, consequently, a total similarity score T near the maximal value of 1 (Fig. 3f).
Identification of putative matches
As expected, the total similarity score T is generally high when applied to the same unit recorded across the two halves of a single recording (Fig. 3g, main diagonal). Indeed, the values of T measured for the same unit across two halves were markedly higher than those measured across neighboring units (Fig. 3h, green versus blue curves).
UnitMatch leverages this difference to define a threshold as the value of T where the proportion of pairs from the same unit exceeds the proportion of pairs from neighboring units (Fig. 3h and equation (28)). It then applies the threshold to the distribution of T across days (Fig. 3i). The pairs of units recorded across days with values of T beyond this threshold are putative matches.
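This threshold search can be sketched as follows (an illustrative reading of equation (28), using histogram densities of T for within-unit and neighboring-unit pairs):

```python
import numpy as np

def matching_threshold(t_same, t_neighbor, n_bins=100):
    """Return the lowest total-score value T at which the density of
    within-unit pairs exceeds the density of neighboring-unit pairs.

    t_same     : T values for the same unit across two recording halves
    t_neighbor : T values for pairs of neighboring units
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    p_same, _ = np.histogram(t_same, bins=edges, density=True)
    p_neigh, _ = np.histogram(t_neighbor, bins=edges, density=True)
    crossing = np.where(p_same > p_neigh)[0]
    return edges[crossing[0]]   # left edge of the first crossing bin
```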
Correction for drift across sessions
Modern spike-sorting algorithms (including the one we used42) correct for electrode drift within a session but naturally cannot correct for drift across sessions that are separately sorted. This can lead to larger values of total similarity score T measured within a day than across days.
To adjust for this difference, UnitMatch fits a Gaussian function to the two distributions of total similarity scores (within and across days) and uses the fit to equalize their means and again identify putative matches. For these putative matches, it computes the median centroid displacement and uses it to rigidly transform all parameters affected by position. It then repeats the previous two steps, thus finding a more robust set of putative matches. The results we have shown (Fig. 3) are from after drift correction.
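The rigid correction step can be sketched as follows (illustrative; the pipeline applies the estimated shift to all position-dependent parameters and then recomputes the putative matches):

```python
import numpy as np

def rigid_drift_correction(centroids_a, centroids_b):
    """Estimate rigid inter-session drift as the median centroid
    displacement across putative matches.

    centroids_a, centroids_b : (n_matches, 2) centroids of the same
    putative matches in session A and session B.
    Returns session B's centroids shifted into session A's coordinates,
    and the estimated drift vector.
    """
    drift = np.median(centroids_b - centroids_a, axis=0)  # median dx, dy
    return centroids_b - drift, drift
```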
Building a classifier from putative matches
Having used the total similarity score T to identify putative matches across all pairs of recordings (Fig. 3i), UnitMatch goes back to the individual similarity scores and uses their distributions to train a classifier. There are two types of pair: putative matches (T > threshold) and putative nonmatches (T < threshold; Fig. 3i). The distributions of the six similarity scores for these pairs differ substantially (Fig. 3j). Based on these distributions, we defined a naive Bayes classifier, which takes as input the values of the six similarity scores for two spike waveforms and outputs the ‘match probability’: the posterior probability of the two waveforms coming from the same unit (equation (29)).
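A naive Bayes classifier of this kind can be sketched as follows (illustrative Python with histogram-based likelihoods and Laplace smoothing; equation (29) defines the actual estimator):

```python
import numpy as np

class NaiveBayesMatcher:
    """Match classifier built from the empirical distributions of the
    six similarity scores among putative matches and nonmatches."""

    def __init__(self, scores_match, scores_nonmatch, prior_match, n_bins=20):
        # scores_* : (n_pairs, n_scores) similarity scores in [0, 1]
        self.edges = np.linspace(0.0, 1.0, n_bins + 1)
        self.p_match = self._fit(scores_match)        # (n_scores, n_bins)
        self.p_nonmatch = self._fit(scores_nonmatch)
        self.prior = prior_match

    def _fit(self, scores):
        # one Laplace-smoothed histogram (likelihood) per similarity score
        counts = np.stack([np.histogram(col, bins=self.edges)[0] + 1.0
                           for col in scores.T])
        return counts / (scores.shape[0] + len(self.edges) - 1)

    def match_probability(self, scores):
        # posterior P(match | scores), assuming the scores are independent
        scores = np.asarray(scores, dtype=float)
        idx = np.clip(np.digitize(scores, self.edges) - 1,
                      0, len(self.edges) - 2)
        rows = np.arange(scores.size)
        lm = self.prior * np.prod(self.p_match[rows, idx])
        ln = (1.0 - self.prior) * np.prod(self.p_nonmatch[rows, idx])
        return lm / (lm + ln)
```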
This classifier correctly identified the same unit within a day with match probabilities close to 1 (Fig. 3k, main diagonal and Fig. 3l, green curve). In contrast, the match probabilities of neighboring units were close to 0 (Fig. 3l, blue). Across days, most pairs of waveforms are expected to come from different neurons, which is reflected in a large portion of match probabilities close to 0 (Fig. 3m). However, a fraction of pairs had a match probability close to 1. These matches reflect units tracked across days.
Performance metrics
We first evaluated the performance of UnitMatch on waveforms obtained within days and confirmed that it is overall accurate while being conservative. We applied UnitMatch to units recorded in two halves of a single recording session, which were assessed to be the same across the two halves by the spike-sorting algorithm (here, Kilosort42). Consistent with the way the classifier was trained, UnitMatch tended to agree with the algorithm on these within-day matches (Extended Data Fig. 2a). Disagreements were rare: in ten recordings spike-sorted individually, UnitMatch found 0.2 ± 0.1% (median ± median absolute deviation (m.a.d.)) unexpected matches and 4.2 ± 0.8% unexpected nonmatches (Extended Data Fig. 2a). These few disagreements might represent false positives and false negatives by UnitMatch or mistakes by the spike-sorting algorithm.
From the maximum possible number of units recorded across two consecutive days, UnitMatch found 42 ± 19% (median ± m.a.d., n = 339 pairs of days) of units to be a match. Reassuringly, when we applied UnitMatch to acute recordings, in which the probe was reinserted daily and thus had a negligible chance of sampling the same unit, matches were rare (1.9 ± 3.7%, n = 21 pairs of consecutive days; Wilcoxon rank sum comparing chronic and acute: P < 10−11; Extended Data Fig. 2b).
Next, we compared the performance of UnitMatch with spike sorting performed on concatenated recordings (as if they were obtained in a single session)30,31. Running UnitMatch on the output of Kilosort42 on these concatenated recordings yielded similar levels of unexpected matches (0.3 ± 0.3%, n = 5 mice, two recordings each) and nonmatches (5.9 ± 1.9%) within days as when the recordings were sorted separately (Extended Data Fig. 2a). Across days, 29.5 ± 16.4% of units that were identified as the same unit by Kilosort were not identified as matches by UnitMatch (Extended Data Fig. 2c).
Given the substantial difference between UnitMatch and Kilosort, we asked which method agreed more closely with human curation, where the majority of six experts had to agree on a pair being a match (Extended Data Fig. 3). We found that UnitMatch performed more similarly to manual curation than the sorting on stitched recordings did (Extended Data Fig. 2c,d). Sorting the stitched recordings with Kilosort tended to overestimate the number of matches across recordings, specifically for noisier datasets. The agreement between UnitMatch and manual curation is reassuring because the latter is often regarded more highly than automated spike sorting. However, neither can be considered ground truth.
Finally, we examined whether UnitMatch is biased toward tracking units with specific waveform and firing properties (Extended Data Fig. 4). We found that tracking could be predicted by several of these properties. As expected, the number of spikes and peak amplitude were highly predictive. In addition, we found some predictive power in waveform duration (units with thinner spikes were slightly less likely to get a match) and number of peaks (units with more peaks were slightly less likely to get a match). These features may point the way toward future improvements of the algorithm.
Validation with stable functional properties
A more reliable estimate of UnitMatch’s abilities can be found by assessing the neurons’ functional ‘fingerprint’ (that is, pattern of activity). If this fingerprint turns out to be both distinctive across neurons and stable across days, one can conclude that the tracking algorithm performed well. We found functional fingerprints to be both distinctive and remarkably stable, thus validating UnitMatch’s performance.
We considered three possible fingerprints: a unit’s distribution of ISIs8,10,11,12,19,20,24,28,29, its population coupling (the instantaneous correlation of its firing rate with that of the other units recorded at the same time11,24,44,45) and its response to a large set of visual stimuli (for units in visual cortex28,29,31,38). We considered each possible pair of days independently and computed the similarity of the functional fingerprints across days for matching and nonmatching pairs.
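The population-coupling fingerprint can be sketched as a matrix of pairwise firing-rate correlations (illustrative; `rates` is an assumed matrix of binned spike counts, and the exact binning and normalization may differ from the pipeline's):

```python
import numpy as np

def population_coupling_fingerprint(rates):
    """Correlate each unit's binned firing rate with every other
    simultaneously recorded unit.

    rates : (n_units, n_bins) binned spike counts
    Returns an (n_units, n_units) correlation matrix; row i is unit i's
    'fingerprint' (self-correlation removed).
    """
    z = rates - rates.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)          # z-score each unit
    corr = z @ z.T / rates.shape[1]               # pairwise Pearson r
    np.fill_diagonal(corr, np.nan)                # drop self-correlation
    return corr
```

Comparing a unit's fingerprint row across days (restricted to co-tracked units) gives the cross-day similarity used below.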
The histograms of ISIs of tracked units remained highly consistent across days. This distribution is often considered to be distinctive and stable: it has been used as a feature to track units across days10,11,12,19,28,29 or as a diagnostic of this tracking8,24. Accordingly, the ISI histograms were typically different for neighboring units recorded within a day but similar for units matched across days (Fig. 4a). In an example mouse, the ISI distributions of pairs of units matched across days tended to be highly correlated, nearly as highly as the ISI histograms of the same units measured in the same session (Fig. 4b). Conversely, the ISI histograms of units that UnitMatch defined as different had much lower correlations (Fig. 4b).
Indeed, the correlation between the ISI histograms of a pair of units was highly predictive of whether the units were matched, even for recordings performed 6 months apart. We characterized the separation of the distributions of the correlations of matched and nonmatched pairs by computing the receiver operating characteristic (ROC) curve (Fig. 4c). The area under the curve (AUC) for the example pair of days was 0.95, almost as high as the value measured within days (0.97). Similar values were seen when increasing the number of days between recordings (Fig. 4d and Extended Data Fig. 5a,b for a breakdown of all pairs of days) and across all mice (Fig. 4e and Extended Data Fig. 5 for example mice). On average, the AUC was 0.88 ± 0.01 across days (0.94 ± 0.01 within days, mean ± standard error of the mean (s.e.m.), n = 16 mice) and decayed slowly with each additional day between recordings (−0.001 ± 0.008, median ± m.a.d., n = 16 mice). For the example mouse, the AUC remained at 0.82 after 183 days.
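The AUC computation used here can be sketched via the Mann–Whitney formulation (illustrative; any standard ROC routine would give the same value):

```python
import numpy as np

def roc_auc(matched_corrs, nonmatched_corrs):
    """Area under the ROC curve for separating matched from nonmatched
    pairs by their fingerprint correlation: the fraction of
    (matched, nonmatched) pairs ranked in the right order, ties
    counted as half."""
    x = np.asarray(matched_corrs, dtype=float)
    y = np.asarray(nonmatched_corrs, dtype=float)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (x.size * y.size)
```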
We then examined population coupling and found it to be also remarkably consistent across days11,24,44,45. This measure provided a distinctive ‘fingerprint’ that was highly correlated within and across days for matched units (Fig. 4f,g), but not for neighboring units. The discriminability of this measure was particularly high, with AUC indices close to 1 (0.98 across days versus 0.98 within days; Fig. 4h), indicating that the pairs found by UnitMatch were indeed highly likely to be the same across days. Again, this held true across mice (0.92 ± 0.01 versus 0.96 ± 0.01 across mice, mean ± s.e.m., n = 16 mice) and even across weeks and months (slope of −0.006 ± 0.010, median ± m.a.d., n = 16 mice), suggesting that the correlation patterns of the population of neurons were highly stable over time (Fig. 4i,j and Extended Data Fig. 5). For the example mouse, the AUC was still 0.82 after 183 days. This fingerprint, along with the ISI histograms, can be used in any region of the brain since it does not depend on responses to stimuli.
Finally, units in visual cortex that were tracked across days also typically showed consistent responses to visual stimuli. Neurons in mouse visual cortex give distinctive responses to natural images, and these responses can remain constant across days28,29,31,38. Consistent with this, a typical unit matched by UnitMatch across days gave similar responses to natural images on each day (Fig. 4k,l). In the example mouse, it yielded AUCs of 0.95 across days versus 0.96 within days (Fig. 4m). Similar results were seen across mice, with AUCs of 0.85 ± 0.02 versus 0.90 ± 0.03 (mean ± s.e.m., n = 9 mice). This held true even with long intervals between recordings (slope of −0.006 ± 0.010, median ± m.a.d., n = 9 mice; Fig. 4n,o and Extended Data Fig. 5). For the example mouse, the AUC was still 0.75 after 183 days.
Comparison with other methods
The stability of functional properties offered another opportunity to compare the performance of UnitMatch to the established method of running the spike-sorting algorithm on stitched recordings. We applied both methods on a sequence of four recording sessions and evaluated their accuracy using functional properties. In line with the earlier curation results, functional validation showed a larger overlap with the output of UnitMatch than with spike sorting the concatenated data. Indeed, the AUC for distinguishing matches versus nonmatches with functional similarity scores was generally higher for UnitMatch than for Kilosort, especially when recordings were many days apart (Extended Data Fig. 2e,f).
We also compared the performance of UnitMatch with that of a recently published tracking algorithm46 based on Earth Mover’s Distance (EMD). We tested both algorithms on the first recordings from five mice recorded in our laboratory (Extended Data Fig. 6a). Overall, UnitMatch had a larger hit rate than the EMD method for within-day performance, and a more consistently low false positive rate. We compared the ability to track neurons across 22 recordings of the mouse used as an example in the EMD paper46 (Extended Data Fig. 6b). UnitMatch did this tracking in 25 min, whereas the EMD algorithm took 8 h. Leveraging the high stability of ISI histograms across recordings, we computed AUC values for matches versus nonmatches for both algorithms. Matches made by UnitMatch had significantly higher AUC values (paired t-test; t(20) = 4.57, P = 0.0002). Overall, this supports UnitMatch as being a fast and well-performing algorithm compared with state-of-the-art methods.
Tracking over many recordings
So far, we have examined the matching of neurons across pairs of recordings, potentially spaced far apart. However, UnitMatch is easily scalable and can perform matching across all recorded days simultaneously, providing a match probability for all pairs of neurons in all recordings (Fig. 5a). The next stage is to use these probabilities to track individual neurons across multiple recordings, that is, to group matching units under a unique identification number (Fig. 1c).
UnitMatch provides a tracking algorithm that comes in three versions: default, liberal and conservative (Fig. 5b). The default version iteratively inspects all pairs and merges a unit into a target group if its match probability with every unit of that group recorded in the same or a neighboring recording exceeds 0.5. This algorithm successfully tracked populations of neurons across days and weeks, allowing neurons to disappear and reappear across days (Fig. 5c,d). The more liberal version tracks more neurons at the cost of more false positives, whereas the more conservative version ensures a higher probability of accurate tracking at the cost of more false negatives. These versions result in slightly different groupings (Extended Data Fig. 7). As we will see below, the default algorithm is superior to the other two in some respects; we have thus used it for further analyses.
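The default merging rule can be sketched greedily as follows (a simplified illustration, not the released implementation; the liberal and conservative versions relax or tighten the acceptance condition):

```python
import numpy as np

def track_units(unit_recording, match_prob, threshold=0.5):
    """Assign each unit to an existing group only if its match
    probability with every group member recorded in the same or a
    neighboring session exceeds `threshold`; otherwise start a new group.

    unit_recording : recording index of each unit, in recording order
    match_prob     : (n_units, n_units) symmetric match probabilities
    Returns a group label per unit.
    """
    groups, labels = [], [-1] * len(unit_recording)
    for u, rec in enumerate(unit_recording):
        for g, members in enumerate(groups):
            near = [m for m in members
                    if abs(unit_recording[m] - rec) <= 1]
            if near and all(match_prob[u, m] > threshold for m in near):
                members.append(u)
                labels[u] = g
                break
        else:                       # no group accepted this unit
            groups.append([u])
            labels[u] = len(groups) - 1
    return labels
```

Because within-recording match probabilities between distinct units are near 0, two units from the same recording are naturally kept in separate groups.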
As might be expected, the probability of tracking neurons across multiple recordings decreased as the number of days between the recordings increased. To investigate the dynamics of tracking, we quantified the probability for a neuron recorded on a specific day to be tracked (not necessarily continuously) in the past (negative difference in days), or in the future (positive difference in days) (Fig. 5d,e). The probability of tracking within days (different recordings performed on the same day) was high but <1, setting an upper limit on the ability to track neurons in recordings spike-sorted independently. Interestingly, the tracking probability decreased in both directions, suggesting that neurons recorded by the probe are slowly but consistently renewed. However, the probability of a match was slightly lower in the future than in the past, indicating a progressive depletion of an initial pool of neurons. With this combination of implant and algorithms, up to 30% of neurons could be tracked for more than 100 days. These results, however, greatly depend on the quality and yield of the recordings.
Functional validation confirmed that the tracking algorithm performs accurately, even for recordings separated by many days (Fig. 5f). Indeed, the functional similarity scores for pairs identified as the same units by the algorithm remained higher than the ones of units labeled as ‘different’, across the whole spectrum of match probabilities (Extended Data Fig. 8a–f). When inspecting pairs of units between far-away recordings that had a match probability of 0, we observed that those tracked by the algorithm had higher functional similarity scores than the pairs labeled as ‘different’. Similarly, pairs of units that had a match probability of 1 but were identified as being different by the algorithm had a lower functional similarity score than those identified as tracked. This observation suggests that units were successfully tracked across recordings, beyond the simple match probabilities. This was especially true with the default version of the algorithm, which was thus superior to the liberal and conservative versions, with the best trade-off of false positives versus false negatives (Extended Data Fig. 8g–r).
Tracking units across learning
We have shown that UnitMatch can be used to track units across days, and this can be validated by stable functional similarity scores. An important future application of this algorithm will be to track units as their functional properties evolve over time, particularly as a result of learning.
To illustrate this potential, we applied UnitMatch to a small, exploratory dataset recorded during a learning process. We trained a mouse (Extended Data Table 1) in a visuomotor operant task47 and recorded activity in the dorsomedial striatum using a chronically implanted Neuropixels probe. The mouse was head-fixed in front of three screens with its forelimbs resting on a steering wheel. When the stimulus appeared on the left screen (contralateral to the recordings), moving the wheel clockwise moved the stimulus to the center, resulting in a sucrose water reward. The mouse learned to correctly move the wheel over a training period of a few days47.
UnitMatch revealed intriguing changes in the activity of neurons across days. After each training day, we recorded passive responses to the presentation of a stimulus in the center screen or the left (contralateral) screen (Fig. 6a). We analyzed data recorded during passive viewing of the same set of stimuli on day 0 (pretraining), day 2 (intermediate performance) and day 4 (plateau performance). The population’s response (averaged across all tracked neurons) to the central stimulus increased on day 2, and its response to the lateral stimulus increased on day 4 (Fig. 6b). However, tracking neurons with UnitMatch (for example, Fig. 6c–e) revealed substantial diversity across neurons. Some units increased their response to both stimuli over learning (for example, unit 158). Others developed a strong response to only one of the stimuli (for example, units 72 and 89). Despite changes in responses induced by learning, ISI histograms remained relatively stable (Fig. 6d). Importantly, there was no relation between match probability and changes in functional measures (Fig. 6f–h).
This proof of concept suggests that UnitMatch is a promising tool to reveal not only invariance but also plasticity in neural activity across days.
Discussion
UnitMatch fills a need for flexible and probabilistic tracking of neurons across recordings, and has many advantages. First, unlike many other algorithms6,7,9,10,11,12,16,19,22,28,29,39, it does not use functional properties to match neurons, allowing the user to ask whether functional properties change or remain constant. Second, it acts after spike sorting, allowing the user to choose the spike-sorting algorithm that they prefer. This is important because the quality of sorting algorithms keeps improving, and sorting is time-consuming and, thus, ideally done only once per recording. Because UnitMatch is based solely on the average waveform of neurons from single recordings, it is compatible with widely used preprocessing electrophysiology pipelines such as SpikeInterface48, for which we provide example interfacing code. Third, it is specifically designed to handle long sequences of separate recordings, rather than the single prolonged recording required by other approaches20,21,24. Fourth, it uses within-recording cross-validation to build probability distributions and extract match probabilities. Consequently, it can also check for units that should have been merged or split within a single (potentially acute) recording. Fifth, unlike existing algorithms, it outputs match probabilities rather than a binary output, and provides a user interface for curation. Sixth, it is robust to the drift that is often observed in chronic Neuropixels recordings. Finally, it is substantially faster and performs better than even the latest algorithm in the field46.
Although UnitMatch could track the same units over months, the number of units that were tracked decreased with time. This decrease could derive from numerous sources independent of the algorithm, such as a decline in recording quality, accumulation of drift across recordings, neurons becoming silent or dying, or changes in waveform properties. Indeed, the probability of finding a match depends on its quality metrics, suggesting an important role of recording quality and drift. Ideally, further work will reveal the contribution of each of these factors to the quality of the tracking.
UnitMatch revealed that distinctive functional properties of neurons remain remarkably stable over time; hence, it is tempting to use functional properties themselves to track neurons. However, this would prevent any evaluation of the variation in functional properties across time, and such a variation has been documented4,5,19. For example, the slow decrease in AUC values that we observed across days could be explained either by a decrease in the quality of matching or by changes in functional properties of the units. Therefore, unless there is reason to believe that the functional properties are constant9,16,38, it is prudent to exclude these properties from the criteria that determine the tracking of units and only consider them as a possible validation11,14,15,18,31 or as a separate question6,7,22,23,25,38,39.
Taken together, these findings show that UnitMatch is a promising tool to characterize neural activity spanning a multitude of brain regions and time scales, such as memory, learning and aging.
Methods
Experimental procedures were conducted at University College London according to the UK Animals Scientific Procedures Act (1986) and under personal and project licenses released by the Home Office following appropriate ethics review.
We analyzed the data from 25 chronically implanted mice of both sexes with a Bl6 background. Mice were 3–9 months of age at implantation surgery and were implanted for maximally 8 months. During the experiments, mice were typically head-fixed and exposed to sensory stimuli, engaged in a task, or resting. The mice were recorded in different experimental rigs, and implanted and recorded by different experimenters using different devices (Extended Data Table 1).
Surgeries
A brief (~1 h) initial surgery was performed under isoflurane (1–3% in O2) anesthesia to implant either a titanium headplate (~25 × 3 × 0.5 mm, 0.2 g in the case of the Apollo implant) or a steel headplate (~25 × 5 × 1 mm, 0.5 g in the case of the ultralight and cemented implants). In brief, the dorsal surface of the skull was cleared of skin and periosteum. A thin layer of cyanoacrylate was applied to the skull and allowed to dry. Thin layers of ultraviolet (UV)-curing optical glue (Norland Optical Adhesives #81, Norland Products) were applied and cured until the exposed skull was covered. The headplate was attached to the skull over the interparietal bone with Super-Bond polymer. In one mouse (ID 2/19), a silver wire was implanted in the skull to ground the mouse during recordings.
After recovery, mice were treated with carprofen or meloxicam for 3 days, then acclimated to handling and head fixation. Mice were then implanted with either a modular recoverable35, ultralight or cemented implant (see section ‘Implants’ below). Briefly, craniotomies were performed on the day of the implantation, under isoflurane (1–3% in O2) anesthesia and after injection of Colvasone and Rimadyl. The UV glue was removed, and the skull was cleaned and scored for best adhesion of the cement. The skull was leveled before the craniotomies were opened using a drill or a biopsy punch. Once exposed, the brain was covered with Dura-Gel (Cambridge Neurotech).
Implants
Cemented implant
Four mice were implanted by holding and inserting the probes using a cemented dovetail and applying dental cement to encase the probe printed circuit board and reliably attach it to the skull. The recordings were made in external reference mode, using the silver wire or the headplate as the reference signal. The data from these four mice were already published31.
Recoverable modular implants
Twenty mice were implanted with a recoverable, modular implant. The methods for the Apollo implant35 and the ‘Haesler’ implant31,34 have been described in their respective papers. The third implant (‘Repix’36 implant) is conceptually similar. In short, the implant was held using the three-dimensionally printed payload holder and positioned using a micromanipulator (Sensapex). After careful positioning of the shanks at the surface of the brain, avoiding blood vessels, probes were inserted at slow speed (3–5 µm s−1). Before surgery, the probes were coated with the fluorescent dye DiI (ThermoFisher), by either manually brushing each probe with a droplet of DiI or dipping them directly in DiI, for histological reconstruction. Once the desired depth was reached (optimally, just before the docking module touched the skull), the implant was sealed using UV glue, then covered with Super-Bond polymer, ensuring that only the docking module was cemented. After all recording sessions were finished, the probes were explanted and cleaned before reuse. The recordings were made in external or internal reference mode, using the headplate as the reference signal.
Ultralight implant
We also developed an ultralight implant (https://github.com/Julie-Fabre/ultralight_implant). Briefly, one Neuropixels probe was encased in rigid-resin K custom-made three-dimensionally printed parts. A thin square of Sorbothane sheet was added to the front of the implant. Special care was taken to ensure all shanks were parallel to each other and to the implant. This implant was then slowly lowered into the brain. At the target depth, the implant base was covered in Vaseline to protect the shank from subsequent cement applications. We then applied cement to the implant and mouse skull. To explant, we carefully drilled the implant out in the areas where Vaseline had been applied.
Data processing
Electrophysiology data were acquired using SpikeGLX (https://billkarsh.github.io/SpikeGLX/), and each session was spike-sorted with Kilosort2.5 (ref. 42) or Kilosort4 (ref. 49) (only for Fig. 6). Data were preprocessed using ‘ExtractKilosortData.m’, meaning that all relevant information was extracted (for example, positions of recording sites, information on extracted clusters and their spike times) and common noise was removed. Well-isolated units were selected using Bombcell43 (https://github.com/Julie-Fabre/bombcell; using parameters defined in bc_qualityParamValuesForUnitMatch.m). For each session, the average waveform on every recording site for each unit was extracted, either through Bombcell or through UnitMatch’s ‘ExtractAndSaveAverageWaveforms.m’.
The core of UnitMatch, which matches units purely on the basis of waveforms, takes as input information on the clusters: at least (1) cluster identity, (2) a Boolean indicating which clusters to include (typically well-isolated units), (3) the recording session in which each cluster was recorded, and (4) the probe on which it was recorded. In addition, it requires parameters (we used the defaults available through ‘DefaultParametersUnitMatch.m’) containing information on where to find the raw waveforms.
Example analysis pipelines from raw electrophysiology recorded using SpikeGLX all the way to using and validating UnitMatch are provided in the UnitMatch repository. A minimal use case scenario is also provided in ‘DEMO_UNITMATCH.m’, which is also useful for electrophysiological data recorded and preprocessed using other probes and software.
Mathematical definitions
We consider recordings made with a probe with N sites, and we denote with ps the position of site s (a vector with the x, y coordinates). For every unit i, we denote the spike waveform at site s and at time t as \({w}_{s,t,i}\) (averaged across n spikes of that neuron).
Step 1: extract waveform parameters
Some useful summaries of the spike waveform include the spatial footprint
and the maximum site \(s_{i}^{* }\) where the voltage has maximum amplitude
Most analyses are performed in a time window of size T samples starting 0.23 ms before the waveform reaches its peak and ending 0.50 ms after the peak. To establish a baseline noise level, we used a window of the same duration starting 1.33 ms before waveform onset.
The spatial decay of the waveform is the degree to which the waveform’s maximum amplitude at site s decreases as a function of distance from the peak site, \(\left|\mathbf{p}_{s}-\mathbf{p}_{{s}_{i}^{* }}\right|\). To describe it, we fit an exponential decay function (Fig. 2c) with scale λi such that
and we use this fit to obtain the distance at which the amplitude drops to 10% of the maximal value, \({d}_{10}=\log \left(10\right)/\lambda\). For further analysis, we take into account only recording sites whose distance to \({s}_{i}^{* }\) is less than \({d}_{10}\).
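As a minimal sketch (not the released implementation), the exponential decay fit and the derived cutoff \({d}_{10}\) can be computed in log space with an ordinary least-squares line; the function name and the log-linear fitting approach are our own choices here.

```python
import numpy as np

def spatial_decay(amplitudes, distances):
    """Fit an exponential decay A(d) = A0 * exp(-lam * d) to per-site peak
    amplitudes versus distance from the peak site, and return the decay
    scale lam and the distance d10 where amplitude drops to 10% of max."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # In log space the model is a line: log A = log A0 - lam * d
    keep = amplitudes > 0
    slope, _intercept = np.polyfit(distances[keep], np.log(amplitudes[keep]), 1)
    lam = -slope
    d10 = np.log(10) / lam
    return lam, d10
```

For data that truly follow an exponential decay, the log-linear fit recovers the scale exactly; with noisy amplitudes, a nonlinear fit weighted toward large amplitudes may behave better.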
The centroid trajectory of neuron i is (Fig. 2f)
and its travel direction at each time t is
\({x}_{t,i}\) and \({y}_{t,i}\) being the components of \({{\mathbf{c}}}_{t,i}\).
The neuron’s average centroid (Fig. 2f) is
To calculate a neuron’s average waveform, we start by computing the proximity \({f}_{s,i}\) of each site s to the centroid of the neuron \({{\mathbf{c}}}_{i}^{* }\)
where d10 is the distance where amplitude drops to 10% (or 150 μm if that distance is larger). At sites that are further away (where \({f}_{s,i}\) would be negative), we set \({f}_{s,i}=0.\)
We then calculate the unit’s spatial decay as the average decrease in amplitude divided by the increase in distance for all sites closer than d10 (Fig. 2):
We then compute the neuron’s weighted-average waveform \({\overline{w}}_{t,i}\) (Fig. 2e) as
We use this waveform to compute the weighted amplitude of the neuron’s spike as
When comparing waveforms between units, we normalize \(\overline{w}\) to obtain
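The aggregation in this step (amplitude-weighted centroid, proximity weights, weighted-average waveform and its normalization) can be sketched as follows. The linear proximity weight, 1 minus distance over \({d}_{10}\) clipped at 0, is our reading of the text, not necessarily the exact published equation.

```python
import numpy as np

def weighted_waveform(w, site_pos, d10=150.0):
    """Sketch of the Step 1 aggregation.  w is a (T, S) average waveform on
    S sites; site_pos is (S, 2) site coordinates in micrometers."""
    amp = w.max(axis=0) - w.min(axis=0)              # peak-to-peak per site
    centroid = (amp[:, None] * site_pos).sum(axis=0) / amp.sum()
    dist = np.linalg.norm(site_pos - centroid, axis=1)
    f = np.clip(1.0 - dist / d10, 0.0, None)         # proximity, 0 beyond d10
    wbar = (w * f[None, :]).sum(axis=1) / f.sum()    # weighted-average waveform
    wnorm = wbar / np.abs(wbar).max()                # normalized for comparisons
    return centroid, wbar, wnorm
```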
Step 2: compute similarity scores
Based on these parameters, we next compute similarity scores for each pair of units i and j. These scores are scaled between 0 and 1, with 1 being the most similar. For most similarity scores, we do ‘0–99 scaling’: we rescale the similarity scores so that the minimum is 0 and the 99th percentile is 1. If Xi,j is the similarity score between units i and j, its 0–99 scaling is
where PK(X) is the Kth percentile of X. For similarity scores above the 99th percentile, we clip the score to 1.
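A minimal implementation of 0–99 scaling, directly following the definition above:

```python
import numpy as np

def scale_0_99(X):
    """0-99 scaling: rescale scores so the minimum maps to 0 and the 99th
    percentile maps to 1; scores above the 99th percentile clip to 1."""
    X = np.asarray(X, dtype=float)
    lo = X.min()
    hi = np.percentile(X, 99)
    return np.clip((X - lo) / (hi - lo), 0.0, 1.0)
```

Using the 99th percentile rather than the maximum makes the scaling robust to a few extreme outlier scores.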
We used two types of similarity score: those based on waveform time courses and those based on waveform trajectories.
Amplitude similarity
We compute the difference in maximum amplitude between each unit i and j, and we apply 0–99 scaling to its square root via
Decay similarity
We compute the difference in spatial decay, and we apply 0–99 scaling to it via
Waveform similarity
We compute the Euclidean distance between the waveforms, and we apply 0–99 scaling to it via
We also compute the correlation between the waveforms and apply Fisher’s z transformation and 0–99 scaling to it via
Empirically, we found both measures (distance and correlation) to be informative. Of course, they are also highly correlated with each other (Extended Data Fig. 9b). This correlation poses problems for a naive Bayes decoder. To take them both into consideration, we defined ‘waveform similarity’ as their average:
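To illustrate how the two measures combine, here is a sketch of the waveform similarity for a set of normalized waveforms. Inverting the distance score (so identical waveforms score 1) and clipping correlations before Fisher’s z transformation (which diverges at ±1) are our assumptions.

```python
import numpy as np

def waveform_similarity(W):
    """W: (n_units, T) normalized waveforms.  Returns the pairwise waveform
    similarity, averaging a distance-based and a correlation-based score,
    each 0-99 scaled as in the text."""
    dist = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    corr = np.corrcoef(W)
    z = np.arctanh(np.clip(corr, -0.999999, 0.999999))  # Fisher's z

    def scale(X):  # 0-99 scaling
        lo, hi = X.min(), np.percentile(X, 99)
        return np.clip((X - lo) / (hi - lo), 0.0, 1.0)

    sim_dist = 1.0 - scale(dist)   # small distance -> high similarity
    sim_corr = scale(z)            # high correlation -> high similarity
    return 0.5 * (sim_dist + sim_corr)
```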
Centroid similarity
We compute the mean absolute distance between two centroids, and then we rescale it to obtain a measure of proximity that is 1 if centroids are identical and 0 if they are further than dmax = 100 μm:
Units that are further apart than dmax are unlikely to be a match, even when considering drift between recordings.
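A minimal sketch of the centroid similarity, assuming the mean absolute distance is taken across the time points of the two centroid trajectories:

```python
import numpy as np

def centroid_similarity(c_i, c_j, d_max=100.0):
    """Mean absolute distance between two centroid trajectories (each (T, 2)),
    rescaled so identical centroids give 1 and anything at or beyond d_max
    gives 0."""
    d = np.linalg.norm(np.asarray(c_i) - np.asarray(c_j), axis=-1).mean()
    return max(0.0, 1.0 - d / d_max)
```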
Volatility similarity
If some of the drift remains uncorrected, a unit that appears in two recordings may have centroid trajectories that are identical but displaced by a constant shift. To correct for this, we subtracted the average centroid (equation (6)) from the centroid trajectory (equation (4)) for each unit and computed their similarity Fi,j across units as in equations (19) and (20):
We also compute the standard deviation in Euclidean distance between centroids, and apply 0–99 scaling to it via
Since Fi,j and Si,j are highly correlated (Extended Data Fig. 9b), we averaged these two scores to obtain the centroid ‘volatility’ similarity via
Route similarity
We compute the summed difference in direction (angle) of the centroid trajectory, and apply 0–99 scaling to it via
In addition, we compute the distance traveled by the centroid between each time point of the trajectory and compare the differences between each pair of units i and j, and apply 0–99 scaling via
The final route similarity is
Default similarity scores
Before settling on this set of default similarity scores, we evaluated the performance of other scores (Extended Data Fig. 9). For each set of scores, we computed the AUC value in classifying whether two waveforms came from the same unit or not (Extended Data Fig. 9a). This process led us to consolidate similarity scores that were highly correlated with each other (Extended Data Fig. 9b). Note that, based on within-day cross-validated performance, a user of UnitMatch will be advised what similarity scores to use for every individual dataset. In this paper, we only used default parameters and scores.
Step 3: identify putative matches
Having defined these six similarity scores for each pair of units i and j, we averaged them to obtain a total score
We define the preliminary class (M) of a pair of units as
where \({T}_{{\rm{P}}\left({M}_{i,\,j}=0\right)}\) is defined as the crossing point of the probability distributions of \({T}_{i,i}\) and \({T}_{i,\,j}\), with j within 50 μm of i. In the case of overall lower scores across days (for example, due to uncorrected drift), we lowered the threshold by the difference in means (obtained by fitting normal distributions) between the within-day distribution (Fig. 3h, blue and green curves combined) and the across-days distribution (Fig. 3i, red curves).
Step 5: build classifier
We use the preliminary class labels to build the probability distributions for the similarity scores as defined above, and use these to compute the probability of a match between units i and j as
where \({{\mathbf{X}}}_{{{i}}{{,}}\,{{j}}}\) is the vector of elements \({X}_{i,\,j,p}\) containing the individual similarity scores (\({A}_{i,\,j}\), \({V}_{i,\,j}\) and so on).
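The naive Bayes combination can be sketched as follows. The histogram-based likelihood lookup, the shared bin edges across scores and the 0.5 prior are illustrative assumptions of this sketch, not the released implementation.

```python
import numpy as np

def match_probability(scores, hist_match, hist_nonmatch, bin_edges, prior=0.5):
    """Naive-Bayes combination of similarity scores.  Each score is looked up
    in binned probability distributions estimated from the preliminary
    match / non-match labels; per-score likelihoods multiply under the
    conditional-independence assumption."""
    like_m, like_n = 1.0, 1.0
    for x, pm, pn in zip(scores, hist_match, hist_nonmatch):
        b = min(np.searchsorted(bin_edges, x) - 1, len(pm) - 1)
        b = max(b, 0)
        like_m *= pm[b]
        like_n *= pn[b]
    num = like_m * prior
    return num / (num + like_n * (1.0 - prior))
```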
Functional similarity scores
To evaluate UnitMatch performance, we determined three functional similarity scores of neuronal activity.
ISI fingerprint
For each neuron i, we compute the ISI histogram Ai of elements \({a}_{i,\tau }\) as the distribution of the times between consecutive spikes, binned on a logarithmic scale from 0 to 5 s. The ISI histogram Ai was then used as the first functional fingerprint.
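A sketch of the ISI fingerprint; the number of bins and the lower edge of the logarithmic scale (a log scale cannot start at exactly 0) are our assumptions.

```python
import numpy as np

def isi_fingerprint(spike_times, n_bins=100, t_min=1e-3, t_max=5.0):
    """Histogram of inter-spike intervals on a logarithmic scale up to
    t_max seconds, normalized to a probability distribution."""
    isi = np.diff(np.sort(spike_times))
    edges = np.logspace(np.log10(t_min), np.log10(t_max), n_bins + 1)
    counts, _ = np.histogram(isi, bins=edges)
    return counts / max(counts.sum(), 1)
```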
Cross-correlation fingerprint
We computed the correlation of each unit with a reference population of units that was tracked across days. For each day d, we first binned the spiking activity of each unit across each half of the session using bins of 10 ms. Then, we computed the cross-correlation of each unit with every unit that was found to be tracked across days, yielding vectors Ci of elements ci,j, the instantaneous correlation coefficient of unit i with unit j. If a unit was itself part of the reference population, its correlation with itself was set to NaN. These vectors Ci were used as the second functional fingerprint.
Natural image responses fingerprint
To characterize the functional fingerprint of the neurons in visual cortex, we showed 112 natural images, each presented five times in a random order, to the head-fixed mice31. Two versions of the protocol were used, one long (1 s stimulus, 2 s intertrial interval) and one short (0.5 s, 0.8 s), without affecting the overall reliability of the fingerprints. To define the fingerprint, we computed the responses as the peristimulus histograms locked on the image onset (0.3 s before and 0.5 s after) and the image offset (from 0 to 0.5 s after), using 5 ms bins. The response \({R}_{i,t,s}\) for each unit i and stimulus s was then defined as the concatenation of the onset and offset matrices along their temporal dimensions. Finally, two fingerprints were obtained by looking both at the average time course
and the average response to each image
We then concatenated the vectors of elements \({p}_{i,t}\) and \({m}_{i,s}\) for each unit i to obtain its third functional fingerprint.
Fingerprint stability
To assess the similarity \({S}_{{i,j,d}_{1},{d}_{2}}\) of the fingerprints of the units i and j across two days d1 and d2, we first computed the fingerprints separately for both halves of the recording sessions, yielding two fingerprints \({f}_{{i,d}_{1},1}\) and \({f}_{{i,d}_{1},2}\) for each unit. Then, we computed the correlation of the fingerprint of units i and j across the two days and using different halves via
Using two different halves allowed us to compute the fingerprint’s reliability when d1 = d2.
ROC and AUC
To quantify the amount of information present in the distributions of the correlations of the fingerprints, we computed the ROC curve for different populations of pairs: within days, pairs from the same unit versus pairs from different units; across days, pairs from putatively matched units versus pairs from nonmatched units. We then computed the area under the ROC curve (AUC) to quantify the difference between distributions.
Only sessions with at least 20 matched units were taken into consideration. Moreover, in the case of the natural image responses fingerprint, these units had to be reliable on the first day (test–retest reliability of the fingerprint >0.2). Units that had a match within recordings were excluded from this analysis. For each mouse, the AUCs were then averaged across recording locations. Similarly, the slope of AUC versus days was computed for each recording location (whenever there were at least 3 days recorded at that location), and all slopes for each mouse were then averaged. Statistics were performed across animals.
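The AUC can be computed directly from the two score populations via the rank-sum (Mann–Whitney U) identity, without tracing the full ROC curve; this sketch assumes no tied scores.

```python
import numpy as np

def auc(pos, neg):
    """Area under the ROC curve for separating two score populations:
    the probability that a randomly drawn 'pos' score exceeds a randomly
    drawn 'neg' score, via the rank-sum identity."""
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1.0   # 1-based ranks (no ties assumed)
    r_pos = ranks[:len(pos)].sum()
    u = r_pos - len(pos) * (len(pos) + 1) / 2.0
    return u / (len(pos) * len(neg))
```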
Continuous tracking algorithms
To track neurons across many recordings, we developed three versions of an algorithm. They all rely on the same procedure of serially going through all pairs of units but have different rules for grouping units under a common identification number. First, all the pairs (across all recordings) are sorted by their probability of matching. Then, each version considers attributing the same unique identification number to the two members of a pair (and to all members of their respective groups) if their probability of matching is above 0.5. The liberal version has no other constraint. The conservative version, on the other hand, will group these units only if all members of both groups match with each other. The intermediate version, finally, lies in between: it will group these units if each unit of the pair matches with all the units from the other group that are either in the same or an immediately adjacent recording.
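The liberal version reduces to a union–find pass over the sorted pairs; the sketch below (with hypothetical function names) illustrates it. The conservative and intermediate versions would add a check over the members of both groups before each merge.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def liberal_tracking(n_units, pairs):
    """'Liberal' grouping sketch: walk the pairs (unit_i, unit_j, p_match) in
    descending match probability and merge the two groups whenever
    P(match) > 0.5."""
    uf = UnionFind(n_units)
    for i, j, p in sorted(pairs, key=lambda t: -t[2]):
        if p > 0.5:
            uf.union(i, j)
    roots = [uf.find(i) for i in range(n_units)]
    # Relabel roots as consecutive unique identification numbers
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]
```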
To compute the probability of a unit being tracked, we then looked at each unit across all recordings and computed the probability of this unit being tracked in previous or subsequent recordings. These probabilities were then averaged across all the units from each animal, and then across animals. AUCs were computed as previously described.
Tracking functional changes with learning
To track neurons across learning, we sorted data from three days with Kilosort4. We found 12 neurons tracked across the three recording days and computed the average baseline-corrected response to stimuli presented on the (contra)lateral and central screen. To do so, we computed the firing rate \({R}_{i,d,t}\) of unit i on day d around the stimulus time, averaged across trials, and normalized it to obtain the unit’s response
where \({\left\langle {R}_{i,d,t}\right\rangle }_{t < 0}\) denotes the baseline firing rate of the unit. It was then smoothed with a moving average window for plotting.
To evaluate the stability of functional measures, we quantified the Pearson correlation between ISI histograms across days, and the root mean square RMSi of the normalized visual responses across days via
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Example data for mouse IDs 1–5 (Extended Data Table 1) are available via figshare (https://doi.org/10.6084/m9.figshare.24305758.v1)51 as part of the software demo. Full datasets for mice 1, 7 and 8 are available via figshare (https://doi.org/10.5522/04/24411841.v1)52; due to size constraints, the rest of the full datasets can only be made available upon request.
Code availability
UnitMatch software is available in Matlab and Python via GitHub at https://github.com/EnnyvanBeest/UnitMatch or via Zenodo at https://zenodo.org/records/12734237 (ref. 50). The Python version includes a SpikeInterface plugin. UnitMatch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Bombcell is available in Matlab via GitHub at https://github.com/Julie-Fabre/bombcell or via Zenodo at https://doi.org/10.5281/zenodo.8172822 (ref. 43). Bombcell is under the open-source copyleft GNU General Public License v3.
References
Ziv, Y. et al. Long-term dynamics of CA1 hippocampal place codes. Nat. Neurosci. 16, 264–266 (2013).
Peters, A. J., Lee, J., Hedrick, N. G., O’Neil, K. & Komiyama, T. Reorganization of corticospinal output during motor learning. Nat. Neurosci. 20, 1133–1141 (2017).
Lee, J. J., Krumin, M., Harris, K. D. & Carandini, M. Task specificity in mouse parietal cortex. Neuron 110, 2961–2969.e5 (2022).
Deitch, D., Rubin, A. & Ziv, Y. Representational drift in the mouse visual cortex. Curr. Biol. 31, 4327–4339.e6 (2021).
Marks, T. D. & Goard, M. J. Stimulus-dependent representational drift in primary visual cortex. Nat. Commun. 12, 1–16 (2021).
Muller, R. U., Kubie, J. L. & Ranck, J. B. Spatial firing patterns of hippocampal complex-spike cells in a fixed environment. J. Neurosci. 7, 1935–1950 (1987).
Thompson, L. T. & Best, P. J. Long-term stability of the place-field activity of single units recorded from the dorsal hippocampus of freely behaving rats. Brain Res. 509, 299–308 (1990).
Williams, J. C., Rennaker, R. L. & Kipke, D. R. Stability of chronic multichannel neural recordings: Implications for a long-term neural interface. Neurocomputing 26–27, 1069–1076 (1999).
Taylor, D. M., Tillery, S. I. H. & Schwartz, A. B. Direct cortical control of 3D neuroprosthetic devices. Science 296, 1829–1832 (2002).
Dickey, A. S., Suminski, A., Amit, Y. & Hatsopoulos, N. G. Single-unit stability using chronically implanted multielectrode arrays. J. Neurophysiol. 102, 1331–1339 (2009).
Fraser, G. W. & Schwartz, A. B. Recording from the same neurons chronically in motor cortex. J. Neurophysiol. 107, 1970–1978 (2012).
Eleryan, A. et al. Tracking single units in chronic, large scale, neural recordings for brain machine interface applications. Front. Neuroeng. 7, 1–13 (2014).
Wilson, F. A. W., Ma, Y. Y., Greenberg, P. A., Ryou, J. W. & Kim, B. H. A microelectrode drive for long term recording of neurons in freely moving and chaired monkeys. J. Neurosci. Methods 127, 49–61 (2003).
Schmitzer-Torbert, N. & Redish, A. D. Neuronal activity in the rodent dorsal striatum in sequential navigation: separation of spatial and reward responses on the multiple T task. J. Neurophysiol. 91, 2259–2272 (2004).
Greenberg, P. A. & Wilson, F. A. W. Functional stability of dorsolateral prefrontal neurons. J. Neurophysiol. 92, 1042–1055 (2004).
Liu, X., McCreery, D. B., Bullara, L. A. & Agnew, W. F. Evaluation of the stability of intracortical microelectrode arrays. IEEE Trans. Neural Syst. Rehabil. Eng. 14, 91–100 (2006).
Santhanam, G. et al. HermesB: a continuous neural recording system for freely behaving primates. IEEE Trans. Biomed. Eng. 54, 2037–2050 (2007).
Tolias, A. S. et al. Recording chronically from the same neurons in awake, behaving primates. J. Neurophysiol. 98, 3780–3790 (2007).
Schoonover, C. E., Ohashi, S. N., Axel, R. & Fink, A. J. P. Representational drift in primary olfactory cortex. Nature 594, 541–546 (2021).
Chung, J. E. et al. High-density, long-lasting, and multi-region electrophysiological recordings using polymer electrode arrays. Neuron 101, 21–31.e5 (2019).
Hengen, K. B., Torrado Pacheco, A., McGregor, J. N., Van Hooser, S. D. & Turrigiano, G. G. Neuronal firing rate homeostasis is inhibited by sleep and promoted by wake. Cell 165, 180–191 (2016).
Lever, C., Wills, T., Cacucci, F., Burgess, N. & O’Keefe, J. Long-term plasticity in hippocampal place-cell representation of environmental geometry. Nature 416, 236–238 (2002).
Muzzio, I. A. et al. Attention enhances the retrieval and stability of visuospatial and olfactory representations in the dorsal hippocampus. PLoS Biol. 7, e1000140 (2009).
Dhawale, A. K. et al. Automated long-term recording and analysis of neural activity in behaving animals. eLife 6, 1–40 (2017).
Akritas, M. et al. Nonlinear sensitivity to acoustic context is a stable feature of neuronal responses to complex sounds in auditory cortex of awake mice. Preprint at bioRxiv https://doi.org/10.1101/2023.04.22.537782 (2024).
Emondi, A. A., Rebrik, S. P., Kurgansky, A. V. & Miller, K. D. Tracking neurons recorded from tetrodes across time. J. Neurosci. Methods 135, 95–105 (2004).
Jensen, K. T., Kadmon Harpaz, N., Dhawale, A. K., Wolff, S. B. E. & Ölveczky, B. P. Long-term stability of single neuron activity in the motor system. Nat. Neurosci. 25, 1664–1674 (2022).
McMahon, D. B. T., Bondar, I. V., Afuwape, O. A. T., Ide, D. C. & Leopold, D. A. One month in the life of a neuron: longitudinal single-unit electrophysiology in the monkey visual system. J. Neurophysiol. 112, 1748–1762 (2014).
Bondar, I. V., Leopold, D. A., Richmond, B. J., Victor, J. D. & Logothetis, N. K. Long-term stability of visual pattern selective responses of monkey temporal lobe neurons. PLoS ONE 4, e8222 (2009).
Okun, M., Lak, A., Carandini, M. & Harris, K. D. Long term recordings with immobile silicon probes in the mouse cortex. PLoS ONE 11, 1–17 (2016).
Steinmetz, N. A. et al. Neuropixels 2.0: a miniaturized high-density probe for stable, long-term brain recordings. Science 372, eabf4588 (2021).
Juavinett, A. L., Bekheet, G. & Churchland, A. K. Chronically implanted neuropixels probes enable high-yield recordings in freely moving mice. eLife 8, 1–17 (2019).
Luo, T. Z. et al. An approach for long-term, multi-probe neuropixels recordings in unrestrained rats. eLife 9, 1–54 (2020).
van Daal, R. J. J. et al. Implantation of Neuropixels probes for chronic recording of neuronal activity in freely behaving mice and rats. Nat. Protoc. 16, 3322–3347 (2021).
Bimbard, C. et al. An adaptable, reusable, and light implant for chronic Neuropixels probes. eLife https://doi.org/10.7554/eLife.98522.1 (2024).
Horan, M. et al. Repix: reliable, reusable, versatile chronic Neuropixels implants using minimal components. eLife https://doi.org/10.7554/eLife.98977.1 (2024).
Boussard, J. et al. DARTsort: a modular drift tracking spike sorter for high-density multi-electrode probes. Preprint at bioRxiv https://doi.org/10.1101/2023.08.11.553023 (2023).
McMahon, D. B. T., Jones, A. P., Bondar, I. V. & Leopold, D. A. Face-selective neurons maintain consistent visual responses across months. Proc. Natl Acad. Sci. USA 111, 8251–8256 (2014).
Kentros, C. et al. Abolition of long-term stability of new hippocampal place cell maps by NMDA receptor blockade. Science 280, 2121–2126 (1998).
Cacucci, F., Wills, T. J., Lever, C., Giese, K. P. & O’Keefe, J. Experience-dependent increase in CA1 place cell spatial information, but not spatial reproducibility, is dependent on the autophosphorylation of the α-isoform of the calcium/calmodulin-dependent protein kinase II. J. Neurosci. 27, 7854–7859 (2007).
Jun, J. J. et al. Fully integrated silicon probes for high-density recording of neural activity. Nature 551, 232–236 (2017).
Pachitariu, M., Steinmetz, N., Kadir, S., Carandini, M. & Harris, K. D. Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. Preprint at bioRxiv https://doi.org/10.1101/061481 (2016).
Fabre, J. M. J., van Beest, E. H., Peters, A. J., Carandini, M. & Harris, K. D. Bombcell: automated curation and cell classification of spike-sorted electrophysiology data (1.0.0). Zenodo https://doi.org/10.5281/zenodo.8172822 (2023).
Lin, I. C., Okun, M., Carandini, M. & Harris, K. D. The nature of shared cortical variability. Neuron 87, 644–656 (2015).
Okun, M. et al. Diverse coupling of neurons to populations in sensory cortex. Nature 521, 511–515 (2015).
Yuan, A. X. et al. Multi-day neuron tracking in high-density electrophysiology recordings using Earth Mover’s Distance. eLife 12, 92495 (2023).
Peters, A. J., Marica, A. M., Fabre, J. M. J., Harris, K. D. & Carandini, M. Visuomotor learning promotes visually evoked activity in the medial prefrontal cortex. Cell Rep. 41, 111487 (2022).
Buccino, A. P. et al. Spikeinterface, a unified framework for spike sorting. eLife 9, 1–24 (2020).
Pachitariu, M., Sridhar, S., Pennington, J. & Stringer, C. Spike sorting with Kilosort4. Nat. Methods https://doi.org/10.1038/s41592-024-02232-7 (2024).
van Beest, E. H., Bimbard, C., Dodgson, S., Fabre, J. & Bourboulou, R. EnnyvanBeest/UnitMatch: Python+spikeinterface release (v2.0). Zenodo https://doi.org/10.5281/zenodo.12734237 (2024).
van Beest, E. H. et al. UnitMatch Demo - data. figshare https://doi.org/10.6084/m9.figshare.24305758.v1 (2023).
Lebedeva, A., Okun, M., Krumins, M. K. & Carandini, M. Chronic recordings from Neuropixels 2.0 probes in mice. figshare https://doi.org/10.5522/04/24411841.v1 (2023).
Acknowledgements
We thank M. Robacha for help with experiments and histology, B. Terry for animal husbandry and C. B. Reddy for lab management. This project received funding from the Wellcome Trust (Investigator Award 223144 to M.C. and K.D.H., Early Career Award 227065 to C.B.), NIH (U19NS123716), Biotechnology and Biological Sciences Research Council (grant BB/T016639/1 to M.C. and P.C.), the European Union’s Horizon 2020 research and innovation program (Marie Skłodowska-Curie grant agreement no. 101022757 to E.H.v.B.) and European Molecular Biology Organization (ALTF 740-2019 to C.B.). M.C. holds the GlaxoSmithKline/Fight for Sight Chair in Visual Neuroscience.
Author information
Authors and Affiliations
Contributions
Conceptualization: C.B. and E.H.v.B.; methodology: C.B., M.C., K.D.H. and E.H.v.B.; software: C.B., S.W.D. and E.H.v.B.; formal analysis: C.B., J.M.J.F. and E.H.v.B.; investigation: C.B., J.M.J.F. and E.H.v.B.; resources: C.B., P.C., J.M.J.F., A.L., F.T. and E.H.v.B.; data curation: C.B., P.C., J.M.J.F., A.L., F.T. and E.H.v.B.; writing—original draft: C.B., M.C. and E.H.v.B.; writing—review and editing: C.B., M.C., P.C., S.W.D., J.M.J.F., F.T. and E.H.v.B.; visualization: C.B., M.C., J.M.J.F. and E.H.v.B.; supervision: M.C. and K.D.H.; funding acquisition: C.B., M.C., P.C., K.D.H. and E.H.v.B.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Matthias Hennig and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Nina Vogt, in collaboration with the Nature Methods team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Bombcell output distributions for an example dataset.
a-l, For all the units in an example dataset (Mouse ID 1, Extended Data Table 1) we measured 12 parameters. Each panel shows the number of units that passed (green section of the abscissa) or did not pass (red section) the selection based on that parameter. The parameters are: a, Number of detected peaks. b, Number of detected troughs. c, Somatic waveform. d, Estimated percentage of refractory period violations. e, Estimated percentage of spikes below the spike sorting algorithm's detection threshold, assuming a Gaussian distribution of spike amplitudes. f, Total number of spikes. g, Mean raw absolute waveform amplitude (μV). h, Spatial decay slope (fit). i, Waveform duration (μs). j, Waveform baseline ‘flatness’, defined as the ratio between the maximum value in the waveform baseline and the maximum value in the waveform. k, Presence ratio (of total recording time), defined as the fraction of bins that contain at least one spike. l, Signal-to-noise ratio. m, Waveforms of units that survived all quality metrics thresholds.
Extended Data Fig. 2 UnitMatch performance.
a, Left: Zoom in on Fig. 2j on some units along the main diagonal. On the diagonal we expect the matching probability P(match) to be close to 1; off diagonal we expect P(match) to be close to 0. Right: Unexpected matches (%) and nonmatches (%) by UnitMatch relative to units that were defined as good single units within a day by Kilosort. UnitMatch was either run on individually sorted data (closed circles) or on concatenated data (open circles). Colors depict individual mice and refer to colors in c. b, Pairs of consecutive recording days (%) for acute (gray) and chronic (black) recordings as a function of the percentage of tracked units. The percentage of tracked units is defined as the number of matched units between two consecutive recording days divided by the number of units on the recording day with the fewest recorded units. c, Venn diagrams for five individual mice illustrating the overlapping pairs of units assigned as ‘match’. d, Units matched within or across two (concatenated) consecutive days by UnitMatch (left) and Kilosort (right) for five mice. Dark parts of the bars show the overlap with curated matches; light parts of the bars show the matches additionally made by the respective algorithm. e, AUC value for ISI histogram correlations (left) and reference population correlations (right) for matches versus non-matches found by UnitMatch (x-axis) versus Kilosort (y-axis) for the same five mice as in c. f, Same as e, but for multiple pairs of days of mouse ID 1. ΔDays given by color bar.
Extended Data Fig. 3 Expert curation.
Example of a figure seen by the six expert curators who were asked to judge whether the black and gray waveforms came from the same unit. Curators could score a pair with ‘1’ (match), ‘0’ (unsure), or ‘−1’ (not a match). This example was found to be a match by both UnitMatch and stitched Kilosort. a, Average waveform across recording sites. b, Maximum recording site indicated (note, they overlap) on a 4-shank Neuropixels probe. c, Centroid trajectories. d, Average waveforms. e, Normalized waveforms (peak-to-base stretching). f, Same as c, but shown next to each other for black and gray unit. g, Spike times (x-axis) versus amplitude (y-axis), with the amplitude distribution next to it. Note that the amplitudes for ‘gray’ are drawn above ‘black’ for visibility. h, Autocorrelogram. i, Inter-spike interval distribution. j, Reference population cross-correlation, which was 1 for this specific pair. k, Reference population cross-correlation values between this pair of units (black line), relative to other possible pairs of units (distribution). Rank 2 means this cross-correlation value is the second highest of all possible pairs.
Extended Data Fig. 4 Quality metrics predict whether UnitMatch can find a match for a unit.
We evaluated the predictive value of different quality metrics (Bombcell output; Fabre et al., 2023) on whether a match could be found for units included in our analysis. An AUC can be computed for each quality metric to quantify whether that metric differs between units with and without a match. If the AUC is above (or below) 0.5, units for which UnitMatch could find a match had a higher (or lower) value for this parameter than units that were left matchless. Each plot shows the distribution of AUC values, and the median AUC value, across all 189 unique chronic recording datasets in 25 mice. To assess significance, we used two-sided t-tests (uncorrected for multiple comparisons). A red median line indicates p < 0.01. a, spikes missing, b, number of spikes, c, presence ratio, d, number of refractory period violations, e, baseline ‘flatness’, f, peak amplitude, g, amount of drift, h, waveform duration, i, number of peaks, j, spatial decay slope, and k, signal-to-noise ratio.
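The per-metric AUC described above can be computed without fitting an explicit ROC curve: it equals the probability that a randomly chosen matched unit has a higher metric value than a randomly chosen matchless unit (the normalized Mann-Whitney U statistic). A minimal sketch, assuming two arrays of metric values (this is an illustration, not the UnitMatch implementation):

```python
import numpy as np

def metric_auc(values_matched, values_matchless):
    """AUC = P(metric of a matched unit > metric of a matchless unit),
    with ties counted as 0.5. Values of 0.5 mean the metric is
    uninformative; values near 0 or 1 mean it is highly predictive."""
    a = np.asarray(values_matched, dtype=float)
    b = np.asarray(values_matchless, dtype=float)
    greater = (a[:, None] > b[None, :]).sum()
    ties = (a[:, None] == b[None, :]).sum()
    return (greater + 0.5 * ties) / (a.size * b.size)
```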
Extended Data Fig. 5 Units can be tracked across pairs of recordings separated by many days.
a, Number of matched units for each pair of days in one example animal. b, AUC of the inter-spike intervals (ISI). Only pairs of recordings with at least 20 tracked units are shown. c, AUC of the correlations with a reference population. d, AUC of the natural images responses. Protocols with natural images were not performed every day, and only recordings with at least 20 matched units are shown. e-h, Other example animals. The first four recordings were performed in visual cortex, and the last one in frontal cortex.
Extended Data Fig. 6 Comparison to the EMD method.
a, We tested the performance of the EMD algorithm and UnitMatch on matching neurons from two halves of the same session in five mice. Note that this is a proportion relative to the Kilosort output, and that a small fraction of units from Kilosort should potentially have been merged. b, We compared matching performance in 22 recordings of example mouse 1. Since inter-spike interval (ISI) histograms are generally stable (Fig. 4), we used the area under the curve (AUC) of ISI correlations to validate the pairs found by both algorithms. AUC values were significantly larger for UnitMatch than for the EMD algorithm (***, paired t-test; t(20) = 4.57, p = 0.0002).
Extended Data Fig. 7 Tracking neurons over multiple sessions.
a, Example population of neurons tracked over 195 days in the example mouse (ID1), after drift correction. Other neurons on the probe are not shown. We compare the liberal, default and conservative algorithms. Note that neurons can disappear on some days, only to reappear on another day. b, Average ± s.e.m. probability of tracking a unit as a function of the number of days between recordings: the number of tracked neurons was divided by the total number of neurons available in the future recording (negative ΔDays) or past recording (positive ΔDays). We compare the liberal (green), default (black) and conservative (red) algorithms. The number of datasets per bin is indicated in Fig. 5g (this also applies to c-e). c, Average ± s.e.m. area under the curve (AUC) values for inter-spike-interval histogram correlations when comparing matches versus non-matches, as a function of the number of days in between, for the liberal (green), default (black) and conservative (red) algorithms. d, Same as c, but for the correlation with a reference population. e, Same as c, but for responses to natural images.
Extended Data Fig. 8 Relationship between match probability and functional property similarity.
a, Average correlation of the inter-spike interval histograms across all pairs (black), pairs of clusters tracked as the same unit by the default algorithm (red) or as different units (blue), as a function of the match probability for that pair (bin size 0.05). One example animal is shown; other animals showed similar patterns. b, Number of pairs per bin. c-d, Same as a-b, but for the correlation with a reference population. e-f, Same as a-b, but for the responses to natural images. g-l, Same as a-f, but using the liberal version of the algorithm. m-r, Same as a-f, but using the conservative version of the algorithm.
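The binned averages in this figure (similarity versus match probability, bin size 0.05) follow a standard binning recipe. A minimal sketch, assuming paired arrays of match probabilities and functional-similarity values (illustrative only, not the UnitMatch code):

```python
import numpy as np

def bin_similarity(match_prob, similarity, bin_size=0.05):
    """Average a functional-similarity value (e.g. ISI correlation)
    within match-probability bins of width bin_size on [0, 1].
    Returns (per-bin means, per-bin pair counts); empty bins are NaN."""
    p = np.asarray(match_prob, dtype=float)
    s = np.asarray(similarity, dtype=float)
    edges = np.arange(0.0, 1.0 + bin_size, bin_size)
    n_bins = len(edges) - 1
    # assign each pair to a bin; clip so p == 1.0 lands in the last bin
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    for i in range(n_bins):
        if counts[i]:
            means[i] = s[idx == i].mean()
    return means, counts
```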
Extended Data Fig. 9 Similarity scores.
a, Area under the curve (AUC) for a receiver operating characteristic (ROC) classifying same units versus neighboring units. Large AUC values indicate that the similarity score is highly informative about whether two waveforms come from the same unit. Data points are 189 unique chronic recording datasets across 25 mice. b, Cross-correlation between each pair of similarity scores for an example mouse. The diagonal shows histograms of the individual scores. When two parameters were very informative (large AUC values) but correlated, we averaged them together (for example, waveform similarity is the average of waveform MSE and waveform correlation). c, Individual similarity scores for an example mouse, sorted by shank and then by depth. Thresholded at the same prior as the total score. Slightly smoothed to increase visibility (by averaging over the 3 nearest neighbors along both columns and rows).
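The display smoothing described in panel c can be sketched as a 3 × 3 moving average over the similarity matrix, assuming "3 nearest neighbors along both columns and rows" means a 3-point window on each axis (an assumption about the exact window, for illustration only):

```python
import numpy as np

def smooth_matrix(m):
    """Smooth a similarity-score matrix for display by replacing each
    entry with the mean of its 3x3 neighborhood (edges replicated)."""
    m = np.asarray(m, dtype=float)
    padded = np.pad(m, 1, mode='edge')
    out = np.zeros_like(m)
    # sum the nine shifted copies of the matrix, then normalize
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + m.shape[0], dj:dj + m.shape[1]]
    return out / 9.0
```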
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
van Beest, E.H., Bimbard, C., Fabre, J.M.J. et al. Tracking neurons across days with high-density probes. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02440-1