Integrated open-source software for multiscale electrophysiology

The methods for electrophysiology in neuroscience have evolved tremendously over the recent years with a growing emphasis on dense-array signal recordings. Such increased complexity and augmented wealth in the volume of data recorded, have not been accompanied by efforts to streamline and facilitate access to processing methods, which too are susceptible to grow in sophistication. Moreover, unsuccessful attempts to reproduce peer-reviewed publications indicate a problem of transparency in science. This growing problem could be tackled by unrestricted access to methods that promote research transparency and data sharing, ensuring the reproducibility of published results. Here, we provide a free, extensive, open-source software that provides data-analysis, data-management and multi-modality integration solutions for invasive neurophysiology. Users can perform their entire analysis through a user-friendly environment without the need of programming skills, in a tractable (logged) way. This work contributes to open-science, analysis standardization, transparency and reproducibility in invasive neurophysiology.


Introduction
Invasive electrode recordings are a unique source of in-vitro and in-vivo neurophysiological data at high resolution in both space and time, recorded in relation to complex animal and human behavior. The complexity of this kind of data has increased in recent years, with the advent of increasingly dense multi-channel and multi-site electrode arrays. This evolution provides exciting opportunities to explore the relationship between local events, such as action potentials, and more global dynamics at the systems level, such as fluctuations in oscillatory network activity. At the same time, these multiscale explorations require different analytical methods from those traditionally used in the field.
Challenges in exploring high-dimensional spatio-temporal data sets are not specific to electrophysiology: they occur frequently in neuroimaging data, as scanners produce increasingly large volumes of data, which are often shared across multiple groups or research centres. In response, the brain imaging community has made significant strides in developing shared software platforms to harmonize analytical methods and to facilitate data sharing [1][2][3][4][5][6] . Indeed, free, open-source software toolkits have been critical for facilitating training and augmenting research productivity. This approach has transferred to the field of scalp electrophysiology 7 , but as of yet it has not found widespread use in invasive neurophysiology (IN). Software tools do exist for specific segments of the IN data workflow, such as for spike detection and sorting and time-series analysis [8][9][10][11][12][13][14][15] , but they remain relatively specialized, some with limited support and documentation and most with restricted interoperability with other tools.
While we acknowledge significant efforts in harmonizing data formats for electrophysiology (Refs [16][17][18] , Neuroshare -http://neuroshare.sourceforge.net/index.shtml), it does seem that this field lags behind others in meeting the demands of recommended practices for data management and transparency 19,20 . In this regard, well-supported software tools are required to produce analytical workflows that are validated, well documented and reproducible. Important components include data organization, review and quality control, verified implementations of signal extraction and decomposition methods, solutions for advanced visualization registered to anatomy, and sound approaches to machine learning and statistical inference. As in the brain imaging field, such tools would facilitate the reproducibility of published results and the dissemination of methods within and between research groups. They would also save considerable time and resources currently required to re-code published methods. In addition, re-coding presents challenges in code verification relative to a published method, raising possible concerns about the validity of the end results and limiting the long-term value of the effort. Table 1. Synopsis of the e-Phys toolbox and the tools that can be used for LFP analysis. The e-Phys toolbox provides a working framework for every step of the e-Phys analysis and each module can easily be enriched with future additions.
www.nature.com/scientificdata www.nature.com/scientificdata/ depending on the visualization parameters set by the user (e.g., virtual page length, selection of a subset of channels or montages for review, keyboard and mouse shortcuts for navigating and marking events).
Task events (e.g., stimulus types and presentation times, behavioral responses) and ancillary recordings (electrooculograms, electrocardiogram, eye and body movements, video recordings of behaviour, etc.) are readily registered to the electrophysiological data in IN-Brainstorm, for multimodal data review, quality control and event-related processing. We emphasize that when a raw file is reviewed, the physical data is not duplicated as a Brainstorm file. Instead, the header of the original data file is automatically parsed to extract metadata, such as channel parameters, sampling rate, time stamps, event codes, etc. Figure 2 (left) shows an example of IN-Brainstorm display for data review, including sub-menus for displaying and navigating through files and events. The right panel shows an example of raw data collected with a Plexon MAP system and a 32-channel linear electrode implanted in cortical areas MT and MST of a non-human primate. The animal maintained fixation during the presentation of a motion stimulus comprising of dots that translated in 8 different directions.
The red line in the figure shows the time of a "Stim On 0" event, extracted from the data. Spikes detected online (labelled as Spikes Channel) were extracted directly from the raw file contents by IN-Brainstorm, with automatic registration to the data time series.
The bottom right panel of Fig. 2 shows a selection of 4 channels temporally aligned with the top figure. The spikes from a neuron that was isolated on the first electrode are marked with green circles at the top of the full time-series displayed in the top panel. Users can browse the raw traces using point-and-click GUI and a series of keyboard shortcuts. On-the-fly bandpass and notch filtering can be applied to the signals.
Quality control & data pre-processing. Starting from the kind of raw data shown in Fig. 2, users can easily navigate through the recordings and experimental trials and events for quality control. Data segments, channels and entire trials can be marked as "bad" and excluded from further analyses using automatic processes or based on user evaluations.
The IN-Brainstorm pre-processing toolkit features solutions for adjustments of recording baseline, data resampling and frequency filtering (with linear phase filters). Additionally, detection and attenuation of artifacts (e.g., heartbeats, eye and body movements, stimulation and juice artifacts) can be achieved with principal 22 or independent component analysis 23,24 . Finally, combining sensor data with the actual geometry of the recording array(s) enables many 2-D and 3-D visualization possibilities for time-series and realistic topographical plots, as illustrated further below.
Spike detection and spike sorting. Following the importation and preprocessing of data, IN data is often processed to extract spiking events from single or multiple neurons. This entails detecting spike occurrences and classifying these events according to their respective neural sources 25 . Most data acquisition systems feature online spike detection and sorting. These online events can be imported directly into IN-Brainstorm with the corresponding raw recordings. Yet, usual IN practice is to refine spike classification with a two-step procedure consisting of 1) unsupervised clustering, which automatically assigns each spike to a neural source based on waveform features, then 2) supervised clustering, which requires manual reviewing and editing of the labels from unsupervised clustering and the elimination of spurious spike events.
For IN-Brainstorm, we have enabled the direct interoperability with a selection of existing and openly-available spike-sorting toolkits: Waveclus 14 , UltraMegaSort2000 8,10 and Kilosort 13 . Those packages can be downloaded and installed automatically, in a completely transparent procedure. Sequentially, these tools are called by and interact with IN-Brainstorm without programming interventions from users.  Fig. 1 Workflow of the e-Phys toolbox. Users initially import the header to the raw binary signal. Once the data is identified, the users perform the spike-sorting step. The spike-sorting process is divided into two parts: Unsupervised (the algorithm creates neuronal clusters automatically) and Supervised (the user inspects the output of the unsupervised part). At this point of the workflow, the LFP can be extracted. All the spiking events that were previously computed, and the down-sampled LFP signals, are all encapsulated to a single binary file. The original binary file can be stored to an external source and is no-longer needed. Finally, users can now perform preprocessing and analyze their data, utilizing the spike-related functions that have been introduced to Brainstorm by this toolbox.
Unsupervised spike sorting. Figure 3 (left) shows IN-Brainstorms' GUI for unsupervised spike-sorting. Raw files are dragged and dropped into the GUI process box before a spike-sorting tool is selected from the IN-Brainstorm toolkit. Next, spike events are detected on each electrode and classified according to their putative neuronal generators. The unsupervised spike events produced overwrite the online counterparts that were detected during data acquisition. The output of the spike-sorting process ( Fig. 3 Box 1) is automatically registered to and accessible from the IN-Brainstorm database and linked to the corresponding raw file. The spike events are labelled in a principled manner (per channel and source cell number - Fig. 3     (1) A new entry in the database "WaveClus Spike Sorting" appears and indicates that this dataset has been spike-sorted. (2) New events appear in the Events window, corresponding to the spikes that the spike-sorter clustered.
www.nature.com/scientificdata www.nature.com/scientificdata/ Supervised spike sorting. As WaveClus and UltraMegaSort2000 have built-in supervised spike sorting graphical user interfaces, we synchronized their GUIs with IN-Brainstorm's. For Kilosort, we developed specific GUI bridges via Klusters 9 . The user-selected supervised clustering tool is called from Brainstorm's main window after an unsupervised spike-sorted file is selected (Fig. 4a). The user then switches to the GUI of the selected supervised spike clustering tool ( Fig. 4b-d). Once supervised spike clustering is complete, the spike events are updated accordingly and registered into the software's file system. Double-clicking on the link to the raw data file lets the user review the updated spike events along with the raw electrophysiological traces as shown in Fig. 2

(Right).
Spike events and categories from other spike-sorting tools can be readily imported as Brainstorm events, following the procedure described in the online documentation (https://neuroimage.usc.edu/brainstorm/e-phys/ ConvertToBrainstormEvents).

Extraction of local field potentials.
In addition to spiking activity, IN recordings yield local field potentials (LFPs), which provide direct measures of the summed post-synaptic electrical activity in the vicinity of recording electrodes 26 . These can be useful as a complement to spiking activity or a surrogate for some aspects of neural activity (e.g. 27 ), provided that LFP traces can reliably be filtered and separated from spike waveforms 28 . Figure 5a shows the IN-Brainstorm's GUI for extracting LFP traces from raw recordings. The application features efficient tools to remove spike traces (Zanos et al., 2011), to perform anti-aliasing bandpass filtering and to down-sample the raw data. The de-spiking method proposed by Zanos et al. 28 increases the accuracy of subsequent spike-field coherence measures and of spike-triggered average signals.
The resulting LFP traces and experimental events are automatically registered in IN-Brainstorm's data repository for further review and analysis with a vast library of tools and pipelines − as described below − or for easy exportation to other software or plain files.
LFP extraction produces a new IN-Brainstorm down-sampled time-series binary file ( Fig. 5b) with all the corresponding metadata, such as channel description (e.g., electrode labels and locations), and spike and www.nature.com/scientificdata www.nature.com/scientificdata/ experimental events. This file is easily sharable among researchers since its size is typically ~20-30 times smaller than the original raw file. Figure 5c shows a segment of the LFP file created.
Epoching. Once the relevant neural signals (LFPs and spikes) have been extracted from the raw data, they can be divided according to experimental epochs. Epochs are typically comprised of experimental trials, with the time window selection defined around a stimulation or behavioral event of interest. These can be imported directly into the IN-Brainstorm file system.
To illustrate these functions, we make use of the example visual cortex recording described previously (Fig. 2). The experiment involved presentations of moving stimuli while the animal maintained fixation; we defined the relevant epochs as segments of [−500, 1000] ms around the onset of each visual stimulus (Fig. 6 Left). In total we considered 8 different directions of the visual stimulus moving pattern; each stimulus condition was repeated 4 times (one condition was repeated for 96 trials for usage in the raster plot, and noise correlation functions). Imported trials to the database are shown in (Fig. 6 -Right).
The following analysis steps can then be applied on the epoched trials.
analysis of individual LFP signals. LFP traces can be analyzed using Brainstorm's extensive library originally developed for EEG and MEG research 6 . We show in Table 1 a list of the main data processing categories that are available for LFP analysis. There is extensive online documentation, accompanied by data files, that describes in detail the methods and practices of LFP signal analysis (http://neuroimage.usc.edu/brainstorm). We briefly provide below a few examples of these functions and their implementation in IN-Brainstorm.
Time-frequency decompositions. Having extracted the LFP signal and defined an appropriate analysis epoch, one can compute the LFP power at different frequencies and at different times relative to a stimulus event. Such information is often used to infer stimulus selectivity, anatomical sources of input, and other factors that are not necessarily apparent in spiking activity [29][30][31][32][33] . IN-Brainstorm provides functionality for spectral and time-frequency decompositions, which can be derived using power spectrum density estimates, Hilbert or wavelet transforms. An example time-frequency decomposition (wavelet) is shown in Fig. 7a for the example LFP data corresponding to a single stimulus condition and epoch that shows strong alpha and beta responses after stimulation. The wavelet decomposition was z-scored with respect to a pre-stimulus baseline [−500:−100] ms.
LFP-LFP signal analysis. LFP signals from multichannel recordings can be analyzed to detect occurrences of various forms of signal similarities in the time or frequency domain. These measures are often interpreted as representing functional connectivity between different sites 30,[33][34][35] . IN-Brainstorm provides support for widely-used measures based on amplitude or phase statistics as indicators of possible interregional brain interactions (coherence, phase-locking values, bandlimited amplitude envelope correlations, phase-transfer entropy) and parametric models (estimates of time-or frequency-domain Granger causality). Advanced measures of interdependence between oscillatory components of polyrhythmic brain activity can be derived with phase-amplitude coupling (PAC) estimation tools 36,37 . An example estimation of coherence among all combinations of electrodes is shown in Fig. 7b for a single stimulus condition and epoch. The bimodal pattern that emerges (high coherence among some  Raster plot -peristimulus time histograms. Raster plots and peristimulus time histograms (PSTH), are routinely used to visualize the relations between neuronal firing and a stimulus event or a behavioral response.
We provide three methods for visualizing spiking activity with IN-Brainstorm:     www.nature.com/scientificdata www.nature.com/scientificdata/ The first method (raster plot) shows the spiking data as trial vs. time for each neuron. Similarly, the second method (PSTH) shows the average binned firing rate for each neuron, along its 95% confidence intervals. Raster plots and PSTHs of spiking rates are displayed after interactive selection of the cell to be reviewed. Figure 8a shows the raster plot of the first neuron detected from contact AD01 (top), and its equivalent PSTH with 10-ms binning (bottom). The PSTH of the neuron's firing rate from 96 trials of a single condition revealed a stimulus-onset-to-maximum-firing latency of about 150 ms.
The third method is embedded within the topographical plots section as shown below.
Tuning curves. Tuning curves capture the relationship between an experimental variable (e.g., the orientation of a visual stimulus) and a scalar measure of neural activity (e.g., a single neuron's trial-averaged firing rate). Tuning curves are readily produced from continuous data files that contain the event markers of interest to the study. Tuning curves are displayed with IN-Brainstorm after manual assignment of the order of the experimental conditions (x-axis), the selection of the neurons to be displayed, and the selection of the time window of interest for reporting spiking activity. A separate tuning curve figure is produced for each neuron selected.
We selected the events and individual neurons previously identified from spike sorting via IN-Brainstorm's GUI. Figure 8b shows the tuning curves of one example neuron (labeled as "Spikes Channel AD07 |1|") for the 8 different conditions (Stim On −3/4 pi, Stim On −2/4 pi etc.) of the motion stimuli, and its 95% confidence intervals. The tuning curve shows the preference of this neuron for stimuli moving in the right direction (Stim On -1/4 pi condition).
Topographical plots. When multichannel recording devices are used, neurophysiology data can be shown as topographically registered to structural anatomy. IN-Brainstorm can show neuronal firing at the 3-D locations of the recording probes/arrays. To illustrate this feature, we used a separate dataset that was collected from two 96-channel Utah arrays and one 32-linear probe 38 . A structural T1-weighted MRI volume was acquired preoperatively. The head and brain surface envelopes were segmented with Freesurfer 39 and directly imported in IN-Brainstorm. The electrode contact locations were co-registered to the 3-D anatomical volume by specifying the distance of the electrodes along the probe and locating the tip of the probe and the entry point through the skull, using Brainstorm's MRI volume viewer.
Neuronal firing was binned in 10-ms segments and displayed on the animal's anatomy as shown in Fig. 9a (a single bin is displayed in the figure). This figure shows IN-Brainstorm's ability to overlay the segmented cortical surface, MRI orthogonal slices, the implanted devices with actual geometry, and color-coded displays of raw or processed electrophysiology data (here instantaneous firing rates). Figure 9b shows a zoomed-in version of Fig. 9a over the Utah array implanted in the prefrontal cortex.
Spike-spike analysis: noise correlations. While tuning curves capture neuronal sensitivity to stimulus properties, the fidelity of a population code is thought to be limited by noise that is common across neurons 40 ; for example, neurons would be noise correlated if for each stimulus their activities are correlated 41 . Such noise correlations are typically quantified as the Pearson correlation coefficient between the firing rates of two neurons across trials. Such correlations strongly influence the accuracy of population coding [42][43][44][45][46] .
Noise correlation statistics are displayed with IN-Brainstorm from the correlation of the spike trains that each neuron elicited within a given epoch, for all neuronal combinations. The end result is a nxn matrix (with n the number of unique neurons that produced spikes during the selected trials) that shows noise correlation estimates between the selected neurons. Figure 8c shows the noise correlation profile across the 32-channel array of the example dataset, for 53 unique neurons that elicited spikes across all trials at the 8 conditions of presentation of the moving stimulus in the original data set from Fig. 2. Spikes included in the correlation computations were selected in the [0,300]-ms time range of each trial.
The computed noise correlation showed 2 pairs of neurons with abnormally high noise correlation (above 0.8). After further inspection, it was revealed that this was due to the fact that the spike-sorter that was used was not taking into account the relative position of the electrodes, and the same neurons were picked up from neighbouring channels: Neurons: AD01 |1| -AD02|2| and AD08 |1| -AD09 |1| were the same neuron. can capture activity over regions, including subthreshold post-synaptic activity, and therefore reflect the state of a broader network 47 . There is considerable interest in relating the two types of signals for estimating the dependence of spiking activity on the broader context in which the neuron is embedded.
Spike-field coherence. Spike-field coherence (SFC) estimates the consistency between the time occurrence of spike trains and the phase of co-localized LFP cycles as a function of frequency 48 . SFC can also be used to evaluate synchronized activity between distant brain regions, as a marker of neuronal communication 34,[49][50][51][52] . IN-Brainstorm features the spike-field coherence estimator proposed by Fries 53 . The user can derive SFC estimates for each GUI-selected neuron, for all electrodes and frequencies of interest. Figure 8d shows SFC up to 50 Hz between a single neuron detected at channel AD07 of the example data set and the LFP traces at all the 32 channels of the probe. The time window selected around the spiking events was [−150, 150] ms. The horizontal white line indicates the electrode where the neuron was detected.
Spike-triggered average of the LFP. Spike-triggered averaging (STA) of the LFP reveals how neuronal spiking is related to the dynamics of proximal or distant LFPs [54][55][56] . STA proceeds with trial averaging of LFP traces time-locked to a designated neuron's spike events, followed by normalization with the total spike count.
Analogous to spike-field coherence, STA is computed over a user-selected time window around each spiking event. STA scores are per neuron, showcasing the average LFP amplitude around the occurrence of the spikes of each neuron. STA can be visualized on topological 2-D representations of the recording array, to reveal time-locked associations between neuronal spiking activity and local or remote LFP recordings. Figure 8e shows the STA time-locked to the firing of the first neuron detected by electrode AD01 across trials and conditions. The topographical 2-D plot is produced with IN-Brainstorm using multidimensional scaling of the actual 3-D location and geometry of the implanted probe. The LFP epoch around spike event was [−150,150] ms.

Statistical inference and machine learning.
Once measures have been extracted from spiking or LFP data, tools to conduct inferential statistical analysis in the multiple dimensions of electrophysiological data (space, time, frequency, connectivity) are available from Brainstorm's library.
Parametric (one-and two-sample tests) and nonparametric permutation tests, descriptive and distribution statistics from histograms (Q-Q plot and Shapiro-Wilk test for data normality) are available. Here too, the software architecture emphasises interoperability with other toolkits, for expanded resources. For instance, multidimensional and nonparametric cluster statistics can be run on LFP and time-frequency data, from Brainstorm, via calls to FieldTrip 12 .
In addition, statistical learning tools for decoding and multivariate pattern analysis (MVPA) are also available (see e.g. Cichy et al. 57 ). The Brainstorm library also includes support vector machine (SVM) and linear discriminant analysis (LDA) classification of LFP time series based on experimental events and conditions. additional features. Processing power. Hardware acceleration in the processing of long recordings is enabled by Matlab's standard parallel computing (e.g., multi-core) features, which are controlled directly from Brainstorm's GUI. Flexible management of memory resources is also accessible to users, with the specification of the amount of RAM allocated to data manipulations while executing the LFP extraction process. Moreover, GPU acceleration computations are enabled through Kilosort for the spike-sorting step.
Data management. Generally speaking, formal data management plans are seldom adopted by electrophysiology labs. Instead, the handling of data is typically project-based, with trainees managing their individual data www.nature.com/scientificdata www.nature.com/scientificdata/ collection and analyses until publication. When they move on to another project or to the next step of their career, they frequently leave data, analysis pipelines and results behind, with minimal documented organization for sustainability and knowledge transfer. This limits the long-term value of data and negatively impacts the reproducibility and verification of research results 58 . Brainstorm has tools to improve and facilitate data management: data is hierarchically organized by Studies, followed by Subjects/Samples and (experimental) Conditions, which point to data elements such as links to raw data files, single-trial epochs, sample statistics, and other derivatives: power spectra, wavelet decompositions, measures of cross-frequency coupling and inter-regional connectivity, etc. As with all features in the application, user interactions with Brainstorm's data organization are facilitated both by the application's GUI and direct access via scriptable functions using Matlab code.
Another important aspect of Brainstorm is its capacity for importing entire data repositories at once, with associated metadata, when those datasets are organized according to the emergent Brain Imaging Data Structure (BIDS). Originally driven by the neuroimaging community, BIDS is a grassroots effort to harmonize data organization and documentation 59 . BIDS has recently been extended to MEG electrophysiology 60 and is presently integrating EEG 61 , and invasive neurophysiology 62 .
Batch processing. The software has a specific GUI for assembling data processing pipelines in an intuitive manner, choosing elementary processes from the (IN-)Brainstorm library and assembling them together into a logical progression along the workflow. These pipelines enable the reproduction of any data workflow with a click of a button. They can also be shared in Matlab format with collaborators or the entire user community. The Matlab code for pipelines can also be generated automatically by Brainstorm e.g., for execution in headless (no GUI) mode on high-performance computing servers and cloud resources.

Discussion
We provide a free, extensive open-source software application for invasive electrophysiology. IN-Brainstorm is built on the foundations of Brainstorm, which was originally designed for human multimodal electrophysiology and imaging. IN-Brainstorm supports multiple data formats of raw signals from a variety of acquisition systems. The recorded traces and their LFP versions can be reviewed, quality-controlled and processed within a unique analytical environment, with easy GUI interactions, rich visualization, intuitive pipeline editing for scripting and sharing. We have built bridges for IN-Brainstorm to interoperate seamlessly with established, free spike-sorting tools.
A specific emphasis was put on providing versatile solutions for multidimensional data visualization, including 2-D and 3-D topographical plots registered to structural anatomy from co-registered MRI data. Source modeling of array data is also available using boundary element modeling of head and brain tissues 63,64 and a variety of source modeling techniques available in Brainstorm 65 . Videos synchronised to electrophysiological traces can also be imported and visualized simultaneously in synchrony, for marking behavioral events.
The software is supported by an expansive online documentation (with tutorial data) and online user forum. The active Brainstorm user community contributes to an efficient peer-reviewing/debugging process, and daily updates deliver bug fixes and software improvements that are readily available to the users.
With IN-Brainstorm, electrophysiologists are provided a free, integrated software environment that promotes and facilitates harmonized principles of data management, methods, documentation, code verification and reproducibility of data analyses. Such practical and user-friendly tools also accelerate the education of electrophysiologist trainees and promotes the adoption and expansion of data harmonization efforts, such as BIDS and Neurodata Without Borders.
Every instance of data processing is logged, with the filenames of the data used and time stamps of execution. These simple, yet powerful features document the provenance of data derivatives and analysis results. Custom IN analysis pipelines assembled for elementary processing blocks of the software's library can be shared with collaborators, publishers and the scientific community. Pipelines are constructed via the GUI and saved as Matlab files. The open-source code of IN-Brainstorm is thoroughly documented, verifiable and can benefit from contributions from any user via GitHub. Sharing is further encouraged and facilitated by Brainstorm's data organization in Studies, which can be zipped for archiving, exportation (e.g., as a BIDS repository) or importation into the Brainstorm environment of a collaborator. Batch processing of multiple data volumes is automated, thanks to the systematic organization of Brainstorm's file system and can be executed on high-performance computing servers without requiring GUI interactions.
For all these reasons, we believe that IN-Brainstorm responds to an unmet need of the electrophysiology community. By providing a unique environment with a common set of analytical tools, the application also provides a unique bridge between recording scales, data types and researchers, and additionally, between the methods used in human, animal and slice preparations. It also represents a scalable framework to developments and integration of existing or future tools and data formats for the entire field of electrophysiology.

Data availability
The dataset that was used for showcasing this toolbox, is available as part of the tutorial for the toolbox's features: https://neuroimage.usc.edu/brainstorm/e-phys/Introduction.