BIDS-EEG: an extension to the Brain Imaging Data Structure (BIDS) Specification for electroencephalography

The Brain Imaging Data Structure (BIDS) project is a quickly evolving effort among the human brain imaging research community to create standards allowing researchers to readily organize and share study data within and between laboratories. The first BIDS standard was proposed for the MRI/fMRI research community and has now been widely adopted. More recently, a magnetoencephalography (MEG) data extension, BIDS-MEG, has been published. Here we present an extension to BIDS for electroencephalography (EEG) data, BIDS-EEG, along with tools and references to a series of public EEG datasets organized using this new standard.


Introduction
Electroencephalography (EEG) is now almost a century old, with its first application in humans in the 1920s (Berger, 1929). EEG records electric potential fluctuations at the scalp, arising primarily from locally synchronous post-synaptic activity in the apical dendrites of cortical pyramidal cells. The signal recorded at each channel is, however, a mixture not only of neuronal electrical activity from all over the brain, but also of other physiological (electrooculographic (EOG), electrocardiographic (ECG), electromyographic (EMG), etc.) and non-physiological (e.g., line noise) sources. The sensitivity pattern of each channel (a.k.a. "leadfield") strongly depends on the location and orientation of the underlying generators, as well as on the geometry and conductivity profile of the tissues in the head.
EEG is widely used in both clinical and non-clinical settings and is playing an increasing role in cognitive neuroscience. Publication statistics indicate a renewal of interest in EEG beginning around the early 2000s (see figure 1), which is currently accelerating, reflecting in particular interest in brain-computer interfaces and in the use of more sophisticated dynamics measures, together with more accurate biophysical models for reconstructing and imaging source-resolved brain dynamics. Compared to other imaging modalities, EEG is more versatile because (i) it is lightweight, nowadays even wireless and wearable, and requires relatively low-cost equipment; (ii) it can be used in many different environments (while seated in a lab chair, or whilst driving, walking, working on a task or playing a video game, during sleep, in social situations, etc.), either alone or in conjunction with other imaging modalities; (iii) it has less restrictive task design constraints than metabolic (PET) or hemodynamic (fMRI) imaging methods; and (iv) it captures neural activity with millisecond precision, allowing the recording of cortical dynamics occurring at the speed of perception, thought and action. Because of this versatility, the field of applications for EEG is broad, and the commercial market for EEG systems is therefore much larger than that of other imaging techniques (such as PET or MRI). With such a broad market comes a multitude of equipment manufacturers (already more than 10 prominent ones in neuroscience), building different hardware systems, usually with their own software and proprietary data formats. Moreover, for economic reasons, manufacturers have little incentive to cooperate and provide compatible formats. Such diversity is an impediment to the reuse of data and to building large-scale EEG databases.
The Brain Imaging Data Structure (BIDS, Gorgolewski et al., 2016, RRID:SCR_016124) specification, originally proposed for magnetic resonance imaging (MRI) data, is now a set of human brain research community standards used for organizing and sharing brain imaging study data within and between laboratories. BIDS therefore primarily addresses the issue of data structure heterogeneity. By agreeing on how to structure data using dedicated metadata files, dictionaries and naming conventions, BIDS standards foster interoperability and reuse of acquired datasets (Wilkinson et al., 2016). Because BIDS data are structured, BIDS also addresses issues of reproducibility and, when combined with sufficiently detailed annotation of experimental events and designs, allows the creation and application of fully automated data analysis workflows applied to, or across, studies from many laboratories. Here we report on the extension of the BIDS standard to EEG data: BIDS-EEG.

BIDS-EEG history
BIDS-EEG was created following publication of the BIDS-MEG extension (Niso et al., 2018), which itself derived from the specification originally dedicated to MRI/fMRI and behavioral data (Gorgolewski et al., 2016). Because MEG and EEG data share many features, the MEG organization was used as a template to ensure maximal compatibility between standards and to ease integration. Soon after the EEG extension was created, the intracranial EEG extension (BIDS-iEEG) was also developed (Holdgraf et al., in preparation). Developing the BIDS standards for MEG, EEG and iEEG in close collaboration ensures the compatibility of relevant fields. As with the preceding BIDS standards, the creation and initial updates of BIDS-EEG have very much been a community effort driven by concrete use cases and open discussion. Issues of particular community concern have been support for specific data formats, the explicit distinction between channels and electrodes, and the description of electrode positions in relation to either fiducials or anatomical landmarks.

BIDS-EEG Specification summary
The specification follows the general BIDS structure: each subject has a directory of raw data containing subdirectories for each session and modality. This is accompanied by a dataset_description.json file and by a metadata file with the suffix _eeg.json that specifies the task and the EEG system used (amplifier, hardware filter, cap, placement scheme, etc.)1.
Optionally, a sourcedata directory containing the original behavioral and EEG data (if in a different format, see below) can be present, as well as a stimuli directory and a code directory allowing data conversion and preprocessing to be reproduced, as indicated in the original specification (Gorgolewski et al., 2016). Within each subject directory, the eeg subdirectory contains the EEG data and metadata (see figure 2). For instance, for a single-session study, sub-XX would have an eeg subdirectory containing multiple sub-XX_task-YY_run-ZZ_eeg.edf files corresponding to the different runs of EEG data acquisition. In addition, a sub-XX_task-YY_channels.tsv file describing the parameters of the data acquisition must be specified, and sub-XX_task-YY_electrodes.tsv and sub-XX_task-YY_coordsystem.json files should be specified if the positions of the electrodes are known (see below). As for fMRI, sub-XX_task-YY_run-ZZ_events.tsv files should be present, indicating the onset of events, trial type, duration, responses, etc. While such information is often present as one or several binary trigger channels in the EEG recordings, the representation of events is rarely explicit in the original data (e.g., a numeric code is used to indicate the onset of a given picture presented in a given task condition), hence the need for these events.tsv files.
Example of a *_eeg.json file:
{
  "TaskName": "TASKNAME",
  "SamplingFrequency": 1000,
  "SoftwareFilters": "n/a",
  "EEGChannelCount": 4,
  "EOGChannelCount": 1,
  "EEGReference": "placed on Cz",
  "PowerLineFrequency": 50
}
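The naming scheme and the role of events.tsv files can be illustrated with a minimal sketch. The helper names bids_eeg_path and triggers_to_events_tsv, and the trigger-code mapping, are hypothetical; the path builder only handles the sub, ses, task and run entities, and the events writer assumes triggers have already been extracted from the trigger channel as (sample index, code) pairs.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Iterable, Optional, Tuple


def bids_eeg_path(sub: str, task: str, suffix: str, ext: str,
                  ses: Optional[str] = None, run: Optional[str] = None) -> PurePosixPath:
    """Relative path of a BIDS-EEG file, e.g. sub-01/eeg/sub-01_task-rest_run-1_eeg.edf."""
    entities = [f"sub-{sub}"]
    if ses is not None:
        entities.append(f"ses-{ses}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    folder = PurePosixPath(f"sub-{sub}")
    if ses is not None:
        folder = folder / f"ses-{ses}"  # one subdirectory per session
    return folder / "eeg" / ("_".join(entities) + f"_{suffix}{ext}")


# Hypothetical mapping from numeric trigger codes to explicit condition labels.
TRIGGER_LABELS = {1: "face", 2: "house", 8: "left_response"}


def triggers_to_events_tsv(triggers: Iterable[Tuple[int, int]], sfreq: float,
                           duration: float = 0.0) -> str:
    """Convert (sample_index, code) pairs into the text of an events.tsv file."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["onset", "duration", "trial_type", "value", "sample"])
    for sample, code in triggers:
        onset = sample / sfreq  # seconds since the start of the recording
        label = TRIGGER_LABELS.get(code, "n/a")  # "n/a" marks an unmapped code
        writer.writerow([f"{onset:.3f}", duration, label, code, sample])
    return out.getvalue()
```

For example, bids_eeg_path("01", "rest", "events", ".tsv", run="1") gives sub-01/eeg/sub-01_task-rest_run-1_events.tsv. Keeping the raw code in the value column alongside the explicit trial_type label preserves the original trigger information while making its meaning readable.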
Finally, since the initial specification for BIDS-MRI in 2016, the Hierarchical Event Descriptor (HED) system for precise annotation of events (Bigdely-Shamlo et al., 2016) has been integrated. The HED system describes each event with a formatted string of pre-specified words, called tags. Tags are hierarchical, and levels are delimited with a forward slash ("/"). Multiple tags can be combined, delimited with a comma. Parentheses (brackets) group tags and enable the specification of multiple items and their attributes in a single HED string. Detailed event annotation is useful for electrophysiological data, in which the dynamics of brain activity are associated with multiple experimental events. The HED standard is an open community tagging schema whose top-level tags are specified2, but whose lower-level tags, giving more specialized detail about the events in question, can be accumulated in personal, laboratory, or consortium libraries, or can be considered for publication as accepted tags and tag categories.
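The three syntactic rules above (slash-separated levels, comma-separated tags, parenthesized groups) can be captured by a small parser. This is a sketch only: the function names are hypothetical, the example tags are illustrative and not checked against any HED schema, and real HED tooling also validates tags against the schema, which this code does not do.

```python
from typing import List, Tuple, Union

Tag = Tuple[str, ...]            # one tag, split into its hierarchy levels
Node = Union[Tag, List["Node"]]  # a tag, or a parenthesized group of nodes


def split_top_level(hed: str) -> List[str]:
    """Split a HED string on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in hed:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses in HED string")
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced parentheses in HED string")
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def parse_hed(hed: str) -> List[Node]:
    """Return tags as tuples of levels and groups as nested lists."""
    nodes: List[Node] = []
    for part in split_top_level(hed):
        if part.startswith("(") and part.endswith(")"):
            nodes.append(parse_hed(part[1:-1]))  # recurse into the group
        else:
            nodes.append(tuple(level.strip() for level in part.split("/")))
    return nodes
```

For instance, "Event/Category/Experimental stimulus, (Item/Object/Face, Attribute/Visual/Color/Red)" parses into one three-level tag followed by a group of two tags, which keeps the attribute attached to the item it describes.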

Data Formats
EEG data can be saved in numerous formats, and compared to MRI and MEG data there is a much larger heterogeneity of data formats. Allowing all existing data formats to be included as standard BIDS files would place on potential users the burden of dealing with this diversity, exacerbated by the fact that some data formats are proprietary or not openly documented. This problem might well undermine the positive effect that BIDS aims to have on the development of integrated apps and analysis software. The process of converging on a list of suitable data formats for BIDS-EEG was governed by three major requirements: a suitable data format should (i) address the needs of a large portion of the global EEG community, (ii) be a FAIR format (Wilkinson et al., 2016), with a focus on interoperability, and (iii) meet the technical requirements of neuroscientific workflows, such as saving numerical data with high precision.
As a solution to this challenge, the BIDS-EEG specification incorporates only two recommended official data formats: the European Data Format (EDF3), an ongoing international effort, started in 1992, to provide a common data format for electrophysiological recordings (Kemp et al., 1992; Kemp and Olivan, 2003), and the BrainVision data format4, as used by Brain Products GmbH5. While the BrainVision data format was designed by Brain Products GmbH for their proprietary EEG recording equipment and analysis software, it is based on the Microsoft Windows INI file format and has concise documentation6. Both formats meet the three requirements above: (i) they are widely used in the community, as indicated by a recent survey7; (ii) they have open-access documentation and open-source implementations for both reading and writing in at least two programming languages, and are widely supported in multiple software packages (both open source and commercial); and (iii) they have high numerical precision (16 and 32 bits, respectively). To accommodate a larger scientific audience and facilitate adoption, the BIDS-EEG standard also allows two unofficial, commonly used data formats: the format used by the MATLAB toolbox EEGLAB (Delorme and Makeig, 2004) (.set and .fdt files), and the Biosemi format (.bdf8). These formats, while not actively encouraged, are still included because of their popularity and their interoperability among the major software packages. Future versions of BIDS may extend the list of officially supported data formats, based on the fulfillment of the above criteria and on their usability for developing data analysis pipelines.
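Because the BrainVision header file (.vhdr) is INI-style text, it can be read with Python's standard configparser module once the leading identification line, which is not INI syntax, is skipped. The sketch below uses a simplified, made-up header; the parse_vhdr name is hypothetical, and real headers may declare other code pages or escape commas inside channel names, which this sketch ignores.

```python
import configparser

# Simplified, made-up header; real .vhdr files contain more keys and comments.
VHDR = """Brain Vision Data Exchange Header File Version 1.0
; Data created by a hypothetical recorder

[Common Infos]
Codepage=UTF-8
DataFile=sub-01_task-rest_eeg.eeg
MarkerFile=sub-01_task-rest_eeg.vmrk
DataFormat=BINARY
DataOrientation=MULTIPLEXED
NumberOfChannels=2
SamplingInterval=1000

[Binary Infos]
BinaryFormat=IEEE_FLOAT_32

[Channel Infos]
Ch1=Fp1,,0.1,µV
Ch2=Cz,,0.1,µV
"""


def parse_vhdr(text: str) -> dict:
    # Skip the identification line that precedes the first [section].
    body = text.split("\n", 1)[1]
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "NumberOfChannels"
    parser.read_string(body)  # lines starting with ";" are treated as comments
    common = parser["Common Infos"]
    # SamplingInterval is given in microseconds per sample.
    sfreq = 1e6 / float(common["SamplingInterval"])
    # Each channel entry is "name,reference,resolution,unit"; keep the name.
    channels = [value.split(",")[0] for value in parser["Channel Infos"].values()]
    return {
        "data_file": common["DataFile"],
        "sfreq": sfreq,
        "channels": channels,
        "binary_format": parser["Binary Infos"]["BinaryFormat"],
    }
```

Reading the header this way yields a sampling rate of 1000 Hz, two channel names and the 32-bit float sample format, which is the information a converter needs before reading the binary .eeg file.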

Electrodes versus Channels
The notions of electrodes and channels are often used interchangeably in the context of EEG research. For BIDS-EEG, however, it is crucial to distinguish between the two in order to provide unambiguous documentation. An EEG electrode is attached to the skin, whereas a channel is the combination of the analog differential amplifier and the analog-to-digital converter that results in a potential (voltage) difference being stored in the EEG dataset. The reference and ground electrodes should therefore be referred to only as electrodes, not as channels. Some systems (e.g., Biosemi) have an active floating reference, whilst for most other systems the potential at these electrodes is neither amplified nor recorded. For BIDS-EEG, researchers must specify a channels.tsv file and may in addition specify an electrodes.tsv file (see example in figure 2).
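The distinction can be made concrete with a minimal sketch that writes both files as tab-separated text. The montage, coordinates and the to_tsv helper are hypothetical, and only the name/type/units and name/x/y/z columns are shown. The reference (Cz) and ground (AFz) electrodes appear in electrodes.tsv but not in channels.tsv; the EOG electrodes are left out of electrodes.tsv for brevity.

```python
import csv
import io

# Hypothetical recording: three EEG channels referenced to Cz, plus one EOG channel.
CHANNELS = [
    ("Fp1", "EEG", "µV"),
    ("Fz", "EEG", "µV"),
    ("Pz", "EEG", "µV"),
    ("VEOG", "EOG", "µV"),
]

# Made-up electrode positions (x, y, z) in metres.
ELECTRODES = {
    "Fp1": (-0.03, 0.08, 0.03),
    "Fz": (0.0, 0.06, 0.08),
    "Pz": (0.0, -0.06, 0.09),
    "Cz": (0.0, 0.0, 0.1),    # reference electrode: an electrode, not a channel
    "AFz": (0.0, 0.07, 0.06),  # ground electrode: also not a channel
}


def to_tsv(header, rows) -> str:
    """Serialize rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


channels_tsv = to_tsv(["name", "type", "units"], CHANNELS)
electrodes_tsv = to_tsv(["name", "x", "y", "z"],
                        [(name, *xyz) for name, xyz in ELECTRODES.items()])
```

Keeping the two tables separate means that a channel such as VEOG, which may be derived from two electrodes, and an electrode such as Cz, which produces no recorded channel, can both be documented without ambiguity.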

Fiducials, Anatomical landmarks, Coordinate System
Similar to the questionable synonymous use of electrodes and channels in the EEG community, fiducials and anatomical landmarks are often confused; we distinguish them here. Fiducials are objects with a well-defined location used to facilitate the localization of electrodes and co-registration with other geometric data, such as the participant's own T1-weighted magnetic resonance head image, a T1-weighted template head image, or a spherical head model. Commonly used fiducials are vitamin-E pills, which show clearly in an MRI, or reflective spheres that are localized with an infrared optical tracking system. Anatomical landmarks, on the other hand, are locations defined on a research subject's anatomy, such as the nasion, which is the intersection of the frontal bone and the two nasal bones of the human skull. Fiducials are typically used in conjunction with anatomical landmarks, for example by placing vitamin-E pills on top of anatomical landmarks (Agrawal and Steinbok, 2009), or by placing LEDs on the nasion and preauricular points to triangulate the position of other LED-lit electrodes on a research subject's head. A coordsystem.json file is used to specify the fiducials, the location of anatomical landmarks, and the coordinate system and units in which the positions of electrodes and landmarks are expressed. The coordsystem.json file is required if the optional electrodes.tsv file is specified. If a corresponding anatomical MRI is available, the locations of landmarks and fiducials according to that scan should also be stored in the T1w.json file that accompanies the MRI data.
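A coordsystem.json file could be assembled as follows. This is a sketch under stated assumptions: the field names reflect our reading of the BIDS-EEG specification and should be checked against its current version, the coordinates are made up, and check_coordsystem is a hypothetical helper whose rules are illustrative rather than the validator's.

```python
import json

# Made-up landmark positions in metres, in a head-centred RAS-like frame:
# x points towards RPA, y towards the nasion, z upwards;
# the origin lies midway between LPA and RPA.
landmarks = {
    "NAS": [0.0, 0.095, 0.0],
    "LPA": [-0.075, 0.0, 0.0],
    "RPA": [0.075, 0.0, 0.0],
}

coordsystem = {
    "EEGCoordinateSystem": "Other",
    "EEGCoordinateUnits": "m",
    "EEGCoordinateSystemDescription": "RAS orientation, origin midway between LPA and RPA",
    # Vitamin-E pills were placed on top of the landmarks in this example,
    # so fiducials and landmarks share the same coordinates.
    "FiducialsDescription": "Vitamin-E pills placed on the nasion and preauricular points",
    "FiducialsCoordinates": landmarks,
    "AnatomicalLandmarkCoordinates": landmarks,
}


def check_coordsystem(cs: dict) -> None:
    """Minimal, hypothetical consistency checks on a coordsystem dictionary."""
    if cs.get("EEGCoordinateUnits") not in {"m", "cm", "mm"}:
        raise ValueError("EEGCoordinateUnits must be a length unit")
    for name, xyz in cs.get("AnatomicalLandmarkCoordinates", {}).items():
        if len(xyz) != 3:
            raise ValueError(f"landmark {name} must have x, y and z coordinates")


check_coordsystem(coordsystem)
text = json.dumps(coordsystem, indent=2)  # content for sub-XX_task-YY_coordsystem.json
```

Stating the orientation and origin explicitly, rather than only naming the landmarks, is what allows electrode positions in electrodes.tsv to be interpreted without access to the digitizer software.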

Public BIDS-EEG datasets
During the preparation of the BIDS-EEG specification, three full datasets were made publicly available, and study examples (with zero-byte data files) are available in the BIDS GitHub repository9.
Single session per participant (eeg_matchingpennies, Appelhoff et al., 2018)
The matching pennies dataset was collected as part of a student project to replicate a brain-computer interface study of motor intention decoding. In this study, participants played a game of matching pennies against the computer. After initiating a three-second countdown, participants raised either their left or their right hand; at the same time, the computer presented a stimulus on the left or the right side. Participants could win against the computer by performing actions opposite to those of the computer. The computer, however, had access to the real-time EEG data and could make use of the participants' EEG activity before the end of the countdown, trying to decode a lateralized readiness potential and thereby increasing its chances of winning against the participant. The dataset provides the offline data of 7 participants.
Multiple sessions per participant (eeg_rishikesh, Brandmeyer and Delorme, 2018)
This study was conducted at the Meditation Research Institute (MRI) in Rishikesh, India. All participants (25 meditators from the Himalayan Yoga tradition) were asked to meditate continuously throughout the experiment in their usual seated meditation position. Experience sampling probe questions were presented at random intervals ranging from 30 to 90 seconds throughout the experiment. Probes, in the form of pre-recorded vocal questions, were played through two freestanding speakers. Each experience sampling probe series consisted of three questions: "Please rate the depth of your meditation", "Please rate the depth of your mind wandering", and "Please rate how tired you are". Subjects responded on a small customized numeric USB keypad (0, 1, 2 or 3) resting on their right thigh, which allowed their right hand to rest comfortably without their having to move or open their eyes.
Resting state (eeg_rest_fmri, Deligianni et al., 2014)
In this study, simultaneous resting-state EEG-fMRI data were acquired in 17 subjects, who kept their eyes open and were asked to remain awake and fixate on a white cross presented on a black background. Structural T1-weighted data and diffusion data (NODDI sequence) are also available for these subjects.

Data format validation software
As part of the BIDS project, datasets formatted to follow the BIDS-EEG standard can be validated using the bids-validator, a JavaScript application that can be run locally using Node.js. Both a command-line version10 and a browser version11 are available. No data are uploaded when using either version of the bids-validator (the browser version runs entirely within the browser), which prevents data protection issues from arising. The functionality of the previous BIDS validator was extended to capture the specifics of EEG data. With this validation tool, scientists can check their newly formatted datasets and make full use of the data structure's strengths, for example by checking for missing data or underspecified metadata.
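To illustrate the kind of rule such a validator enforces, the sketch below checks a list of dataset-relative file paths against three requirements stated in this article: each *_eeg recording should have a *_eeg.json sidecar, a *_channels.tsv file is required, and a *_coordsystem.json file is required whenever *_electrodes.tsv is present. It is a hypothetical helper, not the bids-validator; in particular it ignores BIDS metadata inheritance, so sidecars placed at higher directory levels would be reported as missing.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List, Set

EEG_EXTENSIONS = {".edf", ".vhdr", ".set", ".bdf"}  # recordings named *_eeg.<ext>


def _has_suffix(names: Set[str], suffix: str) -> bool:
    return any(name.endswith(suffix) for name in names)


def check_eeg_dirs(paths: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means no rule was violated."""
    by_dir: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        folder, name = posixpath.split(path)
        by_dir[folder].add(name)
    problems = []
    for folder, names in sorted(by_dir.items()):
        recordings = sorted(
            n for n in names
            if posixpath.splitext(n)[1] in EEG_EXTENSIONS
            and posixpath.splitext(n)[0].endswith("_eeg")
        )
        for rec in recordings:
            sidecar = posixpath.splitext(rec)[0] + ".json"
            if sidecar not in names:
                problems.append(f"{folder}/{rec}: missing sidecar {sidecar}")
        if recordings and not _has_suffix(names, "_channels.tsv"):
            problems.append(f"{folder}: no *_channels.tsv file")
        if _has_suffix(names, "_electrodes.tsv") and not _has_suffix(names, "_coordsystem.json"):
            problems.append(f"{folder}: *_electrodes.tsv without *_coordsystem.json")
    return problems
```

Companion files such as the BrainVision .eeg and .vmrk files, or EEGLAB .fdt files, are not treated as recordings here, so only the header file of each recording needs a sidecar.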

Data conversion software
To help EEG practitioners convert data from other formats to EDF (16 bit) or the BrainVision data format (32 bit), we have started a project to provide data readers that depend only on Python, for both EDF12 and the BrainVision data format13. This tool will be extended over time to provide a universal converter to prepare data for BIDS, as dcm2niix does for MRI14. Data conversion utilities from many EEG formats to EDF and the BrainVision data format are available in MATLAB from the FieldTrip toolbox15.
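As an illustration of what a reader depending only on Python involves, the sketch below parses the fixed-width ASCII header of an EDF file (256 bytes, followed by 256 bytes per signal, with per-signal fields stored field by field). It is a minimal sketch with a hypothetical function name, not the implementation of the readers cited above; it ignores EDF+ annotations and does not read the data records. The resolution it computes is the physical value of one 16-bit digital step.

```python
FIXED_FIELDS = [("version", 8), ("patient_id", 80), ("recording_id", 80),
                ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
                ("reserved", 44), ("n_records", 8), ("record_duration", 8),
                ("n_signals", 4)]
SIGNAL_FIELDS = [("label", 16), ("transducer", 80), ("physical_dimension", 8),
                 ("physical_min", 8), ("physical_max", 8), ("digital_min", 8),
                 ("digital_max", 8), ("prefiltering", 80),
                 ("samples_per_record", 8), ("reserved", 32)]


def read_edf_header(raw: bytes) -> dict:
    """Parse an EDF header given its raw bytes (256 + 256 * n_signals bytes)."""
    offset = 0

    def take(width: int) -> str:
        nonlocal offset
        text = raw[offset:offset + width].decode("ascii").strip()
        offset += width
        return text

    header = {}
    for name, width in FIXED_FIELDS:
        header[name] = take(width)
    for key in ("header_bytes", "n_records", "n_signals"):
        header[key] = int(header[key])
    header["record_duration"] = float(header["record_duration"])

    # Per-signal fields are stored field by field: all labels, then all transducers, etc.
    signals = [{} for _ in range(header["n_signals"])]
    for name, width in SIGNAL_FIELDS:
        for signal in signals:
            signal[name] = take(width)
    for signal in signals:
        pmin, pmax = float(signal["physical_min"]), float(signal["physical_max"])
        dmin, dmax = int(signal["digital_min"]), int(signal["digital_max"])
        # physical = (digital - dmin) * resolution + pmin
        signal["resolution"] = (pmax - pmin) / (dmax - dmin)
    header["signals"] = signals
    return header
```

In practice one would read the first 256 bytes of the file, take the signal count from bytes 252 to 256, and then read the remaining 256 bytes per signal before calling the function. With a physical range of ±3200 µV mapped onto the full 16-bit range, one digital step corresponds to about 0.1 µV.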
Community tools and software support
EEGLAB 16
EEGLAB (Delorme and Makeig, 2004) is a freely available, readily extensible open-source software environment running on MATLAB (The Mathworks, Inc.) for the analysis of neuroelectromagnetic data, particularly EEG. Its maintenance and further development at UCSD have been supported by the U.S. National Institutes of Health since 2004. A new EEGLAB function, std_tobids.m, reads an EEGLAB study, a type of file that EEGLAB uses for group analysis, and exports it following the BIDS-EEG specification. Several subfunctions export the information needed to create the channel and electrode .tsv and .json files. Future work will allow exporting multiple-session studies and, conversely, automatically importing BIDS-ready studies.

FieldTrip 17
FieldTrip (Oostenveld et al., 2011) is an open-source MATLAB toolbox for channel- and source-level MEG, EEG and iEEG analysis. It includes a collection of high-level, consistent, well-documented and user-friendly functions that researchers combine in pipelines (MATLAB scripts) for their analysis. FieldTrip supports reading most EEG formats, including the ones used in BIDS, and can also export to the EDF and BrainVision data formats. Analyzing data organized in BIDS is not very different from analyzing other data, except that researchers can read additional metadata (events and annotations) from the events.tsv sidecar file. FieldTrip includes the data2bids.m function to help users organize their EEG, MEG, iEEG and MRI data in BIDS and to provide proper metadata annotation. Tutorial documentation for BIDS is available18.

MNE 19
MNE (Gramfort et al., 2014) is a software suite for exploring, visualizing, and analyzing human electrophysiological data. It consists of three fully integrated core subpackages, to be used with the Python programming language (Gramfort et al., 2013), MATLAB and C/C++. Functions to read the EEG data formats supported by BIDS are provided. Furthermore, active development is underway to make MNE-Python fully compatible with the BIDS specification, in the form of the MNE-BIDS project20. In its current state, the project lets users read their raw data using MNE-Python and then automatically create an initial BIDS directory in the correct format, with metadata extracted from the raw data already partially filled in. An example is available21.

SPM 22
SPM (Flandin and Friston, 2008) is a free and open-source software package for the analysis of neuroimaging data. It is mostly written in MATLAB and offers a wide range of methods for the analysis of PET, MRI, fMRI, EEG and MEG data (Litvak et al., 2011). The latest release, SPM12, includes a library, spm_BIDS.m, to interact with BIDS datasets. It has been extended to support the present EEG specification23 and is also available as a standalone library24.

Data analysis pipeline and reproducible workflows
The long history, versatility and variety of applications of EEG make it a data- and method-rich technique. The recent OHBM guidelines for good practice and reproducibility in EEG (Pernet et al., 2018) list eight preprocessing steps for standard event-related potentials (identification and removal of electrodes with poor signal quality, artifact identification and removal, detrending, digital low- and high-pass filtering, data segmentation, additional identification/elimination of artifacts, baseline correction, and re-referencing), with the order of steps depending on the application and potentially augmented by additional transformations in the time, frequency or time-frequency domains, projection into source space, and additional connectivity measures. This implies that, while BIDS-EEG will help data sharing, it will remain non-trivial to develop automated preprocessing pipelines for magneto- (Niso et al., 2018) and electrophysiological (Holdgraf et al., in preparation) data comparable to the fMRI BIDS apps (Gorgolewski et al., 2017). BIDS nevertheless represents a necessary step towards validated and reproducible data analysis.
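To make the notion of an ordered pipeline concrete, the sketch below implements two of the eight listed steps, baseline correction and re-referencing to the common average, on a single segmented trial stored as plain lists, and applies them through a small runner that executes steps in the given order. The function names are hypothetical and the code is illustrative only; real analyses would rely on the toolboxes described above.

```python
from statistics import mean
from typing import Callable, List

Epoch = List[List[float]]  # channels x samples, e.g. one segmented trial


def baseline_correct(epoch: Epoch, n_baseline: int) -> Epoch:
    """Subtract from each channel the mean of its first n_baseline (pre-stimulus) samples."""
    corrected = []
    for channel in epoch:
        baseline = mean(channel[:n_baseline])
        corrected.append([x - baseline for x in channel])
    return corrected


def average_reference(epoch: Epoch) -> Epoch:
    """Re-reference to the common average: subtract the across-channel mean at each sample."""
    common = [mean(column) for column in zip(*epoch)]
    return [[x - c for x, c in zip(channel, common)] for channel in epoch]


def run_pipeline(epoch: Epoch, steps: List[Callable[[Epoch], Epoch]]) -> Epoch:
    """Apply preprocessing steps in the order given."""
    for step in steps:
        epoch = step(epoch)
    return epoch


steps = [lambda e: baseline_correct(e, n_baseline=2), average_reference]
result = run_pipeline([[1.0, 3.0, 5.0], [3.0, 3.0, 3.0]], steps)
```

Here result is [[-0.5, 0.5, 1.5], [0.5, -0.5, -1.5]]: after average referencing, the channels sum to zero at every sample. Representing each step as a function of the data makes the chosen order explicit, which is the kind of information an automated, BIDS-aware pipeline would need to record.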

Beyond sharing raw data
This article describes the new EEG extension to the Brain Imaging Data Structure and has limited itself to sharing raw data using previously developed community standards. Challenges that are specific to EEG, such as support for data formats, are still actively debated, and additional formats will likely be incorporated once the technical issues are resolved and FAIRness standards are met. The development of BIDS for EEG derivatives is also already underway (see BIDS Extension Proposal 21), which will allow sharing preprocessed data (see, e.g., Bellec et al., 2017), thus fostering re-analyses, meta-analyses, and new analyses without the burden of data preparation.
Author contributions
• S.A.: critical review of the specification, moderating community interactions during the process, preparation of datasets and examples, coding of the bids-validator extension, coding of BIDS MNE tools, writing the manuscript.
• G.F.: critical review of the specification, coding of BIDS SPM tools, critical review and final approval of the version submitted.
• C.P.: critical review of the specification, preparation of examples, critical review and final approval of the version submitted.
• A.D.: critical review of the specification, preparation of datasets and examples, coding of BIDS EEGLAB tools, critical review and final approval of the version submitted.
• R.O.: conception and design of the specification, moderating community interactions during the process, preparation of datasets and examples, coding of the bids-validator extension, coding of BIDS FieldTrip tools, critical review and final approval of the version submitted.

Figure 1: Number of publications per year since 1950 with EEG in the title or abstract, as obtained from PubMed. See the code at https://zenodo.org/record/1490630 to reproduce the figure.

Figure 2: A prototypical directory tree of a BIDS dataset containing EEG data. At the root level of the directory, the README, CHANGES, and dataset_description.json files provide basic information about the dataset. A participants.tsv data file is accompanied by a participants.json file, which contains the description of the columns in its associated .tsv file. The panel in the upper right of the figure provides examples of the typical format within a .tsv and a .json file. Usually, each .tsv file is accompanied by a .json file that provides metadata. The EEG data and anatomical MRI scans are saved per subject within the eeg and anat subdirectories, respectively. If the original data format is not supported by BIDS, the original data can be included in an additional sourcedata directory. Finally, a stimuli directory contains the stimuli that were presented to the participants in the experiment.