A comparison of neuroelectrophysiology databases

Subash, Priyanka; Gray, Alex; Boswell, Misque; Cohen, Samantha L.; Garner, Rachael; Salehi, Sana; Fisher, Calvary; Hobel, Samuel; Ghosh, Satrajit; Halchenko, Yaroslav; Dichter, Benjamin; Poldrack, Russell A.; Markiewicz, Chris; Hermes, Dora; Delorme, Arnaud; Makeig, Scott; Behan, Brendan; Sparks, Alana; Arnott, Stephen R; Wang, Zhengjia; Magnotti, John; Beauchamp, Michael S.; Pouratian, Nader; Toga, Arthur W.; Duncan, Dominique

doi:10.1038/s41597-023-02614-0

Open access
Published: 19 October 2023

Scientific Data volume 10, Article number: 719 (2023)

Abstract

As data sharing has become more prevalent, three pillars (archives, standards, and analysis tools) have emerged as critical components in facilitating effective data sharing and collaboration. This paper compares four freely available intracranial neuroelectrophysiology data repositories: Data Archive for the BRAIN Initiative (DABI), Distributed Archives for Neurophysiology Data Integration (DANDI), OpenNeuro, and Brain-CODE. The aim of this review is to describe archives that provide researchers with tools to store, share, and reanalyze both human and non-human neurophysiology data based on criteria that are of interest to the neuroscientific community. These archives use the Brain Imaging Data Structure (BIDS) and Neurodata Without Borders (NWB) to make data more accessible to researchers through a common standard. As the need to integrate large-scale analysis into data repository platforms continues to grow within the neuroscientific community, this article highlights the various analytical and customizable tools developed within the chosen archives that may advance the field of neuroinformatics.

Introduction

Open science

Open science aims to make research data more transparent and widely available while promoting interdisciplinary partnerships that leverage findings^1,2. In the United States, public health organizations such as the National Institutes of Health (NIH) fund data repositories that serve as reliable resources for accessing data. Requiring and encouraging data sharing, promoting common standards, and providing analysis tools form the foundation for translating research findings into new knowledge, products, and procedures^3,4,5.

Access to existing datasets offers many advantages: it supports the development of new hypotheses and serves as a source for preliminary analyses. Further, secondary analysis (analysis of existing data to address a question different from that of the original study) is critical in novel research fields where limited data hinder the production of replicable results, and pooling datasets can add statistical power to an otherwise limited study. Lastly, reused datasets can validate previous conclusions or be repurposed to address new questions at lower cost and effort.

Intracranial electroencephalography (iEEG) provides high temporal and spatial resolution, enabling a level of detail not attainable with other neurodata acquisition techniques. With iEEG increasingly used in clinical settings, and with FDA approvals of deep brain stimulation (DBS) for multiple conditions over the last three decades, electrophysiology studies have become more critical and more frequently employed in neuroscience.

Intracranial neuroelectrophysiology

Intracranial neuroelectrophysiology data can be collected through electrodes placed on the cortical surface for electrocorticography (ECoG), depth electrodes implanted within the brain for stereoelectroencephalography (sEEG), or deep brain stimulation (DBS) electrodes. The recordings are obtained from patients undergoing clinically indicated brain surgery for neurological conditions or from those participating in device trials with FDA investigational device exemption (IDE) approval^6,7,8. Because the implantation procedures are complex, multimodal imaging data are required for proper electrode placement, which enriches the resulting datasets. The nature of these procedures, their high costs, and their specialized clinical requirements make such studies relatively rare. In addition, electrodes are placed sparsely and cover different brain regions across patients. The resulting small sample sizes^8,9,10 create a need for centralized databases that make these rare data types available to the larger community for large-scale studies.

Sharing non-human invasive neuroelectrophysiology data

Although sharing non-human data differs from sharing human data with respect to privacy concerns, it requires careful consideration of ethical factors related to animal welfare and research practices¹¹. Archives hosting non-human neuroelectrophysiology datasets should stress adherence to animal welfare standards and ensure proper sharing permissions, with clear disclosure of any restrictions. Thorough documentation, including Latin species names, should be provided to prevent mistakes when data are pooled or reanalyzed.

While PRIMatE Data Exchange (PRIME-DE) (https://fcon_1000.projects.nitrc.org/indi/indiPRIME.html) aims to support open science in the neuroimaging community, initiatives involving protocols for sharing non-human neuroelectrophysiology data are still under development.

History of intracranial neuroelectrophysiology databases

So far, there have been several significant developments toward creating valuable databases housing neuroelectrophysiology data. Notable pioneers include EPILEPSIAE (http://www.epilepsiae.eu), iEEG.org (https://www.ieeg.org), EEGLAB (https://eeglab.org), and Collaborative Research in Computational Neuroscience (CRCNS) (https://crcns.org).

One of the early efforts to construct an electrophysiology repository was undertaken in 2012 by the Epilepsy Research Group at the University of Leuven (Katholieke Universiteit Leuven) in Belgium. EPILEPSIAE (Evolving Platform for Improving the Living Expectations of Patients Suffering from IctAl Events)¹², with a repository subdivision known as the European Epilepsy Database, was developed as a paid resource that gives researchers, clinicians, and students access to expert-annotated electrophysiology recordings, along with metadata and imaging, for 275 patients.

Another early platform for data sharing and collaboration, the International Epilepsy Electrophysiology Portal (iEEG.org), was established in 2013 by the University of Pennsylvania and the Mayo Clinic. iEEG.org revolutionized the creation and curation of intracranial neurophysiologic and multimodal datasets while making large-scale, complex analysis and customization easier for researchers¹³.

A different approach was taken by the creators of EEGLAB (https://sccn.ucsd.edu/eeglab/index.php), an environment for human EEG analysis developed in 2004 by the Swartz Center for Computational Neuroscience (https://sccn.ucsd.edu) at the University of California, San Diego (UCSD). EEGLAB gathered contributions from programmers, tool authors, and users while providing access to 32-channel EEG recordings from 14 patients, which later became available on the OpenNeuro platform¹⁴. In 2019, the EEGLAB creators, jointly with OpenNeuro, built the Neuroelectromagnetic Data Archive and Tools Resource (NEMAR)¹⁵.

CRCNS was established in 2002 with joint funding from the National Science Foundation and the National Institutes of Health, with the goal of enabling concerted efforts to understand and share neurodata, stimuli, and analysis tools with researchers worldwide. Data available on the CRCNS platform include physiological recordings from sensory and memory systems, as well as eye movement data¹⁶.

Governing bodies with neurodata sharing mandates

The NIH BRAIN Initiative

In the United States, in 2013, NIH launched the BRAIN (Brain Research Through Advancing Innovative Neurotechnologies) Initiative to advance neuroscience through multimodal, cross-disciplinary, and multi-institutional research, fostering a more integrative approach. Over $1.5 billion has been invested in investigating treatments for brain disorders while advancing research tools and technologies. BRAIN Initiative studies have produced a wealth of neurodata that can further expand our knowledge, making data archives a vital part of its efforts¹⁷.

Several BRAIN Initiative-funded neurophysiology repositories currently collect data, develop new features, and expand the size and scope of their systems while sharing data broadly with the scientific community. These include Data Archive for the BRAIN Initiative (DABI) (https://dabi.loni.usc.edu), Distributed Archives for Neurophysiology Data Integration (DANDI) (https://www.dandiarchive.org), and OpenNeuro (https://openneuro.org) with its partner analysis platform NEMAR (https://nemar.org).

Ontario Brain Institute

In Canada, the Province of Ontario recognized the need to improve the diagnosis and treatment of brain disorders, aiming to implement a province-wide integrated approach to research. As a result, in 2010, it established and funded the Ontario Brain Institute (OBI) to create a patient-centered research system, engage the industry, and drive knowledge exchange between researchers, policymakers, and the neuroscience industry¹⁸. In 2012, OBI launched Brain-CODE (https://www.braincode.ca), a data-sharing informatics platform, as a crucial part of its efforts to facilitate and maximize the integrative research approach.

Each archive aims to improve public health by increasing research transparency through data accessibility, reproducibility, and inter-institutional collaboration. Data from DABI, DANDI, OpenNeuro, NEMAR, and Brain-CODE have contributed to numerous publications^19,20,21,22,23,24,25,26,27,28,29,30,31,32, demonstrating their impact on neuroscience research.

National Institute of Mental Health (NIMH) and the National Institute of Mental Health Data Archive (NDA)

Another repository, the NDA (https://nda.nih.gov), is managed by the NIMH and allows researchers to store, share, and access research data related to mental health. NDA aims to accelerate scientific research and discovery by sharing de-identified and harmonized data across scientific domains. It provides a secure platform for researchers to upload and store clinical, neuroimaging, and genomic data, ensuring that datasets are de-identified, sensitive information is encrypted, and strict access controls are in place. Not all data on NDA are publicly available; some datasets require access requests and approval by authorized individuals or institutions.

While NDA shares the goal of accelerating discovery through sharing and reanalyzing existing neurodata, it is distinct from DABI, DANDI, and OpenNeuro, which focus on neuroresearch data collected through the BRAIN Initiative-funded projects. Further, NDA focuses on mental health research data and has a cost associated with data deposition, which is intended to cover the maintenance and curation of the archive.

The aim of this review is to describe archives that provide researchers with tools to store, share, and reanalyze both human and non-human neurophysiology data based on criteria that are of interest to the neuroscientific community.

Methods

Governmental agencies, academic institutions, and patients engaged in research have collectively acknowledged the imperative of sharing scientific data. Data sharing enhances transparency and drives research progress, ultimately reducing duplicated effort and wasted resources. Consequently, data archives hold immense potential to revolutionize scientific research by establishing standardized data collection protocols while safeguarding data privacy, security, and long-term preservation.

Data governance is critical to well-established archive management and to the control of data assets. It involves establishing frameworks that dictate how data are collected, stored, accessed, shared, and organized. In the context of data archives, data governance oversees the entire archival process, encompassing data retention policies, security measures, access controls, and data integrity and privacy. It also facilitates appropriate, controlled data sharing among relevant stakeholders.

To better appreciate the scope of neurophysiology databases and describe optimal user systems, DABI, DANDI, Brain-CODE, and OpenNeuro, jointly with NEMAR, are summarized and compared to assist members of the scientific community who wish to share or access human and non-human neuroelectrophysiology data. (Because NEMAR is an analysis partner to the OpenNeuro archive and does not store independent data, it is discussed only in the context of neurodata analysis tools.) Inclusion criteria for the selected archives were free access to human and non-human iEEG data, support for open-access or controlled-access sharing protocols, establishment in North America with a global user base, and status as a preferred data archive under NIH- or OBI-mandated data-sharing initiatives.

Although not exhaustive, this review compares databases containing intracranial recordings using the following assessment criteria, which relate to data governance frameworks:

Data Standards & File Formats
Data Upload Procedures
Data Download, Access Protocols, and Policies
Data Storage and Maintenance
Analytic Tools

To help identify which archive meets the needs of potential data users or data sharers, summary tables of features are provided at the end of the discussion of each criterion.

DABI

Funded in 2018, DABI was created to facilitate and streamline the dissemination of human and non-human neurophysiology data, with a focus on intracranial recordings. DABI emphasizes data organization and analysis while allowing investigators to retain control and ownership of their datasets and to fulfill data-sharing directives. Housed at the University of Southern California Stevens Neuroimaging and Informatics Institute, DABI provides innovative infrastructure for interactive data visualization, processing, sharing, and collaboration among researchers³³.

DANDI

Funded in 2019, DANDI is a repository that accepts cellular neurophysiology and neuroimaging datasets, termed Dandisets (https://github.com/dandisets), for both human and non-human data. Its self-service model lets users upload, organize, and analyze data with tools provided by the platform, giving them greater control over their data; however, using the platform effectively requires technical expertise. Additional features include storage optimizations and tools that allow investigators to collaborate outside their institutions. DANDI positions itself as a platform for scientists new to secondary analysis. Led by scientists from the Massachusetts Institute of Technology and Dartmouth College, DANDI is designed to aid in the adoption of Neurodata Without Borders (NWB)³⁴, Brain Imaging Data Structure (BIDS)⁷, and Neuroimaging Data Model (NIDM)³⁵. DANDI also supports data, metadata, and provenance standards, including the World Wide Web Consortium provenance standard (W3C-PROV), that address data harmonization challenges and promote interoperability³⁶.

OpenNeuro

Funded in 2018 and led by Stanford University, OpenNeuro is one of the largest repositories of human and non-human neuroimaging data¹⁴. Developed from an earlier platform, OpenfMRI (https://openfmri.org/), OpenNeuro is built around the BIDS specification, which simplifies file formats and folder structures for broad accessibility, and has since evolved into its current ecosystem of tools and resources. OpenNeuro began supporting iEEG data in 2019, after the modality was incorporated into the BIDS standard as an extension (iEEG-BIDS)⁷. NEMAR is a partner to OpenNeuro for MEG, EEG, and iEEG (MEEG) data and provides additional MEEG tools for datasets that are publicly downloadable from OpenNeuro; these datasets undergo quality checks and automatic preprocessing. In addition to BIDS, NEMAR uses detailed descriptions of experimental events stored with the Hierarchical Event Descriptor (HED) system^15,37. Figure 1 illustrates the iEEG-BIDS folder structure⁷.
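To make the folder layout more concrete, the following is a minimal Python sketch, under simplifying assumptions, of how iEEG-BIDS composes file paths from key-value entities (`sub`, `ses`, `task`, `run`) for one recording and its common sidecar files. The helper name `ieeg_bids_paths`, the example labels, and the selection of sidecars are illustrative only; optional entities such as `acq` and `space`, and top-level files such as `dataset_description.json`, are omitted, so the BIDS specification remains the authoritative reference.

```python
from pathlib import PurePosixPath
from typing import Optional


def ieeg_bids_paths(sub: str, task: str, ext: str = ".edf",
                    ses: Optional[str] = None,
                    run: Optional[int] = None) -> list[str]:
    """Return paths for one iEEG recording and common sidecars (simplified sketch)."""
    # Folder: sub-<label>/[ses-<label>/]ieeg/
    folder = PurePosixPath(f"sub-{sub}")
    if ses is not None:
        folder = folder / f"ses-{ses}"
    folder = folder / "ieeg"

    # Entities appear in a fixed order and are joined with underscores.
    prefix = f"sub-{sub}" + (f"_ses-{ses}" if ses is not None else "")
    stem = f"{prefix}_task-{task}" + (f"_run-{run}" if run is not None else "")

    names = [
        f"{stem}_ieeg{ext}",           # the recording itself (EDF assumed by default)
        f"{stem}_ieeg.json",           # acquisition metadata sidecar
        f"{stem}_channels.tsv",        # one row per recorded channel
        f"{stem}_events.tsv",          # onsets and durations of experimental events
        f"{prefix}_electrodes.tsv",    # electrode positions (no task entity)
        f"{prefix}_coordsystem.json",  # coordinate system for those positions
    ]
    return [str(folder / name) for name in names]
```

For example, `ieeg_bids_paths("01", "visual", ses="postimp", run=1)` places every file under `sub-01/ses-postimp/ieeg/`; the electrode and coordinate-system files carry only the subject/session prefix because, in this sketch, electrode positions are shared by all tasks in a session.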

Brain-CODE

Launched in 2012, Brain-CODE²² is a platform that provides secure informatics-based data sharing, management, and integration with standards that maximize the interoperability of complex neuroscientific human and non-human datasets. In addition to hosting studies that utilize iEEG and Magnetic Resonance Imaging (MRI) data, Brain-CODE collects and shares clinical measures, neuropsychological, omics, sensor, and other data types to facilitate deeper neuroscientific understanding. Brain-CODE includes processing pipelines, notebooks, and virtual desktops to assist with analytics. The platform further promotes academic and industry collaborations for research and discovery.

Results

Data standards and file formats

The variety of neurophysiology data modalities (e.g., EEG, MEG, DBS, and iEEG) results in a wide range of file formats and structures, which complicates integrating and analyzing pooled data. Because recorded file formats are not standardized, building large-scale datasets requires file conversion. The emergence of intracranial neurophysiology databases necessitated improved standardization and harmonization protocols to ensure data usability and integration. DABI, DANDI, OpenNeuro, and Brain-CODE offer nuanced solutions to this demand.

The Brain Imaging Data Structure (BIDS)

BIDS has gained broad acceptance in the neuroimaging community, becoming the leading standard for harmonizing imaging data. As previously mentioned, electrophysiology data are complex and challenging to harmonize because recording devices store the (source) data in many different formats. The BIDS specification allows several electrophysiology data formats. For EEG, these include the European Data Format and its extensions (EDF/EDF+/BDF)³⁸, the BrainVision Core Data Format³⁹, and the EEGLAB format. iEEG additionally permits NWB and MEF3 files, with constraints; these support data chunking (NWB and MEF3), lossless compression (NWB and MEF3), and HIPAA-compliant multilayer encryption of sensitive data (MEF3). Lastly, MEG is limited to the CTF, Neuromag, BTi/4D Neuroimaging, KIT/Yokogawa, KRISS, and Chieti file formats. (Note that this is not a strict rule, as some iEEG and MEG files may contain EEG channels.) While the formats differ somewhat across modalities, the overall structure is harmonized so that metadata about channels, electrodes, and events are stored similarly across MEG, EEG, and iEEG.
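The per-modality format rules above can be expressed as a simple lookup. The Python sketch below assumes a simplified table of raw-data extensions (EDF `.edf`, Biosemi BDF `.bdf`, BrainVision `.vhdr`, EEGLAB `.set`, NWB `.nwb`, MEF3 `.mefd`) that reflects our reading of the BIDS specification; it is not an official validator. `check_format` and `ALLOWED_EXTENSIONS` are hypothetical names, MEG is left out because its formats are vendor-specific, and the BIDS Validator should be used for real datasets.

```python
from pathlib import PurePath

# Simplified, assumed mapping of BIDS modality -> accepted raw-data extensions.
# Multi-file formats are matched on their primary file: BrainVision on .vhdr
# (alongside .vmrk and .eeg) and EEGLAB on .set (optionally with .fdt).
ALLOWED_EXTENSIONS = {
    "eeg": {".edf", ".bdf", ".vhdr", ".set"},
    "ieeg": {".edf", ".vhdr", ".set", ".nwb", ".mefd"},
}


def check_format(filename: str, modality: str) -> bool:
    """Return True if the file's extension is listed for the modality.

    Modalities missing from the table (e.g., "meg") return False.
    BIDS filenames are case-sensitive, so the suffix is not lower-cased.
    """
    suffix = PurePath(filename).suffix
    return suffix in ALLOWED_EXTENSIONS.get(modality, set())
```

Under these assumptions, `check_format("sub-01_task-rest_ieeg.nwb", "ieeg")` is accepted while the same NWB file labeled as scalp EEG is not, mirroring the statement that NWB and MEF3 are iEEG-specific additions.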

Neurodata Without Borders (NWB)

The NWB format is a standard that packages neurophysiology data with the metadata necessary for reanalysis. NWB is primarily used for cellular neurophysiology data such as extracellular and intracellular electrophysiology, optical physiology, and behavior (Fig. 2). Several NWB datasets on DANDI and DABI contain iEEG data³², but NWB is not commonly used for EEG or MEG. In contrast to BIDS, which supports storing acquired data in domain-specific formats, NWB requires that the electrophysiology measurements be stored within the NWB file. Although this raises the barrier to data conversion, it provides greater standardization and enables advanced data engineering tools such as data chunking and lossless compression.
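The practical benefit of storing measurements inside the file can be illustrated with chunked lossless compression. NWB files are commonly stored in HDF5, whose chunked datasets can use DEFLATE (gzip) compression; the standard-library Python sketch below only mimics that idea with `zlib` and does not use the NWB or HDF5 APIs. The synthetic signal, the chunk length, and the helper names (`compress_in_chunks`, `read_chunk`) are hypothetical. Each chunk is compressed independently, so a reader can decompress one time window without decoding the whole recording, and decompression returns the original samples exactly.

```python
import array
import zlib


def compress_in_chunks(samples: array.array, chunk_len: int) -> list[bytes]:
    """Split a 1-D int16 signal into fixed-length chunks and DEFLATE-compress each."""
    chunks = []
    for start in range(0, len(samples), chunk_len):
        window = samples[start:start + chunk_len]  # slicing keeps the "h" typecode
        chunks.append(zlib.compress(window.tobytes()))
    return chunks


def read_chunk(chunks: list[bytes], index: int) -> array.array:
    """Decompress only the requested chunk; the other chunks stay untouched."""
    window = array.array("h")  # assumes the chunks hold int16 samples
    window.frombytes(zlib.decompress(chunks[index]))
    return window


# Usage: a synthetic, highly repetitive int16 "recording" of 10,000 samples.
signal = array.array("h", [i % 50 for i in range(10_000)])
chunks = compress_in_chunks(signal, chunk_len=1_000)
assert read_chunk(chunks, 3) == signal[3_000:4_000]  # random access to one window
print(len(chunks), sum(len(c) for c in chunks) < len(signal.tobytes()))  # prints: 10 True
```

Real recordings compress far less than this repetitive toy signal, but the trade-off is the same: smaller chunks allow finer-grained random access at some cost in compression ratio.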

Each repository discussed here approaches data standards differently. Some archives place the burden of file conversions on the data providers. Others take on the task themselves or leave the harmonization protocol to the data users to decide and execute.

DABI

DABI hosts a broad range of multimodal data, with an emphasis on intracranial neurophysiology. Neurological diagnostic test and procedure subtypes, as well as imaging, behavioral, demographic, and clinical variables, are also stored on the platform. Data modalities include iEEG, EEG, electromyography (EMG), single/multi-unit microelectrode recordings, DBS, MRI, fMRI, DWI, positron emission tomography (PET), and computed tomography (CT). DABI accepts multiple data formats (see Table 1 for a comprehensive list) to alleviate the burden of time-consuming file conversions, but it strongly encourages the use of NWB and BIDS standards when possible. The range of accepted formats and modalities is intended to be all-encompassing and extends to scripts written in Python (https://www.python.org), MATLAB (https://www.mathworks.com), and R (https://www.r-project.org). Users may upload data in either raw or processed form.
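Because DABI accepts many formats but encourages NWB and BIDS, a contributor might triage a folder before upload. The Python sketch below is a hypothetical pre-upload check, not a DABI tool: `triage_upload` counts files that are already NWB (`.nwb`), files that sit inside a BIDS-style `sub-<label>/` folder, and everything else. The folder-name heuristic is an assumption and does not validate BIDS compliance.

```python
from collections import Counter
from pathlib import PurePosixPath


def triage_upload(paths: list[str]) -> Counter:
    """Hypothetical pre-upload check: tally files by the standard they appear to follow."""
    counts: Counter = Counter()
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix == ".nwb":
            counts["nwb"] += 1
        elif any(part.startswith("sub-") for part in path.parts[:-1]):
            # Heuristic only: the file lives under a BIDS-style sub-<label>/ folder.
            counts["bids-like"] += 1
        else:
            # Neither NWB nor BIDS-like: a candidate for conversion when feasible.
            counts["other"] += 1
    return counts
```

For instance, a folder holding one BIDS-organized EDF file, one `.nwb` session, a MATLAB `.mat` export, and an analysis script would yield one `nwb`, one `bids-like`, and two `other` entries.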

Table 1 Data Standards and File Formats.
