Multi-channel EEG recording during motor imagery of different joints from the same limb

Motor imagery (MI) is one of the important brain-computer interface (BCI) paradigms, which can be used to control peripherals without external stimulus. Imagining the movements of different joints of the same limb allows intuitive control of the outer devices. In this report, we describe an open access multi-subject dataset for MI of different joints from the same limb. This experiment collected data from twenty-five healthy subjects on three tasks: 1) imagining the movement of right hand, 2) imagining the movement of right elbow, and 3) keeping resting with eyes open, which results in a total of 22,500 trials. The dataset provided includes data of three stages: 1) raw recorded data, 2) pre-processed data after operations such as artifact removal, and 3) trial data that can be directly used for feature extraction and classification. Different researchers can reuse the dataset according to their needs. We expect that this dataset will facilitate the analysis of brain activation patterns of the same limb and the study of decoding techniques for MI.

In this paper, we collected data from twenty-five subjects in the new dataset MI-2, which contains a total of 22,500 (=25 · 900) hand movement MI, elbow movement MI, and resting state trials. The subjects performed specific MI tasks according to the prompts on the screen and their EEG data were recorded at the same time. During the experiment, EMG signals of the limbs were monitored to make sure that the collected EEG data were indeed the result of motor imagery, not motor execution. The public dataset consists of three stages of data, namely the raw data, the pre-processed data, and the trial data that can be directly used for classification, so that different researchers can reuse the dataset according to their needs.
Based on this dataset, we have preliminarily compared the differences in EEG activation patterns between hand and elbow MI, and conducted a three-classification decoding task using the existing baseline and the state-of-the-art methods, proving that the collected data of the three categories are separable 23 . Also, we extracted the correlation coefficient matrices between channels as features representing functional connections, and developed an ensemble Channel-Correlation Network to improve the decoding performance. In this dataset, our proposed deep learning based method obtained a decoding accuracy of 87.03% in the 3-class scenario 23 .
This paradigm of different joints from the same limb can provide intuitive control of outer equipments (e.g. exoskeleton), without the need to increase the cognitive load of the users to establish any artificial associations between imaging movement and neuroprosthetic movement. We expected that this dataset could be used to analysis the brain activation patterns and to design the decoding methods of MI of different joints from the same limb.

Methods
Subjects. All experiments were approved by the ethical committee of Institute of automation, Chinese Academy of Sciences. In this experiment, we collected data from 25 right-handed healthy subjects (19 males, 6 females, aged 19-27) without MI-based BCI experience. All subjects signed the informed consent before the experiment. The names of all participants have been hereby anonymized. The participants are identified only by their aliases "sub-001" through "sub-025". experimental paradigm. The subjects sat in a comfortable chair with their hands naturally on their thighs, keeping their eyes one-meter away from the screen (see Fig. 1(a)). As shown in Fig. 2, each trial (8s) started with a white circle at the center of the monitor for 2s, followed by a red circle as a cue for 1s to remind the subjects of paying attention to the upcoming target. The target prompt ("Hand" or "Elbow") appeared on the screen for 4s. During this period, the subjects were asked to imagine the prompted movement kinesthetically in mind rather than a visual type of imagery. The subjects were instructed to avoid any motion during imagination. The EMG of the right hand and the right forearm of the subjects were monitored (see Fig. 1(b)) to make sure they did not move involuntarily. After the imagination, "Break" appeared for 1s as the end of the entire 8s trial. During the break, the subjects were asked to relax and minimize their eye and muscle movements.
The experiments contained 7 sessions, involving five sessions consisting of 40 trials each for two kinds of motor imagination tasks (20 trials for each movement imagination in one session) and two sessions consisting of 50 trials each for resting state. In order to avoid interference between the imaginary tasks before and after, the order of the MI task indications in a session was disrupted. There were 5 to 10 minutes of rest between sessions. Thus, there are totally 300 trials (100 trials for each type of mental state) per subject in the dataset.  www.nature.com/scientificdata www.nature.com/scientificdata/ Data collection and preprocessing. As shown in Fig. 1(a), EEG data were acquired using a 64-channel gel electrode cap (according to the standard 10/20 System) with a Neuroscan SynAmps2 amplifier (Neuroscan, Inc.). The sampling frequency was 1000 Hz. The left mastoid reference was used for EEG measurement. Electrode impedances were kept below 10 kΩ during the experiment. The band-pass filtering range of the system was 0.5-100 Hz. Besides, an additional 50 Hz notch filter was used for data acquisition.
The pre-processing of the collected data was done using the EEGLAB toolkit (v14.1.1_b) 24 of MATLAB (R2015a) software. We used Common Average Reference (CAR) to spatially filter the data, and performed time-domain filtering on the data from 0 to 40 Hz. The formula of CAR is shown in Eq. (1), where N represents the number of EEG channles. The data were downsampled to 200 Hz to reduce computational cost. A plugin in EEGLAB called Automatic Artifact Removal toolbox (AAR) 25 was used to automatically remove the ocular and muscular artifacts in EEG.

Data records
The data are freely downloaded from the open access repository 26 . The source files and meta-data files in this dataset were organized according to EEG-BIDS 27 , which was an extension to the brain imaging data structure for EEG. The directory tree for our repository and some previews for meta-data are shown in Fig. 3.
On the whole, the repository's data consists of three parts: (1) Preprocessed Data stored in the home folderset/fdt files with meta-data; (2) Raw Recorded Data stored in the sourcedata folder -cnt files; (3) Trial Data stored in the derivatives folder -mat files. Within these directories, the subdirectory corresponding to each subject is named "sub-xxx", where xxx represents the serial number of the subject. raw Data. The acquired raw data for a single task session were saved as .cnt files and organized according to the following naming rules:  where xx stands for the subject number (001, 002, …, 025), yy is the session number (01, 02, …, 19), TASKNAME represents the task type ("motorimagery" or "rest"). After loading the.cnt file through EEGLAB toolbox, you will get a structure variable named "EEG" whose key field information can be seen in Table 1.   www.nature.com/scientificdata www.nature.com/scientificdata/ processed Data. Raw data (.cnt files) from Neuroscan was imported to MATLAB using the EEGLAB toolbox. After the preprocessing operations such as re-referencing and artifact removal were performed, the data of each session were saved as two associated files: FILENAME.set and FILENAME.fdt (.fdt for float data). Note that at this stage the two EMG channels that were not relevant to the EEG data analysis were removed. The name of them are consistent with the raw recorded data except for the suffix: . sub xxx ses yy task TASKNAME eeg set fdt The .set and .fdt files store meta information (e.g. number of channels, sampling rate) and raw data respectively. After loading the .set/.fdt file through EEGLAB toolbox, you will get a structure variable named "EEG" whose key field information can be seen in Table 1.

Trial Data.
To facilitate the decoding of movement intentions, the trial data of each subject were integrated into a single .mat file, which are ready for feature extraction and classification. This folder contains 25 .mat files  www.nature.com/scientificdata www.nature.com/scientificdata/ named in the format sub -xxx_task -motorimagery_eeg.mat, where xxx stands for the subject number (001, 002, …, 025). Each.mat file stores three variables: • task_data: It contains all the task data of a subject with a size of (session number, trial number, channel number, sample points), e.g. (15, 40, 62, 800). • task_label: It contains all the task label ("1" and "2" for MI of hand and elbow, respectively) of a subject with a size of (session number, trial number), e.g. (15,40). • rest_data: It contains a total of 300 trials from 4 rest sessions (75*4) with a size of (trial number, channel number, sample points), e.g. (300, 62, 800). www.nature.com/scientificdata www.nature.com/scientificdata/ technical Validation eMG Validation. During imagination, the subjects were instructed to avoid any motion. The electromyography (EMG) signal snapshot is shown in Fig. 4 (the bottom two channels). When we observed actual hand movements or fluctuations in EMG signals, we would remind the subject and restart the session. In this way, we can ensure that the collected EEG is indeed the result of motor imagery, not motor execution. Due to equipment problems, the EMG signals were only available in the last 15 subjects (from sub-011 to sub-025).
eeG Validation. As shown in Fig. 5, during the experiment, the impedances of electrodes were kept close to or below 5 kilo-ohms to ensure the quality of the connections between the electrodes and the scalp.
Event related desynchronization (ERD, reduction in power) in the alpha and beta band during the period of motor imagery has been extensively documented in the literature 28,29 . This characteristic behaviors is observed in our data. Figure 6 shows the average time-frequency maps of all subjects at electrode positions C3, C4 and Cz during the MI of hand, elbow, and resting state. The maps show clear reduction of power in both alpha (8)(9)(10)(11)(12)(13) and beta (20)(21)(22)(23)(24)(25) bands from 0 to 4s, except for resting state. In order to study the activation of the motor area, we calculated the average energy distribution in the alpha and beta bands, and plotted them into a topology map based on the channel positions, which are depicted in Fig. 7. The maps of the hand and elbow MI show a clear ERD pattern in the contralateral motor areas, while that of the resting state diagram does not.
In addition, we determined a 2Hz-wide frequency band according to the frequency of the lowest energy position in each subject's ERD graph, and plotted the curve of energy with time. As shown in Fig. 8, the power decrease after the onset (0-s on the x-axis) of the motor imagery.
These results indicate that in our experiments, subjects' imagination of the hands and elbows activated their corresponding areas of motion on the opposite side of the brain, which is consistent with previous studies 28 .
In addition to the above qualitative analysis, there is also quantitative analysis. We extracted the features of FBCSP 30 and used the SVM as our classifier to classify the data of each subject. Table 2 shows the classification accuracies of cross-trial under 5-fold cross-validation. The classification results of each subject were above the chance level (33.33%) in the 3-class scenario.

Usage Notes
The raw recorded EEG dataset, the pre-processed dataset, and the classification-ready trial-wise dataset are available from Harvard Dataverse 26 . The data can be analyzed in MATLAB using the EEGLAB toolkit. Detailed tutorials of EEGLAB can be found on their website (https://sccn.ucsd.edu/eeglab/index.php). Although the results may not vary much between versions, we recommend the researchers use EEGLAB with version v14.1.1_b.
We provide some instructions on how to use EEGLAB to load our data.  Table 2. Classification accuracy obtained with FBCSP + SVM method.