“Thinking out loud”: an open-access EEG-based BCI dataset for inner speech recognition

Surface electroencephalography is a standard and noninvasive way to measure electrical brain activity. Recent advances in artificial intelligence have led to significant improvements in the automatic detection of brain patterns, allowing increasingly faster, more reliable and more accessible Brain-Computer Interfaces. Different paradigms have been used to enable human-machine interaction, and the last few years have brought a marked increase in interest in interpreting and characterizing the "inner voice" phenomenon. This paradigm, called inner speech, raises the possibility of executing an order just by thinking about it, allowing a "natural" way of controlling external devices. Unfortunately, the lack of publicly available electroencephalography datasets restricts the development of new techniques for inner speech recognition. A ten-subject dataset acquired under this and two other related paradigms, obtained with an acquisition system of 136 channels, is presented. The main purpose of this work is to provide the scientific community with an open-access multiclass electroencephalography database of inner speech commands that could be used for a better understanding of the related brain mechanisms.


The dataset was designed with two main objectives in mind: decoding and understanding the processes involved in the generation of inner speech, and analyzing its potential use in BCI applications. As described in the "Background & Summary" Section, the generation of inner speech involves interactions among several complex neural networks. With the objective of localizing the main activation sources and analyzing their connections, we asked the participants to perform the experiment under three different conditions: inner speech, pronounced speech and visualized condition.

Inner speech
Inner speech is the main condition in the dataset and is aimed at detecting the brain's electrical activity related to a subject's thought about a particular word. In the inner speech runs, each participant was instructed to imagine his/her own voice repeating the corresponding word until the white circle turned blue. The subject was instructed to stay as still as possible and not to move the mouth or the tongue. For the sake of natural imagination, no rhythm cue was provided.

Pronounced speech
Although motor activity is mainly related to the imagined speech paradigm, inner speech may also show activity in the motor regions. The pronounced speech condition was proposed with the purpose of finding motor regions involved in the pronunciation matching those activated during the inner speech condition. In the pronounced speech runs, each participant was instructed to repeatedly pronounce aloud the word corresponding to each visual cue. As in the inner speech runs, no rhythm cue was provided.

Visualized condition
Since the selected words have a strong visual and spatial component, the visualized condition was proposed with the objective of finding any such activity produced during inner speech. It is timely to mention that the main neural centers related to this spatial thinking are located in the occipital and parietal regions [41]. In the visualized condition runs, the participant was instructed to focus on mentally moving the circle shown in the center of the screen in the direction indicated by the visual cue.

In order to recast the continuous raw data into a more compact dataset and to facilitate their use, a transformation procedure was proposed. Such processing was implemented in Python, mainly using the MNE library [42], and the code along with the raw data are available, so any interested reader can easily change the processing setup as desired (see Code Availability Section).

Raw data loading
A function that rapidly allows loading the raw data corresponding to a particular subject and session was developed. The raw data stored in the .bdf file contain records of the complete EEG and external electrode signals, as well as the tagged events.
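As a minimal sketch of this loading step (the file path below is illustrative; the actual loading helper is provided with the published code), the raw recording of one subject and session can be opened with MNE's BDF reader:

```python
import mne

# Illustrative path; the actual folder layout is described in the Data records section.
fname = "sub-01/ses-01/eeg.bdf"

# BioSemi .bdf recordings are read with MNE's dedicated reader; events are
# encoded in the 'Status' trigger channel.
raw = mne.io.read_raw_bdf(fname, preload=True)
print(raw.info)
```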

Events checking and correction
The first step of the signal processing procedure was checking for the correct tagging of events in the signals. Missing tags were detected and a correction method was proposed. The method detects and completes the sequences of events. After the correction, no tags were missing and all the events matched those sent from PC1.

Re-referencing
A re-referencing step of the data to channels EXG1 and EXG2 was applied. This eliminates both noise and data drift, and it was applied using the specific MNE re-reference function.
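A minimal sketch of this step, assuming the raw object loaded above and that the external channels EXG1 and EXG2 are typed as EEG channels:

```python
# Re-reference all EEG channels to the average of the two external
# electrodes EXG1 and EXG2.
raw.set_eeg_reference(ref_channels=["EXG1", "EXG2"])
```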

Digital filtering
The data were filtered with a zero-phase bandpass finite impulse response filter using the corresponding MNE function. The lower and upper cutoff frequencies were set to 0.5 and 100 Hz, respectively. This broadband filter aims to keep the data as raw as possible, giving future users the possibility of filtering the data in their desired bands. A notch filter at 50 Hz was also applied.
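Both operations map directly onto MNE Raw methods; a sketch consistent with the parameters above:

```python
# Zero-phase FIR bandpass filter between 0.5 and 100 Hz.
raw.filter(l_freq=0.5, h_freq=100.0, method="fir", phase="zero")

# Notch filter at the 50 Hz power-line frequency.
raw.notch_filter(freqs=50.0)
```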

Epoching and decimation
The data were decimated by a factor of four, obtaining a final sampling rate of 256 Hz. Then, the continuous recorded data were segmented into epochs, each one corresponding to a single trial.
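A sketch of how epoching and decimation could be reproduced with MNE, assuming a BioSemi acquisition rate of 1024 Hz (so that decim=4 yields 256 Hz) and the class cue IDs 31-34 described in the Events data section; the trial limits below are placeholders, as the exact limits are defined in the published code:

```python
import mne

# Corrected events: (n_events, 3) matrix of [sample, trigger, event_id].
events = mne.find_events(raw, stim_channel="Status")

# Class cues (see Events data): 31 = up, 32 = down, 33 = right, 34 = left.
event_id = {"up": 31, "down": 32, "right": 33, "left": 34}

# decim=4 reduces a 1024 Hz recording to 256 Hz while epoching.
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=0.0, tmax=4.0,  # placeholder trial limits
                    baseline=None, decim=4, preload=True)
```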

Independent Component Analysis (ICA) is a standard and widely used blind source separation method for removing artifacts from EEG signals [43-45]. For our dataset, ICA processing was performed only on the EEG channels, using the MNE implementation of the infomax ICA [46]. No Principal Component Analysis (PCA) was applied and 128 sources were captured. Correlation with the EXG channels was used to determine the sources related to blink, gaze and mouth movement, which were neglected in the process of reconstructing the EEG signals, in order to obtain the final dataset.
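A sketch of this artifact-removal step using MNE's ICA implementation; the use of find_bads_eog with the EXG channels as generic artifact references is an assumption made for illustration (the published code defines the actual correlation procedure):

```python
from mne.preprocessing import ICA

# Infomax ICA on the 128 EEG channels, with no PCA dimensionality reduction.
ica = ICA(n_components=128, method="infomax", random_state=0)
ica.fit(raw, picks="eeg")

# Correlate sources with external channels to flag blink-, gaze- and
# mouth-related components; the EXG channels used here are illustrative.
exclude = []
for exg in ["EXG1", "EXG2", "EXG3", "EXG4"]:
    bads, _scores = ica.find_bads_eog(raw, ch_name=exg)
    exclude.extend(bads)
ica.exclude = sorted(set(exclude))

# Reconstruct the EEG signals without the artifactual sources.
ica.apply(raw)
```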

EMG control
The EMG control aims to determine whether a participant moved his/her mouth during the inner speech or visualized condition runs. The simplest approach to finding EMG activity is the single threshold method [47]. The baseline period was used as a measure of basal activity. The signals coming from the EXG7 and EXG8 channels were rectified and bandpass filtered between 1 and 20 Hz [48-50].
For each window, the signal power was computed as

$$P = \frac{1}{S - s} \sum_{n=s}^{S} x^{2}[n],$$

where $x[\cdot]$ denotes the signal being considered, and $s$, $S$ are the initial and final samples of the window, respectively. For every window of the baseline period, the computed powers were stacked, and their mean and standard deviation were calculated and used to construct a decision threshold. The procedure was repeated for both channels, and the mean power in the action interval of every trial was calculated. Then, if one of those values, for either the EXG7 or EXG8 channel, was above the threshold, the corresponding trial was tagged as "contaminated".
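A minimal numpy sketch of the single threshold control, assuming the baseline and action-interval signals of one EXG channel as 1-D float arrays; the multiplier k in the decision threshold is illustrative (the actual constant is defined in the published script):

```python
import numpy as np
from mne.filter import filter_data

def window_powers(x, win):
    """Mean power of consecutive non-overlapping windows of length win."""
    n = len(x) // win
    return np.array([np.mean(x[i * win:(i + 1) * win] ** 2) for i in range(n)])

def emg_contaminated(baseline, action, sfreq, win, k=3.0):
    # Rectify, then bandpass filter between 1 and 20 Hz, as described above.
    baseline = filter_data(np.abs(baseline), sfreq, 1.0, 20.0)
    action = filter_data(np.abs(action), sfreq, 1.0, 20.0)

    # Decision threshold from the baseline window powers; k is illustrative.
    p = window_powers(baseline, win)
    threshold = p.mean() + k * p.std()

    # The trial is tagged if its mean action-interval power exceeds the threshold.
    return np.mean(action ** 2) > threshold
```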

A total of 115 trials were tagged as contaminated, which represents 2.5% of the inspected trials. The number of tagged trials is shown in Table 1. The tagged trials and their mean power corresponding to EXG7 and EXG8 were also stored in a report file. In order to allow reproduction of the decision threshold, the mean and standard deviation of the baseline power for the corresponding session were also stored in the same report file.

The developed script performing the control is publicly available, and interested readers can use it to conduct different analyses with the single threshold method.

After session 1, subject sub-03 claimed that, due to a misinterpretation, he/she performed only one inner speech run and three visualized condition runs. The condition tag was appropriately corrected.

Data records
The main folder is composed of ten subfolders containing the session raw data, each one corresponding to a different subject. There is an additional folder containing five files obtained after the proposed processing: EEG data, Baseline data, External electrodes data,

Events data and a Report file. We now proceed to describe the contents of each one of these five files along with the raw data.

Raw data
The correspondence between events and their IDs is shown in Table 2. A spurious event of unknown origin, with ID 65536, appeared at the beginning of the recording and also appeared randomly within some sessions. This event has no correlation with any sent tag and it was removed in the "Events Check" step of the processing. The raw events are stored in a three-column matrix, where the first column contains the time stamp information, the second has the trigger information, and the third column contains the event ID.
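As a sketch, the raw events can be extracted and the spurious ID removed with MNE ('Status' is the standard BioSemi trigger channel name):

```python
import mne

# (n_events, 3) matrix: [sample, trigger, event_id].
events = mne.find_events(raw, stim_channel="Status")

# Drop the spurious event of unknown origin with ID 65536.
events = events[events[:, 2] != 65536]
```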

EEG data
Each EEG data file, stored in .fif format, contains the acquired data for each subject and session, after the processing described above. Its structure is depicted in Table 4.

External electrodes data
Each one of the EXG data files contains the data acquired by the external electrodes after the described processing was applied,

with the exception of the ICA processing. They were saved in .fif format.

Events data
Each event data file (in .dat format) contains a four-column matrix where each row corresponds to one trial. The first two columns were obtained from the raw events, by deleting the trigger column (second column of the raw events) and renumbering the classes 31, 32, 33, 34 as 0, 1, 2, 3, respectively. Finally, the last two columns correspond to the condition and session number, respectively. Thus, the resulting final structure of the events data file is as depicted in Table 5.

Baseline data
Upon visual inspection, it was observed that the recorded baselines of subject sub-03 in session 3 and subject sub-08 in session 2 were highly contaminated.
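Returning to the events data files, a numpy sketch of how the described four-column trial matrix could be built from the raw events (condition and session codes as in Table 5):

```python
import numpy as np

def build_trial_events(raw_events, condition, session):
    """raw_events: (n, 3) [sample, trigger, event_id]; keeps class cues only."""
    class_map = {31: 0, 32: 1, 33: 2, 34: 3}  # up, down, right, left
    rows = [[sample, class_map[eid], condition, session]
            for sample, _trig, eid in raw_events if eid in class_map]
    return np.array(rows)

# Example: inner speech trials (condition 1) recorded in session 2.
trial_events = build_trial_events(events, condition=1, session=2)
```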

Report file
The report file (in .pkl format) contains general information about the participant and the particular results of the session processing. Its structure is depicted in Table 3.

Attention monitoring
During the sessions, questions were presented to the participants in order to monitor their concentration on the requested activity. The results of the evaluation showed that participants correctly followed the task, as they made very few mistakes (Table 6; mean ± std = 0.5 ± 0.62). Subjects sub-01 and sub-10 claimed that they had accidentally pressed the keyboard while answering the first two questions in session 1. Also, after the first session, subject sub-01 indicated that he/she felt that the questions were too many; for this reason, the number of questions was reduced for the subsequent participants, in order to prevent them from getting tired.
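The report file described above can be read with Python's standard pickle module; a minimal sketch (the file name is illustrative, and a dict-like structure is assumed):

```python
import pickle

# Illustrative report file name; see the Data records description.
with open("sub-01_ses-01_report.pkl", "rb") as f:
    report = pickle.load(f)

print(report.keys())  # participant info and session processing results
```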

Event-Related Potentials
It is well known that Event-Related Potentials (ERPs) are manifestations of typical brain activity produced in response to certain stimuli. As different visual cues were presented during our stimulation protocol, we expected to find brain activity modulated by those cues. Moreover, we expected this activity to have no correlation with the condition nor with the class, and to be found across all subjects. In order to show the existence of ERPs, an average over all subjects at each instant of time was computed using all the available trials (N_ave = 5640), for each one of the 128 channels.

The complete time window average, with marks for each described event, is shown in Figure 5. As shown in Figure 5-C, a pronounced negative potential followed. It is reasonable to believe that this negative potential is the so-called "Contingent Negative Variation" ERP, which is typically related to "warning-go" stimuli [56]. The signal appears to be mostly stable for the rest of the action interval.
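A sketch of this grand average with MNE, assuming one Epochs object per subject loaded from the published EEG data files:

```python
import mne

# epochs_list: one mne.Epochs per subject (assumed loaded beforehand).
evokeds = [ep.average() for ep in epochs_list]  # average over trials

# Grand average across the ten subjects, for all 128 channels.
grand_avg = mne.grand_average(evokeds)
grand_avg.plot()  # time course with one trace per channel
```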

Time-Frequency Representation
In order to detect regions where neural activity between conditions is markedly different, the power difference in the main frequency bands between each pair of conditions was computed. As in the Averaged Power Spectral Density section, the time window used was 1.5-3.5 s. The Power Spectral Density (PSD) was added to the analysis to further explore regions of interest.

Shaded areas on the PSD graphics in Figure 9 correspond to ±1 std across the different channels used. No shaded area is shown when only one channel was used to compute the PSD.

The top row of Figure 9 shows a comparison between inner and pronounced speech. In the alpha band, greater inner speech activity can be clearly seen in the central occipital/parietal region. The PSD was calculated using channels A4, A5, A19, A20 and A32, and shows a difference of approximately 1 dB at 11 Hz. On the other hand, in the beta band, the spatial distribution of the power differences shows increased temporal activity for the pronounced condition, consistent with muscular activity artifacts. Here, the PSD was calculated using channels B16, B22, B24 and B29 for the right PSD plot and channels D10, D19, D21 and D26 for the left PSD plot. Pronounced speech shows higher power in the whole beta band, with a more prominent difference in the central left area.
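A sketch of such a PSD comparison, assuming condition-specific Epochs objects and the alpha-band channel picks named above (recent MNE versions expose Welch PSDs through compute_psd; the dB conversion is illustrative):

```python
import numpy as np

picks = ["A4", "A5", "A19", "A20", "A32"]  # alpha-band comparison channels

def mean_psd_db(epochs):
    # Welch PSD over the 1.5-3.5 s action window, restricted to the picks.
    spectrum = epochs.compute_psd(method="welch", fmin=1.0, fmax=40.0,
                                  tmin=1.5, tmax=3.5, picks=picks)
    psds, freqs = spectrum.get_data(return_freqs=True)
    return freqs, 10 * np.log10(psds).mean(axis=(0, 1))  # over trials, channels

freqs, inner_db = mean_psd_db(epochs_inner)        # inner speech trials (assumed)
_, pronounced_db = mean_psd_db(epochs_pronounced)  # pronounced trials (assumed)
diff_db = inner_db - pronounced_db  # ≈ 1 dB around 11 Hz per the text
```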

The middle row of Figure 9 shows a comparison of the pronounced speech against the visualized condition. In the alpha band, the visualized condition presents a larger difference in the central parietal regions and a more subtle difference in the lateral occipital regions. The PSD was calculated using channels A17, A20, A21, A22 and A30. Here again, a difference of about 1 dB at 11 Hz can be observed. In the beta band, an intense activity in the central lateral regions appears for the pronounced condition. For this band, the PSD was calculated using the same channels as in the comparison between inner and pronounced speech for the beta band. As seen, the power for pronounced speech is higher than for the visualized condition in the whole band.

The bottom row of Figure 9 compares the inner speech and visualized conditions. In the alpha band, the PSDs were computed using channels A10 and B7 for the left and right plots, respectively. In both plots, the peak corresponding to the inner speech condition is markedly higher than the one corresponding to the visualized condition. For the beta band, the PSD was calculated using channels A13 and A26 for the left and right PSD plots, respectively. As can be observed, the power for the visualized condition in the whole beta band is higher than the inner speech power.

Figure 1. Experiment setup. Both computers, PC1 and PC2, were located outside the acquisition room. PC1 runs the stimulation protocol while communicating to PC2 every cue displayed. PC2 received the sampled EEG data from the acquisition system and tagged the events with the information received from PC1. At the end of the recording, a .bdf file was created and saved.

[Table: number of trials per class (four columns per condition) for each condition and subject.]

            Pronounced speech       Inner speech            Visualized condition
sub-01       25   25   25   25     50   50   50   50      50   50   50   50
sub-02       30   30   30   30     60   60   60   60      60   60   60   60
sub-03       25   25   25   25     45   45   45   45      55   55   55   55
sub-04       30   30   30   30     60   60   60   60      60   60   60   60
sub-05       30   30   30   30     60   60   60   60      60   60   60   60
sub-06       27   27   27   27     54   54   54   54      54   54   54   54
sub-07       30   30   30   30     60   60   60   60      60   60   60   60
sub-08       25   25   25   25     50   50   50   50      50   50   50   50
sub-09       30   30   30   30     60   60   60   60      60   60   60   60
sub-10       30   30   30   30     60   60   60   60      60   60   60   60
Sub Total   282  282  282  282    559  559  559  559     569  569  569  569
Total             1128                  2236                    2276

Table 5. Events data format and tag meaning.

Sample                                  Trial's class            Trial's condition          Trial's session
Sample at which the event occurred      0 = "Arriba" (up)        0 = Pronounced speech      1 = session 1
(numbered starting at n = 0,            1 = "Abajo" (down)       1 = Inner speech           2 = session 2
corresponding to the beginning          2 = "Derecha" (right)    2 = Visualized condition   3 = session 3
of the recording)                       3 = "Izquierda" (left)

[Figure caption fragment: relative and global time were plotted above and below the arrow, respectively.]

Table 6. Results of attention monitoring. Note that the maximum number of incorrect answers is 2. The large variability in the number of questions in session 3 is due to the different number of trials for each one of the participants.