Event-related potential data from a guess the number brain-computer interface experiment on school children

Guess the number is a simple P300-based brain-computer interface experiment. The measured participant is asked to pick a number between 1 and 9 and is then exposed to the corresponding visual stimuli, while the experimenters try to guess the chosen number by observing the event-related potential waveforms on-line. A total of 250 school-age children participated in the experiments, which were carried out in elementary and secondary schools in the Czech Republic. Electroencephalographic data from three EEG channels (Fz, Cz, Pz) and stimulus markers were stored. Additional metadata about the participants were collected (gender, age, laterality, the number thought by the participant, the experimenters' guesses, and various other interesting information). Consequently, we offer the largest publicly available collection of odd-ball paradigm datasets to the neuroscientific and brain-computer interface communities.

There were usually three experimenters present. The first experimenter, a health-care professional trained in EEG, was responsible for preparing the participant for the experiment (applying the EEG cap and electrodes) and for explaining to the participant the goal of the experiment and the rules of behavior to be followed during it. This experimenter was also responsible for removing the cap and electrodes after the experiment ended. The second experimenter was mainly responsible for the correct functioning of the hardware and software infrastructure. The third experimenter spent most of the time explaining the nature of the experiment to the participant and to other onlookers. All experimenters usually took part in the main task: guessing the number.

Data acquisition
Before the experiment itself, the participants were informed about its goal, its course, and the equipment used. Each participant was familiarized with basic rules of behavior and asked to sit comfortably, pay attention to the stimulation, not move, and limit eye blinking. To increase alertness, the participants were instructed to silently count the total number of target stimuli presented on the monitor. During the experiment, the participants sat approximately 1.5 m in front of the monitor for as long as needed (approximately 10 min on average). Other children observing the experiment were asked not to enter the participant's field of view and not to disturb him/her in any other way.
Then the participant was technically prepared for the experiment: an EEG cap was selected according to the size of the participant's head, the reference electrode was placed on the bridge of the nose, and the ground electrode was placed on the ear. An EOG electrode for monitoring eye movements was placed under the participant's eye. The reference, ground, and EOG electrodes as well as the EEG cap were connected to the V-Amp amplifier. The impedances of all electrodes were checked and corrected if necessary. When the participant assured the experimenters that he/she had understood all the circumstances of the experiment and had selected a target number to concentrate on, the experiment was launched.
During the experiment, the experimenters regularly checked whether the participant was following the rules. If the signal was corrupted by eye blinks or other movement artifacts, the participant was asked to reduce these movements. Nevertheless, in several cases the experiment was terminated prematurely because of a large number of artifacts or the participant's discomfort (nausea, headache). Normally, the experiment was stopped when the experimenters decided to guess the number or concluded that they had no chance of guessing it from the signal. If their guess was wrong, they usually asked the participant to continue the experiment and made a second or even a third guess. After the experiment, the experimenters showed the participant his/her results, including the averaged P300 waveforms, and explained them, always adjusting the explanation to the participant's age.
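The guess relies on the P300 being larger after the number the participant is thinking of than after the other numbers. The sketch below shows how such a guess could be automated. It is not the experimenters' documented procedure. It assumes that epochs from a single channel (for example Pz) have already been cut so that each one starts at stimulus onset, and that they are grouped by the displayed number. The 250 to 500 ms window is a common choice for the P300 but is an assumption here, and the function names are hypothetical.

```python
from statistics import mean

def average_epochs(epochs):
    """Point-wise average of equal-length epochs (lists of samples)."""
    return [mean(samples) for samples in zip(*epochs)]

def guess_number(epochs_by_number, fs_hz, window_s=(0.25, 0.5)):
    """Pick the number whose averaged epoch has the largest mean amplitude
    inside the assumed P300 window (seconds after stimulus onset).

    epochs_by_number: {number: [epoch, ...]}, each epoch starting at onset.
    Returns (guessed_number, {number: mean window amplitude}).
    """
    start, stop = (int(round(t * fs_hz)) for t in window_s)
    scores = {}
    for number, epochs in epochs_by_number.items():
        average = average_epochs(epochs)
        scores[number] = mean(average[start:stop])
    return max(scores, key=scores.get), scores
```

A second or third guess, as described above, would simply take the numbers with the next-highest scores, for example sorted(scores, key=scores.get, reverse=True)[1].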

Data Records
Data storage
The EEG/ERP Portal (EEGBase) (Data Citation 1) was used to store the experimental data and metadata. It is a web application that serves not only for long-term storage of EEG/ERP experiments but also for their annotation, management, and sharing. The stored data are protected by a system of user accounts and defined user roles (Reader, Experimenter, Group Administrator, and Supervisor). Individual users are organized into self-managed groups. Users must create a personal account before uploading or downloading any experiment. Metadata are stored using metadata templates that reflect the odML terminologies (ref. 7).
Although the EEG/ERP Portal has been developed and optimized as a repository for human EEG data, it does not provide direct and permanent download links for individual datasets, nor does it currently support DOI citations. Therefore, each dataset stored in the EEG/ERP Portal is mirrored in the Harvard Dataverse (Data Citations 2-251).

Data organization
The data and metadata from 250 participants are stored in the EEG/ERP Portal and can be downloaded as the 'PROJECT DAYS P3 NUMBERS' zip package (the procedure for obtaining this package is described in the Usage Notes section). Each dataset has its own folder, which is internally organized as follows:
(1) The Scenario folder contains the experimental protocol (the files generated by the Presentation software).
(2) The Data folder contains the experimental data and metadata stored in the BrainVision format (.eeg, .vhdr, and .vmrk files) and the basic experimental metadata (.txt file):
(i) P3Numbers_yyyymmdd_gender_age_id.eeg is a binary file containing the raw EEG data,
(ii) P3Numbers_yyyymmdd_gender_age_id.vhdr is a text file containing metadata that describe the raw EEG data stored in the corresponding .eeg file,
(iii) P3Numbers_yyyymmdd_gender_age_id.vmrk is a text file containing the stimulus markers used in the experiment,
(iv) P3Numbers_yyyymmdd_gender_age_id.txt is a text file containing basic experimental metadata: gender, age, the number thought, the first, second, and third guessed numbers, laterality, and any other interesting information collected on site (the field named 'other'). These metadata are provided separately because, at the time they were collected and stored, they did not fully fit the allowable content of the EEG/ERP Portal metadata templates.
(3) The License folder contains the license agreement (Creative Commons Attribution-NonCommercial 4.0).
(4) The experimental metadata file (metadata.xml) contains a set of metadata (such as the hardware and software used) describing the experimental conditions. It is stored in the root folder of each dataset and structured according to the portal metadata template used for data storage. It reflects the EEG/ERP Portal terminology restrictions that applied to metadata content when the metadata were collected and stored. These restrictions no longer apply; currently, all experimental metadata can be stored in a single file.
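The naming scheme of the files in the Data folder can be decoded programmatically. The sketch below makes three assumptions. First, the gender and age fields contain no underscores. Second, any remaining underscores belong to the id. Third, gender and age are kept as strings, because their exact encoding is not specified here. The example file name used for testing is hypothetical, as are the function and pattern names.

```python
import re
from datetime import datetime

# Assumed pattern: P3Numbers_<yyyymmdd>_<gender>_<age>_<id>.<extension>
# (gender and age are assumed to contain no underscores).
FILENAME_RE = re.compile(
    r"^P3Numbers_(?P<date>\d{8})_(?P<gender>[^_]+)_(?P<age>[^_]+)_(?P<id>.+)"
    r"\.(?P<ext>eeg|vhdr|vmrk|txt)$"
)

def parse_recording_name(filename):
    """Split a Data-folder file name into date, gender, age, id, extension."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a P3Numbers data file name: {filename!r}")
    return {
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "gender": match["gender"],  # kept as a string: encoding not specified
        "age": match["age"],        # kept as a string: format not specified
        "id": match["id"],
        "ext": match["ext"],
    }
```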
Although all the described files are organized in a hierarchical folder structure within the .zip package downloaded from the EEG/ERP Portal, this structure is not preserved in the replicated data in the Harvard Dataverse repository (Data Citations 2-251), where the files are stored in a flat structure.
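Because the Dataverse mirror stores each dataset's files in a flat structure, a user may want to restore the portal's folder layout (Scenario, Data, and License folders, with metadata.xml in the root). The naming rules in this sketch are assumptions, not documented conventions. It assumes that the license file name contains "license" and that the basic metadata .txt file starts with "P3Numbers_". Every remaining file is assumed to belong to the Scenario folder. The function names and test file names are hypothetical.

```python
from pathlib import PurePosixPath

BV_EXTS = {".eeg", ".vhdr", ".vmrk"}

def portal_folder(filename):
    """Guess the portal sub-folder of a file from the flat Dataverse listing.

    The naming rules below are assumptions, not documented conventions.
    """
    path = PurePosixPath(filename)
    name, ext = path.name.lower(), path.suffix.lower()
    if name == "metadata.xml":
        return ""  # stored in the root folder of the dataset
    if "license" in name:
        return "License"
    if ext in BV_EXTS or (ext == ".txt" and name.startswith("p3numbers_")):
        return "Data"
    return "Scenario"  # assumption: everything else is a Presentation file

def rebuild_layout(filenames):
    """Map each flat file name to its relative path in the portal layout."""
    return {f: str(PurePosixPath(portal_folder(f), f)) for f in filenames}
```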

Technical Validation
All data were stored in raw form: the preprocessing methods (filtering, baseline correction, artifact rejection) applied during the experimental sessions to visualize and analyze the signals were not applied to the stored data. The quality of the datasets varies because all measurements were performed outside the laboratory. The EEG signals of most datasets show a declining trend. We therefore tested the hardware amplifier for possible defects and took measures to eliminate sources of interference in the classrooms as far as possible. We believe that the declining trend was caused by external interference that we could not influence (see Fig. 2). This issue can easily be handled by high-pass filtering (e.g., with a cutoff frequency of 0.5 Hz). Most data were first stored on a laptop running on battery power. However, when the battery ran low (only on days when many experiments were carried out), it was necessary to switch to mains power. These data contain 50 Hz interference, which can also be removed by filtering.
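The drift and mains interference described above can be removed with standard filters. The pure-Python sketch below applies two filters: a second-order Butterworth high-pass at 0.5 Hz and a 50 Hz notch. Both are implemented as biquads with coefficients from the widely used "Audio EQ Cookbook" formulas. Each filter is run forward and backward, so the result has no phase shift. The sampling rate must be taken from the dataset's .vhdr header. The notch quality factor (30) and all function names are assumptions. For real analyses, a tested implementation (for example EEGLAB's filters or scipy.signal) is preferable.

```python
import math

def biquad(b, a, x):
    """Direct-form-I biquad filter applied to the sequence x."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def zero_phase(b, a, x):
    """Filter forward, then backward (no edge padding) to cancel phase shift."""
    return biquad(b, a, biquad(b, a, x)[::-1])[::-1]

def highpass_coeffs(fs, f0=0.5, q=1 / math.sqrt(2)):
    """Second-order high-pass; q = 1/sqrt(2) gives a Butterworth response."""
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    return [(1 + c) / 2, -(1 + c), (1 + c) / 2], [1 + alpha, -2 * c, 1 - alpha]

def notch_coeffs(fs, f0=50.0, q=30.0):
    """Second-order notch centred on the mains frequency f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha, c = math.sin(w0) / (2 * q), math.cos(w0)
    return [1.0, -2 * c, 1.0], [1 + alpha, -2 * c, 1 - alpha]

def clean_channel(x, fs):
    """Remove slow drift (0.5 Hz high-pass) and 50 Hz mains interference."""
    drift_free = zero_phase(*highpass_coeffs(fs), x)
    return zero_phase(*notch_coeffs(fs), drift_free)
```

Because no edge padding is used, the first and last few seconds of each recording contain filter transients and are best discarded or padded beforehand.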
The technical validation of each dataset was performed separately. Two parameters were considered:
• The rate of eye-blink artifacts in ERP epochs. Eye blinks severely distort the EEG signal and reduce the usability of datasets. The percentage of epochs damaged by eye blinks was calculated for each experiment separately using the combination method described in ref. 8. This method iterates over all baseline-corrected ERP epochs and detects eye blinks using a combined threshold that takes into account the maximum absolute amplitude and the correlation with a sample eye blink. The results for each dataset are available in Table 1 (available online only).
• Whether the number thought by the participant was correctly guessed by the experimenters. Experiments with successful guesses are typically associated with a larger amplitude of the P300 component. The results for each dataset are available in Table 1 (available online only).
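The combination method of ref. 8 is not reproduced here. As a hedged illustration of the idea, the sketch below first baseline-corrects each epoch. It then flags the epoch when its maximum absolute amplitude exceeds a threshold and its Pearson correlation with a sample eye-blink waveform is also high. The combination rule (logical AND), the thresholds (100 µV, 0.7), the baseline length, and the function names are hypothetical. It also assumes that each epoch starts with n_baseline pre-stimulus samples and has the same length as the template.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [v - base for v in epoch]

def blink_rate(epochs, template, n_baseline, amp_limit=100.0, min_corr=0.7):
    """Percentage of epochs flagged as eye blinks.

    Hypothetical combined rule: an epoch is flagged when its maximum absolute
    amplitude exceeds amp_limit (e.g. in microvolts) AND its correlation with
    the sample blink template exceeds min_corr. Thresholds are placeholders.
    """
    flagged = 0
    for epoch in epochs:
        corrected = baseline_correct(epoch, n_baseline)
        too_large = max(abs(v) for v in corrected) > amp_limit
        blink_like = pearson(corrected, template) > min_corr
        if too_large and blink_like:
            flagged += 1
    return 100.0 * flagged / len(epochs)
```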

Usage Notes
The experimental data and metadata can be downloaded from the EEG/ERP Portal (Data Citation 1) using the following procedure. Users must register first. When the registration form is completed, a confirmation e-mail is sent to the user, who must then click on the confirmation link it contains. After a successful login, a personalized homepage with an overview of the user's experiments, scenarios, research group memberships, etc. is displayed. To see the publicly offered experiments and find the package named 'PROJECT DAYS P3 NUMBERS', the user selects the Experiments section from the main menu at the top of the homepage. When the Experiments section is loaded, the user selects the package 'PROJECT DAYS P3 NUMBERS', chooses the license under which he/she wants to use the data (Creative Commons BY-NC), and clicks on the 'Add to cart' link (see Fig. 3). When the package has been added to the cart, the user clicks on the 'My cart' link at the top of the page, and the content of the cart is shown (Fig. 4). The experiments in the 'PROJECT DAYS P3 NUMBERS' package are available under the selected license. When the user completes the order (by clicking on the 'Create order' button), the package becomes formally available for download (via the 'Download' link). The user then confirms the selection of experiments within the package and clicks on the 'Create package' button to create a .zip package (PROJECT_DAYS_P3_NUMBERS.zip). Because the data are large, a progress bar indicates how much of the package has already been created. When the package has been created, it can finally be downloaded by clicking on the 'Download' link.
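After PROJECT_DAYS_P3_NUMBERS.zip has been downloaded, its recordings can be located without relying on the exact depth of the folder hierarchy. In the sketch below, each .vhdr header is paired with the other files stored in the same folder. It uses Python's zipfile module. The function names are hypothetical, and the default path simply reuses the package name given above.

```python
import zipfile
from pathlib import PurePosixPath

def group_by_recording(names):
    """Map every .vhdr header to the other files stored in the same folder."""
    files = [n for n in names if not n.endswith("/")]  # skip directory entries
    recordings = {}
    for name in files:
        header = PurePosixPath(name)
        if header.suffix.lower() == ".vhdr":
            recordings[name] = sorted(
                n for n in files
                if PurePosixPath(n).parent == header.parent and n != name
            )
    return recordings

def recordings_in_package(zip_path="PROJECT_DAYS_P3_NUMBERS.zip"):
    """List the recordings in the downloaded package without extracting it."""
    with zipfile.ZipFile(zip_path) as package:
        return group_by_recording(package.namelist())
```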
The ordered (purchased) package can be re-downloaded at any time from the Experiments section by clicking on the 'Download' link, which appears in place of the 'Add to cart' link for the package.
Since the data were stored in the BrainVision (BV) format (ref. 9), appropriate software tools have to be used to read and further process them. EEGLAB, an open-source MATLAB toolbox for EEG signal processing (ref. 10), is one of the preferred options. The BVA-io plugin (available at http://sccn.ucsd.edu/wiki/EEGLAB_Extensions_and_plug-ins) must be downloaded to easily import data stored in the BV format into EEGLAB. Another option is the EEGLoader library (available at https://github.com/stebjan/eegloader), which provides a simple interface for reading the BV format.
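For readers working in Python, the BV files can also be read with the standard library alone. The sketch below is neither EEGLoader nor the BVA-io plugin. It is a minimal reader with the following limitations:

• It handles only binary, multiplexed data in INT_16 or IEEE_FLOAT_32 format.
• It assumes little-endian byte order.
• It ignores escaped commas in channel names.
• It converts the 1-based marker positions of the .vmrk file to 0-based sample indices.

The function names are hypothetical. Check the actual .vhdr file of a dataset before relying on this code.

```python
import struct

def parse_vhdr(text):
    """Parse a BrainVision .vhdr or .vmrk file into {section: {key: value}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections

# Struct codes for the binary formats handled by this sketch.
BINARY_FORMATS = {"INT_16": ("h", 2), "IEEE_FLOAT_32": ("f", 4)}

def read_eeg(raw_bytes, header):
    """Decode multiplexed little-endian samples into {channel name: [values]}."""
    common = header["Common Infos"]
    if common.get("DataOrientation", "MULTIPLEXED").upper() != "MULTIPLEXED":
        raise NotImplementedError("only MULTIPLEXED data are handled here")
    n_channels = int(common["NumberOfChannels"])
    code, size = BINARY_FORMATS[header["Binary Infos"]["BinaryFormat"]]
    n_values = len(raw_bytes) // size
    values = struct.unpack(f"<{n_values}{code}", raw_bytes[: n_values * size])
    channels = {}
    for i in range(1, n_channels + 1):
        # ChN=<name>,<reference>,<resolution>,<unit>; empty resolution means 1
        name, _ref, resolution = header["Channel Infos"][f"Ch{i}"].split(",")[:3]
        scale = float(resolution) if resolution else 1.0
        channels[name] = [v * scale for v in values[i - 1 :: n_channels]]
    fs_hz = 1e6 / float(common["SamplingInterval"])  # interval in microseconds
    return fs_hz, channels

def parse_markers(vmrk_text):
    """Return (type, description, 0-based sample index) for every marker."""
    markers = []
    for value in parse_vhdr(vmrk_text).get("Marker Infos", {}).values():
        fields = value.split(",")
        markers.append((fields[0], fields[1], int(fields[2]) - 1))
    return markers
```

In practice, the .vhdr and .vmrk files are read as text and the .eeg file in binary mode, and the results are passed to parse_vhdr, read_eeg, and parse_markers. Older headers may use a code page other than UTF-8.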