A resource for assessing dynamic binary choices in the adult brain using EEG and mouse-tracking

We present a dataset combining high-density Electroencephalography (HD-EEG, 128-channels) and mouse-tracking intended as a resource for examining the dynamic decision process of semantics and preference choices in the human brain. The dataset includes resting-state and task-related (food preference choices and semantic judgments) EEG acquired from 31 individuals (ages: 18–33). Along with the dataset, we also provided the preliminary microstate analysis of resting-state EEG and the ERPs, topomap, and time-frequency maps of the task-related EEG. We believe that the simultaneous mouse-tracking and EEG recording would crack the core components of binary choices and further index the temporal dynamics of decision making and response hesitation. This publicly available dataset could support the development of neural signal processing methods in motor EEG, thus advancing research in both the decision neuroscience and brain-computer interface (BCI) applications.

www.nature.com/scientificdata www.nature.com/scientificdata/ with EEG, both task-related motor responses and mental processing could be reflected in the behavioral and brain patterns. These open data can be used for classifying motor noise and EEG signals related to decision processes. We have preliminarily shown the EEG patterns when people make decisions in image/text-based semantic judgment task, food preference task, and resting-state based on this dataset. This data from different decision tasks can further provide brain data in computer cursor controlling or identifying temporal dynamic patterns in responses or choices.
We anticipate that the current work could encourage decision neuroscience, cognitive neuroscience, and biomedical engineering researchers to reuse the dataset for brain activation pattern analyses, artificial removal, and neural decoding.

Methods
Participants and task overview. Thirty-one college students (18-33 years old, average: 20.68 years old; 14 males) participated in the present experiment. None of the participants reported any neurological or psychiatric history. All participants were right-handed and had normal or corrected-to-normal vision. Each participant voluntarily enrolled in and signed an informed consent form prior to the experiments and got the monetary compensation of approximately MOP 60 for one experimental session. The anonymous participants can only be identified by the tag from "sub-01" through "sub-31". This study was performed in strict accordance with the Guidelines for Research Ethics of the University of Macau. The Institutional Review Board of the University has approved all procedures.
Text descriptions of task overview were shown in Table 1.
Experimental procedures. Participants were instructed to sit in an adjustable chair, whose eyes were approximately 60 cm away from the monitor (Dell, resolution: 1,920 × 1,080 pixels, vertical refresh rate: 60 Hz), see Fig. 3a. They were then informed that they would perform simple decision making tasks. All three tasks were divided into three blocks, interleaved with two breaks when the experimenter could check the impedance of the electrodes and supplement saline if necessary. Stimulus presentation and manual response measurement were controlled by PsychoPy 13 Standalone (2020.2.3), and the EGI PyNetstaion module was used to connect PsychoPy and EGI Netstation. All stimuli were presented on the screen against a black background (RGB: 0,0,0). A white cross subtended 1.0 × 1.0° worked as a fixation at the center of the screen. There was a grey rectangle (RGB: 169, 169, 169) with the white-font "start" subtended 4.0 × 2.2° at the start of each trial, showing up at the bottom center of the screen. Two target options (images or words) subtended 10.6 × 8.0°, appeared at the screen's top left and top right side, respectively. After clicking on one of the targets, a blue frame (RGB: 0, 255, 0) will appear on the selected target, subtended 10.7 × 8.1°.
Session 1: Resting-state. The resting-state session consisted of two blocks, with 50 trials in each block. In each trial, participants were asked to look at the fixation cross at the center of the screen for 7 s, followed by a blank lasting for 1 s. There was a 2-min break between the two blocks.
Session 2: Food choice task. Three hundred and twenty pictures depicting various types of food were selected from Food-Pics Extended 14 for both the pre-rating and food choice tasks. Firstly, participants were required to rate the target food from three aspects, which were presented randomly.
The first rating task aims to examine participants' preferences in food choices. Participants needed to rate the taste of the target food presented at the center of the screen by a five-point Likert scale ("How do you like this food?" One denotes "extremely unsavory" while five denotes "extremely delicious"). The second rating relates to participants' perception of food healthiness. Participants were asked to judge whether the target food was healthy or not ("How healthy do you think this food is?" One denotes "extremely unhealthy" while five denotes "extremely healthy"). In the third rating, participants were asked to judge how much they wanted to eat the target food after the experiment ("How much do you want to eat this food after the experiment?" One denotes "I don't want it at all" while five denotes "I desperately want it").
In the formal food choice task, as shown in Fig. 1a, each trial began with a blank with the word "start" at the bottom center of the screen. The participant could click the start box once they were ready. Two pictures of target food appeared in the upper left and upper right corner of the screen. The participants were asked to choose the preferred one by continuously moving the mouse to their favorite food box and clicking it. Upon click, the selected picture would be highlighted in blue for 1000 ms. After the pictures disappeared, a fixation would show

Session Description
Resting-state Measures EEG activity during rest.
Food choice task A food preference task based on mouse-tracking. Measures the dynamic process and corresponding EEG signals of food preferences.
Image choice task A semantic classification task for images based on mouse-tracking. Measures the dynamic semantic process and corresponding EEG signals of binary decisions.
Word choice task A semantic classification task for written words based on mouse-tracking. Measures the dynamic semantic process and corresponding EEG signals of binary decisions. www.nature.com/scientificdata www.nature.com/scientificdata/ up for 800 to 1500 ms (mean = 1150 ms). There were 160 pairs of pictures in the formal task, with each pair appearing twice. This session took about 26 minutes. There were two 2-min breaks in this task.
Session 3: Image choice task. Eighty images of animate objects and 80 inanimate objects were collected and designed by ourselves. Each image appeared twice in a pseudo-randomized order. Each trial began with the "start" button at the bottom center of the screen, as shown in Fig. 1b. Once started, the target object image would appear at the center. Participants need to determine whether the object is an animate or inanimate thing by clicking on the corresponding option in the upper left or upper right corner of the screen with a mouse. Upon click, the selected option would be highlighted in blue and lasted for 1000 ms. A fixation cross would subsequently show up and last for 800-1500 ms randomly. There were 320 trials in total. The positions for the "animate" and "inanimate" options were counterbalanced across all trials. It took participants around 27 minutes to complete this session. There were two 2-min breaks in this task.
Session 4: Word choice task. The procedure of the word choice task was almost identical with the image choice task, as shown in Fig. 1c. All the pictures from session three were replaced by their corresponding Chinese written words. To avoid ambiguity, two raters of Chinese linguistics background were invited to name the pictures and reached an agreement upon the 80 animate and 80 inanimate nouns. Two word types were also matched regarding word frequency [t(80) = 1.241, p = 0.216, word frequency data retrieved from the chinese corpus (http://corpus.zhonghuayuwen.org)] and number of strokes [t(80) = 0.749, p = 0.455]. This session took around 26 minutes. There were two 2-min breaks in this task.
Although the time to finish all of the tasks is long, we attempted to give participants enough break time during the task and between the tasks. Meanwhile, the saltwater during the break time can keep the impedance of the electrodes low, which provides sufficient guarantees to ensure high quality EEG data collection.

Data Records
All data are publicly accessible (https://openneuro.org/datasets/ds003766) 15 in the brain imaging data structure (BIDS) 16 format under the OpenNeuro platform. Fig. 3a,b, EEG data were acquired using a 128-channel cap based on the standard 10/20 System with Electrical Geodesics Inc (EGI, Eugene, Oregon) system. The layout of EEG electrodes was presented in Fig. 3b. During recording, the sampling rate was 1000 Hz, and the E129 (Cz) electrode was used as reference. Electrode impedances were kept below 50k Ω for each electrode during the experiment. The raw EEG data was exported to metafile format (.mff) files on the Mac OS. www.nature.com/scientificdata www.nature.com/scientificdata/ EEG data organization. The dataset can be accessed via the OpenNeuro link, organized following the EEG-BIDS 17 specification, which was an extension to the brain imaging data structure for EEG. The overview directory tree of our dataset and part of the meta-data are shown in Fig. 2.

EEG data collection. As shown in
The repository contains three regular BIDS files, 31 participants' data folders, a sourcedata folder, and a derivatives folder. The stand-alone files offer an overview of the dataset: (i) dataset_description.json is a JSON file depicting the dataset, such as the objective, acquisition time, and location; (ii) participants.tsv contains participants' demographic information, such as age, sex, and handedness; (iii) participants.json describes the column attributes in participants.tsv. Each participant's folder contains two folders named beh and eeg respectively and one file sub-xx_scans.tsv. The TSV file contains information about the scanning time of each task. The beh folder contains the corresponding behavioral data such as stimulus, response time, mouse trajectory, etc. The eeg folder contains the minimally processed raw EEG data, channels and marker events files of four sessions. The EEG data was converted from raw metafile format (.mff file) to EEGLAB 18 dataset format (.set file) using the EEGLAB toolbox in MATLAB since EEG-BIDS is not officially compatible with the .mff format. All data was formatted to EEG-BIDS using the MNE-BIDS 19 package in Python. The sourcedata folder contains two folders, psychopy, and rawdata, corresponding to the presentation scripts for all tasks and the raw EEG data in metafile format, respectively. Finally, the derivatives folder contains preprocessed EEG data, including resting-state and all three task sessions. EEG data preprocessing. For the data of binary choice tasks, standard preprocessing operations including resampling (100 Hz) and filtering (0.1-30 Hz) were performed in the EEGLAB. Moreover, 11 channels close to the left and right eyes were used as the EOG channels, and 17 channels with extensive artifacts were removed 20 for further data processing and visualization, as shown in Fig. 3b. We removed bad channels with the clean_rawdata plugin and did spherical interpolation further. Then we converted data to average reference and rejected bad data periods with clean_rawdata again. Next, we ran ICA decomposition and used ICLabel 21 for automatic independent component labeling and rejection. After preprocessing, we extracted data epochs from −200 ms to 800 ms at the stimulus onset with a mean baseline correction from −200 ms to the onset time. To provide an overview of the EEG signals, we presented an exemplar EEG result after preprocessing in Fig. 3c.
The resting-state EEG data preprocessing is basically identical with the task EEG except for minor parameter tuning for the subsequent microstate processing 22 . After standard procedures such as resampling, filtering, and artifacts removal on continuous data, we extracted the first 6-second data after fixation onset and segmented it into 2000 ms epochs, with −100 ms to 0 ms for baseline correction 8,23 . Behavioral and mouse-tracking preprocessing steps. The behavioral analyses were performed separately, including reaction times (RT), accuracy, and mouse trajectory. RT and accuracy results were visualized in Fig. 4a-d   www.nature.com/scientificdata www.nature.com/scientificdata/ Mouse-tracking data was preprocessed temporally and spatially 24 . Temporally, the time windows combining trials and responses were sliced into 101 identical time bins using linear interpolation to average their length across multiple trials. Spatially, the trajectories of the mouse-tracking data were flipped to the same side and transformed into standard coordinate space (bottom left: [0, 0], top right: [1,1]).

technical Validation
We could extract at least two kinds of valuable behavioral data for all binary choice tasks: i) behavioral data associated with choices, such as choices that participants made or their response time during this procedure, ii) mouse trajectory data when participants made a choice. Several novel analyses can be done in neuroscience or BCI by combining behavioral data, mouse trajectory, and EEG data. For example, since mouse trajectory and EEG are both time-series data, their fusion would allow us to explore the association between human behavior and brain activity in decision making dynamics 25 . For the image/text-based semantic judgment task, we can combine ERP analysis with EEG source reconstruction algorithms to explore the neural representation of objects in different modalities with HD-EEG data 26 . Actually, a recent study from HajiHosseini and Hutcherson has investigated the time course of food choices with EEG 27 . In addition, researchers can decode or predict mouse position based on EEG signals to get better performance when using BCI to control computer mouse cursor for people with motor disabilities.
Four participants were excluded from further validation. Participant sub-28 performed poorly on both the word choice and image choice tasks, with an accuracy of less than 90%. There was 20 Hz EEG noise on participant sub-01 and sub-05. Participant sub-19 was excluded because of poor EEG epoch numbers (less than 200 trials for both word choice and image choice tasks) after automatic data rejection and epoching. Only trials with correct response were reserved for the word choice and image choice task in both behavioral and EEG data. www.nature.com/scientificdata www.nature.com/scientificdata/ There were 101 channels left, and the bad channel percentages over all three tasks (i.e., food choice task, word choice task, and image choice task) were 1.96% ± 1.28 (Mean ± SD), 2.56% ± 1.76, and 2.74% ± 1.85, respectively. The algorithm automatically determines the independent component number, which is around 100. The ICLabel removed components percentages for each task were 3.85% ± 1.8, 3.07% ± 1.49, and 3.31% ± 1.03, respectively. Since we only care about the correct response, the total trial number for different tasks is reduced to 310 to 320. After data rejection and epoching, 7.76% ± 4.43, 8.49% ± 4.21, and 10.02% ± 5.62 of epochs were dropped for each task.

Behavioral validation.
Here we presented some basic behavioral results to validate the dataset availability. www.nature.com/scientificdata www.nature.com/scientificdata/

EEG validation.
Resting-state EEG. After the preprocessing, we used the Microstate 28 (https://www.thomaskoenig.ch/index.php/software/microstates-in-eeglab) plugin in EEGLAB to analyze the resting-state EEG data. We conducted the microstate analysis and identified four topography states (see Fig. 8). In light of the sample results, the topography states were stable, and the individual topography states were overlapped with four grand averaged topography states. www.nature.com/scientificdata www.nature.com/scientificdata/ higher alpha power in C3 and C4 than Cz due to the left and right motor responses. This was in line with a previous study which observed similar lateralized patterns in motor response, particularly for alpha power 30 .
Image choice task EEG. Likewise, the EEG data were segmented based on the image stimulus onset. The averaged ERPs and time-frequency results were displayed in Figs. 5 and 6. In particular, we observed more positive ERP amplitude over Cz for animate than inanimate stimuli. However, the averaged time-frequency maps across both conditions in Fig. 6 did not show a power difference between C3 and C4.  www.nature.com/scientificdata www.nature.com/scientificdata/ Word choice task EEG. Segmented EEG data were time-locked to word stimulus. Similarly, we displayed the averaged ERPs and time-frequency results in Figs. 5 and 6. For ERP amplitudes, we replicated more positive ERP amplitude over Cz for animate than inanimate words. This result is consistent with previous studies that showed larger negative N400 for inanimate word or sentence processing 31 . Again, the averaged time-frequency maps across both conditions in the word choice task did not show a power difference between C3 and C4.
Consistent with previous studies, we found longer RTs, larger deviations in mouse trajectories, and more negative conflict-related negativity for the inanimate condition than animate condition. Therefore, it validates the dataset from both behavioral and EEG features.
Decoding choices with EEG. Decoding analyses were performed in MNE-Python combined with Scikit-learn, using the support vector machine classifier (with the function of SVC). For all three tasks, participants needed to click on the left or right option for a given pair of options. For simplicity, we decoded participants' binary choices (left or right) for all three task-based EEG data. Here we extracted EEG epochs from −200 ms to 1800 ms since participants' mean response time is over 1200 ms. The performance of classification is higher than the chance level (0.5). Specifically, the accuracy for the three tasks (i.e., food choice task, word choice task, and image choice task) were 0.67 ± 0.08, 0.73 ± 0.09, and 0.77 ± 0.1, respectively. The topomap for left or right choice in Fig. 7 also showed different patterns over time. The decoding results evaluated the potential of the data for future decoding use.

Usage Notes
Most of the software or packages used for analyzing these data are freely available. All codes for preprocessing and plotting are openly accessed (see Section Code availability). We used a python package named squeak (https://github.com/eointravers/squeak) for the mouse trajectory data analysis of binary choice tasks. For task EEG data analysis, we used MNE-Python to generate formatted BIDS data, and plot ERP and time-frequency results based on the preprocessed data. Also, we used EEGLAB with plugins to preprocess all EEG data and do the microstate analysis for resting-state EEG data. For more detailed technical usage instructions, please refer to the GitHub repository.
There are three types of EEG event markers in all three binary choice tasks. The first is the trial index marker. Its value ranges from "0000" to "0319", corresponding to the fixation starting from the first trial to the 320th trial, respectively. This type of marker mainly facilitates the correspondence between EEG data and behavioral data when the marker is missing.
The second is the stimulus marker, which appeared when the stimulus occurred and varied across tasks. There is only one stimulus marker in the food choice task, "0400". In the image choice and word choice task, there are four types of stimulus markers, from "0400" to "0403", which correspond to: (1) showing an animate object, the animate option appears on the left; (2) showing an animate object, the animate option appears on the right; (3) showing an inanimate object, the animate option appears on the left; (4) showing an inanimate object, the animate option appears on the right.
The third type is the response marker, which is locked to the option participants made. The food choice task has two response markers, "0500" and "0501", denoting the left and right, respectively. In the image choice and word choice tasks, there are eight types of response marker, from "0500" to "0507", which correspond to: (1) showing an animate object, the animate option appears on the left, left option selected; (2) showing an animate object, the animate option appears on the left, right option selected; (3) showing an animate object, the animate option appears on the right, left option selected; (4) showing an animate object, the animate option appears on the right, right option selected; (5) showing an inanimate object, the animate option appears on the left, left option selected; (6) showing an inanimate object, the animate option appears on the left, right option selected; (7) showing an inanimate object, the animate option appears on the right, left option selected; (8) showing an inanimate object, the animate option appears on the right, right option selected.

Code availability
The code used to preprocess the data and plot results is openly available on GitHub (https://github.com/ andlab-um/MT-EEG-dataset). For more details about code usage, please refer to the GitHub repository.