Background & Summary

Visual evoked potentials (VEPs) have a long record of shedding light on the spatial and temporal dynamics of large-scale neural processing in the brain1,2. EEG potentials registered at scalp electrodes result from synchronous activity in large populations of neurons that are distributed across cortical and subcortical areas3,4. Visual stimulation gives rise to a fast sequence of well-known EEG components that reflect initial processing at latencies before ~100 ms5,6, subsequent object and recurrent processes7,8,9, and later components that reflect target detection, integration and decisions10,11,12. The VEP provides a millisecond-by-millisecond recording of whole-brain activity dynamics, and has a rich distribution of temporal frequencies that provides further insight into the functionality of brain processes13,14.

From VEPs recorded across the scalp, the underlying distributed patterns of brain activity can be estimated using an inverse solution based on anatomical constraints15,16,17,18. The anatomical constraints determine what the electrical fields generated by a neural activity source would look like at the recording electrodes on the scalp, given the conductivities of the various tissues and fluids that lie in between. An inverse solution translates the recorded electrical potential field back into a pattern of distributed source activity in the brain. Source localizations from EEG are necessarily coarse compared to fMRI; improving them further is an active field of research that helps to clarify the activity dynamics within areas and their inter-relatedness19,20,21,22.

Activity in each area can influence activity throughout the brain in a few steps, due to the dense connectivity of cortico-cortical and cortico-subcortical fibers23,24,25,26. This structural connectivity can be inferred with diffusion weighted imaging (DWI)27,28. The resulting connectome constitutes a road map of sorts over which activity can propagate between areas, and in an important sense constrains how activity within an area can evolve through the influences it receives from others29,30.

The VEPCON dataset31 combines raw EEG data, T1-weighted (T1w) MRI, and DWI for 20 human participants, along with derivatives: VEPs, inverse solution matrices, brain parcellations and connectomes at 5 different spatial scales (Fig. 1). These data were recorded to study the dynamics of functionally specialized processes that support face and motion perception. High-density EEG was recorded in two active paradigms in which participants categorically discriminated face images from scrambled counterparts, or coherent from incoherent motion in random dot kinematograms. VEP sources for face stimuli are known to include inferior temporal and lateral occipital cortex, and for motion stimuli they include dorsal area MT8,9. Parts of these data were previously used to improve and validate EEG source imaging methods32 and time-varying functional connectivity methods33,34, to inform inverse solutions with connectomes35,36, and to develop the multi-modal image processing pipeline software Connectome Mapper 3 (CMP3)37.

Fig. 1
figure 1

Schematic overview of the study design and data. (a) Stimuli and temporal structure of trials in the Motion and Face tasks. (b) Illustration of the 128-channel EEG recording and a grand-average visual evoked potential at the scalp level, with corresponding source-activity time series from three areas at parcellation Level 3. (c) Sample defaced T1w images and illustration of brain parcellations and corresponding connectomes based on fiber density at three spatial scales.

The dataset is publicly available on OpenNeuro31, and raw data are structured following the MRI and EEG Brain Imaging Data Structure standards (BIDS38,39). We expect the data to be useful for the development and benchmarking of multimodal analysis methods that combine functional and structural information, for exploring structure-function relations, and for testing whether and how the level of parcellation affects results. We also expect unimodal reuse value for development and benchmarking of EEG source-imaging approaches and functional connectivity analyses. Beyond this, the availability within the same participant of single-trial behavior, EEG data, inverse solutions, anatomical and diffusion MRI, and connectomes based on four different indices allows for many types of analyses that can generate new hypotheses about dynamic brain function in human visual processing and behavior.

Methods

Participants

Twenty participants (3 males; mean age = 23 ± 3.5 years), recruited from the local student population (University of Fribourg, Switzerland), took part. Participants had normal or corrected-to-normal vision. Before the experiment, visual acuity was tested with the Freiburg Acuity test; a value of 1 had to be reached with both eyes open40. Nineteen participants were right-handed, one was left-handed. All participants provided written informed consent before the experiment. The experimental procedures complied with the Declaration of Helsinki and were approved by the regional ethics board (CER-VD, Protocol Nr. 2016–00060).

Stimuli, display and procedures

EEG data were recorded while participants performed a face detection and a motion discrimination task (see Fig. 1). The order of the two tasks was counterbalanced across participants.

Face stimuli were female and male faces (4° by 4° of visual angle) taken from online repositories and cropped with a Gaussian kernel to smooth the borders. Scrambled images were obtained by fully randomizing the phase spectra of the original images41. In the face detection task, each trial lasted 1.2 s and started with a blank screen (500 ms). After the blank screen, one image (either a face or a scrambled image of a face) was presented at the center of the screen for 200 ms and participants had the remaining 1000 ms to respond. The task was to report whether they saw a face or not (yes/no task) by pressing one of two buttons on a response box with their right hand. After the response and a random interval (from 600 to 900 ms), a new trial began. The experiment consisted of four blocks of 150 trials each, for a total of 600 trials, i.e., 300 with faces and 300 with scrambled faces. Faces and scrambled faces were randomly interleaved across trials.

In the motion discrimination task, motion stimuli were dot kinematograms presented on a circular frame at the center of the screen (dot field size = 8°; dot size = 12 pixels; lifetime = 10 frames; average number of dots inside the field = 75; dot speed in displacements per frame = 0.01°; mean dot luminance = 50%). For coherent motion, 80% of the dots were moving toward either the left or right, with the remaining 20% moving in random directions. For incoherent motion stimuli, all dots moved randomly. Each trial started with a blank interval of 500 ms followed by a centrally presented stimulus for 300 ms (Fig. 1). Participants discriminated whether the presented motion was coherent or incoherent by pressing one of two buttons of a response box. After the response there was a random interval (from 600 to 900 ms) before the next trial began. There were four blocks of 150 trials each, for a total of 600 trials (300 with coherent motion). The two conditions were randomly intermixed within each block.

Stimuli were generated using Psychopy42,43 and presented on a VIEWPixx/3D display system (1920 × 1080 pixels, refresh rate of 100 Hz). Responses were collected using a ResponsePixx response box (VPixx technologies).
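
For readers who want to adapt the stimuli, the sketch below illustrates the structure of one coherent-motion trial with PsychoPy's built-in dot-kinematogram stimulus, using the dot-field parameters reported above. It is an illustration rather than the authors' experiment code: the monitor calibration values, the keyboard response (instead of the ResponsePixx box) and the timing loop are simplifying assumptions.

```python
# Minimal PsychoPy sketch of one coherent-motion trial, using the dot-field
# parameters reported above. Not the authors' experiment code: monitor
# calibration values, keyboard responses and the timing loop are assumptions.
from psychopy import core, event, monitors, visual

mon = monitors.Monitor("testMonitor")            # placeholder monitor profile
mon.setSizePix((1920, 1080))
mon.setWidth(52.0)                               # assumed screen width in cm
mon.setDistance(60.0)                            # assumed viewing distance in cm

win = visual.Window(size=(1920, 1080), monitor=mon, units="deg", color=-1)

dots = visual.DotStim(
    win, nDots=75, fieldShape="circle", fieldSize=8,   # 8 deg circular dot field
    dotSize=12, dotLife=10,                            # 12 px dots, 10-frame lifetime
    speed=0.01, coherence=0.8, dir=0,                  # 0.01 deg/frame, 80% coherent, rightward
    signalDots="same", noiseDots="direction",
)

core.wait(0.5)                                   # blank interval (500 ms)
clock = core.Clock()
while clock.getTime() < 0.3:                     # stimulus presentation (300 ms)
    dots.draw()
    win.flip()
win.flip()                                       # clear the screen and wait for a response
keys = event.waitKeys(keyList=["left", "right"], timeStamped=clock)
win.close()
```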

EEG recording and preprocessing

EEG data were recorded at a sampling rate of 2048 Hz with a 128-channel Biosemi Active Two EEG system (Biosemi, Amsterdam, The Netherlands) in a dimly lit and electrically shielded room. Signal quality was ensured by monitoring the offset between the active electrodes and the Common Mode Sense - Driven Right Leg (CMS-DRL) feedback loop and maintaining it below a standard value of ±20 mV. After each recording session, individual 3D electrode positions were digitized using an ultrasound motion capture system (Zebris Medical GmbH).

Offline, data were preprocessed using EEGLAB 14.1.144. EEG data were downsampled to 250 Hz and detrended (antialiasing filter: cut-off = 112.5 Hz, bandwidth = 50 Hz, detrending at <1 Hz). Line noise (50 Hz) was removed via spectrum interpolation45. Data were then segmented into epochs time-locked from −1500 to 1000 ms relative to stimulus onset in both tasks. Data from participants 05 (Face and Motion tasks) and 15 (Motion task) were discarded and not processed further due to excessive noise. Bad channels and epochs were identified and removed before further preprocessing. Remaining physiological artifacts were isolated using an independent component analysis (ICA) decomposition (FastICA). Bad components were labelled by crossing the output of a machine-learning algorithm (MARA, Multiple Artifact Rejection Algorithm, in EEGLAB) with the criterion of >90% of total variance explained, and were removed manually. Bad channels were then interpolated using the nearest-neighbor spline method and data were re-referenced to the average reference.
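
The preprocessing was carried out in EEGLAB; for readers working in Python, a rough analogue of the main steps (downsampling, 50 Hz removal, epoching, ICA, channel interpolation, average reference) could look like the MNE-Python sketch below. The file path and event handling are placeholders, and MNE's spectrum-fit notch filter and manual ICA exclusion differ in detail from the spectrum interpolation and MARA-based labelling used for the shared derivatives.

```python
# Rough MNE-Python analogue of the EEGLAB preprocessing described above.
# Paths and event handling are placeholders, not the authors' pipeline.
import mne

raw = mne.io.read_raw_bdf("sub-01/eeg/sub-01_task-faces_eeg.bdf", preload=True)
raw.resample(250)                                   # downsample to 250 Hz
raw.notch_filter(freqs=50, method="spectrum_fit")   # remove 50 Hz line noise

events = mne.find_events(raw)                       # Biosemi trigger (status) channel
epochs = mne.Epochs(raw, events, tmin=-1.5, tmax=1.0,
                    baseline=None, preload=True)    # epochs from -1500 to 1000 ms

ica = mne.preprocessing.ICA(method="fastica", random_state=0)
ica.fit(epochs)
ica.exclude = []                                    # fill with indices of artifactual components
ica.apply(epochs)

epochs.interpolate_bads()                           # interpolate channels marked as bad
epochs.set_eeg_reference("average")                 # re-reference to the average
```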

EEG source imaging

EEG source imaging was performed using Cartool (v3.80)46 and custom-made scripts (see Matlab and Python examples in the code/ directory). Source reconstruction was based on Cartool-segmented individual T1w MRI data, co-registered individual electrode positions, and the LORETA47 and LAURA15 algorithms (regularization = 6; spherical model with anatomical constraints, LSMAC). Leadfields were calculated for each of the approximately 5000 freely oriented dipoles, while limiting the solution space to voxels within the gray-matter mask46 provided by CMP3.
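
Conceptually, a distributed linear inverse solution is a matrix that maps sensor-space EEG (channels × time) to dipole time-courses (source points × 3 orientations × time). The Cartool .is files store such matrices; the NumPy sketch below only illustrates this linear mapping with placeholder arrays and does not parse the Cartool file format (that is handled by the example scripts in code/).

```python
# Conceptual illustration of applying a distributed linear inverse operator.
# All arrays are placeholders; reading the Cartool .is format is done by the
# example scripts in code/ and is not reproduced here.
import numpy as np

n_channels, n_sources, n_times = 128, 5000, 626   # ~5000 solution points; -1.5 to 1.0 s at 250 Hz

inverse = np.random.randn(n_sources * 3, n_channels)   # placeholder inverse matrix (x, y, z per source)
eeg = np.random.randn(n_channels, n_times)             # placeholder average-referenced EEG epoch

sources = inverse @ eeg                                # shape (n_sources * 3, n_times)
sources = sources.reshape(n_sources, 3, n_times)       # dipole moment per source point over time

strength = np.linalg.norm(sources, axis=1)             # dipole strength, shape (n_sources, n_times)
```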

MRI recording and processing

MR data from the same 20 subjects were acquired on a General Electric Discovery MR750 3T clinical MRI scanner at the cantonal hospital in Fribourg, Switzerland, using a 32-channel head coil. The acquisition included anatomical T1-weighted images and DTI. T1-weighted images were acquired as magnetization-prepared rapid gradient-echo (MPRAGE) volumes using a COR FSPGR BRAVO pulse sequence (flip angle = 9°; echo time = 2.8 ms; repetition time = 7300 ms; inversion time = 0.9 s, FOV = 220 mm, matrix size = 256 × 256, number of slices = 276, slice thickness = 1 mm, in-plane resolution = 0.9 × 0.9 mm2). DTI data were acquired with a spin echo single shot EPI pulse sequence and a diffusion-sensitizing gradient set of 30 different directions and 5 diffusion-free B0 scans (echo time = 87 ms; repetition time = 8000 ms; interleaved slice order; b-weighting of 1000, FOV = 260 mm, matrix size = 128 × 128, number of slices = 60; slice thickness = 2.0 mm; slice spacing = 0.2 mm, in-plane resolution = 2.0 × 2.0 mm2).

Processing of all T1w and DTI data was performed using the Connectome Mapper v3.0.0-beta-RC1 pipelines37. All T1w scans were resampled to 1 mm3 isotropic resolution, from which gray and white matter were segmented using Freesurfer 6.0.148 and parcellated into 83 cortical and subcortical areas49. The parcels were then further subdivided, following a previously proposed method50, into 129, 234, 463 and 1015 approximately equally sized parcels according to the Lausanne anatomical atlas. DTI data were corrected for motion and eddy current distortions using mcflirt and eddy_correct provided by FSL 5.0.9 and resampled to 1 mm3 resolution using mrconvert from MRtrix 3.0.0-RC1. Fiber orientations were then reconstructed using the constrained spherical deconvolution algorithm implemented in MRtrix 3.0.0-RC151 with a maximal harmonic order of 4, enabling the estimation of multiple fiber directions per voxel.

For sharing purposes, all raw T1w and DTI data were anonymized during BIDS conversion and all anatomical T1w data were de-identified by removing facial features using Quickshear52.

Structural connectomes

For each participant, structural connectivity matrices were estimated from the reconstructed fiber orientation distribution (FOD) image using the SD_stream deterministic streamline tractography algorithm implemented in MRtrix 3.0.0-RC151. Fiber streamline reconstruction started from spatially random seeds in the white matter and continued until 1 million fiber streamlines had been reconstructed. At each streamline step of 0.5 mm, the local FOD was sampled and, starting from the current streamline tangent orientation, the orientation of the nearest FOD amplitude peak was estimated via Newton optimization on the sphere. Fibers were stopped if the change in direction was greater than 45 degrees, and tracking of an individual streamline terminated when both of its ends had left the white-matter mask. Fibers with a length outside the 5 mm to 200 mm range were discarded. Then, for each scale, the parcellation was projected to the native DTI space after symmetric diffeomorphic co-registration between the T1w scan and the diffusion-free B0 image using the Advanced Normalization Tools (ANTs) 2.2.0. Finally, connectivity matrices at 5 different spatial scales were built by considering all fiber streamlines connecting each pair of parcels, according to the following connectivity measures: number of fibers, fiber density, and mean and median values of fractional anisotropy (FA) and mean diffusivity (MD).
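
The connectivity matrices themselves were produced by CMP3; the sketch below only illustrates the bookkeeping underlying the number-of-fibers measure: for every pair of parcels, count the streamlines whose two endpoints fall in those parcels. The endpoint labels here are placeholders.

```python
# Conceptual sketch of building a number-of-fibers connectome from streamline
# endpoint labels. The shared matrices were produced by Connectome Mapper 3;
# the endpoint labels below are random placeholders.
import numpy as np

n_parcels = 83                              # scale-1 parcellation
rng = np.random.default_rng(0)

# Parcel label at each end of every streamline (0 = endpoint not inside a parcel)
end_a = rng.integers(0, n_parcels + 1, size=1_000_000)
end_b = rng.integers(0, n_parcels + 1, size=1_000_000)

valid = (end_a > 0) & (end_b > 0)           # keep streamlines that connect two parcels
i, j = end_a[valid] - 1, end_b[valid] - 1

counts = np.zeros((n_parcels, n_parcels), dtype=int)
np.add.at(counts, (i, j), 1)                                # one count per streamline
counts = counts + counts.T - np.diag(np.diag(counts))       # symmetrize (undirected connectome)
```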

Data Records

The VEPCON dataset is available via the OpenNeuro repository31 and is fully BIDS compliant (v1.4.1). Below, we describe all data records following the directory structure.

The main directory contains descriptor files detailing the dataset, and the age and sex of each participant. Subfolders (sub-<label>/) hold the raw EEG data (eeg/) in .bdf format, as well as the T1w MRI images (anat/) and diffusion-weighted images (DWI) (dwi/) in compressed NIfTI format, with acquisition parameters stored in the associated sidecar .json files. For the raw EEG data, the timing and other properties of events are stored in *_task-<label>_events.tsv files and the corresponding column descriptors (*_task-<label>_events.json), for each task (task-faces and task-motion). The events file lists, for each single trial, the stimulus onset relative to recording onset (in seconds), condition specifiers, the response made, the response evaluation, the reaction time, and whether the trial was discarded in preprocessing. Trials with behavioral errors or reaction times faster than 200 ms were marked as outliers. For each electrode, the digitized position (x, y, z; in mm) is listed in the *_electrodes.tsv file, and status information (good/bad) is listed in the *_channels.tsv file.
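
As a starting point for single-trial analyses, the events table can be read directly with pandas, as in the sketch below. The exact column names should be taken from the *_events.json sidecar; the path and the outlier-flag column named here are assumptions for illustration.

```python
# Read the single-trial events of one participant and task. The path follows
# the BIDS layout described above; the outlier-flag column name is assumed
# and should be checked against the *_events.json descriptor.
import pandas as pd

events = pd.read_csv("sub-01/eeg/sub-01_task-faces_events.tsv", sep="\t")
print(events.columns.tolist())            # inspect the actual column names

# Example: keep only trials not flagged as outliers (column name assumed)
clean = events[events["bad_trial"] == 0]
print(len(clean), "trials retained")
```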

The derivatives/ directory includes a dataset_description.json file for each derivatives dataset (mriqc, cmp, cartool, eeglab) at the root level. Subject- and datatype-specific derivatives are housed within each subject's datatype directory (sub-<label>/<datatype>), where <datatype> can be “anat”, “dwi”, “eeg”, “xfm” or “connectivity”. The output reports generated by MRIQC and the CMP3 derivatives of anatomical and diffusion MRI are in the mriqc-<version> and cmp-<version> folders, where the version of the software used is encoded by the <version> label (e.g., cmp-v3.0.0-beta-RC1). EEG derivatives obtained with EEGLAB (eeglab-<version>) and Cartool (cartool-<version>) are contained in their respective folders and organized to comply with BIDS to the fullest possible extent (specifications for some of these files are currently under development).

In the eeglab/ directory, each participant's eeg/ folder contains the preprocessed single-trial epochs (.fdt, .set). The task-faces files contain epochs for face and scrambled-face (control) stimuli; the task-motion files contain epochs for coherent- and incoherent-motion stimuli. The directory also holds summaries of preprocessing and data cleaning for the motion and face tasks in two separate .tsv files, which list for each participant the proportion of channels, epochs and ICA components removed during preprocessing.
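
The EEGLAB epochs can also be read into Python, for example with MNE-Python, as sketched below. The file name and version label in the path are placeholders following the derivatives layout described above.

```python
# Load the preprocessed single-trial epochs (.set/.fdt) into MNE-Python.
# The path and file name are placeholders following the derivatives layout.
import mne

epochs = mne.io.read_epochs_eeglab(
    "derivatives/eeglab-v14.1.1/sub-01/eeg/sub-01_task-faces_epo.set"
)
print(epochs)                  # number of epochs, channels, time range
evoked = epochs.average()      # single-participant average VEP for this task
evoked.plot()
```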

In the cartool/ directory, each participant's eeg/ folder holds a text file (.xyz) with the individual electrode positions in mm, co-registered to the participant's head (x, y, z, Biosemi electrode name). The .spi file is a similar text file listing the x, y, z coordinates and a text label for each source point for which an inverse solution was calculated. The inverse solution matrices for LAURA and LORETA15,47 are stored in the respective .is files; they map observed patterns of EEG potentials to a distribution of 3D dipoles across source points. The rois/ folder contains files that indicate which source points belong to which area (region of interest, ROI), according to each parcellation level, in a Cartool-readable (.rois) and a Python-readable (.pickle.rois) version.
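
Because the .spi file is plain text and the .pickle.rois file is a Python pickle, they can be loaded with standard tools, as in the sketch below. File names are placeholders, and the assumption that each .spi line holds "x y z label" with no header should be checked against the files themselves.

```python
# Load Cartool-derived source-point coordinates and the ROI assignment for one
# parcellation scale. Paths are placeholders; each .spi line is assumed to be
# "x y z label" with no header (check the file if this does not hold).
import pickle
import numpy as np

spi = np.loadtxt("sub-01_sourcepoints.spi", usecols=(0, 1, 2))   # x, y, z in mm
print(spi.shape)                                                 # (~5000, 3)

with open("sub-01_scale3.pickle.rois", "rb") as f:               # hypothetical file name
    rois = pickle.load(f)                                        # ROI-to-source-point mapping
print(type(rois))
```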

The cmp/ directory contains anatomical, diffusion and structural connectivity data for each participant. All T1w-derived data are placed in the anat/ folder, which includes the brain segmentations and parcellations in the native T1w space and in the native DTI space (_space-DWI_). All DTI-derived data are placed in the dwi/ folder, which includes the preprocessed DTI data, the FOD image and the final tractogram used to build the connectivity matrices. The connectivity/ folder holds the connectomes at each parcellation level, in a Matlab-readable (.mat) and a Python-readable (.gpickle) version. Each connectome file contains the following metrics: number_of_fibers, fiber_density, mean_FA and median_FA, mean_MD and median_MD. The transformations from the native T1w space to the native DTI space that were applied to the parcellations are stored in the xfm/ folder. The .pdf generated in the group/ folder visualizes each individual's connectome using the number-of-fibers metric at parcellation scale 1.
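
The connectomes can be loaded from either version, for example as sketched below. File names are placeholders; the edge metrics correspond to those listed above (number_of_fibers, fiber_density, mean/median FA and MD).

```python
# Load one connectome at a given parcellation scale from the Python (.gpickle)
# and Matlab (.mat) versions. File names are placeholders.
import networkx as nx                 # read_gpickle requires networkx < 3.0
from scipy.io import loadmat

G = nx.read_gpickle("sub-01_connectome_scale1.gpickle")
fibers = nx.to_numpy_array(G, weight="number_of_fibers")   # adjacency matrix (83 x 83 at scale 1)
print(fibers.shape)

mat = loadmat("sub-01_connectome_scale1.mat")              # Matlab-readable version
print(mat.keys())                                          # inspect stored variables
```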

The code/ directory provides code for the preprocessing and derivation steps for MRI and DWI data, as well as sample scripts, in Python and Matlab, to derive time-series of activity per ROI from the preprocessed EEG data. Each code folder has its own README file with a more specific user guide and documentation to get started. The MRIQC-Docker/ folder holds the code to run MRIQC on all participants; all outputs generated by MRIQC to support quality assessment of the T1w data can be found in the same directory53. Code to run Connectome Mapper 3 on all participants can be found in the ConnectomeMapper-Docker/ folder. The code to deface the anatomical MRIs52 is provided in the Deface-Python/ folder, together with the generated log file and a report in PDF format. In Source-Reconstruction-Python/, the main.py script performs all steps of source reconstruction, with comments that explain each step; utils.py contains all functions necessary to run main.py. The Source-Reconstruction-Matlab/ folder provides the equivalent Matlab code, with the ESI_RECONSTRUCTION.m script performing all steps of source reconstruction, and dependencies and utilities as separate files.

Technical Validation

Behavioral analysis

Analysis of proportion correct and reaction times showed that participants behaved according to the task instructions. In the face detection task, the average accuracy was 97 ± 2% correct responses, with mean reaction times of 501 ± 60 ms. Trials with behavioral errors or reaction times faster than 200 ms were marked as outliers (mean proportion of outliers 0.03 ± 0.02).

In the motion discrimination task, the average accuracy was 90 ± 14% and mean reaction times were 680 ± 90 ms. The mean proportion of outliers was 0.10 ± 0.14. One participant (number 16) inverted the response keys for several trials in the motion task, leading to an outlier accuracy value (36% of correct responses).

VEPs

EEG data were visually inspected and noisy trials and channels were excluded from further analysis (see EEG recording and preprocessing). Remaining ocular, muscle and other artifacts were removed using ICA. For the face task, the average proportion of channels removed across participants was 0.12 ± 0.07 (range 0.02–0.28), the mean proportion of epochs removed due to non-stereotyped artifacts, peristimulus eye blinks and eye movements was 0.04 ± 0.06 (range 0.002–0.26), and the proportion of ICA components removed was 0.05 ± 0.03 (range 0.01–0.15). For the motion task, the average proportion of channels removed was 0.14 ± 0.06 (range 0.02–0.23), the average proportion of epochs removed was 0.03 ± 0.03 (range 0.005–0.13), and the proportion of ICA components removed was 0.05 ± 0.04 (range 0.008–0.13).

Figure 2 shows the grand-average VEPs for the Face, Scrambled Face, Coherent Motion and Incoherent Motion conditions. Evoked responses were robust, with components consistent with the existing literature.

Fig. 2
figure 2

Grand-average VEPs for each of the four stimulus conditions, with corresponding scalp potential maps.

MRI and connectomes

After recording, MRI data were visually inspected and checked for neurological anomalies by a radiologist. The quality of the T1w data was further inspected quantitatively by running MRIQC v0.16.1, an automated and robust quality-control tool for T1w data that derives a set of 56 image quality metrics characterizing, for example, noise, motion and imaging artifacts53. Results were summarized in reports at both the individual and the group level, which are included in the mriqc/ derivatives folder.

De-identification of T1w data was checked individually and supported by the creation of a PDF report that is available in the code/ folder.

For each participant, the quality of the output after each processing step of the CMP3 pipelines was assessed using its graphical user interface, in which we individually inspected the FreeSurfer outputs, brain parcellations, preprocessed DTI data, the co-registration of T1w-derived parcellations with the DTI data, the estimated FOD image, the reconstructed tractogram, and the reconstructed connectomes. Visual inspection did not reveal any artifacts.

Usage Notes

For optimal sharing and re-usability, the raw data conform to the BIDS standards for MRI38 and EEG39.

Python and Matlab example code in Source-Reconstruction-Python/ and Source-Reconstruction-Matlab/ shows how to import the various raw and derivative files. This ensures compatibility with commonly used EEG and MRI software, including MNE-Python54, NeuroPycon55, Fieldtrip56 and SPM57. The code also shows how to derive a single time-series of activity for each region: source activity from the approximately 5000 freely oriented dipoles is extracted for all source points inside each cortical area, as defined by the parcellation, and projected onto a single representative direction using singular-value decomposition32.
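
The NumPy sketch below illustrates this SVD-based reduction for one ROI, with placeholder arrays; the shared example scripts perform the same operation on the actual inverse-solution output.

```python
# Sketch of reducing the dipole time-series of all source points in one ROI to
# a single representative time-series via singular-value decomposition.
# Array contents are placeholders.
import numpy as np

n_points, n_times = 40, 626                            # source points in one ROI, time samples
roi_sources = np.random.randn(n_points, 3, n_times)    # x, y, z dipole moments per source point

data = roi_sources.reshape(n_points * 3, n_times)      # stack all dipole components
U, s, Vt = np.linalg.svd(data, full_matrices=False)
roi_timeseries = s[0] * Vt[0]                          # dominant temporal component of the ROI
```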

For creating new inverse solutions, new forward models can be created from the T1w images, using the individual electrode coordinate files. When new leadfields are generated from the defaced MRIs, small differences can occur, but these are unlikely to pose problems for the intended reuse purposes58.
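
One possible route for building such a forward model in Python is sketched below with MNE-Python. This BEM-based approach differs from the Cartool LSMAC spherical model used for the provided inverse solutions, and the paths, the coregistration file and the montage handling are assumptions for illustration only.

```python
# One possible way to build a new EEG forward model with MNE-Python from the
# FreeSurfer output and individual electrode positions. This BEM-based route
# differs from the Cartool LSMAC model used for the shared inverse solutions;
# all paths below are hypothetical.
import mne

subjects_dir = "derivatives/freesurfer"      # hypothetical FreeSurfer output location
subject = "sub-01"

raw = mne.io.read_raw_bdf("sub-01/eeg/sub-01_task-faces_eeg.bdf")
# In practice, set the digitized positions from the *_electrodes.tsv file here,
# e.g., via mne.channels.make_dig_montage() and raw.set_montage().

src = mne.setup_source_space(subject, spacing="oct6", subjects_dir=subjects_dir)
model = mne.make_bem_model(subject, ico=4, subjects_dir=subjects_dir)   # 3-layer BEM
bem = mne.make_bem_solution(model)

fwd = mne.make_forward_solution(
    raw.info, trans="sub-01-trans.fif",      # hypothetical MRI<->head coregistration file
    src=src, bem=bem, eeg=True, meg=False,
)
```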