We present a dataset combining electrophysiology and eye tracking intended as a resource for the investigation of information processing in the developing brain. The dataset includes high-density task-based and task-free EEG, eye tracking, and cognitive and behavioral data collected from 126 individuals (ages: 6–44). The task battery spans both the simple/complex and passive/active dimensions to cover a range of approaches prevalent in modern cognitive neuroscience. The active task paradigms facilitate principled deconstruction of core components of task performance in the developing brain, whereas the passive paradigms permit the examination of intrinsic functional network activity during varying amounts of external stimulation. Alongside these neurophysiological data, we include an abbreviated cognitive test battery and questionnaire-based measures of psychiatric functioning. We hope that this dataset will lead to the development of novel assays of neural processes fundamental to information processing, which can be used to index healthy brain development as well as detect pathologic processes.
Machine-accessible metadata file describing the reported data (ISA-tab format)
Background & Summary
It has become increasingly apparent that there are abundant links between cognitive deficits and mental health disorders. However, progress towards a comprehensive understanding of these relationships has remained slow. In part, this is believed to be a reflection of limitations in the current diagnostic classification systems1,2. Epidemiologic, genetic and neuroimaging studies alike have struggled with a lack of specificity in findings (i.e., an inability to map a single phenomenon to single diagnosis), leading many to question the validity of diagnostic boundaries drawn based upon clinical observation rather than biology. Compounding these challenges is the tendency of researchers to focus on a single disorder at a time when collecting data, which limits trans-diagnostic analyses. These realities have encouraged calls for a paradigm shift in clinically-focused cognitive neuroscience research from the restricted study of specific facets of cognition in specific diagnostic groups, towards the ideal of examining all facets in all individuals3. The National Institute of Mental Health has taken a leading role in these efforts by establishing the Research Domain Criteria (RDoC) Project as a framework for forging multidimensional characterizations of mental illness. A central aspect of this framework is the integration of information across multiple levels (e.g., from genetics to self-report), and the recognition of human neuroimaging and neurophysiology measures as potentially providing key dimensions4.
The field of cognitive neuroscience offers a variety of approaches to measuring functionally relevant neural activity, each with its own pros and cons. For example, some research has focused on neural activity measurements during task performance, which can link discrete neural signatures to behavioral outcomes recorded simultaneously. Meanwhile, ‘task-free’ neural recordings taken during resting or passive stimulation conditions (e.g., naturalistic viewing) have increased in popularity because they provide a broader view on neural dynamics that exceed specific, circumscribed task scenarios. Additionally, they remove behavioral requirements that can at times limit their utility in developing and clinical populations. Another dimension along which approaches vary greatly is that of paradigm complexity. This dimension involves a definite trade-off: on one hand, elementary tasks involving reduced stimuli with few attributes and simple action mappings afford greater possibilities to link low-resolution neural activity measures to well-defined computations, partly because the computational building blocks of a simple task are more easily identified. On the other hand, such reduced, simplified tasks correspond to artificial behavioral scenarios, and it is important to measure neural activity during more complex, ecologically valid behavioral scenarios that lie closer both to real-life behavior and to clinical symptomology.
Here, we present a novel battery of EEG-based paradigms that attempts to ‘run the gamut’ in both of these respects, widely spanning both the passive and active, as well as simple and complex paradigm dimensions. The battery includes three active task paradigms, where ‘active’ here refers to a requirement of the subject to actively engage with and choose actions based on the presented stimuli. These tasks allow, to varying degrees, principled deconstruction of core components of task performance (see Table 1).
The simplest paradigm permits the tracing of the three major processing stages for simple contrast decisions5. The second paradigm involves the learning of simple sequences6. The third emulates a standard neuropsychological processing speed task, which involves multiple perceptual decisions, short-term memory, and motor responses. For all of these tasks, simultaneous eye tracking provides a rich complement to EEG-based characterizations of neural processing and cognition.
The battery also includes three passive paradigms, which permit the examination of intrinsic functional network activity during different amounts of external stimulation, namely, no stimulation (classical resting-state); simple and reduced (surround-suppression paradigm); and complex and rich (videos; Table 1). Whereas the simpler stimulation offers insight into elemental facets of information processing such as excitatory/inhibitory balance, the complex video stimuli allow measurement of engagement with naturalistic content7. Alongside these neurophysiological data, abbreviated, standardized tests of intelligence and academic achievement, and self- or parent-reported measures of psychiatric functioning have been included.
In our initial release, we present high-density task-based and task-free EEG, eye tracking, and cognitive and behavioral data for 126 subjects ages 6–44, the majority of whom do not have a history of clinical illness. Our long-term goal is to collect data on this multi-level, multi-modal battery from a diverse community sample, including patient populations. Ultimately, we hope that this dataset will provide a rich new set of metrics for assaying neural processes fundamental to perception and cognition across a continuum from healthy to pathological functioning, and thereby contribute to understanding and better diagnosing a broad range of brain pathologies.
Participants and experiment overview
126 individuals between the ages of 6 and 44 were invited to participate in a study investigating domain-general cognitive processes related to attention, working memory, perception, and decision-making across a range of task/stimulation contexts (Fig. 1). The participants were recruited from both the Child Mind Medical Practice, as well as the wider New York City-area community. 80.2% were typically developing, and 19.8% were diagnosed with one or more clinical disorders (see Table 2 for a summary of diagnostic categories represented in the sample). The participants were 54.8% male, 45.2% female; 45.2% identified as Black or African American, 32.7% as White, 0.04% as Asian, and 17.3% as other race or races. Also included are measures of subject handedness and socioeconomic status (MacArthur Scale of Subjective Social Status, http://www.macses.ucsf.edu/research/socialenviron/sociodemographic.php).
Prior to visiting the laboratory, participants (or their legal guardians, in the case of participants under the age of 18) completed a 10 min. pre-screening interview over the phone with a research assistant to confirm their eligibility and safety to participate in the study. This brief interview obtained information regarding an individual’s psychiatric history, including past or present diagnoses and/or treatment, as well as current medications and any neurological disorders. If a participant demonstrated no contraindications for EEG (e.g., history of seizures or epilepsy), he or she was then scheduled for a research study appointment.
The full battery of EEG and eye tracking tasks and behavioral assessments was five hours in duration; participants were permitted to split their visit into two shorter sessions, lasting 3 h (EEG recording and eye tracking portion) and 2 h (cognitive and behavioral assessment portion) respectively. Multiple breaks within and between sessions were included. For those who elected to participate in the single, full-length session, the EEG and eye tracking tasks always preceded the cognitive and behavioral testing.
The study was approved by the Chesapeake Institutional Review Board. Written informed consent was obtained from all participants or their legal guardians prior to the start of the experiment; additionally, written assent was obtained from participants under the age of 18 and over the age of 6. Consent was also obtained for data sharing through the 1,000 Functional Connectomes Project [http://fcon_1000.projects.nitrc.org/].
All behavioral and cognitive assessments are described in Table 3.
Behavioral self-report measures were acquired via the online Self-Assessment Portal of the Collaborative Informatics and Neuroimaging Suite (COINS).
Cognitive testing was administered by trained research assistants in a sound-shielded room. The participants’ responses were first-scored by the research assistant who administered the test; then, to ensure accuracy, the entire set of responses were again scored by another trained research assistant. Furthermore, all test scores were double-entered into the database by two different research assistants. Both raw scores and standard scores are provided as part of this dataset.
Data acquisition overview
Participants were seated in a sound-attenuated and dark experiment room at a distance of 70 cm from a 17-inch CRT monitor (SONY Trinitron Multiscan G220, display dimensions 330×240 mm, resolution 800×600 pixels, vertical refresh rate of 100 Hz). The data were recorded without shielding of electromagnetic interference. A stable head position was ensured via the chin rest. Subjects were instructed to stay as still as possible during the tasks. Two breaks were included in the EEG session, during which electrode impedance levels were checked and reduced if necessary. Participants were also offered snacks and juice during the breaks, and encouraged to rest.
Stimulus presentation was programmed in MATLAB (6.1, The Math-Works, Natick, MA, 2000), using the PsychToolbox extension8,9. The order of the EEG and eye tracking paradigms was the same for all participants. Instructions for the tasks were presented on the computer screen, and a research assistant answered questions from the participant from the adjacent control room through an intercom. Compliance with the task instructions was confirmed through a live video-feed to the control room. If participants were approximately 12 years of age or younger, they were joined in the experiment room by an additional research assistant who proctored their testing session; otherwise, participants completed the EEG and eye tracking tasks siting alone in the room.
High-density EEG data were recorded at a sampling rate of 500 Hz with a bandpass of 0.1 to 100 Hz, using a 128-channel EEG Geodesic Hydrocel system. The recording reference was at Cz (vertex of the head). For each participant, head circumference was measured and an appropriately sized EEG net was selected. The impedance of each electrode was checked prior to recording, to ensure good contact, and was kept below 40 kOhm. Time to prepare the EEG net was no more than 30 min. Impedance was tested every 30 min of recording and saline added if needed.
Eye tracking acquisition
During all of the EEG paradigms, eye position and pupil dilation were recorded with an infrared video-based eye tracker (iView-X Red-m, SMI GmbH; http://www.smivision.com/en.html) at a sampling rate of 120 Hz, quoted by the manufacturer to have a spatial resolution of 0.1° and a gaze position accuracy of 0.5°. The eye tracker was calibrated with a 5-point grid before each paradigm. Specifically, participants were asked to direct their gaze in turn to a dot presented at each of 5 locations (center and four corners of the display) in a random order. In a validation step, the calibration was repeated until the error between two measurements at any point was less than 2°, or the average error for all points was less than 1°.
Both task-independent (passive) and task-based (active) paradigms were included in the EEG battery as they play complementary roles in the investigation of human brain function. Paradigms were also selected to vary widely in the degree of sensory stimulation involved and/or the depth of processing, from simple to complex. Our task-free paradigms permit examination of intrinsic functional networks for different degrees of external stimulation, e.g., no stimulation (classical resting-state); simple and reduced (surround-suppression paradigm); and complex and rich (videos). In general, such passive paradigms enable measurement of neurophysiological indices of brain function on a relatively equal footing across a wider population, including low-functioning neurological and psychiatric populations for whom task-based assays are a challenge. On the other hand, our task-based paradigms aim to isolate distinct, fundamental information processing steps that play a core role in most neuropsychological and psychometric assessments, and thereby furnish a systems-level, neurophysiologically-based account of the factors underlying observed impairments in accuracy and/or response speed. Taken together, our EEG paradigm battery is intended to provide a window into neurophysiological mechanisms underlying domain-general cognitive functions, which account for a diverse range of behaviors and should thus, in theory, be possible to connect with psychiatric symptoms. With the exception of paradigm #3 (naturalistic video10), none of the datasets have yet been used in any published research articles.
Note that the purpose of the ‘Output Measures’ section in each of the following paradigm descriptions is to propose or guide possible analysis strategies for other researchers, without any intention to restrict the scope for using the present dataset in further creative and distinctive ways.
Paradigm #1 (Passive): Resting-state.
The acquisition of endogenous brain activity without any external stimulation has become very popular in the EEG and functional MRI communities. The low cognitive demand and relatively short duration of resting-state recordings make them well suited for studying pediatric and clinical populations with low tolerance for standard paradigms and acquisitions11. A growing number of studies have shown that many of the brain areas engaged during various cognitive tasks also form coherent large-scale brain networks that can be readily identified in data recorded during rest12,
Numerous studies have demonstrated high intra-individual stability for resting EEG measures15,
Stimuli & Experimental Design
Participants viewed a standard fixation cross in the center of the computer screen. The recorded voice of a female research assistant instructed them to ‘now open your eyes’ (rest with eyes open for 20 s) and ‘now close your eyes’ (rest with eyes closed for 40 s); this procedure was repeated 5 times, alternating between eyes opened and eyes closed. For purposes of analysis, we were mainly interested in the eyes-closed condition, due to the lower frequency of eye blinks. However, we interspersed the brief eyes-open blocks throughout the task in order to ensure that participants remained engaged for the duration of the task session.
‘Fixate on the central cross. Open or close your eyes when you hear the request for it. Press to begin.’
There are various ways to analyze resting-state EEG data. One can examine the data in the frequency domain using classical power spectral analysis, which has been successfully employed to characterize subjects’ age23, state of arousal24, the presence of neurological or psychiatric disorders25,
Paradigm #2 (Passive): Surround suppression
Task Overview: The surround suppression paradigm enables measurement of basic sensory excitation by visual stimuli and the suppressive contextual influence of the visual background, thereby providing insight into relative levels of excitability and inhibition in the human cortex. In this paradigm, periodic, visual, on-off flicker stimulation is used to elicit periodic EEG/MEG responses at the exact frequency of stimulation and its harmonics, known as the steady state visual evoked potential (SSVEP)39,40. Being spectrally restricted to a single frequency, SSVEPs provide a measure of visual neural response amplitude with a higher signal-to-noise ratio than standard transient evoked potential approaches39,40. SSVEP amplitude and phase can be measured to probe sensory sensitivity and latency (timing) information, respectively, and these measures can further be tracked over time to gain insight into dynamic aspects of sensory responses such as adaptation and attention orienting41. SSVEP amplitude and topographic variation across individuals correlate with intelligence42 and depend on age43. They have also been informative in the study of cognitive disorders such as schizophrenia, anxiety, stress, and epilepsy44.
In our surround suppression paradigm, we present ‘foreground’ flicker stimuli at a range of contrasts to probe basic visual excitation, and we also manipulate the contrast of a static surround pattern to probe basic inhibition. Surround suppression is the well-known phenomenon whereby the neural response to a delimited stimulus is suppressed by stimulation in the surrounding area, which has been widely observed in animal neurophysiology (e.g., refs 45,46), and in human psychophysics (e.g., ref. 47), neuroimaging (e.g., ref. 48), and electrophysiology41. In our paradigm we obtain an index of surround suppression by measuring the reduction in ‘foreground’ SSVEP amplitude that results from the presence of the static surround. Surround suppression has become increasingly relevant in clinical research, with clear abnormalities reported in a range of disorders such as depression49, autism50, schizophrenia51,52, and migraine53.
Stimuli & Experimental Design
We used the paradigm developed by Vanegas et al.41, adapted to include a restricted set of conditions that were established to provide the most robust measures. In each sequence of discrete 2.4 s trials, four circular ‘foreground’ stimuli (vertical grating, radius 2°) were flickered on-and-off at 25 Hz, embedded in a static (non-flickering) full-screen ‘surround’ (see Fig. 2). Each trial began with the presentation of the fixation spot for 500 ms, after which the foreground and surround stimuli were simultaneously presented for 2,400 ms. After an inter-trial interval of 500 ms, the following trial was initiated. Foreground and surround patterns were sinusoidal luminance-modulated gratings with a spatial frequency of 1 cycle per degree in all conditions (see Fig. 1a–c in Vanegas et al.41). Across trials, we randomly varied foreground contrast (0%, 30%, 60% or 100%), surround contrast (0% or 100%) and surround orientation (parallel or orthogonal to the foreground, i.e., vertical or horizontal). Eye gaze was monitored continually using the eye tracker. The entire task was recorded in two blocks, each consisting of 64 trials and lasting ~3.6 min. We placed the four flickering ‘foreground’ disks at locations that are well known to evoke scalp potentials that are inverted in polarity for the upper versus lower field, at polar angles of 20° above (upper) and 45° below (lower) the horizontal meridian at an eccentricity of 5° of visual angle54,
‘Just maintain fixation on the central spot at all times. Press to begin. First, we have to measure the position of your eyes. Just follow the circle with your eyes.’
The flickering foreground elicits a steady-state visual evoked potential (SSVEP) in the EEG over the posterior scalp at the fundamental frequency of stimulation, the amplitude of which increases monotonically with foreground contrast57. Surround suppression is measured as a relative reduction in amplitude of the SSVEP due to surround contrast. As mentioned above, SSVEP amplitude and phase can also by tracked over time to examine temporal aspects of gain control as well as latency effects. These measures have the potential to provide a marker of improperly balanced excitation and inhibition in children with developmental disorders, as has been implicated in recent studies of autism50.
Paradigm #3 (Passive): Naturalistic stimuli.
In recent years, there has been a significant expansion in the scope of studies utilizing naturalistic viewing paradigms58,
The goal of the present paradigm was to measure variable engagement based on the strength of higher-level audio-visual responses, and to aid the understanding of the modulation of perception across ages and developmental stages10. Participants viewed 4 short, age-appropriate video clips taken from television and movies. There is evidence that children’s performance on reading, school readiness, and creativity tests improve after viewing educational programs such as Sesame Street63. Thus, the content of educational videos, such as those used in the current study, can interact with children’s school-based knowledge. These advantages of the natural viewing stimuli over a more traditional task with simple stimuli suggest that naturalistic studies of brain activity with real-world stimuli could serve as an important complement to highly controlled EEG paradigms.
Stimuli & Experimental Design
Participants viewed 4 short, age-appropriate video clips taken from television and movies. Each clip was between 2 and 6 min in length, for a total of 12:50 min.
(Prior to this task, parents were given the opportunity to review the full list of clips and exclude any video clips they deemed unsuitable for their children; no parents had any objections to the clips.). The following are a description of clips that we included in the Naturalistic Stimuli Paradigm.
E-How video: How to Improve at Simple Arithmetic: Lessons in Math
Rating: No parental guideline rating
Description: A female instructor introduces addition and multiplication tricks.
Rationale: This clip is included to probe for attention related difficulties.
MIT K-12: ‘Fun with Fractals’:
Rating: No parental guideline rating
Description: This video depicts fractal-based geometry in everyday objects and visually depicts how some fractals are created.
Rationale: This clip is included to probe for attention related difficulties.
Diary of a Wimpy Kid Trailer:
Rating: Rated PG for some rude humor and language
Description: This comedic movie trailer is a hyperbolic depiction of a child’s experience of middle school. It contains several character vignettes.
Rationale: This clip is included to probe for socially related anxiety.
Rating: Rated PG for rude humor and mild action
Description: In this animation, a new adoptive father reads his three children a bedtime story.
Rationale: This clip is included to probe for attachment formation related issues.
‘Now you can watch video clips. Enjoy! First, we have to measure the position of your eyes. Just follow with your eyes the circle. Press to begin.’
Naturalistic audiovisual stimuli have been shown to elicit highly reliable neural activity across multiple viewers58, with the level of such inter-subject correlation (ISC) linked to successful memory encoding61, and effective communication between individuals64. ISC usually is increased during scenes marked by high arousal and negative emotional valence58, and is strongest for familiar and naturalistic events65. Here, the EEG data were analyzed using Correlated Component Analysis (CCA) in order to parse relative inter-subject correlations (ISC). We are mainly interested in the similarity of neural response across subjects for naturalistic stimuli experienced in everyday life. To determine the neural similarity among subjects in response to a stimulus, the inter-subject correlation (ISC) of the EEG signal was calculated. The procedure is described in detail in previous studies7,66.
In brief, the ISC is a measure of correlation among a group of subjects; larger values imply more similarity of the EEG signal across subjects in response to identical stimuli. The advantage of the ISC technique compared to averaging multiple trials is that it can be calculated with a single presentation of a novel stimulus, allowing naturalistic settings with continuous stimulation rather than discrete events67,
Paradigm #4 (Active): Contrast change detection.
Our contrast change detection task is based on a recently presented EEG paradigm innovation that enables the isolation and simultaneous tracing of neural dynamics at the three major processing stages underlying simple sensorimotor decisions: sensory evidence encoding, evidence accumulation over time, and motor preparation5. Here we employed a modified version of that task in order to probe fluctuations in attentional engagement in addition to these three sensory-motor processing levels. This task combines continuous visual stimulation, EEG and eye tracking in a broadly similar way to an increasing number of studies focused on other cognitive functions, e.g. attention shifting71,72.
Simple sensory-motor decision making—i.e., choosing a course of action based on a sensory judgment—can be regarded as a core component of a large portion of human behavior, and of almost any neuropsychological test administered in clinical settings. Such decisions require the momentary encoding of sensory information necessary for the decision (evidence), the sequential integration of that evidence into a ‘decision variable,’ and the concomitant preparation of an appropriate action. Whereas typical EEG tasks involve sudden-onset, discrete stimuli that evoke a complex set of overlapping components on the scalp, only a small proportion of which relate to the relevant computations underlying task performance, our contrast change detection paradigm uses gradual-change targets, thereby eliminating transient, task-irrelevant sensory-evoked signals and thus fully unmasks the neural processes of decision formation. By asking subjects to indicate detection of a change in contrast of a continuously presented, flickering visual stimulus, an independent and continuous neurophysiological measure of the momentary sensory input to the decision process can also be extracted. In tandem, motor preparatory activity such as contralateral pre-motor movement–selective beta-band (16–30 Hz) activity can be traced73,74. Thus, discrete, freely evolving neural signatures of sensory evidence encoding, decision formation and motor preparation, can be isolated using this paradigm.
In the present task battery, we employ a two-alternative version of the contrast change detection paradigm, whereby, instead of detecting a change to a single stimulus component with a single response, subjects must monitor the relative contrast of two simultaneous stimuli for gradual changes and select one of two responses to indicate the direction of the change. The reasoning behind this is that fluctuations in the sensory evidence (the difference in response to the two stimuli to be compared) can be dissociated to some degree from fluctuations in general arousal or levels of sustained attention (non-selective changes common to both responses). Such fluctuations are of considerable interest in their own right, both in clinical and basic neuroscience75,
Stimuli & Experimental Design
The contrast change detection paradigm is designed to enable isolation of the neural signatures of sensory evidence encoding, accumulation, and motor preparation without the need for complex signal processing beyond elementary epoch averaging and spectral estimation5. In the present task, subjects continuously viewed an annular pattern (inner radius: 1°; outer radius 6°) composed of two overlaid gratings tilted 45° to the left and 45° to the right of vertical, which continuously phase-reversed at distinct rates of 20 and 25 Hz, respectively. At baseline (in between targets), both gratings had an equal contrast of 50%. Participants were asked to maintain fixation on a point in the center of this stimulus, and to detect contrast-change targets, where one grating gradually increased to 100% and the other simultaneously decreased to 0%. They were asked to make a left-hand button click for targets in which the left-tilted grating increased in contrast, and to make a right-hand click for right-tilted increases. Twelve of each of these two target types were presented in each 3.1-minute block, in random order. The changes in contrast from 50 to 100% occurred linearly over 1,600 ms, with an immediate 800 ms linear return to 50%. Beginning immediately at the end of each target, the 50% contrast baseline stimulus was presented for an inter-target interval of 2.8, 4.4 or 6 s. Also, immediately following target end, feedback was presented in the form of a smiley (correct click) or sad face (incorrect click or no click) for the first 400 ms of the inter-target interval. If a subject missed three consecutive targets, a short voice recording was played, saying, ‘You just missed three targets in a row. Please focus again.’ In the current dataset, each subject completed 3 blocks of this task.
‘Fixate on the central dot. Press the LEFT button with LEFT hand when the LEFT-tilted pattern gets stronger. Press the RIGHT button with RIGHT hand when the RIGHT-tilted pattern gets stronger. Work as quickly as you can without making mistakes. Press the mouse button to begin.’
By design, the principal components of activity on this task are the SSVEP over occipital scalp sites, the event-related potential over centro-parietal scalp sites, and decreases in Mu (8–13 Hz) and Beta (16–30 Hz) spectral amplitude over left/right motor cortical areas (C3/C4), which reflect sensory evidence encoding, evidence accumulation and motor preparation, respectively5. Each of these signals has been shown to bear a systematic relationship with the timing and accuracy of the participant’s detection responses. Since this task version involves two-alternative decisions mapped to the left and right hands, the relative preparation for the two alternative actions can also be tracked via the lateralized readiness potential derived by subtracting ERP traces from motor cortical sites of the two hemispheres5,79. In addition to these measures, posterior parietal alpha-band activity can be analyzed to provide measures of vigilant attentional state. In principle, because the monitoring task is performed continuously and stimulation is continuous, neural activity measures are potentially informative on cognitive/perceptual states and processes at any point during the block of task performance.
Paradigm #5 (Active): Sequence learning
In order to evaluate the neural correlates of declarative learning, we included an explicit visual sequence learning paradigm, in which subjects repeatedly view a fixed sequence of flashed visual locations and attempt to memorize it in order to make regular intermediate recall reports. This task was originally developed by Moisello, Ghilardi and colleagues as a control condition for the examination of spectral EEG signatures of visuo-motor learning80, and was recently shown to be highly informative in its own right, in providing reliable indices of memory formation and surprise-modulated stimulus processing that related systematically to the ongoing progress of learning6. An important aspect of the paradigm is that the information to be remembered (flashed location) is of the most elementary kind and computed very rapidly in the brain, so that perceptual decisions regarding the immediately presented item are completed quickly, allowing the longer-lasting neural signatures of memory formation to be reliably distinguished from the short-lived processes of immediate stimulus identification.
During the task, participants were asked to observe and memorize a single sequence of elements over repeated observations. This provides the possibility to track the progress of gradual memory formation through regular behavioral recall, as an individual element goes from being completely unknown to fully committed to memory. Rather than making comparisons among different complex items as is commonly done in the field81,82, which may differ in sensory characteristics and/or semantic content, this paradigm enables comparisons across successive learning states for each of a set of uniform, highly reduced, and semantically unloaded stimuli. This enables neural and behavioral tracking of the gradual learning progress in a way that cannot be done using typically employed paradigms with dichotomous subsequent recall outcomes (remembered versus forgotten)83,
Stimuli & Experimental Design
In the current task battery we employed an adapted version of the task of Steinemann et al.6 Participants were asked to view a sequence of 10 flashed-circle stimuli, which appeared among 8 possible, marked locations on the screen. The same sequence was presented a total of 5 times; after viewing each presentation, the participant attempted to reproduce the sequence to the best of their ability by sequentially clicking the different locations using a computer mouse. In pilot testing, we observed a floor effect on this 10-item sequence version in children younger than 9 years old; therefore, in the present study, participants 8 years and below were shown a shorter sequence of 8 items displayed among 6 possible locations. There was no restriction on the time provided to report the recalled sequence, and no feedback was provided throughout the task. Visual stimuli consisted of filled white circles with a diameter of 1 cm presented at eight different equidistant spatial locations on a radius of 5 cm eccentricity, and were continuously marked by static circular outlines (see Fig. 1 in Steinemann et al.6). Stimuli were presented (and gradually faded out) for 200 ms, with an inter-stimulus interval of 1,300 ms. Throughout the task, subjects were asked to hold eye fixation on a central fixation point (yellow dot). Before the main task recording, a training block was administered, consisting of 5 stimuli on the same 8 locations, in order to familiarize the subjects with the tasks and to confirm their comprehension of them. Feedback was provided for the training task only. The duration of this paradigm varied between 8–15 min, depending on the speed of recall reports.
‘Fixate on the yellow dot. Try to remember the sequence of the flashing dots. The SAME sequence will be repeated 5 times. After each round you have to give a response. If you do not know all the locations guess the others. Press the mouse button to begin.’
In the approach of Steinemann et al.6, trials were categorized as ‘still-unknown’, ‘newly-learned’ or ‘known’ based on the participants’ recall reports, and the average ERPs for these learning states were directly compared to examine processes of immediate stimulus identification and their modulation by ‘surprise,’ which reduced over the course of learning, and processes of memory formation which were especially strong at the point where a given item was newly learned. For the purposes of the current paper, we analyzed behavioral recall performance as well as these neural correlates over the successive blocks of sequence observation, which provides a simpler, but related, view on the progress of learning over the task. The process of immediate stimulus identification is reflected in a ‘P300’ component measured over centro-parietal sites. The P300 is a centro-parietal positivity occurring roughly 300 ms or later after stimulus onset, which famously indexes the level of ‘surprise’, i.e., the degree to which a stimulus was unexpected86,87. Recently it has been established that the P300 corresponds to the centro-parietal positivity (CPP), which reflects the accumulation of evidence for a decision, and it has been suggested that its sensitivity to surprise may arise from the setting of higher accumulation thresholds for unexpected stimuli5,88. In the sequence learning paradigm, as learning progresses, the location of the stimuli becomes increasingly less surprising, and therefore P300 amplitude decreases systematically. In fact, the degree of P300 reduction from the first to second block of sequence observation was found to correlate significantly with behavioral measures of the speed of learning6, highlighting the potential value of such measures.
Paradigm #6 (Active): Symbol search.
As our final, ‘active and complex’ paradigm, we chose to emulate a standard neuropsychological test in widespread, routine clinical use for assessing ‘processing speed’ in children. We chose the particular construct of processing speed because it is a good example among a wide range of clinical metrics that are almost universally employed yet imprecisely defined, with many conceivable computational explanations that can account for variation in the lumped, unitary score that is ultimately recorded on completion of the test. The ‘processing speed’ construct has been defined as the ability to focus attention, quickly scan, and discriminate between (visual) information, and is known to be sensitive to factors such as motivation, difficulty working under time pressure, and motor coordination89. Previous studies have associated processing speed with age, reading performance, and psychiatric and neurological disorders90,
The specific paradigm used here was a computerized version of the Symbol Search subtest of the Wechsler Intelligence Scale for Children IV (WISC-IV), which together with the subtests Coding and Cancellation makes up the Process Speed Index (PSI)89,94,95. The Symbol Search subtest is designed to assess the speed and accuracy with which a child can process nonverbal information. High scores require rapid and accurate processing of visual symbols that have no a priori meaning, which hinges on processing efficiency at several levels including motor, cognitive, and decisional and memory processes (e.g., Royer et al.96,97. For example, a participant needs to (a) detect and encode the target symbols; (b) hold this information in short-term and/or working memory; (c) process each of the symbols in the search set, whether in turn or in parallel to some degree; (d) identify the symbol among the search set that matches one of the target symbols, or conclude that there is no match; (e) select and initiate the appropriate response. This paradigm further enables the study of different strategies or performance styles that might cause a decreased performance, such as excessive carefulness (i.e., double-checking, or ‘making sure’).
It is not entirely clear which components of symbol search task performance are affected by decreases in processing speed, as the standard application of the task provides only one overall behavioral score (number relatively correct); little or no information on the underlying etiology of low performance is offered. Our on-line simultaneous acquisition of eye tracking and EEG data during this test thus stands to provide substantial further insights. We believe this integrated EEG/eye tracking approach will allow us to decompose the processing speed task into interpretable components of cognitive and perceptual processing, such as working memory, distractibility, uncertainty, and sustained attention.
Stimuli & Experimental Design
The visual geometric stimuli consisted of black symbols with a size of 1 cm width and 1 cm height (Fig. 3a). As on each page of the paper version, 15 trials were presented at a time on the screen. Each row contained two target symbols and five search symbols, arranged horizontally across the row. Participants were instructed to indicate for each row, by mouse-click (mark either the yes or no checkbox), whether either of the target symbols matched with any of the five search symbols. The participants had the option to correct their initial responses if they desired. Participants were instructed to solve as many rows, or trials, as possible within two minutes. Before beginning the actual paradigm, participants performed a training block with 4 trials, for which they received feedback, to ensure their comprehension of the task. No feedback was provided throughout the actual task.
Once a participant finished all 15 trials, they pressed the ‘next page’ button to advance onward. There were 4 pages (a maximum of 60 trials) in total. No participant ever reached the end of the 60 trials.
‘The task is to figure out if either one of the two first symbols are presented again in the same line. Press with the left mouse button YES and NO boxes to select your answer. If you accidently press the wrong button you can make a correction by simply clicking on the other response. You have 2 min to solve as many trials as possible.’
In contrast to the traditional pen and paper administration of the symbol search task, our computerized, multimodal approach allows for the generation of a range of measures rather than a single summary score. These included, but were not limited to: time spent looking at each symbol, the number of saccade steps, number of repetitions, pupil size, and the protracted gaze dwell times for each sub-region of the screen. These measures supply additional information on participants’ strategies for completing the task, and on why they might do well or poorly. This eye tracking data can further be complemented with topographic spatial and power analyses of the concurrently acquired EEG data.
EEG and eye tracking preprocessing steps
EEG data extraction
The data shared in this project are available as raw data, but also preprocessed. The MATLAB code for the preprocessing can be found at https://github.com/amirrezaw/automagic. The easiest and recommended method is to simply install the application ‘Automagic’, which includes all the required libraries and paths. If preprocessing is intended to run independently from the ‘gui’, the user should download functions from eeglab and Augmented Lagrange Multiplier (ALM) method (https://github.com/amirrezaw/automagic#4-how-to-run-the-application-from-the-code for details on how to install and use it). The data from each paradigm is saved as a separate file. In the first step of preprocessing, EEG data were imported in MATLAB (pop_readegi.m) and the triggers and latencies for each paradigm were extracted. The electrodes in the outermost circumferences (chin and neck) were excluded to a standard 111-channel electrode array98.
Electrode quality check
Bad electrodes were identified and replaced. Identification of bad electrodes was based on probability, kurtosis, and frequency spectrum distribution of all electrodes. A channel was defined as a bad electrode when recorded data from that electrode had a variance more than 3 standard deviations away from the mean across all other electrodes. This was realized with the eeglab MATLAB function: ‘pop_rejchan.m’. Subsequently bad electrodes were interpolated by using a using spherical spline interpolation98,99 ‘eeg_interp.m’. Moreover, after automatic scanning, noisy channels were selected by visual inspection and interpolated or replaced entirely by zeros (for the calculation of the ISC measures to eliminate the channel’s contribution in subsequent calculation of covariance matrices).
Artifact signal correction
One hundred and nine EEG channels were used for scalp recordings, while 9 EOG channels were used for artifact removal. The rest of the channels lying mainly on the neck and face were discarded before data analysis. The EEG data were high-pass filtered at 0.1 and notch filtered (59–61 Hz) with a Hamming windowed-sinc finite impulse response zero-phase filter (EEGLAB function pop_eegfiltnew.m). The filter order was defined to be 25% of the lower passband edge. Eye artifacts were removed by linearly regressing the EOG channels from the scalp EEG channels. The EOG electrodes were placed on the participant’s forehead, outer and inner canthi (#'s 8, 14, 17, 21, 25, 125, 126, 127, and 128 from the HydroCel Geodesic Sensor Net).
Next, a robust Principal Components Analysis (PCA) algorithm, the inexact Augmented Lagrange Multipliers Method (ALM100), removed sparse noise from the data. Briefly, the ALM recovers a low-rank matrix, A, efficiently and accurately from a corrupted data matrix D=A+E, where some entries of the additive errors E may be arbitrarily large. Finally, the entire dataset for each subject was visually inspected in order to discard whole block and/or paradigm recordings that remained noisy after the automatic and manual noise removal methods.
Eye tracking data extraction
Saccades and fixations were detected using a dispersion-based and a fixed-length moving interval algorithm provided by SMI101. The SMI detection algorithm is described in detail in Salvucci and Goldberg102. Briefly, a blink can be regarded as a special case of a fixation, where the pupil diameter is either zero or outside a dynamically computed valid pupil, or the horizontal and vertical gaze positions are zero. The algorithm identifies fixations as groups of consecutive points within a particular dispersion. It uses a moving window that spans consecutive data points checking for potential fixations. The moving window begins at the start of the protocol and initially spans a minimum number of points, determined by the given Minimum Fixation Duration (here: 50 ms) and sampling frequency. The algorithm then checks the dispersion of the points in the window by summing the differences between the points' maximum and minimum x and y values and comparing that to the Maximum Dispersion Value; so if [max(x)−min(x)]+[max(y)−min(y)]>Maximum Dispersion Value, the window does not represent a fixation, and the window moves one point to the right. If the dispersion is below the Maximum Dispersion Value (here: 50 pixels, physical display dimension: 330×240 mm), the window represents a fixation. In this case, the window is expanded to the right until the window's dispersion is above threshold. The final window is registered as a fixation at the centroid of the window points with the given onset time and duration. Following this process, a saccade event is created between the newly and the previously created blink or fixation. Although these detected fixation and saccade times as estimated by the SMI algorithm are provided in the database for convenience, we would encourage users to make use of the raw data provided since one can directly apply detection algorithms best suited to the analysis at hand.
The codes for the EEG preprocessing can be found here: https://github.com/amirrezaw/automagic. Code for the ISC analysis is available here: http://parralab.org/isc. All the analyses were performed with MATLAB 2014a (MathWorks, Natick, MA, USA) and EEGlab 13.3.2b.
All data are de-identified and participants gave permission for their data to be openly shared as part of the informed consent process.
Distribution for use
Raw and preprocessed EEG data (Data Citation 1: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.eeg), as well as eye tracking data (Data Citation 1: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.eeg) can be accessed through the 1,000 Functional Connectomes Project and its International Neuroimaging Data-sharing Initiative (FCP/INDI) based at www.nitrc.org (http://fcon_1000.projects.nitrc.org/indi/cmi_eeg/). EEG data are available openly, along with basic phenotypic data (age, sex, handedness, completion status of EEG paradigms, and known diagnosis status) and performance measures for the EEG paradigms (Data Citation 2: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.phenofile). Public data are distributed under the Creative Commons, Attribution Non-Commercial Share Alike License (https://creativecommons.org/licenses/by-nc-sa/3.0/).
The more extensive phenotypic data (Data Citation 3: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.phenopage) (e.g., behavioral questionnaires, abbreviated intelligence and achievement testing; see Table 3) may be accessed through the Collaborative Informatics and Neuroimaging Suite (COINS) Data Exchange (http://coins.mrn.org/dx). These data are protected by a Data Usage Agreement (DUA), which investigators must complete and have signed by an authorized institutional official before receiving access (the DUA can be found at: http://fcon_1000.projects.nitrc.org/indi/cmi_eeg/phenotypic.html). The DUA is based upon that of the NKI-Rockland Sample, which does not attempt to restrict or curate the focus of analyses, but does require users to agree not to attempt re-identification of participants under any circumstances.
EEG and eye tracking data organization
The data (Data Citation 1: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.eeg) are stored in folders by participant. Each participant’s folder contains one EEG folder, one eye tracking folder and one behavioral folder. In the EEG folder the EEG data are available as of the simple binary file (<ID number>.raw) and ascii file (<ID number>.csv) for each paradigm. There are also two eye tracker files for each paradigm: one file is segmented into blinks, saccades and fixations (<ID number>.txt). The other file is an unsegmented file (<ID number>.txt), with the eye tracking information for each sample. Furthermore, a behavioral folder contains a MATLAB file (<ID number>.mat) for each paradigm, which includes the information about the paradigm itself, including: inter-trial interval, triggers, number of trials, response selection and reaction time (if available). For users who intend to use the data without MATLAB, this information is also available as.csv files. Each subject’s folder requires on average 2.5 GB storage space. The channel location file of the EEG montage is also provided (Data Citation 4: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.channelloc). A ‘MIPDB_EEG_Readme’ folder contains ‘Readme’ files for each paradigm about the variables and paradigm parameters (Data Citation 5: 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.readme).
Following standardized EEG preprocessing (described in the Methods section), the data were filtered between 1.5 and 30 Hz and segmented into eyes-closed and eyes-open segments. Only the eyes-closed segments were further analyzed for display here. The artifact-free EEG was recomputed against the average reference and segmented into 2-second epochs. In a second step, a discrete Fourier transformation algorithm was applied to the 2-second epochs. In resting-state EEG, the spectral amplitude of the signals is typically assumed to be of interest; therefore the power spectrum of 1.5–30 Hz (resolution: 0.5 Hz) was calculated. The spectra for each channel were averaged over all epochs for each subject. Next, the group mean spectral amplitude was computed and displayed as an average over all electrodes (Fig. 4a) and for each electrode individually (Fig. 4b). Finally, the group mean relative power spectra data were integrated for the following 7 frequency bands classification proposed by Kubicki et al.103: delta (1.5–3.5 Hz); theta (4–8 Hz); alpha 1 (8.5–10 Hz); alpha 2 (10.5–12 Hz); beta 1 (12.5–18 Hz); beta 2 (18.5–21 Hz); beta 3 (21.5–30 Hz) as well as the theta/(beta1+2) ratio, which is often used in ADHD research (Fig. 4c). As can be seen from the figure, the expected distribution of spectral amplitudes in resting-state EEG data was obtained104,105. These spectral measures have been sensitive and successful for describing, for instance, age-related EEG changes or various clinical conditions of developmental disorders106.
Surround suppression paradigm
Following standard preprocessing, the EEG data were segmented based on the flickering stimulus onset. The segmentation was conducted for each of the 4 foreground contrasts (0 30, 60 and 100%) and three different background conditions (parallel, orthogonal, no background) individually. For each participant, the data were merged across the two blocks. For the purpose of technical validation, we computed 1) the SSVEP signal without background, and 2) the SSVEP across all conditions with a background. In a next step, the FFT was computed to obtain a measure of SSVEP power at the 25 Hz flickering frequency. In Fig. 2, we subtracted the SSVEP power for 25 Hz from its neighboring frequency bin to extract the actual evoked activity by the stimulus presentation. An in-house algorithm detected the electrode with the highest SSVEP amplitude and computed the SSVEP signal based on the average of max electrode and its four surrounding electrodes. Moreover, we displayed the SSVEP amplitude for each foreground contrast without a background (black line) and with a background (red line) (Fig. 2). As expected, the data demonstrate an increase of SSVEP amplitude with an increase of foreground contrast. We also demonstrated the surround suppression effect, which was measured as a relative reduction in amplitude of the SSVEP due to surround contrast. This is in line with a recent study, which originally developed and used this paradigm in healthy adults41.
Naturalistic stimuli paradigm
Consistent with previously established methodologies, within-subject covariance matrices were computed across subjects and videos after standard EEG preprocessing. Thus, we obtained a set of component projections, which can be used as an ISC measure. We selected the three strongest correlated components and computed the corresponding scalp topographies separately for each component (Fig. 5). It has been shown that the sum of the first three components explains sufficient variance of the data. The distribution of the ISC measure is in line with previous studies showing congruent distributions7,66.
Contrast change detection paradigm
After standard EEG preprocessing, for each participant we merged the data from the three blocks. Target epochs were extracted from 500 ms before target onset to 1,000 ms after peak sensory evidence. Moreover, response-locked epochs were extracted from 1,000 ms before a response, to 300 ms after response button press. Trials were rejected if any scalp channel exceeded 100 μV. SSVEP (20 or 25 Hz depending on left or right target) based on stimulus-locked epochs and motor response signal (12.5–18 Hz) based on response-locked epochs were measured using the standard short-time Fourier transformation. The CPP analysis consisted simply of averaging the single-trial waveforms, which were baseline-corrected relative to the 500 ms interval before response onset. Figure 6 displays the sensory evidence encoding (SSVEP), the evidence accumulation (CPP), and motor preparation topographies. As expected, we found a posterior maximum for the SSVEP around the electrode Oz. The CPP signal shows the highest activity near the CPz electrode and the preparatory motor response signal (reduction in spectral amplitude relative to baseline) peaks over the electrodes C3 and C4.
Sequence learning paradigm
Following standard preprocessing, the EEG data were segmented based on stimulus onsets, which were each defined by the appearance of a filled white circle in one of the eight different spatial locations. Each epoch was 900 ms long (100 ms pre-stimulus to 800 post-stimulus presentation). For each of the five blocks, all artifact-free segments were extracted and subsequently baseline-corrected. For the technical validation, we averaged all trials within each block for each subject individually, and then calculated a group average. Although this averages across ‘still-unknown’, ‘newly-learned’, and ‘known’ conditions within each block, the relative dominance of trial numbers in each category systematically varied over the course of the blocks as the sequence became better memorized. In Fig. 7a, the ERP for the electrode CPz is depicted. We found a decrease in the P300 over the different blocks. In a next step, the behavioral ‘performance’ was calculated as percentage correctly learned spatial locations, and the ‘learning rate’ was defined as the newly learned locations in relation to all possible locations (Fig. 7b). The corresponding P300 amplitude was calculated as the amplitude on electrode CPz with a latency of 350–500 ms post-stimulus presentation. We chose a slightly delayed time window, because recent studies have shown that the P300 in children exhibits a greater latency compared to adults107. Figure 7b reveals that while the performance is increasing with practice, the learning rate and the P300 are decreasing. These findings are in line with the assumption that the subjects learn to expect the order of the appearance of the targets over the course of the 5 blocks, which is represented by the performance and learning rate measure. In other words, over the course of the five blocks, the P300 amplitude decreases because the subjects are not as surprised by the appearance of the specific location of the stimulus presentation. Equivalent results have been reported in the original version of this paradigm from which the current one was adapted6.
Symbol search paradigm
For this paradigm, we mainly focused on the eye tracking data, but the newly established inventory of objective eye tracking measures can be complemented with topographic spatial and power analyses of the concurrently acquired EEG data. As described in the methods section, the goal of each trial is to determine whether either of two target symbols appears among a set of five search symbols. The presented graphic of the symbol search task was segmented into three subregions of interest: targets, search group, and response buttons (Fig. 3a). From the eye tracking data, we calculated the number of saccade steps, number of repetitions, pupil size, and protracted gaze dwell times (fixation duration) for each subregion. Figure 3a displays all the fixations for a representative subject. The darkness of the color and the size of the circle indicate the duration of the fixations. Blue color indicates fixations outside of the current trial. Figure 3b represents the distribution of saccade amplitude, peak velocity and the angular histogram. In the second row of Fig. 3b, the distribution of the durations of the fixations, the heat map and the allocation of the fixations are displayed. As expected, the data demonstrate that the eye tracker is able to track oculomotor activity while the subject is performing the task. This enables us to decompose the processing speed task into interpretable components of cognitive and perceptual processing, such as working memory, distractibility, uncertainty, and sustained attention.
How to cite this article: Langer, N. et al. A resource for assessing information processing in the developing brain using EEG and eye tracking. Sci. Data 4:170040 doi: 10.1038/sdata.2017.40 (2017).
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Langer, N. 1000 Functional Connectomes Project International Neuroimaging Data-Sharing Initiative (FCP/INDI) http://dx.doi.org/10.15387/FCP_INDI.MIPDB.readme (2017)
The data resource and work presented here were supported by gifts to the Child Mind Institute from Phyllis Green, Randolph Cowen, and Joseph Healey. N.L. was supported through the CMI Endeavor Scientists Program. M.P.M. is a Randolph Cowen and Phyllis Green Scholar. We would like to thank the participants and their families for taking the time to be in the study, as well as their willingness to have their data shared with the scientific community. We would also like to thank the members of the Mind Research Network, particularly Margaret King and Vince Calhoun, for their support in using the COINS neuroinformatics platform.