An extension of the studyforrest dataset for vision research

The studyforrest (http://studyforrest.org) dataset is likely the largest neuroimag-ing dataset on natural language and story processing publicly available today. In this article, along with a companion publication, we present an update of this dataset that extends its scope to vision and multi-sensory research. 15 participants of the original cohort volunteered for a series of additional studies: a clinical examination of visual function, a standard retinotopic mapping procedure, and a localization of higher visual areas — such as the fusiform face area. The combination of this update, the previous data releases for the dataset, and the companion publication, which includes neuroimaging and eye tracking data from natural stimulation with a motion picture, form an extremely versatile and comprehensive resource for brain imaging research — with almost six hours of functional neuroimaging data across five different stimulation paradigms for each participant. Furthermore, we describe employed paradigms and present results that document the quality of the data for the purpose of characterising major properties of participants’ visual processing stream.


Background & Summary
The studyforrest dataset 1 , with its combination of functional magnetic resonance imaging (fMRI) data from prolonged natural auditory stimulation and a diverse set of structural brain scans, represents a versatile resource for brain imaging research with a focus on information processing under real-life like conditions. The dataset has, so far, been used to study the role of the insula in dynamic emotional experiences 2 , modeling of shared blood oxygenation level dependent (BOLD) response patterns across brains 3 , and to decode input audio power-spectrum profiles from fMRI 4 . The dataset has subsequently been extended twice, first with additional fMRI data from stimulation with music from various genres 5 and secondly with a description of the movie stimulus structure with respect to portrayed emotions 6 . However, despite providing three hours of functional imaging data per participant, experimental paradigms exclusively involved auditory stimulation, thereby representing a substantial limitation regarding the aim to aid the study of real-life cognition -which normally involves multi-sensory input. With this further extension of the dataset presented here and in a companion publication 7 , we are now substantially expanding the scope of research topics that can be addressed with this resource into the domain of vision and multi-sensory research.
This extension is twofold. While the companion publication 7 describes an audio-visual movie dataset with simultaneously acquired fMRI, cardiac/respiratory traces, and eye gaze trajectories, the present article focuses on data records and exams related to a basic characterization of the functional architecture of the visual processing stream of all participantsnamely retinotopic organization and the localization of particular higher-level visual areas. The intended purpose of these data is to perform brain area segmentation or annotation using common paradigms and procedures in order to to study the functional properties of areas derived from these standard definitions in situations of real-life like complexity. Moreover, knowledge about the specific spatial organization of visual areas in individual brains aids studies of the functional coupling between areas, and it also facilitates the formulation and evaluation of network models of visual information processing in the context of the studyforrest dataset.
The contributions of this study comprise three components: 1) results of a clinical eye examination for subjective measurements of visual function for all participants to document potential impairments of the visual system that may impact brain function, even beyond the particular properties relevant to the employed experimental paradigms; 2) raw data data for a standard retinotopic mapping paradigm and a six-category block-design localizer paradigm for higher visual areas, such as the fusiform face area (FFA) 8 , the parahippocampal place area (PPA) 9 , the occipital face area 10 , the extrastriate body area (EBA) 11 , and the lateral occipital complex (LOC) 12 ; and 3) validation analyses providing volumetric angle maps for retinotopy data and ROI masks for visual areas. While the first two components are factual, the third component is based on a largely arbitrary selection of analysis tools and procedures. No claim is made that the chosen methods are superior to any alternative, but the results are shared to document the plausibility of the results and to facilitate follow-up studies that do not require any particular method to analyze and interpret these data.

Methods Participants
Fifteen right-handed participants (mean age 29.4 years, range 21-39, 6 females) volunteered for this study. All of them had participated in both previous studies of the studyforrest project 1,5 . The native language of all participants was German. The integrity of their visual function was assessed at the Visual Processing Laboratory, Ophthalmic Department, Otto-von-Guericke University, Magdeburg, Germany as specified below. Participants were fully instructed about the purpose of the study and received monetary compensation. They signed an informed consent for public sharing of all obtained data in anonymized form. This study was approved by the Ethics Committee of the Otto-von-Guericke University (approval reference 37/13).

Subjective measurements of visual function
To test whether the study participants had normal visual function and to detect critical reductions of visual function, two important measures were determined: (1) visual acuity to identify dysfunction of high resolution vision and (2) visual field sensitivity to localize visual field defects. For each participant, these measurements were performed for each eye separately -if necessary with refractive correction. (1) Normal decimal visual acuity (>=1.0) was obtained for each eye of each participant. (2) Visual field sensitivities were determined with static threshold perimetry (standard static white-on-white perimetry, program: dG2, dynamic strategy; OCTOPUS Perimeter 101, Haag-Streit, Koeniz, Switzerland) at 59 visual field locations in the central visual field (30°radius) i.e., covering the part of the visual field that was stimulated during the MRI scans. In all, except for two participants, visual field sensitivities were normal for each eye (MD (mean defect) dB<2.0 & >-2.0; LV (loss variance) dB 2 < 6) -indicating the absence of visual field defects. Visual field sensitivities for participant 2 (right eye) and participant 4 (both eyes) were slightly lower than normal but not indicative of a distinct visual field defect.

Functional MRI acquisition setup
For all of the fMRI acquisitions described in the paper, the following parameters were used: T2 * -weighted echo-planar images (gradient-echo, 2 s repetition time (TR), 30 ms echo time, 90°flip angle, 1943 Hz/px bandwidth, parallel acquisition with sensitivity encoding (SENSE) reduction factor 2) were acquired during stimulation using a whole-body 3 Tesla Philips Achieva dStream MRI scanner equipped with a 32 channel head coil. 35 axial slices (thickness 3.0 mm) with 80×80 voxels (3.0×3.0 mm) of in-plane resolution, 240 mm field-of-view (FoV), anterior-to-posterior phase encoding direction) with a 10% inter-slice gap were recorded in ascending order -practically covering the whole brain. Philips' "SmartExam" was used to automatically position slices in AC-PC orientation such that the topmost slice was located at the superior edge of the brain. This automatic slice positioning procedure was identical to the one used for scans reported in the companion article 13 and yielded a congruent geometry across all paradigms.

Physiological recordings
Pulse oximetry and recording of the respiratory trace were performed simultaneously with all fMRI data acquisitions using the built-in equipment of the MR scanner. Although the measurement setup yielded time series with an apparent sampling rate of 500 Hz, the effective sampling rate was limited to 100 Hz.

Stimulation setup
Visual stimuli were presented on a rear-projection screen inside the bore of the magnet using an LCD projector (JVC DLA RS66E, JVC Ltd., light transmission reduced to 13.7% with a gray filter) connected to the stimulus computer via a DVI extender system (Gefen EXT-DVI-142DLN with EXT-DVI-FM1000). The screen dimensions were 26.5 cm×21.2 cm at a resolution of 1280×1024 px with a 60 Hz video refresh rate. The binocular stimulation were presented to the participants through a front-reflective mirror mounted on top of the head coil at a viewing distance of 63 cm. Stimulation was implemented with PsychoPy v1.79 (with an early version of the MovieStim2 component later to be publicly released with PsychoPy v1.81) 14 on the (Neuro)Debian operating system 15 . Participant responses were collected by a two-button keypad and was also logged on the stimulus computer.

Retinotopic Mapping
Stimulus Similar to previous studies 16,17 , traveling wave stimuli were designed to encode visual field representations in the brain using temporal activation patterns 18 . Expanding/contracting rings and clockwise/counter-clockwise wedges (see Figure 3A) consisting of flickering radial checkerboards (flickering frequency of 5 Hz) were displayed on a gray background (mean luminance ≈100 cd/m 2 ) to map eccentricity and polar angle. The total run time for both eccentricity and polar angle stimuli was 180 s, comprising five seamless stimulus cycles of 32 s duration each along with 4 s and 12 s of task-only periods (no checkerboard stimuli) respectively at the start and the end.
The flickering checkerboard stimuli had adjacent patches of pseudo-randomly chosen colors, with pairwise euclidean distances in the Lab color space (quantifying relative perceptual differences between any two colors) of at least 40. Each of these colored patches were plaided with a set of radially moving points. To improve the perceived contrast, the points were either black or white depending on the color of the patch on which the points were located. The lifetime of these points was set to 0.4 s, a new point at a random location was initialised after that. With every flicker, the color of the patches changed to its complementary luminance. Simultaneously, the color changed and the direction of movement of the plaided points also reversed.
Eccentricity encoding was implemented by a concentric flickering ring expanding and contracting across the visual field (0.95°of visual angle in width). The ring was not scaled with cortical magnification factor. The concentric ring traveled across the visual field in 16 equal steps, stimulating every location in the visual field for 2 s. After each cycle, the expanding or the contracting rings were replaced by new rings at the center or the periphery respectively.
Polar angle encoding was implemented by a single moving wedge (clockwise and counterclockwise direction). The opening angle of the wedge was 22.5 degrees. Similar to the eccentricity stimuli, every location in the visual field was stimulated for 2 seconds before the wedge was moved to the next position.
Center letter reading task In order to keep the participants' attention focused and to minimize eye-movements, they performed a reading task. A black circle (radius 0.4°) was presented as a fixation point at the center of the screen, superimposed on the main stimulus. Within this circle, a randomly selected excerpt of song lyrics was shown as a stream of single letters (0.5°height, letter frequency 1.5 Hz, 85% duty cycle) throughout the entire length of a run. Participants had to fixate, as they were unable to perform the reading task otherwise. After each acquisition run, participants were presented with a question related to the previously read text. They were given two probable answers, to which they replied by corresponding button press (index or middle finger of their right hand). These question only served the purpose of keep participants attentive -and were otherwise irrelevant. The correctness of the responses was not evaluated.

Procedure
Participants performed four acquisition runs in a single session with a total duration of 12 min, with short breaks in-between and without moving out of the scanner. In each run, participants performed the center reading task while passively watching the contracting, counter-clockwise rotating, expanding, and clockwise rotating stimuli in exactly this sequential order. For the retinotopic mapping experiment, 90 volumes of fMRI data were acquired for each run.

Stimulus
All the stimuli for this experiment were used in a previous study 19 . There were 24 unique grayscale images from each of six stimulus categories: human faces, human bodies without heads, small objects, houses and outdoor scenes comprising of nature and street scenes, and phase scrambled images ( Figure 1B). Mirrored views of these 24×6 images were also used as stimuli. The original images were converted to grayscale and scaled to a resolution of 400×400 px. Images were matched in luminance using lumMatch in the SHINE toolbox 20 to a mean and standard deviation of 128 and 70 respectively. The original images of human faces and houses were produced in the Haxby lab at Dartmouth College; human body images were obtained from Paul Downing's lab at Bangor University 11,21 ; images of small objects  In each block, 16 unique images were presented on a medium-gray background (with a superimposed green fixation cross). Each image was shown for 900 ms and images were separated in time by 100 ms. The participant's task was to press a button (index finger, right hand) when any image was immediately followed by its mirrored equivalent. These events happened randomly either once, twice, or never in each block. In order to alert the participant, the fixation cross turned green 1.5 s prior to the start of a block, remained green throughout a block, and was white during the rest period. The start of each block was synchronized with the MR volume acquisition trigger pulse. Stimulus blocks were separated by 8 s of fixation. (B) Example images for all six stimulus categories.
were obtained from the Bank of Standardized stimuli (BOSS) 22,23 ; outdoor natural scenes are a collection of personal images and public domain resources; and street scenes are taken from the CBCL Street scene database 24 . Stimulus images were displayed at a size of approximately 10°×10°of visual angle.

Procedure
Participants were presented with four block-design runs, with two 16 s blocks per stimulus category in each run, while they also performed a one-back matching task to keep them attentive. The order of blocks was randomized so that all six conditions appeared in random order in both the first and second halves of a run. However, due to a coding error, the block-order was identical across all four runs; though the actual per-block image sequence was a different random sequence for each run. The block configuration and implementation of the matching task are depicted in Figure 1A. 156 fMRI volumes were acquired during each experiment run.

Movie frame localizer
A third stimulation paradigm was implemented to collect BOLD fMRI data for an independent localization of voxels that show a response to basic visual stimulation in areas of the visual field covered by the movie stimulus used in the companion article 7 . The stimulus was highly similar to the one used for the retinotopic mapping, but instead of isolated rings and wedges, the dynamic stimulus covered either the full rectangle of the movie frame (without the horizontal bars at the top and bottom) or just the horizontal bars. The stimulus alternated every 12 s, starting with the movie frame rectangle. A total of four stimulus alternation cycles were presented -starting synchronized with the acquisition of the first fMRI volume. A total of 48 volumes were acquired. During stimulation, participants performed the same reading task as in the retinotopic mapping session, hence a localization of responsive voxels assumes a central fixation and can only be considered as an approximation of the responsive area of the visual cortex during the movie session, where eye movements were permitted.

Code availability
All custom source code for data conversion from raw, vendor-specific formats into the deidentified released form is included in the data release (code/rawdata conversion). fMRI data conversion from DICOM to NIfTI format was performed with heudiconv (https: //github.com/nipy/heudiconv), and the de-identification of these images was implemented with mridefacer (https://github.com/hanke/mridefacer). The data release also contains the implementations of the stimulation paradigms in code/stimulus/. Moreover, analysis code for visual area localization and retinotopic mapping is available in two dedicated repositories at github.com/psychoinformatics-de/studyforrestdata-visualrois, and github.com/psychoinformatics-de/studyforrest-data-retinotopy, respectively.

Data Records
This dataset is compliant with the Brain Imaging Data Structure (BIDS) specification 25 , which is a new standard to organize and describe neuroimaging and behavioral data in an intuitive and common manner. Extensive documentation of this standard is available at http://bids.neuroimaging.io. This section provides information about the released data, but limits its description to aspects that extends the BIDS specifications. For a general description of the dataset layout and file naming conventions, the reader is referred to the BIDS documentation. In summary, all files related to the data acquisitions for a particular participant described in this manuscript can be located in a sub-<ID>/ses-localizer/ directory, where ID is the numeric subject code.
All data records listed in this section are available on the OpenfMRI portal (dataset accession number: ds000113d) at http://openfmri.org/dataset/ds000113d as well as on Github/ZENODO 26 .
In order to de-identify data, information on center-specific study and subject codes have been removed using an automated procedure. All human participants were given sequential integer IDs. Furthermore, all BOLD images were "de-faced" by applying a mask image that zeroed out all voxels in the vicinity of the facial surface, teeth, and auricles. For each image modality, this mask was aligned and re-sliced separately. The resulting tailored mask images are provided as part of the data release to indicate which parts of the image were modified by the de-facing procedure (de-face masks carry a defacemask suffix to the base file name).
In addition to the acquired primary data described in this section, we provide results of validation analysis described below. These are: 1) manually titrated ROI masks for visual areas localized for all participants (github.com/psychoinformatics-de/studyforrest-datavisualrois); and 2) volumetric and surface-projected eccentricity and polar angles maps from retinotopic mapping analysis (github.com/psychoinformatics-de/studyforrest-data-retinotopy).

fMRI data
Each image time series in NIfTI format is accompanied by a JSON sidecar file that contains a dump of the original DICOM metadata for the respective file. Additional standardized Retinotopic mapping fMRI data files for the retinotopic mapping contain a *ses-localizer task-retmap* bold in their file name. Specifically the retmapclw, retmapccw, retmapcexp, and retmapcon file name labels respecitvely indicate stimulation runs with clockwise and counterclockwise rotating wedges, and expanding and contracting rings.
Higher visual area localizers fMRI data files for the visual area localizers contain a *ses-localizer task-objectcategories* bold in their file name. The stimulation timing for each acquisition run is provided in corresponding * events.tsv files. These three-column text files describe the onset and duration of stimulus block and identify the associated stimulus category (trial type).

Movie frame localizer
fMRI data files for the movie frame localizer contain a *ses-localizer task-movielocalizer* bold in their file name.

Physiological recordings
Time series of pleth pulse and respiratory trace are provided for all BOLD fMRI scans in a compressed three-column text file: volume acquisition trigger, pleth pulse, and respiratory trace (file name scheme: _recording-cardresp_physio.tsv.gz). The scanner's built-in recording equipment does not log the volume acquisition trigger nor does it record a reliable marker of the acquisition start. Consequently, the trigger log has been reconstructed based on the temporal position of a scan's end-marker, the number of volumes acquired, and under the assumption of an exactly identical acquisition time for all volumes. The time series have been truncated to start with the first trigger and end after the last volume has been acquired.

Technical Validation
All image analyses presented in this section were performed on the released data in order to test for negative effects of the de-identification procedure on subsequent analysis steps. During data acquisition, (technical) problems were noted in a log. All known anomalies and their impact on the dataset are detailed in Table 1.
Temporal signal-to-noise ratio (tSNR) Data acquisition was executed using the R5 software version of the scanner vendor. With this version, the vendor changed the frequency of the Spectral Presaturation by Inversion Recovery (SPIR) pulse from the previously 135 Hz to 220 Hz in order to increase fat suppression efficiency. After completion of data acquisition, it was discovered that the new configuration led to undesired interactions with pulsations in the cerebrospinal fluid in the ventricles, which resulted in a reduced temporal stability of the MR signal around the ventricles. Figure 2A illustrates the magnitude and spatial extent of this effect. Despite this issue, the majority of voxels show a tSNR of ≈70 or above ( Figure 2B), as can be expected with a voxel volume of about 27 mm 3 and with 3 Tesla acquisition 27 .  Figure 2: Average temporal signal-to-noise ratio (tSNR) across all acquisitions, including the fMRI data described in the companion article 7 . tSNR was computed independently from motion-corrected and linearly detrended BOLD fMRI time series for each scan. The resulting statistics were projected into group space for averaging across scans and participants (n=255). (A) Spatial distribution of average tSNR across the brain. In the vicinity of the ventricles, tSNR is reduced due to a suboptimal fat suppression SPIR pulse frequency. This artifact amplifies the expected U-shape of the spatial SNR profile of a 32 channel head coil. (B) Histograms of average tSNR scores. The dark shaded histogram shows the tSNR distribution of all voxels in an approximate brain mask (MNI template brain mask); the lighter shaded histogram shows the tSNR distribution of 20% of voxels with the largest probability of sampling gray matter, as indicated by FSL's gray matter prior volume (avg152T1 gray.nii.gz) for the MNI template image.

Retinotopic mapping analysis
Many regions of interest (ROI) in the human visual system follow a retinotopic organization 16,17,28 . The primary areas like V1 and V2 are also provided as labels with the Freesurfer segmentation using the recon-all pipeline 29 . But the higher visual areas (V3, VO, PHC, etc) need to be localized by retinotopic mapping [30][31][32][33] or probability maps 34,35 .
We implemented a standard analysis pipeline for the acquired fMRI data based on standard algorithms publicly available in the software packages Freesurfer 29 , FSL 36 , and AFNI 37 . All analysis steps were performed on a computer running the (Neuro)Debian operating system 15 , and all necessary software packages (except for Freesurfer) were obtained from system software package repositories.
BOLD images time series for all scans of the retinotopic mapping paradigm were brainextracted using FSL's BET and aligned (rigid-body transformation) to a participant-specific BOLD template image. All volumetric analysis was performed in this image space. An additional rigid-body transformation was computed to align the BOLD template image to the previously published cortical surface reconstructions based on T1 and T2-weighted structural images of the respective participants 1 for later delineation of visual areas on the cortical surface. Using AFNI tools, time series images were also "deobliqued" (3dWarp), slice time corrected (3dTshift), and temporally bandpass-filtered (3dBandpass cutoff frequencies set to 0.667/32 Hz and 2/32 Hz, where 32 s is the period of both the ring and the wedge stimulus).
For angle map estimation, AFNI's waver command was used to create an ideal response time series waveform based on the design of the stimulus. The bandpass filtered BOLD images were then processed by the 3dRetinoPhase (DELAY phase estimation method was based on the response time series model). Expanding and contracting rings, as well as clockwise and counter-clockwise wedge stimuli, were jointly used to generate average volumetric phase maps representing eccentricity and polar angles for each participant. Polar angle maps were adjusted for a shift in the starting position of the wedge stimulus compared between the two rotation directions. The phase angle representations, relative to the visual field, are shown in Figure 3A. As an overall indicator of mapping quality, Figure 3B shows the distribution of the polar angle representations across all voxels in the MNI occipital lobe mask combined for all participants. For visualization and subsequent delineation, all volumetric angle maps (after correction) were projected onto the cortical surface mesh of the respective participant using Freesurfer's mri vol2surf command -separately for each hemisphere. In order to illustrate the quality of the angle maps, the subjectively best, average, and worst participants (respectively: participant 1, 10, and 9) have been selected on the basis of visual inspection. Figure 3C shows the eccentricity maps on the left panel and the polar angle maps for both hemispheres on the right panel. A table summarizing the results of the manual inspections of all surface maps is available at github.com/psychoinformatics-de/studyforrest-data-retinotopy/qa. Delineations of the visual areas depicted in Figure 3C were derived according to Kaule et al. (page 4) 38 . Further details on the procedure can be found in 31,32,39 .

Localization of higher visual areas
To localize higher visual areas for each participant, we implemented a standard two-level general linear model (GLM) analysis using the FEAT component in FSL. BOLD image time series were slice-time-corrected, masked with a conservative brain mask, spatially smoothed (Gaussian kernel, 4 mm FWHM), and temporally high-pass filtered using a cutoff period of 100 s. For each acquisition run, we defined the stimulation design using six boxcar functions, one for each condition (bodies, faces, houses, small objects, landscapes, scrambled images), such that each stimulation block was represented as a single contiguous 16 s segment. The GLM design, comprised of these six regressors, convolved with FSL's "Double-Gamma HRF" as a model hemodynamic response function model. Temporal derivatives of those regressors were also included in the design matrix, and it was subjected to the same temporal filtering as the BOLD time series.
At the first level, we defined a series of t-contrasts to implement different localization procedures found in the literature 40 . The strict set included one contrast per target region of interest and involved all stimulus conditions (one condition vs. all others, except for the PPA contrast, where houses/landscapes were contrasted with all other conditions). The relaxed set included structurally similar contrasts as the strict set, but the number of contrast condition was reduced, for example: the FFA contrast was defined as faces vs. small objects and scrambled images. Lastly, the simple set contained only contrasts of one (e.g., faces) or two related conditions (e.g., houses and landscapes) against responses to scrambled images.
The GLM analysis was performed for each experiment run individually, and afterwards results were aggregated in a within-subject second-level analysis by averaging. Statistical evaluation (fixed-effects analysis) and cluster-level thresholding were performed at the second level using a cluster forming threshold of Z >1.64 and a corrected cluster probability threshold of p <0.05.
We defined category-selective regions starting with the contrast clusters that survived second-level analysis for each participant. For each region of interest, we started with the most conservative contrast (strict set) by using a threshold of t =2.5 and looked for clusters with at least 20 voxels (using AFNI). We titrated the threshold in the range of [2,3] until we found an isolated cluster for the localizer region of interest. If a cluster was not found or not isolated, we used a contrast from the relaxed set, or finally the simple set, and repeated    the process until we found a cluster that matched the expected anatomical location based on literature for FFA/OFA 41 , PPA 9 , LOC 12 , and EBA 11 . Figure 4 depicts the results of this procedure for all regions of interest by means of localization overlap across all participants on the cortical surface of the MNI152 brain. Detailed participant-specific information is provided in Table 2 (face-responsive regions), Table 3 (scene and place responsive regions), and Table 4 (early visual areas and LOC). Both the spatial localization of regions in the groups of participants, as well as the frequency of localization success, approximately matches reports in the literature (for example 8 ).

Usage Notes
The procedures we employed in this study resulted in a dataset that is highly suitable for automated processing. Data files are organized according to the BIDS standard 25 . Data are shared in documented standard formats, such as NIfTI or plain text files, to enable further processing in arbitrary analysis environments with no imposed dependencies on proprietary tools. Conversion from the original raw data formats is implemented in publicly accessible scripts; the type and version of employed file format conversion tools are documented. Moreover, all results presented in this section were produced by open source software on a computational cluster running the (Neuro)Debian operating system 15 . This computational environment is freely available to anyone, and it -in conjunction with our analysis scripts -offers a high level of transparency regarding all aspects of the analyses presented herein.
All data are made available under the terms of the Public Domain Dedication and License (PDDL; http://opendatacommons.org/licenses/pddl/1.0/). All source code is released under the terms of the MIT license (http://www.opensource.org/licenses/MIT). In short, this means that anybody is free to download and use this dataset for any purpose as well as to produce and re-share derived data artifacts. While not legally required, we hope that all users of the data will acknowledge the original authors by citing this publication and follow good scientific practise as laid out in the ODC Attribution/Share-Alike Community Norms (http://opendatacommons.org/norms/odc-by-sa/).    Table 4: Individual localization results after manual titration for early visual cortex and lateral occipital complex. Table semantics are identical to Table 2.