Task-dependent functional organizations of the visual ventral stream

The visual hierarchy of the ventral stream has been widely studied. However, it remains unclear how this hierarchical system organizes its functional coupling during top-down cognitive processes. The present fMRI study investigated task-dependent functional connectivity along the ventral stream while twenty-eight participants performed object recognition tasks that required different types of visual processing: i) searching for or ii) memorizing visual objects embedded in natural scene images, or iii) freely viewing the same images. Using a seed-based approach that explicitly compared task-specific BOLD time-series, we identified task-dependent functional connectivity of the visual ventral stream with distinct correlation structures. Searching for a target object produced both correlated and anti-correlated structures, separating the visual areas V1 and V4 from the posterior part of the inferior temporal cortex (PIT). In contrast, the ventral stream remained correlated during object memorization, with an increased correlation between the right V4 and PIT. In addition, V1 and V4 showed task-dependent activation, whereas PIT was deactivated. These results highlight the context-dependent nature of the visual ventral stream and shed light on how the visual hierarchy is selectively organized to bias object recognition toward features of interest.

ventral stream. However, it remains unclear how the ventral stream configures its functional structures according to the task goals.
To provide insight into this question, we examined BOLD time-series in individuals performing visual cognition tasks. A seed-based analysis was performed along the visual ventral stream, with the first cortical processing stage, V1, used as the initial seed. Voxel clusters that showed a significant task effect on their BOLD time-series correlation with the seed were identified as regions of interest (ROIs), and these ROIs were in turn used as seeds for subsequent seed-based analyses. We then show to what extent task-dependent connectivity strengths across the identified ROIs increase or decrease during each of the visual search, memory, and free view conditions.

Materials and Methods
Thirty-two right-handed volunteers took part in the study. All subjects had normal or corrected-to-normal visual acuity and were free from any history of neurological or psychiatric diseases. Written informed consent was collected from each subject after the procedure was fully explained. The study was approved by the ethics committee of RWTH Aachen University Hospital and carried out in accordance with the relevant guidelines and regulations. Data of four subjects were discarded from the analysis due to artifacts and technical failure during recordings. As a result, twenty-eight subjects (mean age = 23.86, SD = 3.36; 15 females) were included in the present report.
Task design, data acquisition, and behavioral results of this study have been reported in detail previously 15 , but the fMRI data presented here were not reported in that study. In brief, subjects performed voluntary visual exploration of natural scene images under three different task conditions: searching for a target object, memorizing objects, and free viewing of the same images (Fig. 1). Each subject performed 30 trials per task condition, for a total of 90 trials. The trial order was pseudo-randomized so that the same condition never occurred three times in a row. Each trial began with a 2 s display of one of three task indicators ("search", "memory", or "free view"). For the search condition, a target object was first displayed for 6 s after the task indicator, and then a scene image was presented for 6 s. Subjects had to search the scene for the target object (every scene image contained five different objects; see below) and press the middle button of a three-button response device as soon as they found it. In contrast, for the memory condition, a scene image was first presented for 6 s after the task indicator, and subjects were required to find and memorize all five objects embedded in the scene image. Afterwards, they were presented with a probe array of five objects, only one of which had appeared in the scene image, with a cursor over one of them. Subjects had 5 s to respond, pressing the left and right buttons to move the cursor left and right, respectively, and the middle button to make a selection. For the free view condition, a scene image was presented for 6 s after the task indicator, and subjects viewed the scene freely without any specific task. In all three conditions, a fixation cross was presented for 2.5 s before and after the scene image presentation. The same fixation cross was also presented for a random interval of 5.5-8 s between trials.
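The ordering constraint above (no condition three times in a row) can be satisfied in several ways, and the method used is not reported. A minimal Python sketch, assuming simple rejection sampling (reshuffle until the constraint holds); the helper names `pseudo_random_order` and `longest_run` are hypothetical:

```python
import random

CONDITIONS = ("search", "memory", "free view")


def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def pseudo_random_order(n_per_condition=30, max_run=2, seed=None):
    """Shuffle 3 x n_per_condition trials until no condition repeats more than max_run times.

    Rejection sampling is an assumed method; the study only states the constraint.
    """
    rng = random.Random(seed)
    trials = [c for c in CONDITIONS for _ in range(n_per_condition)]
    while True:
        rng.shuffle(trials)
        if longest_run(trials) <= max_run:
            return list(trials)


order = pseudo_random_order(seed=42)
print(order[:6])
```

Rejection sampling is easy to verify; with 90 trials and a maximum run of two it typically needs on the order of hundreds of reshuffles, which still runs quickly.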
The scene image stimuli for the three task conditions were generated by placing object images on background images of natural scenes. For the objects, we drew 39 object images from the Microsoft image gallery and resized them (keeping the aspect ratios) to an average size of 143.56 × 141.10 pixels. The objects included flowers, animals, insects, fruits, furniture, tools, and vehicles. For the background, we collected 15 natural scene images of 1920 × 1200 pixels, eight of which were taken by one of the co-authors and the rest obtained from the Internet. The images were photos of plants, flowers, fruits, gardens, boats, rooms, streets, and churches (details are available from the authors on request). The objects were blended naturally into the background based on a contrast map for every pair of object and background images 15 (see Supplementary Information for an example of the visual stimulus). A total of 90 stimulus images were generated from 30 combinations (15 backgrounds × two sets of five objects); each combination yielded three variants of a stimulus image that differed only in the object positions. Each of the three variants was used for one of the three task conditions, ensuring that every combination appeared exactly once in each task condition.
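The stimulus arithmetic (15 backgrounds × 2 object sets × 3 position variants = 90 images, 30 per condition) can be checked with a short sketch. How variants were mapped to conditions is not reported, so the Latin-square rotation below, together with the set labels and the function name `build_stimulus_plan`, is a hypothetical assumption:

```python
from itertools import product

N_BACKGROUNDS = 15
OBJECT_SETS = ("objects A", "objects B")   # hypothetical labels for the two five-object sets
N_VARIANTS = 3                             # object-position variants per combination
CONDITIONS = ("search", "memory", "free view")


def build_stimulus_plan():
    """Return (background, object_set, variant, condition) for all 90 stimulus images."""
    plan = []
    combos = list(product(range(N_BACKGROUNDS), OBJECT_SETS))   # 15 x 2 = 30 combinations
    for i, (background, object_set) in enumerate(combos):
        for variant in range(N_VARIANTS):
            # Hypothetical Latin-square rotation: each combination contributes one
            # variant to every condition, and variants rotate across combinations.
            condition = CONDITIONS[(variant + i) % len(CONDITIONS)]
            plan.append((background, object_set, variant, condition))
    return plan


plan = build_stimulus_plan()
```

Rotating the variant index across combinations keeps every combination once per condition while avoiding always pairing the same variant with the same task.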
The scene images (background plus five placed objects) were presented on each trial using the Presentation software (Neurobehavioral Systems, USA) on a monitor screen (BOLDscreen, Cambridge Research Systems Ltd) at a refresh rate of 60 Hz. The images were displayed at a resolution of 1920 × 1200 pixels, subtending 25.4 × 16.0° of visual angle.
fMRI data acquisition and analysis. Participants were placed in a 3 Tesla MR scanner (Siemens MAGNETOM Prisma®, Siemens Medical Systems, Germany) with the fingers of their right hand positioned on a three-button response device. Before the main task, structural images were acquired with a 4-minute T1-weighted magnetization-prepared rapid acquisition gradient echo (MPRAGE) sequence (TR = 1900 ms, TE = 2.52 ms, matrix = 256 × 256, 176 slices, voxel size = 1 × 1 × 1 mm³, flip angle = 9°). During the main task, functional images were acquired using echo-planar imaging sensitive to BOLD contrast (voxel size = 3 × 3 × 3 mm³, matrix = 64 × 64, 36 slices, TR = 1950 ms, TE = 30 ms, flip angle = 76°). MR-compatible EEG and eye-tracking systems were also used to record physiological responses; these data have been reported elsewhere 15 and were not used in this study.
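From the reported display geometry one can estimate pixels per degree and the mean object size in degrees of visual angle. This sketch assumes a constant pixels-per-degree ratio across the screen (a linear approximation, since the viewing distance is not given here) and that the object dimensions are listed as width × height:

```python
# Display geometry reported in the Methods.
SCREEN_PX = (1920, 1200)          # width, height in pixels
SCREEN_DEG = (25.4, 16.0)         # width, height in degrees of visual angle
MEAN_OBJECT_PX = (143.56, 141.10) # assumed to be width x height

# Linear approximation: pixels per degree treated as constant across the screen.
px_per_deg = tuple(px / deg for px, deg in zip(SCREEN_PX, SCREEN_DEG))
object_deg = tuple(size / ppd for size, ppd in zip(MEAN_OBJECT_PX, px_per_deg))

print(f"{px_per_deg[0]:.1f} x {px_per_deg[1]:.1f} px/deg")
print(f"mean object size: {object_deg[0]:.2f} x {object_deg[1]:.2f} deg")
```

Under these assumptions the screen has roughly 75 pixels per degree in both directions, and the average object subtends about 1.9° × 1.9°.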
For preprocessing, the SPM12 toolbox was used (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/). Six motion parameters were estimated and used to realign the functional images to their mean, and differences in acquisition time between slices were corrected. The resulting images were co-registered to the structural image, which was segmented into tissue components (gray matter, white matter, and cerebrospinal fluid), and normalized to the standard brain template of the Montreal Neurological Institute (MNI). Functional images were further smoothed using an 8 mm Gaussian kernel.
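SPM specifies smoothing kernels by their full width at half maximum (FWHM). Assuming the 8 mm kernel is an FWHM, the equivalent Gaussian standard deviation is FWHM / (2√(2 ln 2)) ≈ 3.40 mm. The sketch below applies this with `scipy.ndimage.gaussian_filter`; the 3 mm voxel size is an assumption (it is the acquisition resolution; the voxel size after normalization is not stated), and this is not SPM's own smoothing routine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Conversion factor from FWHM to Gaussian standard deviation (~0.4247).
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def smooth_volume(volume, fwhm_mm=8.0, voxel_mm=(3.0, 3.0, 3.0)):
    """Gaussian smoothing with the kernel given as FWHM in millimetres."""
    sigma_vox = [fwhm_mm * FWHM_TO_SIGMA / v for v in voxel_mm]   # mm -> voxels per axis
    return gaussian_filter(volume, sigma=sigma_vox)


# A single bright voxel spreads into a blob whose total intensity is preserved.
impulse = np.zeros((21, 21, 21))
impulse[10, 10, 10] = 1.0
smoothed = smooth_volume(impulse)
```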
We first performed seed-based connectivity analyses using the functional connectivity toolbox CONN v.18 16 . The CONN toolbox computed the Fisher-transformed correlation coefficient (z) between the average BOLD time-series of a given seed region and that of every voxel in the brain (seed-based correlation map). Before computing correlations, functional images were denoised following the strategy implemented in the CONN toolbox: images of the whole recording session were high-pass filtered at 0.008 Hz. The delta function of each condition was convolved with the canonical hemodynamic response function implemented in SPM12 and then entered as a regressor in the general linear model (GLM). BOLD signals from white matter and cerebrospinal fluid, as well as functional outliers and motion parameters, were treated as confounds and regressed out. In addition, the BOLD signal changes associated with the presence or absence of the task conditions were also regressed out. The BOLD time-series was then divided into the scans associated with each block of each task condition (task-specific BOLD time-series).
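The denoising and seed-correlation steps can be illustrated on a small scans × voxels array. This is a simplified sketch, not the CONN implementation: the 0.008 Hz high-pass filter is approximated with SPM-style discrete cosine drift regressors (CONN applies its own filtering), confounds (white matter/CSF signals, motion, outliers) are assumed to be supplied as a scans × k matrix, and task-specific time-series are selected with a plain scan mask rather than CONN's condition weighting. All function names are hypothetical:

```python
import numpy as np


def dct_drift_basis(n_scans, tr, cutoff_hz=0.008):
    """SPM-style discrete cosine regressors for drifts slower than cutoff_hz (constant excluded)."""
    period = 1.0 / cutoff_hz                                   # 125 s for 0.008 Hz
    order = int(np.floor(2.0 * n_scans * tr / period + 1))
    n = np.arange(n_scans)
    cols = [np.sqrt(2.0 / n_scans) * np.cos(np.pi * (2 * n + 1) * k / (2 * n_scans))
            for k in range(1, order)]
    return np.column_stack(cols) if cols else np.empty((n_scans, 0))


def denoise(data, confounds, tr, cutoff_hz=0.008):
    """Regress a constant, confounds, and slow drifts out of data (scans x voxels)."""
    n_scans = data.shape[0]
    X = np.column_stack([np.ones(n_scans), confounds, dct_drift_basis(n_scans, tr, cutoff_hz)])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta                                     # residuals = cleaned signal


def seed_fisher_z(data, seed_cols, scans=None):
    """Fisher z of the correlation between the mean seed time-series and every voxel.

    `scans` optionally selects the scans of one task condition (boolean mask or indices).
    """
    if scans is not None:
        data = data[scans]
    seed = data[:, seed_cols].mean(axis=1)
    d = data - data.mean(axis=0)
    s = seed - seed.mean()
    r = (s @ d) / (np.linalg.norm(s) * np.linalg.norm(d, axis=0))
    return np.arctanh(np.clip(r, -0.999999, 0.999999))       # clip avoids infinite z at r = 1
```

For example, `seed_fisher_z(clean, v1_cols, scans=search_mask)` would give a search-condition seed map, where `v1_cols` and `search_mask` are hypothetical index arrays for the V1 voxels and the search-block scans.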
We defined the area reflecting the first stage of visual cortical processing, the primary visual cortex V1, as a seed based on an anatomical reference obtained from the Anatomy toolbox 17 (the bilateral hOC1; see also the bottom row in Fig. 2), which overlapped with the BOLD signal responses in the conjunction map of the three task conditions (see Supplementary Information). The correlations between the mean BOLD time-series of the voxels in the V1 seed and the time-series of each voxel throughout the whole brain were then computed for each subject, separately for each task condition. The resulting correlations were subsequently entered into a group-level repeated measures ANOVA to test for differences across the three task conditions. Significance of the task effect was assessed using a voxel-level threshold of p < 0.001 (uncorrected) and a cluster-level FWE-corrected threshold of p < 0.05. The resulting voxel clusters in the ventral stream were identified as ROIs and used as seeds for the subsequent seed-based analyses. This seed-based approach resulted in five ROIs (see the Results section).
Figure 1. [...] 15). The time course of each condition is shown in seconds. For the search condition, subjects were required to search the scene for a target object, while during the memory condition they memorized all objects embedded in the scene. No specific instruction was given for the free view condition. Note that the objects embedded in natural scene images are enlarged and randomly located for illustration purposes.
(2019) 9:9316 | https://doi.org/10.1038/s41598-019-45707-w
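The two-level threshold (voxel-level p < 0.001, then a cluster-extent criterion) can be sketched as follows. The cutoff `min_cluster_voxels` is a placeholder: SPM derives the FWE-corrected cluster extent from random field theory, which this sketch does not implement, and `scipy.ndimage.label` uses face connectivity by default, which may differ from SPM's cluster definition. The function name is hypothetical:

```python
import numpy as np
from scipy import ndimage


def cluster_extent_threshold(p_map, voxel_p=0.001, min_cluster_voxels=50):
    """Keep supra-threshold voxels that belong to sufficiently large connected clusters.

    min_cluster_voxels is a placeholder for an externally derived FWE cluster extent.
    """
    supra = p_map < voxel_p                          # voxel-level threshold
    labels, _ = ndimage.label(supra)                 # face-connected clusters (default structure)
    sizes = np.bincount(labels.ravel())[1:]          # voxel count per cluster label 1..n
    kept_labels = np.flatnonzero(sizes >= min_cluster_voxels) + 1
    return np.isin(labels, kept_labels), labels
```

Applied to a p-value map of the task effect, the returned boolean mask marks candidate ROI clusters; an isolated significant voxel is discarded because its cluster is too small.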
Next, we examined the interactions between the identified ROIs (ROI-to-ROI analysis). The BOLD time-series was averaged across all voxels within each ROI. The Fisher-transformed correlation was then computed for each pair of ROIs and entered into a repeated measures ANOVA with task condition as a within-subject factor. Variables are expressed as mean ± standard deviation of the mean.
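A minimal sketch of the ROI-to-ROI statistics: pairwise Fisher-z correlations between ROI-mean time-series, a one-way repeated measures ANOVA across conditions computed from sums of squares (without a sphericity correction, which the text does not mention), and a Bonferroni adjustment. The function names are hypothetical and this is not the CONN code path:

```python
import numpy as np
from itertools import combinations
from scipy import stats


def roi_pair_z(roi_ts):
    """roi_ts: (scans, n_rois) mean ROI time-series -> {(i, j): Fisher z} for all pairs."""
    r = np.corrcoef(roi_ts, rowvar=False)
    return {(i, j): float(np.arctanh(r[i, j])) for i, j in combinations(range(r.shape[0]), 2)}


def rm_anova_oneway(x):
    """One-way repeated measures ANOVA; x has shape (n_subjects, n_conditions).

    No sphericity correction (e.g., Greenhouse-Geisser) is applied in this sketch.
    """
    n, k = x.shape
    grand = x.mean()
    ss_cond = n * ((x.mean(axis=0) - grand) ** 2).sum()      # between-condition variance
    ss_subj = k * ((x.mean(axis=1) - grand) ** 2).sum()      # between-subject variance (removed)
    ss_error = ((x - grand) ** 2).sum() - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, stats.f.sf(f, df_cond, df_error)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

For each ROI pair, stacking the subjects' z values into a 28 × 3 array (subjects × conditions) and calling `rm_anova_oneway` followed by `bonferroni(p, 10)` mirrors the test reported for the five ROIs. With only two conditions the F statistic equals the squared paired t statistic, which is a convenient sanity check.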
Lastly, to assess the BOLD signal changes in each ROI in response to the scene images (i.e., the main effects of task-driven activation rather than correlations between regions), parameter estimates reflecting the signal change for each condition versus baseline were calculated within the GLM in SPM12. Six motion parameters and the presentations of the instruction, target object, and probe array were modeled as regressors of no interest. ROI activity was then extracted from these contrast estimates for each task condition.
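The activation analysis relies on HRF-convolved condition regressors and ordinary least-squares parameter estimates. The sketch below assumes a double-gamma HRF (gamma shapes 6 and 16, undershoot ratio 1/6) sampled at the TR, which approximates rather than reproduces SPM12's canonical HRF, and treats the constant term as the implicit baseline. Regressors of no interest (motion, instruction, target object, probe array) would simply be appended to the regressor list. Function names are hypothetical:

```python
import numpy as np
from scipy.stats import gamma


def canonical_hrf(tr, length_s=32.0):
    """Double-gamma HRF approximating SPM's canonical shape, sampled at the TR."""
    t = np.arange(0.0, length_s, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0   # response peak near 5 s, undershoot later
    return h / h.sum()


def block_regressor(onsets_s, durations_s, n_scans, tr):
    """Boxcar for the given blocks convolved with the HRF and cut to n_scans."""
    box = np.zeros(n_scans)
    for onset, duration in zip(onsets_s, durations_s):
        box[int(round(onset / tr)):int(round((onset + duration) / tr))] = 1.0
    return np.convolve(box, canonical_hrf(tr))[:n_scans]


def fit_glm(y, regressors):
    """OLS betas for the regressors plus a constant (the implicit baseline, last entry)."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

With a 6 s scene block per trial and TR = 1.95 s, each condition's beta corresponds to the "condition versus baseline" estimate from which ROI activity would be averaged.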

Results
Behavioral results. For the search condition, mean performance across subjects was 46.6% (SD = 0.20) correct responses in detecting the target object. On correct trials, the mean response time was 3.0 s (SD = 0.36). For the memory condition, mean performance was 48.0% (SD = 0.12) correct responses on the five-choice probe, for which the chance level of choosing the correct object was 20%. These results indicate that subjects were cognitively engaged during the task conditions. In addition, our preceding study demonstrated different eye-movement behaviors between the conditions, further supporting task-dependent cognitive processes 15 .
Seed-based analysis. To identify the task-dependent functional network of the visual ventral stream, we conducted a series of seed-based analyses. First, we defined the area reflecting the first stage of visual cortical processing, the primary visual cortex V1, as a seed. The top row in Fig. 2 displays the map of voxels showing a significant task effect on their BOLD time-series correlation with the V1 seed. This map revealed significant task-dependent correlations with a widespread voxel cluster covering the early visual areas (V1, V2, and V3) and the intermediate visual area V4, as well as with a voxel cluster in the higher-order visual area, the posterior part of the IT (PIT). To delineate area V4, the map was combined with an anatomical map of the ventral extrastriate and lateral occipital cortices (the bilateral hOC4d, hOC4la, and hOC4lp) 17 . The significant voxels within the anatomical map were defined as the left and right V4, respectively (lV4 = 338 voxels; rV4 = 215 voxels; the bottom row in Fig. 2). The V1 map also revealed a significant task-dependent correlation with a voxel cluster in the left PIT (lPIT). Likewise, the extracted lV4 and rV4 were separately subjected to the same seed-based analysis. The lV4 seed-based map revealed a significant task-dependent correlation with a cluster covering the early visual areas and the rV4 (the second row in Fig. 2). A similar pattern was found for the rV4 seed-based map (the third row in Fig. 2), which showed significant task-dependent correlations with the early visual areas and lV4. Notably, both the lV4 and rV4 seed-based maps identified the right PIT, covering the posterior portion of the right inferior and middle temporal gyri (rPIT). The rPIT was thus defined by the joint effect of the lV4 and rV4 seed-based maps (rPIT = 195 voxels).
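Defining V4 as the overlap between the significant V1-seed voxels and the anatomical hOC4 maps, and defining rPIT by the joint effect of two seed maps, are both voxel-wise logical ANDs. A sketch on boolean arrays; splitting the bilateral overlap into left and right by the sign of the MNI x coordinate is an assumption about how lV4 and rV4 were separated, and the function names are hypothetical:

```python
import numpy as np


def define_roi(significant, anatomical_mask):
    """Voxels that are both significant and inside the anatomical reference mask."""
    return np.logical_and(significant, anatomical_mask)


def joint_effect(*significant_maps):
    """Voxels significant in every supplied seed-based map (e.g., the lV4 and rV4 maps)."""
    return np.logical_and.reduce(significant_maps)


def split_hemispheres(roi, x_mm):
    """Split a bilateral ROI by MNI x coordinate (x < 0 left, x > 0 right; midline dropped)."""
    return roi & (x_mm < 0), roi & (x_mm > 0)
```

Counting `True` voxels in each returned mask gives ROI sizes comparable to the voxel counts reported for lV4, rV4, and rPIT.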
Lastly, the BOLD time-series extracted from the lPIT and rPIT were separately subjected to the same seed-based analysis. The lPIT seed-based map revealed a significant task-dependent correlation with a cluster covering the early visual areas, lV4, and rV4 (the fourth row in Fig. 2). The rPIT seed-based map, on the other hand, identified the lV4 (the fifth row in Fig. 2).
Taken together, this seed-based approach converged on five ROIs in the visual ventral stream (the bottom row in Fig. 2), representing a task-dependent connectivity network. Note, however, that there were also other significant voxel clusters outside the visual ventral stream (see Table 1), for instance, in the angular gyrus in the V1 and rV4 seed-based maps and in the frontal pole in the lV4 seed-based map.
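The iterative procedure (each newly found ventral-stream cluster becomes the next seed until no new ROI appears) amounts to a breadth-first traversal. In the sketch below, `find_clusters` is a hypothetical stand-in for a full seed-based analysis, and `TOY_MAPS` merely paraphrases which ROIs each seed map identified in the text; it ignores the joint-effect rule used to define rPIT:

```python
from collections import deque


def expand_seeds(initial_seed, find_clusters):
    """Breadth-first expansion: each new ventral-stream cluster becomes a seed in turn."""
    rois = [initial_seed]
    queue = deque([initial_seed])
    while queue:
        seed = queue.popleft()
        for roi in find_clusters(seed):
            if roi not in rois:           # only previously unseen clusters become new seeds
                rois.append(roi)
                queue.append(roi)
    return rois


# Hypothetical lookup paraphrasing which ROIs each seed map identified in the text.
TOY_MAPS = {
    "V1": ["lV4", "rV4", "lPIT"],
    "lV4": ["V1", "rV4", "rPIT"],
    "rV4": ["V1", "lV4", "rPIT"],
    "lPIT": ["V1", "lV4", "rV4"],
    "rPIT": ["lV4"],
}

rois = expand_seeds("V1", TOY_MAPS.get)
```

Here `rois` comes out as V1, lV4, rV4, lPIT, rPIT: the expansion stops once the seed maps only point back to ROIs that have already been found.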

ROI-to-ROI analysis.
For further validation of the identified network, we examined the interactions between the BOLD time-series averaged within each of the five ROIs. A repeated measures ANOVA showed a significant main effect of task condition in seven correlations out of ten possible pairs. Figure 3A shows the ROI-to-ROI correlations that yielded a significant task effect (p < 0.05; Bonferroni corrected for 10 multiple tests).
As significant clusters were also observed beyond the visual areas (see Table 1), we further examined the interactions with these clusters, i.e., the left and right supramarginal gyri (lSMG and rSMG), the left and right frontal poles (lFP and rFP), and the right precentral gyrus (rPCG). The lSMG (2067 voxels), covering the left angular and postcentral gyri, was defined by the joint effect of the five ROIs' seed-based maps (V1, lV4, rV4, lPIT, and rPIT), since all these ROIs showed a significant task-dependent correlation with the lSMG (Table 1). Likewise, the rSMG (384 voxels) was defined by the joint effect of four ROIs' seed-based maps (lV4, rV4, lPIT, and rPIT). In addition, the lV4 seed-based map identified significant task-dependent correlations with the rPCG (124 voxels), lFP (251 voxels), and rFP (134 voxels). ROI-to-ROI correlations across all 10 ROIs (V1, lV4, rV4, lPIT, rPIT, lSMG, rSMG, lFP, rFP, and rPCG) were subjected to a separate repeated measures ANOVA, which revealed 22 significant task-dependent correlations out of 45 possible pairs (Table 2; p < 0.05; Bonferroni corrected for 45 multiple tests), including the seven correlations found in Fig. 3. Notably, during the search condition, the early and intermediate visual areas (V1, V4) were anti-correlated with all five ROIs beyond the visual ventral stream (lSMG, rSMG, lFP, rFP, and rPCG), further supporting the distinction of V1 and V4 from the higher-order brain areas. In contrast, the memory condition increased the correlation between rV4 and rSMG.
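The Bonferroni thresholds used in the ROI-to-ROI analyses follow from the number of unordered ROI pairs, assuming a family-wise α of 0.05 divided by the number of tests (equivalently, each p-value multiplied by that number):

```latex
\begin{aligned}
\binom{5}{2} &= \frac{5 \cdot 4}{2} = 10, & \alpha_{\text{Bonf}} &= \frac{0.05}{10} = 0.005,\\
\binom{10}{2} &= \frac{10 \cdot 9}{2} = 45, & \alpha_{\text{Bonf}} &= \frac{0.05}{45} \approx 1.1 \times 10^{-3}.
\end{aligned}
```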
ROI activity. We further assessed the BOLD signal changes evoked by the scene images, rather than the interactions between ROIs. Individual mean BOLD activity in each of the five ROIs was subjected to a repeated measures ANOVA with task condition as a within-subject factor. A significant task effect was found for all ROIs (all p's < 0.001; Bonferroni corrected for five multiple tests; Fig. 4). Post-hoc analyses revealed that, in V1, the search and memory conditions produced higher BOLD activity than the free view condition (all p's < 0.003, Bonferroni corrected for three multiple tests). The memory condition, in turn, produced higher activity in rV4 and lV4 than the other two conditions (all p's < 0.001, corrected), whereas the free view and search conditions did not differ significantly (all p's > 0.139, corrected). In contrast, activity in rPIT and lPIT decreased during the search and memory conditions compared with the free view condition (all p's < 0.007, corrected), whereas the search and memory conditions did not differ significantly (all p's > 0.784, corrected). All activity was significantly different from zero (all p's < 0.020) except for lPIT and rPIT during the free view condition (all p's > 0.363).
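The post-hoc comparisons can be sketched with paired t-tests on the per-subject parameter estimates of one ROI (subjects × 3 conditions), Bonferroni-corrected for the three condition pairs, plus one-sample tests against zero. Whether the tests against zero were corrected is not stated, so the sketch leaves them uncorrected; the function names are hypothetical:

```python
import numpy as np
from itertools import combinations
from scipy import stats


def posthoc_paired(betas, conditions):
    """Paired t-tests between all condition pairs, Bonferroni-corrected for the number of pairs."""
    pairs = list(combinations(range(len(conditions)), 2))
    results = {}
    for i, j in pairs:
        t, p = stats.ttest_rel(betas[:, i], betas[:, j])
        results[(conditions[i], conditions[j])] = (float(t), min(1.0, float(p) * len(pairs)))
    return results


def one_sample_vs_zero(betas, conditions):
    """One-sample t-test of each condition's betas against zero (uncorrected)."""
    return {c: stats.ttest_1samp(betas[:, k], 0.0) for k, c in enumerate(conditions)}
```

Each entry of `posthoc_paired` holds the t statistic and the corrected p-value for one condition pair, matching the "Bonferroni corrected for three multiple tests" reported above.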

Discussion
The seed-based approach allowed us to identify a task-dependent functional network by explicitly comparing seed-to-voxel correlations during the free view, search, and memory conditions, which used identical visual stimuli. The connectivity across the identified ROIs was organized into correlated and anti-correlated structures according to the context of visual cognition. Searching for a target object distinguished the visual areas V1 and V4 from the high-order visual area PIT, whereas memorizing objects strengthened the connectivity between rV4 and rPIT. Furthermore, task-dependent activation was found in V1 and V4, whereas PIT showed deactivation during the search and memory conditions.
Anti-correlated BOLD activity has been observed between networks, namely between regions exhibiting task-dependent activations and deactivations 18,19 . A set of brain regions that are activated during task performance, known as the task-positive network, shows strong correlations between regions. These correlated regions are considered to serve an integrative role, combining neural processes and thereby facilitating task execution. In contrast, a set of deactivated brain regions that are anti-correlated with the task-positive network, known as the task-negative network, is thought to segregate neural processes and to subserve competing or inhibitory representations 20 . The same mechanism seems to hold during the search condition, which showed high BOLD activity in V1 and V4 with strong correlations between them, whereas the deactivated left PIT was anti-correlated with the activated V1. In this regard, the presence of anti-correlations indicates a dichotomy between low- and high-order visual areas, which may reflect competition or inhibition between neural processes. The task-dependent activation suggests that the low-order visual area V1 and the intermediate area V4 were strongly involved in searching for a target object, while the high-order area PIT was suppressed.
Our results also showed an increased correlation between regions that exhibited task-dependent activation and deactivation. During the memory condition, the activated right V4 was strongly correlated with the deactivated right PIT. It is possible that although the extent of task-dependent activity differed between these two regions, the fluctuations of their BOLD time-series were synchronized. Indeed, such an association has been reported in a previous study 21 , suggesting that a brain region that does not exhibit significant activation may still contain a population of neurons that communicate with other regions. Therefore, the task-dependent activation of V4 together with the deactivation of PIT during the memory condition suggests, though speculatively, that memorizing objects might recruit rV4 to communicate with the high-order area rPIT rather than with the low-order area V1, as we observed a decreased correlation between rV4 and V1.
Importantly, the increased and decreased correlations in the search condition distinguished V1 and V4 from PIT. In addition, the increased correlation between the right V4 and PIT during the memory condition highlights the significant role of this division under the present task conditions. Several lines of evidence have shown that as signals move from V1 to V4 to PIT, the onset latency, receptive field size, and neural selectivity for complex shapes gradually increase (for reviews see refs 2,22 ). Thus, object representation in high-order areas such as PIT is considerably more complex than in low (V1) and intermediate (V4) areas. Given this, one reasonable interpretation of the connectivity structure is that during the search condition subjects selectively focused on relatively simple features, whereas during memorizing their focus was directed to more complex features.
It should be noted that recognizing a specific object might require more than low-level features such as lines and orientations 6 , because most objects share these low-level features. Therefore, searching for or memorizing such very simple features is unlikely to be the best strategy for the present task. Instead, our results suggest that the functional properties of V4, such as color, texture, and simple geometric shape 23 , were critical for detecting the target object, whereas memorizing objects might require much more complex object representations.
Since the task conditions required subjects to recognize and identify objects in complex scene images, brain responses in the visual ventral stream were expected 2 . However, this does not imply that the brain responses were confined to the ventral visual stream. Indeed, we observed significant task-dependent connectivity with brain areas covering the SMG, FP, and PCG. Further analyses including all these brain areas highlighted anti-correlations specifically during the search condition, supporting the distinction of V1 and V4 from the higher-order brain areas. This distinction may indicate segregated brain networks that are functionally optimized for searching for a target object, whereas memorizing unspecified objects might have required increased integration across brain networks. One limitation of the present task paradigm is that, compared with the free view and memory conditions, the search condition used a different presentation sequence, i.e., a target object was presented before the scene image. This could have engaged processing resources such as working memory at the initial stage of the search trial, which might have influenced brain responses during the scene image presentation. Future studies with a careful modification of the task paradigm, for instance, comparing simple with complex target object presentations, may prove useful in extending the findings of the present study.
In summary, the present study demonstrated different functional structures of the visual ventral stream. In particular, while the ventral stream was organized into correlated and anti-correlated structures during searching for a target object, memorizing objects produced a correlated structure. Our results further suggest a putative boundary between V4 and PIT, which partitions the visual hierarchy into two subdivisions that interact competitively or cooperatively depending on task demands. These results highlight the context-dependent nature of the visual ventral stream and may inform theoretical and computational efforts to find optimal structures in hierarchical systems.
Figure 4. Average signal changes in response to the scene images. The V1, lV4, and rV4 showed task-related activation, while the lPIT and rPIT showed task-related deactivation, particularly during the search and memory conditions. A repeated measures ANOVA revealed a significant task effect in all regions (all p's < 0.05, corrected). Error bars are standard errors of the mean; horizontal bars indicate significance in paired t-tests between conditions (all p's < 0.05, corrected). V1 = primary visual cortex; rV4 = right V4; lV4 = left V4; lPIT = left posterior part of the inferior temporal cortex; rPIT = right posterior part of the inferior temporal cortex.