A functional dissociation of face-, body- and scene-selective brain areas based on their response to moving and static stimuli

The human brain contains areas that respond selectively to faces, bodies and scenes. Neuroimaging studies have shown that a subset of these areas respond more to moving than to static stimuli, but the reasons for this functional dissociation remain unclear. In the present study, we simultaneously mapped the responses to motion in face-, body- and scene-selective areas in the right hemisphere using moving and static stimuli. Participants (N = 22) were scanned using functional magnetic resonance imaging (fMRI) while viewing videos containing bodies, faces, objects, scenes or scrambled objects, and static pictures from the beginning, middle and end of each video. Results demonstrated that lateral areas, including face-selective areas in the posterior and anterior superior temporal sulcus (STS), the extrastriate body area (EBA) and the occipital place area (OPA), responded more to moving than static stimuli. By contrast, there was no difference between the response to moving and static stimuli in ventral and medial category-selective areas, including the fusiform face area (FFA), occipital face area (OFA), amygdala, fusiform body area (FBA), retrosplenial complex (RSC) and parahippocampal place area (PPA). This functional dissociation between lateral and ventral/medial brain areas that respond selectively to different visual categories suggests that face-, body- and scene-selective networks may be functionally organized along a common dimension.

such as within-category exemplar recognition and do not fully account for how these areas differentially process moving stimuli.
The motion-selective area, V5/MT, is located on the lateral surface, close to the intersection of the ascending limb of the inferior temporal sulcus and the lateral occipital sulcus 24 . Areas surrounding V5/MT in the lateral occipitotemporal cortex (LOTC) have also been shown to represent different aspects of action perception 25 . These include perception of bodies 26 , tools 27 and action observation 28 . Anterior to human V5/MT is the superior temporal sulcus (STS), which also responds to a wide range of moving biological stimuli. These include faces 12,29,30 , point-light walkers 31,32 , bodies 33 and the perception of goal-directed actions [34][35][36][37] . By contrast, the strong preference for moving stimuli on the lateral surface is less, or absent, in category-selective regions on the ventral surface. Prior studies have demonstrated a greater response for moving stimuli in lateral than ventral brain areas, using different stimuli including faces 30,[38][39][40] , bodies 41 and scenes 42 .
To date these studies have largely focused on a single visual category (e.g. faces, bodies or scenes) and have not simultaneously compared the response to moving and static stimuli across multiple object categories in the same group of experimental participants. In addition, the response to moving and static stimuli in the place-selective retrosplenial complex (RSC), as well as in the face-selective voxels of the amygdala, is unknown. In the current study, participants (N = 22) were scanned using fMRI at 7 Tesla while viewing 3-second videos containing bodies, faces, objects, scenes and scrambled objects, or static pictures taken from the beginning, middle and end of each video. Our aim was to simultaneously measure the differential response to moving and static stimuli across all face-, body- and scene-selective areas in the brain to establish differences and similarities across different visual categories.

Results
Identifying ROIs. Face-, body- and scene-selective areas were identified using short videos displaying bodies, faces, objects, scenes, and scrambled objects 40 . We were able to identify the necessary ten ROIs in the right hemisphere of eighteen of the twenty-two participants. Face-selective ROIs (identified using a contrast of faces > objects) included the FFA, OFA, pSTS, aSTS and face-selective voxels in the amygdala (four participants did not show any face-selective voxels in the amygdala). The two body-selective ROIs (identified using a contrast of bodies > objects) were EBA and FBA. Scene-selective ROIs (identified using a contrast of scenes > objects) included PPA, RSC and OPA. All ROIs were identified based on the activation of peak voxels in the relevant brain areas identified by prior studies 4,[6][7][8][9]14,16,40 . We selected all contiguous voxels for each ROI.
In the left hemisphere, we were unable to identify the necessary ROIs (FFA N = 22; OFA N = 18; pSTS N = 18; aSTS N = 11; amygdala N = 11; EBA N = 22; FBA N = 15; PPA N = 22; RSC N = 20; OPA N = 15) in the same 22 participants. This difference between category-selective regions across hemispheres has been reported in prior face-processing studies 4,19,40,43,44 . Consequently, our subsequent analysis focused only on data from the right hemisphere (data from the left hemisphere ROIs are included in supplemental figures).

Discussion
Our results show a functional dissociation between category-selective regions located on the lateral brain surface and those located on the ventral and medial brain surfaces. This dissociation was consistent across all three visual categories investigated, suggesting that the networks that selectively process faces, bodies and scenes in the human brain share a common functional organization in response to motion. Lateral areas, including face-selective ROIs in the posterior and anterior superior temporal sulcus (pSTS and aSTS), the body-selective extrastriate body area (EBA) and the scene-selective occipital place area (OPA), all responded more strongly to moving than static stimuli. By contrast, we found no evidence of a difference in the response to moving and static stimuli in ventral and medial category-selective regions, including the face-selective fusiform face area (FFA) and occipital face area (OFA), face-selective voxels in the amygdala, the body-selective fusiform body area (FBA), and the scene-selective retrosplenial complex (RSC) and parahippocampal place area (PPA). Moreover, in face-selective and scene-selective ROIs, this preference for moving, relative to static, stimuli was limited to the preferred stimulus category of the area, i.e., faces in face-selective ROIs and scenes in scene-selective ROIs (Figs 1 and 2). The body-selective EBA, by contrast, showed not only a significantly greater response to moving than static bodies but also a greater response to moving than static objects (Fig. 2). This result is consistent with prior evidence showing the spatial overlap between the EBA and the object-selective lateral occipital complex (LOC) as well as the motion-selective V5/MT 45 .
Prior studies have demonstrated that face-selective ROIs in the STS show a greater response to moving than static faces, while the FFA and OFA show a reduced, or no, difference in the response to moving and static faces 30,[38][39][40] . A similar dissociation between moving and static images of bodies was shown between lateral and ventral areas in a meta-analysis of human movement perception 41 . Most recently, a study of the scene processing network showed that the lateral scene-selective OPA responded more to moving than static scenes, while there was no difference in the response to moving and static scenes in the medial RSC and ventral PPA 42 .
The present study replicates these prior results and extends them in two ways. First, face-selective voxels in the ventromedially located amygdala showed no difference in their response to moving and static faces, thereby demonstrating that the amygdala has the same functional profile as the FFA and OFA (Fig. 1). Second, we simultaneously compared the response to moving and static stimuli in face-, body- and scene-selective areas in the same participants. This design enabled us to demonstrate that a differential response to moving and static stimuli exists in category-selective areas located on the lateral brain surface but is absent in those located on the ventral and medial brain surfaces. This result suggests a common scheme across networks that process different visual object categories. Perhaps this greater response to moving than static stimuli in lateral category-selective areas also extends to lateral regions of the human brain that are not category-selective.
There was no difference in the response to moving and static stimuli in the OFA (Fig. 2). This result is consistent with our prior fMRI study that scanned participants using the same experimental stimuli at 3 Tesla 40 . The absence of a difference between moving and static faces is perhaps surprising given that the area is located on the lateral cortical surface in the inferior occipital gyrus 6 . The OFA is thought to process the component parts of a face and to be the earliest face-selective area in the visual cortical hierarchy 1,19 . This has led to the proposal that the OFA selectively processes the primitive, local and stimulus-driven features of a face and should be grouped as a lateral category-selective area together with LOC and the EBA 17 . However, this prior theory did not consider the differential role of motion in the division of category-selective areas.
The broad variety of cognitive operations performed in the STS has led to a debate concerning the functional specificity of the region. One view takes the modular position that different cognitive operations (e.g. face, body and speech perception) are processed in specialized and distinct cortical regions 46 . Another view proposes that the cortical areas encompassing the STS perform a variety of different cognitive operations that depend on task-dependent network connections 47 . Our data do not address this debate, but further demonstrate that the lateral regions of occipitotemporal cortex, including the STS, are driven strongly by motion.
SCIENTIFIC REPORTS | (2019) 9:8242 | https doi org s www.nature.com/scientificreports www.nature.com/scientificreports/
The differential response to moving and static stimuli in the pSTS we demonstrated is also consistent with a hypothesis that there are two pathways for face recognition, one inferior and one superior, that begin in early visual cortex [48][49][50][51] . The inferior pathway, projecting along the ventral cortical surface, encompasses the OFA and FFA, and is proposed to compute the invariant aspects of a face, such as its identity. The superior pathway, projecting laterally along the STS, is proposed to compute the changeable aspects of a face, including facial expression and direction of eye-gaze. The lack of a significant difference between moving and static faces in both the FFA and OFA also supports this model (Fig. 1). However, in contrast to our data, some prior fMRI studies have reported a higher response to moving than static faces in the FFA 29,52-54 . This discrepancy warrants further investigation, but a recent review of the fMRI face processing literature suggested that differences in experimental stimuli could account for the different results 51 . Specifically, the studies reporting a differential response to moving and static faces in the FFA predominantly used face morphing software to generate the motion elements in the stimuli. By contrast, prior studies 30,38 (as well as the current study) that reported no difference between moving and static faces in the FFA used movies of real faces. It is possible that morphed stimuli do not fully capture the changeable aspects of the human face that are apparent in real-world movies 51 .
The most likely source of motion information into the STS (as well as to the EBA and OPA) is the laterally located motion-selective area V5/MT 24 . Neuroanatomical studies in macaques 55,56 show that V5/MT projects to areas MST and FST, which in turn project to more anterior portions of the STS. A more recent fMRI study in which macaques viewed moving natural stimuli demonstrated that motion, particularly biological motion, accounted for the greatest amount of the neural response in large parts of visual areas, including the STS 57 . In humans, tractography data show a cortical pathway projecting along the lateral surface from occipital cortex, along the STS 49 . Further, our recent combined TMS/fMRI studies show that the response to moving faces in the pSTS and aSTS can be impaired by theta-burst TMS (TBS) delivered over the pSTS 50,58 .
In conclusion, the present study has shown that category-selective regions for faces, bodies and scenes located on the lateral surface of the human brain exhibit a greater response to moving than static stimuli. By contrast, face-, body- and scene-selective regions located on the ventral and medial surfaces exhibit an equal response to moving and static stimuli. This functional dissociation in the response of regions selective for different visual categories, based on brain location, suggests that a response to motion is a common organizing feature in the human brain.

Participants.
A total of 22 right-handed participants (13 females), aged between 22 and 46 years (mean 27.4 years), took part in the study. All participants had normal or corrected-to-normal vision and gave informed written consent before commencing the study. The experimental protocols were approved by the Institutional Review Board (IRB) at the National Institute of Mental Health (NIMH). All methods were carried out in accordance with the guidelines and regulations of the NIMH.
Stimuli. Moving stimuli were 3-second video clips of faces, bodies, scenes, objects and scrambled objects.
These stimuli have been used in prior studies 40,50,[58][59][60] . There were sixty video clips for each category. Videos of faces and bodies were filmed on a black background, and framed close-up to reveal only the faces or bodies of 7 children as they danced or played with toys or adults (both of which were out of frame). Face stimuli depicted close-up videos of the child's face as they performed a range of different actions, including head movement, gaze direction changes, talking (no sound was included) and facial expression changes. Fifteen different locations were used for the scene stimuli, which were mostly pastoral scenes shot from a car window while driving slowly through leafy suburbs, along with some other videos taken while flying through canyons or walking through tunnels that were included for variety. Fifteen different moving objects were selected that minimized any suggestion of animacy of the object itself or of a hidden actor pushing the object; these included mobiles, windup toys, toy planes and tractors, balls rolling down sloped inclines, etc. Scrambled objects were constructed by dividing each object video clip into a 15 by 15 box grid and spatially rearranging the location of each of the resulting video frames. Within each block, stimuli were randomly selected from within the entire set for that stimulus category (faces, bodies, scenes, objects, scrambled objects). This meant that the same video clip could appear within the same block but, given the number of stimuli, this occurred infrequently.
Static stimuli were identical in design to the moving stimuli, except that in place of each 3-second video we presented three different static images taken from the beginning, middle and end of the corresponding video clip. Each image was presented for one second with no inter-stimulus interval, to equate the total presentation time with the corresponding video clip.
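The grid scrambling described above can be illustrated with a minimal sketch. This is not the code used to build the stimuli: the function name `scramble_clip`, the use of NumPy, the fixed random seed, and the choice to apply one tile permutation to every frame of a clip are all assumptions made for illustration.

```python
import numpy as np

def scramble_clip(frames, grid=15, seed=0):
    """Spatially shuffle the grid tiles of a video clip (illustrative only).

    frames: array of shape (n_frames, height, width); height and width must
    be divisible by `grid`. The same permutation is applied to every frame,
    which is an assumption rather than a detail given in the text.
    """
    n, h, w = frames.shape
    if h % grid or w % grid:
        raise ValueError("frame size must be divisible by the grid size")
    th, tw = h // grid, w // grid
    # Cut each frame into grid x grid tiles: (n, grid*grid, th, tw).
    tiles = frames.reshape(n, grid, th, grid, tw).transpose(0, 1, 3, 2, 4)
    tiles = tiles.reshape(n, grid * grid, th, tw)
    # Rearrange tile locations with one random permutation.
    order = np.random.default_rng(seed).permutation(grid * grid)
    tiles = tiles[:, order]
    # Stitch the tiles back into full frames.
    out = tiles.reshape(n, grid, grid, th, tw).transpose(0, 1, 3, 2, 4)
    return out.reshape(n, h, w)

# Toy clip: 2 frames of 30 x 45 pixels with unique pixel values.
clip = np.arange(2 * 30 * 45).reshape(2, 30, 45)
scrambled = scramble_clip(clip)
```

Scrambling in this way keeps every pixel value of the clip and changes only where each tile appears, which is why scrambled clips can serve as a low-level control for the object videos.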
Procedure. Functional data were acquired over 12 blocked-design functional runs lasting 234 seconds each.
Each functional run contained two sets of five consecutive stimulus blocks (faces, bodies, scenes, objects or scrambled objects) sandwiched between fixation rest blocks, giving two blocks per stimulus category per run. Each block lasted 18 seconds and contained stimuli from one of the five stimulus categories. The order of stimulus category blocks in each run was palindromic (e.g. fixation, faces, objects, scenes, bodies, scrambled objects, fixation, scrambled objects, bodies, scenes, objects, faces, fixation) and was randomized across runs. For the moving runs, each 18-second block contained six 3-second video clips from that category. For the static runs, each 18-second block contained 18 one-second still snapshots, composed of six triplets of snapshots taken at one-second intervals from the same video clip. Stimuli were presented using Psychtoolbox and MATLAB running on a MacBook Pro.
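The run structure described above can be checked with a short sketch. It builds one palindromic block sequence and confirms that 13 blocks of 18 seconds give the stated 234-second run, assuming the fixation blocks also lasted 18 seconds (the text does not state their length, but this is consistent with the run duration). The function name `palindromic_run` and the use of Python's `random` module are assumptions for illustration; the experiment itself was presented with Psychtoolbox.

```python
import random

CATEGORIES = ["faces", "bodies", "scenes", "objects", "scrambled objects"]
BLOCK_SECONDS = 18  # stated block length; assumed here for fixation as well

def palindromic_run(rng):
    """Return the block sequence for one run: fixation, the five categories
    in a random order, fixation, the same five reversed, fixation."""
    first_half = CATEGORIES[:]
    rng.shuffle(first_half)
    return (["fixation"] + first_half + ["fixation"]
            + first_half[::-1] + ["fixation"])

blocks = palindromic_run(random.Random(0))
run_seconds = len(blocks) * BLOCK_SECONDS  # 13 blocks * 18 s = 234 s
```

A palindromic order places each category once in each half of the run, so slow signal drift within a run affects all categories roughly equally.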
Video clips were presented at a frame rate of 70 Hz. Video clips and static stimuli were both presented full screen at a visual angle of 19.8 by 15.7 degrees.
Moving and static runs occurred in the following order: 4 moving, 2 static, 2 moving, 2 static, 2 moving. The first 4 runs of the moving stimuli were used to localize the category-selective regions-of-interest (ROIs) (see 'Data Analysis' section). To maintain attention to the stimuli, participants were instructed to press a button when the same stimulus content (e.g. face, body, scene or object) was presented twice in a row (1-back task). On average this occurred once per block. After all functional runs were complete, we collected a high-resolution T1-weighted anatomical scan to localize the functional activations.
Brain imaging and analysis. Participants were scanned using a research-dedicated Siemens 7 Tesla Magnetom scanner in the Clinical Research Center on the National Institutes of Health campus (Bethesda, MD). Brain images were acquired using a 32-channel head coil (42 slices, 1.2 × 1.2 × 1.2 mm; 10% interslice gap, TR = 2 s, TE = 27 ms; matrix size, 170 × 170; FOV, 192 mm). Slices were aligned with the anterior/posterior commissure. In addition, a high-resolution T1-weighted MPRAGE anatomical scan (T1-weighted FLASH, 1 × 1 × 1 mm resolution) was acquired to anatomically localize functional activations. In each scanning session, functional data were acquired over 12 blocked-design functional runs lasting 234 seconds.
Functional MRI data were analyzed using AFNI (http://afni.nimh.nih.gov/afni). Data from the first four TRs from each run were discarded. The remaining images were slice-time corrected and realigned to the third volume of the first functional run and to the corresponding anatomical scan. The volume-registered data were spatially smoothed with a 2-mm full-width half-maximum Gaussian kernel. Signal intensity was normalized to the mean signal value within each run and multiplied by 100 so that the data represented percent signal change from the mean signal value before analysis. A general linear model (GLM) was established by convolving the standard hemodynamic response function with the 5 regressors of interest (one for each stimulus category: faces, bodies, scenes, objects and scrambled objects). Regressors of no interest (e.g., 6 head movement parameters obtained during volume registration and AFNI's baseline estimates) were also included in this GLM.
The first four moving runs were used to define ROIs using the same statistical threshold (p = 10⁻⁴, uncorrected) for all participants. We used moving stimuli to localize ROIs because they have been shown to more robustly activate some category-selective regions across participants 30,40 . In addition, our prior work has shown that the pattern of the response across different stimulus categories within a given ROI does not differ when localized with moving stimuli vs. static stimuli 40 .
Face-selective regions were identified using a contrast of activations evoked by moving faces greater than those evoked by moving objects. Body-selective regions were identified using a contrast of activations evoked by moving bodies greater than those evoked by moving objects. Scene-selective regions were identified using a contrast of activations evoked by moving scenes greater than those evoked by moving objects. Within each functionally defined ROI we then calculated the magnitude of response (percent signal change, or PSC, from a fixation baseline) to the moving and static conditions of each of the five stimulus categories (faces, bodies, scenes, objects and scrambled objects), using the data collected from runs 5 to 12 in which pairs of moving and static runs were alternated. All the data used to calculate PSC were independent of the data used to define the ROIs.
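To make the scaling and regression steps above concrete, here is a minimal NumPy sketch for a single synthetic voxel. It is not AFNI code and does not reproduce AFNI's processing: the double-gamma response shape and its parameters, the synthetic data, the block onsets, and the helper names `scale_to_percent`, `hrf` and `block_regressor` are assumptions chosen for illustration. Only the TR of 2 s, the 18-second block length and the 234-second run length come from the text, and the discarded initial TRs and motion regressors are ignored for simplicity.

```python
import math
import numpy as np

TR = 2.0  # seconds, from the acquisition parameters

def scale_to_percent(ts):
    """Divide a run's time series by its mean and multiply by 100, so the
    run mean is 100 and deviations read as percent signal change."""
    ts = np.asarray(ts, dtype=float)
    return 100.0 * ts / ts.mean()

def hrf(t):
    """Illustrative double-gamma response (assumed shape, not AFNI's)."""
    peak = t ** 5 * np.exp(-t) / math.factorial(5)
    undershoot = t ** 15 * np.exp(-t) / math.factorial(15)
    return peak - undershoot / 6.0

def block_regressor(onsets, duration, n_vols):
    """Boxcar for one stimulus category convolved with the response."""
    boxcar = np.zeros(n_vols)
    for onset in onsets:
        boxcar[int(onset / TR):int((onset + duration) / TR)] = 1.0
    kernel = hrf(np.arange(0.0, 32.0, TR))
    return np.convolve(boxcar, kernel)[:n_vols]

# Toy run: one category shown in two 18-second blocks of a 234-second run.
n_vols = int(234 / TR)
x = block_regressor(onsets=[18.0, 198.0], duration=18.0, n_vols=n_vols)
design = np.column_stack([x, np.ones(n_vols)])  # regressor plus baseline
raw = 1000.0 * (1.0 + 0.02 * x)                 # synthetic voxel signal
y = scale_to_percent(raw)
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Because the synthetic signal rises by 2% per unit of the regressor, the fitted coefficient `beta[0]` comes out close to 2, which illustrates why scaling each run to a mean of 100 lets GLM coefficients be read in percent-signal-change units.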
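The separation between localizer and test data can be laid out explicitly. The sketch below uses the run order given in the Procedure (4 moving, 2 static, 2 moving, 2 static, 2 moving) to list which runs define the ROIs and which supply the PSC estimates. The variable names are hypothetical and the snippet is only a bookkeeping illustration, not part of the reported analysis.

```python
# Run order as stated in the Procedure: (condition, number of runs).
RUN_ORDER = [("moving", 4), ("static", 2), ("moving", 2),
             ("static", 2), ("moving", 2)]

# Expand to one entry per run, numbered from 1 to 12.
runs = []
for condition, count in RUN_ORDER:
    for _ in range(count):
        runs.append({"run": len(runs) + 1, "condition": condition})

# Runs 1 to 4 (all moving) define the ROIs; runs 5 to 12 supply the PSC data.
localizer_runs = [r["run"] for r in runs[:4]]
test_moving = [r["run"] for r in runs[4:] if r["condition"] == "moving"]
test_static = [r["run"] for r in runs[4:] if r["condition"] == "static"]

# No run contributes to both ROI definition and PSC estimation.
overlap = set(localizer_runs) & set(test_moving + test_static)
```

Under this layout the test set contains four moving and four static runs, so the moving versus static comparison within each ROI rests on equal amounts of data that played no part in selecting the voxels.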