The neural representation of personally familiar and unfamiliar faces in the distributed system for face perception

Personally familiar faces are processed more robustly and efficiently than unfamiliar faces. The human face processing system comprises a core system that analyzes the visual appearance of faces and an extended system for the retrieval of person-knowledge and other nonvisual information. We applied multivariate pattern analysis to fMRI data to investigate aspects of familiarity that are shared by all familiar identities and information that distinguishes specific face identities from each other. Both identity-independent familiarity information and face identity could be decoded in an overlapping set of areas in the core and extended systems. Representational similarity analysis revealed a clear distinction between the two systems and a subdivision of the core system into ventral, dorsal and anterior components. This study provides evidence that activity in the extended system carries information about both individual identities and personal familiarity, while clarifying and extending the organization of the core system for face perception.


Movie data fMRI acquisition and preprocessing
Eleven subjects watched the full-length feature film "Raiders of the Lost Ark" divided into eight parts of ∼14 min 20 s duration. Successive parts repeated the final 20 s of the previous part to account for the hemodynamic response; data collected during these overlapping movie segments were discarded from the beginning of each part. Participants viewed the first four parts of the movie in one session, were taken out of the scanner for a short break, and then viewed the remaining four parts. Video was projected onto a rear-projection screen with an LCD projector, which subjects viewed through a mirror mounted on the head coil. The video image subtended a visual angle of ∼22.7° horizontally and ∼17° vertically. Audio was presented through MR-compatible headphones. Subjects were instructed to pay attention to the movie and enjoy it (Guntupalli et al., 2016). Additional details on the MR parameters can be found in the Supplementary material.
Subjects were scanned in a Philips Intera Achieva 3T scanner with an 8-channel head coil at the Dartmouth Brain Imaging Center (original and preprocessed data can be obtained from http://datasets.datalad.org/?dir=/labs/haxby/raiders). Functional scans were acquired with an echo-planar imaging sequence (TR=2.5 s, TE=35 ms, flip angle=90°, 80 x 80 matrix, FOV=240 mm x 240 mm) every 2.5 s with whole-brain coverage (41 interleaved axial slices, 3 mm thick). We acquired a total of 2718 functional scans, with 1350 TRs in four runs during the first session and 1368 TRs in four runs during the second session. T1-weighted anatomical scans were acquired at the end of each session (MPRAGE, TR=9.85 s, TE=4.53 ms, flip angle=8°, 256 x 256 matrix, FOV=240 mm, 160 sagittal slices, 1 mm thick). The voxel resolution was 0.938 mm x 0.938 mm x 1.0 mm. Each subject's fMRI data were preprocessed using AFNI software (Cox, 1996; http://afni.nimh.nih.gov).
Functional data were corrected for the order of slice acquisition and then for head motion by aligning to the last volume of the last functional run. Spikes in the data were removed using 3dDespike in AFNI. Data were then filtered using 3dBandpass in AFNI to remove temporal signal variation slower than 0.00667 Hz or faster than 0.1 Hz, as well as components correlated with the whole-brain average signal or the head-movement parameters. The residual data were then aligned to the MNI 152 brain template using nearest-neighbor resampling and spatially smoothed with a 4 mm full-width-at-half-maximum (FWHM) Gaussian filter. Data acquired during the overlapping movie segments were discarded, resulting in a total of 2662 TRs, with 1326 TRs in the first session and 1336 TRs in the second session. We derived a gray matter mask by segmenting the MNI_avg152T1 brain provided in AFNI and removing any voxel that was outside the cortical surface by more than twice the thickness of the gray matter at each surface node. The mask included 54,034 3 mm isotropic voxels across both hemispheres (Guntupalli et al., 2016).

Permutations for Identity and Familiarity decoding
Statistical significance of the decoding analyses was computed using permutation testing (Stelzer, Chen, & Turner, 2013) coupled with Threshold-Free Cluster Enhancement (Smith & Nichols, 2009), as implemented in CoSMoMVPA (Oosterhof, Connolly, & Haxby, 2016). The identity decoding analysis was performed separately for Familiar and Unfamiliar identities, and permuted datasets were obtained by randomly permuting the four identity labels separately within each of the 11 runs. Out of the (4!)^11 possible permuted maps, we randomly selected 20 for each subject, and bootstrapped them to generate 10,000 group-level permuted maps (see Methods in the main text, and Stelzer et al., 2013).
For the familiarity decoding, the cross-validation scheme was performed across identities and not across runs. Because each identity had a clear association with a superordinate category that indicated the familiarity of the stimulus (Familiar or Unfamiliar), during permutation it was important not to divide the identity labels across runs into two different familiarity groups. Thus, permutation had to be performed at the level of identities rather than at the superordinate level of familiarity to avoid a possible positive bias. Because the decoding problem was binary, the number of valid permutations within each subject was much lower than in the identity decoding. For example, consider the eight labels f1, f2, f3, f4, u1, u2, u3, u4, where fi indicates familiar identities and ui indicates unfamiliar identities. Any permutation that maps all familiar identities to unfamiliar identities and vice versa, such as fi → uσ(i), where σ is any permutation of the integers 1 to 4, will result in the same accuracy maps as the original labeling, because the order of the classes "Familiar" and "Unfamiliar" does not matter. Thus, at least one identity label must be kept fixed to break this symmetry. Because the order of the identity labels within each class also does not matter, without loss of generality one can consider the first familiar identity label f1 as fixed, with the remaining seven identity labels to be determined. Once three of these labels have been assigned to the same class as f1, the remaining four are automatically determined (because the order within each class does not matter). Thus, the total number of distinct permutations for this binary classification is C(7, 3) = 35 (7 choose 3). We note that the identity permutation is part of this set of possible permutations. Thus, we performed the familiarity decoding on these 35 permutations, and proceeded as described before to obtain null distribution maps.
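The counting argument for the 35 valid permutations can be verified directly; a minimal sketch in Python (the label names f1–f4 and u1–u4 are illustrative, not taken from the analysis code):

```python
from itertools import combinations

identities = ["f1", "f2", "f3", "f4", "u1", "u2", "u3", "u4"]

# A permuted familiarity labeling is fully determined by the set of four
# identities assigned to the same class as f1: swapping the two class names,
# or reordering identities within a class, yields an equivalent labeling.
# Fixing f1, we choose its three class companions from the remaining seven.
labelings = [frozenset(("f1",) + trio) for trio in combinations(identities[1:], 3)]

assert len(labelings) == 35                            # C(7, 3)
assert frozenset(["f1", "f2", "f3", "f4"]) in labelings  # original labeling included
print(len(labelings))
```

The original (identity) labeling corresponds to the set {f1, f2, f3, f4}, confirming that it is one of the 35 distinct permutations.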

Early visual cortex ROI analysis and HMAX features
To further analyze the familiarity decoding results found in early visual cortex, we performed additional analyses on the neural data, selecting early visual area masks from the probabilistic atlas by Wang, Mruczek, Arcaro, & Kastner (2015), as well as on visual features extracted from the face images using the HMAX model (Riesenhuber & Poggio, 1999; Serre, Wolf, Bileschi, Riesenhuber, & Poggio, 2007). All decoding analyses were performed with a linear SVM classifier, as in the analyses reported in the main text.
For the neural data, we selected the bilateral V1v, V1d, V2v, V2d, V3v, and V3d masks, downsampled to the MNI 2 mm template, and created eight masks: V1v, V1d, V1; V2v, V2d, V2; V3v, V3d, V3; V1+V2; V1+V2+V3. We selected the voxels in these masks, and for each ROI we performed the familiarity decoding analysis as follows. We cross-validated across identities, as described in the Methods in the main text. Within each training fold, we performed feature selection by keeping only the top 10% of voxels according to the F-value of a one-way ANOVA. For statistical assessment, we permuted the labels (as described in the Methods of the main text) and repeated the same analysis, obtaining 35 permuted accuracy values for each ROI and subject. Then, to bootstrap a population-level statistic, we randomly selected one permuted accuracy value for each subject and each ROI, averaged these values to obtain a group mean accuracy, and repeated this process 10,000 times to produce a null distribution of accuracy values (Stelzer et al., 2013). The empirical p-value was then computed as the number of null-distribution accuracies higher than the observed accuracy divided by the total number of bootstraps; we added 1 to both the numerator and the denominator to account for the observed value itself (Ojala & Garriga, 2010). Supplementary Figure 7 shows the results of this analysis.
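The group-level bootstrap just described can be sketched as follows (a minimal illustration with a hypothetical function name and toy array shapes, not the original analysis code):

```python
import numpy as np

def bootstrap_p_value(observed_mean, permuted, n_boot=10_000, seed=0):
    """Group-level bootstrap p-value in the spirit of Stelzer et al. (2013):
    draw one permuted accuracy per subject, average across subjects, repeat
    n_boot times, and compare the observed group mean to the null distribution.

    permuted: (n_subjects, n_permutations) array of permuted accuracies
    (here, 35 permutations per subject and ROI).
    """
    rng = np.random.default_rng(seed)
    n_subjects, n_perms = permuted.shape
    draws = rng.integers(n_perms, size=(n_boot, n_subjects))
    null_means = permuted[np.arange(n_subjects), draws].mean(axis=1)
    # +1 in numerator and denominator to include the observed value
    # (Ojala & Garriga, 2010)
    return (np.sum(null_means >= observed_mean) + 1) / (n_boot + 1)
```

With chance-level permuted accuracies and a clearly above-chance observed accuracy, the returned p-value reaches its floor of 1/(n_boot + 1).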
For the HMAX model features, we used the pre-trained model obtained from http://maxlab.neuro.georgetown.edu/hmax.html, and extracted C1 and C2 features for all the images used in the experiment, stacking them across patch sizes and orientations. Images were resized to 200 x 200 pixels to reduce computation time. For each subject separately (using only the images each subject saw in the experiment), we performed a classification analysis on Familiar vs. Unfamiliar face images across identities, and obtained 33 accuracies (one for each subject). We then generated a null distribution of accuracy values at the group level with the same process used with neural data (see above). Supplementary Figure 8 shows the result of this analysis.

Quantification of similarities within and between core and extended systems
To further quantify the similarity of representations within the systems, we compared the average within-system correlations with the average between-system correlations.
We first considered only the areas belonging to the core and extended systems (without EV areas). Then, for each one of the distance matrices (corresponding to each subject for the task data, and to each pair of subjects for the movie data), we removed the diagonal, averaged the cells corresponding to the within-system correlations (core or extended system ROIs) and to the between-system correlations (core system ROIs correlated with extended system ROIs), and computed the difference between these two values. These differences were bootstrapped 10,000 times to obtain 95% BCa (bias-corrected and accelerated) confidence intervals (DiCiccio & Efron, 1996). This process was repeated for the further subdivisions of the core system (anterior, dorsal, and ventral) and of the extended system (theory of mind, precuneus), considering only ROIs belonging to the core or extended system, respectively: for example, the between-system correlations for the anterior core system were computed only between the anterior core ROIs and the dorsal and ventral core ROIs.
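This within-minus-between difference and its BCa interval can be sketched as follows (toy data and illustrative helper names; `scipy.stats.bootstrap` with `method='BCa'` computes the bias-corrected and accelerated interval of DiCiccio & Efron, 1996):

```python
import numpy as np
from scipy.stats import bootstrap

def within_between_diff(mat, system_labels):
    """Mean within-system minus mean between-system correlation for one
    ROI-by-ROI similarity matrix, excluding the diagonal."""
    mat = np.asarray(mat, float)
    labels = np.asarray(system_labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return mat[same & off_diag].mean() - mat[~same].mean()

# One within-minus-between difference per subject (hypothetical values)
rng = np.random.default_rng(0)
diffs = rng.normal(0.1, 0.05, size=33)

res = bootstrap((diffs,), np.mean, n_resamples=10_000,
                confidence_level=0.95, method='BCa', random_state=rng)
low, high = res.confidence_interval
```

A system is flagged in the tables when the resulting 95% interval excludes 0, i.e. `low > 0` in this sketch.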
Comparison of second-order representational geometries obtained from task data and movie data
We quantified the degree of similarity of the distance matrices used to generate the two MDS plots (one for the task data, and one for the movie data) by correlating (Spearman correlation) the upper triangular parts of the distance matrices, as well as by computing the RV-coefficient, a measure of similarity between square matrices (Abdi, 2007, 2010; Robert & Escoufier, 1976). Before computing the RV-coefficient, the distance matrices were normalized as follows (Abdi, Williams, Valentin, & Bennani-Dosse, 2012). Given a distance matrix D, we computed a cross-product matrix S = −(1/2) C D C, where C = I − (1/n) O is the centering matrix, I is the identity matrix, and O is an n x n matrix of ones. S was then normalized by its first eigenvalue λ1, resulting in S̃ = S / λ1. We then computed the RV-coefficient between the two normalized cross-product matrices. To obtain 95% confidence intervals, we bootstrapped the individual subjects' distance matrices for the task data, and computed both the correlation and the RV-coefficient against the movie distance matrix.
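The normalization and the RV-coefficient can be sketched as follows (a minimal illustration assuming symmetric distance matrices; function names are ours):

```python
import numpy as np

def normalized_crossproduct(dist):
    """Double-center a distance matrix into a cross-product matrix and scale
    it by its first (largest) eigenvalue (Abdi et al., 2012)."""
    d = np.asarray(dist, float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    s = -0.5 * centering @ d @ centering
    return s / np.linalg.eigvalsh(s)[-1]  # eigvalsh sorts ascending

def rv_coefficient(a, b):
    """RV-coefficient between two square matrices (Robert & Escoufier, 1976)."""
    num = np.trace(a @ b.T)
    den = np.sqrt(np.trace(a @ a.T) * np.trace(b @ b.T))
    return num / den
```

For identical inputs the RV-coefficient is 1, and for positive semi-definite cross-product matrices it lies between 0 and 1.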
To visualize the match between the MDS solutions obtained from the two datasets, we performed Procrustes alignment to obtain a similarity transformation (translation, uniform scaling, and rotation/reflection) aligning the movie solution to the task solution, and visualized the two solutions in the first two dimensions (see Supplementary Figure 13).
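Such an alignment can be sketched with `scipy.spatial.procrustes` (toy coordinates, not the actual MDS solutions; SciPy standardizes both configurations before fitting the rotation and scaling):

```python
import numpy as np
from scipy.spatial import procrustes

# Hypothetical 2-D MDS solutions for the same ROIs (rows) from the task and
# movie data: the movie solution is a rotated, scaled, shifted copy.
rng = np.random.default_rng(0)
task_mds = rng.normal(size=(30, 2))
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
movie_mds = 1.7 * task_mds @ rotation.T + 0.3

aligned_task, aligned_movie, disparity = procrustes(task_mds, movie_mds)
# disparity is the sum of squared pointwise differences after alignment;
# it is near zero here because movie_mds is an exact similarity transform.
```

The two aligned configurations can then be overlaid in the first two dimensions, as in Supplementary Figure 13.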

Model-based RSA within systems
We performed model-based RSA within each ROI and for each subject separately using the task data. For each ROI we computed a neural RDM and correlated it (Spearman partial correlation) with four target model RDMs: a familiarity RDM, an identity RDM, and two RDMs obtained from the C1 and C2 layers of the HMAX model (Riesenhuber & Poggio, 1999) to control for low-level visual differences. The correlations were then averaged across the ROIs in each system, and visualized in Supplementary Figure 15.
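One way to compute such a partial Spearman correlation between RDM vectors (a sketch with illustrative names, not the original analysis code): rank-transform all RDMs, regress the control models out of both the neural and target RDMs, and correlate the residuals.

```python
import numpy as np
from scipy.stats import rankdata

def partial_spearman(neural, target, controls):
    """Partial Spearman correlation between two RDM vectors (e.g. upper
    triangles of RDMs), controlling for a list of model RDM vectors."""
    ranked = [rankdata(v) for v in (neural, target, *controls)]
    n_r, t_r, *c_r = ranked
    design = np.column_stack([np.ones_like(n_r)] + c_r)

    def residualize(y):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return y - design @ beta

    rn, rt = residualize(n_r), residualize(t_r)
    return float(rn @ rt / np.sqrt((rn @ rn) * (rt @ rt)))
```

Because only ranks enter the computation, the result is invariant to any monotonic transformation of the input RDMs.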

Distances of EV ROIs from dorsal and ventral systems
We analyzed the position of the EV ROIs with respect to the ventral and dorsal streams by computing a "Dorsal" index, defined as the difference d(ventral, EV) − d(dorsal, EV), where d(ventral, EV) and d(dorsal, EV) indicate the average distance to the EV of the ventral and dorsal stream, respectively. A positive index indicates that the EV is farther from the ventral stream, while a negative index indicates that the EV is closer to the ventral stream. We performed this analysis separately for each hemisphere and for the four EV ROIs using the task data, before any dimensionality reduction with MDS, to allow for statistical testing.

Tables

Table 1. MNI coordinates of the ROI centers, sorted and color-coded with respect to the system they belong to, according to the models of Duchaine & Yovel (2015), Gobbini & Haxby (2007), Guntupalli, Wheeler, & Gobbini (2017), and Haxby, Hoffman, & Gobbini (2000). The abbreviation in parentheses refers to the statistical map that was used to individuate each ROI: FAM, familiarity decoding; ID, identity decoding (see Methods in the main text for details). See also http://neurovault.org/collections/NEUNABLT/images/46823/ for the position of the ROIs.

Table 4. Difference of correlations for within- and between-system ROIs with the task data. Correlations for the subsystems of the core and extended systems were computed using only core and extended system ROIs, respectively. Labels and differences in bold indicate systems whose 95% bootstrapped confidence intervals did not contain 0.

Table 5. Difference of correlations for within- and between-system ROIs with the movie data. Correlations for the subsystems of the core and extended systems were computed using only core and extended system ROIs, respectively. Labels and differences in bold indicate systems whose 95% bootstrapped confidence intervals did not contain 0.

Figures

Figure 1. Comparison of cross-validation schemes for the familiarity decoding.
The x-axis shows z-values from the familiarity classification using the leave-2-identities-out scheme, as reported in the main manuscript. The y-axis shows the same classification using a common leave-one-run-out scheme. Colors of the points map the location of each voxel in the axial plane, as shown in the inset. The leave-2-identities-out scheme successfully controls for identity information, as can be seen by the overall lower z-values for voxels belonging to the occipital cortex. The maps used to generate this plot are http://neurovault.org/collections/NEUNABLT/images/46809/ (x-axis) and http://neurovault.org/collections/NEUNABLT/images/46812/ (y-axis).

Comparison of cross-validation schemes for the familiarity decoding within the early visual areas of the probabilistic atlas by Wang et al. (2015). The x-axis shows z-values from the familiarity classification using the leave-2-identities-out scheme, as reported in the main manuscript. The y-axis shows the same classification using a common leave-one-run-out scheme. Colors of the points map the location of each voxel in the axial plane, as shown in the inset. The leave-2-identities-out scheme successfully controls for visual information, as can be seen by the overall lower z-values for voxels belonging to the early visual areas. The maps used to generate this plot are http://neurovault.org/collections/NEUNABLT/images/46809/ (x-axis) and http://neurovault.org/collections/NEUNABLT/images/46812/ (y-axis).

Maps were thresholded at a z-TFCE score of 1.65, corresponding to p < 0.05 one-tailed (corrected for multiple comparisons).

C1 and C2 features of the HMAX model (Riesenhuber & Poggio, 1999; Serre et al., 2007) were used to classify the images used in the experiment, using a leave-two-identities-out cross-validation scheme (see Supplementary Methods and Methods in the main text). The dashed black line shows 50% accuracy. The dashed blue line shows the average accuracy across participants in the original (non-permuted) dataset (indicated as M in the inset).

Because the MDS solution is invariant to orientation, the second dimension was multiplied by -1 to place the precuneus ROIs on top.
In the figure in the main text, labels were minimally jittered to avoid overlaps and increase legibility; here we report the original, non-jittered positions.

Partial Spearman correlation between the neural RDMs and the model RDMs, averaged across the ROIs of each system. The top row shows the results for the familiarity and identity models, while the bottom row shows the results for the models obtained from the C1 and C2 features of the HMAX model (Riesenhuber & Poggio, 1999).