Main

Our experience of the world is profoundly shaped by memory. Whether we are shopping for a list of items at the grocery store or talking to friends at a social gathering, our actions depend critically on remembering a large number of visual objects. Multiple studies have explored the molecular12,13 and cellular14,15 basis for memory, but the network-level code remains elusive. How is a familiar song, place or face encoded by the activity of neurons?

Recent work on the sensory code for visual object identity in the inferotemporal (IT) cortex suggests that objects are encoded as points in a continuous, low-dimensional object space, with single IT neurons linearly projecting objects onto specific preferred axes2,3,4 (Fig. 1a, left). These axes are defined by weightings of a small set of independent parameters spanning the object space. This coding scheme (also referred to as linear mixed selectivity16,17, and related to disentangled representations in machine learning18) is efficient, allowing a huge number of different objects to be represented by a small number of neurons. Indeed, the axis code carried by macaque face patches allows detailed reconstruction of random realistic faces using activity from only a few hundred neurons3.

Fig. 1: Cells in face patches are modulated by familiarity.
figure 1

a, Two alternative schemes for face representation: a low-dimensional (low-d) continuous feature space (left) and a set of discrete attractors (right). b, Left, view preference test. Pairs of faces (n = 72), one familiar and one unfamiliar, were presented for 10 s and the time spent fixating each was recorded. Right, ratio of time spent fixating personally familiar versus unfamiliar faces for two animals (each dot represents one face pair). Error bar, mean ± s.e.m. c, Responses of cells to stimuli from six stimulus categories (familiar human faces, unfamiliar human faces, familiar monkey faces, unfamiliar monkey faces, familiar objects and unfamiliar objects) across three face patches (AM, PR and TP). Responses were averaged between 50 and 300 ms following stimulus onset (‘full’-response window; AM, n = 152 cells; PR, n = 171 cells; TP, n = 266 cells; Supplementary Methods and Supplementary Table 1 provide additional statistical information). d, Same as c but for the ‘short’-response window (AM, 50–125 ms; PR, 50–150 ms; TP, 50–150 ms). e, Similarity (Pearson correlation coefficient) matrix of population responses for full-response window. f, Same as e but for short-response window. g, Left, average response time course across AM (top), PR (middle) and TP (bottom) populations to each of the screening stimuli. Right, AM, PR and TP response time courses averaged across both cells and category exemplars (normalized for each cell; Supplementary Methods). Left-hand arrow indicates mean time when visual responses to faces became significantly higher than baseline (AM, 85 ms; PR, 105 ms; TP, 75 ms; Supplementary Methods); right-hand arrow indicates mean time when responses to familiar versus unfamiliar faces became significantly different (AM, 105 and 145 ms for human and monkey faces, respectively; PR, 155 and 215 ms, respectively; TP, 145 and 205 ms, respectively). Shaded areas, s.e.m. across neurons.

Source Data

Here we set out to leverage recent insight into the detailed sensory code for facial identity in IT cortex3 to explore the population code for face memories. A long-standing assumption of neuroscience is that long-term memories are stored by the same cortical populations that encode sensory stimuli1. This suggests that the same neurons that carry a continuous, axis-based, object-coding scheme should also support tagging of a discrete set of remembered objects as familiar. However, schemes for representing discrete familiar items often invoke attractors19,20 that would lead to breakdowns in continuous representation (Fig. 1a, right). This raises a key question: does familiarity alter the IT axis code for facial identity? We surmised that discovering the answer might uncover the neural code for face memory.

Previous studies have generally found decreased and sparsened responses to familiar stimuli in IT and perirhinal cortex and have proposed that this decrease, or ‘repetition suppression’, is the neural correlate of object memory5,6,7,8,9,10,11. However, these studies were not targeted to specific subregions of IT cortex known to play a causal role in discrimination of the visual object class being studied21 and where the visual feature code is precisely understood3. Here, to study the neural mechanism that represents long-term object memories, we targeted three regions: anterior medial face patch (AM), the most anterior face patch in IT cortex22, and PR and TP, two recently reported face patches in the perirhinal cortex and anterior temporal pole, respectively23,24. These three regions lie at the apex of the macaque face patch system, an anatomically connected network of regions in the temporal lobe dedicated to face processing22,25,26,27,28,29. AM harbours a strong signal for invariant facial identity3,22, perirhinal cortex is known to play a critical role in visual memory30,31,32,33 and TP has recently been suggested to provide a privileged pathway for rapid recognition of familiar individuals24. We thus hypothesized that a representation of face memory should occur in AM, PR and/or TP.

Our recordings showed that, in all three patches, familiar faces were distinguished from unfamiliar faces. First, in all three patches, familiar faces were represented in a subspace distinct from unfamiliar faces. Second, in all three patches the relative response magnitude to familiar faces differed significantly from that to unfamiliar faces; however, the sign of this difference was not stable and depended strongly on the relative frequency of presentation of familiar and unfamiliar faces (that is, temporal context). Third, and most strikingly, in AM and PR, but not in TP, familiar faces were encoded by a unique geometry at long latency; furthermore, unlike response magnitude, this unique geometry associated with familiar faces was stable across contexts. These results suggest that the memory of familiar faces is primarily represented in face patches AM and PR through axis change rather than altered response magnitude. This conclusion—that a major piece of the network code for visual memory is temporally multiplexed with the perceptual code and activated only at long latency—sheds light on how we can both veridically perceive visual stimuli and recall past experiences from them using the same set of neurons.

AM and PR are modulated by familiarity

We identified face patches AM, PR and TP in five animals using functional magnetic resonance imaging25. To characterize the role of familiarity in modulating the responses of cells in AM, PR and TP, we targeted electrodes to these three patches (Extended Data Fig. 1) and recorded responses to a set of screening stimuli consisting of human faces, monkey faces and objects. The stimuli were either personally familiar or unfamiliar (Extended Data Fig. 2a), with eight or nine images per category. Personally familiar images depicted people, monkeys and objects with which the animals interacted on a daily basis; a new set of unfamiliar images was presented per recording site. Animals showed highly significant preferential looking towards the unfamiliar face stimuli and away from familiar face stimuli (Fig. 1b), confirming behaviourally that these stimuli were indeed familiar to the monkey34. Monkeys also performed significantly better on a face identification task for familiar compared with unfamiliar faces (Extended Data Fig. 3a,b), indicating a behavioural recognition advantage for familiar faces.

Across the population, 93% of cells in AM, 74% in PR and 88% in TP were face selective (Extended Data Fig. 3c). Below, we group data from three monkeys for AM, three for PR and two for TP because we did not find any marked differences between individuals (Extended Data Figs. 4 and 5 show the main results separately for each animal). All three patches exhibited a significantly stronger response across the population to unfamiliar compared with personally familiar stimuli in this experiment (Fig. 1c). This is inconsistent with a recent study reporting that TP is specialized for representing personally familiar faces24 (however, the latter study never actually presented unfamiliar faces but contrasted responses only to personally versus pictorially familiar faces; Extended Data Fig. 6a–c provides further detail). Further casting doubt on a specialized role for TP in encoding personally familiar faces, we found that the response in TP to faces of other species was stronger than to human or monkey faces (Extended Data Fig. 6d–f). Overall, the pattern of decreased responses to familiar faces across AM, PR and TP is consistent with a large number of previous studies reporting suppression of responses to familiar stimuli in IT and perirhinal cortex5,6,7,8,9,10. Individual cells showed a diversity of selectivity profiles for face species and familiarity type (Extended Data Fig. 7a–c). Representation similarity matrices showed distinct population representations of the six stimulus classes in both AM and PR, and more weakly in TP (Fig. 1e).

Mean responses to familiar versus unfamiliar faces diverged over time, with the difference becoming significant at 125 ms in AM, 185 ms in PR and 175 ms in TP; the mean visual response to faces themselves significantly exceeded baseline earlier, at 85 ms in AM, 105 ms in PR and 75 ms in TP (Fig. 1g). This delayed suppression to familiar faces is consistent with previous reports of delayed suppression to familiar stimuli in IT5,7,8,9. Single-cell response profiles and representation similarity matrices computed using a short time window showed less distinct responses to familiar versus unfamiliar stimuli (Fig. 1d,f). Overall, the results so far show that AM, PR and TP all exhibit long-latency suppression to familiar faces.

An axis code for unfamiliar faces

Responses of AM, PR and TP cells to familiar stimuli, although lower on average at long latencies, remained highly heterogeneous across faces (Fig. 1c and Extended Data Fig. 7a–c), indicating that cells were driven by both familiarity and identity. We next asked how familiarity interacts with the recently discovered axis code for facial identity3.

According to this axis code, face cells in IT compute a linear projection of incoming faces formatted in shape and appearance coordinates onto specific preferred axes3. For each cell, the preferred axis is given by the coefficients c in the equation r = c·f + c0, where r is the response of the cell, f is a vector of shape and appearance features and c0 is a constant offset (Supplementary Methods); shape features capture variations in the location of key facial landmarks (for example, outline, eye, nose and mouth positions and so on) whereas appearance features capture the shape-independent texture map of a face3. Together, a population of face cells with different preferred axes encodes a face space that is embedded as a linear subspace of the neural state space. The axis code has so far been examined only for unfamiliar faces. By studying whether and how this code is modified by familiarity, we reasoned that we could potentially understand the code for face memory.
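To make the axis model concrete, the sketch below (our own Python illustration, not the authors' analysis code; the synthetic data and variable names are hypothetical) fits the equation r = c·f + c0 for a single cell by least squares and recovers its preferred axis as the unit vector along c.

```python
import numpy as np

# Minimal sketch of the axis model r = c . f + c0 for one cell,
# fitted by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
n_faces, n_features = 1000, 20            # e.g. top 10 shape + top 10 appearance features
features = rng.standard_normal((n_faces, n_features))          # f for each face
true_axis = rng.standard_normal(n_features)
rates = features @ true_axis + 5.0 + rng.standard_normal(n_faces)   # r, with noise

# Fit [c, c0] jointly by appending a constant column
X = np.column_stack([features, np.ones(n_faces)])
coef, *_ = np.linalg.lstsq(X, rates, rcond=None)
c, c0 = coef[:-1], coef[-1]

# The preferred axis is the direction of c in face feature space
preferred_axis = c / np.linalg.norm(c)
print(preferred_axis @ (true_axis / np.linalg.norm(true_axis)))  # ~1.0 for this synthetic cell
```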

We first asked whether face cells encode familiar and unfamiliar faces using the same axis. To address this, we examined tuning to unfamiliar faces (described in this section) and then compared this with tuning to familiar faces (described in the next section). We began by mapping the preferred axes of AM, PR and TP cells using a set of 1,000 unfamiliar monkey faces (Extended Data Fig. 2b). We used monkey faces because, on average, responses to the screening stimuli were stronger for monkey than for human faces across AM, PR and TP (Fig. 1c; P < 4 × 10⁻⁶, two-sided paired t-test, t = −4.68, degrees of freedom = 588, difference = 0.75 Hz, 95% confidence interval = [0.44, 1.07], n = 589 cells pooled across AM, PR and TP). The 1,000 monkey faces were randomly drawn from a monkey face space defined by 120 parameters (Supplementary Methods) encompassing a wide variety of identities, allowing the selection of a subset that was matched in feature distributions to the familiar faces (Extended Data Fig. 8).

As expected, cells in AM showed ramp-shaped tuning along their preferred axes (Fig. 2a and Extended Data Fig. 3e). Interestingly, a large proportion of cells in PR and TP also showed ramp-shaped tuning along their preferred axes (Fig. 2a and Extended Data Fig. 3e). To our knowledge this is the first time that axis coding of visual features has been reported for face patches outside the IT cortex. In all three patches, preferred axes computed using split halves of the data were highly consistent (Extended Data Fig. 3f). These results suggest that AM, PR and TP share a common axis code for representing unfamiliar faces.

Fig. 2: AM and PR cells use different axes to represent familiar versus unfamiliar faces.
figure 2

a, Three example cells showing axis tuning. Top, mean response as a function of distance along the preferred axis. Green (yellow) dots denote responses to eight random unfamiliar (nine personally familiar) faces. Error bar, s.e.m. Bottom, responses to 1,000 unfamiliar faces, projected onto the cell’s preferred axis and principal (longest) orthogonal axis in the face feature space. Response magnitudes are colour coded. b, Population analysis comparing preferred axes for familiar versus unfamiliar faces. Distributions of cosine similarities between axes computed using 1,000 − 36 unfamiliar faces and 36 omitted unfamiliar faces (orange), and between axes computed using 1,000 − 36 unfamiliar faces and 36 familiar faces (blue), are shown. Preferred axes were computed using the top ten shape and top ten appearance features of presented faces. Inset, control experiment with 36 low-contrast faces in place of 36 familiar faces. c, Time course of cosine similarity between preferred axes for unfamiliar–unfamiliar (orange) and unfamiliar–familiar (blue) faces as in b. Arrowheads indicate when differences became significant (AM, 105 ms; PR, 175 ms; TP, 255 ms; one-tailed t-test, P < 0.001; AM, n = 134 cells; PR, n = 72 cells; TP, n = 197 cells; Supplementary Methods and Supplementary Table 1 give additional information about statistical tests). Shaded areas, s.e.m. d, Time course of linear decoding of facial features (Supplementary Methods). Shaded areas, s.e.m. Lighter colour, same analysis using stimulus identity-shuffled data (ten repeats). e, Example linearly reconstructed faces from short (120–170 ms) and long (220–270 ms) latency responses combining cells from both AM and PR. Reconstructions were performed using linear decoders trained on a large set of unfamiliar faces, similar to d (Supplementary Methods).

Source Data

Off-axis responses to familiar faces

We next examined how familiarity modulates the axis code. We projected the features of personally familiar and a random subset of unfamiliar faces onto the preferred axis of each AM/PR/TP cell and plotted responses. In AM and PR, responses to unfamiliar faces followed the axis (Fig. 2a, green dots) whereas, strikingly, responses to familiar faces departed from the axis (Fig. 2a, yellow dots).

This departure in AM and PR was not a simple gain change: the strongest responses to familiar faces were often to faces projecting somewhere in the middle of the ramp rather than at its end (Fig. 2a). It therefore cannot be explained by an attentional increase or decrease in response to familiar faces, which would produce a gain change35. Indeed, the effect cannot be explained by any monotonic transform of the response, such as repetition suppression or monotonic sparsening8,10, because any such transform would preserve the rank ordering of preferred stimuli.
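To spell out this reasoning, the toy example below (our illustration, not an analysis from the paper) shows that any monotonic transform of a cell's response preserves the rank order of stimuli, whereas responses generated by a different axis do not.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy illustration: monotonic transforms (e.g. gain change, saturation,
# repetition suppression) preserve rank order; an axis change does not.
rng = np.random.default_rng(1)
axis = rng.standard_normal(20)
faces = rng.standard_normal((36, 20))

r = faces @ axis                               # responses along the original axis
suppressed = 0.5 * np.tanh(0.8 * r)            # an arbitrary monotonic, suppressive transform
r_new_axis = faces @ rng.standard_normal(20)   # responses along a different axis

print(spearmanr(r, suppressed)[0])             # 1.0: rank ordering preserved
print(spearmanr(r, r_new_axis)[0])             # near 0: rank ordering not preserved
```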

The surprising finding of off-axis responses to familiar faces was prevalent across the AM and PR populations, but not in TP. To quantify this phenomenon at the population level, we first created a larger set of familiar faces. To this end, animals were shown face images and videos daily for at least 1 month, resulting in a total of 36 familiar monkey faces, augmenting the nine personally familiar monkey faces in our initial screening set (Extended Data Fig. 2c and Supplementary Methods). Preferential looking tests confirmed that pictorially and cinematically familiar faces were treated similarly to the personally familiar faces (Extended Data Fig. 3d). These 36 familiar faces were presented randomly interleaved with the 1,000 unfamiliar monkey faces while we recorded from AM, PR and TP.

We computed preferred axes for cells using responses to the 36 familiar faces. We found that, when familiar and unfamiliar faces were matched in number (36), familiar axes performed as well in explaining responses to familiar faces as unfamiliar axes in explaining responses to unfamiliar faces (Extended Data Fig. 3g,h). The comparable strength of axis tuning for familiar and unfamiliar faces naturally raised the question: are familiar and unfamiliar axes the same?

To compare familiar and unfamiliar axes, for each cell we first computed the preferred axis using responses to the large set of unfamiliar faces (1,000 − 36 faces). We then correlated this to a preferred axis computed using responses to either (1) the set of 36 familiar faces (‘unfamiliar–familiar’ condition) or (2) the omitted set of 36 unfamiliar faces (‘unfamiliar–unfamiliar’ condition). The distribution of correlation coefficients showed significantly higher similarities for the unfamiliar–unfamiliar compared with the unfamiliar–familiar condition in AM and PR, but not in TP (Fig. 2b).
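A minimal sketch of this comparison is given below (our illustration for a single cell; the actual analysis, statistics and time windows are described in the Supplementary Methods, and all names here are hypothetical). It estimates a reference axis from 1,000 − 36 unfamiliar faces and compares it, by cosine similarity, with axes estimated from either 36 held-out unfamiliar faces or 36 familiar faces.

```python
import numpy as np

def preferred_axis(features, rates):
    """Unit-norm least-squares estimate of a cell's preferred axis."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, rates, rcond=None)
    c = coef[:-1]
    return c / np.linalg.norm(c)

def axis_similarities(feats_unfam, rates_unfam, feats_fam, rates_fam, rng):
    """Return (unfamiliar-unfamiliar, unfamiliar-familiar) cosine similarities for one cell.
    feats_*: (n_faces, n_features) feature matrices; rates_*: (n_faces,) responses."""
    held_out = rng.choice(len(feats_unfam), size=36, replace=False)
    rest = np.setdiff1d(np.arange(len(feats_unfam)), held_out)

    ref_axis = preferred_axis(feats_unfam[rest], rates_unfam[rest])            # 1,000 - 36 unfamiliar faces
    unfam_axis = preferred_axis(feats_unfam[held_out], rates_unfam[held_out])  # 36 omitted unfamiliar faces
    fam_axis = preferred_axis(feats_fam, rates_fam)                            # 36 familiar faces

    return float(ref_axis @ unfam_axis), float(ref_axis @ fam_axis)

# Synthetic check: a cell whose familiar axis genuinely differs from its unfamiliar axis
rng = np.random.default_rng(2)
feats_unfam, feats_fam = rng.standard_normal((1000, 20)), rng.standard_normal((36, 20))
axis_u, axis_f = rng.standard_normal(20), rng.standard_normal(20)
rates_unfam = feats_unfam @ axis_u + rng.standard_normal(1000)
rates_fam = feats_fam @ axis_f + rng.standard_normal(36)
print(axis_similarities(feats_unfam, rates_unfam, feats_fam, rates_fam, rng))  # high vs near zero
```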

As a control, we presented a set of low-contrast faces, which should elicit a simple decrease in response gain while preserving the rank ordering of preferred stimuli. As expected, axis similarities computed using these contrast-varied faces were not significantly different for high–high versus high–low contrast comparisons (Fig. 2b, inset). As a second control, to ensure that the effects were not due to differences in the feature content of familiar versus unfamiliar faces, we identified 30 familiar and 30 unfamiliar faces that were precisely feature matched. In brief, we used gradient descent to search for subsets of familiar and unfamiliar faces that were matched in the distribution of each feature as well as in the distribution of pairwise face distances (Supplementary Methods and Extended Data Fig. 8). We recomputed unfamiliar–familiar and unfamiliar–unfamiliar correlations and continued to find that familiar faces were encoded by a different axis than unfamiliar faces in AM and PR, but not in TP (Extended Data Fig. 9a, top). Finally, we confirmed that the axis divergence persisted when axes were computed using only the subset of cells showing significant axis tuning for both familiar and unfamiliar faces (Extended Data Fig. 9a, middle).

Previously we observed that the decrease in firing rate for familiar faces occurred at long latency (Fig. 1g). We next investigated the time course of the deviation in the preferred axis. We performed a time-resolved version of the analysis in Fig. 2b, comparing the preferred axis computed from 36 unfamiliar or 36 familiar faces with that computed from 1,000 − 36 unfamiliar faces over a rolling time window (Fig. 2c). Initially, axes for familiar and unfamiliar faces were similar but, at longer latency (t > 105 ms in AM, t > 155 ms in PR), the preferred axis for familiar faces diverged from that for unfamiliar faces.

The divergence in preferred axis over time for familiar versus unfamiliar faces suggests that the brain would need to use a different decoder for familiar versus unfamiliar faces at long latencies. Supporting this, in both AM and PR, at short latencies, feature values for familiar faces obtained using a decoder trained on unfamiliar faces matched actual feature values, and reconstructions were good (Fig. 2d,e). By contrast, a decoder trained on unfamiliar faces at long latency performed poorly on recovering feature values of familiar faces (Fig. 2d,e).
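The sketch below shows the basic cross-decoding logic (our illustration, using scikit-learn ridge regression as a stand-in for the linear decoder described in the Supplementary Methods; array names and shapes are hypothetical): train on population responses to unfamiliar faces, then evaluate feature-reconstruction error for familiar faces.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_feature_decoder(R_unfam, F_unfam, alpha=1.0):
    """Linear decoder from population responses (n_faces, n_cells)
    to face features (n_faces, n_features), trained on unfamiliar faces."""
    return Ridge(alpha=alpha).fit(R_unfam, F_unfam)

def feature_mse(decoder, R, F):
    """Mean squared feature-reconstruction error on a held-out set."""
    return float(np.mean((decoder.predict(R) - F) ** 2))

# Usage sketch: decoders would be fitted separately for short- and long-latency
# response windows; a rise in feature_mse for familiar faces at long latency is
# the signature of the axis change described in the text.
# decoder_short = fit_feature_decoder(R_unfam_short, F_unfam)
# print(feature_mse(decoder_short, R_fam_short, F_fam))
```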

Could the apparent axis change be explained by a simpler change, for example a decrease in sensitivity to a subset of features or a change in the output nonlinearity, without necessitating a change in axis? Further analyses demonstrated that these simpler models could not explain the change in responses of cells to familiar faces (Extended Data Fig. 9b,c).

An early shift in familiar face subspace

So far we have uncovered a distinct geometry for encoding familiar versus unfamiliar face features in AM and PR at long latency. But how is the categorical variable of familiarity itself encoded in AM and PR? Previous studies have suggested that familiarity is encoded by response suppression across cells5,6,7,8,9,10. Supporting this, our first experiment (a screening set consisting of familiar and unfamiliar human faces, monkey faces and objects) showed a decreased average response to familiar compared with unfamiliar faces (Fig. 1). However, to our great surprise, data from our second experiment (1,000 unfamiliar faces interleaved with 36 familiar faces; Fig. 2) showed a stronger mean response to familiar compared with unfamiliar stimuli (Fig. 3a,b). This was true even when we compared responses to the exact same subset of images (Extended Data Fig. 9d). What could explain this reversal? The two experiments had one major difference: in the first experiment the ratio of familiar to unfamiliar faces was 34:16 whereas in the second the ratio was 36:1,000 (in both experiments, stimuli were randomly interleaved and presentation times were identical). Thus, the expectation of familiar faces was much lower in the second experiment. Previous studies in IT have suggested that expectation can strongly modulate response magnitudes, with unexpected stimuli exhibiting stronger responses36. The marked reversal of relative response magnitude to familiar versus unfamiliar faces across the two experiments suggests that mean response magnitude is not a robust indicator of familiarity, because it depends on temporal context. By contrast, and importantly, the axis change for familiar faces was stable across the two experiments (Extended Data Fig. 9e).

Fig. 3: An early shift in response subspace allows decoding of familiarity.
figure 3

a, Responses of cells to stimuli from 36 familiar and 1,000 unfamiliar monkey faces, averaged over 50–300 ms following stimulus onset. b, Response time course across AM, PR and TP populations, averaged across cells and all familiar or unfamiliar faces from the 1,036 monkey face stimulus set. Shaded areas, s.e.m. Red arrowheads indicate the time when responses to faces became significantly higher than baseline (AM, 95 ms; PR, 105 ms; TP, 90 ms; Supplementary Methods). Green arrowheads indicate the time when responses to familiar versus unfamiliar faces became significantly different (AM, 125 ms; PR, 135 ms; TP, 105 ms). c, Black line, time course of accuracy for decoding familiarity; shaded area, s.e.m.; grey line, chance level. Arrowheads indicate the time at which decoding accuracy rose above chance (AM, 95 ms; PR, 105 ms; TP, 135 ms). d, Time course of neural distance between centroids of responses to the 36 familiar and 1,000 − 36 unfamiliar faces (blue) and between centroids of responses to a subset of 36 unfamiliar faces and to the remaining 1,000 − 36 unfamiliar faces (orange). Arrowheads indicate the time when d′ along the axis connecting the two centroids became significant (AM, 95 ms; PR, 105 ms; TP, 135 ms). e, Distribution of differences between mean firing rates to familiar and unfamiliar faces at three different time intervals. Grey bars indicate cells showing a significant difference (Supplementary Methods). f, Distribution of cosine similarities between familiarity decoding and face feature decoding axes at short (50–150 ms) and long (150–300 ms) latency for the first 20 features (ten shape, ten appearance). g, Schematic illustration of neural representation of familiar (blue) and unfamiliar (orange) faces at short and long latency for AM and PR.

Source Data

Posing an even greater challenge to the repetition suppression model of familiarity coding, the accuracy for decoding familiarity rose above chance extremely early, starting at 95 ms in AM, 105 ms in PR and 135 ms in TP (Fig. 3c, decoding using responses from Experiment 2); in PR this occurred even before any significant difference in mean firing rates between familiar and unfamiliar faces (compare black arrowheads in Fig. 3c with green arrowheads in Fig. 3b). What signal could support this ultrafast decoding of familiarity, which moreover generalized across face identity, if not a difference in mean firing rate? Recall that we found earlier that, at short latency, familiar faces were encoded using the same axes as unfamiliar faces (Fig. 2d). This implies that, at short latency, familiar and unfamiliar faces are represented in either identical or parallel manifolds. Agreeing with this, familiar face features could be readily decoded using a decoder trained on unfamiliar faces (Fig. 2e). This suggested to us that the familiar and unfamiliar representations might be shifted relative to each other and that this shift is what permits early familiarity decoding. A plot of the neural distance between familiar and unfamiliar response centroids over time supported this hypothesis (Fig. 3d): the familiar–unfamiliar centroid distance increased extremely rapidly compared with the unfamiliar–unfamiliar distance, and d′ (Supplementary Methods) along the unfamiliar–familiar centroid axis became significantly higher than a shuffle control at 95 ms in AM, 105 ms in PR and 135 ms in TP, equal to the time when familiarity could be decoded significantly above chance in each of these areas. Direct inspection of shifts between responses to familiar versus unfamiliar faces across cells showed a distribution of positive and negative values that could be exploited by a decoder for familiarity (Fig. 3e).
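The quantity plotted in Fig. 3d can be summarized by a small helper like the one below (our sketch; the exact estimator and shuffle control are described in the Supplementary Methods): project population responses onto the axis connecting the familiar and unfamiliar centroids and compute d′ along that axis.

```python
import numpy as np

def dprime_along_centroid_axis(R_fam, R_unfam):
    """d' between familiar and unfamiliar population responses, measured along
    the axis connecting the two class centroids.
    R_fam: (n_fam, n_cells), R_unfam: (n_unfam, n_cells)."""
    axis = R_fam.mean(axis=0) - R_unfam.mean(axis=0)
    axis = axis / np.linalg.norm(axis)
    proj_fam, proj_unfam = R_fam @ axis, R_unfam @ axis
    pooled_sd = np.sqrt(0.5 * (proj_fam.var(ddof=1) + proj_unfam.var(ddof=1)))
    return float((proj_fam.mean() - proj_unfam.mean()) / pooled_sd)

# Synthetic check: a pure shift between two otherwise identical response clouds
rng = np.random.default_rng(3)
R_unfam = rng.standard_normal((1000, 150))
R_fam = rng.standard_normal((36, 150)) + 0.2        # uniform shift across cells
print(dprime_along_centroid_axis(R_fam, R_unfam))   # clearly positive
```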

Further supporting the shift hypothesis, we found that the familiarity decoding axis was orthogonal to the face feature space at both short and long latency. We computed the cosine similarity in the neural state space between the familiarity decoding axis and the face feature decoding axes (for both familiar and unfamiliar faces), for the 20 features capturing the most variance. The resulting values were tightly distributed around 0 at both short (50–150 ms) and long (150–300 ms) latency (Fig. 3f). Overall, these results suggest a geometric picture in which familiar and unfamiliar stimuli are represented in distinct subspaces, with the familiar face subspace shifted relative to the unfamiliar face subspace at short latencies and then further distorted at long latencies in AM and PR (Fig. 3g).
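A sketch of this orthogonality check is given below (our illustration; here the familiarity axis is taken as the centroid-difference direction and the feature axes as the rows of a linear decoder's coefficient matrix, both hypothetical stand-ins for the decoders used in the paper).

```python
import numpy as np

def familiarity_vs_feature_cosines(R_fam, R_unfam, decoder_coef):
    """Cosine similarity between the familiarity axis (centroid difference in
    neural state space) and each face-feature decoding axis.
    decoder_coef: (n_features, n_cells), e.g. Ridge(...).fit(R, F).coef_.
    Values near 0 indicate that familiarity is coded in a direction
    orthogonal to the face feature subspace."""
    fam_axis = R_fam.mean(axis=0) - R_unfam.mean(axis=0)
    fam_axis = fam_axis / np.linalg.norm(fam_axis)
    feat_axes = decoder_coef / np.linalg.norm(decoder_coef, axis=1, keepdims=True)
    return feat_axes @ fam_axis          # one cosine similarity per feature
```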

Localization of the site of face memory

The finding of memory-driven axis change at long latency in AM and PR is consistent with decades of functional studies suggesting a unique role for interactions between IT and the medial temporal lobe in memory formation37,38. Is the distinct representational geometry for familiar faces at long latency in AM due to feedback from PR? To address this, we silenced PR while recording responses to familiar and unfamiliar faces in AM (Fig. 4a). IT cortex is known to receive strong feedback from perirhinal cortex39, and this is true in particular for face patch AM29. Consistent with this, inactivation of PR produced strong changes in AM responses, with some cells showing an increase in response and others a decrease (Fig. 4b,c).

Fig. 4: Axis change for familiar faces does not depend on PR feedback to IT.
figure 4

a, Schematic of experiment aimed at identifying the origin of memory-related signals in AM. Muscimol was injected into face patch PR and the responses of AM neurons were recorded both before and after PR inactivation using a multi-electrode probe. b, Responses of two example AM cells to screening stimuli before and after PR inactivation. c, Response profiles of the AM population to screening stimuli before and after PR inactivation. d, Time course of the similarity between preferred axes for unfamiliar–unfamiliar and unfamiliar–familiar faces (as in Fig. 2c) before and after PR inactivation. Shaded areas, s.e.m. e, Normalized firing rate changes for familiar and unfamiliar faces induced by PR inactivation (n = 85 cells). f, Familiarity decoding accuracy for screening stimuli before and after PR inactivation (n = 17 for human and monkey faces, n = 16 for objects). For objects, chi-squared test, χ²(1, N = 16) = 0.58, P = 0.45. g, Face feature decoding error (mean square error) for unfamiliar (n = 1,000) and familiar (n = 36) faces before and after PR inactivation.

Source Data

We next asked whether feedback modulation from PR specifically affected AM responses to familiar faces, as one might expect if PR were the source of AM memory signals. We found that divergence between familiar and unfamiliar axes at long latency continued to occur in AM following PR inactivation (Fig. 4d). Indeed, responses to familiar and unfamiliar faces were similarly modulated by PR inactivation across the population (Fig. 4e). Finally, decoding of both face familiarity and face features from AM activity was unaffected by PR inactivation (Fig. 4f,g). Overall, these results show that inactivation of PR had a strong effect on the gain of AM responses but no apparent effect on face coding, including memory-related axis change.

Do signatures of familiarity coding, as observed in AM, PR and TP, exist even earlier in the face patch pathway? We mapped responses to familiar and unfamiliar faces in middle lateral face patch (ML), a hierarchically earlier patch in the macaque face-processing pathway that provides direct anatomical input to AM22,29. Responses to the screening stimuli in ML exhibited a similar pattern as in AM, showing suppression to personally familiar faces at long latency (Extended Data Fig. 10a,c). However, population representation similarity matrices did not show distinct responses to familiar versus unfamiliar faces (Extended Data Fig. 10b). Furthermore, the population average firing rate showed a sustained divergence between responses to familiar and unfamiliar faces much later than in AM (160 compared with 140 ms in AM; Extended Data Fig. 10c), suggesting that ML may receive a familiarity-specific feedback signal from AM. Importantly, ML neurons also showed axis divergence (Extended Data Fig. 10d–f), consistent with the idea that memory is stored in a distributed way across the entire hierarchical network used for representation40. Finally, familiarity could be decoded in ML even earlier than in AM (Extended Data Fig. 10g–i). Overall, these results suggest that ML also plays a significant role in storing memories of faces.

Discussion

In this paper we investigated the elusive neural code for long-term object memory. Although classic lesion studies suggest that long-term object memories should reside in IT cortex1, recent work on IT coding has focused on representation of incoming visual input and concluded that IT neurons extract high-level visual features agnostic to semantic content4,41. How can such meaning-agnostic, feature-selective cells be responsible for encoding long-term object memories that are highly context and familiarity dependent? Here we shed light on this conundrum, finding that in anterior face patches AM and PR a distinct neural code for familiar faces emerges at long latency in the form of a change in preferred axis. Thus, feedforward feature-coding properties of IT cells may be reconciled with a putative role in long-term memory through temporal multiplexing. Inactivation of PR did not affect axis change dynamics in AM, suggesting that the memory-related axis change mechanism may be intrinsic to IT cortex.

Previous physiological work on representation of familiar stimuli has focused largely on repetition suppression, the observation that the response to familiar stimuli is reduced5,6,7,8,9,10,11. We found that repetition suppression was not a robust indicator of familiarity in any face patch. Instead, relative response amplitude to familiar versus unfamiliar faces was highly sensitive to temporal context. We speculate that these relative response amplitudes, and associated neural distances and decoding accuracies (Extended Data Fig. 10j,k), may reflect momentary changes in stimulus saliency rather than face memory. By contrast, axis change for familiar faces at long latency was consistent across context (Extended Data Fig. 9e), indicating a reliable code for face memory.

What could the computational purpose of this axis change be? We speculate that, by lifting representations of face memories into a separate subspace from that used to represent unfamiliar faces (Fig. 3g), attractor-like dynamics may be built around these memories through a recurrent network to allow reconstruction of familiar face features from noisy cues without interfering with veridical representation of sensory inputs42,43. Computational considerations make it clear that the ability to recall (that is, reconstruct from noisy cues) a large number of familiar faces requires a code change. This is because a perfectly disentangled representation (the axis code) is inherently low dimensional; the memory capacity of a recurrent network using disentangled representations increases only linearly with the number of dimensions of the representation44. Importantly, recoding stimuli with small, nonlinear distortions of disentangled representations can significantly increase the memory capacity to one that scales linearly with the number of neurons43,44, as in Hopfield networks with random memories43. We hypothesize that long-latency axis change reflects this recoding. To date, studies of IT have emphasized the stability of response tuning over months45,46. Our results suggest such stability coexists with a precisely orchestrated dynamics for representing familiar stimuli through the mechanism of long-latency change in axis.
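As a generic illustration of the attractor idea invoked here (a textbook Hopfield toy, not a model of the face patch data), the sketch below stores a few random binary patterns with a Hebbian rule and recalls one of them from a corrupted cue.

```python
import numpy as np

# Toy Hopfield network: Hebbian storage of random binary patterns and
# recall from a noisy cue via asynchronous sign updates.
rng = np.random.default_rng(4)
n_neurons, n_memories = 200, 10
memories = rng.choice([-1, 1], size=(n_memories, n_neurons))

W = (memories.T @ memories) / n_neurons          # Hebbian weights
np.fill_diagonal(W, 0.0)

state = memories[0].copy()
flipped = rng.choice(n_neurons, size=40, replace=False)
state[flipped] *= -1                              # corrupt 20% of units

for _ in range(5):                                # asynchronous updates settle into an attractor
    for i in rng.permutation(n_neurons):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(np.mean(state == memories[0]))              # ~1.0: the stored pattern is recovered
```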

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.