Introduction

Face perception is one of the most crucial abilities for social cognition in primates, including humans. Research in humans has revealed an impairment of face recognition when faces are displayed upside-down (i.e. rotated in the image plane by 180 degrees)1. Rotation disproportionately affects faces, a phenomenon known as the face inversion effect (FIE), and affects other object classes to a lesser degree, known as viewpoint dependency1, unless they are exemplars of a class of expertise acquired through extensive exposure, like dogs for dog experts and cars for car experts2. These findings served as the basis for exploring how faces become special and how face-like processing can be obtained through extensive exposure to, and discrimination practice with, visually similar exemplars3,4. The FIE might reflect a computational limitation of the visual system in coping with inverted complex objects and the constraint of dealing with such complex visual inputs given the “special” way of (expert) processing by default5,6. Early evidence for a special underlying mechanism in processing faces indicates that configural information is more important in upright than in inverted faces7,8. Over the years, evidence has accumulated supporting the assumption that configural information is represented either explicitly, as precise spatial relationships among facial features2,9,10,11, or implicitly, as a combination of input from neurons selective for complex features12, rather than as an undifferentiated template, a Gestalt pattern13. Further, the FIE occurs at the level of perceptual encoding rather than at the level of long-term memory representation14,15. The behavioral findings in non-human primates in terms of configural/holistic processing of faces are largely inconsistent.
The inconsistency stems from a number of sources. Many studies in Rhesus monkeys (Macaca mulatta) and chimpanzees (Pan troglodytes) used experimental paradigms unsuitable for testing the FIE: (i) Matching-to-sample (MTS) tasks were designed such that the cue picture was presented upright and the match-distractor picture pair inverted (or the other way around)16,17,18,19. In a visual-paired-comparison (VPC) task, two identical images of an upright face were followed by the same face in combination with a different face, both inverted20. With these paradigms, an effect (discrimination performance16,17,18,19 or novelty preference20) automatically reflects the combination of a general view-dependency known from view-based object recognition21,22 and the face inversion effect. It is nearly impossible to disentangle these two factors and isolate the effect that is solely due to face-specific processing. (ii) In many studies there is no temporal separation between the cue and the match-distractor picture pair19,23,24. Participants might rely on picture-based matching techniques rather than holistic/configural processing mechanisms17, or on a combination of both. (iii) Stimulus material is not well controlled in terms of low-level properties and irrelevant features, such as background25,26 or external facial cues19,27, or is not naturalistic28. For example, in experimental testing using static images, individuals are easily discriminated based on external cues like the hairline, given the high degree of variance among individuals19. However, these cues are not diagnostic in real-life situations: external cues are not as static as internal face features and hence do not provide reliable information. (iv) In some studies, participants were extensively trained on a set of stimuli before being tested on manipulated versions, such as inverted faces28,29. This might lead to a qualitatively different processing strategy depending on the amount of exposure.
(v) Many studies showed confounds involving cognitive variables with unknown characteristics, such as associative mechanisms to assign a face to a symbol30, or had no clear prediction or interpreted the outcome inadequately: for example, higher preference scores for upright than for inverted faces in Japanese macaques do not necessarily reflect the FIE26. (vi) Moreover, some studies are conceptually deficient: instead of evaluating discrimination of faces based on their identity (e.g. face 1, face 2; subordinate-level categorization), participants were tested on basic-level categorization, e.g. discriminating a face from a scrambled face27, or identifying one picture of a picture pair that always appears in conjunction31. For such types of discrimination, a simple classification network with no specialized face processing capabilities is sufficient.

The findings from studies testing chimpanzees, humans' closest relatives, are more consistent than findings from monkey studies. Aside from the face-symbol association task30, chimpanzees showed face-specific inversion effects that were stronger for familiar than for unfamiliar faces16,17,32. The question remains why similarly deficient paradigms revealed an FIE in chimpanzees17, but not in monkeys18. The most plausible answer, acknowledged even by the authors of some of the studies19, is that monkeys took advantage of these paradigms and used a strategy that does not rely on the face-processing system, e.g.33. The fact that chimpanzees showed an FIE despite the deficient paradigm might indicate that configural processing mechanisms were effective enough to overshadow additional factors, at least to some extent. This, however, does not indicate that in monkeys these configural processing mechanisms are weaker or absent34. We cannot reject the possibility that previous conclusions in chimpanzee studies were drawn based on artifacts caused by the above-mentioned methodological drawbacks and hence are to some degree questionable. However, there is plenty of evidence from scientifically valid assessments in monkeys of various species that the FIE exists in those species and that it reflects configural processing of facial features34,35,36,37,38,39,40,41. Hence, the lack of an FIE in several studies cannot be interpreted as evidence for differences in the selectivity of configural/holistic processing for faces in those species42, but rather as the product of a mixture and interaction of uncontrolled factors due to methodological drawbacks.

In the human literature, most of the above-mentioned issues are less prominent. Many studies compared long-term memory representation of faces to inverted presentations1,43, requiring no initial encoding phase (the cue presentation in a match-to-sample task). Hence, the issues about inversion of cue and match-distractor stimuli (i) and temporal separation of cue and match-distractor (ii) become obsolete. Those studies that do rely on a direct comparison of cue and match stimuli, such as a delayed matching-to-sample or a two-alternative-forced-choice (2AFC) task, did correctly apply inversion to all stimuli (e.g.14,44,45).

In the present study we tested chimpanzees and examined the following two major issues. First, we re-examined the FIE in chimpanzees to shed light on the ongoing debate about whether and to what extent this well-accepted behavioral hallmark of face processing in humans is a general characteristic. We therefore used a paradigm avoiding all potential methodological conflicts: we presented chimpanzee and human faces in a matching-to-sample task in both upright and inverted conditions with a delay after the sample stimulus. In the inverted condition, the cue stimulus as well as the match-distractor pair were inverted (i), and the pair was presented after a temporal delay (delayed matching-to-sample, DMS) (ii). We further used well-controlled stimuli with no diagnostic low-level characteristics (iii) and did not explicitly train our participants on inverted faces (iv). Second, we used chimpanzees to explore additional factors that might modulate the FIE and that might help to re-evaluate previous studies on the FIE in non-human primates. The chimpanzees at the Primate Research Institute of Kyoto University have a unique exposure history: they live in social groups of 6 or 7 other individuals (2 groups) with visual contact to the other group of chimpanzees. Besides this relatively limited exposure to chimpanzees, they are exposed to an increasing number of human faces (researchers, caretakers, visitors, etc.) over a lifetime. By examining chimpanzees of distinct age groups, we can use performance changes due to this “uneven” exposure to chimpanzee and human faces to assess whether perceptual learning can shape a perceptual system along dimensions other than those it was originally tuned to by perceptual narrowing early in life. In other words, how flexibly does the face perception system respond to changes in the environment in terms of exposure to specific face classes? We recently showed that the perceptual system tunes toward the face class to which it is most exposed46.
Hence, the question here is to what extent face inversion affects discrimination of face classes with distinct perceptual tuning. We predict that face inversion does not affect all types of faces by default, but is rather selective for the face class the perceptual system is tuned to. We therefore predict an increased FIE for chimpanzee as opposed to human faces in young chimpanzees (around 10 years of age, YC) due to a distinctive tuning of the perceptual system toward the conspecific face class, and an increased FIE for human as opposed to chimpanzee faces in older chimpanzees (around 30 years of age, OC) due to a distinctive tuning of the perceptual system toward the human face class.

Results

We tested the discrimination performances (percent correct responses) of young and old chimpanzee participants for upright and inverted faces of chimpanzees and humans (Figure 1a). A cue stimulus (e.g. face image 1 of individual 1) was centrally presented for 750 ms, followed by an inter-stimulus interval (ISI) of 500 ms and the match-distractor stimulus pair, with the match being a different image than the cue stimulus to avoid picture-matching strategies (e.g. face image 2 of individual 1 and a face image of individual 2). The participants were required to indicate which of the two pictures of the match-distractor pair displayed the same individual as the cue picture by touching it. Critical for our hypothesis was to replicate the modulation of correct responses for the two types of faces between age groups shown in our previous study46, reflecting the specific tuning to one or the other face class. We therefore ran a mixed model ANOVA (with stimulus class and age group as fixed factors and participants as random factor nested in age group) and found a significant interaction between the factors age group and stimulus class (F(1,11) = 7.79, p < .05, mean square = .047) (Figure 1b, solid colors). There were no significant main effects for the factors age group (p = .96) and stimulus class (p = .51). Jarque-Bera tests gave no indication of deviation from normality in either age group or stimulus class (all p > 0.23). Further, we predicted that this modulation between age groups in upright faces would not be evident in inverted faces. We ran the same type of analysis on inverted faces and found no significant interaction between the factors age group and stimulus class (F(1,11) = 0.58, p = .49, mean square = .007) (Figure 1b, light red and light blue). There were no significant main effects for the factors age group (p = .58) and stimulus class (p = .54).
In the next step we tested whether inversion causes a significant change in the discrimination performances. We ran a mixed model ANOVA (with stimulus class, age group and stimulus manipulation (upright vs. inverted) as fixed factors and participants as random factor nested in age group) and found a significant interaction between the factors age group, stimulus class and stimulus manipulation (F(1,23) = 73.7, p < .001, mean square = .046) (Figure 1b,c). In addition, the factor stimulus manipulation showed a main effect (F(1,23) = 38.57, p < .01, mean square = .13; mean upright = .75, mean inverted = .60). To test our hypothesis that YC show an increased FIE for chimpanzee as opposed to human faces and OC show an increased FIE for human as opposed to chimpanzee faces, we collapsed the discrimination performances for chimpanzee faces of YC and those for human faces of OC and compared them between stimulus manipulations (upright vs. inverted). A two-sample t-test showed a significant effect, with greater discrimination performance for upright than for inverted faces (t(10) = 2.01, p < .05, standard deviation = .21; mean upright = .81, mean inverted = .58) (illustrated in Figure 1c,d). Using an iterative randomization procedure (see Methods) to account for the low sample size, we confirmed that a random effect can be excluded (CI 95%). In contrast, the discrimination performances for human faces of YC and those for chimpanzee faces of OC, compared between stimulus manipulations, did not show a significant deterioration due to inversion (t(10) = 0.59, p = .57, standard deviation = .17; mean upright = .68, mean inverted = .63) (illustrated in Figure 1c,d). Accordingly, an iterative randomization procedure indicated that a random effect occurred with 60% likelihood.
Further, based on the deterioration caused by inversion (performance scores for upright minus inverted faces) for each participant and stimulus class (see values in Figure 1c), we calculated a Face Inversion Species Index (FI species index, Figure 1e) as the ratio of the deterioration for human faces to the deterioration for human and chimpanzee faces combined (Det_Human/(Det_Human + Det_Chimpanzee)). Values below .5 indicate stronger deterioration for chimpanzee as opposed to human faces; values above .5 indicate stronger deterioration for human as opposed to chimpanzee faces.
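
The index defined above can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical and chosen only for illustration; they are not data from this study.

```python
def deterioration(upright: float, inverted: float) -> float:
    """Drop in proportion correct caused by inversion (upright minus inverted)."""
    return upright - inverted


def fi_species_index(det_human: float, det_chimpanzee: float) -> float:
    """Det_Human / (Det_Human + Det_Chimpanzee), as defined in the text."""
    total = det_human + det_chimpanzee
    if total == 0:
        raise ValueError("index is undefined when the summed deterioration is zero")
    return det_human / total


# Hypothetical scores for one participant (illustrative values, not study data).
det_h = deterioration(upright=0.70, inverted=0.65)  # small drop for human faces
det_c = deterioration(upright=0.80, inverted=0.60)  # larger drop for chimpanzee faces
index = fi_species_index(det_h, det_c)
print(round(index, 2))  # below .5: stronger deterioration for chimpanzee faces
```

Note that the ratio is only bounded between 0 and 1 when both deterioration values are non-negative; if inversion happened to improve performance for one class, the index would fall outside that range and would need separate treatment.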

Figure 1

Face discrimination task and modulation by inversion.

(a), Procedure. In each trial, a face picture of an individual (cue) was presented on the display, followed by an inter-stimulus interval and a presentation of two face pictures (match, distractor). All faces were either upright or inverted. Chimpanzees indicated their choice by touching either the match or distractor picture (the pictures in this panel were taken by I.A.). (b), Proportion of correct responses. Performance scores (correct trials/number of trials) were averaged for each age group (YC, OC), stimulus class (chimpanzee, human faces) and manipulation (upright, inverted). (c), (d), Deterioration of discrimination performances by inversion. c, Performance scores of inverted faces were subtracted from the performance scores of upright faces to determine the relative deterioration due to inversion for each participant and stimulus class. (e), Face Inversion Species Index (FI species index). The ratio of the deterioration for human faces to the deterioration for human and chimpanzee faces combined is shown. Values below .5 indicate greater deterioration for chimpanzee faces, while values above .5 indicate greater deterioration for human faces.

Discussion

We found an FIE in chimpanzee participants, which is consistent with previous findings16,17,32. More importantly, the FIE in the current study was selective for the face class to which the perceptual systems of our participants were tuned46, confirming our hypothesis (Figure 1b–e). In YC the FIE was more pronounced for chimpanzee than for human faces, while in OC the FIE was more pronounced for human than for chimpanzee faces. We further found that the performance for inverted faces of the class to which the perceptual system was tuned dropped below the performance level for the face class to which the perceptual system was not tuned. This trend was found in five out of six chimpanzees (see Figure 1d, indicated by performance scores crossing the midline (dotted black line)). In other words, we here find clear evidence of an FIE in chimpanzees and confirm our hypothesis that face inversion affects configural processing of facial features, which is a processing mechanism predominantly applied to faces of expertise, i.e. those faces to which the perceptual system has been tuned through early and/or late developmental processes46.

Our results are in accordance with human studies showing a negative-peaking event-related potential (ERP) at 170 ms after stimulus onset (N170) over occipital and temporal regions for upright faces, and a larger and later-peaking N170 component for inverted faces47,48. Importantly, this so-called N170 inversion effect is restricted to face classes of expertise: monkey faces48 and other-race faces49,50 do not elicit the same response. Along the same line, effects of expertise for monkey faces have been tested with a group of expert primatologists, revealing an advantage for experts (as opposed to non-experts) in identifying monkey faces. However, experts were more affected by inversion of monkey faces than non-experts were, suggesting that experts process monkey faces in a way similar to human faces51.

A re-evaluation of studies investigating face inversion in non-human primates shows that, by avoiding the methodological issues (i–iv), a consistent FIE can be found across species. In more detail, some studies examined face inversion by comparing two inverted faces (match-distractor pair) with an inverted cue stimulus23,52, avoiding the first drawback (i): Neiworth and colleagues52 showed a very selective effect of face inversion for specific types of faces, suggesting that configural processing is influenced by life experience, in accordance with previous studies46. Along the same line, using an oddity task with all faces inverted, Capuchin monkeys (Cebus apella) performed better with upright than with inverted presentations of capuchin and human faces, but not of chimpanzee faces and automobiles41. Further, a passive viewing paradigm, avoiding all drawbacks, revealed a clearly distinctive eye-movement pattern for inverted as opposed to upright faces38, in line with the predictions derived from human studies53, which showed a tendency to look at the eyes over other facial parts in upright faces, but diverse scanning patterns across facial parts in inverted faces. The following studies showed no inversion effect in monkeys:18,20,25,27,28,29,31,54. Among these, face inversion was examined using line-drawing versions of faces (not a very naturalistic stimulus, see (iii))28, which might not be processed configurally or might favor part-based over configural processing strategies. Further, tamarin monkeys relied on external cues more than on internal features27, leading to part-based rather than configural processing (see (iii) above). The study by Wright and Roberts (1996)29 in our opinion shows an FIE in Rhesus monkeys to some extent, considering the differences between upright and inverted faces of both humans and monkeys in early recording sessions (see Figure 2 of29; standard errors not shown).
For an unspecified reason the experimental protocol involved a training procedure for inverted faces, which caused a reduction of FIE with increasing experience (see (iv) above).

In addition, differences in the FIE between human and monkey faces in this study suggest that the amount of exposure the participants had to humans and monkeys in their lives might influence the FIE. A more critical drawback, and maybe the main contributor to the inconsistent findings in monkeys, is the presentation of the cue stimulus in upright orientation and the match-distractor stimuli in inverted orientation (see (i) above). The study by Parr and colleagues (1999)18 found that Rhesus monkeys performed significantly better on upright than on inverted presentations of automobiles, Rhesus monkey faces and capuchin faces, but not human faces or abstract shapes. They drew the incorrect conclusion that the inversion effect in Rhesus monkeys does not appear to be face-specific and should not be used as a marker of specialized face processing in this species. Given the design of the experiment (see (i) above), the effect reflects a general view-dependency55 that, not surprisingly, affects not only all types of faces but also object classes such as cars, dogs, houses and, to a lesser degree, abstract shapes (due to the less complex nature of these shapes). Unfortunately, this misleading paradigm became established16,17,18,19 and was developed further20 in the literature on face inversion in non-human primates.

Plentiful evidence has described what makes face perception special: configural information processing56. A plausible model for the coding of configural information is a relational account that explicitly represents precise spatial relationships among facial features2,9,10,14,57. However, an alternative account based on implicit relational coding has been proposed12, suggesting that configural sensitivity emerges at the level of face-selective neurons by combining inputs from neurons selective for complex features. This convergence would automatically lead to selectivity for the whole face. Interestingly, in this account there is no qualitative difference in the mechanisms processing upright and inverted faces. Models dealing with the principle of overrepresentations exist58,59. With this conceptual background, a more interesting question than what is special in face perception emerges: how does face perception become special? Participants with extensive training on Greebles started showing configural processing for upright exemplars, but not for inverted ones3. A feasible explanation is that, over the time course of learning and exposure, new feature detectors emerge or existing ones expand in size and complexity60,61, binding parts of the objects. Along this line, our recent study showed that the face perception system remains plastic over a lifetime and adapts to changes of exposure in the environment46: chimpanzees' discrimination performance was modulated by two distinct developmental processes in face perception. First, perceptual narrowing62,63, or the early developmental component, takes place very early in life and quickly and substantially shapes the perceptual system toward the class of conspecific faces. YC showed better discrimination performance for chimpanzee faces than OC. This might reflect the default tuning of the perceptual system toward chimpanzee faces by perceptual narrowing processes.
Over the lifetime of exposure, however, perceptual learning processes64,65,66, or the late developmental component, influence the perceptual system and shape it slowly but continuously along the critical facial dimensions, reflected in a better discrimination performance for human faces in OC than YC given the specific exposure conditions for chimpanzees in captivity, i.e. an extensive exposure to human faces over the years, along with a constant exposure to a low and limited number of chimpanzee faces46. Importantly, perceptual learning seems to be a long-lasting constant learning process, see46 for a mathematical approximation, that even after 10 years of exposure to a novel face class (i.e. 9 or more years after perceptual narrowing occurred) (as in the YC), has not fully adapted to the more prominent and important face class (here human faces).

In the current study, we used chimpanzees to determine to what extent the tuning toward one and not the other class of faces influences the amount of discrimination deterioration due to inversion. The unique history of exposure of the chimpanzees at the Primate Research Institute67 helped to double-dissociate the discrimination performances of chimpanzees of two age groups given their distinctive facial tuning. We found a more pronounced FIE in YC for chimpanzee faces and in OC for human faces. This reflects the special face processing mechanism for the particular class of faces (or objects of expertise) to which the perceptual system has adapted, given the age-dependent history of face exposure and the sensitivity toward the early and late developmental components. Hence, the FIE has to be evaluated in consideration of the (to some extent redundant) factors developmental stage, expertise and exposure history of the participant. One cannot expect an FIE for all types of faces, nor for only the conspecific face class. Furthermore, the transition from part-based to configural processing has been shown to be gradual68, suggesting that participants with a decent amount of training on and exposure to a class that is non-expert by default will possibly show some indication of initial configural processing and hence of an FIE. A further factor that might influence the FIE is that, depending on the similarity between the exemplars of the face class to which the perceptual system is tuned and those of the novel or non-expert face class, some sort of transfer effect in terms of feature detectors might occur. In other words, the more similar two face classes are, the more they share the shape of facial features and the configuration among those features. Thus, Rhesus macaques might well be “experts” in faces of Japanese macaques26 due to the close similarity between them.
In the above-mentioned study46, a simulation experiment shows that even for a non-expert face class the perceptual system is able to discriminate exemplars to some degree given the overlap of feature distributions of the expert and non-expert classes.

It is important to note that exposure alone might not be sufficient for a neuronal specialization on a particular class of faces. It has been shown that neural specialization requires learning at the individual level69,70. We use the term exposure here under the assumption that, with an increasing amount of exposure to a face class and more detailed interaction with individuals of that face class, associated abilities such as a subordinate-level entry point71 and individuation34 emerge. This is plausible: a chimpanzee in captivity needs to learn to differentiate human individuals in order to adapt its actions toward each human individual to receive food, care, attention, etc. Here, however, we did not test these abilities.

Given these insights, we do not support the statement that there are species differences, referring to macaque monkeys as opposed to chimpanzees and humans, in the processing of configural information19. However, we cannot rule out this possibility entirely, simply because we did not investigate monkeys. Nevertheless, we offer a set of additional variables, such as developmental stage, expertise and exposure history, which might influence the FIE in primates and have to be taken into account when evaluating it. In addition, we propose a re-examination of the FIE in monkeys in consideration of these substantial improvements in the paradigm.

Methods

Participants

Six chimpanzees (Pan troglodytes; 1 male juvenile, 2 female juveniles and 3 female adults; YC: 10.8 +/− 0.17 (s.d.), OC: 30.8 +/− 3.82 (s.d.) years) from the Primate Research Institute of Kyoto University participated in this study. The chimpanzees are socially housed in groups of 6 or 7 individuals with access to an outdoor compound (770 m²) as well as indoor compounds. They have participated in a variety of computer-controlled tasks in the past72,73.

Stimuli

We used grey-scale pictures of 16 chimpanzee and 16 human individuals. Two pictures, taken at two different times, were selected per individual. The stimuli were normalized for luminance and contrast and arranged on a canvas of 533 × 702 pixels, corresponding to approximately 10.7 × 14.25 degrees of visual angle at 40 cm distance. The same stimuli were used for the inverted testing condition. Inversion was a 180-degree rotation in the image plane (upside-down).

Apparatus

The chimpanzees at the Primate Research Institute of Kyoto University participated in pairs (mother and offspring) and worked at two touch screens independently. Face stimuli were presented on 17-inch LCD touch panel monitors (1280 × 1024 pixels) controlled by custom-written software using Visual Basic 2010 (Microsoft Corporation, Redmond, Washington, USA). The chimpanzees were in two adjacent experimental chambers (each approximately 2.5 m wide, 2.5 m deep, 2.1 m high). The chimpanzees were separated from the experimenter by transparent acrylic panels. The display was mounted into the acrylic panel. The distance between the display and the participants was around 40 cm. One degree of gaze angle corresponded to approximately 0.7 cm on the screen at a 40 cm viewing distance. Responses were given by touching the display surface with a finger. The display was protected from deterioration by a transparent acrylic panel fitted with an armhole (10 × 47 cm) allowing hand contact with the display. Below the display a food tray was installed, into which pieces of food reward were delivered by a custom-designed feeder. Display and feeder were controlled by the Visual Basic program code.
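
The stated correspondence between gaze angle and on-screen size follows from standard viewing geometry. The following Python sketch is a minimal illustration, assuming a flat screen viewed perpendicularly and a stimulus centred on the line of sight; the function names are hypothetical and are not part of the original Visual Basic software.

```python
import math


def degrees_to_cm(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) that subtends angle_deg at the given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def cm_to_degrees(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an on-screen extent at the given distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# One degree at the 40 cm viewing distance reported in the text.
one_degree_cm = degrees_to_cm(1.0, 40.0)
print(f"{one_degree_cm:.2f} cm per degree at 40 cm")
```

At 40 cm this yields a value close to the approximately 0.7 cm per degree given above; because the participants were free-moving, the actual viewing distance and hence the angle will have varied from trial to trial.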

Procedure

We used a delayed matching-to-sample procedure (DMS) (Figure 1a). The cue and the match stimuli were different face images of the same individual. The match and the distractor pictures were horizontally separated by 20 mm. We counter-balanced the identities of faces as well as the positions of match and distractor across the whole sequence of trials. The sequence was divided into runs of 50 trials and alternated between runs of chimpanzee and human stimulus presentations. Upright and inverted presentation conditions were intermixed. In the inverted condition all stimuli (cue, match and distractor) were inverted. Each participant did eight runs for each stimulus class, yielding a total of 200 upright and 200 inverted trials per stimulus class.
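
The structure of the trial sequence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: that orientation and match position were fully crossed and balanced across the 400 trials of each stimulus class, that the order was otherwise random, and that the session started with a chimpanzee run. None of these details, nor the names used here, are taken from the original experiment software, and face identities are omitted for brevity.

```python
import random
from itertools import product

RUNS_PER_CLASS = 8    # from the text
TRIALS_PER_RUN = 50   # from the text


def build_schedule(seed: int = 0) -> list:
    """Return 16 runs of 50 trials, alternating chimpanzee and human runs."""
    rng = random.Random(seed)
    runs_by_class = {}
    for stimulus_class in ("chimpanzee", "human"):
        n_trials = RUNS_PER_CLASS * TRIALS_PER_RUN  # 400 trials per class
        # Assumption: orientation x match side fully crossed, 100 trials each.
        conditions = list(product(("upright", "inverted"), ("left", "right")))
        pool = conditions * (n_trials // len(conditions))
        rng.shuffle(pool)  # intermix upright and inverted trials
        trials = [
            {"stimulus_class": stimulus_class, "orientation": o, "match_side": s}
            for o, s in pool
        ]
        runs_by_class[stimulus_class] = [
            trials[i:i + TRIALS_PER_RUN]
            for i in range(0, n_trials, TRIALS_PER_RUN)
        ]
    schedule = []
    # Assumption: chimpanzee run first, then strict alternation.
    for chimp_run, human_run in zip(runs_by_class["chimpanzee"],
                                    runs_by_class["human"]):
        schedule.extend([chimp_run, human_run])
    return schedule


schedule = build_schedule()
all_trials = [trial for run in schedule for trial in run]
print(len(schedule), "runs,", len(all_trials), "trials")
```

Balancing across the whole sequence rather than within each run mirrors the wording of the text, but it means individual runs may contain unequal numbers of upright and inverted trials.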

Data analysis

The dependent variable was the percentage of correct responses. We conducted analyses of variance across participants using a mixed model ANOVA with stimulus class, age group and stimulus manipulation as fixed factors and participants as random factor nested in age group. Further, two-sample t-tests were used for post-hoc comparisons. To account for the low number of participants, we ran randomization procedures in which values were drawn from independent Gaussian distributions with the means and standard deviations of the original data sets and compared using two-sample t-tests. We determined the 95% confidence intervals (CI) based on 1000 repetitions of this procedure for each comparison.
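
One possible reading of this randomization procedure is sketched below in Python. The text does not specify which quantity the confidence interval was computed over, so the sketch assumes a percentile interval over the simulated t statistics and a pooled-variance (Student) t-test; the function names and the example means and standard deviations are hypothetical and are not the study data.

```python
import math
import random
from statistics import mean, variance


def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def simulated_t_interval(mean_a, sd_a, mean_b, sd_b, n, repetitions=1000, seed=0):
    """95% percentile interval of t statistics from repeated Gaussian draws."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(repetitions):
        # Draw n values per condition from independent Gaussian distributions.
        a = [rng.gauss(mean_a, sd_a) for _ in range(n)]
        b = [rng.gauss(mean_b, sd_b) for _ in range(n)]
        t_values.append(two_sample_t(a, b))
    t_values.sort()
    lower = t_values[int(0.025 * repetitions)]
    upper = t_values[int(0.975 * repetitions) - 1]
    return lower, upper


# Hypothetical summary statistics (illustrative only).
lower, upper = simulated_t_interval(0.80, 0.05, 0.60, 0.05, n=6)
print("interval excludes zero:", lower > 0 or upper < 0)
```

Under this reading, an interval that excludes zero would correspond to a difference that is unlikely to arise by chance given the observed means and standard deviations; other interpretations of the procedure, such as the proportion of repetitions reaching significance, are equally compatible with the description.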

Ethics statement

All experiments were carried out in accordance with the 2002 version of the Guidelines for the Care and Use of Laboratory Primates by the Primate Research Institute, Kyoto University. The experimental protocol was approved by the Animal Welfare and Care Committee of the same institute.