Rapid Categorization of Human and Ape Faces in 9-Month-Old Infants Revealed by Fast Periodic Visual Stimulation

Peykarjou, Stefanie; Hoehl, Stefanie; Pauen, Sabina; Rossion, Bruno

doi:10.1038/s41598-017-12760-2

Article
Open access
Published: 02 October 2017

Scientific Reports volume 7, Article number: 12526 (2017)

Abstract

This study investigates categorization of human and ape faces in 9-month-olds using a Fast Periodic Visual Stimulation (FPVS) paradigm while measuring EEG. Categorization responses are elicited only if infants discriminate between different categories and generalize across exemplars within each category. In study 1, human or ape faces were presented as standard and deviant stimuli in upright and inverted trials. Upright ape faces presented among humans elicited strong categorization responses, whereas responses for upright human faces and for inverted ape faces were smaller. Deviant inverted human faces did not elicit categorization. Data were best explained by a model with main effects of species and orientation. However, variance of low-level image characteristics was higher for the ape than the human category. Variance was matched to replicate this finding in an independent sample (study 2). Both human and ape faces elicited categorization in upright and inverted conditions, but upright ape faces elicited the strongest responses. Again, data were best explained by a model of two main effects. These experiments demonstrate that 9-month-olds rapidly categorize faces, and unfamiliar faces presented among human faces elicit increased categorization responses. This likely reflects habituation for the familiar standard category, and stronger release for the unfamiliar category deviants.

Introduction

One of the most important visual challenges faced by young infants is to detect other human beings in their environment. Infants are surrounded by other humans most of the time, and are attracted by human faces in particular: For about 25% of their waking time, infants gaze at human faces¹. Human faces form a homogeneous group of stimuli consisting of an oval shape with two eyes above a nose and a mouth. Given the high amount of exposure to faces and the homogeneity of exemplars of this category, it is not surprising that infants develop a categorical representation of faces from an early age². However, the degree of specificity of this representation, in particular whether it differs for human and similar-looking nonhuman primate faces, remains unknown. The current study investigates this issue by testing visual categorization of human and ape faces in 9-month-old infants.

Perceptual categorization of human faces has been documented with brain and behavioural measures in adults. Human faces activate specialized regions along the ventral visual pathway with a right hemispheric advantage^3,4,5, and elicit a right-lateralized face-sensitive event-related-potential (ERP) response peaking at ~170 ms, the N170⁶. The N170 is increased in amplitude and latency for inverted faces^7,8,9. Human individual face recognition is characteristically impaired for faces belonging to unfamiliar face categories, such as other species^10,11 or other human groups (the “other-race” face effect¹²; for a review, see¹³).

Several studies have compared the N170 in response to human and ape faces^14,15,16. Carmel and Bentin¹⁴ observed shorter N170 peak latencies for human than ape faces. A similar effect was obtained by Itier and colleagues¹⁶, who also observed that the inversion effect was more pronounced for human faces in latency and absent for ape faces in amplitude. Another study found smaller amplitude for human than monkey faces, and an inversion effect that was restricted to human faces¹⁵. The characteristics of the N170 for faces of different species have thus not been consistent across studies. Moreover, the N170 component is not present in infants, but two ERP components are considered as its precursors, the N290 and P400^17,18. These components differ from the N170 in timing, scalp distribution, polarity (in case of the P400), and partly in response properties. This makes it difficult to predict how the species of faces may be reflected in infants’ electrophysiological responses compared to adults’.

Processing of human and ape faces has been compared repeatedly during the first year of life. Newborns do not show a preference for human or ape faces, but a preference for upright faces irrespective of species¹⁹. Whereas young infants discriminate individual ape faces similarly to human faces, from 6 to 9 months of age individuation of ape faces declines^20,21 (for similar results obtained with sheep faces, see²²). Experience in individuating ape faces helps infants to maintain their ability to discriminate them at 9 months²¹. When older infants are given more time to process the faces, discrimination of unfamiliar face categories is still possible²³.

Evidence for common categorization of human and ape faces (i.e., jointly forming the category of primate faces), as well as distinct categorization (i.e., human vs. ape faces as different sub-categories of primate faces) has recently been obtained in 9-month-old infants²⁴. In this study, broad categorical repetition effects (face/non-face) were observed on the level of the early visual P1 component, which was elicited with increased amplitude and decreased latency for all faces following house fronts compared to faces. In addition, a species-specific repetition effect was observed on the level of the N290: N290 amplitude and latency were enhanced for human targets following ape face adaptors, whereas amplitude and latency were decreased for ape targets following human face adaptors. This was taken to indicate that the N290 reflects activation of basic-level representations. In another line of research, the two potentially face-sensitive infant ERP components, N290 and P400, were compared for human and monkey faces^15,25. In all age groups tested (3-, 6-, and 12-month-olds), processing differences between the two face categories were observed, but they were not consistent across age-groups. A human face-specific increase in N290 amplitude for inverted faces has been obtained only in 12-month-olds²⁵.

Several challenges make it difficult to draw conclusions from infant ERP studies measuring average responses to human and ape faces^15,25. First, these studies suffered from relatively high drop-out rates of 63–81%, which raises the question of whether their results can be generalized. Second, human and monkey faces were presented in a between-subjects design so that every infant viewed only faces from one species. Therefore, infants were not required to categorize faces at all. Third, processing differences between the different face species were observed at every age tested. One may wonder whether such differences truly reflect perceptual categorization. For instance, it has been suggested that the human face-specific inversion effect on N290 amplitude in 12-month-olds reveals expert face processing¹⁷, but the inversion effect is no indication of categorical perception. To clearly demonstrate perceptual categorization, a paradigm is required that tests both discrimination between exemplars belonging to different categories, and generalization across exemplars belonging to the same category.

In addition, expert perceptual categorization requires fast and automatic processing (for a review, see²⁶). In adults, categorization is very rapid: Broad categorization as animal/no animal takes place within 150 ms^27,28, at about the same time as the onset of the N170. This ERP component reliably differentiates faces and various animal and object categories^6,29 (for a review, see³⁰). Concrete (e.g., “face”, “car”, “dog”^31,32) and abstract (e.g., “living”, “non-living”^33,34) categorization can even take place after having viewed an image for less than 50 ms. Moreover, face perception seems to be mandatory, that is, faces cannot be ignored even if it is required by the task^35,36,37, and face subcategory (e.g., gender) judgements are not impaired by reduced attention³⁸. Thus, it seems that face categorization occurs effortlessly in adults.

Recently, categorization in this sense (a rapid, automatic response including both discrimination and generalization) of human faces from many non-face visual objects has been demonstrated in adults with Fast Periodic Visual Stimulation (FPVS)³⁹ while measuring electroencephalography (EEG). In this paradigm, highly heterogeneous images of human faces were periodically presented between diverse images of different biological and non-biological objects including animals. In 4–6-month-old infants, human faces elicited a strong right-lateralized occipito-temporal categorization response². Similar to adults, this response was driven by high-level representations, as it was not found for phase-scrambled images.

To evaluate whether infants have developed perceptual categories for human and ape faces, and to overcome limitations of previous ERP studies, we used a similar FPVS paradigm in the present study. FPVS has several advantages compared to standard ERP measures: (1) FPVS has a high signal-to-noise ratio, requiring short looking times so that only few trials are needed and few participants need to be excluded; (2) the different stimulus categories are embedded within one sequence and a categorization response will only be elicited if all (or most) exemplars are categorized; and (3) the categorization response can be defined and quantified objectively.

Here, we tested 9-month-old infants with sequences of human or ape faces as standard stimuli in which the respective other category was presented periodically as every 5th image. At 9 months, behavioural work has demonstrated that individuation of ape faces has declined²⁰ and ERP work has indicated that the two categories are discriminated when stimuli are presented in an upright position²⁴. Accordingly, we predicted that 9-month-olds show a categorization response when presented with upright human versus ape faces. Whether categorization is similar for the two categories is an open question: On the one hand, both human and ape deviant conditions require categorization of human and ape faces, making it likely that infants will show similar responses to the two conditions. On the other hand, extensive experience with processing of human faces might support the process of activating an already existing categorical representation, thereby increasing novelty responses to ape face deviants.

Moreover, this study explored the contribution of low-level image characteristics to face species categorization. If these cues were fully sufficient to discriminate both face categories, we would expect similar categorization performance in upright and inverted conditions because low-level cues are identical in both cases. However, if categorization were based on higher-level visual representations and previous real-world experience, infants should show a stronger categorization response when looking at stimuli presented upright than at faces presented in an inverted orientation.

The FPVS paradigm allows us to determine categorization performance not only at the group level but also at the level of individual infants. Study 1 provides an initial investigation of rapid processing of upright and inverted human and ape faces at 9 months of age. Based on this pilot study we then optimize the stimulus set and specify hypotheses to test with an independent sample in study 2. Findings of both studies provide the basis for our conclusions.

Study 1

Material and Methods

Participants

Twenty-two 9-month-old infants were tested (10 female, mean age = 9 months, 12 days, SD = 9 days). In accordance with the terms provided by the local ethics committee of Heidelberg University, verbal informed consent was obtained from their caretakers. Two additional infants were tested but excluded (one due to excessive crying, one due to insufficient data quality). All methods were carried out in accordance with relevant guidelines and regulations. The general procedure has been approved by the local ethics committee of Heidelberg University.

Stimuli/Presentation

Infants were presented with sequences of human and ape faces. Images were displayed in upright and inverted orientations in subsequent trials. The presentation was similar to recent studies employing the FPVS technique^2,40,41. Fifteen images each of human and ape faces were presented. Human face images were taken from standard face databases^42,43, whereas ape face images were collected through Google search. All faces showed a neutral emotional expression, were presented in full-frontal view, and were cropped to an oval shape excluding outer facial contours and, in the case of human faces, hair. Cropping was performed in order to increase infants’ focus on inner facial features and to ensure that categorization was not based on facial contours varying between species (e.g., the transition from smooth skin to hair in humans versus the continuous presence of facial hair in apes). Mean luminance was equalized across categories.

Images were displayed on a light grey background. Infants sat at a looking distance of 60 cm, and image size was 550 (width) × 607 (height) pixels, corresponding to approximately 12 × 15 degrees of visual angle. Images changed size (+/−10%) at every stimulation cycle. MATLAB 7.8 (The Mathworks) with PsychToolbox (http://psychtoolbox.org/) was used for stimulus display. Stimulus sequences were presented at a fixed rate of 6.03 cycles per second (F = 6.03 Hz; base stimulation frequency) through sinusoidal contrast modulation⁴⁴. Each cycle lasted 166 ms (i.e., 1000 ms/6.033). Trials started with a uniform grey background from which an image appeared as contrast increased. The stimulation was gradually faded in by progressively increasing the modulation depth from 0% to 100% maximum contrast level (and faded out vice versa). Each stimulus reached full contrast at 83 ms, then contrast was decreased at the same rate. At fixed intervals of every 5th image, a stimulus from the other category was introduced, creating a trial sequence containing category changes at a frequency of 1.21 Hz (6.03 Hz/5; i.e., with A = Ape and H = Human: HHHHAHHHHA…). EEG amplitude at this frequency (F/5 = 1.21 Hz) and its harmonics (i.e., 2F/5 = 2.41 Hz, 3F/5 = 3.62 Hz, …) was used as an index of the visual system’s categorization of face species⁴⁵. The schematic stimulation course is illustrated in Fig. 1.
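The timing arithmetic in this paragraph can be checked with a short sketch. This is an illustration only, not the authors' MATLAB/PsychToolbox code: the raised-cosine form of the contrast function and all names (`contrast`, `category_sequence`, `BASE_HZ`) are assumptions chosen to match the description (0% contrast at cycle onset, full contrast half-way through the cycle, a deviant at every 5th position).

```python
import math

BASE_HZ = 6.033       # base stimulation rate F (reported as 6.03 Hz)
DEVIANT_EVERY = 5     # every 5th image comes from the other category


def contrast(t_s: float, base_hz: float = BASE_HZ) -> float:
    """Contrast in [0, 1] at time t_s (assumed raised-cosine modulation).

    Contrast is 0 at cycle onset (grey background) and 1 at mid-cycle.
    """
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * base_hz * t_s))


def category_sequence(n_images: int, standard: str = "H", deviant: str = "A") -> str:
    """Sequence of standards with a deviant at every 5th position."""
    return "".join(
        deviant if (i + 1) % DEVIANT_EVERY == 0 else standard
        for i in range(n_images)
    )


cycle_ms = 1000.0 / BASE_HZ                 # duration of one image cycle
deviant_hz = BASE_HZ / DEVIANT_EVERY        # category-change frequency F/5
harmonics_hz = [round(k * deviant_hz, 2) for k in range(1, 4)]

print(round(cycle_ms), round(cycle_ms / 2))  # cycle length and time of peak contrast (ms)
print(harmonics_hz)                          # F/5, 2F/5, 3F/5
print(category_sequence(10))                 # human standards, ape deviants
```

With F = 6.033 Hz, the sketch reproduces the reported 166 ms cycle, the contrast peak at about 83 ms, and the first three categorization frequencies.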

Four types of trials were presented: ape face deviant (with human face standard), human face deviant (with ape face standard), and likewise versions of these trials with pictures inverted. For half the sample, human faces served as standard, for half the sample it was vice versa. Stimulus orientation was varied within-subject across trials (four consecutive trials upright, then four trials inverted, four upright, four inverted). As we were primarily interested in processing of upright faces, corresponding trials were always presented first to increase the number of trials available. Stimulus order was randomized for each trial with the exception that no stimulus could be repeated immediately. Between trials, short breaks were provided if needed. Overall, testing took about 10 minutes.

Procedure

Infants were seated at a looking distance of approx. 60 cm from the computer screen on their caregiver’s lap. Each trial consisted of a blank screen (random, min. 5 seconds), a 2-second fade-in, a stimulation sequence for 20 seconds, and a fade-out of 2 seconds. Stimulus fade-in and fade-out were introduced to avoid surprise reactions, abrupt eye-movements or blinks.

Triggers were sent via parallel port at the start of each sequence and at the minimum of each cycle (grey background, 0% contrast). Trigger accuracy was registered by a photodiode located in the upper left corner of the monitor. During the entire stimulation, looking behavior was videotaped and coded offline. Trials were initiated manually when participants looked attentively at the screen and showed an artifact-free EEG signal.

EEG Recordings and Analyses

EEG measures were obtained applying a BrainProducts actiCap (Gilching, Germany) with 32 active Ag-AgCl electrodes arranged according to the 10-10-system and a right mastoid reference. Sampling rate was set at 250 Hz and the EEG signal was amplified via a BrainAmp amplifier. Impedances were considered acceptable if < 20 kΩ. Recordings were acquired in a dimly-lit and quiet room.

EEG Preprocessing. All EEG processing steps were carried out using Letswave (http://nocions.webnode.com/letswave) and Matlab 2012b (The Mathworks) and followed the procedure described in several recent studies^40,46. EEG data was first band-pass filtered at 0.1–100 Hz using a Butterworth filter with a slope of 24 dB/octave. Filtered data was then segmented from 2 seconds before to 2 seconds after the sequence, resulting in 28-second segments (−2 s–26 s). Next, noisy channels were identified and pooled from surrounding channels (for a maximum of 2 channels) and a common average reference computation was applied to all channels.

Frequency domain analysis. Preprocessed data segments were cropped to an integer number of 6.03 Hz cycles beginning 2 seconds after onset of the trial until approximately 20 seconds, just before the stimulus fade-out (120 cycles, 4973 time bins in total ≈ 19.892 s). The first two seconds of each trial were excluded to avoid any contamination by the initial transient responses. For each condition, trials were averaged in the time-domain for every individual participant. Averaging was performed to increase the signal-to-noise ratio (SNR) by reducing EEG activities non-phase-locked to the stimulus. Then a Fast Fourier Transform (FFT) was applied to these averaged segments to extract amplitude spectra for all channels (square root of sum of squares of the real and imaginary parts divided by the number of data points). Frequency analysis yielded spectra with a high frequency resolution of 0.0503 Hz (1/19.892 s).
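The relation between segment length, frequency resolution and the bins of interest can be made concrete with a small sketch. It is not the Letswave pipeline used in the study: the single-bin discrete Fourier transform below is a plain standard-library stand-in for an FFT, and the names (`amplitude_at_bin`, `deviant_bin`, `base_bin`) are hypothetical. Because the segment holds exactly 120 base cycles, the base frequency falls on bin 120 and the categorization frequency (one deviant per 5 cycles, 24 deviants in total) on bin 24.

```python
import cmath
import math

SAMPLING_HZ = 250
N_SAMPLES = 4973       # cropped segment length reported in the text
N_BASE_CYCLES = 120    # integer number of 6.03 Hz cycles in the segment

duration_s = N_SAMPLES / SAMPLING_HZ     # 19.892 s
resolution_hz = 1.0 / duration_s         # width of one frequency bin
base_bin = N_BASE_CYCLES                 # 120 cycles per segment -> bin 120
deviant_bin = N_BASE_CYCLES // 5         # 24 deviants per segment -> bin 24


def amplitude_at_bin(signal, k):
    """Amplitude of DFT bin k: sqrt(re^2 + im^2) / number of data points."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return abs(acc) / n


print(round(resolution_hz, 4))                  # bin width in Hz
print(round(deviant_bin * resolution_hz, 2))    # categorization frequency in Hz
print(round(base_bin * resolution_hz, 2))       # base frequency in Hz
```

With this normalization (division by the number of data points, as described in the text), a sinusoid of peak amplitude A that falls exactly on a bin yields a spectral amplitude of A/2 at that bin.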

To measure the magnitude of activity at pre-defined bins of interest, baseline corrected amplitudes were computed by subtracting the average amplitude of 12 surrounding bins (6 on each side, excluding the immediately adjacent bins) from every frequency bin^40,46. For the base rate response, only occipital channels (O1, O2, Oz) were considered; for the categorization response, all occipito-temporal channels (P7, P8, PO9, PO10, O1, O2, Oz) were considered. Z-scores were calculated as the difference between amplitude at the frequency of interest and mean amplitude of 12 surrounding bins divided by the standard deviation of the 12 surrounding bins⁴¹. Threshold of significance was placed at Z-score 1.64 (p < 0.05, one-tailed). SNRs were computed by dividing the amplitude at the frequency of interest by the mean amplitude of the 12 neighboring frequency bins. Note that in the current study, 12 rather than 20 bins as in previous studies^40,41 were used to estimate noise variance. Due to shorter recording time in infants compared to adults (26 versus 66 second trials), the frequency resolution in this study is lower than in previous reports. In order to avoid including low parts of the spectrum that are inherently contaminated by higher levels of biological noise, the number of bins for noise variance estimation was reduced.
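The three noise-referenced measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the spectrum is a plain list of amplitudes indexed by frequency bin, and the sample standard deviation (n − 1 denominator) is assumed because the text does not say which estimator was used.

```python
from statistics import mean, stdev

Z_THRESHOLD = 1.64  # one-tailed p < 0.05, as stated in the text


def neighbour_bins(spectrum, k, per_side=6, skip=1):
    """Amplitudes of `per_side` bins on each side of bin k.

    The `skip` bins immediately adjacent to k are left out on both sides.
    """
    lo = k - skip - per_side
    hi = k + skip + per_side
    if lo < 0 or hi >= len(spectrum):
        raise ValueError("not enough neighbouring bins around bin %d" % k)
    return spectrum[lo:k - skip] + spectrum[k + skip + 1:hi + 1]


def baseline_corrected(spectrum, k):
    """Amplitude at bin k minus the mean amplitude of the 12 neighbours."""
    return spectrum[k] - mean(neighbour_bins(spectrum, k))


def z_score(spectrum, k):
    """(amplitude - neighbour mean) / neighbour SD (sample SD is an assumption)."""
    noise = neighbour_bins(spectrum, k)
    return (spectrum[k] - mean(noise)) / stdev(noise)


def snr(spectrum, k):
    """Amplitude at bin k divided by the mean amplitude of the 12 neighbours."""
    return spectrum[k] / mean(neighbour_bins(spectrum, k))


# Toy spectrum: noise alternating between 1 and 3, with a peak of 10 at bin 14.
toy = [1.0, 3.0] * 15
toy[14] = 10.0
print(baseline_corrected(toy, 14), snr(toy, 14), z_score(toy, 14) > Z_THRESHOLD)
```

In the toy spectrum the 12 neighbours average 2.0, so the peak of 10 gives a baseline corrected amplitude of 8.0 and an SNR of 5.0.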

Only trials with a significant response at the base frequency (6.03 Hz and/or its harmonic at 12.07 Hz) were used. On average, participants viewed 10 trials (M = 10.41; SD = 2.8), of which one trial (M = 1.36; SD = 1.7) was excluded due to a non-significant base rate response. There was no difference in the number of trials in the human (M = 10.4; SD = 3.4) and ape conditions (M = 10.4; SD = 2.1; p > 0.05), but participants saw more upright (M = 6.2; SD = 1.8) than inverted trials (M = 4.2; SD = 1.2; p < 0.001). To ensure that results could not be explained by differences in trial numbers, additional analyses were performed using a matched number of upright and inverted trials (trials from the upright condition randomly excluded). The results pattern conformed to the analyses on all trials. Additionally, trials were selected based on looking time, which was coded offline from the video. Twenty percent of trials were double-coded, with an intraclass correlation (ICC) coefficient of 0.98. When using only trials in which looking time was > 50%, the results pattern was similar to the main analyses.

Statistical analyses were performed using baseline corrected amplitudes (summed up to the highest consecutively significant harmonic⁴⁶). For the categorization response, 1.21 Hz and harmonics were summed up to the 11th harmonic, but excluding the 5th and 10th harmonics, which correspond to the base frequency and its second harmonic. For the base stimulation response, 6.03 Hz and harmonics were summed up to the 6th harmonic. Channels of interest were defined based on scalp topographies: P7, P8, PO9, PO10, O1, O2, Oz for the categorization response and O1, O2, Oz for the base response.
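The harmonic selection can be written out explicitly. The sketch below is illustrative only: the function names and the dictionary layout (harmonic number mapped to baseline corrected amplitude) are assumptions, and the amplitudes in the usage lines are placeholder values, not data from the study.

```python
BASE_HZ = 6.033
DEVIANT_EVERY = 5


def categorization_harmonics(max_harmonic=11, deviant_every=DEVIANT_EVERY):
    """Harmonics of F/5 up to `max_harmonic`, dropping those that coincide
    with the base frequency or its harmonics (every 5th harmonic)."""
    return [h for h in range(1, max_harmonic + 1) if h % deviant_every != 0]


def summed_response(bca_by_harmonic, harmonics):
    """Sum of baseline corrected amplitudes over the selected harmonics."""
    return sum(bca_by_harmonic[h] for h in harmonics)


harmonics = categorization_harmonics()
frequencies_hz = [round(h * BASE_HZ / DEVIANT_EVERY, 2) for h in harmonics]
print(harmonics)
print(frequencies_hz)

# Placeholder amplitudes (1.0 at every harmonic) to show the summation only.
placeholder = {h: 1.0 for h in range(1, 12)}
print(summed_response(placeholder, harmonics))
```

Nine harmonics remain after the exclusion, so the categorization response is a sum over nine frequency bins per channel.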

Z-scores were calculated to determine whether a significant response was obtained in each condition after summing across harmonics. Conditions were compared using baseline corrected amplitudes in a JZS Bayes factor repeated measurement analysis of variance (rmANOVA) with default prior scales^47,48. Factors were species (2: human deviant, ape deviant) × orientation (2: upright, inverted). Preliminary analyses indicated that there was no main effect or interaction with electrode, so an average of all seven electrodes (categorization response) or three electrodes (base response) was calculated and used in the statistical analyses. The Bayes factor rmANOVA provides a more conservative test than the standard rmANOVA and estimates probability for models based on the null and alternative hypotheses.

We hypothesized that upright images would elicit stronger categorization responses than inverted images. We did not have strong predictions regarding categorization differences between human and ape deviants, as both conditions require categorization of human and ape faces. However, extensive experience with human faces may enhance adaptation for human standards, which would lead to stronger categorization responses for ape deviants.

Results

Categorization Response

The categorization response (response at 1.21 Hz and harmonics) was observable in the grand-averaged data when upright ape faces were presented as deviant stimuli among human faces (SNR 1.37, Z > 3.11, p < 0.01; see Fig. 2 and Table 1). It was spread over occipital channels, with a slight right-hemispheric advantage. When looking at single infants, a significant response was obtained in six out of 11 infants in that condition (Zs > 3.11, ps < 0.001). There also was a categorization response for upright human deviant faces (SNR 1.08, Z > 2.33, p < 0.05) and inverted ape deviants (SNR 1.20, Z > 3.11, p < 0.01). In analyses of individual responses, a categorization response was observed for inverted ape among human faces in six of 11 infants (Zs > 2.33, ps < 0.01), and for upright human among ape faces in seven of 11 infants (Zs > 1.64, p < 0.05). No categorization response was observed for inverted human deviant faces on grand-averaged data (p > 0.05), but one infant among 11 showed a categorization response for inverted human faces among ape faces (Z > 1.64, p < 0.05).

Table 1 Baseline corrected amplitude (bca) means and standard deviations (SD), Z-score and signal-to-noise ratio (SNR) ranges for individual categorization and base rate responses in experiment 1.

The Bayes rmANOVA revealed that the model with a main effect of orientation was preferred to the null model by a Bayes factor of 2.31. This provides marginal evidence (ref. 49, Appendix B) for the hypothesis that categorization responses were stronger for upright images irrespective of species (upright M = 3.08 µV; SD = 4.2; inverted M = 0.78 µV; SD = 3.3). Moreover, the model with two main effects (species and orientation) was preferred to the null model by a Bayes factor of 3.07, providing moderate evidence that categorization responses differed between upright and inverted conditions and between human and ape deviants (ape face deviants M = 3.15 µV; SD = 4.4; human face deviants M = 0.72 µV; SD = 3.0). The difference between the model with a main effect of orientation and the one with main effects of species and orientation was only marginal (Bayes JZS = 0.75) but went in favor of the model with two main effects. The model with two main effects was also marginally preferred over a model with only a main effect of species (Bayes JZS = 2.27) and over a model with two main effects and an interaction term (Bayes JZS = 2.19).
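The model comparisons above are linked by simple arithmetic: when two models are each compared against the same null model on the same data, the Bayes factor between them is the ratio of their Bayes factors against the null. The sketch below uses only the values reported in this paragraph. The variable names are hypothetical, and the last two quantities are implied by this ratio rule rather than reported by the authors.

```python
# Reported Bayes factors against the null model.
bf_orientation_vs_null = 2.31
bf_both_vs_null = 3.07          # species + orientation main effects

# Reported Bayes factors in favour of the two-main-effects model.
bf_both_vs_species = 2.27
bf_both_vs_interaction = 2.19

# Orientation-only model relative to the two-main-effects model.
bf_orientation_vs_both = bf_orientation_vs_null / bf_both_vs_null

# Implied (not reported) Bayes factors against the null model.
bf_species_vs_null = bf_both_vs_null / bf_both_vs_species
bf_interaction_vs_null = bf_both_vs_null / bf_both_vs_interaction

print(round(bf_orientation_vs_both, 2))
print(round(bf_species_vs_null, 2), round(bf_interaction_vs_null, 2))
```

The first ratio reproduces the reported value of 0.75; a value below 1 means the data favour the model in the denominator, here the model with two main effects.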

Base Response

A strong response to the base visual stimulation was observed in all conditions (all SNRs > 2.1, all Zs > 10, see Table 2). It was centered on channel Oz and spread over O1 and O2. This response was significant in nine of 11 infants for upright ape faces (Zs > 3.11 ps < 0.001), in eight of 11 infants for inverted ape faces (Zs > 3.11, ps < 0.001), in all 11 infants for upright human faces (Zs > 1.64, ps < 0.05), and in nine of 11 infants for inverted human faces (Zs > 2.33, ps < 0.01).

Table 2 Baseline corrected amplitude (bca) means and standard deviations (SD), Z-score and signal-to-noise ratio (SNR) ranges for individual categorization and base rate responses in experiment 2.

The Bayes rmANOVA confirmed that there were no differences between conditions (all JZS Bayes factors between 0.3 and 1).

Discussion

In study 1, we explored 9-month-old infants’ rapid categorization of human and ape faces. As a group, infants showed a strong categorization response for upright ape faces presented among human faces, which was spread over the occipital cortex. Moreover, this response reached the significance threshold in the individual averages of six out of 11 infants. Categorization was also observed for upright human face deviants and inverted ape face deviants. Categorization responses best fit a model with main effects of species and orientation, indicating that categorization of ape faces and upright images was stronger than of human faces and inverted images. Thus, this study reveals that 9-month-old infants’ face species categorization relies on high-level visual perception and goes beyond mere perception of low-level image characteristics.

Moreover, this initial exploration of infant face categorization revealed an asymmetry, with stronger categorization responses for deviant ape faces. Before we can turn to discussing high-level explanations for this finding, however, low-level confounds should be ruled out. The asymmetry cannot be explained by a general difference of attention in human and ape standard trials. This was verified using two measures: (1) The response to the base stimulation frequency (6.03 Hz) did not differ between human and ape standard trials. (2) Video-coding confirmed that infants looked equally long at human (M = 16.11 s, SD = 5) and ape (M = 15.49 s, SD = 3.7) standard trials (p > 0.6). Therefore, we have no indication of differential attention to trials with different standard categories.

Likewise, categorization of ape from human faces cannot be attributed to low-level image characteristics alone, as inverting faces reduced categorization overall. Interestingly though, the categorization asymmetry was observed in inverted trials as well. Moreover, regarding individual infants’ responses, six infants showed a significant categorization response for rarely presented inverted ape faces, whereas only one infant categorized rarely presented inverted human faces. This raises the question of whether some low-level cues may have biased infants to categorize ape, but not human faces. Visual examination of our images indicated that the heterogeneity of ape faces was larger than that of human faces. Whereas human faces were taken from face databases, ape faces were collected from free images via Google search, and were thus more likely to vary. We extracted luminance and size values, and statistical analyses confirmed that the standard deviations of both measures were larger for ape than human faces, while there was no difference in mean luminance and size. The larger variability of ape faces may have contributed to the asymmetrical categorization observed here: It might have been more difficult for infants to form a category of ape faces from which human faces could be distinguished. In comparison, detecting ape faces among the more homogeneous group of human faces might have been easier.

Therefore, we edited the images and matched the heterogeneity of face categories to examine categorization of those controlled stimuli in study 2. We based our hypotheses on study 1 and thus expected the best fit for a model with two main effects, orientation and species, reflecting stronger categorization responses for ape face deviants and upright conditions. These a priori hypotheses were evaluated using a rmANOVA. Thus, study 2 provides a test of whether categorization patterns similar to those in study 1 will be observed in an independent sample with controlled images.