The discrimination of facial sex in developmental prosopagnosia

Developmental prosopagnosia (DP) is a neurodevelopmental condition characterised by difficulties recognising and discriminating faces. It is currently unclear whether the perceptual impairments seen in DP are restricted to identity information, or also affect the perception of other facial characteristics. To address this question, we compared the performance of 17 DPs and matched controls on two sensitive sex categorisation tasks. First, in a morph categorisation task, participants made binary decisions about faces drawn from a morph continuum that blended incrementally an average male face and an average female face. We found that judgement precision was significantly lower in the DPs than in the typical controls. Second, we used a sex discrimination task, where female or male facial identities were blended with an androgynous average face. We manipulated the relative weighting of each facial identity and the androgynous average to create four levels of signal strength. We found that DPs were significantly less sensitive than controls at each level of difficulty. Together, these results suggest that the visual processing difficulties in DP extend beyond the extraction of facial identity and affects the extraction of other facial characteristics. Deficits of facial sex categorisation accord with an apperceptive characterisation of DP.

more ambiguous, Biotti and Cook 32 found that a sample of DPs (N = 15) showed significantly poorer facial emotion categorisation precision than matched controls.
The present study sought to determine whether the perceptual problems seen in DP extend to another non-identity attribute -judgements about facial sex. Here again, the existing experimental findings on sex recognition in DP paint an inconsistent picture. Studies of individual cases suggest that discrimination of facial sex may be impaired in some cases of DP 2,33-36 , but not in others 29,[37][38][39] . Group studies have also yielded equivocal results; some authors have described similar levels of performance in DPs and typical observers 29,[40][41][42] , while others have described weak differences at the group level 43 .
The present study investigated facial sex discrimination in a group of DPs using sensitive psychophysical tasks. In both tasks, external cues to sex were eliminated; participants were forced to use the internal structure of a briefly presented face to make binary 'male'/'female' categorisation decisions. In a morph categorisation task, participants made binary decisions about faces drawn from a morph continuum that blended incrementally an average male face and an average female face. In a sex discrimination task, participants made binary decisions about female or male facial identities that had been blended with an androgynous average face. To vary task difficulty, we manipulated the relative weighting of each facial identity and the androgynous average to create four levels of signal strength. Relative to typical controls, we found that DPs were i) less precise in their categorisations of facial sex (morph categorisation task), and ii) less sensitive to the sex signal presented in faces at all levels of task difficulty (sex discrimination task).

Methods
Diagnostic testing. Seventeen adults with DP (7 males; M age = 45.9 years, SD age = 11.8 years) were recruited through www.troublewithfaces.org. All reported life-long face recognition difficulties in the absence of brain injury or psychiatric disorder (e.g. autism, schizophrenia). Diagnostic decisions were based on participants' scores on the Twenty-Item Prosopagnosia Index (PI20) 44,45 and the Cambridge Face Memory Test (CFMT) 22 . We also report the DPs' performance on the Cambridge Face Perception Test (CFPT) 11 to index their face encoding ability, and the Cambridge Car Memory Test (CCMT) 46 , a measure of non-face object recognition ability. Diagnostic information for each DP is provided in Table 1. Ethical approval was granted by the University of Reading's ethics committee. The study was conducted in line with the ethical guidelines provided by the 6th (2008) Declaration of Helsinki. All participants provided informed consent and were fully debriefed after the experimental procedure.
Morph categorisation task. In this task, a central fixation cross (250 ms) was followed by a single face presented centrally (250 ms), followed by a high-contrast mask (250 ms). Each stimulus image was drawn from a morph-continuum that blended an average male face with an average female. Facial-sex cues ranged in intensity from 20% female to 80% female in 10% increments (see Fig. 1a). Participants were asked to categorise the stimulus image presented as either male or female. We used participants' responses to model psychometric functions that plotted how their response likelihoods varied as a function of the sex signal contained in the face. Stimulus images were presented in greyscale and subtended 7.9° × 5.9° of visual angle when viewed at a distance of 60 cm. Hair and jaw lines were occluded by an elliptical window with a Gaussian blur at the edges. Each of the seven morphs were presented 20 times, in a random order, resulting in 140 trials.
The average male face and average female face used to create the morph continuum were themselves created by morphing together 16 males and 16 females, respectively. The original images were downloaded from the Chicago Face Database (CFD) 47 , and morphed using the method described by Adams, Gray, Garner and Graf 48 .
The performance of the DPs was compared with twenty typically developed (TD) controls recruited from the local subject pool (5 males; M age = 41.2 years, SD age = 8. Sex discrimination task. A central fixation cross (250 ms) was followed by a single face presented centrally (250 ms) drawn from a set of 40 to-be-judged identities (20 male, 20 female). A high-contrast mask was presented for 250 ms immediately after stimulus offset. Participants were asked to categorise each face as male or female. We measured participants' ability to discriminate male and female faces using signal detection theory 49 .
To make the task more challenging we diluted the sex signal in each stimulus image by morphing the facial identity with an androgynous average face 48 . The androgynous face was constructed by morphing together all 40 of the to-be-judged facial identities. To produce four levels of task difficulty, we varied the relative weighting of each facial identity and the androgynous average (Fig. 1b). In the most difficult condition, stimulus images contained only 20% of the original facial identity; in the least difficult condition, stimulus images contained 50% of the original facial identity.
The original facial images were sourced from the NimStim face set 50 and the CFD 47 . Images were presented in greyscale and subtended 7.3° × 5.4° of visual angle when viewed at 60 cm. Each facial identity was presented four times at each of the morph levels in a random order resulting in 640 trials (40 facial identities × 4 morph levels × 4 repetitions). Both experiments were programmed in MATLAB, using the Psychophysics Toolbox extensions 51,52 .
The performance of the DPs was compared with twenty TD controls recruited from the local subject pool (5 males; M age = 43.5 years, SD age = 7.5 years). Twelve out of twenty TDs from this sample also completed the sex categorisation task. Neither participant age [t(35) = .76, p = .452] nor proportion of males [X 2 (1) = 1.10, p = 0.295] Figure 1. (a) Morphed stimuli used in the morph categorisation task. One male and female model were morphed along a continuum, ranging from 20% (80% male) to 80% female (20% male) in 10% increments. (b) Stimuli used in the sex discrimination task. Forty models were morphed together to create an androgynous average face. Each model was then morphed with the androgynous face to create four weighted morphs (20%, 30%, 40% and 50% identity/sex signal) along a continuum for each model.

Results
Morph categorisation task. We used each participants' binary choice responses to estimate their psychometric function by fitting Cumulative Gaussian functions using the Palamedes toolbox 53 . We were primarily interested in the slope of participants' psychometric function, quantified using the reciprocal of the standard deviation of the symmetric Gaussian distribution underlying each cumulative Gaussian function. Higher slope values (steeper slopes) indicate that observers can perceive subtle differences between the stimuli and vary their responses accordingly. Slope values and the point of subjective equivalence (PSE) for DPs and TD controls were compared using independent samples t-tests (two-tailed) (α = 0.05).
While the groups did not differ significantly, the DP group contained a slightly higher proportion of male observers than the group of typical controls. To confirm that observer gender had no influence on our results, we re-ran the foregoing analyses in an ANOVA with observer gender as an additional between-subjects factor. None of the main effects or interactions with observer gender approached significance for either the slope (all F's < 0.1, all p's > 0.75) or PSE analyses (all F's < 2.0, all p's > 0.17).

Sex discrimination task.
For the purpose of the signal detection analysis, a female response to a female stimulus was regarded as a hit, whereas a female response to a male stimulus was regarded as a false alarm. We derived measures of discrimination sensitivity (d′ = z[hit rate] -z[false alarm rate]) and bias (c = −0.5(z[hit rate] + z[false alarm rate])) at each level of task difficulty. When hit or false alarm rates were 1 or 0 (respectively), total hits or false alarms were adjusted by 0.5 before calculating d′.
To assess sensitivity, d' values were submitted to ANOVA (α = 0.05) with Difficulty (20%, 30%, 40%, 50%) as a within-participants factor and Group (TD, DP) as a between-participants factor (Fig. 2b) Once again, we re-ran the foregoing analyses with observer gender as an additional between-subjects factor. None of the main effects or interactions with observer gender approached significance for either the sensitivity (all F's < 1.60, all p's > 0. 19) or bias analyses (all F's < 1.50, all p's > 0.20). Z-scores for each DP on both tasks are presented on Table S1 in the Supplementary Materials.

Discussion
The present study aimed to determine whether facial sex discrimination is impaired in DP. In our morph categorisation task, we found that judgment precision, inferred from the slope of individual psychometric functions, was significantly lower for DPs relative to controls. We replicated and extended this finding using a sex discrimination task. We found that DPs were significantly less sensitive to facial sex cues compared to controls at all levels of task difficulty. Despite their impairments, DPs did not differ significantly from controls in terms of categorisation threshold (PSE) or response bias (C).
Previous studies of facial sex categorisation in DP have provided equivocal results. Several studies have found no difference in task performance between DPs and typical controls 29,41,42 . To date, only one study has reported evidence of atypical perception of facial sex in DP at the group level 43 . In their study, Esins et al. 43 compared 16 DPs and 21 controls, and whilst both groups were approaching ceiling (controls: 91.5% correct; DPs: 84.4% correct), DPs were significantly impaired when making binary classifications of facial sex.
We speculate that methodological issues may have prevented earlier studies from detecting the clear, widespread differences seen here. In previous studies, participants were asked to categorise the sex of unmodified facial images 29,41,42 . This may have let DPs use trivial pictorial cues, such as the presence of make-up or facial hair -and in some cases, actors' hair line 29 -to achieve broadly typical levels of categorisation accuracy. In several studies, stimuli were also held on screen until response execution 39,42 , thereby permitting DPs to detect pictorial cues and develop other heuristics. Our use of image-morphing to vary the strength of sexually dimorphic signals, together with brief presentation durations, ensured that our psychophysical tasks taxed the structural encoding processes thought to be aberrant in DP.
The widespread sex discrimination deficits seen here provide compelling support for the apperceptive account of DP. Our tasks had little or no memory component -participants had to make a simple binary-choice decision about a single stimulus. Unlike sequential matching or delayed match-to-sample tasks, there was no need to hold the target face in memory; rather, participants were free to determine their response at any point after stimulus onset. The impairments seen here are therefore hard to explain in terms of a mnemonic deficit. Instead, the fact that DPs were impaired on our sex categorisation tasks strongly suggests a problem forming structural descriptions of faces. Leading models of face processing propose that a common structural description, formed early in the face processing stream, informs subsequent processing of identity, expression, age, and sex, mediated by largely independent, divergent pathways 25,26 . Subtle deficits of facial sex, and facial emotion perception 33 , together with problems sorting simultaneously presented faces by similarity 24 , suggest that this early structural description may be impoverished in DP.
The nature of the apperceptive deficit seen in DP remains currently unclear. According to one influential account, a failure to process faces holistically -whereby facial features are integrated into a non-decomposable whole 54-56 -may underlie the face recognition deficits seen in DP 31,57-60 . However, recent evidence suggests that most individuals with DP show typical susceptibility to the composite face illusion 18,42,43,61,62 , thought to be a key marker of holistic face processing 63,64 . Instead, we speculate that DPs have an apperceptive problem that affects local feature descriptions. Consistent with this possibility, many DPs are unable to accurately discriminate local face regions 2,32,60 . Similarly, variability in local feature encoding ability contributes substantially to the individual differences in face recognition seen in the typical population 65 .
It has recently been suggested that individual differences in face recognition ability may reflect different patterns of gaze fixations employed by observers. For example, some DPs appear to spend less time examining the internal feature of the face, in particular the eyes 66,67 . Conversely, so-called 'super-recognizers' -people with exceptional face recognition ability 68 -spend more time inspecting the nose region than typical observers 66 . While these interesting differences warrant research attention in future, we believe it is unlikely that atypical gaze behaviour is driving the facial sex categorisation impairment seen here. In both tasks we employed a central fixation cross and brief stimulus presentation (250 ms) in order to limit variability in fixation behaviour as far as possible.
The principle line of evidence in favour of the mnemonic account of DP is the fact that some DPs perform worse on tests of face perception that have a memory component, than on tests that have little or no memory demand, notably the CFMT and CFPT, respectively 15,16,20 . It has recently been argued, however, that these apparent dissociations should be treated with caution 23 . In particular, these tasks are used differently in the diagnosis of DP. In many labs, a poor score on the CFMT is deemed necessary for a diagnosis of DP; i.e., people who describe face recognition problems outside the lab are excluded from DP research if their CFMT score does not fall at least two SDs below the typical mean. In contrast, the CFPT is treated as a descriptive measure, administered to assess differences in perceptual encoding ability. Thus, whereas DPs are preselected based on their low CFMT scores, their CFPT scores are free to vary. For example, only 9 out of 17 DPs in this sample performed more than two SDs below the normative mean on the CFPT. Given this treatment -akin to 'double-dipping' -it is not at all surprising that some DPs appear to perform worse on the CFMT than the CFPT 3 . It should also be noted that the psychometric properties of the CFPT are poorer than those of the CFMT, and that the simultaneous presentation of the faces in the CFPT may make it easier to employ compensatory strategies (e.g., detection of trivial pictorial cues).
We have argued that subtle deficits of facial sex categorisation may be widespread in DP, consistent with an apperceptive impairment. If this is the case, why do DPs rarely report problems making sex judgements in their Scientific RepoRtS | (2019) 9:19079 | https://doi.org/10.1038/s41598-019-55569-x www.nature.com/scientificreports www.nature.com/scientificreports/ daily lives 37,69 ? We believe this apparent discrepancy reflects the different demands posed by sex categorisation outside of the lab and in contrived lab tasks. Outside of the lab, we rarely encounter people who seek to obscure their sex; on the contrary, individuals frequently elect to accentuate sexually dimorphic face signals through the use of make-up, personal grooming, and cosmetic surgery. Moreover, a wealth of non-face cues (e.g., facial hair, hair styles, clothes, use of make-up, body shape, voice) can be used to judge the sex of the faces we encounter outside the lab. Only when these cues are systematically removed in contrived lab-based studies, are individuals forced to base their sex categorisation judgements on facial structure. It is only under these conditions that DPs are disadvantaged relative to TD controls.
In summary, this study aimed to determine whether DPs exhibit deficits of facial sex discrimination. Our two complementary psychophysical procedures produced converging results: when judging closely controlled stimuli in which the strength of the sex signals were manipulated systematically, DPs were less able to categorise facial sex than matched TD controls. While these subtle deficits may have little impact on the day-to-day interactions of individuals with DP, they have considerable theoretical significance. Crucially, they imply that the condition has a locus of impairment early in the face processing stream, that hinders the ability of individuals to form structural descriptions of faces.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.