Prosodic influence in face emotion perception: evidence from functional near-infrared spectroscopy

Emotion is communicated via the integration of concurrently presented information from multiple information channels, such as voice, face, gesture and touch. This study investigated the neural and perceptual correlates of emotion perception as influenced by facial and vocal information by measuring changes in oxygenated hemoglobin (HbO) using functional near-infrared spectroscopy (fNIRS) and acquiring psychometrics. HbO activity was recorded from 103 channels while participants (n = 39, age = 20.37 years) were presented with vocalizations produced in either a happy, angry or neutral prosody. Voices were presented alone or paired with an emotional face and compared with a face-only condition. Behavioral results indicated that when voices were paired with faces, a bias in the direction of the emotion of the voice was present. Subjects' responses also showed greater variance and longer reaction times in the bimodal conditions than in the face-only condition. While both the happy and angry prosody conditions exhibited right-lateralized increases in HbO compared to the neutral condition, these activations were segregated into posterior and anterior subdivisions by emotion. Specific emotional prosodies may therefore differentially influence emotion perception, with happy voices exhibiting posterior activity in receptive emotion areas and angry voices displaying activity in anterior expressive emotion areas.

and independently makes some multimodal findings more difficult to interpret, as there is no separate account of how each modality was processed when presented alone or together 52. Further elucidating the neural substrates of emotion perception will require multimodal stimuli that encapsulate both the visual and auditory components of affect perception. The current study therefore used affective multimodal stimuli to investigate the neural activity underlying multimodal integration during emotion perception with functional neuroimaging and psychophysical measures. Functional near-infrared spectroscopy (fNIRS) is well-suited to studying the cortical activity associated with emotion perception: it measures both oxygenated and deoxygenated hemoglobin, does not require immobilization, is resilient to movement, and is a reliable measure of neural activity 54,55. Additionally, fNIRS measures the hemodynamic response at a higher temporal resolution than fMRI, which makes it well-suited to capturing the temporal complexities of emotion perception. While fNIRS has been used in a variety of cognitive tasks, it has been used less frequently in tasks investigating emotion, and very few studies have investigated the integration of emotional audiovisual stimuli 54-57. Importantly, fNIRS data acquisition is virtually silent, a significant advantage in sound-sensitive studies of auditory processing, as fMRI scanners are inherently limited by the loud noise generated during scans. This noise may disrupt normal stimulus processing and introduce stimulus-locked artifacts that complicate later data analysis. Moreover, this study adds to the current literature by presenting data from high-density recordings in healthy, awake adults.
We aimed to examine the influence of non-verbal prosodic vocal stimuli on the emotion perception of simultaneously presented faces. Functional near-infrared spectroscopy (fNIRS) was used to quantify changes in oxygenated hemoglobin (HbO) while subjects were presented with affective voices, faces, and voices and faces presented together. We hypothesized that the prosody of a simultaneously presented voice would bias participants' perception of the stimuli, and that this bias would be evidenced by an anterior-posterior shift in neural activity between prosody conditions in the right hemisphere, with the angry prosody condition showing an anterior distribution of activity. For the region-of-interest analysis we predicted that activation in bilateral scalp areas consistent with the TPJ would vary with stimulus type and the prosody of the speaker's voice, with the combined face and voice (F + V) conditions exhibiting greater HbO activity than the face-only or voice-only conditions. From behavioral data on the same task, we also predicted that prosodic vocal information would bias emotion perception in the direction of the vocal emotion for simultaneously presented stimuli in the F + V condition.

Results
Behavioral data. To analyze the hypothesized bias effects, data were fit with a logistic function to calculate point of subjective equality (PSE) and just-noticeable difference (JND) values, which were analyzed using two identical one-way repeated measures analyses of variance (ANOVAs) with condition as the within-subject factor. PSE and JND values for each condition are reported in Supplementary Table 8S. There was a statistically significant difference in PSE values between conditions, F(3, 115) = 8.48, p < .0001, with the Happy prosody condition having a significantly different PSE (3.78 ± .946) than both the Angry (4.87 ± 1.14, p < .0001) and Neutral (4.70 ± .966, p = .0003) conditions. The PSE for the face-only condition (4.15 ± .628) was also significantly different from the Neutral (p = .0281) and Angry (p = .0035) conditions, but not the Happy prosody condition (p = .1324), see Fig. 1a. The ANOVA for JND revealed a statistically significant difference in JND values between conditions, F(3, 115) = 13.39, p < .0001. The face-only condition exhibited a significantly larger JND (3.32 ± .939) than all prosody conditions (Happy, 1.77 ± .689, p < .0001; Angry, 2.04 ± 1.20, p < .0001; Neutral, 2.16 ± 1.20, p < .0001). The JND for the Happy prosody condition was not significantly different from the Angry (p = .3131) or Neutral (p = .1494) prosody conditions, which also did not significantly differ from one another (p = .6557), see Fig. 1b. Subject reaction times were analyzed using an ANOVA with the same within-subject factors and levels, which revealed a significant main effect of face step, F(6, 162) = 11.65, p < .0001, a main effect of condition, F(3, 81) = 20.01, p < .0001, and a significant interaction between face step and condition, F(18, 486) = 3.15, p < .0001.
Pairwise comparisons revealed significantly faster reaction times for the face-only condition than for all prosody conditions (Happy, 760.62 ± 51.08, p < .0001; Angry, 750.11 ± 40.57, p = .0001; Neutral, 753.73 ± 44.20, p < .0001). There were no significant differences in reaction times between the prosody conditions (see Fig. 1c).
NIRS results. Three optodes (two emitters and one detector) were eliminated from all scans because they failed quality control in more than half of the subjects' datasets. A gain check was used to assess individual channel quality; channels with gains higher than 6 that otherwise passed quality control were interpolated, yielding 103 channels and ensuring that all subjects shared a common set of channels for comparison.
Face and voice contrasts. While both the Angry and Happy F + V conditions exhibited greater concentrations of HbO when compared with the Neutral F + V condition, the resulting patterns of activity were distinctly different (Fig. 2, for full description of the results see Supplementary Table 1S). The Angry-Neutral F + V contrast revealed a pattern of increased HbO that was primarily restricted to motor areas in the right hemisphere (Fig. 2b). Such activations spanned prefrontal areas, with increased HbO occurring in superior and inferior aspects of the precentral gyrus. A handful of channels also exhibited significantly greater activity when compared to the Neutral F + V condition: supramarginal gyrus, middle superior temporal gyrus (STG), inferior postcentral gyrus, and the inferior occipital gyrus (IOG). The Happy-Neutral F + V contrast exhibited a strikingly different pattern of activity, with increased levels of HbO occurring bilaterally (Fig. 2a). Activations in the right hemisphere spanned posterior dorsal areas, which included superior parietal lobules (SPL), superior, middle, and inferior occipital gyri. Bilateral activations appeared in left and right temporoparietal junctions (TPJ), as well as posterior and middle sections of the STG. In the left hemisphere activations appeared in prefrontal and precentral gyrus areas, superior frontal gyrus (SFG), and the TPJ. When compared against one another, the Angry F + V condition exhibited greater HbO in right inferior precentral gyrus than the Happy F + V condition (Fig. 2c). Interestingly, the Happy F + V condition exhibited greater HbO in the right superior occipital gyrus, left dorsolateral prefrontal areas, and superior postcentral gyrus than the Angry F + V condition.
When compared to the FO condition, both the Angry and Happy F + V conditions exhibited activity in the right hemisphere, with a similar anterior-posterior division as the preceding F + V vs. F + V contrasts (Fig. 3, see Supplementary Table 2S for more detailed results). Activity included but was not limited to the TPJ for both contrasts. The Happy F + V condition exhibited increased HbO activity in posterior occipital and dorsal association areas (Fig. 3a). Activity for the Angry F + V condition was primarily localized to motor areas (Fig. 3b). Additionally, while greatly reduced, a similar increase in right-lateralized anterior-posterior HbO activity appeared when both the Angry and Happy F + V conditions were compared with their respective VO conditions (Fig. 4, see Supplementary Table 3S). Happy F + V activity appeared in the right TPJ and posterior occipital areas. The Angry F + V condition exhibited increased HbO in frontal motor areas but decreased HbO in anterior portions of the right temporal lobe (Fig. 4b).
Voice only contrasts. When contrasted with the Neutral voice condition, both the Angry and Happy voice conditions exhibited decreased activity bilaterally (Fig. 5a,b, see Supplementary Table 4S for a full description of the results). Similar to the activations in the bimodal contrasts, these patterns did not overlap.
Mean face and voice, and voice only condition contrast. The mean activity of all F + V conditions was contrasted with the mean activity of all VO conditions. The VO conditions exhibited greater activity in areas associated with language processing in the left hemisphere, with activity appearing in the inferior frontal gyrus and posterior areas of the inferior temporal gyrus (Fig. 6, see Supplementary Table 5S for full details). Additionally, the F + V conditions exhibited greater HbO activity only in the left occipitotemporal junction, an area related to the identification of visual information.

Discussion
We evaluated participants' oxygenated hemoglobin (HbO) levels and 2AFC button-press responses as they were presented with images of affective facial expressions and vocal utterances voiced in a happy, angry, or neutral prosody. This design enabled comparison of both the unimodal (face or voice presented alone) and bimodal (face and voice, F + V) components of affect expression. At the behavioral level, voices appeared to aid emotion recognition, as subjects displayed smaller JND values in every bimodal F + V condition than when faces were shown alone. Prosody may reduce confusion when a recipient is presented with an ambiguous face, enabling an immediate and appropriate reaction. However, this perceptual gain may require more processing, as the F + V conditions exhibited significantly longer reaction times than the FO condition. Our hypothesis of a vocal bias on emotion classification received partial support from the Happy F + V condition, but not from the Angry F + V condition. Collectively, these data provide further support for the role of prosody in influencing emotion perception at both the neural and behavioral level. HbO activity for the bimodal conditions varied by prosody: the Happy F + V condition exhibited significantly greater HbO activity in bilateral temporoparietal junction (TPJ) regions than both the Angry and Neutral F + V conditions (Fig. 2a,c).
While the Happy F + V condition exhibited greater HbO activity in bilateral TPJs than the Neutral F + V condition, differences in TPJ activity between the Happy and Angry F + V conditions appeared only in the right hemisphere. These results are intriguing, as the left and right TPJs are associated with different aspects of social communication, with integration of vocal and facial social stimuli occurring in the right TPJ and perspective taking being localized to the left TPJ. This suggests that TPJ activity may be linked to the valence of the prosodic voice, as both conditions contained the same faces and differed only in the prosody of the voice. The Happy F + V condition may have selectively engaged this area, as there was no difference in TPJ activity between the Neutral and Angry F + V conditions (Fig. 2b).
The Happy and Angry F + V conditions also exhibited pronounced differences in patterns of activity in the right hemisphere, with a distinct anterior-posterior division appearing between emotions (Fig. 2a,b). Evidence from aprosodia patients has shown that damage to the right hemisphere may selectively impair prosody-related functions, as comprehension deficits appear in patients with posterior injuries and difficulties in reproducing prosody manifest with anterior damage 19,20. Both the anterior and posterior regions of the right hemisphere are essential to multimodal affect perception, as damage to either area may result in deficits in identifying, recognizing, or producing emotional expressions 58. Additionally, these impairments may be connected to the fundamental role of the right hemisphere in processing paralinguistic information related to a speaker's age, gender, or emotional state 25,26. The right hemisphere is thus generally involved in distinguishing and associating vocal and facial information with their relevant conceptual representations. Prosody may therefore be an auditory extension of a speaker's identity and is thought to have a more pronounced effect on emotion perception than other sensory modalities 59. The impact of prosody on emotion perception is underscored by the ability of affective voices to bias the perception of emotional facial expressions in the direction of the simultaneously presented prosody (Fig. 2), and this effect is not tied to stimulus valence, as both negatively and positively valenced stimuli have been shown to bias face perception 31. One neural substrate potentially related to this perceptual bias may be the posterior superior temporal sulcus (pSTS) of the right hemisphere, as this area exhibits increased activity when presented with emotionally incongruent vocal and facial stimuli 34.
We found that both the Happy and Angry F + V conditions exhibited increased activity in these posterior brain regions consistent with pSTS when compared to the Neutral F + V, suggesting that the pSTS evaluates the congruency of affective audiovisual stimuli and this is not valence specific. Average HbO time course activity for the Happy F + V and Neutral F + V conditions is shown in Supplementary Fig. 1S.
Together, these findings show that activity in the right hemisphere may represent a more general role in speaker identification and that this identification may be mediated by a speaker's prosody. Results from the current study support and extend this assertion, as activity was lateralized to the right hemisphere and seemingly subdivided by valence into anterior and posterior subdivisions (Fig. 2a,b). A similar functional division has been reported in the stroke literature, where patients sustaining damage to anterior regions of the right hemisphere exhibited deficits in recognizing negatively valenced emotions, an effect not seen in patients with posterior damage 60. These findings complement those of the current study, which found that the Angry F + V condition exhibited increased HbO in the right hemisphere, with activations primarily localized to right frontal and somatomotor areas (Fig. 2b). In contrast to anger, a homologous posterior anatomical region has not been reported for positively valenced emotions 60,61. Rather, happiness appears to be represented in several cortical and subcortical areas, with bilateral activity appearing in somatosensory and posterior association areas 37,38,61. The Happy F + V condition displayed a similar pattern of posterior activity in right superior parietal and lateral occipital regions, with bilateral activity appearing in the TPJs. Again, while this lateralization may reflect the perceptual weight that vocalizations carry in affect perception, the posterior segregation of activity highlights the involvement of brain areas associated with representing and responding to another individual's mental state 37,38. Additionally, these findings may reflect the automatic mimicry and approach behaviors evoked by happy faces 39. When compared against one another, the Happy F + V stimuli exhibited greater HbO activity over the right occipital region than the Angry F + V stimuli (Fig. 2c). Most interestingly, the Happy F + V condition also exhibited greater levels of HbO in the left dorsolateral prefrontal cortex (DLPFC). These data provide indirect support for the role of the right hemisphere in selectively processing angry facial expressions 9,62. However, neither hemisphere displays a similar specialization for happy faces 9. These findings may dually reflect the unilateral specialization of the right hemisphere in processing angry stimuli, as well as the more general, bilateral activity evoked by happy stimuli.
Despite their spatial segregation, the Angry and Happy F + V conditions exhibited overlapping areas of increased HbO near the STG and the right occipital region (ROR), areas that are essential to the fine discrimination and initial integration of affective vocal and facial cues with their corresponding emotion-specific perceptual representations 6. This overlapping activity appeared when the emotional F + V conditions were compared to the Neutral F + V condition, but not in direct comparisons between emotion conditions (Fig. 2c), suggesting that while activity in the middle STG and ROR may be mediated by emotional valence, these regions do not appear to be emotion specific. This is consistent with findings from fMRI, EEG, MEG, and lesion studies, which have indicated that the ROR and mSTG are sensitive to the physical attributes of affective stimuli but may not encode specific emotions 9,22,25. Collectively, these findings emphasize the crucial role of audiovisual integration in affect perception, as faces paired with a happy or angry voice exhibited distinctly different patterns of neural activity, and the location of this activity appears to closely correspond to that reported in the face mimicry literature 9,39,42. In the current study, however, this effect was driven by the prosodic information, since all three F + V conditions utilized the same set of face stimuli and differed only in the prosody they were paired with (angry, happy, neutral). This distinct distribution of activity may be related to the apparent bias in perception that was evident in the behavioral findings, with both the Angry and Happy F + V conditions demonstrating a perceptual shift in subjects' responses when compared to the face-only condition. These findings complement the aprosodia literature by showing that emotion is represented in the brain via a distributed pattern of activity that appears to be socially relevant and valence dependent.
Prosody comprehension and expression are vital to emotion perception, and the current data indicate that differences in the valence of a speaker's voice may produce a perceptual bias corresponding to a distinctive pattern of neural activity that follows the anatomical-functional organization of prosody-related areas in the right hemisphere. These data suggest that while the physical qualities of a face play a major role in affect perception, this role is not entirely independent of simultaneously presented vocal information. To further disentangle this relationship, activity from the face-only and prosody-only conditions was compared to the Angry and Happy F + V conditions. Interestingly, when compared to the face-only condition, both the Angry F + V and Happy F + V conditions exhibited a similar distributed pattern of increased HbO activity over posterior-anterior areas in the right hemisphere (Fig. 3a,b). The F + V conditions also exhibited greater HbO levels than the prosody-only conditions, but this activity was not as diffuse as that seen in the face comparisons (Fig. 4a,b). Additional analyses were carried out on the NIRS and behavioral data to check for gender differences; there were no significant differences between men and women in either dataset.
These findings highlight the inherent interrelatedness of affective vocal and facial expressions. These data also support the notion that affect perception evolves in complexity, with initial processing originating in brain areas that are mutually responsive to all vocal (mSTG) and facial (ROR) displays of emotion. Subsequent integration with conceptually relevant knowledge appears to manifest as a divergence of neural activity, separating the right hemisphere into posterior and anterior subdivisions, indicating that the processing of affective faces and voices may not be entirely separate. Moreover, these findings demonstrate that fNIRS can capture the ability of prosody to influence the integration of socially relevant, affective multisensory information. Further disentangling the neural correlates underlying these processes may hold special significance in understanding and treating the deficits in speech and emotion perception exhibited by individuals with autism spectrum disorders or other neurodevelopmental disorders 64,65.
The current study also has significant limitations that should be acknowledged. First, a technical problem with behavioral data collection during the fNIRS acquisition prevented us from examining correlations between the behavioral and imaging data. Second, the contribution of subcortical structures to emotion perception cannot be assessed with NIRS due to limitations in depth sensitivity 66. Third, because we did not acquire electromyogram data, the degree of facial mimicry/emotional contagion, if any, could not be assessed in our participants. Finally, the lack of findings for the left dorsolateral frontal region should be interpreted with caution, as the technical failure of one source optode in some subjects necessitated the interpolation of channels in that region across the entire participant dataset. Additionally, the influence of experimental design must be acknowledged, as these findings stand in stark contrast to those of similar studies investigating affective auditory processing. In particular, Shekhar and colleagues (2019) showed that happy voices elicited a more positive response than neutral stimuli in areas of the inferior frontal temporal cortex in two-month-old infants using a traditional block design 67. Those data showed an initial increase to happy speech followed by a decrease, potentially related to neural habituation. The current study employed a hybrid block design in which multiple similar stimuli were shown in close repetition. This repetition may have habituated responses to happy stimuli and may account for the decrease in HbO seen in the Happy VO < Neutral VO contrasts. However, these discrepancies may also be related to ongoing neurovascular development, as infants have been shown to exhibit inverted hemodynamic responses in bilateral temporal brain regions compared to healthy adults during a passive auditory listening task 68.
Despite these limitations, the data presented here provide an initial investigation of the neural correlates underlying emotion perception using a multimodal approach, increasing the ecological validity of the experiment and its generalizability to other neuroimaging studies. To our knowledge, this is the first fNIRS study to investigate multimodal emotion perception using a high-density optode array 69,70. The striking anterior-posterior distinction between the Angry and Happy F + V conditions in the right hemisphere should be replicated and extended in future studies.

Methods
Subjects. Thirty-nine subjects for the NIRS study were drawn from the undergraduate introductory psychology subject pool at Colorado State University (see Supplementary Table 6S). The protocol was approved by the Colorado State University Institutional Review Board, and all participants provided written informed consent before taking part in the procedures. The experiment was conducted in accordance with all relevant guidelines and regulations. Exclusionary criteria were based on self-report and included past or present neurological or psychiatric diagnosis, history of developmental disability, and traumatic brain injury.
Face stimuli. Face stimuli were taken from the NimStim database 71, which used untrained actors with natural hair and makeup. One angry and one happy closed-mouth image were selected from a subset of 20 actors (10 men). Images were grayscaled and cropped tightly around the face so that no hair, neck, or clothing was visible. Morph continua were generated, one for each actor, using Psychomorph software 72,73. Each continuum consisted of two end-point prototype images (angry and happy), which were morphed together in seven steps (two endpoints and 5 morphs, in 12.5% steps) so that the mid-point image was a 50% combination of the two prototype images (for an example continuum see Supplementary Fig. 2S). Individual face templates were created for each end-point image using 182 manually placed points. Faces with closed mouths were selected to facilitate morphing. Face stimuli were presented on an LED monitor with a 240 Hz refresh rate located 45 cm in front of the subject. Face stimuli subtended 7.62 degrees of vertical visual angle and 5.72 degrees of horizontal visual angle.
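The reported stimulus sizes follow from standard visual-angle geometry. As an illustrative sketch (the physical image dimensions below are derived by us, not reported in the text), the on-screen size implied by the stated angles at the 45 cm viewing distance can be computed as:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of physical size
    size_cm viewed from distance_cm (standard screen geometry)."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse relation: physical size needed to subtend angle_deg."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# At the 45 cm viewing distance used here, the reported 7.62-degree
# vertical angle implies an on-screen image height of about 6 cm.
height_cm = size_for_angle_cm(7.62, 45)
width_cm = size_for_angle_cm(5.72, 45)
```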
Auditory stimuli. Auditory stimuli were obtained from the Montreal Affective Voices database, in which professional actors produced short, nonverbal affective interjections of the vowel /a/, which sounds similar to the /a:/ in "ah" in spoken English 7. These stimuli were chosen because they effectively convey emotion, are not synthetic, and are free of semantic or linguistic information that might indirectly bias participants' responses. Three vocalizations, one each in angry, happy, and neutral prosody, were chosen for each actor (1 male and 1 female), resulting in a total of six vocalizations. These stimuli have previously been matched and validated for valence (negative, positive), arousal, and perceived intensity 7. All vocal stimuli were 993 ms in length.
Paradigm. Participants were presented with three classes of stimuli: face + voice (F + V), voice only (VO), and face only (FO). These stimuli were used to create seven conditions: one for each prosody (happy, angry, neutral) in the F + V and VO stimulus classes, plus one FO condition (see Fig. 7 for an example). Binaural auditory stimulation (70 dB SPL) was delivered via EAR 3a foam insert earphones. Morphed face stimuli were presented alone or simultaneously with auditory stimuli. Each trial began with a white fixation cross on a black background for 300 ms, followed by a 200 ms pause, after which a VO, FO, or F + V stimulus was presented for 993 ms, followed by a blank black screen for 500 ms, for a total trial time of 1993 ms (see Fig. 7a). Subjects were instructed to identify the emotion expressed by the actor on every trial in a two-alternative ("happy" or "not happy") forced choice (2AFC) procedure using an Xbox controller, without specific reference to the face or voice. Stimuli were presented using E-Prime 2 Professional (Psychology Software Tools, Inc., United States). The hybrid block design experiment contained a total of 980 trials (20 actors × 7 conditions × 7 faces on a continuum), with pseudo-random VO, FO, and F + V condition blocks presented as shown in Fig. 7b. Blocks were defined by their stimulus type (FO, F + V, VO) and condition (Happy F + V, Angry F + V, Neutral F + V, Happy VO, Angry VO, Neutral VO, FO). All blocks contained 14 trials and were 28 s in duration. Block condition was indicated by the trial type in the majority, and trials were organized pseudorandomly. For the three F + V and VO conditions, 70% of the trials matched the block voice condition, and the remaining 30% was divided equally between the two remaining voice conditions. This was done to prevent subjects from easily predicting the emotions within the blocks.
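The 70/30 block composition described above can be sketched as follows. This is a minimal illustration of the mixing scheme, not the authors' E-Prime implementation; the function name, rounding, and shuffling details are our own assumptions:

```python
import random

def make_block(majority, others, n_trials=14, majority_prop=0.7, rng=None):
    """Build one pseudorandom 14-trial block: majority_prop of the trials
    carry the block's voice condition, and the remainder is split equally
    between the other voice conditions (the 70/30 mix described above)."""
    rng = rng or random.Random(0)
    n_major = round(n_trials * majority_prop)            # 10 of 14 trials
    n_each_other = (n_trials - n_major) // len(others)   # 2 each for 2 others
    trials = [majority] * n_major
    for cond in others:
        trials += [cond] * n_each_other
    rng.shuffle(trials)  # pseudorandom trial order within the block
    return trials

block = make_block("happy", ["angry", "neutral"])
```

With 14 trials, the 70% majority works out to 10 majority trials and 2 trials from each of the two remaining voice conditions.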
Each condition was shown in 10 blocks, for a total of 70 blocks, organized in a pseudorandom order. Face and voice gender were matched in all F + V conditions. For the NIRS scan, two resting periods of 2-minute duration were added at the beginning (block 1) and middle (block 37) of the experiment, during which subjects were instructed to relax while a central fixation cross was shown on the screen, for a total of 72 blocks. The total duration of the NIRS experiment was 36.5 min, and the total duration of the behavioral experiment was 32.5 min.
NIRS instrumentation and preprocessing. Optical data were acquired using a continuous wave NIRScoutX (NIRx Medical Technologies, Los Angeles, CA, USA) NIRS system, which can record from up to 32 multiplexed silicon photodetectors. In our montage, 16 detectors were located over each hemisphere. The optode array contained 28 source positions (light-emitting diodes) operating at two wavelengths (760 and 850 nm). Data were acquired at a sampling frequency of 3.92 Hz. Sources and detectors were manually inserted into special NIRS recording caps (Easycap GmbH, Germany) configured in the standard 10-05 International Electrode system manner (Easycap montage M15). This arrangement distributed sources and detectors approximately 3 cm apart to produce a total of 103 channels (see Fig. 8), in an attempt to maximize coverage of the cortical surface and obtain high-resolution estimates of chromophore concentrations 74. Recordings were analyzed in the spm_fnirs software package for Matlab 75: data were cleaned of motion artifacts (MARA 76), high-pass filtered at 0.01 Hz, and temporally smoothed with a 5.0 s moving window to reduce cardiac and respiration noise. Data were compared using nine contrasts (Angry prosody > Neutral prosody, Happy prosody > Neutral prosody, Angry prosody > Happy prosody, Angry F + V > Neutral F + V, Happy F + V > Neutral F + V, Angry F + V > Happy F + V, all voices > Face only, all face and voices > Face, all F + V > all voices). All NIRS channels were first analyzed using a whole-brain approach, implementing a general linear model design matrix to perform first-level statistics on HbO data. Second-level statistics were performed on all resulting contrasts (p < .05) to reveal significant channels. Multiple comparisons were corrected with the false discovery rate (FDR) set at q = .05.
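The channel-wise FDR correction at q = .05 is commonly implemented as the Benjamini-Hochberg step-up procedure. The following is a minimal sketch of that standard rule, assuming BH is what underlies the correction here (the source does not name the specific FDR variant, and this is not the authors' spm_fnirs code):

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg FDR step-up procedure: returns a boolean mask
    of p-values declared significant at false-discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * q; reject hypotheses 1..k.
    thresh = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresh
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        mask[order[: k + 1]] = True
    return mask

# Example: five hypothetical channel p-values; only the two smallest
# survive at q = .05.
significant = fdr_bh(np.array([0.001, 0.008, 0.04, 0.20, 0.90]))
```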
Behavioral data analysis. Reaction times were analyzed using a two-way, 3 (prosody) × 7 (face) repeated measures ANOVA with Greenhouse-Geisser correction for the prosody factor. Significant main effects and interactions were subsequently examined using Bonferroni-adjusted Fisher LSD post-hoc tests at alpha = .05. Response choices to each face were treated as psychophysical data and expressed as the percentage of "happy" responses for each face. The point of subjective equality (PSE, the 50% angry/happy point) and the just-noticeable difference (JND, half the distance between the 25% and 75% points) were derived from a logistic function fit to the 7 face classifications. Each dependent variable was entered into a separate one-way repeated measures ANOVA with a single factor of prosody to examine the potential bias of voice prosody on face perception.
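The PSE and JND definitions above can be made concrete with a small sketch: fit a two-parameter logistic to the proportion of "happy" responses, take the 50% point as the PSE, and note that for a logistic with slope parameter beta the half-distance between the 25% and 75% points works out to beta·ln 3. This is an illustrative reconstruction under those assumptions, not the authors' analysis code:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, alpha, beta):
    """P('happy') as a function of morph step; alpha is the 50% point
    (the PSE) and beta sets the slope of the psychometric function."""
    return 1.0 / (1.0 + np.exp(-(x - alpha) / beta))

def fit_pse_jnd(steps, p_happy):
    """Fit the 7-step classification data and derive PSE and JND.
    For this logistic, x at 75% is alpha + beta*ln 3 and x at 25% is
    alpha - beta*ln 3, so JND = (x75 - x25) / 2 = beta * ln 3."""
    (alpha, beta), _ = curve_fit(logistic, steps, p_happy, p0=[4.0, 1.0])
    return alpha, beta * np.log(3.0)

steps = np.arange(1, 8)                  # the 7 morph steps
p = logistic(steps, 4.0, 1.0)            # synthetic noiseless observer
pse, jnd = fit_pse_jnd(steps, p)
```

On the synthetic observer above, the fit recovers a PSE of 4.0 (the midpoint face) and a JND of ln 3 ≈ 1.10.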

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.