Neural measures of the role of affective prosody in empathy for pain

Emotional communication often requires integrating affective prosodic and semantic components of speech with the speaker's facial expression. Affective prosody may play a special role by virtue of its dual nature: pre-verbal on one side and accompanying semantic content on the other. This consideration led us to hypothesize that it could act transversely, encompassing a wide temporal window that involves the processing of both the speaker's facial expressions and the semantic content of their speech. This would allow powerful communication in contexts of potential urgency, such as witnessing the speaker's physical pain. Seventeen participants were shown faces preceded by verbal reports of pain. Facial expressions, the intelligibility of the semantic content of the report (i.e., participants' mother tongue vs. a fictional language) and the affective prosody of the report (neutral vs. painful) were manipulated. We monitored event-related potentials (ERPs) time-locked to the onset of the faces as a function of the semantic content intelligibility and affective prosody of the verbal reports. We found that affective prosody may interact with facial expressions and semantic content in two successive temporal windows, supporting its role as a transverse communication cue.

Scientific Reports | (2018) 8:291 | https://doi.org/10.1038/s41598-017-18552-y

…accessibility of the verbal reports expressing pain (i.e., utterances in the mother tongue vs. utterances in a fictional language designed to sound natural; we termed this manipulation "intelligibility": intelligible vs. unintelligible utterances) and the "prosody" of the verbal reports (neutral vs. painful). Of note, the content of intelligible utterances was always painful. The two sets of utterances (in the mother tongue and in the fictional language) were declaimed by a professional actor so that the prosody of each utterance matched between languages. An independent sample of participants judged the intensity of pain conveyed by the prosody of each utterance, confirming that the perception of the pain expressed by the tone of the voice did not differ between the two sets of utterances.
We also collected explicit measures of participants' dispositional empathy (i.e., the Empathy Quotient 25 and the Interpersonal Reactivity Index 22). In line with our previous findings 16, we expected painful facial expressions and intelligible utterances (always with a content of pain) to trigger dissociable empathic reactions in two successive temporal windows. We time-locked the ERP analyses to the presentation of the facial stimuli as a function of the preceding utterances, anticipating that facial expressions would selectively elicit empathic reactions on the P2 and N2-N3 ERP components. Lastly, we expected intelligible utterances (i.e., utterances expressing a painful context in participants' mother tongue) to trigger empathic reactions on the P3 ERP component when compared to unintelligible utterances (i.e., those in a fictional language). On the basis of previous studies, we expected these empathic reactions to manifest as positive shifts of ERPs time-locked to face onset for painful facial expressions and intelligible utterances when compared to neutral conditions 15,16,19,21,24. The current study was specifically designed to unravel the role of affective prosody in inducing an empathic response as an additional cue of others' pain. We hypothesized, and demonstrated, that by virtue of its dual nature affective prosody can be considered cross-domain information able to transversely influence the processing of painful cues triggering experience-sharing (facial expressions; pre-verbal domain) and mentalizing responses (intelligible utterances with a content of pain; verbal domain of processing).
More specifically, we anticipated that affective prosody would affect the neural empathic response to painful facial expressions in the early temporal window linked to experience-sharing (i.e., the P2 and N2-N3 reactions, time-locked to face onset), and the empathic response to painful intelligible utterances in a dissociable, later temporal window associated with mentalizing (i.e., the P3 reaction, again time-locked to the onset of the faces).

Results
Questionnaires. The present sample of participants showed a mean EQ score in the middle empathy range according to the original study 25 , i.e. 46.83 (SD = 7.09). IRI scores were computed by averaging the scores of the items composing each subscale as reported in Table 1.
Behavior. All cues induced higher scores of self-rated empathy in painful conditions relative to neutral conditions. The two-way interaction between facial expression and prosody was significant, F(1,16) = 10.219, p = 0.006, MSe = 0.229, ηp2 = 0.390. Post-hoc t-tests revealed that participants rated their empathy as higher when both facial expression and prosody were painful rather than neutral (t(16) = 10.526, p < 0.000001). Both conditions in which only one of the cues was painful induced significantly higher scores than the condition in which both cues were neutral (min t(16) = 6.468, max p = 0.000008), but scores did not differ between the two conditions in which only one cue was painful (t(16) = 1.905, p = 0.075). The two-way interaction between intelligibility and prosody (F(1,16) = 10.219, p = 0.006, MSe = 0.229, ηp2 = 0.390) indicated that the difference between the ratings assigned to painful and neutral prosody was larger when utterances were in the fictional language than when they were in participants' mother tongue (t(16) = 3.197, p = 0.006; Mdiff = 0.524 [0.177, 0.872]). Empathy for unintelligible utterances reported with either neutral or painful prosody was rated lower than for intelligible utterances pronounced with neutral or painful prosody (min t(16) = 3.252, max p = 0.005). Ratings for intelligible utterances with neutral prosody did not significantly differ from those for unintelligible utterances with painful prosody (t(16) = 1.945, p = 0.07). This pattern could be due to the explicitly painful context expressed by intelligible utterances even when pronounced with neutral prosody. Both unintelligible and intelligible utterances pronounced with painful prosody were rated higher than those pronounced with neutral prosody (min t(16) = 6.320, max p = 0.00001).
The two-way interaction between facial expression and intelligibility (F(1,16) = 5.135, p = 0.038, MSe = 0.183, ηp2 = 0.243) revealed that painful, relative to neutral, faces induced higher self-rated empathy following utterances in participants' mother tongue than following those in the fictional language.

P2. To highlight the effect of prosody, we conducted separate ANOVAs for neutral and painful prosody with facial expression and intelligibility as within-subject factors. The ANOVA conducted for neutral prosody did not reveal any significant effect. At CP, we observed a main effect of intelligibility, F(1,16) = 7.028, p = 0.017, MSe = 3.110, ηp2 = 0.305, i.e., a larger P2 for utterances in the mother tongue than for those in the fictional language, which was further qualified by a three-way interaction between intelligibility, prosody and hemisphere, F(1,16) = 5.224, p = 0.036, MSe = 0.202, ηp2 = 0.246. Again, to highlight the effect of prosody, we conducted separate ANOVAs for neutral and painful prosody with hemisphere and intelligibility as within-subject factors. The ANOVA conducted for neutral prosody revealed a significant interaction between hemisphere and intelligibility, F(1, 16

P3.
Replicating previous findings, on this component we expected to observe a main effect of context. In the current study, this was given by the contrast between utterances in the mother tongue, where the context was always painful, and those in the fictional language, where there was no semantic access to the context (Fig. 4).
Correlational analysis results. The perceptual-cue reaction on the P2 component was qualified as an empathic reaction associated with affective empathy, i.e., experience-sharing, as it significantly correlated with the affective subscale of the IRI, empathic concern (EC), at CP, r = 0.516, p = 0.017 (the correlation was not significant at FC, r = 0.326, p = 0.101), but not with the EQ, max r = −0.116, min p = 0.328. The same reaction marginally correlated with EC on the N2-N3 at CP, r = 0.390, p = 0.061 (at FC, r = 0.315, p = 0.109), but did not correlate with the EQ, max r = 0.099, min p = 0.352.

Discussion
In the current study, we investigated the role of affective prosody in neural empathic responses to physical pain expressed by pre-verbal and verbal cues of pain (i.e., facial expressions and utterances). We orthogonally manipulated facial expressions (neutral vs. painful), the intelligibility of the utterances (intelligible vs. unintelligible, i.e., utterances in the mother tongue vs. utterances in a fictional language) and affective prosody (neutral vs. painful). On each trial, a face stimulus was presented at the centre of the computer screen, either with a neutral or a painful expression; it was preceded at variable intervals by an utterance, either intelligible or unintelligible, pronounced with either a neutral prosody or an affective prosody expressing the speaker's pain. All ERP waveforms were time-locked to the presentation of the face stimuli. Importantly, intelligible utterances (i.e., utterances in the mother tongue) always conveyed a painful content. Our purpose was to monitor ERP empathic responses to others' pain time-locked to the onset of the faces (manifested as a positive shift of ERPs for painful when compared to neutral conditions) as a function of all combinations of pain cues. We were interested in replicating our previous findings 16, in which we demonstrated that, when time-locked to face onset, P2 and N2-N3 empathic responses to others' pain were driven by facial expressions, whereas P3 empathic responses were driven by higher-level cues of pain such as the painful content (i.e., choice of words) of a verbal expression. These two temporal windows are functionally dissociable and very likely reflect the experience-sharing and mentalizing components of empathy, as also supported by source analysis 7,16,21. Most importantly, the main aim of the present investigation was to elucidate how affective prosody influences early and late empathic responses to pain.
We hypothesised that, because of its dual nature (pre-verbal but also accompanying the semantic content of speech), prosody could interact with facial expression within an early temporal window of processing and with semantic content within a later temporal window of processing.
Replicating our previous study 16 , we time-locked ERP analysis to the onset of faces and observed that painful facial expressions modulated P2 and N2-N3 components associated with the experience-sharing response. Painful contexts maximally triggered the P3 response linked to mentalizing, as further corroborated by participants' self-rated empathy and by the pattern of correlational analysis.
Crucially, we observed that painful prosody acted on the pre-verbal domain, enhancing the ERP empathic reaction to painful faces preceded by unintelligible utterances (i.e., in the fictional language) within the time window associated with experience-sharing, including the P2 component. Painful prosody also acted on the verbal domain, enhancing the P3 empathic reaction to painful semantic content, linked to mentalizing mechanisms. This enhancement of the empathic reaction to painful facial expressions by painful prosodic information was absent within the N2-N3 temporal window. N2-N3 amplitude to neutral facial expressions preceded by utterances with painful prosody was significantly less positive than that elicited by neutral facial expressions preceded by utterances with neutral prosody. This pattern is opposite to what is usually observed in ERP studies on empathy and may suggest that the incongruence between prosody and facial expression interfered with the elicitation of an empathic response. Nevertheless, this further observation strongly corroborates the view that prosodic and facial-expression information may interact within this earlier temporal window, including the P2 and N2-N3. Notably, a similar interference with the elicitation of the neural empathic response was observed on the P3 component under conditions in which unintelligible utterances were pronounced with a painful prosody. This finding is particularly interesting when contrasted with the enhancement of the empathic response that we observed for intelligible utterances pronounced with a painful prosody. This pattern seems to suggest that prosodic information may magnify a higher-level empathic response linked to language (and to mentalizing) only when it is associated with a semantic content.
This pattern of neural responses translated into higher self-rated empathy when utterances were pronounced with a painful rather than a neutral prosody, with the highest scores recorded when both facial expression and prosody were painful, and for intelligible, relative to unintelligible, utterances reported with painful prosody when compared to the other combinations.
Taken together, these findings are consistent with studies on the on-line processing of prosodic information showing that vocal emotion recognition, i.e. prosody, can occur pre-attentively and automatically in the time range including the Mismatch Negativity (MMN 38) and the P2 39,40. The MMN has been shown to peak at about 200 ms in an oddball task where standard and deviant stimuli were emotionally and neutrally spoken syllables. The differential MMN response to such a comparison, larger for emotional than neutral stimuli, can therefore be taken as an index of the human ability to automatically derive emotional significance from auditory information even when it is irrelevant to the task.

Summary of main effects and interactions on each ERP component:

Facial expression — P2: yes, at both FC and CP pools and in both hemispheres, max p = 0.013, ηp2 = 0.331; painful facial expressions elicited a more positive P2 than neutral expressions.

Intelligibility — P2: yes, confined to the CP pool, max p = 0.017, ηp2 = 0.305; larger P2 for utterances in the mother tongue than for those in the fictional language; further qualified by the three-way interaction (see "Interactions" row). N2-N3: no, p > 0.05. P3: yes, at both FC and CP pools, max p = 0.008, ηp2 = 0.364; intelligible utterances elicited a larger P3 than utterances in the fictional language; further qualified by the two-way interaction [2] (see "Interactions" row).

Interactions — [1] At FC: Hemisphere × Facial expression, p = 0.025, ηp2 = 0.276; further qualified by post-hoc comparisons (see [3] in the bottom panel). [2] At CP: Intelligibility × Prosody, max p = 0.005, ηp2 = 0.397; further qualified by post-hoc comparisons (see [4] in the bottom panel).

The modulations of the P2 have been related to the salience of the stimulus conveying emotional content 39. Importantly, modulations of the centro-parietal P2 can also reflect the processing of information that is important in a specific context: the P2 is also modulated by individual characteristics of participants and by experimentally induced knowledge about categories of physically equivalent visual stimuli in the context of empathy for pain 23. In line with Schirmer and Kotz 40, the evaluation of prosody encompasses a later, verbal stage of processing related to context evaluation and semantic integration with earlier pre-verbal, bottom-up prosodic cues. When participants are required to detect an emotional change from vocal cues that can convey either prosodic or semantic information, ERP studies have shown that such emotional change detection is reflected in a larger P3 41,42 when compared to non-violation conditions. Findings in the context of emotional change detection with high ecological validity 41 can also help explain the late modulations of the P3 as a function of bottom-up processes, such as the processing of facial expressions, observed in the present study.
Although the present investigation considered neural responses time-locked to faces onset as a function of facial expression, accessibility to semantic content of pain (i.e., intelligibility) and prosody, we propose that on-line processing of prosodic information (as in the studies described above) and off-line processing of prosodic information (as in our study) could induce very similar ERP modulations encompassing temporal-windows linked to pre-verbal and verbal domains.
Interestingly, affective prosody also showed interactive effects with the intelligibility of the utterances in a very early time window, i.e. on the P2 (neutral faces preceded by utterances in the mother tongue with painful prosody induced a larger P2 reaction than neutral faces preceded by utterances in the fictional language with painful prosody), and with facial expression in the latest time window, i.e. on the P3, confined to the right hemisphere at centro-parietal sites (painful facial expressions elicited a larger P3 than neutral facial expressions when preceded by utterances with painful prosody, independently of their intelligibility). Within this framework, the affective prosody of pain has a distinct role in enhancing neural empathic reactions: on one side, by favouring the processing of congruent facial expressions of pain beyond the time window linked to experience-sharing, thereby favouring mentalizing processes on those faces; on the other side, by favouring earlier empathic reactions linked to experience-sharing to those neutral facial expressions that were preceded by utterances with a content of pain (i.e., intelligible utterances).
Importantly, as in our previous work 16, we did not find evidence of an interaction between facial expression and intelligibility in either the earlier or the later time window. Remarkably, despite the higher ecological validity of the present stimuli compared to our previous work, in which facial expressions were preceded by written sentences in the third person (e.g., "This person got their finger hammered"), facial expression and intelligibility never interacted within either time window, indicating that the pre-verbal and verbal domains of processing contribute distinctively to the occurrence of the empathic response.
This whole pattern of results dovetails nicely with the established view that affective prosody processing is a phylogenetically and ontogenetically ancient pre-verbal ability that develops alongside intelligibility abilities. Similarly, it has been suggested that the affective and cognitive components of empathy, i.e. experience-sharing and mentalizing, might have evolved along two different evolutionary trajectories, with experience-sharing being phylogenetically older than mentalizing 43-45. Explicit inference about others' inner states is believed to be a higher-order cognitive ability shared only by apes and humans 46,47, and its selection might be associated with the increasing complexity of social interactions due to exchanges between groups 48.

Conclusions
In the present study we provided evidence that affective prosody is a powerful communication signal of others' pain that, by virtue of its dual nature, has conserved its evolutionary value throughout human cognitive development. It enhances young adult humans' explicit ability to share others' pain by acting transversely on empathy systems in two successive temporal windows. From a broader perspective, these findings may explain how harmonious interactions can survive partial or degraded information (i.e., when the speaker's words are not understandable or their facial expression is not visible) and allow powerful communication in contexts of immediate necessity, for instance in the case of others' physical injuries.

Methods
Participants. Prior to data collection, we aimed to include 15-20 participants in the ERP analyses, as this is suggested to be an appropriate sample size in this field 15,19. Data were collected from twenty-seven volunteers (10 males) from the University of Padova. Data from ten participants were discarded from the analyses due to excessive electrophysiological artifacts, resulting in a final sample of seventeen participants (5 males; mean age: 24.29 years, SD = 3.72; three left-handed). Using G*Power 3.1 49 for a 3 × 2 × 2 × 2 × 2 × 2 repeated-measures design, we calculated that, for 95% power given the smallest effect size we observed, 14 was an adequate sample size. These analyses were conducted only after data collection was complete. All participants reported normal or corrected-to-normal vision, normal hearing and no history of neurological disorders. Written informed consent was obtained from all participants. The experiment was performed in accordance with relevant guidelines and regulations, and the protocol was approved by the Ethical Committee of the University of Padova.
Stimuli. Stimuli were sixteen Caucasian male faces, with either a neutral or a painful expression 19. The sentences were uttered by a professional Italian actor and presented through a central speaker at an average level of 52.5 dB. Eight utterances were in participants' mother tongue (i.e., Italian), and each of them described a painful situation reported in the first person. Eight utterances were unintelligible (i.e., in the fictional language). Critically, each sentence was uttered with both a neutral and a painful prosody (i.e., the prosodic cue). The Italian utterances were comparable in syntactic complexity, i.e., noun phrase + verb phrase (e.g., "I hurt myself with a knife"). The utterances in the fictional language were matched to the Italian utterances for length and prosody.
To confirm that intelligibility did not affect prosody and vice versa, we tested 20 subjects in a rating task. In two separate, counterbalanced blocks, subjects were asked to report (on a 7-point Likert scale) the intensity of the pain conveyed and how conceptually understandable the utterances were. We found no significant difference in pain ratings of the prosody (i.e., the tone of the voice) between intelligible and unintelligible utterances (t = 1.59, p = 0.11). Further, there was no significant difference in the intelligibility of the sentences between painful and non-painful prosody (t = −1.01, p = 0.31). Finally, we tested whether the painful prosody was actually perceived as more intense than the neutral one, finding a significant difference (t = −54.38, p < 0.001). Participants were exposed to an orthogonal combination of the 16 faces and the 16 sentences, each uttered with both neutral and painful prosody. Stimuli were presented using E-Prime on a 17-in cathode-ray-tube monitor with a resolution of 600 × 800 and a refresh rate of 75 Hz.
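The validation of the utterance set rests on paired comparisons of the raters' scores. A minimal sketch of that kind of check, using simulated ratings (the actual data came from the independent sample of 20 raters and are not reproduced here):

```python
# Hypothetical sketch of the prosody validation: a paired t-test on
# per-rater mean pain-intensity ratings (7-point Likert) for the same
# painful prosody declaimed in the mother tongue vs. the fictional
# language. All values below are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_raters = 20
ratings_intelligible = rng.normal(5.5, 0.6, n_raters)
ratings_unintelligible = ratings_intelligible + rng.normal(0.0, 0.4, n_raters)

t, p = stats.ttest_rel(ratings_intelligible, ratings_unintelligible)
print(f"paired t({n_raters - 1}) = {t:.2f}, p = {p:.3f}")
```

A non-significant p value here would support the claim that pain conveyed by the tone of voice did not differ between the two utterance sets.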

Experimental design.
We implemented a variant of the pain decision task 19. Each trial began with a central fixation cross (600 ms), followed by the utterance (i.e., the semantic and prosodic cues; 4000 ms). After a blank interval (800-1600 ms, jittered in steps of 100 ms), the face (i.e., the perceptual cue) was displayed for 250 ms (Fig. 6).
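The jittered blank interval can be sketched as follows; the paper states only the range (800-1600 ms) and the 100-ms step, so drawing uniformly from that grid is an assumption about the sampling scheme:

```python
# Sketch of the jittered utterance-to-face interval: 800-1600 ms in
# 100-ms steps, one interval drawn per trial (uniform sampling is an
# assumption; the paper specifies only range and step).
import numpy as np

rng = np.random.default_rng(42)
possible_isis = np.arange(800, 1700, 100)   # 800, 900, ..., 1600 ms
isis = rng.choice(possible_isis, size=320)  # one interval per trial
print(isis.min(), isis.max())
```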
Participants were told that on each trial they would hear a voice reporting potentially important information for understanding what the person displayed immediately afterwards was feeling. Their task was to decide whether the face had a neutral or a painful expression by pressing one of two counterbalanced response keys. At the end of each trial, they were required to self-rate, on a 7-point Likert scale, their empathy for each face considering the preceding utterance. Following a brief practice session, participants performed 320 trials in 5 blocks in which all conditions were randomly intermixed. The EEG was recorded while participants executed the pain decision task. At the end of the recording session, participants were administered self-report questionnaires of dispositional empathy: the Italian version of the Empathy Quotient (EQ 25,50) and the Italian version of the Interpersonal Reactivity Index (IRI 22). The EQ has been mainly linked to cognitive aspects of empathy. The IRI is composed of four subscales measuring both affective and cognitive aspects of empathy (empathic concern, EC, and personal distress, PD; perspective taking, PT, and fantasy, FS, respectively).
Electrophysiological recording and analyses. The EEG was recorded from 64 active electrodes placed on an elastic Acti-Cap according to the international 10/20 system, referenced to the left earlobe, and re-referenced offline to the average of the left and right earlobes. Horizontal EOG was recorded bipolarly from two external electrodes positioned lateral to the external canthi. Vertical EOG was recorded from Fp1 and one external electrode placed below the left eye. Electrode impedance was kept below 10 kΩ. EEG and EOG signals were amplified and digitized at a sampling rate of 250 Hz (pass band 0.01-80 Hz). The EEG was segmented into 1200-ms epochs starting 200 ms prior to the onset of the faces. The epochs were baseline-corrected to the mean activity during the 200-ms pre-stimulus period. Trials associated with incorrect responses, or contaminated by horizontal and vertical eye movements or other artifacts (exceeding ±60 μV and ±80 μV, respectively), were discarded from the analysis. We kept participants who retained at least 20 trials in each condition. The final range of trials was 21-40, and only 3 participants had fewer than 25 trials in at least one condition. Separate average waveforms for each condition were then generated, time-locked to the presentation of the faces as a function of the preceding utterances. Statistical analyses of ERP mean amplitudes focused on the P2 (125-170 ms), N2-N3 (180-380 ms) and P3 (400-900 ms). The selection of a single temporal window including the N2 and N3 components was mainly based on our previous studies 16.
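The epoching, baseline-correction and artifact-rejection steps described above can be sketched in a few lines of NumPy; the data are synthetic and a real pipeline would typically rely on a dedicated package such as MNE-Python:

```python
# Minimal sketch of the epoching pipeline: 1200-ms epochs at 250 Hz
# starting 200 ms before face onset, baseline correction on the 200-ms
# pre-stimulus window, rejection of trials whose EOG exceeds +/-60 uV
# or whose EEG exceeds +/-80 uV. Synthetic data for illustration only.
import numpy as np

FS = 250                      # sampling rate (Hz)
N_SAMPLES = int(1.2 * FS)     # 1200-ms epoch
BASELINE = int(0.2 * FS)      # first 200 ms are pre-stimulus

rng = np.random.default_rng(1)
epochs = rng.normal(0, 10, (100, 64, N_SAMPLES))  # trials x channels x time
eog = rng.normal(0, 20, (100, N_SAMPLES))         # one bipolar EOG channel

# Baseline correction: subtract mean pre-stimulus activity per trial/channel
epochs -= epochs[:, :, :BASELINE].mean(axis=2, keepdims=True)

# Artifact rejection using the thresholds from the Methods section
keep = (np.abs(eog).max(axis=1) <= 60) & (np.abs(epochs).max(axis=(1, 2)) <= 80)
clean = epochs[keep]
print(f"kept {clean.shape[0]} of {epochs.shape[0]} trials")
```

Condition-wise averaging of `clean` would then yield the per-participant waveforms entering the P2, N2-N3 and P3 analyses.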

Statistical analysis. Pain Decision Task.
Reaction times (RTs) exceeding each individual's mean RT in a given condition by ±2.5 SD, and RTs associated with incorrect responses, were excluded from the analyses. RTs and mean proportions of correct responses were submitted to repeated-measures ANOVAs including facial expression (neutral vs. painful), intelligibility (mother tongue, i.e., Italian, vs. fictional language) and prosody (neutral vs. painful) as within-subject factors. ANOVAs carried out on the mean amplitude values of each ERP component also included the within-subject factor hemisphere (right vs. left) and were carried out separately for FC and CP.
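The RT trimming rule can be sketched as follows; whether the mean and SD are computed before or after dropping incorrect trials is not specified in the text, so computing them over all trials is an assumption of this sketch:

```python
# Sketch of the RT exclusion rule: per participant and condition, drop
# incorrect trials and RTs beyond mean +/- 2.5 SD. Synthetic data.
import numpy as np

def trim_rts(rts, correct, n_sd=2.5):
    """Return RTs from correct trials within mean +/- n_sd * SD."""
    rts, correct = np.asarray(rts, float), np.asarray(correct, bool)
    m, sd = rts.mean(), rts.std(ddof=1)
    keep = correct & (np.abs(rts - m) <= n_sd * sd)
    return rts[keep]

rng = np.random.default_rng(7)
rts = rng.normal(600, 80, 40)      # one condition, one participant (ms)
rts[0] = 2000                      # an obvious outlier
correct = np.ones(40, bool)
clean = trim_rts(rts, correct)
print(len(clean), len(rts))
```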
The significance threshold for all statistical analyses was set at 0.05. Exact p values, mean squared errors (MSe) and effect sizes (partial eta-squared, ηp2) are reported. Confidence intervals (CIs, set at 95%, in square brackets) are provided only for paired t-tests and refer to differences of means (Mdiff). Planned comparisons relevant to testing the hypotheses of the present experiment are reported. Bonferroni correction was applied for multiple comparisons.
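The Bonferroni correction applied to the post-hoc comparisons amounts to multiplying each raw p value by the number of comparisons (capped at 1), or equivalently testing each against alpha divided by that number. A small sketch with made-up p values:

```python
# Bonferroni correction: adjusted p = min(p * m, 1), or compare raw p
# against alpha / m. The raw p values below are made up for illustration.
raw_p = [0.006, 0.02, 0.075]
m = len(raw_p)
adjusted = [min(p * m, 1.0) for p in raw_p]
significant = [p < 0.05 / m for p in raw_p]  # equivalent decision rule
print(adjusted, significant)
```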
Correlational analysis. With the aim of further qualifying neural responses as experience-sharing or mentalizing responses, we correlated ERP empathic reactions (i.e., painful minus neutral conditions) with participants' dispositional empathy as measured by the IRI and the EQ. More specifically, the painful-minus-neutral score was computed for both the pre-verbal and the verbal domain of processing. A perceptual-cue reaction was computed for the pre-verbal domain by subtracting the ERP to neutral faces preceded by utterances with neutral prosody from the ERP to painful faces preceded by utterances with neutral prosody, regardless of intelligibility and hemisphere. A semantic-cue reaction was computed for the verbal domain by subtracting the ERP to faces preceded by utterances in the fictional language from the ERP to faces preceded by Italian utterances, regardless of facial expression, prosody and hemisphere. For both reactions, positive values indexed an empathic reaction.
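The difference-score-and-correlation logic above can be sketched in a few lines; all amplitudes and questionnaire scores below are simulated, and the empathic-concern subscale is used as the example trait measure:

```python
# Sketch of the correlational analysis: compute each participant's
# painful-minus-neutral ERP reaction and correlate it with a
# dispositional-empathy score (e.g., IRI empathic concern). Simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 17                                   # sample size, as in the study
erp_painful = rng.normal(2.0, 1.0, n)    # mean amplitude, painful condition
erp_neutral = rng.normal(1.0, 1.0, n)    # mean amplitude, neutral condition
ec_scores = rng.normal(20, 4, n)         # e.g., IRI empathic-concern score

reaction = erp_painful - erp_neutral     # positive = empathic reaction
r, p = stats.pearsonr(reaction, ec_scores)
print(f"r = {r:.3f}, p = {p:.3f}")
```

A positive, significant r on an early component would support an experience-sharing interpretation, mirroring the P2/EC correlation reported in the Results.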

Data Availability
The datasets generated and/or analysed during the current study are not publicly available because we did not obtain consent from participants for data publication, but they are available from the corresponding author on reasonable request.