Inter-brain synchronization during coordination of speech rhythm in human-to-human social interaction

Kawasaki, Masahiro; Yamada, Yohei; Ushiku, Yosuke; Miyauchi, Eri; Yamaguchi, Yoko

doi:10.1038/srep01692


Article
Open access
Published: 22 April 2013

Scientific Reports volume 3, Article number: 1692 (2013)

Abstract

Behavioral rhythms synchronize between humans for communication; however, the relationship of brain rhythm synchronization during speech rhythm synchronization between individuals remains unclear. Here, we conducted alternating speech tasks in which two subjects alternately pronounced letters of the alphabet during hyperscanning electroencephalography. Twenty pairs of subjects performed the task before and after each subject individually performed the task with a machine that pronounced letters at almost constant intervals. Speech rhythms were more likely to become synchronized in human–human tasks than human–machine tasks. Moreover, theta/alpha (6–12 Hz) amplitudes synchronized in the same temporal and lateral-parietal regions in each pair. Behavioral and inter-brain synchronizations were enhanced after human–machine tasks. These results indicate that inter-brain synchronizations are tightly linked to speech synchronizations between subjects. Furthermore, theta/alpha inter-brain synchronizations were also found in subjects while they observed human–machine tasks, which suggests that the inter-brain synchronization might reflect empathy for others' speech rhythms.

Introduction

Individual human behavioral rhythms in nature are independent but can be spontaneously synchronized and entrained to become a shared rhythm through interactions with others (i.e., social interactions) by both verbal and nonverbal communication^1,2. We experience daily synchronizations with others, such as with hand clapping or foot tapping and incidental coordination of speech frequencies in conversations^3,4,5,6. This unconscious, shared rhythm brings individuals close to each other, generates empathy and coordinates performance^7,8. Indeed, mother-infant rhythmic coupling through the imitation of movement supports social and cognitive development, such as language acquisition and learning⁹.

Coordinated rhythms have been observed in the brain not only as local- and distant-regional synchronizations within an individual brain^10,11 but also as inter-brain synchronizations between individuals¹². Human brain rhythms in specific electroencephalography (EEG) frequency bands have increased not only while performing synchronized behaviors but also while observing such behaviors^13,14. These brain activities are related to the understanding of others' intentions (i.e., social cognition), in this case, the understanding of others' behavioral rhythms¹⁵. Moreover, recent EEG studies that scanned multiple brains at the same time have shown phase synchronizations between individuals during the imitation of hand movements^16,17,18,19 or with participation in cooperative games and actions^20,21,22. Synchronized changes between individual brains have also been reported in functional magnetic resonance imaging (fMRI)^23,24,25,26 and functional near-infrared spectroscopy (fNIRS)^27,28 studies.

Although there is considerable evidence of behavioral synchronization and brain synchronization between individuals during social interactions through nonverbal communication, these relationships are not as clear during verbal communication. One EEG hyperscanning study revealed correlated brain activities between the speakers and listeners during verbal communication³¹, but this study did not address the behavioral rhythm (i.e., the speech rhythm) itself. Furthermore, it is poorly understood whether such synchronizations include both simultaneous common movement and turn-taking, the latter of which is typically non-simultaneous and requires unconscious alternation of behavior, such as between speakers during a conversation²⁹. Turn-taking requires interpersonal synchronization of speech rhythms, including timing, duration, interval and speed of speech, along with the content and context of the conversation itself³⁰.

In the current study, we examined whether inter-brain synchronizations appear in the EEG when speech rhythms are synchronized between two subjects in verbal communication. We conducted alternating speech tasks in which two subjects alternately and sequentially pronounced the alphabet during hyperscanning EEG recordings (human–human, Fig. 1A). Behavioral and brain synchronizations between the subjects were evaluated in terms of the correlation of speech rhythms (duration and interval of pronunciation) and brain rhythms (EEG oscillatory amplitudes in specific frequency bands), respectively.

In addition, each subject participated in alternating speech tasks with a machine (human–machine, Fig. 1A) and these results were compared with the results of the human–human tasks to address the following questions: (1) How does inter-brain synchronization change after the two subjects' behavioral rhythms have been coordinated to common rhythms (i.e., the machine's rhythms) and (2) how does inter-brain synchronization between a subject performing and a subject observing the task with a machine (social cognition) differ from the inter-brain synchronization between subjects in the human condition (social interaction)?

Results

Speech rhythms: voice sounds

Each subject performed two types of alternating speech tasks: human–human tasks and human–machine tasks (Fig. 1A). They completed 14 sessions consisting of two pre-machine human–human sessions, 10 successive human–machine sessions [five voices (electronic, male, female, partner's and subject's) at two paces (fixed and random)] and two post-machine human–human sessions (Fig. 1B). We recorded the sounds of human–human alternating speech tasks. We dissociated the durations of voices and intervals between the voices of the pair of subjects using short-time Fourier transformations (Fig. 2A).
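The separation of voice durations from the intervals between voices can be illustrated with a minimal sketch. It is not the authors' pipeline: the sampling rate, frame length, frequency band, threshold and the synthetic two-burst signal are all assumptions chosen for illustration, and the sketch does not attempt to decide which of the two subjects is speaking.

```python
import cmath
import math

FS = 2000        # assumed audio sampling rate in Hz (hypothetical)
FRAME_LEN = 40   # 20 ms non-overlapping frames (hypothetical)

def stft_band_power(signal, frame_len, lo_bin, hi_bin):
    """Short-time Fourier power summed over DFT bins lo_bin..hi_bin, one value per frame."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        total = 0.0
        for k in range(lo_bin, hi_bin + 1):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            total += abs(coeff) ** 2
        powers.append(total)
    return powers

def voiced_segments(powers, threshold):
    """Runs of consecutive frames whose band power exceeds the threshold, as (start, end) frame indices."""
    segments, start = [], None
    for i, p in enumerate(powers):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(powers)))
    return segments

def durations_and_intervals(segments, frame_sec):
    """Voice durations and the silent intervals between successive voices, in seconds."""
    durations = [(end - start) * frame_sec for start, end in segments]
    intervals = [(segments[i + 1][0] - segments[i][1]) * frame_sec
                 for i in range(len(segments) - 1)]
    return durations, intervals

def tone(n_samples):
    """Synthetic 200 Hz 'voice' burst standing in for a pronounced letter."""
    return [math.sin(2 * math.pi * 200 * n / FS) for n in range(n_samples)]

# Synthetic recording: 0.1 s silence, 0.3 s voice, 0.2 s silence, 0.2 s voice, 0.1 s silence.
signal = [0.0] * 200 + tone(600) + [0.0] * 400 + tone(400) + [0.0] * 200
powers = stft_band_power(signal, FRAME_LEN, lo_bin=2, hi_bin=8)  # 100-400 Hz band
segments = voiced_segments(powers, threshold=1.0)
durations, intervals = durations_and_intervals(segments, FRAME_LEN / FS)
```

On real recordings the threshold and frequency band would need to be tuned to the microphone and speakers, and overlapping windowed frames would give finer time resolution.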

During the human–human tasks, the durations and intervals of speech were correlated between the pair of subjects and not significantly different among the subjects (Fig. 3C, D). Notably, one-factor repeated-measures ANOVA (pre-machine vs. post-machine) showed that, compared with the pre-machine tasks, the pair-averaged correlations for the post-machine tasks were significantly higher (Fig. 3D; duration, F_{1, 104} = 4.50, P = 0.036; interval, F_{1, 104} = 6.19, P = 0.014) and the differences were significantly lower (Fig. 3E; duration, F_{1, 104} = 5.49, P = 0.021; interval, F_{1, 104} = 9.30, P = 0.003). Pre-machine, the durations and intervals were not correlated between the two subjects in six pairs. Moreover, the durations were significantly different between the two subjects in four pairs and the intervals were significantly different between the two subjects in three pairs. Post-machine, 17 of 18 pairs showed significant correlations in the duration and interval of the speech (example shown in Fig. 3B, C).

For human–machine tasks, the correlations between subject pairs were lower and the differences between subject pairs were higher than those for the human–human tasks [Fig. 3C, D; correlation of duration, F_{1, 338} = 58.36, P < 0.001; correlation of interval, F_{1, 338} = 127.01, P < 0.001; difference of duration, F_{1, 338} = 8.09, P = 0.004; difference of interval, F_{1, 338} = 9.04, P = 0.002; one-factor repeated-measures ANOVA (human vs. machine conditions)].
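The two behavioral measures used above, the correlation and the difference of speech rhythms between partners, can be sketched as follows. The Pearson coefficient is a standard choice for the correlation; taking the "difference" as a mean absolute difference is an assumption, since the exact definition is not given in this section. The duration values are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_abs_difference(x, y):
    """Mean absolute difference between paired values (assumed definition of 'difference')."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical voice durations (seconds) of two partners over five successive turns.
durations_a = [0.30, 0.32, 0.35, 0.33, 0.31]
durations_b = [0.31, 0.33, 0.36, 0.35, 0.30]

r_duration = pearson_r(durations_a, durations_b)
diff_duration = mean_abs_difference(durations_a, durations_b)
```

A high `r_duration` together with a small `diff_duration` corresponds to what the text calls synchronized speech rhythms. Note that the correlation is undefined when one sequence is perfectly constant, which is relevant for a partner with strictly fixed timing.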

Brain rhythms: EEG oscillations

To characterize brain oscillatory activity, we conducted wavelet analysis on the collected EEG data^28,29. The subject-averaged amplitudes of each frequency (ranging from 4 to 28 Hz) for performing the alternating speech tasks in both human–human and human–machine conditions were significantly higher than those for subjects who were observing the human–machine tasks [Fig. 4A; P < 0.01; one-factor repeated-measures ANOVA (human vs. machine conditions)]. Frequency ranges (4–28 Hz) were divided into two parts: theta/alpha bands (6–12 Hz), which showed higher amplitudes for human than machine conditions and beta bands (20–28 Hz), which showed the opposite (machine > human).
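As a rough illustration of how a wavelet analysis yields an amplitude time course at one frequency, the sketch below convolves a signal with a complex Morlet wavelet. The number of cycles (7), the truncation at three standard deviations, the 200 Hz sampling rate and the synthetic 10 Hz test signal are assumptions; the parameters actually used in the study are not stated in this section.

```python
import cmath
import math

def morlet_amplitude(signal, fs, freq, n_cycles=7.0):
    """Amplitude time course at `freq` from a complex Morlet wavelet (pure-Python sketch)."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # temporal width of the Gaussian envelope
    half = int(round(3.0 * sigma_t * fs))         # truncate the wavelet at +/- 3 sigma
    offsets = range(-half, half + 1)
    envelope = [math.exp(-((k / fs) ** 2) / (2.0 * sigma_t ** 2)) for k in offsets]
    norm = sum(envelope)
    # Scaling by 2 / norm makes a unit-amplitude sinusoid at `freq` return an amplitude near 1.
    kernel = [2.0 * g / norm * cmath.exp(-2j * math.pi * freq * k / fs)
              for g, k in zip(envelope, offsets)]
    n = len(signal)
    amplitudes = []
    for i in range(n):
        acc = 0j
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:          # samples outside the recording are treated as zero
                acc += signal[j] * w
        amplitudes.append(abs(acc))
    return amplitudes

FS = 200  # assumed EEG sampling rate in Hz (hypothetical)
# Synthetic 2 s "EEG" containing only a 10 Hz oscillation of unit amplitude.
eeg = [math.sin(2 * math.pi * 10 * n / FS) for n in range(2 * FS)]

amp_10hz = morlet_amplitude(eeg, FS, 10.0)   # inside the theta/alpha band (6-12 Hz)
amp_24hz = morlet_amplitude(eeg, FS, 24.0)   # inside the beta band (20-28 Hz)
```

Away from the edges of the recording, `amp_10hz` stays close to 1 while `amp_24hz` stays close to 0, which is the sense in which the transform separates band-specific amplitudes. A band amplitude such as theta/alpha could then be taken as an average over frequencies from 6 to 12 Hz.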

The theta/alpha band amplitudes increased in the frontal regions during both human–human and human–machine tasks and extended to the central and parietal regions during human–human tasks (Fig. 4B). The theta/alpha amplitudes during human–human tasks were significantly higher than those during the human–machine tasks in the central and parietal regions (electrodes C3, T7, CP1 and CP2). For the human–human tasks, the post-machine activities were significantly higher than pre-machine activities in the Fp2, C3 and P4 electrodes.

Similar to the theta/alpha amplitudes, the beta band amplitudes showed enhancement in the frontal regions but extended to the temporal and occipital regions under both human and machine conditions. However, there was no significant difference between the conditions for any electrode. Moreover, the beta activities showed no significant difference between the post-machine and pre-machine human–human tasks.

To investigate inter-brain synchronization during alternating speech, we conducted cross-correlation analyses of the theta/alpha amplitudes from each pair of subjects' EEG data because these amplitudes differed significantly between the human–human and human–machine tasks. An example of the time course of the theta/alpha amplitude and the cross-correlation coefficients for one subject pair during a human–human alternating speech task is shown in Fig. 5A (left). In the pair-averaged cross-correlations of the theta/alpha amplitudes, computed over both the pre- and post-machine conditions, high-peaked correlations were distributed in the temporal regions (electrodes F7, FC5, T7, T8, CP1 and CP2) in both conditions (Fig. 5B). For most of the electrodes, these values were significantly higher for the post-machine conditions [Fig. 5D; P < 0.05; one-factor repeated-measures ANOVA (pre-machine vs. post-machine) with post-hoc analyses (Bonferroni correction)].
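A cross-correlation analysis of two amplitude time courses can be sketched as a Pearson correlation evaluated at a range of time lags. The lag range, the lag sign convention and the synthetic signals below are assumptions for illustration; the section does not specify the lags or windows used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for each lag in -max_lag..max_lag.

    A positive peak lag means that y follows x by that many samples.
    """
    n = len(x)
    coefficients = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        coefficients[lag] = pearson_r(a, b)
    return coefficients

def envelope(i):
    """Synthetic slowly varying amplitude, standing in for a theta/alpha time course."""
    return 1.0 + math.sin(0.3 * i) + 0.5 * math.sin(0.07 * i)

# Subject B's amplitude is a copy of subject A's, delayed by three samples.
amp_a = [envelope(i) for i in range(200)]
amp_b = [envelope(i - 3) for i in range(200)]

cc = cross_correlation(amp_a, amp_b, max_lag=10)
peak_lag = max(cc, key=cc.get)
peak_value = cc[peak_lag]
```

Here the peak coefficient recovers the three-sample delay built into the synthetic data. In a turn-taking task, a peak at a nonzero lag would be expected if one partner's amplitude changes follow the other's.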

The theta/alpha cross-correlation coefficients for the temporal and parietal electrodes were positively correlated with the behavioral synchronization of each pair; in other words, pairs with higher inter-brain correlations also showed higher between-subject correlations of speech duration and, as a trend, of speech interval (Fig. 5E; electrode measuring the peak statistic value, T7; correlation of duration, r = 0.49, P = 0.046; correlation of interval, r = 0.44, P = 0.071).

Finally, we investigated theta/alpha inter-brain synchronization between the subjects who participated in alternating speech tasks with a machine and the subjects who observed the human–machine tasks (an example is shown in Fig. 5A, right). High-peaked correlations partially overlapped in the temporal regions, the same area in which significant correlations were seen in subjects during human–human tasks (compare Fig. 5B and Fig. 5C; electrodes F7, T7, CP1 and CP2). However, the theta/alpha cross-correlation coefficients for the temporal regions were not significantly correlated with the individual behavioral synchronizations between subjects (Fig. 5E; electrode measuring the peak statistic value, T7; correlation of duration, r = −0.25, P = 0.320; correlation of interval, r = −0.14, P = 0.591).

Subjective ratings

After each alternating speech session, the subjects were asked to rate the following factors using a 5-point scale: “comfort,” “synchrony,” “speed,” “initiative,” and “humanity” (only for the human–machine tasks). For all of the factors except for humanity, ratings for the human partners were significantly higher than those for the machine partner [comfort, F_{1, 502} = 82.54, P < 0.001; synchrony, F_{1, 502} = 80.75, P < 0.001; speed, F_{1, 502} = 7.59, P = 0.006; initiative, F_{1, 502} = 92.30, P < 0.001; one-factor repeated-measures ANOVA (human vs. machine conditions)]. These results suggested that the subjects felt more comfortable, had better and faster synchronization and had a higher initiative when performing alternating speech with a human partner than with the machine partner.

For the human–human tasks, post-machine ratings were significantly higher than pre-machine ratings for comfort (F_{1, 142} = 30.71, P < 0.001), synchrony (F_{1, 142} = 4.45, P = 0.036) and speed (F_{1, 142} = 17.65, P < 0.001), whereas the post-machine initiative was rated slightly, but not significantly, lower (in favor of the partner's initiative rather than the subject's own initiative; F_{1, 142} = 2.47, P = 0.118).

Effects of the machine's voice

Next, we examined the effects of machine voice type on the behavioral and EEG rhythms in human subjects participating in human–machine alternating speech tasks. In the analyses of speech rhythms (i.e., the correlations and differences of the durations and intervals), we conducted two-factor repeated-measures ANOVA for voice types and paces. For correlation of voices, the ANOVA showed a main effect for voice type (correlation of duration, F_{4, 222} = 3.77, P = 0.005; correlation of interval, F_{4, 222} = 13.00, P < 0.001) but no effect for pace (correlation of duration, F_{1, 222} = 0.25, P = 0.62; correlation of interval, F_{1, 222} = 0.29, P = 0.59). In contrast, for the difference in the voices, the ANOVA showed no effect of voice (difference of duration, F_{4, 222} = 2.23, P = 0.066; difference of interval, F_{4, 222} = 2.35, P = 0.055) or pace (difference of duration, F_{1, 222} = 0.01, P = 0.920; difference of interval, F_{1, 222} = 0.04, P = 0.841). Post-hoc analyses (Bonferroni correction) showed significant differences between voice types (Fig. 6A; P < 0.05; correlation of duration, male and female < electronic; difference of duration, self < male; correlation of interval, male < other voices; difference of interval, female and self < electronic and male).

The EEG analyses revealed theta/alpha activities that showed a main effect of voice but no effect of pace in some of the electrodes using a two-factor repeated-measures ANOVA. Post-hoc analyses revealed that the activities were higher with female and the partner's voices than for the other voices (P < 0.05; Bonferroni correction). Moreover, only temporal electrodes registered significant correlations between the speakers and observers in response to the machine's voices according to a two-factor repeated-measures ANOVA (Fig. 6B; T7 electrode vs. CP2 electrode). On the temporal electrodes, inter-brain correlations were higher for the female and partner's voices than for the electronic and male voices (P < 0.05).

Along with the behavioral and EEG results, the subjective ratings showed significant differences based on voice type: for comfort, electronic < all other voices; for synchrony, electronic < self and female; for initiative, electronic and male < self; and for humanity, electronic < self and partner (P < 0.05; Bonferroni correction).
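The Bonferroni correction used in the post-hoc comparisons above can be illustrated with a short sketch. It assumes that all pairwise comparisons among the five voice types are tested, which gives ten comparisons; the set of comparisons actually tested is not stated in this section, and the raw P values below are invented.

```python
from itertools import combinations

# The five machine voice types described in the task design.
VOICES = ["electronic", "male", "female", "partner", "self"]

def bonferroni(raw_p, alpha=0.05):
    """Map each comparison to (adjusted P, significant?) after Bonferroni correction."""
    m = len(raw_p)
    return {name: (min(1.0, p * m), p * m < alpha) for name, p in raw_p.items()}

pairs = list(combinations(VOICES, 2))   # all pairwise comparisons (assumed)
per_test_alpha = 0.05 / len(pairs)      # equivalent per-comparison threshold

# Hypothetical uncorrected P values, for illustration only.
raw_p = {pair: 0.5 for pair in pairs}
raw_p[("electronic", "male")] = 0.001
raw_p[("electronic", "female")] = 0.02

adjusted = bonferroni(raw_p)
```

With ten comparisons, an uncorrected P of 0.02 no longer passes the corrected threshold, whereas an uncorrected P of 0.001 does.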

There were no differences in any of the subjective emotions, speech, or EEG rhythms between the condition with a fixed pace of machine speech and that with a random pace. This finding might have occurred because of the small range used for the random pace (400–600 msec) and its similarity to the fixed pace (500 msec). Indeed, most of the subjects did not notice the difference between the paces used. Although this study revealed no differences between the fixed and random paces of the machine's speech, the possible effect of adaptation in human and machine communications will need to be clarified in future studies.
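A small numerical sketch gives a sense of how close the two paces are. It assumes that the random pace was drawn uniformly from 400 to 600 msec, which this section does not state; under that assumption the random intervals have the same mean as the fixed pace and a standard deviation of only about 58 msec.

```python
import random
import statistics

FIXED_PACE_MS = 500.0
LOW_MS, HIGH_MS = 400.0, 600.0   # range of the random pace given in the text

# Standard deviation of a uniform distribution on [LOW_MS, HIGH_MS] (assumed distribution).
theoretical_sd = (HIGH_MS - LOW_MS) / 12 ** 0.5

rng = random.Random(0)           # fixed seed so the sketch is reproducible
random_pace = [rng.uniform(LOW_MS, HIGH_MS) for _ in range(10000)]

mean_ms = statistics.mean(random_pace)
sd_ms = statistics.pstdev(random_pace)
relative_jitter = theoretical_sd / FIXED_PACE_MS   # roughly 0.12 of the mean interval
```

A jitter of roughly one tenth of the mean interval is consistent with the report that most subjects did not notice the difference between the paces.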

Discussion

The current study is the first to describe inter-brain synchronization along with synchronization of speech rhythms between subjects during an alternating speech task. In the human–human tasks, individual speech rhythms (i.e., duration of speech and intervals between two voices) were more likely to synchronize (high correlations and small differences), which is similar to the spontaneous synchronizations that were observed between humans in previous nonverbal studies^2,3,4,13,14,15,16. This phenomenon is specific to human–human communication because the same behavioral synchronizations were not found during the human–machine alternating speech tasks with the machine pronouncing letters at a fixed interval. The behavioral synchronization was also correlated with the theta/alpha oscillatory amplitudes, which were enhanced and synchronized between two subjects in the same bilateral temporal and lateral parietal regions during the human–human tasks. This result suggests that inter-brain synchronization is tightly linked to speech synchronization between subjects. Indeed, the inter-brain synchronization was enhanced as the speech rhythm coordination developed, with previous human–human and human–machine interactions strengthening subsequent human–human behavioral and brain coordination (comparing pre- and post-machine conditions).

Inter-brain synchronizations are of great importance to our cooperative communication. In previous studies¹⁹, inter-brain synchronizations have been reported in nonverbal communication, such as the imitation of hand movements^16,17,18 or during participation in cooperative games and actions^20,21,22 and in verbal communication between listeners and speakers³¹. Such imitations, games and speeches included the context of the subjects' behaviors. The present study indicated that inter-brain synchronizations are also found during simple communication, without the context of the subjects' behavior.

Numerous EEG studies have proposed that theta/alpha activities are involved in high cognitive functions, such as working memory^11,32,33,34. For example, theta and alpha oscillations in the temporal and parietal brain regions are modulated during auditory and visual working memory tasks, respectively¹¹. In this study, the theta/alpha modulation in the temporal regions during auditory communication may similarly be related to auditory working memory. Therefore, we propose that social interactions that lead to successful turn-taking require a working memory of others' speech rhythms. Similarly, previous studies have shown that alpha inter-brain synchronizations are also observed in tasks that require working memory for others' behavior, such as imitation tasks¹⁶ and cooperative tapping tasks¹⁴.

In this study, theta/alpha inter-brain synchronization was also observed in the temporal and lateral parietal regions between speakers and observers (i.e., between subjects who participate in and who observe the human–machine tasks). These regions, which are in the vicinity of the temporal parietal junction (TPJ), have been proposed to be related to social cognition, such as understanding others' intentions, emotions and behavior, including the theory of mind and the mentalizing process^33,34,35,36,37,38. The TPJ is associated not only with the observation of human behaviors but also with the observation of machine (i.e., robotic) behavior^39,40,41. Moreover, the theta/alpha amplitude modulation has been reported to be related to the coordination of self-behaviors for the observation of others' behaviors in tapping tasks^13,14. Thus, there are spatial and frequency overlaps between the neural mechanisms for social interaction and social cognition¹⁵. It should be noted, however, that the neural mechanisms for social cognition are enhanced during social interaction because the lateral parietal theta/alpha amplitudes during human–human tasks were higher and their synchronizations were greater than those observed during human–machine tasks.

There is a possibility that inter-brain synchronizations are related to jaw movements causing muscle artifacts in EEG data, especially within the temporal regions⁴². However, the theta/alpha enhancements in the temporal regions were not observed in human–machine tasks, although the tasks required subjects to pronounce letters in the same manner as in the human–human tasks. Moreover, in the human–human tasks, when one subject pronounced letters, the other subject did not; in other words, the two subjects did not move their jaws simultaneously. Finally, in the human–machine tasks, the observer was not asked to pronounce letters while the speaker performed the task. Therefore, we conclude that the inter-brain synchronization was not due to muscle movement artifacts.

In addition, our findings reveal the importance of the familiarity of the machine's voice (e.g., a Japanese-native voice for Japanese subjects) in human–machine speech coordination. Use of the subjects' voices, partners' voices, or a Japanese-native female voice showed higher correlations and smaller differences in the speech rhythms and higher theta/alpha synchronizations in the EEG than did other types of voices. Consistent with both behavioral and brain synchronizations, the use of familiar voices and human-like machine voices made the subjects feel comfortable and synchronized. Typically, the subjects' speech rhythms were more likely to be synchronized when the voices used were familiar, with positive subjective impressions, which underscores important differences in language and learning^43,44.

Interestingly, the effects of the machine's voice on the inter-brain synchronizations between the speakers and observers demonstrated functional dissociations within the temporal-parietal regions. The inter-brain synchronization of the temporal regions is susceptible to the machine's voice; greater synchronization was observed with the use of familiar voices. This finding might be due to the sensitivity of the primary auditory areas in the temporal brain regions to auditory perceptual properties^45,46. However, the lateral parietal regions usually showed inter-brain synchronization, regardless of the machine's voice.

The existing relationships between the subjects might also affect the degree of empathy and their coordination of behavioral and brain rhythms, although we found no differences related to the gender, age or relationship (acquaintances or strangers at the time of the study) of the subject pairs (e.g., more than half of the desynchronized pairs were already acquaintances). Future studies should address how behavioral and brain synchronization are influenced by personal relationships (e.g., mother-infant, husband-wife, boss-subordinate, teacher-student, etc.).