Measuring speaker–listener neural coupling with functional near infrared spectroscopy

The present study investigates brain-to-brain coupling, defined as inter-subject correlations in the hemodynamic response, during natural verbal communication. We used functional near-infrared spectroscopy (fNIRS) to record the brain activity of 3 speakers telling stories and 15 listeners comprehending audio recordings of these stories. Listeners' brain activity was significantly correlated with the speakers', with a delay. This between-brain correlation disappeared when verbal communication failed. We further compared fNIRS and functional magnetic resonance imaging (fMRI) recordings of listeners comprehending the same story and found a significant relationship between the fNIRS oxygenated-hemoglobin concentration changes and the fMRI blood-oxygen-level-dependent (BOLD) signal in brain areas associated with speech comprehension. This correlation between fNIRS and fMRI was present only when data from the same story were compared between the two modalities and vanished when data from different stories were compared; this cross-modality consistency further highlights the reliability of the spatiotemporal brain activation pattern as a measure of story comprehension. Our findings suggest that fNIRS can be used to investigate brain-to-brain coupling during verbal communication in natural settings.


Results
Natural communication unfolds through the collective participation of speaker and listener: speakers construct grammatical sentences based on thoughts, convert these to motor plans, and execute the plans to produce vocal sounds; listeners analyze the sounds, build phonemes into words and sentences, and ultimately decode sound patterns into meaning. In our study, we observed significant speaker-listener temporal coupling only during successful verbal communication. When communication was blocked (i.e., during listening to a foreign, incomprehensible language), the synchronization was lost. As expected, this synchronization emerged with a temporal shift of the speaker's brain activity relative to the moment of vocalization: listeners' brain activity mirrored the speaker's with a delay. These lagged responses suggest that, on average, the speaker's production-based processes precede and likely induce the mirrored activity observed in the listeners' brains during comprehension. Furthermore, synchronization was also present between listeners with zero shift, representing temporally aligned listener-to-listener inter-subject coupling. Again, this coupling was only present during successful communication and disappeared when communication was blocked. Finally, we found strong consistency between the fNIRS and fMRI evidence for coupling during successful communication.
Scientific Reports | 7:43293 | DOI: 10.1038/srep43293

Listener-listener fNIRS inter-subject correlation. For each story, significantly coupled optodes were identified using a multilevel general linear model (GLM), and the results are shown in Fig. 1. As expected, significant results were found only for the English conditions E1 and E2 (false discovery rate [FDR] q < 0.01) 33, indicating that neural coupling only emerges during successful communication (i.e., when subjects understand the story content). Δ HbO shows a much stronger coupling effect than Δ HbR. A contrast for listening comprehension using the successful-communication stories (E1 and E2) minus the control conditions (T1 and T2) is also included in Fig. 1 as E1 + E2 − T1 − T2. Consistent with previous fMRI studies, the superior frontal gyrus, inferior parietal and angular gyrus were robustly correlated across listeners during speech comprehension 34. Although the function of these regions is far from clear, they have been associated with various linguistic functions, in particular semantic processing and comprehension 35,36.

Speaker-listener fNIRS coupling. The neural coupling between speaker and listener may not be restricted to homologous brain areas 13. Previous studies have also shown that the neural responses of listeners can lag behind 12,13 or precede 12 those of the speaker, facilitating comprehension and anticipation, respectively. To investigate these effects, the multilevel GLM was used to evaluate the coupling between all permutations of (speaker optode, listener optode) pairs, with the speaker's time course shifted with respect to those of the listeners from −20 s to 20 s in 0.5 s increments, where a positive shift represents the speaker preceding (listener lagging); the results are shown in Fig. 2.
For the English story E1, the listeners' fNIRS signals were significantly coupled with the speaker's signal at a 5-7 s time delay, and the number of significantly coupled optodes peaked at 5 s (Fig. 2a). As expected, no temporal asymmetry was found for the listener-listener case, and alignment was coupled to the incoming auditory input (i.e., lag 0, the moment of vocalization) (Fig. 2b). This lagged speaker-listener correlation replicates the one previously observed with fMRI 12. Significant coupling was found mainly between the speaker's prefrontal areas and the listeners' parietal areas, specifically the medial prefrontal and left parietal regions, for Δ HbO; no results were significant at the FDR q < 0.01 level for Δ HbR (Fig. 2c). The relationships between significantly coupled anatomical locations in the speaker and listener brains are further listed in Figure S1. These areas include many sensory, classic linguistic and extralinguistic brain areas, demonstrating that areas involved in speech comprehension (listener-listener coupling) are also aligned during communication (speaker-listener coupling). No significant speaker-listener coupling was found for either of the Turkish stories (T1 and T2). Using identical analysis methods and statistical thresholds, we found no significant coupling between the speaker and the listeners or among the listeners during these control conditions (see Fig. 1, T1 and T2).
To further investigate the temporal asymmetry of coupling, the average t-statistics for all significantly coupled speaker-listener optode-pairs were assessed across time shifts between the speaker and listener time series, as shown in Fig. 3 (red curve). The peak of the curve is centered at 5 seconds, which shows that, on average, listeners' time courses lagged behind the speaker's. In comparison, the time courses of the listeners were synchronized (with each other) at 0 sec (Fig. 3, blue curve).
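The lag analysis described above, shifting the speaker's time course from −20 s to 20 s in 0.5 s steps and assessing coupling at each shift, can be sketched as follows. This illustrative version uses a plain Pearson correlation per lag rather than the paper's multilevel GLM, and the function name and synthetic data are our own assumptions:

```python
import numpy as np
from scipy import stats

def lagged_coupling(speaker, listener, fs=2.0, max_lag_s=20.0, step_s=0.5):
    """Correlate a listener time course with time-shifted copies of the
    speaker's time course. A positive lag means the speaker precedes."""
    step = int(step_s * fs)
    max_lag = int(max_lag_s * fs)
    lags, rs = [], []
    for lag in range(-max_lag, max_lag + 1, step):
        if lag >= 0:   # speaker precedes: speaker[t] vs listener[t + lag]
            s, l = speaker[:len(speaker) - lag], listener[lag:]
        else:          # speaker lags behind the listener
            s, l = speaker[-lag:], listener[:len(listener) + lag]
        r, _ = stats.pearsonr(s, l)
        lags.append(lag / fs)
        rs.append(r)
    return np.array(lags), np.array(rs)
```

With a synthetic listener series that trails the speaker by 10 samples (5 s at 2 Hz), the correlation curve peaks at a +5 s lag, mirroring the asymmetry reported in Fig. 3.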
Listener-listener BOLD coupling. As a verification of the fNIRS approach, we reanalyzed an fMRI dataset of 17 subjects listening to story E2 (the "Pie-man" story), which was recorded and used in a previous study 14,27. To compare with the fNIRS results, we considered only voxels from the outer layer of the cortex in the neighboring regions of the prefrontal and parietal sites. The coupling results estimated using the multilevel GLM model are shown in Fig. 4.

fMRI-fNIRS correlation. Previous studies have shown that fNIRS and fMRI signals are highly correlated across multiple cognitive tasks 31,32,38,39. In our study, two groups of subjects, the brain activity of one group measured with fNIRS and the other with fMRI, were engaged in the same task of listening to the E2 story ("Pie-man"). We hypothesized that the BOLD and fNIRS signals share common information even though they were measured from different subjects and in different recording environments. To directly compare the signals across fMRI and fNIRS, we estimated correlations between spatially overlapping voxel-optode pairs while subjects listened to the exact same story. Widespread significant correlations were found between BOLD and Δ HbO only when the participants were listening to the same story (i.e., E2), as shown in Fig. 5a. For the control stories T1 and T2 (Turkish) and E1 (English), we observed no coupling effect.

Discussion
During social interaction, the brains of individuals become coupled as those individuals send and receive signals (light, sound, etc.) through the environment, analogous to a wireless communication system 8. This brain-to-brain coupling relies on stimulus-to-brain coupling, which reflects the brain's ability to couple with the physical world in order to represent it veridically and dynamically. In this study, we identified brain-to-brain coupling between a speaker telling an unrehearsed real-life story and a group of listeners hearing the story, with the aid of fNIRS. We have demonstrated for the first time that it is feasible to study neural coupling during natural verbal communication with fNIRS. While there is a growing literature using fMRI and EEG to study brain-to-brain coupling during social interaction 40, the application of fNIRS in this field is still rare. Cui et al. in 2012 first adopted fNIRS to investigate neural coupling between pairs of subjects playing a simple cooperation and competition game 23. In the game, participants were asked to press a response key after a 'go' signal, either in synchrony with (cooperation mode) or faster than (competition mode) their partner. An increase in neural correlation between the members of a pair was found only during cooperation and was associated with better cooperation performance. Holper et al. in 2012 investigated neural coupling between a model and an imitator during a finger tapping task 24.
A stronger increase in neural coupling was found during the imitation condition compared with a control condition in which the imitator no longer needed to follow the model's tapping pace. These two studies, however, involved only simple stimuli and contexts.
Our results demonstrated that: (1) the brain activation recorded by fNIRS was coupled between a speaker telling a real-life story and listeners hearing the story (speaker-listener coupling); (2) on average, the listeners' brain activity lagged behind that of the speaker (5-s delay); (3) the brain activity evoked by the same story was reliable across the listeners (listener-to-listener coupling); and (4) coupling was not present when listeners heard stories in a language incomprehensible to them. These findings are consistent with previous work using fMRI 12 and EEG 13 and demonstrate that fNIRS can be used to study brain-to-brain coupling during social interaction in a natural communicative context. During brain-to-brain coupling, activity in areas of prefrontal and parietal cortex previously reported to be involved in sentence comprehension was robustly correlated across subjects, as revealed in the inter-subject correlation analysis 34. As these are task-related (active listening) activation periods rather than rest, the correlations reflect modulation of these regions by the time-varying content of the narratives, comprising linguistic, conceptual and affective processing. As expected, and in agreement with previous work 12, the activity among listeners is time-locked to the moment of vocalization. In contrast, the activity in the listeners' brains lagged behind the activity in the speaker's brain. Although the sluggish hemodynamic responses measured by fMRI and fNIRS have low temporal resolution and could mask the exact temporal structure of speaker-listener coupling, the 5-s delay identified in this study is consistent with previous work on the multiple timescales of linguistic processing 10.
In particular, analyses integrating single-unit, ECoG, and fMRI data have uncovered a hierarchy of timescales over which natural communication unfolds in the brain, ranging from hundreds of milliseconds (single words) to a few seconds (sentences) to tens of seconds (paragraphs) 10 . These previous findings indicate that each brain area along this hierarchical pathway accumulates and integrates information at a preferred timescale (e.g., early auditory areas integrate phonemes to detect words, while later areas integrate words to form a sentence, etc.) and transmits the chunked information upstream to the next processing level. The relatively long (5-sec) lag between speaker and listener seems to match the processing timescale of natural discourse, which unfolds at the sentence level. In agreement with such an observation, the speaker-listener neural coupling emerges in areas with relatively long processing timescales.
Another interesting observation is that significant inter-subject correlations were found primarily between prefrontal areas in the speaker and parietal areas in the listeners. This result supports a previous EEG study in which the coupling between speaker and listener was found mainly between different channels 13. In that study, listeners watched video playback of speakers who were telling either a fairytale or the plot of their favorite movie or book. A canonical correlation analysis between the EEG of the speakers and listeners showed that coupling was mainly limited to non-homologous channels, although source localization was not applied. To the best of our knowledge, our study presents the first fNIRS-based evidence that brain-to-brain coupling between speaker and listener occurs mainly between non-homologous brain areas.
In the current study, we further compared the fMRI and fNIRS signal time courses when two groups of subjects were listening to the same audio recording of a real-life story. The neural activation of one group was recorded with fNIRS. The neural activation of the other group (the fMRI group) was recorded with fMRI in a previous study 14 . We first analyzed the two datasets separately with inter-subject multilevel GLM and found similar patterns of coupling between listeners in the two modalities. We then investigated the correlation between fMRI and fNIRS signals and found that Δ HbO and BOLD were significantly correlated, despite the fact that they were collected from different subjects in different recording environments, and with different techniques (fMRI vs. fNIRS). Furthermore, the fMRI voxels that were significantly correlated with fNIRS optodes were not randomly distributed but came from brain areas usually considered to be related to listening comprehension 1 . When the fNIRS and fMRI signals corresponding to different stories were compared for control purposes, no significant correlation was found, as expected. Significant BOLD-Δ HbR correlations were found but to a much lesser extent compared to Δ HbO (Fig. 5a). One possible explanation is the superior reliability of Δ HbO across the listeners compared to Δ HbR as shown in Fig. 1.
For many years, researchers have been interested in comparing fNIRS and fMRI responses during cognitive tasks. Strangman et al. simultaneously recorded fNIRS and fMRI in a finger flexion/extension task 30. Although the authors expected the Δ HbR-BOLD correlation to be strongest due to the causal relation between the Δ HbR and BOLD signals, the results suggested stronger BOLD-Δ HbO correlations. The authors suspected this might be due to a higher signal-to-noise ratio (SNR) of Δ HbO in response to the task. Cui et al. simultaneously recorded fNIRS and fMRI for the same group of subjects during four tasks: finger tapping, go/no-go, judgment of line orientation and visuospatial n-back 31. They found that both Δ HbR and Δ HbO were correlated with BOLD, despite differences in SNR, and the type of task did not significantly affect the correlations. Δ HbO-BOLD correlations were found to be slightly but significantly higher than Δ HbR-BOLD correlations. Noah et al. compared fMRI and fNIRS measurements in a naturalistic task in which participants alternately played the video game Dance Dance Revolution and rested in 30-second blocks; the results suggested a high Δ HbO-BOLD correlation within the same measurement areas 32. All of the studies above compared mean event-triggered activity obtained by averaging over repetitions of a condition. However, Ben-Yakov et al. demonstrated the shortcoming of the triggered-averaging method for detecting event-specific responses that are locked to the structure of each particular exemplar 26. In our study, each event in the story is unique and singular and cannot be averaged with the responses evoked by other events in the story.
Our findings provide evidence that the response dynamics to a sequence of events in a story are robust and reliable, and can be detected with both fMRI and fNIRS, by measuring the reliability (correlation) of responses to the story within and between the two methods in complex real-life settings that unfold over several minutes.
The current study, however, is only the first step toward studying brain coupling during natural communication using fNIRS. While fNIRS has obvious disadvantages relative to fMRI, which include coarser spatial resolution and an inability to measure signal beneath the cortical surface, it also has some advantages over fMRI. The advantages of fNIRS over fMRI include: (1) lower cost; (2) easier setup; (3) higher temporal resolution; (4) greater ecological validity, which can allow face-to-face communication (in fMRI setups subjects cannot see each other); (5) greater ease of connecting two systems simultaneously to test bi-directional dialogue-based communication; and (6) ease of use in real-life clinical communicative contexts. In recent studies, fNIRS has been used in extreme settings such as aerospace applications 41,42 and with mobile participants walking outdoors 20. In the context of the current study, the true advantages of fNIRS can be exploited when the neural activation of two or more subjects is studied during face-to-face conversations in a natural context such as a classroom.
Despite these promising results, our study was limited in certain aspects. First, for the fNIRS-fMRI comparison, the spatial resolution and coverage of our fNIRS system were limited, so fMRI-fNIRS correlations were estimated between all possible voxel-optode pairs within large cortical areas or the whole brain. Second, our data from fNIRS and fMRI were not recorded simultaneously and involved different participants, so any possible between-subject differences should be taken into account when interpreting the results. Future studies, preferably with concurrent fNIRS and fMRI, can validate our findings and deepen our understanding of the fNIRS-fMRI relationship for complex natural stimuli. Potential future uses of this approach include everyday settings such as classrooms, where we could investigate communication between a teacher and students, or business meetings, between a speaker and attendees. However, due to the nature of the hemodynamic response that fNIRS measures, rapid communication may be difficult to analyze at short timescales (e.g., at the word level). Future studies can investigate the temporal and spatial limits and requirements of fNIRS-based speaker-listener coupling.
In summary, our results showed that: (1) a speaker's and listeners' brain activity as measured with fNIRS were coupled only when the listeners understood the story; (2) listeners' brain activity mirrored the speaker's brain activity with a delay; (3) listening to the same real-life story evoked common brain activation patterns across listeners, independent of the imaging technology (fNIRS or fMRI) and recording environment (quiet or noisy; sitting or lying down); and (4) fNIRS and fMRI signals were correlated during comprehension of the same real-life story. These results support fNIRS as a viable future tool to study brain-to-brain coupling during social interaction, in real-life and clinical settings such as a classroom, a business meeting, a political rally, or a doctor's office.

Materials and Methods
Participants. Three speakers (one male native English speaker, two male native Turkish speakers) and 15 native English-speaking listeners (8 female) volunteered to participate in the study and were included in the analysis. An additional six subjects participated but were excluded from analysis due to technical issues during recording or excessive motion artifacts (detected both visually and using an automatic algorithm 43) present in the large sensor-array data covering both prefrontal and parietal cortices. Subjects were all right-handed (mean LQ = 74.5, SD = 24.1) based on the Edinburgh Handedness Inventory 44 and were aged 18-35 years. All subjects had normal or corrected-to-normal vision. Participants did not have any history of neurological or mental disorder and were not taking any medication known to affect alertness or brain activity. None of the listeners understood Turkish. The protocol used in the study was reviewed and approved by the Institutional Review Board (IRB) of Drexel University (DU). The methods were carried out in accordance with the approved guidelines, and participants gave written informed consent approved by the IRB of DU.

Experimental Procedure. All participants were seated comfortably in front of a computer screen throughout the experiment. fNIRS data of the three speakers were recorded while they told an unrehearsed real-life story in their native language (either English or Turkish). Audio of the stories was recorded using a microphone. The resulting English story (E1) and two Turkish stories (T1 and T2) were played to the listeners later. An additional real-life English story, E2 ("Pie-man", recorded at "The Moth", a live storytelling event in New York City), used in several recent fMRI studies of natural verbal communication 14,26,27, was also played to the listeners. fNIRS data were recorded from the listeners throughout the audio playbacks.
The playback sequence always began with E2 (English story, "Pie-man"), and the order of the remaining stories (E1, T1, and T2) was counterbalanced across subjects. Before each story playback, short samples of scrambled audio were played to the subjects so they could adjust the volume of the headphones they were wearing. Before the start and after the end of each audio story playback, there was a 15-s fixation period for stabilizing the signal. Immediately after each playback, subjects were asked to write a detailed report of the story they had just heard to verify that they understood it. Figure 6 shows the timeline of a story session.

Data Acquisition. Two continuous-wave optical brain imaging devices were used simultaneously on each participant to record brain activity from prefrontal cortex (PFC) and parietal cortex (PL) at 40 measurement locations (optodes). Prefrontal and parietal regions were selected based on the significant areas found in the previous fMRI-based speaker-listener neural coupling study by Stephens et al. 12. Anterior prefrontal cortex was recorded by a 16-optode continuous-wave fNIRS system (fNIR Imager Model 1100; fNIR Devices, LLC) first described by Chance et al. 45 and developed in our lab at Drexel University 46,47. The sensor was positioned based on anatomical landmarks as described in Ayaz et al. 47. Briefly, the center of the sensor was aligned to the midline and the bottom of the sensor touched the participant's eyebrows, so that the center point of the sensor was approximately at Fpz according to the 10-20 international system (see Fig. 7). The sensor has a source-detector distance of 2.5 cm, and the sampling rate was 2 Hz. Parietal cortex was recorded using a 24-optode continuous-wave Hitachi fNIRS system (ETG-4000; Hitachi Medical Systems). Two "3 × 3" measurement patches were attached to a cap that was customized for measurement of the parietal cortex.
For each subject, the center of the two patches was placed at Pz, which was located using a measuring tape. Sensors from each patch measured the fNIRS signal of one hemisphere from 12 channels. The sensor has a source-detector distance of 3 cm and the sampling rate was 10 Hz. Figure 7 shows the complete sensor setup and optode configuration.
Time synchronization markers (triggers) were sent from the stimulus presentation computer to both fNIRS acquisition computers to register the audio playback start and end times on both fNIRS devices and to temporally align the recorded data across all subjects.
The approximate projection of the channel locations onto the cortical surface in MNI space was estimated using a virtual spatial registration approach 48,49. In this approach, the sensor patches are virtually placed on an ideal scalp, and the projected Montreal Neurological Institute (MNI) coordinates on the cortical surface and the standard deviation of displacement are estimated from the magnetic resonance (MR) images of 17 individuals obtained from a publicly available dataset 50,51. The results are shown in Fig. 8. The optodes covered regions in the frontopolar area, orbitofrontal area, dorsolateral prefrontal cortex, primary somatosensory cortex, somatosensory association cortex, supramarginal gyrus and angular gyrus. Detailed lists of the anatomical locations and Brodmann areas corresponding to each optode are included in Table S1 and Table S2, respectively.

fNIRS Data Preprocessing. fNIRS raw light intensity signals were converted to changes in oxygenated hemoglobin (Δ HbO) and deoxygenated hemoglobin (Δ HbR) concentrations using the modified Beer-Lambert law 52. The raw signal and hemoglobin concentration changes were inspected both visually and using the automated SMAR algorithm 53, which uses a coefficient-of-variation-based approach to assess signal quality and reject problematic channels with bad contact or saturated raw light intensity. Two optodes, 1 and 15, were over the hairline for most participants and hence were rejected from the study. A total of 11.7% of optodes were rejected by visual inspection and SMAR. Next, the Δ HbO and Δ HbR time series for each optode and participant were band-pass filtered (0.01-0.5 Hz), and the recordings from parietal sites were down-sampled to 2 Hz to match the sampling rate of the prefrontal sites. In the fNIRS literature, various filtering settings have been adopted to reduce physiological artifacts 31,54-58. The high cutoff of the band-pass filter, for example, ranges from 0.1 to 1 Hz 31,54-56.
We chose 0.01-0.5 Hz cutoffs to reduce slow signal drift and physiological artifacts from cardiac activities but maximally preserve activities related to listening comprehension. We considered only the period from 15 to 399 seconds, with respect to story start, in the signal time courses from each audio story. The first 15 s were rejected to account for subjects' initial period of adjustment to the listening comprehension task, and 399 s is the duration of the shortest story. Prior to subsequent analysis, the signal time courses were standardized optode-wise using a Z-score transform.
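The preprocessing steps above (band-pass filtering at 0.01-0.5 Hz, down-sampling the 10 Hz parietal recordings to 2 Hz, and optode-wise Z-scoring) might be sketched as follows. The filter type and order are assumptions on our part; the paper does not state them:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess(ts, fs, band=(0.01, 0.5), target_fs=2.0):
    """Band-pass filter, down-sample to the common 2 Hz rate, and
    Z-score one hemoglobin time series (illustrative sketch)."""
    sos = butter(3, band, btype="band", fs=fs, output="sos")
    ts = sosfiltfilt(sos, ts)                 # zero-phase 0.01-0.5 Hz filter
    if fs > target_fs:                        # parietal optodes: 10 Hz -> 2 Hz
        ts = decimate(ts, int(fs / target_fs))
    return (ts - ts.mean()) / ts.std()        # optode-wise Z-score
```

Applied to a 400 s recording at 10 Hz, this yields an 800-sample series at 2 Hz with zero mean and unit variance, ready for the inter-subject GLM.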
fNIRS Analysis. Inter-subject correlation. We first evaluated the reliability of the correlation between listeners' brain activity using an inter-subject multilevel GLM similar to the one adopted by Stephens et al. 12 for each of the four conditions: E1, E2, T1 and T2. We expected neural coupling between listeners to emerge only for the English story conditions E1 and E2, as none of the subjects understood Turkish.
At the individual subject level, a GLM with an AR(1) (first-order autoregressive) error model was estimated, using the average time course of the listeners as the independent variable and the time course of an individual listener as the dependent variable:

y_k(t) = β_0 + β_1 x_k(t) + ε(t),  ε(t) = ρ ε(t − 1) + ν(t),  ν(t) ~ N(0, σ²)

where y_k(t) is the time course of a channel from listener k, x_k(t) is the average time course of the same channel from all listeners except listener k, ρ is the autocorrelation coefficient, and ν(t) is Gaussian noise with variance σ². The AR(1) error model has been frequently adopted in event-related fNIRS analysis to model auto-correlated noise caused by low-frequency drift and physiological processes such as cardio-respiratory and blood pressure changes 57.
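The individual-level GLM with AR(1) errors might be fitted as in the sketch below, which uses Cochrane-Orcutt iteration: fit by ordinary least squares, estimate ρ from the residuals, quasi-difference the data, and refit. The paper does not specify its estimator, so this fitting scheme and all names are illustrative assumptions:

```python
import numpy as np

def ar1_glm(x, y, n_iter=10):
    """Fit y(t) = b0 + b1*x(t) + e(t), e(t) = rho*e(t-1) + nu(t),
    via Cochrane-Orcutt iteration (illustrative sketch)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta
        # lag-1 autocorrelation of the residuals estimates rho
        rho = resid[:-1] @ resid[1:] / (resid[:-1] @ resid[:-1])
        # quasi-difference transform removes the AR(1) autocorrelation
        ys = y[1:] - rho * y[:-1]
        Xs = X[1:] - rho * X[:-1]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta, rho
```

On simulated data with β_1 = 0.8 and ρ = 0.6, both parameters are recovered closely; the estimated slope β_1 per listener is what the group-level test then operates on.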
At the group level, we tested the hypothesis H_1: β_1 > 0 using a one-tailed one-sample t-test evaluated on the slopes β_k (k = 1, …, n) estimated at the individual level. We used the Benjamini-Hochberg procedure 33 to control the false discovery rate.

Speaker-listener coupling. We evaluated the coupling between optode i of the speaker and optode j of the listeners for all permutations of (i, j) (N_pairs = 40 × 40 = 1600 optode pairs). The multilevel GLM was applied as for listener-listener coupling, except that the time course of optode i of the speaker was used as the independent variable. For comparison, listener-listener coupling was re-evaluated by shifting the average listener time series with respect to that of each individual listener.
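The group-level test and FDR control might be sketched as follows, with the Benjamini-Hochberg step implemented directly; the input layout (one row of individual-level slopes per optode pair) is an assumption for illustration:

```python
import numpy as np
from scipy import stats

def group_level(betas_by_pair, q=0.01):
    """One-tailed one-sample t-test (H1: beta > 0) per optode pair,
    then Benjamini-Hochberg FDR across pairs (illustrative sketch).
    betas_by_pair: array of shape (n_pairs, n_subjects)."""
    t, p_two = stats.ttest_1samp(betas_by_pair, 0.0, axis=1)
    p = np.where(t > 0, p_two / 2, 1 - p_two / 2)   # one-tailed p-values
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m            # BH step-up thresholds
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    sig = np.zeros(m, dtype=bool)
    sig[order[:k]] = True                            # largest passing rank
    return t, p, sig
```

Pairs with consistently positive slopes across subjects survive; pairs with zero or negative slopes do not.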
fMRI Analysis. The fMRI dataset included 17 subjects listening to story E2 (the "Pie-man" story), which was recorded and used in a previous study 14. To compare with the fNIRS results, we considered only voxels from the outer layer of the cortex in the neighboring regions of the prefrontal and parietal sites, as shown in Fig. 8. The "neighboring regions" are defined as voxels within a radius of 2.7 standard deviations of displacement from the center of each optode projection. A total of 994 voxels were chosen in this manner. The voxel time courses were high-pass filtered at 0.01 Hz (for comparison with the fNIRS signals) and trimmed to include only 15-399 seconds (with respect to story start), and the inter-subject multilevel GLM described above was employed for the analysis. FDR was controlled among the 994 voxels with q < 0.01.
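The voxel-selection rule (voxels within 2.7 standard deviations of displacement from each optode's MNI projection) can be sketched as a simple distance threshold; coordinates and per-optode displacement SDs are assumed inputs in millimetres:

```python
import numpy as np

def neighboring_voxels(voxel_xyz, optode_xyz, sd_disp, radius_sd=2.7):
    """Boolean mask (n_optodes, n_voxels): True where a cortical-surface
    voxel lies within radius_sd * sd_disp of an optode projection."""
    # pairwise Euclidean distances between every optode and every voxel
    d = np.linalg.norm(voxel_xyz[None, :, :] - optode_xyz[:, None, :], axis=2)
    return d <= (radius_sd * sd_disp)[:, None]
```

A voxel can fall in the neighborhood of more than one optode; the union over optodes would give the 994-voxel set described above.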

fMRI-fNIRS Correlation.
To estimate the correlation between BOLD and fNIRS signals, both signals were high-pass filtered at 0.01 Hz. The BOLD time courses were z-score normalized for each voxel and then averaged across the 17 subjects. fNIRS time courses were down-sampled to 1/1.5 Hz to match the fMRI sampling rate. The inter-subject multilevel GLM approach was then applied with the time course of optode i of an fNIRS subject k as the independent variable x_k^i(t) and the time course of voxel j averaged across fMRI subjects as the dependent variable y_j(t). This procedure was applied to estimate all possible correlations between channels and their corresponding voxels in the left prefrontal (8 optodes × 131 voxels), right prefrontal (8 optodes × 144 voxels), left parietal (12 optodes × 376 voxels) and right parietal (12 optodes × 343 voxels) areas. The voxels were selected as described above. To correct for multiple comparisons, FDR was controlled with a threshold of 0.05.
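A sketch of the voxel-optode comparison for a single pair follows. It assumes a 1.5 s fMRI repetition time (our inference from the stated down-sampling to the fMRI rate), resamples by linear interpolation, and uses a plain correlation in place of the multilevel GLM:

```python
import numpy as np

def crossmodal_correlation(hbo, bold_avg, fnirs_fs=2.0, fmri_tr=1.5):
    """Correlate one optode's ΔHbO course (2 Hz) with the group-averaged
    BOLD course of one voxel, after resampling the fNIRS series onto the
    fMRI time grid (illustrative sketch)."""
    n = len(bold_avg)
    t_fmri = np.arange(n) * fmri_tr             # fMRI sample times (s)
    t_fnirs = np.arange(len(hbo)) / fnirs_fs    # fNIRS sample times (s)
    hbo_rs = np.interp(t_fmri, t_fnirs, hbo)    # resample to fMRI rate
    hbo_z = (hbo_rs - hbo_rs.mean()) / hbo_rs.std()
    bold_z = (bold_avg - bold_avg.mean()) / bold_avg.std()
    return float(hbo_z @ bold_z / n)            # Pearson correlation
```

With two series carrying the same slow (0.05 Hz) component, the correlation approaches 1 despite the differing sampling rates, which is the effect exploited in the Fig. 5a comparison.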