Introduction

Verbal communication involves the relaying of information between individuals through the use of sound patterns within a structure of language. For decades, neuroimaging technologies have been applied to study the neural mechanisms underlying the production and comprehension of language. Multiple brain areas have been identified as involved in verbal communication using Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI)1, whereas the timing of auditory processing has been studied with the aid of Electroencephalography (EEG) and Magnetoencephalography (MEG)2,3,4,5,6,7. Although important findings have been made using these technologies, traditional neurolinguistic studies face two limitations. First, these studies are mostly concerned with the cognitive process of either speech production or speech comprehension and confine the analysis to individual brains. Verbal communication, however, is an interactive process between speaker and listener. As pointed out by Hasson and others8, a complete understanding of the cognitive processes involved cannot be achieved without examining and understanding the interaction of neural activity among individuals. Second, cognitive functions are traditionally studied in a controlled laboratory environment. While this practice helps to isolate specific factors (e.g. syntactic transformations or the representation of isolated lexical items), the ecological validity of the findings remains unclear until tested in a real-life context. In addition, many studies confined the auditory stimuli to short lengths, often using isolated words or sentences for experimental control1,7,9. As a result, questions regarding the brain's ability to accumulate information over longer time scales cannot be effectively investigated8,10,11.

With recent advances in neuroimaging systems and methodology, researchers can now address which brain processes are involved in social interaction. Stephens et al. investigated the alignment (correlation) of neural activity between speaker and listener during natural verbal communication using fMRI12. In that study, brain activity was recorded while a speaker was telling a real-life story and later while listeners were listening to an audio recording of the story. Listeners' brain activity was found to be coupled with the speaker's brain activity with a delay, although in certain brain areas listeners were ahead of the speaker in time, possibly reflecting a predictive anticipatory effect. Remarkably, higher coupling was associated with better understanding of the story. This neural coupling between speaker and listener was further supported by a recent EEG study in which the coordination between the brain activity of speakers and listeners was investigated with canonical correlation analysis13. Additional findings from the same EEG study suggest that this speaker-listener neural coupling might not be restricted to homologous brain areas. In another fMRI study, Lerner et al. recorded the Blood Oxygenation Level Dependent (BOLD) response of participants listening to a real-life story scrambled at the time scales of words, sentences and paragraphs. Inter-subject correlation analyses were employed to estimate the reliability of neural responses across subjects, and striking differences in the topography of brain activation were found across the different time scales14.

Although novel findings have emerged from fMRI and EEG studies addressing the aforementioned challenges, certain limitations of these two neuroimaging technologies have hindered the investigation of neural coupling during natural verbal communication. fMRI, for example, requires subjects to lie motionless in a noisy scanning environment, and simultaneous scanning of multiple individuals engaged in face-to-face communication is impractical in fMRI-based setups. EEG, on the other hand, can provide a more naturalistic environment. However, EEG is susceptible to muscle-induced artifacts during vocalization and is therefore less suitable for studying speaker-listener interactions15. Furthermore, localizing sources from the EEG signal requires higher-density recordings and additional computation to solve the inverse problem16,17,18.

In this study, we propose using functional near-infrared spectroscopy (fNIRS) to investigate speaker-listener coupling as an effective complement to the existing approaches. fNIRS is an optical brain imaging technology for monitoring the concentration changes of oxygenated hemoglobin (ΔHbO) and deoxygenated hemoglobin (ΔHbR) in the cortex. By utilizing portable and wearable sensors, fNIRS provides an imaging solution with high ecological validity for studying cortical hemodynamic changes in real-life contexts19. fNIRS has been used in natural environments, including with mobile individuals outdoors20, consistent with the neuroergonomic and mobile brain/body imaging approaches21,22. It has been adopted to study brain-to-brain coupling during a cooperation-competition game23 and a finger-tapping imitation task24, and has proved useful for studying single-brain activations during social interactions in a natural setting with a traditional block design25. To date, however, brain-to-brain coupling during natural verbal communication has not been studied with fNIRS.

The main objective of this study is to evaluate speaker-listener brain-to-brain coupling in natural everyday settings. To achieve this objective, we designed an fNIRS experiment to replicate the speaker-listener neural coupling results from a previous fMRI study12. A native English speaker and two native Turkish speakers each told an unrehearsed real-life story in their native language. An additional real-life English story, E2 ("Pie-man", recorded at "The Moth", a live storytelling event in New York City), used in recent fMRI studies of natural verbal communication14,26,27, was also included. The two English stories (E1 and E2) and two Turkish stories (T1 and T2, the control conditions) were played to listeners who understood only English. We hypothesized that: (1) the listeners' brain activity will demonstrate inter-subject correlation only when they listen to a story they understand; and (2) the English speaker's brain activity during production of E1 will be coupled with the activity of the listeners during comprehension. We targeted both prefrontal and parietal cortices, as these include cognitive and higher-order extralinguistic areas known to be involved in the social information processing crucial for successful communication, including, among others, the capacity to discern the beliefs, desires, and goals of others12,28,29.

A second objective of this study is to compare the fNIRS signals recorded here with the fMRI BOLD responses recorded in the previous fMRI study14. fNIRS and fMRI BOLD signals, both of which rest on the neurovascular coupling phenomenon, are known to be correlated across various cognitive tasks30,31,32. However, previous studies primarily adopted block designs with simple stimulation to investigate the mean activity induced by trigger-averaging a condition over time. The relationship between fNIRS and BOLD during natural verbal communication has yet to be established. To this end, we hypothesized that the fNIRS biomarkers (ΔHbO and ΔHbR) of our listeners are correlated with the fMRI BOLD responses of the previously tested listeners14 during comprehension of only the same English story E2 (which all of these listeners heard) and not of the other English story E1 or the Turkish stories T1 and T2 (which the previously tested fMRI participants did not hear). The convergence of fNIRS and fMRI for comprehension of the same story serves as further validation that fNIRS can be used to investigate brain-to-brain coupling during natural verbal communication.

Results

Natural communication unfolds through the collective participation of speaker and listener: speakers construct grammatical sentences based on thoughts, convert them to motor plans, and execute the plans to produce vocal sounds; listeners analyze the sounds, build phonemes into words and sentences, and ultimately decode the sound patterns into meaning. In our study, we observed significant speaker-listener temporal coupling only during successful verbal communication. When communication was blocked (i.e., while listening to a foreign, incomprehensible language), the synchronization was lost. As expected, this synchronization appeared at a temporal shift of the speaker's brain activity relative to the moment of vocalization: listeners' brain activity mirrored the speaker's brain activity with a delay. These lagged responses suggest that, on average, the speaker's production-based processes precede and likely induce the mirrored activity observed in the listeners' brains during comprehension. Furthermore, synchronization was also present between listeners at zero shift, representing temporally aligned listener-to-listener inter-subject coupling. Again, this coupling was present only during successful communication and disappeared when communication was blocked. Finally, we found strong consistency between the fNIRS and fMRI evidence for coupling during successful communication.

Listener-listener fNIRS inter-subject correlation

For each story, significantly coupled optodes were identified using a multilevel general linear model (GLM); the results are shown in Fig. 1. As expected, significant results were found only for the English conditions E1 and E2 (false discovery rate33 [FDR] q < 0.01), indicating that neural coupling emerges only during successful communication (i.e. when subjects understand the story content). ΔHbO showed a much stronger coupling effect than ΔHbR. A contrast for listening comprehension, successful communication stories (E1 and E2) minus control conditions (T1 and T2), is also included in Fig. 1 as E1 + E2 − T1 − T2. Consistent with previous fMRI studies, the superior frontal gyrus, inferior parietal regions and angular gyrus were robustly correlated across listeners during speech comprehension34. Although the function of these regions is far from clear, they have been associated with various linguistic functions, in particular semantic processing and comprehension35,36.

Figure 1: Listener-listener fNIRS inter-subject correlation.

The top row shows the t-map of coupling results for ΔHbO and the bottom row shows the results for ΔHbR. White crosses represent non-significantly coupled optodes; black crosses represent significantly coupled optodes (n = 15, FDR q < 0.01). The t-maps were smoothed using a spline method.

Speaker-listener fNIRS coupling

The neural coupling between speaker and listener may not be restricted to homologous brain areas13. Previous studies also showed that the neural responses of listeners can lag behind12,13 or precede12 those of the speaker, facilitating comprehension and anticipation, respectively. To investigate these effects, a multilevel GLM was used to evaluate the coupling between all permutations of (speaker optode, listener optode) pairs, with the speaker's time course shifted with respect to those of the listeners from −20 s to 20 s in 0.5 s increments, where a positive shift represents the speaker preceding (listener lagging); the results are shown in Fig. 2. For the English story E1, the listeners' fNIRS signals were significantly coupled with the speaker's signal at a 5–7 s delay, and the number of significantly coupled optodes peaked at 5 s (Fig. 2a). As expected, no temporal asymmetry was found for the listener-listener case, where alignment is locked to the incoming auditory input (i.e. lag 0, the moment of vocalization) (Fig. 2b). This replicates the speaker-listener lagged correlation observed with fMRI12. For ΔHbO, significant couplings were found mainly between the speaker's medial prefrontal areas and the listeners' left parietal areas; for ΔHbR, no result was significant at the FDR q < 0.01 level (Fig. 2c). The relationships between significantly coupled anatomical locations in the speaker's and listeners' brains are further listed in Figure S1. These areas include many sensory, classic linguistic and extralinguistic brain areas, demonstrating that areas involved in speech comprehension (listener-listener coupling) are also aligned during communication (speaker-listener coupling). No significant speaker-listener coupling was found for either of the Turkish stories (T1 and T2); using identical analysis methods and statistical thresholds, we found no significant coupling between the speaker and the listeners or among the listeners during these control conditions (see Fig. 1, T1 and T2).

Figure 2: Speaker-listener and listener-listener neural coupling.

(a) The number of optode-pairs at which the fNIRS time courses of the speaker and the listeners were significantly coupled (n = 15, FDR q < 0.01). Results are shown with the speaker's time course shifted from −20 to 20 s with respect to the listeners' in 0.5 s increments. For the English condition E1, significant results can be found at shifts of 5 to 7 s (speaker precedes), with a peak at 5 s. (b) The number of optodes showing significant listener-listener inter-subject correlation (n = 15, FDR q < 0.01). Results are shown with the average listener's time course shifted from −20 to 20 s with respect to that of each individual listener in 0.5 s increments. No temporal asymmetry effect can be found. (c) Significantly coupled speaker-listener optode-pairs were in non-homologous areas (n = 15, FDR q < 0.01).

To further investigate the temporal asymmetry of coupling, the average t-statistics for all significantly coupled speaker-listener optode-pairs were assessed across time shifts between the speaker and listener time series, as shown in Fig. 3 (red curve). The peak of the curve is centered at 5 seconds, which shows that, on average, listeners’ time courses lagged behind the speaker’s. In comparison, the time courses of the listeners were synchronized (with each other) at 0 sec (Fig. 3, blue curve).

Figure 3: Delayed synchrony between speaker and listener.

The mean distribution of t-values across significantly coupled optodes for speaker-listener (red) and listener-listener (blue) analyses. Results are shown with the speaker’s (or average listener’s) time course shifted from −20 to 20 sec. with respect to the listeners’ in 0.5 s increments. A larger t-statistic indicates stronger synchrony between the signal time courses. Listener-listener coupling was centered at 0 sec whereas speaker-listener coupling was centered at 5 sec. This suggests that, on average, the speaker preceded, and listeners needed time to process the information conveyed in the stories in order to synchronize with the speaker. Similar results have been found previously12,13.

Listener-listener BOLD coupling

As a verification of the fNIRS approach, we reanalyzed an fMRI dataset of 17 subjects listening to story E2 (the "Pie-man" story), which was recorded and used in a previous study14,27. To compare with the fNIRS results, we considered only voxels from the outer layer of the cortex in the regions neighboring the prefrontal and parietal sites. The coupling results estimated using the multilevel GLM are shown in Fig. 4. Of the 994 investigated voxels, 551 showed significant listener-listener coupling (FDR q < 0.01). This replicates the published result14 and demonstrates a clear convergence between the fNIRS and fMRI methods.

Figure 4: T-map of listener-listener coupling during the comprehension of story E2 (“Pie-man” story) evaluated with fMRI.

The significance threshold was t(16) = 2.9 (FDR q < 0.01). The t-statistics were superimposed on the ch2better template and rendered using MRIcron software37.

BOLD and ΔHbO are correlated during comprehension of the same story

Previous studies have shown that fNIRS and fMRI signals are highly correlated across multiple cognitive tasks31,32,38,39. In our study, two groups of subjects, one measured with fNIRS and the other with fMRI, were engaged in the same task of listening to the E2 story ("Pie-man"). We hypothesized that the BOLD and fNIRS signals share common information even though they were measured from different subjects and in different recording environments. To directly compare the signals across fMRI and fNIRS, we estimated correlations between spatially overlapping voxel-optode pairs while subjects listened to the exact same story. As shown in Fig. 5a, widespread significant correlations between BOLD and ΔHbO were found only when the participants were listening to the same story (i.e. E2). For the control stories T1, T2 (Turkish) and E1 (English), we observed no coupling effect.

Figure 5: Correlation between fNIRS optodes and corresponding fMRI voxels.

994 related voxels were selected according to the optode spatial registration (see Materials and Methods for more details). (a) The number of significant BOLD-ΔHbO and BOLD-ΔHbR correlations (n = 15, FDR q < 0.05); here we compared the fMRI time series from story E2 with the fNIRS time series from stories E1, E2, T1 and T2. When comparing time series corresponding to different stories, we found no significant correlations between BOLD and fNIRS. (b) Brain maps showing voxels correlated with the ΔHbO of at least one optode (red) and voxels with no significant correlations with fNIRS (blue). The images were rendered with MRIcron37. (c) Six examples of voxel-optode pairs: the three most correlated pairs within the prefrontal areas and the three most correlated within the parietal areas. Each colored circle represents one voxel-optode pair. For each pair, the voxel and optode locations are illustrated in I (fMRI voxels) and II (fNIRS optodes), respectively, and the BOLD (blue line) and ΔHbO (red line) time courses during the comprehension of story E2 are compared (duration = 385 s). Time courses were first standardized and averaged across subjects, and the Pearson's correlations (r) between BOLD and ΔHbO were estimated.

Discussion

During social interaction, the brains of individuals become coupled as those individuals send and receive signals (light, sound, etc.) through the environment, analogous to a wireless communication system8. This brain-to-brain coupling relies on the stimulus-to-brain coupling which reflects the brain’s ability to be coupled with the physical world in order to represent it, veridically and dynamically. In this study, we identified the brain-to-brain coupling between a speaker telling an unrehearsed real-life story and a group of listeners listening to the story, with the aid of fNIRS. We have demonstrated for the first time that it is feasible to study neural coupling during natural verbal communication with fNIRS. While there is a growing literature using fMRI and EEG to study brain-to-brain coupling during social interaction40, the application of fNIRS in the field is still rare. Cui et al. in 2012 first adopted fNIRS to investigate neural coupling between pairs of subjects playing a simple cooperation and competition game23. In the game, participants were asked to press a response key after a ‘go’ signal, either in synchrony with (cooperation mode) or faster than (competition mode) their partner. An increase in neural correlation between the members of a pair was found only during cooperation and was associated with better cooperation performance. Holper et al. in 2012 investigated neural coupling between a model and an imitator during a finger tapping task24. A stronger increase in neural coupling was found during the imitation condition compared with a control condition in which the imitator no longer needed to follow the model’s tapping pace. These two studies, however, involved only simple stimuli and contexts.

Our results demonstrated that: (1) the brain activation recorded by fNIRS was coupled between a speaker telling a real-life story and listeners listening to the story (speaker-listener coupling); (2) on average, the listeners’ brain activity lagged behind that of the speaker (5-s delay); (3) the brain activity evoked by the same story was reliable across the listeners (listener-to-listener coupling); and (4) coupling was not present when listeners heard stories in a language incomprehensible to them. These findings are consistent with previous work using fMRI12 and EEG13 and demonstrate that fNIRS can be used to study brain-to-brain coupling during social interaction in a natural communicative context.

During brain-to-brain coupling, activity in areas of prefrontal and parietal cortex previously reported to be involved in sentence comprehension was robustly correlated across subjects, as revealed by the inter-subject correlation analysis34. As these are task-related (active listening) activation periods (not resting, etc.), the correlations reflect modulation of these regions by the time-varying content of the narratives, comprising linguistic, conceptual and affective processing. As expected, and in agreement with previous work12, the activity among listeners is time-locked to the moment of vocalization. In contrast, the activity in the listeners' brains lagged behind the activity in the speaker's brain. Though the sluggish hemodynamic responses measured by fMRI and fNIRS have low temporal resolution and could mask the exact temporal structure of speaker-listener coupling, the 5-s delay we identified in this study is consistent with previous work on the multiple timescales of linguistic processing10. In particular, analyses integrating single-unit, ECoG and fMRI data have uncovered a hierarchy of timescales over which natural communication unfolds in the brain, ranging from hundreds of milliseconds (single words) to a few seconds (sentences) to tens of seconds (paragraphs)10. These findings indicate that each brain area along this hierarchical pathway accumulates and integrates information at a preferred timescale (e.g., early auditory areas integrate phonemes to detect words, while later areas integrate words to form a sentence, etc.) and transmits the chunked information upstream to the next processing level. The relatively long (5-s) lag between speaker and listener seems to match the processing timescale of natural discourse, which unfolds at the sentence level. In agreement with this observation, the speaker-listener neural coupling emerges in areas with relatively long processing timescales.

Another interesting observation is that significant inter-subject correlations were found primarily between prefrontal areas in the speaker and parietal areas in the listeners. This result supports a previous EEG study in which the coupling between speaker and listener was found to be mainly between different channels13. In that study, listeners watched video playback of speakers telling either a fairytale or the plot of their favorite movie or book. A canonical correlation analysis between the EEG of the speakers and listeners showed that coupling was mainly limited to non-homologous channels, although source localization was not applied. To the best of our knowledge, our study presents the first fNIRS-based evidence that the brain-to-brain coupling between speaker and listener is mainly between non-homologous brain areas.

In the current study, we further compared the fMRI and fNIRS signal time courses when two groups of subjects were listening to the same audio recording of a real-life story. The neural activation of one group was recorded with fNIRS. The neural activation of the other group (the fMRI group) was recorded with fMRI in a previous study14. We first analyzed the two datasets separately with inter-subject multilevel GLM and found similar patterns of coupling between listeners in the two modalities. We then investigated the correlation between fMRI and fNIRS signals and found that ΔHbO and BOLD were significantly correlated, despite the fact that they were collected from different subjects in different recording environments, and with different techniques (fMRI vs. fNIRS). Furthermore, the fMRI voxels that were significantly correlated with fNIRS optodes were not randomly distributed but came from brain areas usually considered to be related to listening comprehension1. When the fNIRS and fMRI signals corresponding to different stories were compared for control purposes, no significant correlation was found, as expected. Significant BOLD-ΔHbR correlations were found but to a much lesser extent compared to ΔHbO (Fig. 5a). One possible explanation is the superior reliability of ΔHbO across the listeners compared to ΔHbR as shown in Fig. 1.

For many years, researchers have been interested in comparing fNIRS and fMRI responses during cognitive tasks. Strangman et al. simultaneously recorded fNIRS and fMRI during a finger flexion/extension task30. Although the authors expected the ΔHbR-BOLD correlation to be strongest due to the causal relation between ΔHbR and the BOLD signal, the results suggested stronger BOLD-ΔHbO correlations. The authors suspected this might be due to a higher signal-to-noise ratio (SNR) of ΔHbO in response to the task. Cui et al. simultaneously recorded fNIRS and fMRI from the same group of subjects during four tasks: finger tapping, go/no-go, judgment of line orientation and visuospatial n-back31. They found that both ΔHbR and ΔHbO were correlated with BOLD, despite differences in SNR, and that the type of task did not significantly affect the correlations. ΔHbO-BOLD correlations were found to be slightly but significantly higher than ΔHbR-BOLD correlations. Noah et al. compared fMRI and fNIRS measurements in a naturalistic task in which participants alternately played the video game Dance Dance Revolution and rested in 30-second blocks; the results suggested a high ΔHbO-BOLD correlation within the same measurement areas32. All of the studies above compared mean activity obtained by trigger-averaging a condition over time. However, Ben-Yakov et al. demonstrated the shortcoming of the triggered-averaging method for detecting event-specific responses that are locked to the structure of each particular exemplar26. In our study, each event in the story is unique and singular and cannot be averaged with the responses evoked by other events in the story. By measuring the reliability (correlation) of responses to the story within and between the two methods, our findings provide evidence that the response dynamics to a sequence of events in the story are robust and reliable, and can be detected with both fMRI and fNIRS in real-life complex settings that unfold over several minutes.

The current study, however, is only the first step toward studying brain coupling during natural communication using fNIRS. While fNIRS has obvious disadvantages relative to fMRI, which include coarser spatial resolution and an inability to measure signal beneath the cortical surface, it also has some advantages: (1) lower cost; (2) easier setup; (3) higher temporal resolution; (4) greater ecological validity, which allows face-to-face communication (in fMRI setups subjects cannot see each other); (5) greater ease of connecting two systems simultaneously to test bi-directional dialogue-based communication; and (6) ease of use in real-life clinical communicative contexts. In recent studies, fNIRS has been used in extreme settings such as aerospace applications41,42 and with mobile participants walking outdoors20. In the context of the current study, the true advantages of fNIRS can be exploited when the neural activation of two or more subjects is studied during face-to-face conversations in a natural context such as a classroom.

Despite these promising results, our study was limited in certain respects. First, for the fNIRS-fMRI comparison, the spatial resolution and coverage of our fNIRS system were limited, so fMRI-fNIRS correlations were estimated between all possible voxel-optode pairs within large cortical areas or the whole brain. Second, our fNIRS and fMRI data were not recorded simultaneously and involved different participants, so possible between-subject differences should be taken into account when interpreting the results. Future studies, preferably with concurrent fNIRS and fMRI, could validate our findings and deepen our understanding of the fNIRS-fMRI relationship for complex natural stimuli. Potential future uses of this approach include everyday settings such as classrooms, where we could investigate communication between a teacher and students, or business meetings, where we could investigate communication between a speaker and attendees. However, due to the nature of the hemodynamic response that fNIRS measures, rapid communication may be difficult to analyze at short timescales (e.g., at the word level). Future studies can investigate the temporal and spatial limits and requirements of fNIRS-based speaker-listener coupling.

In summary, our results showed that: (1) the speaker's and listeners' brain activity as measured with fNIRS was coupled only when the listeners understood the story; (2) the listeners' brain activity mirrored the speaker's brain activity with a delay; (3) listening to the same real-life story evoked common brain activation patterns across listeners that were independent of the imaging technology (fNIRS or fMRI) and recording environment (quiet or noisy; sitting or lying down); and (4) fNIRS and fMRI signals were correlated during the comprehension of the same real-life story. These results support fNIRS as a viable tool for studying brain-to-brain coupling during social interaction in real-life and clinical settings such as a classroom, a business meeting, a political rally, or a doctor's office.

Materials and Methods

Participants

Three speakers (one male native English speaker, two male native Turkish speakers) and 15 native English-speaking listeners (8 females) volunteered to participate in the study and were included in the analysis. An additional six subjects participated but were excluded from the analysis due to technical issues during recording or excessive motion artifacts (detected both visually and with an automatic algorithm43) in the large sensor array covering both prefrontal and parietal cortices. Subjects were all right-handed (mean LQ = 74.5, SD = 24.1) based on the Edinburgh Handedness Inventory44 and aged 18–35 years. All subjects had normal or corrected-to-normal vision. Participants did not have any history of neurological or mental disorder and were not taking any medication known to affect alertness or brain activity. None of the listeners understood Turkish. The protocol used in the study was reviewed and approved by the Institutional Review Board (IRB) of Drexel University (DU). The methods were carried out in accordance with the approved guidelines, and participants gave written informed consent approved by the IRB of DU.

Experimental Procedure

All participants were seated comfortably in front of a computer screen throughout the experiment. fNIRS data from the three speakers were recorded while they told an unrehearsed real-life story in their native language (either English or Turkish). Audio of the stories was recorded using a microphone. The resulting English story (E1) and two Turkish stories (T1 and T2) were later played to the listeners. An additional real-life English story, E2 ("Pie-man", recorded at "The Moth", a live storytelling event in New York City), used in several recent fMRI studies of natural verbal communication14,26,27, was also played to the listeners.

fNIRS data were recorded from the listeners throughout the audio playbacks. The playback sequence always began with E2 (English story, Pie-man), and the order of the remaining stories (E1, T1 and T2) was counterbalanced across subjects. Before each story playback, short samples of scrambled audio were played so that subjects could adjust the volume of the headphones they were wearing. Before the start and after the end of each audio story playback, there was a 15-s fixation period for stabilizing the signal. Immediately after each playback, subjects were asked to write a detailed report of the story they had just heard, to verify that they understood the story. Figure 6 shows the timeline of a story session.

Figure 6: Story presentation timeline.

Data Acquisition

Two continuous-wave optical brain imaging devices were used simultaneously on each participant to record brain activity from the prefrontal cortex (PFC) and parietal cortex (PL) at 40 measurement locations (optodes). The prefrontal and parietal regions were selected based on the significant areas found in the previous fMRI-based speaker-listener neural coupling study by Stephens et al.12. Anterior prefrontal cortex was recorded with a 16-optode continuous-wave fNIRS system (fNIR Imager Model 1100; fNIR Devices, LLC) first described by Chance et al.45 and developed in our lab at Drexel University46,47. The sensor was positioned based on anatomical landmarks as described in Ayaz et al.47. Briefly, the center of the sensor was aligned to the midline and the bottom of the sensor touched the participant's eyebrows, so that the center point of the sensor was approximately at Fpz according to the 10–20 international system (see Fig. 7). The sensor has a source-detector distance of 2.5 cm and a sampling rate of 2 Hz. Parietal cortex was recorded using a 24-optode continuous-wave Hitachi fNIRS system (ETG 4000; Hitachi Medical Systems). Two 3 × 3 measurement patches were attached to a cap customized for measurement over the parietal cortex. For each subject, the center of the two patches was placed at Pz, located using a measuring tape. The sensors in each patch measured the fNIRS signal of one hemisphere from 12 channels. The sensor has a source-detector distance of 3 cm and a sampling rate of 10 Hz. Figure 7 shows the complete sensor setup and optode configuration.

Figure 7: fNIRS acquisition setup.

Red circles indicate emitters; blue circles indicate detectors; white squares indicate measurement channels between emitters and detectors.

Time synchronization markers (triggers) were sent from the stimulus presentation computer to both fNIRS acquisition computers to register the audio playback start and end times on both fNIRS devices and to temporally align the recorded data across all subjects.

The approximate projection of the channel locations onto the cortical surface in MNI space was estimated using a virtual spatial registration approach48,49. In this approach, the sensor patches are virtually placed on an ideal scalp and the projected Montreal Neurological Institute (MNI) coordinates on the cortical surface and the standard deviation of displacement were estimated from the magnetic resonance (MR) images of 17 individuals that were obtained from a publicly available dataset50,51. The results are shown in Fig. 8. The optodes covered regions in the frontopolar area, orbitofrontal area, dorsolateral prefrontal cortex, primary somatosensory cortex, somatosensory association cortex, supramarginal gyrus and angular gyrus. Detailed lists of anatomical locations and Brodmann areas corresponding to each optode are included in Table S1 and Table S2, respectively.

Figure 8: Optode locations.

(a) Approximate spatial registration of optode locations to MNI space. (b) Schematic representation of the same optode locations on head surface which is used to show results. The virtual spatial registration approach was adopted to estimate the projection of the optodes on the cortical surface in MNI space48. The center of each green colored patch stands for the most likely position and the radius represents the standard deviation of displacement.

fNIRS Data Preprocessing

fNIRS raw light intensity signals were converted to changes in oxygenated hemoglobin (ΔHbO) and deoxygenated hemoglobin (ΔHbR) concentrations using the modified Beer-Lambert law52. The raw signals and hemoglobin concentration changes were inspected both visually and with the automated SMAR algorithm53, which uses a coefficient-of-variation-based approach to assess signal quality and reject problematic channels with bad contact or saturated raw light intensity. Two optodes, 1 and 15, were over the hairline for most participants and hence were rejected from the study. A total of 11.7% of the optode recordings were rejected by visual inspection and SMAR. Next, the ΔHbO and ΔHbR time series for each optode and participant were band-pass filtered (0.01–0.5 Hz), and the recordings from parietal sites were down-sampled to 2 Hz to match the sampling rate of the prefrontal sites. In the fNIRS literature, various filtering settings have been adopted to reduce physiological artifacts31,54,55,56,57,58; the high cutoff of a band-pass filter, for example, ranges from 0.1 to 1 Hz31,54,55,56. We chose 0.01–0.5 Hz cutoffs to reduce slow signal drift and physiological artifacts from cardiac activity while maximally preserving activity related to listening comprehension. We considered only the period from 15 to 399 seconds, with respect to story start, in the signal time courses of each audio story. The first 15 s were rejected to account for the subjects' initial period of adjustment to the listening comprehension task, and 399 s is the duration of the shortest story. Prior to subsequent analysis, the signal time courses were standardized optode-wise using a Z-score transform.
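As an illustration, a minimal sketch of this preprocessing pipeline is given below. It assumes the raw intensities have already been converted to ΔHbO via the modified Beer-Lambert law; the third-order Butterworth design and all function and variable names are our assumptions, not the original analysis code.

```python
# Minimal preprocessing sketch (assumed implementation, not the authors' code).
# `hbo` is an (optodes x samples) array of delta-HbO sampled at `fs` Hz.
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly

def preprocess(hbo, fs):
    # Band-pass 0.01-0.5 Hz to reduce slow drift and cardiac artifacts
    b, a = butter(3, [0.01, 0.5], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, hbo, axis=1)
    # Down-sample parietal recordings (10 Hz) to the prefrontal rate (2 Hz)
    if fs != 2:
        filtered = resample_poly(filtered, up=2, down=int(fs), axis=1)
        fs = 2
    # Keep only 15-399 s relative to story onset
    seg = filtered[:, int(15 * fs):int(399 * fs)]
    # Z-score each optode's time course
    return (seg - seg.mean(axis=1, keepdims=True)) / seg.std(axis=1, keepdims=True)
```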

fNIRS Analysis

Inter-subject correlation

We first evaluated the reliability of the correlation between listeners’ brain activity using an inter-subject multilevel GLM similar to the one adopted by Stephens et al.12 for each of the four conditions: E1, E2, T1 and T2. We expected neural coupling between listeners to emerge only for the English story conditions E1 and E2, as none of the subjects understood Turkish.

At the individual subject level, a GLM with an AR(1) (first-order autoregressive) error model was estimated using the average time course of the listeners as the independent variable and the time course of an individual listener as the dependent variable, as follows:

$$y_k(t) = \beta_0 + \beta_1\,\bar{x}_k(t) + \epsilon(t), \qquad \epsilon(t) = \rho\,\epsilon(t-1) + \eta(t),$$

where $y_k(t)$ is the time course of a channel from listener $k$, $\bar{x}_k(t)$ is the average time course of the same channel from all the other listeners except listener $k$, $\rho$ is the autocorrelation coefficient and $\eta(t) \sim N(0, \sigma^2)$. The AR(1) error model has been frequently adopted in event-related fNIRS analysis to model auto-correlated noise caused by low-frequency drift and physiological processes such as cardio-respiratory and blood pressure changes57.

At the group level, we tested the hypothesis $\beta_1 > 0$ using a one-tailed one-sample t-test evaluated on the slopes ($\beta_1$) estimated at the individual level. We used the Benjamini–Hochberg procedure33 to control the FDR among 80 statistical tests (2 [ΔHbO/ΔHbR] × 40 [optodes]) with $q < 0.01$.
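A compact sketch of this two-level procedure is shown below; the use of statsmodels' GLSAR (iterative Cochrane-Orcutt fitting) for the AR(1) error model and all variable names are our assumptions rather than the authors' implementation.

```python
# Two-level inter-subject GLM sketch (assumed implementation).
# `data` is (listeners x optodes x samples), z-scored per optode.
import numpy as np
import statsmodels.api as sm
from scipy.stats import ttest_1samp
from statsmodels.stats.multitest import multipletests

def intersubject_glm(data, q=0.01):
    n_sub, n_opt, _ = data.shape
    slopes = np.zeros((n_sub, n_opt))
    for k in range(n_sub):
        for j in range(n_opt):
            y = data[k, j]                               # listener k, optode j
            x = data[np.arange(n_sub) != k, j].mean(0)   # average of the others
            model = sm.GLSAR(y, sm.add_constant(x), rho=1)  # AR(1) errors
            slopes[k, j] = model.iterative_fit(maxiter=10).params[1]
    # Group level: one-tailed one-sample t-test on the slopes per optode
    t, p_two = ttest_1samp(slopes, 0.0, axis=0)
    p_one = np.where(t > 0, p_two / 2, 1 - p_two / 2)
    # Benjamini-Hochberg FDR across the family of tests
    reject, _, _, _ = multipletests(p_one, alpha=q, method="fdr_bh")
    return t, reject
```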

Speaker-listener coupling

We evaluated the coupling between optode $i$ of the speaker and optode $j$ of the listeners for all permutations of $(i, j)$, $i, j = 1, \ldots, 40$. The multilevel GLM was applied as for listener-listener coupling, except that the time course of optode $i$ of the speaker was used as the independent variable $x(t)$ and the time course of optode $j$ of each individual listener was used as the dependent variable $y_k(t)$.

Temporal asymmetry of coupling

The speaker-listener coupling analysis was repeated with the speaker's time course shifted with respect to those of the listeners from −20 s to 20 s in 0.5 s increments. At each time shift, the FDR was controlled among 3200 statistical tests (2 [ΔHbO/ΔHbR] × (40 × 40) [optode pairs]) with $q < 0.01$. As a comparison, listener-listener coupling was re-evaluated by shifting the average listener time series with respect to that of each individual listener.
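The shifting scheme can be illustrated as follows (not the authors' code): one speaker time course is slid against one listener time course, and plain Pearson correlation stands in here for the full multilevel GLM evaluated at each shift.

```python
# Time-shifted coupling sketch (assumed implementation).
# A positive lag delays the speaker's time course, so a high value at a
# positive lag means the speaker's activity precedes the listener's.
import numpy as np

def shifted_coupling(speaker, listener, fs=2.0, max_shift_s=20.0, step_s=0.5):
    lags = np.arange(-int(max_shift_s * fs), int(max_shift_s * fs) + 1,
                     int(step_s * fs))
    vals = []
    for lag in lags:
        if lag >= 0:   # pair listener(t) with speaker(t - lag)
            s, y = speaker[:len(speaker) - lag], listener[lag:]
        else:          # pair listener(t) with speaker(t + |lag|)
            s, y = speaker[-lag:], listener[:len(listener) + lag]
        vals.append(np.corrcoef(s, y)[0, 1])
    return lags / fs, np.array(vals)   # shifts in seconds, coupling per shift
```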

fMRI Analysis

The fMRI dataset included 17 subjects listening to story E2 (the "Pie-man" story), recorded and used in a previous study14. To compare with the fNIRS results, we considered only voxels from the outer layer of the cortex in the regions neighboring the prefrontal and parietal sites shown in Fig. 8. The "neighboring regions" are defined as voxels within a radius of 2.7 standard deviations from the center of each optode projection. A total of 994 voxels were chosen in this manner. The voxel time courses were high-pass filtered at 0.01 Hz (for comparison with the fNIRS signals) and trimmed to include only 15–399 seconds (with respect to story start), and the inter-subject multilevel GLM described above ("Inter-subject correlation") was employed for the analysis. The FDR was controlled among the 994 voxels with $q < 0.01$.
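Our reading of this voxel-selection rule can be sketched as follows; the array names are hypothetical, and the MNI coordinates and displacement SDs would come from the virtual registration output.

```python
# Voxel selection sketch: keep cortical voxels within 2.7 SDs of displacement
# from any optode's projected center (assumed implementation).
import numpy as np

def select_voxels(voxel_xyz, optode_xyz, optode_sd):
    """voxel_xyz: (V, 3) mm; optode_xyz: (O, 3) mm; optode_sd: (O,) mm."""
    keep = np.zeros(len(voxel_xyz), dtype=bool)
    for center, sd in zip(optode_xyz, optode_sd):
        dist = np.linalg.norm(voxel_xyz - center, axis=1)
        keep |= dist <= 2.7 * sd
    return np.where(keep)[0]   # indices of the retained voxels
```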

fMRI-fNIRS Correlation

To estimate the correlation between BOLD and fNIRS signals, both signals were high-pass filtered at 0.01 Hz. The BOLD time courses were z-score normalized for each voxel and then averaged across the 17 subjects. The fNIRS time courses were down-sampled to match the fMRI sampling rate. The inter-subject multilevel GLM approach was then applied with the time course of optode $i$ of an fNIRS subject as the independent variable and the time course of voxel $j$ averaged across the fMRI subjects as the dependent variable.

The aforementioned procedure was applied to estimate all possible correlations between optodes and their corresponding voxels in the left prefrontal (8 optodes × 131 voxels), right prefrontal (8 optodes × 144 voxels), left parietal (12 optodes × 376 voxels) and right parietal (12 optodes × 343 voxels) areas. The voxels were selected as described in the fMRI Analysis section above. To correct for multiple comparisons, the FDR was controlled with a threshold of $q < 0.05$.
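The comparison step for a single voxel-optode pair might look like the sketch below; the fMRI sampling rate is left as a parameter because it is not stated here, plain correlation stands in for the multilevel GLM, and the function names are assumptions.

```python
# Voxel-optode comparison sketch (assumed implementation).
import numpy as np
from scipy.signal import butter, filtfilt, resample

def fmri_fnirs_corr(hbo_2hz, bold_mean, fmri_fs):
    # High-pass both signals at 0.01 Hz before comparison
    b1, a1 = butter(3, 0.01, btype="highpass", fs=2.0)
    hbo = filtfilt(b1, a1, hbo_2hz)
    b2, a2 = butter(3, 0.01, btype="highpass", fs=fmri_fs)
    bold = filtfilt(b2, a2, bold_mean)
    # Down-sample the 2 Hz fNIRS series to the fMRI sampling rate
    hbo_ds = resample(hbo, len(bold))
    return np.corrcoef(hbo_ds, bold)[0, 1]
```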

Additional Information

How to cite this article: Liu, Y. et al. Measuring speaker-listener neural coupling with functional near infrared spectroscopy. Sci. Rep. 7, 43293; doi: 10.1038/srep43293 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.