Introduction

Regardless of age, gender, or native language, healthy individuals use filler phrases, also known as filled pauses, during spontaneous speech1. Frequent utterance of fillers is tightly associated with increased effort to recall or search for a relevant word2, increased anxiety3, and divided attention4. Fillers are uttered more frequently during verbal communication by disfluent non-native speakers than by native speakers, and by dysphasic patients than by non-dysphasic ones5,6. Practice and preparation are effective methods to reduce the rate of filler utterance during interviews or presentations because the word recall process becomes more automatic and less effortful7.

What happens in the cerebral cortex when one utters a filler? Only a small number of studies have attempted to determine the neural correlates of filler utterances. Effective study design is a consistent challenge in the field due to the unpredictable timing of naturally occurring filler phrases or pauses. In a study of six healthy adults using functional MRI (fMRI)8, participants were instructed to speak whatever came to mind when viewing Rorschach inkblot plates. The authors reported that trials accompanied by overt filler pauses, compared to those accompanied by completely silent pauses, were associated with increased hemodynamic activation in the left superior temporal gyrus. Another fMRI study characterized the spatial pattern of hemodynamic activation while participants listened to others’ speech that included fillers; it therefore addressed the neural correlates of listening to, rather than uttering, fillers9.

Measurement of event-related high gamma activity on electrocorticography (ECoG), a presurgical evaluation method for patients with drug-resistant epilepsy10, provides a unique opportunity to quantify the rapid dynamics of human perception and cognition without increasing the risk of surgical complications11. Task-related high gamma activity at 70–110 Hz is a summary measure of local cortical engagement with a temporal resolution of tens of milliseconds12. Augmentation of high gamma amplitude has been reported to be tightly associated with an increase in firing rate13, hemodynamic activation14, glucose metabolism15, and the probability of stimulation-induced functional impairment16. Conversely, attenuation of high gamma amplitude is associated with a reduced firing rate and hemodynamic deactivation17. Because of its outstanding signal fidelity, ECoG recording is suggested to be capable of accurately measuring the spatiotemporal dynamics of event-related neural modulations at a single-trial level in an individual patient18,19.

In the present study, during extraoperative ECoG recording, each participant was instructed to overtly explain the content of a given image with a sentence including the subject (e.g., A baby), verb (plays with a dog), location (at the beach), and time (during the day; Fig. 1). Due to the challenging nature of this task, all participants intermittently used fillers during sentence production. Although we did not originally employ this task to study the neural correlates of filler utterances versus ordinary phrases, it provided a rare opportunity to determine the spatiotemporal characteristics of high gamma augmentation during this ubiquitous, yet unpredictable, human behavior.

Figure 1

Sentence production task. (A) Timeline of the task. (B) Event classification. Each patient was instructed to look at and overtly explain a visual scene in a sentence, including the subject, verb, location, and time in any order. At the end of each response, the examiner pressed a button to present the next photograph following the presentation of a fixation cross in the center of the screen for 2 or 2.5 s. All phrases were classified as either filler or non-filler.

Given that filler phrases are associated with recall effort and word search2, we hypothesized that the spontaneous utterance of fillers, compared to that of non-filler words, would be associated with greater high gamma augmentation across large-scale networks of the association cortex. This hypothesis was further motivated by previous imaging studies of healthy children and adults, which reported that these regions were activated, to the largest extent, during tasks requiring the selection of optimal words among competing alternatives20,21,22,23,24,25. Furthermore, studies of patients with stroke and primary progressive aphasia suggest an association between more severe damage to the association cortex of the left hemisphere and increased rate of filler utterance due to the loss of word retrieval ability3,26,27,28,29.

Methods

Participants

We studied three native English-speaking patients (Table 1; age: 15, 16, and 17 years; 1 female), who underwent the sentence production task (Fig. 1A) during extraoperative subdural ECoG recording at Children’s Hospital of Michigan, Detroit, USA. None of these patients had massive brain malformations observable on an MRI or severe cognitive impairment defined by a verbal IQ of < 70. This study, approved by the Institutional Review Board at Wayne State University, was performed in accordance with the approved guidelines. Informed consent and assent were obtained from the guardians of patients and patients, respectively.

Table 1 Patient profile.

Acquisition of ECoG and three-dimensional magnetic resonance surface images

ECoG and MRI data acquisition methods were described elsewhere30. Platinum disk electrodes (10 mm center-to-center distance; 3 mm exposed diameter) were placed in the subdural space of the hemisphere estimated to contain the epileptogenic zone, based on collective evidence from the noninvasive presurgical evaluation31. ECoG signals were continuously acquired at a sampling rate of 1,000 Hz using the Nihon Kohden Neurofax 1100A Digital System (Nihon Kohden America Inc., Foothill Ranch, CA, USA). Channels classified as seizure onset zone, those generating interictal spike discharges, and those showing artifacts during the task were excluded from further analysis. This is a common procedure across ECoG studies of event-related high gamma activity and is expected to improve the generalizability of the findings12,32,33,34. The number of nonepileptic channels included in the analysis ranged from 100 to 128 per patient. We created a three-dimensional surface image for each patient with electrode sites defined directly on the pial surface30. FreeSurfer scripts (https://surfer.nmr.mgh.harvard.edu) were used to parcellate the cortical gyri of each individual surface image in order to determine the anatomical label of each electrode location35,36 (Fig. 2). In all three patients, electrode coverage involved the lateral frontal, parietal, and temporal regions.

Figure 2

Location of subdural electrodes included in the analysis. (A) Patient 1. (B) Patient 2. (C) Patient 3. The pink line delineates the central sulcus.

Sentence production task

At the bedside during extraoperative ECoG recording, participants were instructed to freely explain, in a sentence, the content of a visual scene. Each scene was a photograph sampled from the International Affective Picture System37. Each photograph was 9 × 12 cm and presented at the center of a 19 inch LCD monitor placed 60 cm in front of the patient. Participants were instructed to include the following domains in the sentence in any order: ‘subject (e.g., A hippo)’, ‘verb (is bathing)’, ‘location (in the water)’, and ‘time (in summer)’ (Fig. 1A). Each participant was instructed to say, ‘I don’t know’, in case she/he failed to understand the content of a given scene. Each trial began with a 2.0 or 2.5 s fixation cross followed by the presentation of the photograph. The scene was presented until the patient completed their response, at which point the examiner manually started the next trial. Overt verbal responses were recorded using a WS-823 digital voice recorder (Olympus America Inc, Hauppauge, NY, USA) and synchronized with ECoG signals via a DC input to the ECoG amplifier10. The timing of the picture stimulus presentation was likewise synchronized using a photosensor attached to the corner of the LCD monitor and the ECoG amplifier via DC input.

Event classification and marking

The onset and offset of filler and ordinary phrases were identified and marked using recorded vocal sounds synchronized to the ECoG signal (Fig. 1B). Fillers were defined as an extraneous word or set of words (e.g., “uh”, “um”, “y’know”, or “well”1,38). We used Cool Edit Pro version 2 (Syntrillium Software Corp., Phoenix, AZ, USA) to aid in the manual marking of each phrase of interest39.

Time–frequency analysis

We determined the dynamics of high gamma modulations during filler and non-filler utterances using a method similar to what we have previously reported30. Briefly, we applied a complex demodulation method to transform ECoG signals from the time–voltage into time–frequency domain in steps of 5 Hz and 10 ms40,41. For each ECoG channel, we quantified the mean percentage change of high gamma amplitude within 70–110 Hz in 10 ms bins relative to a 400-ms reference period at 600–200 ms prior to the presentation of the photograph stimulus (Fig. 1). High gamma amplitude, time-locked to utterance onset and offset, was plotted as a function of time (Figs. 4, S1).
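The baseline normalization described above can be illustrated with a minimal sketch. It assumes the band-limited (70–110 Hz) amplitude has already been extracted into 10 ms bins; the function name `percent_change_from_baseline`, the half-open treatment of the reference window, and the synthetic trace are illustrative assumptions and are not taken from the original analysis pipeline.

```python
from statistics import mean


def percent_change_from_baseline(amplitude, times_ms,
                                 ref_start_ms=-600, ref_end_ms=-200):
    """Express amplitude as percent change relative to a reference period.

    amplitude : high gamma amplitude, one value per 10 ms bin
    times_ms  : start time of each bin relative to photograph onset (ms)
    The reference window is treated as half-open [ref_start_ms, ref_end_ms),
    which is an assumption made for this sketch.
    """
    baseline = [a for a, t in zip(amplitude, times_ms)
                if ref_start_ms <= t < ref_end_ms]
    if not baseline:
        raise ValueError("no bins fall inside the reference period")
    ref = mean(baseline)
    if ref == 0:
        raise ValueError("reference amplitude is zero")
    return [100.0 * (a - ref) / ref for a in amplitude]


# Synthetic trace (hypothetical values): amplitude 2.0 before stimulus
# onset and 3.0 afterwards, sampled in 10 ms bins from -600 to 390 ms.
times_ms = list(range(-600, 400, 10))
amplitude = [2.0 if t < 0 else 3.0 for t in times_ms]
pct = percent_change_from_baseline(amplitude, times_ms)
```

In this synthetic case the reference period spans 40 bins (400 ms), so values before onset map to 0% and values after onset map to a 50% increase.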

Statistical analysis to determine the effect of filler utterances on high gamma activity

To determine whether fillers accounted for the variance in utterance-related high gamma modulations, we employed a mixed model analysis at each electrode site of a given patient (SPSS Statistics 25, IBM Corp., Chicago, IL, USA). The dependent variable was the percent change of high gamma activity during a 300 ms utterance period. The following variables were treated as fixed effects: (1) ‘filler utterance’ (1 if an uttered phrase was a filler and 0 if a non-filler), (2) ‘onset/offset of phrase’ (1 during the 300 ms period immediately after utterance onset and 0 during the 300 ms period immediately before utterance offset), (3) trial number, and (4) phrase duration (ms). This analysis was designed to determine whether a filler was associated with increased neural activation independently of the three co-variables mentioned above. The intercept was treated as a random effect. The statistical significance threshold was set at p = 0.05. Cortical regions with preferential activation during fillers were identified as those with high gamma effects exceeding two standard deviations above or below the mean across all electrodes for the patient (Fig. 3).
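The two-standard-deviation criterion in the final step can be sketched as follows. This is only an illustration: it takes one ‘filler utterance’ t-score per electrode as given (the mixed model itself is not reproduced), uses the sample standard deviation as an assumption because the text does not specify which estimator was used, and the electrode labels and t-scores are hypothetical.

```python
from statistics import mean, stdev


def filler_preferential_sites(t_scores, n_sd=2.0):
    """Split electrodes by whether their filler-effect t-score lies more
    than n_sd standard deviations above or below the across-electrode mean.

    t_scores : dict mapping electrode label -> t-score of the filler effect
    Returns (augmentation_sites, attenuation_sites).
    """
    values = list(t_scores.values())
    mu = mean(values)
    sd = stdev(values)  # sample SD; an assumption for this sketch
    upper = mu + n_sd * sd
    lower = mu - n_sd * sd
    augmentation = [ch for ch, t in t_scores.items() if t > upper]
    attenuation = [ch for ch, t in t_scores.items() if t < lower]
    return augmentation, attenuation


# Hypothetical t-scores for 20 electrodes of one patient.
t_scores = {f"E{i}": (0.5 if i % 2 else -0.5) for i in range(1, 19)}
t_scores["E19"] = 6.0   # strong filler-preferential augmentation
t_scores["E20"] = -6.0  # strong filler-preferential attenuation

augmentation, attenuation = filler_preferential_sites(t_scores)
```

Because the threshold is defined relative to each patient's own distribution of t-scores, the number of sites flagged depends on how dispersed the filler effects are across that patient's electrodes.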

Figure 3

Spatial characteristics of filler-preferential high gamma augmentation and attenuation. (A) Patient 1. (B) Patient 2. (C) Patient 3. Circles mark all electrode sites that showed significant filler-preferential high gamma augmentation (red) or attenuation (blue) based on the mixed model analysis. Filler-preferential electrodes were defined as those whose ‘filler utterance’ effect on high gamma activity (t-score) was more than two standard deviations above or below the mean across all electrodes in a given patient. The pink line delineates the central sulcus.

Results

Table 2 summarizes the behavioral data of each patient, including the number and duration of filler and non-filler utterances. The duration of filler utterances was shorter than that of ordinary phrases (Table 2). Figure 3 presents the locations of electrode sites at which the filler effect was above or below two standard deviations from the mean across all electrode sites for a given patient. Ten sites in the association cortex and one in the left lingual gyrus (i.e., visual cortex) showed filler-preferential high gamma augmentation. Figure 4 shows the temporal dynamics of utterance-related high gamma activity at sites showing filler-preferential high gamma augmentation. Blue circles (N = 2) in Fig. 3 indicate the locations of sites showing filler-preferential high gamma attenuation. Table 3 summarizes the mixed model coefficients, t-scores, and confidence intervals of the filler effects at the 13 sites mentioned above. Online Supplementary Figure S1 shows utterance-related high gamma augmentation at a face sensorimotor cortical site, which occurred during both filler and non-filler utterances.

Table 2 Behavioral data.
Figure 4

Temporal dynamics of utterance-related high gamma augmentation. The temporal dynamics of high gamma amplitude (% change) in (A) Patient 1, (B) Patient 2, and (C) Patient 3. The mixed model analysis showed significant filler-preferential high gamma augmentation in these electrode sites.

Table 3 Results of mixed model analysis to assess the filler effect on high gamma activity.

Discussion

Significance of filler-preferential high gamma augmentation

The present study indicated that the utterance of fillers, compared to that of ordinary phrases, was associated with greater high gamma augmentation primarily in the association cortex. A plausible explanation for our ECoG observation is that filler utterances are more likely to occur while large-scale networks across the association cortex remain engaged in cognitive processing prior to motor responses (i.e., verbal articulation). This hypothesis is consistent with the generally accepted notion that filler utterances are a behavioral marker of increased effort to recall, search, or select a relevant word2. The involvement of large-scale association networks reflects the complexity of the sentence production task. Observing and fully describing a pictured scene involves integrating perceptual, working memory, motor, and cognitive functions, including at least the semantic processing of the perceived image as well as lexical and phonological access in a sentence context42,43. The sentence production task requires extensive analysis of the visual scene involving multiple domains and a long utterance response. Collective evidence indicates that semantic, lexical, and phonological processes are exerted by large-scale networks in the temporal, parietal, and frontal lobe association cortex with left-hemispheric dominance33,44,45. Linking each part of the description into a single sentence also requires substantial verbal working memory activation, which may further involve the association cortex of either hemisphere46,47,48. In contrast, overt production of non-filler phrases was previously reported to maximize the degree of neural activation in the primary sensorimotor cortex following the subsidence of neural activation in the left inferior frontal gyrus32,44.

One cannot rule out the possibility that our patients spontaneously used fillers as a method to communicate their intention49,50. In other words, one may subconsciously use fillers to signal that she/he still intends to speak further or needs time to collect thoughts. A behavioral study previously reported that the audience rated speakers using filler pauses higher in presentation skills than those using completely silent pauses51. A previous fMRI study of 16 healthy adults investigated the effect of listening to speech including fillers9; participants were instructed to listen carefully to auditory sentences delivered via headphones. This fMRI study reported that speech including fillers, compared to fluent speech, elicited greater degrees of hemodynamic activation in the superior temporal gyri as well as medial frontal regions.

The present study did not provide causal evidence that filler utterance facilitated the cognitive process. Our observation of filler-preferential neuronal activation in the association cortex does not indicate that frequent usage of fillers improves the verbal response.

Since all patients were adolescents, we cannot rule out the possibility that the reported neuronal dynamics could be specific to this phase of development.

Methodological considerations

The small sample size is a major limitation of the present study. Thus, one should consider this research as a hypothesis-generating study rather than as a definitive investigation. However, because the signal fidelity of ECoG is more than 100 times better than that of scalp EEG19, a number of studies suggest that one can evaluate task-related high gamma modulations on a per-trial basis12,18,52. Each patient uttered filler phrases only three to 21 times but uttered more than 300 ordinary phrases during the task (Table 2). Such small numbers of filler utterances limited the statistical power of the mixed model analysis. Only seven of the 11 sites showing a positive filler effect on high gamma activity would survive FDR correction for approximately 100 subdural electrode channels per patient (Table 3). Correction for multiple tests decreases the risk of Type I error but increases the risk of Type II error; given the exploratory nature of this analysis, we opted not to apply the FDR correction. Further studies using a larger number of patients and trials are necessary to validate or disprove the hypothesis generated in the present study. For example, analysis of ECoG signals during task-free communications may increase the chance of securing sufficient statistical power53.
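The trade-off between uncorrected and FDR-corrected thresholds can be illustrated with a short sketch. The text does not state which FDR procedure was considered; the Benjamini-Hochberg step-up procedure is assumed here purely for illustration, and the p-values below are hypothetical rather than taken from Table 3.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests remain significant
    under the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the tests sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that largest rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five electrode sites.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
uncorrected = [p < 0.05 for p in p_values]
corrected = benjamini_hochberg(p_values, q=0.05)
```

With these hypothetical values, four sites pass the uncorrected threshold of p = 0.05, whereas only the two smallest p-values survive the correction, which mirrors the loss of sensitivity discussed above.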

In the present study, we computed the percentage change of high gamma amplitude relative to that during a reference period. This analytic approach was based on the assumption that the patient was resting during the reference period between trials.