Introduction

Time is a basic abstract concept that humans use to precisely process temporal information. Everyone uses temporal information in his or her daily life without explicitly thinking about it. However, the nature of time itself may be best explained by Augustine, who said, ‘what then is time? If no one asks me, I know; if I wish to explain it to one that asketh, I know not’1. A critical element of temporal information processing is working memory (WM), which involves short-term storage and the online manipulation of information2. WM can maintain and manipulate many types of information, such as digits, letters, words, locations, images, etc.

Few studies have investigated the maintenance or manipulation of temporal information in WM3 or the neural representation of duration in WM for different sensory modalities4,5,6,7,8. Behavioural studies have revealed that two memory subsystems, the visuospatial sketchpad and the phonological loop, maintain visual duration and auditory duration, respectively7, 9, 10. However, several limitations prevent behavioural studies from examining the internal representation of duration in WM. In contrast, neurophysiological studies have indicated that neural oscillations are critically involved in the neuronal dynamics required for sustaining WM representations11,12,13. Compared with other frequency bands, the theta and alpha bands are routinely examined in neural oscillation studies of WM. Previous studies have reported correlations between WM load and theta oscillations (4–8 Hz) over prefrontal regions or alpha oscillations (8–12 Hz) over posterior sites14,15,16,17,18,19,20. Some studies have suggested that alpha oscillations are involved in inhibiting the cortical areas representing information that is no longer relevant18, 21, 22. In contrast, others have suggested that alpha oscillations are associated with the successful maintenance of item information3, 17, 23. However, more research is needed to specifically investigate the neural representation of temporal WM.

Recently, using electroencephalogram (EEG) recordings to examine the neural oscillatory correlates of visual duration in WM, Chen and colleagues3 found that alpha activity rather than theta activity was involved in the maintenance of visual duration in WM. Additionally, alpha activity differed between WM maintenance of durations below and above 3 s, which provides electrophysiological evidence for the perspective of segmented duration representation24, 25. They argued that the neural representations of durations of different lengths are distinct and that a critical threshold therefore exists. Durations below the critical threshold are recognised as the ‘subjective present’ and typically result in the perception of a coherent experience or temporal gestalt25. In contrast, durations above the threshold do not result in the perception of a single unit because of disintegration24. There is some evidence to support this claim. For example, Elbert et al.26 employed event-related potentials (ERPs) and found a critical threshold (3 s) for the accuracy of visual duration reproduction. Additionally, a slow negative wave named the contingent negative variation (CNV) accompanied the reproduction of durations within the threshold, whereas the CNV was reduced or even absent when durations were beyond the threshold. Moreover, a functional magnetic resonance imaging study showed that the motor system and the default mode network process durations below and above 2 s, respectively27. Further indications for the perspective of segmented duration representation are derived from the perception of rhythmic coherence (e.g. rhythm can no longer be perceived if the tones in a regular sequence are separated by intervals exceeding 3 s28, 29) and sensorimotor synchronisation (e.g. the synchronisation between a regular sequence of beats and the corresponding motor acts breaks down when the interstimulus interval between the beats exceeds 2–3 s30).

However, the neural representation of auditory duration remains unknown and is therefore the focus of our study. An important characteristic of duration perception is that subjectively perceived durations vary between sensory modalities (e.g. auditory or visual). Many studies have shown that auditory signals are often judged as longer than visual signals of the same duration31,32,33,34,35. For example, Wearden et al.34 used a duration bisection task and verbal estimation and found that auditory signals appeared longer than visual signals in all cases, and that the effect was greater at longer stimulus durations. They suggested that auditory signals drive an internal clock at a faster rate than visual signals32, 33. A similar auditory/visual difference in duration judgment was also observed in 5- and 8-year-old children as well as young adults35, although this modality difference was more pronounced in children. The authors of that study proposed that auditory signals are processed more automatically than visual signals in all age groups, whereas visual signals require more attentional resources; children, whose attentional abilities are limited, therefore find visual signals more difficult to process than adults do. Therefore, considering that auditory signals are often judged to be longer than visual ones31,32,33,34,35, we hypothesised that the neural representation of auditory duration in WM is also segmented, but the critical threshold may be shorter than that for visual duration.

Here, we applied a matching-to-sample task (Fig. 1), following the study of Chen et al.3, to investigate the theta and alpha oscillation correlates of auditory duration maintenance in WM. An advantage of this paradigm is that it separates the encoding, maintenance, and decision stages of temporal information processing36, 37. Subjects were required to maintain one duration (1 s, 2 s, 3 s, or 4 s) in WM. As for the theta band, we hypothesised that the auditory duration length would not be associated with changes in the theta band amplitude, because the theta band mainly reflects the maintenance of temporal order information23, 38. As mentioned earlier, we hypothesised that the neural representation of auditory duration would be segmented. Thus, regarding the controversial role of the alpha oscillation in WM, we inferred that if the alpha oscillation reflects different internal representations of auditory durations below and above the critical threshold point, then the alpha band reflects the successful maintenance of item information3, 17, 23. If there is no significant alpha band difference between durations below and above the critical threshold point, then the alpha band reflects inhibition of no-longer-relevant information18, 21, 22.

Figure 1

Trial sequences and durations of each screen presentation. Auditory stimuli were randomly presented for 1, 2, 3, or 4 s. Each trial starts with a pure tone (sample), followed by a 3-s interval (the delay/maintenance phase). Next, a second pure tone (probe) is presented. Participants press ‘1’, ‘2’, or ‘3’ correspondingly after estimating the duration of the second pure tone (probe) as shorter, equal to, or longer than the first tone (sample).

Results

Behavioural data

A repeated measures analysis of variance (ANOVA) was performed separately on accuracy and reaction time, with duration (1 s, 2 s, 3 s, and 4 s) as a within-participant factor. There was a significant main effect of duration [F(2.071,49.698) = 61.205, p < 0.001, ηp² = 0.718] on accuracy. Results of post hoc tests showed that the accuracy of the 1-s condition (mean ± standard error: 0.888 ± 0.017) was significantly higher than that of the 2-s (0.840 ± 0.019), 3-s (0.729 ± 0.023), and 4-s (0.670 ± 0.028) conditions (t(24) = 3.961–9.216, p values < 0.01); the accuracy of the 2-s condition was significantly higher than that of the 3-s and 4-s conditions (t(24) = 7.232–8.881, p values < 0.001); and the difference between the 3-s and 4-s conditions was also significant (t(24) = 3.596, p < 0.01).

There was also a significant effect of duration on reaction time [F(3,72) = 3.176, p < 0.05, ηp² = 0.117]. Results of the post hoc tests demonstrated that the difference in reaction time between the 1-s (449.885 ± 22.788) and 2-s (457.184 ± 25.217) conditions was not significant (t(24) = −1.090, p > 0.05), whereas there was a marginally significant difference between the 1-s and 3-s (467.358 ± 27.116) conditions (t(24) = −2.042, p < 0.06). In addition, the reaction time of the 1-s condition was significantly shorter than that of the 4-s (469.806 ± 27.463) condition. Differences among the 2-s, 3-s, and 4-s conditions were not significant (t(24) = −1.756–0.413, p values > 0.05).

EEG data

Figure 2 shows the dynamic activities of theta and alpha powers during the encoding (sample), delay, and probe phases. Figure 3 shows the average theta and alpha band powers during the time course of the delay phase. A repeated-measures ANOVA on the theta band power in the interval from 0.5 s to 0.1 s before the onset of delay found no significant effect of duration and no significant duration × region interaction (p values > 0.05). An ANOVA conducted on the theta band power in the interval from 1 s to 3 s after the onset of delay found a significant main effect of region [F(2.207,52.959) = 3.524, p < 0.05, ηp² = 0.128]. The regions with the highest theta band power during WM maintenance were the right-posterior cluster (0.148 ± 0.098 dB), middle-posterior cluster (0.108 ± 0.100 dB), middle-central cluster (0.076 ± 0.119 dB), and left-posterior cluster (0.061 ± 0.109 dB). There was no significant effect of duration [F(3,72) = 1.321, p > 0.05, ηp² = 0.052] and no significant duration × region interaction [F(8.316,199.582) = 1.475, p > 0.05, ηp² = 0.058] (Fig. 4).

Figure 2

Theta and alpha effects for the whole epoch in the posterior parietal in the 1-s (A), 2-s (B), 3-s (C), and 4-s (D) auditory duration conditions. The decibel-transformed value is relative to the baseline interval (−0.4 s to −0.1 s) before the sample stimulus. The theta band is enhanced at the onset and offset of stimuli, whereas the alpha band is enhanced during encoding, especially in the delay phase.

Figure 3

Average theta (A) and alpha (B) band powers during the 1-s, 2-s, 3-s, and 4-s auditory duration maintenance in WM. Steady theta (4–8 Hz) and alpha (8–12 Hz) band activities are elicited during WM maintenance after averaging across frequency, and they are separately plotted for each of the nine analysed electrode clusters. Zero on the x-axis represents the onset of the delay phase. Red, green, blue, and black curves denote the 1-s, 2-s, 3-s, and 4-s auditory duration conditions, respectively.

Figure 4

Grand average theta and alpha band powers during auditory duration maintenance in WM in the 1-s, 2-s, 3-s, and 4-s conditions. Grand average theta and alpha band powers during the delay phase are computed for each duration condition, and they are separately plotted for each of the nine analysed electrode clusters. Red and green lines denote the alpha and theta band powers in each duration condition, respectively. Error bars represent the standard error of the mean across observers.

A repeated-measures ANOVA conducted on the alpha band power in the interval from 0.5 s to 0.1 s before the onset of delay demonstrated no significant effect of duration and no significant duration × region interaction (p values > 0.05). An ANOVA conducted on the alpha band power in the interval from 1 s to 3 s after the onset of delay found a significant main effect of duration [F(3,72) = 3.011, p < 0.05, ηp² = 0.111]. Results of post hoc tests showed that the difference in alpha power between the 1-s (0.426 ± 0.170 dB) and 2-s (0.333 ± 0.153 dB) conditions was not significant (t(24) = 0.643, p > 0.05). Moreover, the alpha power for these duration conditions was higher (or marginally higher) than that of the 3-s (0.084 ± 0.185 dB) and 4-s (0.053 ± 0.153 dB) conditions (t(24) = 1.899–2.459, p values < 0.07), and the difference between the 3-s and 4-s conditions was not significant (t(24) = 0.203, p > 0.05). The main effect of region was significant [F(3.751,90.031) = 3.807, p < 0.01, ηp² = 0.137], and the regions with the highest alpha band power during WM maintenance were the middle-posterior cluster (0.429 ± 0.183 dB), left-posterior cluster (0.346 ± 0.157 dB), right-posterior cluster (0.321 ± 0.170 dB), and middle-central cluster (0.304 ± 0.175 dB). The duration × region interaction [F(8.655,207.714) = 1.521, p > 0.05, ηp² = 0.060] was not significant (Fig. 4).

Figure 5 shows that the regions with the highest theta and alpha band powers during WM maintenance were in the posterior parietal area (i.e. the left-posterior cluster, middle-posterior cluster, right-posterior cluster, and middle-central cluster). To further examine the role of the alpha band power in auditory duration maintenance in WM, a correlation analysis was performed for every duration condition to determine the relationship between accuracy and the amplitude of the alpha power at the posterior parietal cortex (i.e. 20 channels in total) during the delay phase. As shown in Fig. 6, the amplitude of the alpha power was positively correlated with accuracy in the 2-s condition (Pearson r = 0.569, p < 0.01). In contrast, the correlations for the 1-s (r = 0.155, p > 0.05), 3-s (r = 0.094, p > 0.05) and 4-s conditions (r = −0.027, p > 0.05) were not significant.
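The correlation analysis described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `pearson_r` and `t_from_r` are hypothetical, and the sample size of 25 is taken from the Participants section. The second function converts a correlation coefficient into the t statistic that is usually compared against a t distribution with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Reported correlation for the 2-s condition, with n = 25 participants.
t_2s = t_from_r(0.569, 25)
```

Under these assumptions, `t_2s` is roughly 3.32 with 23 degrees of freedom.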

Figure 5

Topographies of theta and alpha activity during auditory duration maintenance in working memory in the 1-s, 2-s, 3-s, and 4-s conditions. Theta and alpha band powers are significantly activated in the posterior parietal cortex from the interval of 1 s to 3 s after the onset of delay.

Figure 6

Correlation between the amplitude of the alpha band power and accuracy in the posterior parietal during the 2-s auditory duration condition. The scatter plot depicts the amplitude of the alpha power (y-axis) relative to the accuracy of the behavioural performance (x-axis) in the 2-s auditory duration condition. Pearson product-moment correlation coefficient, r = 0.569.

Discussion

Using a matching-to-sample task, the present study investigated the neural oscillations associated with auditory duration maintenance in WM. EEG analyses indicated that the auditory duration length was not associated with changes in the theta band amplitude, which confirmed our hypothesis that the theta band is not involved in auditory duration maintenance in WM. The alpha band amplitudes during the 3-s and 4-s auditory duration conditions were lower than those during the 1-s and 2-s conditions. Additionally, the alpha amplitude was positively correlated with accuracy in the 2-s condition. These results emphasise the involvement of the alpha band in auditory duration maintenance in WM. They are also consistent with our hypothesis that the neural representation of auditory duration is segmented with a critical point of approximately 2 s, which is shorter than that for visual duration (3 s).

The neural representations of durations of different lengths are distinct. Separate mechanisms process durations below and above the critical threshold point. Compared with the results of Chen et al.’s3 study on visual duration, our study indicated that the threshold point differs between sensory modalities. As mentioned earlier, only a duration below the critical threshold point is represented as a coherent experience or temporal gestalt in WM25. However, auditory signals are often subjectively perceived to be longer than visual ones for a given duration31,32,33,34,35. As described in the pacemaker-accumulator model of timing39, a pacemaker in the brain emits pulses from the onset of a timed auditory or visual stimulus, and these pulses are summed by an accumulator until the stimulus stops. The perception of duration depends on the number of pulses counted in the accumulator. Some researchers claim that the rate of the internal clock is faster for auditory signals than for visual ones33, 34. Others hold that auditory signals capture attention more effectively than visual ones32, 40. In both cases, the accumulated clock value for auditory signals will be larger; therefore, the critical point of auditory duration in WM will be shorter than that of visual duration.
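The pacemaker-accumulator argument can be made concrete with a small Python sketch. It rests on two assumptions that are not stated in the text: that the segmentation boundary corresponds to a fixed accumulated pulse count, and that the pacemaker rates take the hypothetical values below, which are chosen only so that the arithmetic reproduces the 3-s visual and 2-s auditory critical points. They are not estimates from data.

```python
def accumulated_pulses(duration_s, rate_hz):
    """Pulses summed by the accumulator between stimulus onset and offset."""
    return rate_hz * duration_s


def critical_duration(critical_count, rate_hz):
    """Duration at which the accumulator reaches a fixed critical count."""
    return critical_count / rate_hz


# Hypothetical pacemaker rates, for illustration only.
VISUAL_RATE_HZ = 10.0
AUDITORY_RATE_HZ = 15.0

# Assumption: the boundary is a fixed pulse count, calibrated here so that
# the visual critical duration equals 3 s.
CRITICAL_COUNT = accumulated_pulses(3.0, VISUAL_RATE_HZ)

visual_threshold_s = critical_duration(CRITICAL_COUNT, VISUAL_RATE_HZ)
auditory_threshold_s = critical_duration(CRITICAL_COUNT, AUDITORY_RATE_HZ)
```

With a faster auditory rate, the same pulse count is reached sooner, so the auditory critical duration comes out shorter than the visual one. An attentional account that increases the effective number of counted pulses would lead to the same qualitative outcome.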

The role of the alpha oscillation is controversial, and our findings support the view that the alpha band is related to the successful maintenance of item information. The most important reason is that the ‘inhibition’ hypothesis of the alpha band cannot explain the separate representations of auditory duration below and above 2 s. If the alpha band reflected inhibition of no-longer-relevant information18, 21, 22, every duration condition should be inhibited in the same way, and there should be no significant alpha band difference between durations below and above the critical threshold point. Furthermore, our results indicated that the alpha band power was greatest over the posterior parietal region during WM maintenance, a region that has been suggested to be important for the storage of information in WM41, 42. One EEG study also reported that alpha band activity distributed over the posterior parietal and lateral occipital regions increased during item maintenance in WM, and the enhancement was primarily evident in high performers on the item WM task23. The parietal cortex is also considered the most important region associated with temporal processing43. In addition, individual threshold differences are likely to exist even though the general threshold is 2 s. That is, if the alpha band is engaged in the maintenance of auditory duration in WM, individuals with a longer threshold (for whom 2 s is a short duration) are likely to show a high accuracy and a high alpha band amplitude. In contrast, individuals with a shorter threshold (for whom 2 s is a long duration) will probably show a low discrimination accuracy and a low alpha band amplitude. Indeed, we found a positive relationship between the alpha amplitude and accuracy in the 2-s auditory duration condition (Fig. 6). Previous studies have proposed that alpha oscillations during WM maintenance are an essential constituent that sustains the neural representations of memorised items44, and that they may preserve the fidelity of stimulus representations during WM maintenance23. It seems that the alpha oscillation fluctuations during stimulus maintenance represent complex network activity of internally directed attention that promotes WM processes to enhance stimulus maintenance45.

Four issues must be noted. First, there may be varying degrees of task difficulty among the 1-s, 2-s, 3-s, and 4-s duration conditions. For example, discriminating a 1-s sample from a 2-s probe could be more difficult than discriminating a 1-s sample from a 4-s probe. Moreover, the responses were chosen from two options (i.e. equal or longer/shorter) for 1-s and 4-s samples and from three options (i.e. equal, longer, or shorter) for 2-s and 3-s samples. Thus, it would probably have been easier to make a decision for 1-s and 4-s samples than for 2-s and 3-s samples. However, the behavioural results showed that judgment accuracy decreased significantly as the sample duration increased, which suggests that the main conclusion was not affected by task difficulty. Second, the time interval between the baseline and the delay phase differed among the 1-s, 2-s, 3-s, and 4-s conditions, which might have influenced the alpha amplitude in the delay phase. In particular, the alpha amplitude between −0.5 and −0.1 s before the onset of the delay phase may have been affected by this additional factor. However, an ANOVA found no significant main effect of duration and no significant duration × region interaction for the theta and alpha band amplitudes during the interval between −0.5 and −0.1 s before the onset of the delay phase. This makes it unlikely that the different time intervals between the baseline and the delay phase affected the theta and alpha band amplitudes during the delay phase. Third, the behavioural results did not support the presence of a threshold point between short and long auditory durations, which differed from the EEG results. This observation is likely due to the limitations of behavioural studies. Previous studies have suggested that variability in timing behaviour is mainly attributed to encoding, memory, and decision-making processes39, 46. Thus, WM as well as encoding and decision-making processes influenced the behavioural results of our study. EEG technology, in contrast, can distinguish the processes of encoding, maintenance, and decision-making for temporal information. EEG data are therefore more suitable than behavioural data for investigating a threshold point between short and long durations3. Finally, there are some other differences between the neural representations of visual and auditory durations in addition to the difference in the critical threshold point (e.g. the encoding-related alpha oscillations and the topography of WM maintenance). This probably indicates that WM for stimulus duration is organised by stimulus modality, as predicted in previous behavioural studies7, 9, 10. On the basis of the advantages of electroencephalography technology, the present study, in comparison with Chen et al.’s study3, has partly revealed the differences between the neural representations of visual and auditory durations. However, this is still a preliminary assessment. Further detailed studies using other advantageous techniques (e.g. functional magnetic resonance imaging with high spatial resolution) are still needed.

Methods

Participants

Twenty-five right-handed undergraduates (9 men, 21.00 ± 1.78 years old) were paid for their participation in the present study. Each participant provided written informed consent, and they had normal or corrected-to-normal vision. Our study received approval from the local institutional review board of Southwest University and was compliant with the ethical standards of the Declaration of Helsinki47.

Stimuli and apparatus

Auditory stimuli were 1, 2, 3, or 4-s pure tones of 1,000 Hz that were binaurally presented through Sennheiser stereo earphones. The loudness was adjusted to be comfortable to the subjects at ~60 dB. The response signal was a 2-cm white question mark. The computer screen was positioned approximately 75 cm from the participants’ eyes. The refresh rate of the monitor was 85 Hz.

Procedures

We used a classical matching-to-sample task (Fig. 1). Participants first heard a pure tone (the sample stimulus), followed by a 3-s interval (the delay/maintenance phase). Next, a second pure tone (the probe stimulus) was presented. Each tone was randomly presented for 1, 2, 3, or 4 s. After a 1-s interval, a question mark (response signal) was displayed on the screen until a key was pressed, or for a maximum of 2 s. During the response period, participants were instructed to press ‘1’, ‘2’, or ‘3’ correspondingly after estimating the duration of the second pure tone (probe) as shorter, equal to, or longer than the first tone (sample). Half of the participants responded with their left hand, and the other half responded with their right hand. The intertrial interval was 2 s. There were seven blocks with 48 trials in each block. Every duration condition contained 84 trials.
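The trial structure of this procedure can be sketched in Python as follows. This is an illustration rather than the presentation script used in the study. The function names are hypothetical, and drawing the probe duration uniformly from the four durations is an assumption, since the text states only that each tone was presented randomly for 1, 2, 3, or 4 s.

```python
import random

DURATIONS_S = (1, 2, 3, 4)
N_BLOCKS = 7
TRIALS_PER_BLOCK = 48


def correct_key(sample_s, probe_s):
    """Key for a probe shorter than ('1'), equal to ('2'), or longer than ('3') the sample."""
    if probe_s < sample_s:
        return "1"
    if probe_s == sample_s:
        return "2"
    return "3"


def build_trials(seed=0):
    """Return a shuffled trial list in which each sample duration occurs equally often."""
    rng = random.Random(seed)
    n_trials = N_BLOCKS * TRIALS_PER_BLOCK       # 7 x 48 = 336 trials
    per_duration = n_trials // len(DURATIONS_S)  # 84 trials per sample duration
    trials = []
    for sample_s in DURATIONS_S:
        for _ in range(per_duration):
            # Assumption: probe durations are drawn uniformly at random.
            probe_s = rng.choice(DURATIONS_S)
            trials.append({"sample_s": sample_s,
                           "probe_s": probe_s,
                           "key": correct_key(sample_s, probe_s)})
    rng.shuffle(trials)
    return trials
```

The sketch also makes visible that a 1-s sample can never be followed by a shorter probe and a 4-s sample can never be followed by a longer one, so only two of the three response keys can be correct for those samples.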

Electrophysiological recording

A continuous EEG was acquired at a sampling rate of 500 Hz from a 64-channel scalp cap (Brain Products GmbH, Herrsching, Germany) that conformed to the extended 10–20 system of channel locations, with an amplifier bandpass of 0.01–100 Hz and a 50-Hz notch filter. Additional electrodes were positioned on the left and right mastoids. The horizontal and vertical electrooculograms were acquired using electrodes positioned at the external ocular canthi and below the right eye, respectively. Impedances were maintained below 5 kΩ for all electrodes.

EEG analysis

All data processing was performed offline using EEGLAB48 and MATLAB (MathWorks, Natick, MA, USA). Continuous EEG data were re-referenced to the average of the bilateral mastoids and high-pass filtered at 0.5 Hz3, 23, 38. EEG epochs were extracted in 9-s time windows (1 s pre-stimulus and 8 s post-stimulus, with zero at the onset of the sample stimulus). Each EEG epoch was baseline-corrected by subtracting the mean voltage before the sample stimulus. Epochs with an amplitude exceeding ±100 μV were automatically marked and then manually confirmed and excluded through visual inspection. An independent component analysis was implemented to remove artefacts such as eye blinks and movements on the basis of the scalp maps and activity profiles49, 50. A total of 93.40% of the trials remained, and an ANOVA showed no significant main effect of duration (four duration conditions) on the number of remaining trials [F(2.144,51.455) = 0.103, p > 0.05, ηp² = 0.004].
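The baseline correction and amplitude criterion described above can be sketched for a single channel in Python. This is a simplified illustration, not the EEGLAB pipeline used in the study. The function names are hypothetical, and using the full 1-s pre-stimulus window as the baseline is an assumption.

```python
SAMPLING_RATE_HZ = 500
PRESTIMULUS_S = 1.0  # each epoch starts 1 s before sample onset
REJECT_UV = 100.0    # amplitude criterion in microvolts


def baseline_correct(epoch_uv):
    """Subtract the mean pre-stimulus voltage from every sample of one channel."""
    # Assumption: the whole 1-s pre-stimulus window serves as the baseline.
    n_baseline = int(round(PRESTIMULUS_S * SAMPLING_RATE_HZ))
    baseline_mean = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline_mean for v in epoch_uv]


def exceeds_threshold(epoch_uv, limit_uv=REJECT_UV):
    """Flag an epoch whose absolute amplitude exceeds the limit at any sample."""
    return any(abs(v) > limit_uv for v in epoch_uv)
```

In the study, flagged epochs were additionally confirmed by visual inspection before exclusion. The sketch covers only the automatic marking step.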

A time-frequency analysis was performed on the segmented and artefact-free data using Hanning-windowed sinusoidal wavelets of three cycles at 3 Hz, increasing linearly to approximately 15 cycles at 30 Hz; a similar approach was used by Makeig et al.51 and Chen et al.3. This modified wavelet transform was selected to optimise the trade-off between temporal resolution at lower frequencies and stability at higher frequencies51. The event-related spectral perturbation (ERSP) index was adopted to compute the changes in event-related spectral power (in dB)52, which was decibel-transformed relative to the baseline interval (−0.4 s to −0.1 s) before the sample stimulus. Figure 2 shows the ERSP during the encoding, delay, and probe phases of the sample duration relative to the baseline. The spectral analysis was based on single trials.
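Two quantities in this paragraph can be written out in a short Python sketch: the number of wavelet cycles as a function of frequency, and the decibel transform relative to baseline power. The linear interpolation between the stated endpoints (3 cycles at 3 Hz, about 15 cycles at 30 Hz) and the function names are assumptions made for illustration. The sketch does not reproduce the EEGLAB implementation.

```python
import math


def wavelet_cycles(freq_hz, f_min=3.0, f_max=30.0, c_min=3.0, c_max=15.0):
    """Number of cycles, interpolated linearly between (f_min, c_min) and (f_max, c_max)."""
    fraction = (freq_hz - f_min) / (f_max - f_min)
    return c_min + fraction * (c_max - c_min)


def window_length_s(freq_hz):
    """Temporal extent of the wavelet in seconds: cycles divided by frequency."""
    return wavelet_cycles(freq_hz) / freq_hz


def ersp_db(power, baseline_power):
    """Decibel change of spectral power relative to baseline power."""
    return 10.0 * math.log10(power / baseline_power)
```

Under these assumptions, the wavelet spans 1 s at 3 Hz and 0.5 s at 30 Hz, so the window shortens with frequency more slowly than it would with a fixed number of cycles. A power value equal to the baseline maps to 0 dB, and a tenfold increase maps to 10 dB.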

As shown in Fig. 3, steady theta (4–8 Hz) and alpha band (8–12 Hz) activities were elicited during WM maintenance after averaging across frequency. For this new coordinate, zero represents the onset of the delay phase. Another average was computed for the theta and alpha amplitude during the delay phase, which is shown in Fig. 4. Topographic EEG power analyses were implemented by grouping electrodes into nine clusters based on previous studies38, i.e. the left-frontal cluster (Fp1, AF3, AF7, F3, and F5), middle-frontal cluster (Fpz, F1, Fz, and F2), right-frontal cluster (Fp2, AF4, AF8, F4, and F6), left-central cluster (FC5, C3, C5, T7, and CP5), middle-central cluster (FCz, Cz, C1, C2, and CPz), right-central cluster (FC6, C4, C6, T8, and CP6), left-posterior cluster (P3, P5, P7, PO3, and O1), middle-posterior cluster (P1, P2, Pz, POz, and Oz), and right-posterior cluster (P4, P6, P8, PO4, and O2).
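The electrode grouping above maps naturally onto a lookup table. The Python sketch below encodes the nine clusters exactly as listed and averages a per-channel power value within a cluster. The helper names and the `power_by_channel` dictionary are hypothetical. The posterior parietal region of interest follows the four clusters named in the Results.

```python
CLUSTERS = {
    "left-frontal": ["Fp1", "AF3", "AF7", "F3", "F5"],
    "middle-frontal": ["Fpz", "F1", "Fz", "F2"],
    "right-frontal": ["Fp2", "AF4", "AF8", "F4", "F6"],
    "left-central": ["FC5", "C3", "C5", "T7", "CP5"],
    "middle-central": ["FCz", "Cz", "C1", "C2", "CPz"],
    "right-central": ["FC6", "C4", "C6", "T8", "CP6"],
    "left-posterior": ["P3", "P5", "P7", "PO3", "O1"],
    "middle-posterior": ["P1", "P2", "Pz", "POz", "Oz"],
    "right-posterior": ["P4", "P6", "P8", "PO4", "O2"],
}

# Clusters that make up the posterior parietal region used in the Results.
POSTERIOR_PARIETAL = ["left-posterior", "middle-posterior",
                      "right-posterior", "middle-central"]


def cluster_mean(power_by_channel, cluster_name):
    """Average a per-channel power value (e.g. in dB) over one cluster."""
    channels = CLUSTERS[cluster_name]
    return sum(power_by_channel[ch] for ch in channels) / len(channels)


def roi_channels(cluster_names):
    """Flatten several clusters into one list of channel labels."""
    return [ch for name in cluster_names for ch in CLUSTERS[name]]
```

Flattening the four posterior parietal clusters gives the 20 channels that entered the correlation analysis.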

The time interval between the baseline and the delay phase differed among the four duration conditions. Therefore, we chose an additional statistical time window from 0.5 s to 0.1 s before the onset of delay to examine whether the subsequent statistical analysis was contaminated by the different time intervals. A two-way repeated measures ANOVA was performed on the mean power of the theta band and then of the alpha band, first in the window from 0.5 s to 0.1 s before the onset of delay and then in the window from 1 s to 3 s after the onset of delay. Duration (1 s, 2 s, 3 s, and 4 s) and region (nine clusters) were the ANOVA factors. We adopted the Greenhouse-Geisser correction method to correct for any violations of sphericity53, and we used partial eta squared (ηp²) to estimate the ANOVA effect size54.
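The effect-size measure and the sphericity correction can be related to the reported statistics with a short Python sketch. It relies on the standard identity that partial eta squared equals F × df_effect / (F × df_effect + df_error), and on the Greenhouse-Geisser correction scaling both degrees of freedom by the estimated epsilon. The function names are hypothetical, and the sketch does not estimate epsilon from data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def greenhouse_geisser_dfs(epsilon, n_levels, n_subjects):
    """Corrected degrees of freedom for a within-participant factor."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_subjects - 1)
    return df_effect, df_error


# Uncorrected case for the duration factor: 4 levels, 25 participants.
uncorrected_dfs = greenhouse_geisser_dfs(1.0, 4, 25)

# Reported reaction-time effect: F(3,72) = 3.176.
rt_effect_size = partial_eta_squared(3.176, 3, 72)
```

Under these assumptions, `uncorrected_dfs` matches the F(3,72) reported for the duration factor, and `rt_effect_size` rounds to the reported value of 0.117.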