Neural oscillations associated with auditory duration maintenance in working memory

The neural representation of auditory duration remains unknown. Here, we used electroencephalogram (EEG) recordings to investigate neural oscillations during the maintenance of auditory duration in working memory (WM). EEG analyses indicated that the auditory duration length was not associated with changes in the theta band amplitude, whereas the alpha band amplitudes during 3-s and 4-s auditory duration conditions were lower than during the 1-s and 2-s conditions. Moreover, the alpha band amplitude and accuracy were positively correlated in the 2-s duration condition. We also found that the neural representation of auditory duration is segmented, with a critical threshold point of approximately 2 s, which is shorter than that for visual duration (3 s). The results emphasised the involvement of the alpha band in auditory duration maintenance in WM. Our study’s findings indicate that different internal representations of auditory durations are maintained in WM below and above 2 s from the perspective of electrophysiology. Additionally, the critical threshold point is related to the sensory modality of duration.

There is some evidence to support this claim. For example, Elbert et al. 26 employed event-related potentials (ERPs) and found a critical threshold (3 s) for the accuracy of visual duration reproduction. Reproduction of durations within the threshold was accompanied by a slow negative wave, the contingent negative variation (CNV), whereas the CNV was reduced or even absent when durations were beyond the threshold. Moreover, a functional magnetic resonance imaging study showed that the motor system and the default mode network process durations below and above 2 s, respectively 27 . Further indications for segmented duration representation come from the perception of rhythmic coherence (e.g. rhythm can no longer be perceived if the tones in a regular sequence are separated by intervals exceeding 3 s 28,29 ) and from sensorimotor synchronization (e.g. the synchronization between a regular sequence of beats and the corresponding motor acts breaks down when the interstimulus interval between the beats exceeds 2-3 s 30 ).
However, the neural representation of auditory duration remains unknown, and it is thus the focus of our study. An important characteristic of duration perception is that subjectively perceived durations vary between sensory modalities (e.g. auditory or visual). Many studies have shown that auditory signals are often judged as longer than visual signals of the same duration [31][32][33][34][35] . For example, Wearden et al. 34 used a duration bisection task and verbal estimation and found that auditory signals appeared longer than visual signals in all cases, and that the effect was greater at longer stimulus durations. They suggested that auditory signals drive an internal clock at a faster rate than visual signals 32,33 . A similar auditory/visual difference in duration judgment was also observed in 5- and 8-year-old children as well as in young adults 35 , although the modality difference was more pronounced in children. The authors of that study argued that auditory signals are processed more automatically than visual signals in all age groups, whereas visual signals require more attentional resources; children, whose attentional abilities are limited, therefore have more difficulty than adults in processing visual signals. Therefore, considering that auditory signals are often judged to be longer than visual ones [31][32][33][34][35] , we hypothesised that the neural representation of auditory duration in WM is also segmented, but that the critical threshold may be shorter than that for visual duration.
Here, we applied a matching-to-sample task (Fig. 1), following the study of Chen et al. 3 , to investigate the theta and alpha oscillation correlates of auditory duration maintenance in WM. The advantage of this paradigm is that it separates the encoding, maintenance, and decision stages of temporal information processing 36,37 . Subjects were required to maintain one duration (1 s, 2 s, 3 s, or 4 s) in WM. As for the theta band, we hypothesised that the auditory duration length would not be associated with changes in the theta band amplitude, because the theta band mainly reflects the maintenance of temporal order information 23,38 . As mentioned earlier, we hypothesised that the neural representation of auditory duration would be segmented. Thus, regarding the controversial role of alpha oscillations in WM, we inferred that if the alpha oscillation reflects different internal representations of auditory durations below and above the critical threshold point, then the alpha band reflects the successful maintenance of item information 3,17,23 . If there is no significant alpha band difference between durations below and above the critical threshold point, then the alpha band reflects inhibition of no-longer-relevant information 18,21,22 .
There was also a significant effect of duration on reaction time
Figure 1. Trial sequences and durations of each screen presentation. Auditory stimuli were randomly presented for 1, 2, 3, or 4 s. Each trial starts with a pure tone (sample), followed by a 3-s interval (the delay/maintenance phase). Next, a second pure tone (probe) is presented. Participants press '1', '2', or '3' after judging the duration of the second pure tone (probe) as shorter than, equal to, or longer than the first tone (sample), respectively.
EEG data. Figure 2 shows the dynamic activities of theta and alpha powers during the encoding (sample), delay, and probe phases. Figure 5 shows that the regions with the highest theta and alpha band powers during WM maintenance were posterior parietal (i.e. the left-posterior cluster, middle-posterior cluster, right-posterior cluster, and middle-central cluster). To further examine the role of the alpha band power in auditory duration maintenance in WM, a correlation analysis was performed for every duration condition between the amplitude of the alpha power over the posterior parietal cortex (i.e. 20 channels in total) during the delay phase and the accuracy. As shown in Fig. 6, the amplitude of the alpha power was positively correlated with accuracy in the 2-s condition (Pearson r = 0.569, p < 0.01). In contrast, the correlations for the 1-s (r = 0.155, p > 0.05), 3-s (r = 0.094, p > 0.05), and 4-s conditions (r = −0.027, p > 0.05) were not significant.
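The correlation analysis described above can be sketched in a few lines. This is only an illustration, not the authors' analysis script: the per-participant values below are hypothetical placeholders, and the t statistic assumes that all 25 participants entered the correlation, which the text does not state explicitly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    return cov / math.sqrt(var_x * var_y)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical per-participant values, for illustration only:
# delay-phase alpha power (dB) and proportion correct in one condition.
alpha_db = [0.4, 1.1, 0.9, 1.8, 1.5, 2.2]
accuracy = [0.55, 0.62, 0.60, 0.71, 0.66, 0.80]
r_example = pearson_r(alpha_db, accuracy)

# Reported 2-s condition: r = 0.569; n = 25 is an assumption (all participants).
t_reported = t_from_r(0.569, 25)
```

Under that assumption, the t statistic for r = 0.569 with 23 degrees of freedom exceeds the two-tailed critical value for p = 0.01 (about 2.81), which is consistent with the reported p < 0.01.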

Discussion
Using a matching-to-sample task, the present study investigated the neural oscillations associated with auditory duration maintenance in WM. EEG analyses indicated that the auditory duration length was not associated with changes in the theta band amplitude, which confirmed our hypothesis that the theta band is not involved in auditory duration maintenance in WM. The alpha band amplitudes during the 3-s and 4-s auditory duration conditions were lower than during the 1-s and 2-s conditions. Additionally, the alpha amplitude was positively correlated with accuracy in the 2-s condition. These results emphasise the involvement of the alpha band in auditory duration maintenance in WM. They are also consistent with our hypothesis that the neural representation of auditory duration is segmented, with a critical point of approximately 2 s, which is shorter than that for visual duration (3 s).
The neural representations of durations of different lengths are distinct. Separate mechanisms process durations below and above the critical threshold point. Compared with the results of Chen et al.'s 3 study on visual duration, our study indicated that the threshold points differ between sensory modalities. As mentioned earlier, only a duration below the critical threshold point is represented as a coherent experience or temporal gestalt in WM 25 . However, auditory signals are often subjectively perceived to be longer than visual ones of the same duration [31][32][33][34][35] . As described in the pacemaker-accumulator model of timing 39 , a pacemaker in the brain emits pulses from the onset of a timed auditory/visual stimulus, and these pulses are summed by an accumulator until the stimulus stops. The perceived duration depends on the number of pulses counted in the accumulator. Some researchers claim that the rate of the internal clock is faster for auditory signals than for visual ones 33,34 . Others hold that auditory signals capture attention better than visual ones 32,40 . In both cases, the accumulated clock value for auditory signals will be larger; therefore, the critical point of auditory duration in WM will be shorter than that of visual duration.
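The pacemaker-accumulator argument can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration rather than a model fitted in this study: the pulse rates are hypothetical and were chosen only so that the numbers line up with the 3-s (visual) and 2-s (auditory) thresholds, and the idea that the threshold corresponds to a fixed accumulated pulse count is an assumption introduced here.

```python
def accumulated_pulses(rate_hz, duration_s):
    """Pulses summed by the accumulator while the stimulus is on."""
    return rate_hz * duration_s


def critical_duration(pulse_limit, rate_hz):
    """Physical duration at which the accumulator reaches a fixed limit."""
    return pulse_limit / rate_hz


# Hypothetical pacemaker rates (pulses per second), illustration only.
VISUAL_RATE_HZ = 10.0
AUDITORY_RATE_HZ = 15.0  # assumed faster clock for auditory signals

# Assumption: the threshold is a fixed pulse count, set here from the
# 3-s visual threshold.
PULSE_LIMIT = accumulated_pulses(VISUAL_RATE_HZ, 3.0)

# With a faster clock, the same pulse count is reached sooner.
auditory_threshold_s = critical_duration(PULSE_LIMIT, AUDITORY_RATE_HZ)
visual_threshold_s = critical_duration(PULSE_LIMIT, VISUAL_RATE_HZ)
```

With these illustrative rates, a 2-s tone accumulates as many pulses as a 3-s visual stimulus, which is the sense in which a faster auditory clock would shift the critical point to a shorter physical duration.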
The role of the alpha oscillation is controversial, and our findings support the view that the alpha band is related to the successful maintenance of item information. The most important reason is that the 'inhibition' hypothesis of the alpha band cannot explain the separate representations of auditory duration below and above 2 s. If the alpha band reflected inhibition of no-longer-relevant information 18,21,22 , every duration condition should be inhibited in the same way, and there should be no significant alpha band difference between durations below and above the critical threshold point. Furthermore, our results indicated that the alpha band power during WM maintenance was greatest over the posterior parietal region, which has been suggested to be important for the storage of information in WM 41,42 . One EEG study also reported that alpha band activities distributed over the posterior parietal and lateral occipital regions were increased during item maintenance in WM, and that the enhancement was primarily evident in high performers on the item WM task 23 . The parietal cortex is also considered the most important region associated with temporal processing 43 . In addition, individual threshold differences are likely to exist even though the general threshold is 2 s. That is, if the alpha band is engaged in the maintenance of auditory duration in WM, individuals with a longer threshold (for whom 2 s is a short duration) are likely to show a high accuracy and a high alpha band amplitude. In contrast, individuals with a shorter threshold (for whom 2 s is a long duration) are likely to show a low discrimination accuracy and a low alpha band amplitude. Indeed, we found a positive relationship between the alpha amplitude and accuracy in the 2-s auditory duration condition (Fig. 6).
Previous studies have proposed that alpha oscillations during WM maintenance are an essential constituent that sustains the neural representations of memorised items 44 , and that they may preserve the fidelity of stimulus representations during WM maintenance 23 . It seems that alpha oscillation fluctuations during stimulus maintenance represent complex network activity of internally directed attention that promotes WM processes to enhance stimulus maintenance 45 .
Four issues must be noted. First, task difficulty may have varied among the 1-s, 2-s, 3-s, and 4-s duration conditions. For example, discriminating a 1-s sample from a 2-s probe could be more difficult than discriminating a 1-s sample from a 4-s probe. Moreover, the responses were effectively chosen from two options (i.e. equal or longer/shorter) for 1-s and 4-s samples and from three options (i.e. equal, longer, or shorter) for 2-s and 3-s samples. Thus, decisions would probably have been easier for 1-s and 4-s samples than for 2-s and 3-s samples. However, the behavioural results showed that judgment accuracy decreased significantly as the sample duration increased. This suggests that the main conclusion was not affected by task difficulty. Second, the time intervals between the baseline and the delay phase differed among the 1-s, 2-s, 3-s, and 4-s conditions, which could have influenced the alpha amplitude in the delay phase. Thus, the alpha amplitude between −0.5 and −0.1 s before the onset of the delay phase may likewise have been affected by the differing time intervals. However, an ANOVA found no significant main effect of duration and no duration × region interaction for the theta and alpha band amplitudes during the interval between −0.5 and −0.1 s before the onset of the delay phase. This makes it unlikely that the different time intervals between the baseline and the delay phase affected the theta and alpha band amplitudes during the delay phase. Third, the behavioural results did not support the presence of a threshold point between short and long auditory durations, which differed from the EEG results. This discrepancy is likely due to the limitations of behavioural measures. Previous studies have suggested that variability in timing behaviour is mainly attributable to encoding, memory, and decision-making processes 39,46 . Thus, encoding and decision-making processes, in addition to WM, influenced the behavioural results of our study.
EEG, however, can distinguish the encoding, maintenance, and decision-making processes for temporal information. EEG data are therefore more suitable than behavioural data for investigating a threshold point between short and long durations 3 . Finally, there are other differences between the neural representations of visual and auditory durations in addition to the difference in the critical threshold point (e.g. the encoding-related alpha oscillations and the topography of WM maintenance). This probably indicates that WM for stimulus duration is organised by stimulus modality, as predicted in previous behavioural studies 7,9,10 . By exploiting the advantages of EEG and comparing our results with those of Chen et al. 3 , the present study has partly revealed differences in the neural representation of visual and auditory durations. However, this is still a preliminary assessment. Further detailed studies using complementary techniques (e.g. functional magnetic resonance imaging with high spatial resolution) are still needed.

Methods
Participants. Twenty-five right-handed undergraduates (9 men, 21.00 ± 1.78 years old) were paid for their participation in the present study. Each participant provided written informed consent, and they had normal or corrected-to-normal vision. Our study received approval from the local institutional review board of Southwest University and was compliant with the ethical standards of the Declaration of Helsinki 47 .
Stimuli and apparatus. Auditory stimuli were 1, 2, 3, or 4-s pure tones of 1,000 Hz that were binaurally presented through Sennheiser stereo earphones. The loudness was adjusted to be comfortable to the subjects at ~60 dB. The response signal was a 2-cm white question mark. The computer screen was positioned approximately 75 cm from the participants' eyes. The refresh rate of the monitor was 85 Hz.
Procedures. We used a classical matching-to-sample task (Fig. 1). Participants first heard a pure tone (the sample stimulus), followed by a 3-s interval (the delay/maintenance phase). Next, a second pure tone (the probe stimulus) was presented. Each tone was randomly presented for 1, 2, 3, or 4 s. After a 1-s interval, a question mark (response signal) was displayed on the screen until a key was pressed, or for a maximum of 2 s. During the response period, participants were instructed to press '1', '2', or '3' after judging the duration of the second pure tone (probe) as shorter than, equal to, or longer than the first tone (sample), respectively. Half of the participants responded with their left hand, and the other half responded with their right hand. The intertrial interval was 2 s. There were seven blocks with 48 trials in each block. Every duration condition contained 84 trials.
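The trial arithmetic and the response mapping described above can be checked with a short sketch. It is not the authors' presentation script: the function names are hypothetical, and the balanced pairing of probe durations with each sample duration is an assumption, since the text only states that tone durations were randomised.

```python
import random

DURATIONS_S = (1, 2, 3, 4)
N_BLOCKS = 7
TRIALS_PER_BLOCK = 48


def correct_key(sample_s, probe_s):
    """Key for a probe judged shorter ('1'), equal ('2'), or longer ('3')."""
    if probe_s < sample_s:
        return "1"
    if probe_s == sample_s:
        return "2"
    return "3"


def build_trials(seed=0):
    """Return a shuffled list of (sample_s, probe_s, correct_key) tuples."""
    rng = random.Random(seed)
    total = N_BLOCKS * TRIALS_PER_BLOCK      # 7 * 48 = 336 trials
    per_sample = total // len(DURATIONS_S)   # 336 / 4 = 84 per sample duration
    trials = []
    for sample in DURATIONS_S:
        for i in range(per_sample):
            # Assumption: probe durations are balanced within each sample.
            probe = DURATIONS_S[i % len(DURATIONS_S)]
            trials.append((sample, probe, correct_key(sample, probe)))
    rng.shuffle(trials)
    return trials


trials = build_trials()
```

The mapping also makes one design property explicit: a 1-s sample can never be followed by a shorter probe and a 4-s sample can never be followed by a longer one, so only two of the three responses are ever correct for those samples.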
Electrophysiological recording. A continuous EEG was acquired at a sampling rate of 500 Hz from a 64-channel scalp cap (Brain Products GmbH, Herrsching, Germany) that conformed to the extended 10-20 system of channel locations, with an amplifier bandpass of 0.01-100 Hz and a 50-Hz notch filter. Additional electrodes were positioned on the left and right mastoids. The horizontal and vertical electrooculograms were acquired using electrodes positioned at the external ocular canthi and below the right eye, respectively. Impedances were maintained below 5 kΩ for all electrodes.
EEG analysis. All data processing was performed offline using EEGLAB 48 and MATLAB (MathWorks, Natick, MA, USA). Continuous EEG data were re-referenced to the average of the bilateral mastoids and high-pass filtered at 0.5 Hz 3,23,38 . EEG epochs were extracted in 9-s time windows (1 s pre-stimulus and 8 s post-stimulus, with zero at the onset of the sample stimulus). Each EEG epoch was baseline-corrected by subtracting the mean voltage before the sample stimulus. Epochs with an amplitude exceeding ±100 μV were automatically marked and then manually confirmed and excluded through visual inspection. An independent component analysis was implemented to remove artefacts such as eye blinks and movements, which were identified from the scalp maps and activity profiles 49,50 . A total of 93.40% of the trials remained, and an ANOVA showed no significant main effect of duration (four duration conditions) on the number of remaining trials [F(2.144, 51.455) = 0.103, p > 0.05, ηp² = 0.004]. A time-frequency analysis was performed on the segmented and artefact-free data using Hanning-windowed sinusoidal wavelets of three cycles at 3 Hz, increasing linearly to approximately 15 cycles at 30 Hz; a similar approach was used by Makeig et al. 51 and Chen et al. 3 .
This modified wavelet transform was selected to optimise the trade-off between temporal resolution at lower frequencies and stability at higher frequencies 51 . The event-related spectral perturbation (ERSP) index 52 was used to quantify event-related changes in spectral power, expressed in dB relative to the baseline interval (−0.4 s to −0.1 s) before the sample stimulus. Figure 2 shows the ERSP during the encoding, delay, and probe phases of the sample duration relative to the baseline. The spectral analysis was based on single trials.
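The two computations described above, the frequency-dependent number of wavelet cycles and the decibel transform relative to baseline, can be sketched as follows. This is a simplified stand-in and not EEGLAB code: the function names are hypothetical, the linear interpolation between 3 cycles at 3 Hz and 15 cycles at 30 Hz is taken directly from the description, and the dB step operates on one power time course at one frequency.

```python
import math


def wavelet_cycles(freq_hz, f_lo=3.0, f_hi=30.0, c_lo=3.0, c_hi=15.0):
    """Number of cycles at freq_hz, rising linearly from c_lo to c_hi."""
    frac = (freq_hz - f_lo) / (f_hi - f_lo)
    return c_lo + frac * (c_hi - c_lo)


def window_seconds(freq_hz):
    """Temporal extent of the wavelet: cycles divided by frequency."""
    return wavelet_cycles(freq_hz) / freq_hz


def ersp_db(power, times_s, baseline=(-0.4, -0.1)):
    """Convert a power time course to dB relative to mean baseline power."""
    base = [p for p, t in zip(power, times_s) if baseline[0] <= t <= baseline[1]]
    if not base:
        raise ValueError("no samples fall inside the baseline interval")
    base_mean = sum(base) / len(base)
    return [10.0 * math.log10(p / base_mean) for p in power]


# Toy power values (arbitrary units), for illustration only.
times_s = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1]
power = [2.0, 2.0, 2.0, 2.0, 4.0, 1.0]
ersp = ersp_db(power, times_s)
```

With these settings the wavelet spans 1 s at 3 Hz and 0.5 s at 30 Hz, so the window shortens only by half across a tenfold frequency range. A doubling of power relative to baseline corresponds to about +3 dB and a halving to about −3 dB.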
As shown in Fig. 3, steady theta (4-8 Hz) and alpha band (8-12 Hz) activities were elicited during WM maintenance after averaging across frequency. In this new coordinate system, zero represents the onset of the delay phase. Another average was computed for the theta and alpha amplitudes during the delay phase, which is shown in Fig. 4. Topographic EEG power analyses were implemented by grouping electrodes into nine clusters based on previous studies 38 . The time intervals between the baseline and the delay phase differed among the four duration conditions. Therefore, we chose an additional statistical time window from 0.5 s to 0.1 s before the onset of the delay to examine whether the subsequent statistical analysis was contaminated by the different time intervals. A two-way repeated measures ANOVA was performed in turn on the mean power of the theta and alpha bands in the window from 0.5 s to 0.1 s before delay onset and in the window from 1 s to 3 s after delay onset. Duration (1 s, 2 s, 3 s, and 4 s) and region (nine clusters) were the ANOVA factors. We adopted the Greenhouse-Geisser correction method to correct for any violations of sphericity 53 , and we used partial eta squared (ηp²) to estimate the ANOVA effect size 54 .
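As a worked check of the statistics conventions above, partial eta squared can be recovered from an F value and its degrees of freedom, and the Greenhouse-Geisser epsilon can be read off the corrected degrees of freedom. The sketch below applies this to the trial-count ANOVA reported in the EEG analysis paragraph, F(2.144, 51.455) = 0.103. The function names are hypothetical, and the uncorrected error degrees of freedom assume that all 25 participants entered the analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def gg_epsilon(df_effect_corrected, n_levels):
    """Greenhouse-Geisser epsilon implied by the corrected effect df."""
    return df_effect_corrected / (n_levels - 1)


# Reported values for the main effect of duration on remaining trials.
F_VALUE = 0.103
DF_EFFECT = 2.144
DF_ERROR = 51.455

eta_p2 = partial_eta_squared(F_VALUE, DF_EFFECT, DF_ERROR)
epsilon = gg_epsilon(DF_EFFECT, n_levels=4)

# Assumption: 25 participants, so uncorrected error df = (4 - 1) * (25 - 1).
df_error_uncorrected = (4 - 1) * (25 - 1)
df_error_corrected = epsilon * df_error_uncorrected
```

The recovered effect size rounds to the reported 0.004, and scaling the uncorrected error degrees of freedom (72) by the implied epsilon of roughly 0.71 reproduces the reported 51.455 to within rounding.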