Abstract
The latency of the auditory steady-state response (ASSR) may provide valuable information regarding the integrity of the auditory system, as it could potentially reveal the presence of multiple intracerebral sources. To estimate multiple latencies from high-order ASSRs, we propose a novel two-stage procedure that consists of a nonparametric estimation method, called apparent latency from phase coherence (ALPC), followed by a heuristic sequential forward selection algorithm (SFS). Compared with existing methods, ALPC-SFS requires few prior assumptions, and is straightforward to implement for higher-order nonlinear responses to multi-cosine sound complexes with their initial phases set to zero. It systematically evaluates the nonlinear components of the ASSRs by estimating multiple latencies, automatically identifies involved ASSR components, and reports a latency consistency index. To verify the proposed method, we performed simulations for several scenarios: two nonlinear subsystems with different or overlapping outputs. We compared the results from our method with predictions from existing, parametric methods. We also recorded the EEG from ten normal-hearing adults by bilaterally presenting superimposed tones with four frequencies that evoke a unique set of ASSRs. From these ASSRs, two major latencies were found to be stable across subjects on repeated measurement days. The two latencies are dominated by low-frequency (LF) (near 40 Hz, at around 41–52 ms) and high-frequency (HF) (> 80 Hz, at around 21–27 ms) ASSR components. The frontal-central brain region showed longer latencies on LF components, but shorter latencies on HF components, when compared with temporal-lobe regions. In conclusion, the proposed nonparametric ALPC-SFS method, applied to zero-phase, multi-cosine sound complexes, is more suitable for evaluating embedded nonlinear systems underlying ASSRs than existing methods.
It may therefore be a promising objective measure for hearing performance and auditory cortex (dys)function.
Introduction
Auditory steady-state responses (ASSRs) are stable brain oscillations that are locked to the frequencies present in the periodic envelope of acoustic stimuli^{1,2}. Studies have indicated that ASSRs may be generated at different levels within the auditory pathway, ranging from as early as the cochlear nerve and subcortical sources^{3,4}, to the neocortex^{5,6,7,8}. Because of their reproducibility and involuntary nature, ASSRs have been considered a valid objective biomarker for auditory system disorders that feature abnormal sound processing, as well as for evaluating primary and non-primary auditory cortex function^{9,10}. Their presence could relate to basic bottom-up sound processing^{11,12}, as well as to more complex cognitive skills, such as auditory learning^{13}, selective attention^{14}, speech and music perception^{7}, mental disorders^{15}, or illnesses such as tinnitus^{9,16}. Furthermore, since ASSRs may contain signatures of the neural processing from the auditory periphery to the cortex, the electrically evoked ASSR from a cochlear implant (CI) could provide an objective measure for the responsiveness of different regions of the auditory pathway of hearing-impaired CI users^{17}.
It should be noted that ASSRs are due to nonlinear mechanisms in the auditory system (see^{18} for a review). At the earliest stages in the auditory pathway, the nonlinear character of neural sound encoding has been evaluated from auditory nerve recordings. A stimulus consisting of multiple pure tones (the carrier frequencies) was shown to result in an envelope spectrum that could be characterized by a unique series of 2nd-order nonlinear difference frequencies (or ‘beats’) between any pair of carrier frequencies, which all showed up in the auditory nerve response. This indicates that these 2nd-order nonlinear distortions already arise at, or before, the auditory nerve^{19}. In the ascending auditory pathway, such harmonic complexes might also evoke higher-order nonlinear components, which in turn could give rise to additional ASSRs. In particular, when two pure tones with frequencies \(f_1\) and \(f_2\) are passed through a nonlinear system of order R, a series of combination tones is produced at the output, characterized by \(nf_1\pm mf_2\) (> 0), with n and m positive integers, such that \(n+m \le R\)^{16,20}. So far, few studies have systematically evaluated such higher-order nonlinearities in the EEG responses of the human auditory system.
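To make this rule concrete, the combination tones it predicts can be enumerated directly. The following minimal sketch (illustrative only, not code from this study) lists all positive frequencies \(nf_1 \pm mf_2\) with \(n, m \ge 1\) and \(n+m \le R\):

```python
def combination_tones(f1, f2, R):
    """All positive combination tones n*f1 +/- m*f2 with n, m >= 1
    and n + m <= R, for two input tones f1 and f2 (Hz)."""
    tones = set()
    for n in range(1, R):
        for m in range(1, R - n + 1):
            tones.add(n * f1 + m * f2)           # sum tones
            if abs(n * f1 - m * f2) > 0:
                tones.add(abs(n * f1 - m * f2))  # difference tones
    return sorted(tones)

# e.g. two of the carriers used in the experiment, a 2nd-order system:
# combination_tones(461, 500, 2) -> [39, 961]
```

For a 2nd-order system driven by 461 Hz and 500 Hz this yields only the beat (39 Hz) and the sum tone (961 Hz); raising R rapidly grows the set, which illustrates the combinatorial explosion discussed below.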
Apart from these nonlinear distortion components, the latency at which they occur in the EEG also provides information about the underlying neural mechanisms, and could reveal the contribution from different intracerebral sources^{6,7,21}. Although the auditory system consists of both ascending and descending pathways^{5}, the ASSRs recorded at the scalp are assumed to be mainly due to the ascending system. Therefore, the longer the latency, the higher up in the auditory system its generators lie^{1}.
A steady-state response is characterized by its amplitude and phase. The phase (with a certain ambiguity) can be used to extract a so-called ‘apparent latency’ (the slope of the phase-versus-frequency plot), introduced by Regan (1966). In auditory physiology, the apparent latency is described as ‘group delay’^{22} and has been used to measure the latency of the motion of the basilar membrane, the cochlear microphonic, otoacoustic emissions, and the discharge of auditory nerve fibers, although the actual relation of apparent latency to physiological delay is not clear^{1}. Specifically, the apparent latency, \(\tau\), is derived from the relation between the response phase, \(\Delta \phi\), and frequency: \(\Delta \phi = 2\pi f\cdot \tau\), and is used as an indirect method to estimate the neural response latency for steady-state stimuli, when a direct measurement is not feasible^{22}. This method, however, suffers from the inherent \(2\pi\) phase ambiguity of the so-called wrap-around effect^{23}, and is only suitable for analyzing low-order ASSR components (e.g., envelope frequencies) when the initial phases are known (so that phase lags can be computed). The more recently developed multi-spectral phase-coherence (MSPC) method^{20}, or similarly, the cross-spectral coherence (CSC) method^{24}, resolves the phase-ambiguity problem, and allows evaluation of higher-order nonlinear response components. However, the MSPC method relies on prior assumptions regarding the underlying nonlinear systems: knowledge about the system orders is required for evaluating the expected nonlinear interactions, from which response latencies can be estimated. Therefore, the MSPC method is difficult to implement when the number of underlying systems and their orders (and thus, their nonlinear interactions) are unknown.
Moreover, when using multiple carrier frequencies, there is an explosion of the number of potential nonlinear interactions: for given nonlinear systems and multi-tone sinusoidal inputs, a recursive method^{25} can yield the series of output frequencies. The number of output frequencies is a steeply increasing function of the system order R and the number of input frequencies, I, and its accurate computation is nontrivial because of potentially overlapping distortion products^{26} (see also Appendix C in the Supplemental Material).
Thus, the current state-of-the-art methods are limited in estimating multiple latencies for different nonlinear systems from the ASSRs. However, as indicated above, several studies have suggested that ASSR sources can be located at multiple cortical and subcortical regions^{3,6}, which are likely associated with different latencies. For example, the two-component (brainstem-cortex) model suggests that a cortical generator will dominate low-frequency distortion products (around 45 Hz), whereas a brainstem generator would dominate the higher modulation frequencies (around 90 Hz)^{2,27,28}. As the measured EEG will typically contain the components from these multiple generators, a more general hybrid nonlinear model would read:
in which the unknown parameters are the number of subsystems M, the subsystem order \(R_m\), the subsystem latencies \(\tau _m\), as well as the potentially frequencydependent gains, \({{\psi _{mr}}}\). Note that, in general, it will not be possible to uniquely identify each nonlinear system, when the underlying nonlinearities and their inputs are unknown. Yet, for certain cases, systemspecific information can be obtained from the ASSRs. Figure 1 illustrates a potential simple scenario of such nonlinear models. The EEG is considered to reflect the (linear) sum of the outputs from multiple nonlinear subsystems (here taken, for simplicity, as homogeneous nonlinearities of different orders, \(R_m\), and each at a different latency, \(\tau _m\)). The output distortion frequencies of these subsystems will differ when they receive nonoverlapping inputs (e.g., systems \(S_1\) and \(S_2\)). In this case, the latencies of both subsystems can be accurately estimated. In contrast, the output distortion products could be largely overlapping when the systems receive identical inputs, as in \(S_2\) and \(S_M\). In such a case, it will be much harder to estimate the underlying latencies. In this paper, we simulated both cases.
There are currently no suitable methods to deal with such models, and the commonly used apparent latency (or group delay) method cannot be readily applied. The apparent latency method requires computation of phase lags for a group of nearby frequencies, which is difficult for higher-order ASSRs, due to (1) the \(2\pi\) wrapping problem (i.e., how many cycles have passed?), (2) the nonlinear distortion of phases for higher-order ASSRs (which could introduce a bias within \(2\pi\), see Appendix C in Supplemental Material), and (3) the large frequency range often covered by higher-order ASSRs. Thus, the assumption that a group of target frequencies shares a common latency no longer applies. As described earlier, the MSPC method may partly account for the first two problems through a parametric approach, which then hinges on the assumption of the underlying nonlinear system orders (and their corresponding nonlinear interactions). It then estimates a latency that follows the apparent latency rule (i.e., \(\tau =\Delta \phi /2\pi f\)). Here, we applied the MSPC method as an extension of the classical group delay method for higher-order ASSRs. However, the method assumes a common latency for all nonlinear response components with the same system order (e.g., all 2nd-order responses), which might not be suitable for the ASSR analysis when the EEG signals arise from multiple subsystems (as illustrated in Fig. 1).
We here propose and test an alternative, nonparametric, semi-heuristic method that requires few prior assumptions regarding the underlying EEG sources, and aims to objectively estimate multiple potential latencies from the ASSRs. We named our method ‘apparent latency from phase coherence’ (ALPC-SFS); it is applied to a specifically designed stimulus complex, and implements a heuristic sequential forward selection (SFS) algorithm^{29} to identify the components of each subsystem. Our method can estimate multiple latencies corresponding to more than one underlying source by using only one EEG experiment. Our ALPC-SFS method does not have to make prior assumptions about the number of underlying systems, their input frequencies, or their system orders. To test our method, we performed simulations for several scenarios, and recorded EEG responses from ten normal-hearing participants to superimposed tones that evoked high-order ASSRs during passive listening.
We also compared our ALPC-SFS method with the existing MSPC method, for both the simulation data and the EEG recordings. Both methods can be viewed as extensions of the classical group-delay method for higher-order ASSRs, as both follow the apparent latency rule. They will achieve an identical latency estimate if only one common latency exists. However, when multiple potential latencies underlie the EEG, our ALPC-SFS method will outperform the MSPC method (and the group delay method), achieving correct latencies for each subsystem, and identifying the associated frequencies that drive each subsystem (see Fig. 3).
Methods
To estimate multiple latencies from recorded ASSRs, we developed a two-stage procedure that consists of the nonparametric ALPC estimation method, followed by the heuristic SFS algorithm. The ALPC method can apply phase compensation on stimuli with arbitrary initial phases (as used in^{20}), or it can use a much simpler time compensation (TC) procedure for a special stimulus complex, in which the initial phases of all stimulus frequencies, and thus their nonlinear distortion products, are set to zero. Note that the former requires prior information regarding the underlying nonlinear system orders and the corresponding nonlinear interactions (see Appendix A in Supplemental Material). In contrast, ALPC with TC requires no such information, and thus can be viewed as a nonparametric method (see the right-hand side of Fig. 3). We combined ALPC with SFS and TC to identify multiple latencies from the ASSRs.
Problem description
The EEG is considered as a time-series signal, S, containing a number of distortion frequency components \(\{ {f_{{\Sigma _1}}},{f_{{\Sigma _2}}}, \ldots ,{f_{{\Sigma _N}}}, \ldots \}\), with corresponding angular frequencies \(\omega _{\Sigma _{i}} = 2\pi {f_{{\Sigma _i}}}\), which are the outputs of a nonlinear system. Assuming that N of these frequency components have a common, unknown delay \(\tau\) relative to their initial phases \(\varphi _{\Sigma _{i}}\) at t = 0 (potentially sharing a common generator), we compute the current phase angle \(\alpha _{\Sigma _{i}}\) of each component by taking the Fourier transform (FT), where \(\alpha _{\Sigma _{i}} ,\varphi _{\Sigma _{i}} \in [0,\,2\pi ]\). They follow: \(\alpha _{\Sigma _{i}} + \tau \omega _{\Sigma _{i}} = 2\pi {n_i} + \varphi _{\Sigma _{i}} + {\varepsilon _i}\),
where \({n_i}\) are unknown integers, i.e., the number of cycles that a component is delayed, and \({\varepsilon _i}\) is the phase error (PE), caused by random noise.
To estimate the common latency, we minimize the mean-squared phase error across N frequencies^{30}: \({\tau _e} = \mathop {\arg \min }\limits _\tau \frac{1}{N}\sum \limits _{i = 1}^N {\left[ \bmod \left( \alpha _{\Sigma _{i}} + \tau \omega _{\Sigma _{i}} - \varphi _{\Sigma _{i}},\,2\pi \right) \right]^2}\),
where \(\bmod ( \cdot )\) is the modulus between 0 and \(2\pi\). If the nonlinear system order \(R\le 2\), the initial phases are straightforward to measure from the input stimulus by extracting the phases from the envelope of the input signal. However, when the system’s order is unknown and higher than two, this method will fail, because the unknown \(\varphi _{\Sigma _{i}}\) are affected by the wrap-around problem (i.e., unknown \(n_i\)).
General framework of ALPC
First, to prevent a discontinuous ‘jump’ (from 0 to \(2\pi\)) from affecting the averaging in (3), we represent the components of (2) by Euler’s formula according to: \(\exp {(j(\alpha _{\Sigma _{i}}+\tau \omega _{\Sigma _{i}}))} = \exp {(j(2\pi n_i +\varphi _{\Sigma _{i}}))} + {\mathbf{\varepsilon }}_i\) or, equivalently, \(\exp {(j(\alpha _{\Sigma _{i}}+\tau \omega _{\Sigma _{i}}))} = \exp {(j\varphi _{\Sigma _{i}})} + {{\mathbf{\varepsilon }}_i}\), where \({\mathbf{\varepsilon }_i}\) is a vectorized phase error (PE) and its length \(\left| {{{\mathbf{\varepsilon }}_i}} \right| \in [0,\,2]\). We then minimize the cost, defined by the mean length of the PE (MPE) across N frequencies, to estimate \(\tau _e\): \({\tau _e} = \mathop {\arg \min }\limits _\tau \frac{1}{N}\sum \limits _{i = 1}^N {\left| {\exp (j(\alpha _{\Sigma _{i}} + \tau \omega _{\Sigma _{i}})) - \exp (j\varphi _{\Sigma _{i}})} \right|}\),
where \(\left| {\, \cdot \,} \right|\) is the length of a complex vector. Using the estimated \({\tau _e}\), we can compute \(n_i\) and the absolute phase lag of each frequency by \(2\pi {n_i} + \varphi _{\Sigma _{i}} - \alpha _{\Sigma _{i}}\) (equal to \({\tau _e}\omega _{\Sigma _{i}} - {\varepsilon _i}\)). Plotting the absolute phase lags of the frequency components as a function of frequency should yield a straight line with slope \(d_\phi\), where the relationship between \({\tau _e}\) and \(d_\phi\) follows the apparent latency rule^{1}: \({\tau _e} = d_\phi /(2\pi )\).
In (4), we estimate \({\tau _e}\) by searching within one period T of the signal S, where T is the least common multiple of the periods of the frequency components \(\{ {f_1},{f_2},...,{f_N}\}\). The solutions for the latency are then given by the set \(\{ \tau \mid \tau = {\tau _e} + mT\}\), with m an arbitrary integer. In a particular application, a prior for the possible range of latency values can be imposed to constrain \(\tau _e\) to a unique value.
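The MPE minimization in (4) amounts to a one-dimensional search over candidate latencies. The following minimal sketch (an illustrative reimplementation under the zero-phase assumption, not the code used in this study) performs that search on a grid:

```python
import numpy as np

def alpc_latency(freqs, alphas, taus=None, phis=None):
    """Grid-search the latency tau_e that minimizes the mean
    phase-error vector length (MPE) over the given ASSR components.

    freqs  : output frequencies (Hz)
    alphas : measured Fourier phase angles (rad)
    taus   : candidate latencies (s); default 0-200 ms in 0.01 ms steps
    phis   : initial phases (rad); all zero for the zero-phase stimulus
    """
    freqs = np.asarray(freqs, float)
    alphas = np.asarray(alphas, float)
    phis = np.zeros_like(freqs) if phis is None else np.asarray(phis, float)
    if taus is None:
        taus = np.arange(0.0, 0.2, 1e-5)
    omega = 2.0 * np.pi * freqs
    # phase-error vectors: one row per candidate tau, one column per component
    pe = np.exp(1j * (alphas[None, :] + np.outer(taus, omega))) \
         - np.exp(1j * phis[None, :])
    mpe = np.abs(pe).mean(axis=1)            # MPE lies in [0, 2]
    k = int(np.argmin(mpe))
    return float(taus[k]), float(mpe[k])
```

Restricting the grid to a plausible latency range implements the prior mentioned above and resolves the \(\tau _e + mT\) ambiguity.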
Time compensation (TC) for stimuli with fixed initial phases
To avoid the need for computing the unknown \(\varphi _{\Sigma _{i}}\) when using stimuli with arbitrary initial phases (which constitutes ‘phase compensation’, see Appendix A in Supplemental Material), we here propose to use a more practical stimulus complex: a multi-cosine stimulus, in which all initial phases are set to zero at t = 0. With such a stimulus complex, it is straightforward to apply time compensation (TC) for an arbitrary ongoing interval (starting at t > 0) in the following way (see Fig. 2).
For a given multi-cosine input stimulus, \(s(t) = \sum \limits _{i = 1}^I {\cos (\omega _i^{in}t + \varphi _i^{in})}\), with all initial phases \(\varphi _i^{in} = 0\) at \({t_0} = 0\), the ongoing stimulus interval, starting at \({t_0} + {\Delta _t}\), is denoted as \(s({t_0} + {\Delta _t})\). The delayed EEG signal (the ASSR) d(t) is considered to be the output of an unknown nonlinear system with an unknown latency, \(\tau _e\), i.e., \(d(t) = F(s(t - {\tau _e}))\), where \(F( \cdot )\) represents the unknown nonlinear mapping. The EEG epoch of interest will hence follow: \(d({t_0} + {\Delta _t}) = F(s({t_0} + {\Delta _t} - {\tau _e})) = F(s({t_0} - {\tau _p}))\), with the pseudo-latency \({\tau _p} = {\tau _e} - {\Delta _t}\).
Therefore, from the ASSR we can estimate the pseudo-latency \(\tau _p\) between \(d({t_0} + {\Delta _t})\) and \(s({t_0})\). Since \(\varphi _i^{in} = 0\), the corresponding initial phases \(\varphi _{\Sigma _{i}}\) of all high-order responses are also zero (i.e., \(\varphi _{\Sigma _{i}} = \sum \limits _{i = 1}^r {{a_i}} \varphi _i^{in} \equiv 0\), see Appendices A and C in Supplemental Material). Moreover, it can be shown that for zero-phase sine inputs, all even-order difference interactions (e.g., \(f_n - f_m\), and \(2f_n - 2f_m\)) will yield the same initial phase of \(\pi /2\) rad, so that the method can also be applied to these components (after changing the initial phase to \(\pi /2\); see Appendix C). Therefore, from (4) and (6), we can estimate the pseudo-latency \(\tau _p\). Finally, we apply TC to obtain the actual latency: \(\tau _e = \tau _p + \Delta _t\).
In our study, \(\Delta _t\) = 0.3 s, in order to exclude the non-stationary event-related potentials (ERPs) in the EEG after stimulus onset (see Fig. 2). The device delay (found to be < 0.5 ms) comprised a small trigger jitter of the EEG recording system, and a fixed travel time of the acoustic signal through the silicon tube to the ears. Thus, the estimated latency had a systematic error of around 1 ms, which was deemed acceptable for the estimated ASSR latencies.
Latency consistency index (LCI)
Considering that the estimated latency may vary due to additive noise on the non-stationary signals, we quantified its consistency over K time epochs. We defined the latency consistency index (LCI) for the ith output frequency \(f_{{\Sigma _i}}\) as: \(C_\tau ^i = \left| {\frac{1}{K}\sum \limits _{k = 1}^K {\exp (j\omega _{{\Sigma _i}}\tau _{e,k})} } \right|\),
where \(\tau _{e,k}\) is the estimated latency for frequency component \(f_{{\Sigma _i}}\) on the kth epoch, which follows \(\omega _{{\Sigma _i}}\tau _{e,k} = 2\pi {n_i} + \varphi _{\Sigma _{i}} - \alpha _{\Sigma _{i}}\) from (2), assuming Gaussian noise with zero mean. The LCI is hence mathematically equivalent to the phase-coupling strength used in MSPC^{20}, which shows that the higher the phase-coupling strength between inputs and outputs, the more stable the estimated latency. We selected the duration of each time epoch (without overlap) as a multiple of the periods of the ASSR frequencies (e.g., integer seconds, given integer ASSR frequencies), so that the initial phase \(\varphi _{\Sigma _{i}}\) remains the same across epochs. In this case, \(\varphi _{\Sigma _{i}} \equiv 0\) when the cosine carriers have their initial phases set to zero. Thus, the LCI no longer relies on the stimulus phases, and reads: \(C_\tau ^i = \left| {\frac{1}{K}\sum \limits _{k = 1}^K {\exp ( - j\alpha _{{\Sigma _i},k})} } \right|\).
\(C_\tau ^i\) can vary between 0 and 1. If the latency is perfectly stable over time epochs, \(C_\tau ^i=1\). Otherwise, if no stable latency exists, \(C_\tau ^i\) is statistically indistinguishable from zero. Therefore, the LCI can be used to determine whether an ASSR component is suitable for estimating latencies. The theoretical threshold for significance is \(C_\tau ^{sig} = \sqrt{\frac{3}{K}}\)^{31}.
The null hypothesis is that the phase of an ASSR component is randomly distributed in the interval \([0,\,2\pi ]\). If \(C_\tau ^i\) exceeds \(C_\tau ^{sig}\), the null hypothesis is rejected, and the ASSR component is considered to be significant. \(C_\tau ^{sig}\) approximates the 95% confidence limit, as verified by Monte Carlo simulation^{20}. In our analysis, only significant ASSR components were considered for estimating latencies.
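In implementation terms, the LCI is simply the length of the mean phase vector across epochs. A minimal sketch (assuming zero initial phases, so that \(\exp (j\omega _{\Sigma _i}\tau _{e,k}) = \exp (-j\alpha _{\Sigma _i,k})\)):

```python
import numpy as np

def lci(alphas):
    """Latency consistency index of one ASSR component: the length of
    the mean phase vector over K epochs (alphas: measured phases, rad)."""
    return float(np.abs(np.exp(-1j * np.asarray(alphas, float)).mean()))

def lci_threshold(K):
    """Approximate 95% significance threshold sqrt(3/K)."""
    return (3.0 / K) ** 0.5
```

For K = 12 epochs, the threshold is \(\sqrt{3/12} = 0.5\); a perfectly repeating phase gives an LCI of 1, and phases that cancel give an LCI near 0.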
ALPC with sequential forward selection (SFS)
A particular set of ASSR components may have a common latency that leads to a low cost of the MPE in (4). In addition, the set of ASSR components may contain several subsets that correspond to different latencies, which would be indicative of different underlying sources. To automatically identify a frequency subset, from all significant ASSR components, that shares a common latency, we applied the heuristic SFS algorithm. SFS is commonly used as an alternative to a greedy algorithm, to reduce computational complexity. SFS starts with a frequency (or frequency pair), and then sequentially adds another frequency (from the remaining set) that results in the smallest increase of the cost (i.e., the MPE, in the range [0, 2]). The details of the SFS algorithm are described in^{29,32}.
The termination rule of SFS should be carefully chosen, depending on the data at hand. Here, we used the following procedure. When starting from a single frequency, SFS continued as long as the following two criteria were met: (i) the increase of the MPE at the current step remained < 0.1, and (ii) the current MPE < 0.5; otherwise, SFS terminated. If SFS started from a pair of frequencies, it continued as long as the following three criteria were met: (i) the difference of estimated latencies between the current and the previous step < 5 ms, (ii) the step increase of the MPE < 0.1, and (iii) the current MPE < 0.5; otherwise, SFS terminated. These criteria may be further tuned for the particular EEG data set.
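The selection loop above can be sketched compactly. This is an illustrative reimplementation (single-frequency start, zero initial phases, grid search for the latency), not the code used in the study:

```python
import numpy as np

def best_tau(freqs, alphas, taus):
    """Latency on the grid that minimizes the MPE for one subset."""
    w = 2.0 * np.pi * np.asarray(freqs, float)
    pe = np.abs(np.exp(1j * (np.asarray(alphas, float)[None, :]
                             + np.outer(taus, w))) - 1.0).mean(axis=1)
    k = int(np.argmin(pe))
    return float(taus[k]), float(pe[k])

def sfs(freqs, alphas, taus, d_mpe=0.1, max_mpe=0.5):
    """Greedily grow a subset sharing one latency; terminate when the
    step increase of the MPE reaches d_mpe or the MPE reaches max_mpe."""
    remaining = list(range(len(freqs)))
    selected = [remaining.pop(0)]               # start from one frequency
    _, prev_err = best_tau([freqs[i] for i in selected],
                           [alphas[i] for i in selected], taus)
    while remaining:
        trials = []
        for i in remaining:
            idx = selected + [i]
            trials.append((best_tau([freqs[j] for j in idx],
                                    [alphas[j] for j in idx], taus)[1], i))
        err, i_best = min(trials)               # smallest-cost candidate
        if err - prev_err >= d_mpe or err >= max_mpe:
            break                               # termination rule
        selected.append(i_best)
        remaining.remove(i_best)
        prev_err = err
    tau, _ = best_tau([freqs[i] for i in selected],
                      [alphas[i] for i in selected], taus)
    return [freqs[i] for i in selected], tau
```

After a subset terminates, the same routine can be rerun on the remaining frequencies to look for a second latency, as done in the simulations below.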
Figure 3 summarizes the pipeline for estimating the apparent latency from the EEG by the MSPC and ALPC-SFS methods, respectively. Depending on the initial phases of the carriers (either arbitrary, or all zero), one can use either a parametric (the left-hand side; MSPC, or group delay) or a nonparametric (the right-hand side; ALPC-SFS) approach to estimate the underlying latency. For the parametric approach, phase compensation (see Appendix A in Supplemental Material) can be used to estimate the unknown initial phases of the ASSRs, which are fed to the MSPC algorithm to estimate the dominant latency. Note that the phase compensation method requires prior assumptions regarding the nonlinear system. For the nonparametric approach, as proposed in the present paper, ALPC with SFS can estimate multiple potential latencies, by selecting the involved ASSR frequencies for each subsystem.
EEG experiments
Experimental design and stimuli
We performed EEG recording experiments on 10 normal-hearing adults (1 female; age 29 ± 13 y; hearing thresholds < 20 dB HL). Each subject was measured twice, using the same stimuli on different days (i.e., > 48 h apart). The stimulus at the left ear was a superposition of two tones of 461 Hz and 500 Hz at 81 dB SPL, and two tones of 504 Hz and 537 Hz at 56 dB SPL. The stimulus at the right ear consisted of the reverse pattern: two loud tones of 504 Hz and 537 Hz at 81 dB SPL, and two weaker tones of 461 Hz and 500 Hz at 56 dB SPL. The stimuli were presented to both ears simultaneously. The four frequencies were chosen such that the difference frequencies and higher-order distortions were all unique (see Fig. A2 in the Supplemental Material).
In the EEG experiments described here, all input tones were sines with their initial phases at t = 0 set to zero. In Appendix C of the Supplemental Material, we demonstrate that such carriers will yield irregular phase shifts for all higher-order distortion products, except for the even-order difference interactions (e.g., \(f_1 - f_2\), and \(2f_1 - 2f_2\)). The initial phase of these even-order distortion products will remain the same. Thus, when only even-order difference interactions in the ASSRs are considered (as in this study), ALPC-SFS with TC can also be applied for zero-phase sine stimuli.
In each trial, stimuli with a duration of 12.3 s were played after a silent break with a random duration between 2 and 3 s. It has been reported that the primary auditory cortex responds within 20 ms to stimulus changes and integrates stimulus features over a period of about 200 ms, which constitutes the phasic response stage that precedes the steady state^{33}. Therefore, the first 300 ms of the EEG of each trial were excluded from the analysis to avoid contributions from phasic event-related potentials (i.e., non-ASSRs). We expect the ASSR latencies to remain constant during the subsequent 12 s of the ASSR stage, which allows for an accurate estimate of the neighboring SNR (see section E). We repeated 100 trials while recording the scalp EEG with a 64-channel cap with Ag/AgCl electrodes, with the listener sitting in an anechoic chamber, watching a silent video. The EEG signals were digitized at a sampling rate of 2000 Hz by a Refa amplifier system (TMSi, Twente Medical Systems International B.V., the Netherlands), and stored for offline analysis. At the start of each recording session, the impedances of all EEG electrodes were carefully set to values below 10 \(k\Omega\) to ensure good contact between electrodes and scalp. During the session, these values were regularly checked, and adjusted when deemed necessary. Stimuli were generated by TDT (Tucker-Davis Technologies, USA) System 3 hardware, and presented through ER-3C insert earphones (Etymotic Research, Elk Grove Village, IL), which were connected to the participant’s ears via 30-cm-long plastic tubes and foam earplugs.
The experiments were approved by the ethics committee of Radboud University and performed in accordance with the human experiment guidelines and regulations of Radboud University. We confirm that informed consent was obtained from all subjects.
Preprocessing of EEG
We applied a three-stage preprocessing protocol to the raw EEG recordings, after first removing the three EEG electrodes above the eyes (FP1, FPz, and FP2), because they were contaminated by eye-blink and eye-movement artifacts. First, the raw EEG signals were re-referenced to a common average reference (CAR) to reduce the noise from the grounding electrode^{34}. Compared with re-referencing to a particular EEG electrode, which may not be ‘neutral’ with respect to the ASSRs, the CAR montage keeps the phases of the ASSRs unchanged on most EEG electrodes. Second, each EEG channel was filtered with a zero-phase-shift filter^{35} that consisted of a 10th-order Butterworth high-pass filter with a cutoff frequency of 1 Hz, and a notch filter at 50 Hz to remove the line noise. Third, to exclude severely contaminated EEG trials from further analysis (mainly due to EMG artifacts, which may cause false detections or decreased SNRs of ASSR components), we developed a method to automatically exclude bad EEG trials. To that end, we computed the mean and the variance of the peak-to-peak amplitude range as indicators for each EEG trial. We excluded EEG trials whose indicators were positive outliers among the 100 trials, i.e., samples > Q3 + 1.5 · IQR, where the interquartile range IQR = Q3 − Q1 (upper minus lower quartile). In this way, up to 5% of all trials were automatically excluded from further analysis.
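The outlier criterion for the third stage can be written compactly. The sketch below is simplified to a single indicator (the peak-to-peak range per trial; the study also used its variance) and is illustrative only:

```python
import numpy as np

def keep_trials(trials):
    """Boolean mask of trials to keep: a trial is rejected when its
    peak-to-peak amplitude range exceeds the Tukey fence Q3 + 1.5*IQR.

    trials: array of shape (n_trials, n_samples), one EEG channel."""
    ptp = trials.max(axis=1) - trials.min(axis=1)     # range per trial
    q1, q3 = np.percentile(ptp, [25, 75])
    return ptp <= q3 + 1.5 * (q3 - q1)                # positive outliers out
```

Only the upper fence is applied, since EMG contamination inflates the amplitude range; unusually small trials are not rejected.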
Epoch filtering by phase lock value (PLV)
We tested whether excluding poorly phase-locked epochs would improve the accuracy of the phase estimates, and hence the latencies, for the frequencies of interest. To that end, we divided each trial into twelve one-second epochs, and computed their phase-lock value (PLV \(\in [0,1]\)) with respect to each frequency of interest^{36}. PLV = 0 represents no phase locking between the frequency band of the EEG and the frequency of interest, whereas PLV = 1 represents complete phase locking. Epochs with a PLV below an empirically selected threshold, \(\theta\) (taken as \(\theta\) = 0, 0.4, 0.6, or 0.8), were excluded. Finally, we computed the phase values on the remaining epochs by using two different averaging methods: AVG EEG and AVG phase (described below).
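One common way to compute such a per-epoch PLV is to measure phase consistency across sub-segments of the epoch (the study's exact definition follows its ref. 36; this sketch assumes the segment length holds an integer number of cycles of the target frequency, so the segment phases are directly comparable):

```python
import numpy as np

def plv_epoch(x, f, fs, n_seg=10):
    """Phase-lock value of one epoch to frequency f (in [0, 1]):
    take the Fourier phase at f in each of n_seg sub-segments, then
    the length of the mean unit phase vector across segments.
    Assumes f * seg_len / fs is an integer."""
    seg_len = len(x) // n_seg
    k = int(round(f * seg_len / fs))          # FFT bin of f per segment
    phases = [np.angle(np.fft.rfft(x[i * seg_len:(i + 1) * seg_len])[k])
              for i in range(n_seg)]
    return float(np.abs(np.exp(1j * np.array(phases)).mean()))
```

A signal locked to f yields a value near 1; broadband noise yields a value near \(1/\sqrt{n_{seg}}\).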
Phase extraction of EEG
The accuracy of the extracted phases of the frequency components is important for estimating latency. Previous studies often averaged the EEG signals across a number of trials to reduce the background noise, so that the ASSRs have a higher SNR^{37}. Here, we employed two different methods, ‘AVG EEG’ and ‘AVG phase’, to extract the phase of a target frequency. The ‘AVG EEG’ method averages the filtered EEG signals of all trials and subsequently computes the phases of the averaged EEG. The phase values from this method tend to be dominated by epochs with strong ASSRs (i.e., with large amplitudes). Instead, the ‘AVG phase’ method averages the extracted phases from all EEG epochs with a duration that is an integer multiple of the signal period (1 s in this study), without considering amplitude information (as used in^{20}). For this method, all EEG epochs thus contribute with equal weight to the estimate. For comparison, we used both methods to extract the phases on each EEG channel independently.
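The two averaging schemes differ only in the order of operations. A minimal sketch (epochs as rows of an array; epoch length an integer number of signal periods; illustrative only):

```python
import numpy as np

def avg_eeg_phase(epochs, f, fs):
    """'AVG EEG': average the waveforms over epochs first, then take
    the Fourier phase at f (amplitude-weighted by construction)."""
    k = int(round(f * epochs.shape[1] / fs))      # FFT bin of f
    return float(np.angle(np.fft.rfft(epochs.mean(axis=0))[k]))

def avg_phase(epochs, f, fs):
    """'AVG phase': take the Fourier phase at f per epoch, then average
    the unit phase vectors (each epoch weighted equally)."""
    k = int(round(f * epochs.shape[1] / fs))
    ph = np.angle(np.fft.rfft(epochs, axis=1)[:, k])
    return float(np.angle(np.exp(1j * ph).mean()))
```

For identical epochs the two estimates coincide; they diverge when a few high-amplitude epochs dominate the waveform average.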
SNR on target frequencies
It is difficult to separate the signals generated by ASSRs from background noise in the EEG on a single-trial basis. However, after averaging the EEG across a number of trials (100 trials in this study), the power of phase-locked frequencies is enhanced, whereas the power of the background EEG is suppressed because of its random phases. To quantify the SNR of each ASSR frequency component, we defined the neighboring SNR as^{3}: \(SNR({f_t}) = \frac{P({f_t})}{\frac{1}{N}\sum \nolimits _{i = 1}^N {P({f_i})} }\),
where \(P({f_t})\) is the power of the target frequency bin \(f_t\), and \(f_i\) is a neighboring frequency bin within a small range (± 0.5 Hz) of \(f_t\). For a 12-s EEG trial, the width of a frequency bin is 1/12 Hz and N = 12. Compared with the raw frequency spectrum of the EEG, the neighboring SNR is robust to the pink noise present in EEG signals (see Fig. 9).
Note that the neighboring SNR defined in this way is not an unbiased estimate of the true SNR; in fact, its expected value is \(E(SNR({f_t})) = SNR + 1\)^{38}. More specifically, the neighboring SNR follows an F distribution with (2, 2N) degrees of freedom when the target frequency bin contains no signal, where N is the number of neighboring frequency bins. Given a type-I error (e.g., \(\alpha\) = 0.05) from the F test, the significance threshold \(\theta\) is a function of the number of neighboring frequency bins^{39}, i.e., \(\theta\) = finv(1 − \(\alpha\), 2, 2N), using the MATLAB (2018a) function ‘finv’. For N = 12, the corresponding thresholds at significance levels p = 0.05 and p = 0.01 are 3.401 (5.318 dB) and 5.614 (7.492 dB), respectively.
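The neighboring SNR and its F-based threshold can be sketched as follows (scipy's `f.ppf` is the equivalent of MATLAB's `finv`; the helper names are ours):

```python
import numpy as np
from scipy.stats import f as f_dist

def neighboring_snr(power, freqs, ft, half_bw=0.5):
    """Power at the target bin divided by the mean power of the
    neighboring bins within +/- half_bw Hz (target bin excluded)."""
    i = int(np.argmin(np.abs(freqs - ft)))            # target bin index
    nb = (np.abs(freqs - freqs[i]) <= half_bw) & \
         (np.arange(len(freqs)) != i)
    return float(power[i] / power[nb].mean())

def snr_threshold(n_neighbors, alpha=0.05):
    """Significance threshold of the neighboring SNR under H0,
    from the F(2, 2N) distribution (MATLAB: finv(1-alpha, 2, 2*N))."""
    return float(f_dist.ppf(1.0 - alpha, 2, 2 * n_neighbors))
```

With N = 12 neighbors this reproduces the thresholds quoted above (3.401 at p = 0.05 and 5.614 at p = 0.01).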
Results
Simulation results
To illustrate the potential of ALPC with SFS in combination with zerophase cosine stimuli, we simulated different models and scenarios (cf. Fig. 1). In these examples, two parallel nonlinear subsystems with different delays could have different, or (partially) shared inputs.
Example 1: Two pure second-order subsystems with non-overlapping inputs
Figure 4 shows the simulation results of our ALPC-SFS algorithm for the simple example of Eq. (11). The total input signal, x(t), consisted of five frequency components: (17, 21, 27, 41, 49) Hz, where \(x_1\) = (17, 21, 27) Hz were fed to a pure second-order subsystem \(y_1\), and \(x_2\) = (41, 49) Hz to a similar subsystem \(y_2\). In this model, system \(y_1\) generated the distortion products \(\{2f_i, f_i \pm f_j\} = \{4, 6, 10, 34, 38, 42, 44, 48, 54\}\) Hz, and system \(y_2\) yielded the non-overlapping components \(\{8, 82, 90, 98\}\) Hz. Subsystems \(y_1\) and \(y_2\) had different latencies, \(\tau _1\) = 51 ms and \(\tau _2\) = 21 ms, respectively. Y is the combined output signal (representing the EEG), with additive white Gaussian noise (WGN), \(\varepsilon (t)\). We added noise with different variances, such that the SNRs were 5, 0, − 5, − 10, − 15, and − 20 dB (using MATLAB’s function ‘awgn’).
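The distortion sets quoted above can be reproduced directly. A small helper (illustrative, not the simulation code) enumerating \(\{2f_i, f_i \pm f_j\}\) of a pure squaring nonlinearity:

```python
from itertools import combinations

def second_order_products(freqs):
    """Distortion products {2*fi, fi + fj, |fi - fj|} generated by a
    pure second-order (squaring) nonlinearity driven by pure tones."""
    prods = {2 * f for f in freqs}               # second harmonics
    for fi, fj in combinations(freqs, 2):
        prods.add(fi + fj)                       # sum frequencies
        prods.add(abs(fi - fj))                  # difference frequencies
    return sorted(prods)

# y1 inputs (17, 21, 27) Hz -> [4, 6, 10, 34, 38, 42, 44, 48, 54] Hz
# y2 inputs (41, 49) Hz     -> [8, 82, 90, 98] Hz
```

Because the two output sets do not overlap, each subsystem's latency can in principle be recovered exactly, which is the scenario tested here.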
To estimate the unknown latencies, we assume that only the total input (stimulus) and output (EEG) are available, and that information regarding the subsystems (e.g., the input frequencies of each subsystem, and the system orders) is unknown. Figure 4 shows that our ALPC-SFS method separated the mixed outputs into the appropriate contributions from the two underlying subsystems, and correctly estimated their latencies. Here, we took the additional signal delay \(\Delta _t = 0\) for simplicity, and signal Y had an SNR of 5 dB. Figure 4a shows the input and output signals, and their spectra. Figure 4b illustrates the SFS procedure, starting from a randomly selected frequency component (here: 38 Hz) from the output frequency set (i.e., those frequencies whose power is higher than the background). The upper panels show the mean phase errors (MPEs) of all candidate frequencies at each step of the SFS; the bottom panel shows the corresponding estimated latencies of the candidate frequencies at each step. The MPE started to rise significantly after step 8, at which point a termination criterion of the SFS came into action (see Methods). Thus, the first nine (8 + 1) frequencies were selected, yielding an estimated latency of 51 ms. Subsequently, SFS was performed on the remaining frequency set, which gave an estimated latency for a second subsystem of 21 ms. Without SFS, an estimate based on all frequencies would result in only one dominant latency (51 ms), as shown for step 12 (Fig. 4b).
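At its core, the latency estimate for a selected frequency subset amounts to finding the delay that minimizes the MPE. A minimal Python sketch under idealized, noiseless conditions (in the actual method, phases are obtained from the EEG spectrum and the frequency subset is chosen by the SFS; grid parameters here are arbitrary choices):

```python
import math

def wrap(phi):
    """Wrap a phase angle to (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))

def alpc_latency(freqs, phases, t_max=0.2, step=1e-4):
    """Grid-search the latency tau minimizing the mean phase error (MPE).
    For zero-phase cosine inputs, a pure delay tau leaves the residuals
    wrap(phi_i + 2*pi*f_i*tau) near zero at the true latency."""
    best_tau, best_mpe = 0.0, float("inf")
    for k in range(int(t_max / step) + 1):
        tau = k * step
        mpe = sum(abs(wrap(p + 2 * math.pi * f * tau))
                  for f, p in zip(freqs, phases)) / len(freqs)
        if mpe < best_mpe:
            best_tau, best_mpe = tau, mpe
    return best_tau, best_mpe

# 2nd-order products of subsystem y1, delayed by 51 ms
freqs = [4, 6, 10, 34, 38, 42, 44, 48, 54]
tau_true = 0.051
phases = [wrap(-2 * math.pi * f * tau_true) for f in freqs]  # "measured" phases
tau_hat, mpe = alpc_latency(freqs, phases)
print(round(tau_hat * 1000, 1))  # 51.0 (ms), with MPE ~ 0
```

Because the frequency set has a greatest common divisor of 2 Hz, the MPE minimum is unique within the 0–200 ms search window.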
The relationship between the SNR and the accuracy of the estimated latencies is provided in Table A1 of the Supplemental Material.
We also considered a scenario in which \(\Delta _t\) was not zero. We first estimated the pseudo latencies and then applied time compensation (TC) to determine the actual latencies. The pseudo latency exploits the property that the initial phases of the nonlinear distortions of zero-phase cosine inputs remain identical (see Appendix C in the Supplemental Material for details). In this way, we do not need to apply phase compensation. However, when \(\Delta _t\) exceeds the actual latency, the corresponding pseudo latency will be negative (Eq. 6), yielding a negative slope for the phase-lag vs. frequency relation (see Fig. 5).
For comparison, we also applied the MSPC method^{20} to this simple model. This method also estimates a system latency by minimizing the MPE over a time window. It requires the total input (stimuli) and output (EEG), and has to make an assumption about the underlying system orders. In case of a correct selection of inputs and outputs and correct system-order information, MSPC may perform as well as ALPC-SFS (Fig. 3). However, when the subsystems' inputs and outputs are not known a priori, MSPC may fail to correctly identify the systems and produce a biased estimate. Figure 6 demonstrates this for the same settings as in Fig. 4. The MSPC method estimated only one biased latency (close to the dominant one) for the assumed 2nd-order system, and it failed to find the second hidden subsystem, which produced fewer nonlinear distortion products.
Example 2: Models containing multiple system orders
We also applied both methods to a more complex model, in which each subsystem contained both a second- and a third-order nonlinearity:
The total input signal X(t) contained four frequency components: {37, 43, 38, 46} Hz, where \(x_1\) = (37, 43) Hz were fed to subsystem \(y_1\), and \(x_2\) = (38, 46) Hz to subsystem \(y_2\). In this example, subsystem \(y_1\) generated the following distortion product set: {6, 74, 80, 86, 31, 37, 43, 49, 111, 117, 123, 129} Hz, and subsystem \(y_2\) yielded the non-overlapping set: {8, 76, 84, 92, 30, 38, 46, 54, 114, 122, 130, 138} Hz. For both frequency sets, the first four entries are 2nd-order distortion products and the remaining components are 3rd-order outputs. Subsystems \(y_1\) and \(y_2\) had latencies \(\tau _1\) = 51 ms and \(\tau _2\) = 21 ms, respectively. Y is the combined output signal, representing the EEG.
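Both frequency sets can be generated mechanically by summing signed combinations of the input frequencies. An illustrative Python helper for arbitrary orders (the function name is hypothetical, not part of the published scripts):

```python
from itertools import combinations_with_replacement, product

def distortion_products(freqs, order):
    """All frequencies |s1*f1 + ... + sn*fn| (signs si = +/-1, inputs drawn
    with repetition) produced by an order-n polynomial nonlinearity."""
    out = set()
    for combo in combinations_with_replacement(freqs, order):
        for signs in product((1, -1), repeat=order):
            f = abs(sum(s * c for s, c in zip(signs, combo)))
            if f > 0:  # discard the DC term
                out.add(f)
    return sorted(out)

print(distortion_products((37, 43), 2))  # [6, 74, 80, 86]
print(distortion_products((37, 43), 3))  # [31, 37, 43, 49, 111, 117, 123, 129]
```

Note that the 3rd-order set contains the input frequencies themselves (e.g., 37 = 37 + 43 − 43), as listed above.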
To estimate the latencies, we assumed that only the total input (X) and output (Y) were available, and that there was no prior information about the subsystems. Figure 7 shows the estimated latencies for the ALPC-SFS and MSPC methods. In this example, signal Y had an SNR of 5 dB. Without knowing the composition of subsystems \(y_1\) and \(y_2\), ALPC-SFS correctly estimated the latencies and separated the two original subsystems. In contrast, the MSPC method relied on an explicit assumption regarding the order of the potential subsystems. However, as in this example, within a given system order not all distortions are associated with a common latency. As a result, the MSPC method produced biased estimates.
Example 3: Two mixed subsystems with identical outputs
We also simulated the more challenging scenario in which the two subsystems had overlapping output components. In this case, only the distorted phases can be extracted from the mixed output signals, and the relative amplitude gain of the output components (related to \(\psi _{mr}\) in (1)) determines the estimated latencies. The simulated model reads:
where the relative amplitude gain \(\xi >0\), and the latencies are \({\tau _1}=15\) and \({\tau _2}=20\) ms. Both \(y_1\) and \(y_2\) had identical input frequencies (17, 21, 27) Hz, so that they generated nine identical 2nd-order output components: {4, 6, 10, 34, 38, 42, 44, 48, 54} Hz. We varied the relative gain \(\xi = [\frac{1}{{100}},\frac{1}{{10}},\frac{1}{6},\frac{1}{4},\frac{1}{3},\frac{1}{2},\frac{1}{{\sqrt{2} }},1,\sqrt{2} ,2,3,4,6,10,100]\), and simulated the mixed signals Y without noise, and under SNRs of (5, 0, − 5, − 10, − 15, − 20) dB. The latency estimates as a function of the relative gain are shown in Fig. 8.
The simulations show that in case of total overlap of the system outputs, the estimated latency is a weighted average of the underlying true latencies, where the relative gain of the systems serves as the weighting factor. After normalizing the estimated latencies (on the pure, noiseless signal) to a range between 0 and 1, we fitted the estimates with a sigmoid function \(f(x) = \frac{a}{1 + e^{-bx}}\), with x the natural logarithm of the relative gain, and fit coefficients a = 1.00 and b = 1.17, with 95% confidence bounds (0.9944, 1.007) and (1.143, 1.196), respectively. In general, the estimated latency for the superposition of the two system outputs with latencies (\({\tau _1} \le {\tau _2}\)) follows \({\tau _{mix}} = \frac{{\tau _2} - {\tau _1}}{1 + e^{-1.17\ln (\xi )}} + {\tau _1}\), which is equivalent to the weighted average \({\tau _{mix}} = \frac{{\tau _1} + {\xi ^{1.17}}{\tau _2}}{1 + {\xi ^{1.17}}}\).
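This weighted-average behavior can be reproduced analytically: at each overlapping frequency, the measured phase is the angle of the sum of the two delayed, amplitude-weighted phasors. A Python sketch (illustrative only; the grid parameters are arbitrary choices):

```python
import cmath, math

def mixed_phase(f, tau1, tau2, xi):
    """Phase of cos(2*pi*f*(t - tau1)) + xi*cos(2*pi*f*(t - tau2)), i.e. the
    angle of the phasor sum e^(-i*2*pi*f*tau1) + xi*e^(-i*2*pi*f*tau2)."""
    return cmath.phase(cmath.exp(-2j * math.pi * f * tau1)
                       + xi * cmath.exp(-2j * math.pi * f * tau2))

def alpc_latency(freqs, phases, t_max=0.1, step=1e-5):
    """Latency minimizing the mean absolute wrapped phase residual."""
    best = min(
        (sum(abs(math.atan2(math.sin(p + 2 * math.pi * f * tau),
                            math.cos(p + 2 * math.pi * f * tau)))
             for f, p in zip(freqs, phases)) / len(freqs), tau)
        for tau in (k * step for k in range(int(t_max / step) + 1)))
    return best[1]

freqs = [4, 6, 10, 34, 38, 42, 44, 48, 54]
tau1, tau2 = 0.015, 0.020
for xi in (0.01, 1.0, 100.0):
    phases = [mixed_phase(f, tau1, tau2, xi) for f in freqs]
    print(xi, round(alpc_latency(freqs, phases) * 1000, 2))
# tau_mix moves from near tau1 (xi << 1), via (tau1 + tau2)/2 (xi = 1),
# to near tau2 (xi >> 1), as in Fig. 8.
```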
On the signal without noise, each selected starting frequency of the SFS yielded the same results for latency, MPE, and selected frequency components (i.e., output components). On the signal with noise, selecting a different starting frequency for the SFS yielded similar results for latency and MPE (as shown in Fig. 8), but could slightly differ regarding the selected frequency components, because frequency components with a large phase shift (due to low SNRs) are not selected by SFS.
Measured ASSR components in the EEG
Table 1 shows the nonlinear interactions between the four stimulus input frequencies up to the 6th order, and the corresponding frequencies that could potentially be present in the measured ASSR. The odd-order distortion products (around 500 Hz and higher; e.g., \(2f_1 - f_2\)) fall above the preprocessed EEG bandwidth (< 200 Hz). In this study, we used the same four frequencies (with different SPLs) as stimuli for both ears. As a result, all potential binaural beats (BBs, if significant) overlap with the monaural beats (MBs), whereas MBs often show much larger ASSR amplitudes than BBs^{40}. Therefore, the ASSRs analyzed in this study result from a superposition of MBs and potential BBs, but are dominated by the MBs. This study therefore focused only on the method to estimate ASSR latencies, rather than on disentangling the contributions of MBs vs. BBs.
In the ASSRs, two predominant MBs (i.e., 33 and 39 Hz) were generated by the second-order difference of the two louder pure tones in each ear. Their full 2nd-order interactions (including harmonics and their intermodulations, i.e., a subset of the 4th-order interactions of the input frequencies) and 3rd-order interactions (i.e., a subset of the 6th-order interactions of the input frequencies) also show high SNRs. Note that Table 1 shows a subset (< 200 Hz) of the theoretical ASSR frequencies corresponding to each system order. In addition to the 4th-order output frequencies shown in Table 1, more 4th-order output frequencies could potentially be generated among the full set of 2nd-order output frequencies, e.g., 29 and 35 Hz. Such output frequencies, however, were not considered in this study because of their much lower SNRs.
Figure 9 shows that, in the measured EEG, all theoretical ASSRs exceeded both the background level (i.e., that of non-ASSR integer frequency components) of 6.3 (± 0.6) dB and the threshold of 7.492 dB (p = 0.01). To avoid the potentially detrimental effect of EMG artifacts, we averaged the top-3rd SNRs at each integer frequency across the ten subjects, measured on both recording days. The measurements of day 1 and day 2 showed highly similar SNR values. The grand-average (n = 20, days 1 and 2) SNR across all 61 EEG channels showed that ASSRs tended to have their maximum activation at both temporal lobes and the frontal-central region (bottom panel of Fig. 9). Results from the individual subjects are provided in the Supplemental Material, section D.
Estimated latencies from ASSRs
The MB frequencies showed a higher SNR (or LCI) than the other ASSR components. Therefore, in applying ALPC-SFS, we estimated latencies by starting with the pair of MB frequencies (i.e., 33 and 39 Hz). The MB frequency pair determines a search direction for the SFS, so that it finds the involved ASSR components that share a common latency. Similarly, we evaluated the underlying latencies dominated by the high-frequency (HF) components (> 80 Hz) by starting the SFS with the 3MB frequencies (i.e., 99 and 117 Hz). The ten subjects yielded similar results for the measurements on both days. The results of day 2 are shown in Fig. 10, where the phase extraction was performed using the ‘AVG EEG’ procedure (see Methods).
The estimated latencies were found to vary with the location of the EEG channels. Table 2 includes the estimated latencies for three representative EEG channels: P7 (left TL), FCz (FC region) and P8 (right TL). Channels FCz and P8 were chosen because they yielded the largest grand-average LCIs. P7 is located contralateral to P8, and was selected for comparison.
The results based on AVG phase and AVG EEG were quite comparable (see Fig. 11). The estimated latencies for the low-frequency (LF) distortion products on EEG channels FCz and P7 had a non-Gaussian distribution, so we report their median values and percentiles. The MPEs at the estimated latencies were low (< 10%) and at a similar level for all estimates of the LF- and HF-dominated systems, which is indicative of a reliable estimation. The most consistent latencies were obtained from EEG channel P8, which yielded a large ASSR response (i.e., high SNRs and LCIs). As the power of the high-order responses (e.g., 4th and 6th orders) tended to decrease, the corresponding ASSR components were not significant on certain EEG channels, causing a less reliable latency estimate. We therefore selected a subject-specific best channel with the largest LCI value, instead of P7, for estimating the HF-dominated latencies, which yielded a median latency of 21.1 ms, with 19.9 and 24.3 ms as the 25% and 75% percentiles, respectively. The ‘best ch’ was mainly selected from the FC region, and thus shows a similar median value as FCz. The HF-dominated latency generally showed a larger variance than the LF-dominated latency, which relates to the smaller LCI of the HF components.
The LCI was computed on K = 1200 (100 trials × 12 s) EEG epochs of 1-s duration for each ASSR component. According to the theoretical threshold of \(\sqrt{{{3}}/K}\) (see Methods), the 95% confidence threshold for a significant LCI was 0.05. We used the average LCI of the MBs and 3MBs for the LF and HF components, respectively. Table 2 shows that the LCIs for LF components were significantly higher than those for HF components, which is mainly due to the decreased SNRs of the higher-order ASSRs (see Fig. 9). The lower LCIs on the HF components partly explain the larger variance of the estimated latencies.
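The threshold value follows from the null distribution of the resultant phasor length: for K epochs with random phases, the squared LCI is approximately exponential with mean 1/K, so the 95% point is \(\sqrt{-\ln(0.05)/K} \approx \sqrt{3/K}\). In Python (a numerical check of this arithmetic only):

```python
import math

def lci_threshold(K):
    """95% confidence limit of the LCI over K independent epochs,
    sqrt(3/K) as given in Methods; the 3 approximates -ln(0.05) ~= 3.0."""
    return math.sqrt(3.0 / K)

K = 1200  # 100 trials x 12 one-second epochs
print(lci_threshold(K))                          # 0.05
print(round(math.sqrt(-math.log(0.05) / K), 4))  # 0.05
```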
Effect of epoch filtering
We excluded poorly phase-locked epochs by applying epoch filtering (see Methods) and compared the results for four different thresholds (\(\theta\) = 0, 0.4, 0.6, and 0.8) for the phase-lock values (PLV). These PLV thresholds were chosen empirically (\(\theta\) = 0 means that 100% of the EEG epochs were included), such that the included EEG epochs amounted to around 60%, 30% and 10% for \(\theta\) = 0.4, 0.6, and 0.8, respectively. The estimated latencies were robust against different values of \(\theta\), as shown in Fig. 11. The cause of this stability is that the ALPC-SFS method employs multiple ASSR frequencies to determine a latency, and among them the MB frequencies show generally stable phase values across subjects. As a result, the Average EEG and Average Phase extraction methods yielded similar results. Around 10% of all EEG epochs (i.e., \(\theta\) = 0.8) generated similar results to those using all EEG epochs, which indicates that the latency is mainly determined by EEG epochs that are strongly phase-locked to the stimuli. On the other hand, using fewer EEG epochs increased the variance of the estimated latency across subjects.
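The PLV itself is the resultant length of the phases of a component across epochs, i.e., the magnitude of the mean unit phasor. A self-contained Python illustration with synthetic phases (not EEG data; see Methods for the exact per-epoch filtering criterion):

```python
import cmath, math

def plv(phases):
    """Phase-locking value across epochs: magnitude of the mean unit
    phasor, ranging from 0 (random phases) to 1 (perfect locking)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

K = 100
locked = [0.8] * K                                  # perfectly phase-locked
uniform = [2 * math.pi * k / K for k in range(K)]   # no phase locking
print(round(plv(locked), 3))            # 1.0
print(round(plv(uniform), 3))           # 0.0
print(round(plv(locked + uniform), 3))  # 0.5
```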
A comparison with the MSPC method
We compared our ALPC-SFS method with the MSPC method for estimating the ASSR latencies from the recorded EEG. Figure 12 shows the results obtained from one representative subject. The ALPC-SFS method selects a group of ASSR frequencies that share a common latency across different potential system orders (panel b). In contrast, the MSPC method computes a latency by assuming a specific system order. All significant ASSRs (i.e., LCI > 0.05) were used. Note that the LCI is mathematically equivalent to the phase-coupling strength employed in the MSPC method (see Methods). The latter method, however, led to larger MPEs, because the ASSRs of the same order could have been generated by subsystems with different latencies, yielding poorer linear-regression results for the phase-lag vs. frequency relation (see Fig. 12).
Table 3 shows the estimated latencies from the MSPC method for the 20 measurements on two EEG channels, FCz and P8, on which subjects generally showed more significant ASSRs. Compared with the ALPC-SFS based MPE values in Table 2, the MPE values (> 0.3) in Table 3 indicate that the estimated latencies from the MSPC method were not reliable. In addition, the low LCI values show that many ASSRs were weak and not significant (i.e., LCI < 0.05), and were excluded from the latency estimation. Note that the median values of the ALPC-SFS vs. MSPC-based latencies were similar (e.g., when based on 2nd-order outputs). The ALPC method becomes mathematically equivalent to the MSPC method when the assumed system information is correct. However, compared with the ALPC-SFS method, the MSPC method lacks a procedure for ASSR frequency selection, and instead relies on a prior assumption regarding the system order to group frequencies (see Fig. 12). Such an assumption may be appropriate for the skeletal motor system^{20}, but may be less suitable for analyzing the human auditory system. For example, for the 4th-order distortion products, the MSPC method would include the 2MBs, which are already generated at the cochlea, as well as the difference frequencies (e.g., 6 Hz) of the MBs from both ears, which are generated at higher levels in the binaural auditory pathways (e.g., at the inferior colliculus, or at auditory cortical levels). These different components may well have different latencies, and should then not be lumped into the same subsystem for estimating a latency.
Discussion
Advantages of the proposed method and stimuli
ALPC combined with SFS and time compensation has three major advantages over existing methods: (i) It requires no prior assumptions about the underlying nonlinear systems (e.g., system orders, or the number of subsystems) to calculate the unknown phase information of the ASSR components through phase compensation, and is therefore easier to implement. (ii) It allows for the estimation of multiple underlying latencies. (iii) It allows for the analysis of different nonlinear (sub)systems with different nonlinearities in one pass, and automatically selects the contributing ASSR components with the same latency (and potential common generators). Therefore, ALPC-SFS is better suited for analyzing nonlinear, higher-order ASSRs that are thought to underlie complex brain functions.
Moreover, the use of multi-cosine stimuli with all initial phases set to zero provides two additional advantages. (i) Compared to stimuli with arbitrary initial phases, it avoids the complicated procedure of phase compensation, and thus requires no prior assumptions regarding the underlying system nonlinearities. Note that the stimuli should preferably be cosines (not sines) with zero initial phase, because for multi-cosine stimuli, nonlinear distortion products of arbitrary (even and odd) order all have zero initial phase, so that one can readily apply ALPC-SFS with time compensation. As shown in Appendix C of the Supplemental Material, a sum of sines maintains a common initial phase for only a specific subset of distortion products, because the nonlinear distortions of the sine generate phase shifts \(\in \{0, \pi /2, \pi , 3\pi /2\}\) that depend on the system order and the nonlinear interactions. (ii) The estimated latency for zero-phase distortions is more robust against unavoidable small system errors (e.g., small delays caused by the tubes and measurement devices). It is often difficult to directly measure the exact phase values of the stimulus carriers at the onset of experimental trials, i.e., at the exact time at which the sound stimuli hit the ear drums. As a result, even a small timing error of one millisecond will already significantly change the phases of high-frequency stimuli (e.g., > 1 kHz). Consequently, small errors in the stimulus phases may lead to an unreliable latency estimate for methods that require exact initial phase values of the stimuli. In contrast, for multi-cosine stimuli the exact onset phase at the ears is not critical, as the estimated latency will be the sum of the true latency and the unknown (small) system delay, which can be readily incorporated as a fixed time compensation (described in Methods).
Therefore, to evaluate higher-order ASSRs, our proposed cosine stimuli with zero phase serve as a convenient alternative to multiple-sine stimuli with arbitrary initial phases; the ALPC-SFS method can also be applied to amplitude-modulated stimuli, provided that the initial phases of the modulation frequencies are all set to zero.
The ALPC-SFS method is a phase-based latency-estimation method. In addition to phase-based methods, amplitude-based methods have been developed, in which the latency is estimated by the time shift that maximizes the statistical correlation between the envelope of the stimulus signal (e.g., a frequency modulation) and the response^{41,42}. In general, phase-based latency-estimation methods are suitable when a limited number of ASSR frequencies is present, whereas amplitude-based methods are appropriate for broadband response signals with a continuous frequency spectrum. It is worth noting, however, that for analyzing higher-order ASSRs, phase-based methods may outperform amplitude-based methods, because the latter analyze the envelopes of the stimuli and responses, which are limited to 2nd-order nonlinear distortion products.
Limitations of the method
The proposed ALPC-SFS method can estimate multiple ASSR latencies from a single EEG channel. However, it has two major limitations. First, the estimated latency is directly associated with ASSR frequencies, rather than with underlying discrete neural sources. Thus, the method cannot disentangle the contributions from multiple neural generators; revealing the potential ASSR sources behind the estimated latencies requires prior information about the relationship between ASSR frequencies and potential neural sources. Second, when the same ASSR frequencies stem from multiple sources (which may all contribute to the measured EEG), the ALPC-SFS estimate will fall between the true latencies, where the exact value is determined by the relative ASSR amplitudes of the original systems (Fig. 8).
One of our future goals will be to associate a potential latency change (even to infinity, i.e., no response) with (dys)function of the auditory system of the hearing impaired. To this end, in addition to estimating (and longitudinally following) the ASSR latencies, we also need to evaluate the underlying neural generators. For that, it will be helpful to take advantage of the multi-electrode setup, and to employ sophisticated source-localization techniques that enable a better spatial resolution^{7}, such as Bayesian model averaging^{43}, beamforming techniques^{44}, and multivariate source separation^{45}.
Pitfalls of EEG montage
For the preprocessing of EEG signals, it is detrimental to re-reference all channels to an EEG channel that is not neutral with respect to ASSRs, since a reference EEG channel containing strong ASSRs will affect all other EEG channels. For example, Fig. 13 shows the phase values of all EEG channels at one MB frequency. After re-referencing to EEG channel Cz (which carries a large ASSR SNR), the distributions of the SNR and phase values clearly changed: the phase of all EEG channels became similar to that at Cz. Therefore, it is recommended to either use the CAR montage^{34}, or to re-reference to a neutral electrode. On the other hand, the CAR montage also has its drawbacks, as it may cause phase ambiguity on EEG channels on which the ASSRs are relatively weak. Certain ASSR components might be sign-reversed, resulting in a phase shift of \(\pi\). For that reason, EEG channel P7 shows a larger variance in the estimated latency. To avoid the influence of such phase ambiguity, it is good practice to use EEG channels that show relatively strong ASSRs, e.g., at the frontal-central (FC) and right temporal-lobe regions. Note that completely reversing an EEG channel (i.e., y = −x) causes the same phase shift of \(\pi\) on all frequency components. It thus only changes the bias term of a fitted line (phase lag vs. frequency) without changing the slope, and therefore does not affect the results of ALPC, i.e., the apparent latency.
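That a sign flip leaves the apparent latency unchanged is easy to verify: adding \(\pi\) to every phase changes only the intercept of the phase-lag vs. frequency line, not its slope. A Python check with toy values (chosen small enough that no phase wrapping occurs):

```python
import math

def fit_slope(freqs, phases):
    """Least-squares slope of phase lag vs. frequency; the apparent
    latency is -slope / (2*pi)."""
    n = len(freqs)
    fm, pm = sum(freqs) / n, sum(phases) / n
    num = sum((f - fm) * (p - pm) for f, p in zip(freqs, phases))
    return num / sum((f - fm) ** 2 for f in freqs)

tau = 0.005  # 5-ms apparent latency
freqs = [4, 6, 10, 14]
phases = [-2 * math.pi * f * tau for f in freqs]  # original channel
flipped = [p + math.pi for p in phases]           # sign-reversed channel (y = -x)
print(round(-fit_slope(freqs, phases) / (2 * math.pi), 4))   # 0.005
print(round(-fit_slope(freqs, flipped) / (2 * math.pi), 4))  # 0.005
```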
Multiple latencies of ASSRs
To estimate reliable latencies with the ALPC-SFS method, several important issues should be considered: (i) select appropriate starting parameters for the SFS; (ii) use only EEG channels with a relatively high SNR (or LCI); and (iii) use EEG epochs that show high phase-lock values. An important parameter for the SFS is the starting frequency. SFS is a suboptimal method, and the starting frequency (or frequency pair) determines the direction of the heuristic search. It is good practice to start with the ASSR components with the highest SNR (or LCI), like the two major MB frequencies (33 and 39 Hz) in our experiments. Our simulation results demonstrate that the SNR of the ASSRs affects the probability that two subsystems can indeed be separated (see Table A1 of the Supplemental Material). Starting the SFS with these two MB components also identified the other ASSR components with the same latency. We also started the SFS from the 2MBs or from the (4, 6) Hz pair. However, only when the SFS started from the major MBs or from the 3MBs did we obtain consistent results across subjects. The ASSRs at 4 and 6 Hz are weak in most subjects, so that no reliable phase values are available for estimating the latency.
ASSR sources are located in multiple cortical as well as subcortical regions: ASSRs (< 20 Hz) originate from the auditory cortex; ASSRs around 40 Hz may arise from multiple locations, including the brainstem, thalamus and the auditory cortex; and high-frequency ASSRs (80–100 Hz) are thought to be dominated by the upper region of the brainstem^{2,47}. Our analysis revealed two major latencies from the ASSR components. The LF-dominant system (near 40 Hz) showed longer latencies than the HF-dominant system (> 80 Hz). This finding corroborates previous studies using amplitude-modulation (AM) stimuli on EEG or MEG signals^{1,22}. The LF and HF components correspond to the frequency ranges where the maximum activation of ASSRs is thought to arise from the cortex (\(\approx\) 45 Hz) and the brainstem (\(\approx\) 90 Hz), respectively^{27}. For the LF-dominant system, the two TL regions showed similar latencies (\(\approx\) 41 ms), whereas the FC region was endowed with a slightly longer latency (\(\approx\) 52 ms). This observation is consistent with the results from an EEG-based binaural-hearing study^{46}. A longer latency suggests an origin in the upper stages of the ascending pathways of the auditory system. Thus, the longer latency in the FC region could perhaps arise from binaural interactions in the cortex, following monaural interactions. For the HF-dominant system, however, the estimated latency in the FC region (near FCz) was shorter than at both TL regions (P7 and P8), similar to what was reported in^{22}. For these HF components, the brainstem has been considered the major underlying source^{27}. In the monaural auditory pathway, the signal may propagate from the lower brainstem, through the upper brainstem (which is anatomically located near the FC region), to the auditory cortex (located near the TLs).
It is further interesting to note that both the LF- and HF-dominant systems contained some overlapping ASSR components (see Fig. 10). This suggests that multiple neural mechanisms, characterized by different frequency ranges, may still enter a phase-locked state and thus show a common latency. This finding is in line with the observed phase synchronization across multiple brain regions and functional stages in the hearing pathways and high-level cortex^{3,48}.
ASSRs are right-lateralized, i.e., the right-hemisphere response exceeds the left-hemisphere response, as has been reported in several ASSR studies^{7,12}. We also found this in the present study for both LF and HF ASSRs (see Table 2): the EEG channel at P8 showed a similar latency, but a smaller variance, than at location P7 (see Fig. 11). In addition, the frontal-central (FC) brain region (near FCz) showed the strongest ASSRs. However, the latencies estimated from FCz had a larger variance than those from P8. In particular, Fig. 11 shows two latency clusters: one major cluster around 52 ms and a smaller one around 25 ms. The major latency is longer than the TL-region latency (around 40 ms), which might represent a binaural cortical interaction, i.e., the 4th- and 6th-order distortion frequencies are mainly generated by the interaction between the beat frequencies of both ears. The smaller cluster (n = 6 out of 20 measurements) should not simply be considered as outliers, because the FC brain region may represent the superposition of responses from multiple underlying sources (resulting in a weighted-average latency estimate for brainstem and cortex). Furthermore, some subjects may have produced stronger ASSRs from the upper brainstem than from the cortex, which would affect the estimated latency, as the brainstem-related latency is much smaller than the cortex-related latency^{21,49}.
Possible applications
The ascending auditory system, from auditory nerve to auditory cortex, contains several parallel pathways of acoustic signal processing, which eventually reach the cortical areas. For example, the localization of a sound source requires the neural processing of three types of acoustic cues, which are extracted by independent neural circuits within the auditory brainstem: interaural time differences (ITDs) and interaural level differences (ILDs) for locations in the horizontal plane, and direction-dependent spectral-shape cues from the pinna for directions in the vertical plane (Fig. 14)^{50}.
The binaural difference cues operate on complementary frequency bands: as the ITD for humans varies from about − 600 μs (sound at the far left) to + 600 μs (sound at the far right), this cue only provides unambiguous phase information for sound frequencies up to ~ 1500 Hz. For these low frequencies, phase-locked neurons in the auditory nerve, the anteroventral cochlear nucleus (AVCN), and the medial superior olive (MSO) carry precise phase information about the carrier frequency. In the MSO, neurons respond to the interaural phase difference at their characteristic frequency. Because the phase difference is frequency-dependent for a given delay (\(\Delta \varphi = 2\pi f \Delta \tau\)), MSO neurons that are tuned to a specific frequency and phase difference thus encode a particular interaural absolute delay, and hence a particular direction in the horizontal plane. Recently, a periodic representation of ITDs has also been demonstrated for the human cortex^{51}. Neurons that are tuned to the same direction in azimuth, but are sensitive to other frequencies, will be tuned to different phase differences. In this way, the neuronal processing resembles the ALPC algorithm, which also relies on frequency-dependent phase differences to estimate the unique underlying (characteristic) delay^{50,52,53}.
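The frequency limit of the ITD cue follows from \(\Delta \varphi = 2\pi f \Delta \tau\): with a maximal human ITD of about 600 μs, the interaural phase difference spans a full cycle at \(f = 1/\Delta \tau \approx 1667\) Hz, which is roughly why the cue becomes ambiguous near ~ 1500 Hz. A short numerical check in Python:

```python
import math

itd = 600e-6  # maximal human ITD (s)
for f in (500, 1000, 1500):
    # interaural phase difference, expressed in units of pi
    print(f, round(2 * math.pi * f * itd / math.pi, 2))
print(round(1 / itd))  # 1667: frequency at which dphi completes a full cycle
```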
A second pathway processes the ILDs that arise for frequencies above ~ 3 kHz because of the head-shadow effect. Cells in the lateral superior olive (LSO) are tuned to the ILD for a particular frequency. The ILD cue depends strongly on the sound’s frequency: the higher the frequency, the stronger the head-shadow effect. Combining the responses from a population of neurons, tuned to different frequencies, is needed to estimate the veridical azimuth angle of the sound source. It is conceivable that the final stage in estimating the sound’s azimuth angle involves nonlinear integrative mechanisms for both binaural pathways. Finally, the dorsal cochlear nucleus (DCN) feeds a monaural signal pathway via the central nucleus of the Inferior Colliculus (ICC) to the cortex, and is possibly involved in the localization of source elevation (spectral-cue extraction), and pitch perception.
It is conceivable that the different auditory processing mechanisms may be endowed with different delays and different nonlinear processes. As the sound-localization mechanisms operate in different frequency bands, they could potentially introduce their own nonlinear distortions, and these may, in principle, be separated by our ALPC-SFS method, by applying specific frequency combinations to either ear. Similarly, the ALPC-SFS method may be used to test quantitative models of the ascending auditory system^{19,54}.
In addition to evaluating the steady-state dynamics of the ASSRs (as done in the present study), the ALPC-SFS method could be extended to determine the latencies associated with the instantaneous phases of the ASSR frequencies, obtained by either Gabor wavelets or the Hilbert transform^{55,56}. Such an extension could help to assess the neural mechanisms within the transition phases of auditory (or visual) processing, as may be observed immediately after stimulus onset^{33} or offset^{57}.
Conclusions
We proposed a nonparametric method, ALPC-SFS with time compensation, to estimate multiple latencies of the auditory system from ASSRs recorded on a single EEG channel. Compared with existing methods, our method requires no prior assumptions about the underlying nonlinearity (e.g., system orders), and can be readily implemented to analyze higher-order ASSRs when combined with a zero-phase cosine stimulus complex. We illustrated our method on recorded EEG signals, and it successfully identified two major (LF- and HF-dominated) latencies that were stable across subjects and between measurement days. The LF-dominated latencies were longer than the HF-dominated latencies, and may putatively be related to different brain regions (e.g., central-brain or temporal-lobe regions). The LF-dominated latency is mainly determined by EEG epochs that are strongly phase-locked to the stimuli. ALPC-SFS is promising as an objective measure for the ASSR latencies and for the underlying nonlinearities of both primary and non-primary auditory cortex (dys)function.
Data availability
The Matlab scripts for the ALPC-SFS method are available here: https://github.com/ieeeWang/ALPCSFSmethodMatlabscripts.
Abbreviations
 \(\alpha _{\Sigma _{i}}\) :

Current phase angle of output frequencies \(\in [0,\,2\pi ]\)
 \(\varphi _{\Sigma _{i}}\) :

Initial phase angle (at t = 0) of output frequencies \(\in [0,\,2\pi ]\)
 \(\varphi _i^{in}\) :

Initial phase angle (at t = 0) of input frequencies \(\in [0,\,2\pi ]\)
 \(\omega _{\Sigma _{i}}\) :

Output angular frequencies
 \(\omega _i^{in}\) :

Input angular frequencies
 \(\tau _e\) :

Estimate of real latency
 \(\tau _p\) :

Pseudo latency
 \(\Delta _t\) :

The onset delay of a target EEG epoch
 \(C_\tau ^i\) :

Latency consistency index (LCI) of the ith ASSR
 \(C_\tau ^{sig}\) :

95% Confidence limit of LCI
 \(d_\phi\) :

Slope of the phase lags versus frequency
 \(\xi\) :

Relative amplitude gain (\(\xi > 0\))
 \(\theta\) :

Threshold for PLV
 \(\phi ({f_i})\) :

Current phase of a frequency component, computed by the Fourier transform (FT)
 \(\bmod ( \cdot )\) :

Modulus function: remainder after division
 PLV:

Phase-locking value
 MPE:

Mean phase error at the estimated latency \(\in [0,\,2]\)
 MSPC:

Multispectral phase coherence
 ALPC:

Apparent latency from phase coherence
 SFS:

Sequential forward selection
 MB:

Monaural beat
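As an illustration of the phase-locking value (PLV) listed above, which quantifies inter-epoch phase consistency at a single frequency, a minimal Python sketch (the released analysis scripts are in Matlab; the variable names here are our own):

```python
import numpy as np

def plv(epoch_phases):
    """Phase-locking value across epochs at one frequency.

    epoch_phases: one phase angle (radians) per epoch.
    Returns a value in [0, 1]; 1 means identical phase in every epoch.
    """
    return float(np.abs(np.exp(1j * np.asarray(epoch_phases)).mean()))

rng = np.random.default_rng(1)
locked = rng.normal(0.5, 0.1, 200)        # tightly clustered phases
random_ = rng.uniform(0, 2 * np.pi, 200)  # uniformly random phases
print(plv(locked), plv(random_))  # near 1 vs. near 0
```

In the paper's pipeline, epochs whose PLV exceeds the threshold \(\theta\) are the ones retained for latency estimation.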
References
Picton, T. W., John, M. S., Dimitrijevic, A. & Purcell, D. Human auditory steady-state responses: Respuestas auditivas de estado estable en humanos. Int. J. Audiol. 42(4), 177–219 (2003).
Picton, T. W. Human Auditory Evoked Potentials (Plural Publishing, 2010). ISBN: 1597566225.
Farahani, E. D., Wouters, J. & van Wieringen, A. Contributions of non-primary cortical sources to auditory temporal processing. NeuroImage 191, 303–314 (2019).
Reyes, S. A. et al. Mapping the 40-Hz auditory steady-state response using current density reconstructions. Hear. Res. 204(1–2), 1–15 (2005).
Chandrasekaran, B. & Kraus, N. The scalp-recorded brainstem response to speech: neural origins and plasticity. Psychophysiology 47(2), 236–246 (2010).
Bidelman, G. M. Multichannel recordings of the human brainstem frequency-following response: scalp topography, source generators, and distinctions from the transient ABR. Hear. Res. 323, 68–80 (2015).
Coffey, E. B., Herholz, S. C., Chepesiuk, A. M., Baillet, S. & Zatorre, R. J. Cortical contributions to the auditory frequency-following response revealed by MEG. Nat. Commun. 7, 11070 (2016).
Luke, R., De Vos, A. & Wouters, J. Source analysis of auditory steady-state responses in acoustic and electric hearing. NeuroImage 147, 568–576 (2017).
Roberts, L. E., Bosnyak, D. J., Bruce, I. C., Gander, P. E. & Paul, B. T. Evidence for differential modulation of primary and non-primary auditory cortex by forward masking in tinnitus. Hear. Res. 327, 9–27 (2015).
van der Reijden, C. S., Mens, L. H. & Snik, A. F. Signal-to-noise ratios of the auditory steady-state response from fifty-five EEG derivations in adults. J. Am. Acad. Audiol. 15(10), 692–701 (2004).
Korczak, P., Smart, J., Delgado, R., Strobel, T. M. & Bradford, C. Auditory steady-state responses. J. Am. Acad. Audiol. 23(3), 146–170 (2012).
Undurraga, J. A., Haywood, N. R., Marquardt, T. & McAlpine, D. Neural representation of interaural time differences in humans: an objective measure that matches behavioural performance. J. Assoc. Res. Otolaryngol. 17(6), 591–607 (2016).
Skoe, E., Krizman, J., Spitzer, E. & Kraus, N. The auditory brainstem is a barometer of rapid auditory learning. Neuroscience 243, 104–114 (2013).
Lehmann, A. & Schönwiesner, M. Selective attention modulates human auditory brainstem responses: relative contributions of frequency and spatial cues. PLoS ONE 9(1), e85442 (2014).
O’Donnell, B. F. et al. The auditory steady-state response (ASSR): a translational biomarker for schizophrenia. In Supplements to Clinical Neurophysiology Vol. 62, 101–112 (Elsevier, 2013).
Diesch, E., Andermann, M., Flor, H. & Rupp, A. Interaction among the components of multiple auditory steady-state responses: enhancement in tinnitus patients, inhibition in controls. Neuroscience 167(2), 540–553 (2010).
Gransier, R. et al. The utility of eASSRs for CI fitting: from threshold determination to the assessment of modulation encoding. J. Hear. Sci. 8(2) (2018).
Joris, P., Schreiner, C. & Rees, A. Neural processing of amplitude-modulated sounds. Physiol. Rev. 84(2), 541–577 (2004).
van der Heijden, M. & Joris, P. X. Cochlear phase and amplitude retrieved from the auditory nerve at arbitrary frequencies. J. Neurosci. 23(27), 9194–9198 (2003).
Yang, Y. et al. A general approach for quantifying nonlinear connectivity in the nervous system based on phase coupling. Int. J. Neural Syst. 26(01), 1550031 (2016).
Herdman, A. T. et al. Intracerebral sources of human auditory steady-state responses. Brain Topogr. 15(2), 69–86 (2002).
Schoonhoven, R., Boden, C., Verbunt, J. & De Munck, J. A whole-head MEG study of the amplitude-modulation-following response: phase coherence, group delay and dipole source analysis. Clin. Neurophysiol. 114(11), 2096–2106 (2003).
Witham, C. L., Riddle, C. N., Baker, M. R. & Baker, S. N. Contributions of descending and ascending pathways to corticomuscular coherence in humans. J. Physiol. 589(15), 3789–3800 (2011).
Yang, Y., Solis-Escalante, T., van der Helm, F. C. & Schouten, A. C. A generalized coherence framework for detecting and characterizing nonlinear interactions in the nervous system. IEEE Trans. Biomed. Eng. 63(12), 2629–2637 (2016).
Lang, Z.-Q. & Billings, S. Evaluation of output frequency responses of nonlinear systems under multiple inputs. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 47(1), 28–38 (2000).
Peyton Jones, J. & Choudhary, K. Output frequency response characteristics of nonlinear systems part II: overlapping effects and commensurate multitone excitations. Int. J. Control 85(9), 1279–1292 (2012).
Purcell, D. W., John, S. M., Schneider, B. A. & Picton, T. W. Human temporal auditory acuity as assessed by envelope following responses. J. Acoust. Soc. Am. 116(6), 3581–3593 (2004).
Gransier, R. et al. Auditory steady-state responses in cochlear implant users: effect of modulation frequency and stimulation artifacts. Hear. Res. 335, 149–160 (2016).
Jain, A. & Zongker, D. Feature selection: evaluation, application, and small sample performance. IEEE Trans. Pattern Anal. Mach. Intell. 19(2), 153–158 (1997).
John, M. & Picton, T. Human auditory steady-state responses to amplitude-modulated tones: phase and latency measurements. Hear. Res. 141(1–2), 57–79 (2000).
Shils, J., Litt, M., Skolnick, B. & Stecker, M. Bispectral analysis of visual interactions in humans. Electroencephalogr. Clin. Neurophysiol. 98(2), 113–125 (1996).
Wang, L., Arends, J. B. A. M., Long, X., Cluitmans, P. J. M. & van Dijk, J. P. Seizure pattern-specific epileptic epoch detection in patients with intellectual disability. Biomed. Signal Process. Control 35, 38–49 (2017).
Ross, B., Picton, T. W. & Pantev, C. Temporal integration in the human auditory cortex as represented by the development of the steady-state magnetic field. Hear. Res. 165(1–2), 68–84 (2002).
Ludwig, K. A. et al. Using a common average reference to improve cortical neuron recordings from microelectrode arrays. J. Neurophysiol. 101(3), 1679–1689 (2009).
Widmann, A. & Schröger, E. Filter effects and filter artifacts in the analysis of electrophysiological data. Front. Psychol. 3, 233 (2012).
Wang, L., Long, X., Aarts, R. M., van Dijk, J. P. & Arends, J. B. A broadband method of quantifying phase synchronization for discriminating seizure EEG signals. Biomed. Signal Process. Control 52, 371–383 (2019).
Prado-Gutierrez, P., Martínez-Montes, E., Weinstein, A. & Zañartu, M. Estimation of auditory steady-state responses based on the averaging of independent EEG epochs. PLoS ONE 14(1), e0206018 (2019).
Dobie, R. A. & Wilson, M. J. A comparison of t test, F test, and coherence methods of detecting steady-state auditory-evoked potentials, distortion-product otoacoustic emissions, or other sinusoids. J. Acoust. Soc. Am. 100(4), 2236–2246 (1996).
Romão, M. & Tierra-Criollo, C. J. A Bayesian approach to the spectral F-test: application to auditory steady-state responses. Comput. Methods Programs Biomed. 183, 105100 (2020).
Ross, B., Miyazaki, T., Thompson, J., Jamali, S. & Fujioka, T. Human cortical responses to slow and fast binaural beats reveal multiple mechanisms of binaural hearing. J. Neurophysiol. 112(8), 1871–1884 (2014). https://doi.org/10.1152/jn.00224.2014.
Menell, P., McAnally, K. I. & Stein, J. F. Psychophysical sensitivity and physiological response to amplitude modulation in adult dyslexic listeners. J. Speech Lang. Hear. Res. 42(4), 797–803 (1999).
Martínez-Montes, E., García-Puente, Y., Zañartu, M. & Prado-Gutiérrez, P. Chirp analyzer for estimating amplitude and latency of steady-state auditory envelope following responses. IEEE Trans. Neural Syst. Rehabil. Eng. (2020).
Trujillo-Barreto, N. J., Aubert-Vázquez, E. & Valdés-Sosa, P. A. Bayesian model averaging in EEG/MEG imaging. NeuroImage 21(4), 1300–1319 (2004).
Popov, T., Oostenveld, R. & Schoffelen, J. M. FieldTrip made easy: an analysis protocol for group analysis of the auditory steady state brain response in time, frequency, and space. Front. Neurosci. 12, 711 (2018).
Cohen, M. X. & Gulbinaite, R. Rhythmic entrainment source separation: optimizing analyses of neural responses to rhythmic sensory stimulation. bioRxiv (2016).
Schwarz, D. W. & Taylor, P. Human auditory steady state responses to binaural and monaural beats. Clin. Neurophysiol. 116(3), 658–668 (2005). https://doi.org/10.1016/j.clinph.2004.09.014.
Farahani, E. D., Goossens, T., Wouters, J. & van Wieringen, A. Spatiotemporal reconstruction of auditory steady-state responses to acoustic amplitude modulations: potential sources beyond the auditory pathway. NeuroImage 148, 240–253 (2017). https://doi.org/10.1016/j.neuroimage.2017.01.032.
Tichko, P. & Skoe, E. Frequency-dependent fine structure in the frequency-following response: the byproduct of multiple generators. Hear. Res. 348, 1–15 (2017).
Plourde, G. et al. Attenuation of the 40-hertz auditory steady state response by propofol involves the cortical and subcortical generators. Anesthesiol. J. Am. Soc. Anesthesiol. 108(2), 233–242 (2008).
Van Opstal, J. The Auditory System and Human Sound-Localization Behavior (Academic Press, 2016). ISBN: 0128017252.
Salminen, N. H., Jones, S. J., Christianson, G. B., Marquardt, T. & McAlpine, D. A common periodic representation of interaural time differences in mammalian cortex. NeuroImage 167, 95–103 (2018).
Yin, T. C. Neural mechanisms of encoding binaural localization cues in the auditory brainstem. In Integrative Functions in the Mammalian Auditory Pathway 99–159 (Springer, 2002).
Vonderschen, K. & Wagner, H. Detecting interaural time differences and remodeling their representation. Trends Neurosci. 37(5), 289–300 (2014).
Verhulst, S., Altoe, A. & Vasilkov, V. Computational modeling of the human auditory periphery: auditory-nerve responses, evoked potentials and hearing loss. Hear. Res. 360, 55–75 (2018).
Greenblatt, R. E., Pflieger, M. & Ossadtchi, A. Connectivity measures applied to human brain electrophysiological data. J. Neurosci. Methods 207(1), 1–16 (2012).
Sameni, R. & Seraj, E. A robust statistical framework for instantaneous electroencephalogram phase and frequency estimation and analysis. Physiol. Meas. 38(12), 2141 (2017).
Otero, M., Prado-Gutiérrez, P., Weinstein, A., Escobar, M. J. & El-Deredy, W. Persistence of EEG alpha entrainment depends on stimulus phase at offset. Front. Hum. Neurosci. 14, 139 (2020).
Acknowledgements
We thank Yuan Yang from Northwestern University for his advice on the experiments and for providing the analysis code of the MSPC method. We thank the anonymous reviewers for their constructive criticism of an earlier version of this manuscript, which greatly helped to improve our paper. We also thank the students and colleagues at Radboud University and Delft University for their help in collecting the EEG datasets. This research was supported by the Dutch Organisation for Scientific Research, NWO-TTW Perspectief, project 'NeuroCIMT-Otocontrol', nr. 14689 (LW, EN), and EU Horizon 2020 ERC Advanced Grant 2016 'Orient', nr. 693400 (AJVO).
Author information
Contributions
L.W. developed the method, analyzed the data, and wrote the manuscript. L.W. and E.N. performed the EEG experiment together. A.O. acquired the funding and supervised the study. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, L., Noordanus, E. & van Opstal, A. J. Estimating multiple latencies in the auditory system from auditory steady-state responses on a single EEG channel. Sci. Rep. 11, 2150 (2021). https://doi.org/10.1038/s41598-021-81232-5