A GRU–CNN model for auditory attention detection using microstate and recurrence quantification analysis

Attention, as a cognitive ability, plays a crucial role in perception: it helps humans concentrate on specific objects in the environment while discarding others. In this paper, auditory attention detection (AAD) is investigated using different dynamic features extracted from multichannel electroencephalography (EEG) signals recorded while listeners attend to a target speaker in the presence of a competing talker. To this aim, microstate and recurrence quantification analysis are utilized to extract different types of features that reflect changes in the brain state during cognitive tasks. Then, an optimized feature set is determined by selecting significant features based on classification performance. The classifier model is developed by hybrid sequential learning that combines Gated Recurrent Units (GRU) and a Convolutional Neural Network (CNN) in a unified framework for accurate attention detection. The proposed AAD method shows that the selected feature set provides the most discriminative features for the classification process. It also yields the best performance compared with state-of-the-art AAD approaches from the literature in terms of various measures. The current study is the first to validate the use of microstate and recurrence quantification parameters to differentiate auditory attention using reinforcement learning without access to the stimuli.

In 2012, Mesgarani and Chang 21 showed that it is possible to decode auditory attention in multi-talker scenarios from brain signals. Speech spectrograms reconstructed from cortical responses to a mixture of speakers reveal the salient temporal and spectral features of the attended speaker, as if subjects were listening to that speaker alone. Consequently, both attended words and speakers can be decoded by a simple classifier trained on examples of single speakers. O'Sullivan et al. 46 showed that single-trial, unaveraged EEG data can be decoded to determine attentional selection in a naturalistic multi-speaker environment. They found a significant correlation between the EEG measure of attention and performance on a high-level attention task. They also identified neural processing at ~200 ms as critical for solving the cortical "cocktail party" problem when decoding attention at individual latencies. Previous approaches to decoding auditory attention have mainly focused on linear mappings between sound stream cues and EEG responses 51,52. More specifically, the mapping from auditory stimuli to cortical responses is typically referred to as the forward model or multivariate temporal response function (mTRF). It can be used in both the forward and backward direction to perform response function estimation and stimulus reconstruction, respectively. Fuglsang et al. 53 used the mapping from cortical responses to acoustic features, i.e., the backward model or stimulus reconstruction, to decode the attentional selection of listeners in multi-talker scenarios. With reverberant speech, they observed a late cortical response to the attended speech stream that encoded the temporal modulation of the speech signal without its reverberation distortion. de Cheveigne et al. 54,55 proposed an alternative to both forward and backward mapping.
1) DTU database: This dataset was published in 59 and acquired from 44 subjects (age 51-76), where 22 of them were
hearing-impaired (19 right-handed, 9 females) and the rest were normal-hearing (16 females, 18 right-handed) listeners. All EEG signals were collected at a sampling frequency of 512 Hz in two ways: ear-EEG with 6 electrodes (three per ear) and 64-channel scalp EEG recorded with a BioSemi ActiveTwo system. During the EEG recording, the subjects listened either to one of two simultaneous speech streams or to a single speech stream in a quiet condition. Two different audiobooks in Danish, read by a female and a male speaker (denoted 'Spk1' and 'Spk2' in further analysis), were used as speech stimuli, presented in 48 trials of ∼50 s each at 65 dB SPL. The audio files were filtered with a low-pass second-order Butterworth filter to avoid excessive high-frequency amplification for subjects with low audiometric thresholds. Each subject listened to either a single talker or two competing talkers during the 48 trials (the total recording time per listener is 50 s × 48). Sixteen trials were presented with a single talker (8 read by a female and 8 by a male) and 32 trials with two competing talkers (one male and one female). In the multi-talker trials, the two speech streams were presented at the same loudness level to allow unbiased attention decoding. The audio files were spatialized at ±90° azimuth using non-individualized head-related transfer functions (HRTFs) and preprocessed. Listeners were prompted to answer 4 multiple-choice comprehension questions about the content of the attended speech stream (for details see 59). It should be noted that one subject (number 24) was excluded from our analysis due to signal interruption during one of the trials.
2) KUL database: This dataset was published in 60

Proposed auditory attention detection method
In the present work, auditory attention activity is assayed while a listener concentrates on a narrator in multi-talker scenarios. Figure 1 illustrates the proposed AAD framework for detecting the attended speaker in the presence of a competing talker. Machine learning algorithms, which learn from input to make data-driven decisions, are now widely used for the analysis of EEGs. These learning algorithms are employed to train a model on high-level input feature vectors and lend themselves to prediction. We exploit the K-nearest neighbor (KNN), support vector machine (SVM), long short-term memory (LSTM), bi-directional long short-term memory (Bi-LSTM), and Q-learning classifiers to construct the AAD model.

EEG preprocessing
The sampling frequency of both EEG datasets is reduced to 256 Hz by resampling. The EEG data are carefully checked for body movements, eye blinks, muscle activity, and technical artifacts. A 0.5-70 Hz band-pass finite impulse response (FIR) filter is used to eliminate low- and high-frequency noise (high-pass and low-pass filters of order 2112 and 212, respectively). The digitized EEG signals are re-referenced to the average of electrodes TP7 and TP8.
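As an illustration, the preprocessing chain described above (resampling to 256 Hz, band-pass filtering between 0.5 and 70 Hz, and re-referencing to the average of two electrodes) can be sketched with SciPy; the filter length and channel indices here are illustrative, not the paper's exact values:

```python
import numpy as np
from scipy import signal

def preprocess_eeg(eeg, fs_in, fs_out=256, band=(0.5, 70.0), ref_idx=(0, 1)):
    """Resample, band-pass filter, and re-reference a (channels x samples) EEG array."""
    # Resample each channel to the target rate
    n_out = int(eeg.shape[1] * fs_out / fs_in)
    eeg = signal.resample(eeg, n_out, axis=1)
    # Zero-phase band-pass FIR filtering (0.5-70 Hz); 513 taps is illustrative
    taps = signal.firwin(513, band, pass_zero=False, fs=fs_out)
    eeg = signal.filtfilt(taps, [1.0], eeg, axis=1)
    # Re-reference to the mean of two reference channels (e.g. TP7/TP8)
    ref = eeg[list(ref_idx)].mean(axis=0)
    return eeg - ref

# Example: 4 channels, 10 s of 512 Hz data
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5120))
y = preprocess_eeg(x, fs_in=512)
print(y.shape)  # (4, 2560)
```

The zero-phase `filtfilt` call avoids phase distortion, which matters when microstate transitions are later timed on the filtered signal.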
EEG microstate analysis
EEG microstate analysis is a powerful tool to study the temporal and spatial dynamics of human brain activity 61. Microstate analysis reflects cortical activation through quasi-stable states lasting 60-120 ms, which is important for investigating brain dynamics 62. The pre-processed EEG data is analyzed with MNE 63. In the first stage, the global field power (GFP) is computed as

GFP(t) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i(t) - \bar{x}(t) \right)^2 }

where x_i(t) and \bar{x}(t) are the instantaneous and mean potentials across N electrodes at time t.
In the second stage, the topography of the electric field at each local GFP maximum is recognized as a discrete EEG state, and the signal evolution is described as a series of such states. The successive microstates are derived from the EEG based on the local maximum points of the GFP. In the next stage, clustering methods are used to group all microstates into microstate patterns. These patterns have enabled many studies that uncovered their function and applied them to various disorders 35,37,64-66. Most studies in this field have reported 4 microstate topographies to represent brain activity measured with EEG: type A (right-frontal, left-posterior), type B (left-frontal, right-posterior), type C (midline frontal-occipital), and type D (midline frontal) 37. A single topography remains quasi-stable for about 80-120 ms before dynamically transitioning to another topography. Finally, when an EEG is considered a series of evolving topographies of electric potentials, the entire recording can be studied as a set of topographies that dynamically fluctuate amongst themselves at discrete time points.
Figure 2 displays the microstate analysis for 2 s of EEG during the attention task. First, the GFP (depicted as a red line) is calculated at each time point as the spatial standard deviation (std). In the second step, the K-means clustering approach is executed on the scalp topographies of the input data. Several studies using K-means clustering with a cross-validation (CV) metric have shown that the optimal number of classes within subjects is four 35,67. We vary the number of clusters from 2 to 10 and select the optimal number of classes according to the maximum value of the global explained variance (GEV). In the subsequent step, the momentary maps of each group (group one: attending to 'Speaker 1'; group two: attending to 'Speaker 2') are separately categorized into 10 microstate clusters. Eventually, the generated class-labeled group maps are used as templates to assign the original successive EEG series of each listener to the 10 microstate patterns shown in this figure.
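The pipeline above (GFP computation, peak picking, polarity-invariant clustering of the peak topographies with multiple restarts) can be sketched in NumPy. This is a simplified stand-in for the modified K-means used in microstate research, not the MNE implementation; the channel count and number of restarts are illustrative:

```python
import numpy as np

def gfp(eeg):
    """Global field power: spatial std across electrodes at each time point."""
    return eeg.std(axis=0)

def microstate_maps(eeg, n_states=4, n_init=10, seed=0):
    """Cluster topographies at local GFP maxima into prototype maps (modified K-means)."""
    g = gfp(eeg)
    # Local maxima of the GFP curve
    peaks = np.where((g[1:-1] > g[:-2]) & (g[1:-1] > g[2:]))[0] + 1
    maps = eeg[:, peaks].T                       # (n_peaks, n_channels)
    maps /= np.linalg.norm(maps, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_init):
        proto = maps[rng.choice(len(maps), n_states, replace=False)]
        for _ in range(25):
            # Polarity-invariant assignment: maximize |spatial correlation|
            labels = np.abs(maps @ proto.T).argmax(axis=1)
            for k in range(n_states):
                if np.any(labels == k):
                    v = maps[labels == k].T @ maps[labels == k]
                    # Dominant eigenvector acts as a polarity-invariant mean map
                    w = np.linalg.eigh(v)[1][:, -1]
                    proto[k] = w / np.linalg.norm(w)
        score = (np.abs(maps @ proto.T).max(axis=1) ** 2).sum()
        if score > best_score:
            best, best_score = proto.copy(), score
    return best

rng = np.random.default_rng(1)
eeg = rng.standard_normal((8, 2000))  # 8 channels, synthetic data
maps = microstate_maps(eeg)
print(maps.shape)  # (4, 8)
```

The absolute value in the assignment step is the key difference from ordinary K-means: microstate maps are treated as polarity-invariant, so a topography and its sign-flip belong to the same state.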
For every microstate, four parameters are computed: mean GFP, occurrence, duration, and coverage. Mean GFP is the average GFP within a state. Occurrence is the average number of times a state is detected per unit time. Duration is the average length of a state per appearance. Coverage is the percentage of each epoch occupied by a state. Figure 3 illustrates the occurrence values of the EEG states, which vary between subjects.
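The four per-state parameters can be computed directly from a microstate label sequence; the toy label sequence and sampling rate below are illustrative:

```python
import numpy as np

def microstate_stats(labels, gfp, fs):
    """Per-state occurrence (1/s), mean duration (s), coverage (fraction), mean GFP."""
    labels = np.asarray(labels)
    # Split the label sequence into runs of identical states
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.r_[0, change]
    ends = np.r_[change, len(labels)]
    stats = {}
    for k in np.unique(labels):
        runs = (ends - starts)[labels[starts] == k]
        stats[k] = {
            "occurrence": len(runs) / (len(labels) / fs),  # appearances per second
            "duration": runs.mean() / fs,                  # mean run length in seconds
            "coverage": runs.sum() / len(labels),          # fraction of samples
            "mean_gfp": gfp[labels == k].mean(),
        }
    return stats

labels = [0, 0, 1, 1, 1, 0, 2, 2]   # 8 samples at fs = 4 Hz -> 2 s
gfp = np.arange(8.0)
s = microstate_stats(labels, gfp, fs=4)
print(round(s[0]["coverage"], 3))  # 0.375
```

State 0 occupies 3 of 8 samples in two runs, so its coverage is 3/8 and its occurrence is 2 appearances over 2 s, i.e. 1 per second.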
Since the clustering algorithm reduces the complete set of spatial patterns to a small number of prototypes, four methods derived from classical clustering algorithms are compared: independent component analysis (ICA) 68, principal component analysis (PCA) 68, atomize and agglomerate hierarchical clustering (AAHC) 69, and K-means 70. The GEV values for different numbers of EEG microstates are given in Table 1. Here, the optimal number of microstates is determined and their labels are then sorted into a sequence using the four clustering algorithms and the GEV criterion. GEV measures how similar each EEG sample is to its assigned microstate prototype; higher GEV is better. In the microstate analysis, the maximum value of GEV was selected after 10 re-runs.
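GEV as described above (the GFP-weighted squared spatial correlation between each sample's topography and its assigned prototype) can be sketched as follows, assuming average-referenced, unit-norm prototype maps; the tiny example data are illustrative:

```python
import numpy as np

def gev(eeg, maps, labels):
    """Global explained variance of a microstate segmentation.

    eeg: (channels, samples); maps: (n_states, channels), average-referenced
    and unit-norm; labels: state index per sample.
    """
    g = eeg.std(axis=0)
    # Spatial correlation between each sample's topography and its assigned map
    x = eeg - eeg.mean(axis=0)                  # average-reference each sample
    x_norm = x / np.linalg.norm(x, axis=0)
    m = maps[labels]                            # (samples, channels)
    corr = np.einsum('cs,sc->s', x_norm, m)
    return np.sum((g * corr) ** 2) / np.sum(g ** 2)

# Samples built exactly from the prototypes -> GEV should be 1
m0 = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2)
m1 = np.array([0.0, 0.0, 1.0, -1.0]) / np.sqrt(2)
maps = np.vstack([m0, m1])
labels = np.array([0, 1, 0, 1, 1])
amps = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
eeg = (maps[labels] * amps[:, None]).T
print(round(gev(eeg, maps, labels), 3))  # 1.0
```

When every sample's topography is a scaled copy of its prototype, the correlation term is 1 everywhere and the GEV reaches its maximum of 1; real EEG yields values well below that.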

Recurrence quantification analysis (RQA)
To derive useful non-linear dynamic attributes from the various states of the EEG signal, an RQA is performed 71. Several studies have used RQA parameters to analyze EEG signals and quantify cortical function in sleep apnea syndrome 72, different sleep stages 73, epileptic identification 74, and tactile roughness discrimination 75. RQA can extract non-linear characteristics of signals and quantify the complex and deterministic behavior of EEG signals. Recurrence refers to the trajectory returning to a former state in the phase space, which is generally constructed from a time-series signal using a time-embedding method. A recurrence plot (RP) is used to visualize the amount of recurrence of a multi-dimensional dynamic system by simply illustrating a dot square matrix in a two-dimensional space (see Fig. 4). In Eq. (2), R is calculated for each pair of samples i, j of the time series x under a predefined threshold distance ε 76:

R_{i,j} = \Theta(\varepsilon - \| x_i - x_j \|), \quad i, j = 1, \dots, N \qquad (2)

where \Theta(\cdot) is the Heaviside function, \|\cdot\| is the maximum norm, and N is the number of samples in the phase space trajectory. When the distance in phase space between x_i and x_j falls within ε, the two samples are considered recurrent, indicated by R_{i,j} = 1. Several features can be obtained to quantify the RP, each reflecting a specific characteristic of the signal. In this work, the following features are extracted from the RP:
Recurrence Rate (RR): This index measures the percentage of recurrence points in the RP and is calculated as 77:

RR = \frac{1}{N^2} \sum_{i,j=1}^{N} R_{i,j}
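A minimal sketch of the recurrence matrix and recurrence rate under the definitions above (time-delay embedding, maximum norm, Heaviside thresholding); the embedding dimension, delay, and threshold ε are illustrative choices:

```python
import numpy as np

def embed(x, dim=3, tau=1):
    """Time-delay embedding of a 1-D series into phase space."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def recurrence_matrix(x, dim=3, tau=1, eps=0.5):
    """Binary RP: R[i,j]=1 when two phase-space points are within eps (max norm)."""
    y = embed(x, dim, tau)
    d = np.abs(y[:, None, :] - y[None, :, :]).max(axis=2)  # Chebyshev distance
    return (d <= eps).astype(int)

def recurrence_rate(r):
    """RR: fraction of recurrent points in the RP."""
    return r.mean()

x = np.sin(np.linspace(0, 8 * np.pi, 200))  # periodic toy signal
R = recurrence_matrix(x, eps=0.3)
print(round(recurrence_rate(R), 2))
```

A periodic signal such as the sine above produces the characteristic diagonal-line texture in R; the RR grows with ε, so the threshold must be fixed consistently across trials before RR values are compared.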

Determinism (DET):
This measure gives the percentage of recurrence points that form diagonal lines in the RP 78. Higher values of this index indicate, with higher probability, that the signal x has a deterministic nature. It is computed as

DET = \frac{\sum_{l \ge l_{min}} l \, P(l)}{\sum_{i,j=1}^{N} R_{i,j}}

where l and l_min are the length of a diagonal line and its minimum accepted value, respectively, and P(l) is the frequency distribution of the lengths l.
Maximum line length (L_MAX): The longest diagonal line in the RP is defined as

L_{MAX} = \max(\{ l_i ;\ i = 1, \dots, N_l \})

where N_l is the number of diagonal lines.

Mean line length (L MEAN ):
The average length of the diagonal lines in the RP is defined as

L_{MEAN} = \frac{\sum_{l \ge l_{min}} l \, P(l)}{\sum_{l \ge l_{min}} P(l)}

Entropy (ENTR): This index measures the Shannon entropy of the diagonal line length distribution and is calculated using Eq. (6). It discloses the complexity of the RP structure of the system 79.

ENTR = -\sum_{l \ge l_{min}} p(l) \ln p(l) \qquad (6)

Trapping Time (TT):
The TT represents the length of time that the dynamics remain trapped in a certain state. TT is the average length of the vertical lines in the RP:

TT = \frac{\sum_{v \ge v_{min}} v \, P(v)}{\sum_{v \ge v_{min}} P(v)}

where P(v) is the distribution of the vertical line lengths v.
Maximal vertical line length (V_max): This feature indicates the maximal length of the vertical lines in the RP structure and is computed as

V_{max} = \max(\{ v_i ;\ i = 1, \dots, N_v \})

where N_v is the number of vertical lines.
Recurrence time entropy (RPDE): This parameter has been successfully applied in biomedical testing. RPDE has advantages in detecting subtle changes in biological time series such as EEG and indicates the degree to which the time series repeats the same sequences. It is defined as the normalized Shannon entropy of the recurrence time distribution:

RPDE = -\frac{1}{\ln T_{max}} \sum_{t=1}^{T_{max}} p(t) \ln p(t)

where p(t) is the normalized histogram of the recurrence times and T_max is the maximum recurrence time.
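The diagonal- and vertical-line statistics above (DET, L_MAX, ENTR, TT) can be sketched from a binary RP as follows. Note one assumption: DET is computed here over all diagonal lines excluding the main diagonal (the line of identity), a common convention:

```python
import numpy as np

def line_lengths(mask):
    """Lengths of runs of ones in a 1-D binary array."""
    padded = np.r_[0, mask, 0]
    idx = np.flatnonzero(np.diff(padded))
    return idx[1::2] - idx[::2]

def diagonal_lengths(r):
    """Lengths of all diagonal lines in the RP (main diagonal excluded)."""
    n = r.shape[0]
    out = []
    for k in range(1, n):
        out.extend(line_lengths(np.diag(r, k)))
        out.extend(line_lengths(np.diag(r, -k)))
    return np.array(out)

def det_lmax_entr(r, l_min=2):
    ls = diagonal_lengths(r)
    long_ls = ls[ls >= l_min]
    det = long_ls.sum() / ls.sum() if ls.sum() else 0.0
    lmax = long_ls.max() if long_ls.size else 0
    # Shannon entropy of the diagonal-line length distribution
    _, counts = np.unique(long_ls, return_counts=True)
    p = counts / counts.sum() if counts.size else np.array([1.0])
    entr = -(p * np.log(p)).sum()
    return det, lmax, entr

def trapping_time(r, v_min=2):
    vs = np.concatenate([line_lengths(col) for col in r.T])
    vs = vs[vs >= v_min]
    return vs.mean() if vs.size else 0.0

# Tiny hand-made RP: one diagonal line of length 3 plus an isolated point
R = np.eye(5, dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 4)]:
    R[i, j] = R[j, i] = 1
det, lmax, entr = det_lmax_entr(R)
print(det, lmax)  # 0.75 3
```

In the toy RP, the two symmetric length-3 lines contribute 6 of the 8 off-diagonal line points, hence DET = 0.75.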

Classification
To assess whether microstate features and recurrence quantification analysis are appropriate for auditory attention detection, several machine-learning algorithms are employed to compare classification performances.
• K-nearest neighbor (KNN): a non-parametric supervised learning algorithm that assigns the class of a test sample according to the classes of its K nearest training samples 80. In the training stage, KNN receives the feature vectors and their class labels (more details in Table 2).
• Support vector machine (SVM): operates as a supervised learning algorithm in classification analysis. During training, it builds a model for a given set of binary-labeled feature vectors by maximizing the margin between hyperplanes. SVM maps the input data into a multidimensional space by a kernel function and then separates the categories with maximum-margin hyperplanes 81. Test samples are mapped into the same space and predicted to belong to a class based on which side of the margin they fall. A constraint that prevents data points from falling into the margin is called the box constraint. This algorithm is widely used for classification problems due to its ability to manage large datasets. The values of the box constraint and the type of kernel function affect the classification results (more details in Table 2).
• Long short-term memory (LSTM): shows great efficiency in feature sequence extraction and data classification in many applications 82. A simple LSTM cell consists of an input gate, an output gate, a forget gate, and a candidate cell. Each gate has an activation function with two weighted inputs: (1) the previous hidden state of the LSTM cell, weighted by a recurrent weight, and (2) the current input, weighted by an input weight. The input, output, and forget gates have sigmoid activation functions and the cell candidate gate has a hyperbolic tangent function, each with bias values b. The LSTM cell therefore has two outputs: (1) the memory cell state and (2) the hidden state 83.
• Bi-directional long short-term memory network (Bi-LSTM): an extension of the traditional long short-term memory (LSTM) 84 that is trained on the input sequence with
two LSTMs, one processing the sequence forward and one in reverse order. The LSTM layer reduces the vanishing gradient problem and allows deeper networks compared with standard recurrent neural networks (RNNs) 85. The advantage of Bi-LSTM over CNN is its dependence on the order of the inputs, taking both the forward and backward paths into account. Table 2 shows the architecture of the utilized Bi-LSTM.
• GRU-CNN Q-Learning (GCQL): one of the reinforcement learning (RL) methods, a numerical and iterative algorithm 42. Q-learning attempts to estimate a value function that is closely related to the policy, or from which a policy can be derived. Most RL problems can be formulated as a Markov decision process (MDP), a discrete-time state-transition model in which the next state depends only on the current state S_t and action A_t of the system, not on earlier states and actions. P(S_{t+1} | S_t, A_t) is the probability of making a transition to the next state S_{t+1} when the model takes action A_t in state S_t 86.
The behavior of the model is described by a reward function R_{t-1}, which measures the success or failure of an agent's action in the environment. Here, GCQL extends Q-learning by approximating the optimal action-value function with a gated recurrent unit (GRU) network and a convolutional neural network (CNN) inside the reinforcement learning agent (see Fig. 5). As shown in this figure, the agent learns the optimal policy through the interaction between the GRU and the CNN; in other words, the RL algorithm employs this neural network structure as a function approximator.
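The Q-learning update that the GRU-CNN approximates can be illustrated with a tabular toy problem; the states, actions, and rewards below are purely illustrative and unrelated to the paper's EEG setup, and the table is a simplified stand-in for the GCQL function approximator:

```python
import numpy as np

# Tabular Q-learning on a toy MDP. In GCQL the table Q would be replaced
# by the GRU-CNN network; the update rule is the same in spirit.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.3
rng = np.random.default_rng(0)

def step(s, a):
    # Action 1 advances toward state 3, which pays reward 1; action 0 stays.
    s_next = min(s + a, n_states - 1)
    return s_next, float(s_next == n_states - 1)

for episode in range(400):
    s = 0
    for _ in range(20):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q[0].argmax())  # the learned policy prefers advancing from state 0
```

The bracketed term is the temporal-difference error; the greedy `max` over next-state values is what makes this Q-learning (off-policy) rather than SARSA.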
Note that CNNs can learn representations and are very suitable for processing image-like data, while RNNs have the memory needed to learn the non-linear features of sequential data such as EEG signals. The GRU is a variant of the RNN that effectively alleviates the vanishing and exploding gradient problems of traditional RNNs during training; it considers both historical information and new information when computing the current state value. Therefore, the combination of GRU and CNN can improve the robustness of deep learning for decoding small-scale EEG datasets and alleviate the overfitting caused by insufficient data.

Experimental setup
Two experiments are conducted to assess the performance of the proposed AAD method based on microstate and RP features. The first experiment evaluates the AAD procedure using the features extracted by microstate analysis and by recurrence quantification analysis separately. In the second experiment, the efficiency of the MS and RQA features is assessed in different combinations.
To conduct the first experiment, the EEG data of 43 subjects during the 48 trials were selected to analyze the efficiency of the AAD. The four types of microstate features and eight types of RP features are computed from the input EEG signals over non-overlapping windows of 256 samples. The extracted features are given to the classifiers (i.e., KNN, SVM, LSTM, Bi-LSTM, and GCQL) to detect attended/unattended speech, separately. Here, seventy percent of the data (i.e., 34 of the 48 trials recorded from each subject) is used as the training set and the rest as the test set; in other words, the training and test data originate from the same subject. In the second experiment, combinations of MS and RP features, as multivariate features, are fed to the classifiers. This is performed to find features with high performance in detecting attention from EEG signals.
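The within-subject 70/30 trial split described above can be sketched as follows; the random seed is illustrative:

```python
import numpy as np

def split_trials(n_trials=48, train_frac=0.7, seed=0):
    """Split trial indices into train/test sets within a subject (34/14 for 48 trials)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trials)
    n_train = int(round(train_frac * n_trials))
    return idx[:n_train], idx[n_train:]

train, test = split_trials()
print(len(train), len(test))  # 34 14
```

Splitting at the trial level (rather than at the window level) keeps all 1 s windows of a given trial on the same side of the split, which avoids leakage between training and test sets.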
To evaluate the performance of the proposed method, the recently developed attention detection systems introduced by O'Sullivan et al. 16, Lu et al. 25, Ciccarelli et al. 20, Geirnaert et al. 26, Zakeri et al. 27, Cai et al. 56, and Niu et al. 87 are simulated and used as baseline systems from the literature.

Evaluation criteria
The efficiency of the AAD algorithm is determined through three metrics: accuracy, sensitivity, and specificity 88. Accuracy (ACC = (TP + TN)/(TP + TN + FP + FN)) measures the overall detection correctness. Sensitivity (or true positive rate, TPR = TP/(TP + FN)) indicates the rate of correctly classified trials, whereas specificity (or true negative rate, TNR = TN/(TN + FP)) measures the rate of correctly rejected trials. Here, TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative predictions of the algorithm, respectively.
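These three metrics follow directly from the confusion-matrix counts; the counts in the usage example are illustrative:

```python
def aad_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (TPR), and specificity (TNR) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)   # rate of correctly detected positive trials
    tnr = tn / (tn + fp)   # rate of correctly rejected negative trials
    return acc, tpr, tnr

print(aad_metrics(45, 40, 10, 5))  # (0.85, 0.9, 0.8)
```

Reporting TPR and TNR alongside ACC matters here because the two attention classes are balanced by design; a classifier that favors one speaker would show a TPR/TNR gap even at a reasonable ACC.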

Results and discussion
Here, two experiments are executed to identify the best procedure for auditory attention detection using dynamic state analysis of the brain. First, statistical analysis is performed on all single and multivariate features to find the significant features (p < 0.05) between the two groups. Then, the classical and modern approaches mentioned in section "Literature Review" are utilized to assess the best proficiency for each single and multivariate feature set. Finally, the impact of different durations of EEG segments on the performance of the proposed AAD method is assayed.

Statistical analysis
As a preliminary data analysis, the Kolmogorov-Smirnov (KS) test 89 is used to assess the normality of the feature vectors. Here, probability values p < 0.05 indicate that the data have non-normal distributions. Therefore, the Mann-Whitney U (Wilcoxon rank-sum) test is selected to compare the extracted features between two independent groups when the data are not normally distributed. As a non-parametric test, it allows us to check whether the statistic at hand takes different values in two different populations; p < 0.05 indicates a significant difference between the medians of the two groups. Tables 3 and 4 list the significant p-values for every single microstate and RP feature extracted from the preprocessed EEG signals during the selective auditory attention task, respectively. According to these results, the features extracted directly from the EEG signals do not differ significantly between the two cognitive tasks, except V_max. However, Table 5 shows the p-values of the multivariate features. Here, the features RR, DET, V_max, and RPDE extracted from occurrence, duration, coverage, and mean GFP show significant differences between the two auditory attention tasks, attending to 'Spk1' vs. attending to 'Spk2'.
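The two-stage test procedure (KS normality check, then Mann-Whitney U between groups) can be sketched with SciPy; the synthetic feature values below are illustrative stand-ins for the MS/RP features:

```python
import numpy as np
from scipy import stats

def compare_feature(x, y, alpha=0.05):
    """KS normality check per group, then a two-sided Mann-Whitney U test."""
    # Standardize each group and compare it against the normal distribution
    normal = all(
        stats.kstest((g - g.mean()) / g.std(), 'norm').pvalue >= alpha for g in (x, y)
    )
    stat, p = stats.mannwhitneyu(x, y, alternative='two-sided')
    return normal, p

rng = np.random.default_rng(0)
a = rng.exponential(1.0, 200)         # skewed feature values, group 1
b = rng.exponential(1.0, 200) + 0.6   # shifted distribution, group 2
normal, p = compare_feature(a, b)
print(p < 0.05)  # True: the group medians differ significantly
```

Because the Mann-Whitney test only uses ranks, it remains valid for the skewed, non-normal feature distributions flagged by the KS step.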

AAD
Classification results obtained from microstate analysis are presented in Fig. 6 for the occurrence, duration, coverage, and mean GFP parameters. These results include the ACC, TPR, and TNR on EEG signals divided into 1 s segments. It can be seen that these parameters fail to discriminate between the two groups (attending to 'Spk1' vs. attending to 'Spk2'), with accuracy close to the chance level. However, the highest accuracy among the microstate parameters and classifiers is achieved by the combination "mean GFP feature + GCQL classifier". Figure 7 illustrates attention detection using only the RP parameters extracted from each 1 s segment of EEG. According to the figure, the highest ACC of 91.5% is achieved by the combination "RR feature + GCQL classifier". To obtain the highest AAD performance, the best features of the microstate and RP analyses are selected in terms of classification accuracy. Therefore, the recurrence rate of the mean GFP is taken as the optimal multivariate feature and fed to the GCQL classifier.
In a further experiment, the performance of the proposed method is examined for different segment lengths of EEG. To this aim, first, the mean GFP of the EEG data is calculated by microstate analysis for segment durations from 0.02 s to 50 s. Then, the recurrence rate (RR) is extracted from the mean GFP and given to the GCQL classifier. The average performance of the proposed AAD is shown in Fig. 8 for 100 epochs. It can be observed that the detection performance of the proposed algorithm decreases significantly as the duration of the EEG data increases. This figure also illustrates that the ACC, TPR, and TNR increase as the data length is shortened, specifically in the range from 0.1 s to 1 s. Moreover, the TPR and TNR values lie in acceptable ranges for all EEG segment lengths. This is relevant for online applications such as neuro-steered hearing aids. Figure 9 compares the performance of the proposed AAD method, based on the optimal feature set (i.e., "mean GFP + RR") and the GCQL classifier, with the baseline systems in terms of ACC. According to the accuracy criterion, the introduced AAD algorithm outperforms the baseline systems of O'Sullivan et al. 16, Lu et al.
25, Ciccarelli et al. 20, Geirnaert et al. 26, Zakeri et al. 27, Cai et al. 56, and Niu et al. 87. It is observed that the accuracy of the baseline systems increases with increasing EEG duration. Although the exploratory analysis yielded significant results for short EEG segments of 1 s, this is precisely where the microstate analysis has a stronger impact, capturing millisecond-scale differences in brain function. In addition, extracting RQA features from the microstates as multivariate features emphasizes the dynamic behavior of brain activity throughout the auditory attention task. Moreover, the ability of the GCQL-AAD model to categorize auditory attention, extract temporal features, and support real-time analysis is another advantage of the proposed model that has not been included in previous studies.
However, the results of the present study should be viewed in light of some limitations. First, the study was not designed for multi-talker scenarios with more than two talkers, so the proposed model could be confounded by the other speakers in the presence of three or more talkers. Second, this algorithm has been evaluated with high-density scalp EEG, which is not portable for real applications. Using a smaller number of electrodes that are more closely related to the auditory attention cortex could enhance the suitability of the proposed method for BCI devices.

Conclusion
In the present work, a novel approach for auditory attention detection is presented based on microstate and recurrence quantification analysis of EEG signals. Here, participants listen to two talkers and focus on only one of them (during half of the trials they attend to speaker 1 and during the rest to speaker 2). In the first step, microstate analysis is performed to extract appropriate features from the EEG states. Also, recurrence quantification analysis is applied to the EEG signals to capture the emerging complex behavior of the brain. Then, the extracted features are given to five types of classifiers (i.e., KNN, SVM, LSTM, Bi-LSTM, and GCQL), both individually and in combination, to find the optimal AAD structure. The results of the experiments show that extracting the recurrence rate (RR) from the mean of the global field power (mean GFP) and classifying it with GCQL yields the highest performance, 98.9% in terms of accuracy. The proposed AAD model has an important advantage over forward and backward mapping algorithms, in the sense that attention is recognized from the EEG data of each listener without any access to the auditory stimuli. Furthermore, the classification results indicate that the proposed GCQL-AAD method performs better than the recently published AAD approaches of O'Sullivan et al. 16, Lu et al. 25, Ciccarelli et al. 20, Geirnaert et al. 26, Zakeri et al. 27, Cai et al. 56, and Niu et al. 87, used as the baseline systems. Additionally, the decision window of EEG-based auditory attention detection has generally been longer than 1 s in previous research; obtaining the best decoding performance in a shorter time window is an urgent application requirement, and the present work achieves high AAD performance with a shorter EEG window length. In 20,26, methods based on deep learning, especially CNN, have dominated the field of EEG decoding for AAD. However, using only CNN has limitations in capturing
long-range dependencies in EEG sequences and thus in detecting auditory attention from dynamic EEG signals. Therefore, we proposed the AAD-GCQL model, which captures the dynamic behavior of the brain to address temporal and dynamic dependencies. The experimental results confirmed the effectiveness of the proposed AAD architecture, which outperformed the other baseline models.
In this research, the EEG signals of all recording electrodes (i.e., 64 channels) are used in the AAD analysis. To alleviate the computational load and time cost of the AAD algorithm, the number of EEG recording channels could be reduced by electrode reduction methods. The current work uses an experimental configuration with only two competing talkers, which limits the applicability of the proposed algorithm. It is necessary to examine the AAD efficiency in more realistic scenarios, such as a cocktail party with many speakers.
and consists of 16 normal-hearing subjects (age 17-30), 8 male and 8 female. The speech stimuli include four Dutch stories, narrated by three male and female speakers. The audio files were presented dichotically at ±90° azimuth via HRTF filtering. 64-channel EEG signals were recorded using a BioSemi ActiveTwo device at a sampling rate of 8192 Hz. A total of 72 min of EEG was recorded per subject, approximately 36 min per attended ear. All stimuli were normalized to have the same root-mean-square value and the attended stories were randomized across subjects. Note that the audio signals were low-pass filtered at 4 kHz.

Figure 1 .
Figure 1. The proposed AAD method based on microstate and recurrence quantification analysis.

Figure 2 .
Figure 2. Schematic flowchart of the EEG microstate analysis using MNE. The GFP of each sampling point is calculated and all topographic maps at the local GFP maxima are obtained. K-means clustering analysis is used to analyze the topographic maps and obtain the optimal microstate classes. At the bottom, microstate temporal sequences are obtained by fitting the microstate classes back to the complete EEG data.

Figure 3 .
Figure 3. Changes in the occurrence values of microstate features across subjects of the DTU database. Boxplots represent the first quartile, third quartile, and median values.

Figure 4 .
Figure 4. Example of recurrence plots (RP) with 12 different scales for the GFP extracted from EEG of the DTU database.

Figure 5 .
Figure 5. The block diagram of the proposed GCQL to improve the estimation of the value function.

Figure 8 .
Figure 8. Performance of the proposed GCQL-AAD using multivariate feature (i.e., "mean GFP + RR") and GCQL classifier for different EEG segments on DTU and KUL databases.

Table 1 .
GEV values using ICA, AAHC, PCA, and K-means clustering for different numbers of microstates N. Significant values are in [bold].

Table 3 .
P-values of the Mann-Whitney test for microstate (MS) features extracted directly from preprocessed EEGs. The symbol * indicates a significant difference (p < 0.05). Significant values are in [bold].

Table 4 .
P-values of the Mann-Whitney test for recurrence plot (RP) features extracted directly from EEGs. The symbol * indicates a significant difference (p < 0.05). Significant values are in [bold].

Table 5 .
P-values of the Mann-Whitney test for the multivariate features "MS + RP" extracted from EEG (note: eight RQA features are extracted from the four microstate parameters). The symbol * indicates a significant difference (p < 0.05). Significant values are in [bold].

Table 6.
AAD performance only with MS feature vectors on DTU and KUL databases. In Table 6, the highest performances in general belong to the DTU database analysis for AAD. Here, the system introduced by O'Sullivan et al. 16 achieved accuracies of 51.9%, 52.7%, 53.1%, 49.4%, 58.5%, and 67.3% for window lengths of 1 s, 5 s, 10 s, 20 s, 30 s, and 40 s, respectively. The system introduced by Lu et al.