EEG feature selection method based on maximum information coefficient and quantum particle swarm

To reduce the dimensionality of EEG features and improve classification accuracy, we propose an improved hybrid feature selection method for EEG feature selection. First, MIC is used to remove irrelevant and redundant features to reduce the search space of the second stage. QPSO is then used in the second stage to optimize the feature subset and obtain the optimal subset. Considering that both dimensionality and classification accuracy affect the performance of feature subsets, we design a new fitness function. Moreover, we optimize the parameters of the classifier while optimizing the feature subset to improve the classification accuracy and reduce the running time of the algorithm. Finally, experiments were performed on EEG and UCI datasets, and the proposed method was compared with five existing feature selection methods. The results show that the feature subsets obtained by the proposed method have low dimensionality, high classification accuracy, and low computational complexity, which validates the effectiveness of the proposed method.

Existing hybrid feature selection methods can be divided into two steps 17. First, a filter feature selection method is used to remove some features to reduce the search space of the second step, and then an optimization algorithm is used to optimize the feature subset 18. Therefore, hybrid feature selection methods can achieve higher classification accuracy and reduce the feature space dimensionality by combining the advantages of filter and wrapper feature selection 19. Song et al. 20 proposed a fast hybrid feature selection method based on correlation-guided clustering and particle swarm optimization (PSO). This approach divides feature selection into three stages. In the first and second stages, filter feature selection and clustering algorithms are used to reduce the search space of the third stage. Then, in the third stage, PSO is used to find the optimal subset. Ansari et al. 21 proposed a hybrid filter-wrapper feature selection method for sentiment classification. In this approach, a feature subset is first selected from the feature set by a rank-based feature selection method, and then the optimal feature subset is selected by PSO. Xue et al. 22 proposed a fault feature selection algorithm combining ReliefF and quantum particle swarm optimization (QPSO). First, ReliefF is used to rank features and remove irrelevant ones. The features are then further selected using QPSO to obtain the optimal subset of features. These methods have shown good performance on datasets in their respective domains. However, EEG datasets are characterized by small sample sizes and high feature dimensionality, which differ from common datasets 23. Therefore, it remains to be verified whether these methods are suitable for EEG feature selection. In the filter feature selection stage, many methods only deal with irrelevant features and ignore redundant features, which can affect the feature optimization effect of the second stage 24,25. Moreover, in the feature subset optimization stage based on evolutionary algorithms, traditional fitness functions are mainly based on the classification accuracy of the test set, ignoring the importance of dimensionality reduction 26.
To address the above issues, a hybrid feature selection method is proposed for EEG feature selection. In the first stage, the method uses MIC to deal with irrelevant and redundant features, which effectively reduces the search space of the second stage. On the one hand, irrelevant features are eliminated by computing the correlation between features and categories; on the other hand, redundant features are eliminated by computing the correlation between the selected features and the features to be selected. In the second stage, we design a new fitness function that combines classification accuracy and dimensionality reduction rate to improve classification accuracy while minimizing feature dimensionality. The feature subset is then further optimized using QPSO to obtain the optimal feature subset. Finally, SVM is used for EEG classification. Meanwhile, considering the interaction between the feature subset and the classifier, the proposed method synchronously optimizes the feature subset and the SVM parameters in the second stage to improve classification accuracy.

Related work
Maximum information coefficient. MIC is a correlation measurement method based on information theory, and its scope of application and accuracy are superior to those of other correlation measures such as the Pearson correlation coefficient, mutual information, and information gain 13. Meanwhile, it has low computational complexity and is suitable for measuring the correlation between various features. For two random sequences forming the pair set D = {(f_1,i, f_2,i), i = 1, 2, ..., n}, the MIC is defined as:

MIC(D) = max_{X·Y < B} [ max I(D, X, Y) / log2 min(X, Y) ]

where X represents dividing the domain of f_1 into X segments, Y represents dividing the domain of f_2 into Y segments, X·Y < B limits the number of grid cells, and I(D, X, Y) represents the mutual information of D under X columns and Y rows. Since the row and column divisions are not equidistant, various meshing methods exist; max and min denote the maximum and minimum values. Reference 27 pointed out that the effect is best when B = n^0.6, where n represents the data length, so this paper also takes this value.
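As a concrete illustration, MIC can be approximated in a few lines of NumPy. The reference algorithm optimizes the grid partition boundaries dynamically; the sketch below only searches equal-frequency grids with X·Y ≤ B = n^0.6, so it is a simplified lower-bound illustration, not the original MIC procedure.

```python
import numpy as np

def mic_approx(x, y, alpha=0.6):
    """Approximate MIC of two 1-D sequences: search equal-frequency
    grids with nx * ny <= B (B = n**alpha) and take the best normalised
    mutual information. A simplification of the true MIC search."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    B = max(int(n ** alpha), 4)
    best = 0.0
    for nx in range(2, B + 1):
        for ny in range(2, B // nx + 1):
            # quantile-based (equal-frequency) bin edges for each axis
            xe = np.quantile(x, np.linspace(0, 1, nx + 1))
            ye = np.quantile(y, np.linspace(0, 1, ny + 1))
            xe[0] -= 1e-12
            ye[0] -= 1e-12  # make the leftmost edges inclusive
            c, _, _ = np.histogram2d(x, y, bins=[xe, ye])
            p = c / n
            px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
            with np.errstate(divide="ignore", invalid="ignore"):
                t = p * np.log2(p / (px * py))
            i = np.nansum(t)  # mutual information I(D, X, Y)
            best = max(best, i / np.log2(min(nx, ny)))
    return min(best, 1.0)
```

For a perfect functional relationship the value approaches 1, while for independent data it stays close to 0, matching the interpretation of the thresholds used later in the paper.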
Quantum particle swarm optimization. Sun et al. 28 introduced quantum behavior into particle swarm optimization and proposed the QPSO algorithm, which improves the global search ability of the particle swarm. Meanwhile, the QPSO algorithm only has a position vector, so it has fewer control parameters and stronger optimization ability. The implementation process of QPSO is as follows:

Step 1: Initialization. Each particle is placed randomly inside the search range:

x_j = x_j^min + r · (x_j^max − x_j^min)

where x_j represents the j-th dimension of the particle, x_j^max and x_j^min represent the upper and lower limits of the j-th dimension, and N represents the size of the particle swarm.

Step 2: Evaluate the particles and determine the personal best pbest_i of each particle and the global best:

gbest = arg min_{pbest_i} f(pbest_i)

where min represents the minimum value, and f represents the fitness function.

Step 3: Update the position of the particle:

mbest = (1/N) Σ_{i=1}^{N} pbest_i
p_ij = ϕ · pbest_ij + (1 − ϕ) · gbest_j
x_ij(t + 1) = p_ij ± α · |mbest_j − x_ij(t)| · ln(1/u)   (1)
α = 0.5 + 0.5 · (T_max − t) / T_max

where i represents the particle and j represents the dimension, x(t) represents the t-th generation particle, ϕ, u, and r are random numbers in the (0, 1) interval (the sign in Eq. (1) is chosen with equal probability according to r), α is called the contraction-expansion coefficient, t represents the current iteration number, and T_max represents the maximum number of iterations.

Step 4: Update the personal best of each particle and set the global best as:

gbest = pbest_k, k = arg min_i f(pbest_i)

where pbest_k represents the particle with the smallest fitness value.
Step 5: Determine whether the termination condition is met. If not, return to Step 3; otherwise, output the optimal value.
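The steps above can be sketched in plain NumPy. The linearly decreasing contraction-expansion coefficient α (from 1.0 to 0.5) and the boundary clipping are common conventions assumed here, and the sphere function merely stands in for a real fitness function.

```python
import numpy as np

def qpso(f, bounds, n_particles=30, t_max=200, seed=0):
    """Minimise f over a box with quantum-behaved PSO (Sun et al. style).
    bounds: sequence of (low, high) pairs, one per dimension."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    d = len(lo)
    # Step 1: random initialisation inside the box
    x = lo + rng.random((n_particles, d)) * (hi - lo)
    pbest = x.copy()
    pval = np.array([f(p) for p in pbest])
    gbest = pbest[pval.argmin()].copy()           # Step 2: global best
    for t in range(t_max):
        alpha = 0.5 + 0.5 * (t_max - t) / t_max   # contraction-expansion
        mbest = pbest.mean(axis=0)                # mean of personal bests
        phi = rng.random((n_particles, d))
        u = rng.random((n_particles, d))
        attract = phi * pbest + (1 - phi) * gbest # local attractor p_ij
        sign = np.where(rng.random((n_particles, d)) < 0.5, -1.0, 1.0)
        # Step 3: quantum position update, Eq. (1)
        x = attract + sign * alpha * np.abs(mbest - x) * np.log(1.0 / u)
        x = np.clip(x, lo, hi)
        # Step 4: update personal and global bests
        val = np.array([f(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        gbest = pbest[pval.argmin()].copy()
    return gbest, pval.min()                      # Step 5: output optimum
```

On a 5-dimensional sphere function the swarm contracts quickly toward the global minimum at the origin.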

Support vector machine.
The SVM obtains the globally optimal solution by using the principle of structural risk minimization as the optimality principle. It performs well on small-sample nonlinear problems. Moreover, it has the advantages of a simple structure, global optimization, strong generalization ability, and short learning and prediction time when dealing with classification problems. Therefore, it is often used to solve various classification problems 29,30. The basic idea of SVM is to map the input data to a high-dimensional space so that the input data become linearly separable. The core problem is to solve the minimization problem:

min_{w,b,ξ} (1/2)·||w||^2 + C · Σ_i ξ_i
s.t. y_i · (w^T x^(i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0

where w and b represent the weight vector and bias of the hyperplane, C represents the regularization parameter, ξ_i represents the slack variable, y_i represents the category of the i-th experiment, and x^(i) represents the feature vector of the i-th experiment. When the data are mapped from the low-dimensional to the high-dimensional space, a kernel function is necessary, and the most commonly used kernel is the Gaussian kernel function:

K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2))

where σ represents the bandwidth of the kernel function. The regularization parameter C and kernel bandwidth σ significantly influence the performance of the SVM, and these parameters can be optimized by grid search or by an optimization algorithm. The proposed method uses the classification error rate of the SVM to evaluate the feature subset, so there is a coupling between the SVM parameters and the feature subset. Therefore, QPSO is used to optimize the feature subset and the SVM parameters synchronously to obtain a better classification effect.
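For illustration, the following sketch trains a Gaussian-kernel SVM and tunes C and the kernel width by grid search, assuming scikit-learn is available. Note that scikit-learn parameterizes the Gaussian kernel as gamma = 1/(2σ²), and the XOR-style toy data below merely stand in for EEG feature vectors.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy nonlinear (XOR-like) data standing in for EEG feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

# Gaussian (RBF) kernel SVM; sklearn's gamma corresponds to 1/(2*sigma**2).
grid = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]},
    cv=3,
)
grid.fit(Xtr, ytr)
acc = grid.score(Xte, yte)  # held-out accuracy of the best (C, gamma) pair
```

The paper replaces this exhaustive grid with QPSO, which searches C and σ jointly with the feature mask.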

EEG feature
Many scholars have conducted meaningful studies in EEG analysis and proposed a large number of reliable EEG features.In this paper, five commonly used EEG features are extracted, including approximate entropy (ApEn), power spectral density (PSD), Hjorth parameter, CO complexity, and fractal dimension (FD).
(1) Approximate entropy. Approximate entropy (ApEn) is a nonlinear dynamic parameter used to quantify the regularity and unpredictability of time-series fluctuations. ApEn uses a non-negative number to represent the complexity of a time series, reflecting the likelihood that new information appears in the series. The more complex the time series, the larger the approximate entropy. For a time series consisting of N data points, the m-dimensional reconstruction of the signal is first performed:

X_m(i) = [x(i), x(i + 1), ..., x(i + m − 1)], i = 1, 2, ..., N − m + 1.

The distance between X_m(i) and X_m(j) is defined as the maximum absolute difference of their components:

d[X_m(i), X_m(j)] = max_k |x(i + k) − x(j + k)|, k = 0, 1, ..., m − 1.

Given a threshold value r, for each i, count the number of j for which d ≤ r, and calculate the ratio B_i^m(r) of this count to N − m + 1. Then average the logarithm of B_i^m(r):

Φ^m(r) = (1/(N − m + 1)) Σ_i ln B_i^m(r).

Finally, the ApEn of the signal can be estimated as:

ApEn(m, r) = Φ^m(r) − Φ^{m+1}(r).

(2) Power spectral density. PSD is used to describe the change of signal power with frequency. In this paper, the periodogram is used to estimate the PSD, and its mean value is used as the EEG feature. If X is the Fourier transform of the signal x of length N, the periodogram estimate of the PSD is:

P(f) = |X(f)|^2 / N.

(3) Hjorth parameter. The Hjorth parameter describes the signal characteristics in terms of Activity, Mobility, and Complexity. For the signal s(n), the Hjorth parameters are calculated as follows:

Activity = var(s(n))
Mobility = sqrt(var(s′(n)) / var(s(n)))
Complexity = Mobility(s′(n)) / Mobility(s(n))

where µ_s represents the average value of s(n), var represents the variance, and s′(n) represents the first derivative.

(4) CO complexity. CO complexity is used to describe the irregularity of the signal. The signal can be divided into regular and irregular parts, and CO complexity is defined as the ratio of the irregular part, reflecting the complexity and randomness of the signal. For a signal x(n), the Fourier transform is first performed, and the average value of the power spectrum is calculated:

M = (1/N) Σ_{k=0}^{N−1} |X(k)|^2

where X(k) represents the fast Fourier transform of x(n), and N represents the length of the signal in the frequency domain. Then a new spectral sequence is constructed using M and X(k), keeping only the components whose power exceeds the mean:

X̃(k) = X(k) if |X(k)|^2 > M, and X̃(k) = 0 otherwise.

Finally, with x̃(n) denoting the inverse Fourier transform of X̃(k), the CO complexity of the signal is obtained as the energy ratio of the residual (irregular) part:

CO = Σ_n |x(n) − x̃(n)|^2 / Σ_n |x(n)|^2.

(5) Fractal dimension. Fractal dimension (FD) can characterize the complexity of time-domain signals. For a time-domain signal x(t) of length N, the signal is first transformed into subsequences:

X_τ^m = {x(m), x(m + τ), x(m + 2τ), ...}, τ = 1, 2, ..., T.

For the transformed signal, the length of each subsequence is defined as:

L_m(τ) = [(N − 1) / (⌊(N − m)/τ⌋ · τ^2)] · Σ_i |x(m + i·τ) − x(m + (i − 1)·τ)|.

Then the average length over m is calculated for each τ, giving L(τ). All values from τ = T_min to T_max are computed to obtain the average length sequence, and the slope of the linear fit of ln L(τ) against ln(1/τ) is estimated as the fractal dimension of the signal.
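The five features above can be sketched in NumPy as follows. The parameter choices (embedding dimension m = 2, tolerance r = 0.2·std, first differences as the discrete derivative, k_max = 10 for the Higuchi method) are common defaults assumed here, not values specified in the paper.

```python
import numpy as np

def apen(x, m=2, r_factor=0.2):
    """Approximate entropy; r = r_factor * std(x) is a common convention."""
    x = np.asarray(x, float)
    N, r = len(x), r_factor * np.std(x)

    def phi(m):
        X = np.array([x[i:i + m] for i in range(N - m + 1)])
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        B = (d <= r).mean(axis=1)      # match ratio for each template
        return np.mean(np.log(B))

    return phi(m) - phi(m + 1)

def psd_mean(x):
    """Mean of the periodogram estimate of the PSD."""
    X = np.fft.rfft(np.asarray(x, float))
    return np.mean(np.abs(X) ** 2 / len(x))

def hjorth(s):
    """Hjorth Activity, Mobility, Complexity (first differences as s')."""
    s = np.asarray(s, float)
    d1, d2 = np.diff(s), np.diff(np.diff(s))
    activity = s.var()
    mobility = np.sqrt(d1.var() / s.var())
    complexity = np.sqrt(d2.var() / d1.var()) / mobility
    return activity, mobility, complexity

def co_complexity(x):
    """Energy ratio of the irregular part of the signal."""
    x = np.asarray(x, float)
    X = np.fft.fft(x)
    power = np.abs(X) ** 2
    X_reg = np.where(power > power.mean(), X, 0)  # keep strong components
    x_reg = np.fft.ifft(X_reg).real               # regular part
    return np.sum((x - x_reg) ** 2) / np.sum(x ** 2)

def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension."""
    x = np.asarray(x, float)
    N = len(x)
    ks = np.arange(1, k_max + 1)
    L = []
    for k in ks:
        lm = []
        for m0 in range(k):
            idx = np.arange(m0, N, k)   # subsequence x(m), x(m+k), ...
            n_i = len(idx) - 1
            if n_i >= 1:
                lm.append(np.abs(np.diff(x[idx])).sum()
                          * (N - 1) / (n_i * k * k))
        L.append(np.mean(lm))
    # slope of ln L(k) versus ln(1/k) estimates the fractal dimension
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(L), 1)
    return slope
```

A regular sine wave yields low ApEn, CO complexity near 0 and FD near 1, while white noise yields high ApEn, larger CO complexity and FD near 2, which matches the interpretations given above.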

Improved feature selection method
The Filter method obtains feature subsets by ranking the quality of individual features 10, so each selected feature is of high quality individually. However, classification results depend on the feature subset rather than on individual features. The Filter method lacks an evaluation of the overall performance of the feature subset, resulting in low accuracy 17. The Wrapper method uses the classification accuracy of a classifier as the overall evaluation metric for the feature subset, and thus its classification accuracy is higher than that of the Filter method. However, the Wrapper method has a higher computational load than the Filter method, and when the feature dimension is high, the dimension of the obtained feature subset is also relatively high 18,20. In this paper, the proposed method first uses MIC to remove irrelevant and redundant features and then uses QPSO and SVM for secondary feature selection, which can better eliminate irrelevant and redundant features while ensuring classification accuracy.
MIC-based feature pre-selection. In this paper, MIC is used to eliminate irrelevant and redundant features. First, the correlation between each feature vector and the category vector is measured by MIC, and the feature vectors are sorted in descending order of correlation. Features whose correlation is less than a certain threshold are irrelevant features. Then, the correlation between feature vectors is measured by MIC. Features whose correlation is greater than a certain threshold are redundant features.
For an n × m feature matrix X = {x_i, i = 1, 2, ..., m} and an n-dimensional category vector y = {y_i, i = 1, 2, ..., n}, n represents the number of samples and m represents the number of features. Denote the correlation between features x_i and x_j as mic(x_i, x_j), and the correlation between feature x_i and category y as mic(x_i, y). The value of MIC ranges from 0 to 1; the larger the value, the greater the correlation between the data. mic > 0.8 indicates a strong correlation, and mic < 0.2 indicates a weak correlation 13. mic(x_i, x_j) measures the correlation between features: when mic(x_i, x_j) is larger than 0.8, the correlation between the two features is high. mic(x_i, y) measures the correlation between features and categories: when mic(x_i, y) is less than 0.2, the correlation between the feature and the categories is weak. Therefore, in this paper, we set the threshold to 0.8 for redundant features and 0.2 for irrelevant features.
Then, the implementation steps of MIC-based feature pre-selection are as follows:
Step 1: For the feature vectors x_i, i = 1, 2, ..., m, and the category vector y, compute their maximum information coefficients mic(x_i, y), i = 1, 2, ..., m.
Step 2: When mic(x_i, y) is less than 0.2, x_i is an irrelevant feature and is removed from the feature matrix.
Step 3: For the remaining feature vectors, calculate the MIC between different feature vectors, mic(x_i, x_j), i ≠ j.
Step 4: When mic(x_i, x_j) is greater than 0.8, compare the values of mic(x_i, y) and mic(x_j, y). If mic(x_i, y) < mic(x_j, y), then x_i is a redundant feature; otherwise, x_j is a redundant feature.
Step 5: Remove all redundant features to obtain the pre-selected feature subset.
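The five steps above can be sketched as follows. To keep the sketch self-contained, `score` is a placeholder for any dependence measure in [0, 1] (MIC in the paper); any stand-in, such as an absolute correlation, exercises the same logic.

```python
import numpy as np

def mic_preselect(X, y, score, irrel_th=0.2, redun_th=0.8):
    """Two-stage filter described above. score(a, b) should return a
    dependence value in [0, 1] (MIC in the paper). Returns the indices
    of the pre-selected features."""
    n, m = X.shape
    rel = np.array([score(X[:, j], y) for j in range(m)])
    # Steps 1-2: drop irrelevant features (relevance below threshold)
    keep = [j for j in range(m) if rel[j] >= irrel_th]
    # Steps 3-5: of each highly correlated pair, keep the more relevant
    keep.sort(key=lambda j: -rel[j])      # examine stronger features first
    selected = []
    for j in keep:
        if all(score(X[:, j], X[:, k]) <= redun_th for k in selected):
            selected.append(j)
    return sorted(selected)
```

With one informative feature, one near-duplicate of it, and one noise feature, the sketch keeps exactly one of the duplicate pair and discards the noise.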
Secondary feature selection based on QPSO. After removing irrelevant and redundant features through MIC, the dimensionality of the original features is significantly reduced. After that, the Wrapper method is used for secondary feature selection, which can further optimize the performance of the feature subset and optimize the SVM parameters. There are two critical issues in feature selection using QPSO: the mapping between particles and features, and the design of the fitness function.
For the first issue, it should be emphasized that QPSO optimizes both the feature subset and the SVM parameters. Thus, the dimension m of each particle is equal to the dimension of the pre-selected feature subset plus the number of SVM parameters to be optimized. The first m − 2 dimensions of the particle correspond to the features of the pre-selected subset, and their value range is (0, 1): when a value lies in (0, 0.5), the corresponding feature is not selected, and when it lies in [0.5, 1), the feature is selected. The last two dimensions of the particle correspond to the parameters of the SVM and take positive values.
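A minimal sketch of this particle-to-solution mapping (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def decode(particle):
    """Map a particle to a feature mask and SVM parameters: the first
    m-2 dimensions select features (value >= 0.5 means selected), and
    the last two dimensions give C and sigma."""
    particle = np.asarray(particle, float)
    mask = particle[:-2] >= 0.5
    C, sigma = particle[-2], particle[-1]
    return mask, C, sigma
```

Each fitness evaluation first decodes the particle, slices the feature matrix with `mask`, and trains an SVM with the decoded C and σ.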
For the second issue, the classification accuracy of the classifier is often used as the fitness function of the optimization algorithm, but using classification accuracy as the only evaluation metric has limitations: it ignores the effect of dimensionality reduction. Especially when dealing with high-dimensional data, selecting fewer features is also of practical interest. Therefore, in this paper, we construct a new fitness function based on the SVM recognition accuracy and the dimensionality reduction rate:

fitness = ϕ · CA + (1 − ϕ) · DR, DR = (d − d_s)/d   (28)

where CA represents the classification accuracy, DR represents the dimensionality reduction rate, d represents the original feature dimension, d_s represents the dimension of the selected feature subset, and ϕ represents the proportion of classification accuracy. According to Eq. (29), which defines ϕ as a function of the feature dimension, the relationship curve between feature dimension and classification accuracy can be obtained, as shown in Fig. 1.
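A sketch of the fitness computation of Eq. (28). Since the exact form of ϕ in Eq. (29) is not reproduced in the text, an exponential decay from 1 toward 0.9 with the selected dimension is assumed here purely for illustration; it only mimics the endpoint behaviour described for Fig. 1.

```python
import numpy as np

def fitness(ca, d_selected, d_original):
    """Weighted fitness combining classification accuracy (CA) and
    dimensionality-reduction rate (DR). The phi curve below is an
    illustrative assumption standing in for Eq. (29)."""
    dr = (d_original - d_selected) / d_original       # reduction rate DR
    phi = 0.9 + 0.1 * np.exp(-5.0 * d_selected / d_original)
    return phi * ca + (1 - phi) * dr
```

With equal accuracy, a smaller subset scores higher, which is exactly the pressure toward low-dimensional subsets described above.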
Figure 1 shows that the value approaches 1 when the feature dimension is low, indicating that the fitness function considers classification accuracy almost exclusively. The value gradually decreases as the feature dimension increases until it approaches 0.9, indicating that the weight of classification accuracy gradually decreases while the weight of dimensionality reduction gradually increases. The new fitness function takes into account both classification accuracy and dimensionality reduction. In particular, as the feature dimension increases, the weight of dimensionality reduction also increases to some extent, which is more in line with actual needs. The specific implementation steps of feature selection based on QPSO are as follows: Step 1: Initialize the number of particles N, the particle swarm X, and the maximum number of iterations T_max.
Step 2: The feature subset and SVM parameters are extracted from each particle, and the SVM classifier is used for data classification. During model training, K-fold cross-validation is used to improve the performance of the model. According to Eqs. (28) and (29), the fitness value of the particle is obtained.
Step 3: Initialize the optimal historical value and the optimal global value according to Eqs. (3) and (4).
Step 4: Update the particle positions according to Eq. (1).
Step 5: Update the optimal historical value and the optimal global value according to Eqs. (9) and (10).
Step 6: Let t = t + 1; when t ≤ T_max, return to Step 4. Otherwise, output the optimal global value and obtain the finally selected feature subset.

EEG dataset.
The EEG data used in this study were collected at Hospital Universiti Sains Malaysia (HUSM). MDD participants with psychotic symptoms, alcoholism, smoking and epilepsy were excluded from the study. Healthy controls with possible mental or physical illness were also excluded. All participants signed an informed consent form and were informed of the details of the trial. The experiment was designed in accordance with the Declaration of Helsinki and approved by the HUSM Ethics Committee. A 19-channel EEG cap was used for EEG recording, with electrodes placed according to the international standard 10-20 system, including Fp1, F3, C3, P3, O1, F7, T3, T5, Fz, Fp2, F4, C4, P4, O2, F8, T4, T6, Cz and Pz, as shown in Fig. 2, where A1 and A2 were the reference electrodes.
The resting-state EEG signals were acquired from MDD and HC subjects in the eyes-open (EO) condition. The signals were sampled at 256 Hz and filtered with a 0.5-50 Hz bandpass filter and an additional 50 Hz notch filter. Artifacts were then removed using EEGLAB. Subjects 14 and 25 in the HC group and 7, 8, 12, and 34 in the MDD group were excluded because their EEG was shorter than 4 min after processing. Therefore, the data used in this study came from 58 subjects, including 28 HC and 30 MDD. Meanwhile, the middle four minutes of each recording were extracted to reduce the effect of noise at both ends. To increase the sample size, each subject's data were segmented into 10-s epochs, and finally a total of 1392 (58 × 24) samples were obtained.
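The epoching step can be sketched as follows; with a 4-min recording at 256 Hz and 10-s windows this yields the 24 epochs per subject mentioned above.

```python
import numpy as np

def segment(eeg, fs=256, seconds=10):
    """Split a (channels, samples) recording into non-overlapping
    fixed-length epochs, discarding any trailing remainder."""
    step = fs * seconds
    n = eeg.shape[1] // step
    return np.stack([eeg[:, i * step:(i + 1) * step] for i in range(n)])
```

Applied to a 19-channel, 4-minute recording, the result has shape (epochs, channels, samples per epoch).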

EEG features.
The sampling frequency of the EEG signals is 256 Hz, so a six-level wavelet transform is used to extract the delta, theta, alpha, beta, and gamma bands. On this basis, the Hjorth parameter, approximate entropy (ApEn), power spectral density (PSD), fractal dimension (FD), and CO complexity are extracted from each frequency band. Finally, the features of all channels and frequency bands are combined into feature vectors.
Table 1 shows the specific information of EEG features.The number of signal channels is 19, and the signal of each channel is decomposed into five frequency bands, so the dimension of the feature is 95 (19 × 5).The Hjorth parameter contains three perspectives, so its dimension is 285 (95 × 3).
Ablation study. To verify the effectiveness of the proposed method, an ablation experiment was conducted; the experimental setup is shown in Table 2. MQSVM1, MQSVM2, and MPSVM are hybrid feature selection methods, FS-QPSO is Wrapper feature selection, and FS-MIC is Filter feature selection. Five-fold cross-validation was used in the experiments: the data are divided into five parts, and each time one part (without repetition) is taken as the validation set and the remaining four parts as the training set. The training set is used to train the model. In addition, three-fold cross-validation is used during model training, that is, the k value of the proposed method is set to 3. The trained model is then applied to the validation set to obtain the experimental results. The above operation is repeated five times, and the results are averaged to obtain the final result. MPSVM uses PSO to optimize feature subsets and classifier parameters. The two acceleration coefficients of the PSO are set to 2, the inertia weight to 1, and the numbers of iterations and particles to 200. The other methods use QPSO to optimize feature subsets and/or classifier parameters. The numbers of iterations and particles in the QPSO are set to 200. FS-MIC uses a forward search strategy to find the optimal subset. All experiments were performed on an Intel(R) Xeon(R) CPU, 2.3 GHz, with 128 GB RAM.
Figure 3 shows the optimization process of MQSVM1 and MQSVM2 in the second stage. It can be seen that the fitness values of MQSVM1 and MQSVM2 increase with the number of iterations. Meanwhile, the fitness value reaches its convergence value before the end of the iterations, indicating that the particle number and iteration number settings are valid. The convergence value of MQSVM2 is lower than that of MQSVM1 for different EEG features, because PSO tends to get stuck in local optima, while QPSO can jump out of local optima. The above analysis shows that the performance of the proposed method is higher when QPSO is used in the second stage.
Table 3 shows the running times of all methods. It can be found that the running time of all methods increases with the feature dimension. The running time of FS-MIC increases significantly with the feature dimension, since FS-MIC employs a forward search strategy, which requires the QPSO to optimize the classifier parameters each time a feature is accepted. The running time of MQSVM1 increases gently with the feature dimension, since MQSVM1 uses QPSO to optimize the classifier parameters while optimizing the feature subset. When the feature dimension is low (CO, APEN, FD), the running time of FS-MIC is small, owing to the low computational complexity of the Filter method. When the feature dimension is high (fusion), the running time of MQSVM1 is small, indicating that the proposed method is suitable for high-dimensional datasets. In terms of average running time, FS-QPSO has the longest running time, while the other four methods show almost no difference, indicating that the Wrapper method has the highest computational complexity. In particular, the running time of FS-QPSO is very long for high-dimensional features, which is unfavorable for practical applications. The above analysis shows that the proposed method is not computationally complex and can be used for EEG feature selection.
Table 4 shows the dimensions of the feature subsets. It can be seen that the feature subset dimension after feature selection is significantly lower than the original feature dimension. The dimension of the feature subsets obtained by the three hybrid feature selection methods (MQSVM1, MQSVM2, MPSVM) is significantly lower than that of the Wrapper and Filter feature selection methods, which indicates that hybrid feature selection has advantages in feature selection. Among all EEG features, the dimensionality of the feature subset obtained by the proposed method is significantly lower than that of the other two hybrid feature selection methods, indicating that the proposed improvement strategy is effective. The dimensionality of the feature subset of MQSVM1 is significantly lower than that of MQSVM2, which indicates that the fitness function constructed in this paper from the feature subset dimensionality and the classification accuracy is effective. The dimension of the feature subset of MQSVM1 is also significantly lower than that of MPSVM, indicating that PSO is more likely to get stuck in a local optimum when optimizing the feature subset, while QPSO can jump out of local optima, so the dimension of the feature subset obtained by MQSVM1 is low. The above analysis shows that the dimension of the feature subset obtained by the proposed method is lower than that of the other methods, which is beneficial for reducing the computational complexity of the proposed method.
Table 5 shows the classification accuracy of EEG. It can be found that the classification accuracy of FS-QPSO is significantly higher than that of the other methods, because the Wrapper method can evaluate the overall performance of feature subsets, which is beneficial for improving classification accuracy. The classification accuracy of MQSVM1 is significantly higher than that of FS-MIC, which indicates that the proposed method combines the advantages of the Wrapper method to improve its classification accuracy. The classification accuracy of MQSVM1 is higher than that of MPSVM, which indicates that QPSO has stronger optimization power than PSO, and therefore it is more advantageous to use QPSO for feature optimization.
The weighted values of feature subset dimension and classification accuracy were calculated according to Eq. (28), and the results are shown in Table 6. The weighted values synthesize the dimensionality and classification accuracy of a feature subset, which facilitates direct comparison and analysis of the performance of different methods. The larger the weighted value, the better the performance of the feature subset obtained by the method. It can be found that the weighted value of FS-MIC is significantly lower than that of the other methods, indicating that the feature subset obtained by the Filter feature selection method performs relatively poorly. The weighted values of MQSVM1, MQSVM2, and FS-QPSO are higher, which indicates that the feature subsets obtained by the hybrid and Wrapper feature selection methods perform better. On several EEG features (CO, APEN, Hjorth, PSD, fusion), the weighted value of MQSVM1 was higher than that of the other methods.
Considering the average values, the weighted value of MQSVM1 is the largest, indicating that the feature subset obtained by the proposed method performs the best. The proposed method runs faster when only MIC is used for feature selection. However, since the Filter feature selection method uses a forward search approach, its running time on high-dimensional datasets is extended when the classifier parameters need to be optimized. Therefore, the running time of Filter feature selection methods combined with classifiers that do not require parameter optimization would be significantly reduced. In addition, the feature dimension and accuracy obtained by the Filter feature selection method are worse than those of the Wrapper and hybrid feature selection methods, because the Filter method lacks an overall evaluation of the feature subset, so the performance of its feature subsets is not high. FS-QPSO achieves the highest classification accuracy, but its running time is longer than that of the Filter and hybrid feature selection methods due to its large search space. When the feature dimension is high (fusion), the running time increases significantly, reaching 1214 min, which makes the method difficult to use in practical applications. Therefore, although the feature subset obtained by the Wrapper feature selection method performs well, its computational complexity is too high, and it is not easy to apply directly to high-dimensional features. MQSVM1 combines the advantages of the Filter and Wrapper feature selection methods, and its runtime is significantly lower than that of FS-QPSO. The proposed method optimizes the parameters of the classifier while optimizing the feature subset in the second stage, which reduces the computational complexity of the method on high-dimensional features to some extent. Therefore, the running time of the proposed method grows significantly more slowly than that of FS-QPSO as the dimension of the EEG features increases. MQSVM1 and MQSVM2 differ in their fitness functions: MQSVM1 uses a fitness function constructed from the feature subset dimension and the classification accuracy, while MQSVM2 uses a fitness function constructed from the classification accuracy alone. Experimental results show that the feature subset dimension of MQSVM1 is significantly lower than that of MQSVM2, and the classification accuracy of MQSVM1 is also high, which indicates that the fitness function designed in this paper can effectively reduce the dimension of the feature subset while achieving high classification accuracy. MQSVM1 and MPSVM use different optimization algorithms: MQSVM1 uses QPSO to optimize feature subsets, and MPSVM uses PSO. Experimental results show that the classification accuracy of MQSVM1 is significantly higher than that of MPSVM, which indicates that QPSO has a more robust search capability and can significantly reduce the probability of the method falling into a local optimum. Therefore, it is efficient to use QPSO in the second stage to optimize the feature subset. In terms of weighted values, MQSVM1 achieves significantly higher values than the other methods, which verifies the effectiveness of the proposed improvement strategy.
Comparison with existing methods. To thoroughly verify the superiority of the proposed method, we compared it with other classical methods, including EEG feature selection methods and feature selection methods from other domains: the MRMR EEG feature selection method adopted by Cai et al. in 2018 11, the Fscore-based EEG feature selection method adopted by Wu et al. in 2018 24, the RFINCA EEG feature selection method adopted by Tuncer et al. in 2021 12, the ReliefF-QPSO feature selection method adopted by Xue et al. in 2020 15, and the Fscore-MIC feature selection method adopted by Zhao et al. in 2021 13.
The experimental setup is the same as in Section "Ablation study". The experiments are performed with five-fold cross-validation, the model is trained with three-fold cross-validation, and the numbers of particles and iterations of the QPSO are set to 200. The number of neighbors for ReliefF-QPSO and RFINCA is set to six. For the other methods, no further parameters need to be set. MRMR, Fscore, Fscore-MIC, and RFINCA use a forward search strategy to find the optimal subset.
Table 7 shows the running times of the different methods. It can be found that as the feature dimension increases, the running time of all methods increases. MQSVM and ReliefF-QPSO are both hybrid feature selection methods, but the running time of MQSVM is significantly lower than that of ReliefF-QPSO, mainly because ReliefF only deals with irrelevant features when performing feature selection, whereas the proposed method uses MIC to deal with both irrelevant and redundant features, which reduces the search space of the second stage and hence the computational complexity of the method. The remaining four methods are Filter feature selection methods. However, the running time of the Filter feature selection methods increases significantly for high-dimensional features, mainly because the Filter methods use a forward search to find the optimal subset of features.
Compared with existing methods, the running time of the proposed method grows more slowly as the feature dimension increases, mainly because in the second stage the proposed method optimizes the parameters of the classifier while optimizing the feature subset. The average runtime of MQSVM is 227 min, which is not a significant increase over existing methods. This analysis shows that the computational complexity of the proposed method is not high compared to the existing methods and increases slowly on high-dimensional features. Table 8 shows the dimensions of the feature subsets. The dimensionality of the feature subsets obtained by all methods is significantly reduced. The dimension of the feature subset obtained by ReliefF-QPSO is significantly higher than that obtained by the other methods. On high-dimensional features (Hjorth, fusion), the performance of ReliefF-QPSO degrades significantly because its fitness function does not consider the influence of the feature subset dimension, and ReliefF only processes irrelevant features, resulting in a large search space in the second stage. The dimension of the feature subset obtained by MQSVM is significantly lower than that of ReliefF-QPSO because MQSVM uses MIC to process both irrelevant and redundant features, which reduces the search space in the second stage. In terms of the mean value of the results, the proposed method has the smallest value, indicating that the dimension of the feature subset obtained by the proposed method is lower than that of the existing methods.
Table 9 shows the classification accuracy. The classification accuracy of MQSVM does not differ much from that of the existing methods, so it is difficult to judge the performance of these methods from classification accuracy alone. The weighted sum of the dimension and classification accuracy of the feature subset was therefore calculated according to Eq. (28) to analyze the performance of the different methods; the results are shown in Table 10. MQSVM achieves significantly higher weighted values on multiple EEG features (CO, FD, Hjorth, PSD, fusion) than the existing methods, and it also has the largest mean value. This analysis shows that the feature subset obtained by the proposed method has higher classification accuracy and lower feature dimensionality than the existing methods.
The average running time of the proposed method is 227 min, significantly lower than that of the existing hybrid feature selection method, indicating that the computational complexity of the proposed method is not high. At the same time, the running time of the proposed method increases gently with the feature dimension, which indicates that the proposed method performs well on high-dimensional datasets. The dimension of the feature subset obtained by the proposed method is lower than that of the existing methods, while its classification accuracy does not differ much from theirs; the resulting feature subset therefore has both a lower dimension and high classification accuracy. The weighted sum of dimensionality and classification accuracy enables a direct comparison of the feature subsets obtained by different methods: the weighted value of the proposed method is higher than that of the existing methods, indicating better overall performance, and it is larger on many EEG features, indicating that the performance of the proposed method is stable across different EEG features. The above analysis shows that the feature subsets obtained by the proposed method have low dimensionality, high classification accuracy, and low computational complexity compared to the existing methods, which validates the effectiveness of the proposed method.
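The stronger global search attributed to QPSO comes from its position update: each particle moves around a local attractor between its personal best and the global best, with a step scaled by its distance to the mean-best position. The sketch below applies the standard QPSO update (with a linearly decreasing contraction-expansion coefficient, a common choice) to a toy sphere function; it is not the paper's binary feature-selection variant:

```python
import math
import random

def sphere(x):
    # Toy objective: sum of squares, minimum 0 at the origin.
    return sum(v * v for v in x)

def qpso(objective, dim=2, n_particles=30, iters=200, bound=5.0, seed=0):
    """Minimise `objective` with the standard QPSO update rule."""
    rng = random.Random(seed)
    X = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_f = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for t in range(iters):
        # Contraction-expansion coefficient, decreased linearly from 1.0 to 0.5.
        alpha = 1.0 - 0.5 * t / iters
        # Mean-best position over all personal bests.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                p = phi * pbest[i][d] + (1 - phi) * gbest[d]  # local attractor
                u = 1.0 - rng.random()                         # u in (0, 1]
                step = alpha * abs(mbest[d] - X[i][d]) * math.log(1.0 / u)
                X[i][d] = p + step if rng.random() < 0.5 else p - step
            f = objective(X[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = X[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = X[i][:], f
    return gbest, gbest_f
```

For feature selection, one would binarize each dimension (e.g. with a sigmoid threshold) and score candidate bit-vectors with a fitness function like the one shown earlier; the update rule itself is unchanged.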
Robustness analysis of the proposed method. To further validate the robustness of the proposed method, the UCI datasets are used for experimental validation. The details of the datasets are given in Table 11. Each UCI dataset is divided into a training set (75%) and a validation set (25%). The parameter settings of the algorithm are the same as those used for the experiments in Section "Comparison with existing methods".
The weighted sum of the dimension and classification accuracy of the feature subset is calculated according to Eq. (28), and the results are shown in Table 12. The weighted value of the proposed method is significantly higher than that of the existing methods on multiple datasets (segment, dermatology, Parkinson, CNAE, QSAR) and shows a clear advantage on the high-dimensional datasets (Parkinson, CNAE, QSAR). Averaging the results over all datasets, the mean value of MQSVM is significantly higher than that of the existing methods; the larger the weighted value, the better the overall performance of the feature subset.
The above analysis shows that the proposed method has stable performance on datasets with different dimensions, and the extracted feature subsets perform well, which indicates that the proposed method is robust and can be used for feature extraction on other data.

Conclusions
In this paper, we propose a hybrid EEG feature selection method to reduce the dimensionality of EEG features and improve classification accuracy. MIC is used to eliminate irrelevant and redundant features, which effectively reduces the search space in the second stage and improves the speed of the method. In the second stage, QPSO is used to optimize the feature subset, which avoids the tendency of conventional PSO to quickly get stuck in local optima. Considering the coupling between the feature subset and the classifier, QPSO optimizes the parameters of the classifier while optimizing the feature subset, which increases the classification accuracy and reduces the computational complexity of the algorithm. Several EEG features of different dimensions are extracted from a public EEG dataset, and the proposed method is compared with five existing feature selection methods. Experimental results show that the feature subsets obtained by the proposed method have low dimensionality, high classification accuracy, and low computational complexity. Thus, the proposed method has a clear advantage over existing EEG feature selection methods. Despite the beneficial results obtained in this study, there are some limitations. First, the proposed method uses MIC to measure feature relevance in order to eliminate irrelevant and redundant features; however, MIC suffers from high computational complexity and long running times when the dimensionality of the dataset is high. Although mutual information and the Pearson correlation coefficient can also measure correlations, mutual information struggles to remove redundant features, and the Pearson correlation coefficient cannot capture nonlinear relationships between features. Future work will involve identifying or proposing a correlation measure with low computational complexity to reduce the overall computational complexity of the algorithm. Moreover, the QPSO used in the proposed method may still get stuck in local optima; future work can therefore focus on improving the optimization algorithm to strengthen its global search capability.

Figure 1. Transformation curve of proportion for classification accuracy.

Figure 3. The change curve of fitness value in the second stage.

Table 1. Details of EEG features.

Table 2. Comparison methods of experiments.

Table 3. Runtime of different methods (min). Bold values indicate the minimum runtime.

Table 4. Dimension of the feature subset. Bold values indicate the smallest dimension of the feature subset.

Table 5. Classification accuracy of EEG. Bold values indicate the highest classification accuracy.

Table 6. The weighted value of feature subset dimension and classification accuracy. Bold values indicate the largest weighted values of feature subset dimension and classification accuracy. The weighted value of MPSVM is significantly lower than that of the other two hybrid feature selection methods, indicating that QPSO performs significantly better than PSO in feature subset optimization.

Table 7. Runtime of different methods (min). Bold values indicate the minimum runtime. The filter methods have significantly lower running times than the hybrid feature selection methods on low-dimensional features.

Table 8. Dimensions of the feature subset. Bold values indicate the smallest dimension of the feature subset.

Table 9. Classification accuracy. Bold values indicate the highest classification accuracy.

Table 10. The weighted sum of classification accuracy and dimension of the feature subset. Bold values indicate the largest weighted values of feature subset dimension and classification accuracy.

Table 11. Details of the UCI dataset.

Table 12. Weighted sum of all methods on the UCI dataset. Bold values indicate the largest weighted values of feature subset dimension and classification accuracy.