Introduction

The number of biometric systems based on electroencephalography (EEG) has grown steadily, with various approaches addressing problems related to the authentication and verification stages. A research-grade EEG device guarantees a controlled environment and high-quality multi-channel EEG recording, but this is offset by the high computational cost, the non-portability of the equipment, and the use of inconvenient conductive gels. The development of dry EEG sensors has created possibilities for new types of portable EEG systems.

An important step towards this goal is a reduction in the number of required EEG channels while increasing, or at least maintaining, the performance of high-density EEG systems. Depending on whether the paradigm is task-related or task-free, certain EEG channels provide only redundant or sub-optimal information. Several techniques have been studied with the aim of developing low-density EEG-based systems with high performance, i.e., pre-processing and feature extraction, channel selection, and paradigms to stimulate brain signals. For EEG-based biometric systems, several approaches have been presented using various paradigms to stimulate and record the EEG signals, i.e., imagined speech1,2,3, resting-state4,5,6,7,8,9,10, and event-related potentials (ERPs)11,12.

The use of the resting-state does not require any training before using the system. We present an approach using the resting-state with the subjects' eyes open, and we repeated the experiments with their eyes closed. We organized our approach as a multi-objective optimization problem with three unconstrained objectives: (1) reduce the number of EEG channels required for identification, (2) obtain a true acceptance rate (TAR) as high as possible, and (3) obtain a true rejection rate (TRR) as high as possible. The TAR is the percentage of times the system correctly verifies a true claim of identity, and the TRR is the percentage of times it correctly rejects subjects that are not in the system.
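Concretely, both rates reduce to simple fractions over per-instance decisions. The sketch below is illustrative (the function and variable names are ours, not from the original pipeline) and assumes each one-class model outputs +1 to accept an instance and -1 to reject it:

```python
import numpy as np

def tar_trr(genuine_pred, impostor_pred):
    """Compute the true acceptance and true rejection rates.

    genuine_pred  : model outputs (+1 accept / -1 reject) for instances
                    belonging to the enrolled subject
    impostor_pred : model outputs for instances from all other subjects
    """
    genuine_pred = np.asarray(genuine_pred)
    impostor_pred = np.asarray(impostor_pred)
    tar = np.mean(genuine_pred == 1)    # fraction of true claims accepted
    trr = np.mean(impostor_pred == -1)  # fraction of intruders rejected
    return tar, trr

# e.g. 9 of 10 genuine instances accepted, 99 of 100 impostors rejected
tar, trr = tar_trr([1] * 9 + [-1], [-1] * 99 + [1])
```

With 108 impostor subjects per model, the TRR is averaged over far more instances than the TAR, which is worth keeping in mind when comparing the two rates.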

In general, there are a number of approaches that use raw data as input for various configurations of neural networks (NN)13,14,15,16 and several have been proposed to tackle the high dimensionality of the data using methods for feature extraction, such as principal component analysis (PCA)17, the discrete wavelet transform (DWT)3, or empirical mode decomposition (EMD)2,3,11,12.

Several approaches have been proposed for the creation of a biometric system following various experiment configurations, with various paradigms and methods for feature extraction and classification using the public EEG Motor Movement/Imagery Dataset (EEGMMIDB), using various configurations of neural networks14,18,19,20, other supervised and unsupervised techniques5,21,22,23,24,25,26,27,28,29,30,31, and methods for EEG channel selection6,32,33.

One approach used a subset of eight pre-selected channels32 and EEG data from one task for training and another task for testing. The selection of the channels was justified based on their stability across various mental tasks, and the results were evaluated using the half total error rate (HTER), which was 14.69%. Another approach used various tasks from the EEGMMIDB and channel selection with the binary flower pollination algorithm (BFPA), and reported accuracies of up to 0.87 using supervised learning and approximately 32 EEG channels33. However, the analysis considered only non-intruders when using multi-class classification, and therefore additional stages for detecting intruders are necessary.

Other approaches use instances of different lengths with the same dataset, such as instances of 10 or 12 s5,24. Resting-state instances of 10 s have been validated with the leave-one-out framework, consisting of five instances of 10 s for training and one instance for validating the model24, resulting in a correct recognition rate (CRR) of 0.997 for the resting-state with the eyes open, and 0.986 with the eyes closed, all using 64 EEG channels. A biometric system based on resting-state and one-second EEG signals from only two channels (FP1 and FP2) and a 256-Hz sample rate has been proposed, extracting features directly from the raw data and using Fisher's discriminant analysis, obtaining up to a 0.966 TAR and a 0.034 FAR9.

Deep learning algorithms have proven successful in image processing and other fields, but when using EEG data they have not shown convincing and consistent improvements over the most advanced methods to date16,34. However, some new approaches with high accuracies have recently been presented; for instance, a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) approach was presented15, and the authors evaluated the proposed method on a public dataset called DEAP, which consists of EEG signals recorded from 32 subjects over 32 channels using different emotions as the paradigm35. Their experiments were performed using 10-s segments of EEG signals, and they reported that CNN-GRU reached up to a 0.999 mean CRR with 32 channels, and 0.991 with 5 channels that were selected using one-way repeated measures ANOVA with Bonferroni pairwise comparison (post-hoc). The findings of this work are interesting and the accuracies obtained are high. However, deep learning approaches need a large amount of data, the signal-segment lengths and the paradigm followed are not standard, and for a real-time application, the collection of a large number of instances, or of instances over long periods, can be exhausting and not competitive with current biometric systems in the industry (i.e., fingerprint, face recognition, etc.).

The amount of data and time required for training NNs are the main concerns for effective deployment and adoption of EEG-based biometric systems in real-life scenarios. In the literature, researchers report everything from simple NN structures (i.e., a single hidden layer) to more complex networks (recurrent and CNN), but these require greater computational power, with faster CPUs and the use of GPUs15,28,29,30,31,34. To overcome the need for a large amount of data in deep learning approaches, one approach uses simple data augmentation techniques by creating overlapped time windows18. Other related proposals using neural networks have been presented and compared in the state of the art28,29,30,31, where some of the most relevant works used around 100 subjects and, in most cases, 64 channels for testing their approaches13,14,18,36. However, there is no defined method for channel selection, since the process of selecting the most relevant channels requires repeating the classification process several times, and it is well known that deep learning approaches are computationally costly30,34.

Based on the results of our previous studies2,7,12, we compared EMD and DWT for decomposing the EEG signals into different frequency bands. When using EMD, we selected the two sub-bands closest to the original EEG signal for each channel, and for DWT we decomposed the EEG signal into four different sub-bands (three levels of decomposition plus the residual), as explained in the “Methods” section. For each sub-band, we then extracted the instantaneous and Teager energy and the Higuchi and Petrosian fractal dimensions, thus obtaining a set of eight features per channel when using EMD and 16 features for DWT.
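The DWT branch of this pipeline (four sub-bands times four features, giving 16 features per channel) can be sketched as follows. This is not the authors' implementation: it uses a hand-rolled Haar filter bank in place of the unspecified mother wavelet, and the logarithmic conventions for the two energy measures are our assumptions.

```python
import numpy as np

def haar_dwt_subbands(x, levels=3):
    """Split a signal into `levels` Haar detail sub-bands plus the
    final approximation (the residual)."""
    subbands = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                      # pad to even length
            approx = np.append(approx, approx[-1])
        subbands.append((approx[0::2] - approx[1::2]) / np.sqrt(2))
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
    subbands.append(approx)                      # residual
    return subbands

def instantaneous_energy(x):
    # log of the mean squared amplitude (one common convention)
    return np.log10(np.mean(x ** 2))

def teager_energy(x):
    # log of the mean absolute Teager-Kaiser operator
    psi = x[1:-1] ** 2 - x[:-2] * x[2:]
    return np.log10(np.mean(np.abs(psi)))

def petrosian_fd(x):
    n = len(x)
    diff = np.diff(x)
    n_delta = np.sum(diff[:-1] * diff[1:] < 0)   # sign changes in derivative
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

def higuchi_fd(x, kmax=8):
    n = len(x)
    log_lk = []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                       # curve length per offset m
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            d = np.sum(np.abs(np.diff(x[idx])))
            lengths.append(d * (n - 1) / ((len(idx) - 1) * k * k))
        log_lk.append(np.log(np.mean(lengths)))
    # FD is the slope of log L(k) against log(1/k)
    return np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), log_lk, 1)[0]

def channel_features(x):
    """16 DWT-based features per channel: 4 sub-bands x 4 features."""
    feats = []
    for band in haar_dwt_subbands(x):
        feats += [instantaneous_energy(band), teager_energy(band),
                  higuchi_fd(band), petrosian_fd(band)]
    return np.array(feats)
```

On a one-second, 160-sample instance this yields a 16-dimensional vector per channel; the per-instance feature vector concatenates these across the selected channels.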

We present the results from a set of experiments using the EEGMMIDB dataset, with 109 subjects, for creating an EEG-based biometric system that creates a model for each subject and can reject the other 108. We also present results from experiments using one-class support vector machine (OC-SVM) with the radial basis function (RBF) to compare the behavior with the second approach, which uses the local outlier factor (LOF). All the experiments were validated using 10 runs with 80% of the instances for training and 20% for testing. We solved the optimization problem using the non-dominated sorting genetic algorithm (NSGA)-III, which uses the Pareto-optimal solutions and a niche method to maintain stable sub-populations of good points37. NSGA-III has proven to be a robust method for channel selection, especially when using more than two objectives12,38. First, we report the results for a set of experiments using all the available channels to show the TAR and TRR using different parameters. We then present the results for experiments using OC-SVM and the optimization process with EEG signals from the resting-state in which the eyes of the subjects were open. Finally, we repeated the experiments, solving the optimization problem for channel selection using LOF with different parameters and the EEG signals from the resting-state in which the eyes of the subjects were open and then closed.

Results

Subject identification using 64 EEG channels, comparing OC-SVM and LOF using different parameters

We used EEG signals from 64 channels of 109 subjects and 60 instances of one second with a sample rate of 160 Hz that were recorded during the resting-state in which the eyes of the subject were open. We used EMD- or DWT-based features, and evaluated the results using the TAR and TRR.

To ensure 10-fold cross-validation, we performed the experiments 10 times, randomly selecting 80% of the instances for training and 20% for testing; this ensures that the method generalizes and that similar results can be obtained with other subsets of instances for training and testing. The models were created using OC-SVM or LOF. It should be noted that the channels and parameters were optimized for all the subjects at the same time, but a separate machine-learning model was created for each subject. In general, the results presented in Table 1 were obtained by creating a model for each of the 109 subjects, in which each subject's model was used to recognize that subject and reject the remaining 108 who were not part of the model.
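This per-subject one-class scheme can be sketched with scikit-learn's LocalOutlierFactor in novelty mode. The Gaussian toy features, the five-subject setup, and the cluster separation below are stand-ins of our own, not the real EMD-/DWT-based feature matrices:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)

# Stand-in feature matrices: 60 one-second instances per subject,
# 32 features each (real features would come from the EMD/DWT pipeline).
n_subjects, n_instances, n_features = 5, 60, 32
data = {s: rng.normal(loc=5.0 * s, size=(n_instances, n_features))
        for s in range(n_subjects)}

tars, trrs = [], []
for s in range(n_subjects):
    X_train, X_test = data[s][:48], data[s][48:]     # 80% / 20% split
    model = LocalOutlierFactor(n_neighbors=1, algorithm='kd_tree',
                               novelty=True).fit(X_train)
    # TAR: fraction of the subject's own test instances accepted
    tars.append(np.mean(model.predict(X_test) == 1))
    # TRR: fraction of the other subjects' instances rejected
    impostors = np.vstack([data[o][48:] for o in range(n_subjects) if o != s])
    trrs.append(np.mean(model.predict(impostors) == -1))

mean_tar, mean_trr = float(np.mean(tars)), float(np.mean(trrs))
```

In the real experiments the loop runs over 109 models and the split is repeated 10 times, averaging the TAR and TRR across runs.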

Table 1 Average TARs and TRRs for subject detection with EEG data from 64 channels using different parameters for OC-SVM and LOF and DWT-based features with 109 subjects.

The results obtained with OC-SVM showed the lowest TAR (Table 1), meaning that the models created with OC-SVM did not learn from the training set and thus rejected approximately 50% of the instances, on average, explaining why the TRR was high when using OC-SVM. The results obtained with LOF, using three different algorithms and one or ten neighbors, are also shown in Table 1 for illustrative purposes. LOF using the k-d tree algorithm and one neighbor resulted in the highest TAR and TRR, meaning that it was possible to identify each subject and reject almost all the rest that did not correspond to the models.

Previous results have shown that the algorithm and number of neighbors used are important for increasing the TAR and TRR. To provide more information about this behavior, we repeated the experiments using DWT-based features, considering only LOF with the k-d tree and 1 to 10 neighbors. The average results obtained using 10-fold cross-validation are presented in Fig. 1.

Figure 1

TARs and TRRs obtained using various numbers of neighbors with the LOF k-d tree algorithm and DWT-based features.

The use of a higher number of neighbors resulted in a decrease in the TAR from 1.000 to 0.923, while the TRR increased or remained above 0.988 (Fig. 1), meaning that the models were unable to learn the features of each subject with those numbers of neighbors. This is relevant, as it shows the importance of selecting not only the best feature extraction method but also the LOF algorithm and the best number of neighbors.

Channel selection using NSGA-III and OC-SVM for EEG signals from the resting-state with the eyes open

We previously showed that the TAR and TRR of the models created using OC-SVM can be improved by finding the best nu and gamma parameters12. To provide more information about the behavior of OC-SVM models on a larger dataset, we performed the optimization process defined in the “Methods” section, aiming to improve the TAR and TRR while reducing the number of EEG channels necessary for subject identification.

For this experiment, we used EEG signals of the 109 subjects during the resting-state with their eyes open, using 80% of the instances for training and 20% for testing. We used NSGA-III as the channel selection method, using 64 binary genes in a chromosome to represent the EEG channels (1 if the channel is used, 0 if not) and two genes with decimal values (both from 0 to 1) to select the best nu and gamma parameters, thus obtaining a chromosome of 66 genes.
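A minimal decoding of this 66-gene representation might look as follows. The mapping of the two real-valued genes onto concrete nu and gamma ranges is our assumption (the text only constrains both genes to [0, 1]), and the function name is hypothetical:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def decode_ocsvm_chromosome(genes, gamma_max=10.0):
    """Decode a 66-gene chromosome: 64 binary channel genes plus two
    real genes in [0, 1] mapped to the OC-SVM nu and gamma."""
    genes = np.asarray(genes, dtype=float)
    channels = np.flatnonzero(genes[:64] > 0.5)   # indices of used channels
    nu = float(np.clip(genes[64], 1e-3, 1.0))     # nu must lie in (0, 1]
    gamma = max(genes[65] * gamma_max, 1e-6)      # assumed gamma scaling
    return channels, nu, gamma

# Example: three channels selected, mid-range nu and a small gamma gene
genes = np.zeros(66)
genes[[7, 20, 63]] = 1
genes[64], genes[65] = 0.5, 0.1
channels, nu, gamma = decode_ocsvm_chromosome(genes)
model = OneClassSVM(kernel='rbf', nu=nu, gamma=gamma)
```

Each candidate is then evaluated by training one such model per subject on the selected channels' features and scoring the three objectives.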

The distribution of the results of one run, as an example, obtained using EMD- and DWT-based features is shown in Fig. 2. The average and standard deviation of the results obtained using 10-fold cross-validation are presented in Table 2.

Figure 2

Frontal and aerial view of the TARs and TRRs obtained in the channel selection process using EMD-based features (left) and DWT-based features (right) with OC-SVM.

Table 2 TARs and TRRs obtained for the first five EEG channels in the Pareto-front for three objectives solved with NSGA-III using EMD- and DWT-based features with OC-SVM.

As mentioned previously, we performed the optimization 10 times for cross-validation. In some runs, the Pareto-front contained only channel combinations of one to five channels; in others, of one to seven. From these subsets, we can further analyze which channels are common across runs and which are not. With this, it may be possible to recommend a set of channels for a new headset (considering the best subset found and the most appropriate for a new design), but first it is necessary to choose the best paradigm or sub-task (i.e., resting-state with the eyes open or closed) for EEG data collection. For comparative purposes, we present the average TAR and TRR obtained using channel combinations of one to five channels in the Pareto-front of the 10 runs.

We obtained a TAR of \(0.822 \pm 0.028\) and a TRR of \(0.969 \pm 0.022\) with only five channels when using EMD-based features (Table 2). The TAR and TRR were \(0.822 \pm 0.028\) and \(0.981 \pm 0.017,\) respectively, when using DWT-based features and five channels with the optimization process.

As presented in Fig. 2, the candidates generated using EMD- or DWT-based features and OC-SVM showed a clear tendency to reject all the subjects (which increases the TRR, since the models correctly reject the intruders), even the subject in each model (which decreases the TAR), meaning that the models created for each subject did not learn from the provided features. The TAR increases only if the right nu and gamma parameters and channels are selected; these vary in each run, which is reflected in the standard deviation.

Figure 3 presents the sets of channels used during the optimization process in the 10 runs. The sets of channels found using EMD-based features are presented at the top, and those for DWT-based features at the bottom. The sets, from left to right, correspond to using one to five channels; as mentioned earlier, in some runs the channels found by NSGA-III differed from those in other runs, but the figure presents one set of them. Using EMD-based features, the channels found when using one to five channels differ, but channels around T10 and T8 are consistent across most sets. When using DWT-based features, it is clearer that channel IZ appears in all the sets, and channels C4 and T10 appear in most of the sets.

Figure 3

Set of one to five channels found during the optimization process for creating the biometric system with OC-SVM using EMD-based features (top) or DWT-based features (bottom), and resting-state with the eyes open.

Channel selection using NSGA-III and LOF for EEG signals from the resting-state with the eyes open

We performed the optimization process using the 109 subjects in the dataset, but now considering LOF for creating each subject's model. We used NSGA-III as the channel selection method, using 64 binary genes in a chromosome to represent the EEG channels and two genes with integer values to select the algorithm (1: ball tree, 2: k-d tree, 3: brute force) and the number of neighbors (from 1 to 10, a range chosen experimentally), thus obtaining a chromosome of 66 genes. The experiment was repeated 10 times for validation, each time using 80% of the instances of each subject for training and 20% for testing.
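The LOF variant of the chromosome can be decoded analogously; the sketch below is hypothetical, with the two integer genes mapped as described in the text:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Gene-to-algorithm mapping as described in the text
ALGORITHMS = {1: 'ball_tree', 2: 'kd_tree', 3: 'brute'}

def decode_lof_chromosome(genes):
    """Decode 64 binary channel genes plus two integer genes for the
    LOF algorithm (1-3) and the number of neighbors (1-10)."""
    genes = np.asarray(genes)
    channels = np.flatnonzero(genes[:64] > 0.5)
    algorithm = ALGORITHMS[int(genes[64])]
    n_neighbors = int(np.clip(genes[65], 1, 10))
    return channels, algorithm, n_neighbors

# Example: two channels, k-d tree, three neighbors
genes = np.zeros(66)
genes[[10, 42]] = 1
genes[64], genes[65] = 2, 3
channels, algorithm, n_neighbors = decode_lof_chromosome(genes)
model = LocalOutlierFactor(algorithm=algorithm, n_neighbors=n_neighbors,
                           novelty=True)
```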

The results of the first run are presented in Fig. 4 as an example of the distribution of the TARs and TRRs during the optimization process, and Table 3 presents the average results for both feature extraction methods, EMD and DWT.

Figure 4

Frontal and aerial view of the TARs and TRRs obtained in the channel selection process using EMD-based features (left), and DWT-based features (right) with LOF.

Table 3 TARs and TRRs obtained for the first seven EEG channels in the Pareto-front for three objectives solved with NSGA-III using EMD-based features and LOF.
Figure 5

Average distribution of the algorithms and number of neighbors used in the optimization process with EMD-based features (left) and DWT-based features (right).

Figure 6

Average distribution of the algorithms and number of neighbors used for the results in the Pareto-front of the optimization process with EMD-based features (left) and DWT-based features (right).

The use of DWT-based features allowed us to obtain an average TAR of up to \(0.993 \pm 0.001\) and a TRR of \(0.941 \pm 0.002\) using only three EEG channels (Table 3). Interestingly, the distribution of the results was very distinct and clear (Fig. 4), indicating that we can obtain similar TARs and TRRs with different channel combinations using LOF and EMD- or DWT-based features.

The average distribution of the parameters used in the complete optimization process (all generations and all chromosomes) is presented in Fig. 5, showing that the algorithm most often used by LOF was ball tree with three neighbors when using EMD-based features. The ball tree and k-d tree algorithms were used equally, with three neighbors, when DWT-based features were used. Analysis of only the parameters used for the results in the Pareto-front of the 10-fold cross-validation (for obtaining the results presented in Table 3) confirmed that the ball tree algorithm with 3 to 4 neighbors is the most often used for EMD-based features; for DWT-based features, the ball tree and k-d tree are used with only two neighbors, as shown in Fig. 6.

Figure 7 presents the sets of channels of the 10 runs used for obtaining the results in Table 3, which correspond to the use of one to seven channels, using EMD-based features (top of the figure) and DWT-based features (bottom of the figure). Interestingly, in this case the channels are almost the same for both methods, and they did not differ much when using one or three channels. Another important point is that channels IZ, T8, and T10 appear in most of the cases for both methods, using EMD- or DWT-based features. It is possible to observe that the most relevant area is around the C6, T8, T10, and F5 channels.

Figure 7

Set of one to seven channels found during the optimization process for creating the biometric system with LOF, and EMD-based features (top) or DWT-based features (bottom), and resting-state with the eyes open.

Channel selection using NSGA-III and LOF for EEG signals from the resting-state with the eyes closed

The previous experiments using LOF resulted in higher TARs and TRRs with fewer EEG channels than those using OC-SVM. To provide additional information about the performance of LOF with EMD- or DWT-based features, we repeated the optimization process with EEG data from the 109 subjects, but now considering the resting-state with the eyes closed, as described in the “Methods” section.

The chromosome representation was the same as in the previous experiment: 64 genes representing the EEG channels and two additional genes with integer values for the algorithm and the number of neighbors. Each experiment was performed 10 times, randomly selecting 80% of the instances for training and 20% for testing, thus ensuring 10-fold cross-validation. The results obtained from one run using EMD- and DWT-based features are presented in Fig. 8 to visualize the behavior during the optimization process.

Figure 8

Frontal and aerial view of the TARs and TRRs obtained in the channel selection process using EMD-based features (left) and DWT-based features (right) from the resting-state with the eyes closed, using LOF.

Table 4 TARs and TRRs obtained with LOF for the first seven EEG channels in the Pareto-front for three objectives solved with NSGA-III, using EMD- or DWT-based features and the resting-state with the eyes closed.
Figure 9

Average distribution of the algorithms and number of neighbors used in the optimization process with EMD-based features (left) and DWT-based features (right) using EEG signals from the resting-state with the eyes closed.

Figure 10

Average distribution of the algorithms and number of neighbors used for the results in the Pareto-front of the optimization process with EMD-based features (left) and DWT-based features (right) using EEG signals from the resting-state with the eyes closed.

The average TAR and TRR in the Pareto-front for the first seven channels, in both cases, using EMD and DWT for feature extraction, are presented in Table 4. The results show that subject identification is possible using the resting-state with the eyes closed. The TAR and TRR are similar to those presented in Table 3. Interestingly, the results were maintained in the 10 runs, especially when using DWT for feature extraction, as the standard deviation was 0.011 for the TAR and 0.009 for the TRR.

The average distribution of the parameters used during the entire optimization process is shown in Fig. 9. The k-d tree algorithm was the most used in both cases, with EMD and with DWT, and the number of neighbors ranged from 1 to 4, with a clear advantage for two neighbors. The average parameters used for obtaining the results in the Pareto-front are presented in Fig. 10, confirming that the k-d tree algorithm is the most used and that the number of neighbors still ranges from 1 to 4, with preferential use of only two neighbors.

As in the previous experiment using the resting-state with the eyes open, Fig. 11 presents the sets of channels found in the optimization process of the 10 runs when creating the models for the biometric system using the resting-state with the eyes closed, using EMD-based features (top of the figure) and DWT-based features (bottom of the figure). Interestingly, the results presented in Figs. 7 and 11 do not differ much, even across methods and across sets with different numbers of channels (the sets created in the 10 runs with one to seven channels). The most relevant area is still around the C6, T8, T10, and IZ channels.

Figure 11

Set of one to seven channels found during the optimization process for creating the biometric system with LOF, using EMD-based features (top) or DWT-based features (bottom), and resting-state with the eyes closed.

Discussion

Our research on biometric systems has focused on the study and comparison of various task-related and task-free paradigms, i.e., resting-state, ERPs, and imagined speech, using various types of electrodes and various numbers of channels2,3,7,11. The resting-state has been used in the state of the art for this purpose since it does not require any training process for the subject. There are approaches based on multi-class classification using machine-/deep-learning and on one-class classification. Although most approaches can discriminate between the subjects in the database when using multi-class classification, they do not consider possible intruders. In the best case, a study presented a set of eight EEG channels selected beforehand32. Another study used deep learning with a set of five EEG channels also selected beforehand, but it did not use the resting-state15. We previously presented a two-stage method for channel selection, tested on a dataset with 26 subjects, that first detects intruders and then uses multi-class classification to identify the subject12. The stage for intruder detection was created using OC-SVM with nu and gamma parameters determined by a genetic algorithm that also selected the most relevant channels for the task. However, OC-SVM has been shown to be very sensitive to the nu and gamma parameters.

Here, we present a new approach for an EEG-based biometric system using brain signals recorded during the resting-state with the eyes open and the resting-state with the eyes closed, using LOF and channels selected by NSGA-III. In short, we created a model using LOF with EMD-/DWT-based features for each subject, and then were able to reject the other 108 subjects in the dataset, confirming that the features extracted from each subject can help to discriminate between the subject in the model and the rest of the subjects, with good results, even with a low number of EEG channels and using 108 subjects as intruders.

First, we conducted experiments using EEG signals from the resting-state with the eyes open and 64 EEG channels, with OC-SVM and LOF, using different parameters. We show that we can attain a TAR of up to \(1.000 \pm 0.000\) and a TRR of \(0.998 \pm 0.001\) using LOF and the k-d tree algorithm with only one neighbor, all with DWT-based features. We then repeated the experiment using 1 to 10 neighbors with DWT-based features, LOF, and the k-d tree algorithm, as these were the best parameters found in the previous experiment, and also to show that the number of neighbors affects the TAR and TRR.

We also showed that OC-SVM resulted in a TAR of \(0.502 \pm 0.004\) and a TRR of \(0.993 \pm 0.001,\) meaning that the models were unable to learn from any of the features of the subject (EMD- or DWT-based). It was thus necessary to fit the best nu and gamma parameters using the multi-objective optimization process12. This resulted in substantially higher TAR and TRR values (see Fig. 2). In the best case, we obtained a TAR of up to \(0.822 \pm 0.028\) and a TRR of \(0.969 \pm 0.022\) with EMD-based features, and a TAR of \(0.822 \pm 0.028\) and a TRR of \(0.981 \pm 0.017\) with DWT-based features. However, the standard deviation was high.

The results obtained with LOF using the resting-state with the eyes open show that we can obtain a TAR of up to \(0.993 \pm 0.001\) and a TRR of \(0.941 \pm 0.002\) with only three EEG channels using DWT-based features; with only two EEG channels, we obtained a TAR and TRR above 0.900, which are higher than the best results obtained in the Pareto-front using EMD-based features. As shown in Fig. 4, the distribution of the TAR and TRR values was consistent when reducing the number of EEG channels during the optimization process, showing that the models created with LOF learn well from the features provided and that different channel combinations can be used to obtain the best results, as presented in Table 3. In this case, the most used algorithm over the complete optimization process was ball tree, with three neighbors. Analysis of the parameters using DWT-based features and only the results obtained in the Pareto-front shows the use of the ball tree and k-d tree algorithms to be highly similar, with only two neighbors.

The use of EEG signals from the resting-state with the eyes closed and LOF confirmed that DWT-based features work better, with a TAR of up to \(0.997 \pm 0.002\) and a TRR of up to \(0.950 \pm 0.005,\) with only three EEG channels. The k-d tree algorithm, with 2 to 4 neighbors, was the most used over the complete optimization process, as well as for the results obtained in the Pareto-front.

The use of OC-SVM can offer good results if the appropriate parameters are chosen; otherwise, the TAR can decrease substantially. This behavior needs to be further investigated using different feature-extraction methods and compared with results on datasets of different sizes. On the other hand, LOF proved to be a robust classifier for creating an EEG-based biometric system, especially using DWT-based features, with the ball tree or k-d tree algorithms and 2 to 4 neighbors. In the future, we will evaluate whether solving the problems related to EMD (best spline, end effects, mode mixing, etc.) can improve the results presented in this study.

Comparing the results presented in Figs. 3, 7 and 11, one can argue that when using LOF it is easier to see the most relevant area for choosing a possible set of channels and to compare these findings in the future. Interestingly, the channel distributions for the resting-state with the eyes open and with the eyes closed did not differ substantially. The localization of most of the relevant channels, i.e., the channels found in most of the sets, is mainly around channels F5, T8, T10, and IZ, and as shown in Fig. 7, this is clearer when using the resting-state with the eyes open. In general, most of the channels are located in the temporal and frontal areas, as well as around the inion, which may be associated with the previous task performed during the data collection. This is an aspect that must be tested on other datasets39,40,41.

One of the purposes of this paper was to show that the resting-state can be used as a paradigm for creating a biometric system on a larger dataset than in our previous work. We have provided a set of experiments in which high-density EEG was available during the training and testing stages, but for a real-time implementation of the biometric system, we will select only a few of the best channels to design a new portable headset tailored for this purpose. With this set of experiments and the methods tested for classification and optimization, we have provided a proof-of-concept for a biometric system based on the resting-state using a few electrodes, and demonstrated it on a pool with a high number of subjects (109) compared with our previous work on a smaller dataset. However, with our current results it is not possible to argue that there exists a unique subset of EEG channels or brain regions that works best when creating the biometric system using the resting-state. This study is a starting point for analyzing different public and private datasets in an attempt to identify a unique subset of channels that may be used to design a new portable, easy-to-use EEG headset that can be tested in real time, adding new subjects to the system and identifying them with a few electrodes.

The progress in subject identification using EEG signals from different paradigms is noticeable, but one of the most relevant unsolved problems is that new approaches are usually proposed and validated using EEG datasets recorded in well-controlled environments30,42. Most of the proposals use high-density EEG signals recorded with medical-grade sensor systems (using gel or saline solution to improve conductivity), which may increase the performance of the methods. However, for practical and portable devices, ease-of-use will be essential, and dry electrodes might offer opportunities here42. In general, analysis and validation in real-life scenarios are necessary. This will also allow the best and fastest methods to be studied in a more realistic way, and the appropriate and necessary number of trials per subject to be considered7.

For certain brain–computer interface (BCI) applications, the problem of recognizing new instances from new sessions has been studied using EEG data from different sessions or adding new instances for calibration. In the case of session-to-session or subject-to-subject transfer, the learning problem has been studied using linear discriminant analysis (LDA) and SVM, based on motor imagery or P300 paradigms34,43,44,45,46. To adapt the EEG feature space and thus reduce session-to-session variability, a data space adaptation method based on the Kullback-Leibler divergence criterion (Also called relative entropy) can be used, aiming to minimize the difference between the distributions of the training session and a subsequent session44. There is evidence that for certain BCIs, background noise recorded immediately before a new session can be used with a regularized spatio-temporal filter to reduce session-to-session variability45.

The dataset used in this paper consists of EEG signals from a single session, which limits the experimental configurations and does not allow us to evaluate whether models created for each subject from one session can recognize or reject subjects using data from another session. Future steps will focus on tackling this problem and on analyzing possible ways of using newly correctly-classified instances to decrease session-to-session variability, as well as using and comparing current progress in transfer learning with machine- and deep-learning methods16,46.

Another point to be analyzed in our future work is the comparison of our current proposal with a new way of extracting and selecting the features, aiming to improve the TRR and TAR. This can be achieved by using a large bag-of-features from the different sub-bands (Possibly from both methods, EMD and DWT) and adding GA genes to represent these features in the chromosomes, so that the best features are selected during the optimization process at the same time as the best channels.

In general, the resting-state has been shown to be a good candidate, but there is not yet sufficient research evidence using larger datasets and different recording conditions. Our future efforts will focus on relevant parameters that can be extracted from the EEG signals of each subject to add information to the complete authentication and verification process, such as re-evaluating the accepted subject using multi-class classification, detecting the age range and sex of the subjects47, etc. Additionally, the use of ever larger datasets is still necessary, using EEG data from different sessions and of different lengths, as well as considering fewer instances for training.

Methods

Dataset

We used a public dataset consisting of EEG signals from 109 subjects, recorded from 64 channels placed according to the 10–10 international system at a sample rate of 160 Hz using the BCI2000 system. The dataset is part of the PhysioNet project48.

Each subject performed two 1-min resting-state runs, one with the eyes open and one with the eyes closed. Then, three 2-min runs were carried out for each of four different tasks: two motor movement tasks and two motor imagery tasks49. The four tasks consisted of opening and closing the left or right fist, imagining opening and closing the left or right fist, opening and closing both fists or both feet, and imagining opening and closing both fists or both feet, according to the position of a target on the screen (Left, right, top, or bottom).

For this study, we used only the two 1-min baseline runs and created instances of 1 s, obtaining 60 instances of 1 s during the resting-state with the eyes open and 60 instances of 1 s during the resting-state with the eyes closed for each subject.

Pre-processing and feature extraction

We used the common average reference (CAR) method to improve the signal-to-noise ratio of the EEG signals by removing the information common to all simultaneously recorded electrodes. It is computed for an EEG channel \(V_{i}^{CAR}\), where i is the channel index, as follows:

$$\begin{aligned} V_{i}^{CAR} = V_{i}^{ER}- \frac{1}{n} \sum _{j=1}^{n}V_{j}^{ER} \end{aligned}$$
(1)

where \(V_{i}^{ER}\) is the potential between the i-th electrode and the reference and n the number of electrodes.
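Equation (1) subtracts, at every time sample, the mean over all electrodes from each channel. A minimal numpy sketch of this operation (the array shape and toy data are illustrative assumptions):

```python
import numpy as np

def common_average_reference(eeg):
    """Apply CAR: subtract the across-channel mean from every channel.

    eeg: array of shape (n_channels, n_samples), one row per electrode.
    """
    return eeg - eeg.mean(axis=0, keepdims=True)

# Toy example: 4 channels, 5 samples of random "EEG"
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
x_car = common_average_reference(x)

# After CAR, the average over channels is numerically zero at every sample
print(np.allclose(x_car.mean(axis=0), 0.0))  # True
```

Note that after CAR the channel average is zero by construction, which is how the common (reference-related) component is removed.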

For comparative purposes, we used two different methods for decomposing the EEG signals and then obtained a set of features for each sub-band.

The first method used was EMD, which decomposes an EEG signal into a finite set of intrinsic mode functions (IMFs)50. Depending on the parameters used in the EMD method (Spline used for the interpolation, the method for solving the end effect problem, etc.), and because the numerical procedure is susceptible to errors, some IMFs that contain limited information may appear in the decomposition51. These signals show maximum Minkowski distances with respect to the original signal52.

Based on the results obtained in our previous research2,7,12, we used the two closest IMFs, based on the Minkowski distance, and characterized each IMF by extracting a set of four features:

1. Instantaneous energy, which provides the energy distribution53.

2. Teager energy, which reflects variations in both the amplitude and frequency of the signal53,54.

3. Higuchi fractal dimension, which approximates the mean length of the curve using segments of k samples and estimates the dimension of a time-varying signal directly in the time domain55.

4. Petrosian fractal dimension, which provides a rapid computation of the fractal dimension of an EEG signal by translating the time series into a binary sequence56.
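Two of these features are short enough to sketch directly. The code below uses the standard formulations of the Teager-Kaiser energy operator and the Petrosian fractal dimension; the exact variants used by the authors may differ, and the test signal (a 10 Hz sinusoid at the dataset's 160 Hz sample rate) is illustrative:

```python
import numpy as np

def teager_energy(x):
    """Mean absolute Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    psi = x[1:-1] ** 2 - x[:-2] * x[2:]
    return float(np.mean(np.abs(psi)))

def petrosian_fd(x):
    """Petrosian fractal dimension from sign changes of the first difference."""
    n = len(x)
    diff = np.diff(x)
    # n_delta: number of sign changes in the derivative (the binary sequence)
    n_delta = int(np.sum(diff[1:] * diff[:-1] < 0))
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

# One second of a 10 Hz sinusoid sampled at 160 Hz (the dataset's rate)
t = np.arange(160) / 160.0
x = np.sin(2 * np.pi * 10 * t)
print(teager_energy(x), petrosian_fd(x))
```

For a pure sinusoid the Teager energy is a positive constant, and the Petrosian dimension is slightly above 1, as expected for a smooth, nearly one-dimensional curve.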

The second method used for decomposing the EEG signals into different frequency bands was DWT. When using DWT, it is necessary to specify the mother wavelet and the number of decomposition levels; here, we used the bi-orthogonal 2.2 wavelet with three levels of decomposition. We characterized each level of decomposition by extracting the same set of four features described previously.

Extracting four features for each of the two selected IMFs yields eight features per channel when using EMD, or 16 features per channel when using DWT. The process is repeated for each selected channel, and the resulting features are concatenated into a single feature vector that represents the EEG signal for each instance.

Classification

The first classifier, used for comparative purposes, was the well-known OC-SVM, an unsupervised algorithm that learns a decision function for outlier detection, classifying new data as similar to or different from the training set57. The radial basis function (RBF) was used as the kernel, and certain important parameters required fitting. As in our previous research12, we found that models based on OC-SVM are very sensitive to the nu and gamma parameters, which define how well the model learns. The nu parameter is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, and must lie in the interval [0, 1]. Gamma defines how much influence a single training example has: the larger the gamma, the closer other examples must be to be affected. Gamma must be greater than 0 and is commonly set to \(1/no\_features\).
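A one-subject verification model of this kind can be sketched with scikit-learn's OneClassSVM. The feature dimension, the nu and gamma values, and the synthetic "genuine"/"impostor" data below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Hypothetical 8-dimensional feature vectors for one enrolled subject:
# 48 training instances (80% of 60) and 12 held-out genuine instances,
# plus 12 impostor instances drawn from a clearly different distribution.
train = rng.normal(loc=0.0, scale=1.0, size=(48, 8))
genuine = rng.normal(loc=0.0, scale=1.0, size=(12, 8))
impostor = rng.normal(loc=5.0, scale=1.0, size=(12, 8))

# gamma = 1/no_features is the common default mentioned in the text
model = OneClassSVM(kernel="rbf", nu=0.1, gamma=1.0 / train.shape[1])
model.fit(train)

print(model.predict(genuine))   # mostly +1: accepted as the enrolled subject
print(model.predict(impostor))  # mostly -1: rejected as outliers
```

Predictions of +1 and -1 correspond to acceptance (contributing to TAR) and rejection (contributing to TRR), respectively.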

This work focuses mainly on the use of LOF, a density-based unsupervised outlier detection algorithm that quantifies the degree to which a data point is an outlier by calculating its local deviation with respect to its surrounding neighborhood. The score assigned to each data point is called the local outlier factor58. It is based on a concept of local density given by the distances to the k-nearest neighbors. By comparing the local density of a data point with the local densities of its k neighbors, it is possible to identify regions of similar density as well as outliers, which have a lower density: the lower the density of a data point, the more likely it is to be identified as an outlier. Brute force, ball tree, or k-d tree algorithms can be used for computing the nearest neighbors.
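The same verification setting can be sketched with scikit-learn's LocalOutlierFactor; setting novelty=True enables prediction on unseen data, and the algorithm and n_neighbors arguments correspond to the two hyper-parameters that our GA chromosome encodes. The synthetic data and parameter values are again illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
train = rng.normal(loc=0.0, scale=1.0, size=(48, 8))    # enrolled subject
genuine = rng.normal(loc=0.0, scale=1.0, size=(12, 8))  # same subject, test
impostor = rng.normal(loc=5.0, scale=1.0, size=(12, 8)) # different subject

# novelty=True fits on inliers only and allows predict() on new samples
lof = LocalOutlierFactor(n_neighbors=5, algorithm="ball_tree", novelty=True)
lof.fit(train)

print(lof.predict(genuine))   # mostly +1: local density similar to training data
print(lof.predict(impostor))  # mostly -1: far lower density -> outliers
```

Points whose local density is much lower than that of their k nearest training neighbors receive a high local outlier factor and are rejected.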

EEG channel selection

The process of EEG channel selection is critical for the development of a portable low-cost device, and also for analyzing only the EEG channels that carry relevant information for discriminating between subjects.

Genetic algorithms (GAs) mimic Darwinian evolution and use biologically inspired operators. The population comprises a set of candidate solutions, each with chromosomes that can be mutated and altered. GAs are normally used to solve complex optimization and search problems59. Our approach is based on the non-dominated sorting genetic algorithm (NSGA)37, which uses a non-dominated sorting ranking selection method to emphasize good candidates and a niche method to maintain stable sub-populations of good points (Pareto front). NSGA-II solved certain problems of the first version related to its computational complexity, non-elitist approach, and the need to specify a sharing parameter to ensure diversity in the population. NSGA-II also reduced the computational cost from \(O(MN^{3})\) to \(O(MN^{2})\), where M is the number of objectives and N the population size. Additionally, an elitist approach was introduced by comparing the current population with the best non-dominated solutions found previously60.

We used NSGA-III, as it has been shown to efficiently solve 2- to 15-objective optimization problems38 and also because of our previous results12. NSGA-III follows the NSGA-II framework but uses a set of predefined reference points that emphasizes population members that are non-dominated, yet close to the supplied set38,61. The predefined set of reference points are used to ensure diversity in the obtained solutions.

We used a systematic approach that places points on a normalized hyper-plane that is equally inclined to all objective axes and has an intersection with each axis. For example, in a three-objective optimization problem, the reference points are created on a triangle with apexes at (1, 0, 0), (0, 1, 0), and (0, 0, 1)61,62.
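This systematic placement (due to Das and Dennis) enumerates all points on the unit simplex whose coordinates are multiples of 1/p, for a chosen number of divisions p. A small sketch for the three-objective case (the value p = 4 is an arbitrary illustration, not the setting used in the paper):

```python
from itertools import product

def das_dennis(n_obj, p):
    """Reference points on the unit simplex: all (i1/p, ..., in/p) with sum ij = p."""
    pts = []
    for combo in product(range(p + 1), repeat=n_obj):
        if sum(combo) == p:
            pts.append(tuple(c / p for c in combo))
    return pts

# Three objectives, p = 4 divisions -> C(4 + 3 - 1, 3 - 1) = 15 reference points
points = das_dennis(3, 4)
print(len(points))                 # 15
print((1.0, 0.0, 0.0) in points)   # True: an apex of the triangle
```

Every point sums to 1, so all of them lie on the triangle with apexes (1, 0, 0), (0, 1, 0), and (0, 0, 1) described above.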

Algorithm 1 presents the pseudo-code of the NSGA-III main procedure for a particular generation. The parent population \(P_t\) of size N is randomly initialized, and the tournament selection, crossover, and mutation operators are applied to create an offspring population \(Q_t\). Both populations are then combined and sorted (According to their domination level), and the best N members are selected from the combined population to form the parent population for the next generation. After non-dominated sorting, all fully accepted fronts and the last front \(F_l\), which could not be completely accepted, are saved in a set \(S_t\). Members in \(S_{t}/F_{l}\) are selected directly for the next generation, and the remaining members are selected from \(F_l\). NSGA-III uses the supplied reference points \(Z_r\) to select a well-distributed set of points. The orthogonal distance between each member in \(S_t\) and each of the reference lines (Joining the ideal point and a reference point) is computed, and each member is associated with the reference point having the smallest orthogonal distance. Next, the niche count \(\rho _j\) of the \(j\)-th reference point, defined as the number of members in \(S_{t}/F_{l}\) associated with it, is computed. The reference point set \(J_{min} = \{j: argmin_j (\rho _j)\}\) having minimum \(\rho _j\) is identified; in the case of multiple such reference points, one (\(j^{*}\in J_{min}\)) is chosen at random.
If \(\rho _{j^*} = 0\) (Meaning that no \(P_{t+1}\) member is associated with the reference point \(j^*\)), two scenarios can occur. First, one or more members in front \(F_l\) are already associated with the reference point \(j^*\); in this case, the one having the shortest perpendicular distance from the reference line is added to \(P_{t+1}\), and the count \(\rho _{j^*}\) is incremented by one. Second, the front \(F_{l}\) has no member associated with the reference point \(j^*\); in this case, the reference point is excluded from further consideration for the current generation. If \(\rho _{j^*} \ge 1\) (Meaning that at least one member is already associated with the reference point), a randomly chosen member from front \(F_l\) associated with the reference point \(j^*\), if one exists, is added to \(P_{t+1}\), and the count \(\rho _{j^*}\) is incremented by one. After the \(\rho _j\) counts are updated, the procedure is repeated K times in total to increase the size of population \(P_{t+1}\) to N37,38,61,63.


Definition of the problem to optimize

After the pre-processing and feature extraction stages, we obtain a set of features for each EEG channel. These features can be used to create a model for each subject that recognizes that subject and rejects all others. Our approach is to create a model for each subject with 80% of the instances and use the remaining 20% for testing. This requires fitting certain important parameters and selecting the most relevant EEG channels. Thus, the problem is defined as an optimization problem with three unconstrained objectives: minimize the number of necessary EEG channels and maximize the TAR and TRR. The size of the population in each iteration was set to 20, determined experimentally. The termination criterion for the optimization process is defined by the objective-space tolerance, which was set to 0.0001 and is evaluated every 10th generation. If this tolerance is not reached, the process stops after a maximum of 300 generations.

We used 64 binary genes in a chromosome to represent the 64 EEG channels, one gene with integer values to select the algorithm (1: Ball tree, 2: k-d tree, 3: Brute force), and another gene with integer values to select the number of neighbors (From 1 to 10, proposed experimentally), thus obtaining a chromosome of 66 genes. When using OC-SVM in the optimization process, we used the same 64 genes to represent the EEG channels and two genes with decimal values to select the nu and gamma parameters, similarly to our previous research12. The chromosome created to represent the candidate channels in the search space and the flowchart of the complete optimization process using LOF models are illustrated in Fig. 12.
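Decoding such a 66-gene chromosome into a channel mask and LOF hyper-parameters can be sketched as follows; the random chromosome and the mapping to scikit-learn algorithm names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 66-gene chromosome: 64 binary channel genes, one algorithm
# gene (1-3) and one n_neighbors gene (1-10), as described in the text.
chromosome = np.concatenate([
    rng.integers(0, 2, size=64),   # channel mask: 1 = channel selected
    [rng.integers(1, 4)],          # 1: Ball tree, 2: k-d tree, 3: Brute force
    [rng.integers(1, 11)],         # number of neighbors, 1 to 10
])

channel_mask = chromosome[:64].astype(bool)
algorithm = {1: "ball_tree", 2: "kd_tree", 3: "brute"}[int(chromosome[64])]
n_neighbors = int(chromosome[65])

selected_channels = np.flatnonzero(channel_mask)  # indices of channels to keep
print(len(selected_channels), algorithm, n_neighbors)
```

The first objective of the optimization is simply `len(selected_channels)`, while the algorithm and neighbor count parameterize the LOF model evaluated for that chromosome.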

As explained in the feature extraction section, we extracted eight features per channel when using EMD, and 16 when using DWT. The features are organized and stored for iterative use, depending on the channels marked as “1” in the chromosome. For instance, using EMD-based features, if the chromosome contains only one active gene, the classification process is performed with only the eight features from the channel indicated in the chromosome. The entire process is then driven by NSGA-III, as shown in Fig. 12, starting with the creation of 20 candidate solutions for each generation.

The output for each chromosome in each generation is the number of channels used and the TAR and TRR obtained with the subset of channels encoded in the chromosome. These results are returned to NSGA-III to evaluate each chromosome in the current population, and a new generation of chromosomes is created based on the best candidates found. This process is repeated until the termination criterion or the maximum number of generations is reached.
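The three objective values returned for a chromosome can be sketched as below. Casting everything as minimization by negating TAR and TRR is a common NSGA convention and an assumption here; the counts in the usage line are made-up illustrations:

```python
def objectives(n_selected, accepted_genuine, total_genuine,
               rejected_impostor, total_impostor):
    """Objective vector for one chromosome, cast as minimization.

    TAR: fraction of genuine test instances the subject models accept.
    TRR: fraction of impostor instances the models reject.
    NSGA-III minimizes, so both rates are negated.
    """
    tar = accepted_genuine / total_genuine
    trr = rejected_impostor / total_impostor
    return (n_selected, -tar, -trr)

# Hypothetical evaluation: 3 channels selected, 11 of 12 genuine instances
# accepted, 58 of 60 impostor instances rejected.
print(objectives(3, 11, 12, 58, 60))
```

A chromosome dominates another when it is no worse in all three components and strictly better in at least one, which is exactly the ordering the non-dominated sorting step uses.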

Figure 12

Chromosome representation and flowchart of the optimization process for EEG channel selection using NSGA-III.

We performed the experiments for this study on the NTNU IDUN computing cluster64. The cluster has more than 70 nodes and 90 GPGPUs. Each node contains two Intel Xeon cores and at least 128 GB of main memory and is connected to an Infiniband network. Half of the nodes are equipped with two or more Nvidia Tesla P100 or V100 GPGPUs. Idun’s storage is provided by two storage arrays and a Lustre parallel distributed file system.