Eye-blink artifact removal from single channel EEG with k-means and SSA

In recent years, the usage of portable electroencephalogram (EEG) devices are becoming popular for both clinical and non-clinical applications. In order to provide more comfort to the subject and measure the EEG signals for several hours, these devices usually consists of fewer EEG channels or even with a single EEG channel. However, electrooculogram (EOG) signal, also known as eye-blink artifact, produced by involuntary movement of eyelids, always contaminate the EEG signals. Very few techniques are available to remove these artifacts from single channel EEG and most of these techniques modify the uncontaminated regions of the EEG signal. In this paper, we developed a new framework that combines unsupervised machine learning algorithm (k-means) and singular spectrum analysis (SSA) technique to remove eye blink artifact without modifying actual EEG signal. The novelty of the work lies in the extraction of the eye-blink artifact based on the time-domain features of the EEG signal and the unsupervised machine learning algorithm. The extracted eye-blink artifact is further processed by the SSA method and finally subtracted from the contaminated single channel EEG signal to obtain the corrected EEG signal. Results with synthetic and real EEG signals demonstrate the superiority of the proposed method over the existing methods. Moreover, the frequency based measures [the power spectrum ratio (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma $$\end{document}Γ) and the mean absolute error (MAE)] also show that the proposed method does not modify the uncontaminated regions of the EEG signal while removing the eye-blink artifact.


Results
Constructing synthetic EEG and eye-blink signals. To validate the performance of proposed method, we construct the single channel ground truth EEG signal and the eye-blink artifact as follows: The ground truth or true EEG signal is not available usually. Therefore, for simulation studies, the non-artifact EEG region from a lengthy EEG signal is considered as a ground truth EEG signal. The non-artifact EEG region in a lengthy EEG record, where no eye-blink artifact is present at least for 10 s duration, is manually identified and segmented. With the same procedure, 20 such artifact-free EEG epochs were constructed from six subject's lengthy EEG records. Figure 1a shows 10 s ground truth EEG signal. To construct the ground truth eye-blink artifact and maintain the real morphology of eye-blink artifact we followed the procedure similar to 43 ; first, manually we identified eye-blink artifact region in the real EEG signal and segmented. Next, zeros are padded to the eyeblink segment such that the signal length is equal to 10 s. Thereafter, the MATLAB smooth command is used to remove the EEG remnants resided on the eye-blinks and also to smooth the discontinuity in between the edges of eye-blink segment and the zero line. Three similar eye-blink artifacts from three subjects EEG are constructed. Synthetically constructed eye-blink artifact is shown in Fig. 1b. With 20 EEG and three eye-blink epochs, we have constructed sixty (20 × 3 = 60) synthetically contaminated EEG signals based on the mixing model given in (7) and is shown in Fig. 1c www.nature.com/scientificreports/ Results with synthetic EEG data. The performance of any artifact removal method depends on its parameters. The proposed method has three parameters: window length M, threshold T h and the number of clusters L. In general, the eye-blink time period varies between 100−400 ms . In order to capture the on-set and the off-set events of the eye-blink artifact, we set the window length M to 500 ms = 0.5 × 256 = 128 samples. We analyze the performance of proposed method for four arbitrary thresholds ( T h ) and for various number of clusters L = 2, 4, 6, 8 and 10. This analysis helps to identify the optimal choice of the parameters that results in good performance. Figure 1d,e shows the performance of proposed method for various selections of threshold ( T h ) and clusters (L). After extensive simulation analysis, we identified low RRMSE values for the threshold selection in the range of 1.2 to 1.4 and clusters in the range of 2 to 4. However, we also observed sudden increase in the RRMSE for T h > 1. 4 . It is noticed from the simulation analysis that the threshold ( T h ) acts similar to a cut-off frequency as in classic filters and whereas the number of clusters L will act as the number of decomposition levels as in a wavelet decomposition. Therefore, for better performance, we set the number clusters as L to 4 and the threshold T h to 1.4. Figure 2 shows the key steps to estimate the eye-blink artifact â from the contaminated EEG signal x . In order to estimate the eye-blink artifact â , first, the contaminated EEG signal x is embedded into a matrix X using (1), which results in K signal vectors of length M samples. Next, the four time domain features the energy, the hjorth mobility 47 , the kurtosis and the difference between the maximum and minimum of each column of X are computed. After performing the embedding and the feature extraction steps, the k-means clustering algorithm is applied. As the eye-blink artifact appears as a high amplitude and slow varying component in the contaminated EEG signal, the signal vectors corresponding to the eye-blink and EEG are well separated in the feature space as evident from Fig. 2. The k-means clustering algorithm assigns labels to each feature point in the feature space. This labeled information allows us to identify the cluster to which particular feature point (indirectly the column vector of X ) falls. Using (2), the matrices X 1 & X 2 are derived (here the number of clusters L is set to 2).
Next, uni-variate signals s 1 & s 2 are constructed using (6). The resulting decomposed uni-variate signals s 1 & s 2 are shown in Fig. 2. The fractal dimension of signals s 1 & s 2 are computed to identify the eye-blink artifact. After identifying the eye-blink artifact with pre-set threshold T h , binary eye-blink template is constructed using step 6 of the proposed method (see "Methods" section) and multiplied with the contaminated EEG signal x . With this, the eye-blink component ā mixed in the contaminated EEG signal x is extracted. However, the direct subtraction of this component from x results in zero line at the eye-blink region in the corrected EEG signal (which also means that the EEG components superimposed on the eye-blink artifact are also removed together with the artifact). Therefore, to smoothen the onset and offset regions of eye-blink artifact ā , and to remove the EEG components superimposed on the eye-blink artifact, we employed SSA technique in this paper. We can see a smooth eye-blink artifact in Fig. 2 (zoomed region) that matches with the ground truth eye-blink artifact as indicated with black dotted lines. Finally, the estimated eye-blink artifact â will be subtracted from the contaminated EEG signal to obtain the corrected EEG signal ŝ. Figure 3 depicts the estimated eye-blink artifact ( â ) and the corrected EEG signal ( ŝ ) obtained by all the methods. The RRMSE and CC values presented in Fig. 3b www.nature.com/scientificreports/ eye-blink ( a ) and the EEG signal ( s ) as shown in Fig. 3a. The proposed method has low RRMSE and high CC values as compared to the existing methods. However, we noticed abrupt changes at onset and offset regions of the eye-blink period and non-eyeblink periods in the estimated eye-blink artifacts obtained by the method in 45 and FBSE-EWT methods, as evident from Fig. 3d,e in the first column. Moreover, the partial estimation of eye-blink artifact obtained by the methods in 45 and the FBSE-EWT method can also be noticed in the same. As a result, eye-blink component can still be visualized in the corrected EEG signals obtained by the method in 45 and the FBSE-EWT methods. In contrast, accurate estimation of the eye-blink artifact can be obtained with the proposed method and no artifact component can seen in the corrected EEG signal as shown in Fig. 3f. Even though the EEMD-ICA and SSA-ICA methods estimate the eye-blink component (eye-blink region between 6−8 s ) accurately, however they alter the non-artifact regions, which is not desirable. As the SNR of the contaminated EEG signal varies with the artifact mixing constant p, we have evaluated the performance of the proposed and the existing methods for different p values. The variations of RRMSE and CC values of all methods with respect to p are shown in Fig. 4a,b. It is obvious from these results that the proposed method shows lower RRMSE and higher CC as compared to the existing methods. However, the mean RRMSE and CC values of the performance of proposed method are better compared with the existing methods, as evident from Table 1.
The performance of proposed method in detecting the eye-blink artifact is also studied by varying the feature selection. Figure 5 shows the detection of the eye-blink artifact with respect to the artifact mixing constant p. The detection of eye-blink artifact using the proposed method is evaluated in terms of False positive rate (FPR) and it is expected to be as small as possible. When the amplitude of eye-blink artifact is large i.e., p ≥ 1 , we do not see much improvement in FPR values with four features as compared to However, it is observed from Fig. 5a,b that the use of four time domain features helps in the improvement of the eye-blink artifact detection rate as compared with single and two time domain features for p = 0.5 . We do not seen much difference in FPR values obtained with the number of features 3 and 4 (Fig. 5c). Finally, from this study we observed that the energy features f 1 and f 4 contributes more towards detection of the eye-blink artifact as compared to other feature components. In Fig. 5, the FPR comparison plots for all features vs f 2 , and f 3 have not showed, as the contribution of these components in detecting the eye-blink artifact is minimal.
We have also evaluated the performance with two different clustering algorithms-spectral clustering and agglomerative clustering. Spectral clustering algorithm does not show much improvement whereas an improvement can be seen with agglomerative clustering algorithm as compared to k-mean clustering algorithm. However, the agglomerative clustering algorithm's computational complexity is high and it increases with the number of feature samples as compared to k-means clustering method.
Results with real EEG data. For analysis with real EEG data, sixty 10s EEG epochs were segmented from the lengthy EEG data of twelve subjects such that at least one eye-blink is present in the EEG signal. Figure 6 shows the estimated eye-blink artifact ( â ) and the corrected EEG signal ( ŝ ) by all methods. When EEMD-ICA and SSA-ICA are applied to the real EEG data, the low frequency EEG information is also extracted together with the eye-blink artifact (non-artifact regions in Fig. 6a,b first column). Few spikes and the partial extraction of eye- The key steps for estimating the eye-blink artifact â from the contaminated EEG signal x . Note that fds i represents the fractal dimension of i th signal s i . Here, we have considered only three features for illustration. www.nature.com/scientificreports/ blink component at time period 8 s can be seen in the estimated eye-blink artifact by all the method 45 ( Fig. 6c first column). Similarly, we can see the clipping of the eye-blink component in estimated eye-blink artifact by the FBSE-EWT method ( Fig. 6d first column). In contrast, spikes and partial removal of eye-blink component are absent with the proposed method ( Fig. 6e first column). More importantly, the proposed method removes the eye-blink artifact without altering the EEG information from the non-artifact region of the EEG signal. In general, the ground truth EEG signal is not available to evaluate the performance of artifact removal methods on real EEG data. Therefore, we consider two metrics, the power spectral ratio, Ŵ(f ) and the MAE, to analyze the performance on real EEG signals. Figure 7, shows the average power spectral ratio curves for 60 EEG signals. The shaded region in Fig. 7a-e shows the standard deviation error from mean. It is clear from the power spectral ratio plots shown in Figure 7a-d that the existing methods alters the β band components of the EEG signal (i.e. 12−30 Hz ). Compared with the other methods, the method in 45 shows comparable performance with the proposed method. Therefore, for comparative analysis, the mean power spectral ratio curves for proposed [46] Method in [45]  www.nature.com/scientificreports/ Method in [45] Proposed FBSE-EWT SSA-ICA EEMD-ICA [46]    www.nature.com/scientificreports/ and method 45 are shown in Fig. 7f. We have also computed the MAE for different bands to see the affect the artifact removal method in each EEG band. However, average MAE values obtained by the proposed method for the β band are more good as compared to existing methods, as evident from the power spectral ratio plots shown in Fig. 7 and the Table 2.

Discussion
The performance of the proposed method in comparison with other existing methods is analyzed in terms of RRMSE and the CC values computed with respect to the ground truth eye-blink artifact (are shown in Fig. 4a,b). In all conditions i.e. p = 0.5, 0.75, 1, 1.25 and 1.5, the mean RRMSE and CC values of the proposed method are different from the RRMSE and CC valuable of EEMD-ICA, SSA-ICA and the method 45 . Comparing the estimated eye-blink artifacts from synthetic and real EEG signals, the existing method 45 fails to extract the initial and final change-over points (inflections) of the eye-blink artifact. The probable reason for this is that, the eye-blink artifact is constructed by the straight lines and the intersection of these lines are fitted with zero slope to EEG trace. Therefore, partial separation in change-over points of eye-blink can be seen in the corrected EEG signal (please refer to second column in Figs. 3d and 6c). Although there is no significant difference in the CC values obtained for the proposed and the FBSE-EWT methods for the conditions p > 1 , it can be clearly seen that the peaks of eye-blink component were clipped-off in the estimated eye-blink artifact obtained by the FBSE-EWT method (see the first column in Figs. 3e and 6d). As a result, we can see abrupt changes in the corrected EEG  www.nature.com/scientificreports/ signal obtained by the FBSE-EWT method (see the second column in Figs. 3e and 6d). These changes will lead to modification in the spectrum of the corrected EEG signal. In order to show the affect of artifact removal methods in the spectral domain, we computed the power spectrum ratio and the MAE. From the power spectral ratio plots it can be seen that all the methods removed the eye-blink artifact efficiently. However, from the power spectrum ratio plots we can notice that most of the existing methods alters the β band ( 12−30 Hz ) of the corrected EEG signal (Fig. 7a-d). In contrast, the proposed method does not alter the original EEG components and as a result the power spectrum ratio value is 1 in this frequency band (Fig. 7e). Hence, the MAE values obtained by proposed method in the β band are more better as compared to the MAE values obtained by other methods.
From this study we observe that most of the existing methods alters the EEG components from the nonartifact regions of the EEG signal and as a result the spectrum of the corrected EEG signal is altered. In contrast to the existing methods, the proposed method exploited the difference in the time-domain features of the EEG signal (evident from Fig. 2) to remove the eye-blink artifact without altering the EEG components. Moreover, we also studied the eye-blink artifact detection rate in terms of FPR and found that the energy based features contributed more towards detection of the eye-blink artifact as compared to other feature components. As evident from the simulation studies, the performance of the proposed method is sensitive to the threshold T h selection. A careful selection is required and this is also the limitation of the proposed method. A statistical based approach to address the limitation of proposed method with respect to the threshold selection will be the topic of our future research. In this paper, we mainly focused on removing the eye-blink artifact using a simple unsupervised clustering algorithm. Hence, in this study, we have used k-means clustering algorithm over other methods as it is simple and computationally efficient unsupervised clustering algorithm. However, based on the requirement of the application, other clustering algorithms can also be employed in the proposed method.

Methods
Proposed method. The overview of the single channel EEG artifact removal process employed in this paper is shown in Fig. 8. The novelty of the work lies in decomposing the given single channel EEG into L signals using the EEG features and the k-mean clustering algorithm. After decomposing the given single channel EEG signal into components, the fractal dimension (FD) of each decomposed component is computed. As the eyeblink artifact is a slow varying component, the FD is expected to be a small for the signals corresponding to the eye-blink artifact and is high for the signals corresponding to EEG. The eye-blink artifact component will be constructed by adding the decomposed components whose FD values are less than the pre-set threshold. Finally, the resulted eye-blink artifact component is further processed by SSA and subtracted from the contaminated EEG signal to obtain the corrected EEG signal. The proposed method relies on the following key steps to remove eye-blink artifact from single channel EEG signal. www.nature.com/scientificreports/ Step-1, 2 and 3. Consider x = [x(1), x(2), . . . , x(N)] = s + pa is an N sampled contaminated single channel EEG signal. The constant p represents the contribution of eye-blink artifact in the EEG signal, which we call it as artifact mixing constant. The signal vectors s and a represents the ground truth EEG signal and the eye-blink artifact, respectively. In this step, the given single channel EEG signal x is mapped into multivariate data matrix as in (1).
where M is the window length and K = N − M + 1 . In x j is the jth, j = 1, 2, . . . , K column vector of X . Note that in (1), we assumed the artifact mixing constant p = 1 for simple understanding of the proposed method.
In step-2, four time domain features the energy ( f 1 ), the hjorth mobility 47 ( f 2 ), the kurtosis ( f 3 ), and the difference in the maximum and minimum values of column of ] is j th column vector of feature matrix F . Usually, the eye-blink artifact appears in the EEG signal as a high amplitude and slow varying component. The selected four features are good in extracting such inherent property of the eye-blink artifact.
In step-3, an unsupervised machine learning algorithm, i.e. k-means 48 clustering algorithm, is applied on the extracted feature data matrix F with L number of clusters. The k-means clustering algorithm will provide the labels to each feature vector f j . The labeled information (clustering information) convey that to which cluster the feature vector f j has fallen into.
Step-4. Based on the information obtained from the k-means clustering algorithm, first, multivariate data matrix X i , i = 1, 2, . . . , L is constructed as follows. The j th column vector x j of X is placed (copied) in the j th column of matrix X i when the j th feature vector f j of matrix X belongs to i th cluster C i ; otherwise a vector with zeros is placed (copied) in that column of matrix X i . This operation is shown with a dotted line from the output the embedding block in Fig. 8 and can be represented mathematically as where x j i (j = 1, 2, . . . , K) , is the j th column vector of X i . Next, each X i matrix corresponding to each cluster is mapped into uni-variate signal using diagonal averaging step of SSA (using (6)). This results in L decomposed components, say s 1 ,s 2 , ..,s L from the contaminated EEG signal x . However, a criteria has to be selected to identify the signals corresponding to the eye-blink artifact from the L number of signals. A signal complexity measure, also called fractal dimension (FD) 49 , is successfully applied 50 to identify the eye-blink artifact component from the estimated sources.
Step-5, 6 and 7. Hence, in step-5, we have computed FD for all L the decomposed components. Since the eyeblink artifact is characterized as high amplitude and slow varying component, the FD is expected to be a lower value for eye-blink artifact when compared to EEG signals. Finally, the decomposed components whose FD is less than or equal to the pre-set threshold ( T h ) are added together. This results an eye-blink artifact component ā r . It is clear from (2) that some of the columns in X i are expected to be zeros. Applying diagonal averaging step of SSA on X i , the amplitude levels in the reconstructed eye-blink signal are effected. In other words, there will be amplitude reduction in the eye-blink regions due to this diagonal averaging operation, as few of the columns of the matrix X i are zeros. Therefore, the direct subtraction of ā r component from the contaminated EEG signal x will result the partial separation of eye-blink artifact.
To overcome such problem, in Step-6, each positive and negative valued samples in the obtained eye-blink artifact ā r are replaced to one. This results in the formation of a binary artifact template ā b , whose sample values are ones in the artifact region and zeros in non-artifact region. In Step-7, the obtained binary eye-blink artifact template is multiplied with the input signal x . Thus it results an eye-blink artifact component ā that is present in the contaminated EEG signal x , with no amplitude changes in the artifact region.
Step-8 and 9. However, the obtained artifact component ā has some EEG remnants on the eye-blinks (Fig. 8). The direct subtraction of this component from the contaminated EEG signal results in the loss of valuable EEG information and a zero line can be seen in the corrected EEG signal. Therefore, the EEG remnants should be filtered (indirectly we are retaining the EEG components) before it is subtracted from the contaminated EEG signal x. In Step-8 of proposed method, we employ SSA (discussed in the following subsection) to remove the EEG remnants that reside on the eye-blink artifact component ā . SSA can remove such remnants from eye-blink regions, as it relies on the co-variance of the data to separate the components. Therefore, the eye-blink artifact component ā is given as input signal to SSA and the EEG remnants are filtered out results an estimated eye-blink component â . Finally, in Step-9, the estimated eye-blink artifact signal â will be subtracted from the contaminated EEG signal to obtain the corrected EEG signal ŝ.
In the proposed method, the threshold T h plays an important role in identifying the eye-blink artifact component from the decomposed components after clustering. However, when there is no eye-blink artifact present in the EEG signal, it is obvious that all decomposed component belongs to EEG signal and the fractal dimension (FD) of these components is above the pre-set threshold T h . When the FD of each component is above x(2) ...... ..... x(K) x (2) x (  www.nature.com/scientificreports/ the threshold T h , then step 6, 7, and 8 of the proposed method are not performed and the estimated eye-blink artifact â is set to zero. Finally, the corrected EEG ŝ = x , indicates that the given EEG signal is not contaminated by the eye-blink artifact.

Singular spectrum analysis (SSA).
In this subsection, we briefly discuss about the SSA technique, as the proposed method development relies on the key steps of SSA technique. It is a data-driven subspace based technique, widely used to extract the low frequency trends and oscillating components from the noisy data 39,51 .
Embedding. Consider ā = [ā(1),ā(2), . . . ,ā(N)] = a + b , is a N sampled signal with noise. The vectors a and b represents the signal of interest and the noise component, respectively. In embedding step of SSA, the given uni-variate time-series signal ā is mapped into multi-variate data as shown below: where M is the window length and K = N − M + 1.
Decomposition. In the next step, the singular value decomposition (SVD) of Ā will be performed to decompose the trajectory matrix Ā into Ā 1 ,Ā 1 , . . . ,Ā M . The M trajectory matrices can be obtained as follow: The SVD of a rectangular matrix of size M × K can be factored as N A = UDV T , where U and V represents orthogonal matrices, whose columns are eigenvectors of co-variance matrix C =ĀĀ T and C =Ā TĀ , respectively; and the D is a rectangular diagonal matrix, whose elements are singular values ( σ ). Let the 1 , 2 , . . . , M and u 1 , u 2 , . . . , u M represent the eigenvalues and the eigenvectors of the co-variance matrix C =ĀĀ T . Assume that the eigenvalues are sorted in the descending order based on their strengths (amplitudes), i.e. 1 ≥ 2 ≥, . . . , ≥ M ≥ 0 . Then, the i th trajectory matrix Ā i can be represented as (4), then the i th trajectory matrix Ā i is given by The term u i u T i in (5) form subspace for the i th component in the given signal ā.
Grouping. The grouping step involves identifying the subspace for the desired signal (smooth eye-blink artifact in this present study). In other words, it identifies the appropriate eigenvectors (basis functions) to construct the desired signal. In the proposed method, we employ eigenvalue (spectrum) based grouping criteria to identify the desired eigenvector. Consider that the desired signal subspace can be constructed with d number of eigenvectors. To identify the d most significant eigenvectors (basis functions of desired signal), eigenvalue ratio is r (i) is computed by dividing each eigenvalue by the sum of total eigenvalues and is defined as r (i) = i / M l=1 l , i = 1, 2, . . . , M . The indices of eigenvalues whose ratio greater than preset threshold T SSA (set to 0.01) are identified. The number of eigenvalues whose ratio is greater than T SSA is denoted as d. Finally, using (5) the d number of Ā k , k = 1, 2, . . . , d were computed and summed together. This results interested signal trajectory matrix in the form of Â = d k=1Ā k .
Diagonal averaging. However, the obtained trajectory matrix Â do not hold the hankel structure to reconstruct uni-variate signal from it. Therefore, in the diagonal averaging (it's a reverse process to the embedding step) step, the anti-diagonal elements of Â are replaced with their mean value as defined in (6). Lets consider that â(i, j) represents the i th row and j th column element of matrix Â , then the desired signal can be estimated as follows It is clear from (6) that the n th sample of estimated eye-blink artifact â(n) is an average of anti diagonal elements â(i, n − i + 1)| i=1,2...n , (for n = 2 , â(2) = {â(1, 2) +â(2, 1)}/2 ). Finally, the corrected EEG signal ŝ is obtained by subtracting the estimated eye-blink artifact â from the contaminated EEG signal x.
Performance measures. In this section, we have defined some measures to validate the performance of the proposed method over the existing methods. However, in this paper, we have considered the following EEG mixing model for analysis. Let the vectors s and a represent the ground truth EEG and the eye-blink artifact, respectively. Then the contaminated EEG signal x is defined as  www.nature.com/scientificreports/ where p is constant that represents the contribution of eye-blink artifact in the contaminated EEG signal (known as artifact mixing constant). When p > 1 (eye-blink artifact is more predominant) the signal to noise ratio (SNR) of contaminated EEG is low and when it is < 1 SNR is high. The performance of the proposed artifact removal method is evaluated on both synthetic and real EEG signals. In order to compare the performance of proposed artifact removal method with the existing methods on synthetic EEG signals, we have defined the following two performance measures: relative root mean square error (RRMSE) and correlation coefficient (CC). When the artifact removal method accurately estimates the eye blink artifact, the RRMSE and CC values between the ground truth and the estimated eye-blink artifacts are expected to be 0 and 1, respectively.
For simulation results with real EEG data, as neither ground truth EEG and nor the eye-blink artifact are usually available, we have defined two measures, power spectrum ratio Ŵ(f ) and mean absolute error (MAE) to evaluate the performance of all the methods. When the eye-blink artifact is perfectly removed from the contaminated EEG signal, Ŵ(f ) and the MAE are expected to be 1 and 0, respectively in the β band ( 12−30 Hz ) of EEG signal. The power spectral ratio and MAE measures can objectively quantify if a particular method filters the artifact without altering the actual EEG signal.
Relative root mean square error (RRMSE). Considering the ground truth or true eye-blink artifact as a and the estimated eye-blink artifact obtained by the artifact removal method as â , the RRMSE can be defined as where, N is the number of samples in the signal. When an artifact removal method accurately estimates the eye-blink artifact from the EEG signal, the difference between a and â (numerator term) will be small and hence RRMSE value is expected to be small for a good artifact removal method.
Correlation coefficient (CC). CC is a statistical based performance measure, represents the correlation between two signals. The CC between the ground truth eye-blink a and the estimated eye-blink â components is defined as The correlation coefficient between the ground truth and the estimated eye-blink artifact is expected to be 1 when an artifact removal method perfectly estimated the eye-blink artifact.
Power spectrum ratio ( Ŵ). It is a plot describing the ratio of the power spectrum of the corrected EEG signal to the power spectrum of the contaminated EEG signal and is used as metric to evaluate the performance of the proposed technique on real EEG signals 54 . The power spectrum ratio of the corrected and the contaminated EEG signals at each frequency is defined as where, pˆs and p x are represents the power spectrums of the corrected and the contaminated EEG signals, respectively. In general, the energy of eye-blink artifact lies in the band between 0 and 12 Hz. However, when the eyeblink artifact is removed, the power spectrum ratio of corrected EEG signal to the contaminated EEG signal is expected to be less than 1 in the frequency band 0-12 Hz. The low value of Ŵ(f ) in this band doesn't mean the EEG components in 0-12 Hz band are removed, rather it is due to the elimination of high energy component (eye-blink artifact) in the corrected EEG signal. Whereas it is equal to 1 in the β band (12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30), this is due to the fact that energy of eye-blink artifact in this band is very small.

Mean absolute error (MAE).
Consider p x (f ) and pˆs(f ) are the power spectrums of the contaminated and the corrected EEG signals, respectively. Then the MAE between the spectrums of both the contaminated and the corrected EEG signals is defined as where j and l represent the indices of the start and end frequencies of a specific band. The MAE is computed for the following three bands 1−8 , 8−12 and 12−30 Hz to understand affect of artifact removal methods on corrected EEG signals. When there is no loss of EEG components by an artifact removal technique, the MAE value in a band is expected close to zero (in β band), which means represents better performance of the artifact removal technique. www.nature.com/scientificreports/ EEG data and pre-processing. To assess the performance of proposed and the existing methods, in this work, we have considered event related potential (ERP) BCI data (ERP-BCI) for both synthetic and real EEG data analysis. The EEG data is obtained from a publicly available database 55,56 and there is no direct involvement of humans in this research study. The ERP-BCI EEG data 55,56 is collected from 12 subjects and each subject is asked to spell 20 characters (which results 20 trails) using traditional matrix speller. The EEG signals (64 channels) are recorded using BioSemi Active Two EEG system with sampling frequency 2048 Hz. For this study, we have considered pre-frontal EEG channel Fp 1 . The EEG signals were down sampled to 256 Hz and a band-pass filter with cut-off frequencies 1−30 Hz is employed to remove dc and high frequency components from the data. More details about the EEG data used in this study are available in 55,56 .