Improved EEG-based emotion recognition through information enhancement in connectivity feature map

Electroencephalography (EEG), despite its inherent complexity, is a preferable brain signal for automatic human emotion recognition (ER), which is a challenging machine learning task with emerging applications. In any automatic ER system, machine learning (ML) models classify emotions using features extracted from the EEG signals; such feature extraction is therefore a crucial part of the ER process. Recently, EEG channel connectivity features have been widely used in ER, where the Pearson correlation coefficient (PCC), mutual information (MI), phase-locking value (PLV), and transfer entropy (TE) are well-known methods for connectivity feature map (CFM) construction. CFMs are typically formed in a two-dimensional configuration using the signals from pairs of EEG channels, and such two-dimensional CFMs are usually symmetric and hold redundant information. This study proposes the construction of a more informative CFM that can lead to better ER. Specifically, the proposed innovative technique intelligently combines the CFMs of two different individual methods, and the outcome is a more informative fused CFM. Such CFM fusion does not incur additional computational cost in training the ML model. In this study, fused CFMs are constructed by combining every pair of methods from PCC, PLV, MI, and TE; the resulting fused CFMs PCC + PLV, PCC + MI, PCC + TE, PLV + MI, PLV + TE, and MI + TE are used to classify emotion with a convolutional neural network. Rigorous experiments on the DEAP benchmark EEG dataset show that the proposed fused CFMs deliver better ER performance than CFMs built with a single connectivity method (e.g., PCC). At a glance, PLV + MI-based ER is shown to be the most promising, as it outperforms the other methods.


Related works
EEG has been well studied to investigate how the brain reacts to emotional experiences. Typically, ML or DL methods are used for ER using features extracted from EEG signals. Several ER studies are available using different feature extraction and classification techniques. The features broadly fall under the categories of individual channel features and connectivity features. The following subsections review prominent ER studies categorically based on the type of EEG features.
ER using individual channel features. Individual channels are considered as independent signal sources in this category, and the characteristic(s) of the signal from a particular channel are exposed as feature value(s).
Support vector machine (SVM) was used by Liu et al. 17 to classify discrete emotional states (e.g., happiness, sadness) from PSD features. The PSD features were extracted from six frequency bands for each EEG channel; thus, 6 (bands) × n (channels) features were included in each feature set. Apicella et al. 32 collected EEG signals through an 8-channel dry electrode cap and classified Valence using a neural network (NN) and k-nearest neighbors (KNN). Pane et al. 30 considered hybrid features of different time domain and frequency domain features to classify emotion using random forest (RF). Jagodnik et al. 28 extracted different time domain (e.g., mean), frequency domain (i.e., frequency band energies of sub-bands, e.g., Alpha), and nonlinear dynamic (e.g., entropy) features from EEG; selected features using MI with sequential forward floating selection; and finally classified emotion using SVM, KNN, and RF. Statistical features were extracted in both time and frequency domains in the study 31, where 364 features were extracted for each EEG segment, and then feature selection was applied to use the features with least-squares SVM and Naive Bayes (NB) classifiers. Subasi et al. 33 utilized a tunable Q wavelet transform in the feature extraction step and a rotation forest ensemble classifier with different base classifiers such as KNN, SVM, NN, and RF. Goshvarpour and Goshvarpour 34 constructed Poincaré plots (a 2D representation of a signal) of EEG signals, extracted features, and classified emotion using SVM, KNN, and NB. In another recent study 35, they investigated the construction of the Lemniscate of Bernoulli's Map (which belongs to the family of chaotic maps) from EEG signals and classified emotion using KNN and SVM.
While the aforementioned studies used conventional ML models, DL methods were used in recent studies for emotion analysis, as such methods extract features through their embedded learning process. The differential entropy (DE) feature represented in a 2D map was employed with CNN by Li et al. 18 to classify three types of emotions (positive, neutral, and negative). Moctezuma et al. used a 2D CNN to identify emotions on the Valence and Arousal scales from EEG channels selected by a multi-objective evolutionary algorithm 36. In the study 37, a combined CNN + SVM model was used to classify emotions from different time domain features (e.g., fractal dimension, Hjorth parameters, peak-to-peak, and root-mean-square) and frequency domain features (e.g., band power, DE, PSD). The study 37 created feature maps based on topographic (called TOPO-FM) and holographic (called HOLO-FM) representations of EEG signal characteristics. Meanwhile, Li et al. 38 designed a hybrid model incorporating a recurrent NN (RNN) and CNN for emotion classification in the Valence–Arousal plane using TOPO-FMs of the PSDs of the EEG signals. Liu et al. 19 used statistical characteristics (i.e., variance, mean, kurtosis, and skewness) of the EEG signal as time domain features and used a combined CNN + sparse autoencoder (SAE) + deep NN (DNN) model to classify emotions. The statistical features were extracted from four frequency bands and represented in 2D maps band-wise; a 3D map was then constructed by concatenating the features from all four frequency bands. Yuvaraj et al. 39 also constructed a 3D map by stacking 2D spatiotemporal representations of EEG signals and then employed a 3D form of CNN for emotion recognition.
A CNN-XGBoost fusion method was applied by Khan et al. 40 on signal spectrogram images for recognizing three dimensions of emotion, namely Arousal (calm or excitement), Valence (positive or negative feeling), and Dominance (without control or empowered). Moon et al. 21 used PSD features extracted from ten frequency bands, with SVM and CNN as classifiers. ER was performed using a simple recurrent units network and ensemble learning by Wei et al. 41, where the mean absolute value method was employed to extract time domain features, the PSD approach was adopted to obtain the characteristics of EEG signals in the frequency domain, and fractal dimension and DE features were used for nonlinear analysis of the EEG signals. Hurst, sample entropy, Hjorth complexity, vector autoregression, wavelet entropy, spectral entropy, and PSD features were extracted in 42, where a DNN was employed to classify emotions. Song et al. 43 employed a dynamical graph CNN to classify emotion from five different features, including DE and PSD. A dynamical graph CNN was also used by Asadzadeh et al. 44, where each emotion was modeled by mapping from scalp sensors to brain sources using a Bernoulli–Laplace-based Bayesian model.
ER using connectivity features. EEG connectivity features are mainly based on connections between brain regions.
It is widely accepted that the brain's regions are connected in a network, and the interactions between the network's nodes can be used to interpret brain activity. Thus, emotion analysis seems to benefit from measuring the relationships between several brain areas, and several existing ER studies have revealed the effectiveness of CFMs with connectivity features. Existing ER methods using CFMs mostly consider connectivity features in 2D form. A 2D CFM may be an n × n feature matrix, where n is the number of EEG channels. Gao et al. 7 used two connectivity features, named Granger causality (GC) and TE, with three classifiers (i.e., SVM, RF, and decision tree) to classify emotional states. The GC and TE features were first represented individually in 2D CFMs of different sizes, and then, applying the histogram of oriented gradients method, 1D feature vectors were created from the 2D CFMs.
Chen et al. 6 used three connectivity methods, named PCC, PLV, and MI, for emotional state classification on the Valence and Arousal scales with SVM. Wang et al. 24 also used SVM for emotion classification, where normalized MI (NMI) was used for connectivity feature extraction. Khosrowabadi et al. 45 used the Phase Slope Index, the Directed Transfer Function (DTF), and Generalized Partial Directed Coherence for connectivity features and considered KNN and SVM as classifiers. Petrantonakis and Hadjileontiadis 46 used higher order crossings and XCOR to extract features and SVM as a classifier. Arnau-González et al. 26 combined the MI feature with spectral power, used a feature selection approach combining Welch's t-test with principal component analysis (PCA), and classified emotions using NB and SVM.
Many existing studies considered CNN and other DL methods to classify emotions from CFMs. Bagherzadeh et al. 47,48 used PDC and dDTF to extract connectivity features and classified emotions using pre-trained CNN models. Chao et al. 49 A few studies considered PCC with other methods and represented connectivity features in 3D maps. Moon et al. 21 used PCC, PLV, and PLI to construct CFMs from ten frequency bands and considered SVM and CNN classifiers. The three types of features were each represented in a 3D map whose size was n × n × 10, where n is the number of EEG channels and 10 is the number of frequency bands. Liu et al. 19 used PCC for feature extraction from four frequency bands for an n × n × 4 sized CFM, and then used a combined CNN + SAE + DNN model to classify emotions. The studies 19,21 revealed that connectivity features improve the performance over individual channel features.

EEG-based emotion recognition through information enhancement in CFM
It is closely observed from the existing methods that a CFM produced by a particular connectivity method is mainly a symmetric 2D matrix with replicated feature values in its upper and lower triangles. Such a 2D CFM is suitable as input to a CNN, since the inherited convolutional operation with the 2D kernel is the CNN's most powerful feature. Therefore, most of the existing studies (such as 14,19–21) trained CNNs with 2D CFMs containing redundant feature values. To our knowledge, the study 16 considered only the upper triangle of the 2D CFM to minimize redundancy, but reformed the triangle values into a 2D matrix to make them suitable for a CNN. Redundant feature values are ineffective in improving the performance of any ML/DL model (e.g., CNN); more varied but related information is what improves a model's performance. As CNN prefers a 2D CFM, information enhancement in the 2D CFM is a key issue for better CNN performance, which has been explored and managed through an innovative technique in this study.
The construction of an improved CFM (called a fused CFM) that harmonizes relatively more extensive connectivity feature values from EEG signals is the primary contribution of this study toward a well-performing EEG-based ER. CNN is adopted to classify emotion from the fused CFMs. Considering preprocessing of EEG data as a standard step of ER, Fig. 1 demonstrates the proposed ER system. There are four major steps: preprocessing the EEG signals, construction of two CFMs (called base CFMs) using two different connectivity methods, fused CFM construction from these two base CFMs, and classification of emotions from the fused CFMs by CNN. The following subsections describe these steps of the ER system briefly.
Benchmark dataset and preprocessing. The Database for Emotion Analysis using Physiological Signals (DEAP) 51, one of the largest EEG datasets for ER, was considered to evaluate the performance of the proposed ER system. It includes EEG and peripheral physiological signals of 32 subjects (16 males and 16 females), where 40 emotional music videos were used as stimuli. Additionally, the database has subjective scores that characterize the emotional states brought on by watching the videos in terms of their levels of Valence, Arousal, Liking, and Dominance. The DEAP database used the BioSemi ActiveTwo system to record data. The EEG electrodes were placed according to the international 10–20 standard. This study employed the preprocessed EEG signals from the database, which were downsampled to 128 Hz, had EOG artifacts removed, and were filtered with a band-pass frequency filter of 4.0–45.0 Hz. There were 40 channels in total, of which 32 were for EEG signals and the remaining channels for peripheral physiological inputs.
In the DEAP dataset, the length of each signal was 63 s: the first 3 s of data were the pre-trial baseline, which was removed, and the last 60 s of data were processed for this study. To increase the number of samples for training, the EEG signals were segmented. An ideal segmentation time window size is 3–12 s for Valence and 3–10 s for Arousal, which preserves the key information of the respective levels, as demonstrated by Candra et al. 52. For this experiment, EEG signals were segmented using an 8 s sliding time window with an overlap of 4 s. Thus, 14 segments were obtained from a 60 s trial, and the total number of segments for 32 participants was 14 × 32 (participants) × 40 (trials) = 17,920; these are the samples used to construct CFMs. It is reported that emotion is highly related to the Gamma and Beta sub-bands 16. Only the Gamma sub-band was considered in this study for the final evaluation. EEGLAB 53, an open-source toolbox, was used to extract sub-bands from the EEG signal. Among the four rating scales available in the dataset, Valence and Arousal are the two well-studied scales and were chosen for ER in this study. In the dataset, each of the Valence and Arousal values ranges from 1 (low) to 9 (high). The scales were divided into two parts to pose the ER task as binary classification. Similar to the work in 16, Valence or Arousal is considered high for values above 4.5 and low for values less than or equal to 4.5. At a glance, Valence and Arousal classification must be performed through two independent binary classification tasks. By combining Valence and Arousal, human emotions (e.g., angry, happy, sad) can be expressed; often, these are visualized using Russell's circumplex model of emotions 54.
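As a concrete illustration, the segmentation described above (an 8 s window with 4 s overlap at 128 Hz) can be sketched as follows; this is a minimal sketch, not the authors' code, and the synthetic array merely mimics the preprocessed DEAP layout (32 channels, 60 s per trial):

```python
import numpy as np

def segment_trial(signal, fs=128, win_s=8, overlap_s=4):
    """Split a (channels, samples) trial into overlapping windows."""
    win = win_s * fs                      # 1024 samples per window
    step = (win_s - overlap_s) * fs       # hop of 512 samples (4 s)
    n = signal.shape[1]
    starts = range(0, n - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

trial = np.random.randn(32, 60 * 128)     # one 60 s trial, 32 EEG channels
segments = segment_trial(trial)
print(segments.shape)                     # (14, 32, 1024): 14 segments per trial
```

With 40 trials per subject and 32 subjects, this yields the 14 × 40 × 32 = 17,920 samples stated in the text.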

Connectivity feature map (CFM) construction and fusing CFM.
The feature extraction technique transforms inputs to new dimensions, which are different (linear, nonlinear, directed, etc.) combinations of the inputs. The strength of connectivity between two electrodes reflects an interaction between two cortical areas during an experiment. This interaction could be a direct or inverse correlation, synchronization or asynchronization, depending on which aspects are investigated. Relationships vary depending on the connectivity types as well. Four diverse connectivity methods are chosen for this study: PCC, PLV, MI, and TE. Among the selected methods, PCC is a linear functional connectivity method, PLV and MI are nonlinear functional connectivity methods, and TE is a nonlinear effective connectivity method. The following subsections briefly describe CFM construction using the four methods and improved CFM construction by fusing base CFMs.

CFM construction using individual methods. The Pearson correlation coefficient (PCC) measures the linear correlation between two signals X and Y, which can be calculated as

PCC(X, Y) = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / sqrt( Σ_{i=1}^{n} (X_i − X̄)² · Σ_{i=1}^{n} (Y_i − Ȳ)² ),

where n is the sample size, X_i and Y_i are the individual sample points indexed by i, and X̄ and Ȳ are the sample means. The value of PCC ranges from −1 to 1: −1 indicates complete linear inverse correlation between the two signals, 0 indicates no linear interdependence, and 1 indicates complete linear direct correlation between the two signals.
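A minimal sketch of the PCC computation between two channel signals using NumPy; the sine signals are purely illustrative stand-ins for EEG channel data:

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    xc, yc = x - x.mean(), y - y.mean()           # center both signals
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

t = np.linspace(0, 1, 1024)
x = np.sin(2 * np.pi * 10 * t)
print(pcc(x, x))    # ≈ 1.0: complete direct correlation
print(pcc(x, -x))   # ≈ -1.0: complete inverse correlation
```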
The phase-locking value (PLV) describes the phase synchronization between two signals and is calculated by averaging the phase differences as follows:

PLV(X, Y) = | (1/T) Σ_{t=1}^{T} exp( j(ϕ_t^X − ϕ_t^Y) ) |.

Here, ϕ_t is the phase of the signal at time t, X and Y denote two electrodes, and T is the time length of the signal. The value of PLV ranges from 0 to 1, where 0 indicates that the two signals are perfectly independent and 1 indicates that they are perfectly synchronized.
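The averaging in the PLV definition can be sketched as below. The phases here are generated directly for illustration; in practice the instantaneous phases would typically be obtained from the band-filtered signals (e.g., via a Hilbert transform), which is an assumption about the pipeline rather than a detail stated in the text:

```python
import numpy as np

def plv(phase_x, phase_y):
    """Phase-locking value from two instantaneous-phase time series."""
    return float(np.abs(np.mean(np.exp(1j * (phase_x - phase_y)))))

t = np.linspace(0, 1, 1024)
phase_a = 2 * np.pi * 10 * t                  # 10 Hz phase ramp
phase_b = phase_a + 0.5                       # constant phase lag: locked
rng = np.random.default_rng(0)
phase_c = rng.uniform(0, 2 * np.pi, t.size)   # unrelated random phases

print(plv(phase_a, phase_b))   # ≈ 1.0: perfectly synchronized
print(plv(phase_a, phase_c))   # near 0: no phase locking
```

A constant phase difference yields PLV = 1 regardless of the lag, which is why PLV captures synchronization rather than simultaneity.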
Mutual information (MI) is an information-theoretic approach to measuring the shared information between two variables. The amount of information about one random variable that may be learned from observing another is measured as MI. The MI between two random variables X and Y is defined as

MI(X, Y) = H(X) + H(Y) − H(X, Y).

Here, H stands for Shannon entropy 55. For calculating the probabilities required to compute the entropies, the fixed-bin histogram approach was followed; the number of bins selected for all the calculations is 10. The marginal entropies of the two variables X and Y are H(X) and H(Y), respectively, and their joint entropy is H(X, Y). MI is symmetric and nonnegative, with 0 ≤ MI(X, Y) < ∞. If MI(X, Y) equals 0, then X and Y are independent; if MI(X, Y) is greater than 0, then X and Y are dependent.
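The fixed-bin histogram estimate of MI described above can be sketched as follows; the Gaussian samples are illustrative only, and the 10-bin setting matches the text:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(x, y, bins=10):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) via fixed-bin histograms."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                     # joint probability estimate
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y_indep = rng.standard_normal(5000)
print(mutual_information(x, x))          # large: fully dependent
print(mutual_information(x, y_indep))    # near 0: independent
```

Note that the histogram estimator is slightly biased above zero for independent data; the bias shrinks with more samples per bin.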
Transfer entropy (TE) measures the directed flow of information from a time series or signal Y to another signal X. In other words, it describes the gain obtained by knowing Y for the prediction of X.
If w is the future of X, i.e., X_{t+h}, the transfer entropy T_{Y→X} can be computed as a combination of entropies:

T_{Y→X} = H(w, X) − H(X) − H(w, X, Y) + H(X, Y).

The range of the TE value is 0 ≤ T_{Y→X} < ∞. If the TE value equals 0, then there is no causal relationship between the time series; a TE value greater than 0 indicates that a causal relationship exists between them.
In the case of CFM, the variables are signals from individual EEG channels. The connectivity features were calculated for every pair (X, Y) of EEG channels. Therefore, if there are N channels, the number of obtained features is N(N − 1)/2 for undirected connectivity (e.g., PCC) and N(N − 1) for directed connectivity (e.g., TE). The connectivity features for all channel pairs can be represented in a matrix, and Fig. 2 shows the heatmap representation of sample CFMs constructed with the individual methods. The element of the matrix at position (X, Y) indicates the connectivity between the EEG signals obtained from the Xth and Yth channels. The values at locations (X, X) or (Y, Y) were set to zero, as these do not represent information between two different channels. If there are N channels, then every feature map has N rows and N columns. As there are 32 channels in the DEAP dataset, every feature map has 32 rows and 32 columns. Figure 2a shows a sample CFM constructed with PCC, which indicates the correlation between signals collected from two EEG channels. More specifically (as an example), a higher value of the matrix at position (2, 4) indicates that the signals collected from channel 2 and channel 4 are highly correlated, while a lower value at position (2, 3) indicates that the signals collected from channel 2 and channel 3 are inversely correlated. Similarly, phase synchronization, dependency, and causal relationships between two signals are indicated in Fig. 2b,c,d, respectively. It is observed from Fig.
2 that the elements of a matrix at positions (X, Y) and (Y, X) are the same (i.e., the CFMs are symmetric) for the functional connectivity methods PCC, PLV, and MI. However, they are not the same (i.e., the CFM is asymmetric) for the effective connectivity method TE. Another important observation from the figure is that the value ranges differ among CFMs due to the inherited properties of the individual connectivity methods. Among the four individual methods, the CFM using PCC holds the largest variation in its values, ranging from −0.99 to 0.98 in the sample presented in Fig. 2a; sample fused CFMs are presented in Fig. 3b,c,d,e,f. It has already been discussed that the CFM constructed with TE is asymmetric; therefore, it is a matter of choice to select between the upper and lower triangle to fuse with the other three methods (i.e., PCC, PLV, and MI). Since the variation between the upper and lower triangular TE CFMs is small, only one is considered for combination with the others, as shown in Fig. 3. Information enhancement without enlarging the size is the main significance of the CFM fusion in Fig. 3 over the CFMs with individual methods in Fig. 2. The CFMs in Fig. 2 hold redundant information, completely or partially. CFMs with PCC, PLV, and MI are symmetric and provide complete redundancy of individual values, whereas the CFM with TE also holds similar values in its upper and lower portions and thus partial redundancy. While redundant information only increases the computational burden in ML without any benefit in decision making, a fused CFM is beneficial for ML, as its upper and lower portions come from two different connectivity methods, enhancing the information in the CFM. Since the sizes of the fused CFMs (in Fig. 3) are the same as those of the individual methods (in Fig. 2), CFM fusion is a cost-effective as well as efficient information enhancement for ML.
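The fusion itself reduces to a triangle swap, which can be sketched as below. The random symmetric matrices here merely stand in for real base CFMs (e.g., PLV and MI maps); the key point is that the fused map keeps the 32 × 32 size of its inputs:

```python
import numpy as np

def fuse_cfm(cfm_upper, cfm_lower):
    """Fused CFM: upper triangle from one base CFM, lower triangle from
    another; the zero diagonal of the base maps is preserved."""
    return np.triu(cfm_upper, k=1) + np.tril(cfm_lower, k=-1)

n = 32                                         # DEAP EEG channels
rng = np.random.default_rng(0)
plv_map = rng.random((n, n)); plv_map = (plv_map + plv_map.T) / 2
mi_map = rng.random((n, n)); mi_map = (mi_map + mi_map.T) / 2
np.fill_diagonal(plv_map, 0); np.fill_diagonal(mi_map, 0)

plv_mi = fuse_cfm(plv_map, mi_map)             # PLV + MI fused CFM
print(plv_mi.shape)                            # (32, 32): same size as each base CFM
```

Because the symmetric base CFMs carry all their information in one triangle, no information is lost in the swap, while the redundant triangle is replaced with features from a second method.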
Differences in value ranges (i.e., the difference between the highest and lowest values) in the CFMs of individual methods (e.g., PCC, PLV) create diversity in CFMs constructed by combining individual methods. It has already been observed from the CFMs of individual methods (in Fig. 2) that the values for PCC hold large variations, followed by PLV and MI, while the variation for TE is the lowest. Therefore, when PCC is combined with another method, the resulting CFM shows a large variation in its value ranges. Among the six fused CFMs, PCC + MI shows the highest value variability and PLV + TE the lowest, as in the CFMs presented in Fig. 3. Notably, since the CFM figures are colorized with relative values (i.e., the lowest is blue and the highest is red), the colors of the values for individual methods (in Fig. 2) change in the combined cases (in Fig. 3). Finally, CFMs combining two individual methods enhance the variation of the data values at a glance, which is also an element in enhancing the performance of ML.

Emotion classification using convolutional neural network (CNN).
Among different DL methods, CNN is the most successful classifier for two-dimensional (2D) data and can implicitly extract relevant features 56,57. Since the constructed CFMs are 2D, CNN was chosen as a suitable classifier. In general, a CNN architecture consists of an input layer, several convolutional–subsampling layers, a flatten layer, a fully connected layer, and an output layer. The first operation of a CNN is a convolution performed on the input (matrix, image, or map) with its kernel, which generates a new convolved matrix. The subsequent subsampling operation downsizes the convolved matrix while keeping its important features. After one or more convolutional–subsampling operations, the output layer, through a fully connected dense layer, categorizes the 2D matrix given as input to the CNN. General descriptions of CNN and its operations are available in existing studies where CNN and its architectural issues are the primary concern 56,58.
Figure 4 shows the CNN architecture used to classify emotions from 2D CFMs; such an architecture has been used in recent EEG-based studies 9. Three convolutional layers, two max-pooling layers, a flatten layer, a dense layer, and an output layer make up the CNN architecture employed in this study. In the figure, the size of the generated 2D shapes and the number of shapes are marked for each convolutional and pooling layer. Every convolution layer used kernels of size 3 × 3, and the stride was set to 1. The rectified linear unit (ReLU) was used as the activation function. The numbers of filters were 32, 64, and 128 for the 1st, 2nd, and 3rd convolution layers, respectively. 'Same' convolution (padding = 1) is used for all the convolution layers to preserve the information from the corner pixels of the input feature map. Two max-pooling layers are used, one after the first convolution layer and another after the third convolution layer. Kernels of size 2 × 2 with stride 2 were used in every pooling layer. After each max-pooling layer, batch normalization was used to accelerate model training. After the convolution and pooling operations, the feature maps were flattened to a single-column vector of 8192 (= 8 × 8 × 128) elements and fed to the dense layer. The neuron counts of the dense layer and the output layer were set at 128 and 2, respectively, and the dense layer is accompanied by a 25% dropout. In the output layer, the Sigmoid activation function was applied. Table 1 shows the shapes and hyperparameters of the CNN model.
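The flattened-vector size quoted above follows directly from the layer shapes; a minimal arithmetic sketch (not the training code) confirming it:

```python
def conv_same(size):
    return size            # 3x3 kernel, stride 1, 'same' padding: size preserved

def pool2(size):
    return size // 2       # 2x2 max-pooling with stride 2: size halved

s = 32                     # 32 x 32 input CFM
s = pool2(conv_same(s))    # conv1 (32 filters) + max-pool  -> 16 x 16
s = conv_same(s)           # conv2 (64 filters)             -> 16 x 16
s = pool2(conv_same(s))    # conv3 (128 filters) + max-pool -> 8 x 8
flat = s * s * 128         # flatten before the dense layer
print(flat)                # 8192
```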

Experimental studies
This section presents experimental results and analyses of ER systems with CFMs created by the different methods (i.e., PCC, PLV, MI, and TE) individually and with fused CFMs on the DEAP EEG dataset. The efficacy of the method was assessed based on the test set recognition accuracies. Finally, the outcomes of the study were compared with state-of-the-art methods. First, however, the experimental setup and evaluation metrics are described briefly.
Experimental setup and evaluation metrics. The Keras and TensorFlow frameworks of Python were used for implementing the CNN models. The CNN was trained with the Adam algorithm 59, and binary cross-entropy was used as the loss function. The learning rate, batch size, and number of epochs for the CNN were set to 0.00001, 32, and 500, respectively. Fivefold cross-validation (CV) was applied, where 20% of the available data were reserved as a test set in turn, while 80% of the data were used to train the model. Moreover, the performance was also evaluated for fixed training and test sets in several cases. The P100 GPU on the Kaggle platform was used for training the model, and MATLAB R2021a was used for feature extraction on a device with the configuration: CPU: Intel(R) Core(TM) i5-4200 @ 2.50 GHz, RAM: 4.00 GB, 64-bit Windows operating system. The performance of the implemented model was evaluated using the three most widely used evaluation metrics (i.e., sensitivity, specificity, and accuracy), which can be expressed as

Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP), Accuracy = (TP + TN) / (TP + TN + FP + FN).

Here, TP (true positive) means the samples were originally labeled as high and the model also predicted them as high; TN (true negative) means the samples were originally labeled as low and the model also predicted them as low; FP (false positive) means the samples were originally labeled as low but the model predicted them as high; and FN (false negative) means the samples were originally labeled as high but the model predicted them as low. Sensitivity is the percentage of correctly detected high-labeled samples among all high-labeled samples, and specificity is the percentage of correctly detected low-labeled samples among all low-labeled samples. An excellent classifier should have high sensitivity and specificity at the same time. Notably, performance on the test set is more meaningful, as it represents the generalization ability of an ML/DL system.
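The three metrics can be sketched as below; the confusion counts are hypothetical numbers chosen only to exercise the formulas, not results from the study:

```python
def evaluate(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy (percent) from binary
    confusion counts (high = positive class, low = negative class)."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Hypothetical confusion counts for one fold of a binary Valence task.
sens, spec, acc = evaluate(tp=1500, tn=1450, fp=150, fn=100)
print(sens, spec, acc)   # 93.75 90.625 92.1875
```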

Experimental results and analyses. The model's loss and accuracy curves for Valence and Arousal classification for a sample run are analyzed first, and then the classification results are presented for both fivefold CV and training–test split modes. The performance of the different sub-bands (i.e., Alpha, Beta, and Gamma) and the full frequency band has been evaluated with individual connectivity feature maps (CFMs) for a more reasonable comparison. Since the trends of training loss and accuracy are similar for all the sub-bands, graphical illustrations over the training epochs are presented for the Gamma band only, in Fig. 5. Figure 5 shows the model's loss and accuracy curves on both training and test sets for Valence classification for a sample run where the CFMs are constructed using the individual connectivity methods. In ML, performance on the training set indicates the learning or memorization of the patterns used in training a model, while performance on the test set indicates the generalization ability (i.e., performance beyond the training data) of a model. According to Fig. 5a, the loss convergence for TE is faster than for any other method, and similarly the accuracy improvement in Fig. 5b. In test set accuracy, TE shows the worst performance, whereas MI achieved the highest accuracy, as seen in Fig. 5c. The test set accuracies of PCC and PLV are competitive. The test set accuracy is inferior to the training set score for the model. This scenario is acceptable because the test set was reserved for checking performance on unseen data (i.e., data not used in the training process), and lower performance on the test set than on the training set is common in the ML domain. Test set performance (i.e., generalization ability) is the key performance measure of a model and is used for comparison with other related models.
Figure 6 compares test set classification accuracies with CFMs by the individual connectivity methods for the Alpha, Beta, Gamma, and full frequency bands. As mentioned earlier, it is reported in the literature that the sub-bands may yield more accurate information about constituent neuronal activities 15, and emotion is more highly related to the Beta and Gamma sub-bands than to the Alpha sub-band. The experimental results presented in Fig. 6 also justify this frequency band compatibility for emotion recognition. According to the figure, the recognition accuracy is generally higher in sub-bands than in the full EEG frequency band for any CFM construction method. Again, the accuracies for the Gamma sub-band are better than those for the Alpha and Beta sub-bands for both Valence and Arousal classification. Recent studies have also demonstrated this observation 9. Therefore, owing to its better accuracy, further experimental outcomes have been observed for the Gamma band only, for simplicity and to keep the paper concise.
Figure 7 shows the model's loss and accuracy curves on both training and test sets for Valence classification for a sample run with fused CFMs, where the CFMs are constructed using a combination of every two connectivity methods. As the two CFMs constructed with the two triangular portions of TE showed similar characteristics, only one of them is displayed here. According to Fig. 7a, the loss convergence for the methods where PCC is combined with any other method (i.e., PCC + PLV, PCC + MI, PCC + TE) is slower than for the other methods, and similarly the accuracy improvement in Fig. 7b. The TE-combined methods show lower test set accuracies, while PLV + MI achieved the highest accuracy, as seen in Fig. 7c. The accuracies of PCC + MI and PCC + PLV are competitive. Interestingly, the test set accuracy with MI is better than the others, as found in Fig. 5c, and the MI-combined method PLV + MI is better than the others, as found in Fig. 7c. Similar characteristics can also be observed for TE, which obtained the lowest test set accuracy in Fig. 5c, while the TE-combined method PLV + TE obtained the lowest test set accuracy among the combined methods, as found in Fig. 7c. At a glance, PLV + MI achieved better accuracies than any other individual or combined CFM method. Figure 8 summarizes the test set accuracies in the training–test split mode for the six fused CFMs on the Valence and Arousal scales. According to the classification accuracies presented in the figure, the PLV + MI method achieves the highest Valence and Arousal classification accuracies, which are 91.29% and 91.66%, respectively. On the other hand, the worst accuracy was achieved with PLV + TE. While the proposed method has been found effective in the experiments conducted by shuffling samples of all 32 subjects, it is also interesting to know how it performs on an individual-subject basis (i.e., subject-dependent) and a cross-subject basis (i.e., subject-independent).
Different experiments have been conducted to evaluate the proposed method in subject-dependent and subject-independent settings. In the subject-dependent case, for a particular subject, the available 560 samples (= 14 segments × 40 trials) were shuffled; 448 samples (i.e., 80%) were used to train the CNN, and the remaining 112 samples (i.e., 20%) were reserved as the test set. Figure 9a shows Valence and Arousal test set classification accuracies for the 32 subjects individually for the fused CFM by PLV + MI, as it has outperformed the others. For Valence classification, 100% accuracy (i.e., all 112 test samples correctly classified) was observed for Subject 1 only, and the worst accuracy was 82% for Subject 12 and Subject 25. In the case of Arousal classification, the method showed 100% accuracy for Subject 9 only, and the lowest accuracy was 87% for Subject 22. The subject-dependent average accuracy for both the Valence and Arousal cases is around 93%, better than with all subjects shuffled together. Better subject-dependent performance is logical, as the CNN was trained on samples from the same subject. At the same time, such performance justifies the proficiency of the proposed method for emotion classification in individual-subject cases. Figure 9b shows Valence and Arousal classification accuracies for the fused CFM by PLV + MI in the subject-independent measure through leave-one-subject-out mode, where classification accuracies were measured for a particular subject (with 560 samples), while CNN training was performed with the 17,360 (= 560 × 31) samples of the remaining 31 subjects. The Valence and Arousal accuracies for the subject-independent case are generally inferior to those of the subject-dependent case. However, accuracies higher than 80% for Valence classification in two subjects (i.e., 6 and 27) and Arousal classification in four subjects (i.e., 12, 13, 20, and 24) are promising outcomes. In addition, five other subjects (i.e., 3, 7, 16,
18, and 23) for Valance and six other subjects (i.e., 9, 17,  21, 25, 27, and 32) for Arousal are shown accuracy higher than 70%.Better performance on several subjectindependent cases is achievable when similar patterns are available in training samples.On the other hand, when the test subject samples are largely different from the training subjects' samples, it is common to get inferior accuracy.The inferior outcomes (say, accuracy below 70%) for a number of subject cases indicate that the corresponding subjects' samples are largely dissimilar from other subjects.At a glance, the average subject-independent classification accuracies are around 65% for both Valance and Arousal cases.The achieved subject-independent classification accuracy is better than or competitive with the reported accuracies in several studies such as 28,60 .However, it is a remaining challenging issue for the proposed method to achieve better subject-independent classification accuracy by analyzing CFM construction and employing different DL models.
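The two evaluation protocols above can be outlined in code. The sketch below is a minimal illustration of the data handling only (the paper's CNN training itself is not reproduced); array shapes follow the paper's numbers (560 samples per subject, 32 × 32 CFMs), and the function names are ours, not the authors'.

```python
import numpy as np

def subject_dependent_split(X, y, train_frac=0.8, seed=0):
    """Shuffle one subject's samples and split them, e.g. 560 -> 448 train / 112 test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * train_frac)
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

def leave_one_subject_out(subjects_X, subjects_y, test_subject):
    """Train on all other subjects (31 x 560 = 17,360 samples), test on the held-out one."""
    X_tr = np.concatenate([X for s, X in enumerate(subjects_X) if s != test_subject])
    y_tr = np.concatenate([y for s, y in enumerate(subjects_y) if s != test_subject])
    return X_tr, y_tr, subjects_X[test_subject], subjects_y[test_subject]
```

In practice, `leave_one_subject_out` would be called once per subject, reporting the held-out subject's accuracy each time, which yields the per-subject bars of Fig. 9b.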
Figure 10 presents the time required to train the CNN with the four base CFMs (of the four individual connectivity methods) and the six fused CFMs. The legend of the figure pairs the required training time with the name of the respective CFM. It can be observed from the figure that the time needed to train with the different CFMs is almost the same (and together they appear as a single bold line). For example, PCC requires 814.73 s, and PCC + PLV requires 813.25 s, to train the model for 500 epochs. The figure reveals that using fused CFMs does not incur additional computational cost. The reason is apparent: the size of a fused CFM is the same as that of the individual base CFMs, and a CFM is always a 32 × 32 matrix, as explained in Section "EEG-based emotion recognition through information enhancement in CFM". For the same 32 × 32 input CFM and the same CNN architecture, the CNN training times for base and fused CFMs are expected to be unchanged. CFM fusion may look like an additional task in the proposed method compared with the base CFM methods, but the fusing task is computationally negligible, as fusion only replaces a portion of one CFM with another CFM. Moreover, CFM fusion takes place before CNN training, and therefore the CNN training time remains the same.
Due to the variations among the existing methods (e.g., in segmentation time and overlapping, classifier use, and validation method), an unbiased, fair performance comparison based on a particular element is difficult. However, classification accuracy variation with segmentation time and overlapping is more visible than with the other elements (e.g., classifier use, validation method). Segmentation splits an original EEG sample into several individual samples depending on the time window and overlapping, and a larger overlap produces more samples (i.e., CFMs). With a 3 s segment window and 0 s overlap (i.e., no overlap), there are 20 samples for a 60 s EEG signal in the study 16; whereas, with a 2.5 s overlap (= 2.5/3 ≈ 83%), there are 100+ samples for the same 3 s segment window in the study 21. For
PCC, the best accuracy was achieved by the study 21.
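The segment counts quoted above follow directly from the window/overlap arithmetic. The helper below is a generic sliding-window count, not code from either cited study:

```python
def num_segments(signal_len_s, window_s, overlap_s):
    """Number of sliding-window segments over a signal; step = window - overlap."""
    step = window_s - overlap_s
    return int((signal_len_s - window_s) // step) + 1

# 60 s trial, 3 s window, no overlap -> 20 segments (as in study 16)
print(num_segments(60, 3, 0))    # 20
# 60 s trial, 3 s window, 2.5 s overlap (~83%) -> 115 segments (the "100+" of study 21)
print(num_segments(60, 3, 2.5))  # 115
```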

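Since a symmetric CFM duplicates every channel-pair value across the diagonal, one plausible reading of the fusion step described above is to keep one method's values in the upper triangle and replace the (redundant) lower triangle with a second method's values. The sketch below illustrates this reading; the specific triangle assignment is our assumption for illustration, not the authors' published code:

```python
import numpy as np

def fuse_cfms(cfm_a, cfm_b):
    """Fuse two 32x32 CFMs: keep A's upper triangle (incl. diagonal) and
    replace the strictly lower triangle with B's values. The fused CFM
    keeps the same 32x32 size, so CNN training cost is unchanged."""
    fused = cfm_a.copy()
    lower = np.tril_indices_from(fused, k=-1)  # indices strictly below the diagonal
    fused[lower] = cfm_b[lower]
    return fused

# Example with two symmetric random matrices standing in for PLV and MI CFMs
rng = np.random.default_rng(0)
plv = rng.random((32, 32)); plv = (plv + plv.T) / 2
mi = rng.random((32, 32)); mi = (mi + mi.T) / 2
fused = fuse_cfms(plv, mi)  # a "PLV + MI" CFM under this reading
```

Because the operation is a single index assignment performed before training, it supports the paper's claim that fusion adds negligible computational cost.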
Figure 1. The framework of the proposed emotion recognition system from EEG.

Figure 4. CNN architecture (with dimensions of individual layers) to classify emotions from CFMs.

Figure 5. Model loss and accuracy for Valence classification using CFMs with individual connectivity methods.

Figure 6. Valence and Arousal classification accuracies with individual connectivity methods in different frequency sub-bands and the full frequency band.

Figure 7. Model loss and accuracy for Valence classification using fused CFMs with two connectivity methods.

Figure 8. Test set accuracies for CFMs with individual methods and fused CFMs in Valence and Arousal classification for the Gamma sub-band.
used maximal information coefficient (MIC) for CFM construction, employed a PCA network (PCANet)-based DL model for deep feature extraction from the constructed CFMs, and used SVM and CNN to classify emotions. Islam et al. 16 generated 2D CFMs using PCC, developed a different, reduced-size CFM by rearranging the values of the upper triangle of the CFMs, and used a CNN for emotion classification from both CFMs individually. Jin et al. 50 also used PCC for feature extraction, represented in 1D, and employed Long Short-Term Memory (LSTM) + NN for emotion classification. Chen et al. 20 used PCC, PLV, and TE to extract connectivity features and employed a domain-adaptive residual CNN for emotion classification from CFMs.

Table 1. Shape and hyperparameters of the CNN model.

Table 2 demonstrates test set classification performance in different evaluation metrics for Valence and Arousal with different CFMs for the fivefold CV and training-test split modes. The training-test split may be considered one individual case among the five cases of the fivefold CV mode, and similar performance is observed across the different CFM cases. The best performance for a particular evaluation metric and mode is placed in boldface. As an example, for Valence and Arousal classification, MI achieved the best specificity, sensitivity, and accuracy in both modes. The lowest results were achieved with the TE feature in both modes. The performances of PCC and PLV are competitive. At a glance, MI shows superior performance over the other connectivity methods. Table 3 presents the classification comparison using the different fused connectivity feature map methods. From the table, it can be observed that PCC + PLV achieved the highest specificity for Valence classification in fivefold CV mode. For Arousal classification, PCC + MI achieved the highest specificity in fivefold CV mode. PLV + MI obtained the highest specificity for both Valence and Arousal classification in training-test split mode. In the case of sensitivity and accuracy, PLV + MI showed the best performance for both Valence and Arousal classification in both fivefold CV and training-test split modes.
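The specificity, sensitivity, and accuracy reported in Tables 2 and 3 follow the standard confusion-matrix definitions for a binary task (here, high/low Valence or Arousal). The sketch below uses these standard definitions; the paper's exact evaluation code is not shown, so this is a generic illustration:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Specificity, sensitivity (recall), and accuracy from binary 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / len(y_true),
    }

# e.g. binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
# -> specificity 1.0, sensitivity 0.5, accuracy 0.75
```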

Table 2. Classification comparison using different individual CFM methods. Significant values are in bold.

Table 3. Classification comparison using different fused CFM methods. Significant values are in bold.

Figure 9. Subject-dependent and subject-independent test set classification accuracies with fused CFM by PLV + MI.
Scientific Reports | (2023) 13:13804 | https://doi.org/10.1038/s41598-023-40786-2
Table 4 compares the Valence and Arousal test set classification accuracies obtained in this study with those of other connectivity feature-based ER studies on the DEAP dataset. The existing methods are diverse in segmentation time and overlapping, classifier consideration, and validation method (i.e., the mode of test sample reservation: fixed training-test sample split ratio or cross-validation).

Figure 10. CNN training time for the models with different CFMs.