Introduction

Brain–machine interface (BMI), also known as brain–computer interface (BCI), refers to a direct connection between the human or animal brain and external devices, establishing a communication channel that converts the brain’s electrical signals into operating commands and thereby enables information exchange between brain and machine. In 1999, the Tsinghua University research team led by Professors Gao Shangkai and Gao Xiaorong entered the field of BCI. Their laboratory proposed SSVEP-based BCI technology, which uses visual stimulation to evoke potentials oscillating at the stimulus frequency; this paradigm has since become one of the three main paradigms of non-invasive brain–computer interfaces1. In 2014, Merino et al. designed a drone control system based on SSVEP; its offline command accuracy reached 85%, its average online information transmission rate (ITR) was 0.98 bits/min, and 9 of 11 online testers successfully completed the flight mission2. In 2015, Professor Gao Xiaorong’s team at Tsinghua University developed a high-speed SSVEP-based BCI system that achieved an ITR of 4.5 bits/s3. In 2017, the Institute of Semiconductors, Chinese Academy of Sciences, together with collaborating teams, proposed a task-related component analysis algorithm that raised the BCI ITR to 6.3 bits/s, the highest reported at the time4. In 2019, Tsinghua University’s BCI team demonstrated the application performance of its BCI typing system to the public through CCTV’s “Challenge Impossible” program, helping Wang Jia, a patient with ALS, communicate with the outside world by spelling. In 2008, photographs showing a monkey at the University of Pittsburgh Medical Center operating a robotic arm by thinking were published in multiple studies5. In 2018, Xu Xian and colleagues studied SSVEP-based brain-controlled aircraft: the controlled object was a drone platform with an STM32-series chip as its control core, a wireless ERP EEG collector served as the acquisition device, and flight control of the drone through SSVEP signals was successfully achieved6. In 2019, Li-Wei Ko et al. designed an SSVEP-BCI smart helmet intended to support soldiers in additional tasks such as brain-controlled communication or remote control of equipment during combat operations7. In the World Robot Conference Brain-Controlled Typing Record Challenge in July 2019, the algorithm provided by a joint team from the University of Macau and the University of Hong Kong achieved 691.55 bits/min, an ITR equivalent to outputting one English letter every 0.413 s with 100% accuracy8. Professor Ming Dong’s neuroengineering team at Tianjin University used parallel P300 and SSVEP features to achieve a high-speed hybrid BCI with 108 instructions.
This result remains the hybrid BCI with the largest instruction set and the highest information transmission rate9. In 2020, Zheng Dezhi of Beijing University of Aeronautics and Astronautics and colleagues designed a single-soldier unmanned-weapons control system based on the SSVEP brain–computer interface, allowing soldiers to operate various weapons hands-free during combat10. In May 2023, Professor Duan Feng’s team at Nankai University completed in Beijing the world’s first interventional brain–computer interface test on non-human primates, realizing brain control of a robotic arm through an interventional BCI in the monkey brain; this achievement is of great significance for research in brain science and marks China’s entry into the leading international ranks of brain–computer interface technology11. In 2023, a Tsinghua University team proposed a BCI system based on a hybrid of electroencephalography (EEG) and magnetoencephalography (MEG), which improved 40-target classification accuracy from 50 to 95%, highlighting the methodological advantages of hybrid MEG-EEG BCI and indicating that it is a promising paradigm for high-speed BCI12. Also in 2023, Song Jun of Anhui University and colleagues designed a brain-controlled human–machine method and system for wireless drones based on deep learning and sliding-mode control, addressing the problem of controlling UAVs when the user cannot move13. All of the above studies inform SSVEP classification and identification, and all have important reference value.

The remainder of this article is organized as follows. The “Related work” section reviews related work. The “Materials and methods” section explains the soft saturation-based nonlinear model proposed in this paper and introduces the hardware operating environment and the experimental data set. The “Test results and analysis” section presents the data results and their analysis, including a comparative analysis of different activation function modules. Finally, a careful discussion is provided in the “Discussion” section.

Related work

Different model methods

CCA

Canonical correlation analysis (CCA) analyzes the SSVEP signal by computing the canonical correlation coefficient between two sets of signals. One set is the recorded EEG signal \(X = {X}_{1}, {X}_{2}, \dots, {X}_{n}\), where n is the number of EEG channels collected. The other set is the reference signal corresponding to the frequency of the visual stimulus:

$${Y}_{i}=\begin{pmatrix}{\text{sin}}(2\pi {f}_{i}t)\\ {\text{cos}}(2\pi {f}_{i}t)\\ \vdots \\ {\text{sin}}(2\pi k{f}_{i}t)\\ {\text{cos}}(2\pi k{f}_{i}t)\end{pmatrix}, \quad t = \frac{1}{{F}_{S}},\frac{2}{{F}_{S}},\dots ,\frac{{N}_{S}}{{F}_{S}}.$$
(1)

Here, i indexes the stimulation targets, \({f}_{i}\) denotes the stimulation frequency, k is the number of harmonics in the reference signal, \({N}_{S}\) is the number of sampling points, and \({F}_{S}\) is the sampling rate. Since the human brain acts as a low-pass filter and attenuates high-frequency components, k = 3 is used here, thus:

$${Y}_{i}=\begin{pmatrix}{\text{sin}}(2\pi {f}_{i}t)\\ {\text{cos}}(2\pi {f}_{i}t)\\ {\text{sin}}(4\pi {f}_{i}t)\\ {\text{cos}}(4\pi {f}_{i}t)\\ {\text{sin}}(6\pi {f}_{i}t)\\ {\text{cos}}(6\pi {f}_{i}t)\end{pmatrix}, \quad t = \frac{1}{{F}_{S}},\frac{2}{{F}_{S}},\dots ,\frac{{N}_{S}}{{F}_{S}}.$$
(2)

The linear combinations of X and \({Y}_{i}\) can be expressed as \(x={X}^{T}{W}_{X}\) and \(y={Y}_{i}^{T}{W}_{Y}\), where \({W}_{X}\) and \({W}_{Y}\) are the weight matrices. The correlation coefficient between x and the reference signal corresponding to the i-th stimulus is therefore:

$$\rho \left({f}_{i}\right)=\frac{E\left({x}^{T}y\right)}{\sqrt{E\left({x}^{T}x\right)E\left({y}^{T}y\right)}}=\frac{E\left({W}_{X}^{T}X{Y}_{i}^{T}{W}_{{Y}_{i}}\right)}{\sqrt{E\left({W}_{X}^{T}X{X}^{T}{W}_{X}\right)E\left({W}_{{Y}_{i}}^{T}{Y}_{i}{Y}_{i}^{T}{W}_{{Y}_{i}}\right)}}.$$
(3)

If K represents the number of stimulation frequencies, the final identified target frequency is:

$${f}_{s}=\underset{{f}_{i}}{\text{arg max}}\,\rho \left({f}_{i}\right), \quad i = 1, 2, 3, \dots, K.$$
(4)
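As a concrete illustration of Eqs. (1)–(4), the sketch below performs CCA-based frequency identification with scikit-learn. This is a minimal sketch under assumed conditions: the channel count, sampling rate, and candidate frequencies are illustrative, not values from this paper’s dataset.

```python
# Minimal CCA-based SSVEP frequency identification (Eqs. 1-4).
import numpy as np
from sklearn.cross_decomposition import CCA

def make_reference(f, fs, n_samples, n_harmonics=3):
    """Build the sine-cosine reference Y_i of Eq. (2) for frequency f."""
    t = np.arange(1, n_samples + 1) / fs
    ref = []
    for h in range(1, n_harmonics + 1):
        ref.append(np.sin(2 * np.pi * h * f * t))
        ref.append(np.cos(2 * np.pi * h * f * t))
    return np.array(ref).T                       # (n_samples, 2 * n_harmonics)

def cca_identify(X, freqs, fs):
    """Return the stimulus frequency whose reference maximizes rho (Eq. 3)."""
    n_samples = X.shape[1]
    scores = []
    for f in freqs:
        Y = make_reference(f, fs, n_samples)
        cca = CCA(n_components=1)
        x_c, y_c = cca.fit_transform(X.T, Y)     # X is (n_channels, n_samples)
        scores.append(np.corrcoef(x_c[:, 0], y_c[:, 0])[0, 1])
    return freqs[int(np.argmax(scores))]         # Eq. (4): arg max over rho(f_i)

# Usage with synthetic data: 8 channels, 1 s of EEG at 250 Hz, 9.8 Hz target.
fs, freqs = 250, [8.0, 9.8, 11.4, 13.0]
t = np.arange(1, fs + 1) / fs
X = np.sin(2 * np.pi * 9.8 * t) + 0.5 * np.random.randn(8, fs)
print(cca_identify(X, freqs, fs))
```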

TRCA

Although CCA-based methods perform well in identifying SSVEP signals, their performance remains susceptible to interference from spontaneous EEG activity. Researchers also identified a further limitation of CCA-based methods: phase information is not used (the sine and cosine terms in the reference signal contain no phase term), so if phase information could be exploited effectively, SSVEP recognition performance would improve. The TRCA method was therefore proposed. TRCA extracts task-related components by maximizing the reproducibility of neuroimaging data within each task. It is well suited to time-locked signals such as SSVEP because, in SSVEP, maximizing the reproducibility across trials improves the signal-to-noise ratio (SNR) and suppresses spontaneous EEG activity.

Task-related component analysis (TRCA) is currently one of the most popular methods for SSVEP identification. Before it was proposed, CCA or MSI was commonly used for frequency identification; the biggest advantage of those methods is that they require no subject training data.

In 2018, Nakanishi et al. took the lead in introducing the TRCA method to SSVEP identification14. The method was first proposed by Tanaka et al. in 2013 and applied to NIRS analysis15,16. As described above, it extracts task-related components by maximizing the reproducibility of the neuroimaging data across trials of each task, improving the signal-to-noise ratio and suppressing spontaneous EEG activity.
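The sketch below reconstructs the core TRCA computation from the description above: a spatial filter that maximizes inter-trial covariance (reproducibility) relative to the total covariance via a generalized eigenproblem. It is an illustrative reading of the method, not Nakanishi et al.’s reference code; the array shapes and names are assumptions.

```python
# Minimal TRCA spatial filter for a single stimulus class.
import numpy as np
from scipy.linalg import eigh

def trca_filter(trials):
    """trials: array of shape (n_trials, n_channels, n_samples)."""
    trials = trials - trials.mean(axis=2, keepdims=True)  # zero-mean per channel
    total = trials.sum(axis=0)                            # sum of all trials
    within = sum(t @ t.T for t in trials)                 # per-trial covariances
    S = total @ total.T - within                          # cross-trial covariance only
    Q = within                                            # covariance of concatenated data
    _, vecs = eigh(S, Q)                                  # generalized eigenproblem
    return vecs[:, -1]                                    # filter with largest eigenvalue

# Usage: filter a test trial with w, then correlate it with the filtered
# average of each class's training trials; the class with the highest
# correlation is taken as the recognized target.
trials = np.random.randn(6, 8, 250)                       # 6 trials, 8 channels, 250 samples
w = trca_filter(trials)
print(w.shape)                                            # (8,)
```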

EEGNet

EEGNet is a deep learning model designed specifically for EEG signal processing17,18. Waytowich et al. applied EEGNet to SSVEP classification and achieved excellent results in inter-subject classification tasks. EEGNet consists of four stages. The first is a convolutional layer that emulates a filtering operation on each channel of the EEG data. The second is a depthwise convolutional layer, equivalent to a spatial filter that weights all channels. The third is a separable convolutional layer used to extract classification features. The fourth is a fully connected layer that outputs the classification result. Since its introduction, EEGNet has been widely used in EEG classification tasks such as motor imagery, P300, and SSVEP.
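To make the four-stage structure concrete, the PyTorch sketch below mirrors it with a temporal convolution, a depthwise spatial convolution, a separable convolution, and a fully connected classifier. The kernel sizes and filter counts are illustrative assumptions, not the published EEGNet hyperparameters.

```python
# A minimal EEGNet-style model following the four stages described above.
import torch
import torch.nn as nn

class TinyEEGNet(nn.Module):
    def __init__(self, n_channels=8, n_samples=250, n_classes=12):
        super().__init__()
        self.temporal = nn.Conv2d(1, 8, (1, 64), padding=(0, 32), bias=False)
        self.spatial = nn.Conv2d(8, 16, (n_channels, 1), groups=8, bias=False)  # depthwise over channels
        self.separable = nn.Sequential(                  # depthwise + pointwise
            nn.Conv2d(16, 16, (1, 16), padding=(0, 8), groups=16, bias=False),
            nn.Conv2d(16, 16, 1, bias=False),
        )
        self.pool = nn.AdaptiveAvgPool2d((1, 8))
        self.classify = nn.Linear(16 * 8, n_classes)

    def forward(self, x):                    # x: (batch, 1, n_channels, n_samples)
        x = torch.relu(self.temporal(x))     # per-channel temporal filtering
        x = torch.relu(self.spatial(x))      # collapses the channel dimension
        x = torch.relu(self.separable(x))    # feature extraction
        x = self.pool(x).flatten(1)
        return self.classify(x)              # classification output

print(TinyEEGNet()(torch.randn(2, 1, 8, 250)).shape)  # torch.Size([2, 12])
```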

SSVEPNet

The SSVEPNet model, proposed by Pan Yudong and colleagues at Southwest University of Science and Technology in 2022, consists of five modules: an input layer, a spatial filtering module, a temporal filtering module, a bidirectional long short-term memory (Bi-LSTM) module, and a fully connected module19. The model is implemented in the PyTorch framework; the input format is Nc × Nt, where Nc is the number of channels and Nt is the number of sampling points. The spatial filtering module learns 2 × Nc spatial filters, each regularized with a max-norm constraint. However, the nonlinearity used in this model is not very effective when the model is applied to data processing.
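As an illustration of the spatial filtering module just described, the sketch below implements a convolution that learns 2 × Nc spatial filters over an Nc × Nt input, with a max-norm constraint applied to each filter after every update. This is a reconstruction from the description above, not the authors’ SSVEPNet code; Nc, Nt, and the norm bound are assumed values.

```python
# Spatial filtering with a per-filter max-norm constraint (sketch).
import torch
import torch.nn as nn

Nc, Nt = 8, 250                                   # assumed channel/sample counts
spatial = nn.Conv2d(1, 2 * Nc, (Nc, 1), bias=False)

def apply_max_norm(conv, max_val=1.0):
    """Renormalize each spatial filter so its L2 norm stays <= max_val."""
    with torch.no_grad():
        w = conv.weight                           # (2*Nc, 1, Nc, 1)
        norms = w.flatten(1).norm(dim=1, keepdim=True).clamp(min=1e-8)
        scale = (max_val / norms).clamp(max=1.0).view(-1, 1, 1, 1)
        w.mul_(scale)

x = torch.randn(4, 1, Nc, Nt)                     # (batch, 1, Nc, Nt)
apply_max_norm(spatial)                           # call after each optimizer step
print(spatial(x).shape)                           # torch.Size([4, 16, 1, 250])
```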

Several main nonlinear activation functions

Nonlinear activation functions are an essential part of deep neural networks. The output of each layer is passed through a function (such as ReLU, Leaky ReLU, or Parametric ReLU) to enhance the network’s representational power and learning ability; it is this activation function that makes the network nonlinear.

ReLU activation function

The ReLU function, also known as the rectified linear unit, is a piecewise linear function that mitigates the vanishing-gradient problem of the sigmoid and tanh functions and is widely used in current deep neural networks. The ReLU function is essentially a ramp function, as shown in formula (5) below.

$$f(x)=\left\{\begin{array}{ll}x, & x\ge 0\\ 0, & x<0\end{array}\right.={\text{max}}\left(0, x\right).$$
(5)

When the input is positive, the derivative is 1, which mitigates the vanishing-gradient problem to some extent and accelerates the convergence of gradient descent. Computation is also much faster: because ReLU involves only a linear relationship, it is quicker to evaluate than sigmoid or tanh. ReLU is also considered biologically plausible, exhibiting unilateral inhibition and a wide excitation boundary (i.e., the level of excitation can be very high).

However, ReLU suffers from the dead ReLU problem: neurons can easily “die” during training. When the input is negative, ReLU outputs zero; forward propagation is unproblematic for inputs greater than 0, but during backpropagation the gradient is exactly zero for negative inputs, so the affected weights stop updating. In addition, like the sigmoid activation function, the output of ReLU is not zero-centered: it is always 0 or positive, which introduces a bias offset into the next layer and reduces the efficiency of gradient descent.

Leaky ReLU activation function

To address the zero gradient of ReLU when x < 0, this article considers Leaky ReLU, a function that attempts to fix the dead ReLU problem. Its expression is given in (6):

$$\text{LeakyReLU}(x)=\left\{\begin{array}{ll}x, & \text{if } x>0\\ \gamma x, & \text{if } x\le 0\end{array}\right.={\text{max}}\left(0,x\right)+\gamma\, {\text{min}}\left(0,x\right).$$
(6)

This function alleviates the dead ReLU problem to some extent, and the leak extends the output range of ReLU to (−∞, +∞). The slope γ is usually set to about 0.01. Although Leaky ReLU retains all the characteristics of the ReLU activation function (efficient computation, fast convergence, and no saturation in the positive region), in practice it has not been conclusively shown that Leaky ReLU is always better than ReLU.

Parametric ReLU activation function

Leaky ReLU extends ReLU to address its problems, and it can be extended further from other angles: instead of multiplying x by a fixed constant, x can be multiplied by a learnable coefficient, which appears to work better than Leaky ReLU. This extension is Parametric ReLU (PReLU), a ReLU function with parameters, given in expression (7):

$${\text{PReLU}}_{i}(x)=\left\{\begin{array}{ll}x, & \text{if } x>0\\ {\gamma }_{i}x, & \text{if } x\le 0\end{array}\right.={\text{max}}\left(0, x\right)+{\gamma }_{i}\,{\text{min}}\left(0, x\right),$$
(7)

where \({\gamma }_{i}\) is a learnable parameter corresponding to the slope in the negative region. It is learned through backpropagation, and different neurons can have different parameters; the subscript i refers to the i-th neuron. This lets each neuron choose the best slope in the negative region, so the function can become ReLU or Leaky ReLU: if \({\gamma }_{i}=0\), PReLU degenerates into ReLU; if \({\gamma }_{i}\) is a small constant, PReLU can be regarded as Leaky ReLU. PReLU also allows a group of neurons to share one parameter.
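The snippet below contrasts the three activations discussed in this subsection using PyTorch’s built-in modules; the slope values and sample inputs are arbitrary choices for illustration.

```python
# Comparing ReLU (Eq. 5), Leaky ReLU (Eq. 6), and PReLU (Eq. 7).
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
relu = nn.ReLU()                    # max(0, x)
leaky = nn.LeakyReLU(0.01)          # gamma fixed at 0.01
prelu = nn.PReLU(init=0.25)         # gamma is learned during training

print(relu(x))    # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(leaky(x))   # tensor([-0.0200, -0.0050, 0.0000, 1.5000])
print(prelu(x))   # tensor([-0.5000, -0.1250, 0.0000, 1.5000]) before training
```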

Materials and methods

Method

e-SSVEPNet model construction

This paper proposes the e-SSVEPNet model to classify the important parameters of SSVEP signals. First, the convolutional layers of the CNN extract features of the SSVEP signal, producing multiple feature vectors. These feature vectors are then fed into a BiLSTM network, which can mine hidden features along the time dimension (Fig. 1). Finally, the result is passed to the output layer through the sequential operations of the three gates of the BiLSTM layer. The structure of the e-SSVEPNet model is shown in Fig. 2. A prediction model is established with the important parameters of the SSVEP process as the prediction target: according to the degree of correlation within the SSVEP data set, parameters highly correlated with the important parameters are selected as model inputs.

Figure 1. Soft saturation nonlinear module diagram.

Figure 2. Proposed e-SSVEPNet model diagram.

Proposed soft saturation nonlinear module

To address the problems in the negative region of functions such as PReLU, this paper proposes a soft saturation nonlinear module with high robustness to noise. When x is less than zero, the module computes its output with an exponential. Compared with ReLU, the soft saturation nonlinear module takes negative values, which push the mean activation toward zero; mean activations close to zero speed up learning because they bring the gradient closer to the natural gradient. The function expression is (8):

$$g(x)=\left\{\begin{array}{ll}x, & x>0\\ \alpha ({e}^{\beta x}-1), & x\le 0\end{array}\right..$$
(8)

The soft saturation nonlinear module has all the advantages of ReLU and does not suffer from the dead ReLU problem. Its mean output is close to 0, so activations are roughly zero-centered; by reducing the bias offset, it brings the normal gradient closer to the unit natural gradient, thereby accelerating learning. For small inputs it saturates to a negative value, reducing the variation and noise propagated forward. Although LReLU and PReLU also take negative values, they are not guaranteed to be robust to noise in the inactive state (that is, when the input is negative).
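A minimal PyTorch implementation of Eq. (8) is sketched below. The default α = 1.0 and β = 0.8 follow the values chosen later from Tables 8 and 9; with β = 1 the function reduces to the standard ELU.

```python
# Soft saturation nonlinear module of Eq. (8).
import torch
import torch.nn as nn

class SoftSaturation(nn.Module):
    """x for x > 0; alpha * (exp(beta * x) - 1) for x <= 0."""
    def __init__(self, alpha=1.0, beta=0.8):
        super().__init__()
        self.alpha, self.beta = alpha, beta

    def forward(self, x):
        return torch.where(x > 0, x, self.alpha * (torch.exp(self.beta * x) - 1.0))

act = SoftSaturation()
print(act(torch.tensor([-3.0, -1.0, 0.0, 2.0])))
# negative inputs saturate toward -alpha; positive inputs pass through unchanged
```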

The e-SSVEPNet model proposed in this paper is an improvement on the SSVEPNet model, adding some convolutional layers and using different activation function modules. Such improvements can help extract richer, more abstract features and improve the model’s understanding of SSVEP signals.

The soft saturation nonlinear module exhibits soft saturation when the input takes small values, which improves robustness to noise and yields better outputs. As shown in Fig. 1, α and β are hyperparameters that control when the negative part of the module saturates.

Specific steps are as follows:

1. Data preprocessing. This work selects data from the SSVEP data set as samples, in which the stimulus feature data are the important parameters. The sample data are first filtered to remove the influence of noise; this transformation improves the numerical stability and convergence speed of the algorithm.

2. Determination of model parameters. The specific values of the model parameters are first set from manual experience and then tuned on the validation set, with the final values chosen from the best validation results. To make better use of the classification result data, the input layer of the first CNN module has Nc × Nt neurons, corresponding to the number of channels and the number of sampling points. The first CNN module contains one convolutional layer with a K × 1 kernel and a stride of 2, followed by the soft saturation nonlinear module, which alleviates the vanishing-gradient problem. The second CNN module contains three convolutional layers, each with a K × 1 kernel and a stride of 2, likewise followed by the soft saturation nonlinear module. The Bi-LSTM module adopts a BiLSTM structure with 5 layers of units and is trained with the Adam optimization algorithm, which converges quickly and is not very sensitive to the choice of initial learning rate. Finally, to counter possible over-fitting of the e-SSVEPNet model, the extracted features are fed into three dense layers; the last layer consists of 9 neurons that produce the classification results. The hyper-parameters of the model on the data set are shown in Table 1: the number of iterations is set to 500, the initial learning rate to 0.018, the learning-rate reduction factor to 0.0003, and so on. The values of α and β were determined from the experimental results in Tables 8 and 9. A schematic sketch of this architecture is given after this list.

3. Running the prediction model. The proposed soft-saturation nonlinear model is used to classify and predict the important parameters of the SSVEP data set, and the accuracy of the results is verified.
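The skeleton below sketches the pipeline of step (2) in PyTorch: two convolutional blocks using the soft saturation module, a 5-layer BiLSTM, and three dense layers ending in 9 output neurons, trained with Adam at the initial learning rate from Table 1. The kernel size K, channel widths, and hidden sizes are illustrative assumptions; only the overall structure follows the text.

```python
# Schematic e-SSVEPNet pipeline (structure per step 2; sizes assumed).
import torch
import torch.nn as nn

class SoftSaturation(nn.Module):
    """Eq. (8): x for x > 0, alpha * (exp(beta * x) - 1) otherwise."""
    def __init__(self, alpha=1.0, beta=0.8):
        super().__init__()
        self.alpha, self.beta = alpha, beta
    def forward(self, x):
        return torch.where(x > 0, x, self.alpha * (torch.exp(self.beta * x) - 1.0))

class ESSVEPNetSketch(nn.Module):
    def __init__(self, Nc=8, Nt=250, K=9, n_classes=9):
        super().__init__()
        act = SoftSaturation()
        self.cnn = nn.Sequential(
            # first CNN module: one K x 1 convolution, stride 2
            nn.Conv1d(Nc, 2 * Nc, K, stride=2, padding=K // 2), act,
            # second CNN module: three K x 1 convolutions, stride 2
            nn.Conv1d(2 * Nc, 2 * Nc, K, stride=2, padding=K // 2), act,
            nn.Conv1d(2 * Nc, 2 * Nc, K, stride=2, padding=K // 2), act,
            nn.Conv1d(2 * Nc, 2 * Nc, K, stride=2, padding=K // 2), act,
        )
        self.bilstm = nn.LSTM(2 * Nc, 32, num_layers=5,
                              bidirectional=True, batch_first=True)
        self.dense = nn.Sequential(               # three dense layers
            nn.Linear(64, 64), act,
            nn.Linear(64, 32), act,
            nn.Linear(32, n_classes),             # last layer: 9 neurons
        )

    def forward(self, x):                         # x: (batch, Nc, Nt)
        feats = self.cnn(x).permute(0, 2, 1)      # (batch, time, 2 * Nc)
        out, _ = self.bilstm(feats)
        return self.dense(out[:, -1])             # classify from last time step

model = ESSVEPNetSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=0.018)  # Table 1 initial LR
print(model(torch.randn(4, 8, 250)).shape)        # torch.Size([4, 9])
```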

Table 1 Hyper-parameters of the model on the data set.

Hardware environment

The experimental environment configuration is shown in Table 2.

Table 2 Experimental environment configuration.

Dataset

This paper uses an open, publicly available dataset20.

Test results and analysis

Experimental results and analysis of different models

This paper uses e-SSVEPNet, an efficient SSVEP signal recognition deep learning network with a soft saturation nonlinear module, to evaluate the impact of different data lengths and classification methods on classification accuracy. To assess the effectiveness of the proposed model, a traditional method (TRCA) and two deep learning methods (EEGNet and SSVEPNet) were used as baselines; e-SSVEPNet and all baseline methods were evaluated in the same experiments.

To verify the validity of the model, we adopt the intra-subject classification scenario. In this scenario, the raw data of each subject are divided into training and test sets at ratios of 8:2, 5:5, and 2:8.

1. 8:2: When the dataset is large, it is common to use most of the data for training (80%) and a small amount for testing (20%). The benefit is that the model can learn from more data during training, which helps improve its accuracy and generalization ability.

2. 5:5: When the amount of data is not very large but the generalization ability of the model must be fully evaluated, an equal ratio of training to test data is reasonable. In this case, the amounts of data in the training and testing phases are balanced.

3. 2:8: When labeled data are scarce or unevenly distributed (e.g., class imbalance), training on only a small portion of the data and testing on the majority evaluates how well the model generalizes from limited training data.

4. In this paper, we also use cross-validation, dividing the dataset into multiple folds that each contain training and test data; this ensures the division follows a specific ratio and further evaluates the stability and generalization ability of the model (a minimal sketch of these splits follows this list). Tables 3 and 4 present the classification results of the dataset under different conditions.
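The sketch below illustrates the three split ratios and the fold-based cross-validation described above using scikit-learn. The array shapes, class count, and random seeds are assumptions for illustration, not properties of the dataset used in this paper.

```python
# Train/test splits at 8:2, 5:5, 2:8, plus k-fold cross-validation.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.random.randn(180, 8, 250)      # 180 trials, 8 channels, 250 samples
y = np.repeat(np.arange(9), 20)       # 9 classes, 20 trials each

for test_frac in (0.2, 0.5, 0.8):     # the 8:2, 5:5, and 2:8 scenarios
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_frac, stratify=y, random_state=0)
    print(f"train:test = {len(X_tr)}:{len(X_te)}")

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # cross-validation
for fold, (tr_idx, te_idx) in enumerate(kf.split(X)):
    pass                              # train on X[tr_idx], evaluate on X[te_idx]
```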

Table 3 Classification results of SSVEP for intra-subject experiments when the time window length is 1 s.
Table 4 Classification results of SSVEP for intra-subject experiments when the time window length is 0.5 s.

According to Table 3, for the intra-subject experiments with a time window length of 1 s, the classification results show that the e-SSVEPNet model achieves better accuracy than the baseline models.

According to Table 4, when the time window length is shortened to 0.5 s, the accuracy of all models drops significantly, but the e-SSVEPNet model remains the most accurate.

Results and analysis of different nonlinear modules

To evaluate the effectiveness of the soft saturation nonlinear module in the proposed model, PReLU, ReLU, and GELU were used as baseline methods for comparison. The soft saturation module and all baseline activation functions were evaluated within the e-SSVEPNet model.

According to Table 5, when the data set is divided 2:8, the comparison of nonlinear modules on the intra-subject classification task shows that the soft saturation nonlinear module in the e-SSVEPNet model achieves better accuracy.

Table 5 Classification results of SSVEP’s intra-subject experiment when the data set is divided into 2:8.

According to Table 6, when the data set is divided 5:5, the comparison of nonlinear modules on the intra-subject classification task shows that the soft saturation nonlinear module in the e-SSVEPNet model achieves better accuracy.

Table 6 Classification results of SSVEP’s intra-subject experiment when the data set is divided into 5:5.

According to Table 7, when the data set is divided 8:2, the comparison of nonlinear modules on the intra-subject classification task shows that the soft saturation nonlinear module in the e-SSVEPNet model achieves better accuracy.

Table 7 Classification results of SSVEP’s intra-subject experiment when the data set is divided into 8:2.

Results and analysis of different β

To evaluate the effect of β in the proposed model, β was set to 0.6, 0.8, 1.0, and 1.2 for comparison, and the soft saturation nonlinear module in the e-SSVEPNet model was evaluated experimentally for each value.

According to Table 8, the accuracy of the proposed e-SSVEPNet model changes little across different β values, but it is better when β is between 0.6 and 0.8, so the model adopts β = 0.8 (Supplementary Information).

Table 8 Classification results of SSVEP for intra-subject experiments when different β.

Results and analysis of different α

To evaluate the effect of α in the proposed model, α was set to 0.6, 0.8, 1.0, and 1.2 for comparison, and the soft saturation nonlinear module in the e-SSVEPNet model was evaluated experimentally for each value.

According to Table 9, the accuracy of the proposed e-SSVEPNet model changes little across different α values, so the model adopts α = 1.0.

Table 9 Classification results of SSVEP for intra-subject experiments when different α.

Discussion

The SSVEP-BCI system has the advantages of stable performance, a high information transmission rate, low training requirements, a large instruction set, and small individual differences. However, SSVEP suffers from problems such as insufficient data volume and short time windows. This paper therefore studied how to improve the decoding performance of SSVEP-based brain–computer signals and examined the impact of the proposed module on classification results. Experimental results show that, compared with traditional methods, deep learning methods, and different nonlinear modules, the soft saturation nonlinear model proposed in this paper significantly improves classification accuracy, especially at short data lengths. However, although the program runs quickly and the SSVEP classification accuracy is high, these results were obtained in a quiet laboratory environment with little interference; real environments contain strong noise, so the two settings differ significantly. Moreover, owing to the strong noise and non-stationarity of SSVEP data, the SSVEP data collected from the same BCI user at the same location but in different periods may differ considerably. Therefore, there is still a long way to go in the development of BCI.