Fast, accurate, and interpretable decoding of electrocorticographic signals using dynamic mode decomposition

Dynamic mode (DM) decomposition decomposes spatiotemporal signals into basic oscillatory components (DMs). DMs can improve the accuracy of neural decoding when used with the nonlinear Grassmann kernel, compared to conventional power features. However, such kernel-based machine learning algorithms have three limitations: large computational time preventing real-time application, incompatibility with non-kernel algorithms, and low interpretability. Here, we propose a mapping function corresponding to the Grassmann kernel that explicitly transforms DMs into spatial DM (sDM) features, which can be used in any machine learning algorithm. Using electrocorticographic signals recorded during various movement and visual perception tasks, the sDM features were shown to improve the decoding accuracy and computational time compared to conventional methods. Furthermore, the components of the sDM features informative for decoding showed similar characteristics to the high-γ power of the signals, but with higher trial-to-trial reproducibility. The proposed sDM features enable fast, accurate, and interpretable neural decoding.


Introduction
Fast and accurate characterization of the spatiotemporal dynamics of neural signals is crucial to decode neural signals for brain-computer interfaces (BCIs), which can be applied to allow severely paralyzed patients to restore their lost communication and mobility functions. Dynamic mode decomposition (DMD) is a numerical method to obtain representations called Koopman modes [1][2][3][4][5][6], each of which corresponds to an oscillation of a spatial pattern with a fixed frequency and decay/growth rate. For a multidimensional ($P$-dimensional) time series $x(t) \in \mathbb{R}^{P}$ that can be approximated as evolving over time $\Delta t$, as shown in Equation (1), DMD approximately decomposes $x(t)$ as a superposition of $K$ oscillatory components in a complex space, as shown in Equation (2), by obtaining $\varphi_k$ and $\omega_k$ based on singular value decomposition (SVD) applied for $A$ in Equation (1) (see Methods).

$$x(t + \Delta t) \approx A\,x(t) \tag{1}$$

$$x(t) \approx \sum_{k=1}^{K} \varphi_k \exp(\omega_k t)\, b_k, \qquad \omega_k = \gamma_k + 2\pi i f_k \tag{2}$$

Here, each of the $K$ oscillatory components is represented by a spatial pattern $\varphi_k$, a $P$-dimensional complex vector representing the dynamic mode (DM), and the following parameters of the $k$-th DM: $f_k$, the frequency of the DM; $\gamma_k$, the decay/growth rate of the DM; and $b_k$, a scalar that determines the initial phase of the DM.
Figure 1 shows examples of DMs. Spatiotemporal signals $x(t)$ were generated as the sum of two oscillations with different spatial distributions (Fig. 1a).
When SVD was applied for $A$ in Equation (1), four singular values were obtained as nonzero values (Fig. 1b). By using the four SVD components corresponding to these nonzero singular values, four oscillatory components (DMD components) were acquired (Fig. 1c). By adding the products of each DM and its time dynamics, an approximation of $x(t)$ can be obtained (Fig. 1d). Notably, some of the DMD components form complex conjugate pairs in their modes and time dynamics (e.g., DMD components 1 and 2), yielding real summed values. Because the original observed spatiotemporal signals $x(t)$ are strictly real, DMD always decomposes the signals into real DMD components or pairs of complex conjugate DMD components, ensuring that the reconstructed $x(t)$ is in real space [7].
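As a short worked step, the cancellation of imaginary parts within a conjugate pair can be written out explicitly. The notation here is an assumption made for illustration: $\varphi_{k,p}$ is the $p$-th element of a DM, $b_k$ its initial-condition scalar, and $\omega_k = \gamma_k + 2\pi i f_k$ its complex exponent.

```latex
% Polar form of the p-th element of mode k times its initial condition (assumed notation):
%   \varphi_{k,p}\, b_k = a_{k,p}\, e^{i\theta_{k,p}}, \quad a_{k,p} \ge 0 .
\begin{align}
\varphi_{k,p}\, e^{\omega_k t}\, b_k
  + \overline{\varphi_{k,p}}\, e^{\overline{\omega_k} t}\, \overline{b_k}
  &= 2\,\mathrm{Re}\!\left( a_{k,p}\, e^{i\theta_{k,p}}\, e^{(\gamma_k + 2\pi i f_k) t} \right) \\
  &= 2\, a_{k,p}\, e^{\gamma_k t} \cos\!\left( 2\pi f_k t + \theta_{k,p} \right).
\end{align}
```

Each conjugate pair therefore contributes one real, exponentially modulated cosine per channel, with a channel-specific amplitude and phase.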
The Koopman modes extracted by the DMD process (DMs, $\varphi_k$) capture characteristic spatiotemporal patterns in the dynamics of neurophysiological signals; thus, the DMs are useful for neural decoding. Previous studies have demonstrated that DMs characterize spindles recorded by electrocorticographic (ECoG) signals [8] and different traits of functional magnetic resonance imaging (fMRI) scans [9]. Our previous study also demonstrated that DMs were informative for classifying the ECoG signals corresponding to some hand movements [10]. In that study, DMD was applied to the ECoG signals for each trial and the resultant matrix of DMs ($\Phi = [\varphi_1 \ldots \varphi_K]$) was compared with the matrix of another trial using the projection kernel [11][12][13], one of the Grassmann kernel functions, to quantify the similarity among all trials. It should be emphasized that a direct comparison between the matrix of DMs for the $i$-th trial ($\Phi_i$) and that of the $j$-th trial ($\Phi_j$) is difficult, because each DM $\varphi_k$ has a different frequency $f_k$. Hence, the projection kernel was introduced in the previous study to define the similarity between the matrices for each trial, $k(\Phi_i, \Phi_j)$. A support vector machine (SVM) was then applied to the acquired Gram matrix of $k(\Phi_i, \Phi_j)$ to infer one of three types of movements (Fig. 2a). The SVM model successfully classified the ECoG signals with an accuracy superior to that of a model using power features estimated based on the same ECoG signals. Interestingly, the accuracy of this DM-based decoding approach significantly decreased when the method was applied to ECoG signals with shuffled phases.
By the kernel trick, a kernel function $k(\cdot,\cdot)$ corresponds to a nonlinear feature mapping function $\psi(\cdot)$ satisfying $k(x, y) = \psi(x)^{\top}\psi(y)$ for all $x, y \in \mathcal{X}$, where $\mathcal{X}$ is the input space of the kernel function and $\psi(x)^{\top}$ denotes the transpose of $\psi(x)$. In this study, we obtained such a nonlinear feature mapping function $\psi(\cdot)$ for the projection kernel to acquire vectorized features, which can be used in linear machine learning algorithms. The effectiveness of our proposed features was empirically evaluated using two types of ECoG signals: those recorded during hand movements and those recorded during visual stimuli. Furthermore, the sDM features were compared with the power features to characterize their properties.

Nonlinear feature mapping equivalent to the projection kernel for DMD
The nonlinear feature mapping $\psi(\cdot)$ corresponding to the projection kernel and L2-regularized SVM was obtained as follows (Fig. 1f).
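The general idea behind such a mapping can be sketched in a few lines of algebra. The sketch below assumes that the projection kernel takes its common form $k(\Phi_i, \Phi_j) = \lVert \Phi_i^{*}\Phi_j \rVert_F^{2}$ and writes $S_i = \Phi_i\Phi_i^{*}$ for a candidate feature matrix; both are illustrative assumptions rather than the exact definitions used in this paper.

```latex
\begin{align}
k(\Phi_i, \Phi_j)
  &= \lVert \Phi_i^{*} \Phi_j \rVert_F^{2}
   = \operatorname{tr}\!\left( \Phi_j^{*} \Phi_i \Phi_i^{*} \Phi_j \right) \\
  &= \operatorname{tr}\!\left( \Phi_i \Phi_i^{*}\, \Phi_j \Phi_j^{*} \right)
   = \operatorname{tr}\!\left( S_i S_j \right) \\
  &= \sum_{p=1}^{P} \sum_{q=1}^{P} (S_i)_{p,q}\, \overline{(S_j)_{p,q}}
   = \operatorname{vec}(S_j)^{*} \operatorname{vec}(S_i).
\end{align}
```

The second line uses the cyclic property of the trace, and the third uses the fact that $S_j$ is Hermitian. Under these assumptions, the kernel value equals an ordinary inner product between the vectorized $P \times P$ matrices, which is what allows a linear algorithm to replace the kernel-based one. When the DMs occur in complex conjugate pairs, $S_i$ is real and symmetric.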

sDM features of ECoG signals during hand movements
The characteristics of the sDM features were evaluated by using the same dataset of ECoG signals from our previous study [10]. This dataset is composed of ECoG signals that were recorded at 1 kHz from subdural electrodes placed on the sensorimotor cortex while 11 patients performed three types of movements with their hand contralateral to the implanted electrode (ECoG dataset of arm motor task; Supplementary Table 1).
Following the method used in our previous study, the DMs were calculated based on the 500-ms ECoG signals after the cue to start the movements, with truncation of the SVD components; the truncation was performed based on the singular values so that the SVD components with the largest singular values were included in the DM calculation.
Because the number of included SVD components, referred to as the rank in this study, largely affects the decoding accuracy, it is important to optimize the rank parameter.
However, here, the rank was fixed to 300, which was the same value used in our previous study, to show the representative sDM features and to compare the classification accuracy and computational time. With the exception of this analysis, the rank parameter was optimized simultaneously with the parameter to train the decoding model via nested cross-validation for all other analyses in this paper (see Methods).
Figure 2a shows a representative example of the Gram matrix acquired by the projection kernel applied among the DMs from 120 trials, in which patient 1 performed grasping, opening, or pinching movements with his left hand. The Gram matrix showed that the kernel values between trials were clustered according to the movement type. For the same DMs from 120 trials, the sDM features ($S$: $s_{p,q} \in \mathbb{R}$; $p, q = 1, \ldots, P$) were acquired for each trial, where $P$ is the number of analyzed channels for the patient. Fig. 2b shows an example of the sDM features averaged for each movement type. Because the sDM features are always symmetric, the independent components are on the diagonal and one side of the off-diagonal. We call these independent diagonal and off-diagonal components of the sDM features spatial node DM (snDM) features ($s_{p,p}$; $p = 1, \ldots, P$) and spatial edge DM (seDM) features ($s_{p,q}$; $p, q = 1, \ldots, P$; $p < q$), respectively. When one-way analysis of variance (ANOVA) was applied for each component in the matrix for the three types of movements, the F values of the ANOVA were higher for the diagonal components of the sDM features (snDM features) than for the off-diagonal components of the sDM features (seDM features; Fig. 2c). Thus, the snDM features had higher selectivity than the seDM features.
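The split into node and edge components can be illustrated with a minimal sketch. The function name `split_sdm` and the toy matrix are hypothetical; the only property taken from the text is that the sDM features form a symmetric $P \times P$ matrix whose diagonal and upper off-diagonal entries are the independent components.

```python
import numpy as np

def split_sdm(sdm):
    """Split a symmetric P x P matrix into diagonal (node) and upper off-diagonal (edge) entries."""
    sdm = np.asarray(sdm)
    if sdm.ndim != 2 or sdm.shape[0] != sdm.shape[1]:
        raise ValueError("expected a square matrix")
    if not np.allclose(sdm, sdm.T):
        raise ValueError("expected a symmetric matrix")
    rows, cols = np.triu_indices(sdm.shape[0], k=1)  # index pairs with p < q
    return np.diag(sdm).copy(), sdm[rows, cols]

# Toy symmetric matrix standing in for the sDM features of one trial (P = 4).
rng = np.random.default_rng(0)
m = rng.standard_normal((4, 4))
toy_sdm = m @ m.T
sn_dm, se_dm = split_sdm(toy_sdm)
```

For $P$ channels this yields $P$ node values and $P(P-1)/2$ edge values, so the two together hold $P(P+1)/2$ independent numbers.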
To characterize the neurophysiological properties of the snDM features, the snDM features were compared with the power spectral density (PSD) features of the same ECoG signals. Because each component of the snDM features corresponds to a channel, the snDM features of all trials were concatenated to calculate Pearson's correlation coefficient with the PSD of each frequency, which was likewise concatenated across all channels and trials. The correlation coefficients showed a peak above 80 Hz, which interestingly included the high-γ band (80-150 Hz), known as the most informative frequency band for movement classification [18]. Hence, the results showed that the snDM features were most similar to the high-γ power features among the frequency band powers in the ECoG signals. On the other hand, when the snDM features were compared among different trials of the same movement, the reproducibility of the snDM features was significantly higher than that of the power features, including the high-γ power features (Fig. 2e; p < 1.0 × 10^-21, F(4,50) = 94.95, one-way ANOVA; snDM features vs. other features, p < 6.4 × 10^-8, post hoc Tukey-Kramer test; for reproducibility during visual perception, see Supplementary Fig. 1). These results suggested that the snDM features capture cortical activities similar to those represented by the high-γ power features, but with higher reproducibility.

Neural decoding using sDM features of ECoG signals during hand movements
To assess the feasibility of using sDM features for neural decoding, we compared the computational times and accuracies in classifying the movement types by SVM using the Gram matrix of the DMs and using the sDM features. First, the computational times were compared. The training time of the decoder using the Gram matrix increased approximately quadratically with the number of training samples (~O(n^1.99), where n is the number of samples per class; shown as a green line in Fig. 3a). In contrast, when the sDM features were used with the linear L2-SVM, the computational complexity was reduced to ~O(n^1.11) (shown as a blue line in Fig. 3a). Similarly, the prediction time for a new sample increased with the number of training samples (~O(n^0.75)) for the Gram matrix, while the prediction time using the sDM features and linear L2-SVM was much shorter and increased much more slowly with the number of training samples (~O(n^0.03)). In addition, the decoding accuracies were exactly the same between the method using the Gram matrix and the method using the sDM features (73.80±7.04% (mean ± 95% CI) for both, Fig. 3b). Because the matrix of the sDM features is symmetric, the lower triangular part of the sDM features was redundant for the classification analysis. By performing classification using only the diagonal and upper triangular parts of the sDM features (snDM features and seDM features, respectively), the classification accuracy was slightly improved to 76.75±6.67% (labeled as snDM + seDM in Fig. 3a). Thus, use of the sDM features increased the training and testing speeds for the neural decoding process without decreasing the decoding accuracy.
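An empirical exponent of the form ~O(n^a) is typically estimated as the slope of a straight-line fit in log-log coordinates. The sketch below shows that calculation; the function name `scaling_exponent` and the timing values are synthetic placeholders invented for illustration, not measurements from this study.

```python
import numpy as np

def scaling_exponent(n_samples, times):
    """Slope of log(time) against log(n), i.e. the exponent a in time ~ c * n**a."""
    slope, _intercept = np.polyfit(np.log(n_samples), np.log(times), 1)
    return slope

# Synthetic timings (hypothetical): one quadratic and one linear cost curve.
n = np.array([10, 20, 40, 80, 160])
quadratic_times = 3e-4 * n ** 2.0
linear_times = 5e-3 * n ** 1.0

a_quadratic = scaling_exponent(n, quadratic_times)
a_linear = scaling_exponent(n, linear_times)
```

An exponent near 2 indicates polynomial (quadratic) growth, which is what a fitted value such as 1.99 describes.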
An SVM with L1 regularization (L1-SVM) was then applied to the combined snDM and seDM features.For comparison, we also applied the L1-SVM to the high-γ power features.It is worth mentioning that the L1-SVM cannot be applied to the Gram matrix.
The classification accuracy of the combined snDM and seDM features with the L1-SVM (80.45±7.52%) was significantly higher than that of the high-γ power features with the L1-SVM (71.40±8.14%; p < 0.01, two-tailed paired t test, t(10) = 4.6064; Fig. 3b) and that of the Gram matrix of the DMs with the L2-SVM (p < 0.01, t(10) = 4.6565). In addition, the classification accuracy using the snDM features (81.33±6.71%) was similar to the accuracy using the combined snDM and seDM features (Fig. 3b). In contrast, the accuracy using the seDM features was lower (58.67±6.55%). Notably, the use of the snDM features with the linear L1-SVM reduced the training time to ~O(n^0.97) and the prediction time to ~O(n^0.001) compared to the use of the sDM features with the L2-SVM (Fig. 3a). Therefore, these results demonstrated that the sDM features, especially the snDM features, with the L1-SVM model improved the classification accuracy and computational time for the neural decoding of ECoG signals recorded during hand movements.
As previously mentioned, the sDM features were composed of DMs from the full frequency range; hence, the sDM features were unlikely to capture differences in the frequencies of the DMs. To assess the difference in the frequencies of the modes for different movement types, the classification accuracies using the frequency-filtered snDM features were calculated with the L1-SVM model for conventional frequency bands (0-1, 1-4, 4-8, 8-13, 13-30, 30-80, 80-150, and 150-500 Hz). The classification accuracies significantly differed among the frequency bands (p = 6.9 × 10^-15, F(7,80) = 19.34, one-way ANOVA); interestingly, the classification accuracies for the frequency bands of 80-150 Hz and 150-500 Hz, which are known to be informative for movement decoding using power features [18], were significantly higher than those of the other frequency bands except 30-80 Hz (p < 0.01, post hoc Tukey-Kramer test of one-way ANOVA). However, the classification accuracy using the combined frequency-filtered snDM features from all bands was similar to that using the (nonfiltered) snDM features (combined frequency-filtered snDM features, 81.12±5.54%; snDM features, 80.88±6.34%). Notably, the classification accuracy did not improve when frequency-filtered seDM features were included (combined frequency-filtered seDM features, 65.17±8.87%; combined frequency-filtered snDM and seDM features, 79.39±6.09%).
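One way to picture frequency filtering of the DMs is sketched below. It assumes that each DM's frequency is read from the angle of its discrete-time eigenvalue and that the node features are the diagonal of $\Phi\Phi^{*}$ restricted to the selected modes; both are assumptions made for illustration, and the names `mode_frequencies` and `band_limited_sndm` are hypothetical.

```python
import numpy as np

def mode_frequencies(eigvals, dt):
    """Oscillation frequency (Hz) of each mode from its discrete-time eigenvalue."""
    return np.abs(np.angle(eigvals)) / (2.0 * np.pi * dt)

def band_limited_sndm(modes, eigvals, dt, band):
    """Diagonal of Phi_b Phi_b^* using only the modes whose frequency lies in [lo, hi)."""
    lo, hi = band
    freqs = mode_frequencies(eigvals, dt)
    keep = (freqs >= lo) & (freqs < hi)
    return np.sum(np.abs(modes[:, keep]) ** 2, axis=1)

# Toy example: 3 channels, one 10 Hz and one 100 Hz conjugate pair, sampled at 1 kHz.
dt = 1e-3
rng = np.random.default_rng(1)
half = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
modes = np.hstack([half, half.conj()])
base = np.exp(2j * np.pi * np.array([10.0, 100.0]) * dt)
eigvals = np.concatenate([base, base.conj()])

bands = [(0, 1), (1, 4), (4, 8), (8, 13), (13, 30), (30, 80), (80, 150), (150, 500)]
per_band = [band_limited_sndm(modes, eigvals, dt, b) for b in bands]
high_gamma = per_band[6]
```

Under these assumptions, the band-limited node features of disjoint bands add up to the nonfiltered node features, which is one way to see why filtering redistributes rather than adds information.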
These results suggested that (1) the information about the movement type included in each frequency band was not complementary, and (2) the differences in the movement types did not largely affect the frequencies of the DMs.

Comparison of neural decoding accuracy for different types of tasks and decoding using ECoG signals
The  (Miller, 2019).
Neural decoding by either classification or regression was performed for each of these datasets. As summarized in Fig. 4, the proposed method successfully decoded ECoG signals with an accuracy higher than or comparable to that reported in previous works regardless of task or decoding type (due to the computational time, L2 regularization was used for regression throughout this study; Supplementary Fig. 2 and Supplementary Tables 3-4). Consistent with the results based on the ECoG dataset of the arm motor task, the snDM features enabled neural decoding with higher accuracy than the seDM features for all datasets (Fig. 4, black bars), and use of the frequency-filtered snDM and seDM features did not show considerable improvement in the decoding accuracy (Fig.

Discussion
The high-γ activities in ECoG signals have been determined to be the most informative frequency band for neural decoding [18][19][20][21][22][23], reflecting the spiking activities of neurons [24]; hence, the use of high-γ powers has been a standard method to extract useful information from cortical activity. On the other hand, the (nonfiltered) sDM features proposed in this study are constructed based on DMs regardless of their widely distributed frequencies. Interestingly, although the high-γ powers and the sDM features are produced in different ways, the correlation coefficient between the high-γ powers and the snDM features suggests that they extract similar information. Moreover, the snDM features have smaller variance within the same task than the high-γ powers, which likely contributes to the improvement in the decoding accuracy. It is worth noting that the use of the frequency-filtered snDM features did not lead to a considerable improvement in the decoding accuracy of the ECoG signals. Taken together, the results suggest that the motor and visual information in the ECoG signals is not encoded in the frequencies of the DMs, although the high-γ power features are highly informative among the various frequency bands. The sDM features are novel electrophysiological features that stably extract neural information without explicitly selecting the frequency band for neural decoding.

Conclusions
Our proposed sDM features were demonstrated to be effective for the fast and accurate decoding of ECoG signals in various tasks. Furthermore, snDM features without frequency band selection appear to be the most effective features for decoding ECoG signals.

Datasets
This study employed two in-house datasets that were used in our previous reports

Subjects
The ECoG dataset of the arm motor task consisted of eleven subjects (7 males; age range, 13-66 years), with subdural electrodes placed on their sensorimotor cortices. All subjects were implanted with intracranial electrodes prior to the study for the purpose of treating their drug-resistant epilepsy.

Task procedure
The subjects were instructed to perform three types of movement with their upper limb contralateral to the sensorimotor cortex where the subdural electrodes were implanted.
Three types of movement were selected among grasping, pinching, hand opening, thumb flexion, and elbow flexion [18] according to their performance ability and comfort. For each trial, three visual and auditory cues were provided at intervals of 1 s; at the timing of the last cue, the subjects performed one of the three types of movement once and returned to the resting position, relaxing their hands or elbows with slightly flexed joints. For the types of performed movements and number of trials for each movement type, see Supplementary Table 1.

Experimental settings and ECoG recordings
The subjects were seated on chairs to perform the movement tasks. A computer screen was placed in front of the subjects to show the movement cue, which was also delivered auditorily. The presentation of the cues was controlled using ViSaGe (Cambridge Research Systems, Rochester, UK). During the experiment, ECoG signals were recorded at 1 kHz by EEG-1200 (Nihon Kohden, Tokyo, Japan), referenced to the average of two intracranial electrodes. Digital pulses denoting the timing of the cue were recorded synchronously with the ECoG signals.

Signal preprocessing
Noisy channels were identified by visual inspection and rejected from subsequent analyses. Channels located outside of the motor-related regions were also rejected (for the number of analyzed channels, see Supplementary Table 1). Common-average referencing was performed among the noise-free channels. For the classification analysis, ECoG signals from 0 to 500 ms with respect to the movement cue were used.

Division of the dataset for classification
This dataset was evaluated by classification analysis with nested cross-validation for each patient. To accurately estimate the classification accuracy, 10-fold outer cross-validation was repeated 10 times by changing the division of the dataset to calculate the average of the 10 classification accuracies. For each outer fold, inner cross-validation was also repeated 10 times by changing the division of the samples to accurately estimate the best decoder parameters. The imbalance between the number of samples for each label (movement type) was minimized for each division.
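A repeated, label-balanced division of this kind can be sketched as follows. This is a minimal illustration using only the standard library; the function name `stratified_folds`, the round-robin assignment, and the toy label counts are assumptions, not the division procedure used in the study.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, seed):
    """Assign sample indices to folds, spreading every class as evenly as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across the folds.
        for i, idx in enumerate(members):
            folds[(offset + i) % n_folds].append(idx)
        offset += len(members)
    return folds

# Toy labels (hypothetical counts): 40 trials for each of three movement types.
labels = ["grasp"] * 40 + ["open"] * 40 + ["pinch"] * 40
# Ten repetitions of a 10-fold division, each with a different shuffle.
repeats = [stratified_folds(labels, n_folds=10, seed=s) for s in range(10)]
```

Changing the seed changes which trials share a fold while keeping the per-fold label counts as even as the class sizes allow.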

ECoG dataset of the video perception task (in-house dataset from Fukuma et al., 2022 [20])

Subjects
The ECoG dataset for the video perception task consisted of 17 subjects (12 males; age range 11-51 years), with subdural electrodes placed around their visual and temporal cortices for the treatment of epilepsy. One subject participated twice within 2 years due to a second surgery (E07 and E11).

Construction of the semantic vectors: Conversion to the semantic vectors
For each 1-s scene in the training videos and validation video, the semantic meaning of the scene was represented as a semantic vector based on the scene annotations and the trained skip-gram model. A still image was extracted from each 1-s scene, resulting in 3,600 images for the six 10-min training videos and 150 images for the first 2.5 min of the validation video. Each extracted image was manually annotated by five annotators with descriptive sentences containing 50 or more Japanese characters. Using the same preprocessing method performed with the Japanese Wikipedia dump data, lemmatized words were extracted from the annotations and filtered by discarding words that did not exist in the text corpus of the trained skip-gram model. The remaining words were then converted to 1,000-dimensional vectors using the trained skip-gram model, which were first averaged within each annotation and then averaged among the five annotators to create a 1,000-dimensional semantic vector for each scene.
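The two-stage averaging described above can be sketched as follows. The function name `scene_vector`, the dictionary-based embedding lookup, and the 3-dimensional toy vectors are hypothetical stand-ins for the trained skip-gram model and its 1,000-dimensional vectors.

```python
import numpy as np

def scene_vector(annotations, embedding):
    """Average word vectors within each annotation, then average across annotations.

    annotations: list of word lists (one list per annotator).
    embedding: dict mapping a word to its vector; out-of-vocabulary words are discarded.
    """
    per_annotation = []
    for words in annotations:
        vectors = [embedding[w] for w in words if w in embedding]
        if vectors:
            per_annotation.append(np.mean(vectors, axis=0))
    if not per_annotation:
        raise ValueError("no annotation contains an in-vocabulary word")
    return np.mean(per_annotation, axis=0)

# Toy vocabulary and two toy annotations (hypothetical values).
toy_embedding = {
    "dog": np.array([1.0, 0.0, 0.0]),
    "run": np.array([0.0, 1.0, 0.0]),
}
toy_annotations = [["dog", "run", "unknown_word"], ["dog"]]
vec = scene_vector(toy_annotations, toy_embedding)
```

Averaging within an annotation first gives each annotator equal weight regardless of how many words they wrote.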

Experimental settings and ECoG recordings
The subjects either sat on beds in their hospital rooms or were seated on chairs to perform the experimental tasks. A computer screen was placed in front of the subjects to show the video stimuli. A pair of speakers was also placed near the subjects to play

Signal preprocessing
Noisy channels identified by visual inspection were first excluded from the following analyses (for the number of analyzed channels, see Supplementary Table 2). The ECoG signals were then filtered with a lowpass filter (8th-order Chebyshev Type I infinite impulse response filter) and downsampled to 1 kHz. Finally, the ECoG signals were rereferenced by common averaging among the noise-free channels. For the regression analysis, the ECoG signals corresponding to the 1-s scenes were used.

Division of the dataset for regression
To enable direct comparison with our previous study, regression was performed with nested cross-validation using the same division of the dataset as in our previous study [20], in which the samples were divided into 10 groups so that the scenes from the same video source were kept in the same group, and the imbalance in the number of trials among the groups was minimized. Hence, nonrepeated 10-fold outer cross-validation with nonrepeated 9-fold inner cross-validation was performed for the regression.

Dataset overview
To acquire the dataset, the patients were implanted with intracranial electrodes around the fronto-parietal area. The patients performed repetitive movements with their hand (synchronous flexion and extension of all fingers) or tongue (sticking the tongue in and out from their mouth) at their own pace (~1-2 Hz) while movement cues were provided for 2 or 3 seconds. Each movement type was repeated 15-45 times. ECoG signals were recorded at 1 kHz. Nineteen patients included in the dataset were used for the analysis in this study.

Signal preprocessing
For each patient, ECoG signals were rereferenced by common averaging among all channels. For the classification analysis, ECoG signals from 0 to 2 seconds with respect to the start of the movement cues were obtained.

Division of the dataset for classification
For this dataset, the classification analysis was performed for each subject by nested cross-validation. To accurately estimate the classification accuracy, 10-fold outer cross-validation was repeated 10 times by changing the division of the dataset, and the average of the 10 classification accuracies was calculated. In addition, for each outer fold, 10-fold inner cross-validation was also repeated 10 times by changing the division of the samples to better estimate the decoding parameter. The division was performed so that the imbalance between the numbers of samples for each label (hand or tongue movement) was minimal.

ECoG dataset of the image perception task ("faces_basic" experiment in the open dataset from Miller, 2019 [25])

Dataset overview
Patients implanted with intracranial electrodes in the inferotemporal subdural space participated in a visual perception task in which face or house images were presented.
During the recording of the ECoG signals at 1 kHz, the patients were presented with luminance- and contrast-matched grayscale face and house images for 400 ms in random order, with an interstimulus interval of 400 ms. In each of the three repeated runs, 50 different face or house images were presented. All fourteen patients in the dataset were included in the analysis for this study.

Signal preprocessing
For each patient, rereferencing of ECoG signals was performed by common averaging among all channels. For the classification analysis, ECoG signals from 0 to 400 ms with respect to the image presentation were used.

Division of the dataset for classification
Classification analyses with this dataset were performed with a within-patient approach by nested cross-validation. To accurately estimate the classification accuracy, 10-fold outer cross-validation was repeated 10 times by changing the division of the dataset to average the classification accuracies among the repetitions. The decoding parameters of each outer fold were estimated by 10-fold inner cross-validation, which was also repeated 10 times by changing the division of the samples. During the division of the dataset, the number of samples for each label (face or house images) was balanced in each group.

cropped with a 300-ms time window; here, the time window was placed at 84 ± 150 ms with respect to the selected samples because the original study reported that the best Pearson's correlation coefficient was obtained with an 84 ms offset [28].

Division of the dataset for regression
To prevent overestimation of the accuracy, the samples in the dataset for each patient were divided into 10 groups by splitting the time sequence of the samples. Nested cross-validation for the regression analysis was performed with this division for both the inner and outer folds; hence, nonrepeated 10-fold outer cross-validation was performed using nonrepeated 9-fold inner cross-validation.

DMD
Assuming that the spatiotemporal signals originate from one dynamic system, the system can be described as follows:

$$\frac{dx(t)}{dt} = F\big(x(t), t; \mu\big) \tag{10}$$

where $x(t) \in \mathbb{R}^{P}$ is a vector representing the state of the dynamic system at time $t$, and $\mu$ and $F(\cdot)$ denote the system parameters and the dynamics, respectively. Considering that the actual signal measurement is performed in discrete time intervals of $\Delta t$, the discrete time representation of the dynamic system corresponding to Equation (10) can be written as follows:

$$x_{l+1} = F(x_l) \tag{11}$$

where $x_l$ denotes the $l$-th measurement of the system ($t = l\Delta t$; $l = 1, 2, \ldots, L$).
Practically, the dynamics $F$ needs to be estimated from the observed signals; here, the DMD method estimated the dynamics by linear approximation as follows:

$$x_{l+1} \approx A x_l \tag{12}$$

Then, $A$ is acquired by minimizing the approximation error $\lVert x_{l+1} - A x_l \rVert$ across all measurements of $l = 1, 2, \ldots, L-1$.
To minimize the approximation error, two matrices of the measurement, $X$ and $X'$, are introduced:

$$X = [\,x_1 \;\; x_2 \;\; \cdots \;\; x_{L-1}\,], \qquad X' = [\,x_2 \;\; x_3 \;\; \cdots \;\; x_L\,]$$

In the original DMD method, the dimension of $X$ was assumed to be $P \gg L$; for the implementation in this study, see the "Signal stacking" section for a more detailed explanation. The linear approximation in Equation (12) can be written as $X' \approx AX$, where the optimized $A$ is given by $A = X'X^{+}$ and $^{+}$ is the Moore-Penrose pseudoinverse. By applying SVD to $X$:

$$X \approx U \Sigma V^{*}$$

where $U \in \mathbb{C}^{P \times K}$, $\Sigma \in \mathbb{C}^{K \times K}$, $V \in \mathbb{C}^{(L-1) \times K}$, $^{*}$ represents the conjugate transpose, and $K$ denotes the rank used for the SVD approximation. Notably, the left and right singular matrices ($U$ and $V$, respectively) satisfy $U^{*}U = I$ and $V^{*}V = I$. This process assumes a low-dimensional structure for the dynamics. Here, $A$ can be obtained by using the pseudoinverse of $X$ acquired by the SVD:

$$A = X' V \Sigma^{-1} U^{*}$$

Because the dimension of the measurement ($P$) is large, eigenvalue decomposition of $A$ requires considerable computational resources. The DMD method addresses this problem by leveraging the orthogonal matrix $U$, yielding

$$\tilde{A} = U^{*} A U = U^{*} X' V \Sigma^{-1}.$$

Then, the eigendecomposition of $\tilde{A}$ was performed as follows:

$$\tilde{A} W = W \Lambda$$

where each column in $W$ is an eigenvector and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues $\lambda_k$. Finally, the approximated eigenvectors of $A$ (DMs) are obtained as the columns in $\Phi$, with the corresponding eigenvalues given by $\Lambda$:

$$\Phi = X' V \Sigma^{-1} W.$$

By introducing the variable $\omega_k = \ln(\lambda_k)/\Delta t$, the original dynamics can be approximated as:

$$x(t) \approx \sum_{k=1}^{K} \varphi_k \exp(\omega_k t)\, b_k$$

where $b_k$ is the initial condition of the mode.
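The steps of this section can be sketched in a few lines of NumPy. This is a minimal illustration of the standard exact DMD algorithm on noise-free toy data, not the implementation used in the study; the function names, the least-squares estimate of the initial conditions, and the toy signal (two oscillations at 10 Hz and 40 Hz on 6 channels) are assumptions made for the example.

```python
import numpy as np

def dmd(signals, rank, dt):
    """Exact DMD of a real (P x L) array sampled every dt seconds."""
    x, x_next = signals[:, :-1], signals[:, 1:]             # X and X'
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    u, s, v = u[:, :rank], s[:rank], vh[:rank].conj().T     # rank-K truncation
    a_tilde = (u.conj().T @ x_next @ v) / s                 # U* X' V Sigma^-1
    eigvals, w = np.linalg.eig(a_tilde)
    modes = ((x_next @ v) / s) @ w                          # Phi = X' V Sigma^-1 W
    omega = np.log(eigvals.astype(complex)) / dt            # omega_k = ln(lambda_k) / dt
    # Initial conditions b from a least-squares fit to the first sample (assumed choice).
    b = np.linalg.lstsq(modes, signals[:, 0].astype(complex), rcond=None)[0]
    return modes, eigvals, omega, b

def reconstruct(modes, omega, b, n_samples, dt):
    """Sum of modes times their time dynamics, with t = 0 at the first sample."""
    t = np.arange(n_samples) * dt
    return modes @ (b[:, None] * np.exp(np.outer(omega, t)))

# Toy data: each oscillation has channel-specific amplitude and phase.
dt, n_samples = 1e-3, 200
t = np.arange(n_samples) * dt
rng = np.random.default_rng(0)
amp = rng.uniform(0.5, 1.5, size=(2, 6))
phase = rng.uniform(0.0, 2.0 * np.pi, size=(2, 6))
signals = sum(
    amp[i][:, None] * np.cos(2 * np.pi * f * t[None, :] + phase[i][:, None])
    for i, f in enumerate((10.0, 40.0))
)

modes, eigvals, omega, b = dmd(signals, rank=4, dt=dt)
freqs = np.sort(np.abs(omega.imag) / (2 * np.pi))
approx = reconstruct(modes, omega, b, n_samples, dt)
```

On this toy signal the four retained components form two conjugate pairs, so the imaginary parts of the reconstruction cancel and the recovered frequencies match the two generating oscillations.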

Signal stacking
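The original DMD method was developed for signals with $P \gg L$, where $P$ and $L$ denote the number of recording sites and measurements, respectively. However, for neural signals, $P$ is usually smaller than $L$. In these cases, the signals can be augmented by stacking them $h$ times to create the two measurement matrices $X_{\mathrm{aug}}$ and $X'_{\mathrm{aug}}$:

$$X_{\mathrm{aug}} = \begin{bmatrix} x_1 & x_2 & \cdots & x_{L-h} \\ x_2 & x_3 & \cdots & x_{L-h+1} \\ \vdots & \vdots & & \vdots \\ x_h & x_{h+1} & \cdots & x_{L-1} \end{bmatrix}, \qquad X'_{\mathrm{aug}} = \begin{bmatrix} x_2 & x_3 & \cdots & x_{L-h+1} \\ x_3 & x_4 & \cdots & x_{L-h+2} \\ \vdots & \vdots & & \vdots \\ x_{h+1} & x_{h+2} & \cdots & x_L \end{bmatrix}$$

Throughout this study, h was the minimum integer that satisfies ℎ . Moreover, out of the ℎ DMs obtained from these stacked signals, the first P DMs were used for the analysis.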
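Stacking time-shifted copies of the signals can be sketched as below. The function name `stack_signals` and the toy array are hypothetical, and `h` is passed in as a parameter rather than derived from a size condition.

```python
import numpy as np

def stack_signals(signals, h):
    """Stack h time-shifted copies of a (P x L) array into an (h*P) x (L-h+1) array."""
    p, n = signals.shape
    cols = n - h + 1
    if h < 1 or cols < 2:
        raise ValueError("h must be at least 1 and leave at least two columns")
    # Block i holds the signals delayed by i samples.
    return np.vstack([signals[:, i:i + cols] for i in range(h)])

# Toy signals: P = 2 channels, L = 6 samples, stacked h = 3 times.
toy = np.arange(12, dtype=float).reshape(2, 6)
stacked = stack_signals(toy, h=3)
x_aug, x_aug_next = stacked[:, :-1], stacked[:, 1:]   # the two measurement matrices
```

The first $P$ rows of the stacked array are the original channels, and each later block of $P$ rows is the same set of channels shifted one sample further in time.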

Acquisition of the Gram matrix and sDM features
DMD was first applied to the preprocessed spatiotemporal signals ($x(t)$) of each trial in each dataset. Each DM ($\varphi_k$) in the matrix of DMs ($\Phi$) for the sample was then L2-normalized following the method used in our previous study (Shiraishi et al., 2020). The projection kernel was then applied to each pair of matrices of the L2-normalized DMs to generate the Gram matrix; similarly, according to Equation (5), the sDM features were calculated based on the matrix of the L2-normalized DMs.
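The relation between the two routes can be checked numerically with a minimal sketch. It assumes that the projection kernel is $\lVert \Phi_i^{*}\Phi_j \rVert_F^{2}$ and that the feature matrix is $\Phi\Phi^{*}$; the exact forms used in the paper (including Equation (5)) are not reproduced here, and all function names and the random toy modes are hypothetical.

```python
import numpy as np

def normalize_modes(modes):
    """Scale every column (mode) to unit L2 norm."""
    return modes / np.linalg.norm(modes, axis=0, keepdims=True)

def projection_kernel(a, b):
    """Assumed kernel form: squared Frobenius norm of A^* B."""
    return np.linalg.norm(a.conj().T @ b, "fro") ** 2

def sdm_features(modes):
    """Assumed feature form: the P x P matrix Phi Phi^*."""
    return modes @ modes.conj().T

def random_conjugate_modes(rng, p, pairs):
    """Toy modes that come in complex conjugate pairs, as for real-valued signals."""
    half = rng.standard_normal((p, pairs)) + 1j * rng.standard_normal((p, pairs))
    return normalize_modes(np.hstack([half, half.conj()]))

rng = np.random.default_rng(0)
phi_i = random_conjugate_modes(rng, p=5, pairs=3)
phi_j = random_conjugate_modes(rng, p=5, pairs=3)
s_i, s_j = sdm_features(phi_i), sdm_features(phi_j)

kernel_value = projection_kernel(phi_i, phi_j)
feature_value = np.vdot(s_i, s_j).real   # inner product of the vectorized features
```

In this sketch the kernel value and the inner product of the vectorized feature matrices agree to numerical precision, and conjugate pairing makes each feature matrix real and symmetric.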

Calculation of the PSD and power features
The PSD and power features were calculated based on the same 500-ms signals ($x(t)$) that were used to calculate the DMs and sDM features for the ECoG datasets of the arm motor task and hand movement task. For each channel in $x(t)$, the PSD was calculated using a Hamming window and fast Fourier transformation of 512 points. To calculate the power features, the PSD was averaged within the given frequency band (e.g., 80-150 Hz for the high-γ band).
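A windowed periodogram of this kind can be sketched as follows. The normalization (a one-sided density scaled by the window energy) is one common convention and an assumption here, as are the function names and the 100 Hz toy signal.

```python
import numpy as np

def psd_hamming(signal, fs, n_fft=512):
    """One-sided periodogram of a single channel using a Hamming window and zero-padded FFT."""
    window = np.hamming(len(signal))
    spectrum = np.fft.rfft(signal * window, n=n_fft)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    psd[1:-1] *= 2.0                      # fold negative frequencies (not DC or Nyquist)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, psd

def band_power(freqs, psd, band):
    """Mean PSD within [lo, hi) Hz."""
    lo, hi = band
    keep = (freqs >= lo) & (freqs < hi)
    return psd[keep].mean()

# Toy channel: a 100 Hz sine, 500 samples at 1 kHz (a 500-ms segment).
fs = 1000.0
t = np.arange(500) / fs
toy_channel = np.sin(2 * np.pi * 100.0 * t)
freqs, psd = psd_hamming(toy_channel, fs)
high_gamma_power = band_power(freqs, psd, (80.0, 150.0))
alpha_power = band_power(freqs, psd, (8.0, 13.0))
```

With 512 FFT points at 1 kHz the frequency resolution is about 1.95 Hz, so the 80-150 Hz band average covers roughly 36 bins.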

Neural decoding

(Nested) cross-validation
Throughout this study, the training parameters (cost or λ parameter for the decoder, and rank parameter for the sDM features) were always optimized only using the training samples independently from the testing samples to prevent overfitting of the decoder.
For all datasets, nested cross-validation was applied; for each outer cross-validation, the testing samples of the outer fold were decoded with a decoder trained based on all training samples (of the outer fold), with the optimized parameters estimated based on the inner cross-validation with the training samples.
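The control flow of nested cross-validation can be sketched with generic callables. Everything here is hypothetical scaffolding: `fit`, `score`, the threshold classifier, the candidate values, and the toy data only serve to show that the parameter is chosen from the inner folds and that the outer test samples are touched once, for the final score.

```python
def nested_cv(n_samples, outer_folds, inner_split, candidates, fit, score):
    """Return (selected parameter, outer test score) for every outer fold."""
    results = []
    for test_idx in outer_folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]

        def inner_score(param):
            # Mean validation score over the inner folds of the outer training set.
            scores = []
            for val_idx in inner_split(train_idx):
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                scores.append(score(fit(fit_idx, param), val_idx))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # Retrain on all outer training samples with the selected parameter.
        results.append((best, score(fit(train_idx, best), test_idx)))
    return results

# Toy 1-D problem: class 0 has values 0..9, class 1 has values 20..29.
xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10

def fit(idx, shift):
    """Threshold halfway between the class means, moved by the tunable shift."""
    mean0 = sum(xs[i] for i in idx if ys[i] == 0) / sum(ys[i] == 0 for i in idx)
    mean1 = sum(xs[i] for i in idx if ys[i] == 1) / sum(ys[i] == 1 for i in idx)
    return (mean0 + mean1) / 2 + shift

def score(threshold, idx):
    """Accuracy of predicting class 1 whenever the value reaches the threshold."""
    return sum((xs[i] >= threshold) == (ys[i] == 1) for i in idx) / len(idx)

outer = [list(range(k, 20, 5)) for k in range(5)]
inner = lambda train: [train[j::4] for j in range(4)]
results = nested_cv(20, outer, inner, [-12, 0, 12], fit, score)
```

In the study both the decoder parameter and the rank parameter were optimized this way; the sketch tunes a single scalar only to keep the loop structure visible.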

Classification analysis
In this study, classification analysis was performed with either an L2-regularized SVM or an L1-regularized SVM. For the L2-regularized SVM model decoding based on the Gram matrix, classification was performed by LIBSVM 3.1 [29] with the following parameters: svm_type, 0 (C-SVC); kernel_type, 4 (precomputed kernel). For the L2-regularized SVM with the linear kernel, the following parameters were used by LIBSVM: svm_type, 0 (C-SVC); kernel_type, 0 (linear). For the L1-regularized SVM, the classification was performed by LIBLINEAR 1.8 [30] with the following parameters: s, 6 (L1-regularized logistic regression). In each case, the other parameters were set to their default values. For all classification analyses, the cost for the SVM was optimized by (nested) cross-validation from candidates spaced in powers of 10. When the number of training samples for each class was imbalanced, the samples for the classes with fewer samples were repeatedly included so that the number of samples was increased to that of the class with the most samples. Moreover, the classification accuracies were evaluated by the balanced accuracy.
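The handling of class imbalance can be sketched with the standard library alone. Repeating minority-class samples cyclically is an assumption about how "repeatedly included" was realized, and balanced accuracy is computed here as the mean of the per-class recalls; the function names and toy labels are hypothetical.

```python
from collections import defaultdict

def oversample(labels):
    """Return sample indices with smaller classes repeated up to the largest class size."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    target = max(len(members) for members in by_class.values())
    out = []
    for lab in sorted(by_class):
        members = by_class[lab]
        # Cycle through the class members until the target count is reached.
        out.extend(members[i % len(members)] for i in range(target))
    return out

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy training labels: 5 samples of class "a" and 2 of class "b".
train_labels = ["a"] * 5 + ["b"] * 2
train_idx = oversample(train_labels)

# A classifier that always predicts "a" scores only 0.5 in balanced accuracy.
score = balanced_accuracy(["a", "a", "a", "b"], ["a", "a", "a", "a"])
```

Balanced accuracy keeps a majority-class guesser at chance level, which plain accuracy would not.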

Regression analysis
Due to the limitation of the computational time, L2-regularized ridge regression was used in this study. Parameter λ was optimized from candidates spaced in powers of 10 by (nested) cross-validation for each dimension of the dependent variables. The optimization was performed by minimizing the mean square error, and the regression accuracy was evaluated based on the average of the correlation coefficients between the true and predicted values for each dimension.
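Selecting λ separately for each output dimension can be sketched with closed-form ridge regression. The candidate grid (10^-2 to 10^2), the omission of an intercept, the contiguous folds, and the toy data are all assumptions made for illustration, and the correlation at the end is computed in-sample only to show the metric.

```python
import numpy as np

def ridge_fit(x, y, lam):
    """Closed-form ridge weights without intercept: (X^T X + lam I)^-1 X^T Y."""
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)

def select_lambda(x, y, folds, candidates):
    """For each output dimension, pick the lambda with the lowest cross-validated MSE."""
    mse = np.zeros((len(candidates), y.shape[1]))
    for fold in folds:
        val = np.zeros(len(x), dtype=bool)
        val[fold] = True
        for c, lam in enumerate(candidates):
            pred = x[val] @ ridge_fit(x[~val], y[~val], lam)
            mse[c] += np.mean((pred - y[val]) ** 2, axis=0)
    return [candidates[i] for i in np.argmin(mse, axis=0)]

def mean_correlation(y_true, y_pred):
    """Average Pearson correlation across output dimensions."""
    return float(np.mean([
        np.corrcoef(y_true[:, d], y_pred[:, d])[0, 1] for d in range(y_true.shape[1])
    ]))

# Toy data: 60 samples, 5 features, 2 output dimensions.
rng = np.random.default_rng(0)
x = rng.standard_normal((60, 5))
y = x @ rng.standard_normal((5, 2)) + 0.1 * rng.standard_normal((60, 2))

candidates = [10.0 ** e for e in range(-2, 3)]       # hypothetical grid
folds = np.array_split(np.arange(60), 5)
lams = select_lambda(x, y, folds, candidates)
pred = np.column_stack([
    x @ ridge_fit(x, y[:, [d]], lam)[:, 0] for d, lam in enumerate(lams)
])
accuracy = mean_correlation(y, pred)
```

Because the penalty is chosen per dimension, each output can receive a different amount of shrinkage.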

Statistical tests
The reproducibility of the snDM features and the power features was tested by one-way ANOVA with post hoc Tukey-Kramer tests (Fig. 2e).
The classification accuracy of the L1-regularized SVM with combined snDM and seDM features was compared with that of the L1-regularized SVM with high-γ power features by two-tailed paired t tests (Fig. 3b).
The classification accuracies using the frequency-filtered snDM features were tested among the frequency bands by one-way ANOVA with post hoc Tukey-Kramer tests to determine the frequency band that was most informative for classification (Fig. 3c).

Supplementary Table 3

shuffled phases. The higher accuracy than the power features and phase-shuffled ECoG signals indicates that the DM-based decoding method efficiently utilizes the information encoded in the spatiotemporal patterns of the ECoG signals.

Although the DM-based decoding method improved the accuracy of the neural decoding results, three remaining problems must be addressed: 1) the large computational time prevents real-time application of the decoding method in brain-computer interfaces (BCIs); 2) the method is incompatible with non-kernel-based machine learning algorithms; and 3) the characteristics of the signals contributing to decoding cannot be easily interpreted. The computational time for predicting a new sample (trial) with a kernel-based machine learning model is proportional to the number of training samples; hence, it is difficult to obtain predictions in real time with a large training dataset. Furthermore, it is difficult to combine sparse regularization methods with kernel-based algorithms, although the effectiveness of sparse regularization in neural decoding has been demonstrated both empirically [14][15][16] and theoretically [17]. Finally, the kernel-based algorithms can evaluate only the distances among the matrices of DMs for each trial, preventing the evaluation of the characteristics of the signals in each trial.

Here, we solved the above three problems by designing a nonlinear feature mapping corresponding to the projection kernel. Through what is known as the kernel trick, kernel-based machine learning algorithms with a kernel function $k(\cdot,\cdot)$ are equivalent to linear machine learning algorithms with a nonlinear feature mapping function $\phi(\cdot)$ satisfying $k(x, y) = \langle \phi(x), \phi(y) \rangle$ for all $x, y$ in the input domain.

compared between the kernel-based L2-regularized SVM, with the Gram matrix acquired based on the DMs by the projection kernel, and the non-kernel-based (linear) L2-regularized SVM with the corresponding sDM features. We assessed the decoder training time and the time for the decoder to predict a new sample by changing the number of training samples per class (movement type). Because SVD was a common process among the different decoding methods, the measurement was performed based on the precomputed SVD components. Moreover, it should be noted that the rank parameter to calculate the sDM features was fixed at 300.
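The equivalence between a kernel and an explicit feature mapping can be checked numerically for the standard projection kernel between subspaces, $k(X, Y) = \lVert X^{H} Y \rVert_F^2$, whose explicit map is $\phi(X) = \mathrm{vec}(X X^{H})$. The sketch below is a generic illustration of that identity only; the random data, the SVD-based orthonormalization, and the function names are assumptions and do not reproduce the exact definition of the sDM, snDM, or seDM features.

```python
import numpy as np


def orthonormal_basis(A, rank):
    """Return the first `rank` left singular vectors of A (orthonormal columns)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :rank]


def projection_kernel(X, Y):
    """Projection kernel between two subspaces: ||X^H Y||_F^2."""
    return np.linalg.norm(X.conj().T @ Y, "fro") ** 2


def explicit_features(X):
    """Explicit map phi(X) = vec(X X^H), split into real and imaginary parts."""
    P = X @ X.conj().T
    return np.concatenate([P.real.ravel(), P.imag.ravel()])


rng = np.random.default_rng(0)
# Two hypothetical trials: 8 channels x 50 time points of complex-valued data.
A = rng.standard_normal((8, 50)) + 1j * rng.standard_normal((8, 50))
B = rng.standard_normal((8, 50)) + 1j * rng.standard_normal((8, 50))
X, Y = orthonormal_basis(A, 3), orthonormal_basis(B, 3)

k_xy = projection_kernel(X, Y)                        # kernel evaluation
dot_xy = explicit_features(X) @ explicit_features(Y)  # plain inner product
print(np.isclose(k_xy, dot_xy))
```

Because the kernel value equals an ordinary inner product of fixed-length vectors, a linear model trained on those vectors predicts a new sample with a single dot product, independent of the number of training samples.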
proposed sDM features to characterize spatiotemporal signals. The sDM features were DM representations transformed by a mathematical conversion equivalent to the projection kernel and kernel-based L2-SVM. Throughout this study, the following advantages of the sDM features were shown. (1) The sDM features enable neural decoding with DMs for real-time BCIs that require small delays to control external devices based on ECoG signal changes. In fact, the use of the sDM features drastically reduced the prediction time from ~$O(n^{0.75})$, the time required to classify with the Gram matrix acquired by the projection kernel based on the DMs, to ~$O(n^{0.001})$ ($n$ is the number of training samples). Notably, the training time of the decoder was also reduced from ~$O(n^{1.99})$ to ~$O(n^{0.97})$, enabling the use of more training samples. (2) The sDM features allow neural decoding to be performed with L1 regularization, thereby improving the classification accuracy using ECoG signals. Moreover, even with regression with L2 regularization, the regression accuracy for the finger flexion task increased. These results strongly suggest that the sDM features are promising for fast and accurate decoding. (3) The characteristics of the signals can be inferred based on the different behaviors of the snDM, seDM, and the frequency-filtered sDM features. Based on the results, the ECoG signals were characterized by snDM features as opposed to seDM features. Furthermore, because the use of frequency-filtered sDM features did not show considerable improvement in the decoding accuracy compared to the use of the nonfiltered sDM features, the frequency information appears to be less effective in the neural decoding of ECoG signals. Therefore, the proposed sDM features have several key factors that demonstrate their effectiveness as BCI decoders for ECoG signals: (I) high accuracy, (II) fast computational speed, (III) good scalability, and (IV) good interpretability.
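Scaling exponents of this kind can be estimated by fitting a straight line to computation time against the number of training samples in log-log space, where the slope is the exponent. The sketch below is a minimal illustration with made-up timings; the function name and the numbers are hypothetical and are not the measurements reported in this study.

```python
import math


def fit_power_law_exponent(n_samples, times):
    """Least-squares slope of log(time) against log(n): time ~ c * n**slope."""
    xs = [math.log(n) for n in n_samples]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical timings that grow quadratically with the number of samples.
sizes = [10, 20, 40, 80, 160]
timings = [3e-4 * n ** 2 for n in sizes]
exponent = fit_power_law_exponent(sizes, timings)
```

An exponent near 2 corresponds to quadratic growth, an exponent near 1 to linear growth, and an exponent near 0 to a time that is essentially independent of the number of training samples.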

(ECoG dataset of arm motor task (Shiraishi et al., 2020 10 ) and ECoG dataset of video perception task (Fukuma et al., 2022 20 )), and publicly available datasets (ECoG signals acquired during hand versus tongue movement, flexion of fingers, and visual perception of face and house images (Miller, 2019 25 )). To record the in-house datasets, experiments were performed in accordance with the experimental protocol approved by the ethics committee of each hospital (Osaka University Medical Hospital: Approval No. 08061, No. 14353, No. 19257, UMIN000017900; Juntendo University Hospital: Approval No. 18-164; Nara Medical University Hospital: Approval No. 2098). Prior to the experiments, all subjects or their guardians provided written informed consent to participate in the study.

10-min video (validation video). No fixation point was presented in the video stimuli; the subjects were instructed to freely watch the videos. The presentation of the training videos took one to three days to complete. The validation video was presented after the presentation of all training videos.

Visual stimuli
The six training videos and the validation video were created by sequentially concatenating short film or animation clips. The clips were cutouts from one of 75 trailers or behind-the-scenes features downloaded from Vimeo and had a median duration of 16 s (interquartile range, 14 to 18 s). The six 10-min training videos were created by concatenating 224 clips, and the 10-min validation video was created with four repetitions of a 2.5-min video composed of 11 clips. The short video clips were cut so that they did not overlap; hence, there were no overlapping scenes either between the training videos and the validation video or among the training videos. The resulting videos contained scenes that widely varied in semantic meaning, such as animals, foods, landscapes, and text.

Construction of the semantic vectors: Training the skip-gram model
A skip-gram model was trained using Japanese Wikipedia dump data with the following steps based on the procedure described in a study by Nishida and Nishimoto 26 . (1) Words were segmented and lemmatized from Japanese text in the articles in the Wikipedia dump to create a text corpus using MeCab 27 , an open-source text segmentation software, along with the Nara Institute of Science and Technology (NAIST) Japanese dictionary, a vocabulary database for MeCab. (2) In the text corpus, words other than nouns, verbs, and adjectives and words that appeared fewer than 120 times were discarded, resulting in a text corpus of 365,312,470 words, consisting of 94,337 nouns, 4,922 verbs, and 631 adjectives. (3) By using the Gensim Python library, a skip-gram model was trained with the text corpus. The training parameters were set as follows: dimension of word vector representation, 1,000; window size, 5; number of negative samples, 5; use of hierarchical softmax function, no.
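The training parameters listed above might map onto Gensim's `Word2Vec` keyword arguments as sketched below. This is an assumption-laden illustration: it uses the keyword names of Gensim 4.x (the version used in the study is not stated), `min_count` is one possible way to apply the 120-occurrence threshold and is not confirmed by the text, and `train_skipgram` is a hypothetical helper.

```python
# Parameters stated in the text, expressed as gensim (>= 4.0) Word2Vec keywords.
SKIPGRAM_PARAMS = {
    "sg": 1,              # skip-gram rather than CBOW
    "vector_size": 1000,  # dimension of word vector representation
    "window": 5,          # window size
    "negative": 5,        # number of negative samples
    "hs": 0,              # hierarchical softmax disabled
    "min_count": 120,     # assumption: drops words seen fewer than 120 times
}


def train_skipgram(sentences):
    """Train a skip-gram model on an iterable of token lists."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim
    return Word2Vec(sentences=sentences, **SKIPGRAM_PARAMS)
```

Here `sentences` would be the segmented and lemmatized corpus, with each sentence given as a list of tokens; all remaining arguments are left at the library defaults.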
sounds during the presentation of the video stimuli. During the experiment, ECoG signals were recorded at 10 kHz by EEG-1200 (Nihon Kohden, Tokyo, Japan) by referencing the average of two intracranial electrodes. The presentation timing of the video stimuli was monitored by DATAPixx3 (VPixx Technologies, Quebec, Canada) and recorded as digital signals synchronized to the ECoG signals.
To acquire the dataset, patients were implanted with intracranial electrodes around the fronto-parietal area. The patients performed repeated movements (flexion and extension) of individual fingers; the movement of each finger was measured at 25 Hz by a 5-DOF data glove with simultaneous recording of 1-kHz ECoG signals. Patients were given a 2-second cue to move individual fingers at their own pace. The movement cue for each finger was presented in a random order, with an intertrial interval of 2 seconds. There were 30 movement cues for each finger. All nine patients in the dataset were included in the analysis.

Signal preprocessing
ECoG signals were first rereferenced by common averaging among all channels for each patient. For each measurement of finger flexion, the corresponding ECoG signals were cropped to form a sample to be regressed with the following procedure: (1) In the dataset, values for finger flexion movements were upsampled from 25 Hz to 1 kHz and saved with the 1-kHz ECoG signals, leading to 40 consecutive samples with the same value. Based on these values, the timing of the first sample of each run was identified. The finger flexion values for these samples were selected as the target variables for the later regression analysis. (2) The ECoG signals corresponding to the selected samples were

Fig. 1. Representative example of dynamic mode decomposition. (a) Spatiotemporal signals of   were created by adding two different signals:   : 13-Hz sine wave with decaying amplitude over time ( ,  sech  3 0.25 sin 2 13 ;   : 8-Hz sine wave with increasing amplitude ( ,  sech  3 2 sin 2 8 . Both signals were sampled at 1 kHz for a duration of 0.5 s. Observation points (i.e., position p) ranged from -10 to 10, with an interval of 0.25. (b) Singular values acquired by SVD based on A are shown, starting with the largest value. All values except the first four SVD components were zero. Because of the stacking process prior to the SVD (see Methods), only 493 components were acquired by the SVD process. The horizontal axis is shown with a log scale. (c) Four DMD components determined based on the four SVD components with nonzero singular values are shown for time dynamics and DMs. For visibility, each DM was L2-

Fig. 2 Representative Gram matrix and sDM features during the arm movement task. (a) The ECoG signals recorded when patient 1 performed three types of movements were used to calculate 300 DMs for each trial to obtain the Gram matrix acquired by the projection kernel $k(\cdot,\cdot)$ among the trials. (b, c) The sDM features were calculated based on the same DMs used in (a) to visualize the (b) averaged sDM features among the trials of the same movement type and (c) F values of the sDM features among the different movement types. Notably, the sDM features are symmetrical due to their definition. (d) For each patient, Pearson's correlation coefficient was calculated between the snDM features and the PSD of each frequency by concatenating the features from all channels and all trials. The average value is shown with a black line, and the colored area denotes 95% confidence intervals (CIs) among the patients. (e) For the power features of each frequency band and the snDM features, the reproducibility of the features among the trials of the same movement type was evaluated with Pearson's correlation coefficients. To calculate the reproducibility of each feature, the correlation coefficients were calculated among all possible pairs of trials of the same movement type and averaged for each patient. The average reproducibility is shown in the bar graph, with 95% CIs among the patients.

Fig. 3 Decoding accuracy and computational time using sDM features. (a) Training and testing times of SVM models plotted against different numbers of training samples per class. The training time versus the number of samples per class was fitted by a linear model in log space for both time and number of samples per class to estimate the computational complexity. (b) Accuracies to classify three types of movements using snDM features, seDM features, and the combination of both features are shown as bars with the 95% CI among the subjects. For the combinations of SVM model and features shown in (a), the same colors as in (a) were used to plot the bars; for other combinations, white bars were used. Since the sDM features were constructed such that classification with a linear L2-SVM model based on the sDM features was mathematically equivalent to classification with kernel-based L2-SVM based on the Gram matrix of DMs, the classification accuracies were exactly the same. (c) Frequency-filtered snDM features were calculated for each frequency band of the ECoG

Fig. 4 Neural decoding accuracies for various ECoG datasets. (a-d) Bars show decoding accuracies using snDM features, seDM features, and the combination of both features (black bars) and those using the corresponding part of the frequency-filtered sDM features (white bars) for ECoG datasets: (a) hand versus tongue movement task (Miller, 2019), (b) image perception task (Miller, 2019; classification accuracy is reported in Miller et al., 2016 31 ), (c) finger flexion task (Miller, 2019; correlation coefficient is reported in Miller et al., 2009 28 , although the correlation was acquired by observation, not prediction), and (d) video perception task (Fukuma et al., 2022).
accuracies of neural decoding using snDM and seDM features based on ECoG signals were compared among different types of tasks and with the accuracies reported in previous studies.Here, we used our own ECoG dataset of video perception task (Fukuma et al., 2022) and an open dataset of ECoG signals, including hand versus tongue movement task, finger flexion task, and image perception task

Classification accuracy of individual patients for the image perception task (house vs. face image).
*These subjects were excluded from the original study.