Recursive Exponentially Weighted N-way Partial Least Squares Regression with Recursive-Validation of Hyper-Parameters in Brain-Computer Interface Applications

A tensor-input/tensor-output Recursive Exponentially Weighted N-Way Partial Least Squares (REW-NPLS) regression algorithm is proposed for high dimension multi-way (tensor) data treatment and adaptive modeling of complex processes in real-time. The method unites fast and efficient calculation schemes of the Recursive Exponentially Weighted PLS with the robustness of tensor-based approaches. Moreover, contrary to other multi-way recursive algorithms, no loss of information occurs in the REW-NPLS. In addition, the Recursive-Validation method for recursive estimation of the hyper-parameters is proposed instead of conventional cross-validation procedure. The approach was then compared to state-of-the-art methods. The efficiency of the methods was tested in electrocorticography (ECoG) and magnetoencephalography (MEG) datasets. The algorithms are implemented in software suitable for real-time operation. Although the Brain-Computer Interface applications are used to demonstrate the methods, the proposed approaches could be efficiently used in a wide range of tasks beyond neuroscience uniting complex multi-modal data structures, adaptive modeling, and real-time computational requirements.


Methods
Generic PLS and NPLS regressions. Ordinary Partial Least Squares regression 18 is an iterative procedure to estimate a linear relationship between a vector of independent (input) variables and a vector of dependent (output) variables on the basis of observation matrices  ∈ × X N n and  ∈ × Y N m : = + Y XB D, where  ∈ × B n m is the matrix of linear coefficients, and  ∈ × D N m is the matrix of noise. To build the model, the observations are iteratively projected into the low dimensional spaces of latent variables while trying to explain the maximum of variance of X and Y simultaneously:   are the matrices of the loading vectors (projectors), E and F are residual matrices, and F is the number of iterations (factors). The PLS approach is particularly well suited for high dimensional data due to its efficient dimensional reduction technique.
Multi-way Partial Least Squares (NPLS) regression is an algorithm in the PLS-family that is adopted to the case of tensor variables 32 . Tensors (multi-way arrays) are higher-order generalization of vectors and matrix data representation: vectors are tensors of order one, and the matrix is a second order tensor. In this paper, a tensor is denoted as  ∈ ×…× X I I n 1 , where n represents the order of the tensor. Detailed information about the tensors is described 57 . NPLS combines the robustness of PLS regression with the ability to preserve the structure of the data, which is lost in vector-oriented approaches. For independent and dependent tensors of observation Recursive PLS regression. Several recursive PLS-based algorithms were proposed to treat online data flows without complete recalibration of the model 35,49,50,[58][59][60] . Helland et al. 49  are concatenated to P t T 0 and Q t T 0 , respectively. Then, the PLS model is recalibrated, and the new loading matrices are prepared for use in the next iteration of the algorithm. Thus, the method always keeps the size of the analyzed data by packing the data into the loading matrices P and Q of constant size. The main shortcoming of the RPLS consists in the possible loss of information if all of the data are not retained in the matrix of latent variables. Thus, appropriate choice of the number of factors F is of crucial importance.
Qin 50 proposed an algorithm where the hyper-parameter F is estimated by the cross-validation procedure applied for the blocks of data: the whole data are divided on the blocks followed by leave-one-block-out cross-validation. However, application of the cross-validation procedure requires storing in memory the entire data set. Moreover the cross-validation procedure is not possible for online tasks. In addition, the hyper-parameters could be non-stationary and time-varying for some tasks. Despite this, if the value of the hyper-parameter F is predefined in some way, then the Recursive PLS method is widely used due to relatively small memory requirement. Forgetting factor 59 , nonlinearity 50 , as well as neural networks 60 were coupled with the Recursive PLS to model time-varying process. Recursive N-way PLS (RNPLS)-a generalization of the RPLS algorithm to the case of the tensor-input/tensor-output variables-was proposed in 35 .
Contrary to Qin's approach 50 , Dayal and MacGregort 51,52 proposed Recursive Exponentially Weighted PLS regression, which allows online treatment of data flow without loss of information. The algorithm is extremely fast in comparison with other recursive PLS-based approaches and is of great importance for real-time applications. Moreover, the integrating forgetting factor allows one to exponentially discount past data.
Unlike Qin's RPLS, in the REW-PLS, the old data from the matrices , which are used for model calibration. When the new data is available, the covariance matrices are updated: is a forgetting factor. In the case of λ = 1 past data cannot be discounted. There is no loss of information in the method because the calibrated model is completely identified by the covariance matrices. The detailed description of the fast model calibration algorithm based on the covariance matrices is reported 52 . However, as for all the PLS-family methods, the hyper-parameter F should be predefined for the REW-PLS algorithm. The question of the efficient choice of F-especially for online applications-was not considered by the authors of the REW-PLS algorithm. are estimated. Here, the k-mode tensor product is denoted as "× k " 57 . Similar to the NPLS method, the covariance tensors are used in the PARAFAC-based tensor decomposition procedure 32   , and the forgetting factor λ λ ≤ ≤ (0 1), the covariance tensors are recalculated as: as an initial approximation. The B New is then defined. A graphical representation of the REW-NPLS method is shown in Fig. 1. A detailed description of the algorithm is given in Appendix A.
The mean-centering and rescaling of each column of the observation data should be done as in the PLS-family methods. In a time-varying process, the centering and rescaling parameters could change with time and should be continuously updated. The procedure for the matrix case is reported 51 . The same approach could be applied for the tensors because all parameters are estimated independently for each coordinate.
At moment t, tensors of observation  ∈ . Taking into account the forgetting factor λ, the effective number of observation at t is . The sum and the sum of squares of the elements along the observation modality are estimated as: SCientifiC REPORTS | 7: 16281 | DOI:10.1038/s41598-017-16579-9 where " ⁎ X 2 " represents the element-wise square of the tensor X. The effective mean and the standard deviation along the first modality of the tensors at time t are: For simplicity of notation, we omit the indexes and write µX and σX. The tensors are identified similarly. To apply the centered model to the non-centered testing data X Test , equation (1) should be rewritten: A detailed description of the normalization procedure is given in Appendix B. Determination of the appropriate number of factor F is a common task for all PLS-family methods. While the cross-validation approach 55 is widely used for this purpose, it cannot be efficiently applied for online modeling. Here, we propose a procedure for recursive estimation of the optimal number of factors ( ⁎ F ) for REW-NPLS algorithm. The main idea exploits the newly available data as a testing set for the currently available models before the models are updated.
At time t, the new tensors of observation X t and Y t are available. The current prediction model − , with the forgetting factor γ γ ≤ ≤ (0 1). The optimal number of factors, corresponding to the moment t is defined as t f t f f F 1 max and the optimal model is = .
The detailed description of the Recursive-Validation procedure is given in Appendix C. Its graphic representation is shown in Fig. 2.

Experiments in Monkeys. Data description.
To validate the proposed approaches, the publicly available dataset is considered (http://neurotycho.org/epidural-ecog-food-tracking-task). The data are recorded and distributed by the Laboratory for Adaptive Intelligence, BSI, RIKEN. All procedures were performed in accordance with protocols approved by the RIKEN ethics committee. The experiments consisted in recording of epidural ECoG signals simultaneously with continuous 3D trajectories of a Japanese macaque's right wrist, elbow, and shoulder 25,61 . The experiments were carried out in two monkeys denoted as "B" and "C". Overall, 20 recordings (10 for each monkey) are provided 11 . The length of each recording was approximately 15 minutes. To record the hand motions (shoulders, elbows, and wrists), an optical motion capture system (Vicon Motion System, Oxford, UK) with a sampling rate of 120 Hz was applied. The ECoG signals were recorded from 64 electrodes (Blackrock Microsystems, Salt Lake City, UT, USA) with a sampling rate of 1 kHz, implanted in the epidural space of the left hemisphere (Fig. 3A). Each experiment showed the monkeys pieces of food, and they were trained to receive it with the right hands. The location of the food was random at a distance of about 20 cm from the monkeys. A scheme of the experiment is shown in Fig. 3B.
Feature extraction and tensors formation. An input data feature tensor X was formed from the ECoG epochs containing 1 second of the signal taken continuously with a time step of 100 ms. Each ECoG epoch was mapped to the spatial-temporal-frequency space by continuous wavelet transform (CWT). The complex Morlet wavelet was chosen as a mother wavelet 25,38,62,63 . The frequency band from 10 to 150 Hz with step 10 Hz was chosen 11 . Absolute values of the wavelet coefficients were decimated along the temporal modality 100 times by averaging 10 sliding windows each 100 ms long. A detailed description of the feature extraction procedure has been reported 24 . The tensor of the output variables Y was formed from the corresponding 3D coordinates of the monkey's right wrist, elbow, and shoulder. Figure 3C represents the scheme of the data preparation procedure.
According to 63 , each 15-minute recording from the available dataset of 20 files was split into non-overlapping training (~10 first minutes) and test (~5 last minutes) subsets; see correspond to the minimal error at the current moment t; this is considered to be optimal for the current moment.    15 10 64 , the ECoG signal from 64 channels is mapped by the Continuous Wavelet Transform with 15 frequencies (10, 20, …, 150 Hz). Then, the absolute values of the wavelet coefficients are decimated 100 times along the temporal modality, i.e., 1000 points, representing 1 second; these are decimated to 10 points. The response variable  ∈ × t y( ) 3 3 is formed from the corresponding 3D coordinates (x y z , , ) of monkey's wrist, elbow, and shoulder. The next epoch was taken with a time step of 100 ms. (D) Data tensors are split into non-overlapping training (70%) and test (30%) sub-tensors. from all participants. During the experiments, each subject lifted his/her index finger without any external cue or conditional stimulus. The binary position of the finger ("up" or "down") was registered by a laser beam. Only one finger was moved during each session. The sessions varied between 5 and 20 minutes and provided 90 to 520 self-paced finger movements. The brain activity recordings were made in a magnetically shielded room using 306-channel MEG devices (Elekta Neuromag, Helsinki, Finland). The device provides MEG data recorded from 102 triple sensor units (one magnetometer and two orthogonal planar gradiometers) with a sampling rate of 1 kHz (Fig. 4A). Feature extraction and tensors formation. Similarly to the experiments in monkeys, an input data feature tensor X was formed from the MEG epochs containing 1 second of the signal taken continuously with a time step of 100 ms. Each MEG epoch, was mapped to the spatial-temporal-frequency space by CWT based on the complex Morlet wavelet. The frequency band from 5 to 100 Hz with step 5 Hz was used. The binary vector of the output variables Y was formed from the corresponding position of the index finger. Figure 4B represents the scheme of the experiments and the data preparation procedure.
Four subjects participated in the experiments. All four sessions were recorded with each subject: 2 with left and 2 with right index movements. Overall, 16 recordings were prepared. Each recording was split on two non-overlapping training (70%) and test (30%) subsets.

Prediction performance evaluation.
To predict performance evaluation, several criteria have been applied in BCI experiments 64 . In this paper, we use Pearson correlation (r) between predicted and observed response because it is sufficiently informative and could be easily computed and interpreted. It is commonly used in BCIs to assess decoders 11,25,62,63 .

Experiments in monkeys.
Here we consider vector-and tensor-oriented approaches (PLS and NPLS) as well as their recursive modification (REW-PLS and REW-NPLS). For the recursive approaches, the training sets were split on the non-overlapping 10-second long batches with 100 epochs per batch and about 70 batches per training set. The recursive models (REW-PLS and REW-NPLS) were adjusted each time when the new batch arrived. The non-recursive models (PLS and NPLS) were calibrated just once on the entire training set. For the recursive approaches, a forgetting factor λ = 1 was chosen, i.e., there was no discounting of past data batches.
Prediction accuracy and robustness. The methods were compared based on their prediction accuracy and robustness to the value of the number of factors. For each approach, namely, PLS, NPLS, REW-PLS, and REW-NPLS, the correlation between predicted and observed trajectories is represented in Fig. 5 and is averaged over recordings, hand's joints (shoulder, elbow, wrist), and coordinates. The optimal values of the factor numbers that provide the maximum of averaged correlations and the corresponding correlations are given in Table 1.

Recursive-Validation of factors number.
To assess the quality of the Recursive-Validation procedure in estimating the number of factors F, the prediction results were compared to the REW-NPLS with an optimal number of factors = ⁎ F 200 (see Table 1). To model the online data flow, the training dataset was split on seventy 10-second non-overlapping batches (100 epochs per batch). When the new data-batch arrives, the model was adjusting according to the new data and was then tested on entire testing set. For the Recursive-Validation procedure, the   (Fig. 6A). The REW-NPLS with dynamic adaptation of F significantly outperformed the REW-NPLS with an optimal value of factors ( = ⁎ F F ) until the number of batches reached 25. The difference became insignificant (ANOVA test, significance level α = 0.05). Figure 6B

Experiments in humans. Recursive-Validation vs. Cross-Validation. To evaluate the proposed
Recursive-Validation procedure, the number of factors estimated by this approach was compared to the number of factors estimated by the standard 10-fold cross-validation procedure. The number of factors was taken from the interval ∈ F [1,20]. Table 2 demonstrates the optimal number of factors ⁎ F estimated by the methods as well as the value of the corresponding correlation = ⁎ ⁎ r r F ( ) on the test sets. The difference in the resulting correlations was not significant for either left-or right indexes (ANOVA test, significance level α = . 0 05).
Modality influences analysis. The predictive models were identified by the REW-NPLS with Recursive-Validation of ⁎ F on the complete training set and are shown in Fig. 8. The models are represented by their averaged influence in frequency, temporal, and spatial modalities. The weights are averaged over 4 subjects (2 recording per subject) for left and right finger self-paced movements.

Implementation
The crucial point for practical application of signal-processing approaches in BCI systems is a possibility for of the real-time computations. All methods proposed here are implemented in Matlab ® (The MathWorks, Inc., Natick, US) and can function in real-time on a standard computer (Intel Xenon E5-2620, 2 GHz, RAM 32 Gb). Due to real-time functioning requirements, blocks of applications and adaptations of the model are separated and implemented in two independent processes while communicating through shared memory. The Model Application Block provides a 10-Hz decision rate, i.e. control command is generated each time that the new 100 ms data block When the updated model is prepared, it replaces the current model in the Model Application Block. For the Model Application Block, the processing time for generation of a control command with a new 100 ms data block (sampling rate 1 kHz, 64 channels, 15 analyzed frequencies) is 37 ± 8 ms. Updating the model with 10 seconds of data stack in the Model Adaptation Block takes 3.41 ± 0.06 s. Thus, both blocks meet the real-time functioning requirements (37 ms ≪ 100 ms and 3.41 s ≪ 10 s). The scheme of the system is given in Fig. 9.

Discussion
Decoding of the neural activity for BCI systems is a challenging task 1 . The huge dimension, considerable data variability in time, as well as real-time computational requirements impose essential limitations on real-life applications of BCIs 66,67 .
The paper proposes the REW-NPLS algorithm for recursive analysis of multimodal data. While the standard vector-oriented approaches applied in BCI do not consider the structure of the analyzed data 11,68 , the tensor-based methods are commonly not adjusted for online computations and require complete recalibration when the new data are available 38,69 . Previously 35 , the RNPLS algorithm allows online treatment of multi-modal data, however, only part of information is extracted from arriving data batches. This could lead to a loss in the quality of the prediction model. The REW-NPLS method unites both tensor-based structures of NPLS with the possibility of efficient online treatment of data without information loss inherited from the REW-PLS.
The crucial parameter of the PLS-family methods is the number of iterations (number of factors, F) used in the model for projections into the space of latent variables 70 . The robustness of the methods in the case of F's suboptimal values is very important because there are mostly heuristic algorithms for selection of the optimal value of F. The task is especially critical for online data flows when the values of the parameters are unfixed and could vary widely with time.
In the current paper, the proposed REW-NPLS approach was studied for the robustness of its prediction accuracy to the suboptimal values of the number of factors on the ECoG data. The recursive approaches (REW-NPLS, REW-PLS) do not concede to the corresponding non-recursive approaches (NPLS, PLS) according to their prediction quality result; Fig. 5 and Table 1 . However, the prediction accuracy of the vector-oriented approaches drops less than 10% only when the number of factors varying in the range ≤ ≤ F 20 100; the tensor-oriented methods remain within 90% of the optimal correlation in the range ≤ ≤ F 20 600. The prediction accuracy results provided by the tensor-oriented approaches significantly outperform the vector-oriented ones for ≥ F 200 (ANOVA test, significance level α = . 0 05). The cross-validation procedure 56 is a state-of-the-art approach to estimate the number of factor hyper-parameters for the PLS-family methods, but it is time and memory intensive. Moreover, its direct application to online data flows is impossible. In addition, any predefined and fixed value of the hyper-parameter could be inappropriate because it should be regularly re-estimated to follow possible data variations over time. The Recursive-Validation procedure proposed here allows online estimates of the parameter values without significant additional computational time expenses. Figure 6A demonstrates the prediction accuracy of the REW-NPLS model (identified with a predefined optimal number of factors = ⁎ F 200) compared to the REW-NPLS coupled with Recursive-Validation procedure for estimation of factor numbers. Both approaches showed equivalent prediction accuracies as follows from the experiments. Moreover, when the number of batches for model calibration is less than 25, the use of constant ⁎ F leads to overfitting in small training sets. However, the Recursive-Validation procedure adjusts ⁎ F value to optimize the prediction accuracy. Figure 6B demonstrates that the number of factors estimated by the Recursive-Validation procedure is significantly less than ⁎ F as estimated offline on the entire dataset. This additionally reduces the possibility of overfitting.
When all 70 batches are processed, , the difference in prediction accuracies is insignificant (ANOVA test, significance level α = . 0 05). Table 2   For tensor-based approaches, Influence Analysis can assess the weights of the temporal, frequency, and spatial modalities on the predictive model. Figure 7 represents the evaluation of the REW-NPLS model with Recursive-Validation of ⁎ F in time and its comparison with REW-NPLS model calibrated on the complete dataset with an optimal number of factors ⁎ F on the ECoG data. For the REW-NPLS algorithm with Recursive-Validation, the models identified on batches 10, 25, 40, 55, and 70 (corresponding to 15%, 36%, 57%, 79%, and 100% of the training set) are considered. For the frequency modality, the weight distributions demonstrate the same behavior for all the models: the maximum peaks are at 10, 30, and 90 Hz. In the temporal modality, the weights of the elements tend to increase until 100 ms before the motion moment for both the online and offline models. In the spatial modality, for the REW-NPLS model, the set of the most relevant electrodes (10, 6, 8, and 15) is defined on the batch number 40 and does not change after; however, the weights of the electrodes become more equilibrate with time. For the offline model, the same electrodes 10, 6, 8, and 15 are the most informative. Figure 8 represents applications of the Influence Analysis to the models identified by the REW-NPLS algorithm with the Recursive-Validation of ⁎ F in the MEG experiments. The results are prepared for the left and right finger self-paced movements and are averaged over 8 recordings carried out in 4 subjects. In the frequency modality, there is a maximum around 20 Hz for both fingers. The frequency bands above 50 Hz are less informative. This corresponds to the quality of the MEG-data recording system. In the temporal modality-similar to the case of the ECoG-based experiments-the weights of the elements tend to increase until the motion moment. In the spatial modality, the localization of the informative electrodes for the left and right finger correspond to the motor zone of the cortex 71 . For the left finger, the most informative electrodes are 1143, 1112, 1133, 1132, and 1142; whereas for the right finger, the most informative electrodes are 0442, 0433, 0712, 0443, and 0423 (according to the Elekta Neuromag system notation, see Fig. 4A).
The main limitation of the proposed REW-NPLS approach is its memory consumption, because it stores the covariance tensors in the memory. At the same time, the memory used by the algorithm is constant and does not change over time. The limitation of the Recursive-Validation algorithm is the need to compute the model for F max number of factors. This could be significantly greater than the optimal number ⁎ F RV . However, F max can decrease with time if  ⁎ F F RV max is constantly observed. In parallel to a set of neuroscience tasks, such as BCIs, fMRI analysis, etc., the proposed REW-NPLS and the Recursive-Validation algorithms could be applied to a wide range of tasks where adaptive modeling of high dimension tensor flows is necessary. Examples include dynamic analysis of images and video sequences, adaptive monitoring of complex industrial processes, etc.

Perspectives
The next step of the study is application of the proposed methods implemented in real-time operating software in the clinical BCI in tetraplegic subjects. Online adjustment provided by the REW-NPLS algorithm allows application of the closed-loop paradigm when the user immediately receives feedbacks from the system to his/her movement imaginations reflected in the brain activity. This will allow adaptation of the human behavior in parallel with adjustment of the BCI decoder. This should result in efficiency of the BCI system. Within the framework of the CEA-LETI-CLINATEC ® BCI project, the fully-implantable device WIMAGINE ® 72 for chronic measurement and wireless transmission of ECoG data is currently developed. The full body exoskeleton EMY ® 73 is designed to let the tetraplegic subjects possibility control the exoskeleton fragments in real-time. The clinical protocol named "Brain Computer Interface: Neuroprosthetic Control of a Motorized Exoskeleton" was authorized from the French regulatory agencies to start with five patients including bilateral implantation of 2 WIMAGINE ® implants per patient (https://clinicaltrials.gov/ct2/show/NCT02550522).