Structured sparse multiset canonical correlation analysis of simultaneous fNIRS and EEG provides new insights into the human action-observation network

The action observation network (AON) is a network of brain regions involved in the execution and observation of a given action. The AON has been investigated in humans using mostly electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI), but shared neural correlates of action observation and action execution are still unclear due to lack of ecologically valid neuroimaging measures. In this study, we used concurrent EEG and functional Near Infrared Spectroscopy (fNIRS) to examine the AON during a live-action observation and execution paradigm. We developed structured sparse multiset canonical correlation analysis (ssmCCA) to perform EEG-fNIRS data fusion. MCCA is a generalization of CCA to more than two sets of variables and is commonly used in medical multimodal data fusion. However, mCCA suffers from multi-collinearity, high dimensionality, unimodal feature selection, and loss of spatial information in interpreting the results. A limited number of participants (small sample size) is another problem in mCCA, which leads to overfitted models. Here, we adopted graph-guided (structured) fused least absolute shrinkage and selection operator (LASSO) penalty to mCCA to conduct feature selection, incorporating structural information amongst the variables (i.e., brain regions). Benefitting from concurrent recordings of brain hemodynamic and electrophysiological responses, the proposed ssmCCA finds linear transforms of each modality such that the correlation between their projections is maximized. Our analysis of 21 right-handed participants indicated that the left inferior parietal region was active during both action execution and action observation. Our findings provide new insights into the neural correlates of AON which are more fine-tuned than the results from each individual EEG or fNIRS analysis and validate the use of ssmCCA to fuse EEG and fNIRS datasets.

www.nature.com/scientificreports/ Method Data fusion approach. Structured sparse CCA (ssCCA). CCA is a standard method to explore the relationship between two sets of multi-dimensional variables. When the number of samples (i.e., participants) are smaller than the number of features ( n ≪ p and n ≪ q ), overfitting becomes an inevitable problem. Usually in this situation, only fractions of features in each set are necessary for characterizing the two-set correlation.
To mitigate the overfitting and lack of generalization problem, a sparsity constraint has been added to the traditional CCA problem 37,39 . To make both canonical variates sparse, either an l1-norm (LASSO) term 40,41 or a combination of l1-norm and l2-norm (fused LASSO) 38,42 are added to the traditional CCA model. Recently, there have been a number of structured sparse CCA (ssCCA) approaches proposed using graph/network-guided fused LASSO penalties [43][44][45][46] . In this paper, we take advantage of the model suggested by He 47 where a network is represented by undirected weighted graph, G . GraphNet, a more general form of a traditional elastic net regularizer [43][44][45][46] , can be written as: where M is a matrix, and ( 1 , γ 1 ) are regularizing parameters. The vertices in G correspond to features (e.g., brain regions, optodes) and each edge, l ij , indicates if there is a link between optode i and j in G ; all the weights of l ij in the network depend on their adjacency conditions (e.g., high or low correlation). We have an elastic net when M = I in a GraphNet. Generally, M = L , where L is the Laplacian matrix of a graph. Let A be a sample correlation matrix, called the adjacency matrix, in which the higher pairwise correlation between two features corresponds to a larger weight. We identify p as features/brain regions in the dataset and a diagonal matrix, D , with where w ik depends on pairwise correlation of X and Y respectively 47 . This cost function also indicates that the adjacent regions linked in initial structure are expected to have similar weights 48,49 . To this end, the ssCCA would be formulated as: subject to: where C = (c 1 , c 2 , c 3 , c 4 ) > 0 are regularization parameters. Here, c 3 and c 4 are used to regularize the cost function controlling for spatial structure, and L w1 and L w1 are Laplacian matrices of modality 1 and 2 associated with w 1 and w 2 , respectively. Witten et al. 37 , Chen et al. 43 reformulated the constraints on w 1 and w 2 in Lagrangian form: where the regularization parameters 1 , 2 , α 1 and α 2 correspond to c 1 , c 2 , c 3 and c 4 , in (3) respectively.
Structured sparse multiset CCA (ssmCCA). Thus far, the ssCCA we have formulated does not consider more than two datasets. However, traditional CCA can be extended to more than two variables in different ways 22 . MCCA is the generalization of the CCA model where an objective (cost) function corresponding to the correlations between canonical vector pairs should be optimized such that the overall correlation between them is maximized: k,l |) . The subscript indicates the dataset, and the superscript indicates the observation in the dataset (e.g., ρ [n] k,l is the correlation between the n th canonical variates from k th and l th datasets).
In order to solve mCCA Kettenring 22 introduced five objective functions (e.g., F (x) = x indicates the sum of correlations (SUMCOR) cost function while F (x) = x 2 corresponds to the sum of squares correlations (SSQ-COR) cost function): where function F (·) is the cost function. Here, we chose the SUMCOR cost function to estimate canonical variates. The procedure can be summarized in the steps bellow.
Step 1 Step (1) can be solved using a partial derivative function of the SUMCOR objective function with respect to each w [1] m and equating it to zero to find the optimizing point. Since the SUMCOR objective function is a linear function of each w [1] m , the partial derivative is a constant, and therefore the closed form solution exists. The iterative algorithm starts from randomly initializing canonical variates and each w [1] k vector is updated subsequently to guarantee cost function optimization. All the w [1] k vectors are updated after the one step procedure. The procedure continues until convergence criteria are met and the w [1] k vectors are considered as the optimal solution. At step (2), the SUMCOR problem in (8) can then be reformulated in Lagrangian form: where 1 ≤ n ≤ N , 1 ≤ k, l ≤ M , k = l. As a reminder, n represents the n th observation in k th and l th datasets, and the regularization parameters [n] 1k , [ ). We applied the leave-one-out cross validation technique in which we estimated the model parameters (canonical variates) for n − 1 samples (participants) and calculated the errors on our one-left-out sample. We adopted the two steps cross-validation technique 50 . First, we found optimal values of α i when i was set to zero and, second, we used these optimal α i values to estimate optimal values of i . To address the overfitting problem and improve model generalization, Waaijenborg et al. 51 suggested that the test sample correlation should be approximately equal to the training sample correlation. In other words, the absolute difference between the estimated canonical correlations of the training and test samples are minimized.
Participants. Data were collected at two sites: the National Institute of Health (NIH) and University of Maryland (UMD). At NIH, participants were recruited from the healthy volunteer database at the National Institutes of Health. At UMD, participants were recruited through the program for Research for Extra Credit supported by the Department of Psychology. All experiments and methods were performed in accordance with guidelines provided in the study protocol (number: 18-CH-0001), which was approved by the Office Of Research Support and Compliance (ORSC) at NIH. In addition, all participants signed an informed consent approved by each site's Institutional Review Board (IRB) prior to the start of the experiment.
Forty healthy right-handed volunteers participated in the experiment at NIH the site; however, data from 27 participants had to be discarded as a result of technical malfunctions which caused either incomplete recording or poor signal quality in one or both modalities. The final sample consisted of seven females and six males. Twenty healthy volunteers participated at UMD. Data from eight participants was considered for further analysis (3 female and 5 male, mean age, 24.62 years). Between both sites, the final sample used for the data fusion algorithm consisted of 21 participants (22-29 years of age; mean age, 24.9 years). Experimental design. Our experiment was adapted from a paradigm used in the EEG mirror neuron literature with infant populations 52 . The paradigm consisted of 15 trials of action observation and 15 trials of action execution. Each trial was followed by a 20 s recovery period during which the participant passively viewed a moving pendulum. For further information on the paradigm we used please see Miguel et al. 53 .  54 . fNIRS data were collected using the Hitachi ETG-4100 system equipped with 10 infrared sources and 8 detectors placed over the somatosensory and parietal regions as in our previous fNIRS study investigating the AON 53 . A total of 24 channels (12 per hemisphere) measured changes in oxygenated hemoglobin (HbO) and deoxyhemoglobin (HbR) concentration.
After collecting experimental EEG/fNIRS data, we recorded the positions of sources and detectors on the head in reference to the nasion, inion, and preauricular landmarks using a 3D-magnetic space digitizer (Fastrak-Polhemus). This accounted for additional variations in cap placement and verified which channels covered each brain region. Figure 1 shows how fNIRS optodes were secured within the elastic of the EEG cap using customdesigned silicone fixtures, as well as the sensitivity profile of the fNIRS probe across the cortex.
Preprocessing. EEG data preprocessing. EEG data were pre-processed using EEGLab software using the method proposed by Debnath and colleagues 55 . EEG channels on the boundary of the electrode net (24 channels) were excluded from analyses because they were contaminated by eye, face, and head movements (  www.nature.com/scientificreports/ the EEGLAB plugin FASTER 56 , artifactual channels were identified and subsequently removed. In order to remove eye blinks, respiratory, and muscle movement noise in the data, we applied extended infomax independent component analysis (ICA). During preprocessing we used the interpolation option to estimate electrical activity of the noisy electrodes mentioned above that had been removed prior to the ICA. This is a popular and wellestablished technique that has been mentioned in the EEGLAB tutorial. Using the ADJUST plugin 57 to EEGLAB, artifactual independent components (ICs) were removed and the remaining data were epoched into − 1.5 s to 1.5 s intervals relative to the "Start Action" (SA) marker for each trial. Therefore, we extracted a total of 30 segments encompassing the 15 trials of each condition (observation and execution). These preprocessed data were used to conduct unimodal EEG analyses to assess AON activity as detected by EEG alone and were also carried forward into the EEG-fNIRS fusion analysis.
fNIRS data preprocessing. The fNIRS signal was processed using HOMER2 (MGH-Martinos Center for Biomedical Imaging, Boston, MA, USA), a MATLAB software package (The MathWorks, Inc., Natick, MA, USA). Only valid trials as identified through behavioral coding were retained in HOMER2 for data processing.
Using the optical density data, we used Principal Component Analysis (PCA) set at 0.9 for movement artifact removal 58 . Data were low-pass filtered at 0.5 Hz to remove physiological influence from the signal and were then used to calculate the change in concentration of the hemoglobin chromophores using to the modified Beer-Lambert Law 59 . Traces were then segmented into 25-s epochs around the trigger stimulus for each trial (start action; SA), with each epoch starting − 5 s prior to each stimulus (0 s). Baseline correction corresponded to the mean HbO/HbR values from − 5 to 0 s. The hemodynamic response function was then generated at each channel for each condition by participant by averaging all of a participant's response curves from all trials within a condition into a single hemodynamic curve for each channel. Due to a greater signal to noise ratio we only used the HbO signal for remaining analyses, similar to previous work in fNIRS 60 .
Mapping fNIRS electrodes via Atlas Viewer. We determined the anatomical regions covered by each fNIRS channel within each participant using the optode coordinates taken from the Polhemus digitizer. These coordinates were then entered into AtlasViewer 61 to scale the Colin29 brain atlas to each participant's head. Atlas-Viewer generated the MNI coordinates of each channel and the corresponding region of interest (ROI) for each channel was identified. Due to differences in head size, channels were not consistently positioned over the same brain region for all participants. Hence, the analyses were conducted using an ROI approach. The ROIs indicated were: postcentral, precentral, supra-marginal, inferior parietal and angular, located in both left and right hemisphere. Using the preprocessed fNIRS data and the ROI data, unimodal fNIRS analyses were conducted to assess AON activity as detected by fNIRS alone and were also carried forward into the EEG-fNIRS fusion analysis.
EEG-fNIRS data fusion. First, we convolved the mean power in the µ frequency band (8-13 Hz in adults) for each EEG channel with the hemodynamic response function (HRF) using a gamma distribution 62 . AON activation in EEG is characterized by decreased power in the µ frequency band, whereas in fNIRS, AON activation is indexed by an increase in HbO over specific brain regions, specifically bilateral superior parietal lobule, bilateral inferior parietal lobule, right supra-marginal region and right angular gyrus 53 . To account for the power decrease in EEG, we used the inverse value of the power in the µ frequency band for EEG prior to applying the HRF convolution. We also used a 1000 Hz sampling rate to resample the EEG data. Since SA markers were considered the set point (0) over a 3-s epoch from − 1.5 s to 1.5 s, a total of 3000 datapoints were extracted. This resulted in the EEG matrix E ∈ R samples×channels R 3000×128 . fNIRS data were averaged over all the 30-s trials, which consisted of − 5 s before the stimulus and 25 s after at each channel. The SA marker was considered as the zero point in time in both the fNIRS and EEG datasets. The fNIRS signal was sampled at a rate of 10 Hz, resulting in an fNIRS matrix for each participant of N ∈ R samples×channels R 300×24 . After projecting the fNIRS dataset to MNI space for each subject, we used the 12 regions of interest identified in AtlasViewer (see preprocessing, Sect. 2.4.3). Hence, the final fNIRS data matrix was ∈ R samples×ROIs R 300×12 . Since CCA requires the same number of data samples (though a different number of features is still possible), we downsampled the EEG datasets. The final EEG and fNIRS datasets had the following dimensions: E ∈ R 300×128 , N ∈ R 300×12 , respectively. Figure 2 illustrates our preprocessing pipeline.
We randomly divided 21 datasets (overall 42 since each participant had two sets of data: one EEG and one fNIRS dataset) into two subsets: a training set and test set. The optimal parameters were obtained from the training dataset by threefold cross validation. Then, we used the estimated parameters on the test sets to predict the correlation between the datasets. The main reason we chose threefold cross validation over the leave-one-out technique is the lower variance provided by this method. In the case of leave-one-out, where 90% of data are used for training and 10% used for testing, the test set is very small, so there is high variation in the performance estimate across different samples of data and across the different partitions of the same data forming the training and test sets. threefold validation reduces this variance by averaging over 3 different partitions, so the performance estimate is less sensitive to the partitioning of the data. We also repeated threefold cross-validation, where the cross-validation is performed using different partitioning of the data to form 3 subsets, and then took its average as well.

Results
Unimodal fNIRS. FNIRS findings suggested that bilateral superior parietal lobule (SPL), bilateral inferior parietal lobule (IPL), right supra-marginal region (SMG) and right angular gyrus (AG) are candidate regions of the human AON, as previously reported in Miguel et al. 53 . Figure 3 represents a summary of our findings.   EEG-fNIRS fusion. We applied our proposed ssmCCA technique to analyze the correlation between EEG and fNIRS datasets of the 21 participants during the action execution and action observation conditions separately. Our ssmCCA aims to find canonical variates (components) that are the best representative of their own modalities and, at the same time, correlate robustly with the corresponding canonical variates of the other modality. Here, we extracted four canonical variates of which one was statistically significant. Table 1 shows these components, their corresponding correlation, and statistical significance for action execution and observation. Notably, the left parietal inferior region showed a significant correlation during action observation (r = 0.48, p = 0.041), and a marginally significant correlation during action execution (r = 0.39, p = 0.055). Figure 5 provides a schematic view of the brain regions associated with action execution and action observation. Moreover, we examined the performance of our ssmCCA approach to two other common CCA approaches, smCCCA and mCCA. In Fig. 6, the correlations of components 1 to 4 are plotted across all three fusion approaches, with ssmCCA showing the largest correlation magnitudes across all four components in both the action and observation conditions.

Discussion
Recent studies have shown great potential in combining multimodal brain imaging data captured from multiple participants by leveraging the rich information each modality provides. In this study, we developed a ssmCCA algorithm to explore relationships between multi-modal datasets to take advantage of the relative strengths of both EEG and fNIRS. The focus of this work was to use multimodal, multi-participant data fusion to characterize the human AON while addressing common problems in medical data processing, such as smaller sample size and other CCA methodological challenges, and show greater specificity in findings using a multimodal approach than the unimodal approaches alone. It is worth noting that the significance of this work is two-fold: Figure 4. Unimodal EEG results. Our EEG results show strong μ desynchronization during execution and observation conditions. Here power synchronization is indicated by warmer colors on the colorbar, while desynchronization is shown using cooler colors. Thus, μ desynchronization can be seen in blue for both action execution and action observation. However, the source of μ desynchronization is unspecified across the cortex due to poor spatial resolution of EEG. www.nature.com/scientificreports/ 1. It provides methodological advancement in the field of multimodel imaging, and 2. It addresses an important research question affording better understanding of the AON. Our proposed ssmCCA is an unsupervised learning algorithm which finds canonical variates without any prior information. As the quality and interpretibilty of the CCA components depend on the usefulness and relevance of each set of extracted features that are active across sets, our proposed ssmCCA model combines mCCA with l1-norm type penalty to automatically remove irrelevant features (sparsity constraint). This feature selection enhancement also mitigates overfitting problems caused by high-dimensional data sets with few correlated components and a small sample size. Adding l2-norm penalty, we assure the correlation between canonical projections is maximized without neglecting the local structure of the data (i.e., the adjacency in brain regions).  www.nature.com/scientificreports/ Figure 6 also shows the advantage of using ssmCCA over smCCA (without considering l2-norm penalty) and mCCA (with no consideration of l1-norm and l2-norm penalties). As shown, applying those regularizations has improved the correlation between the two datasets and provided us with more accurate representation. Not only has our study contributed to the advancement of a method to combine EEG and fNIRS datasets, but also added to the AON literature by identifying a candidate ROI of the AON system in humans.
Our results are consistent with the AON literature. The ssmCCA analysis indicates canonical components in the inferior parietal, postcentral, supra-marginal, and precentral regions of the left hemisphere when participants performed action execution. As is well-established, motor execution engages mostly the contralateral sensorymotor cortex [64][65][66][67] , indexed by the precentral and postcentral regions in our study. Since participants in this study were all right-handed, the major components were detected on the left side of the brain. In the observation condition, we see the extracted components in inferior parietal, supra-marginal, postcentral, and precentral regions in the left hemisphere. Although the same sensory-motor regions are activated, the more dominant regions seem to be more posterior during the observation condition, as found in previous studies 11,68 . More importantly, the left inferior parietal lobe is shown to undergo the highest covariation (equivalent to greater oxyhemoglobin/ decreased μ power) of simultaneous EEG and fNIRS data across both action execution and action observation, indicating that this is the strongest AON candidate region. Several unimodal studies have indicated involvement of many of these regions during action execution and action observation 15,69,70 . Our fNIRS unimodal analysis also implicated these regions amongst a larger set of areas of the brain involved in the human AON Fig. 3 53 . The unimodal fNIRS findings showed widespread activation across the parietal regions, including bilateral superior parietal lobule, bilateral inferior parietal lobule, right supra-marginal region and right angular gyrus, during action execution and action observation, whereas our multimodal analysis was able to identify regions with more specificity than the unimodal approach.
In addition, our results from the EEG unimodal analysis shows μ desynchronization in both execution and observation conditions across the cortex, with limited spatial specificity of μ desynchronization Fig. 4. However, we do see a "hemisphere effect" during execution such that there was greater activity in the left central compared to right central regions, which is consistent with the contralateral effect seen in our multimodal findings. However, there was no hemisphere effect in the observation condition when using the unimodal EEG analysis, thus characterization of the AON through EEG alone was not specific to one hemisphere. Therefore, the findings from our data fusion analysis appear to be consistent with both unimodal analyses while also more specific, pointing to the left inferior parietal lobe as the region that presents the highest covariation between EEG and fNIRS signals during an AON paradigm.
While the results from our study support the use of ssmCCA to fuse simultaneous EEG and fNIRS data in an attempt to better characterize the spatial profile of cortical activation during an AON paradigm, there were limitations that could be addressed in future research. For the sake of understanding the AON, especially given our findings of lateralized components, it will be important for future studies to include left-handed participants. Furthermore, we offered a comparison of the ssmCCA findings to the fNIRS alone and EEG alone findings; however, the EEG alone findings did not utilize source localization estimates. Given that sources for surface EEG are generally understood to originate from throughout the brain and show minimal spatial specificity, EEG source localization estimates would be a useful comparison to ssmCCA findings from the multimodal dataset, or even to use in the ssmCCA to further increase specificity of our findings. Lastly, the simultaneous collection of fNIRS and high density EEG data (128 + electrodes) was challenging. Not only was the initial integration of the two caps difficult, but data loss in one or both modalities led to a notably smaller sample size for the multimodal analyses. One potential solution is to determine whether high density EEG offers an advantage when using ssmCCA, or if a similar result could be obtained using a more sparse array of EEG electrodes. Determining this could minimize cap integration issues while offering the same quality of multimodal findings and will be further investigated. Similarly, in order to clarify the role of ssmCCA in multimodal analyses, it will also be useful to apply ssmCCA to multimodal datasets that are collected both simultaneously and sequentially. It is possible that sequential data collection can provide similar multimodal findings as simultaneous data collection when using ssmCCA, which again would minimize the integration issues mentioned above. Finally, we can also apply ssmCCA to investigate the temporal components of the AON by focusing on the timing aspects of EEG timeseries as opposed to the spatial aspects, as done in the present study. Since EEG captures cortical dynamics in milliseconds, and our paradigm recorded different action markers, such as starting the action and lifting the object, we could characterize how the AON is firing in relation to each of these actions, as well as the sequence AON activation over the course of the paradigm. This would help determine whether the sequence and timing of AON activation is the same during both action execution and action observation or if there is a time lag or difference in the activation sequence. However, this would require enough spatial resolution to delineate which cortical areas are firing at different times over the course of the paradigm, which is why the present study focused on improving spatial resolution with this approach. Nonetheless, ssmCCA could also potentially be used to calculate accurate timing of the AON response between conditions, which may afford additional insight into AON function.

Data availability
All raw nirs and mmf (EEG raw data) files that support the findings of this study along with the method implementation ssmCCA are available on Dash, a NIH Data and Specimen Hub https:// dash. nichd. nih. gov/ Mirror Network in At-Risk Infants [Study ID: currently the data is being transferred. The ID will be generated upon a completion of data transfer].