Functional cortical localization of tongue movements using corticokinematic coherence with a deep learning-assisted motion capture system

Corticokinematic coherence (CKC) between magnetoencephalographic and movement signals using an accelerometer is useful for the functional localization of the primary sensorimotor cortex (SM1). However, it is difficult to determine the tongue CKC because an accelerometer yields excessive magnetic artifacts. Here, we introduce a novel approach for measuring the tongue CKC using a deep learning-assisted motion capture system with videography, and compare it with an accelerometer in a control task measuring finger movement. Twelve healthy volunteers performed rhythmical side-to-side tongue movements in the whole-head magnetoencephalographic system, which were simultaneously recorded using a video camera and examined using a deep learning-assisted motion capture system. In the control task, right finger CKC measurements were simultaneously evaluated via motion capture and an accelerometer. The right finger CKC with motion capture was significant at the movement frequency peaks or its harmonics over the contralateral hemisphere; the motion-captured CKC was 84.9% similar to that with the accelerometer. The tongue CKC was significant at the movement frequency peaks or its harmonics over both hemispheres. The CKC sources of the tongue were considerably lateral and inferior to those of the finger. Thus, the CKC with deep learning-assisted motion capture can evaluate the functional localization of the tongue SM1.


Methods
Subjects. Twelve healthy volunteers (10 men, 2 women; aged 21-35 years; mean age = 25.0 years) were examined. The participants were right-handed, as determined by the Edinburgh Handedness Inventory 23 . None of the subjects had a history of neurological or psychiatric disorders. All the participants provided written informed consent before attending the study. The study was approved by the local ethics and safety committees at Osaka University Hospital (No. 16469-2) and the Center for Information and Neural Networks (CiNet) at the National Institute of Information and Communications Technology (No. 1910280040). The study was conducted in accordance with the Declaration of Helsinki.
Movement tasks of tongue and fingers. The subjects were asked to perform constant, rhythmic side-to-side tongue movements with a slightly opened mouth for at least 3 min in two or three sessions (60-90 s each), separated by 30-s rest periods. They were asked to avoid drastic tongue movements to reduce the effects of touch sensations from the orofacial regions during tongue movement. They were also requested to relax the other orofacial parts during these tasks.
In the control task, the subjects were asked to make constant, rhythmic up-and-down movements of the right index finger over a table for at least 3 min in two sessions (90 s each), separated by a resting period of 30 s. During the resting periods, subjects were permitted to relax their orofacial muscles and swallow the saliva.
We attempted to observe the rhythmic movements of the right index finger in all twelve subjects (right finger condition). Four subjects (Subjects 2, 3, 6, and 12) performed rhythmic movements under both conditions (right and bilateral finger conditions) in a randomized order. The subjects were asked not to touch the table or other fingers during the finger movement tasks.
During the tongue and finger movement tasks, the participants were directed to fixate their gaze at a point on the wall in a magnetically shielded room to avoid any effects of eye movement or visual perception.
Recordings. MEG and ACC recording. Cortical activity was recorded at CiNet using a whole-head MEG system with 360 channels (204 planar gradiometers, 102 magnetometers, and 54 axial gradiometers) (Neuromag® 360, Elekta, Helsinki, Finland). The 204 planar gradiometer channels were used for the analysis. The position of the subject's head inside the MEG helmet was continuously monitored by supplying a current to four coils fixed to the scalp for tracking head movements. An electromagnetic tracker was used to fix the coils according to the anatomical fiducials (Fastrak, Polhemus, Colchester, VT). The participants were seated in an upright position in the magnetically shielded room. To monitor the movements of the right index finger, a three-axis ACC (KXM52-1050, Kionix, Ithaca, NY, USA) was attached to the nail of the right index finger. The ACC cables were fixed to the hand and table using tape to prevent the generation of noise. The MEG and ACC signals were recorded with a passband of 0.03-330 Hz and sampled at 1 kHz.
Video and MRI recording. The movements of each target region (the tongue and index fingers) were video-recorded simultaneously throughout the MEG recording at 120 frames per second (FPS) with a resolution of 1280 × 720 pixels, using a camera (DMC-FZ200, Panasonic, Osaka, Japan). To obtain a frontal view of each target region, the camera was positioned in front of the MEG gantry at a distance of 1.5 m. To record the finger and tongue movements, the zoom function of the camera was used to record the images of both hands, including the index fingers, and the lower part of the orofacial region (from neck to nasion). To match the onset time between the MEG signals and the movement signals obtained with motion capture, the MEG system included a light-emitting diode (LED) that was strobed five times at 1 Hz before and after each movement task and was captured in the video images.
To determine the brain anatomy of each subject, three-dimensional T1 magnetic resonance images (MRIs) were acquired using a 3 T MRI scanner (Siemens MAGNETOM Trio or Vida, Siemens, Munich, Germany).

Data analysis.
Movement signals with the motion capture system. The movements of the tongue and fingers were analyzed offline via deep learning-assisted motion capture with videography using the open-source toolbox, DeepLabCut 22 (https://github.com/AlexEMG/DeepLabCut). DeepLabCut 2.0.6 with CUDA Toolkit 10.1 and Tensorflow 1.12.0 was used to perform markerless position estimation. The "batch size" of the deep neural network (DNN) model was set to one. The image resolution was changed to 960 × 540 pixels. We cropped the frames such that the target regions were clearly visible and manually labeled the tip of the tongue/finger in each extracted frame. For motion tracking, we trained a general model of movements based on ResNet-50 by labeling 100-150 frames selected from the videos for each movement task using k-means clustering 24 . The system was then trained using a DNN architecture to predict the target regions based on the corresponding images. Subsequently, various networks were trained for each target region in 100,000-200,000 iterations as the loss relatively flattened 22,24 . The trained networks could track the locations of the target regions in the full sets of video segments (Supplementary Videos 1, 2). The labeled x-axis (i.e. left-right) and y-axis (i.e. bottom-top) positions of the pixels in each frame were stored and exported in CSV format for subsequent analysis using MATLAB (The MathWorks, Natick, Massachusetts, USA). The Euclidean norm of the two orthogonal (x- and y-axes) signals with baseline correction was used as the movement signal for motion capture.
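As a minimal sketch of the last step above, the following Python function combines per-frame x and y pixel positions into a single movement signal. The function name `movement_signal` is hypothetical, and the baseline correction is assumed here to be removal of the per-axis mean; the text does not specify how the baseline was defined, and the original analysis was performed in MATLAB.

```python
import math


def movement_signal(xs, ys):
    """Return the Euclidean norm of baseline-corrected x/y positions, one value per frame.

    xs, ys: sequences of pixel coordinates (left-right and bottom-top) per video frame.
    Baseline correction is ASSUMED to be subtraction of the per-axis mean.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Norm of the two orthogonal, baseline-corrected components.
    return [math.hypot(x - mean_x, y - mean_y) for x, y in zip(xs, ys)]
```

In practice, `xs` and `ys` would be read from the exported CSV columns for the labeled tongue or finger tip.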
Coherence between MEG and movement signals. The raw MEG signals were spatially filtered offline with the temporal extension of the signal space separation method 25,26 using MaxFilter (version 2.2.12, Elekta Neuromag, Finland). The MEG and ACC signals were down-sampled to 500 Hz. The movement signals obtained with the motion capture system were up-sampled to 500 Hz to match the MEG signals. The LED flashes captured in the video images were used to correct the timing between the MEG signals and the movement signals obtained with motion capture. The coherence spectra between the MEG and rectified movement signals with motion capture were calculated using the method proposed by Welch 27 for the estimation of spectral density, where half-overlapping samples, a frequency resolution of 0.5 Hz, and a Hanning window were used. The following equation was used to determine the coherence (Cohxy).
$$\mathrm{Coh}_{xy}(\lambda) = \frac{|f_{xy}(\lambda)|^{2}}{f_{xx}(\lambda)\, f_{yy}(\lambda)}$$
where fxx(λ) and fyy(λ) respectively denote the values of the auto-spectra of the MEG signals and rectified movement signals with motion capture for a given frequency, λ, and fxy(λ) represents the cross-spectrum between the two signals. We used the position data as movement signals for the CKC analysis with motion capture because the mean CKC values obtained using position, velocity, and acceleration differed by less than 5% (Supplementary Table 1). The coherence spectra between the MEG and Euclidean norm of the three orthogonal ACC signals (x-axis (i.e. left-right), y-axis (i.e. bottom-top), z-axis (i.e. near-far)) from the right index finger were also calculated.
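The resampling and coherence estimate described above can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions rather than the authors' MATLAB implementation: the function names `upsample` and `welch_coherence` are hypothetical, linear interpolation is assumed for the up-sampling from 120 FPS to 500 Hz, and each segment is assumed to be mean-corrected before windowing. The segment length follows from the stated 0.5 Hz resolution at 500 Hz (1000 samples), with half-overlapping Hanning-windowed segments.

```python
import numpy as np


def upsample(signal, fs_in, fs_out):
    """Linearly interpolate a signal from fs_in to fs_out (Hz). Interpolation type is an assumption."""
    signal = np.asarray(signal, dtype=float)
    t_in = np.arange(signal.size) / fs_in
    n_out = int(np.floor(t_in[-1] * fs_out)) + 1
    t_out = np.arange(n_out) / fs_out
    return np.interp(t_out, t_in, signal)


def welch_coherence(x, y, fs=500.0, resolution=0.5):
    """Magnitude-squared coherence |fxy|^2 / (fxx * fyy) from half-overlapping Hann segments.

    Returns (frequencies in Hz, coherence per frequency, number of segments L).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nperseg = int(round(fs / resolution))   # 1000 samples for 0.5 Hz at 500 Hz
    step = nperseg // 2                     # half-overlapping segments
    window = np.hanning(nperseg)
    n = min(x.size, y.size)
    n_freqs = nperseg // 2 + 1
    fxx = np.zeros(n_freqs)
    fyy = np.zeros(n_freqs)
    fxy = np.zeros(n_freqs, dtype=complex)
    count = 0
    for start in range(0, n - nperseg + 1, step):
        seg_x = x[start:start + nperseg]
        seg_y = y[start:start + nperseg]
        X = np.fft.rfft(window * (seg_x - seg_x.mean()))
        Y = np.fft.rfft(window * (seg_y - seg_y.mean()))
        fxx += np.abs(X) ** 2          # auto-spectrum of x (unnormalized sum)
        fyy += np.abs(Y) ** 2          # auto-spectrum of y
        fxy += np.conj(X) * Y          # cross-spectrum between x and y
        count += 1
    if count < 2:
        raise ValueError("need at least two segments to estimate coherence")
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    numerator = np.abs(fxy) ** 2
    denominator = fxx * fyy
    coh = np.divide(numerator, denominator,
                    out=np.zeros_like(numerator), where=denominator > 0)
    return freqs, coh, count
```

The common normalization factors of the spectra cancel in the ratio, so the sums over segments can be used directly. The returned segment count corresponds to L in the significance threshold.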
We checked the epochs comprising artifacts related to unintended orofacial muscle movements such as coughing, which were distinguished through visual inspection. 96.83 ± 1.79 (mean ± standard error of the mean (SEM)) (ranging from 88 to 107 (n = 12)) samples were obtained for the tongue CKC. The epochs for the finger CKC included 96.42 ± 1.52 (ranging from 87 to 106 (n = 12)) samples for the right finger condition and 105.00 ± 3.24 (ranging from 98 to 111 (n = 4)) samples for the bilateral finger condition. According to the method proposed by Rosenberg et al. 28 , all coherence values above Z were considered to be significant at p < 0.01, where $Z = 1 - 0.01^{1/(L-1)}$ and L denotes the total number of samples for the auto- and cross-spectrum analyses.
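The significance threshold above is a one-line computation. The following Python sketch evaluates it for a given number of samples L; the function name `coherence_threshold` is hypothetical.

```python
def coherence_threshold(n_samples, alpha=0.01):
    """Coherence level Z = 1 - alpha**(1 / (L - 1)) above which coherence is significant.

    n_samples: L, the number of samples (segments) used for the spectral estimates.
    alpha: significance level (0.01 in the text).
    """
    if n_samples < 2:
        raise ValueError("at least two samples are required")
    return 1.0 - alpha ** (1.0 / (n_samples - 1))
```

For example, with L = 97 (close to the mean number of samples reported for the tongue CKC), `coherence_threshold(97)` is about 0.047, and the threshold rises as fewer samples are available.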
The cross-correlogram in the time domain was calculated by applying an inverse Fourier transformation to the averaged cross-spectra for the tongue CKC and right finger CKC with motion capture. The cross-correlogram underwent bandpass filtering at 1-45 Hz. Isocontour maps were constructed at the time points at which the peaks of the cross-correlogram were observed. The sources of the oscillatory MEG signals were modeled as equivalent current dipoles (ECDs). To estimate the ECD locations, the spherical head model was adopted; the center of this model was consistent with the local curvature of the brain surface of an individual, as determined by the MRI 29 .
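The first step above, obtaining a time-domain cross-correlogram from the averaged cross-spectrum, can be sketched as follows in Python with NumPy. The function name `cross_correlogram` and the segmenting parameters are assumptions for illustration, and the 1-45 Hz bandpass filtering and dipole fitting described in the text are not included.

```python
import numpy as np


def cross_correlogram(x, y, fs=500.0, nperseg=1000):
    """Inverse Fourier transform of the segment-averaged cross-spectrum of x and y.

    Returns (lags in seconds, correlogram). A peak at a positive lag means
    that y follows x by that delay.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step = nperseg // 2                      # half-overlapping segments
    window = np.hanning(nperseg)
    n = min(x.size, y.size)
    fxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    count = 0
    for start in range(0, n - nperseg + 1, step):
        X = np.fft.rfft(window * x[start:start + nperseg])
        Y = np.fft.rfft(window * y[start:start + nperseg])
        fxy += np.conj(X) * Y                # accumulate the cross-spectrum
        count += 1
    if count == 0:
        raise ValueError("signals are shorter than one segment")
    # Back to the time domain, then reorder so that lag 0 sits in the middle.
    corr = np.fft.fftshift(np.fft.irfft(fxy / count, n=nperseg))
    lags = (np.arange(nperseg) - nperseg // 2) / fs
    return lags, corr
```

The time points of the correlogram peaks are the latencies at which the field maps would then be inspected.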
Only the ECDs with a goodness-of-fit value of at least 85% were accepted. One subject (Subject 11) was excluded from the ECD analysis of the tongue CKC because the goodness-of-fit criterion was not met.
Statistical analysis. The data are expressed as the mean ± SEM. An arc hyperbolic tangent transformation was used to normalize the values of the coherence to ensure that the variance was stabilized 30 . The values of the CKC of the tongue were analyzed between the left and right hemispheres using paired t-tests. The statistical significance level was set to p < 0.05. The ECD locations over the left hemisphere along each axis (x-, y-, and z-axes) were analyzed between the tongue CKC and right finger CKC using paired t-tests with Bonferroni correction.

Results
Figure 1A,B depict representative raw data and power spectra of the movement signals with motion capture and the ACC, respectively, for the right finger condition of Subject 2. Cyclic rhythms were observed at a specific frequency band of the finger movements for both motion capture and the ACC (Fig. 1A). The peak of the power spectra of movement signals with both motion capture and the ACC exhibited the same frequency band of movement rhythms, at 3.3 Hz (indicated by arrows) (Fig. 1B). The peak CKC of the right finger was observed over the contralateral hemisphere at 7.0 Hz with both motion capture (CKC value = 0.61) and the ACC (CKC value = 0.60), around the harmonic frequency band of finger movements (Fig. 1C). The peak CKC of the tongue was observed over the left hemisphere (CKC value: 0.43) and right hemisphere (CKC value: 0.46) at 3.3 Hz, around the harmonic frequency band of tongue movements (Fig. 2A[1,2]). For the right finger condition, the peak frequencies of the power spectrum of the movement signals were the same, at 1.8-3.8 Hz for both motion capture and the ACC (Table 1). The coherence spectra exhibited significant [...] (Table 1). For the bilateral finger condition, the CKC also exhibited peaks for each side of the finger in all 4 subjects (Table 2). For the tongue movements, the peak frequencies of the power spectrum of the movement signals were detected at 1.3-3.3 Hz (Table 3).
The CKC spectra for the tongue showed significant peaks (p < 0.01) at 2.5-5.3 Hz over the left hemisphere and at 2.5-6.0 Hz over the right hemisphere in all subjects, corresponding to the frequency of tongue movements or their harmonics (Table 3). The CKC values were not significantly different between the left (mean, 0.203) and right (mean, 0.188) hemispheres (p = 0.499) (Table 3).
The spatial distributions of the cross-correlogram of the finger and tongue CKC showed peaks over the contralateral and bilateral hemispheres (Fig. 2A[3-5]), respectively. Dipolar field patterns, which were centered on the Rolandic sensors, were observed at the principal peaks of the cross-correlogram (Fig. 2B[1]). The sources for the tongue CKC were estimated to be over the left and right SM1 in 11 subjects (Fig. 2B[2]). For the right finger CKC, the isofield contour maps also showed a clear dipolar pattern (Fig. 2C[1]). The sources for the right finger CKC were located in the SM1 over the contralateral hemisphere in all of the 12 subjects (Fig. 2C[2]). The paired t-tests showed that the locations of the ECDs of the tongue were considerably lateral (mean = 13.99 mm; p < 0.001; paired t-test with Bonferroni correction) and inferior (mean = 20.78 mm; p < 0.001), but not anterior (mean = 5.15 mm; p = 0.029) to those of the finger (Fig. 3).
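A short Python sketch may clarify why the anterior difference (p = 0.029) is reported as not significant even though it is below 0.05. It assumes that the Bonferroni correction was applied across the three axes, which the text does not state explicitly; the function name `bonferroni_significant` is hypothetical, and the values reported as p < 0.001 are represented by their upper bound.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison as significant at alpha divided by the number of comparisons."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    threshold = alpha / len(p_values)   # 0.05 / 3 for three axes (ASSUMED count)
    return {name: p < threshold for name, p in p_values.items()}


# p-values from the text; "p < 0.001" is entered as its upper bound 0.001.
axis_p_values = {"lateral": 0.001, "inferior": 0.001, "anterior": 0.029}
axis_results = bonferroni_significant(axis_p_values)
```

Under this assumption the corrected threshold is 0.05 / 3, roughly 0.0167, so the lateral and inferior differences pass while the anterior difference does not.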

Discussion
Significant coherence between MEG and tongue movement signals was detected over the bilateral hemispheres using deep learning-assisted motion capture with videography. The sources of the coherence activity were detected in the bilateral SM1 of the tongue region, which were found to be considerably lateral and inferior to the finger SM1, corresponding to the classical homunculus. These results suggest that the use of deep learning-assisted motion capture in CKC is a robust and useful approach for evaluating the functional localization of the tongue SM1. The reliability of measuring CKC using motion capture is comparable to that of the conventional ACC-based CKC method 6,31 , as evidenced by the fact that the finger CKC value obtained using motion capture achieved a similarity of 84.9% when compared with the finger CKC value obtained using the ACC. In addition, the power spectrum of movement signals and CKC showed the same peak frequency bands between the motion capture and ACC for all subjects during the finger movement tasks. Moreover, because the finger SM1 region is similar for conventional ACC-based CKC and CKC with deep learning-assisted motion capture (Supplementary Fig. 1), the determination of CKC with deep learning-assisted motion capture has been proven to be reliable. Previous studies involving non-human primates have revealed that several movement parameters, such as position, rotation, direction, and movement velocity, are encoded in the SM1, as determined using the recordings of a single neuron, local field potential, and multi-unit activity [32][33][34][35][36][37] . MEG studies involving humans have also revealed the significance of the SM1 cortex oscillations for encoding the parameters of voluntary movements, such as velocity 38 and acceleration 6,31 .
When studying CKC with motion capture, we evaluated the movement parameters of the target positions of pixels in each image with videography by using a deep learning-assisted motion capture system, since the CKC value with motion capture is not significantly different among approaches using position, velocity, and acceleration (Supplementary Table 1).
Recently, Bourguignon et al. 7 reported that two different approaches, CKC and corticomuscular coherence (CMC), reveal interactions between central and peripheral body parts during motor execution, and that CKC and CMC arise from different mechanisms. CKC, which is coherent with the movement frequency and its harmonics, is mainly related to proprioceptive afferent signals. CMC, which mainly occurs at beta frequency bands during weak muscle contraction, is mainly driven by mu-rhythm-specific neural modulations in efferent signals. Bourguignon et al. 7 also reported that the values of CKC during rhythmic finger movements were substantially higher and easier to detect than those of CMC during isometric finger movements [39][40][41][42][43][44][45][46] . Because a recording time of at least 10 min was required for the CMC of the tongue in previous studies 2-4 , the proposed motion capture approach offers the advantage of a short recording time, approximately 3 min, for the CKC of the tongue. The CKC of the tongue with motion capture also has a technical advantage of enabling free movement because no objects, such as an ACC, electromyography (EMG) electrodes, or tracking markers, are placed on the tongue. When objects are placed on the tongue, they disturb the execution of smooth movement tasks. For example, for the tongue CMC recording, it is sometimes technically challenging to set the EMG electrodes on narrow and wet tongue regions because placing electrodes on the tongue can induce uncomfortable feelings in subjects, resulting in a vomiting reflex. Moreover, because no objects are used on the tongue in this CKC method, the risk of an object being swallowed during a tongue movement task is eliminated. In clinical applications for patients with sensorimotor disorders of the tongue, patients sometimes face difficulties performing smooth tongue movements and are easily fatigued by movement tasks.
Therefore, the short recording time of the tongue CKC technique provides an advantage over the conventional CKC and CMC methods that use ACC devices or EMG electrodes. In a recent clinical setting, Marty et al. 9 reported that the finger CKC is a useful approach for patients with impairment of spinocortical proprioceptive pathways in Friedreich ataxia. As oropharyngeal dysphagia and/or speech disorders are also commonly present in individuals with Friedreich ataxia and worsen with disease duration and severity, the tongue CKC approach might provide electrophysiological evidence of impairment of the corticobulbar proprioceptive pathways.
Damage to the cortical areas representing sensorimotor function of the extremities and language function causes severe dysfunction and seriously decreases the quality of life. Thus, cortical localization of these functions has received much attention for the presurgical evaluation of neurosurgical procedures. In contrast, cortical localization of functions relating to the tongue and other orofacial regions has been relatively undervalued. This is because the cortical representation of orofacial motor function is bilateral, and thus damage to the orofacial SM1 does not apparently induce severe dysfunctions unless the damage is bilateral as well 47,48 . However, dysfunctions in critical orofacial motor functions may still result from damage to the orofacial SM1, severely reducing the quality of life. For example, dysfunctions in critical tongue motor functions can cause dysphagia and silent aspiration. In addition, damage to the orofacial SM1 may cause a cosmetically conspicuous imbalance of facial expression between the left and right sides of the face 48 . Because this unbalanced facial expression is easily recognized in daily communication, the problem should be considered as a target for improvement. Thus, more attention should be paid to preserving motor functions of the tongue and other orofacial regions during neurosurgical operations. Here, the CKC technique may be helpful in evaluating SM1 localization of the orofacial regions in patients with brain lesions observed around the central sulcus.
Previous studies have shown that the finger CKC mainly reflects the proprioceptive input into the contralateral SM1 12,49 , which corresponds to the timing of the strongest deflection of the cortical MEFs associated with self-paced finger movements 15 . Thus, it is likely that the cortical mechanisms of the CKC and MEFs are closely related; therefore, it is reasonable that the tongue CKC was detected over both SM1s without hemispheric dominance, similar to the MEF results obtained in the bilateral SM1 associated with self-paced tongue protrusion tasks with intervals of approximately 10 s 5 .
Previous studies have reported that the CMC for the tongue was detected at 2-10 Hz, which may have been driven by proprioceptive afferents from the tongue muscles to the cortex, as well as at the beta frequency band, during sustained tongue protrusion tasks 2,3 . Because human tongue muscles are rich in muscle spindles 50 , it is reasonable that the tongue CKC may be related to the proprioceptive afferents from the tongue muscles associated with rhythmic tongue movements. A recent study reported that subtle postural tremors during sustained isometric contraction of the finger at low frequency bands between 5 and 11 Hz can be detected by CKC using ACC signals 51 . The presence of physiological tremors may contribute to CMC at low frequency bands during sustained tongue protrusion. The tongue muscles fundamentally move freely; therefore, slight involuntary tremors are observed in the tongues of subjects in tasks involving isometric tongue protrusion. When these subtle movements can be accurately detected using deep learning-assisted motion capture systems, the tongue CKC can be examined during persistent tongue movements.
Ruspantini et al. 52 reported that low oscillatory frequency, which is related to the proprioceptive afferent feedback obtained from the mouth muscles, might be necessary to generate the fine oral movements required to produce speech. Therefore, sensory feedback obtained by muscle spindles of the orofacial regions may contribute to excellent oral motor functions, including swallowing, speech, and mastication. CKC with motion capture has the advantage of being able to track the motions of multiple body parts, as the finger CKC for bilateral finger movements can be evaluated simultaneously. Thus, in the future, CKC with motion capture might be useful for elucidating the cortical mechanisms that enable swallowing and speech through evaluation of the synchronization of signals between the MEG and movements of multiple orofacial regions. In our right finger CKC data, peaks were observed at the first harmonic of the movement frequency in eight and nine subjects for the ACC and the deep learning-assisted motion capture system, respectively. Previous studies reported that when the movement is regular, CKC mainly peaks at the movement frequency and its first harmonic 6,7,49 . Moreover, Parkinsonian 53 and essential tremors 54 induce CMC at the tremor frequency and its first harmonic. The same tendency was observed in the tongue CKC with deep learning-assisted motion capture.
The occurrence of synchronous head movements corresponding to rhythmic tongue movements may yield coherent artifacts in the cross-correlogram. This feature represents a potential limitation of the tongue CKC during repetitive tongue movements, similar to the limitations related to the finger CKC mentioned in previous studies 6,7 . In clinical applications of the tongue CKC, the appearance of artifacts related to head movements must be addressed in patients who struggle to perform repetitive movements. Another potential limitation is the effect of touch sensations from the tongue and other orofacial regions, such as the buccal mucosa and lips, during tongue movement tasks. Because CKC appears to be primarily driven by proprioceptive feedback with no significant evidence of any effect due to cutaneous input 49,55 , touch sensations might not have been a severe problem in the present study. Further studies are required to analyze the effects of touch sensations from orofacial regions on the tongue CKC during tongue movement tasks. We applied single dipole fitting analysis for the source localization with clinical application in mind, as dipole fitting is useful for evaluating the somatotopic localization in a pre-neurosurgical situation. However, from a systematic and physiological point of view, it is also useful to reveal the distribution of cortical activity using distributed source modelling. Further studies are needed to reveal the cortical mechanisms of tongue movements using distributed source modelling analysis. Owing to the latest advancements, human motion capture can be realized using numerous alternative technologies, such as acoustic, mechanical, optical, and magnetic systems. In the future, it is important to evaluate the reliability of CKC with additional motion tracking systems in comparison with the conventional CKC with the ACC and the CKC with the deep learning-assisted motion capture system.
In conclusion, the use of CKC together with deep learning-assisted motion capture is a robust and useful approach for evaluating the functional localization of the SM1 of the tongue; it is free of magnetic noise, allows free tongue movement, and is risk-free because no recording devices are placed on the tongue.

Data availability
The movements of the tongue and fingers were analyzed with deep learning-assisted motion capture using the open-source toolbox DeepLabCut (https://github.com/AlexEMG/DeepLabCut). We also used custom-made MATLAB® (MathWorks, Natick, MA, United States) scripts, created by Prof. Masao Matsuhashi (Kyoto University), for MEG data preprocessing. The custom MATLAB toolbox is available from the corresponding authors upon reasonable request, subject to a formal code sharing agreement with Prof. Masao Matsuhashi. Data presented in this study will be made available upon reasonable request and with permission of the study participants and a formal data sharing agreement.