Introduction

Spin exchange relaxation free (SERF) optically pumped magnetometers (OPM) are quantum sensors for measuring magnetic flux signals with a sensitivity in the order of few 10 fT/\(\sqrt{{\text{Hz}}}\). SERF-OPM are based on a zero-field resonance caused by the Zeeman-effect1 detected be laser spectroscopy of spin-polarized alkali metal vapor2. In recent years, several studies have shown the potential of OPM for measuring biomagnetic signals of the brain, heart, nerves, or muscles, opening up new opportunities for the research and application of human biomagnetism3,4,5,6,7,8,9. In particular, in addition to more traditional applications in magnetoencephalography (MEG)5,10,11, OPM are also increasingly utilized for studying skeletal muscles12,13,14. The spatial flexibility, small physical size (few cubic centimeters) and possibility of bi- or triaxial signal acquisition of OPM, enable the contactless investigation of muscle physiology in space and time14. Magnetomyography (MMG) has several general advantages in comparison to the current gold standard for non-invasive muscle studies, surface electromyography (EMG)15,16. In contrast to electric currents, magnetic fields are far less affected by the different tissue layers between the electromagnetic source and the skin surface, resulting in less distorted signals12,15,17,18. Furthermore, in contrast to MMG, EMG electrodes require contact with the skin where the presence of a charge at the electrode–skin interface creates noise voltages that can interfere with the signal18,19.

Magnetic flux and electric potentials originate from the same ionic currents and have comparable temporal and spectral profiles16,20, but the magnetic flux direction is orthogonal to the source (electric current). Newly available OPM offer up to triaxial simultaneous signal acquisition (\(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}}{{\text{B}}} {\text{x}},\,\,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}}{{\text{B}}} {\text{y}},\,\,{\text{and}}\,\,\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}}{{\text{B}}} {\text{z}}\)) with one sensor, i.e., three-dimensional spatial information on the magnetic flux vector per sensor. The gain in spatial information per sensor in comparison to EMG could be particularly relevant for human–machine interfaces based on muscle signals, such as e.g. prosthesis control21,22,23. Considering this hypothesis and the current progress in EMG-based human–machine interfaces24, in the present proof-of-principle study, we sought to investigate the potential of biomagnetic measurements as a new modality for human–machine interfaces. Specifically, we investigated the ability to differentiate individual finger movements based on EMG and OPM-MMG of the finger flexor muscles in a single human participant.

Methods

Participant

A single human participant volunteered for the study. The participant of the study was male, 28 years old, 1.76 m tall, and 72 kg weight (BMI = 23.2 kg/m2). The participant was one of the authors and gave informed consent before participating in the study. The study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of the University of Tübingen.

Experimental design

The experiment was designed to record the magnetic activity of the right flexor muscles of the fingers Digit II and V (index and little finger; DII and DV). Before the measurements, we conducted a high-resolution ultrasound (Mindray TE7, 14 MHz-linear probe) to determine the position and axis of the muscles that were activated by DII and DV movement and to ensure that these were predominantly activated by movements of the free fingers. Then, muscle activation was measured on the forearm using simultaneous OPM-MMG and EMG (Fig. 1A,B, 4 bipolar EMG and 4 biaxial OPM channels). Every 5 s, an auditory cue indicated when to execute the finger movement. In the first session, the participant flexed DII for 30 trials, in the second session, the participant flexed DV for 30 trials, and in the third session, the participant alternated between DII and DV for 30 trials. The participant was instructed to flex and extend his fingers over a total duration of 1 s and to use a consistent speed throughout the testing. Throughout the experiment, finger movements were measured using a light meter. Involuntary movements of fingers DIII and DIV next to the target fingers would have affected the light meter measurements. Thus, we employed a cast to immobilize these adjacent fingers.

Figure 1
figure 1

(A) Photograph depicting the experimental setting, with colored arrows showing the orientation of OPM axes. (B) Conceptual representation of the sensors modalities and the location they were placed. Colored arrows represent the orientation of OPM axes, with the z-axis (green) being perpendicular to the skin and the y-axis (blue) parallel to the forearm.

Sensors

The muscle strands of the individual fingers (DII and DV) were imaged using high-resolution muscle ultrasound (Mindray TE7, 14 MHz-linear probe) to determine the longitudinal axis of the muscles. All recordings were collected inside a magnetically shielded room (Ak3b, VAC Vacuumschmelze, Hanau, Germany). Here, 8 paramagnetic EMG surface electrodes (Conmed, Cleartrace2 MR-ECG-electrodes) were placed in a bipolar montage along the longitudinal axis of the muscle. A ground electrode was placed on the right shoulder. 4 biaxial OPM (QZFM-gen-1.5, QuSpin Inc., Louisville, CO, USA) were placed in between the EMG electrode pairs about 15 mm above the skin surface. The movement of the fingers was measured using a fiber optic that measured the distance between the finger and the fiber optic (Keyence Digital Fiber Sensor FS-N10).

Data acquisition

The analog output of the OPM electronics and the EMG signals were recorded using the data acquisition electronics of an MEG system (CTF Omega 275, Coquitlam, BC, Canada) that was installed in the magnetically shielded room in which all measurements were performed. OPM and EMG signals were recorded through the direct analog input and EEG input channels of the data acquisition system, respectively. Both the OPM and EMG recordings were acquired with a sampling rate of 2343.8 Hz. The employed OPMs were capable of measuring two components of the magnetic field vector: the y- and z-axis. They provided a magnetic field sensitivity of 15 \({\text{fT}}/\sqrt{{\text{Hz}}}\) in a bandwidth of 3–135 Hz, an operating range below 200 nT, and a dynamic range of a few nanoteslas. To adapt to a non-zero magnetic background field, the sensors are equipped with internal compensation coils that can cancel magnetic background fields of up to 200 nT in the sensing hot rubidium vapor cell (cell size 3 × 3 × 3 mm).

Data preprocessing

Data analysis was performed using Python (Python Software Foundation, version 3.7, http://www.python.org). Data from 8 OPM channels (4 OPM sensors with 2 independent channels for Y and Z axis per sensor) and 4 bipolar EMG channels were demeaned and filtered using a 25–100 Hz band-pass zero-phase fourth-order butterworth infinite impulse response (IIR) filter. We employed the same 100 Hz lowpass for OPM end EMG because of the passband of the OPM sensors (3–135 Hz) and to allow for a fair comparison between both modalities. Line noise was filtered using a 49–51 Hz band-stop zero-phase fourth-order butterworth IIR filter. Then, we extracted the envelope of the signal by taking the absolute value of the Hilbert transformed signal and resampled the data at 200 Hz.

Data sampling

We designed our data analysis pipeline to approximate online detection of finger movements with 100 ms temporal resolution25. This this end, we employed a cross-validated classification of 100 ms temporal window as DII motion, DV motion or no motion based on the EMG or OPM-MMG signals (see also next section). We first aggregated the data from all three recording sessions. Then, to rule out spurious classification due to signal autocorrelation within trials, we split all trials into distinct “training” and “test” trials. From these trials we randomly sampled 100 ms windows. We sampled 500 random 100 ms windows of each class to build the training dataset and 100 random 100 ms windows for the test dataset. DII motion and DV motion windows were drawn from 0 to 300 ms after the onset of each finger movement. No motion windows were drawn from 1500 to 0 ms relative to the finger movement onset. Because of the 5 s trial length and 1 s finger movement duration, these temporal intervals ensured optimal sampling during finger motion and rest.

Classification analysis

We performed classification analysis using a supervised learning approach and a deep convolutional neural network implemented in Tensorflow26. The network architecture (Fig. 3A) consisted of an input layer that received batches of data consisting of matrices \(T\times C\), where \(T\) is the number of time points and \(C\) is the number of channels. Thus, the resulted input was a 3-dimensional array \(N\times T\times C\), where \(N\) is the batch size that we set to 250. This tensor was passed to a residual block27, consisting of a point-wise (kernel size of \(1\times 1\)) 1-dimensional convolutional layer with 128 filters, stride 1 and without the bias term and padding, followed by a batch normalization (BN)28 layer and a gaussian error linear unit (GELU)29 activation function. Then, the residual block continued with a second zero-padded 1-dimensional convolutional layer with a kernel size of \(3\times 3\) and \(C\) filters, stride 1 and without the bias term, followed by a BN layer and GELU function and concluded by adding the input array to the resulted array so far and finally applying a GELU function. After the residual block, the network architecture comprised a zero-padded 1-dimensional convolutional layer with a kernel size of \(3\times 3\) and \(16\) filters, stride 2 and with the bias term and GELU function. Then, the output of this layer was flattened and passed to a Dropout layer with a dropout rate of 0.1. From here, the output array was passed to two fully-connected (FC) layers with \(Z\) units and GELU as activation function (here, \(Z\)=100), both followed by a Dropout30 layer with a dropout rate of 0.1. Finally, the output layer consisted of an FC layer of \(F\) units and softmax activation function (here, \(F\)=3), where each unit encoded either a finger movement or both as not moving. The network was trained using 250 epochs, the categorical cross-entropy as loss function and Adam31 as optimizer with a learning rate of 0.01, \({\beta }_{1}\) as 0.9 and \({\beta }_{2}\) as 0.999. We trained one model on the OPM signal and one on the EMG signal. Model evaluation was performed with a stratified fivefold cross-validation by computing the accuracy of the model between the ground truth and its predictions. Notably, we z-scored the data by computing the moments (sample mean and standard deviation computed on the sampled windows dimension) on the train set and then applying them to the test set to avoid possible confounders.

Feature importance analysis

We investigated how the models generated predictions in the test set by exploiting a recent approach in the field of explainable deep learning32, namely the integrated gradients33, that allowed us to perform a feature importance analysis. For each sample of the test set, we first linearly interpolated the sample (i.e., a \(T\times C\) matrix) with a “baseline” matrix of zeros with the same dimensionality, using 6 levels of transparency linearly sampled from 0 to 1. We passed these interpolated samples to the model and compute the partial derivative (i.e., the gradients) of the loss function with respect to the input. Next, we combined these gradients by computing a numerical approximation of their integral over the interpolated samples using the Riemann sum approximation and normalized them to make sure they were in the same scale. We averaged these values across the temporal dimension of the input and across the samples of the test set to obtain a single value for each fold of the cross-validation scheme and for each channel.

Inter-rate reliability analysis

We assessed the consistency of the OPM-MMG and EMG model predictions by computing two metrics of inter-rater reliability. The first one was the percentage agreement, which simply quantifies the percentage of predictions for which both models predicted the same class. The second metric was Cohen’s Kappa, defined as the probability of agreement between the two models normalized by probability of agreement expected by chance.

Statistical analysis

Statistical analyses were conducted on the comparison between the accuracy values of the two models against the empirical chance level, separately for each model. The empirical chance level was computed using a permutation test approach, by permuting the labels of the train set and repeating the cross-validation for 1000 times to obtain a null distribution. P-value was obtained as the number of values found in the null distribution that exceeded the observed value, while the effect size (Cohen’s \(d\)) was computed as the difference between the observed value and the mean of the null distribution, divided by its standard deviation. We also compared the percentage agreement values, the Cohen’s Kappa values and the integrated gradients values against the resulting null distribution as above. For the direct comparison between the accuracy of the EMG and OPM models, we ran two tests specifically suited for comparing the performance between two classifiers34,35. First, we used the 5 × 2 cross-validation F-test by repeating 5 times a twofold cross validation and testing both models on the same data. Thus, we computed the pseudo f-statistic and the p-value using an F distribution with 10 and 5 degrees of freedom35. Finally, we also compared the models’ performance using the McNemar test, by computing a 2 by 2 confusion matrices between the models’ predictions. Then, we computed the McNemar statistic and the p-value using a \({X}^{2}\) distribution with one degree of freedom34.

Results

We collected data from a single human participant by measuring muscle activation on the forearm (Fig. 1A) to detect flexion movements of the index (DII) or little finger (DV) and simultaneously recording magnetic and electric signals using 4 biaxial OPM sensors and 4 bipolar surface EMG electrodes, respectively (Fig. 1B).

We computed the temporal envelope of 25–100 Hz power of all MMG and EMG signals (see methods), aggregated the data from all the three sessions, and computed the time-course of the signal envelope relative to the onset of DII (Fig. 2A) or DV (Fig. 2B) finger motion for each channel and sensor modality. The visual inspection of movement-locked envelopes suggested that both, EMG and MMG captured muscle activity during finger movement and that EMG had a higher signal-to-noise ratio during finger movement relative to the pre-motion baseline. Furthermore, the pattern of results suggested that channels positioned on the ulnar side of the forearm (EMG-2, EMG-4, OPM-2YZ and OPM-4YZ) were measuring signals more during DV motion, while sensors on the radial side (EMG-1, EMG-3, OPM-1YZ and OPM-3YZ) were measuring signals more during DII movement.

Figure 2
figure 2

(A) Line plots showing the time course of the aggregated data of the DII finger movement across the three sessions and divided by channel. (B) Line plots showing the time course of the aggregated data of the DV finger movement across the three sessions and divided by channel.

Classification analysis

To quantify these results and to compare the two sensor modalities, we classified finger movements from EMG and MMG signals. Specifically, we performed a multiclass classification analysis on 100 ms temporal windows of either EMG or MMG signals using a Deep Residual Convolutional Neural Network27 (Fig. 3A). The 3 classes to predict were whether finger DII was moving, finger DV was moving or both fingers were not moving. We trained two models, one for each sensor modality, using a stratified fivefold cross-validation scheme with a nested sampling procedure.

Figure 3
figure 3

(A) Schematic of the network architecture. Left: full network architecture. Right: residual block construction. (B) 2-dimensional simplex density plots of EMG (red) and MMG (blue) models’ predictions divided by classes. (C) Line plots of the loss function across the epochs for EMG (red) and MMG (blue) models. (D) Confusion matrices of the EMG and MMG models’ performance. (E) Results of the permutation test on the accuracy of the EMG (red) and MMG (blue) models. The density plots are the null distributions obtained separately for each model, while the vertical lines are the observed accuracy values, averaged across folds. (F) Results of the 5 × 2 cross-validation F-test. Scatter dots are individual folds, while error bars represent standard deviation.

We plotted the models’ predictions as a density plot on a 2-dimensional simplex where vertices represented the three classes (Fig. 3B). Visual inspection of these plots showed qualitatively similar distributions across sensor modalities. To assess model convergence, we plotted the loss function (categorical cross-entropy) as a function of the epochs used for training the models for both EMG and MMG models (Fig. 3C). Both models reached their plateau performance around 100 epochs, with the EMG model converging faster than the MMG model.

To quantitatively assess both models’ performance, we next computed the confusion matrix between the models’ prediction and ground truth (Fig. 3D). Both models showed high-performance in the classification task, with similar patterns of errors and accurate predictions. All three states could be significantly (compared to a null distribution) classified by both models with an overall accuracy of 95.31% (\(p\) < \(0.001\), \(d\) = 34.41, 95% CI [32.89, 35.92]) and 89.06% (\(p\) < \(0.001\), \(d\) = 53.74, 95% CI [51.38, 56.11]) for EMG and MMG, respectively (Fig. 3E). The EMG accuracy was significantly higher than the MMG accuracy (Fig. 3F, \(F\)(10,5) = 10.66, \(p\) = 0.009, \(d\) = 2.97, 95% CI [2.78, 3.16], 5 × 2 cvtest; \({X}^{2}\left(1\right)=\) 9.78, \(p\) = 0.002, Cohen’s \(g\) = 0.22, McNemar test).

Feature importance analysis

After having established the models’ performance, we investigated which channels the models relied mostly on to make predictions. We conducted a feature importance analysis exploiting recent advances in explainable methods in deep learning32 such as integrated gradients33. We computed integrated gradients for each channel across cross-validation folds and compared them against a null distribution to test their significant contribution to models’ predictions. We found EMG model significantly relied on all channels (Fig. 4A, all \(p\) < 0.002), even though the channels positioned on the ulnar side of the forearm had a higher effect size (EMG-2 \(d\) = 79.39, 95% CI [75.91, 82.88], EMG-4 \(d\) = 24.27, 95% CI [23.21, 25.34]) compared to the others (EMG-1 \(d\) = 5.03, 95% CI [4.81, 5.26], EMG-3 \(d\) = 5.28, 95% CI [5.04, 5.52]). For the MMG model, we found that it significantly relied on all channels (all \(p\) < 0.018) but OPM-1Y, OPM-3Y and OPM-4Y (all \(p\) > \(0.05\)). We also found that the effect size was generally larger on the ulnar side for the Z-axis (OPM-2Z \(d\) = 18.24, 95% CI [17.43, 19.05], OPM-4Z \(d\) = 3.66, 95% CI [3.49, 3.84]) compared to the opposite side (OPM-1Z \(d\) = 9.74, 95% CI [9.30, 10.17], OPM-3Z \(d\) = 2.79, 95% CI [2.65, 2.93]).

Figure 4
figure 4

(A) Results of the permutation test on the integrated gradients values representing the feature importance scores for each channel and sensor modality, i.e. EMG (red) and MMG (blue for y-axis and green for z-axis). The density plots are the null distributions, while the vertical lines are the observed values, averaged across folds. (B) Consensus matrix between EMG and MMG models’ predictions. (C) Permutation test showing the percentage agreement values between EMG and MMG models’ predictions. The density plot is the null distribution, while the vertical line represents the observed percentage agreement, averaged across folds. (D) Permutation test showing the Cohen’s Kappa values between EMG and MMG models’ predictions. The density plot is the null distribution, while the vertical line represents the observed Cohen’s Kappa, averaged across folds.

Inter-rate reliability analysis

Finally, we investigated the model agreement, by directly comparing their predictions. We computed a consensus matrix, where the row and column entries of the matrix where the MMG and EMG predictions, respectively (Fig. 4B). We found that their predictions were highly aligned. For all three individual states as well as for the average across all states the agreement between MMG and EMG models was significantly higher than expected by chance (Fig. 4C, mean percentage agreement = 85.26%, \(p\) < 0.001, \(d\) = 48.79, 95% CI [46.65, 50.94]). We also observed that the average Cohen’s kappa of 0.45 between the two models’ predictions was significantly higher than expected by chance (Fig. 4D, \(p\) < 0.001, \(d\) = 33.68, 95% CI [32.19, 35.16]).

Discussion

In this proof-of-principle study, we measured muscle activation using OPM-MMG and EMG to detect finger movements. We found that both sensor modalities were able to discriminate DII, DV and non-movement. Our EMG results add to previous studies showing the capability to discriminate finger movements with EMG36,37,38 by demonstrating finger movement discrimination using an end-to-end learning framework based on only 4 EMG channels without explicit feature extraction. Our OPM results are, to the best of our knowledge, the first demonstration of finger movement discrimination with OPM-MMG. We show that also for OPM-MMG an end-to-end learning framework can be adopted for efficient movement discrimination.

We found better performance for the EMG model as compared to the OPM-MMG model. This likely reflects a lower signal-to-noise ratio (SNR) of OPM-MMG. On the one hand, this may reflect a genuinely lower sensor SNR. On the other hand, this may be due to geometrical factors. While the surface-EMG was attached to the skin of the forearm, the OPM sensors were positioned above the skin and independently with respect to the forearm. Thus, both the distance between sensors and muscles and their relative motion was larger for OPM-MMG than for EMG.

The feature importance analysis revealed that both sensor modalities relied more on channels positioned on the ulnar side of the forearm to classify finger movements. This can be explained by the fact that the flexor digitorum muscles (DII-DV) are positioned more on the ulnar side of the forearm than the radial side. Notably, the z-axis (perpendicular to the skin) of the OPM sensor, positioned on the ulnar side of the forearm (OPM-2Z and OPM-4Z), was the feature most used by the model. This highlights the relevance of the spatial axis of MMG measurements and suggests that the skin perpendicular axis may be particularly suited to differentiate the signals of finger flexors. Finger movements involve complex synergies between different muscles and motor units39. Future studies may investigate to which extent the finger-movement specific OPM-MMG signals identified in the present study reflect such synergistic or individual muscle activations.

We found that OPM-MMG and EMG models were highly consistent in their prediction of finger movements. This demonstrates the potential of OPM-MMG as an alternative to EMG, since both do not only have similar classification performance, but also consistent prediction patterns. As OPM-MMG allows contactless measurements, it may be particularly suited for clinical applications in which skin contact is undesirable, such as e.g. measurements in autistic patients40 or patients with skin-diseases.

Some limitations of this study that point towards future research directions need to be considered. First, measurements were performed in a single participant. Thus, although the results provide a proof-of-principle and show that the SNR of OPM-MMG is sufficient for use in single subjects, further studies with larger samples are required to validate the results and estimate population variance. Second, we placed sensors only on the ventral forearm because we focused on palmar flexion movements of the fingers. Future studies may add sensors on the dorsal forearm to also exploit extensor muscles’ signal. Third, we classified finger movements of two well-separated fingers and by using a cast to allow only the fingers of interest to move. Future studies are required to investigate the performance of OPM-MMG for the classification of more complex, naturalistic and unconstrained finger movements. Finally, we collected all data in ideal, magnetically shielded conditions, which is different from real-world scenarios. Magnetic noise in unshielded, real-world scenarios prevents OPM-MMG based on current OPM technology, e.g., for neuroprosthetics applications. However, magnetic shielding may not be required in the future given the rapid technological development of magnetometers based on Nitrogen-Vacancy-Centers (NV-center)41 or Complementary Metal–Oxide–Semiconductor (CMOS)42 technology, which both can operate in unshielded environments. Furthermore, more research is required to investigate the extent to which naturalistic movements in residual magnetic fields will influence muscle signals and how magnetometer can be ideally placed to measure muscle activity in real-world applications.

In sum, our findings show that OPM sensors can be employed to discriminate finger movements and incentivize future applications of OPM in magnetomyography.