Introduction

The development of human–machine interaction (HMI) methods that use surface electromyogram (sEMG) signals, generated by neuromuscular activity and classified according to hand gestures, to communicate with and control smart electronic devices is of significant interest1,2,3,4. sEMG signals are a qualitative measure of neuromuscular activity from which disorders, activation levels, fatigue, and body movements can be detected or analyzed5,6,7,8,9,10. Beyond their biomedical uses, the quality and quantity of sEMG signals allow them to serve as control signals for prosthetic and other artificial devices11,12,13. A single bipolar sEMG electrode pair can recognize only gestures associated with a localized muscle, whereas sEMG electrodes distributed in an array over a portion of the skin can substantially increase the quantity and reliability of the information received14,15. This enables more precise control over multiple artificial devices that may be useful in daily life activities. To provide such functionality, the positioning and interelectrode distance of the sEMG electrodes are crucial. Moreover, skin-mimicking functionalities such as stretchability and breathability are essential for stable, long-term monitoring of sEMG signals, as they provide comfort and conformability.

Controlling an artificial device according to a user's intention requires gesture pattern recognition (PR) algorithms that can discriminate many discrete sEMG patterns, that is, a high number of degrees of freedom. PR algorithms based on classifiers16,17,18,19, probabilistic models20, fuzzy logic21, artificial neural networks (ANNs)22,23, and linear discriminant analysis (LDA)24 have been reported in the literature. These algorithms are trained on a sampled dataset to predict gestures. Among them, neural-network-based methods have attracted increasing interest because they require less handcrafted feature selection. However, most reported systems rely on flexible EMG sensors, whose performance may deteriorate under external physical disturbances and during long-term monitoring.

Herein, we report a stretchable, wireless, multichannel sEMG sensor array combined with an artificial intelligence (AI)-based graph neural network (GNN) for both static and dynamic gesture recognition. A GNN is a neural architecture that operates on graph-structured data25. A graph consists of a set of nodes and edges, and each edge expresses the relationship between two nodes through a real-valued weight. In a GNN, the latent representation of each node is updated by aggregating the representations of its connected neighbors, followed by a nonlinear activation function. GNNs have shown promising results in semi-supervised node classification; however, they have seldom been applied to gesture recognition using sEMG signals. Although channel signals can be treated as nodes and their relations expressed as edges, one limitation of applying GNNs directly to sEMG signals is that it is difficult to decide how each sensor is associated with the others; in other words, appropriate edge weights must be specified. To mitigate this problem, STCN26 introduces learnable parameters, fixed across inputs, that determine the edge weights representing inter-sensor associations. However, such a fixed graph may suffer from spurious signals caused by long-term use or gestures involving heavy movement, and it can fail to capture the gesture-relevant muscles. Moreover, as more sensors become available, the overall time complexity grows polynomially with the number of sensors. Therefore, it is essential to suppress noisy information by constructing an adaptive sensor-association graph that retains only salient information.
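The node update described above can be sketched in a few lines of NumPy. This is a minimal, generic message-passing layer (a row-normalized weighted mean over neighbors with self-loops, followed by ReLU), not the authors' model; the function name `gnn_layer`, the feature sizes, and the random edge weights are illustrative assumptions.

```python
import numpy as np


def gnn_layer(h, adj, weight):
    """One generic message-passing step (illustrative, not the paper's layer).

    h      : (num_nodes, in_dim) node features, e.g. one row per sEMG channel
    adj    : (num_nodes, num_nodes) non-negative real-valued edge weights
    weight : (in_dim, out_dim) learnable projection
    """
    # Add self-loops so each node keeps part of its own representation.
    a_hat = adj + np.eye(adj.shape[0])
    # Row-normalize so aggregation is a weighted mean over neighbors.
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    aggregated = a_hat @ h                     # aggregate neighbor features
    return np.maximum(aggregated @ weight, 0)  # linear transform + ReLU


rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4))           # 8 channels (nodes), 4 features each
adj = rng.uniform(0.0, 1.0, size=(8, 8))  # hypothetical edge weights
adj = (adj + adj.T) / 2                   # symmetric relations
w = rng.standard_normal((4, 16))
out = gnn_layer(h, adj, w)
print(out.shape)  # (8, 16)
```

Here `adj` is the same for every input; an adaptive model would instead recompute it for each input sample.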

To overcome the limitations of prior work, we incorporate a self-attention-based GNN layer into a spatiotemporal gesture recognition model, so that a sequence of more robust adjacency graphs can be constructed to represent how the spatial relevance among sensors evolves for each gesture input. The self-attention mechanism of the proposed GNN model adaptively determines the relationships between the nodes (EMG channels) and sets the edge weights accordingly. To learn temporal information, a CNN-based model processes the raw data or the learned spatial representations. Furthermore, our model is robust to reuse scenarios and more effective at recognizing dynamic gestures, which involve more interactive movements such as rotating the wrist clockwise. The classification accuracy and robustness of our system were evaluated over multiple acquisition sessions, including sessions in which the sensor was repositioned. In addition, the AI algorithm can learn quickly from a single trial per gesture, enabling simple yet robust online learning. The adhesive, stretchable, perforated patch holds the array sensor conformably against the skin, delivering stable sEMG signals and providing skin-like attributes. Long-term, real-time wireless monitoring of sEMG signals with a robust self-attention-based GNN can open various opportunities to control prosthetic and artificial electronic devices with high accuracy.

Results and discussion

Stretchable multichannel sEMG sensor system

Wearable electronic devices with extended skin-like attributes, such as stretchability and breathability (or water vapor permeability, WVP), tend to be more comfortable to wear and more adaptive to external physical movements. Mimicking the modulus and stretchability of the skin requires wearable electronic devices with robustly engineered mechanical designs. Various two-dimensional (2D) filamentary stretchable designs, such as fractal and mesh-based motifs, have been widely applied to integrate soft and hard materials. Electronic devices covering a large portion of the skin require filamentary stretchable designs with skin-like functionalities to accomplish the desired task while preserving the quality and quantity of the measurements without interruption25. Recently, nature- and Kirigami-inspired mechanically deformable motifs arranged in an array have been used to fabricate EMG array sensors for manipulating robotic arms27. In this study, a similar design with slight modifications is adopted to fabricate the stretchable multichannel sEMG array. We present a Bluetooth-equipped onboard printed circuit board (PCB) package for real-time wireless monitoring, combined with a GNN-based algorithm that identifies gestures with high accuracy. The overall working concept is shown in Fig. 1. Figure 1a shows a manikin wearing the stretchable sEMG sensor patch around its forearm, with a wireless acquisition device on top. The details of the sensor structure, fabrication, and integration with the PCB are discussed in subsequent sections. The sensor patch, with the sensor facing upward, is shown in the pristine and stretched (30%) conditions in Fig. 1b. Figure 1c shows the process flow used to achieve high gesture recognition accuracy: the forearm skeletal muscles, under neurological activation, generate EMG signals that are picked up by the on-skin multichannel EMG sensors.
The signals are detected by the data acquisition system and transmitted wirelessly via onboard Bluetooth, and the raw sEMG data collected over time are transformed into an image-like representation that serves as input to the neural network model for high-accuracy gesture recognition.

Fig. 1: Concept image of the stretchable array sEMG sensor with GNN for static and dynamic gestures recognition system.

a Schematic illustrating the human arm wearing the wireless acquisition device integrated with the stretchable array EMG sensor for real-time monitoring of EMG signals. b The wearable array sensor in pristine (0%) and stretched (100%) conditions (scale bar 4 cm). c Process flow of gesture recognition, from the recording of sEMG signals to gesture recognition with an AI-based graph attention network: cross-sectional view of the posterior forearm showing various muscles and the sensor covering a large portion of the muscles at the outermost layer of skin, along with a photograph of the PCB (front and back, scale bar 2 cm). The sEMG signals are fed to the AI model for gesture recognition.

Stretchable filamentary serpentine-based electrodes

Conductive electrodes are pivotal for detecting not only sEMG signals but all types of electrophysiological signals. sEMG signals measured with such electrode pairs by an electromyograph provide a qualitative measure of muscle activity that is essential for the analysis and detection of disorders and for HMI. Single pairs of commercially available Ag/AgCl electrodes are most commonly used, but they are limited to specific conditions, studies, and diagnoses because sampling only a localized muscle yields insufficient information for broader analysis. In contrast, data acquisition from regional muscle activation increases the quantity and reliability of sEMG signals. This can be realized by arranging individual single-lead gel electrodes, but such a setup is inconvenient for daily use. Moreover, gel electrodes dehydrate over time, causing the material to lose its conductivity and thereby increasing the skin–electrode impedance, and prolonged use sometimes causes skin irritation and scarring. Metals, on the other hand, are highly conductive materials with high charge-carrier densities and surface electric flux densities, which offer a high signal-to-noise ratio (SNR). An array of EMG sensors can be constructed from a thin metal layer deposited on a supporting film; however, such arrays lack the stretchability and sweat permeability needed to adhere conformably to the skin for long periods. Long-term wear requires sweat to pass through the sensor system to avoid spurious changes at the skin–electrode interface caused by the accumulation of sweat droplets. Therefore, a sensor system that mimics the physical structure of the skin is essential for a robust and reliable sEMG acquisition system. Structurally stretchable layouts made from intrinsically rigid metal films are promising candidates for fabricating skin-like stretchable sensors.
However, such sensors are mostly fabricated by conventional microfabrication techniques involving patterning, etching, and developing, which are expensive and time-consuming. In contrast, subtractive laser processing has attracted significant interest owing to its facile and ultrafast processing, low cost, user programmability, and roll-to-roll processability27,28. A schematic illustration of the stretchable array EMG sensor design with Cu as the electrode material is shown in Fig. 2a. The sensor has 2 × 10 electrodes, of which eight pairs (aligned vertically with respect to the neutral axis of the sensor design) were used as bipolar measuring electrodes to record the sEMG signals, and the remaining four electrodes were used as reference (ground) electrodes to reduce background noise. The details of the sensor fabrication are provided in the Methods section, and the schematics of the array sensor fabrication and its dimensions are shown in Supplementary Fig. 1. Each electrode is a hexagonal, polyimide (PI)-supported thin metal layout with a Kirigami-based serpentine routing geometry, referred to as a Kirigami-serpentine-metal (KSM) electrode. KSM electrodes with different fill factors and serpentine widths are shown in Supplementary Fig. 2a, b: the first three designs have increasing fill factors with the same serpentine width (Supplementary Fig. 2a), whereas the next three designs have the same fill factor with decreasing serpentine widths (Supplementary Fig. 2b). The skin–electrode impedance and sEMG signals were measured for all designs to select the design for the array sensor. EMG signal performance depends on the impedance of the skin–electrode contact: the larger the contact area, the lower the impedance, and thus the higher the signal amplitude. Designs 1 and 2 have lower fill factors (less contact area with the skin) than the remaining designs, so higher impedances can be expected.
As expected, designs 1 and 2 showed higher impedances than the remaining designs over the measured frequency range (Supplementary Fig. 3). Because designs 3–6 have the same fill factor, similar impedances would generally be expected. However, designs 4, 5, and 6 have increasing perimeters because of a greater number of layers with hinges connecting them, and the longer the perimeter, the longer the trace subjected to laser cutting. The fabricated serpentines are generally narrower than designed because laser ablation removes excess material on either side of the cutting trace. Therefore, the electrodes of designs 3 through 6 showed a slightly increasing trend in impedance. The same trend applies to the EMG signals measured with these electrodes: because of their higher impedances, electrodes of designs 1 and 2 yielded a lower SNR than those of designs 3–6 (Supplementary Fig. 4 and Table 1). The SNR calculation is described in the Supplementary Information following Supplementary Fig. 4. Moreover, an electrode of design 4 is much quicker to fabricate than those of designs 5 and 6. Therefore, design 4 was chosen for fabricating the stretchable array-type sensor.
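A common way to express the SNR of sEMG is the ratio of the RMS amplitude during contraction to the RMS amplitude at rest, in decibels. The exact procedure used here is given in the Supplementary Information, so the sketch below is only an assumed, generic definition; the function name `snr_db` and the synthetic signals are hypothetical.

```python
import numpy as np


def snr_db(active, rest):
    """SNR in dB: 20*log10(RMS of contraction segment / RMS of rest segment).

    Both segments are assumed to be zero-mean (e.g., band-pass filtered).
    """
    rms_signal = np.sqrt(np.mean(np.square(active)))
    rms_noise = np.sqrt(np.mean(np.square(rest)))
    return 20.0 * np.log10(rms_signal / rms_noise)


rng = np.random.default_rng(1)
rest = 0.01 * rng.standard_normal(2000)    # hypothetical baseline noise (mV)
active = 0.2 * rng.standard_normal(2000)   # hypothetical contraction burst (mV)
print(round(snr_db(active, rest), 1))      # close to 20*log10(0.2 / 0.01), about 26 dB
```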

Fig. 2: Large-area sEMG sensor array for continuous real-time monitoring of various static and dynamic forearm gestures.

a Schematic illustration of the stretchable array EMG sensor with eight bipolar channel electrodes and two reference electrodes, highlighted with dotted boxes and text (scale bar 1 cm). b Block diagram of sEMG data acquisition and wireless monitoring on the computer. c Sensor locations (A, B) on the illustrated forearm and the corresponding positions (1–4) on a cross-sectional view of the posterior forearm. d Hand gestures, including rest, static gestures, and dynamic gestures, used to evaluate gesture recognition accuracy with the graph attention network.

Table 1 Comparison of this work with other wearable sEMG systems.

Note that other conductive electrode materials, such as thermally deposited gold, screen-printed silver, spin-coated PEDOT:PSS, and laser-induced carbon, may also be patterned with laser processing, given further improvements in their electrical conductivity and environmental stability (Supplementary Fig. 5). However, Ti/Cu was selected owing to its high conductivity (on the order of 10⁷ S/m), environmental stability, and low cost. The wireless portable acquisition device was integrated with the 8-channel stretchable array sEMG sensor using an anisotropic conductive film (ACF) bonding technique; it interfaces with the KSM bipolar electrode array to record the EMG signals and transmit them, encoded, via Bluetooth to the network interface chip installed on a host computer. The block diagram representing the connectivity of the KSM channel electrodes with the microcontroller unit (MCU) and Bluetooth module is shown in Fig. 2b. Photographs (front and back) of the PCB with its components highlighted and labeled are shown in Supplementary Fig. 6a, and a flowchart of the PCB operation is presented in Supplementary Fig. 6b. Photographs of the array sensor device wrapped around the forearm are shown in Supplementary Fig. 7. The specifications of the analog front end and microcontroller are presented in Supplementary Table 2. The analog front-end circuit consists of a low-power CMOS instrumentation amplifier (INA331), which provides low-noise amplification of differential signals, a notch filter, and amplifiers (OPA313). A microcontroller (STM32WB35) with an ARM Cortex-M4 core is used for digital signal processing. The digitized sEMG data are sent wirelessly via Bluetooth LE 5.2 to a base station connected to a computer or laptop. A 3.7 V lithium battery powers the device during data streaming.
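The notch filter in the analog front end suppresses power-line interference. For offline analysis, a digital counterpart can be sketched as a second-order IIR notch. This sketch is an assumption and does not model the device's circuit: it uses the widely used audio-EQ-cookbook biquad notch, the 400 Hz sampling rate stated in the Preprocessing section, a hypothetical 60 Hz mains frequency (50 Hz applies in many regions), and a hypothetical quality factor Q = 30.

```python
import numpy as np


def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients (audio EQ cookbook form), normalized by a0."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0, -2.0 * np.cos(w0), 1.0])
    a = np.array([1.0 + alpha, -2.0 * np.cos(w0), 1.0 - alpha])
    return b / a[0], a / a[0]


def biquad_filter(b, a, x):
    """Direct-form I difference equation, applied sample by sample."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


fs = 400.0                                    # sampling rate (Hz)
t = np.arange(4000) / fs                      # 10 s of samples
b, a = notch_coefficients(60.0, fs, q=30.0)   # hypothetical 60 Hz mains
hum = biquad_filter(b, a, np.sin(2 * np.pi * 60.0 * t))       # interference
emg_band = biquad_filter(b, a, np.sin(2 * np.pi * 20.0 * t))  # in-band tone
# After the start-up transient, the 60 Hz tone is removed and 20 Hz passes.
print(np.abs(hum[2000:]).max(), np.abs(emg_band[2000:]).max())
```

A higher Q narrows the rejected band but lengthens the start-up transient, which is why the comparison above skips the first 5 s.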

The gesture recognition accuracy of EMG signals corresponding to different gestures varies with the location and positioning of the array sensor, and deviations in sensor positioning across acquisition sessions degrade accuracy. This variation is mainly attributed to muscle-to-muscle crosstalk, so a consistent, well-defined sensor position is required for any intended operation. An optimized sensor position is therefore pivotal for recognizing many gestures (a high degree of freedom). Four main positions at locations A and B were considered for the array sensor: positions 1–3 at location A were obtained by rotation relative to position 1, and position 4 at location B was shifted from position 2 toward the wrist (Fig. 2c). The forearm muscles covered by the array sensor electrodes at positions 1–4, such as the extensors of the wrist (extensor carpi radialis and ulnaris), flexors of the wrist (flexor carpi radialis, flexor carpi ulnaris, and palmaris longus), and flexors of the fingers (flexor digitorum profundus/superficialis and flexor pollicis longus), are illustrated schematically in a cross-sectional view in Fig. 2c. The muscles responsible for the static and dynamic gesture movements, such as wrist flexion, extension, ulnar and radial deviation, finger flexion, and forearm supination and pronation, are listed in Supplementary Table 3. The hand gestures used in this study to evaluate gesture recognition accuracy with the AI-based GNN are shown in Fig. 2d. In total, 18 gestures were used: rest (1), static (13), and dynamic (4).

Preprocessing

The best-suited sensor position on the skin among positions 1–4 was evaluated based on the gesture recognition accuracy of a robust AI-based GNN model. The AI model classifies hand gestures from raw sEMG signals recorded over a given time window. For each window, the raw sEMG signals are converted into the input shape of the AI model using a continuous wavelet transform (CWT)29. The CWT represents the signal in the time–frequency plane as a matrix whose magnitude variations, visualized as a heat map, reveal discriminative features (Fig. 3a). The CWT computes the inner product of the raw input with wavelet functions at varying scales and locations.

Fig. 3: An illustration of preprocessing multichannel sensor array and graph attention neural networks.

a The sensor array data are first arranged into a matrix (100 × 8). The CWT is then applied with 32 scales for every sensor (100 × 32 × 8). Finally, the matrix size is reduced by max pooling and baseline-drift removal for computational efficiency (24 × 7 × 8). b Diagram of the graph attention neural network used in this work. The preprocessed input data (24 × 8 × 7) pass through three temporal blocks and a spatial block, followed by three fully connected layers. Each temporal block consists of one convolutional neural network (CNN) layer and a gated linear unit (GLU). The first CNN layer maps the input dimension from 24 to 16, and the GLU controls how much of the CNN layer's output is passed on. The spatial block filters the output of the temporal block to capture the physical locations of the signals. The fully connected layers classify the final output into 18 gestures.

The CWT provides an image-like representation that can be learned by a neural network model and has been applied in many gesture-recognition systems30,31,32. Representative wavelet functions include the Morlet, Meyer, and Mexican hat wavelets. Varying the scale and location of the wavelet enables analysis at multiple time and frequency resolutions. In addition, the CWT has a denoising (signal-smoothing) effect, because it repeatedly computes the correlation between the raw sEMG signal and the wavelet while adjusting the scale and shift of the wavelet. From the CWT, 32 scales were selected for the input signal, so the input is transformed into a 3D array containing 32 times as many values as the raw sEMG signals. The final shape after the CWT is 32 (scale) × 8 (channel) × 100 (time). According to previous studies, the maximum delay in real-time control systems should remain below 300 ms. We therefore use a 250 ms time window as the input; at our sampling rate of 400 Hz, this window contains 100 samples. In general, the length of a single model input is determined by the sampling rate of the sensor and the latency budget of real-time control.
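The CWT step can be illustrated with a small NumPy implementation of a complex Morlet transform. For a scale s, each coefficient is the inner product W(s, b) = s^(-1/2) Σ_t x(t) ψ*((t − b)/s). The wavelet choice (Morlet with ω0 = 6), the scale set (2 to 33 samples), and the use of coefficient magnitudes are assumptions made for this sketch, since the text does not specify them; only the shapes (a 250 ms window of 100 samples at 400 Hz, 8 channels, 32 scales) follow the text.

```python
import numpy as np


def morlet(t, w0=6.0):
    """Complex Morlet wavelet sampled at (dimensionless) times t."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def cwt_morlet(x, scales, w0=6.0):
    """|CWT| of a 1-D signal; returns an array of shape (len(scales), len(x))."""
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))              # wavelet support: +/- 4s samples
        t = np.arange(-half, half + 1) / s
        # Correlation with psi*((t - b)/s) == convolution with the reversed kernel.
        kernel = np.conj(morlet(t, w0))[::-1] / np.sqrt(s)
        full = np.convolve(x, kernel, mode="full")
        out[i] = np.abs(full[half:half + len(x)])  # align output with x
    return out


fs = 400                               # sampling rate (Hz)
window = int(0.250 * fs)               # 250 ms -> 100 samples
scales = np.arange(2, 34)              # 32 scales (assumed range)
rng = np.random.default_rng(0)
emg = rng.standard_normal((window, 8))  # placeholder (time, channel) window

# Stack per-channel scalograms into scale x channel x time.
tf = np.stack([cwt_morlet(emg[:, c], scales) for c in range(8)], axis=1)
print(tf.shape)  # (32, 8, 100)
```

With ω0 = 6, scale s corresponds roughly to the frequency ω0·fs/(2πs), so this scale set spans about 190 Hz down to about 12 Hz. A library routine such as PyWavelets could replace the hand-written loop.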

Gesture classification with Graph Neural Network

A sliding-window technique is applied to augment the data: consecutive windows overlap, so part of each window is reused in the next input sample. The gesture recognition model therefore sees signals from different time windows of the same gesture. The multichannel sensor array records each gesture for 5 s; however, for real-time recognition, it is desirable to predict gestures from a signal only 250 ms long. Effectively using short signals from the preceding time window is therefore important, and the training data should contain as many chunks as possible for the model to learn the decision boundaries between gestures. The sliding-window technique also captures the continuity of time by overlapping the data between inputs.
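The overlapping-window augmentation can be sketched as follows. The window length (100 samples, i.e., 250 ms at 400 Hz) and the 5 s recording length come from the text; the stride of 25 samples (75% overlap) and the function name `sliding_windows` are hypothetical, since the overlap used is not stated.

```python
import numpy as np


def sliding_windows(x, win, stride):
    """Cut a (time, channel) recording into overlapping (win, channel) windows."""
    starts = range(0, x.shape[0] - win + 1, stride)
    return np.stack([x[s:s + win] for s in starts])


fs = 400
recording = np.random.default_rng(0).standard_normal((5 * fs, 8))  # 5 s, 8 channels
windows = sliding_windows(recording, win=100, stride=25)
print(windows.shape)  # (77, 100, 8): (2000 - 100) // 25 + 1 windows
```

With this stride, consecutive windows share 75 of their 100 samples; a smaller stride yields more, but more strongly correlated, training samples.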

The processed input signals, 32 (scale) × 8 (channel) × 100 (time), are further reduced by spline interpolation-based downsampling with a factor of 0.25, so that the scale and time dimensions shrink to one quarter of their original sizes. Downsampling is applied with max pooling31, which selects the maximum of each 4 (scale) × 4 (time) block of candidate values. Reducing the size of the input signals lowers the computational cost of our model. Furthermore, during long-term monitoring of multichannel signals, various types of noise occur, such as baseline drift and motion artifacts. Baseline drift is a long-term noise caused by changes in the baseline level over time, whereas motion artifacts occur when the relative positions of the electrodes change. After the CWT and pooling, the last row along the scale axis and the last column along the time axis are dropped to remove baseline drift and motion artifacts31. The final input shape is therefore 7 (scale) × 8 (channel) × 24 (time).
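The shape arithmetic of this step (32 × 8 × 100 → 8 × 8 × 25 → 7 × 8 × 24) can be checked with a short NumPy sketch. It shows only the 4 × 4 max pooling over scale and time followed by dropping the last scale row and time column; the spline-interpolation variant mentioned above is not reproduced, and the function name `pool_and_trim` is hypothetical.

```python
import numpy as np


def pool_and_trim(tf, k=4):
    """k x k max pooling over (scale, time), then drop the last scale row and time column.

    tf : array of shape (scale, channel, time); scale and time must be divisible by k.
    """
    s, c, t = tf.shape
    # Split scale into (s//k, k) and time into (t//k, k), then take block maxima.
    pooled = tf.reshape(s // k, k, c, t // k, k).max(axis=(1, 4))
    return pooled[:-1, :, :-1]


tf = np.random.default_rng(0).random((32, 8, 100))
x = pool_and_trim(tf)
print(x.shape)  # (7, 8, 24)
```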

We considered a model that captures both temporal and spatial relations. Decomposing the input data into spatial and temporal views reflects the inherent characteristics of sEMG signal data. A recent study31 used a convolution layer with a slow-fusion technique to obtain temporal and spatial information simultaneously. sEMG signals have also been processed with long short-term memory (LSTM) networks, which are well suited to time-series data, to capture temporal information. Recently, in the field of electroencephalography (EEG), a study applied GNNs by expressing the relationships between brain signals as graphs.

Our model extracts temporal and spatial information separately using two different neural network blocks and composes them. Figure 3b shows the architecture of the graph attention spatiotemporal model, which consists of three temporal blocks and one spatial block in a fixed sequence: temporal-spatial-temporal-temporal.

The temporal block extracts temporal information by sliding along the time axis of the data. The block consists of a convolutional layer and a gated linear unit (GLU) layer33. Because the input is 3D, similar to ordinary image data, a standard convolutional layer is adopted to extract temporal information, with a kernel size of 7 (scale) × 24 (time) and 7 kernels. The number of kernels was set equal to the number of input channels to avoid changing the dimension after the convolution. The kernels aggregate all scales over short time spans to obtain temporal information from the multichannel sensor array. The GLU is a gating activation whose linear path eases gradient flow through the network. The time dimension size C starts at 24 in the first temporal block, where the convolutional layer reduces it to 16. Similarly, the second temporal block reduces the time dimension from 16 to 8.
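A gated linear unit computes GLU(x) = (xW + b) ⊙ σ(xV + c), so the sigmoid gate controls how much of the linear path passes through. The sketch below applies this to the preprocessed 7 × 8 × 24 input, treating the time axis as the feature axis and mapping 24 → 16 with plain linear projections (equivalent to a 1 × 1 convolution over that axis). This is a simplification under stated assumptions, not the convolution configuration described above; the weights are random placeholders and `glu_block` is a hypothetical name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def glu_block(x, w_lin, b_lin, w_gate, b_gate):
    """GLU over the last (time) axis: (x @ W + b) * sigmoid(x @ V + c)."""
    linear = x @ w_lin + b_lin              # linear path
    gate = sigmoid(x @ w_gate + b_gate)     # gate values in (0, 1)
    return linear * gate


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8, 24))                  # scale x channel x time
w_lin, w_gate = rng.standard_normal((2, 24, 16)) * 0.1
b_lin, b_gate = np.zeros(16), np.zeros(16)
y = glu_block(x, w_lin, b_lin, w_gate, b_gate)
print(y.shape)  # (7, 8, 16)
```

Chaining a second block with a 16 → 8 projection reproduces the 24 → 16 → 8 reduction of the time dimension described in the text.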

The main part of our model, the spatial block, constructs a different graph for each input to capture the relationships between channels and aggregates information across them via a GNN34. By observing neighboring EMG signals, the GNN updates each node to reflect its neighbors' features. The sensor's eight channels are treated as nodes, and their relations as edges. The relations between sensors are not global; they are local to each input sample. Therefore, to obtain the relations for every input, a self-attention mechanism is applied to each input sample to identify the connectivity between sensors, and a personalized sensor–sensor relationship graph is constructed. For details, refer to the graph-construction algorithm (self-attention) section in Methods. The attention mechanism assigns high weights to pairs of sensors whose sEMG signals are closely related and low weights to pairs whose signals are unrelated. The relatedness between sensors is calculated using cosine similarity.
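One way to build a per-input graph from cosine similarity is sketched below: node features are compared pairwise, a row-wise softmax turns the similarities into edge weights, and those weights drive neighbor aggregation. The actual self-attention is defined in the Methods section; the softmax normalization, the temperature parameter, and the function names here are assumptions for illustration.

```python
import numpy as np


def cosine_attention_graph(h, temperature=1.0):
    """Per-input adjacency from cosine similarity with a row-wise softmax.

    h : (num_nodes, feat) features of one input sample, one row per sensor channel.
    """
    unit = h / np.maximum(np.linalg.norm(h, axis=1, keepdims=True), 1e-12)
    sim = unit @ unit.T                          # cosine similarity in [-1, 1]
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def spatial_block(h, weight, temperature=1.0):
    """Aggregate each node's neighbors with the input-specific graph."""
    adj = cosine_attention_graph(h, temperature)
    return adj @ h @ weight


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8, 16))         # scale x channel x time
h = x.transpose(1, 0, 2).reshape(8, -1)     # 8 nodes, 7*16 features each
adj = cosine_attention_graph(h)
out = spatial_block(h, rng.standard_normal((h.shape[1], 32)) * 0.1)
print(adj.shape, out.shape)  # (8, 8) (8, 32)
```

Because `adj` is recomputed from each sample's own features, two inputs of the same gesture can still yield different graphs, which is the adaptive behavior the text contrasts with a fixed, learned adjacency.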

A brief comparison with relevant recent baselines is shown in Supplementary Table 4. Six existing methods are compared by considering (1) which signal types are processed, (2) which AI models are developed, (3) whether the dataset consists of static or dynamic gestures, and (4) whether the models can learn adaptive spatial relations for different input signals. MResLSTM35 processes the signal data with an LSTM approach. LSTM-based sEMG classifiers focus more on the temporal relations within each sensor channel than on the relations between channels; for example, MResLSTM36 classifies dynamic motion gestures using an LSTM model. In contrast, a GNN is a neural network widely used to identify spatial relationships between channels that can be expressed as a graph: it aggregates information from a target node's neighbors and updates the node's representation. GNN-based baselines are usually applied to related domains such as EEG and other biosignals, where GNNs are adopted to overcome the limitation of the CNN-based approaches widely used to classify signal data. Because of their small kernels, CNNs identify only local (adjacent) spatial patterns, whereas identifying complex relations between non-adjacent sensors is more crucial. Similarly, a GNN-based approach can be applied effectively in the sEMG domain if appropriate neighbor relationships are set. Self-attention allows our model to construct the graph adaptively, setting appropriate relationships among sensors. The gesture type column indicates which type of gesture (dynamic or static) is used in each dataset. Because many real-world gestures are dynamic, it is important to identify both static and dynamic gestures. Static gestures are held in a fixed posture while signals are recorded, whereas dynamic gestures involve movement, such as rotating the wrist clockwise.
Specifically, the sEMG signals of dynamic gestures are measured while the subject repeats the motion several times. Our model is designed to recognize both types of gestures. The adaptive adjacency learning column indicates whether the model learns a different graph depending on each input signal. Supplementary Table 4 shows that, unlike the other baselines, our model constructs its graph adaptively for each input.

After all temporal and spatial blocks, the resulting representation vector is flattened and passed through fully connected layers to classify the gestures (Fig. 3b). Three fully connected layers with output dimensions of 500, 2000, and 18 (the number of gestures) are used, followed by a softmax activation as the final function. The classification model is optimized with the cross-entropy loss.
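The classification head can be sketched as a NumPy forward pass with the stated layer sizes (500, 2000, 18) and a numerically stable softmax cross-entropy. The flattened input size of 448 (7 × 8 × 8), the ReLU activations between layers, the weight initialization, and the function names are assumptions; the text does not specify them.

```python
import numpy as np


def init_layers(sizes, rng):
    """Random (weight, bias) pairs for consecutive fully connected layers."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x_flat, layers):
    """Fully connected layers with ReLU in between; returns logits."""
    h = x_flat
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


rng = np.random.default_rng(0)
layers = init_layers([448, 500, 2000, 18], rng)   # 448 = hypothetical flattened size
batch = rng.standard_normal((4, 7, 8, 8)).reshape(4, -1)
logits = forward(batch, layers)
loss = softmax_cross_entropy(logits, np.array([0, 3, 9, 17]))
print(logits.shape, float(loss))
```

Folding the softmax into the loss avoids taking the logarithm of very small probabilities; at inference, the predicted gesture is the argmax of the logits, which equals the argmax of the softmax output.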

To find the best position for the array sensor on the skin for the 18 gestures, the four positions described above were compared; position 2 exhibited the highest accuracy, 97.76 ± 0.03% (Fig. 4a). The high accuracy at position 2 may be because its electrode positions align well with the different types of muscles. The gesture recognition accuracy at position 2 was almost the same for three participants over two days, which indicates that position 2 is appropriate for distinguishing each gesture regardless of the participant (Fig. 4b). The raw sEMG signals for the static and dynamic gestures recorded from eight bipolar channels with four reference electrodes at position 2 are shown in Fig. 4c. The recognition accuracy of the AI-based GNN for the 18 gestures at sensor position 2 is shown as a confusion matrix in Fig. 4d. In addition, we checked that our model was not overfitted to the training data by comparing the training accuracy and loss with the validation accuracy and loss. For every subject, the data were split into training, validation, and test sets in an 8:1:1 ratio. Early stopping was employed during training; it terminates training based on the validation loss to prevent overfitting. The difference between the training and validation results is small, indicating that the model is not overfitted to the training data (Supplementary Fig. 8). The individual gestures were free from crosstalk, with an error of < 0.06% among all possible combinations. The SNRs of the commercial gel and KS electrodes are presented in Supplementary Fig. 9. A comparison of the measurements from gel and KS electrodes is valid only if the EMG signals from both are recorded simultaneously.
The electrode placement for simultaneous recording of EMG signals is shown in Supplementary Fig. 9a, and both electrode pairs exhibited the same SNR of ~23 (Supplementary Fig. 9b). The area and fill factor are the two main geometrical parameters of a sensing electrode that determine signal quality: the larger the area and the higher the fill factor, the higher the amplitude. Although the area and fill factor of the KSM electrodes (design 4) are less than 20% and 40% of those of the gel electrodes, respectively, the KSM electrodes exhibited nearly the same SNR with minimal background noise.
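The per-subject 8:1:1 split and validation-based early stopping can be sketched as below. The patience value, the `min_delta` tolerance, and the class and function names are hypothetical; the text states only that training stops according to the validation loss.

```python
import numpy as np


def split_811(n_samples, rng):
    """Shuffle indices and split them 8:1:1 into train/validation/test."""
    idx = rng.permutation(n_samples)
    n_train, n_val = n_samples * 8 // 10, n_samples // 10
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = np.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train, val, test = split_811(1000, np.random.default_rng(0))  # one subject's windows
stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
print([stopper.step(v) for v in losses])  # [False, False, False, False, False, True]
```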

Fig. 4: Evaluation of on-skin sensor position accuracy by graph attention network using 18 gestures.

a Gesture recognition accuracy for the four sensor positions on the skin. b Gesture recognition accuracy for three participants wearing the sensor at position 2 on day 1 and day 2. c 8-channel raw sEMG signals of participant 1 at sensor position 2 for the 18 gestures. d Gesture recognition accuracy at sensor position 2 shown as a confusion matrix.

Mechanics of KS-based stretchable array sensor

Understanding the mechanical behavior of the stretchable array sensor device is also important for identifying the structural limit of the design, by simultaneously examining electrical and mechanical variations under externally applied tensile loads. Before analyzing the array device, we examined the mechanics of the individual designs shown in Supplementary Fig. 2. Photographs of the electrodes of designs 1–6 at 0% and at their respective tensile strains are shown in Supplementary Fig. 10a–e, and the corresponding relative changes in resistance with applied tensile strain are shown in Supplementary Fig. 10f. Under applied strain, the electrodes with narrower serpentine widths and fewer hinges or layers (designs 1, 2, and 4) stretched more than the remaining designs, because hinges constrain the stretchability of the structure: the more hinges, the lower the stretchability. For all electrodes, the applied strain values correspond to the initial rise in ΔR/R0, indicating the development of principal strains in the metal layer. Although the electrodes of designs 1, 2, and 4 showed nearly the same stretchability, design 4 was selected over designs 1 and 2 because it exhibited better EMG signals and skin–electrode impedances. An array of design 4 was constructed to analyze its mechanical compliance under applied strain. For mechanical characterization, the array device required a circuit layout different from that of the sensor, because a closed-loop circuit was needed to apply voltage and measure resistance changes under tensile strain. To evaluate the resilience of the sensor design, the bipolar electrodes of channels 2/6 were selected, because stresses can concentrate in the middle of the structure more than at the channel electrodes at the ends, as shown in Supplementary Fig. 11a.
Supplementary Fig. 11b presents the engineering stress-versus-strain curves, together with the simultaneous electrical behavior, of the array sensor device attached to the patch; the properties of the patch are discussed later in this section. Mechanical and electrical measurements were performed simultaneously using a tensile testing machine (ESM303, Mark 10 Corp.) and an electrical probe station (4200 SCS, Keithley Instruments Ltd.). As the applied strain increased, the load rose steadily up to 25% strain, during which the straight and arc wire segments underwent combined shearing and bending. Above 25% strain, the tensile load continued to increase, representing the elastic region of the overall structural stretchability, until a fracture point was reached at ~45% strain, beyond which the mechanical design broke down. Simultaneously, the relative change in the resistance of channel 3 was negligible up to a strain of ~36% (a load of ~1 N). This indicates that, up to this strain, the principal strain in the metal layer remained far below the yield strain of copper (~0.3%); beyond it, the resistance increased drastically as the internal metal strain rose further, while still remaining below the ~10% fracture strain of copper. Thus, the applied strain caused the metal to break at the mechanical breakdown threshold, as shown in Supplementary Fig. 11a. Furthermore, the results for the sensor with the patch were similar to those for the sensor without the patch (Supplementary Fig. 11b). Cyclic fatigue tests were performed to evaluate the structural durability of the array sensor under continuous cyclic loading. The relative change in resistance under cyclic loading between 0% and 30% strain over 3500 cycles is shown in Supplementary Fig. 11c.
The electrical resistance remained stable and began to increase slightly only after 2000 cycles, which may be due to repeated stress variations in the metal layer. These analyses show that the mechanical structure of the sensor device can endure 30% strain without electrical failure, a value nearly equivalent to the stretchability of human skin37, making it a robust skin-like sensor platform.
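
As a minimal sketch of how an onset strain (such as the ~36% reported above) can be read from a ΔR/R0 curve, the Python snippet below takes the first reading as R0 and returns the first applied strain at which |ΔR/R0| exceeds a threshold. The 1% threshold, the function names, and the readings are hypothetical and for illustration only; they are not the measured data or the authors' analysis procedure.

```python
from typing import List, Optional, Sequence


def relative_change(resistances: Sequence[float]) -> List[float]:
    """Return ΔR/R0 for each reading, taking the first reading as R0."""
    r0 = resistances[0]
    return [(r - r0) / r0 for r in resistances]


def onset_strain(strains: Sequence[float], resistances: Sequence[float],
                 threshold: float = 0.01) -> Optional[float]:
    """First applied strain (%) at which |ΔR/R0| exceeds `threshold` (hypothetical 1%)."""
    for strain, dr in zip(strains, relative_change(resistances)):
        if abs(dr) > threshold:
            return strain
    return None  # resistance never left the threshold band


# Made-up readings for illustration only (not measured data).
strain_pct = [0, 10, 20, 30, 40, 45]
resistance_ohm = [2.000, 2.001, 2.002, 2.004, 2.300, 9.000]
print(onset_strain(strain_pct, resistance_ohm))  # 40
```

A fixed relative threshold keeps the criterion independent of the absolute trace resistance, so the same rule can be applied to every design in Supplementary Fig. 10f.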

Owing to its softness and stickiness, the patch for wearable sensors/electronics plays an important role in attaching the device conformably to the skin, providing a stable device–skin interface for steady monitoring. The stretchable patch also provided mechanical stability to the array device structure during attachment to and detachment from the skin. Holes were fabricated in the patch using a CO2 laser (VLS 3.5, Universal Laser System, USA) to enable skin-like WVP, which was characterized following a previously reported method27. The elastic modulus, adhesion (measured in peel mode), and WVP of the 0.8-mm-thick perforated patch were 172 kPa, ~56 kPa, and ~10 g m−2 h−1, respectively (Supplementary Fig. 12a–c). The adhesion was sufficient to hold the sensor on the skin, whereas excessively high adhesion could damage the skin during peel-off. Moreover, the WVP of the patch lies within the range of skin WVP (8–20 g m−2 h−1)38,39,40. The skin-like stretchable patch caused no rashes, redness, or pressure marks even after four days of continuous wear (Supplementary Fig. 13).
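
The WVP values above are mass fluxes in g m−2 h−1. A minimal sketch, assuming a gravimetric cup-type test (water mass lost through the patch-covered opening divided by the opening area and the elapsed time; the exact protocol of ref. 27 is not detailed here), is shown below. The mass loss, opening area, and duration are hypothetical.

```python
def wvp_g_per_m2_h(mass_loss_g: float, area_m2: float, hours: float) -> float:
    """Water vapor flux = mass lost / (opening area x elapsed time), in g m^-2 h^-1."""
    return mass_loss_g / (area_m2 * hours)


SKIN_WVP_RANGE = (8.0, 20.0)  # g m^-2 h^-1, range quoted in the text

# Hypothetical cup-test numbers: 0.5 g lost through a 20 cm^2 (20e-4 m^2) opening in 24 h.
wvp = wvp_g_per_m2_h(0.5, 20e-4, 24.0)
in_skin_range = SKIN_WVP_RANGE[0] <= wvp <= SKIN_WVP_RANGE[1]
print(f"{wvp:.1f} g m^-2 h^-1, within skin range: {in_skin_range}")
```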

Accuracy comparison with the relevant baselines

Accuracy comparison experiments were performed to evaluate our model against relevant baselines. We selected recent baselines that most closely align with our contributions, namely models that apply GNNs to spatiotemporal modeling and models that classify dynamic gestures. STCN26 is a spatiotemporal model that uses a GNN to capture spatial relations and a CNN-based approach to capture temporal information. It also parameterizes the GNN edge weights, but unifies them for each subject. In contrast, our model parameterizes the edge weights for every input signal through self-attention-based graph construction. MResLSTM36 was included to assess the ability to distinguish dynamic gestures; it uses a Conv-LSTM model to capture spatiotemporal information. Because GNNs usually have fewer parameters, they also process inputs faster than the Conv-LSTM model. The overall accuracies of the three models are listed in Supplementary Table 5, which shows that our model performs well in classifying both dynamic and static gestures. Learning graphs input-wise with self-attention yields better performance than STCN, which learns a graph for each subject. The per-gesture accuracies are listed in Supplementary Table 6, which shows that our model is robust for every gesture compared with the other baselines.

Long-term usability and reusability of KS-based stretchable sensor array

The stretchable sensor array device worn around the forearm is shown in Supplementary Video 1. To confirm long-term usability, sEMG signals were recorded every 24 h over 72 h without detaching the sensor. A static gesture, the front wrist fold, was used for this evaluation. The raw sEMG signals of all channels barely changed over the 72 h period (Fig. 5a), demonstrating the long-term stability of the array sensor. For validation, the long-term usability of the array sensor was compared with that of commercial gel electrodes. The variations in amplitude and SNR over time for channels 1, 2, and 3 of the KS-sensor array and for the commercial gel electrodes are plotted as VRMS versus time in Fig. 5b. Maintaining a high amplitude and a high SNR is particularly important for long-term use. As shown in Fig. 5b, the amplitude and SNR of the sEMG signals of the array sensor remained relatively stable, with an average SNR of 18.42 dB. In contrast, the SNR of the bipolar commercial gel electrodes decreased from 16.61 dB to 7.39 dB over 72 h owing to dehydration of the gel, which increases its resistance and thus the skin–electrode impedance.
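
The text does not state how SNR was computed. One common definition, assumed here, compares the RMS amplitude during the gesture with the RMS amplitude at rest, SNR = 20 log10(VRMS,gesture / VRMS,rest). The sketch below applies it to the 1 s rest / 3 s gesture / 1 s rest protocol described in the Methods, at the 400 Hz sampling rate of the acquisition device; the function names and the synthetic trial are hypothetical.

```python
import numpy as np

FS_HZ = 400  # sampling rate of the custom acquisition device (see Data processing)


def rms(x: np.ndarray) -> float:
    """Root-mean-square amplitude."""
    return float(np.sqrt(np.mean(np.square(x))))


def snr_db(recording: np.ndarray, fs: int = FS_HZ) -> float:
    """SNR of a 1 s rest / 3 s gesture / 1 s rest trial: 20*log10(RMS_gesture / RMS_rest)."""
    rest = np.concatenate([recording[:fs], recording[4 * fs:]])  # first and last second
    active = recording[fs:4 * fs]                                 # middle three seconds
    return 20.0 * np.log10(rms(active) / rms(rest))


# Synthetic trial: 50 Hz tone, amplitude 0.05 at rest and 0.5 during the gesture.
t = np.arange(5 * FS_HZ) / FS_HZ
trial = 0.05 * np.sin(2 * np.pi * 50 * t)
trial[FS_HZ:4 * FS_HZ] *= 10.0
print(round(snr_db(trial), 2))  # 20.0
```

Because the ratio is of RMS amplitudes (voltages), the factor is 20 rather than 10; a tenfold amplitude ratio therefore corresponds to 20 dB.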

Fig. 5: Evaluation of the long-term usability of the on-skin stretchable array sensor through sEMG signal recording and graph-attention-network-based recognition of 18 gestures.

a Raw sEMG signals recorded four times at 24 h intervals at sensor position 2 for the wrist-fold back gesture. b The extracted SNR for channels 1–3 and conventional gel electrodes; the gel electrodes suffer from dehydration, in contrast to the stable thin-metal-film-based KS stretchable electrodes. c Gesture recognition accuracy evaluated every 24 h. Confusion matrices of the gesture recognition accuracy at sensor position 2 d initially and e after 72 h.

In addition, the recognition accuracy for the 18 gestures was evaluated every 24 h for up to 72 h using the AI-based GNN model. Average accuracies of 96.61%, 95.91%, 95.35%, and 94.82% were obtained initially and after 24, 48, and 72 h, respectively (Fig. 5c). This again indicates that the sensor maintains high gesture recognition accuracy without significant degradation. The recognition accuracy for the 18 gestures is shown explicitly as confusion matrices in Fig. 5d, e. The results show that all gestures were distinguished with high accuracy and little cross-recognition between gestures.
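
The per-gesture accuracies in a confusion matrix and the overall accuracy can be computed as sketched below. Whether the reported averages are the overall accuracy or the mean of the per-gesture accuracies is not stated, so both helpers are shown; the labels are a hypothetical three-class toy example rather than the 18-gesture data.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Rows = true gesture, columns = predicted gesture, entries = counts."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)  # unbuffered increment
    return cm


def overall_accuracy(cm: np.ndarray) -> float:
    """Correct predictions (diagonal) over all predictions."""
    return float(np.trace(cm) / cm.sum())


def per_class_accuracy(cm: np.ndarray) -> np.ndarray:
    """Diagonal of the row-normalized matrix, i.e. recall for each gesture."""
    return np.diag(cm) / cm.sum(axis=1)


# Toy labels for 3 gestures (the study uses 18).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm.tolist())                       # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(round(overall_accuracy(cm), 4))    # 0.8333
print(per_class_accuracy(cm).tolist())   # [1.0, 0.5, 1.0]
```

Off-diagonal entries are exactly the cross-recognitions between gestures that the confusion matrices in Fig. 5d, e visualize.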

Similarly, reusability was evaluated qualitatively from the sEMG signals measured with the array sensor, using the same position and gesture as in the long-term usability test. The sEMG signals measured before and after 25 repeated detach–attach cycles are shown in Supplementary Fig. 14a, b. Supplementary Fig. 14c shows the reusability test for the wrist-fold front gesture, plotting the gesture recognition accuracy of the GNN algorithm against the reuse count, and Supplementary Fig. 14d shows the corresponding SNR of channels 1, 2, 5, and 6. Confusion matrices of the 18-gesture recognition accuracy before and after the sensor was reused 25 times are shown in Supplementary Fig. 14e, f. The results are similar to those for long-term usability and demonstrate the robustness of the sensor in delivering stable EMG signals under repeated detaching and attaching.

The long-term usability and reusability tests were performed to assess the robustness of the system to spurious noise arising from prolonged wear and from changes in alignment. These two tests differ from experiments in a general offline setting: they verify the robustness of our method to the noise generated by wearing the sensor for a long time and to the noise from different alignments when the sensor is worn again. The self-attention technique of the AI model helped maintain high accuracy; specifically, data-adaptive graph construction by self-attention makes the model robust to alignment noise and long-term noise.

A comparison of our work with reported sEMG sensor systems41,42,43,44,45,46,47 is presented in Table 1. Unlike other reported EMG sensor arrays with gesture recognition algorithms, the present work uses a self-attention-based GNN to capture input-aware relationships among the multichannel sensors. The self-attention technique is applied to represent the multichannel sensor array as a graph data structure, which enabled us to find the relationships between the sensors and build an input graph adaptively. In the long-term and reusability tests, our AI model robustly identified these relationships between sensors.

In summary, we present a large-area, stretchable, bipolar sEMG sensor electrode array integrated with an onboard Bluetooth real-time acquisition device for wireless monitoring of sEMG signals. We report a system with an AI-based GNN trained to recognize 18 gestures, including static and dynamic gestures, with an average accuracy of ~97%. The sensor system uses a structurally stretchable electrode array supported by a stretchable, sticky, and breathable patch to record stable EMG signals over long-term use. The developed AI model predicted gestures from sEMG signals more accurately than the compared baselines by leveraging both temporal and spatial layers. For the spatial layer, a self-attention-based graph neural network was applied to effectively capture the relations between sensors. Generating these relations for each input enhances robustness to external physical disturbances and improves performance. The sEMG array sensor, mechanically supported by a stretchable patch, delivered long-lasting performance while maintaining nearly the same gesture recognition accuracy over long-term wear and repeated reuse. Owing to its rapid and accurate gesture recognition, the system could enable efficient control in applications ranging from prosthetic hands to virtual reality. It may also be applicable to sign language, a visual mode of communication conveyed through hand gestures and signals between signers that is widely used by deaf and hard-of-hearing people.

Methods

KS-electrodes sensor array fabrication

Commercially available 25-µm-thick PI sheets were used as flexible films to prepare the stretchable multichannel EMG sensor array. The PI film was thoroughly cleaned with acetone, IPA, and deionized water in an ultrasonicator for 10 min. Ti (10 nm) and Cu (100 nm) were deposited on the PI film by e-beam evaporation. The Ti/Cu layer was patterned by laser ablation28 using a 1064 nm IR nanosecond-pulsed laser (IN Laser, Korea); the optimized parameters for ablating Ti/Cu in the desired pattern were a laser power of 2.8 mW, a repetition rate of 20 kHz, and a scan speed of 2300 mm/s. To encapsulate the extended Cu interconnect region, the KS-sensor electrodes were masked with thermal tape, and Nexcare liquid bandage (3M, USA) was sprayed over the film. The thermal tape was removed after the liquid bandage had dried sufficiently. The Ti/Cu-patterned PI film was attached to a glass plate with thermal release tape to hold the sensor during cutting. To provide mechanical stretchability, the film was cut with a UV laser (INNO6, Korea) at a laser current of 5.2 A. Placing the laser-cut film stack on a hot plate at 180 °C for 5 min allowed the stretchable sensor array to be easily peeled off the thermal release tape. The stretchable electrode array was attached to a soft, skin-like, stretchable supporting patch (TNL) with pores 180 µm in diameter at a density of 100 pores/cm2. The pores were made using a CO2 laser (VLS 3.5, Universal Laser System, USA) with an average power of 35 W and a speed of 889 mm/s. The patch allowed sweat to pass through the pores, providing water vapor permeability. The array sensor measured 165 × 44 mm, and the stretchable patch 185 × 65 mm.
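
Two quantities implied, but not stated, by the parameters above can be checked with simple arithmetic: the spacing between consecutive IR laser pulses along a scan line (scan speed divided by repetition rate) and the fraction of the patch area that is open pore (pore density times pore area). The sketch below assumes ideal circular pores and a constant scan speed; the function names are hypothetical and the derived values are illustrative, not reported measurements.

```python
import math

# Values taken from the fabrication description above.
SCAN_SPEED_MM_S = 2300.0       # IR laser scan speed
REP_RATE_HZ = 20_000.0         # IR laser repetition rate
PORE_DIAMETER_UM = 180.0       # patch pore diameter
PORE_DENSITY_PER_CM2 = 100.0   # patch pore density


def pulse_spacing_um(speed_mm_s: float, rep_rate_hz: float) -> float:
    """Distance the beam travels between consecutive pulses, in micrometres."""
    return speed_mm_s * 1000.0 / rep_rate_hz


def open_area_fraction(diameter_um: float, density_per_cm2: float) -> float:
    """Fraction of patch area occupied by ideal circular pores."""
    radius_cm = diameter_um * 1e-4 / 2.0  # 1 um = 1e-4 cm
    return density_per_cm2 * math.pi * radius_cm ** 2


print(pulse_spacing_um(SCAN_SPEED_MM_S, REP_RATE_HZ))  # 115.0
print(f"{open_area_fraction(PORE_DIAMETER_UM, PORE_DENSITY_PER_CM2):.2%}")  # 2.54%
```

Under these assumptions, only about 2.5% of the patch area is open, which suggests that a small open fraction suffices to bring the WVP into the skin-like range quoted earlier.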

On-skin impedance and sEMG measurements

Before the impedance and sEMG measurements, the skin was cleaned with an alcohol swab to remove dust particles that could affect signal quality. Skin–electrode impedance was measured over a range of frequencies by impedance spectroscopy using a multichannel electrochemical analyzer (Ivium Technology, Netherlands), with the electrodes placed as for the EMG measurements (Supplementary Fig. 9). The KS-based sensor array was roughly aligned and attached at a specific location on the upper forearm (Fig. 4b). The sEMG signals of the sensor array were mainly measured with a customized PCB acquisition device specially designed to record eight channels of bipolar electrodes, equipped with a Bluetooth communication system and a 3.7 V battery. To classify the 18 hand gestures with the AI model, signals were recorded four times for 5 s each; to check the sEMG signals visually, 5 s recordings (1 s of rest, 3 s of gesture, and 1 s of rest) were used. Separately, to compare commercial Ag/AgCl electrodes with the KS-based sensor, sEMG signals were measured simultaneously with a commercially available MP36 system (BIOPAC, USA) using a 30–250 Hz EMG band-pass filter and a 60 Hz notch filter.
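
A minimal sketch of a 30–250 Hz band-pass plus 60 Hz notch chain using SciPy is shown below. The filter type (Butterworth), order, notch quality factor, and zero-phase filtering are assumptions, since the MP36 filter implementation is not described; a hypothetical 1 kHz sampling rate is used so that the 250 Hz upper edge lies below the Nyquist frequency.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS_HZ = 1000.0  # hypothetical sampling rate; must exceed 2 x 250 Hz


def emg_filter(x: np.ndarray, fs: float = FS_HZ) -> np.ndarray:
    """Zero-phase 30-250 Hz Butterworth band-pass followed by a 60 Hz notch (illustrative design)."""
    sos = butter(4, [30.0, 250.0], btype="bandpass", fs=fs, output="sos")
    b_notch, a_notch = iirnotch(60.0, Q=30.0, fs=fs)
    return filtfilt(b_notch, a_notch, sosfiltfilt(sos, x))


# Synthetic input: 100 Hz "EMG" tone + 60 Hz mains hum + DC offset.
t = np.arange(0, 4.0, 1.0 / FS_HZ)
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 60 * t) + 1.0
y = emg_filter(x)
core = y[1000:-1000]  # discard 1 s at each end to avoid edge transients
rms_core = float(np.sqrt(np.mean(core ** 2)))
print(round(rms_core, 3))  # close to 1/sqrt(2), i.e. only the 100 Hz tone remains
```

The band-pass removes the DC offset and the notch removes the mains component, leaving the in-band tone essentially unchanged.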

Gesture recognition experiments

Participants were asked to perform 18 hand gestures: rest (1), static gestures (8 wrist and 5 finger gestures), and dynamic gestures (4). For static gestures, the gesture was held steadily for a set period, whereas dynamic gestures involved continuous movement over time. Three participants took part in the study, and one male participant performed four experiments. The first experiment was designed to identify the location on the upper forearm at which the array sensor yielded the most informative sEMG signals for the 18 hand gestures; four sensor positions, chosen according to the muscles most relevant to these gestures, were tested. The second experiment was performed to assess differences in gesture recognition among the participants. All participants performed the 18 gestures with the sensor at the same position, and the measurements were repeated on different days. In Experiment 3 (long-term usability), the male participant wore the array sensor for 72 h without removing it. The sEMG signals of the 18 gestures were first measured to establish the baseline accuracy, and the measurement was repeated every 24 h for 72 h. In Experiment 4 (reusability), the array sensor was repeatedly attached to and detached from the same forearm position, and the recognition accuracy for the 18 gestures was measured to qualitatively evaluate reusability.

Graph-construction algorithm using self-attention

The construction of an input graph has a significant effect on the performance of GNN models. Therefore, it is necessary to construct graphs with different shapes depending on the incoming sensor signals. Previously, a self-attention mechanism48 was proposed to determine the relative importance of each input segment. Here, the self-attention method is adapted to construct the graph and is defined as

$$\mathbf{A} = \mathrm{softmax}\left( (\mathbf{X}\mathbf{W}_{Q})(\mathbf{X}\mathbf{W}_{K})^{T} \right)$$
(1)

where \(\mathbf{W}_{Q}, \mathbf{W}_{K} \in \mathbb{R}^{d_{\mathrm{feat}} \times d_{\mathrm{hid}}}\) are learnable parameter matrices, \(d_{\mathrm{feat}}\) is the input feature dimension (the dimension of the hidden nodes in the preceding temporal layer), and \(d_{\mathrm{hid}}\) is the hidden dimension. \(\mathbf{X} \in \mathbb{R}^{|\mathrm{node}| \times d_{\mathrm{feat}}}\) is the feature matrix of the multichannel sensor array, and \(|\mathrm{node}|\) is the total number of nodes. \(\mathbf{A} \in \mathbb{R}^{|\mathrm{node}| \times |\mathrm{node}|}\) is a matrix in which \(A_{ij}\) is the relationship weight between sensors i and j. Eq. (1) is the attention-weight part of the self-attention equation: each weight in \(\mathbf{A}\) is computed from the signals of all channels, and \(\mathbf{A}\) is then used as the adjacency matrix of the GNN. Because \(\mathbf{A}\) is computed from each input \(\mathbf{X}\), with \(\mathbf{W}_{Q}\) and \(\mathbf{W}_{K}\) updated during training, the model adaptively obtains a different adjacency matrix for each input. This input-adaptive construction makes the model robust to long-term noise and improves the identification of dynamic gestures. The intensity of the adjacency matrix is shown in Supplementary Video 2.
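
The graph construction can be sketched in NumPy as follows, computing \(\mathbf{A}\) as the row-wise softmax of \((\mathbf{X}\mathbf{W}_{Q})(\mathbf{X}\mathbf{W}_{K})^{T}\), the ordering consistent with the stated matrix dimensions. The row-wise softmax, all layer sizes other than the 8 channels, and the single propagation step in the hypothetical `graph_layer` are assumptions for illustration, not the authors' architecture.

```python
import numpy as np


def row_softmax(z: np.ndarray) -> np.ndarray:
    """Softmax over each row; subtracting the row max keeps exp() numerically stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def attention_adjacency(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """A = softmax((X W_Q)(X W_K)^T), shape (nodes, nodes)."""
    q = x @ w_q  # (nodes, d_hid)
    k = x @ w_k  # (nodes, d_hid)
    return row_softmax(q @ k.T)


def graph_layer(x: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical GNN step: aggregate neighbours with A, project with W, apply ReLU."""
    return np.maximum(adj @ x @ w, 0.0)


rng = np.random.default_rng(0)
n_nodes, d_feat, d_hid, d_out = 8, 16, 8, 4  # 8 sensor channels; other sizes hypothetical
w_q = rng.normal(scale=0.3, size=(d_feat, d_hid))
w_k = rng.normal(scale=0.3, size=(d_feat, d_hid))
w_g = rng.normal(scale=0.3, size=(d_feat, d_out))

x1 = rng.normal(size=(n_nodes, d_feat))  # temporal-layer features of one input window
x2 = rng.normal(size=(n_nodes, d_feat))  # features of another window
a1 = attention_adjacency(x1, w_q, w_k)
a2 = attention_adjacency(x2, w_q, w_k)
h1 = graph_layer(x1, a1, w_g)

print(a1.shape, np.allclose(a1.sum(axis=1), 1.0))  # (8, 8) True
print(np.allclose(a1, a2))  # False: same weights, different input -> different graph
print(h1.shape)  # (8, 4)
```

The second print illustrates the point made in the text: with fixed \(\mathbf{W}_{Q}\) and \(\mathbf{W}_{K}\), each input still produces its own adjacency matrix.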

Data processing

The dataset was recorded from three healthy subjects (two males and one female, aged 20–29) and comprised both static and dynamic gestures. During the experiment, subjects were asked to keep the device on without removing it until the last measurement. Each dataset collected every 3 h is referred to as a 'round'; a round comprises four repetitions, and each repetition includes 5 s of raw signal data for each of the 18 gestures (including 'rest'). To learn diverse temporal patterns for recognizing both short and long hand gestures, the window size (the length of a single time chunk of the input) was set to 250 ms, which corresponds to 100 samples at a sampling rate of 400 Hz. The first 100 samples formed the first input unit of the deep neural network model, and subsequent input units were obtained with a sliding window with 95% overlap (237.5 ms, or 95 samples), so that each new input unit advanced by five samples. The raw sEMG signal was transformed with the continuous wavelet transform (CWT) using the Mexican hat wavelet, and the 100 time samples were downsampled, giving an input shape of 7 (scale) × 8 (channel) × 25 (time). Finally, the last column of the time scale was dropped to obtain an even size for convenience of processing. For each round, 144,000 sEMG samples (5 s × 18 gestures × 4 repetitions × 400 Hz) were acquired per channel, that is, 8000 samples per gesture, with four repetitions performed in each round. The 36,000 (144,000/4) sEMG samples of each repetition were transformed into ~7900 ((36,000/4) × 7/8) inputs to the model; the reduction results from the downsampling and the dropped scale column.
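
The windowing and CWT steps can be sketched as follows for one 5 s repetition on eight channels. The 100-sample window and 5-sample stride follow the text; the seven scale values (1–7), a wavelet support of ten samples per unit scale, and decimation by keeping every fourth sample are assumptions, and the final column-dropping step is not reproduced. The Mexican hat (Ricker) wavelet is implemented directly because `scipy.signal.ricker` and `scipy.signal.cwt` are deprecated in recent SciPy releases; all function names are hypothetical.

```python
import numpy as np

FS_HZ = 400
WINDOW = 100              # 250 ms at 400 Hz
STRIDE = 5                # 95% overlap: each new window advances 100 - 95 = 5 samples
SCALES = np.arange(1, 8)  # 7 wavelet scales; actual scale values not reported (assumption)
DOWNSAMPLE = 4            # 100 -> 25 time steps (decimation by striding is an assumption)


def mexican_hat(points: int, a: float) -> np.ndarray:
    """Mexican hat (Ricker) wavelet sampled at `points` positions with width `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt(x: np.ndarray, scales) -> np.ndarray:
    """CWT of a 1-D signal: one row per scale, same length as x."""
    rows = []
    for a in scales:
        w = mexican_hat(min(10 * int(a), len(x)), float(a))
        # The wavelet is symmetric, so convolution equals correlation here.
        rows.append(np.convolve(x, w, mode="same"))
    return np.stack(rows)


def sliding_windows(sig: np.ndarray) -> np.ndarray:
    """(channels, samples) -> (n_windows, channels, WINDOW)."""
    starts = range(0, sig.shape[1] - WINDOW + 1, STRIDE)
    return np.stack([sig[:, s:s + WINDOW] for s in starts])


def to_model_input(window: np.ndarray) -> np.ndarray:
    """(channels, WINDOW) -> (scales, channels, WINDOW // DOWNSAMPLE)."""
    coeffs = np.stack([cwt(ch, SCALES) for ch in window], axis=1)  # (scales, channels, WINDOW)
    return coeffs[:, :, ::DOWNSAMPLE]


# One 5 s repetition of one gesture on 8 channels (random stand-in data).
rep = np.random.default_rng(0).normal(size=(8, 5 * FS_HZ))
windows = sliding_windows(rep)
x0 = to_model_input(windows[0])
print(windows.shape, x0.shape)  # (381, 8, 100) (7, 8, 25)
```

With a 5-sample stride, a 2000-sample repetition yields (2000 − 100)/5 + 1 = 381 overlapping windows per gesture.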

Ethical consideration

The experiments involving human subjects were performed with the full, written informed consent of all participants. The Institutional Review Board of the affiliated university approved the study as a pilot study to obtain preliminary data (IRB no. SKKU-2022-12-003).