Introduction

Affective computing and remote emotion monitoring are scientific fields that can have a strong impact on digital medicine, especially in the realm of digital solutions for mental-health management1. Mental health and mental states exist on a complex continuum, experienced differently from one person to another2. It would therefore be extremely beneficial if people could monitor clinically significant parameters known to be related to mental health (e.g., facial expressions and emotions) with the same ease as they track heart rate and daily step count. Effective mental-health management would also have a positive effect on cognitive aging, as poor mental health accelerates age-related cognitive decline3. Cognitive aging is a worldwide emergency in and of itself: in 2020, more than 20.6% of the EU population was aged over 64, and this percentage is projected to increase by at least 3% every 10 years4. In vulnerable populations with diagnosed medical problems, mental health disorders such as depression double the risk of cardiac mortality, both in people with and without cardiac disease. This relation has been confirmed for major depression, as well as in volunteers with elevated depressive traits who do not meet the criteria for a formal diagnosis5. Another study linked depressive symptoms to ischemic heart disease (coronary heart disease, heart attack, and angina)6.

The face is one of the most expressive parts of our body7 and has an important role in communicating emotional and mental states, as well as behavioural intentions. The relationship between facial expressions and emotions has been studied since the late nineteenth century8. In the late twentieth century, Ekman’s ground-breaking studies first hypothesized that emotions and facial expressions are universally recognizable9. In recent years, this direct mapping between a facial expression and a given emotion has been significantly challenged. Meta-analyses of autonomic physiology, behaviour, and even brain imaging all report little evidence for consistent and specific facial-expression-derived ‘fingerprints’ for different categories of emotion, such as anger, sadness, and fear10,11. Instead, whilst emotions and affective states are related to facial expressions, the relationship is likely both personalized and context-based12. Thus, one should be careful when analyzing emotional states through the lens of facial expression recognition systems that assume a direct correspondence between expressions and emotions.

Given the relationship between facial expressions and mental states, a variety of remote sensing approaches have been proposed in the past (for a more detailed overview, please refer to survey studies in the field13,14). Among recent studies on head-worn sensors for facial tracking, several technological approaches have been proposed, including facial tracking based on EMG, EOG, capacitive sensors, and cameras. Regarding EMG-based sensing, Sato et al.15 and Gjoreski et al.16 investigated devices with EMG sensors positioned over specific facial muscles: the frontalis (left and right side of the forehead); the orbicularis (left and right side of the eyes); the zygomaticus (left and right side of the cheeks); and the corrugator muscle (between the eyebrows). EOG sensing has been utilized in several studies17,18,19. Yeo et al. presented JINSense smart glasses with a focus on gesture recognition and human–computer interaction17. Li et al. developed KissGlass, a smart-glasses-based method for recognizing different kinds of cheek-kissing gestures18. Rostaminia et al.19 developed W!NCE, a smart-glasses-based method for monitoring upper-face activation (brow movement, cheek movement, nose wrinkling, and blinking). In addition to the EMG- and EOG-based approaches, studies have explored smart glasses in combination with camera-based sensing20 and capacitive sensing21. In contrast to many commercially available smart glasses, the OCO sensors used in this work are always oriented towards the user/wearer, thus avoiding one of the biggest privacy issues that smart glasses have. For example, Google Glass—and similar solutions from Sony (SmartEyeGlass)22 and Vuzix (M300)23—introduce privacy concerns, as these devices can continuously record video using a front-mounted camera, compromising bystanders’ privacy24. Regarding other face-tracking tools, camera-based systems are frequently used to track facial expressions and infer emotional states in the wild; however, their effectiveness is limited by restrictions on sensor location, lighting, intrusiveness, and movement.

This study investigates the feasibility of using novel optomyography (OMG) based glasses for recognizing four facial expressions: smile, frown, eyebrow raise, and squeezed eyes. Activation of the muscles involved in creating these expressions is used to determine emotional response, specifically valence25 and pain26. The OCOsense system consists of multi-sensor wearable glasses (Fig. 1). The OCO sensors are non-contact optical sensors that read facial skin movement in three dimensions, providing a higher-resolution signal (± 4.7 μm X- and Y-axis, ± 4.0 μm Z-axis) than the average camera-based solution, and they have advantages over EMG-based systems. Although EMG is perhaps the most sensitive way to objectively track facial movement, the electrodes require firm and constant contact with the skin in order to achieve an acceptable signal-to-noise ratio, which is not practical in a glasses format. The OCO sensors, in contrast, are optically based, removing the requirement for direct skin contact, and are able to function accurately from 4 to 30 mm away from the skin.

Figure 1
figure 1

Source of the right image28.

OCOsense glasses and corresponding sensor placement. The coloured rectangles represent the OCO sensors within the frame, over major facial muscles: frontalis (red), zygomaticus (green), and corrugator (purple). The blue OCO sensors are positioned over the temples. Also, there is a 9-axis IMU and altimeter (yellow).

The approach most similar to ours has been proposed by Masai et al.27, who used 16 photo-reflective sensors mounted on a glasses frame. In contrast to our approach, their sensors measure skin movement around the eyes, focusing on the monitoring of eye-based gestures (e.g., blinks, winks, eye movements, and eye gaze) rather than face-muscle movements.

More specifically, in this study we present analyses of:

  • The relationship between the OCO sensor data and facial expressions.

  • The influence of head movement on OCO sensor data.

  • The influence of glasses position on OCO sensor data.

  • The usage of machine learning for facial expression recognition from OCO sensor data.

The experimental results demonstrated that the OCO sensors can capture distinct variations in the cheek and brow movements for each of the facial expressions studied. Specifically, cheek movements were primarily observed during the smile and squeezed eyes expression, in contrast to the frown and eyebrow raise expression. Conversely, brow movements were primarily detected during the eyebrow raise and frown expression, compared to the squeezed eyes and smile expression. The study also found that head movement does not have a significant impact on the skin movement measurements obtained by the OCO sensors during different facial expressions. However, the position of the glasses on the face does, particularly when monitoring brow movement during a frown, eyebrow raise, or squeezed eyes expression. The measurements obtained from the OCO sensors were also used to train a machine learning model to recognize the smile, frown, eyebrow raise, and squeezed eyes expression, as well as when the face enters a neutral state. The model was evaluated under different conditions expected to mimic real-world use, including variations in expression intensity, head movement, and glasses position, demonstrating an overall accuracy of 93% (0.90 f1-score). These findings highlight the ability of the OCO sensors to accurately capture facial movements associated with different facial expressions in various conditions, providing evidence for their potential use as a reliable tool for monitoring and detecting changes in facial expressions.

Results

Relationship between facial expressions and OCO sensor data

Low-intensity expressions

To assess the ability of the OCO sensors to detect movement of facial muscles during different facial expressions, we conducted statistical analyses comparing the measurements of the brow and cheek sensors during different voluntary facial expressions. To perform the statistical tests, from the brow and cheek OCO sensors (right and left cheek, right and left eyebrow) measuring the skin movement along the X-, Y-, and Z-axis, we calculated the mean cheek and brow movements for each expression type, for each subject. For the statistical analysis on the low-intensity expressions, we used the data where the participants were performing both short (1 s) and long (3 s) low-intensity expressions, and the mean movement for a specific expression type was calculated over all performed expressions of that type by each participant. This procedure led to n = 27 (number of participants in the dataset) tuples of four values, where each value represented the mean cheek/brow movement for each expression type (eyebrow raise, frown, smile, and squeezed eyes). To test if there is a statistically significant difference in the mean cheek and brow movements detected by the glasses for all pairs of performed expressions—eyebrow raise vs. frown, eyebrow raise vs. smile, eyebrow raise vs. squeezed eyes, frown vs. smile, frown vs. squeezed eyes, and smile vs. squeezed eyes—we used the Wilcoxon signed-rank (paired) test with Bonferroni correction (alpha = 0.05). Figure 2 shows the mean cheek (left plot) and brow (right plot) movements during different low-intensity expressions, presented on the X-axis (from left to right: eyebrow raise, frown, smile, squeezed eyes), and the results from the statistical test.
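As an illustration, the paired-comparison step described above could be implemented as follows. This is a minimal sketch under our own assumptions (it is not the authors' code): the data are assumed to be in a tidy pandas DataFrame with illustrative columns 'participant', 'expression', and a combined movement magnitude column such as 'cheek_movement' in millimetres.

```python
# Minimal sketch of the pairwise Wilcoxon signed-rank procedure with Bonferroni
# correction. Column names are illustrative assumptions, not from the paper.
from itertools import combinations

import pandas as pd
from scipy.stats import wilcoxon


def pairwise_wilcoxon(df: pd.DataFrame, value_col: str) -> dict:
    """Return Bonferroni-corrected p-values for all expression pairs."""
    # One mean value per participant and expression type
    # (n = 27 tuples of four values, as described above).
    means = df.groupby(["participant", "expression"])[value_col].mean().unstack()

    pairs = list(combinations(means.columns, 2))
    corrected = {}
    for a, b in pairs:
        _, p = wilcoxon(means[a], means[b])           # paired, non-parametric test
        corrected[(a, b)] = min(p * len(pairs), 1.0)  # Bonferroni correction, capped at 1
    return corrected
```

The corrected p-values are then compared against the significance level alpha = 0.05.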

Figure 2
figure 2

Wilcoxon signed-rank (paired) test with Bonferroni correction for comparing mean cheek and brow movements during low-intensity expression pairs: eyebrow raise vs. frown, eyebrow raise vs. smile, eyebrow raise vs. squeezed eyes, frown vs. smile, frown vs. squeezed eyes, and smile vs. squeezed eyes. Statistical significance annotations: *if p ∈ [10⁻², 0.05); **if p ∈ [10⁻³, 10⁻²); ***if p ∈ [10⁻⁴, 10⁻³); and ****if p < 10⁻⁴.

For the cheek OCO sensors, we can observe more movement during the smile and squeezed eyes expression (median values 5.84 mm and 5.91 mm, respectively) compared to during eyebrow raise and frown expression (median values 2.25 mm and 2.48 mm, respectively). The results from the statistical test indicate that the mean movement of the cheek is significantly different between eyebrow raise and smile expression (p-value: 2.505 × 10⁻⁹), eyebrow raise and squeezed eyes expression (p-value: 3.983 × 10⁻⁹), frown and smile expression (p-value: 4.415 × 10⁻⁷), and frown and squeezed eyes expression (p-value: 4.909 × 10⁻⁸). The difference in cheek mean movement between eyebrow raise and frown expression (p-value: 1.0) and between smile and squeezed eyes expression (p-value: 1.0) is not statistically significant.

For the brow OCO sensors, we can observe the most movement during the eyebrow raise expression (median value: 10.14 mm), followed by the frown and squeezed eyes expression (median values 7.46 mm and 5.70 mm, respectively), while the lowest movement is observed during the smile expression (median value: 1.84 mm). The results from the statistical test indicate that the movement of the brow is significantly different between all pairs of expressions, except the frown and squeezed eyes expression (p-value: 1.0). The p-values for the rest of the expression pairs are lower than 10⁻⁴.

High-intensity expressions

For the statistical analysis on the high-intensity expressions, we used data where the participants were performing both short (1 s) and long (3 s) high-intensity expressions, and the mean movement for a specific expression type was calculated over all performed expressions of that type by each participant. This procedure led to n = 27 (number of participants in the dataset) tuples of four values, where each value represented the mean cheek/brow movement for each expression type (eyebrow raise, frown, smile, and squeezed eyes). To test if there is a statistically significant difference in the mean cheek and brow movements detected by the glasses for all pairs of performed expressions—eyebrow raise vs. frown, eyebrow raise vs. smile, eyebrow raise vs. squeezed eyes, frown vs. smile, frown vs. squeezed eyes, and smile vs. squeezed eyes—we used the Wilcoxon signed-rank (paired) test with Bonferroni correction (alpha = 0.05). Figure 3 shows the mean cheek (left plot) and brow (right plot) movements during different high-intensity expressions, presented on the X-axis (from left to right: eyebrow raise, frown, smile, squeezed eyes), and the results from the statistical test.

Figure 3
figure 3

Wilcoxon signed-rank (paired) test with Bonferroni correction for comparing mean cheek and brow movements during high-intensity expression pairs: eyebrow raise vs. frown, eyebrow raise vs. smile, eyebrow raise vs. squeezed eyes, frown vs. smile, frown vs. squeezed eyes, and smile vs. squeezed eyes. Statistical significance annotations: *if p ∈ [10⁻², 0.05); **if p ∈ [10⁻³, 10⁻²); ***if p ∈ [10⁻⁴, 10⁻³); and ****if p < 10⁻⁴.

For the cheek OCO sensors, we again observe more movement during the smile and squeezed eyes expression (median values 12.14 mm and 12.70 mm, respectively) compared to during the eyebrow raise and frown expression (median values 4.70 mm and 5.17 mm, respectively). The results from the statistical tests indicate that the cheek movement is significantly different between eyebrow raise and smile expression (p-value: 1.640 × 10⁻⁸), eyebrow raise and squeezed eyes expression (p-value: 2.102 × 10⁻⁹), frown and smile expression (p-value: 9.820 × 10⁻⁷), and frown and squeezed eyes expression (p-value: 2.553 × 10⁻⁸). The difference in cheek movement between eyebrow raise and frown expression (p-value: 9.011 × 10⁻¹) and between smile and squeezed eyes expression (p-value: 1.0) is not statistically significant.

For the brow OCO sensors, we can observe the highest movement intensity during the eyebrow raise expression (median value: 16.16 mm), followed by the squeezed eyes and frown expression (median values 14.14 mm and 11.51 mm, respectively), while the lowest movement intensity is observed during the smile expression (median value: 4.00 mm). The results from the statistical test indicate that the movement of the brow is significantly different between all pairs of expressions. The least significant difference is observed between the eyebrow raise and squeezed eyes expression (p-value: 2.473 × 10⁻²), followed by the frown and squeezed eyes expression (p-value: 1.132 × 10⁻³). The p-values for the rest of the expression pairs are lower than 10⁻⁴.

Influence of head movement on OCO sensor data

To investigate the influence of head movement on the OCO sensor data, we performed statistical tests on the differences in mean cheek and brow movements between expressions performed while the participants held their head still and expressions performed while the participants simultaneously moved their head in a specific direction (to the right, to the left, upwards, or downwards). For these comparisons, we used only high-intensity, long-duration (3 s) expression data, and the mean movement for a specific expression type was calculated over all performed expressions of that type by each participant. This procedure led to n = 27 (number of participants in the dataset) tuples of two values for each expression type, for both the cheek and the brow movement. One value in each tuple represents the mean for the expressions performed without head movement, and the other the mean for the expressions performed with head movement. To test if there is a statistically significant difference in the mean cheek and brow movements detected by the glasses for all expressions in the movement and no-movement conditions, we used the Wilcoxon signed-rank (paired) test with Bonferroni correction (alpha = 0.05). Figure 4 shows the mean cheek (left plot) and brow (right plot) movements during different high-intensity expressions performed with/without head movement, presented on the X-axis (from left to right: eyebrow raise, frown, smile, squeezed eyes), and the results from the statistical test.

Figure 4
figure 4

Wilcoxon signed-rank (paired) test with Bonferroni correction for cheek and brow movements between expressions (eyebrow raise, frown, smile, and squeezed eyes) performed with and without head movement. Statistical significance annotations: *if p ∈ [10⁻², 0.05); **if p ∈ [10⁻³, 10⁻²); ***if p ∈ [10⁻⁴, 10⁻³); and ****if p < 10⁻⁴.

For the cheek movements, the statistical test showed p-values higher than the significance level of 0.05 for all expression types (p-value = 8.690 × 10⁻¹ for the eyebrow raise expression, p-value = 3.973 × 10⁻¹ for the frown expression, p-value = 4.421 × 10⁻¹ for the smile expression, and p-value = 1.0 for the squeezed eyes expression), indicating that there is no statistically significant difference between the two analysed conditions. The same was confirmed by the statistical tests applied to the brow sensor measurements, which also showed p-values higher than the significance level of 0.05 for all expression types (p-value = 1.0 for the eyebrow raise, frown, smile, and squeezed eyes expressions).

Influence of glasses position on OCO sensor data

To evaluate the influence of the positioning of the glasses on the OCO sensor data during different expressions, we performed statistical tests on the differences in mean cheek and brow movements between high-intensity expressions of short (1 s) and long (3 s) duration performed while the participants were wearing the glasses in three different positions: low (the lowest point on the nasal bridge where the glasses stay in place), medium (the middle point on the nasal bridge), and high (the highest point on the nasal bridge). The mean movement for a specific expression type was calculated over all performed expressions of that type by each participant. This procedure led to n = 27 (number of participants in the dataset) tuples of three values for each expression type, for both the cheek and the brow movement. The three values in each tuple represent the mean for the expressions performed in the low, medium, and high position, respectively. To test if there is a statistically significant difference in the mean cheek and brow movements detected by the glasses for all expressions performed while the participants were wearing the glasses in the three distinct positions, we used the Wilcoxon signed-rank (paired) test with Bonferroni correction (alpha = 0.05). Figure 5 shows the mean cheek (left plot) and brow (right plot) movements during different expressions performed while the participants were wearing the glasses in the three distinct positions, and the results from the statistical test.

Figure 5
figure 5

Wilcoxon signed-rank (paired) test with Bonferroni correction for cheek and brow movements between expressions (eyebrow raise, frown, smile, and squeezed eyes) performed while wearing the glasses in different positions on the nose bridge (low, medium, and high). Statistical significance annotations: *if p ∈ [10⁻², 0.05); **if p ∈ [10⁻³, 10⁻²); ***if p ∈ [10⁻⁴, 10⁻³); and ****if p < 10⁻⁴.

The statistical test results showed no statistically significant difference in the cheek movement between all three positions of the glasses during the eyebrow raise and frown expression, which are characterized by no notable cheek activation. The results also showed that there is no significant difference in the cheek movements detected by the sensors between all three positions of the glasses during the smile expression. For the squeezed eyes expression, the measured cheek movements were significantly different in all three comparisons: high vs. medium position (p-value = 8.013 × 10⁻³), high vs. low position (p-value = 3.701 × 10⁻⁵), and medium vs. low position (p-value = 2.024 × 10⁻²).

On the other hand, for the brow movements, the statistical test results showed that the position of the glasses has no influence on the measurements only during the smile expression. For the frown expression, there was a significant difference between the high and low position (p-value = 7.993 × 10⁻⁵) and a significant difference between the high and medium position (p-value = 7.598 × 10⁻⁴). Comparable results were obtained for the eyebrow raise expression. Namely, when the glasses were positioned highest on the nose bridge, the detected brow movements were significantly higher compared to when the glasses were positioned low or medium on the nose bridge (p-value = 8.941 × 10⁻⁷, and p-value = 1.788 × 10⁻⁷, respectively). Lastly, for the squeezed eyes expression, there was a significant difference between all tested pairs of glasses positioning: high vs. low position (p-value = 7.689 × 10⁻⁶), high vs. medium position (p-value = 1.788 × 10⁻⁶), and medium vs. low position (p-value = 2.454 × 10⁻²).

Machine learning for facial expression recognition from OCO sensors data

The same dataset from the statistical analysis was also used to train a machine learning model for recognizing facial expressions and when the face enters a neutral state. The dataset contains four distinct facial expressions: smile, frown, eyebrow raise, and squeezed eyes, with varying degrees of intensity and duration. Additionally, a subset of the expressions was recorded while the participants were simultaneously moving their head in a specific direction. All participants performed the expressions while wearing the glasses in one of three different positions on the nose: low, medium, and high. After the segmentation and feature extraction, the dataset contained 361,226 instances and 162 features, which were fed to a Random Forest model. Figure 6 presents the distribution of the labels of the machine learning instances. More details about the machine learning pipeline are presented in the section OCOsense expression recognition machine learning pipeline. The focus of these experiments was on the data rather than on the machine learning pipeline; the pipeline was therefore kept deliberately simple, so that high evaluation scores could not be attributed to advanced data processing or machine learning modelling.

Figure 6
figure 6

Distribution of expression labels in the dataset.

The machine learning model yielded an overall accuracy of 93% (0.90 f1-score), evaluated on the data collected from n = 27 participants using a leave-one-subject-out cross-validation technique. The confusion matrix is illustrated in Fig. 7, and the classification report is shown in Table 1. Per-participant accuracies are presented in the Supplementary material (Fig. S1). The lowest accuracy was 0.81 (participant 2), and the highest was 0.99 (participant 14). On average, the model achieved high evaluation scores for recognizing the smile, eyebrow raise, and squeezed eyes expression, with recall scores of 0.9 for each of these expressions. The lowest recall score is observed for the frown expression, which is mistaken for the squeezed eyes expression in 10% of the cases. This misclassification can be attributed to the similarities in the movement of the eyebrow during these expressions, which are both characterized by activation of the frontalis and corrugator supercilii muscles, leading to lowering of the inner portion of the eyebrows and drawing the eyebrows together. Another notable aspect of the model’s performance is that, aside from the misclassification of the frown expression, the model demonstrates minimal confusion between the other expressions. This suggests that the differences in the cheek and brow movements during different expressions are readily detectable by the sensors in the glasses.
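For clarity, the leave-one-subject-out protocol can be summarized with the sketch below (our illustration, not the authors' code). It assumes NumPy arrays X (instances × 162 features), y (labels), and groups (the participant ID of each instance); the number of trees and the macro-averaged F1 are assumptions on our part.

```python
# Sketch of leave-one-subject-out cross-validation with a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import LeaveOneGroupOut


def loso_evaluate(X: np.ndarray, y: np.ndarray, groups: np.ndarray):
    per_subject_acc = {}
    y_true_all, y_pred_all = [], []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[train_idx], y[train_idx])   # train on all other participants
        y_pred = clf.predict(X[test_idx])     # test on the held-out participant
        subject = groups[test_idx][0]
        per_subject_acc[subject] = accuracy_score(y[test_idx], y_pred)
        y_true_all.extend(y[test_idx])
        y_pred_all.extend(y_pred)
    overall_f1 = f1_score(y_true_all, y_pred_all, average="macro")
    return per_subject_acc, overall_f1
```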

Figure 7
figure 7

Confusion matrix for the ML facial expression recognition model evaluated on the whole dataset.

Table 1 Classification report for the ML facial expression recognition model evaluated on the whole dataset.

Low-intensity expressions vs. high-intensity expressions

To investigate the effect of expression intensity on the model’s performance in recognizing facial expressions, separate analyses were conducted on the low-intensity and high-intensity expression data. The confusion matrices for the low-intensity and high-intensity expressions are illustrated in Figs. 8 and 9, respectively. The classification reports are presented in Tables 2 and 3. The analyses show that the model demonstrates only slightly better performance for the recognition of high-intensity expressions, achieving an accuracy of 95% (0.92 f1-score), compared to the low-intensity expressions, where it achieves 92% accuracy (0.88 f1-score). While the rate of misclassification between different expressions is consistent across both datasets, a lower rate of expression instances being classified as neutral was observed for high-intensity expressions. Overall, the results indicate that the glasses’ sensors can capture even the minimal muscle activations related to different facial expressions, which are sufficient for the model to recognize the performed expression.

Figure 8
figure 8

Confusion matrix for the ML facial expression recognition model evaluated on the low-intensity facial expressions.

Figure 9
figure 9

Confusion matrix for the ML facial expression recognition model evaluated on the high-intensity facial expressions.

Table 2 Classification report for the ML facial expression recognition model evaluated on the low-intensity facial expressions.
Table 3 Classification report for the ML facial expression recognition model evaluated on the high-intensity facial expressions.

No-movement vs. movement expressions

To investigate the influence of head movement on the model’s performance in recognizing facial expressions, separate analyses were conducted on expressions performed while the participants held their head still and expressions performed while the participants simultaneously moved their head in a specific direction (to the right, to the left, upwards, or downwards). These comparisons include only high-intensity, long-duration expression data. The confusion matrices for both cases are illustrated in Figs. 10 and 11. The classification reports are presented in Tables 4 and 5. The analyses show that the model demonstrates only slightly better performance for the recognition of facial expressions performed while participants held their head still, achieving 95% accuracy (0.93 f1-score), compared to the expressions performed while participants moved their head, where it achieves 92% accuracy (0.90 f1-score). Again, the rate of misclassification between different expressions is low and consistent across both datasets, except for the frown expression. Namely, the model correctly identifies 89% of the frown expression instances in the no-movement scenario, misclassifying only 4% as the squeezed eyes expression, compared to 80% correctly identified frown instances in the movement scenario, with 8% misclassified as the squeezed eyes expression. Overall, the results indicate that the OCO sensors incorporated in the glasses are not substantially influenced by head movement and are able to capture facial movements related to different facial expressions even in motion scenarios. This eventually results in a robust machine learning model for facial expression recognition with high accuracy and consistency, regardless of the inclusion of head movement.

Figure 10
figure 10

Confusion matrix for the ML facial expression recognition model evaluated on the facial expressions performed in a still scenario.

Figure 11
figure 11

Confusion matrix for the ML facial expression recognition model evaluated on the facial expressions performed with head movement.

Table 4 Classification report for the ML facial expression recognition model evaluated on the facial expressions performed in a still scenario.
Table 5 Classification report for the ML facial expression recognition model evaluated on the facial expressions performed with head movement.

Low vs. medium vs. high position of the glasses

To evaluate the influence of the positioning of the glasses on the model’s performance in recognizing facial expressions, separate analyses were conducted on expressions performed while the participants were wearing the glasses in three different positions: low (the lowest point on the nasal bridge where the glasses stay in place), medium (the middle point on the nasal bridge), and high (the highest point on the nasal bridge). These comparisons include only high-intensity expressions of varying duration (short and long). The confusion matrices for the low, medium, and high position are illustrated in Figs. 12, 13, and 14, respectively. The classification reports are presented in Tables 6, 7, and 8. The model demonstrates high accuracy in recognizing the facial expressions regardless of the position of the glasses. Namely, the model achieved an accuracy of 92% on the low-position and medium-position data (f1-scores of 0.88 and 0.87, respectively), and an accuracy of 94% on the high-position data (f1-score of 0.90). The increased accuracy observed in the high-position scenario, in comparison to the low and medium position, can be attributed to the reduction in the number of facial expression instances misclassified as neutral. This effect is particularly prominent for the frown and eyebrow raise expression. A possible explanation is that, when the glasses are worn in the low or medium position, the sensors located on the upper frame of the glasses are not able to fully capture the skin movement and muscle activation in the eyebrow and forehead area, which are the regions primarily involved in these two expressions. The model’s performance for the recognition of the other expressions remains consistent across all positions of the glasses.

Figure 12
figure 12

Confusion matrix for the ML facial expression recognition model evaluated on the ‘low-position’ data.

Figure 13
figure 13

Confusion matrix for the ML facial expression recognition model evaluated on the ‘medium-position’ data.

Figure 14
figure 14

Confusion matrix for the ML facial expression recognition model evaluated on the ‘high-position’ data.

Table 6 Classification report for the ML facial expression recognition model evaluated on the ‘low-position’ data.
Table 7 Classification report for the ML facial expression recognition model evaluated on the ‘medium-position’ data.
Table 8 Classification report for the ML facial expression recognition model evaluated on the ‘high-position’ data.

Discussion

In this study we explored the performance of OMG-based smart glasses (OCOsense) for monitoring facial movements associated with different facial expressions. We performed both statistical and machine learning analyses.

The results from the statistical analysis indicated that:

  • OCO sensors were able to accurately capture differences in cheek and brow movements with a high level of sensitivity and specificity. This includes detecting both low-intensity expressions and high-intensity expressions (Figs. 2 and 3). This is particularly noteworthy, as previous studies have shown that subtle expressions made at a low intensity can be challenging to recognize and differentiate, even for the human eye29,30,31. This highlights that OCOsense may be a valuable tool for monitoring both intense and subtle facial expressions, which convey potentially valuable information about an individual’s emotional state.

  • The analysis of the influence of head movement on the OCO sensor data showed that head movements did not have a statistically significant impact on the sensor data in the explored scenario. This is an important finding, as it suggests that the OCOsense glasses may be well-suited for monitoring facial expressions in real-world settings, where head movement will naturally co-occur with facial expressions. Nevertheless, additional mechanisms based on the sensor data can be employed to avoid faulty sensor readings; for example, readings above 30 mm could be discarded, as this distance is outside the known operating range of the OCO sensors (a minimal sketch of such a check follows this list).

  • The position of the glasses was found to have some influence on the OCO sensor measurements, particularly when monitoring the brow movement during the frown, eyebrow raise, and squeezed eyes expression. This can be explained by the possibility that, when the glasses are not correctly positioned, the sensors located on the upper frame of the glasses are unable to capture skin movement in the eyebrow and forehead area, which are the regions primarily involved in these three expressions. This finding highlights the importance of a proper fit of the glasses on the wearer’s face to ensure accurate measurements.
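As a concrete, hypothetical example of the range check mentioned in the second point above, the sketch below masks samples whose estimated sensor-to-skin distance falls outside the specified 4–30 mm operating range; the distance_mm input is an assumed per-sample estimate, and the actual OCOsense data format may differ.

```python
# Sketch of a simple validity check: flag readings taken outside the
# sensors' specified 4-30 mm operating range (assumed data layout).
import numpy as np

MIN_RANGE_MM, MAX_RANGE_MM = 4.0, 30.0


def mask_out_of_range(movement: np.ndarray, distance_mm: np.ndarray) -> np.ndarray:
    """Replace movement samples recorded out of range with NaN instead of trusting them."""
    valid = (distance_mm >= MIN_RANGE_MM) & (distance_mm <= MAX_RANGE_MM)
    cleaned = movement.astype(float).copy()
    cleaned[~valid] = np.nan
    return cleaned
```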

The results from the statistical analysis were further confirmed with the machine learning experiments. The measurements obtained from the OCO sensors provided valuable data for the development of a robust machine learning model that can differentiate between a neutral state and four different facial expressions performed in different conditions, including variations in the intensity, duration, and presence of head movement, as well as glasses position. Depending on the conditions, the f1-score varied between 0.87 and 0.93.

Regarding the related work, the authors of the EMG-based study16 performed a follow-up machine-learning analysis32. The experimental setup in32 is almost identical to the experimental setup in the present study, enabling a more direct comparison between our study and the EMG-based system. In32, 30 subjects performed the same five facial expressions (smile, frown, eyebrow raise, squeezed eyes, and neutral) with two intensities (low and high), and with or without head movements. Depending on the experimental setup, the EMG-based ML models achieved F1 scores between 0.8 and 0.86 for recognizing the five facial expressions in a leave-one-subject-out evaluation among 30 participants. In our study, the OCOsense-based models achieved F1 scores between 0.88 and 0.92 for recognizing the same five facial expressions in a leave-one-subject-out evaluation among 27 participants. These results show that the OCOsense-based ML models are at least as good as, and likely better than, the EMG-based models from32 for recognizing the five facial expressions.

Conclusions and implications for future work

In conclusion, this study evaluated the performance of the OCOsense smart glasses with novel OMG-based OCO sensors, which measure skin movement in three dimensions over key facial muscles, as a tool for capturing and recognizing facial expressions using statistical and machine learning methods. The experiments were conducted using a dataset gathered from 27 participants, who performed facial expressions varying in intensity, duration, and head movement. Statistical tests showed the expected differences in cheek and brow movement during the smile, frown, eyebrow raise, and squeezed eyes expression—namely, the sensors detected increased cheek skin movement during the smile and squeezed eyes expression compared to the frown and eyebrow raise expression. Conversely, increased brow movement was detected during the eyebrow raise and frown expression, compared to the squeezed eyes and smile expressions. The study also found that head movements do not have a significant impact on the measurements obtained from the OCO sensors when monitoring the movement of the cheek and brow during different facial expressions. However, the position of the glasses was found to have some influence on the OCO sensor measurements, particularly when monitoring the movement of the brow during the frown, eyebrow raise, and squeezed eyes expressions. Furthermore, the use of the OCO™ sensors provided valuable data for the development of an ML-based expression recognition algorithm, which yielded an accuracy of 93% (0.90 f1-score) evaluated on data collected from n = 27 participants using a leave-one-subject-out cross-validation technique.

Collectively, these findings demonstrate that the expression recognition model based on the OCOsense glasses data can be used as a reliable tool for monitoring facial expressions, and they highlight the potential of this technology in a variety of applications. For example, in the field of human–computer interaction, the use of smart glasses with an expression recognition model can enable more natural and intuitive interactions between individuals and technology. Recognized expressions can be used as a source for interaction with and control of different devices, such as hands-free control of a wheelchair or of a heads-up display, without the need to interact with physical buttons. As facial expressions play a major role in conveying affective states, such sensing technology can also be used in the emotion recognition and affective computing field, within real-world scenarios or longitudinal studies, for continuous monitoring and management of emotional disorders. For mental health professionals, this technology could provide an objective way to monitor facial expressivity, one of the major behavioural markers of depression33. This can aid them in gaining deeper insights into their patients’ emotional states, enabling them to provide more personalized treatment that addresses specific needs.

Regarding the statistical analysis, we chose the Wilcoxon signed-rank test because it is suitable for comparing two paired samples, especially when the data are not normally distributed. While non-parametric tests often require larger sample sizes to achieve the same power as parametric tests, the observed effect sizes were substantial enough that our sample of 27 participants was sufficient to detect significant differences (p-values were smaller than 0.0001 in most of the cases where statistical significance was observed). While we recognize that larger samples would provide greater confidence, our findings align with prior research in this area, lending further credibility to our results32.

The per-subject accuracy (see Fig. S1 in Supplementary material) showed accuracy differences depending on the test subject. In this study, the only step we took to address the person-specific differences was the person-specific standardization. More advanced approaches, e.g., self-supervised learning, could be investigated to address the personalization problem.

The next steps in this research include moving from analysing voluntary facial expressions to analysing spontaneous expressions and robustly validating the glasses' ability to detect them in naturalistic environments. This will involve testing the glasses on a diverse group of participants to ensure that the detection of expressions is not affected by factors such as race, gender, or age. Furthermore, the research will also involve further development of the model to improve its ability to detect subtle spontaneous expressions and gestures, which are often harder to detect but may still convey valuable information about a person's emotional state.

Methods

Participants

A group of 27 healthy volunteers, 11 females and 14 males, with a mean age of 26.3 ± 7.66 years (range 16–47), were recruited to participate in the experiment. The inclusion age range was 16–68 years. Detailed demographic information about the recruited participants is available in Supplementary Table S1. Exclusion criteria for recruitment were the presence of facial neuromuscular and nervous disorders. Ethical approval was obtained from the London—Riverside Research Ethics Committee on 15 July 2022 (ref: 22/LIO/0415). After a detailed explanation of the experimental procedure, all participants provided written informed consent before participating in the study. The experiment was conducted following institutional ethical provisions and the Declaration of Helsinki.

Apparatus

For data collection, we used the Emteq’s OCOsense smart glasses. The glasses contain six OCO™ sensors, a 9‑axis IMU, altimeter and integrated dual speech detection microphones. In this study, we analysed only the data coming from the OCO™ sensors. The OCO™ sensors are proprietary sensors developed by Emteq Labs, which use an optical non-contact approach called optomyography (OMG). They are used to measure skin movement due to underlying myogenic activity in 3 dimensions (X, Y, and Z) and do not require skin contact, i.e., they can function accurately from 4 to 30 mm away from the skin. The sensors are built into the OCOsense glasses frame and are positioned to measure skin movement over the frontalis muscle (left and right side of the forehead), zygomaticus major muscle (left and right side of the cheeks), and the left and right temples.

Experimental scenario

Prior to initiation of the data collection procedure, all participants were briefed on the experimental design, task requirements, and equipment utilized.

The data collection protocol involved performing three different tasks (Task A, Task B, and Task C) in which the participants were instructed to perform four distinct facial expressions, namely, smile, frown, eyebrow raise, and squeezed eyes, varying in intensity and duration. All expressions were repeated three times in each task:

  1. In Task A, the participants were required to perform voluntary expressions with varying intensity (low and high) and duration (short and long) as follows: (1) low-intensity, short-duration expressions (one second per repetition); (2) low-intensity, long-duration expressions (three seconds per repetition); (3) high-intensity, short-duration expressions (one second per repetition); and (4) high-intensity, long-duration expressions (three seconds per repetition).

  2. In Task B, the participants were required to perform voluntary expressions of high intensity and long duration while simultaneously moving their head in a specific direction, as follows: (1) high-intensity expressions with head movement to the left (three seconds per repetition); (2) high-intensity expressions with head movement to the right (three seconds per repetition); (3) high-intensity expressions with head movement upwards (three seconds per repetition); and (4) high-intensity expressions with head movement downwards (three seconds per repetition).

  3. In Task C, the participants were required to perform voluntary expressions of high intensity and short and long duration while wearing the glasses in three different positions: low (the lowest point on the nasal bridge where the glasses stay in place), medium (the middle point on the nasal bridge), and high (the highest point on the nasal bridge). Figure S2 (in the Supplementary material) presents a visual explanation of the positioning of the glasses. The expressions in Task C were performed in the following order: (1) low position: high-intensity, short-duration expressions (one second per repetition); (2) low position: high-intensity, long-duration expressions (three seconds per repetition); (3) medium position: high-intensity, short-duration expressions (one second per repetition); (4) medium position: high-intensity, long-duration expressions (three seconds per repetition); (5) high position: high-intensity, short-duration expressions (one second per repetition); and (6) high position: high-intensity, long-duration expressions (three seconds per repetition).

The data collection process was uninterrupted. The participants were instructed to maintain a neutral facial expression in between each posed expression.

Statistical analysis

Data preprocessing

The following data processing steps were applied to the acquired sensor data (a minimal code sketch follows the list):

  • Linear trend removal: the linear trend was removed from each signal obtained from the OCO cheek and brow sensors to eliminate the influence of long-term drift on the measured skin movement.

  • Calculation of the vector magnitude for each sensor: as the OCO sensors measure skin movement in 3 dimensions (X, Y, and Z), the vector magnitude was calculated for each sensor (\(\sqrt{{\mathrm{X}}^{2}+{\mathrm{Y}}^{2}+{\mathrm{Z}}^{2}}\)).

  • Combination of processed sensor signals values from the left and right sensors: the vector magnitude value from the left cheek sensor was added to the vector magnitude value from the right cheek sensor, and the same was done for the brow sensors. This resulted in the creation of two signals, one representing the total cheek movement (left + right), and one representing the total brow movement (left + right).

  • Smoothing of the resulting signals: the resulting cheek and brow signals were smoothed using a rolling median filter with a window size of 15 samples to reduce the effects of noise on the signals.
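A minimal sketch of these four steps is given below. It assumes each sensor stream is available as an array of X, Y, Z samples and that the rolling median is centred; the variable names and the centring choice are our own assumptions.

```python
# Sketch of the preprocessing applied to one left/right sensor pair
# (e.g., the two cheek sensors), following the steps listed above.
import numpy as np
import pandas as pd
from scipy.signal import detrend


def preprocess_pair(left_xyz: np.ndarray, right_xyz: np.ndarray,
                    window: int = 15) -> pd.Series:
    """left_xyz / right_xyz: arrays of shape (n_samples, 3) for one sensor pair."""
    # 1. Linear trend removal, per axis
    left = detrend(left_xyz, axis=0)
    right = detrend(right_xyz, axis=0)
    # 2. Vector magnitude per sensor: sqrt(X^2 + Y^2 + Z^2)
    left_mag = np.linalg.norm(left, axis=1)
    right_mag = np.linalg.norm(right, axis=1)
    # 3. Combine the left and right sensors into one signal
    combined = left_mag + right_mag
    # 4. Rolling median smoothing with a 15-sample window
    return pd.Series(combined).rolling(window, center=True, min_periods=1).median()


# Usage (illustrative variable names):
# cheek = preprocess_pair(left_cheek_xyz, right_cheek_xyz)
# brow = preprocess_pair(left_brow_xyz, right_brow_xyz)
```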

Statistical tests

The overall data analysis was conducted using the Python programming language. Hypothesis testing was performed using the Wilcoxon signed-rank test. The Wilcoxon signed-rank test is a non-parametric alternative to the paired t-test that compares the distribution of differences between related paired samples to determine if they come from the same distribution. The null hypothesis of the test is that the samples come from the same distribution. The p-values were Bonferroni-corrected. The significance annotations were represented as: *if p \(\in\) [10⁻², 0.05); **if p \(\in\) [10⁻³, 10⁻²); ***if p \(\in\) [10⁻⁴, 10⁻³); and ****if p < 10⁻⁴.
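As a small illustration, the annotation scheme above can be expressed as a helper that maps a Bonferroni-corrected p-value to its star label (the 'ns' label for non-significant results is our addition):

```python
# Map a Bonferroni-corrected p-value to the significance annotation used in the figures.
def significance_stars(p: float) -> str:
    if p < 1e-4:
        return "****"
    if p < 1e-3:
        return "***"
    if p < 1e-2:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # not significant
```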

OCOsense expression recognition machine learning pipeline

To model the relation between the facial movements detected by the OCO sensors and the facial expressions, we used a simple expression recognition algorithm utilizing signal processing and machine learning techniques. The algorithm utilized data from all six OCO™ sensors incorporated in the OCOsense glasses, resulting in a total of 18 sensor streams. During the data collection procedure, the OCO data were continuously recorded at a fixed rate of 50 Hz. The raw sensor data underwent a data preprocessing pipeline, including filtering, segmentation, and feature engineering. The data were filtered using a low-pass filter, and the linear trend was removed from each signal stream. A sliding window technique was utilized for the sensor data segmentation. The signals were segmented using a 0.1-s window and a 0.1-s window stride, meaning that there was no overlap between consecutive windows. Eventually, 9 statistical features per sensor stream were extracted, resulting in a total of 162 features. The features included the mean, standard deviation, minimum, maximum, range, interquartile range, kurtosis, skewness, and root mean square. The feature set for each participant was also standardized (removing the mean and scaling to unit variance). The standardized features were used as input to a Random Forest algorithm, which outputs the recognized expression as smile, frown, eyebrow raise, squeezed eyes, or neutral state. Random Forest builds an ensemble of decision trees, where at each tree node only a subset of the features is considered; the final classification decision is made by a majority vote. Ensemble models such as those built by Random Forest are a good starting point for training machine learning models, because they do not require extensive hyperparameter tuning (compared to neural networks, for example) and are usually more robust to noisy features (compared to non-ensemble models). Due to the imbalanced nature of the dataset, the neutral class was undersampled in the training process, so that all five classes were evenly distributed.
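The pipeline described above can be condensed into the sketch below (our illustration, with assumed variable names and hyperparameters such as the number of trees). At 50 Hz, a 0.1-s non-overlapping window corresponds to 5 samples; 9 statistical features are computed per window for each of the 18 streams, giving 162 features, which are standardized per participant before being passed to the Random Forest.

```python
# Condensed sketch of the feature extraction and classification steps.
import numpy as np
from scipy.stats import iqr, kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

WINDOW = 5  # 0.1 s at 50 Hz


def window_features(stream: np.ndarray) -> np.ndarray:
    """Compute 9 statistical features for every non-overlapping 0.1-s window of one stream."""
    n_windows = len(stream) // WINDOW
    feats = []
    for w in stream[: n_windows * WINDOW].reshape(n_windows, WINDOW):
        feats.append([
            w.mean(), w.std(), w.min(), w.max(), w.max() - w.min(),  # mean, std, min, max, range
            iqr(w), kurtosis(w), skew(w),                            # IQR, kurtosis, skewness
            np.sqrt(np.mean(w ** 2)),                                # root mean square
        ])
    return np.asarray(feats)


def build_features(streams: np.ndarray) -> np.ndarray:
    """streams: array of shape (18, n_samples) -> feature matrix of shape (n_windows, 162)."""
    return np.hstack([window_features(s) for s in streams])


def standardize_per_participant(X_participant: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance scaling of one participant's feature set."""
    return StandardScaler().fit_transform(X_participant)


def train_classifier(X_train: np.ndarray, y_train: np.ndarray) -> RandomForestClassifier:
    """Fit a Random Forest on the (class-balanced) training features."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    return clf
```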