Estimation of continuous valence and arousal levels from faces in naturalistic conditions

Toisoul, Antoine; Kossaifi, Jean; Bulat, Adrian; Tzimiropoulos, Georgios; Pantic, Maja

doi:10.1038/s42256-020-00280-0

Article
Published: 11 January 2021

Estimation of continuous valence and arousal levels from faces in naturalistic conditions

Nature Machine Intelligence volume 3, pages 42–50 (2021)Cite this article

4364 Accesses
93 Citations
27 Altmetric
Metrics details

Subjects

Abstract

Facial affect analysis aims to create new types of human–computer interactions by enabling computers to better understand a person’s emotional state in order to provide ad hoc help and interactions. Since discrete emotional classes (such as anger, happiness, sadness and so on) are not representative of the full spectrum of emotions displayed by humans on a daily basis, psychologists typically rely on dimensional measures, namely valence (how positive the emotional display is) and arousal (how calming or exciting the emotional display looks like). However, while estimating these values from a face is natural for humans, it is extremely difficult for computer-based systems and automatic estimation of valence and arousal in naturalistic conditions is an open problem. Additionally, the subjectivity of these measures makes it hard to obtain good quality data. Here we introduce a novel deep neural network architecture to analyse facial affect in naturalistic conditions with a high level of accuracy. The proposed network integrates face alignment and jointly estimates both categorical and continuous emotions in a single pass, making it suitable for real-time applications. We test our method on three challenging datasets collected in naturalistic conditions and show that our approach outperforms all previous methods. We also discuss caveats regarding the use of this tool, and ethical aspects that must be considered in its application.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Valence and arousal circumplex.**

**Fig. 2: Overview of the traditional approach to facial affect recognition contrasted with our methodology.**

**Fig. 3: Overview of the architecture of our model.**

**Fig. 4: Qualitative results of our approach.**

Demographic effects on facial emotion expression: an interdisciplinary investigation of the facial action units of happiness

Article Open access 04 March 2021

A uniform human multimodal dataset for emotion perception and judgment

Article Open access 07 November 2023

A study on computer vision for facial emotion recognition

Article Open access 24 May 2023

Data availability

The datasets analysed during the current study are available from the original authors in the AFEW-VA (https://ibug.doc.ic.ac.uk/resources/afew-va-database/), AffectNet (http://mohammadmahoor.com/affectnet/) and SEWA (https://db.sewaproject.eu/) repositories. The list of cleaned images for the validation and test sets of AffectNet employed in this paper are available on the authors’ Github repository (https://github.com/face-analysis/emonet).

Code availability

The pretrained network, testing code and the annotations of the cleaned AffectNet test and validation sets are available at https://github.com/face-analysis/emonet under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (CC BY-NC-ND).

References

Posner, J., Russell, J. A. & Peterson, B. S. The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. 17, 715–734 (2005).
Article Google Scholar
Russell, J. A circumplex model of affect. J. Pers. Soc. Psychol. 39, 1161–1178 (1980).
Article Google Scholar
Gunes, H. & Schuller, B. Categorical and dimensional affect analysis in continuous input: current trends and future directions. Image Vision Comput. 31, 120–136 (2013).
Article Google Scholar
Sariyanidi, E., Gunes, H. & Cavallaro, A. Automatic analysis of facial affect: a survey of registration, representation, and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1113–1133 (2015).
Article Google Scholar
Mollahosseini, A., Hasani, B. & Mahoor, M. H. AffectNet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 10, 18–31 (2019).
Article Google Scholar
Kossaifi, J., Tzimiropoulos, G., Todorovic, S. & Pantic, M. AFEW-VA database for valence and arousal estimation in-the-wild. Image Vision Comput. 65, 23–36 (2017).
Article Google Scholar
Kossaifi, J. et al. SEWA DB: a rich database for audio-visual emotion and sentiment research in the wild. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2019.2944808 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).
Lin, L. I.-K. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989).
Article Google Scholar
Ringeval, F. et al. AV⁺ EC 2015: The first affect recognition challenge bridging across audio, video, and physiological data. In Proc. 5th International Workshop on Audio/Visual Emotion Challenge (ACM, 2015).
Valstar, M. et al. AVEC 2016: Depression, mood, and emotion recognition workshop and challenge. In Proc. 6th International Workshop on Audio/Visual Emotion Challenge (ACM, 2016).
Ringeval, F. et al. Avec 2017: Real-life depression, and affect recognition workshop and challenge. In Proc. 7th Annual Workshop on Audio/Visual Emotion Challenge (ACM, 2017).
Ringeval, F. et al. Avec 2018 workshop and challenge: Bipolar disorder and cross-cultural affect recognition. In Proc. 2018 on Audio/Visual Emotion Challenge and Workshop (ACM, 2018).
Kollias, D., Cheng, S., Ververas, E., Kotsia, I. & Zafeiriou, S. Deep neural network augmentation: generating faces for affect analysis. Int. J. Comp. Vision 128, 1455–1484 (2020).
Article Google Scholar
Jang, Y., Gunes, H. & Patras, I. Registration-free face-ssd: Single shot analysis of smiles, facial attributes, and affect in the wild. Comput. Vision Image Understanding 182, 17–29 (2019).
Bulat, A. & Tzimiropoulos, G. How far are we from solving the 2D & 3D face alignment problem? (And a dataset of 230,000 3D facial landmarks). In Proc. IEEE International Conference on Computer Vision (2017).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 5998–6008 (2017).
Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop (NIPS, 2015).
Furlanello, T., Lipton, Z., Tschannen, M., Itti, L. & Anandkumar, A. Born-again neural networks. In International Conference on Machine Learning 1602–1611 (2018).
Mitenkova, A., Kossaifi, J., Panagakis, Y. & Pantic, M. Valence and arousal estimation in-the-wild with tensor methods. In 14th IEEE International Conference on Automatic Face & Gesture Recognition (2019).
Kollias, D. et al. Deep affect prediction in-the-wild: Aff-Wild database and challenge, deep architectures, and beyond. Int. J. Comput. Vision 127, 907–929 (2019).
Article Google Scholar
Russell, J., J.A, B. & Fernandez-Dols, J. Facial and vocal expressions of emotions. Annu. Rev. Psychol. 54, 329–349 (2003).
Article Google Scholar
Grimm, M. & Kroschel, K. in Robust Speech (eds Grimm, M. & Kroschel, K.) Ch. 16 (IntechOpen, 2007).
Marsh, A. A., Kozak, M. N. & Ambady, N. Accurate identification of fear facial expressions predicts prosocial behavior. Emotion 7, 239–251 (2007).
Article Google Scholar
Clark, C. M., Gosselin, F. & Goghari, V. M. Aberrant patterns of visual facial information usage in schizophrenia. J. Abnorm. Psychol. 122, 513–519 (2013).
Article Google Scholar
Kring, A. M. & Elis, O. Emotion deficits in people with schizophrenia. Annu. Rev. Clin. Psychol. 9, 409–433 (2013).
Article Google Scholar
Bishay, M., Palasek, P., Priebe, S. & Patras, I. Schinet: Automatic estimation of symptoms of schizophrenia from facial behaviour analysis. IEEE Trans. Affective Comput. https://doi.org/10.1109/TAFFC.2019.2907628 (2019).
Caligiuri, M. P. & Ellwanger, J. Motor and cognitive aspects of motor retardation in depression. J. Affective Disord. 57, 83–93 (2000).
Article Google Scholar
Dibeklioğlu, H., Hammal, Z. & Cohn, J. F. Dynamic multimodal measurement of depression severity using deep autoencoding. IEEE J. Biomed. Health Inform. 22, 525–536 (2017).
Article Google Scholar
Harms, M. B., Martin, A. & Wallace, G. L. Facial emotion recognition in autism spectrum disorders: a review of behavioral and neuroimaging studies. Neuropsychol. Rev. 20, 290–322 (2010).
Article Google Scholar
Rudovic, O., Lee, J., Dai, M., Schuller, B. & Picard, R. W. Personalized machine learning for robot perception of affect and engagement in autism therapy. Sci. Robot. 3, eaao6760 (2018).
Article Google Scholar
Boy, N., Lidén, K. & Jacobsen, E. K. Societal Ethics of Biometric Technologies (SOURCE Societal Security Network, 2018).
Cowie, R. in The Oxford Handbook of Affective Computing 334–348 (Oxford Univ. Press, 2015).
Gendron, M., Crivelli, C. & Barrett, L. F. Universality reconsidered: diversity in making meaning of facial expressions. Curr. Directions Psychol. Sci. 27, 211–219 (2018).
Article Google Scholar
Bryant, D. & Howard, A. A comparative analysis of emotion-detecting ai systems with respect to algorithm performance and dataset diversity. In Proc. 2019 AAAI/ACM Conference on AI, Ethics, and Society 377–382 (2019).
Rhue, L. Racial influence on automated perceptions of emotions. Preprint at https://doi.org/10.2139/ssrn.3281765 (2018).
Pantic, M. & Bartlett, M. S. in Face recognition (eds Delac, K. & Grgic, M.) Ch. 21 (IntechOpen, 2007).
Smith, F. W. & Rossit, S. Identifying and detecting facial expressions of emotion in peripheral vision. PLoS ONE 13, e0197160 (2018).
Article Google Scholar
Gastaldi, X. Shake–Shake regularization. Preprint at https://arxiv.org/abs/1705.07485 (2017).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035 (Curran Associates, 2017).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

Download references

Author information

Authors and Affiliations

Samsung AI, Cambridge, UK
Antoine Toisoul, Jean Kossaifi, Adrian Bulat & Georgios Tzimiropoulos
Imperial College London, Department of Computing, London, UK
Jean Kossaifi & Maja Pantic

Authors

Antoine Toisoul
View author publications
You can also search for this author in PubMed Google Scholar
Jean Kossaifi
View author publications
You can also search for this author in PubMed Google Scholar
Adrian Bulat
View author publications
You can also search for this author in PubMed Google Scholar
Georgios Tzimiropoulos
View author publications
You can also search for this author in PubMed Google Scholar
Maja Pantic
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

The code was written by A.T., J.K. and A.B., and the experiments were conducted by A.T. and J.K. The manuscript was written by A.T., J.K., A.B. and M.P.; G.T. helped with discussions regarding the face-alignment network. M.P. supervised the entire project.

Corresponding author

Correspondence to Antoine Toisoul.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary text, Figs. 1 and 2, Tables 1 and 2 and refs.

Reporting Summary

Rights and permissions

Reprints and permissions

About this article

Cite this article

Toisoul, A., Kossaifi, J., Bulat, A. et al. Estimation of continuous valence and arousal levels from faces in naturalistic conditions. Nat Mach Intell 3, 42–50 (2021). https://doi.org/10.1038/s42256-020-00280-0

Download citation

Received: 13 December 2019
Accepted: 03 December 2020
Published: 11 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.1038/s42256-020-00280-0

This article is cited by

Emotion-wise feature interaction analysis-based visual emotion distribution learning
- Jing Zhang
- Qiuge Qin
- Wen Du
The Visual Computer (2024)
Assessing the effectiveness of virtual reality serious games in post-stroke rehabilitation: a novel evaluation method
- Mostefa Masmoudi
- Nadia Zenati
- Chafiaa Hamitouche
Multimedia Tools and Applications (2024)
Bag of states: a non-sequential approach to video-based engagement measurement
- Ali Abedi
- Chinchu Thomas
- Shehroz S. Khan
Multimedia Systems (2024)
Migration and emotions in the media: can socioeconomic indicators predict emotions in images associated with immigrants?
- Juan Sebastian Olier
- Camilla Spadavecchia
Journal of Computational Social Science (2024)
Affect-driven ordinal engagement measurement from video
- Ali Abedi
- Shehroz S. Khan
Multimedia Tools and Applications (2023)