Deep learning reveals what vocal bursts express in different cultures

Abstract

Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called ‘vocal bursts’. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts. A total of n = 4,031 participants in China, India, South Africa, the USA and Venezuela mimicked vocal bursts drawn from 2,756 seed recordings. Participants also judged the emotional meaning of each vocal burst. A deep neural network tasked with predicting the culture-specific meanings people attributed to vocal bursts while disregarding context and speaker identity discovered 24 acoustic dimensions, or kinds, of vocal expression with distinct emotion-related meanings. The meanings attributed to these complex vocal modulations were 79% preserved across the five countries and three languages. These results reveal the underlying dimensions of human emotional vocalization in remarkable detail.
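The abstract describes a deep neural network trained to predict the culture-specific emotion judgements people attributed to each vocal burst, from which 24 shared acoustic dimensions emerged. The paper's architecture is not given here, so the following is only a minimal sketch of one way such a multi-country prediction model could be structured; the feature size, layer widths, emotion count and variable names are illustrative assumptions, not details from the study.

```python
# Minimal sketch (not the authors' architecture): a multi-head network mapping an
# audio representation to emotion-rating predictions for each of five countries.
# All sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATURES = 128      # assumed size of a precomputed acoustic embedding per vocal burst
N_EMOTIONS = 48       # assumed number of rated emotion categories
COUNTRIES = ["China", "India", "South_Africa", "USA", "Venezuela"]

class VocalBurstRater(nn.Module):
    """Shared trunk + one regression head per country: the shared representation
    must capture meanings common across cultures, while the per-country heads
    absorb culture-specific mappings from those dimensions to ratings."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(N_FEATURES, 256), nn.ReLU(),
            nn.Linear(256, 24),          # 24-unit bottleneck, mirroring the 24
        )                                # dimensions reported in the paper
        self.heads = nn.ModuleDict({c: nn.Linear(24, N_EMOTIONS) for c in COUNTRIES})

    def forward(self, x):
        z = self.trunk(x)                              # shared acoustic dimensions
        return {c: head(z) for c, head in self.heads.items()}

# Example forward pass on a random batch of 8 "vocal bursts".
model = VocalBurstRater()
ratings = model(torch.randn(8, N_FEATURES))
print({country: out.shape for country, out in ratings.items()})
```

The 24-unit bottleneck is chosen only to echo the number of dimensions reported in the abstract; the study's actual model presumably operated on raw audio and explicitly controlled for context and speaker identity, which this sketch omits.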


Fig. 1: Schematic of our experimental and analytic approach.
Fig. 2: Dimensions of vocal expression that emerged as having distinct meanings within or across cultures.
Fig. 3: Interactive visualization of vocal bursts along the 24 acoustic dimensions of vocal modulation found to have distinct meanings within or across cultures.


Data availability

The data associated with this manuscript are available upon reasonable request to the corresponding authors.

Code availability

Code associated with this study, including the functions to perform PPCA, is available in the following Zenodo repository: https://doi.org/10.5281/zenodo.7111972.
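The statement above mentions functions for PPCA without defining it here. Assuming PPCA refers to principal preserved component analysis as used in the authors' earlier work (extracting linear combinations of rating dimensions whose covariance across two independent sets of judgements of the same stimuli is maximal), a minimal NumPy sketch of that idea might look like the following; the Zenodo repository's own functions should be treated as authoritative, and all names below are illustrative.

```python
# Hedged sketch of principal preserved component analysis (PPCA), assuming it is
# defined as finding linear combinations of rating dimensions with maximal
# covariance between two independent rater groups judging the same stimuli.
# This is not the study's implementation; see the Zenodo repository for that.
import numpy as np

def ppca(X, Y):
    """X, Y: (n_stimuli, n_attributes) rating matrices from two rater groups.
    Returns eigenvalues and components (columns) ordered by preserved covariance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (len(X) - 1)        # cross-covariance between the two groups
    S = (C + C.T) / 2                   # symmetrize so eigenvectors are real
    evals, evecs = np.linalg.eigh(S)    # ascending eigenvalue order
    order = np.argsort(evals)[::-1]     # re-order by descending preserved covariance
    return evals[order], evecs[:, order]

# Toy usage: two noisy rating sets sharing one underlying dimension.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
loadings = rng.normal(size=(1, 5))
X = latent @ loadings + 0.5 * rng.normal(size=(100, 5))
Y = latent @ loadings + 0.5 * rng.normal(size=(100, 5))
evals, comps = ppca(X, Y)
print(evals.round(2))    # one large eigenvalue for the shared dimension
```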


Acknowledgements

This work was supported by Hume AI as part of its effort to advance emotion research using computational methods. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information


Contributions

A.S.C. and D.K. designed the experiment. L.K., M.O., X.F., M.M., R.C., J.M. and A.S.C. implemented the study design and collected data. J.A.B., P.T., A.B. and A.S.C. analysed data. J.A.B. and A.S.C. interpreted results and created figures. J.A.B. and A.S.C. drafted the manuscript. All authors provided critical revisions and approved the final manuscript for submission.

Corresponding authors

Correspondence to Jeffrey A. Brooks or Alan S. Cowen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–7 and Tables 1–3.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Brooks, J.A., Tzirakis, P., Baird, A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat Hum Behav 7, 240–250 (2023). https://doi.org/10.1038/s41562-022-01489-2

