Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Deep learning reveals what vocal bursts express in different cultures


Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called ‘vocal bursts’. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts. A total of n = 4,031 participants in China, India, South Africa, the USA and Venezuela mimicked vocal bursts drawn from 2,756 seed recordings. Participants also judged the emotional meaning of each vocal burst. A deep neural network tasked with predicting the culture-specific meanings people attributed to vocal bursts while disregarding context and speaker identity discovered 24 acoustic dimensions, or kinds, of vocal expression with distinct emotion-related meanings. The meanings attributed to these complex vocal modulations were 79% preserved across the five countries and three languages. These results reveal the underlying dimensions of human emotional vocalization in remarkable detail.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Schematic of our experimental and analytic approach.
Fig. 2: Dimensions of vocal expression that emerged as having distinct meanings within or across cultures.
Fig. 3: Interactive visualization of vocal bursts along the 24 acoustic dimensions of vocal modulation found to have distinct meanings within or across cultures.

Similar content being viewed by others

Data availability

The data associated with this manuscript are available upon reasonable request to the corresponding authors.

Code availability

Code associated with this study, including the functions to perform PPCA, is available in the following Zenodo repository:


  1. Banse, R. & Scherer, K. R. Acoustic profiles in vocal emotion expression. J. Pers. Soc. Psychol. 70, 614–636 (1996).

    Article  CAS  PubMed  Google Scholar 

  2. Fernald, A. in The Adapted Mind: Evolutionary Psychology and the Generation of Culture (eds Barkow, J. et al.) 391–428 (Oxford Univ. Press, 1992).

  3. Soltis, J. The signal functions of early infant crying. Behav. Brain Sci. 27, 443–458 (2004).

    Article  PubMed  Google Scholar 

  4. Cordaro, D. T., Keltner, D., Tshering, S., Wangchuk, D. & Flynn, L. M. The voice conveys emotion in ten globalized cultures and one remote village in Bhutan. Emotion 16, 117–128 (2016).

    Article  PubMed  Google Scholar 

  5. Keltner, D. & Kring, A. M. Emotion, social function, and psychopathology. Rev. Gen. Psychol. 2, 320–342 (1998).

    Article  Google Scholar 

  6. Van Kleef, G. A., De Dreu, C. K. W. & Manstead, A. S. R. An interpersonal approach to emotion in social decision making: the emotions as social information model. Adv. Exp. Social Psychol. 42, 45–96 (2010).

  7. Bryant, G. A. in The Handbook of Communication Science and Biology (eds Floyd, K. & Weber, R.) 63–77 (Routledge, 2020).

  8. Snowdon, C. T. in Handbook of Affective Sciences (eds Davidson, R. J. et al.) 457–480 (Oxford Univ. Press, 2003).

  9. Wu, Y., Muentener, P. & Schulz, L. E. One- to four-year-olds connect diverse positive emotional vocalizations to their probable causes. Proc. Natl Acad. Sci. USA 114, 11896–11901 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Vouloumanos, A. & Bryant, G. A. Five-month-old infants detect affiliation in colaughter. Sci. Rep. 9, 4158 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  11. Smoski, M. & Bachorowski, J.-A. Antiphonal laughter between friends and strangers. Cogn. Emot. 17, 327–340 (2003).

    Article  PubMed  Google Scholar 

  12. Bryant, G. A. et al. Detecting affiliation in colaughter across 24 societies. Proc. Natl Acad. Sci. USA 113, 4682–4687 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Sauter, D. A., Eisner, F., Ekman, P. & Scott, S. K. Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proc. Natl Acad. Sci. USA 107, 2408–2412 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Gendron, M., Roberson, D., van der Vyver, J. M. & Barrett, L. F. Cultural relativity in perceiving emotion from vocalizations. Psychol. Sci. 25, 911–920 (2014).

    Article  PubMed  Google Scholar 

  15. Scherer, K. R. in Emotions in Personality and Psychopathology (ed. Izard, C. E.) 493–529 (Springer, 1979).

  16. Laukka, P. et al. Cross-cultural decoding of positive and negative non-linguistic emotion vocalizations. Front. Psychol. 4, 353 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  17. Cowen, A. S., Elfenbein, H. A., Laukka, P. & Keltner, D. Mapping 24 emotions conveyed by brief human vocalization. Am. Psychol. 74, 698–712 (2019).

    Article  PubMed  Google Scholar 

  18. Jolly, E. & Chang, L. J. The flatland fallacy: moving beyond low-dimensional thinking. Top. Cogn. Sci. 11, 433–454 (2019).

    Article  PubMed  Google Scholar 

  19. Sauter, D. A., Eisner, F., Ekman, P. & Scott, S. K. Emotional vocalizations are recognized across cultures regardless of the valence of distractors. Psychol. Sci. 26, 354–356 (2015).

    Article  PubMed  Google Scholar 

  20. Whiting, C. M., Kotz, S. A., Gross, J., Giordano, B. L. & Belin, P. The perception of caricatured emotion in voice. Cognition 200, 104249 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  21. Monroy, M., Cowen, A. S. & Keltner, D. Intersectionality in emotion signaling and recognition: the influence of gender, ethnicity, and social class. Emotion (2022).

  22. Jackson, J. C. et al. Emotion semantics show both cultural variation and universal structure. Science 366, 1517–1522 (2019).

    Article  CAS  PubMed  Google Scholar 

  23. Rozin, P. & Cohen, A. B. High frequency of facial expressions corresponding to confusion, concentration, and worry in an analysis of naturally occurring facial expressions of Americans. Emotion 3, 68–75 (2003).

    Article  PubMed  Google Scholar 

  24. Hejmadi, A., Davidson, R. J. & Rozin, P. Exploring Hindu Indian emotion expressions: evidence for accurate recognition by Americans and Indians. Psychol. Sci. 11, 183–186 (2000).

    Article  CAS  PubMed  Google Scholar 

  25. Russell, J. A., Suzuki, N. & Ishida, N. Canadian, Greek, and Japanese freely produced emotion labels for facial expressions. Motiv. Emot. 17, 337–351 (1993).

    Article  Google Scholar 

  26. Cowen, A. S., Laukka, P., Elfenbein, H. A., Liu, R. & Keltner, D. The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures. Nat. Hum. Behav. 3, 369–382 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  27. Cowen, A. S., Fang, X., Sauter, D. & Keltner, D. What music makes us feel: at least 13 dimensions organize subjective experiences associated with music across different cultures. Proc. Natl Acad. Sci. USA 117, 1924–1934 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Cowen, A. S. & Keltner, D. Universal facial expressions uncovered in art of the ancient Americas: a computational approach. Sci. Adv. 6, eabb1005 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  29. Demszky, D. et al. GoEmotions: a dataset of fine-grained emotions. In Proc. 58th Annual Meeting of the Association for Computational Linguistics 4040–4054 (ACL, 2020).

  30. Cowen, A. S. & Keltner, D. Semantic space theory: a computational approach to emotion. Trends Cogn. Sci. 25, 124–136 (2021).

    Article  PubMed  Google Scholar 

  31. Cowen, A. S. & Keltner, D. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl Acad. Sci. USA 114, E7900–E7909 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Cordaro, D. T. et al. The recognition of 18 facial-bodily expressions across nine cultures. Emotion 20, 1292–1300 (2020).

    Article  PubMed  Google Scholar 

  33. Cordaro, D. T. et al. Universals and cultural variations in 22 emotional expressions across five cultures. Emotion 18, 75–93 (2018).

    Article  PubMed  Google Scholar 

  34. Keltner, D., Sauter, D., Tracy, J. & Cowen, A. Emotional expression: advances in basic emotion theory. J. Nonverbal Behav. 43, 133–160 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Peterson, J. C., Abbott, J. T. & Griffiths, T. L. Adapting deep network features to capture psychological representations. In Proc. of the 48th Annual Conference of the Cognitive Science Society 2363–2368 (2016).

  36. Peterson, J. C., Uddenberg, S., Griffiths, T. L., Todorov, A. & Suchow, J. W. Deep models of superficial face judgments. Proc. Natl Acad. Sci. USA 119, e2115228119 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Peters, B. & Kriegeskorte, N. Capturing the objects of vision with neural networks. Nat. Hum. Behav. 5, 1127–1144 (2021).

    Article  PubMed  Google Scholar 

  38. Storrs, K. R., Anderson, L. & Fleming, R. W. Unsupervised learning predicts human perception and misperception of gloss. Nat. Hum. Behav. 5, 1402–1417 (2021).

  39. Lake, B. M., Zaremba, W., Fergus, R. & Gureckis, T. M. Deep neural networks predict category typicality ratings for images. In Proc. 37th Annual Meeting of the Cognitive Science Society (eds Noelle, D. C. et al.) 1243–1248 (The Cognitive Science Society, 2015);

  40. Cowen, A. S. & Keltner, D. Universal emotional expressions uncovered in art of the ancient Americas: a computational approach. Sci. Adv. 6, eabb1005 (2020).

  41. Cowen, A., Sauter, D., Tracy, J. L. & Keltner, D. Mapping the passions: toward a high-dimensional taxonomy of emotional experience and expression. Psychol. Sci. Public Interest 20, 69–90 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Cowen, A. S. & Keltner, D. What the face displays: mapping 28 emotions conveyed by naturalistic expression. Am. Psychol. 75, 349–364 (2020).

    Article  PubMed  Google Scholar 

  43. Hess, U. & Fischer, A. Emotional mimicry: why and when we mimic emotions. Soc. Pers. Psychol. Compass 8, 45–57 (2014).

    Article  Google Scholar 

  44. Fischer, A. & Hess, U. Mimicking emotions. Curr. Opin. Psychol. 17, 151–155 (2017).

    Article  PubMed  Google Scholar 

  45. Ćwiek, A. et al. Novel vocalizations are understood across cultures. Sci. Rep. 11, 10108 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  46. Simon-Thomas, E. R., Keltner, D. J., Sauter, D., Sinicropi-Yao, L. & Abramson, A. The voice conveys specific emotions: evidence from vocal burst displays. Emotion 9, 838–846 (2009).

    Article  PubMed  Google Scholar 

  47. Cowen, A. S. et al. Sixteen facial expressions occur in similar contexts worldwide. Nature 589, 251–257 (2021).

    Article  CAS  PubMed  Google Scholar 

  48. Horikawa, T., Cowen, A. S., Keltner, D. & Kamitani, Y. The neural representation of visually evoked emotion is high-dimensional, categorical, and distributed across transmodal brain regions. iScience 23, 101060 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  49. Hofstede, G. Dimensionalizing cultures: the Hofstede model in context. Online Read. Psychol. Cult. 2, 8 (2011).

    Google Scholar 

  50. Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).

    Article  PubMed  Google Scholar 

  51. Roth, P. L. Missing data: a conceptual review for applied psychologists. Pers. Psychol. 47, 537–560 (1994).

    Article  Google Scholar 

  52. Juslin, P. N. & Laukka, P. Communication of emotions in vocal expression and music performance: different channels, same code? Psychol. Bull. 129, 770–814 (2003).

    Article  PubMed  Google Scholar 

  53. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

    Google Scholar 

  54. Tzirakis, P., Zhang, J. & Schuller, B. W. End-to-end speech emotion recognition using deep neural networks. In Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5089–5093 (IEEE, 2018).

  55. Zamani, H. & Croft, W. B. On the theory of weak supervision for information retrieval. In Proc. 2018 ACM SIGIR International Conference on Theory of Information Retrieval 147–154 (Association for Computing Machinery, 2018).

  56. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (ICLR) (ICLR, 2015).

  57. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proc. 2015 IEEE International Conference on Computer Vision (ICCV) 1026–1034 (IEEE, 2015).

  58. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

    Google Scholar 

  59. G’Sell, M. G., Wager, S., Chouldechova, A. & Tibshirani, R. Sequential selection procedures and false discovery rate control. J. R. Stat. Soc. B 78, 423–444 (2016).

    Article  Google Scholar 

  60. Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23, 187–200 (1958).

    Article  Google Scholar 

Download references


This work was supported by Hume AI as part of its effort to advance emotion research using computational methods. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations



A.S.C. and D.K. designed the experiment. L.K., M.O., X.F., M.M., R.C., J.M. and A.S.C. implemented the study design and collected data. J.A.B., P.T., A.B. and A.S.C. analysed data. J.A.B. and A.S.C. interpreted results and created figures. J.A.B. and A.S.C. drafted the manuscript. All authors provided critical revisions and approved the final manuscript for submission.

Corresponding authors

Correspondence to Jeffrey A. Brooks or Alan S. Cowen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–7 and Tables 1–3.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Brooks, J.A., Tzirakis, P., Baird, A. et al. Deep learning reveals what vocal bursts express in different cultures. Nat Hum Behav 7, 240–250 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing