The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures


Central to emotion science is the degree to which categories, such as Awe, or broader affective features, such as Valence, underlie the recognition of emotional expression. To explore the processes by which people recognize emotion from prosody, US and Indian participants were asked to judge the emotion categories or affective features communicated by 2,519 speech samples produced by 100 actors from 5 cultures. With large-scale statistical inference methods, we find that prosody can communicate at least 12 distinct kinds of emotion that are preserved across the 2 cultures. Analyses of the semantic and acoustic structure of the recognition of emotions reveal that emotion categories drive the recognition of emotions more so than affective features, including Valence. In contrast to discrete emotion theories, however, emotion categories are bridged by gradients representing blends of emotions. Our findings, visualized within an interactive map, reveal a complex, high-dimensional space of emotional states recognized cross-culturally in speech prosody.
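To make the idea of dimensions "preserved across the 2 cultures" concrete, the following is a minimal sketch and not the authors' MATLAB code. It assumes that a preserved-component analysis can be approximated by finding directions that maximally covary between two groups' judgments of the same samples, obtained here from an eigendecomposition of the symmetrized cross-covariance matrix. The acknowledgements mention a correlational version of PPCA, which this covariance-based sketch does not reproduce. The function name `preserved_components`, the simulated ratings and all parameter values are hypothetical.

```python
import numpy as np


def preserved_components(X, Y):
    """Directions that maximally covary between two rating matrices.

    X, Y: arrays of shape (n_samples, n_categories) holding, for the same
    samples and categories, the mean judgments of two groups of raters.
    Returns eigenvalues (descending) and the matching unit-norm directions.
    NOTE: covariance-based illustration only, not the published method.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Symmetrized cross-covariance between the two groups' judgments.
    C = (Xc.T @ Yc + Yc.T @ Xc) / (2 * (n - 1))
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Simulated data (hypothetical): 2 shared dimensions across 6 categories.
rng = np.random.default_rng(0)
n_samples = 500
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
latent = rng.normal(size=(n_samples, 2))
signal = latent @ loadings
# Each group sees the shared signal plus independent rating noise.
X = signal + rng.normal(scale=0.5, size=signal.shape)
Y = signal + rng.normal(scale=0.5, size=signal.shape)

eigvals, eigvecs = preserved_components(X, Y)
print(np.round(eigvals, 2))
```

In this simulation the first two eigenvalues are large and the remaining ones sit near zero, reflecting the two dimensions built into both groups' ratings. Deciding how many dimensions are reliably shared in real data would additionally require held-out evaluation, which is omitted here.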

Fig. 1: Correlations in the meaning of emotional prosody across cultures.
Fig. 2: The preserved recognition of emotion categories accounts for the preservation of affective feature judgments across cultures.
Fig. 3: Verifying that PPCA accurately estimates the number of shared dimensions.
Fig. 4: 12 distinct varieties of emotional prosody are preserved across cultures via category recognition.
Fig. 5: Correlations between coefficients of components extracted from US and Indian category judgments using different methods.
Fig. 6: Visualizing the 12-dimensional structure of emotion conveyed by prosody.
Fig. 7: The 12 distinct categories can be blended together in a number of ways.
Fig. 8: Low-level acoustic correlates of emotion recognition and their preservation across cultures.

Code availability

Custom MATLAB analysis code can be requested from

Data availability

The 2,519 speech samples used in the present study and their ratings can be requested from

Publications incorporating the speech samples should reference the previous study (ref. 33).


Acknowledgements

We thank R. Rosipal for devising a correlational version of PPCA and F. Theunissen for providing input regarding acoustic analyses. Research reported in this publication was supported by the US National Institute of Mental Health under award number T32-MH020006-16A1 and by the Thomas and Ruth Ann Hornaday Chair in Psychology at the University of California, Berkeley. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions

P.L. and H.A.E. contributed all speech samples; A.S.C. and D.K. designed the research with input from P.L. and H.A.E.; A.S.C. performed research, contributed analytic tools and analysed data; and A.S.C. and D.K. wrote the paper with input from P.L., H.A.E. and R.L.

