Music, defined here as human-produced sound organized by melodies, rhythms or both, is found in every society where researchers have looked1,2,3,4. It suffuses social life, appearing in contexts as diverse as healing, dancing and infant care1,5,6, and occurs across the lifespan, through infancy7, childhood8, adolescence9, adulthood10 and old age11. The importance of music in the social lives of humans stems from its potent and diverse psychological effects, which range from pacifying infants12,13,14,15 to fomenting the collective, chaotic thrashing of rock concert mosh pits16.

A central question in the study of music is whether humans have evolved specialized cognitive adaptations to produce and respond to music. The psychology of music either comprises music-specific adaptations shaped by natural selection17,18,19 or arises as a by-product of cognitive abilities serving non-musical functions. According to this by-product account, also known as the ‘auditory cheesecake hypothesis’, music is a package of cognitively compelling stimuli moulded via cultural evolution to trigger features of human psychology that evolved for non-musical ends20,21. At least three features of music-related psychological processes can help determine whether the underlying cognitive systems are specialized adaptations: domain-specificity, early expression and universality17. A psychological process is domain-specific if it has evolved to operate on a particular class of information. It is expressed early if infants exhibit the response. Universality, which can refer either to a behaviour or to an underlying feature of human psychology, is a feature that deserves further elaboration.

A behaviour is universal when it is expressed in all human populations, barring mitigating factors. For instance, music production was expressed by 100% of populations in a sample of 315 mostly non-industrial human societies, including geographically diverse hunter-gatherers, pastoralists and intensive agriculturalists1. This universality naturally coexists with variability: not every individual in every culture is an expert producer of music (as only some individuals have extensive music training); some cultures use music less frequently than others (as with the Tsimane, who generally do not produce music in groups22); and not every individual in every culture is equally motivated to produce music (as in individuals with musical anhedonia23, for whom music production might be less rewarding than is typical). The production of music is nonetheless considered universal, as even in these cases, there is evidence for the behaviour in every population studied. A behaviour can be near-universal (sometimes called a ‘statistical universal’) if it appears above a predefined threshold but not in 100% of cultures sampled2.

Unlike a behaviour, a universal psychological mechanism or predisposition can manifest variably, not necessarily appearing in every individual in all populations24. Jealousy exhibits considerable global variation, with individuals in some cultures reporting less severe jealousy25. Yet, this variation is structured: cross-culturally, the severity of jealousy covaries with the frequency of extramarital sex and expectations of parental investment25, suggesting that jealousy is a universal emotional response that functions to ensure either parental investment (for females) or paternity certainty (for males). Like jealousy, psychological responses to music can exhibit reliable cross-cultural differences while still reflecting universal predispositions that are variably expressed depending on the environment of an individual.

To organize our discussion, we heuristically distinguish among three psychological processes at work in human musicality: music production, music perception and musical response. Music production refers to the auditory, motor and vocal processes associated with singing or playing an instrument. Music perception refers to processing that translates sounds into neural activity, which is subsequently subjected to a variety of analyses, including auditory scene analysis and the extraction of musical structure, syntax or interval relations26. Finally, musical response refers to the higher-level semantic, aesthetic, emotional and behavioural responses and inferences that follow music production and its subsequent perception (Fig. 1).

Fig. 1: Musical response and music perception comprise distinct psychological processes, both of which are at work in human musicality.
figure 1

The diagram identifies topics in music perception and musical response that are commonly studied in psychology. Topics are ordered vertically by their approximate level of abstraction, which distinguishes lower-level perceptual phenomena (such as the extraction of basic acoustic information in a stimulus) from higher-level musical responses (such as enculturation). This Review focuses in depth on emotional and behavioural responses, leaving aside other musical responses such as aesthetic appreciation.

In this Review, we synthesize the literature on universality, domain-specificity and development of psychological responses to music. We first briefly discuss the mechanics of music production and music perception, before focusing on emotional and then behavioural responses to music — two rapidly advancing areas of research. By surveying cross-cultural, developmental and neuroscientific approaches, we will demonstrate clear evidence for the universality and early development of emotional and behavioural responses to music. However, the evidence for domain-specificity is more mixed, suggesting that universal responses to music might draw on more general features of human psychology. We conclude by considering how cultural evolution interacts with universal aspects of human psychology to produce both cross-cultural similarities and cultural idiosyncrasies in the music of the world.

Music production, perception and response

Universality, development and domain-specificity have been key research areas for each of the music-related psychological processes (production, perception and response). For instance, music production is universal and the associated behaviours vary substantially less across cultures than within cultures1. Humans have manufactured musical instruments for at least 35,000 years27 and have likely produced vocal music for longer18,28,29. The universality and deep history of music production suggest that it is underlain by psychological mechanisms shared across humans.

Given the universality of music production, it is not surprising that many basic aspects of music perception are widespread and early-developing, such as mechanisms involved in hearing and understanding musical pitch (the psychological correlate of frequency, allowing it to be ordered on a frequency-related scale; in English, pitch is typically described as the highness or lowness of a tone)30,31,32,33. Perception starts with feature extraction, during which low-level acoustic features like timbre, intensity, location, pitch height and periodicity are decoded from the auditory stream34. This acoustic information is analysed to process melodic, rhythmic, timbral and spatial groupings, eventually resulting in higher-level musical representations, such as tonal and metrical information (two foundational aspects of musical information) and harmonic structure34. The human auditory cortex is specialized for music perception35, separately from speech perception36,37, with special selectivity for vocal as opposed to instrumental music38 and with connections to reward systems found in the midbrain39,40. Whether the psychological mechanisms underlying music production and perception are best explained by domain-general processes, such as auditory scene analysis41, or domain-specific ones remains a matter of debate, but the current overall picture is that many aspects of music production and perception form a basic part of human psychology that supports higher-level musical responses.

Musical response refers to the semantic, emotional, aesthetic and other behavioural responses and inferences that follow music production and perception (Fig. 1). Musical responses occur in both producers and listeners of music and include many apparently higher-level responses to music such as inferring musical meaning (‘this song is about birds’), inferring expressed emotions in music (‘this song sounds happy’), directly experiencing emotions evoked from music (a song makes a listener feel happy) and moving in response to music.

Whereas musical response is generally downstream of perception, the relationship is not completely linear or serial. Musical responses do not require the analysis of rhythmic or spatial groups; for instance, tones played in isolation (without other rhythmic or melodic structure) can convey meaning, such as by sounding ‘bright’, ‘feminine’ or ‘summery’42,43. Moreover, there are indications that motor regions of the brain not only respond to structural features like rhythm and metre but are also involved in extracting beat, raising the possibility of feedback loops between music perception and response44,45,46,47. Such feedback loops undoubtedly operate differently in the brain of a performer (who has more immediate access to motor information in music) than for a listener (who has less)48. Nevertheless, our heuristic distinction between music production, perception and response is justified by how humans process music psychologically26,49 and parallels distinctions used in language sciences50.

In this Review, we will largely leave aside the mechanics of music production and perception to concentrate on the domain-specificity, development and universality of musical responses. For instance, we do not discuss cultural variation in the perception of dissonance51, the effects of musical experience on auditory processing52, or the effects of antenatal exposure on auditory perception and neural development53,54. Our coverage will focus on two sets of musical responses that have received considerable research attention and are among the most important psychological effects of music. We will start by discussing emotional inferences and responses, especially recognizing expressed emotions in music. We will then address behavioural inferences and responses, particularly being soothed and dancing.

Emotional responses to music

Individuals overwhelmingly consume and deploy music for emotional regulation9,10,55,56,57,58,59. As such, much of the research on musical responses has focused on emotional responses. This research often adopts a basic emotions perspective, according to which there are basic or discrete emotions, such as happiness and fear, as well as complex or non-basic emotions such as jealousy and solemnity60. Basic emotions are said to be innately expressed and identified, whereas non-basic emotions are seen to be less biologically fundamental and more culturally variable60. As in the broader emotion literature, the main alternatives to a basic emotions perspective are dimensional perspectives, according to which emotions are organized around a few dimensions, most commonly valence (pleasantness) and arousal (activation)61,62,63.

Regardless of the model of emotions that researchers adopt, the studies of emotional musical responses reviewed here suggest that such responses are not supported by specialized adaptations. Whereas the psychological mechanisms underlying emotional responses seem to be largely conserved across populations, they reflect domain-general responses to emotion rather than music-specific psychological processes.

Cross-cultural similarities

Studies in which individuals were asked to rate emotions in foreign music have demonstrated that emotional expression is, to a modest degree, mutually intelligible across cultures64,65. For example, Mafa individuals in northern Cameroon accurately recognized emotions in western music designed to sound happy, sad and fearful66. Similarly, German, Norwegian, Korean and Indonesian individuals identified happy and sad instrumental performances by German musicians67. In another example, Indian, Japanese and Swedish listeners identified expressed emotions in each other’s traditions as well as in western music65,68. Finally, individuals from the USA and rural Cambodia tasked with creating music that expressed emotions like ‘sad’ or ‘happy’ created similar melodies69. The findings of these studies suggest broadly shared psychological mechanisms underlying the recognition of expressed emotions in music70.

Despite these similarities, culture still shapes how individuals recognize emotional expression in music. Participants might, on average, successfully recognize emotions in music from foreign cultures while nevertheless showing much lower accuracy than native participants. For example, although Mafa listeners successfully identified happiness, sadness and fear in western songs at a rate higher than chance, Canadian listeners accurately inferred the expressed emotion nearly twice as often66 (Fig. 2). Experimenters found similar results in several additional experiments65,67,68. In one, Canadian adults correctly identified joy, sadness and anger but not ‘peace’ in North Indian classical music64. In another, Swedish, Indian and Japanese participants identified anger, fear, happiness and sadness more successfully than supposedly ‘non-basic emotions’ like spirituality, solemnity and longing in western excerpts and in each other’s music68. In a third study, Korean and Indonesian participants identified happiness and sadness in German music with relative ease but had difficulty recognizing surprise and disgust67. In fact, surprise and disgust were also hardest for Norwegians and Germans to recognize in German music (surprise tended to be confused with happiness and disgust was confused with fear and anger).

Fig. 2: Emotional communication in music.
figure 2

a, Mafa listeners in Cameroon and western listeners both identified happiness, sadness and fear in western music above chance, but the responses of western individuals were accurate much more often. b, Patients who underwent anteromedial temporal lobe excision (typically including the removal of the amygdala) had an impaired ability to recognize both scary music and fearful faces. Performance across the auditory and visual tasks was moderately correlated, raising the possibility that emotional recognition in music shares neural substrates with emotional recognition in faces. The emotions shown here represent a small subset of the emotions explored in this literature. Asterisk indicates that a significant correlation exists between the auditory and visual emotional tasks. Part a adapted with permission from ref. 66, Elsevier. Part b adapted with permission from ref. 100, Elsevier.

Some features of music are interpreted more variably across cultures than others, which further complicates the recognition of expressed emotion in music71. For instance, participants from the UK and participants from north-western Pakistani tribes made similar emotional inferences from features such as tempo, loudness and pitch. However, participants from the UK associated the major mode with happiness and the minor mode with sadness, whereas Pakistani participants apparently did not pay attention to mode in one study72 and exhibited the opposite set of responses in another73. In a similar vein, the extent to which both Chinese and Papua New Guinean participants associated the major and minor modes with positive and negative emotions, respectively, was predicted by their familiarity with western music74,75.

Although more precise evidence is needed concerning the exact effects of cross-cultural musical experience, together, the results above suggest that the recognition of expressed emotion in music involves a combination of culturally learned emotion cues and more universal psychological mechanisms.

Developmental trajectory

Children can identify some emotions in music by 3 or 4 years of age, although findings have been variable (Fig. 3a). For example, British 3-year-olds were presented with novel music for children and asked to indicate whether performances sounded ‘happy’ or ‘sad’. The children successfully identified happiness and sadness in both vocal and instrumental music76, with markedly better performance on ‘happy’ music. Likewise, Finnish and Hungarian children aged 3 and 4 years identified happiness and sadness in diverse musical performances (a folk song, stimuli produced by musicians) but not anger or fearfulness77. In another study, Canadian 5–8-year-olds identified high-arousal emotions (happiness and scariness) more successfully than low-arousal emotions (peacefulness and sadness) in musical stimuli designed for emotion recognition experiments; however, they were not as successful as 11-year-olds, who exhibited adult-like levels of accuracy78. Contrasting with evidence of early emotion recognition abilities, several studies have found that 3–4-year-olds failed to distinguish happy from sad songs79,80, although these studies used western classical music as stimuli, which complicates their interpretation.

Fig. 3: Ontogeny of emotional and behavioural responses.
figure 3

The ages of onset for the emotional (part a) and behavioural (part b) psychological responses to music discussed in the text. Emotional inferences appear in blue, while behavioural responses have been separated into form–function inferences (plum), responses to infant-directed song (green) and responses to rhythm (light blue).

Although developmental changes to emotional recognition in music parallel changes to emotional recognition in non-musical speech81, it remains unclear to what extent developmental differences are due to culture-specific learning. On the one hand, inferring emotional expression from mode seems both to develop after 5 years of age and to be cross-culturally variable in adulthood, suggesting a role for cultural learning72,79. On the other hand, children and even adolescents have difficulty identifying anger and fear in music77,80,82, yet these are among the emotions that adults recognize in music most reliably across cultures64,66,67,68, suggesting that some developmental trajectories play out similarly the world over.

Whether infants and toddlers can recognize emotion in music remains an open question. Several studies conducted in North America show that 9-month-olds can discriminate happy music from sad music83,84,85. However, discrimination does not imply recognition and, with few exceptions86, very little research has investigated emotional recognition in music in toddlers and infants younger than 3 years of age. This gap is somewhat surprising, given that many developmental paradigms, such as measuring looking time toward cross-modally matched faces and musical examples, could be straightforwardly adapted for such investigations. Indeed, several findings have raised the possibility that infants and toddlers can infer emotional content in music. For example, infants are surrounded by music and fascinated by it7,87, they are attentive to the emotions of individuals with whom they interact88,89, and they show a distinct set of psychophysiological responses to unfamiliar foreign lullabies relative to non-lullabies14. Thus, studies on emotional recognition in young infants are feasible and will help resolve to what extent infants are predisposed to associating emotions with acoustic phenomena.

Mechanisms for emotional recognition

The evidence that emotional recognition in music involves universal psychological mechanisms does not imply that those mechanisms are domain-specific. Rather, at least three lines of research suggest that emotional recognition in music draws on the same domain-general mechanisms involved in judging expressed emotion from non-musical stimuli such as non-musical vocalizations and facial expressions.

First, vocalizations produced in both musical and non-musical contexts use similar cues to communicate emotion. For example, in both music and speech, variations in tempo, volume and pitch often (although not always) communicate similar emotional states90,91,92. Like happy-sounding speech in English and Tamil, happy-sounding music in western music and South Indian music uses larger pitch intervals93. Angry speech and angry music are both characterized by faster and louder vocalizations, contrasting with the slower and softer sounds of music not typically found in angry contexts such as lullabies1. Non-musicians incorporate cues, such as tempo and volume, when producing emotional music94. When asked to make music sound happier, sadder or angrier, Finnish 3–5-year-olds adjusted tempo, pitch and volume in ways that mimic emotion cues in speech95. Chinese adults even attributed arousal and valence to environmental sounds, such as clapping, thunder or a car engine, when those sounds displayed tempo, volume and pitch cues that signal emotion in music and speech96.

Second, activity in brain regions during emotional recognition in music seems to correlate with brain activity involved in processing emotions in non-musical stimuli97,98. For instance, damage to the amygdala impairs the recognition of both scary music and fearful faces, and the performance of patients on both tasks was correlated99,100. In other research, participants exhibited activity in the medial prefrontal cortex not only when asked to track the emotional content of musical and non-musical linguistic vocalizations101 but also when processing the emotional content of body movements, facial expressions and non-linguistic interjections (such as “aah”)102. Finally, watching movements and hearing sounds associated with emotions evoked similar neural representations in visual and auditory areas of the brain, respectively, which suggested that emotional stimuli presented in diverse modes can elicit common representational structures103.

Third, children exhibit similar developmental trajectories for recognizing emotion in speech and music. Children start to recognize some emotions in speech and music by the age of four; they are better at identifying happiness and sadness than fear or anger in speech and in music; and they are capable of identifying emotions in other languages, although they are most accurate when listening to their native language81,104,105,106. When asked to rate clips of speech, music and affect bursts (such as laughter), Australian children in three age groups (7–11 years, 12–14 years and 15–17 years) and adults (18–20 years) performed indistinguishably when labelling speech and music, although all groups were more accurate when labelling affect bursts81. Thus, the same developmental changes that allow children to recognize emotion in speech appear to be involved in recognizing emotion in music.

Despite many indications that recognition of musical and non-musical emotion expression draws on the same cognitive mechanisms, how emotion is communicated in music remains unresolved107,108,109. Consistent with basic emotion theories, basic emotions (such as happiness and fear) appear to be recognized in music both earlier in development and, in some studies, more reliably within and across cultures relative to non-basic emotions (such as jealousy and solemnity)64,66,68,78,90. However, researchers do not agree on which emotions are basic; there is conflicting evidence on whether there are distinct physiological correlates distinguishing basic emotions; and many canonical findings on emotional expression in speech come from studies in which actors portrayed emotional states (such as by acting happy), which might not accurately reflect naturalistic emotional displays62. These criticisms have inspired dimensional perspectives on communication of emotion in music, especially those centring on valence and arousal61,62,63. In support of such theories, an analysis of 53 studies published since 2003 found that a dimensional structure based on valence and arousal explains more variance in participants’ recognition of emotions in music than does a structure based on five basic emotions (anger, fear, happiness, love-tenderness and sadness)62. In addition, English speakers from 60 countries rating unfamiliar, foreign songs from 86 societies largely agreed with one another in their ratings of the valence and arousal of songs5. Thus, valence and arousal are reliably detectable dimensions of musical expression by listeners.

Resolving how emotion is communicated in music is complicated by studies of emotions felt while listening to music, which are difficult to reconcile with either the basic emotion or the dimensional perspective. A series of experiments with French-speaking listeners resulted in a nine-factor solution for recognized emotions in music (with factors such as amazement, tranquillity and power) and a related, although distinct, nine-factor solution for emotions felt from music (with factors such as transcendence, peacefulness and tension)110. Neither nine-factor solution was accounted for by basic emotion or dimensional theories. In another study, experimenters presented thousands of music samples to participants from the USA and China and asked them to label how the music made them feel, either by choosing from a list of 28 emotional categories or by rating each sample on 11 distinct Likert scales111. Thirteen dimensions of subjective experience were shared across both cultures, including basic emotions such as fear, joy and sadness as well as non-basic emotions like annoyance, triumph and dreaminess. Contrary to basic emotion accounts, non-basic emotions exhibited higher correlations across cultures than presumably basic emotions. Meanwhile, valence and arousal exhibited lower cross-cultural convergence than many other subjective experiences, challenging the theory that emotions, whether in music or more broadly, are constructed from these basic building blocks.

Although the structure of emotional communication in music remains unresolved, a general conclusion is clear: there is little reason to suspect that humans have specialized cognitive mechanisms for expressing and recognizing emotion in music. Rather, existing evidence suggests that individuals employ domain-general mechanisms for emotional communication in both music and speech. In this light, some basic aspects of musical understanding accord with the view that music is embedded in biology as one of several types of vocal signals18. As with much of human behaviour, emotional expression in music involves ‘variations on a theme’, where universal predispositions are modified by cultural exposure24. Individuals from distinct cultures can recognize emotions in the music of one another, yet they more successfully recognize some emotions relative to others and can fail to accurately interpret some acoustic cues. Similarly, young children can recognize expressed emotions in music although with limited and variable success. Thus, the role of domain-general mechanisms for the expression of emotion in music demonstrates how the diversity of the world’s music is structured by pan-human psychological predispositions.

Behavioural responses to music

In addition to processing purely auditory information (like pitch or timbre) and inferring emotional content (like expressed emotion described in the previous section), listeners also make inferences about the behavioural functions of music. By behavioural functions, we mean the social and behavioural ends for which people apparently produce music, including soothing an infant, accompanying dance and healing illness. Although these functions can leave sonic signatures on a recording, such as the sound of thumping feet in a group dance, this is not necessarily the case, as the behavioural function is foremost determined by the goals of the performer.

Although behavioural functions are related to the emotional content of music, they are a separable concept of interest for at least two reasons. First, individuals worldwide produce music for specific behavioural functions, such as dance or infant care, and comparative research suggests that many of these specific behavioural functions themselves appear reliably across societies1,5. Second, genetic evolutionary theories often explain the evolution of music in the context of specific behavioural functions such as enabling dancing18,29, soothing infants18,28, signalling mate quality112 and promoting social bonding19. Insofar as the music faculty involves domain-specific cognitive adaptations, we should expect those adaptations to be specialized for these behavioural functions.

Here, we review evidence that universal characteristics of human psychology guide individuals to respond to particular acoustical forms in similar ways. For example, humans around the world find slow, melodic music soothing and dance in response to louder, rhythmically dominated songs. In many experiments and a variety of populations, naive listeners intuit these associations: not only do they expect associations between song form and function but they reliably identify the behavioural functions of unfamiliar songs. Research demonstrates that behavioural responses to music, particularly to dance songs and lullabies, develop early and reliably across societies, although existing studies cannot determine whether those responses reflect domain-specific mechanisms.

Universal behavioural functions

A general tendency across animals is for communicative behaviours to be shaped by their intended function, manifesting as form–function associations in vocalizations113. For instance, low-frequency, harsh vocalizations tend to signal hostility because they are reliable indicators of body size114,115. Similar form–function associations characterize many human vocalizations, including spontaneous laughter116 and infant-directed speech15.

A series of experiments has investigated form–function associations in music using three related approaches: asking naive participants whether they can infer relationships between the form and function of foreign music; computationally identifying the acoustic features associated with particular behavioural functions in music; and analysing how those acoustic features explain the inferences of listeners1,5,15,117,118,119. These approaches are informative for two reasons. First, they test whether songs that share behavioural functions exhibit common acoustical designs across societies, helping uncover whether universals in human psychology guide both musical production and response. Second, they test whether individuals have shared conceptions of what songs should sound like5,119. Although it can be difficult to determine whether these conceptions result from cultural learning or intuitions that predate cultural encounters with music, studying them in young children or infants helps elucidate to what extent form–function intuitions are shaped by cultural experience117. Methodologically, form–function experiments share a basic structure1,5,15,117,118,119. Naive listeners are presented with random excerpts of foreign songs, typically field recordings from small-scale societies. They are then asked to evaluate the songs’ functions, such as by rating them on scales or selecting behavioural functions in forced-choice tasks. Finally, researchers identify acoustic features that predict the inferences of listeners to deduce their intuitions about song functions.

Across a variety of populations — among young children117, in small-scale societies119, in massive online experiments conducted with English speakers1,15 and in multilingual online experiments with participants in 59 countries119 — naive listeners infer the behavioural function of foreign songs above chance5 (Fig. 4). At least three lines of evidence suggest that this performance reflects reliably developing intuitions grounded in a universal human psychology rather than prior encounters with similar music. First, the familiarity of listeners with globalized musical culture does not explain their ability to identify song functions. Individuals in smaller-scale societies with limited access to western music successfully identified form–function relationships, and listeners whose experiences more closely matched the culture of the singer (whether measured in linguistic or geographic distance) were only modestly more successful at identifying them119. Second, children performed roughly equivalently to adults (with significant but very small effects of age), suggesting that the ability to infer form–function relationships requires little experience117. Third, individuals unfamiliar with particular song domains — namely, westerners unfamiliar with healing songs — nevertheless identified form–function relationships1,5, suggesting that intuitions develop even without exposure to the relevant domain.

Fig. 4: Evidence of universal form–function inferences in song.
figure 4

Four lines of evidence indicate that naive listeners of diverse ages and cultural backgrounds can infer the behavioural functions of unfamiliar foreign songs. The four song types studied here (dance songs, healing songs, love songs and lullabies) represent a subset of the many behavioural ends for which individuals use music. In a forced-choice categorization, English-speaking participants in a massive online experiment (n = 29,357) successfully categorized dance songs, lullabies, healing songs and love songs at rates higher than chance (25%); love songs were the hardest to recognize (part a). Percentages below the right-hand panel show base rates of response for each type. English-speaking children (n = 2,624) successfully identified dance songs, lullabies and healing songs with only slight increases in accuracy across ages; love songs were not tested (part b). Adults in 49 countries (n = 5,524) who each spoke one of 28 non-English languages (part c) and participants in three smaller-scale societies in Indonesia, Ethiopia and Vanuatu (n = 116) were presented with the same foreign dance songs, lullabies, healing songs and love songs (part d). For both the non-English-speaking internet users and the participants in smaller-scale societies, the experiment was completed in the local language only. Both the non-English-speaking internet users (part e, left half of violin plots) and the participants in smaller-scale societies (part e, right half of violin plots) successfully identified dance songs, lullabies and healing songs (that is, these songs were rated above average on the matching scale, indicated by z-scored ratings above zero in the violin plots); love songs were not recognizable. Part a, right, adapted with permission from ref. 1, © The Authors, some rights reserved; exclusive licensee AAAS. Part b, right, © 2022 APA; adapted with permission from ref. 117. Part e adapted from ref. 119, CC BY 4.0.

Analyses of the acoustic properties of songs have provided strong evidence of form–function associations in the music of the world. For one, low-level acoustic properties of songs extracted using automated techniques, such as roughness or inharmonicity, reliably co-occurred with behavioural functions across diverse, distantly related human societies1. Moreover, a machine learning model successfully classified the behavioural functions of songs on the basis of acoustic features, even when it was trained on data from songs from some societies (such as from 29 of 30 world regions or from all Old World societies) and evaluated using song data from other societies (such as from the 30th world region or from all New World societies)1. Acoustic features predicted not only the actual behavioural functions of the songs but also the functions that listeners inferred1,117. Together, these analyses suggest that universal features of human psychology predispose individuals in any society to associate particular sounds with certain behavioural functions.
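
The held-out-region test described above can be caricatured in a few lines of code. Everything in this sketch (the feature values, the region labels and the nearest-centroid classifier) is invented for illustration; it is not the model or data used in the cited study.

```python
import random
from statistics import mean

random.seed(1)

REGIONS = ["region_%d" % i for i in range(6)]
FUNCTIONS = ["dance", "lullaby"]

def make_song(function):
    # Invented acoustic profiles: dance songs score higher on both features.
    base = {"dance": (0.8, 0.7), "lullaby": (0.2, 0.3)}[function]
    return {"function": function,
            "features": [b + random.gauss(0, 0.1) for b in base]}

songs = [dict(make_song(f), region=r)
         for r in REGIONS for f in FUNCTIONS for _ in range(10)]

def centroid(rows):
    return [mean(r["features"][i] for r in rows) for i in (0, 1)]

def classify(features, centroids):
    # Nearest-centroid rule: choose the function whose training centroid
    # lies closest to the song's feature vector.
    return min(centroids,
               key=lambda f: sum((a - b) ** 2
                                 for a, b in zip(features, centroids[f])))

# Leave-one-region-out cross-validation: train on all regions but one,
# then evaluate on the held-out region's songs.
correct = total = 0
for held_out in REGIONS:
    train = [s for s in songs if s["region"] != held_out]
    test = [s for s in songs if s["region"] == held_out]
    cents = {f: centroid([s for s in train if s["function"] == f])
             for f in FUNCTIONS}
    for s in test:
        correct += classify(s["features"], cents) == s["function"]
        total += 1

accuracy = correct / total
print(f"held-out-region accuracy: {accuracy:.2f}")
```

Above-chance accuracy on held-out regions is what licenses the inference that form–function associations generalize beyond the societies used for training.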

Development and domain specificity

The universality of form–function associations suggests that different psychological mechanisms are involved in responses to songs of distinct behavioural functions. Turning to development and domain-specificity, we focus here on responses to lullabies and dance songs, for several reasons. Lullabies and dance songs are the most stereotyped song domains across cultures and are identified by naive participants with the highest accuracy1,5,117,119. They have also been hypothesized to be central to the evolution of music19,120, such as in the context of credible signalling18,28,29. Among the different behavioural responses to music, responses to lullabies and dance songs are most likely to reflect evolved specialized adaptations, making them prime candidates to study early development and domain specificity.

As we review here, the psychological responses underlying both lullabies and dance songs appear early in development in human populations around the world. However, whether these responses reflect domain-specific cognitive processes remains unresolved.

Lullabies
Infants seem predisposed to respond to infant-directed songs and, in particular, to songs that are intended to soothe them or put them to sleep (lullabies) (Fig. 3b). Canadian infants preferred infant-directed songs over non-infant-directed songs121 and preferred maternal infant-directed song over maternal infant-directed speech122. Infants were soothed by familiar songs more than by unfamiliar ones12 and, in at least one experimental paradigm, by lullabies more than by play songs (vocal music directed towards children, often characterized by excitatory, amusing characteristics)13. Such behavioural responses are a likely reason why, across demographics, most parents in the USA sing to their infants daily7.

Several lines of research indicate that early-developing responses to lullabies are universal. Parents worldwide sing to their infants1, and infant-directed songs exhibit acoustic regularities14,15. Infants in the USA relaxed in response to foreign lullabies, more so than to non-lullabies, and relaxed the most to lullabies that exemplify infant-directedness14, suggesting that lullabies use common features to evoke similar psychological responses15. Although this idea might seem intuitive, consider that most lullabies infants hear come from their caregivers. Infants are highly sensitive to the identities of the individuals who interact with them, forming inferences about individuals on the basis of the language or dialect they speak123, the foods they eat124 and the music they produce125,126. Given this, infants might well be calmed by anything a trusted caregiver does for them. However, infants relax in response to lullabies produced by unfamiliar individuals in unfamiliar cultures and in unfamiliar languages that the infant cannot understand, showing that lullabies produced worldwide are well-designed to calm infants, even in the absence of rich social cues of caregiver identity.

Evidence is mixed for whether behavioural responses to lullabies reflect domain-specific adaptations. On the one hand, humans seem to respond most to lullabies during infancy, consistent with specialized cognitive mechanisms being expressed in the developmental stages when they are most useful. On the other hand, according to a preprint that has not yet undergone peer review, many English speakers use lullaby-like music to fall asleep (such as pop music tagged with a ‘lullaby’ genre label on Spotify)127, and many features of lullabies (such as lower tempo, loudness and energy) are reliably present in many other forms of music127. Furthermore, infants are soothed by many sounds other than lullabies, most notably shushing128,129. Sounds with minimal formant structure, including shushing or white noise, were effective at masking other sounds, such as tones or speech130, facilitating sleep in both infants129 and adults131 (as listeners were less likely to hear random sounds and be awoken by them). It therefore remains unclear whether lullabies soothe infants because of cognitive mechanisms specialized to respond to them or because the songs appeal to cognitive mechanisms that evolved for non-musical functions.

Dance songs

The perception and processing of rhythmic information, essential for the behaviours associated with dance, begin early (Fig. 3b). Newborns discriminated between languages with differing rhythmic profiles132, and the patterns of neural activity in Hungarian neonates indicated sensitivity to the onsets and offsets of musical rhythm as well as to the rate at which sounds are presented133,134. Indeed, the music perception abilities of infants are tuned to rhythm. European 2-month-olds perceived differences in rhythm and tempo in tone sequences135,136,137, and Canadian 7-month-olds showed EEG responses frequency-locked to rhythms138. Moreover, the developmental trajectory of rhythm perception is suggestive of perceptual narrowing: North American infants reacted similarly to disruptions of western and Balkan rhythms at 6 months yet did not react to disruptions of Balkan rhythms at 12 months139,140.

Infants also move in response to rhythms. In two experiments, Swiss and Finnish infants aged 5–24 months listened to clips of music, rhythm and speech141. Although no infant demonstrated entrainment — the synchronization of actions, such as body movements, to a recurring rhythmic event — the infants moved more to music and rhythms than to speech. In addition, although the youngest infants moved more inconsistently, the experimenters found no changes in the responses of infants between the ages of 7 and 24 months. These behaviours are commonly observed in naturalistic settings: in a sample of US parents of infants aged 0–24 months, the vast majority reported seeing their infant dance in the first year of life142. Humans appear to come into the world ready to respond to rhythm.

Despite their early music perception abilities, individuals must learn to entrain to a beat143. Infants aged 8 months in the USA discriminated between synchronous and asynchronous dancing that they observed144, yet studies with Japanese infants and German preschoolers suggested that reliable beat entrainment does not develop until toddlerhood145,146. Even so, the ability to synchronize to a beat is modest at such young ages, as any parent can tell you. Studies with participants from the USA suggest that the accuracy of synchronized movements does not approach adult levels until 10–12 years of age147,148.

Whether dance and other rhythmic behavioural responses to music reflect domain-specific specializations remains an open question. The only animals aside from humans that spontaneously perceive a beat and synchronize to it are parrots45,149,150. This observation has been taken as evidence that a capacity for rhythm is not a derived adaptation but rather a by-product of advanced vocal learning abilities, which both parrots and humans exhibit149. Vocal learning involves intrinsic rewards for predicting the temporal structure of auditory sequences and establishes tight reciprocal communication between motor planning regions and forebrain auditory structures45. As a result, individuals are motivated to produce synchronized action, such as dancing or singing to music, which is intrinsically rewarding. This explanation of beat entrainment is similar to one developed within a model of predictive coding of music, which also posits that synchronized action is a way of reducing reward prediction error (although without invoking the advanced vocal learning ability of humans)151. Regardless, these explanations suggest that the rhythmic aspects of spontaneous dancing might derive their pleasurable effects152,153,154,155 from cognitive mechanisms that are not specific to music.

Some observations still raise the possibility that the cognitive mechanisms involved in beat perception and entrainment are domain-specific adaptations45. First, the capacity for beat perception and synchronization is not shared with the closest living relatives of humans, chimpanzees156. Second, a complex neural architecture underpins rhythmic entrainment in humans44. Third, humans can and do entrain to rhythms for long periods of time, unlike parrots, which entrain only for shorter durations150. Fourth, beat entrainment is typically a social activity in humans2,157, whereas in parrots it is not. Last, two genetic loci associated with the self-reported ability to synchronize (by clapping) to a beat are in ‘human accelerated regions’158 — that is, in regions of the human genome that have substantially diverged from chimpanzees. Together, these observations have been taken as evidence for the hypothesis that humans evolved specialized adaptations for music through gene–culture coevolution (the interaction of genetic and cultural evolutionary processes), in which the cultural invention of music subsequently selected for domain-specific musical adaptations19,45.

Although each of the five observations above represents a promising area to test the domain-specificity of rhythmic abilities, each is still consistent with rhythmic entrainment being a by-product of vocal learning. The social aspects of dance could simply reflect the profound sociality of humans as opposed to any specialization for rhythm. The complex neural architecture, increased motivation for rhythmic engagement, and the absence of beat perception and synchronization in non-human primates could all reflect selection for sophisticated vocal learning in the human lineage159,160,161. By studying the overlap of mechanisms involved in beat perception and synchronization with those of vocal learning, future research will better pinpoint whether human psychology is specialized for rhythm.

In summary, humans appear universally predisposed to find lullabies soothing and to move rhythmically in response to dance songs, and these predispositions appear early in the populations where they have been studied. However, current research cannot establish whether responses to lullabies and dance songs stem from domain-specific, evolved specializations or are instead by-products of mechanisms that evolved for non-musical functions. More generally, work on behavioural responses to music advances the understanding of musical diversity and function. It demonstrates that music is not a fixed biological response, adapted for a single end such as mating or group bonding. Rather, it is deployed for many social goals, some of which appear to be universal, particularly soothing infants and dancing. This universality reflects shared features of human psychology, which predispose humans to respond in particular ways to certain sounds and which, in turn, produce form–function relationships in the music of the world.

Cultural transmission of music

The music of the world exhibits both profound similarities and striking idiosyncrasies. These patterns of universality and diversity can emerge and persist through cultural evolution, which both crafts ubiquitous musical traditions adapted to shared features of human psychology and canalizes idiosyncratic cultural differences in musicality162.

As an example, consider the universal tendency of vocal music to be composed of predominantly small melodic intervals and rhythmic patterns defined by integer ratios1. These characteristics could reflect biological specializations to produce music18,19,120. Alternatively, they could also emerge as individuals preferentially adopt and perform music that is easier to learn and transmit163,164, paralleling how language-like systems evolve to become more transmissible across generations165,166,167. For instance, Scottish participants were asked to imitate random drum sequences; their attempts became the model stimuli for the next group of participants, who in turn produced sequences for a subsequent group. Over the course of the study, as participants transmitted their attempts, random sequences evolved into rhythmically structured patterns163. The patterns exhibited near-universal rhythmic features, such as hierarchical structure and isochronous beats, arguably because they were easier to learn and transmit. Similarly, participants who produced and transmitted sets of whistled signals eventually developed whistled patterns that exhibited some but not all melodic near-universals168. Ubiquitous musical features might emerge simply as performances adapt to the constraints of memory and learning; biological adaptation need not be the primary explanation for such effects.
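
The drum-sequence transmission chains can be caricatured with a toy iterated-learning simulation, in which each ‘generation’ reproduces a rhythm imperfectly and reproduction errors are biased toward a simple isochronous pattern. All parameters here (bias strength, noise level, chain length) are invented for illustration and are not taken from the cited experiments.

```python
import random

random.seed(0)

def reproduce(intervals, bias=0.3, noise=0.02):
    """Imitate a rhythm imperfectly: each inter-onset interval drifts
    toward the mean interval (an isochrony bias) plus motor noise."""
    target = sum(intervals) / len(intervals)
    return [max(0.05, i + bias * (target - i) + random.gauss(0, noise))
            for i in intervals]

def irregularity(intervals):
    # Mean absolute deviation of intervals from their mean;
    # 0 for a perfectly isochronous beat.
    m = sum(intervals) / len(intervals)
    return sum(abs(i - m) for i in intervals) / len(intervals)

# Start from a random, unstructured sequence and transmit it down a chain.
rhythm = [random.uniform(0.1, 1.0) for _ in range(8)]
start_irregularity = irregularity(rhythm)
for generation in range(20):
    rhythm = reproduce(rhythm)
end_irregularity = irregularity(rhythm)

print(f"irregularity: {start_irregularity:.3f} -> {end_irregularity:.3f}")
```

Because each imperfect copy nudges the sequence toward the learner's biases, structure accumulates over generations even though no single participant designs it, which is the logic the transmission-chain experiments exploit.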

Cultural evolution can produce widespread patterns in music through mechanisms beyond making performances easier to learn and reproduce. Researchers increasingly focus on how individuals produce and selectively retain the cultural products evaluated as best satisfying their goals, a process labelled ‘subjective selection’169. Subjective selection seems to underlie the evolution not only of useful technology169,170 but also of many domains of so-called ‘symbolic’ culture, including social norms171,172, fictional narratives173,174, and religious practices and beliefs175,176,177. Subjective selection is a promising explanation for some musical universals. If individuals consistently perceive certain musical features to be useful for producing particular ends, cross-cultural convergence should be expected169. If individuals everywhere tend to dance to certain sounds, to be soothed by certain sounds or to regard certain sounds as communicating particular emotions, then cultural evolution should lead to similarities as individuals craft and retain the music that seems to best satisfy those ends. As we have shown, shared features of human psychology indeed predispose humans to respond to music in similar ways. Such predispositions might result from human-specific adaptations or from constraints that are shared across species, such as the physical limits of auditory perception114. Cultural evolution likely exploits these shared psychological predispositions to produce compelling performances, yielding reliable cross-cultural associations between musical form and emotional content65,66,67 or between musical form and behavioural function1,5,118,119.

Cultural transmission also sustains and drives musical diversity. Differences in music can emerge for many reasons, such as social structure6,178, motor constraints179 or stochasticity163. These differences can, in turn, stabilize as the cultural exposure of individuals canalizes how they produce or respond to music180. For instance, Australian undergraduates show memory advantages for melodies in familiar compared with unfamiliar tuning systems181,182. Similarly, North American and western European adults have difficulty remembering or producing rhythmic patterns that do not exhibit a familiar metrical structure (isochrony)183,184,185,186. These biases seem to develop early, as infants become accustomed to the music they are exposed to139,140. Such musical enculturation, a topic of longstanding interest in music research187, has been corroborated by cross-cultural studies, which reveal patterns consistent with a core set of musical universals underlying broad cross-cultural diversity. For example, according to a preprint that has not yet undergone peer review, 39 participant groups across 15 countries differed in their distributions of preferred rhythmic integer ratios in a tapping task, often in ways reflecting local musical traditions188. Nevertheless, all participant groups favoured small integer ratios, indicating that discrete representations of rhythm are universal. As cultural traditions diverge and differences become canalized, music diversifies189,190,191,192, but it apparently always retains some universal properties.
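
Analyses of this kind rest on mapping produced rhythms onto nearby small-integer ratios. A minimal version of that mapping can be sketched as follows; the candidate ratio set and the tapped intervals are hypothetical, not those used in the preprint.

```python
from fractions import Fraction

# Candidate small-integer ratio categories for a two-interval rhythm,
# expressed as the first interval's share of the full cycle
# (e.g. a 2:1 long-short pattern has a first-interval share of 2/3).
CATEGORIES = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3),
              Fraction(1, 4), Fraction(3, 4)]

def nearest_ratio(interval_a, interval_b):
    """Snap a pair of inter-onset intervals (in seconds) to the closest
    ratio category."""
    share = interval_a / (interval_a + interval_b)
    return min(CATEGORIES, key=lambda c: abs(share - float(c)))

# Hypothetical taps: a roughly isochronous (1:1) pattern and a
# long-short (2:1) pattern.
print(nearest_ratio(0.51, 0.49))
print(nearest_ratio(0.65, 0.33))
```

Tabulating these snapped categories across many taps yields the distribution of preferred ratios whose cross-cultural differences, and shared bias toward small integers, the preprint reports.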

By crafting products that are memorable, transmissible and (most importantly) compelling for achieving specific ends, such as dancing or communicating emotion, cultural evolution creates auditory cheesecake. In other words, generations of cultural transmission and ingenious tinkering interact to produce compelling auditory stimuli that appeal to psychological mechanisms that exist for non-musical functions.

Summary and future directions

In this Review, we provided evidence of the universality and early development of many psychological responses to music yet uncovered few indications of innate domain-specificity. Although the systems underlying these responses could become specialized for or adapted to music over the course of development193, the current evidence is consistent with music communicating emotions, soothing infants, urging individuals to dance, and inducing other emotional and behavioural responses by appealing to features of human psychology that evolved for non-musical functions. Moving forward, it will be important to further investigate how genetic and cultural evolution give rise to musical behaviour while expanding the range of musical responses under consideration. In that vein, our Review highlights four key topics for future work to address.

First, research in neuroscience and genetics provides powerful new tools to study the neural and genetic mechanisms underlying musical responses. These tools, in turn, will allow researchers to better assess whether humans have evolved specialized adaptations for responding to music. For instance, research has shown that the neural and genetic mechanisms involved in beat perception and synchronization are also involved in vocal learning, consistent with the by-product account reviewed above45,158,159. Similar approaches applied to other emotional and behavioural responses can help map out the proximate and ultimate reasons humans find music so compelling.

Second, future research will help clarify how universal psychological responses give rise to the profound musical diversity observed in human societies. Although explaining and studying musical diversity is a focus in ethnomusicology6,194,195, cognitive and behavioural research on music has, with few exceptions179,196, overlooked the question of why musical traditions vary in the ways that they do. As researchers gain a better grasp of how and why psychology and culture vary across populations25,197,198, the ability to explain the drivers of musical diversity will also improve.

Third, research on psychological responses beyond emotion will help elucidate the diverse social roles of music. Most research on psychological responses to music has focused on emotional communication, yet music has many other effects, including many beyond the emotional and behavioural responses covered here. Across cultures, individuals use music to heal illness, mourn death, tell stories, greet visitors and demonstrate virtuosity1. Music can influence the content, vividness and sentiment of directed imagination199 and help induce mystical experiences for individuals taking psychedelic drugs200. Songs can evoke animals, as in the Sámi yoik tradition201, as well as communicate a staggering richness of information202. Depending on the culture, people can interpret differences in pitch to mean that particular sounds are hot, far, smooth, old, full, active, happy, sleepy, wintry, masculine, and either like a crocodile or like individuals who follow crocodiles43. Strikingly, even these inferences are, to some degree, interpretable across cultures, suggesting cross-domain and cross-culturally consistent mappings that connect concepts, acoustic features and other sensory information43,69,203. Research on responses beyond emotion can advance our understanding not only of the diverse effects of music but also of the more general processes involved in deriving meaning from sensory stimuli.

Last, musical aesthetics represents a uniquely controversial and difficult topic for future research. The most obvious aspect of music perception is that music sounds good. However, aesthetic value in music is poorly understood204. This gap is demonstrated by the ongoing difficulty of accurately predicting individual music preferences205, even by corporations that benefit hugely from doing so, such as music streaming and recommendation services like Spotify or Apple Music. Research investigating why music is pleasant will expand the understanding of why individuals produce and listen to music.

Research connecting the psychology of music to its cultural and biological evolutionary roots has exploded in the last two decades, uncovering new insights about the origins of this pervasive yet puzzling behaviour. We expect that successful research in these four topics will accelerate scientific insights, helping uncover not just why humans produce and respond to music but also how cultural and biological evolution interact more generally to shape human behaviour.