Music exhibits some cross-cultural similarities, despite its variety across the world. Evidence from a broad range of human cultures suggests the existence of musical universals 1 , here defined as strong regularities emerging across cultures above chance. In particular, humans demonstrate a general proclivity for rhythm 2 , although little is known about why music is particularly rhythmic and why the same structural regularities are present in rhythms around the world. We empirically investigate the mechanisms underlying musical universals for rhythm, showing how music can evolve culturally from randomness. Human participants were asked to imitate sets of randomly generated drumming sequences and their imitation attempts became the training set for the next participants in independent transmission chains. By perceiving and imitating drumming sequences from each other, participants turned initially random sequences into rhythmically structured patterns. Drumming patterns developed into rhythms that are more structured, easier to learn, distinctive for each experimental cultural tradition and characterized by all six statistical universals found among world music 1 ; the patterns appear to be adapted to human learning, memory and cognition. We conclude that musical rhythm partially arises from the influence of human cognitive and biological biases on the process of cultural evolution 3 .
Percussion instruments may have provided the first form of musical expression in human evolution. Great apes, our closest living relatives, show drumming behaviour 4 , which they can learn socially 5 , producing some human-like rhythmic sequences 6 . Percussive behaviour may therefore have already been present in our ancestors some million years ago, before the split between the human and Pan lineages 2 . Archaeological findings also suggest that the first human musical instrument might have been percussive, as also attested in modern hunter-gatherer societies around the world 7 . This makes rhythm a particularly apt musical dimension for reconstructing crucial steps in the evolution of music.
Six rhythmic features can be considered human universals, showing a greater than chance frequency overall and appearing in all geographic regions of the world. These statistical universals 1 are:
A regularly spaced (isochronous) underlying beat, akin to an implicit metronome.
Hierarchical organization of beats of unequal strength, so that some events in time are marked with respect to others.
Grouping of beats in two (for example, marches) or three (for example, waltzes).
A preference for binary (2-beat) groupings.
Clustering of beat durations around a few values distributed in fewer than five durational categories.
The use of durations from different categories to construct riffs, that is, rhythmic motifs or tunes.
Until now, research on musical universals has focused either on individual psychological processes 8 , investigating rhythm perception/production in meticulously controlled environments 9,10 , or on large-scale phenomena, performing cross-cultural analyses of world musical traditions 11,12 . Combining these approaches, we show that basic psychological mechanisms (working memory, perceptual primitives, categorical perception and so on) can lead to large-scale musical universals via cultural transmission. Our experiment aimed to reconstruct, in a laboratory setting (Fig. 1a), how initially unstructured sounds might have been shaped into complex musical systems by early humans perceiving and imitating them 7,12,13 . We tested experimentally controlled human microsocieties and show that cultural transmission accounts for the emergence of both structural regularities and all of the predicted rhythmic universals. Our method builds on previous experimental methodologies, which showed how systematic structure may emerge from weak learning biases 14 . Iterated learning (Fig. 1a) refers to a process by which the individual learns a new behaviour by observing another individual who acquired the behaviour in the same way 15 . This method directly taps into the dynamics of cultural transmission, thereby enabling an empirical approach to human cultural evolution 16 . Iterated learning of artificial sounds 17 , visual representations 18 and language-like systems 15,19 can lead to a large range of outcomes. However, two characteristics seem to emerge in most experiments: random patterns evolve into sequences that exhibit increasing learnability and structure over generations of learners 19,20 .
In a similar way to vertical transmission shaping the complexity and variety of musical cultures 3,12 , in our experiment, each participant heard and had to imitate drumming patterns received from a previous participant, who themself had copied them from someone else and thereby potentially introduced errors. In measuring the changes that occurred to the drum patterns, we observed how cognitive biases for rhythm are magnified and mirrored in musical structure and how initially independently reproduced sequences come to pattern together as part of an overall rhythmical system 21 . As predicted, after several experimental generations, initially random sequences transformed into increasingly structured and learnable music-like patterns. In addition, these patterns showed convergence towards all of the six rhythmic universals found in human musical cultures 1 .
First, the sequences acquire systematic structure. Systematicity is a measure of mutual predictability among the elements of a system; it quantifies how much structural information about a whole system is provided by each constituent element. In musical harmony, for instance, rock and roll is very systematic, because knowing a musical excerpt provides a better than chance guess of the chord progressions of a broad range of songs, while dodecaphonic music is less systematic. We found an increase in structural similarities and combinatorial structure over generations (Page’s trend test; L = 1558.0, m = 6, n = 9, P < 0.001; Fig. 1b).
Second, sequences become easier to learn. A system or structure is highly learnable if it can be rapidly acquired with low error by an organism. Reproduction errors (time distance between participants’ outputs) decrease over generations (Page’s trend test; L = 833, m = 6, n = 8, P < 0.0001; Fig. 1c). Learners in later generations found the rhythms easier to imitate accurately, indicating that patterns increasingly fit participants’ cognitive biases.
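Page's trend test, used for both results above, ranks the measurements within each chain and asks whether the ranks follow a hypothesized order across generations. The following is a minimal sketch of how the L statistic itself is computed; it is not the authors' script, the function names are illustrative, and the significance calculation is omitted.

```python
def average_ranks(values):
    """Rank values from 1 upward in ascending order; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def page_l(rows):
    """Page's L for rows (chains) by columns (generations).

    Columns must be listed in the order hypothesized to be increasing.
    """
    n_columns = len(rows[0])
    column_rank_sums = [0.0] * n_columns
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            column_rank_sums[col] += rank
    # L weights each column's rank sum by its position in the predicted order.
    return sum((col + 1) * total for col, total in enumerate(column_rank_sums))
```

To test a decreasing trend, such as the reproduction errors, the columns can be passed in reverse order (or the values negated) so that the predicted order is again increasing.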
Third, timing patterns converge to durational categories. The frequency distributions of inter-onset intervals (IOIs; the time between consecutive drum hits) of all chains showed a similar pattern across experimental generations: initial uniform distributions (the random patterns presented to the first generation) converged on chain-specific clusters of IOIs by the final generation (Fig. 2). A K-means clustering algorithm showed that rhythmic patterns converged to three durational categories (Supplementary Table 1), matching the statistical universal across world musical cultures, which predicts fewer than five categories 1,11 . The range of durations produced by our participants was consistent with musical rhythms, as used in rhythm experiments 9 . The first cluster in all chains had a median of 203 ms (Supplementary Table 1), close to 200 ms, a recurrent durational value in musical rhythm and metre 22 . Moreover, the resulting centroids of the clusters were related by numbers close to integer ratios (Supplementary Table 1).
Fourth, systematicity and learnability increase, translating to the emergence of repeating structures (phase-space plots in Fig. 3a). Specifically: (i) rhythmic patterns acquired motivic structure (another musical universal 1 ) whereby rhythmic ‘riffs’ emerged, corresponding to polygons in phase-space coordinates where the number of vertices equals the length of the repeating riff within a pattern 23 ; (ii) riffs were used multiple times by each participant across separate drum patterns, shown by similar polygons overlapping in one phase-space plot; (iii) motivic patterns evolved gradually as they were passed from earlier to later generations (Fig. 3a, similar polygons in different plots of one chain); and (iv) riffs partly differed between chains (different polygons in different chains).
Fifth, sequences become more metronomic (isochronous), hierarchically structured (metrical) and composed through durations that are related by small-integer ratios. Isochrony and metre in perceived music are usually probed by asking participants to tap along, testing whether their taps occur at simple multiples or at divisors of the occurring musical intervals. As our task involved musical production, we reversed the above logic. Participants creating a metrical grid with binary and ternary subdivisions and an underlying regular beat 24 would produce: (i) adjacent IOIs related by small integer ratios; (ii) IOIs with many values close to 1:1 (equal-length IOIs); (iii) IOIs with ratios of two and three (showing binary and ternary subdivisions) 24 ; and (iv) the strongest beats separated by IOIs that are two or three times one another, suggesting musical metre. We found that the distributions of ratios in the last generation (Fig. 4a,b) significantly differed from a simulated uniform ratio distribution, predicted under the null hypothesis of no pairwise structure between IOIs (two-sample Kolmogorov–Smirnov test, all D > 0.08, P < 0.01; see Supplementary Information). This holds for both the distributions of adjacent IOIs and of IOIs between high-intensity hits, suggesting the existence of structural relationships between IOIs. We then tested whether peaks in the ratio distributions (Fig. 4a,b) corresponded to specific constant relationships between IOIs (see Methods). One of the highest peaks in Fig. 4a occurs at 1.015, and the median of the distribution is at 0.968. Both values are close to 1:1, providing some evidence for isochrony, another universal. We then tested whether the highest peaks in Fig. 4a,b coincided beyond chance with those expected theoretically in actual music. For adjacent ratios, we found four peaks, at 1:2, 1:4, 3:2 and 3:4. The match between ratios expected in music and experimental ratios is not attributable to chance.
(The corresponding Jaccard index, which measures overlap 25 , is J = 0.222; a randomization test returned an average J value of 0.064, with a pseudo P value (P′) of 0.029; see Supplementary Information.) A similar analysis on the distribution of the ratios of IOIs between strong beats (median = 0.947) found support for the hypothesis that metre is exclusively binary (J = 0.028, P′ = 0.045), with strong and weak alternating beats, but not exclusively ternary (J = 0.028, P′ = 1.0). Strong beats occurred above chance in intervals that were half or double each other in length (that is, related by 1:2 and 2:1 ratios). Notes of ternary length existed, but did not always coincide with the metrical grid (for example, a binary metre with many notes of length 1/4 and 3/4). This suggests the presence of: (i) an underlying regular beat; which is (ii) composed of alternating weak–strong beats; and (iii) used as a reference duration to generate the duration of other notes (by multiplying and dividing it by two or three), providing evidence for the remaining universals.
Sixth, chains evolve independently. We calculated the Kolmogorov–Smirnov D statistic for each generation and pairs of participants using their distribution of IOIs to quantify the degree of cultural divergence. Chains significantly diverged over generations towards separate lineages, with different timing structures (L = 1586.0, m = 6, n = 9, P < 0.001; Fig. 3b). Moreover, all IOI distributions of the final generations were significantly different between chain pairs (all D < 0.3, all P < 0.01; Supplementary Table 2). Hence, the drum patterns within the same lineage participated in a system of rhythmic patterns, sharing similar characteristics or motifs. As in actual music 12 , chains gained more structure over generations, although each transmission chain developed its own set of structural features.
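The divergence measure used here, the two-sample Kolmogorov–Smirnov D statistic, is the largest vertical gap between the empirical cumulative distributions of two IOI samples. A minimal sketch follows, assuming each participant's IOIs are available as a plain list of numbers; the function name is illustrative and the P value computation is left out.

```python
from bisect import bisect_right


def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value,
    # so it is enough to evaluate both CDFs at every pooled data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap in the ranges of the two samples), so larger values between chains indicate more divergent timing structures.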
It has been debated whether some human biological traits evolved under selective pressures to specifically hear and perform music.
Similarly to previous results on the evolution of linguistic structure 15,17,19 , we hypothesize that a few perceptual, learning and production biases may be responsible for the regularities that evolved in our drumming patterns. The formation of durational categories and small-integer ratios between intervals might be partially amenable to the categorical perception of rhythmic sequences. In fact, small ratios function as attractors when musicians are asked to categorize notes of varying durations that are not related by integer ratios 29 . The proximity, although not equality, to integer ratios dovetails with previous findings in music psychology 30 . The emergence of a few durational categories and motifs may instead be a by-product of the human tendency to compress sensory stimuli, possibly dictated by working memory constraints and a limited capacity for processing information 20 . Conversely, motor biases seem to only moderately influence the structures obtained; humans have a preferred tapping rate 22 of 600 ms on average, which was rarely found in our IOI distributions and clusters (Supplementary Table 1 and Fig. 2). However, our experiment cannot disentangle which of the human biases generating musical features are basic and which are acquired, and at least two alternative hypotheses can account for our results. In other words, the fact that our participants had already been exposed to a musical culture may have shaped the results. However, two points counteract this interpretation. First, we saw clear divergence between chains, suggesting that there was no single culturally acquired attractor that was driving the evolution of the systems. Second, there were striking parallels in the evolution of systematic structure between this experiment and another sequence learning experiment in the non-musical domain 21 . Ultimately, cross-cultural replications of this experiment are needed to accurately gauge the influence of acquired biases in this task.
Music, language and dance all involve copying to some extent, although imitation/copying is only one of many factors in their evolution 3,18,27,28 . Although the motivations to copy probably differ, the outcomes seem to be similar. We believe the assumption that early humans might have had a motivation to copy music-like sequences is quite realistic. Several hypotheses on the origins of the biological capacity for musical rhythm involve some motivation to copy or imitate. These hypotheses often suggest imitation, learning or synchronization of audio-motor behaviours as necessary steps to achieve interindividual coordination, group cohesiveness, mating success or territorial defence, providing in turn evolutionary pressures on the development of rhythmic abilities in modern humans 26 .
Human music is inherently structured, showing a few structural similarities across musical cultures and traditions. Why do these similarities arise? How do different musical traditions end up with similar features? We have addressed these questions empirically: in the laboratory, we set the conditions for random percussion patterns to be transmitted, similarly to real musical traditions. As a result, we were able to witness the evolution of musical rhythmic structure in real time, as it responded to human constraints and converged towards all six statistical universals found among world rhythms. Musical rhythmic universals arise because human behaviour and cognition slightly transform what is copied 13,15,18,19 . These transformations, amplified by the process of cultural transmission, lead to diverse musical traditions that contain nonetheless a few universals: traces of the biology of the organisms who created them.
Forty-eight participants (mean age 23 years 4 months; females = 37) were recruited from the University of Edinburgh’s graduate employment service “to participate in a 30-minute drumming experiment”. Each received £5 for participating. Musicians (those that had formal musical training or regularly practiced a musical instrument) were excluded from participation. The sample size was established a priori on the basis of a meta-analysis of previous iterated learning experiments.
This experiment was modelled on a simple transmission chain paradigm, in which learners received training inputs from the outputs of the previous learners 16 . Participants were randomly assigned to six different lineages (transmission chains: 1–6), each containing eight ‘generations’ of learners (1–8). The first generation of participants in all chains heard different randomly generated patterns as training input (first column in Fig. 2 and Fig. 3a).
Participants in each generation were presented with 32 drum patterns. These patterns were either random drum sequences (generation 1) or sequences produced by a previous participant (generations 2–8). The 32 initial and independent drum patterns were each composed of 12 MIDI (Musical Instrument Digital Interface) snare drum hits (Supplementary Information). Each chain had its own unique set of 32 initial random patterns. Each snare drum hit in the initial sequences had a random velocity (force and speed used to play an instrument) and the IOIs (duration between the start of one note and the start of the next note) were random. An additional cymbal sound was always presented 1.5 s after the last snare hit and signalled the end of a sequence. The cymbal timing was neither counted as part of the pattern nor included in the analyses. Participants heard and reproduced two blocks of the same 32 drum patterns, with the order of drum patterns within each block randomized. The first block of patterns was intended for the participant to practice drumming and copying. Patterns reproduced in the second block, recorded on a laptop, were used as the training stimuli for the next learner in the chain.
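The construction of the generation 1 stimuli can be sketched as follows. This is an illustration only: the IOI bounds, the use of uniform integer sampling and all function names are assumptions, since the exact sampling ranges are given in the Supplementary Information rather than here. Only the counts (32 patterns of 12 hits each) come from the text above.

```python
import random

N_PATTERNS = 32   # patterns per chain (from the text)
N_HITS = 12       # snare hits per pattern (from the text)
MIN_IOI_MS = 100  # hypothetical lower bound on an inter-onset interval
MAX_IOI_MS = 1000 # hypothetical upper bound on an inter-onset interval


def random_pattern(rng):
    """Return one pattern as a list of (onset_ms, velocity) pairs."""
    # Twelve hits are separated by eleven inter-onset intervals.
    iois = [rng.randint(MIN_IOI_MS, MAX_IOI_MS) for _ in range(N_HITS - 1)]
    # MIDI note-on velocities lie between 1 and 127.
    velocities = [rng.randint(1, 127) for _ in range(N_HITS)]
    onsets = [0]
    for ioi in iois:
        onsets.append(onsets[-1] + ioi)
    return list(zip(onsets, velocities))


def initial_pattern_set(seed):
    """Return a chain's own set of 32 random patterns, reproducible from the seed."""
    rng = random.Random(seed)
    return [random_pattern(rng) for _ in range(N_PATTERNS)]
```

Using a different seed per chain gives each chain its own unique starting set, as in the design described above.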
Participants were given headphones, a single drumstick and an Alesis SamplePad, which was connected to a MacBook Pro laptop via a Duo-Capture EX USB-MIDI interface. The Python code that recorded the drumming patterns rounded temporal information to the nearest millisecond (although the theoretical maximum resolution of MIDI is slightly better than 1 ms). The interface had four independent drum pads; three produced the snare drum sound and one produced the cymbal sound, which participants struck to conclude a pattern. Participants were instructed to reproduce each pattern immediately after hearing it to the best of their ability. Each sequence was recorded and given to the next participant in the chain. Participants were unaware that they would be listening to stimuli produced by a previous learner. After the behavioural task, participants completed a questionnaire (Supplementary Information).
The output patterns were analysed to determine if the initially independent sequences evolved, becoming easier to learn over generations and forming rhythmic-like systems with structural regularities. Data analysis was performed in R, Stata 11.0 and using custom written Python scripts. All analyses were performed on the IOI between contiguous drum hits within a pattern. Experiments in human perception of musical rhythms have shown that the IOI is usually more important than the length of the notes themselves. Several quantitative measures were adapted to assess the learnability and structure of the patterns.
Beat and metre
Ratios were taken to normalize with respect to tempo and to compare structures (rather than absolute durations) across patterns. For each ratio distribution, we found the location of the maxima by taking the second derivative of the kernel density estimation (KDE) function. We then tested whether these fixed IOI relationships (the peaks in Fig. 4) coincided beyond chance with those expected theoretically. The most parsimonious way of generating a musical duration from another is to multiply or divide it by two, three or four. Hence we predicted that we would find, with high frequency, ratios of 1:1 (equal duration IOIs), 1:2, 1:3, 1:4, 2:3, 3:4, and their reciprocals, giving a total of 11 expected theoretical ratios. As the predicted ratios spanned 11 possible values, we extracted the 11 most frequent ratios from our empirical distributions. We then matched the expected with the empirical ratios (with a 0.01 tolerance on ratio differences) and quantified the match using the Jaccard index 25 . Given two sets, the Jaccard index is calculated as the ratio between their intersection and their union, that is, the number of elements in common divided by the number of overall elements. Finally, we performed a Monte Carlo simulation with 1 million iterations to test whether the matching of the predicted and found peaks was attributable to chance. This provided a P′ value, calculated as the proportion of randomizations with a Jaccard index greater than or equal to the empirical Jaccard index; that is, the relative number of cases for which a list of 11 random ratios has equal number or more matches with predicted ratios than the 11 empirical ratios.
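The matching and randomization steps can be illustrated with a short sketch. It is not the authors' code: the range from which random ratios are drawn (`low`, `high`), the seed and the function names are assumptions made for the example, and the default number of iterations is far smaller than the 1 million used in the study.

```python
import random
from fractions import Fraction

# The six base ratios named in the text; adding their reciprocals gives 11 values.
BASE = [Fraction(1, 1), Fraction(1, 2), Fraction(1, 3),
        Fraction(1, 4), Fraction(2, 3), Fraction(3, 4)]
EXPECTED = sorted({float(r) for r in BASE} | {float(1 / r) for r in BASE})
TOLERANCE = 0.01  # tolerance on ratio differences, as stated in the text


def count_matches(peaks, expected=EXPECTED, tol=TOLERANCE):
    """Number of expected ratios with at least one empirical peak within tol."""
    return sum(any(abs(p - e) <= tol for p in peaks) for e in expected)


def jaccard(peaks, expected=EXPECTED, tol=TOLERANCE):
    """Intersection over union, treating matched pairs as shared elements."""
    matches = count_matches(peaks, expected, tol)
    return matches / (len(peaks) + len(expected) - matches)


def pseudo_p(peaks, n_iter=10_000, low=0.2, high=5.0, seed=0):
    """Share of random ratio lists whose Jaccard index reaches the observed one.

    Drawing ratios uniformly from [low, high] is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = jaccard(peaks)
    hits = 0
    for _ in range(n_iter):
        fake = [rng.uniform(low, high) for _ in range(len(peaks))]
        if jaccard(fake) >= observed:
            hits += 1
    return hits / n_iter
```

With 11 empirical and 11 expected ratios, four matches give 4 / (11 + 11 − 4) ≈ 0.222, which is consistent with the J value reported for adjacent ratios.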
Structure and systematicity
Unlike previous cultural transmission research, the transmitted behaviour in this experiment was continuous (that is, time intervals) rather than discrete. We discretized the intervals into three categories using a K-means clustering algorithm (Supplementary Table 1). We mapped each duration to the category it belonged to (for example, three durations such as (0.1, 0.8, 0.4 s) would map to (short, long, medium)). The number of categories in the K-means algorithm was established using the Elbow method 47 , with three categories emerging as the most parsimonious clustering for each chain (Supplementary Information). We then calculated a grammatical structure index (G, a modified measure for entropy that is comparable with previous studies 41 ) for each participant.
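The discretization step can be sketched with a small one-dimensional K-means. This is an illustration under stated assumptions rather than the procedure actually run: the quantile-based initialization, the function names and the toy durations are all hypothetical, and the Elbow-method choice of three categories is taken as given.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Lloyd's algorithm on a list of durations; returns centroids in ascending order."""
    data = sorted(values)
    # Assumed initialization: centroids start at evenly spaced quantiles of the data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids)


def label(value, centroids, names=("short", "medium", "long")):
    """Map one duration to the name of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))
    return names[nearest]
```

For toy durations clustered near 0.2, 0.4 and 0.8, `[label(x, kmeans_1d(data)) for x in (0.1, 0.8, 0.4)]` yields short, long, medium, mirroring the example given in the text.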
The decrease in imitation errors (E) is equivalent to an increase in learnability/imitation fidelity. We calculated this as the (edit) time distance between two drum patterns; that is, the total cost of the minimal cost set of substitutions, insertions or deletions among IOIs necessary to transform the pattern of durations a participant has heard into the pattern they have reproduced, where the edit costs are taken to be the absolute difference in time between durations 43 . The time distance between identical patterns equals zero. Notice that, unlike other metrics in musicology that assume beat induction or metrical hierarchies 48,49 , this edit distance minimizes assumptions about metrical, top-down processing.
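The edit time distance described above can be computed with a standard dynamic programme over the two IOI sequences. In this sketch the substitution cost is the absolute difference between two durations, as stated in the text; charging an insertion or deletion the full duration of the affected IOI is an assumption, as is the function name.

```python
def time_edit_distance(heard, produced):
    """Minimal total cost of turning the heard IOI sequence into the produced one."""
    n, m = len(heard), len(produced)
    # dist[i][j] is the cost of transforming heard[:i] into produced[:j].
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + heard[i - 1]      # delete everything heard
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + produced[j - 1]   # insert everything produced
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + heard[i - 1],           # deletion (assumed cost)
                dist[i][j - 1] + produced[j - 1],        # insertion (assumed cost)
                dist[i - 1][j - 1] + abs(heard[i - 1] - produced[j - 1]),  # substitution
            )
    return dist[n][m]
```

Identical patterns score zero, and a reproduction that shifts two IOIs by 10 ms each, such as (200, 400) copied as (210, 390), scores 20.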
Data analysis was performed in R, Stata 11.0 and using custom written Python scripts. All scripts are available from the corresponding author.
The data that support the findings of this study are available for download as supplementary material.
How to cite this article: Ravignani, A., Delgado, T. & Kirby, S. Musical evolution in the lab exhibits rhythmic universals. Nat. Hum. Behav. 1, 0007 (2016).
A.R. was supported by Fonds Wetenschappelijk Onderzoek Vlaanderen grant no. V439315N, and European Research Council (ERC) grant (283435 ABACUS, to B. de Boer). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank P. Filippi, B. Thompson, B. de Boer, H. Little, S. van der Ham, N. Chr. Hansen, J. Iversen, D. Bowling, T. Grossi, A.C. Miralles, P. Norton, V. Spinosa, Y.-H. Su, P. Tinits and K. Smith, as well as all members of the Centre for Language Evolution (Edinburgh), AI-Lab (VUB Brussels), Biolinguistics (Barcelona) and attendants of Evolang XI, IBAC XXV, Statistical Learning 2015 and the DZG Graduate Meeting 2016 for their comments and advice.
The patterns heard and produced by each participant are organized and sorted by number of experimental chain, number of drumming pattern, and participant number (equivalent to generation number) within an experimental transmission chain.