Music history is riddled with debates on attribution. Did Andrea Luchesi compose many of the symphonies currently attributed to Mozart? Is the score of L'Incoronazione di Poppea Monteverdi's, or the collaborative work of several editors during its early performances across Italy? Did Johann Sebastian Bach really write the chorale Nun ist das Heil und die Kraft, the original score of which has never been found?


Statistical analysis may help to resolve such long-standing controversies, as it has proved successful in linguistic texts. It may also allow electronic databases to automatically classify musical style and period. And it promises much more — to help us understand some of the most elusive qualities of music, their connection to its organizational structure and to the cognitive processes involved in both the composition and perception of music. Eventually, statistics may also allow us to identify a quantifiable signature of complexity in music.

The composer Arnold Schoenberg summarized the fundamental principles of musical form as “the demand for repetition of pleasant stimuli, and the opposing desire for variety, for change”. Repetition of melodic motifs, rhythmic patterns and harmonic progressions makes musical structure coherent and forms the basis of its comprehensibility. Variation, in turn, keeps monotony and dullness at bay.

This delicate balance — somewhere between the uniform ticking of a clock and the random pitter-patter of raindrops — is reminiscent of complex systems, in which an intermediate degree of internal organization maintains coherence, yet allows for rich dynamics and functional flexibility. The melodic and rhythmic patterns of even the simplest folk tune reveal the complexity of the creative process, and of the system behind it — the human brain. By probing the structural texture of music and the recurrence and diversity of elements such as notes, rhythms, melodies and chords, statistical techniques provide a way to penetrate the nature of mind.

Word play

In the 1930s, the American philologist George Zipf discovered a strong regularity in the relative frequencies of word occurrence in speeches and texts. Now called Zipf's law, this rule applies to many different authors, styles and languages. If, for instance, the tenth most used word in a text occurs 300 times, Zipf's law predicts that the hundredth most used word will appear some 30 times.
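As a rough illustration of how the rule can be checked (this sketch is not part of any analysis described here), one can rank the words of any machine-readable text by frequency; under Zipf's law the product of rank and frequency should stay roughly constant. The file name sample.txt below is a hypothetical placeholder.

    # Sketch: check Zipf's inverse rank-frequency relation on a text file.
    # 'sample.txt' is a hypothetical placeholder for any machine-readable text.
    from collections import Counter

    with open('sample.txt', encoding='utf-8') as f:
        words = f.read().lower().split()

    ranked = Counter(words).most_common()   # words sorted by decreasing frequency

    # Under Zipf's law, frequency is roughly proportional to 1/rank,
    # so rank * frequency should be of the same order at every rank.
    for rank in (1, 10, 100, 1000):
        if rank <= len(ranked):
            word, freq = ranked[rank - 1]
            print(rank, word, freq, rank * freq)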

In 1955, social scientist Herbert Simon pointed out that Zipf's law can be quantitatively explained by assuming that the usage frequency of a word increases proportionally to its previous appearances — the more you use a word, the more you will use it. This very simple rule was enough for Simon to derive Zipf's law as the inverse relation between the number of occurrences of a word and its rank in frequency of use.
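A minimal simulation of Simon's rule makes this concrete; it is a sketch under arbitrary parameter choices, not a reconstruction of Simon's own derivation. With a small probability a brand-new word is introduced; otherwise an existing word is repeated, chosen in proportion to how often it has already appeared.

    # Sketch of Simon's 'rich-get-richer' rule (parameters chosen arbitrarily).
    import random
    from collections import Counter

    def simon_sequence(n_tokens=100_000, new_word_prob=0.05, seed=1):
        random.seed(seed)
        sequence = [0]          # start with a single word, labelled 0
        next_label = 1
        for _ in range(n_tokens - 1):
            if random.random() < new_word_prob:
                sequence.append(next_label)        # introduce a new word
                next_label += 1
            else:
                # Repeating a uniformly chosen past token selects each word
                # in proportion to its previous number of occurrences.
                sequence.append(random.choice(sequence))
        return sequence

    freqs = sorted(Counter(simon_sequence()).values(), reverse=True)
    for rank in (1, 10, 100):        # rank * frequency stays roughly constant
        if rank <= len(freqs):
            print(rank, freqs[rank - 1], rank * freqs[rank - 1])

For a small probability of introducing new words, the resulting rank-frequency curve approaches the inverse relation of Zipf's law.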

Since the late 1980s, several researchers have shown that Zipf's law also holds for musical elements within individual pieces. Words are replaced by notes, defined by pitch and duration, or by composite items such as note duplets and triplets, interval successions and chords. This suggests a strong affinity between the processes of writing text and composing music.

Simon's model for the relative frequency of words in a text can be interpreted as representing the progression of the author's choices during the creative process that shape the work's intelligibility. In language these choices are grammatical, morphological and semantic. In music they are melodic, harmonic, rhythmic and dynamic.

Music as message

Literary texts and musical compositions are created as organic entities, not series of isolated decisions. Nevertheless, the outcome is an ordered sequence of events conveying information: a message. As the message flows, a context emerges, favouring the appearance of some elements at the expense of others. From this viewpoint, Simon's model for Zipf's law unifies the concept of context in both language and music.

We can also distinguish the choices made by different composers through the different forms that Zipf's law takes in their music. The law quantifies the difference, say, between the intentional lack of tonal context in Schoenberg's pieces and Bach's or Mozart's more consistent, less flexible use of tonal elements. Yet intriguingly, serialism, the technique Schoenberg, Berg, Webern and others used to write music without tonality, is still based on the principles of repetition and variation.

Zipf's law is not the be-all and end-all of the statistical characterization of musical structure — for one thing, it would still hold if all the notes of a composition were shuffled and rearranged at random. Happily, information theory provides other ways to analyse the organization of symbols in a sequence.

Segmentation, for instance, can be used to detect portions of a sequence, such as a musical score, that differ as much as possible in the frequencies of different symbols. It proceeds in steps, first dividing the whole sequence into the two segments with maximal difference, and then iterating the procedure on each of the resulting segments. The product is a dissection of the sequence into maximally divergent domains, stretches whose relative symbol frequencies differ as much as possible from one another.
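One way such a procedure can be realised is sketched below, assuming the Jensen-Shannon divergence between symbol frequencies as the measure of difference and a simple fixed threshold as the stopping rule; published implementations also apply a statistical significance test, so this is an illustration rather than the exact algorithm behind the results described here.

    # Sketch of recursive entropic segmentation (Jensen-Shannon divergence,
    # fixed stopping threshold; thresholds and lengths chosen arbitrarily).
    import math
    from collections import Counter

    def entropy(counts, total):
        return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

    def js_divergence(left, right):
        """Length-weighted Jensen-Shannon divergence between the symbol
        frequencies of two adjacent segments."""
        nl, nr = sum(left.values()), sum(right.values())
        n = nl + nr
        return (entropy(left + right, n)
                - (nl / n) * entropy(left, nl)
                - (nr / n) * entropy(right, nr))

    def segment(seq, threshold=0.05, min_len=8):
        """Split seq at the cut of maximal divergence, then recurse on the parts;
        stop when the best gain falls below the threshold."""
        if len(seq) < 2 * min_len:
            return [seq]
        best_d, best_i = -1.0, None
        for i in range(min_len, len(seq) - min_len + 1):
            d = js_divergence(Counter(seq[:i]), Counter(seq[i:]))
            if d > best_d:
                best_d, best_i = d, i
        if best_d < threshold:
            return [seq]
        return (segment(seq[:best_i], threshold, min_len)
                + segment(seq[best_i:], threshold, min_len))

    # Toy example: a 'score' that switches between two contrasting pitch vocabularies.
    print(segment(list("CCGCEGCE" * 6 + "FAFCFAAC" * 6)))

On the toy sequence the procedure recovers the switch between the two vocabularies; on a real score the symbols would be pitches, durations or chords.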


Segmentation was recently used to analyse the first movement of Mozart's keyboard sonata in C major (K. 545), as a sequence formed by the twelve tones of the chromatic scale. The analysis revealed the same tonality changes spotted by humans trained in musical analysis. In 1996, psychologist Carol Krumhansl of Cornell University in Ithaca, New York, demonstrated that non-specialist listeners also spot modulation between different tonalities when asked to divide Mozart's keyboard sonata in E-flat major (K. 282) into sections with different perceived musical qualities.

These preliminary results suggest that such statistical tools, which can be automated for large-scale computational application, can reveal the same structural features as ordinary methods of musical analysis. They might also unveil evidence of hitherto undetected organizational levels and patterns.

Segmentation can also be applied to combinations of pitch and duration, dynamics, intervals and chords. Analysing these more complex items may reveal patterns related to the richer cognitive qualities of music, such as melodic inflections and rhythmic change, which listeners associate with the unfolding of a piece's mood. This has its limits. As a collection of symbols becomes larger and more sophisticated, each element's frequency decreases. When each symbol appears too few times to be statistically significant, a meaningful message becomes indistinguishable from a random sequence.

Complex futures

Since 1996, physicist Pedro Bernaola-Galván of the University of Málaga, Spain, and his collaborators have applied segmentation analysis to DNA sequences to study the origin and significance of long-range nucleotide patterns. The resulting segments show large variations in length, a feature that has been related to the slowly decaying probability that two nucleotides of the same type are found at a certain distance in the genetic sequence. Bernaola-Galván's group has suggested that a broad distribution of segment lengths may be a signature of complexity for symbolic sequences. Both random and periodic sequences, such as raindrops' pattering and clocks' ticking, show little variation in segment lengths. The information-carrying DNA sequence, on the other hand, is characterized by a long-tailed distribution.

Mathematicians should investigate whether segmentation of long linguistic and musical sequences also gives broad length distributions. This would enable us to compare the degrees of complexity in language, music and the genetic code, disclosing structural similarities and differences between these forms of communication. A quantification of complexity in music would also allow us to identify the structural elements underlying different periods and styles. But for the time being, the quest to define a unified complexity measure continues.

Statistical analysis seems to be at odds with traditional ways of thinking about art. These — unlike mathematics — emphasize aesthetic nuances, psychological and experiential qualities and personal values. Indeed, how we integrate and elaborate sensory information into artistic experience may always be beyond quantitative description. Nonetheless, quantitative methods can tell us much about artistic creation — notably, about the organization of an artwork's many strands into a comprehensible structure.

Further reading

Bernaola-Galván, P., Román-Roldán, R. & Oliver, J. L. Phys. Rev. E 53, 5181–5189 (1996).

Krumhansl, C. L. Music Percept. 13, 401–432 (1996).

Patel, A. D. Nature Neurosci. 6, 674–681 (2003).

Schoenberg, A. Theory of Harmony (Faber & Faber, London, 1978).

Zanette, D. H. Complex Syst. 17, 279–293 (2007).

See other essays in the Science & Music series at http://www.nature.com/nature/focus/scienceandmusic.