Statistical Evolutionary Laws in Music Styles

If a cultural feature is transmitted over generations and exposed to stochastic selection when spreading in a population, its evolution may be governed by statistical laws and be partly predictable, as in the case of genetic evolution. Music exhibits steady changes of styles over time, with new characteristics developing from traditions. Recent studies have found trends in the evolution of music styles, but little is known about their relation to evolutionary theory. Here we analyze Western classical music data and find statistical evolutionary laws. For example, distributions of the frequencies of some rare musical events (e.g. dissonant intervals) exhibit a steady increase in the mean and standard deviation as well as constancy of their ratio. We then study an evolutionary model where creators learn their data-generation models from past data and generate new data that will be socially selected by evaluators according to the content dissimilarity (novelty) and style conformity (typicality) with respect to the past data. The model reproduces the observed statistical laws and can make non-trivial predictions for the evolution of independent musical features. In addition, the same model with a different parameterization can predict the evolution of Japanese enka music, which developed in a different society and has a qualitatively different tendency of evolution. Our results suggest that the evolution of musical styles can partly be explained and predicted by the evolutionary model incorporating statistical learning, which can be important for other cultures and future music technologies.


Analysis of the Classical Music Data
We can analyze frequencies of non-diatonic motions in the same way as for frequencies of tritones (Fig. 1 in the main text). The result is shown in Fig. 1. We can find the same statistical tendencies that are found for the frequencies of tritones, even though they are less clear:
• Beta-like distribution of frequency features
• Steady increase of the mean and standard deviation
• Nearly constant ratio of the mean and standard deviation
We used the birth year of the composer as the reference time of each musical piece in Fig. 1 in the main text and Fig. 1 in this Supplemental Material. This is because the composition year for each individual piece is not given in the dataset used. Alternatively, if we use as the reference time the death year, the middle year (defined as the average of the birth and death years), or the active year (defined as the birth year plus 35 years to represent the active time of the composer's career) of the corresponding composer, we obtain similar results apart from shifts in time.
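The four candidate reference times, and the summary statistics tracked over time, can be written down directly. The following is a minimal sketch: the function names `reference_year` and `summarize`, the `mode` strings, and the choice of the population standard deviation are illustrative assumptions, not details taken from the dataset or the original analysis.

```python
from statistics import mean, pstdev

ACTIVE_OFFSET = 35  # years added to the birth year, as stated in the text


def reference_year(birth, death, mode="birth"):
    """Return the reference time of a piece from its composer's life dates."""
    if mode == "birth":
        return birth
    if mode == "death":
        return death
    if mode == "middle":
        # Average of the birth and death years.
        return (birth + death) / 2
    if mode == "active":
        # Birth year plus a fixed offset representing the active career.
        return birth + ACTIVE_OFFSET
    raise ValueError(f"unknown mode: {mode}")


def summarize(frequencies):
    """Mean, standard deviation, and their ratio (sigma / mu) for one time window.

    Assumption: population standard deviation; the source does not say
    which estimator was used.
    """
    mu = mean(frequencies)
    sigma = pstdev(frequencies)
    return mu, sigma, sigma / mu
```

For a composer born in 1770 who died in 1827, the middle year is 1798.5 and the active year is 1805. Switching `mode` moves every piece of a given composer by the same amount, which is why the alternative definitions mainly shift the curves in time.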

Analysis of the SCE models for other distributions
In the main text, we analyze the SCE model for the beta distribution. Here we analyze the SCE models defined with the gamma and log-normal distributions to show the generality of the results of the model analysis, especially the existence of a slow manifold when the novelty term is active. A gamma distribution is defined by Eq. (1), with parameters $a_t, b_t > 0$ that determine the mean and standard deviation. A log-normal distribution is defined by Eq. (3), with parameters $\tilde{\mu}_t \in (0, \infty)$ and $\tilde{\sigma}_t > 0$ that likewise determine the mean and standard deviation. Both of these probability distributions are defined on the range $\theta \in (0, \infty)$, so they are not strictly appropriate for the probability parameter $\theta$, which is restricted to the range $(0, 1)$. Nevertheless, when the mean $\mu_t$ is smaller than unity and the standard deviation is sufficiently small, the supports of these distributions are effectively bounded within the range $(0, 1)$. We study these distributions for demonstration purposes.
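For reference, textbook forms of the two densities and their moment relations can be written as follows. The parameterization is an assumption: the gamma distribution is taken in shape-rate form, and the log-normal distribution is parameterized by its median, which is consistent with the stated range $(0, \infty)$ of its first parameter but is not confirmed by the text. The tilde notation is likewise assumed.

```latex
\begin{align*}
\mathrm{Gamma}(\theta; a_t, b_t)
  &= \frac{b_t^{a_t}}{\Gamma(a_t)}\,\theta^{a_t-1} e^{-b_t\theta},
  & \mu_t &= \frac{a_t}{b_t},
  & \sigma_t &= \frac{\sqrt{a_t}}{b_t}, \\
\mathrm{LogNormal}(\theta; \tilde{\mu}_t, \tilde{\sigma}_t)
  &= \frac{1}{\sqrt{2\pi}\,\tilde{\sigma}_t\,\theta}
     \exp\!\left[-\frac{(\ln\theta-\ln\tilde{\mu}_t)^2}{2\tilde{\sigma}_t^{2}}\right],
  & \mu_t &= \tilde{\mu}_t\, e^{\tilde{\sigma}_t^{2}/2},
  & \sigma_t &= \mu_t\sqrt{e^{\tilde{\sigma}_t^{2}}-1}.
\end{align*}
```

Under these assumed forms, the ratio $\sigma_t/\mu_t$ depends on a single parameter in each case: it equals $1/\sqrt{a_t}$ for the gamma distribution and $\sqrt{e^{\tilde{\sigma}_t^{2}}-1}$ for the log-normal distribution.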
The gamma, log-normal, and beta distributions are compared in Fig. 3, where distributions with the same mean and standard deviation ($\mu = 0.1$ and $\sigma = 0.05$) are shown. As the figure shows, the shapes of the three distributions are generally similar when the mean is small and the standard deviation is smaller than the mean. SCE models for the gamma and log-normal distributions are defined by substituting Eqs. (1) and (3), respectively, into Eq. (4) in the main text. We can conduct numerical analyses in the same way as in the main text. Focusing on the case $\beta_N > 0$ and $\beta_T = 0$, the results of the numerical analyses are shown in Figs. 4 and 5. From these results, and by the same argument as in the main text, one can see a slow manifold on which $\sigma_t/\mu_t$ is kept almost constant and $\mu_t$ grows nearly exponentially, as in the case of the beta distribution.
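The comparison at equal mean and standard deviation can be reproduced by moment matching. The sketch below uses only the Python standard library and assumes textbook parameterizations (beta with shape parameters, gamma with shape and rate, log-normal with the mean and standard deviation of $\ln\theta$), which may differ from the parameterizations used in the paper. All function names are illustrative.

```python
import math


def beta_params(mu, sigma):
    """Beta shape parameters (a, b) with the given mean and std."""
    nu = mu * (1.0 - mu) / sigma**2 - 1.0  # nu = a + b
    return mu * nu, (1.0 - mu) * nu


def gamma_params(mu, sigma):
    """Gamma (shape, rate) with the given mean and std (assumed form)."""
    return (mu / sigma) ** 2, mu / sigma**2


def lognormal_params(mu, sigma):
    """Mean and std of ln(theta) for a log-normal with the given mean and std."""
    s2 = math.log(1.0 + (sigma / mu) ** 2)
    return math.log(mu) - 0.5 * s2, math.sqrt(s2)


def beta_pdf(x, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def gamma_pdf(x, shape, rate):
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(x) - rate * x)


def lognormal_pdf(x, m, s):
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2 * math.pi))


def mass_in_unit_interval(pdf, n=20000):
    """Midpoint-rule estimate of the probability mass on (0, 1)."""
    h = 1.0 / n
    return h * sum(pdf((i + 0.5) * h) for i in range(n))


MU, SIGMA = 0.1, 0.05  # values quoted in the text
a, b = beta_params(MU, SIGMA)
shape, rate = gamma_params(MU, SIGMA)
m, s = lognormal_params(MU, SIGMA)

masses = {
    "beta": mass_in_unit_interval(lambda x: beta_pdf(x, a, b)),
    "gamma": mass_in_unit_interval(lambda x: gamma_pdf(x, shape, rate)),
    "log-normal": mass_in_unit_interval(lambda x: lognormal_pdf(x, m, s)),
}
for name, mass in masses.items():
    print(f"{name}: mass on (0, 1) = {mass:.6f}")
```

The `masses` dictionary gives a numerical check of the statement that, for a small mean and a small standard deviation, the gamma and log-normal distributions place almost all of their probability inside $(0, 1)$.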

Additional Comparison between the SCE Model and the Log-Potential Model
In Figs. 4(a) and 4(b) of the main text, we compared how well the SCE model and the log-potential model can fit the real data of classical music. There, the model parameters were optimized to best fit the two sets of data (frequencies of tritones and non-diatonic motions) simultaneously.
Here we report the results when the model parameters are fitted to the two sets of data individually.
The results are shown in Figs. 6(a) and 6(b). The root mean squared errors of the (tritone, non-diatonic motion) data are $(4.4 \times 10^{-3}, 7.2 \times 10^{-3})$ for the SCE model and $(3.0 \times 10^{-3}, 5.8 \times 10^{-3})$ for the log-potential model. Compared to the results in the main text, these results show that for the SCE model the precision of the individual fit is similar to that of the simultaneous fit, and that the log-potential model can fit individual data slightly better than the SCE model. This shows that although the log-potential model is flexible for fitting individual sets of data, it cannot fit both sets of data simultaneously, confirming that it is not trivial to fit both sets of data in a unified manner.
Fig. 6 (vertical axis: mean $\mu_t$, logarithmic scale). Bold lines indicate means, and thin lines and shadow indicate the ranges of ±1 standard deviation. Model parameters are optimized individually to fit the two datasets to minimize the squared error of predicted means and standard deviations (optimal parameters are shown in the insets).
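The error measures used in this comparison can be sketched as follows. This is an illustration only: the function names are hypothetical, and the equal weighting of means and standard deviations in `squared_error_objective` is an assumption, since the text does not specify how the two terms are combined.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def squared_error_objective(pred_means, pred_stds, obs_means, obs_stds):
    """Sum of squared errors over predicted means and standard deviations.

    Assumption: both terms are weighted equally.
    """
    mean_term = sum((p - o) ** 2 for p, o in zip(pred_means, obs_means))
    std_term = sum((p - o) ** 2 for p, o in zip(pred_stds, obs_stds))
    return mean_term + std_term
```

Fitting the two datasets individually corresponds to minimizing such an objective once per dataset with separate parameters, whereas the simultaneous fit in the main text shares one parameter set across both.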