Introduction

Contemporary cognitive and evolutionary musicology faces great controversies in attempting to identify the cognitive function of music and its evolutionary origins1,2,3,4,5,6,7. Arguments about the power of music over the human psyche began long ago. Aristotle8 listed the power of music among the unsolved problems, alongside the finiteness of the world and the existence of God. Kant9 was able to explain the epistemology of the beautiful by relating it to knowledge, but could not explain music: “it merely plays with senses.” Darwin10 noted that the human ability for music “must be ranked amongst the most mysterious with which (man) is endowed” because music is a human cultural universal that appears to serve no obvious adaptive purpose.

Among current evolutionary psychologists and musicologists, some consider that music plays no adaptive role in human evolution. Following Kant, Pinker11 has argued that music is an “auditory cheesecake,” a byproduct of natural selection that just happened to “tickle the sensitive spots.” Other contemporary scientists12,13 have suggested that music clearly has an evolutionary role, pointing to music's universality and developing Darwin's idea of sexual selection for music. However, a review14 of kin selection and other biological mechanisms for music evolution concluded that “no one selective force (e.g., sexual selection) is adequate to explaining all aspects of human music.” In 2008, Nature published a series of essays on music5; the authors, however, could not agree on the evolutionary origin or cognitive function of music.

Meanwhile, potential cognitive benefits of music have been explored in the context of the so-called “Mozart effect”1, a short-term improvement in “spatial-temporal reasoning.” The finding was hyped by the media, prompting many scientists to conduct experiments to verify its validity. A short-term improvement was demonstrated, although its specificity to Mozart and to music was questioned15,16. Subsequently, an experimental brain activation study demonstrated that the “Mozart effect” can be accounted for by mood and arousal17.

Recently we have presented experimental evidence18,19 supporting a hypothesis that music has a fundamental cognitive function: to help mitigate cognitive dissonance. Cognitive dissonance is a discomfort caused by holding conflicting cognitions simultaneously20. It usually leads to devaluation and discarding of conflicting knowledge21,22. The theory of cognitive dissonance is among the most influential and extensively studied theories in psychology, and it is intimately connected to the entirety of human evolution. At the dawn of human evolution, the emergence of language led to the proliferation of cognitive dissonances23,24. If they had not been overcome, language and knowledge would have been discarded and further human evolution would have been stopped in its tracks. This is why the ability of music to mitigate cognitive dissonance could be fundamental for musical cognitive function and music evolution25. Our attempts to experimentally confirm this hypothesis were undertaken using a classical paradigm26 known to create cognitive dissonance. Exposure to Mozart's music enabled participants to overcome the cognitive dissonance. This tentatively supports the argument that music has a fundamental cognitive function, which defines its evolutionary role and leads to music's universality18,19,20,21. Additional experimental evidence27 confirmed that music can help to overcome cognitive dissonance in the context of academic tests. The present study was conducted to pursue this issue further.

Here we evaluated the effects of music on cognitive interference. To create cognitive interference, we used a prototypical task known as the “Stroop interference task”28, which requires a person to respond to a specific dimension of a stimulus while suppressing a competing stimulus dimension. In the task, typically, a colour word such as GREEN appears in an ink colour such as red. If the participant's task is to read the word and ignore the colour (e.g. say “green”), there is no evidence of difficulty reading the word compared to reading it when printed in standard black ink. However, if the participant's task is to name the ink colour and ignore the word (e.g. say “red”), there is considerable difficulty relative to naming a colour patch. Reading the word interferes with naming the colour, but the colour does not interfere with reading the word. This is the phenomenon of Stroop interference.

We conducted the present experiment on the basis of the hypothesis that the magnitude of such cognitive interference would be reduced when a person is tested with exposure to music containing more consonant than dissonant intervals. Our first hypothesis was that music with more consonant intervals would reduce cognitive interference relative to that observed when the same person was tested without exposure to any music. Our second hypothesis was that music with more dissonant intervals would increase cognitive interference relative to that observed when the same person was tested without exposure to any music.

We used a modified version of the Stroop interference task known as the colour-word matching Stroop task29. Children aged 8 to 9 years and elderly adults aged 65 to 75 years participated in the present experiment. They were asked to name the ink colour of a colour word that designated a colour incongruent with that of the ink of the word (‘Incongruent testing session’). The same participants were also tested in a ‘Neutral testing session’, in which they were asked to name the ink colour of a non-word string of letters, i.e., XXX. Both sessions were repeated under three conditions: (1) with exposure to music containing predominantly consonant intervals (Consonant condition), (2) with exposure to music containing predominantly dissonant intervals (Dissonant condition) and (3) without exposure to any music (Control condition). In every session and condition, the performance of the participants was measured as the reaction time (RT) of the response and the error rate (ER) of the response.
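The two performance measures can be illustrated with a minimal sketch. It assumes each trial is recorded as a pair of reaction time in milliseconds and a correctness flag, and it averages RT over all trials; the text does not say whether error trials were excluded from the RT mean, so that choice is an assumption. The function name `summarise_session` and the records are hypothetical and are not data from the study.

```python
from statistics import mean


def summarise_session(trials):
    """Return mean RT (ms) and error rate for one testing session.

    trials: list of (rt_ms, correct) pairs, one per trial.
    Assumption: RT is averaged over all trials, correct or not.
    """
    if not trials:
        raise ValueError("a session needs at least one trial")
    errors = sum(1 for _, correct in trials if not correct)
    return {
        "mean_rt_ms": mean(rt for rt, _ in trials),
        "error_rate": errors / len(trials),
    }


# Hypothetical records for illustration only.
session = [(812.0, True), (954.0, True), (1103.0, False), (876.0, True)]
summary = summarise_session(session)
print(summary)
```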

The consonant music used was the original version of one of Mozart's minuets, most of which consisted of consonant intervals. For dissonant music we used a modified version of the minuet, most of which consisted of dissonant intervals. The same auditory stimuli were used in previous research30,31,32 that revealed perceptual preferences for consonance over dissonance in music in young infants and newborns.

Results

Figure 1 shows the performance results of the children assessed using RT. These results were analyzed using a repeated-measures analysis of variance (ANOVA) with the following two factors: test (Incongruent vs. Neutral test sessions) and condition (Consonant or Dissonant vs. Control conditions). This analysis demonstrated a significant main effect for test (Incongruent vs. Neutral test sessions, F(1,24) = 407.02, P < 0.001). The main effect of condition was also significant (Consonant or Dissonant vs. Control conditions, F(2,48) = 58.59, P < 0.001). There was a significant interaction between the two factors (F(2,48) = 3.69, P = 0.024). Post-hoc comparisons using Tukey's Honestly Significant Difference (HSD) tests revealed that under each of the three conditions, RTs were significantly longer for the Incongruent session than for the Neutral session (Ps < 0.001). Moreover, the mean RTs for Incongruent sessions were significantly shorter under the Consonant condition than under the Dissonant or Control conditions (Ps < 0.001). Similarly, the RT under the Dissonant condition was significantly longer than that under the Control condition (P = 0.032). On the other hand, in Neutral sessions, the mean RTs did not differ among the three conditions (Ps > 0.711). These results confirmed our expectations: the Stroop task produced cognitive interference, consonant music helped to overcome that interference and dissonant music increased it.
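The structure of this analysis can be sketched as a textbook two-way repeated-measures ANOVA in plain Python. This is a minimal illustration, not the authors' analysis pipeline (the software used is not stated). The function name `rm_anova_2way`, the synthetic reaction times and the effect sizes are all hypothetical; no P values or post-hoc tests are computed. With 25 participants, two test levels and three condition levels, the sketch reproduces the degrees of freedom reported above, (1, 24) and (2, 48).

```python
import random


def rm_anova_2way(data):
    """Two-way within-subject ANOVA.

    data[s][i][j]: score of subject s at level i of factor A (test)
    and level j of factor B (condition).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    n, a, b = len(data), len(data[0]), len(data[0][0])
    S, A, B = range(n), range(a), range(b)

    def avg(xs):
        return sum(xs) / len(xs)

    g = avg([data[s][i][j] for s in S for i in A for j in B])
    m_a = [avg([data[s][i][j] for s in S for j in B]) for i in A]
    m_b = [avg([data[s][i][j] for s in S for i in A]) for j in B]
    m_s = [avg([data[s][i][j] for i in A for j in B]) for s in S]
    m_ab = [[avg([data[s][i][j] for s in S]) for j in B] for i in A]
    m_sa = [[avg(data[s][i]) for i in A] for s in S]
    m_sb = [[avg([data[s][i][j] for i in A]) for j in B] for s in S]

    ss = {
        "A": n * b * sum((m_a[i] - g) ** 2 for i in A),
        "B": n * a * sum((m_b[j] - g) ** 2 for j in B),
        "AB": n * sum((m_ab[i][j] - m_a[i] - m_b[j] + g) ** 2
                      for i in A for j in B),
        "S": a * b * sum((m_s[s] - g) ** 2 for s in S),
        "AxS": b * sum((m_sa[s][i] - m_a[i] - m_s[s] + g) ** 2
                       for s in S for i in A),
        "BxS": a * sum((m_sb[s][j] - m_b[j] - m_s[s] + g) ** 2
                       for s in S for j in B),
        "ABxS": sum((data[s][i][j] - m_sa[s][i] - m_sb[s][j] - m_ab[i][j]
                     + m_s[s] + m_a[i] + m_b[j] - g) ** 2
                    for s in S for i in A for j in B),
        "total": sum((data[s][i][j] - g) ** 2
                     for s in S for i in A for j in B),
    }
    df = {
        "A": (a - 1, (a - 1) * (n - 1)),
        "B": (b - 1, (b - 1) * (n - 1)),
        "AB": ((a - 1) * (b - 1), (a - 1) * (b - 1) * (n - 1)),
    }
    # Each effect is tested against its own effect-by-subject error term.
    error_term = {"A": "AxS", "B": "BxS", "AB": "ABxS"}
    f = {k: (ss[k] / df[k][0]) / (ss[error_term[k]] / df[k][1]) for k in df}
    return ss, df, f


# Synthetic RTs (ms) for 25 hypothetical participants; NOT the study's data.
rng = random.Random(0)
SHIFT = {0: -60.0, 1: 40.0, 2: 0.0}  # consonant, dissonant, control (made up)
data = []
for _ in range(25):
    base = rng.gauss(900.0, 80.0)
    neutral = [base + rng.gauss(0.0, 30.0) for _ in range(3)]
    incongruent = [base + 250.0 + SHIFT[j] + rng.gauss(0.0, 30.0)
                   for j in range(3)]
    data.append([incongruent, neutral])

ss, df, f = rm_anova_2way(data)
print(df)
print({k: round(v, 2) for k, v in f.items()})
```

Each F ratio divides an effect's mean square by the mean square of that effect's interaction with subjects, which is why the denominator degrees of freedom scale with n - 1 = 24.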

Figure 1. Experiments. Mean reaction times (RTs) of the participant children for the Stroop task (in ‘Incongruent’ testing sessions) with exposure to the original (consonant) version of one of Mozart's minuets (Consonant condition; Exposure to Consonance), with exposure to the modified (dissonant) version of the minuet (Dissonant condition; Exposure to Dissonance) and without exposure to any music (Control), and RTs of the participants under these three conditions in ‘Neutral’ testing sessions. Error bars represent SDs.

The results obtained using ER in the children are presented in Figure 2 and are similar to the results obtained using RT, confirming the effects seen above. There was a significant main effect for test (F(1,24) = 480.89, P < 0.001). The main effect of condition was also significant (F(2,48) = 67.46, P < 0.001). There was a significant interaction between the two factors (F(2,48) = 24.45, P < 0.001). Under each of the three conditions, ERs recorded from the participants were significantly greater during Incongruent sessions than during Neutral sessions (Ps < 0.001). The mean ER for Incongruent sessions was significantly smaller under the Consonant condition than under the Dissonant condition or the Control condition (Ps < 0.001). The ER under the Dissonant condition was also significantly greater than that under the Control condition (P = 0.028). For Neutral sessions, the mean ERs did not differ among the three conditions (Ps > 0.752). All of these results give additional confirmation of our hypotheses.

Figure 2. Experiments. Mean error rates (ERs) of the participant children for the Stroop task (in ‘Incongruent’ testing sessions) with exposure to the original (consonant) version of a Mozart minuet (Consonant condition; Exposure to Consonance), with exposure to the modified (dissonant) version of the minuet (Dissonant condition; Exposure to Dissonance) and without exposure to any music (Control), and ERs of the participants under the three conditions in ‘Neutral’ testing sessions. Error bars represent SDs.

The results of the experiment with the elderly adults were strikingly similar to those with the children. Figure 3 illustrates the performance results using RT. Significant main effects were found for test (Incongruent vs. Neutral test sessions, F(1,24) = 465.93, P < 0.001) as well as for condition (Consonant or Dissonant vs. Control conditions, F(2,48) = 73.08, P < 0.001). There was also a significant interaction between the two factors (F(2,48) = 29.59, P < 0.001). Under each of the three conditions RTs were significantly greater for Incongruent session vs. Neutral session (Ps < 0.001). Moreover, the mean scores for Incongruent session were significantly smaller under Consonant condition than under Dissonant or Control conditions (Ps < 0.001). The score under Dissonant condition was significantly greater than that under Control condition (P = 0.017). In contrast, in Neutral sessions the mean RTs did not differ among the three conditions (Ps > 0.841).

Figure 3. Experiments. Mean reaction times (RTs) of the participant elderly adults for the Stroop task (in ‘Incongruent’ testing sessions) with exposure to the original (consonant) version of a Mozart minuet (Consonant condition; Exposure to Consonance), with exposure to the modified (dissonant) version of the minuet (Dissonant condition; Exposure to Dissonance) and without exposure to any music (Control), and RTs of the participants under the three conditions in ‘Neutral’ testing sessions. Error bars represent SDs.

The results for ER in the elderly adults are presented in Figure 4. There was a significant main effect for test (F(1,24) = 510.47, P < 0.001). The main effect of condition was also significant (F(2,48) = 82.49, P < 0.001). There was a significant interaction between the two factors (F(2,48) = 27.65, P < 0.001). Under each of the three conditions, ERs recorded for the participants were significantly greater during Incongruent sessions than during Neutral sessions (Ps < 0.001). The mean ER for Incongruent sessions was significantly smaller under the Consonant condition than under the Dissonant condition or the Control condition (Ps < 0.001). The ER under the Dissonant condition was also significantly greater than that under the Control condition (P = 0.022). For Neutral sessions, the mean ERs did not differ among the three conditions (Ps > 0.863).

Figure 4. Experiments. Mean error rates (ERs) of the participant elderly adults for the Stroop task (in ‘Incongruent’ testing sessions) with exposure to the original (consonant) version of a Mozart minuet (Consonant condition; Exposure to Consonance), with exposure to the modified (dissonant) version of the minuet (Dissonant condition; Exposure to Dissonance) and without exposure to any music (Control), and ERs of the participants under the three conditions in ‘Neutral’ testing sessions. Error bars represent SDs.

Discussion

As reported previously29,33,34, the colour-word matching Stroop task produced the expected interference effect in the present experiment, which remained significant during the entire test. The observed results are similar to traditionally reported Stroop effects, and the main source of difficulty is the same: cognitive interference. The new result, corresponding to our hypothesis, is that the magnitude of such interference was reduced when the participants were exposed to a Mozart minuet with primarily consonant intervals and increased when the participants were exposed to a modified minuet with primarily dissonant intervals. We conclude that consonant music may have an important cognitive function: helping to overcome cognitive interference. Together with other recent experimental and theoretical publications18,19,23,24,25, this gives tentative support for our hypothesis about the fundamental cognitive function of music: it helps to resolve cognitive interference and cognitive dissonance, and facilitates human evolution.

Another issue addressed in this paper is the role of consonant vs. dissonant music and its relation to the pleasure of music. It is known that infants and even newborns exhibit strong perceptual preferences for the original minuet containing mostly consonant intervals over its modified dissonant version29,30,31,35. Consonant intervals are prevalent in most works of music36,37. Our results therefore support recent findings that the effect of music on cognitive dissonance depends on the hedonicity of music: pleasant music helps to overcome cognitive dissonance better than unpleasant music38.

Drawing conclusions about the connection between musical consonance and hedonicity requires caution since dissonant and sad music could also be sources of pleasure. Now we will discuss possible cognitive functions of musical dissonance, an unresolved issue in psychology and musicology35,36,37. We demonstrated that the modified version of the minuet containing more musical dissonance increased the cognitive interference. This result might be related to the findings of a recent non-invasive brain activation study17 whose aim was to debunk the popular version of the ‘Mozart effect’ and to demonstrate that any improvement on cognitive tests after listening to Mozart is not specific to music. Those authors reported that whereas Mozart's music results in some improvements of cognitive test scores, Adagio by Albinoni (sad, slow music) results in lower scores on the same cognitive tests; they demonstrated that mood and arousal may account for the ‘Mozart effect,’ but did not address why the Albinoni adagio is among the most popular pieces of music.

So what could be the cognitive function of musical dissonance and of music per se, and what are the evolutionary reasons for music evolution? Our original hypothesis suggested that one cognitive function of music is to overcome a large number of cognitive dissonances between virtually any two cognitions24,38. These include stress arising in complicated life conditions that are much more difficult and trying than those evoked by the Stroop effect.

A specific aspect of this question is why sad music is pleasurable39. One of the most popular pieces of western classical music is Adagio by Barber, which is sad, slow and highly dissonant, like Adagio by Albinoni. According to our hypothesis19,23,24,25,38, sad music helps to overcome dissonance arising from difficult life conditions, including, ultimately, the death of loved ones (the dissonance between the feeling of the infinity of the spirit and the knowledge of death). In general, any two (or more) cognitions involve a cognitive dissonance24,25,38. Possibly, the cognitive dissonance between any two cognitions involves its own shade of emotion and overcoming each cognitive dissonance requires a special musical emotion (we do not differentiate here between emotion and mood39). This hypothesis implies a potentially large number of musical emotions and also of cognitive dissonances and interferences. According to this hypothesis, music evolved to help overcome the predicament of stress that arises from holding contradictory cognitions, so that knowledge is not discarded, but rather can be accumulated and human culture can evolve25. Our experimental results18,19 emphasize a need for further research studying multiple emotions and determining the dimensionality of these emotional spaces. This problem has not been solved and the current paper reports a step in this direction. The consonance-dissonance dimension explored here is related to hedonicity (pleasure or displeasure) perceived in music38; however, the potential pleasure from sad dissonant music makes this connection nontrivial. Possibly music is perceived as pleasant if it resolves cognitive dissonances and interferences important for a listener. Music that is pleasant for many people resolves dissonances and interferences important for many of us.

While the non-invasive brain activation study mentioned above17 postulated that the “Mozart effect” is equivalent to mood and arousal, we have not explored differences between moods and emotions here. The main reason is that, as discussed above, we hypothesize that the relationships between musical emotions and cognitive dissonances are much more complicated than was assumed in that study17, which did not address the fundamental question of why the Adagio by Albinoni is among the most popular pieces of western music. In view of our hypothesis about relationships between musical emotions and cognitive dissonances, differences between emotions and mood are certainly worth discussing to allow an understanding of the differences and similarities between various works in this field. Meanwhile, we admit that the issue remains part of a much more complicated problem that will have to be explored in future research.

The Stroop task is well known as an effective test for examining the cognitive processes of inhibition and interference resolution40. In this test, interference occurs at the conceptual level and is separated from the response preparation. Performance in the test continues to develop up to approximately 17–19 years of age and thereafter is likely to decline gradually as aging proceeds41,42. This developmental pattern of changes likely reflects underlying changes in the brain, which are profoundly related to executive function and prefrontal brain activation40,42,43. The current study presents experimental evidence for the modulation of executive function by music. It seems that consonant music enhances the inhibition function of the executive control while dissonant music might exert a disinhibition influence “emancipating” the person from the control of the executive function. Testing this hypothesis would also require further research.

Methods

This investigation was conducted according to the principles expressed in the Declaration of Helsinki. All experimental protocols are consistent with the Guide for the Experimentation with Humans and were approved by the Institutional Ethical Committee of Primate Research Institute, Kyoto University.

Participants

As participants, we recruited 25 typically developing healthy 8- to 9-year-old boys from several elementary schools and 25 healthy 65- to 75-year-old elderly adults from temporary employment agencies in Kyoto and Aichi prefectures, Japan. All were right-handed and had been exposed to the Japanese language as their first language. They were not using any medication that would influence performance on the experimental task. Normal cognitive status was verified in the elderly adults through prescreening at the time of evaluation (Mini-Mental State Examination)44,45. We obtained written informed consent from the parents of each of the participant children as well as from each of the participant elderly adults involved in our study. The experimental room was a sound-attenuated playroom (3.5 m × 5.5 m) familiar to all of the participants. It contained a one-way observation mirror, a chair and a table. A 22-inch monitor connected with a personal computer was placed on the table. A ceiling speaker connected with an audio player was installed in the ceiling of the room, just above the table.

Procedure

An adapted single trial version of the colour-word matching Stroop task29,33,34 was used in the present experiment. Participants were told by an experimenter, who had not been notified about the purpose of the present experiment, that they would see two rows of letters appear on the screen of the monitor on the table and were instructed to decide, via button-press, if the colour of the top row letters corresponded to the colour name written at the bottom row. The index (YES-response) and middle finger (NO-response) of the right hand were used to respond. During trials in “Neutral testing sessions”, the letters in the top row were “XXX” printed in red, green, blue, or yellow and the bottom row consisted of the colour words “RED,” “GREEN,” “BLUE,” and “YELLOW” printed in the Japanese language in black. For trials in “Incongruent testing sessions”, the top row consisted of the colour words “RED,” “GREEN,” “BLUE,” and “YELLOW” printed in the Japanese language in an incongruent colour (e.g., “green” printed in red) in order to produce an interference between the colour word and colour name. In order to prevent participants from focusing on the lower word and blurring out the top word, the top word was presented 100 ms before the lower word. By this, visual attention is shifted automatically to the top word. The participants decided in all conditions if the colour name of the top row corresponded with the colour word of the bottom row. The meaning of letters or words (e.g., “XXX” or “GREEN”) was task irrelevant.

A testing session, whether a Neutral one or an Incongruent one, consisted of 40 trials in random order with an interstimulus interval of 12 s. In half of the 40 trials in an Incongruent testing session, the colour of the top row letters was congruent with the colour name written in the bottom row; in the other half of the trials, the two were incongruent with one another. We excluded congruent trials from the analysis as in previous research40.
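The trial structure described in the procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' presentation software: the names `Trial` and `make_session` are hypothetical, English colour names stand in for the Japanese words, the half-and-half split of matching and non-matching trials is applied to Neutral sessions as well (the text states it only for Incongruent sessions), and the random choice of colours is an assumption. Timing (the 100 ms lead of the top row and the 12 s interstimulus interval) is not modelled.

```python
import random
from dataclasses import dataclass

COLOURS = ["red", "green", "blue", "yellow"]


@dataclass
class Trial:
    top_text: str          # "XXX" or a colour word
    top_ink: str           # ink colour of the top row
    bottom_word: str       # colour word printed in black
    correct_response: str  # "YES" if top ink matches bottom word, else "NO"


def make_session(kind, n_trials=40, rng=None):
    """Build one session; kind is "neutral" or "incongruent"."""
    if kind not in ("neutral", "incongruent"):
        raise ValueError("kind must be 'neutral' or 'incongruent'")
    rng = rng or random.Random()
    trials = []
    for k in range(n_trials):
        ink = rng.choice(COLOURS)
        others = [c for c in COLOURS if c != ink]
        match = k < n_trials // 2  # first half YES, second half NO
        bottom = ink if match else rng.choice(others)
        # In incongruent sessions the top word never names its own ink.
        top = "XXX" if kind == "neutral" else rng.choice(others).upper()
        trials.append(Trial(top, ink, bottom.upper(),
                            "YES" if match else "NO"))
    rng.shuffle(trials)  # random trial order within the session
    return trials


incongruent = make_session("incongruent", rng=random.Random(1))
neutral = make_session("neutral", rng=random.Random(2))
print(incongruent[0])
print(neutral[0])
```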

In all, each participant was subjected to a Neutral testing session and to an Incongruent testing session three times each: once with exposure to the original version of Mozart's minuet (Consonant condition), once with exposure to the modified version of the minuet (Dissonant condition) and once without exposure to any music (Control condition). The order of the six sessions was randomized. Successive testing sessions for each participant were separated by an interval of 7 days. In each testing session, the experimenter led each participant into the experimental room, closed the door and remained together with the participant until the end of the session. When the participant was subjected to the test under the Consonant condition or the Dissonant condition, the experimenter switched on the audio player as she was entering the room so that the original version or the modified version of the minuet was played, respectively (sound pressure level: 65 dB). The music continued to be played repeatedly until the experimenter switched off the player as the testing session finished.
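The session plan for one participant can be sketched in a few lines. The sketch assumes an unconstrained random shuffle of the six test-by-condition sessions; the text does not say whether the order was counterbalanced across participants, and the function name `schedule` is hypothetical.

```python
import itertools
import random

TESTS = ("Neutral", "Incongruent")
CONDITIONS = ("Consonant", "Dissonant", "Control")


def schedule(rng, gap_days=7):
    """Return (day, test, condition) for the six sessions of one participant."""
    sessions = list(itertools.product(TESTS, CONDITIONS))
    rng.shuffle(sessions)  # assumption: simple random order per participant
    return [(k * gap_days, test, cond)
            for k, (test, cond) in enumerate(sessions)]


plan = schedule(random.Random(0))
for day, test, cond in plan:
    print(f"day {day:2d}: {test} session, {cond} condition")
```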

The minuet was a simple one in C major by Mozart, K. 1f. It was essentially the same as that used previously31,32,33. Both the original and the modified versions were digitally generated with a piano timbre. They were made up of 60 intervals. In the original version, only three of them were dissonant and all three were tritones (6-semitone intervals). In the modified version, all Gs were changed to F#s and all Ds to C#s. This had the effect of creating 21 additional dissonant intervals, including a total of 12 of the two most dissonant intervals, i.e. seven tritones and five minor ninths (13 semitones). In the present stimuli, the upper voice and the lower voice were separated by more than an octave in each interval. The tempo was identical across the two versions (120 quarter notes per min).
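The pitch substitution and its effect on interval quality can be illustrated with a small sketch. The four intervals below are hypothetical examples, not notes transcribed from the minuet. The set of dissonant interval classes (seconds, tritone and sevenths, reduced modulo the octave) is a common music-theory convention assumed here; the text itself names only tritones and minor ninths. Note names are assumed to carry a single-digit octave, with C4 as MIDI note 60.

```python
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}
SUBSTITUTE = {"G": "F#", "D": "C#"}  # the modification described above
# Assumed convention: minor/major second, tritone, minor/major seventh.
DISSONANT_CLASSES = {1, 2, 6, 10, 11}


def midi(note):
    """Convert a name such as 'F#4' to a MIDI note number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    return 12 * (octave + 1) + PITCH_CLASS[name]


def modify(note):
    """Apply the G -> F# and D -> C# substitution, keeping the octave."""
    name, octave = note[:-1], note[-1]
    return SUBSTITUTE.get(name, name) + octave


def semitones(lower, upper):
    return midi(upper) - midi(lower)


def is_dissonant(lower, upper):
    # Compound intervals are reduced to within one octave before lookup.
    return semitones(lower, upper) % 12 in DISSONANT_CLASSES


def count_dissonant(intervals):
    return sum(is_dissonant(lo, hi) for lo, hi in intervals)


# Hypothetical (lower voice, upper voice) pairs, each wider than an octave.
original = [("C3", "G4"), ("B2", "D4"), ("C3", "E4"), ("G2", "D4")]
modified = [(modify(lo), modify(hi)) for lo, hi in original]

print(count_dissonant(original), count_dissonant(modified))
```

In this toy example, C3 to G4 (19 semitones) becomes C3 to F#4 (18 semitones, a compound tritone), while G2 to D4 stays consonant because both voices shift by the same semitone.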