Brain networks underlying aesthetic appreciation as modulated by interaction of the spectral and temporal organisations of music

Music is organised both spectrally and temporally, determining musical structures such as musical scale, harmony, and sequential rules in chord progressions. A number of human neuroimaging studies have investigated the neural processes associated with emotional responses to music by examining the influence of musical valence (pleasantness/unpleasantness), comparing responses to music with responses to unpleasantly manipulated counterparts in which harmony and sequential rules were varied. However, interactions between these alterations to harmony and sequential rules, in terms of emotional experience and the corresponding neural activity, have not been systematically studied, although such interactions are at the core of how music affects the listener. The current study investigates the interaction between such alterations in harmony and sequential rules by using data sets from two functional magnetic resonance imaging (fMRI) experiments. While replicating the previous findings, we found a significant interaction between the spectral and temporal alterations in the fronto-limbic system, including the ventromedial prefrontal cortex (vmPFC), nucleus accumbens, caudate nucleus, and putamen. We further revealed that the functional connectivity between the vmPFC and the right inferior frontal gyrus (IFG) was reduced when listening to excerpts with alterations in both domains compared to the original music. As it has been suggested that the vmPFC operates as a pivotal point that mediates between the limbic system and the frontal cortex in reward-related processing, we propose that this fronto-limbic interaction might be related to the involvement of cognitive processes in the emotional appreciation of music.

in the ventral striatum and the orbitofrontal cortex 17 , which seems to support the notion that a major underlying mechanism of music-induced pleasure is based on tension that is built up by violation of expectation and its prolonged fulfilment 18,19 .
While a number of previous studies used disruption in either a spectral or temporal structure to create unpleasant counterparts to investigate music-induced emotions, the interaction between these musical structures remains largely unknown. If aesthetical appreciation of music were indeed a holistic process as has been suggested 6 , one would expect that the processing of both structures is closely related and that disrupting the processing of one would disrupt the processing of the other. In other words, a disruption of spectral structure for example would alter neural responses differently depending on the temporal structure. As suggested in previous publications 3,20 , it is probable that these spectral and temporal structures of music are integrated in association cortices (i.e. ventromedial prefrontal cortex [vmPFC] or IFG) corresponding to a type of "higher-order" emotional processing distinct from a more "lower-order" auditory signal processing along the auditory pathway (e.g. inferior colliculus or STG). To test this conjecture, we investigated an interaction between spectral and temporal structures in music using functional magnetic resonance imaging (fMRI) data from two human experiments that are different in MR sequences, mutually complementarily capturing brain activities related to music perception. We were especially interested in areas that showed interaction effects in integrating spectral and temporal dimensions in music.

Materials and Methods
Stimuli. Musical excerpts were extracted from instrumental music from the last four centuries, which have been or had been popular to general audience. Musical styles included classical (e.g. J. S. Bach), swing (e.g. Benny Goodman), and tango (e.g. Francisco Canaro) as used in previous studies 11,21 (see Supplementary Table S1 for a full list of excerpts). The musical excerpts were manipulated in the spectral structure (i.e. harmony) and the temporal structure (i.e. play direction) resulting in an orthogonal 2 × 2 factorial design. To alter the spectral structure, the original excerpt was transposed two semitones up (i.e. major second) and six semitones down (i.e. diminished fifth), and subsequently mixed together, resulting in added dissonant intervals throughout the excerpts that affect local harmony and tonal context. To alter the temporal structure, the excerpt was played backward. This resulted in changes in musical timbre, locally, and direction of chord progressions, more globally. All stimuli across conditions were controlled for loudness by equalizing the root-mean-square of the waveforms.
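The 2 × 2 stimulus design described above can be illustrated with a minimal sketch. It is not the procedure used to create the actual stimuli: a synthetic 440-Hz sine tone stands in for a musical excerpt, transposition is imitated by synthesising the tone at the shifted frequency rather than by pitch-shifting a recording, and the "dissonant" mix contains only the two transposed copies. All function names, the sampling rate, and the tone duration are hypothetical choices made for illustration.

```python
import math

def semitone_ratio(n):
    """Frequency ratio of a transposition by n semitones (equal temperament)."""
    return 2.0 ** (n / 12.0)

def rms(x):
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def equalise_rms(x, target):
    """Scale a waveform so that its RMS matches the target value."""
    gain = target / rms(x)
    return [v * gain for v in x]

def tone(freq, sr=8000, dur=0.5):
    """Toy stand-in for an excerpt: a sine tone (assumed sr and duration)."""
    n = int(sr * dur)
    return [math.sin(2 * math.pi * freq * i / sr) for i in range(n)]

# Spectral manipulation: mix copies two semitones up and six semitones down.
original = tone(440.0)
up = tone(440.0 * semitone_ratio(2))
down = tone(440.0 * semitone_ratio(-6))
dissonant = [a + b for a, b in zip(up, down)]

# Temporal manipulation: play the waveform backward.
target = rms(original)
conditions = {
    "FC": original,
    "BC": original[::-1],
    "FD": equalise_rms(dissonant, target),
    "BD": equalise_rms(dissonant[::-1], target),
}
```

Reversal leaves the RMS unchanged, so only the mixed versions need rescaling to match the loudness of the original in this toy case.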
It is important to note that these physical manipulations were orthogonal in the sense that the change of spectral content (or temporal order) does not alter temporal order (or spectral content), but the results of manipulations are not musically independent. For example, the sequential rule of chord progression (e.g. frequent use of a tonic [the first chord of a diatonic scale] chord after a dominant [the fifth chord of a diatonic scale] at the end of a musical phrase) is based on tonal context (because it defines a tonic function of a triad [i.e. a major triad can be either a tonic, subdominant, or dominant of a major key]) as well as local harmony (because it forms simultaneous tones as a chord). Therefore, when the spectral manipulation makes the chords dissonant, expectation based on tonal context can be weakened, which would make the effect of reversal less salient.
It is also noteworthy that these manipulations do not "completely" abolish musical organisation but proportionally degrade it. For illustration, spectrograms of four versions of a representative musical excerpt by J. S. Bach, played by Glenn Gould, are shown in Fig. 1 (download Supplementary File S1 for an audio file of the example). The conditions are labelled as "forward-consonant" (FC), "backward-consonant" (BC), "forward-dissonant" (FD), "backward-dissonant" (BD) for the play direction and the dissonance level. In general, the "dissonant" conditions (Fig. 1c,d) show more unresolved spectral components (thus perceived as dissonant) compared to "consonant" conditions (Fig. 1a,b). However, it is clear that the local spectral density in the dissonant excerpts fluctuates
Magnetic resonance imaging sequences. Functional neuroimaging data were adopted from two of our fMRI experiments, where the same stimuli but different MR sequences were used, thus providing us with complementary views of the brain activities (see Table 1 for an overview of the datasets). The first data set ("Experiment I") has not been published elsewhere. The second data set ("Experiment II") was used in two of our previous papers that revealed certain aspects of music perception that are different from the focus of the current study 7,10 ; both studies addressed an investigation of the temporal dynamics of the ventral striatum, with respect to response to pleasant music 7 and inter-subject correlation between the inferior colliculus response and subjective disliking of dissonant harmony 10 . The key difference between the two experiments was a silent delay between acquisition of fMRI volumes. In Experiment I, an fMRI volume was taken every 12 s (2 s to take one volume), allowing a silent period of 10 s without acoustic scanner noise, which was used to present auditory stimuli in the absence of acoustic noise from the MR sequence.
This kind of MR sequence is known as "(temporally) sparse" scanning 22 . In Experiment II, fMRI volumes were "continuously" taken every second. Thus, in this experiment, subjects listened to musical stimuli in the presence of acoustic scanner noise.
Contamination of auditory stimuli by the acoustic noise of the MRI sequence is a serious problem in auditory fMRI experiments. In principle, given the delay of canonical hemodynamic function, acoustic contamination would be minimal with a non-scanning interval of 8 s, particularly when studying the primary auditory cortex 23 . However, this is an inefficient way to acquire fMRI data in terms of the number of volumes per given experiment time. Shortening the duration of a silent delay may increase statistical power by the increased number of samples (i.e. volumes), although it may also increase the interference by the acoustic scanner noise. A technical report study systematically compared sparse, alternating, and continuous sampling and reported non-linear alteration of auditory processing due to the presence of scanning noise 24 . Moreover, there is evidence that the effect of scanner noise is not limited to the primary auditory cortex but also to non-primary auditory cortices when processing spoken language 25 .
While the acoustic scanner noise is an issue that is not to be dealt with lightly, recent studies demonstrated that tonotopy experiments using continuous scanning can be successful with phase-encoded stimuli (i.e. frequency sweeping) and modern sound delivery systems 26,27 . Moreover, fMRI data at a higher temporal resolution enable us to investigate dynamic aspects of brain responses, particularly to dynamically evolving stimuli such as music. In our previous report of Experiment II 28 , we showed that continuous scanning yielded comparable results as the sparse scanning: a number of structures, including the cortical limbic areas and striatal regions, involved emotional appreciation. Thus, we aimed to use advantages of both MR sequences: (1) a precise localisation of involved brain regions without scanner noise using sparse sampling data (i.e. Experiment I) and (2) investigation of temporal dynamics and functional connectivity of the identified brain regions using continuous sampling data (i.e. Experiment II).
In Experiment I, twenty-four axial slices of echo planar imaging (EPI) that cover the whole brain were acquired with an in-plane resolution of 3 × 3 mm 2 , a thickness of 4 mm, and an inter-slice gap of 1 mm, resulting in a resolution of 3 × 3 × 5 mm 3 . Functional and T1-weighted images (1 × 1 × 1 mm 3 ) were obtained using a 3-T Magnetom Tim Trio scanner (Siemens, Erlangen, Germany).
In Experiment II, fifteen axial slices of EPI that cover the ventral half of the brain were acquired with an in-plane resolution of 2.5 × 2.5 mm 2 , a thickness of 4 mm, and an inter-slice gap of 0.5 mm, resulting in a resolution of 2.5 × 2.5 × 4.5 mm 3 . EPI images were acquired using a 3-T MedSpec 30/100 scanner (Bruker, Ettlingen, Germany) and a birdcage head coil, and T1-weighted images at unit-mm isotropic resolution were acquired using a 3-T Magnetom Tim Trio scanner (Siemens, Erlangen, Germany). In Experiment I, one trial consisted of a silent period without scanning for 10 s and a scanning period for 2 s. The musical excerpts were presented during the silent period at pseudorandom timing (from 3.6 to 10 s with a step of 0.7 s before the acquisition of each volume) to sample different phases of the hemodynamic response to musical excerpts (namely, event-related design). Participants were instructed to press a button to rate subjective pleasantness of each excerpt (1 = very unpleasant, 2 = unpleasant, 3 = pleasant, 4 = very pleasant) during the 2-s scanning period. Twenty-five instrumental tunes were used to create 4 versions (FC, FD, BC, BD) and played twice, resulting in 25 × 4 × 2 = 200 trials.
In Experiment II, one trial consisted of a 30-s period for presentation of musical excerpts and a 6-s period for subjective rating of presented musical excerpts (36 s in total). Acoustic scanner noise was present throughout the whole experiment. Participants were instructed to press a button to rate subjective unpleasantness (1 = very pleasant, 2 = pleasant, 3 = unpleasant, 4 = very unpleasant) of each excerpt during the late 6-s period. Twenty instrumental tunes were played only once, resulting in 20 × 4 = 80 trials.
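The trial counts of the two experiments follow directly from the factorial design, as the following sketch shows. The function name and the tuple layout are hypothetical, the order of trials is not randomised here, and the nominal durations ignore any breaks or additional volumes, which are not described in this section.

```python
from itertools import product

def build_trials(n_tunes, repetitions):
    """Enumerate (tune index, condition, repetition) for the 2 x 2 design."""
    # Play direction (Forward/Backward) crossed with harmony (Consonant/Dissonant).
    conditions = [d + h for d, h in product("FB", "CD")]  # FC, FD, BC, BD
    return [
        (tune, cond, rep)
        for rep in range(repetitions)
        for tune in range(n_tunes)
        for cond in conditions
    ]

# Experiment I: 25 tunes, 4 versions, played twice, 12 s per trial (10 s + 2 s).
trials_exp1 = build_trials(n_tunes=25, repetitions=2)
nominal_time_exp1 = len(trials_exp1) * 12  # seconds

# Experiment II: 20 tunes, 4 versions, played once, 36 s per trial (30 s + 6 s).
trials_exp2 = build_trials(n_tunes=20, repetitions=1)
nominal_time_exp2 = len(trials_exp2) * 36  # seconds
```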
Participants. Overall, 39 healthy volunteers participated in one of the two experiments (n = 16, 14 females, mean age 25.8 ± 2.8 years in Experiment I; n = 23, 13 females, mean age 25.9 ± 2.9 years in Experiment II). The studies were conducted strictly following guidelines approved by the Ethics Committee of the University of Leipzig. Informed written consent was obtained before the fMRI experiments. One participant in Experiment II was studying for a Bachelor's degree in music, and the other participants were students or professionals in non-musical fields. Some participants (four in Experiment I, sixteen in Experiment II) reported experience in playing musical instruments. However, the differences in gender proportion, age, and musical experience were not statistically significant between the datasets (gender: Z = 0.29, p = 0.77; age: T(37) = −0.48, p = 0.64; musical experience: Z = 1.59, p = 0.11).
Image processing. Using SPM12 (v6225; Wellcome Trust Centre for Neuroimaging, University College London, London, UK) and MATLAB (v8.6, R2015b; MathWorks, Natick, Massachusetts, USA), anatomical and functional images were processed, including unwarping and realignment, unified segmentation, spatial normalisation, and spatial smoothing. Because of the different TRs, slice-timing correction was done only for Experiment II. Also, for the same reason, we used 6 rigid body motion parameters and the lengths (i.e. L2-norms) of the temporal derivatives of translation and rotation, respectively (i.e. 8 regressors in total) to regress out head movement artefacts in Experiment II. We resampled the functional data at isotropic resolutions that are close to the original resolutions 29 : 3-mm isotropic resolution for Experiment I; 2.5-mm for Experiment II. Different smoothing kernel sizes (full width at half maximum [FWHM] of 5 mm for Experiment I; 4 mm for Experiment II) were chosen to approximately match the effective smoothness of the two data sets at an isotropic FWHM of 8 mm. See Table 1 for detailed parameters.
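One way to assemble the eight head-motion regressors described above is sketched below. This is an illustration, not the code used in the study: the function name is hypothetical, the temporal derivative is approximated by a backward difference between successive volumes, and the derivative of the first volume is set to zero.

```python
import math

def motion_regressors(params):
    """Build 8 nuisance regressors per volume.

    params: one row per volume, [tx, ty, tz, rx, ry, rz]
            (3 translations followed by 3 rotations).
    Returns rows of 8 values: the 6 parameters plus the L2-norms of the
    temporal derivatives of translation and of rotation.
    """
    rows = []
    previous = params[0]  # assumption: zero derivative at the first volume
    for current in params:
        diff = [c - p for c, p in zip(current, previous)]
        translation_speed = math.sqrt(sum(v * v for v in diff[:3]))
        rotation_speed = math.sqrt(sum(v * v for v in diff[3:]))
        rows.append(list(current) + [translation_speed, rotation_speed])
        previous = current
    return rows

# Toy realignment parameters for three volumes (made-up values).
example = motion_regressors([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0, 0.6, 0.8],
])
```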

Functional activation analysis.
A subject-level autoregressive general linear model (GLM) was fitted by encoding onsets and durations of four conditions (i.e. FC, FD, BC, and BD) for both data sets. Effects of conditions were estimated after adjustment for non-sphericity of the functional data using SPM12. Although the TR of Experiment I was very long (i.e. 12 s), because the images were obtained at various post-stimulus-onset times (from 3.6 to 10 s; namely event-related design), modelling hemodynamics in Experiment I is a valid approach. Because we previously reported that the ventral striatal response attenuated over time in Experiment II 7 , we modelled a 30-s condition into three 10-s segments and used the first segment to compute contrasts to match Experiment I. High-pass filter cut-off was 1/128 Hz for both experiments.
We computed multiple contrasts to test various effects: a partial effect of dissonance when excerpts were forward (FD -FC) or backward (BD -BC), a partial effect of reversal when excerpts were consonant (BC -FC) or dissonant (BD -FD), a joint effect of dissonance and reversal (BD -FC), and an interaction between the dissonance and reversal: (BD -BC) -(FD -FC).
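The contrasts listed above can be written as weight vectors over the four condition estimates. The sketch below assumes the column order FC, FD, BC, BD, which is a choice made for illustration; the variable names are hypothetical. It also shows that the interaction contrast is the same whether it is derived from the two dissonance effects or from the two reversal effects.

```python
# Assumed column order of the condition regressors: FC, FD, BC, BD.
ORDER = ["FC", "FD", "BC", "BD"]

contrasts = {
    "dissonance_forward":  [-1,  1,  0,  0],  # FD - FC
    "dissonance_backward": [ 0,  0, -1,  1],  # BD - BC
    "reversal_consonant":  [-1,  0,  1,  0],  # BC - FC
    "reversal_dissonant":  [ 0, -1,  0,  1],  # BD - FD
    "joint":               [-1,  0,  0,  1],  # BD - FC
}

def subtract(a, b):
    """Element-wise difference of two contrast vectors."""
    return [x - y for x, y in zip(a, b)]

# Interaction: (BD - BC) - (FD - FC).
interaction = subtract(contrasts["dissonance_backward"],
                       contrasts["dissonance_forward"])

# The same weights result from (BD - FD) - (BC - FC).
interaction_alt = subtract(contrasts["reversal_dissonant"],
                           contrasts["reversal_consonant"])
```

With this ordering the interaction weights are [1, -1, -1, 1], so a positive interaction means that the summed estimates of FC and BD exceed those of FD and BC.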
A group-level one-sample T-test was carried out on subject-level contrast images. The Gaussian assumption was tested by carrying out a Kolmogorov-Smirnov test at each voxel with false discovery rate correction. Because all corrected p-values in the brain mask were one, we used parametric inferences with random field theory (RFT) 30 to control the family-wise error rate (FWER) at less than 0.05, as implemented in SPM12. The cluster-forming height-threshold was 0.001, and the extent-threshold was determined by the minimal extent of a cluster with a cluster-wise p-value less than 0.05, which was approximately 640 mm 3 (24 voxels in Experiment I; 44 voxels in Experiment II). As pointed out by Flandin and Friston 31 , a recent criticism of cluster-extent thresholding by Eklund, et al. 32 was based on results with liberal thresholds in height (uncorrected p = 0.01) and extent (80 mm 3 ; only 3 voxels at 3-mm isotropic resolution). In the replication with a stringent height-threshold (uncorrected p = 0.001) by Flandin and Friston 31 , resulting FWERs were between 4 and 6%, demonstrating that the cluster-wise thresholding using SPM does not critically inflate FWER when employed carefully. Thus, we used conservative thresholds in both height and extent in the present study.

Psychophysiological interaction analysis.
In the functional activation analysis, the vmPFC showed similar BOLD activation to the most pleasant and the most unpleasant stimuli. Since the region showed sensitivity to disruptions in either spectral or temporal organisation of music in the current data, it is possible that what was altered was the functional connectivity of the vmPFC instead of the local activity. To test this idea, an analysis of psychophysiological interaction (PPI) 33 was performed on the functional connectivity of the vmPFC using an SPM-based MATLAB toolbox for Generalized PPI 34 (https://www.nitrc.org/projects/gppi). It is known that a correlation between two BOLD time series is highly sensitive to abrupt and simultaneous changes in image intensities over many voxels, unlike BOLD activation analysis. Head motions during scanning may induce such signal changes, leading to spurious correlation 35 . Thus, for the PPI analysis, we employed 6 "anatomical CompCor" regressors, which are eigenvariates extracted from white matter and cerebrospinal fluid voxels to model non-neural global fluctuation in BOLD time series 36 . Also, for reliable estimation of functional connectivity, we used the whole 30-s trial for the PPI analysis. For a group-level statistical inference, the RFT was also used to control FWER to be less than 0.05.
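The core idea of a PPI regressor can be conveyed by a deliberately simplified sketch: the interaction term is the product of the seed time series and a psychological variable. This is not the Generalized PPI toolbox procedure. It omits the hemodynamic deconvolution and reconvolution steps, the separate per-condition terms of the generalized form, and all nuisance regressors. The function names, the +1/−1 coding of FC versus BD, and the toy values are assumptions made for illustration.

```python
def mean_centre(x):
    """Subtract the mean from a time series."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def ppi_regressor(seed, labels, positive="FC", negative="BD"):
    """Simplified PPI term: centred seed signal times a condition code.

    seed:   seed-region time series (e.g. from the vmPFC), one value per volume
    labels: condition label of each volume
    """
    psych = [1.0 if lab == positive else -1.0 if lab == negative else 0.0
             for lab in labels]
    seed_centred = mean_centre(seed)
    return [s * p for s, p in zip(seed_centred, psych)]

# Toy example with four volumes (made-up values).
toy_ppi = ppi_regressor([1.0, 2.0, 3.0, 4.0], ["FC", "BD", "FD", "FC"])
```

In a full model this term would enter the GLM alongside the seed time series and the condition regressors, so that its coefficient reflects a condition-dependent change in coupling rather than a main effect.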

Results
Experiment I. Partial effects of disrupted musical structures. We found decreases in the BOLD signal due to the dissonance when the excerpts were played forwards (i.e. FD -FC; Fig. 3a) in a number of brain regions, including the bilateral superior temporal gyri (STGs) and planum temporale (PT), the right planum polare (PP), the left amygdala and nucleus accumbens (NAc), the bilateral putamina (Ptm) and globus pallidi (GPs), and the medial parts of thalami (significant clusters are listed in Table 2). We also found a positive effect in the right superior frontal gyrus (SFG). Interestingly, the partial effect of dissonance when the excerpts were played backwards (i.e. BD -BC; Fig. 3b) was only significant in the auditory cortices (i.e. decreased BOLD signal in the bilateral STGs and the right PP), but not in the limbic areas (i.e. NAc, Ptm, or GPs).
The reversal of play direction when the excerpts were consonant (i.e. BC -FC; Fig. 3c) was associated with decreases in the BOLD signal in the bilateral Ptm, GP, and anterior cingulate cortex (ACC). We also found an increase in the right SFG similar to the partial effect of dissonance when played forwards. Similar to the analysis above, the partial effect of reversal when dissonant (i.e. BD -FD; Fig. 3d) was different from that when consonant. For the contrast BD -FD, we found decreases in the BOLD signal in the right PT and the right lateral orbital frontal cortex (OFC), but no change in the BOLD signal in the auditory cortices (i.e. STG, PT, and PP). We also found increases in the BOLD signal in a number of cortical regions, including the bilateral angular gyri, right ACC, left middle frontal gyrus (MFG), and the left frontal pole (FP). See Fig. S1 for all slices over the whole brain.
The joint effect of disruptions in spectral and temporal domains (i.e. BD -FC; Fig. 3e) was found as decreased BOLD signals in the bilateral STGs, the left GP, the anterior part of the left amygdala, and the medial parts of the bilateral thalami. Compared to the partial effects of either dissonance or reversal alone (Fig. 3a,c), the joint effect was weaker (i.e. less decrease in the BOLD signal) in the limbic system (i.e. NAc, Ptm, GP) and stronger (i.e., more decrease) in the auditory cortices.
Importantly, we found that the partial effect of one domain was dependent on the other domain. By definition, this implies an interaction between the two domains. We further quantitatively tested this observation in the following section.
Effect of interaction between disrupted musical structures in spectral vs. temporal domains. We tested the interaction by subtracting the partial effect of dissonance when played backwards from that when played forwards; that is, (BD -BC) -(FD -FC). This is equivalent to the subtraction of the partial effect of reversal when dissonant from that when consonant because (BD -BC) -(FD -FC) = (BD -FD) -(BC -FC). We found a positive interaction in the ventromedial prefrontal cortex (vmPFC), ACC, and the subcortical limbic system including the NAc, the GPs, and thalami (Fig. 4a). This confirmed that the partial effect of a disruption in one domain was nullified when the other domain was disrupted in the cortical and subcortical limbic areas. In other words, a disruption in addition to a stimulus already with another disruption did not produce a further decrease in BOLD activation in the fronto-limbic areas. This was not the case in auditory regions.
To illustrate the GLM result in terms of effect size, beta coefficients averaged within each significant cluster are plotted in Fig. 4b-d. Interestingly, the positive interaction was so strong that the signs of marginal effects were flipped in the vmPFC and the bilateral striata. Indeed, the beta coefficients of the FC and BD conditions were not significantly different in any of the three clusters (max T(15) = 1.01; p > 0.10), which is surprising given the sensitivity of these regions to disrupted musical structures and the widely different acoustics and related emotional valences of the conditions. We address this issue later in the PPI analyses.
Experiment II. Replication of the functional activation analysis. We analysed the Experiment II data set using the same processing pipeline except for the temporal processing (i.e. slice-timing correction and head motion covariates). The results from Experiment II were in good agreement with Experiment I, as shown in Fig. 5 and Table 3 (see Fig. S2 for all slices). Specifically, we found (1) deactivation in the bilateral STGs, vmPFC, and GPs due to dissonance alone (Fig. 5a,h), (2) selective deactivation only in the limbic area but not in the STGs due to reversal alone (Fig. 5c,j), (3) deactivation in the STGs due to joint disruption (Fig. 5e,j), and (4) the positive interaction in the vmPFC (Fig. 5f,m). There were also noticeable differences between the experiments. However, significant differences between the experiments were mainly found in the auditory cortices (i.e., HG and PT), presumably related to the acoustic noise from gradient coils during continuous scanning, but not in the left NAc for the interaction (see Fig. S3 for all slices).
Psychophysiological interaction. In the vmPFC, the effect size (i.e. beta coefficient) of the FC condition was not significantly different from that of the BD condition in Experiment I (T(15) = 1.01; p > 0.10) or Experiment II (T(22) = 1.22, p = 0.24). As mentioned earlier, this may look incongruent because the two conditions were widely different in terms of acoustics and subjective rating. It may look more puzzling given that the vmPFC showed a strong decrease in BOLD activation due to the disruption of either a spectral or temporal structure. However, it is known that the vmPFC is engaged in a wide variety of cognitive sub-processes 37 . Recent studies demonstrated reconfiguration of functional networks of the vmPFC due to external inputs 38,39 and a level of arousal 40 . Given that, we suspected that the vmPFC might work similarly in terms of the activation level during the conditions of FC and BD but as a part of different functional networks. To test this idea, we analysed PPI between the vmPFC time series and the contrast between the FC and BD on the Experiment II data, which were acquired continuously.
The results of the PPI analysis are shown in Fig. 6. The physiological factor (i.e. correlation with the vmPFC time series) was found in extensive areas of the frontal cortices, temporal cortices, and subcortical structures (Fig. 6a). This functional connectivity of the vmPFC was reduced by the BD condition compared to the FC condition in the right IFG and FP (Fig. 6b; see Table 3g for statistics), supporting our conjecture on the vmPFC.

Behavioural measures.
In correspondence to the observed changes in BOLD signal, we also found significant effects of disruptions and their interactions in the pleasantness ratings during scanning, as shown in Fig. 7 and Table 4. Notably, most participants rated the partially disrupted conditions (i.e. FD and BC) as "unpleasant" (mean rating of 2) without any particular preference for either (min p = 0.270). The interaction was also positive, but unlike the BOLD signal, the direction of effect was not changed so that the jointly disrupted condition (i.e. BD) was not more preferred than the partially disrupted conditions (i.e. FD or BC).
Since it was confirmed that the interaction between spectral and temporal structures is significant in terms of both BOLD signal and behavioural response, we further explored whether there is a correlation between individual differences in neural and behavioural effects, as recently demonstrated in our previous study 10 . In other words, we examined whether individuals who showed a small (or large) interaction effect in the pleasantness ratings also exhibited a small (or large) interaction effect in the BOLD signal. We tested the inter-subject correlation between contrast coefficients of the BOLD signal and the behavioural response for the interaction, but no significant correlation was found in either experiment (min p = 0.676).
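An inter-subject correlation of this kind pairs, for each participant, a neural interaction estimate with a behavioural one. The sketch below assumes a Pearson correlation, which is not stated explicitly in this section, and uses made-up values. The function name and the example data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Per-subject interaction contrasts (made-up values, one entry per subject):
# (FC + BD) - (FD + BC), computed on BOLD estimates and on ratings.
bold_interaction = [0.8, 0.1, 0.5, -0.2, 0.4]
rating_interaction = [1.0, 1.5, 0.5, 1.0, 2.0]
r = pearson_r(bold_interaction, rating_interaction)
```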

Discussion
In the current study, we found decreased BOLD signal in auditory and limbic systems (i.e. the bilateral STGs, vmPFC, NAc, Ptm, GP, amygdala, and thalami) due to partial disruption of both spectral and temporal organisation, which corresponded to decreased subjective ratings of pleasantness. These anatomical structures have been reported to be involved in the processing of music-induced emotions in previous studies 3,8,9,11,20,41-45 . As we hypothesised, we found a significant, positive interaction between the disruptions in the temporal and spectral organisation of the music, which was localised in the fronto-limbic system (i.e. vmPFC, NAc, CdN, Ptm). Furthermore, we found a significant modulation of functional connectivity of the vmPFC by the combined disruption of the temporal and spectral organisation. Major findings were consistently observed in both fMRI data sets. In the following sections, we will discuss the relevance of our findings to neural mechanisms that contribute to the aesthetic appreciation of music.
Partial effect of alteration of spectral and temporal structures. Dissonance is very often associated with unpleasant emotions not only by those who are familiar with Western music but also by infants 46,47 , people from an autochthonous African ethnic group with no prior exposure to polyphonic Western music 21 , a non-human primate in a documented case (i.e. a chimpanzee) 48 , and even chickens 49 . This suggests that an association between harmony and emotional valence is to some degree universal and innate, presumably related to the physical properties of tonal sounds and the network characteristics of the low-level auditory stream; for instance, encoding of beating and sensory consonance/dissonance by the neurons in the inferior colliculus 50 . An fMRI study 51 revealed that certain acoustic information of aversive sound goes from the auditory cortex to the amygdala instead of directly from thalamic inputs, supporting the notion that certain complex aversive sounds need to be analysed at the cortical level to induce negative emotional responses. A similar pathway was implicated by an intracranial recording from an epileptic patient, showing a cascade of information about dissonant harmony from the auditory cortex to the orbitofrontal cortex, ACC, and amygdala 52 , supporting the view that certain aversive acoustic information reaches the amygdala via the auditory cortex, presumably followed by feedback from the amygdala to the auditory cortex. We believe our finding of decreased BOLD activation in the auditory cortex and the limbic system also reflects such a communication between the auditory cortex and the limbic system that is related to "core liking".
Constant alteration of a temporal structure in music seems to decrease activation in the limbic areas, such as the ventral striatum, hypothalamus, and the orbitofrontal cortex 17 , unlike a focal alteration that evokes a prediction error response in the IFG 20 . We also found reliable decreases in the bilateral putamina, which are known to be sensitive to emotional and motivational information 53 and have been extensively studied in the context of decision-making 54 . It was theorised that the dorsal striatum (including the Ptm) contributes to action selection in the context of decision-making, whereas the ventral striatum encodes reward value and prediction error 55 . More relevantly, a similar distinction was reported in the context of musical pleasure: The dorsal striatum encoded anticipation of musical pleasure, whereas the ventral striatum encoded experience of pleasure 43 . Therefore, the current finding of the decreased BOLD signal in the Ptm in response to the reversed excerpts seems to be related to impaired reward anticipation processes.
Interaction between spectral and temporal structures of music. We found an interaction between spectral and temporal domains in areas including the ventral and dorsal striata, vmPFC, and ACC (prominently in Experiment I) that have been well associated with reward processing 6 and emotional appraisal 56-58 . In particular, we demonstrated that the direction of the effect of dissonance (or reversal) can be switched by the presence of reversal (or dissonance) in beta coefficients. In other words, it was shown that physically identical manipulations (e.g. dissonance) can produce opposite effects depending on the context (e.g. forward or backward) in those regions.
However, such an interaction (i.e. changes of the directions of effects) does not indicate that the disruption of the harmonic structure could be perceived as more pleasant when the temporal rules are already disrupted. In fact, the behavioural ratings showed that the two types of effects both contributed to rendering the musical excerpts more unpleasant. That is, a given manipulation of one musical structure (e.g. dissonance) remained unfavourable when the other musical structure (e.g. a sequential rule) was disrupted, even though it increased the BOLD signal in the vmPFC in that case, whereas it decreased the BOLD signal in the same region when the other musical structure was intact.
One possible explanation of this might be a specific functionality of the vmPFC in integrating positive and negative emotions 59 , which was also the subject of a computational imaging study 60 . Both human lesion and imaging studies point towards such an emotional functionality of the vmPFC. For example, patients with lesions in the vmPFC showed impairment in processing negative emotions 61,62 . In the current study, musical excerpts in their original forms (i.e. FC) and the most disrupted forms (i.e. BD) were rated as either very pleasant or very unpleasant while partially disrupted forms (i.e. FD and BC) were rated as (mildly) unpleasant. That is, relatively more (or less) intensive emotional valence might have been related to decreased (or increased) BOLD signal in the vmPFC. In fact, in a neuroimaging study 63 , where various types of musical emotions (e.g. peacefulness, joy, sadness and so on) were used, the vmPFC showed higher BOLD when a certain group of emotions (both positive and negative valence) was intensified, which also supports an integrative role of the vmPFC.

Modulated functional connectivity of the fronto-limbic network. Another important finding in relation to the vmPFC was that its functional connectivity with the IFG/FP differed between the condition in which both musical structures were intact and the condition in which both were disrupted, although the BOLD activation levels were similar. We interpret this finding as suggesting that the vmPFC interacts with both emotional processing (presumably in limbic areas) and cognitive processing (presumably in the IFG/FP) related to musical structures. The following findings support this notion: (1) The vmPFC has mostly been found to be involved in higher-order cognitive processing of emotional information. For instance, the vmPFC was found to be necessary in reward evaluation in decision-making 64,65 , nullifying learned conditioning 66,67 , emotional judgment of affective pictures 68 , and intensely pleasant emotions induced by music 4 . It has also been suggested that the vmPFC is involved in modulating autonomic processes 69 , which accompany emotional responses 70,71 . (2) The right IFG has been implicated in processing a series of chords 13 , melodies 72 , or even a periodic loop of random tones 73 , suggesting that the IFG is highly relevant for extracting regularity from sequential auditory inputs and forming expectations 74 . (3) Anatomically, in non-human primate models, direct connections between the frontal operculum and the basoventral PFC were found using a tracing technique 75 , suggesting a close relationship between the vmPFC and IFG. (4) The functional connectivity between the vmPFC and right IFG has been implicated in studies where emotional regulation is crucial. In a human study 64 , the functional connectivity between the vmPFC and IFG correlated with the performance level of a self-control task in relation to successfully inhibiting emotional responses.
In another human study 76 , an unstable interaction between the vmPFC and IFG was reported in patients with anxiety disorder, which suggests that the IFG delivers higher-order sensory information to the vmPFC so that the vmPFC can modulate the activity of the limbic system. That is, the vmPFC seems to work as a pivotal point that mediates between the limbic system and the frontal cortex in the regulation of emotion.
Taken together, it seems plausible that the vmPFC was engaged in cognitive processes that modulate emotional responses by differentially communicating with the right IFG and FP when listening to musical excerpts with varying musical structures.
Technical limitation. It should be noted that our manipulation in the temporal domain has some limitations. In this study, we focused on sequential order among the various temporal organisations of music; reversal was therefore one possible choice. In a study investigating phoneme encoding in EEG signals 77 , a similar manipulation (i.e., reversed speech) was used to disrupt the intelligibility of speech while preserving overall acoustic structures. However, in our experiments, reversing entire waveforms also altered musical timbre but did not alter beats or the temporal intervals between notes, and it remains unclear how these factors interact with sequential order. Thus, more sophisticated temporal manipulations such as local reversal 78 or a quilting algorithm 79 should be considered in future studies for more precise control of temporal structures.

Conclusion
In the current study, we found a significant interaction between disruptions in the spectral and temporal structures of music in the brain activity of the fronto-limbic network. In particular, the vmPFC exhibited distinctive functional connectivity with the right IFG in response to altered spectral and temporal organisation of music. This is indicative of cognitive involvement in emotional processes, with the vmPFC as a pivotal node of a functional network mediating the integration of cognition and emotion during music listening.

Data availability
The datasets analysed in the current study are available upon reasonable request to the corresponding author.

Table 4. Statistics of subjective ratings of pleasantness. Abbreviation: Std., standard deviation.