Introduction

The appreciation of a melody is deeply dependent on the musical tempo i.e., the time with which each single note is played. If this tempo changes, the melody itself changes. Appreciating and playing music requires the very rapid processing of multiple durations. How the brain encodes and reads out all these different durations remains unclear. A growing body of evidence highlights the contribution of many different brain regions, including cortical (inferior parietal, premotor and prefrontal cortices) and subcortical brain structures to temporal computations. However, how these different “time-related” regions communicate with each other and how they contribute to time processing and perception remains largely unknown1,2.

In a recent fMRI study3, we measured human brain activity at high spatial precision (using a 7 Tesla magnet) while asking participants to discriminate visual stimuli of different durations. In different experimental trials participants were asked to discriminate pairs of durations belonging to different ranges (i.e., from 0.2 to 1 s). The aim of the study was to measure brain responses at the offset of the first stimulus of each pair (i.e., the end of a “purely” encoding stage of the task) to assess the existence of chronomaps i.e., neuronal units selective to different durations and orderly mapped in contiguous portions of the cortical surface. Results revealed the existence in the medial premotor cortex (i.e., Supplementary Motor Area, SMA) and intraparietal sulcus (IPS) of neural units displaying selective responses to stimulus duration (i.e., duration selective clusters of voxels). In these clusters, the hemodynamic response was enhanced by durations in close temporal proximity to the preferred durations and suppressed by durations far from the preferred one. However, only in SMA duration selectivity was organized in topographic maps (i.e., chronomaps). The maps were topographic in that voxels with similar response preference were spatially contiguous. Topographic maps in SMA were tested with a broader range of durations (from to 0.2 to 3 s), however in line with the hypothesis of distinct mechanisms for durations above and below the second4, we found a better spatial progression for maps below the second range3. For this reason, in this work our focus is on the millisecond range.

In humans, topographic maps of milliseconds stimulus duration and event frequency have been recently described not in a single brain region, but in ten different cortical locations, from lateral occipital to inferior parietal and premotor regions5,6.

The existence of duration preferences and topography is an intriguing finding, because it suggests that the brain has a seemingly categorical representation of stimulus duration. Additionally, the presence of duration tuning in several cortical areas suggests that this general mechanism of representation, that could show different properties5,6 and serve different purposes in the cortical hierarchy, is redundant across the brain. To shed light on the tuning mechanism itself and on its redundancy, we decided to use part of the same set of high-resolution fMRI data we used to uncover SMA chronomaps, to answer two additional questions. First, how widespread duration preferences are in the human brain? Duration and temporal frequency preferences have been recently described in a wide network of brain regions from occipital to frontal cortices4; however, it is unclear whether these preferences appear with the simple appreciation of these stimulus features (i.e., passive viewing) or they are also present when a temporal discrimination task is administered5. Second, and most important, how do duration-selective units communicate and coordinate with each other? And by answering this last question, our goal was to gain insights on the functional role of the different “time sensitive” brain areas.

In this study, we focused on the brain regions that in the original work3 were significantly active at the offset of the different encoded durations. These areas were: cerebellum, primary visual cortex (V1), IPS, SMA and Inferior Frontal Gyrus (IFG). In these locations we first assessed the existence of duration preferences and then we checked the connectivity structure between them and between their four “duration channels” (i.e., duration selective clusters of voxels). To this last purpose, we measured effective connectivity with Dynamic Causal Modelling, (DCM)7,8. We chose DCM because it is particularly appropriate for explaining brain responses that are the consequence of specific experimental manipulations. With DCM we tested various causal models of the interactions between the different network nodes and our experimental conditions (i.e., the different stimulus durations) entered the models as inputs that either drove local responses or changed connection strengths. Importantly, DCM allowed us to make inferences about the directionality of the connectivity between the different brain areas7.

Results

In the first fMRI experiment of the original work (Exp.1)3, we measured brain activity while asking participants (N = 11) to decide whether the second stimulus (S2) of a pair, was longer or shorter than the first one (S1, see Fig. 1a). S1 could be equal to one of four different durations (0.2, 0.4, 0.6 and 1.0 s) and S2 shorter or longer by a Weber fraction of 0.5. Stimuli were visual gratings (i.e., Gabor patches) varying in both orientation and duration (Fig. 1a). Orientation changes were task irrelevant (see Materials and Methods section for details).

Figure 1
figure 1

Stimulus sequence, behavioral results, and fMRI group results. (a) Schematic representation of the stimulus sequence. In each trial a standard (S1) and a comparison duration (S2) were presented. S1 could be one of four different durations (0.2, 0.4, 0.6, and 1 s) while S2 was either shorter or longer by a Weber fraction of 0.5. Stimuli were sinusoidal Gabor patches varying in orientation. Orientation changes were task irrelevant. Participants were asked, by pressing one of two response keys, to judge whether the duration of S2 was shorter or longer than S1. Inter-stimulus interval (ISI) and inter-trial interval (ITI) varied from 4 to 5.2 s with steps of 0.08 s. Response (RES) period was hold fixed at 2 s. (b) Group average (N = 11) of percentage of accuracy plotted separately for each of the four durations and as a mean of them (grey rightmost bar). The grey dots represent the different subjects. Error bars are standard errors. (c) Activations correlated with the offset of the four different S1 durations (p < 0.05, FWE cluster level corrected for multiple comparisons across the whole brain). The significant clusters are overlaid on a structural image normalized to the common DARTEL space (see Supplementary Table 1 for more details).

Behavioral data show that participants performed well in all four tested durations (see Fig. 1b). Proportion of correct responses were 85.1 ± 7.1 (mean ± standard deviation), 87.0 ± 4.9, 91.5 ± 5.4 and 90.6 ± 4.1% for each S1 duration condition (i.e., 0.2, 0.4, 0.6, and 1.0 s) respectively. Overall accuracy was 88.6 ± 3.7%. A one-way repeated measures ANOVA with within-subject factor of S1 durations showed a significant main effect (F3,30 = 4.824, p < 0.05) suggesting the presence of a difference in performance across the four durations. However, pair-wise post-hoc tests showed no significant difference between the different combinations of S1 durations (all p’s > 0.05, Bonferroni-corrected for multiple comparisons).

As in the original experiment, also here we focused on brain responses associated with the encoding of the first stimulus of the pair (S1) and specifically the offset of it. We decided to focus on the stimulus offset because this was the moment when the duration of the stimulus became available to the participant. We employed a mass-univariate General Linear Model (GLM) approach at both single subject and group level and used separate regressors for each of the four different S1 durations. These regressors of interest were convolved with the canonical hemodynamic response function (HRF, see Methods section for more details about the modelling of regressors of no interest). We first identified the brain regions significantly associated with the presentation of the four S1 durations together (see Fig. 1c and Supplementary Table 1) and then we searched for brain areas that were maximally responsive to each of the four S1 durations separately. For this purpose, we performed a winner-take all classification on the four t-statistics maps, one for each S1 duration, that were generated at both single subjects and group level (see “Materials and Methods” section for more details on the GLM analysis). The brain areas responsive to the four S1 durations were Cerebellum (Vermis VI, left lobule VI, right lobule V), left and right primary visual cortex (V1), left Intraparietal Sulcus (IPS), left Supplementary Motor Area (SMA), and left Inferior Frontal Gyrus (IFG, see Fig. 1c and Supplementary Table 1).

To the purpose of investigating the existence of duration preferences in a wide brain network, we first checked whether each duration selective cluster of the five areas (i.e., from now on called Regions of Interest -ROIs) showed a form of “duration tuning”. From the original work, we knew that both IPS and SMA exhibited duration tuning. However, since at group level, cerebellum, V1 and IFG did not show any sign of topographical arrangement of duration selective clusters, duration tuning was not previously assessed in these three regions.

To examine response tuning, for each duration selective cluster of voxels within each ROI, we looked at the changes of the hemodynamic response for preferred and non-preferred durations (2nd brain volume after S1 offset). For each individual subject to avoid circularity, the selection of the duration selective clusters was based on half of the available runs (9 out of 18) and the hemodynamic response extraction was computed on the remaining runs in 18 possible combinations (see “Materials and Methods” for details). To quantify the changes of the BOLD response to the different durations offset in the different ROIs, we performed a linear mixed model on the normalized BOLD responses of each of the 18 possible iterations of the cross-validation procedure (BOLD ~ duration selective cluster * presented duration * region of interest + (1|iteration)). The ANOVA ran on the model’s fit with the Satterthwaite's method showed a significant main effect of ROI (F4,15 = 13, p < 0.001) because the BOLD response in SMA was greater than in all other ROIs (see Supplementary Fig. 2a). As shown in Fig. 2 a significant triple interaction (ROI, duration selective cluster and presented duration F36,15 = 2.54 p < 0.001) showed that in the majority of the ROIs, in the different duration selective clusters (i.e., colored lines), the hemodynamic activity across the runs was modulated by the four S1 durations (in the x axis). Specifically, the hemodynamic response in most of ROIs peaks at the preferred duration (filled diamonds) and slowly decays for durations distant from the preferred one (open diamonds). The shaded area in the plots indicates the distribution of the combinations of runs averaged across subjects, the colored line is the average BOLD response across subjects and iterations in the different duration selective clusters. These results were also confirmed by a significant two-ways interaction between duration selective cluster and presented duration (F9,15 = 42.02 p < 0.001, see Supplementary Fig. 1b), showing that independently from the ROIs, the BOLD response in the different duration selective clusters was modulated by the presented durations i.e., activity peaked at the preferred duration and decayed with distance from it. The decay of the hemodynamic response for durations temporally distant (i.e., non-preferred) from the preferred ones is the most interesting results here, indexing the presence of duration tuning.

Figure 2
figure 2

Cross-validated duration tuning across multiple brain areas. In the top row example of volumetric masks in a single subject. The mask is overlaid on the subject’s native space. For each ROI (i.e., each column of the figure) and each duration selective cluster (i.e., the colored line in the subplot), the group average normalized BOLD responses (y-axis) for the different presented S1 durations (x-axis) is shown. The BOLD signal in the duration selective clusters is aligned to the presentation timings of the different S1 durations (i.e., 2nd volume after S1 offset). The filled symbols represent the point in time where the hemodynamic response of duration selective clusters matched the presentation timing of the appropriate duration (e.g., red-labeled voxels when the shortest S1 duration is presented). Normalization was performed first in each individual subject to the mean signal intensity across 9 fMRI runs and then for each duration selective cluster to the signal associated to the preferred duration. The colored lines are averaged BOLD responses across subjects and iterations. The grey area is the BOLD distribution across iterations (averaged across subjects and runs); black bars are standard errors. Legend: CRB cerebellum, V1 primary visual cortex 1, IPS intraparietal sulcus, SMA supplementary motor area, IFG inferior frontal gyrus.

The distribution and average cluster size of the 4 durations selective cluster of voxels in the 5 ROIs is shown in Supplementary Fig. 3. Whereas the proportion of duration selective voxels with respect to the total number of active voxels in the ROI is reported in Supplementary Fig. 4. In none of the ROIs there was a significant difference between the proportions of duration selective clusters of voxels (repeated-measures ANOVA all F3,15 <  = 2.07 p > 0.1).

Once we found duration tuning in all five brain areas, we moved to the investigation of the effective connectivity between these areas and their four “duration channels”.

We measured effective connectivity with Dynamic Causal Modelling (DCM). The main objective of DCM is to estimate the coupling parameters of a hypothetical neuronal model and evaluate how well the modeled BOLD signal approximate the observed BOLD response. These coupling parameters concern different aspects of connectivity: the network architecture (i.e., A-matrix) and the influence of a given stimulus (e.g., a visual stimulus lasting 0.2 s) on both the network connectivity (i.e., B-matrix) and the activity of the network nodes (i.e., C-matrix, see “Materials and Methods” for details of the DCM equation used). Here, we must stress that the number of possible combinations of these three different aspects of connectivity (i.e., A–B–C matrices) grows exponentially as the number of brain regions included in a system grows. For this reason, rather than exploring the effective connectivity in a 20-nodes network (i.e., the five ROIs by the four “duration channels”), a network for which the number of possible models to test would be exceedingly high, we decided to start with the investigation of the effective connectivity in a simpler 5-nodes network (i.e., the five ROIs without considering the “duration channels”). We also decided to break the whole analysis process in two main stages.

The first set of analyses (for a summary of all performed DCMs see Supplementary Table 2) concerned this 5-nodes network. Our aim here was to explore within this simpler network two features of network connectivity: the network architecture (i.e., how the different network nodes communicate and coordinate with each other, A-matrix) and the influence of a certain stimulus duration on the activity of any node of the network (C-matrix). Once these two features were identified, we moved to the second analysis stage, and we used these features to inform and therefore simplify the DCM on the 20-nodes network.

Since we did not have a clear hypothesis about the network architecture of the five duration selective brain regions, to identify the connectivity structure of this network we employed a “data-driven” approach, called Parametric Empirical Bayesian method (PEB). Briefly, PEB seeks the optimal model by testing various combinations of A–B–C-parameters through a pruning process. PEB starts with a fully connected and a fully modulated model and then proceeds by switching off certain model’s parameters (e.g., connections) and measures how this affects the model evidence. For PEB analysis we first specified a fully connected model between the five ROIs and we ran the DCM routine of 198 models (a model for each fMRI run and each subject i.e., 11 subjects by 18 runs, see “Material and Methods” for more details). Models were treated as random effects that could differ between subjects. Posterior probability (Pp) for second-level effect was set to Pp greater than 0.99. According to PEB, the optimal network architecture (log Free -energy value was equal to 8.23) was the one in which the cerebellum is the only brain area connected with all other network nodes (See Fig. 3a). Cerebellum has indeed both feedforward and feedback connections to and from V1 and IFG, and only feedforward connections to SMA and IPS. V1 and IPS have feedforward connections to SMA. IFG has feedback connections from both cerebellum and V1. SMA is the only node of the network that while being affected by the activity of other brain regions, its activity does not affect any other network node.

Figure 3
figure 3

Five-nodes network architecture resulting from Parametric Empirical Bayesian (PEB) analysis. On the left-hand side is a graph representation of the network architecture (A-matrix) as resulted from the 5-nodes PEB analysis. The directionality of connections is depicted with arrows. On the right-hand side is the matrix of the strength of the survived connections after the pruning of the parameters done by PEB. Values are in Hz (see colorbar). Legend: CRB cerebellum, V1 primary visual cortex, IPS intraparietal sulcus, SMA supplementary motor area, IFG inferior frontal gyrus.

After the identification of the optimal model across subjects (Fig. 3a, left), we proceeded with Bayesian Model Averaging (BMA, i.e., averaging the parameters across the “winning” models of all subjects), to determine the parameters strength of this connectivity structure (Fig. 3a, right). The values in the matrix depicted in Fig. 3, reflect the strength with which the activation changes in one node affect the interconnected nodes. An identical PEB analysis was conducted using a cross-validation approach in which for each subject we have used half of the runs (from run 1 to run 9) to define the ROIs and the other half (from run 10 to run 18) to run the PEB. The results of this second PEB (shown in Supplementary Fig. 5) are similar to the original ones, apart from a new feedforward connection between V1 and IPS and between IFG and SMA and the loss of a feedback connection from IFG to V1 and of a feedforward connection between IPS and SMA. The similarity of the two winning models is also confirmed by the statistically significant correlation of the Bayesian Model Averaging (BMA) matrices between cross-validated and original PEB (R = 0.8 p < 0.001, Pearson's correlation).

To check the robustness of the original PEB connectivity results, we ran two additional DCM analyses with two distinct connectivity structures (A-matrix): a “fully connected” and a “PEB-like” version of it. In both DCM models, B and C matrices were left fully modulated. We then compared the two models by using Bayesian Model Selection (BMS)9 and we computed their protected exceedance probabilities (PEPs). We considered mandatory for a “winning” model to have PEPs greater than 90%. As shown in Supplementary Fig. 6a, the winning model is the one, whose architecture is “PEB-like”. In the same vein, we compared the DCM analysis with an A-matrix “fully connected” with a new DCM analysis with an A matrix “PEB cross-validated -like”. In both DCM models, B and C matrices were left fully modulated. Also here, we compared the two models by using Bayesian Model Selection (BMS)9 and we computed their protected exceedance probabilities (PEPs, plotted in Supplementary Fig. 6b). As in the previous models’ comparison, also here the winning model has a PEB-like structure. We should point out here that the Free energy value associated to the DCM with an A-matrix PEB-like is nevertheless higher (− 2.5802e+03) than the Free energy value associated to the DCM with an A matrix cross-validated PEB-like (− 2.585e+03). These last results confirm and validate the results obtained with the initial PEB analysis (Fig. 3).

After knowing the connectivity structure of a 5-nodes network, we proceed by finding out which region(s) among the five ROIs, was modulated by stimulus presentation (C-matrix). Fifteen possible models with 15 different combinations of C-matrices were tested. In the C-matrices of these 15 models, we arbitrarily allowed to have one or maximum two brain regions modulated by event duration. All models had an A-matrix “PEB-like” and a B-matrix in which all existing connections were modulated by stimulus presentation. Thus 15 possible model variations (i.e., nested models) were constructed using Bayesian Model Reduction (BMR). Bayesian Model Selection (BMS) showed that the most likely model was the one with a C-matrix in which IPS was the area modulated by duration presentation (Supplementary Fig. 6c).

Once identified the best network architecture (A-matrix) between the five ROIs and established that the IPS is the brain area mainly affected by stimulus presentation (C-matrix), we used this knowledge to inform the DCM analysis on the 20-nodes network. In this last analysis we tested specific hypotheses of connectivity between the five ROIs and their four duration selective clusters. These hypotheses concerned the existence of connections (A-matrix “PEB-like”) and the modulation exerted by stimulus duration on both connectivity strength (B-matrix) and IPS activity (C-matrix). Specifically, we hypothesize these connections and modulations to be: (1) stimulus duration independent, (2) only partially stimulus duration dependent i.e., neighboring dependent or (3) totally stimulus duration dependent (see Fig. 4).

Figure 4
figure 4

Graph representation of the 3 hypotheses tested in the 20-nodes DCM. The hypotheses concern brain architecture (A-matrix), connectivity strength (B-matrix) and driving input in IPS (C-matrix). We hypothesized these DCM parameters to be either duration independent, neighboring dependent or duration dependent. To help understanding the hypotheses, in this figure the duration input is a 0.4 s stimulus. For A-matrix, duration independent means that all duration-specific clusters within each node connect to all the clusters (preferred duration, PD ± 2) of a different node. Neighboring dependent means that each duration specific cluster within a node connects to the same duration specific cluster and its immediate neighbors (PD ± 1) in another node. Duration dependent means that each duration specific cluster within a node connects to the same duration specific cluster in any other node (PD). The same hypotheses apply to the modulatory effect that stimulus presentation has on network connectivity (B-matrix) and on the activity of the duration selective clusters within IPS (C-matrix). Within each ROIs the duration selective clusters are color-coded: red = 0.2, orange = 0.4, yellow = 0.6, and green = 1 s. CRB cerebellum, V1 primary visual cortex, IPS intraparietal sulcus, SMA supplementary motor area, IFG inferior frontal gyrus.

We tested 108 possible models which were the combination of these three possibilities (i.e., duration independent, neighboring dependent, duration dependent) applied to the A–B–C matrices (see Materials and Methods section for more details). In other words, we tested models with an A-matrix “PEB-like” and where the C-matrix concerned the impact of S1 stimulus duration on the activity of the four duration selective clusters within the IPS.

Although our main interest was to check connections (A-matrix) and modulations (B-matrix) between the different duration-selective clusters across the five ROIs, for completeness we also assessed these connections and modulations within a given ROI.

For the calculation of the A–B–C-parameters of these 108 models we used the Bayesian Model Reduction (BMR) procedure. To identify the winning model, we first divided the 108 models into three main families (36 models each) based on the modulatory effect of S1 stimulus duration on the activity of the IPS duration selective clusters (i.e., stimulus duration independent, neighboring dependent, stimulus duration dependent, see Materials and Methods for more details) The PEPs of each family of models was then estimated using BMS where models were treated as random effects10,11 (see Supplementary Fig. 7a for PEPs of the tested models).

The winning family was the one in which the presentation of a stimulus duration modulates the IPS activity of the duration selective clusters in a duration dependent manner (Fig. 5, right panel, see Supplementary Fig. 7a). This means, that the presentation of a 0.4 s stimulus, for example, modulates the activity of the IPS cluster that is selective for that duration only.

Figure 5
figure 5

Winning model of the DCM 20-nodes network. Graph representation of the winning model. On the left-hand side is the A-matrix, on the right-hand side the B and C matrices. In the winning model of the 20-nodes DCM conduced on the entire dataset the network architecture (A-matrix) is fully connected while both connectivity strength (B-matrix) and driving input in IPS (C-matrix) are modulated in a duration dependent fashion.

We then determined which of the 36 models within the winning family of models (where activity of IPS was modulated in a duration dependent fashion), best explained our data. We applied Bayesian Model Comparison (BMC)10, to compare models differing in connectivity (A-matrix) and modulation of connectivity by stimulus duration (B-matrix) according to the different levels of duration specificity (see Fig. 4 for the different tested hypotheses). A representation of the winning model is provided in Fig. 5 (for PEPs of the tested models see Supplementary Fig. 7b). In this model all hypothesized connections resulting from the PEB analysis are present (A-matrix “PEB-like” fully connected, see Figs. 3a and 5 left panel). However, when a specific duration is presented, e.g., 0.4 s, there is a change in the connectivity that is duration dependent, i.e., it affects the connectivity between the 0.4 duration selective clusters of the functionally distinct brain areas (e.g., the 0.4 cluster in cerebellum and V1 see Fig. 5 right panel). Within each brain region though, this modulation is duration independent, for example the activity of all duration selective clusters within the cerebellum are influenced by the presentation of a 0.4 s duration. Finally, to check the parameters of the winning model across subjects, we use Bayesian Parameter Averaging (BPA) to average the A–B–C-parameters of the winning model across subjects12 (see Supplementary Fig. 8).

Since the BOLD response is highly correlated among brain regions that are responsive to the same trial event, like for example the clusters of voxels preferring the same S1 duration, one can argue that the duration specific modulation in connection strength can be due to this high correlation between duration selective clusters. To address this criticism, we used an identical pipeline to run a cross-validated 20-nodes DCM where we used half of the runs to define the duration selective clusters and the other half to run the DCM (see “Materials and Methods”). The winning cross-validated model shown in Supplementary Fig. 9 has the same C-matrix, as the original DCM i.e., the activity of the IPS is modulated by stimulus presentation in a duration dependent fashion. Whereas both the A and B matrices are partially stimulus-duration dependent (“neighboring” specific). This means that when a given stimulus duration is presented (0.4 s for example) only the neighboring clusters of voxels (0.2 s and 0.6 s) are connected (A-matrix) and only those clusters are modulated by stimulus presentation (B-matrix). If one looks at the correlations of the Bayesian Model Averaging (BMA) between cross-validated and original winning model (see Supplementary Fig. 10), one can see that A and C matrices are significantly correlated (R = 0.45 p < 0.001; R = 0.91 p < 0.08; Pearson correlation) whereas for the B-matrix two correlations are significant (i.e., the ones concerning 0.4 s and 0.6 s, R = 0.11 p < 0.03; R = 0.1 p < 0.05; Pearson correlation) and two others are not (0.2 s, R = 0.07 p < 0.16; 1 s, R = 0.04 p < 0.45; Pearson correlation). We believe these results non-trivial considering that the cross-validated model is ran on just half of the available data. Overall, we believe this last result an indication that the connectivity between the 20-nodes has a certain degree of specificity and it is not totally duration independent.

At this point we also conducted a graph analysis by using the original estimated effective connectivity A-matrix of the 20-nodes DCM and put it in graph space (see “Material and Methods” for more details). The results of this analysis are displayed in Supplementary Fig. 11. In accordance with the PEB analysis we found that Cerebellum among the different nodes, is the one that shows the greatest value in “betweenness centrality”, an index of the influence a node has over the flow of information in a graph. And cerebellum, together with V1, it is the node with the highest “out-degree” values, an index of the number of connections that leaves that node. SMA on the other hand, is the node with the highest value of “in-degree” a measure of the number of connections arriving at a given node. The graph results seem therefore consistent with the network architecture revealed by the PEB analysis.

Discussion

In this study we used fMRI at ultra-high field to show the existence in a wide circuit of brain areas, including cerebellum, primary visual cortex, intraparietal sulcus, supplementary motor area and inferior frontal gyrus, of neuronal units (clusters of voxels) displaying duration preferences. In each duration selective cluster of voxels, the signal amplitude decreased for durations distant from the preferred one. When we looked at the connectivity architecture of these five functionally distinct brain areas (PEB analysis on a 5-nodes network), we found that the optimal model is the one in which cerebellum has feedback and/or feedforward connections from and to all other network nodes. Whereas SMA is the only area that, while being modulated by the activity of cerebellum, IPS and V1, does not influence the activity of any other brain region. Moreover, we found that within this network, the brain area greatly affected by stimulus presentation was the IPS. We then explored the effective connectivity between these five regions and their four duration selective clusters of voxels. The DCM analyses of this 20-nodes network, showed that the connectivity strength of the duration selective clusters, across functionally distinct brain regions, was modulated in both a duration dependent and neighboring dependent fashion. In a similar vein, the modulation of stimulus presentation on the activity of the four IPS duration selective clusters was duration specific.

The existence of duration tuning across different brain regions is an intriguing and novel finding. Electrophysiological studies in animals, have shown the existence in numerous brain areas of cells selectively responsive to duration and intervals of the order of tens-to-hundreds of milliseconds13,14,15. However, only very recently duration preferences have been described in the human brain3,5. In a recent fMRI study, for example, Harvey and colleagues showed the existence of a topographic representation of event duration and frequency in ten cortical locations along a functional hierarchy that goes from occipital to frontal regions5. Maps were defined as changes in the response preference of voxels to duration and frequency as a function of cortical distance i.e., voxels with similar response preference were spatially contiguous. However, in this study, differently from the current work, for most of the trials, subjects were not judging the duration of the stimuli but they were asked instead to detect a color change in them. It is thus likely that any difference between the network described here, and the frequency and duration maps identified by Harvey and colleagues is due to the presence/absence of a duration task.

The representation of stimulus durations in a map, like the representation of other sensory features, for example shape, speed, pitch, very likely serves the purpose of improving the identification and the discriminability of different durations16,17,18. This may be very important in situations where a rapid and accurate processing of a multitude of different durations is needed, for instance, in playing music or understanding language. How these results in the visual modality can be extended to a network driven by a different sensory input, for example an auditory duration, is far from clear. However, since some of the nodes in this duration-selective network are high-level associative, premotor and prefrontal areas, we believe it is likely that these areas would be engaged independently from the sensory modality of the input. This hypothesis is also supported by a recent study from our group in which using a temporal reproduction task of auditory stimuli, we were able to observe duration preferences in both SMA and IPS19.

Why there are multiple duration maps across brain areas is unclear. A possibility is that despite this similar representation of time, cerebellum, V1, IPS, SMA and IFG play a different role in duration encoding and perception.

In the direction of finding out the role played by these different brain regions in duration perception, the connectivity results are of a great help. PEB results indeed seem to suggest the existence of specific functional relationships between the network of areas active at duration encoding. Specifically, the cerebellum seems to work as network hub. With its feed-forward and feedback connections to and from all the other network nodes, the cerebellum seems in charge of both controlling the flow of information and updating the state of the network. Interestingly, the only bidirectional connections that the cerebellum has, are with V1 and IFG, i.e., the two edges of a hypothetical functional hierarchy. V1 is probably the sensory input area, the area conveying the temporal signal embedded in the visual input. Whereas the IFG could be the area where encoding-related decisions takes place, as previous studies have shown20. Both V1 and IFG receive and send information to cerebellum, but the direct connection between IFG and V1 is unidirectional only i.e., IFG modulates V1 activity and not vice versa.

Cerebellum also has feedforward connections to IPS and SMA and these connections are unidirectional. This last result seems to suggest once again a key role of the cerebellum in mediating the temporal encoding process. A possibility is that the cerebellum mediates the flow of information between input and output regions and in doing so talks to core timing structure such as IPS and SMA. Since IPS is the only brain area whose activity is greatly affected by duration offset, it can be considered the place where a first reading-out of temporal signals occurs (i.e., “duration input” area).

Why is the IPS the only duration input region? We can only speculate that this is because the IPS is the first or one of the first areas in this hypothetical functional hierarchy where the low-level information conveyed by the visual cortex is decoded and recognized. A very recent fMRI study has indeed shown that in the cortical hierarchy of duration and frequency maps there is a transition in brain responses to stimulus duration that goes from monotonically increasing in early visual cortex to duration-tuned neural responses in area MT/V5, inferior parietal and frontal cortices6. Inferior parietal cortex, being one of the first areas in this cortical hierarchy showing a tuning-like response, can be then the first stage of duration recognition. The link between inferior parietal lobule (IPL) and duration perception is shown in TMS studies where IPL stimulation leads to impairments in duration encoding21 and discrimination22,23. And it is in line with a plethora of studies showing a correlation between IPL activity and temporal computations and perception23. Hayashi and colleagues for example were able to induce in IPL neural adaptation to stimulus duration24, to decode stimulus duration from its activity25 and to show a link between IPL activity and duration perception26.

SMA instead, being modulated by the activity of both IPS and V1, but not modulating the activity of any other network node, could be considered the final stage of duration recognition. At this point it is worth emphasizing here that, DCM captures functional connectivity, namely how the activity changes in one brain region are related to the hemodynamic changes in others. Functional connectivity is independent from anatomical connectivity, for example the connection we describe between V1 and SMA does not have an anatomical counterpart i.e., there is no anatomical connection between V1 and SMA.

In humans, the role of both SMA and inferior parietal lobule in temporal perception has been extensively documented23. Both areas have been implicated in a variety of timing tasks22,27 with a range of durations spanning from a few hundreds of milliseconds to a few seconds28,29 and with stimuli of different sensory modalities30,31. It is therefore likely that both areas constitute a core of the timing network. The functional link between SMA and inferior parietal lobule activity has been recently shown by mean of Transcranial Magnetic Stimulation (TMS) combined with electroencephalography32. In that study, the perturbation of the right supramarginal gyrus (SMG) just before the encoding of a stimulus duration, affects activity in frontocentral sites including SMA. These effects correlate with changes in duration perception, suggesting thus a direct link of SMG and SMA connections with duration perception. Also, cerebellum has been proved to play a key role in motor and perceptual timing tasks of discrete isolated intervals33,34,35. Specifically, the cerebellar lobule V and VI, where we observed activity correlated to duration encoding, have been previously associated to interval timing36,37, motor and working memory tasks38. These lobules, according to resting-state connectivity maps, are part of somatomotor, ventral attention and frontoparietal networks39. The computational mechanism through which cerebellum contributes to millisecond time perception is still a matter of debate. Our data here seem to suggest a role of hub of the cerebellum, coordinating the input (V1) and output (IFG) regions by talking with core timing areas (SMA and IPS).

When we considered the five network nodes and their four durations selective clusters, we found that at stimulus offset, the connectivity pattern described by the PEB analysis was either full, meaning that all duration selective clusters were connected, or neighboring, meaning that only the clusters selective for the input and the neighboring duration were connected. Both these patterns were modulated across regions in a duration dependent and neighboring dependent manner. In a similar vein, also IPS activity was affected by S1 duration offset in a duration specific fashion. All these results are compatible with the presence of tuning in all these five brain areas. When a given duration is presented, it makes sense that across the five nodes, only the communication across appropriate duration selective or neighboring selective channels become particularly “active”.

Tuning preferences are centered around a single value but with a certain variability, meaning that the same neural population respond preferentially to certain stimulus duration and to a less extent to similar ones. Form the connectivity point of view it therefore makes sense that also the communication between distinct areas is not completely duration specific but involves clusters of voxels tuned to durations similar to the preferred one.

Why should the connectivity strength be modulated with a certain degree of duration specificity? This result likely serves the efficiency of the neural communication18. A specific communication indeed, enables information entering at the input level to be reproduced faithfully at the output level. And having a more efficient communication saves metabolic resources and time18,40.

Due to the limited spatial resolution of fMRI, it is impossible to draw a direct link between the duration preferences of the clusters of voxels in humans, with the duration tuned cells in animals1,41. It is indeed difficult to establish whether the changes detected at voxel level reflect exclusively the activity of different populations of neurons tuned to different durations or whether it is the same population whose neuronal dynamics change depending on the presented duration1,42. An interesting hypothesis supported by a growing body of neurophysiological data in monkeys is that the capacity of producing highly timed motor responses i.e., single intervals42,43 or temporal patterns44,45 in monkey premotor cortex is associated with changes in the population dynamics. These changes take the form of “a scaling” of the population neuronal responses or a change in the amplitude of neural state trajectories depending on the temporal interval reproduced. These data are intriguing and somehow fit with the findings that also human brain responses to stimulus duration are different depending on the stimulus duration at hand. Although, the existence of topography seems more linked to the idea of tuning responses, we do not know whether temporal scaling co-exist with tuning mechanisms. Another possibility is that distinct brain regions according to their functional role through the stages necessary to solve the temporal task, use distinct mechanisms. For example, it has been recently shown that the early cortical stages of visual processing (V1, V2, V3a) stimulus duration leads to a sub additive increase of BOLD response as a function of stimulus duration46 and that there is a gradual transition from monotonically increasing to timing-tuned neural responses beginning in the medial temporal area6 (MT/V5). These results seem to suggest the possibility of a functional hierarchy in time processing where low-level sensory regions monotonically responding to stimulus duration represent an early stage of temporal processing where temporal info is collected and perhaps integrated.

Finally, we would like to emphasize that DCM provides the best possible model given a set of data, by changing the data set, the model evidence could change. And DCM indeed is a method particularly suitable to explore effective connectivity potentially induced by an experimental manipulation, if this manipulation (i.e., task) changes, the connectivity structure may change. This caveat should always be considered when it comes to the generalizability of the findings.

Overall, what we can conclude based on the results presented in this work is that time perception and encoding is the result of the activity of a network of brain areas, and it supported by duration preference mechanisms at both representation and connectivity level.

Materials and methods

The data of this study are partially shared with two published studies3,25.

Participants

We tested eleven healthy volunteers (5 females, mean age 23.7 years, SD 4.3 years) with normal or corrected-to-normal vision. All volunteers gave written informed consent to participate in the study. The experimental protocol was approved by the ethics committee of the Faculty of Biology and Medicine at the University Hospital of Lausanne (protocol number 92/2012) in accordance with the Declaration of Helsinki.

Stimuli and Procedure

The volunteers performed a temporal discrimination task of visual durations. Visual stimuli were sinusoidal Gabor patches (100% contrast, spatial frequency of 1.9 cycles/degree, Gaussian envelope with standard deviation of 2.2°, diameter of ~ 9°) with a central circular hole (diameter 0.6°) displayed at the center of the screen around a central fixation point (a black disk 0.5° of diameter at a viewing distance of 90 cm) on a grey background.

In each trial, two Gabor patches (S1 and S2) were sequentially presented at a variable inter-stimulus-interval ranging between 4 and 5.2 s (in 0.08 s steps). The two stimuli were followed by a response cue (a red fixation point) of 2 s duration (see Fig. 1a). S1 and S2 varied in both orientation and duration but only duration was task relevant. The duration of S1 could be 0.2, 0.4, 0.6, and 1 s and its orientation 36, 72, 108, and 144 degrees. S2 could be either shorter or longer than S1 by a constant Weber fraction of 0.5. The orientation of S2 was a value randomly chosen from the four possible orientations used for S1. Participants were asked to decide whether the duration of S2 was shorter or longer than S1 by pressing one of two buttons on a response-pad. The combination of different durations and orientations lead to 16 different trial types. Each S1 stimulus type was presented only once in each fMRI run. A total of 18 fMRI runs were collected in two separate sessions (9 runs per session with 1–3 days of latency). The total duration of each run was 3 min and 51 s.

Behavioral data analysis

A one-way repeated measures ANOVA was carried out on percentage accuracy values of each individual subject. The alpha level was set to 0.05, while Bonferroni test was used as post-hoc test.

MRI acquisition

Blood oxygenation level-dependent (BOLD) functional imaging was performed using an actively shielded, head-only 7T MRI scanner (Siemens, Germany), equipped with a head gradient-insert (AC84, 80 mT/m max gradient strength; 350 mT/m/s slew rate) and 32-channel receive coil with a tight transmit sleeve (Nova Medical, Massachusetts, USA). The ultra-high magnetic field system allowed us to obtain smaller size voxels thus to increase the spatial resolution of the functional data. This is due to the fact that 7T systems acquire improved BOLD signals; the signal strength of venous blood is reduced due to a shortened relaxation time, restricting activation signals to cortical grey matter. Time-course series of 169 volumes were acquired for each run using the 3D-EPI-CAIPI sequence47. The spatial resolution was 2.0 mm isotropic; the volume acquisition time was 1368 ms, the flip angle was 14 degrees, the repetition time (TR) 57 ms and the echo time (TE) 26 ms and the bandwidth 2774 Hz/Px. The matrix size was 106 × 88 × 72, resulting in a field of view of 210 (AP) × 175 (RL) × 144 (FH) mm. An undersampling factor of 3 and a CAIPIRINHA shift were used. Slices were oriented transversally with the phase-encoding direction left–right. 42 × 45 reference lines were acquired for the GRAPPA reconstruction. For each subject, a total of 3,042 volumes (169 volumes per run, 18 runs) were analyzed.

High-resolution whole-brain MR images were also obtained using the MP2RAGE pulse sequence optimized for 7T48 (voxel size = 1.0 × 1.0 × 1.0 mm, matrix size 256 × 256 × 176, TI1/TI2 = 750/2350 ms, α12 = 4/5 degrees, TRMP2RAGE/TR/TE = 5500/6.5/2.84 ms).

The quality of the images was assessed using the MRIQC software49. MRIQC is a fully automated pipeline that accesses data quality associated with artefacts (such as respiratory, scanner and motion-related) and it allows for visual inspection of the scans. The quality control was performed on the raw images of each fMRI run in each subject. Supplementary Fig. 1 shows the output of this quality check i.e., for the different quality measures we show the mean value of each run across all subjects.

At this point it is worth emphasizing here that in this experiment we privileged the acquisition of more data per subject (18 runs for a total of 3,042 volumes brain volumes per subject) rather than the opposite (i.e., less data/subject and more subjects, see for a review Wandell et al.50). This is a usual approach in fMRI mapping studies that put more emphasis on the single subject level analysis (Harvey et al.5, Current Biology; N = 8), (Harvey et al.51, Science; N = 8), (Harvey et al.52, PNAS; N = 5). Retinotopic mapping studies, for example, never average brain activity across subjects (Benson and Winawer53, Elife; N = 8), (DeYoe et al.54, PNAS; N = 6), (Sereno55, Science; N = 12), (Brewer et al.56, Nature Neuroscience; N = 9). This is because retinotopic maps are linked to the geometry of brain structures and can therefore be different in the different individuals. To preserve the link between function and structure, maps are studied in individual native space, on non-normalized to-a-common space and non-smoothed data. This becomes particularly important at ultra-high field (i.e., 7 Tesla) where due to the small voxel size used, performing spatial normalization and smoothing would imply the loss of the gained spatial precision. A remedy to the limited number of subjects of this approach is the great number of data acquired per subjects and the use of cross-validation methods to prove the statistical robustness of the observed effects51. However, the generalization of these effects to a broader population should always be considered cautiously.

fMRI preprocessing

Functional imaging data were preprocessed using Statistical Parametric Mapping (SPM12 v. 7219, Wellcome Department of Imaging Neuroscience, University College London) toolbox in MATLAB. The EPI volumes acquired in each session were realigned to the mean of the session and then co-registered to the T1-weighted image acquired in the same session. To perform group level analysis the realigned and co-registered images were then normalized to the averaged DARTEL template (Diffeomorphic Anatomical Registration Through Exponentiated Lie algebra)57 and smoothed with a 2 mm full-width at half-maximum Gaussian kernel. To perform connectivity analysis, data were also kept in the subject’s native space i.e., data, after realignment and co-registration to the T1-weighted image, were directly smoothed with a 2 mm full-width at half-maximum Gaussian kernel.

Identification of regions of interest (ROIs)

For ROIs identification, we performed a mass-univariate General Linear Model (GLM) approach. The first level analysis was performed on both DARTEL normalized and subject’s native space images. Apart from the different space of the images, these analyses were identical.

Each single-subject model included 18 runs/sessions with 6 event-types (model regressors) in each session. Given that we were interested in the encoding of the 4 different durations we used 4 regressors of interest time-locked to the offset of S1 (standard duration), i.e., the moment when the duration of a stimulus became available to participants. We also used 2 more regressors of no-interest; one time-locked to the onset of S2 (comparison duration) and another one time looked to the onset of the participants’ response. The duration of all events was set to zero. All events were then convolved to the canonical hemodynamic response function (HRF). The linear models included also motion correction parameters as effects of no interest. The data were high pass filtered with a cutoff frequency set to 0.0083 Hz. The 18 runs were then concatenated to avoid any deformation of the time-series due to the filtering58.

Our first aim was to identify brain regions responsive to all four S1 durations. To do so for each subject, we estimated 4 t-contrasts, one for each S1 duration, averaging across the 18 runs. The resulted contrast images (t-statistics maps in DARTEL space) were then used in a second-level ANOVA where, once again, 4 different contrasts were calculated. The statistical threshold was set to p < 0.05 FWE cluster-level corrected for multiple comparisons across the entire brain volume (cluster size estimated at a voxel level threshold p-uncorrected = 0.001). Correction for non-sphericity59 was used to account for possible differences in error variance across conditions and any non-independent error terms for the repeated-measures.

The 4 group-level t-maps were finally used to perform a “winner-takes-all” classification procedure, in which each voxel according to its t value was classified as maximally responsive to one of the four different S1 durations (an arbitrary value from 1 to 4 was assign to these voxels). All voxels with a t-value lower than 3.13 were excluded from this classification procedure (i.e., their value was set to 0).

First constrain

We considered regions of interest (ROIs), brain areas that had all four duration selective clusters (i.e., which included all 4 durations selective voxel-types) with a cluster size of at least 20 voxels per S1 duration. According to the above criterion, we identified 5 distinct ROIs: Cerebellum (Vermis VI, left lobule VI, right lobule V), left & right primary visual cortex (V1), left Intraparietal Sulcus (IPS), left Supplementary Motor Area (SMA) and left Inferior Frontal Gyrus (IFG, see Fig. 1a and Supplementary Table 1 for more details).

Second constrain

Moreover, we considered ROIs only the regions in which the duration selective clusters of voxels show a tuning profile. To avoid circularity, the duration selective clusters of voxels in each ROI were identified in half of the available runs (9 runs) and the hemodynamic response extraction was computed on the remaining runs (9 run). We use 18 randomly selected possible combinations of runs’ splits following a cross-validation procedure. For each split we averaged the normalized BOLD across subjects and runs.

The BOLD response was normalized as follow:

$$BOLD\left( {\text{t}} \right) = \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{Nruns}}}} \mathop \sum \nolimits_{{{\text{v}} = 1}}^{{{\text{Nvoxels}}}} \frac{{{\text{(x}}\left( {\text{t}} \right) - {\text{MB}})/{\text{MB}}}}{{{\text{Nvoxels}}}}}}{{{\text{Nruns * std }}\left( {\mathop \sum \nolimits_{{{\text{v}} = 1}}^{{{\text{Nvoxels}}}} \frac{{{\text{(x}}\left( {\text{t}} \right) - {\text{MB}})/{\text{MB}}}}{{{\text{Nvoxels}}}}} \right)}}$$

where t is the signal in a given voxel and MB is the baseline obtained averaging the signal of t across runs. Normalization was then performed subtracting the signal in a given voxel from a baseline value and dividing it by the baseline. The BOLD response was aligned to the 2nd volume (i.e., a TR) after the offset of the S1 duration. Within a single subject, we first averaged the BOLD signal across the voxels of a cluster and then across the fMRI runs.

The second-level analysis on DARTEL normalized images, was performed only to identify the 5 ROIs. All the following connectivity analyses were performed in the individual subjects on images in the subjects’ native space.

Mask creation

After identifying the five ROIs at group-level (DARTEL space), we identified the same regions in each individual subject by using transformation matrices available in different stereotaxic atlases. For the creation of cerebellum mask, we employed SUIT (Spatially unbiased atlas template of the cerebellum and brainstem, http://www.diedrichsenlab.org/imaging/suit.htm) toolbox60,61. Specifically, we identify in each subject the following areas: Vermis VI, right lobule V and left lobule VI. For V1, we used the Destrieux cortical parcellation atlas available in FreeSurfer (image analysis suite, which is documented and freely available for download online at http://surfer.nmr.mgh.harvard.edu)62,63 and we combined V1 and S-calcarine (left and right) labels. For IPS, SMA and IFG we used the Freesurfer’s Desikan-Killany cortical parcellation atlas64. Specifically, for IPS we used the intraparietal and parietal transverse sulcus labels. For SMA we used the medial part of BA6 label, while for IFG we combined the BA45 and BA 44 labels. Apart from cerebellum, in which we used the vermis, the right lobule V, the left lobule VI and bilateral V1, all the other ROIs were in the left hemisphere. These masks were used later to identify the volumes of interest (VOIs); i.e. voxels within the ROIs that were significantly active for the contrasts of interest (i.e., 4 t-contrasts, one for each S1 duration).

In the manuscript, we will use the expression “duration selective cluster of voxels” to indicate voxels that have an identical temporal preference and that are also located in a given ROI, for example all voxels in V1 showing preference for 0.2 s S1 stimulus. The distribution and average cluster size of the 4 durations selective cluster of voxels in the 5 ROIs is shown in Supplementary Fig. 3b. The proportion of duration selective voxels with respect to the total number of active voxels in the ROI is reported in Supplementary Fig. 4

Dynamic causal modelling (DCM) analyses

DCM is a direct measure of Effective Connectivity7,9. The main objective of DCM is to estimate the coupling parameters of a hypothetical underlying neuronal model and evaluate how well the modeled BOLD signal approximate the observed BOLD response. We used DCM because it is particularly appropriate for explaining brain responses that are the consequence of specific experimental manipulations. DCM state equations account for the influence of experimental manipulations on the system's dynamics: experimental conditions enter the model as inputs that either drive local responses or change connection strengths, respectively. Although inference can be about any aspect of model structure or any parameter in a DCM, the focus of DCM studies is typically on context-dependent changes in coupling. This is exactly the goal of our study: understanding the model architecture and see how the presentation of stimuli sharing the exact same visual features and differing only in duration, affects the network communication not only between functionally distinct brain areas but also between distinct portions of the same area.

DCM for fMRI uses a deterministic model of neural dynamics in a network of interacting brain regions or nodes7. It models the change of a neuronal state-vector x in time, where each region is represented by a single hidden state, using the following bilinear differential equation:

$$\dot{x} = f\left( {x,u,\theta } \right) = A\dot{x} + \mathop \sum \limits_{j = 1}^{m} uj B^{\left( j \right)} x + Cu$$
$$A = \frac{\partial f}{{\partial x}}\left| { _{u = 0} } \right.,B = \frac{{\partial^{2 } f}}{\partial x\partial u}\;c = \frac{\partial f}{{\partial u}}\left| { _{x = 0} } \right.$$

where \(\dot{x} = \partial x/\delta t\).

This equation, results from a bilinear Taylor approximation to any dynamic model of how changes in neuronal activity in one node xi are caused by activity in the others. More precisely, this bilinear form is the simplest low-order approximation that accounts both for endogenous and exogenous causes of system dynamics. The matrix A represents the fixed coupling among nodes in the absence of exogenous input u(t). This can be thought of as the latent coupling in the absence of any experimental manipulations. The B matrices represent the change in latent coupling induced by the j-th input. They encode context-sensitive changes in A or the modulation of coupling by experimental manipulations. Because B(j) are second-order derivatives they are referred to as bilinear. Finally, the matrix C concerns the influences of exogenous input that cause perturbations of hidden states. The parameters θ  {A,B,C} are the connectivity or coupling matrices that we wish to identify. These define the functional architecture and interactions among brain regions at a neuronal level.

To sum up, these coupling parameters therefore concern the architecture of the network (A-matrix) and the influence of a given stimulus (i.e., in our case is the S1 stimulus duration) on the strength of the connections (B-matrix) as well as on the activity of the network nodes (C-matrix).

This bilinear model of neural dynamics is combined with an empirically validated hemodynamic model that describes the transformation of neuronal activity into a BOLD response (the so-called “Balloon model”)65,66.

Together, the neuronal and hemodynamic state equations furnish a deterministic DCM. For any given combination of parameters θ and inputs u, the measured BOLD response y is modelled as the predicted BOLD signal (the generalised convolution of inputs; h(x,u,θ)) plus a linear mixture of confounds Xβ (e.g. signal drift) and Gaussian observation error \(\epsilon\)  :y = h(x,u,θ) + Xβ + \(\epsilon\).

Here, we must stress that the number of possible combinations of A–B–C-matrices grows exponentially as the number of brain regions in the system grows. For small systems, investigating all possible connectivity architectures might be plausible, but this becomes unfeasible for large-scale systems. For example, in a network with 20 nodes we would have 220×20 connections.

For this reason, rather than exploring the effective connectivity in a 20-nodes network (i.e., the five ROIs by the four “duration channels”), a network for which the number of possible models to test would be exceedingly high, we decided to start with the investigation of the effective connectivity in a simpler 5-nodes network (i.e., the five ROIs without considering the “duration channels”). We also decided to break the whole analysis process in two main stages.

The first set of analyses (for a summary of all performed DCMs see Supplementary Table 2) concerned this 5-nodes network. Our aim here was to explore within this simpler network two features of network connectivity: the network architecture (i.e., how the different network nodes communicate and coordinate with each other, A-matrix) and the influence of a certain stimulus duration on the activity of any of the network node (C-matrix). Once identified these two features, we moved to the second analysis stage, and we used these features to inform and therefore simplify the DCM on the 20-nodes network.

At both analysis stages we also used nested models (families of models). Family level inference was mainly used to overcome the limitations of having large number of models to compare or having different subjects using different models, as in the case of the Random Effect Analysis (RFX) used here (see next paragraph). The limitation to overcome refers to the fact the identification of the best model critically depends on which set of models are compared. Moreover, using families of models for example help removing uncertainty about aspects of model structure other than the characteristic of interest11. DCM analysis was performed using DCM12 in SPM12 (v. 7219).

DCM and parametric empirical bayes (PEB) of a 5-nodes network

Since we did not have a clear hypothesis about the network architecture of the 5 ROIs, we decided to use Parametric Empirical Bayes (PEB) which is considered a data-driven type of analysis67.

First, for each subject, each fMRI run and each ROI we extracted the principle eigenvariate of the active voxels within the ROI i.e., voxels active at the offset of any S1 duration. In total, we ended up with 18 concatenated runs of 5 time-series (one for each ROIs), which were adjusted to the effects of interest (the five regressors of the first level analysis). Since we lacked knowledge about the network connectivity in a temporal discrimination task, we decided to opt for the simplest model assumptions. So, we set the neuronal model of our DCMs to be deterministic, one-state, bilinear with mean-centered inputs. At this point is worth emphasizing that since the one-state model assumes excitatory activity only, we cannot make any inference about the type of neural activity (i.e., excitatory or/and inhibitory) contributing to the connectivity results.

To consider the fact that we acquired the images with a 7T MRI scanner, we revised the parameters of the BOLD signal equation8,68,69. Thus, in the file called “spm_gx_fmri.m” we set: EPI echo time at 0.026 s, ratio of intra- to extra-vascular signal at 0.026, intravascular relaxation rate at 340, frequency offset at 197.9 Hz and resting oxygen extraction fraction at 0.34. These values were changed according to the study by Heinzle et al. (2016)69.

Second, we specified a fully connected model between the five ROIs and we ran the DCM routine of 198 models (a model for each fMRI run and each subject i.e., 11 subjects × 18 runs) to calculate the parameters of the A–B–C-matrices best explaining the fMRI time-series.

Third, we used the PEB to find out the most probable network structure (i.e., strength of connections) across the 198 models67. Briefly, PEB seeks the optimal model by testing various A-B-C-structures and by switching off certain parameters (e.g. connections), it measures how this affects the model evidence. There are three main criteria which approximate the model evidence by identifying the optimum balance between accuracy (proximity to the observed BOLD signal) and complexity of a given model70. Among them, negative Free energy (F-criterion) is supposed to provide better Laplace approximation for the complexity term. This happens because the negative Free energy considers the interdependence between the estimated parameters while ensuring that they are as precise and as uncorrelated as possible9,10,71 Models were treated as random effects that could differ between subjects. Random-effect analysis (RFX) is considered better when dealing with an unknown population distribution. It is also considered more robust to possible outliers9. Posterior probability for second-level effect was set at Pp > 0.99.

After the identification of the optimal model across subjects with PEB, we proceeded with Bayesian Model Averaging (BMA), i.e. averaging the parameters across the “winning” models of all subjects. These parameters reflect the strength with which the activation changes in one network node affect the connected nodes. Positive values indicate facilitation of activation while negative values suppression of it71. Posterior probability for second-level effect was set at Pp > 0.99. To prove the robustness of the PEB results we run a second PEB analysis using a cross-validation approach in which for each subject we have used half of the runs (from run 1 to run 9) to define the ROIs and the other half to run the PEB (from run 10 to run 18).

DCM of a 20-nodes network

To explore the effective connectivity between the duration selective clusters we then ran a DCM on a 20-nodes network. The 20 nodes were the four duration selective clusters of voxels within the five ROIs (Cerebellum, V1, IPS, SMA, IFG). First, we extracted the fMRI time-course. For each subject and each fMRI run we took the principal eigenvariate for each duration-selective cluster of voxels within the ROI (the voxels active at the offset of the different S1 duration i.e., 0.2, 0.4, 0.6, 1.1 s). In total, we had 18 concatenated runs of 20 time-series (i.e., 20 VOIs), which were adjusted to the effects of interest (i.e., the five regressors of the first level analysis). We were unable to create the 20 VOIs for all participants, since not all of them show activation above threshold for each S1 duration within the 5 ROIs. The 20 nodes DCM was then performed only on 6 out of the 11 initial subjects. Although some statistical power was lost, the fact that we had 18 sessions per subject resulted in modeling 108 sets of time-series (i.e., 6 subjects by 18 sessions). This gave us enough statistical power to make inferences afterwards. After extracting the time-series, we set the neuronal model of our DCMs to be deterministic, one-state, bilinear with mean-centered inputs. As in the previous DCM we revised the parameters of the BOLD signal equation to be appropriate for a 7T MRI scanner.

Then, to simplify the complexity of the 20-nodes network, we incorporated the connectivity results (A-matrix) of the PEB analysis. More precisely, we allowed bi-directional connections between the 4 duration selective clusters of the Cerebellum to those of V1 and IFG, feedforward connections between Cerebellum and IPS, Cerebellum and SMA, V1 and SMA, IPS and SMA, and finally feedback connections from IFG to V1.

Since the model of connectivity between the five ROIs was borrowed from the PEB analysis, before testing specific hypotheses about connections and modulations by stimulus presentation in a 20-nodes network, we decided to do an extra check on the robustness of the PEB result. To do this, we ran 2 DCMs to determine the A-matrix best explaining the fMRI time-series. We compared two models one with an A-matrix “fully-connected” and one with a “PEB-like” structure. In both models B and C matrices were left fully-modulated. We then compared the two models by using Bayesian Model Selection (BMS). BMS is hierarchical and variational Bayesian approach which is more robust in dealing with outliers. This is because each model is treated as a random variable9. To measure the group evidence of the two models (“fully connected” versus “PEB-like”), we computed their protected exceedance probabilities (PEPs). PEPs are an improved extension of exceedance probabilities (Eps) in the sense that PEPs consider the possibility that the observed differences in model evidence (across subjects) might be due to chance72. EPs represent the belief that a given model is more likely than any other model, given the group data. We considered mandatory for a “winning” model to have PEPs > 90%.

Once we established that a “PEB-like” structure of the A-matrix was better than a “fully connected” one, we proceed by exploring which region among the five ROIs was modulated by S1 duration presentation (C-matrix). Fifteen possible models were tested. All models had an A-matrix “PEB-like”, a B-matrix in which all existing connections were modulated by stimulus presentation and 15 different combinations of C-matrices. The 15 combinations are the result of the fact that we arbitrarily allowed to have one or maximum two brain regions modulated by stimulus presentation. These 15 possible model-variations (i.e., nested models) were constructed using Bayesian Model Reduction (BMR). BMR is a rather new procedure that allows the computation of model evidence and posterior probabilities when the priors of a model with more parameters are known67. Bayesian Model Selection (BMS) showed that the most likely model was the one with a C- matrix in which IPS was the area modulated by duration presentation. For the winning model PEP was indeed greater than 90%. In BMS, models were treated as random effects.

At this point given a certain model structure (A-matrix “PEB-like”) and a C-matrix in which IPS was the area modulated by duration presentation, we asked how the different duration-selective clusters of voxels communicate with each other (i.e., what is their connection and how these connections are modulate by duration presentation). To achieve this goal, we ran a DCM analysis, in which we tested hypotheses concerning the existence of connections (A-matrix “PEB-like”) and the modulation exerted by stimulus duration on both existing connections (B-matrix) and IPS activity (C-matrix). Specifically, we hypothesize these connections and modulations to be: (1) stimulus duration independent, (2) only partially stimulus duration dependent i.e., neighboring dependent or (3) totally stimulus duration dependent.

We tested 108 possible models which were the combination of these 3 possibilities (i.e., duration independent, neighboring dependent, duration dependent) applied to the A-B and C matrices (see equation below). Connections (A-matrix) and modulations (B-matrix) were checked within duration selective clusters on a given ROIs (w) or between duration selective clusters in different ROIs (b).

AbiAwjBbkBwmCn.

Where ABC are the A–B–C matrices of the DCM analysis, i = 1:3, j = 1:3, k = i:3, m = j:3 and n = 1:3, w is within ROI b is between ROIs.

To simplify our models, modulations of self-connections within each duration-selective cluster were not allowed (i.e., they were set to zero)73.

For the calculation of the A-B-C-parameters of these 108 models we used the BMR procedure. To identify the winning model, we first divided all models into three main families. The three families differed based on the modulatory effect of S1 stimulus duration on the activity of IPS duration selective clusters (i.e., stimulus duration independent, neighboring dependent, stimulus duration dependent). Each family had 36 models (i.e., 36 by 3 possible C modulations equal to 108 models) The PEPs of each family and model was then estimated using BMS, where models were treated as random effects10,11. To determine which of the 36 models within the winning family best explained our data, we used Bayesian Model Comparison (BMC)10. Finally, after the identification of the optimal model across subjects, we proceeded with Bayesian Parameter Averaging (BPA), i.e. we averaged all winning’s model A-B-C-parameters across subjects12. An identical pipeline was followed to run a second cross-validated DCM 20-nodes analysis in which for each subject we used half of the runs to define the ROIs (from run 1 to run 9) and the other half (from run 10 to run 18) to run the analysis.

Graph analysis

Graph analysis is usually employed to characterize the DCM network architecture with complex graph measures to help understanding brain connectivity and relate it with other real-world networks74. The graph analysis was performed here with the help of the BRAPH2 toolbox75 (https://github.com/softmatterlab/BRAPH-2-Matlab).

The BMA average of the 20-nodes A-matrix (Supplementary Fig. 5) from the winning model was transformed as the weighted adjacency matrix of a directed graph between the 20 nodes. In the adjacency matrix, each interaction of the A-matrix corresponded to a directed influence (edge) of one node on another node. Further, the self-connections in the A-matrix were represented as zeros on the main diagonal in the adjacency matrix. Finally, we defined the binary adjacency matrix, obtained by setting to 1 all the non-zero entries of the adjacency matrix. The network architecture was then characterized with four types of typical graph metrics, clustering coefficient76, betweenness centrality74, in-degree and out-degree74 (see Supplementary Fig. 7). The clustering coefficient is a measure that assesses the presence of clusters in a graph. For each node, this can be calculated as the fraction of the node's neighbors that are also neighbors of each other. Betweenness centrality, is the fraction of all shortest paths in the network that pass-through a given node. This measure can be used to assess the significance of a node as network hub, regulating most of the information flow within the network. In a directed graph in-degree is a measure of the number of connections that arrive at a given node, whereas out-degree is the number of connections that leaves the node.