Introduction

Our environment bombards the brain with a variety of regular and irregular inputs on different timescales. Consider one of the temporally most complex inputs, music. We can simultaneously perceive the music’s different timescales and, even better, are able to integrate them into one meaningful whole like a melody. Moreover, the melody can be distinguished from the ongoing accompaniment in the background.

How can our brain process and integrate such multiscale inputs? Recent evidence suggests that the brain itself exhibits intrinsic neural timescales (INT)1,2,3,4,5,6,7,8,9,10,11. As measured by the autocorrelation window (ACW) in the resting state, lower-order unimodal sensory regions (the primary visual cortex, for example) show shorter timescales than higher-order transmodal regions like the default-mode network (DMN)2,3,4,5,6,7,8,9,10,12,13,14,15,16,17,18,19,20,21,22,23. However, the specific function or role of INT in the brain and its neural processing remains unclear.

Reviewing various findings on INT in both human and non-human species, we propose that they play a key role in processing and structuring inputs in different timescales (Fig. 1 for a general framework). Rather than focusing on specific kinds of inputs (like visual, somatosensory, or auditory; see Box 1), we here aim to describe basic dynamic principles of the temporal nature of input processing that are shared across different inputs. Specifically, the brain utilizes its own INT to process and actively shape the extrinsic timescales of the multiscale inputs it receives from both environment and body. This allows the brain to encode the stochastic structure of its environmental inputs according to its own stochastic structure. Its own stochastic structure is determined by its INT and its unimodal–transmodal hierarchy. That, as we will detail, is mediated by specific computational mechanisms like temporal integration/segregation as well as input sampling with consecutive shifting towards slower frequency modes within the processing hierarchy.

Fig. 1: The proposed function of the intrinsic neural timescales in input processing.

The key assumption is that the intrinsic neural timescales process the input by matching the stochastics of neural activity with the stochastics of the environment. Nature clip art credit: Nature Vectors by Vecteezy (https://www.vecteezy.com/free-vector/nature).

We first review empirical evidence on INT during both resting and task states. That is followed by a second part where we link INT to distinct facets of input processing, like the encoding of stochastics and their sharing by different species. In the third part, borrowing from physics and mathematics, we explore the computational mechanisms driving input processing by the INT; these include temporal integration and segregation of the input, as well as a sampling mechanism that shifts subsequent stages of input processing towards slower frequency modes. We conclude that the role or function of INT consists in processing and structuring multiscale inputs from the environment, a function which, to some degree, is evolutionarily shared across human and non-human species. Given its key role in the brain’s input processing, INT carries major implications for psychiatric disorders (Box 2) and artificial intelligence (Box 3).

Part I: intrinsic neural timescales in rest and task states

Calculation of intrinsic neural timescales

INT is commonly investigated at the cellular4,24 and systemic1,12,13,14,22,25,26 levels of granularity. On the systemic level, Hasson and colleagues1,12,26,27 operationalize INT using functional connectivity during task states. They define temporal receptive windows as “the length of time before a response during which sensory information may affect that response”1; these roughly correspond to what are described as temporal receptive fields on the cellular level28.

Recent studies operationalize INT using the autocorrelation function of the signal during both resting and task states. The autocorrelation function is the correlation of a signal with shifted (time-lagged) versions of itself4. Since the autocorrelation function yields a series of values (one per time lag), different studies report slightly different summary properties of it. Murray and colleagues4 obtain INT by fitting an exponential curve to the autocorrelation; however, the ACW is the measure most commonly reported in the INT literature.
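As a rough illustration of the exponential-fit idea, the sketch below estimates a decay constant from an autocorrelation function by log-linear least squares. It is a simplified stand-in and not the procedure of Murray and colleagues: it assumes a pure exponential decay without offset or scaling terms, and the function name `fit_decay_timescale` and the parameter `dt` (spacing between lags) are hypothetical.

```python
import math


def fit_decay_timescale(acf, dt=1.0):
    """Estimate tau in acf[k] ~ exp(-k * dt / tau) by log-linear least squares.

    Assumption (not from the source): a pure exponential decay with no
    offset. Only the initial run of strictly positive ACF values is used,
    because the logarithm is undefined for non-positive values.
    """
    lags, logs = [], []
    for k, r in enumerate(acf):
        if r <= 0:
            break  # stop at the first non-positive value
        lags.append(k * dt)
        logs.append(math.log(r))
    if len(lags) < 2:
        raise ValueError("need at least two positive ACF values")
    # Slope of log(acf) against lag, forced through the origin because a
    # normalized ACF equals 1 (log = 0) at lag 0.
    slope = sum(l * y for l, y in zip(lags, logs)) / sum(l * l for l in lags)
    if slope >= 0:
        raise ValueError("ACF does not decay")
    return -1.0 / slope


# Synthetic check: an ACF constructed to decay with tau = 5 lags.
example_acf = [math.exp(-k / 5.0) for k in range(20)]
tau = fit_decay_timescale(example_acf)
```

On real data the autocorrelation is noisy, so a nonlinear fit with an offset term may be more appropriate; the log-linear form is used here only to keep the sketch dependency-free.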

In their fMRI studies, Watanabe and colleagues14 define ACW as the area under the curve of the autocorrelation function from lag zero to the time lag at which the correlation reaches zero (see also refs. 13,25). In contrast, EEG/MEG studies define ACW as the time lag at which the correlation reaches half of its maximum value (Fig. 2a) or at which it reaches 1/e (Golesorkhi et al.22 define a new variant called ACW-0).
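The ACW definitions above can be compared side by side in a minimal sketch. Several choices here are assumptions rather than details from the cited studies: the autocorrelation is normalized by the total variance, thresholds are treated as "at or below", the area variant uses the trapezoidal rule, and ACW-0 is taken to be the first lag at which the autocorrelation reaches zero. All function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Normalized autocorrelation for lags 0..max_lag (value 1 at lag 0).

    Assumption: every lag is divided by the total sum of squares of the
    mean-centered signal (the common biased estimator).
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    total = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


def first_lag_at_or_below(acf, threshold):
    """Index of the first lag whose ACF value is <= threshold, or None."""
    for lag, r in enumerate(acf):
        if r <= threshold:
            return lag
    return None


def acw_variants(acf, fs):
    """Return ACW-50, ACW-1/e, ACW-0 and the area variant, all in seconds.

    acf: normalized autocorrelation values for lags 0, 1, 2, ...
    fs:  sampling rate in Hz, used to convert lag indices to seconds
    """
    lag50 = first_lag_at_or_below(acf, 0.5)
    lag_e = first_lag_at_or_below(acf, 1.0 / math.e)
    lag0 = first_lag_at_or_below(acf, 0.0)
    area = None
    if lag0 is not None:
        # Trapezoidal area under the ACF from lag 0 to the first zero crossing.
        area = sum(0.5 * (acf[k] + acf[k + 1]) for k in range(lag0)) / fs

    def to_seconds(lag):
        return None if lag is None else lag / fs

    return {
        "acw_50": to_seconds(lag50),
        "acw_e": to_seconds(lag_e),
        "acw_0": to_seconds(lag0),
        "acw_area": area,
    }


# Hand-made ACF (illustrative values only, not data from any study).
toy_acf = [1.0, 0.8, 0.6, 0.45, 0.3, 0.1, -0.05]
result = acw_variants(toy_acf, fs=1.0)
```

Because the thresholds are nested (0.5 > 1/e > 0), the three lag-based variants are ordered ACW-50 ≤ ACW-1/e ≤ ACW-0 for any autocorrelation function, which is one reason values reported under different definitions are not directly comparable.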

Fig. 2: Autocorrelation window (ACW) in resting and task states.

a ACW is defined as the first lag at which the correlation of the signal with itself drops below 50% of the maximum correlation. It is measured from the autocorrelation function. b The ACW, as recorded in MEG22, shows topographical differences between regions during resting and task states. The colormap is in milliseconds and represents the length of ACW. c The brain map represents the uni-transmodal organization of the brain regions. The table schematically shows how ACW changes from resting to task states and also from unimodal to transmodal units in two arbitrary tasks and four sample units. The table is only for illustration purposes. Unimodal and transmodal units refer to either unimodal or transmodal regions in the brain. The numbers 1 and 2 indicate the hierarchical position (1 = lower; 2 = higher) of the respective region/unit. The blue (unimodal) and red (transmodal) intervals represent the width of their respective intrinsic neural timescales (INT) during rest and two different tasks (task 1 and 2). CfR indicates the change from rest to task with either decreasing (downward arrow), increasing (upward arrow), or maintained (horizontal double-sided arrow) width of the regions’ INT during task relative to rest. Though schematic, the differences in the width and rest–task change of the INT during the two tasks are meant to indicate the flexible and adaptive nature of the timescales, as is supported on both regional22 and cellular24,36,37 levels (see the rest–task overlap and rest–task modulation sections).

Resting state I: temporal hierarchy of unimodal and transmodal regions

Murray and colleagues4 investigated single-cell recording data in non-human primates and calculated their autocorrelation function in pre-stimulus intervals. From that, they measured the duration of the temporal window at a correlation decay of 50%, i.e., ACW. They observed a shorter ACW in lower-order unimodal sensory regions while higher-order transmodal regions, such as the prefrontal cortex, exhibited a longer ACW4. Subsequent computational modeling studies employed large-scale structural connectivity networks based on non-human primate or human data5,29, or a standard model of synchronization, i.e., the Kuramoto model2 (0.01–0.1 Hz). They also demonstrated longer INT (as measured by the ACW) in prefrontal regions, whereas INT remained shorter in sensory and motor cortex (see also ref. 17; for other models of neural dynamics, see refs. 30,31).

The computational findings are supported by observations of a corresponding hierarchy of timescales in real human data using fMRI13,18. Operating in the infraslow frequency range (0.01–0.1 Hz), resting state fMRI studies applied the autocorrelation function to the BOLD signal and, following Murray and colleagues4, determined the ACW at 50% of correlation decay13,14,18. Employing small14,15 or large-scale13,18,29 fMRI datasets, all studies observed shorter ACW in unimodal regions, including sensory and motor regions/networks on the cortical level. In contrast, transmodal regions, including higher-order networks such as the central-executive networks (CEN), dorsal attention networks (DAN), and default-mode network (DMN), generally show a longer ACW.

In addition to a temporal hierarchy on the cortical level, Raut and colleagues13 also measured the ACW in subcortical regions like the thalamus, cerebellum, striatum, and hippocampus. Interestingly, they again observed that gradients of the ACW within each of these subcortical regions, especially in the thalamus and striatum, appear to mirror the temporal hierarchy on the cortical level. Together, these data strongly suggest that both cortical and subcortical regions display an intrinsic hierarchical organization with unimodal sensory and motor regions showing shorter timescales while transmodal higher-order association regions exhibit longer timescales.

These findings were all obtained with human fMRI, which measures BOLD activity in the infraslow frequency range (0.01–0.1 Hz). That raises the question of whether the distinction between shorter unimodal and longer transmodal INT is also present in faster frequency ranges between 1 and 70 Hz, as typically measured with EEG/MEG. Indeed, two recent human resting state MEG studies22,29 demonstrate a longer ACW in higher-order transmodal regions/networks like the CEN and DMN, whereas it was significantly shorter in unimodal sensory regions. Hence, these findings suggest that INT follows a similar topographical distribution in faster frequencies (1–70 Hz) to that in slower frequency ranges (0.01–0.1 Hz). Such ubiquitous occurrence suggests a basic or fundamental, though as yet unclear, role or function of INT in the brain.

Resting state II: intrinsic neural timescales and functional connectivity

How are the intra-regional INTs related to inter-regional connections? The INT is constituted by both intra-regional cellular features5,16,32,33 and inter-regional connectivity4,5,13,20. Intra-regional cellular features concern the excitation–inhibition balance with its local recurrent wiring34 as in supragranular feedforward and infra-granular feedback connections5,13,32,35 (see also ref. 20, which demonstrates the relevance of population codes). Cavanagh et al.36 demonstrate that even within regions there is considerable variability of the INT at the single neuron level. Specifically, the temporal receptive field of a single neuron can change over time and adapt to, for instance, task demands as during working memory (see also ref. 24) and/or decision making36,37,38. Moreover, Spitmaan et al.37 observe less dependence upon the task context, which further underlines the adaptive nature of these timescales. As the authors put it, the timescales of different neurons during task-related activity suggest a certain independence, i.e., flexibility.

In addition to the intra-regional cellular features, INT is also strongly shaped by inter-regional connectivity. Chaudhuri and colleagues16 demonstrated that purely local connectivity is by itself insufficient to yield the diversity of timescales across the cortex. Moreover, in their non-human primate-based computational model5, removing all long-range projections significantly restricts the range of different timescales and abolishes the intrinsic temporal hierarchy. The relationship between intra-regional INT and inter-regional functional connectivity again holds across different species, as it can be observed in both non-human primates9 and humans13,18,29,39,40,41,42.

How exactly is inter-regional functional connectivity related to the INT? Two recent studies in human fMRI show that the duration of INT in different regions, as measured by the resting state ACW, is positively correlated with the degree of each region’s change in functional connectivity during task: the longer the region’s resting state ACW, the stronger the task-related change in its functional connectivity to other regions18,43. That is further supported by Raut and colleagues13, who demonstrated that the individual variability in ACW across different regions is directly related to the individual variation of the functional connectivity pattern of the same regions (see also refs. 42,44,45,46,47,48).

Together, these findings suggest a close relationship of INT to the brain’s inter-regional connectivity pattern—intra-regional temporal features are, in part, constituted by long-range inter-regional connections. Such an intimate link between intra-regional timescales and inter-regional connectivity means that the different timescales can interact and integrate with each other. This may enlarge the number of available timescales, i.e., the repertoire of timescales, as we will illustrate later.

Rest–task overlap: from intrinsic neural timescales to temporal receptive windows

Is the resting state’s INT related to task states? A positive answer to this question would support their involvement in input processing. The relevance of INT for input processing is strongly suggested by the excellent studies of Hasson and colleagues15,26,27,49,50,51 (ref. 1 for review). They demonstrate that shorter temporal segments of external stimuli (like single words of stories, or short episodes in movies) are processed preferentially in lower-order unimodal sensory regions. Longer intervals (like whole paragraphs in stories, or longer episodes in movies) are related to activity changes in higher-order transmodal regions. Given that external inputs are processed and structured in temporal terms, i.e., according to different durations, Hasson and colleagues1 speak of temporal receptive windows which roughly correspond to what are described as temporal receptive fields on the cellular level28.

Does the spatial, or topographical, pattern of the INT overlap between rest and task states, i.e., rest–task overlap? While such rest–task overlap has been well demonstrated for functional connectivity18,52,53,54,55, it remains an open issue in the case of INT. The various task studies on the brain’s temporal receptive windows show a spatial pattern that is well in accordance with the hierarchical organization of INT in the resting state. Just as the ACW is longest in the DMN during rest, task states show that the DMN processes the longest sequences of inputs and information26,27, while the shorter resting state ACW in unimodal sensory regions seems to find its equivalent in the short sequences of inputs processed in these regions1. Hence, comparison of rest ACW and task temporal receptive windows shows an analogous hierarchical topographical organization. This suggests a close relationship between rest and task, i.e., rest–task modulation or interaction56,57,58,59,60 (see below for the discussion of task-specific changes in INT).

If there is such a rest–task overlap, one would assume that the hierarchical organization of resting state ACW is carried over to, and thus present in, the temporal receptive windows during task states. Evidence for such rest–task overlap comes from both computational modeling and brain imaging. Gollo and colleagues2 conducted a modeling study based on the Kuramoto synchronization model with simulations of transcranial magnetic stimulation. They show that regions with longer ACW, as located in the transmodal core, display lower and more sluggish activity changes in response to external stimuli than sensory regions; the latter exhibit a shorter ACW at the more unimodal periphery, accompanied by higher amplitude and a faster response to external stimuli (see also refs. 17,35). Analogous results were observed in the modeling study by Chaudhuri and colleagues5, who applied electrical stimulation to V1 in the visual cortex of their non-human primate-based network model (see also ref. 29). One interesting finding here is that regions weakly connected to the input regions exhibit longer INT during stimulation. This again demonstrates that tasks exert effects beyond those at the stimulated regions themselves. These computational data on rest–task overlap of INT are supported by human brain imaging data. A recent human fMRI study by Ito and colleagues18 investigated the ACW in the resting state and the amplitude during different task states. They demonstrated a negative correlation of resting state ACW duration (in different regions) with the magnitude of task-related activity, i.e., amplitude, in the same regions. Therefore, the longer a region’s resting state ACW, as in transmodal regions, the lower its task-related amplitude, whereas regions with shorter ACW, i.e., unimodal regions, exhibit higher amplitude during different tasks. These results support the idea that the resting state’s INT strongly shapes task-related activity and associated input processing2,60,61. The mechanisms of this, however, remain unclear.

Rest–task modulation: intrinsic neural timescales shape task states and behavior/cognition

The rest–task overlap strongly suggests that the resting state’s INT may also shape or modulate the temporal features of task states including associated cognition—this amounts to what we describe as rest–task interaction or modulation (see also refs. 56,58,62). This has recently been addressed by Golesorkhi and colleagues22 (see also ref. 15 for initial steps). Applying MEG, they investigated the ACW-50 and ACW-0 (see above) not only during rest but also during three different task states (motor, story-math, working memory). They showed that the resting state’s ACW and its hierarchical core–periphery organization strongly predict their task states: the resting state’s core–periphery organization of ACW was essentially preserved during all three task states as topographical rest–task correlation yielded high values (0.8–0.9)22. These results suggest that the resting state’s hierarchical organization of its INT is essentially carried over, and preserved, during task states, irrespective of the task.

Golesorkhi and colleagues22 also observed some task-specific changes (Fig. 2) when calculating the rest–task difference (which subtracts and cancels out the shared, i.e., correlating, temporal hierarchical organization). Specifically, higher-order network regions showed a strong ACW shortening during the story-math task (which was presented in 30 s intervals), whereas only minimal changes were seen in the motor and working memory tasks. A reverse pattern was observed in lower-order network regions: the ACW was shortened considerably during working memory but only minimally in the story-math and motor tasks. These data suggest that, once one subtracts the hierarchical temporal organization present in both rest and task, task-specific changes can be observed. Furthermore, the ACW itself and, more generally, the INT can be modulated during task states; they are dynamic and adaptive rather than static and non-adaptive. Though more studies are needed, task-related modulation seems to mainly concern the shortening of the ACW relative to rest. The adaptive nature of INT is also documented in human ECoG during the delay period of a working memory task (relative to pre-stimulus baseline)24.
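The two analyses described above, a topographical rest–task correlation and a region-wise rest–task difference, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the source does not specify the correlation type, so Pearson correlation is assumed, and the region labels and ACW values are invented for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rest_task_difference(rest, task):
    """Task minus rest ACW per region; negative values mean shortening."""
    return {region: task[region] - rest[region] for region in rest}


# Hypothetical ACW values in milliseconds (made up for illustration).
rest_acw = {"visual": 20.0, "auditory": 22.0, "DAN": 35.0, "DMN": 48.0}
task_acw = {"visual": 19.0, "auditory": 20.0, "DAN": 33.0, "DMN": 40.0}

regions = list(rest_acw)
topographic_r = pearson(
    [rest_acw[r] for r in regions], [task_acw[r] for r in regions]
)
differences = rest_task_difference(rest_acw, task_acw)
```

The correlation captures what is shared between rest and task (the preserved hierarchy), while the difference isolates what changes; a high correlation and non-zero differences can therefore coexist, as in the findings summarized above.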

In addition to task states, INT also shapes behavior and cognition. Studies in non-human primates demonstrated that a longer duration of the resting state’s INT (as obtained during baseline intervals sandwiched between tasks) is associated with better behavioral performance in a variety of different tasks. These include a longer duration of delays in a delay discounting task4, stronger spatial response coding in the delay period during a non-match-to-goal task63, and modulation of working memory performance during later periods, i.e., delay9. On the human side, recent fMRI and/or EEG studies demonstrate that the resting state’s ACW is directly related to higher-order cognition like the level of consciousness64,65, the sleep stage21, and the sense of self66,67,68,69, as well as to psychiatric disorders (see Box 2 for details). These data tentatively suggest that INT shapes behavior, including perception and higher-order cognition like consciousness and self. Since task states, as well as perception and cognition, are dependent upon various kinds of inputs, together these data are compatible with the assumption that INT is key for input processing and structuring.

Part II: input processing through intrinsic neural timescales

Key findings of the INT are: (I) their topographical organization during both resting and task states along uni- and transmodal regions/networks; (II) their topographical carry-over and partial change during the transition from rest to task22; and (III) their relation to the temporal structure of external inputs during task states1,26,49,50,70,71. Together, these findings suggest their involvement specifically in the brain’s input processing.

We presuppose here a wide notion of input including stimuli from both one’s own body, i.e., interoceptive, and the external environment, i.e., exteroceptive (Box 1). Our focus is primarily on the dynamic principles and mechanisms underlying input processing in general rather than on the specifics of various inputs like auditory, visual, or somatosensory (see Box 1). Considering INT, in the following we first address the importance of input processing as distinguished from output processing and then discuss two important facets: (I) cross-species input sharing and (II) stochastic matching of the extrinsic environmental inputs with the brain’s intrinsic stochastic structure.

Input vs. output: capacity or predisposition for input processing

Is INT engaged in input or output processing? This was recently addressed by Zilio and colleagues21, who investigated the ACW in resting state EEG in subjects with physiologic, pharmacologic, and pathological alterations of consciousness. Under such conditions, input processing is known to be altered in distinct ways, i.e., progressive decrease (NREM sleep stages N1, N2, N3), isolation from external inputs but preserved capacity for processing of internal inputs (from one’s own body and brain) (REM sleep and ketamine), and extreme deficiency or complete absence of both external and internal inputs (unresponsive wakefulness syndrome, sevoflurane). Additionally, they included subjects with complete loss of motor function, i.e., output processing, in whom input processing and consciousness are preserved (locked-in syndrome and amyotrophic lateral sclerosis).

The results (Fig. 3) show abnormally long ACWs in the unresponsive wakefulness syndrome, through abnormal strengthening of slow frequency power (and concurrent weakening of fast frequency power). Also, both the physiologic and pharmacologic alterations of consciousness showed abnormal prolongation of the ACW in line with the progressive decrement of the capacity of input processing in the different behavioral states. The motor conditions, in contrast, exhibited a “normal” ACW with a preserved balance of slow and fast frequencies in the power spectrum. Together, these findings support the involvement of INT specifically in input processing. In contrast, INT does not appear to be significantly associated with output processing in subjects with motor deficits but preserved input processing. If the ACW was significantly involved in both input and output processing, it should have globally changed in both types of conditions, altered states of consciousness and altered motor conditions (although the ACW is significantly shorter in the parieto-occipital regions of amyotrophic lateral sclerosis patients than in healthy subjects). We need to be cautious, however. One can neither fully exclude output disturbances in the altered states of consciousness nor changes in input processing in the motor conditions (locked-in syndrome, amyotrophic lateral sclerosis). Hence, more direct support for the role of the ACW in input processing is required.
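The link between stronger slow-frequency power and a longer ACW can be made explicit with a short worked example. It rests on one simplifying assumption that is not taken from the source: the autocorrelation decays exponentially with a single time constant, written here with the hypothetical symbol \tau_0. The power spectrum S(f) then follows from the Wiener–Khinchin theorem, and f_k denotes the frequency at which power has dropped to half of its value at f = 0.

```latex
\rho(\tau) = e^{-|\tau|/\tau_0}
\quad\Longrightarrow\quad
S(f) = \int_{-\infty}^{\infty} \rho(\tau)\, e^{-i 2\pi f \tau}\, d\tau
     = \frac{2\tau_0}{1 + (2\pi f \tau_0)^2}

\rho(\mathrm{ACW}_{50}) = \tfrac{1}{2}
\;\Longrightarrow\;
\mathrm{ACW}_{50} = \tau_0 \ln 2,
\qquad
S(f_k) = \tfrac{1}{2}\, S(0)
\;\Longrightarrow\;
f_k = \frac{1}{2\pi \tau_0}
```

Under this assumption, a larger \tau_0 both lengthens the ACW and lowers f_k, so that power concentrates at slower frequencies. Empirical EEG spectra contain oscillatory peaks and multiple timescales, so this is only an idealized illustration of why the two observations tend to go together.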

Fig. 3: Input vs. output processing.

On a whole-brain level, healthy awake subjects and subjects with motor deficits but preserved input processing (amyotrophic lateral sclerosis, locked-in syndrome) present a short ACW, i.e., normal neural timescales accompanied by a balance of slow and fast frequencies, which is associated with a normal capacity for encoding inputs. In contrast, other physiological, pharmacological, and pathological conditions, e.g., sleep (N1-N2), unresponsive wakefulness syndrome, and deep anesthesia (e.g., sevoflurane), show a progressive stretching of the ACW, i.e., prolonged neural timescales accompanied by a shift towards slower frequencies, which consequently leads to an abnormal prolongation of the temporal window of input processing. The EEG signals and the ACW representations are taken from the datasets investigated in Zilio et al.21.

Of note, however, is that Zilio and colleagues21 investigated only resting state activity. Therefore, the ACW can only be indirectly related to input processing; an investigation of task states with actual inputs is needed to establish a direct relation to input processing. Given that the disorders and the alterations of consciousness are known to exhibit deficient input processing72,73,74,75,76, their findings suggest that the resting state’s INT provides the capacity for input processing, i.e., a neural predisposition77,78,79,80. Even when not exposed to actual multiscale inputs from the external environment, the resting state still exhibits its own INT, which indexes its capacity for processing such inputs. This is, for instance, the case in sleep, where we can still be awoken at any time by sufficiently strong external inputs; the brain’s capacity or predisposition for input processing is preserved76. In contrast, this remains impossible in total anesthesia and coma, where even the strongest external inputs will not wake the individual; the brain’s capacity or predisposition for input processing is lost.

Input sharing: cross-species evolutionary preservation of intrinsic neural timescales

We have so far discussed the significance of INT in processing inputs from the external environment. Assuming that different species share, to some extent, the same external environment, one would expect their INT to overlap at least in part. There is indeed evidence for such “input sharing” across species, as manifested in the cross-species evolutionary preservation of INT.

The data suggest that the regional differentiation of the INT along the transmodal–unimodal gradient holds in both non-human primates4,5,20 and humans13,14,18,22. This can be extended to other species, as supported by cross-species studies on both the cellular81,82 and more regional-systemic83 levels. Shinomoto and colleagues81,82 show, in a first step, regional differentiation in the cellular firing pattern of different regions in the non-human primate brain: spiking patterns are regular in motor areas, random in visual areas, and burst-like in the prefrontal cortex. In a second step, they demonstrate that such temporal fingerprinting in the regions’ temporal structure of their firing pattern holds across different species including non-human primates, cats, rats, and mice; the differences in firing patterns between different regions within one species are larger than the firing pattern differences within the same region across different species82,84. Together, these data demonstrate that temporal features of neural firing patterns on the cellular level of specific regions are preserved across different species.

Analogous observations of cross-species evolutionary preservation have been made on the more regional-systemic level of oscillations. Buzsáki and colleagues83 demonstrate that various oscillatory rhythms such as alpha, spindles, and ripples are present in more or less the same frequency range in different species including humans, non-human primates, dogs, bat, gerbil, guinea pig, rabbit, mouse, and hamster (see also ref. 84). Importantly, Buzsáki and colleagues83 observe that such preservation of the same frequencies across different species holds independent of brain size: even if the brain size changes and becomes larger throughout evolution, the frequency range of the rhythmic pattern remains the same in different mammals. They conclude that temporal organization of the brain is a key priority in evolution: “In summary, the preservation of temporal constants that govern brain operations across several orders of magnitude of timescales suggests that the brain’s architectural aspects—scaling of the ratios of neuron types, modular growth, system size, inter-system connectivity, synaptic path lengths, and axon caliber—are subordinated to a temporal organizational priority”. (ref. 83, p.755).

Is the human brain’s INT an evolutionarily preserved manifestation of our ancestors’ timescales, including their key role in processing external inputs from the environment? The findings by both Shinomoto and colleagues81,82 and Buzsáki and colleagues83 suggest exactly that. One would consequently expect that human behavior, if based upon its evolutionarily preserved INT, should resemble the behavior of non-human species. For instance, Zhang and Ghazanfar85 propose that the timescales of human infant vocal production can be seen in line with the multiple INTs of vocal production in marmoset monkeys, songbirds, and other vertebrates. Together, these findings suggest that INT is, in part, preserved evolutionarily across different species which may be manifested in somewhat similar forms of behavior (Fig. 4a).

Fig. 4: Evolutionary cross-species stochastic matching of the input.

All parts are for illustration purposes only and the sequence of brains is not intended to represent any evolutionary hierarchy. a Intrinsic timescales in the brains of four sample species. On a general scale, a more complex brain has a higher number of intrinsic timescales (e.g., human vs. mouse). Also, different brains may have timescales with similar or different lengths. b The interaction between different intrinsic timescales may create a species’ repertoire of timescales. Each state in the repertoire is the result of the interaction between a pair of timescales. For example, state A is the result of the interaction between timescales 1 and 2. Here, for the sake of simplicity, the interaction is defined as the difference between the lengths of two timescales. Although the timescales themselves are unique to each species’ brain, the interactions (states) can be shared between different species, e.g., state C is shared between all four sample species. So, the repertoire of states in each species consists of some states that are typical of that species and some states that are shared with other species. c The interaction between the environment and the brain happens through the matching of timescales. On the left, we have the environment and a sample input which contains several timescales with different lengths (a, b, c, d, e). On the right, the matching between the input and each species’ brain is illustrated. Each timescale in the input is matched to the best state from the repertoire of timescales. The best state is the state that yields the least error. The brain clip arts are credited to ref. 115.

Input encoding: matching the environment’s stochastics by the brain’s intrinsic neural timescales

How is it possible that different species share their INT? The reason for such cross-species similarity cannot be found in the brain itself, as cross-species temporal similarities occur across different brain sizes83. Picking up the suggestion by Shinomoto and colleagues81,82, cross-species similarity may rather be related to similarity in function and, as we specify, in the nature of the inputs. Different species share more or less the same environmental context, i.e., ecological niche and affordances86,87,88. They are consequently exposed to the same inputs, which share similar timescales. Specifically, even if the inputs themselves are elaborated in somewhat distinct ways by the species-specific sensory organs, the input stochastics, i.e., the relation between inputs, may nevertheless be processed in similar ways across the different species.

Is the brain’s INT related to the input stochastics in the environment? Stephens and colleagues15 demonstrate that regional differences in the brain’s INT, as indexed in their study by the power spectrum, are related to the temporal structure of the external information, with the former aligning to the latter: (I) regions with shorter INT and faster dynamics, i.e., early auditory cortex, are activated during shorter stimulus segments (e.g., single phonemes or words) (see also refs. 89,90,91 for more support in terms of entrainment); (II) regions with intermediate timescales and balanced slow–fast dynamics, i.e., superior temporal gyrus and inferior frontal gyrus, are recruited by intermediate durations in the temporal structure of stimuli (e.g., the structure of sentences); and (III) regions with longer intrinsic timescales and slower dynamics, i.e., precuneus and medial prefrontal cortex, are activated by slowly varying stimulus dynamics (e.g., stimulus narrative, see also refs. 1,15,26,27,36).

These findings raise yet another question, though. How can the limited number of INT of the brain’s various regions process and sample a seemingly unlimited and constantly changing number of extrinsic timescales of the environment? Given that there is a positive relation between intra-regional INT and inter-regional functional connectivity13,18,36 (see above in Part I), we propose direct interaction between the different regions’ INT; such interaction would enlarge the number of possible timescales, i.e., the repertoire of timescales, as we say (Fig. 4b). We can take the structure of DNA as an analogy: this complex structure is created from only four bases, namely adenine, thymine, guanine, and cytosine. If a species has a high number of regions with different INT, their degree of possible interaction through inter-regional functional connectivity is much higher than in a species with only a low number of regions exhibiting distinct INT and/or low inter-regional functional connectivity.

A large repertoire of timescales may extend the organism’s ability to encode and sample the input stochastics of its environment in a more fine-grained and temporally differentiated way, that is, according to distinct timescales in the environment. That, in turn, may reduce the error in the brain’s encoding of the input stochastics relative to the latter’s stochastic occurrence in the natural world. Accordingly, we tentatively suppose that species with a higher number and thus a larger repertoire of INT are prone to lower degrees of error in their input processing: they can align to their environmental context in a more fine-grained way than species with a low number of intrinsic timescales and/or a small repertoire (Fig. 4c).
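The matching scheme of Fig. 4b, c can be sketched in a few lines of Python. This is a toy illustration only: following the figure caption, a state is taken to be the difference between the lengths of a pair of timescales, and each input timescale is matched to the state that yields the least error. The function names and all numerical values are hypothetical and in arbitrary units.

```python
from itertools import combinations

def repertoire(timescales):
    """Repertoire of states: all pairwise differences between timescale lengths."""
    return sorted({abs(a - b) for a, b in combinations(timescales, 2)})

def matching_error(input_timescales, states):
    """Mean absolute error when each input timescale is matched to its closest state."""
    errors = [min(abs(t - s) for s in states) for t in input_timescales]
    return sum(errors) / len(errors)

# Hypothetical values, chosen only for illustration (arbitrary units)
few_int = [1.0, 4.0]                 # species with two intrinsic timescales
many_int = [1.0, 2.5, 4.0, 8.0]      # species with four intrinsic timescales
inputs = [0.5, 1.5, 3.0, 5.5, 7.0]   # timescales contained in the environmental input

small = repertoire(few_int)
large = repertoire(many_int)
err_small = matching_error(inputs, small)
err_large = matching_error(inputs, large)

print("small repertoire:", small, "mean error:", err_small)
print("large repertoire:", large, "mean error:", err_large)
```

In this toy case, the state 3.0 is present in both repertoires, while the larger repertoire covers more of the input timescales and therefore yields the lower matching error.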

Part III: mechanisms of input processing through intrinsic neural timescales

Input segregation and integration: temporal precision vs. smoothing

What are the mechanisms by which the INT processes input? The various task state studies conducted by Hasson and colleagues with the formulation of the temporal receptive windows suggest that the INT may structure the inputs into segments of different durations, e.g., short and long segments like single words, sentences, and paragraphs1,15,26,27,92. Such temporal structuring may mean that certain inputs are processed together with high degrees of temporal integration, amounting to some form of “temporal smoothing”92,93. Other inputs may be processed in a more segregated and, therefore, temporally precise way entailing higher degrees of temporal segregation (see refs. 93,94). Together, this amounts to a balance of temporal integration vs. segregation in input processing.

How can INT modulate the balance of temporal integration vs. segregation during input processing? The ACW measures the degree of correlation of neural activity patterns across different time points. If only a low number of distinct time points correlate with each other, the correlation is low, indexing a short ACW. This means that inputs at time points beyond those that correlate within the ACW are sampled independently of each other; they will be processed with high degrees of temporal segregation but low temporal integration93. Moreover, the processing of single inputs may then be more or less restricted to their actual durations as, due to the low correlation that comes with a short ACW, they are not expanded (in a virtual way) beyond their actual physical durations, i.e., there is little temporal smoothing or expansion95. Accordingly, short INT predispose inputs to be processed with high temporal precision in both their specific time points and actual durations, with low degrees of “temporal smoothing”. Such a pattern of input processing is strongly supported by the short duration of the intrinsic timescales in unimodal regions like the sensory cortex, which display a short ACW in rest and a short temporal receptive window during task states1,2,3,5,12,15 (Fig. 5).
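To make the notion of a short vs. long ACW concrete, the following Python sketch estimates an ACW from two surrogate signals. It rests on assumptions that are not taken from the text: the ACW is operationalized as the first lag at which the autocorrelation function falls below 0.5, and the signals are first-order autoregressive surrogates with hypothetical coefficients rather than neural recordings.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def acw(x, max_lag=200, threshold=0.5):
    """First lag (in samples) at which the autocorrelation drops below threshold.
    The 0.5 threshold is an assumption of this sketch."""
    acf = autocorrelation(x, max_lag)
    below = np.flatnonzero(acf < threshold)
    return int(below[0]) if below.size else max_lag

def ar1(phi, n, rng):
    """AR(1) surrogate signal: x[t] = phi * x[t-1] + white noise."""
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
fast = ar1(0.3, 20000, rng)    # weakly correlated surrogate (hypothetical "unimodal-like")
slow = ar1(0.95, 20000, rng)   # strongly correlated surrogate (hypothetical "transmodal-like")

acw_fast = acw(fast)
acw_slow = acw(slow)
print("ACW (samples): fast =", acw_fast, ", slow =", acw_slow)
```

The weakly correlated surrogate decorrelates within a single sample, so neighbouring inputs would be treated as largely independent (segregation), whereas the strongly correlated surrogate remains correlated over many samples (integration).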

Fig. 5: Temporal integration vs. segregation and temporal precision vs. duration in task-evoked activity.

The figure highlights two ways in which the intrinsic neural timescales can shape input processing: the degree of integration vs. segregation of two (or three) different stimuli (left part), and the degree to which the temporal duration of the stimulus itself can be expanded in neural activity through longer time windows such that the duration of the neuronal activity related to the stimulus, i.e., neuronal duration, is extended beyond the stimulus’ physical duration (right part). From left to right, the figure shows how a shorter ACW (especially in unimodal regions) makes it possible to distinguish fast stimuli (high degree of segregation) with a precise temporal encoding consistent with the physical duration of the stimuli (high temporal precision associated with short evoked activity). On the other hand, a longer ACW (especially in transmodal regions) permits higher correlation of neural activity across time (high degree of integration), leading to the virtual expansion of the actual stimulus (high temporal duration associated with long evoked activity), i.e., the capacity to encode different stimuli in a way that the evoked activity is longer than the actual physical duration of the stimulus.

The reverse pattern of input processing appears to hold in transmodal regions; their rest ACW and task temporal receptive windows are much longer than those in unimodal regions1,2,3,5,12,15. A longer ACW indicates a higher correlation of neural activity across temporally more distant time points. Inputs at different time points are then not sampled independently of each other, but somewhat linked together across time, resulting in “temporal summing and pooling” (in the terms of ref. 93) and ultimately high degrees of temporal integration93. Moreover, the duration of the single inputs’ neural processing may virtually, i.e., neuronally, expand beyond their actual physical duration: even if the stimulus is already physically absent (but still present in the neural activity), distant time points’ neural activities may still correlate highly with the preceding time points of the actual stimulus. This amounts to high temporal smoothing/expansion and low temporal precision95. For instance, higher-order transmodal regions like the prefrontal cortex support temporal integration and expansion of sensory51,96,97,98,99, motor, and cognitive information1,4,6,100. This is compatible with the idea that transmodal regions such as the prefrontal cortex are involved in higher-order cognition like memory, imagination, abstraction, self, and consciousness.

Input sampling I: immediate and short vs. delayed and long responses in unimodal and transmodal regions

So far, we have demonstrated how the INT processes the input in particular timescales by modulating them through temporal integration and segregation. However, the brain is confronted with a variety of different inputs in various or multiple scales, with the number of timescales in the environment far exceeding the available timescales of the brain83,101,102. How can the brain bridge the gap between its own restricted timescales and the more expanded ones of its environmental context? Ideally, the brain encodes all inputs from the larger-scale environment within its own smaller-scale neural activity without losing any information, i.e., minimal error.

The empirical data show hierarchical organization of the timescales within the brain according to a fast–slow gradient from uni- to transmodal regions. The inputs may be sampled in a more or less analogous way when transitioning from the faster unimodal to the slower transmodal regions; this implies what in mathematics is described as down-sampling from faster input stochastics to slower ones103. We suggest that the INT acts as input samplers, that is, down-sampling across the hierarchy of unimodal and transmodal regions. Here, we first perform numerical simulations to provide some support for the differential response of unimodal and transmodal regions during input processing (this section). This will be complemented in a second step (next section) by illustrating the mathematical principles of the fast–slow gradient of down-sampling again showing some simulation data.

Under our fast–slow gradient assumption, sensory networks would be the first to carry out this down-sampling process. Unimodal and sensory networks show shorter intrinsic timescales compared to transmodal ones15,22. This implies that the first sampling would be done at a higher frequency and, progressively, said sampling would be on more widely spaced timescales, i.e., down-sampling. In that case, one would expect a faster and more transient, i.e., fast-frequency response in unimodal regions as related to their shorter INT. Transmodal regions, in contrast, should show a slower, delayed and longer-lasting response.

In a first step, we probed this in a computational network model5 (details are provided in the Supplementary material), applying inputs of short duration to lower-order regions, i.e., visual cortex V1, and tracking the response in both lower- and higher-order regions, i.e., anterior cingulate cortex 24c. Indeed, we observe immediate and short-lasting responses in V1, whereas 24c displays more delayed, i.e., sluggish and longer-lasting responses (Fig. 6a). This is compatible with a fast–slow gradient of consecutive input down-sampling throughout the lower-higher hierarchical processing stages.

Fig. 6: Distinct neural timescales from input perturbations.

a Activity of primary visual cortex (V1) and anterior cingulate cortex (24c) in response to 250 ms of pulse input of varying strengths to area V1. The unimodal region V1 exhibits fast, short-lasting responses, whereas the transmodal region 24c exhibits slower, long-lasting responses. b Input duration is differentially regulated by unimodal and transmodal regions. The unimodal region V1 shows rapid response saturation to brief input durations, reflecting fast integration of sensory-relevant stimuli. The response of the transmodal region 24c saturates over longer timescales, reflecting slow and delayed temporal integration of inputs.

In a second step, we varied the duration of the input to V1. Inputs of shorter duration should yield a faster response saturation in lower-order regions like V1 while inputs of longer duration would be required to yield the same effect in the higher-order regions like 24c. This again was confirmed in our simulation data (Fig. 6b), providing indirect support for input down-sampling along the lines of a fast–slow gradient. In sum, there seems to be a close link of fast–slow input sampling during the encoding of the input stochastics, with the latter being sampled in a seemingly temporal way by the brain’s INT.
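The qualitative pattern described for Fig. 6 can be reproduced with a deliberately simple toy model. The sketch below is not the large-scale network model of ref. 5; it assumes two leaky integrators in series, a fast unit that receives the pulse and a slow unit driven by the fast unit, with hypothetical time constants of 30 ms and 400 ms.

```python
import numpy as np

DT = 0.001  # integration step in seconds

def simulate(pulse_s, tau_fast=0.03, tau_slow=0.4, t_end=4.0, onset=0.5):
    """Euler integration of two leaky integrators in series.
    Time constants are hypothetical and chosen only for illustration."""
    n = int(round(t_end / DT))
    t = np.arange(n) * DT
    stim = ((t >= onset) & (t < onset + pulse_s)).astype(float)
    fast = np.zeros(n)  # toy stand-in for a unimodal region
    slow = np.zeros(n)  # toy stand-in for a transmodal region
    for i in range(1, n):
        fast[i] = fast[i - 1] + DT / tau_fast * (stim[i - 1] - fast[i - 1])
        slow[i] = slow[i - 1] + DT / tau_slow * (fast[i - 1] - slow[i - 1])
    return t, fast, slow

def half_max_duration(r):
    """Time (s) during which the response is at or above half of its peak."""
    return float(np.sum(r >= 0.5 * r.max()) * DT)

def saturation_duration(peaks, durations, level=0.9):
    """Shortest tested pulse duration whose peak response reaches `level`."""
    for d, p in zip(durations, peaks):
        if p >= level:
            return d
    return None

# Response to a single 250 ms pulse
t, fast, slow = simulate(0.25)
peak_fast, peak_slow = t[np.argmax(fast)], t[np.argmax(slow)]
dur_fast, dur_slow = half_max_duration(fast), half_max_duration(slow)

# Peak response as a function of pulse duration
durations = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
peaks = [(f.max(), s.max()) for _, f, s in (simulate(d) for d in durations)]
sat_fast = saturation_duration([p[0] for p in peaks], durations)
sat_slow = saturation_duration([p[1] for p in peaks], durations)

print("time of peak (s):", peak_fast, peak_slow)
print("half-max duration (s):", dur_fast, dur_slow)
print("pulse duration needed for saturation (s):", sat_fast, sat_slow)
```

Under these assumptions, the slow unit peaks later and stays active longer than the fast unit, and it needs a considerably longer input to reach saturation.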

Together, these simulation results are compatible with a recent study by Wengler and colleagues25. They investigate fMRI-based ACW in subsequent regions of three sensory-based input streams: sensorimotor, visual, and auditory. In their data, the ACW shows a short-to-long gradient from primary over secondary sensory to higher-order sensory regions (like frontal eye field and dorsolateral prefrontal cortex). This holds for all three systems. Albeit indirectly, this supports our view of the fast–slow gradient mediating the continuous down-sampling of the input when transitioning from lower-order unimodal to higher-order transmodal regions.

Input sampling II: unimodal–transmodal hierarchy mediates fast–slow gradient of down-sampling

Mathematically speaking, the fast–slow gradient with continuous down-sampling entails a shift from faster to slower frequencies in the input stochastics, that is, a decrease in its maximum interpretable frequency. The more input processing advances towards the transmodal end of the fast–slow gradient, the more its fast frequency-based temporal precision decreases. This means that different inputs may no longer be distinguishable from each other, something which, in signal processing, is described as aliasing103 (Fig. 7a right). Accordingly, the stronger the down-sampling and thus the lower the sampling rate of the input, the lower the maximum frequency that can be reconstructed from a discrete signal, i.e., a signal after a sampling process103,104.
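Aliasing can be illustrated with a minimal Python example. The frequencies and sampling rates are hypothetical: a 5 Hz and a 30 Hz sine are both sampled at 100 Hz and then reduced to 25 Hz by keeping every fourth sample without any prior filtering. Since the maximum reconstructable frequency is half the sampling rate (here 12.5 Hz), the 30 Hz input can no longer be told apart from the 5 Hz input.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum[1:]) + 1])

fs = 100.0                  # original sampling rate (hypothetical)
t = np.arange(400) / fs     # 4 s of signal
slow_input = np.sin(2 * np.pi * 5 * t)    # 5 Hz input
fast_input = np.sin(2 * np.pi * 30 * t)   # 30 Hz input

step = 4                    # keep every 4th sample -> 25 Hz, no anti-alias filter
fs_low = fs / step
slow_low = slow_input[::step]
fast_low = fast_input[::step]

f_fast_original = dominant_frequency(fast_input, fs)
f_slow_low = dominant_frequency(slow_low, fs_low)
f_fast_low = dominant_frequency(fast_low, fs_low)

print("30 Hz input at 100 Hz sampling:", f_fast_original)
print("5 Hz input at 25 Hz sampling:", f_slow_low)
print("30 Hz input at 25 Hz sampling:", f_fast_low)
print("indistinguishable after down-sampling:", np.allclose(slow_low, fast_low))
```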

Fig. 7: Input sampling in the uni- and transmodal units.

a The sub-sampling of a signal and the resulting shift in frequency. The more the input signal is sub-sampled, the more it shifts toward slower frequencies (indexed by the median frequency). The first of the three rows shows the original signal with a sampling frequency of 100 Hz in the time (left) and frequency (right) domains. The second row shows the same signal after sub-sampling, that is, the new sampling frequency is 50 Hz. The spectral content of this signal is shifted toward slower frequencies compared to the previous one. The third row shows the same signal after a further sub-sampling to 25 Hz. b The same concept as in (a) applied to the brain’s uni- and transmodal regions. The input is processed by both unimodal and transmodal regions, each time passing through a sampling machine and thus shifting toward slower frequencies (slow mode).

On the neuronal side, down-sampling along the fast–slow gradient of the input stochastics results in a shift of the spectral content towards slower frequencies, which can be easily indexed by the median frequency. In fact, for signals where most of the spectral content is found in slow frequencies, as in the brain’s fMRI and EEG/MEG signals, this is exactly what one can observe. They have a 1/f distribution with stronger power in slow frequencies and less power in faster frequencies.

Analogously, the down-sampling process carried out throughout the unimodal–transmodal hierarchy of the INT necessarily leads to a removal of the faster frequencies to achieve good resolution of the inputs’ slow frequencies103,104. Throughout the course of the unimodal to transmodal hierarchical processing, one loses the inputs’ information in the faster frequencies but preserves its disproportionally strong slow frequency information (see Fig. 7a for a simulation of this model using pink noise). A reduction in the sampling frequency causes the information to be maintained in slower frequencies with their higher spectral power at the cost of losing the detailed information provided by the faster frequencies. In other words, the original inputs may be shifted towards the slower frequencies in the transmodal regions. Down-sampling along a fast–slow gradient may thus be optimal for preserving the maximal amount of input information along its whole temporal range including both fast and slow frequency input components.
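The shift of the median frequency under down-sampling can be sketched as follows. This is not the simulation behind Fig. 7a but a minimal stand-in under stated assumptions: pink noise is synthesized by assigning amplitudes proportional to 1/sqrt(f) to random phases, and each down-sampling step removes the frequencies above the new Nyquist limit with an ideal low-pass filter before discarding samples. The sampling rates of 100, 50, and 25 Hz follow the figure caption; everything else is hypothetical.

```python
import numpy as np

def pink_noise(n, rng):
    """Signal with power proportional to 1/f, built from random phases."""
    k = np.arange(1, n // 2 + 1)
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[1:] = np.exp(1j * rng.uniform(0, 2 * np.pi, k.size)) / np.sqrt(k)
    return np.fft.irfft(spectrum, n)

def down_sample(x, fs, factor):
    """Ideal low-pass at the new Nyquist frequency (an assumption of this sketch),
    then keep every `factor`-th sample. Returns the new signal and sampling rate."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > fs / (2 * factor)] = 0.0
    return np.fft.irfft(spectrum, len(x))[::factor], fs / factor

def median_frequency(x, fs):
    """Frequency below which half of the total (non-DC) spectral power lies."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cumulative = np.cumsum(power[1:])
    return float(freqs[1:][np.searchsorted(cumulative, 0.5 * cumulative[-1])])

rng = np.random.default_rng(1)
x100 = pink_noise(8192, rng)             # treated as sampled at 100 Hz
x50, fs50 = down_sample(x100, 100.0, 2)  # 50 Hz
x25, fs25 = down_sample(x100, 100.0, 4)  # 25 Hz

medians = [median_frequency(x100, 100.0),
           median_frequency(x50, fs50),
           median_frequency(x25, fs25)]
print("median frequency (Hz) at 100, 50, 25 Hz sampling:", medians)
```

With each step, the fast-frequency content is lost while the strong slow-frequency content is retained, so the median frequency decreases.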

The assumption of such fast–slow down-sampling along the unimodal–transmodal hierarchy is compatible with the empirical data, that is, the relation of ACW with spectral content, i.e., the power spectrum. The longer ACW in transmodal regions is related to stronger power in infraslow frequencies as compared with faster ones. Shorter ACWs in unimodal regions are more dominated by their (relative) shift towards faster frequencies, meaning that their median frequency is higher12,15,44,65,105,106 (Fig. 7b). This raises the question of the role or function of slower frequencies like delta (1–4 Hz), slow cortical potentials in the 0.1–1 Hz range, and the infraslow frequencies (0.001–0.1 Hz) in input processing. Unlike faster frequencies like gamma (>30 Hz), beta (14–30 Hz), alpha, and theta101,102, the role or function of these slower frequencies is currently unclear (see also refs. 106,107,108).

We know that INT are closely linked to slow frequencies and their long cycle durations (see also refs. 12,21). Longer cycle durations mean that more inputs can be temporally integrated within one cycle and subsequently be processed together. The longer INT, especially in the transmodal regions, thus sample12,21 the inputs in favor of a slow mode: the originally faster inputs are filtered and processed in a slow frequency way, i.e., sub-sampling (see refs. 12,13,15,93,108,109,110; see also ref. 107). On a more cognitive level, this means that internally oriented cognition like mind-wandering111 or mental time travel112, as mediated by transmodal regions, may be characterized by predominant slow modes (see also ref. 113).

One may now raise yet another question. We assumed that the INT is key in input encoding, that is, encoding the stochastics of the input. One would now assume that such input encoding should be closely related to the input sampling in terms of fast–slow down-sampling. Do input encoding and input sampling converge? This is indeed supported by a recent combined human EEG and modeling study. SanCristobal and colleagues114 show that the neural processing of the input stochastics of a looming sound (3 s) is directly related to the resting state’s ACW: the longer the resting state’s ACW, the better the task-related power changes in delta, alpha, and beta could track the physical dynamics of the looming sound. This supports the assumption that the resting state’s INT display the capacity for input sampling with the consequent bias towards the slow frequency mode.

Next, SanCristobal et al.114 complement these data by a computational model probing for the slower mode (via Ornstein-Uhlenbeck process): a longer ACW exerts a lower sampling frequency with a shift towards slower frequencies in input sampling. This in turn makes it impossible to obtain information from faster frequencies. Together, these findings support the notion that INT mediates input sampling by tilting or biasing it towards the slow frequency mode.
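The link between a longer ACW and a slower spectral mode can be illustrated with a generic Ornstein-Uhlenbeck process. The sketch below is not the model of SanCristobal et al.; it simulates a unit-variance process for two hypothetical time constants and uses the standard result that its autocorrelation decays as exp(-lag/tau), so that the lag at half-maximum equals tau times ln 2. The 0.5 criterion and all parameter values are assumptions.

```python
import numpy as np

def simulate_ou(tau, n, dt, rng):
    """Exact discretization of a unit-variance Ornstein-Uhlenbeck process."""
    decay = np.exp(-dt / tau)
    scale = np.sqrt(1.0 - decay ** 2)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = decay * x[i - 1] + scale * noise[i]
    return x

def median_frequency(x, dt):
    """Frequency below which half of the total (non-DC) spectral power lies."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    cumulative = np.cumsum(power[1:])
    return float(freqs[1:][np.searchsorted(cumulative, 0.5 * cumulative[-1])])

def theoretical_acw(tau):
    """Lag (s) at which the autocorrelation exp(-lag / tau) equals 0.5."""
    return float(tau * np.log(2.0))

rng = np.random.default_rng(2)
dt, n = 0.01, 2 ** 16   # 100 Hz sampling, about 655 s of signal
results = {}
for tau in (0.1, 1.0):  # hypothetical short and long time constants (s)
    x = simulate_ou(tau, n, dt, rng)
    results[tau] = (theoretical_acw(tau), median_frequency(x, dt))

for tau, (acw_s, f_med) in results.items():
    print(f"tau = {tau} s: ACW = {acw_s:.3f} s, median frequency = {f_med:.3f} Hz")
```

In this toy setting, the process with the longer time constant has both the longer ACW and the lower median frequency, in line with a bias towards the slow frequency mode.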

Even more important, these findings suggest that the seemingly stochastic nature of input encoding converges with the fast–slow gradient of down-sampling. Is the fast–slow gradient of down-sampling, with the emphasis on slow-frequency encoding, the best way of bridging the timescale differences of the brain and environment? The timescales of the environment extend far beyond those of the brain, especially in the slow frequency ranges (for instance, the brain cannot process the ultra-slow frequency ranges of seismic earth waves). The fast–slow gradient of down-sampling may then be best suited for overcoming the timescale differences between the environmental context (where the input is coming from) and the brain in order to best encode the former’s slow frequency modes.

Conclusion

How can the brain process temporally complex inputs such as music and language and, even better, integrate them into one meaningful whole as, for instance, a melody or a sentence? We propose that the intrinsic neural timescales (INT) take on a key role or function for the brain’s input processing. This is supported by the major role of INT in both resting and task states, including carry-over from rest to task. Following these findings, we propose that a key function of the INT consists in the dynamic shaping and structuring of input processing. This concerns input sharing across species as well as input encoding through matching the stochastics of both environment and brain. That may be mediated by two key mechanisms: (I) input integration vs. segregation on temporal grounds and (II) fast–slow down-sampling along the unimodal–transmodal hierarchy of the INT. Taken together, their key role in input processing through distinct mechanisms renders INT highly relevant for current views of the brain’s function, including its role in mental features and psychiatric disorders (Box 2), as well as for designing artificial intelligence (Box 3).