Abstract
Intrinsic timescales characterize the dynamics of endogenous fluctuations in neural activity. Variation of intrinsic timescales across the neocortex reflects functional specialization of cortical areas, but less is known about how intrinsic timescales change during cognitive tasks. We measured intrinsic timescales of local spiking activity within columns of area V4 in male monkeys performing spatial attention tasks. The ongoing spiking activity unfolded across at least two distinct timescales, fast and slow. The slow timescale increased when monkeys attended to the receptive field location and correlated with reaction times. By evaluating predictions of several network models, we found that spatiotemporal correlations in V4 activity were best explained by the model in which multiple timescales arise from recurrent interactions shaped by spatially arranged connectivity, and attentional modulation of timescales results from an increase in the efficacy of recurrent interactions. Our results suggest that multiple timescales may arise from the spatial connectivity in the visual cortex and flexibly change with the cognitive state due to dynamic effective interactions between neurons.
Introduction
The brain processes information and coordinates behavioral sequences over a wide range of timescales^{1,2,3}. While sensory inputs can be processed as fast as tens of milliseconds^{4,5,6,7}, cognitive processes such as decision-making or working memory require integrating information over slower timescales from hundreds of milliseconds to minutes^{8,9,10}. These differences are paralleled by the timescales of intrinsic fluctuations in neural activity across the hierarchy of cortical areas. The intrinsic timescales are defined by the exponential decay rate of the autocorrelation of activity fluctuations. The intrinsic timescales are faster in sensory areas, intermediate in association cortex, and slower in prefrontal cortical areas^{11}. The hierarchy of intrinsic timescales is observed across different recording modalities including spiking activity^{11,12}, intracranial electrocorticography (ECoG)^{13,14}, and functional magnetic resonance imaging (fMRI)^{15,16}. The hierarchy of intrinsic timescales reflects the specialization of cortical areas for behaviorally relevant computations, such as the processing of rapidly changing sensory inputs in lower cortical areas and long-term integration of information (e.g., for evidence accumulation, planning, etc.) in higher cortical areas^{17}.
In addition to ongoing fluctuations characterized by intrinsic timescales, neural firing rates also change in response to sensory stimuli or behavioral task events. These stimulus- or task-induced dynamics are characterized by the timescales of trial-average neural response^{18,19} or encoding various task events over multiple trials^{12,20}. The task-induced timescales also increase along the cortical hierarchy^{12,14,20,21,22}. However, task-induced and intrinsic timescales are not correlated across individual neurons in any cortical area^{12}, suggesting they may arise from different mechanisms. Indeed, the timescales of trial-average response increase through the mouse visual cortical hierarchy, whereas the intrinsic timescales do not change^{22}. Moreover, the task-induced and intrinsic timescales can depend differently on task conditions. For example, for a fixed trial-average response in a specific task condition, the intrinsic timescale of neural dynamics varies substantially across trials, and these changes are predictive of the reaction time in a decision-making task^{23}. While task-induced timescales relate directly to task execution, less is known about how intrinsic timescales change during cognitive tasks. Intrinsic timescales measured with ECoG exhibit a widespread increase across multiple cortical association areas during working memory maintenance, consistent with the emergence of persistent activity in this period^{13}. However, whether intrinsic timescales can change with temporal and spatial specificity in local neural populations processing specific information during a task has not been tested. It is also unclear whether intrinsic timescales can flexibly change in sensory cortical areas and in cognitive processes other than memory maintenance.
The mechanism underlying the diversity of intrinsic timescales across cortical areas can be related to differences in connectivity. The hierarchical organization of timescales correlates with the gradients in the strength of neural connections in different cortical areas^{24,25}. These gradients exhibit an increase through the cortical hierarchy in the spine density on dendritic trees of pyramidal neurons^{26,27}, gray matter myelination^{13,28}, expression of N-methyl-D-aspartate (NMDA) and gamma-aminobutyric acid (GABA) receptor genes^{13,29}, strength of structural connectivity measured using diffusion MRI^{16}, or strength of functional connectivity^{15,16,30,31,32}.
The relation between connectivity and timescales is further supported by computational models. Differences in timescales across cortical areas can arise in network models from differences in the strength of recurrent excitatory connections^{27,33}. These models matched the strength of excitatory connections to the spine density of pyramidal neurons^{27} or to the strength of structural connectivity^{33} in different cortical areas. Moreover, models demonstrate that the topology of connections, in addition to the connection strength, can affect the timescales of network dynamics. For example, slower timescales emerge in networks with clustered connections compared to random networks^{34}, and heterogeneity in the strength of inter-node connections gives rise to diverse localized timescales in a one-dimensional network^{35}. Thus, network models can relate dynamics to connectivity and generate testable predictions to identify mechanisms underlying the generation of intrinsic timescales in the brain.
We examined how the intrinsic timescales of spiking activity in visual cortex were affected by the trial-to-trial alterations in the cognitive state due to visual spatial attention. We analyzed spiking activity recorded from local neural populations within cortical columns in primate area V4 during two different spatial attention tasks and a fixation task. In all tasks, the autocorrelation of intrinsic activity fluctuations showed at least two distinct timescales, one fast and one slow. The slow timescale was longer on trials when monkeys attended to the receptive fields of the recorded neurons and correlated with the monkeys’ reaction times. We used recurrent network models to test several alternative mechanisms for generating the multiplicity of timescales and their flexible modulation. We established analytically that spatially arranged connectivity generates multiple timescales in local population activity and found support for this theoretical prediction in our V4 recordings. In contrast, heterogeneous biophysical properties of individual neurons alone cannot account for both the temporal and spatial structure of V4 correlations. Thus, the V4 timescales arise from spatiotemporal population dynamics shaped by the local spatial connectivity structure. The model indicates that modulation of timescales during attention can be explained by a slight increase in the efficacy of recurrent interactions. Our results suggest that multiple intrinsic timescales in local population activity arise from the spatial network structure of the neocortex, and the slow timescales can flexibly adapt to trial-to-trial changes in the cognitive state due to dynamic effective interactions between neurons.
Results
Multiple timescales in fluctuations of local neural population activity
We analyzed spiking activity of local neural populations within cortical columns of visual area V4 from monkeys performing a fixation task (FT) and two different spatial attention tasks (AT1, AT2)^{36,37} (Fig. 1a–c, Supplementary Fig. 1). The activity was recorded with 16-channel linear array microelectrodes from vertically aligned neurons across all cortical layers such that the receptive fields (RFs) of neurons on all channels largely overlapped. In FT, the monkey was rewarded for fixating on a blank screen for 3 s on each trial (Fig. 1a). During AT1, the monkeys were trained to detect changes in the orientation of a grating stimulus in the presence of three distractor stimuli and to report the change with a saccade to the opposite location (antisaccade, Fig. 1b). On each trial, a cue indicated the stimulus that was most likely to change, which was the target of covert attention, and the stimulus opposite to the cue was the target of overt attention due to the antisaccade preparation. During AT2, the monkey was rewarded for detecting a small luminance change in a grating stimulus in the presence of a distractor stimulus placed in the opposite hemifield. The monkey reported the change by releasing a bar. An attentional cue on each trial indicated the stimulus where the change should be detected, which was the target of covert attention (Fig. 1c).
We analyzed the timescales of fluctuations in local spiking activity by computing the autocorrelations (ACs) of spike counts in 2 ms bins. Previous laminar recordings showed that the neural activity is synchronized across cortical layers, alternating spontaneously between synchronous phases of high and low firing rates^{36,38}. Therefore, we pooled the spiking activity across all layers (Fig. 1d) to obtain more accurate estimates of the spike-count autocorrelations. The shape of spike-count autocorrelations in our data deviated from a single exponential decay. In logarithmic-linear coordinates, an exponential decay corresponds to a straight line with a constant slope. The spike-count autocorrelations exhibited more than one linear slope, with a steep initial slope followed by shallower slopes at longer lags (Fig. 1e). The multiple decay rates in the autocorrelations indicate the presence of multiple timescales in the fluctuations of local population spiking activity.
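The spike-count autocorrelation underlying this analysis can be sketched as follows. This is a minimal illustration, not the paper's exact analysis pipeline; the synthetic latent-rate process and all parameter values are assumptions chosen only to produce an exponentially decaying autocorrelation.

```python
import numpy as np

def spike_count_autocorrelation(counts, max_lag):
    """Trial-averaged autocorrelation of binned spike counts.

    counts: array of shape (n_trials, n_bins), e.g., spikes in 2 ms bins.
    Returns the autocorrelation at lags 1..max_lag (lag 0 is excluded
    because it is dominated by spiking noise).
    """
    x = counts - counts.mean(axis=1, keepdims=True)
    var = (x ** 2).mean()
    return np.array([(x[:, :-lag] * x[:, lag:]).mean() / var
                     for lag in range(1, max_lag + 1)])

# Synthetic example: a latent rate with a 100 ms timescale (AR(1) process),
# sampled in 2 ms bins and passed through Poisson spiking.
rng = np.random.default_rng(0)
dt, tau = 2.0, 100.0                        # bin size and rate timescale, ms
n_trials, n_bins = 200, 500
a = np.exp(-dt / tau)
rate = np.zeros((n_trials, n_bins))
for t in range(1, n_bins):
    rate[:, t] = a * rate[:, t - 1] + np.sqrt(1 - a**2) * rng.normal(size=n_trials)
counts = rng.poisson(np.clip(1.0 + 0.5 * rate, 0.0, None))
ac = spike_count_autocorrelation(counts, max_lag=100)  # decays with tau ~ 100 ms
```

In logarithmic-linear coordinates this single-timescale autocorrelation forms a straight line; the V4 autocorrelations described above deviate from this shape.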
To verify the presence of multiple timescales and to accurately estimate their values from autocorrelations, we used a method based on adaptive Approximate Bayesian Computations (aABC, Methods)^{39}. This method overcomes the statistical bias in autocorrelations of finite data samples, which undermines the accuracy of conventional methods based on direct fitting of the autocorrelation with exponential decay functions. The aABC method estimates the timescales by fitting the spike-count autocorrelation with a generative model that can have single or multiple timescales and incorporates spiking noise. The method accounts for the finite data amount, non-Poisson statistics of the spiking noise, and differences in the mean and variance of firing rates across experimental conditions. The aABC method returns a posterior distribution of timescales that quantifies the estimation uncertainty and allows us to compare alternative hypotheses about the number of timescales in the data.
We fitted each autocorrelation with a one-timescale (M_{1}) and a two-timescale (M_{2}) generative model and selected the optimal number of timescales by approximating the Bayes factor obtained from the posterior distributions of the fitted models (Fig. 2a, Supplementary Fig. 2, Methods). The majority of autocorrelations were better described by the model with two distinct timescales (M_{2}) than with the one-timescale model (Fig. 2a, b). The presence of two distinct timescales (fast τ_{1} and slow τ_{2}) was consistent across both spontaneous (i.e., in the absence of visual stimuli, τ_{1,MAP} = 8.87 ± 0.78 ms, τ_{2,MAP} = 85.82 ± 15.9 ms, mean ± s.e.m. across sessions, MAP: maximum a posteriori estimate from the multivariate posterior distribution) and stimulus-driven activity (τ_{1,MAP} = 5.05 ± 0.51 ms, τ_{2,MAP} = 135.87 ± 9.35 ms, mean ± s.e.m.), and across all monkeys, while the precise values of timescales were heterogeneous, reflecting subject- or session-specific characteristics (Fig. 2c). Although it is possible that autocorrelations contained more than two timescales, with our data amount, the three-timescale model did not provide a better fit than the two-timescale model (Supplementary Fig. 3). Thus, the two-timescale model provided a parsimonious description of neural dynamics in our data.
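The signature of two timescales can be illustrated with a direct double-exponential fit to an idealized autocorrelation shape. Unlike the aABC method, a direct fit does not correct for finite-sample bias or spiking noise, so this is only a sketch; the timescale values and mixture weights are assumptions loosely based on the stimulus-driven estimates reported above.

```python
import numpy as np
from scipy.optimize import curve_fit

lags = np.arange(1.0, 301.0)            # time lags, ms
tau_fast, tau_slow = 5.0, 135.0         # assumed fast and slow timescales (ms)
ac = 0.6 * np.exp(-lags / tau_fast) + 0.4 * np.exp(-lags / tau_slow)

def one_exp(t, a, tau):
    return a * np.exp(-t / tau)

def two_exp(t, a, tau1, b, tau2):
    return a * np.exp(-t / tau1) + b * np.exp(-t / tau2)

p1, _ = curve_fit(one_exp, lags, ac, p0=[1.0, 50.0])
p2, _ = curve_fit(two_exp, lags, ac, p0=[0.5, 10.0, 0.5, 100.0])

# Sum of squared residuals: the two-timescale model fits far better, and in
# log-linear coordinates the curve shows two slopes, not a single line.
sse1 = np.sum((ac - one_exp(lags, *p1)) ** 2)
sse2 = np.sum((ac - two_exp(lags, *p2)) ** 2)
```

A Bayes-factor comparison of posterior distributions, as used in the paper, plays the role that the residual comparison plays in this toy example.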
Slow timescales are modulated during spatial attention
Next, we examined whether the intrinsic timescales of spiking activity were modulated during spatial attention. We compared the timescales estimated from the stimulus-driven activity on trials when the monkeys attended toward the RF location of the recorded neurons (attend-in condition, covert or overt) versus the trials when they attended outside the RF location (attend-away condition). In this analysis, we included recording sessions in which the autocorrelations were better fitted with two timescales in both attend-away and attend-in (covert or overt) conditions. We compared the MAP estimates of the fast τ_{1} and slow τ_{2} timescales between attend-in and attend-away conditions across recording sessions.
We found that the slow timescale was significantly longer during both covert and overt attention relative to the attend-away condition (covert: mean τ_{2,att−in} = 140.69 ms, mean τ_{2,att−away} = 115.07 ms, p = 3 × 10^{−4}, N = 32; overt: mean τ_{2,att−in} = 141.31 ms, mean τ_{2,att−away} = 119.58 ms, p = 7 × 10^{−4}, N = 26; two-sided Wilcoxon signed-rank test) (Fig. 3), while there was no significant change in the fast timescale during attention (covert: mean τ_{1,att−in} = 5.53 ms, mean τ_{1,att−away} = 5.54 ms, p = 0.75, N = 32; overt: mean τ_{1,att−in} = 3.42 ms, mean τ_{1,att−away} = 4.12 ms, p = 0.39, N = 26; two-sided Wilcoxon signed-rank test). The increase in the slow timescale with attention was evident in individual recording sessions when comparing the marginal posterior distributions of τ_{2} for attend-in versus attend-away conditions (Fig. 3a, d). The significant increase of τ_{2} was observed in 24 out of 32 individual sessions during covert attention, and 22 out of 26 individual sessions during overt attention. Both fast and slow timescales varied across sessions, but were not significantly different between covert and overt attention (p > 0.05 for both τ_{1} and τ_{2}, two-sided Wilcoxon signed-rank test, Supplementary Fig. 4). The increase in τ_{2} was not due to an increase in the firing rate with attention, since the aABC method accounts for the differences in the firing rate across behavioral conditions (Methods), and τ_{2} was not correlated with the mean firing rate of population activity (Supplementary Fig. 5). The increase of slow timescales during attention is consistent with the reduction in the power of low-frequency fluctuations in local field potentials^{37,40,41,42} and spiking activity^{43} (Supplementary Note 1, Supplementary Figs. 6, 7).
The modulation of the slow timescale was consistent across both attention tasks (AT1 and AT2) and each monkey, and appeared in response to trial-to-trial changes in the cognitive state of the animal directed by the attention cue. These results suggest that different mechanisms control the fast and slow timescales of ongoing spiking activity, and the mechanisms underlying the slow timescale can flexibly adapt according to the animal’s behavioral state.
To test whether attentional modulation of timescales was relevant for behavior, we analyzed the relationship between timescales and monkeys’ reaction times in the attention tasks. We quantified the relationship between the average reaction times of monkeys’ responses in each session (see Supplementary Fig. 1 for details of the experiment) and the MAP estimated timescales of spiking activity using linear mixed-effects models fitted separately in attend-in and attend-away conditions (Fig. 4, Methods, Supplementary Tables 1, 2). The linear mixed-effects models had a separate intercept for each monkey to account for individual differences between the monkeys and attention tasks (AT1 and AT2). The reaction times were negatively correlated with the slow timescales in the attend-in condition (combined covert and overt) (slope = −0.16 ± 0.066, mean ± 95% confidence interval (CI); p = 9 × 10^{−6}, F-test; N = 58, R^{2} = 0.62), but not in the attend-away condition (slope = 0.015 ± 0.12, p = 0.79, N = 32, R^{2} = 0.69). Fast timescales were not correlated with the reaction times (attend-in: slope = 0.0016 ± 0.86, p = 0.997, N = 58, R^{2} = 0.46; attend-away: slope = 0.53 ± 0.94, p = 0.26, N = 32, R^{2} = 0.70). Thus, on average, monkeys responded to a stimulus change faster in sessions with longer slow timescales of neurons with receptive fields in the attended location. The spatial selectivity of this effect suggests that the increase in the slow timescale may be related to behavioral benefits of selective spatial attention.
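The regression with a separate intercept per monkey can be sketched with ordinary least squares on dummy-coded monkey identities. This is a simplified stand-in for the linear mixed-effects models used in the paper, and all data values below are synthetic and hypothetical, generated only to show that the per-monkey-intercept design recovers a shared negative slope.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic session data (hypothetical values): slow timescale tau2 (ms)
# and mean reaction time (ms) per session, for two monkeys with
# different baseline reaction times.
n = 20
monkey = np.repeat([0, 1], n)
tau2 = rng.uniform(80.0, 200.0, 2 * n)
baseline = np.array([420.0, 380.0])[monkey]
rt = baseline - 0.16 * tau2 + rng.normal(0.0, 5.0, 2 * n)

# Design matrix: one dummy intercept per monkey plus a shared tau2 slope.
X = np.column_stack([(monkey == 0).astype(float),
                     (monkey == 1).astype(float),
                     tau2])
beta, *_ = np.linalg.lstsq(X, rt, rcond=None)
slope = beta[2]   # recovers a negative slope close to the planted -0.16
```

A full mixed-effects treatment would additionally model the monkey intercepts as random effects, but the fixed-effects sketch above captures the key design choice: individual baselines are absorbed so that the slope reflects the within-subject timescale-behavior relationship.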
Mechanisms for generating multiple timescales in local population dynamics
What mechanisms can generate multiple timescales in the local population activity? One possibility is that multiple timescales reflect biophysical properties of individual neurons within a local population. For example, two timescales can arise from mixing heterogeneous timescales of different neurons^{44,45} or combining different biophysical processes, such as a fast membrane time constant and a slow synaptic time constant^{46}. Alternatively, multiple timescales in local population activity can arise from spatiotemporal population dynamics in networks with spatially arranged connectivity^{47}.
Analyses of well-isolated single-unit activity (SUA) would be ideal for testing whether multiple timescales in local V4 population activity reflect the mixing of heterogeneous timescales of individual neurons or dynamics shared by the population. However, due to low firing rates, SUA did not yield sufficient data for conclusive model comparison. We fitted autocorrelations of SUA during the fixation task (which had the longest trial duration of 3 s and thus the largest data amount per trial) and performed the model comparison to determine the number of timescales. While some single units clearly showed two distinct timescales, the model comparison was inconclusive for most units because autocorrelations were dominated by noise due to the low data amount (Supplementary Note 2, Supplementary Fig. 8). We, therefore, turned to computational modeling to test possible alternative mechanisms for generating multiple timescales.
To determine which mechanism, local biophysical properties or spatial network interactions, is consistent with neural dynamics in V4, we developed three recurrent network models, each with a different mechanism for timescale generation (Fig. 5). We implemented all mechanisms within the same modeling framework. The models consist of binary units arranged on a two-dimensional lattice corresponding to lateral dimensions in the cortex (Fig. 5a–c). Each unit represents a small population of neurons, such as a cortical minicolumn^{48,49}, and is connected to 8 other units in the network. The activity of unit i at timestep \(t'\) is described by a binary variable \(S_i(t') \in \{0,1\}\) representing high (1) and low (0) firing-rate states of a local population^{36}. The activity \(S_i(t')\) stochastically transitions between states driven by the self-excitation (probability p_{s}), excitation from the connected units (probability p_{r}), and the stochastic external excitation (probability p_{ext} ≪ 1) delivered to each unit (Methods). The self-excitation probability describes intrinsic dynamics of a unit in the absence of network interactions, arising from biophysical properties of neurons or reverberation within a local population (via the vertical connectivity within a minicolumn). The self-excitation generates a timescale τ_{self}, which is the autocorrelation timescale of a two-state Markov process: \(\tau_{\mathrm{self}} = -(\ln p_{\mathrm{s}})^{-1}\) (Methods, Supplementary Note 3). The recurrent excitation p_{r} accounts for horizontal interactions between units. The sum of all interaction probabilities is the local branching parameter, BP = p_{s} + 8p_{r}, describing the expected number of units activated by a single active unit i.
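For an isolated unit (no recurrent input), the self-excitation timescale τ_self = −(ln p_s)^{−1} can be checked against a direct simulation of the two-state Markov process; the parameter values below are illustrative, not fitted.

```python
import numpy as np

rng = np.random.default_rng(2)
p_s, p_ext = 0.9, 0.01        # self-excitation and weak external drive
n_steps = 200_000

# Isolated unit as a two-state Markov chain: an active unit stays active
# with probability p_s; an inactive unit activates with probability p_ext.
s = np.zeros(n_steps, dtype=np.int8)
for t in range(1, n_steps):
    p_on = p_s if s[t - 1] else p_ext
    s[t] = rng.random() < p_on

# The chain's autocorrelation decays as lam**lag with lam = p_s - p_ext,
# i.e., with timescale -1/ln(lam) ~ -1/ln(p_s) timesteps for p_ext << 1.
x = s - s.mean()
var = (x ** 2).mean()
ac = np.array([(x[:-lag] * x[lag:]).mean() / var for lag in (1, 5, 10)])
lam = p_s - p_ext
tau_self = -1.0 / np.log(p_s)   # ~ 9.5 timesteps for p_s = 0.9
```

The empirical autocorrelation matches the geometric decay lam**lag, confirming that a single unit by itself produces only one (fast) timescale.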
The models differ in the mechanism generating multiple timescales in the local population activity. In two models, connectivity is random and multiple timescales arise locally from biophysical properties of individual units. In the third model, connectivity is spatially organized and multiple timescales arise from recurrent interactions between units^{47}.
The first model assumes that two timescales in local population activity reflect aggregated activity of different neuron types with distinct (fast and slow) biophysical timescales (e.g., membrane time constants), which we modeled as two types of units (A and B), each with a different self-excitation probability (p_{s,A}, p_{s,B}, Fig. 5a). We placed two units, A and B, at each vertex of the lattice and summed their activity to obtain a local population activity as in the columnar recordings. Connections between units of any type are random. As expected, the autocorrelation of local population activity exhibits two distinct timescales corresponding to the self-excitation timescales of the two unit types (Fig. 5d).
The second model assumes that two timescales arise from two local biophysical processes, e.g., a fast membrane time constant and a slow synaptic time constant (Fig. 5b)^{46}. We modeled the membrane time constant with the fast self-excitation timescale, and the synaptic time constant as a low-pass filter of the input to each unit with a slow time constant τ_{synapse} (Methods)^{46}. The connectivity between units is random. The autocorrelation of an individual unit’s activity in this model exhibits two timescales corresponding to the membrane (τ_{self}) and synaptic (τ_{synapse}) time constants (Fig. 5e).
Finally, in the third model, multiple timescales arise from recurrent dynamics shaped by the spatial network connectivity, akin to the horizontal connectivity in primate visual cortex^{49}. Each model unit is connected to 8 nearby units (Fig. 5c). Although each unit has only a single self-excitation timescale, the unit’s autocorrelation exhibits multiple timescales, with a fast decay at short time lags and a slower decay at longer time lags (Fig. 5f). The fast initial decay corresponds to the self-excitation timescale. The slow autocorrelation decay is generated by recurrent interactions among units in the network. In simulations, the slow autocorrelation decay closely matches the autocorrelation of the net recurrent input received by a unit from its neighbors (excluding the self-excitation input).
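A minimal simulation of the third model illustrates how the slow timescale emerges from recurrent interactions alone. The lattice size and transition probabilities are illustrative choices, and the update rule (a probabilistic OR over self-excitation, recurrent excitation from active neighbors, and external drive) is a simplification of the model defined in Methods.

```python
import numpy as np

rng = np.random.default_rng(3)
L = 20                                   # lattice side; 8 nearest neighbors per unit
p_s, p_r, p_ext = 0.8, 0.0225, 0.002     # BP = p_s + 8*p_r = 0.98, near critical
n_steps = 5000

S = (rng.random((L, L)) < 0.1).astype(float)
activity = np.empty((n_steps, L, L))
for t in range(n_steps):
    # number of active units among the 8 nearest neighbors (periodic boundaries)
    nbr = sum(np.roll(np.roll(S, dx, axis=0), dy, axis=1)
              for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0))
    # a unit is active unless it escapes self-excitation, every active
    # neighbor's recurrent excitation, and the external drive
    p_on = 1.0 - (1.0 - p_s * S) * (1.0 - p_r) ** nbr * (1.0 - p_ext)
    S = (rng.random((L, L)) < p_on).astype(float)
    activity[t] = S

# Single-unit autocorrelation averaged over all units: at long lags it stays
# far above the isolated-unit prediction p_s**lag, reflecting the slow
# timescale generated by recurrent interactions.
x = activity - activity.mean(axis=0)
var = (x ** 2).mean()
ac = np.array([(x[:-lag] * x[lag:]).mean() / var for lag in (1, 10, 40)])
```

Without recurrent interactions, the autocorrelation at lag 40 would be roughly p_s^40 ≈ 10^{-4}; in the spatially connected network it remains orders of magnitude larger, which is the slow tail visible in Fig. 5f.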
To understand how recurrent interactions generate slow timescales, we analytically computed the autocorrelation timescales of the unit’s activity in the network with spatial connectivity, using the master equation for binary units with Glauber dynamics^{50} (Methods, Supplementary Note 4, details in ref. ^{47}). We found that the slow decay of the autocorrelation contains a mixture of interaction timescales τ_{int,k}. Each τ_{int,k} arises from recurrent interactions on a different spatial scale, characterized by the modes of correlated fluctuations with different spatial frequencies k in the Fourier space (Methods). For each spatial frequency k, the interaction timescale depends on both the probability of horizontal interactions (p_{r}) and the self-excitation probability (p_{s}) (Methods, Eq. (24)). Shorter interaction timescales arise from higher spatial frequency modes (larger k), which correspond to persistent activity in local neighborhoods, and longer timescales are generated by more global interactions (smaller k)^{47}. The longest timescale in the network is characterized by the global interaction timescale related to the zero spatial frequency mode (Methods, Eq. (25)). We can approximate the slow decay of the autocorrelation with a single effective interaction timescale (τ_{int}) defined as a weighted average of all interaction timescales (Methods, Eq. (27)). Therefore, the autocorrelation shape is well approximated with two timescales: the fast self-excitation timescale and the slow effective interaction timescale.
Generating multiple timescales in spatial networks does not require strictly structured connectivity. Systematically changing the connectivity from structured to random reveals that networks with an intermediate level of local connectivity also exhibit multiple timescales in local dynamics (Fig. 6, Supplementary Note 5). However, as the connectivity becomes more random, most interaction timescales shrink toward the self-excitation timescale; only the global timescale does not depend on the network structure. Hence, networks with different connectivity have the same global timescale (Fig. 6, inset). In fully random networks, the autocorrelation of a unit’s activity effectively exhibits only two distinct timescales: the self-excitation timescale and the global interaction timescale. However, the global timescale has a very small relative contribution to local autocorrelations (it scales with the inverse number of neurons in the network) and is hard to observe empirically, as it requires data with excessively long trial durations.
While all three mechanisms account for multiple timescales in V4 autocorrelations, they can be distinguished by cross-correlations between local population activity at different spatial distances. In models with random connectivity, cross-correlations do not depend on the distance between units on the lattice (Fig. 5g, h). In contrast, the model with spatial connectivity predicts that both the strength and timescales of cross-correlations depend on distance (Fig. 5i). Specifically, the zero time-lag cross-correlations decrease with distance. Moreover, cross-correlations contain multiple timescales equal to the interaction timescales in autocorrelations (Methods), but no self-excitation timescale, since self-excitation is independent across units. With increasing distance, the weights of timescales generated by local interactions (high spatial frequency modes) decrease, and timescales generated by more global interactions (low spatial frequency modes) dominate cross-correlations. Thus, cross-correlations become weaker and dominated by slower timescales at longer distances (analytical derivations in Methods, details in ref. ^{47}). Approximating the shape of auto- and cross-correlations with two effective timescales, the theory predicts that both timescales in cross-correlations are larger than in the autocorrelation and increase with distance. Therefore, by measuring timescales of cross-correlations at different distances, we can determine which mechanism, spatial network interactions or local biophysical properties, is more consistent with neural dynamics in V4.
To test these model predictions in our V4 recordings, we computed cross-correlations between population activity on different channels during spontaneous activity (monkey G in FT, monkey N in AT2), which had the longest trial durations for better detection of slow timescales (Methods). Columnar recordings generally exhibit slight horizontal displacements, which manifest in a systematic shift of receptive fields (RFs) across channels^{51}. We used distances between the RF centers (RF-center distance) as a proxy for horizontal cortical distances^{51}. For each monkey, we divided the cross-correlations into two groups with larger (d_{RF,L}) and smaller (d_{RF,S}) RF-center distances than the median distance (monkey G: 0 < d_{RF,S} < 2.08, 2.08 < d_{RF,L} < 5; monkey N: 0 < d_{RF,S} < 0.77, 0.77 < d_{RF,L} < 2.25; all distances in degrees of visual angle, dva) and averaged the cross-correlations within each group. For comparison, we also computed the average autocorrelation of population activity on individual channels (i.e., without pooling spikes across channels). The differences between auto- and cross-correlations of V4 data appeared smaller than in the model, since horizontal displacements between channels were relatively small, sampling mainly within the same or nearby columns^{51}.
The cross-correlations of V4 activity exhibited distinct fast and slow decay rates, as predicted by the spatial network model (Fig. 5j, Supplementary Fig. 9, left). In agreement with the spatial network model, zero time-lag cross-correlations decreased with increasing RF-center distance (monkey G: mean for d_{RF,S} = 0.047, d_{RF,L} = 0.040, p = 4 × 10^{−4}, N = 152; monkey N: mean for d_{RF,S} = 0.022, d_{RF,L} = 0.013, p = 0.001, N = 128, two-sided Wilcoxon rank-sum test), consistent with the reduction of pairwise noise correlations with lateral distance in V4^{51,52}. The shapes of V4 auto- and cross-correlations were well approximated by fitted two-timescale generative models (Fig. 5j, Supplementary Fig. 9, left), and the estimated posterior distributions allowed us to compare auto- and cross-correlation timescales at different distances (Fig. 5k, Supplementary Fig. 9, right). Both fast and slow timescales were smaller in autocorrelations than in cross-correlations (fast timescale: monkey G, mean τ_{1,AC} = 10.11 ms, τ_{1,CC,S} = 12.24 ms, τ_{1,CC,L} = 14.19 ms; monkey N, mean τ_{1,AC} = 4.93 ms, τ_{1,CC,S} = 12.18 ms, τ_{1,CC,L} = 12.34 ms; slow timescale: monkey G, mean τ_{2,AC} = 75.46 ms, τ_{2,CC,S} = 83.94 ms, τ_{2,CC,L} = 101.94 ms; monkey N, mean τ_{2,AC} = 26.53 ms, τ_{2,CC,S} = 358.07 ms, τ_{2,CC,L} = 552.70 ms; number of samples in each posterior N = 100, all p-values < 10^{−10}, two-sided Wilcoxon rank-sum test). Both fast and slow timescales of cross-correlations increased with the RF-center distance in both monkeys, but the increase in the fast timescale did not reach statistical significance in monkey N (τ_{2}: p < 10^{−10}; τ_{1}: p_{G} < 10^{−10}, p_{N} = 0.36, two-sided Wilcoxon rank-sum test), possibly due to the narrower range of RF-center distances in monkey N compared to monkey G (median d_{RF,N} = 0.77, d_{RF,G} = 2.08 dva). Thus, predictions of the spatial network model, but not of the models with random connectivity, were borne out by the data.
These results suggest that multiple timescales in local population activity in V4 arise from the recurrent dynamics shaped by the spatial connectivity of the primate visual cortex and not from local biophysical processes alone. Local biophysical mechanisms can also contribute to generating multiple neural timescales. For example, spatial connectivity combined with synaptic filtering can give rise to multiple autocorrelation timescales (Supplementary Fig. 10). The dependence of crosscorrelation timescales on distance indicates that dominant timescales in the local population activity reflect the spatial network structure.
Changes in the efficacy of network interactions modulate local timescales
We used the spatial network model to investigate which mechanisms can underlie the modulation of the slow timescales during attention. We matched the timescales between the model with local connectivity (r = 1) and experimental data to determine which changes in the model parameters can explain the attentional modulation of timescales in V4. We matched the self-excitation and effective interaction timescales of a model unit to, respectively, the fast and slow timescales of V4 activity (mean timescale ± s.e.m., Methods) for both the attend-away and attend-in (averaged over covert and overt) conditions (Fig. 7). We used a combination of analytical approximations and model simulations to find parameters that produce timescales similar to the V4 data (Methods).
We found that to reproduce the timescales in V4, the model needs to operate close to the critical point BP = 1 (Fig. 7b). At the critical point, each unit activates one other unit on average, resulting in self-sustained activity^{53}. Close to this regime, the timescales are flexible, such that small changes in the network excitability give rise to significant changes in timescales. To increase the slow timescale during attention, the total excitability of the network interactions should increase, shifting the network dynamics closer to the critical point. The overall increase in the interaction strength can be achieved by increasing the strength of either the self-excitation (p_{s}) or the recurrent interactions (p_{r}). Increasing p_{r} while keeping p_{s} constant allows for substantial changes in the slow timescale and a nearly unchanged fast timescale, consistent with the V4 data. The increase of p_{s} in the model produces a slight increase in the fast timescale (τ_{1}) (~0.4 ms on average), but such small changes in τ_{1} would be undetectable with our available data amount (the uncertainty of the τ_{1} MAP estimate is ±0.9 ms on average, Fig. 3b, e). The increase in p_{s} can also be counterbalanced by a reduction in p_{r} to produce the observed changes of timescales.
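The sensitivity of timescales near the critical point can be illustrated with the standard branching-process relation, in which a timescale scales as −1/ln(BP); this closed form is an assumption for illustration, not the paper's exact Eq. (25).

```python
import math

def branching_timescale(bp):
    """Timescale (in timesteps) of a branching-type network, assuming it
    scales as -1/ln(BP); valid only for subcritical BP < 1."""
    return -1.0 / math.log(bp)

# Illustrative BP values: a small increase in total excitability near the
# critical point BP = 1 roughly doubles the slow timescale.
tau_away = branching_timescale(0.98)   # ~ 49.5 timesteps
tau_in   = branching_timescale(0.99)   # ~ 99.5 timesteps
```

This divergence of the timescale as BP approaches 1 is why operating near criticality makes the slow timescale flexible: attentional changes in p_s or p_r need only be small to produce the observed timescale modulation.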
Several mechanisms can account for changes in the strength of recurrent interactions during attention. For example, the increase in p_{s} is consistent with the observation that interactions between cortical layers in V4 increase during attention^{42}, when p_{s} is interpreted as the strength of vertical recurrent interactions within cortical minicolumns. A reduction in p_{r} can be mediated by neuromodulatory effects that reduce the efficacy of lateral connections in the cortex during attention^{54}. In addition, our analytical derivations show that in the model with nonlinear recurrent interactions, the effective strengths of recurrent interactions can also be changed by external input (Methods, details in ref. ^{47}). The input alters the operating regime of the network dynamics, changing the effective strength of recurrent interactions. Thus, with nonlinear interactions, timescales can be modulated by the input to the network, such as top-down inputs from higher cortical areas during attention^{51,55}. Altogether, our model suggests that attentional modulation of timescales can arise from changes in the efficacy of recurrent interactions in visual cortex, mediated by neuromodulation or top-down attentional inputs.
Discussion
We found that ongoing spiking activity of local neural populations within columns of area V4 unfolded across fast and slow timescales, both in the presence and absence of visual stimuli. The slow timescale increased when monkeys attended to the receptive fields' location, showing that local intrinsic timescales can change flexibly from trial to trial according to selective attention. Furthermore, the slow timescales of neurons with RFs in the attended location correlated with the monkeys' reaction times, suggesting that the increase in the slow timescale may contribute to the behavioral benefits of selective spatial attention. To understand the mechanisms underlying the multiplicity and flexible modulation of timescales, we developed network models linking intrinsic timescales to the biophysical properties of individual neurons or to the spatial connectivity structure of the visual cortex. Only the spatial network model correctly predicted the distance dependence of spatiotemporal correlations in V4, indicating that multiple timescales in V4 dynamics arise from the spatial connectivity of primate visual cortex. The model suggests that slow timescales increase with the effective strength of recurrent interactions.
Multiple intrinsic timescales in neural activity
Previous studies characterized the autocorrelation of ongoing neural activity with a single intrinsic timescale^{11,13,15,16}. The intrinsic timescale was usually measured for neural populations, either by averaging the autocorrelations of single neurons in one area^{11} or by using coarse-grained measurements such as ECoG^{13} or fMRI^{15,16}. Thus, ongoing dynamics in each area were described with a single intrinsic timescale that varied across areas. We extended this view by showing that, within one area, local population activity exhibits multiple intrinsic timescales. These timescales reflect ongoing dynamics on single trials and are not driven by task events. Our results suggest that the multiplicity of timescales is an intrinsic property of neural activity arising from inherent cellular and network properties of the cortex.
We show that multiple timescales in local dynamics can emerge from the spatial connectivity structure in a recurrent network model. The presence of two dominant timescales (τ_{self}, τ_{int}) in local dynamics depends on the combination of structured connectivity and strong, mean-driven interactions between units. Networks with random connectivity (Fig. 6b) or weak, diffusion-type interactions^{51} exhibit only one dominant timescale in local activity (Supplementary Note 6). Moreover, local biophysical properties alone cannot explain the dependence of spatiotemporal neural correlations on lateral distance in the cortex, highlighting the importance of spatial network interactions for generating multiple timescales in local population activity.
In our network model with local spatial connectivity, recurrent interactions across different spatial scales induce multiple slow timescales. To generate multiple slow timescales, our network operates close to a critical point. Spiking networks with spatial connectivity can generate fast correlated fluctuations that emerge from instability at particular spatial frequency modes^{56}. Slow fluctuations of firing rates can also arise in networks with clustered random connectivity, but interactions between clusters induce only a single slow timescale^{34}. We show that more local spatial connectivity (smaller r) leads to slower dynamics and modifies the weights and composition of timescales in the local activity. The timescale of the global activity, on the other hand, is the same across networks with distinct local timescales and different connectivity structures. These results show that local temporal and spatial correlations of neural dynamics are closely tied together.
In our model, integrating activity over larger spatial scales leads to the disappearance of faster interaction timescales (higher spatial frequencies), leaving only slower interaction timescales (lower spatial frequencies) in the coarse-grained activity. At the extreme, the global network activity exhibits only the slowest interaction timescale (the global timescale). This mechanism may explain the prominence of slow dynamics in meso- and macroscale measures of neural activity such as LFP or fMRI^{57}, while faster dynamics dominate in local measures such as spiking activity. The model predicts that the slowest interaction timescales have very small weights in the autocorrelation of local neural activity and thus can be detected in local activity only with exceedingly long recordings. Indeed, infraslow timescales (on the order of tens of seconds and minutes) are evident in cortical spiking activity recorded over hours^{58}.
Functional relevance of neural activity timescales
Intrinsic timescales are thought to define the predominant role of neurons in cognitive processes^{17}. For example, in the orbitofrontal cortex, neurons with long intrinsic timescales are more involved in decision-making and the maintenance of value information^{44}. In the prefrontal cortex (PFC), neurons with short intrinsic timescales are primarily involved in the early phases of working-memory encoding^{31}, while neurons with long timescales play a significant role in coding and maintaining information during the delay period^{31,45}. Our finding that intrinsic timescales can flexibly change from trial to trial (and across epochs within a trial^{13}) suggests the possibility that task-induced timescales may correspond with intrinsic timescales only during specific task phases. These results may explain why the task-induced timescales of single neurons do not correlate with intrinsic timescales measured over the entire task duration^{12}.
We found that timescales of local neural activity changed from trial to trial depending on the attended location. A previous ECoG study found that the intrinsic timescale of neural activity in cortical association areas increased after engagement in a working-memory task^{13}. Our findings go beyond this earlier work by showing that the modulation of timescales can be functionally specific, as it selectively affects only neurons representing the attended location within the retinotopic map. While changes in timescale due to task engagement could be mediated by slow global processes such as arousal, the retinotopically precise modulation of timescales requires local changes targeted to task-relevant neurons. Our results further show that the modulation of timescales also occurs in sensory cortical areas and in cognitive processes other than memory maintenance^{13}, which explicitly requires temporal integration of information. The correlation of slow timescales with reaction times during attention may be functionally relevant, potentially allowing neurons to integrate information over longer durations.
Longer timescales during attention in the model are associated with shifting the network dynamics closer to a critical point. A shift toward criticality was also suggested as a mechanism for the increase in gamma-band synchrony and stimulus discriminability during attention^{59}. Furthermore, strong recurrent dynamics close to the critical point can flexibly control the dimensionality of neural activity^{60}. Hence, operating closer to the critical point during attention might help optimize neural responses to environmental cues and improve information processing^{61}.
Mechanisms for attentional modulation of timescales
Changes in the slow timescale of neural activity due to attention occurred from one trial to another. Such swift changes cannot be due to significant changes in the underlying network structure and require a fast mechanism. Our model suggests that the modulation of slow timescales during attention can be explained with a slight increase in the network excitability mediated by an increase in the efficacy of horizontal recurrent interactions, or by an increase in the efficacy of vertical interactions accompanied by a decrease in the strength of horizontal interactions.
Several physiological processes may underlie these network mechanisms in the neocortex. Top-down inputs during attention can enhance local excitability in cortical networks^{55}. Our analytical derivations show that inputs can increase the effective strength of recurrent interactions between neurons in networks with nonlinear interactions, similar to previous models^{18,62}. The similar modulation of timescales during covert and overt attention suggests that top-down attentional inputs arrive from brain areas that represent both attention-related and saccade-related information. The frontal eye field (FEF) is a possible source of such modulations^{37,63,64}. Furthermore, feedback connections from higher visual areas such as PFC or the temporo-occipital area (TEO) to lower visual areas have terminal arborizations broader than the size of the receptive fields in the lower areas^{65,66}. These feedback inputs can coordinate activity across minicolumns in V4. Moreover, vertical interactions in V4 measured with local field potentials (LFPs) increase during attention^{42}, while neuromodulatory mechanisms can reduce horizontal interactions. The level of acetylcholine (ACh) can modify the efficacy of synaptic interactions during attention in a selective manner^{54}: an increase in ACh strengthens thalamocortical synaptic efficacy by acting on nicotinic receptors and reduces the efficacy of horizontal recurrent interactions by acting on muscarinic receptors. A decrease in horizontal interactions is also consistent with the proposed reduction of the spatial correlation length during attention^{51}. These observations suggest that an increase in vertical interactions combined with a decrease in horizontal interactions is a likely mechanism for the modulation of the slow timescale during attention.
To identify the biophysical mechanisms of timescale modulation, experiments with a larger number of longer trials are required to provide tighter bounds on the estimated timescales. Additionally, detailed biophysical models can help distinguish different mechanisms, since biophysical and cell-type-specific properties of neurons might also be involved in defining neural timescales^{67,68}. In particular, the diverse timescales observed across single neurons within one area^{17,31,44,45,69} call for models with a heterogeneous parameter space and can have computational implications for the brain^{70}. Here, we used the RF-center distances as a proxy for spatial distances in the cortex. Experiments with spatially organized recording sites would make it possible to study the relation between temporal and spatial correlations more directly. Furthermore, developing recurrent network models that perform the selective attention task can help to find direct links between the modulation of dynamics and task performance. Finally, perturbation experiments that selectively modulate top-down inputs or neuromodulatory levels can provide the most direct test of the underlying mechanisms.
Our findings reveal that targeted neural populations can integrate information over variable timescales following changes in the cognitive state. Our model suggests that local interactions between neurons via the spatial connectivity of primate visual cortex can underlie the multiplicity and flexible modulation of intrinsic timescales. Our experimental observations combined with the computational model provide a basis for studying the link between the network structure, functional brain dynamics, and flexible behavior.
Methods
Behavioral tasks and electrophysiology recordings
We used previously published datasets^{36,37,71,72,73}. Experimental procedures for the fixation task and attention task 1 were in accordance with the NIH Guide for the Care and Use of Laboratory Animals, the Society for Neuroscience Guidelines and Policies, and the Stanford University Animal Care and Use Committee. Experimental procedures for attention task 2 were in accordance with the European Communities Council Directive RL 2010/63/EC, the NIH Guidelines for the Care and Use of Animals for Experimental Procedures, and the UK Animals Scientific Procedures Act. Three male monkeys (Macaca mulatta, between 6 and 9 years of age) were used in the experiments. Monkeys were motivated by scheduled fluid intake during experimental sessions and a juice reward.
On each trial of the fixation task (FT, monkey G), the monkey was rewarded for fixating a central dot on a blank screen for 3 s. In attention task 1 (AT1, monkeys G and B), the monkey detected orientation changes in one of four peripheral grating stimuli while maintaining central fixation. Each trial started with the monkey fixating a central dot on the screen; after several hundred milliseconds (170 ms for monkey B and 333 ms for monkey G), four peripheral stimuli appeared. Following a 200–500 ms period, a central attention cue indicated the stimulus that was likely to change with ~90% validity. The cue was a short line extending from the fixation dot toward one of the four stimuli, chosen randomly on each trial with equal probability. After a variable interval (600–2200 ms), all four stimuli disappeared for a brief moment and reappeared. Monkeys were rewarded for correctly reporting the change in orientation of one of the stimuli (50% of trials) with an antisaccade to the location opposite to the change, or for maintaining fixation if none of the orientations changed. Due to the anticipation of the antisaccade response, the cued stimulus was the target of covert attention, while the stimulus in the location opposite to the cue was the target of overt attention. In attend-in conditions, the cue pointed either to the stimulus in the RFs of the recorded neurons (covert attention) or to the stimulus opposite to the RFs (overt attention). The remaining two cue directions were attend-away conditions.
In attention task 2 (AT2, Newcastle cohort, monkey N), the monkey detected a small luminance change within the white phase of a static square-wave grating. The monkey initiated a trial by holding a bar and visually fixating a fixation point. The color of the fixation point indicated the level of spatial certainty (red: narrow focus; blue: wide focus). After 500 ms, a cue appeared indicating the location and focus of the visual field to attend to. The cue was switched off after 250 ms. After another second, two gratings appeared, one in the center of the RFs and one diametrically opposite with respect to the fixation point. The grating at the position indicated by the cue was the test stimulus; the other grating served as the distractor. After at least 500 ms, a small luminance change (dimming) occurred either in the center of the grating (narrow focus) or in one of 12 peripheral positions (wide focus). If the dimming occurred in the distractor grating first, the monkey had to ignore it. The monkey was rewarded for a bar release within 750 ms of the dimming in the test grating; the faster the monkey reacted, the larger the reward it received. Two grating sizes (small and large) were used in this experiment. We analyzed trials with the small grating to avoid surround-suppression effects created by the large grating extending beyond the neurons' summation area^{74}.
Recordings were performed in visual area V4 with linear-array microelectrodes inserted perpendicularly to the cortical layers. Data were amplified and recorded using the Omniplex system (Plexon) in AT1 and FT, and with the Digital Lynx recording system (Neuralynx) in AT2. Arrays were placed such that the receptive fields of the recorded neurons largely overlapped. Each array had 16 channels with 150 μm center-to-center spacing. In AT1 and FT, all 16 channels were visually responsive. In AT2, the number of visually responsive channels per recording ranged between 8 and 12, with a median of 9.
Computing autocorrelations of neural activity
We computed autocorrelations from multi-unit (MUA) spiking activity recorded in the presence (stimulus-driven) and absence (spontaneous) of visual stimuli (brown and yellow frames in Supplementary Fig. 1). For spontaneous activity, we analyzed spikes during the 3-s fixation epoch in FT and during the 800 ms epoch from 200 ms after the cue offset until the stimulus onset in AT2. For stimulus-driven activity, we analyzed spikes in the epoch from 400 ms after the cue onset until the stimulus offset in AT1, and from 200 ms after the stimulus onset until the dimming in AT2. For the stimulus-driven activity, trials in both attention tasks had variable durations (500–2200 ms). Thus, we computed autocorrelations in non-overlapping windows of 700 ms for AT1 and 500 ms for AT2. On long trials, we used as many windows as fit within the trial duration, and we discarded trials that were shorter than the window size. The window durations were selected such that we had at least 50 windows for each condition in each session; 3 out of 25 recording sessions in monkey G (AT1) were excluded due to short trial durations. For spontaneous activity, the windows were 3 s in FT and 800 ms in AT2.
We computed the average spike-count autocorrelation for each recording session. On each trial, we pooled the spikes from all visually responsive channels and counted the pooled spikes in 2 ms bins. For each behavioral condition (stimulus orientation, attention condition), we averaged spike counts at each time bin across trials and subtracted the trial average from the spike counts at each bin^{11} to remove correlations due to changes in firing rate locked to the task events. We segmented the mean-subtracted spike counts \(A({t}_{i}^{{\prime} })\) into windows of the same length N, where \({t}_{i}^{{\prime} }\) (i = 1…N) indexes bins within a window. We then computed the autocorrelation in each window as a function of time lag t_{j}^{39}:

\({{\rm{AC}}}({t}_{j})=\frac{1}{(N-j)\,{\hat{\sigma }}^{2}}{\sum }_{i=1}^{N-j}\left(A({t}_{i}^{{\prime} })-{\hat{\mu }}_{1}(j)\right)\left(A({t}_{i+j}^{{\prime} })-{\hat{\mu }}_{2}(j)\right).\qquad (1)\)
Here \({\hat{\sigma }}^{2}=\frac{1}{N-1}{\sum }_{i=1}^{N}\left({A({t}_{i}^{{\prime} })}^{2}-\frac{1}{{N}^{2}}{({\sum }_{i=1}^{N}A({t}_{i}^{{\prime} }))}^{2}\right)\) is the sample variance, and \({\hat{\mu }}_{1}(j)=\frac{1}{N-j}{\sum }_{i=1}^{N-j}A({t}_{i}^{{\prime} })\) and \({\hat{\mu }}_{2}(j)=\frac{1}{N-j}{\sum }_{i=j+1}^{N}A({t}_{i}^{{\prime} })\) are two different sample means. In Eq. (1) for the autocorrelation, we subtracted window-specific means to remove correlations due to slow changes in firing rate across trials, such as slow fluctuations related to changes in the arousal state. Thus, the range of timescales was limited to the trial duration. These timescales reflect the intrinsic neural dynamics within single trials. Finally, we averaged the autocorrelations over windows of the same behavioral condition separately for each recording session. The exact method of computing autocorrelations does not affect the estimated timescales, since we use the same method for computing autocorrelations of synthetic data when fitting generative models with the aABC method^{39}.
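A minimal sketch of this estimator (our illustration, not the authors' code); `A` is one window of mean-subtracted spike counts binned at 2 ms, and the lag-dependent sample means follow the definitions above:

```python
import numpy as np

def window_autocorrelation(A, max_lag):
    """Spike-count autocorrelation of one window of mean-subtracted
    counts A, with lag-dependent sample means (a sketch of the
    estimator described in the text)."""
    N = len(A)
    # sample variance: (sum A^2 - (sum A)^2 / N) / (N - 1)
    var = (np.sum(A**2) - A.sum()**2 / N) / (N - 1)
    ac = np.empty(max_lag + 1)
    for j in range(max_lag + 1):
        mu1 = A[:N - j].mean()   # mean over bins 1 .. N-j
        mu2 = A[j:].mean()       # mean over bins j+1 .. N
        ac[j] = np.sum((A[:N - j] - mu1) * (A[j:] - mu2)) / ((N - j) * var)
    return ac
```

In practice this would be applied per window and averaged over windows of the same behavioral condition, as described in the text.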
In AT1, we averaged autocorrelations over trials with different stimulus orientations for each attention condition, since all attention conditions contained about the same number of trials with each orientation. For stimulus-driven activity in AT2, we first estimated timescales separately for the wide- and narrow-focus conditions and found no significant differences (two-sided Wilcoxon signed-rank test between MAP estimates, p > 0.05). Thus, we averaged the autocorrelations of the narrow- and wide-focus conditions and refitted the average autocorrelations. The same procedure was applied to the spontaneous activity in AT2; since there was no significant difference in timescales between the different focus or attention conditions (two-sided Wilcoxon signed-rank test between MAP estimates for the two-by-two conditions, p > 0.05), we averaged the autocorrelations over all conditions and refitted the average autocorrelation.
For estimating the timescales, we excluded sessions with autocorrelations dominated by noise or strong oscillations that could not be well described with a mixture of exponential decay functions. We excluded a session if the autocorrelation fell below 0.01 (\(\log ({{\rm{AC}}})\) fell below −2) at lags smaller than or equal to 20 ms (Supplementary Fig. 11). Based on this criterion, we excluded 3 out of 22 sessions for monkey G in AT1; 8 out of 21 sessions during covert attention and 9 out of 21 during overt attention for monkey B in AT1; and 2 out of 20 sessions for spontaneous activity and 8 out of 20 sessions for stimulus-driven activity for monkey N in AT2. The difference in the number of excluded sessions for monkey N during spontaneous and stimulus-driven activity is explained by the larger amount of data available for computing autocorrelations during spontaneous activity, due to averaging over attention conditions and longer window durations (800 ms vs. 500 ms).
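The exclusion rule can be written compactly as follows (an illustrative sketch; the array names are our own):

```python
import numpy as np

def exclude_session(ac, lags_ms, ac_floor=0.01, max_lag_ms=20):
    """Session-exclusion rule from the text: drop a session whose
    autocorrelation falls below 0.01 at any lag <= 20 ms."""
    return bool(np.any(ac[lags_ms <= max_lag_ms] < ac_floor))
```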
For visualization of autocorrelations, we omitted the zero time lag (t = 0 ms) (examples including the zero time lag are shown in Supplementary Fig. 11). The autocorrelation drop between the zero and the first time lag (t = 2 ms) reflects the difference between the total variance of spike counts and the variance of the instantaneous rate, according to the law of total variance for a doubly stochastic process^{39}. This drop is fitted by the aABC algorithm when estimating the timescales.
Statistics and reproducibility
Three male monkeys were used in the experiments, which is a standard sample size for primate studies^{18,37,42,43}. The animals' ability to perform the task determined the number of trials in each recording session. The number of simultaneously recorded neurons was determined by the properties of the linear multielectrode arrays used for the experiments. Blinding of the investigators was not relevant, since there were no differences between the subjects that would conceivably create biases. Sex of the subjects was not considered in the study design, as the sample size is too small to make any meaningful statements about the impact of sex on the mechanisms we found. Subjects were not allocated into groups, and no randomization was implemented. Information about excluded recording sessions is provided in the previous section.
Estimating timescales with adaptive Approximate Bayesian Computations
We estimated the autocorrelation timescales using the aABC method that overcomes the statistical bias in empirical autocorrelations and provides the posterior distributions of unbiased estimated timescales^{39}. The width of inferred posteriors indicates the uncertainty of estimates. For more reliable estimates of timescales (i.e. narrower posteriors), we selected epochs of experiments with longer trial durations (brown and yellow frames in Supplementary Fig. 1).
The aABC method estimates timescales by fitting the spike-count autocorrelation with a generative model. We used a generative model based on a doubly stochastic process with one or two timescales. Spike counts were generated from a rate governed by a linear mixture of Ornstein–Uhlenbeck (OU) processes (one OU process \({A}_{{\tau }_{k}}\) for each timescale τ_{k}):

\({A}_{{{\rm{OU}}}}({t}^{{\prime} })={\sum }_{k=1}^{n}\sqrt{{c}_{k}}\,{A}_{{\tau }_{k}}({t}^{{\prime} }),\)
where n is the number of timescales and c_{k} are their weights. The aABC algorithm optimizes the model parameters to match the spikecount autocorrelations between V4 data and synthetic data generated from the model. We generated synthetic data with the same number of trials, trial duration, mean and variance of spike counts as in the experimental data. By matching these statistics, the empirical autocorrelations of the synthetic and experimental data are affected by the same statistical bias when their shapes match. Therefore, the timescales of the fitted generative model represent the unbiased estimate of timescales in the neural data.
The spike counts s are sampled for each time bin \([{t}_{i}^{{\prime} },\,{t}_{i+1}^{{\prime} }]\) from a distribution \({p}_{{{\rm{count}}}}(s| \lambda ({t}_{i}^{{\prime} }))\), where \(\lambda ({t}_{i}^{{\prime} })={A}_{{{\rm{OU}}}}({t}_{i}^{{\prime} })\Delta {t}^{{\prime} }\) is the mean spike count and \(\Delta {t}^{{\prime} }={t}_{i+1}^{{\prime} }-{t}_{i}^{{\prime} }\) is the bin size. To capture the possibly non-Poisson statistics of the recorded neurons, we introduce a dispersion parameter α defined as the variance-over-mean ratio of the spike-count distribution \(\alpha={\sigma }_{s| \lambda ({t}_{i}^{{\prime} })}^{2}/\lambda ({t}_{i}^{{\prime} })\). For a Poisson distribution, α is equal to 1. We allow for non-Poisson statistics by sampling the spike counts from a gamma distribution and optimize the value of α together with the timescales and the weights.
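A sketch of this generative model might look as follows; the Euler discretization, the rectification of negative rates, and all parameter names are our simplifying assumptions rather than the authors' exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_mixture_counts(taus, weights, alpha, mean_rate, dt, n_bins):
    """Spike counts from a rate driven by a weighted mixture of OU
    processes, one per timescale tau_k (a sketch; parameter names and
    the clipping of negative rates are our simplifications)."""
    A = np.zeros(n_bins)
    for tau, c in zip(taus, weights):
        x = np.zeros(n_bins)
        for i in range(1, n_bins):
            # Euler step of an OU process with timescale tau
            # (approximately unit variance for dt << tau)
            x[i] = x[i - 1] * (1 - dt / tau) + np.sqrt(2 * dt / tau) * rng.normal()
        A += np.sqrt(c) * x   # sqrt(c_k) so autocorrelation weights are c_k
    # mean spike count per bin; rates rectified to stay positive
    lam = np.clip(mean_rate * (1 + A) * dt, 1e-12, None)
    # gamma counts with variance/mean ratio alpha (alpha = 1 ~ Poisson-like)
    return rng.gamma(shape=lam / alpha, scale=alpha)

counts = ou_mixture_counts(taus=[0.005, 0.08], weights=[0.6, 0.4],
                           alpha=1.0, mean_rate=20.0, dt=0.002, n_bins=500)
```

Matching the trial count, trial duration, and spike-count mean and variance of the recorded data, as the text describes, ensures the synthetic autocorrelation carries the same statistical bias as the empirical one.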
On each iteration of the aABC algorithm, we draw sample parameters from a prior distribution (first iteration) or a proposal distribution (subsequent iterations) defined based on the prior distribution and the parameters accepted on the previous iteration. Then, we generate synthetic data from the sampled parameters and compute the distance d between the autocorrelations of the synthetic and experimental data:

\(d={\left\langle {\left({{\rm{AC}}}_{\exp }({t}_{j})-{{\rm{AC}}}_{{{\rm{syn}}}}({t}_{j})\right)}^{2}\right\rangle }_{{t}_{j}\le {t}_{m}},\)
where t_{m} is the maximum time lag considered in computing the distance. We set t_{m} to 100 ms to avoid overfitting the noise in the tail of the autocorrelations. If the distance is smaller than a predefined error threshold ε, the sampled parameters are accepted and added to the posterior distribution. Each iteration continued until 100 parameter samples were accepted. The initial error threshold was set to ε_{0} = 0.1, and in subsequent iterations, the error threshold was updated to the first quartile of the distances of the accepted samples. The fraction of accepted samples out of all drawn parameter samples is recorded as the acceptance rate accR. The algorithm stops when the acceptance rate falls below accR = 0.0007. The final accepted samples are taken as an approximation of the posterior distribution. We computed the MAP estimates by smoothing the final joint posterior distribution with a multivariate Gaussian kernel and finding its maximum with a grid search.
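The accept/reject core of one such iteration can be sketched as below; `simulate_ac` and `sample_prior` are hypothetical stand-ins for the generative model and the prior/proposal distribution, and the distance is a simple mean squared deviation over lags:

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_iteration(target_ac, simulate_ac, sample_prior, epsilon, n_accept=100):
    """One simplified iteration of an ABC scheme: draw parameters,
    simulate an autocorrelation, accept if the distance is below
    epsilon; return accepted samples, the acceptance rate, and the
    next threshold (first quartile of accepted distances)."""
    accepted, n_drawn = [], 0
    while len(accepted) < n_accept:
        theta = sample_prior()
        n_drawn += 1
        d = np.mean((simulate_ac(theta) - target_ac) ** 2)
        if d < epsilon:
            accepted.append((theta, d))
    acc_rate = n_accept / n_drawn
    next_eps = np.quantile([d for _, d in accepted], 0.25)
    return accepted, acc_rate, next_eps

# toy check: recover a single exponential decay with timescale ~10 bins
lags = np.arange(1, 50)
target = np.exp(-lags / 10.0)
accepted, acc_rate, next_eps = abc_iteration(
    target,
    simulate_ac=lambda th: np.exp(-lags / th),   # hypothetical simulator
    sample_prior=lambda: rng.uniform(1, 20),     # uniform prior over tau
    epsilon=0.01, n_accept=20)
```

The full aABC algorithm additionally adapts the proposal distribution from the previously accepted samples; that machinery is omitted here.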
The choice of summary statistic (e.g., autocorrelations in the time domain or power spectra in the frequency domain, and the fitting range) does not affect the accuracy of estimated timescales and only changes the width of the estimated posteriors^{39}. The frequency-domain fitting converges faster in wall-clock time than the time-domain fitting^{39}. As a control, we also estimated timescales by fitting the whole shape of the power spectral density in the frequency domain. The results of these fits (Supplementary Fig. 7) agreed with the time-domain fits with a limited fitting range (Fig. 3).
We used a multivariate uniform prior distribution over all parameters. For the two-timescale generative model (M_{2}), the priors' ranges were set to
and for the one-timescale generative model (M_{1}) they were set to
Model comparison with adaptive Approximate Bayesian Computations
We used the inferred posteriors from the aABC fit to determine whether the V4 data autocorrelations were better described with the one-timescale (M_{1}) or the two-timescale (M_{2}) generative model^{39}. First, we measured the goodness of fit for each model based on the distribution of distances between the autocorrelation of synthetic data from the generative model and the autocorrelation of the V4 data. We approximated the distributions of distances by generating 1000 realizations of synthetic data from each model, with parameters drawn from the posterior distributions, and computing the distance for each realization. If the distributions of distances were significantly different (two-sided Wilcoxon rank-sum test), we approximated the Bayes factor; otherwise, the summary statistics were not sufficient to distinguish the two models^{75}.
The Bayes factor is the ratio of the marginal likelihoods of the two models and takes into account the number of parameters in each model^{76}. In the aABC method, the ratio between the acceptance rates of the two models for a given error threshold ε approximates the Bayes factor (BF) for that error threshold^{39}:

\({{\rm{BF}}}(\varepsilon )=\frac{{{\rm{accR}}}_{{{\rm{M}}}_{2}}(\varepsilon )}{{{\rm{accR}}}_{{{\rm{M}}}_{1}}(\varepsilon )}.\)
Acceptance rates can be computed using the cumulative distribution function (CDF) of the distances for a given error threshold ε,

\({{\rm{accR}}}_{{{\rm{M}}}_{i}}(\varepsilon )={{\rm{CDF}}}_{{{\rm{M}}}_{i}}(\varepsilon )=\int_{0}^{\varepsilon }{p}_{{{\rm{M}}}_{i}}(d)\,{{\rm{d}}}d,\)
where \({p}_{{{\rm{M}}}_{i}}(d)\) is the probability distribution of distances for model M_{i}. Thus, the ratio between the CDFs of distances approximates the Bayes factor for any chosen error threshold. To eliminate the dependence on a specific error threshold, we computed the acceptance rates and the Bayes factor for varying error thresholds. Since only small errors indicate a well-fitted model, we computed the Bayes factor for all error thresholds smaller than the largest median of the distance distributions of the two models.
The M_{2} model was selected if its distances were significantly smaller than those for the M_{1} model (two-sided Wilcoxon rank-sum test) and \({{\rm{CDF}}}_{{{\rm{M}}}_{2}}(\varepsilon ) \, > \, {{\rm{CDF}}}_{{{\rm{M}}}_{1}}(\varepsilon )\), i.e., BF > 1, for all \(\varepsilon \, < \, {\max }_{{{\rm{M}}}_{1},{{\rm{M}}}_{2}}[{{\rm{median}}}(d)]\) (Supplementary Fig. 2). The same procedure was applied for selecting the M_{1} model. Although the Bayes factor threshold was set at 1, in most cases we obtained BF ≫ 1, indicating strong evidence for the two-timescale model. If the distributions of distances for the two models were not significantly different, or the condition on the ratio between CDFs did not hold for all selected ε (the CDFs crossed), we classified the outcome as inconclusive, meaning that the data statistics were not sufficient to make the comparison.
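The threshold-dependent Bayes factor can be approximated from samples of model-data distances as sketched below (our illustration; `d_m1` and `d_m2` denote distance samples generated from each fitted model's posterior):

```python
import numpy as np

def bayes_factor_curve(d_m2, d_m1, thresholds):
    """Approximate BF(eps) = CDF_M2(eps) / CDF_M1(eps) from empirical
    distance samples (a sketch of the comparison described in the text)."""
    d_m1, d_m2 = np.sort(d_m1), np.sort(d_m2)
    cdf1 = np.searchsorted(d_m1, thresholds, side='right') / len(d_m1)
    cdf2 = np.searchsorted(d_m2, thresholds, side='right') / len(d_m2)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(cdf1 > 0, cdf2 / cdf1, np.inf)

# Toy example: M2 fits better, so its distances are systematically smaller.
rng = np.random.default_rng(2)
d2 = rng.exponential(0.01, size=1000)   # distances under M2
d1 = rng.exponential(0.03, size=1000)   # distances under M1
# evaluate BF only below the largest of the two medians, as in the text
eps = np.linspace(1e-4, max(np.median(d1), np.median(d2)), 50)
bf = bayes_factor_curve(d2, d1, eps)
```

At small thresholds where no M_{1} sample is accepted, the empirical ratio is undefined; the sketch returns infinity there, which is why the text restricts the comparison to thresholds below the larger median.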
Timescales of auto- and cross-correlations of spiking activity on individual channels
We computed the average auto- and cross-correlations of the multi-unit spiking activity recorded on individual channels during spontaneous activity (monkey G in FT, monkey N in AT2). We computed the autocorrelation of each channel's activity using the same procedure described above and then averaged the autocorrelations across channels and recording sessions for each monkey. We computed the cross-correlations between spike counts on every pair of channels (A_{a} and A_{b}) that were at least two channels apart (∣a − b∣ ≥ 2, e.g., channels 1 and 3) as a function of time lag t_{j}:

\({{\rm{CC}}}({t}_{j})=\frac{1}{(N-j)\,{\hat{\sigma }}_{a}{\hat{\sigma }}_{b}}{\sum }_{i=1}^{N-j}\left({A}_{a}({t}_{i}^{{\prime} })-{\hat{\mu }}_{a}(j)\right)\left({A}_{b}({t}_{i+j}^{{\prime} })-{\hat{\mu }}_{b}(j)\right).\)
Here \({\hat{\sigma }}_{a}^{2}\) and \({\hat{\sigma }}_{b}^{2}\) are the sample variances, and \({\hat{\mu }}_{a}(j)=\frac{1}{N-j}{\sum }_{i=1}^{N-j}{A}_{a}({t}_{i}^{{\prime} })\) and \({\hat{\mu }}_{b}(j)=\frac{1}{N-j}{\sum }_{i=j+1}^{N}{A}_{b}({t}_{i}^{{\prime} })\) are the sample means for the activity on each channel. Then, we divided the cross-correlations for each monkey into two groups based on the monkey-specific median RF-center distance and averaged the cross-correlations within each group.
The mapping of RFs was described previously^{36}. RFs were measured by recording spiking responses to brief flashes of stimuli on an evenly spaced 6 × 6 grid covering the lower left visual field (FT) or an evenly spaced 12 × 9 grid centered on the RF (AT2). Spikes in the window 0 − 200 ms (FT) or 50 − 130 ms (AT2) relative to the stimulus onset were averaged across all presentations of each stimulus. First, we assessed the statistical significance of a given RF^{77} and only included channels with a significant RF. Then, we found the RF center as the center of mass of the response map, and estimated the horizontal displacements between the channels by computing the distances between their RF centers.
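The RF-center step can be sketched as a center-of-mass computation on the response map (our own minimal illustration; the significance test of ref. ^{77} is omitted, and a baseline subtraction is added so the weights are non-negative):

```python
import numpy as np

def rf_center(response_map, x_coords, y_coords):
    """RF center as the center of mass of a stimulus-response map."""
    r = np.asarray(response_map, float)
    r = r - r.min()              # non-negative weights (baseline removed)
    w = r / r.sum()
    xs, ys = np.meshgrid(x_coords, y_coords)
    return np.sum(w * xs), np.sum(w * ys)

# Toy 6 x 6 response map with a Gaussian peak near grid position (2, 3)
x, y = np.arange(6), np.arange(6)
xs, ys = np.meshgrid(x, y)
resp = np.exp(-((xs - 2) ** 2 + (ys - 3) ** 2) / 2.0)
cx, cy = rf_center(resp, x, y)   # close to (2, 3)
```

RF-center distances between channels then follow directly as Euclidean distances between these centers.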
We estimated the timescales of auto- and cross-correlations using the aABC method. We assumed that the correlation between channels’ activity can be modeled as a two-timescale OU process shared between the two channels. We fitted the cross-correlation shape with the unnormalized autocorrelation of the shared OU process, such that the variance of the OU process (i.e., the autocorrelation at lag zero) defines the strength of correlations. Thus, we used a two-timescale OU process as the generative model and applied the aABC method to optimize the model parameters by minimizing the distance between the autocorrelation of synthetic data from the OU process and the V4 cross-correlations. The aABC method returned a multivariate posterior distribution for the timescales, their weights, and the variance of the OU process. We computed the distances starting from the first time lag t = 2 ms up to t_{m} = 100 ms. For a fair comparison between the auto- and cross-correlation timescales, we used the same procedure to estimate the timescales of individual channels’ autocorrelations. For fitting the autocorrelation of monkey G, we additionally excluded the time lag t = 2 ms, since AC(t = 2) < AC(t = 4), potentially related to the refractory period of neurons (similar to refs. ^{11,31,44}).
Testing the correlation between timescales and reaction times with linear mixed-effects models
To compute the reaction times for each attention condition, we separated the trials into attend-in (covert and overt, separately) and attend-away conditions. For each recording session and condition, we computed the monkeys’ average reaction time across trials as the average duration between the reappearance of the stimuli and the initiation of the antisaccade response (AT1, only trials with a change in stimulus orientation) or between the dimming of the target stimulus and the bar release (AT2).
We quantified the relationship between average reaction times and MAP estimates of the fast and slow timescales in each session for two attention conditions (attend-in and attend-away). For this analysis, we pooled the data across covert and overt attend-in conditions, resulting in more samples for the attend-in than for the attend-away condition. For each attention condition, we fitted a separate linear mixed-effects model using the “fitlme” function in MATLAB R2021a. In these models, we treated data from each monkey as a separate group (i.e., a random effect) with a separate intercept to account for individual differences between the monkeys and between the two response types in the attention tasks (antisaccade versus bar release).
We fitted two types of models that considered either one or two fixed effects for each attention condition. First, we fitted models that considered as the fixed effect either the slow timescale (τ_{2,cond})
or the fast timescale (τ_{1,cond}),
Here the index cond denotes the attend-in or attend-away condition, RT indicates the reaction time, i is the session index, and m ∈ {G, B, N} indicates the three monkeys. ω_{0} and ω_{1} give the intercept and the slope of the fixed effect with a given p-value. Ω_{0,m} and ε_{i,m} are the random effects, where Ω_{0,m} gives a monkey-specific intercept and ε_{i,m} gives the residuals. We also fitted models that considered both the fast and slow timescales as fixed effects simultaneously,
These models return two fixed-effect coefficients ω_{1,2} with p-values, one for each timescale. The resulting statistics for the two model types were consistent (Supplementary Tables 1 and 2). In the main text, we reported statistics from the first model type (Fig. 4, Supplementary Table 1).
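A simplified numerical analogue of this regression can be sketched in Python (our own illustration, not the MATLAB mixed-effects machinery: a shared fixed-effect slope with dummy-coded per-monkey intercepts, which captures the random-intercept structure for this toy case):

```python
import numpy as np

def fixed_effect_slope(tau, rt, monkey_ids):
    """Shared slope of RT on timescale with a separate intercept per monkey."""
    tau, rt = np.asarray(tau, float), np.asarray(rt, float)
    monkeys = sorted(set(monkey_ids))
    dummies = np.column_stack([[1.0 if m == g else 0.0 for m in monkey_ids]
                               for g in monkeys])     # one column per monkey
    X = np.column_stack([tau, dummies])               # slope + intercepts
    coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
    return coef[0]                                    # the shared slope

# Toy data: common slope -0.5 ms RT per ms of slow timescale,
# plus monkey-specific offsets (values are illustrative).
rng = np.random.default_rng(2)
tau = rng.uniform(80, 180, size=60)
ids = ["G"] * 20 + ["B"] * 20 + ["N"] * 20
offsets = {"G": 300.0, "B": 350.0, "N": 320.0}
rt = np.array([-0.5 * t + offsets[m] for t, m in zip(tau, ids)])
rt += rng.normal(0, 2.0, size=60)
slope = fixed_effect_slope(tau, rt, ids)   # recovers approximately -0.5
```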
Network model with spatially structured connections
The network model operates on a two-dimensional square lattice of size 100 × 100 with periodic boundary conditions. Each unit in the model is connected to 8 other units taken either from its direct Moore neighborhood (local connectivity, Fig. 6a, top) or randomly selected within the connectivity radius r (dispersed connectivity, Fig. 6a, bottom). The activity of each unit is represented by a binary state variable S_{i} ∈ {0, 1} (i = 1…N, where N = 10^{4} is the number of units). The units act as probabilistic integrate-and-fire units^{78} following linear or nonlinear integration rules. The states of the units are updated in discrete time steps \({t}^{{\prime} }\) based on a self-excitation probability (p_{s}), the probability of excitation by the connected units (p_{r}), and the probability of external excitation (p_{ext} ≪ 1). The transition probabilities for each unit S_{i} at time step \({t}^{{\prime} }\) are governed either by additive interaction rules (linear model):
or multiplicative interaction rules (nonlinear model):
Here, ∑_{j}S_{j} indicates the number of active neighbors of unit S_{i} at timestep \({t}^{{\prime} }\). For the analysis in the main text, we used the linear model. The nonlinear model generates similar local temporal dynamics (Supplementary Fig. 12). In the linear model, the sum of connection probabilities BP = p_{s} + 8p_{r} is the branching parameter that defines the state of the dynamics relative to a critical point at BP = 1^{53,78}.
To compute the average local autocorrelation in the network, we simulated the model for 10^{5} time steps and averaged the autocorrelations of individual units. The global autocorrelation was computed from the pooled activity of all units in the network. To compute the autocorrelation of horizontal inputs to a unit i, we simulated the network with an additional “shadow” unit, which was activated by the same horizontal inputs (p_{r}) as unit i but without the inputs p_{s} and p_{ext}. The shadow unit did not activate other units in the network. The autocorrelation of horizontal recurrent inputs was computed from the shadow unit’s activity. We computed the cross-correlations between the activity of each pair of units in the network and averaged the cross-correlations over pairs with the same distance d between units. To have the same number of sample cross-correlations for each distance, we randomly selected 4 × 10^{4} pairs per distance. The spatial distance in the model is defined as the Chebyshev distance on the lattice (e.g., d = 1 is the Moore neighborhood). Each simulation started with a random configuration of active units based on the analytically computed steady-state mean activity (Eq. (21)). Running simulations for long periods allowed us to avoid statistical bias in the model autocorrelations. We set p_{ext} = 10^{−4}, but the strength of the external input in the linear model does not affect the autocorrelation timescales.
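The simulation loop for the linear model with local (Moore-neighborhood) connectivity can be sketched as follows. This is a minimal illustration on a smaller lattice with illustrative parameters; we assume the additive rule takes the form P(S_i → 1) = p_s·S_i + p_r·ΣS_j + p_ext, consistent with the branching parameter BP = p_s + 8p_r:

```python
import numpy as np

def simulate(L=30, steps=2000, p_s=0.82, p_r=0.01, p_ext=1e-3, seed=0):
    """Linear probabilistic model on an L x L lattice with periodic
    boundaries and Moore-neighborhood (8-neighbor) connectivity."""
    rng = np.random.default_rng(seed)
    S = (rng.random((L, L)) < 0.01).astype(float)   # start near steady state
    mean_act = []
    for _ in range(steps):
        # number of active Moore neighbors, via periodic shifts of the lattice
        nbrs = sum(np.roll(np.roll(S, dx, 0), dy, 1)
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   if (dx, dy) != (0, 0))
        p_on = p_s * S + p_r * nbrs + p_ext          # additive (linear) rule
        S = (rng.random((L, L)) < np.clip(p_on, 0.0, 1.0)).astype(float)
        mean_act.append(S.mean())
    return np.array(mean_act)

act = simulate()
bp = 0.82 + 8 * 0.01   # branching parameter BP = 0.9, subcritical
```

With BP = 0.9 the mean activity should hover near p_ext/(1 − BP) = 0.01; auto- and cross-correlations of `S` would then be estimated from much longer runs, as in the text.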
Network model with different unit types
In this model, two unit types A and B are placed at each node of a two-dimensional square lattice (Fig. 5a). The connectivity between the units is random, and each unit is connected to 8 other units of any type.
The activity of each unit is given by a binary state variable S_{i} ∈ {0, 1} with transition probabilities as in the spatial linear model (Eq. (12)), but with different probabilities of self-excitation (p_{self,A}, p_{self,B}) and recurrent interactions (p_{r,A}, p_{r,B}) for each unit type. In order for both unit types to operate in the same dynamical regime, we set p_{self,A} + 8p_{r,A} = p_{self,B} + 8p_{r,B} = BP. Simulations were performed as for the spatial network, but auto- and cross-correlations were computed using the summed activity of the two units A and B at each lattice node.
Network model with synaptic filtering
The model operates on a two-dimensional square lattice, where each unit on the lattice is connected to 8 randomly selected units (Fig. 5b). We define the discrete-time dynamics of units in this model based on a previously proposed continuous rate model with synaptic filtering^{46}. The transition probabilities for each binary unit S_{i} ∈ {0, 1} at time step \({t}^{{\prime} }\) are governed by
Here, f(∑_{j}S_{j}) is a low-pass filter on the recurrent inputs to each unit with the time constant τ_{synapse}, which evolves in discrete time steps:
where \(\Delta {t}^{{\prime} }=1\) ms is the duration of each time step. Simulations and the computation of auto- and cross-correlations were the same as for the spatial network.
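One discrete-time update of such a low-pass filter can be written as a forward-Euler step of df/dt = (−f + input)/τ (a sketch of the assumed filter form; the exact update equation is given in the display above, which we do not reproduce):

```python
def filter_step(f, recurrent_input, tau_synapse, dt=1.0):
    """One Euler step of the synaptic filter df/dt = (-f + input) / tau."""
    return f + (dt / tau_synapse) * (-f + recurrent_input)

# Step response: with a constant input of 4 active neighbors, the filtered
# input relaxes exponentially toward 4 with time constant tau_synapse.
f = 0.0
trace = []
for _ in range(200):
    f = filter_step(f, recurrent_input=4.0, tau_synapse=20.0)
    trace.append(f)
```

The filter smooths fast fluctuations in ∑_{j}S_{j}, which is what lengthens the local timescales in this model variant.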
Analytical derivation of local timescales in the spatial network model
For analytical derivations, we derived a continuous-time rate model corresponding to the linear probabilistic network model (Eq. (12)), with the transition rates defined as
These equations contain two non-interaction terms \({\alpha }_{1}={p}_{{\rm{ext}}}\left[\frac{-\ln ({p}_{s})}{(1-{p}_{s})\Delta {t}^{{\prime} }}\right]\) and \({\alpha }_{2}=(1-{p}_{s}-{p}_{{\rm{ext}}})\left[\frac{-\ln ({p}_{s})}{(1-{p}_{s})\Delta {t}^{{\prime} }}\right]\), and two interaction terms \({\beta }_{1}={\beta }_{2}={p}_{r}\left[\frac{-\ln ({p}_{s})}{(1-{p}_{s})\Delta {t}^{{\prime} }}\right]\), where \(\Delta {t}^{{\prime} }=1\) ms is the duration of each time step (details in ref. ^{47}). For this model, the probability that the units are in a configuration {S} = {S_{1}, S_{2}, . . . , S_{N}} at time \({t}^{{\prime} }\) is denoted \(P(\{S\},\, {t}^{{\prime} })\). The master equation describing the time evolution of \(P(\{S\},{t}^{{\prime} })\) is given by^{50}:
where {S}^{i*} = {S_{1}, S_{2}, . . . , 1 − S_{i}, . . . , S_{N}}. Using the master equation, we can write the time evolution for the first and second moments as
and for the time-delayed quadratic moment at time lag t as
By setting the right-hand side of Eq. (18) to zero and averaging across all units, we can compute the steady-state mean activity
where n = 8 is the number of incoming connections to each unit.
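Eq. (21) itself is not reproduced here, but averaging the linear transition probability over units suggests the mean-field map \(\bar{S}\mapsto ({p}_{s}+n{p}_{r})\bar{S}+{p}_{{\rm{ext}}}\), whose fixed point \(\bar{S}={p}_{{\rm{ext}}}/(1-{\rm{BP}})\) we take as a plausible reading of the steady state (an assumption on our part). A quick numerical check of this fixed point:

```python
def mean_field_fixed_point(p_s, p_r, p_ext, n=8, iters=10000):
    """Iterate the mean-field map S -> BP*S + p_ext and compare its limit
    with the closed-form fixed point p_ext / (1 - BP)."""
    bp = p_s + n * p_r
    s = 0.0
    for _ in range(iters):
        s = bp * s + p_ext       # one mean-field update of the mean activity
    return s, p_ext / (1.0 - bp)

# Subcritical example: BP = 0.96, so S* = 1e-4 / 0.04 = 0.0025
s_iter, s_closed = mean_field_fixed_point(p_s=0.88, p_r=0.01, p_ext=1e-4)
```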
We compute the timescales analytically for the network with local connections (r = 1). From Eq. (20), we can derive the equation for the average autocorrelation of each unit AC(t) as
Here CC(x, t) is the cross-correlation between each unit at location (i, j) and its 8 nearest neighbors x = (i ± 1, j ± 1). The cross-correlation term in this equation gives rise to the interaction timescales in the autocorrelation. By neglecting the cross-correlation term, we can solve Eq. (22) to obtain the self-excitation timescale
Solving the dynamical equation for the time-delayed cross-correlation (Eq. (20)) in the Fourier domain gives the interaction timescales (Supplementary Note 4, details in ref. ^{47}):
where k = (k_{1}, k_{2}) are the spatial frequencies in the Fourier space. For each k we get a different interaction timescale. Smaller k (low spatial frequencies) correspond to interactions on larger spatial scales, whereas larger k (high spatial frequencies) correspond to interactions on more local spatial scales. The largest interaction timescale (the global timescale) is defined based on the zero spatial frequency mode:
In these derivations, we defined distances between units as Euclidean distances and discarded the contributions from third and higher moments.
Considering the self-excitation and interaction (i.e., cross-correlation) terms, we can write the analytical form of the autocorrelation function as
where A is the normalization constant ensuring AC(t = 0) = 1, and \({N}^{{\prime} }\) is the number of units in each dimension: \({N}^{{\prime} }\times {N}^{{\prime} }=N\). This equation shows that the autocorrelation function contains the self-excitation timescale τ_{self} and \({N}^{{\prime} 2}/4\) interaction timescales, each weighted by the amplitude of the cross-correlation function \(\tilde{{\rm{CC}}}({k}_{1},{k}_{2})\) for the given spatial frequency mode (k_{1}, k_{2}). We can approximate the slow decay of the autocorrelation with an effective interaction timescale τ_{int} given by the weighted average of all interaction timescales created by the different spatial frequency modes^{47}:
Here CC(0, 0) is given by \(\mathop{\sum }\nolimits_{{k}_{1},{k}_{2}=0}^{2\pi ({N}^{{\prime} }/2-1)/{N}^{{\prime} }}\tilde{{\rm{CC}}}({k}_{1},\, {k}_{2})\).
The analytical approximation of the effective interaction timescale is more accurate when the dynamics are away from the critical point. Close to the critical point (BP → 1), the mean-field approximations are not valid.
The self-excitation timescale for the discrete-time network model can also be obtained analytically using the autocorrelation of a two-state Markov process driven by the self-excitation and external input. Using the transition matrix (considering the linear model)
we can compute the autocorrelation of the Markov process at time lag t (Supplementary Note 3):
The decay timescale of this autocorrelation is equivalent to the self-excitation timescale in the network model
which for \(\Delta {t}^{{\prime} }=1\) is equivalent to Eq. (23).
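This two-state picture can be checked numerically. With the linear rule and no recurrent input, a plausible form of the (omitted) transition matrix is P(0 → 1) = p_ext and P(1 → 1) = p_s + p_ext (our assumption); its second eigenvalue is then p_s, so the autocorrelation decays with timescale −Δt′/ln(p_s):

```python
import numpy as np

# Two-state Markov chain for a single unit without recurrent input:
# rows are the current state (0 = silent, 1 = active), columns the next state.
p_s, p_ext = 0.9, 1e-4
T = np.array([[1 - p_ext,        p_ext],           # from state 0
              [1 - p_s - p_ext,  p_s + p_ext]])    # from state 1
eigvals = np.sort(np.linalg.eigvals(T).real)
lam = eigvals[0]                  # second eigenvalue (the largest is 1)
tau_self = -1.0 / np.log(lam)     # decay timescale in units of time steps
```

For p_s = 0.9 this gives τ_self = −1/ln(0.9) ≈ 9.5 time steps, independent of p_ext, matching the claim that the external input does not affect the timescale.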
Analytical derivation of timescales for nonlinear interactions
We can write down the general form of transition rates described previously in Eq. (16) as
\({\mathcal{F}}(x)\) is a nonlinear activation function that is a monotonically increasing function of x and satisfies \({\mathcal{F}}(0)=0\), \({\mathcal{F}}(\infty )=1\). Here we consider \({\mathcal{F}}\) of the form:
where θ is a positive constant that controls the gain of recurrent inputs, and n is the number of connected neighbors to each target unit. The activation function with a constant global input current I ⩾ 0 can be written as:
where \(\bar{S}\) is the steady-state mean activity. Here I is a constant input current that uniformly increases the activation of all units, which is different from p_{ext}, which provides stochastic and spatially random activation of units. We interpret I as the attentional input (e.g., from FEF) to area V4.
To compute the timescales in the presence of the nonlinearity and the external input current, we can perform a Taylor expansion of the interaction terms around the mean activity \(\bar{S}\)
where \({{\mathcal{F}}}^{{\prime} }\) denotes the derivative of \({\mathcal{F}}\) and \({{\mathcal{F}}}_{0}\) is defined as
Using these expansions, we can rewrite the transition rates as
where
Hence, all non-interaction and interaction terms, as well as the mean activity \(\bar{S}\), depend on the external input. Consequently, the self-excitation and interaction timescales become input dependent.
The explicit forms of the self-excitation timescale and the global interaction timescale are given by
and
When \(({\beta }_{1}^{{\prime} }-{\beta }_{2}^{{\prime} }) < 0\), increasing the external input I leads to an increase in the mean activity and the self-excitation timescale. This condition implies that already active units are more excitable in the next time step than silent units. Moreover, if in addition to \(({\beta }_{1}^{{\prime} }-{\beta }_{2}^{{\prime} }) < 0\) we have \({\beta }_{1}^{{\prime} }-{\beta }_{2}^{{\prime} }\bar{S}+{\beta }_{1}^{{\prime} } < 0\), the global timescale also increases. Other interaction timescales increase with the input when \({\beta }_{1}^{{\prime} }-{\beta }_{2}^{{\prime} }\bar{S}+{c}_{1}{\beta }_{1}^{{\prime} } < 0\) (with − 1 < c_{1} < 1; details in ref. ^{47}). The changes in the fast timescale are smaller than those in the slow timescale and can remain undetected with a limited amount of data.
Matching the timescales of the network model to neural data
To match the timescales between the model and the V4 data, we used the activity autocorrelation of one unit in the network model with local connections (r = 1). We searched for model parameters such that the model timescales fell within the range of timescales observed in the V4 activity, defined as the mean ± s.e.m. of the MAP timescale estimates across recording sessions. We computed the range for the fast timescales from the pooled attend-in and attend-away conditions, since they were not significantly different: τ_{1,att−away} = τ_{1,att−in} = 4.74 ± 0.42 ms. We used this range for the fast timescale in both the attend-in and attend-away conditions. For the slow timescales, we computed the ranges separately for the attend-in (averaged over covert and overt) and attend-away conditions: τ_{2,att−away} = 117.09 ± 10.58 ms, τ_{2,att−in} = 140.97 ± 11.51 ms.
We fitted the self-excitation and effective interaction timescales, obtained from the autocorrelation of an individual unit’s activity in the model, to the fast and slow timescales of the V4 data estimated with the aABC method. Using Eq. (30) and Eq. (27), we found an approximate range of the parameters p_{s} and p_{r} that reproduces the V4 timescales. Then, we performed a grid search within this parameter range to identify model timescales falling within the range of V4 timescales during the attend-away and attend-in conditions. We used model simulations for the grid search, since the analytical results for the effective interaction timescale are approximate. We used very long model simulations (10^{5} time steps) to obtain unbiased autocorrelations and then estimated the model timescales by fitting a double exponential function
directly to the empirical autocorrelations. We fitted the exponential function up to the time lag t_{m} = 100 ms, the same as used for fitting the neural data autocorrelations with the aABC method.
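The double-exponential fit can be sketched with a coarse grid search (our own minimal routine; we assume the normalized form AC(t) = w·e^{−t/τ1} + (1 − w)·e^{−t/τ2}, and a nonlinear least-squares optimizer would serve equally well):

```python
import numpy as np

def fit_double_exp(lags, ac):
    """Fit AC(t) = w*exp(-t/tau1) + (1-w)*exp(-t/tau2) by grid search,
    returning (tau1, tau2, w) that minimize the squared error."""
    best = None
    for tau1 in np.arange(1.0, 20.5, 0.5):       # fast timescale (ms)
        for tau2 in np.arange(20.0, 301.0, 5.0): # slow timescale (ms)
            for w in np.arange(0.05, 0.96, 0.05):
                pred = (w * np.exp(-lags / tau1)
                        + (1 - w) * np.exp(-lags / tau2))
                err = np.sum((pred - ac) ** 2)
                if best is None or err < best[0]:
                    best = (err, tau1, tau2, w)
    return best[1:]

# Synthetic autocorrelation at lags 2..100 ms (2-ms bins), known ground truth
lags = np.arange(2.0, 101.0, 2.0)
truth = 0.6 * np.exp(-lags / 5.0) + 0.4 * np.exp(-lags / 120.0)
tau1, tau2, w = fit_double_exp(lags, truth)   # recovers (5, 120, 0.6)
```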
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All behavioral and electrophysiological data used in this study are available on Figshare at https://doi.org/10.6084/m9.figshare.19077875.v1^{72} (fixation task, FT), https://doi.org/10.6084/m9.figshare.16934326.v3^{71} (attention task 1, AT1), and https://doi.org/10.6084/m9.figshare.21972911.v2^{73} (attention task 2, AT2). Source data are provided with this paper.
Code availability
The code for the timescale estimation and Bayesian model comparison with the aABC method is available as a Python package at: https://github.com/roxanazeraati/abcTau^{79}. The code for simulating network models is available at: https://github.com/roxanazeraati/spatialnetwork^{80}.
References
Kiebel, S. J., Daunizeau, J. & Friston, K. J. A Hierarchy of TimeScales and the Brain. PLOS Comput. Biol. 4, e1000209 (2008).
Wiltschko, A. et al. Mapping SubSecond Structure in Mouse Behavior. Neuron 88, 1121–1135 (2015).
Berman, G. J., Bialek, W. & Shaevitz, J. W. Predictability and hierarchy in Drosophila behavior. Proc. Natl Acad. Sci. 113, 11943–11948 (2016).
Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nat. Neurosci. 6, 1224–1229 (2003).
Buracas, G. T., Zador, A. M., DeWeese, M. R. & Albright, T. D. Efficient Discrimination of Temporal Patterns by MotionSensitive Neurons in Primate Visual Cortex. Neuron 20, 959–969 (1998).
Yang, Y., DeWeese, M., Otazu, G. & Zador, A. Millisecond-scale differences in neural activity in auditory cortex can drive decisions. Nature Precedings (2008). https://www.nature.com/articles/npre.2008.2280.1.
Bathellier, B., Buhl, D. L., Accolla, R. & Carleton, A. Dynamic Ensemble Odor Coding in the Mammalian Olfactory Bulb: Sensory Information at Different Timescales. Neuron 57, 586–598 (2008).
Jonides, J. et al. The Mind and Brain of ShortTerm Memory. Annu. Rev. Psychol. 59, 193–224 (2008).
Sarafyazd, M. & Jazayeri, M. Hierarchical reasoning by neural circuits in the frontal cortex. Science 364 (2019). https://science.sciencemag.org/content/364/6441/eaav8911.
Shadlen, M. N. & Newsome, W. T. Neural Basis of a Perceptual Decision in the Parietal Cortex (Area LIP) of the Rhesus Monkey. J. Neurophysiol. 86, 1916–1936 (2001).
Murray, J. D. et al. A hierarchy of intrinsic timescales across primate cortex. Nat. Neurosci. 17, 1661–1663 (2014).
Spitmaan, M., Seo, H., Lee, D. & Soltani, A. Multiple timescales of neural dynamics and integration of taskrelevant signals across cortex. Proc. Natl Acad. Sci. 117, 22522–22531 (2020).
Gao, R., van den Brink, R. L., Pfeffer, T. & Voytek, B. Neuronal timescales are functionally dynamic and shaped by cortical microarchitecture. eLife 9, e61277 (2020).
Honey, C. et al. Slow Cortical Dynamics and the Accumulation of Information over Long Timescales. Neuron 76, 423–434 (2012).
Raut, R. V., Snyder, A. Z. & Raichle, M. E. Hierarchical dynamics as a macroscopic organizing principle of the human brain. Proc. Natl Acad. Sci. 117, 20890–20897 (2020).
Fallon, J. et al. Timescales of spontaneous fMRI fluctuations relate to structural connectivity in the brain. Netw. Neurosci. 4, 788–806 (2020).
Cavanagh, S. E., Hunt, L. T. & Kennerley, S. W. A Diversity of Intrinsic Timescales Underlie Neural Computations. Front. Neural Circuits 14 (2020). https://www.frontiersin.org/articles/10.3389/fncir.2020.615626/full.
Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible timing by temporal scaling of cortical responses. Nat. Neurosci. 21, 102–110 (2018).
Meirhaeghe, N., Sohn, H. & Jazayeri, M. A precise and adaptive neural mechanism for predictive temporal processing in the frontal cortex. Neuron 109, 2995–3011.e5 (2021).
Bernacchia, A., Seo, H., Lee, D. & Wang, X.J. A reservoir of time constants for memory traces in cortical neurons. Nat. Neurosci. 14, 366–372 (2011).
Runyan, C. A., Piasini, E., Panzeri, S. & Harvey, C. D. Distinct timescales of population coding across cortex. Nature 548, 92–96 (2017).
Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
Boucher, P. O. et al. Neural population dynamics in dorsal premotor cortex underlying a reach decision (2022). https://www.biorxiv.org/content/10.1101/2022.06.30.497070v1.
Wang, X.J. Macroscopic gradients of synaptic excitation and inhibition in the neocortex. Nat. Rev. Neurosci. 21, 169–178 (2020).
Huntenburg, J. M., Bazin, P.L. & Margulies, D. S. LargeScale Gradients in Human Cortical Organization. Trends Cogn. Sci. 22, 21–31 (2018).
Elston, G. N. 4.13 – Specialization of the Neocortical Pyramidal Cell during Primate Evolution. In Kaas, J. H. (ed.) Evolution of Nervous Systems, 191–242 (Academic Press, Oxford, 2007). http://www.sciencedirect.com/science/article/pii/B0123708788001646.
Chaudhuri, R., Knoblauch, K., Gariel, M.A., Kennedy, H. & Wang, X.J. A LargeScale Circuit Mechanism for Hierarchical Dynamical Processing in the Primate Cortex. Neuron 88, 419–431 (2015).
Glasser, M. F. & Essen, D. C. V. Mapping Human Cortical Areas In Vivo Based on Myelin Content as Revealed by T1 and T2Weighted MRI. J. Neurosci. 31, 11597–11616 (2011).
Burt, J. B. et al. Hierarchy of transcriptomic specialization across human cortex captured by structural neuroimaging topography. Nat. Neurosci. 21, 1251–1259 (2018).
Hart, E. & Huk, A. C. Recurrent circuit dynamics underlie persistent activity in the macaque frontoparietal network. eLife 9, e52460 (2020).
Wasmuht, D. F., Spaak, E., Buschman, T. J., Miller, E. K. & Stokes, M. G. Intrinsic neuronal dynamics predict distinct functional roles during working memory. Nat. Commun. 9, 3499 (2018).
Safavi, S. et al. Nonmonotonic spatial structure of interneuronal correlations in prefrontal microcircuits. Proc. Natl Acad. Sci. 115, E3539–E3548 (2018).
Demirtaş, M. et al. Hierarchical Heterogeneity across Human Cortex Shapes LargeScale Neural Dynamics. Neuron 101, 1181–1194.e13 (2019).
LitwinKumar, A. & Doiron, B. Slow dynamics and high variability in balanced cortical networks with clustered connections. Nat. Neurosci. 15, 1498–1505 (2012).
Chaudhuri, R., Bernacchia, A. & Wang, X.J. A diversity of localized timescales in network activity. eLife 3, e01239 (2014).
Engel, T. A. et al. Selective modulation of cortical state during spatial attention. Science 354, 1140–1144 (2016).
Steinmetz, N. A. & Moore, T. Eye Movement Preparation Modulates Neuronal Responses in Area V4 When Dissociated from Attentional Demands. Neuron 83, 496–506 (2014).
van Kempen, J. et al. Topdown coordination of local cortical state during selective attention. Neuron (2021). http://www.sciencedirect.com/science/article/pii/S0896627320309958.
Zeraati, R., Engel, T. A. & Levina, A. A flexible Bayesian framework for unbiased estimation of timescales. Nat. Comput. Sci. 2, 193–204 (2022).
Fries, P., Reynolds, J. H., Rorie, A. E. & Desimone, R. Modulation of Oscillatory Neuronal Synchronization by Selective Visual Attention. Science 291, 1560–1563 (2001).
Chalk, M. et al. Attention Reduces StimulusDriven Gamma Frequency Oscillations and Spike Field Coherence in V1. Neuron 66, 114–125 (2010).
Ferro, D., van Kempen, J., Boyd, M., Panzeri, S. & Thiele, A. Directed information exchange between cortical layers in macaque V1 and V4 and its modulation by selective attention. Proc. Natl Acad. Sci. 118, e2022097118 (2021).
Mitchell, J. F., Sundberg, K. A. & Reynolds, J. H. Spatial Attention Decorrelates Intrinsic Activity Fluctuations in Macaque Area V4. Neuron 63, 879–888 (2009).
Cavanagh, S. E., Wallis, J. D., Kennerley, S. W. & Hunt, L. T. Autocorrelation structure at rest predicts value correlates of single neurons during rewardguided choice. eLife 5, e18937 (2016).
Kim, R. & Sejnowski, T. J. Strong inhibitory signaling underlies stable temporal dynamics and working memory in spiking neural networks. Nat. Neurosci. 24, 129–139 (2021).
Beiran, M. & Ostojic, S. Contrasting the effects of adaptation and synaptic filtering on the timescales of dynamics in recurrent networks. PLOS Comput. Biol. 15, e1006893 (2019).
Shi, Y.L., Zeraati, R., Levina, A. & Engel, T. A. Spatial and temporal correlations in neural networks with structured connectivity. Phys. Rev. Res. 5, 013005 (2023).
Buxhoeveden, D. P. & Casanova, M. F. The minicolumn hypothesis in neuroscience. Brain 125, 935–951 (2002).
Mountcastle, V. B. The columnar organization of the neocortex. Brain 120, 701–722 (1997).
Ginzburg, I. & Sompolinsky, H. Theory of correlations in stochastic neural networks. Phys. Rev. E. 50, 3171–3191 (1994).
Shi, Y.L., Steinmetz, N. A., Moore, T., Boahen, K. & Engel, T. A. Cortical state dynamics and selective attention define the spatial pattern of correlated variability in neocortex. Nat. Commun. 13, 44 (2022).
Smith, M. A. & Sommer, M. A. Spatial and Temporal Scales of Neuronal Correlation in Visual Area V4. J. Neurosci. 33, 5422–5432 (2013).
Haldeman, C. & Beggs, J. M. Critical Branching Captures Activity in Living Neural Networks and Maximizes the Number of Metastable States. Phys. Rev. Lett. 94, 058101 (2005).
Thiele, A. & Bellgrove, M. A. Neuromodulation of Attention. Neuron 97, 769–785 (2018).
Anderson, J. C., Kennedy, H. & Martin, K. A. C. Pathways of Attention: Synaptic Relationships of Frontal Eye Field to V4, Lateral Intraparietal Cortex, and Area 46 in Macaque Monkey. J. Neurosci. 31, 10872–10881 (2011).
Huang, C. et al. Circuit Models of LowDimensional Shared Variability in Cortical Networks. Neuron 101, 337–348.e4 (2019).
He, B. J., Snyder, A. Z., Zempel, J. M., Smyth, M. D. & Raichle, M. E. Electrophysiological correlates of the brain’s intrinsic largescale functional architecture. Proc. Natl Acad. Sci. 105, 16039–16044 (2008).
Okun, M., Steinmetz, N. A., Lak, A., Dervinis, M. & Harris, K. D. Distinct Structure of Cortical Population Activity on Fast and Infraslow Timescales. Cereb. Cortex. 29, 2196–2210 (2019).
Tomen, N., Rotermund, D. & Ernst, U. Marginally subcritical dynamics explain enhanced stimulus discriminability under attention. Front. Sys. Neurosci. 8 (2014). https://www.frontiersin.org/articles/10.3389/fnsys.2014.00151/full.
Dahmen, D. et al. Strong and localized recurrence controls dimensionality of neural activity across brain areas. Tech. Rep., bioRxiv (2022). https://www.biorxiv.org/content/10.1101/2020.11.02.365072v3.
Muñoz, M. A. Colloquium: Criticality and dynamical scaling in living systems. Rev. Mod. Phys. 90, 031001 (2018).
Hennequin, G., Ahmadian, Y., Rubin, D. B., Lengyel, M. & Miller, K. D. The Dynamical Regime of Sensory Cortex: Stable Dynamics around a Single StimulusTuned Attractor Account for Patterns of Noise Variability. Neuron 98, 846–860.e5 (2018).
Moore, T. & Armstrong, K. M. Selective gating of visual signals by microstimulation of frontal cortex. Nature 421, 370–373 (2003).
Schafer, R. J. & Moore, T. Attention Governs Action in the Primate Frontal Eye Field. Neuron 56, 541–551 (2007).
Rockland, K. S., Saleem, K. S. & Tanaka, K. Divergent feedback connections from areas V4 and TEO in the macaque. Vis. Neurosci. 11, 579–600 (1994).
Shou, T.D. The functional roles of feedback projections in the visual system. Neurosci. Bull. 26, 401–410 (2010).
Gjorgjieva, J., Drion, G. & Marder, E. Computational implications of biophysical diversity and multiple timescales in neurons and synapses for circuit performance. Curr. Opin. Neurobiol. 37, 44–52 (2016).
Duarte, R., Seeholzer, A., Zilles, K. & Morrison, A. Synaptic patterning and the timescales of cortical dynamics. Curr. Opin. Neurobiol. 43, 156–165 (2017).
Bright, I. M. et al. A temporal record of the past with a spectrum of time constants in the monkey entorhinal cortex. Proc. Natl Acad. Sci. 117, 20274–20283 (2020).
PerezNieves, N., Leung, V. C. H., Dragotti, P. L. & Goodman, D. F. M. Neural heterogeneity promotes robust learning. Nat. Commun. 12, 5791 (2021).
Steinmetz, N. & Moore, T. Dataset of lineararray recordings from macaque V4 during a spatial attention task. Figshare (2021). https://doi.org/10.6084/m9.figshare.16934326.v3.
Steinmetz, N. & Moore, T. Dataset of lineararray recordings from macaque V4 during a fixation task. Figshare (2022). https://doi.org/10.6084/m9.figshare.19077875.v1.
Gieselmann, M. & Thiele, A. Dataset of lineararray recordings from macaque V4 during a selective attention task. Figshare (2023). https://doi.org/10.6084/m9.figshare.21972911.v2.
Gieselmann, M. A. & Thiele, A. Comparison of spatial integration and surround suppression characteristics in spiking activity and the local field potential in macaque V1. Eur. J. Neurosci. 28, 447–459 (2008).
Marin, J.-M., Pillai, N. S., Robert, C. P. & Rousseau, J. Relevant statistics for Bayesian model choice. J. R. Stat. Soc. 76, 833–859 (2014).
Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
Chen, X., Zirnsak, M. & Moore, T. Dissonant Representations of Visual Space in Prefrontal Cortex during Eye Movements. Cell Rep. 22, 2039–2052 (2018).
Larremore, D. B., Shew, W. L., Ott, E., Sorrentino, F. & Restrepo, J. G. Inhibition Causes Ceaseless Dynamics in Networks of Excitable Nodes. Phys. Rev. Lett. 112, 138103 (2014).
Zeraati, R., Engel, T. A. & Levina, A. roxanazeraati/abcTau: a flexible Bayesian framework for unbiased estimation of timescales (2022). https://doi.org/10.5281/zenodo.5949117.
Zeraati, R., Shi, Y.L., Levina, A. & Engel, T. A. roxanazeraati/spatialnetwork: Simulation of network models with spatial connectivity (2023). https://doi.org/10.5281/zenodo.7625655.
Acknowledgements
This work was supported by a Sofja Kovalevskaja Award from the Alexander von Humboldt Foundation, endowed by the Federal Ministry of Education and Research (R.Z., A.L.), SMARTSTART2 program provided by Bernstein Center for Computational Neuroscience and Volkswagen Foundation (R.Z.), the NIH grant R01 EB026949 (T.A.E.), the Swartz Foundation (Y.S.), the Pershing Square Foundation (T.A.E.), the Sloan Research Fellowship (Y.S., T.A.E.), NIH grant RF1DA055666 (Y.S., T.A.E.), the NIH grant EY014924 (T.M.), the MRC grant MR/P013031/1 (M.A.G., A.T.). This work was performed with assistance from the NIH Grant S10OD0286320. We acknowledge the support from the BMBF through the Tübingen AI Center (FKZ: 01IS18039B) and the International Max Planck Research School for the Mechanisms of Mental Function and Dysfunction (IMPRSMMFD). A.L. is a member of the Machine Learning Cluster of Excellence, EXC number 2064/1  Project number 39072764. We thank Julia Wang for providing the code for estimating receptive fields.
Author information
Contributions
R.Z., A.L., and T.A.E. designed the study. N.A.S., M.A.G., A.T., and T.M. designed the experiments. N.A.S. and M.A.G. performed the experiments and spike sorting. R.Z., Y.S., A.L., and T.A.E. developed the analysis methods and mathematical models. R.Z. analyzed the data and performed model simulations. Y.S. performed the analytical calculations for the network model. R.Z., Y.S., N.A.S., M.A.G., A.T., T.M., A.L., and T.A.E. discussed the findings and wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks X, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zeraati, R., Shi, Y.-L., Steinmetz, N. A. et al. Intrinsic timescales in the visual cortex change with selective attention and reflect spatial connectivity. Nat. Commun. 14, 1858 (2023). https://doi.org/10.1038/s41467-023-37613-7.
DOI: https://doi.org/10.1038/s41467-023-37613-7
This article is cited by

Propagation of activity through the cortical hierarchy and perception are determined by neural variability
Nature Neuroscience (2023)