Flexible control of speed of cortical dynamics

Musicians can perform at different tempos, speakers can control the cadence of their speech, and children can flexibly vary their temporal expectations of events. To understand the neural basis of such flexible timing, we recorded from the medial frontal cortex of primates trained to produce different time intervals with different effectors. The activity of neurons was heterogeneous, nonlinear and complex. However, responses were unified under a remarkable form of invariance: firing rate profiles were temporally stretched for longer intervals and compressed for short ones. At the network level, this phenomenon was evident by flexible changes in the speed with which the population activity traced an invariant trajectory. To identify the origin of speed control, we recorded from both downstream caudate neurons and thalamic neurons projecting to the medial frontal cortex. Speed adjustments were a prominent feature in the caudate but not in the thalamus suggesting that this phenomenon originates within cortical networks. To understand the underlying mechanisms, we created recurrent neural network models at different levels of complexity that could explain flexible timing with speed control. Analysis of the models revealed that the key to flexible speed control was the action of an external input upon the nonlinearities of individual neurons whose recurrent interactions set the network’s relaxation dynamics. These findings demonstrate a simple and general mechanism for conferring temporal flexibility upon sensorimotor and cognitive functions.

Mental capacities such as anticipation, motor coordination, deliberation, and imagination lie at the core of higher brain function. A fundamental feature of these capacities is that they are not tied to immediate sensory or motor events and unfold at different timescales. To support such temporal flexibility, the brain must control the dynamics of ongoing patterns of neural activity.
However, the mechanisms that control the dynamics are not known. A simple behavior where flexible adjustments of dynamics are required is motor timing; i.e., controlling when to initiate a movement. To test the neural basis of temporal flexibility in motor timing, we developed a task in which monkeys used different contextual cues to produce either an 800 ms or a 1500 ms interval, responding either by saccade or manual button press (Fig. 1a). While monkeys performed the task, we recorded neural activity in the dorsomedial frontal cortex (MFC), which has been implicated in the inhibition 1,2, initiation [3][4][5], and coordination 6-12 of movements. Consistent with previous work in humans [13][14][15][16][17][18][19], nonhuman primates (NHPs) 3,[20][21][22][23][24][25][26] and rodents [27][28][29][30][31][32], MFC responses were modulated by elapsed time. The activity of a large proportion of neurons exhibited an intriguing form of temporal invariance: responses were stretched or compressed in accordance with the produced interval. Temporal scaling at the level of single neurons is equivalent to an adjustment of the speed with which the population activity evolves along an invariant trajectory. Two properties made this change of speed noteworthy.
First, unlike previous work where temporal scaling was reported during adaptation and learning 33,34, here, speed control constituted rapid trial-by-trial switches based on contextual cues, which cannot be explained by slow plasticity mechanisms. Second, the majority of neurons that exhibited temporal scaling had nonlinear and heterogeneous response profiles, making the scaling phenomenon inconsistent with existing models of timing 35,36, including pacemaker-accumulator models [37][38][39][40], oscillations [41][42][43], and liquid-state-machine population clocks [44][45][46][47][48]. These considerations necessitated a new perspective that could explain temporal scaling in neurons with highly complex response profiles.
As a first step toward investigating the relevant neural substrates, we examined temporal scaling in the caudate nucleus downstream of MFC and regions of the thalamus projecting to MFC that have been implicated in motor timing 20,22,23,25,[49][50][51][52] . Results were best explained by the hypothesis that speed control originated in the cortex.
To investigate the potential underlying mechanisms, we analyzed the dynamics of recurrent neural network models that were either engineered or trained to produce different time intervals using context-dependent inputs. By setting the models in certain input-output regimes, we discovered a simple and novel mechanism that enabled the rapid and flexible adjustment of the speed of dynamics in networks of heterogeneous neurons.

Behavior
On each trial, the color and shape of the fixation point served as a contextual cue ('Cue') indicating the desired interval and response effector, respectively (Fig. 1a). We refer to the four trial conditions as EL, ES, HL and HS, where E and H indicate Eye and Hand, and S and L indicate Short (800 ms) and Long (1500 ms) intervals. Production intervals (Tp) were measured from the time of a brief 'Set' flash until the time of movement initiation ('Saccade' or 'Button press'). The four conditions were randomly interleaved and animals were able to successfully switch between conditions on a trial-by-trial basis (Fig. 1b). They produced accurate Tps whose variability increased for the Long condition compared to Short (Fig. 1c). This is consistent with Weber's law and is a well-known property of timing behavior 53,54. The Weber fraction of Tps across the two contexts (ratio of standard deviation to mean) was significantly larger for button presses compared to saccades (one-tailed paired sample t-test, for monkey A, n = 31 sessions, P < .05, and for monkey D, n = 35 sessions, P < .001).
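The Weber fraction used here is simply the ratio of the standard deviation of the produced intervals to their mean. A minimal sketch in Python with NumPy (the function name and the simulated Weber fraction of 0.08 are our own illustrative choices, not values from the study):

```python
import numpy as np

def weber_fraction(tp):
    """Weber fraction of produced intervals: ratio of the standard
    deviation to the mean (dimensionless)."""
    tp = np.asarray(tp, dtype=float)
    return tp.std(ddof=1) / tp.mean()

# The scalar property of timing: variability grows in proportion to the
# mean interval, so the Weber fraction is roughly constant across contexts.
rng = np.random.default_rng(0)
wf = 0.08  # an illustrative Weber fraction, not a value from the paper
tp_short = rng.normal(800, wf * 800, size=500)    # Short context (ms)
tp_long = rng.normal(1500, wf * 1500, size=500)   # Long context (ms)
```

Because the standard deviation scales with the mean, both conditions return approximately the same Weber fraction even though the Long condition is more variable in absolute terms.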

Causal experiments and single-unit electrophysiology
We first verified that the regions of interest in MFC (Fig. 2a) were causally involved in this task (Fig. 2b). Reversible inactivation with muscimol (a GABA A agonist) significantly impaired performance for both Long and Short intervals, as measured by the distribution of within-session increases in root-mean-squared error (RMSE) after the muscimol injection compared to before (Table 1). The drop in performance in each session comprised changes in both the mean and standard deviation (Fig. 2b). However, no significant impairment was measured after saline injection (Table 1). Furthermore, muscimol inactivation had no significant effect on reaction times during a memory saccade task (Table 1). Based on these results, we concluded that MFC played a causal role in the main motor timing task 3,30,32.

Temporal scaling of complex response profiles
To characterize the post-stimulus-time-histogram (PSTH) of MFC neurons in anticipation of the motor response, we binned trials based on production intervals and computed average firing rates for each bin from spike counts after aligning trials to the time of the motor response (Fig. 2c). Neurons had complex and heterogeneous response dynamics including linear, nonlinear, monotonic, non-monotonic and even multi-peaked activity profiles (Fig. 2d). This diversity could not be accounted for by existing models of timing. To quantify this, we tested single-neuron PSTHs against predictions of various models of motor timing using a cross-validation procedure (Fig. 2e). We considered two variants of the clock-accumulator model, one in which flexible timing was achieved by adjusting a threshold over a ramping process, and one in which the clock was adjusted. Since clock models can only accommodate neurons with linear ramping profiles [37][38][39][40][55], it was not surprising that both clock-accumulator models were unable to explain the nonlinear profiles exhibited by much of the population. This result was substantiated by fitting polynomials of different degrees to PSTHs using a cross-validation procedure (see Methods). This exercise revealed that only 11% (47/416) of neurons could be explained by the clock model; the remaining neurons had complex and nonlinear response profiles that deviated significantly from linear ramping. This number was robust to changes in the formulation of the clock model. For example, making the clock model more flexible by allowing the starting and terminating points of the ramp to vary by up to 200 ms increased the number of neurons explained by this model by only 4%. We also tested two oscillation-based models of interval timing, in which the response time is determined by the collective phase of multiple oscillators with different frequencies [41][42][43].
In one variant, a single sinusoid was fit to the response of each neuron, and in another, multiple sinusoids (up to 4) of different frequencies were used. These models were also unable to capture the diversity of MFC responses (Fig. 2e). Finally, we tested MFC responses against a relatively unconstrained population-clock model [44][45][46][47][48]. In this model, each neuron is allowed to have a unique PSTH, and the movement is triggered when a decoder detects an interval-dependent pattern of activity across the neurons. Accordingly, we modeled each neuron by the best-fitting polynomial that could capture the activity profile in both the Short and Long contexts. This model performed better than both the clock-accumulator and oscillation models owing to the higher degrees of freedom associated with polynomial fits. However, MFC data did not fit a key qualitative prediction of the population-clock model. In this model, neurons are expected to have identical PSTHs for the Short and Long conditions for the entire duration of the timing interval. However, the vast majority of neurons in MFC had PSTHs that were distinct for the Short and Long conditions (Fig. 2d).
Our initial inspection of the difference between response dynamics in the Short and Long conditions suggested that PSTHs for different bins had a high degree of self-similarity when stretched or compressed in accordance with the produced interval (Fig. 2c,d). This was true both for fluctuations of produced intervals within each temporal context (i.e., 800 ms or 1500 ms), and across the two temporal contexts. To verify this observation quantitatively, we asked whether the PSTHs were more accurately captured by a temporally-scaled polynomial function. This formulation clearly outperformed all other models in terms of explanatory power (Fig. 2e, one-way ANOVA, F6,2444 = 156.5, P < 0.001). Note that the clock-accumulator with a flexible clock also exhibits temporal scaling, but it cannot capture scaling in neurons with complex response profiles, which, as we reported, comprised 89% of task-modulated MFC neurons.
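The temporally-scaled polynomial model amounts to fitting one polynomial of normalized time u = t/T jointly to the PSTHs of all interval bins, so that each bin's PSTH is predicted as g(t/T). A minimal sketch with a synthetic, perfectly scaling neuron (the polynomial degree and the firing-rate profile are illustrative assumptions, not fitted parameters from the study):

```python
import numpy as np

def fit_scaled_polynomial(psths, durations, deg=6):
    """Fit a single polynomial g(u) of normalized time u = t/T to all
    PSTHs at once; temporal scaling predicts PSTH_T(t) = g(t / T)."""
    u_all, r_all = [], []
    for psth, T in zip(psths, durations):
        t = np.linspace(0.0, T, len(psth))
        u_all.append(t / T)              # rescale time by produced interval
        r_all.append(psth)
    return np.polyfit(np.concatenate(u_all), np.concatenate(r_all), deg)

# A synthetic neuron whose firing-rate profile stretches with the interval
def rate(t, T):
    return 10.0 + 30.0 * np.sin(np.pi * t / T) ** 2

T_s, T_l = 0.8, 1.5
psth_s = rate(np.linspace(0, T_s, 80), T_s)
psth_l = rate(np.linspace(0, T_l, 150), T_l)
coefs = fit_scaled_polynomial([psth_s, psth_l], [T_s, T_l])
pred_l = np.polyval(coefs, np.linspace(0, 1, 150))
r2 = 1 - np.sum((psth_l - pred_l) ** 2) / np.sum((psth_l - psth_l.mean()) ** 2)
```

Because the synthetic profile scales perfectly, a single polynomial of normalized time captures both the Short and Long PSTHs; a strictly linear (clock-style) model could not fit this non-monotonic profile at all.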
These observations highlighted the need for an alternative model that could explain temporal scaling across neurons with such heterogeneous response profiles.

Speed control across the population
The phenomenon of temporal scaling at the level of single neurons has an intuitive interpretation at the population level. When the population activity is plotted within a coordinate system in which each axis corresponds to the firing rate of a single neuron, also known as the state space 56 , response dynamics can be depicted as a point (i.e., neural state) moving along a trajectory. In this representation, temporal scaling corresponds to changing the speed with which the neural state traverses a single trajectory in high-dimensional space.
If all the responses were scaled perfectly, neural trajectories associated with different intervals would be identical. However, for many neurons, firing rate modulations did not scale perfectly (Fig. 2d). We quantified the degree of scaling by a scaling index (SI) that was computed as a coefficient of determination (R²) across temporally-scaled PSTHs associated with different production interval bins. With this measure, we verified that neurons exhibited variable degrees of temporal scaling (Supplementary Fig. 1). The imperfect scaling was also evident when visualizing neural trajectories within the space spanned by the first three principal components (PCs). Within this space, neural trajectories did not overlap (Fig. 3a), suggesting that neural responses had features that did not scale perfectly across intervals.
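A simple way to compute an SI of this kind is to resample each PSTH onto a common normalized time axis and take the R² of the individual profiles around their mean. The sketch below is our own reconstruction of the idea, not the authors' exact implementation (see their Methods):

```python
import numpy as np

def scaling_index(psths, n=100):
    """SI: R^2 of individual PSTHs around their mean profile after each
    PSTH is resampled onto a common normalized time axis [0, 1].
    SI near 1 means the profiles superimpose once time is rescaled."""
    u = np.linspace(0, 1, n)
    scaled = np.array([np.interp(u, np.linspace(0, 1, len(p)), p)
                       for p in psths])
    mean_profile = scaled.mean(axis=0)
    ss_res = np.sum((scaled - mean_profile) ** 2)
    ss_tot = np.sum((scaled - scaled.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# A perfectly scaling pair (same profile in normalized time) versus a
# non-scaling pair (profile anchored to absolute, not relative, time)
t8, t15 = np.linspace(0, 0.8, 80), np.linspace(0, 1.5, 150)
scaling_pair = [np.sin(np.pi * t8 / 0.8), np.sin(np.pi * t15 / 1.5)]
nonscaling_pair = [np.sin(np.pi * t8 / 0.8), np.sin(np.pi * t15 / 0.8)]
```

The scaling pair yields an SI near 1, whereas the pair whose time course is fixed in absolute time yields a markedly lower SI.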
We hypothesized that perfect speed control might emerge within a subspace where the projections of neural trajectories are invariant; i.e., a scaling subspace (Fig. 3b). As a first step, we examined the degree of scaling in a subset of PCs. Using the same SI metric used for single neurons, we found that the first two PCs, which explained nearly 40% of the variance (Fig. 3b, bottom), had scaling indices of 0.91 and 0.97, respectively (Fig. 3a, bottom). The third PC, however, did not exhibit temporal scaling and had a scaling index of 0.20. This provided evidence that speed control was present within a subspace whose dimensions explained a large percentage of variance (i.e., PC1 and PC2).
However, the dimensions of scaling do not have to perfectly coincide with PCs. To specifically target the dimensions of scaling, we developed a novel procedure to find a subspace in which trajectories were invariant and the activity was governed by speed control. Analogous to PCs that are ordered with respect to variance explained, we sought components that were ordered according to the degree of temporal scaling in the data. We formulated this as a cross-validated optimization problem that searched for dimensions in which the distance between neural trajectories of different temporal durations was minimized (see Methods). This method furnished a set of scaling components (SCs) that were ordered according to the degree of scaling in the data (Fig. 3c). Compared to PCs, SCs explained less variance suggesting that the scaling dimensions were not identical to PC dimensions, which capture maximum variance by design.
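One way to formalize the search for scaling components is as a generalized eigenvalue problem: find directions that minimize the scatter between temporally rescaled trajectories relative to the total response variance. The sketch below is a simplified, non-cross-validated stand-in for the optimization described in the Methods, using a toy three-unit population:

```python
import numpy as np
from scipy.linalg import eigh

def scaling_components(X_short, X_long):
    """Directions w minimizing the distance between temporally rescaled
    trajectories (w'Dw) relative to total variance (w'Sw), solved as the
    generalized eigenproblem D w = lambda S w (eigenvalues ascending)."""
    diff = X_short - X_long                  # both on normalized time base
    D = diff.T @ diff                        # "non-scaling" scatter
    stacked = np.vstack([X_short, X_long])
    stacked = stacked - stacked.mean(axis=0)
    S = stacked.T @ stacked + 1e-9 * np.eye(stacked.shape[1])
    evals, evecs = eigh(D, S)                # best-scaling directions first
    return evals, evecs

# Toy population: units 1-2 trace identical normalized-time profiles
# (perfect scaling); unit 3 does not scale.
u = np.linspace(0, 1, 100)
X_short = np.column_stack([np.sin(np.pi * u), np.cos(np.pi * u), u])
X_long = np.column_stack([np.sin(np.pi * u), np.cos(np.pi * u),
                          np.full_like(u, 0.5)])
evals, evecs = scaling_components(X_short, X_long)
```

The smallest generalized eigenvalues pick out the plane spanned by the two scaling units (near-zero rescaled distance), while the largest eigenvalue flags the non-scaling direction, mirroring the ordering of SCs by degree of scaling.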
The SI values were relatively large for the first few SCs, indicating that the optimization process correctly identified the scaling dimensions (Fig. 3c, bottom, Supplementary Fig. 2). When responses were projected onto the subspace spanned by the first three SCs, they traced a nearly identical trajectory with different speeds (Fig. 3c, top), which is precisely what the scaling subspace hypothesis predicts. Note that because of cross-validation, the scaling index for SCs of the test data does not have to be in decreasing order (Fig. 3c, bottom), although it was for the dataset used to determine the SCs (not shown).
Next, we asked how much of the variance of neural responses the scaling subspace could account for. To address this question, we performed two complementary analyses. First, we examined the relationship between the degree of scaling (SI) and the variance explained for each SC. SCs with large SIs explained a relatively large percentage of variance (Fig. 3d), suggesting that scaling was a prominent feature of neural activity during flexible time interval production. Second, we developed a procedure for quantifying the relationship between scaling and variance without relying on projections onto specific directions, such as PCs or SCs. We used a bootstrap procedure and quantified the relationship between variance explained and SI along 200 random projections in the state space. We then constructed a two-dimensional probability distribution of the relationship between variance explained and scaling across those random projections (Fig. 3d, inset). This analysis verified that the dimensions with large degrees of scaling also explained a large portion of the variance.
To further validate our conclusions, we asked whether SI is a reliable measure of temporal scaling. We tackled this problem by contraposition, asking whether the lack of scaling would result in small scaling indices. We used Gaussian processes to create non-scaling surrogate data that matched MFC responses in terms of smoothness, starting/terminal firing rates, dimensionality, and the correlation between Short and Long PSTHs (see Methods, and Supplementary Fig. 3). The surrogate data, despite being matched to the statistics of MFC responses, had smaller SIs than those computed for MFC neurons (Fig. 3e). This comparison to the non-scaling surrogate data verifies that a significant portion of variance in MFC resides within a scaling subspace in which activity evolves along invariant trajectories at different speeds.
Finally, we quantified the relationship between speed in the scaling subspace and behavior. We used a cross-validation procedure in which we derived the scaling subspace from a subset of the shortest and longest trials, and asked whether the speed of neural trajectories of the remaining trials in that subspace could predict production intervals (Tp). This analysis provided strong evidence for the correspondence between the speed in the scaling subspace and Tp: longer Tps were associated with slower speeds (Fig. 3f and Supplementary Fig. 4), and the average speed was inversely proportional to Tp (R² = 0.87). These results establish speed control in the MFC scaling subspace as a key principle that governs behavioral variability within each temporal context, and enables the flexible control of motor timing across the two timing contexts.
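Speed in a subspace can be estimated as the path length traversed per unit of elapsed time. The following sketch (our own illustration, not the paper's analysis code) shows why an invariant trajectory implies a mean speed inversely proportional to Tp:

```python
import numpy as np

def trajectory_speed(X, dt):
    """Mean speed of a neural trajectory X (time x dimensions):
    Euclidean path length travelled per unit of elapsed time."""
    steps = np.linalg.norm(np.diff(X, axis=0), axis=1)
    return steps.sum() / ((len(X) - 1) * dt)

# The same (invariant) trajectory traversed in 0.8 s versus 1.5 s:
# because the path length is fixed, mean speed scales as 1 / Tp.
u = np.linspace(0, 1, 200)
path = np.column_stack([np.sin(np.pi * u), np.cos(np.pi * u)])
speed_short = trajectory_speed(path, 0.8 / (len(u) - 1))
speed_long = trajectory_speed(path, 1.5 / (len(u) - 1))
```

Since speed × Tp equals the (fixed) path length in both cases, the product is identical across intervals, which is the inverse-proportionality relationship reported above.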

Speed control across cortico-basal ganglia circuits
Having established speed control in MFC as a potential mechanism for temporal flexibility, we asked whether this property was unique to MFC or whether it was also present in other upstream and downstream areas. We focused on various nodes of the cortico-basal ganglia circuits that have been implicated in motor timing 20,22,23,25,[49][50][51][52] . First, we investigated putative targets of MFC in the caudate [57][58][59] . We found that the region of interest in the caudate (Fig. 4a) was causally involved in the motor timing task as evidenced by reversible inactivation (Table 1).
Electrophysiological recordings (Fig. 4c) indicated that caudate responses, like those in MFC, were complex and heterogeneous, and PSTHs were different between Short and Long trials.
We analyzed caudate responses in terms of temporal scaling and speed control. At the level of single neurons, the distribution of SIs was similar to MFC (Supplementary Fig. 1) suggesting that MFC neurons and their putative targets in the caudate exhibit similar degrees of scaling. At the population level, analysis of PCs and SCs verified the presence of a scaling subspace in the caudate (Fig. 3e, Supplementary Fig. 5). Finally, the SI values of PCs as well as an unbiased analysis of responses across random projections in the state space indicated that dimensions with strong scaling explained a large part of variance in the data (Fig. 4d). These analyses substantiated that neural signals in the caudate share the same key properties with MFC and may be part of the circuit involved in subspace speed control.
In addition to receiving inputs from MFC, the basal ganglia also projects back to MFC through the thalamus. The presence of this anatomical substrate raises the possibility that MFC inherits temporal scaling from the basal ganglia via transthalamic projections. To test this possibility, we examined neural activity in a region of the thalamus where MFC-projecting thalamocortical neurons were identified antidromically (Fig. 4e; see Methods). Consistent with previous work 52, reversible inactivation indicated that this area played a causal role in timing behavior (Table 1).
However, several observations indicated that the function of thalamocortical signals was different from that of the caudate and MFC (Fig. 4g). First, SIs of single thalamic neurons were significantly smaller across the population compared to the other areas (nMFC = 416, nCd = 278, nthalamus = 846, one-tailed two-sample t-tests, P < .001, see Supplementary Fig. 1). Second, scaling in the thalamus was significantly smaller than in the C+D+E+S surrogate data (one-tailed two-sample t-test, n = 200, P < .001, Fig. 3e). Third, scaling was less prominent in the thalamus as indicated by the relationship between the magnitude of scaling and variance explained along random projections in the state space (Fig. 4h). Fourth, unlike the caudate and MFC, neural trajectories in the thalamus were not invariant in the space spanned by the first three SCs (Supplementary Fig. 5). This was also evident in the profile of the second PC, whose relationship to different Tps was a systematic shift in average value, i.e., not scaling. Together, these observations provide strong evidence that thalamic neurons exhibit significantly less scaling than the MFC neurons they project to. Since the output of the basal ganglia to cortex is routed through the thalamus, the weak scaling in thalamocortical neurons implies that the scaling subspace is likely to originate within MFC or other cortical circuits projecting to MFC.

A model for flexible subspace speed control
Since the timescales of MFC response modulations were slower than the intrinsic time constants of single neurons, we assumed that the observed dynamics were the result of network-level interactions. Motivated by recent advances in understanding the dynamics of motor, premotor and prefrontal cortical areas using recurrent network models 60-62, we trained a recurrent neural network model with random connectivity to flexibly produce different time intervals in response to context inputs (Cue) whose magnitude specified the desired interval (Fig. 5a). To mimic the task of the monkey, at a random time after the Cue onset, the network was administered a transient pulse (Set), which signaled the start of the time interval. The network was trained to generate a single output (a weighted linear sum of its units) that would initiate a "response" when the output breached a fixed threshold 63,64.
The network learned to generate the desired output function (Fig. 5d) and the activity of model neurons emulated the key features in MFC: response profiles of individual network units were heterogeneous, complex and temporally-scaled (Fig. 5b). Moreover, the speed of population dynamics directly determined the produced interval (Fig. 5c). The first set of networks we developed was trained to produce a linear output that was scaled according to the desired interval. To ensure that the scaling of network units was not merely a consequence of scaling in the output, we trained additional networks to produce temporally non-scaling output functions.
However, even in the presence of non-scaling output functions, the recurrent model exhibited speed control in a scaling subspace and individual units continued to exhibit temporally scaled responses (Supplementary Fig. 6). The model was also robust with respect to how the input Cue encoded the desired interval. For example, the network exhibited the same scaling behavior when the Cue was changed to a brief pulse (Supplementary Fig. 6). The fact that the models captured subspace temporal scaling for various formulations of the input and for both scaling and non-scaling outputs demonstrated the generality and robustness of this solution in performing a flexible context-dependent timing task, and made the recurrent model suitable for uncovering potential underlying mechanisms.
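The model class described above is a standard continuous-time RNN, tau dx/dt = -x + W phi(x) + input, read out through a fixed threshold. The sketch below shows only the simulation skeleton with untrained random weights (the parameter values are our own illustrative choices, and the training step, e.g. backpropagation through time, is omitted), so this particular network is not expected to time intervals correctly:

```python
import numpy as np

def run_trial(W, b_cue, b_set, w_out, cue_level, t_set,
              dt=0.01, tau=0.1, T=3.0, thresh=1.0):
    """Euler-integrate tau dx/dt = -x + W tanh(x) + b_cue*cue + b_set*set(t).
    The trial 'response' is the first threshold crossing of the linear
    readout after the Set pulse (None if the readout never crosses)."""
    n = W.shape[0]
    x = np.zeros(n)
    n_steps = int(round(T / dt))
    out = np.zeros(n_steps)
    for i in range(n_steps):
        t = i * dt
        set_pulse = 1.0 if t_set <= t < t_set + 0.05 else 0.0  # 50 ms pulse
        x = x + (dt / tau) * (-x + W @ np.tanh(x)
                              + b_cue * cue_level + b_set * set_pulse)
        out[i] = w_out @ np.tanh(x)
    crossed = np.nonzero(out > thresh)[0]
    rt = crossed[0] * dt - t_set if len(crossed) else None
    return out, rt

rng = np.random.default_rng(1)
n = 50
W = rng.normal(0, 1.2 / np.sqrt(n), (n, n))      # untrained random weights
b_cue, b_set, w_out = (rng.normal(0, 1.0, n) for _ in range(3))
out, rt = run_trial(W, b_cue, b_set, w_out, cue_level=0.3, t_set=0.5)
```

In the trained models, the cue_level argument is what sets the speed of the dynamics and hence the produced interval.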
We investigated how speed control emerges in recurrent neural populations (Fig. 5c) using a dynamical systems analysis 65,66 to study the evolution of activity in relation to neighboring fixed points in the state space. Importantly, we computed these fixed points in the presence of the input so that we could characterize the dynamics of the network in action. This analysis revealed that the network's initial and terminal states were governed by a pair of input-dependent stable fixed points, F init and F terminal . At the start of the trial, the Cue initialized the state of the network to an input-dependent fixed point, F init . Activation of the Set pulse drove the system away from F init allowing the system to evolve toward F terminal with a speed that was determined by the magnitude of the Cue input, which provided context (Fig. 5c,e).
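Input-conditioned fixed points of this kind can be located numerically by solving the dynamics equation with the Cue input held on, and their local dynamics characterized by the Jacobian, whose eigenvalue real parts set the effective time constants (tau = -1/Re(lambda)). A sketch under a subcritical random connectivity, which is our illustrative choice so that a single stable fixed point exists:

```python
import numpy as np
from scipy.optimize import root

def find_fixed_point(W, b, cue, x0):
    """Solve 0 = -x + W tanh(x) + b*cue with the context input held on."""
    return root(lambda x: -x + W @ np.tanh(x) + b * cue, x0).x

def jacobian(W, x):
    # d/dx [-x + W tanh(x)] = -I + W @ diag(1 - tanh(x)^2)
    return -np.eye(len(x)) + W * (1.0 - np.tanh(x) ** 2)

rng = np.random.default_rng(2)
n = 30
W = rng.normal(0, 0.5 / np.sqrt(n), (n, n))  # subcritical gain: one stable FP
b = rng.normal(0, 1.0, n)
x_fp = find_fixed_point(W, b, cue=0.4, x0=np.zeros(n))
lam = np.linalg.eigvals(jacobian(W, x_fp))   # all Re(lambda) < 0 when stable
tau_eff = -1.0 / lam.real.max()              # slowest effective time constant
```

Repeating this with different cue levels is the model analysis described above: as the fixed points move with the input, the eigenvalue spectrum, and therefore tau_eff, changes with it.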
This analysis revealed the complementary roles of the input drive and recurrent dynamics (Fig. 5c). The input drive set the position of the initial and terminal fixed points along a direction, which we refer to as the input subspace. Recurrent dynamics, on the other hand, established a recurrent subspace, which determined the neural trajectory between the initial and terminal fixed points. These two subspaces emerged from different components of the network. The input subspace was governed by the direction specified by the input weights. In contrast, the recurrent subspace emerged from the constraints imposed by the recurrent weights. The two subspaces differed also in terms of their relationship to the scaling phenomenon. Within the input subspace, different intervals were associated with a change in the level of activity but did not exhibit scaling. This change in level, in turn, controlled the speed by setting the position of the neural state along the axis of the input subspace. The recurrent subspace, on the other hand, did not control the speed but was responsible for the emergence of invariant trajectories and temporal scaling. The division of labour between these subspaces provides a remarkable and previously unsuspected explanation of why scaling and non-scaling signals might coexist within the same network. Non-scaling signals reflect the input that sets the speed, and scaling signals correspond to the evolution of activity with the desired speed. This organization predicts that MFC neurons with weak temporal scaling are likely recipients of relatively strong context-dependent input, and neurons with strong temporal scaling are more directly engaged in recurrent interactions. Finally, the model-based distinction between these two subspaces provides a theoretical basis for analyzing MFC responses within a scaling subspace, which corresponds to the recurrent subspace in the model.
We used the recurrent model to generate specific predictions about how activity in the scaling and non-scaling subspaces of MFC might correspond to behavior. Within the scaling subspace where neural trajectories were invariant, production intervals should be correlated with the speed of the dynamics. We demonstrated this earlier by analyzing production intervals in terms of speed within the space spanned by SCs (Fig. 3f). Within the non-scaling subspace where the activity putatively reflects the input, production times should be correlated with the average level -not speed -of neural activity. To test this novel model-based prediction, we asked whether production intervals could be predicted by the average MFC activity when projected onto the least-scaling subspace. We inferred the least-scaling direction from our scaling component analysis. SCs specified an orthonormal basis whose axes were ordered according to the level of scaling ( Supplementary Fig. 7). Therefore, we used the last SC (SC9) as an estimate of the least-scaling direction, and compared production times to average level of MFC projected onto SC9. As predicted by the model, the average activity of the non-scaling components of MFC were indeed predictive of production times ( Supplementary Fig. 8). This is a compelling result as it bears out a key prediction about an unsuspected relationship between cortical activity and behavior made by a model that was constrained only to produce behavior.
We emphasize that the recurrent model is fundamentally different from the oscillation and population-clock models since the former can support temporal scaling whereas the latter cannot. At a phenomenological level, the recurrent model superficially resembles the clock-accumulator model with a flexible clock since both exhibit temporal scaling. However, the two are conceptually and mechanistically different. The clock model can only generate ramping activity, and it does so by using the input (i.e., clock) to drive an integrator (e.g., a line attractor).
The recurrent model does not perform any integration; instead, it uses an input to set the relaxation dynamics in the recurrent subspace toward a terminal fixed point. This difference allows the recurrent model to generalize the temporal scaling phenomenon across neurons with simple to highly complex response profiles.

A potential neural mechanism for speed control
To investigate the mechanisms that mediated the observed speed control, we analyzed the eigenvalues associated with the region in close proximity to the terminal fixed point, F terminal. In the vicinity of this fixed point, stronger inputs caused the eigenvalues to decrease systematically (Fig. 5f, left). In a linear dynamical system, such contraction in the eigenvalue spectrum corresponds to a systematic increase in the network's effective time constants (Fig. 5f, right). From this, we concluded that the action exerted by the input drive is equivalent to adjusting the system's effective time constant in a flexible input-dependent manner. To gain insight into the mechanism that provides such powerful and modular control of time constants, we focused on a simplified recurrent network model composed of only two mutually inhibitory neurons that received a common input (Fig. 6a). The reasons for our choice of this model were twofold. First, this two-neuron network represents one of the simplest architectures whose behavior can be controlled by the interplay between common input and recurrent interactions (Fig. 6c). Second, previous work has demonstrated that adjustments of the common input in this model could alter its recurrent dynamics to either relax to a single fixed point with a specific time constant or act as an integrator with exceedingly long time constants [67][68][69]. We reasoned that exploring the model's behavior between these two regimes might lead us to a mechanistic understanding of how the effective time constant of a network can be flexibly adjusted.
When the two neurons receive balanced input (Cue), their interaction creates an energy landscape that engenders a pair of stable fixed points, similar to the recurrent model, separated by an energy barrier (i.e., an unstable fixed point) (Fig. 6b). We analyzed the phase plane of the model (Fig. 6c) and verified that the input level can be used to create a continuum of effective time constants as a function of the common input. This is analogous to the recurrent network model, where activity along the input subspace served to control the speed. However, the two-neuron model helped us understand the underlying mechanisms in simple and intuitive terms: stronger input drives neurons toward their saturating nonlinearity where the slopes of the activation functions are shallower (Fig. 6d). Regimes of shallow slopes reduce the neuron's responsiveness to changes in input and lead to an increase in the effective time constant. In other words, the presence of single-neuron nonlinearities enables an input to exploit different slopes along the activation function. The slope would in turn determine the energy gradients (Fig. 6b) and set the speed with which responses change over time. What this simple network demonstrates is the importance of the action of the input on the neuron's nonlinearities, which is the key factor in adjusting the speed.
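The mechanism can be reproduced in a few lines: simulate the two-neuron network for two levels of common input and measure the time to travel from just off the unstable (saddle) point to a stable fixed point. The parameters here (logistic activation, inhibitory weight w = 6, input levels 3 and 4) are our own illustrative choices placing the network in the bistable regime, not values from the paper:

```python
import numpy as np

def phi(x):
    return 1.0 / (1.0 + np.exp(-x))  # logistic (saturating) activation

def transit_time(I, w=6.0, dt=0.01, t_max=100.0):
    """Time for dr_i/dt = -r_i + phi(I - w*r_j) to travel from just off
    the symmetric saddle point to a stable fixed point (drift ~ 0)."""
    lo, hi = 0.0, 1.0                 # bisect r = phi(I - w r) (the saddle)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(I - w * mid) > mid:
            lo = mid
        else:
            hi = mid
    r = np.array([mid + 0.01, mid - 0.01])   # nudge along the unstable mode
    t = 0.0
    while t < t_max:
        drift = -r + phi(I - w * r[::-1])    # mutual inhibition
        if np.max(np.abs(drift)) < 1e-3:     # settled at a stable fixed point
            return t
        r = r + dt * drift
        t += dt
    return t

t_weak, t_strong = transit_time(I=3.0), transit_time(I=4.0)
```

The stronger common input pushes both neurons closer to saturation, flattening the effective slopes and shrinking the drift along the trajectory, so the transit to the terminal state takes longer, which is the speed-control effect described above.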
Having established a low-level mechanism in the two-neuron model, we asked whether the same mechanism was operative in the recurrent network model, whose units are each equipped with a saturating nonlinearity analogous to those found in the simple network. For the recurrent model, we analyzed the operating point of units as a function of the input drive near F terminal. Remarkably, for stronger inputs, units were systematically driven further toward their saturating nonlinearity (Fig. 5g,h), which is consistent with the mechanism of speed control in the simple network model. These results underscore a simple and powerful mechanism at the level of single neurons for controlling the speed of dynamics independent of the neural trajectory.

Discussion
We found that motor timing in the MFC and caudate populations was governed by the rate of change of population activity as a function of time, which corresponds to the speed of dynamics.
This property was evident at the single neuron level but was most pronounced in a scaling subspace of the population activity. In this subspace, the random fluctuations of speed predicted variability within each temporal context whereas systematic adjustments to the speed predicted behavioral flexibility across the two temporal contexts. These observations underline population speed within the scaling subspace as a key variable in motor timing behavior.
To gain insight into the principles of subspace speed control, we reverse-engineered a recurrent network model trained to perform a context-dependent timing task. The model demonstrated that flexible speed control is achieved through the interaction of two subspaces, an input subspace controlled by a context cue, and a recurrent subspace established by recurrent synaptic connections. These two subspaces served complementary functions that could be readily understood under the framework of dynamical systems. The input set the initial condition of the system and the recurrent subspace controlled the dynamics in the vicinity of that initial condition. In this framework, behavioral flexibility was conferred by the input's ability to set the initial condition to different regions of the state space with different gradients (i.e., speeds).
To further understand how the context input is able to flexibly change the speed, we engineered a two-neuron model in which the input and recurrent subspaces could be represented as one-dimensional directions in the state space 67-69 . This model highlighted the crucial role of single-neuron nonlinearities, revealing that adjustments of speed were governed by the interaction of input with these nonlinearities. This finding motivates several hypotheses regarding the structure and control of dynamics in the brain. For example, it suggests that many circuits and subcircuits might be able to adjust the speed of their dynamics independently and operate at different timescales 70 . It also predicts that neuromodulatory effects and pharmacological treatments that interfere with the nonlinear response curve of individual neurons could alter the speed of cortical dynamics, as observations from numerous studies of interval timing might suggest 71,72 .
Furthermore, our work has important implications for a wide range of behaviors in motor control, sensory anticipation and mental tracking where temporal flexibility is critical. The mechanisms we have identified provide a simple solution for how cortical circuits could encode behaviorally relevant variables in one subspace while using another subspace to adjust the speed of their dynamics. This would allow the same behavior to unfold along the same neural trajectory at different timescales. This is consistent with a study of the dynamics of sequential activation of striatal neurons in mice during a temporal bisection task 73 , and studies of speed-accuracy tradeoff in decision making 74 . Although these studies differ from ours substantially in terms of task demands and neural computations, the similarities in terms of neural dynamics raise the intriguing possibility that speed control might be a general principle in neural computations.
The source of the external input that adjusts the speed of cortical dynamics remains a pertinent and unresolved question. Several neural pathways could provide such input. One possibility is that the input arises from subpopulations of cortical neurons. This would be consistent with a recent study showing that neurons in parietal cortex encode the desired speed in their firing rates 75 . Indeed, we found that MFC neurons that exhibited the least amount of scaling encode the speed in their average firing rates. Another possibility is that the input drive is provided by thalamic afferents, which is consistent with our recording from MFC-projecting thalamocortical neurons whose activity level varied systematically with produced intervals. Alternatively, a number of physiology and pharmacology studies have implicated dopamine in regulating timing behavior 76,77 . Other neuromodulatory systems may also play a role in controlling cortical dynamics. Cortical dynamics are also known to depend on cellular properties such as those mediated by NMDA receptors, which are thought to facilitate the generation of stable slow cortical dynamics 78 . The exact signalling pathways and underlying biophysical properties notwithstanding, the mechanisms that we have identified have the potential to explain how the brain flexibly controls cortical dynamics.

Methods
Two adult rhesus monkeys (Macaca mulatta; a 6.5 kg female and a 9.0 kg male, both 5 years old) were trained on a two-interval, two-effector motor timing task. All surgical, behavioral and experimental procedures conformed to the guidelines of the National Institutes of Health and were approved by the Committee on Animal Care at the Massachusetts Institute of Technology.

Behavior
Button-press responses had to be made with the hand contralateral to the recorded hemisphere 79 . The production interval was measured from the endpoint of Set to the moment the saccade was initiated or the button press was triggered. The width of the acceptance window was adjusted dynamically on a trial-by-trial basis, independently for the Short and Long conditions, using a one-up one-down staircase procedure. As such, animals were rewarded on approximately half of trials (on average, 57% in monkey A and 51% in monkey D) in both temporal contexts. On rewarded trials, in addition to reward delivery, the color of the stimulus changed to green and an auditory click was presented simultaneously (Fig. 1a). Within the acceptance window, the magnitude of the reward scaled with accuracy.
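A one-up one-down staircase of this kind can be sketched as follows (a Python illustration; the step size, initial width and floor are assumptions, not the paper's values). Because the window shrinks after every rewarded trial and widens after every error, the procedure converges to roughly 50% reward, consistent with the reward rates reported above:

```python
def staircase(responses, width0=100.0, step=5.0, min_width=10.0):
    """One-up one-down staircase on the acceptance-window width (ms).
    Shrink the window after a rewarded (correct) trial, widen it after an error."""
    width = width0
    widths = []
    for correct in responses:
        width = max(min_width, width - step) if correct else width + step
        widths.append(width)
    return widths

widths = staircase([True, True, False, True, False])
print(widths)  # [95.0, 90.0, 95.0, 90.0, 95.0]
```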

Reversible inactivation
Injections were made with a microinjection pump (UMP3, World Precision Instruments) and a Hamilton syringe, which was connected to a custom 30G stainless steel injection cannula via a fused silica injection line (365 µm OD, 100 µm ID, Polymicro Technologies). In each injection session, we first established the animal's baseline behavioral performance. Afterwards, we pressure-injected muscimol hydrobromide (5 µg/µL in saline) in the region of interest at a rate of 0.2 µL/min. In the MFC and caudate, a total of 2 µL was injected per session. In pilot inactivation experiments in the thalamus, we noticed that animals stopped performing the task after a 2 µL muscimol injection. To ensure animals would perform the task, the total volume of muscimol in the thalamus was reduced to 1.5 µL. The behavioral task was resumed 10 min after the injection was completed. As a control, in separate sessions, sterile saline was injected following the same procedure. For each injection session, we compared performance between equal numbers of trials before and after the injection.

Antidromic Stimulation
We used antidromic stimulation to localize thalamocortical MFC-projecting neurons. Antidromic spikes were recorded on a 24-channel electrode (V-probe, Plexon Inc.) in response to a single biphasic pulse of 0.2 ms duration (current < 500 µA) delivered to MFC via low-impedance tungsten microelectrodes (100-500 kΩ, Microprobes). The guide tube for the tungsten electrode was used as the return path for the stimulation current. Antidromic activation evoked spikes reliably at a latency ranging from 1.8 to 3 ms, with less than 0.2 ms jitter. The region recorded in the thalamus was within 1 mm of antidromically identified neurons.

Mathematical notation
Throughout the manuscript, we have used lowercase symbols for scalars, bold lowercase for vectors, and bold uppercase for matrices. Pointwise functions were shown as lowercase regardless of whether they were applied to scalars or vectors.

Data Analysis
All offline data processing and analyses were performed in MATLAB (MathWorks). Spiking data were bandpass filtered between 300 Hz and 7 kHz, and spike waveforms were detected at a threshold typically set to 3 times the RMS noise. Single- and multi-units were sorted offline using custom software, MKsort (https://sites.google.com/site/antimatt/software). The majority of neurons were recorded in separate behavioral sessions.
Estimating firing rates accurately is challenging when rates change dynamically and trials have different durations 31,80 , which was the case in our data. Since our focus was on firing rates leading up to the movement, we aligned trials with respect to movement time (Fig. 2c).
Additionally, for each condition, we discarded trials with Tp s that lay more than 3 standard deviations from the mean (1.46% of trials). Firing rates were estimated by: 1) computing the peri-event time histogram (PETH) by averaging the spike counts per time bin, 2) using a 40 ms Gaussian kernel to compute smooth spiking density functions, and 3) z-scoring to minimize sampling bias due to baseline and amplitude differences across neurons.
To examine the relationship between firing rates and Tp s, we binned trials according to Tp and compared average firing rates for each bin. For the 800 ms interval, we used 7 bins centered on 740 to 860 ms every 20 ms, and for the 1500 ms interval, 9 bins centered on 1300 to 1620 ms every 40 ms. To test whether a neuron's overall average firing rate as a function of time could be described by a linear function (e.g., ramping activity), we compared polynomial fits of order 0 to 8 using cross-validation with randomized train and test sets. All neurons best explained by a polynomial of order 0 or 1 were considered linear so long as the fit explained at least 50% of the variance. We also applied the same procedure allowing up to a 200 ms offset from the beginning or end of the timing interval to ensure our results were robust.
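The cross-validated linearity test can be sketched in Python (the 50/50 split and noise level are illustrative assumptions; the paper's analyses were in MATLAB):

```python
import numpy as np

def poly_cv_scores(t, r, max_order=8, seed=0):
    """Held-out R^2 for polynomial fits of order 0..max_order."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(t))
    train, test = idx[: len(t) // 2], idx[len(t) // 2:]
    scores = []
    for k in range(max_order + 1):
        coef = np.polyfit(t[train], r[train], k)
        pred = np.polyval(coef, t[test])
        ss_res = np.sum((r[test] - pred) ** 2)
        ss_tot = np.sum((r[test] - r[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return scores

t = np.linspace(0, 1, 100)
rng = np.random.default_rng(1)
r = 2.0 * t + 0.1 + 0.01 * rng.standard_normal(100)  # a noisy ramp
scores = poly_cv_scores(t, r)
print(scores[1] > 0.5)  # the ramp passes the 50%-variance linearity criterion
```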

Compare the motor timing models at the level of single/multi-units
All model fitting was performed on the training set and the goodness of fit (R 2 ) was quantified on the test set. In the clock-accumulator model with a flexible threshold, a linear ramp with fixed slope and different thresholds for different production intervals was fit to the response profile. In the clock-accumulator model with a flexible clock, the threshold was fixed and the ramping rate was adjusted according to the interval. In the oscillation-based models, sinusoidal functions or a sum of up to 4 different sinusoids were fit to activity profiles, in which the frequency, amplitude and phase of each sinusoid were free parameters. In the population clock model, a single polynomial of up to 8th order was fit to the response profiles for both Short and Long contexts. For the temporal scaling model, the response profiles for the Short condition were used to find the best-fitting polynomial, and temporally scaled versions of the fitted functions were used to test the goodness of fit for Long trials.

Scaling Subspace
We used principal component analysis (PCA) as a first step to compute a low-dimensional and unbiased estimate of the data. The first 9 principal components (PCs) captured nearly 80% of the variance (Fig. 3b, bottom). We therefore computed the scaling components (SCs) from the data captured by the first 9 PCs, obtained by projecting the activity across all conditions and time points onto the corresponding PC directions. We computed the corresponding scaled responses using our scaling procedure. To find the scaling subspace, we solved an optimization problem that minimized the difference between average firing rates associated with different Tp s. We refer to the columns of the resulting projection matrix as scaling components (SCs). We hypothesized that the speed of activity in the scaling subspace predicts Tp , and computed the instantaneous speed from projections of responses onto the first three SCs.
For each interval bin, we obtained an unbiased estimate of the relationship between speed and Tp by resampling trials with replacement within each interval bin. The relationship between average speed and production interval was fitted in log space by a linear function.
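Instantaneous speed within a low-dimensional subspace can be sketched as follows (a Python illustration with a placeholder orthonormal basis and synthetic activity; the scaling subspace itself is not reconstructed here). Traversing the same trajectory in half the time doubles the speed at every point:

```python
import numpy as np

def subspace_speed(X, basis, dt):
    """X: time x neurons; basis: neurons x 3 (orthonormal columns).
    Returns the norm of the projected trajectory's time derivative."""
    proj = X @ basis                       # trajectory in the 3-D subspace
    return np.linalg.norm(np.diff(proj, axis=0), axis=1) / dt

rng = np.random.default_rng(1)
basis, _ = np.linalg.qr(rng.standard_normal((10, 3)))   # placeholder basis
t = np.linspace(0, 1, 51)[:, None]
X = np.cos(t * np.arange(10))              # smooth synthetic population activity
fast = subspace_speed(X, basis, dt=0.01)   # same path traversed in 0.5 s
slow = subspace_speed(X, basis, dt=0.02)   # same path traversed in 1.0 s
print(np.all(fast > slow))                 # same trajectory, higher speed
```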

Scaling index for population data
We quantified temporal scaling in single units, principal components (PCs) and scaling components (SCs) using a scaling index ( SI ), a general measure of the degree of similarity between response profiles associated with different intervals.
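The SI formula itself was lost from this copy of the text and is not reconstructed here. Purely as an illustrative stand-in (an assumption, not the paper's definition), a similarity measure of this kind can be computed as the mean pairwise R 2 between profiles after normalizing each to a common time base:

```python
import numpy as np

def scaling_index(profiles, n=100):
    """Illustrative stand-in: mean pairwise R^2 between time-normalized profiles."""
    resampled = [np.interp(np.linspace(0, 1, n),
                           np.linspace(0, 1, len(p)), p) for p in profiles]
    r2 = []
    for i in range(len(resampled)):
        for j in range(i + 1, len(resampled)):
            c = np.corrcoef(resampled[i], resampled[j])[0, 1]
            r2.append(c ** 2)
    return float(np.mean(r2))

short = np.sin(np.linspace(0, np.pi, 80))
long_scaled = np.sin(np.linspace(0, np.pi, 150))   # pure temporal stretch
long_other = np.cos(np.linspace(0, np.pi, 150))    # different response shape
print(scaling_index([short, long_scaled]) > scaling_index([short, long_other]))
```

A perfectly stretched profile scores near 1 under such a measure, while a profile of a different shape scores near 0.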

Surrogate data for testing the scaling property
In order to assess the significance of scaling in firing rates recorded from the MFC, caudate and thalamus, we compared the SI computed for each region of interest to SI computed for data generated from a number of surrogate models that emulated various properties of the neural data with increasing levels of sophistication (Table 2). In what follows, we describe the four cumulative constraints that we considered for generating the surrogate data.

Surrogate model 1.
We required the firing rates to have heterogeneous and nonlinear profiles while mimicking the temporal smoothness of our data. To this end, for each surrogate neuron, we sampled response profiles (analogous to firing rates) from a zero-mean multivariate Gaussian process (GP) prior 81,82 . The covariance between pairs of time points, also known as the kernel function, was a squared-exponential function of the time difference with two free parameters, an amplitude and a length scale.
These parameters were chosen to mimic the average smoothness observed in the response profiles of neurons. This constraint allowed us to examine whether the scaling indices associated with different regions of interest could be attributed to the smoothness of the firing rate functions. Surrogate model 1 did not exhibit any scaling; however, its smoothness constraint was retained in the subsequent, more constrained models.
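Drawing smooth surrogate profiles from such a GP can be sketched in Python (the squared-exponential kernel form and its length scale and variance are illustrative assumptions; the paper's analyses were in MATLAB):

```python
import numpy as np

def sample_gp(t, length_scale=0.1, variance=1.0, n_samples=3, seed=0):
    """Draw smooth random profiles from a zero-mean GP with an SE kernel."""
    d = t[:, None] - t[None, :]
    K = variance * np.exp(-d**2 / (2 * length_scale**2))
    K += 1e-9 * np.eye(len(t))              # jitter for numerical stability
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(len(t)), K, size=n_samples)

t = np.linspace(0, 1, 50)
profiles = sample_gp(t)
print(profiles.shape)  # (3, 50)
```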
Surrogate model 2. The first model did not impose any structure among response profiles across different produced intervals. In model 2, for each surrogate neuron, all response profiles were additionally required to begin at a common value and end at a common value across produced intervals, mirroring the endpoint matching present in the real data (the starting and ending values themselves were not necessarily equal to each other). To satisfy this constraint, for each surrogate neuron, we first generated a response profile corresponding to the shortest produced interval, which we refer to as the primary profile. We then constrained the response profiles associated with all longer produced intervals to match the primary profile at both the starting and terminal points. To do so, we sampled the remaining profiles from a conditional GP whose endpoint values were constrained to match the primary profile. This constraint can be applied using GP regression 83 , deriving the expectation and covariance of the intermediate time points given the endpoints. This model tested whether our scaling results could be attributed to a general level of similarity between smooth data with matched initial and terminal points.
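The endpoint constraint reduces to standard Gaussian conditioning of the GP on its first and last time points. A Python sketch (kernel form and parameters are assumptions, not taken from the paper):

```python
import numpy as np

def gp_conditioned_on_endpoints(t, y_start, y_end, ls=0.2, seed=0):
    """Sample a GP trace whose endpoints are pinned to given values."""
    d = t[:, None] - t[None, :]
    K = np.exp(-d**2 / (2 * ls**2)) + 1e-9 * np.eye(len(t))
    b = [0, len(t) - 1]                      # constrained (endpoint) indices
    a = list(range(1, len(t) - 1))           # free (interior) indices
    Kbb_inv = np.linalg.inv(K[np.ix_(b, b)])
    yb = np.array([y_start, y_end])
    # conditional mean and covariance of the interior given the endpoints
    mu = K[np.ix_(a, b)] @ Kbb_inv @ yb
    cov = K[np.ix_(a, a)] - K[np.ix_(a, b)] @ Kbb_inv @ K[np.ix_(b, a)]
    rng = np.random.default_rng(seed)
    mid = rng.multivariate_normal(mu, cov + 1e-9 * np.eye(len(a)))
    return np.concatenate([[y_start], mid, [y_end]])

t = np.linspace(0, 1, 40)
trace = gp_conditioned_on_endpoints(t, 0.3, -0.5)
print(trace[0], trace[-1])  # endpoints match exactly: 0.3 -0.5
```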
Surrogate model 3. The first two constraints ensured that the smoothness of the surrogate data and its starting/ending points matched the recorded neuronal profiles. As a third constraint, we required the surrogate data to have the same dimensionality as the physiological data. To satisfy this constraint, we projected the surrogate data onto its principal components (PCs) and adjusted the variance along each PC according to the variance explained by the corresponding PC in the neural data. This was achieved by multiplying the data in PC space by a diagonal matrix that scales each axis to the desired variance. This model tested whether our scaling results could be attributed to data generated from a low-dimensional process with the same smoothness and matching initial and terminal points.
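The diagonal rescaling in PC space can be sketched as follows (a Python illustration with a synthetic target variance spectrum; the actual target was the variance explained by the neural data's PCs):

```python
import numpy as np

def match_variance_spectrum(X, target_var):
    """Rescale data along its PC axes so each axis has the target variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                          # data expressed in PC space
    cur_var = scores.var(axis=0)
    D = np.diag(np.sqrt(target_var / cur_var))  # per-axis diagonal rescaling
    return (scores @ D) @ Vt + X.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
target = np.array([4.0, 2.0, 1.0, 0.5, 0.25])   # illustrative spectrum
Y = match_variance_spectrum(X, target)
Yc = Y - Y.mean(axis=0)
var = np.linalg.svd(Yc, full_matrices=False)[1] ** 2 / len(Y)
print(np.allclose(np.sort(var)[::-1], target))  # spectrum now matches target
```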

Surrogate model 4.
As a final constraint, we required the surrogate data generated for different produced intervals to exhibit the same coefficient of determination (R 2 ) as surrogate data generated under perfect scaling. To satisfy this constraint, for each surrogate neuron, we first sampled one instance from the GP for the shortest produced interval (the primary profile for that neuron). We then stretched the primary profile temporally to generate a set of perfectly scaled response profiles for all other produced intervals and computed R 2 as a measure of similarity between the perfectly scaled responses. We then created samples from the GP for the other produced intervals that were matched in starting/ending points and had the same R 2 as the perfectly scaled data. As a final step, we applied the same strategy as in surrogate model 3 to match the dimensionality of the surrogate and neural data. This model enabled us to test whether our scaling results could be attributed to the dimensionality of the data, given the same smoothness and the degree of similarity in response profiles achieved by matching temporal correlation as well as initial and terminal points.
We will refer to these models by the constraints they impose: smoothness ( S ), endpoint matching ( E ), dimensionality matching ( D ), and correlation matching ( C ).

Table 2. Surrogate models
The same analyses used to assess the neural responses were used to assess scaling in data generated from the surrogate models, including reducing the dimensionality to the first 9 PCs (which captured 80% of the total variance of the physiological data).

Recurrent Network Architecture
We constructed a firing-rate recurrent neural network (RNN) model with nonlinear units. The network produced a one-dimensional output, read out as a weighted linear combination of the units' activity plus a bias term. The output weights were initialized to zero at the start of training.

Network Training
Multiple networks were trained with inputs that were presented randomly across trials as specified above. Networks were trained using backpropagation-through-time 84 , with the Hessian-free (HF) method 85,86 used for stabilization. For HF, we used a preconditioner based on the diagonal of the generalized Gauss-Newton matrix 87 . We computed model parameters by minimizing a squared loss function between the network output and a target function 62 .
The objective used during training was a least-squares loss. The target function was computed for a subset of trials and was defined only from the onset of the Set pulse until the end of the desired interval. For the duration of the Set pulse, the target function was zero, to prevent rapid transient switches in activity before the production stage.
Once training was complete, we generated a new set of randomized test trials, chose a threshold corresponding to the desired output, and computed the times at which the network output crossed that threshold after Set. The time of threshold-crossing relative to Set was taken as the network's produced interval ( Tp ). Network responses were also tested for intervals that interpolated within and extrapolated beyond the trained range. All networks exhibited good interpolation performance. Networks that exhibited good extrapolation performance within 150 ms beyond the trained range were used for further analysis (90% of tested networks). In the main text, we report interpolated intervals that matched the monkeys' response variability for both intervals.
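The threshold-crossing readout of Tp can be sketched as follows (a Python illustration with a toy monotonically rising output; the threshold value is arbitrary):

```python
import numpy as np

def produced_interval(output, t, threshold, t_set=0.0):
    """Time after Set at which the output first reaches the threshold."""
    after = t >= t_set
    idx = np.argmax((output >= threshold) & after)  # first crossing index
    return t[idx] - t_set

t = np.linspace(0.0, 1.0, 1001)
out = t ** 2                      # toy rising network output
tp = produced_interval(out, t, threshold=0.25)
print(round(tp, 3))  # crosses 0.25 at t = 0.5
```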
We trained and tested multiple networks with different input profiles and different training objectives to check the consistency of the solution (Supplementary Data Fig. 6). In these networks, the scaling behavior of the objective function was controlled by two parameters, A and α, creating a continuum of objective functions between a ramp (linear scaling) and a delta function (non-scaling). For example, with A = 3 and α = 2.8, the function approximates a scaled linear ramp of the desired duration.
We tested additional networks with explicit non-scaling objectives and also with Cue inputs that did not remain constant throughout the trial. We have reported the results of two such variations in Supplementary Fig. 6, one in which the objective does not scale with interval duration (Supplementary Data Fig. 6a), and one in which the Cue input was a transient pulse ( Supplementary Data Fig. 6d).
Following techniques adopted by Sussillo & Barak 65 , we analyzed the rate of change of the network state to uncover its underlying dynamics. We minimized the magnitude of the state derivative to find the fixed and slow points of the system (identified with tight and loose tolerances, respectively). We analyzed the local dynamics around fixed points by performing an eigendecomposition of the corresponding linearized system. Fixed points with no positive eigenvalues were classified as stable. The unstable fixed points that we detected in some networks were usually associated with a single unstable mode and often helped mediate the transition of trajectories after Set toward F terminal .
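The fixed-point search of Sussillo & Barak amounts to descending the squared speed of the autonomous dynamics. A minimal Python sketch on a toy 1-D system (the dynamics F and the optimizer settings are illustrative, not the paper's network); note that the unstable fixed point is found the same way, since it is also a minimum of the squared speed:

```python
def F(x):
    # toy autonomous dynamics dx/dt with stable fixed points at +/-1,
    # and an unstable fixed point at 0
    return x - x**3

def find_fixed_point(x0, lr=0.05, steps=2000):
    """Gradient descent on q(x) = 0.5*F(x)^2; grad q = F(x)*F'(x), F'(x) = 1 - 3x^2."""
    x = x0
    for _ in range(steps):
        x -= lr * F(x) * (1.0 - 3.0 * x**2)
    return x

print(round(find_fixed_point(0.7), 3))   # settles at the stable point 1.0
print(round(find_fixed_point(0.05), 3))  # settles at the unstable point 0.0
```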
We projected activities of neurons on each trial onto the first three principal components of the total activity covariance matrix (across all trials) to obtain state-space trajectories (Fig. 5c). Fixed points were projected onto the same basis for visualization. For various networks, the mean squared speed of network trajectories during interval production over the first three PCs was used to relate dynamics to Tp (Fig. 5, Supplementary Fig. 6). We additionally computed the scaling components (SCs) for the activity of various networks (Supplementary Fig. 7). The speed of the trajectory within the scaling subspace was calculated from the projection onto the first 3 SCs (Supplementary Data Fig. 7).

Two inhibitory-neuron model
Following previous work 69 , we engineered a two-neuron model that captures the key intuition behind speed control in the interval timing task. The network consists of two mutually inhibitory neurons ( u and v ), which receive a common input representing the contextual cue, denoted θ (Fig. 6a). The time-varying firing rates of the model neurons are governed by a pair of coupled nonlinear differential equations in which each unit relaxes toward the value of its activation function applied to the common input θ minus the weighted activity of the other unit. The free parameters of the model are the magnitudes of the synaptic weights from each input to each unit, the balanced input θ to u and v, the time constant of each unit (100 ms) together with the simulation time step, and the activation functions of u and v.
The nullcline for each variable is defined as the set of all states where the derivative of that variable vanishes; for example, setting du/dt = 0 yields the nullcline for u (Fig. 6c). The intersections of the two nullclines are points where neither variable changes, i.e., fixed points (FPs). For a range of synaptic weights and input levels ( θ ), the system has two stable FPs separated by an unstable one. Increasing θ systematically shifts the positions of the two nullclines (Fig. 6c). It also reduces the energy gradient between the FPs (Fig. 6b), causing the system to slow down.
We numerically solved the coupled equations to compute the system's response dynamics.
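A minimal simulation of this two-neuron scheme can be sketched in Python (all parameter values — the weight, input levels, time constant, sigmoid activation and commitment criterion — are illustrative assumptions, not the paper's). Starting near the unstable symmetric fixed point, a stronger common input θ slows the relaxation toward a stable fixed point:

```python
import numpy as np

def f(x):
    # sigmoid activation function for both units
    return 1.0 / (1.0 + np.exp(-x))

def time_to_commit(theta, w=6.0, tau=0.1, dt=0.001, eps=0.01):
    """Time for |u - v| to grow from a small perturbation to 0.5,
    starting at the symmetric (unstable) fixed point u* = f(theta - w*u*)."""
    lo, hi = 0.0, 1.0                      # locate u* by bisection
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(theta - w * mid) > mid else (lo, mid)
    u, v = lo + eps, lo - eps              # small push along the difference mode
    t = 0.0
    while abs(u - v) < 0.5 and t < 10.0:   # Euler integration of the dynamics
        du = (-u + f(theta - w * v)) / tau
        dv = (-v + f(theta - w * u)) / tau
        u, v, t = u + dt * du, v + dt * dv, t + dt
    return t

print(time_to_commit(5.0) > time_to_commit(3.5))  # stronger input -> slower
```

The slowing arises exactly as described in the Results: the larger θ pushes the operating points into the shallow region of the sigmoid, weakening the unstable mode's growth rate.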
The energy (E) of the system was defined as the path integral of speed over the vector field. This was computed numerically by integrating the speed along a deterministic trajectory from the unstable FP to the stable ones, using the equations above to simulate the relaxation of the activity toward the final FP.
Fig. 1 legend (fragment). … press ( Hand ). These four conditions were randomly interleaved and were cued throughout the trial by the color and shape of two central stimuli: a circular fixation point for the eye and a square that cued the animal to place its hand on a button. The shape (circle or square) cued the effector, and the hue (red or blue) cued the desired interval (red for Short and blue for Long ).
Inactivation figure legend (fragment). Red and blue correspond to Short and Long trials, and black to Saline control sessions. The reported p-values correspond to a test of whether inactivation increased RMSE across sessions (RMSE 2 = mean of ( Tp - Ts ) 2 = Bias 2 + Var; one-tailed paired t-test, see Table 1). No significant change in RMSE was observed after saline injection (Table 1).
Fig. 6 legend (fragment). Panel (c) shows the two nullclines and the corresponding fixed points for two input levels (red and blue). Activation of Set moves the system along a "recurrent subspace" orthogonal to the input subspace. The proximity of the nullclines (crosses below the input subspace) controls the speed: when the input is stronger, the nullclines are closer, and the system slows down.
Table 1. Effects of muscimol inactivation in the three brain areas. In assessing the significance of inactivation in the motor timing experiment (second and third columns from left), we used one-tailed paired-sample Student's t-tests to examine whether muscimol caused an increase in RMSE (H0: no increase in RMSE). The same test was used for the saline injection experiments (third and fourth columns from left). Statistical tests were based on distributions of average RMSEs derived from trials before and after muscimol injection. Each pair of average RMSE values was computed from a pair of 50 random trials from before and after injection. The sampling was made without repeats to ensure trials were not counted twice.
In assessing the effect of muscimol on reaction times (RTs) in the memory saccade task (rightmost column), we used two-tailed Student's t-tests to examine whether muscimol caused a change in reaction time (H0: no change in RT). Statistically significant effects are marked in gray. There was a change in RT after muscimol inactivation of the thalamus, suggesting that the region influenced by thalamic inactivation may also play a role in the memory saccade task.
Supplementary figure legend (fragment), showing the various constraints considered for non-scaling models. All surrogate data were generated from a Gaussian process (GP) with the same level of temporal smoothness (white rectangle, S ) as the data. We considered three additional constraints to make the surrogate data more similar to the neural data without an explicit requirement for scaling. One constraint required responses for all production intervals to be at the same level at the time of Set and at the time of response; we refer to this as endpoint matching (red circle, E ). Another constraint required that the dimensionality of the surrogate data match the neural data, with the variance explained by each principal component (PC) also matched; we refer to this as dimensionality matching (green circle, D ). Finally, we considered a constraint requiring the collection of responses for different production intervals to have the same correlation (quantified as R 2 ) as expected from perfect scaling; we refer to this as correlation matching (blue circle, C ). We created surrogate data for each constraint and for various combinations of constraints, and compared the scaling properties to the original data.

Note that each constraint characterizes a superset of the scaling hypothesis. (b) Example traces showing the procedure for generating the surrogate data in the C+D+E+S model for 5 randomly selected surrogate units aligned to the time of Set. We first sampled a Short trace (red) from a Gaussian process with the same level of temporal smoothness as the neural data. The blue trace corresponds to the perfectly scaled version of the red trace, generated by linear stretching; it is not a sample from the GP and is therefore not part of the surrogate data. The gray traces correspond to the surrogate data, drawn as samples from the GP that satisfied several criteria.
First, the starting and ending points of every gray trace had to match those of the perfectly scaled blue trace. Second, across the population of surrogate units, the dimensionality had to match the observed neural data. Finally, the correlation between every gray trace and the red trace had to equal the correlation between the red and blue traces. In this way, every GP sample (gray traces) matched the smoothness, endpoints, dimensionality and correlation of the real data (i.e., the C+D+E+S model).