Musicians can perform at different tempos, speakers can control the cadence of their speech, and children can flexibly vary their temporal expectations of events. To understand the neural basis of such flexibility, we recorded from the medial frontal cortex of nonhuman primates trained to produce different time intervals with different effectors. Neural responses were heterogeneous, nonlinear, and complex, and they exhibited a remarkable form of temporal invariance: firing rate profiles were temporally scaled to match the produced intervals. Recording from downstream neurons in the caudate and from thalamic neurons projecting to the medial frontal cortex indicated that this phenomenon originates within cortical networks. Recurrent neural network models trained to perform the task revealed that temporal scaling emerges from nonlinearities in the network and that the degree of scaling is controlled by the strength of external input. These findings demonstrate a simple and general mechanism for conferring temporal flexibility upon sensorimotor and cognitive functions.
We thank M.S. Fee, J.J. DiCarlo, and R. Desimone for comments on the manuscript, and we thank D. Sussillo for advice on modeling. D.N. was supported by the Rubicon Grant (2015/446-14-008) from the Netherlands Scientific Organization (NWO). M.J. is supported by the NIH (NINDS-NS078127), the Sloan Foundation, the Klingenstein Foundation, the Simons Foundation, the Center for Sensorimotor Neural Engineering, and the McGovern Institute.
Integrated Supplementary Information
(a) Histograms on the top show the normalized distribution of the scaling index for individual neurons in the MFC (n = 416 neurons from both animals), caudate (n = 278 neurons), and thalamus (n = 846 neurons) for the Eye (left) and Hand (right) conditions. The bottom panel compares the cumulative probability distribution of the scaling index across the three areas. The thalamus shows a predominance of smaller scaling index values (Mann-Whitney-Wilcoxon test, one-tailed, W(1260) = 310,733, z = 7.89, *** P < .001 in comparison with MFC, and W(1122) = 189,163, z = 6.98, *** P < .001 in comparison with caudate). (b) Example PSTHs covering a range of SIs in the three brain areas. The SI value and effector condition for each neuron are indicated.
First 9 PCs over the course of the production interval (abscissa) that explain 80% of variance in the MFC data (n = 281 neurons for Monkey A) in decreasing order of variance explained (left) for Short (warm colors) and Long (cool colors) intervals. First 9 SCs, obtained for the same data, in decreasing order of scaling (right, see Methods).
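As an illustration of how a count such as "the first 9 PCs explain 80% of variance" can be obtained, the following is a minimal sketch using synthetic data. The function name `n_components_for_variance`, the 0.8 threshold argument, and the random `rates` matrix are assumptions made for this example; they are not the authors' code or data, and the scaling components (SCs) described in Methods are not reproduced here.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.8):
    """Return (k, cum): the smallest number of PCs whose cumulative variance
    fraction reaches `threshold`, and the cumulative fractions themselves.
    X is (time points x neurons); each neuron is mean-centered first."""
    Xc = X - X.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    cum = np.cumsum(s**2) / np.sum(s**2)      # cumulative variance fraction
    k = int(np.searchsorted(cum, threshold)) + 1
    return k, cum

# Hypothetical stand-in for trial-averaged firing rates (not real data):
# 5 latent signals mixed into 40 "neurons" plus a little noise.
rng = np.random.default_rng(0)
latents = rng.standard_normal((200, 5))
mixing = rng.standard_normal((5, 40))
rates = latents @ mixing + 0.1 * rng.standard_normal((200, 40))
k, cum = n_components_for_variance(rates)
print(k, cum[:k])
```

Computing the variance fractions from singular values of the centered matrix avoids forming the neuron-by-neuron covariance matrix explicitly.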
(a) Venn diagram showing the various constraints considered for non-scaling models. All surrogate data were generated from a Gaussian process (GP) with the same level of temporal smoothness (white rectangle, S) as the data. We considered three additional constraints to make the surrogate data more similar to neural data without an explicit requirement for scaling. One constraint required responses for all production intervals to be at the same level at the time of Set and at the time of response. We refer to this constraint as endpoint matching (red circle, E). Another constraint required that the dimensionality of the surrogate data match that of the neural data and, additionally, that the variance explained by each principal component (PC) be matched. We refer to this constraint as dimensionality matching (green circle, D). Finally, we considered a constraint that required the collection of responses for different production intervals to have the same correlation (quantified as R²) as expected from perfect scaling. We refer to this constraint as correlation matching (blue circle, C). We created surrogate data for each constraint and for various combinations of constraints, and compared their scaling properties to those of the original data. Note that each constraint characterized a superset of the scaling hypothesis. (b) Example traces showing the procedure for generating the surrogate data (n = 281 surrogate units) in the C+D+E+S model for 5 randomly selected surrogate units aligned to the time of Set. The surrogate data were generated using a constrained GP prior as follows. The response for the shortest production interval (red) was sampled from a GP with the same level of temporal smoothness as the neural data.
The corresponding response with perfect scaling was generated by linear scaling (shown in blue). Note that the trace in blue is not a sample from the GP and is therefore not part of the surrogate data. The gray traces correspond to the surrogate data. To generate the surrogate data, we drew samples from the Gaussian process that satisfied several criteria. First, the starting point and the ending point of every gray trace had to match exactly the starting point and ending point of the perfectly scaled blue trace. Second, across the population of surrogate data, the dimensionality had to match that of the observed neural data. Finally, the correlation between every gray trace and the red trace had to equal the correlation between the red and blue traces. In this way, every GP sample (gray traces) matched the real data in smoothness, endpoints, dimensionality, and correlation (i.e., the C+D+E+S model). (c) Cumulative percentage variance explained by PCs and SCs for the surrogate data generated from the non-scaling C+D+E+S model. (d) The first 9 principal components of population activity (PCs, left) and the corresponding 9 scaling components (SCs, right) plotted as a function of time from Set for the non-scaling C+D+E+S model. Note that PCs and SCs are based on the surrogate data (gray traces in panel b), not the perfectly scaled data (blue traces in panel b).
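The first two steps of the procedure in panel b (sampling a smooth Short trace from a GP, then producing its perfectly scaled counterpart by linear temporal scaling) can be sketched as follows. This is an illustrative sketch only: the squared-exponential kernel, the `length_scale` value, the interval durations, and the function names are assumptions, and the endpoint, dimensionality, and correlation constraints used for the gray traces are not implemented here.

```python
import numpy as np

def sample_gp(t, length_scale, rng):
    """Draw one sample from a zero-mean GP with a squared-exponential kernel
    (the kernel choice is an assumption of this sketch)."""
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2)
    # Sample via an eigendecomposition, clipping tiny negative eigenvalues
    # that arise from round-off in this ill-conditioned kernel matrix.
    w, v = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)
    return v @ (np.sqrt(w) * rng.standard_normal(len(t)))

def scale_in_time(trace, t_short, t_long):
    """Stretch a trace defined on t_short onto t_long by linear temporal scaling."""
    # Map each time in the long interval to the equivalent time in the short one.
    t_equiv = t_long * (t_short[-1] / t_long[-1])
    return np.interp(t_equiv, t_short, trace)

rng = np.random.default_rng(0)
t_short = np.linspace(0.0, 0.8, 81)    # hypothetical Short interval (s)
t_long = np.linspace(0.0, 1.5, 151)    # hypothetical Long interval (s)
short_trace = sample_gp(t_short, length_scale=0.2, rng=rng)    # "red" trace
scaled_trace = scale_in_time(short_trace, t_short, t_long)     # "blue" trace
```

By construction the scaled trace shares its start and end values with the Short trace, which is the property that the endpoint-matching constraint imposes on the surrogate samples.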
The speed of neural trajectory in MFC (n = 281 neurons for Monkey A, n = 135 neurons for Monkey D) within the scaling subspace spanned by the first 3 SCs predicted Tp across both Short and Long conditions on a trial-by-trial basis. Both ordinate and abscissa follow a logarithmic scale. The case of hand trials for Monkey A was shown in Fig. 5d. Here, all the other conditions are shown.
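The logic linking trajectory speed to Tp can be illustrated with a toy example. If responses scale perfectly in time, the trajectory follows the same path for every interval, so the average speed equals the path length divided by Tp and the points fall on a line of slope -1 in log-log coordinates. The sketch below uses a synthetic three-dimensional trajectory; the trajectory shape, the Tp values, and the function names are assumptions for illustration, not the recorded data or the authors' analysis code.

```python
import numpy as np

def mean_speed(traj, dt):
    """Average speed of a trajectory (time points x dimensions) sampled every dt seconds."""
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)   # distance per time step
    return steps.sum() / (dt * (len(traj) - 1))             # path length / duration

def scaled_trajectory(tp, dt):
    """Synthetic trajectory that depends on time only through the phase t / tp."""
    n = int(round(tp / dt)) + 1
    phase = np.linspace(0.0, 1.0, n)
    return np.stack([np.cos(np.pi * phase), np.sin(np.pi * phase), phase], axis=1)

dt = 0.001                               # sampling step in seconds (hypothetical)
tps = np.array([0.5, 0.8, 1.0, 1.5])     # hypothetical produced intervals (s)
speeds = np.array([mean_speed(scaled_trajectory(tp, dt), dt) for tp in tps])

# Slope of log(speed) against log(Tp); close to -1 for perfect scaling.
slope = np.polyfit(np.log(tps), np.log(speeds), 1)[0]
print(slope)
```

Deviations of the fitted slope from -1 in real data would indicate that the path itself, and not only the rate of progress along it, changes with the produced interval.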
Supplementary Figure 5 Analysis of scaling at the population level in the caudate (top) and thalamus (bottom)
Left column: Population activity profiles (n = 101 neurons for caudate and n = 481 neurons for thalamus, Monkey A) projected onto the first 3 principal components (PCs). Activity profiles associated with different produced intervals for Short and Long conditions are plotted in different colors (same color scheme used throughout the paper). The state at 700 ms after Set is shown along the trajectories (diamond). Second column from left: Population activity projected onto the first 3 scaling components (SCs). Activity spanned by the first 3 SCs overlaps for different intervals in the caudate but not in the thalamus. Third column from left: Variance explained for individual SCs as a function of scaling index. Right column: The speed of the neural trajectory within the scaling subspace spanned by the first 3 SCs. Both ordinate and abscissa follow a logarithmic scale.
(a) A recurrent neural network (RNN) trained to use an interval-dependent Cue input to produce time intervals flexibly. The network was trained using a non-scaling exponential objective with a fixed time constant. (b) The speed of dynamics measured within the space spanned by the first three PCs predicted Tp across both Short and Long conditions. Both ordinate and abscissa follow a logarithmic scale. (c) The response profiles of randomly selected units in the network in (a), aligned to the time of Set. (d) An RNN trained to use a brief pulse as the interval-dependent Cue input to produce time intervals flexibly. The network was trained using a linear ramping objective like the network in the main text. (e) The speed of dynamics predicted Tp across both Short and Long conditions (same format as b). (f) The response profiles of randomly selected units in the network in (d), aligned to the time of Set.
(a) The first 9 scaling components (SCs) of the population activity in the RNN with a scaling output function (Fig. 5). The early SCs correspond to the recurrent subspace, and the last SC represents the input subspace. (b) The average speed of population activity in the subspace spanned by the first 3 SCs is predictive of both within-context and across-context variations in Tp. Both ordinate and abscissa follow a logarithmic scale. (c) The average firing rate of the population activity projected onto the last SC is also predictive of Tp. (d) Cumulative percentage variance explained by PCs (white) and SCs (black). The dashed vertical line corresponds to the 9th component. (e-h) Same as a-d, for a network that was trained with a non-scaling exponential output objective function (Supplementary Fig. 6d).
Left: The time course of SC9 (the least scaling component) across conditions. Right: The average firing rate of population activity (n = 281 neurons for Monkey A) projected onto SC9, the putative input subspace, increases with produced interval (Tp). This is consistent with the hypothesis that the average firing rate in the non-scaling subspace controls speed. Based on the recurrent network model, this subspace likely reflects the input drive to MFC. Both ordinate and abscissa are plotted on a logarithmic scale.