Flexibility in motor timing constrains the topology and dynamics of pattern generator circuits

Temporally precise movement patterns underlie many motor skills and innate actions, yet the flexibility with which the timing of such stereotyped behaviors can be modified is poorly understood. To probe this, we induce adaptive changes to the temporal structure of birdsong. We find that the duration of specific song segments can be modified without affecting the timing in other parts of the song. We derive formal prescriptions for how neural networks can implement such flexible motor timing. We find that randomly connected recurrent networks, a common approximation for how neocortex is wired, do not generally conform to these prescriptions, though certain implementations can approximate them. We show that feedforward networks, by virtue of their one-to-one mapping between network activity and time, are better suited. Our study provides general prescriptions for pattern generator networks that implement flexible motor timing, an important aspect of many motor skills, including birdsong and human speech.

Specific trajectories in synaptic weight space

A specific trajectory for the $i^{th}$ interval is a path in synaptic weight space, $W^{(i)}(s)$, parametrized by $s \ge 0$ and confined to the non-zero elements of $W$, along which only the target interval's duration changes:

$$T^{(j)}\!\left(W^{(i)}(s)\right) = T^{(j)}\!\left(W^{(i)}(0)\right) + \delta_{ij}\,\Delta T(s), \qquad \text{(SI.1)}$$

with $\Delta T(s)$ a strictly increasing function. Then, taking a derivative of both sides of Eq. (SI.1) and applying the chain rule, one gets:

$$\delta_{ij}\,\frac{d\Delta T}{ds} = \sum_{ab} \frac{\partial T^{(j)}}{\partial W_{ab}}\,\frac{dW^{(i)}_{ab}}{ds}, \qquad \text{(SI.2)}$$

where $\delta_{ij}$ is the Kronecker delta. The left-hand sides of these equations are all zero except when $j = i$. Eq. (SI.2) implies that along a specific trajectory, the gradients of non-target interval durations with respect to synaptic weights must be orthogonal to the tangent vector of the trajectory.
To solve Eq. (SI.2), we first get rid of $\Delta T(s)$ by choosing the parametrization $\Delta T(s) = s$ (SI.3), which gives:

$$\sum_{ab} \frac{\partial T^{(j)}}{\partial W_{ab}}\,\frac{dW^{(i)}_{ab}}{ds} = \delta_{ij}. \qquad \text{(SI.4)}$$

Once a solution to this equation is found, other solutions that trace the same trajectory with different speeds can be constructed by reparametrizations that respect the strictly increasing nature of $\Delta T(s) = T^{(i)}\!\left(W^{(i)}(s)\right) - T^{(i)}\!\left(W^{(i)}(0)\right)$.
With initial conditions $W^{(i)}(0) = W_0$, Eq. (SI.4) defines a non-linear system of ordinary differential equations (one for each $j$) for $N_W$ unknowns, $N_W$ being the number of non-zero elements of $W$. In ordinary situations $N_I \ll N_W$, where $N_I$ is the number of intervals, and hence these equations will typically be underdetermined with infinitely many solutions; a minimal numerical sketch of such a solution follows the list below. However, caution must be exercised to avoid trajectories, $W^{(i)}(s)$, crossing pathological points such as:

1. Points where $T^{(j)}(W)$ is not differentiable. This could happen, for example, if the network is poised at a bifurcation point. We exclude such cases, because such networks will not exhibit the robustness required of a pattern generator circuit.

2. Points where the gradient of the target interval duration vanishes, $\nabla_W T^{(i)} = 0$. We exclude a vanishing gradient because it would prevent the pattern generator from implementing local changes to the $i^{th}$ interval, and hence such a network is not flexible.
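Away from such pathological points, Eq. (SI.4) can be integrated numerically. Below is a minimal sketch (ours, not from the paper) of tracing a specific trajectory using the minimum-norm (pseudoinverse) tangent; the duration map `T` is a hypothetical toy stand-in for simulating an actual pattern generator network.

```python
# Minimal sketch: tracing a specific trajectory W^(i)(s) by Euler-integrating
# Eq. (SI.4) with the minimum-norm tangent dW/ds = pinv(J) e_i. The duration
# map T(W) below is a toy stand-in (hypothetical), not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
N_W, N_I = 50, 5                            # many more weights than intervals
A = rng.standard_normal((N_I, N_W)) / np.sqrt(N_W)

def T(W):
    """Toy interval durations: a smooth, nonlinear function of the weights."""
    return 1.0 + np.tanh(A @ W)

def jacobian(W, eps=1e-6):
    """Finite-difference Jacobian J[j, ab] = dT^(j) / dW_ab."""
    J = np.zeros((N_I, N_W))
    for ab in range(N_W):
        dW = np.zeros(N_W)
        dW[ab] = eps
        J[:, ab] = (T(W + dW) - T(W - dW)) / (2 * eps)
    return J

def trace_specific_trajectory(W0, i, ds=0.01, n_steps=20):
    """Integrate dW/ds = pinv(J) e_i, so that dT^(j)/ds = delta_ij (Eq. SI.4)."""
    W, e_i = W0.copy(), np.zeros(N_I)
    e_i[i] = 1.0
    for _ in range(n_steps):
        W = W + ds * (np.linalg.pinv(jacobian(W)) @ e_i)
    return W

W0 = 0.1 * rng.standard_normal(N_W)
W1 = trace_specific_trajectory(W0, i=2)
print(np.round(T(W1) - T(W0), 3))           # only the 3rd interval changes, by ~0.2
```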
To be able to say whether the conditions listed above, and possibly others, seriously restrict the existence of specific trajectories, one needs to know the details of the particular temporal pattern generator network. The models we present in the main text allow for sufficiently large variations of synaptic weight matrices around their baseline values, corresponding to at least ≈10% changes in interval durations, as can be seen in our simulations, and hence should admit specific trajectories. The argument we presented can be repeated separately for all the intervals that the pattern generator produces.
Until now, we have only considered specific changes to intervals starting from the original network connectivity. What if a specific change to another interval is required after a specific change to the network has already been made? Such a scenario can be generalized to task-specific combinations of intervals and interval changes, each of which would add constraints of the form of Eq. (SI.4), starting from different network connectivities. In this paper, we focus only on changes from the original network connectivity.

Independent trajectories in synaptic weight space
Next, we discuss independent trajectories. These are trajectories along which multiple intervals can change simultaneously without interference, by which we mean that their rates of change do not suffer compared to how they would change on their own. Suppose $i_1, \ldots, i_n$ are the intervals targeted for change. Is there an independent trajectory in synaptic weight space, $W^{(i_1,\ldots,i_n)}(s)$, for changing the durations of these intervals?
To quantify interference, we need to compare the rate of interval changes along $W^{(i_1,\ldots,i_n)}(s)$ to the rate of interval changes along specific trajectories. Therefore, to have a well-defined problem, we need to assume the existence of specific trajectories (see above). Then, we can state the condition for independence as:

$$T^{(j)}\!\left(W^{(i_1,\ldots,i_n)}(s)\right) = T^{(j)}(W_0) + \sum_{k=1}^{n}\left[T^{(j)}\!\left(W^{(i_k)}(s)\right) - T^{(j)}(W_0)\right], \qquad \text{(SI.5)}$$

where $W^{(i_k)}(s)$ are specific trajectories. The parametrization of the specific trajectories can be chosen freely for the current discussion, but in the biological setting discussed in the next section, it relates to the learning rate during single- and multiple-target learning. Taking derivatives with respect to $s$ and applying the chain rule on both sides gives:

$$\sum_{ab} \frac{\partial T^{(j)}}{\partial W_{ab}}\,\frac{dW^{(i_1,\ldots,i_n)}_{ab}}{ds} = \sum_{k=1}^{n}\sum_{ab} \frac{\partial T^{(j)}}{\partial W_{ab}}\,\frac{dW^{(i_k)}_{ab}}{ds}. \qquad \text{(SI.6)}$$

Note that the terms in the summation on the right-hand side are zero except when $j = i_k$, due to specificity. Hence, independent paths must be orthogonal to the gradients of non-target intervals. Locally, such paths can be constructed as linear combinations of the specific paths:

$$\frac{dW^{(i_1,\ldots,i_n)}_{ab}}{ds} = \sum_{k=1}^{n} \frac{dW^{(i_k)}_{ab}}{ds}. \qquad \text{(SI.7)}$$

For each $i_1, \ldots, i_n$ combination, this amounts to $N_I$ equations, one for each interval $j$, for $N_W$ unknowns. In ordinary situations, $N_I \ll N_W$, and hence these equations will typically be underdetermined, and solutions to them can be found, provided that the dependence of interval durations on synaptic weights is not pathological, as discussed above, over the required range of interval durations. This argument can be repeated separately for all possible independent interval combinations.
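Continuing the toy example above (reusing the hypothetical `T`, `jacobian`, and `rng`), one can check (SI.6)-(SI.7) numerically: the sum of specific tangents changes each target at its single-target rate and leaves non-targets untouched.

```python
# Continues the previous sketch (reuses T, jacobian, rng, N_W). The combined
# tangent of Eq. (SI.7) is the sum of specific tangents; along it, each
# targeted interval changes at unit rate and non-targets do not change.
def combined_tangent(W, targets):
    J_pinv = np.linalg.pinv(jacobian(W))    # columns: specific tangents pinv(J) e_i
    return sum(J_pinv[:, i] for i in targets)

W = 0.1 * rng.standard_normal(N_W)
dW = combined_tangent(W, targets=[1, 3])
print(np.round(jacobian(W) @ dW, 3))        # ~[0, 1, 0, 1, 0]: no interference
```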
The existence of specific and independent trajectories in synaptic weight space is a minimal, necessary requirement for flexible time-keeping, but such paths must also be discoverable by biologically plausible learning rules, which we discuss next.

Biologically plausible reinforcement learning of specific and independent temporal changes
Until now, we discussed the existence of trajectories in weight space along which modifications to individual intervals are specific and independent. In reality, such trajectories have to be traced by synaptic plasticity during trial-and-error learning. The information available at each synapse is very restricted: a signal about the produced target interval, as well as knowledge of the synaptic weight of the synapse and the states of the pre- and postsynaptic neurons. This poses a serious challenge for implementing specific and independent modifications to timing in biologically plausible networks. What neurons as a population can do, instead of solving (SI.2) and (SI.6), is to learn to increase reward. In fact, there are many biologically plausible reinforcement learning models in the literature 1-3 which suggest plasticity rules that find the direction of maximum reward increase, i.e. the 'gradient', in synaptic weight space when averaged across many trials.
Leaving the trial-and-error aspect of learning aside (it is addressed in the main paper), let us for a moment assume that the network has calculated a trajectory that is a gradient ascent on reward,

$$\frac{dW^{(i)}_{ab}}{ds} = \eta\,\frac{\partial R^{(i)}}{\partial W_{ab}}, \qquad \text{(SI.8)}$$

where $\eta$ is a positive parameter that governs the rate of change and $R^{(i)}$ is the reward signal for changing the $i^{th}$ interval. Then, using (SI.2) and (SI.8), we deduce that biologically plausible specific modification of interval durations requires orthogonality of the gradients of different interval durations:

$$\sum_{ab} \frac{\partial T^{(j)}}{\partial W_{ab}}\,\frac{\partial T^{(i)}}{\partial W_{ab}} = \Lambda_{ji}(s), \qquad \text{(SI.9)}$$

where $\Lambda(s)$ is a diagonal matrix with positive diagonal elements, the left-hand side is the interference matrix defined in the main text, and $W^{(i)}(s)$ is now the trajectory obtained by the reinforcement learning algorithm. Note that this relation has to hold at each point along the trajectory. Equation (SI.9) is a design constraint on a pattern generator network, and in the main text we check whether the interference matrix is indeed diagonal for various possible pattern generator networks (Figs. 2E, 3D and 5C).
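The constraint (SI.9) is easy to probe numerically. The sketch below (again using the hypothetical toy map from the first snippet) computes the Gram matrix of duration gradients; for a generic, densely coupled map, its off-diagonal entries are not small, so reward-gradient learning on one interval would drag the others along.

```python
# Sketch of the design constraint (SI.9), reusing the toy T, jacobian, rng
# from above: specificity under reward-gradient learning needs the Gram
# matrix of duration gradients (the interference matrix) to be diagonal.
W = 0.1 * rng.standard_normal(N_W)
J = jacobian(W)                             # row j: gradient of T^(j) w.r.t. weights
I_mat = J @ J.T                             # I[j, i] ~ dT^(j)/ds under reward on i
ratio = np.abs(I_mat - np.diag(np.diag(I_mat))).max() / np.diag(I_mat).min()
print(round(ratio, 2))                      # sizable: gradients are not orthogonal
```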
Simultaneous and independent modifications to multiple intervals require multiple reinforcement signals delivered to the network. Here we assume that the combined effect of such reinforcement is given by the addition of each reinforcement. Borrowing notation from the previous section:

$$\frac{dW^{(i_1,\ldots,i_n)}_{ab}}{ds} = \eta \sum_{k=1}^{n} \frac{\partial R^{(i_k)}}{\partial W_{ab}}. \qquad \text{(SI.10)}$$

With this assumption, specificity in a reinforcement learning experiment would imply independence, as, locally, independent paths will be linear combinations of specific paths, as in (SI.7).

Supplementary Note 2: Flexible time-keeping in feedforward networks requires a one-to-one mapping between synapses and the interval durations they affect
Here we discuss flexible time-keeping in a pattern generator network with feedforward architecture.
In this section, we make explicit references to "time points", $t^{(\alpha)}$, which mark the beginnings and ends of intervals, i.e. $T^{(\alpha)} = t^{(\alpha)} - t^{(\alpha-1)}$. We label layers by Greek symbols. We make the following general assumptions:

1. The mapping between network activity and time is layer specific: the $\alpha^{th}$ time point, $t^{(\alpha)}$, is a function of only the activity of the neurons in the $\alpha^{th}$ layer.
2. As also implicit in the previous assumption, time increases with progression through the feedforward network. Activity in each layer codes for the start of an interval and the end of the previous one.
3. Only the initial layer, which we call the 0th layer, receives external input.
For example, the first spike time of the neurons in the $\alpha^{th}$ layer may code for the beginning of the $\alpha^{th}$ interval and the end of the $(\alpha-1)^{th}$ interval.

The case of a single neuron per layer
For simplicity, let's first assume a single neuron per layer. We will discuss the generalization of our results to multiple neurons per layer in the next section.
Because in the case of a single neuron per layer there is a one-to-one match between intervals and synaptic weights, in a sense that will be described below, we change our notation for synaptic weights and neuron labeling slightly. Neurons and the layers they belong to are labeled with Greek subscripts, as opposed to the Latin subscripts of the previous sections. In addition, the synaptic weight between the neuron in the $(\alpha-1)^{th}$ layer and the neuron in the $\alpha^{th}$ layer is denoted by $W_\alpha$. We denote the activity of the $\alpha^{th}$ neuron by $r_\alpha(t)$, which can be spikes or a firing rate. The feedforward architecture makes $r_\alpha(t)$ a function of the activity of the presynaptic neuron $r_{\alpha-1}(t)$ and the connection between them, $W_\alpha$:

$$r_\alpha(t) = F\!\left(r_{\alpha-1}(t),\, W_\alpha\right). \qquad \text{(SI.11)}$$

Interval boundaries, on the other hand, depend only on the activity of neurons in the corresponding layer (assumption 1), so a time point $t^{(\beta)}$ can depend on $W_\alpha$ only for $\beta \ge \alpha$:

$$\frac{\partial t^{(\beta)}}{\partial W_\alpha} = 0 \quad \text{for } \beta < \alpha. \qquad \text{(SI.15)}$$

Claim: Flexible learning requires that a change in $W_\alpha$ affects no interval duration other than $T^{(\alpha)}$. Hence, flexibility requires a change in the synaptic weight $W_\alpha$ to shift the time point $t^{(\alpha)}$ and all downstream time points by the same amount.
We prove the claim by strong induction, with the first interval as the base case. Flexible learning requires the interference matrix to be diagonal with positive diagonal elements (see (SI.9)). By (SI.15), the first row and column of the matrix are given by:

$$I_{1\gamma} = \sum_{\beta} \frac{\partial T^{(1)}}{\partial W_\beta}\,\frac{\partial T^{(\gamma)}}{\partial W_\beta} = \frac{\partial T^{(1)}}{\partial W_1}\,\frac{\partial T^{(\gamma)}}{\partial W_1}.$$

Hence, flexible learning requires

$$\frac{\partial T^{(\gamma)}}{\partial W_1} = 0 \quad \text{for } \gamma \neq 1,$$

i.e., perturbing $W_1$ shifts $t^{(1)}$ and all downstream time points by the same amount.

Now the (strong) induction step: assume that the claim holds for interval durations $T^{(\beta)}$, $\beta = 1$ to $\beta = \alpha - 1$. We prove that this implies that the claim holds for $\beta = \alpha$. Flexible learning requires the interference matrix to be diagonal with positive diagonal elements. The $\alpha^{th}$ row and column of the matrix are given by:

$$I_{\alpha\gamma} = \sum_{\beta} \frac{\partial T^{(\alpha)}}{\partial W_\beta}\,\frac{\partial T^{(\gamma)}}{\partial W_\beta}.$$

Looking at the right-hand side, we note that the terms in the summation for which $\beta > \min(\alpha, \gamma)$ are 0 by (SI.15), and those for which $\beta < \alpha$ are 0 by our induction assumption. Then, for $\gamma > \alpha$, the only possibly non-zero term in the summation is:

$$I_{\alpha\gamma} = \frac{\partial T^{(\alpha)}}{\partial W_\alpha}\,\frac{\partial T^{(\gamma)}}{\partial W_\alpha}.$$

Since flexible learning requires the interference matrix to be diagonal with positive diagonal elements, and $\partial T^{(\alpha)}/\partial W_\alpha \neq 0$ on the diagonal, it follows that $\partial T^{(\gamma)}/\partial W_\alpha = 0$ for $\gamma \neq \alpha$: perturbing $W_\alpha$ shifts $t^{(\alpha)}$ and all downstream time points by the same amount, completing the induction.

Next, we present a sufficiency condition for a feedforward network to exhibit flexible timing: it is sufficient that the network dynamics be time-invariant. Proof: If perturbing $W_\alpha$ leads to a time-shift in the $\alpha^{th}$ neuron's activity, all later neuron activities will be time-shifted by the same amount by the time-invariance property, and therefore no interval other than $T^{(\alpha)}$ will be affected.
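The sufficiency argument can be illustrated in simulation. The sketch below is our toy example (illustrative parameters, not the paper's): a time-invariant chain of leaky integrate-and-fire neurons, one per layer, in which strengthening a single weight changes only its own interval while rigidly shifting all downstream spike times.

```python
# Toy demonstration of the sufficiency condition: in a time-invariant LIF
# chain (one neuron per layer; illustrative parameters), perturbing W_a
# changes interval a only, shifting all downstream spikes by the same amount.
import numpy as np

def chain_spike_times(W, tau_m=20.0, tau_s=5.0, v_th=1.0, dt=0.01):
    """First spike time of each neuron in a feedforward chain (layer 0 at t=0)."""
    t_spk = [0.0]                           # external input: layer 0 spikes at t = 0
    for w in W:
        v, s = 0.0, 0.0                     # s: time since the presynaptic spike
        while v < v_th:
            s += dt
            epsp = np.exp(-s / tau_s) / tau_s        # unit-area exponential EPSP
            v += dt * (-v / tau_m + w * epsp)
        t_spk.append(t_spk[-1] + s)
    return np.array(t_spk)

W = np.full(8, 2.0)
W_pert = W.copy()
W_pert[3] *= 1.2                            # strengthen the 4th synapse only
dT = np.diff(chain_spike_times(W_pert)) - np.diff(chain_spike_times(W))
print(np.round(dT, 3))                      # only interval 4 shortens; rest are 0
```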

The case of multiple neurons per layer
The argument presented above applied to a single neuron per layer. The argument can be extended to multiple neurons. We choose a notation that makes the layered structure of the feedforward network explicit. Let $r_{\alpha,j}(t)$ denote the activity of the $j^{th}$ neuron in the $\alpha^{th}$ layer, and let $W_{\alpha,jk}$ be the synaptic weight from the $k^{th}$ neuron in the $(\alpha-1)^{th}$ layer to the $j^{th}$ neuron in the $\alpha^{th}$ layer. We assume that there are $N_\alpha$ neurons in the $\alpha^{th}$ layer. The feedforward architecture makes $r_{\alpha,j}(t)$ a function of the activity of the presynaptic neurons $r_{\alpha-1,k}(t)$ and the synapses between them, $W_{\alpha,jk}$:

$$r_{\alpha,j}(t) = F\!\left(\{r_{\alpha-1,k}(t)\}_k,\, \{W_{\alpha,jk}\}_k\right). \qquad \text{(SI.24)}$$

Claim 3: Flexible learning requires that a change in any synapse $W_{\alpha,jk}$ affects no interval duration other than $T^{(\alpha)}$.

Proof. We will prove this claim using strong induction. We note that the claim already holds for $\alpha > \gamma$, as the activity of postsynaptic neurons does not affect the activity of presynaptic neurons in a chain:

$$\frac{\partial t^{(\gamma)}}{\partial W_{\alpha,jk}} = 0 \quad \text{for } \gamma < \alpha. \qquad \text{(SI.25)}$$

We prove the claim for the first interval as the base case. Flexible learning requires the interference matrix to be diagonal with positive diagonal elements (see (SI.9)). The first row and column of the matrix are given by:

$$I_{1\gamma} = \sum_{\beta,j,k} \frac{\partial T^{(1)}}{\partial W_{\beta,jk}}\,\frac{\partial T^{(\gamma)}}{\partial W_{\beta,jk}} = \sum_{j,k} \frac{\partial T^{(1)}}{\partial W_{1,jk}}\,\frac{\partial T^{(\gamma)}}{\partial W_{1,jk}}.$$

Now the (strong) induction step: assume that Claim 3 holds for $\beta = 1$ to $\beta = \alpha - 1$. We prove that this implies that the claim holds for $\beta = \alpha$. Flexible learning requires the interference matrix to be diagonal with positive diagonal elements. The $\alpha^{th}$ row and column of the matrix are given by:

$$I_{\alpha\gamma} = \sum_{\beta,j,k} \frac{\partial T^{(\alpha)}}{\partial W_{\beta,jk}}\,\frac{\partial T^{(\gamma)}}{\partial W_{\beta,jk}}. \qquad \text{(SI.28)}$$

Looking at the right-hand side, we note that the terms in the summation for which $\beta > \min(\alpha, \gamma)$ are 0 by (SI.25), and those for which $\beta < \alpha$ are 0 by our induction assumption. Then, the only possibly non-zero terms in the summation are:

$$I_{\alpha\gamma} = \sum_{j,k} \frac{\partial T^{(\alpha)}}{\partial W_{\alpha,jk}}\,\frac{\partial T^{(\gamma)}}{\partial W_{\alpha,jk}} \quad \text{for } \gamma > \alpha.$$

Requiring these sums to vanish for all $\gamma \neq \alpha$, without fine-tuned cancellations between synapses, again implies $\partial T^{(\gamma)}/\partial W_{\alpha,jk} = 0$ for $\gamma \neq \alpha$: a one-to-one mapping between synapses and the interval durations they affect.
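As a numerical check of this one-to-one mapping (shown here for the single-neuron-per-layer chain of the previous sketch, for brevity; the multi-neuron check is analogous), the interference matrix of Eq. (SI.9) computed by finite differences comes out diagonal:

```python
# Numerical check, reusing chain_spike_times from the sketch above: for the
# chain, the interference (Gram) matrix of interval-duration gradients is
# diagonal, i.e. each synapse affects exactly one interval duration.
def interval_jacobian(W, eps=0.05):
    n = len(W)
    J = np.zeros((n, n))                    # J[j, a] = dT^(j) / dW_a
    for a in range(n):
        Wp, Wm = W.copy(), W.copy()
        Wp[a] += eps
        Wm[a] -= eps
        J[:, a] = (np.diff(chain_spike_times(Wp))
                   - np.diff(chain_spike_times(Wm))) / (2 * eps)
    return J

J = interval_jacobian(np.full(8, 2.0))
print(np.round(J @ J.T, 2))                 # diagonal: no off-diagonal interference
```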

Supplementary Note 3: Flexibility range of a chain of integrate-and-fire neurons
Here we present a calculation of the range of synaptic weight strengths within which a chain of integrate-and-fire neurons exhibits timing flexibility. Integrate-and-fire neurons are arranged in a chain with a single neuron per layer. We work in the synaptic weight regime where activity propagates along the chain with each neuron producing a single spike, and these spikes mark interval boundaries. The sub-threshold dynamics of neuron $\alpha$'s membrane potential is given by:

$$\tau_m \frac{dV_\alpha}{dt} = -\left(V_\alpha - V_{\rm rest}\right) + \tau_m W_\alpha\, K\!\left(t - t^{(\alpha-1)}\right). \qquad \text{(SI.31)}$$

Here, $K(t)$ is the excitatory post-synaptic potential (EPSP), which is 0 for $t < 0$ and normalized to unit area, $\int_0^\infty dt\, K(t) = 1$, and $t^{(\alpha-1)}$ is the spike time of the $(\alpha-1)^{th}$ neuron. When the membrane potential reaches threshold, $V_{\rm th}$, the neuron produces a spike and the membrane potential is reset to $V_{\rm reset}$. We assume that the neuron is at the rest potential, $V_{\rm rest}$, when the first pre-synaptic spike arrives. The membrane potential of the neuron after the first pre-synaptic spike and before its own first spike is given by:

$$V_\alpha(t) = V_{\rm rest} + W_\alpha \int_{0}^{t - t^{(\alpha-1)}} dt'\, e^{-\left(t - t^{(\alpha-1)} - t'\right)/\tau_m}\, K(t'). \qquad \text{(SI.32)}$$

The minimum synaptic weight strength for producing a spike is reached when the maximum value of the membrane potential just hits the spiking threshold. Taking a derivative of the membrane potential and setting it to zero, we get an implicit equation for the time of the maximum potential, $t_{\max}$ (measured from the pre-synaptic spike time):

$$\tau_m\, K(t_{\max}) = \int_0^{t_{\max}} dt'\, e^{-(t_{\max} - t')/\tau_m}\, K(t'). \qquad \text{(SI.33)}$$

Then, the minimum synaptic weight at which a spike is produced is given by:

$$W_{\alpha,\min} = \frac{V_{\rm th} - V_{\rm rest}}{\int_0^{t_{\max}} dt'\, e^{-(t_{\max} - t')/\tau_m}\, K(t')}. \qquad \text{(SI.34)}$$

For concreteness, consider an exponential EPSP kernel, $K(t) = e^{-t/\tau_s}/\tau_s$. When $\tau_s = \tau_m$, $t_{\max} = \tau_m$ and $W_{\alpha,\min} = e\,(V_{\rm th} - V_{\rm rest})$. $W_{\alpha,\min}$ monotonically increases with $\tau_s$, starting from the delta-function EPSP limit at $\tau_s = 0$. The intuition behind this behavior is that as $\tau_s$ gets bigger, less of the EPSP falls within the time window of integration ($\sim \tau_m$) and the rise in membrane potential becomes smaller. To compensate for this effect, the synaptic weight has to get stronger.
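The minimal weight is straightforward to evaluate for the exponential kernel. The sketch below (ours; illustrative parameter values, not the paper's) solves (SI.33)-(SI.34) in closed form and reproduces the limits quoted above.

```python
# Sketch evaluating W_min of Eqs. (SI.33)-(SI.34) for the exponential EPSP
# K(t) = exp(-t/tau_s)/tau_s (illustrative parameters, not the paper's).
import numpy as np

tau_m, v_th, v_rest = 20.0, 1.0, 0.0

def filtered_epsp(t, tau_s):
    """g(t) = int_0^t exp(-(t - t')/tau_m) K(t') dt' for the exponential kernel."""
    if np.isclose(tau_s, tau_m):
        return (t / tau_m) * np.exp(-t / tau_m)
    return (np.exp(-t / tau_s) - np.exp(-t / tau_m)) / (tau_s * (1/tau_m - 1/tau_s))

def w_min(tau_s):
    """Solve (SI.33) for t_max analytically, then (SI.34) for the minimal weight."""
    if np.isclose(tau_s, tau_m):
        t_max = tau_m
    else:
        t_max = np.log(tau_m / tau_s) / (1/tau_s - 1/tau_m)
    return (v_th - v_rest) / filtered_epsp(t_max, tau_s)

print(round(w_min(20.0), 3), round(np.e * (v_th - v_rest), 3))  # tau_s = tau_m: both e
for tau_s in (0.1, 1.0, 5.0, 20.0, 50.0):
    print(tau_s, round(w_min(tau_s), 3))    # W_min increases monotonically with tau_s
```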
The maximum synaptic weight strength for flexible timing depends on the configuration of intervals in the network. One can increase the synaptic weight until a second postsynaptic spike is generated, without any effect on the durations of downstream intervals. The second spike, however, increases the excitation in the $(\alpha+1)^{th}$ neuron, and may lead to earlier spiking of the $(\alpha+1)^{th}$ neuron. Therefore, a lower bound for the maximum synaptic weight strength compatible with flexibility, $W_{\alpha,\max}$, is the strength at which a second spike is generated by the $\alpha^{th}$ neuron. The membrane potential after the first spike (but before a possible second spike) is:

$$V_\alpha(t) = V_{\rm rest} + \left(V_{\rm reset} - V_{\rm rest}\right)e^{-\left(t - t^{(\alpha)}\right)/\tau_m} + W_\alpha \int_{t^{(\alpha)} - t^{(\alpha-1)}}^{t - t^{(\alpha-1)}} dt'\, e^{-\left(t - t^{(\alpha-1)} - t'\right)/\tau_m}\, K(t'). \qquad \text{(SI.35)}$$

Note that $t^{(\alpha)}$ is a function of $W_\alpha$, as the synaptic weight strength affects the timing of the first spike.
The maximum membrane potential after the first spike, attained at $t_{\max}$, is given by the solution to $dV_\alpha/dt = 0$, i.e., using (SI.31),

$$\tau_m W_\alpha\, K\!\left(t_{\max} - t^{(\alpha-1)}\right) = V_\alpha(t_{\max}) - V_{\rm rest}. \qquad \text{(SI.36)}$$

The lower bound on the maximum weight is reached when the membrane potential hits the spiking threshold at $t = t_{\max}$:

$$V_\alpha\!\left(t_{\max};\, W_{\alpha,\max}\right) = V_{\rm th}. \qquad \text{(SI.37)}$$
Note that $t^{(\alpha)}$ is a function of $W_\alpha$, and therefore Eq. (SI.37) is an implicit equation for $W_{\alpha,\max}$.
A refractory period will increase this lower bound on the maximum synaptic weight. To see this, note that with a refractory period $t_{\rm ref}$, during which the membrane potential is held at $V_{\rm reset}$, the potential after the first spike evolves, for $t \ge t^{(\alpha)} + t_{\rm ref}$, as:

$$V_\alpha(t) = V_{\rm rest} + \left(V_{\rm reset} - V_{\rm rest}\right)e^{-\left(t - t^{(\alpha)} - t_{\rm ref}\right)/\tau_m} + W_\alpha \int_{t^{(\alpha)} + t_{\rm ref} - t^{(\alpha-1)}}^{t - t^{(\alpha-1)}} dt'\, e^{-\left(t - t^{(\alpha-1)} - t'\right)/\tau_m}\, K(t'), \qquad \text{(SI.38)}$$

and the maximum membrane potential after the first spike is now given by the solution of $dV_\alpha/dt = 0$ restricted to $t \ge t^{(\alpha)} + t_{\rm ref}$. The lower bound on the maximum weight is again reached when the membrane potential hits the spiking threshold at $t = t_{\max}$:

$$V_\alpha\!\left(t_{\max};\, W_{\alpha,\max}, t_{\rm ref}\right) = V_{\rm th}. \qquad \text{(SI.39)}$$
Taking a derivative of this result with respect to $t_{\rm ref}$ yields a positive number, proving our claim that a refractory period increases the lower bound on the maximum synaptic weight.
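These implicit bounds can also be located directly in simulation. The sketch below (ours; toy parameters) bisects for the weight at which a second spike first appears, with and without a refractory period, confirming that refractoriness raises the bound.

```python
# Sketch: locating the second-spike weight bound of (SI.37)/(SI.39) by
# bisection on a simulated LIF neuron (toy parameters, not the paper's).
import numpy as np

def n_spikes(w, t_ref=0.0, tau_m=20.0, tau_s=5.0, v_th=1.0, v_reset=0.0,
             T=200.0, dt=0.01):
    """Spike count of one neuron driven by a single unit-area EPSP arriving at t=0."""
    v, spikes, t_last = 0.0, 0, -np.inf
    for t in np.arange(dt, T, dt):
        if t - t_last < t_ref:              # potential clamped while refractory
            continue
        epsp = np.exp(-t / tau_s) / tau_s
        v += dt * (-v / tau_m + w * epsp)
        if v >= v_th:
            spikes, v, t_last = spikes + 1, v_reset, t
    return spikes

def second_spike_weight(t_ref):
    lo, hi = 2.0, 60.0                      # one spike at lo, several at hi
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if n_spikes(mid, t_ref) < 2 else (lo, mid)
    return hi

print(round(second_spike_weight(0.0), 2))   # lower bound on W_max, no refractoriness
print(round(second_spike_weight(2.0), 2))   # larger bound with a 2 ms refractory period
```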

Supplementary Figure 1: Specificity is not related to baseline timing correlations. We analyzed in greater detail how non-target changes (interference) depended on 1) the baseline timing correlation between target and non-target (x1), 2) the extent of target changes (x2), or 3) both. In all cases, there was no dependence. (A) Changes (non-baseline-subtracted) in song segments relative to the targeted segment. (B) Non-target changes during CAF did not tend to be larger when the pre-CAF baseline timing correlation between target and non-target interval durations was higher (Pearson's r = -0.008, p = 0.93). (C) Non-target changes during CAF were also not larger when the targets were modified to a greater extent (Pearson's r = -0.14, p = 0.11).
To test whether both the baseline timing correlations and the target changes contribute to non-target changes (i.e., non-targets that have the highest correlation with the target, when the target was modified the most, might exhibit the most interference), we used multiple regression with three explanatory variables: x1 and x2 as stated above, and an interaction term (the product of x1 and x2, since they are continuous variables). We found no relationship (R² = 0.052, p = 0.10).
Overall, this detailed analysis strongly suggests specificity in birdsong timing, i.e., modifications to one part of a sequence leave the temporal structure of other parts unaffected, regardless of any baseline timing correlations.

(B) An example of what happens when the network is perturbed and fails to produce the right timing interval. The black line shows a successful output when no perturbation is delivered. The dashed blue line shows the output when a perturbation was delivered to the network during the time denoted by the brown bar. Even though the output crossed the threshold sufficiently many times, the interval durations were not within the desired 6% of their targets. (C) Example reinforcement learning simulations for a two-target 'experiment' run for 3000 trials in a fsRNN, where the 3rd and 8th intervals were targeted for lengthening and shortening, respectively; shown for different feedback strengths. (D) Decrease in average learning rates of the 3rd and 8th intervals (across 20 simulations) when they were targeted together relative to when they were targeted alone. For each network, average learning rates (across 20 simulations) were calculated (as for Figure 2J)