Probing the structure–function relationship with neural networks constructed by solving a system of linear equations

Neural network models are an invaluable tool to understand brain function since they allow us to connect the cellular and circuit levels with behaviour. Neural networks usually comprise a huge number of parameters, which must be chosen carefully such that networks reproduce anatomical, behavioural, and neurophysiological data. These parameters are usually fitted with off-the-shelf optimization algorithms that iteratively change network parameters and simulate the network to evaluate its performance and improve fitting. Here we propose to invert the fitting process by proceeding from the network dynamics towards network parameters. Firing state transitions are chosen according to the transition graph associated with the solution of a task. Then, a system of linear equations is constructed from the network firing states and membrane potentials, in a way that guarantees the consistency of the system. This allows us to uncouple the dynamical features of the model, like its neurons firing rate and correlation, from the structural features, and the task-solving algorithm implemented by the network. We employed our method to probe the structure–function relationship in a sequence memory task. The networks obtained showed connectivity and firing statistics that recapitulated experimental observations. We argue that the proposed method is a complementary and needed alternative to the way neural networks are constructed to model brain function.


Results
Constructing neural networks with predefined dynamics. We will consider a network of N rec recurrently connected binary neurons (the integration neurons), which receive information about the environment from a set of N in input neurons (Fig. 1a). The temporal evolution of the network is dictated by the standard equations of the linear-threshold neuron model: where W in and W rec are synaptic weights matrices of input and integration neurons, vector y contains firing states of input neurons, which codify the stimuli presented, θ is a vector of neuron thresholds, and H stands for the Heaviside function. Vector u is a real-valued vector of length N rec that collects the network activation states, akin to membrane potentials, and z is a vector of zeros and ones that collects the network firing states. At any given time t the recurrent network can adopt one out of M network firing states m, defined by vector z m .
Equation (1) shows how firing states and activation states are linearly related through the synaptic weights. Thus, the weight matrices can be computed exactly by solving a linear system of equations, provided that firing states and the resulting activations are known. To specify, we define c s,m = [y s z m ] , the concatenation of the input firing state during presentation of stimulus s, and the network firing state when the network is at state m. Next, we construct a matrix C whose rows are vectors c s,m for all the combinations of stimuli and network states we want to include in the proposed dynamic. Then, the following system of linear equations follows: where matrix W = [W in W rec ] is the concatenation of the input and recurrent synaptic weights. Here the i th row in U is the activation state u(t) when the i th row of C is [y(t) z(t − 1)] . Then, W can be computed as: where C + stands for the pseudoinverse of C. We employed the pseudoinverse because it gives the solution that minimizes the Frobenius norm 11 . In this manner, synaptic weights are going to be small in absolute value, which is desirable since biological synapses are constrained in the number of receptors and vesicles.
Matrices C and U should be picked in such a way that they instantiate the state transitions exhibited by a network while solving a target task, e.g., transitions to different network states after presentation of different stimuli in a discrimination task. Simple tasks like the ones employed in behavioural neuroscience exhibit simple state transitions, hence the transition graph is known. Therefore, it only remains to find the actual vectors u and c. A naïve approximation to this problem would be to pick vectors u at random, apply the thresholds to obtain the associated vectors z , and construct matrices C and U by following the desired transition graph. However, by doing so we likely end up having an inconsistent system of equations, meaning that no network of neurons can follow those state transitions. To understand this important point, we may consider the case of two stimuli s 1 and s 2 , codified by input neurons with firing states y 1 and y 2 respectively. Then, each vector [y 1 z m ] can be expressed as a linear combination of [y 2 z m ] and vectors [y 1 z P ] and [y 2 z P ] , where vector z P can be any vector taken from the set of all firing states the network can adopt: Thus, rank(C) = M + 1 . Following the Rouché-Capelli theorem 12    www.nature.com/scientificreports/ adjoined to matrix C the linear dependencies expressed in Eq. (5) will be broken, and the resulting augmented matrix will have rank above M + 1. However, if the rows of U are linearly combined following the linear combinations present in C, the resulting system of equations will be consistent, and its solution will retrieve matrix W. This is the key aspect of the FTP algorithm, which can be succinctly stated as: 1. Find the transition graph between network states that solve the target task. 2. Choose activation states u from a desired distribution. 3. Use vectors u and associated vectors c to construct matrices U and C following the transition graph. 4. Linearly combine rows in U to preserve the linear combinations among rows of C (Eq. 5) Testing the FTP algorithm in a sequence memory task. In the following we exemplify the utility of the FTP algorithm by constructing networks that solve a sequence memory task (s-task). In this task two stimuli s 1 and s 2 , codified by input neurons firing states y 1 and y 2 , are sequentially presented at each time step, chosen randomly with equal probability. To obtain reward at time step t the network has to recall the stimulus presented at time step t − t . By recall we mean that the stimulus at time t − t is univocally codified by the network firing state at time t. Consequently, successful behaviour requires to have a memory of stimuli sequences of length τ = �t + 1 . We chose this task because its complexity grows exponentially with τ , making it a good benchmark to test the computational efficiency of the algorithm. It has also been shown that neural populations are sensitive to stimulus presentation ordering, and the coding of such sequences is incompletely understood 13 Figure 1. Constructing recurrent networks that follow a predefined transition graph. (a) Networks are composed of binary neurons. Input neurons codify stimuli and project to the integration neurons through synaptic weights W in . Integration neurons are recurrently connected through synaptic weights W rec . (b) Transition graph showing transitions between network states during execution of the s-task, for τ = 3 . Each node in the graph is a network state, and the directed edges depict transitions between states after stimuli presentation (blue for s 1 , red for s 2 ). Each possible sequence of 3 stimuli is codified by exactly one network state. Nodes are numbered such that transitions can be represented in a simple transition matrix. (c) Transition matrix associated with the transition graph in panel (b). It shows the activation states u that are reached when integration neurons are in a population firing state z, and s 1 (blue) or s 2 (red) are presented. (d) Same transitions depicted in panels (b) and (c), but explicitly showing vectors u i and vectors c i , which are the concatenation of one y and one z. The index i is such that z i and u i are the firing state and activation vectors corresponding to network state m i . www.nature.com/scientificreports/ we decided to exploit the FTP capabilities to study the neural dynamics and connectivity in networks of binary neurons that are capable of solving the s-task. Figure 1b shows the network states and state transitions gated by the stimuli when solving the s-task with τ = 3 . Despite its complexity, the state transitions required to solve the task have a stereotyped structure, which is evident when network states are numbered properly (Fig. 1b-d). The transition graph in Fig. 1b shows the state transitions any network that solves the s-task should follow. With the transition matrix structure at hand, we can construct matrices C and U (see details in the "Methods" section).
We assessed the performance of the algorithm by measuring the time it takes to find networks that solve the s-task ( N rec = 1024 , τ = 1 to τ = 10 ) and comparing those times with the time taken by a stochastic gradient descent (SGD) algorithm (Adam optimizer 14 , adapted to fit threshold units 15 ). The FTP algorithm outperformed SGD by at least two orders of magnitude (Fig. 2a). This is not surprising when considering that a linear system of equations can be solved in polynomial time, while SGD (and any other fitting algorithm) requires huge numbers of network evaluations to obtain a single set of parameters updates. An example network constructed with FTP is shown in Fig. 2b,c, of N rec = 8 neurons and capable of solving an s-task with τ = 3 . The resulting synaptic weight distribution had zero mean and resembled a normal distribution, at least for the W rec values (Fig. 2d). In fact, the synaptic weight distributions became progressively closer to a normal distribution as more neurons were employed in network construction (Fig. 2e). We also noted that the absolute weight value decreased, especially for W rec values (Fig. 2f), which can be explained by thinking that more neurons imply more parameters and hence more degrees of freedom to reach a lower Frobenius norm. This observation will become important later when imposing structural constraints to the network.  Figure 2. Solving the s-task with the FTP algorithm. (a) Efficiency of FTP and SGD, measured as the time expended in finding a solution network for s-tasks of different τ . The time expended by the FTP algorithm is orders of magnitude lower than the time expended by the SGD algorithm (n = 5 networks for each τ value). (b) Raster plot showing the neurons firing states in a network constructed to solve the s-task for τ = 3 . The network is composed of 8 integration neurons and 2 input neurons. Each possible sequence of 3 stimuli has a unique network firing state that codifies the sequence. Therefore, the network has 8 possible firing states. (c) Input and recurrent synaptic weights of the network. (d) Distribution of synaptic weights for input (upper panel) and recurrent (lower panel) synaptic weights, for the same network as in (b,c). (e) Kolmogorov-Smirnov statistic between the distribution of synaptic weights and a normal distribution of the same mean and variance. As the network size increases, the distribution of synaptic weights gets closer to a normal. Integration neurons weights are closer to a normal distribution than sensory neurons weights. Mean ± SD are shown for n = 100 networks that solve the s-task, with τ = 3 . (f) Absolute synaptic weight values as a function of network size. Absolute values are higher and of larger variability when the neuron count is close to the number of coded stimuli sequences. As the network size increases the absolute mean value and dispersion decrease. Input neurons weights quickly reach a minimum, while integration neurons weights decrease in the entire range of network sizes. Mean ± SD are shown for n = 100 networks that solve the s-task, with τ = 3. www.nature.com/scientificreports/ Specifying the firing rate and pairwise correlation of the network. We want to construct neural network models that not only solve relevant tasks but do so under desired firing constraints, as measured in real brains. Some of these constraints are low firing rates (FR) 16,17 , or low correlation coefficients (CC) 18 . In regular optimization algorithms, these constraints can be imposed by introducing regularization terms in the loss function 5 . On the other hand, in the FTP algorithm the activation states of the network are the result of linearly combining the rows of an initial matrix U base (see "Methods"). Hence, we can apply firing constraints by appropriately choosing this initial matrix. For example, to attain networks that solve the s-task with low/high firing, it suffices to choose an initial matrix U base such that, after thresholding, the resulting matrix C has few/ many ones. Following this procedure, we constructed networks with average FR within a wide range of target FR (Fig. 3a, blue line). Shuffling values within W in matrices, or within W rec matrices, produced only small changes to     www.nature.com/scientificreports/ the average FRs (Fig. 3a, red line). This suggests that it is the distribution of synaptic weights the critical statistic that defines the networks average FR, and not the detailed arrangement of these weights in the synaptic weight matrix.

Scientific Reports
On the other hand, we can construct networks with desired signal correlation, by multiplying (the difference in effects produced by s 1 and s 2 ) by a factor f cc (Fig. 3b). For networks shown in Fig. 3 ( τ = 4, f r = 3 ), correlations could be modulated in a range between 0.1 and 0.5. Scaling was expected to induce signal correlation. However, it also induced noise correlation as a by-product (Fig. 3c). Correlations in networks solving the s-task were significantly higher than correlations in their synaptic weight-shuffled counterparts (Fig. 3b, blue vs. red), which suggests that signal correlations depend on the precise arrangement of values in the weight matrix, and not only on the weight distribution, as was the case with FR. It also suggests that the set of networks that solve the s-task necessarily exhibit correlations above a minimum. On the other hand, correlations did not exceed a certain value: higher correlations may imply a reduced number of network states, incompatible with the number of sequences required to be codified.
Hand-based manipulation of U base allows us to generate solution networks in a wide range of FR and CC (Fig. 3d, blue dots). An even better control of firing and correlation can be achieved by fitting U base thought an optimization algorithm. We employed a genetic algorithm (GA) in which the fitness of U base is a function of the FR and CC computed from the network firing states generated from that U base . Fitting U base gave access to more extreme values of FR and CC (Fig. 3d, black dots), and kept computational cost low by computing FR and CC from the set of network firing vectors c instead of computing the firings by simulating the network. Altogether, both methods (U base manipulation, or its evolution with a GA) generated networks with desired activity constraints that perfectly solved the task.
Applying structural constraints with projected gradient descend in isofunction weight space. Networks generated so far share one structural constraint: their synaptic weight matrix is the one that minimizes the Frobenius norm. Other relevant structural constraints, such as the low probability of selfconnections 19 , Dale's principle 4 , or sparse connectivity are not satisfied. Since these structural constraints are key experimentally observed features, we were interested in imposing such constraints onto the W obtained by the algorithm. To do this we followed a projected gradient descent (PGD) approach 20 , taking advantage of the fact that the loss function L , which encloses the structural constraints, is a linear function with respect to the synaptic weights, and that the matrix W can be changed without changing the stimulus-response mapping (see "Methods"). To exemplify the procedure, we constructed a network that solves the s-task for τ = 4 , with f r = 3 (Fig. 4a), and then we employed PGD to remove self-connections, enforce Dale's principle with a 4:1 Ex:In ratio, and set a sparsity sp = 40% (defined as the percentage of weights equal to zero). The PGD steadily reduced the loss L , reaching a negligible value, provided that the network had enough neurons (Fig. 4b). It is remarkable that so dissimilar synaptic weight matrices, like the ones depicted in Fig. 4a,c,d, gave rise to the same stimulusresponse mapping.
We noted that structural constraints could not be imposed on networks with low number of neurons, for example, networks with N = N in + N rec < 2M . This is not surprising, since it is expected that imposing more constraints require more parameters. To evaluate the efficiency of the PGD in relation with the number of neurons, we imposed the above structural constraints for networks that solved the s-task with τ = 3 to τ = 6 , and N rec between 32 and 256 neurons. Since matrix U and vector θ were randomly chosen, it was expected that some of them resulted in matrices W for which the structural constraints were impossible to apply. Consequently, we measured the efficiency of PGD by computing # attempts, the number of networks that were required until obtaining the first successfully constrained network. It can be seen that # attempts decreased as the number of neurons increased (Fig. 4e). Concordantly, the computing time required to obtain a successfully constrained network decreased as the number of neurons increased (Fig. 4f), probably due to the decrement in # attempts that were required. The fitting time was higher for networks with the highest neuron count, but always within the order of tens of seconds, even for N rec = 256.
Exploiting the FTP algorithm for hypothesis testing. A distinctive aspect of the FTP algorithm is that the desired dynamic of the network is defined first and in great detail. In fact, we can specify the network . Imposing structural constraints with FTP. (a) Synaptic weight matrices W in and W rec of the network obtained through FTP before structural constraints were imposed. The network was constructed to solve the s-task for τ = 4 and f r = 3 , with a target FR of 0.1 spikes/time step. (b) Loss function L as a function of the number of iterations of the PGD algorithm. The loss function falls below the criterium e 1 = 10 −3 at iteration 121. (c,d) Synaptic weight matrices W in and W rec for a network with the same stimulus-response mapping but after applying structural constraints: (c) no self-connections, Dale's principle, with 40 excitatory and 10 inhibitory neurons, and sparsity sp = 40% ; (d) no self-connections, Dale's principle, with 26 excitatory and 24 inhibitory neurons, and sparsity sp = 23% . (e) Average number of attempts to obtain one network with successful structural fitting, as a function of the number of integration neurons, and for different τ . The number of attempts is high when the neuron number is low, but it decreases fast as the neuron number increases. From 60 neurons onwards, less than five attempts are needed, on average, to obtain one network with the desired structural constraints. Mean ± SD are shown. (f), total running time to obtain one network with successful structural fitting, as a function of the number of integration neurons, and for different τ (color code as in (e)). Running time decreases and then increases for τ = 3 and τ = 4 . The case of τ = 6 is the one with more neurons and equations to solve and present some of the highest running times, even when the number of neurons is high. Nevertheless, all average running times are in the order of tens of seconds. Positive and negative reciprocity values were mapped separately to colours red and blue, respectively. Red tones go from 0 reciprocity (white) to maximal positive reciprocity (pure red). Blue tones go from 0 reciprocity (white) to maximal (in absolute value) negative reciprocity (pure blue). All random graphs were constructed with f bc = 0.5 . Graphs were plotted with the Force-directed layout. www.nature.com/scientificreports/ dynamic up to the level of single state transitions, and the algorithm finds the set of synaptic weights that makes the desired dynamic possible. This approach is ideal for testing very specific hypotheses about the relationship between network function (neural dynamics and task performance) and the underlying structure (synaptic connectivity). We wanted to understand how the firing rate and firing dynamic constrained connectivity. To that end, we constructed networks that solved the s-task, over a range of network sizes and mean firing rate, and compared them with networks that had the same number of neurons and network states but for which the graphs of transitions between network states were generated at random (Fig. 5a,b). These "random transition" networks did not have sequence memory, but their dynamic was similar in complexity to that of networks that solved the s-task, turning them in perfect examples to study the connectivity features behind the capacity to codify sequences of stimuli. We focused on measuring the level of reciprocity in the network, which has been observed experimentally 21 and its implications studied theoretically 22 . By screening networks with memory ranging from τ = 2 to τ = 7 , and FR from 0.1 spikes/time step to 0.9 spikes/time step, we found that reciprocity varied with τ , FR and neuron number. In particular, we observed that, when f r = 1 , reciprocity was positive and of lower mean for networks that were the minimum Frobenius norm solution to the s-task (T + F networks, Fig. 5c), in comparison with networks that were the minimum Frobenius norm solution to a random transition graph (F networks, Fig. 5d). However, for bigger f r the relationship was inverted, and T + F networks showed positive reciprocity (Fig. 5e) while F networks showed negative reciprocity (Fig. 5f).
To further describe these relationships, we selected networks constructed for τ = 7 and f r = 1 (Fig. 6a-c) and f r = 4 ( Fig. 6d-f). Signal correlation varied with FR following an inverted U-shape relationship, with a maximum next to 0.5 spikes/time step (Fig. 6a,d). Note that, up to FR = 0.5 spikes/time step, CC increased with FR, as has been observed experimentally 23,24 . Interestingly, the way CC changed with FR was similar for both T + F and F networks, with the distinction that F networks had overall higher correlations than T + F networks. Shuffling the inter-spike interval of each neuron destroyed the dependence between FR and CC. The CCs after shuffling were also much smaller than the CC values of the non-shuffled firings (CC shuffled = 0.0224 ± 1.10 -4 , mean ± SD, for n = 90 T + S networks pooled over all FR values). These results ruled out the possibility that correlations were  www.nature.com/scientificreports/ trivially increasing with FR because of the higher number of spikes. We also found an inverted U-shape relationship between reciprocity and FR, and a linear relationship between reciprocity and CC. With f r = 1 reciprocity tended to be maximized as FR approached 0.5 spikes/time step, with F networks showing higher (and positive) reciprocity (Fig. 6b). With f r = 4 , reciprocity tended to increase as FR departed from 0.5 spikes/time step towards lower and higher values, i.e., networks with lower correlation (Fig. 6e). Specially, F networks showed negative reciprocity for all firing rates, except for the more extreme cases (0.1 and 0.9 spikes/time step). Just like the reciprocity/FR relationship inverted with the number of neurons, so did the reciprocity/CC relationship. Networks with higher reciprocity had higher correlation when the number of neurons was low (Fig. 6c). However, and somewhat counterintuitive, when the number of neurons was higher, more reciprocity implied lower correlation (Fig. 6f). Networks that solved the s-task but did not minimize the Frobenius norm (T networks) showed almost zero reciprocity, implying that reciprocity was not a property of all networks that solve the s-task. On the contrary, most networks that solved the s-task did not show significant reciprocity, unless other structural constraints, such as Frobenius norm minimization, was imposed. However, Frobenius norm minimization alone only produced negative reciprocity (in random graphs). For positive reciprocity to occur in networks with high number of neurons, both high sequence memory and Frobenius norm minimization were required. We asked whether the results depicted in Fig. 6 could be replicated in networks that lacked self-connections and complied with Dale's principle. To that end we imposed these structural constraints to networks constructed with τ = 7 and f r = 4 , and found a reciprocity/FR relationship that resembled the one observed in unconstrained networks, with F networks showing prominent negative reciprocity and T + F networks showing increasing reciprocity as FR departed from 0.5 spikes/time step (Fig. 7a). Correlations increased as FR approached 0.5 spikes/ time step, with F networks showing more correlation than T + F networks (Fig. 7b,c). Correlation in T + F and F networks were higher for pairs of inhibitory neurons than for pairs of excitatory neurons, as has been observed experimentally 25 , while correlations between excitatory and inhibitory neurons lied in the middle.
We went further on by asking which features of network dynamics were responsible for the differences in reciprocity that we observed between T + F and F networks. The FTP algorithm finds the network connectivity from a detailed description of the network dynamics, specified in its transition graph. Therefore, we hypothesized that specific changes in the transition graph could have a precise impact in network reciprocity. In T + F networks, any given network state could only be reached after the presentation of either s 1 or s 2 , but not both. This is clearly visualized in the transition graph of Fig. 1b, in which each node has all blue or all red incoming edges, but never incoming edges of both colours. These network states codify stimuli in an absolute manner since it only suffices to know the network state at time step t to know the identity of the presented stimuli at that time step. We termed these nodes monocoloured nodes. On the other hand, in an F network each network state can be reached from either s 1 or s 2 , or from both. Network states which can be reached from both stimuli (termed bicoloured nodes) codify stimuli in a relative manner, meaning that the identity of the stimulus presented at time step t can be decoded only if both the network states at t and t − 1 are known. Then, we specifically asked whether the proportion of bicolored and monocoloured nodes could explain the strong differences in correlation and reciprocity found between T + F and F networks. To that end, we constructed F networks with increasing fraction of bicolored nodes (f bc ), and computed reciprocity for fixed τ , FR and f r (Fig. 8). We found that reciprocity became increasingly more negative as we increased the fraction of bicolored nodes. When all network states were monocoloured ( f bc = 0 ), networks showed zero reciprocity, as observed in T + F networks with the same FR and f r . This result highlights the power of the FTP algorithm to relate neural dynamics with network connectivity.

Discussion
We have presented a simple method to generate binary neural network models that accomplish the desired task. Networks composed of binary neurons are computationally inexpensive, and despite their simplicity many neurophysiological and neuroanatomical observations have been recapitulated employing these networks 22,26 .
Our key contribution is to note that, for networks in which neuron inputs are linearly added, their synaptic weights can be found by solving a system of linear equations. In turn, this system can be constructed from the transition graph associated with the solution of the target task. System consistency is guaranteed if the dependent variables of the system (the neuron activations) are linearly combined following the linear dependences among the independent variables (the firing states). We have shown how the FTP algorithm works with the simplest of networks. Yet, we think that the same procedure could be implemented in networks composed of more complex neuron models, like the firing rate model or the leaky integrate-and-fire model, provided that a system of linear equations can be constructed. Current automated methods of network model construction rely on off-the-shelf optimization algorithms typically employed in the artificial intelligence field, like stochastic gradient descent 5 , genetic algorithms 27 , or evolutionary strategies 28 . These optimization algorithms iteratively change network parameters in a direction that minimizes a loss function and have proved to be very effective in finding networks that solve very complex tasks 29,30 . However, they require a considerable amount of human intervention, and there is no certainty that they will arrive at a solution. Moreover, each optimization iteration requires the evaluation of the network, which is time consuming, especially for a recurrent network performing in a multi-trial task. In contrast, the FTP algorithm reduces the problem of finding a suitable network to a series of linear combinations and the solution of a linear system, with both operations performed in polynomial time. Most importantly, it is guaranteed that the resulting network will solve the task perfectly.
Traditional optimization algorithms require the definition of a loss function that encompasses all the constraints the network should satisfy, whether these are task related, activity related, or structural. Then, the loss function is minimized and hence all constraints are enforced at once. In this scenario, the relationship between parameters and the loss function can be quite complex, and conflict between constraints may emerge. Conversely, one key advantage of our method is that it uncouples the dynamic and coding aspects of the network from the structural aspects, giving us the opportunity to adjust them independently. Moreover, since the method proceeds from the network firing states to its parameters, it allows us to find networks with desired activity profiles, and to study the resulting connectivity. Further structural constraints can be enforced in a second stage, by projected gradient descent, or any other optimization algorithm. The fact that projected gradient descent worked so well suggests that structural constraints are easy to implement once the connectivity required to solve the task is in place. This is probably because many connectivity features of a network, such as its sparseness or its clustering coefficient, are a simple function of its synaptic weights. On the contrary, the relationship between the synaptic weights and the network function, or coding capabilities, is far more complex. In this way, our method gives more control over each class of constraint, making the whole process simpler at the same time.
To show its applicability, we employed the FTP algorithm to construct networks that solve a stimuli sequence memory task (s-task) in which networks have to codify in their network firing states the sequence of the last τ stimuli that were presented. The s-task, akin to the n-back task commonly employed in cognitive neuroscience, is relevant within the scope of working memory function. Working memory is traditionally associated with www.nature.com/scientificreports/ maintaining information about a single stimulus in the persistent activity of recurrently connected neurons 31,32 , although mounting evidence suggests that neuron populations code information in the form of highly heterogenous firing sequences 33,34 . Sustained activity can be a suitable strategy when there is one specific relevant stimulus to attend, whose identity has been already elucidated. However, more complex scenarios require keeping track of sequences of stimuli. An example of this case is the processing of language, in which the succession of utterances must be integrated over time, from phonemes to words, to phrases, so that the meaning of speech depends on the whole sequence 35 . We explored the case of two stimuli presented with equal probability, but the analysis could be extended to more realistic cases in which the stimulus distribution is not uniform. It is expected that statistical regularities in the sequences of stimuli are going to be exploited by the network, resulting in more specialized connectivity. The relationship between sequence statistics and network structure should be further studied. For example, it would be of interest the case in which stimuli last more than one time-step, and they are interleaved by another set of stimuli that act as distractors. Then, the relationship between feedback and feedforward connections could be studied, in relation with the duration of each stimulus presentation, and with that of the distractors. The structure-function relationship is central to neuroscience [36][37][38] . Connectivity at the macro, meso, and micro scale, neuron biophysics, plasticity mechanisms, among other structural traits, all act co-ordinately to give sophisticated adaptive behaviour. It is widely believed that the structural properties of networks have evolved to proficiently perform a function, many times in an optimal way 39,40 . However, brain structure could also be the result of other constraints, different from those imposed by adaptive behaviour. For example, neural network modularity might have emerged as an adaptative structural trait for solving tasks that have a modular or hierarchical component 41 , but it could also have emerged as the result of previously acquired structural traits, such as constraints in the length of dendrites and axons, which preclude the possibility of a much wider connectivity. Thus, determining how much of the structure observed in the brain comes from task-related constraints and how much comes from other structural traits is central to understanding the structure/function relationship. A theoretical approximation to this issue consists on constructing neural network models that solve different kinds of tasks under a variety of structural constraints, and then study the patterns of connectivity that emerge and relate them to the observed connectivity in the brain. This approximation requires to sample as uniformly as possible from the set of networks that fit both the task and the structural constraints. However, optimization methods commonly employed in network parameter fitting may give a restricted set of solutions, thus biasing any conclusion about the structure/function relationship. Another issue is that some connectivity traits could emerge only in networks of certain size, and fitted to several tasks. In this case, fitting large networks with complex cost functions could have a high computational cost. Consequently, generating a relatively large sample of networks suitable for statistical treatment of their connectivity could result unfeasible. On the other hand, the FTP algorithm is very well suited for answering structure/function questions, since exact solutions can be computed starting from an arbitrary set of population firing states, as long as they define a system of equations that has a solution.
Hypotheses that link structure and function can be tested with the help of the FTP algorithm, by constructing networks that follow transition graphs that instantiate some null hypothesis. Following this approach, we constructed networks that had the same number of network states and neurons required to solve the s-task, but whose state transitions were chosen at random. With this tool at hand, we were able to show how network reciprocity depended on the memory demand and the size of the network. The same procedure can be followed to build any other set of networks under some relevant null hypothesis. Such networks can be easily constructed with the FTP algorithm, while they would be hard to construct with regular optimization algorithms.
Evidence for high reciprocity has been found experimentally, by measuring excitatory postsynaptic potentials of reciprocally connected neurons in vitro 21 . It has also been the centre of theoretical analysis. For example, it has been shown that high reciprocity is recapitulated in networks of binary neurons that have a maximum number of attractors 22 . Interestingly, the same work showed that reciprocity is lost when networks are optimized to store sequences of uncorrelated network states. However, we found that networks of high reciprocity are capable of displaying long firing state sequences when their dynamics codify the sequences of previously presented stimuli. Therefore, our work complements previous studies which have shown that reciprocity is one of the key connectivity features that support the computation performed by the neocortex.
Reciprocity was absent in networks taken at random from the set of all connectivities that solve the s-task (the T networks in Fig. 6). This implies that the observed reciprocity is the result of solving the s-task with the additional constraint of weights minimizing the Frobenius norm, the latter being understood as the consequence of an upper bound on the number of receptors and vesicles in a synapse. Thus, to explain one structural feature (reciprocity), a functional feature (solving the s-task), and another structural feature (Frobenius norm minimization) were required. It would be interesting to study to what extent other structural features encountered in biological neural networks, like modularity or sparsity of connections, can be explained as the answer to some computational demand of adaptive behaviour, or as the byproduct of another structural feature, or as the interaction of both factors, as is the case with the s-task.
In conclusion, we have provided an algorithm that inverts the usual process by which neural networks are constructed. It can be employed to probe the dependency between the firing statistics, connectivity, and function of a network in a way that is not matched by current optimization algorithms. Moreover, it is computationally inexpensive. For these reasons, we consider the FTP algorithm to be a powerful alternative method for constructing neural network models of brain function.

Methods
Nomenclature. Vectors are represented with bold lowercase letters and are considered row vectors. Matrices are represented with bold uppercase letters. In Table 1 we have summarized the principal symbols employed throughout the paper, together with their description.
Attaining system consistency in the s-task. We consider a network that solves an s-task, with 2 stimuli and sequences of length τ . Therefore, the network has to display at least a number of network states M = 2 τ . If we number the network states and sort transitions as in Fig. 1b- where T values are odd numbers between 1 and 2M − 1 such that [y 1 , z T ] and [y 2 , z T+1 ] are the T and T + 1 rows of C, and u T , u T+1 are the T and T + 1 rows of U. The row index P is and odd number between 1 and 2M − 1 . Note that z T = z T+1 and z P = z P+1 , but u T = u T+1 and u P = u P+1 . Equation (6) shows us how row vectors in matrix U should be linearly combined such that Eq. (3) has a solution. We have that: This means that the number of linear combinations in U is R = 2 τ /2 − 1 , and rank(U) = 2 τ /2 + 1 . Note that rewriting Eq. (7) we have: where u m,s 1 ,i and u m,s 2 ,i are the activations that neuron i adopts after the presentation of s 1 and s 2 , and coming from a network state m. In other words, Eq. (8) tells that the difference in effects provoked by the stimuli is a constant for each neuron, regardless of which network state or transition we are dealing with. This fact is not surprising, since synaptic weights are held fixed, so each stimulus has the same effect at any time, which is specific for each neuron. Thus, making the system of equations in (3) consistent is equivalent to guarantee that activation values are chosen so that each stimulus has a constant effect on the synaptic activations.
To construct matrix U we first defined a vector of thresholds θ with elements θ i ∈ { 1 /2, 3 /2, 5 /2} . Then, we constructed base matrix U base with M base = 2 τ /2 + 1 row vectors such that: where r(m, i) is an integer uniformly sampled from the [− 5,5] interval. We added the term 1/2 to avoid fitting errors when numerically solving the system, otherwise the activation values could be equal to the threshold values, which would result in erroneous firing states because of numeric precision issues. We chose a uniform  www.nature.com/scientificreports/ distribution over integers for simplicity, although any other distribution could be employed. The same goes for the threshold values. This initial randomly generated matrix U base is required to be full rank. We computed the vector of i elements as the difference between the first two rows of U base . Next, we applied Eq. (8) to generate the remaining R rows as linear combinations of the third to the last row of U base , obtaining 2 τ row vectors that constitute the matrix U lc . Each row vector u in matrix U lc had the neuron activations for one of the 2 τ network states. Applying Eq. (8) creates a dependency between u m,s 1 ,i and u m,s 2 ,i . Hence, for each linear combination we chose at random which activation value (the one associated with s 1 or s 2 ) was defined in terms of the other. This was to ensure that u value distributions were equal for both stimuli. We constructed matrix Z by applying threshold θ to U lc , and then we followed the ordering depicted in Fig. 1b-d to construct matrix U from U lc , and matrix C from Z and vectors y 1 , y 2 . Finally, we employed Eq. (4) to obtain the synaptic weight matrix W. Since matrix W is the minimum Frobenius norm solution to Eq. (4) and defines a network that solves the s-task, we say that W defines a Task + Frobenius (T + F) network.
The algorithm described above works under the assumption that two conditions are met after thresholding: 1) the resulting vectors z are all different, and 2) they are linearly independent. If thresholding U lc gives vectors z that appear more than once, this would result in lower performance in the task, since not all sequences of length τ will be encoded. On the other hand, if linear independency fails after thresholding, then matrix C will have more linear combinations than the contemplated in Eq. (5), meaning that combining the rows of U following Eq. (8) will not be enough, and some linear dependencies in C will be lost in the augmented matrix, making the system inconsistent. In our implementation of the algorithm, if any of these two conditions were not verified, then the algorithm was restarted from the beginning. This occurred with low probability, for τ < 5 . For higher τ , both conditions were always fulfilled in one attempt.
In the previous explanation we assumed that N rec = 2 τ , such that there is one neuron per sequence of length τ . It was possible to fit networks with lower number of neurons, but undesired linear dependencies in C after thresholding, or a number of network states bellow 2 τ occurred with higher probability, especially for τ > 3.

Network simulation and synaptic weights statistics.
Networks were evaluated in the s-task during at least N iter = 10.2 τ time steps, to gather enough samples of each network state. To assess the similarity between the synaptic weight distribution and a normal distribution we computed the Kolmogorov-Smirnov two samples statistic, between the set of synaptic weights and a set of normally distributed values of the same mean, variance, and sample size than that of the synaptic weights.
Equation (4) gives the matrix W with lowest Frobenius norm 12 . Since we are considering networks with N = 2 τ + 2 total neurons (including input and integration neurons), there are infinite solution weight matrices for the same system of equations defined by C and U. These solutions lay in a subspace of R N , of dimension N − rank(C) . The set of all solutions can be obtained by computing the sum between W and a matrix W that satisfies: where ker(C) is an orthonormal basis of the null space of C, of dimensions N x(N − rank(C)) , 0 is a matrix of zeros, and M is a linear mapping of dimensions (N − rank(C))xN rec . This set of solutions share the same stimulus-response mapping. We say they conform an isofunction space.
In several occasions, networks with N rec > 2 τ were desired. Hence, we defined N rec = 2 τ f r , where f r stands for 'redundancy factor' , as the network has f r -times more neurons than required to solve the s-task with that specific τ.

Computation of fitting times for FTP and SGD.
We assessed the efficiency of the FTP algorithm and an SGD algorithm by measuring the time required to find networks of N rec = 1024 neurons that solve an s-task with τ = 1 to τ = 10 . Since in our neuron model the firing state is a non-differentiable function of the activation states, we relied on a surrogate derivative: where υ = u − θ . We took this surrogate derivative from Bellec et al. 15 , although we obtained better results not dividing by the threshold, as originally proposed. One response neuron was added, whose output at a given time step t was defined as o t = tanh(w out z t ) , where w out contains the synaptic weights between integration and output neurons. The loss function to minimize was the quadratic error E = (o target (s) − o) 2 /2 , with o target (s 1 ) = 1 and o target (s 2 ) = −1 . We implemented back propagation thought time (BPTT), unfolding the network in sequences of 30 timesteps and taking minibatches of 30 sequences. We achieved best performance by initializing W in and w out with zeros, while matrix W rec was initialized with samples from a standard normal distribution and then dividing it by its eigenvalue of higher absolute value. Adaptive learning rates were implemented thought Adam 14 , with parameters β 1 = 0.9 , β 2 = 0.999 , ε = 10 −8 and α = 10 −4 . Normalizing gradients to unit length also proved to be helpful in accelerating convergence. Networks were trained until the mean error over the last 10 minibatches was below 0.01. After training, networks were tested with 10,000 stimuli presentations. The maximum testing error across all τ values was 0.03. www.nature.com/scientificreports/ Imposing activity constraints. To construct networks with desired FR we generated U base as previously described, but adjusted the sign of r(m, i) such that, after thresholding, matrix C had a fraction of ones that equalled the target FR. To induce signal correlation, we scaled vector by a factor f cc . This manipulation causes neurons to have very different firing rates for s 1 and s 2 , leading to an increased signal correlation. By following this procedure, we constructed networks in Fig. 3. The target FR values were taken from the range between 0.1 to 0.9 spikes/time step, in steps of 0.1 spikes/time step. The f cc values were taken from the range between 1 and 10 in unitary steps. A total of 30 networks were constructed for each combination of FR and f cc values within those ranges. For each network, the average FR was computed over the FR of all neurons in the network. Similarly, the average correlation coefficient (CC) was computed from the Spearman correlation coefficient computed for all neuron pairs. We also employed a genetic algorithm (GA) to evolve a population of matrices U base to adjust their mean FR and absolute CC. We employed a population of N pob = 200 individuals, each one composed of one matrix U base and one vector θ . For each individual we constructed matrices U and C, and computed an approximate value of FR and correlation, under the assumption that each network firing state occurred with equal probability. The fitness F of an individual was computed as: where FR and CC are the firing rate and correlation computed over the network firing states, and FR target and CC target are the firing rate and correlation we want the network to have. If an individual produced an inconsistent system, or a system with not enough network states, its fitness was set to zero. We picked the T = 0.1N pob individuals with the highest fitness as parents. Then, we picked parents at random and mutated U base by adding gaussian noise to each matrix element, of zero mean and standard deviation σ = 0.1 . One of the individuals of each generation was an unmutated copy of the best individual of the previous generation (elitism). Threshold vectors θ were not mutated. The GA was run until the average fitness surpassed F target = 0.95 . Firing rates and correlations shown as black dots in Fig. 3d were computed by running the network constructed from the elite U base during 30.2 τ time steps.
Imposing structural constraints. Solving Eq. (4) gives networks with minimum Frobenius norm. These networks do not comply with basic structural features observed experimentally, such as the low probability of self-connections, or Dale's principle. To impose such structural features, we constructed a matrix W d such that W d = W + W d . Matrix W d is a matrix which fulfils the desired structural constraints. Most probably W d is not within the null space of C, and thus W d will not be a solution to the system defined by C and U. Hence, we defined a matrix: where M sc = ker(C) + �W d , and ker(C) + is the Moore-Penrose pseudoinverse of ker(C) . Matrix M sc is a linear mapping that incorporates the desired structural constraints, making W the change in matrix W within the null space of C that is closest to W d , in the least squares sense.
We imposed three structural constraints: no self-connections, Dale's principle, and a certain degree of sparsity. Thus: The (i, j) element in matrix W self (which deletes self-connections in the integration neurons) was defined as: where W(i, j) is the (i, j) element of matrix W.
Matrix W Dale was defined as: where W c 1 = W + W self and T j ∈ {Ex, In} indicates if neuron j was chosen to be excitatory (Ex) or inhibitory (Inh). Matrix W Dale sets to zero the synaptic weights that violate Dale's principle. Neuron j was chosen to be excitatory if η j = j W c 1 (i, j) > 0 . Otherwise, it was chosen to be inhibitory. If more excitatory/inhibitory neurons were required, neurons with negative/positive η closest to 0 were set as excitatory/inhibitory as needed.
Matrix W sp to enforce sparsity was defined as: where W c 2 = W c 1 + W Dale . The value α(sp) is the sp-percentile of the absolute values in W c 2 . In this manner W sp will set to zero the weights with the lowest absolute value, such that a sparsity sp is enforced. The loss function L at iteration k was defined as the average of the absolute �W d (i, j) values: www.nature.com/scientificreports/ where i,j stands for the average across indexes (i, j) . The structural constraints were imposed through an iterative process, in which neurons were classified as excitatory or inhibitory at each iteration according to their η , a matrix W was computed using Eq. (11), and a new W was obtained. The process was stopped when the loss fell below a desired value e 1 , in which case the fitting process was considered successful. The process was also stopped if (L(k) − L(k − 1))/L(k) < e 2 . When this latter condition was met, the fitting process was considered unsuccessful, since the error was not decreasing fast enough and would probably converge to an unacceptable value above zero. We used e 1 = 10 −3 and e 2 = 10 −4 . If the process was successful, values that violated any of the constraints were set to zero. These values were expected to be small enough since the error was small. We computed: where U sc = CW sc , with W sc being the resulting synaptic weight matrix after the constraining process, to verify that the deviation from the original U was negligible. If the process was unsuccessful, or the clipping error e clip > 10 −3 , then the original W was considered not to be suitable for the structural fitting.
We measured the efficiency of the process by computing the number of networks generated (# attempts) and the running time t expended until reaching the first successfully constrained network. We varied τ from τ = 3 to τ = 6 . For each τ we varied the number of integration neurons in steps of 16 neurons, from a minimum number of 4.2 τ to the maximum value 2 6 . For each combination of τ and neuron number we generated networks with the FTP algorithm, and subjected them to structural constraining (no self-connections, 4:1 Ex:In ratio, and a minimum sparsity sp = 40% ). We obtained 10 measurements of # attempts and t, from which we computed the mean and SD depicted in Fig. 4e,f.

Network construction from random transition graphs.
To construct random transition graphs that have an associated consistent system, we first defined a row vector and a matrix U lc composed of M = 2 τ row vectors u such that u i+1 − u i = , where u i is the i th row. Here, indexes i are odd numbers between 1 and M − 1 . Next, we constructed matrix U in a way that ensures that each node in the graph was reachable, meaning that every node had to receive at least one edge. This is equivalent to say that every row in U lc is found at least once in U. Therefore, we set the first M rows in U equal to matrix U lc . The remaining rows in U were taken from U lc , picking M/2 pairs of indexes i, i + 1 , choosing i values at random from the set of odd numbers between 1 and M − 1 . In this way, and unlike the transition graphs that solve an s-task, nodes could receive just one edge, or more than two.
So far, if a row vector appeared in matrix U more than once, then it appeared only in odd rows, or only in even rows, but not in both. This is because indexes were ordered from 1 to M in the first half of U, and ordered in pairs of i, i + 1 indexes in the second half. The resulting graph would be one in which any given node is reachable as the result of the presentation of either s 1 or s 2 , but not from both. In other words, if node b is reachable from node a after s i presentation, then node b is reachable from node c only after s i presentation, where c is any other node from which b is reachable. In terms of the colour code of the graph in Fig. 1b, any node receives arrows of the same colour. We wanted graphs as random as possible, so nodes reachable through different stimuli were desired. We defined these nodes as bicoloured nodes. In terms of indexes in matrix U, a bicolored node translates into a row vector u i that appeared in matrix U in both odd and even rows. For example, if we had indexes (1, 2, 3, 4) for the first 4 rows of U, with (u 1 , u 2 , ) and (u 3 , u 4 , ) each being linearly combined, then we wanted to change this series to (1, 2, 2, 4) , or (1, 2, 3, 1) . This requires to generate new linear combinations, in particular, (u 1 , u 2 , u 4 , ) should be linearly combined, for the first example, and (u 1 , u 2 , u 3 , ) in the second example. Thus, we modified matrix U to generate f bc M/4 bicolored nodes, where f bc stands for 'bicolored fraction' and is a number between 0 and 1. The maximum number of bicolored nodes is M/4, since we generated one node for each series of indexes i to i + 3 . Finally, we constructed matrix Z by thresholding matrix U lc , and then matrix C, which rows were in the form: where z i is the ith row vector in matrix Z, and vector k is a permutation of the list of integers from 1 to M.
Following the above procedure, we constructed random transition graphs that satisfied the linear combinations required so that a consistent system of equations could be constructed. Given that these networks do not solve the s-task but are the minimum Frobenius norm solution to a random transition graph, we call them Frobenius (F) networks. The procedure avoids index sequences like (1221) , since this ordering gives a consistent system only if = 0 , in which case stimuli cannot be distinguished by the network. The procedure also avoids index sequences of the type (11) (one node leads to another node through both stimuli, s 1 and s 2 ). If this were the case, one possibility is that s 1 and s 2 produce the same u values (u being an element of vector u). Therefore, is a vector of zeros and stimuli cannot be discriminated. Another possibility is that stimuli lead to different vectors u, but these vectors in turn lead to the same vector z after thresholding. This situation is possible, but would require careful selection of u values in relation to θ , and for this it was avoided. L(k) = �W d (i, j) i,j e clip = U sc (i, j) − U(i, j) i,j       y 1 z k(1) y 2 z k (1) . . . www.nature.com/scientificreports/ We constructed networks that followed random transition graphs and compared their properties with the properties of networks that solved the s-task. In particular, we measured the reciprocity of the network, defined as the Spearman correlation between weights of incoming and outgoing synapses. Reciprocity was computed over matrix W norm , a normalized version of the synaptic weights constructed by taking the absolute values of W and scaling them between 0 and 1. Since imposing structural constraints like Dale's principle, or sparsity, may generate many zero-valued weights, reciprocity was computed for synaptic weight pairs such that both w norm ij and w norm ji were both non zero. For each network generated to solve the s-task we also picked a network from its isofunction space, that is, from the set of all networks that had the same stimulus-response mapping (as described above). We refer to these networks as Task (T) networks, since they solve the s-task but they are not the minimum Frobenius norm solution. The linear mapping M in Eq. (10) has entries M(i, j) = r i,j max i,j W(i, j) , where r i,j is a random number, different for each entry, sampled uniformly from the [−1, 1] interval, and W is the synaptic weight matrix from which an isofunction network is desired. We defined mapping M in this way to obtain isofunction networks with synaptic weight values within the range of the weights in the original network.
In addition, for each network that solves the s-task we constructed an isofunction network with structural constraints: no self-connections and Dale's principle with excitatory and inhibitory neurons in equal numbers.