Dynamical complexity and computation in recurrent neural networks beyond their fixed point

Spontaneous activity found in neural networks usually results in a reduction of computational performance. As a consequence, artificial neural networks are often operated at the edge of chaos, where the network is stable yet highly susceptible to input information. Surprisingly, regular spontaneous dynamics in Neural Networks beyond their resting state possess a high degree of spatio-temporal synchronization, a situation that can also be found in biological neural networks. Characterizing information preservation via complexity indices, we show how spatial synchronization allows rRNNs to reduce the negative impact of regular spontaneous dynamics on their computational performance.

Scientific REPORTS | (2018) 8:3319 | DOI: 10.1038/s41598-018-21624-2 of spatio-temporal dynamical properties, we particularly focus on beyond fixed point operation. We employ a variation to classical rRNNs [25][26][27]30 by using nodes with a sinusoidal activation function. A broad range of autonomous dynamics are the consequence, among which we most importantly find multiple non-fixed point states with surprisingly high computational performance. We show that spatial synchronization between nodes and bifurcation point play essential roles in information processing. The underlying mechanisms are analysed based on the mutual information between each node of rRNN and the input time signal, as well as the rRNN's maximal Lyapunov exponent. Our choice of system is directly motivated by its randomness: we can exclude structural modifications induced by learning being the cause behind spatial synchronization.

Results
Random recurrent neural networks. Our rRNN consists of a set of N = 500 nodes in state x n , internally connected via a random, uniformly distributed internal weight matrix W of dimensionality N × N. The resulting random networks have a temporal evolution governed by n n off f b n T 1 1 where {W, W off , W fb } are matrices defining the random weight connectivity of the rRNN for the network itself, b = 0.2 the offset operating points at each node, and the input layer connectivity with the signal α ⋅ + y n T 1 (α the input scaling), respectively. μ is the feedback amplification. The connectivity matrix W is constructed with 500 × 500 random, uniformly distributed coefficients in [0, 1], from a matrix with connectivity 0.99. The rRNN is schematically illustrated in Fig. 1(a), where nodes (symbol ⊕) add and nonlinearly transform all inputs according to random weights {W, W off , W fb }. In its bifurcation diagram this neural network experience multiple windows of regular dynamics. In Fig. 1(b 30 . In that case, a steady state only exists for μ < 1.4. However, once it bifurcates there are no additional steady state windows that can be used for investigating their information processing properties, i.e. the network's dynamic falls in a chaotic regime. In such a system one could not compare results obtained under similar dynamical states. As we intent to investigate general properties for computation in non-steady state systems, we opted for our modified rRNN. Autonomous rRNN dynamics (α = 0) for different values of the bifurcation parameter μ are shown in Fig. 2. The left column shows the functional input-output relationship of Eq. (1) for node 34. The central column displays exemplary individual time series for the same node, referred to as local dynamics, while the right column shows the dynamical state of the full rRNN. For μ = 5 (regime R 2 ) the node state is symmetrically concentrated along the nonlinear function's extrema, see panel (a) of Fig. 2. The resulting dynamics of + x n 1 34 and the full networks state x n+1 is shown in Fig. 2(b,c), respectively. Autonomous dynamics of + x n 1 34 are therefore periodic, and according to Fig. 2(c) such local periodic dynamics strongly synchronize across the rRNN. When increasing bifurcation parameter μ to 10, dynamics span an increasing number of the nonlinear function's periods, see Mitigating autonomous dynamics by learning. All previously discussed dynamical properties have been exclusively obtained in the absence of an external stimulus. However, as an information processing system, the rRNN realizes computation on the bases of rich dynamical responses to external, i.e. sensory input. We therefore activate the rRNN's input by setting α = 0.8 and investigate its dynamics when fed by a chaotic time series. We add an output layer that provides the computational result according to A divergence of the system's output y out for large μ is avoided by limiting the range of y out through the hyperbolic tangent in Eq. (2). The output weight vector W out is calculated according to a supervised learning rule based on a teacher/target signal α + y n T 1 , Eq. (3). Once trained, the input signal is replaced by the network's own output 30 = + + y y n T n out 1 1 in Eq. (1), and the system autonomously approximates dynamics learned from the teacher system, here the Mackey-Glass (MG) sequence from Eq. (7) 32 . Computational performance is determined after a free evolution of 35 time steps, twice the time-delay of the MG sequence (τ m = 17 in Eq. (7)) 32,33 . In Fig. 3(a) the prediction NMSE (Eq. (4)) is shown for 0 < μ ≤ 10. At each μ we repeated the previously introduced training procedure. The optimal performance (NMSE = 5.5 × 10 −4 ) is found for a very narrow regime around μ = 0.9, which comes at no surprise as it corresponds to the often employed computation close to the self organizing criticality 25 . However, additionally we identify multiple broader regions of acceptable performance with a prediction error of roughly NMSE ≈10 −2 . A comparison to the rRNN's bifurcation diagram of Fig. 1(b) reveals that these regions directly correspond to R 1 , R 2 and R 3 , where rRNN dynamics are regular and nodes are synchronized. Regimes R 4 and R 5 are not treated in our analysis since small perturbations result in their destabilization, driving the rRNN instantaneously into the next chaotic regime. In all other regions the error is orders of magnitude higher.
As previously introduced, in the self-driven mode the rRNN's output becomes its own input. If not suppressed, perturbation-like autonomous rRNN dynamics can therefore freely propagate through the system due to this recurrent input/output relationship. This raises the question as to how the network can mitigate internal dynamics so efficiently. The answer lies within the learning process. We demonstrate this by creating the rRNN's output + y n out 1 for α = 0, however using the W out previously learned for approximating + y n T 1 for the driven system at α = 0.8. From the resulting signal we discard the first 10 data points to avoid possible transient behavior. From the remaining 25 samples we calculate the average output amplitude variation σ α=0 via Eq. (5). This measure σ α=0 evaluates the output weights' performance for suppressing autonomous dynamics, and the black data in Fig. 3(a) demonstrates how intricate σ α=0 and the NMSE are related. Within regions R 1 , R 2 and R 3 , learning efficiently separates autonomous dynamics from transients induced by the rRNN's input. As revealed by the low values of σ α=0 , the rRNN can there approximate + y n T 1 well because the impact of autonomous rRNN dynamics on y out is strongly reduced. As a consequence, perturbations not present in the training data can be isolated from dynamics in the target data.
In general, the spontaneous internal activity of each node has an impact on the collective evolution of the network. As illustrated in the right column of Fig. 2, this recurrent network architecture can cause different levels of dynamical diversity through out the rRNN, which in turn potentially influences the propagation of information within the network. The spatio-temporal evolution of the rRNN can be quantified by the spatial synchronization between the nodes. The standard deviation, described by Eq. (6), globally measures spatial synchronization between nodes 24 . Figure 3(b) shows how uniformly synchronous the nodes are depending on the value of μ when the network presents autonomous (black dots) and driven activity (blue stars). The nodes of the rRNN are less synchronized in the non-regular windows when the network is perturbed by an external signal. In regions R 1 , R 2 and R 3 , however, synchronization is nearly preserved in both cases. As demonstrated by Fig. 3(b), the global rRNN synchronization error significantly decreases in regions R 1 , R 2 and R 3 . Thus, the dependence of synchronization δ on bifurcation parameter μ is highly comparable for, both, the driven and the autonomous system. Region R 1 can clearly be separated into two sections. For μ ≤ 1 node responses consists of constant states; the system is operating in the linear section of the nonlinear function. For μ > 1 the network states also cover the nonlinear function's extrema and nodes start evolving in synchrony. Region R 3 is significantly more sensitive to μ when compared to R 1 and R 2 . This is due to the proximity to parameters resulting in chaotic dynamics. Such a sensitive operating point is less recommendable when for example we use a noisy hardware or biological rRNN for prediction. In fact, in R 3 we find that the nodes are in steady states with small or vanishing amplitude dynamics, however not as well synchronized as in R 1 , and R 2 . The comparison between panels (a) and (b) of Fig. 3 highlights the importance of spatial synchronization for good prediction performance. Data shown in Fig. 3 shows the statistical average obtained from 100 realizations of W.
An extensive qualitative analysis of the dynamics associated with good prediction performance is shown by Fig. 4. For μ = 0.8, Fig. 4(a) shows the spatio-temporal plot of all nodes when the input is injected in steady state regime R 1 . Column-shaped patterns throughout the entire spatio-temporal plot are induced by the external data and therefore indicate information preservation. One example of how input information is preserved within the rRNN is shown by Fig. 4(b), where the randomly chosen node 31 ( + x n 1 31 ) shows a nonlinearly transformed version of the input. For a geometrical illustration of the information carried by the node, we illustrate the system's dynamic by reconstructing the attractor of node 31 through Takens embedding Theorem 34 . We used embedding parameters of delay τ = 12 and dimensions D = 4 which are the ones obtained for delay embedding the original MG attractor 35 , also see Methods section. A 2D projection of resulting state space is shown by Fig. 4(c), which is qualitatively comparable with the structure of the chaotic MG attractor shown in the Methods section.
At μ = 5 (regime R 2 ) we find the previously described periodic oscillations. The spatio-temporal plot of the driven rRNN shows once again a constant phase relation across all nodes, see Fig. 4(d). Spatio-temporal features corresponding to the input timetrace masked by the autonomous oscillations. Consequently, the closer inspection of individual node evolutions shows that the spontaneous rRNN internal dynamics are still present, see Fig. 4(e). We find that node dynamics consist of two contributions. Large amplitude oscillations at fast timescales correspond to the autonomous dynamics, while nonlinear transients induced by the input information are encoded in the slowly varying envelope. Such specific and well separated time scales are a requirement for suppressing the crosstalk of autonomous dynamics to y out . Combined with Fig. 4(f), the impact of regular autonomous rRNN dynamics becomes clear. The fast autonomous oscillations separate the node's attractor into two regions of its state space. Within each of these regions, the local attractor again resembles the one of the injected MG sequence. As such, removing this division by training corresponds to σ α=0 , and the system should still be able to approximate the target attractor 36 . Periodic dynamics ensure the discussed separation between timescales, while synchronization minimizes the resources learning has to dedicate for their suppression, leaving more freedom for optimizing prediction performance. Upon increasing the bifurcation parameter to μ = 8.7 (regime R 3 ), the collective dynamics shown by Fig. 4(g) have similarities to the one shown in panel (a), even experiencing a degree of synchronization. In this case, the node's responses to the external information is perturbed by irregularly appearing, noise-like epochs, see Fig. 4(h). The fixed point of R 3 is a quasi-steady state, yet due to the narrow width of R 3 the rRNN is forced outside this stability window even by small fluctuations which in turn can induce noise like epochs. Figure 4(i) shows the effect of the noise on the reconstructed attractor. The noise strongly distorts the node responses away from the MG attractor. As induced by noise-like epochs, these distortions strongly hamper the determinism in the rRNN's response to the injected information.
Preservation of information in destabilized rRNNs. Spontaneous dynamics in the rRNN therefore result in distortions of its response. At this point it is important to recall that the input sequence is chaotic, yet by no means random. It is the result of complex, yet causal deterministic processes. Predicting such a signal therefore demands these causal relationships to be preserved within the neural network's dynamical state, providing a functional relationship to currently and previously injected information. For low error prediction it is therefore an essential condition that the network can serve as carrier and short term storage of injected information. Synchronization is not sufficient to estimate if a rRNN complies with this condition. Quantifying the information content preserved within the rRNN when stimulated by an input, we calculate the mutual information (MI) between the rRNN and the input signal. This provides an estimation of how well the network is able to maintain the input information content 25 , and hence is capable to capitalize from these internal causal relationships for computation.
We consequently evaluate the network by estimating the memory capacity C via the mutual information between each node and the input, see Eq. (8), and maximal Lyapunov exponent λ max , as functions of the bifurcation parameter. In Fig. 5 we show the rRNN's C and λ max as black dots and blue stars, respectively. As during our previous analysis, we find that regimes R 1 , R 2 , and R 3 show their capabilities to accurately preserve previous input information. In steady state regimes R 1 and R 3 the memory capacity C is higher than in R 2 , where the rRNN's spontaneous behavior is periodic in general. Our Lyapunov component analysis reveals that λ max is kept small inside R 1 , R 2 , and R 3 due to their non-chaotic spontaneous features. In fact, for μ ≤ 1 in R 1 , λ max of the network approaches with the one estimated for the input signal (λ ∼ . × − 3 6 10 max MG 3 ), indicated by the dashed line in Fig. 5. For μ > 1, oscillatory, spontaneous rRNN's dynamics are combined with the injected input information. As the internal dynamics of the rRNN begin to exert influence over dynamics induced by the MG input, λ max starts increasing accordingly. This behavior agrees well with the decrease of memory capacity, where the internal rRNN's dynamics will modify the probability distribution of the nodes. This demonstrates a strong correlation between the decline of spatio-temporal synchronization and the reduction in the system's memory capacity to approximate the deterministic, functional relationship of the prediction task.

Conclusion
Unlike neural networks run on a data-center, the human brain is not a special purpose computing machine, but parameters will most likely be optimized according to a compromise between partially competing demands. We therefore demonstrated that information preservation and synchronization inside a random network allow good prediction performance at parameters where learning in biological neural networks benefits. Based on a rRNN with a periodic nonlinear function we compare various regions of regular dynamics and highlight their importance of spatial synchronization upon prediction performance, mutual information and the stability of the neural network. Synchronization between nodes plays an essential role, but it is not sufficient to understand how information processing is successful in a rRNN beyond its fixed point. On the contrary, when linear regression is used to realize supervised learning, a causal relation between processed information and target is required.
We describe a rRNN predicting the future time-steps of a chaotic trajectory. Our results illustrate the importance of information flow, divergence and the suppression of signal components not present in the training data set. The rRNN's damped autonomous deviation σ α=0 , mutual information MI and maximal Lyapunov exponent can be seen as complexity indicators for interpreting neural networks based on dynamical systems. Other than for the oscillatory state, chaotic responses were not capable to maintain important features of the input dynamic, resulting in a low prediction performance. Finally, delay systems exploiting an identical nonlinearity have been reported 37,38 and it would be of interest to investigate such hardware systems based on the here introduced methodology.

Methods
Training of the rRNN. For the training step we use 2000 values from the MG system and α = 0.8 30,32,33 . The training target is equivalent to the input signal, shifted by a single time step. Via the teacher we estimate the optimal output weight vector W op where σ is the standard deviation.
Statistical amplitude variation. The average output amplitude variation:  with i∈ [1,500]. A normalization by μ then allows to associate δ n+1 to a synchronization error in phase of the nonlinear function in Eq. (1).
The Mackey-Glass system. The MG system is a first order nonlinear delay differential equation 32 , whose time-discrete version is the following 33 : . Under these conditions the rRNN as a whole therefore is capable to preserve the input information without significant loss of information, hence learning should be possible in principle. By accumulating all MI i in the global measure C = ∑ i (MI i ) defined as memory capacity 26 . Maximal Lyapunov exponent. Complex dynamical systems are typically classified using the rate of exponential divergence between neighbor trajectories, corresponding to their Lyapunov exponent. Specifically chaotic systems have a positive maximal Lyapunov exponent λ max 40 . The maximal Lyapunov exponent [40][41][42] is at first calculated for each i-th node, λ max i . Then the maximal Lyapunov exponent of the rRNN is λ λ = max( ) max m ax i .