Information spectra and optimal background states for dynamical networks

We consider the notion of stimulus representation over dynamic networks, wherein the network states encode information about the identity of an afferent input (i.e. stimulus). Our goal is to understand how the structure and temporal dynamics of networks support information processing. In particular, we conduct a theoretical study to reveal how the background or 'default' state of a network with linear dynamics allows it to best promote discrimination over a continuum of stimuli. Our principal contribution is the derivation of a matrix whose spectrum (eigenvalues) quantifies the extent to which the state of a network encodes its inputs. This measure, based on the notion of a Fisher linear discriminant, is relativistic in the sense that it provides an information value quantifying the 'knowability' of an input based on its projection onto the background state. We subsequently optimize the background state and highlight its relationship to the underlying state noise covariance. This result demonstrates how the best idle state of a network may be informed by its structure and dynamics. Further, we relate the proposed information spectrum to the controllability gramian matrix, establishing a link between fundamental control-theoretic network analysis and information processing.

Figure 1. The optimal background state x_ref amounts to a Fisher linear discriminant, onto which state distributions (induced by inputs) are projected. In the case of Gaussian noise, uncertainty can be visualized in terms of ellipsoids (with principal axis v_max) about the mean. Since the networks are dynamic, the optimal x_ref will vary with time as the dynamics carry the states forward.
The network dynamics are taken to be linear:

$$\dot{x}(t) = A x(t) + B u + w(t), \qquad (1)$$

where the n-dimensional state vector x's recurrent dynamics are described by the adjacency matrix $A \in \mathbb{R}^{n \times n}$, the input matrix $B \in \mathbb{R}^{n \times m}$ mediating the m-dimensional input u, taken here to be constant (see Discussion), and zero-mean Gaussian noise w(t), which has covariance matrix Σ_w. We point out that the term dynamical network is used here to imply time-evolution in the network states, as opposed to a time-varying vector field; that is, A is constant. We wish to consider the linear Fisher information regarding u given the inner product of the state x(t) (which varies in time) and a reference background x_ref. By basic linear system theory, x(t) is a Gaussian random vector whose covariance

$$\Sigma_x(t) = \int_0^t e^{A s}\, \Sigma_w\, e^{A^T s}\, ds \qquad (2)$$

is a covariance matrix determined by the system dynamics.
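As a concrete illustration (our sketch, not the authors' code, which was written in Mathematica), the first two moments of x(t) under these dynamics can be computed numerically as follows; the small network below is an arbitrary stand-in, and the covariance integral is evaluated with Van Loan's block-exponential identity.

```python
import numpy as np
from scipy.linalg import expm

def state_moments(A, B, u, Sigma_w, t):
    """Mean and covariance of x(t) for dx = (Ax + Bu)dt + dw, with x(0) = 0."""
    n = A.shape[0]
    # Mean: m(t) = Gamma(t) B u, with Gamma(t) = A^{-1}(e^{At} - I) (A stable, hence invertible).
    Gamma = np.linalg.solve(A, expm(A * t) - np.eye(n))
    mean = Gamma @ B @ u
    # Covariance Sigma_x(t) = int_0^t e^{As} Sigma_w e^{A^T s} ds via Van Loan's method:
    # exponentiate the block matrix [[-A, Sigma_w], [0, A^T]] and read off F2^T G1.
    C = np.block([[-A, Sigma_w], [np.zeros((n, n)), A.T]])
    E = expm(C * t)
    Sigma_x = E[n:, n:].T @ E[:n, n:]
    return mean, Sigma_x

# Tiny example: a stable 3-node chain driven at node 1.
A = np.array([[-2.0, 1, 0], [1, -2, 1], [0, 1, -2]])
B = np.array([[1.0], [0], [0]])
m, S = state_moments(A, B, u=np.array([1.0]), Sigma_w=np.eye(3), t=10.0)
```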
Inner Product and Fisher Information. As we seek to quantify the extent to which the inner product of x(t) and x_ref encodes information about the input u giving rise to x(t), we employ the Fisher information matrix, denoting it $\mathcal{I}_u$, which is given by

$$\mathcal{I}_u = \mathbb{E}\left[ \left( \frac{\partial}{\partial u} \log f\big(\langle x_{\mathrm{ref}}, x(t) \rangle; u\big) \right) \left( \frac{\partial}{\partial u} \log f\big(\langle x_{\mathrm{ref}}, x(t) \rangle; u\big) \right)^T \right]. \qquad (3)$$

Using the derivation given explicitly in the Methods section, we obtain the Fisher information matrix

$$\mathcal{I}_u(t) = \frac{B^T \Gamma(t)^T\, x_{\mathrm{ref}}\, x_{\mathrm{ref}}^T\, \Gamma(t)\, B}{x_{\mathrm{ref}}^T\, \Sigma_x(t)\, x_{\mathrm{ref}}},$$

where $\Gamma(t) = \int_0^t e^{A s}\, ds$ maps the constant input to the state mean, and Σ_x is the state covariance matrix as introduced in (2). In seeking a holistic assessment of the matrix $\mathcal{I}_u$, we employ the trace, which is the summed component-wise variance in our estimation of u. Since $\mathcal{I}_u$ is an outer product of two vectors (scaled by the denominator), we may express its trace as their scaled inner product:

$$\mathrm{tr}\big(\mathcal{I}_u(t)\big) = \frac{x_{\mathrm{ref}}^T\, \Gamma(t) B B^T \Gamma(t)^T\, x_{\mathrm{ref}}}{x_{\mathrm{ref}}^T\, \Sigma_x(t)\, x_{\mathrm{ref}}}, \qquad (10)$$

where dependence on t has again been made explicit.
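The trace in (10) is then a short computation. The sketch below assumes Gamma and Sigma_x obtained as in the previous snippet; `fisher_trace` is our illustrative helper name, not the authors'.

```python
import numpy as np

def fisher_trace(x_ref, Gamma, B, Sigma_x):
    """tr(I_u) = ||B^T Gamma^T x_ref||^2 / (x_ref^T Sigma_x x_ref), as in (10)."""
    g = B.T @ Gamma.T @ x_ref  # sensitivity of the readout mean to the input u
    return float(g @ g) / float(x_ref @ Sigma_x @ x_ref)
```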
Linear Dynamics and Noise Ellipsoids. Figure 1 provides a schematic of the problem formulation. Because our dynamics are linear, at any given time t the state of the network is a Gaussian random vector. The covariance of the state can be used to parameterize a quadratic form whose level sets constitute ellipsoids that encapsulate the mean. We denote the principal eigenvector of the covariance matrix as v_max. These ellipsoids capture the noise-driven uncertainty in the state. As we will soon see, the optimal x_ref amounts to a Fisher linear discriminant that best dissociates two competing state distributions (ellipsoids), each associated with a different stimulus. As the network dynamics carry these trajectories forward in time, the optimal x_ref will in general change.

Network Parameterization, Actuated Nodes and Steady-State Assumption. We will focus our attention on networks that have a Barabási-Albert (scale-free) topology 24. The off-diagonal elements of A are binary, while the diagonal elements are assigned large enough negative values to ensure stability (see Methods). The dynamics of such networks are asymptotically stable, so that in the absence of stimuli and noise all states return to the origin. In our analysis we will vary the structure of how inputs impinge on network nodes. In particular, for an n-node network, only n_d ≤ n nodes will receive input. These actuated nodes are sometimes referred to as 'driver' nodes [25][26][27]. We will mostly consider the case when each actuated node receives an independent input, so that

$$B = \begin{bmatrix} I_{n_d} \\ 0 \end{bmatrix}, \qquad (11)$$

where $I_{n_d}$ is the identity matrix of dimension n_d (the number of driven nodes). We make the assumption that the noise covariance is always at steady state. The concept here is that the dynamics of the network are persistently excited by ongoing noise, while receiving stimuli in a temporally punctate manner. To be mathematically precise, under this assumption (2) becomes the solution of the Lyapunov equation

$$A \Sigma_x + \Sigma_x A^T + \Sigma_w = 0.$$

Critically, we assume the pair (A, B) is controllable, so that the controllability gramian (precisely defined later) is full-rank. A final important assumption pertains to the specification of t. In cases when t is assumed to be at steady state, we set t = 10 (which we find is five times longer than the time constant of our considered networks). In other cases, we will vary t to assess the role of dynamics.
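A minimal sketch of this construction and the steady-state covariance, assuming networkx's Barabási-Albert generator as a stand-in for the authors' Mathematica pipeline; the diagonal is made more negative than each off-diagonal row sum, as in Methods, to guarantee stability.

```python
import numpy as np
import networkx as nx
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, n_d = 100, 20
# Binary, undirected scale-free adjacency matrix (zero diagonal).
A = nx.to_numpy_array(nx.barabasi_albert_graph(n, 2, seed=0))
# Stabilizing diagonal: a_ii = -(row sum + delta_i), delta_i ~ Uniform(0, 1).
A -= np.diag(A.sum(axis=1) + rng.uniform(0, 1, n))
# Actuate the first n_d nodes, each with an independent input, as in (11).
B = np.vstack([np.eye(n_d), np.zeros((n - n_d, n_d))])

# Steady-state noise covariance: A Sigma_x + Sigma_x A^T + Sigma_w = 0.
Sigma_w = np.eye(n)
Sigma_x = solve_continuous_lyapunov(A, -Sigma_w)
```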
An optimal reference state x_ref exists, maximizing information about u. We are interested, for the moment, in which choice of x_ref will maximize (10). That is, we seek to answer the question: of all possible background states x_ref, which one will provide the most information about a stimulus u (with its resultant output x), given a readout of the inner product ⟨x_ref, x⟩? In order to find this 'ideal' reference state, we transform (10) as follows: writing the Cholesky decomposition $\Sigma_x = L_x L_x^T$ and substituting $z = L_x^T x_{\mathrm{ref}}$, the trace becomes the Rayleigh quotient

$$\mathrm{tr}(\mathcal{I}_u) = \frac{z^T S z}{z^T z}, \qquad S = L_x^{-1}\, \Gamma B B^T \Gamma^T\, L_x^{-T},$$

which is bounded below and above by λ_min and λ_max, attained when z equals $x^*_{\min}$ and $x^*_{\max}$, the eigenvectors of S associated with eigenvalues λ_min and λ_max, respectively. We then make the reverse transformation $x_{\mathrm{ref}} = L_x^{-T} x^*_{\max}$ to obtain our ideally contrasting reference state. Mathematically (and as depicted in Fig. 1), x_ref is in fact the Fisher linear discriminant that best separates the induced state distributions associated with any two randomly chosen inputs.
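A sketch of this maximization, reusing the quantities computed above; the function name and normalization are ours.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def optimal_x_ref(Gamma, B, Sigma_x):
    """Maximize tr(I_u) in (10) over x_ref via the Rayleigh quotient in z = L^T x_ref."""
    L = cholesky(Sigma_x, lower=True)
    M = solve_triangular(L, Gamma @ B, lower=True)   # L^{-1} Gamma B
    S = M @ M.T                                      # L^{-1} Gamma B B^T Gamma^T L^{-T}
    w, V = eigh(S)                                   # eigenvalues in ascending order
    z_max = V[:, -1]                                 # top eigenvector of S
    x_ref = solve_triangular(L.T, z_max, lower=False)  # reverse transform: L^{-T} z
    return x_ref / np.linalg.norm(x_ref), w[-1]
```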
Previous results 9 have shown that an optimally informative 'signal direction' in a non-dynamical feedforward network is one that aligns with the principal axis of the noise covariance ellipsoid. Similarly, with our dynamical setup, we explored the optimal x_ref qualitatively by examining the extent to which it aligns with the principal axis of the noise covariance ellipsoid (v_max of Σ_x in (10)). The results are shown in Fig. 2. We notice in Fig. 2 that the ideal x_ref changes its orientation relative to v_max as a function of n_d. This orientation is virtually uncorrelated with network size and is highly predictable: we ran 30 network realizations for each (n, n_d) pair and found little variability. We hypothesized that this was due to prioritization of the fidelity of the portion of x_ref corresponding to actuated nodes, which would explain why relatively under-actuated networks showed greater overall angular divergence between x_ref and v_max. This is indeed the case, as shown in Fig. 3. We first examined actuated nodes, then non-actuated nodes, by segmenting x_ref and v_max into the first n_d elements (Fig. 3(a)), then the last n − n_d elements (Fig. 3(b)). Clearly, the actuated part of x_ref must be much more similar to the corresponding part of v_max than is true for the non-actuated part.

Aside from the dependence of the optimal x_ref on input structure (particularly n_d), we also analyzed how the orientation of x_ref relative to v_max changes with time. Since, as mentioned above, we are working in a dynamical regime, a time-dependent analysis is straightforward. To this end, we evaluated the orientation of x_ref relative to v_max at several time points, using the same methodology employed above, with the results shown in Fig. 4. We see that the orientation of x_ref relative to v_max does indeed change with time, apparently smoothly, and that x_ref becomes more similar to v_max as time advances. This is especially true for fully- or nearly fully-actuated networks, but holds generally for all input scenarios.
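The alignment analyses of Figs 2-4 reduce to angle computations of this kind; the following sketch reuses A, B, Sigma_x, n_d and optimal_x_ref from the snippets above, with a steady-state Gamma.

```python
import numpy as np

def subvector_angle(a, b):
    """Angle (in degrees) between two vectors, insensitive to sign."""
    c = abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

Gamma = -np.linalg.inv(A)                    # steady state: int_0^inf e^{As} ds = -A^{-1}
x_ref, _ = optimal_x_ref(Gamma, B, Sigma_x)  # from the sketch above
v_max = np.linalg.eigh(Sigma_x)[1][:, -1]    # principal axis of the noise ellipsoid
theta_actuated = subvector_angle(x_ref[:n_d], v_max[:n_d])  # first n_d elements, cf. Fig. 3(a)
theta_rest = subvector_angle(x_ref[n_d:], v_max[n_d:])      # remaining elements, cf. Fig. 3(b)
```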
Thus, the optimally contrasting background/reference state is fundamentally dependent on the input structure of the network and the time evolution of network dynamics.

An optimal reference input u ref exists, maximizing information about u.
We expanded our inquiry to analyze admissible reference inputs u_ref that could give rise to an optimally contrasting state x_ref. More generally, we asked: of all possible stimuli, does there exist a best one, u_ref, resulting in an output x_ref that provides information about all others? In this formulation, x_ref is no longer unconstrained, but rather is determined by

$$x_{\mathrm{ref}} = \Gamma B u_{\mathrm{ref}}. \qquad (18)$$

Using (18), we can find the optimal reference stimulus via a similar sequence of steps as in the previous subsection, defining $w = L^T u_{\mathrm{ref}}$ and

$$S_u = L^{-1}\left(B^T \Gamma^T \Gamma B\right)\left(B^T \Gamma^T \Gamma B\right) L^{-T}, \qquad (20)$$

so that

$$\mathrm{tr}(\mathcal{I}_u) = \frac{w^T S_u w}{w^T w}, \qquad (21)$$

where $L L^T$ (L is lower-triangular) is the Cholesky decomposition of $B^T \Gamma^T \Sigma_x \Gamma B$. This matrix is positive-definite (a requirement for the decomposition): the covariance matrix Σ_x is inherently positive-definite and thus admits a Cholesky factorization $\Sigma_x = L_\Sigma L_\Sigma^T$, so that $B^T \Gamma^T \Sigma_x \Gamma B$ can be written $Q Q^T$ for $Q = B^T \Gamma^T L_\Sigma$ and is thus positive-semidefinite, while the full-rank condition on Q ensures positive-definiteness. We pause for a moment to consider the significance of this 'optimal' u_ref (i.e., the eigenvector of S_u that optimizes (21)). The existence of such an optimum means that for a given network, there is one input whose induced state best contrasts those of all other inputs.
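A sketch of this computation; note that the explicit form of S_u used here is our reconstruction of (20) from the surrounding definitions, and the helper name is ours.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

def optimal_u_ref(Gamma, B, Sigma_x):
    """Maximize tr(I_u) over unit-norm inputs via the Rayleigh quotient in w = L^T u_ref."""
    GB = Gamma @ B
    M = GB.T @ GB                                    # B^T Gamma^T Gamma B (symmetric PSD)
    L = cholesky(GB.T @ Sigma_x @ GB, lower=True)    # B^T Gamma^T Sigma_x Gamma B = L L^T
    X = solve_triangular(L, M @ M, lower=True)       # L^{-1} M^2
    S_u = solve_triangular(L, X.T, lower=True).T     # L^{-1} M^2 L^{-T}
    lam, V = eigh(S_u)                               # information spectrum (ascending)
    u_ref = solve_triangular(L.T, V[:, -1], lower=False)  # reverse transform: L^{-T} w
    return u_ref / np.linalg.norm(u_ref), lam
```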
The optimally contrasting input targets specific nodes in a concentrated manner, but not necessarily nodes of highest degree. We sought to characterize the 'optimally informative' u_ref by examining its entries (recall that we are in the domain of constant inputs) as they relate to the connectivity degree of actuated nodes. Clearly, u_ref has n_d entries (see (11)). Since there is then a one-to-one relationship between the n_d entries of u_ref and the driven nodes, we are able to learn which nodes may be specially 'targeted' by an optimally contrasting u_ref. Intuition would suggest that the targeted nodes would simply be the hubs; that is, the higher the degree of a node, the higher the value of the corresponding entry of u_ref. This is borne out in simulation, but to an extent that varies consistently with network size (n) and n_d. Examining Fig. 5, we see that for larger networks wherein all nodes are controlled, nearly all of the large entries of u_ref are concentrated on nodes in the top 5% by degree ranking (i.e., the hubs). As we control fewer nodes, a majority of the large entries are still directed toward the hubs, but this majority shrinks as n_d decreases. Also, looking at the different network sizes, we see that, in general, larger networks show a more pronounced 'targeting' of the hubs, while in smaller networks the hubs are still targeted but to a lesser extent. It should be pointed out that u_ref is unit-norm, meaning there is an essential trade-off between how much energy can be focused on hubs and how much can be focused elsewhere (as is easily seen in Fig. 5), so that in very hub-oriented scenarios (i.e., large networks with a high fraction of controlled nodes), u_ref is nearly a standard basis vector, while in smaller networks wherein fewer nodes are controlled, u_ref is more homogeneous.
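The hub-targeting analysis of Fig. 5 amounts to sorting the squared entries of u_ref by the degree of the node they drive; a sketch, reusing the names from the snippets above (actuated nodes are the first n_d, per the B in (11)):

```python
import numpy as np

u_ref, _ = optimal_u_ref(Gamma, B, Sigma_x)   # from the sketch above
deg = A.sum(axis=1) - np.diag(A)              # off-diagonal row sums = node degrees
order = np.argsort(deg[:n_d])                 # actuated nodes, sorted by degree
energy = u_ref[order] ** 2                    # squared entries of the unit-norm u_ref
bins = np.array_split(energy, 20)             # 5% degree-percentile bins, as in Fig. 5
mean_energy = [b.mean() for b in bins if b.size > 0]
```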
For comparison's sake, we ran simulations with randomly connected (Erdős–Rényi (ER), with 0.5 edge probability) rather than scale-free networks. These networks were also undirected and rendered stable by the same method (described in Methods). We did use small (<0.1) positive edge weights, rather than unit weights, for these networks to render their analysis more numerically tractable. We see in Fig. 6 that the optimal u_ref also tends to target nodes of higher degree in random ER networks, but to a much lesser extent than for scale-free networks. We hypothesize that this is because the degree distribution for scale-free networks is given by a power law, which means there are many nodes of very low degree and a few of very high degree. ER random networks have a binomial degree distribution, with more nodes of average degree and none of very high degree. Thus, it may be less crucial for the input to target the higher-degree nodes in ER random networks, simply because the higher-degree nodes are not of much higher degree. In the ER random networks, we see a skewing of the values of u_ref toward higher-degree nodes, but one far less pronounced than in the scale-free case.

Figure 5. The optimally contrasting input u_ref targets network 'hubs', but to a degree that varies with n_d. 30 networks were realized for each size (n) and driver node (n_d) combination. For a given network size, the graph shows the mean (μ_i) of the squared entries of the (normalized) optimal u_ref. The entries of u_ref are sorted according to the degree of targeted nodes (the abscissa is a percentile, binned in increments of 5%, so that each b_i represents 5% of the nodes). Note that when n = 100 and n_d = 10 there are twice as many bins as controlled nodes, hence the duplicated values.

Information Spectra (of S_u) are Sensitive to Network Parameterization. We now turn our attention to the problem of comparing different networks according to their information capacity, as quantified by $\mathcal{I}_u$. For this we examine the information capacity by varying u_ref in (21), where the intuitive strategy is to let u_ref range over the eigenvectors of S_u. Thus, a holistic characterization of $\mathrm{tr}(\mathcal{I}_u)$ is provided simply by the eigenvalue spectrum of S_u (recall that (21) takes on the value λ_i, the i-th eigenvalue, when u_ref is the i-th eigenvector), heretofore termed the information spectrum of a network.
We obtained a distribution of information spectra for several network parameterizations, restricting our attention here to steady-state characterizations. Each distribution amounts to an empirical probability distribution of the eigenvalues of S_u over (random) network realizations. We used zero-mean, unit-variance, uncorrelated noise (i.e., Σ_w = I), though similar results were obtained for correlated noise. Figure 7(a) depicts the information spectra for several fractions of actuated (driver) nodes (aggregated over several values of n). A first observation is the presence of a small, secondary mode to the right of the principal mode. This secondary mode reflects the presence of a few particularly salient inputs that most informatively correlate with all others. It is notable that this mode, which represents the largest eigenvalue of S_u, systematically decreases with smaller values of n_d. Certain intuition about these observations can be deduced from the rich body of work on spectra of random matrices. One such spectral characterization 28 shows that the principal eigenvalue of the adjacency matrix (here denoted A) for undirected, binary scale-free networks (such as those used for our simulations, with the exception that the diagonal of our A is adjusted, as described in Methods, to ensure stability) is approximately $n^{1/4}$, where n is the number of network nodes. Further, recent work 29 has shown that this maximum eigenvalue, for weighted scale-free networks with expected degree distributions, varies monotonically with the maximum node degree. Maximum degree, in turn, increases dramatically as n increases, because of the preferential-attachment-based network creation algorithm 30. Thus we would expect the spectrum of S_u, and in particular its principal eigenvalue, to depend on effective network size, which itself depends on n_d (see (11) and (20)). This makes sense intuitively, as well: we would expect higher-dimensional input spaces to admit a richer set of encoded representations. Further, as n_d decreases, the distribution of the main mode becomes broader and more entropic. No additional modes or 'humps' appear as n_d varies, a point we will return to shortly.

Figure 7. (a) Information spectra for different fractions of actuated nodes (n = 100, 200, 300, 400). Spectra consist of a primary mode and a smaller secondary mode. (b) Spectra of the controllability gramian for different fractions of actuated nodes. As noted in previous work, these spectra display an increasing number of modes as n_d decreases. The principal mode is inset. Comparing to the information spectra in (a), we see that the information spectra show marked similarity to the first mode of the control spectra, and both spectra reveal outlying, small modes corresponding to the easiest (control) and most informative (information) directions.
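The empirical spectra of Fig. 7(a) can be assembled by pooling the eigenvalues of S_u over realizations; a sketch, where make_network is a hypothetical wrapper around the construction snippet given earlier (it is not a function named in the paper).

```python
import numpy as np

spectra = []
for seed in range(30):
    # Hypothetical wrapper: returns the stabilized A, actuation matrix B,
    # and steady-state Sigma_x, as in the construction sketch above.
    A, B, Sigma_x = make_network(n=100, n_d=20, seed=seed)
    Gamma = -np.linalg.inv(A)                  # steady state: int_0^inf e^{As} ds
    _, lam = optimal_u_ref(Gamma, B, Sigma_x)  # eigenvalues of S_u
    spectra.extend(lam)
# A histogram of `spectra` is the empirical information spectrum of Fig. 7(a).
```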
Information Spectra are Related to the Controllability Gramian. As noted previously, the information spectrum is fundamentally time-varying (governed by the network dynamics, driven by the input in question). We were particularly interested in the relationship between the information spectrum and that of the controllability gramian matrix

$$W(t) = \int_0^t e^{A\tau}\, B B^T\, e^{A^T \tau}\, d\tau,$$

which also fundamentally characterizes the input-output relationship of a linear (networked) system. Indeed, it is well known that in the limit as t → ∞, the gramian is exactly equivalent to Σ_x (for Σ_w = BB^T), i.e., the denominator of $\mathcal{I}_u$. Thus, we sought to compare the information spectrum to that of W(∞).
The gramian matrix has been a pivotal entity in the analysis of linear systems and similarly modeled networks [31][32][33], including certain types of brain networks 12,34. Recent theoretical work 24 has characterized the nature of the infinite-time gramian spectrum as a function of the number of driven nodes (n_d). It is shown there that for small fractions of driven nodes the spectrum manifests a series of modes or 'humps', over which eigenvalues are randomly distributed (over network realizations). As is well known in linear systems theory, the magnitude of a gramian eigenvalue determines the minimum input energy needed to reach the unit hypersphere in the direction of its associated eigenvector. Thus, the principal mode of the gramian spectrum describes those directions that are 'easiest' to induce. Figure 7(b) depicts the gramian spectrum for the same networks as in Fig. 7(a) (i.e., with varying fraction of actuated nodes). The aforementioned modes are readily evident. What is notable from this figure is the correspondence of the information spectra to the two rightmost modes of the gramian spectrum (that is, the principal mode and the much smaller mode at far right). In interpreting this result, it is important to note that the information and gramian spectra are of different dimensions (S_u is m × m, while W is n × n). This is because the information spectrum captures only constant inputs; thus, for a fixed time, the state is restricted to an m-dimensional subspace. In this sense, we postulate that the principal mode of the gramian spectrum corresponds not simply to the 'easiest'-to-reach directions, but also to those associated with constant (m-dimensional) inputs.
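For stable A, the infinite-horizon gramian solves a Lyapunov equation, so its spectrum is a two-line computation; this sketch uses scipy's solver as the analogue of the MATLAB lyap() call mentioned in Methods, reusing A and B from above.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Infinite-horizon controllability gramian: A W + W A^T + B B^T = 0.
W = solve_continuous_lyapunov(A, -B @ B.T)
gram_spectrum = np.linalg.eigvalsh(W)  # n eigenvalues, versus m = n_d for S_u
```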
Let us now seek to understand this numerical correspondence between control and information, shown in Fig. 7, at a conceptual level. What does it mean that the easiest directions of control (quantified by the largest eigenvalues of W) and network information (quantified by $\mathcal{I}_u$) show such similarity? We hypothesize that this correspondence may be indicative of an underlying link between controllability metrics and information-based analyses generally. Indeed, this is not a novel idea; the mathematical basis for this link has been explored 35,36 in contexts different from, but related to, ours. We can summarize the essence of these discussions, as it relates to our formulation, simply by noting that $\mathcal{I}_u$ depends fundamentally on a derivative of the state (more precisely, of an inner product of two states) with respect to u. Thus, when system dynamics are such that incremental changes made to u result in large changes to the state, informational value is increased. This information is, to some extent, a measure of network sensitivity to its inputs, and sensitivity to inputs is, of course, exactly what controllability analysis quantifies.

Discussion
We developed an analysis to quantify the amount of information about an input u that can be gleaned from the contrast/correlation between its induced state x_u and a reference or background state x_ref. Our analysis shows that there exists an optimally informative x_ref in this context. This theoretical result reinforces intuition about how proper choice of a contrasting background might enable more rapid decoding and subsequent processing of input stimuli. We showed that the angular separation between x_ref and the principal axis of the noise covariance decreased monotonically with an increasing fraction of actuated nodes, and that this separation also decreased over time, but to an extent limited by n_d. This dynamical relationship between the informational optimum and the noise covariance is complementary to results based on static models 9.
We expanded our inquiry to examine the u_ref that would give rise to x_ref. We found that the optimal u_ref tends to target network hubs, but in a way that varies consistently with the number of driven nodes n_d (see Fig. 5). We then derived an information spectrum that characterizes the full encoding capacity (in terms of inner-product readout) of inputs. We showed that this spectrum has a nuanced dependency on network size and fraction of driven nodes, with the presence of a low-dimensional set of inputs to which networks appear particularly well tuned. Further, we reconciled the information encoding of a network with its control-theoretic properties, which characterize how the 'energy' of an input allows the state space to be traversed. Our results suggest that inputs that produce 'easy' state excursions (recall that these inputs are postulated to be constant or near-constant; see the previous section) are also those that are well encoded.
It may reasonably be asked why we have chosen inputs to be constant in the overall paradigm. At a conceptual level, our information analysis is fundamentally predicated on the derivative $\partial \langle x_{\mathrm{ref}}, x(t) \rangle / \partial u$. That is, we seek to quantify the extent to which changes in the projection of the system state x onto the background x_ref reflect incremental changes in u. In the case of a constant u, this is readily interpreted: it quantifies the ability to deduce changes in the input composition. However, interpretability is more problematic for a time-varying u(t). In a sense, because we are dealing with a variational problem in an infinite-dimensional function space, intuition is difficult. This argument can be seen mathematically. Examining (3), we see that taking a derivative with respect to u(t) presents us with the task of taking the derivative of one function of t ($\langle x_{\mathrm{ref}}, x(t) \rangle$) with respect to another function (u(t)). Thus, $\mathcal{I}_u$ would become dependent on u′(t). But we conduct our analysis with respect to the objective of learning about u from a 'readout' of only the projection of x(t) onto x_ref; to assume knowledge of the time derivative of u(t) changes the setup completely. One way around this dilemma would be to project u(t) onto a set of orthogonal basis functions (a Fourier basis, for example). If we denote a vector of basis functions (truncated so as not to be infinite) as h(t), we can approximate (almost) any u(t) by Uh(t), where U is a constant projection, or coefficient, matrix. Then $\mathcal{I}_u$ becomes linear in U and the basic formulation is preserved, with the change that instead of seeking to infer the constant input u via the state projection, we seek to infer the coefficient matrix U. A thorough treatment of this idea will be given in future work.
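To make the basis-function idea concrete, here is a minimal sketch of approximating a time-varying input by U h(t) with a truncated Fourier basis; all names and parameter choices here are illustrative, not from the paper.

```python
import numpy as np

def fourier_basis(t, K, T):
    """h(t): a constant term plus K sine/cosine pairs on [0, T]."""
    h = [np.ones_like(t)]
    for k in range(1, K + 1):
        h += [np.sin(2 * np.pi * k * t / T), np.cos(2 * np.pi * k * t / T)]
    return np.stack(h)  # shape (2K + 1, len(t))

t = np.linspace(0, 10, 1000)
h = fourier_basis(t, K=3, T=10.0)
# U is the constant m x (2K+1) coefficient matrix to be inferred in place of a constant u.
U = np.random.default_rng(0).normal(size=(2, h.shape[0]))
u_approx = U @ h  # an m-dimensional input trajectory u(t) ~ U h(t)
```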
Having highlighted the results from the exploration of $\mathcal{I}_u$, let us take a slightly higher-level look at the information processing that $\mathcal{I}_u$ quantifies. Considering the inner product as the 'readout' (which forms the basis of the information measure $\mathcal{I}_u$) is intuitive, since it measures correlation/contrast between two competing representations of a stimulus. In this sense, it is a highly condensed representation of potentially high-dimensional stimuli. However, it is far from clear whether a network itself could accomplish this readout, and whether this is in fact a reasonable strategy for actual information processing tasks such as input classification. The linearity of the model considered is certainly a limiting factor in this regard.
Nonetheless, we believe our results highlight an interesting direction toward analyzing not simply the structural aspects of networks, but also their dynamics and ultimately their functionality. It is straightforward to envision generalizing our framework to examine other network topologies, dynamical nonlinearities and wider time scales, as well as alternative information metrics. These types of analyses can shed light on the functional advantages of biological networks (e.g., those in the brain) and/or principles for guiding the design of engineered systems.

Network parameterization and simulations. To ensure stability, it is sufficient 24 to ensure that, ∀ i ∈ {1, …, n}, the i-th diagonal element of the binary adjacency matrix A is at least as negative as the sum of the non-diagonal elements in row i. That is, $a_{ii} \leq -\sum_{j \neq i} a_{ij}$. Accordingly, in constructing networks, we first created a scale-free degree distribution and then formed a corresponding random graph, thus prescribing the adjacency matrix A. Next we simply assigned $a_{ii} = -\left( \sum_{j \neq i} a_{ij} + \delta_i \right)$, where each δ_i was picked at random from (0, 1).
Creation of these adjacency matrices and the B matrices, as well as the calculations of the optimally contrasting background state x_ref and reference stimulus u_ref, together with the associated statistical analyses, were performed in Mathematica. The exception was the calculation of controllability gramians, which was done by exporting these matrices to MATLAB and using the lyap() command.