Spontaneous emergence of computation in network cascades

Neuronal network computation and computation by avalanche-supporting networks are of interest to the fields of physics, computer science (computation theory as well as statistical or machine learning) and neuroscience. Here we show that computation of complex Boolean functions arises spontaneously in threshold networks as a function of connectivity and antagonism (inhibition), carried out by logic automata (motifs) in the form of computational cascades. We explain the emergent inverse relationship between the computational complexity of the motifs and their rank-ordering by function probability, and its relationship to symmetry in function space. We also show that the optimal fraction of inhibition observed here supports results in computational neuroscience relating to optimal information processing.


Introduction
The relationship between physical systems and information has been of increasing and compelling interest in the domains of physics [3,13,41], neuroscience [22,32], computer science [7,8,19,27,33,37,44], quantum computing [21,26,41], and other fields such as computation in social networks [10,28,30] or biology [4,31], to the point where some consider information to be a fundamental phenomenon in the universe [16,20,42]. Often, physical systems operating on information do so on, or can be modeled by, networks [40], since information is transmitted and processed through interactions between physical entities.
The principle of Occam's razor and the goal of achieving a deeper understanding of these physical-information interactions encourage us to find the simplest possible processes achieving computation. Thus we may conduct basic research into the necessary and sufficient conditions for systems to perform information processing.
Cascades, particularly on networks, are such a simple and ubiquitous process. Cascades are found in a great number of systems (the brain, social networks, and chemical, physical, and biological systems), occurring as neuronal avalanches, information diffusion, influence spreading, chemical reactions, chain reactions, activity in granular media, forest fires, or metabolic activity, to name a few [6,9,12,15,25,40]. The Linear Threshold Model (LTM) is among the simplest theoretical models to undergo cascades. As a simple threshold network, the LTM is also similar to artificial models of neural networks, without topology restrictions [27].
Since the work of Shannon [34], the bit has been considered the basic unit of information. Therefore, whatever we can learn about the processing of bits can be extended to information processing in non-Boolean systems. The tools of Boolean logic then allow us to begin to develop a formalism linking the LTM and other cascades to information processing in the theory of computing [39]. In systems of computation or statistical learning, patterns of inputs are mapped to patterns of outputs by Boolean functions [27,29].
Another way to express this is that a bit is the simplest possible perturbation of a system. Bits can interact via some medium, these interactions can be represented by edges in a network, and Boolean functions describe the results of the possible interaction patterns.
Since we aim to study this topic from first principles, we are interested in how the combinatorial space of possible networks interacts with the combinatorial space of possible Boolean functions, via cascades and the control parameters. In particular, we would like to understand the phase space of Boolean functions computed by LTM nodes on the input (seed) nodes through the cascade action.
From a mathematical perspective, we can treat the brain or other natural systems having N elements in the worst case as a random network, where there are N(N-1)/2 possible connections, yielding 2^(N(N-1)/2) possible networks. Meanwhile, the space of Boolean functions grows exceptionally quickly: there are 2^(2^k) unique Boolean functions on k inputs. This immediately makes us ask how this space behaves, and how large networks such as the brain can navigate toward particular functions in this vast space. We also observe that over all the functions available on k inputs, the decision tree complexity (the depth of the decision tree computing them) appears exponentially distributed, meaning that the vast majority of available functions are complex as k increases.
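These counts can be made concrete with a short sketch (the function names are ours, not from the text): the number of possible networks is one bit per possible edge, and the number of Boolean functions is one bit per row of a 2^k-row truth table.

```python
# Sizes of the two combinatorial spaces discussed above.
def network_count_log2(N):
    """log2 of the number of simple undirected graphs on N labelled
    nodes: one bit per possible edge, N(N-1)/2 possible edges."""
    return N * (N - 1) // 2

def boolean_function_count(k):
    """Number of unique Boolean functions on k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)

print(network_count_log2(10))     # 45, i.e. 2^45 possible networks
print(boolean_function_count(2))  # 16
print(boolean_function_count(4))  # 65536
```

Even at N = 10 the network space is already astronomically large, while the function space explodes doubly exponentially in k.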
A somewhat surprising initial result in this investigation is that complex functions on inputs emerge spontaneously and seemingly inevitably as threshold networks are connected at random.

Linear Threshold Model (LTM), Boolean logic, and antagonism
The Linear Threshold Model (LTM) [40] is defined as follows: a random (Erdos-Renyi-Gilbert) graph is constructed, having N nodes and probability p of an edge between each pair of nodes. Each node is then assigned a random threshold φ from a uniform distribution, φ ∼ U[0, 1]. Nodes can be unlabelled or labelled, and all are initialized as unlabelled. To run the cascade, a small set of seed nodes is perturbed, i.e. marked as labelled. Then each unlabelled node u is examined randomly and asynchronously, and the fraction L(u)/deg(u) of its graph neighbors that are labelled is determined, where L(u) is the number of u's labelled neighbors and deg(u) is u's degree. If u's fraction reaches its threshold, L(u)/deg(u) ≥ φ, u is marked labelled. This process continues until no more nodes become labelled. Here we note that the LTM may be written in vector form, and bears some similarity to the artificial McCulloch-Pitts neuron [27].
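The update rule above can be sketched in a few lines; this is a minimal illustration under our own naming (dictionary-based graph, not code from the paper):

```python
import random

def ltm_cascade(neighbors, thresholds, seeds):
    """Run one Linear Threshold Model cascade.

    neighbors:  dict node -> set of neighbor nodes
    thresholds: dict node -> threshold phi in [0, 1]
    seeds:      initially labelled (perturbed) nodes
    Returns the final set of labelled nodes.
    """
    labelled = set(seeds)
    changed = True
    while changed:                        # repeat until no node flips
        changed = False
        order = [u for u in neighbors if u not in labelled]
        random.shuffle(order)             # asynchronous, random order
        for u in order:
            deg = len(neighbors[u])
            if deg == 0:
                continue
            frac = sum(1 for v in neighbors[u] if v in labelled) / deg
            if frac >= thresholds[u]:     # L(u)/deg(u) >= phi
                labelled.add(u)
                changed = True
    return labelled
```

For example, on the two-node graph `{'a': {'u'}, 'u': {'a'}}` with both thresholds 0.5, seeding `{'a'}` labels `u` as well, since its labelled fraction is 1.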
It has been shown that the LTM exhibits percolation, where a giant connected component (GCC) of easily influenced, vulnerable nodes u (having φ ≤ 1/deg(u)) suddenly arises at the critical connectivity [40].
We observe that cascades in the LTM compute monotone Boolean functions (the number of true outputs cannot decrease in the number of true inputs) at each node on input perturbation patterns [43]. In our numerical experiments, we create the LTM as above, but choose input seed nodes a and b (for k = 2 inputs) as the only possible loci of initial perturbation. In one trial, we create a network, freezing network edges and thresholds across all possible input patterns [Table 1, cols. a, b]. For each input pattern we reset non-seed nodes to unlabelled, set seeds according to the inputs, and run the cascade. We then identify the function computed by each node (f0, ..., f15) [Table 1, cols. 0-15]. The simplest sub-networks compute, e.g., f0 (no path from the seeds to u) and f1 = a ∧ b (AND); similar sub-networks allow us to obtain nodes computing the monotone functions f3, f5, f7 [Fig. 1]. These sub-networks are therefore logical automata [38,39], and we note that they form functional logic motifs in the network [24]. We find that an LTM network cascade will yield a distribution of Boolean functions on its input nodes, and the possible functions computed by network nodes will partition the set of monotone Boolean functions [Fig. 2] (with the exception of f15). Thus the LTM carries out computational cascades on input perturbation patterns. We then obtain monotonically decreasing functions (negations of the LTM) by taking the logical complement of the original LTM labelling rule, so that a node u is instead activated when its fraction of labelled neighbors is less than its threshold, L(u)/deg(u) < φ. We call such nodes antagonistic, and from them we can construct an antagonistic linear threshold model (ALTM). For 2 inputs, replacing u with an ALTM node ¬u will compute f15, f14, f12, f10, and f8 [Table 1], and the sub-networks are antagonistic versions of those for f0, f1, f3, f5, and f7, respectively [Fig. 1].

Figure 3: Function frequency varies inversely with decision tree complexity. (a) Observed function frequency (large dot) against the motif probabilities (1) and the complexity prediction (4) (rescaled, overlaid, both dashed); thus (4) also well-predicts (1). Frequency therefore varies inversely with decision tree complexity C ('+'). (b) Rank-ordering is more evident for k = 4 inputs, appearing as a decreasing exponential with goodness of fit r^2 = 0.88. Again, N = 10000 and z = 4. Here, the Pearson correlation between p(f) and mean frequency is 0.74. Shaded regions are one standard deviation. Probabilities have been centered and normalized.
A sufficiently large ALTM, by composing monotone decreasing functions (e.g. NAND, NOR), can undergo a cascade that computes any logical function on its nodes, since these form a universal basis [29].
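As a minimal illustration of such a universal basis (our own sketch, not the paper's construction), composing NAND, a monotone decreasing function, recovers NOT, AND, OR, and the non-monotone XOR (f6 in Table 1):

```python
def nand(a, b):
    # NAND is monotone decreasing in each input and is a universal basis
    return not (a and b)

# Compositions of NAND alone recover the other gates
def not_(a):     return nand(a, a)
def and_(a, b):  return not_(nand(a, b))
def or_(a, b):   return nand(not_(a), not_(b))
def xor(a, b):   return nand(nand(a, nand(a, b)), nand(b, nand(a, b)))

# Check the XOR truth table over rows (a, b) = 00, 01, 10, 11
print([int(xor(a, b)) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Since every gate here is built from NAND, any Boolean function is reachable by composition, which is what allows a sufficiently large ALTM cascade to compute arbitrary functions.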

Statistics of attractors in the Boolean function space
We experiment first on the LTM, to investigate the observed frequency of Boolean functions in simulation. With a network having N = 10000 nodes, ensembled over 500 realizations at mean degree z = 4, we observe that the frequency of functions is very skewed [Fig. 3a]. Experiments for k = 4 inputs, again with N = 10000 nodes at mean degree z = 4, ensembled over 500 realizations, also yield an approximately exponential decay of the rank-ordering function [Fig. 3b].
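The per-node function identification used in these experiments can be sketched as follows. This is a schematic under our own naming: `run_cascade` stands in for one frozen-network cascade as described earlier, and the function index follows Table 1 (the truth-table column for rows (a, b) = 00, 01, 10, 11, read as a 4-bit number, most significant bit first).

```python
from itertools import product

def identify_functions(run_cascade, nodes, seed_a, seed_b):
    """Identify the Boolean function f(a, b) computed at each node.

    run_cascade(seeds) must run one cascade on the frozen network from
    the given seed set and return the set of labelled nodes.
    """
    index = {u: 0 for u in nodes}
    for i, (a, b) in enumerate(product((0, 1), repeat=2)):
        seeds = {s for s, bit in ((seed_a, a), (seed_b, b)) if bit}
        labelled = run_cascade(seeds)
        for u in nodes:
            if u in labelled:
                index[u] |= 1 << (3 - i)   # row i sets bit 3 - i
    return index
```

For instance, a node that becomes labelled only when both seeds are active yields index 1 (f1, AND), and one labelled whenever either seed is active yields index 7 (f7, OR).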
We investigate the skewed distribution of these functions by asking: what is the probability of obtaining the simplest network that computes each of these functions? From Fig. 1, we can derive the probability of each monotone function. For example, if there is no path from seed nodes a and b to some node u, we obtain f0, thus p(f0) ∝ (1 − p_path)^2, where p_path is the probability of a path between two randomly chosen nodes.
The function f1 requires paths from both a and b to u, thus p(f1) ∝ p_path^2. However, with percolation in mind, we observe that for large graphs, the probability of paths between n nodes approaches the probability that all n nodes belong to the giant connected component (GCC) [25].
This gives us, again from Fig. 1, the motif probabilities (1) as powers of p_gcc (for example, p(f1) ∝ p_gcc^3, since u and both seeds must belong to the GCC), where p_gcc is the probability for a random node to belong to the GCC.
From [25], we have the recursive relation p_gcc = 1 − e^(−z·p_gcc), (2) where z is the mean degree.
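The recursion p_gcc = 1 − exp(−z·p_gcc), the standard giant-component relation for Erdos-Renyi-Gilbert graphs, has no closed form but converges quickly under fixed-point iteration; a minimal sketch (our own code):

```python
import math

def gcc_fraction(z, tol=1e-12):
    """Solve p_gcc = 1 - exp(-z * p_gcc) by fixed-point iteration.

    For mean degree z, p_gcc is the probability that a random node
    belongs to the giant component: 0 at or below the percolation
    threshold z_c = 1, positive above it.
    """
    p = 1.0                        # start from the fully-connected guess
    while True:
        p_next = 1.0 - math.exp(-z * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next

print(round(gcc_fraction(4), 4))   # ~0.9802 at the z = 4 used above
```

At z = 4, nearly every node is in the GCC, which is why multi-path motifs remain common in the simulations.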
We subsequently observe that the number of required paths from seed nodes to a node u computing the monotone function f is equal to the decision tree complexity C(f), the depth of the shortest decision tree computing f. In order for u to decide the value of a seed node, the seed's perturbation information must be transmitted along a path to u.
Taking a Boolean function's Hamming cube representation, its decision tree complexity C is complementary to the number R of congruent axial reflections along its D axes (details in supplemental information A.1). That is, if a Boolean function's Hamming cube is constant along an axis, it is independent of that axis, giving us C = D − R. (3) In other words, the number of paths a monotone Boolean function requires is exactly the number of axial reflection asymmetries of its Hamming cube.
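This reflection count can be computed directly on truth tables. The sketch below (our own naming) takes C as the number of axes along which the Hamming cube is not reflection-congruent, i.e. the number of inputs the function actually depends on, matching C = D − R for the functions discussed here:

```python
from itertools import product

def complexity(f, k):
    """Complexity estimate C = D - R for a Boolean function.

    f is a callable on k bits.  R counts the axes whose reflection
    leaves the function's Hamming cube unchanged (irrelevant inputs);
    C = k - R counts the axes (paths) the function depends on.
    """
    R = 0
    for axis in range(k):
        invariant = all(
            f(*x) == f(*(x[:axis] + (1 - x[axis],) + x[axis + 1:]))
            for x in product((0, 1), repeat=k)
        )
        R += invariant
    return k - R

print(complexity(lambda a, b: 0, 2))        # f0 (False): C = 0
print(complexity(lambda a, b: a, 2))        # f3 (a):     C = 1
print(complexity(lambda a, b: a and b, 2))  # f1 (AND):   C = 2
print(complexity(lambda a, b: a ^ b, 2))    # f6 (XOR):   C = 2
```

Note that for non-monotone functions such as XOR this axis count still gives C = 2, which is exactly the under-estimate of path requirements discussed later.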
This allows us to relate function frequency to decision tree complexity. Recall that the critical percolation threshold in an arbitrarily large Erdos-Renyi-Gilbert graph occurs at mean degree z_c = 1, a very small connectivity. Thus, since p ∼ z_c/N, we have p_c ≪ 1. The network will therefore be tree-like, since the clustering coefficient C_clus ∝ p [25]. In a tree, the number of nodes is one more than the number of edges, N = |E| + 1. Thus, as p → p_c, p(f) ∝ p_gcc^(C(f)+1). (4) Indeed, (4) is highly correlated with the probabilities derived from logic motifs (1), and observed function frequency is proportional to (4) as well [Fig. 3a], having a Pearson correlation of approximately 1.0 for k = 2, and 0.74 for k = 4. This also shows, due to (4), an inverse rank-ordering relation between frequency and decision tree complexity, appearing as a decreasing exponential in frequency. Given that, as mentioned in the introduction, there is an increasing exponential distribution of decision tree complexity over the truth table of all Boolean functions, this result is especially surprising.
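Under (4), predicted frequencies for the 2-input monotone functions follow directly; a small sketch, assuming p_gcc ≈ 0.98 (the z = 4 solution of the GCC recursion) and the decision tree complexities of the motifs above (f0 is omitted, since it requires no paths at all):

```python
# Equation (4): near criticality a motif is tree-like, so a function
# needing C paths involves C + 1 nodes, all of which must be in the GCC,
# giving p(f) proportional to p_gcc ** (C + 1).
p_gcc = 0.98                               # approx. z = 4 solution of (2)

C = {'f1': 2, 'f3': 1, 'f5': 1, 'f7': 2}   # decision tree complexities
pred = {f: p_gcc ** (c + 1) for f, c in C.items()}

# Simpler functions are predicted (and observed) to be more frequent;
# ties keep insertion order because Python's sort is stable.
ranked = sorted(pred, key=pred.get, reverse=True)
print(ranked)  # ['f3', 'f5', 'f1', 'f7']
```

The single-path functions f3 and f5 outrank the two-path functions f1 and f7, reproducing the inverse frequency-complexity ordering of Fig. 3a.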

Function Distribution with Antagonism
A similar simulation, having N = 10000 nodes and k = 2 inputs, ensembled over 500 realizations over a range of mean degree values z and fractions of antagonistic nodes θ ∈ {0, 1/6, 2/6, ..., 1}, reveals a sudden increase in the number of unique non-zero functions vs. both z and θ [Fig. 4a]. The number of unique functions is maximized over several orders of magnitude near criticality, for z ∈ [2^3, 2^10] and θ = 1/3. Observing that antagonism and inhibition are interchangeable [27] (Supplemental section A.2), this lends support to optimal information processing around 30% inhibition, as found in other research [5], and suggests why this fraction of inhibitory neurons seems prevalent biologically.
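The mixed excitatory/antagonistic update used in this simulation can be sketched as follows (our own naming): antagonistic nodes use the complemented rule, firing when L(u)/deg(u) < φ.

```python
import random

def mixed_cascade(neighbors, thresholds, antagonistic, seeds):
    """One cascade over a mix of LTM and antagonistic (ALTM) nodes.

    antagonistic is the set of inhibition-like nodes: they activate
    when the labelled fraction of their neighbors is *below* their
    threshold, the logical complement of the LTM rule.
    """
    labelled = set(seeds)
    changed = True
    while changed:
        changed = False
        order = [u for u in neighbors if u not in labelled]
        random.shuffle(order)                # asynchronous, random order
        for u in order:
            deg = len(neighbors[u])
            if deg == 0:
                continue
            frac = sum(v in labelled for v in neighbors[u]) / deg
            fires = (frac < thresholds[u]) if u in antagonistic \
                    else (frac >= thresholds[u])
            if fires:
                labelled.add(u)
                changed = True
    return labelled
```

For example, an antagonistic node u attached only to seed a behaves like NOT a: it fires when a is inactive and stays silent when a is active.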
For this mix of LTM and ALTM nodes, we again observe a similar rank-ordering of functions, here at z = 64 and θ = 1/3, and that, as in the LTM, frequency is again proportional to the probability derived from function complexity [Fig. 4b], having a Pearson correlation of 0.91. We note, however, that (3) under-estimates the number of paths required for non-monotone functions. For example, f6 (XOR) requires 4 paths between 5 nodes, all of which must be in the GCC [Fig. 5], so that p(f6) ∝ p_gcc^5. However, this function's decision tree complexity is C = 2, predicting by (4) that p(f6) ∝ p_gcc^3. Therefore a more informative complexity measure is needed for non-monotone functions.

Discussion
As indicated in the title, we see the main result of interest as the spontaneous emergence of complex logic functions in minimally constrained random threshold networks. This implies that many physical, biological, or other systems are able to perform such computation through ubiquitous avalanches or cascades.
We note that this result also begins to give us an explanation of the criticality hypothesis vis-à-vis neuroscience [11,23,35]. That is, at the critical threshold, with the emergence of the giant component, the number of unique functions spontaneously increases. Along with that comes an increase in the number of complex functions. As neuronal networks need to compute integrative complex functions on sensory information, or on information passed between modular areas in the brain, the utility of this complexity is self-evident [32]. We note that in computational neuroscience, there is also discussion of the integration of information and complexity or consciousness [22,36]. These motifs therefore give us a starting point for the relationship between structure and function as well. The present work also connects to machine or statistical learning, where, in classification, Boolean functions are computed on high-dimensional data. Until now, however, despite their ubiquity in nature, neither criticality nor cascades have played a large role in machine learning as a design paradigm or analytical framework [27]. We see this as a large potential opportunity to improve deep learning methods.
The spontaneous emergence of complex computation is an example of a symmetry breaking phase transition, as the giant connected component (spanning cluster) comes into existence at the critical connectivity [1,17].We conjecture that we are witnessing how complexity of functionality results from symmetry breaking in systems [1].This complexity takes on a distribution that reflects a hierarchy in an exponential rank-ordering law.
We also see that, from a larger theoretical perspective, the confluence of cascades (percolation branching processes) and information processing by Boolean logic stands at the intersection of several very large and highly developed areas of research: percolation theory and computational automata theory [6,39].
The specific mechanism of the logical automata realized by logic motifs extends previous work about network motifs and their function, mainly in the genetic domain [24], into many other areas, again due to the ubiquity of cascades in threshold networks.
The observance of logic motifs as automata also allows us to change our perspective on network percolation.In the past, we saw it perhaps only in terms of connected component size distribution.Now, however, we may view these components as a zoo or library of functions, available to the network by connection, much as importing a function occurs in programming languages.We note that the scale invariance at criticality may exist at the Pareto-optimal point between complexity and diversity.That is, there will be a small number of larger components computing complex functions, and a great number of very small, simple components having a large variety of thresholds.

Future work
In developing this work, we inevitably stumbled across an overwhelming number of ideas and directions that we can take.We can only briefly list them.
We have seen above that other complexity measures could be found for non-monotone functions, to better predict their frequency in mixed LTM/ALTM networks. We suspect that Boolean Fourier analysis would be fruitful here. We also expect that, for larger numbers of inputs, these non-monotone functions will dominate the function space, and that the Hamming cube symmetries make it possible to write a partition function for them. Along with this, it should be possible to predict more exact function probabilities, which depend on the occurrence of cascades being blocked, and on nodes inheriting their neighbors' complexity, among other factors.
We would also like to generalize these predictions to k ≫ 2 inputs and much larger networks (N ∼ 10^9 nodes), while understanding mechanisms and heuristics for learning by re-wiring in these large combinatorial spaces. For example, we suspect that modularity develops as a network's capacity to extract complexity from inputs is exhausted. We also suspect that function distribution can be understood in terms of multiple network density percolation thresholds, depending on function path requirements, more evident for larger inputs.
Furthermore, we intend to study the relation between function and network symmetry in the context of symmetry breaking.We conjecture, for example, that there is a conservation law of complexity or information, meaning that what we call computation comes at the expense of lost information, rendering the network a kind of information engine [18], whose output is computation, and that this lies at the heart of information creation.
Of course, it could also be fruitful to understand this work in terms of information processing, using measures such as transfer entropy, of increasing use in computational neuroscience and automata theory [19].Along with this we see an opportunity to formalize the criticality hypothesis in light of our results on computation.In the hypothesis, avalanche criticality (the kind of percolation seen here) and so-called edge of chaos are convolved qualitatively, by saying that information processing is optimized 'near criticality' [2,14].
We would like to research the effects of geographic energy constraints and other network topologies, found in real-world systems, on the function phase space.For example we conjecture that both modularity and layering will result from restricting geographic connection distance, with a result that complex functions appear at nodes on the surface (or interface) of networks, convenient for passing to subsequent networks.
Finally, although we have used the term computation here, it would be useful to carefully study the linear threshold model as a computing machine, especially when re-wiring, investigating its Turing completeness, run-time, and related phenomena.

Conclusion
Here we have shown that the Linear Threshold Model computes a distribution of monotone Boolean logic functions on perturbation inputs at each node in its network, and that, with the introduction of antagonism (inhibition), any function can be computed. Notably, complex functions arise in an apparently exponentially decreasing rank-ordering, due to their requirements for perturbation information from seed nodes, and these requirements correspond to their functional asymmetries. These asymmetries can be used to obtain each function's probability exponent in terms of the probability of belonging to the network's giant connected component. Finally, we observe that the number of unique functions computed by an LTM of mixed excitatory and antagonistic nodes is maximized near 1/3 antagonism, over several orders of magnitude of connectivity, coinciding with other research.
Supplementary Information

The intuition is that if the Hamming cube of a particular function is congruent to an axial reflection, the function is independent of that axis.
Thus, paths and their resulting cascades break symmetry and create complexity in the network, realized in the function order parameters.

Figure 1: Logic motifs compute Boolean functions. The simplest LTM sub-networks are logical automata (logic motifs) and compute the monotone functions for k = 2 inputs at node u on perturbations of a and b. Dashed lines are network paths.

Figure 2: LTM nodes compute Boolean functions in computational cascades. Iterating through all possible perturbations of input seed nodes a and b, each network node must compute some Boolean function on the inputs.

Figure 4: Antagonism fraction (θ) agrees with biology; non-monotone functions are also predicted by path requirements. (a) For networks with N = 10000 nodes and k = 2 inputs, over 500 realizations, varying the mean degree z and the fraction of antagonistic nodes θ ∈ {0, 1/6, 2/6, ..., 1}, we observe that the mean number of unique functions per network is maximized over several orders of magnitude (z ∈ [2^3, 2^10]) by networks having a fraction of antagonistic nodes θ = 1/3 (triangles), coinciding with other findings [5]. (b) At θ = 1/3 and z = 2^6, we again observe a skewed frequency, and a proportional relationship between function frequency and the probability due to complexity (4), having a Pearson correlation of 0.91. Shaded region is one standard deviation. Probabilities have been centered and normalized. (Functions f0 and f15 have been removed, since in the ALTM they can occur outside of the GCC.)

Figure 5: The simplest sub-networks computing the non-monotone functions f2, f4, and f6 [Table 1] in the ALTM at random node u, on seed nodes a, b. Dashed lines represent paths, and dashed nodes are antagonistic. Functions f13, f11, and f9 are the negations of these, respectively, so have very similar networks, negating each node.

Figure 6: Hamming cube representations of LTM-computable monotone Boolean functions of two variables. The line represents a linear separator. Note that for monotone functions, true values must be below or to the right of false values.

Table 1: Truth tables for binary functions. The truth tables of all possible unique binary (k = 2) Boolean functions. Function f0(a, b) = 0 (False) is computed by a simple sub-network where node u has no path to either seed node [Fig. 1]. Similarly, function f1(a, b) = a ∧ b (AND) is computed by u with a sub-network having paths from both seed nodes a and b, and a threshold φ > 1/2.