# Stationary log-normal distribution of weights stems from spontaneous ordering in adaptive node networks

## Introduction

The brain is one of the most complex adaptive networks, where learning occurs by modifying the link weights1. This biological strategy has stimulated the theory and application of machine learning algorithms2,3,4 as well as recent deep learning achievements5,6,7,8. Accumulated experimental evidence indicates that neural network weights follow a wide distribution, well approximated by a log-normal distribution9,10; however, the mechanism underlying its origin and stability is unclear11. Specifically, it is valuable to understand whether such a wide distribution of network weights, characterized by a small fraction of strong links, is the spontaneous outcome of a random stochastic process, or alternatively is directed by meaningful learning activity12,13,14,15,16.

## Results

### The model of adaptive nodes

In order to study the effect of nodal adaptation, we model a node with K terminals (neuronal dendritic trees)13. Each terminal collects its many incoming signals via N/K time-independent link weights, Wm, where N stands for the total number of input units (Fig. 1a). The nodal terminal is modeled as a threshold element based on a leaky integrate-and-fire neuron28

$$\frac{d{V}_{i}}{dt}=-\,\frac{{V}_{i}-{V}_{st}}{T}+{J}_{i}\cdot \sum _{m=\frac{N}{K}(i-1)+1}^{\frac{N}{K}\cdot i}\,{W}_{m}\sum _{n}\,\delta (t-({t}_{m}(n)+{\tau }_{m}))$$
(1)

where Vi(t) is the scaled voltage of the ith terminal, T = 20 ms is the membrane time constant, Vst = 0 stands for the scaled stable (resting) membrane potential (Methods) and Ji stands for the ith terminal weight. Wm and τm stand for the mth link weight and delay, respectively, and the summation over n runs over all input timings arriving at the mth link, tm(n). A spike occurs when the voltage of one of the terminals crosses the threshold, Vi ≥ 1. After a spike is generated the terminal’s voltage is set to Vst, and a refractory period of 2 ms follows, during which no evoked spikes are possible from any of the terminals (Methods). Note that in order to achieve a threshold crossing, typically many inputs have to arrive at a neuron in temporal proximity via one of its terminals29.
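The per-terminal dynamics of eq. (1) can be sketched as a simple Euler update. The following is a minimal illustration, not the authors' code; the function name `step_terminal` and its argument layout are assumptions, and the refractory period and probabilistic response failure of eq. (3) are left to the surrounding simulation loop:

```python
import numpy as np

def step_terminal(V, J, W, spikes_now, dt=0.1, T=20.0, V_st=0.0):
    """One Euler step of eq. (1) for a single terminal (a sketch).

    V          : current terminal voltage (scaled, threshold at 1)
    J          : terminal weight J_i
    W          : array of the terminal's link weights W_m
    spikes_now : boolean mask of the links whose delayed input spike
                 falls in this time bin
    """
    # Leaky relaxation toward the resting potential (T = 20 ms).
    dV = -(V - V_st) / T * dt
    # Instantaneous kicks from the delta-function inputs, scaled by the
    # terminal weight J and the per-link weights W_m.
    dV += J * np.sum(W[spikes_now])
    V = V + dV
    fired = V >= 1.0        # scaled threshold V_th = 1
    if fired:
        V = V_st            # reset after a spike; refractoriness handled elsewhere
    return V, fired
```

With the scaled parameters of the Methods, a terminal receiving no input in a bin simply leaks toward Vst, while a burst of temporally proximate inputs can push it across threshold in a single bin.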

For every pair consisting of a sub-threshold stimulation via terminal i and an evoked spike from a different terminal, an adaptation step occurs for Ji:

$${J}_{i}^{+}={J}_{i}\cdot (1+{\delta }_{i})+{\eta }_{i}$$
(2)

where δi and ηi stand for the relative change and an additive white-noise term, respectively. The relative change, δ, is the same adaptation rule used for link weights and follows the modified Hebbian learning rule12,19,20 (blue line in Fig. 1b). It is a function of the time-lag between a sub-threshold stimulation and an evoked spike, tsub − tspike, originating from a different terminal. Specifically, the relative change decays exponentially to zero for large time-lags and follows their sign (Fig. 1b). The reported qualitative results were also found to be robust to a simplified two-level adaptation rule (dashed blue line in Figs 1b and S1).

Following recent experimental evidence, an additional ingredient is introduced, where a threshold crossing by one of the terminals now generates an evoked spike with probability30

$${{\rm{P}}}_{{\rm{spike}}}={\rm{\Delta }}t\cdot {{\rm{f}}}_{{\rm{c}}}$$
(3)

where Δt is the time-lag from the last threshold crossing, and fc reflects the maximal stationary firing frequency of the neuronal terminal, e.g. 15 Hz. Note that for high stimulation frequencies (>fc) the nodal firing rate saturates at K · fc, and for low stimulation frequencies (<fc) response failures practically vanish (Fig. 1c–e, Methods).
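The response-failure rule of eq. (3) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation; the function name is hypothetical, Δt is taken in seconds, and the probability is capped at 1 so that long inter-crossing intervals always evoke a spike:

```python
import random

def spike_with_failure(dt_since_last_crossing, f_c=15.0, rng=random.random):
    """Eq. (3): a threshold crossing evokes a spike with probability
    P = Δt * f_c (a sketch; Δt in seconds, f_c in Hz, capped at 1)."""
    p = min(1.0, dt_since_last_crossing * f_c)
    return rng() < p
```

This reproduces the saturation described in the text: at stimulation rates above fc, most crossings fail and the terminal's firing rate stays near fc, while at low rates p approaches 1 and failures vanish.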

### Feedforward networks

When the input units are simultaneously stimulated with a common stimulation frequency, three possible types of dynamics for Ji are observed, depending on the initial weights and delays, Wm and τm. This is exemplified for a network with an output node with three terminals and 15 input units, without noise (K = 3, N = 15, ηi = 0) and 5 Hz stimulation frequency (Fig. 1c). In the first type of dynamics, all Ji converge to fixed values (Fig. 1c1). The second type is characterized by fast oscillations with relatively small fluctuations of each Ji around an average value (Fig. 1c2), with periods below a few seconds, typically sub-second. The third type is characterized by slow oscillations with periods which can exceed hundreds of seconds (Fig. 1c3) and exists for K > 2 only18. These are accompanied by large variations in the amplitudes of Ji and consist of long plateaus at extreme values. The fraction of initial time-independent weights and delays, Wm and τm, leading to oscillations was estimated using random sampling (Methods) for K = 3 and varying N (Fig. 1d). It increases from ~0.4 for N = 9 to ~0.8 for N = 27, indicating that oscillations are a common scenario in adaptive node networks. Note that in the traditional adaptive link scenario all Wm converge either to zero or to above-threshold values (similar to Fig. 1c1) and oscillations are excluded18.

The robustness of the fast and slow oscillations to small stochastic noise, η in eq. (2), was examined using Fourier analysis of the adaptive weights (Fig. 2). For fast oscillations, the noise does not affect the periods of oscillations, and only slightly affects their Fourier amplitudes (Fig. 2a,c). In contrast, for slow oscillations the noise, η, affects the periodicity, which is typically shortened (Fig. 2d). This trend results from the noise, which prevents long plateaus at small values of the terminal weights, J (Fig. 2b).

The number of different stationary firing patterns, attractors, in the large N limit, can be bounded from below, for given K and delays τm (Fig. 1a). Assuming that for each terminal there are N0 < N/K non-zero inputs, the number of different attractors, A(N0), is estimated using an exhaustive random sampling for Wm (Methods). A lower bound for the number of dynamical attractors for the entire network with N non-zero inputs, scales as

$$A({N}_{0})\cdot {\binom{N/K}{{N}_{0}}}^{K}$$
(4)

since for each of the K terminals one can select a subset of N0 inputs among N/K, with repeated above-threshold stimulated inputs. Each of these choices results in A(N0) different attractors as a result of the different delays. For K = 3 and N0 = 3, for instance, the number of different attractors was estimated as A(3) ~ 1500 (Fig. 1e and Methods), indicating that eq. (4) scales as N9. For N0 = O(N), even for small K, e.g. K = 2, the number of different attractors is expected to scale exponentially with N. This type of input scenario is expected in biological realizations, where a neuron has only a few terminals (dendritic trees)13 and many thousands of links (synapses)29, yet at each firing event only a small fraction of the input links is effectively involved29. These results indicate powerful computational capabilities under biological realizations, with a huge number of attractors even for such a simple feedforward network with only a finite number of adaptive terminals.
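The counting argument of eq. (4) is elementary enough to verify numerically. The sketch below assumes N divisible by K and plugs in the sampled estimate A(3) ~ 1500 quoted above; the example value N = 15 (i.e. N/K = 5 inputs per terminal) is an illustrative assumption, not a figure from the paper:

```python
from math import comb

def attractor_lower_bound(N, K, N0, A_N0):
    """Lower bound of eq. (4): A(N0) * C(N/K, N0)**K.

    A sketch assuming N is divisible by K; A_N0 is the sampled
    estimate of the number of attractors per terminal subset."""
    return A_N0 * comb(N // K, N0) ** K

# Illustrative case: K = 3, N = 15 (5 inputs per terminal), N0 = 3,
# with the sampled estimate A(3) ~ 1500 from the text.
print(attractor_lower_bound(15, 3, 3, 1500))  # 1500 * C(5,3)**3 = 1_500_000
```

Since C(N/K, N0) grows as (N/K)^N0 for fixed N0, the bound scales as N^(K·N0), i.e. N^9 for K = N0 = 3, matching the text.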

### Restoring force via spontaneous spike ordering

Understanding the mechanism underlying the emergence of a stationary log-normal distribution requires examining a much simpler system imitating the network activity. We examine the dynamics of an adaptive node consisting of two terminals (K = 2) and 60 inputs, where each of the inputs is stimulated at random, at 30 Hz on average (Fig. 4a and Methods). The distribution of the effective weights indeed follows a log-normal distribution (Fig. 4b) and is practically identical to the distribution obtained in the network dynamics (Fig. 3b3).

The emergence of a log-normal distribution is natural, since multiple adaptation steps of a weight, eq. (2), constitute a multiplicative process; however, its stationary shape requires an explanation. The relative change in weights with a given value J, averaged over such instances during the stationary dynamics, revealed a stochastic restoring force towards the most probable J (Figs. 4c and S5). The origin of this restoring force is the emergence of spontaneous temporal ordering of pairs of spikes for a given adaptive node during the dynamical process. For simplicity we assume K = 2 and concentrate on momentary events of the dynamics where the adaptive node simultaneously has one weak, JW, and one strong, JS, terminal weight, relative to the most probable values of the log-normal distribution (Fig. 4b). Next we estimate in simulations the probability of occurrence of the following two types of pairs of spikes in a bounded time window, e.g. 5 ms (Fig. 4d1). The first type, PSW, stands for a spike generated by JS prior to a spike generated by JW, and vice versa for the second type of pairs, PWS. Simulation results indicate

$${P}_{SW} > {P}_{WS}$$
(5)

where typically PSW is several times greater than PWS, and PSW constitutes a few percent of all pairs of events (Fig. 5a). This preference, eq. (5), is exemplified using the following self-consistent argument, assuming that initially the weak and the strong spikes occur almost simultaneously (Fig. 4d2). Since the input units of both terminals are stimulated at the same rate, the threshold crossing of the JS terminal occurs before that of JW; both are accompanied by response failures, eq. (3). Consequently, the spike generated by JS occurs prior to the spike generated by JW (Fig. 4d2). Note that the adaptation steps, eq. (2), prevent the two terminals from remaining strong and weak indefinitely; however, on average there is a stochastic tendency for the strong spike to be evoked prior to the weak one.

## Discussion

The mechanism of the restoring force is a direct consequence of the spontaneous temporal ordering (Fig. 4d). A terminal that evoked a spike resets its membrane potential, which rapidly increases again under many sub-threshold stimulations. The threshold crossing is achieved again within several ms and is followed by many response failures. Hence, the strong terminal generates most of its sub-threshold stimulations prior to the following weak spike, whereas all the sub-threshold stimulations of the weak terminal appear after the strong spike (Fig. 4e). Following the adaptation rule, eq. (2), the strong terminal weight is decreased, ΔJS < 0, whereas the weak terminal weight is enhanced, ΔJW > 0 (Fig. 4e), and the restoring force is created.

A necessary ingredient of this mechanism, required to achieve a stationary log-normal distribution, is that the majority of the sub-threshold stimulations of the strong terminal occur prior to the weak spike (Fig. 4e). For short refractory periods the time-lag between a pair of strong-weak spikes decreases, since the minimal time-lag between consecutive spikes decreases. Indeed, for short enough refractory periods, and certainly for a vanishing one, the log-normal distribution was found in simulations to be unstable, where all effective weights are asymptotically above threshold, since both ΔJS and ΔJW are now positive (Fig. 5). The log-normal distribution of link weights is an emerging spontaneous feature of adaptive node networks, where the essential role of the refractory period is evident. Results open the horizon to explore the possible interplay between the adaptive node rules and stationary distribution classes of the network link weights26,27.

## Methods

### Simulation dynamics

Each node is described by several independent terminals, and a node generates a spike when a terminal crosses a threshold (eqs. (1) and (3)). The voltage of each terminal is determined according to the leaky integrate and fire model as described in eq. (1), where T = 20 ms. For simplicity, we scale the equation such that Vth = 1, Vst = 0; consequently, V ≥ 1 is above threshold and V < 1 is below threshold. Nevertheless, results remain the same for both the scaled and unscaled equations, e.g. Vst = −70 mV and Vth = −54 mV. The initial voltage for each terminal is V(t=0) = 0 and Ji = 1. The adaptation is done according to eq. (2), where $$\delta =A\cdot \exp (-\,\frac{|{\rm{\Delta }}t|}{15})\cdot {\rm{sign}}({\rm{\Delta }}t)$$, and Δt stands for the time between a sub-threshold stimulation and a spike, up to a cutoff at 50 ms. The parameter η is chosen randomly in the range [−0.5, 0.5] · 10−3, and A is the adaptation step, taken as 0.05 unless otherwise stated.
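The adaptation step of eq. (2), with the Methods' exponential kernel and cutoff, can be sketched as follows. This is an illustration, not the authors' code; the function name and keyword defaults mirror the parameters stated above, and the exponential is read as decaying in |Δt| on both sides of zero, consistent with the sign factor:

```python
import math
import random

def adaptation_step(J, dt_ms, A=0.05, tau=15.0, cutoff=50.0, noise=0.5e-3):
    """One update of eq. (2) with the Methods' kernel (a sketch):
    delta = A * exp(-|Δt|/15) * sign(Δt), cut off at |Δt| = 50 ms,
    plus additive noise η uniform in [-0.5e-3, 0.5e-3].

    dt_ms = t_sub - t_spike: positive time-lags potentiate, negative depress.
    """
    if abs(dt_ms) > cutoff:
        delta = 0.0
    else:
        delta = A * math.exp(-abs(dt_ms) / tau) * math.copysign(1.0, dt_ms)
    eta = random.uniform(-noise, noise)
    return J * (1.0 + delta) + eta
```

Setting `noise=0.0` recovers the deterministic rule used in Fig. 1; repeated application of the multiplicative factor (1 + δ) is what makes the accumulated weight dynamics log-normal-like.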

### Refractory period

After a spike is generated, the terminal that evoked a spike cannot respond to other stimulations arriving in the following 2 ms. During this refractory period, all other terminals cannot evoke a spike or cross the threshold as well, but can increase their membrane potential as a result of stimulations.

### Response failure

When crossing the threshold, the terminal creates a spike with probability Δt · fc, where Δt is the time-lag from the last threshold crossing by this terminal, and fc reflects the maximal stationary firing frequency of the terminal. If the terminal fails to respond, its voltage is set to its previous value.

### The parameters for feedforward networks

Number of terminals = 3, number of inputs per terminal = 5, refractory period = 2 ms, link weights are randomly chosen from a uniform distribution in the range [0.1, 1.1], delays (τ) are randomly chosen from a uniform distribution in the range [1, 150] ms (Fig. 1c). Links are ordered with increasing delays, except the maximal delay which is linked to the first terminal (closing a loop). The dynamics is given by eq. (1) and is numerically solved with a time resolution of 1 ms. Initial terminal weights, Ji, are set to 1. We assume large fc, hence response failures are excluded. In addition, in Fig. 1 η = 0. The robustness of the results to noise, η > 0, is demonstrated in Fig. 2. The upper bound for the terminal weights is Ji = 10 and the lower bound is Ji = 10−6.

### The fraction of oscillations

The fraction of each type of dynamics was estimated using 20,000 random initial conditions for the delays, τm, and the weights, Wm, (defined above) for each number of inputs per terminal (Fig. 1d).

### The number of attractors

Number of terminals = 3, number of inputs per terminal = 3. The average and the standard deviation of each point were obtained from 10–18 samples; each sample uses a fixed set of N delays (τ), and the initial conditions for the N weights are randomly sampled. To determine whether two initial conditions lead to the same attractor, we compared the firing rate of each input link. We calculated the number of firing events for each link and compared it with the same link from a simulation with different initial weights. If the difference is less than 2% for all input links, we determine that the two sets of initial weights lead to the same attractor. For links with low firing rates, the comparison was made between non-firing events. We obtained very similar results when the comparison was done between the firing timings for each link, instead of the number of firing events (Fig. 1e).
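The attractor-identity criterion above can be sketched as a simple per-link comparison. This is a minimal illustration, assuming the relative difference is measured against the larger of the two counts; the function name and that normalization choice are assumptions, not taken from the paper:

```python
def same_attractor(counts_a, counts_b, tol=0.02):
    """Methods' criterion (a sketch): two runs are assigned the same
    attractor if, for every input link, the relative difference in the
    number of firing events is below 2%."""
    for a, b in zip(counts_a, counts_b):
        ref = max(a, b)
        if ref == 0:
            continue  # both links silent: they trivially agree
        if abs(a - b) / ref > tol:
            return False
    return True
```

The same skeleton applies to the variant mentioned in the text that compares firing timings (or, for low-rate links, non-firing events) instead of event counts.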

### Recurrent network parameters

Number of terminals = 3, number of inputs per terminal = 60, fc = 15 Hz, refractory period = 2 ms, adaptation step A = 0.05. The dynamics of each node is given by eq. (1) and is solved with a time resolution of 0.1 ms. Link weights are randomly chosen from a uniform distribution in the range [0.1, 0.2], delays are randomly chosen from a normal distribution with a mean of 100 ms and STD of 2 ms, and initial terminal weights, Ji, are set to 1. To initiate the network simulation, 40% of the nodes in the network are stimulated above threshold. Spontaneous noise, in the form of external above-threshold stimulations, is randomly added with an average frequency of 0.01 Hz per node (Fig. 3).
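The random initialization described above can be sketched directly from these parameter distributions. This is an illustrative setup helper, not the authors' code; the function name, array layout (one row per node), and the fixed seed are assumptions for reproducibility:

```python
import numpy as np

def init_recurrent(n_nodes, K=3, inputs_per_terminal=60, seed=0):
    """Random initialization following the Methods (a sketch): link weights
    uniform in [0.1, 0.2], delays normal (mean 100 ms, STD 2 ms), terminal
    weights J_i = 1, and 40% of nodes stimulated above threshold at t = 0."""
    rng = np.random.default_rng(seed)
    n_links = K * inputs_per_terminal
    W = rng.uniform(0.1, 0.2, size=(n_nodes, n_links))      # link weights W_m
    tau = rng.normal(100.0, 2.0, size=(n_nodes, n_links))   # delays in ms
    J = np.ones((n_nodes, K))                               # terminal weights J_i
    stimulated = rng.random(n_nodes) < 0.4                  # initial kick to 40% of nodes
    return W, tau, J, stimulated
```

The narrow delay distribution (STD of 2 ms around 100 ms) keeps the network activity roughly synchronized at initiation, while the uniform link weights start all inputs well below threshold.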

The ratio max/min of each weight (Fig. 3c) was calculated for the last 2 seconds of the simulation, out of 50 seconds for adaptive nodes and 350 seconds for adaptive links (same running time as for Fig. 3b). For networks of adaptive links (Fig. 3c1) a fraction of the weights vanishes, hence the upper bound of the histogram is set to 60. The histograms (Fig. 3c2 and c3) consist of 100 bins each. For visibility, points in the raster plots (Fig. 3d1 and d2) were 50% diluted.

### The parameters for the feedforward network with random inputs

Number of terminals = 2, number of inputs per terminal = 60, fc = 15 Hz, refractory period = 2 ms, adaptation step A = 0.1, link weights randomly chosen from a uniform distribution in the range [0.1, 0.2], initial terminal weights set to 1 (Fig. 4). The dynamics is given by eq. (1) and is solved with a time resolution of 0.1 ms. Running time = 2500 seconds, where a transient of 200 seconds is excluded from the measurements. Strong and weak weights (Fig. 4b–e) were chosen such that 50% of the weights lay between the maximum of the weak and the minimum of the strong effective weights; in addition, for each limit (maximum and minimum) 1% of the extreme weights were excluded. The force was calculated with a bin size of 0.05 and defined as $$\frac{({J}^{+}-J)}{\langle J\rangle }$$, where 〈J〉 stands for the average bin value. The error bar (Fig. 4c) stands for the standard deviation of the adaptation steps belonging to each bin.

## References

1. Hebb, D. O. The Organization of Behavior: A Neuropsychological Theory (Wiley & Sons, New York, 1949).
2. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521, 452–459 (2015).
3. Watkin, T. L., Rau, A. & Biehl, M. The statistical mechanics of learning a rule. Reviews of Modern Physics 65, 499 (1993).
4. Engel, A. & Van den Broeck, C. Statistical Mechanics of Learning (Cambridge University Press, 2001).
5. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
6. Buchanan, M. Depths of learning. Nature Physics 11, 798 (2015).
7. Zdeborová, L. Machine learning: New tool in the box. Nature Physics 13, 420–421 (2017).
8. Li, B. & Saad, D. Exploring the function space of deep-learning machines. Physical Review Letters 120, 248301 (2018).
9. Song, S., Sjöström, P. J., Reigl, M., Nelson, S. & Chklovskii, D. B. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology 3, e68 (2005).
10. Loewenstein, Y., Kuras, A. & Rumpel, S. Multiplicative dynamics underlie the emergence of the log-normal distribution of spine sizes in the neocortex in vivo. Journal of Neuroscience 31, 9481–9488 (2011).
11. Buzsáki, G. & Mizuseki, K. The log-dynamic brain: how skewed distributions affect network operations. Nature Reviews Neuroscience 15, 264 (2014).
12. Park, Y., Choi, W. & Paik, S.-B. Symmetry of learning rate in synaptic plasticity modulates formation of flexible and stable memories. Scientific Reports 7, 5671 (2017).
13. Spruston, N. Pyramidal neurons: dendritic structure and synaptic integration. Nature Reviews Neuroscience 9, 206 (2008).
14. Del Ferraro, G. et al. Finding influential nodes for integration in brain networks using optimal percolation theory. Nature Communications 9, 2274 (2018).
15. Bashan, A., Bartsch, R. P., Kantelhardt, J. W., Havlin, S. & Ivanov, P. C. Network physiology reveals relations between network topology and physiological function. Nature Communications 3, 702 (2012).
16. Liu, K. K., Bartsch, R. P., Lin, A., Mantegna, R. N. & Ivanov, P. C. Plasticity of brain wave network interactions and evolution across physiologic states. Frontiers in Neural Circuits 9, 62 (2015).
17. Sardi, S., Vardi, R., Sheinin, A., Goldental, A. & Kanter, I. New types of experiments reveal that a neuron functions as multiple independent threshold units. Scientific Reports 7, 18036 (2017).
18.
19. Dan, Y. & Poo, M.-M. Spike timing-dependent plasticity: from synapse to perception. Physiological Reviews 86, 1033–1048 (2006).
20. Cassenaer, S. & Laurent, G. Conditional modulation of spike-timing-dependent plasticity for olfactory learning. Nature 482, 47 (2012).
21. Cossell, L. et al. Functional organization of excitatory synaptic strength in primary visual cortex. Nature 518, 399 (2015).
22. Ottino-Loffler, B., Scott, J. G. & Strogatz, S. H. Evolutionary dynamics of incubation periods. eLife 6 (2017).
23. Levi, F. Applied mathematics: The discovery of skewness. Nature Physics 14, 108 (2018).
24. Opper, M. Learning in neural networks: Solvable dynamics. EPL (Europhysics Letters) 8, 389 (1989).
25. Li, A., Cornelius, S. P., Liu, Y.-Y., Wang, L. & Barabási, A.-L. The fundamental advantages of temporal networks. Science 358, 1042–1046 (2017).
26. Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature 550, 519 (2017).
27. Unicomb, S., Iñiguez, G. & Karsai, M. Threshold driven contagion on weighted networks. Scientific Reports 8, 3094 (2018).
28. Brette, R. & Gerstner, W. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology 94, 3637–3642 (2005).
29. Abeles, M. Corticonics: Neural Circuits of the Cerebral Cortex (Cambridge University Press, 1991).
30. Vardi, R. et al. Neuronal response impedance mechanism implementing cooperative networks with low firing rates and μs precision. Frontiers in Neural Circuits 9 (2015).
31. Brama, H., Guberman, S., Abeles, M., Stern, E. & Kanter, I. Synchronization among neuronal pools without common inputs: in vivo study. Brain Structure and Function 220, 3721–3731 (2015).

## Acknowledgements

We thank Moshe Abeles for stimulating discussions. The assistance by Yael Tugendhaft is acknowledged.

## Author information


### Contributions

H.U. and S.S. performed the simulations and analyzed the data with the help of A.G. and developed the theoretical concepts under the guidance of I.K. H.U., S.S., A.G. and R.V. discussed the idea, results and commented on the manuscript. I.K. supervised all aspects of the work.

### Corresponding author

Correspondence to Ido Kanter.

## Ethics declarations

### Competing Interests

The authors declare no competing interests.



Uzan, H., Sardi, S., Goldental, A. et al. Stationary log-normal distribution of weights stems from spontaneous ordering in adaptive node networks. Sci Rep 8, 13091 (2018). https://doi.org/10.1038/s41598-018-31523-1