Self-organisation of small-world networks by adaptive rewiring in response to graph diffusion

Complex networks emerging in natural and human-made systems tend to assume small-world structure. Is there a common mechanism underlying their self-organisation? Our computational simulations show that network diffusion (traffic flow or information transfer) steers network evolution towards emergence of complex network structures. The emergence is effectuated through adaptive rewiring: progressive adaptation of structure to use, creating short-cuts where network diffusion is intensive while annihilating underused connections. With adaptive rewiring as the engine of universal small-worldness, overall diffusion rate tunes the systems’ adaptation, biasing local or global connectivity patterns. Whereas the former leads to modularity, the latter provides a preferential attachment regime. As the latter sets in, the resulting small-world structures undergo a critical shift from modular (decentralised) to centralised ones. At the transition point, network structure is hierarchical, balancing modularity and centrality - a characteristic feature found in, for instance, the human brain.

consider different proportions of adaptive and random rewirings. In contrast with the random rewirings in the Watts-Strogatz algorithm, here, they have the function of perturbing possible equilibrium network states, akin to the Boltzmann machine 16 . In this sense, the perturbed system can be regarded as an open system according to the criteria of thermodynamics.
In adaptive networks, changes to the structure generally occur at a slower rate than the network dynamics. Here, the proportion of these two rates is expressed by what we call the diffusion rate (the elapsed forward time in the network diffusion process before changes in the network structure). Low diffusion rates bias adaptive rewiring to local connectivity structures; high diffusion rates to global structures. In the latter case adaptive rewiring approaches a process of preferential attachment 17,18 .
We will show that with progressive adaptive rewiring, SWNs always emerge from initially random networks for all nonzero diffusion rates and for almost any proportion of adaptive rewirings. Depending on diffusion rate, modular or centralised SWN structures emerge. Moreover, at the critical point of phase transition, there exists a network structure in which the two opposing properties of modularity and centrality are balanced. This characteristic is observed, for instance, in the human brain [19][20][21] . We call such a structure hierarchical. In sum, adaptation to network diffusion represents a universal mechanism for the self-organisation of a family of SWNs, including modular, centralised, and hierarchical ones.

Results
For each pair (τ, p), where τ is a heat kernel parameter, and p is the rewiring probability, 100 independently evolved networks are obtained by our adaptive rewiring algorithm (see Methods). Resulting networks are described according to measures of small-world structure, modularity, and centrality. Where appropriate, the average of such measures is taken.
Small-World Structure. The small-worldness index S provides a canonical measure of the degree to which a network is small-world. Here, we take a slightly modified version, in which the normalised clustering coefficient (C) is multiplied by the normalised global efficiency (E) 8 , such that S = (C/C r ) × (E/E r ), where C r and E r are measures of C and E for an equivalent Erdös-Rényi (ER) random network 22 . In doing so, S is also defined on disconnected networks.
For random networks, S ≈ 1 and so the greater the (positive) deviation of S from one, the greater the degree of small-worldness. For comparison, we include the average small-worldness values for networks constructed by the Watts-Strogatz algorithm (100 independently constructed networks for each p = 0, 1/500, 2/500, …, 1).
In Fig. 1a we observe the average small-worldness index S as a function of random rewiring probability p. Unless stated otherwise, we consider networks arising from adaptive rewiring for random rewiring probability p ∈ {0, 1/30, …, 29/30, 1}. A striking result is that SWN emergence is observed for all sample values of τ nonzero, no matter how small or large. Moreover, for all nonzero τ that have been explored a greater maximum small-worldness is achieved than with the Watts-Strogatz algorithm.
The degree of network adaptation to network diffusion, 1 − p, for which maximum small-worldness is obtained depends on the rate of diffusion τ. The values of τ are taken from the set {0, 10 −15 , 1, 8, 10 15 }. For the sake of convenience, we denote 10 −15 = ε, 10 15 = δ. For τ = 0, adaptive rewiring is a random process; emergent networks thus reflect those of the initial ER ones. For τ ∈ {8, δ}, the heat kernel reflects the degree distribution, and thus adaptive rewiring approaches a process of preferential attachment (see Methods). For maximum small-worldness, adaptive rewiring in response to local diffusion, for τ ∈ {ε, 1}, requires small p, i.e. small degree of random rewiring, while more global diffusion -preferential attachment -, for τ ∈ {8, δ}, requires larger p.
Interestingly, in Supplementary Fig. S1, for τ ∈ {8, δ} and p large, the network achieves an even greater efficiency than an equivalent random ER network, i.e. it is more well integrated. Modular Structure. The modularity index Q is an optimised statistic of network partitioning into non-overlapping communities. The value Q is calculated as the proportion of intra-modular connections minus the expected proportion of inter-modular connections for an equivalent ER random network under the same community structure 23,24 .
In Fig. 1b we observe the average modularity index Q as a function of random rewiring probability p. We observe that modularity can be controlled by choice of pair (τ, p). This is discussed in further detail in the section Critical Network Structure. For τ ∈ {ε, 1} networks emerge with near-maximal degrees of modularity as p → 0. On the other hand, for τ ∈ {8, δ} and over all sampled p ∈ [0, 1] emergent networks posses no community structure, i.e. modularity is essentially switched off. In fact, we see a lesser degree of modularity than in an equivalent random ER network.
In Fig. 1c,d we present the adjacency matrices, permuted to visualise the modules, from randomly sampled networks resulting from single independent trials of the algorithm with pairs (τ, p), where p = p(τ) is chosen dependent on τ so that S is at maximum. In both Fig. 1c,d where (τ, p) = (ε, 0.1) and (τ, p) = (1, 0.3), respectively, emergent modules are relatively uniform with a dense intra-connectivity and sparse inter-connectivity.
Centralised Structure. Properties of centrality are characterised using the measures of PageRank, the degree, assortativity, and maximised coreness statistic. The latter three network measures are evaluated for pairs (τ, p) where p is chosen dependent on τ such that S is at maximum.
The PageRank centrality vector, a variant of eigenvector centrality, is defined as the stationary distribution achieved by instantiating a Markov chain on a network 25,26 , i.e., the probability distribution that a random walker is located at a given vertex. PageRank centrality takes into account global communication patterns, mediated by longer path lengths and patterns of convergence and divergence, whereas some of the more common centrality measures, such as closeness and betweenness centrality, do not 27 . We denote as π the (normalised) maximum element of the PageRank vector. We take equal initial PageRank probability, and take the damping factor (the probability of transitioning to an adjacent vertex) as 0.85 28 , i.e., the probability of random vertex hopping is 0.15. Since the PageRank vector sums to one, then its mean value is 1/n. For convenience we normalise π by this mean value.
In Fig. 2a we observe the average PageRank value π as a function of random rewiring probability p. As with modularity, we see that centrality can be controlled depending on the choice of τ. For τ ∈ {8, δ} emergent network structures exhibit values of π considerably (positively) far from that of the ER networks, indicating large deviations of the maximum component from the mean of the PageRank vector. Therefore, there exists at least one vertex having a greatly increased likelihood of being traversed in a random Markov chain than all others. On the other hand, for τ ∈ {ε, 1} and over all sampled p ∈ [0, 1], emergent networks posses no such central vertices, i.e. centrality is tuned off. In fact, we see a lesser degree of centrality than in an equivalent random ER network.
In accordance with Fig. 1b, depending on the values of the pair (τ, p), emergent networks are either modular or centralised. The phase transition of network structure is discussed in the section Critical Network Structure.
In Fig. 2b we present a bar-plot of networks' degree distribution. The height of individual bars is the average number of vertices, over 100 independently evolved networks, having degree d v where d v = 1, …, 70. The degree distribution for τ ∈ {ε, 1} fits a truncated normal function, while for τ ∈ {8, δ} it fits a truncated log-normal function. Moreover, for τ ∈ {8, δ} vertices emerge having remarkably high degrees (=70).
In Fig. 2c,d we present the adjacency matrices, permuted to visualise the modules, from randomly sampled networks resulting from single independent trials of the algorithm with pairs (τ, p), where p = p(τ) is chosen dependent on τ so that the values of S are at maximum. In both Fig. 2c where (τ, p) = (8, 0.5667), and Fig. 2d where (τ, p) = (δ, 0.5667), we observe a small subset of hub vertices connecting to many peripheral vertices. . (c,d) Single trial. Example modular SWN. Adjacency matrices mapped to an n-by-n grid where rows (and columns) represent vertices and white indicates the existence of an edge. Rows and columns of adjacency matrices have been permuted to visualise the modules, in accordance with 28 Scientific RepoRts | 7: 13158 | DOI:10.1038/s41598-017-12589-9 The assortativity coefficient a describes the "assortative mixing" of vertex degrees, i.e. the preference for high-degree vertices to attach to other high-degree vertices 29 . In Table 1 row a, for τ ∈ {8, δ} and p = p(τ) such that S is at maximum, a strong negative correlation indicates that vertices of a high degree typically connect to vertices of a low degree. On the other hand, for τ ∈ {ε, 1} an approximately zero correlation indicates no preference of connections between vertices of varying degrees.
The maximised coreness statistic c measures the extent to which a network may be well-partitioned into two non-overlapping groups of vertices, a core and a periphery group 23,30 . In Table 1 row c, for τ ∈ {8, δ} and p = p(τ) such that S is at maximum, values close to one indicate that the network may be well-partitioned into non-overlapping groups of core and peripheral vertices. On the other hand, for τ ∈ {ε, 1} values close to zero indicate no such core-periphery partition.  In sum, we note that for τ ∈ {8, δ}, and p = p(τ) such that S is at maximum, networks emerge as centralised, with a strong core, and that those core vertices connect to a high number of peripheral vertices. On the other hand, for τ ∈ {ε, 1}, and all sampled p ∈ [0, 1], networks exhibit none of these properties.
Critical Network Structure. We consider the transition between modularity and centrality, and show that at the phase transition of network structure, the two seemingly opposing properties are reconciled. Properties of modularity are characterised by Q while properties of centrality are characterised by π.
In Fig. 3a,b, for parameters τ ∈ {4.50, 4.55, …, 5.45, 5.50}, and p ∈ {0, 1/30, …, 29/30, 1}, we present Q (Fig. 3a), and π (Fig. 3b), averaged over 100 independently evolved networks. In the domain (τ, p) there is a broad region of high modularity where both τ and p are low, and a broad region of high centrality in the remainder. Where the domain of modularity ends, the domain of centrality begins; the system exhibits a critical transition from modular (decentralised) to centralised structure as a function of the pair (τ, p). The phase transition region between the two is relatively sharp with respect to both τ and p.
In Fig. 3c, we fix τ = 5 and take Q and π as functions of p ∈ {0.400, 0.402, …, 0.598, 0.600}, averaged over 100 independently evolved networks. It is clear that modularity and centrality are opposing, however, at the boundary of modularity and centrality, where they intersect at around p = 0.52, there is a small domain of p for which networks are a blend of both modular and central structure: each of Q and π are considerably large, indicating the presence of both network structures. Furthermore, the value of small-worldness for pair (τ, p) = (5, 0.522) is S = 5.32, indicating the network is also strongly small-world.
In Fig. 3d we present the adjacency matrix, permuted to visualise the modules, resulting from one trial of the algorithm with (τ, p) = (5, 0.522). We observe a competition between modular and centralised structure; . Along the vertical axis are Q the modularity index (left, blue), and π the maximum element of PageRank vector normalised by its mean value (right, red). (d) Single trial. Example critical SWN. Adjacency matrix mapped to an n-by-n grid where rows (and columns) represent vertices and white indicates the existence of an edge. Rows and columns of adjacency matrices have been permuted to visualise the modules, in accordance with 28 . Pair τ = . p ( , ) (5, 0 522). the simultaneous existence of densely connected communities (decentralised) and a core of high degree vertices connecting to many low degree peripheral vertices (centralised). In Supplementary Fig. S2a-d we present four additional randomly sampled networks resulting from single independent trials of the algorithm with (τ, p) = (5, 0.522). These additional figures support the notion that centrality and modularity are opposing, that at the point of phase transition they are reconciled, and that this is critical, i.e. they are competitive. The adjacency matrices exhibit some degree of both centrality and modularity: emergent networks may appear as more centralised (Fig. S2a-b), or more modular (Fig. S2c), or a blend of the two (Fig. S2d).

Conclusion and Discussion
Small-world structure offers optimal efficiency in network communication 8 ; and has been shown to facilitate synchronisation in a range of oscillator networks [31][32][33] . Here, we studied whether a simple generic mechanism could be responsible for their emergence.
We proposed a mechanism of network self-organisation that relies on ongoing network diffusion; over time, the network is rewired adaptively, rendering it conform to the patterns of network diffusion. With some probability p, the adaptive rewiring process is perturbed by random rewiring. Small-world structure emerged for almost any proportion of random rewiring, moreover, networks reached higher degrees of SWN structure than those in the Watts and Strogatz algorithm 9 .
Patterns of network diffusion may be biased by local or global connectivity structures using the diffusion rate τ. For all (positive) nonzero diffusion rates SWN structure emerges; for small τ, SWNs are modular (decentralised), whereas for large τ, SWNs are centralised. For the latter, adaptive rewiring approaches a process of preferential attachment.
Modularity versus centrality constitutes an important dimension in the characterisation of networks in the human brain, where they play a role in terms of (structural and functional) segregation and integration 34,35 .
For intermediate values of τ and p there is a critical transition point at which network structures emerge that blend modularity and centrality. We may call these "hierarchical" 36 . Such networks are desirable for natural information processing systems like the human brain, in which a core of centralised components represents a global workspace and the decentralised modules represent autonomous client systems [19][20][21] . The criticality of these architectures renders them all but robust. At the level of the neuro-anatomy of the brain, it would probably involve dynamic maintenance to keep these architectures at the critical point. As a property of functional architecture, the criticality would render cognition extremely flexible, enabling rapid switching between centralised and modular processes 37 .

Methods
Here we will provide a formal definition of network diffusion, an algorithm for adaptive rewiring, and a description of a set of computational simulations to demonstrate the role of adaptive rewiring in the generation of small-world networks. The MATLAB code of the algorithm is included in the Supplementary materials.

Notation.
In what follows we consider graphs that are undirected. A graph is an ordered pair G = (V, E) where V is the set of vertices and E is a subset of V × V called the edges. If X is a finite set, then |X| denotes its cardinality. The total number of vertices and edges in the graph are |V| = n and |E| = m, respectively. Two vertices u, v ∈ V are called adjacent if (u, v) ∈ E. For an n 1 × n 2 matrix B, B ij corresponds to the entry in the i-th row and j-th column, where i j , 0 ∈  , and ≤ i n 1 , ≤ j n 2 . The adjacency matrix of a simple graph G is a square × n n matrix A with entries = ∈ a u v E 1 if ( , ) uv , and = a 0 uv otherwise. For undirected graphs A is symmetric. It is typically the case that = a 0 uu , i.e. no self-loops. The degree d v of a vertex v is the number of vertices adjacent to vertex v: . The matrix D is the diagonal matrix of degrees having entries = D d uv u if u v = and 0 otherwise. For a given set of n vertices V the complete graph is denoted as K n and its edge set is denoted as E K n . The compliment of an edge set E, denoted as Network Diffusion. The Laplacian matrix of the graph G is = − L D A. The normalised Laplacian matrix, , is regarded as more appropriate for dealing with irregular graphs, All eigenvalues of  are real (since  is symmetric real) and confined to the interval [0, 2], in accordance with Gershgorin circle theorem 38 , and relate well to other graph invariants, such as random walks (or Markov chains), in a way that the eigenvalues of the Laplacian matrix L and adjacency matrices often fail to do 39 . Let i λ denote the eigenvalues of  with eigenvectors v i , and ω i the eigenvalues of the corresponding Markov process M with eigenvectors u i . Then, 1 Whereas  incorporates information of the local connectivity of vertices, the introduction of a graph kernel provides a global connectivity metric. Physical processes such as diffusion suggest a natural way of constructing a kernel from such local information 15 .
Network diffusion is formally represented by the Exponential Heat Kernel of the graph (cf. Theorem 10.11 in 39 ).
Scientific RepoRts | 7: 13158 | DOI:10.1038/s41598-017-12589-9 Definition 1(Exponential heat kernel) Let  be the normalised Laplacian matrix for an n n × real symmetric matrix and ≥ t 0. The exponential heat kernel of , denoted by h t ( ), is the symmetric and positive definite n n × matrix, In particular h I (0) = , the identity matrix. The matrix exponential is a weighted sum of walks 40 in Equation (2) allow for biasing of path lengths in the construction of h t ( ), where for small t shorter paths carry greater weight and longer paths carry lesser weight, and vice versa. In our simulations we use the parameter τ = t.
The matrix h t ( ) as t → ∞ can be expressed by the leading eigenvector associated with the zero eigenvalue of . Since  is real and symmetric, there exists an orthonormal matrix Q such that = Λ −  Q Q 1 where Q is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues. It is easily shown that substitution of this eigendecomposition into the Taylor expansion yields , the nearest integer, where ρ = log n n ( ) (natural logarithm), i.e. twice the critical connection density for which a random Erdös-Rényi (ER) graph is connected with probability one 41,42 .
We consider self-organisation starting from a random network. The network is progressively rewired, with probability p at random and with probability p 1 − according to the current network diffusion. The process can be described in algorithmic form: Step 0: Generate an undirected random graph G of the Erdös-Rényi type. Begin with the graph = G V E ( , ) such that V n | | = and | | = E 0. Select uniformly at random the pair u v , from the set where h t ( ) uv is the u v , entry in matrix h t ( ). In case of ties u u , 1 2 are chosen arbitrarily.
Step 3: Repeat from Step 1 until k edge rewirings have been made. All simulations were performed using MATLAB R2014. In Step 3 we take = k m 4 ; simulations without upper limit on k show sufficient convergence after only m rewirings. We simulate 100 independent trials for each pair p ( , ) τ . In analysing the networks generated by the algorithm all measures used are provided by the Brain Connectivity Toolbox 28 . Note that for τ 1  the heat kernel approaches the matrix D D 1 1 1/2 T 1/2 T and so rewiring biases toward high degree vertices, hence, adaptive rewiring approaches a process of preferential attachment 17,18 .