Uncovering the hidden structure of small-world networks

The small-world (SW) network model introduced by Watts and Strogatz has significantly influenced the study of complex systems, spurring the development of network science as an interdisciplinary field. The Newman-Watts model is widely applied to analyze SW networks by adding several randomly placed shortcuts to a regular lattice. We meticulously examine related previous works and conclude that the scaling of various pertinent quantities lacks convincing evidence. We demonstrate that the SW property primarily stems from the existence of clusters of nodes linked by shortcuts rather than just the mean number of shortcuts. Introducing the mean degree of clusters linked by shortcuts as a new key parameter resolves the scaling ambiguity, yielding a more precise characterization of the network. Our findings provide a new framework for analyzing SW networks, highlighting the significance of considering emergent structures in complex systems. We also develop a phase diagram of the crossover transition from the small to the large world, offering profound insights into the nature of complex networks and highlighting the power of emergence in shaping their behavior.


Mean distance
The mean distance in a circular network is

$$\langle \ell \rangle = \sum_{\ell=1}^{n/2} \ell \cdot n(\ell) \approx \int_{1}^{n/2} \ell \cdot n(\ell)\, d\ell ,$$

where $n(\ell)$ is the number of nodes at a distance $\ell$ from a given node. We first apply the renormalization group (RG) method and then divide the network into two sub-networks. The first sub-network is composed of nodes that are not affected by shortcuts, which we refer to as "regular nodes". The second is made up of nodes that are influenced by shortcuts, which we term "random nodes". With this approach, we compute the mean distance of the network and obtain Eq. (1), derived in Methods, where $x = nk\phi$ is the mean number of shortcuts and $y = 2k^2\phi$ represents the mean degree of clusters linked by shortcuts (see Methods).
As $x$ approaches 0, the network becomes more regular and $\lim_{x \to 0} h(x) = 0$. In this limit, the mean distance of the network approaches $n/(4k)$, which validates our calculation: on a regular circular network with periodic boundary conditions, $\langle \ell \rangle = n/(4k)$. Figure 1a and b support our methodology, showing the agreement between simulation results and (1). Notably, (1) is more accurate than Eq. (2) of Newman et al. 10. Yet, we note discrepancies with simulations in the same region as that highlighted by Newman et al.
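The regular-lattice limit $\langle \ell \rangle = n/(4k)$ can be checked numerically. The following is a minimal sketch, assuming the usual convention that each node is linked to its $k$ nearest neighbours on either side of the ring; the helper names `ring_lattice` and `mean_distance_from` are illustrative and not taken from the original work.

```python
from collections import deque


def ring_lattice(n, k):
    """Adjacency lists of a ring of n nodes, each linked to its k nearest
    neighbours on either side (degree 2k), with periodic boundaries."""
    return [
        [(v + s) % n for s in range(-k, k + 1) if s != 0]
        for v in range(n)
    ]


def mean_distance_from(adj, source=0):
    """Average breadth-first-search distance from `source` to all other nodes."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist) / (len(adj) - 1)


n, k = 1000, 5
measured = mean_distance_from(ring_lattice(n, k))
predicted = n / (4 * k)
print(f"measured = {measured:.2f}, n/(4k) = {predicted:.2f}")
```

By symmetry, a single source is enough on the regular ring; the small residual difference from $n/(4k)$ is a finite-size effect.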
For large-world networks, where $\langle \ell \rangle \propto n$, the NW scaling function appears to be universal for small values of $x$. However, Newman et al. have shown that their mean-field solution, Eq. (2), breaks down as the density of shortcuts increases 10. Figures 1b and 2a present simulation results that challenge the notion of universality at large $x$, particularly in the region traditionally classified as SW. These figures show systematic deviations for varying values of $k$, which undermines the assumption of universality in this regime. This observation is critical, as it suggests that the mean distance within SW networks is influenced by factors beyond the mean number of shortcuts $x$. Our analysis points to the need for an alternative parameter to characterize SW networks more accurately. After applying the RG transformation, we define $y$ as the mean degree of clusters interconnected by shortcuts. This parameter has a parallel in the Erdős-Rényi network model, where $y$ corresponds to the average degree of nodes: the clusters in our model are akin to nodes, and the shortcuts to links, in the Erdős-Rényi framework. Notably, $y$ emerges naturally in the random component of the mean distance expression, Eq. (1). Given its role in this expression and its similarity to the Erdős-Rényi network, we evaluate below whether $y$ truly serves as the central parameter controlling the behavior of SW networks.
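For orientation, the mean-field scaling function of Newman et al. can be evaluated directly. The sketch below assumes the standard published form, $\ell = (n/k) f(x)$ with $f(x) = \frac{1}{2\sqrt{x^2+2x}} \tanh^{-1}\sqrt{x/(x+2)}$; this form is an assumption brought in from the literature, since Eq. (2) is not reproduced in the text above, and the function name `f_mean_field` is illustrative.

```python
import math


def f_mean_field(x):
    """Assumed Newman-Moore-Watts mean-field scaling function,
    with l ~ (n / k) * f(x) and x = n * k * phi."""
    if x <= 0:
        return 0.25  # regular-lattice limit, l = n / (4k)
    root = math.sqrt(x * x + 2.0 * x)
    return math.atanh(math.sqrt(x / (x + 2.0))) / (2.0 * root)


for x in (1e-3, 1.0, 10.0, 100.0):
    print(f"x = {x:g}: f(x) = {f_mean_field(x):.5f}")

# For large x the assumed form behaves like ln(2x) / (4x).
x = 100.0
print(f"asymptote at x = 100: {math.log(2 * x) / (4 * x):.5f}")
```

Under this assumption, $f(x) \to 1/4$ as $x \to 0$, which reproduces the regular-lattice value $n/(4k)$, and $f(x)$ decays logarithmically slowly relative to $1/x$ at large $x$.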
If $y$ is significantly small, the network behaves like a large-world network and does not exhibit the SW property. Taking $y \ll 1$ in (1), we obtain Eq. (3). The expression for $\langle \ell \rangle$ in (3) does not directly involve $y$, suggesting that the appropriate scaling parameter in this regime may be $x$. On the other hand, if we assume that $n$ is large and $y$ is not too small in (1), we can approximate $W\!\left(\frac{\ln(y+1)}{2}\,\frac{y+1}{4p}\right)$ as $\ln\!\left(\frac{\ln(y+1)}{2}\,\frac{y+1}{4p}\right)$, $h(x)$ approaches 1, and

$$\langle \ell \rangle = \frac{\ln\frac{\ln(y+1)}{2} + \ln\frac{y+1}{4p}}{\ln(y+1)} \approx \frac{\ln n}{\ln(y+1)} . \tag{4}$$

The above equation, which displays the SW phenomenon, is identical to what is observed in the Erdős-Rényi model 17. Consequently, the appropriate universal function in this context is given by Eq. (5).
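To make the distinction between the two parameters concrete, the sketch below computes $x = nk\phi$ and $y = 2k^2\phi$ for two networks that share the same mean number of shortcuts, and evaluates the leading-order estimate $\ln n / \ln(y+1)$ from Eq. (4). The parameter values are illustrative choices, not values taken from the simulations, and the function names are hypothetical.

```python
import math


def shortcut_parameters(n, k, phi):
    """Return (x, y): mean number of shortcuts and mean degree of clusters."""
    x = n * k * phi
    y = 2 * k**2 * phi
    return x, y


def sw_estimate(n, y):
    """Leading-order small-world estimate ln(n) / ln(y + 1) from Eq. (4)."""
    return math.log(n) / math.log(y + 1.0)


n = 10**6
results = []
for k, phi in ((1, 0.2), (2, 0.1)):  # illustrative values with equal x
    x, y = shortcut_parameters(n, k, phi)
    results.append((x, y, sw_estimate(n, y)))
    print(f"k = {k}, phi = {phi}: x = {x:g}, y = {y:g}, "
          f"ln(n)/ln(y+1) = {results[-1][2]:.2f}")
```

The two networks have identical $x$ but different $y$, and the estimate of Eq. (4) differs between them, which is the sense in which $x$ alone cannot fix the mean distance in this regime.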
A critical insight from our study is that the mean degree of clusters interconnected by shortcuts emerges as the sole relevant scaling parameter, a conclusion vividly illustrated in Figs. 1 and 2. This observation underscores the pivotal role of these inter-cluster connections in defining the network's characteristics.
To substantiate our insights regarding the parameters $x$ and $y$, we introduce a new parameter, defined as the ratio $\langle \ell \rangle / \big(n f(x)\big)$. If $f(x)$ truly represents a universal function, this ratio should ideally equal 1. In Fig. 2, we plot it as a function of both $y$ and $x$, where $\ell$ is obtained from simulations of the Newman-Watts (NW) model and $f(x)$ is given by (2). Our results reveal that the ratio is close to 1 when $y$ is much less than 1. However, for larger values of $y$, it significantly exceeds 1, indicating the inadequacy of $f(x)$ under these conditions. This observation aligns with Newman's earlier critique regarding the limitations of $f(x)$ as $\phi$ nears 1 10.
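A small-scale version of this comparison can be sketched as follows. The code is illustrative only and rests on several assumptions that are not taken from the original work: the NW construction rule (one candidate shortcut per lattice link, accepted with probability $\phi$), the standard Newman-Moore-Watts form of $f(x)$, the normalisation of the ratio by $(n/k) f(x)$, a single realization, and a sample of 20 source nodes. The network size is far smaller than in the reported simulations.

```python
import math
import random
from collections import deque


def f_mean_field(x):
    """Assumed Newman-Moore-Watts mean-field scaling function."""
    if x <= 0:
        return 0.25
    return math.atanh(math.sqrt(x / (x + 2.0))) / (2.0 * math.sqrt(x * x + 2.0 * x))


def newman_watts(n, k, phi, rng):
    """Ring with k neighbours per side plus random shortcuts (assumed rule)."""
    adj = [set() for _ in range(n)]
    for v in range(n):
        for s in range(1, k + 1):
            w = (v + s) % n
            adj[v].add(w)
            adj[w].add(v)
    # One candidate shortcut per lattice link, accepted with probability phi.
    for _ in range(n * k):
        if rng.random() < phi:
            a, b = rng.randrange(n), rng.randrange(n)
            if a != b:  # self-loops are skipped, duplicates merge in the set
                adj[a].add(b)
                adj[b].add(a)
    return adj


def mean_distance(adj, sources):
    """Mean BFS distance from the sampled sources to all other nodes."""
    n = len(adj)
    total = 0
    for s in sources:
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist)
    return total / (len(sources) * (n - 1))


rng = random.Random(1)  # fixed seed so the run is reproducible
n, k, phi = 2000, 1, 0.005
adj = newman_watts(n, k, phi, rng)
ell = mean_distance(adj, rng.sample(range(n), 20))
x = n * k * phi
ratio = ell / ((n / k) * f_mean_field(x))
print(f"x = {x:g}, y = {2 * k * k * phi:g}, <l> = {ell:.1f}, ratio = {ratio:.2f}")
```

Repeating the run over many realizations and over several values of $k$ and $\phi$ would be needed before reading anything quantitative into the ratio; the sketch only shows how such a quantity could be assembled.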
Further, Fig. 2b illustrates that $f(x)$ loses its universality beyond certain $x$ values, which vary according to the network parameters. This leads us to conclude that $y$, rather than $x$, functions as the actual control parameter in the NW model. Moreover, Fig. 1c supports the significance of $y$ in the NW network. It shows that in the large-world regime, a small $y$ value fails to produce a data collapse with $\ln(n)$, suggesting that this range is not adequately described by $y$. In contrast, when $y$ is not exceedingly small, a data collapse is observed, confirming that $y$ accurately describes the system's behavior. This finding corroborates our calculations and supports the validity of Eq. (4).
Another significant aspect is the need to express the system's extensive quantities in terms of $\ln(n)$. This adjustment is required by the SW property of the network, which makes the network behave as if only $\ln(n)$ nodes were present. It leads to a conceptual shift from considering individual node identities to focusing on clusters. Such a perspective is instrumental in understanding and quantifying the emergence of SW behavior in these networks. By accounting for it, we can characterize the network more accurately and better understand the underpinnings of the SW phenomenon.

Transition from large to small-world
Scaling analysis of $\ell$ is carried out by introducing the parameter $n^*$, which represents the size of the network at which it passes from the large world to the SW. The scaling law for $\ell$ can be written as 15

$$\ell = n^* F\!\left(\frac{n}{n^*}\right), \tag{6}$$

with $F(i \ll 1) \sim i$ and $F(i \gg 1) \sim \ln(i)$, hence $\ell(n \gg n^*) \sim n^* \ln(n)$. Performing extensive simulations, we determine $n^*$ by calculating the slope of the curve of $\ell$ as a function of $\ln n$. Simulations are repeated for several values of $\phi$, which allows us to represent $n^*$ as a function of $\phi$ (see Fig. 3). Previously, it was believed that $n^*$ is inversely proportional to $\phi$ raised to the power $\tau = 1$ 9,15,16. The data presented in Fig. 3a suggest that the relation between $n^*$ and $\phi$ is not universal, as it is contingent on the value of $k$. However, based on (4) and (6), we can predict that if there exists a universal function for $n^*$, it must be proportional to $g(y) = \frac{1}{\ln(y+1)}$. This hypothesis is supported by the findings in Fig. 3b, which show excellent agreement between simulations and $g(y)$. Furthermore, for the SW regime, it is deduced from these findings that $n^* \sim y^{-\tau}$, where $\tau = 1$. The way $n^*$ behaves with respect to $y$ excludes the possibility of a phase transition for all non-zero values of $y$, which supports the existence of a crossover region between the SW and large-world regimes 16. Predicting whether a network is a small or a large world is an essential part of characterizing this system. Our computations yield the phase diagram of the transition, which allows us to anticipate the nature of the network from its parameters and thus to determine the transition line that separates the two regions (Fig. 4). Using the average number of shortcuts as the control parameter would not have been viable because, as mentioned earlier, $n^*$ does not scale with it.
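The slope-based estimate of $n^*$ described above can be sketched with an ordinary least-squares fit of $\ell$ against $\ln n$. The data below are synthetic, generated from $\ell = n^* \ln(n/n^*)$ with a hypothetical value $n^* = 12$, purely to show that the fitted slope recovers $n^*$; they are not simulation results, and the helper name `fit_slope` is illustrative.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


n_star_true = 12.0  # hypothetical value used only to generate test data
sizes = [1000, 2000, 5000, 10000, 50000, 200000]
ells = [n_star_true * math.log(n / n_star_true) for n in sizes]  # synthetic

slope, intercept = fit_slope([math.log(n) for n in sizes], ells)
print(f"estimated n* = {slope:.3f}")
```

With real simulation data, only sizes well inside the regime $n \gg n^*$ should enter the fit, since the logarithmic form holds only there.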

Conclusion
In this study, we have applied the RG transformation in a novel way to dissect the complex structure of SW networks. Our approach categorizes nodes into "regular" and "random", revealing the hidden architecture of these networks. This method underscores the concept of emergent behavior in SW networks: the network's macroscopic properties are not merely a sum of individual nodal connections, but result from interactions between clusters of nodes. Our findings suggest a significant reinterpretation of the SW regime as previously defined in the NW model. We contend that this regime might more appropriately be characterized as a large-world regime. This reevaluation stems from our analysis showing that the average number of shortcuts, traditionally used as a control parameter, leads to misleading conclusions. By focusing instead on the average degree of clusters linked by these shortcuts, we obtain a more accurate and coherent framework. This perspective aligns the system variables with the network parameters and results in a clear data collapse. Additionally, we introduce a phase diagram that maps the boundary between the large-world and SW regimes. This representation supports our theoretical findings and offers researchers a practical tool for analyzing SW networks.

Methods

"Regular" and "random" nodes
We assume that the network can be split into two sub-networks: a regular one and a random one. Our approach is founded on this assumption. First, we study in detail the number of neighbors $n_\ell$ located at a distance $\ell$ from an arbitrary node by applying the RG in real space on the network (Fig. 5a and b). As the NW model combines both regularity and randomness, we can categorize nodes into two groups, regular nodes $n_{\mathrm{re}}(\ell)$ and random nodes $n_{\mathrm{ra}}(\ell)$, based on their distance $\ell$ to a randomly selected root node. Regular nodes have not been affected by the introduction of shortcuts, while random nodes have been affected by shortcuts and their distances have consequently been altered (Fig. 5c). In the RG, $n$ becomes $\hat{n} = n/k$ and $k$ becomes $\hat{k} = 1$ (Fig. 5). Each set of $k$ neighboring nodes is replaced by a single entity named a cluster. The number of clusters in the network is then $\hat{n}$. After adding shortcuts, the probability that a cluster is randomly linked to another cluster is $\hat{\phi}$; for $n \gg k\phi$ we have $\hat{\phi} \approx 2k^3\phi/n$. Let $P_{\mathrm{re}}(\hat{\ell})$ be the probability that the distance $\hat{\ell}$ between any cluster and the root node has not been changed after adding the shortcuts, and let $P_{\mathrm{ra}}(\hat{\ell})$ be the probability that the distance between any cluster and the root node becomes $\hat{\ell}$ after adding the shortcuts (see Fig. 5d). The distance of a given cluster to the root can be reduced by one, two or more shortcuts. We denote by $\pi^{(M)}(i)$ the probability that the regular initial distance $\hat{\ell}$ of a cluster is not changed to a specific smaller distance $i$ through $M$ shortcuts. A distinction is made between the following cases:

Shortening distances via single shortcuts
Let $\pi^{(1)}(i)$ be the probability that a cluster $j$ does not change its initial distance $\hat{\ell}$ to the distance $i$ ($i < \hat{\ell}$) through a single shortcut; it is given by Eq. (7). The expressions in Eq. (7) follow from the fact that the number of possibilities (jumps) to build a path such that $\hat{\ell} = i$ is $4(i-1)$. In general, this value tends to be overestimated because some of the possibilities correspond to distances $\hat{\ell} < i$, which have already been included in the count for shorter distances. However, when the number of jumps is small compared to the size of the network, the expression $4(i-1)$ is exact.
From Eq. (7) we deduce the probability $P^{(1)}_{\mathrm{re}}(\hat{\ell})$ that a cluster's distance $\hat{\ell}$ remains unchanged with a single shortcut. In the resulting expression, the term $\pi^{(1)}(1) = (1 - \hat{\phi})$ was omitted 18.

Shortening distances via two shortcuts
Assuming that $R$ is the root cluster and $j$ is any other cluster in the network, we can determine the number of possible routes between them given that two shortcuts connect the two clusters. To do this, we introduce an arbitrary cluster, denoted $z$, which lies between the two shortcuts. If the distance between $R$ and $j$ through $z$ is $i$, then the number of possible cases is $i - 1$, namely $\{\{1, i-1\}, \{2, i-2\}, \dots, \{i-1, 1\}\}$. For example, if $i = 4$, we have three cases: $\{\{1, 3\}, \{2, 2\}, \{3, 1\}\}$. The first case, $\{1, 3\}$, indicates that the distance between cluster $R$ and the intermediate cluster $z$ is 1, and the distance between $z$ and cluster $j$ is 3. The particular case $\{1, i-1\}$ is illustrated in Fig. 5d. In this case, the number of possible paths between $R$ and $z$ is 1, because they are directly linked by a shortcut, and the number of possible paths between $z$ and $j$ is $4(i-2)$, since the distance between them is $i - 1$ (see the case of a single shortcut). The probability that the distance between clusters $R$ and $j$ is not equal to $i$ is then $(1 - \hat{\phi}^2)^{4(i-2)}$. To make the computations easier, we assume that the probability of each of the other cases $\{\{2, i-2\}, \dots, \{i-1, 1\}\}$ is equal to the probability of the case $\{1, i-1\}$, which is a simplifying approximation of mean-field type. As there are $i - 1$ such cases, the probability that a cluster does not change its initial distance to the distance $i$ through a specific intermediate node is given by $(1 - \hat{\phi}^2)^{4(i-2)(i-1)}$. The number of possible positions of the intermediate cluster $z$ in the network is $\hat{n} - 2i$ (Fig. 5d), from which the probability $\pi^{(2)}(i)$ that a cluster does not change its initial distance to the distance $i$ through two shortcuts follows. The term $\pi^{(2)}(2)$ is excluded from the sum, as explained in the case of a single shortcut 18.
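The counting used in the two-shortcut case can be illustrated with a short sketch. The function names are hypothetical; the quantities they return are the ones stated in the text: the $i - 1$ ways of splitting a distance $i$ between the two legs, the exponent $4(i-2)(i-1)$, and the $\hat{n} - 2i$ admissible positions of the intermediate cluster.

```python
def two_shortcut_cases(i):
    """All splits {a, i - a} of the distance i over the legs R-z and z-j."""
    return [(a, i - a) for a in range(1, i)]


def route_exponent(i):
    """Exponent 4(i - 2)(i - 1) obtained by treating all i - 1 cases alike."""
    return 4 * (i - 2) * (i - 1)


def intermediate_positions(n_hat, i):
    """Number of admissible positions n_hat - 2i for the intermediate cluster."""
    return n_hat - 2 * i


print(two_shortcut_cases(4))          # the three cases quoted for i = 4
print(route_exponent(4))
print(intermediate_positions(20, 3))  # the configuration shown in Fig. 5d
```

The last call reproduces the value 14 quoted in the caption of Fig. 5 for 20 clusters and $i = 3$.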
Finally, we obtain $P^{(2)}_{\mathrm{re}}(\hat{\ell})$, in which the sum in the exponential is approximated by an integral, since in a regular SW network $\hat{\ell} \propto \hat{n} \gg 1$.
Let $P_{\mathrm{ra}}(\hat{\ell})$ be the probability that, due to shortcuts, the distance of any given cluster to the root cluster has changed from its regular distance to a specific distance $\hat{\ell}$. $P_{\mathrm{ra}}(\hat{\ell})$ can be written as the product of the probability that this cluster does not change its distance to one strictly less than $\hat{\ell}$ and the probability that it changes its distance to one less than or equal to $\hat{\ell}$. The number of regular clusters after renormalization, $\hat{n}_{\mathrm{re}}(\hat{\ell})$, then follows from $P_{\mathrm{re}}(\hat{\ell})$, since for $\hat{k} = 1$ each cluster has two neighbors at distance $\hat{\ell}$. Similarly, the number of random clusters, $\hat{n}_{\mathrm{ra}}(\hat{\ell})$, follows from $P_{\mathrm{ra}}(\hat{\ell})$, where $\hat{n} - 2\hat{\ell}$ represents the maximum number of clusters at a distance (to the root cluster) greater than $\hat{\ell}$.
Then, the total number of clusters at distance $\hat{\ell}$ is the sum of the regular and random contributions.
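Combining the contributions of one, two or more shortcuts, the probability that the distance of a cluster remains unchanged is

$$P_{\mathrm{re}}(\hat{\ell}) = P^{(1)}_{\mathrm{re}}(\hat{\ell})\, P^{(2)}_{\mathrm{re}}(\hat{\ell})\, P^{(3)}_{\mathrm{re}}(\hat{\ell}) \cdots \tag{15}$$

In order to measure the impact of the individual sub-networks, we determine the total number of clusters within each sub-network, which we refer to as $\hat{S}_{\mathrm{re}}$ and $\hat{S}_{\mathrm{ra}}$; their expressions simplify for $\hat{n} \gg 1$. Since $\hat{\phi} = 2k^3\phi/n$ and $\hat{n} = n/k$, we have $\hat{\phi}\hat{n}^2/2 = kn\phi$, which is none other than the mean number of shortcuts in the network, $x = kn\phi$. The sum of clusters in the regular sub-network is then expressed in terms of $x$, and the number of clusters in the random sub-network is deduced from it.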
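The two identities that link the renormalized quantities to $x$ and $y$ can be verified numerically. The sketch below uses the mapping stated in the text ($\hat{n} = n/k$, $\hat{k} = 1$, $\hat{\phi} \approx 2k^3\phi/n$); the parameter values and the function name `renormalize` are illustrative.

```python
import math


def renormalize(n, k, phi):
    """RG mapping of (n, k, phi) to cluster-level quantities."""
    n_hat = n / k                    # number of clusters
    k_hat = 1                        # each cluster keeps nearest neighbours only
    phi_hat = 2 * k**3 * phi / n     # valid for n >> k * phi
    return n_hat, k_hat, phi_hat


n, k, phi = 10**6, 5, 1e-4           # illustrative values
n_hat, k_hat, phi_hat = renormalize(n, k, phi)

x_from_rg = phi_hat * n_hat**2 / 2   # should equal x = n * k * phi
y_from_rg = phi_hat * n_hat          # should equal y = 2 * k^2 * phi

print(f"x: {x_from_rg:g} vs {n * k * phi:g}")
print(f"y: {y_from_rg:g} vs {2 * k**2 * phi:g}")
```

Both comparisons agree up to floating-point rounding, which confirms that $x$ counts shortcuts while $y$ is the mean number of shortcut links per cluster.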
Since each cluster is made up of $k$ nodes, the total number of regular nodes is $S_{\mathrm{re}} = n(1 - h(x))$ and the total number of random nodes is $S_{\mathrm{ra}} = n h(x)$.

"Regular" and "random" mean distance
The mean distance in the network is $\langle \hat{\ell} \rangle = \langle \hat{\ell}_{\mathrm{re}} \rangle + \langle \hat{\ell}_{\mathrm{ra}} \rangle$, where $\langle \hat{\ell}_{\mathrm{re}} \rangle$ is the mean distance in the regular sub-network and $\langle \hat{\ell}_{\mathrm{ra}} \rangle$ is the mean distance in the random sub-network. $\langle \hat{\ell}_{\mathrm{re}} \rangle$ is deduced using (17) and (19), taking $\hat{\phi}$ small and considering $\hat{\ell} = O(\hat{n})$ (regular network). $\langle \hat{\ell}_{\mathrm{ra}} \rangle$ can be deduced from the maximum of $\hat{n}_{\mathrm{ra}}(\hat{\ell})$, as explained in 19. Explicitly, we have to solve $\partial \hat{n}_{\mathrm{ra}}(\hat{\ell}) / \partial \hat{\ell} = 0$. From (17) and (18) we obtain an expression involving $u(\hat{\ell}) = 4\hat{\phi}(\hat{\ell} - 1)\big(\hat{\phi}(\hat{n} - 2\hat{\ell}) + 1\big)^{\hat{\ell} - 2}$. When shortcuts are present, the mean distance in the network is considerably lowered; we can therefore take $\hat{n} - 2\hat{\ell} \approx \hat{n}$, so that $\hat{\phi}(\hat{n} - 2\hat{\ell}) + 1 \approx y + 1$, where $y = \hat{\phi}\hat{n}$ is the mean degree of clusters linked by shortcuts. It is worth noting that $y$ is analogous to the mean degree of nodes in the Erdős-Rényi network. With this approximation, the number of random clusters simplifies accordingly.

Figure 1. Scaling of the mean distance. (a,b) Behavior of $\ell/(n/k)$ as a function of $x$ for various values of $k$ (from top to bottom $k = 10, 5, 2, 1$); the network size is $n = 10^6$. Each simulation is averaged over 100 realizations. In both panels, (1) is the continuous line and (2) is the dashed line. The scale is semi-logarithmic in (a) and log-log in (b). (c) $\ell/\ln(n)$ as a function of $y$ for various values of $k$ (from top to bottom $k = 1, 2, 5, 10$), $n = 10^6$; the number of realizations for each simulation is 50. The scale is logarithmic.

Figure 2. Validity of the universal function $f(x)$. Variations of the ratio $\langle \ell \rangle / \big(n f(x)\big)$ with $x$ in (a) and with $y$ in (b) for various values of $k$ (from top to bottom $k = 10, 5, 2, 1$), $n = 10^6$. The number of realizations for each simulation is 1000. The scale is semi-logarithmic.

Figure 3. Data collapse and scaling of $n^*$. Scaling of $n^*$ as a function of $\phi$ (a) and as a function of $y$ (b) for various values of $k$ (from top to bottom $k = 1, 2, 5, 10$). The black line represents (5) multiplied by a constant. Each point is determined from the slope of $\ell$ as a function of $\ln(n)$. System size varies from 1000 to 200000, the number of realizations is 300 and the scale is logarithmic.

Figure 4. Phase diagram of the SW network. Symbols (same as in Fig. 3) are at the borders where $n = n^*$, i.e., the limits where the network passes from the large world ($\ell \sim n$) to the SW ($\ell \sim \ln n$).

Figure 5. Illustrations explaining the method. (a,b) The RG transformation of a network with $n = 20$ and $k = 2$ (a) into another with $\hat{n} = 10$ and $\hat{k} = 1$ (b). (c) Distance to the root node $R$ after introducing a shortcut. Green nodes, whose distance to $R$ is changed, are called random nodes. Blue nodes, whose distance to $R$ remains unchanged, are called regular nodes. (d) The case $\{1, i-1\}$, where the green nodes are the positions that cannot be occupied by the intermediate node. Since $\hat{n} = 20$ and $i = 3$, the number of possible positions for the intermediate node is $\hat{n} - 2i = 14$.