Scaling behaviours in the growth of networked systems and their geometric origins

Two classes of scaling behaviours, namely the super-linear scaling of links or activities, and the sub-linear scaling of area, diversity, or time elapsed with respect to size have been found to prevail in the growth of complex networked systems. Despite some pioneering modelling approaches proposed for specific systems, whether there exists some general mechanisms that account for the origins of such scaling behaviours in different contexts, especially in socioeconomic systems, remains an open question. We address this problem by introducing a geometric network model without free parameter, finding that both super-linear and sub-linear scaling behaviours can be simultaneously reproduced and that the scaling exponents are exclusively determined by the dimension of the Euclidean space in which the network is embedded. We implement some realistic extensions to the basic model to offer more accurate predictions for cities of various scaling behaviours and the Zipf distribution reported in the literature and observed in our empirical studies. All of the empirical results can be precisely recovered by our model with analytical predictions of all major properties. By virtue of these general findings concerning scaling behaviour, our models with simple mechanisms gain new insights into the evolution and development of complex networked systems.

1 The Mean-Field Analysis of the Basic Model

The linear relation between R(t) and t
Our mean-field analysis of the basic model starts from a simple and basic relation, R(t) ∼ t. In this sub section, we present an exact proof of this relation in one dimension and an approximate estimation for d > 1 dimensions. R(t) x 0 r Supplementary Figure S1: Illustration of one-dimensional model. Where R(t) is the radius of the network, the triangle at the position R(t) is the farthest node from the origin 0. If a new node is generated at position x ∈ [R(t), R(t) + r], it can survive.
In one-dimensional space, we can analyze only one side of the system because of the symmetry of space around the center point. Suppose that at time t, the distance between the farthest node and the center point is R(t) as shown in Supplementary Fig. S1. We know that the distribution of existing nodes lies in the interval [0, R(t)], and is compact, which means that all points in [0, R(t)] are covered by the intervals [p − r, p + r], where p is the coordinate of any existing node. Therefore, R(t) can grow only when a new node is generated in the interval [R(t), R(t) + r]. Suppose that the increase of R(t) at t is ∆R(t). ∆R(t) is a stochastic variable that can be expressed using the following equations: Therefore, the probability density function of ∆R(t) is Thus, its expectation value is: We can therefore obtain the expectation value of R(t) because Hence, R(t) is proportional to t. This conclusion is also satisfied for d > 1.
In the general case, we define R(t) as the maximum distance from the center to any existing point at time step t. We consider its growth to be ∆R(t). Because the model is isotropic, the growth in any one direction is similar that in a one-dimensional model. Therefore, if ∆R(t) > 0, any newly generated node must fall into the volume of the d-dimensional ball of radius r centered on the farthest existing node exclude the R(t) radius d ball centered at the center of the hypercube (the shaded area in Supplementary   Fig. S2 represents this region for d = 2).

R(t) r
Supplementary Figure S2: Illustration for two-dimensional model. R(t) is the radius of the network, r is the interaction radius. If a new generated node falls into the shaded area, it can survive as well as R(t) will increase.
Because any newly generated nodes will be evenly distributed throughout the hypercube, the probability that one new node will fall into the shaded area is proportional to its d-dimensional volume. When R(t) is very large, this volume is approximately equal to half of the volume of a d-dimensional ball.
Therefore, the expectation value of the increase of R(t) is where V d−1 is the volume of the d − 1 -dimensional unit ball, x 1 is the first Cartesian coordinate of a d -dimensional point. Therefore, Equation (S6) not only proves that the radius R(t) is proportional to t but also provides the exact form of the expression for the proportionality constant. This constant scales with r to the power of d + 1 and L to the power of −d. Therefore, when r increases or L decreases, the rate of growth of R(t) will increase. This equation is very important because it can be used to derive other variables. In the main text and the following paragraphs, we use R(t) as an abbreviation for ⟨R(t)⟩.

Deriving the scaling constants
Here, we derive the scaling constants of V (t),t, and E(t). At any given position (ρ, Θ), the expression for the node density is Therefore, the node density at radius ρ is where V d is the volume of a d-dimensional ball. Thus, the total number of nodes is Therefore, the constant of the scaling between V (t) and . We can substitute Eq. (S6) into Eq. (S9) to derive the following: Thus, the constant of the scaling between t and N (t) is obtained.
Finally, we find the scaling constant between E(t) and N (t). First, we require the exact formula for v(ρ, Θ, t). We consider a sufficiently small volume dσ in the d-dimensional hypercube, and suppose the size of dσ to be much larger than the constant r. Because nodes in dσ distribute evenly and only nodes pairs separated by distance smaller than r are connected, the total number of links within this infinitesimal volume is where x, y are the coordinates of any two points in dσ. A multiplicative factor of 1/2 arises because the network is undirected. Thus, the link density is We integrate Eq. (S12) to obtain the total number of links, By substituting the relation between R(t) and N (t) of Eq. (S9), we obtain the following relation between E(t) and N (t):  Figure S3: The dependence of the exponent γ = (d + 2)/(d + 1) on the dimension d. The blue, purple and yellow joined points were obtained from simulations using different maximum numbers of nodes (different running times): blue corresponds to 10 6 nodes, purple corresponds to 10 5 nodes, and yellow corresponds to 10 4 nodes. The red line represents the theoretical prediction.

Simulation tests for the scaling exponent
Supplementary Fig. S3 presents comparison of the exponent between the simulations and the theoretical predictions (Eq. (S15)) for various dimensions. The simulated exponents are always smaller than the results of the mean-field approximation, but the discrepancy can be reduced by increasing the running time of the simulation. The reason for this discrepancy is that the mean-field approximation requires that nodes be very dense so that they can be treated as a continuous field.

Other network properties
Additionally, other network properties, such as the degree distribution and clustering coefficient, can also be derived analytically.

Clustering coefficient
First, we derive the explicit form of the clustering coefficient. According to the definition of the average clustering coefficient for the entire network, it can be understood as the conditional probability that Z is connected to both with X and Y given that X and Y are connected. In our geometric graph model, two nodes can connect only if the distance between them is smaller than r, and all nodes are distributed evenly throughout the hypercube. Therefore, the computation of the clustering coefficient can be converted into a calculation of volumes.

X Y
Supplementary Figure S4: Illustration of the calculation for the clustering coefficient As shown in Supplementary Fig. S4, if we denote the interaction region near node X as the ball X and the interaction region near node Y as Y , then the condition that the two nodes are connected is equivalent to the condition that the overlapped area of the two interaction regions is not empty. If the third node Z is connected to both X and Y , then Z must fall into the overlapped area of X and Y .
Therefore, the clustering coefficient C can be calculated using the following formula: where X ↔ Y indicates that X and Y are connected. S X represents the d-dimensional volume of region X. In Eq. (S16), C is a constant and depends only on the spatial dimension. The theoretical prediction is compared with the results of the computer simulations in Supplementary Fig. S5. It is evident that all results coincide with the theoretical predictions for all dimensions. The results are the same as for the classical random geometric graph.

Degree distribution
First, we know that the average degree of one node at radius R at time t is According to the mean-field approximation, all nodes within the radius R possess degrees larger than D R ; therefore, the cumulative distribution function, i.e., the probability of nodes of degree larger than a given value x, is , is the maximum degree of the graph at time t. It can be expressed explicitly using the mean-field approximation as follows: We compare the theoretical result with the simulation result in the upper plot of Supplementary Fig. S6.
There is some deviation between the two when x is large. One possible explanation is that only a few nodes possess degrees as large as predicted by the mean-field theory (when t approaches infinity).
According to Eq. (S18), the degree distribution density function can be written in the following form: where and Therefore, the degree distribution is size dependent and also can be re-scaled by the power of the system size N (t) (see Supplementary Fig. S6 (a)). This reflects the self-similarity property of the system, and it has also been observed in other systems [Wu and Zhang (2011)]. Furthermore, the form of the size dependency (Eq. (S20)) also guarantees that the total number of links scales with the number of nodes.
2 Additional Empirical Evidence

Flickr
We compare the scalings of social tagging systems to the results of our model. In the main text, the Delicious system is studied. Here, we present similar results for another online system, Flickr. The data can be downloaded freely at http://www.tagora-project.eu/data/.
In both systems, users visit certain online resources (pictures on Flickr) and may tag them with certain words. We can consider the semantic space of these tags as the space in which our model is established.
Then, the number of distinct tags can be regarded as the total volume occupied by users. stages can be seen with very different growth rates. Initially, the community passed through a phase of relatively slow growth. However, when the size of the system grew to exceed approximately 5000, the rate of growth became very large. The scaling phenomena that we discussed here apply only to the second stage. The delicious community also passed through two such stages, but the separation between them is not apparent because the time window of available data is much longer than that of Flickr.

APS citation network
As another example, we explore the scaling laws of a citation network based on the APS data set

Model Extension
The basic model is very simple, so it can be extended in several respects. As mentioned in the main text, we added the crowding rule to avoid the formation of over-compact graphs. Here, we provide detailed derivation and a discussion of other effects of this extension that are not discussed in the main text.
According to the crowding rule described in the main text, a newly generated node at position (ρ, Θ) will survive with the following probability: Therefore, when α = 0, the new node will survive with probability 1 if the surrounding node density is nonzero; in other words, we recover the basic model. By contrast, when α → ∞, new nodes will survive in areas of empty space, and we will obtain a random geometric graph with nodes distributed evenly throughout the space. First, we can prove that the relation between R(t) and t does not change when this new rule is incorporated. This is true because the probability that a node falls into the region that can cause the radius of the graph R(t) (which is also defined as the maximum distance of any existing node from the center node) to slightly increase (the shaded area in Supplementary Fig. S2) is the same as in the basic model. Thus the rate of increase of R(t) is also expressed by Eq. (S6).
Next, we will consider the nodes. We can write down a differential equation to describe node density µ: By solving this equation, we find Therefore, the relation between N (t) and R(t) is The total number of links between E(t) and R(t) is Therefore, we can obtain the scalings with respect to N (t), as mentioned in the main text, and the relation between γ and α, as shown in Eq. (10) of the main text, is We compare these exponents to the results of the numerical experiments in Supplementary Fig. S9. From this figure, it is clear that the mean-field approximation can yield better prediction in d = 2 space than in d = 3 space. Moreover, as α increases, the deviation between the theoretical prediction and the simulation becomes larger because the nodes become sparser, violating the assumption of the mean-field approximation. for the detected nighttime lights, altered to remove ephemeral lights and background noise [Elvidge et al.(2011)].
The image is 30-arc-second grided and spans from -180 to 180 degrees longitude and from -65 to 75 degrees latitude. The digital number (DN) values of the nighttime lights range from 1 to 63, while 0 represents the identified and eliminated background noise and 255 represents an area where no cloudfree observation has been collected. In addition, although sunlit data, moonlight, glare, observations containing clouds and lighting features from the aurora are excluded from the DMSP nighttime stable lights dataset, gas flares are not. Therefore, we used the global gas flare map generated by NGDC [Elvidge et al.(2009)] to identify and remove gas flares, reducing the possibility of mistaking them for urbanized areas.
The year 2009 was chosen because it was the latest product freely accessible when we first conducted our analysis. For detailed comparison between our model simulation results and nighttime light observations, we narrow our scope down to part of the south central contiguous United States, where the saturated lightness only makes up a negligible proportion so this region suffers less from the well-known saturation problem of DMSP nighttime lights data [Sutton et al.(2007)]. Using GIS software, the nighttime lights image was re-projected into Lambert conic conformal projection, and a 1000 pixels x 1000 pixels region was extracted from the global image. The upper left corner of the region of interest (ROI) is 113.8 W, 42.2 N, upper right 101.7 W, 43.4 N, lower left 111.7 W, 33.5 N and lower right 100.9 W, 34.5 N.
In this region, two lighted pixels were considered as connected if one of them is the Moore neighborhood of the other, and all the connected pixels formed a cluster. Thus we identified 921 clusters in Fig. 3(a) in the main text. For each cluster, we treated the total number of non-zero pixels as the area of the cluster, and the sum of non-zero pixels' values as the total light intensity of the cluster. Then, the scaling behaviour between light intensity and area as well as the size distribution of the areas of all clusters were calculated to produce Fig. 3(c) in the main text.

The model
In the main text, we introduced a parameter ϵ to simulate cluster formation. We provide the details of its implementation in this subsection.
In the simulation, we set a rectangular region of width 1000 and height 1000 which are equal to the width and height of the nighttime light image, respectively, in units of pixels. Initially, there are no nodes in the rectangle. In every time step, one node is added to the region, and the survival of that node is determined in accordance with the crowding rule with parameter α. If the node survives, it may become a seed with a probability of ϵ. In each time step, we identify all nodes that are connected by edges as clusters. The nodes are continuously generated, one by one, until the number of clusters larger than 33 in size(the starting point of the power-law tail in the empirical data) is equal to 335, the number of clusters in the power-law tail of the empirical data. Then, we calculate the scaling behaviour between area and light intensity and the size distribution of the clusters.
The clusters created at earlier times may have more chances to gain new nodes, and thus, they may become very large. By contrast, some newly born clusters may be very small, and some clusters may also encounter each other and merge. We locate each node on the lattice by means of the integer parts of its coordinates. Then, the area of each cluster is calculated as the total number of distinct lattices occupied by all nodes in that cluster. The total light intensity is calculated as the total degrees (number of edges) of all the nodes in one cluster because we assume that the light intensity of a cluster is proportional to the total number of interactions. Then, by tuning α = 1.5 and ϵ = 0.03, we can generate exponents of the scaling between area and light intensity and a size distribution that are similar to those determined from the empirical data. The scaling behaviour plot and the cumulative distribution of areas are presented in Supplementary   Fig. S10. However, Fig. 3 in the main text shows that the cluster size distribution of the satellite data does not follow a strict power-law. The power-law tail starts at a size of 33. However, our model can generate a power law distribution because of the underlying Yule-Simon mechanism. Therefore, in the main text, we plot only clusters with areas larger than 33. The entire plot with all clusters included is presented in Supplementary Fig. S10. We are aware that the scaling law for small clusters is not very clear because of large fluctuations. We believe that with some small changes to its implementation, our model will be able to generate better statistics that will be comparable to the empirical data. This task will be left for future studies.
In the main text, we mention that the dynamics of the change in cluster sizes with the number of nodes can be approximated as a Yule-Simon process with an exponent smaller than 1 because the area of the region is very large. We present the analysis that supports this statement in detail here.
If L → ∞, then the clusters may be isolated from each other. Therefore, the dynamics of each cluster can be viewed as an independent random process.
Suppose that at time t, there are s existing nodes and N C (t) clusters. We let M N (t) be the total number of clusters with N nodes, and P N (t) = M N (t)/N C (t) is then the fraction of such clusters among all clusters. We use s as the time index such that in a single step, only one node is added to either join one of the existing N C clusters or, with probability ϵ, form a new cluster. Therefore, we can express the dynamical equation of M N (s) as follows: is the scaling behaviour constant between the area V (t) and the number of nodes N (t), and η = (2 + 2α)/(3 + 2α) < 1. According to we find that and because, we find that Thus, we can derive Eq. (12) in the main text. We are interested in the stable state, so we let s → ∞; then, ∂P N (s) ∂s = 0, and we obtain When s → ∞, we can set Z ∼ s because we know that N 0 < N η < N 1 , and thus, s = We introduce an undetermined parameter ζ, and let and insert it into Eq. (S34): We can solve P N using this equation, This quantity can be approximated as where, ζ can be obtained using the following equation . (S40) It is hard to obtain the explicit form of this distribution because ζ is difficult to calculate. Therefore, we show the numerical result of Eq. (S40). A power law function with exponent dependent on ϵ is obtained (see Fig. S11).

Finite size effect
In real situations, L is finite, leading to possible collision of some clusters. If this occurs, the analytical results based on the assumption of isolated clusters may be violated. Therefore, we explore the influence of the crossover of clusters on the scaling exponents by carrying out simulations with continuous generation of new nodes for L = 200, 300, 500, respectively. Without loss of generality, we set ϵ = 0.03 and α = 1.5. The results are shown in Supplementary Fig. S12, S14, and S13.  We find that for finite L, as the number of nodes s increases, crossovers take place more frequently ( Supplementary Fig. S12) until only one super cluster remains for L = 200. We see that the scaling exponent almost remains unchanged for a wide range of s. For very large s, the scaling exponent slowly decreases, resulting from crossovers ( Supplementary Fig. S13). This phenomenon becomes more obvious for small L. Supplementary Fig. S14 shows that the power-law exponent of cumulative size distribution also remains stable until the finite size effect works. Take together, these results indicate that the scaling exponents are insensitive to the variation of the number of nodes and crossovers when L is large, accounting for the validity of the approximation of isolated clusters for theoretically predicting the scaling exponents with respect to finite size of the space and a large number of nodes. In addition, our simulation results indicate that the scaling exponents of real cities may decrease as the sizes of cities increase, because of the finite geographical space, in which cities are embedded, as shown in Supplementary Fig. S13 and Supplementary Fig. S14.