The growth and evolution of networks has elicited considerable interest from the scientific community and a number of mechanistic models have been proposed to explain their observed degree distributions. Various microscopic processes have been incorporated in these models, among them, node and edge addition, vertex fitness and the deletion of nodes and edges. The existing models, however, focus on specific combinations of these processes and parameterize them in a way that makes it difficult to elucidate the role of the individual elementary mechanisms. We therefore formulated and solved a model that incorporates the minimal processes governing network evolution. Some contribute to growth such as the formation of connections between existing pair of vertices, while others capture deletion; the removal of a node with its corresponding edges, or the removal of an edge between a pair of vertices. We distinguish between these elementary mechanisms, identifying their specific role on network evolution.
The study of networks has received significant attention from the scientific community, thanks to its utility as a useful representation of many complex systems found in the real world, ranging from social to technological, infrastructural, biological and epidemiological systems1,2,3,4,5,6. While seemingly disparate, these networks show common features, among them the fact that they evolve and grow, and many display heterogeneous degree distributions7,8. A series of models have been proposed to account for the growing nature of networks and to uncover the role of various processes that affect the network topology. Perhaps the best-known are the class of models based on preferential attachment9, in which vertices are added to a network with edges that attach to pre-existing vertices with probabilities depending on their degrees. When the attachment probability is precisely linear in the degree of the target vertex the resulting degree distribution follows the power-law pk ~ k−γ. This case is of special interest because many networks from citation networks to the World Wide Web are observed to have degree distributions that approximately follow power laws10.
While the preferential attachment model captures the qualitative features of network evolution, it is a minimal model with obvious limitations: (i) It predicts the value of the degree exponent to be γ = 3, whereas most real world networks have exponents in the range 2 ≤ γ ≤ 4. (ii) It predicts a pure power law degree distribution, while real systems are characterized by small degree saturation and high-degree cutoffs. (iii) It ignores a number of elementary processes that play an important role in the evolution of many real networks, like the addition of internal links and node or link removal.
To account for these limitations, a considerable amount of research has been conducted in the network science community, exploring a series of pertinent modifications to the original model, by changing the form of the attachment probability11,12,13,14,15,16, incorporating effects such as ageing17,18,19,20, fitness21,22,23, and allowing for the simultaneous addition and deletion of edges and vertices24,25,26,27,28,29, each leading to predictions that approximate better the degree distributions observed in real systems. Despite these advances, the current models were motivated by specific problems, making it difficult to understand the role of individual processes on network evolution. For example, models have typically included both random and preferential external attachment of nodes12,15, or preferential external and internal addition of nodes and edges13,16, but not simultaneously incorporating all of these, nor in a fashion that the individual role of each process can be separately elucidated. At the same time these models neglected the important role of the deletion of nodes and edges. When considered24,25,26,27,28,30, this was studied in conjunction only with preferential attachment of new nodes, and although the qualitative results were sound (namely that deletion increases the value of γ, eventually driving the network from a power law to an exponential regime) the predictions for the degree exponent γ > 3 even in the presence of low deletion rates, was not in agreement with what is seen in real networks. Furthermore, when attempts were made to incorporate simultaneously multiple growth processes, the models were parameterized such that it is difficult to separate the contributions of the individual elementary processes to the network topology. For example, Ref. 31 considered the combination of adding links between existing nodes and random rewiring of edges along with node addition. Unfortunately the variable representing each process were dependent parameters, making it difficult to “decouple” their role on the evolution from each other.
In light of these difficulties, our goal here is to study in detail a model, which contains the fundamental processes by which a network evolves, along with the degree of freedom of being able to study and emphasize the role of each process independent of the other. Our primary goal is not necessarily to uncover new results (although we do present a series of new findings) but rather separate the “wheat from the chaff” untangling the results of previous work, thus putting in context and interpreting the role of the individual growth processes.
To be specific, in this paper we study a model which incorporates some of the most elementary processes that drive network evolution, namely the addition of vertices and edges and their removal. Broadly speaking there are four distinct microscopic mechanisms that contribute to network evolution. Two contribute to growth, i.e. either a new vertex attaches to pre-existing vertices, or an existing pair of vertices form connections between them. The other two capture deletion, either the removal of an existing node with its corresponding edges, or the removal of an existing edge between a pair of vertices. We systematically distinguish between these four fundamental processes, identifying their role on network evolution and the degree distribution. We show that one can generate networks with degree distributions in the same range as measured in real networks, in the presence of either specific combinations of these processes, or indeed all of them occurring simultaneously.
Model for network evolution
Let pk denote the fraction of vertices that have degree k in a network of size n. Following11,12 we define the attachment kernel πk to be n times the probability that a given edge of a newly added vertex attaches to a pre-existing vertex of degree k. The factor n here is convenient, as it means that the total probability that the given edge attaches to any vertex of degree k is πkpk. Since each edge must attach to a vertex of some degree, πk must satisfy the normalization condition, . We define πk,k′ to be the joint attachment kernel for an edge to be placed between the two vertices of degree k and k′. The correct normalization in this case is given by , where pk,k′ is the joint degree distribution. We consider a network that evolves in time, according to the four basic processes outlined in the introduction. That is, in each unit of time, the following elementary steps are considered:
We add a new vertex to the graph along with c edges. While in principle, the number of these edges can be drawn from some distribution, in the spirit of simplicity, we assume that all newly added vertices have the same degree. We must next decide how to attach the c edges to pre-existing vertices in the network via the attachment kernel πk. In the preferential attachment model πk ~ k (ref. 9), which precludes nodes with initially no links (k = 0) from acquiring an edge. In real networks however even isolated nodes can acquire links. Indeed, in citation networks, a new research paper has a finite probability of being cited, or in social networks, a person that moves to a new city will quickly acquire acquaintances. Zero-degree nodes can acquire links if we add a constant a to πk15,32, obtaining where A = (a + b〈k〉)−1. Note that for k = 0, πk ~ a, thus this represents the probability for a node to acquire its first link and can be thought of as its initial attractiveness23 or fitness21,22. In the limit a → 0 we recover pure preferential attachment, while as b → 0, πk = 1 and we have purely random attachment of vertices leading to an exponential degree distribution33.
Addition of internal links
Often links do not arrive with new nodes, but are added between those already extant in the network. For example, the vast majority of the links in the World Wide Web are internal links, corresponding to URL's added between existing web documents, and so are virtually all new social/friendship links formed between individuals who already have other friends. To reflect this, we select m pairs of vertices already present in the graph and according to πk,k′ add a single edge between them. In order to choose the form of πk,k′, we take inspiration from measurements made on real networks34,35, suggesting that internal links are formed with probability πk,k′ ~ (s + tk)(s′ + t′k′), incorporating both random (s, s′) and preferential (t, t′) attachment. This form allows us to factorize the joint probability πk,k′ into the product , and assuming s = s′, t = t′, we choose, where B = (s + t〈k〉)−1.
Many real systems also experience node deletion, reflecting for example, the departure of an employee from an organization or the removal of a document from the WWW. To account for this phenomenon, with probability r we randomly remove a single vertex from the graph, such that r < 1 corresponds to a growing network, while r = 1 represents a network of fixed size where deletion is balanced by growth. In principle r can be greater than 1 (of course then it ceases to be a probability), in which case we have shrinking networks36,37,38 that eventually disintegrate. In this paper, however, we restrict ourselves to the case of growing networks and thus exclude this case.
Finally networks may also experience the deletion of individual links between nodes. In fact this is probably more common than node deletion, as URL's between webpages are frequently removed or relationships between friends in a social network are terminated while they continue to maintain ties with other acquaintances. Therefore with probability q we randomly select m existing pairs of vertices and remove the edge between them.
We thus have eight parameters in the model, their role being summarized in Table 1, while the four processes captured by the model are schematically illustrated in Fig. 1. Now that we have our basic ingredients, we can write down a rate equation that captures the evolution of the resulting network, The term δkc represents the addition of a vertex with degree c, while πk represents the flow of degrees from from k − 1 to k and k to k + 1 owing to the addition of a single edge from the new vertex. Terms involving πk,k′ represent the flow of degrees due to the addition of a single edge between two existing vertices, while the combinatorial factor 2 accounts for the fact that each end of the edge can connect to a vertex with degree k or k′. The terms (k + 1)pk+1 and kpk describe the flow from degree k + 1 to k and from k to k − 1 as vertices lose edges when one of their neighbors is removed from the network. The term rpk represents removal of a vertex of degree k with probability r, while ek,j is the probability that a randomly selected edge has a vertex of degree k on one end and another of degree j on the other. Contributions from processes in which a vertex gains or loses two or more edges in a single unit of time vanish in the limit of large n and have been neglected.
The rate equation (3) is fairly complex due to the presence of the joint probabilities πk,k′ and pk,k′. However if we assume that the network lacks degree correlations, then pk,k′ can be factorized as pkpk′, while Σjek,j = kpk/〈k〉. With the aid of generating functions (Supplementary Methods, Sec. S1) this can be recast in differential equation form thus, where g(z) = Σk pkzk and θ, β, α are functions of the parameters listed in Table 1. (See Supplementary Methods, Eq. (S3) for their explicit forms.)
Average degree 〈k〉
The solution to Eq. (4) is non-trivial. We can make progress, however, if the average degree 〈k〉 depends only on the free parameters of the model c, m, r, q. Note that at each time step the net number of vertices added is 1 − r, each of which has c edges. There are m edges added between existing pairs of vertices, while the average number of edges removed when removing a randomly chosen vertex is by definition 〈k〉. The number of edges removed between existing pairs of vertices is q × m. Therefore in each time-step the mean number of edges added is c + m − r〈k〉 − qm. For a graph with e edges and n vertices 〈k〉 = 2e/n. After time τ we have n = (1 − r)τ and assuming that 〈k〉 has an asymptotically constant value, e = (c + m − r〈k〉 − qm)τ. Substituting and re-arranging, we obtain
Solutions for the degree distribution pk
We proceed to solve Eq. (4) to determine the degree distribution pk. While we can solve the equation with all its components numerically, it is difficult to get a closed form analytic expression. We therefore first treat the case with only the node and link addition processes (which we can solve exactly) and then use an approximation to include the deletion processes.
We start by considering the case when vertices and edges are added but never removed (left panel of Fig. 1). In this case r, q = 0 and thus α = 0. With this simplification, and after a sequence of manipulations (Supplementary Methods, Sec. S2) it can be shown that this leads to a degree-distribution, where B(x, y) = Γ(x)Γ(y)/Γ(x + y) is the Beta function. For large x, we have B(x, y) ≈ x−y and thus asymptotically pk ~ (k + k0)−γ, a shifted power-law, where,
In Fig. 2a we plot pk as a result of numerical simulations of the evolution process described here, along with the theoretical expression (6). As the figure shows the agreement between the two is excellent.
We can isolate the effect of each growth process by setting its associated parameter to 0. The combinations are listed in Table 2, which allows us to draw the following conclusions.
Initial attractiveness/random attachment
Once present these two have the following consequences:
Increases the degree exponent γ. As we see from Table 2 its primary effect is to introduce positive contributions to the exponent γ, making the network more homogenous. For example in the simple case of preferential and random attachment of external links, we have, . This means that γ is always greater than 3 and therefore the second moment 〈k2〉 is finite affecting both network robustness39,40,41,42 and spreading phenomena43,44. In general the contributions are simple additive perturbations for each random process, in combination with internal or external preferential attachment. When both external and internal preferential attachment are present then the perturbations are more complex combinations than simple linear additive terms, however the qualitative behavior is the same, γ increases.
Generates a small-degree cutoff. We see that the solution is a shifted power law pk ~ (k + k0)−γ, implying a small-degree saturation at k0, where k0 is a function of the parameters c, m, a, b, s, t. In the limit , however this initial attractiveness loses relevance and pk has a purely power law tail, a phenomenon that can be understood from the fact that initial attractiveness predominantly favors small-degree nodes.
To understand the role of internal links we consider several special cases.
Double random attachment (t = 0, a, b, s ≠ 0). In this case we have external preferential and random attachment as well as random addition of internal links. The degree exponent resulting from this evolution process is . Therefore–like in the case of external links–random attachment of internal links continue to play a homogenizing role as the exponent γ > 3 for any combination of the parameters m, s, t. Indeed the random addition of internal links tends to favor lower degree nodes due to their preponderance, and consequently make them more similar to hubs by increasing their degree. In the limit where random–dominates over preferential–attachment (a, s → ∞) the distribution converges to the exponential universality class as 〈k2〉 is finite.
Double preferential attachment (a, s = 0 & b, t ≠ 0). In the case of pure double preferential attachment, both ends of a new link are proportional to the degrees k, k′ of the nodes they connect. The resulting exponent is , indicating that it varies between 2 and 3. Thus in this case, we see that the preferential addition of internal links makes the network more heterogenous. This is the result of two effects. Preferential attachment of external links creates a power law network with hubs (albeit with a fixed γ = 3), whereas the internal links preferentially connect high-degree nodes allowing them to grow faster at the expense of low-degree nodes, lowering γ below 3.
Random and Preferential attachment. In this case all parameters are non-zero, and the overall effect is a combination of the two listed above. The key thing to note, is that the range of the degree exponent is 2 < γ < ∞.
The most important phenomenon that we glean from the results is the heterogenizing influence on pk when internal links are added preferentially. Even in combination with the other effects, there are parameter ranges where γ < 3 and since most real networks are known to have exponents in this range, this suggests that internal preferential attachment plays a key role in maintaining the documented heterogeneity in real networks. Next we examine whether node and edge deletion preserves or destroys the effects of the elementary growth processes.
Growth with deletion
In the presence of node and edge deletion, solving Eq. (4) in closed form is difficult. However, as we are primarily interested in the asymptotic form of the degree distribution, we resort to approximation methods to determine the form of pk in the tail of the distribution. In order to do so, we first simplify the expression for the attachment kernels by setting b, t = 1, such that πk = A(a + k) and fk = B(t + k). The disadvantage of this is that we cannot treat random and preferential attachment separately. However we have already explored the homogenizing role of random processes in the previous section and based on the limiting behaviors that we found, we assume pk follows a power-law with an exponential correction, pk = Ck−γΩk, and solve for γ and Ω in the limit . Next, following14,45, we employ the method of telescoping products via an expansion of pk at large k (Supplementary Methods Sec. S2). Substituting this expansion into Eq. (3) we find two solutions for Ω, namely Ω = 1 and If Ω < 1 then the solution (8) is normalizable and pk decays exponentially (with a power-law correction). However if the ratio is greater than 1, it does not correspond to a normalizable probability distribution and therefore the correct solution is Ω = 1, leading to a purely power law distribution pk ~ k−γ. This suggests that one of the primary effects of the deletion process is to induce a topological phase transition at the point Ω = 1, separating an exponential regime from a power-law regime. This phase transition has previously been pointed out by45, in the limited context of node addition and deletion. We find that this phenomenon is robust to the inclusion of the full set of growth processes considered here.
With a further simplifying assumption, we can define a single critical parameter that determines the scaling regime. If a = s, or in other words the degree of external and internal random attachment is the same, then (8) reduces to: Ω = A(c + 2m)/(r + 2qm/〈k〉). Substituting in the expressions for A, 〈k〉, we find a strictly positive quantity, such that for a > ac the distribution is exponential, whereas for a < ac the distribution follows a power-law. At the critical point ac it can be shown that pk has the stretched exponential form (Supplementary Methods, Sec. S2).
Below ac we find and in Table 3 we list the exponent γ as a function of different parameter combinations, now including deletion. In Fig. 2b, we plot pk as a result of numerical simulations of the evolution process and compare it to the theoretical expression (10), finding that the agreement between the two is very good.
We can define one more critical point within the power-law regime, in terms of a critical parameter that separates power-laws with finite second moments (i.e γ > 3) and those with infinite second moment (2 < γ ≤ 3). To do so we set (10) equal to 3 and solve for a to find Therefore for we have a power-law with exponent γ > 3 and for we have 2 < γ ≤ 3. Note that negative values of are possible. In fact certain authors have suggested12 that one can generate power-laws with exponent γ < 3 if the parameter a is negative (one can see this in Table 2 by setting either a, b < 0). It is however unclear as to what a negative value of a might mean. The most logical way to interpret a is either as random attachment, or a fitness/intial attractiveness parameter, and there does not seem to be a reasonable argument for a vertex to have negative fitness. Consequently, we require a > 0, allowing us to define yet another critical value, such that if r > rc the phase with 2 < γ ≤ 3 disappears (as is negative) leaving us only with an exponential phase and a power-law phase with γ > 3. This is fairly easy to understand if the condition is recast in a different form. Recall that the existence of the state (2 < γ ≤ 3) is driven by internal preferential attachment which is parameterized by m. Using (5) we can rewrite Eq. (12) as the condition r〈k〉 + 2mq > 2m. The term r〈k〉 + 2mq however is just the average number of links that are removed in a given time-step through node and edge deletion, whereas 2m is the number of internal links added via preferential attachment. So if the number of deleted links is greater than the number of internal links added, than the effect of internal preferential attachment is suppressed and therefore the state vanishes.
In Fig. 3 we plot the three phases, exponential (blue), power-law γ > 3 (red) and power-law 2 < γ ≤ 3 (green), also showing the random attachment parameter a as a function of the deletion parameters r, q. In order to separate the effects of node and edge deletion we set q = 0 in Fig. 3a and r = 0 in 3b. In both cases, we have the existence of three phases separated by the ac, curves. However, we see that edge deletion permits a much larger range in the phase-space for the existence of a power-law degree distribution (especially for the green phase). The critical parameters ac(c, m, r, q) and in conjunction with Table 3 allow us to discuss the effects of each deletion process.
Node deletion has a strong homogenizing effect on the degree distribution, inducing a topological phase transition from a power law to an exponential phase. Its effect is better understood by looking at specific limits.
(a, q = 0) When including only external preferential attachment, we have . For r < 1, the number of removed nodes is less than that of newly introduced nodes and hence the network exhibits net growth. However γ increases in function of r and thus the network is more homogenous. Specifically, in the limit r → 1, we see that γ diverges when there is only external preferential attachment, and the degree distribution transitions to a stretched exponential27. This can be explained by the fact that random node removal serves as a pruning of the degree of the high-degree nodes (since the nodes that are being removed are more numerous low degree ones which are connected to the hubs resulting in a peak for pk near 〈k〉). Thus when the addition of a node is compensated by the deletion of one, the increase of neighbors of hubs (from the addition of the new node) is balanced by the removal of its low degree neighbors, ultimately resulting in the homogenization of the network.
On the other hand in the presence of internal preferential attachment the degree exponent is and one can see that the divergence is suppressed. The adverse effect of deletion is compensated by preferentially connecting hubs together, thus maintaing the heterogenous character of the network. In this regime all three phases can co-exist, although at r = rc the green state vanishes for reasons explained earlier.
(a ≠ 0, q = 0). The compensating effect of internal preferential attachment is eventually overcome with the introduction of random addition. The homogenizing effect of this, in conjunction with node deletion eventually induces a topological phase transition between a power law (red) and exponential (blue) phases at a = ac, which in this parameter regime is . Note, that ac ≥ 0 for all r, and therefore the phase transition exists whenever there is any node deletion. In addition to this the power law region is separated into the red and green phases at . One again, the green state vanishes at r = rc.
Edge deletion has a similar effect to node deletion, however, it admits a wider region in phase space for power laws. We once again examine these by looking at different limits.
(a, r = 0) The value of the exponent in this regime is . Unlike node deletion, we see that γ ranges between 2 and 3 for all values of q. In the limit q → 1, γ = 3. This is easy to understand, since for q = 1 the number of internal edges added and those removed are the same, and thus we effectively only have preferential attachment of external edges and recover that limit.
(a ≠ 0, r = 0) In this limit, just as in node deletion, there is a phase transition at . Once again, we see that the phase transition is present for all values of q. The curve marking the separation between the green and red phases is now, , which is positive for any q. So unlike the case for node deletion, the green phase exists for all values of q.
Node and edge deletion
In general all three phases co-exist. The corresponding limiting behaviors are:
Homogenous regime (r = rc) The green phase vanishes and only two phases survive (blue and red).
Exponential regime (r, q → 1) Red and green phases vanish and only the blue phase survives.
Heterogenous regime (r, q → 0) The blue phase vanishes while the green and red phases survive.
Taken together the results suggests that the form of the degree distribution pk is in general a highly complex interplay between the different parameters, and is determined by the dominant elementary process. While the combined effect is complex, we have been able to clearly outline the role of the individual mechanisms, which are:
A pure power law with γ = 3 emerges if the links are added only via preferential attachment. This can be thought of as the “backbone” or starting point for understanding the degree distribution of real networks. When one includes initial attractiveness of nodes or random attachment of the links, these lead to a small degree saturation of the distribution by introducing a shift k0. Furthermore it homogenizes the network by making γ > 3, driving it toward the exponential universality class.
When placed between nodes randomly, internal links have the same effect as initial attractiveness. However, when preferentially added, they tend to link together high degree nodes, allowing them to grow faster than low degree ones and thus the resulting network becomes more heterogenous. In conjunction with external preferential attachment this lowers the exponent to γ < 3.
Node and edge deletion
Taken together node and edge removal have a disruptive influence on the network topology. Random node removal depletes the low-degree nodes (since they are more numerous) while random edge removal depletes the high-degree nodes (since they have the most links) and their combined effect is to drive the exponent γ far from 3, thus making the network more homogenous. In particular as r → 1 the power law form of the distribution is destroyed and the network undergoes a topological phase transition to a stretched exponential. When combined with random attachment (parametrized by a), this happens for r, q < 1 at a critical value ac such that for a > ac the network has an exponential distribution whereas for a < ac it continues to follow a power-law. The power-law phase includes a region with 2 < γ ≤ 3 (due to internal link addition). This however vanishes for r > rc, when the number of deleted links exceeds the number of added internal links.
Thus random attachment and deletion act as homogenizing forces, conspiring against the heterogenizing force, preferential attachment. The resulting degree distribution, whether exponential or power law, depends on which of these dominate. However, the important thing to note is that there are wide regions in phase space of Fig. 3 that permit networks with γ < 3 when all of these elements co-exist. This is particularly important as in most real networks (for which we have 2 < γ ≤ 4), several of the elementary processes discussed here do appear together. In citation networks, for example, there are no deletion effects (in principle citations can be retracted, but this is rare) although empirical measurements suggest the presence of initial attractiveness and preferential attachment46. Our results indicate that the degree exponent should be a shifted power law with γ > 3 and this is precisely what is found47. Many other networks, where deletion effects are present, have degree exponents γ < 3 and our findings indicate that the reason for observed forms for pk strongly depends on the presence of preferential attachment of internal links (in combination with external links), as well as net growth, where vertex and edge addition outstrip their deletion.
One can augment the findings here by generalizing the framework to directed networks15,48, including non-linear corrections to preferential attachment12,14, increasing average degree49,50,51, edge rewiring22 or aging of vertices17,18,19,20 among other effects. Each of these will of course introduce perturbations to our solutions, but the qualitative behavior should remain within the bounds determined by the elementary mechanisms discussed here.
Solving for pk using generating functions
Given a rate equation involving pk and of the form , we can convert this into a differential equation by the use of generating functions . Multiplying the rate equation by zk, summing over k and noticing that terms in kpk can be written as dg(z)/dz, we arrive at a differential equation of the form dg(z)/dz = F(g(z)). Assuming that a solution to the differential equation exists in closed form (typically special functions like the Beta function or Hypergeometric functions), this can then be expanded in a power series of z, following which pk is determined by comparing coefficients.
Solving for pk using telescoping products
Frequently a closed form solution to such a differential equation does not exist. Nevertheless one can make progress if one is interested in the form of the distribution for large k (the tail of pk). Typically a guess is made to the general form of pk either through heuristic arguments or by examining the results of numerical simulations. In the case discussed in this manuscript, we chose pk = Ck−γΩk. A high degree expansion is then performed for the telescoping products pk/pk−1 and pk/pk+1 in powers of 1/k, which is then substituted back into the rate equation Eq. (3). Ignoring terms in 1/k (since we are interested in the limit ) and setting terms in k to zero gives us solutions for Ω. Depending on the regime we are interested in, the corresponding solution for Ω is substituted back into the equation and setting the k-independent term to zero gives us the solution for γ.
We thank C. Song for valuable discussions. This work was supported by the Network Science Collaborative Technology Alliance sponsored by the US Army Research Laboratory under Agreement Number W911NF-09-2-0053; the Defense Advanced Research Projects Agency under Agreement Number 11645021; the Defense Threat Reduction Agency award WMD BRBAA07-J-2-0035; and the generous support of Lockheed Martin.
This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/