Emergence of Soft Communities from Geometric Preferential Attachment

All real networks are different, but many have some structural properties in common. There seems to be no consensus on what the most common properties are, but scale-free degree distributions, strong clustering, and community structure are frequently mentioned without question. Surprisingly, there exists no simple generative mechanism explaining all the three properties at once in growing networks. Here we show how latent network geometry coupled with preferential attachment of nodes to this geometry fills this gap. We call this mechanism geometric preferential attachment (GPA), and validate it against the Internet. GPA gives rise to soft communities that provide a different perspective on the community structure in networks. The connections between GPA and cosmological models, including inflation, are also discussed.

Internet as an example, we demonstrate that GPA generates networks that are in many ways similar to real networks.

Results
Geometric Preferential Attachment. In growing networks the concept of popularity that PA exploits is just one aspect of node attractiveness; another important aspect is similarity 36 . Namely, if nodes are similar (''birds of feather''), then they have a higher chance of being connected (''flock together''), even if they are not popular. This effect, known as homophily in social sciences 37 , has been observed in many real networks of various nature 38,39 .
The GPA mechanism utilizes the idea that both popularity and similarity are important. We take the node birth time t 5 1,2,... as a proxy for node's popularity: all other things being equal, the older the node (i.e. the smaller t), the more popular it is. The similarity attribute of node t is modeled by a random variable h t distributed over a circle S 1 that abstracts the ''similarity'' space. One can think of the similarity space as an image of a certain projection p : A?S 1 from a space of unknown or not easily measurable attributes (a 1 , . . . ,a k ) [ A of nodes. For social networks, these attributes could be political beliefs, education, and social status, whereas for biological networks, {a i } may represent chemical properties of metabolites or geometric properties of protein shapes. While the absolute value of the similarity coordinate h t~p (a 1 t , . . . ,a k t ) does not have any specific meaning, the angular distance h st 5 p 2 j p 2 j h s 2 h t jj quantifies the similarity between two nodes. Upon its birth, a new node t connects to an existing node s , t if s is both popular enough and similar to t, that is if ½ is a parameter that controls the relative contributions of popularity and similarity.
The described rule for establishing new connections admits a simple geometric interpretation which is very useful for analytical treatment of the model. Let us define the radial coordinate of node s at time s as r s 5 2 ln s, and let it grow with time, so that at time t . s it is r s (t) 5 br s 1 (1 2 b)r t . The distance x st between two points in the hyperbolic plane of curvature K 5 21 with polar coordinates (r s (t), h s ) and (r t ,h t ) is approximately 40 x st~rs (t)zr t z2 ln h st 22 ln s b t 2{b h st 2 . Since for any given t, the sets of nodes s , t that minimize s b h st and x st are the same, new nodes simply connect to the hyperbolically closest existing nodes. Note that the increase of the radial coordinate r s (t) decreases the effective age of the node, and thus models the effect of popularity fading observed in many real networks 41 . But how do new nodes find their positions in this similarity space? The main assumption of our model is that the hidden attribute space A of a growing network is likely to contain ''hot'' regions (e.g. of human activity), and that the hotter the region, the more attractive it is for new nodes. Hot regions can for instance represent some hot areas in science. When these regions are projected onto the similarity space S 1 , the hotness manifests itself by a higher node density, more scientists working in a hot area. The higher attractiveness of a hot region is then modeled by placing a new node in this region with the higher probability, the hotter this region is, i.e. the higher the node density in it. That is, new scientists are expected to begin their careers working in hot areas where many existing scientists are already active, versus jumping onto some obscure developments that nobody understands. Therefore the higher the node density in a particular section of our similarity space S 1 , the higher the probability that a new node is placed in this section. Intuitively we would expect that this process should lead to heterogeneous distributions of node coordinates in the similarity space. This intuition is confirmed by empirical results: if we map real networks to their hyperbolic spaces 42,43 , we observe that the resulting empirical angular node density is not uniform (e.g. see Fig. 5(a)), and nodes tend to cluster into tight communities. In the Internet, for example, these communities are groups of Autonomous Systems belonging to the same country.
There are many ways to implement this general idea. For a variety of reasons we found that the most natural and consistent one is as follows. First we define the attractiveness of any location Q[S 1 for a new node t with radial coordinate r t as the number of existing nodes s , t lying in the hyperbolic disk D Q (r t ) of radius r t centered at (r t ,Q). The higher the attractiveness of a location Q, the higher the probability that a new node t will chose this location as its place h t 5 Q in the similarity space. We refer to this mechanism as the geometric preferential attachment (GPA) of nodes to the similarity space. This mechanism is illustrated in Fig. 1.
The exact definition of the GPA model is: 0. Initially the network is empty. New nodes t appear one at a time, t 5 1,..., and for each t: 1. The angular (similarity) coordinate h t of a new node t is determined as follows: ..,t, uniformly at random. The set of pointst 1~( r t ,Q 1 ), . . . ,t t~( r t ,Q t ) in the hyperbolic plane are the ''candidate'' positions for the newborn node; (b) Define the attractiveness A t (Q i ) of the i th candidate as the number of existing nodes that lie within hyperbolic distance r t from it; (c) Set h t 5 Q i with probability where L $ 0 is a parameter, called the initial attractiveness.
2. The radial (popularity) coordinate of node t is set to r t 5 2 ln t.
The radial coordinates of existing nodes s , t are updated to r s (t) 5 br s 1 (1 2 b)r t . Figure 1 | Geometric preferential attachment. At time t, a new node appears at distance r t from the center of the hyperbolic disk denoted by cross. Points Q 1 and Q 2 represent two potential locations of the new node, and the drop-shaped curves are the boundaries of the hyperbolic disks D Q 1 (r t ) and D Q 2 (r t ) of radius r t centered at Q 1 and Q 2 . Since similarity is attractive and D Q 1 (r t ) contains more nodes (five) than D Q 2 (r t ) (none), the new node is more likely to appear at h t 5 Q 1 . 3. Node t connects to m hyperbolically closest existing nodes (if t # m, then node t connects to all existing nodes).
The GPA model has thus three parameters: the number of links m established by every new node, the speed of popularity fading b, and the initial attractiveness L. A moment's thought shows that m controls the average degree of the network, k~2m. We prove in Methods that the model generates scale-free networks and b controls the power-law exponent c. The initial attractiveness L controls the heterogeneity of the angular node density, namely, the heterogeneity is a decreasing function of L. When L R ', the GPA model becomes manifestly identical to the homogeneous popularity 3 similarity (PS) model 36 , where the angular coordinate h t of a new node t is sampled uniformly at random on [0, 2p]. Note, however, that in GPA, choosing a position in the similarity space is an active decision made by a node based on the attractiveness of different locations, as opposed to ''passive'' uniform randomness in PS. In standard PA, the initial attractiveness term is used to control the exponent of the power-law degree distribution 7,8 . In what follows we show that in GPA, L controls certain properties of the community size distribution. Figure 2 shows the simulation results for networks of size n 5 10 3 generated by the GPA model with m 5 3 (i.e. each new node connects to the three hyperbolically closest nodes), b 5 2/3, and different values of L. As expected, the smaller the value of L, the more heterogeneous the distribution of angular coordinates. To quantify the difference between the empirical distribution of the angular coordinates and the uniform distribution on [0, 2p], we use the Kolmogorov-Smirnov (KS) statistic, one of the standard distances that measures the difference between two probability distributions. Recall that the KS statistic r is defined as the maximum difference between the values of the empirical distributionF n (h) of the sample h 1 ,...,h n and the uniform distribution The KS statistic as a function of L is shown in the bottom panel of Fig. 2. As expected, r(L) is a decreasing function of L.
Degree Distribution. For each of the three networks depicted in Fig. 2, the statistical procedure for quantifying power-law behavior  in empirical data proposed in Ref. 44 accepts the hypothesis that the network is scale-free. It estimates the lower cutoff for the scaling region as k min 5 3, which is consistent with the minimum degree in the networks m 5 3. Figure 3(a) shows a doubly logarithmic plot of the empirical degree distributions P(k) , k 2c along with the fitted power-law with exponent c 5 2.5. These empirical results show that the degree distribution of a network generated by GPA appears to be a power-law. Moreover, quite unexpectedly, the power-law exponent c remains similar for different values of L. These results can be proved analytically (see Methods for details). Remarkably, for any value of L, the GPA model produces scale-free networks with the power-law degree distribution identical to the degree distribution in networks growing according to PA, and having power-law exponent c 5 1 1 1/b.
Clustering Coefficient. The concept of clustering 45 quantifies the tendency to form cliques (complete subgraphs) in the neighborhood of a given node. Specifically, the local clustering coefficient of node s is defined as the probability that two nodes s9 and s0, adjacent to s, are also connected to each other. Figure 3(b) shows the average value of the clustering coefficient c(k) for nodes of degree k as a function of k for the three networks in Fig. 2. Interestingly, clustering does not depend on L either (a proof is in the Methods), and scales approximately as k 21 . This means that, on average, the nodes with higher degree have lower clustering, which is consistent with empirical observations of clustering in real complex networks 11,46 . For all the three PGA networks, the mean clustering (the average of the local clustering coefficients) is high, c~0:88.
Soft Communities. The hyperbolic space underlying a network and the GPA mechanism of node appearance in that space naturally induce community structure and allow to detect communities in a very intuitive and simple way. A higher density of links within a community indicates that its nodes are more similar to each other than to the other nodes, because links connect only nodes located within a certain similarity distance threshold. All such densely linked nodes are thus close to each other in some area of the similarity space, meaning that the spatial node density is high in this area. Therefore a community becomes a cluster of spatially close nodes, and the community structure is encoded in a non-uniform distribution of angular (similarity) coordinates of nodes.
Following the approach in Ref. 47, let us consider the angular gaps Dh between consecutive nodes, and define a soft community as a group of nodes separated from the rest of the network by two gaps that exceed a certain critical value Dh c . If a network has a total number of n nodes, then the critical gap Dh c is defined as the expected value of the largest gap Dh Having a geometric interpretation of the community structure, it is now easy to quantity how well communities are separated from each other. For each community C, we define its separation from the rest of the network S C ð Þ as the rescaled average of two gaps Dh 1 ,Dh 2 . Dh c that separate C from its neighboring communities, The mean community separation, i.e. the expected separation of a community that a randomly chosen node belongs to, can then be computed as follows: S~X where n i is the size of community C i and n c is total number of communities. The network metric S can also be viewed as a measure of narrowness (or specialization) of communities. For example, in scientific collaboration network, where nodes represent scientists and communities correspond to groups with similar research interests, S quantifies the degree of interdisciplinarity in the network. When S is large, the boundaries between communities are sharp and each community focuses on its narrow, specific topic. On the other hand, if S is close to one, then the boundaries are blur, communities are wide spread, and the network is highly interdisciplinary.
The difference in the stochastic behavior of the rescaled gaps in Fig. 4 suggests that the initial attractiveness L controls the mean community separation S in the GPA-generated networks. This is  confirmed by simulation results shown in Fig. 5(c), where S is shown as a function of L. As expected, S(L) is a monotonically decreasing function, approaching one when L is large.
The Internet. To demonstrate the ability of the GPA mechanism to generate graphs that are similar to real networks, and, in particular, to reproduce real non-uniform distributions of similarity node coordinates, we consider the Autonomous Systems (AS) Internet topology 48 of December 2009. The network consists of N 5 25910 nodes, ASs, and M 5 63435 links that represent logical relationships between ASs. We embed the AS Internet into its hyperbolic space, i. e compute the popularity and similarity coordinates {r i ,h i }, using HyperMap 43 , an efficient network mapping algorithm that estimates the latent hyperbolic coordinates of nodes. The network topology has a power-law degree distribution with c 5 2.1 and average node degree k<5. This automatically determines two out of three parameters of the GPA model: m~ k=2 and b 5 1/(c 2 1). In Methods, we explain how to infer the value of L from network data using the maximum likelihood method. Here we consider the snapshot of the AS Internet based on the first n 5 10 3 nodes. The corresponding estimated value of the initial attractiveness is L Int 5 0.7. Figure 5(a) shows the histogram of the angular node density for the AS Internet snapshot. We note that it is far from uniform, which is a direct indication of the presence of soft communities. We quantify the degree of heterogeneity of the angular density by the KS distance from the uniform distribution (2) and juxtapose it against the KS distances computed for networks generated by the GPA model with L 5 0.7 (Fig. 5(b)). The Internet value lies within the 25th and 75th percentiles of the synthetic values, which shows that the degrees of non-uniformity in the Internet and GPA networks are comparable. Fig. 5(c) compares the real network with its synthetic counterpart in terms of the expected mean community separation (5). The GPA mechanism generates networks with S that match the Internet value very well. In Fig. 5(d), we compare the community size distributions in the Internet snapshot and prediction given by the GPA model. Whereas S for the Internet and GPA networks are essentially identical, the KS statistics and community size distributions are similar, but the match is not perfect. This effect is explained by the systematic bias present in the inferred values of the angular coordinates {h i }. Indeed, the HyperMap method first assumes that all angular coordinates are uniformly distributed over the similarity space S 1 , i.e. L 5 ', and then perturbs them to maximize a certain likelihood function. This ''smoothes'' the inferred angular node density and makes it more homogeneous than the true distribution. Nevertheless, although the inferred value of L is only an approximation for the true value, the GPA model still captures well the degree of heterogeneity in the real network.
Finally we note that GPA defined in Eq. (1) admits an interesting interpretation that suggests a model extension that may be useful for real network analysis. The probability of a new node born at time t to chose the angular position Q i can be written as where Therefore the event of choosing a position on the circle can be understood as follows. With probability p f the new node is a follower and chooses its position according to pure GPA (L 5 0). With the remaining probability 1 2 p f the new node chooses its position uniformly at random among the t available positions. We note that L controls p f , since hA t i < 1. When L is constant, p f is also constant, and consequently there is always a fraction of nodes that are placed at random locations. At long times, these random nodes diminish the effect of pure GPA, and eventually the angular distribution of nodes become indistinguishable from a Poisson point process on the circle. We can then wonder whether a constant value of L is a realistic assumption for dealing with real networks. In scientific citation networks, for example, when a new field of science is being formed, and not much work has yet been done in it, scientists may decide either to explore a new line of research within the field, or to follow one of the  mainstream existing lines. The former case can be modeled by a random choice of the angular position, assuming that subfields are homogeneously distributed. The latter is modeled by the pure GPA term in Eq. (6). However, there is a payoff that does not remain constant during the evolution of the field. At early times, the chances to find an interesting result that would be highly cited and followed by others are very high. At late times, the topic space is crowded and the chances to find something fundamentally new are very slim. Therefore, there is a higher incentive for scientists to take higher risks at early times. This can be modeled by p f increasing with time, converging to a value close to 1 as time grows to infinity. In turn, this means that L is a decreasing function of time, having a large value at the beginning of network evolution, and decreasing to small values afterwards. Unfortunately, measuring the temporal evolution of L in a real network is not yet possible because there currently exists no parametric theory describing such evolution that could be used for statistical inference of L. However, it is fairly easy to find an approximate value of L as a function of time as follows. If timestamps of a real complex network are available, we can pretend that L is constant, and infer its value using the MLE techniques described in the Methods for subgraphs made of nodes that were born before a given time t, b L MLE (t). This value can be thought of as a (possibly weighted) average of L(t) in time window (0,t). By increasing the value of t, we can detect whether L is constant (if b L MLE (t) does not change with time, beyond statistical fluctuations), or a decreasing function of time. Figure 5(e) shows b L MLE (t) for the AS Internet where the strong temporal dependence of L is evident.

Discussion
In summary, hyperbolic network geometry, combining popularity and similarity forces driving network evolution, and coupled with preferential attachment of nodes to this geometry (GPA), naturally yields scale-free, strongly clustered growing networks with emergent soft community structure. The GPA model has three parameters that can be readily inferred from network data. Using the AS Internet topology as example, we have seen that the GPA mechanism generates heterogeneous networks that are similar to real networks with respect to key properties, including key aspects of the community size distribution and separation. The mean community separation, a new metric that quantifies the narrowness of communities in a network, is controlled in GPA by initial attractiveness L, which controls the power-law exponent in standard PA.
In the context of the asymptotic equivalence between de Sitter causal sets and popularity 3 similarity (PS) hyperbolic networks established in Ref. 49, we note that L is conceptually similar to the cosmological constant L in Einstein's equations in general relativity (GR), where it is also an additive term in the proportionality between the energy-momentum tensor and spacetime curvature. Causal sets 50,51 are random graphs obtained by Poisson sprinkling a collection of nodes onto (a patch) of a Lorentzian manifold; edges in these graphs connect all timelike-separated pairs of nodes. If there is no matter (empty spacetime) but there is only dark energy (positive L), then the solution of Einstein's equations is the de Sitter spacetime, and the main theorem in Ref. 49 states that the ensemble of PS graphs is asymptotically (n R ') identical to an ensemble of causal sets sprinkled onto de Sitter spacetime, which is one of the three maximally symmetric, homogeneous and isotropic Lorentzian manifolds (the other two are Minkowski and anti-de Sitter spacetimes). In this context, the GPA model considered here is a model with cosmological constant L and matter. Modeled by high node density, this matter, as in GR, ''attracts more matter'', thus increasing the spacetime curvature of which the node density is a proxy. Indeed the main feature of the model is that the higher the node density in a particular region of space, the more nodes will appear in this region later. The main difference with GR is that here we essentially have an analogy with only the 00-component of Einstein's equations. One can envision that other components should describe the coupled dynamics of the similarity space and nodes in it. In case of scientific collaboration network, for example, that would be the co-evolution of science (space) itself, and interests of scientists (node dynamics in this space). In the model considered here nodes do not move. Finding the laws of their spatial dynamics that may further strengthen the analogy with general relativity is a promising but challenging research direction.
In that context, the decay of initial attractiveness L that we found in the Internet must be analogous to the decay of cosmological constant L in modern cosmological theories. Cosmic inflation 52,53 is widely accepted as the most plausible resolution of many problems with the classical big bang theory, including the flatness problem, the horizon problem, and the magnetic-monopole problem. Inflation is  an initial period of accelerated expansion of the universe during which gravity was repulsive. Inflation does not last long, and can be modeled as a time dependent cosmological ''constant'' L that initially has a high value and then decays to zero. The analogies between GPA with decaying L and inflation go even further, producing similar outcomes as far as the spatial distribution of events is concerned. Indeed, cosmic inflation has the effect of smoothing out inhomogeneities so that once inflation is over, the universe is nearly flat, isotropic, and homogeneous, except for quantum fluctuations of the inflaton field. These fluctuations are the seeds of future inhomogeneities that we observe in the universe at scales smaller than 100 Mpc. In the GPA context, a high value of L has also a homogenizing effect. Indeed, if L is large, then p f is small, and new nodes chose their angular positions at random, producing a Poisson point process on the circle. Once L is small enough, we are left with a random distribution of points with Poisson fluctuations that, as in the universe, are the seeds of future communities in the network (galaxies in the universe), because once L is nearly zero, these initial fluctuations are reinforced by pure preferential attachment.

Methods
Invariance of the degree distribution and clustering. Here we prove that the degree distribution and clustering coefficient in the networks generated by the GPA model do not depend of the initial attractiveness L. Moreover, the degree distribution is power-law with exponent c 5 1 1 1/b. The proof can be reduced to the proof for the homogeneous PS model 36 ( Supplementary Information, Section IV). Consider a new node t, and let R t be the radius of a hyperbolic disk centered at this node such that t is connected to all nodes s , t that lie in this disc. Then the probability P GPA (s,t) that nodes t and s , t in the GPA model are connected can be computed as follows: where x st~rs (t)zr t z2ln h st 2 is the hyperbolic distance between nodes s 5 (r s (t),h s ) and t 5 (r t ,h t ) at time t. Using the total probability theorem, wheret i are the candidate positions generated at Step 1(a), and P t (i) are the corresponding acceptance probabilities (1). Applying the total probability theorem with respect to node s, we have: Since the angular coordinates of the candidate positionsŝ j andt i are uniformly distributed on [0, 2p], the probability P(h^s jti ƒa) is simply a/p. Therefore, where the last equality holds because X t i~1 P t (i)~X s j~1 P s (j)~1. We note that P GPA (s,t) does not depend on L, and that it is exactly the same as the probability P PS (s,t) of having a link between nodes t and s , t in the homogeneous PS model. The rest of the proof repeats the proof in Ref. 36 without a change. This leads to which means that the resulting degree distribution in GPA is identical to PA: it is the power-law with exponent c 5 1 1 1/b. Since the connection probability P GPA (s,t) does not depend on L, neither does clustering.
Critical gap. To obtain a closed-form expression for the critical gap, we note that for large n, the sequence h 1 ,...,h n , U[0,2p] can be approximately viewed as a realization of the Poisson point process on the circle of unit radius with density l 5 n/2p. In this case, the distribution of the angular gaps is approximately exponential with rate l. The maximum gap Dh (n) has then the following PDF f Dh(n) (x)ñ , and its expected value can be calculated as follows: where H n is the n th harmonic number, and c is Euler's constant.
Inference of L. The initial attractiveness L controls the distribution of angular coordinates h 1 ,...,h n of the nodes. We therefore first infer h i using the HyperMap method 43 . Given the network embedding f(r i ,h i )g n i~1 into its hyperbolic space, the likelihood function L(Ljh 1 , . . . ,h n ) can be written as follows: L(Ljh 1 , . . . ,h n )~P(h 1 , . . . ,h n jL)~P(h 1 jL)P(h 2 jL,h 1 ) . . . P(h n jL,h 1 , . . . ,h n{1 ) where A t (Q) is the attractiveness of location Q [ S 1 , that is the number of existing nodes at time (t 2 1) that lie within distance r t from (r t ,Q). The log-likelihood is then (up to an additive constant): The multiple integrals in (15) cannot be calculated analytically, since the attractiveness function cannot be written in closed-form. Nevertheless, the loglikelihood can be efficiently estimated be the Monte Carlo method. First, generate N Monte Carlo samples, Q Computing attractivenesses of the Monte Carlo samples A t (Q (j) i ) involves computing O(n 3 N) hyperbolic distances, which is the most computationally intensive part of the algorithm. Having all attractivenesses computed, we can then estimate l(L) for any L[ 0 : DL : L max ½ , and find the maximum likelihood estimate (MLE) b L MLE . An important observation that drastically improves the efficiency of the algorithm is that we do not have to use the entire network to accurately estimate b L MLE , the first n 0 =n nodes are often enough. Table 1 shows the MLEs b L MLE (n 0 ) obtained from the first n 0 5 100,200,500, and 1000 nodes of the networks generated by the GPA model with L 5 0,0.2,0.5,0.7,1, and 2. The corresponding log-likelihood functions are shown in Fig. 6. These simulation results show that the smaller the true value of Land we expect it to be small in real networks since most of them have community  www.nature.com/scientificreports structurethe less network data we need to pin b L MLE down. If, for example, L 5 0, then the MLE of L based on the first n 0 5 100 nodes is already zero. The larger the true value of L, however, the flatter the log-likelihood is around its maximum, which makes inference more challenging.