Emergent Complex Network Geometry

Networks are mathematical structures that are universally used to describe a large variety of complex systems such as the brain or the Internet. Characterizing the geometrical properties of these networks has become increasingly relevant for routing problems, inference and data mining. In real growing networks, topological, structural and geometrical properties emerge spontaneously from their dynamical rules. Nevertheless we still miss a model in which networks develop an emergent complex geometry. Here we show that a single two parameter network model, the growing geometrical network, can generate complex network geometries with non-trivial distribution of curvatures, combining exponential growth and small-world properties with finite spectral dimensionality. In one limit, the non-equilibrium dynamical rules of these networks can generate scale-free networks with clustering and communities, in another limit planar random geometries with non-trivial modularity. Finally we find that these properties of the geometrical growing networks are present in a large set of real networks describing biological, social and technological systems.

In the apparently unrelated field of quantum gravity, pregeometric models, where space is an emergent property of a network or of a simplicial complex, have attracted large interest over the years [33][34][35][36][37][38][39]. Whereas in the case of quantum gravity the aim is to obtain a continuous spacetime structure at large scales, the underlying simplicial structure from which geometry should emerge bears similarities to networks. Therefore we think that similar models tailored more specifically to our desired network structure (especially growing networks) could develop emergent geometrical properties as well.
Here our aim is to propose a pregeometric model for emergent complex network geometry, in which the non-equilibrium dynamical rules do not take into account any embedding space, but during its evolution the network develops a certain heterogeneous distribution of curvatures, a small-world topology characterized by high clustering and small average distance, a modular structure and a finite spectral dimension.
In the last decades the most popular framework for describing the evolution of complex systems has been that of growing network models [1][2][3]. In particular, growing complex networks evolving by the preferential attachment mechanism have been widely used to explain the emergence of the scale-free degree distributions which are ubiquitous in complex networks. In this scenario, the network grows by the addition of new nodes, and these nodes are more likely to link to nodes already connected to many other nodes according to the preferential attachment rule. In this case the probability that a node acquires a new link is proportional to the degree of the node. The simplest version of these models, the Barabasi-Albert (BA) model [40], can be modified [1][2][3] in order to describe complex networks that also have a large clustering coefficient, another important and ubiquitous property of complex networks that, together with the small typical distance between the nodes, characterizes small-world networks [41].
Moreover it has been observed that complex biological, social and technological networks not only have high clustering but also have a structure which suggests that the networks have a hidden embedding space, describing the similarity between the nodes. For example the local structure of protein-protein interaction networks, analysed with the tools of graphlets, suggests that these networks have an underlying non-trivial geometry [42,43].
Another interesting approach to complex networks suggests that network models evolving in a hyperbolic plane might model and approximate a large variety of complex networks [24,25]. In this framework nodes are embedded in a hidden metric structure of constant negative curvature that determines their evolution in such a way that nodes closer in space are more likely to be connected.
But is it really always the case that the hidden embedding space is causing the network dynamics or might it be that this effective hidden metric space is the outcome of the network evolution?
Here we want to adopt a growing network framework in order to describe the emergence of geometry in evolving networks. We start from non-equilibrium growing dynamics independent of any hidden embedding space, and we show that spatial properties of the network emerge spontaneously. These networks are the skeleton of growing simplicial complexes that are constructed by gluing together simplexes of given dimension. In particular in this work we focus on simplicial complexes built by gluing together triangles and imposing that the number of triangles incident to a link cannot be larger than a fixed number m that parametrizes the network dynamics. In this way we provide evidence that the proposed stylized model, including only two parameters, can give rise to a wide variety of network geometries and can be considered a starting point for characterizing emergent space in complex networks. Finally we compare the properties of real complex system datasets with the structural and geometric properties of the growing geometrical model, showing that despite the fact that the proposed model is extremely stylized, it captures the main features observed in a large variety of datasets.

RESULTS
Metric spaces have to satisfy the triangular inequality. Therefore in spatial networks we must have that if a node i connects two nodes (the node j and the node k), these two must be connected by a path of short distance. Therefore, if we want to describe the spontaneous emergence of a discrete geometric space, in the absence of an embedding space and a metric, it is plausible that starting from growing simplicial complexes is an advantage. These structures are formed by gluing together simplexes of dimension $d_n > 1$, i.e. fully connected networks, or cliques, formed by $n = d_n + 1 > 2$ nodes, such as triangles, tetrahedra etc.
For simplicity, let us here consider growing networks constructed by addition of connected simplexes of dimension $d_n = 2$, i.e. triangles. We distinguish between two cases: the case in which a link can belong to an arbitrarily large number of triangles (m = ∞), and the case in which each link can belong at most to a finite number m of triangles. In the case in which m is finite we call the links to which we can still add at least one triangle unsaturated. All the other links we call saturated.
To be precise, we start from a network formed by a single triangle, a simplex of dimension $d_n = 2$. At each time step we perform two processes (see Figure 1).
• Process (a)-We add a triangle to an unsaturated link (i, j) of the network linking node i to node j. We choose this link randomly with probability $\Pi^{[1]}_{(i,j)}$ given by
$$\Pi^{[1]}_{(i,j)} = \frac{a_{ij}\,\rho_{ij}}{\sum_{r,s} a_{rs}\,\rho_{rs}}, \tag{1}$$
where $a_{ij}$ is the element (i, j) of the adjacency matrix a of the network, and where the matrix element $\rho_{ij}$ is equal to one (i.e. $\rho_{ij} = 1$) if the number of triangles to which the link (i, j) belongs is less than m, otherwise it is zero (i.e. $\rho_{ij} = 0$). Having chosen the link (i, j) we add a node s, two links (i, s) and (j, s) and the new triangle linking node i, node j and node s.
• Process (b)-With probability p we add a single link between two nodes at hopping distance 2, and we add all the triangles that this link closes, without adding more than m triangles to each link. In order to do this, we choose an unsaturated link (i, j) with probability $\Pi^{[1]}_{(i,j)}$ given by Eq. (1), then we choose one random unsaturated link adjacent either to node i or to node j, as long as this link is not already part of a triangle including node i and node j. Therefore we choose the link (r, s) with probability $\Pi^{[2]}_{(r,s)}$ (Eq. (2)), which is equal to $1/\mathcal{N}$ for every link satisfying these conditions and zero otherwise; in its explicit expression the conditions are enforced through Kronecker deltas $\delta_{x,y}$, and $\mathcal{N}$ is the normalization constant. Let us assume without loss of generality that the chosen link is (r, s) = (r, j). Then we add a link (i, r) and all the triangles passing through node i and node r as long as this process is allowed (i.e. if by doing so we do not add more than m triangles to each link).
Otherwise we do nothing.
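The two processes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB code of the Supplementary Information: the function name `grow_geometrical_network` and its arguments are hypothetical, m is assumed to be at least 2 (use `float('inf')` for m = ∞), every fully connected triple of nodes is counted as a triangle of the simplicial complex, process (b) is read as "all triangles or nothing", and growth simply stops if no unsaturated link is left (a situation not discussed in the text).

```python
import random

def grow_geometrical_network(steps, m=2, p=0.0, seed=None):
    """Sketch of processes (a) and (b); returns (adjacency, triangles per link, T)."""
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}      # initial triangle
    tri = {(0, 1): 1, (0, 2): 1, (1, 2): 1}      # number of triangles on each link
    n_triangles = 1

    def key(u, v):
        return (u, v) if u < v else (v, u)

    def unsaturated_links():
        return [e for e, c in tri.items() if c < m]

    for _ in range(steps):
        free = unsaturated_links()
        if not free:                              # assumption: nothing left to grow
            break
        # Process (a): attach a new node s to a random unsaturated link (i, j).
        i, j = rng.choice(free)
        s = len(adj)
        adj[s] = {i, j}
        adj[i].add(s)
        adj[j].add(s)
        tri[key(i, j)] += 1
        tri[key(i, s)] = 1
        tri[key(j, s)] = 1
        n_triangles += 1

        # Process (b): with probability p, link two nodes at hopping distance 2.
        if rng.random() >= p:
            continue
        i, j = rng.choice(unsaturated_links())
        candidates = []                           # pairs (a, r) to be linked via shared node b
        for a, b in ((i, j), (j, i)):
            for r in adj[b]:
                if r != a and r not in adj[a] and tri[key(r, b)] < m:
                    candidates.append((a, r))
        if not candidates:
            continue
        u, v = rng.choice(candidates)
        common = adj[u] & adj[v]                  # each common neighbour closes one triangle
        touched = [key(u, c) for c in common] + [key(v, c) for c in common]
        if len(common) <= m and all(tri[e] < m for e in touched):
            adj[u].add(v)
            adj[v].add(u)
            tri[key(u, v)] = len(common)
            for e in touched:
                tri[e] += 1
            n_triangles += len(common)
        # otherwise we do nothing
    return adj, tri, n_triangles
```

For example, `grow_geometrical_network(1000, m=2, p=0.9, seed=0)` returns the adjacency sets of one realization; the skeleton network is given by `adj` alone, while `tri` keeps the saturation information of the simplicial complex.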
With the above algorithm (see Supplementary Information for the MATLAB code) we describe a growing simplicial complex formed by adding triangles. From this structure we can extract the corresponding network where we consider only the information about node connectivity (which node is linked to which other node). We call this network model the geometrical growing network. In Figure 1 we show schematically the dynamical rules for building the growing simplicial complexes and the geometrical growing networks that describe their skeleton.
Let us comment on two fundamental limits of this dynamics. In the case m = ∞, p = 0, the network is scale-free and in the class of growing networks with preferential attachment.
In fact the probability that we add a link to a generic node i of the network using process (a) is simply proportional to the number of links connected to it, i.e. its degree $k_i$. Therefore, the mean-field equations for the degree $k_i$ of a generic node i are equal to the equations valid for the BA model and yield a scale-free network with power-law exponent γ = 3. Actually this limit of our model was already discussed in [44] as a simple and major example of a scale-free network. In the limit m = 2 we should expect a planar graph since the subgraphs $K_5$ and $K_{3,3}$ are excluded by the dynamical rules. In addition, the degree distribution can be shown to be exponential (see Methods and Supplementary Information for details).
In general the proposed growing geometric network model can generate a large variety of network geometries. In Figure 2 we show a visualization of single instances of the growing geometrical networks in the cases m = 2, p = 0.9 (random planar geometry), m = ∞, p = 0.
The growing geometrical network model has just two parameters, m and p. The role of the parameter m is to fix the maximal number of triangles incident on each link. The role of the parameter p is to allow for a non-trivial K-core structure of the network. In fact, if p = 0 the network can be completely pruned if we remove nodes of degree $k_i = 2$ recursively, similarly to what happens in the BA model, while for p > 0 the geometrical growing network has a non-trivial K-core. Moreover the process (b) can be used to "freeze" some region of the network. In order to see this, let us consider the role of the process (b) occurring with probability p in the case of a network with m = 2. Then for p = 0, each node will increase its connectivity indefinitely with time, having always exactly two unsaturated links attached to it. On the contrary, if p > 0 there is a small probability that some nodes will have all adjacent links saturated, and a degree that is frozen and does not grow any more. A typical network of this type is shown for m = 2, p = 0.9 in Figure 2, where one can clearly distinguish between an active boundary of the network, where many triangles can still be linked, and a frozen bulk region of the network.
The geometrical growing networks have a highly heterogeneous structure reflected in their local properties. For example, the degree distribution is scale-free for m = ∞ and exponential for m = 2 for any value of p. Moreover for finite values of m > 2 the degree distribution can develop a tail that is broader for increasing values of m (see Figure 3). Furthermore, in Figure 3 we plot the average clustering coefficient C(k) of nodes of degree k, showing that the geometrical growing networks are hierarchical [46], i.e. they have a clustering coefficient $C(k) \propto k^{-\alpha}$.
Another important and geometrical local property is the curvature, defined on each node of the network. In particular we take the definition of curvature by Oliver Knill [19,20], in which the curvature $R_i$ at a node i is defined as
$$R_i = \sum_{n \geq 1} (-1)^{n+1}\, \frac{N^{(i)}_n}{n}, \tag{3}$$
where $N^{(i)}_n$ is the number of simplices of n nodes and dimension $d_n = n - 1$ to which node i belongs. In the original definition of curvature for a node of a network, Knill proposed to count as simplexes all fully connected subgraphs of the network. Here, since our network is constructed as the skeleton of a simplicial complex built by adding triangles, we truncate the sum defining the curvature of the node i to simplexes of dimension $d_n \leq 2$, and we consider only nodes, links and triangles since these are the original simplices building our network. The truncated curvature is
$$R_i = 1 - \frac{k_i}{2} + \frac{t_i}{3}, \tag{4}$$
where $k_i$ is the degree of node i, and $t_i$ is the number of triangles passing through node i.
Similarly, the Euler characteristic χ of our simplicial complex and the corresponding network is given by
$$\chi = N - L + T, \tag{5}$$
where N indicates the total number of nodes, L the total number of links and T the total number of triangles in the network. We observe that this definition, like the original definition of Oliver Knill [19,20], satisfies the Gauss-Bonnet theorem
$$\chi = \sum_{i=1}^{N} R_i. \tag{6}$$
For a planar network, for bulk nodes, which have $k_i = t_i$, the curvature reduces to
$$R_i = 1 - \frac{k_i}{6}, \tag{7}$$
and for nodes at the boundary, for which $k_i = t_i + 1$, it reduces to
$$R_i = \frac{4 - k_i}{6}. \tag{8}$$
Note that the expression in Eq. (8) is also valid for m > 2 as long as p = 0. In fact for these networks only process (a) takes place and it is easy to show that $k_i = t_i + 1$. This simple relation between the curvature $R_i$ and the degree $k_i$ makes it easy to characterize the distribution of curvatures in the network.
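The truncated curvature and the Gauss-Bonnet relation can be checked numerically on a small example. The sketch below assumes that the truncated Knill curvature takes the form $R_i = 1 - k_i/2 + t_i/3$ and that every fully connected triple of nodes is a triangle of the complex; the function names and the dictionary-of-sets representation are choices made here for illustration only.

```python
from fractions import Fraction
from itertools import combinations

def triangles_at(adj, i):
    """Number of triangles through node i (every 3-clique is counted: assumption)."""
    return sum(1 for u, v in combinations(sorted(adj[i]), 2) if v in adj[u])

def curvature(adj, i):
    """Truncated curvature R_i = 1 - k_i/2 + t_i/3, kept exact with fractions."""
    k_i = len(adj[i])
    t_i = triangles_at(adj, i)
    return 1 - Fraction(k_i, 2) + Fraction(t_i, 3)

def euler_characteristic(adj):
    """chi = N - L + T for the skeleton network and its triangles."""
    N = len(adj)
    L = sum(len(nbrs) for nbrs in adj.values()) // 2
    T = sum(triangles_at(adj, i) for i in adj) // 3
    return N - L + T

# Two triangles glued along the link (0, 1), as produced by one step of process (a).
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
R = {i: curvature(adj, i) for i in adj}
print(R, sum(R.values()), euler_characteristic(adj))
```

In this example all nodes lie on the boundary ($k_i = t_i + 1$), so each curvature also equals $(4 - k_i)/6$, and the curvatures sum to the Euler characteristic.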
For p = 0 the curvature distribution is dominated by a negative unbounded tail that is exponential in the case m = 2 and power-law in the case m = ∞. In particular, while the average curvature is $\langle R \rangle = 0$ for p = 0 and any value of m, in the limit N → ∞ the fluctuations around this average are finite (i.e. $\langle R^2 \rangle < \infty$) for m = 2, and infinite (i.e. $\langle R^2 \rangle = \infty$) for m = ∞.
We make here two main observations. First of all, with the definition of the curvature given by Eq. (4), our network model has a heterogeneous distribution of curvatures. Therefore here we are characterizing highly heterogeneous geometries, and the geometrical growing network does not have a constant curvature. This is one of the main differences of the present model compared to network models embedded in the hyperbolic plane [24,25]. In particular all the networks with m = 2 or p = 0 have χ = 1 and therefore the average curvature is zero in the thermodynamic limit, but they have a curvature distribution with an unbounded negative tail that can be either exponential for m = 2 (i.e. $\langle R^2 \rangle < \infty$) or scale-free as for the case m = ∞ (i.e. $\langle R^2 \rangle = \infty$).
We illustrate this in Figure 3, where we plot the distribution P(R) of curvatures for different specific models of growing geometrical networks for p = 0 and p = 0.9 and for different values of m. We show that for p = 0 the negative tail can be either exponential or scale-free. For p = 0.9 we have for m = 2 a negative exponential tail and for m = ∞ a positive scale-free tail of the curvature distribution, consistent with a value of the exponent α < 1 and a power-law degree distribution.
Our second observation is that the case m = 2 or p = 0 is significantly different from the case m > 2 and p > 0. In fact for m = 2 or for p = 0 the Euler characteristic of the network is χ = 1 and never increases in time (see Methods for details), while for the case m > 2, p > 0 we expect χ/N to go to a finite limit as N goes to infinity. In Figure 4 the numerical results for the Euler characteristic χ as a function of the network size N show that, for m > 2 and p > 0, χ grows linearly with N. The quantity $\lim_{N \to \infty} \chi/N$ gives the average curvature in the network and is therefore zero for m = 2 or p = 0.
The generated topologies are small-world. In fact they combine a high clustering coefficient with a typical distance between the nodes increasing only logarithmically with the network size. The exponential growth of the network is to be expected from the observation that in these networks the total number of links as well as the number of unsaturated links always scale linearly with time. This corresponds to a physical situation in which the "volume" (total number of links) is proportional to the "surface" (number of unsaturated links). Therefore we should expect that the typical distance of the nodes in the network grows logarithmically with the network size N. In order to check this, in Figure 4 we give D, the average distance of the nodes from the initial triangle over the different network realisations, as a function of the network size N. From this figure it is clear that asymptotically in time D ∝ log N, independently of the value of p and m.
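The distance D from the initial triangle can be measured on a single realization with a breadth-first search started from the three initial nodes. The sketch below is only illustrative: the function name is hypothetical, the initial triangle is assumed to carry the labels 0, 1, 2, and the three seed nodes themselves are excluded from the average, which is a choice made here and not stated in the text.

```python
from collections import deque

def mean_distance_from_seed(adj, seed_nodes=(0, 1, 2)):
    """Average hopping distance of all non-seed nodes from the initial triangle."""
    dist = {s: 0 for s in seed_nodes}
    queue = deque(seed_nodes)
    while queue:                       # multi-source breadth-first search
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node not in seed_nodes]
    return sum(others) / len(others) if others else 0.0

# Small example: triangle (0, 1, 2) plus nodes 3, 4, 5 added by process (a).
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3, 4},
    2: {0, 1},
    3: {0, 1, 4, 5},
    4: {1, 3, 5},
    5: {3, 4},
}
D = mean_distance_from_seed(adj)
```

Averaging this quantity over many realizations for increasing N would give a curve of the kind described for Figure 4.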
The effects of randomness and emergent locality in these networks are reflected by their cluster structure, revealed by the lower bound on their maximal modularity measured by running efficient community detection algorithms [48] (Figure 5). Their clustering coefficient also provides evidence for their emergent locality (Figure 5). Finally we observe that for p > 0 the network also develops a non-trivial K-core structure. In order to show this, in Figure 5 we also plot the value of K corresponding to the maximal K-core of the network. As we already mentioned, for p = 0 we have K = 2 and the network can be completely pruned by removing the triangles recursively. For p > 0 instead, the maximal K-core can have a much larger value of K, as shown in Figure 5 for a network of $N = 10^4$ nodes.
Therefore these structures are different from the small world model to the extent that they are always characterised by a non-trivial community and K-core structure.
The geometrical growing network is growing exponentially, so the Hausdorff dimension is infinite. Nevertheless, these networks develop a finite spectral dimension $d_S$, as clearly shown in Figure 6 for m = 2, 3, 4 and p = 0.9. We have checked that the spectral dimension remains finite also for other values of p. This is a clear indication that these networks have non-trivial diffusion properties.
The geometrical growing network model is therefore a very stylized model with interesting limiting behaviour, in which geometrical local and global parameters can emerge spontaneously from the non-equilibrium dynamics. Moreover here we compare the properties of the geometric growing network with the properties of a variety of real datasets. In particular we have considered network datasets coming from biological, social, and technological systems and we have analysed their properties. In Table 1 we show that in several cases large modularity, large clustering, small average distance and non-trivial maximal K-core structure emerge. Moreover, in these datasets a non-trivial distribution of curvature (defined as in Eq. (4)) is present, showing either a negative or a positive tail (see Figure 7). Finally the Laplacian spectrum of these networks also displays a power-law tail from which an effective finite spectral dimension can be calculated (see Table 1 and Supplementary Information for details). This shows that the geometrical growing network models have many properties in common with real datasets describing biological, social, and technological systems, and could therefore be used and modified to model several real network datasets.

DISCUSSION
In conclusion, this paper shows that growing simplicial complexes and the corresponding growing geometrical networks are characterized by the spontaneous emergence of locality and spatial properties. In fact small-world properties, non-trivial community structure, and even finite spectral dimensions emerge in these networks despite the fact that their dynamical rules do not depend on any embedding space. These growing networks are determined by non-equilibrium stochastic dynamics and provide evidence that it is possible to generate random complex self-organized geometries by simple stochastic rules.
An open question in this context is to determine the underlying metric for these networks.
In particular we believe that the investigation of the hyperbolic character of the models with m = 2 and p = 0 (that have zero average curvature but a negative third moment of the distribution of curvature) should be extremely interesting to shed new light on "random geometries" in which the curvature can have finite or infinite deviations from its average. A full description of their structure using tools of geometric group theory could be envisaged to solve this problem. This analysis could also be facilitated by the study of the dual network, in which each triangle is a node of degree at most 3(m − 1). In fact each of the three edges of a triangle belongs to at most m triangles, and is therefore shared with at most m − 1 other triangles in the geometrical growing network.
Furthermore we mention that the model can be generalized in two main directions. On the one hand the model can be extended by considering geometrical growing networks built by gluing together simplexes of higher dimension. On the other hand, one can explore methods to generate networks that have a finite Hausdorff dimension, i.e. networks in which the typical distance between the nodes scales like a power of the total number of nodes. Another interesting direction of further theoretical investigation is to consider the equilibrium models of networks (ensembles of networks) in which a constraint on the total number of triangles incident to a link is imposed, similarly to recent works that have considered ensembles with given degree correlations and average clustering coefficient C(k) of nodes of degree k [47].
Finally, the geometrical growing network is a very stylized model that includes the essential ingredients for describing the emergence of locality of the interactions in complex networks. It can be used in a variety of fields in which networks and discrete spaces are important, including complex networks with clustering such as biological, social, and technological networks.

METHODS
A. Degree distribution of m = ∞ and p = 0- In the case m = ∞ and p = 0 the geometrical growing network model is reduced to the model proposed in [44]. Here we show the derivation of the scale-free distribution in this case for completeness. In the geometrical growing network with m = ∞ and p = 0, at each time a random link is chosen and a new node attaches two links to the two ends of it.
Therefore the probability that at time t a new link is attached to a given node of degree k is given by $k/(2t)$. Using this result we can easily write the master equation for the number of nodes N(k, t) of degree k at time t,
$$N(k, t+1) = N(k, t) + \frac{k-1}{2t}\, N(k-1, t)\,(1 - \delta_{k,2}) - \frac{k}{2t}\, N(k, t) + \delta_{k,2}. \tag{9}$$
Since the network is growing, asymptotically in time the number of nodes of degree k will be proportional to the degree distribution P(k), $N(k, t) \simeq t P(k)$, where the total number of nodes in the network is $N = t + 1 \simeq t$. Therefore, substituting this scaling in Eq. (9) we get
$$P(k) = \frac{k-1}{k+2}\, P(k-1) \tag{10}$$
for every k > 2, while P(2) = 1/2, yielding the solution
$$P(k) = \frac{12}{k(k+1)(k+2)} \tag{11}$$
for k ≥ 2, which is equal to the degree distribution of the BA model with minimal degree equal to 2, i.e. scale-free with power-law exponent γ = 3. Here we observe that, since for p = 0 every node has $t_i = k_i - 1$, the curvature $R_i = 1 - k_i/2 + t_i/3$ of the nodes is in this case $R = (4 - k)/6$; therefore P(R) has a power-law negative tail, i.e. $P(R) \propto |R|^{-3}$ for R < 0 and $|R| \gg 1$. Moreover we have $\langle R \rangle = 0$ (consistent with χ = 1) but $\langle R^2 \rangle$ is diverging with the network size N.
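The solution of the recursion can be verified with exact arithmetic. The sketch below assumes that the stationary recursion reads $P(k) = \frac{k-1}{k+2} P(k-1)$ with P(2) = 1/2 (the form reconstructed here from the attachment probability k/(2t)) and checks it against the closed form $12/[k(k+1)(k+2)]$; the function name and the cutoff `k_max` are illustrative choices.

```python
from fractions import Fraction

def degree_distribution(k_max):
    """Iterate P(k) = (k-1)/(k+2) * P(k-1), starting from P(2) = 1/2."""
    P = {2: Fraction(1, 2)}
    for k in range(3, k_max + 1):
        P[k] = P[k - 1] * Fraction(k - 1, k + 2)
    return P

P = degree_distribution(2000)

# Exact agreement with the closed form 12 / (k (k+1) (k+2)).
closed_form_ok = all(P[k] == Fraction(12, k * (k + 1) * (k + 2)) for k in P)

# Normalization and mean degree, up to the truncation at k_max.
total = float(sum(P.values()))
mean_k = float(sum(k * prob for k, prob in P.items()))

# Mean curvature for R = (4 - k)/6, which vanishes when the mean degree is 4.
mean_R = (4 * total - mean_k) / 6
print(closed_form_ok, total, mean_k, mean_R)
```

The truncated sums approach 1 and 4 as `k_max` grows, so the mean curvature tends to zero, while the second moment of k (and hence of R) grows without bound because $P(k) \propto k^{-3}$ at large k.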
B. Degree distribution of m = 2 for p = 0- The degree distribution for m = 2 is exponential for any value of p. Here we discuss the simple case p = 0, leaving the treatment of the case p > 0 to the Supplementary Information.
For p = 0 every node has exactly two unsaturated links. The total number of unsaturated links is $L = 1 + t \simeq t$ at large time t. Therefore the average number of links that a node gains at time t by process (a) is given by 2/t for $t \gg 1$. The master equations for the average number of nodes N(k, t) that have degree k at time t are given by
$$N(k, t+1) = N(k, t) + \frac{2}{t}\, N(k-1, t)\,(1 - \delta_{k,2}) - \frac{2}{t}\, N(k, t) + \delta_{k,2}.$$
In the large time limit, in which $N(k, t) \simeq t P(k)$, the degree distribution P(k) is given by
$$P(k) = \frac{1}{3} \left( \frac{2}{3} \right)^{k-2}$$
for k ≥ 2. The curvature, which for p = 0 is $R = (4 - k)/6$, is therefore on average $\langle R \rangle = 0$ in the limit t → ∞, with finite second moment $\langle R^2 \rangle$.

D. Definition of Modularity- The modularity M is a measure to evaluate the significance of the community structure of a network. It is defined [45] as
$$M = \frac{1}{2L} \sum_{ij} \left( a_{ij} - \frac{k_i k_j}{2L} \right) \delta(q_i, q_j).$$
Here, a denotes the adjacency matrix of the network, L the total number of links, $k_i$ the degree of node i, and $\{q_i\}$, where $q_i = 1, 2, \ldots, Q$, indicates to which community the node i belongs. Finding the network partition that optimizes modularity is an NP-hard problem. Therefore different greedy algorithms have been proposed to find the community structure, such as the Louvain method [48] that we have used in this study. The modularity found in this way is a lower bound on the maximal modularity of the network.
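For a given partition, the modularity can be evaluated directly from its definition. The following sketch assumes the standard form $M = \frac{1}{2L}\sum_{ij}(a_{ij} - k_i k_j / 2L)\,\delta(q_i, q_j)$ and only scores a partition supplied by hand; it does not implement the greedy optimization of [48]. The function name and the data layout are illustrative.

```python
def modularity(adj, community):
    """Modularity of the partition `community` (node -> label) of the network `adj`."""
    two_L = sum(len(nbrs) for nbrs in adj.values())   # twice the number of links
    M = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue                               # delta(q_i, q_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            M += a_ij - len(adj[i]) * len(adj[j]) / two_L
    return M / two_L

# Two triangles joined by the single link (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
all_together = {node: "A" for node in adj}
M_split = modularity(adj, by_triangle)
M_single = modularity(adj, all_together)
```

Splitting the example along the bridging link gives a positive modularity, while placing all nodes in one community gives zero, as expected from the definition.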

E. Definition of the Clustering coefficient-
The clustering coefficient is given by the probability that two nodes, both connected to a common node, are also connected. In the context of social networks, it describes the probability that a friend of a friend is also your friend. The local clustering coefficient $C_i$ of node i has been defined as the probability that two neighbours of the node i are neighbours of each other,
$$C_i = \frac{2 t_i}{k_i (k_i - 1)},$$
where $t_i$ is the number of triangles passing through node i, and $k_i$ is the degree of node i.
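As a brief illustration, the local clustering coefficient can be computed from the neighbour sets of a node. The sketch assumes the form $C_i = 2 t_i / [k_i (k_i - 1)]$ and sets $C_i = 0$ for nodes of degree smaller than 2, a convention adopted here; the function name is hypothetical.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 t_i / (k_i (k_i - 1)); defined as 0 for k_i < 2 (convention)."""
    k_i = len(adj[i])
    if k_i < 2:
        return 0.0
    # t_i: pairs of neighbours of i that are themselves linked
    t_i = sum(1 for u, v in combinations(sorted(adj[i]), 2) if v in adj[u])
    return 2 * t_i / (k_i * (k_i - 1))

# Two triangles glued along the link (0, 1).
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
C = {i: local_clustering(adj, i) for i in adj}
```

Averaging `C[i]` over all nodes of the same degree k gives the quantity C(k) discussed in the main text.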
F. Definition of the K-core- We define the K-core of a network as the maximal subgraph formed by the set of nodes that have at least K links connecting them to the other nodes of the K-core. The K-core of a network can be easily obtained by pruning a given network, i.e. by removing iteratively all the nodes i with degree $k_i < K$.

FIG. 6 (panel B): We plot the fitted spectral dimension for $N = 10^4$ averaged over 40 network realizations for p = 0.8, 0.9.
FIG. 7: Curvature distribution in real datasets. We plot the distribution P(R) in a variety of datasets with additional structural and local properties shown in Table 1.
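The iterative pruning used in the definition of the K-core can be sketched as follows. The function names `k_core` and `max_core` are hypothetical, and the network is assumed to be stored as a dictionary of neighbour sets; this is an illustration of the pruning idea, not the code used for Figure 5.

```python
def k_core(adj, K):
    """Return the K-core as a dict of neighbour sets (empty if there is none)."""
    core = {u: set(nbrs) for u, nbrs in adj.items()}
    queue = [u for u in core if len(core[u]) < K]
    while queue:
        u = queue.pop()
        if u not in core:                 # already pruned
            continue
        for v in core.pop(u):             # remove u and update its neighbours
            core[v].discard(u)
            if len(core[v]) < K:
                queue.append(v)
    return core

def max_core(adj):
    """Largest K for which the K-core is non-empty."""
    K = 0
    while k_core(adj, K + 1):
        K += 1
    return K

# Two triangles glued along a link (a p = 0 configuration) and a 4-clique.
glued = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
clique = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(max_core(glued), max_core(clique))
```

The glued triangles can be pruned completely as soon as K = 3 is required, in line with the statement that for p = 0 the maximal K-core has K = 2, whereas the 4-clique survives up to K = 3.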

For a general value of p, we can assume that the average clustering C(k) of nodes of degree k scales as $C(k) \propto k^{-\alpha}$. Then the average number of triangles t(k) of nodes of degree k scales as $t(k) = k(k - 1)C(k)/2 \propto k^{2-\alpha}$. Therefore, for large k and as long as α < 1, the average curvature of nodes of degree k, $R(k) = \langle R_i \rangle_{k_i = k}$, is dominated by the contribution of triangles and scales like $R(k) \propto k^{2-\alpha}$, with a positive tail for large values of k. This allows us to divide the phase diagram into two different regions according to the value of the exponent α: the case α < 1, in which the curvature has a positive tail, and the case α = 1, in which the curvature can have a negative tail.
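The scaling argument can be written out explicitly. In the sketch below the prefactor c in $C(k) \simeq c\,k^{-\alpha}$ is a symbol introduced here for illustration, and the curvature is assumed to take the truncated form $R_i = 1 - k_i/2 + t_i/3$.

```latex
\begin{align}
R(k) &= 1 - \frac{k}{2} + \frac{t(k)}{3},
  \qquad t(k) = \frac{k(k-1)}{2}\,C(k) \simeq \frac{c}{2}\,k^{2-\alpha}, \\
R(k) &\simeq 1 - \frac{k}{2} + \frac{c}{6}\,k^{2-\alpha}
  \quad (k \gg 1), \\
\alpha < 1 &: \quad R(k) \simeq \frac{c}{6}\,k^{2-\alpha} > 0, \\
\alpha = 1 &: \quad R(k) \simeq 1 - \left(\frac{1}{2} - \frac{c}{6}\right) k ,
  \quad \text{negative at large } k \text{ if } c < 3 .
\end{align}
```

As a consistency check, for p = 0 every node has $t_i = k_i - 1$, so that $C(k) = 2/k$ (α = 1, c = 2) and $R(k) = 1 - k/2 + (k-1)/3 = (4 - k)/6$ exactly, which is the boundary-node expression with a negative tail.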

C. Euler characteristic χ of the geometrical growing network with either m = 2 or p = 0- The Euler characteristic of the geometrical growing networks with p = 0 is χ = 1 at every time. In fact we start from a single triangle, therefore at t = 0 we have χ = 1. At each time step we attach a new triangle to a given unsaturated link, therefore we add one new node, two new links, and one new triangle, so that ∆χ = ∆N − ∆L + ∆T = 0. Hence χ = 1 for every network size. For m = 2 the process (b) does not increase the Euler characteristic either. In fact in this case, when the process (b) occurs and m = 2, we add only one new link and one new triangle, therefore ∆χ = 0 also for this process. Instead, in the case m > 2 and p > 0, process (b) always adds a single link but the number of triangles that close is on average greater than one, therefore the Euler characteristic χ grows linearly with the network size N.

FIG. 1: The two dynamical rules for constructing the growing simplicial complex and the corresponding growing geometrical network. In process (a) a single triangle with one new node and two new links is added to a random unsaturated link, where by unsaturated link we indicate a link having less than m triangles incident to it. In process (b) with probability p two nodes at distance two in the simplicial complex are connected and all the possible triangles that can link these two nodes are added as long as this is allowed (no link acquires more than m triangles incident to it). The growing geometrical network is just the network formed by the nodes and the links of the growing simplicial complex. In the Figure we show the case in which m = 2.

FIG. 6: The spectral dimension of the geometrical growing networks. Asymptotically in time, the geometrical growing networks have a finite spectral dimension. Here we show typical plots of the spectral density of networks with $N = 10^4$ nodes, p = 0.9 and m = 2, 3, 4 (panel A).

TABLE I: Structural properties of a variety of real datasets. N indicates the total number of nodes, L the total number of links, the average shortest distance