The inherent community structure of hyperbolic networks

A remarkable approach for grasping the relevant statistical features of real networks with the help of random graphs is offered by hyperbolic models, centred around the idea of placing nodes in a low-dimensional hyperbolic space, and connecting node pairs with a probability depending on the hyperbolic distance. It is widely appreciated that these models can generate random graphs that are small-world, highly clustered and scale-free at the same time; thus, reproducing the most fundamental common features of real networks. In the present work, we focus on a less well-known property of the popularity-similarity optimisation model and the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathbb {S}}^1/{\mathbb {H}}^2$$\end{document}S1/H2 model from this model family, namely that the networks generated by these approaches also contain communities for a wide range of the parameters, which was certainly not an intention at the design of the models. We extracted the communities from the studied networks using well-established community finding methods such as Louvain, Infomap and label propagation. The observed high modularity values indicate that the community structure can become very pronounced under certain conditions. In addition, the modules found by the different algorithms show good consistency, implying that these are indeed relevant and apparent structural units. Since the appearance of communities is rather common in networks representing real systems as well, this feature of hyperbolic models makes them even more suitable for describing real networks than thought before.


Introduction
Complex network theory is a rapidly expanding interdisciplinary field, strongly interwoven with statistical physics, concentrating on the interesting non-trivial statistical features of the graphs representing the connections/interactions between entities of complex systems [1][2][3][4][5]. Over the last two decades, the vast number of studies of real networks have shown that some of these features seem to be almost universal, such as the small-world property [6,7], the relatively high clustering coefficient [8], the inhomogeneous degree distribution [9,10], and the presence of communities [11][12][13]. Grasping these properties in a unified modelling framework is a non-trivial problem; however, a very notable approach pointing in this direction is given by hyperbolic network models [14][15][16][17][18][19][20][21], centred around the idea of placing nodes on a hyperbolic plane, and drawing links with a probability depending on the metric distance. known but somewhat surprising fact that angular inhomogeneity is not a necessary condition for the presence of communities in hyperbolic network models, and that communities can appear in networks generated by the 'plain' PSO model or the S 1 /H 2 model as well. This was first shown for the E-PSO model (a generalisation of the PSO model [15]) in Refs. [45,47] and for the S 1 /H 2 model in Ref. [46], along with the proposition of the "Community-Sector hypothesis", supposing that most members of a community gather in the same angular sector on the hyperbolic plane. In the closely related study of Ref. [48], the dependence of the modularity (a commonly used quality score for communities introduced in Ref. [51]) on the temperature parameter T ∈ [0, 1) of the E-PSO model (controlling the clustering coefficient) for communities found by the Louvain method [52] was also studied to some extent. According to the results, the modularity can be even above 0.7 when T is low, and gradually decreases when T is increased; however, can still stay above 0.3 when T approaches 1. In parallel with these studies, in Ref. [53] the analogy between the hyperbolic embedding and the community structure was studied mostly for real networks and partly for synthetic graphs generated by the PSO model, where again, the PSO networks was observed to have a notable community structure, just like the real networks.
Even though the above results already provide important signs related to the presence of communities in hyperbolic networks with homogeneous angular node distribution, here we revisit this phenomenon in a detailed in-depth study, motivated by the following. First of all, in spite that a modularity value above 0.3 can be a good community indicator in practice [54], it is important to note that a high modularity value alone is not always accompanied by a true modular structure, as e.g. Erdős-Rényi random graphs [55] or scale-free networks obtained with the Barabási-Albert model [56] can also yield modularity values above 0.8 under certain circumstances [57,58]. Thus, in order to have a truly solid claim about the presence of communities in random graph models without any explicit community formation mechanism, it is best to back up the large modularity values with further analysis of the supposed modular structure from multiple aspects.
Another task of high importance is the more detailed exploration of the parameter space. Apart from simple parameters such as the network size and the average degree, both the PSO model and the S 1 /H 2 model have basically two parameters: one controlling the decay exponent γ of the scale-free degree distribution and the other controlling the clustering coefficient. By analysing the effect of these parameters on the communities, we can gain a clear picture about what sort of modular structure can be expected when the aim is to generate a hyperbolic random graph with specified γ and clustering coefficient values.
Along this line, here we generate random graphs according to the PSO and the S 1 /H 2 models in a wide range of parameter settings and examine their community structure with the help of three well-established community finding algorithms given by the Louvain method [52], the Infomap algorithm [59] and asynchronous label propagation [60]. The Louvain approach is known to be a very efficient modularity maximising method, while the other two algorithms included do not build on the modularity and extract the modular structure of the studied networks based on different concepts. By applying independent community finding methods, the comparison between the found modules can reveal whether they correspond to strong, significant structures that can be located consistently in several different ways or not. In order to gain a quantitative comparison between the communities found by the different methods, we rely on the concept of the adjusted mutual information (AMI) [61], a well-known information-theoretic similarity measure. Besides the modularity, we also examine the angular separation index (ASI) of the communities [50] corresponding to a measure developed specifically for hyperbolic networks, characterising the angular mixing of the groups of nodes (communities) on the native disk.
The paper is organised as follows. In section 2 we describe the PSO and S 1 /H 2 models used for network generation, together with a short summary of the applied community finding methods and the quality measures used for evaluating the detected community structures. This is followed by the results in section 3, whereas we discuss the implications of our findings in section 4.

Methods and preliminaries
We begin the description of the used methods with a brief introduction to hyperbolic network models in section 2.

Hyperbolic network models
When studying the underlying hyperbolic geometry of complex networks, commonly the native representation of the two-dimensional hyperbolic space is used [62], in which the hyperbolic plane of constant curvature K < 0 is represented by a disk of radius R in the Euclidean plane (for which K = 0). In this representation the Euclidean angles between hyperbolic lines are equal to their hyperbolic values, and the radial coordinate r of a point (defined as its Euclidean distance from the disk centre) is equal to its hyperbolic distance from the disk centre. The hyperbolic distance between two points is measured along their connecting hyperbolic line, which is either the arc of the Euclidean circle going through the given points and intersecting the disk's boundary perpendicularly or -if the disk centre falls on the Euclidean line connecting the two points in question -the corresponding diameter of the disk. The hyperbolic distance x between two points at polar coordinates (r, θ) and (r , θ ) fulfills the hyperbolic law of cosines written as cosh(ζx) = cosh(ζr) cosh(ζr ) − sinh(ζr) sinh(ζr ) cos(∆θ), where ζ = √ −K and ∆θ = π − |π − |θ − θ || is the angle between the examined points. According to Ref.
[62], for 2 · √ e −2ζr + e −2ζr < ∆θ and sufficiently large ζr and ζr , the hyperbolic distance can be approximated as 2.1.1. The PSO model for network generation In the popularity-similarity optimisation model, nodes are placed one by one in the above described native disk representation of the hyperbolic plane and connected with probabilities depending on the hyperbolic distance. The parameters of the model can be listed as follows: • The curvature K < 0 of the hyperbolic plane, controlled by ζ = √ −K > 0. Changing the value of ζ corresponds to a simple rescaling of the hyperbolic distances; the usual custom is to set the value of ζ to 1 (i.e. set K to −1).
• The final number of nodes N ∈ Z + in the network.
• The number of connections m ∈ Z + established by the newly appearing nodes, corresponding to the half of the average degree k . (The first m nodes of the network form a complete graph).
• The popularity fading parameter β ∈ (0, 1], controlling the outward drift of the nodes on the native disk. The exponent γ of the power-law decaying tail of the degree distribution is related to the popularity fading parameter as γ = 1 + 1/β. • The temperature T ∈ [0, 1), controlling the average clustering of the network, where lower temperature results in a higher average clustering coefficient.
During the random graph generation process, initially the network is empty, and at each time step i = 1, 2, ..., N a new node joins the network as follows: (i) The new node i appears at polar coordinates (r ii , θ i ), where the radial coordinate r ii is set to 2 ζ ln(i) and the angular coordinate θ i is sampled from [0, 2π) uniformly at random. (ii) The radial coordinate of each previously (at time j < i) appeared node j is increased according to the formula r ji = βr jj + (1 − β)r ii in order to simulate popularity fading.
(iii) The new node i establishes connections with previously appeared nodes. Only single links are permitted.
(a) If the number of previously appeared nodes is not larger than m, node i connects to all of them. (b) Otherwise, the new node i connects to m of the previously appeared nodes, where the connection probabilities are determined by the hyperbolic distances between the node pairs, which can be calculated based on equation (1). If T = 0, node i simply connects to the m hyperbolically closest nodes, whereas at temperatures T > 0, any previous node j = 1, 2, ..., i − 1 gets connected to node i with probability where the cutoff distance R i is set to if β < 1, ensuring that the expected number of nodes connecting to the new node i at its arrival is equal to m.
2.1.2. The S 1 /H 2 model for network generation In the S 1 model [19], first the N number of nodes are placed on a one-dimensional sphere (i.e. a circle) and each is given a hidden variable κ i ∈ [κ 0 , ∞), i = 1, 2, ..., N . Then, each pair of nodes becomes connected with a probability taking into account both the angular distance and the hidden variables. In the below described algorithm [22,63], κ i corresponds to the expected degreek i of node i in the thermodynamic limit. Thus, the connection rule can be phrased in a simple, intuitive way, namely the nodes that are closer in the hidden metric space underlying the network are more likely to be connected, but in the meantime nodes with higher degree obtain farther-reaching connections as well. In the equivalent H 2 model [22], the hidden variable κ i is converted into the radial coordinate r i in the native representation of the hyperbolic plane, and the connection probability depends on the hyperbolic distance between the nodes, that expresses the effect of both the similarity and the node degrees (the popularity). The parameters of these models can be listed as follows: • The total number of nodes N .
• The average degree k .
• The exponent 2 < γ of the tail of the degree distribution following a power law of the form P (k) ∼ k −γ .
In the S 1 model, a network of N number of nodes -each of them indexed by i ∈ [1, N ] -is generated through the following steps: (i) For each node i an angular coordinate θ i is sampled from the interval [0, 2π) uniformly at random.
(ii) For each node i a hidden variable κ i is sampled from the interval [κ 0 , ∞) according to the distribution ρ(κ) = (γ − 1) · κ −γ κ 1−γ 0 , where κ 0 = γ−2 γ−1 · k . (iii) Each pair of nodes i − j is connected with probability where ∆θ ij = π−|π−|θ i −θ j || is the angular distance between the nodes, and µ = α 2π k ·sin π α . To facilitate a straightforward comparison with the PSO model, we converted the hidden variable associated to the nodes into a radial coordinate in the native representation of the hyperbolic plane (at K = −1 curvature) as , depending on the hyperbolic distance x ij in the same way as the connection probability in equation (3).

Finding and evaluating communities
Communities (also referred to as modules, cohesive groups, clusters) are frequently occurring structural units in complex networks having usually a larger internal and a smaller external link density, lacking however a widely accepted unique definition. Finding, evaluating and comparing communities are all non-trivial problems, with a vast number of different solutions suggested in the literature [11][12][13]. Here we first describe the concept of modularity in section 2.2.1, corresponding to the most widely used measure for quantifying the quality of communities. This is followed by the angular separation index in section 2.2.2, providing a score specific for hyperbolic networks, measuring the angular intermixing between communities in the hyperbolic disk, and the adjusted mutual information in section 2.2.3, allowing the quantitative comparison between community partitions found by different methods. In our studies, we have picked three wellgrounded, commonly used methods for detecting communities, namely the asynchronous label propagation, detailed in section 2.2.4, the Louvain algorithm, described in section 2.2.5, and Infomap, summarised in section 2.2.6.

Modularity
Probably the most well-known quality measure for communities is given by the modularity [51], comparing the observed density of links between the members of the same community with the expected link density based on some random null model, written in general as where N is the number of nodes in the network, A ij denotes an element the adjacency matrix (A ij ≡ A ji = 1 if i is connected to j and otherwise A ij ≡ A ji = 0), P ij gives the connection probability between nodes i and j in the null model, L stands for the total number of links in the network, c i is the community to which node i belongs and the Kronecker delta δ c i ,c j ensures that non-zero contribution can come only from node pairs in the same community. This quality measure can take values in the Q ∈ [−1/2, 1] interval, where larger values of Q indicate stronger communities that have a significantly larger internal link density compared to the random expectation. In practice, a natural choice for the null model is provided by the configuration model, where the connection probability between nodes i and j can be given with the node degrees k i and k j simply as P ij = k i k j 2L . This form has also been extended to weighted networks [64], where the number of links L is replaced by w ij (with w ij denoting the link weight between nodes i and j), and the node degrees are replaced by the node strengths defined e.g. for node i as In order to take into account the hyperbolic distances along the links, we adopted the practice suggested in Ref. [42], and used in our community analysis a link weight defined as w ij ≡ w ji = 1 1 + x ij (9) for adjacent nodes i and j, where the hyperbolic distance x ij was calculated based on equation (1) using ζ = 1. In our analysis, we used the code available from Ref. [65] for calculating the weighted modularity of the detected community structures.

Angular separation index
In networks embedded into the hyperbolic disk, communities usually occupy well-defined angular regions, having little or no overlap with the region of the other communities [45][46][47][48][49][50]. A quantitative score characterising this tendency is given by the angular separation index (ASI) [50]. Its basic idea is to compare the number of "mistakes" in the angular arrangement -i.e. the number o i of nodes belonging to other communities falling between the boundaries of the given module i -summed over all the C communities of the network with the highest total number of mistakes obtained with the same clustering of the nodes when the angular coordinates are shuffled at random. Formally, the ASI can be expressed as where the maximisation in the denominator is over a fixed number of random shuffles (we used 1000 shuffles, i.e. r = 1, 2, ..., 1000, as suggested in Ref. [50]). Accordingly, an ASI value close to 1 indicates well-separated clusters with a low intermixing in the angular coordinates of the members, and an ASI value close to 0 is obtained when the angular arrangement of the members of different clusters is random.

Adjusted mutual information
In the field of community detection, together with the rapid increase in the number of different algorithms proposed, came the need for well-grounded methods for comparing the results of the different approaches. Since e.g. the number of found communities and the sizes of the modules can show large variations across the different methods, judging the extent of similarity between two community partitions is non-trivial. Given two sets of communities A and B over the same network, hosting C A and C B number of communities each, a wellknown information theoretic similarity measure is offered by the normalised mutual information (NMI) [66,67], that can be defined based on the mutual information and the entropies where N ij denotes the number of shared members of communities i and j, N i and N j stand for the number of nodes in the individual communities, and the total number of nodes in the network is given by N . .
This quantity becomes 1 if and only if the partitions A and B are identical, otherwise its value is lower than 1.
The concept of adjusted mutual information (AMI) supplements this consistent upper bound with a consistent zero expectation corresponding to the similarity we can expect by random chance [61,68]. To achieve this, the average mutual information of random partitions A and B is subtracted from the nominator, and the average maximum entropy of random partitions is subtracted from the denominator yielding In our analysis, we used the code available from Ref. [69] for calculating the AMI between the found community partitions.

Asynchronous label propagation algorithm for community detection
The asynchronous label propagation algorithm [60] simulates the diffusion of labels along the links in the examined network, where the nodes are labelled by the identifier of the community to which they belong, and these labels are regularly updated based on the labels of the neighbouring nodes using a majority rule. The idea behind this method is that as the labels propagate, the densely connected groups of nodes will reach a consensus on a unique label. This approach is not aimed at optimising any predefined measure or function. Initially, a unique community label is assigned to each node in the network. Afterwards, the following asynchronous update process is repeated until every node in the network has at least as many neighbours within its own community as it has in any other communities: (i) Nodes are arranged in a random order.
(ii) According to this order, we iterate over the nodes and update their label one by one based on their neighbours: each node joins the community to which most of its neighbours currently belong. Note that the label of the neighbours may have already been updated in the given iteration. The neighbouring labels are weighted based on the strength of their link connected to the current node, and ties in the weighted number of neighbours are broken at random.
Due to the random propagation of the labels, in this approach it is possible that distinct communities may eventually settle to the same label. Therefore, after the termination of the above algorithm (where we used the code available from Ref. [70] with link weights calculated from equation (9)), we also applied a breadth-first search on the subgraphs of each individual community to separate the disconnected (i.e. connected only via nodes of different communities in the original network) groups of nodes having the same label, as suggested in Ref. [60].
2.2.5. Louvain algorithm for community detection Though finding the exact maximum of modularity is a computationally hard problem [71], over the years several heuristic modularity optimisation methods were proposed [11,12], and one of the most popular among these is the Louvain algorithm [52]. This approach is capable of unfolding a complete hierarchical community structure (where modules can be composed of submodules) within a relatively short time even for extremely large networks. The algorithm is repeating two phases iteratively until the modularity stops improving: (i) Searching for a local maximum in the modularity at the given organisation level of the network.
• First, a unique community is assigned to each node of the current network.
• This is followed by a repeated iteration over the nodes until the modularity does not increase any further (or, in our case, until the gain in the modularity does not decrease below a threshold of ∆Q min = 10 −7 ).
-We evaluate the changes in the modularity that would take place if the current node i was transferred to the community of each of its neighbours. -If all the calculated modularity changes are negative, node i stays in its current community. Otherwise, we carry out the transfer of node i where the improvement in the modularity is the largest.
(ii) Moving up to the next organisation level of the system represented by the network between the just found communities: • Each community is considered as a single node.
• A self-loop is created for each new node, weighted by twice the sum of the link weights within the corresponding community. • The new nodes are connected by links weighted by the sum of the link weights between the corresponding community members on the previous organisation level.
In our investigations, we weighted the links in the examined hyperbolic networks according to equation (9) and considered only the final partition (i.e. the top-level community structure, having the highest modularity among the different organisation levels) found by the implementation of the algorithm available from Ref. [72].
2.2.6. Infomap algorithm for community detection The Infomap algorithm, as suggested by its name, provides an information-theoretic approach for finding communities in networks [59] based on a correspondence between the optimal community structure and the most parsimonious description of an infinitely long random walk trajectory on the network. The random walk can be considered as a proxy for the flow in the network (travelling passengers, spreading ideas, etc.), making its components interdependent to varying extents. It is intuitive to assume that communities correspond to localized regions of the network where random walkers spend a lot of time. We can take advantage of this property of communities when aiming for the most compact description of a random walker trajectory as follows.
In a simple approach, the trajectory is corresponding to the sequence of the visited nodes, each labelled with a unique codeword. However, trajectories can be defined more concisely by using a map-like description following the principle of geographic maps, where e.g. the same street names appear in multiple cities. In a similar manner, after naming the communities, the code words of the nodes can be recycled among the different communities, and only the members of the same community have to be given unique names. By limiting the number of different code words used to denote the nodes, the length of these code words can be reduced, leading to a considerable saving in the length of the trajectory description. Naturally, the recycling of the code words also comes at a cost, namely one has to indicate when the random walker leaves a given community to enter a new one by specifying the code word of the new community. Nevertheless, if communities are well separated from each other, then the transition between communities is not frequent, and we gain in the length of the trajectory description even with this extra cost taken into account.
For a map-like trajectory description based on a given community structure, the efficiency can be evaluated by the so-called map equation [59], expressing the optimal code length (i.e. the theoretical lower bound of the code length) for an average movement of an infinitely long random walk. The Infomap algorithm itself searches for the multi-level, hierarchical network partition minimising the map equation in a heuristic manner, splitting modules into submodules, subsubmodules and so on in order to reduce the description length. If the splitting of a given leaf in the community hierarchy does not decrease the description length anymore, the downward growth of the given branch in the hierarchy is stopped. In our community analysis, we used the code available from Ref. [73] with link weights calculated according to equation (9) and queried from the output of the algorithm the communities corresponding to the leaves of the community hierarchy.

Results
We generated random graphs using the PSO and the S 1 /H 2 models in a wide range of parameter settings, and used the obtained networks as inputs for the community finding methods given by the asynchronous label propagation, the Louvain and the Infomap algorithms. According to the results, the hyperbolic random graphs seemed to possess a strong community structure for quite a few combinations of the network generation parameters.
As an illustration, in figure 1 we show the partition found by the Louvain algorithm in networks of size N = 1000 both according to the layout in the native disk representation of the two-dimensional hyperbolic space and according to a standard layout in the Euclidean plane. In figures 1(a) and 1(c), the sets of nodes grouped together by Louvain occupy well-defined angular regions in the hyperbolic disk with barely any overlap with the region of the neighbouring communities. However, according to figures 1(b) and 1(d), the detected communities are clearly outlined even in such layouts which do not build on the hyperbolic origin of the networks.
We found that the angular separation of the detected modules exemplified by figures 1(a) and 1(c) is quite general in the hyperbolic disk. Using the angular separation index (ASI) described in section 2.2.2, we evaluated quantitatively the angular separation of the modules obtained with the asynchronous label propagation, the Louvain and the Infomap algorithms for a large variety of the network generation parameters. In the case of the PSO model, for both the temperature T and the popularity fading parameter β we took 10 equidistant data points between 0 and 1 (altogether 100 parameter combinations in the T − β parameter plane) and generated 100 networks with each parameter setting. In the case of the S 1 /H 2 model, to allow a straightforward comparison with the results seen for the PSO model, instead of the original model parameters α and γ we changed to 1/α (analogous to the temperature T in the PSO model) and 1/(γ − 1) (equivalent to the popularity fading parameter β in the PSO model). Similarly to the studies of the PSO model, we considered a 9 × 9 grid in the 1/α − 1/(γ − 1) parameter plane (in the S 1 /H 2 model 2 < γ and α is finite; hence, this model is not defined for β = 1 and T = 0), and generated 100 networks for each parameter combination. As it is shown in figure 2, for PSO and S 1 /H 2 networks of size N = 10, 000 and average degree k = 10 a considerably high ASI can be obtained with all three community finding methods for most of the T − β and α − γ parameter settings.
In order to verify that the angularly separated modules detected by the asynchronous label propagation, the Louvain and the Infomap algorithms are indeed relevant structural units of the networks, we measured the quality of the extracted community partitions by the weighted modularity Q given in equation (8). In figures 3 and 4 we show the corresponding results for networks of size N = 10, 000 and expected average degree k = 10, where the weighted modularity is plotted as a function of the model parameters with the help of heat maps. According to figure 3, for a considerably large region in the parameter plane the modularity averaged over 100 networks is larger than 0.65 for the communities found by Infomap (figure 3(c)), larger than 0.75 for the communities extracted by asynchronous label propagation (figure 3(a)) and larger than 0.85 for the communities located by Louvain ( figure 3(b)). For Louvain and Infomap, the highest scores in the modularity are achieved at low T and β parameters, corresponding to networks with a high average clustering coefficient and a rather homogeneous degree distribution. The modularity is high in this region also for the asynchronous label propagation; however, in this case the highest modularity values occur for mid-range β values. When β approaches 1, the observed Q seems to decrease for all community finding methods. Nevertheless, Q can still take relatively high values at e.g. β = 0.6, where the generated network is expected to be scale-free with a degree decay exponent of γ 2.67. According to the results displayed in figure 4, the maximum of Q for the S 1 /H 2 model is in the low-value regime of the 1/α − 1/(γ − 1) parameter plane for all three community finding methods, where the modularity values seem to be higher by a small margin compared to the case of the PSO model, e.g. reaching up to Q = 0.99 for the communities found by Louvain.
As mentioned in the Introduction, a large modularity value alone does not always indicate a true modular structure as e.g. both Erdős-Rényi random graphs and Barabási-Albert random graphs have been shown to display relatively high modularity values under certain circumstances [57,58]. However, for random graphs generated by the aforementioned two classical models with the same size and link density as in figures 3 and 4, the modularity can reach up to only about 0.28, which is significantly smaller compared to the Q values we observed in the studied hyperbolic networks. Furthermore, in the present study 2 out of the 3 community finding methods applied are not based on modularity maximisation, and they still find communities that yield high Q values.
In order to examine the significance of the found communities from another aspect, we also compared the community partitions obtained with the different methods using the adjusted mutual information described in section 2.2.3. The results are displayed in figure 5 with the help of heat maps, showing the AMI averaged over 100 networks as a function of the model parameters in the studied parameter planes. According to the figure, the highest similarity values occur between the communities found by asynchronous label propagation and Infomap (figures 5(a) and 5(b)). These can reach up to even AMI = 0.9, indicating an almost one-to-one correspondence between the modules of the different partitions. On the other hand, the lowest similarity values can be observed for Louvain and Infomap (figures 5(e) and 5(f)), where the typical value of the AMI is about 0.5. However, this is still in the range of acceptable consistency between the different partitions and is definitely way higher than what we would expect e.g. for random partitions. Therefore, based on figure 5 we can say that in those parameter regions where the communities are characterised by The results for the PSO model are given in the left column (panels (a), (c) and (e)), whereas the ASI obtained for the S 1 /H 2 model appears in the right column (panels (b), (d) and (f)). The ASI for the communities detected by asynchronous label propagation is given in the top row (panels (a) and (b)), the ASI regarding the results of Louvain is shown in the middle row (panels (c) and (d)) and the ASI for the partitions found by Infomap is presented in the bottom row (panels (e) and (f)). We show the measured ASI (indicated by the color, averaged over 100 samples) as a function of the model parameters T and β, or 1/α and 1/(γ − 1) for networks of size N = 10, 000 and expected average degree k = 10.  Figure 3. Modularity in the PSO model. We show the weighted modularity Q (indicated by the color, averaged over 100 samples) as a function of the model parameters T and β for networks of size N = 10, 000 and expected average degree k = 10. The panels correspond to the results obtained with asynchronous label propagation (a), Louvain (b), Infomap (c), and when the best community partition is taken from the three methods according to Q (d).
relatively high modularity scores, the partitions obtained with the different community detection methods also show significant consistency with each other. This fact reassures that the modules we observe in the studied hyperbolic networks are indeed relevant and apparent structural units that can be detected based on multiple approaches in a consistent way.
A basic statistic regarding the revealed community structures is given by the community size distribution, which is exemplified by figure 6 for the three examined community finding methods. According to that, the size of the communities found by the asynchronous label propagation follows more or less a power law for both the PSO model (figure 6(a)) and the S 1 /H 2 model ( figure 6(b)). In the regime of small and middle-sized communities, the curve corresponding to Infomap seems to be close to that; however, towards the larger sizes it decays faster. In contrast, the community size distribution yielded by Louvain is quite distinct from the curves obtained with both asynchronous label propagation and Infomap, mostly due to a peak at higher community sizes for both the PSO model and the S 1 /H 2 model. This difference between the community size distributions is in  Figure 4. Modularity in the S 1 /H 2 model. We show the weighted modularity Q (indicated by the color, averaged over 100 samples) as a function of the model parameters 1/α and 1/(γ − 1) for networks of size N = 10, 000 and expected average degree k = 10. The panels correspond to the results obtained with asynchronous label propagation (a), Louvain (b), Infomap (c), and when the best community partition is taken from the three methods according to Q (d).
correspondence with the results seen for the AMI, where the output of Infomap and asynchronous label propagation turned out to be more similar to each other than to Louvain.
An interesting question related to the visibly strong community structure obtained with the studied hyperbolic models is how does it relate to the community structure of such networks where the angular distribution of the nodes is non-uniform, as in the case of the hyperbolic network models proposed in Refs. [16][17][18]20]. To address this question, here we define a transition between PSO networks with uniform angular node distribution and PSO networks generated with clear angular separation between modules in a similar fashion to the nPSO model introduced in Refs. [17,18], but with uniform angular distribution within the supposed communities instead of Gaussian distribution. Our related framework begins with generating a PSO network as outlined in section 2.1.1, and then running a community finding algorithm on the resulting network for locating its modules (we used Louvain for this purpose). Based on the found communities, we can then generate PSO networks with equally-sized gaps between the supposed modules by dividing the The results for the PSO model are given in the left column (panels (a), (c) and (e)), whereas the AMI obtained for the S 1 /H 2 model appears in the right column (panels (b), (d) and (f)). The AMI between the communities detected by asynchronous label propagation and Infomap is given in the top row (panels (a) and (b)), the AMI regarding the results of asynchronous label propagation and Louvain is shown in the middle row (panels (c) and (d)) and the AMI between the partitions found by Louvain and Infomap is presented in the bottom row (panels (e) and (f)). We show the measured AMI (indicated by the color, averaged over 100 samples) as a function of the model parameters T and β, or 1/α and 1/(γ − 1) for networks of size N = 10, 000 and expected average degree k = 10.  [0, 2π) interval into subintervals having a width proportional to the size of the given community, where the aggregated width of the subintervals can be expressed as 2π(1 − g) when the aggregated width of the gaps is 2πg. The number of nodes placed in a given subinterval is equal to the number of members of the corresponding community, and the angular coordinate of these nodes is distributed uniformly at random within the subinterval. Otherwise, the network generation process is identical to that in the original PSO model. In figure 7 we show results obtained from this framework, where the top panels depict the modularity for communities found by the Louvain algorithm as a function of the relative gap size g, and the bottom panels provide layout examples at different values of g. According to the figure, although Q increases as a function of the relative gap size g as expected, this increase is rather mild, except for large β or T parameters. In other words, the modularity in the uniform PSO model can be quite close to the Q that we obtain for modules with high angular separation, and therefore, the communities we observe in the uniform PSO model can be viewed also as a meaningful limit for the modular structure of systems where the angular distribution of the nodes is non-uniform.
As a closing of this section, we draw the attention to the supplementary materials, listing further results on the communities found in PSO and S 1 /H 2 networks at different system sizes and average degree values (see sections S2-S4). In addition, in the supplementary materials our analysis is repeated on an extension of the PSO model known as the E-PSO model [15] (described in section S1), yielding results that are very similar to what we have detailed here. In section S5 we also examine what happens in the PSO model if the angular distribution of the nodes is strictly equidistant instead of homogeneous random. The qualitative behaviour of the communities found during these investigations is basically the same as seen here. Finally, in section S6 we show the results obtained for the examined PSO, E-PSO and S 1 /H 2 networks when setting all the link weights to 1 instead of using the link weights given in equation 9.

Discussion and conclusions
Motivated by interesting signs of modules in hyperbolic networks with homogeneous angular node distribution reported in Refs. [45][46][47][48]53], here we revisited the question of community structure in the PSO and S 1 /H 2 models in a detailed in-depth study. Although for both of these models the model construction itself lacks any intentionally built-in community structure, the networks generated in these approaches still possess apparently strong communities for a wide range of the model parameters, as indicated by the high modularity values measured on the results of three independent community finding algorithms, namely asynchronous label propagation, Louvain and Infomap. The significance of the found communities is supported by the fact that only 1 out of the 3 applied methods is based on modularity optimisation, and that the comparison between the different partitions yielded reasonably high AMI values, indicating a considerable consistency between the results. Furthermore, the modularity values that can be achieved in Erdős-Rényi random graphs or Barabási-Albert scale-free networks at the same average degree are way lower compared to the Q values we observed in the hyperbolic networks. In addition, the ASI (corresponding to a quality measure independent of the modularity) was also very high for the major part of the parameter space.
The parameter plane in which we examined the behaviour of the modularity corresponded to the (T, β) ∈ [0, 1) × (0, 1] plane in the PSO model and the analogous ( 1 α , 1 γ−1 ) ∈ (0, 1) × (0, 1) plane in the S 1 /H 2 model. The intuitive meaning of these parameters can be summarised as follows: the average clustering coefficient of the generated networks is regulated by the temperature T and its counterpart 1/α (lower values result in higher average clustering coefficients), while the power-law decay exponent γ of the degree distribution is controlled by the popularity fading parameter β in the case of the PSO model according to the formula γ = 1 + 1/β and is itself a parameter of the S 1 /H 2 model. According to our results, when changing these parameters, the behaviour of the modularity follows a similar pattern for both hyperbolic models and all three community finding algorithms, except for the PSO model combined with asynchronous label propagation.
Putting aside the above-mentioned exception, for increasing T (or 1/α), together with a decrease in the average clustering coefficient the modularity also decreases (which is absolutely natural), and when β (or equivalently, 1/(γ − 1)) is increased, resulting in a more fat-tailed degree distribution, Q decreases again. However, the dependence of the modularity on the model parameters is not at all linear, instead we can observe a high, slightly decreasing plateau in the parameter plane with the maximum values in the origin and a relatively narrow belt of lower Q values at the feet of the plateau, placed far from the origin. For the communities found by asynchronous label propagation in the networks generated by the PSO model, the behaviour is slightly different: although Q is high close to the origin, for increasing β it shows a slow increasing tendency, reaching its maximum in the medium β range, followed by a drop for high β values, similarly to the results seen for the other combinations between network generation models and community finding methods.
When considering the parameter settings close to the origin (T → 0, β → 0 in the PSO model and 1/α → 0, 1/(γ −1) → 0 in the S 1 /H 2 model), which yield the largest modularity values in most of the cases, it is important to note that the corresponding networks are homogeneous in terms of the degree (the degree decay exponent γ is large) and do not resemble scale-free real networks. However, when β is increased (or equivalently, γ is decreased), the modularity decreases only by a small magnitude for quite some range. E.g., at β = 0.6, corresponding to γ 2.67, the modularity averaged over 100 networks can still reach up to Q = 0.929 in the PSO model and Q = 0.898 in the S 1 /H 2 model. In other words, when setting the degree decay exponent to moderate values often seen in real systems with the help of β or by directly tuning γ, the networks obtained with the studied models can still possess a strong community structure if the other parameter (T or 1/α, controlling the clustering coefficient) is not pushed to extremely high values, meaning that the clustering coefficient is not reduced to extremely low values.
The regime where Q drops to lower values is on the one hand where β → 1 (or equivalently γ → 2 from above), corresponding to extremely fat-tailed degree distributions, and where T → 1 (or equivalently α → 1 from above), corresponding to networks with clustering coefficients close to zero. Thus, if one would like to generate scale-free hyperbolic networks having communities and a degree decay exponent close to γ = 2, it might be a better option to choose the models in Refs. [16][17][18]20], where the community formation is helped by the non-uniform angular distribution of the nodes. Nevertheless, except the mentioned extreme regimes, the studied "traditional" hyperbolic models seem to produce a strong enough community structure that can be taken as a simple model for the apparent modular structures often observed in real systems.
A remaining interesting question is why do the observed communities arise despite the absence of any explicit community formation mechanisms built into the construction of the studied models? In short, the same model properties that allow the development of a large clustering coefficient in the generated random graphs on the level of nodes also make the emergence of communities possible on a slightly larger scale. Communities are local structures in the sense that members connect to each other with a larger link density than to the rest of the system. As mentioned in the Introduction and as it can be seen in figures 1(a) and 1(c), in hyperbolic networks such units correspond to well-defined angular regions [45][46][47][48][49][50], with a relatively low number of links across them. Thus, as noted in Ref.
[53], the community structure of a network can be also viewed as a coarse version of its layout in the hyperbolic space.
In our view, the key element in the formation of communities in the studied models is that due to the hyperbolicity of the native disk, for a node newly appearing at the periphery it is much easier to connect radially than "sideways" (i.e. to nodes with similarly large radial coordinate), as indicated by e.g. the distance formula in equation (2). If the angular separation between the previously arrived nodes that are placed at smaller radii is large enough, they can become distinct attractive community cores to which the new nodes can connect with only a small interference (cross-links) between the different angular regions. In the PSO model, the condition for a large enough separation between the inner nodes is that they are pushed outwards (according to the popularity fading) relatively fast, i.e. β is not large. In parallel, the cutoff in the connection probability as a function of the hyperbolic distance must also be sharp enough for localised connections; thus, T must not be set large either to support community formation. A similar line of arguments holds also for the S 1 /H 2 model. When γ is large, then due to the relatively rapid decay in the degree distribution, the hidden variables κ i take low values that are mapped to relatively high radial coordinates even for the inner nodes, helping the formation of community cores. In parallel, a large α parameter in the S 1 /H 2 model has a similar effect to a low T value in the PSO model, sharpening the cutoff in the connection probability as a function of the metric distance.
We also compared the community structure in the PSO model to the communities in networks with a non-uniform angular distribution of the nodes in a simple framework, motivated by the fact that the embedding of real networks is often non-homogeneous in terms of the angular coordinates, similarly to the hyperbolic models with built-in community formation introduced in Refs. [16][17][18]20]. Our framework enables a continuous transition between the homogeneous angular node distribution of the PSO model and an angular distribution with empty gaps between the supposed modules, where the angular coordinates are distributed uniformly at random inside the allowed angular regions. According to our results, the modularity shows only a mild increase as a function of the relative gap size for the majority of the parameter settings. Thus, the modules in the original PSO model can be quite close in strength to modules occurring in hyperbolic networks with a non-uniform angular node distribution, and the modular structure of the PSO model as a whole can be treated as a limiting case for those hyperbolic systems where the community structure is accompanied with a non-uniform distribution in the angular coordinates of the nodes.
Our findings are also closely related to the community structures observed in networks grown with the help of simplicial complexes [39,40] that were also shown to be hyperbolic. Explicit community formation is not built in these models either; however, the simplicial complexes form complete subgraphs (cliques), and when aggregating such dense structures, the appearance of communities seems to be more natural compared to the models studied here, where links are introduced one by one. Nevertheless, the formation of communities observed here deepens further the connection between hyperbolic networks and the models introduced in Refs. [39,40], that are known to possess a strong community structure.
In conclusion, our study draws the attention to the important but less known fact that the PSO and S 1 /H 2 models are capable of generating random graphs that are not just smallworld, highly clustered and scale-free, but in addition contain communities as well. Although the advantageous properties of hyperbolic models were already appreciated in the literature, this recognition makes them even more suitable for modelling real systems than thought before. In real systems, communities provide very important units at an intermediate level of the structural organisation of the network. Our detailed study of the behaviour of the community structure as a function of the model parameters show that modules are formed also in hyperbolic networks in an "automatic" way, simply as a consequence of the connection rules and the nature of the underlying hyperbolic geometry. These findings add a novel perspective and motivation for the studies and applications of hyperbolic network models.

S1. The E-PSO model
In our studies of the community structure of hyperbolic networks, besides the PSO model and the S 1 /H 2 model, we also used the E-PSO model for random graph generation. The results on the communities found in this model are presented in sections S2-S4 and section S6, whereas in the present section we provide a brief introduction to the model itself.
The popularity-similarity optimisation (PSO) model [14] was generalised in the Supplementary Notes of Ref. [14] by the introduction of so-called internal links: in this generalised PSO model, in addition to the m number of external links connecting the new node to the already existing nodes, at each time step an L number of further internal connections are created between disconnected pairs of previously appeared nodes, where the formation of all types of links is determined by the usual distance-dependent probabilities. It is straightforward to extend this model to negative values of the parameter L [44], in which case after connecting the new node to m number of the already existing nodes, the total number of links between the previously appeared nodes is changed by L < 0, i.e. |L| number of internal links are removed at each time step. To maintain the trend that mostly hyperbolically close nodes are connected to each other, it is natural to use the same probability formula for retaining a link as for the link creation and, accordingly, the complementary probability of link creation as the probability of link removal.
The E-PSO model [15] is an equivalent of the generalised PSO model using solely external links, i.e. all edges of the network are established by the actual new node. Contrary to the original and the generalised PSO models, in the E-PSO model the number of links created at a time step is time-dependent. The expected number of links emerging at time i ∈ [1, N ] can be given as whereL i is the expected total number of internal links from previous nodes on the node appearing at iteration i at the end of the network generation process in the generalised PSO model parametrised by the number of nodes N , the number m of external links created at each time step, the change L in the number of internal links at each time step and the popularity fading parameter β. Compared to the generalised PSO model, with the E-PSO model one can generate even large networks relatively fast, since in the generalised PSO model the hyperbolic distance needs to be calculated for all node pairs at each time step to update the probabilities of internal link creation or retainment in accordance with the updated node positions, whereas in the E-PSO model always only the distances from the newly appeared node need to be determined. Not only the original PSO model, but the generalised PSO model and the analogous E-PSO model are capable of producing scale-free networks (with a degree decay exponent γ = 1 + 1/β) that are highly clustered (in the case of small temperature T ) and have the small-world property.
Moreover, in the generalised PSO model and the E-PSO model, with the introduction of the parameter L even it becomes adjustable how the average internal degree of the subgraphs spanning between nodes having a degree larger than a certain threshold depends on the degree threshold [44]: for 0 < L the average internal degree increases with the degree threshold and for L = 0 (corresponding to the original PSO model) the average internal degree does not depend on the degree threshold until the degree threshold remains below a value at which the subgraphs become extremely small, while for L < 0 the average internal degree gradually decreases as the degree threshold increases. Note that in these generalised versions of the PSO model the expected average degree of the resulted network can be calculated as k = 2(m + L) instead of k = 2m.

S2. Quality of the detected communities
We studied the quality of the community structures detected by the asynchronous label propagation [60], the Louvain [52] and the Infomap [59] algorithms in PSO [14], E-PSO [15,44] and S 1 /H 2 [19,22] networks of various parameter combinations. The isolated nodes emerging in the case of the S 1 /H 2 model and occasionally also in the networks generated by the E-PSO model of L < 0 were removed before the community detection, meaning that the actual size of the examined networks does not necessarily reach the number of nodes N inputted in these models. Each community detection algorithm was executed once for each network. Figures S2.1-S2. 3 show the achieved highest weighted modularity averaged over 100 networks of each parameter setting together with the corresponding standard deviations. Figures S2.4-S2.12 present how the performance of the three different community detection algorithms depends on the parameters of the examined network generation models. Figure S2.1 depicts the effect on the achieved highest weighted modularity of changing the number of nodes N , the expected average degree k , the popularity fading parameter β and the temperature T in the PSO model, which corresponds to the E-PSO model with L = 0. For large N and small k most of the nodes have the possibility to create connections only with hyperbolically close nodes, while for small N/ k ratios the nodes are forced more often to connect even with farther nodes to create all the expected number of links. For this reason, a larger N/ k ratio leads to connections that are more strongly determined by the hyperbolic distances, and thus to a more clear separation between the angular regions of the hyperbolic disk, i.e. a community structure with higher modularity. Besides, by sharpening the cutoff in the connection probability, small values of T also facilitate the localisation of the node-node connections; thus, with the decrease of the temperature T the modularity of the detected community structures increases. Furthermore, for smaller values of β the inner nodes drift faster away from each other during the network growth, forming thereby more separated attraction centres for the outer nodes, due to which most of the network nodes can make a more definite choice between the community centres, which leads to communities with less external connections, i.e. larger modularity. Figure S2.2 display how the parameters m and L affect the achieved highest weighted modularity in E-PSO networks. Based on these, not only the expected average degree k = 2(m + L), but even m and L itself has an effect on the community structure of the generated networks. According to equation (S1.1), for L < 0 the number of links created by a new node at its appearance is an increasing function of the node's appearance time, meaning that the earlyappearing, inner nodes connect to few nodes only, whereas the outer nodes, for which the number of realisable connections (i.e. the number of previously appeared nodes) is not limited as much, form more connections. This way, nodes of any appearance time have the possibility to create connections only with hyperbolically close nodes. On the other hand, for 0 < L the inner nodes create at their appearance a relatively large number of connections with the previously appeared nodes, which -in the absence of enough hyperbolically close candidates -leads to the emergence of connections between not so close nodes too. Taking into consideration the concept that the more the connections are restricted to hyperbolically close node pairs, the stronger the arising community structure, we can conclude that for a given expected average degree k the modularity can be increased by decreasing the parameter L and, at the same time, increasing the parameter m accordingly.
According to figure S2.3, the strength of the community structure in S 1 /H 2 networks depends the same way on the model parameters as in PSO networks: the achieved highest weighted modularity is higher for larger number of nodes N , smaller average degree k , larger degree decay exponent γ (corresponding to smaller popularity fading parameter β in the E-PSO model) and larger α (which is analogous to lower temperature T in the E-PSO model).

S3. Community size distributions
Here we present the characteristics of the community size distributions obtained with the asynchronous label propagation [60], the Louvain [52] and the Infomap [59] algorithms for the PSO [14], E-PSO [15,44] and S 1 /H 2 [19,22] networks of various parameter combinations. Each community detection algorithm was executed once for each network. The isolated nodes emerging in the case of the S 1 /H 2 model and occasionally also in the networks generated by the E-PSO model of L < 0 were removed before the community detection, meaning that the actual size of the examined networks does not necessarily reach the number of nodes N inputted in these models. We generated 100 networks with each parametrisation and investigated the sample created by assembling the occurring community sizes from all the 100 networks. Figures [19,22] networks of various parameter combinations. Each community detection algorithm was executed once for each network. The isolated nodes emerging in the case of the S 1 /H 2 model and occasionally also in the networks generated by the E-PSO model of L < 0 were removed before the community detection, meaning that the actual size of the examined networks does not necessarily reach the number of nodes N inputted in these models. We generated 100 networks with each parameter setting and calculated the adjusted mutual information ( According to these figures, asynchronous label propagation and Infomap produce the most similar partitions, while the most different is the result of Louvain and Infomap. In most of the cases, the AMI depends similarly on the network generation parameters as the weighted modularity Q. For the high modularity regions of the parameter space (large number of nodes N , small average degree k , small popularity fading parameter β or large degree decay exponent γ and low temperature T or large α) the AMI is relatively large for each pair of community detection methods, indicating in these parameter regions the emergence of really apparent communities that are detectable for all the 3 investigated algorithms alike.

S5. Equidistant angular node arrangement
In order to exclude the possibility that the emergence of communities is a result of the inhomogeneities in the angular node arrangement arising inevitably when a finite number of angular coordinates is sampled from a uniform distribution in [0, 2π), we modified the PSO model [14] to use strictly equidistant angular arrangement, i.e. instead of sampling the angular coordinates uniformly randomly, assign the coordinates θ i = (i − 1) · 2π N , i = 1, 2, ..., N to the network nodes in a randomly chosen order. According to figures S5.1-S5.4, the networks generated by this modified PSO model possess a community structure with similarly high weighted modularity as the usual PSO networks (see figures S2.1 and S2.4-S2.6). Hence, we can conclude that the emergence of communities is not just an effect of the finite network size, but the inherent property of the studied hyperbolic network models. Figures S5.5

S6. Community detection on unweighted hyperbolic networks
We studied the quality of the community structures detected by the asynchronous label propagation [60], the Louvain [52] and the Infomap [59] algorithms in PSO [14], E-PSO [15,44] and S 1 /H 2 [19,22] networks of various parameter combinations. The isolated nodes emerging in the case of the S 1 /H 2 model and occasionally also in the networks generated by the E-PSO model of L < 0 were removed before the community detection, meaning that the actual size of the examined networks does not necessarily reach the number of nodes N inputted in these models. Each community detection algorithm was executed once for each network. This section presents the results obtained in the case of setting each link weight in the previously examined synthetic networks to 1, independently of the hyperbolic distance between the connected nodes.       The results for the PSO model are given in the left column (panels (a), (c) and (e)), whereas the ASI obtained for the S 1 /H 2 model appears in the right column (panels (b), (d) and (f)). The ASI for the communities detected by asynchronous label propagation is given in the top row (panels (a) and (b)), the ASI regarding the results of Louvain is shown in the middle row (panels (c) and (d)) and the ASI for the partitions found by Infomap is presented in the bottom row (panels (e) and (f)). We show the measured ASI (indicated by the color, averaged over 100 samples) as a function of the model parameters T and β, or 1/α and 1/(γ − 1) for networks of size N = 10, 000 and expected average degree k = 10.