Generalised popularity-similarity optimisation model for growing hyperbolic networks beyond two dimensions

Hyperbolic network models have gained considerable attention in recent years, mainly due to their capability of explaining many peculiar features of real-world networks. One of the most widely known models of this type is the popularity-similarity optimisation (PSO) model, working in the native disk representation of the two-dimensional hyperbolic space and generating networks with small-world property, scale-free degree distribution, high clustering and strong community structure at the same time. With the motivation of better understanding hyperbolic random graphs, we hereby introduce the dPSO model, a generalisation of the PSO model to any arbitrary integer dimension \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d>2$$\end{document}d>2. The analysis of the obtained networks shows that their major structural properties can be affected by the dimension of the underlying hyperbolic space in a non-trivial way. Our extended framework is not only interesting from a theoretical point of view but can also serve as a starting point for the generalisation of already existing two-dimensional hyperbolic embedding techniques.


Introduction
Network theory has become an essential and ubiquitous tool for modelling various types of complex systems ranging from the level of interactions within cells to the level of the Internet, economic networks, and the society [1,2]. In the past decades, a vast number of related studies reported a few universal features that most of the real networks seem to have in common, such as sparsity [3], small-world property [4,5], inhomogeneous degree distribution [6,7], high clustering coefficient [8] or community structure [9][10][11]. Incorporating all, or at least some of these universal properties into a unified modelling framework is, however, a non-trivial issue and still presents a theoretical challenge of high relevance. Along this line, a variety of different network arXiv:2108.03328v1 [physics.soc-ph] 6 Aug 2021 models have been proposed so far, including the celebrated Barabási-Albert (BA) model with preferential attachement [12], the hidden variables formalism [13][14][15][16][17] or models based on the mechanism of triadic closure, which has been specifically designed for explaining the high clustering of social networks [18,19]. Besides these examples, a further notable approach is given by hyperbolic network models that are capable of simultaneously explaining many observed network characteristics in a natural manner by assuming that nodes are embedded into a negatively curved hidden metric space [20][21][22][23][24][25].
The random hyperbolic graph (RHG) [20], for instance, is a static network model where nodes are placed at random on the hyperbolic disk of constant curvature K = −ζ 2 , and the connection probability between any pair of nodes is a decreasing function of their hyperbolic distance. A mathematically equivalent model is given by the S 1 model [26], where the nodes are positioned on a circle and become connected according to a probability depending on the angular distance and a hidden variable drawn from a power-law distribution. By converting the hidden variables into radial coordinates we arrive to the hyperbolic H 2 model [27] that is equivalent to the RHG model; hence, the RHG is often also referred to as the S 1 /H 2 model.
In contrast to the RHG, in the popularity-similarity optimisation (PSO) model [21] the networks are not static but evolve over time via the continual appearance of new nodes on the hyperbolic plane. More precisely, new nodes are placed one by one in the native disk representation of the two-dimensional hyperbolic plane [20] with logarithmically increasing radial coordinates and uniformly random angular coordinates. Once a new node appears, it establishes connections to the previous ones with a probability depending on the hyperbolic distance in a similar way as in the RHG model. The tendency to connect to hyperbolically close nodes can be interpreted as an optimisation of a trade-off between the popularity (arising from the node birth time and reflected by the radial coordinate) of a possible candidate and its similarity (the angular distance abstracting the distance in an attribute space) compared to the newly arriving node. In vague terms, the degree of the nodes is determined by the radial coordinate, and owing to an outward shift of the nodes (referred to as the popularity fading, controlled by a parameter β), the degree distribution takes the scaling form of P(k) ∼ k −γ with a tuneable decay exponent γ = 1+ 1 β . By changing the sharpness of the cutoff in the connection probability as a function of the hyperbolic distance with another parameter T called temperature, the average clustering coefficientc of the resulting graphs can be adjusted as well. Although this model has been shown to be capable of generating networks that are small-world, highly clustered and scale-free at the same time, several other variants of the original PSO model have been suggested in order to explain further features of real-world graphs. Examples include the E-PSO model that inherently accounts for the creation of internal links, i.e. connections emerging between old nodes in the network [22], or alternatively, the deletion of already existing links [28]. Or, the nonuniform popularity-similarity optimisation (nPSO) model [24,29] that allows the generation of networks with an adjustable community structure by assuming a heterogeneous angular node distribution with multiple peaks is also worth mentioning. Nevertheless, it has also been revealed quite recently that both the RHG and the PSO models can generate networks that possess strong community structure despite lacking any explicitly built-in community generating mechanisms [25,[30][31][32][33][34].
In parallel with the developments of hyperbolic network models, another closely related field given by hyperbolic embedding techniques has also received great attention [22,27,28,[35][36][37][38]. Briefly, this tackles the problem of inferring the most plausible coordinates for the network nodes based on the topology of a given network. One of the first methods pointing in this direction was HyperMap [22], relying on a maximum likelihood estimation, where we assign hyperbolic coordinates to the nodes of the network by maximising the probability that the network was generated by the E-PSO model. Contrarily, in Refs. [36,37] an embedding technique based on a nonlinear dimension reduction of the Laplacian matrix was introduced. Along similar lines, a whole set of embedding algorithms were studied in Ref. [38], using different pre-weighted matrices encapsulating the network structure and multiple unsupervised dimension reduction techniques borrowed from machine learning. The rationale behind these approaches (usually coined as coalescent embeddings) is that when they are applied to hyperbolic networks, a common node aggregation pattern can be observed that is circularly or linearly ordered (angular coalescence) according to the original angular coordinates on the hyperbolic plane. An embedding algorithm that mixes the coalescent embedding with local angular optimisation based on likelihood maximisation was proposed in Ref. [28]. A further, very efficient embedding method is given by Mercator [27], adopting the Laplacian eigenmaps approach with the coordinates optimised according to the RHG model.
Despite the excellent performance of the above embedding techniques, there is clearly a theoretical limitation behind most of them: they are defined on the hyperbolic disk, that is, in d = 2 dimensions. Nevertheless, it has recently been revealed that higher-dimensional hyperbolic embeddings can outperform lower-dimensional ones in link prediction, for instance, in author collaboration networks [39]. Moreover, in Ref. [40] it has been also shown that the presence of additional dimensions can lead to a clearer separation between the communities of a network. Uncovering the role of the number of dimensions in hyperbolic embeddings is, therefore, of great interest that simultaneously provides a strong motivation for investigating appropriate higher-dimensional hyperbolic network models as well.
Along this line, the RHG model has recently been extended to d > 2 dimensions [41,42], placing the nodes in a d-dimensional hyperbolic ball. However, the extension of the PSO model to higher dimensions is still missing. Motivated by that, here we introduce the dPSO model, a generalisation of the original two-dimensional popularity-similarity optimisation model to any arbitrary integer dimension of d ≥ 2, which, we believe, provides a further substantial step towards a comprehensive theoretical characterisation of hyperbolic graphs.
Besides its theoretical relevance, our dPSO model opens up the possibility of systematically generalising already existing embedding techniques to higher dimensions, the necessity of which has explicitly been outlined e.g. for coalescent embeddings in Ref. [38]. Therein the authors claim that as a supplement to their findings, an additional interesting analysis could be to generate synthetic networks using for instance a threedimensional PSO model and examine the accuracy of their methods by comparing the obtained embeddings to the original node arrangement. The present work contributes to this issue by thoroughly elaborating the PSO model for d = 3 and higher dimensions.
Since the higher number of dimensions of the underlying hyperbolic space allows a much richer characterisation of the nodes in general, here we conjecture that the suggested ddimensional embeddings could provide further and deeper insights into the architecture of the hidden geometry behind the structure of complex networks.
In the present paper, we introduce the dPSO model as a natural generalisation of the well-known two-dimensional PSO model [21] to hyperbolic spaces of dimension d > 2. We show analytically that the degree distribution of dPSO networks can be written as P(k) ∼ k −γ in the large k regime, where the degree decay exponent γ is directly related to the dimension d and the popularity fading parameter β as γ = 1 + 1 (d−1)β . Besides the scale-free behaviour, the networks generated by the dPSO model can exhibit a large average clustering coefficientc and a strong community structure for a relatively wide range of the parameter settings if the number of dimensions of the underlying hyperbolic space is not extremely high. According to our results,c is controlled by an interesting interplay between the dimension d and the temperature T , where for T < 1 d−1 the average clustering coefficient is a decreasing function of T , whereas at temperatures near and above T = 1 d−1 ,c becomes independent of T . A further noteworthy feature of the dPSO model is that in dimensions d > 2 extremely skewed degree distributions with γ < 2 become accessible, which can lead to networks displaying a number of exotic properties that are absent in scale-free networks with 2 ≤ γ. The rich variety of networks that can be obtained in our proposed framework together with the capability of reproducing the fundamental properties of real networks in a natural way make the dPSO model a very promising candidate upon which higher-dimensional hyperbolic embedding techniques may be developed in the future.

Methods
The original popularity-similarity optimisation model places the network nodes in the native representation of the hyperbolic plane during the network generation. First, in Sect. 2.1 we describe the native representation of the d-dimensional hyperbolic space and the corresponding formula of the hyperbolic distance. Next, the network generation algorithm of the dPSO model is introduced by extending the PSO model to the hyperbolic space of any integer dimension d ≥ 2.

Native representation of the hyperbolic space
The d-dimensional hyperbolic space of constant curvature K < 0 is represented in the so-called native representation [20] by a d-dimensional ball of infinite radius in the Euclidean space (for which K = 0). In this representation the Euclidean angles between hyperbolic lines are equal to their hyperbolic values, and the radial coordinate r of a point (defined as its Euclidean distance from the centre of the ball) is equal to its hyperbolic distance from the ball centre. The hyperbolic distance between two points is measured along their connecting hyperbolic line, which is either the arc of the Euclidean circle going through the given points and intersecting the ball's boundary perpendicularly or -if the ball centre falls on the Euclidean line connecting the two points in question -the corresponding diameter of the ball. The hyperbolic distance x between two points given by the Cartesian coordinate vectors u = (u 1 , u 2 , ..., u d ) and where ζ = √ −K, and θ u,v = arccos( u·v u v ) = arccos( ) is the angle between the examined points. Note that in the case of r u = 0 simply x = r v , and if r v = 0 then x = r u . According to Ref. [20], for sufficiently large ζr u and ζr v with an angular distance θ u,v larger than 2 · √ e −2ζru + e −2ζrv but small enough to use the approximation sin(θ u,v /2) ≈ θ u,v /2, the hyperbolic distance can be approximated as (2)

Description of the extended PSO model
The PSO model [21] generates networks that have a scale-free degree distribution characterised by a degree decay exponent γ that is determined by the radial arrangement of the network nodes. In terms of these, there are two natural possibilities for the extension of the two-dimensional PSO model to any integer number of dimensions d ≥ 2 that both gives back the original PSO model at d = 2: one can either make γ independent of the number of dimensions by introducing a d-dependent multiplier in the radial coordinates of the low-temperature regime that yields highly clustered networks, or not change this coordinate formula compared to the two-dimensional case and make the degree decay exponent γ dependent on the value of d. In the present study, we chose the latter option since it offers the opportunity to expand the range of the achievable γ values below 2 by increasing the number of dimensions. This choice is established in more detail in Sect. S1.3 of the Supplementary Information. In the dPSO model, the network nodes appear one by one in the above described native representation of the d-dimensional hyperbolic space and connect to previously appeared nodes with probabilities depending on the hyperbolic distances. The parameters of the model can be listed as follows: • The curvature K ∈ R − of the hyperbolic space, controlled by ζ = √ −K > 0. Changing the value of ζ corresponds to a simple rescaling of the hyperbolic distances; the usual custom is to set the value of ζ to 1 (i.e. K to −1).
• The dimension 2 ≤ d ∈ Z + of the hyperbolic space.
• The final number of nodes N ∈ Z + in the network.
• The number of connections m ∈ Z + established by each node after the mth one at its appearance. The average degree of the network is approximatelyk = 2 · m.
• The popularity fading parameter β ∈ (0, 1], controlling the outward drift of the nodes in the native ball. The exponent γ of the power-law decaying tail of the degree distribution is related to the popularity fading parameter as According to this relation between γ, β and d, in the case of different dimensions different popularity fading parameters are needed to obtain the same degree decay exponent γ. Note that as the dimension d increases, the achievable smallest degree decay exponent (γ min = 1 + 1/(d − 1), yielded by β = 1) decreases, meaning that in higher dimensions the attainable highest degree is larger than in hyperbolic spaces of smaller dimensions. (The details of the derivation of Eq. (3) are given in Sect. 3.1 and in Sect. S1 of the Supplementary Information.) • The temperature 0 ≤ T , T = 1 d−1 , controlling the average clustering coefficientc of the network. As the temperature increases from 0, the average clustering coefficient decreases, and settles to a more or less constant value at T = 1 d−1 . For β ≤ 1 d−1 (i.e., for 2 ≤ γ), the clustering is asymptotically zero for any 1 d−1 < T , while for larger popularity fading parameters (i.e., for γ < 2) the lowest possiblec is an increasing function of β.
During the random graph generation process, initially the network is empty, and at each time step j = 1, 2, ..., N a new node joins the network as follows: (i) The new node j appears at radial distance r jj from the origin with a position chosen uniformly at random on the surface of the corresponding d-dimensional ball, where (a) r jj = 2 ζ ln j if T < 1 d−1 , and (b) r jj = 2T (d−1) ζ ln j if 1 d−1 < T . The change in the multiplying factor at T = 1/(d−1) was introduced to ensure that the formula of the degree decay exponent γ remains the same (given by Eq. (3)) for all temperature settings. (Note that a similar adjustment of r jj was already introduced in the original PSO model of d = 2 [21].) (ii) The radial coordinate of each previously (at time i < j) appeared node i is increased according to the formula r ij = βr ii +(1−β)r jj in order to simulate popularity fading.
(iii) The new node j establishes connections with m number of previously appeared nodes. Only single links are permitted. If the number of previously appeared nodes is not larger than m, then node j connects to all of them. Otherwise (i.e., for m + 1 < j), (a) node j connects to the m hyperbolically closest nodes if T = 0, and (b) at temperatures 0 < T , any previous node i = 1, 2, ..., j − 1 gets connected to node j with probability where the hyperbolic distance x ij between the node pair i − j can be calculated based on Eq. (1) and the so-called cutoff distance R j can be obtained by solving the equation m =k j , where the expected numberk j of the realised connections of the new node j at its arrival time j can be written as using The analytic study of the above model is given in the Supplementary Information: the expected form of the degree distribution is derived in Sect. S1, whereas the approximating formulae of the cutoff distance are presented in Sect. S2.

Results
We generated networks with the above-described dPSO model using numerous parameter settings. As an illustration, in Fig. 1 we show layouts of dPSO networks of size N = 1000 both in the native representation of the d-dimensional hyperbolic space and according to a standard force-directed layout algorithm in the Euclidean plane. We analyzed the major structural properties of dPSO networks from various points of view. The results along with their detailed explanations are presented in the following subsections, focusing on a few fundamental network characteristics such as the degree distribution (Sect. 3.1), the average clustering coefficient (Sect. 3.2) and the community structure (Sect. 3.3). Further plots regarding simulation results are shown in Sect. S3 of the Supplementary Information. Moreover, in Sect. S4 of the Supplementary Information we also study a three-dimensional extension of the nonuniform popularitysimilarity optimisation model (nPSO) [24,29] that samples the angular coordinates of the network nodes from a multimodal distribution to allow control over the number and the size of the communities.  Figure 1. Layouts of networks generated by the dPSO model in 2-and 3dimensional hyperbolic spaces of curvature K = −1. The colouring of the nodes and the links indicates communities found by the Louvain algorithm. In panels a), c), e) and g) we display the network in the native d-dimensional hyperbolic ball, whereas panels b), d), f) and h) show the standard Euclidean layout of the same graph for comparison. In the top row, we present networks of the same degree decay exponent γ = 2.5, generated on the 2-dimensional hyperbolic plane, setting the popularity fading parameter β to 2/3 (panels a) and b)), and in the 3-dimensional hyperbolic space, setting the popularity fading parameter β to 1/3 (panels c) and d)). In the bottom row, we show networks of the same popularity fading parameter β = 1, corresponding to the smallest degree decay exponent achievable in a given dimension, namely γ = 2.0 in the 2-dimensional case (panels e) and f)) and γ = 1.5 in the 3-dimensional case (panels g) and h)). For each network, we set the number of nodes N to 1000, the expected average degree 2m to 4 and the temperature T to 0. The average clustering coefficientc of the displayed networks and the modularity Q for their displayed partitions are the following: c = 0.729 and Q = 0.882 for panels a) and b);c = 0.644 and Q = 0.845 for panels c) and d); c = 0.788 and Q = 0.829 for panels e) and f); whilec = 0.964 and Q = 0.354 for panels g) and h).

Degree distribution
Our analytical calculations show that when generating networks according to the dPSO model presented in Sect. 2, we obtain degree distributions that follow a scaling form of where the degree decay exponent γ, given by Eq. (3), depends on the popularity fading parameter β and also the number of dimensions d, but is independent of the temperature T . This is a direct consequence of the fact that the probability for node i (appearing at time i) to attract a link from node j (appearing at time j > i) in the above model can be written as with m denoting the number of connections established by node j at its appearance.
(The derivation of the above formula is given in Sect. S1 of the Supplementary Information.) Indeed, by following the idea in Ref. [21], it can be shown that Eq. (8) is equivalent to imposing an extended preferential attachment rule (EPA) that defines the connection probability between an existing node i with a degree k i (j) and the newly appearing node j as where A = (γ − 2)m is a parameter called initial attractiveness. According to Ref.
[43], in networks generated by this EPA rule, the expected degree of node i becomes at time j, and the degree distribution develops into a scale-free form of P(k) ∼ k −γ , where the exponents α and γ are connected by α(γ) = 1 γ−1 . We can relate Eqs. (9) and (10) to Eq. (8) by identifying α as α = (d − 1)β, verifying that Finally, since γ can be expressed from α(γ) as γ = 1 + 1 α , we arrive at the formula in Eq. (3) for the decay exponent of the degree distribution in Eq. (7). Thus, networks that grow according to the rules outlined in Sect. 2 inherit the scale-free property from the original PSO model, albeit using the same β parameter leads to a smaller γ exponent for greater values of d. Note that by setting d to 2, one can easily recover the results of the original PSO model in Ref. [21], where the degree decay exponent has been found to be γ = 1 + 1 β . The analytical results are in perfect agreement with the numerical simulations, as indicated by Fig. 2, where we display the complementary cumulative distribution function (CCDF) of the node degrees for several networks obtained from the dPSO model with different combinations of the d and the β parameters at different values of the temperature T .

Clustering coefficient
Based on simulations, we have found that the average clustering coefficientc of dPSO networks displays an unusually rich behaviour depending on the specific choices of the model parameters. First, in Fig. 3 we show the measuredc as a function of the rescaled temperature T · (d − 1) for different settings of the further model parameters. A rather simple observation regarding this figure is thatc is an increasing function of m. This  Figure 2. Degree distribution of networks generated by the dPSO model at different parameter settings. Each panel corresponds to a given dimension d. We display in all the panels the complementary cumulative distribution function (CCDF) of the node degrees for networks of two different degree decay exponents: γ = 2.5 (blue) and γ = 1 + 1/(d − 1) (orange). In both cases, we generated a network using the temperature setting T = 0 (dashed lines), T = 0.5/(d − 1) (dash-dotted lines) or T = 1.5/(d − 1) (dotted lines). The curvature of the hyperbolic space, the number of nodes and the half of the expected average degree were the same for all networks, namely K = −ζ 2 = −1, N = 10, 000 and m = 2. All the indicated simulation results match well the curve P(k ≤ K) ∼ k −(γ−1) (shown by solid lines) that was expected based on the analytical calculations.
is reasonable in the light of the role of this model parameter outlined in Sect. 2: m is related to the expected average degree of the dPSO networks ask = 2 · m, meaning that higher values of m correspond to higher average degrees. Besides, according to Fig. 3, the average clustering coefficient is always maximal at T = 0, i.e. when newly appearing nodes connect only to the hyperbolically closest existing nodes. If, however, the temperature increases, the cutoff in the connection probability becomes milder. This implies that rather distant nodes also become likely to be connected and consequently, the value ofc decreases. Nevertheless, above a certain point T c = 1 d−1 , hereinafter referred to as the critical temperature, the average clustering coefficient remains more or less constant. This can be attributed to the fact that at the critical temperature we change the formula of the radial coordinates from r jj = 2 ζ ln j to r jj = 2T (d−1) ζ ln j, meaning that the radial coordinates become an increasing function of the temperature above T c . According to the approximating formula given by Eq. (2), the larger radial coordinates obviously yield larger hyperbolic distances between all node pairs. Therefore, despite the continuous slowing down of the decay in the connection probability as a function of the hyperbolic distance, above T c = 1 d−1 the increase in the temperature does not increment further the number of nodes that are reachable for a newly coming node, and thus, the average clustering coefficient becomes settled to a constant value.
Finally, it can be clearly seen in Fig. 3 that the average clustering coefficient is an increasing function of the popularity fading parameter β. This is especially striking in the higher-dimensional cases, where the range of the degree decay exponents that are achievable extends to lower values. When γ = 1 + 1 (d−1)β is decreased, the degree distribution of the emerging network decays more slowly and, consequently, the largest occurring degrees increase. This is realised by the increase in the preference of the newly coming nodes for connecting to the early-appeared ones, overshadowing the attractiveness of the angular neighbours located at larger radii. Now let us concentrate on the subgraph of nodes that appeared before a given time point. Since each node has to create m links at its appearance, the ratio between the connected and the non-connected node pairs in this subgraph increases towards earlier times (i.e., as the number of nodes in the subgraph decreases). Therefore, if the degree decay exponent γ of the network is decreased, the set of the primarily attractive inner nodes tightens and the new nodes tend to connect into a more densely connected group of nodes, which increases the number of triangles and, accordingly, the average clustering coefficient. It is important to note that at extremely small values of γ, the high density of the connections among the most popular inner nodes moderates the effect of the temperature increase onc by limiting the probability to connect to nodes that are not connected to each other. Due to this, the networks of small enough degree decay exponents are highly clustered not only at small temperatures, but in the hightemperature regime as well.
To provide further insight into how the model parameters affect the triangle formation in dPSO networks, in Fig. 4 we show the average clustering coefficient as a function of the degree decay exponent γ at different settings of the temperature T and the dimension d. This figure again proves that the critical temperature T c = 1 d−1 separates two distinct regimes, wherec shows a fundamentally different nature.
Below the critical temperature T c = 1 d−1 (Figs. 4a and b), the value ofc measured at a given degree decay exponent γ is affected by the number of dimensions of the underlying hyperbolic space: dPSO networks created in higher-dimensional spaces using smaller popularity fading parameters display smaller values ofc compared to networks that have the same degree decay exponent γ but were generated in lower-dimensional   hyperbolic spaces with higher popularity fading parameters. This can be explained roughly by considering that at not too high temperatures, the newly appearing nodes tend to connect to already existing nodes that are relatively similar to them, where high similarity means small angular distance [21]. If the number of dimensions or, equivalently, the number of independent angular coordinates characterising the node attributes is increased, the number of possible coordinate combinations that define the set of positions of similar attributes for a given new node also increases. Therefore, any two nodes that can be considered to be similar from a third node's point of view are less and less likely to have angular coordinates that are relatively close to each other as well with the increase in the number of dimensions. Consequently, the selected nodes to which the new node connects tend to share fewer links in higher dimensions; thus, the number of triangles in the emerging network is reduced. As we approach the critical point from below, the role of the number of dimensions in local triangle formation gradually weakens. Near to and above T = T c (Figs. 4c and d), where connections between rather distant nodes are likely to occur too, the value of c is independent of the individual values of the d and the β parameters that produce the same degree decay exponent γ. Here, it seems that the randomising influence of the high temperature on link formation is so strong that the similar effect of the large number of dimensions is negligible compared to it.

Finding and evaluating communities
Communities are very important structural units in complex networks at the "mesoscopic" scale, without a widely accepted unique definition, but usually associated with subgraphs with a larger internal and a smaller external link density. The automated extraction of communities based solely on the network topology is a challenging problem, with an immense number of different solutions proposed in the literature [9][10][11]. An interesting related feature of two-dimensional hyperbolic networks is that they also contain communities for the major part of the parameter space, in spite of the lack of any explicit built-in community formation mechanism in the graph generation algorithms [25]. Motivated by this and the overall importance of communities in network science, here we examine also the community structure of d-dimensional PSO networks.
Along this line, we apply three independent and well-established community finding algorithms to locate the modules, namely the asynchronous label propagation algorithm [44,45], the Infomap method [46, 47] and the Louvain algorithm [48,49]. The basic idea of asynchronous label propagation is to simulate the diffusion of community labels along the examined network, where the regular updating of the labels based on the neighbouring nodes brings a rapid consensus among the members of a dense group on a unique label. In contrast, the Infomap algorithm provides an information-theoretic approach for finding communities, taking advantage of the fact that communities can actively help in achieving the most parsimonious description of the trajectory of an infinitely long random walk on the network. The algorithm itself searches for the minimum of the so-called map equation, which expresses the code length for an average movement in the above-mentioned random walk process. Finally, the Louvain method performs a fast and efficient heuristic maximisation of modularity, which corresponds   to the most widely used quality measure for communities [50,51], expressed in general as where N is the number of nodes in the network, A ij denotes an element of the adjacency matrix (A ij ≡ A ji = 1 if i is connected to j, and otherwise A ij ≡ A ji = 0), P ij gives the connection probability between nodes i and j in a random null model, E stands for the total number of links in the network, c i is the community to which node i belongs and the Kronecker delta δ c i ,c j ensures that non-zero contribution can come only from node pairs of the same community. A natural choice for the null model is given by the configuration model, yielding P ij = k i k j 2E . In the Louvain approach, Q is optimised in a hierarchical manner, where after finding the local maximum at a given organisation level of the network, in the next step we move up to the next level by aggregating the current communities into single nodes.
As demonstrated by Fig. 1, a community in a hyperbolic network arises from some inner nodes that serve as community cores and the outer nodes of the corresponding angular sector that are held together by their common preference toward the same attractive centres. As it was detailed in Ref. [25] in the case of the two-dimensional PSO model, the emergence of a strong community structure can be achieved under two conditions: the existence of inner nodes that are distant from each other enough to provide well-separated attractive centres for the different angular regions, and the localisation of the connections. The distance between the inner community cores can be increased by accelerating their outward drift that simulates the popularity fading via decreasing the popularity fading parameter β. The strong localisation of the connections can be ensured primarily by setting the temperature T to a small value and thus making the cutoff in the connection probability sharp as a function of the hyperbolic distance. However, it has to be also taken into consideration that if the ratio between the number N of nodes and the number m of connections established by each node at its appearance is smaller, then, to create all the m number of links, the nodes are forced more often to connect even to farther nodes. As a consequence, a small temperature in itself is not always enough to make the connections localised, but it has to be complemented with a relatively large N/m ratio in order to make the connections more strongly determined by the hyperbolic distances and create hereby a more clear separation between the angular regions with respect to the links, thus supporting community formation.
In Fig. 5, we show the highest modularity Q obtained among the applied three community finding methods as a function of the rescaled temperature T · (d − 1) for dPSO networks generated at different parameter settings. In each panel, the bundle of curves (showing Q for networks with different degree decay exponents) seems to form a fork-like pattern, in which the curves are more distant from each other in the lowtemperature regime and approach each other at high values of T · (d − 1), where Q becomes more or less constant and independent from the rescaled temperature.
In Fig. 6, we show the achieved highest modularity Q as a function of the degree decay exponent γ for different number of dimensions at a low temperature (Fig. 6a), a moderate temperature (Fig. 6b) and a high temperature (Fig. 6c). For all of the examined temperatures, Q starts at low values and shows first a strong, then a mild increase toward the higher values of γ, i.e. as the largest node degrees decrease. The effect of the temperature can be observed by comparing the three panels, where we can see that the constant value to which Q settles in the large γ regime is higher if T is lower, as expected. Furthermore, according to Fig. 6c, when the rescaled temperature T · (d − 1) is high enough, the Q(γ) curves seem to collapse onto a universal curve for all the examined dimensions, while at smaller temperatures ( Fig. 6a and b), the modularity measured at a given degree decay exponent γ is a decreasing function of the dimension d, similarly to what has been seen on the local scale for the average clustering coefficient in Fig. 4.

Discussion
The PSO model [21] is arguably one of the most successful hyperbolic network models since it offers a quite natural way to reproduce the major structural properties of realworld networks. The original formulation of this approach is, however, given only for the  . We always set the curvature K = −ζ 2 of the hyperbolic space to −1, the network size N to 10, 000 and the expected average degree 2m to 10. We searched for communities once with all three community detection methods on 5 dPSO networks and plotted the obtained highest modularity averaged over the 5 networks for each parameter setting, with the error bar indicating the standard deviation among the 5 networks.
two-dimensional hyperbolic disk; therefore, a question arising naturally in this context is how it can reasonably be extended to higher-dimensional hyperbolic spaces. In the present paper, we studied this issue in detail and introduced the dPSO model as a d-dimensional generalisation of the original PSO model.
The obtained analytical results show that the scale-free property of dPSO networks is inherited from the original PSO model, meaning that the tail of the degree distribution decays as a power-law, lim k→∞ P(k) ∼ k −γ , for any number of dimensions d ≥ 2. The exponent γ is determined by the popularity fading parameter β and the number of dimensions d via the formula γ = 1+ 1 (d−1)·β , which can be basically explained as follows. The solid angle subtended by the entire surface of the d-dimensional ball representing the d-dimensional hyperbolic space is an increasing function of the number of dimensions, meaning that in the case of a higher d, the uniform distribution of the same number of nodes on the surface results in larger angular distances between the nearest neighbours. Therefore, at a given popularity fading parameter (controlling the attractiveness of the early-appeared nodes arising from their relatively low radial coordinate), in higher dimensions we see an increased propensity of the newly coming nodes to connect to the innermost nodes of any angular position instead of the angular neighbours located at larger radii. Due to this, the same popularity fading parameter β yields larger maximum degree, and thus, smaller degree decay exponent γ for larger values of d. To compensate the decrease in the attractiveness of the angular neighbours caused by the increase in the number of dimensions and keep the degree decay exponent γ at a given value, one has to reduce the distinguished attractiveness of the inner nodes and limit the angular range in which they are preferred. For this, the advantage of the inner nodes in their radial position has to be decreased by shifting them more outwards, which can be achieved by setting the popularity fading parameter β to a smaller value and enhancing thereby the process of popularity fading.
It is worth emphasising that the smallest attainable degree decay exponent (obtained at β = 1) is 2 in the original PSO model [21], whereas in dPSO networks which can be decreased below the two-dimensional limit γ min (2) = 2 by simply increasing the number of dimensions d above 2. Note that the analysis of scale-free networks with a degree decay exponent smaller than two is in itself an interesting topic since such networks exhibit many exotic features that can not be observed in scale-free networks with 2 ≤ γ, including the divergence of the average degree or the presence of macroscopical hubs being connected to a finite fraction of the nodes even in the thermodynamic limit N → ∞ [52, 53]. In terms of dPSO networks, we have found that such extremely skewed degree distributions lead to further unexpected results. As indicated in Fig. 4 and Fig. 6, dPSO networks of γ < 2 are characterised by high average clustering coefficientc, but remarkably at the same time relatively small modularity Q at basically any temperature T . Intuitively, this behaviour can be understood if we consider that on the one hand, since the largest hubs are formed from the first few nodes of the network generation process, they are densely connected to each other. Thus, in the presence of extremely large hubs to which most of the nodes connect at their appearance, triangles are formed with large probability, resulting in a largec. On the other hand, these large hubs make the partitioning of the network into disjunct communities with high modularity practically impossible since the community of any such hub has a macroscopic number of links pointing outside of the given community, resulting in low Q values.
In addition, we found that the number of dimensions d along with the temperature T play a joint role in controlling the average clustering coefficientc of dPSO networks as well. More precisely, as it can be seen in Fig. 3, we can distinguish two phases separated by the critical point T c = 1 d−1 , where the average clustering coefficient of the networks behaves in a fundamentally different way. At temperatures T < T c ,c is a decreasing function of the temperature; however, at temperatures near and above T c , c tends to become independent from T . Besides, based on Fig. 4, at low temperatures we can state that the average clustering coefficient measured at a given degree decay exponent γ decreases with the number of dimensions d, in perfect accordance with the results found in Ref.
[54] for the RHG model. Meanwhile, near the critical temperaturē c begins to show a universal decay with γ, irrespectively of the separate values of the popularity fading parameter β and the dimension d. Despite the obvious differences between the d-dimensional PSO and RHG models, remarkably, a similar separation of phases with the same critical temperature has also been observed in the d-dimensional RHG model and discussed in a slightly different context in Ref. [42]. (Note however that the analogous phases of the two models are different in certain aspects.) As it has already been reported in Ref. [25], the original, two-dimensional PSO model [21] is capable of generating networks with strong communities for a wide range of the parameter settings, despite the fact that it does not include any explicitly builtin community structure generating mechanism. Based on the high modularity values measured on dPSO networks (Figs. 5 and 6), here we conclude that the emergence of a strong community structure is not a peculiar feature of the two-dimensional case, but it can be observed even if the number of dimensions is larger than 2, provided that the degree decay exponent γ is not too small and the temperature T is not too high. Nevertheless, at such settings of γ and T , the modularity Q obtained at a given γ decreases as d is increased, similarly to the average clustering coefficientc.
In conclusion, motivated by the fact that there is evidently no particular reason to assume that the underlying hyperbolic space of complex networks is certainly twodimensional, here we proposed a generalisation of the two-dimensional popularitysimilarity optimisation model of network growth, namely the dPSO model, in which the dimension d acts as an additional degree of freedom. This increase in the number of freely variable model parameters allows a much richer characterisation of the networks, and therefore, enables us to better adjust the properties of the generated networks to that of the empirically observed data. In connection with this, we found that there exists a relatively broad range of parameter settings where dPSO networks are suitable for capturing many essential characteristics of real-world networks such as the scale-free property, the high value of the average clustering coefficient or the strong community structure. Namely, dPSO networks generated in lower-dimensional hyperbolic spaces using not too high temperatures and not too low degree decay exponents simultaneously exhibit all the above-mentioned features of real-world networks. Nevertheless, as the number of dimensions d increases, both the maximal average clustering coefficient and the maximal modularity that can be obtained at a given degree decay exponent γ (by setting the temperature T to 0) decreases. This implies that dPSO networks generated in high-dimensional spaces (e.g., at d > 10) can be characterised by neither a clustering nor a community structure of similar strength that can be observed in various real-world examples; hence, a reasonable upper limit can be found on the number of dimensions of the hyperbolic space that underlies real networks. By following similar considerations regarding the clustering coefficient of RHG networks, in Ref.
[55] the authors claim that in terms of both the modelling and the embedding techniques, the most suitable choice for the number of dimensions of the underlying hyperbolic space is d = 2. Here, we complement this result by showing that in general, the dPSO model with d slightly above 2 also performs excellently on the modelling ground (still yielding relatively high values of the average clustering coefficient and the modularity), which, along with the recent success of d > 2 hyperbolic embedding techniques for instance in link prediction [39] or the separation of communities [40], confirmes the relevance of low-dimensional hyperbolic spaces (e.g. with d = 3 or d = 4) in the theory of complex networks. In Ref. [38], several different methods have been already provided for assigning angular coordinates to the network nodes in hyperbolic spaces of arbitrary curvature K = −ζ 2 and number of dimensions d. Using the angular positions yielded by one of these methods, in order to create an embedding that corresponds to our dPSO model, the radial coordinate of the node having the th ( = 1, 2, ..., N ) largest degree (with ties in the order of node degrees broken arbitrarily) has to be calculated as r N = β · (2/ζ) · ln + (1 − β) · (2/ζ) · ln N , where the popularity fading parameter β is determined by the degree decay exponent γ and the number of dimensions d as

Code availability
The code used for generating networks with the d-dimensional PSO model will be available at https://github.com/BianKov/dPSO upon publication. Popularity-similarity optimisation model beyond two dimensions -Supplementary Material S1. Degree distribution of dPSO networks It has been shown in Ref. [21] that in the two-dimensional PSO model of expected average degreek ≈ 2 · m and popularity fading parameter β, the probability that node i (appearing at time i) and node j (appearing at time j > i) connect to each other can be written as The preferential attachment model in Ref.
[43] yields the same connection probability with β = 1/(γ − 1) when the number of connections created at the appearance of a new node is set to m and the degree distribution P(K = k) is proportional to k −γ . In this section we show that in the d-dimensional popularity-similarity optimisation model which -according to the analogy with the preferential attachment model -means that the degree distribution of the networks generated by the dPSO model takes the form of P(K = k) ∼ k −γ with γ = 1 + 1 (d−1)·β . Sect. S1.1 deals with the case of T = 0, where the new node always connects to the m hyperbolically closest nodes, while Sect. S1.2 describes settings where the temperature is strictly larger than zero (0 < T ), enabling hyperbolically father nodes to become connected as well.

S1.1. Connection probability in the case of deterministic connection
At temperature T = 0, each node j = 1, 2, ..., N connects at its appearance to the m hyperbolically closest nodes or, in other words, to all the previously appeared nodes that lie from node j within a certain hyperbolic distance R j . Thus, the probability Π(i, j) of the emergence of a link between nodes i and j (with i < j) equals to the probability that the hyperbolic distance of node i from node j at the appearance of the latter is not larger than R j , i.e.
where R j is determined by the equation expressing that the expected number of the previously appeared nodes that connect to node j must be m. The hyperbolic distance between nodes i and j at time j can be approximated as assuming that ζr ij and ζr jj are sufficiently large (which is true for most of the network nodes if the total number of nodes N is large enough) and therefore, 2 · √ e −2ζr ij + e −2ζr jj < θ ij , but in the meantime the angular distance between the possibly connecting nodes i and j is small enough to use the approximation sin(θ ij /2) ≈ θ ij /2 [20]. Using Eq. (S1.1.3), the connection probability of nodes i and j can be written for T = 0 as i.e., we are searching for the probability that the angular distance between nodes i and j is not larger than a given value, namely θ max . This can be rephrased as the probability that the angular position of node i falls in the spherical sector characterised by an apex angle 2 · θ max ij and an axis going through node j.
Since the angular position of the network nodes is chosen uniformly at random on the surface of a d-dimensional ball, the probability that the angular location of a given node falls in a certain spherical sector can be written simply as the fraction of the solid angle subtended by the sector in question and the solid angle subtended by the complete d-dimensional ball, independently of the direction of the axis of the examined spherical sector. Consequently, the probability that the angular distance measured between a new node and a previously appeared node is not larger than a given value θ max equals to the solid angle subtended by a spherical sector of apex angle 2 · θ max divided by the solid angle subtended by the complete ball. Thus, where Ω total d denotes the solid angle subtended by the complete surface of a d-dimensional ball, namely (S1. 1.6) and in the last step we used that the solid angle subtended by a d-dimensional spherical sector of an apex angle 2ψ can be calculated as In the case of large networks, the maximum angular distance θ max that still enables the link formation is small enough for most of the nodes to assume that in the range of the integration in Eq. (S1.1.5), sin d−2 φ ≈ φ d−2 . Based on this, (S1.1.8) Introducing the notation we arrive at the formula of the probability that nodes i and j get connected during the network growth. According to the model definition, here the initial radial coordinate of the jth node is r jj = (2/ζ) ln j and the radial coordinate of node i at time j is r ij = β·r ii +(1−β)·r jj = = β · (2/ζ) ln i + (1 − β) · (2/ζ) ln j, while R j can be expressed from the equation obtained by substituting (S1.1.10) into Eq. (S1.1.2). After some rearrangement, we can express R j as (S1.1.12) Substituting this expression back into (S1.1.10) yields Hereby we proved that the probability that nodes i and j connect to each other indeed can be written in the form of Eq. (S1.2) at T = 0. S1.2. Connection probability in the case of 0 < T At temperature 0 < T , the newly appearing node j (j = 1, 2, ..., N ) repeatedly makes attempts to create connections with the previously appeared nodes (indexed by i = 1, 2, ..., j − 1) until the emergence of m number of links. Thus, using the probability P (i, j) that the new node j connects in a given connection attempt to node i and the probability P (j) = j 1 P (i, j) di that the new node j connects in a given connection attempt to any of the already existing nodes, the probability that node i becomes connected to node j can be written as In one connection attempt, the new node j chooses randomly one of the previously appeared nodes, each with probability 1/(j − 1), and if there is still no connection between the new node and the selected one, then the link formation occurs with probability p(x) = 1/ 1 + e ζ(x−R j )/(2T ) , given that the current hyperbolic distance between the two nodes in question equals to x. Notice that in the case of large networks, most of the nodes appear at times j m, when the probability that the new node selects such a random node to which it is already connected is insignificant and can be ignored to ease the analysis. However, it has to be taken into consideration thatcontrary to the radial coordinates -the angular coordinates of the nodes are not strictly determined by the node identifiers, but are random variables; therefore, the hyperbolic distance between two nodes is also a random variable, and the probability that a given link creation attempt succeeds can be formulated as This can be rewritten as π 0 p(x) · P (θ ij = φ) dφ, since for given nodes with given radial coordinates the only source of randomness in the hyperbolic distance is the angular distance θ ij between the nodes. As described in Sect. S1.1, due to the uniformity of the angular node arrangement, the probability that the angular distance between two nodes falls in the range [φ, φ + dφ) can be calculated by dividing the solid angle subtended by the volume enclosed between two coaxial spherical sectors of apex angles 2 · φ and 2 · (φ + dφ) by the solid angle subtended by the complete d-dimensional ball, i.e., Using Eq. (S1.1.7) and that d dy we arrive at the formula which, using the notation introduced in Eq. (S1.1.9), takes the form of All things considered, the probability that node i attracts a link from node j in a given connection attempt can be formulated as Using that at the arrival of node j its hyperbolic distance from node i can be written as [20] x ij (j) ≈ r ij + r jj + 2 ζ · ln sin θ ij 2 , (S1.2.7) we arrive at the formula If the temperature T is small enough, then in the case of sufficiently large networks we can assume for most of the node pairs that the main contribution of the above integral comes from the range of small angular distances. This implies that the approximation sin φ ≈ φ can be used, and after that, changing the upper limit of the integral from π to infinity practically does not affect the value of the integral. Using these two assumptions, the connection probability becomes Finally, the substitution of Euler's reflection formula Γ(z) · Γ(1 − z) = π/ sin(z · π) and the radial coordinate formulas r jj = (2/ζ) ln j and which gives back for d = 2 the result of Ref. [21] in the 0 < T < 1 case.
and becomes negative as T exceeds 1/(d − 1), the approximation in Eq. (S1.2.9) can be valid only in the case of T < 1/(d − 1). However, at least in the case of sufficiently large networks, one can neglect the 1 in the denominator of the formula S1.2.8 for most of the node pairs at higher temperatures, yielding This approximation can hold only for 1/ To ensure that the term of the connection probability P (i, j) depending on the smaller node index i remains the same for 1/(d − 1) < T as in the case of 0 ≤ T < 1/(d − 1), the initial radial coordinate of each node ≥ 1 must be set to r = (2T (d − 1)/ζ) · ln instead of r = (2/ζ) · ln . This means that in Eq. (S1.2.11) which, using also the approximation sin (φ/2) ≈ φ/2, can be written for d = 2 as corresponding to the result of Ref. [21] in the case of 1 < T . Based on Eqs. (S1.2.10) and (S1.2.12), the probability that node j connects to any of the previously appeared nodes in a given link formation attempt is denoting the term in P (i, j) that is independent of the node index i. Finally, using Eqs. (S1.2.1), (S1.2.14) and (S1.2.15), the probability that nodes i and j connect to each other can be written as which is the same formula as the one given by Eq. (S1.2) that we wanted to prove. S1.3. The role of the multiplying factor in the initial radial coordinates and the f PSO model According to Refs. [21,43], in the properly parametrised preferential attachment model where the connection probability of nodes i (appearing at time i) and j (appearing at time j > i) can be written as Π(i, j) = m · i −q j 1 −q d (as described in Eq. (S1.1)), the degree decay exponent takes the form of γ = 1 + 1 q . The results of the previous sections show that in the dPSO model Thus, setting the initial radial coordinate of each node ≥ 1 to r = f · ln yields Since at T < 1/(d − 1) we used for any dimension d the same formula r = 2 ζ · ln (S1. 3.4) that was introduced in the original, two-dimensional PSO model [21], below the critical temperature T c = 1/(d − 1) the degree decay exponent became dependent on the number of dimensions d besides the popularity fading parameter β and, at the same time, independent of all the other model parameters (γ = 1 + 1 (d−1)·β ). Then, in order to ensure that the formula of the degree decay exponent remains the same above the critical temperature (just as in the original PSO model), we defined the initial radial coordinates as at temperatures 1/(d − 1) < T . Note that using the same initial radial coordinates as in the T < T c case would yield γ = 1 + T /β above the critical temperature. Nevertheless, by redefining r as we can obtain a degree decay exponent formula that coincides with the well-known expression γ = 1 + 1/β of the two-dimensional PSO model [21] for any number of dimensions, and we again recover the original radial coordinate formulas for d = 2. However, Eq. (S1.3.6) does not provide the possibility to achieve a degree decay exponent below 2, since in this case the smallest possible value of γ (obtained at β = 1) is independently of the number of dimensions. In contrast, if the initial radial coordinates are defined by Eqs. (S1.3.4) and (S1.3.5), then by increasing the number d of dimensions, the lower limit of the degree decay exponent can be decreased as With respect to the clustering and the community structure, the behaviour of the above-highlighted two variations of the dPSO model (differing in the initial radial coordinates given by either Eqs. (S1.3.4) and (S1.3.5) or Eq. (S1.3.6)) is the same, since for given values of ζ, d, N , m, γ and T , all the connection probabilities emerging during the network growth are equal. Looking at the approximating formula of the hyperbolic distance [20] written up for a new node j and a previously appeared node i as (S1.3.10) one can observe that the differences in the attractiveness of the already existing nodes arise from the term β·r ii and from the angular term. However, in a network characterised by a degree decay exponent γ, the term β · r ii is equal to 2 ζ·(d−1)·(γ−1) · ln i for both model variants. Thus, the difference in the initial radial coordinate formulas appears only in the term (2 − β) · r jj that is independent of i. And since the cutoff distance R j is always set to that value at which the expected number of connections will be m, together with the equal change of the (radial) distances between the new node and all of its possible neighbours, the connection probability function becomes shifted as well, leading to the emergence of a connection between node j and any previous node i eventually with the same probability in the case of using Eq. (S1.3.6) as in the case of defining the initial radial coordinates by Eqs. (S1.3.4) and (S1.3.5). According to the above, in the 2 ≤ γ regime these two model variants are equivalent. Consequently, although both definition of the initial radial coordinates are applicable, because of the wider range of achievable degree decay exponents, we stick to using Eqs. (S1.3.4) and (S1.3.5).
Let us now take a look at the effects of using the most general choice of the initial radial coordinates r = f · ln in the two-dimensional case, which we will refer to as the f PSO model. According to Eq. (S1.3.3), the initial radial coordinates r = f · ln with 2/(ζ · (d − 1)) < f · β for T < 1/(d − 1) and 2T /ζ < f · β for 1/(d − 1) < T yield γ < 2 even in the two-dimensional hyperbolic space (i.e., at d = 2). Note that γ is a decreasing function of the multiplying factor f , meaning that the f factors that yield relatively small degree decay exponents are high enough to ensure that in the hyperbolic law of cosines the terms ζr ij and ζr jj are sufficiently large and the hyperbolic distance can be written for most of the node pairs as which is an essential approximation in the derivation of the scale-free degree distribution. Nevertheless, to obtain higher values of γ, it is better to decrease only the popularity fading parameter β and not the multiplying factor f , since otherwise the approximation in Eq. (S1.3.13) becomes invalid due to the smallness of the radial coordinates.
To generate networks on the hyperbolic plane with initial radial coordinates r = f · ln , the cutoff distance R j has to be calculated for T = 0. According to Eqs. (S1.2.10) and (S1.2.11), for d = 2 Note that here, in accordance with Ref. [21], sin (φ/2) was approximated with φ/2 above the critical temperature T c = 1/(d − 1) too. Using the initial radial coordinate formula r = f · ln and that r ij = β · r ii + (1 − β) · r jj , we arrive at The cutoff distance R j of the connection probability at the appearance of node j can be expressed from the equation  [48,49] in f PSO and dPSO networks as a function of the temperature T and the degree decay exponent γ. According to these plots, in that parameter regime where the degree distribution of the f PSO networks is well-described by the theory built on the approximating formula of the hyperbolic distance given by Eq. (S1.3.13), i.e. when the multiplying factor f of the initial radial coordinates is large enough, there is no substantial difference between a two-dimensional f PSO network and a dPSO network of the same degree decay exponent and the possible lowest number of dimensions with regard to the strength of the clustering and the community structure. However, while it is easy to determine the lowest d at which a given small degree decay exponent γ is , it is rather burdensome to determine the exact limit of the factor f that separates the regions in which the hyperbolic distance is well or poorly approximable in practice. Therefore, instead of using the two-dimensional f PSO model and adjusting the β − f parameter pair, we prefer the dPSO model where γ can be controlled via the β − d setting. S1.3.1. Formulas of the radial coordinates and the popularity fading parameter in the case of d-dimensional embeddings As it is described above, there is more than one possible choice regarding the multiplying factor of the initial radial coordinates in the extension of the two-dimensional PSO model to any integer number of dimensions d ≥ 2. Although of the detailed two approaches given by only the latter can be used to obtain degree decay exponents below 2, both of these approaches may be suitable for the generation, and thus also the hyperbolic embedding of networks with 2 ≤ γ. The two model variants provide two different ways for the radial arrangement of a network with size N and degree decay exponent γ in a d-dimensional hyperbolic space of curvature K = −ζ 2 . In both cases, unless the average clustering coefficient of the network to be embedded is very close to 0, it can be assumed that the temperature T that corresponds to the network is smaller than the critical value T c = 1/(d − 1), and therefore the radial coordinate formulas of the 0 ≤ T < 1/(d − 1) . However, with the decrease of the multiplying factor f of the initial radial coordinates in the f PSO model, the approximation of the hyperbolic distance formula given by Eq. (S1.3.13) becomes invalid. As a result, the f PSO model of β = 1 already failed to generate the degree distribution characterised by the exponent γ = 2.5, and both the β = 1 and the β = 0.5 settings of the f PSO model yielded considerable deviations from the expected curve of γ = 6, while the degree distribution of the original PSO model (corresponding to the dPSO model with d = 2 or the two-dimensional f PSO model with f = 2/ζ) obtained at β = 1/(γ − 1) behaves as expected even for these higher values of γ. The curvature of the hyperbolic space, the number of nodes, the half of the expected average degree and the temperature were the same for all networks, namely K = −ζ 2 = −1 (using ζ = 1), N = 10, 000, m = 5 and T = 0. One network was generated with each parameter setting.
case can be used. Accordingly, if the popularity fading parameter β is determined as then the radial coordinate of the node having the th ( = 1, 2, ..., N ) largest degree (with ties in the order of node degrees broken arbitrarily) can be formulated as is used, then the radial coordinate in question can be calculated as Note that angular coordinates can be assigned to the network nodes independently from the radial positions using e.g. a method proposed in Ref. [38]. Note that for the f PSO model of a given popularity fading parameter β, the increase in the expected degree decay exponent γ corresponds to a decrease in the multiplying factor f of the initial radial coordinates, which ruins the approximation in Eq. (S1.3.13) and hereby increases the deviation between the degree distribution of the f PSO networks and the dPSO networks of the same expected degree decay exponent.

S2. Cutoff distance of the connection probability in the dPSO model
The cutoff distance R j of the connection probability p(x) = 1/[1 + e ζ(x−R j )/(2T ) ] applied at time j is set to the value ensuring that the expected number of nodes connecting to node j at its arrival is equal to m. As derived in Sect. S1.1, for T = 0 the cutoff distance can be calculated as or, using that r = (2/ζ) · ln , as Note that for d = 2 this formula is the same as the result of Ref. [21], which can be written as after the substitution of r jj = (2/ζ) · ln j.
In the case of 0 < T , the cutoff distance at time j is defined by the equation where j − 1 is the number of already existing nodes at the appearance of node j and P (j) is the probability that node j connects to any of the previously appeared nodes in a given link formation attempt. According to Sect. S1.2, Knowing also that the radial coordinate of node at time j is we arrive at the equation which can be solved numerically to obtain the cutoff distance R j . However, in the case of sufficiently large networks, for most of the nodes formula S2.6 can be approximated as Substituting this in Eq. (S2.5), after some rearrangement of the terms one can write up the cutoff distance of the connection probability at the appearance of node j as dφ can be calculated numerically. Note that since lim T →0 sin((d − 1) · T · π)/(T · π) = = d − 1, for T → 0 the approximated cutoff distance formula of the 0 < T < 1/(d − 1) case becomes Eq. (S2.2) as expected. For d = 2, the approximating formula of the 0 < T < 1/(d − 1) case gives back derived in Ref. [21] for 0 < T < 1 using r jj = (2/ζ) · ln j. Furthermore, if one uses the approximation sin (φ/2) ≈ φ/2 also in the 1/(d − 1) < T case as in Ref. [21], then Eq. (S2.11) yields for d = 2 and 1 < T , (S2.13) which corresponds to 14) derived in Ref. [21] for 1 < T using r jj = (2T /ζ) · ln j.
Finally, it is important to clarify how Eqs. (S2.2) and (S2.11) behave in the β → 1/(d − 1) limit. As 15) in the β = 1/(d − 1) case the cutoff distance takes the form of (S2. 16) In our simulations, in order to reduce the computational time, we always calculated the cutoff distances at 0 < T based on the approximating formulas given by Eqs. (S2.11) and (S2.16) instead of solving numerically Eq. (S2.9). According to Fig. S2.1, the difference between the results of the approximating formulas and the numerical equation solution measured for the 1000th node is already acceptable at most of the examined parameter settings. Therefore, in the case of the studied networks of size N = 10, 000 we can assume for most of the network nodes that the approximated cutoff distance was close enough to the value that could have been obtained by solving numerically Eq. (S2.9).  Figure S2.1. Absolute and relative difference between the results of the two methods proposed for calculating the cutoff distance of the connection probability at 0 < T . We calculated the cutoff distance for the node that arrives at the 1000th time step using Eq. (S2.9), or Eqs. (S2.11) and (S2. 16). The result of the first approach is denoted by R num 1000 , since it is based on the numerical solution of Eq. (S2.9), while R an 1000 stands for the cutoff distance obtained from the latter approach that is, at least for T < 1/(d − 1), an analytical calculation. Each pair of subplots depicts the effect of changing the popularity fading parameter β and the rescaled temperature T · (d − 1), with the number of dimensions d and the half m of the expected average degreek given in the title of the subplot pair. The curvature K of the hyperbolic space was set to −1 in each case, i.e. we always used ζ = 1.

S3. Simulation results for dPSO networks
This section presents simulation results additional to the figures of the main text. First, we confirm by Fig. S3.1 that in networks generated by the d PSO model of different expected average degree 2m, the tail of the complementary cumulative distribution function (CCDF) of the node degrees follows a power-law written as P(k ≤ K) ∼ k −(γ−1) , where the degree decay exponent γ can be expressed with the dimension d of the hyperbolic space and the popularity fading parameter β as 1+ 1 (d−1)·β . As it is demonstrated by Fig. 2 of the main text, the temperature T does not have any substantial impact on the degree distribution; therefore, here we study only the T = 0 case.
. 00 (k)     We also repeated our community analysis with a slight modification, taking into account the hyperbolic distances along the links. For this, we adopted the practice suggested in Ref. [38] and assigned weights to the links calculated from the hyperbolic distances between adjacent nodes as Then, we searched for the communities of the obtained weighted graphs with the Louvain, the Infomap and the asynchronous label propagation methods (where all algorithms allow link weights to be taken into account). As before, for characterising the strength of the detected community structures we used modularity. However, instead of its original version described in Sect. 3.3 of the main text, here we used an extended form defined for weighted networks [57], where the total number of links E is replaced w ij (with w ij denoting the link weight between nodes i and j), and the node degrees k i and k j are replaced by the node strengths s i and s j , defined e.g. for node i as s i = N =1 w i , resulting in the formula Similarly to Figs. S3.3-S3.8 dealing with the results of the community detection on unweighted networks, we present the averages and the standard deviations of the obtained modularities and community sizes in Figs. S3.9-S3.14 for the weighted case.  T · (d − 1)  T · (d − 1) Finally, we show some examples of the community size distributions obtained with the applied community finding methods for d PSO networks of different number of dimensions in Fig. S3. 15. With regard to the shape of the curves, the same conclusion can be drawn for each dimension d as in Ref. [25] for d = 2, namely that Louvain yields relatively narrow, bell-shaped community size distributions concentrated at higher community sizes, while the community size distributions provided by asynchronous label propagation and Infomap are rather skewed, following more or less a power law in the former case and decaying somewhat faster towards the larger sizes in the latter case. As indicated by the slight right shift of the community size distributions in Fig. S3.15, when the dimension d of the hyperbolic space is increased while keeping the curvature K, the number of nodes N , the expected average degree 2m, the degree decay exponent γ and the temperature T unaltered, all the examined community detection methods tend to find larger modules in the networks.  The panels in the left column were obtained setting all the link weights in the networks to 1, whereas the panels in the right column were created using link weights calculated from the hyperbolic distances between the connected nodes according to Eq. (S3.1). The different colours of the curves correspond to the different values of the dimension d, listed in the legend. The curvature of the hyperbolic plane K was always set to −1, i.e. we used ζ = 1. The number of nodes N was 10,000, the expected average degreē k = 2m was 10, the temperature T was 0 and the degree decay exponent γ was set to 2.5 by using the popularity fading parameter β = 1 (d−1)·(2.5−1) in each case.

S4. Nonuniform angular distribution
We have shown for the two-dimensional popularity-similarity optimisation model [21] in Ref. [25] and then for higher-dimensional cases in the present article that despite the absence of any intentional community formation mechanisms built into the model construction, dPSO networks possess an inherent, relevant community structure for a wide range of parameter settings. However, the dPSO model does not allow control over the number and the size of the communities. To deal with this problem, one can introduce heterogeneity in the angular node arrangement and generate networks using a nonuniform angular distribution of the network nodes, where denser angular regions can serve as built-in communities. This nonuniform popularity-similarity optimisation (nPSO) model has been studied in details in the two-dimensional case in Refs. [24,29]. Here, as an example, we examine a simple three-dimensional case, where the N number of network nodes are distributed in equal proportions among C number of planted (angular) groups that are created by setting the angular node distribution to a mixture of C number of von Mises-Fisher distributions [58] of the same concentration parameter κ, with their mean directions µ i (i = 1, 2, ..., C) distributed equidistantly over the unit 2-sphere. That is, we determined the angular coordinates of each network node by first choosing randomly one of the C number of mean directions µ i (representing the central locations of the angular sectors) and then sampling [59] a 3-dimensional unit vector y from the corresponding von Mises-Fisher distribution for which the probability density function is given by Eq. (S4.1). Note that a higher value of the concentration parameter κ means a higher concentration of the distributions around their mean direction, i.e. less spread and overlapping angular sectors, while κ = 0 pertains to the uniform angular distribution of the nodes. Our implementation of this three-dimensional nPSO model is available from Ref. [56].
f (y; µ i , κ) = κ 2π(e κ − e −κ ) · e κµ T i y (S4.1) The degree of separation among the planted groups is primarily determined by the concentration parameter κ and the temperature T , on the level of the angular node arrangement and the links, respectively. The effect of these two model parameters is demonstrated via some layouts in Fig. S4.1. Although e.g. κ = 20 already yields a noticeable separation among the patches of the network nodes, when T is increased to 1.5 · T c , then so many interconnections emerge between the patches that the planted modules eventually do not form actual communities. On the other hand, if the temperature is set to 0, the links become so localised that the planted groups split according to narrower angular regions. Nevertheless, for high enough values of the concentration parameter κ and moderate temperatures T , the planted groups were successfully identified by the community detection algorithm Louvain [48, 49] based on the edge list, without inputting any information about the network geometry.
In Fig. S4.2, we show how the average clustering coefficientc, the adjusted mutual information AMI [60, 61] of the planted community structure and the one detected by the Louvain algorithm (setting all the link weights to 1), and the modularity [50, 51] of the planted and the detected network partitions depend on the temperature for different concentration parameters. Similarly to what has been shown in Fig. 3 of the main text for the uniform dPSO model, the average clustering coefficientc measured in threedimensional nPSO networks gradually decreases with the increase in the temperature T before settling to a more or less constant value just above the critical temperature T c = 1/(d − 1) = 0.5. One can also observe that if κ is higher, i.e. the angular patches of the network nodes are more separated from each other, then the drop inc occurring when switching from the deterministic connection rule (T = 0) to the probabilistic one (0 < T ) is larger.
As expected, the adjusted mutual information of the planted and the detected partitions is 0 for κ = 0, when the angular position of the nodes is sampled from a mixture of uniform distributions and the nodes are assigned to the planted groups randomly, regardless of their angular coordinates. As we increase κ and create hereby larger angular gaps between the regions occupied by the network nodes, the AMI tends to become higher. At small temperatures, as exemplified by Fig. S4.1, the links are strongly localised, i.e. crowded into narrower sectors within the region of each planted group. Similarly to how the nodes of the whole hyperbolic space become grouped according to different angular regions in the case of the uniform PSO model [25], this angular confinement of the links splits the planted modules into smaller parts, yielding weaker agreement between the planted and the detected partitions. As the temperature begins to increase, at first the boundaries within the planted groups blur, and thus the AMI increases. Note that at large enough κ and moderate values of T , even AMI = 1 was achieved, meaning that the planted and the detected partitions were identical. However, even higher temperatures and farther-reaching connections already raise the interconnectedness of the adjacent planted groups too, until eventually the nodes lose their preference for primarily connecting to the members of their own planted group. Obviously, the larger the gaps between the occupied patches, the higher temperature is needed to enable the neighbouring patches to reach each other; thus, as κ increases, the AMI decreases more slowly to 0 as a function of T .
The behaviour of the modularities Q planted and Q detected as a function of the temperature T agrees with the results shown in Fig. 5 of the main text for the uniform dPSO model: according to this measure, both the planted and the detected community structure of three-dimensional nPSO networks lose from their strength as the temperature increases, and eventually both modularities settle to a constant value. It is important to note that the modularity of the planted network partitions practically does not exceed the modularity of the detected community structures, meaning that whenever we measure low modularity values, these arise due to the network structure and not because of some failure in the applied community finding algorithm. Since the presence of wider angular gaps between the patches of the nodes decreases the probability of interconnections, larger values of κ lead to higher modularities. Nevertheless, at high enough temperatures the probability for the emergence of connections can become non-negligible even for the members of angularly well-separated patches, diminishing the modularity differences among the networks characterised by different concentration parameters.
According to Fig. S4.2, while the maximum point of both the average clustering coefficientc and the modularities Q planted and Q detected is at T = 0, with respect to the adjusted mutual information of the planted and the detected partitions it is better to choose higher temperatures. Nonetheless, Fig. S4.3 demonstrates that this is not a general rule, and with proper parameter settings one can generate networks that are simultaneously highly clustered and possess a strong planted community structure that is also well detectable. Layout of three-dimensional nPSO networks in the native representation of the hyperbolic space of curvature K = −1 at different values of the concentration parameter κ and the temperature T . Each row of panels was created using a given concentration parameter κ, and each column of subplots presents the results obtained with a given value of the temperature T , as written in the panel titles. Each network was generated setting the number N of nodes to 10,000, the half m of the expected average degreek to 5, the popularity fading parameter β to 1/3 (yielding the degree decay exponent γ = 2.5) and the number C of components of the mixture distribution describing the angular arrangement of the network nodes to 8. Note that the critical temperature was T c = 1/(d − 1) = 0.5. Each network was generated in the 3-dimensional hyperbolic space of curvature K = −1, setting the number N of nodes to 10,000, the half m of the expected average degreek to 5, the popularity fading parameter β to 1/3 (i.e., the degree decay exponent γ to 2.5), and the number C of components of the mixture distribution describing the angular arrangement of the network nodes to 8. Figure S4.3. With proper settings of the parameters of the three-dimensional nPSO model, large adjusted mutual information of the planted and the detected modules can be achieved even at the temperature minimum T = 0, where the average clustering coefficient and the modularities are maximised for the given parametrisation. However, increasing the number N of nodes, decreasing the expected average degree 2 · m, increasing the popularity fading parameter β, decreasing the number C of components of the mixture distribution describing the angular arrangement of the network nodes, or decreasing the concentration parameter κ may reduce the AMI at T = 0 compared to the values obtained at slightly higher temperatures. In the right uppermost chart we show an example where -in contrast with Fig. S4.2 -the decay of the AMI towards the smallest possible temperatures does not appear. The layouts below exemplify how the change of the different model parameters compared to the settings of the uppermost figures can debase the high AMI value achieved at T = 0. The lowermost charts depict how the increase in the rescaled temperature T · (d − 1) affects the average clustering coefficientc, the adjusted mutual information AMI and the modularity Q of the planted partitions and the ones detected by Louvain in the five cases presented by the layouts above. The plotted data points correspond to the values averaged over 5 networks in the case of N = 10, 000 and 10 networks for N = 1000. The error bars indicate the standard deviations among the networks of the same parameter setting.
Lastly, we demonstrate in Fig. S4.4 that when each network node is assigned randomly to one of the 8 planted groups corresponding to von Mises-Fisher distributions of the same concentration parameter κ and mean directions pointing toward the vertices of a cube, then the resulting degree distribution reasonably preserves its form of P(K = k) ∼ k −γ with γ = 1 + 1 (d−1)·β , even if the angular distribution of the nodes becomes more and more heterogeneous due to the increasing separation of the occupied angular regions obtained at higher and higher values of κ. Similarly to what has been shown in Fig. 2 of the main text concerning the uniform dPSO model, the temperature T does not have a significant effect on the degree distribution even for the nonuniform three-dimensional PSO model.  Figure S4.4. Degree distribution of networks generated by the three-dimensional nPSO model using different values of the temperature T , the popularity fading parameter β and the concentration parameter κ. As expected, the tail of the complementary cumulative distribution function (CCDF) of the node degrees follows a power-law that can be written in the form of P(k ≤ K) ∼ k −(γ−1) with γ = 1 + 1 (d−1)·β , independently of the choice of κ and T . One network was generated with all the parameter settings. The curvature of the three-dimensional hyperbolic space, the number of nodes, the half of the expected average degree and the number of components of the mixture distribution describing the angular arrangement of the network nodes were the same for each network, namely K = −ζ 2 = −1, N = 10, 000, m = 5 and C = 8. Panel a) corresponds to the case of the deterministic connection rule (T = 0), panel b) shows the curves obtained at the half of the critical temperature T c = 1/(d − 1), while panel c) presents the results at a higher temperature of 1.5 · T c . The CCDF curves obtained with the different values of the concentration parameter κ are well grouped according to the popularity fading parameters, i.e. the expected degree decay exponents in all panels.