Generalised thresholding of hidden variable network models with scale-free property

The hidden variable formalism (based on the assumption of some intrinsic node parameters) has turned out to be a remarkably efficient and powerful approach for describing and analyzing the topology of complex networks. Owing to one of its most advantageous properties, namely its proven ability to reproduce a wide range of degree distribution forms, it has become a standard tool for generating networks with the scale-free property. One of the most intensively studied versions of this model is based on a thresholding mechanism applied to exponentially distributed hidden variables associated with the nodes (intrinsic vertex weights), which gives rise to a scale-free network whose degree distribution $p(k) \sim k^{-\gamma}$ decays with an exponent of γ = 2. Here we propose a generalization and modification of this model by extending the set of connection probabilities and hidden variable distributions that lead to the aforementioned degree distribution, and we analyze the conditions leading to this behavior analytically. In addition, we propose a relaxation of the hard threshold in the connection probabilities, which opens up the possibility of obtaining sparse scale-free networks with an arbitrary scaling exponent.

Sámuel G. Balogh, Péter Pollner & Gergely Palla
Network theory provides a ubiquitous and sophisticated approach for the characterization of complex systems composed of many interacting units 1,2 . One of the most widely studied features of complex networks is the scale-free (SF) property, manifesting in the strong inhomogeneity of the degree distribution, p(k), accompanied by a power-law-like decay of p(k) in the large degree regime 3-7 . On the modelling side, several growth mechanisms have been proposed for generating networks without a characteristic degree scale, including the fundamental concept of the Barabási-Albert model together with its modifications and generalizations 3,8,9 . Meanwhile, it also became evident that not all networks emerge from a growth mechanism 10-12 , and there are numerous examples where new connections can easily occur between already existing nodes in the system 13,14 . In some cases we can also reasonably assume that the rewiring of the network leads towards a more optimal configuration 15-17 , and that the propensity for creating links is encoded in each node as an intrinsic parameter. Assumptions of this type naturally gave rise to the development of the hidden variable formalism.
Inspired by the nature of protein interaction networks, the hidden variable model was originally introduced by Caldarelli et al. for explaining the emergence of non-growing scale-free networks, where link creation might be related to some intrinsic features of the nodes 10 . In a subsequent work, Boguñá et al. proposed a systematic analytic framework for characterizing general classes of random graphs generated with hidden variables 18 . Based on this framework, a general method was later implemented that is capable of producing SF networks with a tunable scaling exponent γ 6 . Since then, the applicability of the hidden variable formalism has been confirmed on a large scale 19-26 , and due to its very general nature, a large variety of further network models can alternatively be interpreted as special forms of this approach, ranging from stochastic block models 27-31 through multifractal graph generators 32 and evolving, fitness-based network models combined with preferential attachment 33,34 to networks defined over hidden metric spaces 35-39 .
Although many different variations of this model have been proposed, the basic idea is to first associate a parameter (hidden variable) with each node, usually drawn from a prescribed distribution, and then connect pairs of nodes with a probability given by a fixed linking function that takes the node variables as arguments. Some of these models can be referred to as geographical: besides the hidden variables, nodes also have coordinates in a d-dimensional Euclidean space, and the connection function depends on both the distance and the hidden variables 40 . A very closely related class of models is where the coordinates are distributed in a hyperbolic space instead of a Euclidean one 36,38 . These models provide a very interesting research direction in their own right, because they generate SF networks with a high clustering coefficient in a natural way and because of their relevance in routing problems 38 .
A widely studied version of the hidden variable approach is where the linking function acts as a threshold, giving a connection probability of 1 when the values of the two variables fulfill some criterion, and 0 otherwise. Non-geographical threshold models of this kind have gained considerable attention 10,21,22,38,41-43 , and were later extended even to the geographical space 40 . An interesting phenomenon observed in the non-geographical thresholded model is that in the infinitely large system size limit the degree distribution seems to be universally characterized by a decay proportional to $k^{-2}$ for various fitness distributions 21,22 . In the present paper we extend the previously studied families of hidden variable distributions and linking functions that fall into this class, and study the mathematical conditions leading to this specific degree decay exponent analytically. In addition, we introduce a simple and intuitive relaxation of the former 'hard' threshold functions which, according to numerical simulations, allows the exponent to be modified.

The Hidden Variable Model
When generating a network with N nodes in this approach, we first assign a hidden variable $x_i \geq 0$ to each node $i \in \{1, \dots, N\}$, where $x_i$ is drawn from an arbitrary (but normalized) probability distribution ρ(x). For simplicity, $x_i$ is often referred to as the fitness of node i. After distributing these intrinsic parameters we also have to define a linking function 0 ≤ f(x, y) ≤ 1, based on which the connection probability between nodes i and j is simply p(link between i and j) = $f(x_i, x_j)$. Thus, in this model all information about the properties of the emerging network is encoded in the pre-defined forms of ρ(x) and f(x, y). Following the continuous approximation introduced in 6,21 , the expected degree of nodes with fitness x can be expressed as

$$k(x) = N \int_0^{\infty} f(x, y)\, \rho(y)\, \mathrm{d}y. \tag{1}$$

Assuming that k(x) is a monotonically increasing and invertible function of x, according to the rule of transformation of random variables 6 the degree distribution can be written as

$$p(k) = \rho\big(x(k)\big)\, \frac{\mathrm{d}x(k)}{\mathrm{d}k}, \tag{2}$$

where x(k) denotes the inverse function of k(x). The simplest choice for the connection function is f(x, y) = const., where the connection probability is uniform and independent of the hidden variables, leading to the emergence of an Erdős-Rényi random graph. A more interesting example was shown in 10 , where the linking function is proportional to the product of the fitnesses, $f(x, y) \propto xy$; if the fitness distribution follows a power law, $\rho(x) \propto x^{-\gamma}$, the obtained network displays the SF property with the same γ exponent. Later on it turned out that for a more general class of linking functions, where f(x, y) can be decomposed into a product such as f(x, y) = g(x)g(y), the hidden variable formalism is analytically well treatable, and it is also able to reproduce fat-tailed degree distributions with arbitrary γ exponents 6 . The product form also implies that for randomly chosen links the degrees of the endpoints are uncorrelated; thus, from the point of view of degree assortativity the obtained networks are neutral.

Another surprising result in ref. 10 is connected to the case where an exponential fitness distribution $\rho(x) = e^{-x}$ is chosen together with an f(x, y) corresponding to a threshold function,

$$f(x, y) = \Theta(x + y - \Delta), \tag{3}$$

where Θ refers to the Heaviside step function, and Δ is a constant with a logarithmic dependency on the system size N. Under these settings a power-law decay of the degree distribution was detected with a γ = 2 scaling exponent, providing the first evidence that SF networks can be generated in this approach even with non-power-law fitness distributions. As already mentioned in the Introduction, later studies observed that the inverse square decay of the degree distribution is actually a quite general feature that holds for various other fitness distributions as well 21,22 . Further interesting occurrences of the inverse square decay are briefly discussed in 44 .
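The generation procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the system size N = 1000 and the constant c (chosen so that N e^{−Δ} = c, which gives the logarithmic size dependence of Δ) are assumptions made for the example.

```python
import numpy as np

def hidden_variable_graph(x, f, rng):
    """Sample an undirected simple graph with link probability f(x_i, x_j)."""
    n = len(x)
    i, j = np.triu_indices(n, k=1)           # every unordered pair i < j once
    prob = f(x[i], x[j])                      # connection probability per pair
    keep = rng.random(prob.shape) < prob      # independent Bernoulli trials
    return i[keep], j[keep]

def degree_sequence(n, src, dst):
    """Degree of each node from the edge list (src, dst)."""
    return np.bincount(np.concatenate([src, dst]), minlength=n)

def hard_threshold(delta):
    """Linking function Theta(x + y - delta) of the thresholded model."""
    return lambda a, b: (a + b > delta).astype(float)

rng = np.random.default_rng(0)
N = 1000                                      # illustrative system size
c = 2.0                                       # hypothetical constant: N * exp(-delta) = c
delta = np.log(N / c)                         # logarithmic dependence on N
x = rng.exponential(1.0, size=N)              # fitness values with rho(x) = exp(-x)
src, dst = hidden_variable_graph(x, hard_threshold(delta), rng)
k = degree_sequence(N, src, dst)
print("mean degree:", k.mean(), "max degree:", k.max())
```

Any other linking function with values in [0, 1] can be passed in place of `hard_threshold(delta)`; all pairs are enumerated explicitly, so the sketch is only practical for moderate N.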

Generalized Classes of Non-Geographical Thresholded SF Hidden Variable Models with γ = 2
Model class definitions. In this section we introduce a broad set of hidden variable models where the degree decay exponent is equal to γ = 2. A common feature of these models is that they are thresholded in the sense that the linking function f(x, y) has a lower cut-off, controlled by a parameter Δ. The example in (3) is a special case of this, where f(x, y) immediately becomes 1 above the threshold. Here we use a much weaker assumption, namely that f(x, y) is 0 for a certain range of x and y values, and is non-zero (but not necessarily 1) outside this range. This is a far more general way of thresholding the connection probabilities, which allows a very broad range of connection functions to be used in the model, as shall be shown later. The linking functions having this property are denoted as $f_\Delta(x, y)$ throughout the paper. Here we introduce three sub-classes of hidden variable models. We refer to the first one as exponential-like, where
• ρ(x) can be written as
$$\rho(x) = H'(x)\, e^{-H(x)}, \tag{4}$$
where H(x) is a differentiable, monotonically increasing function and H′(x) denotes its derivative,
• while the thresholded $f_\Delta(x, y)$ shows an additive dependency on H(x) and H(y),
$$f_\Delta(x, y) = \tilde{f}\big(H(x) + H(y)\big)\, \Theta\big(H(x) + H(y) - \Delta\big), \tag{5}$$
where $\tilde{f}$ is assumed to be a general function taking values in the [0, 1] interval.
We refer to the second sub-class as power-like, where
• ρ(x) can be written as
$$\rho(x) = \frac{\alpha\, G'(x)}{G(x)^{\alpha + 1}}, \tag{6}$$
where G(x) has the same properties as H(x) in (4),
• while the thresholded $f_\Delta(x, y)$ shows a multiplicative dependency on G(x) and G(y),
$$f_\Delta(x, y) = \tilde{f}\big(G(x)\, G(y)\big)\, \Theta\big(G(x)\, G(y) - \Delta\big). \tag{7}$$
And last, if both additive and multiplicative dependencies are present (mixed class):
• ρ(x) can be written in the form (8), where M(x) has the same properties as H(x) in (4),
• while $f_\Delta(x, y)$ can be expressed in the form (9).
These generalized sub-classes can be derived from two simple observations. First, it can be shown in general that when replacing the step-like function in (3) by an arbitrary $f_\Delta(x + y)$ having a lower cut-off at Δ, the degree distribution of the emerging networks is not affected.
Second, the forms of the hidden variable distributions and accompanying connection functions given in (4-9) are also closely related to the transformations of the (ρ, f) pair that leave the degree distribution of the generated network invariant. To see that, let us assume an arbitrary ρ(x) and f(x, y) yielding a network with a degree distribution p(k). By transforming the hidden variables using a monotonic function H as $x_i = H(z_i)$ and $z_i = H^{-1}(x_i)$ for all nodes i, according to the rule of transforming random variables the density of the original variable x can also be written as $\rho_x(x) = \rho_z\big(H^{-1}(x)\big)\, \mathrm{d}H^{-1}(x)/\mathrm{d}x$. Based on that, the expected degree for nodes with variable x = H(z) given in (1) can also be expressed as

$$k(x) = N \int_0^{\infty} f(x, y)\, \rho_x(y)\, \mathrm{d}y = N \int_{H^{-1}(0)}^{H^{-1}(\infty)} f\big(H(z), H(z')\big)\, \rho_z(z')\, \mathrm{d}z', \tag{10}$$

where we have changed the integration variable y to z′ = $H^{-1}(y)$. By combining (10) with the transformation rule of degrees given in (2), we obtain that a transformed model where $\rho_z(z) = \rho_x(H(z))\, H'(z)$ and the linking function is given by f(H(x), H(y)) leads to the same degree distribution as the original model. This conservation law of the degree distribution is also closely related to a general transformation rule of the fitness values written as

$$\rho_z(z) = \rho_x\big(x(z)\big)\, \frac{\mathrm{d}x(z)}{\mathrm{d}z}, \tag{11}$$

which is analogous to (2). The sub-classes of models we defined in this paper exploit this property, where the 'original' model corresponds to the simple model introduced in 10 , following the inverse square decay law. Nevertheless, this invariance of the degree distribution under an appropriate simultaneous transformation of ρ(x) and f(x, y) is valid for the hidden variable approach in general, also in the case of geographical models.
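The invariance argument above can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the exponential model with the hard threshold Θ(x + y − Δ) as the 'original' model and uses the example transformation H(z) = z² (so that z follows a Weibull density 2z exp(−z²)); the helper `threshold_degrees`, the size N and the value of Δ are hypothetical choices.

```python
import numpy as np

def threshold_degrees(h, delta):
    """Degrees when i and j are linked iff h_i + h_j > delta (i != j)."""
    hs = np.sort(h)
    # for each node i: number of nodes j with h_j > delta - h_i
    count = len(h) - np.searchsorted(hs, delta - h, side="right")
    return count - (2 * h > delta)            # drop the self-pair j = i

rng = np.random.default_rng(1)
N = 5000                                      # illustrative system size
delta = np.log(N / 2.0)                       # illustrative threshold

x = rng.exponential(1.0, size=N)              # original model: rho_x(x) = exp(-x)
k_original = threshold_degrees(x, delta)      # linking function Theta(x + y - delta)

H = lambda z: z ** 2                          # example transformation x = H(z)
z = np.sqrt(x)                                # z = H^{-1}(x), density 2 z exp(-z^2)
k_transformed = threshold_degrees(H(z), delta)  # Theta(H(z) + H(z') - delta)

print("fraction of identical degrees:", np.mean(k_original == k_transformed))
```

Because the transformed linking function is evaluated on H(z), every pair of nodes satisfies the same threshold condition in both models, so the two degree sequences coincide up to floating-point rounding.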
We also note that any model in one of the three sub-classes defined above can generally be mapped into a model in another sub-class via simple transformations between H(x), G(x) and M(x) given by (12). However, when the goal is to obtain a size-independent SF degree distribution, the dependency of Δ = Δ(N) on the number of nodes N can be different in each class. In summary, the previous observations clarify in a simple manner why seemingly different realizations of (ρ, f) can give rise to networks with the same degree distribution. In addition, we also gained simple rules for mapping the different realizations of (ρ, f) into each other. A remarkable consequence of the above is that for any thresholded linking function $f_\Delta$ showing either additive, multiplicative or mixed dependency on its arguments, we can now construct a fitness distribution for which it is guaranteed that the degree distribution of the emerging network displays the inverse square decay. Examples are listed in Table 1, all generating SF networks with a degree decay exponent γ = 2. In addition, in Fig. 1 we show the simulation results for the Weibull fitness distribution from the class of exponential-like distributions, where the corresponding linking function was chosen as in (13), where a is a constant and Δ is defined via the transcendental equation Δ = exp(a − Δ). The fact that the complementary cumulative distribution of the degrees behaves as $k^{-1}$ in the large degree regime in Fig. 1 is fully consistent with the inverse square decay law.

Figure 1. Complementary cumulative degree distribution F(k) of networks generated with the Weibull fitness distribution of Table 1, at a scale parameter c = 2, and a linking function defined in (13), shown on logarithmic scale. The solid line decreases as $k^{-1}$, which corresponds to the decay characteristics of F(k) in SF networks with γ = 2.

The inverse square decay law. Here we show in detail that for thresholded hidden variable models falling into the classes described in the previous subsection, the degree distribution of the emerging SF network always has a degree decay exponent of γ = 2. Let us assume that we are dealing with an exponential-like model, where the fitness distribution is given in (4), and the linking function follows (5). Starting from the expression for the average degree given in (1), and multiplying both sides by exp[−H(x)], we can write

$$k(x)\, e^{-H(x)} = N \int_{H^{-1}(\Delta - H(x))}^{\infty} \tilde{f}\big(H(x) + H(y)\big)\, H'(y)\, e^{-[H(x) + H(y)]}\, \mathrm{d}y, \tag{14}$$

where we used that $f_\Delta(x, y) = 0$ if H(x) + H(y) ≤ Δ. By a change of variable z = H(x) + H(y) we arrive at an equation where the right-hand side is independent of x,

$$k(x)\, e^{-H(x)} = N \int_{\Delta}^{\infty} \tilde{f}(z)\, e^{-z}\, \mathrm{d}z. \tag{15}$$

Based on (15) we define the integral

$$I_\Delta(\tilde{f}) = \int_{\Delta}^{\infty} \tilde{f}(z)\, e^{-z}\, \mathrm{d}z, \tag{16}$$

which depends on the form of the actually chosen $\tilde{f}(z)$ appearing in (5). Assuming that this integral exists, using (15-16) we can express the average degree of nodes with fitness x and the derivative of k(x) as

$$k(x) = N\, I_\Delta(\tilde{f})\, e^{H(x)}, \tag{17}$$

$$k'(x) = N\, I_\Delta(\tilde{f})\, H'(x)\, e^{H(x)}. \tag{18}$$

In the thermodynamic limit of N → ∞, by substituting (17-18) into (2) we obtain

$$p(k) = \frac{\rho\big(x(k)\big)}{k'\big(x(k)\big)} = \frac{e^{-2H(x(k))}}{N\, I_\Delta(\tilde{f})} = \frac{N\, I_\Delta(\tilde{f})}{k^{2}}. \tag{19}$$

Moreover, according to (19) we can formulate a very simple condition under which p(k) becomes independent of the system size in the form of

$$N\, I_{\Delta(N)}(\tilde{f}) = \text{const.} \tag{20}$$

Similarly to the case of exponential-like distributions, if we choose the fitness distribution to be power-like as in (6), together with a thresholded connection function given in (7), the expected degree for a node having a hidden variable x can be written as

$$k(x) = N \int_{G^{-1}(\Delta / G(x))}^{\infty} \tilde{f}\big(G(x)\, G(y)\big)\, \frac{\alpha\, G'(y)}{G(y)^{\alpha + 1}}\, \mathrm{d}y.$$

By changing to the integration variable z = G(x)G(y) we obtain

$$k(x) = N\, G(x)^{\alpha}\, K_\alpha(\tilde{f}), \qquad K_\alpha(\tilde{f}) = \alpha \int_{\Delta}^{\infty} \tilde{f}(z)\, z^{-(\alpha + 1)}\, \mathrm{d}z. \tag{23}$$

The integral $K_\alpha(\tilde{f})$ appearing in (23) depends only on the chosen form of $\tilde{f}(z)$; thus, by assuming that $K_\alpha(\tilde{f})$ exists, k(x) is simply proportional to $G(x)^{\alpha}$. By substituting this into the general formula for the degree distribution given in (2) we gain

$$p(k) = \frac{\rho\big(x(k)\big)}{k'\big(x(k)\big)} = \frac{G\big(x(k)\big)^{-2\alpha}}{N\, K_\alpha(\tilde{f})} = \frac{N\, K_\alpha(\tilde{f})}{k^{2}},$$

proving that the power-like sub-class also leads to the emergence of a SF network with γ = 2. However, in this case the condition for a size-independent degree distribution with the $p(k) \sim k^{-2}$ property is given by a different formula compared to (20), written as

$$N\, K_\alpha(\tilde{f}) = \text{const.} \tag{27}$$

By following similar mathematical arguments as in the exponential- and power-like cases, the same behaviour can be obtained for the third sub-class defined through (8-9).
Nevertheless, the condition for a size-independent SF degree distribution with γ = 2 has, yet again, a different form from (20) or (27), given by (28). An important related remark is that for all fitness distributions and linking functions given in the forms of (4), (5) or (6), (7) or (8), (9), the emerging networks always display the inverse square decay law independently of the specific form of $\tilde{f}$; thus, the form of $\tilde{f}$ only determines the condition for obtaining a size-independent degree distribution.
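For the exponential-like class, the size-independence condition can be solved numerically for Δ(N). The following sketch assumes that the condition takes the form N ∫_Δ^∞ f̃(z) e^{−z} dz = c with a hypothetical target constant c; the function names, the trapezoidal quadrature and the bisection bounds are illustrative choices. For the Heaviside case f̃ = 1 the result should reproduce the logarithmic dependence Δ = ln(N/c).

```python
import numpy as np

def tail_integral(f_tilde, delta, width=50.0, n=20001):
    """Trapezoidal estimate of int_delta^inf f_tilde(z) exp(-z) dz."""
    z = np.linspace(delta, delta + width, n)  # exp(-z) is negligible beyond the grid
    y = f_tilde(z) * np.exp(-z)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(z)) / 2.0)

def solve_delta(f_tilde, N, c, lo=0.0, hi=60.0, iters=60):
    """Bisection for the delta satisfying N * tail_integral(delta) = c."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if N * tail_integral(f_tilde, mid) > c:
            lo = mid                           # integral too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

N, c = 10_000, 2.0                             # illustrative values
delta_step = solve_delta(lambda z: np.ones_like(z), N, c)        # f_tilde = 1
delta_half = solve_delta(lambda z: 0.5 * np.ones_like(z), N, c)  # f_tilde = 1/2
print(delta_step, np.log(N / c))
print(delta_half, np.log(N / (2 * c)))
```

The bisection relies on the tail integral being non-increasing in Δ, which holds for any non-negative f̃; a lower plateau of f̃ shifts the threshold downwards without changing the inverse square form of the degree distribution.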

Soft Thresholding with Tunable Degree Decay Exponent
The degree decay exponent of SF complex networks characterizing real systems is usually between γ = 2 and γ = 3 1,2 . Motivated by that, we extend the models defined in the previous section to allow the emergence of hidden variable networks with a γ > 2 exponent as well in the same framework.
The basic idea is to relax the 'hard' threshold in the connection function, controlled by the variable Δ in (3) and (5). Let us start with the Heaviside step function given in (3), which we can intuitively replace by a 'reversed' Fermi-Dirac function

$$f(x, y) = \frac{1}{1 + e^{\beta(\Delta - x - y)}}, \tag{29}$$

where β is a parameter taking positive values. Naturally, at β → ∞ we recover the original step-like connection function (3), whereas for finite β values we obtain a 'soft' threshold function. The form of f(x, y) in (29) is similar to that of the linking probability in temperature dependent graph ensembles 45,46 . The interpretation of β in this respect is analogous to the inverse temperature, with β → ∞ corresponding to the zero temperature 'ground' state, while networks generated with finite β values can be interpreted as states at higher temperatures 36,45 . For the case where we cannot assume an additive but rather a multiplicative dependency on x and y in (29), a simple approach to relax the step function is to define f(x, y) as

$$f(x, y) = \frac{1}{1 + \left(\dfrac{\Delta}{xy}\right)^{\beta}}, \tag{30}$$

converging to f(x, y) = Θ(xy − Δ) in the β → ∞ limit. In a similar fashion to (29), for the general exponential-like models defined in (4-5) we can replace (5) by

$$f(x, y) = \frac{1}{1 + e^{\beta[\Delta - H(x) - H(y)]}}, \tag{31}$$

and for the power-like models given in (6-7) we can change (7) to

$$f(x, y) = \frac{1}{1 + \left(\dfrac{\Delta}{G(x)\, G(y)}\right)^{\alpha\beta}}. \tag{32}$$

A similar form of the connection function can be established for the third sub-class given by (8-9) based on (12). For all β dependent connection functions defined above, at β → 0 we obtain a linking kernel that becomes independent of the hidden variables, and thus the generated network is an Erdős-Rényi random graph. In the opposite case, when β → ∞, the connection functions converge to the original 'hard' thresholded forms, and the generated networks are scale-free with a decay exponent of γ = 2. Therefore, by tuning the parameter β from 0 to ∞ we can scan through a series of networks starting from the classical random graph and arriving at a SF network obeying the inverse square decay law at the other end of the spectrum.
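The two limits of the soft threshold can be illustrated with a short sketch. It assumes the 'reversed' Fermi-Dirac form 1/(1 + exp[β(Δ − x − y)]) for the additive case and 1/(1 + (Δ/(xy))^β) for the multiplicative case; the function names and the tanh-based evaluation (used only to avoid overflow at large β) are choices made for this example.

```python
import numpy as np

def soft_additive(x, y, delta, beta):
    """Reversed Fermi-Dirac form 1 / (1 + exp(beta * (delta - x - y)))."""
    u = beta * (x + y - delta)
    return 0.5 * (1.0 + np.tanh(0.5 * u))     # equals 1 / (1 + exp(-u)), overflow-safe

def soft_multiplicative(x, y, delta, beta):
    """Form 1 / (1 + (delta / (x * y)) ** beta) for positive x, y and delta."""
    u = beta * (np.log(x) + np.log(y) - np.log(delta))
    return 0.5 * (1.0 + np.tanh(0.5 * u))

delta = 3.0                                   # illustrative threshold
for beta in (0.0, 1.0, 10.0, 1e3):
    above = soft_additive(2.0, 2.0, delta, beta)   # x + y > delta
    below = soft_additive(1.0, 1.0, delta, beta)   # x + y < delta
    print(beta, above, below)
```

At β = 0 both functions return 1/2 regardless of the hidden variables (a classical random graph), while for large β they approach the step functions Θ(x + y − Δ) and Θ(xy − Δ).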
Presumably, during this transition we may find a finite β range in which the degree distribution is already power-law-like instead of following the Poisson distribution, but the decay exponent γ has not yet reached the γ = 2 limit value. Our simulation results shown in Fig. 2 provide strong support for this assumption. In the four panels we display the complementary cumulative degree distribution for an exponential-like model.

Characterizing the γ(β) transition. In this section we aim to characterize the main features of the above mentioned transition of γ as a function of the effective temperature for the exponential case. In order to do so, we first write the general β dependent integral for k(x) and perform the transformation of variables z = H(x) + H(y) − Δ,

$$k(x) = N \int_0^{\infty} \frac{H'(y)\, e^{-H(y)}}{1 + e^{\beta[\Delta - H(x) - H(y)]}}\, \mathrm{d}y = N\, e^{H(x) - \Delta} \int_{H(x) - \Delta}^{\infty} \frac{e^{-z}}{1 + e^{-\beta z}}\, \mathrm{d}z, \tag{33}$$

which in general offers a non-invertible form for the degree variable. However, approximate results can still be obtained, possibly with logarithmic or sub-power corrections. If Δ ≫ H(x) and β ∈ (0, 1), the main contribution to the integral comes from the z ≪ 0 range, where the value of the denominator is large. Based on that, the above expression can be approximated by

$$k(x) \approx N\, e^{H(x) - \Delta} \int_{H(x) - \Delta}^{\infty} e^{-(1 - \beta) z}\, \mathrm{d}z = \frac{N\, e^{\beta[H(x) - \Delta]}}{1 - \beta}. \tag{34}$$

Even though this approximation tends to be less and less accurate as the value of β converges to zero, its advantage is that it provides an analytic and invertible expression, from which the degree distribution can be written as

$$p(k) \approx \frac{1}{\beta} \left[\frac{N\, e^{-\beta\Delta}}{1 - \beta}\right]^{1/\beta} k^{-\left(1 + \frac{1}{\beta}\right)}. \tag{35}$$

We note that similar forms have already been established in refs 36,45 , however, not for the reversed Fermi-Dirac function. In addition, based on (34) we can also formulate an approximate condition for having a size-independent degree distribution, given by

$$\frac{N\, e^{-\beta\Delta}}{1 - \beta} = \text{const.} \tag{36}$$

For β > 1 the integrand appearing in (33) becomes negligible for z < 0, hence the lower bound of the integral, H(x) − Δ < 0, can approximately be replaced by zero as

$$k(x) \approx N\, e^{H(x) - \Delta} \int_0^{\infty} \frac{e^{-z}}{1 + e^{-\beta z}}\, \mathrm{d}z, \tag{37}$$

which along similar arguments yields

$$p(k) \approx \left[N\, e^{-\Delta} \int_0^{\infty} \frac{e^{-z}}{1 + e^{-\beta z}}\, \mathrm{d}z\right] k^{-2}. \tag{38}$$

The results in (38) become exact in the limit of β → ∞.
Analogously, the above analysis applies to the power-like sub-class as well, where the average degree as a function of fitness is written as

$$k(x) = N \int_0^{\infty} \frac{\alpha\, G'(y)\, G(y)^{-(\alpha + 1)}}{1 + \left(\dfrac{\Delta}{G(x)\, G(y)}\right)^{\alpha\beta}}\, \mathrm{d}y. \tag{39}$$

Despite the same behaviour of the degree distribution, the formula above suggests that the condition for obtaining a size-independent degree distribution requires $\Delta^{\alpha\beta}$ to be proportional to N (for β ∈ (0, 1)). Furthermore, it also implies that the accurate characterization of the γ(β) transition for the three sub-classes requires similar considerations.
Our simulation results together with the approximation discussed above are shown in Fig. 3, depicting the transition of the scaling exponent as a function of the effective temperature 1/β. We kept control over the average degree of the generated networks by relying on (36) and (38), ensuring the size independence of the degree distribution.
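The approximate γ(β) relation can be turned into a small parameter-selection helper. This is a sketch under explicit assumptions: it takes the small-β approximation k(x) ≈ N exp{β[H(x) − Δ]}/(1 − β), which gives γ ≈ 1 + 1/β for β < 1, and γ ≈ 2 for β ≥ 1; the function names and the target constant c in the size-independence condition are hypothetical.

```python
import numpy as np

def gamma_of_beta(beta):
    """Approximate decay exponent: 1 + 1/beta for beta < 1, and 2 otherwise."""
    return 1.0 + 1.0 / beta if beta < 1.0 else 2.0

def beta_for_gamma(gamma):
    """Invert gamma = 1 + 1/beta for a target exponent gamma > 2."""
    return 1.0 / (gamma - 1.0)

def delta_soft(N, c, beta):
    """Threshold solving N * exp(-beta * delta) / (1 - beta) = c, for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("this condition is only used for 0 < beta < 1")
    return np.log(N / (c * (1.0 - beta))) / beta

target_gamma = 2.5                            # illustrative target exponent
beta = beta_for_gamma(target_gamma)
delta = delta_soft(N=10_000, c=2.0, beta=beta)
print(beta, gamma_of_beta(beta), delta)
```

Since the underlying expressions are approximations, exponents measured on finite networks may deviate from these values, especially for β close to 0 or close to 1.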

Discussion
We have revisited the inverse square decay law of non-geographical thresholded hidden variable models, which was already studied in refs 10,21,22 from different perspectives. We provided a far more general framework for thresholding linking mechanisms, where the form of the connection function f(x, y) is not restricted to the usual Heaviside function, but instead can correspond to any general function with values in the [0,1] interval, as long as f(x, y) = 0 for a certain range of x and y values, and is non-zero outside this range. According to our results, this considerably weaker assumption on the form of the thresholded f(x, y) allows a very broad range of connection functions, that combined with properly chosen fitness distributions ρ(x) result in the inverse square decay law, similarly to the models discussed in ref. 10 . Along this line we provided three general sub-classes of hidden variable distributions and accompanying connection functions (i.e., the exponential, the power-like and the mixed class) that all generate SF networks with a degree decay exponent of γ = 2, and we also discussed how these different sub-classes are interrelated to each other.
Despite the invariance of the degree distribution obtained by using different f(x, y), the generated networks might show different properties at the level of local network quantities such as degree correlations or the clustering coefficient. For illustration, in Fig. 4 we provide simulation results for the clustering coefficient as a function of k, when replacing the Heaviside step function with other possible forms of $f_\Delta(x + y)$. According to Fig. 4a, the average clustering coefficient C is more or less constant below a characteristic degree and decays as a power law above this characteristic degree for the Heaviside step function (red symbols), and also for a connection function $f_\Delta(x + y)$ converging to the Heaviside step function in an exponential manner (green symbols). In contrast, for connection functions having a peak at x + y = z = Δ(N) with a decaying tail for larger z values (green and yellow symbols), we can observe a peak in C(k), together with a faster (but still power-law-like) decay in the large degree regime.
We also proposed a relaxation of the 'hard' threshold for each sub-class imposed by the Δ controlled boundary in the connection functions. The basic idea was to use a 'reversed' Fermi-Dirac function providing a sigmoid transition between low and high linking probability values, where the sharpness of the transition (or in other words, the width of the range of intermediate linking probability values) is controlled by a parameter β. Based on the analogy with temperature dependent graph ensembles 45 , β can be interpreted as a sort of inverse temperature, where the original 'hard' thresholded models are recovered in the zero temperature limit of β → ∞. The great advantage of the higher temperature models is that, according to numerical simulations, the degree decay exponent becomes larger than γ = 2, and by changing β it can be tuned to any preferred value in the range of typical γ values measured in real systems. We also discussed, in multiple different cases, the criteria for generating networks with a size-independent degree distribution. Hence, the models with the relaxed threshold at finite β values offer a flexible fitness-based approach that can be adjusted to complicated fitness distributions for generating sparse SF networks with a realistic degree decay exponent.

Figure 3. The scaling exponent γ as a function of the effective temperature 1/β, where the fitness distribution was chosen to be ρ(x) = 3x² exp(−x³) and the linking function f(x, y) was given by (31). The dashed line shows the (approximate) analytic results given by (35) and (38).
In conclusion, our analysis showed that linking kernels with a general lower cut-off and having either additive, multiplicative or mixed dependence on their arguments can always generate SF networks together with the appropriately chosen fitness distributions. A further remarkable consequence of the above is that a general mapping can be established between different fitness distributions ρ and possible linking functions f. That is, for any fitness distribution $\rho^{*}$ there exists in general a family of thresholded linking functions $f^{*}_{\Delta}$ that together give rise to scale-free networks with γ = 2, and vice versa, for any thresholded linking function $f^{*}_{\Delta}$ we can find the corresponding fitness distribution $\rho^{*}$ together with which it displays the same property. This might provide an alternative way of understanding how fitness/activity driven systems exhibit SF behaviour when the distributions of the hidden variables follow non-trivial, complicated forms.

Figure 4. Clustering coefficient as a function of the degree, C(k), for four different networks, each of them containing N = 20000 nodes. All networks were generated by using the same exponential fitness distribution but with different connection functions $f_\Delta(x + y) = f_\Delta(z)$ corresponding to specific forms of (5), displayed by different colours and indicated in panel (b).