Variational principle for scale-free network motifs

For scale-free networks with degrees following a power law with an exponent $\tau\in(2,3)$, the structures of motifs (small subgraphs) are not yet well understood. We introduce a method designed to identify the dominant structure of any given motif as the solution of an optimization problem. The unique optimizer describes the degrees of the vertices that together span the most likely motif, resulting in explicit asymptotic formulas for the motif count and its fluctuations. We then classify all motifs into two categories: motifs with small and large fluctuations.

the density that a randomly chosen set of k hidden variables is proportional to $h_1, \dots, h_k$. The degree of a node is asymptotically Poisson distributed with its hidden variable as mean [32], so (3) can be interpreted as a sum over all possible degree sequences. Our optimization method therefore needs to settle the following trade-off, inherently present in power-law networks. On the one hand, large-degree vertices contribute substantially to the number of motifs, because they are highly connected and therefore participate in many motifs. On the other hand, large-degree vertices are by definition rare. This should be contrasted with lower-degree vertices, which occur more frequently but take part in fewer connections and hence fewer motifs. Our method thus gives rise to a certain 'variational principle', because it finds the selection of vertices with specific degrees that together 'optimize' this trade-off and hence maximize the expected number of such motifs.
We leverage the optimization method in two ways. First, we derive sharp expressions for the motif counts in the large-network limit in terms of the network size and the power-law exponent. Second, we use the method to identify the fluctuations of motif counts.
We present two versions of the method that we call free and typical variation. Free variation corresponds to computing the average number of motifs over many samples of the random network model. Typical variation corresponds to the number of motifs in one single instance of the random graph model. Remarkably, for τ ∈ (2, 3) these can be rather different. After that, we apply the method to study motif count fluctuations. Finally, we provide a case study where we investigate the presence of motifs in some real-world network data.

Results
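Free variation. We first show that only hidden-variable sequences h with hidden variables of specific orders give the largest contribution to (3). Write the hidden variables as $h_i \propto n^{\alpha_i}$ for some $\alpha_i \geq 0$ for all i. Then, using (2), the probability that motif H exists on vertices with hidden variables $h = (n^{\alpha_1}, \dots, n^{\alpha_k})$ satisfies

$$\mathbb{P}(H \text{ present on } h) \propto \prod_{(i,j) \in E_H:\ \alpha_i + \alpha_j < 1} n^{\alpha_i + \alpha_j - 1}, \qquad (4)$$

where $E_H$ denotes the edge set of H. The hidden variables are an i.i.d. sample from a power-law distribution, so that the probability that k uniformly chosen hidden variables satisfy $(h_1, \dots, h_k) \propto (n^{\alpha_1}, \dots, n^{\alpha_k})$ is of order $n^{(1-\tau)\sum_i \alpha_i}$ (see Supplementary Material 2). Taking the product of this with (4) shows that the maximum contribution to the summand in (3) is obtained for those $\alpha_i \geq 0$ that maximize the exponent

$$(1-\tau)\sum_i \alpha_i + \sum_{(i,j) \in E_H:\ \alpha_i + \alpha_j < 1} (\alpha_i + \alpha_j - 1). \qquad (5)$$

This exponent is piecewise linear in the $\alpha_i$, and its maximizer satisfies $\alpha_i \in \{0, 1/2, 1\}$ for all i. The maximum of (5) is therefore attained by partitioning the vertices of H into three sets: $S_1$ with $\alpha_i = 0$, $S_2$ with $\alpha_i = 1$ and $S_3$ with $\alpha_i = 1/2$. Then, the edges with $\alpha_i + \alpha_j < 1$ are edges inside $S_1$ and edges between $S_1$ and $S_3$. If we denote the number of edges inside $S_1$ by $E_{S_1}$ and the number of edges between $S_1$ and $S_3$ by $E_{S_1,S_3}$, then maximizing (5) is equivalent to maximizing

$$B_f(H) = \max_{\mathcal{P}} \left[ |S_1| - |S_2| - \frac{2E_{S_1} + E_{S_1,S_3}}{\tau - 1} \right] \qquad (6)$$

over all partitions $\mathcal{P} = (S_1, S_2, S_3)$ of the vertices of H.

Theorem 1. Let H be a motif on k vertices such that the solution to (6) is unique. As $n \to \infty$, the expected number of motifs H grows as

$$\mathbb{E}[N(H)] = \Theta\left(n^{\frac{3-\tau}{2}k + \frac{\tau-1}{2}B_f(H)}\right), \qquad (7)$$

and is thus fully determined by the partition $\mathcal{P}^*$ that optimizes (6).

Theorem 1 implies that the expected number of motifs is dominated by motifs on vertices with hidden variables (and thus degrees) of specific orders of magnitude: constant degrees, degrees proportional to $\sqrt{n}$ or degrees proportional to n. Figures 1 and 2 show the partitions $\mathcal{P}^*$ that dominate the expected number of motifs on three, four and five vertices.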
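The optimization in (6) is small enough to solve by brute force for motifs on a handful of vertices. The following Python sketch enumerates all partitions of the vertices into $S_1$ (constant degree), $S_2$ (degree proportional to n) and $S_3$ (degree proportional to $\sqrt{n}$). It assumes that the objective takes the form $|S_1| - |S_2| - (2E_{S_1} + E_{S_1,S_3})/(\tau-1)$ and that the exponent of n in the expected motif count is $\frac{3-\tau}{2}k + \frac{\tau-1}{2}B_f(H)$. The function name `free_variation`, the vertex labelling and the example motifs are illustrative choices, not part of the original method.

```python
from itertools import product


def free_variation(k, edges, tau):
    """Brute-force the partition problem for a motif on vertices 0..k-1.

    Labels: 1 -> S1 (constant degree), 2 -> S2 (degree ~ n),
    3 -> S3 (degree ~ sqrt(n)). Assumed objective:
    |S1| - |S2| - (2*E_S1 + E_S1S3) / (tau - 1).
    Returns (B_f, list of maximizing labelings, exponent of n).
    """
    best, argmax = float("-inf"), []
    for labels in product((1, 2, 3), repeat=k):
        s1 = labels.count(1)
        s2 = labels.count(2)
        # Edges with both endpoints in S1.
        e_s1 = sum(1 for u, v in edges if labels[u] == 1 and labels[v] == 1)
        # Edges with one endpoint in S1 and the other in S3.
        e_s1s3 = sum(1 for u, v in edges if {labels[u], labels[v]} == {1, 3})
        value = s1 - s2 - (2 * e_s1 + e_s1s3) / (tau - 1)
        if value > best + 1e-12:
            best, argmax = value, [labels]
        elif abs(value - best) <= 1e-12:
            argmax.append(labels)  # record ties: the optimizer is not unique
    exponent = (3 - tau) / 2 * k + (tau - 1) / 2 * best
    return best, argmax, exponent


triangle = [(0, 1), (1, 2), (0, 2)]
claw = [(0, 1), (0, 2), (0, 3)]  # vertex 0 is the centre
bow_tie = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]  # vertex 0 is shared

for name, k, edges in [("triangle", 3, triangle), ("claw", 4, claw), ("bow tie", 5, bow_tie)]:
    print(name, free_variation(k, edges, tau=2.5))
```

Under these assumptions and for τ = 2.5, the triangle is dominated by three vertices in $S_3$, while the claw is dominated by a centre in $S_2$ with three leaves in $S_1$. A list of more than one maximizing labeling signals that the uniqueness assumption of Theorem 1 fails for that motif and exponent.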
Typical variation. The largest degrees (hubs) in typical samples of the hidden-variable model scale as $n^{1/(\tau-1)}$ with high probability. The expected number of motifs, however, may be dominated by network samples where the largest degree is proportional to n (see Theorem 1). These samples contain many motifs because of the high degrees, and therefore contribute significantly to the expectation. Nevertheless, the probability of observing such a network tends to zero as n grows large. We therefore now adapt the variational principle with the goal to characterize the typical motif structure and hence the typical number of motifs.
www.nature.com/scientificreports

We again assume degrees to be proportional to $n^{\alpha_i}$, but now restrict to degree sequences where the maximal degree is of order $n^{1/(\tau-1)}$, the natural cutoff in view of the typical hub degrees. The dominant typical motif structure is then obtained by maximizing (5) with the additional constraint that $\alpha_i \leq 1/(\tau-1)$, which yields an optimization problem similar to (6).
This shows that the typical degree of a vertex in a motif is of constant order or proportional to $n^{1/(\tau-1)}$, $\sqrt{n}$ or $n^{(\tau-2)/(\tau-1)}$. Figure 3 and Supplementary Fig. 1 show the most likely motifs on three, four and five vertices. Observe that the dominant structures and the number of motifs of Figs 1 and 3 may differ. For example, the scaling of the expected number of claws (Fig. 1b) and the typical number of claws (Fig. 3b) is different. This is caused by the left upper vertex, which has degree proportional to n in the free dominant structure, whereas its typical degree is proportional to $n^{1/(\tau-1)}$. The two scalings coincide only when the solution to (6) does not involve hub vertices. Hub vertices in the dominant structure give a major contribution to the motif count. While typical hub degrees scale as $n^{1/(\tau-1)}$, expected hub degrees may be much larger, causing the number of such motifs with hubs to scale faster in the free variation setting than in the typical variation setting. This indicates that the average and median motif count can differ dramatically.
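The gap between the free and the typical scaling can be illustrated numerically. The Python sketch below maximizes the exponent $k + (1-\tau)\sum_i \alpha_i + \sum_{(i,j)} \min(\alpha_i + \alpha_j - 1, 0)$ over a finite set of candidate values for each $\alpha_i$. Two assumptions are made here: that the exponent has this form (the term k accounts for the number of vertex tuples), and that it suffices to search over $\{0, 1/2, 1\}$ in the free setting and over $\{0, (\tau-2)/(\tau-1), 1/2, 1/(\tau-1)\}$ in the typical setting, as suggested by the degree scales listed above. Logarithmic corrections are ignored, and all function names are hypothetical.

```python
from itertools import product


def best_exponent(k, edges, tau, candidates):
    """Maximize k + (1 - tau) * sum(alpha) + sum_edges min(a_i + a_j - 1, 0)
    over alpha in candidates^k (assumed form of the exponent of n)."""
    best, argmax = float("-inf"), None
    for alpha in product(candidates, repeat=k):
        value = k + (1 - tau) * sum(alpha)
        value += sum(min(alpha[u] + alpha[v] - 1, 0.0) for u, v in edges)
        if value > best + 1e-12:
            best, argmax = value, alpha
    return best, argmax


def free_and_typical(k, edges, tau):
    """Return ((free exponent, alphas), (typical exponent, alphas))."""
    cutoff = 1 / (tau - 1)  # typical hub degrees scale as n^(1/(tau-1))
    free = best_exponent(k, edges, tau, (0.0, 0.5, 1.0))
    typical = best_exponent(
        k, edges, tau, (0.0, (tau - 2) / (tau - 1), 0.5, cutoff)
    )
    return free, typical


claw = [(0, 1), (0, 2), (0, 3)]  # vertex 0 is the centre
triangle = [(0, 1), (1, 2), (0, 2)]

print("claw    ", free_and_typical(4, claw, tau=2.5))
print("triangle", free_and_typical(3, triangle, tau=2.5))
```

In this sketch the claw exponents differ between the two settings because the centre vertex is capped at the cutoff, whereas the triangle, whose dominant structure contains no hub, receives the same exponent in both.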
Graphlets. It is also possible to only count the number of times H appears as an induced subgraph, also called graphlet counting. This means that an edge that is not present in graphlet H should also be absent in the network motif. In Supplementary Material 4 we classify the expected and typical number of graphlets with a similar variational principle as for motifs. Supplementary Fig. 2 shows the typical behavior of graphlets on 4 vertices. This figure also shows that graphlet counting is more detailed than motif counting. For example, counting all square motifs is equivalent to counting all graphlets that contain the square as a subgraph: the square, the diamond and $K_4$. Indeed, we obtain that the number of square motifs scales as $n^{6-2\tau}\log(n)$ by adding the number of square, diamond and $K_4$ graphlets from Supplementary Fig. 2. This shows that most square motifs are actually the diamond graphlets of Supplementary Fig. 2b.
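The relation between square motifs and graphlets can be checked on a small graph. The following Python sketch counts 4-cycles as (not necessarily induced) subgraphs and classifies every set of four vertices as a square, diamond or $K_4$ graphlet. Since a diamond contains exactly one 4-cycle and $K_4$ contains three, the square motif count equals the number of square graphlets plus the number of diamond graphlets plus three times the number of $K_4$ graphlets. The function name and the example graph are illustrative assumptions.

```python
from itertools import combinations


def four_vertex_counts(n, edges):
    """Return (number of square motifs, graphlet counts) for a graph on 0..n-1."""
    adj = {frozenset(e) for e in edges}
    graphlets = {"square": 0, "diamond": 0, "K4": 0}
    square_motifs = 0
    for quad in combinations(range(n), 4):
        a, b, c, d = quad
        # The three distinct 4-cycles on {a, b, c, d}.
        cycles = [(a, b, c, d), (a, b, d, c), (a, c, b, d)]
        square_motifs += sum(
            all(frozenset((cyc[i], cyc[(i + 1) % 4])) in adj for i in range(4))
            for cyc in cycles
        )
        # Classify the induced subgraph on the four vertices.
        present = [p for p in combinations(quad, 2) if frozenset(p) in adj]
        degrees = sorted(sum(v in p for p in present) for v in quad)
        if len(present) == 4 and degrees == [2, 2, 2, 2]:
            graphlets["square"] += 1
        elif len(present) == 5:
            graphlets["diamond"] += 1
        elif len(present) == 6:
            graphlets["K4"] += 1
    return square_motifs, graphlets


# K4 on {0, 1, 2, 3} plus a vertex 4 attached to vertices 0 and 2.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (2, 4)]
motifs, graphlets = four_vertex_counts(5, edges)
print(motifs, graphlets)
```

In this example graph none of the square motifs is an induced square; all of them sit inside diamond or $K_4$ graphlets, which mirrors the observation that diamonds dominate the square motif count.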
Fluctuations. Self-averaging network properties have relative fluctuations that tend to zero as the network size n tends to infinity. Several physical quantities, for example in Ising models, fluid models and properties of the galaxy, display non-self-averaging behavior [33-39]. We consider motif counts N(H) and call N(H) self-averaging when $\operatorname{Var}(N(H))/\mathbb{E}[N(H)]^2 \to 0$ as $n \to \infty$. Essential understanding of N(H) can then be obtained by taking a large network sample, since the sample-to-sample fluctuations vanish in the large-network limit. In contrast, if $\operatorname{Var}(N(H))/\mathbb{E}[N(H)]^2$ approaches a constant or tends to infinity as $n \to \infty$, the motif count is called non-self-averaging, in which case N(H) shows (too) strong sample-to-sample fluctuations that cannot be mitigated by taking more network samples.
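The ratio $\operatorname{Var}(N(H))/\mathbb{E}[N(H)]^2$ can be estimated empirically from repeated samples. The Python sketch below does this for triangle counts in a small simulated hidden-variable graph. Several modelling choices are assumptions made for illustration only: hidden variables are drawn from a Pareto law with $\mathbb{P}(h > x) = x^{1-\tau}$ for $x \geq 1$, two vertices connect independently with probability $\min(h_i h_j/(\mu n), 1)$ where μ is the mean hidden variable, and the network size and number of samples are kept tiny so that the example runs quickly. At such sizes the estimate is noisy and says little about the asymptotic regime.

```python
import random
from itertools import combinations


def sample_triangle_count(n, tau, rng):
    """Draw one graph from the assumed hidden-variable model; count triangles."""
    # Pareto hidden variables with P(h > x) = x^(1 - tau) for x >= 1.
    h = [(1.0 - rng.random()) ** (-1.0 / (tau - 1.0)) for _ in range(n)]
    mu = (tau - 1.0) / (tau - 2.0)  # mean of this Pareto law
    nbrs = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < min(h[i] * h[j] / (mu * n), 1.0):
            nbrs[i].add(j)
            nbrs[j].add(i)
    # Each triangle i < j < k is counted once, from its edge (i, j).
    return sum(
        sum(1 for k in nbrs[i] & nbrs[j] if k > j)
        for i in range(n)
        for j in nbrs[i]
        if j > i
    )


def relative_variance(counts):
    """Sample variance divided by the squared sample mean."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return var / mean**2 if mean > 0 else float("nan")


rng = random.Random(1)  # fixed seed for reproducibility
counts = [sample_triangle_count(150, 2.5, rng) for _ in range(20)]
ratio = relative_variance(counts)
print(counts, ratio)
```

Repeating the experiment for increasing n and plotting the ratio against n gives a rough visual indication of whether the relative fluctuations shrink or persist.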
Our variational principle facilitates a systematic study of such fluctuations, and leads to a classification into self-averaging and non-self-averaging for all motifs H. It turns out that whether N(H) is self-averaging or not depends on the power-law exponent τ and the dominant structure of H. We also show that non-self-averaging behavior of motif counts may not have the intuitive explanation described above. In some cases, motif counts in two instances are similar with high probability, but rare network samples behave differently, causing the motif count to be non-self-averaging. Thus, the classification of network motifs into self-averaging and non-self-averaging motifs does not give a complete picture of the motif count fluctuations. We therefore further divide the non-self-averaging motifs into two classes based on the type of fluctuations in the motif counts.
For a given motif H, let $H_1, \dots, H_m$ denote all possible motifs that can be constructed by merging two copies of H at one or more vertices. We can then write the variance of the motif count as (see [37, 40-42] and the Methods section)

$$\operatorname{Var}(N(H)) = C_1 \mathbb{E}[N(H_1)] + \dots + C_m \mathbb{E}[N(H_m)] + \mathbb{E}[N(H)]^2 O(n^{-1}) \qquad (8)$$

for constants $C_1, \dots, C_m$. Using (7) and (8), we can determine for any motif H whether it is self-averaging or not. First, we find all motifs that are created by merging two copies of H. For the triangle motif, for example, these motifs are the bow tie, where two triangles are merged at one single vertex, the diamond of Fig. 3b, and the triangle itself. We find the order of magnitude of the expected number of these motifs using Theorem 1 to obtain the variance of N(H). We divide by $\mathbb{E}[N(H)]^2$, also obtained by Theorem 1, and check whether this fraction is diverging or not. Table 1 shows for which values of τ ∈ (2, 3) the motifs on 3, 4 and 5 vertices are self-averaging. For example, the triangle turns out to be self-averaging only for τ ∈ (2, 5/2).
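For the triangle, this procedure reduces to comparing a few exponents. The Python sketch below assumes the following scalings for the merged motifs, which are the orders of magnitude derived in the Methods section: $n^{\max(5(3-\tau)/2,\, 4-\tau)}$ for bow ties, $n^{6-2\tau}$ for diamonds (up to a logarithmic factor) and $n^{3(3-\tau)/2}$ for triangles. Each exponent is compared with that of $\mathbb{E}[N(H)]^2$. The function names are hypothetical.

```python
def triangle_ratio_exponents(tau):
    """Exponent of n in E[N(H_i)] / E[N(triangle)]^2 for each merged motif H_i.

    The scalings below are assumed orders of magnitude, not exact counts.
    """
    triangle = 3 * (3 - tau) / 2
    merged = {
        "bow tie": max(5 * (3 - tau) / 2, 4 - tau),
        "diamond": 6 - 2 * tau,  # up to a log(n) factor
        "triangle": triangle,
    }
    return {name: e - 2 * triangle for name, e in merged.items()}


def is_self_averaging(tau):
    """True if every term of Var / E^2 has a negative exponent of n."""
    return all(e < 0 for e in triangle_ratio_exponents(tau).values())


for tau in (2.2, 2.4, 2.5, 2.6, 2.8):
    print(tau, is_self_averaging(tau), triangle_ratio_exponents(tau))
```

In this sketch only the bow tie term can have a nonnegative exponent, and it changes sign at τ = 5/2, in line with the statement that the triangle is self-averaging only for τ ∈ (2, 5/2).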
Here is a general observation that underlines the importance of the dominant motif structure:

Theorem 2. All self-averaging motifs for any τ ∈ (2, 3) have dominant free variation structures that consist of vertices with hidden variables $\Theta(\sqrt{n})$ only.

We prove this theorem in Supplementary Material 5. Note that the condition on the dominant motif structure is a necessary condition for being self-averaging, but it is not a sufficient one, as the triangle example shows. Table 1 shows the values of τ for which all connected motifs on 3, 4 and 5 vertices are self-averaging. Combining the classification of the motifs into self-averaging and non-self-averaging with the classification based on the value of $B_f(H)$ from (6) as well as the difference between the expected and typical number of motifs yields a classification into the following three types of motifs:

Type I: Self-averaging motifs. $B_f(H) = 0$ and $\operatorname{Var}(N(H))/\mathbb{E}[N(H)]^2 \to 0$. These motifs only contain vertices of degrees $\Theta(\sqrt{n})$. The rescaled number of such motifs converges to a constant [43]. Furthermore, the variance of the number of motifs is small compared to the second moment, so that the fluctuations of these types of motifs are quite small and vanish in the large-network limit. The triangle for τ < 5/2 is an example of such a motif, shown in Fig. 4b,e.

Type II: Concentrated, non-self-averaging motifs. $B_f(H) = 0$ and $\operatorname{Var}(N(H))/\mathbb{E}[N(H)]^2 \not\to 0$. These motifs also only contain vertices of degrees $\Theta(\sqrt{n})$. Again, the rescaled number of such motifs converges to a constant in probability [43]. Thus, most network samples contain a similar number of motifs as n grows large, even though these motifs are non-self-averaging. Still, in rare network samples the number of motifs deviates significantly from its typical number, causing the variance of the number of motifs to be large. Figure 4a,d illustrate this for triangle counts for τ ≥ 5/2. The fluctuations are larger than for the Type I motifs, but most of the samples have motif counts close to the expected value.
Type III: Non-concentrated motifs. $B_f(H) > 0$. These motifs contain hub vertices. The expected and typical number of such motifs therefore scale differently in n. By Theorem 2, these motifs are non-self-averaging. The rescaled number of such motifs may not converge to a constant, so that two network samples contain significantly different motif counts. Figure 4c,f show that the fluctuations of these motifs are indeed of a different nature, since most network samples have motif counts that are far from the expected value.
Data. We now investigate motifs in five real-world networks with heavy-tailed degree distributions: the Gowalla social network [44], the Oregon autonomous systems network [44], the Enron email network [44, 45], the PGP web of trust [46] and the High Energy Physics collaboration network (HEP) [44]. Table 2 provides detailed statistics of these data sets. Because the number of motifs can be obtained from the number of graphlets, we focus on graphlet counts. Figure 5 shows the graphlet counts on a logarithmic scale. The order of the graphlets is from the most occurring graphlet (the claw) to the least occurring graphlets (the square and $K_4$) in the hidden-variable model, see Supplementary Fig. 2. In three networks the motif ordering follows that of the hidden-variable model, while in two networks the ordering is different. In the HEP collaboration network, for example, $K_4$ occurs more frequently than the square. While this is not predicted by the hidden-variable model, it naturally arises due to the frequently occurring collaboration between four authors, which creates $K_4$ instead of the square. It would be interesting to see if this deviation from the ordering of the hidden-variable model can be linked to the specific nature of the data set in other examples. Supplementary Fig. 2 enumerates all possible vertex types in graphlets on 4 vertices. In the hidden-variable model, vertex types $t_7$ and $t_9$ have typical degrees proportional to $n^{1/(\tau-1)}$, vertex types $t_1$ and $t_4$ typically have degrees proportional to $\sqrt{n}$, vertex type $t_6$ typically has degree proportional to $n^{(\tau-2)/(\tau-1)}$ and vertex types $t_5$, $t_8$, $t_{10}$, $t_{11}$ typically have constant degree. Vertex types $t_2$, $t_3$ and $t_{11}$ do not have a unique optimizer. The degrees of these vertex types are pair-constrained (see the proof of Lemma 3). Figure 6 shows the typical degree of all 11 vertex types in the five real-world data sets.
Vertices with typical degree 1 in the hidden-variable model have the lowest degree in the five data sets. Vertices that have typical degree $n^{1/(\tau-1)}$ in the hidden-variable model also have the highest degree among all vertex types in these five real-world data sets. Thus, typical degrees of vertices in a graphlet roughly follow the same ordering as in the hidden-variable model in these data sets.
The High Energy Physics collaboration network does not have a large distinction between the degrees of the different vertex types. This may be related to the fact that this network has less heavy-tailed degrees than the other networks (see Table 2).

Discussion
By developing a variational principle for the dominant degree composition of motifs in the hidden-variable model, we have identified the asymptotic growth of motif counts and their fluctuations for all motifs. This allowed us to determine for which values of the degree exponent τ ∈ (2, 3) the number of motifs is self-averaging. We further divide the non-self-averaging motifs into two classes with substantially different concentration properties.
Hub vertices in dominant motif structures cause wild degree fluctuations and non-self-averaging behavior, so that large differences between the average motif count and the motif count in one sample of the random network model arise. Non-self-averaging motifs without a hub vertex show milder fluctuations.
We expect that the variational principle can be extended to different random graph models, such as the hyperbolic random graph, the preferential attachment model and random intersection graphs. For example, for the hyperbolic random graph, the dominant structure of complete graphs is known to consist of $\sqrt{n}$ degrees [47], as in the hidden-variable model, but the dominant structures of other motifs are as yet unknown.
In this paper, we presented a case study for motifs on 4 vertices in five scale-free network data sets. It would be interesting to perform larger network data experiments to investigate whether motifs in real-world network data also have typical vertex degrees, and to what extent these vertex degrees are similar to the ones of the hidden-variable model. Similarly, investigating the typical behavior of motifs on more than 4 vertices in real-world data compared to the hidden-variable model is another topic for future research.
It would also be interesting to develop statistical tests for the presence of motifs in real-world data using the results from this paper. For example, one could order all motifs of size k from the most frequently occurring to the least frequently occurring motif, and compare this to the ordering in a hidden-variable model with the same degree exponent. This could shed some light on which motifs in a given data set appear more often than expected.

Methods
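Fluctuations. Triangle fluctuations. We first illustrate how we can apply the variational principle to obtain the variance of the number of subgraphs by computing the variance of the number of triangles in the hidden-variable model. Let Δ denote the number of triangles, and let $\Delta_{i,j,k}$ denote the event that vertices i, j and k form a triangle. Then, we can write the number of triangles as

$$\Delta = \frac{1}{6} {\sum_{i,j,k}}' \mathbb{1}(\Delta_{i,j,k}), \qquad (9)$$

where $\sum'$ denotes the sum over distinct indices. Thus, the variance of the number of triangles can be written as

$$\operatorname{Var}(\Delta) = \frac{1}{36} {\sum_{i,j,k}}' {\sum_{s,t,u}}' \left[ \mathbb{P}(\Delta_{i,j,k}, \Delta_{s,t,u}) - \mathbb{P}(\Delta_{i,j,k}) \mathbb{P}(\Delta_{s,t,u}) \right]. \qquad (10)$$

When i, j, k and s, t, u do not overlap, the hidden variables of i, j, k and s, t, u are independent, so that the event that i, j and k form a triangle and the event that s, t and u form a triangle are independent. Thus, when i, j, k, s, t, u are all distinct, $\mathbb{P}(\Delta_{i,j,k}, \Delta_{s,t,u}) = \mathbb{P}(\Delta_{i,j,k}) \mathbb{P}(\Delta_{s,t,u})$, so that the contribution from 6 distinct indices to (10) is zero. On the other hand, when i = u for example, the first term in (10) denotes the probability that a bow tie (see Fig. 2i) is present with i as middle vertex. Furthermore, since the degrees are i.i.d. and the edge statuses are independent as well, $\mathbb{P}(\Delta_{i,j,k})$ is the same for any distinct i, j, k. Because only $O(n^5)$ of the pairs of index triples overlap, the product terms in (10) with overlapping indices satisfy

$$\frac{1}{36} \sum_{\{i,j,k\} \cap \{s,t,u\} \neq \emptyset} \mathbb{P}(\Delta_{i,j,k}) \mathbb{P}(\Delta_{s,t,u}) = \mathbb{E}[\Delta]^2 O(n^{-1}). \qquad (11)$$

This results in

$$\operatorname{Var}(\Delta) = \frac{1}{36} \Big[ 9 {\sum_{i,j,k,s,t}}' \mathbb{P}(\text{bow tie on } i,j,k,s,t) + 18 {\sum_{i,j,k,s}}' \mathbb{P}(\text{diamond on } i,j,k,s) + 6 {\sum_{i,j,k}}' \mathbb{P}(\Delta_{i,j,k}) \Big] + \mathbb{E}[\Delta]^2 O(n^{-1}), \qquad (12)$$

where the diamond motif is as in Fig. 3b. The combinatorial factors 9, 18 and 6 arise because there are 9 ways to construct a bow tie (18 for a diamond, and 6 for a triangle) by letting two triangles overlap. The diamond motif does not satisfy the assumption in Theorem 1 that the optimal solution to (6) is unique. However, we can show the following result: the expected number of diamonds is $\Theta(n^{6-2\tau}\log(n))$.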
Proof. Let i and j be the vertices at the diagonal of the diamond, and k and s the corner vertices. Then (5) is optimized for $h_i, h_j \propto n^{\beta}$ and $h_k, h_s \propto n^{1-\beta}$ for all values of β ∈ [1/2, 1] (see Supplementary Information 6). All these optimizers together give the major contribution to the number of diamonds. Thus, we need to find the number of sets of four vertices satisfying the corresponding constraints (13) on the degrees. This number can be found using that the product of two independent power-law random variables is again distributed as a power law, with an additional logarithmic term [48]. Then, the expected number of sets of four vertices satisfying all constraints on the degrees scales as $n^{6-2\tau}\log(n)$. By (4), the probability that a diamond exists on degrees satisfying (13) is asymptotically constant, so that the expected number of diamonds also scales as $n^{6-2\tau}\log(n)$.
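The non-uniqueness of the optimizer for the diamond can be checked numerically. The Python sketch below evaluates the exponent $4 + (1-\tau)\sum_i \alpha_i + \sum_{(i,j)} \min(\alpha_i + \alpha_j - 1, 0)$ for a diamond, under the assumption that the family of optimizers places degrees $n^{\beta}$ on the two diagonal vertices and $n^{1-\beta}$ on the two corner vertices. The function name and the chosen test points are illustrative.

```python
def diamond_exponent(alpha_diag, alpha_corner, tau):
    """Exponent of n for diamonds whose diagonal vertices have degree
    n^alpha_diag and whose corner vertices have degree n^alpha_corner
    (assumed form; logarithmic factors are ignored)."""
    i = j = alpha_diag
    k = s = alpha_corner
    # The diamond has the diagonal edge (i, j) and four diagonal-corner edges.
    edge_alphas = [(i, j), (i, k), (i, s), (j, k), (j, s)]
    penalty = sum(min(a + b - 1, 0.0) for a, b in edge_alphas)
    return 4 + (1 - tau) * (i + j + k + s) + penalty


tau = 2.5
# Along the family beta in [1/2, 1] the exponent stays at 6 - 2 * tau.
family = [diamond_exponent(b / 10, 1 - b / 10, tau) for b in range(5, 11)]
print(family)

# Points off the family give a strictly smaller exponent.
print(diamond_exponent(0.75, 0.5, tau), diamond_exponent(0.75, 0.1, tau))
```

The flat exponent along the whole family is what produces the additional logarithmic factor: every value of β contributes at the same polynomial order.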
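Theorem 1 gives for the number of bow ties that (using [49] to find the optimal partition)

$$\mathbb{E}[\text{number of bow ties}] = \begin{cases} \Theta\left(n^{\frac{5}{2}(3-\tau)}\right), & \tau < 7/3, \\ \Theta\left(n^{4-\tau}\right), & \tau > 7/3, \end{cases} \qquad (14)$$

and for the number of triangles (again using [49]) that

$$\mathbb{E}[\Delta] = \Theta\left(n^{\frac{3}{2}(3-\tau)}\right). \qquad (15)$$

To investigate whether the triangle motif is self-averaging, we need to compare the variance to the second moment of the number of triangles, which results in

$$\frac{\operatorname{Var}(\Delta)}{\mathbb{E}[\Delta]^2} = \begin{cases} \Theta\left(n^{(\tau-3)/2}\right) + \Theta\left(n^{\tau-3}\log(n)\right) + \Theta\left(n^{\frac{3}{2}(\tau-3)}\right) + O(n^{-1}), & \tau < 7/3, \\ \Theta\left(n^{2\tau-5}\right) + \Theta\left(n^{\tau-3}\log(n)\right) + \Theta\left(n^{\frac{3}{2}(\tau-3)}\right) + O(n^{-1}), & \tau > 7/3. \end{cases} \qquad (16)$$

Therefore,

$$\lim_{n \to \infty} \frac{\operatorname{Var}(\Delta)}{\mathbb{E}[\Delta]^2} = \begin{cases} 0, & \tau < 5/2, \\ \infty, & \tau > 5/2. \end{cases} \qquad (17)$$

For τ = 5/2, the ratio in (17) is of constant order of magnitude. Thus, the number of triangles is self-averaging as long as τ < 5/2.

… where $\tilde{H}$ denotes the motif that is constructed by merging two copies of H at $i_{t_1}$ with $j_{s_1}$, at $i_{t_2}$ with $j_{s_2}$ and so on. Thus, this term can be written as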