Abstract
In numerous physical models on networks, dynamics are based on interactions that exclusively involve properties of a node’s nearest neighbors. However, a node’s local view of its neighbors may systematically bias perceptions of network connectivity or the prevalence of certain traits. We investigate the strong friendship paradox, which occurs when the majority of a node’s neighbors have more neighbors than does the node itself. We develop a model to predict the magnitude of the paradox, showing that it is enhanced by negative correlations between degrees of neighboring nodes. We then show that by including neighbor-neighbor correlations, which are degree correlations one step beyond those of neighboring nodes, we accurately predict the impact of the strong friendship paradox in real-world networks. Understanding how the paradox biases local observations can inform better measurements of network structure and our understanding of collective phenomena.
Introduction
Local interactions among nodes in a complex network can lead to an astounding array of collective phenomena. Examples include viral outbreaks in social networks, cascading failures in the power grid and financial networks, synchronization of coupled oscillators, and opinion dynamics and consensus formation in human groups. Researchers have linked the structure of complex networks to the dynamics of collective phenomena unfolding on them: highly connected nodes amplify viral outbreaks [1, 2, 3], while community structure affects the dynamics of synchronization [4] and the spread of social contagions [5].
A node’s own local view of a network, however, may be systematically biased. One source of bias is Feld’s friendship paradox: the number of connections, or degree, of a node is, on average, smaller than the average of its neighbors’ degrees [6]. Recently, more subtle forms of the paradox have been proposed. The strong friendship paradox [7] states that the degree of a node tends to be smaller than the median of its neighbors’ degrees. Roughly speaking, this is equivalent to the node having fewer neighbors than do a majority of its neighbors. But unlike the original friendship paradox and some recent generalizations [8, 9, 10, 11], the strong friendship paradox does not arise as a straightforward result of sampling from skewed distributions [7]. The strong friendship paradox can dramatically distort local measurements in a network, leading to the “majority illusion” [12], in which a globally rare attribute may be overrepresented in the local neighborhoods of many nodes. Physical systems whose dynamics are governed by majority rule, from Ising spin interactions [13] to more complex voting models [14], may be affected by this paradox.
In this manuscript, we develop a stochastic model to predict the magnitude of the strong friendship paradox. Specifically, we show that (a) increasingly disassortative networks exhibit a larger paradox, and (b) accurately modeling it requires considering degree correlations one step beyond those of neighboring nodes.
Given a network with degree distribution p(k), we define the global probability of the strong friendship paradox as \({P}_{{\rm{paradox}}}={\sum }_{k}p(k)f(k)\), where f(k) is the probability that a randomly chosen node with degree k experiences the paradox. Formally, we define
\[f(k)=P\left(k < {\rm{Median}}({k}_{1}^{\prime},\ldots ,{k}_{k}^{\prime})\right),\]
where \({k}_{i}^{^{\prime} }\) is the degree of the node’s ith neighbor.
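As a concrete illustration of this definition, the following minimal sketch counts the fraction of nodes whose degree is below the median of their neighbors' degrees. The adjacency-dictionary input, the function name `paradox_fraction`, and the choice to skip isolated nodes are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import median


def paradox_fraction(adj):
    """Fraction of non-isolated nodes whose degree is below the median neighbor degree.

    adj maps each node to a list of its neighbors (undirected, so symmetric).
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    hits = 0
    counted = 0
    for v, nbrs in adj.items():
        if not nbrs:
            continue  # assumption: isolated nodes have nothing to compare against
        counted += 1
        if deg[v] < median(deg[u] for u in nbrs):
            hits += 1
    return hits / counted if counted else 0.0


# Toy example: a star with one hub (node 0) and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(paradox_fraction(star))  # 0.8: the four leaves are in the paradox, the hub is not
```

Grouping the same test by degree class instead of pooling all nodes would give an empirical estimate of f(k).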
Of course, networks can have structure beyond that given by the degree distribution. The dK-series framework [15] specifies network structure as a series of joint degree distributions of subgraphs of d nodes. Thus, a network’s 1K-structure is specified by the degree distribution p(k). The 2K-structure captures degree correlations of nodes in connected pairs. This is specified by the joint degree distribution e(k, k′), the probability that an edge links two nodes with degrees k and k′. It follows that the degree distribution of an edge’s endpoint is \(q(k)={\sum }_{k^{\prime} }e(k,k^{\prime} )=kp(k)/\langle k\rangle \). Similarly, a network’s 3K-structure is specified by the joint degree distribution of connected triplets, either wedges or triangles. We find that these higher-order degree correlations can be substantial in real-world networks, possibly reflecting their macroscopic organization into a core-periphery structure, and that accounting for them is necessary for a quantitative understanding of the strong friendship paradox.
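The strong friendship paradox depends only on the comparison between the degrees of a node and its neighbors. The probability \({Q}_{ > }\) that a node sees a neighbor with degree larger than its own can be written as:
\[{Q}_{ > }=\sum _{k}p(k)\sum _{k^{\prime} > k}P(k^{\prime} |k)=\sum _{k}p(k)\sum _{k^{\prime} > k}\frac{e(k,k^{\prime} )}{q(k)},\tag{1}\]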
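since the neighbor degree distribution of a degree-k node is P(k′|k) = e(k, k′)/q(k). This expression uses information about the network’s 2K-structure, which is globally measured by the assortativity coefficient [16]
\[r=\frac{1}{{\sigma }_{q}^{2}}\sum _{k,k^{\prime} }k\,k^{\prime} \left[e(k,k^{\prime} )-q(k)q(k^{\prime} )\right],\]
where the variance of k is taken with respect to the distribution q(k):
\[{\sigma }_{q}^{2}=\sum _{k}{k}^{2}q(k)-{\left[\sum _{k}k\,q(k)\right]}^{2}.\]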
In assortative networks (r > 0), nodes preferentially link to other nodes with similar degree, while in disassortative networks (r < 0), they prefer to link to others with dissimilar degree, e.g., high to low degree nodes. Since k is in the numerator of the sum for r but in the denominator of Eq. (1), given the normalization \({\sum }_{k,k^{\prime} }e(k,k^{\prime} )=1\), we may expect disassortativity to magnify the paradox in networks, and assortativity to suppress it. Previous numerical results for the conventional friendship paradox [10] support this prediction.
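Both quantities in this argument can be estimated directly from an edge list. The sketch below is illustrative only: it assumes a simple undirected graph given as (u, v) pairs, and the function names are hypothetical. The first function averages, over nodes with at least one neighbor, the fraction of neighbors with larger degree, which equals the sum over k of p(k)P(k′ > k|k) when p(k) is taken over those nodes. The second computes the assortativity as the Pearson correlation of the degrees at the two ends of an edge, with each edge counted in both directions so that the empirical e(k, k′) is symmetric.

```python
from collections import defaultdict


def degrees_from_edges(edges):
    """Return {node: degree} for an undirected simple graph given as (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def prob_neighbor_larger(edges):
    """Average over nodes of the fraction of neighbors with strictly larger degree."""
    deg = degrees_from_edges(edges)
    larger = defaultdict(int)  # node -> number of neighbors with larger degree
    for u, v in edges:
        if deg[v] > deg[u]:
            larger[u] += 1
        elif deg[u] > deg[v]:
            larger[v] += 1
    return sum(larger[n] / deg[n] for n in deg) / len(deg)


def assortativity(edges):
    """Pearson correlation of end-point degrees over both orientations of each edge."""
    deg = degrees_from_edges(edges)
    pairs = [(deg[u], deg[v]) for u, v in edges]
    pairs += [(kp, k) for k, kp in pairs]  # symmetrize: add the reversed orientation
    n = len(pairs)
    mean_q = sum(k for k, _ in pairs) / n
    var_q = sum(k * k for k, _ in pairs) / n - mean_q ** 2
    if var_q == 0:
        raise ValueError("assortativity is undefined when all end-point degrees are equal")
    cov = sum(k * kp for k, kp in pairs) / n - mean_q ** 2
    return cov / var_q


star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one hub, four leaves
print(assortativity(star))         # -1.0 (perfectly disassortative)
print(prob_neighbor_larger(star))  # 0.8
```

In this toy star graph, maximal disassortativity coincides with a high probability of seeing a larger-degree neighbor, in line with the expectation stated above.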
Results
The 2K model
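Given a randomly chosen node with degree k, define an indicator function \({x}_{i}\), i = 1, …, k, to track the degree of the node’s ith neighbor:
\[{x}_{i}=\begin{cases}1 & \text{if }{k}_{i}^{\prime} > k,\\ 0 & \text{otherwise.}\end{cases}\tag{3}\]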
To a close approximation (and exactly, for odd k), the node is in the paradox regime if \(\bar{x}\equiv \frac{1}{k}{\sum }_{i=1}^{k}{x}_{i} > \frac{1}{2}\).
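To understand how network structure affects the strong friendship paradox, we now examine \({\mu }_{x}(k)\), the probability that a neighbor (say the ith one) of a randomly chosen degree-k node has degree greater than k:
\[{\mu }_{x}(k)=P({k}_{i}^{\prime} > k|k)=\sum _{k^{\prime} > k}\frac{e(k,k^{\prime} )}{q(k)}.\tag{4}\]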
If we assume that degrees of neighbors are independent and identically distributed random variables, the probability for a degree-k node to observe the strong friendship paradox is then given by the binomial distribution:
\[f(k)=\sum _{n > k/2}^{k}\binom{k}{n}{\mu }_{x}{(k)}^{n}{[1-{\mu }_{x}(k)]}^{k-n}.\]
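For large k, f(k) is close to Gaussian. In terms of the normal distribution’s cumulative distribution function Φ,
\[f(k)\approx \Phi \left(\frac{{\mu }_{x}(k)-1/2}{{\sigma }_{x}(k)/\sqrt{k}}\right),\tag{6}\]
where \({\sigma }_{x}^{2}(k)={\mu }_{x}(k)[1-{\mu }_{x}(k)]\) is the variance of the indicator \({x}_{i}\).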
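The binomial expression and its Gaussian limit can be compared numerically. The sketch below assumes independent neighbors, each with the same probability `mu` of having larger degree, and treats the paradox as a strict majority (more than k/2 such neighbors). The function names are hypothetical, and the Gaussian form used here is the standard normal approximation to the mean of k Bernoulli variables.

```python
from math import comb, erf, sqrt


def f_binomial(k, mu):
    """Probability that more than k/2 of k independent neighbors have larger degree."""
    return sum(
        comb(k, n) * mu ** n * (1 - mu) ** (k - n)
        for n in range(k // 2 + 1, k + 1)
    )


def f_gaussian(k, mu):
    """Normal approximation: Phi((mu - 1/2) / (sigma / sqrt(k))), sigma^2 = mu (1 - mu)."""
    sigma = sqrt(mu * (1 - mu))
    if sigma == 0:
        return 1.0 if mu > 0.5 else 0.0  # degenerate case: outcome is deterministic
    z = (mu - 0.5) / (sigma / sqrt(k))
    return 0.5 * (1 + erf(z / sqrt(2)))


for k in (3, 11, 51, 101):
    print(k, round(f_binomial(k, 0.6), 4), round(f_gaussian(k, 0.6), 4))
```

For odd k the threshold of 1/2 on the sample mean falls halfway between two attainable values, so the approximation behaves like a continuity-corrected one; for even k the two expressions can differ more noticeably at small k.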
To demonstrate how assortativity modifies the strong friendship paradox, we consider a network with e(k, k′) that has a bivariate log-normal distribution, a long-tailed distribution defined on the positive domain of k, with equal means m, equal variances \({s}^{2}\), and correlation coefficient c of ln k and ln k′. This form of the distribution allows for analytical treatment of the problem. Thus, the assortativity can be written as
\[r=\frac{{e}^{c{s}^{2}}-1}{{e}^{{s}^{2}}-1}.\]
Note that the assortativity is bounded by \(-{e}^{-{s}^{2}}\le r\le 1\), and increases with c. We can then express \({\mu }_{x}(k)\) analytically as
\[{\mu }_{x}(k)=\Phi \left(\frac{m-\mathrm{ln}\,k}{s}\sqrt{\frac{1-c}{1+c}}\right).\]
It follows that f(k) decreases with k. As the network becomes more disassortative (c < 0), f(k) undergoes an increasingly sharp transition from 1 to 0 around \(k={e}^{m}\) (Fig. 1(a)). Given that most nodes have low degree, this leads to a globally stronger paradox in more disassortative networks (Fig. 1(b)), consistent with our prediction.
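The sharpening of the transition with disassortativity can be explored with a short numerical sketch. It assumes that ln k and ln k′ are jointly normal with common mean m, common variance s² and correlation c, so that ln k′ given ln k is normal with mean m + c(ln k − m) and variance s²(1 − c²). It treats k as continuous. The closed forms in the code follow from these standard properties and are a reconstruction under those assumptions, and the function names are hypothetical.

```python
from math import erf, exp, log, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def assortativity_lognormal(c, s):
    """Pearson correlation of k and k' when (ln k, ln k') has correlation c, variance s^2."""
    return (exp(c * s * s) - 1) / (exp(s * s) - 1)


def mu_x_lognormal(k, m, s, c):
    """P(k' > k | k) under the bivariate log-normal assumption (requires |c| < 1)."""
    if not -1 < c < 1:
        raise ValueError("c must lie strictly between -1 and 1")
    return normal_cdf((m - log(k)) / s * sqrt((1 - c) / (1 + c)))


m, s = 2.0, 1.0  # illustrative parameter values, not taken from the paper
for c in (-0.8, 0.0, 0.8):
    r = assortativity_lognormal(c, s)
    profile = [round(mu_x_lognormal(k, m, s, c), 3) for k in (2, 5, 10, 20)]
    print(f"c={c:+.1f}  r={r:+.3f}  mu_x(k) for k=2,5,10,20: {profile}")
```

More negative c makes the probability of a larger-degree neighbor fall more steeply around k = e^m, which is the mechanism behind the sharper transition in f(k).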
The structure of real-world networks creates conditions for the paradox. Table 1 reports the observed fraction of nodes in these networks who see a majority of their neighbors with a larger degree. This fraction is very large in all networks, ranging from 75% to 90%.
Table 1 shows that the observed fractions of nodes experiencing the paradox are close to the global probabilities predicted by the 2K model, when \({\mu }_{x}(k)\) is set to the actual frequency with which a neighbor of a degree-k node has larger degree. However, a breakdown by degree class reveals significant deviations. Figure 2 plots the paradox probability f(k) for a degree-k node (blue dots). We define the degree at which the 2K estimate (Eq. (6)) of paradox probability is 0.5 as the critical degree \({k}_{c}\) of the network. By construction, \({k}_{c}={\rm{Median}}(q(k))\). Nodes with degree \(k < {k}_{c}\) are likely to experience the paradox, while those with \(k > {k}_{c}\) are unlikely to do so. The 2K model (dotted line) overestimates the paradox for low-degree nodes and underestimates it for high-degree nodes. This suggests that the 2K model is insufficient, and we need to take into account structure beyond degree correlations of connected pairs of nodes.
The 3K model
If neighbor degrees are identically distributed but correlated random variables, Eq. (6) must be modified to represent a multivariate rather than a single binomial distribution. To deal with the correlation, we now consider a pair of neighbors, with degrees \({k}_{i}^{\prime}\) and \({k}_{j}^{\prime}\), of a single degree-k node, and their indicator functions \({x}_{i}\) and \({x}_{j}\) as defined in Eq. (3). The corresponding multivariate normal approximation then gives
\[f(k)\approx \Phi \left(\frac{{\mu }_{x}(k)-1/2}{{\sigma }_{x}(k)/\sqrt{k}}\right),\]
where the variance \({\sigma }_{x}^{2}(k)\) is now
\[{\sigma }_{x}^{2}(k)={\mu }_{x}(k)[1-{\mu }_{x}(k)]+(k-1)\,{\rm{Cov}}({x}_{i},{x}_{j}).\tag{10}\]
Unlike in Eq. (6), where f(k) is completely determined by \({\mu }_{x}(k)\), the 3K model requires the covariance term to be specified. Using values determined empirically from real-world networks as in the 2K model, we obtain very accurate paradox probability estimates (solid line in Fig. 2). These estimates also improve on the global 2K results shown in Table 1 for all cases except Youtube and English words, where the two estimates are nearly identical due to their close agreement for low degree values that represent a large fraction of nodes in the network.
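The effect of the covariance term can be seen in a small numerical sketch. It assumes that the k indicators of a degree-k node are exchangeable, with common mean `mu` and common pairwise covariance `cov`. The sample mean then has variance [mu(1 − mu) + (k − 1)cov]/k, which is a standard identity. The function name and the numerical values are illustrative assumptions.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def f_gaussian(k, mu, cov=0.0):
    """Normal approximation to f(k); cov = 0 recovers the independent-neighbor case."""
    var = mu * (1 - mu) + (k - 1) * cov  # variance of one indicator plus pair terms
    if var <= 0:
        return 1.0 if mu > 0.5 else 0.0  # degenerate case: sample mean is deterministic
    return normal_cdf((mu - 0.5) * sqrt(k / var))


k = 9
for mu in (0.7, 0.3):
    independent = f_gaussian(k, mu)
    correlated = f_gaussian(k, mu, cov=0.05)
    print(f"mu={mu}: independent={independent:.3f}, correlated={correlated:.3f}")
```

A positive covariance widens the distribution of the fraction of higher-degree neighbors and pulls f(k) toward 1/2. The estimate is lowered where mu exceeds 1/2 (typically low-degree nodes) and raised where it is below 1/2, matching the direction of the 2K model's deviations described earlier.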
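To understand the effect of the covariance term, consider the 3K-distribution \(t({k}_{i}^{^{\prime} },k,{k}_{j}^{^{\prime} })\), the joint degree distribution of a connected ordered triplet of nodes with degrees \(({k}_{i}^{^{\prime} },k,{k}_{j}^{^{\prime} })\). Conditioning on the degree k of the focal node gives the joint degree distribution of its two neighbors:
\[P({k}_{i}^{\prime},{k}_{j}^{\prime}|k)=\frac{t({k}_{i}^{\prime},k,{k}_{j}^{\prime})}{\sum _{{k}_{i}^{\prime},{k}_{j}^{\prime}}t({k}_{i}^{\prime},k,{k}_{j}^{\prime})}.\]
The indicator function covariance term in Eq. (10) is
\[{\rm{Cov}}({x}_{i},{x}_{j})=P({k}_{i}^{\prime} > k,{k}_{j}^{\prime} > k|k)-{[P({k}_{i}^{\prime} > k|k)]}^{2},\]
where
\[P({k}_{i}^{\prime} > k,{k}_{j}^{\prime} > k|k)=\sum _{{k}_{i}^{\prime} > k}\sum _{{k}_{j}^{\prime} > k}P({k}_{i}^{\prime},{k}_{j}^{\prime}|k)\]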
and \(P({k}_{i}^{^{\prime} } > k|k)\) is given by Eq. (4). Thus, the covariance takes into account correlations only up to the level of chains \(({k}_{i}^{^{\prime} },k,{k}_{j}^{^{\prime} })\). Any higher-order correlations beyond 3K, such as those involving connected subgraphs of four nodes, would no longer be consistent with a normal approximation for f(k), since they would involve information beyond the second moment of the indicator function. The remarkable success of the 3K model in Fig. 2 suggests that such higher-order correlations are not needed to explain the paradox, or that they are negligible in real-world networks.
Define the neighbor-neighbor correlation as
\[{\rho }_{x}(k)=\frac{{\rm{Cov}}({x}_{i},{x}_{j})}{{\mu }_{x}(k)[1-{\mu }_{x}(k)]}.\]
Note that this correlation, like \({\sigma }_{x}(k)\), is based not on the neighbors’ degrees but on the indicator function comparing them to the node’s degree. Figure 3 shows empirically determined values of \({\rho }_{x}(k)\) for the real-world networks we studied. Recall that in the 2K model, the probability that a degree-k node has a neighbor with degree greater than k is determined completely by e(k, k′) and is unrelated to the degrees of the other neighbors. One might reasonably expect low-degree nodes to have mostly neighbors of higher degree, high-degree nodes to have mostly neighbors of lower degree, and medium-degree nodes to have a mix of both. Figure 3, however, depicts a different scenario: medium-degree nodes prefer to have neighbors with similar degree to one another, whether those neighbors have higher or lower degree. To see how these correlations may be indicative of the macroscopic organization of a network, we plot the distribution of \(\bar{x}\), the fraction of higher-degree neighbors, for nodes with \(k={k}_{c}\). In the technological networks of Skitter and Google, such medium-degree nodes link more often to high-degree nodes, possibly reflecting a hierarchical network structure with medium-degree nodes at the top level and high-degree nodes at the next level. The remaining networks show a broad distribution of \(\bar{x}\), consistent with a core-periphery network structure where medium-degree nodes link to higher-degree nodes in the core and to lower-degree nodes in the periphery [17, 18].
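One way to estimate the neighbor-neighbor correlation empirically is sketched below. It pools all nodes of a given degree k, estimates the mean of the indicator from the fraction of their neighbors with larger degree, and estimates the joint probability from ordered pairs of distinct neighbors that both have larger degree. The adjacency-dictionary input, the helper names, and the toy graph are assumptions made for illustration; the paper does not specify its estimator.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Symmetric adjacency lists for an undirected simple graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def neighbor_neighbor_correlation(adj):
    """Return {k: (mu, cov, rho)} pooled over all degree-k nodes with k >= 2.

    rho is None when mu is 0 or 1, because the indicator then has zero variance.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    n_nodes = defaultdict(int)    # number of degree-k nodes
    sum_h = defaultdict(int)      # total number of higher-degree neighbors
    sum_pairs = defaultdict(int)  # ordered pairs of distinct higher-degree neighbors
    for v, nbrs in adj.items():
        k = deg[v]
        if k < 2:
            continue  # a pair of neighbors requires at least two neighbors
        h = sum(1 for u in nbrs if deg[u] > k)
        n_nodes[k] += 1
        sum_h[k] += h
        sum_pairs[k] += h * (h - 1)
    out = {}
    for k, n in n_nodes.items():
        mu = sum_h[k] / (k * n)
        joint = sum_pairs[k] / (k * (k - 1) * n)
        cov = joint - mu * mu
        var = mu * (1 - mu)
        out[k] = (mu, cov, cov / var if var > 0 else None)
    return out


# Toy graph: node "A" links to two degree-3 hubs, node "B" links to two leaves.
edges = [("A", "H1"), ("A", "H2"), ("H1", "x1"), ("H1", "x2"),
         ("H2", "y1"), ("H2", "y2"), ("B", "L1"), ("B", "L2")]
result = neighbor_neighbor_correlation(build_adjacency(edges))
print(result[2])  # (0.5, 0.25, 1.0)
```

In this toy graph the two degree-2 nodes see either only higher-degree or only lower-degree neighbors, so the indicator correlation is 1 even though half of all their neighbors have larger degree. This is the kind of structure that pairwise degree correlations alone cannot register.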
Discussion
The connection between local measurement bias and network structure revealed by the strong friendship paradox is crucial for several reasons. It is often impractical to observe large networks in their entirety: instead, researchers estimate network properties by exploring local neighborhoods of select nodes. The paradox, however, may systematically bias local views of network structure, including the sampled degree distribution [19]. The strong friendship paradox also affects measurements of information in networks. Consider a network where nodes have attributes and estimate their prevalence from local observations. When attribute and degree are correlated, the paradox can create an illusion that the attribute is common even when it is globally rare [12]. Finally, quantifying measurement bias may be necessary for predicting the evolution of dynamic processes such as domain formation by majority rule in interacting spin systems [13], or synchronization of frequencies in complex networks such as electrical power grids [20]. Accounting for neighbor-neighbor correlations could be instrumental to the success of network models for such systems.
In this paper, we have studied the strong friendship paradox in networks, a phenomenon that distorts nodes’ observations of local network structure. The paradox leads most nodes to observe that a majority of their neighbors have a larger degree than their own. We have developed an analytical model of the strong friendship paradox, enabling highly accurate predictions of its strength in networks. In contrast to Feld’s friendship paradox [6], which exists in any network with variance in the degree distribution, the strong friendship paradox requires information about higher-order network structure. Specifically, negative correlations between degrees of connected nodes, given by the network’s 2K structure, will magnify the paradox, especially in networks with a skewed degree distribution. The impact of disassortativity, however, is modulated by degree correlations between nodes’ neighbors. These correlations, given by the network’s 3K structure, are necessary to accurately quantify the paradox. The success of the 3K model in explaining the paradox is consistent with the observation [15] that it is sufficient to capture known network properties. In order to mitigate the effects of local measurement bias in networks, it is important to account for the strong friendship paradox and how it is impacted by higher-order network structure.
Methods
Data description
We study six networks from a variety of domains, including social networks (friendship links on the LiveJournal blogging site, soc-LiveJournal1 [21]; community structure on Youtube, com-Youtube [21]), technological networks (the Skitter internet graph as-skitter [21] and the Google web hyperlink graph web-Google [21]), a scientific citation graph (Arxiv cit-HepPh [21]), and relationships between English words [22]. Table 2 shows some basic properties of the networks. These networks vary in size from 34.5K nodes (Arxiv) to almost 4M nodes (LiveJournal), and in assortativity from 0.045 (LiveJournal) to −0.08 (Skitter).
Change history
04 December 2018
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper.
References
1. Watts, D. J. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences 99, 5766–5771 (2002).
2. Kempe, D., Kleinberg, J. & Tardos, E. Maximizing the spread of influence through a social network. In KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 137–146 (ACM Press, New York, NY, USA, 2003).
3. Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).
4. Lerman, K. & Ghosh, R. Network structure, topology and dynamics in generalized models of synchronization. Physical Review E 86 (2012).
5. Weng, L., Menczer, F. & Ahn, Y.-Y. Virality prediction and community structure in social networks. Scientific reports 3 (2013).
6. Feld, S. L. Why Your Friends Have More Friends Than You Do. American Journal of Sociology 96, 1464–1477 (1991).
7. Kooti, F., Hodas, N. O. & Lerman, K. Network Weirdness: Exploring the Origins of Network Paradoxes. In International Conference on Weblogs and Social Media (ICWSM) (2014).
8. Hodas, N., Kooti, F. & Lerman, K. Friendship Paradox Redux: Your Friends Are More Interesting Than You. In Proc. 7th Int. AAAI Conf. on Weblogs And Social Media (2013).
9. Eom, Y.-H. & Jo, H.-H. Generalized friendship paradox in complex networks: The case of scientific collaboration. Scientific Reports 4 (2014).
10. Jo, H.-H. & Eom, Y.-H. Generalized friendship paradox in networks with tunable degree-attribute correlation. Physical Review E 90 (2014).
11. Cao, Y. & Ross, S. M. The friendship paradox. Mathematical Scientist 41 (2016).
12. Lerman, K., Yan, X. & Wu, X.-Z. The “majority illusion” in social networks. PLoS ONE 11(2), e0147617 (2016).
13. Krapivsky, P. L. & Redner, S. Dynamics of majority rule in two-state interacting spin systems. Phys. Rev. Lett. 90, 238701 (2003).
14. Liggett, T. M. Stochastic Interacting Systems: Contact, Voter and Exclusion Processes (Springer-Verlag, Berlin, 1999), 1 edn.
15. Mahadevan, P., Krioukov, D., Fall, K. & Vahdat, A. Systematic topology analysis and generation using degree correlations. In ACM SIGCOMM Computer Communication Review, vol. 36, 135–146 (ACM, 2006).
16. Newman, M. E. J. Assortative Mixing in Networks. Phys. Rev. Lett. 89, 208701 (2002).
17. Rombach, M. P., Porter, M. A., Fowler, J. H. & Mucha, P. J. Core-periphery structure in networks. SIAM Journal on Applied mathematics 74, 167–190 (2014).
18. Zhang, X., Martin, T. & Newman, M. E. J. Identification of core-periphery structure in networks. Phys. Rev. E 91, 032803 (2015).
19. Achlioptas, D., Clauset, A., Kempe, D. & Moore, C. On the Bias of Traceroute Sampling; or, Power-law Degree Distributions in Regular Graphs. In Proc. 37th ACM Symposium on Theory of Computing (STOC) (2005).
20. Dörfler, F., Chertkov, M. & Bullo, F. Synchronization in complex oscillator networks and smart grids. Proceedings of the National Academy of Sciences 110, 2005–2010 (2013).
21. Leskovec, J. & Krevl, A. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014).
22. Fellbaum, C. & Tengi, R. Wordnet: A lexical database of english. http://wordnet.princeton.edu/ (2005).
Contributions
Conceived and designed the experiments: X.W., K.L. Built the models: X.W., A.P., K.L. Performed the experiments: X.W. Analyzed the data: X.W. Wrote the paper: X.W., A.P., K.L.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wu, XZ., Percus, A.G. & Lerman, K. Neighbor-Neighbor Correlations Explain Measurement Bias in Networks. Sci Rep 7, 5576 (2017). https://doi.org/10.1038/s41598-017-06042-0
DOI: https://doi.org/10.1038/s41598-017-06042-0