Popularity versus similarity in growing networks

Journal name: Nature
Volume: 489
Pages: 537–540
DOI: 10.1038/nature11459

The principle [1] that ‘popularity is attractive’ underlies preferential attachment [2], which is a common explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections possessed by nodes follows power laws [3, 4], as observed in many real networks [5, 6]. Preferential attachment has been directly validated for some real networks (including the Internet [7, 8]), and can be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks or duplication [9–16]. Here we show that popularity is just one dimension of attractiveness; another dimension is similarity [17–24]. We develop a framework in which new connections optimize certain trade-offs between popularity and similarity, instead of simply preferring popular nodes. The framework has a geometric interpretation in which popularity preference emerges from local optimization. As opposed to preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, predicting the probability of new links with high precision. The framework that we have developed can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.

Figures

  1. Figure 1: Geometric interpretation of popularity×similarity optimization.

    The nodes (dots) are numbered by their birth times, and located at random angular (similarity) coordinates. On its birth, the new circled node t in the yellow annulus connects to the m old nodes s that minimize s·θ_st, the product of the old node's birth time s and the angular distance θ_st between nodes s and t. The new connections are shown by the thicker blue links. In a and b, t = 3 and m = 1. In a, node 3 connects to node 2 because 2·θ_23 = 2π/3 < 1·θ_13 = 5π/6. In b, node 3 connects to node 1 because 1·θ_13 = 2π/3 < 2·θ_23 = π. In c, an optimization-driven network with m = 3 is simulated for up to 20 nodes. The radial (popularity) coordinate of new node t = 20 is r_t = ln t, shown by the long thick arrow. This node connects to the three hyperbolically closest nodes. The red shape marks the set of points located at hyperbolic distances less than r_t from the new node. Arrows on dots show all nodes drifting away from the crossed origin, emulating popularity fading as explained in the text. The drift speed in the network shown corresponds to the degree distribution exponent γ = 2.1. The outer green circle shows the current network boundary of radius r_t = ln t expanding with time t as indicated by green arrows.

  2. Figure 2: Emergence of PA from popularity×similarity optimization.

    Two growing networks have been simulated up to t = 10^5 nodes, one growing according to the described optimization model, and the other according to PA. In both networks, each new node connects to m = 2 existing nodes. The γ → 2 limit is not well-defined in PA, so γ = 2.1 is used instead, as described in the text. a, The probability Π(k) that an existing node of degree k attracts a new link. The solid line is the theoretical prediction, while the dashed line is a linear function, Π(k) ∝ k. b, The probability p(x) that a pair of nodes located at hyperbolic distance x are connected. The average clustering (over all nodes) in the optimization and PA networks is and , respectively.

  3. Figure 3: Popularity×similarity optimization for three different networks.

    a, The growing Internet; b, E. coli metabolic network; and c, pretty-good-privacy (PGP) web of trust (WoT) between people. Each plot shows the probability of connections between new and old nodes, as a function of the hyperbolic (popularity×similarity) distance x between them in the real networks (circles and squares) and in PA emulations (diamonds and triangles). To emulate PA, new links are disconnected from old nodes to which these links are connected in the real networks, and reconnected to old nodes according to PA. For a pair of historical network snapshots S0 (older) and S1 (newer), new nodes are the nodes present in S1 but not in S0, and old nodes are the nodes present both in S1 and S0. Each plot shows the data for two pairs of such historical snapshots. The solid curve in each plot is the theoretical connection probability in the optimization model with the parameters corresponding to a given real network. Because the probability of new connections in the real networks is close to the theoretical curves, the shown data demonstrate that these networks grow as the popularity×similarity optimization model predicts, whereas PA, accounting only for popularity, is off by orders of magnitude in predicting the connections between similar (small x) or dissimilar (large x) nodes. To quantify this inaccuracy, the insets show the ratio between the connection probabilities in PA emulations and in the real networks, that is, the ratios of the values shown by diamonds and circles, and by triangles and squares in the main plots. The x-axes in the insets are the same as in the main plots.
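The connection rule in the Figure 1 caption and the new/old node split in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`angular_distance`, `choose_targets`, `split_new_old`) and the concrete angular coordinates are assumptions, chosen only so that the angular distances match the values quoted for panels a and b of Figure 1.

```python
import math


def angular_distance(a, b):
    """Smallest angle between two angular coordinates, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def choose_targets(angles, m):
    """Pick the old nodes a newborn node connects to.

    angles[i] is the angular (similarity) coordinate of the node born at
    time i + 1; the last entry belongs to the new node t. Returns the birth
    times of the m old nodes s with the smallest product s * theta_st.
    If fewer than m old nodes exist, all of them are returned.
    """
    t = len(angles)
    theta_t = angles[-1]
    ranked = sorted(
        range(1, t),
        key=lambda s: s * angular_distance(angles[s - 1], theta_t),
    )
    return ranked[:m]


def split_new_old(s0_nodes, s1_nodes):
    """Split nodes of a newer snapshot S1 relative to an older snapshot S0.

    New nodes are in S1 but not in S0; old nodes are in both.
    """
    s0, s1 = set(s0_nodes), set(s1_nodes)
    return s1 - s0, s1 & s0


# Hypothetical coordinates reproducing Figure 1a:
# theta_23 = pi/3 and theta_13 = 5*pi/6, so 2*theta_23 < 1*theta_13.
fig_1a = [0.0, math.pi / 2, 5 * math.pi / 6]
print(choose_targets(fig_1a, m=1))  # [2]

# Hypothetical coordinates reproducing Figure 1b:
# theta_13 = 2*pi/3 and theta_23 = pi/2, so 1*theta_13 < 2*theta_23.
fig_1b = [0.0, 7 * math.pi / 6, 2 * math.pi / 3]
print(choose_targets(fig_1b, m=1))  # [1]
```

Weighting the angular distance by the birth time s is what lets an older, more popular node win a connection even when a younger node is angularly closer, as in panel b.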

References

  1. Dorogovtsev, S., Mendes, J. & Samukhin, A. WWW and Internet models from 1955 till our days and the “popularity is attractive” principle. Preprint at http://arXiv.org/abs/cond-mat/0009090 (2000)
  2. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999)
  3. Krapivsky, P. L., Redner, S. & Leyvraz, F. Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629–4632 (2000)
  4. Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. N. Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 4633–4636 (2000)
  5. Dorogovtsev, S. N. Lectures on Complex Networks (Oxford Univ. Press, 2010)
  6. Newman, M. E. J. Networks: An Introduction (Oxford Univ. Press, 2010)
  7. Pastor-Satorras, R., Vázquez, A. & Vespignani, A. Dynamical and correlation properties of the internet. Phys. Rev. Lett. 87, 258701 (2001)
  8. Jeong, H., Néda, Z. & Barabási, A.-L. Measuring preferential attachment in evolving networks. Europhys. Lett. 61, 567–572 (2003)
  9. Dorogovtsev, S. N., Mendes, J. & Samukhin, A. Size-dependent degree distribution of a scale-free growing network. Phys. Rev. E 63, 062101 (2001)
  10. Bianconi, G. & Barabási, A.-L. Bose-Einstein condensation in complex networks. Phys. Rev. Lett. 86, 5632–5635 (2001)
  11. Caldarelli, G., Capocci, A., De Los Rios, P. & Muñoz, M. A. Scale-free networks from varying vertex intrinsic fitness. Phys. Rev. Lett. 89, 258702 (2002)
  12. Vázquez, A. Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. E 67, 056104 (2003)
  13. Pastor-Satorras, R., Smith, E. & Sole, R. V. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222, 199–210 (2003)
  14. Fortunato, S., Flammini, A. & Menczer, F. Scale-free network growth by ranking. Phys. Rev. Lett. 96, 218701 (2006)
  15. D'Souza, R. M., Borgs, C., Chayes, J. T., Berger, N. & Kleinberg, R. D. Emergence of tempered preferential attachment from optimization. Proc. Natl Acad. Sci. USA 104, 6112–6117 (2007)
  16. Motter, A. E. & Toroczkai, Z. Introduction: optimization in networks. Chaos 17, 026101 (2007)
  17. McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a feather: homophily in social networks. Annu. Rev. Sociol. 27, 415–444 (2001)
  18. Şimşek, Ö. & Jensen, D. Navigating networks by using homophily and degree. Proc. Natl Acad. Sci. USA 105, 12758–12762 (2008)
  19. Redner, S. How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B 4, 131–134 (1998)
  20. Watts, D. J., Dodds, P. S. & Newman, M. E. J. Identity and search in social networks. Science 296, 1302–1305 (2002)
  21. Börner, K., Maru, J. T. & Goldstone, R. L. The simultaneous evolution of author and paper networks. Proc. Natl Acad. Sci. USA 101, 5266–5273 (2004)
  22. Crandall, D., Cosley, D., Huttenlocher, D., Kleinberg, J. & Suri, S. in Proc. 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008) (eds Li, Y., Liu, B. & Sarawagi, S.) 160–168 (ACM, 2008)
  23. Menczer, F. Growing and navigating the small world Web by local content. Proc. Natl Acad. Sci. USA 99, 14014–14019 (2002)
  24. Menczer, F. Evolution of document networks. Proc. Natl Acad. Sci. USA 101, 5261–5265 (2004)
  25. Bonahon, F. Low-Dimensional Geometry (AMS, 2009)
  26. Bollobás, B. & Riordan, O. in Handbook of Graphs and Networks (eds Bornholdt, S. & Schuster, H. G.) Ch. 1, 1–34 (Wiley-VCH, 2003)
  27. Adamic, L. A. & Huberman, B. A. Power-law distribution of the World Wide Web. Science 287, 2115 (2000)
  28. van Raan, A. F. J. On growth, ageing, and fractal differentiation of science. Scientometrics 47, 347–362 (2000)
  29. Clauset, A., Moore, C. & Newman, M. E. J. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98–101 (2008)
  30. Menon, A. K. & Elkan, C. in Machine Learning and Knowledge Discovery in Databases (ECML) (eds Gunopulos, D., Hofmann, T., Malerba, D. & Vazirgiannis, M.) 437–452 (Lecture Notes in Computer Science, Vol. 6912, Springer, 2011)

Author information

Affiliations

  1. Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, 33 Saripolou Street, 3036 Limassol, Cyprus

    • Fragkiskos Papadopoulos
  2. Cooperative Association for Internet Data Analysis (CAIDA), University of California, San Diego (UCSD), La Jolla, California 92093, USA

    • Maksim Kitsak &
    • Dmitri Krioukov
  3. Departament de Física Fonamental, Universitat de Barcelona, Martí i Franquès 1, 08028 Barcelona, Spain

    • M. Ángeles Serrano &
    • Marián Boguñá

Contributions

F.P. and D.K. planned research, performed research and wrote the paper; M.K., M.A.S. and M.B. planned and performed research. All authors discussed the results and reviewed the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Information (1M)

    This file contains: (i) Supplementary Methods, including details on the real-world network data used in the main text to validate the popularity×similarity optimization approach, and on the network mapping method used to infer the popularity and similarity coordinates; (ii) Supplementary Notes including the technical details of the popularity×similarity model, comparisons between the properties of real-world and modelled networks, and discussion of related work; (iii) Supplementary Figures S1-S16 with legends; and (iv) additional references.
