Hierarchical structure and the prediction of missing links in networks

Abstract

Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science [1–3]. Recent studies suggest that networks often exhibit hierarchical organization, in which vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases the groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks or genetic regulatory networks) or communities in social networks [4–7]. Here we present a general technique for inferring hierarchical structure from network data and show that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partly known networks with high accuracy, and for more general network structures than competing techniques [8]. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
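The abstract does not spell out the model, so what follows is a minimal sketch of the hierarchical random graph of Figure 1 as we understand it. A dendrogram has the network's vertices as leaves and a probability p_r on each internal node r, and vertices i and j are connected with probability p_r, where r is their lowest common ancestor. For a fixed dendrogram, let L_r and R_r be the numbers of leaves in the left and right subtrees of r and E_r the number of edges running between them; the likelihood, the product over r of p_r^{E_r} (1 − p_r)^{L_r R_r − E_r}, is then maximized by p_r = E_r / (L_r R_r). The names Internal, fit_hrg, connection_prob and resample are hypothetical. The toy dendrogram is supplied by hand: the search over dendrograms that the paper uses to fit real data (a Markov chain Monte Carlo procedure, detailed in the Supplementary Notes) is not reproduced here.

```python
import math
import random
from itertools import combinations

class Internal:
    """Internal node r of a dendrogram; children are Internal nodes or int leaves."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.p = None  # p_r, filled in by fit_hrg

def leaves(t):
    """Set of leaf vertices in the subtree rooted at t."""
    if isinstance(t, int):
        return {t}
    return leaves(t.left) | leaves(t.right)

def fit_hrg(root, edges):
    """Set every p_r to its maximum-likelihood value E_r / (L_r R_r) and return
    the log-likelihood sum_r [E_r log p_r + (L_r R_r - E_r) log(1 - p_r)]."""
    edge_set = {frozenset(e) for e in edges}
    loglik = 0.0
    stack = [root]
    while stack:
        r = stack.pop()
        if isinstance(r, int):
            continue  # leaves carry no parameter
        left, right = leaves(r.left), leaves(r.right)
        pairs = len(left) * len(right)  # L_r * R_r
        e_r = sum(frozenset((u, v)) in edge_set for u in left for v in right)
        r.p = e_r / pairs
        if 0 < r.p < 1:  # p_r of 0 or 1 contributes log 1 = 0
            loglik += e_r * math.log(r.p) + (pairs - e_r) * math.log(1 - r.p)
        stack.extend((r.left, r.right))
    return loglik

def connection_prob(root, i, j):
    """p_ij: the p_r of the lowest common ancestor of distinct leaves i and j."""
    r = root
    while True:
        i_left, j_left = i in leaves(r.left), j in leaves(r.left)
        if i_left != j_left:  # i and j split here, so r is their ancestor
            return r.p
        r = r.left if i_left else r.right

def resample(root, vertices, rng):
    """Draw one graph from the fitted model: each pair is an edge with prob p_ij."""
    return [(i, j) for i, j in combinations(sorted(vertices), 2)
            if rng.random() < connection_prob(root, i, j)]

# Toy data: two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
ROOT = Internal(Internal(Internal(0, 1), 2), Internal(Internal(3, 4), 5))
LOGLIK = fit_hrg(ROOT, EDGES)
```

On this toy graph every internal node inside a triangle gets p_r = 1, and the root gets p_r = 1/9 (one edge among nine cross-group pairs). Every call to resample therefore reproduces both triangles and adds each cross-group edge with probability 1/9. Recomputing leaves() at every step favours clarity over speed.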


Figure 1: A hierarchical network with structure on many scales, and the corresponding hierarchical random graph.
Figure 2: Application of the hierarchical decomposition to the network of grassland species interactions.
Figure 3: Comparison of link prediction methods.
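Figure 3 compares link-prediction methods, and the reference list cites Hanley and McNeil (ref. 29) on the area under the ROC curve (AUC). The following is a minimal sketch of such an evaluation. It assumes that edges are hidden from a known graph and then ranked among all unobserved pairs, with AUC taken as the probability that a hidden true link scores higher than a pair that is truly absent. The scorer is the common-neighbours count, a simple predictor of the kind surveyed by Liben-Nowell and Kleinberg (ref. 8), standing in for any method. Under the hierarchical approach, as we understand it, the score would instead be the connection probability p_ij averaged over dendrograms sampled in proportion to their likelihood. The function names are hypothetical.

```python
from itertools import combinations

def common_neighbours(adj, u, v):
    """Baseline score: number of neighbours u and v share in the observed graph."""
    return len(adj[u] & adj[v])

def auc(score, missing, non_edges):
    """Probability that a randomly chosen missing link outscores a randomly
    chosen non-edge; ties count one half."""
    wins = 0.0
    for m in missing:
        for n in non_edges:
            if score[m] > score[n]:
                wins += 1.0
            elif score[m] == score[n]:
                wins += 0.5
    return wins / (len(missing) * len(non_edges))

def held_out_auc(vertices, true_edges, missing):
    """Hide `missing` from the true graph, score every unobserved pair using
    only the remaining edges, and measure how well the hidden links rank."""
    true_set = {frozenset(e) for e in true_edges}
    hidden = [frozenset(e) for e in missing]
    observed = true_set - set(hidden)
    adj = {v: set() for v in vertices}
    for e in observed:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    # Pairs absent from the true graph: the negatives a predictor should rank low.
    non_edges = [frozenset(p) for p in combinations(vertices, 2)
                 if frozenset(p) not in true_set]
    score = {e: common_neighbours(adj, *tuple(e)) for e in hidden + non_edges}
    return auc(score, hidden, non_edges)

# Toy data: two triangles joined by the edge (2,3); hide the edge (0,1).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
AUC_EXAMPLE = held_out_auc(list(range(6)), EDGES, missing=[(0, 1)])
```

Here AUC_EXAMPLE is 0.75. The hidden edge (0, 1) shares one neighbour, so it ties with the four non-edges that also share one and beats the four that share none. A predictor that guesses at random scores 0.5 on average.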

References

1. Wasserman, S. & Faust, K. Social Network Analysis (Cambridge Univ. Press, Cambridge, 1994)
2. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002)
3. Newman, M. E. J. The structure and function of complex networks. SIAM Rev. 45, 167–256 (2003)
4. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002)
5. Clauset, A., Newman, M. E. J. & Moore, C. Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004)
6. Guimerà, R. & Amaral, L. A. N. Functional cartography of complex metabolic networks. Nature 433, 895–900 (2005)
7. Lagomarsino, M. C., Jona, P., Bassetti, B. & Isambert, H. Hierarchy and feedback in the evolution of the Escherichia coli transcription network. Proc. Natl Acad. Sci. USA 104, 5516–5520 (2007)
8. Liben-Nowell, D. & Kleinberg, J. M. The link-prediction problem for social networks. J. Am. Soc. Inform. Sci. Technol. 58, 1019–1031 (2007)
9. Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99, 7821–7826 (2002)
10. Krause, A. E., Frank, K. A., Mason, D. M., Ulanowicz, R. E. & Taylor, W. W. Compartments revealed in food-web structure. Nature 426, 282–285 (2003)
11. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V. & Parisi, D. Defining and identifying communities in networks. Proc. Natl Acad. Sci. USA 101, 2658–2663 (2004)
12. Watts, D. J., Dodds, P. S. & Newman, M. E. J. Identity and search in social networks. Science 296, 1302–1305 (2002)
13. Kleinberg, J. in Proc. 2001 Neural Inform. Processing Systems Conf. (eds Dietterich, T. G., Becker, S. & Ghahramani, Z.) 431–438 (MIT Press, Cambridge, MA, 2002)
14. Palla, G., Derényi, I., Farkas, I. & Vicsek, T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)
15. Casella, G. & Berger, R. L. Statistical Inference (Duxbury, Belmont, 2001)
16. Newman, M. E. J. & Barkema, G. T. Monte Carlo Methods in Statistical Physics (Clarendon, Oxford, 1999)
17. Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002)
18. Huss, M. & Holme, P. Currency and commodity metabolites: Their identification and relation to the modularity of metabolic networks. IET Syst. Biol. 1, 280–285 (2007)
19. Krebs, V. Mapping networks of terrorist cells. Connections 24, 43–52 (2002)
20. Dawah, H. A., Hawkins, B. A. & Claridge, M. F. Structure of the parasitoid communities of grass-feeding chalcid wasps. J. Anim. Ecol. 64, 708–720 (1995)
21. Bryant, D. in BioConsensus (eds Janowitz, M., Lapointe, F.-J., McMorris, F. R., Mirkin, B. & Roberts, F.) 163–184 (Series in Discrete Mathematics and Theoretical Computer Science, Vol. 61, American Mathematical Society-DIMACS, Providence, RI, 2003)
22. Dunne, J. A., Williams, R. J. & Martinez, N. D. Food-web structure and network theory: The role of connectance and size. Proc. Natl Acad. Sci. USA 99, 12917–12922 (2002)
23. Szilágyi, A., Grimm, V., Arakaki, A. K. & Skolnick, J. Prediction of physical protein–protein interactions. Phys. Biol. 2, S1–S16 (2005)
24. Sprinzak, E., Sattath, S. & Margalit, H. How reliable are experimental protein–protein interaction data? J. Mol. Biol. 327, 919–923 (2003)
25. Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA 98, 4569–4574 (2001)
26. Lakhina, A., Byers, J. W., Crovella, M. & Xie, P. in INFOCOM 2003: Twenty-Second Annual Joint Conf. IEEE Computer and Communications Societies (ed. Bauer, F.) Vol. 1, 332–341 (IEEE, Piscataway, New Jersey, 2003)
27. Clauset, A. & Moore, C. Accuracy and scaling phenomena in Internet mapping. Phys. Rev. Lett. 94, 018701 (2005)
28. Martinez, N. D., Hawkins, B. A., Dawah, H. A. & Feifarek, B. P. Effects of sampling effort on characterization of food-web structure. Ecology 80, 1044–1055 (1999)
29. Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982)
30. Sales-Pardo, M., Guimerà, R., Moreira, A. A. & Amaral, L. A. N. Extracting the hierarchical organization of complex systems. Proc. Natl Acad. Sci. USA 104, 15224–15229 (2007)


Acknowledgements

We thank J. Dunne, M. Gastner, P. Holme, M. Huss, M. Porter, C. Shalizi and C. Wiggins for their help, and the Santa Fe Institute for its support. C.M. thanks the Center for the Study of Complex Systems at the University of Michigan for hospitality while some of this work was conducted.

Author information

Correspondence to Aaron Clauset.

Supplementary information

Supplementary Notes

This file contains Supplementary Notes, including the technical details of our hierarchical model and the methods used to fit it to empirical data. It also contains additional results on graph resampling and the prediction of missing links, and the algorithmic specifics of our experimental studies. (PDF, 123 kB)


About this article

Cite this article

Clauset, A., Moore, C. & Newman, M. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98–101 (2008). https://doi.org/10.1038/nature06830
