Monophily in social networks introduces similarity among friends-of-friends

Abstract

The observation that individuals tend to be friends with people who are similar to themselves, commonly known as homophily, is a prominent feature of social networks. While homophily describes a bias in attribute preferences for similar others, it gives limited attention to variability. Here, we observe that attribute preferences can exhibit variation beyond what can be explained by homophily. We call this excess variation monophily, describing the presence of individuals with extreme preferences for a particular attribute, possibly unrelated to their own attribute. We observe that monophily can induce a similarity among friends-of-friends without requiring any similarity among friends. To simulate homophily and monophily in synthetic networks, we propose an overdispersed extension of the classical stochastic block model (oSBM). We use this model to demonstrate how homophily-based methods for predicting attributes on social networks based on friends (that is, ‘the company you keep’) are fundamentally different from monophily-based methods based on friends-of-friends (that is, ‘the company you’re kept in’). We place particular focus on predicting gender, where homophily can be weak or non-existent in practice. These findings offer an alternative perspective on network structure and prediction, complicating the already difficult task of protecting privacy on social networks.
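The contrast between ‘the company you keep’ and ‘the company you’re kept in’ can be illustrated with a small simulation. The sketch below is hypothetical: the function names, the Beta(α, α) preference distribution and the product edge kernel are illustrative assumptions, not the oSBM as specified in the article. Each node's preference is drawn independently of its own class, so on average its friends say nothing about its class. However, two nodes that share a friend f are linked through f with weight proportional to E[p_f²] when they share a class and E[p_f(1 − p_f)] when they do not; with mean preference 1/2 these equal 1/4 + Var(p_f) and 1/4 − Var(p_f), so any spread in preferences makes friends-of-friends informative. Both classifiers score a node using only other nodes' labels and are compared by AUC.

```python
import numpy as np


def simulate_monophily_network(n=1000, mean_degree=20.0, alpha=0.2, seed=0):
    """Toy two-class network with monophily but no homophily in expectation.

    Hypothetical kernel (not the article's oSBM): node i draws a preference
    p_i ~ Beta(alpha, alpha) for class-1 partners, independently of its own
    class, and the pair (i, j) is linked with probability
    rho * w_i[c_j] * w_j[c_i], where w_i[1] = p_i and w_i[0] = 1 - p_i.
    Smaller alpha means more extreme, i.e. more overdispersed, preferences.
    """
    rng = np.random.default_rng(seed)
    classes = rng.integers(0, 2, size=n)             # binary attribute per node
    p = rng.beta(alpha, alpha, size=n)               # preference for class-1 friends
    w = np.column_stack([1.0 - p, p])                # w[i, k]: weight node i puts on class k
    rho = 4.0 * mean_degree / (n - 1)                # E[w_i[c_j] * w_j[c_i]] = 1/4; must stay <= 1
    pref = w[:, classes]                             # pref[i, j] = w_i[c_j]
    prob = rho * pref * pref.T                       # symmetric edge probabilities
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each unordered pair once
    adj = (upper | upper.T).astype(float)
    return adj, classes


def one_hop_scores(adj, classes):
    """Share of class-1 labels among friends ('the company you keep')."""
    deg = adj.sum(axis=1)
    num = adj @ classes
    return np.divide(num, deg, out=np.full(len(deg), 0.5), where=deg > 0)


def two_hop_scores(adj, classes):
    """Share of class-1 labels among friends-of-friends, weighted by 2-walk counts."""
    walks = adj @ adj
    np.fill_diagonal(walks, 0.0)                     # ignore walks that return to the ego
    total = walks.sum(axis=1)
    num = walks @ classes
    return np.divide(num, total, out=np.full(len(total), 0.5), where=total > 0)


def auc(scores, labels):
    """Probability that a random class-1 node outscores a random class-0 node."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


adj, classes = simulate_monophily_network()
auc_one = auc(one_hop_scores(adj, classes), classes)
auc_two = auc(two_hop_scores(adj, classes), classes)
print(f"one-hop AUC: {auc_one:.2f}, two-hop AUC: {auc_two:.2f}")
```

With the strongly overdispersed default (alpha=0.2), one would expect the one-hop AUC to sit near 0.5 and the two-hop AUC to be far higher; raising alpha shrinks the variance of the preferences and should narrow the gap.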

Fig. 1: Overdispersion in attribute preferences.
Fig. 2: Homophily and monophily across a population of friendship networks.
Fig. 3: Four different oSBMs and the associated performance of one-hop and two-hop classifiers.
Fig. 4: Predicting gender, political affiliation and terrorist group affiliation.
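Fig. 1 concerns overdispersion in attribute preferences, the excess variation the abstract calls monophily. One generic way to check for it, offered as a sketch and not necessarily the estimator used in the article, is a Pearson dispersion statistic against a homophily-only baseline: every node of class c shares one preference π_c, so a node with d friends has Binomial(d, π_c) friends of class 1. The function name pearson_dispersion and the simulated counts are hypothetical.

```python
import numpy as np


def pearson_dispersion(friend_counts, class1_counts, classes):
    """Pearson dispersion of per-node counts of class-1 friends.

    Baseline assumed for illustration (homophily only): a class-c node with d
    friends has Binomial(d, pi_c) class-1 friends, with pi_c shared by every
    node in class c. Values near 1 fit that baseline; values well above 1
    signal extra-binomial variation. Assumes 0 < pi_c < 1 in every class.
    """
    d = np.asarray(friend_counts, dtype=float)
    k = np.asarray(class1_counts, dtype=float)
    c = np.asarray(classes)
    keep = d > 0                                     # nodes without friends carry no information
    d, k, c = d[keep], k[keep], c[keep]
    labels = np.unique(c)
    chi2 = 0.0
    for label in labels:
        m = c == label
        pi = k[m].sum() / d[m].sum()                 # pooled class-level preference
        chi2 += np.sum((k[m] - d[m] * pi) ** 2 / (d[m] * pi * (1.0 - pi)))
    dof = len(d) - len(labels)                       # one estimated pi per class
    return chi2 / dof


rng = np.random.default_rng(1)
n = 2000
classes = rng.integers(0, 2, size=n)
degrees = rng.poisson(30, size=n) + 1

# Homophily-only data: every node in a class shares the same preference.
k_homophily = rng.binomial(degrees, np.where(classes == 1, 0.6, 0.4))
# Monophily-style data: each node draws its own preference, unrelated to its class.
k_monophily = rng.binomial(degrees, rng.beta(0.5, 0.5, size=n))

phi_homophily = pearson_dispersion(degrees, k_homophily, classes)
phi_monophily = pearson_dispersion(degrees, k_monophily, classes)
print(f"homophily-only: {phi_homophily:.2f}, monophily: {phi_monophily:.2f}")
```

The homophily-only counts should give a statistic close to 1, while the monophily-style counts, which follow a beta-binomial distribution, should give a value many times larger, even though both data sets have the same class-level averages up to the homophily shift.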

Acknowledgements

We thank B. Fosdick, J. Kleinberg, I. Kloumann, D. Larremore, J. Nishimura, M. Porter, M. Salganik and S. Way for helpful comments. We thank attendees of the 2016 International Conference on Computational Social Science and the 2016 SIAM Workshop on Network Science for comments. This work was supported in part by the Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) programme, the Akiko Yamazaki and Jerry Yang Engineering Fellowship and a David Morgenthaler II Faculty Fellowship. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

K.M.A. and J.U. designed and performed the research and wrote the manuscript.

Corresponding author

Correspondence to Johan Ugander.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Notes 1–4, Supplementary Figures 1–22, Supplementary Tables 1–3, Supplementary References 1–18

About this article

Cite this article

Altenburger, K.M., Ugander, J. Monophily in social networks introduces similarity among friends-of-friends. Nat Hum Behav 2, 284–290 (2018). https://doi.org/10.1038/s41562-018-0321-8
