The observation that individuals tend to be friends with people who are similar to themselves, commonly known as homophily, is a prominent feature of social networks. While homophily describes a bias in attribute preferences for similar others, it gives limited attention to variability. Here, we observe that attribute preferences can exhibit variation beyond what can be explained by homophily. We call this excess variation monophily to describe the presence of individuals with extreme preferences for a particular attribute possibly unrelated to their own attribute. We observe that monophily can induce a similarity among friends-of-friends without requiring any similarity among friends. To simulate homophily and monophily in synthetic networks, we propose an overdispersed extension of the classical stochastic block model. We use this model to demonstrate how homophily-based methods for predicting attributes on social networks based on friends (that is, 'the company you keep') are fundamentally different from monophily-based methods based on friends-of-friends (that is, 'the company you’re kept in'). We place particular focus on predicting gender, where homophily can be weak or non-existent in practice. These findings offer an alternative perspective on network structure and prediction, complicating the already difficult task of protecting privacy on social networks.
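The contrast between friend-based and friends-of-friends-based prediction can be illustrated with a small simulation. The sketch below is not the authors' model specification: it assumes a two-class network in which each node draws its own linking propensity towards each class from a beta distribution (the source of overdispersion), and in which an edge forms with the average of the two endpoints' propensities. The function names, the `concentration` parameter and the edge rule are all hypothetical choices made for illustration, and the majority vote assumes every other node's label is observed.

```python
import random


def sample_overdispersed_sbm(n=300, p_in=0.05, p_out=0.05,
                             concentration=2.0, seed=0):
    """Sample a two-class graph with node-level variation in class preference.

    Illustrative assumption, not the paper's exact model: node i's propensity
    to link to class r is Beta-distributed with mean p_in (own class) or
    p_out (other class). A smaller `concentration` means more overdispersion.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]

    # theta[i][r]: node i's individual propensity to link to class r.
    theta = []
    for i in range(n):
        row = []
        for r in (0, 1):
            mean = p_in if r == labels[i] else p_out
            row.append(rng.betavariate(mean * concentration,
                                       (1.0 - mean) * concentration))
        theta.append(row)

    # Undirected edges: average the two endpoints' propensities (assumed rule).
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = 0.5 * (theta[i][labels[j]] + theta[j][labels[i]])
            if rng.random() < prob:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors, labels


def majority_vote_accuracy(neighbors, labels, hops):
    """Predict each node's class by majority vote over 1-hop or 2-hop walks.

    Nodes with tied (or no) votes are skipped rather than guessed.
    """
    correct = 0
    scored = 0
    for i, nbrs in enumerate(neighbors):
        votes = [0, 0]
        if hops == 1:
            for j in nbrs:
                votes[labels[j]] += 1
        else:
            # Count two-step walks i -> j -> k, excluding walks back to i.
            for j in nbrs:
                for k in neighbors[j]:
                    if k != i:
                        votes[labels[k]] += 1
        if votes[0] == votes[1]:
            continue
        scored += 1
        correct += (votes[1] > votes[0]) == (labels[i] == 1)
    return correct / scored if scored else float("nan")


if __name__ == "__main__":
    # p_in == p_out: no homophily on average, only node-level variation.
    graph, classes = sample_overdispersed_sbm()
    print("1-hop accuracy:", majority_vote_accuracy(graph, classes, hops=1))
    print("2-hop accuracy:", majority_vote_accuracy(graph, classes, hops=2))
```

Setting `p_in` equal to `p_out` removes homophily on average, so any gap between the two printed accuracies in a given run would come from the node-level variation alone. Raising `concentration` shrinks that variation and moves the sketch back towards an ordinary stochastic block model.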
We thank B. Fosdick, J. Kleinberg, I. Kloumann, D. Larremore, J. Nishimura, M. Porter, M. Salganik and S. Way for helpful comments. We thank attendees of the 2016 International Conference on Computational Social Science and the 2016 SIAM Workshop on Network Science for comments. This work was supported in part by the Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) programme, the Akiko Yamazaki and Jerry Yang Engineering Fellowship and a David Morgenthaler II Faculty Fellowship. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Altenburger, K.M., Ugander, J. Monophily in social networks introduces similarity among friends-of-friends. Nat Hum Behav 2, 284–290 (2018). https://doi.org/10.1038/s41562-018-0321-8