On the inadequacy of nominal assortativity for assessing homophily in networks

Nominal assortativity (or discrete assortativity) is widely used to characterize group mixing patterns and homophily in networks, enabling researchers to analyze how groups interact with one another. Here we demonstrate that the measure presents severe shortcomings when applied to networks with unequal group sizes and asymmetric mixing. We characterize these shortcomings analytically and use synthetic and empirical networks to show that nominal assortativity fails to account for group imbalance and asymmetric group interactions, thereby producing an inaccurate characterization of mixing patterns. We propose the adjusted nominal assortativity and show that this adjustment recovers the expected assortativity in networks with various levels of mixing. Furthermore, we propose an analytical method to assess asymmetric mixing by estimating the tendency of inter- and intra-group connectivities. Finally, we discuss how this approach enables uncovering hidden mixing patterns in real-world networks.


Introduction
Understanding how groups interact in networks is fundamental for uncovering mechanisms underlying diverse phenomena, from protein interactions to social communication (1)(2)(3). Such group-level interactions often generate mixing patterns in networks, commonly assessed with single-valued measures such as nominal assortativity (4,5). Though these measures help analyze group mixing concisely, they may be grounded on unrealistic assumptions about the network. The assortativity coefficient $r$ can range between $r_{\min}$ and $r_{\max}$, which depend on the edge counts in the network. This specifically implies that $r = 1$ is only achievable when the sum of the intra-group edge counts is equal to the total number of edges in the network. Conversely, $r = -1$ is attainable solely in cases where this sum is zero. Crucially, these bounds of $r$ depend on network attributes such as metadata assignment and degree sequence.
Here we demonstrate that nominal assortativity presents two fundamental inadequacies.
First, it overlooks the group-size imbalance, implicitly assuming that groups are relatively equal.
This assumption neglects that smaller groups have fewer possibilities to connect with themselves, misrepresenting mixing patterns in scenarios of group-size imbalance (see Fig. 1A).
Second, the measure consists of a uni-dimensional value, only characterizing symmetric group mixing (or an average mixing). This restriction ignores potential asymmetries in networks, thereby missing relevant mixing patterns (Fig. 1B). Both inadequacies are problematic, particularly when analyzing real-world data and in the presence of minorities.
In real-world networks, groups tend to have unequal sizes, and some groups (i.e., minority groups) might be much smaller than the largest group. For instance, women are underrepresented in STEM fields, such as Computer Science and Physics, making them a minority group in professional networks (16)(17)(18). When analyzing such networks and other imbalanced data sets, we must consider group sizes to estimate the likelihood of in-group mixing biases. Besides unequal group sizes, networks might display asymmetries in how groups interact. For example, in male-dominated scientific fields, established researchers could be primarily men due to historical first-mover advantages (18). Thus, senior men have the resources to drive their collaboration network, implying that the tendency for male-male collaboration may not be the same as female-female collaboration (19,20). In such settings, homophily is asymmetric, having different strengths for the minority and majority groups. These asymmetries, however, are lost when using a single-valued measure to characterize group mixing, such as nominal assortativity.
In this work, we demonstrate how nominal assortativity misses relevant mixing patterns in networks with unequal group sizes or asymmetric mixing, and we show how to tackle these shortcomings. First, we use generative network models with adjustable mixing parameters to show that nominal assortativity fails to recover the expected assortativity in synthetic networks.
We characterize this limitation analytically and numerically by examining the relationship between assortativity, group size, and asymmetric mixing. Second, we propose the adjusted nominal assortativity and show that this adjustment recovers the expected assortativity from synthetic networks. Third, we propose to assess asymmetric mixing in networks by estimating group-mixing tendencies using our analytical formulations. Finally, we discuss how our approach enables characterizing hidden mixing patterns in real-world networks.

Results
Nominal assortativity (or discrete assortativity) characterizes mixing patterns by assessing the significance of intra-group ties. To that end, this definition employs the $B \times B$ mixing matrix $e$ to account for group connectivity, where $B$ is the number of groups, and each matrix element $e_{ij}$ corresponds to the fraction of edges connecting nodes from group $i$ to nodes from group $j$.
The nominal assortativity measure is then defined as follows:
$$r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}, \tag{1}$$
where $a_i$ and $b_i$ are the fractions of edges that, respectively, begin and end at nodes from group $i$, defined as $a_i = \sum_j e_{ij}$ and $b_i = \sum_j e_{ji}$ (4). This definition produces an intuitive quantity that equals zero when groups lack intra- and inter-group tendencies (i.e., $e_{ii} = a_i b_i$). The quantity reaches its maximum $r = 1$ when intra-group ties dominate the network (i.e., $\sum_i e_{ii} = 1$) and becomes negative when inter-group ties are predominant.
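As an illustration of this definition, the following minimal sketch computes $r$ from a mixing matrix with NumPy. The function name `nominal_assortativity` and the edge counts are hypothetical and chosen only for this example; the input may be raw edge counts because the function normalizes them into edge fractions first.

```python
import numpy as np

def nominal_assortativity(e):
    """Nominal assortativity r of a B x B mixing matrix (counts or fractions)."""
    e = np.asarray(e, dtype=float)
    e = e / e.sum()          # normalize so entries are fractions of edges
    a = e.sum(axis=1)        # a_i = sum_j e_ij (edges starting in group i)
    b = e.sum(axis=0)        # b_i = sum_j e_ji (edges ending in group i)
    ab = float(a @ b)        # sum_i a_i b_i, the expectation without mixing bias
    return (np.trace(e) - ab) / (1.0 - ab)

# Illustrative edge counts between two groups (made-up numbers).
counts = np.array([[30, 10],
                   [10, 50]])
print(nominal_assortativity(counts))
```

The three reference cases in the text can be read off directly: a purely diagonal matrix gives $r = 1$, a matrix with $e_{ij} = a_i b_j$ gives $r = 0$, and a matrix with no intra-group edges gives a negative value.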

Nominal assortativity on networks with groups of unequal sizes
To examine how nominal assortativity represents mixing patterns, we use generative network models in which we have prior knowledge of the expected mixing value. We thus aim to evaluate assortativity's ability to recover this expected value. More precisely, we generate random networks using a model with a tunable group mixing parameter $h$ that corresponds to the probability of same-group nodes being connected, whereas its complement, $1 - h$, is the probability of inter-group ties (see Methods). Here, we focus on the case of two groups, $B = 2$, in which nodes possess a binary attribute (e.g., red/blue, male/female). The case of more than two groups is discussed in the supplementary materials. We examine networks with a fixed $h$ and varying group sizes, finding that nominal assortativity goes to zero as the minority group decreases in size (Fig. 1A). For example, when $h = 0.8$ (i.e., homophily), assortativity can vary from 0.6 to 0, depending on the proportional size of the minority group, despite fixed group mixing.
To investigate why nominal assortativity varies with the minority fraction, we turn to the analytical formulation of the assortativity. Let us use a more general notion of group mixing in which $h_{ii}$ denotes the intrinsic tendency of a node from group $i$ to connect to a node of the same group; its complement $h_{ij} = 1 - h_{ii}$ is the tendency of a node in group $i$ to connect to a node in group $j$. Therefore, in a random network, the probability of finding an edge between group $i$ and group $j$ is expressed as $p_{ij} = f_i f_j h_{ij}$, where $f_i$ corresponds to the proportional size of group $i$, implying that each mixing matrix element can be defined as
$$e_{ij} = \frac{f_i f_j h_{ij}}{\sum_{ij} f_i f_j h_{ij}},$$
where the denominator is a normalizing factor.
Thus, $\sum_i e_{ii}$ and $\sum_i a_i b_i$ can be expressed as follows:
$$\sum_i e_{ii} = \frac{f_0^2 h_{00} + f_1^2 h_{11}}{Z} \tag{2}$$
and
$$\sum_i a_i b_i = \frac{f_0^2 (f_0 h_{00} + f_1 h_{01})(f_0 h_{00} + f_1 h_{10}) + f_1^2 (f_0 h_{10} + f_1 h_{11})(f_0 h_{01} + f_1 h_{11})}{Z^2}, \tag{3}$$
where $Z = \sum_{ij} f_i f_j h_{ij} = f_0^2 h_{00} + f_0 f_1 (h_{01} + h_{10}) + f_1^2 h_{11}$, and 0 and 1 are the labels for the minority and majority group, respectively. Finally, inserting Eq. (2) and Eq. (3) into Eq. (1), the nominal assortativity can be written as:
$$r = \frac{Z \left(f_0^2 h_{00} + f_1^2 h_{11}\right) - A}{Z^2 - A}, \tag{4}$$
where $A$ denotes the numerator of Eq. (3). This equation reveals that nominal assortativity is a function of group sizes $f_0$ and $f_1$. We verify this group-size dependency by comparing our analytical formulation with the assortativity measured on synthetic networks, finding a perfect agreement between Eq. (4) and simulations (Fig. 2A-B). Our results confirm the group-size dependence and reveal that this dependence increases with smaller minority groups (Fig. 2C). In contrast, when groups have similar sizes, we observe, as expected, a linear relationship between group mixing $h$ and nominal assortativity. More precisely, when groups are equal in size, $f_0 = f_1 = 0.5$, Eq. (4) becomes
$$r = h_{00} + h_{11} - 1,$$
which reduces to $r = 2h - 1$ for symmetric mixing $h_{00} = h_{11} = h$.
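The group-size dependence can be explored numerically by building the model mixing matrix from $p_{ij} = f_i f_j h_{ij}$ and applying the definition of $r$ to it. The sketch below does this for two groups; the helper names `model_mixing_matrix` and `analytical_r` are hypothetical, and the code evaluates the definitions stated in the text rather than any particular closed-form expression.

```python
import numpy as np

def model_mixing_matrix(f0, h00, h11):
    """Mixing matrix e_ij = f_i f_j h_ij / sum_ij f_i f_j h_ij for two groups."""
    f = np.array([f0, 1.0 - f0])           # group 0 = minority, group 1 = majority
    h = np.array([[h00, 1.0 - h00],        # h_01 = 1 - h_00
                  [1.0 - h11, h11]])       # h_10 = 1 - h_11
    p = np.outer(f, f) * h                 # p_ij = f_i f_j h_ij
    return p / p.sum()                     # divide by the normalizing factor

def analytical_r(f0, h00, h11):
    """Nominal assortativity implied by the model mixing matrix."""
    e = model_mixing_matrix(f0, h00, h11)
    a, b = e.sum(axis=1), e.sum(axis=0)
    ab = float(a @ b)
    return (np.trace(e) - ab) / (1.0 - ab)

# Fixed homophily h = 0.8, shrinking minority fraction f0.
for f0 in (0.5, 0.3, 0.1, 0.01):
    print(f0, round(analytical_r(f0, 0.8, 0.8), 3))
```

With equal group sizes the value equals $2h - 1$, and it shrinks toward zero as $f_0$ decreases even though $h$ is held fixed.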

The adjusted nominal assortativity
Here we propose to adjust the nominal assortativity for group sizes by normalizing the elements of the mixing matrix. This approach accurately retrieves the expected assortativity in networks, enabling us to assess mixing patterns in imbalanced networks. To that end, we define the adjusted mixing matrix $e^{\star}$, which accounts for the network's pool of opportunities, namely, the fact that larger groups have more opportunities to connect and thus should be normalized by the group size. We define each element of the adjusted mixing matrix $e^{\star}$ to be
$$e^{\star}_{ij} = \frac{e_{ij} / (f_i f_j)}{\sum_{kl} e_{kl} / (f_k f_l)},$$
where $f_k$ corresponds to the proportional size of group $k$. This adjustment ensures that the elements of the mixing matrix only represent the mixing tendencies ($h$) that are relevant for measuring intrinsic homophily and assortativity, and not other factors. For instance, in the case of two groups, where the original mixing elements are $e_{00} \simeq f_0^2 h_{00}$ and $e_{11} \simeq f_1^2 h_{11}$, the adjusted elements of the matrix are expressed as $e^{\star}_{00} \simeq h_{00}$ and $e^{\star}_{11} \simeq h_{11}$.
Moreover, we define the adjusted nominal assortativity, $r_{\mathrm{adj}}$, as follows:
$$r_{\mathrm{adj}} = \frac{\sum_i e^{\star}_{ii} - \sum_i a^{\star}_i b^{\star}_i}{1 - \sum_i a^{\star}_i b^{\star}_i},$$
where $a^{\star}_i = \sum_j e^{\star}_{ij}$ and $b^{\star}_i = \sum_j e^{\star}_{ji}$. This adjustment considers the effects of group-size imbalance on the mixing matrix, leading to a consistent assessment of mixing patterns in imbalanced scenarios.
We verify the proposed measure by generating synthetic data with different group-imbalance and mixing scenarios. We examine networks generated with a fixed $h$ and varying group sizes, revealing that adjusted nominal assortativity accurately recovers the expected mixing independent of group sizes (Fig. 2D-E). Thus, the results show that the adjusted nominal assortativity has a linear relationship with group mixing $h$, regardless of $f_0$ and $f_1$, as expected (see Fig. 2F).
We find similar results for scale-free networks and three-group scenarios (see Supplementary Material). In sum, the adjusted nominal assortativity accounts for group sizes and the pool of opportunities, enabling us to assess group mixing preferences accurately.
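The adjustment can be sketched in a few lines. The code below assumes that the adjusted matrix is obtained by dividing each $e_{ij}$ by $f_i f_j$ and then renormalizing so that the entries sum to one; this reading follows the description of normalizing by group size, but the exact normalization is an assumption of this sketch. All function names are hypothetical.

```python
import numpy as np

def nominal_assortativity(e):
    """r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i)."""
    e = np.asarray(e, dtype=float)
    e = e / e.sum()
    ab = float(e.sum(axis=1) @ e.sum(axis=0))
    return (np.trace(e) - ab) / (1.0 - ab)

def adjusted_assortativity(e, f):
    """Assortativity of the mixing matrix after dividing e_ij by f_i f_j.

    Assumption: the adjusted matrix is renormalized to sum to one, which
    nominal_assortativity() does internally.
    """
    e = np.asarray(e, dtype=float)
    f = np.asarray(f, dtype=float)
    e_star = e / np.outer(f, f)            # remove the pool-of-opportunities factor
    return nominal_assortativity(e_star)

def model_mixing_matrix(f0, h00, h11):
    """Two-group model matrix with p_ij = f_i f_j h_ij (see the text)."""
    f = np.array([f0, 1.0 - f0])
    h = np.array([[h00, 1.0 - h00], [1.0 - h11, h11]])
    p = np.outer(f, f) * h
    return p / p.sum()

# Fixed h = 0.8: the unadjusted value depends on f0, the adjusted one does not.
for f0 in (0.5, 0.1, 0.01):
    e = model_mixing_matrix(f0, 0.8, 0.8)
    print(f0,
          round(nominal_assortativity(e), 3),
          round(adjusted_assortativity(e, [f0, 1.0 - f0]), 3))
```

Under this assumption the adjusted value for the two-group model is $h_{00} + h_{11} - 1$ for any $f_0$, which is the linear relationship with $h$ described above.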

Assessing group mixing in empirical networks
Next, we explore nominal assortativity in different real-world networks with unequal group sizes. Results show that nominal assortativity underestimates the mixing patterns compared to the adjusted nominal assortativity (see Table 1). We analyze social networks of academic collaboration and face-to-face interactions with annotated binary gender information (see Supplementary Material for detailed data descriptions) (22)(23)(24). In most cases, assortativity $r$ is lower than the adjusted assortativity $r_{\mathrm{adj}}$, especially in the cases of small minority groups. For example, in the collaborative coding platform GitHub, where women are only 6% of the network, nominal assortativity is $r = 0.04$, implying the absence of assortative collaboration; in contrast, the adjusted assortativity is $r_{\mathrm{adj}} = 0.16$, suggesting a potential gender assortativity. Similarly, nominal assortativity might mislead us to mistake changes in group sizes for changes in group mixing. For instance, in the collaboration network among computer scientists (DBLP), assortativity increases from $r = 0.04$ in 1980 to $r = 0.10$ in 2010, which could imply a possible change in group mixing over time. However, this change might be merely due to the growth of the minority group. The minority size almost doubled, from 11% to 21%, and the adjusted assortativity indicates a stable mixing around $r_{\mathrm{adj}} = 0.15$. Overall, these findings underscore the importance of accounting for group sizes when analyzing mixing patterns and the risks of ignoring group imbalance in networks.

Mixing patterns in networks with asymmetric mixing
A single measure of assortativity reduces information about the $B \times B$ mixing matrix into a single value, leading to a concise measure but potentially missing relevant asymmetries in mixing. The idea that a single summary statistic may not be representative of a dataset is, of course, not new and has been shown in prior works (25). More recently, Peel and colleagues showed the heterogeneity of the local assortativity (5), and Piraveenan et al. (26) showed the extent to which each node contributes to the measure of assortativity. Here, we pay special attention to the asymmetric nature of group mixing while assuming no heterogeneity at the node level.
To characterize $r$ and $r_{\mathrm{adj}}$ in asymmetric scenarios, we relax the assumption of $h_{00} = h_{11} = h$ and use our analytical formulation (Eq. (4)) to evaluate the nominal assortativity over the whole parameter space of $h_{00}$ and $h_{11}$ (see Fig. 3). We find that the adjusted nominal assortativity is consistent and independent of group size in asymmetric cases, whereas the unadjusted version is size-dependent. Both measures, however, might characterize contrasting mixing patterns with the same value. In particular, these measures might indicate an absence of inter- or intra-group mixing tendency despite significant group mixing. For instance, when $h_{00} = 0.8$ and $h_{11} = 0.2$, the minority group has a strong homophilic tendency, whereas the majority has a strong heterophilic tendency; yet, nominal assortativity equals zero, incorrectly suggesting a lack of assortative or disassortative patterns (Fig. 3).
To better understand the underlying reason for this misrepresentation, note that when $r = 0$, the numerator in Eq. (4) is zero, leading to the following equation:
$$\sum_i e_{ii} - \sum_i a_i b_i = 0.$$
Simplifying this equation by using the expression of Eq. (6), we find that $h_{00} = 1 - h_{11}$ satisfies this condition. In other words, in many cases when nominal assortativity reports a zero value (i.e., lack of any dis/assortative preferences), the group mixing tendencies could be widely different. These findings show that compressing the mixing matrix into a single value, such as assortativity, can hide relevant asymmetric mixing patterns that are present in networks.
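The claim that $h_{00} = 1 - h_{11}$ yields zero assortativity can be checked directly from the definitions. The following is a sketch of that check, assuming two groups with $f_0 + f_1 = 1$, $p_{ij} = f_i f_j h_{ij}$, and $e_{ij} = p_{ij} / Z$ with $Z = \sum_{ij} p_{ij}$; the symbol $Z$ is shorthand introduced for this sketch.

```latex
\begin{align*}
h_{00} = 1 - h_{11} &\;\Rightarrow\; h_{01} = h_{11}, \quad h_{10} = h_{00}, \\
Z &= f_0^2 h_{00} + f_0 f_1 (h_{00} + h_{11}) + f_1^2 h_{11}
   = (f_0 + f_1)(f_0 h_{00} + f_1 h_{11}) = f_0 h_{00} + f_1 h_{11}, \\
a_0 &= \frac{f_0 (f_0 h_{00} + f_1 h_{11})}{Z} = f_0, \qquad
a_1 = \frac{f_1 (f_0 h_{00} + f_1 h_{11})}{Z} = f_1, \\
b_0 &= \frac{f_0 h_{00} (f_0 + f_1)}{Z} = \frac{f_0 h_{00}}{Z}, \qquad
b_1 = \frac{f_1 h_{11} (f_0 + f_1)}{Z} = \frac{f_1 h_{11}}{Z}, \\
\sum_i a_i b_i &= \frac{f_0^2 h_{00} + f_1^2 h_{11}}{Z} = \sum_i e_{ii}
\;\Rightarrow\; r = 0 .
\end{align*}
```

The minority fraction $f_0$ cancels out of the condition, so the result holds for any group sizes, provided the network contains some inter-group edges so that the denominator of $r$ is nonzero.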
It is worth noting that paying attention to asymmetries in mixing patterns between groups is important in other applications, such as the emergence of core-periphery structures (27).

Assessing asymmetric mixing patterns in networks
In order to assess asymmetric mixing among groups, we propose to turn to the mixing probabilities in a network given an assumption of its generative process. For example, in the random homophilic networks described earlier (ER-Homophily), the diagonal of the mixing matrix can be expressed as:
$$e_{00} = \frac{f_0^2 h_{00}}{\sum_{ij} p_{ij}}, \qquad e_{11} = \frac{f_1^2 h_{11}}{\sum_{ij} p_{ij}},$$
which can be re-written as follows:
$$h_{00} = \frac{e_{00} \sum_{ij} p_{ij}}{f_0^2}, \qquad h_{11} = \frac{e_{11} \sum_{ij} p_{ij}}{f_1^2},$$
where
$$e_{00} = \frac{E_{00}}{E}, \qquad e_{11} = \frac{E_{11}}{E},$$
and $E$ is the total number of edges, $E_{00}$ and $E_{11}$ are the numbers of intra-group edges of the minority and majority groups, and $e_{00}$ and $e_{11}$ are the fractions of intra-group edges normalized by $E$. By combining the equations above, $\sum_{ij} p_{ij}$ can be expressed as:
$$\sum_{ij} p_{ij} = \frac{2 f_0 f_1}{1 - e_{00} - e_{11} + e_{00} f_1 / f_0 + e_{11} f_0 / f_1}.$$
By using Eq. (5) and Eq. (7), we can retrieve $h_{00}$ and $h_{11}$ from data, given that we know basic information about the network (i.e., $E$, $E_{00}$, $E_{11}$, and $f_0$).
We verify this method by generating networks with varying mixing parameters and comparing the estimated parameters with the ground truth in Supplementary Note 6. A similar methodology can be applied to scale-free networks, finding equivalent results (see Supplementary Note 6).
Though this approach requires prior knowledge about the underlying generative processes in networks, it is plausible to argue that many small-scale and large-scale social networks often fall into these two categories of topologically random or scale-free structure (28)(29)(30). Once the plausible topology is identified by examining the degree distribution, the appropriate formulation can be used to extract the group mixing asymmetries.
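For the random (ER-Homophily) case, the estimation step can be sketched as follows. The closed form for the normalizing factor used here is derived in this sketch from $p_{ij} = f_i f_j h_{ij}$ with $h_{01} = 1 - h_{00}$ and $h_{10} = 1 - h_{11}$; it is an assumption of this example and may differ in form from the expression used in the original derivation. The function name `estimate_mixing` and the edge counts are hypothetical.

```python
def estimate_mixing(E, E00, E11, f0):
    """Estimate (h00, h11) from edge counts under the two-group random model.

    E   : total number of edges
    E00 : number of minority-minority edges
    E11 : number of majority-majority edges
    f0  : minority fraction (0 < f0 < 1)
    """
    f1 = 1.0 - f0
    e00, e11 = E00 / E, E11 / E        # fractions of intra-group edges
    # Normalizing factor Z = sum_ij p_ij, obtained by substituting
    # h_ii = e_ii * Z / f_i**2 into
    # Z = f0^2 h00 + f1^2 h11 + f0 f1 (2 - h00 - h11) and solving for Z.
    Z = 2.0 * f0 * f1 / (1.0 - e00 - e11 + e00 * f1 / f0 + e11 * f0 / f1)
    h00 = e00 * Z / f0**2
    h11 = e11 * Z / f1**2
    return h00, h11

# Made-up counts consistent with f0 = 0.2, h00 = 0.8, h11 = 0.2.
print(estimate_mixing(E=1000, E00=100, E11=400, f0=0.2))
```

In this example the estimator returns values close to $h_{00} = 0.8$ and $h_{11} = 0.2$, an asymmetric pattern for which nominal assortativity would be zero.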

Discussion
Despite its popularity and relative accuracy in capturing homophily and assortative mixing in a variety of networks, nominal assortativity can produce distorted assessments of mixing patterns in networks with unequal groups and asymmetric mixing. In this work, we demonstrated this problem and proposed ways to tackle these limitations. Using generative network models with adjustable mixing, we show that nominal assortativity fails to estimate assortativity in networks accurately. Our results demonstrate (1) the need for accounting for group sizes in mixing analyses and (2) the inability of single-valued measures to capture asymmetries in networks.
To tackle these limitations, we develop two approaches to assess group mixing in networks.
First, we propose adjusted nominal assortativity to solve the group-size limitation, which accurately recovers the expected assortativity from networks. Our analysis of real-world networks reveals that nominal assortativity underestimates the strength of mixing patterns compared to the adjusted assortativity. Second, we propose to assess asymmetric mixing in networks by estimating the intra-group mixing probabilities, accounting for group size differences and other group-level topological features. It is worth mentioning that there are a variety of other segregation and assortativity measurements in the social network literature beyond the nominal assortativity. Future works should focus on comparing the sensitivity, equivalency, and compatibility of those measurements against each other and against baseline scenarios similar to this paper and the previous efforts (31).
Accurately measuring biases in group mixing in social networks is crucial because mixing biases affect perception of minorities (32), access to social capital (33), and algorithmic visibility (34), to name a few. Our work lays a novel foundation by proposing an accurate measure of assortativity that can be applied to a wide range of social networks. Better assessment of group-level tendencies and asymmetries in networks provides the means to understand how diverse groups interact, a fundamental step for uncovering mechanisms governing our social lives.

Random networks with group mixing
To analyze assortativity in networks, we use a model that incorporates group mixing and random tie formation in networks. In this model, an edge between two nodes depends on their group memberships via a stochastic process. The probability of a node from group $i$ establishing a tie with a node from group $j$ is denoted as $h_{ij}$. The probability of connecting to a node of the same group is $h_{ii}$, and the inter-group probability is its complement, so that $h_{ij} = 1 - h_{ii}$ and, likewise, $h_{ji} = 1 - h_{jj}$ (see Jupyter notebooks for more details).
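A minimal simulation in the spirit of this model can be sketched as follows. This is not the implementation from the notebooks mentioned above: it assumes directed ties, where each ordered pair of distinct nodes $(u, v)$ is connected independently with probability `density` times $h_{g_u g_v}$. The parameter `density` and all function names are hypothetical.

```python
import numpy as np

def generate_edges(n, f0, h00, h11, density=0.1, seed=0):
    """Sample a directed random network with two groups and mixing h_ij."""
    rng = np.random.default_rng(seed)
    # Group 0 (minority) with probability f0, group 1 (majority) otherwise.
    groups = (rng.random(n) >= f0).astype(int)
    h = np.array([[h00, 1.0 - h00],
                  [1.0 - h11, h11]])
    # Connection probability for every ordered pair (u, v).
    prob = density * h[groups[:, None], groups[None, :]]
    np.fill_diagonal(prob, 0.0)            # no self-loops
    src, dst = np.nonzero(rng.random((n, n)) < prob)
    return groups, src, dst

def mixing_matrix(groups, src, dst):
    """Fraction of edges going from group i to group j."""
    counts = np.zeros((2, 2))
    np.add.at(counts, (groups[src], groups[dst]), 1)
    return counts / counts.sum()

def nominal_assortativity(e):
    """r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i)."""
    e = np.asarray(e, dtype=float)
    e = e / e.sum()
    ab = float(e.sum(axis=1) @ e.sum(axis=0))
    return (np.trace(e) - ab) / (1.0 - ab)

groups, src, dst = generate_edges(n=1000, f0=0.2, h00=0.8, h11=0.8, seed=1)
e = mixing_matrix(groups, src, dst)
print("simulated r:", round(nominal_assortativity(e), 3))
```

The simulated value can be compared with the analytical one by evaluating the same assortativity function on the matrix with entries $f_i f_j h_{ij}$, using the realized group fractions.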

Fig. 1. Nominal assortativity misses relevant mixing patterns in networks. (a) Nominal assortativity shows different mixing values for networks that have the same group mixing, a misrepresentation due to group-size imbalance. We generate these networks using a model with a group-mixing parameter $h$ that corresponds to the probability of same-group nodes being connected; the generated networks are in a heterophilic regime with $h = 0.2$ (left) and a homophilic regime with $h = 0.8$ (right). These networks have a fixed group mixing $h$ but varying minority fraction $f_0$. In the plots, solid lines represent the analytical formulation, whereas dots are values from simulations. (b) Nominal assortativity is a single-valued measure and ignores asymmetries in group mixing. The measure might indicate zero assortativity for networks with significant asymmetric mixing patterns. For example, a network with homophilic and heterophilic groups might be characterized with nominal assortativity equal to zero.

Fig. 2. Adjusted assortativity retrieves the expected assortativity in networks with group-size imbalance. (a) Nominal assortativity has a group-size dependence that (b) underestimates the strength of group mixing in networks. (c) This underestimation is more severe in the presence of smaller minority groups. (d) We propose the adjusted assortativity that tackles this misrepresentation by adjusting for group sizes in the network. (e) The measure has a linear relationship with group mixing $h$ and (f) is independent of group sizes. In all plots, solid lines represent the analytical formulation, whereas dots are values from simulations.

Fig. 3. Unidimensional measures of assortativity overlook asymmetry in networks. (a) The nominal assortativity is dependent on group size in asymmetric cases, whereas (b) the adjusted version is size-independent. Yet both versions of assortativity ignore asymmetric mixing; they reduce a mixing matrix into a unidimensional value, producing a concise measure but missing asymmetry in networks. These measures might indicate an absence of mixing tendency despite significant group mixing. In particular, both measures are zero when $h_{00} = 1 - h_{11}$ (i.e., the dashed lines). In the plots, each heatmap displays the respective measures in varying mixing scenarios of minority mixing $h_{00}$ and majority mixing $h_{11}$ in the presence of minority sizes $f_0 = 0.05$ and $f_0 = 0.10$.
The group-size dependency occurs in other types of networks, such as scale-free networks. For instance, we simulate the Barabási-Albert homophily (BA-Homophily) model (21), which incorporates group mixing preferences with preferential attachment (21), and demonstrate that nominal assortativity is a function of group sizes on such networks and in scenarios involving more than two groups (see Supplementary Material). Overall, these findings imply that nominal assortativity is unadjusted for group sizes and introduces an artifactual bias into mixing analyses in imbalanced scenarios.

Table 1. Nominal assortativity and adjusted assortativity in empirical networks. $N$ denotes the number of nodes, $f_0$ is the minority fraction, $E$ is the total number of edges, and label 0 refers to the minority group and label 1 refers to the majority group.