Homophily influences ranking of minorities in social networks

Homophily can put minority groups at a disadvantage by restricting their ability to establish links with a majority group or to access novel information. Here, we show how this phenomenon can influence the ranking of minorities in examples of real-world networks with various levels of heterophily and homophily: sexual contacts, dating contacts, scientific collaborations, and scientific citations. We devise a social network model with tunable homophily and group sizes, and demonstrate how the degree ranking of nodes from the minority group in a network is a function of (i) relative group sizes and (ii) the presence or absence of homophilic behaviour. We provide analytical insights on how the ranking of the minority can be improved to ensure the representativeness of the group and correct for potential biases. Our work presents a foundation for assessing the impact of homophilic and heterophilic behaviour on minorities in social networks.


II. DEGREE EXPONENT IN EMPIRICAL NETWORKS
To evaluate our model against the data, we compare the exponent of the empirical degree distribution with the exponent produced by our model for the same empirical homophily and group-size values. We estimate the exponent of the empirical degree distribution with the maximum-likelihood fitting method [3,4]. The exponent of the degree distribution generated by the model is calculated analytically (see Methods). Figure 1 displays the degree distributions of the minority and majority groups in four empirical networks. The exponents of the degree distributions generated with our model do not fit perfectly in all cases, but they capture the trend and the right-skewed nature of the degree distribution [9]. Such deviations between the fitted degree distribution and the model have been observed in many prominent network models [4,11].

FIG. 1: Degree distributions for the majority and minority groups in four empirical networks. A) Sexual contact network with sex-workers (blue) and sex-buyers (orange). B) Online Swedish dating network (POK). C) Collaboration network with men (blue) and women (orange). D) American Physical Society (APS) citation network for two topics: Classical Statistical Mechanics (CSM, orange) and Quantum Statistical Mechanics (QSM, blue). The dashed line is the fit obtained with a maximum-likelihood estimate. The exponent of the fit (Fit) is compared with the analytical exponent derived from our model (Model). Our model generates a realistic degree exponent for empirical networks with various types of homophily and group sizes.
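The maximum-likelihood estimate of a degree exponent can be illustrated with a minimal sketch. It assumes the continuous power-law estimator, gamma = 1 + n / sum(ln(k_i / k_min)), which may differ in detail from the procedure of refs. [3,4] (for example, a discrete variant or an optimized k_min). The function names `mle_exponent` and `sample_power_law` are hypothetical and serve only this illustration.

```python
import math
import random


def mle_exponent(degrees, k_min=1.0):
    """Continuous maximum-likelihood estimate of gamma for p(k) ~ k^(-gamma), k >= k_min.

    Returns the estimate and its asymptotic standard error.
    """
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / k_min) for k in tail)
    if len(tail) < 2 or log_sum <= 0.0:
        raise ValueError("need at least two observations above k_min")
    gamma = 1.0 + len(tail) / log_sum
    std_err = (gamma - 1.0) / math.sqrt(len(tail))
    return gamma, std_err


def sample_power_law(n, gamma, k_min=1.0, seed=0):
    """Draw n synthetic values from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    synthetic = sample_power_law(20000, gamma=2.5)
    estimate, error = mle_exponent(synthetic)
    print(f"estimated exponent: {estimate:.3f} +/- {error:.3f}")
```

Applied to the degrees of one group at a time, such an estimate gives the per-group exponent that is compared with the analytical value from the model.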

III. DERIVATION OF THE PROBABILITY OF HAVING AN INTERNAL LINK
We focus on links internal to group $a$; the case of group $b$ is exactly symmetric. Let $m_{aa}$ be the probability of establishing a link between two nodes of group $a$ at each arriving node. By construction, it is also the probability of finding a link within group $a$. Given Eq. (4) of the main text, the probability of generating a link between two nodes of group $a$ is given by: Given the results of the derivation of the exponents of the degree distributions, we also have: Given the expression of $C$ in Eq. (12) of the main text and the expression of $\beta_a$ given by Eq. (15) of the main text, one can write that: and thus: Similarly, from $C$ and $K_a(t)$, one can express $(2 - C)$ starting from the differential equation for $K_b(t)$ (Eq. (4) of the main text): Using the expression for $\beta_b$ (Eq. (17) of the main text), one thus has: Finally, we find that $m_{aa}$ is given by: and $m_{bb}$ is given by:

We verify that these expressions give:
• for $h_{aa} = h_{bb} = 1$ (perfect homophily): $m_{aa} = f_a$ and $m_{bb} = f_b$;
• for $h_{aa} = h_{bb} = 0$ (perfect heterophily): $m_{aa} = m_{bb} = 0$;
• for $h_{aa} = h_{bb} = 0.5$ (perfect mixing): in this case we also have $\beta_a = \beta_b = 0.5$, and then $m_{aa} = f_a^2$, $m_{bb} = f_b^2$ and, as a consequence, $m_{ab} = 2 f_a f_b$.

Figure 2 shows the agreement between numerical and analytical results. For simplicity, we fix the value of the homophily parameter in one group and show the relation between the tunable homophily and the fraction of edges for the other group. The dashed lines correspond to the results of the analytical derivation. The values of the homophily parameter extracted from the simulations are shown by the dots.
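The three limiting cases of the derivation can be checked numerically with a short sketch. It rests on assumptions that are not stated in this section: that the rate equation of the main text has the form given in the docstring of `solve_c`, that $h_{ab} = 1 - h_{aa}$ and $h_{ba} = 1 - h_{bb}$, and that $K_a(t) = C m t$ and $K_b(t) = (2 - C) m t$. The closed form in `m_aa_closed_form` follows from these assumptions through $C = f_a / (1 - \beta_a)$ and $2 - C = f_b / (1 - \beta_b)$; it is a reconstruction and should be compared with the main text before use. All function names are hypothetical.

```python
def solve_c(f_a, h_aa, h_bb):
    """Solve for C in (0, 2) by bisection.

    Assumed stationarity condition (a reconstruction, not quoted from the paper):
      C = f_a * (1 + h_aa*C / (h_aa*C + h_ab*(2-C))) + f_b * h_ba*C / (h_ba*C + h_bb*(2-C))
    """
    f_b, h_ab, h_ba = 1.0 - f_a, 1.0 - h_aa, 1.0 - h_bb

    def residual(c):
        from_a = h_aa * c / (h_aa * c + h_ab * (2.0 - c))
        from_b = h_ba * c / (h_ba * c + h_bb * (2.0 - c))
        return f_a * (1.0 + from_a) + f_b * from_b - c

    lo, hi = 1e-9, 2.0 - 1e-9  # residual is positive at lo and negative at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def internal_link_probabilities(f_a, h_aa, h_bb):
    """Return C, beta_a, beta_b, m_aa, m_bb and m_ab under the assumptions above."""
    f_b, h_ab, h_ba = 1.0 - f_a, 1.0 - h_aa, 1.0 - h_bb
    c = solve_c(f_a, h_aa, h_bb)
    d_a = h_aa * c + h_ab * (2.0 - c)  # normalisation seen by an arriving a node
    d_b = h_ba * c + h_bb * (2.0 - c)  # normalisation seen by an arriving b node
    beta_a = f_a * h_aa / d_a + f_b * h_ba / d_b
    beta_b = f_b * h_bb / d_b + f_a * h_ab / d_a
    m_aa = f_a * h_aa * c / d_a
    m_bb = f_b * h_bb * (2.0 - c) / d_b
    return {"C": c, "beta_a": beta_a, "beta_b": beta_b,
            "m_aa": m_aa, "m_bb": m_bb, "m_ab": 1.0 - m_aa - m_bb}


def m_aa_closed_form(f_a, h_aa, beta_a, beta_b):
    """m_aa written with the growth exponents instead of C (reconstructed form)."""
    f_b, h_ab = 1.0 - f_a, 1.0 - h_aa
    numerator = f_a ** 2 * h_aa * (1.0 - beta_b)
    return numerator / (f_a * h_aa * (1.0 - beta_b) + f_b * h_ab * (1.0 - beta_a))


if __name__ == "__main__":
    for h in (1.0, 0.0, 0.5):
        print(h, internal_link_probabilities(0.2, h, h))
```

With $f_a = 0.2$, the three calls reproduce the perfect homophily, perfect heterophily and perfect mixing limits listed above.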
As the figure shows, when homophily is fixed at 0.5 for one group and the two groups have the same size (left panel), we observe, as expected, a sigmoid function for both groups. For the largest value of homophily ($h_{aa}, h_{bb} = 1$), the fraction of edges between nodes of the same group converges to the size of the group. As the size of the minority decreases, the gap between the fraction of edges for the minority (orange lines, with the majority homophily fixed at $h_{bb} = 0.5$) and for the majority (blue lines, with the minority homophily fixed at $h_{aa} = 0.5$) widens. When the group size is tuned and the homophily parameter of the minority is fixed, the majority gains an advantage by receiving links within itself, partly because of the increase in its degree exponent and the large difference in group sizes (blue lines). The analytical results are derived by estimating the expected homophily from the number of edges, and they are in excellent agreement with the numerical results.
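The numerical side of such a comparison can be sketched as a small growth simulation. This is an illustration under stated assumptions, not the simulation code behind Figure 2: each arriving node joins group $a$ with probability $f_a$, tries to attach to $m$ distinct existing nodes, and chooses targets with weight equal to the homophily between the two groups times the target degree. The small constant `eps` added to the degree is an assumed regulariser that lets zero-degree nodes be chosen at the start, and the function name is hypothetical.

```python
import random


def simulate_edge_fractions(n_nodes, m, f_a, h_aa, h_bb, seed=0, eps=1e-6):
    """Grow a two-group network and return the fractions of aa, bb and ab edges."""
    rng = random.Random(seed)
    # h[(group of arriving node, group of target node)]
    h = {("a", "a"): h_aa, ("a", "b"): 1.0 - h_aa,
         ("b", "b"): h_bb, ("b", "a"): 1.0 - h_bb}
    groups, degree = [], []
    counts = {"aa": 0, "bb": 0, "ab": 0}
    for new in range(n_nodes):
        g_new = "a" if rng.random() < f_a else "b"
        # Attachment weight: homophily times (degree + eps).
        weights = [h[(g_new, groups[i])] * (degree[i] + eps) for i in range(new)]
        targets = []
        for _ in range(min(m, new)):
            if sum(weights) <= 0.0:
                break  # no admissible target is left for this node
            t = rng.choices(range(new), weights=weights)[0]
            targets.append(t)
            weights[t] = 0.0  # targets are sampled without replacement
        groups.append(g_new)
        degree.append(0)
        for t in targets:
            degree[t] += 1
            degree[new] += 1
            pair = g_new + groups[t]
            counts["ab" if pair in ("ab", "ba") else pair] += 1
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


if __name__ == "__main__":
    # Fix the majority homophily at 0.5 and vary the minority homophily.
    for h_aa in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(h_aa, simulate_edge_fractions(1000, 2, 0.2, h_aa, 0.5))
```

The measured fractions play the role of the dots in the comparison; the run is quadratic in the number of nodes, which is acceptable for a few thousand nodes but not for large networks.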

IV. ADJUSTING THE ALGORITHMIC RANKING
The following analytical derivation determines the fraction of minority nodes in the top d% of degree ranks. The exact degree distribution for the minority and the majority is given by [2]: in which $\beta = 1/(\gamma - 1)$. Once normalized, it gives: Therefore, the probability of having a node with degree $k \geq K$ is given by: Thus, if there are $N_i$ nodes of each category $i$ in the whole network, the number of nodes with degree $k \geq K$ is given by: and the total number of nodes with degree $k \geq K$ is then: If we are interested in the top $d$ nodes, then there exists one $K$ such that: This equation can be solved numerically to find this $K$. The number of nodes of each category in the top $d$ nodes is then given by Eq. (13).
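The numerical solution for $K$ can be sketched as follows. The sketch does not use the exact distribution of ref. [2]; it assumes a continuous power-law tail, for which the fraction of nodes of group $i$ with degree $k \geq K$ is $(k_{min}/K)^{1/\beta_i}$, with the same minimum degree `k_min` in both groups. The function names are hypothetical.

```python
import math


def tail_count(K, n_nodes, beta, k_min=1.0):
    """Expected number of nodes with degree >= K for a continuous power-law tail.

    Uses P(k >= K) = (k_min / K) ** (1 / beta), i.e. exponent gamma - 1 = 1 / beta.
    """
    return n_nodes * (k_min / K) ** (1.0 / beta)


def top_d_threshold(d, sizes, betas, k_min=1.0):
    """Find K such that the expected number of nodes with degree >= K equals d."""
    if not 0 < d <= sum(sizes):
        raise ValueError("d must lie between 1 and the total number of nodes")

    def total(K):
        return sum(tail_count(K, n, b, k_min) for n, b in zip(sizes, betas))

    lo = hi = k_min
    while total(hi) > d:  # expand the bracket until the count drops below d
        hi *= 2.0
    for _ in range(200):  # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if total(mid) > d:
            lo = mid
        else:
            hi = mid
    return hi


def minority_fraction_in_top(d, n_min, n_maj, beta_min, beta_maj, k_min=1.0):
    """Expected share of minority nodes among the top d nodes by degree."""
    K = top_d_threshold(d, (n_min, n_maj), (beta_min, beta_maj), k_min)
    return tail_count(K, n_min, beta_min, k_min) / d


if __name__ == "__main__":
    # 20% minority; a smaller beta for the minority lowers its share in the top 100.
    print(minority_fraction_in_top(100, 200, 800, 0.40, 0.55))
```

When both groups share the same $\beta$, the minority share in the top $d$ equals its share of the population; a smaller $\beta$ for the minority pushes its share below that value.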
This calculation enables us to estimate the trend in Figure 3B of the original paper. Consequently, knowing the homophily and group sizes in the network, we can determine the exponent in Eq. (10) and predict the probability that minorities appear in the top d% from Eq. (15). Ranking algorithms can then adjust the degree rank so that the number of nodes from each group is proportional to the size of that group.
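One possible form of such an adjustment is sketched below. It is an assumed scheme for illustration, not a procedure specified in this text: each group receives a quota of the top $d$ positions proportional to its size (rounded with the largest-remainder rule), and the quota is filled with the highest-degree nodes of that group. The function name and the tuple layout `(node_id, group, degree)` are hypothetical.

```python
def proportional_top(nodes, d):
    """Select d nodes so that each group's share matches its share of the population.

    nodes: list of (node_id, group, degree) tuples.
    Returns the selected tuples sorted by decreasing degree.
    """
    if not 0 < d <= len(nodes):
        raise ValueError("d must lie between 1 and the number of nodes")
    by_group = {}
    for node in nodes:
        by_group.setdefault(node[1], []).append(node)

    # Exact proportional quotas, then largest-remainder rounding to integers.
    exact = {g: d * len(members) / len(nodes) for g, members in by_group.items()}
    quota = {g: int(x) for g, x in exact.items()}
    leftover = d - sum(quota.values())
    by_remainder = sorted(exact, key=lambda g: exact[g] - quota[g], reverse=True)
    for g in by_remainder[:leftover]:
        quota[g] += 1

    selected = []
    for g, members in by_group.items():
        members.sort(key=lambda node: node[2], reverse=True)
        selected.extend(members[:quota[g]])
    return sorted(selected, key=lambda node: node[2], reverse=True)


if __name__ == "__main__":
    nodes = [("a1", "a", 1), ("a2", "a", 2)]
    nodes += [(f"b{i}", "b", 10 + i) for i in range(8)]
    # A plain top-5 by degree contains no node of group a; the adjusted one contains one.
    print(proportional_top(nodes, 5))
```

In the toy example the minority holds 20% of the nodes and therefore receives one of the five top positions, although all of its degrees are lower than those of the majority.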
Here, we derive analytical results for the case in which the two groups, minority and majority, have unequal numbers of stubs when connecting to other nodes, denoted by $m_a$ and $m_b$, respectively.
Let $K_a(t)$ and $K_b(t)$ be the sums of the degrees of the nodes from groups $a$ and $b$, respectively. These quantities satisfy: since the overall growth of the network follows a Barabási-Albert process. Let us denote the relative group sizes by $f_a$ and $f_b$. For $m_a = m_b = m$ we recover the simple symmetric case derived before. The evolution of $K_a$ and $K_b$ is given in discrete time by: which in the limit $\Delta t \to 0$ gives:

We verify that for $h_{aa} = h_{bb} = 0$ and $h_{ab} = h_{ba} = 1$ (perfectly heterophilic network) these equations give: and thus for the evolution of the degree of a single node: which gives: Similarly, for $h_{aa} = h_{bb} = 1$ and $h_{ab} = h_{ba} = 0$ (perfectly homophilic network) we get: and thus for the evolution of the degree of a single node: which gives:

We now assume that $K_a(t)$ and $K_b(t)$ are linear functions of time, so that $K_a(t) = Ct$ and $K_b(t) = (2M - C)t$ given Eq. (16). Using Eq. (18) with $f_a = f$ and $f_b = 1 - f$, we thus have: So: This equation for $C$ is a polynomial of order 3 and can be solved numerically. We can then derive the evolution of the degree of a single node for both groups in the general case. Let us define: and For group $a$, we have: and thus: Similarly, for group $b$ we have: and thus: and:

This derivation gives insight into how the minority can improve its overall degree growth (expressed by $\beta$) by (i) increasing its lower limit of activity $m_a$ and (ii) increasing the asymmetric homophily ($h_{aa} > h_{bb}$).
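The numerical solution for $C$ and the resulting growth exponents can be sketched as follows. The equations in the code are a reconstruction under explicit assumptions, not a quotation of the derivation: an arriving node of group $a$ (probability $f$) brings $m_a$ links and one of group $b$ brings $m_b$ links, so that $M = f m_a + (1 - f) m_b$ and $K_a(t) + K_b(t) = 2Mt$; the attachment weights are those of the symmetric model with $h_{ab} = 1 - h_{aa}$ and $h_{ba} = 1 - h_{bb}$. Bisection is used instead of a cubic solver because the root of interest lies in $(0, 2M)$. The function name is hypothetical.

```python
def solve_unequal_stubs(f, m_a, m_b, h_aa, h_bb):
    """Return (C, beta_a, beta_b) for K_a(t) = C*t and K_b(t) = (2M - C)*t.

    Assumed stationarity condition (reconstructed):
      C = f*m_a*(1 + h_aa*C / (h_aa*C + h_ab*(2M - C)))
          + (1 - f)*m_b * h_ba*C / (h_ba*C + h_bb*(2M - C))
    """
    h_ab, h_ba = 1.0 - h_aa, 1.0 - h_bb
    big_m = f * m_a + (1.0 - f) * m_b  # mean number of stubs per arriving node

    def residual(c):
        from_a = h_aa * c / (h_aa * c + h_ab * (2.0 * big_m - c))
        from_b = h_ba * c / (h_ba * c + h_bb * (2.0 * big_m - c))
        return f * m_a * (1.0 + from_a) + (1.0 - f) * m_b * from_b - c

    lo, hi = 1e-9, 2.0 * big_m - 1e-9  # residual is positive at lo, negative at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)

    d_a = h_aa * c + h_ab * (2.0 * big_m - c)  # normalisation for an arriving a node
    d_b = h_ba * c + h_bb * (2.0 * big_m - c)  # normalisation for an arriving b node
    # Growth exponents of a single node, k(t) ~ t**beta, for each group.
    beta_a = f * m_a * h_aa / d_a + (1.0 - f) * m_b * h_ba / d_b
    beta_b = (1.0 - f) * m_b * h_bb / d_b + f * m_a * h_ab / d_a
    return c, beta_a, beta_b


if __name__ == "__main__":
    # Minority of 20% with m_a = 4 stubs, majority with m_b = 2 stubs.
    for h in (0.0, 0.5, 1.0):
        print(h, solve_unequal_stubs(0.2, 4, 2, h, h))
```

Under these assumptions the perfectly homophilic limit gives $\beta_a = \beta_b = 1/2$ and the perfectly heterophilic limit gives $C = M$, and setting $m_a = m_b = m$ rescales the symmetric solution by $m$.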