Introduction

In interacting systems, an observed tie between two individuals can often be explained by the existence of groups or a hierarchical organization. For instance, in social networks, interactions between people can be explained by the communities the individuals belong to1,2. In animal interaction networks, animals fight or mate strategically based on some underlying notion of ranking perceived between them to determine their dominance in a hierarchy3. When modelling network datasets, one typically observes only the set of interactions, but communities and ranking are hidden variables that need to be learned from the data, problems referred to as community detection and ranking extraction. Often, practitioners consider only one of the two for the dataset at hand, expecting a clear contribution for either community structure or hierarchy in determining edge formation. However, there can be situations where this distinction is blurry, and it is not clear which of these two effects plays a bigger role in explaining the data. For instance, students may report friendship relationships based on the groups they belong to or based on some hidden notion of hierarchy between them, thus reporting what friendship they aspire to have instead. The problem is that in these two cases the input dataset, a directed and possibly weighted network, looks the same. It is a list of edges \(i \rightarrow j\) and their weight w, but one may not know what mechanism best explains the observed data. Unless a practitioner has a strong a priori expectation about how the system works, it is not clear how to distinguish whether the network was generated by community structure or by hierarchical organization. The question thus is how to learn this interplay between communities and hierarchies from the data in an automatic way.

A large variety of algorithms are available for extracting communities and ranking from networks, spanning from simple heuristics and deterministic approaches to probabilistic ones. Standard algorithms for ranking entities are based on spectral methods, e.g. PageRank4 and Eigenvector Centrality5. They are based on random walks on the network and output real-valued scores. A different family of approaches considers ordinal rankings. These are typically extracted by finding an optimal permutation of the nodes that minimizes some penalty function. Relevant examples are Minimum Violation Rank6,7, SerialRank8 and SyncRank9. Other approaches that consider real-valued scores extracted from pairwise preferences are those based on Random Utility Models10, such as the Bradley-Terry-Luce (BTL) model11,12. A different approach is that of SpringRank13, a physically inspired ranking algorithm that computes real-valued scores by minimizing the energy of a system of springs representing the directed observed interactions. In terms of community detection models, there are various traditional approaches like graph partitioning, spectral clustering, modularity-based algorithms or divisive algorithms14. In this work we focus on those based on probabilistic generative models15,16,17,18 like the Stochastic Block Model (SBM)19 and its variants. These have several advantages, including the possibility of sampling synthetic networks with a given community structure and predicting missing links. Most importantly, they allow for a probabilistic treatment, which is the approach we adopt here to tackle our problem.

In most cases, these algorithms are applied independently, i.e. one either extracts communities or ranking, and the practitioner decides a priori which model is most appropriate. There have been some attempts to consider both communities and ranking as hidden variables on networks, assuming some underlying interaction between the two mechanisms. For instance, Chen et al.20 combine clustering and ranking by first inferring non-overlapping groups using a variant of SBM and then retrieving the within-cluster popularity of nodes of different types within each group, which in turn influences the community they belong to. Here the assumption is that nodes belong to groups and there is a ranking of nodes within each group. A recent work has studied the ranking communities problem, addressing the detection of communities by ranking them using information flow techniques21. An additional way of mixing the concepts of community detection and hierarchy is hierarchical clustering. The main idea is that there exists a hierarchy of communities that can be organized via a tree structure22,23. Finally, we mention clustering algorithms used for ranking data, i.e. data that are intrinsically embedded with a hierarchical structure given as metadata24. All of these settings are fundamentally different from the problem considered here, in that they assume an intrinsic rank of communities or an intrinsic clustering of ranks. In this manuscript, instead, we assume that nodes interact mainly either because of community affinity or ranking, and we want our algorithm to learn the preferred mechanisms of each node automatically from the data. This is a relevant problem in networks where the two mechanisms coexist. For example, an individual can have a contact with another individual because of homophily or because of prestige. The former case can occur when there are some common attributes and preferences, so that individuals recognize them as part of the same market. The latter corresponds to cases in which the individuals perceive themselves as part of some league in terms of prestige, and they aim at connecting with someone in the same league or slightly above25. In these systems, models considering just one mechanism are likely to treat a subset of the interactions as noisy observations, or even interpret them in their own terms, leading to a distorted interpretation of the underlying patterns.

To address this problem, in this work we propose a probabilistic model capable of recognizing the community and ranking structures in a network with coexisting mechanisms and quantifying the extent to which each individual prefers one instead of the other. Our model considers latent variables encoding the division of nodes in clusters, the hierarchical organization of nodes and how every node prefers to interact. The generative model, as we define it, also allows us to address the problems of predicting missing links in the data and assigning a preferred interaction type to each node, i.e. community affinity or ranking similarity. We validate our model on synthetic data and showcase its applications on three real datasets where the impact of community and ranking mechanisms differs. We find that it is capable of correctly assigning to each node its preferred interaction mechanism (hierarchy or community) with high confidence, i.e. all the probabilities are close to zero or one. In addition, the coefficient representing the overall preferred mechanism is always close to the ground-truth value. All of this is achieved without losing accuracy on the edge prediction task, whose performance in the limit cases is close to that of the baseline methods.

Generating networks with coexisting community and hierarchical structures

In this work, we are interested in modelling networks with underlying coexisting community and hierarchical structures. We refer to the network mainly through its matrix representation, i.e. the adjacency matrix \(A = \{A_{ij}\}_{i,j=1}^N\), where N is the number of nodes. The entry \(A_{ij} \in {\mathbb {N}}\) represents the number of directed interactions \(i \rightarrow j\) from node i to node j. Each interaction can be either due to affinity between nodes (community) or competition between them (hierarchy). We are interested in scenarios where these two mechanisms coexist, and the interaction type is not known in advance. The goal is thus to observe a network and distinguish edges based on which of these two mechanisms is more likely to explain the interaction. To this end, we assume nodes to be of two types: those that predominantly interact through community, and those that predominantly interact through hierarchy. The intuition is that nodes with the same preference are more likely to interact, inducing the edge type (community or hierarchy). We further assume that the probability of a heterogeneous interaction between nodes of two different types is not null and can be considered as a third edge type. From a generative modelling perspective, this scenario can be understood as first drawing latent labels on nodes, corresponding to node types, and then drawing interactions between nodes from a distribution that depends on their types, with its mean parameterized accordingly. Formally, the generative model is:

$$\begin{aligned} \sigma _{i}, \sigma _{j}\sim & {} \text {Be}(\mu ) \end{aligned}$$
(1)
$$\begin{aligned} A_{ij}\sim & {} {\left\{ \begin{array}{ll} \text {Pois}(A_{ij};S_{ij})^{\sigma _i } \; \text {Pois}(A_{ij};M_{ij})^{1-\sigma _i} &{} \text {if} \quad \sigma _{i}=\sigma _{j} \quad {(\text{in-group interaction})} \\ \text {Pois}(A_{ij};\delta _0) &{} \text {if} \quad \sigma _{i}\ne \sigma _{j} \quad {(\text{out-group interaction})} \end{array}\right. } \quad , \end{aligned}$$
(2)

where \(\sigma _{i} \in \left\{ 0,1\right\}\) represents the node type, \(\mu \in \left[ 0,1\right]\) its prior, and \(\delta _0 \ge 0\) is a parameter that controls the density of edges between nodes of different type (typically small). The parameters \(M_{ij}\) and \(S_{ij}\) determine the community and hierarchy mechanisms, respectively. The procedure is repeated for each edge \(i \rightarrow j\), since we assume conditional independence of the \(A_{ij}\) given the latent random variables. The model of Eqs. (1), (2) leads to the following network likelihood distribution

$$\begin{aligned} P(A | M,S, \sigma , \delta _0) = \prod _{ij} \left[ \text {Pois}(A_{ij};S_{ij})^{\sigma _i} \; \text {Pois}(A_{ij};M_{ij})^{1-\sigma _i} \right] ^{\delta _{\sigma _i\sigma _j}} \, \left[ \text {Pois}(A_{ij};\delta _0) \right] ^{1-\delta _{\sigma _i\sigma _j}} \;. \end{aligned}$$
(3)

Notice that with this parameterization we obtain that an edge type random variable \(\delta _{\sigma _i\sigma _j} \in \left\{ 0,1\right\}\) can be naturally defined in terms of \(\sigma\) using \(\delta _{\sigma _i\sigma _j}=2 \sigma _{i}\sigma _{j}-\sigma _{i}-\sigma _{j}+1\). It is Bernoulli distributed with parameter \(\mu ^2 + (1-\mu )^2 = 1 - 2\mu (1-\mu )\).
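As a quick sanity check of the two identities above, the following minimal sketch enumerates the four possible label pairs. The function names `edge_type` and `prob_same_type` are illustrative and not part of the original model code.

```python
from itertools import product

def edge_type(sigma_i: int, sigma_j: int) -> int:
    """Edge-type indicator: 1 if the two node types match, 0 otherwise."""
    return 2 * sigma_i * sigma_j - sigma_i - sigma_j + 1

def prob_same_type(mu: float) -> float:
    """P(delta = 1) when sigma_i and sigma_j are independent Be(mu) draws."""
    total = 0.0
    for si, sj in product((0, 1), repeat=2):
        p = (mu if si else 1 - mu) * (mu if sj else 1 - mu)
        total += p * edge_type(si, sj)
    return total

# The polynomial agrees with the Kronecker delta on all four label pairs.
truth_table = {(si, sj): edge_type(si, sj) for si, sj in product((0, 1), repeat=2)}
```

The enumeration makes explicit that the Bernoulli parameter is simply the probability that two independently drawn node types coincide.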

To model community interactions parametrized by \(M_{ij}\) we use MultiTensor18 (MT), a mixed-membership variant of the SBM. Each node of the network belongs to a community to an extent represented by two membership vectors: \(u_i = [u_{ik}]\) determines how much i belongs to the community k considering the amount of out-going edges; \(v_i = [v_{ik}]\) only considers in-coming edges. An affinity matrix \(w = [w_{kh}]\) encodes the density of edges between nodes in different communities. Note that all these quantities are positive but not necessarily normalized. These elements are combined in the expected number of community interactions as \(M_{ij} = \sum _{k,h=1}^K u_{ik} v_{jh} w_{kh}\). This definition results in interactions more likely to exist between nodes with compatible community structure.
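The expression for \(M_{ij}\) is a product of three matrices, which the sketch below makes explicit. It assumes NumPy; the helper name `community_mean` and the toy values of u, v and w are illustrative only.

```python
import numpy as np

def community_mean(u: np.ndarray, v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """M[i, j] = sum_{k,h} u[i, k] * w[k, h] * v[j, h]."""
    return u @ w @ v.T

# Toy example (illustrative values): 4 nodes, K = 2 communities.
u = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
v = u.copy()                      # same out-going and in-coming memberships
w = np.array([[2.0, 0.1],
              [0.1, 2.0]])        # assortative affinity matrix
M = community_mean(u, v, w)
```

With an assortative w, pairs in the same community (e.g. nodes 0 and 1) receive a much larger expected count than pairs in different communities (e.g. nodes 0 and 2), and the mixed-membership node 3 sits in between.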

To model the hierarchical interactions parameterized by \(S_{ij}\) we use SpringRank13 (SR), a model that associates a score \(s_{i} \in {\mathbb {R}}\) to each node and an interaction energy \(\frac{\beta }{2}(s_i - s_j - 1)^2\) to each edge \(i \rightarrow j\) which regulates the probability of a hierarchical interaction as a Boltzmann weight. Here \(\beta\) is a hyperparameter that controls the strength of the hierarchy. These elements are combined in the expected number of hierarchical interactions as \(S_{ij} = c \exp \left[ -\frac{\beta }{2}(s_i - s_j - 1)^2\right]\), where c controls for network sparsity. This definition results in interactions more likely to exist between nodes with similar scores, i.e. close in rank.
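The expression for \(S_{ij}\) can be evaluated for all pairs at once by broadcasting. The sketch below assumes NumPy; the helper name `hierarchy_mean` and the toy scores are illustrative.

```python
import numpy as np

def hierarchy_mean(s: np.ndarray, beta: float, c: float) -> np.ndarray:
    """S[i, j] = c * exp(-beta / 2 * (s[i] - s[j] - 1)^2)."""
    diff = s[:, None] - s[None, :] - 1.0
    return c * np.exp(-0.5 * beta * diff ** 2)

s = np.array([2.0, 1.0, 0.0])     # illustrative scores
S = hierarchy_mean(s, beta=5.0, c=1.0)
```

By construction the expected count is largest when \(s_i - s_j = 1\) and decays as a Gaussian in the distance from that value, so the matrix is not symmetric: here `S[0, 1]` equals c while `S[1, 0]` is strongly suppressed.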

We refer to our model as Xor. It is parameterized by \(\theta =(u,v,w,s,c,\delta _0,\mu )\) and it models the network likelihood distribution \(P(A|\theta ,\sigma )\) as in Eq. (3). A graphical representation of the generative model is provided in Fig. 1, together with a toy example of graph realization. Notice that when \(\sigma\) is the all-zeros or the all-ones vector, Xor reduces to MultiTensor or to SpringRank, respectively.
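The generative process of Eqs. (1) and (2) can be sketched in a few lines, taking the mean matrices M and S as given. This is a minimal illustration assuming NumPy, not the authors' implementation; the function name `sample_xor` and the removal of self-loops are assumptions of this sketch.

```python
import numpy as np

def sample_xor(M, S, mu, delta0, rng):
    """Draw node types and an adjacency matrix following Eqs. (1)-(2)."""
    n = M.shape[0]
    sigma = rng.binomial(1, mu, size=n)          # Eq. (1): 1 = hierarchy, 0 = community
    same = sigma[:, None] == sigma[None, :]      # edge-type indicator delta
    # In-group pairs use S (sigma_i = 1) or M (sigma_i = 0); out-group pairs use delta0.
    in_group_mean = np.where(sigma[:, None] == 1, S, M)
    mean = np.where(same, in_group_mean, delta0)
    A = rng.poisson(mean)                        # Eq. (2): independent Poisson counts
    np.fill_diagonal(A, 0)                       # assumption of this sketch: no self-loops
    return sigma, A

rng = np.random.default_rng(0)
n = 6
M = np.full((n, n), 0.8)                         # illustrative constant means
S = np.full((n, n), 0.3)
sigma, A = sample_xor(M, S, mu=0.5, delta0=0.01, rng=rng)
```

Setting `mu` to 0 or 1 makes every pair in-group, so that all edges are drawn from the community or the hierarchy mean alone.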

Figure 1

Model visualization. (a) Graphical model: the entry of the adjacency matrix \(A_{ij}\) is determined by the community-related latent variables uvw (orange), by the ranking-related ones sc (green) and by the out-group interaction parameter \(\delta _0\) (blue), depending on the values taken by the node type latent variables \(\sigma _{i},\sigma _{j}\). E denotes the set of network directed edges. (b) Example of possible realization of the model: orange nodes interact mainly via community, i.e. \(\sigma _{i}=0\), green ones via hierarchy, i.e. \(\sigma _{i}=1\). Orange and green edges are interactions between nodes of the same type (matching node color), while blue edges are interactions between nodes of different types.

Inference

Given a network adjacency matrix A, we want to infer the parameters \(\theta\) and the node labels \(\sigma\) that best explain the observed data. To this end, we aim at maximizing \(P(\theta | A) = \sum _{\sigma }P(\sigma ,\theta | A)\), i.e. we seek the maximum a posteriori estimate of \(\theta\). For convenience we maximize its logarithm instead, as the maxima coincide. We then take a variational approach by using Jensen’s inequality

$$\begin{aligned} \log P(\theta | A) = \log \sum _{\sigma }P(\sigma ,\theta | A) \ge \sum _{\sigma } q(\sigma ) \log \frac{P(\sigma ,\theta | A)}{q(\sigma )} =: {\mathcal {L}}(q,\theta ) \;, \end{aligned}$$
(4)

where \(q(\sigma )\) is a variational distribution over the node labels. This formulation turns the problem into a maximization of the function \({\mathcal {L}}(q,\theta )\) with respect to \(\theta\) and q. In fact, since \(q(\sigma )\) must sum to one, the exact equality between the second and third term in Eq. (4) is achieved when \(q(\sigma )\) is equal to the posterior \(P(\sigma |\theta ,A)\). However, this posterior may not be analytically accessible as the normalization is not tractable. In fact we have

$$\begin{aligned}&P(\sigma |\theta ,A) \propto P(\sigma , A | \theta )\nonumber \\&\quad = \prod _i \mu ^{\sigma _i}(1-\mu )^{1-\sigma _i} \prod _{ij} \, \text {Pois}(A_{ij};S_{ij})^{\sigma _i\delta _{\sigma _i\sigma _j}} \text {Pois}(A_{ij};M_{ij})^{(1-\sigma _i)\delta _{\sigma _i\sigma _j}} \, \text {Pois}(A_{ij};\delta _0)^{1-\delta _{\sigma _i\sigma _j}} \quad , \end{aligned}$$
(5)

and this cannot be simply recast into a well-known probability distribution in \(\sigma\) (e.g. a fully factorized Bernoulli distribution). A careful reader will recognize that \(P(\sigma |\theta ,A)\) is equivalent to an Ising model with unitary inverse temperature and Hamiltonian:

$$\begin{aligned}&H_{A}(s|J,h) = \sum _{i,j}J_{ij}s_{i}s_{j}+\sum _{i}h_{i}s_i\;, \end{aligned}$$
(6)
$$\begin{aligned}&J_{ij} = \frac{\log \text {Pois}(A_{ij};S_{ij})+\log \text {Pois}(A_{ij};M_{ij})-2\,\log \text {Pois}(A_{ij};\delta _0)}{4}\;, \end{aligned}$$
(7)
$$\begin{aligned} h_{i} = \frac{1}{4}\sum _{j} \left( \log \text {Pois}(A_{ij};S_{ij}) + \log \text {Pois}(A_{ji};S_{ji}) - \log \text {Pois}(A_{ij};M_{ij}) - \log \text {Pois}(A_{ji};M_{ji}) \right) + \frac{1}{2}\left( \log \mu - \log (1-\mu ) \right) \;, \end{aligned}$$
(8)

where \(s_i \in \left\{ \pm 1\right\} , \ s(\sigma )= 2\, \sigma -1\) and the couplings J are asymmetric, see S2.
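The mapping of Eqs. (6) to (8) can be checked numerically on a small network. The sketch below, assuming NumPy, verifies that the quadratic form \(\sum _{i,j}J_{ij}s_{i}s_{j}+\sum _{i}h_{i}s_i\) differs from \(\log P(\sigma , A | \theta )\) of Eq. (5) only by a constant that does not depend on \(\sigma\); the overall sign attached to the Hamiltonian is a matter of convention for the Boltzmann weight. All function names and the random toy inputs are illustrative.

```python
import math
from itertools import product
import numpy as np

def log_pois(k, lam):
    """Elementwise log Poisson pmf: k * log(lam) - lam - log(k!)."""
    log_fact = np.vectorize(math.lgamma)(k + 1.0)
    return k * np.log(lam) - lam - log_fact

def ising_parameters(A, M, S, delta0, mu):
    """Couplings J (Eq. 7) and fields h (Eq. 8)."""
    a, b, c = log_pois(A, S), log_pois(A, M), log_pois(A, delta0)
    J = (a + b - 2.0 * c) / 4.0
    h = (a + a.T - b - b.T).sum(axis=1) / 4.0 + 0.5 * (math.log(mu) - math.log(1 - mu))
    return J, h

def log_joint(sigma, A, M, S, delta0, mu):
    """log P(sigma, A | theta) as written in Eq. (5)."""
    a, b, c = log_pois(A, S), log_pois(A, M), log_pois(A, delta0)
    same = (sigma[:, None] == sigma[None, :]).astype(float)
    hier = sigma[:, None] * same          # sigma_i * delta
    comm = (1 - sigma[:, None]) * same    # (1 - sigma_i) * delta
    prior = np.sum(sigma * math.log(mu) + (1 - sigma) * math.log(1 - mu))
    return prior + np.sum(hier * a + comm * b + (1 - same) * c)

def quadratic_form(sigma, J, h):
    """sum_ij J_ij s_i s_j + sum_i h_i s_i with s = 2 * sigma - 1."""
    s = 2 * sigma - 1
    return s @ J @ s + h @ s

rng = np.random.default_rng(0)
n = 4
A = rng.poisson(1.0, size=(n, n))
M = rng.uniform(0.5, 2.0, size=(n, n))
S = rng.uniform(0.5, 2.0, size=(n, n))
J, h = ising_parameters(A, M, S, delta0=0.1, mu=0.3)

# The gap should be the same number for all 2^n label configurations.
gaps = [log_joint(np.array(sig), A, M, S, 0.1, 0.3) - quadratic_form(np.array(sig), J, h)
        for sig in product((0, 1), repeat=n)]
```

The check relies only on the identities \(\sigma _i = (1+s_i)/2\) and \(\delta _{\sigma _i\sigma _j} = (1+s_i s_j)/2\).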

To obtain a tractable expression for the variational distribution that estimates \(P(\sigma |\theta ,A)\), we use a mean-field approximation \(q(\sigma ) = \prod _i q_i(\sigma _i)\) assuming \(q_i(\sigma _i) = \text {Be}(\sigma _i; Q_i)\). The goal is to find values of \(\left\{ Q_{i}\right\} _{i}\) such that the Kullback-Leibler divergence between the approximate posterior \(q(\sigma )\) and the true posterior \(P(\sigma |\theta ,A)\) is minimized26,27. Noting that the Hamiltonian in Eq. (6) corresponds to \(- \log P(\sigma ,A | \theta )\), the maximization of \({\mathcal {L}}(q,\theta )\) with respect to q is equivalent to minimizing a variational free energy \(F(q,\theta )\) defined as:

$$\begin{aligned} F(q,\theta ) = \sum _{\sigma } q(\sigma ) \,H_{A}(s(\sigma )|J(\theta ),h(\theta )) - S(q) = - {\mathcal {L}}(q,\theta ) + \log P(\theta )\;, \end{aligned}$$
(9)

with the first term being the internal energy of q and S the entropy function of the product of Bernoulli distributions \(q(\sigma )\).

By performing this minimization with respect to q, we obtain the optimal parameters included in Algorithm 1, i.e. Eqs. (11) to (13); see S3 for detailed derivations. This result can also be obtained using the standard self-consistency equation for an Ising model \(\sigma _{i} = \tanh \left( h_{i}+\sum _{j}J_{ij}\sigma _{j}\right)\) using \(J_{ij}\) and \(h_{i}\) as in Eqs. (7) and (8). One can in principle use alternative approximations more complex than mean-field26, for instance the Bethe approximation, at the cost of increasing computational complexity. This is left for future work.
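To illustrate the self-consistency route, the sketch below runs a generic damped naive mean-field iteration for given couplings J and fields h, assuming NumPy and a Boltzmann weight \(\exp (\sum _{i,j}J_{ij}s_is_j + \sum _i h_i s_i)\). It is not the update of Algorithm 1: the symmetrization \(J + J^{T}\) (each spin enters both \(J_{ij}\) and \(J_{ji}\) terms), the damping factor and the fixed iteration count are choices made for this sketch, and the function name `naive_mean_field` is illustrative.

```python
import numpy as np

def naive_mean_field(J, h, n_iter=200, damping=0.5):
    """Iterate m_i = tanh(h_i + sum_{j != i} (J_ij + J_ji) m_j); return Q_i = (1 + m_i) / 2."""
    K = J + J.T                 # effective symmetric coupling felt by each spin
    np.fill_diagonal(K, 0.0)    # s_i^2 = 1, so diagonal terms are constants
    m = np.zeros_like(h)
    for _ in range(n_iter):
        m = damping * m + (1 - damping) * np.tanh(h + K @ m)
    return 0.5 * (1.0 + m)      # magnetization mapped back to a Bernoulli parameter

# Without couplings, the field from the prior alone returns Q_i = mu.
mu = 0.3
h_prior = np.full(5, 0.5 * np.log(mu / (1 - mu)))
Q_prior = naive_mean_field(np.zeros((5, 5)), h_prior)

# Positive couplings and a positive field push every node towards sigma_i = 1.
Q_coupled = naive_mean_field(np.full((3, 3), 0.5), np.full(3, 0.1))
```

The first example recovers the prior: with no evidence from the network, the field \(\frac{1}{2}(\log \mu - \log (1-\mu ))\) alone gives \(Q_i = \mu\).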

The values of \(Q_{i}\) are also point-estimates for the variables \(\sigma _{i}\), as for a Bernoulli distribution \({\mathbb {E}}_{q}[\sigma _{i}]=Q_{i}\). Differentiating \({\mathcal {L}}(q,\theta )\) with respect to \(\theta\) and setting this to zero gives the updates for the parameters \(\theta\). The full derivation is reported in S4, while the results can be seen in Algorithm 1, where we show the overall EM algorithmic routine. The algorithm does not guarantee convergence to the global maximum of the variational log-likelihood, but only to a local one. In practice, we perform different runs with different random initializations of the inputs and select the one with the best value of \({\mathcal {L}}(q,\theta )\).

The computational complexity per iteration scales as \(O(E K^2 + N^2)\), where E is the total number of directed edges. In most applications, K is much smaller than E. For sparse networks, as is often the case for real datasets, \(E \propto N\). Hence, we have a complexity that is dominated by \(O(N^2)\). This contribution comes from terms containing \(\tilde{Q}_{ij} =Q_{i}Q_{j}\) that are not also multiplied by \(A_{ij}\), i.e. terms in the denominators of the updates in Algorithm 1. The matrix \(\tilde{Q}=\left[ \tilde{Q}_{ij}\right]\) is a dense object and was not present in the updates of MultiTensor, whose computational complexity is \(O(E\, K^{2})\), nor in the updates of SpringRank, whose complexity is that required to solve a sparse linear system. This may make it prohibitive to run our model on large systems. In these cases, one can consider approximating \(\tilde{Q}\), e.g. by batch sampling of pairs (ij) as done in machine learning applications28. We do not explore this here.

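Algorithm 1: EM inference routine for Xor (variational updates for Q, Eqs. (11) to (13), and updates for the parameters \(\theta\)).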

In the model we did not specify any prior for \(\theta\). An alternative is to impose exponential priors for each \(u_{ik}, v_{ik}\), independent and identically distributed with parameters \(\lambda _u, \lambda _v\). This results in an \(L_{1}\)-regularized \({\mathcal {L}}(q,\theta )\) that enforces sparse membership vectors and an overall contribution of the community mechanism close to \(||w||_{max}\). To control the growth of the value \(||w||_{max}\) we can impose an exponential prior on w as well, parametrized by \(\lambda _w\). The regularizer may prevent the SBM from overfitting. This might be particularly relevant in the case of a block structure induced on the network by different leagues in the hierarchical organization of the nodes. The prior results in small modifications of the updates for the uvw, which are reported in Eqs. (14) and (15). We do not set a prior on s since in previous studies13 it was shown that adding a Gaussian prior may not necessarily lead to better prediction performance.
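For concreteness, the step from an exponential prior to an \(L_1\) penalty can be written out as follows, assuming the standard density \(\lambda _u e^{-\lambda _u x}\) for \(x \ge 0\) and N nodes with K communities:

$$\begin{aligned} \log P(u) = \sum _{i,k} \log \left( \lambda _u \, e^{-\lambda _u u_{ik}} \right) = NK \log \lambda _u - \lambda _u \sum _{i,k} u_{ik} \;. \end{aligned}$$

Since \(u_{ik} \ge 0\), the last sum equals \(||u||_{1}\), so adding \(\log P(u)\) to the objective subtracts \(\lambda _u ||u||_{1}\) up to a constant that does not depend on u. The same argument applies to v and w with \(\lambda _v\) and \(\lambda _w\).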

Results

Results on synthetic data

The Xor model outputs the parameters related to the community (uvw), the hierarchical structure (sc) and the node types \(\sigma\) from the observed network data A. When the ground-truth values of \(\theta\) and \(\sigma\) are available, we can measure the performance of the model in recovering the community structure, the ranking of nodes and their type. For these three tasks we consider as performance metrics the cosine similarity (CS), the Pearson’s correlation (PC) and Area Under the Curve (AUC), respectively, between the ground-truth and the inferred values. In the absence of ground-truth, we can indirectly evaluate the model fitness via edge prediction tasks in cross-validation settings where we hide a subset of the matrix A (test set), fit the model in the remaining subset (training set) and test the ability to predict the missing edges (test set).
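The three metrics can be sketched as follows, assuming NumPy. How the paper aggregates the cosine similarity over nodes and handles permutations of community labels is not specified in this section, so the functions below only show the basic quantities; the names and toy inputs are illustrative.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two membership vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson_correlation(x, y):
    """Pearson correlation between ground-truth and inferred scores."""
    return float(np.corrcoef(x, y)[0, 1])

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Node-type classification: inferred Q against ground-truth sigma (toy values).
Q = [0.9, 0.4, 0.2, 0.6]
sigma_gt = [1, 1, 0, 0]
node_type_auc = auc(Q, sigma_gt)
```

The same `auc` function applies to edge prediction by scoring held-out entries of A with their expected value under the fitted model and labelling them by whether an edge is present.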

Figure 2

Performances on synthetic networks for different tasks. The mean value across folds is reported, indicating also the standard deviations. For the edge prediction task, our model is compared with the baseline methods MultiTensor and SpringRank. We vary the proportion of expected nodes with \(\sigma _i =1\), i.e. preferring a hierarchy-based interaction. Only the not regularised version of Xor is shown since it is the most effective in this scenario.

We validate the model on synthetic data generated using the Xor generative model with \(N=500,\) average degree \(\langle k \rangle =20\), \(\beta =5\) and varying the ground-truth value of \(\mu _{gt} \in [0,1]\). Specifically, we generate networks with \(K=3\) communities of equal-size unmixed group membership, a hierarchy with \(l=3\) leagues, i.e. the scores \(\left\{ s_{i}\right\} _{i=1}^{N}\) are drawn from a mixture of Gaussians with means \(\{ -4, 0, 4\}\) and standard deviations \(\{1, 0.5, 1\}\); we set \(\delta _0=0.01\). We draw five different independent samples for each set of parameters. The inference algorithm is tested by using 5-fold cross-validation for splitting the data into train and test sets, and run with five different random initializations on each graph instance. We set the hyperparameters K and \(\beta\) equal to the ground-truth values, while we use grid search for selecting the best value of the regularization \(\lambda = \lambda _v = \lambda _u = \lambda _w \times 0.1\).
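The latent variables of this synthetic setup can be drawn as in the sketch below, assuming NumPy. Equal mixture weights for the three leagues, the round-robin assignment of nodes to communities and the independence between league and community are assumptions of this sketch; the text above does not specify them. The function name `sample_latents` is illustrative.

```python
import numpy as np

def sample_latents(n, rng, means=(-4.0, 0.0, 4.0), stds=(1.0, 0.5, 1.0), n_comm=3):
    """Scores from a Gaussian mixture (one component per league) and hard memberships."""
    league = rng.integers(len(means), size=n)          # assumption: equal mixture weights
    s = rng.normal(np.take(means, league), np.take(stds, league))
    comm = np.arange(n) % n_comm                       # equal-size, unmixed groups
    u = np.eye(n_comm)[comm]                           # one-hot membership rows
    return s, u, league

rng = np.random.default_rng(0)
s, u, league = sample_latents(500, rng)
```

The resulting s and u can then be turned into the mean matrices \(S_{ij}\) and \(M_{ij}\) with \(\beta =5\), with c and w rescaled to reach the target average degree.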

We find that Xor predicts missing edges robustly and consistently across different values of \(\mu _{gt}\) and better than baseline models that consider only community structure (MT) or only hierarchical structure (SR), see Fig. 2 (left). The performance is not monotonic in \(\mu _{gt}\): we obtain high values of AUC when \(\mu _{gt}\) is close to 0 or when \(\mu _{gt}\) is close to 1. These are extreme scenarios where a large majority of the nodes are predominantly interacting either via communities or hierarchy. As one of these two mechanisms dominates, it is also easier to infer the parameters, hence the higher AUC values. The intermediate region \(0.2 \le \mu _{gt}\le 0.8\) where the nodes distribute more evenly between the two mechanisms corresponds to cases in which inference is harder. Nevertheless, the model shows stable performance in this range, with AUC mean values never dropping below 0.7 and always comparable or better than MT, the better performing of the two baselines. A similar non-monotonic behaviour is observed for the node classification task, where we aim at predicting the node type \(\sigma\) using Q. While we observe a similar performance drop in the same intermediate regime, performance is robust, as the average AUC is always higher than 0.85. Finally, the performance in recovering communities is good in the regime that most favours community structure (\(\mu _{gt}<0.4\)) while recovery of the hierarchy is poor, and vice-versa in the opposite regime (\(\mu _{gt}>0.6\)). This is intuitive, as when most of the nodes predominantly interact via communities, their score is irrelevant, and therefore cannot be recovered well. What matters is the ability to recover the structure corresponding to the main mechanism at play, i.e. high cosine similarity when \(\mu _{gt}<0.5\) or high Pearson’s coefficient when \(\mu _{gt}>0.6\). We find that Xor performs well in this task, with a boost in performance in recovering the ranking to values above 0.7 in the regime where hierarchy dominates. Similarly, cosine similarity increases above 0.7 when communities dominate. These results suggest that the model is not only able to predict missing edges and nodes’ type, but also to distinguish the node-level latent features that determine how nodes interact, regardless of their type.

Results on real data

Application on High School network

Figure 3

Application on High School network: (a), (b) comparison between the communities detected by MultiTensor and Xor, (c) node types Q visualization and (d) hierarchy in the subnetwork of nodes preferring the competitive mechanism. Nodes’ positions are assigned: (a)–(c) using a spring layout and (d) using the scores s inferred by the Xor algorithm. Communities are selected by normalizing the v membership vector (similar results are obtained with u) and colors of the pie markers are assigned according to the community (mixed) membership in each group. The dark grey nodes have null membership vector. Both the algorithms select \(K=4\) as optimal number of communities, as output using a five-fold cross-validation scheme and grid search.

As a first example of application of this model, we consider a dataset of a network of high school students29. This describes the perceived interactions of a group of 67 high school students. Each student was asked “who are you friend of” in the fall of 1957 and the spring of 1958, and the answers are aggregated on the same edge, allowing weights with values 1 or 2. Agreement in the response is not ensured, hence the network is directed. It is reasonable to expect that students belong to groups (communities) and this influences the answers they give. However, a fraction of the students may not belong to any group and instead nominate others based on their perceived ranking of the students. For instance, one may nominate whom they aspire to befriend. Figure 3 shows what happens when we apply Xor to this dataset. The figure shows the estimates of node type \(\sigma _{i}\), describing the preferred behavior of each student. As we can see, most of the students have \(\sigma _i\approx 0\), i.e. their nominations follow a community structure. However, we obtain four individuals with high \(\sigma _i\), meaning that their preferences are mainly based on ranking. Even though they have a degree similar to that of other students, they are not well connected with the rest of the network as they mostly interact among themselves (there are only 5 other students that nominate one of them as friends). Comparing the reciprocity coefficient for the whole set of students with the one for the subgraph interacting mainly via hierarchy, we find that it increases from a value of 0.51 to 0.88. In fact, this smaller network is missing only three edges for being a (directed) clique. Considering the direction and weight of the edges of the subgraph made of these four individuals, the ranking is meaningful as it reveals insights on the group social dynamics, as shown in Fig. 3d, where the hierarchy highlights consistency between inferred score (the position of the nodes) and the weight and direction of the interactions. Namely, the individual with highest ranking in that subset (\(i=41\)) is also the individual that makes fewer friendship nominations, while the others tend to nominate them more often.

In the same figure, we show the community structure inferred by our model and that inferred by MultiTensor, which considers only this mechanism for edge generation. As we can see, Xor outputs slightly different communities, as the yellow and blue nodes are partially mixed in the two cases. Interestingly, most of the nodes that have \(\sigma _i=1\) are not assigned to any community: again, we observe that the latent variable related to the mechanism not used for interacting is meaningless. Our model was thus able to extract a subgraph where hierarchical structure could meaningfully explain the directed interactions within that subgraph, and also to distinguish this from the remaining part of the network, where community memberships had likewise a meaningful interpretation. This showcases how practitioners should consider \(\sigma _i\) together with \(u_i\) and \(s_i\) to fully characterize nodes.

Application on Parakeets network

Figure 4

Application on network of parakeets, group 1: visualization of (a) the node types Q; (b) ranking scores, as inferred by the Xor model; (c) ranking scores inferred by the SpringRank model. In (b) and (c) we highlight the interactions involving node 13, the only one that Xor infers as interacting via communities, i.e. \(\sigma _{13}=0\). The node positions are (a) determined with force atlas and (b), (c) based on the score s inferred by Xor and SpringRank, respectively. In (b), (c), ranking scores are decreasing from top to bottom, as pointed out by the arrow on the right. Both Xor and MultiTensor detect no meaningful community structure, since they select \(K=1\) from five-fold cross validation and grid search.

A second case study is the application to a network of directed aggressions among captive monk parakeets (Myiopsitta monachus)30. Each directed edge contained the number of aggressive attacks between parakeets in two study groups, the first made of 21 individuals (G1), the second of 19 (G2). Each group was created and then observed for 24 days, divided into four 6-day study quarters. Insights from behavioural ecology suggest that the patterns of aggression correlate with an underlying dominance hierarchy: parakeets direct their aggression strategically, aiming at improving their position in the hierarchy31. Hence, we expect them to have a prevalent hierarchical latent structure, i.e. we expect \(\sigma _{i}=1\) for most of the individuals.

We performed experiments on the two groups, both extracting a single network from each quarter and aggregating on the quarters. The results on aggregated and not aggregated versions are similar for both groups, apart from more noisy results on the first quarter of both G1 and G2 given by a hierarchy not mature enough for being clearly detected31. The resulting inference on the aggregated G1 group is shown in Fig. 4a: only one node is predicted to use community-based interactions. This is also reinforced by the results of cross-validation tests to extract the number of communities (the value \(K=1\) achieves the best AUC score) and by inspecting uv, which have mainly null entries. Because there is only one node detected with \(\sigma _{i}=0\), the interpretation of community in this case is that this particular node has a behavior that cannot be well explained by the same mechanism that well explains that of all the other nodes (hierarchy structure in this case). We can deduce that this node is an anomaly in the social group, which interacts with the other individuals with a random behaviour, rather than a strategic one. Hence, its score is to be considered irrelevant, which is in accordance with the fact that it is placed in a middle-ranking position while having a high number of incoming connections (33 incoming vs 40 outgoing, considering the weights), see Fig. 4b. Note that the results achieved by SpringRank on the same network are similar: in Fig. 4c \(i=13\) is placed 13th, close to the 10th position assigned by Xor, while the interaction pattern is not well in agreement with that. Again, it is behaving as an anomaly, but the SpringRank algorithm cannot learn it as it is not designed to distinguish node types.

Application on Political Blogs network

Figure 5

Application on a network of political blogs. (a), (b) Comparison between the communities detected by MultiTensor and Xor; (c) visualization of the node types Q; (d) hierarchy in the subnetwork of nodes preferring the competitive mechanism. Nodes’ positions and communities are assigned as in Fig. 3. Both algorithms select \(K=2\) as the optimal number of communities via a 5-fold cross-validation scheme and grid search. Here we show the in-coming membership v; orange nodes have null membership, \(v_{ik}=0 \ \forall k\). MultiTensor only assigns null in-coming community membership to nodes with null in-degree (5 nodes, not visible in the plot); Xor assigns null membership to these 5 nodes and to 32 additional ones (all plotted in orange). Of these 32 nodes, 13 are flagged as ranking-driven while 19 have low in-degree (less than 4). A similar behavior is observed considering \(u_i\) and the out-going degree.

As our final example we focus on the ability of Xor to recognize a dominant preferred interaction mechanism in larger datasets. We consider a network of 830 US web blogs with different political orientations, where the weight of a directed edge is the number of hyperlinks from one webpage to another. The data were collected over the two months before the 2004 US presidential election32 and include metadata about each blog’s political orientation, liberal or conservative. As shown in Fig. 5, Xor identifies the large majority of the nodes (\(98\%\)) as driven by community structure. In addition, the algorithm selects \(K=2\) as the best number of groups via cross-validation, in line with the observation that the nodes form two highly assortative groups; the assigned communities are in accordance with both those retrieved by MultiTensor and the node metadata (Xor/MT total accuracy of \(95\% / 95\%\); \(97\% / 98\%\) on ‘liberal’, \(92\% / 91\%\) on ‘conservative’). These communities are also identified with high confidence, as indicated by the low number of overlapping community memberships.
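The comparison with node metadata can be illustrated with a short sketch that computes total and per-class accuracy of hard community assignments. The procedure used to obtain the figures above is not detailed in this section, so this is an assumption: each node is assigned to a single community (for instance, the one with the largest membership), and inferred community indices are matched to metadata labels by trying every assignment and keeping the best one. The function name `aligned_accuracy` is hypothetical.

```python
from itertools import permutations
import numpy as np

def aligned_accuracy(pred, truth):
    """Total and per-class accuracy after matching community ids to labels.

    pred:  hard community index per node (assumed, e.g. argmax of membership).
    truth: metadata label per node (e.g. 'liberal' / 'conservative').
    Assumes there are no more inferred communities than metadata labels.
    """
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    pred_ids = np.unique(pred).tolist()
    labels = np.unique(truth).tolist()
    best = None
    # Try every way of mapping inferred community ids to metadata labels.
    for perm in permutations(labels, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        mapped = np.array([mapping[p] for p in pred.tolist()])
        total = float(np.mean(mapped == truth))
        if best is None or total > best[0]:
            per_class = {t: float(np.mean(mapped[truth == t] == t))
                         for t in labels}
            best = (total, per_class)
    return best

# Toy example with six nodes; the last node is assigned to the "wrong" group.
toy_pred = [0, 0, 0, 1, 1, 1]
toy_truth = ["lib", "lib", "lib", "con", "con", "lib"]
total_acc, class_acc = aligned_accuracy(toy_pred, toy_truth)
```

Exhaustive matching is only practical for a small number of communities such as \(K=2\); for larger K a matching algorithm would be a more sensible choice.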

The remaining small fraction of nodes, less than \(2\%\), is classified as ranking-driven. The hierarchical structure between them is strong, as shown in Fig. 5d, where we notice only a few edges violating the hierarchy, i.e., going from top to bottom. In this case, as in the High School network, the semantics of the directed edges is such that the most popular node is the one with the most in-coming edges. In the absence of further information about these nodes, we investigated their structural properties in the network, finding no relevant feature that allows us to distinguish them from the rest of the nodes. Their reciprocity, in- and out-degree and assortativity w.r.t. the two political orientations are distributed similarly to those of the rest of the nodes. While we cannot rule out noise as a potential explanation for their distinct classification, we argue that this is a relevant example where domain knowledge may help explain a posteriori potentially discordant patterns involving a small fraction of the nodes. Overall, this example shows the robustness of Xor in identifying a dominant preferred interaction mechanism in large networks and illustrates how practitioners can use these results to guide further investigations in the absence of a clear a priori orientation towards communities or hierarchies.
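The notion of edges violating the hierarchy can be quantified with a simple count. The sketch below computes the weighted fraction of edges that go from a higher-scored node to a lower-scored one, following the convention described above in which popular nodes receive edges. The function name `violation_fraction` and the toy data are illustrative assumptions and do not reproduce the values behind Fig. 5d.

```python
import numpy as np

def violation_fraction(A, scores):
    """Weighted fraction of edges i -> j with score[i] > score[j].

    A[i, j] is the weight of the directed edge i -> j. Under the convention
    assumed here, edges are expected to point towards higher-ranked nodes,
    so an edge going from top to bottom counts as a violation.
    """
    A = np.asarray(A, dtype=float)
    s = np.asarray(scores, dtype=float)
    downward = s[:, None] > s[None, :]  # True where source outranks target
    total = A.sum()
    return float(A[downward].sum() / total) if total > 0 else 0.0

# Toy network: node 0 is top-ranked, node 2 is bottom-ranked.
A_toy = np.array([[0, 0, 1],   # 0 -> 2 goes top to bottom (violation)
                  [2, 0, 0],   # 1 -> 0 points upwards
                  [0, 3, 0]])  # 2 -> 1 points upwards
toy_scores = [3.0, 2.0, 1.0]
frac = violation_fraction(A_toy, toy_scores)
```

Edges between nodes with equal scores are not counted as violations in this sketch; whether ties should count is a modelling choice.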

Discussion

The Xor model captures coexisting hierarchical and community mechanisms in networks. Being a generative model, it can be used for producing synthetic benchmarks with the desired level of interplay between the two mechanisms. It relies on a principled mathematical formulation with interpretable latent variables, and its algorithmic implementation is optimized for sparse systems. In particular, it allows for automatic extraction of the main patterns of interactions involving subsets of nodes. We gave examples of this by considering networks of friendship nominations among high school students and of aggressive interactions between monk parakeets. In the case of friendship nominations, Xor highlighted a small subnetwork of four individuals whose interactions stood out from the crowd. Similarly, for the aggression in parakeets, it spotted an individual outlier whose interactions do not seem to align with those observed among the other individuals in the group. When applied to larger datasets, such as the network of hyperlinks between political blogs, the model is still able to identify a dominant mechanism involving a large majority of network nodes.

We considered here an efficient, but possibly limited, mean-field approximation to perform parameter inference. Its connections with well-known models from statistical physics suggest, as a natural direction for future developments, deploying more complex approximations, e.g. using belief propagation33. While we expect this to lead to more accurate approximate posterior distributions, it may come at the price of increased complexity; we leave this as an open problem for future work. From a modelling perspective, an interesting direction for future work is to explore different ways of modelling interaction preferences. Here we assigned a latent variable \(\sigma\) to each node, but it would be interesting to investigate how results change when considering latent variables on edges instead. This choice may be more natural in scenarios where individuals form ties on a case-by-case basis rather than predominantly via one of the two mechanisms explored here. This could potentially account for a further mechanism for edge formation, such as reciprocity34,35,36. Similarly, when node attributes are available along with the network dataset, it would be compelling to adapt the model to suitably incorporate this extra information using insights from previous works37. Finally, algorithmic developments to further improve runtime and scalability offer another promising path for future research.

In summary, in this work we make a first step towards tackling problems with mixed underlying mechanisms determining edge formation in networks. We showed examples of interesting patterns inferred by our model, and we provide an open-source implementation of the code to facilitate future data explorations.