Beyond pairwise network similarity: exploring Mediation and Suppression between networks

Network similarity measures quantify how and when two networks are symmetrically related. These include measures of statistical association such as pairwise distances or other correlation measures between networks, or between the layers of a multiplex network. However, such measures can neither directly unveil whether there are hidden confounding network factors nor estimate when such correlation is underpinned by a causal relation. In this work we extend this pairwise conceptual framework to triplets of networks and quantify how and when a network is related to a second network directly, or indirectly via mediation by or interaction with a third network. Accordingly, we develop a simple and intuitive set-theoretic approach to quantify mediation and suppression between networks. We validate our theory with synthetic models and further apply it to triplets of real-world networks, unveiling mediation and suppression effects which emerge when considering different modes of interaction in online social networks and different routes of information processing in the brain.


Introduction
Networks are usually seen as a parsimonious model to describe the backbone architecture of complex systems [1]. Accordingly, comparing different systems boils down to comparing their architectures, leading to the notion of a network similarity measure [2,3,4,5,6]. In graph theory, two graphs are isomorphic if there exists a vertex permutation which maps one network onto the other, which naturally leads to a binary (and, for real-world systems, not very useful) notion of similarity. More useful approaches proceed by projecting networks onto a suite of properties summarised in some vector p (e.g. degree distribution, centrality vectors, eigenspectra) and subsequently constructing a similarity metric D, by which two networks A and B are close in the space spanned by p if D(A, B) = ‖p_A − p_B‖ is "small". Other ideas include the formalisation of graph kernels [2], comparing networks through the statistics of random walks running over them [7,8], or using statistical approaches such as estimating topological correlations between networks. While in all these approaches we typically have D(A, B) = D(B, A), i.e. a symmetric relation, in many cases this undirected relation hides an actual direction (whether causal or not). As an example, consider social networks. The different layers of the social network of an individual are typically correlated: my offline friends also tend to be my friends on Facebook. However, such a relation is directional: when a new link (i.e. a new social relationship) is created offline, it is likely that it will be replicated within the person's online social networks too (Facebook, Instagram), but it is proportionally less likely that the direction of influence is inverted. So the offline and online social networks of a person are probably similar, but such similarity has a direction. Furthermore, in many cases such influence is not direct (not causal).
Sometimes there is a hidden network C that confounds or mediates the relation between A and B. For instance, the Facebook and Instagram networks of a certain individual are correlated not because there is a direct, causal relationship between them, but because both of these networks are related to the actual (offline) social network of the person.
In this work we are interested in understanding and disambiguating when the relation between two networks A and B (where, for instance, A r B if D(A, B) < ε for some threshold ε) is a direct one, and when it is underpinned by a hidden interaction with a third network C. In particular, C can be independent of A and B (leading to a direct relation A r B). C can also act as a hidden mediator or confounding factor (A r C, C r B ⇒ A r B). Finally, C can act as a suppressor such that [A ⊕ C] r B, where ⊕ is yet to be defined but conceptually means that A and C interact synergistically. The terms mediator, confounder and suppressor are inspired by the information-theoretic framework described in [11]. In what follows we address these questions by introducing a set-theoretic approach in which concepts such as network mediation and network suppression emerge naturally. We benchmark our theory with simple generative models and then apply it to a range of empirical networks, where we unveil and discuss the concomitant roles of mediation and suppression.

Theory
Let A, B, C be three unweighted networks with adjacency matrices A, B, C, all with the same node set and with respective edge sets a, b and c (i.e. they can also be identified with the layers of a multiplex network). Let us define the network-Jaccard index NJ(A, B) of two networks as the Jaccard index over their edge sets:

$$\mathrm{NJ}(A,B) = \frac{|a \cap b|}{|a \cup b|}.$$

NJ(A, B) is a similarity metric, and a distance can be easily defined as d(A, B) = 1 − NJ(A, B). This quantity alone can be used to initially establish whether two networks are related. Regardless of whether such a relation is effectively undirected or is instead causal (influence), in order to explore whether it is underpinned by a third network C we need to quantify the effect of conditioning the relation on C. Let us then define the partial network-Jaccard index NJ_p(A, B|C) of two networks A, B conditioned on a third one C as the Jaccard index over the edge subsets of A and B formed by those edges which are absent in C:

$$\mathrm{NJ}_p(A,B\,|\,C) = \frac{|(a \cap b)\setminus c|}{|(a \cup b)\setminus c|}.$$

Let us see intuitively the effect of conditioning on C in this way. Suppose initially that C is totally independent of A and B. Then we may expect that the Jaccard index will, on average, be the same if evaluated only on the links which are absent in C, so NJ_p(A, B|C) ≈ NJ(A, B). Suppose, on the other hand, that A influences B indirectly, through the mediation of C. Then, intuitively, removing the links of C would effectively push the partial Jaccard index towards zero. A similar scenario takes place if A and B are undirectedly related through their direct relation to a confounding factor C. Finally, C could be suppressing the influence of A on B. For example, imagine that B depends on whether A and C interact synergistically, e.g. links are more likely to occur in B when they are present in one of A and C but not in the other (a probabilistic XOR gate); then removing the links of C will enhance the partial Jaccard index.
To distinguish these three scenarios, we define the Jaccard net difference

$$\Delta = \mathrm{NJ}_p(A,B\,|\,C) - \mathrm{NJ}(A,B).$$

Intuitively, if C is independent of the relation between A and B then ∆ ≈ 0, if it mediates or confounds such a relation then ∆ < 0, and if it acts as a suppressor then ∆ > 0.
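The definitions of NJ, NJ_p and ∆ map directly onto set operations. The following is a minimal Python sketch, assuming each network is stored as a set of undirected edges (i, j) with i < j over a shared node set; the function names are illustrative, and returning NJ = 0 for two empty graphs is an assumed convention the text does not address.

```python
def nj(a: set, b: set) -> float:
    """Network-Jaccard index NJ(A, B) = |a & b| / |a | b| over two edge sets."""
    union = a | b
    # Assumed convention (not from the text): two empty graphs get NJ = 0.
    return len(a & b) / len(union) if union else 0.0


def nj_partial(a: set, b: set, c: set) -> float:
    """Partial index NJ_p(A, B|C): Jaccard restricted to edges absent from c.

    Uses (a - c) & (b - c) == (a & b) - c and (a - c) | (b - c) == (a | b) - c.
    """
    return nj(a - c, b - c)


def net_difference(a: set, b: set, c: set) -> float:
    """Jaccard net difference Delta = NJ_p(A, B|C) - NJ(A, B)."""
    return nj_partial(a, b, c) - nj(a, b)


# Toy triplet: C contains one of the two edges shared by A and B.
a = {(0, 1), (0, 2), (1, 2)}
b = {(0, 1), (1, 2), (2, 3)}
c = {(0, 1)}
print(nj(a, b))                 # 0.5
print(nj_partial(a, b, c))      # one shared edge out of three remaining
print(net_difference(a, b, c))  # negative: removing C's edge lowers similarity
```

In this toy case the shared edge (0, 1) also belongs to C, so partializing on C removes part of the overlap and ∆ = 1/3 − 1/2 = −1/6, the sign expected for mediation.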
In what follows we construct simple generative models of independent, mediated and suppressor interactions, detailed as algorithms; we prove that these correctly generate the three types of trivariate relations, and we show numerical simulations of the outcome for finite networks.

Independency
A simple generative model of independence is given by three independent Erdos-Renyi-type models, in which each possible link of each network occurs independently with probability p (see Algorithm 1). A theorem to this effect can be easily proved; its proof is given in the appendix. We conclude that, on average, NJ_p and NJ are nearly equal for uncorrelated networks generated in this way, as partialization with respect to an independent network C has no effect. In Fig. 1 we illustrate this case for finite networks with N = 50 nodes and p = 0.5, finding that indeed ∆ ≈ 0 and that NJ_p(A, B|C) ≈ NJ(A, B) ≈ 1/3, in good agreement with the theorem.

Figure 1: Results calculated on 1000 realizations of triplets of networks of N = 50 nodes wired such that C plays no role (green crosses), a mediating role (violet dots) or a suppressing role (red crosses) in the relation between A and B. These interactions are constructed using the generative models described in Algorithms 1, 2 and 3 (p = 0.5 in every case, and q = 1). For completeness, we depict the histograms P(∆), which certify that these algorithms generate networks where C plays an independent role (∆ ≈ 0), a mediating role (∆ < 0) or a suppressing role (∆ > 0).

Algorithm 1
  A, B, C ← N × N zero matrices
  for i = 1 to N do
    for j = i + 1 to N do
      if rand < p then A_ij, A_ji ← 1
      if rand < p then B_ij, B_ji ← 1
      if rand < p then C_ij, C_ji ← 1
  return A, B, C
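As a sanity check of the independence case, the sketch below implements Algorithm 1 on edge sets rather than adjacency matrices (drawing the three networks one after another gives the same distribution as the interleaved loop) and averages NJ and ∆ over many realizations. For two independent G(N, p) layers, each edge of a ∪ b lies in a ∩ b with probability p²/(2p − p²) = p/(2 − p), which equals 1/3 at p = 0.5, consistent with the value quoted above. The seed and the number of runs are arbitrary choices.

```python
import random
from itertools import combinations


def er_edges(n: int, p: float, rng: random.Random) -> set:
    """Undirected G(n, p): each pair (i, j) with i < j is kept with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def independency(n: int, p: float, rng: random.Random):
    """Algorithm 1 on edge sets: three mutually independent G(n, p) networks."""
    return er_edges(n, p, rng), er_edges(n, p, rng), er_edges(n, p, rng)


def nj(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


rng = random.Random(1)  # fixed seed for reproducibility
n, p, runs = 50, 0.5, 200
nj_vals, delta_vals = [], []
for _ in range(runs):
    a, b, c = independency(n, p, rng)
    full = nj(a, b)
    nj_vals.append(full)
    delta_vals.append(nj(a - c, b - c) - full)  # Delta = NJ_p - NJ

mean_nj = sum(nj_vals) / runs
mean_delta = sum(delta_vals) / runs
print(f"mean NJ = {mean_nj:.3f} (p/(2-p) = {p / (2 - p):.3f}), mean Delta = {mean_delta:+.4f}")
```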

Mediation
Suppose now that A and B both depend on C, i.e. wherever there is a link in C there is also a link in A and in B (see Fig. 2 for an illustration of this case, and Algorithm 2 for a formal recipe of this generative model). This describes a situation where C mediates the relation between A and B (or, alternatively, where C confounds that relation). Partializing with respect to C removes the dependence between A and B due to C, which intuitively leads to ∆ < 0. The following theorem can be proved.

Theorem 2. Let A, B and C be as in Algorithm 2. If A and B share at least one edge besides the common edges shared with C, then ∆ < 0.
The proof of this theorem is also given in the appendix. In Fig. 1 we show numerical results for finite networks with N = 50 and p = 0.5.
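Algorithm 2 itself is not reproduced in this section, so the following Python sketch uses a hypothetical stand-in that matches the verbal description: C is a G(N, p) network, and A and B each contain every edge of C plus their own independent G(N, p) edges. It samples triplets, keeps those satisfying the hypothesis of Theorem 2 (together with A ≠ B, which the proof uses), and records ∆.

```python
import random
from itertools import combinations


def er_edges(n, p, rng):
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def nj(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mediation(n, p, rng):
    """Hypothetical stand-in for Algorithm 2: every edge of C is also an edge
    of A and of B, which additionally get independent G(n, p) edges."""
    c = er_edges(n, p, rng)
    return c | er_edges(n, p, rng), c | er_edges(n, p, rng), c


rng = random.Random(2)
deltas = []
for _ in range(100):
    a, b, c = mediation(50, 0.5, rng)
    # Hypotheses: A and B share an edge outside C, and A != B (used in the proof).
    if (a & b) - c and a != b:
        deltas.append(nj(a - c, b - c) - nj(a, b))
print(f"{len(deltas)} triplets kept, largest Delta = {max(deltas):+.3f}")
```

Every kept triplet should give a strictly negative ∆, as the theorem states; the magnitude depends on the assumed stand-in model and is not meant to reproduce Fig. 1.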

Suppression
Finally, let us consider the case where B depends on the interaction of A and C, such that an edge occurs in B with probability q if it appears in A but not in C, or alternatively if it appears in C but not in A (see Fig. 2 for an illustration). This is akin to a probabilistic XOR gate. Then, on average, NJ_p(A, B|C) > NJ(A, B), i.e. partializing with respect to C in this case evidences suppression effects.
Algorithm 3 encapsulates the generative model. A corresponding theorem can be proved, stating that in this case E(∆) > 0; its proof is also given in the appendix. In Fig. 1 we show numerical results for finite networks with N = 50, p = 0.5 and q = 1, which are in full agreement with the theorem.
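The body of Algorithm 3 is not recoverable here, so this Python sketch is a hypothetical stand-in built from the verbal description: A and C are independent G(N, p) networks, and each pair lying in exactly one of them enters B with probability q, with no other edges added to B. With q = 0.5 it shows a positive ∆ in each sampled triplet.

```python
import random
from itertools import combinations


def er_edges(n, p, rng):
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def nj(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suppression(n, p, q, rng):
    """Hypothetical stand-in for Algorithm 3 (probabilistic XOR gate).

    A and C are independent G(n, p); a pair enters B with probability q
    if it lies in exactly one of A and C. No other edges are added to B.
    """
    a = er_edges(n, p, rng)
    c = er_edges(n, p, rng)
    b = {e for e in sorted(a ^ c) if rng.random() < q}  # a ^ c: symmetric difference
    return a, b, c


rng = random.Random(3)
deltas = []
for _ in range(100):
    a, b, c = suppression(50, 0.5, 0.5, rng)
    deltas.append(nj(a - c, b - c) - nj(a, b))
mean_delta = sum(deltas) / len(deltas)
print(f"mean Delta = {mean_delta:+.3f}")
```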

Coexistence of mediation and suppression effects
When we abandon the ideal cases where only suppression or only mediation is present and move towards a mixture of the two, it becomes evident that both effects can be hidden, and a single ∆ cannot in principle tell us whether the system evidences only one of the two mechanisms. To investigate the coexistence of both mechanisms, we run a simulation in which Algorithms 2 and 3 above are combined: in each step, with probability 1 − µ we use Algorithm 2 and with probability µ we use Algorithm 3. The resulting model linearly interpolates between mediation and suppression, such that the measurable ∆ = (1 − µ)∆_med + µ∆_syn, where ∆_med and ∆_syn are hidden. The results are depicted in the left panel of Fig. 3 for different values of the parameter p and q = 1. We can obtain negative, null or positive values of ∆ underpinned by a balance of both mediation and suppression mechanisms; for this particular set of parameters, the effect of mediation on ∆ is slightly stronger than the effect of suppression (this imbalance becomes more pronounced for q < 1). This simple interpolating model thus leads us to conclude that, in real cases, we might naively measure ∆ < 0 and misleadingly conclude that there is only mediation, when in fact both mediation and suppression could be at play. Accordingly, a single measure such as ∆, which conflates the effects of suppression and mediation, is not enough to describe and resolve the simultaneous presence of both.

Algorithm 3 Suppression()
Output: three Erdos-Renyi adjacency matrices A, B, C where C acts as a suppressor between A and B (∆ > 0)

Figure 3: The blue curve is a pure randomisation, which generates ∆ ≈ 0. The dotted green line corresponds to a selective rewiring that removes all hidden suppression: in that case the curve always stays below zero (increasing µ increases the share of Algorithm 3, but this is then selectively rewired, thus effectively randomising the networks and pushing ∆ to zero). The pink curve is the result of a selective rewiring that removes all hidden mediation: in that case the curve is pushed into the regime ∆ > 0. As µ increases, the share of Algorithm 3 (generating suppression) increases, hence pushing ∆ to larger values. The dashed yellow and pink lines are the result of selective rewiring on the randomised networks, and only highlight the residual values of suppression or mediation which occur by chance (as a finite-size effect) in randomised networks.
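The interpolation can be reproduced in a few lines. This Python sketch assumes that "each step" means each realization (so E(∆) is exactly the µ-weighted mixture of the two expected values), and reuses hypothetical stand-ins for Algorithms 2 and 3 built from their verbal descriptions; the resulting magnitudes depend on those stand-ins and are not meant to match Fig. 3.

```python
import random
from itertools import combinations


def er_edges(n, p, rng):
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def nj(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mediation(n, p, rng):
    # Hypothetical stand-in for Algorithm 2: C's edges are copied into A and B.
    c = er_edges(n, p, rng)
    return c | er_edges(n, p, rng), c | er_edges(n, p, rng), c


def suppression(n, p, q, rng):
    # Hypothetical stand-in for Algorithm 3: B is a probabilistic XOR of A and C.
    a, c = er_edges(n, p, rng), er_edges(n, p, rng)
    return a, {e for e in sorted(a ^ c) if rng.random() < q}, c


def mean_delta(mu, n=50, p=0.5, q=1.0, runs=100, seed=4):
    """Average Delta when each realization comes from the suppression model
    with probability mu and from the mediation model otherwise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        a, b, c = suppression(n, p, q, rng) if rng.random() < mu else mediation(n, p, rng)
        total += nj(a - c, b - c) - nj(a, b)
    return total / runs


curve = {mu: mean_delta(mu) for mu in (0.0, 0.5, 1.0)}
print(curve)
```

The sign of ∆ at intermediate µ depends only on the balance between the two hidden contributions, which is exactly why ∆ alone cannot tell them apart.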
In order to disentangle both effects we now introduce Algorithm 4, which can be used to construct null models for both mediation (M) and suppression (S). To construct a surrogate where all suppression has been removed, starting from A, B, C, we perform a selective rewiring of B in which only those links of B which are also present in A but not in C (or in C but not in A) are randomly rewired. Similarly, to construct a surrogate where all mediation has been removed, we perform a selective rewiring of B in which only those links of B which are present in both A and C are randomly rewired. In Algorithm 4, each link ij of the selected set G is, with probability p, exchanged with a pair kl (ij ← kl, kl ← ij), where G is the suppression set if X = S and the mediation set if X = M.

We then recompute the net Jaccard difference on the rewired versions, labelled ∆_S (when suppression is removed) and ∆_M (when mediation is removed), respectively. The heuristic is then simple: if there is hidden suppression (respectively mediation) in the data, then ∆_S < ∆ (respectively ∆_M > ∆), whereas if such a mechanism is absent then ∆_S ≈ ∆ (respectively ∆_M ≈ ∆). We also need to take into account finite-size effects, which unavoidably add spurious mediation and suppression effects (i.e. triplets of purely random, uncorrelated networks will show small but non-zero mediation and suppression due to chance). To counterbalance such effects, we also selectively remove suppression and mediation from a completely randomly rewired version of B, which we call B2, leading to two new indices: ∆_RS and ∆_RM. We can finally combine these to produce normalised indices of mediation (m) and suppression (s), by dividing by the maximum possible value of mediation (suppression) attainable by applying a generative model such as Algorithm 2 (Algorithm 3) to the triplet of networks. Accordingly, the role that C plays in the relation between A and B is described by the pair (m, s).
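The selective rewiring step can be sketched as follows. This is a hypothetical Python implementation, not the authors' code: it assumes that each selected link ij is moved to a uniformly random pair kl that is currently not a link of B (so the number of links of B is preserved and B is assumed not to be complete), and it uses stand-in mediated and suppressed triplets built from the verbal descriptions of Algorithms 2 and 3. Rewired links can land back in A ∩ C or A △ C by chance, which is the kind of residual the text corrects for with ∆_RS and ∆_RM.

```python
import random
from itertools import combinations


def er_edges(n, p, rng):
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def nj(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def delta(a, b, c):
    return nj(a - c, b - c) - nj(a, b)


def selective_rewire(a, b, c, n, mode, rng, p_rewire=1.0):
    """Hypothetical sketch of the selective rewiring of B (Algorithm 4).

    mode "S": rewire links of B in exactly one of A and C (removes suppression).
    mode "M": rewire links of B in both A and C (removes mediation).
    """
    if mode == "S":
        targets = b & (a ^ c)
    elif mode == "M":
        targets = b & a & c
    else:
        raise ValueError("mode must be 'S' or 'M'")
    new_b = set(b)
    for ij in sorted(targets):
        if rng.random() < p_rewire:
            while True:  # rejection-sample a pair that is currently not a link of B
                kl = tuple(sorted(rng.sample(range(n), 2)))
                if kl not in new_b:
                    break
            new_b.remove(ij)
            new_b.add(kl)
    return new_b


rng = random.Random(5)
n, p = 50, 0.5
# Mediated triplet (hypothetical stand-in for Algorithm 2).
c = er_edges(n, p, rng)
a, b = c | er_edges(n, p, rng), c | er_edges(n, p, rng)
d_med = delta(a, b, c)
d_med_m = delta(a, selective_rewire(a, b, c, n, "M", rng), c)
# Suppressed triplet (hypothetical stand-in for Algorithm 3 with q = 1).
a_s, c_s = er_edges(n, p, rng), er_edges(n, p, rng)
b_s = a_s ^ c_s
d_sup = delta(a_s, b_s, c_s)
d_sup_s = delta(a_s, selective_rewire(a_s, b_s, c_s, n, "S", rng), c_s)
print(f"mediated:   Delta = {d_med:+.3f}, Delta_M = {d_med_m:+.3f}")
print(f"suppressed: Delta = {d_sup:+.3f}, Delta_S = {d_sup_s:+.3f}")
```

As the heuristic predicts, removing mediation raises ∆ (∆_M > ∆) and removing suppression lowers it (∆_S < ∆).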
Additionally, a significance value σ for these indices can be defined. It is worth stressing that for large networks finite-size effects become less important, and ∆_RS and ∆_RM tend to zero; correspondingly, any non-zero value of suppression and mediation becomes highly significant.
For illustration, in the left panel of Fig. 4 we show the effect of the sequence of selective rewirings on ∆, applied to a particular example of three independent Erdos-Renyi networks with N = 300 nodes and wiring probability p = 0.3 (i.e. Algorithm 1). The original value of ∆ is very close to zero, as are the values obtained from a full randomisation of B. Since in this example the networks are independent, any mediation or suppression is only a spurious residual due to finite-size effects; this residual is therefore flagged in similar terms by a selective rewiring of the actual network B (∆_X) and of its full randomisation (∆_RX). Hence the violet and green histograms overlap, and so do the pink and pale blue ones.

Empirical networks
We now turn to real-world networks and consider four 3-layer multiplex networks: (i) different modes of social interaction on Twitter during the 2014 New York City Climate March (NYC), (ii) different types of social interaction (proximity, phone call/text message, Facebook) collected in Denmark (Copenhagen), (iii) different interpersonal relations inside a corporation (Lazega law firm) and (iv) different synaptic junctions in a brain network (C. elegans); see the appendix for details. To begin with, in Fig. 5 we confirm that, with the exception of the pair NYC Retweets vs Replies, all other pairs of layers in the four examples are genuinely related, i.e. they show substantially more similarity than a null model. In each case we plot NJ(A, B) (blue bars) and, as a reference, black lines showing the average value of NJ(A, B) (± one standard deviation) after A and B have been appropriately randomised. We thus confirm that, with the exception noted above, the similarity between each pair of networks is not the result of a finite-size effect, so exploring the role of a third network C is justified. We then turn to analyse the role of C. For illustration, the whole selective rewiring process described in Algorithm 4 is depicted in detail in the right panel of Fig. 4 for a specific example: the C. elegans multiplex, where we explore the role that the electrical synapses layer plays in the relation between the monadic and the polyadic layers. We provide the original value ∆_0 and the distributions of the ∆ values obtained after each of the rewiring procedures, concluding that this network indeed shows non-negligible mediation and suppression effects.
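The text does not specify how A and B are "appropriately randomised" for the pairwise check, so this Python sketch assumes the simplest edge-count-preserving null, a uniform G(n, m) graph with the same number of edges as each layer (a degree-preserving null would be stricter). It compares the observed NJ against the null mean and standard deviation for a synthetic, deliberately related pair of layers.

```python
import random
import statistics
from itertools import combinations


def nj(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_same_size(n, m, rng):
    """Uniform random graph with exactly m edges on n nodes (assumed null model)."""
    return set(rng.sample(list(combinations(range(n), 2)), m))


def nj_null(a, b, n, rng, samples=100):
    """Mean and standard deviation of NJ over edge-count-preserving randomisations."""
    vals = [nj(random_same_size(n, len(a), rng), random_same_size(n, len(b), rng))
            for _ in range(samples)]
    return statistics.mean(vals), statistics.stdev(vals)


rng = random.Random(6)
n = 50
pairs = list(combinations(range(n), 2))
a = {e for e in pairs if rng.random() < 0.1}
# Synthetic B: keeps about 70% of A's edges plus independent noise, so A and B are related.
b = {e for e in a if rng.random() < 0.7} | {e for e in pairs if rng.random() < 0.05}
observed = nj(a, b)
mu, sd = nj_null(a, b, n, rng)
print(f"NJ = {observed:.3f}, null = {mu:.3f} ± {sd:.3f}, z = {(observed - mu) / sd:.1f}")
```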
The normalised indices of mediation and suppression for the rest of the permutations in all the real-world multiplex networks are reported in Fig. 6. The first thing we observe is that overall there is substantially more mediation than suppression, although we also observe the latter mechanism. All effects are statistically significant (σ ≫ 1 in every case, data not shown) except for the suppression in the Lazega advice layer and the NYC Retweet layer, where σ ≈ 2 in both cases. In the case of the Copenhagen multiplex, the only layer which evidences a significant role in the relation of the other two is the phone/sms layer, which displays both mediation and suppression effects, although the former is notably stronger. For the proximity network we considered values averaged over the whole four weeks, and used an adjacency matrix with a density comparable to that of the Facebook links, corresponding to the closest proximity range. Likewise, the phone network was built irrespective of the timing of the interactions. In the case of the Lazega law firm, all three layers display very high mediation, but this effect is notably stronger for the co-working network, i.e. within this firm dyadic friendships are related to dyadic advisory relations, and this relation is mediated by the fact that the individuals are co-working. Only the friendship layer displays a suppressor effect (the one played by the advice layer is non-significant), i.e. pairs of individuals that are not co-working can have an advisory relationship (or otherwise) because they are friends, or co-working pairs will also have an advisory relationship without needing to be friends. In the case of the Twitter triplet (NYC), only the Replies network shows a mediating effect. Finally, in the nervous system multiplex (C. elegans) we can see that all layers display some amount of mediation and suppression. The electrical synapses layer displays the strongest suppressing effect, whereas the monadic chemical layer displays the largest mediating role. The increased suppression role of the electrical synapses layer reflects the evidence that chemical and electrical synapses closely interact and serve related functions [14], so that when either of the chemical layers is taken as C, the presence of the other chemical layer accounts for a reduced suppression/mediation.

Figure 5: Similarity between pairs of real-world networks. Values of NJ(A, B) computed on the four empirical multiplex networks considered (for each multiplex, we consider the three pair permutations). As a reference, black horizontal lines display NJ_null(A, B), the average over several randomisations of layers A and B (red lines correspond to ± one standard deviation). We conclude that all pairs of layers are genuinely related, with the possible exception of the pair NYC replies vs retweets.
As a final analysis, and in order to show how suppression and mediation can be functionally modulated within a particular real-world example, we examine the role played by the Proximity layer in the relation between the Facebook and phone calls/sms layers when this layer is systematically varied. In this multiplex, the proximity network is originally reconstructed from the Bluetooth signal strength between participants, by assigning a link between each pair of nodes whose relative Bluetooth signal strength, averaged over the whole recording period (four weeks), falls within a given range. In order to build different proximity networks (each accounting for a different spatial scale) while keeping the edge density constant, we build non-overlapping Bluetooth intensity ranges by taking into account the original Bluetooth intensity distribution (see the inset of Fig. 7). In this way, the ranges are non-uniform but the number of edges in each range is the same, hence the resulting proximity networks all have the same edge density while describing different scales of physical proximity. The intuition is that only the smallest scale yields a meaningful proximity network, and that for larger spatial scales the resulting proximity networks do not really imply any real interaction between the nodes. Then, for each resulting proximity network, we estimate the role it plays in the relation between the other two networks and plot it in the (m, s)-plane. Results are shown in the outer panel of Fig. 7.

Figure 7: According to the Bluetooth intensity (RSSI) distribution, we build seven non-overlapping RSSI windows. RSSI is inversely related to distance (the larger the signal strength, the closer two nodes are), so these windows represent different spatial scales. The right panel describes m + s for each spatial scale, further emphasising that only at the closest spatial scale does the proximity network play a strong mediating role.
For proximity networks describing large spatial scales, the network is essentially independent of the other two, and only when the proximity network captures smaller spatial scales (i.e. when links describe real physical proximity) does an indirect effect (notably, mediation) become amplified.
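The construction of equal-density proximity networks amounts to ranking pairs by averaged signal strength and cutting the ranking into consecutive blocks of equal size. This Python sketch assumes a hypothetical input format (a dict from node pair to time-averaged strength) and uses synthetic Gaussian values rather than real RSSI data; the function name and the number of edges per window are illustrative choices.

```python
import random


def equal_count_windows(strength, n_windows, edges_per_window):
    """Split node pairs into non-overlapping strength windows of equal size.

    strength: dict mapping (i, j) -> time-averaged signal strength (hypothetical format).
    Pairs are ranked from strongest (closest) to weakest signal; window w holds ranks
    w * edges_per_window .. (w + 1) * edges_per_window - 1, so every window yields a
    network with the same number of edges while covering a non-uniform strength range.
    """
    if n_windows * edges_per_window > len(strength):
        raise ValueError("not enough pairs for the requested windows")
    ranked = sorted(strength, key=strength.get, reverse=True)
    return [set(ranked[w * edges_per_window:(w + 1) * edges_per_window])
            for w in range(n_windows)]


# Synthetic example: 40 nodes with made-up strength values (not real RSSI data).
rng = random.Random(7)
strength = {(i, j): rng.gauss(-80, 8) for i in range(40) for j in range(i + 1, 40)}
windows = equal_count_windows(strength, n_windows=7, edges_per_window=60)
for w, edges in enumerate(windows):
    lo = min(strength[e] for e in edges)
    hi = max(strength[e] for e in edges)
    print(f"window {w}: {len(edges)} edges, strength in [{lo:.1f}, {hi:.1f}]")
```

Each window can then play the role of C in the (m, s) analysis, with the first window corresponding to the closest spatial scale.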

Discussion
In this paper we have proposed a simple strategy to assess the role that a given network might play in shaping the relation between two other networks, thus enlarging the paradigm of network similarity beyond the classical pairwise comparison. This approach is aligned with a recent endeavour that aims at going beyond dyadic interactions in the characterisation of complex systems [10], and takes inspiration from the causal mediation literature [11,12]. We use a set-theoretic approach to define a similarity metric between a pair of networks and to further explore whether such a relation is independent of, mediated by or suppressed by a third network which might be hidden. We introduce simple generative models that, as we prove, produce pure mediation and suppression. We then explore the coexistence of mediation and suppression and develop a procedure to disentangle both indirect effects. The whole methodology is subsequently applied to a range of real-world 3-layer multiplex networks, and we unveil previously unnoticed mediation and suppression effects in social and brain networks. We hope this work sparks further research in several areas. First, the simplicity and tractability of the approach make it easily applicable across disciplines. Second, our approach can be readily generalised beyond isolated triplets of networks: one can sequentially apply this protocol to a multiplex network with an arbitrary number of layers, or to a temporal network, and accordingly derive concepts of causality and directionality in this context.

Proof of theorem 2
Proof. In this case the proof uses basic arguments of set theory. We aim to prove that J_p(A, B|C) < J(A, B), i.e.

$$\frac{|(a \cap b)\setminus c|}{|(a \cup b)\setminus c|} < \frac{|a \cap b|}{|a \cup b|}.$$
Let us define the residual sets R_a := a \ c and R_b := b \ c, and the residuals' intersection R_i := R_a ∩ R_b and union R_u := R_a ∪ R_b. R_a and R_b are clearly non-empty according to Algorithm 2. R_i is not guaranteed to be non-empty, as A and B are probabilistic, so in what follows we need to assume that R_i ≠ Ø. R_u is trivially non-empty according to Algorithm 2.
Since for any three sets x, y and z we have (x ∩ y) \ z = (x \ z) ∩ (y \ z) and (x ∪ y) \ z = (x \ z) ∪ (y \ z), we also have |(a ∩ b) \ c| = |R_i| and |(a ∪ b) \ c| = |R_u|. Incidentally, our previous assumption can now be easily interpreted: we need to assume that networks A and B share at least one edge besides the common edges shared with C, which is why this assumption is stated in the theorem. Now, since in this case c ⊂ (a ∩ b), it is easy to see that a ∩ b = c ∪ R_i and a ∪ b = c ∪ R_u. Also, by construction c ∩ R_i = Ø, and therefore |a ∩ b| = |c| + |R_i|. Similarly, by construction c ∩ R_u = Ø, and therefore |a ∪ b| = |c ∪ R_u| = |c| + |R_u|. Therefore we aim to prove that

$$\frac{|R_i|}{|R_u|} < \frac{|c| + |R_i|}{|c| + |R_u|}.$$

Since R_i and R_u are respectively the intersection and the union of two sets, it trivially follows that |R_i| ≤ |R_u|, where the inequality only saturates when R_a = R_b (equivalently, since c ⊂ a and c ⊂ b, when a = b), and is otherwise strict. Let us indeed assume A ≠ B, thus enforcing the strict inequality. Rearranging terms (all denominators are positive):

$$|R_i|\,(|c| + |R_u|) < |R_u|\,(|c| + |R_i|) \iff |c|\,|R_i| < |c|\,|R_u| \iff |R_i| < |R_u|,$$

where the last equivalence requires |c| > 0, i.e. C has at least one edge. The final inequality holds strictly since A ≠ B, which concludes the proof.
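A quick numerical sanity check of the inequality just proved (not a substitute for the proof) is to draw many small random set triplets with c ⊆ a and c ⊆ b, keep those satisfying the hypotheses used above (C non-empty, R_i non-empty, A ≠ B), and compare the two Jaccard ratios exactly with rational arithmetic. The universe size and sampling probabilities in this Python sketch are arbitrary choices.

```python
import random
from fractions import Fraction


def jaccard(x, y):
    """Exact Jaccard ratio |x & y| / |x | y| as a Fraction (union assumed non-empty)."""
    return Fraction(len(x & y), len(x | y))


rng = random.Random(8)
universe = range(12)  # abstract "edge" labels; only the set structure matters here
checked = violations = 0
for _ in range(5000):
    c = {e for e in universe if rng.random() < 0.3}
    a = c | {e for e in universe if rng.random() < 0.4}  # c is a subset of a
    b = c | {e for e in universe if rng.random() < 0.4}  # c is a subset of b
    # Hypotheses used in the proof: C non-empty, R_i non-empty, A != B.
    if not c or not ((a & b) - c) or a == b:
        continue
    checked += 1
    if not jaccard(a - c, b - c) < jaccard(a, b):  # J_p < J should always hold
        violations += 1
print(f"checked {checked} triplets, violations: {violations}")
```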

Proof of theorem 3
Proof. The first thing to observe is that in Algorithm 3 the condition c ⊂ (a ∩ b) is not met, therefore Theorem 2 does not apply in this case. Let us now define the following sets. Let a_p ⊂ a be the subset of edges in a such that a_p ∩ c = Ø. The elements of this subset will be in b with probability q, hence on average |a_p|q edges of a_p will be in b.
Let c_p ⊂ c be the subset of edges in c such that c_p ∩ a = Ø. The elements of this subset will be in b with probability q, hence on average |c_p|q edges of c_p will be in b.
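Let r ⊂ b be the subset of edges in b which are neither in a nor in c, i.e. r ∩ (a ∪ c) = Ø. By symmetry, we have E(|a_p|) = E(|c_p|). According to Algorithm 3 (a probabilistic XOR gate), edges in a ∩ c do not enter b, so that (a ∩ b) \ c = a ∩ b = a_p ∩ b, a ∪ b = a ∪ (c_p ∩ b) ∪ r and (a ∪ b) \ c = a_p ∪ r, all unions being disjoint. Hence, to leading order,

$$E\big(J(A,B)\big) \approx \frac{q\,E(|a_p|)}{E(|a|) + q\,E(|c_p|) + E(|r|)}, \qquad E\big(J_p(A,B\,|\,C)\big) \approx \frac{q\,E(|a_p|)}{E(|a_p|) + E(|r|)}.$$

Since |a| ≥ |a_p|, the second expression exceeds the first as long as E(|c_p|)q > 0 (and E(|a_p|)q > 0). Therefore, under this condition, E(J_p(A, B|C)) > E(J(A, B)), yielding E(∆) > 0. Since A and C are independent, c_p is non-empty with high probability, so for a given value of q > 0 and sufficiently large N this condition is always met.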
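A worked special case makes the size of the effect concrete. Assume (this is not stated in the text) that q = 1 and that Algorithm 3 adds no edges to B outside a ∪ c, i.e. r = Ø, so that b = a_p ∪ c_p is exactly the symmetric difference of a and c; take A and C to be independent G(N, p) networks over M = N(N − 1)/2 node pairs. The set identities above then give:

```latex
% Assumed special case: q = 1, r = \emptyset, hence b = a \triangle c = a_p \cup c_p.
(a \cap b)\setminus c = a_p, \qquad (a \cup b)\setminus c = a_p
  \;\Longrightarrow\; J_p(A,B\,|\,C) = 1 \quad (a_p \neq \emptyset), \\
a \cap b = a_p, \qquad a \cup b = a \cup c
  \;\Longrightarrow\; J(A,B) = \frac{|a_p|}{|a \cup c|}
  \approx \frac{p(1-p)M}{(2p - p^2)M} = \frac{1-p}{2-p}, \\
\Delta \approx 1 - \frac{1-p}{2-p} = \frac{1}{2-p},
  \qquad \text{e.g. } p = 0.5:\; J \approx \tfrac{1}{3},\; \Delta \approx \tfrac{2}{3}.
```

Under these assumptions the partial index saturates at 1, so ∆ is as large as it can be for the given J; allowing r ≠ Ø or q < 1 shrinks ∆ while, by the argument above, keeping E(∆) > 0.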