Interplay between tie strength and neighbourhood topology in complex networks

Granovetter’s weak ties theory is a very important sociological theory according to which a correlation between edge weight and the network’s topology should exist. More specifically, the neighbourhood overlap of two nodes connected by an edge should be positively correlated with edge weight (tie strength). However, some real social networks exhibit a negative correlation—the most prominent example is the scientific collaboration network, for which overlap decreases with edge weight. It has been demonstrated that the aforementioned inconsistency with Granovetter’s theory can be alleviated in the scientific collaboration network through the use of asymmetric measures. In this paper, we explain that while asymmetric measures are often necessary to describe complex networks and to confirm Granovetter’s theory, their interpretation is not simple, and there are pitfalls that one must be wary of. The definitions of asymmetric weights and overlaps introduce structural correlations that must be filtered out. We show that correlation profiles can be used to overcome this problem. Using this technique, not only do we confirm Granovetter’s theory in various real and artificial social networks, but we also show that Granovetter-like weight-topology correlations are present in other complex networks (e.g. metabolic and neural networks). Our results suggest that Granovetter’s theory is a sociological manifestation of more general principles governing various types of complex networks.


I. INTRODUCTION
While this is not always the case, the weights of edges in networks are usually quantitative expressions of the mutual relationship between nodes. Be it the number of scientific collaborations between authors or the number of mentions in a social network, the weight of an edge often signifies the strength of the connection between nodes. It stands to reason that this strength must, in some way, correlate with the network's structure: specifically, with the relative position of nodes and their neighbourhoods within the network.
Mark Granovetter, in his famous work The Strength of Weak Ties [1,2], introduced a theory which aims to explain the aforementioned link between the weights of edges and the topology of the network. An example that illustrates Granovetter's hypothesis can be found in Fig. 1, which shows two fully connected clusters of nodes. According to Granovetter, since virtually all nodes in each cluster have the same neighbourhoods, we should expect that edge weights (tie strengths) within clusters will be high. The clusters are also connected by a single edge. The weight of this edge should be low, as the nodes on both sides of the link do not share any neighbours. Granovetter's theory also states that weak ties, like the one connecting the clusters in our example, are crucial to the diffusion of information in the network, and that nodes with access to such ties have an advantage over those without. In this work, however, we are only interested in the first part of the theory, that is, in weight-topology correlations.
FIG. 1. An example illustrating Granovetter's theory. A detailed description can be found in the text.
In more formal terms, the first part of Granovetter's theory states that edge weight is positively correlated with the overlap of the neighbourhoods of two connected nodes. The overlap between the neighbourhoods of node i and node j is defined in the following way [3]:
$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}},$$
where $n_{ij}$ is the number of common neighbours of nodes i and j, and $k_i$ and $k_j$ are the degrees of nodes i and j. It is worth noting that overlap, as defined above, is a symmetric measure, and we will refer to it as symmetric overlap to emphasize this fact. Similarly, weights w are also assumed to be symmetric, that is
$$w_{ij} = w_{ji}.$$
Granovetter's theory in this form (that is, a monotonically increasing relation between $O_{ij}$ and $w_{ij}$) has been empirically confirmed, to various extents, in real social networks [3][4][5][6][7][8][9][10], like the mobile communication network. However, there are also counterexamples to the theory, one of which is the scientific collaboration network [11][12][13]. In this network, nodes represent authors, and an edge connects two authors if they co-authored at least one manuscript. The symmetric weight $w_{ij}$ equals the number of manuscripts co-authored by authors i and j.
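The symmetric overlap can be illustrated with a short sketch. It assumes the standard form $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$ from [3]; the adjacency structure `adj`, the function name and the toy graph (two triangles joined by one bridge, in the spirit of Fig. 1) are illustrative choices, not part of the original study.

```python
def symmetric_overlap(adj, i, j):
    """Symmetric neighbourhood overlap O_ij of the edge (i, j).

    `adj` is a hypothetical adjacency structure: node -> set of neighbours.
    """
    n_ij = len(adj[i] & adj[j])            # common neighbours of i and j
    k_i, k_j = len(adj[i]), len(adj[j])    # degrees
    denom = (k_i - 1) + (k_j - 1) - n_ij
    # Assumption: an isolated edge (both degrees equal to 1) gets overlap 0.
    return n_ij / denom if denom > 0 else 0.0


# Toy graph: two fully connected clusters (triangles) joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

inside = symmetric_overlap(adj, 0, 1)   # edge inside a cluster
bridge = symmetric_overlap(adj, 2, 3)   # edge connecting the clusters
print(inside, bridge)
```

In this toy graph the intra-cluster edge has the maximal overlap and the bridge has none, which is the topological side of the picture that, according to the theory, edge weights should follow.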
At first glance, the scientific collaboration network seems to defy Granovetter's theory, as neighbourhood overlap, on average, decreases with edge weight for the majority of edges. We have shown, however, that this supposed disagreement stems from improper definitions of weights and overlaps [14] or, more specifically, from the fact that symmetric measures cannot properly describe the properties of this network.
Fig. 2 illustrates the problem with symmetric measures. Panel a) shows two nodes, left l and right r, with degrees $k_l = k_r = 4$. The nodes share two common neighbours. In this case, symmetric overlap equals
$$O_{lr} = \frac{2}{(4 - 1) + (4 - 1) - 2} = \frac{1}{2}.$$
Since the degrees of both nodes are identical, the two common neighbours constitute the same fraction of the neighbourhood of each node. In such a scenario, symmetric overlap accurately reflects this observation from the standpoint of both nodes. However, symmetric overlap fails when assessing nodes with vastly different degrees. In Fig. 2b), the left node with $k_l = 4$ shares two common neighbours with the right node, whose degree is $k_r = 16$.
Intuitively, the left node should assign much greater significance to the two common neighbours than the right node. Symmetric overlap cannot be used as a measure of this significance, as for both nodes it equals
$$O_{lr} = \frac{2}{(4 - 1) + (16 - 1) - 2} = \frac{1}{8}.$$
It is a low value, clearly skewed towards the node with the higher degree.
These two examples show that symmetric measures work properly when dealing with homogeneous networks, where we compare similar nodes, as was the case for many networks in which Granovetter's theory was found to hold. Non-homogeneous networks, like the scientific collaboration network, which is scale-free [15,16] (there are often nodes with highly different degrees on both sides of an edge), require a different approach. The innate asymmetry of these networks suggests that one must use asymmetric measures instead of symmetric ones. In [14], we introduced the asymmetric overlap Q:
$$Q_{ij} = \frac{n_{ij}}{k_i - 1}, \tag{6}$$
with, in general,
$$Q_{ij} \neq Q_{ji}.$$
Returning to the example from Fig. 2b), asymmetric overlap for the left node is
$$Q_{lr} = \frac{2}{4 - 1} = \frac{2}{3},$$
while for the right node, we have
$$Q_{rl} = \frac{2}{16 - 1} = \frac{2}{15}.$$
These two values of overlap properly convey the importance of the shared neighbourhood from the perspective of each node separately. Asymmetric overlap reflects the asymmetric relationships of authors in the scientific collaboration network (and other non-homogeneous networks). What can be a large fraction of collaborators from the perspective of one author can be a negligible fraction from the perspective of another author.
Symmetric definitions of weights suffer from similar issues in non-homogeneous networks. Symmetric weight in the scientific collaboration network usually equals the number of collaborations (co-authored articles) between two authors. However, the importance of a single collaboration depends on the total number of collaborations. If someone wrote only one paper in collaboration with an author who published tens or hundreds of manuscripts, then intuitively, the strength of that tie (the weight of the edge) should be greater from the perspective of the former author. Thus, in [14], we also introduced the asymmetric definition of weight v:
$$v_{ij} = \frac{w_{ij}}{m_i}, \tag{10}$$
where $w_{ij}$ is the symmetric weight, and $m_i$ is the number of papers published by the i-th author. For asymmetric weights, we also have, in general,
$$v_{ij} \neq v_{ji}.$$
Using asymmetric overlaps and asymmetric weights, we showed that Granovetter's theory holds in the scientific collaboration network. We also postulated that these are natural and intuitive tools capable of properly describing scale-free networks, with application to other problems, e.g. link prediction [17]. However, these asymmetric definitions introduce a certain non-obvious issue, especially when it comes to confirming Granovetter's theory.
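The two configurations of Fig. 2 can be reproduced in a few lines. This is only a sketch: it assumes the asymmetric overlap has the form $Q_{ij} = n_{ij}/(k_i - 1)$ and the asymmetric weight the form $v_{ij} = w_{ij}/m_i$, and the helper `build_pair` with its node labels is a hypothetical construction used purely for illustration.

```python
from fractions import Fraction


def build_pair(k_left, k_right, common):
    """Adjacency sets for two linked nodes 'l' and 'r' with the given degrees."""
    adj = {"l": {"r"}, "r": {"l"}}
    for c in range(common):                      # neighbours shared by l and r
        adj[f"c{c}"] = {"l", "r"}
        adj["l"].add(f"c{c}")
        adj["r"].add(f"c{c}")
    for side, k in (("l", k_left), ("r", k_right)):
        for p in range(k - 1 - common):          # neighbours of one side only
            adj[f"{side}{p}"] = {side}
            adj[side].add(f"{side}{p}")
    return adj


def symmetric_overlap(adj, i, j):
    # Isolated edges (zero denominator) are not handled in this sketch.
    n_ij = len(adj[i] & adj[j])
    return Fraction(n_ij, (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij)


def asymmetric_overlap(adj, i, j):
    """Q_ij: shared neighbours as a fraction of i's neighbours other than j."""
    return Fraction(len(adj[i] & adj[j]), len(adj[i]) - 1)


def asymmetric_weight(w_ij, m_i):
    """v_ij: joint papers as a fraction of all papers of author i."""
    return Fraction(w_ij, m_i)


fig2a = build_pair(4, 4, common=2)
fig2b = build_pair(4, 16, common=2)

o_a = symmetric_overlap(fig2a, "l", "r")      # 1/2
o_b = symmetric_overlap(fig2b, "l", "r")      # 1/8
q_lr = asymmetric_overlap(fig2b, "l", "r")    # 2/3
q_rl = asymmetric_overlap(fig2b, "r", "l")    # 2/15

# One joint paper: an author with 1 paper in total vs one with 100 papers.
v_small = asymmetric_weight(1, 1)             # 1
v_large = asymmetric_weight(1, 100)           # 1/100
```

Exact fractions are used so that the values can be compared directly with the hand calculation for Fig. 2.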
The nature of this theory (or rather, the nature of the correlation between weights and overlaps postulated by Granovetter) is sociological. That is, the correlation must stem from actual social interactions between entities represented by nodes in the network. In contrast, the measures defined in (6) and (10) introduce structural correlations to the mix, that is, correlations that result from the topology of the network. It is not unreasonable to assume that the number of papers published by an author ($m_i$) will relate in some way to the total number of collaborators ($k_i$); intuitively, one can expect a positive correlation between these two variables. This raises the following questions: What are we really observing if we detect a correlation between asymmetric overlap and asymmetric weight? What is the source of that correlation? Are we truly confirming Granovetter's theory, or are we merely misinterpreting the effects of the network's topology? This paper aims to dispel these doubts using tools introduced in the next section.

II. METHODS
Let us reiterate the problem mentioned in the Introduction and define it in a clearer and more tangible way. The main assumption behind Granovetter's theory is that weights in social networks are not assigned to edges randomly. Instead, they quantify the strength of interpersonal interactions and follow various patterns dictated by the nature of these interactions. One such pattern is that the strength of interactions should be directly tied to the overlap between social circles of nodes. The higher the overlap, the greater the strength of interaction. It is an intuitive and relatable conclusion: for example, ties within a family, which is a densely connected social circle, should be stronger than ties within a workplace.
Assuming that Granovetter's theory is correct, we could expect that in a network in which weights are assigned completely at random, the correlation between overlap and weight does not exist at all. By the same token, if we were to randomise weights in a network by shuffling them among the edges, such a procedure should also destroy the correlation between overlaps and weights. Unfortunately, while this is true for symmetric measures, asymmetric weights and overlaps still exhibit correlation even with randomised weights. The source of these correlations was mentioned in the previous section: it is the structural correlation between $m_i$ and $k_i$, which are in the denominators of the asymmetric measures.
In fact, for reasons that will be explained in detail in the next section, we will use a definition of asymmetric weight different from the one introduced in [14]. In this work, asymmetric weight will be defined as (cf. Eq. (10))
$$v_{ij} = \frac{w_{ij}}{s_i}, \tag{12}$$
where $s_i$ is the strength of the i-th node,
$$s_i = \sum_{j} w_{ij},$$
with the sum running over the neighbours of node i. Here, the structural correlations are even clearer. Since $s_i \propto k_i$ [18] (for example, if we assume that all weights are equal, $w_{ij} = w$, then $s_i = w k_i$) and $Q_{ij} \propto 1/(k_i - 1)$ by Eq. (6), both $v_{ij}$ and $Q_{ij}$ decrease with the degree of node i, and we must have a positive correlation between them. The existence of the correlation between $Q_{ij}$ and $v_{ij}$ is largely independent of the distribution of symmetric weights $w_{ij}$ in the graph. That is, if we were to shuffle existing weights between edges or assign completely new weights to edges according to some probability distribution, this structural correlation would still be present.
The challenge is, then, to decouple the structural correlations from Granovetter-like social correlations while keeping the asymmetric definitions of strengths and overlaps. Thankfully, this is hardly a new kind of problem, and there are tools capable of dealing with similar issues. More specifically, we are going to employ so-called correlation profiles [19,20], which were used before to study mixing patterns in complex networks (correlations between node degrees at the ends of the same edge) [21][22][23].
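A small sketch may help to see where the structural correlation comes from. It assumes the strength-normalised weight $v_{ij} = w_{ij}/s_i$, with $s_i$ the sum of the symmetric weights of the edges attached to node i; the dictionary layout and the star-shaped toy graph are illustrative assumptions.

```python
def asymmetric_weights(w):
    """Strength-normalised weights v_ij = w_ij / s_i.

    `w` is a hypothetical dict {(i, j): w_ij} with one entry per undirected
    edge; the result holds one entry for each direction of every edge.
    """
    s = {}                                    # node strengths s_i
    for (i, j), w_ij in w.items():
        s[i] = s.get(i, 0.0) + w_ij
        s[j] = s.get(j, 0.0) + w_ij
    v = {}
    for (i, j), w_ij in w.items():
        v[(i, j)] = w_ij / s[i]               # the edge as seen from i
        v[(j, i)] = w_ij / s[j]               # the same edge as seen from j
    return v


# Star with hub 0 and four leaves; all symmetric weights are equal.
star = {(0, leaf): 1.0 for leaf in (1, 2, 3, 4)}
v = asymmetric_weights(star)

hub_view = v[(0, 1)]     # 1 / k_0 = 0.25
leaf_view = v[(1, 0)]    # 1 / k_1 = 1.0
```

With equal weights, $v_{ij}$ reduces to $1/k_i$ regardless of how the weights are placed, so any dependence on degree that it shares with the overlap is purely topological.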
The idea behind correlation profiles is simple but powerful. One needs to compare the properties of the actual network with the properties of its randomised realisations (the null model). If the null model is chosen correctly, then the difference between random realisations and the actual network should result not from structural correlations but, in our case, from sociological processes (which are not present in the null model) that govern the assignment of weights to edges.
Correlation profiles are constructed using two simple ratios. If we want to study some pattern p observed in a network, then we have to compare the number $N(p)$ of occurrences of that pattern in the actual network with the average number $\langle N_r(p) \rangle$ of occurrences of the same pattern in randomised realisations of the network. Using these two numbers, we can define the ratio
$$R(p) = \frac{N(p)}{\langle N_r(p) \rangle}. \tag{19}$$
If $R(p)$ is close to 1, then there is no significant difference between the null model and the actual network. It follows that pattern p is associated with properties captured in the null model. On the other hand, if $R(p)$ is significantly higher or lower than 1, then there are mechanisms in the actual network that are responsible for the creation (or dissolution) of pattern p that are not present in the null model.
The second ratio, the Z-score, is defined as
$$Z(p) = \frac{N(p) - \langle N_r(p) \rangle}{\Delta N_r(p)},$$
where $\Delta N_r(p)$ is the standard deviation of $N_r(p)$ in the randomised realisations of the network. This ratio determines the statistical significance of $R(p)$.
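For a single pattern, both ratios can be computed as in the sketch below. It assumes $R(p) = N(p)/\langle N_r(p) \rangle$ and $Z(p) = (N(p) - \langle N_r(p) \rangle)/\Delta N_r(p)$; the use of the population standard deviation and the example counts are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def ratio_and_zscore(n_actual, n_random):
    """Return (R, Z) for one pattern.

    n_actual: count of the pattern in the actual network.
    n_random: counts of the same pattern in the randomised realisations.
    """
    avg = mean(n_random)
    sd = pstdev(n_random)     # assumption: population standard deviation
    r = n_actual / avg if avg > 0 else float("nan")
    z = (n_actual - avg) / sd if sd > 0 else float("nan")
    return r, z


# Made-up counts: 30 occurrences in the actual network, about 20 on average
# in five randomised realisations.
r, z = ratio_and_zscore(30, [18, 20, 22, 20, 20])
```

Here the pattern is over-represented in the actual network (R above 1), and the Z-score expresses that excess in units of the spread observed across the null-model realisations.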
In most cases, correlation profiles are represented as two-dimensional images. To give a more concrete example, if we want to study the relation between overlap Q and weight v, we divide the Q-v plane into two-dimensional bins of equal size on a log-log scale (we use a logarithmic scale because Q and v values span multiple decades). Patterns p correspond to pairs (v, Q) (each edge in the network introduces two such pairs) falling into corresponding bins.
We count the number of points $N(p_i)$ that fall into the i-th bin in the actual network (here, $p_i$ denotes a pattern corresponding to a point falling into the i-th bin). Next, we create many random realisations of the network by shuffling symmetric weights and average over these realisations the number of points $N_r(p_i)$ that fall into the corresponding bin. Dividing these two numbers gives us the ratio $R(p_i)$ for the i-th bin. We repeat this procedure for each bin (using the same random realisations), which gives us the full correlation profile. Z-scores are calculated in the same way.
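The binning and shuffling procedure can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes $Q_{ij} = n_{ij}/(k_i - 1)$ and $v_{ij} = w_{ij}/s_i$, drops points with $Q = 0$ (which have no position on a logarithmic axis), and uses hypothetical function names, bin widths and a toy graph.

```python
import math
import random
from collections import Counter


def directed_points(edges, weights):
    """(v, Q) pair seen from each end of every edge.

    edges: list of (i, j); weights: symmetric weights in the same order.
    """
    adj, s = {}, {}
    for (i, j), w in zip(edges, weights):
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
        s[i] = s.get(i, 0.0) + w
        s[j] = s.get(j, 0.0) + w
    pts = []
    for (i, j), w in zip(edges, weights):
        n_ij = len(adj[i] & adj[j])
        for a in (i, j):
            k = len(adj[a])
            if n_ij > 0 and k > 1:            # assumption: skip Q = 0 points
                pts.append((w / s[a], n_ij / (k - 1)))
    return pts


def bin_counts(pts, bins_per_decade):
    """Count points in equal-size bins on a log-log scale."""
    def log_bin(x):
        return math.floor(math.log10(x) * bins_per_decade)
    return Counter((log_bin(v), log_bin(q)) for v, q in pts)


def correlation_profile(edges, weights, n_random=100, bins_per_decade=2, seed=0):
    """R for every occupied bin; the null model shuffles weights among edges."""
    rng = random.Random(seed)
    actual = bin_counts(directed_points(edges, weights), bins_per_decade)
    total = Counter()
    shuffled = list(weights)
    for _ in range(n_random):
        rng.shuffle(shuffled)                 # topology stays untouched
        total.update(bin_counts(directed_points(edges, shuffled), bins_per_decade))
    profile = {}
    for b in set(actual) | set(total):
        avg = total[b] / n_random             # average count in the null model
        profile[b] = actual[b] / avg if avg > 0 else float("inf")
    return profile


# Toy graph with made-up weights.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (1, 3)]
weights = [5.0, 4.0, 6.0, 1.0, 3.0, 7.0, 2.0, 1.0]
profile = correlation_profile(edges, weights, n_random=200)
```

Because only the weights are permuted, the overlap coordinate of every point is identical in all realisations. When all weights are equal, shuffling changes nothing and every bin yields R = 1, which is a convenient sanity check. Z-scores could be obtained from the same per-realisation counts.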
An illustration of the calculation of $R(p_i)$ can be found in Fig. 3. In this example, we concentrate on the middle bin. When weights are shuffled among edges during the creation of randomised graph instances (the null model), points on the Q-v diagrams change their positions. However, they only move along the v axis. The overlaps, which are independent of weights, do not change. Since the network's topology is fully retained during weight shuffling, the null model leaves the structural correlations intact. In Fig. 3, points that will move into the middle bin after shuffling are orange, while the point that will move out of the middle bin is green. Arrows indicate where each of the relevant points will end up after shuffling. Thus, for the middle bin, more points enter the bin than leave it, so that $N(p) < N_r(p)$ and $R(p) < 1$. This value of R suggests that the processes responsible for the distribution of weights in the actual network remove points from the middle bin when compared with a random instance of the network, possibly prioritizing other bins in the diagram. While it is an oversimplification (especially since we used only one randomised network instance instead of an entire ensemble, as required by the definition in Eq. (19)), this example demonstrates the main idea behind the correlation diagrams. The nonstructural correlations can be singled out by comparing the positions of points on the Q-v diagrams corresponding to the actual network and its randomised instances.
FIG. 3. An example illustrating the creation of correlation profiles: points in bins a) before weight shuffling and b) after shuffling.

III. DATASETS
In [14], we studied the validity of Granovetter's theory only in the scientific collaboration network. In this work, wanting to test both the theory itself and the applicability of correlation profiles on a variety of different networks, we used 8 datasets in total:
• Twitter (source: [24]) - the network of Twitter mentions [12]. Nodes represent Twitter users, and weights correspond to the number of mentions.
• DBLP (source: [25]) - the scientific collaboration network (version 12). It contains metadata about scientific articles [26], including lists of authors and references. Nodes represent authors; two authors are connected if they co-authored at least one paper. Symmetric weight is equal to the total number of papers co-authored by two authors.
• Actor Movies (source: [27]) - nodes represent actors; two actors are connected if they appeared in the same film. Symmetric weight is equal to the number of films in which the actors worked together.
• Record Labels (source: [28]) - nodes represent music artists; two artists are connected if they performed under the same record label. Symmetric weight is equal to the number of record labels under which the artists worked together.
• The Marvel Universe Social Network (source: [29]) - nodes represent heroes; two heroes are connected if they appeared in the same comic [30]. Symmetric weight is equal to the number of comics in which the heroes appeared together.
• Flights - the network of passenger flights. Nodes represent airports, and weights correspond to the volume of traffic (number of passengers) between airports. This database is commercial and is not publicly available.
• Metabolic Network (source: [31]) - nodes represent reactants, connected by an edge when they take part in the same reaction [32]. Symmetric weight equals the number of reactions sharing two given reactants.
• Caenorhabditis Elegans (source: [33]) - the neural network of Caenorhabditis elegans [34]. Nodes represent neurons, and an edge links two neurons if a synapse or gap junction connects them. Weights correspond to the total number of connections between neurons.
Not all of these networks are social networks, and some are artificial social networks. However, they all exhibit a Granovetter-like relationship between overlaps and weights. Table I contains information about the sizes of the largest connected components in the networks; our analysis was constrained to these components. Some of the networks we used can be represented as bipartite graphs (e.g. DBLP and Actor Movies; virtually all collaboration networks can be stored in this form) and recovered via appropriate projections [35,36]. These networks are undirected and have a well-defined notion of symmetric weight. One can also easily use Eq. (10) to define asymmetric weights in such networks, with $m_i$ equal to the degree of node i in the bipartite representation of a graph (which corresponds to the total number of collaborations for a given node, e.g. movies or scientific manuscripts). On the other hand, networks like Twitter or Flights are inherently directed, cannot be expressed as bipartite graphs and, consequently, Eq. (10) cannot be applied in a meaningful way.
In order to standardise our approach to the networks under study and overcome problems associated with Eq. (10), we decided to symmetrise all directed networks and assumed that symmetric weight in their undirected equivalent is equal to the average of weights in both directions:
$$w_{ij} = \frac{V_{ij} + V_{ji}}{2}, \tag{20}$$
where $V_{ij}$ and $V_{ji}$ are weights of directed edges. At the same time, we abandoned the definition of asymmetric weight introduced in [14] and settled on definition (12) instead (where asymmetry is achieved by normalising symmetric weight, that is, by dividing it by the strength of a node). While it may seem counterintuitive that directed networks are converted to undirected ones using Eq. (20), only to be converted again to directed networks using Eq. (12), this approach allows us to treat all networks, both directed and undirected ones, in the same way and to compare results.
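The symmetrisation step can be written compactly. The sketch below assumes that the symmetric weight is the average of the two directed weights and, as a further assumption not stated in the text, that a missing reverse edge contributes a weight of zero; the dictionary layout and node labels are illustrative.

```python
def symmetrise(directed):
    """Average directed weights V_ij and V_ji into one symmetric weight w_ij.

    `directed` is a hypothetical dict {(i, j): V_ij}. Assumption: a missing
    reverse edge counts as V_ji = 0. Node labels must be mutually comparable.
    """
    sym = {}
    for (i, j), v_ij in directed.items():
        key = (i, j) if i <= j else (j, i)    # one key per undirected edge
        sym[key] = sym.get(key, 0.0) + v_ij / 2
    return sym


# Made-up directed weights, e.g. numbers of mentions between users.
mentions = {("a", "b"): 4, ("b", "a"): 2, ("a", "c"): 6}
w = symmetrise(mentions)
```

The resulting symmetric weights can then be normalised by node strength to obtain the asymmetric weights used throughout the analysis.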

IV. RESULTS
Correlation profiles for Twitter, a real social network, are shown in Fig. 4. Panels a) and b) contain, for comparison with their asymmetric counterparts, heatmaps of the symmetric overlap O as a function of symmetric weight w for the actual network and the null model. It is worth noting that, in many cases, symmetric weights are integers, and many edges share the same weight value. This makes edges indistinguishable from one another, which is a problem associated with using symmetric weights. Asymmetric weights are free of this issue, which is their additional benefit. Panel c) contains a heatmap of the asymmetric overlap Q as a function of asymmetric weight v.
A clear, Granovetter-like relation is visible: overlap increases with weight. However, almost the same relation is present in panel d), which contains the equivalent heatmap for the null model (the same network with shuffled weights). These two panels show the root of the issue with the asymmetric definitions of weights and overlaps. Granovetter's theory dictates how weights should be distributed in a graph. If the theory is correct, then we should reasonably expect that there is no correlation between Q and v in the null model, as the shuffling procedure should destroy any deliberate (from the perspective of Granovetter's theory) placement of weights. Unfortunately, such a correlation is also present in the null model due to the network's topology. Moreover, at first glance, the relation between Q and v seems to be very similar in the actual network and the null model. This is where the correlation profiles come into play. By comparing panels c) and d) (that is, by dividing counts in bins in c) by counts in corresponding bins in d), which creates the correlation profile R, Eq. (19)), we can easily find the differences between the null model and the real network. Panel e) shows such a profile. A Granovetter-like relation is visible there as well: linear (on a log-log scale) clusters of bins into which more edges fall in the actual network than in the null model. It strongly suggests that Granovetter's theory is indeed correct and that sociological processes that govern the distribution of weights in real networks result in higher weights assigned to edges with higher values of overlaps. These results are statistically significant, which is confirmed by the Z-scores in panel f).
We calculated correlation profiles and Z-scores for all networks presented in the previous section. More examples can be found in Fig. 5 and Fig. 6, which show profiles for the network of flights and DBLP. Note that in the case of DBLP, the average symmetric overlap is a decreasing function of symmetric weight for the majority of samples. It is precisely this behaviour that necessitates the introduction of asymmetric measures. Results for asymmetric measures presented in both figures are qualitatively equivalent to the ones in Fig. 4. Once again, we can see a correlation between Q and v in both the actual network and the null model. A Granovetter-like relation is prominent in panel e), suggesting that the processes responsible for the distribution of weights in this network prefer to assign higher weight values to edges characterised by higher overlap values. This observation holds true for all the networks examined in our study.
There is another way to test Granovetter's theory: it is possible to calculate the correlation between overlaps and weights for the null model and the actual network. If Granovetter's theory is correct, then correlations in the real network should be stronger than in the null model. Fig. 7 shows these correlations for all networks we studied. Considering the non-linearity of data, we decided to use the Spearman correlation and calculate it for logarithms of weights and overlaps. As can be seen, in all cases, there is a stronger positive correlation between weights and overlaps in the actual network, which supports Granovetter's theory.
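The rank correlation used in this comparison can be sketched with the standard library alone. The implementation below (average ranks for ties, followed by a Pearson correlation of the ranks) is a generic textbook construction with hypothetical function names and made-up sample values, not the authors' code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        rank = (pos + end) / 2 + 1            # mean rank of the tied group
        for idx in order[pos:end + 1]:
            ranks[idx] = rank
        pos = end + 1
    return ranks


def pearson(x, y):
    # Undefined (division by zero) when either input is constant.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def spearman_log(v, q):
    """Spearman correlation of log-weights and log-overlaps (both positive)."""
    return spearman([math.log10(a) for a in v], [math.log10(b) for b in q])


# Made-up asymmetric weights and overlaps for five directed edges.
v = [0.01, 0.10, 0.50, 1.00, 0.05]
q = [0.02, 0.20, 0.30, 0.90, 0.15]
rho_actual = spearman_log(v, q)
```

Applying the same function to weights recomputed after shuffling gives the null-model value for comparison. Since ranks are preserved by any strictly increasing transformation, the logarithm does not change the coefficient itself; it is kept here only to mirror the description in the text.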

V. SUMMARY AND CONCLUDING REMARKS
Due to the asymmetric nature of many human interactions (or, more generally, any interactions), symmetric measures cannot be universally used to describe social networks [14,37]. As we have shown, asymmetry is required in order to deal with such networks properly. For example, asymmetric measures can be used to confirm Granovetter's theory in the network of scientific collaborations, which was considered a counterexample to said theory. However, asymmetric measures, depending on their definitions, are not easy to interpret and require careful and deliberate handling.
In the case of the asymmetric overlap Q and asymmetric weight v, as defined in Eqs. (6) and (12), the problem with interpretation stems from the superfluous correlations introduced by the definitions of these measures. In fact, there are two layers of correlation that one needs to be wary of when analysing the relationship between Q and v. The first layer is purely structural, induced by the network's topology. The strength of a node s (the sum of weights over edges connecting the node to its neighbours) is correlated with the node's degree, resulting in a correlation between Q and v. The second layer of correlations, the one we are truly interested in when confirming Granovetter's theory, is tied to the sociological processes that govern the distribution of weight between edges in the network. We assume that higher weight values will be assigned to edges with higher overlap values, which is not obvious, unlike the previous correlation. The problem is that correlations from both sources overlap, and a method that would allow us to differentiate between them is needed.
In this paper, we have shown that correlation profiles can be used to achieve this goal. The idea behind them is simple but effective: by randomising weights in a graph (shuffling them), we destroy the second kind of correlations, leaving only the structural correlations intact. Then, by comparing weights in the actual graph with its randomisations, we can determine how exactly the sociological processes responsible for weight distribution in a given network assign weights to edges. Our analysis shows that in the networks we studied, a clear Granovetter-like relationship is present in the correlation diagrams (see Fig. 4e for Twitter and Fig. 5e for the network of flights). That is, higher weight values are assigned, on average, to edges with higher overlap values, to the point that a monotonic relation (in the average sense on a log-log plot) is visible in the diagrams. This result truly confirms Granovetter's theory.
Moreover, not only did we study social networks and artificial social networks, but we also calculated correlation profiles for different kinds of networks, for example, the neural network of Caenorhabditis elegans or the metabolic network. These networks also exhibit a Granovetter-like relation between overlaps and weights, which suggests that Granovetter's theory is a sociological manifestation of more general principles governing complex networks.
On the one hand, we believe that this result is intuitive, as one can generally expect that if two nodes share a large portion of their neighbourhoods, then the strength of the connection between these nodes will likely be high. On the other hand, we hypothesise that the recently popularised theory of hidden metric spaces [38][39][40][41] can provide a more formal explanation of this phenomenon. According to this theory, the topology of some networks and the values of weights can be explained by the existence of metric spaces in which these networks can be embedded: the connections in the network are determined, roughly speaking, by the positions of nodes in the hidden space. Such a structured way of determining (or explaining the topology of) neighbourhoods of nodes and edge weights likely leads to a correlation between weights and overlaps. However, we must emphasise that it is still a hypothesis and a possible and interesting direction for future studies.

FIG. 2. Examples of two connected nodes with a) similar and b) vastly different degrees.

FIG. 4. Correlation profiles for Twitter. a) Heatmap for the actual network, symmetric weights. b) Heatmap for the null model (randomised network), symmetric weights. c) Heatmap for the actual network, asymmetric weights. d) Heatmap for the null model (randomised network), asymmetric weights. e) Correlation profile (R). f) Z-score (Z). The white lines in a) and b) show the average O as a function of w; those in c) and d) show the average Q as a function of v. The line in panel e) is the same as in c).

FIG. 5. Correlation profiles for the network of flights. a) Heatmap for the actual network, symmetric weights. b) Heatmap for the null model (randomised network), symmetric weights. c) Heatmap for the actual network, asymmetric weights. d) Heatmap for the null model (randomised network), asymmetric weights. e) Correlation profile (R). f) Z-score (Z). The white lines in a) and b) show the average O as a function of w; those in c) and d) show the average Q as a function of v. The line in panel e) is the same as in c).

FIG. 6. Correlation profiles for DBLP. a) Heatmap for the actual network, symmetric weights. b) Heatmap for the null model (randomised network), symmetric weights. c) Heatmap for the actual network, asymmetric weights. d) Heatmap for the null model (randomised network), asymmetric weights. e) Correlation profile (R). f) Z-score (Z). The white lines in a) and b) show the average O as a function of w; those in c) and d) show the average Q as a function of v. The line in panel e) is the same as in c).

FIG. 7. Spearman correlation between asymmetric overlaps and asymmetric weights in the real network as a function of the corresponding correlation in the null model.

TABLE I. Sizes of largest connected components in the datasets.