Testing structural balance theories in heterogeneous signed networks

The abundance of data about social relationships allows human behavior to be analyzed like any other natural phenomenon. Here we focus on balance theory, stating that social actors tend to avoid establishing cycles with an odd number of negative links. This statement, however, can be supported only after a comparison with a benchmark. Since the existing ones disregard actors' heterogeneity, we extend Exponential Random Graphs to signed networks with both global and local constraints and employ them to assess the significance of empirical unbalanced patterns. We find that the nature of balance crucially depends on the null model: while homogeneous benchmarks favor the weak balance theory, according to which only triangles with one negative link should be under-represented, heterogeneous benchmarks favor the strong balance theory, according to which triangles with all negative links should be under-represented as well. Biological networks, instead, display strong frustration under any benchmark, confirming that structural balance inherently characterizes social networks.


INTRODUCTION
Network theory has emerged as a powerful framework in many disciplines to model different kinds of real-world systems, by representing their units as nodes and the interactions between them as links. In social science, the study of networks with signed edges has recently seen its popularity revived [1][2][3][4], because the signed character of links can be used to represent the positive as well as the negative social interactions that are currently identifiable in empirical data.
From a historical perspective, the interest in signed networks is rooted in the psychological theory named balance theory (BT), first proposed by Heider [5]. The choice of adopting signed graphs to model it then led Cartwright and Harary [6] to introduce its structural version (SBT), which has found application not only in the study of human relationships, but also in that of biological, ecological and economic systems [7][8][9][10].
BT deals with the concept of balance: a complete, signed graph is said to be balanced if all its triads have an even number of negative edges, i.e. either zero (in this case, the three edges are all positive) or two (see Fig. 1). Informally speaking, BT formalizes the principles 'the friend of my friend is my friend' and 'the enemy of my enemy is my friend'. The so-called structure theorem states that a complete, signed graph is balanced if and only if its set of nodes can be partitioned into two disjoint subsets whose intra-modular links are all positive and whose inter-modular links are all negative. Cartwright and Harary extended the definition of balance to incomplete graphs [6] by including cycles of length larger than three: a (connected) network is said to be balanced when all cycles are positive, i.e. they contain an even number of negative edges. Taken together, the criteria above form the so-called structural strong balance theory (SSBT).
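The structure theorem suggests a simple way to test strong balance on an arbitrary (possibly incomplete) signed graph: try to split the nodes into two groups so that every positive edge stays inside a group and every negative edge crosses groups. The following is a minimal sketch of that idea, not the procedure used in this paper; the function name `is_strongly_balanced` and the edge-list format `(i, j, sign)` are illustrative assumptions.

```python
from collections import deque


def is_strongly_balanced(n, signed_edges):
    """Return True if the nodes admit a two-group split with positive
    edges inside groups and negative edges across groups.

    n            : number of nodes, labelled 0..n-1 (illustrative convention)
    signed_edges : iterable of (i, j, s) with s in {+1, -1}
    """
    adj = [[] for _ in range(n)]
    for i, j, s in signed_edges:
        adj[i].append((j, s))
        adj[j].append((i, s))

    side = [None] * n  # tentative group (0 or 1) of each node
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # positive edge: same group; negative edge: opposite group
                expected = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False  # a cycle with an odd number of negative edges
    return True
```

For a single triangle, the check returns True for the (+, +, +) and (+, −, −) configurations and False for (+, +, −) and (−, −, −), in agreement with the definition of SSBT given above.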
The framework of SSBT has been extended by Davis [11] by introducing the concept of k-balanced networks, according to which signed graphs are balanced if their set of nodes can be partitioned into k disjoint subsets with positive intra-modular links and negative inter-modular links. This generalized definition of balance leads to the formulation of structural weak balance theory (SWBT), according to which triads with all negative edges are balanced, since each of their nodes can be thought of as a group on its own if necessary (see Fig. 1).
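The k-balance condition can be checked without fixing k in advance: nodes joined by a path of positive edges must end up in the same subset, so a valid partition exists exactly when no negative edge connects two nodes of the same positive component. The sketch below illustrates this under stated assumptions (the name `is_weakly_balanced` and the `(i, j, sign)` edge format are hypothetical); it is an illustration rather than the method adopted in this paper.

```python
def is_weakly_balanced(n, signed_edges):
    """Return True if the nodes can be split into k >= 1 groups with
    positive edges inside groups and negative edges across groups."""
    parent = list(range(n))

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = list(signed_edges)
    # Merge the endpoints of every positive edge into one component.
    for i, j, s in edges:
        if s > 0:
            parent[find(i)] = find(j)
    # A negative edge inside a positive component cannot be placed across groups.
    return all(find(i) != find(j) for i, j, s in edges if s < 0)
```

On a single triangle this check accepts (−, −, −), which SSBT rejects, and still rejects (+, +, −), the only triad that SWBT considers frustrated.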
Several metrics to decide whether signed networks are strongly or weakly balanced have been proposed. For instance, the level of balance of a signed network has been quantified as the number of edges that need to be removed, or whose sign needs to be reversed, in order to obtain a network where each cycle has an even number of negative links [12,13]. Alternatively, it has been defined as the number of balanced, closed walks (i.e. closed walks with an even number of negative links) that are present in the network [14][15][16][17]. In [18] an incomplete, signed network is considered balanced if it is possible to fill in all its missing links to obtain a complete, balanced graph according to SSBT. In [19] the authors define three different levels of balance: at the micro-scale, involving triads; at the meso-scale, involving larger subgraphs; at the macro-scale, involving the entire network. Still, as first noticed in [6], 'it may happen that only cycles of length 3 and 4 are important for the purpose of determining balance'; this is further stressed in [20], where it can be read that 'this intuition has been later justified empirically by demonstrating that it is easier for people to memorize the valences of ties in shorter cycles', and confirmed in [21], where it is noticed that 'analyses based on counting simple cycles demonstrated that real networks often have a relatively low cycle length threshold after which the degree of balance measures quickly decrease'.
Other approaches have been adopted in [22][23][24], where the problem is studied from a spectral perspective, and in [25], where the problem is studied by employing concepts borrowed from statistical physics (each signed triad is assigned an energy and the networks at the 'lowest temperature' have triangles without negative edges).
Other authors, instead, have focused on the complementary notion of frustration, trying to quantify the extent to which signed networks are far from balanced [19][26][27][28]. In [26], the authors define the so-called balanced decomposition number, i.e. the (minimum) number of balanced groups into which nodes can be partitioned, and evaluate it by counting the (minimum) number of edges whose removal increases the balance of a network. In [29], instead, the same index is evaluated by adopting the so-called switching signs method introduced in [30], which prescribes counting the (minimum) number of signs that must be reversed to balance a network. In [22], the level of (im)balance of a network is proxied by the magnitude of the smallest eigenvalue of the Laplacian matrix.
Empirical observations seem to point out that real-world, signed networks tend to be k-balanced, i.e. to avoid establishing the patterns that are considered as frustrated by SWBT: as an example, in [24] the authors study a pair of online, social networks induced by the relationships between users, showing that balance increases when the number of clusters into which nodes are partitioned is larger than two. In [17], the authors notice that the weak formulation of SBT allows a better performance in predicting signs to be achieved.
In the present paper, we approach the concept of balance (or frustration) from a statistical perspective, comparing the empirical value of a chosen metric with the outcome of a properly defined benchmark model, i.e. a reference model preserving some of the network properties while randomizing the rest. The most common null model for signed graphs is perhaps the one obtained by keeping the positions of edges fixed while shuffling their signs [2,17]. Reference [31] implements what we may call (for reasons that will be clear later) the canonical variant of the aforementioned exercise, assigning signs by means of a Bernoulli distribution. Reference [15] introduces a null model for randomizing both the presence and the sign of links. In [10], the signed version of the Local Rewiring Algorithm is implemented (at each step, two edges with the same sign are selected and rewired, to preserve the total number of signed links incident to each node). The canonical variant of this model is implemented in [32], where the Balanced Signed Chung-Lu model (BSCL) is proposed (although it additionally constrains the average number of signed triangles each edge is part of). Finally, refs. [33][34][35][36] define models constraining the structural properties of signed networks within the framework of Exponential Random Graphs (ERG).
Our contribution here focuses on binary, undirected signed networks and is motivated by two key considerations. First, real-world social networks have different levels of sparsity and we therefore aim at extending the ERG framework to include null models suitable for the analysis of signed graphs with plus (positive), minus (negative) and additionally zero (missing) edges. Second, as in the analysis of most other networks, we recognize the importance of preserving the inherent heterogeneity of different nodes and we therefore define new null models that can constrain the number of plus, minus and zero edges of each node separately. As we shall see, controlling for the different tendencies of actors to establish friendly and unfriendly relationships can change the estimated statistical significance of balance quite dramatically. After defining a suite of such null models, we will use them to inspect the statistical significance of the most commonly studied (un)balanced patterns at both local and global levels, i.e. signed triangles and signed communities.

Datasets description
We now employ the benchmarks introduced and discussed in 'Materials and Methods' and summed up in Table I to analyze various real-world networks. Although most of them represent social relationships, we have also considered biological data as a comparison to check for specific patterns characterizing social structures.
The first dataset is the Correlates of War (CoW) dataset [37]. It provides a picture of the international political relationships over the years 1946-1997 and consists of 13 snapshots of 4 years each. A positive edge between any two countries indicates an alliance, a political agreement or membership in the same governmental organization. Conversely, a negative edge indicates that the two countries are enemies, have a political disagreement or are part of different governmental organizations.
The second dataset collects information about the relationships among the ≃ 300,000 players of a massive multiplayer online game (MMOG) [38]. A positive edge between two players indicates a friendship, an alliance, or an economic relation. Conversely, a negative edge indicates the existence of an enmity, a conflict, or a fight. Since the network is directed, we have made it undirected by applying the following rules: if any two agents have the same opinion about each other, the undirected connection preserves the sign (i.e. +1 • +1 = +1 and −1 • −1 = −1); if any two agents have opposite opinions, we assume their undirected connection to have a negative sign (i.e. +1 • −1 = −1 • +1 = −1). Furthermore, in order to preserve the total number of nodes, we treat non-reciprocal connections as reciprocal, by preserving the original sign (i.e. +1 • 0 = 0 • +1 = +1 and −1 • 0 = 0 • −1 = −1).
The remaining datasets we consider are those collected in [39] and analyzed in [27]. These include three sociopolitical networks (SPNs): N.G.H. Tribes, Senate US, Monastery; two financial networks (FNs): Bitcoin Alpha and Bitcoin OTC; and three gene-regulatory networks (GRNs): E. Coli, Macrophage, Epidermal Growth Factor Receptor.
In the SPNs, N.G.H. Tribes collects data about New Guinean Highland Tribes (here, a positive/negative link denotes alliance/rivalry), Monastery corresponds to the last frame of Sampson's data about the relationships between novices in a monastery [40] (here, a positive/negative link indicates a positive/negative interaction), and Senate US collects data about the members of the 108th US Senate Congress (here, a positive/negative link indicates trust/distrust or similar/dissimilar political opinions).
The FNs are 'who-trusts-whom' networks of Bitcoin traders on an online platform: a positive/negative link indicates trust/distrust between users [41]. The networks representing the FNs are weighted, directed ones: hence, after having binarized them by replacing each positive (negative) weight with a +1 (−1), we have made them undirected by applying the same rules adopted for the MMOG dataset.
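The binarization and symmetrization rules used for the MMOG and Bitcoin datasets can be sketched as a small helper acting on a pair of reciprocal directed entries. The function names `sign` and `symmetrize` are hypothetical, and the treatment of a non-reciprocated negative link (kept negative) is an assumption based on the stated principle that the original sign is preserved.

```python
def sign(w):
    """Binarize a weight: +1 if positive, -1 if negative, 0 if absent."""
    return (w > 0) - (w < 0)


def symmetrize(w_ij, w_ji):
    """Combine the two directed entries between i and j into one
    undirected signed entry in {-1, 0, +1}."""
    a, b = sign(w_ij), sign(w_ji)
    if a == 0:
        return b          # non-reciprocal (or missing) link: keep the existing sign
    if b == 0:
        return a
    return a if a == b else -1  # agreement keeps the sign, disagreement is negative
```

For example, `symmetrize(3.0, 0)` gives +1, while `symmetrize(2.0, -7.0)` gives −1, since the two agents hold opposite opinions.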
Lastly, in the GRNs each node represents a gene, with positive links indicating activating connections and negative links indicating inhibiting connections. Specifically, E. Coli collects data about a transcriptional network of the bacterium Escherichia coli; Macrophage collects data about a blood cell that eliminates substances such as cancer cells, cellular debris and microbes; Epidermal Growth Factor Receptor collects data about the protein that is responsible for cell division and survival in epidermal tissues.
The vast majority of the networks considered here are characterized by a small link density $c = 2L/[N(N-1)]$ but a large fraction $L^+/L$ of positive links. The density of the CoW network decreases over time from ≃ 0.2 to ≃ 0.1 and the percentage of positive links is roughly stationary around ≃ 88%; on the other hand, the link density of the MMOG network is stationary around 0.003 and the percentage of positive links decreases from ≃ 98% to ≃ 60%. The SPNs have the largest values of link density among the configurations in our basket, ranging from ≃ 0.3 to ≃ 0.5, and percentages of positive links ranging from ≃ 50% to ≃ 75%. Bitcoin Alpha has a link density of ≃ 0.002 and a percentage of positive links of ≃ 90%, while Bitcoin OTC has a link density of ≃ 0.001 and a percentage of positive links of ≃ 85%. Lastly, the GRNs have a link density ranging from ≃ $10^{-3}$ to ≃ $10^{-2}$ and a percentage of positive links ranging from ≃ 58% to ≃ 66%. For more details on the basic descriptive statistics of the networks considered in the present work, see the Supplementary Note 4.
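The two descriptive statistics quoted above, link density and fraction of positive links, can be computed directly from a signed edge list. The sketch below is only illustrative; the function name and the `(i, j, sign)` edge format are assumptions.

```python
def density_and_positive_fraction(n, signed_edges):
    """Return (c, L_plus / L) for an undirected signed graph with n nodes.

    signed_edges lists each undirected link once as (i, j, s), s in {+1, -1};
    at least one link is assumed to be present.
    """
    edges = list(signed_edges)
    L = len(edges)
    L_plus = sum(1 for _, _, s in edges if s > 0)
    c = 2 * L / (n * (n - 1))  # realized links over the N(N-1)/2 possible pairs
    return c, L_plus / L
```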

Assessing balance
Table I: Descriptive summary of signed benchmarks.
Homogeneous, free topology (SRGM): each pair of nodes is assigned a plus, a minus or a zero edge with a probability that is pair-independent; all nodes are statistically equivalent. Differently from the recipe adopted in [15,42], the parameters defining our SRGM can be unambiguously tuned to reproduce the empirical number of plus and minus edges of any (binary, undirected, signed) network.
Homogeneous, fixed topology (SRGM-FT): the topology is the same as in the real network and the connected pairs of nodes are assigned either a plus one or a minus one, with a probability that is pair-independent. Differently from the recipe adopted in [2,31], the parameters defining our SRGM-FT can be unambiguously tuned to reproduce the empirical number of plus and minus edges of any (binary, undirected, signed) network. The SRGM-FT is the conditional version of the SRGM.
Heterogeneous, free topology (SCM): each pair of nodes is assigned a plus, a minus or a zero edge, with a probability that is pair-dependent and determined by the different tendencies of nodes to establish positive and negative interactions. This model represents the canonical variant of the one employed in [10].
Heterogeneous, fixed topology (SCM-FT): the topology is the same as in the real network and the connected pairs of nodes are assigned either a plus one or a minus one, with a probability that is pair-dependent and determined by the different tendencies of nodes to establish positive and negative interactions. The SCM-FT is the conditional version of the SCM.

In order to test the validity of the two formulations of SBT at the local level, we need to compare the empirical abundance of the triadic motifs defined in the Methods section with the corresponding expected values calculated under the null models we have introduced. To this aim, a very useful indicator is the z-score

$$z_m = \frac{N_m(\mathbf{A}^*) - \langle N_m \rangle}{\sigma[N_m]}$$

where $N_m(\mathbf{A}^*)$ is the number of occurrences of pattern $m$ in the real network $\mathbf{A}^*$, $\langle N_m \rangle$ is the expected occurrence of the same pattern under the chosen null model and $\sigma[N_m]$ is the standard deviation of $N_m$ under the same null model. $z_m$ quantifies the number of standard deviations by which the empirical abundance of pattern $m$ differs from the expected one. For instance, after checking for the Gaussianity of $N_m$ under the null model (since it is a sum of dependent random variables), a value of $|z_m|$ below the threshold associated with a chosen significance level indicates that the empirical abundance of pattern $m$ is compatible with the null model, whereas a value above that threshold indicates that it is not compatible with the null model at that significance level. In the latter case, a value $z_m > 0$ ($z_m < 0$) indicates the tendency of the pattern to be over-(under-)represented in the data with respect to the null model.
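When the expected value and standard deviation of a pattern count are estimated numerically, the z-score can be computed from a sample of counts obtained on randomized networks. The following is a minimal sketch under that assumption; `z_score` and its arguments are illustrative names, and the population standard deviation is one possible choice of estimator.

```python
from statistics import mean, pstdev


def z_score(observed, sampled_counts):
    """z-score of an observed pattern count against counts measured on
    networks sampled from a null model."""
    mu = mean(sampled_counts)
    sigma = pstdev(sampled_counts)  # standard deviation over the sample
    if sigma == 0:
        raise ValueError("pattern count has zero variance under the null model")
    return (observed - mu) / sigma
```

A positive value signals over-representation of the pattern in the data, a negative value under-representation.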
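z-scores can be evaluated either analytically or numerically: implementing the first alternative requires employing the formulas provided in the Supplementary Note 6; implementing the second alternative requires numerically sampling the ensembles of graphs defined by our null models. Since the entries of the adjacency matrix are independent random variables, the unbiased generation of a random matrix $\mathbf{A} \in \mathbb{A}$ can be carried out by drawing a real number $u_{ij} \in U[0, 1]$ and posing, for models with varying topology,

$$a_{ij} = \begin{cases} -1 & \text{if } 0 \le u_{ij} \le p^-_{ij}, \\ 0 & \text{if } p^-_{ij} < u_{ij} \le 1 - p^+_{ij}, \\ +1 & \text{if } 1 - p^+_{ij} < u_{ij} \le 1, \end{cases}$$

for all pairs $i < j$ and, for models with fixed topology,

$$a_{ij} = \begin{cases} -1 & \text{if } 0 \le u_{ij} \le p^-_{ij}, \\ +1 & \text{if } p^-_{ij} < u_{ij} \le 1, \end{cases}$$

for all pairs $i < j$ such that $|a^*_{ij}| = 1$, where $p^-_{ij}$ and $p^+_{ij}$ denote the probabilities that nodes $i$ and $j$ are connected by a negative and by a positive link under the chosen null model (see the Supplementary Note 5 for an estimation of the time required to sample the ensemble induced by each of our models, for each of our datasets).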
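The sampling step can be sketched as follows, assuming the link probabilities of the chosen null model have already been computed. The names `p_minus` and `p_plus` (matrices of probabilities of a negative and of a positive link, read only for i < j) and both function names are illustrative assumptions, not part of the original formalism.

```python
import random


def sample_free_topology(p_minus, p_plus, rng=random):
    """Draw one signed adjacency matrix from a model with varying topology."""
    n = len(p_minus)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            u = rng.random()  # uniform in [0, 1)
            if u < p_minus[i][j]:
                s = -1
            elif u < 1.0 - p_plus[i][j]:
                s = 0   # remaining mass: 1 - p_minus - p_plus
            else:
                s = +1
            a[i][j] = a[j][i] = s
    return a


def sample_fixed_topology(a_star, p_minus, rng=random):
    """Draw signs only for the pairs connected in the observed matrix a_star.

    Here p_minus[i][j] is the probability that an existing link is negative.
    """
    n = len(a_star)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if a_star[i][j] != 0:
                s = -1 if rng.random() < p_minus[i][j] else +1
                a[i][j] = a[j][i] = s
    return a
```

Strict inequalities are used because `random.random()` returns values in [0, 1); this differs from a closed-interval rule only on a set of zero probability. Repeating the draw many times and recomputing the pattern counts on each sampled matrix yields the numerical estimates of the mean and standard deviation entering the z-score.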

Testing structural balance at the microscopic scale
We report our results starting from the network datasets that have several temporal snapshots (CoW and MMOG). Fig. 2 shows the temporal trends of the z-scores for the two networks under the homogeneous null models (SRGM and SRGM-FT). Panels (a) and (c) refer to the SRGM and show that the z-scores for all triangles, irrespective of their signs, are large and positive. This means that all triangles are over-represented in the data, with respect to a null model that completely randomizes the topology. This result is not unexpected, as it merely indicates that, given the empirical density of links, triangles are very unlikely to form completely by chance. The SRGM is therefore uninformative about the (im)balance in the data, as it is entirely biased by a purely topological effect. This conclusion is in line with the results in [17], which suggested that the SRGM-FT is to be preferred over the SRGM as it provides a better explanation of empirical network structures.
By contrast, the results generated under the SRGM-FT clearly support SWBT (see panels (b) and (d)). Indeed, the only significantly under-represented pattern in the data is precisely the only one that SWBT considers frustrated (the triangle with a single negative link), whereas the empirical abundance of the triangle with all negative edges (which SSBT would predict to be under-represented as well) remains largely compatible with the null model. Notice that the empirical abundance of the balanced triangle with two negative edges is also close to the one expected under the SRGM-FT, although its z-score is typically smaller than the z-score of the all-negative triangle. In any case, the balanced triangle with three positive edges is significantly over-represented on both datasets. Results of this type constitute the backbone of the narrative according to which the weak version of SBT is the one that is better supported by data [2,17].

Fig. 2 (caption, beginning not recovered): […] In any case, the hypothesis that nodes tend to establish balanced triangles with all positive links is supported on both datasets. Results of this type constitute the backbone of the narrative according to which the weak version of the structural balance theory (SBT) is the one that is better supported by data. Panels (a) and (c): note that the SRGM has all z-scores positive, thereby not supporting any version of SBT, a result due to the complete randomization of the topology along with the edge signs: the over-representation of all patterns in the data is merely due to the fact that triangles form with small probability at a purely topological level, given the low link density, irrespective of their signs.

However, since neither the SRGM nor the SRGM-FT constrains the local (node-specific) signed properties (i.e. the signed degrees of nodes), they cannot disentangle the effects of node heterogeneity from the revealed overall structural (im)balance. For this reason, in Fig. 3 we repeat the analysis of the CoW and MMOG datasets using the SCM and SCM-FT null models. As expected, the resulting z-scores are much smaller in absolute value, showing that node heterogeneity in the real networks is in general strong and is responsible for a significant part of the overall measured (im)balance. Therefore, controlling for the local signed degrees is a way to filter out the effects of node heterogeneity in the statistical analysis of structural balance. In general, we see that the triangle with all negative links now has negative z-scores in both datasets, under both null models. Similarly, the all-positive triangle retains positive z-scores in all cases. The level of statistical significance (i.e. the absolute value of the z-score) is however quite different in the various cases: in general we see an overwhelming over-representation of the two balanced triangles (the all-positive one and the one with two negative links) in the MMOG data under both null models, while for the CoW data the only clearly significant pattern is the over-representation of the all-positive triangle in the SCM. Notably, the SCM-FT always gives negative z-scores to both the frustrated triangles (the all-negative one and the one with only one negative link), and most of the time positive z-scores to the two balanced triangles. Although the statistical evidence is much stronger for the MMOG data, this result indicates that, if any, the version of SBT supported by the data is SSBT, rather than SWBT. Therefore, as soon as the heterogeneity of the signed degrees of nodes is accounted for, SWBT loses its statistical support, and SSBT is favoured by the data.

Fig. 3 (caption, beginning not recovered): […] showing that node heterogeneity contributes significantly to the overall abundance of signed triangles. The all-positive (balanced) triangle is still strongly over-represented in all cases, but additionally the all-negative (frustrated) triangle is now always under-represented. Under the SCM-FT, the other frustrated triangle (the one with a single negative link) is also systematically under-represented, and these combined results provide support for the structural strong balance theory (SSBT) (particularly evidently for the MMOG data). By contrast, the structural weak balance theory (SWBT) (according to which one would expect the under-representation of only the triangle with a single negative link) is no longer supported. These results provide an alternative narrative w.r.t. the usual one: when the heterogeneity of the signed degrees of nodes is accounted for, statistical evidence supports SSBT rather than SWBT.

We now move to the results obtained on datasets which include other social networks as well as various biological networks, providing a different real-world benchmark where structural balance theory is not expected to apply. From Fig. 4 we confirm that the SRGM is completely uninformative about structural (im)balance, as it produces z-scores that are typically positive and very large for all triangles (balanced and unbalanced) and all networks (social and biological). This result simply means that the formation of any triangle, irrespective of its defining signs, is highly unlikely if the topology is completely randomized. By contrast, under the SRGM-FT the only pattern that is under-represented in social network data is the frustrated triangle with a single negative link, a result that largely supports SWBT (on biological data, this pattern is instead either not significant or over-represented). Heterogeneous null models (SCM and SCM-FT), instead, assign positive z-scores to the two balanced triangles (all-positive and with two negative links) and negative z-scores to the two frustrated triangles (all-negative and with one negative link), thereby systematically supporting SSBT. When used on biological networks, they instead highlight a strong tendency towards imbalance, as they tend to assign opposite signs (w.r.t. social networks) to most z-scores. These results confirm and extend what was discussed above for the CoW and MMOG datasets, and additionally show that biological networks behave very differently from social networks, somehow favouring frustration. This is an indication that structural balance is indeed an inherent property of social networks.

Fig. 4 (caption, beginning not recovered): […] instead, systematically support the structural strong balance theory because they assign positive z-scores to the two balanced triangles (all-positive and with two negative links) and negative z-scores to the two frustrated triangles (all-negative and with one negative link). Additionally, in biological networks they tend to assign opposite signs (w.r.t. social networks) to most z-scores, highlighting a strong tendency towards imbalance. These results are fully in line with what was already observed with the Correlates of War and MMOG datasets in Figs. 2 and 3.
As further evidence supporting the above conclusion, in Fig. 5 we show, for all networks and under all null models, the z-scores of the frustration indices SDoF and WDoF defined in Eqs. 9 and 10 respectively. Note that, while the raw values of SDoF and WDoF would not discount the effects of the imposed structural constraints on the raw values of frustration, the z-scores measure the level of statistical significance of the 'residual' frustration, after the structural constraints are accounted for. We see that, under all null models, the z-scores (when significant) are always negative for the social networks (signalling under-representation of the frustration indices in the data) and always positive for the biological networks (signalling over-representation of frustration in the data). Moreover, for the models with fixed topology, the z-scores for the heterogeneous null model (SCM-FT) are systematically smaller (in absolute value) than the ones for the corresponding homogeneous model (SRGM-FT), indicating that, compared with the latter, the former model 'explains more' of the level of empirical frustration observed in the data. The same relation does not apply systematically between the models with varying topology (SCM and SRGM), suggesting that models with fixed topology lead to more robust conclusions, as already observed in terms of their support for SWBT or SSBT.
Testing structural balance at the mesoscopic scale
Motivated by the last observation, we now use the null models with fixed topology to probe the patterns of structural (im)balance at a larger, mesoscopic level, i.e. as portrayed by the community structure deriving from optimally partitioning the nodes into communities with positive internal links and negative external ones. As anticipated, SSBT predicts that the lowest overall level of intracommunity frustration, as measured by the FI defined in Eq. 11, should be observed after optimally partitioning the nodes into two communities, dominated by positive signs internally and negative signs across. By contrast, SWBT allows for potentially any number of communities, because it bases the idea of balance precisely at the level of communities, so that all-negative triangles (and in principle all-negative cycles of any length) can be explained by placing the constituent nodes across distinct communities. To extract information about the signed community structure from our data, given a null model $\langle a_{ij} \rangle$ for the signed adjacency matrix entry $a_{ij}$, we look for the partition that maximizes the signed modularity, as defined in [43]:

$$Q = \sum_{i<j} \left( a_{ij} - \langle a_{ij} \rangle \right) \delta_{c_i c_j} = \left( L^+_\bullet - \langle L^+_\bullet \rangle \right) - \left( L^-_\bullet - \langle L^-_\bullet \rangle \right) = -\left[ \left( L^+_\circ - \langle L^+_\circ \rangle \right) + \left( L^-_\bullet - \langle L^-_\bullet \rangle \right) \right]$$

where $\delta_{c_i c_j}$ equals 1 if nodes $i$ and $j$ belong to the same community and 0 otherwise, and where $\bullet$ indicates quantities inside and $\circ$ outside the communities (note that $L^+ = L^+_\bullet + L^+_\circ$ and that the total number of positive links is preserved under any null model considered here). For null models with fixed topology, a stronger result holds true, i.e. $Q = -L \cdot (\mathrm{FI} - \langle \mathrm{FI} \rangle)$ so that, since $L > 0$, maximizing $Q$ becomes equivalent to minimizing the difference between FI and its expected value (see the Supplementary Note 7). The minimization of FI is another popular approach to finding the optimal partition [44] which, however, neglects the information embodied in a null model.

Here, we consider a varying number K = 2 . . . 10 of communities and, for each value of K, look for the partition maximizing Q, using as null model both the SRGM-FT and the SCM-FT. We then compute the value of FI as a function of K, as plotted in Fig. 6 for the CoW dataset. We find that the trends produced under the SRGM-FT are quite flat, and in no case is the minimum of FI achieved at K = 2. This result is in line with SWBT, under whose assumptions there is no specific characteristic number of communities that would characterize real networks. By contrast, the SCM-FT produces clearly increasing trends, all starting from a minimum of FI at K = 2. This result strongly supports SSBT, according to which structural balance can be achieved by placing negative links between two communities, and positive links inside them. Taken together, these results extend our finding that SWBT (SSBT) is supported by homogeneous (heterogeneous) null models.

Fig. 6 (caption, partially recovered): While the SRGM-FT reveals a rather flat profile of FI as a function of K, with the minimum obtained for a number of groups which is larger than two, the SCM-FT reveals that FI is always clearly minimized for a number of groups K = 2. Taken together, these results extend our findings to the mesoscale level.
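To make the role of the null model in the community-detection step concrete, the sketch below evaluates a signed modularity for a given partition. It assumes that the modularity takes the form of a sum, over pairs of nodes in the same community, of the difference between the observed entry and its expectation, and that the SRGM-FT expectation on each observed link equals (L⁺ − L⁻)/L, as suggested by its description in Table I; both assumptions, as well as the function names, are illustrative rather than taken from the original text.

```python
def srgm_ft_expectation(a):
    """Expected signed entries under a homogeneous fixed-topology model:
    every observed link has the same expected sign (L+ - L-)/L.
    Assumes the matrix contains at least one link."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if a[i][j] != 0]
    L = len(pairs)
    L_plus = sum(1 for i, j in pairs if a[i][j] > 0)
    mean_sign = (2 * L_plus - L) / L  # equals (L+ - L-)/L
    expected = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        expected[i][j] = expected[j][i] = mean_sign
    return expected


def signed_modularity(a, expected, labels):
    """Sum of (observed - expected) entries over node pairs that share a
    community label."""
    n = len(a)
    return sum(
        a[i][j] - expected[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )
```

On a four-node path with signs (+, −, +), placing the negative link across two communities gives a larger value than keeping all nodes together, as one would expect from a partition that removes frustration. Scanning partitions for each K and retaining the maximizer is left out of the sketch.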

DISCUSSION
Motivated by the widespread observation that actors in real social networks are characterized by a strong heterogeneity (typically signalled by broad distributions of node-specific topological properties), we have introduced a class of null models for signed networks characterized by either global or local constraints and with either fixed or varying topology. Our formalism provides the equivalent of various important ERGs to the domain of signed graphs. We have used our null models to address the problem of structural balance in real social networks. Our results show that the nature (weak or strong) and statistical strength of evidence of structural balance strongly depend on the null model adopted. In particular, we have shown that the occurrences of signed triangles favour SWBT when a homogeneous, global null model is considered. By contrast, SSBT is favoured by heterogeneous models with local constraints.
Generally speaking, adopting fixed-topology benchmarks seems to enhance the detection of frustration, with the corresponding homogeneous (heterogeneous) variant favouring SWBT (SSBT). As a possible behavioural explanation, we may advance the following one. Social agents are characterized by a certain level of tolerance. Such a level can be set by choosing the proper benchmark: null models constraining global quantities assume agents to be characterized on average by the same expected level of tolerance; by contrast, null models constraining local quantities account for the different levels of tolerance characterizing different agents. Let us imagine that relationships were established according to a random mechanism that preserves the total number of friends and enemies: should this be the case, our results indicate that equally tolerant agents would establish many more (+, +, −) motifs than observed; instead, real-world agents are found to avoid engaging in relationships that lead to the formation of the (+, +, −) pattern. Let us, now, refine the aforementioned picture and imagine that relationships were established according to a random mechanism that preserves the local number of friends and enemies: in this case, diversely tolerant agents would establish many more (+, +, −) and (−, −, −) motifs than observed; instead, real-world agents are found to avoid engaging in relationships that lead to the formation of both the (+, +, −) and the (−, −, −) patterns. Overall, then, agents that cannot choose with whom to interact, but only how, adopt a behaviour strongly avoiding engagement in frustrated relationships.
The same results have been extended to the mesoscale structural level, by finding that the optimal number of communities minimizing the overall level of frustration is K = 2 with respect to a heterogeneous null model (strongly supporting SSBT), while there is no characteristic optimal number with respect to a homogeneous null model (in line with SWBT). Importantly, we have considered a set of biological networks as a benchmark of real-world systems for which structural balance theory is not expected to apply. We have found a strong level of frustration in biological systems, indicating that structural balance (in either strong or weak form) indeed characterizes social networks.
Future directions along which the present analysis could be extended concern the possibility of defining ERGs for directed, as well as weighted, signed networks, the main technical difficulty lying in the proper definition of the constraints (binary and directed; weighted, both undirected and directed). The most natural application of such a formalism would be the statistical validation of the so-called status theory, describing social interactions when hierarchies play a role [2].

Formalism and basic quantities
A signed graph is a graph where each edge can be positive, negative or missing. In what follows, we will focus on binary, undirected, signed networks: hence, each edge will be 'plus one', 'minus one' or 'zero'. More formally, for any two nodes $i$ and $j$, the corresponding entry of the signed adjacency matrix $\mathbf{A}$ will be assumed to read $a_{ij} \in \{-1, 0, +1\}$. Since the total number of node pairs is $\binom{N}{2} = N(N-1)/2$ and any node pair can be positively connected, negatively connected or disconnected, the total number of possible graph configurations is $|\mathcal{A}| = 3^{\binom{N}{2}}$. To ease mathematical manipulations, let us define the following three quantities:

$$a^-_{ij} \equiv [a_{ij} = -1], \qquad a^0_{ij} \equiv [a_{ij} = 0], \qquad a^+_{ij} \equiv [a_{ij} = +1],$$

where we have employed Iverson's brackets notation (see the Supplementary Note 1). These new variables are mutually exclusive, i.e. $(a^-_{ij}, a^0_{ij}, a^+_{ij}) \in \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$, sum to 1, i.e. $a^-_{ij} + a^0_{ij} + a^+_{ij} = 1$, and induce two non-negative matrices, $\mathbf{A}^+ \equiv \{a^+_{ij}\}$ and $\mathbf{A}^- \equiv \{a^-_{ij}\}$, such that $\mathbf{A} = \mathbf{A}^+ - \mathbf{A}^-$. The numbers of positive and negative links are defined as

$$L^+ \equiv \sum_{i<j} a^+_{ij}, \qquad L^- \equiv \sum_{i<j} a^-_{ij}.$$

Analogously, the positive and negative degrees of node $i$ are $k^+_i \equiv \sum_{j (\neq i)} a^+_{ij}$ and $k^-_i \equiv \sum_{j (\neq i)} a^-_{ij}$. The advantage of adopting Iverson's brackets is that each quantity is now computed from a matrix with non-negative entries, so that all quantities of interest are non-negative as well.
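As an illustration of the quantities just defined, the following minimal sketch splits a signed adjacency matrix into its positive and negative parts and computes the numbers of signed links and the signed degrees. The function names and the 4-node toy matrix are hypothetical, chosen only for this example; matrices are stored as plain nested lists with entries in {−1, 0, +1}.

```python
def signed_parts(A):
    """Return (A_plus, A_minus), i.e. a+_ij = [a_ij = +1] and a-_ij = [a_ij = -1]."""
    A_plus = [[1 if a == 1 else 0 for a in row] for row in A]
    A_minus = [[1 if a == -1 else 0 for a in row] for row in A]
    return A_plus, A_minus


def signed_link_counts(A):
    """Return (L_plus, L_minus), counting each undirected pair i < j once."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    L_plus = sum(1 for i, j in pairs if A[i][j] == 1)
    L_minus = sum(1 for i, j in pairs if A[i][j] == -1)
    return L_plus, L_minus


def signed_degrees(A):
    """Return (k_plus, k_minus) as the row sums of A_plus and A_minus."""
    A_plus, A_minus = signed_parts(A)
    return [sum(row) for row in A_plus], [sum(row) for row in A_minus]


# Hypothetical toy network: symmetric, zero diagonal, entries in {-1, 0, +1}.
A = [
    [0, 1, -1, 0],
    [1, 0, 1, -1],
    [-1, 1, 0, 0],
    [0, -1, 0, 0],
]

L_plus, L_minus = signed_link_counts(A)
k_plus, k_minus = signed_degrees(A)
print(L_plus, L_minus, k_plus, k_minus)
```

Since every link is shared by two nodes, the signed degrees sum to twice the corresponding number of links, which is a quick consistency check on any input matrix.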
Let us now follow [21], according to which 'local measures attain efficiency by focusing only on cycles of particular, usually short, length, such as 3-cycles (triads)', and consider the signed triads depicted in Fig. 7. As mentioned above, according to BT social systems tend to arrange themselves into configurations satisfying the principles 'the friend of my friend is my friend', 'the friend of my enemy is my enemy', 'the enemy of my friend is my enemy', 'the enemy of my enemy is my friend' [5]. SSBT formalizes this concept by stating that the overall network balance increases with the fraction of triangles having an even number of negative edges (said to be balanced or 'positive' since the product of the edge signs is a 'plus') and decreases with the fraction of triangles having an odd number of negative edges (said to be unbalanced or 'negative' since the product of the edge signs is a 'minus'). SWBT, on the other hand, considers the triangle with all negative edges balanced as well.
Notice that the product of an arbitrary number of matrices of type $\mathbf{A}^+$ and $\mathbf{A}^-$ allows us to count the abundance of closed walks whose signature matches the sequence of signs of the matrices. For example, the expression $[\mathbf{A}^+\mathbf{A}^-\mathbf{A}^+]_{ii}$ counts the number of closed walks, starting from and ending at $i$, of length 3 and signature (+ − +). Similarly, the expression $[\mathbf{A}^+\mathbf{A}^+\mathbf{A}^-\mathbf{A}^+]_{ii}$ counts the number of closed walks, starting from and ending at $i$, of length 4 and signature (+ + − +). Therefore, the level of balance of a network can be quantified by the abundance of (non-degenerate) triangles with an even number of negative links, i.e.

$$T^{(+++)} = \frac{\mathrm{Tr}[(\mathbf{A}^+)^3]}{3!}, \qquad T^{(+--)} = \frac{\mathrm{Tr}[\mathbf{A}^+(\mathbf{A}^-)^2]}{2}.$$
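Similarly, the level of frustration of a network can be quantified by the abundance of (non-degenerate) triangles with an odd number of negative links, i.e.

$$T^{(++-)} = \frac{\mathrm{Tr}[(\mathbf{A}^+)^2\mathbf{A}^-]}{2}, \qquad T^{(---)} = \frac{\mathrm{Tr}[(\mathbf{A}^-)^3]}{3!}$$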
(see the Supplementary Note 2 for more details). The above expressions form the basis for the definition of several indices quantifying the level of balance of a network. For instance, the total number of balanced patterns according to SSBT is $\#_{sb} = T^{(+++)} + T^{(+--)}$, while the total number of unbalanced patterns is $\#_{su} = T^{(---)} + T^{(++-)}$. Hence, we may naturally define a 'strong degree of balance' index (SDoB) and a corresponding 'strong degree of frustration' index (SDoF) as

$$\mathrm{SDoB} = \frac{\#_{sb}}{\#_{sb} + \#_{su}}, \qquad \mathrm{SDoF} = 1 - \mathrm{SDoB}. \qquad (9)$$

On the other hand, the total number of balanced patterns according to SWBT is $\#_{wb} = T^{(+++)} + T^{(+--)} + T^{(---)}$, while the total number of unbalanced patterns is $\#_{wu} = T^{(++-)}$. Hence, we can introduce a 'weak degree of balance' index (WDoB) and a corresponding 'weak degree of frustration' index (WDoF) as

$$\mathrm{WDoB} = \frac{\#_{wb}}{\#_{wb} + \#_{wu}}, \qquad \mathrm{WDoF} = 1 - \mathrm{WDoB}. \qquad (10)$$

The indices defined above quantify imbalance by counting the abundance of locally frustrated, short cycles. Other indices of frustration account for the effect of structural (im)balance at larger scales. In particular, at the mesoscopic level, the effect of structural balance would result in a signed network being partitioned into communities of nodes, where intra-community links would be preferentially positive and inter-community links would be preferentially negative. Correspondingly, one can define the frustration index (FI) measuring the percentage of misplaced links, i.e.
the total number $L^+_\bullet$ of positive links between communities, plus the total number $L^-_\bullet$ of negative links within communities, divided by the total number $L$ of links (the formalism is adapted from the one in [45]). According to SSBT, the node partition minimizing frustration (and, correspondingly, the FI) should be the one corresponding to only two communities, because such a bipartition can be realized without creating all-negative triangles. By contrast, SWBT allows for a larger number of communities, because the theory justifies the presence of all-negative triangles precisely by assuming that the three participating nodes are all placed in different communities.
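The triangle-based indices and the frustration index discussed above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: instead of evaluating matrix traces, it enumerates node triples directly, which yields the same signed-triangle counts on a network without self-loops. All function names, the toy matrix and the community labels are hypothetical.

```python
from itertools import combinations


def signed_triangle_counts(A):
    """Count triangles by number of negative links: '+++', '++-', '+--', '---'."""
    counts = {"+++": 0, "++-": 0, "+--": 0, "---": 0}
    for i, j, k in combinations(range(len(A)), 3):
        signs = (A[i][j], A[j][k], A[i][k])
        if 0 in signs:  # at least one missing link: not a triangle
            continue
        negatives = signs.count(-1)
        counts["+" * (3 - negatives) + "-" * negatives] += 1
    return counts


def balance_indices(A):
    """Return SDoB, SDoF, WDoB and WDoF; requires at least one triangle."""
    t = signed_triangle_counts(A)
    total = sum(t.values())
    if total == 0:
        raise ValueError("the network contains no triangles")
    sdob = (t["+++"] + t["+--"]) / total             # strong: '---' is unbalanced
    wdob = (t["+++"] + t["+--"] + t["---"]) / total  # weak: '---' is balanced
    return {"SDoB": sdob, "SDoF": 1 - sdob, "WDoB": wdob, "WDoF": 1 - wdob}


def frustration_index(A, labels):
    """Fraction of misplaced links: positive between, negative within communities."""
    n = len(A)
    misplaced = links = 0
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] == 0:
                continue
            links += 1
            same = labels[i] == labels[j]
            if (A[i][j] == 1 and not same) or (A[i][j] == -1 and same):
                misplaced += 1
    return misplaced / links


# Hypothetical toy network: nodes 0, 1, 2 are mutual friends; node 3 is an
# enemy of 0 and 1 but a friend of 2.
A = [
    [0, 1, 1, -1],
    [1, 0, 1, -1],
    [1, 1, 0, 1],
    [-1, -1, 1, 0],
]
print(signed_triangle_counts(A))
print(balance_indices(A))
print(frustration_index(A, labels=[0, 0, 0, 1]))
```

Direct enumeration scales as N³ and is meant for small examples only; on large sparse networks one would rather exploit the trace formulas with sparse matrices.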

Null models of binary, undirected, signed graphs
Here we generalize the ERG framework to account for models of binary, undirected, signed graphs. We will follow the analytical approach introduced in [46], and further developed in [47], aimed at identifying the functional form of the maximum-entropy probability distribution (over all graphs of a chosen type) that preserves a desired set of empirical constraints on average. Specifically, this approach looks for the graph probability $P(\mathbf{A})$ that maximizes Shannon entropy

$$S = -\sum_{\mathbf{A} \in \mathcal{A}} P(\mathbf{A}) \ln P(\mathbf{A})$$

(where the sum runs over the set $\mathcal{A}$, of cardinality $|\mathcal{A}| = 3^{\binom{N}{2}}$, of all binary, undirected, signed graphs) under a set of constraints enforcing the expected value of a chosen set of properties. The formal solution to this problem is the exponential probability $P(\mathbf{A}) = e^{-H(\mathbf{A})}/Z$, where $H(\mathbf{A})$ (the Hamiltonian) is a linear combination of the constrained properties, each multiplied by a corresponding Lagrange multiplier, and $Z = \sum_{\mathbf{A} \in \mathcal{A}} e^{-H(\mathbf{A})}$ is the normalizing constant (or partition function).
In what follows, we will consider two classes of models, i.e. those keeping the network topology fixed and those letting the topology vary along with the edge signs. The first class is better suited for studying systems where actors cannot choose 'with whom' to interact, but only 'how' (e.g. because workers necessarily interact with colleagues at the same workplace or because countries necessarily interact with each other). On the other hand, the second class is better suited for studying systems where actors can choose their neighbours as well [17]. Whatever the situation, comparing the two types of models for the same network is in any case instructive, as it allows the role played by signed constraints to be disentangled from the one played by non-signed (purely topological) constraints.

Signed Random Graph Model
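As the simplest example, the Signed Random Graph Model (SRGM) is defined by two global constraints: $L^+(\mathbf{A})$ and $L^-(\mathbf{A})$. The Hamiltonian $H(\mathbf{A}) = \alpha L^+(\mathbf{A}) + \beta L^-(\mathbf{A})$ leads to a graph probability $P_{\mathrm{SRGM}}(\mathbf{A})$ that factorizes over the individual entries of the matrix $\mathbf{A}$, which are i.i.d. random variables described by the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & 0 & +1 \\ p^- & p^0 & p^+ \end{pmatrix}, \quad \forall\, i < j,$$

with $p^- \equiv \frac{y}{1+x+y}$, $p^+ \equiv \frac{x}{1+x+y}$, $p^0 \equiv 1 - p^- - p^+$ and where $x \equiv e^{-\alpha}$ and $y \equiv e^{-\beta}$ are transformed Lagrange multipliers (see the Supplementary Note 3 for more details). In other words, positive, negative and missing links appear with probability $p^+$, $p^-$ and $p^0$ respectively. The parameters $(x, y)$ determining these probabilities are tuned by maximizing the log-likelihood function $\mathcal{L}_{\mathrm{SRGM}}(x, y) \equiv \ln P_{\mathrm{SRGM}}(\mathbf{A}^*|x, y)$, where $\mathbf{A}^*$ denotes the specific, empirical network under analysis. This maximization, according to a general result [48], leads to an equality between the expected and the empirical values of the constraints, i.e. $\langle L^+ \rangle_{\mathrm{SRGM}} = L^+(\mathbf{A}^*)$ and $\langle L^- \rangle_{\mathrm{SRGM}} = L^-(\mathbf{A}^*)$. This leads to $p^0 \equiv 1 - p^- - p^+$ and

$$p^+ = \frac{2L^+(\mathbf{A}^*)}{N(N-1)}, \qquad p^- = \frac{2L^-(\mathbf{A}^*)}{N(N-1)}.$$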
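A minimal sketch of the SRGM in practice: the maximum-likelihood probabilities are the empirical densities of positive and negative links, and a configuration is drawn by assigning each node pair independently a value in {+1, −1, 0}. Function names and the toy matrix are hypothetical; the sampler is one straightforward way of drawing from the finite scheme and is not claimed to be the implementation used for the analysis.

```python
import random


def fit_srgm(A):
    """Maximum-likelihood SRGM parameters (p_plus, p_minus) for a signed matrix A."""
    n = len(A)
    n_pairs = n * (n - 1) // 2
    L_plus = sum(A[i][j] == 1 for i in range(n) for j in range(i + 1, n))
    L_minus = sum(A[i][j] == -1 for i in range(n) for j in range(i + 1, n))
    return L_plus / n_pairs, L_minus / n_pairs


def sample_srgm(n, p_plus, p_minus, rng):
    """Draw one signed graph: each pair is +1, -1 or 0 with prob. p+, p-, p0."""
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            u = rng.random()
            if u < p_plus:
                sign = 1
            elif u < p_plus + p_minus:
                sign = -1
            else:
                sign = 0
            B[i][j] = B[j][i] = sign
    return B


# Hypothetical toy network with 2 positive and 2 negative links on 6 node pairs.
A = [
    [0, 1, -1, 0],
    [1, 0, 1, -1],
    [-1, 1, 0, 0],
    [0, -1, 0, 0],
]
p_plus, p_minus = fit_srgm(A)
B = sample_srgm(len(A), p_plus, p_minus, random.Random(0))
print(p_plus, p_minus)
```

Note that the sampled graph generally has a different topology from the empirical one: only the expected numbers of positive and negative links are preserved.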

Signed Random Graph Model with Fixed Topology
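We can also consider a variant of the SRGM that keeps the topology of the network under analysis fixed while (solely) randomizing the edge signs. The Hamiltonian is again $H(\mathbf{A}) = \alpha L^+(\mathbf{A}) + \beta L^-(\mathbf{A})$, but the random variables are now only the entries of the adjacency matrix corresponding to the connected pairs of nodes in the original network $\mathbf{A}^*$, i.e. the ones for which $|a^*_{ij}| = 1$. These entries obey the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & +1 \\ p^- & p^+ \end{pmatrix}, \quad \forall\, i < j \;\text{such that}\; |a^*_{ij}| = 1,$$

with $p^- \equiv \frac{y}{x+y}$ and $p^+ \equiv \frac{x}{x+y}$. In other words, each entry for which $|a^*_{ij}| = 1$ obeys a Bernoulli distribution with probabilities determined by the (Lagrange multipliers of the) imposed constraints (see the Supplementary Note 3 for more details). The maximization of the likelihood function $\mathcal{L}_{\mathrm{SRGM\text{-}FT}}(x, y) \equiv \ln P_{\mathrm{SRGM\text{-}FT}}(\mathbf{A}^*|x, y)$ (where FT stands for 'fixed topology') leads to

$$p^+ = \frac{L^+(\mathbf{A}^*)}{L(\mathbf{A}^*)}, \qquad p^- = \frac{L^-(\mathbf{A}^*)}{L(\mathbf{A}^*)},$$

with $L(\mathbf{A}^*)$ representing the (empirical) number of links.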
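The fixed-topology variant can be sketched in the same spirit: the maximum-likelihood probability of a positive sign is the empirical fraction of positive links, and sampling amounts to redrawing the sign of every existing link while leaving missing links untouched. Function names and the toy matrix are hypothetical and serve only this illustration.

```python
import random


def fit_srgm_ft(A):
    """Maximum-likelihood SRGM-FT parameters (p_plus, p_minus) = (L+/L, L-/L)."""
    n = len(A)
    L_plus = sum(A[i][j] == 1 for i in range(n) for j in range(i + 1, n))
    L_minus = sum(A[i][j] == -1 for i in range(n) for j in range(i + 1, n))
    L = L_plus + L_minus
    return L_plus / L, L_minus / L


def sample_srgm_ft(A, p_plus, rng):
    """Redraw the sign of each existing link; missing links stay missing."""
    n = len(A)
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if A[i][j] != 0:
                B[i][j] = B[j][i] = 1 if rng.random() < p_plus else -1
    return B


# Hypothetical toy network with 2 positive and 2 negative links.
A = [
    [0, 1, -1, 0],
    [1, 0, 1, -1],
    [-1, 1, 0, 0],
    [0, -1, 0, 0],
]
p_plus, p_minus = fit_srgm_ft(A)
B = sample_srgm_ft(A, p_plus, random.Random(0))
print(p_plus, p_minus)
```

By construction, every sampled configuration shares the unsigned adjacency matrix of the empirical network, so any difference in signed-triangle counts is attributable to the placement of signs alone.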
The SRGM and the SRGM-FT are related via a simple expression involving the probability of the usual 'unsigned' (Erdős-Rényi) Random Graph Model (RGM) and stating that the probability of connecting any two nodes with, say, a positive link can be rewritten as the probability of connecting them with an unsigned link times the probability of assigning the latter a 'plus one': in formulas, $p^+_{\mathrm{SRGM}} / p_{\mathrm{RGM}} = p^+_{\mathrm{SRGM\text{-}FT}}$, with $p_{\mathrm{RGM}} = p^-_{\mathrm{SRGM}} + p^+_{\mathrm{SRGM}}$ (see the Supplementary Note 3 for more details). Notice that if the network under analysis is completely connected, the SRGM and the SRGM-FT coincide.
Although the recipes implemented in [15] and [2,31] are similar in spirit to the SRGM and the SRGM-FT, we provide the rigorous derivation of both models, together with the proof that the latter is nothing but the conditional version of the former.

Signed Configuration Model
The two aforementioned versions of the SRGM are defined by constraints which are global in nature. However, real social networks are characterized by an inherent heterogeneity of actors, which results in broad distributions of the number of connections of actors. To avoid statistical conclusions about structural balance that are biased by the application of homogeneous null models to intrinsically heterogeneous networks, it is therefore important to introduce models with local (node-specific) constraints.
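We, therefore, introduce the Signed Configuration Model (SCM) via the Hamiltonian

$$H(\mathbf{A}) = \sum_{i=1}^{N} \left[\alpha_i k^+_i(\mathbf{A}) + \beta_i k^-_i(\mathbf{A})\right],$$

which constrains the expected value of the signed degrees $\{k^+_i(\mathbf{A})\}_{i=1}^N$ and $\{k^-_i(\mathbf{A})\}_{i=1}^N$ of all nodes. The resulting graph probability $P_{\mathrm{SCM}}(\mathbf{A})$ is still factorized over independent entries of the matrix $\mathbf{A}$; however, these entries are no longer identically distributed. Rather, they obey the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & 0 & +1 \\ p^-_{ij} & p^0_{ij} & p^+_{ij} \end{pmatrix}, \quad \forall\, i < j,$$

with $p^-_{ij} \equiv \frac{y_i y_j}{1 + x_i x_j + y_i y_j}$, $p^+_{ij} \equiv \frac{x_i x_j}{1 + x_i x_j + y_i y_j}$ and $p^0_{ij} \equiv 1 - p^-_{ij} - p^+_{ij}$, where $x_i \equiv e^{-\alpha_i}$ and $y_i \equiv e^{-\beta_i}$ (see the Supplementary Note 3 for more details). In other words, the two nodes $i$ and $j$ are connected by a positive, negative or missing link with probability $p^+_{ij}$, $p^-_{ij}$ or $p^0_{ij}$ respectively. The parameters of the SCM are found by maximizing the log-likelihood $\mathcal{L}_{\mathrm{SCM}}(\{x_i\}, \{y_i\}) \equiv \ln P_{\mathrm{SCM}}(\mathbf{A}^*|\{x_i\}, \{y_i\})$, and the result ensures that $\langle k^+_i \rangle_{\mathrm{SCM}} = k^+_i(\mathbf{A}^*)$ and $\langle k^-_i \rangle_{\mathrm{SCM}} = k^-_i(\mathbf{A}^*)$, $\forall\, i$, i.e.

$$k^+_i(\mathbf{A}^*) = \sum_{j (\neq i)} \frac{x_i x_j}{1 + x_i x_j + y_i y_j}, \qquad k^-_i(\mathbf{A}^*) = \sum_{j (\neq i)} \frac{y_i y_j}{1 + x_i x_j + y_i y_j}, \quad \forall\, i,$$

which is a system of $2N$ coupled non-linear equations whose unique solution is to be found numerically, e.g. following the guidelines provided in [49] (see the Supplementary Note 4). If $x_i \ll 1$ and $y_i \ll 1$, $\forall\, i$, a 'sparse' approximation of the SCM holds true and one can factorize the probabilities as $p^+_{ij} \simeq x_i x_j$ and $p^-_{ij} \simeq y_i y_j$, $\forall\, i < j$. Such a manipulation leads us to a result that we may call the Signed Chung-Lu Model (SCLM).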
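To make the sparse approximation concrete, the sketch below evaluates the SCLM probabilities directly from the signed degrees. It assumes the usual Chung-Lu closure, in which the constraint equations with $p^+_{ij} \simeq x_i x_j$ (and the self-term neglected) give $x_i \simeq k^+_i/\sqrt{2L^+}$, hence $p^+_{ij} \simeq k^+_i k^+_j/(2L^+)$, and analogously for negative links. The function name and the degree sequences are hypothetical.

```python
def sclm_probabilities(k_plus, k_minus):
    """Sparse-case (Chung-Lu-like) estimates of p+_ij and p-_ij from signed degrees.

    Assumes p+_ij ~ k+_i k+_j / (2 L+) and p-_ij ~ k-_i k-_j / (2 L-).
    """
    n = len(k_plus)
    two_L_plus = sum(k_plus)    # each positive link contributes to two degrees
    two_L_minus = sum(k_minus)  # each negative link contributes to two degrees
    p_plus = [[0.0] * n for _ in range(n)]
    p_minus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if two_L_plus > 0:
                p_plus[i][j] = p_plus[j][i] = k_plus[i] * k_plus[j] / two_L_plus
            if two_L_minus > 0:
                p_minus[i][j] = p_minus[j][i] = k_minus[i] * k_minus[j] / two_L_minus
    return p_plus, p_minus


# Hypothetical signed degree sequences of a 4-node network.
k_plus = [2, 1, 1, 0]
k_minus = [1, 1, 1, 1]
p_plus, p_minus = sclm_probabilities(k_plus, k_minus)
print(p_plus[0][1], p_minus[0][1])
```

As with the unsigned Chung-Lu model, these estimates are reliable only when degrees are small compared with $\sqrt{2L^\pm}$; for large degrees the products can exceed 1, in which case the full SCM equations should be solved instead.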
To the best of our knowledge, the canonical SCM described here has no precedents in the literature: Ref. [10] provides a microcanonical version of the model, while the variant considered in [32] is just an approximation of the full canonical model derived here.Notice that the bipartite version of the SCM can be recovered as a special case of the Bipartite Score Configuration Model, proposed in [35].

Signed Configuration Model with Fixed Topology
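As for the SRGM, a variant of the SCM that keeps the topology of the network under analysis fixed while (solely) randomizing the signs of the edges can be defined. Again, the Hamiltonian reads $H(\mathbf{A}) = \sum_{i=1}^{N} \left[\alpha_i k^+_i(\mathbf{A}) + \beta_i k^-_i(\mathbf{A})\right]$, but the only random variables are those corresponding to the connected pairs of nodes in the empirical graph, i.e. the ones for which $|a^*_{ij}| = 1$. Each of them obeys the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & +1 \\ p^-_{ij} & p^+_{ij} \end{pmatrix}, \qquad p^-_{ij} \equiv \frac{y_i y_j}{x_i x_j + y_i y_j}, \quad p^+_{ij} \equiv \frac{x_i x_j}{x_i x_j + y_i y_j}.$$

Maximizing the log-likelihood $\mathcal{L}_{\mathrm{SCM\text{-}FT}}(\{x_i\}, \{y_i\}) \equiv \ln P_{\mathrm{SCM\text{-}FT}}(\mathbf{A}^*|\{x_i\}, \{y_i\})$ leads to the equations

$$k^+_i(\mathbf{A}^*) = \sum_{j (\neq i)} |a^*_{ij}| \frac{x_i x_j}{x_i x_j + y_i y_j}, \qquad k^-_i(\mathbf{A}^*) = \sum_{j (\neq i)} |a^*_{ij}| \frac{y_i y_j}{x_i x_j + y_i y_j}, \quad \forall\, i,$$

which can be solved numerically, again along the guidelines provided in [49] (see the Supplementary Note 4 for more details).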
Similarly to what has been observed for the SRGM and the SRGM-FT, the SCM and the SCM-FT are related via an expression involving the probability of an ordinary (unsigned) 'induced' Configuration Model (ICM) with probabilities $p^{\mathrm{ICM}}_{ij} = p^-_{ij} + p^+_{ij}$ such that $p^{\pm}_{ij,\mathrm{SCM}} / p^{\mathrm{ICM}}_{ij} = p^{\pm}_{ij,\mathrm{SCM\text{-}FT}}$, for any pair of nodes (see the Supplementary Note 3). Notice that, if the network under consideration is completely connected, then the SCM and the SCM-FT coincide.

DATA AVAILABILITY
Data concerning CoW are described in [37] and can be found at the address http://mrvar.fdv.uni-lj.si/pajek/SVG/CoW/. Data concerning E. coli, Macrophage, EGFR, N.G.H. Tribes and Monastery are described in [27] and can be found at the address https://figshare.com/articles/dataset/Signed_networks_from_sociology_and_political_science_biology_international_relations_finance_and_computational_chemistry/5700832. Data concerning Bitcoin Alpha and Bitcoin OTC are described in [19] and can be found at the address https://figshare.com/articles/dataset/Dataset_of_directed_signed_networks_from_social_domain/12152628. Data concerning MMOG, described in [38], are subject to proprietary restrictions and cannot be shared.

CODE AVAILABILITY
The codes implementing the null models employed for the present analysis are available upon request.

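The three functions $a^-_{ij} \equiv [a_{ij} = -1]$, $a^0_{ij} \equiv [a_{ij} = 0]$ and $a^+_{ij} \equiv [a_{ij} = +1]$ have been defined via Iverson's brackets notation. Iverson's brackets work in a way that is reminiscent of the Heaviside step function, i.e. $\Theta[x] = [x > 0]$; in fact, $a^-_{ij} = \Theta[-a_{ij}]$ (i.e. $a^-_{ij} = 1$ if $a_{ij} < 0$ and zero otherwise), $a^0_{ij} = [a_{ij} = 0]$ (i.e. $a^0_{ij} = 1$ if $a_{ij} = 0$ and zero otherwise), $a^+_{ij} = \Theta[a_{ij}]$ (i.e. $a^+_{ij} = 1$ if $a_{ij} > 0$ and zero otherwise). The matrices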

SUPPLEMENTARY NOTE 2 COUNTING TRIANGLES ON BINARY, UNDIRECTED, SIGNED NETWORKS
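A well-known result states that the abundance of node-specific, unsigned triangles reads

$$T_i = \frac{[|\mathbf{A}|^3]_{ii}}{2},$$

where $|\mathbf{A}| \equiv \mathbf{A}^+ + \mathbf{A}^-$ is the unsigned adjacency matrix. Let us, now, consider signed networks: the abundances of node-specific, signed triangles with an even number of negative links read

$$T^{(+++)}_i = \frac{[(\mathbf{A}^+)^3]_{ii}}{2}, \qquad T^{(+--)}_i = \frac{[\mathbf{A}^+\mathbf{A}^-\mathbf{A}^-]_{ii} + [\mathbf{A}^-\mathbf{A}^+\mathbf{A}^-]_{ii} + [\mathbf{A}^-\mathbf{A}^-\mathbf{A}^+]_{ii}}{2},$$

while the abundances of node-specific, signed triangles with an odd number of negative links read

$$T^{(---)}_i = \frac{[(\mathbf{A}^-)^3]_{ii}}{2}, \qquad T^{(++-)}_i = \frac{[\mathbf{A}^+\mathbf{A}^+\mathbf{A}^-]_{ii} + [\mathbf{A}^+\mathbf{A}^-\mathbf{A}^+]_{ii} + [\mathbf{A}^-\mathbf{A}^+\mathbf{A}^+]_{ii}}{2}.$$

Let us, now, write the expressions for the total abundances of signed triangles with an even number of negative links:

$$T^{(+++)} = \frac{\mathrm{Tr}[(\mathbf{A}^+)^3]}{6}, \qquad T^{(+--)} = \frac{\mathrm{Tr}[\mathbf{A}^+(\mathbf{A}^-)^2]}{2};$$

analogously, for the abundances of triangles with an odd number of negative links, reading

$$T^{(---)} = \frac{\mathrm{Tr}[(\mathbf{A}^-)^3]}{6}, \qquad T^{(++-)} = \frac{\mathrm{Tr}[(\mathbf{A}^+)^2\mathbf{A}^-]}{2}$$

(where each numeric factor prevents the corresponding pattern from being overcounted).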
Since the trace of a matrix is invariant under a cyclic permutation of the members of its argument, the following result holds true:

$$\mathrm{Tr}[\mathbf{A}^+\mathbf{A}^-\mathbf{A}^-] = \mathrm{Tr}[\mathbf{A}^-\mathbf{A}^+\mathbf{A}^-] = \mathrm{Tr}[\mathbf{A}^-\mathbf{A}^-\mathbf{A}^+],$$

further implying that $T^{(+--)} = T^{(-+-)} = T^{(--+)}$. As a consequence, the number of balanced patterns according to either variant of the SBT can be defined in several, equivalent ways, i.e. # sb = T (+++) + T

SUPPLEMENTARY NOTE 3 PROBABILISTIC MODELS FOR BINARY, UNDIRECTED, SIGNED NETWORKS
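The generalization of the ERG formalism for the analysis of binary, undirected, signed graphs rests upon the constrained maximization of Shannon entropy, i.e.

$$\max_{P} \left\{ S[P] - \sum_{i=0}^{M-1} \theta_i \left[ \sum_{\mathbf{A} \in \mathcal{A}} P(\mathbf{A})\, C_i(\mathbf{A}) - \langle C_i \rangle \right] \right\},$$

where $S = -\sum_{\mathbf{A} \in \mathcal{A}} P(\mathbf{A}) \ln P(\mathbf{A})$, $C_0 \equiv \langle C_0 \rangle \equiv 1$ sums up the normalization condition and the remaining $M - 1$ constraints represent proper, topological properties. Such an optimization procedure defines the expression

$$P(\mathbf{A}) = \frac{e^{-H(\mathbf{A})}}{Z},$$

which can be made explicit only after a specific set of constraints has been chosen.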

Signed Random Graph Model
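The first set of constraints we consider is represented by the properties $L^+(\mathbf{A})$ and $L^-(\mathbf{A})$. The Hamiltonian describing such a problem reads

$$H(\mathbf{A}) = \alpha L^+(\mathbf{A}) + \beta L^-(\mathbf{A});$$

as a consequence, the partition function reads

$$Z = \sum_{\mathbf{A} \in \mathcal{A}} e^{-H(\mathbf{A})} = \left(1 + e^{-\alpha} + e^{-\beta}\right)^{\binom{N}{2}}$$

and induces the expression

$$P_{\mathrm{SRGM}}(\mathbf{A}) = (p^-)^{L^-} (p^0)^{L^0} (p^+)^{L^+}, \qquad L^0 \equiv \binom{N}{2} - L^+ - L^-,$$

having posed $p^- \equiv \frac{e^{-\beta}}{1 + e^{-\alpha} + e^{-\beta}} \equiv \frac{y}{1+x+y}$, $p^+ \equiv \frac{e^{-\alpha}}{1 + e^{-\alpha} + e^{-\beta}} \equiv \frac{x}{1+x+y}$ and $p^0 \equiv \frac{1}{1+x+y}$, where $p^+$ is the probability that any two nodes are linked by a positive edge, $p^-$ is the probability that any two nodes are linked by a negative edge and $p^0$ is the probability that any two nodes are not linked at all. Hence, according to the SRGM, each entry of a signed network is a random variable following a generalized Bernoulli distribution, i.e. obeying the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & 0 & +1 \\ p^- & p^0 & p^+ \end{pmatrix}, \quad \forall\, i < j.$$

Such a notation, introduced by Khintchine in Mathematical Foundations of Information Theory, compactly represents a discrete probability distribution by listing its support on the first row and the probabilities of the elementary events constituting it on the second row. The expected value of the random variable $a_{ij}$ reads $\langle a_{ij} \rangle = p^+ - p^-$. As a consequence, any network belonging to $\mathcal{A}$ is a collection of i.i.d. random variables and obeys the finite scheme given by the direct product of the $\binom{N}{2}$ finite schemes above.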
The probability, under the SRGM, that a graph has exactly $L^+$ positive links and $L^-$ negative links reads

$$P(L^-, L^0, L^+) = \binom{\binom{N}{2}}{L^-,\, L^0,\, L^+} (p^-)^{L^-} (p^0)^{L^0} (p^+)^{L^+}, \qquad L^0 = \binom{N}{2} - L^+ - L^-;$$

in other words, it is a multinomial distribution, i.e. a generalization of the binomial distribution to the case where there are more than two possible outcomes for each trial. The combinatorial factor

$$\binom{\binom{N}{2}}{L^-,\, L^0,\, L^+} = \frac{\binom{N}{2}!}{L^-!\, L^0!\, L^+!}$$

is the (multinomial) coefficient counting the total number of ways $L$ links ($L^+$ of which are positive and $L^-$ of which are negative) can be placed among the node pairs. Hence, Supplementary Equation (65) also represents the total number of graphs with a given number of signed links.
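Naturally, it is possible to define the marginal random variables $a^+_{ij} \sim \mathrm{Ber}(p^+)$ and $a^-_{ij} \sim \mathrm{Ber}(p^-)$ which, in turn, induce the marginal probability distributions $P(L^-) = \mathrm{Bin}\left(\binom{N}{2}, p^-\right)$, $P(L^0) = \mathrm{Bin}\left(\binom{N}{2}, p^0\right)$ and $P(L^+) = \mathrm{Bin}\left(\binom{N}{2}, p^+\right)$; from the latter ones, it follows that the total number of expected, positive links reads $\langle L^+ \rangle = \binom{N}{2} p^+$ while the total number of expected, negative links reads $\langle L^- \rangle = \binom{N}{2} p^-$. In other words, it is possible to define a 'traditional' Random Graph Model whose parameter is $p \equiv p(a_{ij} = -1) + p(a_{ij} = +1) = p^- + p^+$.

Let us, now, move to describe the behaviour of the degree. The probability, under the SRGM, that a node has exactly $k^+$ positive links and $k^-$ negative links reads

$$P(k^-, k^0, k^+) = \binom{N-1}{k^-,\, k^0,\, k^+} (p^-)^{k^-} (p^0)^{k^0} (p^+)^{k^+}, \qquad k^0 = N - 1 - k^+ - k^-;$$

again, it obeys a multinomial distribution. The combinatorial factor is the (multinomial) coefficient counting the total number of ways $k$ links ($k^+$ of which are positive and $k^-$ of which are negative) can be placed among the $N - 1$ node pairs each node individuates. The marginal random variables $a^+_{ij} \sim \mathrm{Ber}(p^+)$ and $a^-_{ij} \sim \mathrm{Ber}(p^-)$ also induce the marginal probability distributions $P(k^-) = \mathrm{Bin}(N-1, p^-)$, $P(k^0) = \mathrm{Bin}(N-1, p^0)$ and $P(k^+) = \mathrm{Bin}(N-1, p^+)$; from the latter ones, it follows that the expected, positive degree reads $\langle k^+ \rangle = (N-1)p^+$ while the expected, negative degree reads $\langle k^- \rangle = (N-1)p^-$.

In order to determine the parameters that define the SRGM, let us maximize the likelihood function

$$\mathcal{L}_{\mathrm{SRGM}}(x, y) \equiv \ln P_{\mathrm{SRGM}}(\mathbf{A}^*|x, y) = L^+(\mathbf{A}^*) \ln x + L^-(\mathbf{A}^*) \ln y - \binom{N}{2} \ln(1 + x + y)$$

with respect to $x$ and $y$. Upon doing so, we obtain the pair of equations

$$\frac{\partial \mathcal{L}_{\mathrm{SRGM}}}{\partial x} = \frac{L^+(\mathbf{A}^*)}{x} - \binom{N}{2} \frac{1}{1+x+y}, \qquad \frac{\partial \mathcal{L}_{\mathrm{SRGM}}}{\partial y} = \frac{L^-(\mathbf{A}^*)}{y} - \binom{N}{2} \frac{1}{1+x+y};$$

equating them to zero leads us to find

$$p^+ = \frac{x}{1+x+y} = \frac{2L^+(\mathbf{A}^*)}{N(N-1)}, \qquad p^- = \frac{y}{1+x+y} = \frac{2L^-(\mathbf{A}^*)}{N(N-1)}.$$

Naturally, $p^0 \equiv 1 - p^- - p^+$.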

Signed Random Graph Model with fixed topology
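Let us, again, consider the properties $L^+(\mathbf{A})$ and $L^-(\mathbf{A})$, to be satisfied by keeping a network topology fixed. In what follows, we will indicate the adopted topology as the one induced by the matrix $\mathbf{A}^*$. The Hamiltonian describing such a problem still reads $H(\mathbf{A}) = \alpha L^+(\mathbf{A}) + \beta L^-(\mathbf{A})$ but induces a partition function reading

$$Z = \left(e^{-\alpha} + e^{-\beta}\right)^{L};$$

in other words, the support of the distribution becomes the set of node pairs $i, j$, with $i < j$, such that $|a^*_{ij}| = 1$, inducing a set of admissible configurations whose cardinality amounts to $2^L$. The expression above leads us to find

$$P_{\mathrm{SRGM\text{-}FT}}(\mathbf{A}) = (p^-)^{L^-} (p^+)^{L^+},$$

having posed $p^- \equiv \frac{e^{-\beta}}{e^{-\alpha} + e^{-\beta}} \equiv \frac{y}{x+y}$ and $p^+ \equiv \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}} \equiv \frac{x}{x+y}$, where $p^+$ is the probability that any two connected nodes are linked by a positive edge and $p^-$ is the probability that any two connected nodes are linked by a negative edge. Hence, according to the SRGM-FT, the generic entry of a signed network satisfying $|a_{ij}| = |a^*_{ij}| = 1$ is a random variable following a Bernoulli distribution, i.e. obeying the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & +1 \\ p^- & p^+ \end{pmatrix}.$$

The probability, under the SRGM-FT, that a graph has exactly $L^+$ positive links reads

$$P(L^+) = \binom{L}{L^+} (p^+)^{L^+} (p^-)^{L - L^+},$$

i.e. it is a binomial distribution, with $L$ indicating the total number of unsigned links. As a consequence, the total number of expected, positive links reads $\langle L^+ \rangle = Lp^+$; analogously, $L^- \sim \mathrm{Bin}(L, p^-)$. Similarly, the probability, under the SRGM-FT, that node $i$ establishes exactly $k^+_i$ positive links reads

$$P(k^+_i) = \binom{k_i}{k^+_i} (p^+)^{k^+_i} (p^-)^{k_i - k^+_i};$$

again, it is a binomial distribution, with $k_i$ indicating the unsigned degree of node $i$. As a consequence, the expected, positive degree of node $i$ reads $\langle k^+_i \rangle = k_i p^+$.

In order to determine the parameters that define the SRGM-FT, let us maximize the likelihood function

$$\mathcal{L}_{\mathrm{SRGM\text{-}FT}}(x, y) \equiv \ln P_{\mathrm{SRGM\text{-}FT}}(\mathbf{A}^*|x, y) = L^+(\mathbf{A}^*) \ln x + L^-(\mathbf{A}^*) \ln y - L(\mathbf{A}^*) \ln(x + y)$$

with respect to $x$ and $y$. Upon doing so, we obtain the pair of equations

$$\frac{\partial \mathcal{L}_{\mathrm{SRGM\text{-}FT}}}{\partial x} = \frac{L^+(\mathbf{A}^*)}{x} - \frac{L(\mathbf{A}^*)}{x+y}, \qquad \frac{\partial \mathcal{L}_{\mathrm{SRGM\text{-}FT}}}{\partial y} = \frac{L^-(\mathbf{A}^*)}{y} - \frac{L(\mathbf{A}^*)}{x+y};$$

equating them to zero leads us to find

$$p^+ = \frac{x}{x+y} = \frac{L^+(\mathbf{A}^*)}{L(\mathbf{A}^*)}, \qquad p^- = \frac{y}{x+y} = \frac{L^-(\mathbf{A}^*)}{L(\mathbf{A}^*)}.$$

For an illustrative example of the empirical distributions of $L^+$, $L^-$, $k^+$ and $k^-$ under our homogeneous network models see Supplementary Figures 8, 9, 10 and 11.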

Signed Random Graph Model: free VS fixed topology
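In order to clarify the relationship between the SRGM and the SRGM-FT, let us write

$$P_{\mathrm{SRGM}}(\mathbf{A}) = (p^-)^{L^-} (p^0)^{L^0} (p^+)^{L^+} = \left[p^{L} (1-p)^{\binom{N}{2} - L}\right] \cdot \left[\left(\frac{p^-}{p}\right)^{L^-} \left(\frac{p^+}{p}\right)^{L^+}\right];$$

since the RGM induced by the SRGM satisfies the relationship $p \equiv p^- + p^+$, one has that $p^0 = 1 - p$, the first factor being the probability of the RGM and the second one being the probability of the SRGM-FT, whose parameters are, now, $p^-/p$ and $p^+/p$. Besides having an intuitive meaning, i.e.

$$\frac{p^-}{p} = P(\text{'link −'} \mid \text{'link'}), \qquad \frac{p^+}{p} = P(\text{'link +'} \mid \text{'link'}),$$

these expressions are also consistent with the estimations of the parameters obtained via the likelihood maximization: in fact,

$$\frac{p^+}{p} = \frac{2L^+(\mathbf{A}^*)/N(N-1)}{2L(\mathbf{A}^*)/N(N-1)} = \frac{L^+(\mathbf{A}^*)}{L(\mathbf{A}^*)}, \qquad \frac{p^-}{p} = \frac{L^-(\mathbf{A}^*)}{L(\mathbf{A}^*)}.$$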

Signed Configuration Model
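The second set of constraints we consider is represented by the properties $\{k^+_i(\mathbf{A})\}_{i=1}^N$ and $\{k^-_i(\mathbf{A})\}_{i=1}^N$. The Hamiltonian describing such a problem reads

$$H(\mathbf{A}) = \sum_{i=1}^{N} \left[\alpha_i k^+_i(\mathbf{A}) + \beta_i k^-_i(\mathbf{A})\right] = \sum_{i<j} \left[(\alpha_i + \alpha_j) a^+_{ij} + (\beta_i + \beta_j) a^-_{ij}\right];$$

as a consequence, the partition function reads

$$Z = \prod_{i<j} \left(1 + e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}\right) \qquad (87)$$

and induces the expression

$$P_{\mathrm{SCM}}(\mathbf{A}) = \prod_{i<j} (p^-_{ij})^{a^-_{ij}} (p^0_{ij})^{a^0_{ij}} (p^+_{ij})^{a^+_{ij}},$$

having posed $p^-_{ij} \equiv \frac{e^{-(\beta_i + \beta_j)}}{1 + e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}} \equiv \frac{y_i y_j}{1 + x_i x_j + y_i y_j}$, $p^+_{ij} \equiv \frac{e^{-(\alpha_i + \alpha_j)}}{1 + e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}} \equiv \frac{x_i x_j}{1 + x_i x_j + y_i y_j}$ and $p^0_{ij} \equiv 1 - p^-_{ij} - p^+_{ij}$, where $p^+_{ij}$ is the probability that nodes $i$ and $j$ are linked by a positive edge, $p^-_{ij}$ is the probability that nodes $i$ and $j$ are linked by a negative edge and $p^0_{ij}$ is the probability that nodes $i$ and $j$ are not linked at all. Hence, according to the SCM, the generic entry of a signed network is a random variable following a generalized Bernoulli distribution, i.e. obeying the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & 0 & +1 \\ p^-_{ij} & p^0_{ij} & p^+_{ij} \end{pmatrix}, \quad \forall\, i < j;$$

as a consequence, any network belonging to $\mathcal{A}$ is a collection of independent random variables, obeying the finite scheme given by the direct product of the $N(N-1)/2$ finite schemes above.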
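In the case of the SCM, $L^+$ is a random variable obeying the Poisson-Binomial distribution that we indicate as $\mathrm{PoissBin}\left(\binom{N}{2}, \{p^+_{ij}\}\right)$. Hence, the total number of expected, positive links reads $\langle L^+ \rangle = \sum_{i<j} p^+_{ij}$ while the total number of expected, negative links reads $\langle L^- \rangle = \sum_{i<j} p^-_{ij}$.

In order to determine the parameters that define the SCM, let us maximize the likelihood function

$$\mathcal{L}_{\mathrm{SCM}}(\{x_i\}, \{y_i\}) = \sum_{i=1}^{N} \left[k^+_i(\mathbf{A}^*) \ln x_i + k^-_i(\mathbf{A}^*) \ln y_i\right] - \sum_{i<j} \ln(1 + x_i x_j + y_i y_j)$$

with respect to $x_i$ and $y_i$, $\forall\, i$. Upon doing so, we obtain the system of equations

$$\frac{\partial \mathcal{L}_{\mathrm{SCM}}}{\partial x_i} = \frac{k^+_i(\mathbf{A}^*)}{x_i} - \sum_{j (\neq i)} \frac{x_j}{1 + x_i x_j + y_i y_j}, \qquad \frac{\partial \mathcal{L}_{\mathrm{SCM}}}{\partial y_i} = \frac{k^-_i(\mathbf{A}^*)}{y_i} - \sum_{j (\neq i)} \frac{y_j}{1 + x_i x_j + y_i y_j};$$

equating them to zero leads us to find

$$k^+_i(\mathbf{A}^*) = \sum_{j (\neq i)} \frac{x_i x_j}{1 + x_i x_j + y_i y_j}, \qquad k^-_i(\mathbf{A}^*) = \sum_{j (\neq i)} \frac{y_i y_j}{1 + x_i x_j + y_i y_j}, \quad \forall\, i.$$

Although the system above can be solved only numerically, particular conditions exist under which the equations constituting it can be approximated and solved explicitly. They are collectively named 'sparse-case' approximation of the SCM and hold true whenever $x_i \ll 1$ and $y_i \ll 1$, $\forall\, i$. In this case, one can pose $p^+_{ij} \simeq x_i x_j$ and $p^-_{ij} \simeq y_i y_j$, $\forall\, i < j$, which allow the equations above to be simplified as follows:

$$k^+_i(\mathbf{A}^*) \simeq \sum_{j (\neq i)} x_i x_j, \qquad k^-_i(\mathbf{A}^*) \simeq \sum_{j (\neq i)} y_i y_j, \quad \forall\, i;$$

the latter ones induce the expressions $x_i \simeq \frac{k^+_i(\mathbf{A}^*)}{\sqrt{2L^+(\mathbf{A}^*)}}$ and $y_i \simeq \frac{k^-_i(\mathbf{A}^*)}{\sqrt{2L^-(\mathbf{A}^*)}}$, $\forall\, i$, allowing us to find

$$p^+_{ij} \simeq \frac{k^+_i(\mathbf{A}^*)\, k^+_j(\mathbf{A}^*)}{2L^+(\mathbf{A}^*)}, \qquad p^-_{ij} \simeq \frac{k^-_i(\mathbf{A}^*)\, k^-_j(\mathbf{A}^*)}{2L^-(\mathbf{A}^*)}, \quad \forall\, i < j.$$

The model defined by the expressions above is also known as the Signed Chung-Lu Model (SCLM).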

Signed Configuration Model with fixed topology
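Let us, again, consider the properties $\{k^+_i(\mathbf{A})\}_{i=1}^N$ and $\{k^-_i(\mathbf{A})\}_{i=1}^N$, to be satisfied by keeping a network topology fixed. As usual, we will indicate the adopted topology as the one induced by the matrix $\mathbf{A}^*$. The Hamiltonian describing such a problem still reads $H(\mathbf{A}) = \sum_{i=1}^{N} \left[\alpha_i k^+_i(\mathbf{A}) + \beta_i k^-_i(\mathbf{A})\right]$ but induces a partition function reading

$$Z = \prod_{i<j} \left(e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}\right)^{|a^*_{ij}|}$$

which, in turn, induces the expression

$$P_{\mathrm{SCM\text{-}FT}}(\mathbf{A}) = \prod_{i<j} \left[(p^-_{ij})^{a^-_{ij}} (p^+_{ij})^{a^+_{ij}}\right]^{|a^*_{ij}|},$$

having posed $p^-_{ij} \equiv \frac{e^{-(\beta_i + \beta_j)}}{e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}} \equiv \frac{y_i y_j}{x_i x_j + y_i y_j}$ and $p^+_{ij} \equiv \frac{e^{-(\alpha_i + \alpha_j)}}{e^{-(\alpha_i + \alpha_j)} + e^{-(\beta_i + \beta_j)}} \equiv \frac{x_i x_j}{x_i x_j + y_i y_j}$, where $p^+_{ij}$ is the probability that nodes $i$ and $j$ are linked by a positive edge and $p^-_{ij}$ is the probability that nodes $i$ and $j$ are linked by a negative edge. Hence, according to the SCM-FT, the generic entry of a signed network satisfying $|a_{ij}| = |a^*_{ij}| = 1$ is a random variable following a Bernoulli distribution, i.e. obeying the finite scheme

$$a_{ij} \sim \begin{pmatrix} -1 & +1 \\ p^-_{ij} & p^+_{ij} \end{pmatrix}.$$

In the case of the SCM-FT, $L^+$ is a random variable obeying the Poisson-Binomial distribution that we indicate as $\mathrm{PoissBin}\left(L, \{p^+_{ij}\}\right)$. Hence, the total number of expected, positive links reads $\langle L^+ \rangle = \sum_{i<j} |a^*_{ij}|\, p^+_{ij}$.

In order to determine the parameters that define the SCM-FT, let us maximize the likelihood function

$$\mathcal{L}_{\mathrm{SCM\text{-}FT}}(\{x_i\}, \{y_i\}) = \sum_{i=1}^{N} \left[k^+_i(\mathbf{A}^*) \ln x_i + k^-_i(\mathbf{A}^*) \ln y_i\right] - \sum_{i<j} |a^*_{ij}| \ln(x_i x_j + y_i y_j)$$

with respect to $x_i$ and $y_i$, $\forall\, i$. Upon doing so, we obtain the system of equations

$$\frac{\partial \mathcal{L}_{\mathrm{SCM\text{-}FT}}}{\partial x_i} = \frac{k^+_i(\mathbf{A}^*)}{x_i} - \sum_{j (\neq i)} |a^*_{ij}| \frac{x_j}{x_i x_j + y_i y_j}, \qquad \frac{\partial \mathcal{L}_{\mathrm{SCM\text{-}FT}}}{\partial y_i} = \frac{k^-_i(\mathbf{A}^*)}{y_i} - \sum_{j (\neq i)} |a^*_{ij}| \frac{y_j}{x_i x_j + y_i y_j};$$

equating them to zero leads us to find

$$k^+_i(\mathbf{A}^*) = \sum_{j (\neq i)} |a^*_{ij}| \frac{x_i x_j}{x_i x_j + y_i y_j}, \qquad k^-_i(\mathbf{A}^*) = \sum_{j (\neq i)} |a^*_{ij}| \frac{y_i y_j}{x_i x_j + y_i y_j}, \quad \forall\, i.$$

The system above can be solved only numerically.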
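$$\frac{p^-_{ij}}{p_{ij}} = \frac{y_i y_j/(1 + x_i x_j + y_i y_j)}{x_i x_j/(1 + x_i x_j + y_i y_j) + y_i y_j/(1 + x_i x_j + y_i y_j)} = \frac{y_i y_j}{x_i x_j + y_i y_j}, \qquad (111)$$

$$\frac{p^+_{ij}}{p_{ij}} = \frac{x_i x_j/(1 + x_i x_j + y_i y_j)}{x_i x_j/(1 + x_i x_j + y_i y_j) + y_i y_j/(1 + x_i x_j + y_i y_j)} = \frac{x_i x_j}{x_i x_j + y_i y_j}, \qquad (112)$$

hence inducing the probability distribution of the SCM-FT, besides keeping the intuitive meaning made explicit by the expressions $p^-_{ij}/p_{ij} = P(\text{'link −'} \mid \text{'link'})$ and $p^+_{ij}/p_{ij} = P(\text{'link +'} \mid \text{'link'})$. The first one, on the other hand, can be identified with the probability distribution of the 'induced' CM:

$$p_{ij} = p^-_{ij} + p^+_{ij} = \frac{x_i x_j + y_i y_j}{1 + x_i x_j + y_i y_j} = \frac{\mathbf{z}_i \cdot \mathbf{z}_j}{1 + \mathbf{z}_i \cdot \mathbf{z}_j} = \frac{z_i z_j \cos\phi_{ij}}{1 + z_i z_j \cos\phi_{ij}},$$

where $\mathbf{z}_i \equiv (x_i, y_i)$ is the vector of fitnesses of node $i$, $z_i \equiv |\mathbf{z}_i| = \sqrt{x_i^2 + y_i^2}$ is its modulus and $\cos\phi_{ij}$ is the cosine of the angle between vectors $\mathbf{z}_i$ and $\mathbf{z}_j$. As a consequence, we can write

$$P_{\mathrm{SCM}}(\mathbf{A}) = \prod_{i<j} p_{ij}^{|a_{ij}|} (1 - p_{ij})^{1 - |a_{ij}|} \cdot \prod_{i<j} \left(\frac{p^-_{ij}}{p_{ij}}\right)^{a^-_{ij}} \left(\frac{p^+_{ij}}{p_{ij}}\right)^{a^+_{ij}}.$$

Notice that when $\mathbf{z}_i = (x_i, 0)$ and $\mathbf{z}_j = (x_j, 0)$, $\cos\phi_{ij} = 1$ and $p_{ij} = p^+_{ij} = \frac{x_i x_j}{1 + x_i x_j}$, i.e. the 'induced' CM reduces to the proper CM: in this case, in fact, the information about signs is 'redundant' as $k_i(\mathbf{A}^*) = k^+_i(\mathbf{A}^*)$ and $k_j(\mathbf{A}^*) = k^+_j(\mathbf{A}^*)$. On the other hand, when $\mathbf{z}_i = (x_i, 0)$ and $\mathbf{z}_j = (0, y_j)$, $\cos\phi_{ij} = 0$ and $p_{ij} = 0$, i.e. nodes $i$ and $j$ cannot be linked: in this case, in fact, $k_i(\mathbf{A}^*) = k^+_i(\mathbf{A}^*)$ but $k_j(\mathbf{A}^*) = k^-_j(\mathbf{A}^*)$, whence the impossibility of (consistently) attributing a sign to the edge between $i$ and $j$.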
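The geometric reading of the 'induced' CM can be checked numerically. The sketch below evaluates the connection probability $p_{ij} = p^-_{ij} + p^+_{ij}$ both from the fitnesses directly and via the moduli and the cosine of the angle between the fitness vectors, and reproduces the two limiting cases discussed above. Function names and numerical values are hypothetical.

```python
import math


def induced_cm_probability(z_i, z_j):
    """p_ij = (x_i x_j + y_i y_j) / (1 + x_i x_j + y_i y_j), with z = (x, y)."""
    dot = z_i[0] * z_j[0] + z_i[1] * z_j[1]
    return dot / (1.0 + dot)


def induced_cm_probability_via_cosine(z_i, z_j):
    """Same probability, written through moduli and the angle between z_i and z_j."""
    mod_i = math.hypot(*z_i)
    mod_j = math.hypot(*z_j)
    cos_phi = (z_i[0] * z_j[0] + z_i[1] * z_j[1]) / (mod_i * mod_j)
    w = mod_i * mod_j * cos_phi
    return w / (1.0 + w)


# Generic fitness vectors: the two expressions agree.
p_direct = induced_cm_probability((0.3, 0.4), (0.5, 0.2))
p_cosine = induced_cm_probability_via_cosine((0.3, 0.4), (0.5, 0.2))

# Parallel vectors along x: the ordinary CM expression x_i x_j / (1 + x_i x_j).
p_parallel = induced_cm_probability((0.7, 0.0), (0.9, 0.0))

# Orthogonal vectors: the two nodes cannot be linked.
p_orthogonal = induced_cm_probability((0.7, 0.0), (0.0, 0.9))

print(p_direct, p_cosine, p_parallel, p_orthogonal)
```

The cosine form requires both vectors to have non-zero modulus; the direct form has no such restriction and is the one to prefer in actual computations.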

SUPPLEMENTARY NOTE 4 NUMERICAL OPTIMIZATION OF LIKELIHOOD FUNCTIONS
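In order to numerically solve the systems of equations defining the SCM and the SCM-FT, we can follow the guidelines provided in Supplementary Reference [49]: more specifically, we will adapt the iterative recipe provided there to our (binary, undirected, signed) setting. First, let us consider the SCM, whose system of equations can be rewritten as

$$x^{(n)}_i = \frac{k^+_i(\mathbf{A}^*)}{\sum_{j (\neq i)} \frac{x^{(n-1)}_j}{1 + x^{(n-1)}_i x^{(n-1)}_j + y^{(n-1)}_i y^{(n-1)}_j}}, \qquad y^{(n)}_i = \frac{k^-_i(\mathbf{A}^*)}{\sum_{j (\neq i)} \frac{y^{(n-1)}_j}{1 + x^{(n-1)}_i x^{(n-1)}_j + y^{(n-1)}_i y^{(n-1)}_j}}, \quad \forall\, i;$$

analogously, the system of equations defining the SCM-FT can be rewritten as

$$x^{(n)}_i = \frac{k^+_i(\mathbf{A}^*)}{\sum_{j (\neq i)} |a^*_{ij}| \frac{x^{(n-1)}_j}{x^{(n-1)}_i x^{(n-1)}_j + y^{(n-1)}_i y^{(n-1)}_j}}, \qquad y^{(n)}_i = \frac{k^-_i(\mathbf{A}^*)}{\sum_{j (\neq i)} |a^*_{ij}| \frac{y^{(n-1)}_j}{x^{(n-1)}_i x^{(n-1)}_j + y^{(n-1)}_i y^{(n-1)}_j}}, \quad \forall\, i.$$

In order for each iterative recipe to converge, an appropriate vector of initial conditions needs to be chosen; here, we have opted for the following ones: Besides, we have adopted two different stopping criteria: the first one is a condition on the Euclidean norm of the vector of differences between the values of the parameters at subsequent iterations, i.e. $||\Delta\vec{\theta}||_2 = \sqrt{\sum_{i=1}^{N} (\Delta\theta_i)^2} \leq 10^{-8}$; the second one is a condition on the maximum number of iterations of our iterative algorithm, set to $10^3$.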
The accuracy of our method in estimating the constraints has been evaluated by computing the maximum absolute error (MAE), defined as

$$\mathrm{MAE} \equiv \max_i \left\{ \left|k^+_i(\mathbf{A}^*) - \langle k^+_i \rangle\right|, \left|k^-_i(\mathbf{A}^*) - \langle k^-_i \rangle\right| \right\}$$

(i.e. as the infinity norm of the difference between the vector of the empirical values of the constraints and the vector of their expected values) and the maximum relative error (MRE), defined as

$$\mathrm{MRE} \equiv \max_i \left\{ \frac{\left|k^+_i(\mathbf{A}^*) - \langle k^+_i \rangle\right|}{k^+_i(\mathbf{A}^*)}, \frac{\left|k^-_i(\mathbf{A}^*) - \langle k^-_i \rangle\right|}{k^-_i(\mathbf{A}^*)} \right\}$$

(i.e. as the infinity norm of the relative difference between the vector of the empirical values of the constraints and the vector of their expected values). The Supplementary Tables II, III and IV sum up the time employed by our algorithm to converge as well as its accuracy in reproducing the constraints defining the SCM and the SCM-FT on each network considered in the present contribution. Overall, our method is fast and accurate: the numerical errors never exceed $O(10^{-1})$ and the time employed to achieve such an accuracy never exceeds minutes. Notice that the time required by our algorithm to solve the SCM is usually smaller than that required to solve the SCM-FT, although such a difference rises with the size of the considered configuration.
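A minimal sketch of how the SCM equations can be solved iteratively and how the MAE and the MRE can be evaluated is given below. Several choices here are assumptions made for illustration and are not taken from [49] or from the analysis above: the update is a damped variant of a fixed-point recipe (each parameter is multiplied by the square root of the ratio between the empirical and the expected degree), the starting point is the Chung-Lu-like guess $x_i = k^+_i/\sqrt{2L^+}$, $y_i = k^-_i/\sqrt{2L^-}$, the iteration cap is larger than $10^3$ because damped steps are shorter, and the degree sequence is a hypothetical 6-node example. All function names are hypothetical.

```python
import math


def expected_signed_degrees(x, y):
    """Expected positive and negative degrees under the SCM with parameters x, y."""
    n = len(x)
    exp_kp, exp_km = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            denom = 1.0 + x[i] * x[j] + y[i] * y[j]
            p_plus = x[i] * x[j] / denom
            p_minus = y[i] * y[j] / denom
            exp_kp[i] += p_plus
            exp_kp[j] += p_plus
            exp_km[i] += p_minus
            exp_km[j] += p_minus
    return exp_kp, exp_km


def solve_scm(k_plus, k_minus, tol=1e-8, max_iter=10_000):
    """Damped multiplicative iteration: theta <- theta * sqrt(k_emp / k_exp)."""
    two_lp, two_lm = sum(k_plus), sum(k_minus)
    # Assumed initial guess (Chung-Lu-like); zero-degree nodes start, and stay, at 0.
    x = [k / math.sqrt(two_lp) if two_lp else 0.0 for k in k_plus]
    y = [k / math.sqrt(two_lm) if two_lm else 0.0 for k in k_minus]
    for _ in range(max_iter):
        exp_kp, exp_km = expected_signed_degrees(x, y)
        new_x = [xi * math.sqrt(k / e) if e > 0 else 0.0
                 for xi, k, e in zip(x, k_plus, exp_kp)]
        new_y = [yi * math.sqrt(k / e) if e > 0 else 0.0
                 for yi, k, e in zip(y, k_minus, exp_km)]
        # Euclidean norm of the parameter change between subsequent iterations.
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x + new_y, x + y)))
        x, y = new_x, new_y
        if step <= tol:
            break
    return x, y


def max_errors(k_plus, k_minus, x, y):
    """Maximum absolute and maximum relative error on the signed degrees."""
    exp_kp, exp_km = expected_signed_degrees(x, y)
    empirical = list(k_plus) + list(k_minus)
    expected = exp_kp + exp_km
    mae = max(abs(a - b) for a, b in zip(empirical, expected))
    mre = max(abs(a - b) / a for a, b in zip(empirical, expected) if a > 0)
    return mae, mre


# Hypothetical signed degree sequences of a 6-node network.
k_plus = [3, 2, 2, 3, 2, 2]
k_minus = [1, 1, 1, 1, 1, 1]
x, y = solve_scm(k_plus, k_minus)
mae, mre = max_errors(k_plus, k_minus, x, y)
print(mae, mre)
```

The damping is a design choice aimed at suppressing the oscillations that an undamped update of this kind can exhibit; the iteration is not expected to converge when the maximum-likelihood solution lies at infinity (e.g. when a node is positively connected to all the others), a situation this sketch does not handle.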

Fig. 1: Balanced and unbalanced motifs. Fundamental triadic patterns, or motifs, considered as balanced (blue) and unbalanced (red) by the strong (a) and weak (b) versions of the balance theory.

Fig. 2: Structural (im)balance in the Correlates of War and MMOG datasets under homogeneous benchmarks. Structural (im)balance in the CoW and MMOG datasets: evolution of the z-scores of signed triangles under homogeneous benchmarks, i.e. the Signed Random Graph Model (SRGM) and the Signed Random Graph Model with Fixed Topology (SRGM-FT). (a)−(b) 13 snapshots (of 4 years each) of the CoW dataset, covering the period 1946-1997. (c)−(d) 10 snapshots of the MMOG dataset. (b)−(d) The SRGM-FT supports the structural weak balance theory (SWBT) because the only significantly under-represented pattern in the data is also the only one that SWBT considers frustrated (triangle with only one negative link), while the z-score of the triangle with all negative edges (which the structural strong balance theory would expect to be under-represented as well) is very low. In any case, the hypothesis that nodes tend to establish balanced triangles with all positive links is supported on both datasets. Results of this type constitute the backbone of the narrative according to which the weak version of the structural balance theory (SBT) is the one that is better supported by data. (a)−(c) Note that the SRGM has all z-scores positive, thereby not supporting any version of SBT, a result due to the complete randomization of the topology along with the edge signs: the over-representation of all patterns in the data is merely due to the fact that triangles form with small probability at a purely topological level, given the low link density, irrespective of their signs.

Fig. 3: Structural (im)balance in the Correlates of War and MMOG datasets under heterogeneous benchmarks. Structural (im)balance in the CoW and MMOG datasets: evolution of the z-scores of signed triangles under heterogeneous benchmarks, i.e. the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT). (a)−(b) 13 snapshots (of 4 years each) of the CoW dataset, covering the period 1946-1997. (c)−(d) 10 snapshots of the MMOG dataset. The z-scores produced by the SCM (a)−(c) and the SCM-FT (b)−(d) are much smaller, in absolute value, than the corresponding ones produced by the Signed Random Graph Model and the Signed Random Graph Model with Fixed Topology (see Fig. 2), showing that node heterogeneity contributes significantly to the overall abundance of signed triangles. The all-positive (balanced) triangle is still strongly over-represented in all cases, but additionally the all-negative (frustrated) triangle is now always under-represented. Under the SCM-FT, the other frustrated triangle (the one with a single negative link) is also systematically under-represented, and these combined results provide support for the structural strong balance theory (SSBT) (particularly evidently for the MMOG data). By contrast, the structural weak balance theory (SWBT) (according to which one would expect the under-representation of only the triangle with a single negative link) is no longer supported. These results provide an alternative narrative w.r.t. the usual one: when the heterogeneity of the signed degrees of nodes is accounted for, statistical evidence supports SSBT rather than SWBT.

Fig. 4: Structural (im)balance in social and biological networks under homogeneous and heterogeneous benchmarks. Structural (im)balance in social and biological networks under homogeneous (a)−(b) and heterogeneous (c)−(d) null models: z-scores of signed triangles for three socio-political networks (N.G.H. Tribes, Senate US, Monastery), two financial networks (Bitcoin Alpha, Bitcoin OTC) and, as a comparison, three biological networks (E. coli, Macrophage, EGFR). The Signed Random Graph Model (SRGM) produces z-scores that are almost always positive and very large for all triangles (balanced and unbalanced) and all signed networks (social and biological), a result confirming that this null model is completely uninformative about structural (im)balance, as it merely highlights that the formation of any triangle, irrespective of its signs, is highly unlikely if the topology is randomized completely. By contrast, the Signed Random Graph Model with Fixed Topology (SRGM-FT) largely supports the structural weak balance theory on social networks, as the only pattern under-represented in the data is the frustrated triangle with a single negative link. Heterogeneous null models, i.e. the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT)

Fig. 5: Analysis of the z-scores of the degree of frustration indices. Analysis of the z-scores of the strong degree of frustration defined in Eq. 9 and the weak degree of frustration defined in Eq. 10. (a)−(b): 13 snapshots (of 4 years each) of the Correlates of Wars dataset, covering the period 1946-1997. (c)−(d): 10 snapshots of the MMOG dataset. (e)−(f): set of social and biological networks. The z-scores are computed under the Signed Random Graph Model (SRGM) (•), the Signed Random Graph Model with Fixed Topology (SRGM-FT) (•), the Signed Configuration Model (SCM) (•) and the Signed Configuration Model with Fixed Topology (SCM-FT) (•). With respect to all null models, frustration is under-represented in all social network data and over-represented in all biological data.

Fig. 6: Values of the frustration index on several optimal partitions of the Correlates of Wars dataset. Value of the frustration index (FI) on several optimal partitions of the 13 snapshots (of 4 years each) of the CoW dataset, each obtained by maximizing the modularity Q = −L · (FI − ⟨FI⟩) for a given number K of modules (communities), using as null models the Signed Random Graph Model with Fixed Topology (SRGM-FT) (a) and the Signed Configuration Model with Fixed Topology (SCM-FT) (b). While the SRGM-FT reveals a rather flat profile of FI as a function of K, with the minimum attained for more than two groups, the SCM-FT reveals that FI is always clearly minimized for K = 2 groups. Taken together, these results extend our findings to the mesoscale level.

Fig. 8: Empirical distributions of $L^+$ and $L^-$ under the Signed Random Graph Model. (a)−(b): empirical joint distribution of $L^+$ and $L^-$ over an ensemble of 10,000 configurations induced by the Signed Random Graph Model (SRGM) whose parameters have been tuned to $N = 100$, $p^- = 0.2$, $p^0 = 0.3$ and $p^+ = 0.5$ (a) and multinomial distribution $\mathrm{Multi}\left(\binom{N}{2}, \{p^-, p^0, p^+\}\right)$ (b): the two have been placed side by side for a visual comparison. (c)−(d): distributions of $L^+$ (blue dots) and $L^-$ (red dots) over an ensemble of 100,000 configurations induced by the SRGM whose parameters have been tuned to $N = 100$, $p^+ = 0.4$, $p^- = 0.2$ (c) and $N = 100$, $p^+ = 0.2$, $p^- = 0.7$ (d). The red, dashed lines represent the binomial distributions $\mathrm{Bin}\left(\binom{N}{2}, p^-\right)$ while the blue, solid lines represent the binomial distributions $\mathrm{Bin}\left(\binom{N}{2}, p^+\right)$.
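
The comparison described in this caption can be reproduced numerically. The following is a minimal sketch, assuming that each of the $\binom{N}{2}$ node pairs is independently negative, absent or positive with probabilities $p^-$, $p^0$, $p^+$; the function name `sample_link_counts` and the reduced ensemble size are illustrative choices, not taken from the paper.

```python
from math import comb

import numpy as np


def sample_link_counts(n_nodes, p_minus, p_plus, n_samples, rng):
    """Return an (n_samples, 2) array of (L-, L+) counts under the SRGM.

    Assumption: every node pair is drawn independently, so only the
    number of pairs matters, not which nodes they connect.
    """
    n_pairs = comb(n_nodes, 2)
    counts = np.empty((n_samples, 2), dtype=int)
    for s in range(n_samples):
        u = rng.random(n_pairs)  # one uniform draw per node pair
        counts[s, 0] = np.count_nonzero(u <= p_minus)
        counts[s, 1] = np.count_nonzero((u > p_minus) & (u <= p_minus + p_plus))
    return counts


rng = np.random.default_rng(0)
n_nodes, p_minus, p_plus = 100, 0.2, 0.5
counts = sample_link_counts(n_nodes, p_minus, p_plus, 1000, rng)
n_pairs = comb(n_nodes, 2)

# Binomial moments expected for the marginals of the multinomial.
for label, column, p in (("L-", 0, p_minus), ("L+", 1, p_plus)):
    print(label,
          "sample mean:", counts[:, column].mean(),
          "expected:", n_pairs * p,
          "sample var:", counts[:, column].var(),
          "expected:", n_pairs * p * (1 - p))
```

If the multinomial description holds, the sample covariance between the two columns should also approach $-\binom{N}{2}\, p^+ p^-$ as the ensemble grows.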

Fig. 9: Empirical distributions of $k^+$ and $k^-$ under the Signed Random Graph Model. (a)−(b): empirical joint distribution of $k^+$ and $k^-$ over an ensemble of 10,000 configurations induced by the Signed Random Graph Model (SRGM) whose parameters have been tuned to $N = 100$, $p^- = 0.3$, $p^0 = 0.1$ and $p^+ = 0.6$ (a) and multinomial distribution $\mathrm{Multi}\left(N - 1, \{p^-, p^0, p^+\}\right)$ (b): the two have been placed side by side for a visual comparison. (c)−(d): distributions of $k^+$ (blue dots) and $k^-$ (red dots), for an arbitrarily chosen node, over an ensemble of 100,000 configurations induced by the SRGM whose parameters have been tuned to $N = 100$, $p^+ = 0.4$, $p^- = 0.2$ (c) and $N = 100$, $p^+ = 0.2$, $p^- = 0.7$ (d). The red, dashed lines represent the binomial distributions $\mathrm{Bin}(N - 1, p^-)$ while the blue, solid lines represent the binomial distributions $\mathrm{Bin}(N - 1, p^+)$.

Fig. 10: Empirical distributions of $L^+$ and $L^-$ under the Signed Random Graph Model with Fixed Topology. Distributions of $L^+$ (blue dots) and $L^-$ (red dots) over an ensemble of 100,000 configurations induced by the Signed Random Graph Model with Fixed Topology (SRGM-FT) whose parameters have been tuned to $N = 100$, $p^+ = 0.3$, $p^- = 0.7$ (a) and $N = 100$, $p^+ = 0.8$, $p^- = 0.2$ (b). The fixed topologies have been chosen by sampling a 'traditional' Random Graph Model with $p = 0.8$ (a) and $p = 0.6$ (b). The red, dashed lines represent the binomial distributions $\mathrm{Bin}(L, p^-)$ while the blue, solid lines represent the binomial distributions $\mathrm{Bin}(L, p^+)$.

Fig. 11: Empirical distributions of $k^+$ and $k^-$ under the Signed Random Graph Model with Fixed Topology. Distributions of $k^+$ (blue dots) and $k^-$ (red dots), for an arbitrarily chosen node, over an ensemble of 100,000 configurations induced by the Signed Random Graph Model with Fixed Topology (SRGM-FT) whose parameters have been tuned to $N = 100$, $p^+ = 0.3$, $p^- = 0.7$ (a) and $N = 100$, $p^+ = 0.8$, $p^- = 0.2$ (b). The fixed topologies have been chosen by sampling a 'traditional' Random Graph Model with $p = 0.8$ (a) and $p = 0.6$ (b). The red, dashed lines represent the binomial distributions $\mathrm{Bin}(k, p^-)$ while the blue, solid lines represent the binomial distributions $\mathrm{Bin}(k, p^+)$.

Fig. 12: Sample vs. analytical z-scores of triadic motifs. Sample vs. analytical z-scores of our motifs, for each null model and each snapshot of the Correlates of Wars dataset. Each ensemble consists of 10,000 realizations.
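
A sample z-score of the kind compared in this figure can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the network is stored as a symmetric matrix with entries in {−1, 0, +1} and a zero diagonal, and the helper names `signed_triangle_counts` and `sample_z_score` are hypothetical.

```python
import numpy as np


def signed_triangle_counts(a):
    """Count signed triangles in a symmetric matrix with entries -1, 0, +1."""
    pos = (a > 0).astype(np.int64)  # indicator of positive links
    neg = (a < 0).astype(np.int64)  # indicator of negative links
    # Each trace enumerates ordered node triples: a triangle with three
    # equal signs is visited 6 times, one with a single odd sign twice.
    return {
        "+++": int(np.trace(pos @ pos @ pos)) // 6,
        "++-": int(np.trace(pos @ pos @ neg)) // 2,
        "+--": int(np.trace(pos @ neg @ neg)) // 2,
        "---": int(np.trace(neg @ neg @ neg)) // 6,
    }


def sample_z_score(observed, ensemble_values):
    """z = (observed - ensemble mean) / ensemble standard deviation."""
    values = np.asarray(ensemble_values, dtype=float)
    return (observed - values.mean()) / values.std()


# Toy network: triangle 0-1-2 is all positive; triangle 0-1-3 has one
# positive link (0-1) and two negative links (0-3, 1-3).
a = np.zeros((4, 4), dtype=int)
for i, j, sign in [(0, 1, 1), (1, 2, 1), (0, 2, 1), (0, 3, -1), (1, 3, -1)]:
    a[i, j] = a[j, i] = sign
print(signed_triangle_counts(a))
```

In practice `ensemble_values` would hold the count of a given motif in each sampled configuration of the chosen null model; the ratio is undefined when the ensemble standard deviation is zero.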

Fig. 14: Alternative representation of the z-scores of triadic motifs. Values of $\langle N_m \rangle \pm 2\sigma[N_m]$, with $N_m$ indicating the abundance of motif $m$, for each null model and a set of networks. Colors refer to the Signed Random Graph Model (SRGM) (•), the Signed Random Graph Model with Fixed Topology (SRGM-FT) (•), the Signed Configuration Model (SCM) (•) and the Signed Configuration Model with Fixed Topology (SCM-FT) (•).

Table IV: Performance of the fixed-point algorithm to solve the systems of equations defining the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT) on a set of real-world networks.

As each of our null models treats links independently, the ensemble it induces can be sampled quite straightforwardly as follows.

Pseudocode for sampling the Signed Random Graph Model [...]

Pseudocode for sampling the Signed Random Graph Model with Fixed Topology [...]

[...] $p^- < u \le (p^- + p^+)$ then [...]

An estimation of the time needed to sample the ensemble induced by each of our models, for each of our datasets, is reported in the Supplementary Tables V, VI and VII.

Algorithm 4: Pseudocode for sampling the Signed Configuration Model with Fixed Topology
1: A ← N × N matrix with 0, 1 entries;
2: for i = 1 . . . N do
3: for j = i + 1 . . . N do
[...]

Table columns (for each model): 1 network, $10^3$ networks. First row: CoW, 1946-49, ≃ 0.002 [...]
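
Since the pseudocode did not survive in full, the following Python sketch illustrates how link independence can be exploited to sample the two homogeneous models. It is a reconstruction under stated assumptions, not the authors' algorithm: one uniform number $u$ is drawn per node pair and compared with $p^-$ and $p^- + p^+$, and in the fixed-topology case $p^+ = 1 - p^-$. The function names `sample_srgm` and `sample_srgm_ft` are hypothetical.

```python
import numpy as np


def sample_srgm(n, p_minus, p_plus, rng):
    """Sample a signed matrix: each node pair is -1, +1 or 0 independently."""
    a = np.zeros((n, n), dtype=int)
    iu = np.triu_indices(n, k=1)  # indices of the node pairs i < j
    u = rng.random(iu[0].size)    # one uniform draw per pair
    # u <= p-: negative link; p- < u <= p- + p+: positive link; else no link.
    a[iu] = np.where(u <= p_minus, -1, np.where(u <= p_minus + p_plus, 1, 0))
    return a + a.T                # symmetrize, diagonal stays zero


def sample_srgm_ft(topology, p_minus, rng):
    """Keep a given 0/1 topology and assign a sign to each existing link.

    Assumption: with the topology fixed, p+ = 1 - p-.
    """
    n = topology.shape[0]
    a = np.zeros((n, n), dtype=int)
    iu = np.triu_indices(n, k=1)
    present = topology[iu] == 1
    u = rng.random(iu[0].size)
    a[iu] = np.where(present, np.where(u <= p_minus, -1, 1), 0)
    return a + a.T


rng = np.random.default_rng(0)
signed = sample_srgm(100, p_minus=0.2, p_plus=0.5, rng=rng)
topology = (signed != 0).astype(int)
resigned = sample_srgm_ft(topology, p_minus=0.7, rng=rng)
print((signed == -1).sum() // 2, (signed == 1).sum() // 2)
```

The heterogeneous models could plausibly be sampled the same way by replacing the scalar probabilities with pair-specific ones obtained from the fitted parameters, which are not reproduced here.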

Table V: Time required by the Signed Random Graph Model (SRGM), the Signed Random Graph Model with Fixed Topology (SRGM-FT), the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT) to sample their ensembles: CoW dataset.

Table VI: Time required by the Signed Random Graph Model (SRGM), the Signed Random Graph Model with Fixed Topology (SRGM-FT), the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT) to sample their ensembles: MMOG dataset.

Table columns (for each model): 1 network, $10^3$ networks.

Table VII: Time required by the Signed Random Graph Model (SRGM), the Signed Random Graph Model with Fixed Topology (SRGM-FT), the Signed Configuration Model (SCM) and the Signed Configuration Model with Fixed Topology (SCM-FT) to sample their ensembles: socio-political, biological and financial networks.

[...] is different from zero, i.e. any two triads covary, if they share an edge. In this case, they form a diamond whose vertices can be labelled as $i \equiv l$, $j \equiv m$, $k$, $n$ and induce the expression

$$\mathrm{Cov}[a_I, a_J] = p^+_{ij} p^+_{jk} p^+_{ki} p^+_{jn} p^+_{ni} - (p^+_{ij})^2 p^+_{jk} p^+_{ki} p^+_{jn} p^+_{ni} = p^+_{ij} (1 - p^+_{ij}) p^+_{jk} p^+_{ki} p^+_{jn} p^+_{ni}.$$
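
The diamond covariance can be checked by brute force. The sketch below assumes, as the expression implies, that the five edge indicators are independent Bernoulli variables with success probabilities $p^+$; the probabilities used are arbitrary illustrative values and the function name `diamond_covariance` is hypothetical.

```python
from itertools import product


def diamond_covariance(p):
    """Exact Cov[a_I, a_J] for triads I = (i, j, k) and J = (i, j, n).

    p maps each of the five diamond edges to the probability that the
    edge is positive; the shared edge is "ij".
    """
    edges = ["ij", "jk", "ki", "jn", "ni"]
    mean_i = mean_j = mean_ij = 0.0
    # Enumerate all 2^5 joint outcomes of the independent edge indicators.
    for outcome in product([0, 1], repeat=len(edges)):
        a = dict(zip(edges, outcome))
        weight = 1.0
        for e in edges:
            weight *= p[e] if a[e] else 1.0 - p[e]
        t_i = a["ij"] * a["jk"] * a["ki"]  # indicator of triad I
        t_j = a["ij"] * a["jn"] * a["ni"]  # indicator of triad J
        mean_i += weight * t_i
        mean_j += weight * t_j
        mean_ij += weight * t_i * t_j
    return mean_ij - mean_i * mean_j


p = {"ij": 0.3, "jk": 0.5, "ki": 0.6, "jn": 0.4, "ni": 0.7}
closed_form = p["ij"] * (1 - p["ij"]) * p["jk"] * p["ki"] * p["jn"] * p["ni"]
print(diamond_covariance(p), closed_form)
```

The two printed values should agree up to floating-point error, because the shared indicator satisfies $(a^+_{ij})^2 = a^+_{ij}$, which is what turns $(p^+_{ij})^2$ into $p^+_{ij}$ in the first term.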
[...] Overall, then, $\sum_{I<J} \mathrm{Cov}[a_I, a_J] = 3!$ [...]

The analysis of covariances requires a more detailed explanation. Let us consider that the generic addendum of $\sum_{I<J} \mathrm{Cov}[a_I, a_J] = \sum_{i<j<k<n} \mathrm{Cov}[a^+ \dots$ [...]

[...] $\sum_I \mathrm{Var}[a_I] + 2 \sum_{I<J} \mathrm{Cov}[a_I, a_J] =$ [...]

For what concerns motif $T^{(++-)}$, its standard deviation reads

$$\sigma[T^{(++-)}] = \sqrt{\sum_I \mathrm{Var}[a_I] + 2 \sum_{I<J} \mathrm{Cov}[a_I, a_J]} \qquad (135)$$

where $\sum_I \mathrm{Var}[a_I] =$ [...]

[...] $\mathrm{Cov}[X, A] + \mathrm{Cov}[X, B] + \mathrm{Cov}[X, C] + \mathrm{Cov}[Y, A] + \mathrm{Cov}[Y, B] + \mathrm{Cov}[Y, C] + \mathrm{Cov}[Z, A] + \mathrm{Cov}[Z, B] + \mathrm{Cov}[Z, C]$. (138)

Let us focus on $\mathrm{Cov}[X, A] = \mathrm{Cov}[a^+_{ij} a^+_{jk} a^-_{ki}, a^+_{lm} a^+_{mn} a^-_{nl}]$ and consider that the aforementioned $3! = 6$ pairs of triads lead to the following events