Polarization and multiscale structural balance in signed networks

Polarization, understood as a division into mutually hostile groups, is a common feature of social systems. It is studied in Structural Balance Theory (SBT) in terms of semicycles in signed networks. However, enumerating semicycles is computationally expensive, so approximations are often needed. Here we introduce the Multiscale Semiwalk Balance (MSB) approach for measuring the degree of balance (DoB) in (un)directed, (un)weighted signed networks by approximating semicycles with closed semiwalks. Motivated by the Locality Principle (LP), which posits that patterns in shorter cycles are more important than those in longer ones, it allows for selecting the resolution of analysis appropriate for assessing DoB. Our approach overcomes several limitations affecting walk-based approximations and provides methods for assessing DoB at various scales, from graphs to individual nodes, and for clustering signed networks. We demonstrate its effectiveness by applying it to real-world social systems, for which it produces explainable results consistent with expectations based on domain-specific knowledge.


arXiv:2303.16774v4 [physics.soc-ph] 24 Oct 2023

Introduction
Networks are used in many branches of science and engineering for modeling complex systems. Depending on the context, they may be undirected (ties are bidirectional) or directed, and weighted (ties have weights, which usually indicate strength) or unweighted [34]. Moreover, some networks are signed, i.e., their links are either positive or negative, so they can be used to model valenced relations such as liking and disliking, or alliances and war [9,12,24,45]. Signed networks are commonly used for representing systems capable of polarization, or clustering into groups with positive in-group and negative out-group ties. As a result, they have long been important to social scientists interested in the polarization and differentiation processes inherent to the formation of groups, attitudes and opinions [7,9,33,40,48]. However, signed networks are also used in other disciplines for modeling diverse phenomena such as brain activation [37], ecological interactions [38], and financial time series [16]. Moreover, it is often not only the signs that matter, but also the weights indicating the intensities of particular relations. Therefore, principled methods for analyzing signed networks, possibly with weights, are important for many applications.
Since signed networks represent valenced relations, a fundamental question concerns the degree to which positive and negative ties are consistent with respect to notions of (anti)transitivity, and whether these microscopic patterns give rise to a polarized macroscopic organization into mutually antagonistic clusters. Both problems are studied in Structural Balance Theory (SBT) [5,19], which originated from Gestalt psychology and the work of Fritz Heider [21], who proposed that positive relations should be transitive (a friend of my friend is my friend) and negative relations antitransitive (an enemy of my enemy is my friend); e.g., two positively (negatively) linked nodes should have identical (opposite) signs on their ties to shared neighbours.
These considerations were later formalized and generalized in graph-theoretic terms and used to demonstrate that the transitivity of positive relations and the antitransitivity of negative relations are directly linked to the properties of cycles and, as a result, to clustering and polarization. Namely, polarized systems clustered into exactly two antagonistic groups, in which in-group ties are exclusively positive and out-group ties negative, require that all cycles are positive, i.e., that the products of the signs of their edges are positive [5] (strong balance property; see Fig. 1 for a visual explanation and some examples). Systems clustered into b ≥ 2 antagonistic blocks require that there are no cycles with exactly one negative edge (weak balance property) [6]. See Methods, Sec. 4.1, for an overview of the main definitions and theorems of SBT, including their general form applicable to directed networks based on the notion of semicycles.
SBT specifies strict requirements for signed networks to be balanced (partitioned into antagonistic groups), but real-world systems are rarely organized neatly enough to satisfy them completely. This is why a lot of work in SBT is concerned with measures of the degree of balance (DoB), or partial balance [3], which can be seen as indicators of a "distance" from the perfectly balanced state. Such measures are typically directly or indirectly related to the relative frequencies of positive and negative cycles (or of cycles with exactly one negative edge in the case of weak balance).
However, measuring structural balance in practice is not trivial. While defining DoB at the level of cycles of a particular length k is simple, since in this case the raw proportion of balanced cycles is meaningful, any global DoB measure has to integrate information across cycles of many different lengths, and it is not immediately clear how this should be done. The difficulty comes from the fact that longer cycles are typically much more numerous than shorter ones, so a simple proportion will be determined primarily by patterns found in long cycles, which may not be a desirable property. Indeed, Cartwright and Harary already hypothesized that shorter cycles should matter more when evaluating DoB [5]. Moreover, this intuition was later justified empirically by demonstrating that it is easier for people to memorize the valences of ties in shorter cycles [49]. More recently, analyses based on counting simple cycles demonstrated that real networks often have a relatively low cycle-length threshold beyond which DoB measures quickly decrease, indicating that structural balance is found primarily in structures at smaller scales [18].
Applying SBT in practice is further complicated by the fact that enumerating and counting cycles is computationally expensive, especially for large graphs. This problem can be partially alleviated with novel algorithms and sampling methods, but exact solutions will always remain prohibitively expensive due to the nature of the problem. Moreover, the current state-of-the-art sampling methods [18] are limited to "grayscale" measures, which quantify DoB for cycles of particular lengths, and they do not offer any principled way of aggregating them into a single global DoB index. This is an important limitation, since it is typically easier and more meaningful to compare a scalar DoB value between different networks. Moreover, global measures, being scalar values, are probably more useful for designing clustering or community detection methods.

Fig. 1. Schematic explanation of the meaning of high and low polarization and its general connection to the frequencies of (strongly) balanced and unbalanced cycles. (Top) Two networks of bill co-sponsorship in the U.S. Senate. Positive ties (blue) link senators who tended to promote the same bills together more often than by chance, and negative ties (red) correspond to those who collaborated less often than at random (see Sec. 2.5 for the data description and a detailed analysis). (Bottom) All possible (strongly) balanced and unbalanced triads (3-cycles) and the general relationship between counts of (un)balanced cycles and polarization (note that, even though only 3-cycles are depicted, the general relationship pertains to cycles of all lengths). Low polarization is characterized by comparable frequencies, or more rarely by a majority of unbalanced cycles. High polarization implies that there is a clear majority of balanced cycles. (A) The 96th Congress (1979-81, Carter administration) was a period of low polarization with frequent positive between-party and negative within-party links. This translated to comparable frequencies of balanced (positive product of edge signs) and unbalanced (negative product of edge signs) cycles. (B) The 114th Congress (2015-17, Obama administration) featured high polarization, with in-group ties being almost exclusively positive and out-group ties negative. This induced a distribution skewed towards balanced (positive) cycles.
Thus, several approximations have been proposed, which can roughly be divided into two families: local and global measures. Local measures attain efficiency by focusing only on cycles of particular, usually short, lengths, such as 3-cycles (triads). They can be fast, but they provide only a limited description of the actual structure of a network. Thus, we argue that global measures are preferable.
Several global approaches have been proposed. Some bypass the problem of counting cycles entirely and instead search for partitions minimizing frustration [15] (the number or relative weight of edges incompatible with the SBT assumptions), but they suffer from similar computational constraints due to their combinatorial nature. Others leverage spectral properties of signed graphs and are therefore computationally efficient, but they measure only strong balance and quantify DoB using the smallest eigenvalue of the signed Laplacian matrix [26], which is not normalized and can be difficult to compare between networks. The last major approach is based on approximating cycle counts with counts of closed walks, which can be calculated, or at least approximated, very efficiently with standard linear algebra [12,41]. Moreover, it can produce both local and global measures [8,12] as well as capture both strong and weak balance properties [24]. However, walk-based approximations can be misleading, as they may combine patterns found at very different cycle lengths [18].
On the other hand, one can argue, based on the theory of dynamical consensus on signed networks, that closed walks provide a fuller picture of structural balance [11].
Here we propose Multiscale Semiwalk Balance (MSB): an approach applicable to (un)directed, (un)weighted signed networks. It is multiscale, as it provides both grayscale measures approximating DoB at particular cycle lengths and global indicators aggregating local measures across multiple scales in a principled manner. Namely, it enforces what we call the Locality Principle (LP) and ensures that global DoB estimates are weighted averages of estimates at specific lengths, such that the DoB for a shorter cycle length is never assigned a lower weight than the DoB for a longer one.
Our work builds on the Walk Balance (WB) approach proposed by Estrada and Benzi [12], which tends to underestimate DoB, especially in large networks [18,41]. We show that this is caused by too much weight being placed on long cycles and can be fixed by introducing a formal resolution parameter. Namely, we demonstrate how the inverse temperature, β, already briefly considered in Ref. [12], can be reinterpreted and used to determine an appropriate weighting scheme for aggregating DoB measures across different cycle lengths that satisfies LP. It also makes our MSB approach applicable and meaningful in the context of weighted signed networks. Additionally, we generalize the WB approach to capture both strong and weak balance, and we define DoB measures not only at the level of entire graphs but also for particular nodes and pairs of nodes, which enables the development of effective SBT-aware clustering (community detection) methods. Last but not least, by using semiwalk-based approximations, our methods are more directly linked to both the undirected and directed SBT theorems and are therefore meaningful also for directed signed networks.
We demonstrate the utility of our approach in two case studies of polarization in social systems. The first is a re-analysis of the famous Sampson's Monks dataset [39], in which we show that the commonly accepted "ground truth" partition is not SBT-optimal by finding better partitions, which also shed some additional light on the underlying social dynamics. In the second study, we use our methods to provide evidence for increasing polarization in the U.S. Congress based on bill co-sponsorship data [33].

Notation
Here we consider weighted graphs G = (V, E, ω) with n = |V| vertices and m = |E| edges and no self-loops or multilinks, where V and E ⊆ V × V are the vertex and edge sets, respectively, and ω : E → ℝ is a function assigning weights to edges. The weights can be negative, so the above definition captures all (un)signed, (un)weighted and (un)directed graphs.
The adjacency matrix of a graph G is given by a square n × n matrix A(G) such that $A_{ij} = \omega_{ij} = \omega(i, j)$ if $(i, j) \in E$ and $A_{ij} = 0$ otherwise. Whenever possible without introducing ambiguity, we will drop the explicit dependence on G and prefer a simpler notation, A. We will use |A| to denote the unsigned counterpart of A such that $|A|_{ij} = |\omega_{ij}|$. Additionally, P and N will denote non-negative n × n matrices corresponding to the positive and negative parts of A such that A = P − N and |A| = P + N. When discussing network partitions, we will use B to denote the n × b block partition matrix such that $B_{iu} = 1$ when the ith node belongs to the uth block (group) and $B_{iu} = 0$ otherwise. The matrix trace operator will be denoted by tr. In particular, the trace of the kth power of a square matrix X will be denoted by $\operatorname{tr} X^k$. The Hadamard (elementwise) matrix product will be denoted by ⊙.
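As a minimal illustration of this notation, the sketch below (assuming NumPy; `signed_parts` and `block_matrix` are hypothetical helper names, not part of the paper) builds the decomposition A = P − N, |A| = P + N and a block partition matrix B from a list of block labels.

```python
import numpy as np

def signed_parts(A):
    """Return (P, N): non-negative positive and negative parts with A = P - N."""
    A = np.asarray(A, dtype=float)
    P = np.clip(A, 0.0, None)   # keep positive weights, zero elsewhere
    N = np.clip(-A, 0.0, None)  # magnitudes of negative weights, zero elsewhere
    return P, N

def block_matrix(labels):
    """Return the n x b partition matrix B with B[i, u] = 1 iff node i is in block u."""
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    return (labels[:, None] == blocks[None, :]).astype(float)

A = np.array([[0, 2, -1],
              [2, 0, 0],
              [-1, 0, 0]])
P, N = signed_parts(A)
B = block_matrix([0, 0, 1])   # nodes 0 and 1 in one block, node 2 in another
```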
All measures that we define here depend on a particular graph G. Thus, to avoid cluttering the notation, we will omit this general dependence whenever possible.

Preliminaries
Before introducing the proposed framework, for the sake of clarity we first state more formally the core problems our work aims to solve.

Aggregating DoB measures
The difficulty with defining a meaningful global Degree of Balance (DoB) can be easily seen by first considering DoB measures for cycles of particular lengths. For a signed graph G we define k-balance (DoB for cycles of length k) as:
$$B_k = \frac{\mu^+(k)}{\mu^+(k) + \mu^-(k)} \tag{1}$$
where $\mu^+(k)$ and $\mu^-(k)$ are the counts of balanced and unbalanced cycles of length k, respectively. This measure is easy to interpret, since it is concerned with only one specific class of cycles (those of length k), so in this context it is justified to treat every cycle equally. However, defining a global DoB measure integrating structural balance information across different cycle lengths is more difficult, since there are infinitely many ways to do it. A reasonable solution is to assume that global DoB should be a weighted average of k-balance scores:
$$B = \sum_k \omega_k B_k \tag{2}$$
where the $\omega_k$'s are normalized weights ($\omega_k \ge 0$ and $\sum_k \omega_k = 1$) assigned to balance scores at different lengths k. However, it is not clear how the weights should be chosen in order to produce a meaningful global DoB measure. Importantly, the above generic definitions are appropriate for both the strong and weak notions of balance. In what follows, we derive particular operationalizations of these generic formulas.
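To make k-balance and the weighted-average global DoB concrete, here is a minimal Python sketch. The cycle counts and weights below are made-up illustrative numbers, not data from the paper; they only show how a raw pooled proportion is dominated by the more numerous longer cycles, whereas a weighted average lets the analyst decide how much each length counts.

```python
def k_balance(mu_pos, mu_neg):
    """Proportion of balanced cycles among all cycles of one length k."""
    total = mu_pos + mu_neg
    if total == 0:
        raise ValueError("no cycles of this length")
    return mu_pos / total

def global_dob(k_balances, weights):
    """Weighted average of k-balance scores with non-negative weights summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * b for w, b in zip(weights, k_balances))

# Hypothetical counts: 3-cycles are mostly balanced, 4-cycles are a coin flip.
counts = {3: (90, 10), 4: (300, 300)}
b = [k_balance(*counts[k]) for k in (3, 4)]          # [0.9, 0.5]
pooled = sum(p for p, _ in counts.values()) / sum(p + q for p, q in counts.values())
weighted = global_dob(b, [0.7, 0.3])                  # favors shorter cycles
print(round(pooled, 3), round(weighted, 3))           # 0.557 0.78
```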

Finding clusters in signed networks
While it is useful to know the DoB of a network, which indicates how close it is to being perfectly balanced and therefore clusterable, it is arguably even more useful to be able to find clusters (network communities) that agree with SBT to the greatest extent possible. The compatibility of a given partition of a signed network with the structure theorems of SBT (see Methods, Sec. 4.1, for details) can be measured with the frustration ratio, which can be defined, following Ref. [9], as the sum of absolute weights of negative in-group and positive out-group ties relative to the sum of all absolute edge weights. In matrix form it can be expressed succinctly as:
$$F(B) = \frac{\mathbf{1}^\top \left[ N \odot BB^\top + P \odot \left( \mathbf{1}\mathbf{1}^\top - BB^\top \right) \right] \mathbf{1}}{\mathbf{1}^\top |A| \, \mathbf{1}} \tag{3}$$
where 1 is a vector of ones of an appropriate length, B ∈ ℝ^{n×b} is a block-partition matrix, and P and N are the positive and negative parts of the adjacency matrix A. Note that the frustration ratio can also be seen as a normalized version of the frustration count, which is used to define the frustration index as the minimal frustration count over all partitions of a network [4].
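A minimal NumPy sketch of the frustration ratio, assuming a dense adjacency matrix and a partition given as a list of block labels (the function name `frustration_ratio` is ours). For an undirected network each edge is counted twice in both the numerator and the denominator, which leaves the ratio unchanged.

```python
import numpy as np

def frustration_ratio(A, labels):
    """Share of absolute edge weight on negative in-group and positive out-group ties."""
    A = np.asarray(A, dtype=float)
    P, N = np.clip(A, 0.0, None), np.clip(-A, 0.0, None)
    labels = np.asarray(labels)
    B = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    same = B @ B.T                 # same[i, j] = 1 iff i and j share a block
    ones = np.ones_like(same)      # the all-ones matrix 1 1^T
    frustrated = (N * same).sum() + (P * (ones - same)).sum()
    return frustrated / np.abs(A).sum()

# Perfectly balanced signed K4: {0, 1} and {2, 3} are friendly inside, hostile across.
A = np.array([[ 0,  1, -1, -1],
              [ 1,  0, -1, -1],
              [-1, -1,  0,  1],
              [-1, -1,  1,  0]])
print(frustration_ratio(A, [0, 0, 1, 1]))   # 0.0
print(frustration_ratio(A, [0, 1, 0, 1]))   # 4 of 6 edges are frustrated
```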
Frustration ratio is a very straightforward measure of the extent to which a given partition produces a balanced network configuration.It ranges from 0 for balanced partitions to 1 for maximally unbalanced ones (Fig. 2).
Fig. 2. (A) Balanced partition. (B) Maximally unbalanced partition.

It is important to note that the frustration ratio, while closely related to DoB, measures something different. DoB is a property of a network as such, which, thanks to the structure theorems of SBT, is informative of the extent to which a given network is clusterable. On the other hand, the frustration ratio is a property of a network and a specific partition, and is directly related to how close a given partition is to being perfectly balanced. That is why we argue that it is an appropriate measure of the quality of a partition vis-à-vis the tenets of SBT. Thus, DoB and frustration ratio are closely related but not equivalent, as already observed in Ref. [11]. However, the crux is that in the limiting case of perfect balance, DoB equal to 1 implies that there is a partition with zero frustration and vice versa. The farther a network is from this ideal case, the fuzzier this relationship gets, but in general the two measures will always be related. We will use this insight to develop a clustering method utilizing DoB-like scores.

Approximating (semi)cycles with closed (semi)walks
Counting cycles is computationally very expensive, so in practice approximations are necessary. A very general and flexible approach is based on approximating cycles with closed walks, which can be counted much more efficiently using powers of the adjacency matrix. However, SBT in its most general form, applicable to both directed and undirected networks, is formulated in terms of closed semipaths, or semicycles [5]. A semipath is a path in which edge directions can be ignored, but any edge can still be traversed only once. This property has an important consequence for directed networks, in which semicycles in general correspond to cycles in the associated undirected multigraph (obtained by making every link bidirectional), with the exception of 2-cycles, which require both i → j and j → i links to be present (Fig. 3).
Thus, we argue that semicycle counts should be approximated using semiwalks, which are simply walks on the corresponding undirected multigraph (i.e., ignoring edge directions) [48]. However, an additional correction factor is needed to account for the fact that non-reciprocated directed edges do not generate any 2-semicycles.

Multiscale Semiwalk Balance
Here we introduce the Multiscale Semiwalk Balance (MSB) approach, which provides solutions to all of the above-mentioned problems. We first develop it without considering the role of edge weights, which, as we discuss later, appear in our approach naturally even in the context of unweighted networks. Once the core framework is established, we show that it automatically extends to weighted graphs in a meaningful way. Moreover, here we focus on the undirected version of MSB and strong balance. In Sec. 2.3 we generalize our approach to directed signed graphs and in Methods, Sec. 4.2, to the weak notion of structural balance.
In what follows we will use the fact that, for a graph G, walks of length k between nodes i and j are counted by the elements of the kth power of its (unsigned) adjacency matrix, $|A|^k$ (in the weighted case $|A|^k$ gives weighted counts, such that each walk is assigned a weight equal to the product of the weights of its constituent edges). Importantly, such matrix powers can be calculated and approximated easily using eigendecomposition, especially for symmetric matrices, which are the only ones we use here.
We will be particularly interested in weighted sums of matrix powers of the following form:
$$W(A, \beta; k_{\min}, k_{\max}) = \sum_{k=k_{\min}}^{k_{\max}} \frac{\beta^k A^k}{k!} \approx e^{\beta A} - \sum_{k=0}^{k_{\min}-1} \frac{\beta^k A^k}{k!} \tag{4}$$
where k iterates over a sequence of consecutive non-negative integers, $k_{\min}, \ldots, k_{\max}$, and the approximate equality is exact when $k_{\max} \to \infty$. In what follows we will use a simpler notation, W(A, β), whenever $k_{\min}$ and $k_{\max}$ are clear from the context or unimportant. Moreover, any function depending on W(…) is also implicitly parametrized by $k_{\min}$ and $k_{\max}$, but we will omit this in the notation for the sake of brevity. Note that here β is a free parameter that can be used to control the weights assigned to different powers of A. We will use this fact later. Moreover, both W(A, β) and its trace can be approximated accurately and efficiently based on the m leading eigenvalues of A (see Methods, Sec. 4.5).
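The following NumPy sketch evaluates the sum of Eq. (4) in three ways for a small symmetric matrix: the truncated power series, the same series applied to the eigenvalues, and the matrix-exponential limit. It uses a full eigendecomposition and does not reproduce the leading-eigenvalue approximation of Methods, Sec. 4.5; all function names are illustrative.

```python
import math
import numpy as np

def walk_sum_series(A, beta, k_min=3, k_max=30):
    """Truncated series W(A, beta) = sum_{k=k_min}^{k_max} beta^k A^k / k!."""
    A = np.asarray(A, dtype=float)
    W = np.zeros_like(A)
    Ak = np.linalg.matrix_power(A, k_min)
    for k in range(k_min, k_max + 1):
        W += beta**k / math.factorial(k) * Ak
        Ak = Ak @ A
    return W

def walk_sum_eig(A, beta, k_min=3, k_max=30):
    """Same sum via A = U diag(lam) U^T: apply the scalar series to each eigenvalue."""
    lam, U = np.linalg.eigh(np.asarray(A, dtype=float))
    f = sum(beta**k / math.factorial(k) * lam**k for k in range(k_min, k_max + 1))
    return (U * f) @ U.T

def walk_sum_limit(A, beta, k_min=3):
    """k_max -> infinity: exp(beta * A) minus the first k_min terms of its series."""
    lam, U = np.linalg.eigh(np.asarray(A, dtype=float))
    f = np.exp(beta * lam) - sum((beta * lam)**k / math.factorial(k) for k in range(k_min))
    return (U * f) @ U.T

A = np.array([[0, 1, -1, -1], [1, 0, -1, -1], [-1, -1, 0, 1], [-1, -1, 1, 0]])
W1, W2, W3 = walk_sum_series(A, 0.5), walk_sum_eig(A, 0.5), walk_sum_limit(A, 0.5)
```

With β = 0.5 and a largest eigenvalue of 3, the terms beyond k = 30 are negligible, so all three results coincide up to rounding.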

Strong balance
Following Estrada and Benzi [12], we note that powers of the signed adjacency matrix, $A^k$, give differences between counts of positive and negative walks of a given length, while powers of the unsigned adjacency matrix, $|A|^k$, count all walks of a given length. Thus, the sum of differences between weighted counts of positive and negative walks of lengths $k = k_{\min}, \ldots, k_{\max}$ is given by W(A, β).
Similarly, W(|A|, β) gives the corresponding sum of weighted counts of all walks.
In the case of the undirected networks considered here, $k_{\min} = 3$, since 2-cycles in undirected signed networks are always trivially balanced. On the other hand, it should be that $k_{\max} \le n$, since no cycle can be longer than the number of nodes in a network, but the proper exact choice of $k_{\max}$ is not obvious. However, any moderately large value will do, since the higher-order terms in Eq. (4) are quickly suppressed by the inverse factorial factor. In Supplementary Information (SI), Sec. S3, we show that $k_{\max} \ge 10$ is typically enough to get practically error-free results. However, to stay on the safe side, we always use $k_{\max} = 30$ in all of the following analyses.
Counts of closed walks are given by the diagonal elements, so the overall counts are given by the corresponding matrix traces. Thus, to measure structural balance in a signed network one can use the Balance Index [12], i.e., the ratio of the difference between weighted counts of balanced ($\mu^+$) and unbalanced ($\mu^-$) closed walks to the weighted count of all closed walks:
$$R(\beta) = \frac{\mu^+ - \mu^-}{\mu^+ + \mu^-} = \frac{\operatorname{tr} W(A, \beta)}{\operatorname{tr} W(|A|, \beta)} \tag{5}$$
A conceptually simpler measure is the Degree of Balance (DoB), proposed already by Cartwright and Harary [5], which represents the proportion of balanced walks:
$$B(\beta) = \frac{\mu^+}{\mu^+ + \mu^-} = \frac{1 + R(\beta)}{2} \tag{6}$$
Following [12] again, we can define node-level measures, also known as local balance [8], simply by using diagonal elements instead of traces:
$$r_i(\beta) = \frac{W(A, \beta)_{ii}}{W(|A|, \beta)_{ii}} \tag{7}$$
$$b_i(\beta) = \frac{1 + r_i(\beta)}{2} \tag{8}$$
Note that we use lowercase letters to denote quantities describing individual nodes rather than global properties of entire graphs. We will follow this convention also when defining other node-level measures.
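A minimal sketch of the Balance Index, DoB and their node-level counterparts, assuming an undirected (symmetric) signed adjacency matrix and the truncated series of Eq. (4) with k_min = 3 and k_max = 30. The names `walk_sum` and `balance_measures` are illustrative, and β is simply passed in here rather than tuned.

```python
import math
import numpy as np

def walk_sum(A, beta, k_min=3, k_max=30):
    """Truncated series sum_{k=k_min}^{k_max} beta^k A^k / k!."""
    A = np.asarray(A, dtype=float)
    W = np.zeros_like(A)
    Ak = np.linalg.matrix_power(A, k_min)
    for k in range(k_min, k_max + 1):
        W += beta**k / math.factorial(k) * Ak
        Ak = Ak @ A
    return W

def balance_measures(A, beta, k_min=3, k_max=30):
    """Global Balance Index R and DoB B, plus node-level r_i and b_i."""
    A = np.asarray(A, dtype=float)
    W_signed = walk_sum(A, beta, k_min, k_max)        # positive minus negative walks
    W_abs = walk_sum(np.abs(A), beta, k_min, k_max)   # all walks
    R = np.trace(W_signed) / np.trace(W_abs)
    r = np.diag(W_signed) / np.diag(W_abs)
    return R, (1 + R) / 2, r, (1 + r) / 2

balanced = np.array([[0, 1, -1, -1], [1, 0, -1, -1], [-1, -1, 0, 1], [-1, -1, 1, 0]])
unbalanced = np.array([[0, 1, 1], [1, 0, -1], [1, -1, 0]])  # triangle, one negative edge
R_bal, B_bal, r_bal, b_bal = balance_measures(balanced, beta=1.0)
R_unb, B_unb, r_unb, b_unb = balance_measures(unbalanced, beta=1.0)
```

In the perfectly balanced network every closed walk is positive, so both the global and node-level DoB equal 1; the unbalanced triangle lands strictly between 0 and 1 because backtracking walks are always positive.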
Measures of k-balance (DoB at a particular length k) can also be easily defined:
$$R_k = \frac{\operatorname{tr} A^k}{\operatorname{tr} |A|^k} \tag{9}$$
$$B_k = \frac{1 + R_k}{2} \tag{10}$$
Note that these measures do not depend on β, since, even if they did, the same weighting factor would appear in both the numerator and the denominator and cancel out. This shows that β indeed controls only the amount of weight put on different cycle lengths, but does not influence the degree of balance at particular lengths.
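A sketch of the walk-based k-balance measures (NumPy assumed; `k_balance` is an illustrative name). On a triangle with one negative edge, the closed 3-walks are exactly the two traversal directions of the single negative cycle, so the 3-balance is 0. By contrast, every closed 4-walk on a triangle backtracks and is trivially positive, so the 4-balance is 1, a reminder that walks only approximate cycles.

```python
import numpy as np

def k_balance(A, k):
    """Walk-based (R_k, B_k) from the traces of A^k and |A|^k."""
    A = np.asarray(A, dtype=float)
    R_k = np.trace(np.linalg.matrix_power(A, k)) / np.trace(np.linalg.matrix_power(np.abs(A), k))
    return R_k, (1 + R_k) / 2

unbalanced = np.array([[0, 1, 1], [1, 0, -1], [1, -1, 0]])  # triangle, one negative edge
r3, b3 = k_balance(unbalanced, 3)   # -1.0, 0.0  (tr A^3 = -6, tr |A|^3 = 6)
r4, b4 = k_balance(unbalanced, 4)   #  1.0, 1.0  (only backtracking walks)
```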

Contribution profiles and Locality Principle
Importantly, one can assess the contribution of closed walks of length k to the total weighted sum of closed walk counts for lengths $k_{\min}, \ldots, k_{\max}$:
$$c_k(\beta) = \frac{\beta^k \operatorname{tr} |A|^k / k!}{\operatorname{tr} W(|A|, \beta)} \tag{11}$$
In other words, Eq. (11) measures the ratio of the weighted sum of closed walks of length k to the total weighted sum of closed walks over a specified range of lengths. It is normalized by construction, so $\sum_{k=k_{\min}}^{k_{\max}} c_k(\beta) = 1$. The contribution score clearly depends on β, which can be used to control the influence of different length scales on the overall calculations. This is a crucial feature of our approach, as it allows for a straightforward operationalization of the Locality Principle (LP): shorter cycles should generally matter no less than longer ones.
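A sketch of the contribution profile of Eq. (11), assuming NumPy (`contribution_profile` is an illustrative name). Raising β shifts the weight towards longer closed walks, which is exactly what the Locality Principle is meant to keep in check.

```python
import math
import numpy as np

def contribution_profile(A, beta, k_min=3, k_max=30):
    """c_k(beta) for k = k_min..k_max: share of beta-weighted closed walks of length k."""
    absA = np.abs(np.asarray(A, dtype=float))
    terms, Ak = [], np.linalg.matrix_power(absA, k_min)
    for k in range(k_min, k_max + 1):
        terms.append(beta**k / math.factorial(k) * np.trace(Ak))
        Ak = Ak @ absA
    terms = np.array(terms)
    return terms / terms.sum()    # normalized: the scores sum to 1

K3 = np.ones((3, 3)) - np.eye(3)  # unsigned triangle
c_small = contribution_profile(K3, beta=0.5)  # decreases from k = 3 onwards
c_large = contribution_profile(K3, beta=5.0)  # peaks at a longer length
```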

Definition 1 (Locality Principle).
A graph G, a resolution parameter β > 0 and a sequence of consecutive integers $2 \le k_{\min}, \ldots, k_{\max}$ satisfy the Locality Principle if and only if the following set of inequalities holds:
$$c_k(\beta) \ge c_{k+1}(\beta) \quad \text{for all } k = k_{\min}, \ldots, k_{\max} - 1$$
Thus, LP allows for identification of a range of "reasonable" values of β, given by the set $(0, \beta_{\max}]$, where $\beta_{\max} > 0$ is the largest value still satisfying LP. Crucially, $\beta_{\max}$ always exists for graphs that contain at least one closed walk of each length $k_{\min}, \ldots, k_{\max}$.

Theorem 1. Let $2 \le k = k_{\min}, \ldots, k_{\max}$ be a sequence of consecutive integers and G a graph such that $\operatorname{tr} |A|^k > 0$ for all k's. Then, there exists a value $\beta_{\max}$ such that Def. 1 holds for values $0 < \beta \le \beta_{\max}$ and does not hold for values $\beta > \beta_{\max}$.

Proof. Using Eq. (11), the condition for LP can be rewritten as:
$$\frac{\beta^k \operatorname{tr} |A|^k}{k!} \ge \frac{\beta^{k+1} \operatorname{tr} |A|^{k+1}}{(k+1)!}$$
which after some straightforward algebra gives the following condition for β:
$$\beta \le \frac{(k+1) \operatorname{tr} |A|^k}{\operatorname{tr} |A|^{k+1}}$$
Now we note that the right-hand side of the above inequality is always positive, so there is a maximal value $\beta_{\max} > 0$ satisfying all the inequalities:
$$\beta_{\max} = \min_{k_{\min} \le k < k_{\max}} \frac{(k+1) \operatorname{tr} |A|^k}{\operatorname{tr} |A|^{k+1}}$$
As a result, a β value satisfies LP if and only if $\beta \in (0, \beta_{\max}]$, which ends the proof. □

Finally, following the parsimony principle, we choose the weakest LP assumption possible and set $\beta := \beta_{\max}$. This is a simple heuristic and we make no claims regarding its optimality. We chose to use it here because developing a more principled method for selecting β is beyond the scope of this paper; we plan to address this problem in the future. However, as we later show through empirical analyses of real-world datasets, this heuristic seems to work very well in practice. Moreover, using $\beta_{\max}$ still yields markedly right-skewed contribution profiles. It can be argued that for this choice LP "barely" holds, but this is true only in the sense of the entire set of inequalities for all pairs of lengths (k, k + 1), and does not imply that the contribution scores assigned to short closed walks are only marginally higher than those assigned to long walks (cf. Fig. 4).
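The closed-form bound from the proof of Theorem 1 translates directly into code. This NumPy sketch (illustrative names; dense matrix powers, so only suitable for small graphs) computes β_max and checks that the contribution profile is non-increasing at β_max but not slightly above it. For the unsigned triangle, the binding pair is (k, k + 1) = (3, 4), giving 4 · 6 / 18 = 4/3.

```python
import math
import numpy as np

def closed_walk_traces(A, k_min, k_max):
    """tr|A|^k for k = k_min..k_max."""
    absA = np.abs(np.asarray(A, dtype=float))
    return np.array([np.trace(np.linalg.matrix_power(absA, k)) for k in range(k_min, k_max + 1)])

def beta_max(A, k_min=3, k_max=30):
    """Largest beta satisfying LP: min over k of (k + 1) tr|A|^k / tr|A|^(k+1)."""
    t = closed_walk_traces(A, k_min, k_max)
    if np.any(t <= 0):
        raise ValueError("Theorem 1 assumes tr|A|^k > 0 for every k in the range")
    ks = np.arange(k_min, k_max)
    return np.min((ks + 1) * t[:-1] / t[1:])

def contribution_profile(A, beta, k_min=3, k_max=30):
    """Contribution scores c_k(beta) computed from the same traces."""
    t = closed_walk_traces(A, k_min, k_max)
    w = np.array([beta**k / math.factorial(k) for k in range(k_min, k_max + 1)])
    return w * t / (w @ t)

K3 = np.ones((3, 3)) - np.eye(3)   # unsigned triangle
b_max = beta_max(K3)
c_at = contribution_profile(K3, b_max)
c_above = contribution_profile(K3, 1.01 * b_max)
```

Selecting β := β_max as described above then amounts to one call of `beta_max` per network; the explicit check on positive traces mirrors the assumption of Theorem 1 (it fails, e.g., for bipartite graphs, whose odd-length closed-walk counts are zero).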
Our results also explain why the original WB approach [12] underestimates DoB in large networks. Namely, without determining the characteristic scale of a network by tuning β, the contribution profile may peak at very long cycles. As Fig. 4 shows, WB places most of the weight on very long cycles (k ≈ 100) in large networks, which clearly violates LP. As a result, it produces much lower DoB estimates than MSB, since products of signs over very long closed walks are arguably mostly random. Only for the directed Epinions network does WB produce an estimate close to the one given by MSB. However, as balance measures at particular cycle lengths show, this happens only because of the very particular structure of the network, which results in high DoB at cycle lengths of approximately 100. Moreover, this seems to be a statistical artifact that disappears almost completely when balance is assessed based on semiwalks (MSB) instead of ordinary walks (WB) (see Sec. 2.3 for the generalization to directed measures based on semiwalks). Crucially, this problem is likely to affect any other walk-based method that does not use a well-tuned resolution parameter. Moreover, without a measure akin to Eq. (11), it is hard to know for sure whether a given method will produce correct results for a given network.
Importantly, global DoB is a weighted average of k-balance values, with weights equal to the corresponding contribution scores. Thus, Eq. (6) satisfies the requirement postulated in Sec. 2.1.1.

Theorem 2. Let G be a signed graph, β > 0 a resolution parameter and $2 \le k = k_{\min}, \ldots, k_{\max}$ a sequence of consecutive integers. Then:
$$B(\beta) = \sum_{k=k_{\min}}^{k_{\max}} c_k(\beta) \, B_k$$
Proof. It is given in the SI, Sec. S1.
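Theorem 2 can be checked numerically. The sketch below (NumPy assumed, names illustrative) computes the DoB of Eq. (6) once directly from β-weighted traces and once as the contribution-weighted average of walk-based k-balance values; the two agree up to floating-point error.

```python
import math
import numpy as np

def dob_two_ways(A, beta, k_min=3, k_max=30):
    """Return (direct DoB, contribution-weighted average of k-balance values)."""
    A = np.asarray(A, dtype=float)
    ks = range(k_min, k_max + 1)
    w = np.array([beta**k / math.factorial(k) for k in ks])
    t_signed = np.array([np.trace(np.linalg.matrix_power(A, k)) for k in ks])
    t_abs = np.array([np.trace(np.linalg.matrix_power(np.abs(A), k)) for k in ks])
    R = (w @ t_signed) / (w @ t_abs)      # Balance Index from weighted traces
    direct = (1 + R) / 2                  # DoB
    c = w * t_abs / (w @ t_abs)           # contribution scores c_k
    B_k = (1 + t_signed / t_abs) / 2      # walk-based k-balance values
    return direct, c @ B_k

unbalanced = np.array([[0, 1, 1], [1, 0, -1], [1, -1, 0]])  # triangle, one negative edge
direct, averaged = dob_two_ways(unbalanced, beta=1.0)
```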

Node contributions
Starting from similar ideas, one can also define node-level, or local, contribution scores measuring the influence of a node i on the overall DoB calculations:
$$c_i(\beta) = \frac{W(|A|, \beta)_{ii}}{\operatorname{tr} W(|A|, \beta)}$$
Note that by construction $c_i(\beta) \in [0, 1]$ and $\sum_i c_i(\beta) = 1$, so node-level scores enjoy the same normalization property as the global contribution score. Importantly, node-level contribution scores, together with local DoB, can be useful for defining and measuring various notions of node centrality in signed networks.
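A sketch of the node-level contribution scores (NumPy assumed, names illustrative), using a triangle with one pendant node attached to node 0. The best-connected node receives the largest share, and the pendant node, which lies on no triangle, the smallest.

```python
import math
import numpy as np

def walk_sum(A, beta, k_min=3, k_max=30):
    """Truncated series sum_{k=k_min}^{k_max} beta^k A^k / k!."""
    A = np.asarray(A, dtype=float)
    W = np.zeros_like(A)
    Ak = np.linalg.matrix_power(A, k_min)
    for k in range(k_min, k_max + 1):
        W += beta**k / math.factorial(k) * Ak
        Ak = Ak @ A
    return W

def node_contributions(A, beta, k_min=3, k_max=30):
    """c_i(beta): share of beta-weighted closed walks starting and ending at node i."""
    d = np.diag(walk_sum(np.abs(np.asarray(A, dtype=float)), beta, k_min, k_max))
    return d / d.sum()

# Triangle 0-1-2 with a pendant node 3 attached to node 0 (signs do not matter here).
A = np.array([[0, 1, -1, 1], [1, 0, 1, 0], [-1, 1, 0, 0], [1, 0, 0, 0]])
c = node_contributions(A, beta=1.0)
```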

Pairwise cohesion and clustering
Note that the off-diagonal elements of W(A, β) also convey important information. Namely, they measure the difference between weighted counts of positive and negative walks between nodes i and j. We use this fact to define the pairwise Cohesion Index:
$$r_{ij}(\beta) = \frac{W(A, \beta; 2, k_{\max})_{ij}}{W(|A|, \beta; 2, k_{\max})_{ij}}$$
and a corresponding (pairwise) Degree of Cohesion (DoC) measuring the fraction of positive walks between nodes i and j:
$$b_{ij}(\beta) = \frac{1 + r_{ij}(\beta)}{2}$$
Note that the cohesion index uses $k_{\min} = 2$. This facilitates differentiating between frustrated and non-frustrated edges. If there are many positive walks between i and j but the i ∼ j edge is negative, then the (i, j) pair generates many unbalanced closed walks, and therefore the i ∼ j edge should be considered a frustrated in-group tie rather than an out-group tie; an analogous argument can be made for negative walks. Thus, direct links by themselves do not provide the evidence necessary for partitioning nodes and therefore should not be used for determining pairwise cohesion. For the sake of consistency, we use the same letters, r and b, that we used to denote (local) balance measures, as balance and cohesion are based on the same idea. Indeed, all balance scores can be seen as measures of "self-cohesion". To see this, consider a cycle and a node i that sends a bit of information to its left neighbour, who passes it further to its left neighbour and so on, until the bit comes back to i. Moreover, assume that the bit is flipped when crossing negative edges. Now, it is easy to see that the bit will return in its original state if and only if the cycle is balanced. In this sense, structural balance measures the consistency between sent and returning signals.
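A sketch of the pairwise Cohesion Index and DoC with k_min = 2 (NumPy assumed, names illustrative). On the perfectly balanced signed K4, every walk between members of the same faction is positive and every walk across factions is negative, so the DoC is 1 within and 0 between factions.

```python
import math
import numpy as np

def walk_sum(A, beta, k_min, k_max=30):
    """Truncated series sum_{k=k_min}^{k_max} beta^k A^k / k!."""
    A = np.asarray(A, dtype=float)
    W = np.zeros_like(A)
    Ak = np.linalg.matrix_power(A, k_min)
    for k in range(k_min, k_max + 1):
        W += beta**k / math.factorial(k) * Ak
        Ak = Ak @ A
    return W

def cohesion(A, beta, k_max=30):
    """Return (r, b): pairwise Cohesion Index and Degree of Cohesion with k_min = 2."""
    A = np.asarray(A, dtype=float)
    W_signed = walk_sum(A, beta, 2, k_max)        # direct links (k = 1) are excluded
    W_abs = walk_sum(np.abs(A), beta, 2, k_max)
    r = W_signed / W_abs                          # assumes every pair is joined by a walk
    return r, (1 + r) / 2

A = np.array([[0, 1, -1, -1], [1, 0, -1, -1], [-1, -1, 0, 1], [-1, -1, 1, 0]])
r, doc = cohesion(A, beta=1.0)
```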
Cohesion measures are important because they allow for developing SBT-aware clustering methods. We leave a detailed study of this idea for future work. However, in what follows we combine them with standard agglomerative hierarchical clustering [20] (see Methods, Sec. 4.3, for details) to show that the MSB approach produces meaningful results and allows for detecting interpretable low-frustration network partitions.
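One possible way of combining DoC with agglomerative clustering is sketched below. This is an assumption made for illustration and not necessarily the procedure of Methods, Sec. 4.3: it runs SciPy's average-linkage `linkage` on the distance 1 − DoC and cuts the dendrogram into a requested number of clusters with `fcluster`. The helpers `doc_matrix` and `cluster_signed` are hypothetical.

```python
import math
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def walk_sum(A, beta, k_min, k_max=30):
    """Truncated series sum_{k=k_min}^{k_max} beta^k A^k / k!."""
    A = np.asarray(A, dtype=float)
    W, Ak = np.zeros_like(A), np.linalg.matrix_power(A, k_min)
    for k in range(k_min, k_max + 1):
        W += beta**k / math.factorial(k) * Ak
        Ak = Ak @ A
    return W

def doc_matrix(A, beta, k_max=30):
    """Pairwise Degree of Cohesion with k_min = 2."""
    A = np.asarray(A, dtype=float)
    return (1 + walk_sum(A, beta, 2, k_max) / walk_sum(np.abs(A), beta, 2, k_max)) / 2

def cluster_signed(A, beta, n_clusters):
    """Average-linkage clustering on the distance 1 - DoC (illustrative choice)."""
    D = 1.0 - doc_matrix(A, beta)
    D = (D + D.T) / 2              # symmetrize against floating-point noise
    np.fill_diagonal(D, 0.0)       # squareform expects a zero diagonal
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

A = np.array([[0, 1, -1, -1], [1, 0, -1, -1], [-1, -1, 0, 1], [-1, -1, 1, 0]])
labels = cluster_signed(A, beta=1.0, n_clusters=2)
```

The resulting partition can then be scored with the frustration ratio; on this toy network it recovers the two factions, whose frustration ratio is zero.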

Weighted measures and β as average edge weight
Importantly, β can be interpreted in terms of an average edge weight. Any unweighted network can be seen as a weighted network with uniform absolute edge weights of 1. In this case the absolute product over a closed walk of any length is always equal to 1, so every walk is considered equal, and it is only β that controls and rescales edge weights, inducing non-uniform walk weights (through $\beta^k$ scaling). Thus, an arguably natural way to handle non-unitary weights is to rescale them so that the average absolute weight is equal to 1:
$$\tilde{\omega}_{ij} = \frac{\omega_{ij}}{\frac{1}{|E|} \sum_{(u,v) \in E} |\omega_{uv}|}$$
where $\omega_{ij}$ is the original weight of the (i, j) edge and |E| is the number of edges. This retains the interpretation of β in terms of an average edge weight and ensures that in a network with a completely uniformly random topology (i.e., an Erdős-Rényi random graph with randomly and independently assigned signs and absolute weights) the expected value of a walk weight (i.e., the product of the corresponding edge weights) is fixed to 1 when β = 1. Analyses in Sec. 2.4 tentatively suggest that this approach to incorporating edge weights may indeed be effective and produce better results than analogous unweighted methods (e.g., find partitions with lower frustration).
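A minimal sketch of this rescaling, assuming NumPy (`rescale_weights` is an illustrative name). It treats every non-zero entry of a dense adjacency matrix as an edge; for a symmetric matrix each undirected edge appears twice, which does not change the mean absolute weight.

```python
import numpy as np

def rescale_weights(A):
    """Divide all weights by the mean absolute edge weight, so the new mean is 1."""
    A = np.asarray(A, dtype=float)
    edges = A != 0                      # non-zero entries are treated as edges
    return A / np.abs(A[edges]).mean()

A = np.array([[0, 2, -4], [2, 0, 0], [-4, 0, 0]])   # mean absolute weight is 3
A_tilde = rescale_weights(A)
```

Signs and relative magnitudes are preserved; only the overall scale changes, so the remaining scale is set entirely by β.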

Directed measures
Here we extend all the previously defined measures to directed signed networks. To do so, we first note that the structure theorems of SBT in their most general form are formulated in terms of semipaths and semicycles (they are listed in Methods, Sec. 4.1). Thus, it is natural to extend our approach to directed networks by simply using semiwalks instead of ordinary walks.
Definition 2 (Semiwalk).A semiwalk is a sequence of adjacent edges such that for every two consecutive edges (i, j) and (k, l) it holds that k ∈ {i, j} or l ∈ {i, j}.
More intuitively, semiwalks are just ordinary walks that ignore edge directions [48], or walks on an undirected multigraph derived from a given directed graph by making all edges bidirectional. Thus, semiwalks between all pairs of nodes in a graph G are counted by powers of its semiadjacency matrix, which is defined as the symmetric part of the adjacency matrix:
$$S(A) = \frac{A + A^\top}{2}$$
Note that S(A) is symmetric and S(A) = A when A is symmetric, which jointly means that S[S(A)] = S(A), so the semiadjacency operator is idempotent. In what follows, we will use a simpler notation without the explicit dependence on A: S denotes S(A) and |S| denotes S(|A|). Importantly, S is not a lossless representation of the adjacency matrix of the undirected multigraph underlying a given directed signed network, but it is lossy in a way that does not affect any balance-related calculations. Firstly, reciprocal edges with opposite signs cancel each other out in S(A). However, this does not affect the difference between counts of positive and negative semiwalks, $\mu^+ - \mu^-$, since each symmetric dyad with opposite edge signs is included in the same number of positive and negative semiwalks between i and j (Fig. 5). Secondly, the 1/2 factor means that S approximates the adjacency matrix of the multigraph divided by 2, but, again, this does not matter, as in our approach edge weights are rescaled by the β parameter, which sets the average edge weight, anyway. The gain from using the 1/2 factor is that the semiadjacency operator is idempotent and S = A for undirected graphs.
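A small sketch of the semiadjacency operator (NumPy assumed; `semiadjacency` is an illustrative name) showing the properties used above: reciprocal edges with opposite signs cancel, applying the operator twice changes nothing, and |S| is obtained by applying the operator to |A| rather than by taking absolute values of S.

```python
import numpy as np

def semiadjacency(A):
    """Symmetric part of a (possibly directed) adjacency matrix: (A + A^T) / 2."""
    A = np.asarray(A, dtype=float)
    return (A + A.T) / 2

# Directed example: 0 -> 1 positive and 1 -> 0 negative (they cancel), 1 -> 2 positive (one-way).
A = np.array([[0, 1, 0],
              [-1, 0, 1],
              [0, 0, 0]])
S = semiadjacency(A)
S_abs = semiadjacency(np.abs(A))   # |S| in the text: S(|A|), not |S(A)|
```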
As a result, directed balance measures are obtained simply by substituting A with S and |A| with |S| in all the formulas. However, to account for the fact that 2-cycles in directed signed networks are not trivial (i.e., they may be either balanced or unbalanced), an additional correction is needed. As explained in Sec. 2.1.3, asymmetric dyads do not span any 2-semicycles, while symmetric ones do. Thus, in the case of directed networks one needs to apply corrections to Eqs. (4) and (20) to count proper 2-semicycles: where both W and V still use $k_{\min} = 3$.

Re-analysis of Sampson's Monastery dataset
Sampson's Monastery study [39] produced one of the most famous network datasets studied in Social Network Analysis (SNA) in general, and SBT in particular. It describes the evolution of the social structure in a group of postulants and novices in a monastery in New England in the 1960s.
Namely, a network of liking (positive) and disliking (negative) relations was measured at five points in time. The ties are directed and weighted in the range from −3 to 3, with weights indicating the ordinal ranking of preference towards or against a given person, as is typical of sociometric studies (see Methods, Sec. 4.6.5, for details).
The dataset is particularly valuable because, while the study was being conducted, the group went through a major conflict, which eventually led to either the resignation or the expulsion of the majority of the members of the congregation. Moreover, Sampson identified a partition into three groups, which was later independently validated with analytic SBT-motivated clustering methods [9] and is therefore commonly recognized as the "ground truth" solution.
The most important events happened at times t = 2, 3, 4, which correspond to a period of differentiation and polarization [9] that eventually led to an open conflict and the disintegration of the group. At t = 2 twelve new members joined the monastery, while some older members left after t = 1, so the new group consisted of 18 men in total. This perturbation led to the emergence of two competing groups (Loyal Opposition and Young Turks) as well as a group of peripheral members who were not fully accepted by the rest (Outcasts). The network at time t = 4 depicts the structure just before the open conflict and disintegration. At t = 5 only 7 members remained in the monastery, and those who stayed (marked with red labels in Fig. 6C, t = 4) belonged almost exclusively to the Loyal Opposition, which clearly "won" the conflict.
Here we use the MSB approach to demonstrate that the "ground truth" partition is not SBT-optimal, i.e., not maximally consistent with Theorem 4, as measured by the frustration ratio, F(B). Fig. 6A shows both the "ground truth" and the MSB network partitions for times t = 2, 3, 4 (see Methods, Sec. 4.3, for details of the clustering method). They differ only in a few details, which are, nonetheless, very informative about the unfolding dynamics. Firstly, according to the "ground truth" partition, Basil was a member of the Outcasts. However, MSB analysis indicates that initially (t = 2) he interacted mostly with the Young Turks and only later was rejected and became one of the Outcasts. Secondly, Amand, a member of the Loyal Opposition according to the "ground truth", was consistently identified as one of the Outcasts by our MSB clustering procedure. Most importantly, according to MSB, John Bosco, who was considered one of the two leaders of the Young Turks (the other being Gregory), became one of the Outcasts just before the disintegration of the monastery (t = 4). This may help explain why the Young Turks "lost" the competition against the Loyal Opposition, whose core constituted most of the group that remained at the monastery.
As evident in Fig. 6C, the local weak balance scores of John Bosco were consistently low, and at time t = 4 Gregory, the second leader, also attained low local balance (see Methods, Sec. 4.2, for the details of the weak balance measures). This was largely driven by the tension in their personal relationship (at t = 4 the Gregory→John Bosco tie is positive and the John Bosco→Gregory tie is negative), which then propagated through the entire group (note that both of them had high local contribution scores, Fig. 6C), probably leading to its decomposition. Importantly, MSB measures of DoB are clearly high during the evolution of the conflict (t = 2, 3, 4), with the maximum at t = 4, while the analogous WB measures, which are not based on LP, yielded markedly lower DoB values that cannot be readily interpreted as indicative of a conflict, as they are not much greater than 1/2 (the value expected for a random assignment of edge signs). Similarly, frustration values obtained with MSB clustering are consistently lower than those of the "ground truth" partition, and at times t = 1, 2, 3, 4 also lower than the ones obtained using WB. On the other hand, frustration ratios obtained when ignoring edge weights ($F^{U}_{\mathrm{MSB}}$) are markedly higher, indicating that our approach uses edge weight information effectively, leading to better results, i.e., partitions with lower frustration.

Figure 6. Re-analysis of Sampson's Monastery networks using the MSB approach. Full spectra were used in computations (exact results). (A) Sociograms with "ground truth" and MSB block partitions: signed sociograms at times t = 2, 3, 4. Left-side colors denote block membership according to the "ground truth" partition and right-side colors correspond to MSB partitions. Positive ties are blue and negative are red. Individuals whose "ground truth" and MSB block memberships differ (Amand, Basil and John Bosco), as well as the leaders of the Young Turks (John Bosco and Gregory), are labelled. Network layout was determined with the Kamada-Kawai algorithm using only positive ties, with weights (distances) on cross-block ties rescaled by a factor of 5. (B) Time series of strong and weak DoB measures for t = 1, …, 5 using MSB, as well as strong DoB based on the WB approach of Estrada and Benzi [12], which is equivalent to the MSB approach with β = 1 using the ordinary adjacency matrix. (C) Weak local balance expressed as z-scores relative to the overall distribution. Points are sized proportionally to local contributions and ordered first by block membership and then by balance scores. Members who remained at the monastery after the culmination of the conflict (t = 5) are marked with red labels on the subplot for t = 4. (D) Time series of frustration ratios for t = 1, …, 5 according to partitions obtained with the MSB and WB (β = 1) approaches, as well as the "ground truth" solution (which is defined only for times t = 2, 3, 4). $F^{U}_{\mathrm{MSB}}$ denotes frustration values obtained with the unweighted MSB approach.
Thus, the analysis indicates that MSB can produce useful and interpretable results, including finding low-frustration partitions of signed networks. Moreover, by combining global and local measures applied to time series of network snapshots, insights into the impact of microscopic changes (e.g., edge sign switching) on the meso- and macroscopic structure can be gained.
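The frustration ratio used to compare partitions above can be sketched as follows, assuming a common weighted definition: the share of total absolute tie weight carried by frustrated ties, i.e. positive ties between blocks and negative ties within blocks. The exact normalization of F(B) used in the MSB analyses may differ, and `frustration_ratio` is a hypothetical name.

```python
import numpy as np


def frustration_ratio(A, blocks):
    """Share of total absolute tie weight on frustrated ties (assumed definition).

    A tie is frustrated when it is positive but joins different blocks, or
    negative but lies within a block. Works for directed (non-symmetric) A too,
    counting each directed tie separately.
    """
    A = np.asarray(A, dtype=float)
    blocks = np.asarray(blocks)
    same_block = blocks[:, None] == blocks[None, :]
    frustrated = ((A > 0) & ~same_block) | ((A < 0) & same_block)
    total = np.abs(A).sum()
    return float(np.abs(A)[frustrated].sum() / total) if total > 0 else 0.0


# Balanced triangle: nodes 0 and 1 are allies, both hostile to node 2.
B = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]], dtype=float)
print(frustration_ratio(B, [0, 0, 1]))  # the SBT-consistent bipartition
print(frustration_ratio(B, [0, 0, 0]))  # lumping everyone together frustrates 4 of 6 tie weights
```

Because the weights enter through |A|, ignoring them (replacing A by its sign pattern) changes which partition minimizes the ratio, which is the contrast drawn between weighted and unweighted MSB frustration values.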

Polarization in the U.S. Congress
It is often claimed that political life in contemporary democracies has polarized significantly over the last few decades. Arguably, this debate is particularly relevant for the U.S. because of its largely two-party political system, for which the notion of (bi)polarization is particularly well-defined. This hypothesis is also supported by considerable empirical evidence (cf. [23,33] and references therein).
Here we use the MSB approach to study polarization in both chambers of the U.S. Congress based on patterns of bill co-sponsorship between 1973 and 2016 (93rd to 114th Congress) [33]. The dataset consists of two sequences of undirected signed networks inferred from co-sponsorship data, where positive ties indicate a statistically significant tendency of two representatives/senators to promote the same bills and negative ties the opposite tendency to avoid promoting the same projects (see Methods, Sec. 4.6.6, for details).
Our analysis indicates that polarization increased markedly in both the House of Representatives (Fig. 7A) and the Senate (Fig. 7B). This is evident in the steadily increasing strong DoB values, meaning that co-sponsorship networks became easier to bipartition over time.
The increasing trend appears to emerge during the second Congress of Carter's administration and remains stable thereafter, notwithstanding some transient perturbations. Interestingly, and consistently with our previous analysis of the importance of the Locality Principle, the WB approach yielded almost exclusively very low DoB values and thus would not capture the true trend. This is a consequence of the violation of LP.
In both chambers, frustration ratios clearly converge (Fig. 7, 2nd panels), meaning that the best bipartitions and clusterings (into k groups) based on the MSB approach (Methods, Sec. 4.3), as well as partitions following partisan affiliations, are becoming more and more consistent with the SBT theorems and therefore also more similar to each other. This is evident in the time series of the similarity between MSB and partisan partitions measured with the Adjusted Mutual Information (AMI) score [47] (Fig. 7, 3rd panels). Moreover, even in k-clusterings with large k, most of the nodes tend to belong to the two largest clusters, indicating, again, an increasingly bipolar structure organized along party lines. Note that even in the extreme case of the House of Representatives during the 96th Congress (the second Congress of Carter's administration), where we found 147 distinct "clusters", 271, or around 61% of the representatives, belong to the two largest blocks. This means that the remaining 145 clusters contain the other 171 representatives, for an average cluster size of about 1.18. Thus, in this period many members of Congress were effectively functioning in between the two main blocks, and from the perspective of the clustering procedure they were outliers forming many small clusters, very often composed of only one node. This result is consistent with the fact that this was the period of lowest polarization, in which the partisan cleavage should not be very pronounced.

Figure 7. … The 3rd panel quantifies the similarity between party-based partitions and the best bipartitions, AMI(2, P), as well as the best partitions into k clusters, AMI(B, P), using the Adjusted Mutual Information (AMI) score [47]. The closer the values are to 1, the better the match between two clustering solutions. The 4th panel shows the number of clusters in the solution minimizing F(B) (left y-axis), as well as the fraction of nodes within the two largest clusters (right y-axis).
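The cluster-size figures reported for the House during the 96th Congress can be checked with a few lines of arithmetic (Python is used here purely as a calculator); the only inputs are the three counts quoted in the text.

```python
# Counts reported for the House during the 96th Congress.
n_clusters = 147
in_two_largest = 271
elsewhere = 171

n_members = in_two_largest + elsewhere           # representatives in total
share_two_largest = in_two_largest / n_members   # fraction in the two largest blocks
mean_other_size = elsewhere / (n_clusters - 2)   # 171 members spread over 145 clusters

print(n_members, round(share_two_largest, 3), round(mean_other_size, 2))
# prints: 442 0.613 1.18
```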
To sum up, the results point to a strong consistency between global DoB measures and the quality of optimal partitions. Namely, the higher the DoB, the lower the frustration of the optimal partitions found by our clustering algorithm. Moreover, the facts that over time all empirical partitions become more and more similar to partisan affiliations and that the majority of nodes always belong to the two largest clusters jointly mean that the MSB partitions we obtained are meaningful and consistent with the partisan polarization hypothesis. In other words, we indeed find that over time it becomes easier to find low-frustration network partitions that largely overlap with partisan affiliations. Thus, the patterns of cooperation between senators and representatives become more and more constrained by their party membership.

Discussion
Polarization is often considered a salient, and perhaps worrying, feature of contemporary societies [2,23,33,40]. It can result in a sharp divergence of popular beliefs or attitudes (ideological polarization) as well as in in-group favouritism and out-group hostility (affective polarization) [23]. Crucially, the latter implies clustering of social networks into two or more groups with primarily positive in-group and negative out-group ties. This structural aspect of polarization is studied in Structural Balance Theory (SBT), which links it to properties of semicycles in signed networks and provides strict criteria for measuring polarization [5,6].
Here we introduced the Multiscale Semiwalk Balance (MSB) approach for measuring both strong and weak degree of balance (DoB), which is applicable to any kind of (simple) signed network, including directed and weighted ones. MSB is computationally efficient because it approximates semicycles with semiwalks, which can be counted using standard linear algebra, and it defines DoB measures not only for entire graphs but also for specific nodes and pairs of nodes, which in turn allows for implementing effective signed community detection methods motivated by SBT. Crucially, MSB is multiscale in the following three senses:
1. It proposes a principled way of aggregating multiple k-balance scores for particular cycle lengths to produce a single global DoB estimate motivated by the Locality Principle (LP). The resolution of analysis, or the weighting scheme for aggregating k-balance scores, is controlled by a single parameter, β, which can be tuned based on first principles to capture the characteristic scale of a network at which its DoB should be assessed. This is a crucial feature of our framework, as even though many other approaches apply some decaying weights to longer cycles, typically the decay rate is fixed or controlled by a free parameter with no principled way of selecting an appropriate value [3, 12, …].
2. It provides methods for measuring strong and weak DoB for entire graphs, closed walks of particular lengths, individual nodes and pairs of nodes.
3. Thanks to the pairwise measures it facilitates development of methods for finding mesoscopic structures in signed networks, i.e. clusters or groups of nodes with primarily positive in-group and negative out-group ties.
Unlike many other approaches to SBT [12,24,41], MSB is formulated explicitly in terms of semiwalks as an approximation to semipaths and semicycles. This connects it more directly to the structure theorems [5,6] and, as a result, facilitates meaningful analyses of directed networks. Crucially, semiwalk-based k-balance scores tend to be similar to the values produced by the cycle-based k-balance methods introduced in Ref. [18] (see Methods, Sec. 4.4, for details). Thus, the fundamental approximation on which our approach is based seems to introduce little extra noise relative to cycle-based measures. Similarly, the error introduced by using only leading eigenvalues and eigenvectors is also typically very small (SI, Sec. S3). Moreover, being based on (semi)walks that can be counted easily using standard linear algebra, MSB computations can be remarkably fast (SI, Sec. S5).
Furthermore, there are also theoretical reasons for preferring walk-based over cycle-based DoB measures. First, note that in a signed graph all cycles are balanced if and only if all closed walks are balanced, so for measuring perfect structural balance walk- and cycle-based DoB measures are equivalent. Second, in opinion dynamics (diffusion) on a signed graph, two groups can reach different consensus states if and only if the graph is balanced, but the diffusion process depends not only on purely cyclic structures, but also on acyclic ones, as well as on "artificial cycles" produced by backtracking walks [11]. Thus, it can be argued that partial DoB measures defined in terms of (semi)walks paint a fuller picture of structural balance, especially as far as the interplay between network structure and diffusion dynamics is concerned. Our perspective is therefore different from other works on multilevel assessment of structural balance such as Ref. [1], which focus exclusively on strong balance, and in which microlevel DoB analysis is equated with triad-level DoB, mesolevel with the cohesiveness of network partitions as such (which is fully compatible with our framework), and macrolevel with the line index (or frustration index), computed only for partitions into two groups. Moreover, our approach tries to follow the structure theorems of SBT as closely as possible given its approximate walk-based nature. Directed MSB measures are based on semiwalks, and thus they ignore edge directions, except for the special case of dyads (2-cycles), in which the directions of both edges are considered (this is accounted for by the corrections defined in Eqs. (17) and (18)). This design choice follows directly from the fact that SBT was formulated in terms of semicycles, which are simply cycles in which edge directions are ignored as long as each edge is traversed at most once. On the other hand, the methods from Ref. [1] use edge direction information in a more complex fashion, which may be insightful but is not necessary from the vantage point of the SBT theorems and the problem of network partitioning.
The Locality Principle is justified not only by its usefulness as a heuristic guiding DoB methods, but also by a long history of social and psychological research. In particular, experimental research on the perception of structural balance in social networks indicates that people pay more attention to small-scale structures [49]. This is in line with other seminal results stressing the importance of proximity (both physical and social) for social phenomena, such as social impact theory [27] and Dunbar's numbers [22], which are closely related to the fact that social networks tend to be sparse and composed of ties that are localized within some physical and/or social space [43]. Moreover, studies of structural balance using alternative cycle-based methods show that real-world networks tend to have a cycle length threshold after which k-balance scores suddenly decrease to random-like values (around 0.5) [18]. In other words, structural balance typically manifests itself at the level of small- and medium-sized structures, so DoB measures should account for that. This is exactly what LP does.
Importantly, β can be endowed with a physical interpretation, which helps to explain its role as a resolution parameter. Note that the cohesion index defined in Eq. (13), from which all other MSB measures may be derived, can be approximated by a ratio of elements of two matrix exponentials, $r_{ij}(\beta) \approx (e^{\beta A})_{ij}/(e^{\beta|A|})_{ij}$, and the exponential of a rescaled adjacency matrix, such as βA, is known as communicability, which is a general measure of connectedness defined in terms of weighted sums of walks of different lengths between pairs of nodes [13]. In this context, β can be interpreted as the inverse temperature of a thermal bath in which a network is submerged. More generally, the thermal bath may represent an "external situation", e.g., the level of agitation of the system, which manifests itself by rescaling edge weights by the factor β. As a result, when β → 0 (hot regime), there is no communicability between nodes, and when β → ∞ (cold regime), there is infinite communicability between all pairs of nodes [14]. Note that in both cases the actual network topology ceases to matter. Thus, network structure is accounted for in DoB calculations only for appropriately chosen intermediate values of β, and in this context LP provides an effective heuristic for fine-tuning β and finding the most relevant range of cycle lengths at which DoB should be assessed. This stresses the importance of multiscale approaches to SBT, and to network science more generally. By linking structural balance to communicability [13,14], our results suggest that other network descriptors defined in terms of walks, or powers of adjacency matrices, such as multiscale network entanglement [17], could perhaps also be informed by the Locality Principle. Note that the contribution scores defined in Eq. (11), and used for operationalizing LP, can be calculated for any network, including unsigned ones. Thus, LP is a heuristic for determining the characteristic intensity and length of internode correlations, which in turn determines the appropriate weighting scheme for aggregating walk-based measures across multiple scales. More generally, our results also contribute to the research on the importance of local structures in networks [30,31,44].
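A sketch of the communicability approximation of the cohesion index, using `scipy.linalg.expm`. It uses full matrix exponentials (all walk lengths, weights β^k/k!), so it ignores the k_min…k_max truncation; the function name is hypothetical and the graph is assumed connected, so that every entry of e^{β|A|} is positive. For a strongly balanced graph, A = D|A|D with D a diagonal ±1 matrix, hence e^{βA} = D e^{β|A|} D and every ratio is exactly ±1; the example illustrates this, together with |r_ij| < 1 on an unbalanced triangle.

```python
import numpy as np
from scipy.linalg import expm


def cohesion_via_communicability(A, beta):
    """Approximate pairwise cohesion r_ij(beta) as (e^{beta A})_ij / (e^{beta |A|})_ij.

    The full exponentials sum walks of all lengths with weights beta^k / k!,
    so this ignores any k_min..k_max truncation. Assumes a connected graph,
    so that every entry of e^{beta |A|} is strictly positive.
    """
    A = np.asarray(A, dtype=float)
    return expm(beta * A) / expm(beta * np.abs(A))


# Strongly balanced triangle (0 and 1 allied against 2) and an unbalanced
# triangle (a single negative tie between 0 and 2).
balanced = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]], dtype=float)
unbalanced = np.array([[0, 1, -1], [1, 0, 1], [-1, 1, 0]], dtype=float)

R_bal = cohesion_via_communicability(balanced, beta=0.7)
R_unb = cohesion_via_communicability(unbalanced, beta=0.7)
# R_bal is +1 within blocks and -1 across them; every entry of R_unb has
# absolute value below 1 because positive and negative walks partly cancel.
print(np.round(R_bal, 6))
print(np.round(R_unb, 3))
```

Varying `beta` in this sketch mirrors the temperature picture: small values weight short walks, large values let long walks dominate both numerator and denominator.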
Our work, of course, is not without limitations. Firstly, even though the cohesion measures defined in Eqs. (13) and (14) seem to open up new possibilities for designing clustering or community detection methods for signed networks, the actual clustering algorithm we used here is rather naive. Developing more mature methods derived from first principles will not be an easy task, and we leave it for future work. Moreover, it can be argued that an even better approach for tuning β could be based on setting it to a value that minimizes the frustration of the best partition. However, a proper solution to this problem would require a solid theory-driven clustering method parametrized by β, which we do not currently have, so the choice $\beta := \beta_{\max}$ should be considered the best working heuristic for selecting β for now, to be replaced with more mature solutions once they become available. Furthermore, even though some in-depth insights regarding similarities and differences between cycle- and walk-based DoB measures vis-à-vis the tenets of SBT have been offered by Estrada [11], one can argue that the debate on whether the former or the latter should be preferred is not yet settled. An interesting "middle ground" perspective could perhaps be gained by studying DoB measures based on non-backtracking (Hashimoto) matrices [46].

Overview of Structural Balance Theory
Here we state the main definitions and theorems of SBT concerned with bi-clusterability as formulated by Cartwright and Harary [5]. We use the general formulation based on semipaths and semicycles, so the theorems are applicable to both undirected and directed graphs. Thus, we first define semipaths and semicycles.

Definition 3 (Semipath).
A semipath is a walk in which each (directed) edge can be traversed in either direction but at most once, and no node is visited more than once.
Definition 4 (Semicycle).
A semicycle is a semipath starting and ending at the same node (which in this case is allowed to appear twice).
Corollary. Notions of paths/cycles and semipaths/semicycles are equivalent in undirected graphs, since an undirected edge is treated in this context as two directed edges pointing in opposite directions.

Definition 5 (Strong balance property).
A signed graph is balanced if and only if every semicycle it contains is positive (the product over all edge signs is positive).
Theorem 3 (Strong structure theorem).
A signed graph is balanced if and only if its vertices can be partitioned into two subsets such that positive edges connect vertices from the same subset and negative ones link vertices from different subsets.
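Theorem 3 suggests a direct, exact test of strong balance in an undirected signed graph: propagate ±1 labels by breadth-first search, copying the label across positive ties and flipping it across negative ones; a conflict reveals a cycle with an odd number of negative ties. The sketch below is a standard graph routine rather than part of MSB, the function name is hypothetical, and it accepts only symmetric adjacency matrices.

```python
from collections import deque

import numpy as np


def is_strongly_balanced(A):
    """Exact check of Theorem 3 for an undirected signed graph.

    Propagates +1/-1 labels by breadth-first search: a positive tie copies the
    label, a negative tie flips it. A conflict means some cycle has an odd
    number of negative ties. Returns (balanced, labels or None).
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("expects a symmetric (undirected) signed adjacency matrix")
    n = A.shape[0]
    label = np.zeros(n, dtype=int)  # 0 marks an unvisited node
    for start in range(n):
        if label[start] != 0:
            continue
        label[start] = 1
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(A[i]):
                wanted = label[i] * int(np.sign(A[i, j]))
                if label[j] == 0:
                    label[j] = wanted
                    queue.append(j)
                elif label[j] != wanted:
                    return False, None
    return True, label


balanced = np.array([[0, 1, -1], [1, 0, -1], [-1, -1, 0]], dtype=float)
print(is_strongly_balanced(balanced))  # labels split the nodes into {0, 1} and {2}
```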
The above results were later generalized by Davis [6], who provided necessary and sufficient conditions for b-clusterability (where b ≥ 2 is an unknown integer).
Definition 6 (Weak balance property).
A signed graph is weakly balanced if and only if no semicycle contains exactly one negative edge.
Theorem 4 (Weak structure theorem).
A signed graph is weakly balanced if and only if its vertices can be partitioned into b subsets such that positive edges connect vertices from the same subset and negative ones link vertices from different subsets.
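Weak balance admits a similarly simple exact test. Take the connected components of the positive subgraph as candidate clusters: a negative tie inside a component, together with a positive path between its endpoints, forms a semicycle with exactly one negative edge, and conversely. The sketch below uses `scipy.sparse.csgraph.connected_components`; the wrapper name is hypothetical, and treating positive ties as undirected follows the semicycle-based definitions.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def is_weakly_balanced(A):
    """Exact check of Definition 6 / Theorem 4 for a signed graph.

    Clusters are the connected components of the positive subgraph (edge
    directions ignored). A negative tie inside a cluster closes a semicycle
    with exactly one negative edge, so the graph is weakly balanced iff no
    negative tie joins two nodes of the same cluster.
    Returns (balanced, cluster_labels).
    """
    A = np.asarray(A, dtype=float)
    positive = csr_matrix((A > 0).astype(float))
    _, labels = connected_components(positive, directed=False)
    rows, cols = np.nonzero(A < 0)
    balanced = not np.any(labels[rows] == labels[cols])
    return balanced, labels


# An all-negative triangle is not strongly balanced, but it is weakly
# balanced: every node forms its own cluster.
hostile = np.array([[0, -1, -1], [-1, 0, -1], [-1, -1, 0]], dtype=float)
print(is_weakly_balanced(hostile))
```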

Weak balance
Following Ref. [24] we define non-negative matrices P(A) and N(A) corresponding to positive and negative parts of signed adjacency matrix such that A = P − N and |A| = P + N. In what follows we will use the simpler notation without the explicit dependence on A, but it is important to remember that P and N are functions of A.
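Weak balance is defined in terms of the extent to which a network is free of cycles with exactly one negative edge. This single negative link can be placed anywhere along a walk starting at node i. Hence, we first define a matrix counting weakly unbalanced walks of length k between nodes i and j in a signed graph G as:

$$
W_k(A) = \sum_{l=1}^{k} P^{l-1} N P^{k-l} = Q\left(\sum_{l=1}^{k} \Lambda^{l-1} M \Lambda^{k-l}\right) Q^{\top} = Q\left(\sum_{l=1}^{k} L(k,l) \odot M\right) Q^{\top}
$$

where $Q\Lambda Q^{\top}$ is the eigendecomposition of P, $M = Q^{\top} N Q$ is a shorthand for the product that appears in the middle of the second expression, and $L(k,l)_{ij} = \lambda_i^{l-1}\lambda_j^{k-l}$. Moreover, we used the fact that $\Lambda^{l-1} M \Lambda^{k-l} = L(k,l) \odot M$. Now, a matrix with weighted sums of counts of walks of lengths $k = k_{\min}, \dots, k_{\max}$ joining nodes i and j is given by:

$$
W(A, \beta) = \sum_{k=k_{\min}}^{k_{\max}} \frac{\beta^{k}}{k!}\, W_k(A) \tag{20}
$$

Next, we can use Eq. (20) to calculate the overall weighted sum of counts of unbalanced closed walks from the appropriate trace:

$$
\mu_W = \operatorname{tr} W(A, \beta) = \sum_{k=k_{\min}}^{k_{\max}} \frac{\beta^{k}}{k!} \operatorname{tr}\left(\sum_{l=1}^{k} L(k,l) \odot M\right) = \sum_{k=k_{\min}}^{k_{\max}} \frac{\beta^{k}}{k!}\, k \sum_{i} \lambda_i^{k-1} M_{ii}
$$

where we used the fact that the trace is invariant under cyclic permutations and Q is orthogonal.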
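A small NumPy check of the eigendecomposition identity for the matrix of weakly unbalanced length-k walks (written W_k here) and of its trace, tr W_k = k Σ_i λ_i^{k−1} M_ii, with M = Q⊤NQ as in the text. The helper names are hypothetical, and the sketch assumes a symmetric P so that `numpy.linalg.eigh` applies; for directed networks one would pass S instead of A.

```python
import numpy as np


def split_signs(A):
    """Non-negative parts with A = P - N and |A| = P + N."""
    A = np.asarray(A, dtype=float)
    return np.maximum(A, 0.0), np.maximum(-A, 0.0)


def weak_unbalanced_walks_direct(A, k):
    """W_k = sum_{l=1}^{k} P^(l-1) N P^(k-l), straight from the definition."""
    P, N = split_signs(A)
    mp = np.linalg.matrix_power
    return sum(mp(P, l - 1) @ N @ mp(P, k - l) for l in range(1, k + 1))


def weak_unbalanced_walks_eig(A, k):
    """Same matrix via P = Q diag(lam) Q^T, M = Q^T N Q, L(k, l)_ij = lam_i^(l-1) lam_j^(k-l)."""
    P, N = split_signs(A)
    lam, Q = np.linalg.eigh(P)  # requires a symmetric P
    M = Q.T @ N @ Q
    inner = sum(np.outer(lam ** (l - 1), lam ** (k - l)) * M for l in range(1, k + 1))
    return Q @ inner @ Q.T


def weak_unbalanced_closed_walks(A, k):
    """tr W_k = k * sum_i lam_i^(k-1) M_ii, by cyclic invariance of the trace."""
    P, N = split_signs(A)
    lam, Q = np.linalg.eigh(P)
    M = Q.T @ N @ Q
    return k * np.sum(lam ** (k - 1) * np.diag(M))


# Triangle with a single negative tie (0-2): every closed 3-walk is weakly unbalanced.
U = np.array([[0, 1, -1], [1, 0, 1], [-1, 1, 0]], dtype=float)
print(np.trace(weak_unbalanced_walks_direct(U, 3)), weak_unbalanced_closed_walks(U, 3))
```

The direct version costs k matrix products per order, while the eigen-based version reuses one decomposition of P across all orders, which is what makes the spectral formulation attractive.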
The weighted sum of counts of closed walks at a node i is similarly given by the diagonal elements, $V(A, \beta)_{ii}$. Now, Eqs. (4) and (21) can be used to define the measure of the overall weak balance: where $\mu_W$ is the sum of weighted counts of weakly unbalanced closed walks. Weak pairwise cohesion scores are given by ratios of individual matrix elements: with local (node-level) weak DoB given by the diagonal elements, $w_{ii}(\beta; k_{\min} = 3)$. Similarly, weak k-balance is given by considering only closed walks of a particular length k: Importantly, as in the case of strong balance, global weak DoB can be expressed as a weighted average of weak k-balance scores with weights given by the corresponding contribution scores (see SI, Sec. S2, for the proof). Last but not least, the trace of the matrix series defined in Eq. (20), used for counting unbalanced closed walks, always converges, so it is well-defined. Note that, with the series taken for $k_{\max} \to \infty$:

$$
0 \le \operatorname{tr} W(A, \beta) \le \operatorname{tr} e^{\beta |A|}
$$

where it is known that the rightmost matrix exponential and its trace always converge, so the middle part of the inequality must converge too.
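The convergence argument can be unpacked in two steps. The derivation below assumes, as the comparison with the matrix exponential suggests, that the order-k term of the series carries the weight β^k/k!, and it lets k_max → ∞. Since |A| = P + N with P, N ≥ 0 entrywise, expanding |A|^k produces every ordered product of k factors drawn from {P, N}, and the weakly unbalanced walk matrix collects exactly the k products that contain a single N factor, so it is dominated entrywise by |A|^k:

$$
|A|^{k} = (P+N)^{k} = P^{k} + \underbrace{\sum_{l=1}^{k} P^{l-1} N P^{k-l}}_{W_k(A)} + \big(\text{products with two or more } N \text{ factors}\big) \;\ge\; W_k(A) \;\ge\; 0 \quad \text{(entrywise)},
$$

$$
0 \;\le\; \operatorname{tr} W(A,\beta) = \sum_{k=k_{\min}}^{\infty} \frac{\beta^{k}}{k!} \operatorname{tr} W_k(A) \;\le\; \sum_{k=0}^{\infty} \frac{\beta^{k}}{k!} \operatorname{tr} |A|^{k} = \operatorname{tr} e^{\beta |A|} < \infty .
$$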

Hierarchical clustering with pairwise DoB measures
Here we will use the following naive, yet effective, clustering procedure for signed networks based on pairwise cohesion measures (see Secs.

Numerical approximations and efficiency
All computations of MSB can be implemented in a computationally efficient and accurate manner using approximations based on m leading eigenvalues and eigenvectors from both ends of the spectrum. Leading eigenpairs can be found very efficiently using modern linear algebra routines such as the implicitly restarted Arnoldi method [28,42]. Moreover, numerical stability can be guaranteed by conducting all computations in log-space and using the log-sum-exp trick (to avoid overflow when counting closed walks). This requires a bit of extra care, as some eigenvalues may be non-positive. However, zero eigenvalues can be ignored altogether, since no measure defined here depends on the zeroth powers of adjacency matrices, so the calculations can be done over the field of complex numbers, where the logarithm of any number with non-zero modulus is well-defined, and cast back to real values only at the very end. As a result, MSB methods can be remarkably efficient, even when applied to very large systems. Secs. S3 and S5 in the SI present empirical analyses of the accuracy and efficiency of our implementation, and Sec. S4 discusses the theoretical basis for approximations based on leading eigenvalues and eigenvectors.
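A sketch of the two numerical ingredients described above, under stated assumptions: SciPy's `eigsh` with `which="BE"` returns eigenvalues from both ends of the spectrum (ARPACK's implicitly restarted Lanczos method, the symmetric counterpart of implicitly restarted Arnoldi), and a weighted closed-walk sum Σ_k (β^k/k!) Σ_i λ_i^k is evaluated in log-space with complex logarithms. The weights β^k/k!, the defaults k_min = 3 and k_max = 30, and the helper names are assumptions for illustration, not the authors' code.

```python
import numpy as np
from math import lgamma
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh


def leading_eigenvalues(A, m):
    """Return m eigenvalues of a symmetric (semi)adjacency matrix, taken from
    both ends of the spectrum with ARPACK (implicitly restarted Lanczos)."""
    return eigsh(csr_matrix(A, dtype=float), k=m, which="BE",
                 return_eigenvectors=False)


def log_weighted_trace(lam, beta, k_min=3, k_max=30):
    """Complex log of sum_{k=k_min}^{k_max} beta^k / k! * sum_i lam_i^k.

    Negative eigenvalues are handled with complex logarithms
    (log(-x) = log(x) + i*pi); zero eigenvalues are dropped because
    they contribute nothing to any power k >= 1.
    """
    lam = np.asarray(lam, dtype=complex)
    lam = lam[lam != 0]
    ks = np.arange(k_min, k_max + 1)
    log_w = ks * np.log(beta) - np.array([lgamma(k + 1) for k in ks])
    # terms[k, i] = log(beta^k / k! * lam_i^k)
    terms = log_w[:, None] + ks[:, None] * np.log(lam)[None, :]
    shift = terms.real.max()  # log-sum-exp shift guards against overflow
    return shift + np.log(np.exp(terms - shift).sum())


# Random symmetric signed matrix with zero diagonal.
rng = np.random.default_rng(0)
X = np.triu(np.sign(rng.normal(size=(20, 20))), 1)
X = X + X.T
lam = leading_eigenvalues(X, 6)
print(np.sort(lam))
print(np.exp(log_weighted_trace(lam, beta=0.5)).real)  # truncated-spectrum estimate
```

Ratios of two such sums, e.g. for A and |A|, can be formed as `np.exp(log_a - log_b).real`, so the large common magnitude never has to be materialized.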
A more in-depth discussion of implementation details is beyond the scope of this paper, but we invite the interested reader to study our source code (see: Data and code availability).

New Guinea Highlands tribes
An undirected unweighted signed network of relations among tribes of the Gahuku-Gama alliance structure of the Eastern Central Highlands region in New Guinea [35]. Edge sign indicates either friendship or enmity. Accessed from: https://networks.skewed.de/net/new_guinea_tribes

Epinions trust network
This is a who-trusts-whom online social network (directed, unweighted and signed) of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user [36]. Accessed from: https://snap.stanford.edu/data/soc-Epinions1.html.

Wikipedia adminship vote
A directed unweighted signed network of votes on Request for Adminship (RfA) elections from a 2008 snapshot of Wikipedia [29]. Nodes represent editors, and a directed edge (i, j) indicates that editor i voted on editor j. Edge sign indicates the direction of the vote: positive = for, and negative = against. Edges are timestamped. Accessed from: https://networks.skewed.de/net/elec.

Slashdot Zoo network
A directed unweighted signed network of interactions among users on Slashdot (slashdot.org), a technology news website [25]. Users tag each other as friends (positive ties) or foes (negative ties). The friend label increases the scores of posts, and the foe label decreases them. Accessed from: https://networks.skewed.de/net/slashdot_zoo.

S2. Weak DoB as a weighted average
Theorem S1. Let G be a signed graph, β > 0 a resolution parameter and $k = k_{\min}, \dots, k_{\max}$, with $k_{\min} \ge 2$, a sequence of consecutive integers. Then:
Proof. We first rewrite Eq. (23) in terms of Eq. (22) and then follow with a few simple transformations to get the final result: where in the second-to-last equality we used the fact that $\sum_k C_k(\beta) = 1$.

S3. Accuracy of numerical approximations
MSB uses two different numerical approximations to attain high computational efficiency. The first approximation happens when truncating power series to include only the terms of orders $k_{\min}, \dots, k_{\max}$. However, this approximation introduces no significant error by design, as LP ensures that higher-order terms have very low and monotonically decreasing contributions to the overall DoB calculations. Thus, as long as enough terms are included (typically one to two dozen suffice), the truncation introduces no noticeable error. In principle, the lowest number of terms necessary for attaining a given cumulative contribution score can be determined easily by inspecting the contribution profile. However, here we used a simple rule of thumb and, unless specified otherwise, used $k_{\max} = 30$ in all cases, which is typically more than enough (see Fig. S1A).
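Inspecting a contribution profile to choose k_max could look like the following sketch. It assumes, purely for illustration, that the contribution score of order k is proportional to (β^k/k!)·tr|A|^k and normalized over k_min, …, k_max; the scores actually used by MSB are defined in Eq. (11) and may differ. Both function names are hypothetical.

```python
import numpy as np
from math import lgamma


def contribution_profile(abs_eigenvalues, beta, k_min=2, k_max=30):
    """Hypothetical contribution scores C_k proportional to beta^k / k! * tr|A|^k.

    tr|A|^k is computed as sum_i lam_i^k from the eigenvalues of |A|; the
    scores are normalised so that they sum to 1 over k = k_min..k_max.
    """
    lam = np.asarray(abs_eigenvalues, dtype=float)
    ks = np.arange(k_min, k_max + 1)
    raw = np.array([np.exp(k * np.log(beta) - lgamma(k + 1)) * np.sum(lam ** k)
                    for k in ks])
    raw = np.clip(raw, 0.0, None)  # tr|A|^k >= 0; clip round-off below zero
    return ks, raw / raw.sum()


def smallest_kmax(ks, scores, target=0.99):
    """Smallest k_max whose cumulative contribution reaches `target`."""
    idx = np.searchsorted(np.cumsum(scores), target)
    return int(ks[min(idx, len(ks) - 1)])


# Unsigned triangle as a toy |A|.
A_abs = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
ks, scores = contribution_profile(np.linalg.eigvalsh(A_abs), beta=1.0)
print(smallest_kmax(ks, scores, target=0.99))
```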

Figure S1. … Crucially, at $\beta = \beta_{\max}$ almost all of the cumulative contribution score is driven by leading low-order terms, meaning that higher-order terms (roughly k > 10) can be safely omitted. The inset plot presents the same data on a log-log scale in order to better display the tail of the distribution. (B) Relative errors of strong and weak DoB measures based on leading eigenvalues, calculated with respect to values obtained using full spectra. Errors are typically low even for m = 1 and in all cases quickly decrease as m increases. Moreover, in most cases they are lower for the real network (as compared to its randomized counterparts based on the Erdős-Rényi and configuration models), which is consistent with the fact that errors should be lower for networks with heterogeneous distributions of eigenvalues.
The second approximation happens when only m leading eigenpairs from both ends of the spectrum are used. This allows for solving the corresponding eigenproblems and running other downstream calculations much faster. Moreover, as discussed in Sec. S4, this approximation is optimal and can be highly accurate, especially for real-world networks with heterogeneous spectra. Fig. S1B provides empirical support for this claim.

S4. Analytic functions of real symmetric matrices
Let $X \in \mathbb{R}^{n \times n}$ be a real symmetric matrix and f an analytic function applied to X through its power series. Then $f(X) = Q f(\Lambda) Q^{\top}$, where Λ is a real diagonal matrix with the eigenvalues of X ordered so that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$, and the columns of Q are the corresponding orthonormal eigenvectors. This implies that:

$$
f(X)_{ij} = \sum_{l=1}^{n} f(\lambda_l)\, Q_{il} Q_{jl}
$$

$$
\operatorname{tr} f(X) = \sum_{l=1}^{n} f(\lambda_l)
$$

In particular, the kth power is given by $X^k = Q\Lambda^k Q^{\top}$ and the exponential by $e^X = Q e^{\Lambda} Q^{\top}$. Note that Eqs. (S5) and (S6) can be approximated using only m leading eigenvalues, which allows for efficient computations. In particular, as a consequence of the Eckart-Young low-rank approximation theorem [10], the error when reconstructing $Y = f(X)$ based on m leading eigenvalues and eigenvectors, provided that X is symmetric and f is analytic, is:

$$
\|Y - \hat{Y}\|_F = \sqrt{\sum_{l=m+1}^{n} f(\lambda_l)^2}
$$

where $\hat{Y}$ is the reconstructed matrix and $\|\cdot\|_F$ is the Frobenius norm. This approximation produces a matrix minimizing the error across all rank-m matrices, and is therefore optimal, provided that the retained terms are those with the largest $|f(\lambda_l)|$. Moreover, if $|f(x)|$ is increasing, the approximation is more accurate for networks with heterogeneous spectra, i.e., when some eigenvalues are much larger (in absolute value) than others, which is typical of real-world networks.
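A NumPy check of the Eckart-Young statement for f = exp on a random symmetric matrix: with all n eigenpairs the reconstruction matches `scipy.linalg.expm`, and the Frobenius error of the rank-m reconstruction equals the root sum of squares of the discarded f(λ) values. The hypothetical helper keeps the m eigenpairs with the largest |f(λ)|, which for the exponential are the largest algebraic eigenvalues rather than the largest in modulus.

```python
import numpy as np
from scipy.linalg import expm


def truncated_matrix_function(X, f, m):
    """Rank-m approximation of f(X) = Q f(Lambda) Q^T for a real symmetric X.

    Keeps the m eigenpairs with the largest |f(lambda_i)|, which is the
    Eckart-Young-optimal choice for the reconstructed matrix f(X).
    """
    lam, Q = np.linalg.eigh(X)
    f_lam = f(lam)
    keep = np.argsort(-np.abs(f_lam))[:m]
    return (Q[:, keep] * f_lam[keep]) @ Q[:, keep].T


rng = np.random.default_rng(1)
X = rng.normal(size=(8, 8))
X = (X + X.T) / 2                      # symmetric test matrix
Y = expm(X)                            # reference f(X) with f = exp

Y_hat = truncated_matrix_function(X, np.exp, m=3)
f_sorted = np.sort(np.abs(np.exp(np.linalg.eigvalsh(X))))[::-1]
# Observed Frobenius error vs. the value predicted from discarded f(lambda).
print(np.linalg.norm(Y - Y_hat, "fro"), np.sqrt(np.sum(f_sorted[3:] ** 2)))
```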

S5. Computational complexity
Below are computation times for global, local and node-wise DoB measures for the three large networks studied in this paper (Epinions, Slashdot and Wikipedia). Performance was assessed on a laptop with an AMD Ryzen 9 5900HX CPU and 32 GB of RAM. As evident in Fig. S2, all running times were arguably short. Global DoB and balance profiles were calculated in about 1 second or much less. Node-wise measures (for all nodes) were calculated in no more than 16 seconds (in the case of the largest network). All results include both the time needed for solving the eigenproblem(s), which can be cached and reused in multiple computations, and any downstream computations using eigenvalues and eigenvectors. Furthermore, in all cases computation times scale with m in a very similar fashion, with an average slope (in log-log scale) of about 0.61. This indicates that, at least for relatively low values of m, MSB computation times grow only moderately (sub-linearly) as the number of leading eigenpairs increases.
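A slope like the one quoted above is obtained by fitting a straight line to log(time) against log(m); a slope below 1 means sub-linear growth. The snippet below is a minimal sketch using hypothetical, synthetic timings generated from t ∝ m^0.61; it does not reproduce the measurements shown in Fig. S2, and the helper name `loglog_slope` is an assumption.

```python
import numpy as np

def loglog_slope(m_values, times):
    """Least-squares slope of log(time) against log(m).

    A slope below 1 means running time grows sub-linearly with m.
    """
    slope, _intercept = np.polyfit(np.log(m_values), np.log(times), deg=1)
    return float(slope)


m_values = np.array([1, 2, 5, 10, 20, 50])
times = 0.3 * m_values ** 0.61  # hypothetical timings following t ∝ m^0.61
print(round(loglog_slope(m_values, times), 2))
```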

Figure 2. Relationship between frustration ratio and structural balance in signed networks. Positive ties are blue and negative ties are red. Different groups are marked with circles.
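A signed network is balanced (in the two-group sense or, for weak balance, the multi-group sense) exactly when some partition leaves no frustrated ties, that is, no negative ties inside groups and no positive ties between them. The minimal sketch below assumes a weighted frustration ratio defined as the total absolute weight of frustrated ties divided by the total absolute weight of all ties. The paper's precise definition is not restated in this section, so treat both this definition and the helper `frustration_ratio` as assumptions.

```python
import numpy as np

def frustration_ratio(A, labels):
    """Weighted fraction of frustrated ties under a partition (assumed definition).

    A tie is frustrated if it is negative within a group or positive between groups.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]     # True for within-group pairs
    frustrated = np.where(same, A < 0, A > 0)
    total = np.abs(A).sum()
    return float(np.abs(A)[frustrated].sum() / total) if total > 0 else 0.0


# Two groups {0, 1} and {2, 3}: positive inside, negative across (balanced).
A = np.array([[0, 1, -1, -1],
              [1, 0, -1, -1],
              [-1, -1, 0, 1],
              [-1, -1, 1, 0]])
print(frustration_ratio(A, [0, 0, 1, 1]))  # the "right" partition: no frustration
print(frustration_ratio(A, [0, 0, 0, 0]))  # one big group: all negative ties frustrated
```

Under this definition the first partition gives 0, while lumping everyone into one group leaves 8 of the 12 (directed-entry) units of weight frustrated.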

Figure 3. Relationship between cycles and semicycles. (A) Symmetric (reciprocated) dyads generate two (semi)cycles. (B) Asymmetric dyads generate no (semi)cycles, since a semicycle of the form i − j − i would have to cross the directed i → j link twice. (C) An example of the connection between directed cycles and semicycles in signed networks. A single directed triad can generate several different balanced and unbalanced 2- and 3-semicycles (marked here with two-way arrows).

Figure 4. Contribution (top) and k-balance (bottom) profiles in four real networks studied by Estrada and Benzi [12] (see Methods, Sec. 4.6, for dataset descriptions), based on the WB (β = 1) and MSB (βmax) approaches. Approximations based on m = 10 leading eigenvalues from both ends of the spectrum were used.
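For orientation, the sketch below computes a plain walk-based k-balance profile: the fraction of closed walks of length k that are balanced. If B_k and U_k count balanced and unbalanced closed k-walks, then tr A^k = B_k − U_k and tr |A|^k = B_k + U_k, so the fraction is (tr |A|^k + tr A^k) / (2 tr |A|^k). This is an assumed, simplified definition that omits the semiwalk construction and β-weighting of MSB; the helper `k_balance_profile` is hypothetical.

```python
import numpy as np

def k_balance_profile(A, k_max):
    """Fraction of balanced closed walks of length k, for k = 2..k_max.

    Returns {k: value}; value is nan when there are no closed k-walks.
    """
    A = np.asarray(A, dtype=float)
    A_abs = np.abs(A)
    profile = {}
    for k in range(2, k_max + 1):
        signed = np.trace(np.linalg.matrix_power(A, k))     # B_k - U_k
        total = np.trace(np.linalg.matrix_power(A_abs, k))  # B_k + U_k
        profile[k] = (total + signed) / (2 * total) if total > 0 else float("nan")
    return profile


# Unbalanced triangle: two positive ties and one negative tie.
A = np.array([[0, 1, 1],
              [1, 0, -1],
              [1, -1, 0]])
print(k_balance_profile(A, 5))
```

For this triangle every odd-length closed walk wraps around the unbalanced cycle an odd number of times, so odd k give 0 and even k give 1, which shows how such profiles separate short-range from long-range patterns.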

Figure 5. Semiwalks in directed signed networks. Positive and negative semiwalks passing through symmetric dyads with opposite edge signs cancel each other out.
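One simple way to see the cancellation is to ignore link directions by symmetrizing the signed adjacency matrix, S = A + Aᵀ, alongside its unsigned counterpart |A| + |A|ᵀ. This is a minimal sketch under that assumption; the helper `semiwalk_matrices` is hypothetical and not necessarily how the paper builds its semiwalk matrices.

```python
import numpy as np

def semiwalk_matrices(A):
    """Direction-agnostic matrices for a directed signed adjacency matrix A.

    S sums signed links in both directions (a +1 and a -1 link in the same
    dyad cancel to 0); S_abs counts all links regardless of sign.
    """
    A = np.asarray(A, dtype=float)
    S = A + A.T
    S_abs = np.abs(A) + np.abs(A).T
    return S, S_abs


# 0 -> 1 is positive and 1 -> 0 is negative (opposite-signed symmetric dyad);
# 1 -> 2 is a single positive link (asymmetric dyad).
A = np.array([[0, 1, 0],
              [-1, 0, 1],
              [0, 0, 0]])
S, S_abs = semiwalk_matrices(A)
print(S[0, 1], S_abs[0, 1])  # signed entry cancels; unsigned entry counts both links
```

Each step of a semiwalk may use any link between two nodes in either direction, so entries of S^k and S_abs^k sum signed and unsigned semiwalks of length k, respectively.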

Figure 6. Re-analysis of Sampson's Monastery networks using the MSB approach. Full spectra were used in computations (exact results). (A) Signed sociograms at times t = 2, 3, 4. Left-side colors denote block membership according to the "ground truth" partition and right-side colors correspond to MSB partitions. Positive ties are blue and negative ties are red. Individuals whose "ground truth" and MSB block memberships differ (Amand, Basil and John Bosco), as well as the leaders of the Young Turks (John Bosco and Gregory), are labelled. The network layout was determined with the Kamada-Kawai algorithm using only positive ties, with weights (distances) on cross-block ties rescaled by a factor of 5. (B) Time series of strong and weak DoB measures for t = 1, ..., 5 using MSB, as well as strong DoB based on the WB approach of Estrada and Benzi [12], which is equivalent to the MSB approach with β = 1 using the ordinary adjacency matrix. (C) Weak local balance expressed as z-scores relative to the overall distribution. Points are sized proportionally to local contributions and ordered first by block membership and then by balance scores. Members who remained at the monastery after the culmination of the conflict (t = 5) are marked with red labels on the subplot for t = 4. (D) Time series of frustration ratios for t = 1, ..., 5 according to partitions obtained with the MSB and WB (β = 1) approaches, as well as the "ground truth" solution (which is defined only for t = 2, 3, 4). $F^{U}_{\mathrm{MSB}}$ denotes frustration values obtained with the unweighted MSB approach.
As Fig. 6A shows, over time John Bosco established more positive connections with the Outcasts and developed negative feelings towards Gregory. At the same time, the core of the Loyal Opposition strengthened its internal connections and became very cohesive at time t = 4, as indicated by the high weak local balance scores of most of the individuals with red labels in Fig. 6C.

Figure 7. Polarization in the U.S. Congress between the 93rd and 114th Congresses. Panels are divided into regions corresponding to successive White House administrations, with colors denoting Republican (red) and Democratic (blue) presidents. Approximations based on m = 10 leading eigenpairs from both ends of the spectrum were used. From top to bottom: (1st panel) strong DoB time series based on the MSB approach, B(βmax), and on WB of Estrada and Benzi [12], B(1); (2nd panel) frustration ratios for the best partitions into 2 clusters, F(2), general partitions minimizing F(B), and partitions based on party affiliation, F(P); (3rd panel) similarity between party-based partitions and best bipartitions, AMI(2, P), as well as best partitions into k clusters, AMI(B, P), measured with the Adjusted Mutual Information (AMI) score [47], where values closer to 1 indicate a better match between two clustering solutions; (4th panel) the number of clusters in the solution minimizing F(B) (left y-axis) and the fraction of nodes within the two largest clusters (right y-axis).
2.2.4 and 4.2). Let $D^S_{ij} = 1 - b_{ij}(\beta_{\max})$ and $D^W_{ij} = 1 - w_{ij}(\beta_{\max})$ be pairwise dissimilarity matrices (with $D^S_{ii} = D^W_{ii} := 0$) based on the notions of strong and weak balance, respectively, and let $N_b$ be the maximum number of clusters one is willing to consider. Then, for $b = 1, \dots, N_b$:

Figure S1. Effects of MSB approximations assessed using the bill co-sponsorship network from the U.S. Senate during the 114th Congress (|V| = 100, |E| = 3696), as well as its randomized counterparts based on the Erdős-Rényi model and the configuration model [34]. (A) Contribution profiles are almost identical for the original and randomized networks. Crucially, at β = βmax almost all of the cumulative contribution score is driven by the leading low-order terms, meaning that higher-order terms (roughly k > 10) can be safely omitted. The inset plot presents the same data on a log-log scale to better display the tail of the distribution. (B) Errors of DoB measures based on leading eigenvalues, calculated relative to values obtained using full spectra. Errors are typically low even for m = 1 and in all cases quickly decrease as m increases. Moreover, in most cases they are lower for the real network than for its randomized counterparts based on the Erdős-Rényi and configuration models, which is consistent with the expectation that errors are lower for networks with heterogeneous distributions of eigenvalues.

Figure S2. Running times of global, local and node-wise DoB measures (both strong and weak). Lines correspond to median times (over 10 repetitions) and bounds to the 1st and 9th deciles.