Identifying hidden coalitions in the US House of Representatives by optimally partitioning signed networks based on generalized balance

In network science, identifying optimal partitions of a signed network into internally cohesive and mutually divisive clusters based on generalized balance theory is computationally challenging. We reformulate and generalize two binary linear programming models that tackle this challenge, demonstrating their practicality by applying them to partition signed networks of collaboration and opposition in the US House of Representatives. These models guarantee a globally optimal network partition and can be practically applied to signed networks containing up to 30,000 edges. In the US House context, we find that a three-cluster partition is better than a conventional two-cluster partition, where the otherwise hidden third coalition is composed of highly effective legislators who are ideologically aligned with the majority party.


Introduction
Signed networks, in which nodes can be connected by positive or negative ties, occur in many contexts. To identify communities in signed networks it is often useful to put the nodes into clusters so that most positive ties are within clusters, while most negative ties are between clusters. Identifying clusters of nodes that optimally meet these criteria is computationally challenging, but we present practical methods for doing so. Applying these new global optimization methods to signed networks of the US House of Representatives shows that legislators are actually organized into three coalitions whose ideological composition offers new insights on the otherwise obscured interplay between partisanship and legislative effectiveness.
Signed networks are studied in a diverse range of contexts in both the natural [1-3] and social [4-7] sciences. Across these contexts, it is often of interest to identify clusters of nodes that are internally cohesive and mutually divisive, and thus partially satisfy the conditions of generalized balance [8-11]. Recent computational work on signed network analysis has focused on determining the network's level of balance in general [12-15], and in the context of signed graphs with node attributes [16,17]. However, although optimization-based methods exist for estimating a network's level of balance [18] by heuristically partitioning it into k = 2 clusters [13] or computing its exact level of balance by optimally partitioning it into k = 2 clusters [2,19,20], identifying an optimal partition of nodes into k ≥ 2 clusters that corresponds to the network's level of k-balance (a.k.a. weak balance, generalized balance, and clusterability [8]) has remained a challenge. This computational challenge involves solving fundamental non-deterministic polynomial-time hard (NP-hard) graph optimization problems to global optimality [19,21-23].
A common misconception about solving NP-hard optimization problems is that they can be addressed using "only heuristic methods" [24]. Previous work in this area has used a modified concept of network modularity to incorporate signed edges into a modularity maximization procedure [24,25]. These studies used a tabu search heuristic algorithm on a signed network with 1131 edges [25] and a simulated annealing heuristic algorithm on a signed network with 2517 edges [24], in each case settling for sub-optimal partitions whose distance from optimality remains unknown. Unlike modularity, the concept of frustration [15,26] requires no modification for application in signed networks because it originates from Ising models of atomic magnets in which couplings of opposite nature exist [27], which are analogous to signed ties. Using frustration and two mathematical optimization models, we propose and demonstrate a general method for finding a globally optimal partition of signed networks into k ≥ 2 clusters.
Identifying an optimal partition of nodes into internally cohesive and mutually divisive clusters involves two computational challenges. The first challenge is finding a k-partition of a signed network, placing nodes into k clusters that minimize intra-cluster negative and inter-cluster positive edges (frustrated edges), where k is selected in advance [19]. The second challenge is finding the smallest number of clusters $k^*_{\min}$ that minimizes frustrated edges among all partitions across all values of k. These challenges are distinct from, but conceptually analogous to, related challenges in community detection in unsigned networks: it is difficult to find a modularity-maximizing partition into a specific number of clusters, but even harder to find the modularity-maximizing partition into any number of clusters [28]. We solve the first challenge by generalizing a mathematical programming model for finding an optimal 2-partition [15,19] and introducing a generalized model to find optimal k-partitions. Then, we tackle the second challenge by reformulating another mathematical model [22] for non-complete graphs and solving it without providing the number of clusters.
We demonstrate the practicality of these methods, and illustrate how they can generate novel insights, by applying them to signed networks of political collaboration and opposition in the US House of Representatives from 1981 to 2018. Research on and descriptions of the US House usually place legislators into clusters defined by legislators' political party affiliations. However, reliance on a simple binary attribute risks oversimplifying this complex system because it ignores information about the positive and negative interactions between individual legislators. We explore whether placing legislators into optimal clusters defined by their interactions, rather than simply by their parties, better captures the coalitional structure of the chamber. We find that the best fitting parsimonious solution places legislators into three clusters characterized by a large liberal coalition, a large conservative coalition, and a smaller ideologically fluid coalition. Interestingly, we find that members of this ideologically fluid third coalition are substantially more effective at passing legislation than members of either dominant coalition. These findings suggest that, although political parties are clearly influential in US politics, some of the heavy lifting in the US House is done by a small splinter coalition of highly effective legislators who are ideologically aligned, but not necessarily collaborating, with members of the majority party's core.

Partitioning signed networks
In this section, after introducing notions of k-balance and signed networks, we propose two related mathematical models. The first model finds an optimal partition of nodes into exactly k clusters. The second model finds an optimal partition across all possible partitions. When used together, these models guarantee a globally optimal partition and provide the smallest number of clusters according to generalized balance.

Preliminaries
A signed network is an undirected simple graph with positive and negative signs on the edges, usually denoted as G = (V, E, σ), where V and E are the sets of nodes and edges respectively, and σ is the sign function σ : E → {−1, +1}. Graph G contains |V| = n nodes and its symmetric signed adjacency matrix is denoted by A. The set E of edges contains $m^-$ negative edges and $m^+$ positive edges, adding up to a total of $|E| = m = m^+ + m^-$ undirected signed edges. An edge with endpoints i and j is represented by (i, j) such that i < j. Given a signed graph G, a partition divides the node set V into disjoint subsets (clusters) whose union is V (i.e. every node belongs to exactly one subset).
Balance theory was conceptualized in the 1940s in the context of social psychology [29], recast in graph theoretic terms in the 1950s [18], and generalized in the 1960s [8]. Whereas classic balance holds that a signed network can be partitioned into up to two clusters [18], generalized balance holds that it can be partitioned into any number of clusters. Generalized balance theory allows a more flexible structural decomposition of networked systems, which in turn offers a more nuanced view of polarization in social and political systems [30-32]. According to generalized balance theory, a signed network is k-balanced (i.e. clusterable) if its nodes can be partitioned into k clusters (or "coalitions" [33]) such that each positive edge joins nodes belonging to the same cluster, and each negative edge joins nodes belonging to different clusters [8]. Edges that fail to meet these criteria (i.e. a negative edge within a cluster, or a positive edge between clusters) are called frustrated edges under that partition.
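The definition of a frustrated edge can be made concrete with a short sketch. The function name, the edge-list representation, and the toy graph below are illustrative assumptions and do not come from the paper's code.

```python
def count_frustrated(edges, cluster_of):
    """Count frustrated edges of a signed graph under a given partition.

    edges: iterable of (i, j, sign) with sign in {+1, -1}
    cluster_of: dict mapping each node to its cluster label
    """
    frustrated = 0
    for i, j, sign in edges:
        same = cluster_of[i] == cluster_of[j]
        # A positive edge is frustrated when it runs between clusters;
        # a negative edge is frustrated when it lies within a cluster.
        if (sign > 0 and not same) or (sign < 0 and same):
            frustrated += 1
    return frustrated


# Hypothetical toy graph: a triangle with one negative edge plus a pendant node.
edges = [(1, 2, +1), (2, 3, +1), (1, 3, -1), (3, 4, -1)]
one_cluster = {1: "a", 2: "a", 3: "a", 4: "a"}
two_clusters = {1: "a", 2: "a", 3: "a", 4: "b"}
print(count_frustrated(edges, one_cluster))   # 2
print(count_frustrated(edges, two_clusters))  # 1
```

Moving node 4 into its own cluster removes the frustration of the negative edge (3, 4), while the negative edge (1, 3) inside the triangle remains frustrated.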
Generalized balance in empirical signed networks can be analyzed by measuring their distance to clusterability [9,11,19]. The distance of a given network G to clusterability can be quantified as the minimum number of frustrated edges among all possible partitions into k clusters (the k-clusterability index $C_k(G)$ [11]) or the minimum number of frustrated edges among all possible partitions with any number of clusters 1 ≤ k ≤ n (the clusterability index C(G) [9]). Obtaining these measures requires intensive computation because the underlying problems are NP-hard [21].
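For very small graphs, both indices can be obtained by exhaustive enumeration, which makes the definitions tangible and also shows why enumeration does not scale: the number of assignments grows as k^n. The sketch below is an illustration only, not the method proposed in this paper; it assumes that clusters may be left empty, so it searches over partitions into at most k clusters. The function names and the toy graph are hypothetical.

```python
from itertools import product


def frustration(edges, labels):
    # An edge is frustrated when "is positive" disagrees with "same cluster".
    return sum((s > 0) != (labels[i] == labels[j]) for i, j, s in edges)


def k_clusterability(nodes, edges, k):
    """Minimum number of frustrated edges over all assignments to k labels."""
    best = len(edges)
    for assignment in product(range(k), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        best = min(best, frustration(edges, labels))
    return best


def clusterability(nodes, edges):
    """Minimum over any number of clusters (at most one cluster per node)."""
    return k_clusterability(nodes, edges, len(nodes))


# Hypothetical toy graph: an all-negative triangle with one positive pendant edge.
nodes = [1, 2, 3, 4]
edges = [(1, 2, -1), (2, 3, -1), (1, 3, -1), (3, 4, +1)]
for k in (1, 2, 3):
    print(k, k_clusterability(nodes, edges, k))  # 1 3, then 2 1, then 3 0
print(clusterability(nodes, edges))              # 0
```

The all-negative triangle cannot be split into two clusters without frustration, but three clusters suffice, which is the kind of gain that generalized balance captures and classic balance misses.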

Finding an optimal k-partition and the k-clusterability index
We formulate an optimization model that computes the k-clusterability index of an input signed network in its optimal objective function. In a given feasible solution of the optimization problem, each node belongs to one of a set of k clusters C = {1, 2, ..., k}. The binary decision variable $x_{ic}$ takes the value 1 if node i ∈ V belongs to cluster c ∈ C (and $x_{ic} = 0$ otherwise). We consider that a positive edge $(i, j) \in E^+$ is frustrated (indicated by $f_{ij} = 1$) if its endpoints i and j are in different clusters; otherwise it is not frustrated (indicated by $f_{ij} = 0$). A negative edge $(i, j) \in E^-$ is frustrated (indicated by $f_{ij} = 1$) if its endpoints i and j are in the same cluster; otherwise it is not frustrated (indicated by $f_{ij} = 0$).
Using the binary decision variable $x_{ic}$, we formulate the process to find an optimal k-partition and compute the k-clusterability index as the binary linear programming model in Eq. (1). The model in Eq. (1) is an extension of a model based on classic balance which provides an optimal 2-partition and computes the 2-clusterability index (a.k.a. the frustration index) of a signed network [15,19].
The objective function in Eq. (1) computes the minimum number of frustrated edges among all k-partitions. The first set of constraints in Eq. (1) ensures that each node belongs to precisely one cluster. The second and third sets of constraints formulate the relationship between the frustration of an edge (left-hand side) and the cluster membership of the endpoints of that edge (right-hand side) for positive edges and negative edges, respectively. Refer to the Supplementary Information for more details and an illustrative numerical example on how the k-partitioning model in Eq. (1) works.
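Eq. (1) itself is not reproduced in this text. A reconstruction consistent with the description above is sketched below. The objective and the positive-edge constraint match the forms quoted in the numerical example later in this document; the negative-edge constraint is an assumption, chosen so that it forces $f_{ij} = 1$ exactly when both endpoints are assigned to the same cluster c.

```latex
\begin{aligned}
\min \quad & \sum_{(i,j)\in E} f_{ij} \\
\text{s.t.} \quad & \sum_{c\in C} x_{ic} = 1 && \forall i \in V \\
& f_{ij} \ge x_{ic} - x_{jc} && \forall (i,j)\in E^{+},\ \forall c\in C \\
& f_{ij} \ge x_{ic} + x_{jc} - 1 && \forall (i,j)\in E^{-},\ \forall c\in C \\
& x_{ic}\in\{0,1\},\quad f_{ij}\in\{0,1\}
\end{aligned}
```

Because the objective minimizes the sum of the $f_{ij}$, each $f_{ij}$ settles at the smallest value its constraints permit, so it equals 1 only for frustrated edges.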

Finding an optimal partition and the clusterability index
The more general problem of finding an optimal partition without specifying k and computing the clusterability index of a signed network G is known as the Correlation Clustering problem [21] (and the Clique Partitioning problem if the graph is complete [34]). We reformulate the mathematical model initially proposed in [22], which is defined in the context of complete graphs and widely used in the literature [35-38], as follows. For every pair of nodes i, j, i < j, we define the binary decision variable $y_{ij}$, which takes the value 1 if i and j belong to the same cluster and takes the value 0 otherwise. The objective of the resulting model, Eq. (2), is
$$\min \sum_{(i,j)\in E} \left[ \frac{a_{ij}(a_{ij}+1)}{2} - a_{ij}\, y_{ij} \right],$$
subject to constraints defined over a set T of node triples.
The model in Eq. (2) uses these binary variables to count the frustrated edges in the objective function. In Eq. (2), the term $a_{ij}$ represents the entry of the input graph's adjacency matrix A associated with the pair of nodes i, j ∈ V. To efficiently handle possibly non-complete graphs, we use the set T for the constraints of the model in Eq. (2), where T denotes the set of all node triples with at least one edge between two of them. Refer to the Supplementary Information for more details and an illustrative numerical example on how the partitioning model in Eq. (2) works.
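The constraints of Eq. (2) are not reproduced in this text. Formulations of this family conventionally use transitivity (triangle) constraints, so the sketch below assumes that form rather than quoting the paper; the restriction to triples in T follows the description above. It also spells out how the objective's summand counts frustration for each edge sign.

```latex
\begin{aligned}
& a_{ij} = +1:\quad \tfrac{a_{ij}(a_{ij}+1)}{2} - a_{ij}\,y_{ij} = 1 - y_{ij}
  && \text{(frustrated when } i, j \text{ are separated)} \\
& a_{ij} = -1:\quad \tfrac{a_{ij}(a_{ij}+1)}{2} - a_{ij}\,y_{ij} = y_{ij}
  && \text{(frustrated when } i, j \text{ share a cluster)} \\[4pt]
& y_{ij} + y_{jk} - y_{ik} \le 1, \quad
  y_{ij} + y_{ik} - y_{jk} \le 1, \quad
  y_{ik} + y_{jk} - y_{ij} \le 1
  && \forall (i,j,k) \in T \\
& y_{ij} \in \{0,1\} && \forall i, j \in V,\ i < j
\end{aligned}
```

Under this assumed form, each inequality states that if two of the three pairs in a triple share a cluster, the third pair must share it too, which is what makes the $y_{ij}$ values describe a valid partition.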
Although we use both models in Eq. (1)-(2), they can be used independently of each other. Under the assumption that k ≪ n, our proposed model in Eq. (1) is less computationally intensive than the model proposed by [22], which we have reformulated in Eq. (2). Although the number of variables scales as $O(n^2)$ in both models, the number of constraints grows quadratically, $O(n^2)$, in Eq. (1) but cubically, $O(n^3)$, in Eq. (2).
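The difference in growth can be illustrated by counting variables and constraints for a given graph. The sketch below rests on assumptions not confirmed by this text: that Eq. (1) has one $x_{ic}$ per node and cluster, one $f_{ij}$ per edge, one assignment constraint per node, and one frustration constraint per edge and cluster; and that Eq. (2) has one $y_{ij}$ per node pair and three constraints per triple in T. The function name and toy graph are hypothetical.

```python
from itertools import combinations


def model_sizes(nodes, edges, k):
    """Count variables and constraints under the assumed model structures."""
    n, m = len(nodes), len(edges)
    edge_set = {frozenset(e) for e in edges}
    # T: node triples with at least one edge between two of their members.
    triples = sum(
        1
        for t in combinations(nodes, 3)
        if any(frozenset(p) in edge_set for p in combinations(t, 2))
    )
    return {
        "eq1_variables": n * k + m,
        "eq1_constraints": n + m * k,
        "eq2_variables": n * (n - 1) // 2,
        "eq2_constraints": 3 * triples,
    }


# Hypothetical toy graph: a path on five nodes (edge signs do not affect the counts).
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(model_sizes(nodes, edges, k=3))
```

On this toy path, nine of the ten possible triples contain an edge, so the assumed Eq. (2) needs 27 constraints against 17 for the assumed Eq. (1) with k = 3; the gap widens quickly as n grows.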
These models can be used for optimally partitioning any signed network into internally cohesive and mutually divisive clusters based on generalized balance. However, they do not necessarily yield a single unique partition because multiple optimal solutions may exist (see the Supplementary Information for more details). Despite this potential multiplicity, these models offer two kinds of advantages over existing methods with similar goals. First, unlike heuristic partitioning methods, which may provide only locally optimal partitions [24], the partitions identified by these models come with a guarantee of global optimality, meaning that no better partition exists. Second, unlike other optimal partitioning methods that have been applied to small [23,35] or complete [21] signed networks, these models can be practically solved even for networks of considerable size and order, and for networks that are not complete, which are typical in social contexts. In the next section we demonstrate their practicality using networks with up to 30,000 edges. We solve the optimization models in Eq. (1) and (2) to global optimality using Gurobi solver (version 9.1) [39] on a virtual machine with 32 Intel Xeon CPU E7-8890 v3 @ 2.50 GHz processors running 64-bit Microsoft Windows Server 2019 R2 Standard.

Partitioning the US House networks
In the previous section we generalized one model and reformulated another model that together guarantee a globally optimal partition of a signed network according to generalized balance. In this section, we show that they are computationally feasible and can be solved in a practical amount of time. To illustrate their practicality, we apply them to 19 networks varying in size, density, and structure that represent political collaborations in the US House of Representatives in different eras. Although these networks are not 'large' compared to some networks (n ∼ 445, 4954 ≤ m ≤ 31936), they are large by comparison to the size of signed networks for which globally optimal partitions have been obtained before [23,35,40].

Optimal coalitions
We compare several ways to partition US House legislators into clusters or "coalitions" [33], with the goal of determining the optimal number and the composition of these coalitions. The fitness of a given partition is indicated by its associated number of frustrated edges. The conventional method is to partition legislators into coalitions based on their party affiliations, while here we also explore partitioning legislators into coalitions by applying the optimization models in Eq. (1)-(2) to signed networks of their collaborations and oppositions. Throughout our application of these models in the US House context, we use the term "coalition" to refer to the clusters of legislators within a partition, however the partition is obtained, not only because it is commonly used in political contexts, but also because it was the term suggested for signed network partitions by Harary and Kabell [33]. Legislators' memberships in these coalitions depend on either an attribute (e.g. their political party affiliation) or the solution to Eq. (1)-(2), but do not necessarily imply their cohesion with other members of the same coalition.
Fig. 2 illustrates the number of frustrated edges (y-axis) for partitions based on party affiliations and optimal k-partitions for k ∈ {2, 3, ..., 7} (x-axis) in signed US House networks (see SI Table 1). The number of frustrated edges for a party-based partition (denoted by $C_{\text{party}}(G)$) is considerably larger than that of an optimal 2-partition. This implies that defining coalitions simply in terms of legislators' party affiliations leads to many frustrated edges, and therefore to a poor description of the coalition structure of the chamber. The number of frustrated edges decreases further from k = 2 to k = 3, which implies that defining coalitions in terms of classic balance still leads to many frustrated edges and thus a poorer fit than defining coalitions in terms of generalized balance. For k > 3 there is only marginal decline, and then stagnation, in the number of frustrated edges. Substantively, these results suggest that the signed US House networks are better described by a partition into k > 2 coalitions than by a more conventional partition into only two coalitions [20].
Fig. 2 also reveals the changes over different eras of the House (e.g. sessions with start years 1981-1993 in darker blue-purple shades and 2003-2017 sessions in lighter green-yellow shades). Party-based partitions offer a better fit (i.e. fewer frustrated edges) in recent sessions than in earlier sessions due to increases in partisanship [5,20,41]. However, despite changes in the level of partisanship over time, for every session [...]
Because the results from Fig. 2 only cover a small range of k, a natural question is whether the fit could be improved further by using larger values of k. Finding the answer is not practically feasible using only the model in Eq. (1). Therefore, we solve the model in Eq. (2) for each network and compare the resulting clusterability index C(G) with the k-clusterability index $C_k(G)$.
Through this comparison, we verify that further decline in the frustrated edges is not possible because among all 19 networks, $C(G) = C_k(G)$ at k ≤ 7. The legend of Fig. 2 shows for each network the exact point of stagnation $k^*_{\min}$, which is the smallest number of clusters that minimizes the k-clusterability index across all values of k: $k^*_{\min} = \arg\min_{1 \le i \le n} C_i(G)$.
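Given a table of k-clusterability values, the point of stagnation is the smallest k that attains the minimum. The sketch below illustrates that tie-breaking rule; the function name is an assumption and the values are hypothetical, not results from the paper.

```python
def smallest_optimal_k(c_by_k):
    """Return the smallest k whose C_k(G) equals the minimum over all k.

    c_by_k: dict mapping k to the k-clusterability index C_k(G)
    """
    best = min(c_by_k.values())
    return min(k for k, c in c_by_k.items() if c == best)


# Hypothetical values for illustration only.
c_by_k = {1: 40, 2: 12, 3: 7, 4: 5, 5: 5, 6: 5, 7: 5}
print(smallest_optimal_k(c_by_k))  # 4
```

In this hypothetical case the index stops improving at k = 4, so larger values of k only add clusters without removing any frustrated edge.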

Coalition ideology
Having identified several ways to assign legislators to coalitions in the US House, including optimal k-partitions and optimal partitions, we now examine the ideological compositions of coalitions defined from three perspectives: party, classic balance (k = 2), and generalized balance (k = 3). Although we found that $3 \le k^*_{\min} \le 7$, in the remaining substantive analyses we focus on the 3-partition in the generalized balance case because k > 3 offers only small improvements in fit and therefore k = 3 offers a reasonable trade-off between fit and parsimony (see SI Figures 5 & 6). Fig. 3 displays the distribution of coalition members' ideology, for each method of defining coalitions (see SI Table 2). Coalitions with left-leaning liberal ideologies are shaded in blue, while coalitions with right-leaning conservative ideologies are shaded in red; the solid vertical lines indicate a coalition's median ideology.
Partitioning legislators into coalitions based on their political party affiliations (Fig. 3, left column) is the conventional approach in political science, and here displays the familiar pattern of increasing ideological polarization. Partitioning legislators based on classic balance (Fig. 3, center column) offers a more data-driven classification because legislators' coalition memberships are based on their collaborative and oppositional interactions, but is still restrictive because it allows a maximum of two coalitions. The classic balance coalitions display similar ideological distributions to those based on political party: increasing liberal-conservative ideological polarization.
Partitioning legislators into 3 coalitions based on generalized balance (Fig. 3, right column) also offers a data-driven classification, but allows more nuance. Like the other partitions, the generalized balance partition is characterized by a large liberal coalition and a large conservative coalition that diverge over time. However, it also includes a smaller and ideologically fluid coalition shaded in green. In the 435-member chamber, this 'third coalition' ranges in size from only 4 members in the 113th session (2013) to 69 members in the 111th session (2009). It also ranges in ideology from very liberal in the 98th-102nd sessions (1983-1991), to center-left in the 103rd and 111th sessions (1993 and 2009), to center-right in the 105th-110th sessions (1997-2007).

Coalition effectiveness
The primary task of legislators is to pass laws, and their ability to do so is referred to as legislative effectiveness [42-44]. Therefore, we examine the legislative effectiveness of coalitions in the US House of Representatives, again considering coalitions defined from three perspectives: party, classic balance (k = 2), and generalized balance (k = 3). Fig. 4 displays coalition members' mean effectiveness, for each method of defining coalitions (see SI Table 2). The left-leaning liberal coalition is shown as a blue line and the right-leaning conservative coalition is shown as a red line. Gray bands illustrate the 95% confidence interval around each estimate, while the blue (Democrat) and red (Republican) backgrounds indicate the majority party in a given session.
Coalitions based on political parties (Fig. 4, top panel) illustrate an expected pattern [45]: the majority party is most effective. This occurs not only because the majority party has more votes, but because it controls key procedural details of the chamber including deciding which bills will come for a vote and when (i.e. agenda-setting power [44]). Coalitions based on classic balance (Fig. 4, center panel) display essentially the same pattern.
Coalitions based on generalized balance (Fig. 4, bottom panel) also display a similar pattern, but with important differences. The large liberal coalition is still more effective when Democrats hold the majority, while the large conservative coalition is still more effective when Republicans hold the majority. However, these two dominant coalitions are both less effective than their party- or classic balance-defined counterparts. These lower levels of effectiveness are explained by the inclusion of the third coalition, shown as a green line, which is the most effective coalition in most sessions. The size and color of the dots along this green line indicate the third coalition's size and median ideology, and highlight that members of the third coalition usually are ideologically aligned with the majority party.
During transitional periods when the majority party changed, members of the third coalition are temporarily less effective. However, during periods of stable party control [46], the highly effective third coalition has been anchored by a small number of consistent and ultra-effective members. For example, the liberal-leaning third coalition during the Democratic-controlled 99th-102nd sessions (1985-1990) [...]
Not only are members of the third coalition more effective than their traditional liberal and conservative coalition counterparts, but they also maintain distinctive political relations. Members of the traditional coalitions have 2.68 negative edges for every positive edge, but members of the third coalition have 21.18 negative edges for every positive edge (see SI Figure S3). Moreover, although 8.44% of traditional coalition members' negative edges are with co-partisans, over one-quarter (25.6%) of third coalition members' negative edges are with co-partisans.

Discussion
Optimally partitioning signed networks according to generalized balance theory is computationally challenging, but often essential to understanding their structure. In this paper, we have developed a solution to this challenge, both demonstrating its computational feasibility and highlighting the novel structural insights that the resulting optimal partitions can reveal. Specifically, we have developed a pair of optimization models that make it practical to partition a signed network into exactly k clusters that minimize the number of frustrated edges across all possible k-partitions (taking 3.3 hours on average for our networks with up to ∼30,000 edges using Eq. (1)), and to identify the smallest number of clusters that minimizes the number of frustrated edges across all possible partitions (taking 14 hours on average for our networks with up to ∼30,000 edges using Eq. (2)). Applying these models to signed networks of collaboration and opposition among legislators in the US House allowed us to determine that these relationships are not structured by legislators' political party affiliations, but instead by a three-coalition system composed of a dominant liberal coalition, a dominant conservative coalition, and a previously obscured 'third coalition.' This hidden third coalition is noteworthy because, although its median ideology is unstable, its members are consistently more effective at passing legislation than their colleagues in either of the dominant coalitions.
Just as community detection algorithms advanced the ability to uncover patterns in unsigned networks a decade ago [28], these models can advance the ability to uncover patterns in signed networks. However, unlike most community detection algorithms for which global optimization is not possible [47], our models guarantee an optimal signed network partition. These innovations are important because signed networks are already studied in a wide range of contexts including biology [1-3], finance [2,4], and politics [5,7,20]. Moreover, statistical models now exist that enable signed networks to be constructed from virtually any empirical bipartite network data [48], making signed networks available for analysis in a still broader range of contexts. The models we propose are perfectly general, but we demonstrated their practicality for globally optimal partitioning of real-world signed networks with up to 30,000 edges. In practice, this is a minor limitation because most empirical signed networks contain fewer edges, and models for constructing signed networks include methods for sparsifying otherwise dense signed networks [48].
In addition to the methodological advances that our optimization models offer in the study of signed networks, our illustrative application of these models has also revealed a new way of thinking about how the US House of Representatives is organized. We observe that partitioning legislators into three coalitions according to generalized balance offers a better fit to their observed pattern of collaborations and oppositions than simply clustering them by political party. This suggests that the forces guiding coalition formation in the US House are more subtle and go beyond partisanship alone, even during periods of extreme polarization.
The previously obscured 'third coalition' we identified is unique in two important respects. First, members of the third coalition are highly effective at passing legislation, which has implications for how a party's majority status is interpreted. Although members of the majority political party always appear to be more effective than members of the minority party, a substantial portion of this apparent majority advantage is conferred by the highly effective members of the third coalition, who tend to be ideologically aligned with the majority. Second, members of the third coalition have a much higher ratio of oppositions (negative edges) to collaborations (positive edges), and maintain more oppositions with members of their own party, which has implications for how membership in the third coalition is interpreted. These patterns suggest that although members of the third coalition may be ideologically aligned with the dominant coalition and majority party, they nonetheless represent a breakaway faction that is highly effective despite its rejection of partisanship. Our ability to identify such a cluster is noteworthy because it provides empirical support for earlier simulation studies suggesting that the introduction of independent legislators to an existing two-party legislature can increase the body's overall legislative effectiveness [49]. Although these simulation studies might have been viewed as hinting at a strategy for reinvigorating democratic systems plagued by partisanship, our findings suggest it may already be in place in the US House of Representatives.

Methods
We infer the collaboration and opposition patterns of legislators from their bill co-sponsorships [5,50,51]. These data begin as a bipartite network B in which legislators are connected to the bills they sponsor in a given session. From this, we construct the bipartite projection P, which captures the number of bills each pair of legislators has co-sponsored together. Finally, we use the Stochastic Degree Sequence Model (SDSM) [51], implemented in the backbone package (version 1.5.0) in R [48,52], to statistically infer a signed network of political collaboration and opposition. The SDSM applies a statistical test to the bipartite projection to yield a signed backbone P in which there exists a positive (negative) edge between each pair of legislators who have co-sponsored statistically significantly more (fewer) bills than expected by chance. The random expectation is obtained from a canonical null model in which bill sponsorship is random, but expected values of both degree sequences of B are preserved. Because the SDSM involves performing a statistical test for each pair of legislators, we ensure a family-wise error rate of α = 0.01 by applying a Holm-Bonferroni correction [53].
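The Holm-Bonferroni step-down procedure mentioned above can be sketched as follows. This is a generic illustration of the correction, not the implementation used by the backbone package; the function name and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.01):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Step-down threshold: alpha / (m - rank), with rank starting at 0.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four legislator pairs.
p = [0.001, 0.2, 0.006, 0.003]
print(holm_bonferroni(p, alpha=0.01))  # [True, False, False, True]
```

The third p-value (0.006) is below α = 0.01 but above its step-down threshold of 0.005, so it is not declared significant; this is how the procedure controls the family-wise error rate across many pairwise tests.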
We measure legislators' ideology using first-dimension Nokken-Poole ideology scores obtained from the Voteview database [54]. These scores are similar to the widely used DW-Nominate ideological scores [55-57], ranging from −1 (liberal) to 1 (conservative), except that they can vary across sessions. We measure legislators' effectiveness using legislative effectiveness scores provided by the Center for Effective Lawmaking at https://thelawmakers.org/data-download. These scores were computed from fifteen indicators constructed from the intersection of three types of bills (commemorative, substantive, or substantive and significant) and five stages of a bill's progression through the legislative life cycle (sponsored, committee action, post-committee action, chamber passage, and becoming law). These fifteen indicators capture the effectiveness of a legislator in advancing their agenda items using methods described by [44], and are normalized so that the mean effectiveness in each session is 1.

Solving the graph optimization models
The proposed optimization models can be solved by mathematical programming solvers that support 0/1 linear programming (binary linear) models. The code for both optimization models will be made available on a GitHub repository at https://github.com/saref/clusterability-index once this paper is published. In the GitHub repository, we provide Python code for using Gurobi solver (version 9.1) to solve the proposed binary linear models and obtain optimal partitions of signed networks into internally cohesive and mutually divisive clusters based on generalized balance theory.

An illustrative numerical example for the k-partitioning model
We provide a numerical example to illustrate how the mathematical programming model in Eq. 1 (in the paper) works (and how it is solved by a branch and bound algorithm). Consider that the model in Eq. 1 is given the example signed graph of Fig. 1 (in the paper) and the pre-defined value of k = 3 for the number of clusters.
The main role of the solver that solves this model is to explore the space of feasible solutions (feasible ways of clustering the input signed graph into k clusters) and find a feasible solution which is associated with the minimum number of frustrated edges. Without loss of generality, we can consider that one step of this optimization process is evaluating the objective function value (the frustration count) for a given feasible solution. The following numerical example explains how the solver handles the model to complete this step and move forward if needed.
Consider that the optimization solver is to evaluate the frustration count of the partition illustrated in Fig. 1 (B). The non-zero $x_{ic}$ binary decision variables for this partition place nodes 1, 2, and 3 together in one cluster and nodes 4 and 5 together in cluster 2, so that, for instance, $x_{4,2} = 1$ and $x_{5,2} = 1$. Every other $x_{ic}$ variable has to be 0 for these variables to constitute a feasible solution (due to the first set of constraints of the model). The second and third sets of constraints allow the model to determine the frustration status of each edge by quantifying all $f_{ij}$ variables based on the values of the $x_{ic}$ variables for the feasible solution under evaluation.
For the positive edges (1, 3) and (2, 3), the second set of constraints ($f_{ij} \geq x_{ic} - x_{jc} \;\; \forall (i, j) \in E^{+}, \forall c \in C$) is in place. These constraints for the feasible solution in Fig. 1 (B) lead to $f_{ij} \geq 0$. Given the flexibility for taking either binary value, the minimization pressure from the objective function sets the values of $f_{1,3}$ and $f_{2,3}$ to 0. This means that the edges (1, 3) and (2, 3) are not frustrated because they are positive and have the same cluster membership on their endpoints.
For the edge (4, 5), the third set of constraints is in place because it is a negative edge. The constraint associated with c = 2 leads to $f_{4,5} \geq 1$ for the feasible solution in Fig. 1 (B). Therefore, $f_{4,5}$ takes the value 1. This means that the edge (4, 5) is frustrated because it is negative and has the same cluster membership on its endpoints.
Accordingly, the objective function $\sum_{(i,j) \in E} f_{ij}$ is evaluated by the model to 1 for the partition illustrated in Fig. 1 (B). As the linear programming relaxation of the model in Eq. 1 has an optimal objective value of 0 for the signed graph in Fig. 1, the solver does not stop at this feasible solution and continues exploring other feasible solutions.
At some point, it finds the feasible solution for the partition illustrated in Fig. 1 (C). The constraints of the model and the pressure from the minimization objective function lead to all $f_{ij}$ variables taking the value 0. Therefore, the objective function evaluates to 0.
At this stage of the branch and bound process, the upper bound (objective function value of the best feasible solution found so far) and the lower bound (LP relaxation solution) coincide, so the solver stops and reports the partition illustrated in Fig. 1 (C) as an optimal k-partition for the input signed graph and the pre-defined parameter k = 3.
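The evaluation step described in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released code: the signed graph is hypothetical (only the edges (1, 3), (2, 3), and (4, 5) are named in the text; the two other negative edges are invented), the cluster labels are chosen arbitrarily, and the third set of constraints is assumed to take the common form $f_{ij} \geq x_{ic} + x_{jc} - 1$ for negative edges. Exhaustive enumeration stands in for branch and bound, which is only viable for toy instances.

```python
from itertools import product

# Hypothetical signed graph (an assumption made for illustration). Only the
# edges (1, 3), (2, 3) and (4, 5) are named in the text; the other two
# negative edges are invented so that the example is complete.
POSITIVE_EDGES = [(1, 3), (2, 3)]
NEGATIVE_EDGES = [(4, 5), (1, 4), (3, 5)]
NODES = [1, 2, 3, 4, 5]
K = 3  # pre-defined number of clusters


def frustration_count(cluster_of, k=K):
    """Objective value for a fixed assignment {node: cluster label in 1..k}.

    Each f_ij is set to the smallest binary value its constraints allow,
    mimicking the minimization pressure of the objective function.
    """
    clusters = range(1, k + 1)
    total = 0
    for i, j in POSITIVE_EDGES:
        # Second constraint set: f_ij >= x_ic - x_jc for every cluster c.
        bound = max((cluster_of[i] == c) - (cluster_of[j] == c) for c in clusters)
        total += max(0, bound)
    for i, j in NEGATIVE_EDGES:
        # Assumed third constraint set: f_ij >= x_ic + x_jc - 1 for every c.
        bound = max((cluster_of[i] == c) + (cluster_of[j] == c) - 1 for c in clusters)
        total += max(0, bound)
    return total


PARTITION_B = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}  # nodes 4 and 5 share cluster 2
PARTITION_C = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3}  # nodes 4 and 5 are separated

# Exhaustive search over all k^n assignments (a stand-in for branch and bound).
best = min(
    frustration_count(dict(zip(NODES, labels)))
    for labels in product(range(1, K + 1), repeat=len(NODES))
)

print(frustration_count(PARTITION_B), frustration_count(PARTITION_C), best)
```

On this toy graph the first assignment has one frustrated edge, (4, 5), while the second has none, which matches the minimum found by enumeration.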

An illustrative numerical example for the partitioning model
We provide a numerical example to illustrate how the mathematical programming model in Eq. 2 (in the paper) works (and how it is solved by a branch and bound algorithm). Consider that the model in Eq. 2 is given the example signed graph of Fig. 1 (in the paper).
The main role of the solver that solves this model is to explore the space of feasible solutions (feasible ways of clustering the input signed graph into any number of clusters) and find a feasible solution associated with the minimum number of frustrated edges. Without loss of generality, we can consider one step of this optimization process to be evaluating the objective function value (the frustration count) for a given feasible solution. The following numerical example explains how the solver handles the model to complete this step and move forward if needed.
Consider that the optimization solver is to evaluate the frustration count of the partition illustrated in Fig. 1 (B). The non-zero $y_{ij}$ binary decision variables for this partition are as follows: $y_{1,2} = 1$, $y_{1,3} = 1$, $y_{2,3} = 1$, and $y_{4,5} = 1$. Every other $y_{ij}$ is 0 because no other pairs of nodes are in the same cluster.
Note that the term in the objective function for a positive edge is $1 - y_{ij}$ because a positive edge is frustrated when its endpoints are in different clusters. The term in the objective function for a negative edge is $y_{ij}$ because a negative edge is frustrated when its endpoints are in the same cluster.
Given the values of $y_{1,3} = 1$ and $y_{2,3} = 1$, the contribution of positive edges (1, 3) and (2, 3) to the objective function is 0. This means that the positive edges (1, 3) and (2, 3) are not frustrated because they have the same cluster membership on their endpoints in the partition illustrated in Fig. 1 (B).
Given the value of $y_{4,5} = 1$, the negative edge (4, 5) contributes 1 to the objective function. This means that the negative edge (4, 5) is frustrated because it has the same cluster membership on its endpoints. The contribution of all other negative edges is 0 because they all have different cluster memberships on their endpoints in the partition illustrated in Fig. 1 (B).
Accordingly, the objective function $\sum_{(i,j) \in E} a_{ij}\left(\frac{a_{ij}+1}{2}\right) - a_{ij} y_{ij}$ is evaluated by the model to 1 for the partition illustrated in Fig. 1 (B). As the linear programming relaxation of the model in Eq. 2 has an optimal objective value of 0 for the signed graph in Fig. 1, the solver does not stop at this feasible solution and continues exploring other feasible solutions.
At some point, it finds the feasible solution for the partition illustrated in Fig. 1 (C). The non-zero $y_{ij}$ binary decision variables for this partition are as follows: $y_{1,2} = 1$, $y_{1,3} = 1$, and $y_{2,3} = 1$. Every other $y_{ij}$ is 0 because no other pairs of nodes are in the same cluster. The objective function evaluates to 0 because all positive edges have the same cluster membership on their endpoints and all negative edges have different cluster memberships on their endpoints.
At this stage of the branch and bound process, the upper bound (objective function value of the best feasible solution found so far) and the lower bound (LP relaxation solution) coincide, so the solver stops and reports the partition illustrated in Fig. 1 (C) as an optimal partition for the input signed graph.
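The same evaluation can be sketched for the pairwise formulation, where a partition is encoded by which node pairs share a cluster. The snippet below is an illustrative sketch, not the released code: the signed graph is hypothetical (only the edges (1, 3), (2, 3), and (4, 5) are named in the text; the other two negative edges are invented), and enumerating every set partition replaces branch and bound, which is feasible only for very small graphs.

```python
# Hypothetical signed graph: edge (i, j) with i < j mapped to its sign a_ij.
# Only (1, 3), (2, 3) and (4, 5) come from the text; the rest are assumptions.
SIGNED_EDGES = {(1, 3): 1, (2, 3): 1, (4, 5): -1, (1, 4): -1, (3, 5): -1}
NODES = [1, 2, 3, 4, 5]


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for idx in range(len(partial)):
            yield partial[:idx] + [[first] + partial[idx]] + partial[idx + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partial


def co_clustered_pairs(partition):
    """Pairs (i, j), i < j, whose y_ij equals 1 under the given partition."""
    return {(i, j) for block in partition for i in block for j in block if i < j}


def frustration_count(partition):
    """Sum of 1 - y_ij over positive edges and y_ij over negative edges."""
    y = co_clustered_pairs(partition)
    total = 0
    for (i, j), sign in SIGNED_EDGES.items():
        y_ij = int((i, j) in y)
        total += (1 - y_ij) if sign > 0 else y_ij
    return total


PARTITION_B = [[1, 2, 3], [4, 5]]
PARTITION_C = [[1, 2, 3], [4], [5]]

all_partitions = list(set_partitions(NODES))
best = min(frustration_count(p) for p in all_partitions)

print(len(all_partitions), frustration_count(PARTITION_B),
      frustration_count(PARTITION_C), best)
```

Because the number of clusters is not fixed, the search space here is the set of all partitions of the node set, which grows far faster than the toy size suggests; this is why an exact solver is needed for real networks.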

Using Gurobi for solving the proposed optimization models
Our proposed algorithms are developed in Python 3.8 based on the mathematical programming models discussed in the paper which partition signed networks based on generalized balance into an optimal k-partition or an optimal partition without specifying k.
These optimization algorithms are distributed under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. This means that the algorithms can be used for non-commercial purposes provided that proper attribution is given by citing the current article. Copies or adaptations of the algorithms should be released under a similar license.
The following steps outline the process for academics to install the required software (Gurobi solver 39 ) on their computer to be able to solve the optimization models:

Significance, generalizability, and limitations of our computational methods
The computational results we provided have broad relevance because they demonstrate the practical feasibility of solving fundamental NP-hard signed graph partitioning problems. Solving these partitioning problems is essential for exact evaluations of the structure of signed networks, which go beyond the political science application we have demonstrated and have use cases in other fields from biology and physics 1,3,12,[58][59][60] to social sciences 4-7, 20, 61-64. Specifically, our methods for partitioning a signed graph according to generalized balance improve upon heuristic methods that are fast but do not generally yield optimal partitions [36][37][38]. Additionally, our methods improve upon existing methods for obtaining optimal partitions that are only capable of handling small graphs with n ≤ 40 23,35. The correctness of our methods for partitioning signed networks is guaranteed by the branch and bound algorithm of Gurobi 39, which is an exact method for solving binary linear programming models to global optimality.
The sizes of real-world instances we have solved to global optimality are considerable and therefore suggest that our proposed models can be used for a wide variety of other applications with networks of similar and smaller sizes. For example, the network of the 115th session has n = 448 nodes, m = 31,936 edges, and |T| = 9,134,395 node triples with at least one edge. Obtaining an optimal 7-partition using Eq. 1 leads to an optimization model with nk + m = 35,072 binary variables and mk + n = 224,000 constraints, which takes Gurobi 1.66 hours (see footnote 2) to solve. Moreover, obtaining an optimal partition (without specifying k) using Eq. 2 leads to an optimization model with n(n − 1)/2 = 100,128 binary variables and 3|T| = 27,403,185 constraints, which takes Gurobi only 5.28 hours (see footnote 3) to solve. While obtaining these partitions requires a few hours (see footnote 4), the resulting partition is guaranteed to be globally optimal, which is essential for an exact evaluation of the structure of the signed networks under analysis.
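The model sizes quoted for the 115th session follow directly from the counting formulas in the text, and can be checked with a short calculation. The function and key names below are hypothetical, chosen only for this sketch.

```python
def model_sizes(n, m, k, num_triples):
    """Variable and constraint counts quoted in the text for Eq. 1 and Eq. 2."""
    return {
        "eq1_variables": n * k + m,         # nk + m
        "eq1_constraints": m * k + n,       # mk + n
        "eq2_variables": n * (n - 1) // 2,  # n(n - 1)/2
        "eq2_constraints": 3 * num_triples, # 3|T|
    }


# Values reported in the text for the network of the 115th session.
sizes = model_sizes(n=448, m=31_936, k=7, num_triples=9_134_395)
for name, value in sizes.items():
    print(f"{name}: {value:,}")
```

The comparison makes the trade-off visible: the model in Eq. 2 needs no choice of k, but its constraint count scales with the number of node triples rather than with the number of edges.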
As expected from the NP-hardness of the problems, the main limitation of the models in Eqs. 1-2 (in the manuscript) is the size of the network they can handle in a reasonable time. We have demonstrated the practicability of these models for real-world political networks with up to ∼30,000 edges, on the premise that spending a few hours (see footnote 5) is worthwhile to obtain a globally optimal solution for the exact evaluation of the structure of these networks. From a practical standpoint, two factors are relevant for determining whether these computationally intensive models are suitable for a different use case: network properties and processing capabilities. Previous studies suggest that some properties of the input graph, like degree heterogeneity, could be determinant factors of solve time in similar problems 19. Also, structural regularities in networks constructed from empirical data often make them easier to solve compared to synthetic networks (like random graphs) 19. As the Gurobi solver makes use of multiple processing threads to explore the feasible space in parallel, the processing capabilities of the computer that runs the optimization solver could also make an impact. Therefore, our experiments do not guarantee that every network with up to 30,000 edges can be optimally partitioned based on generalized balance within the solve times that we have observed for our real-world instances of US House signed networks.
The computing processor configuration we have used (32 Intel Xeon CPU E7-8890 v3 @ 2.50 GHz processors) and the size of the real networks we have analyzed (m ∼ 30,000) have led to solve times of roughly a few hours (see footnote 6) per instance. Larger networks on the same hardware, or the same networks on less powerful hardware, can be expected to take longer. In such cases, one may consider using a non-zero optimality gap tolerance (MIPGap as a Gurobi parameter 39) to find solutions within a guaranteed proximity of optimality to reduce the solve time.
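To make the effect of such a tolerance concrete, the relative optimality gap can be written as follows. This is a sketch in notation assumed here (the symbols $z_{\mathrm{UB}}$ for the objective value of the best feasible solution found so far and $z_{\mathrm{LB}}$ for the best lower bound do not appear in the paper), following the definition given in Gurobi's documentation of the MIPGap parameter.

```latex
\[
\text{gap} = \frac{\lvert z_{\mathrm{LB}} - z_{\mathrm{UB}} \rvert}{\lvert z_{\mathrm{UB}} \rvert},
\qquad \text{stop as soon as } \text{gap} \le \texttt{MIPGap}.
\]
```

For example, under a tolerance of 0.05, a reported partition with 100 frustrated edges would come with a lower bound of at least 95, so it would be within 5 frustrated edges of the optimum.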

Multiplicity of optimal solutions
There are symmetries in the mathematical formulations for the two models in Eqs. 1-2 (in the manuscript). For example, in Eq. 1, a given 2-partition can be expressed by different feasible solutions (sets of values for decision variables). This is because the cluster labels are interchangeable and could be swapped while the partition remains unchanged. As another example, in Eq. 2, a feasible solution does not necessarily represent a unique partition. This is due to the original formulation 22 in which a pair of non-adjacent nodes a and i may have no decision variables indicating they belong to the same cluster with any of their neighbours (denoted by b and j respectively, $\forall b, j : a_{ab} \neq 0, a_{ij} \neq 0$), i.e., all the decision variables associated with a and i take the value zero. In that case, the same feasible solution could lead to two partitions (with identical fitness) depending on whether nodes a and i are placed in the same or different clusters. Another source of symmetry is the existence of isolated nodes whose optimal cluster membership is arbitrary and therefore not meaningful. When characterizing the composition of clusters in our analyses, we have ignored isolates.
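The label symmetry of Eq. 1 can be illustrated with a small sketch. The helper names and the five-node assignments below are hypothetical; the point is only that permuting cluster labels produces distinct assignments of nodes to labels that all describe one and the same partition.

```python
from itertools import permutations


def relabelings(assignment, k):
    """All label assignments obtained by permuting the k cluster labels.

    `assignment` is a tuple whose entry at position i - 1 is the label
    (1..k) of node i.
    """
    return {
        tuple(perm[label - 1] for label in assignment)
        for perm in permutations(range(1, k + 1))
    }


def as_partition(assignment):
    """Convert a label assignment into a label-free set of node blocks."""
    blocks = {}
    for node, label in enumerate(assignment, start=1):
        blocks.setdefault(label, set()).add(node)
    return frozenset(frozenset(block) for block in blocks.values())


two_clusters = relabelings((1, 1, 1, 2, 2), k=2)
three_clusters = relabelings((1, 1, 1, 2, 3), k=3)

print(len(two_clusters), len(three_clusters))
print(len({as_partition(a) for a in three_clusters}))
```

With k = 2 the two assignments are mirror images of each other, and with three non-empty clusters there are 3! = 6 assignments, yet each group collapses to a single partition once labels are dropped.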
Due to the symmetries outlined above, both optimization models in Eqs. 1-2 generally have multiplicity in their optimal solutions. Finding all optimal solutions to such computationally intensive problems is not practically feasible for large instances. For small instances, however, previous studies have looked at multiplicity of optimal solutions in similar partitioning problems 6,40. Although optimal 2-partitions can be unique in some small real-world signed networks 6, more often multiple optimal solutions exist [6, their Fig. S1]. Also, in the case of small complete random signed graphs, multiple optimal solutions may exist 40. Due to the practical complexity of these problems and the size of empirical networks we consider, although we cannot find and analyze all optimal partitions, it is certain that the optimal partitions are not unique. Future work is needed to find practical methods for finding and analyzing all optimal partitions of such large networks.

Footnote 2: 23.58 minutes when solving this instance again using Gurobi 9.5.2 on a laptop with 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz and 64 GB of RAM running Windows 10 Home.
Footnote 3: only 10.28 minutes when solving this instance again using Gurobi 9.5.2 on a laptop with 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz and 64 GB of RAM running Windows 10 Home.
Footnotes 4-6: a few minutes; using Gurobi 9.5.2 on a laptop with 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz and 64 GB of RAM running Windows 10 Home.

Oppositional ties of the splinter coalition
Members of the third coalition have 21.18 negative ties for every positive tie, which is substantially different from the members of the traditional coalitions, who have on average 2.68 negative ties for every positive tie. This distinction in oppositional ties deserves more attention, so we look at the fraction of each type of edge by coalition, taking into account the party of the legislator at the other endpoint of the edge. Figure 7 illustrates the fractions of positive and negative edges with co-partisans (members of the same party) for each of the coalitions based on the optimal 3-partitions. Fractions of positive (negative) edges are shown by solid (dashed) lines. The red, blue, and green lines represent the conservative coalition, the liberal coalition, and the splinter coalition respectively. It can be seen in Figure 7 that the three coalitions are similar based on the fraction of positive edges with co-partisans: members of all coalitions mainly collaborate with (i.e., have a positive edge with) members of their own party. For the fraction of negative edges with co-partisans, however, the splinter coalition shifts away from the main liberal and conservative coalitions. From the 104th session, this quantity has generally increased for the splinter coalition, reaching values close to 0.4. This means that legislators in the splinter coalition have a considerable proportion (nearly 40%) of their negative edges with members of their own party. Given this distinctive feature in oppositional ties, one may conclude that the members of the third coalition are distinctively more willing to push back against their own party.

Movie: Slideshow of the 3-partition coalitions of the signed US House networks
A slideshow of optimal 3-partition coalitions is available online at https://saref.github.io/SI/AN2021/House_coalitions.mp4 which includes all 19 House networks. Green and red edges represent significantly many and significantly few co-sponsorships respectively. Node color indicates the legislator's ideology on a blue (liberal, Nokken-Poole = -1), purple (moderate, 0), red (conservative, +1) spectrum. Node size indicates the legislator's effectiveness. Looking at the colors and positions of edges, we can see that the large majority of edges are intra-cluster positive or inter-cluster negative. In these networks, only 0.05%-2.5% of the edges are frustrated under the optimal 3-partitions, which indicates how closely the networks conform to generalized balance theory 8. If we look at the colors of the nodes, we see the ideological divide between the members of different coalitions. The splinter coalition is the smallest cluster of nodes and usually has several large nodes (highly effective legislators).

Dataset: frustrated_legislators.RData and frustrated_legislators.R on OSF
The file 'frustrated_legislators.RData' is an R workspace which includes a dataframe object 'data' that contains details about each legislator in each session (e.g., ideology, effectiveness, cluster membership in optimal k-partitions), and 19 igraph objects 'H###' that contain signed networks for each session. The file 'frustrated_legislators.R' in the same repository contains the R code to replicate all substantive analyses reported in the manuscript using these data. Both files are publicly available at https://doi.org/10.17605/OSF.IO/3QTFB. The data are distributed under a CC-BY 4.0 license, which means that they can be used provided they are properly attributed by citing 5,48 and the current article.

Dataset: clusters-house.csv
The results on globally optimal solutions to the optimization model for k-partitioning House networks are available in comma-separated values format at saref.github.io/SI/AN2021/clusters-house.csv. The first and second columns contain session numbers and legislator names as indicated by the headers. Each row is a legislator-session combination. The other columns are the cluster assignments based on optimal k-partitions for k ∈ {2, 3, . . ., 7} as indicated by the column header.
The entries represent the cluster assignment of the node associated with the row (the legislator-session combination) based on an optimal solution of the k-partition associated with the column.

Figure 2. Number of frustrated edges (y-axis) of signed US House networks partitioned using different criteria (x-axis). Each line represents a single network, corresponding to a session of the US House starting in the given year. Fewer frustrated edges indicate that the partition is more consistent with the ties of collaboration and opposition between legislators.

Figure 3. Distribution of coalition members' ideology in the US House of Representatives. Blue (red) curves indicate the ideologies of Democrats (Republicans) in the left column and that of the dominant liberal (conservative) coalitions in the center and right columns. In the right column, green curves indicate the ideologies of members of the smallest coalition.

Figure 4. Mean of coalition members' legislative effectiveness in the US House of Representatives. Blue (red) lines indicate the mean legislative effectiveness of Democrats (Republicans) in the top panel and that of the dominant liberal (conservative) coalitions in the center and bottom panels. In the bottom panel, the green line indicates the mean legislative effectiveness of members of the smallest coalition, while the size and color of the dot indicate the size and mean ideology of this coalition. Background shading indicates whether Democrats (blue) or Republicans (red) held a majority in the chamber during the respective session.

Figure 6. The 3-partition coalitions in the 108th session of the House of Representatives

Figure 7. Fractions of positive and negative edges to members of the same party aggregated for each of the three coalitions

Table 1. Detailed properties and clusterability indices for networks [...] stagnation of frustrated edges are shown in bold-face font.

Table 2. The size and effectiveness of the two parties and optimal 3-partition coalitions. Column groups: partition based on political party; generalized balance partition. In party-partition: D = Democratic coalition, R = Republican coalition. In 3-partition: L = Liberal coalition, C = Conservative coalition, S = Splinter coalition.