Detecting coalitions by optimally partitioning signed networks of political collaboration

We propose new mathematical programming models for optimal partitioning of a signed graph into cohesive groups. To demonstrate the approach’s utility, we apply it to identify coalitions in US Congress since 1979 and examine the impact of polarized coalitions on the effectiveness of passing bills. Our models produce a globally optimal solution to the NP-hard problem of minimizing the total number of intra-group negative and inter-group positive edges. We tackle the intensive computations of dense signed networks by providing upper and lower bounds, then solving an optimization model which closes the gap between the two bounds and returns the optimal partitioning of vertices. Our substantive findings suggest that the dominance of an ideologically homogeneous coalition (i.e. partisan polarization) can be a protective factor that enhances legislative effectiveness.

Samin Aref 1,2* & Zachary Neal 3
We propose a general method for identifying cohesive groups in signed networks (networks with positive and negative edges), and apply it to political networks, which have become a common focus in complex network analysis [1][2][3] . Specifically, we examine signed networks of political collaboration and opposition to identify the members of polarized coalitions in the US Congress, then use these coalitions to examine the impact of polarization on effectiveness in passing bills.
In legislative bodies where most pairs of legislators co-sponsor bills, a network of who co-sponsors with whom is too dense to be suitable for studying political alliances, coalitions, and polarization. Instead, we use signed networks 4 in which the two edge types of opposite nature represent significantly many and significantly few co-sponsorships, with a stochastic degree sequence model (SDSM) 5 serving as the null model that defines the thresholds of "many" and "few". Previous research 6 on the same data has shown an increase in polarization in the US Congress when measured by the triangle index, which provides a locally-aggregated index of polarization based on structural balance 7. However, the triangle index only measures the level of balance and polarization; it does not identify the members of the political coalitions that are polarized. For this we turn to the frustration index [8][9][10] (also known as the line index of balance 11), which optimally partitions a signed graph into two opposing but internally cohesive "coalitions" 12. Substantively, these coalitions 13 represent groups of legislators who sponsor significantly many bills with each other (i.e. are political allies), but who sponsor significantly few bills with those in the other coalition (i.e. are political enemies). In our analyses of legislative effectiveness, we focus on the level of partisanship within the largest, and therefore controlling, coalition.
Computing the frustration index is an NP-hard problem 14 , and so is the equivalent partitioning problem that deals with minimizing the total number of intra-group negative and inter-group positive edges. The optimality of a numerical solution to an instance of an optimization problem depends on the function under optimization. Most studies on this topic use heuristic methods for partitioning signed networks under similar objectives [15][16][17][18] . These methods are not guaranteed to provide the optimal solution or even its approximation within a constant factor 14,19 , but can potentially be implemented on larger networks.
Computing the exact value of the frustration index, in principle, involves searching among all possible ways to partition a given signed network into k ≤ 2 groups in order to find the partitioning which minimizes the total number of intra-group negative and inter-group positive edges. We propose a new method for tackling the intensive computations by providing upper and lower bounds for this number, then solving an optimization model which closes the gap between the two bounds and returns the exact value of the frustration index alongside the optimal partitioning of vertices.

Signed graph and balance theory preliminaries
In this section, we recall some basic definitions of signed graphs and balance theory.
Signed graphs. We consider an undirected signed graph G = (V, E, σ) where V and E are the sets of vertices and edges respectively, and σ is the sign function σ: E → {−1, +1}. The signed adjacency matrix A and its unsigned counterpart |A| have their entries defined in Eqs. (1) and (2).
Balance and cycles. A cycle of length k in G is a sequence of nodes v_0, v_1, ..., v_k such that for each i = 1, 2, ..., k there is an edge from v_{i−1} to v_i and the nodes in the sequence except for v_0 = v_k are distinct. The sign of a cycle is the product of the signs of its edges. A cycle with negative (positive) sign is unbalanced (balanced). A balanced network (graph) is one with no negative cycles.
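The definition of balance above can be checked without enumerating cycles. The following minimal Python sketch relies on the classical structure theorem for balanced signed graphs (a graph has no negative cycle exactly when its nodes can be split into two groups with positive edges inside groups and negative edges between them). The function name, the edge-list format, and the toy graphs are illustrative assumptions, not part of the original study.

```python
from collections import deque


def is_balanced(n, signed_edges):
    """Return (balanced, side) for an undirected signed graph.

    n            -- number of nodes, labelled 0..n-1 (assumed labelling)
    signed_edges -- iterable of (u, v, sign) with sign in {+1, -1}
    side         -- a 0/1 group label per node if balanced, else None
    """
    adj = [[] for _ in range(n)]
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))

    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # Positive edges keep the group, negative edges flip it.
                expected = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False, None  # a negative cycle exists
    return True, side


# Hypothetical toy graphs: a triangle with two negative edges is balanced,
# a triangle with exactly one negative edge is not.
balanced_triangle = [(0, 1, +1), (1, 2, -1), (0, 2, -1)]
unbalanced_triangle = [(0, 1, +1), (1, 2, +1), (0, 2, -1)]
```

For a balanced graph the returned labels are themselves a partition with no frustrated edges; for an unbalanced graph, measuring how far the graph is from this ideal is the subject of the next section.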
Balance theory was conceptualized by Heider in the context of social psychology 20. It was then formulated as a set of graph-theoretic conditions by Cartwright and Harary 21, which define a signed graph to be balanced if all its cycles are positive. Cartwright and Harary also introduced measuring the level of balance using, among other indices, the fraction of positive cycles [21, page 288]. Three years later, Harary suggested using the frustration index 11 (under a different name), a measure which satisfies key axiomatic properties 10 but has been underused for decades due to the complexity of its computation 19,22,23.

Evaluating balance and frustration
In this section, we explain our computational approach to analyzing signed networks by providing brief definitions and discussions on measuring balance, frustration and partitioning, and graph optimization models.
Measuring partial balance. Signed networks representing real data are often unbalanced, which motivates measuring the intermediate level of partial balance 10. The first measure we use is the triangle index, denoted by T(G), which equals the fraction of positive cycles of length 3 21,24. We use Eq. (3), suggested in 7, for computing the triangle index T(G), in which Tr(A) denotes the trace (sum of diagonal entries) of A.
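As an illustration of a trace-based computation of the triangle index, the sketch below assumes the form T(G) = (Tr(A^3) + Tr(|A|^3)) / (2 Tr(|A|^3)). This form is an assumption of the sketch rather than a transcription of Eq. (3); it follows from the definition above because Tr(A^3) counts positive minus negative triangles and Tr(|A|^3) counts all triangles, each six times. The function names and the toy matrix are hypothetical.

```python
def trace_of_cube(M):
    """Tr(M^3) for a square matrix given as a list of lists."""
    n = len(M)
    return sum(M[i][j] * M[j][k] * M[k][i]
               for i in range(n) for j in range(n) for k in range(n))


def triangle_index(A):
    """Fraction of positive triangles in the signed adjacency matrix A.

    A[i][j] is +1, -1 or 0 and A is symmetric with a zero diagonal.
    """
    abs_A = [[abs(a) for a in row] for row in A]
    signed = trace_of_cube(A)        # 6 * (positive - negative triangles)
    unsigned = trace_of_cube(abs_A)  # 6 * (all triangles)
    if unsigned == 0:
        raise ValueError("the graph has no triangles")
    return (signed + unsigned) / (2 * unsigned)


# Hypothetical 4-node graph with one positive triangle (0, 1, 2)
# and one negative triangle (0, 2, 3).
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, -1],
    [1, 0, -1, 0],
]
print(triangle_index(A))  # one of the two triangles is positive
```

The cubic loop is only meant for small examples; on networks of the size analyzed here a matrix library would be the natural choice.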
The other measure we use is the normalized frustration index 10, denoted by F(G), which is based on normalizing the minimum number of edges whose removal results in a balanced graph 8,11,25. This minimum, the frustration index L(G), equals the smallest total number of intra-group negative and inter-group positive edges over all partitionings of the nodes into two groups. In the example signed graph of Fig. 1, the partitioning shown minimizes this total to 1 (only edge (1,5) is frustrated according to this partitioning). Note that removing edge (1,5) leads to a balanced signed graph. The frustration index of a graph G can therefore be computed exactly by finding such an optimal partitioning. The normalized frustration index, F(G), is computed based on L(G) according to Eq. (6), which allows measuring the level of partial balance based on numerical values within the unit interval (m denotes the number of edges).
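The quantities in this paragraph can be made concrete on a toy graph. The sketch below counts frustrated edges for a given two-group assignment, finds the frustration index by exhaustive search (feasible only for very small graphs), and normalizes it. The normalization F(G) = 1 − L(G)/(m/2) is assumed here and is not transcribed from Eq. (6); the function names and the five-node graph are hypothetical and are not the graph of Fig. 1.

```python
from itertools import product


def frustration_count(signed_edges, x):
    """Number of frustrated edges under assignment x (x[i] in {0, 1}).

    An edge is frustrated if it is negative inside a group
    or positive between the two groups.
    """
    count = 0
    for u, v, s in signed_edges:
        same_group = x[u] == x[v]
        if (s < 0 and same_group) or (s > 0 and not same_group):
            count += 1
    return count


def frustration_index_bruteforce(n, signed_edges):
    """Exact L(G) by trying all 2^(n-1) assignments (tiny graphs only)."""
    best_count, best_x = None, None
    for rest in product((0, 1), repeat=n - 1):
        x = (0,) + rest  # fixing node 0 removes the mirror-image solutions
        c = frustration_count(signed_edges, x)
        if best_count is None or c < best_count:
            best_count, best_x = c, x
    return best_count, best_x


def normalized_frustration_index(L, m):
    """Assumed normalization: F(G) = 1 - L(G) / (m / 2)."""
    return 1 - L / (m / 2)


# Hypothetical 5-node graph containing one negative cycle (0-2-3-4-0).
edges = [(0, 1, +1), (1, 2, +1), (0, 2, +1),
         (2, 3, -1), (3, 4, +1), (0, 4, +1)]
L, x_opt = frustration_index_bruteforce(5, edges)
F = normalized_frustration_index(L, len(edges))
```

Exhaustive search doubles in cost with every added node, which is why the networks studied here call for the bounding and optimization approach described next.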
One may notice some similarities between the problem of finding communities in unsigned networks [26][27][28][29][30][31] and that of partitioning signed networks to minimize the frustration count. One key difference is that in the latter problem for every pair of vertices there are three cases (as opposed to two): a positive edge, a negative edge, or no edge between the two vertices. Due to the differences between objectives of these two problems (minimizing frustration count as opposed to maximizing modularity or other quantities), the partitioning obtained from running community detection algorithms on positive edges of a signed graph will not generally minimize the frustration count.
Recent studies on the frustration index and signed networks suggest 19,22 and implement 23 efficient graph optimization models to compute the frustration index of relatively large (up to 10^5 edges) sparse networks. However, the signed networks we analyze have substantially higher densities compared to the instances in 19,22,23. This requires developing new computational models for tackling the intensive computations involved in obtaining the frustration index of dense graphs.
Bounding the frustration index. In this subsection, we discuss obtaining lower and upper bounds for the frustration index. Using these bounds is a way of substantially reducing the running time, but theoretically they are not required.
The linear programming relaxation (LP relaxation) of the binary optimization models in 19,22 can be used to compute a lower bound for the frustration index. The linear programming model in Eq. (7) is developed for this purpose.
The set in Eq. (7) contains ordered 3-tuples of nodes whose edges form a triangle in G. The continuous linear programming model in Eq. (7) is based on the LP relaxation of the 0/1 linear model in 22. Because the frustration index is the minimum frustration count over all partitionings, the frustration count of any given partitioning provides (via Eq. (5)) an upper bound for the frustration index. We use a specific partitioning {X′, V\X′} as a starting point to "warm-start" the algorithm for computing the frustration index. Partitioning {X′, V\X′} groups nodes into two subsets based on the party affiliation of legislators. To be more precise, for node i which represents a legislator, decision variable x_i is given initial value 0 if the corresponding legislator is a Democrat and initial value 1 otherwise.
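The party-based warm start described above can be sketched in a few lines. The party codes ("D", "R", "I"), the function names, and the toy chamber are assumptions made for illustration; only the rule "0 for Democrats, 1 otherwise" comes from the text.

```python
def party_warm_start(parties):
    """Initial values x_i: 0 for a Democrat, 1 for any other legislator."""
    return [0 if p == "D" else 1 for p in parties]


def frustration_count(signed_edges, x):
    """Intra-group negative plus inter-group positive edges under x."""
    return sum(1 for u, v, s in signed_edges
               if (x[u] == x[v]) == (s < 0))


# Hypothetical chamber of five legislators (party codes are assumed).
parties = ["D", "D", "R", "R", "I"]
edges = [(0, 1, +1), (2, 3, +1), (0, 2, -1), (1, 3, -1),
         (1, 2, +1), (3, 4, +1), (0, 4, -1)]

x_start = party_warm_start(parties)
upper_bound = frustration_count(edges, x_start)
print(x_start, upper_bound)
```

In this toy chamber the only frustrated edge under the party split is the positive cross-party edge (1, 2). Since the graph contains a negative cycle (0-1-2), no partition can do better, so the party-based upper bound happens to be tight here; in general it is only a starting point.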
Computing the frustration index. After bounding the frustration index, we use the binary linear programming model in Eq. (8), which minimizes the number of frustrated edges. We implement the speed-up techniques discussed in 19.
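To illustrate how an upper bound and a lower bound can close in on the exact frustration index, the following sketch uses a plain depth-first branch and bound rather than the binary linear programming model of Eq. (8) or any solver-specific speed-up. It is a swapped-in technique for small graphs: the warm-start partition supplies the incumbent (upper bound), and the number of edges already frustrated among assigned nodes serves as a simple lower bound for pruning. All names and the toy data are hypothetical.

```python
def frustration_index_bnb(n, signed_edges, warm_start):
    """Exact minimum frustration count by depth-first branch and bound."""
    # Store each edge at its higher-numbered endpoint so that it is
    # evaluated exactly once, when both endpoints have been assigned.
    by_later = [[] for _ in range(n)]
    for u, v, s in signed_edges:
        by_later[max(u, v)].append((min(u, v), s))

    def count(x):
        return sum(1 for u, v, s in signed_edges
                   if (x[u] == x[v]) == (s < 0))

    # Incumbent solution: the warm start gives the initial upper bound.
    best = {"count": count(warm_start), "x": list(warm_start)}
    x = [0] * n

    def branch(i, partial):
        # partial = frustrated edges among nodes 0..i-1 (a lower bound
        # on the count of every completion of this partial assignment).
        if partial >= best["count"]:
            return  # cannot improve on the incumbent: prune
        if i == n:
            best["count"], best["x"] = partial, x[:]
            return
        # Node 0 is fixed to group 0 to skip mirror-image assignments.
        for value in ((0,) if i == 0 else (0, 1)):
            x[i] = value
            added = sum(1 for a, s in by_later[i]
                        if (x[a] == value) == (s < 0))
            branch(i + 1, partial + added)

    branch(0, 0)
    return best["count"], best["x"]


# Hypothetical 5-node signed graph and a party-style warm start.
edges = [(0, 1, +1), (2, 3, +1), (0, 2, -1), (1, 3, -1),
         (1, 2, +1), (3, 4, +1), (0, 4, -1)]
result_good_start = frustration_index_bnb(5, edges, [0, 0, 1, 1, 1])
result_poor_start = frustration_index_bnb(5, edges, [0, 0, 0, 0, 0])
```

A good warm start does not change the answer but lets the search prune earlier, which mirrors the role the text assigns to the bounds: they reduce running time without being theoretically required.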

Results
In this section, we provide the results of analyzing balance and frustration in signed networks of US Congress legislators.
Partial balance, frustration, and optimal partitioning. We evaluate the level of partial balance using two different methods. Figure 2 illustrates partial balance in the signed networks of the US Congress over time measured by the triangle index and normalized frustration index. Values of the two measures, the triangle index T(G) and the normalized frustration index F(G), are highly correlated (correlation coefficients are 0.95 and 0.91 respectively for House and Senate networks) and both show relatively high levels of partial balance which have increased in the time period 1979-2016. The results in Fig. 2 indicate an increase in the polarization of both chambers of US Congress, which is in accordance with the literature 6,[33][34][35][36]. Although the triangle index T(G) and the normalized frustration index F(G) capture very similar information concerning the level of partial balance, only the computation of F(G) also provides the partitioning that minimizes the sum of intra-group negative and inter-group positive edges.
Solving the continuous optimization model in Eq. (7) and the discrete (binary) optimization model in Eq. (8) requires intensive computations for large instances such as signed networks of the House, given the size and density of these instances. Using the optimal values of the x_i variables obtained by solving the discrete optimization model in Eq. (8), we partition nodes of each network into two groups (subsets X* and V\X*). For each signed network, either X* or V\X* has the larger set cardinality and therefore represents the largest coalition for the corresponding session.
We evaluate the composition of the largest and therefore controlling coalitions in each session and chamber based on the party affiliation of its legislators. Figure 3 illustrates the number of legislators from the two main political parties in the controlling coalitions of the US Congress. As can be seen in Fig. 3, the controlling coalitions have become more homogeneous (i.e. partisan) over the time period 1979-2016.

Legislative effectiveness and polarization in the US Congress
Within the field of comparative US politics, two topics attract particular attention at the federal level: legislative effectiveness and political polarization. Legislative effectiveness refers to the ability of individual legislators 37,38, or of an entire legislative body 39, to advance their agenda, typically by facilitating the passage of legislation. Political polarization (when applied to elected officials or "elites") refers to the formation of non-overlapping ideologically homogeneous groups 6,33. When these groups mirror political party affiliations, it is also called partisan polarization. For several decades, legislative effectiveness in the US has declined (as illustrated in Fig. 4), while partisan polarization has increased 6. These trends have led many to hypothesize that they are related, and specifically that "unified party control has [not] been legislatively more productive than divided party control" [40, xii]. Based on the legislative process used by the US Congress, it might be expected that a chamber's bills are more likely to become law when the controlling party holds a larger majority, because its members can form a voting bloc. However, the analysis in the next section suggests that changes in bill passage rates are better explained by the partisanship of a chamber's largest coalition.
Mediation in bill passage. Using a bivariate linear regression, we find that the percentage of bills introduced in a chamber that become law (passage rate) significantly declines over time. The passage rate has declined in the House by an average of 0.11 percentage points each session (see Tables S2 and S4). To investigate possible explanations for variations in passage rate, we estimate two separate structural equation models for each chamber. A commonsense model tests the expectation that when the majority party holds a larger numerical majority, it should have greater success passing bills 41. The key variable in this model, party control, is defined as the absolute difference between the number of Republicans and Democrats. Computing party control does not require any information about the legislators' network. We find no support for this model; party control does not mediate the relationship between time and passage rate (see Fig. 5(B)).
A more nuanced model tests the expectation that when the controlling coalition is more partisan and thus more ideologically unified, it will have greater success passing bills. The key variable in this model, coalition partisanship, is defined as the fraction of non-independent members in the largest coalition that affiliate with that coalition's dominant political party (see Fig. 3). We compute coalition partisanship by applying the partition method described above to a signed network of political collaboration and opposition. We find support for this model in the US House, but not the US Senate (see Fig. 5(C)). Specifically, we find that in the House, the partisan homogeneity of the controlling coalition has increased over time (β = 0.661, p < 0.005), which is consistent with our expectation about the impact of ideological unity. Together, these effects imply a significant and positive indirect effect of time on the passage rate (β = 0.510, p < 0.005), mediated by coalition partisanship. Thus, the observed decline in bill passage rates in the US House would have been worse (direct effect: β = −1.038, p < 0.001), but was mitigated by increasingly ideologically homogeneous coalitions, which are a protective factor against declines in legislative effectiveness.
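The two key variables of the models above, party control and coalition partisanship, can be computed directly from party labels and a two-group partition. The sketch below follows the definitions given in the text; the party codes ("D", "R", "I"), the function names, the tie-breaking rule for equally sized groups, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def party_control(parties):
    """Absolute difference between the numbers of Republicans and Democrats."""
    counts = Counter(parties)
    return abs(counts["R"] - counts["D"])


def largest_coalition(x):
    """Indices of the larger of the two groups encoded by 0/1 values x.

    Ties go to group 0 (an arbitrary choice made for this sketch).
    """
    group0 = [i for i, xi in enumerate(x) if xi == 0]
    group1 = [i for i, xi in enumerate(x) if xi == 1]
    return group1 if len(group1) > len(group0) else group0


def coalition_partisanship(x, parties):
    """Share of non-independent members of the largest coalition
    who belong to that coalition's dominant party."""
    members = largest_coalition(x)
    counts = Counter(parties[i] for i in members if parties[i] != "I")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("largest coalition has no party-affiliated members")
    return max(counts.values()) / total


# Hypothetical chamber of seven legislators and a hypothetical partition.
parties = ["D", "D", "D", "R", "R", "I", "D"]
x = [0, 0, 0, 0, 1, 0, 1]
print(party_control(parties))              # 4 Democrats vs 2 Republicans
print(coalition_partisanship(x, parties))  # 3 of 4 affiliated members are D
```

Party control uses only the party labels, whereas coalition partisanship also needs the partition x, which is why only the second variable depends on the signed network.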

Summary and conclusions
In this study we proposed a general method for identifying internally cohesive opposing coalitions in signed networks of legislators based on structural balance theory, then applied this method to identify opposing coalitions in the US Congress, showing that these coalitions' partisanship can explain changes in legislative effectiveness better than political parties. Based on this analysis, we offer a series of substantive and methodological conclusions.
Consistent with prior studies 6,33-36, we find that polarization has increased in both the US Senate and US House of Representatives, and that this polarization has largely mirrored partisan divisions along political party lines. We operationalized polarization using the level of a signed graph's structural balance, and therefore measure what 6 calls "strong polarization," but have used two different measures of balance. We find that the two measures are highly correlated and both support the conclusion of increasing polarization.
The triangle index is easy to compute, but provides only a locally-aggregated measure of a graph's level of balance. In contrast, computing the frustration index is difficult, but it provides not only a global measure of a graph's level of balance, but also the optimal partitioning of vertices into internally cohesive but mutually antagonistic groups. We have demonstrated a practical method for computing the exact value of the frustration index and identifying the optimal partition in dense graphs (|E| on the order of 50000) that involves first obtaining upper and lower bounds, using exogenous node properties (e.g. legislators' political party affiliations), and solving a large-scale binary linear programming model. In the context of legislative networks, this method allows us to identify the most cohesive coalitions of legislators under conditions of balance theory.
www.nature.com/scientificreports
Although our computational innovations make the identification of internally cohesive opposing coalitions practically feasible, we must also demonstrate that these coalitions are more informative than other simpler grouping possibilities. In the legislative context, we show that the partisan composition of these cohesive coalitions better explains the declining legislative effectiveness in the US House of Representatives than simply examining legislators' political party affiliations. This affirms Mayhew's claim that "no theoretical treatment of the United States Congress that posits parties as analytic units will go very far" [42, p.27] but goes a step further by identifying an alternative analytic unit (internally cohesive opposing coalitions) that does have explanatory power. Importantly, coalitions appear useful only for explaining the legislative effectiveness of the House, but not the Senate. However, this is also consistent with existing political science theory that "the lack of majority control of [procedural] processes in the Senate negates the possibility of significant party [or other group-based] effects in that body" [43, p.7]. Therefore, in general terms, our empirical findings suggest that in legislative bodies where a sufficiently large group of legislators can influence procedural processes, the composition of the largest coalition is more important than the size of the majority party's majority. This is perhaps obvious in parliamentary systems where multi-party coalition forming is essential, but is noteworthy in the non-parliamentary US Congress.
These conclusions have some significant implications for both the future study of signed networks and the study of the link between polarization and legislative effectiveness. First, by providing a practical method for computing the frustration index of relatively dense graphs, we hope to move the study of signed graphs beyond merely determining the level of balance, and toward the study of how the composition of mostly opposing groups impacts other network dynamics. Second, our empirical findings suggest that research on polarization and its impact on the legislative process should look beyond political parties and partisanship to more subtle but influential forms of coordination, such as internally cohesive coalitions which are antagonistic towards one another.

Materials and Methods
Relations of collaboration and opposition between elected officials are difficult to collect directly because politicians have limited time to participate in surveys and have good reasons to conceal their true political relations. Therefore, studies of elected officials' political networks typically measure these relations indirectly, using bipartite projections focusing on their co-sponsorship of bills 44, co-voting on bills 36,45,46, co-membership on committees 47, and co-attendance at press events 48. For a range of substantive reasons noted by 6 (e.g. relatively few bills are actually voted on, committee memberships are driven by such non-ideological factors as seniority), we examine political relations from bill co-sponsorship.
Specifically, we use a signed network of inferred political relations among the members of the US House of Representatives, and among the members of the US Senate, in each session of Congress from 1979 to 2016 (96th session to 114th session). The process for creating these signed networks is described in detail by 6 and they are available in a public Figshare data repository 4.
Inferring signed networks from co-sponsorship data. Importantly, all pairs of legislators co-sponsor at least some of the same bills, so we know that the mere existence of some co-sponsorships does not imply they collaborate, and that some number of co-sponsorships can actually indicate avoidance. In previous work 6, a stochastic degree sequence model (SDSM) 5 is used to define thresholds of significantly few and significantly many co-sponsorships by building the empirical sampling distribution of two legislators' joint co-sponsorships under a null model in which each legislator co-sponsored approximately the same number of bills and each bill received approximately the same number of co-sponsorships (i.e. holding approximately constant the legislator and bill degree sequence). To be more specific, given a bipartite graph B, Monte Carlo methods can be used to generate probability distributions of (BB′)_ij when Pr(B_ij = 1) is a function of the row and column marginals of B 6. Decisions about whether a given dyad represents significantly few or significantly many co-sponsorships are made by comparing their observed number of joint co-sponsorships to the empirical sampling distribution using a two-tailed α = 0.05 threshold. For example, Fig. 6 shows that Rep. Earl Blumenauer (D-OR3) and Rep. Sheila Jackson-Lee (D-TX18) were observed to have co-sponsored 242 of the same bills (dashed vertical line). The magnitude of joint co-sponsorships, and the fact that both representatives are Democrats, might lead one to conclude that they are collaborating. However, the shaded distribution shows the expected number of joint co-sponsorships under the SDSM null model in which each representative randomly chooses which bills to co-sponsor. Comparing these representatives' observed number of joint co-sponsorships to the null model expectation, we find that they co-sponsor significantly fewer of the same bills than would be expected at random and therefore define the edge between them as negative.
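The thresholding logic described above can be sketched with a small Monte Carlo simulation. This is a simplified stand-in, not the SDSM implementation used in 6: here Pr(B_ij = 1) is assumed to be proportional to the product of the row and column marginals (capped at 1), whereas the original model may estimate these probabilities differently. The function name, the tail rule (each tail tested at α/2), and the toy data are likewise assumptions of this sketch.

```python
import random


def sdsm_sign(B, i, j, n_sims=1000, alpha=0.05, seed=0):
    """Sign of the edge between legislators i and j: +1, -1 or 0.

    B is a 0/1 bipartite matrix (rows: legislators, columns: bills).
    """
    rng = random.Random(seed)
    row = [sum(r) for r in B]
    col = [sum(c) for c in zip(*B)]
    total = sum(row)

    # ASSUMPTION: Pr(B_ij = 1) ~ row_i * col_j / total, capped at 1.
    p_i = [min(1.0, row[i] * c / total) for c in col]
    p_j = [min(1.0, row[j] * c / total) for c in col]

    observed = sum(a * b for a, b in zip(B[i], B[j]))

    at_least = at_most = 0
    for _ in range(n_sims):
        joint = 0
        for pi, pj in zip(p_i, p_j):
            sponsors_i = rng.random() < pi
            sponsors_j = rng.random() < pj
            if sponsors_i and sponsors_j:
                joint += 1
        at_least += joint >= observed
        at_most += joint <= observed

    # Two-tailed test: each tail is compared with alpha / 2.
    if at_least / n_sims < alpha / 2:
        return +1  # significantly many joint co-sponsorships
    if at_most / n_sims < alpha / 2:
        return -1  # significantly few joint co-sponsorships
    return 0       # not significantly different from the null model


# Hypothetical data: 6 legislators in 3 pairs, 60 bills in 3 blocks of 20;
# each pair co-sponsors exactly the bills of its own block.
B_demo = [[1 if bill // 20 == leg // 2 else 0 for bill in range(60)]
          for leg in range(6)]
print(sdsm_sign(B_demo, 0, 1), sdsm_sign(B_demo, 0, 2))
```

In the toy data every legislator sponsors 20 bills, so the null model expects about 60/9 ≈ 6.7 joint co-sponsorships for any pair. The pair (0, 1) shares 20 bills and receives a positive edge, while the pair (0, 2) shares none and receives a negative edge, which parallels the way a large raw count such as the 242 bills in the example can still fall in the lower tail when both legislators are prolific co-sponsors.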
This approach differs from other methods of reducing weighted graphs to binary or signed graphs 49,50 because it explicitly incorporates information from the original bipartite data (i.e. legislators linked to bills), thereby ensuring it is not lost when these data are projected as a unipartite graph. Additionally, 6 extracts signed backbone networks rather than the weighted bipartite projections because the weights in those projections are distorted by heterogeneity in the bipartite degree sequences (i.e. some legislators sponsor many bills, others sponsor few 5,51 ).
Although data on earlier sessions are available, they were excluded because prior to the 96th session, House rules imposed a limit of 25 co-sponsors per bill, which artificially distorts co-sponsorship patterns and limits the usefulness of these data for inferring political networks 52 . These data do not distinguish between a bill's "sponsor"