The 1995-2018 Global Evolution of the Network of Amicable and Hostile Relations Among Nation-States

We draw on the data collected by the Integrated Crisis Early Warning System on millions of international and regional public news stories, and this system's indicators of the orientation toward a specific nation-state. We construct the networks of international amicable and hostile relations among nation-states that occur in specific time-periods in order to study the global evolution of the network of such international appraisals. Our analysis presents evidence of an evolution of the structure of this network and a model of the probabilistic micro-dynamics of the alterations of international appraisals during the 1995-2018 span of the available data. Our research provides empirical findings on long-standing debates in the interdisciplinary field of work on Structural Balance Theory. Also remarkably, we find that the trajectory of the Frobenius norm of sequential transition probabilities, which govern the evolution of international appraisals among nations, dramatically stabilizes.


T he Integrated Crisis Early Warning System
(ICEWS) is a comprehensive, automated, and validated system to monitor national, sub-national, and internal crises. Its event data is publicly available and consists of coded interactions between socio-political actors (i.e., friendly or hostile actions between individuals, groups, sectors, and nation-states).
Geographicaltemporal metadata are extracted and associated with the relevant events within a news article. The data structure is a list of events. Every event has an occurrence date, a source actor, and a target actor. Every event is also annotated with a value in the [−10, +10] interval that indicates the orientation of the source to the target actor: −10 (completely offensive) to +10 (completely supportive). For instance, the news event "Japan said on Tuesday it had halted economic aid to Yugoslavia in line with Western efforts to end the fighting there" is coded as a directed edge from Japan to Yugoslavia with weight −5.6, that is calculated based on the content of the news and the type of event (which in this case was "reduce or stop economic assistance"). Generally, the actors have political positions in a particular country, such as government administration, military, police, etc. In our analysis, we consider every country as a node and focus our analysis on international events in which the source and target nodes belong to different countries. The data includes 250 countries (network nodes) and over 8 million international events (network edges) occurring over more than two decades.
These data provide a unique opportunity to (i) construct networks of international amicable and hostile re-lations among nation-states that occur in specific timeperiods and (ii) investigate the global evolution of the network of such international appraisals over a lengthy span of time. The motivations for exploiting such data include an understanding of the origins of war, the formation of alliances, and the balance of powers. Similarly motivated research includes (1)(2)(3)(4)(5)(6)(7). Some of this research on international appraisals has been guided by a network science theory of structural balance (8,9) in which signed networks evolve toward either a network of all positive appraisals or a network composed of two components of actors with all positive within-component appraisals and all negative between-component appraisals. Gellman et al. (10) used structural balance theory to analyze the origins of WWI, and Antal et al. (11) similarly used balance theory to explore the evolution of major changes among the protagonists prior to WWI during the period 1872 to 1907. Moore et al. (12) used balance theory to analyze the conflict over Bangladesh's separation from Pakistan in 1972. Harary et al. (5) also analyzed international relations among nations and different states of equilibrium and disequilibrium, using structural balance theory for the crisis in the Middle East in 1956. Harary et al. (5) showed how ten countries, after each international shock, sought a new equilibrium alignment consistent with what balance theory predicts.
First, unlike balance theory's prediction (9), the empirical evidence does not support the prediction that a network of friends and enemies must evolve either to a network of all friends or to a network of composed two antagonistic components of actors with all positive within-component appraisals and all negative betweencomponent appraisals. Instead, the evidence supports the conclusion that the evolution of appraisals is mainly driven by reductions of intransitive relations among actors, which allow the emergence of complex network topologies with more than two mutually antagonistic sets of countries and hierarchically structured positive relations between countries (13). Intransitive relations occur when there is evidence of a positive chain of international relations i + − → k + − → j and evidence of a negative i − − → j relation. Such intransitive relations are assumed to be sources of international tensions that lead to transformations of positive relations to negative relations, and vice versa.
Second, the empirical evidence does not support balance theory's assumption that every actor has either and positive or negative orientation to every other actor, and a line of research has developed that relaxes this assumption (4,9,(14)(15)(16)(17)(18)(19)(20). In large-scale networks, incomplete networks that include indifference relations are the rule. In the ICEWS data, such null relations appear when there are neither amicable nor hostile events between two countries. While it may be assumed that all countries are aware of each other's existence, such awareness need not be coupled with an amicable or hostile relation.
Third, despite numerous theoretical advancements and empirical studies on balance theory, the dynamic predictions of network state changes have rarely been tested with empirical investigations of longitudinal data (21,22). There have been a large number of empirical studies on static networks (23)(24)(25). Longitudinal studies have been limited to small populations of actors and to a small number of temporal states of the network (2,26). In contrast, this study presents results on the most extensive set of longitudinal data yet assembled that allows research on the question of whether the evolution of appraisal networks is mainly driven by reductions of intransitive relations.
Our contributions in this study are as follows. In this article, we advance the line of research on the evolution of the network of amicable and hostile relations among countries, and also the basic science on structural balance theory. To the best of our knowledge, this article reports novel empirical findings from the largest longitudinal data yet assembled on structural balance theory. First, we address the existing lacuna on balance theory dynamics in large-scale networks that include null (indifference) relations. In networks that include large numbers of null relations, we find startling evidence of changes in international relations that are predominately restricted to only 10 types of configurations in the possi-ble set of 138 configurations of null, positive, or negative relations among any three countries. Second, we find surprising evidence that does not comport with balance theory's prediction of a general tendency toward configurations of international relations that do not violate the theory's assumptions. Instead, we find a trajectory that involves a short period of increasing numbers of violations of balance theory's expectations, as indifference relations convert to negative or positive relations, followed by a longer trajectory that involves decreasing numbers of violations of transitive relations. During the entire course of this evolution, we find that balanced triads are likely to stay balanced and that unbalanced triads are likely to transition to balanced ones. Third, we introduce a novel convex optimization model with a convergence guarantee for quantitatively estimating time-varying Markov chains of the transitions of the structure of international relations. Empirical Markov transition matrices show diminishing variability over our longitudinal data and emergent dynamic stability. Fourth, we conclude with evidence suggesting that the evolution of the network structure toward dynamic stability is subject to disturbances that appear to be related to disruptive international events and changes in the global economy. This finding provides a novel empirical support on a longitudinal setting for earlier research regarding the effect of global trades on international conflicts (1).

Results
In the span of 23+ years from 1995-01-01 to 2018-09-30, the ICEWS data includes 250 countries and 8,073,921 international events, each of which are positive or negative appraisals generated from a source country to some other target country. Each appraisal is a value in the interval [−10, +10] that indicates the orientation of the source to the target actor in a news event that occurred at particular time: −10 (completely offensive) to +10 (completely supportive) (27). The event data present positive, negative, and null international edges. The null edges are either a source-target news event that cannot be given a sign or an indicator that no source-target news event has been published. Precisely, there are 5,974,283 positive international edges (74%), 1,333,646 negative international edges (17%), and null instances of 765,992 neutral edges (with weight zero) (9%).

A. Empirical Dynamic Networks.
To investigate the evolution of the international appraisals, these data are disaggregated into time periods. Each time period is associated with the subset of published events that occurred during the period. Each network is comprised of 62,250 (250 2 − 250) directed positive, negative, and null edges among the 250 countries. A particular source-target ordered pair of countries may be associated with multiple events during a particular time period, and we take the sign of the summed value of these multiple events as the measure of the orientation of the source to the target. Hence, for 3 month periods, we have 101 snapshots of signed and directed networks among countries (see section Materials and Methods). We attend to different definitions of period length as a check on the robustness of our findings.
Core-periphery phenomenon: We find that the network structures are in the class of classical core-periphery (also called center-periphery) structures (28,29). Such structures of n nodes are composed of one strong component of k nodes (the core) in which one or more paths of positive appraisals exits from every member of the core to every other member of the core. The remaining set of n − k nodes (the periphery) is composed of nodes each which has at least one positive appraisal to a node (or nodes) in the core. We find, in the first period of the data, that there are exactly 111 countries in the core and 23 countries in the periphery. These peripheral countries were Afghanistan, Angola, Guinea, Haiti, Sierra Leone, Zimbabwe, Bolivia, Paraguay, Rwanda, Armenia, Azerbaijan, Congo, Grenada, Guatemala, Guyana, Kuwait, Malawi, Mozambique, Nicaragua, Nigeria, Panama, Sudan, Timor-Leste. We find that over time (in about 4 years), all of the first period peripheral countries moved into the core. That is, the size of the core grew to n = 134 and was maintained as a single strong component of 134 nodes. The remaining 116 countries do not exist in all network periods. Thus, we focus on the network dynamics of these 134 countries in our analysis. Using quarterly periods, the percentage of positive ties in the core increases from 4% to 18%, and the percentage of negative ties increases from 1% to 4%. This trend is shown in Fig. S1 (a).

Structural balance theory:
This theory predicts an evolution of the structure of these networks toward a state in which all violations of transitivity are eliminated. Thus, the theory focuses on the transitions of triads as states. In this theory, triads -subsets of three nodes-are considered the building blocks of relationships. Every possible triad of three countries among the core countries (134 countries) involves six edges in which each edge is either positive, negative or null. Classic balance theory assumes the absence of null edges, in which case there are 16 possible type of triads (20,22,30,31). In contrast, when null arcs are allowed, there are 138 possible types of triads some of which may entail one or more violations of transitivity. Ergo, this generalized structural balance theory is defined on sparse networks which are extremely realistic. The underlying idea for this generalization is that at any given time, if an edge has not been existed until now, it means there exists no social tension between those two nodes and that edge should not be considered part of a violation by classical axioms (See Fig. S8 in appendix showing total 138 possible triads). To follow a tremendous literature on structural balance theory, we extend it on sparse networks using three existing models classical (9), clustering (32), and transitivity (33) (from least to most general respectively). Each model permits different triad set to be considered structurally balanced. The extension of these models are formally defined in Materials and Methods (see Table 2).
Remarkably, we find that 91% of the 392,084 observed triads in 23+ years are concentrated on only 10 triad types (out of 138). Fig. S1 (d) displays this operative set of triad types. The types 6, 8 and 9 include one or more violations of transitivity, and the others do not. Fig. S1 (c) shows the average distribution of operative triads over time and Fig. S1 (b) depicts the temporal trajectory of the percentage of transitive triads. It is evident that structural balance does not always increase. Our finding based on Fig. S1 (b) is that after a decrease of structural balance during the first periods, the network's trend is toward greater balance since 2006 onward. The main basis of the initial decline are conversions of null relations to positive relations and the associated proliferation of intransitive triad 9 configurations. Overtime, many of these violations of transitivity are then resolved by conversions to triad configurations that do not violate transitivity (triads 1, 4 or 10). Also, looking at distribution of triads ( Fig. S1 (c)), we see only about 8% of triads (summation of volume of triads 6, 8 and 9) to be not-balanced over course of more than two decades. The evidence for this result is highlighted in our Markov chain analysis to which we now turn.

A.1. Markov model on dynamic networks.
Here, we present a Markov model of the dynamical system of the temporal transitions of the networks' triads that is not restricted to the operative set of triad types. This model provides a deeper image of the probabilistic micro-dynamics of the alterations of international appraisals during the 1995-2018 span of the available data. We compute the average probability transition matrix of the 138 possible triad types from networks aggregated over a three months period (seasonally). Interestingly, most of probabilities in this matrix are very close to zero and the dynamics of the system can be described only by a few states. For the sake of visualization, we can focus on the operative set of 10 triad types (described in Fig. S1 (e)) and we show their transition probabilities in Fig. S3 (a). The probability transition matrix is robust with respect to the choice of period. In Fig. S3, each sub-figure shows the transition matrix for period lengths: of transitioning from 1 to 1) and, thus, are most likely to persist. More precisely, Fig. S3 (e) shows the stationary distribution of the Markov process. The summation of balanced triads in the stationary distribution is larger than 0.85. It appears that regardless of the definition of period, the Markov model predicts our empirical finding of a network evolution toward structural balance.

A.2. Time-varying Markov model on dynamic networks.
Our results show that the probability of transitions to and remaining in balanced states are statistically significant in every period over the 23+ year data span. We find the same results with a novel time-varying Markov model in which the transition matrix can smoothly change and is learned via a convex optimization scheme (see section Materials and Methods). Fig. S4 describes the analysis pipeline of this model. Our Fig. S5 results further support the conclusion that structural balance drives the dynamics of the system. All unbalanced triads have an estimated high mean probability and small standard deviation on transitions into balanced triads, and the balanced triads have an estimated high mean probability and small standard deviation of remaining balanced. Note the distinctive separation of the transition probabilities.   Applying this experiment on other datasets we posit this result not only holds in our focal network of international relations, but also in two longitudinal datasets on financial bitcoin trust networks (34,35). Due to focus of this paper and lack of space we do not include those figures. Results show although international and financial networks are very different, our findings on transition toward balance generalize across all three datasets. Therefore, it appears that transitions toward balance are ubiquitous (i) regardless of the definition of balance, (ii) regardless of the setting (international news networks or financial networks), and (iii) regardless of the type of actor (individuals or countries).

B.1. Qualitative Relation with Exogenous Shocks.
The transition matrices are stable over time, as measured by the Frobenius norm difference of consecutive matrices (Fig. S6). Interestingly, the Frobenius norm of transition matrices declines smoothly over time. We call this phenomenon, the stability of the dynamics. This finding is aligned with the fact that the number of wars per pair of countries in the past 50 years was roughly a 10th as high as it was from 1820 to 1949. Fig. S6 (a) suggests that the disruptions to this trend are associated with important shocks such as the September 11th, 2001 attacks (9/11).

B.2. Quantitative Relation with International Trade Activity.
Additionally, inspired by previous research (1,(36)(37)(38)(39), using the data on international trades among nations since 1995, in Fig. S6 (b), we find a statistically significant correlation between the Frobenius norm difference of consecutive matrices and inverse of global trades. World trade data shows the global trades among all countries in world as of the percentage of each countries' GDP, which is extracted from The World Bank national accounts data (see section Data Availability for details). The stability of dynamics and global trades are correlated in past 23+ years (Pearson correlation coefficient of 0.88 (p < 1e-07)). Fig. S6 shows that as relationships among countries have become stable over the years, the volume of trades has been increased.
Moreover, our causality test shows evidently the more global trades there are, the more stable the relational dynamics become and vice versa over the course of two decades. Granger causality test (40) shows an statistically significant effect of global trades on the stability of the dynamics (p < 1e-03), and also shows a feedback effect (p < 1e-02). The causality tests are found to be statistically significant using both F-test and chi2-test with #lags = 1.
This result simply means as the relations among countries become more stable, they are more willing to trade for economical benefits, and on the other hand when they internationally trade with one another, they are willing to have a stable relationship with one another. In this study, stability is captured with Markov transition matrices of triads over the course of two decades. This finding comports with the seminal study by Jackson et al. (1) and its finding on stabilizing the international conflicts by the decrease of the number of wars among nations as the same rate as the increase in global trades.

Discussion
Balance theory has triggered a literature of efforts to specify the mechanisms that alter interpersonal appraisal networks (8,9,41) towards states of structural balance. This theory is also associated with research on international relations. However, despite the need for longitudinal data to recover the underlying dynamics of balance theory, such investigations have been rare. We have leveraged an extensive longitudinal dataset to advance the research on the evolution of the network of global international relations, and the basic science on balance theory. We find consistently high probabilities of transition toward and remaining in balanced triads and not vice versa. We believe that balance theory's prediction of a structural evolution toward balanced states is sound. Also, we find that the network dynamics of international relations over the past 23+ years have been toward structural stability, consistent with balance theory expectations, with occasional shocks of large scale international events on the trajectory of the global network.

A. Definitions.
In Table 1, we summarize the major notations used throughout the present section.

B. Generalized Structural Balance.
In order to give formal definitions of different versions of generalized balance theory, we start with Heider's (8,42) four axioms: • A1-Friend of a friend is a friend.
• A2-Friend of an enemy is an enemy.
• A3-Enemy of a friend is an enemy.
• A4-Enemy of an enemy is a friend.
Classical balance theory assumes a fully connected network (8,20,32,43). We generalize three definitions of balance, based on the above axioms, to networks with null edges. The value of e ij can have negative, positive or zero (null) value. Out of 138 possible triads, 93 are transitive-balanced (67%), 44 are cluster-balanced (32%), and 24 are classical-balanced (17%). Remarkably, we find a large set of forbidden triad types are transitioning to a relatively small set of permissible triad types (Fig. S5). In balance theory literature, the concept of sparse balance theory has been before addressed as incomplete awareness (16,44). The concept has been motivated by the empirical evidence that affective relations are signed but seldom complete -actors may be neutral toward each other or there may be null or unobserved edges. Cartwright and Harary (9) define balanced cycles on networks with missing edges such that the only con-dition is as cycles containing an even number of negative edges. In this work, we extend the analysis of networks with null edges to the general case of sparse triads. Assume every directed edge e has value ∈ {−1, 0, +1}. For every three distinct nodes i, j, k, to be considered as a balanced triad, the following condition, for any permutation of the nodes, needs to hold Table 2. Different definitions of sparse structural balance theory that generalize existing definitions of balance. Transitivity is the most general model that only requires the first axiom.  . Estimated transition probability for classical-, clustering-, and transitivitybalanced and -unbalanced triads (the pipeline is described in Fig. S4). The x-axis shows the estimated probability (computed by solving Eq. (S7)), the box is the interquartile range of the probability distribution, the orange line is the median of the distribution, and the whisker shows minimum and maximum of the range of the distribution. This figure shows that the probability of transitions from unbalanced triads to balanced ones is significantly higher than the opposite transitions. Also, the probability of remaining balanced is more likely than the probability of remaining unbalanced. These findings hold regardless of the definition of balance (see section Generalized Structural Balance). Our experiments show that movement toward balance is not limited to a dataset or type of data.

C. Network Extraction.
Networks are extracted by aggregating edges in predetermined periods. If the period length is too short we would not obtain sufficient information, while too long periods would decrease the granularity of the observations. We use 12 weeks (∼1 quarter) as the period duration. Note, Fig. S3 shows results based on transition matrices are robust with respect to the choice of period length. Consequently, for ICEWS dataset, we have 103 networks. For a given network, the appraisal between nodes i and j is determined by the sign of summed edge weights of all directed edges observed between them (edge (i, j)), during that time period, as described by where A(t k ) shows the adjacency matrix at period t k .

D. Empirical Markov Transition Matrices.
For each consecutive observation period (t, t + 1), we compute P ij (t), the number of triads of type i that moved to type j from period t to t + 1. In fact, for every three nodes in the network, we find the corresponding triad type, at time t and time t + 1, say triad type i and triad type j, respectively. Then increment the number of transitions happening from P ij (t) ← P ij (t) + 1. Thus, row i sums to P i (t), the number of triads of type i at time t, while P j (t + 1) is the number of triads that have transitioned to type j at time t + 1. Using the transition matrix P ij (t + 1), the transition probabilities can be estimated to obtain the transition probability matrix. These quantities can be arranged in a matrix and normalized by the sum of every row. Therefore, we have row-stochastic transition matrix P where each P ij (t) is conditional on i only, and not on prior states occupied by the triad. By the Markov property, they are identical for all triads, and they converge to a stationary probability distribution.

E. Estimating Time-varying Markov Transition Matrices.
Estimating Markov transition matrices via counting the observed transitions only takes into account the subsequent periods and therefore does not take advantage of any other similarity among transition matrices. The goal is to have a method that while keeping the Markov at- tribute of the system, allows for transition matrices to transfer information based on the existing assumptions in the literature, such as applying smoothness to estimate a more accurate set of transition probability matrices. Hence, we use time-varying Markov chains to capture the most out of the observed transitions in the data. The length of the longitudinal data, in this study, entails having statistical sufficiency to apply a nonparametric convex method to accurately estimate the transition probability matrices directly from the data.

E.1. Model Formulation.
For a network A t at time t, we count the occurrences of each of three nodes, and classify each into one of 138 possible triads. There are T + 1 periods and thereupon T + 1 networks. There are m entities (triads in a dynamic network), that in parallel, change states for T + 1 periods of time. As discussed before, there are 138 triad types (l 1 , . . . , l 138 ). Each empirical Markov probability transition matrix,P t , is computed as followŝ where eachP t is a 138 × 138 matrix and there ex-istP 1 , . . . ,P T as empirical transition matrices. Fig. S3 shows the average empirical Markov transition matrices for different choice of period length. Now, we formalize an optimization problem to account for potential error in each empirical transition matrix as being optimized to be close to the true underlying time-varying Markov transition matrix as P t ∼P t . [S3] The algorithm considers these empirical transition matrices as the input, and estimates all latent transition matrices, simultaneously. By definition, a Markov transition matrix, P t : 138 × 138, should be ergodic, aperiodic and irreducible. Simply put, in the considered Markov chain, it should be possible to be in any state and also should be possible to get to any state from any state. Thus, there are T probability Markov transition probability matrices and every P t needs to satisfy 0 < (P t ) ij ≤ 1, ∀i, j ∈ {1, . . . , 138} and t ∈ {1, . . . , T }.
[S4] By definition, Markov transition probability matrices should be row-stochastic -every row is sum up to 1. That is, 1 T n P t = 1 T n . [S5] Regarding the objective function, based on previous studies dealing with time-varying Markov chains (45), we make an assumption that subsequent transition matrices are similar to each other; the changes happen smoothly. P t ∼ P t−1 . [S6]

E.2. Optimization Problem.
To estimate all unknown transition matrices simultaneously (T matrices of size 138×138), we setup an appropriate optimization problem; we shall solve the optimization problem using convex optimization methods.
where P t : t = 1, . . . , T are the estimated transition matrices and their results are shown in Fig. S5. In Eq. (S7), we apply two forms of smoothness on subsequent Markov transition matrices: l2-norm (Frobenius norm), also called group lasso that enforces a small amount of change in transition matrices, and l1-norm, also called fused lasso that induces a sparse solution with respect to the changes in matrices. In optimization literature, this criterion is called sparse group lasso (46). Together they encourage subsequent Markov transition matrices to only deviate from each other with small values and in only a few cells.
This formulation allows for learning a separate model for each transition between time periods while inferring information globally across all periods. Non-parametric estimating all transition matrices together via an optimization problem decreases the chance of overfitting. This method also allows for finer time windows than otherwise, and provides a better inference granularity in time. Consequently, even with few observations of the data, we end up with an accurate estimation for timevarying transition matrices. Algorithm 1 illustrates the steps for estimating the time-varying Markov chains.
The problem in Eq. (S7) is convex. The reason is that the objective function is a summation of two norms which are convex, all of the inequality constraints are convex, and all equality constraints are affine. Therefore, the problem is convex (47), it has a globally optimal solution, and we solve this equation by a convex optimization solver, CVXPY (48).
Note, in appendix, we provide a novel proof for optimality of the convergence rate given a bound in probability for the proposed optimization model of simultaneous leaning of all transition matrices in Eq. (S7).

E.3. Model Comparison.
In order to test the predictability of the estimated tran- Tune the hyper-parameters λ 1 , and λ 2 for time starting from t = 2 to T − 1 dô P t := zero matrix [138,138] for every triplet of nodes i, j, k in set of nodes V do triad1 := type(A t−1 (i, j, k)) triad2 := type(A t (i, j, k)) P t [triad1, triad2] :=P t [triad1, triad2] + 1 end P t :=P t / 1 T nPt end Solve the convex optimization problem in Eq. (S7) sition matrices, we predict unseen proportion of the unseen triads using the proposed algorithm as compared to competitive baseline methods. For instance, assume we have T periods. We hold out the proportion of triads at time T and train on periods of 1 up to T − 1 using Algorithm 1. Consequently, the estimated transition matrix for time T − 1 to T is multiplied with proportion at T − 1 to give us the predicted proportion at T . We apply one step-ahead forecast, for multiple times, in each time retraining the model up to the last held out time. To this end, we have a more descriptive picture of the prediction power of the proposed method. As the forecast metric, we compute Root Mean Squared Error (RMSE) of the difference between the predicted proportion with the ground truth. Random prediction is neglected due to its significantly worse accuracy compared to other baselines. Consequently, we compare our forecast to a baseline of simply predicting the last time steps proportions, and versus an average of all preceding time steps' proportions. Fig. S7 depicts that the proposed algorithm outperforms the baselines and makes accurate forecasts of the proportion of triads in the subsequent time period in the ICEWS dataset. We also use this forecasting method for multiple steps, and apply on a validation set of the periods, to carefully tune the hyper-parameters of algorithm 1. In this study, we set λ 1 = 0.5 and λ 2 = 0.05.
A future direction to improve the predictability of the model is to remove the Markov assumption in the modeling. We can let models find the best number of previous periods which should be taken into account for predicting the proportion of triads. Recursive neural networks can model the non-Markovian aspect inherent in the data.

Author Contributions
O.A. analyzed the data and implemented the methods. O.A. and N.E.F. led the study. F.B. and A.K.S. actively participated in the discussion and gave critical comments. All authors substantially contributed to the design of the analysis and the writing of the paper.

Data Availability
The data is publicly available as the so-called "Integrated Crisis Early Warning System (ICEWS) Dataverse" by Boschee et al. (49) under https://doi.org/10.7910/DVN/ 28075. The hyperlink also includes a document about the ICEWS event coding protocols.
The World Bank trade data (% of GDP) is also publicly available under https://data.worldbank.org/.

Code Availability
The source code is publicly available under https://github.   S5 shows the evidence for the network dynamic toward balance for these three datasets. The probability of transitions from unbalanced to balanced triads is significantly higher than transitions from balanced to unbalanced triads. The probability of remaining balanced is more likely than the probability remaining unbalanced. These findings are most strongly expressed in the data on transitivity-balance. This especially strong expression of transitivity-driven evolution is consistent with its status as the most important axiom of structural balance theory (8,33).
Interestingly, Fig. S5 shows that transition toward and staying in structurally balance hold regardless of the definition of balance (in the main paper) and the setting (international or financial networks), but notably, they have the strongest expression with transitivity-balance. classical Fig. S9. Estimated transition probability for classical-, clustering-, and transitivity-balanced and -unbalanced triads in all three datasets (the pipeline is described in Fig. 4 in the main paper and datasets in Table 3) . The x-axis shows the estimated probability (computed by solving Eq. (S7)), the box is the interquartile range of the probability distribution, the orange line is the median of the distribution, and the whisker shows minimum and maximum of the range of the distribution. This figure shows that the probability of transitions from unbalanced triads to balanced ones is significantly higher than the opposite transitions. Also, the probability of remaining balanced is more likely than the probability of remaining unbalanced. Surprisingly, these findings are robust with respect of multiple definitions of balance and different settings.

Proofs for the time-varying Markov model
Settings: There are n = 138 states, named l 1 , . . . , l n . There are m entities (triads in the network), that in parallel, change states for T + 1 periods of time. Each empirical Markov probability transition matrix,P t , is computed as followsP where eachP t is a n × n matrix and there existP 1 , . . . ,P T . S t shows the state of triad i (three countries) at time t.

Assumptions:
The assumption is there is an error in empirical transition matrices such that where P 0,t : t = 1, . . . , T are the true unknown transition matrices,P t , t = 1, . . . , T the empirical transition matrices, and z t , t = 1, . . . , T are i.i.d sub-Gaussian errors with zero mean. The probability transitions are between [0, 1]; thus, it is easy to show that the error is bounded as there exists a value for M, σ > 0 such that P (|z i | > t) ≤ M exp(−t 2 /(2σ 2 )) ∀t > 0, i = 1, . . . , n.
Hence, z t is sub-Gaussian. In our results, we also find empirically support for independence of errors as the Pearson correlation of every cells for subsequent estimated matrices are very small and more than 70% are not statistically significant (p ≥ 0.05).
We also assume the total variation (50) of matrices does not grow too quickly (51), where each P 0,t is a matrix with n 2 dimensions, for the constant value of n = 138 T V (P 0,t ) = T i=2 ||P 0,t − P 0,t−1 || 1,1 ≤ n 2 C T = O(T ). [S10] We empirically report the above equation for our data, is indeed O(T ) and precisely is 0.04T .
Problem definition: Instead of having a model for one Markov transition matrix and fit that to the entire period, we define a convex optimization problem to predict all transition matrices altogether using trend filtering for nonparametric regression (52).
where P t : t = 1, . . . , T are the estimated transition matrices. λ 1 and λ 2 are the hyperparameters which are tuned by applying Grid Search with 5-fold cross-validation.
A. Convexity proof. The objective function is a summation of three norms which are convex, all of the inequality constraints are convex, and all equality constraints are affine (47). Therefore, the problem is convex, it has a globally optimal solution (47). Thus, we solve this equation by a convex optimization solver, CVXPY (48).
[S12] Proof. Since the objective function in Eq. (S11), in previous proof is shown to be convex; thanks to the optimiality of argmin, P , the solution of the optimization problem minimizes the objective function more than any other matrix, say X, objective( P ) ≤ objective(X).
We use Eq. (S11) and rewrite it as As a matter of fact X t , could be replaced by P 0,t as follows ||P 0,t − P 0,t−1 || 1,1 .
In the first term we use P R = ∆ † ∆ where ∆ ∈ R r×n is an arbitrary linear operator, with r rows. We have 16 | askarisichani et al.