Information flow in first-order potts model phase transition

Phase transitions abound in nature and society, and, from species extinction to stock market collapse, their prediction is of widespread importance. In earlier work we showed that Global Transfer Entropy, a general measure of information flow, was found to peaks away from the transition on the disordered side for the Ising model, a canonical second order transition. Here we show that (a) global transfer entropy also peaks on the disordered side of the transition of finite first order transitions, such as ecology dynamics on coral reefs, which have latent heat and no correlation length divergence, and (b) analysis of information flow across state boundaries unifies both transition orders. We obtain the first information-theoretic result for the high-order Potts model and the first demonstration of early warning of a first order transition. The unexpected earlier finding that global transfer entropy peaks on the disordered side of a transition is also found for finite first order systems, albeit not in the thermodynamic limit. By noting that the interface length of clusters in each phase is the dominant region of information flow, we unify the information theoretic behaviour of first and second order transitions.

Tipping points abound in nature and society, and, from species extinction to stock market collapse, their prediction is of widespread importance.Global transfer entropy, a general measure of information flow, is an advance predictor for second order transitions, such as the Ising model ferromagnetic transition [1], where correlation lengths diverge at the transition giving long range order.Here we show that (a) global transfer entropy is also a predictor of finite first order transitions, such as ecology dynamics on coral reefs [2], which have latent heat and no correlation length divergence, and (b) analysis of information flow across state boundaries unifies both transition orders.Important real-world systems exhibit both orders, as does the canonical Potts spin model.Examples include: nematic crystals [3], fundamental to LED display and lighting technology; consciousness and anaesthesia [4].We obtain the first information-theoretic result for the high-order Potts model and the first demonstration of early warning of a first order transition.The unexpected earlier finding that global transfer entropy peaks on the disordered side of a transition is also found for finite first order systems, albeit not in the thermodynamic limit.By proposing that the interface length of clusters of each phase is the dominant region of information flow, we show that first and second order behaviour is consistent with flow magnitude.
Information theory successfully predicts complex phenomena as diverse as neural information flow and the dynamics of starling flocks.Mutual information has been successfully used to model biological communication channels and shown to peak or diverge at critical points in spin systems and real-world domains [5,6].Our new results on global transfer entropy, closely related to conditional mutual information, fit into existing studies on mutual information and provide new insight into behaviour before a transition occurs.
Numerous mechanisms for predicting phase transitions exist, applied, for example, from core science and engineering through biology, ecology, medicine and finance [7]: increased variance; critical slowing down [7]; flickering [8]; and a peak in the global transfer entropy (Eqn.2).By thinking about tipping points in terms of phase transitions, we can apply general results from physics, and make use of a central idea-that there are a limited number of universal transitions.
To study such transitions in their simplest form, two canonical models stand out: the Ising model [9], a binary spin system on a square lattice, where each point on the lattice has a spin, which may point up or down; and the Potts model, which generalises Ising to spins with an arbitrary number of states, q, and reduces to the Ising model for q = 2.
These models have been used for modelling real-world applications as diverse as: foam flow (for applications such as brewing, oil recovery, and fire-fighting) [10], tumour growth [11], reconstructing social networks [12], and studying human population and night light patterns [13].
Note that the Ising and Potts model are equilibrium models, conserving energy during state changes, while non-equilibrium models do not.Importantly, some nonequilibrium models exhibit non-reversible phase transitions, for example, species extinction cannot be reversed, while equilibrium models can always transition back to disordered states (dependent on increased temperature).
In the Ising model [9], mutual information peaks at the transition between ordered and disordered phases [5,6], as does pairwise transfer entropy [14] (Eqn.10, suppl.material), while global transfer entropy (Eqn.2) peaks distinctly on the disordered side [1].Here we extend this prior work to the q-state Potts model [15] which exhibits first-order phase transitions for q > 4 [16].At q = 5 the transition is weakly first order, implying a long correlation length and low latent heat.As q increases the correlation length decreases and the latent heat increaes.We show that as the system becomes more strongly first order (i.e., q > 7) the behaviour of global transfer entropy, G, diverges from the second-order behaviour: In the thermodynamic limit, G becomes discontinuous at the transition temperature, T c , peaking at T + c .The standard Potts model comprises a lattice of spins with periodic boundary conditions and size N = L × L, where the system state is s = s 1 , . . ., s N , with s i ∈ {1, . . ., q}.The interaction energy between two neighbouring sites is E ij = −Jδ(s i , s j ) giving the Hamiltonian H = −J i,j δ(s i , s j ), where interaction strength J = 1, δ(x, y) is the Kronecker delta function which is one if x = y and zero otherwise, and i, j are all inter-acting pairs of sites in the system.Local site energy, E i , is defined similarly, fixing i and summing over its four neighbours.
The system is updated using Glauber dynamics [17].
Overall alignment of the lattice is measured by its magnetisation, M = (q s m − 1)/(q − 1) [18], where s m is the mode state and s m = δ(s m , s i )/N is the proportion of the dominant state over all sites, ranging from q −1 to 1, giving magnetisation in the range [0, 1].M serves as the order parameter and the order-disorder transition occurs at an intermediate temperature [19] where the (thermodynamic) system is disordered (M = 0) at temperatures above T c and non-zero below T c .The behaviour at T c defines the transition order, where q ≤ 4 has continuous M (and discontinuous dM/dT ) giving a second-order phase transition.The first-order case studied here occurs at q > 4 with discontinuous M at T c .Transfer entropy, T, measures (Eqn.8, Eqn. 10 in suppl.material) information flow from one stochastic process, Y , to another, X-in this case the states of two neighbouring spins over time.Global transfer entropy measures the average information flow of the entire system to individual spin sites: We note however, that all information-no matter its origin in the lattice-must flow to s i via its neighbours or its own past, and thus consider only the immediate neighbourhood of each site (including s i ) rather than s in Eqn. 2. As with T, G ≥ 0 with G = 0 iff each site s i , conditioned on its past, is independent of its neighbours.The first-order transition shows a void region of energy space around the phase transition, such that general purpose update schemes, such as Glauber dynamics, are very unlikely to enter this region.In fact, for temperatures close to the critical temperature, energy distribution P (E) is bimodal (See suppl material, Fig. 5).Thus we estimate G via two methods.
In the first, denoted G (g) , we employ straight-forward Glauber dynamics where each update, or sweep, comprises N spin flip attempts.
The second uses the density of states, g(E), calculated with the Wang-Landau algorithm [20].P (E) may then be calculated from where E is the lattice energy.Any thermodynamic observable, f (T ), may now be determined from its value as where P (E) is the distribution of states, not probability, and has been normalised for visualisation and computational reasons [22].After determining g(E) we need to determine G(E).While f (E) depends on energy only, G is a temporal quantity and thus also depends on temperature, therefore we in fact need to determine G(E, T ) for varying T .Additionally, as P (E) → 0 for many values, f (E) can be measured more simply by culling energy values where P (E) is sufficiently low-that is, reaching every E is unnecessary and so G(E, T ) can be calculated via Glauber dynamics rather than Wang-Landau updating.
We thus collect ensemble statistics similar to G (g) (with fixed T per ensemble), collating statistics for G(E, T ) using the energy E of the lattice before the Glauber sweep, where the future state is the post-sweep state.We denote this regime as G (s) .We note however that this may not be strictly correct, as E can change after each successful spin flip during the sweep, thus statistics collated for G(E, T ) will include elements from E = E.To address this, we separately collate statistics on a per-flip basis, where s i and its neighbourhood are recorded at any attempt to flip s i , giving G (f ) .
Due to data volume requirements involved in estimation of G, G (s) and G (f ) employ a compression regime which is applied to G (g) in a validation step, giving G (e) .
All four variants exhibit a peak in G on the disordered side of the transition (Fig. 1).Note too that the effect of simulation changes-uncompressed versus compressed histograms and per-sweep versus per-flip statistics-merely seems to be a constant factor.The peak locations are mostly stable for the pure Glauber approaches, G (g) , G (e) , except for L = 32, while the thermodynamic, density of states approaches, G (f ) , G (s) , exhibit a strong shift in G peak as q and lattice sizes increase, rapidly approaching the critical temperature.
The behaviour of the G peak indicates the presence of finite size effects in the system.Indeed as ensemble volume is increased-both by increasing the number of realisations as well as the lattice size-for G (g) , G (e) a shift in peak location can be observed, shown in Fig. 2. We also observe flattening near T c , indicating that the peak will continue to shift at larger system sizes.Additionally, as expected, Glauber dynamics have difficulty traversing energy space near the transition temperature, which for the current experimental configuration manifests as an artificial reduction in G (g) , G (e) for T c (L) < T < ∼0.703 (See suppl.materials).
Finally, we look at a physical understanding of the behaviour of G. Intuitively, information flows when neigh- G measured using all four methods (top: G (g) , G (e) , bottom: G (f ) , G (s) ), with q = 5, 7, 10 (columns) for L = 32, 64, 128.Ensemble collated using 10 5 time steps over 10 realisations.Vertical lines indicate Tc.Filled symbols indicate "effective" Tc(L), the location where P (E) is precisely bimodal for given q, L.  2. G (g) (solid), G (e) (dashed) at increasing ensemble sizes for q = 10, performed by varying number of realisations, r.As ensemble size increases, system explores more phase space, thus converging to the thermodynamic behaviour given by the density of states.Inset: G for Ising model.From Barnett et al. [1].
bour states differ, hence zero information flow in ground states.This behaviour necessarily extends to clusters of states, implying information flow occurs on the boundaries, or interfaces, between clusters (See Fig. 3).It seems reasonable then to assume that information flow scales with number of interfaces.However, such a maximum coincides with the zero-energy fully disordered regime, where quite clearly G = 0.This assumption neglects the temporal nature of G, which is disrupted at high tem-perature.
Remember that G is a measure of a site's dependence on neighbouring sites, conditioned on its own past.At high temperature, spin flips are essentially random, choosing new states with little influence from neighbours.As temperature decreases, neighbour influence increases, leading to clusters of similar sites.We can thus approximate average influence by probability of cluster size, p(c).This influence is the manifestation of information flow in the system, but only on cluster boundaries (since information flow is conditioned on its own past), leading to: where L c is the boundary length of cluster of size c.Note however that when clusters get sufficiently large-i.e., on the order of system size L-they no longer have an outer perimeter and are instead defined by the holes created by other clusters (Fig. 3, bottom).Thus for this dominant cluster to increase in size, the internal holes must shrink and its boundary length L c actually falls.As temperature decreases, influence increases, but the available sites to transfer influence decreases, hence total information flow G falls.We note that Eqn. 5 is essentially the average interface length.There should thus be some relationship between average interfacial length and net information flow in the lattice.
The intuitive interface model of Eqn. 5, shown in Fig. 4, gives a remarkably good match to the G trends, peaking in the disordered regime in all cases, and converging to T c only where systems become more strongly first-order (increased q and increased L for q > 4).In the A B FIG. 3. Interfaces for q = 5 lattice sampled from T = Tc (0.8515) (Top) and T = 0.5 (Bottom) where each square is a lattice site.Top: Arrows show the counter-clockwise path interface walker (for large cluster) takes around complex interactions.Labelled clusters, while sharing the same state, are disjoint, and thus have separate interfaces.Average interface length is (34 + 3 • 8 + 3 • 6 + 9 • 4)/16 = 7. Bottom: When one cluster dominates, it no longer has an "outer" perimeter.Average interface length is (6 + 4 • 4)/5 = 4.4.q = 2 Ising case, interface peak location remains stable at increasing lattice sizes, as does G peak location [1].
Thus the average interface length is a suitable proxy for G, and is much easier to compute.Using this model, we find that the behaviour for the first-and second-order transitions fits into a single unified framework.

METHODS SUMMARY
Experiment.Straight-forward Glauber approaches construct ensemble composed r = 10 realisations, with settling time of 1000 time steps, followed by a measurement sequence of 10 5 time steps as in the Ising model [1].We optimise simulation by modifying initialisation dependent on T .In the ordered regime, T < T c , we initialise all realisations to the same ground state, noting that T c (L) > T c [23, p. 4] and thus only the ordered peak exists in P (E) for T < T c .For T ≥ T c we evenly divide realisations into random ground states or random disordered states to sample both P (E) peaks.Density of states approaches constructed likewise, minus the superflous (in this regime only) settling time.
Compression Regime.G is estimated via plug-in entropy estimators, using histograms to determine dis- FIG. 4. Average interface length for systems with q ∈ {2, 5, 7, 10} for indicated lattice sizes.The behaviour in peak location mimics the behaviour of the G peak in all systems: the first-order cases, q ∈ {5, 7, 10}, converge to Tc as the system becomes more strongly first order (increased q, L), while the second-order peak q = 2 remains stable above the phase transition.Note the factor of two difference in temperature for q = 2 and Ising results (i.e., Fig. 2 inset) is simply due to a slight difference in definition of site energy (i.e., Eij), with no further side effects.
tributions.Such a histogram requires six dimensions (a site, its four neighbours and its future) of q elements in each dimension and thus requires infeasibly many data points to accurately calculate G(E, T ).We note that the transition probability of spin flips depends only upon the number of spins matching the initial and final spins, rather than the exact neighbouring states.We thus compress this by replacing the neighbour dimensions with current site energy E i .We note the alternate approach, using the energy delta ∆E ki , is incorrect as information becomes double counted in G. E i was validated with an alternate reduction where the binary function δ(s i , s j ) was used for each neighbour dimension.
Interfaces.The average interface length is defined as: where N I interfacial lengths are found by performing a "turn-right walk " procedure, similar to Saberi [24], on every unmarked edge between adjoining lattice sites of differing states.Edges are marked in association with an adjoining site (such that each edge is ultimately marked zero or two times).This prevents a cluster from counting its perimeter (of length N i ) N i separate times, but accounts for interface boundaries between clusters of two or more differing states.This also addresses clusters with two or more disjoint interfaces, i.e., a 2D doughnut.Interface results, I(T ), calculated from I(E) and Eqn. 4 with the weighted Wang-Landau update scheme [20].Each E value sampled at minimum 5000 times, up to a maximum of 10000 samples.* jbrown@csu.edu.au

SUPPLEMENTARY MATERIAL Glauber Dynamics
The system is updated using Glauber dynamics [17], where site s i transitions to state s k with probability where T is the system temperature, k b is taken as one, and ∆E ki denotes the difference in site (or system) energy should the flip occur-i.e., ∆E ki = E k − E i .This transition probability biases spin flips towards lower energy states-where the ground state occurs at minimum energy when all sites take the same state-while the system temperature inhibits this bias, which disappears as T → ∞ such that spins flip to random states with probability 0.5.Glauber dynamics satisfy detailed balance [25] and thus yield the thermal equilibrium probabilities at stationarity.

Transfer Entropy
Transfer entropy measures information flow from one stochastic process, Y , to another, X-in this case the states of two neighbouring spins over time.It is a nonnegative quantity, reaching zero iff process X, conditioned on its own past, is independent of the past of Y .Positive values indicate a statistical dependencya reduction in uncertainty-of X given knowledge of the past of Y .Transfer entropy is given by the time-lagged mutual information, conditioned on the past of X: P (E) as T → T − c , demonstrating emergence of right (disordered) peak.Middle: The location of the "effective" transition, Tc(L), defined by equal height peaks.Right: P (E) as T moves away from Tc(L), showing dissolution of ordered peak.Note that in the thermodynamic limit, each peak only exists in its relevant regime and emergence of bimodal peaks is instantaneous at Tc.
where we use a single-step time-lag and the pairwise transfer entropy is simply the average transfer entropy over all interacting sites:

Energy Space
The first-order transition shows a void region of energy space around the phase transition, such that general purpose update schemes, such as Glauber dynamics, are very unlikely to enter this region.In fact, for temperatures close to the critical temperature, energy distribution P (E) is bimodal (See Fig. 5-note a scaling factor is introduced such that peak maximum is one).As q decreases, the peaks shift closer together until they merge into a unimodal peak at q = 4 (characteristic of a secondorder transition).The valley between peaks is shallower for given lattice size at lower q, thus q = 5 is considered weakly first-order, while q = 10 is strongly first-order.As L increases (with constant q) the valley deepens, making simulation for large lattices, particularly at q = 10, increasingly difficult.

Neighbourhood Compression
Capturing data for G requires a 6D histogram-one dimension for each neighbour, current spin, and future spin-which requires q 12 data points for effective estimation (using the heuristic B = √ N [26]).This volume of data is difficult, yet achievable, for a single histogram (as in G (g) ), but completely infeasible for E histograms, as required in G (s) and G (f ) .Thus we require some way to compress the histogram.
To accomplish this, we note that the transition probability depends only upon the number of spins matching the initial and final spins, rather than the exact neighbouring states.An intuitive approach replaces the neighbour dimensions with the energy delta term, ∆E ki , as this should encode all transition information.This approach is incorrect however as it incorporates information about the future state directly into the conditioned variables in the second term of Eqn.9-that is, it introduces some form of H( X | X ) into the equation, incorrectly eliminating information.
Thus we encode just the current site energy, E i , although not without trade-off: removal of neighbour details leads to consistent reduction in total available information (See Fig. 1 of main text, top row).This approach is validated with an alternative reduction with consistent results-where the binary function, δ(s i , s j ), is used for each neighbour.These approaches give significant reductions in data requirements-(5q 2 ) 2 and (2 4 q 2 ) 2 respectively.The former approach will be employed as it requires fewer bins, and thus data points, without effect on the result.

Lattice Initialisation
The main text describes an artificial reduction in G (g) , G (e) for the experimental configuration.This was partially abated by increasing the ensemble size with more realisations, however the effectiveness of this approach diminished with increasing realisations.Observation of the distribution of energy states from these realisations highlights the unavoidable weakness of simulation based approaches: critical slowing down.The initialisation regime employed is intended to side-step this weakness and produce bimodal P (E): realisations are evenly initialised to disordered and ordered states at t = 0 for T ≥ T c under the assumption that as temperature increases, the ordered realisations will rapidly dissolve into disorder.This would then circumvent issues with traversing the valley in P (E).However, observation reveals that the high-temperature dissolution does not occur rapidly enough: the normalised ordered peak is above 10 −4 until approximately T = 0.703, where for q = 10 the density of states estimation shows the peak should drop below 10 −4 at T ≈ 0.7017.
FIG. 6. G (g) for q = 10, L = 128 calculated over 10 5 time steps with 10 realisations per ensemble demonstrating two initialisation regimes for T > Tc: half of the realisations initialised to random ground states and half disordered against all realisations initialised to disorder.Both regimes use the bimodal initialisation for T = Tc.G (f ) , G (s) also included.Note that while the disorder-initialised G (e) is coincident with G (f ) away from Tc, it does not exhibit the same drop at Tc(L) which should occur due to the appearance of bimodal P (E).
If instead realisations at T > T c are initialised simply to disordered states, thus avoiding spurious ordered peaks in P (E), then G (g) , G (e) increase monotonically approaching T c , past T c (L), as seen in Fig. 6.This highlights the same slowing down from a different angle: impractically large observation windows are required to traverse the valley, and no bimodal peak is obtained at all at T c (L) (i.e., a spurious lack of ordered peak) which artificially inflates G. Thus simulation approaches are inappropriate for determining the limiting behaviour of G near the transition, as they require infeasible simulation time or finely tuned initialisation regimes, noting the latter is only possible given P (E).Yet this is redundant: if one has P (E) then Eqn. 4 from the main text can be utilised directly, as we have done, thus circumventing these issues altogether.G (f ) and G (s) have been included in Fig. 6, noting that G (s) produces coincident results with the disorder-initialised G (e) away from the transition, demonstrating the equivalence of the two approaches where the system is unimodal and disordered (and thus initialisation issues are moot).

Limiting Behaviour
The limiting behaviour of G can be determined via closer analysis of G(E, T ). Figure 7 shows G (f ) (E, T ) and P (E) at q = 10, L = 128 for selected temperatures.As with the above regimes, realisations are initialised evenly between random ground states and disordered states.We can observe the valley in G (f ) (E, T ) which realisations are unable to traverse, noting that for T = 0.702, 0.707, while no ordered P (E) peak exists, G (f ) (E, T ) is non-zero due to initialisation regime.As T increases, the system is able to move through this region of energy space, until high enough temperatures are reached such that lower energies become impossible, with the ground state realisations very rapidly becoming disordered.
Consider now the extreme energies (effectively temperatures) in Fig. 7. On the disordered end (E/N = −1), we can see that G (f ) (E, T ) peaks below the disordered P (E) peaks, and steadily decreases at higher energies (and thus temperatures), consistent with the expectation of reduced G as spins become increasingly independent.Similarly, as T → 0, low energy G (f ) (E, T ) goes to zero as well: conditioned on its own past, s i becomes independent of its neighbourhood-the neighbourhood adds no additional information to knowing the past of s i -as expected.
The observation of high energy G (f ) (E, T ) peaking earlier than P (E), in the void region, also resolves the limiting behaviour near T c .Specifically, when moving towards T c (and thus P (E) peaks at progressively lower E/N ) from high temperatures G (f ) (E, T ) is always increasing.Therefore in the thermodynamic limit, where P (E) is unimodal until precisely T c , G will increase towards T c .The peaks appearing in Fig. 1 of the main text away from T c are then due to the finite size effect of bimodal P (E) away from T c where low G ordered regimes are incorrectly sampled.Furthermore, in the limit at T c , a system will be either ordered or disordered with valley P (E) = 0, and consequently G will be undefined at T c .

FIG. 5 .
FIG.5.P (E) of Potts states for q = 10, L = 128.Left: P (E) as T → T − c , demonstrating emergence of right (disordered) peak.Middle: The location of the "effective" transition, Tc(L), defined by equal height peaks.Right: P (E) as T moves away from Tc(L), showing dissolution of ordered peak.Note that in the thermodynamic limit, each peak only exists in its relevant regime and emergence of bimodal peaks is instantaneous at Tc.
FIG.7.G (f ) (E, T ) (top) for selected temperatures at and above Tc with P (E) (middle) for q = 10, L = 128 with 10 realisations, half initialised to random ground states and half to disordered states.For temperatures near Tc, we observe a valley in G (f ) (E, T ) as in P (E), where realisations are unable to traverse.As T increases, central energy values are reachable, with lower energy values becoming unreachablenote that G (f ) (E, T ) drops to zero at E/N ≈ −1.75, −1.3 for T = 0.707, 0.710, respectively.Bottom shows G (f ) (E, T ) average over T where each G (f ) (E) is scaled with respect to frequency over E.