Higher-order temporal network effects through triplet evolution

We study the evolution of networks through ‘triplets’—three-node graphlets. We develop a method to compute a transition matrix to describe the evolution of triplets in temporal networks. To identify the importance of higher-order interactions in the evolution of networks, we compare both artificial and real-world data to a model based on pairwise interactions only. The significant differences between the computed matrix and the calculated matrix from the fitted parameters demonstrate that non-pairwise interactions exist for various real-world systems in space and time, such as our data sets. Furthermore, this also reveals that different patterns of higher-order interaction are involved in different real-world situations. To test our approach, we then use these transition matrices as the basis of a link prediction algorithm. We investigate our algorithm’s performance on four temporal networks, comparing our approach against ten other link prediction methods. Our results show that higher-order interactions in both space and time play a crucial role in the evolution of networks as we find our method, along with two other methods based on non-local interactions, give the best overall performance. The results also confirm the concept that the higher-order interaction patterns, i.e., triplet dynamics, can help us understand and predict the evolution of different real-world systems.


Triplet dynamics
In order to look beyond pairwise interactions we focus on triplets, at the configurations of three nodes in a network along with all their edges, i.e. the three-node graphlet (also known as a "triphlet"). configurations of three nodes in a network along with any edges between those nodes as shown in Fig. 1. Our focus is not on the number of each triplet type at each time but on how each triplets evolves from one arrangement to the next in a temporal network, an example of "temporal graphlets". Transition matrix. We start with temporal graphs G (s) , a sequence of graphs with one node set V but with variable edges sets E (s) , where s is a discrete time variable. The variable M s (u, v, w) , often abbreviated to M s , records the state of a three-node triplet (u, v, w) at time s. The states are the subgraphs equivalent to the three nodes and all the edges between them at time s. That is M s ≡ M s (u, v, w) is a map from a node triplet (u, v, w), where u, v, w ∈ V , to the induced graphlet, the maximal subgraph in G (s) containing the nodes u, v and w. We will use M to denote the set of all the possible three-node graphs and m i ∈ M represents one of the possible graphs in M . So formally The choice of M is not unique and it depends on the characteristics of the nodes and links used to distinguish configurations. If we characterise the states by the number of links among the three nodes, that is use unlabelled graphs, there are just four distinct states in M which we can name as m 0 , m 1 , m 2 and m 3 as shown in Fig. 1. So M 4 is the state set characterised by the number of links and here m i is the unlabelled graph of three nodes and i edges. We will use M 4 for visualisation and illustration only in our work here.
By contrast, we want to consider the labels (identities) of the nodes in our work. So for our results we work with eight states in M since each link between the three pairs of nodes can either be present or absent. That means the links between different pairs of nodes are distinct. This state set is represented by M 8 as illustrated in the lower row of Fig. 1.
Typically one uses graphlets by counting the frequency of each graphlet m ∈ M in each graph G (s) in the temporal graph sequence. So a simple way to look at the evolution is to see how these counts change. We wish to go beyond this and look at the way local structure controls the local evolution. In order to do this we define a transition matrix T which describes the likelihood that a given triplet is to be transformed in to another triplet where δ(G 1 , G 2 ) = 1 (0) if graphlets G 1 and G 2 are isomorphic (not isomorphic). The best estimate T ij (s) of T ij (s) is produced if T s−1 is the set of all possible distinct node triplets. However, for a large graph with N nodes there are N 3 = N · (N − 1) · (N − 2)/6 three node combinations making it computationally inefficient to use all triplets. For example, for N = 100, 000, N 3 = 1.7 · 10 14 , Therefore, we will use random sampling of triplets of a sufficient amount to produce our estimates. The estimations are detailed in Appendix A.
To illustrate the construction of a triplet transition matrix, we use the simpler M 4 = {m 0 , m 1 , m 2 , m 3 } . Note that the subscripts of T , i, j = 0, 1, 2, 3 correspond to subscripts of states m i in M 4 shown in Fig. 1. An example of the evolution of a network of 5 nodes and the triplet transition matrix is shown in Fig. 2. Note we will use labelled subgraphs and M 8 for our analysis.

Evidence for higher order interactions
To investigate the effectiveness of three-node interactions, we start by considering a 'pairwise model' whose dynamics is driven only by pair-wise relationships. This will act as a null model when analysing the transition matrix for artificial and real-world networks.
A pairwise model. Our 'pairwise model' is a stochastic graph model for dynamic networks 50 in which the evolution of the links is based on comparison of the evolution of the edge between pairs of nodes, so a two-node graphlet model. In the pairwise model, moving from one network, G (s) to the next network in the sequence, G (s + 1) , any pair of nodes with no existing link gains a link with probability p otherwise with probability (1 − p) the pair of nodes remains unconnected. Similarly, every existing link e ∈ E (s) is removed with probability q otherwise with probability (1 − q) the link remains. The number of links is not preserved in this model. This null model is a lower order description of the local interactions than our full analysis in terms of triplets. It is straightforward to write down the form of the transition matrix in our triplet based analysis when assuming this pairwise model gives a precise description, giving us a two-parameter transition matrix denoted as T (pw) (s) . The detailed calculation of the form is given in Appendix B.
If we assume that graph evolution follows this pairwise mechanism, we can estimate values p(s) and q(s) for the parameters p and q respectively by looking at how links changed over one time step, i.e., from the edge set This gives us our pairwise model prediction for the triplet transition matrix T (pw) (s) , where we substitute p(s) from (6) and q(s) from (5) for p and q in T (pw) in equation (B1) of appendix B. That is T (pw) (s) = T (pw) (p(s),q(s)) where  51 . The nodes are shareholders in Turkish companies and two shareholders are linked if they both hold shares in the same company in one year. The snapshots in the Turkish companies temporalShareholder network, G (s) , are for consecutive years. The second data set is a Wikipedia Mathematicians network. Nodes are pages corresponding to biographies of mathematicians in Wikipedia and two nodes are connected in one snapshot G (s) if there is a hyperlink (in either direction) between the two pages at the time the data is taken 52 . The third temporal network is constructed from college message data. In this case the nodes are students and an edge in a snapshot indicates that the students exchanged a message within the interval associated with that triplets are shown in the remaining rows in the figure above. For instance, the subgraph induced by a triplet (a) is the triplet m 3 . This triplet loses an link in the next graphlet, so the same node triplet is now associated with the two link triplet m 2 . This change, therefore, contributes to the T 32 entry. As this is the only induced subgraph (graphlet) in the earlier G (s − 1) graph which is isomorphic to the m 3 triad, this means T 32 = 1 while T 3j = 0 for j = 0, 1, 3 . On the other hand, the double link m 2 triad appears twice as an induced subgraph of node triplets in G (s − 1) , namely (c) and (d). These triplets have two different triads in G (s) leading to two non zero entries in the T 2j row, T 22 = T 23 = 1/2 . Note we use M 8 of Fig. 1 [53][54][55] . We also use a network based on face-to-face contacts at a conference on hypertext. In the network, a node represents a conference visitor, and an edge represents a face-to-face contact that was active for at least 20 s 56,57 . Finally the Email network is derived from the emails at a large institution. Each node corresponds to an email address. An edge in a given snapshot indicates that an email was sent between the nodes in the time interval corresponding to that snapshot 53,58,59 . For temporal networks based on real data, the actual time interval between consecutive graphs in our temporal network, between G (s) and G (s + 1) , is given in real units as t .
In some data sets we are able to look at the same data with different real time intervals between data sets. We provide a summary of the graph statistics in Table 1: Quantifying non pairwise interactions. The pairwise model captures the effects of the interaction of node pairs. We can use this in the form T (pw) (s) in (7) as a benchmark to show how higher-order information is captured by our triplet based analysis using T(s) of (4). One way to study this for any given temporal network is to look at the difference � T(s) of the transition probabilities between the empirical triplet transition and the assumed pairwise null model: where each entry in the matrix represents the difference between the estimated triplet transition probability and pairwise transition probability for different states.
To test our approach using our triplet transition matrices, we first look at artificial temporal graphs created from three simple models. The first model, the pairwise model, is simply the same pairwise interaction mechanism used above to define T (pw) (s) . The second model, the edge swap model, is the configuration model which is also based on pairs of nodes. The third model, the random walk model, uses short random walks in order to find new target nodes when rewiring an edge, and so this involves higher-order interactions. In the last two models, a fraction of edges in G (s) are rewired to give the next graph G (s + 1) in the temporal network. The results are as expected with matrix � T(s) close to zero for all entries for the first two models based on node pairs and it is only with the third higher-order model that some � T(s) entries are found to be large. The networks constructed are undirected. The details of the models and of the results are given in Appendix B.
We then apply this framework to analyse some real data sets. For clarity we show results for our triplet transition matrix (8) when working with M 4 and these are shown in Fig. 3. Not surprisingly, given these are from real world data sets, these who significant non-zero entries in our measure � T(s) but also they all look very different from each other, reflecting different mechanisms behind the evolution of different systems.These results for real-world systems have significant non-zero entries in our measure � T(s) . These results for � T(s) are also very different from each other, reflecting distinct mechanisms behind the evolution of these systems.
When we look for higher order interactions, we find clear differences between the triplet transition matrix T and the simple pairwise reference model of T (pw) , especially in the Turkish Shareholder network Fig. 3a. For example, compared to the corresponding probability in the pairwise model, the real probability of any triplet state becoming disconnected in the next snapshot (i.e., moving from m i → m 0 , i ∈ {0, 1, 2, 3} , the first column of the transition matrix) is much less, showing that this subgraph is very stable compared with the pairwise case in the Turkish Shareholder network. Additionally, the state is much more likely to evolve to m 1 at s + 1 (the second column of the transition matrix), which demonstrates that in many real networks, the interactions are beyond the pairwise interaction. The significance test using Z-score can be found in Appendix D.
In our analysis of real data, we use t to denote the physical time difference between snapshots G (s) and G (s + 1) . This time interval can be varied when working with the College message data of Fig. 3c and the Email networks of Fig. 3d. In these cases we try several values of t before choosing an appropriate value the the illustrations in Fig. 3. We are looking for a value that is not too short (nothing much happens) and not too long (averaging over uncorrelated changes).
Unsurprisingly, all four real world networks show considerable higher-order effects but there are interesting differences that reveal the processes behind the evolution are likely to have some significant differences. The T results are shown in the top row of Fig. 3. The result for the Turkish Shareholder networks in Fig. 3a and the Email networks in Fig. 3d look very similar. However, when we compare them to our reference model, the pairwise model, we see large differences, showing that these are very different types of temporal network and using raw values of T can be misleading. Table 1. Summary of network statistics. Edges in static graph gives the number of node pairs that have at least one interaction while the Temporal Edges gives the total number of interactions between edge pairs, so some edge pairs have more than one temporal edge.  Fig. 3d. In both cases, processes where an edge is added (upper triangle in the heat map) there is little difference from the pairwise model. This might be expected for the cases where there is at most one edge in the triplet. Only the T 23 entry for the m 2 → m 3 transition shows a slight increase over what would be predicted based on the rate of edge addition in these models (the p parameter of the pairwise model) which suggests triadic closure 7,8,47,48 does play a role here but it is very slight by these measures. On the other hand, there is a strong sign of "triadic stability", that is the complete graphlet m 3 is much more stable than we would expect given the rate of edge loss in the pairwise model (the q parameter). So it seems the social processes behind the ideas of triadic closure are possibly more important for the stability of triangles in the contexts of our Wikipedia Mathematician and Email network.It is natural to think that processes responsible for triadic closure ( m 2 → m 3 ) would also slow the rate at which such triangles break up ( m 3 → m 2 ) but we do not see this in our result. One interpretation of our results is that the social processes normally invoked for triadic closure 7,8,47,48 can, in some cases, be more important in preventing the breakup of triangles than in the creation of triangles.
In some ways the similarity between the Wikipedia Mathematicians networks Fig. 3b and the Email networks Fig. 3d is surprising as we might have expected the greatest similarity between the two communication networks, the College Message networks Fig. 3c and the Email networks Fig. 3d but that is not what we see in our measures. In particular, the stability of the complete triangle in the college message is exactly as we would expect based on pairwise measures suggesting different properties in these two communication networks, e.g.i.e. that many of the college messages are between actors who do not have strong ties.
However, the main message in Fig. 3 is that in almost any data, we find the evolution has clear signals of higher-order interactions playing an important role.

Link prediction
The analysis of our transition matrix of triplets reveals that non-pairwise interaction exists in network evolution. We now ask if these higher-order interaction patterns are essential for network evolution. We investigate this question by performing link predictions for dynamic networks based on the higher-order information stored in our triplet transition matrices.
Triplet transition score. The idea behind our algorithm is that the formation or removal of links between a node pair is encoded in our triplet interactions. The likelihood of a link appearing (or disappearing) between a node pair in a snapshot can be obtained by looking at the triplets containing this node pair and using the triplet transition matrix to see what that suggests about the evolution of any edge between our chosen node pair.
To keep our analysis simple we will assume that the interval between snapshots t is about the same size as the appropriate time scale for changes in the network. If we look at snapshots covering a short period of time, nothing much happens and the transition matrix has too little information in it. One could then look at larger  www.nature.com/scientificreports/ times scales by using T(s) , T(s − 1) , T(s − 2) and so forth to predict links in the next network snapshot G (s + 1) . WeIf we look at snapshots covering an extremely short period of time, say one that covers the typical difference in time between events, then the transition matrix contains too little information. In such a case one could then look at larger times scales by using T(s), T(s − 1), · · · , T(s − n) (for some n) to predict links in the next network snapshot G (s + 1) . However, we will use the simpler assumption that working with one snapshot by choosing t appropriately (where we have that choice) captures the essential information. This implies we have avoided the opposite problem, where t is too large so the information in the transition matrix is averaging over mostly uncorrelated and unrelated events which means we have very little signal encoded in our transition matrices.
Our second assumption is that the transition depends only on the state of the system at the last time step, which means we assume the process is Markovian. In some ways this need not be completely true. Provided the snapshots G (s − 1) and G (s) are in a similar state to the one we are trying to predict, G (s + 1) , then because we are constructing our transition matrices based on the similar states, the history of the evolution may well be encoded in this. For instance, patterns of activity in an email network of a large institution could well change if there is a major reorganisation with many people changing roles or locations in which case the history of the system has an impact on the evolution. Put another way, we assume that any non-Markovian behaviour is happening on much larger time scale than we are studying and so we can use a Markovian approximation.
Our link prediction algorithm assigns a score to each node pair. By looking at the distribution of these scores we can separate them into node pairs with low scores, predicting no link in the next snapshot, or a high score meaning this node pair will be connected. For a given snapshot G (s) , for each node pair, say (u, v), we look at all (N − 2) triplets containing that node pair count the number of each triplet falling in each state m ∈ M . Normalising this gives us our node-pair state vector ψ(u, v; s) as follows Our estimate for the transition matrix T(s) , based on G (s − 1) to G (s) evolution, is to be used to tell us about the evolution of this triplet state distribution ψ(s) in (9). For instance we estimate that the probability L β that a pair of nodes u, v has an link, β = 1 , or no link, β = 0 , in the graph G (s + 1) is given by the projection from predicted ψ(s): where P (out) jβ is a projection vector for the triplet evolution and 1 β=0 P It is this L β in (10) that we use as a score for link prediction.
Take the case of M 4 as an example, the projection from triplet distribution onto links uses The factors here arise for the state set M 4 since the unlabelled graphlets do not distinguish which links are occupied in the one-link triplet m 1 or two-link triplet m 2 . For the state set M 8 which we use in most of our work, this projection matrix is much simpler with entries either 0 or 1. If we choose (u, v) to be the first link, so it is the only link in the triplet we call n 1 in Fig. 1, then we can use bitwise logical operators to represent P (out) jβ as (j AND 1) XOR β.
Node similarity. We are using link prediction as a way to test that our triplet transition matrices T capture important higher-order interactions in the evolution of temporal networks. In order to see how effective our approach is, we need to compare against other methods of link prediction. All the methods we use are listed in Table 2. All these methods can be discussed in terms of a node similarity score and in this section we will start our examination of these methods by considering the node similarity scores used in each method. This will allow us to examine the relationships between these various methods and ask if they capture higher-order interactions to any extent. The next stage is to turn these similarity scores into a link prediction and we will look at this step in the next section.
The various link prediction methods can be categorised based on the type of information used to make a prediction about each node pair: those which use only local interactions probing a fixed distance from the node pair, and those global methods which use nodes arbitrarily far from the node pair of interest. This is indicated by the length scale of methods indicated in Table 2 and will be clear from the power of the adjacency matrix A appearing in the similarity scores defined in this section.
One other point can be made. In terms of the temporal network, none of these existing similarity scores, none of the these link prediction methods from the literature, use more than one temporal snapshot G (s) . So one standout feature of our method is the use of two temporal snapshots, G (s) and G (s − 1) . The evolution of a triplet from one snap shot to another records, in an indirect way, the effects of other nodes beyond the triplet of interest as illustrated in Fig. 4. This is why our method is listed as probing long length scales in Table 2. So one standout feature of our method is the use of two temporal snapshots, G (s) and G (s − 1) . The evolution of www.nature.com/scientificreports/ a triplet from one snap shot to another records, in an indirect way, the effects of other nodes beyond the triplet of interest. This is why our method is listed as probing long length scales in Table 2.
In the following s(u, v) is a (similarity) score assigned to a pair of vertices The similarity scores are then used to decide if an edge should exist between u and v and that will be used in turn to make the link prediction for the next snapshot in time. We will assume a simple graph in these discussions.
The simplest node similarity measure is the Edge Existence (EE) index s EE (u, v) . After all, if there is already an edge between two vertices u and v this is a good indication of a close relationship between these nodes, and vice versa. We define Edge Existence index to be That is Every edge between u and v adds one to this index so, for a simple graph, this is the corresponding entry of the adjacency matrix, A uv . This is the simplest edge prediction method in that it predicts no changes at all so it is not very useful and we do not use it in our work. However, we will see this score contributes to some of the other, more sophisticated methods, so it is useful to define this score. The Edge Existence score records no higher-order effects, there is a path of length at most one between the nodes.
While not very useful, this Edge Existence (EE) index could produce some very good statistics. For instance, if very few edges are changing in each time interval, then this method would predict the behaviour of the vast (12) s EE (u, v) = A uv = e∈E δ e,(u,v) . Table 2. Table of the link prediction methods used and their abbreviations. The length scale given indicates the longest path length involved in the method or equivalently the largest power of the adjacent matrix involved in the method. Under code, nx indicates that a NetworkX 65 library routine was used, while own indicates the authors' own code was used. The Edge Existence (EE) approach was not investigated numerically but was included for the sake of comparison. All these methods, except for the Triplet Transition method, have a temporal path length of 0. That is they are derived solely from the current snapshot, G (s) , when making a prediction for links in the next snapshot G (s + 1) . The Triplet Transition method has a temporal path length of 1 as it uses G (s) and G (s − 1) to predict G (s + 1).  It is the network outside the triplet which provides a long distance connection between triplet nodes A and C. Within the original ABC triplet in snapshot G(s − 1) , C appears disconnected. By using two snapshots, our transition matrix includes the effects of such long-range links and so this approach can explain why the edge (A, C) might be appear more often than otherwise expected from the triplet subgraph alone. www.nature.com/scientificreports/ majority of edges as most remain unchanged. It would only do badly if we used statistics that specifically measured the success of node pairs changing their connectedness from one time step to the next. A good example ofthis when the Edge Existence method would appear to be successful is when we break up the data into small steps in time where few edges change. However we would learn little of interest in this case. would be when we are able to break the data up into arbitrarily sized steps in time, and if we choose steps which are too small, we will get little change step because of this discretisation choice. The Common Neighbours (CN) method 60,61,66 simply scores the relationship between two nodes based on the the number of neighbours they have in common This will tend to give large scores if u and/or v have high degrees and this is illustrated in more detail in Appendix E.
One way to compensate for the expected dependence of s CN on the degree of the nodes is to normalise by the total number of unique neighbours. This gives us the Jaccard Coefficient 60,66 (JC) method based on the well known similarity measure 68 in which the likelihood that two nodes are linked is equal to the number of neighbours they have in common relative to the total number of unique neighbours.
The Preferential Attachment (PA) method for link prediction 60,66 is based on the idea that the probability of a link between two vertices is related to the product of their degrees of the two vertices This is proportional to the number of common neighbours expected in the Configuration model 69 as has been used in the context of collaboration (bipartite) collaboration graphs 60,70,71 .
The Resource Allocating Index 63 (RAI) method and the Adamic-Adar Index 60,66 (AAI) method are both based on the idea that if two vertices u and v share some 'features' f that is very common in the whole network then that common feature is not a strong indicator that the two vertices should be linked. The converse is true if the common feature is rare, that is a good indicator that the vertices u and v should be linked. So in general if W(x) is a monotonically decreasing function of x we can use this on the frequency n(f) of the occurrence of feature f to give a generic similarity function of the form s W (u, v) = f ∈u,v W(n(f )) . In our case, we will not assume any meta-data exists, but we will look for methods that use features which are based purely upon the topology of the network.
In the Resource Allocating Index 63 the inverse of the degree of the neighbour is used as the weighting function, so W(n(f )) ≡ 1/|N(w)| giving Note that this means that the contribution from any one node w to the total of all scores S RAI (u, v) = u,v s RAI (u, v) is half of the degree of w minus one, (N(w) − 1)/2 . Thus the Resource Allocating Index still gives high degree nodes more weight.
On the other hand, the Adamic-Adar Index 60,66 uses the inverse logarithm of the degree to weight the contribution of each common neighbour to the score, W(n(f )) ≡ 1/ ln(|N(w)|) . That is All the indices mentioned above have been based either on the degree of the two nodes of interest s EE or on the properties of u, v and their common nearest neighbours w ∈ N(u) ∩ N(v) . This involves paths between the two nodes u and v of length two or less. The next logical step is to include paths of length three and the Local Path Index (LPI) s LPI 61,63 is an example of this where Here β is a real parameter where β = 0 reproduces the Common Neighbours score of (13). The βA 3 term is counting the number of walks of length three that start at u and end at v. If there is already an edge between u and v then [A 3 ] uv includes backtracking paths such as the sequence u, w, u, v. This means the second term also includes a term equal to A uv (|N(u)| + |N(v)|) , i.e. there is a contribution from the Edge Existence similarity s EE (u, v) (12) in this method.
The Katz Index 60,61,66 (Katz) counts the number of paths between each pair of vertices, where each path of length ℓ contributes a factor of β ℓ to the score. The score is simply the appropriate entry of the matrix [I − βA] −1 , where β is positive but must be less than the largest eigenvalue of the adjacency A . Note for low β and for a simple graph we have that .
. where D is a diagonal matrix whose entries are equal to the degrees of the nodes, D uv = δ u,v |N(u)| . The motivation for using this normalisation is that |N(u)| · |N(v)| is proportional to the number of neighbours expected in the configuration model, as shown in Appendix E. So the Local Leicht-Holme-Newman Index is the Katz score relative to the Katz score expected for the same pair of nodes in the configuration model. The Matrix Forest Index 64 (MFI) is defined as: where L = D − A is the Laplacian. One way to understand the Matrix Forest Index is to consider the diffusion process described by a Laplacian. If we were to demand that at time t + 1 we had only had particles at one site u, then s MFI (u, v) tells us how many of those particles were at vertex v at the previous time step.
From node similarity to link prediction. All the link prediction methods used in this paper, see Table 2, assign a similarity score between pairs of nodes. To turn this into a link prediction, the basic conjecture is that the higher the similarity between a pair of nodes, the more likely we are to find a link between these two nodes. All methods, therefore, require a precise method to turn the scores into predictions, essentially to define what is meant by a 'high' or a 'low' score by introducing a classification threshold. Often this is done very simply by ranking the scores and using a fixed number of the most highly ranked node pairs to predict a link. This method is typically used only for link addition in which one is only trying to predict when an unlinked node pair gains a link, that is the A uv (s) = 0 to A uv (s + 1) = 1 process.
We are interested in the most general predictions, looking at all four possible changes for node pairs from one snapshot in time to the next, that is all the four possible A uv (s) = 0, 1 to A uv (s + 1) = 0, 1 processes. This is link evolution rather than link addition. In order to make these more general predictions for any method, we use k-means clustering methods to separate the prediction scores produced by each method into two classes: a high score group and a low score group of node pairs. Any node pair with a score in the high scoring group will be predicted to have a link in the next snapshot; node pairs in the low scoring group will be predicted to have no link.
To show how this works, we give some examples of how the score from our triplet transition method produces a natural split into low and high scores which is easily discovered by an automated clustering method such as k-means, as seen in the first-row of Fig. 5. In our final results below for our link prediction algorithm, we use M 8 and the clear separation of node similarity scores into a low and high group is shown in the second row of Fig. 5. For comparison we also show the results for our method using the less sensitive M 4 . In practice, both seem to separate low and high scoring node pairs well but we get a slightly clearer separation in some cases when using M 8 .
We can look at how the other nine link prediction methods perform in terms of identifying two clear groups of low-and high-scoring node pairs. What is surprising is that most of the algorithms fail to do this well, or some cases at all, in most of the data sets as shown in Section E of the Appendix. An example of a poor separation, the Jaccard Coefficient method, is shown in Fig. 6. The one exception is the Katz method, also shown in Fig. 6, which works well in three of the four data sets. Given this is one of the methods probing large distances in the network, this would seem to support the idea that higher-order interactions are important in link prediction.
Since our triplet transition method splits the node similarity scores so well, any unsupervised clustering method should be able to split the node pairs into a low scoring cluster and a high scoring cluster. As the problem is in one-dimension a version of k-means clustering is sufficient and this will always assign our node pairs to either a low or to a high score even if a method does not have a clear threshold visually. The objective function of our clustering is J where where θ is the Heaviside step function. The sum over (u, v) is over all distinct nodes pairs (so (u, v) ∈ V 2 |u � = v ). The µ + ( µ − ) is the average similarity score of node pairs in the high score (low score) cluster. The problem is reduced to finding the threshold value s th that minimises J.

Evaluation metrics.
To evaluate the performance of our Triplet Transition and the other link prediction methods, we use a variety of standard metrics.The simplest metric we use is to simply look at the fraction of edges that are removedThe simplest metric we use is the fraction of edges that are removed from snapshot s to the next snapshot s + 1 . We denote this by f E (s), where E (s) is the link set of the network in snapshot s. (20) s Katz (u, v) = βA uv + β 2 We also use more traditional metrics such as precision and area under the curve for which we need a binary classification. Thinking of snapshot G (s + 1) as the current state while we are trying to predict the state in the next snapshot, we consider four transitions (A uv (s) → A uv (s + 1)) of node pairs: 0 → 0 , 0 → 1 , 1 → 0 , and 1 → 1 . We map these states onto a binary set of states, namely A uv (s + 1) , so what we call a positive result (negative result) is where a link is present (is not present) in snapshot (s + 1) regardless of the state that pair started in. We then consider whether the prediction for the state of the node pair in snapshot s + 1 was true or false. So a false positive is where we predict no link for a node pair when that pair did end up with a link, while a true negative is where we correctly predict a pair of nodes will not be connected in the next snapshot. The table of predicted classes and actual classes is shown in Table 3.
With this traditional binary classification, we can then evaluate the performance of our Triplet Transition method using standard metrics (see Appendix F for more details)-area under curve and precision. To express these, we define N αβ± to be the number of node pairs satisfying the following criteria. The node pair starts in snapshot G (s) in state α equal to 1 if the node pair is connected by an edge, 0 otherwise. This edge pair is then in state β in snapshot G (s + 1) with β is 1 if the node pair is connected by an edge in snapshot G (s + 1) , and is 0 otherwise. Finally the sign indicates if the prediction made for that node pair in G (s + 1) was correct (true, + ) or incorrect (false, −).
We can evaluate the methods independent of the classification threshold chosen through k-means by using the area under the curve (AUC) where the curve is the receiver operating characteristic curve. For any binary classifier of the results of an algorithm, such as defined here in Table 3, the curve plotted is the fraction of positive results (TPR, true positive rate) which are correct against the fraction of negative results which are incorrect (FPR, false positive rate) as the threshold s th is varied: Once the threshold s th has been fixed, in our case using k-means (23), the precision score S prec is defined as the number of times that we predict a link exists between node pairs in the later snapshot correctly (a true positive, N 11+ + N 01+ ) divided by the number of times we predict a link, correctly (true positive) or incorrectly (false positive, N 11− + N 01− ): The precise values are not important as the important feature is the clear separation of the node similarity scores into two groups. One cluster c 0 appears to have a 'low' probability that a link will exist in the next snapshot and the other cluster has 'high' probability. We therefore predict that if the node pair has a score in the lower score cluster c 0 , then a link will not exist in the next snapshot. Conversely, if a node pair exists in the higher score cluster c 1 , then we predict a link will exist between this node pair in the next snapshot. The first row, where the histogram is plotted in blue is the clustering results for the M 4 while the second row, where the histogram is plotted in green is the clustering results for the M 8 . It shows that the M 8 has higher separation between two clusters than M 4 . The results for each network are based on samples which contain at least 2000 node-pairs with an initial link and at least 2000 more node pairs which started without a link. www.nature.com/scientificreports/ A high precision score means we can trust that links predicted by the algorithm will exist. We can set a baseline value for precision using a simple model which predicts a link in snapshot G (s + 1) exists for any given node pair with a probability given by the fraction of node pairs which have an link in snapshot G (s) , that is the density ρ(s) . In this case, the precision (26) is simply equal to the density in snapshot G (s) , S prec,base = ρ(s) , see Appendix F.
A very low precision score for baseline method indicates that the links existing between node pairs are not random. Some of the local measurement algorithms give lower scores than the baseline, suggesting that those types of local information are less likely to drive the evolution of the networks.
Edge sampling. We use the whole network to evaluate the performance, which is different from the sampling method used in Fig. 5 to take account of the full nature of the data. In the following analysis, we predict the node pairs for all the possible node pairs of a whole network. We expect predictions can capture evolving characteristics of networks and the Mathematician networks change tiny fractions which are not sufficient enough to evaluate the predictions. Therefore, we replace the Mathematician networks with Hypertext networks (see Appendix C for more details of this data set) in the following prediction analysis. Time data is sampled over Figure 6. The histogram of node similarity scores using Jaccard Coefficient (JC) & Katz methods on four real data sets. Each column is for one data set which are, from left to right: Turkish Shareholder networks, WikiWikipedia Mathematician networks, College Messages network, and Email networks. The first row is the similarity score used in Jaccard Coefficient method and the second is for the Katz method. The precise values are not important here as the key feature is the success or failure to identify two clear groups of low-and highscoring node pairs. We under-sampled 1000 linked node pairs and 1000 non-connected node pairs 67 . Only the Katz method (bottom row) shows the necessary clustering of scores needed for link prediction. Similar plots for all the link prediction methods used in this paper are shown in Figure E11 of the Appendix. Table 3. The Confusion matrix for the prediction of edges between nodes u and v in the next snapshot, (s + 1) . The adjacency matrix at snapshot s in the temporal network is A(s) . Each of the four outcomes comes from two situations as this confusion matrix does not consider the state of the edge in snapshot G (s) . The two states are shown for each of outcome in brackets with the notation that αβ± shows the value of A uv (s) as α , the predicted A uv (s + 1) as β , and a + symbol (a − symbol) indicates that the prediction made was correct (incorrect).

Actual classes
A uv (s + 1) = 1 A uv (s + 1) = 0 www.nature.com/scientificreports/ different time intervals t . For Turkish Shareholder networks, the smallest time separation is 2 years, and we consider four or more years too long for the evolution; for the Hypertext data, which records the short communications, we choose t as 40 and 60 minutes; for the College Message and Email networks, we consider that 8 hour to 1 month communication frequency is reasonable. Figure 7 shows a comparison of AUC for the ten algorithms when we apply them to node pairs sampled uniformly at random from all possible node pairs. The Triplet Transition (TT) approach defined here is the leftmost triangular point in each figure. In the Hypertext networks (b) and the College Message networks (c), the Triplet Transition method has the highest AUC though the PA, Katz, MFI and LPI algorithms perform almost as well (see Table 2 for abbreviations). For the Turkish Shareholder network (a) the AUC of our Triplet Transition method is again the highest though with a similar AUC value for various algorithms. Note that PA, Katz, and MFI are now weaker. Only for the Email network (d) is our Triplet Transition outperformed, in this case by the Katz, MFI and LPI algorithms. Figure 8 shows that there are some clear patterns in our results for precision. First, each link prediction method has a similar performance relative to the other methods on three of the networks: the Hypertext network (b), the College Message network (c), and the Email network (d). On these three data sets, three algorithms are consistently better than the others though with similar performances relative to one another: our Triplet Transition (TT) method, the Katz method 60,61,66 , and the Matrix Forest Index 64 (MFI) method. All of these are Table 4. Summary of AUC-ROC performance, t is taken by selecting the highest AUC-ROC. The AUC scores for each network are in the left-hand column, the rank of the score in the right-hand column. The errors quoted on the AUC scores are standard deviations from all the computed network snapshots of the selected time scale except for Turkish Shareholder networks where only two predictions are made. We highlight the highest result and any whose result is within the error quoted on the largest result. The baseline method for AUC is the random clustering of node pairs into two groups, and the AUC is 0.5 which means it is like 'tossing a coin' . The Type column indicates if the link prediction method probes only ultra scales (path length two) or global scales (infinte path lengths).   www.nature.com/scientificreports/ probing non-local information in the networks, suggesting this is necessary to understand the time evolution of these networks.

Results
Overall we see some similarity in these AUC results, summarised in Table 4 as we saw for precision in Table 5. The same three algorithms, the Triplet Transition (TT) methods, the Katz method, and the MFI method, have a similar high performance on the same three networks, the Hypertext network, the College Message network and the Email network. For these three networks, we might also pick out the Preferential Attachment (PA) algorithm. However, now the AUC values for the Turkish Shareholder network show that our Triplet Transition method continues to perform well, unlike for the precision measurements. The Katz, Matrix Forest Index and Preferential Attachment methods are, though, in a weaker group of algorithms as measured by their AUC performance on theTurkish Shareholder network.
For three of these networks, we also measure AUC over snapshots of these networks defined over different time scales. The comparison among methods rarely changes. However, the performance of most algorithms is altered by the size of the time window chosen in some cases, reflecting inherent timescales in the different systems. The most noticeable is that for the Email network, there is a large difference between windows of eight hours and one week but not so much change between a week and one month. The gaps may suggest that if a person is going to produce an email, perhaps following up an email request of bringing a third person into the conversation, that new email is done often on the scale of a few days not always on the scale of a few hours.
The Triplet Transition proposed here is used to predict a link between two nodes by considering these alongside a third node which can be in any position in the network. In this way, this third node not only captures the higher-order interactions but also enables the method to encode both local and non-local information about the link of interest. We find that our Triplet Transition method and the two other global methods used here, Katz Index method 60,61,66 and the Matrix Forest Index 64 are generally the best for most of networks we studied, in particular for the Hypertext networks, the College Message networks and the Email networks. As the most successful methods here perform better than other approaches based on local measures, this shows that in most  Table 2 for abbreviations used in indicate link prediction methods. www.nature.com/scientificreports/ systems the pattern of connections depends on the broader structure of interactions. This dependence of the behaviour of systems on structure beyond nearest neighbours is the crucial motivation for using the language of networks rather than just looking at the statistics of pairs 73 . For instance, the Katz method counts the number of paths between each pair of nodes, probing all paths though giving less weight to longer paths, so nearest neighbours contribute the most. The results for theTurkish Shareholder networks were a little different. In this case, predictions based on local measurements (paths of length two), the semi-local Local Path Index method (LPI), as well as our nonlocal triplet transition method outperformed the other global methods in terms of AUC, see Table 4. The global methods also perform poorly on precision, see Table 5. An algorithm with low precision and high AUC, such as Jaccard Coefficient (JC), is predicting the disconnected pairs well.

Discussion
In this paper, we considered the temporal evolution of networks by looking at a sequence of snapshots of each network. The network G (s) defined for snapshot s contains all the links present between nodes for a certain period, t . Each snapshot s covers the period t immediately following the latest time included in the previous snapshot. In some cases, we have looked at the effect of changing the size of the temporal windows t on our results. The main tool we use to study the network evolution is the transition matrix T of Eq. (2) which are derived from the evolution of three-node combinations in a network from one temporal snapshot G (s − 1) to the next G (s) . These transition matrices are obtained from real data by counting the different states of three nodes found in consecutive network snapshots.
To decode the higher-order interaction patterns of network dynamics, we fitted a pairwise interaction model to the real data and computed the transition matrices from the fitted parameters. By comparing the actual transition matrices found numerically with those predicted using a simple pairwise model, as shown in Fig. 3, we demonstrated that higher-order interactions are needed to understand the evolution of networks. For example, in the Email network, derived from the emails within an EU Institution, if three nodes are connected in the previous temporal snapshot, they are less likely to be disconnected in the following snapshot.
To show that we must look beyond the interactions of neighbours, we then designed a link prediction algorithm based on the transition matrix T of Eq. (2), our Triplet Transition (TT) method. We compared this method with nine other link prediction methods as well as to simple baseline measures. What we found was that on a range of different temporal networks, the Triplet Transition method was as good as two methods based on nonlocal (global) information in the network, namely, the Katz Index method 60,61,66 and the Matrix Forest Index (MFI) method 64 . While not always the best on every network or every measure, these three global methods were usually better than the other methods we studied, all of which used information on paths of length two in the network. Intriguingly, the one other method that used paths of length three, Local Path Index 61,63 (LPI), often performed well too though rarely as well as the top three global methods.
Since the most successful methods in our tests were those that access non-local information, it seems that such information is essential in the evolution of most networks and therefore, it is important to include this in network measures. However, including information from the whole network is numerically intensive and, for any reasonably sized network, the evolution of a link is unlikely to depend directly on what is happening a long way from that link. One reason why our Transition Triplet method works well is that it does not emphasise the vast majority of the network. Most of the information in the transition matrix network is based on neighbours of one or other of the link of interest. For large networks, we use sampling to add the necessary global information into the transition matrices. The Katz index method does include information from all scales but suppresses contributions from more distant parts of the network. The success of the transition triplet approach suggests that there is no need to access all of the global information of a network in order to know what is going on locally. Notably, the overall better performance of the Triplet Transition method reveals that the information of different higher-order interaction patterns can help understand and predict different dynamics of networks.
There is also another reason why our Triplet Transition method may work better than the local methods we look at, and that is because our method is also probing a longer time scale as well as a longer spatial scale. That is we use two snapshots, G (s − 1) and G (s) in order to create the transition matrix T which in turn we use for the predictions in snapshot G (s + 1) . All the other methods used here are based on information from one snapshot G (s) . So again, the success of the Triplet Transition method points to correlations over short but non-trivial time scales as being important in understanding network evolution. In our case, the dependence of results on the time intervals can be seen in the effect of t used to define the transition matrices are important. For different data sets, we found different t gave optimal results which of course reflects inherently different timescales in the processes encoded by our different data sets. So another conclusion is that higher order effects in terms of both time and space are needed to understand the evolution of temporal networks and to make effective predictions for links. The inclusion of higher-order order effects in terms of time remains undeveloped relative to the work on higher-order spatial network features.
In our approach we have focused on changes in the edges and ignored changes in the set of nodes. Our method can include nodes which are not connected in some or even all our snapshots provided these nodes are known. However, in many cases the data sets may only record interactions, our edges, and our set of nodes is only inferred from that. As a result we often have no information about such (totally isolated) nodes. Suppose we are given a list of emails, our edges, sent at a given time between email accounts, our nodes. We cannot distinguish accounts which are dormant from those that are deleted or added to the system without additional information. Should we have such data on node birth and death, then in some cases it might be an important process to include, but it is not one we address in our approach. www.nature.com/scientificreports/ Currently constructing optimal higher order models is a timely topic 5 . Our method can be generalised to include even higher order interactions, such as quadruplet and so on. The method can be used to indicate higher order evolutionary mechanisms in a network and suggest what is the most likely order of interaction. Our work shows that studying the evolution of small graphs over short time periods can reveal important information and predictions regarding network evolution.