Quantifying the effect of temporal resolution on time-varying networks

Time-varying networks describe a wide array of systems whose constituents and interactions evolve over time. They are defined by an ordered stream of interactions between nodes, yet they are often represented in terms of a sequence of static networks, each aggregating all edges and nodes present in a time interval of size Δt. In this work we quantify the impact of an arbitrary Δt on the description of a dynamical process taking place upon a time-varying network. We focus on the elementary random walk, and put forth a simple mathematical framework that well describes the behavior observed on real datasets. The analytical description of the bias introduced by time integrating techniques represents a step forward in the correct characterization of dynamical processes on time-varying graphs.

Time-varying networks are ubiquitous. Examples are found in the social, cognitive, technological and ecological domains, as well as in many others [1]. The temporal nature of such systems has a deep influence on dynamical processes occurring on top of them [2-21]. Indeed, the spreading of sexually transmitted diseases, the diffusion of topics over social networks, and the propagation of ideas in scientific environments are affected by the duration, sequence, and concurrency of contacts [2,4,17-19,22,23]. In all these cases the timescale characterizing the evolution of the network is comparable with the timescale ruling the unfolding of the process, and the two cannot be decoupled. However, empirical datasets are often reduced to a series of static networks by introducing a time-integrating window, Δt [1,24-27]. This is the case, for instance, of face-to-face interaction networks [28], for which the fine-grained temporal resolution of (e.g.) phone call networks is not available, or of infants' semantic networks [29], whose evolution can be studied only through the analysis of a few snapshots [30]. In other instances, a time window is introduced to reduce the amount of stored information, or to simplify the application of mathematical frameworks developed for static or annealed systems. This is the case, for example, of online social networks where, although the original information usually has time resolutions down to the second, the available datasets are integrated over windows of hours, days, months, or even years. Thus, the introduction of an integrating window is either intrinsic to the system under study or dictated by practical reasons.
In this work we address the impact of an arbitrary Δt on the description of a discrete dynamical process taking place upon a time-varying network. Despite recent results showing that any level of temporal aggregation may affect the correct characterization of dynamical processes evolving on top of such datasets [2-21], an analytical formalization, characterization, and understanding of these effects for a general Δt is still missing.
In particular, we focus on the prototypical random walk process evolving on time-varying networks integrated over a general time window Δt. First, we clarify the relevance of the integrating window issue by studying the behavior of random walk processes on real time-varying networks as a function of Δt. Then, we introduce a mathematical framework that describes well the behavior observed on synthetic activity-driven networks [17] as well as on two different real datasets.

Results
We aim to understand how Δt affects the behavior of dynamical processes taking place on time-varying networks. To this end, we consider the fundamental random walk (RW) process on two different real time-varying networks in which the links have been integrated over different integrating windows Δt (see Fig. 1). Typically, the RW asymptotic occupation probability ρ (see Methods for the formal definition) is computed by grouping the nodes according to their degree k [31-33]. The quantity ρ_k is then defined as the average asymptotic occupation probability of a node in the degree class k [31-33]. However, in time-varying networks the degree of a node is not uniquely defined and, more importantly, is a function of Δt. For example, the degree might be the number of connections integrated over the time window, or the average number of connections across the T/Δt static frames (where T is the total time span of the data). Thus, the same node could contribute to different degree classes depending on the value of Δt. We therefore focus on a different node measure that has been shown to be mostly invariant to Δt, namely the activity rate a of a node [17]. The activity rate a is defined as the average rate at which each node interacts with others during the observation period [0, T], and can be interpreted as the intrinsic propensity of each node to engage in interactions with other nodes. We aim to calculate the occupation probability as a function of a.
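The contrast between a window-dependent degree and a window-independent activity rate can be illustrated with a minimal sketch. It assumes a hypothetical event list of (time, node, node) tuples and counts every interaction a node takes part in; the function names and the toy data are illustrative and are not taken from the datasets studied here.

```python
import math
from collections import defaultdict

def activity_rates(events, total_time):
    """Interactions per unit time for each node over [0, total_time]."""
    counts = defaultdict(int)
    for _, u, v in events:
        counts[u] += 1
        counts[v] += 1
    return {node: c / total_time for node, c in counts.items()}

def mean_window_degree(events, dt, total_time):
    """Average number of distinct neighbours per aggregation window of size dt."""
    neighbours = defaultdict(set)  # (window index, node) -> set of neighbours
    for t, u, v in events:
        w = int(t // dt)
        neighbours[(w, u)].add(v)
        neighbours[(w, v)].add(u)
    n_windows = max(1, math.ceil(total_time / dt))
    degree = defaultdict(float)
    for (_, node), nbrs in neighbours.items():
        degree[node] += len(nbrs) / n_windows
    return dict(degree)

# Toy event stream (hypothetical data): (time, node, node)
events = [(0.2, "a", "b"), (1.1, "a", "c"), (2.7, "a", "b"), (3.5, "b", "c")]
T = 4.0
rates = activity_rates(events, T)                # does not depend on dt
deg_fine = mean_window_degree(events, 1.0, T)    # four windows of size 1
deg_coarse = mean_window_degree(events, 4.0, T)  # one window of size 4
```

In this toy example the activity rate of each node is fixed by the event stream alone, whereas its average degree changes when the same events are aggregated over a single window instead of four.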
In our simulations we consider two real time-varying networks, and investigate the RW occupation probability as a function of the activity rate a and the integrating window Δt: ρ_a(Δt). The first dataset is the co-authorship network of the Physical Review Letters (PRL) journal from 1980 to 2006 [34]. The second dataset is the Yahoo! music dataset with ∼4.6 × 10^5 songs rated by ∼2 × 10^4 Yahoo! users over six months [35]. We run the RW process over these two time-varying networks for different values of Δt, and record the occupation probability over multiple runs (see SI for details). Fig. 2 shows the empirical values of ρ_a(Δt) (solid points) observed in the PRL dataset for four distinct values of Δt = {1, 10, 60, 182} days. Error bars represent the standard deviation obtained from distinct simulation runs starting at times t_0 ∈ {0, 1, …, Δt − 1} from the beginning of the dataset.
The effect of Δt is dramatic. For large values of Δt the RW behaves roughly as could be expected. The share of random walkers increases with the node activity, i.e., highly active nodes collect more walkers at the end of the simulation than nodes with low activity. However, as Δt decreases, more active nodes lose their power to attract walkers and the occupation probability becomes more uniform. A similar scenario is observed in the Yahoo! dataset for four values of Δt, namely one second, one hour, six hours, and one day (points in Fig. 3). In the next section we will see that the reason for this behavior rests solely in the probability that the RW sees no edges when it decides to move, which turns out to be a function of three factors: Δt, the activity of the node where the walker resides, and the average node activity in the system.
Mathematical formulation. Let us consider a random walker diffusing at discrete time steps Δt over a time-varying network characterized by N nodes. Starting at node V(t) at step t, the walker takes step t + 1 at time (t + 1)Δt, diffusing over a network G_t(Δt), where G_t(Δt) is the union of all the edges generated in the interval [tΔt, (t + 1)Δt). We focus on the general case of an arbitrary time aggregation window Δt > 0.
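The walker dynamics described above can be sketched in a few lines. The sketch assumes that each aggregated network is given as a set of undirected edges without self-loops, and that a walker whose node has no edge in the current window stays where it is; the function name `random_walk` and the toy snapshots are hypothetical.

```python
import random

def random_walk(snapshots, start, seed=None):
    """Walk over a sequence of aggregated snapshots, one step per window.

    `snapshots[t]` is the set of undirected edges seen in [t*dt, (t+1)*dt).
    The walker moves to a uniformly chosen neighbour in the current snapshot,
    or stays put when its node has no edge in that window.
    """
    rng = random.Random(seed)
    position = start
    path = [position]
    for edges in snapshots:
        # Collect the neighbours of the current node in this snapshot.
        neighbours = sorted({v if u == position else u
                             for u, v in edges if position in (u, v)})
        if neighbours:
            position = rng.choice(neighbours)
        path.append(position)
    return path

# Three windows: an edge 0-1, an empty window, then an edge 1-2.
snapshots = [{(0, 1)}, set(), {(1, 2)}]
path = random_walk(snapshots, start=0, seed=1)
```

Recording only the final position over many independent runs gives an empirical estimate of the occupation probability.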
We consider a simple class of time-varying networks called activity-driven networks [17]. The crucial ingredients of these models are dF(a), the fraction of nodes with activity rate a, and m, the number of edges that are simultaneously created by a node (see Methods for further details). The activity rate determines the probability per unit time for a node to establish m simultaneous edges to other nodes in the system. The value of the parameter m is dictated by the specific system under consideration. The case m > 1 is appropriate to describe one-to-many interactions, found for example in systems such as Twitter and blog networks [36,37]. On the other hand, m = 1 describes two-party (dyadic) communications that are characteristic of phone-call and text-message networks [38,39]. At each step t = 0, 1, … an unweighted network G_t(Δt) is generated as follows: a) G_t(Δt) starts with N disconnected nodes; b) the number of times a node with activity a is active during the interval Δt, K_{Δt,a}, is Poisson distributed with mean aΔt, and the node generates mK_{Δt,a} undirected edges connected to mK_{Δt,a} randomly selected nodes (without replacement or self-loops); nodes that are inactive during this interval may still receive connections from other active vertices; c) at time (t + 1)Δt the process starts over from step a) to generate the network G_{t+1}(Δt).
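Steps a) to c) can be sketched as follows. This is a minimal illustration, not the simulation code used in this work: the function names are hypothetical, all activity rates are assumed to be strictly positive, the Poisson count is drawn by accumulating exponential waiting times, and the number of targets is capped at N − 1 so that sampling without replacement remains possible.

```python
import random

def poisson_count(rate, dt, rng):
    """Number of events of a Poisson process with the given rate in a window dt."""
    count, t = 0, rng.expovariate(rate)
    while t < dt:
        count += 1
        t += rng.expovariate(rate)
    return count

def generate_snapshot(activities, dt, m=1, seed=None):
    """One aggregated activity-driven network, returned as a set of edges (u, v), u < v."""
    rng = random.Random(seed)
    n = len(activities)
    edges = set()                      # step a): start from N disconnected nodes
    for node, a in enumerate(activities):
        k = poisson_count(a, dt, rng)  # step b): activations during the window
        n_targets = min(m * k, n - 1)  # cap: there are only n - 1 possible targets
        if n_targets == 0:
            continue
        candidates = [v for v in range(n) if v != node]      # no self-loops
        for target in rng.sample(candidates, n_targets):     # without replacement
            edges.add((min(node, target), max(node, target)))
    return edges

# Step c): each window is generated independently of the previous ones.
activities = [0.01, 0.05, 0.2, 0.5, 1.0]
snapshots = [generate_snapshot(activities, dt=2.0, seed=s) for s in range(3)]
```

Storing each edge as an ordered pair in a set means that an edge created independently by both of its endpoints appears only once, as expected for an unweighted network.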
Although activity-driven networks are Markovian (memoryless) and lack some properties observed in real temporal systems, they can be considered the simplest nontrivial framework in which to study the interplay between changes in the connectivity pattern of the network and dynamical processes unfolding on its structure [17,18].
To describe the RW behavior, we need to evaluate the transition probability that a walker starting at a node with activity a′ moves to a node with activity a at the next Δt time step, Q_{a|a′}(Δt). In what follows we focus on the case m = 1. Detailed results for the m > 1 one-to-many interactions are discussed in the Supplementary Information. At step t + 1 the neighbors of V(t) can be classified into two types:
1. Passive destinations are neighbors of V(t) connected by edges created due to the activity of V(t) itself. They are randomly selected from the graph and thus their activity is distributed according to dF(a). We define K_{Δt,A(t)} to be the number of such passive destinations, where A(t) is the activity rate of node V(t).
2. Active destinations are neighbors of V(t) connected to V(t) by edges created due to their own activity. Thus, their activity is distributed as a dF(a)/⟨a⟩, where ⟨a⟩ is the average activity rate in the system. We define H_Δt as the number of such active destinations.
The word destinations highlights the fact that the walker moves from V(t) to one of these K_{Δt,a′} + H_Δt neighbors of V(t). For sufficiently large N, H_Δt and K_{Δt,a′} are both Poisson distributed, with averages ⟨a⟩Δt and a′Δt, respectively. If V(t) has at least one edge, the walker follows the edge of a passive destination with probability K_{Δt,a′}/(K_{Δt,a′} + H_Δt), while it moves towards an active destination with probability H_Δt/(K_{Δt,a′} + H_Δt). Unconditioning the latter expressions with respect to the values of K_{Δt,a′} and H_Δt we obtain eq. (1), in which δ(x) is the Dirac delta function. While we refer the reader to the SI for the detailed derivation, each term in eq. (1) has a simple interpretation. The two terms inside the double sum represent, respectively, the probability that the walker moves to a passive destination that has activity a and the probability that the walker moves to an active destination that has activity a. The factors multiplying these two terms inside the double summation are related to the probability that K_{Δt,a′} = k and H_Δt = h. The δ(a − a′) term accounts for the probability that the node has no edges after Δt, in which case the walker must remain at V(t).
Eq. (1) can be simplified (see SI), yielding eq. (2), in which φ_{a′,Δt} = e^{−(a′+⟨a⟩)Δt} is the probability that no edge is created at a node with activity a′ during the interval Δt. Note that in eq. (2) the parameter Δt only affects the probability that no edge is created until the next time step.
To find the RW stationary distribution we first note that the RW on the time-varying network is stationary and ergodic (see SI). Thus, the RW occupation probability ρ_a, defined as the probability of finding the walker in a given node of activity a, exists and is unique [40]. The value of ρ_a is the fixed-point solution of the Chapman-Kolmogorov set of equations [41], eq. (3), in which V is the set of all activity rates in the system. The solution to eq. (3) can be obtained numerically. Interestingly, we can extend eq. (3) to consider lazy random walks where the walker moves with probability p ∈ (0, 1] or does not move with probability 1 − p. For the lazy walker we just need to replace Q_{a|a′}(Δt) in eq. (3) with Q_{a|a′}(Δt) p + δ(a′ − a)(1 − p). A simple algebraic manipulation shows that ρ_a does not change with p. Hence, the steady state of the lazy walker for any p ∈ (0, 1) is the same as that of the walker that moves with probability p = 1.
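The reason the transition probability reduces to a compact form can be sketched from the definitions above. The following derivation is an illustrative reconstruction and not the derivation given in the SI: it assumes m = 1 and that the numbers of passive and active destinations, written K and H, are independent Poisson variables with means a′Δt and ⟨a⟩Δt. Conditional on K + H = n ≥ 1, K is then binomial with success probability a′/(a′ + ⟨a⟩), so the expected fraction of passive destinations does not depend on n.

```latex
% Assumptions: K ~ Poisson(a' \Delta t), H ~ Poisson(<a> \Delta t), independent, m = 1.
\begin{align*}
\Pr[K + H = 0] &= e^{-a'\Delta t}\, e^{-\langle a\rangle \Delta t} = \phi_{a',\Delta t},\\
\mathbb{E}\!\left[\frac{K}{K+H}\,\middle|\,K+H=n\right] &= \frac{a'}{a'+\langle a\rangle}, \qquad n \ge 1,\\
Q_{a|a'}(\Delta t) &= \left(1-\phi_{a',\Delta t}\right)
  \left[\frac{a'}{a'+\langle a\rangle}\,dF(a)
  + \frac{\langle a\rangle}{a'+\langle a\rangle}\,\frac{a\,dF(a)}{\langle a\rangle}\right]
  + \phi_{a',\Delta t}\,\delta(a-a')\\
&= \left(1-\phi_{a',\Delta t}\right)\frac{a'+a}{a'+\langle a\rangle}\,dF(a)
  + \phi_{a',\Delta t}\,\delta(a-a').
\end{align*}
```

Under these assumptions Δt enters only through φ_{a′,Δt}: the first term describes a move to a passive or an active destination, and the second term describes a walker that stays because its node has no edge in the window.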
We also find that closed-form solutions of eq. (3) exist in the limits Δt ≫ 1 and Δt ≪ 1. In the Δt ≫ 1 case, links are integrated over a large time window and the time-varying network can be considered static. Recall that φ_{a,Δt} = e^{−(a+⟨a⟩)Δt}. For Δt ≫ 1, φ_{a,Δt} ≈ 0 for all a ∈ V, and thus the second term of eq. (2) is close to zero. In this scenario Q_{a|a′}(Δt) = C(a + ⟨a⟩)dF(a), where C = 1/(2⟨a⟩), yielding the fixed-point solution of eq. (3)
ρ_a = (a + ⟨a⟩)/(2N⟨a⟩). (4)
The asymptotic occupation probability of a given node of class a thus grows linearly with its activity. Since in the regime of large Δt the degree of a node v, k_v, is proportional to its activity, a_v, eq. (4) yields ρ_{a_v} ∝ k_v. Thus, for sufficiently large Δt, we recover the well-known behavior of static networks, where the occupation probability of a node is proportional to its degree [31]. Furthermore, in the SI we show that eqs. (2), (3), and (4) hold for weighted aggregation procedures where integrated edges have weights proportional to how often they appeared during an interval Δt.
In the regime of very short aggregating windows, φ_{a,Δt} → 1 as Δt → 0 for all a ∈ V. Thus, the first term of eq. (2) is zero, yielding Q_{a|a′}(Δt) = dF(a) and the trivial fixed-point solution of eq. (3)
ρ_a = 1/N. (5)
Thus, the walker is equally likely to be found at any node regardless of its activity rate. In fact, when Δt is small the probability that a node has more than one edge is close to zero. Consequently, highly active nodes lose and gain walkers at the same rate, giving rise to the homogeneous occupation probabilities in eq. (5). Interestingly, in previous work on general time-varying network processes we showed that the result in eq. (5) holds even when aggregated snapshots have arbitrarily strong spatio-temporal correlations [40].
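Both limits can be checked numerically with a small sketch. It discretizes the activity into a few classes and assumes, for m = 1, a transition probability of the form Q_{a|a′}(Δt) = (1 − φ_{a′,Δt})(a′ + a)/(a′ + ⟨a⟩) dF(a) + φ_{a′,Δt} δ(a − a′), which is reconstructed here from the definitions of passive and active destinations rather than copied from the original equations. The function names, the three activity classes, and their fractions are hypothetical.

```python
import math

def transition_matrix(activities, fractions, dt):
    """Class-to-class transition probabilities for m = 1 (assumed form, see lead-in)."""
    mean_a = sum(a * f for a, f in zip(activities, fractions))
    matrix = []
    for i, a_from in enumerate(activities):
        # phi: probability that the walker's node has no edge in the window
        phi = math.exp(-(a_from + mean_a) * dt)
        row = [(1.0 - phi) * (a_from + a_to) / (a_from + mean_a) * f_to
               for a_to, f_to in zip(activities, fractions)]
        row[i] += phi  # no edge: the walker stays in its class
        matrix.append(row)
    return matrix

def occupation_per_node(activities, fractions, dt, tol=1e-13, max_iter=200_000):
    """Return N * rho_a for each class by iterating pi <- pi Q to a fixed point."""
    q = transition_matrix(activities, fractions, dt)
    n = len(activities)
    pi = list(fractions)  # the walker starts at a uniformly random node
    for _ in range(max_iter):
        new = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    # Divide the class probability by the class fraction to get a per-node value.
    return [p / f for p, f in zip(pi, fractions)]

activities = [0.01, 0.1, 1.0]   # hypothetical activity classes
fractions = [0.7, 0.2, 0.1]     # hypothetical dF(a); must sum to one
mean_a = sum(a * f for a, f in zip(activities, fractions))
coarse = occupation_per_node(activities, fractions, dt=1000.0)
fine = occupation_per_node(activities, fractions, dt=0.01)
```

For the large window the values of Nρ_a returned in `coarse` agree with (a + ⟨a⟩)/(2⟨a⟩), while for the small window the values in `fine` are all close to one, i.e. ρ_a ≈ 1/N.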
Numerical validation on synthetic networks. We validated our analytical results through extensive numerical simulations. We considered networks with N = 10^5 nodes and a power-law activity distribution dF(a) ∝ a^{−γ} (as observed in many real networks [17]), restricted to the interval V = [10^{−3}, 1] to avoid divergences in the limit a ≪ 1. As shown in Fig. 4, the exact solution reproduces the simulations accurately for the entire spectrum of integrating windows Δt (case m = 1 in main panel). Interestingly, as Δt grows, the occupation probability increases sharply at high-activity vertices while slightly decreasing at low-activity nodes. Moreover, as Δt increases ρ_a ∝ a, as predicted by eq. (4), while as Δt gets smaller ρ_a = 1/N, as predicted by eq. (5). The equations also correctly describe the behavior observed for one-to-many simultaneous connections m, characterized by a smoother increase in ρ_a at high-activity nodes (see the m = 6 case in Fig. 4, inset). The SI contains more details on the formulation of the m > 1 case.
Numerical validation on real-world networks. The analytical framework discussed above also qualitatively reproduces the behavior observed in real datasets. In Figs. 2 and 3, the theoretical results accurately describe the real data, with some deviations for nodes in the intermediate activity range at Δt of one day. The RW occupation probability is uniform and independent of node activity for small Δt, as predicted by eq. (5). As predicted by eq. (4), the RW occupation probability ρ_a approaches (a + ⟨a⟩)/(2N⟨a⟩) (black curve) as Δt increases, an effect particularly noticeable for high-activity nodes. It is also worth highlighting that the data match well the theoretical equations for the case m = 1, suggesting a connection between the datasets and the fundamental mechanisms described in our model (for the similarity in behavior between m = 1 and projected networks such as the PRL co-authorship network, see SI).

Discussion
Our results clarify the effect of time aggregation procedures on the behavior of the RW, taken as the simplest instance of a dynamical process, even when aggregation windows are "short". We have quantified this effect in a rigorous mathematical framework that (i) allows us to recover the results concerning static networks in the limit of infinite aggregation windows, (ii) accurately describes the behavior observed in numerical simulations on synthetic time-varying networks, and (iii) captures the phenomenology observed on real datasets. Overall, while for practical or technical reasons researchers are often forced, or simply tempted, to work with time-aggregated representations of time-varying networks, our work suggests that caution should be used when drawing general conclusions about dynamical processes based upon time-aggregated networks. Moreover, our theoretical results may help to investigate possible distortions introduced by the aggregating windows of data collection methods.
The proposed framework considers inherently discrete processes, such as spreading phenomena in contact networks, which are discrete even at the finest possible time resolution. We leave the generalization to continuous processes for further work.

Methods
Occupation probability. The asymptotic occupation probability is the steady-state probability of finding the walker in a node with activity a, which is guaranteed to exist and be unique if the time-varying network is stationary, ergodic, and T-connected (see SI), as is the case for activity-driven networks. A time-varying network is T-connected if there is a temporal path between any two nodes [40]. In our simulations we consider the RW occupation probability ρ_a to be the probability of finding the walker in a node with activity a at the end of the simulation period [0, T], given that the walker starts at a random node.

Figure 4 | Occupation probability ρ_a of a RW over an activity-driven network with activity distribution dF(a) ∝ a^{−2}, a ∈ (10^{−3}, 1), N = 10^5, for different values of m. Curves in the main plot concern the m = 1 case, where each node can only connect to one node at a time. In the inset, the case m = 6 is considered, where a node simultaneously connects to six other nodes. Solid curves represent the analytical prediction of eq. (3) integrated over Δt = 1, 10, 100 (diamonds, squares and circles) time windows. Note that in both panels ρ_a ≈ a as Δt gets larger. Averages performed over 10^3 independent simulations.
Activity-driven networks. Activity-driven network models are based on the activity patterns of nodes, which are used to explicitly model the evolution of the network structure over time [17]. It can be shown that the full dynamics of the network are encoded in the activity rate distribution dF(a), and that the time-aggregated measurement of network connectivity yields a degree distribution that follows the same functional form as dF(a) in the limit of small k/Δt and k/N [17]. This is an important feature of the model, which is able to reproduce basic statistical properties found in many real networks while giving a simple prescription to characterize dynamical connectivity patterns explicitly.
Datasets & simulation. In this study we considered two different empirical projections of bipartite time-varying networks: the collaborations in the journal Physical Review Letters (PRL), published by the American Physical Society [34], and the Yahoo! music dataset made available by Yahoo! [35].
PRL dataset. The bipartite network representation of this dataset has two types of nodes: authors and papers. An author is connected to all the papers she/he wrote in an integrating window Δt. We study the bipartite projection onto the authors. In this representation each author of an article in PRL is a node. Undirected edges connect authors who collaborate on the same article. We focus on small collaborations only, filtering out all the articles with more than 10 authors. We consider the period between 1958 and 2006. The dataset contains 80,554 authors and 66,892 articles. The smallest timescale available is one day.
Yahoo! music dataset. In this dataset the bipartite network has two types of nodes: users and songs. We study the bipartite projection onto the songs. Each node is a song, and two songs are connected if at least one user rated both in a time window Δt. The dataset contains 4.6 × 10^5 songs rated by 199,719 Yahoo! users, collected over the course of six months [35]. User activity is recorded at a time resolution of seconds.
Simulation setup. We obtain the empirical walker occupation probability, ρ_a, as follows. We construct the transition probability matrix P_t associated with the RW on the t-th aggregated network G_t(Δt), t = 0, …, T/Δt, where T is the time of the last event in the dataset. The empirical RW occupation probability is obtained by multiplying the matrices P_0 P_1 ⋯ P_n and then left-multiplying the result by the vector (1/N, …, 1/N), which gives the walker equal probability of starting at any node. We note in passing that similar results are obtained when the walker starts at a handful of high-activity nodes.
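The matrix product described above can be evaluated without storing any dense matrix, by propagating the probability vector through one snapshot at a time. The following is a minimal sketch under stated assumptions: nodes are labelled 0, …, N − 1, each snapshot is a set of undirected edges without duplicates or self-loops, and a walker at a node with no edge in a window stays put (a diagonal entry of one in P_t). The function name `propagate` and the toy snapshots are hypothetical.

```python
def propagate(snapshots, n_nodes):
    """Left-multiply the uniform vector by P_0 P_1 ... P_n, one snapshot at a time."""
    prob = [1.0 / n_nodes] * n_nodes  # equal probability of starting at any node
    for edges in snapshots:
        # Build the adjacency lists of the current aggregated network.
        neighbours = [[] for _ in range(n_nodes)]
        for u, v in edges:
            neighbours[u].append(v)
            neighbours[v].append(u)
        new = [0.0] * n_nodes
        for node, mass in enumerate(prob):
            if neighbours[node]:
                # Split the probability mass evenly among the neighbours.
                share = mass / len(neighbours[node])
                for nb in neighbours[node]:
                    new[nb] += share
            else:
                new[node] += mass  # isolated node: the walker stays
        prob = new
    return prob

# A star centred on node 0, followed by an empty window.
snapshots = [{(0, 1), (0, 2)}, set()]
occupation = propagate(snapshots, n_nodes=3)
```

Averaging the resulting per-node probabilities over nodes with the same activity rate then gives an empirical estimate of ρ_a.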