The structured backbone of temporal social ties

In many data sets, information on the structure and temporality of a system coexists with noise and non-essential elements. In networked systems, for instance, some edges might be non-essential or exist only by chance. Filtering them out and extracting a set of relevant connections is a non-trivial task. Moreover, methods put forward until now do not deal with time-resolved network data, which have become increasingly available. Here we develop a method for filtering temporal network data, by defining an adequate temporal null model that allows us to identify pairs of nodes having more interactions than expected given their activities: the significant ties. In addition, our method can assign a significance to complex structures such as triads of simultaneous interactions, an impossible task for methods based on static representations. Our results hint at ways to represent temporal networks for use in data-driven models.


I. INTRODUCTION
The analysis of large-scale empirical data sets, and in particular of complex networked data, is often made difficult by the nature of the data itself: data may be noisy [1][2][3], and contain both robust, generalizable properties and details specific to the collected data set under investigation, which would change if the data had been collected at a different moment or for a different sample of the same system. For instance, data describing interactions between individuals (face-to-face [4,5], phone calls [6,7], or online interactions [8,9]) might show robust group structures in different days or weeks but with a different set and timing of interactions each day, so that the exact timing of an interaction in a specific day might not be relevant to the understanding of the population's characteristics. Another issue might arise if the network under scrutiny is dense but with very heterogeneous weights on the edges. The importance of edges might then not be easily reducible only to their own weight, nor to the local properties of the nodes they link, such as their degree (number of neighbours in the network) or their strength (sum of the weights of their edges).
In order to extract the most relevant information from the data, several approaches have been put forward for static networks. For instance, the k-core decomposition focuses on increasingly connected parts of a network and has been established as an important tool to analyze and visualize complex networks and to determine influential spreaders in networks [10][11][12]. Another approach consists in determining a "backbone" of significant edges in the network and filtering out the remaining non-essential edges. Several methods have been proposed for this purpose in the case of static weighted networks. The simplest way of filtering edges is through thresholding: all the edges with weight below a given threshold value are removed. Such a method, however, imposes an arbitrary cutoff scale, while many systems of interest display broad distributions of weights and complex patterns at multiple scales. Other methods have thus been put forward to filter out edges simultaneously at different scales, using statistical tests based on null models [13][14][15][16][17][18]: in all cases, the fundamental idea is to test whether the weight of an edge is distinguishable from the hypothetical one that would be generated at random by a certain null model. Filtering is performed by fixing a desired significance level and selecting only those edges whose weight cannot be explained by the null model at the chosen significance level. These significant edges form a backbone of the network.
Various null models have been proposed in the literature to deal with static weighted networks [13][14][15][16][17][18]. The recent surge in the availability of temporally-resolved high-resolution data on social and economic networks, however, highlights the need for methods specifically designed to extract backbones from temporal networks or temporally aggregated networks [19,20]. Obviously, each method defined on static weighted networks can be applied to a temporally aggregated network: for instance, a simple threshold could be applied on the number of contacts between two nodes. However, a highly active node could in principle have a large number of (non-essential) ties, so that one needs to control for the difference in intrinsic activity levels across nodes to extract statistically significant ties that cannot be explained by random chance.
Here, we develop a method to extract an irreducible backbone from a sequence of temporal contacts between nodes, by defining an adequate temporal null model. This null model can be interpreted as a (temporal) configuration or fitness model, whose parameters are estimated by using global information, namely the numbers of contacts for all node pairs, similarly to the enhanced configuration model (ECM) filter defined for static networks [17]. Thanks to this null model, we determine the set of significant ties, at any significance level, among all the pairs of nodes with at least one interaction in the data. These ties form an irreducible backbone in the sense defined in [17], as their significance cannot be reduced to the activity of the involved nodes. Most importantly, the temporal nature of the null model allows us to attribute a significance also to higher order structures such as simultaneously occurring triplets of interactions or other temporal motifs [21], a task that would be by construction impossible when defining significant ties and backbones directly from a temporally aggregated network.
We illustrate the application of our filtering method on temporal networks of social and economic relevance and compare its results with several static filtering methods. Interestingly, at a given level of significance, our method identifies more significant edges than other filters and is less biased towards edges with large weights. Moreover, in cases where the aggregated network has a clear-cut community structure, corresponding e.g. to classes in a school, the significant ties turn out to be mostly intra-community ones: at high significance levels, the network of significant ties (i.e., the backbone) breaks into several connected components, each corresponding to one community, and inter-community edges turn out to be non-significant. This suggests that inter-community edges, while playing a crucial role in reducing the diameter of the network [22,23], are here indistinguishable from randomly created edges between groups, once nodes' activities are fixed. We also investigate significant triads, defined as sets of three nodes that interact simultaneously with each other more than expected given their activities, and show that these significant triads are not necessarily composed of three significant edges. This shows the crucial importance of taking temporality into account when defining a null model to detect the significance of structures in temporal networks, as such information could not be obtained from a purely static null model.

II. DATA
We consider five data sets of social and economic interest described by temporal networks (Table I). Four data sets correspond to face-to-face contacts among individuals in different contexts, recorded using wearable sensors by the SocioPatterns collaboration with a temporal resolution of 20 seconds, and are publicly available [24]. We consider data sets collected in contexts with very different activity levels, constraints on the schedule of individuals, duration and group structures, namely i) a high school ("Highschool") [25], ii) a primary school ("Primaryschool") [5], iii) an office building ("Workplace") [26] and iv) a hospital ward ("Hospital") [27]. In addition, we also use data on temporal financial networks in which nodes and edges represent banks and overnight lending-borrowing relationships, respectively. Since overnight loan contracts last only one day, we can construct a sequence of daily snapshot networks (i.e., the time resolution is one day). We consider here data on the Italian online interbank market, called e-MID, between June 12, 2007 and July 9, 2007 (i.e., 20 business days). The data are commercially available from e-MID SIM S.p.A., based in Milan, Italy [28].

TABLE I. Basic description of empirical temporal networks. The "number of labels" gives the number of classes for the Primaryschool data (excluding "teachers"), of offices for Workplace, and of types of occupations for the Hospital data. For the Interbank data, the number shown in the third column denotes the number of daily edges rather than the total number of transactions. We classify banks into Italian banks and foreign banks.

A. Temporal fitness model
We consider a set of $N$ nodes and a sequence of interactions that occur at arbitrary points in time between these nodes [20,29]. We fix a temporal resolution $\Delta$ by dividing the whole data temporal window of length $T$ into $T/\Delta$ time intervals, and we build on each interval a binary adjacency matrix $A_t$ with elements $A_{ij,t}$ equal to 1 if there is at least one interaction between $i$ and $j$ during $(t - \Delta, t]$ and zero otherwise. In the temporal fitness model, each node $i$ is assigned an activity level $a_i \in (0, 1]$, and the probability $u$ that nodes $i$ and $j$ interact (e.g., through a face-to-face contact, a bilateral financial transaction, etc.) during any given time interval is simply given by the product of their activity levels [30]:

$$u(a_i, a_j) = a_i a_j. \tag{1}$$

In a static network context, this class of network model is called the fitness model and has been used to model network generative processes [31][32][33]. The null model we obtain is thus a sequence of successive independent realizations of the (static) fitness model. It sets a baseline of how much two nodes are expected to interact, given their activities, if interaction partners are selected at random at each time step. In the simplest version of this null model, we consider constant activity values for each node, which allows us to explicitly compute the probability distribution of the total number of interactions between each pair of nodes. Note that this number is at most $\tau = T/\Delta$, i.e., the number of time intervals given the resolution $\Delta$. Note also that this temporal null model does not contain any a priori knowledge of the group structure of the nodes. An interesting generalization could be to superimpose group labels or node properties (e.g., gender or age for nodes representing individuals) and interaction probabilities depending on the nodes' properties.
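The construction of the binary snapshots $A_t$ and of the pair counts $m_{ij}$ can be sketched as follows. This is an illustrative sketch, not the authors' code: the input format (an iterable of `(time, i, j)` contact events with `0 < time <= T`) and the function name `bin_contacts` are our own assumptions.

```python
import math
from collections import defaultdict


def bin_contacts(events, T, delta):
    """Build binary snapshots at resolution delta and count m_ij.

    events: iterable of (time, i, j) with 0 < time <= T (assumed format).
    Interval t (1-based) covers the window ((t - 1) * delta, t * delta].
    Returns (tau, snapshots, m): snapshots[t - 1] is the set of undirected
    pairs (i, j), i < j, with A_ij,t = 1, and m[(i, j)] is the number of
    intervals containing at least one contact between i and j.
    """
    tau = math.ceil(T / delta)
    snapshots = [set() for _ in range(tau)]
    for time, i, j in events:
        if i == j:
            continue  # ignore self-contacts
        t = math.ceil(time / delta)  # index of the interval containing `time`
        snapshots[t - 1].add((min(i, j), max(i, j)))
    m = defaultdict(int)
    for edges in snapshots:
        for pair in edges:
            m[pair] += 1  # several contacts in one interval still count once
    return tau, snapshots, dict(m)
```

Because $A_{ij,t}$ is binary, repeated contacts within the same interval contribute only once, so $m_{ij} \leq \tau$ by construction.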
In addition, we present in section S1 of the Supporting Information (SI) a refined version that takes into account temporal variations of the overall interaction activity in the system, through the introduction of a time-varying parameter $\xi(t)$. In that case, the null model is defined by the fact that the probability of nodes $i$ and $j$ establishing a connection at time $t$ is $u(a_i, a_j, t) = a_i a_j \xi(t)$. We present here the case of constant $\xi(t) = 1$, as we can then obtain an analytical form for the probability distribution of the number of interactions of a pair of nodes, while only an approximate formula is available in the more general case. We show, however, in the SI that both methods yield almost identical results in the cases studied here.
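To make the null model concrete, here is a minimal sampler that draws one synthetic realization: in every interval, each pair interacts independently with probability $a_i a_j \xi(t)$. It is a sketch for illustration only, not part of the filtering procedure itself; the function name `sample_null_counts` is hypothetical, and we assume $a_i a_j \xi(t) \leq 1$ for all pairs and intervals.

```python
import random


def sample_null_counts(a, tau, xi=None, seed=None):
    """Draw one realization of the temporal fitness null model.

    a:  node activities a_i in (0, 1].
    xi: optional time-varying factors xi(t), one per interval;
        defaults to xi(t) = 1 (the constant case of the main text).
    Each pair (i, j), i < j, interacts in interval t independently with
    probability a_i * a_j * xi(t). Returns m[(i, j)], the number of
    intervals in which the pair interacted (pairs with zero count omitted).
    """
    rng = random.Random(seed)
    if xi is None:
        xi = [1.0] * tau
    n = len(a)
    m = {}
    for t in range(tau):
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < a[i] * a[j] * xi[t]:
                    m[(i, j)] = m.get((i, j), 0) + 1
    return m
```

For example, `sample_null_counts([1.0, 1.0, 1.0], 4)` returns a count of 4 for every pair, since each interaction then occurs with probability one.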

B. Significant ties
To uncover significant ties with respect to the null model described above, we proceed in two steps (Fig. 1). First, we perform a maximum likelihood estimation of all the node activities $\mathbf{a} \equiv (a_1, \ldots, a_N)$, as described in the Materials and Methods section, obtaining the estimates $\mathbf{a}^* \equiv (a^*_1, \ldots, a^*_N)$. We then compute for each pair of nodes $i$ and $j$ the probability distribution of their total number of interactions $m_{ij}$ in the null model, which is given by the following binomial distribution:

$$g(m_{ij} \mid a^*_i, a^*_j) = \binom{\tau}{m_{ij}} \left(a^*_i a^*_j\right)^{m_{ij}} \left(1 - a^*_i a^*_j\right)^{\tau - m_{ij}}. \tag{2}$$

We denote by $m^c_{ij}$ the $c$-th percentile of this distribution, i.e., the smallest integer such that $G(m^c_{ij} \mid a^*_i, a^*_j) \geq c/100$, where $G$ is the cumulative distribution function (CDF) of $g(m_{ij} \mid a^*_i, a^*_j)$. If the actual empirical number of interactions $m^o_{ij}$ between $i$ and $j$ is larger than $m^c_{ij}$, this number cannot be explained by the null model at significance level $\alpha \equiv 1 - c/100$, indicating that $i$ and $j$ are connected by a significant tie. For a given significance level $\alpha$, we can test the significance of the set of interactions of each pair of nodes independently from the others [17]. Note that, even if the significance of a tie is determined from an aggregated number of interactions, a significant tie does not correspond here to a static edge but to an interacting pair of nodes with their set of temporally resolved interactions, and the backbone given by all interactions in the significant ties remains a temporal network. Decreasing $\alpha$ allows us to probe increasingly significant pairs and to tune the number of ties retained in the backbone, providing a systematic filtering method that we call the Significant Tie (ST) filter.

FIG. 1. Sketch of the filtering method. From the temporal network at resolution $\Delta$, described by $\tau = T/\Delta$ binary adjacency matrices (interaction snapshots), we estimate the set of node activities $(a^*_1, \ldots, a^*_N)$, and thus the probability distribution of the number of interactions between any pair of nodes $(i, j)$ under the null model. We compare the empirical value $m^o_{ij}$ with the percentiles of this distribution to determine the significance of the pair $(i, j)$'s interactions.
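The per-pair test can be sketched in a few lines of standard-library Python. This is our own illustration, not the authors' implementation; the helper names are hypothetical. Because the binomial distribution is discrete, the condition $m^o_{ij} > m^c_{ij}$ is equivalent to requiring that the upper tail $P(m_{ij} \geq m^o_{ij})$ be at most $\alpha$; the sketch sums that tail directly in log space, which stays accurate for very small $\alpha$ and large $\tau$.

```python
import math


def binom_log_pmf(k, tau, p):
    """log g(k) = log[C(tau, k) p^k (1 - p)^(tau - k)], assuming 0 < p < 1."""
    return (math.lgamma(tau + 1) - math.lgamma(k + 1) - math.lgamma(tau - k + 1)
            + k * math.log(p) + (tau - k) * math.log1p(-p))


def upper_tail(m_obs, tau, p):
    """P(m >= m_obs) for m ~ Binomial(tau, p), summed directly over the tail
    to avoid the cancellation in 1 - G(m_obs - 1)."""
    if m_obs <= 0:
        return 1.0
    if m_obs > tau:
        return 0.0
    if p >= 1.0:
        return 1.0  # every interval is active, so m = tau surely
    return math.fsum(math.exp(binom_log_pmf(k, tau, p)) for k in range(m_obs, tau + 1))


def critical_count(tau, p, alpha):
    """c-th percentile m^c with c = 100 (1 - alpha): the smallest m such that
    G(m) >= 1 - alpha, i.e. the probability of exceeding m is at most alpha."""
    for m in range(tau + 1):
        if upper_tail(m + 1, tau, p) <= alpha:
            return m
    return tau  # not reached: upper_tail(tau + 1, ...) is 0


def is_significant(m_obs, a_i, a_j, tau, alpha):
    """ST-filter test for one pair: significant iff m^o_ij > m^c_ij."""
    return m_obs > critical_count(tau, a_i * a_j, alpha)
```

As a toy example with $\tau = 10$, $a^*_i = a^*_j = 0.5$ and $\alpha = 0.01$, the critical count is 6, so the pair needs at least 7 interactions to form a significant tie. The linear scan in `critical_count` is quadratic in $\tau$; a bisection on `upper_tail` would be preferable for long recordings.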
Thanks to the use of the null model, a pair of interacting nodes can be significant even if their number of interactions is small, as long as their individual activity levels are sufficiently low. Conversely, ties with a large number of interactions might not be significant if the two involved nodes are very active. Indeed, the ST filter controls for the difference between nodes in terms of intrinsic activity levels. As a consequence, the significant ties identified by our method are "irreducible" in the sense that their significance cannot be attributed to local node-specific properties [17], such as the node degree and strength in the aggregated network: the probability of interaction between two nodes under the null hypothesis is determined by an interplay of global and local information through the maximum likelihood estimation described in Materials and Methods. The resulting network of interactions between the significant pairs of nodes may thus be regarded as an irreducible backbone of the temporal network under study [17].

C. Significant temporal structures
Using a temporal fitness model as null model allows us to go beyond the usual tests concerning the significance of ties, and to assign a significance to higher order structures such as temporal motifs. To illustrate this point, let us consider the simple case of a triadic interaction between three nodes $i$, $j$, $k$: the empirical number of time intervals in which the three pairs $(i, j)$, $(j, k)$ and $(i, k)$ are simultaneously interacting, denoted by $r^o_{ijk}$, can be compared to the probability distribution of the number $r_{ijk}$ of occurrences of such triangles in the null model. Since pairs interact independently in each interval, the probability that $i$, $j$, $k$ form a triangle of interactions during a given time interval in the temporal fitness model is

$$u(a_i, a_j)\, u(a_j, a_k)\, u(a_i, a_k) = (a_i a_j a_k)^2, \tag{3}$$

so that $r_{ijk}$ obeys the following probability distribution in the null model:

$$h(r_{ijk} \mid a^*_i, a^*_j, a^*_k) = \binom{\tau}{r_{ijk}} \left(a^*_i a^*_j a^*_k\right)^{2 r_{ijk}} \left[1 - \left(a^*_i a^*_j a^*_k\right)^2\right]^{\tau - r_{ijk}}. \tag{4}$$

Similarly to the case of dyads, we define for each significance level $\alpha = 1 - c/100$ the significant triads as those such that $r^o_{ijk}$ is larger than the $c$-th percentile of $h(r_{ijk} \mid a^*_i, a^*_j, a^*_k)$. Note that this method can easily be generalized to any set of temporally constrained interactions (e.g., occurring in a sequence of successive snapshots) or motifs. By contrast, any filtering method based directly on the aggregated network, and not taking into account the temporality of the data, is by construction unable to define a null hypothesis for simultaneous interactions (or interactions with a given temporal sequence) and thus to assign a significance to such patterns.
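A minimal sketch of the triad test: count, for each triple, the intervals in which all three pairs are active, and evaluate the upper tail of the binomial null distribution with per-interval probability $(a_i a_j a_k)^2$. The function names and the snapshot format (a list of sets of pairs `(i, j)` with `i < j`, one set per interval) are our assumptions. A triad is significant at level $\alpha$ when this tail probability is at most $\alpha$, which for a discrete distribution is equivalent to $r^o_{ijk}$ exceeding the $c$-th percentile.

```python
import math
from collections import Counter, defaultdict


def count_simultaneous_triads(snapshots):
    """r^o_ijk: number of intervals in which (i, j), (j, k), (i, k) are all active.

    snapshots: list of sets of pairs (i, j) with i < j, one set per interval.
    Returns a Counter keyed by (i, j, k) with i < j < k.
    """
    r = Counter()
    for edges in snapshots:
        nbrs = defaultdict(set)
        for i, j in edges:
            nbrs[i].add(j)
            nbrs[j].add(i)
        for i, j in edges:
            for k in nbrs[i] & nbrs[j]:
                if k > j:  # count each triangle once, from its two smallest nodes
                    r[(i, j, k)] += 1
    return r


def triad_upper_tail(r_obs, tau, a_i, a_j, a_k):
    """P(r >= r_obs) under the binomial null h, with per-interval
    triangle probability (a_i a_j)(a_j a_k)(a_i a_k) = (a_i a_j a_k)^2."""
    p = (a_i * a_j * a_k) ** 2
    if r_obs <= 0:
        return 1.0
    if r_obs > tau:
        return 0.0
    if p >= 1.0:
        return 1.0

    def log_pmf(r):
        return (math.lgamma(tau + 1) - math.lgamma(r + 1) - math.lgamma(tau - r + 1)
                + r * math.log(p) + (tau - r) * math.log1p(-p))

    return math.fsum(math.exp(log_pmf(r)) for r in range(r_obs, tau + 1))
```

Note that an interval in which only two of the three pairs are active does not contribute to $r^o_{ijk}$, even though all three ties may appear in the aggregated network.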

D. Case studies
We now apply the ST filter to the data sets presented above, varying the significance level $\alpha$ and comparing the results, when possible, with two other filtering methods that directly use the static, temporally aggregated network, namely the disparity filter (DP filter) [13] and the enhanced configuration model filter (ECM filter) [17], whose computation is recalled in section S2 of the SI. We present in the main text the main results corresponding to the Primaryschool and Highschool data sets, with temporal resolution $\Delta = 15$ min (except when mentioned otherwise), and refer to the SI for the results concerning other temporal resolutions and other data sets. As $\alpha$ decreases, the number of significant ties decreases sharply for all methods. Interestingly, however, the number of significant ties remains much larger for our method than for the DP and ECM filters as soon as $\alpha$ enters a regime of high statistical significance (e.g., $\alpha < 10^{-2}$). As $\alpha$ becomes very small, i.e., at very high statistical significance, the DP and ECM filters retain only a very small number of edges, while the ST filter still uncovers a relatively large number of significant node pairs. The resulting backbone of interactions between significant pairs at very low $\alpha$ (e.g., $\alpha$ between $10^{-10}$ and $10^{-5}$) might be regarded as a fundamental backbone of the data set. See Fig. S4 for similar results obtained with other data sets and different parameters.

Comparison with other filtering methods
Given the differences in the definition of the various filters, it is important to understand to what extent these filters select distinct or similar sets of ties. We quantify the similarity between the results of the filtering methods in two different ways. First, the Jaccard index $J(\alpha, \alpha') = |E_{\mathrm{ST}}(\alpha) \cap E_x(\alpha')| / |E_{\mathrm{ST}}(\alpha) \cup E_x(\alpha')|$ gives the fraction of common edges between the backbone $E_{\mathrm{ST}}(\alpha)$ obtained by the ST filter at significance level $\alpha$ and the backbone $E_x(\alpha')$ obtained by another filter $x \in \{\mathrm{DP}, \mathrm{ECM}\}$ at significance level $\alpha'$. A Jaccard index equal to 1 means that both methods yield exactly the same set of edges, while $J = 0$ means that the backbones are disjoint. As the different methods yield very different backbone sizes for a fixed significance level, we show in Fig. 3 a color plot of the Jaccard index as a function of the number of node pairs retained by each filtering method. In both the DP and ECM cases, the largest Jaccard indices are obtained when the numbers of edges are similar: they reach at most ∼80% and decrease as the backbone size decreases. The ST filter thus defines a backbone significantly different from the one obtained from either the DP or the ECM filter, even at fixed number of edges.
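For reference, the Jaccard index between two backbones, treated as sets of undirected edges, can be computed as below. This is a generic sketch; the function name and the convention for two empty backbones are our own choices.

```python
def jaccard(edges_a, edges_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two backbones given as
    collections of undirected edges (i, j); (i, j) and (j, i) are the same edge."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    if not union:
        return 1.0  # convention (our assumption): two empty backbones are identical
    return len(a & b) / len(union)
```

For example, the backbones `[(1, 2), (2, 3)]` and `[(3, 2), (3, 4)]` share one edge out of three distinct edges, giving $J = 1/3$.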
The Jaccard index, however, does not take into account the fact that different ties can correspond to very different numbers of interactions. We therefore also consider a weighted measure of the similarity between the backbones obtained by different filters. The results, shown in Fig. S5, indicate that values larger than the Jaccard index are obtained, with similarities between 0.9 and 1, decreasing below 0.8 only when the backbone size becomes small. The various backbones thus all seem to retain similar sets of ties with large weights, while differing when assessing the significance of pairs of nodes with smaller numbers of interactions. Similar results are obtained with all data sets.
To investigate this issue in more detail, we compare in Fig. S6 the distributions of weights (i.e., of the number of interactions) of the ties considered either as significant or not by the different filters. In all cases, the significant ties display on average larger weights than the non-significant ones, and all ties with weight larger than a certain threshold turn out to be significant. However, the range of weights spanned by the ties found by the ST filter is broader. The DP filter in particular is almost equivalent to a thresholding procedure, with only a narrow range of weight values for which both significant and non-significant edges can be found, in agreement with the observation in [17] that this filter tends to retain edges with large weights. The ST filter thus manages to retain pairs of nodes with a wide range of actual numbers of interactions, thanks to the null model on which it is based.

Backbones and community structure
In the data sets we consider, nodes can be classified into different groups, corresponding e.g. to classes in schools, departments in the workplace and roles in the hospital. In the Highschool, Primaryschool and Workplace cases, these groups define a clear-cut community structure [5,25,26], while nodes from different groups are more mixed in the other data sets [27]. This is confirmed by the values of the weighted modularity of the partition corresponding to these groups, shown in Fig. 4c. A visualization of the backbones obtained by the different filtering methods, shown in Fig. 4a and b, indicates that the backbone obtained by the ST filter seems to separate the network into connected components corresponding to these communities more efficiently than the other filters, at fixed number of edges. We confirm this through two more quantitative indicators. First, we measure, as a function of the backbone size, the fraction of intra-group edges (Fig. S7). It is larger than the random baseline (in which edges are kept completely at random) for all filters, approaches one as the number of edges decreases, and is maximal for the ST filter. Second, we consider each filtering method as a prediction task for finding intra-group edges: we use $\alpha$ as the parameter and measure, for each $\alpha$, the true and false positives (edges in the backbone that are/are not intra-group edges) and the true and false negatives (edges not in the backbone that are not/are intra-group edges), thus building a ROC curve (see section S3 of the SI for details). The area under the ROC curve (AUC) of the ST filter is higher than for the other filters for the three data sets in which the group structure corresponds to a community structure (large value of the modularity), while, for the other data sets, in which the modularity is small, all filters lead to an AUC close to 0.5, i.e., close to a random baseline (Fig. 4c).
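Both indicators can be sketched as follows. The sketch assumes (our assumption, not stated in the text) that each candidate tie carries a p-value such that the tie enters the backbone at level $\alpha$ exactly when its p-value is at most $\alpha$; for the ST filter this is the upper tail $P(m_{ij} \geq m^o_{ij})$. Sweeping $\alpha$ then traces the ROC curve, whose area equals the probability that a randomly chosen intra-group tie has a smaller p-value than a randomly chosen inter-group tie (with ties in p-value counted as one half). Function names are hypothetical.

```python
def intra_group_fraction(edges, group):
    """Fraction of backbone edges (i, j) joining two nodes of the same group.
    group: mapping node -> group label (e.g. class, office, role)."""
    edges = list(edges)
    if not edges:
        return float("nan")
    return sum(group[i] == group[j] for i, j in edges) / len(edges)


def roc_auc_from_pvalues(p_values, is_intra):
    """AUC of the ROC curve obtained by sweeping alpha, where a tie is in the
    backbone iff its p-value <= alpha and 'positive' means intra-group."""
    pos = [p for p, y in zip(p_values, is_intra) if y]
    neg = [p for p, y in zip(p_values, is_intra) if not y]
    if not pos or not neg:
        raise ValueError("need both intra- and inter-group ties")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p < q:
                wins += 1.0   # intra-group tie enters the backbone first
            elif p == q:
                wins += 0.5   # both enter at the same alpha
    return wins / (len(pos) * len(neg))
```

An AUC of 1 means that every intra-group tie becomes significant before any inter-group tie as $\alpha$ increases, while 0.5 corresponds to the random baseline.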
The fact that the ST backbone selects mostly intra-community node pairs suggests that inter-community edges are largely due to random interactions, given the activity levels of the nodes. Inter-community edges, which act as bridges, play an important role in propagating information, spreading ideas, and diffusing influence [22,36]. Nevertheless, our analysis shows that the actual inter-community edges are statistically indistinguishable from randomly connected ties, hence not "significant" with respect to a null model of random interactions between nodes of fixed activities. This hints at a way to represent the original temporal network as a superposition of (i) a backbone of significant ties and (ii) connections extracted at random between nodes of different groups, in a way that would refine the contact matrix of distributions put forward in [37,38].

Triadic relationships
As described above, only the filtering method presented here can extract triads with a significant number of simultaneous interactions, as the other filters do not use temporally resolved null models. We show in Fig. 5a the number of such significant triadic relationships as a function of the significance level $\alpha$, for the Highschool data set and temporal resolution $\Delta = 1$ min. Similar results are shown in Fig. S8 for the other data sets and different values of $\Delta$. For filtering levels $\alpha > 10^{-4}$, almost all the triangles present in the aggregated network are considered significant. However, the number of significant triadic relationships strongly decreases as $\alpha$ becomes lower, determining a set of triads whose number of simultaneous interactions cannot be explained by the temporal null model and the individual nodes' activity levels.
Figure 5b moreover highlights a striking feature of the significant triads, namely, that they do not necessarily correspond to three significant ties. In fact, the number of significant ties in a significant triad can take any value between 0 and 3 (see also Fig. S9). Conversely, not all triangles made of three significant ties turn out to be significant triads (Fig. S10). This clearly shows how the temporal null model allows us to go beyond the definition of significant ties and find significant higher order structures that could not be unveiled by a static approach. Indeed, considering triangles made of significant ties does not guarantee that the corresponding triads have (a significant number of) simultaneous interactions, while, on the other hand, a tie $(i, j)$ with a non-significant number of interactions when considered as a dyad can turn out to be active simultaneously with two other dyads $(j, k)$ and $(i, k)$, for a certain $k$, a significant number of times, thus forming a significant triad.
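The classification behind this observation, namely how many of the three dyads of each significant triad are themselves significant ties, can be sketched as a simple histogram. Inputs and function name are hypothetical: a list of significant triads `(i, j, k)` and a collection of significant ties `(i, j)`, for instance as produced by the tests sketched earlier.

```python
from collections import Counter


def significant_ties_per_triad(sig_triads, sig_ties):
    """Histogram of how many of the three dyads of each significant triad
    are themselves significant ties (values 0 to 3)."""
    sig = {frozenset(e) for e in sig_ties}  # undirected ties
    hist = Counter()
    for i, j, k in sig_triads:
        n_sig = sum(frozenset(pair) in sig for pair in ((i, j), (j, k), (i, k)))
        hist[n_sig] += 1
    return hist
```

A non-zero count for values below 3 is precisely the situation described above, in which a significant triad contains non-significant dyads.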

IV. DISCUSSION
In this paper, we have presented a new method to find significant ties and structures in temporal network data sets. To this aim, we have defined a null model of interactions that takes into account the heterogeneity in the activity of individual nodes and the temporal dimension of the system, and that can potentially be extended to include temporal variations in the overall network activity, due for instance to circadian or weekly rhythms, imposed schedule constraints, etc. We compute for each pair of nodes the distribution of their number of interactions in the null model, and compare the empirical value to this distribution. For any chosen significance level, we thus define as significant the pairs of nodes whose number of interactions cannot be explained by the null model. As the null model includes the heterogeneous activity of nodes, the temporal network backbone composed of the ties with significant numbers of interactions is not reducible to the nodes' local properties and contains ties with a broad distribution of numbers of interactions. Varying the significance level allows us to tune the number of node pairs in this backbone.
The comparison with other backboning methods built for weighted static networks, and hence applied here to the temporally aggregated network, reveals interesting similarities and differences. Our method yields, at a given significance level, more pairs of nodes than the other benchmarks. A more detailed comparison shows that the difference does not come from ties with large numbers of interactions, which are similarly selected by all methods, but rather from the fact that our filter uncovers more significant pairs of nodes with small numbers of interactions. The resulting distribution of weights of the significant ties is less biased towards large weights. Moreover, it turns out that, for networks with a strong community structure, our method tends to uncover mostly intra-community ties, showing the random nature of the inter-community ones, once the node activities are given.
Thanks to the temporal nature of the null model considered, our method can also attribute a significance to more complex temporal structures, such as sets of simultaneous interactions, which have a clear importance in social terms (three interactions $(i, j)$, $(j, k)$, $(i, k)$ between three individuals occurring at the same moment are clearly not the same as three interactions occurring at different times), but also for processes such as epidemic spread occurring on top of a temporal network [39]. We have in particular shown that significant triads of simultaneous interactions are not equivalent to triangles of three significant dyads, illustrating the need to take into account the temporality of these structures, which could not be uncovered by static filtering methods.
Our work hints at several perspectives and future research directions. First, it would be interesting to refine data representations such as the ones put forward in [37,38], by combining a backbone at a certain significance level and a contact matrix representing in a summarized fashion the non-significant ties. Another possibility would be to represent the data as a backbone plus the set of activities $\{\{a^*_i\}, \xi(t)\}$. In both cases, these representations should be validated by numerical simulations of various types of processes on top of the data. The relevance of such representations is two-fold: on the one hand, they allow one to summarize and generalize complex data sets in a way that can be fed into data-driven models of dynamical processes such as epidemic or information spreading; on the other hand, they keep a minimal amount of detailed information on the precise interactions, summarizing less relevant details as distributions or averages, which might make it easier to anonymize the data. Another direction of research would be to define a backbone at a finer resolution, namely one composed of (sets of) significant interactions instead of ties or sets of ties.

Estimation of nodal activity
We perform a maximum likelihood (ML) estimation of $\mathbf{a} \equiv (a_1, \ldots, a_N)$, taking the $\tau$ temporal snapshots as input, where $\tau = T/\Delta$. If two individuals are independently matched in each time interval according to probability $u(a, a')$, then the number of times temporal edges are formed between nodes $i$ and $j$ over $\tau$ time intervals is a random variable $m_{ij}$ that follows a binomial distribution with parameters $\tau$ and $u(a_i, a_j)$. Therefore, the joint probability function is

$$p(\{m_{ij}\} \mid \mathbf{a}) = \prod_{i<j} \binom{\tau}{m_{ij}} (a_i a_j)^{m_{ij}} (1 - a_i a_j)^{\tau - m_{ij}}, \tag{5}$$

where $m_{ij} \leq \tau$ denotes the count of temporal edges between $i$ and $j$ observed over $\tau$ periods in the null model. The log-likelihood function for the empirical data $\{m^o_{ij}\}$ is thus given by

$$\mathcal{L}(\mathbf{a}) = \sum_{i<j} \left[ m^o_{ij} \log(a_i a_j) + (\tau - m^o_{ij}) \log(1 - a_i a_j) \right] + \text{const.}, \tag{6}$$

where "const." denotes the terms that are independent of $\mathbf{a}$. The ML estimate of $\mathbf{a}$ is the solution of the following $N$ equations:

$$H_i(\mathbf{a}) \equiv \sum_{j \neq i} \left[ \frac{m^o_{ij}}{a_i} - \frac{(\tau - m^o_{ij})\, a_j}{1 - a_i a_j} \right] = 0, \qquad i = 1, \ldots, N. \tag{7}$$

The first-order condition (7) is obtained by differentiating the log-likelihood function (6) with respect to $a_i$. The system of $N$ nonlinear equations, $H(\mathbf{a}) = 0$, can be solved by using a standard numerical algorithm. The obtained ML estimates of $\mathbf{a}$ are denoted by $\mathbf{a}^* \equiv (a^*_1, \ldots, a^*_N)$. The numbers of contacts obtained from the model and the empirical data are compared in section S4.
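The text leaves the numerical solver unspecified, so the following is only one possible choice, not the authors' method. Combining the two terms of each summand shows that $H_i(\mathbf{a}) = 0$ is equivalent to $\sum_{j \neq i} (m^o_{ij} - \tau a_i a_j)/(1 - a_i a_j) = 0$, i.e., to the fixed-point relation $a_i = \sum_{j \neq i} \frac{m^o_{ij}}{1 - a_i a_j} \big/ \left(\tau \sum_{j \neq i} \frac{a_j}{1 - a_i a_j}\right)$. The sketch iterates this relation, damping it with a geometric mean between old and new values (without damping, a global rescaling of $\mathbf{a}$ can oscillate). It assumes every node has at least one interaction and that $m_{ij} \ll \tau$, so that all products $a_i a_j$ stay below 1; the initial values follow those quoted in section S1. Function names are hypothetical.

```python
import math


def _pair_matrix(m, n):
    """Symmetric n x n matrix of counts from a dict {(i, j): m_ij}, i < j."""
    M = [[0.0] * n for _ in range(n)]
    for (i, j), c in m.items():
        M[i][j] = M[j][i] = float(c)
    return M


def fit_activities(m, n, tau, tol=1e-12, max_iter=10_000):
    """Solve H(a) = 0 by damped fixed-point iteration (a sketch).

    m: dict {(i, j): m_ij} over pairs i < j (missing pairs mean m_ij = 0);
       nodes are 0..n-1, each assumed to have at least one interaction.
    """
    M = _pair_matrix(m, n)
    # Initial values: a_i = sum_{j != i} (m_ij / tau) / sqrt(2 sum_{i<j} m_ij / tau)
    half_total = sum(m.values()) / tau
    a = [sum(M[i]) / tau / math.sqrt(2.0 * half_total) for i in range(n)]
    for _ in range(max_iter):
        new = []
        for i in range(n):
            num = den = 0.0
            for j in range(n):
                if j == i:
                    continue
                w = 1.0 / (1.0 - a[i] * a[j])
                num += M[i][j] * w
                den += tau * a[j] * w
            # a_i = num / den at a solution of H_i(a) = 0; the geometric
            # mean with the previous value damps oscillations.
            new.append(math.sqrt(a[i] * num / den))
        change = max(abs(x - y) for x, y in zip(new, a))
        a = new
        if change < tol:
            return a
    raise RuntimeError("fixed-point iteration did not converge")


def score(a, m, tau):
    """H_i(a), the left-hand side of the first-order condition, for checking."""
    n = len(a)
    M = _pair_matrix(m, n)
    return [sum(M[i][j] / a[i] - (tau - M[i][j]) * a[j] / (1.0 - a[i] * a[j])
                for j in range(n) if j != i) for i in range(n)]
```

In the symmetric case of three nodes with $m_{ij} = 4$ for all pairs and $\tau = 10$, the solution is $a_i = \sqrt{0.4}$ for every node, since each pair must then have $a_i a_j = m_{ij}/\tau$. For large data sets, a general-purpose root finder applied to `score` may be preferable.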
The extension of the method to include time-varying probabilities of creating interactions is shown in section S1.

Supporting Information
"The structured backbone of temporal social ties" Teruyoshi Kobayashi, Taro Takaguchi, Alain Barrat

S1. TIME-VARYING MATCHING PROBABILITY
In constructing the temporal fitness model, it is possible to take into account the possibility that the probability of an interaction between two nodes can vary over time even when individuals' intrinsic activities $\{a_i\}$ are constant. This can happen when, for example, a school schedule has a certain rhythm (e.g., lunch time, class schedule, etc.), or due to circadian or weekly rhythms. The probability $u$ for the existence of an interaction between two nodes $i$ and $j$ at time $t$ is then given as

$$u(a_i, a_j, t) \equiv a_i a_j \xi(t), \qquad t = 1, \ldots, \tau, \tag{S1}$$

where $\xi(t)$ denotes a time-varying parameter. We assume that there is no correlation between the values of $\xi$ at different times, and that the interaction probabilities are independent across time intervals. The joint probability function for a certain temporal network $\{A_t\}$ is obtained as

$$p(\{A_t\} \mid \mathbf{a}, \boldsymbol{\xi}) = \prod_{t=1}^{\tau} \prod_{i<j} u(a_i, a_j, t)^{A_{ij,t}} \left[1 - u(a_i, a_j, t)\right]^{1 - A_{ij,t}}, \tag{S2}$$

where $A_{ij,t}$ is the $(i, j)$ element of the adjacency matrix in time interval $t$, denoted by $A_t$, and $\boldsymbol{\xi} \equiv (\xi(1), \ldots, \xi(\tau))$.
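The log-likelihood function is thus

$$L(\mathbf{a}, \boldsymbol{\xi}) = \log p(\{A_t\} \mid \mathbf{a}, \boldsymbol{\xi}) = \sum_{t=1}^{\tau} \sum_{i<j} \left[ A_{ij,t} \log\left(a_i a_j \xi(t)\right) + (1 - A_{ij,t}) \log\left(1 - a_i a_j \xi(t)\right) \right]. \tag{S3}$$

The maximum-likelihood estimate of $(\mathbf{a}, \boldsymbol{\xi})$ is the solution of the following $N + \tau - 1$ equations:

$$\frac{\partial L}{\partial a_i} = \sum_{t=1}^{\tau} \sum_{j \neq i} \left[ \frac{A_{ij,t}}{a_i} - \frac{(1 - A_{ij,t})\, a_j \xi(t)}{1 - a_i a_j \xi(t)} \right] = 0, \quad i = 1, \dots, N, \tag{S4}$$

$$\frac{\partial L}{\partial \xi(t)} = \sum_{i<j} \left[ \frac{A_{ij,t}}{\xi(t)} - \frac{(1 - A_{ij,t})\, a_i a_j}{1 - a_i a_j \xi(t)} \right] = 0, \quad t = 2, \dots, \tau. \tag{S5}$$

The first-order conditions (S4) and (S5) are obtained by differentiating the log-likelihood function, Eq. (S3), with respect to $a_i$ for $i = 1, \dots, N$ and $\xi(t)$ for $t = 2, \dots, \tau$. We normalize $\xi(1)$ to one: otherwise the optimality conditions would be linearly dependent and the solution would be indeterminate. This is because the likelihood depends on $\mathbf{a}$ and $\boldsymbol{\xi}$ only through the products $a_i a_j \xi(t)$: if $(\mathbf{a}^*, \boldsymbol{\xi}^*)$ satisfies the optimality conditions, so does any $(\hat{\mathbf{a}}, \hat{\boldsymbol{\xi}})$ with $\hat{a}_i \hat{a}_j = c\, a^*_i a^*_j$ and $\hat{\xi}(t) = \xi^*(t)/c$ for a constant $c > 0$. In solving the nonlinear equations (S4) and (S5), the initial values of $a_i$ and $\xi(t)$ are set to

$$a_i = \frac{\sum_{j \neq i} m_{ij}/\tau}{\sqrt{2 \sum_{k<l} m_{kl}/\tau}}$$

and $0.999$, respectively.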
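The scale invariance that motivates fixing $\xi(1) = 1$ can be checked numerically. The sketch below uses a hypothetical function `log_likelihood` and made-up snapshots (not data from the paper): it evaluates Eq. (S3) and shows that multiplying every $a_i$ by $\sqrt{c}$ and dividing every $\xi(t)$ by $c$ leaves the log-likelihood unchanged.

```python
import math


def log_likelihood(a, xi, snapshots):
    """Eq. (S3): sum over t and pairs i < j of A log(u) + (1 - A) log(1 - u),
    with u = a_i a_j xi(t). snapshots[t] is a symmetric 0/1 adjacency matrix
    (hypothetical input format)."""
    n = len(a)
    total = 0.0
    for t, A in enumerate(snapshots):
        for i in range(n):
            for j in range(i + 1, n):
                u = a[i] * a[j] * xi[t]
                total += A[i][j] * math.log(u) + (1 - A[i][j]) * math.log(1.0 - u)
    return total


# Made-up example: 3 nodes, 2 snapshots.
snapshots = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
]
a, xi, c = [0.5, 0.6, 0.4], [1.0, 0.8], 4.0
original = log_likelihood(a, xi, snapshots)
rescaled = log_likelihood([x * math.sqrt(c) for x in a], [x / c for x in xi], snapshots)
print(original, rescaled)  # identical values: (a, xi) is only identified up to c
```

Because every product $a_i a_j \xi(t)$ is unchanged, no data can distinguish the two parameter sets; pinning one value of $\xi$ removes this one-dimensional degeneracy.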
Under the null model with a time-varying term, the average number of contacts between $i$ and $j$ over $\tau$ periods is

$$\lambda_{ij} \equiv \sum_{t=1}^{\tau} u(a_i, a_j, t) = a_i a_j \sum_{t=1}^{\tau} \xi(t). \tag{S6}$$

The number of contacts thus follows a Poisson binomial distribution with mean $\lambda_{ij}$ and variance $\sigma_{ij} \equiv \sum_{t=1}^{\tau} (1 - u(a_i, a_j, t))\, u(a_i, a_j, t)$. Since the Poisson binomial distribution has no convenient closed form, we approximate the distribution of $\{m_{ij}\}$ with a Poisson distribution [40][41][42]:

$$\Pr(m_{ij} = m) \approx \frac{\lambda_{ij}^{m} e^{-\lambda_{ij}}}{m!}, \tag{S7}$$

where the error bound is given by Le Cam's theorem [40][41][42]:

$$\sum_{m=0}^{\infty} \left| \Pr(m_{ij} = m) - \frac{\lambda_{ij}^{m} e^{-\lambda_{ij}}}{m!} \right| < 2 \sum_{t=1}^{\tau} u(a_i, a_j, t)^2. \tag{S8}$$

We use Eq. (S7) to test the significance of edge $(i, j)$ for a given observation $m^o_{ij}$. Fig. S1 compares the null models with and without the time-varying parameter. The test results are almost identical for the two models: both the numbers of significant ties and the overlap between the two sets of significant ties indicate that introducing a time-varying parameter to capture activity rhythms does not affect the results shown in the main text.
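The Poisson approximation and the Le Cam bound can be illustrated with a short sketch. The helper names below are hypothetical, the per-interval probabilities $u(a_i, a_j, t)$ are supplied directly rather than estimated, and we assume, for illustration only, that the significance test uses the upper tail $\Pr(m_{ij} \ge m^o_{ij})$. The exact Poisson binomial pmf is computed by dynamic programming for comparison.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(u_t) variables, by dynamic programming."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)   # no contact in this interval
            nxt[k + 1] += q * p       # one more contact in this interval
        pmf = nxt
    return pmf


def poisson_pmf(m, lam):
    """Poisson probability lam^m e^{-lam} / m!, computed in log space to avoid overflow."""
    if lam == 0:
        return 1.0 if m == 0 else 0.0
    return math.exp(m * math.log(lam) - lam - math.lgamma(m + 1))


def poisson_upper_tail(m_obs, lam):
    """Assumed p-value Pr(m >= m_obs) under the Poisson approximation, Eq. (S7)."""
    return 1.0 - sum(poisson_pmf(m, lam) for m in range(m_obs))


def le_cam_check(probs):
    """Return (sum_m |exact - Poisson|, Le Cam bound 2 * sum_t u_t^2) as in Eq. (S8)."""
    lam = sum(probs)  # lambda_ij, Eq. (S6)
    exact = poisson_binomial_pmf(probs)
    dist = sum(abs(q - poisson_pmf(m, lam)) for m, q in enumerate(exact))
    # The exact pmf is zero beyond m = tau; the Poisson mass there also counts.
    dist += 1.0 - sum(poisson_pmf(m, lam) for m in range(len(exact)))
    return dist, 2.0 * sum(p * p for p in probs)
```

For instance, with hypothetical probabilities `probs = [0.02 + 0.001 * t for t in range(50)]`, `le_cam_check(probs)` returns a distance well below the bound, since all $u(a_i, a_j, t)$ are small.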

S2. BACKBONING METHODS FOR STATIC NETWORKS
We recall here two well-known ways to assign a significance to edges and build backbones for static weighted networks. In our context, they can be applied by first aggregating the temporal network over the available time window. This yields a network in which the degree $k_i$ of a node is its number of distinct neighbors, the weight $\omega_{ij}$ of an edge is the total interaction time between the two nodes, and the strength of node $i$ is $s_i = \sum_{j,t} A_{ij,t}$.
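The aggregation step can be sketched as follows, assuming (hypothetically) that the snapshots are given as a list of symmetric 0/1 adjacency matrices. Weights and strengths are then counted in units of snapshots; multiplying by $\Delta$ converts them to time.

```python
def aggregate(snapshots):
    """Aggregate snapshots A_t into weights w_ij = sum_t A_ij,t,
    degrees k_i (number of distinct neighbors) and strengths s_i = sum_{j,t} A_ij,t."""
    n = len(snapshots[0])
    w = [[sum(A[i][j] for A in snapshots) for j in range(n)] for i in range(n)]
    k = [sum(1 for j in range(n) if j != i and w[i][j] > 0) for i in range(n)]
    s = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    return w, k, s


# Made-up example: 3 nodes, 3 snapshots; the pair (0, 1) interacts twice.
snapshots = [
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
]
w, k, s = aggregate(snapshots)
```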

A. Disparity filter
The Disparity (DP) filter [13] is a filtering algorithm that classifies the edges of a static weighted network into significant and insignificant ones. The DP filter uses only local information: the weight of an edge, $\omega_{ij}$, the degree of the node, $k_i$, and its strength, $s_i$. The idea is that if node $i$ has no specific relationship with its neighbors, then its strength (i.e., the sum of its weights) is distributed uniformly at random over the $k_i$ edges incident to it. The authors of [13] show that the edge between $i$ and $j$ is regarded as significant at filtering level $\alpha$ if it satisfies the following condition:

$$1 - (k_i - 1) \int_0^{p_{ij}} (1 - x)^{k_i - 2}\, dx = (1 - p_{ij})^{k_i - 1} < \alpha, \tag{S9}$$

where $p_{ij} = \omega_{ij}/s_i$. The left-hand side of Eq. (S9) is the p-value under the null hypothesis that the edge weights are distributed uniformly at random. As argued by Gemmetto et al. [17], the significance of the edge between $i$ and $j$ is not necessarily identical to that of the edge between $j$ and $i$, even in an undirected network. One therefore needs to test the significance of the two "edges" $(i, j)$ and $(j, i)$ independently; the (undirected) edge is regarded as significant if at least one of the two satisfies criterion (S9).
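A minimal sketch of the DP filter as described above, including the two-sided rule for undirected edges, is given below. The function names and the weight-matrix input are hypothetical, and the closed form $(1 - p_{ij})^{k_i - 1}$ of the integral in Eq. (S9) is used. For a node of degree one the p-value is $0^0 = 1$, so such a node can never make its edge significant.

```python
def disparity_pvalue(w_ij, s_i, k_i):
    """Left-hand side of Eq. (S9): 1 - (k-1) int_0^p (1-x)^(k-2) dx = (1 - p)^(k-1)."""
    p = w_ij / s_i
    return (1.0 - p) ** (k_i - 1)


def disparity_significant(w, alpha):
    """Undirected edges (i, j), i < j, kept by the DP filter: an edge is kept
    if at least one of its two directions passes the test at level alpha."""
    n = len(w)
    k = [sum(1 for j in range(n) if j != i and w[i][j] > 0) for i in range(n)]
    s = [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]
    kept = []
    for i in range(n):
        for j in range(i + 1, n):
            if w[i][j] > 0 and (disparity_pvalue(w[i][j], s[i], k[i]) < alpha
                                or disparity_pvalue(w[j][i], s[j], k[j]) < alpha):
                kept.append((i, j))
    return kept


# Made-up example: node 0 concentrates almost all of its strength on node 1.
w = [[0, 100, 1, 1, 1], [100, 0, 1, 0, 0], [1, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
backbone = disparity_significant(w, 0.05)  # only the heavy edge (0, 1) survives
```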

B. ECM filter
The ECM (enhanced configuration model) filter [17] is based on the idea that statistically significant edges are those whose presence cannot be explained by random chance. More specifically, the entropy-maximizing random matching probabilities are computed with the ECM, in which edge weights are distributed at random as uniformly as possible subject to two constraints: $\mathbf{k} = \mathbf{k}^*$ and $\mathbf{s} = \mathbf{s}^*$, where $x^*$ denotes the empirical value of variable $x$. Gemmetto et al. [17] show that the p-value for edge $(i, j)$ is then given by

$$\gamma^*_{ij} = p^*_{ij} \left(y^*_i y^*_j\right)^{\omega^*_{ij} - 1}, \tag{S10}$$

where

$$p^*_{ij} = \frac{x^*_i x^*_j y^*_i y^*_j}{1 - y^*_i y^*_j + x^*_i x^*_j y^*_i y^*_j}, \tag{S11}$$

and $x^*_i$ and $y^*_i$ are hidden (or auxiliary) variables [43,44] that solve the following conditions:

$$k^*_i = \sum_{j \neq i} \frac{x_i x_j y_i y_j}{1 - y_i y_j + x_i x_j y_i y_j} \quad \forall i, \tag{S12}$$

$$s^*_i = \sum_{j \neq i} \frac{x_i x_j y_i y_j}{(1 - y_i y_j)(1 - y_i y_j + x_i x_j y_i y_j)} \quad \forall i. \tag{S13}$$

One needs to solve a system of $2N$ nonlinear equations to obtain $\mathbf{x}^*$ and $\mathbf{y}^*$. Here, $-\ln x_i$ ($-\ln y_i$) is the Lagrange multiplier associated with the constraint $k_i = k^*_i$ ($s_i = s^*_i$). The backbone of a weighted network at significance level $\alpha$ is the network consisting only of the edges $(i, j) \in \{(i, j) : \gamma^*_{ij} < \alpha\}$. Our implementation of Eqs. (S12) and (S13) is based on the "Max & Sam" method proposed in [45], and the MATLAB code is available from [46].
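The sketch below does not solve Eqs. (S12) and (S13); that step is done with the cited "Max & Sam" code. Instead it evaluates, for given hidden variables, the edge probability (S11), the p-value (S10) and the expected degrees and strengths on the right-hand sides of (S12) and (S13). The function names and the numerical values of $x_i, y_i$ in the example are hypothetical, and $y_i y_j < 1$ is assumed for all pairs. In the ECM, a weight $w \ge 1$ has probability proportional to $x_i x_j (y_i y_j)^w$, so the tail $\Pr(w_{ij} \ge \omega)$ is $p_{ij} (y_i y_j)^{\omega - 1}$, which is what `ecm_pvalue` computes.

```python
def ecm_edge_prob(xi, xj, yi, yj):
    """Eq. (S11): probability that the edge exists (weight >= 1) in the ECM."""
    xy = xi * xj * yi * yj
    return xy / (1.0 - yi * yj + xy)


def ecm_pvalue(w_obs, xi, xj, yi, yj):
    """Eq. (S10): Pr(w_ij >= w_obs) for an observed integer weight w_obs >= 1."""
    return ecm_edge_prob(xi, xj, yi, yj) * (yi * yj) ** (w_obs - 1)


def ecm_expected_degree_strength(x, y):
    """Right-hand sides of Eqs. (S12) and (S13) for given hidden variables x, y."""
    n = len(x)
    k, s = [], []
    for i in range(n):
        ki = si = 0.0
        for j in range(n):
            if j != i:
                p = ecm_edge_prob(x[i], x[j], y[i], y[j])
                ki += p                        # expected degree contribution
                si += p / (1.0 - y[i] * y[j])  # expected weight = p / (1 - y_i y_j)
        k.append(ki)
        s.append(si)
    return k, s
```

A solver for (S12) and (S13) would adjust $\mathbf{x}$ and $\mathbf{y}$ until these expected values match the empirical $\mathbf{k}^*$ and $\mathbf{s}^*$.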

S3. DEFINITIONS OF ROC CURVE AND AUC
The receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate for different cutoffs of a test statistic. In our context, we want to know how well the significance of an edge predicts whether that edge is an intra-community edge. For this purpose, we use the p-value of an edge in a given filtering test as a measure of edge significance, so that different points on an ROC curve correspond to different p-value cutoffs for a given filtering method (Fig. S2). For the ST filter presented in the main text, the p-value of an edge $i - j$ is simply $1 - G(m^c_{ij} \mid a^*_i, a^*_j)$. For a given cutoff $p_0$, we therefore count as True Positives (TP) the intra-community edges with a p-value lower than $p_0$ (i.e., those considered significant). False Positives (FP) are the inter-community edges with a p-value lower than $p_0$. Similarly, True Negatives (TN) are the inter-community edges with a p-value larger than $p_0$, and False Negatives (FN) are the intra-community edges with a p-value larger than $p_0$. The false positive rate is $FP/(TN + FP)$ and the true positive rate is $TP/(TP + FN)$.
The area under the ROC curve (AUC) quantifies how well the p-values predict intra-community edges. If the p-value of a filtering method perfectly separates intra- and inter-community edges (i.e., every inter-community edge has a higher p-value than every intra-community edge), the AUC equals one. If instead the p-value is a poor indicator, no better than a random prediction, the AUC is close to 0.5 (the ROC curve is then close to the straight line joining the points (0, 0) and (1, 1)).
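The ROC construction and the AUC can be sketched as follows; the function names and inputs are hypothetical (one p-value per edge and a boolean intra-community label). Sweeping the cutoff just above each distinct p-value, i.e. counting edges with $p \le p_0$ at the observed values, traces the same points as the strict rule "$p < p_0$" over all cutoffs. TN and FN are implicit: $TN = \text{negatives} - FP$ and $FN = \text{positives} - TP$. Both classes must be present.

```python
def roc_curve(pvalues, is_intra):
    """ROC points (FPR, TPR) obtained by sweeping the cutoff p0 over the observed p-values.
    An edge is predicted intra-community (positive) when its p-value is <= p0."""
    pos = sum(is_intra)
    neg = len(is_intra) - pos
    points = [(0.0, 0.0)]
    for p0 in sorted(set(pvalues)):
        tp = sum(1 for p, y in zip(pvalues, is_intra) if y and p <= p0)
        fp = sum(1 for p, y in zip(pvalues, is_intra) if not y and p <= p0)
        points.append((fp / neg, tp / pos))  # FP/(TN+FP), TP/(TP+FN)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up example: intra-community edges have the two lowest p-values.
perfect = auc(roc_curve([0.001, 0.01, 0.2, 0.5], [True, True, False, False]))
```

Tied p-values are handled as a single cutoff, so an uninformative statistic with all p-values equal gives the diagonal and an AUC of 0.5.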
We show the ROC curves and a comparison of the AUCs for different data sets in Fig. S2 and Fig. 4c in the main text.

S4. TOTAL NUMBER OF INTERACTIONS
The total number of empirical snapshot edges, denoted by $M$, is given by

$$M = \sum_{i<j} m^o_{ij}.$$

The expected number of snapshot edges in the null model, denoted by $M^*$, is

$$M^* = \tau \sum_{i<j} a^*_i a^*_j.$$

Figure S3 compares $M$ and $M^*$ for different data sets, with temporal resolution $\Delta = 15$ min (1 day for the Interbank data). The almost perfect agreement between the estimated $M^*$ and the empirical $M$ indicates that the set of random matching probabilities $\{a^*_i a^*_j\}$ accurately explains the total number of contacts, suggesting that the maximum likelihood estimation of the activity vector works well for a wide range of temporal-network data.
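Both totals are straightforward to compute once the counts and the estimated activities are available; the helper names below are hypothetical. Note that the first-order conditions $H_i(\mathbf{a}^*) = 0$ do not force $M^* = M$ exactly, so the comparison in Fig. S3 is a genuine check of the fitted model rather than an identity.

```python
def total_contacts(m):
    """M: total number of empirical snapshot edges, the sum over i < j of m_ij."""
    n = len(m)
    return sum(m[i][j] for i in range(n) for j in range(i + 1, n))


def expected_total_contacts(a, tau):
    """M*: expected number of snapshot edges in the null model, tau * sum_{i<j} a_i a_j."""
    n = len(a)
    return tau * sum(a[i] * a[j] for i in range(n) for j in range(i + 1, n))


# Made-up counts for 5 nodes; in practice a would be the ML estimate a*.
m = [[0, 30, 20, 10, 5], [30, 0, 25, 15, 10], [20, 25, 0, 12, 8], [10, 15, 12, 0, 6], [5, 10, 8, 6, 0]]
M = total_contacts(m)
```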

FIG. 2.
Figure 2 first displays the number of significant ties as a function of the significance level α, for the three

FIG. 4. Backbones and community structure. Visualization of the backbones obtained by different filtering methods, at similar numbers of edges, for the (a) Highschool and (b) Primaryschool data sets. The nodes are shown in the same position for all the backbones of a given data set. "High contact" corresponds to a simple thresholding procedure on the aggregated network. Different colors denote different classes. For the Primaryschool data, black circles represent the teachers. The ST filter detects a larger fraction of intra-class edges than the other filtering methods. (c) AUC of the ROC curve for the identification of intra-community edges. The red solid curve denotes the value of the weighted modularity Q [34, 35] calculated by regarding the actual groups (classes for Primaryschool and Highschool, departments for Workplace, roles for Hospital) as communities on the original aggregate networks. For the Primaryschool data set, each teacher is assigned to the class with which they share the largest number of edges. For the Interbank data set, banks are classified into Italian banks and foreign banks.

FIG. 5. Significant triadic relationships in the Highschool data set, for a temporal resolution ∆ = 1 min. (a) Number of significant triads as a function of α. (b) Fraction of significant triadic relationships with a given number of significant dyads, as a function of α.
FIG. S2. ROC curve for the detection of intra-community edges for the data sets with a clear community structure, for the three filters considered in the main text: Disparity filter (DP), Enhanced Configuration Model (ECM) filter and ST filter. Here ∆ = 15 min.
FIG. S3.Total numbers of snapshot edges in the model and the real data.We set ∆ = 15 min for all data sets except Interbank (for which the temporal resolution is 1 day).
FIG. S1. Comparison between constant and time-varying interaction probabilities. (a) Number of significant ties vs. significance level α. (b) Jaccard index quantifying the overlap between the lists of significant ties detected by the two null models, vs. α. Here ∆ = 15 min.