Universal patterns in egocentric communication networks

Tie strengths in social networks are heterogeneous, with strong and weak ties playing different roles at the network and individual levels. Egocentric networks, the networks of relationships around an individual, exhibit a few strong ties and many weaker ties, as evidenced by electronic communication records. Mobile phone data has also revealed persistent individual differences within this pattern. However, the generality and driving mechanisms of social tie strength heterogeneity remain unclear. Here, we study tie strengths in egocentric networks across multiple datasets of interactions between millions of people over periods of months to years. We find universality in tie strength distributions and their individual-level variation across communication modes, even in channels not reflecting offline social relationships. Via a simple model of egocentric network evolution, we show that the observed universality arises from the competition between cumulative advantage and random choice, two tie reinforcement mechanisms whose balance determines the diversity of tie strengths. Our results provide insight into the driving mechanisms of tie strength heterogeneity in social networks and have implications for the understanding of social network structure and individual behavior.


Introduction
Social networks are key to the exchange of ideas, norms, and other cultural constructs in human society [1], influencing the way we communicate [2], support each other [3,4], and form enduring communities [5]. Decades of research have focused on regularities in the patterns of relations among individuals [6] as well as on the drivers and mechanisms behind their origin [7]. One particularly prominent feature of social networks is the diversity of tie strengths [8], where strong ties are typically embedded within social groups while weak ties are crucial for the cohesiveness of the network as a whole [8][9][10]. At the micro level, ego networks, i.e., the sets of social ties between an individual (the ego) and their family, friends, and acquaintances (the alters), commonly feature a small core of close relationships. These close relationships are associated with high emotional intensity and are surrounded by a larger number of weaker ties. The emergence of this characteristic structural pattern has been associated with constraints on maintaining social relationships, including limited information processing capacity [11], social cognition [12][13][14], and time availability [15][16][17].
Studies of human communication via mobile phones have shown that, in line with the above picture, there is a consistent, general pattern in egocentric networks where a small number of close alters receive a disproportionately large share of communication. Data on the frequency of mobile phone calls and text messages also indicate that within this general pattern, there are clear and persistent individual differences [18][19][20][21][22]: some people repeatedly focus most of their attention on a few close relationships, while others tend to distribute communication among their alters more evenly [18]. These differences are stable in time even under high personal network turnover. However, the mechanisms that generate such heterogeneity of tie strengths and its individual-level variation, as well as the generality of this pattern beyond mobile-phone-mediated communication, have not yet been established [14,22-24].
Here, we explore multiple sets of data on recurring social interactions between millions of people to study heterogeneity in ego network tie strengths and its individual variation, and to shed light on the mechanisms behind this heterogeneity. These large-scale datasets contain metadata on different types of time-stamped interactions, from mobile phone calls to social media, spanning time ranges from months to years. They are likely to reflect different aspects of social behaviour: e.g., mobile-phone calls between friends, work-related emails, and messages on an Internet forum or dating website serve different purposes and may or may not reflect social relationships that also exist offline. Using social networks reconstructed from the interaction records in our data, we measure the distribution of tie strengths in a massive number of egocentric networks, focusing on how this distribution varies between individuals. We compare observations across several datasets representing different channels of communication and use them to construct a minimal, analytically tractable model of egocentric network growth that attributes heterogeneity in tie strengths and its individual variation to the balance between competing mechanisms of tie reinforcement.
We find systematic evidence of broad variation in the distributions of tie strengths in ego networks across all communication channels, including channels that do not necessarily reflect offline social interactions. The majority of ego networks have heterogeneous tie strengths with varying amounts of heterogeneity, while a minority of individuals distribute their contacts homogeneously. With the help of our model of egocentric network evolution, we attribute the amount of heterogeneity to a mechanism of cumulative advantage [25][26][27], similar to proportional growth [28] and preferential attachment [29][30][31][32].
Homogeneity, in turn, is associated with an effectively random choice of alters for communication. The balance between these two mechanisms determines the dispersion of tie strengths in an egocentric network. This balance is captured in our model through a single preferentiality parameter that can be fitted to data for each ego. The distributions of fitted values of this parameter are remarkably similar across different datasets, indicating universal patterns of communication in channels that are very different in nature. Similarly to social signatures [18], we also observe that at the level of individuals, the preferentiality parameter is a stable and persistent indicator of the distinctive way people shape their network on a particular channel.

Results
We analyze data on recurring, time-stamped social interactions between millions of individuals across 16 communication channels, including phone call records, text messages, emails, and posts from social networks and online forums (Fig. 1). Data include, among others, anonymized metadata for 1.3B calls and 613M messages made by 6M people in a European country during 2007 [9,21,33-37], 431k emails by 57k students at Kiel University in 4 months [38,39], and 850k wall posts in Facebook made by 45k users in New Orleans during 2006-2009 [39,40]. Periods of observation vary widely, from 1 month of text message logs for 3 mobile phone companies [41] to 7 years of private messages and open forum discussions in the Swedish movie recommendation website Filmtipset [39,42,43] (for data details see Supplementary Information [SI] Section S1, Table S1, and Fig. S1). The analyzed data cover a wide range of population sizes and time scales of activity, and come from a large enough variety of channels to include typical social contexts of human online communication.

Fig. 1. (a) Real-time contact sequence between an ego and its $k$ alters (left) and time evolution of communication activity $a$ with each alter (right), for a selected ego in the CNS call dataset [44,45] (for data description see SI Section S1). Times are relative to the length of the observation period, so close-by events appear as single lines (left) or sudden increases in $a$ (right). The sequence is divided into two consecutive intervals with the same number of events ($I_1$ and $I_2$). As time goes by, some alters accumulate more events than others. (b) Aggregated ego network (left) and final alter communication activity distribution $p_a$ (right) for the data in (a). The distribution is characterized by a minimum activity $a_0$, mean $t$, and standard deviation $\sigma$. (c) Complementary cumulative distribution $P[a' \geq a]$ of the number of alters with at least activity $a$ (top) and fraction of communication events $f_a$ with the alter at rank $r$ (bottom), for selected egos in the Forum dataset [39,42,43]. Egos distribute activity among alters either homogeneously or heterogeneously. (d) Distribution $p_d$ of the dispersion index $d$ for all egos with more than 5 events in the Forum dataset, showing the systematic presence of both types of egos in (c). (e) Probability $\pi_a$ that an alter with activity $a$ is contacted, averaged over time and over subsets of heterogeneous ($d$ above an arbitrary threshold) or homogeneous ($d$ below it) egos in (d), and the average baseline $\pi_a = 1/k$ when communication events are distributed randomly (each value of $a$ corresponds to at least 50 egos and is normalized by the maximum activity $a_m$ in the subset). For heterogeneous ego networks, the increasing trend indicates cumulative advantage, where alters with high prior activity receive more communication. (f) Complementary cumulative distribution $P[d' \geq d]$ of the number of egos having at least dispersion $d$, for 8.6M egos in 16 datasets of calls, messaging, and online interactions (SI Table S1 and SI Fig. S2; shown only for ego networks with more than 10 events). The data show a broad variation in how egos allocate activity among alters. (g) Relative connection kernel $\pi_a - 1/k$ for all datasets (each value of $a$ corresponds to at least 50 egos with $k \geq 2$; see SI Fig. S3). Increasing trends indicate cumulative advantage in the ego networks of all channels.
The total communication activity $a$ (the number of calls, messages, or posts) between an individual, or ego, and each of the ego's acquaintances, or alters, increases with time (Fig. 1a). Due to variability in the communication patterns with different alters, aggregated ego networks at the end of the observation period typically have heterogeneous tie strengths (numbers of events between ego and alter), manifested as a broad alter activity distribution $p_a$. Following [46], we characterize the spread of $p_a$ by the dispersion index $d = (\sigma^2 - t_r)/(\sigma^2 + t_r)$, where $\sigma^2$ is the variance of $p_a$ and $t_r = t - a_0$ is its mean $t$ relative to the minimum activity $a_0$ in the ego network (Fig. 1b). We find that in our datasets most egos communicate primarily with a few alters, in agreement with previously observed patterns of mobile phone communication [18,47] and online platform use [48]. These egos have networks with heterogeneous tie strengths: broad activity distributions $p_a$ with large dispersion $d$ or, equivalently, steep activity-rank curves ("social signatures" in [18]) where most events are concentrated on the highest-ranking alters [18,49] (Fig. 1c). Because of this equivalence, in the following we use the term social signature interchangeably for both individual activity distributions and activity-rank curves. In addition to egos with heterogeneous tie strengths, all studied communication channels contain a smaller fraction of egos who distribute their communication more homogeneously among alters, leading to smaller values of $d$ and flatter activity-rank curves. Indeed, the distribution $p_d$ of the dispersion indices over an entire dataset shows both over-dispersed egos ($d \sim 1$) and egos with more Poissonian social signatures ($d \sim 0$; Fig. 1d). Even egos with similar degree or strength (total number of alters or events) can have heterogeneous or homogeneous activity distributions, which are thus not solely driven by differences in the total level of activity between individuals.
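The dispersion index can be computed directly from the list of final alter activities of one ego. The sketch below is a hypothetical helper (the name `dispersion_index` is ours, not the authors' code), assuming that $\sigma^2$ is the population variance of the empirical $p_a$ and that $d$ is undefined when all alters have the same activity.

```python
import numpy as np


def dispersion_index(activities):
    """Dispersion index d = (sigma^2 - t_r) / (sigma^2 + t_r) of one ego network.

    activities: number of events between the ego and each of its alters.
    t_r is the mean activity relative to the minimum activity a0.
    """
    a = np.asarray(activities, dtype=float)
    a0 = a.min()
    t_r = a.mean() - a0          # mean relative to the minimum activity
    var = a.var()                # population variance of the empirical p_a
    if var + t_r == 0:           # all alters equally active: d is undefined
        return float("nan")
    return (var - t_r) / (var + t_r)
```

An ego with activities `[1, 1, 1, 1, 20]` yields $d \approx 0.88$ (heterogeneous), while `[5, 5, 6, 6]` yields $d = -1/3$ (under-dispersed); values near 0 correspond to Poissonian signatures.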
In order to find plausible generative mechanisms behind the diversity of social signatures seen in human communication data, we calculate the probability $\pi_a$ that a new contact happens between the ego and an alter with activity $a$, averaged over all events and alters in the aggregated ego network (Fig. 1e). This measure is akin to the attachment kernel of growing networks [50][51][52], which has been identified in many cases as a linear function of the degree [53,54], and which has been applied in preferential attachment models [28-30,55]. When averaged over heterogeneous egos ($d$ above the arbitrary threshold used in Fig. 1e), $\pi_a$ increases roughly linearly with $a$, indicating cumulative advantage or linear growth as the way most individuals interact with their acquaintances. Homogeneous egos ($d$ below that threshold), on the other hand, are closer to the average baseline $\pi_a = 1/k$, where events are allocated among alters uniformly, which can be modelled by random choice. Despite variations in the ratio of heterogeneous to homogeneous activity distributions across channels (signaled by different shapes of the dispersion distribution $p_d$; Fig. 1f and SI Fig. S2), the connection probability $\pi_a$ has qualitatively the same functional form for all datasets, and it even has a similar slope over a wide range of activity values (Fig. 1g and SI Fig. S3).
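One way to estimate the empirical connection kernel from a single ego's time-ordered contact sequence is to count, for each prior activity $a$, how often an alter with that activity was the one contacted, divided by how many alter-event pairs had that activity. This is a minimal sketch under stated assumptions: the helper name `empirical_kernel` is hypothetical, all alters of the aggregated network start from activity 0, and the averaging over egos and normalization by $a_m$ used in Fig. 1e are not reproduced.

```python
from collections import Counter


def empirical_kernel(contacts):
    """Estimate pi_a: probability that an alter with prior activity a is the
    one contacted at the next event, pooled over all events of one ego.

    contacts: time-ordered list of alter IDs the ego interacted with.
    Assumption: every alter of the aggregated ego network starts at activity 0.
    """
    activity = {alter: 0 for alter in set(contacts)}
    exposures = Counter()   # alter-event pairs in which an alter had activity a
    hits = Counter()        # events in which the contacted alter had activity a
    for target in contacts:
        for a in activity.values():
            exposures[a] += 1
        hits[activity[target]] += 1
        activity[target] += 1
    return {a: hits[a] / exposures[a] for a in sorted(exposures)}
```

Under uniformly random alter choice every entry fluctuates around $1/k$, so subtracting $1/k$ gives the relative kernel of Fig. 1g; an increasing trend in $a$ signals cumulative advantage.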
To explore the simplest theoretical mechanisms that may give rise to the observed variability across ego networks, we consider minimal cumulative-advantage dynamics similar to Price's model [26,56], where the probability of communication between an ego and an alter depends on their prior communication activity and on a tunable parameter $\alpha$ that modulates random alter choice (Fig. 2). We start with an undirected ego network of degree $k$ where all alters have initial communication activity $a_0$. After $\tau$ interactions, the probability $\pi_a$ that an alter with activity $a$ interacts with the ego at event time $\tau + 1$ is

$$\pi_a = \frac{a + \alpha}{\tau + \alpha k}. \tag{1}$$

When the parameter $\alpha$ is small, $\pi_a$ increases linearly with activity, so egos interact preferentially with the most active alters, following dynamics similar to stochastic processes driven by cumulative advantage [27,28] and to preferential attachment in the evolution of connectivity [29,32,55] and edge weights [30] in growing networks. For large $\alpha$, the connection probability is flatter and alters are chosen uniformly at random. The parameter $\alpha$ interpolates between heterogeneity and homogeneity in edge weights, even for ego networks with the same mean alter activity $t = \tau/k$ (Fig. 2a; for a detailed model description see Materials and Methods [MM] and SI Section S2). We solve the model analytically via a master equation for $p_a$ in the limit $\tau, k \to \infty$ (see MM and SI Section S2 for the derivation). By introducing the preferentiality parameter $\beta = t_r/\alpha_r$ with $t_r = t - a_0$ and $\alpha_r = \alpha + a_0$, the activity distribution can be written as

$$p_a = \frac{p_0}{a_r B(a_r, \alpha_r)} \left( \frac{\beta}{1 + \beta} \right)^{a_r}, \tag{2}$$

where $a_r = a - a_0$, $p_0 = (1 + \beta)^{-\alpha_r}$ is the probability of the minimum activity $a = a_0$, and $B(a_r, \alpha_r)$ is the Euler beta function. Eq. (2) fits numerical simulations of the model very well, even for relatively low values of $\tau$ and $k$ (Fig. 2b). The preferentiality parameter $\beta$, the ratio between the mean relative alter activity $t_r$ and the shifted parameter $\alpha_r$ that controls random alter choice, reveals a crossover in the behavior of the model, as signaled by the dispersion $d = \beta/(2 + \beta)$ (Fig. 2c). For large $\beta$, dispersion increases (just like in the heterogeneous signatures of Fig. 1) and $p_a$ takes the broad shape of a gamma distribution. When $\beta$ and $d$ are small, the activity distribution approaches a Poisson distribution, which is well approximated by a Gaussian in the limit of large $t_r$ (Fig. 2d).
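Eq. (2) is a negative binomial distribution in $a_r$, so it can be evaluated stably with log-gamma functions, using $1/[a_r B(a_r, \alpha_r)] = \Gamma(a_r + \alpha_r)/[\Gamma(a_r + 1)\Gamma(\alpha_r)]$. The following is a sketch (the function name `activity_pmf` and the illustrative parameter values are ours) that also checks numerically that the mean is $t$, the variance is $t_r(1 + \beta)$, and the dispersion is $\beta/(2 + \beta)$.

```python
import math


def activity_pmf(a, a0, alpha, t):
    """Model activity distribution p_a of Eq. (2), a negative binomial in a_r.

    a0: minimum activity, alpha: random-choice parameter, t: mean alter activity.
    Requires alpha_r = alpha + a0 > 0.
    """
    a_r, alpha_r, t_r = a - a0, alpha + a0, t - a0
    if a_r < 0:
        return 0.0
    if t_r == 0:                              # no growth yet: all alters sit at a0
        return 1.0 if a_r == 0 else 0.0
    beta = t_r / alpha_r                      # preferentiality parameter
    log_p0 = -alpha_r * math.log1p(beta)      # p_0 = (1 + beta)^(-alpha_r)
    # log[1 / (a_r B(a_r, alpha_r))]; this term vanishes for a_r = 0, giving p_a = p_0
    log_norm = math.lgamma(a_r + alpha_r) - math.lgamma(a_r + 1) - math.lgamma(alpha_r)
    return math.exp(log_p0 + log_norm + a_r * math.log(beta / (1.0 + beta)))


if __name__ == "__main__":
    a0, alpha, t = 1, 1.0, 5.0                # illustrative: alpha_r = 2, t_r = 4, beta = 2
    support = range(a0, 400)                  # tail beyond a = 400 is negligible here
    probs = [activity_pmf(a, a0, alpha, t) for a in support]
    mean = sum(a * p for a, p in zip(support, probs))
    var = sum((a - mean) ** 2 * p for a, p in zip(support, probs))
    t_r = mean - a0
    print(sum(probs), mean, var, (var - t_r) / (var + t_r))
```

For these values the script should print numbers very close to 1, 5, 12, and 0.5, the last being $\beta/(2 + \beta)$ with $\beta = 2$.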
Empirical ego networks have broadly distributed degree and minimum/mean alter activities for all communication channels studied (see SI Table S1 and Fig. S1). With $k$, $a_0$, and $t$ fixed by the data, Eq. (2) becomes a single-parameter model, allowing us to derive maximum likelihood estimates for the preferentiality parameter $\beta$ in each ego network (Fig. 3; see MM and SI Section S3 for details on the fitting process). After performing a goodness-of-fit test [57][58][59] with both Kolmogorov-Smirnov and Cramér-von Mises test statistics [60], we obtain $\beta$ estimates for 33-71% of egos in each dataset, amounting to 6.57M individuals over 16 communication channels (SI Tables S2-S3). Values of the preferentiality parameter, which capture the shape of the social signature of an ego, cover a wide region in the $(\alpha_r, t_r)$ space and accumulate around the crossover $\beta = 1$ (Fig. 3a; compare with Fig. 2c; all datasets in SI Fig. S9). By accumulating all alter activities over heterogeneous ($\beta > 1$) and homogeneous ($\beta < 1$) egos (Fig. 3b), we find that both the activity and activity-rank distributions have the same functional form as in Fig. 1c, implying that the crossover value $d = 1/3$ predicted by the model (corresponding to $\beta = 1$) is a more principled estimate of the boundary between regimes than the arbitrary threshold on $d$ used in Fig. 1d-e.

Fig. 3. (a) … given values of $\alpha_r = \alpha + a_0$ and $t_r = t - a_0$ in the Mobile (sms) dataset [9,21,33-37] (data description in SI Section S1; all datasets in SI Fig. S9). Most egos (93%) have a heterogeneous social signature. On the other side of the crossover $\beta = 1$, a few egos (7%) have more homogeneous tie strengths (SI Table S3). (b) Complementary cumulative distribution $P[a' \geq a]$ of the number of alters having at least activity $a$ (top), and fraction of events $f_a$ with the alter at rank $r$ (bottom), aggregated over all egos in the heterogeneous ($\beta > 1$) or homogeneous ($\beta < 1$) regime in the Facebook dataset [39,40]. (c) Complementary cumulative distribution $P[1/\beta' \geq 1/\beta]$ of the rate $1/\beta$, estimated for 6.57M egos in 16 datasets of calls, messaging, and online interactions. All systems show a diversity of social signatures, with 66-99% of egos favouring a few of their alters, and 1-34% communicating homogeneously (SI Table S3 and SI Fig. S8).
The heterogeneity of ego network tie strengths is well captured by the preferentiality parameter $\beta$, a single number that encapsulates how each individual chooses which alters to interact with (cumulative advantage or effectively random choice). Our data and model show that this parameter is broadly distributed (66-99% of ego networks in a dataset have heterogeneous and 1-34% homogeneous signatures; see SI Table S3). Yet, its distribution has a similar functional shape in data representing different communication channels (Fig. 3c). To explore whether $\beta$ and the associated activity distribution $p_a$ are personal characteristics of each ego rather than a product of random variation, we quantify the persistence of $\beta$ by separating the communication activity of an ego into two consecutive intervals [18][19][20][21] with the same number of events (see Fig. 1a) and fitting the model independently to each interval. The difference $\Delta\beta$ in preferentiality between the two intervals, relative to $\beta$ for the whole observation period, is very small for most egos (Fig. 3d). When separating individuals by alter turnover in their ego networks, i.e., by the Jaccard similarity coefficient $J$ between the sets of alters in both intervals, the mean of $\Delta\beta$ remains close to zero even for egos with high network turnover ($J \sim 0$; for details see SI Section S3 and SI Fig. S10). The persistence of the preferentiality parameter, found in all of our datasets regardless of communication channel (Fig. 3e) and irrespective of alter turnover, shows that it indeed captures intrinsic individual differences in social behavior.
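The persistence analysis needs three ingredients: splitting the contact sequence into two intervals with equal numbers of events, the Jaccard similarity $J$ of the alter sets, and the relative change $\Delta\beta$. A hedged sketch follows; all function names are hypothetical, and the definition $\Delta\beta = (\beta_2 - \beta_1)/\beta$ is our reading of "difference relative to $\beta$ for the whole observation period", not a formula stated in the text.

```python
def split_halves(contacts):
    """Split a time-ordered contact sequence (list of alter IDs) into two
    consecutive intervals I1 and I2 with the same number of events
    (for an odd number of events, I2 gets the extra one)."""
    mid = len(contacts) // 2
    return contacts[:mid], contacts[mid:]


def jaccard(first, second):
    """Jaccard similarity J between the sets of alters contacted in two intervals."""
    s1, s2 = set(first), set(second)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 1.0


def relative_change(beta_1, beta_2, beta_full):
    """Assumed definition of Delta beta: change in beta between the two
    intervals, relative to the beta fitted on the whole observation period."""
    return (beta_2 - beta_1) / beta_full
```

In practice, $\beta_1$ and $\beta_2$ would come from fitting Eq. (2) separately to the alter activities accumulated in each interval; egos with $J \sim 0$ have almost completely renewed their set of alters.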

Discussion
Our findings demonstrate that humans tend to build similar-looking personal networks on multiple online communication channels. The analysis of egocentric networks reveals a common heterogeneous pattern, in which a small group of alters receives a disproportionate amount of communication, yet substantial inter-individual variation is observed similarly across all datasets. To capture this pattern and its variation, we have developed a parsimonious and analytically tractable model of ego network evolution, which incorporates a preferentiality parameter specific to each ego. This parameter quantifies the degree of heterogeneity in an ego's personal network, reflecting the balance between two distinct mechanisms of tie reinforcement: cumulative advantage and random choice. Importantly, the distribution of fitted preferentiality parameter values characterizing individual social behavior is consistent across datasets from different channels, pointing to the presence of platform-independent universal patterns of communication.
This universality can be considered both expected and unexpected. In the case of people's "real" social networks, loosely defined as relationships that exist in the offline world, it is not surprising that their structure, characterized by a small number of close relationships, is reflected in online communication as well, such as through mobile phone calls. The cumulative advantage mechanism that drives the dispersion of tie strength can be simply thought to result from people putting more emphasis on their closest relationships. Generally, the heterogeneity of tie strengths in ego networks has been attributed to cognitive, temporal, and other constraints [11][12][13][15][16][17], and different personality traits [61,62] and their relative stability have been proposed as one possible reason for the persistent individual variation in this heterogeneity [20].
However, there is no a priori reason why the ego networks generated from work-related emails, dating website messages, or movie-related online forum discussions should exhibit similarities to those arising from mobile telephone communications. The nature of communication in these different contexts often pertains to a specific purpose and is limited to a subset of the ego's alters [63], who may even only be represented by online aliases. Nevertheless, despite these differences, the overall pattern of heterogeneous tie strengths and the distribution of the preferentiality parameter, which captures inter-individual variability, are remarkably similar across all datasets. This raises questions as to the underlying mechanisms driving these similarities.
One possibility is that our brain is simply wired to consistently shape our social networks in similar ways, independent of the specific medium of communication [13,64]. Alternatively, the reason may lie in the mechanisms of tie strength reinforcement: cumulative advantage may arise, e.g., because we have already participated in an online conversation with someone and it is easier to continue interacting with the same alter. In other words, while the mechanism of cumulative advantage explains ego network tie strengths, it can arise for different reasons: emotional closeness of real relationships or the ease of repeated interactions in online communication with aliases. However, purely observational data such as those analyzed here cannot provide a clear answer, and thus further research is required.
An alternative perspective to consider is one in which all forms of social connections, whether they occur in person or virtually, with actual people or pseudonymous entities, are integral components of an egocentric network that encompasses all relationships of an individual. The various communication media can then be viewed as distinct dimensions that reflect specific facets of this overarching network. Subnetworks associated with each communication channel are then shaped by the ego's channel preferences and may or may not contain the same alters (see, e.g., [63]). It is conceivable that the cognitive and time constraints on personal networks act across the whole set of communication channels. Then, each individual has their own way of allocating their available communication activity on the different channels. The selection of a communication channel is known to affect the capacity to sustain emotionally intense social relationships [65], and it is plausible that channel-specific variations in an ego's preferentiality parameter may reflect their ability (or inability) to manage channel-specific constraints that impact effective social bonding. This offers additional insights into the debate surrounding competing theories such as media richness [66] and communication naturalness [64]. Given that the utilized datasets represent distinct populations, it is yet to be determined whether the preferentiality parameter of each individual displays similar or divergent values across different media. Recent research suggests that the values of the preferentiality parameter are similar at least for calls and text messages [21], but it is not certain if this finding generalizes to other channels.
It is also notable that the value of the preferentiality parameter of each ego appears to be stable in time, even in the face of personal network turnover. This suggests that the parameter may reflect a persistent individual trait that influences the structure of egocentric networks on various channels.
This interpretation raises important questions about the possible links between an ego's preferentiality parameter and their other personal characteristics, such as age, gender, and health. It is well established that the diversity of social relationships can serve as an indicator of increased longevity [4], enhanced cognitive functioning during aging [67], and greater resilience to disease [68].
Variation in the preferentiality parameter within a population may also have important consequences at the network level. Egocentric network tie strengths and their variation are obviously related to the well-established heterogeneous distribution of tie strengths across the broader network (see, e.g., [33]).
Moreover, if an ego's parameter value reflects a personal trait, it may also correlate with their network role. For instance, in social media data, personality traits seem to correlate with the ability of an individual to increase their network size [69], broker new relations between alters [70], and participate in more communities [71]. Thus, a broad distribution of preferentiality parameter values among individuals may manifest as a macro-level network structure that reflects a broad array of roles and positions of individuals within the network. These observations highlight the potential for our findings to contribute to a broader understanding of the underlying mechanisms driving social network formation and individual behaviour.

Model of alter activity
We consider minimal ego network dynamics where individuals allocate interactions via cumulative advantage and a tunable amount of random choice (for details see SI Section S2). At the initial event time $\tau_0 = k a_0$, with $k$ the degree of the ego network, all alters have minimal activity $a_0$. At any time $\tau \geq \tau_0$, the probability that an alter with activity $a$ becomes active at time $\tau + 1$ is

$$\pi_a = \frac{a_r + \alpha_r}{\tau_r + k \alpha_r} = \frac{1}{k} \, \frac{a_r/\alpha_r + 1}{\beta + 1}, \tag{3}$$

with $a_r = a - a_0$, $t_r = t - a_0$, $\tau_r = \tau - \tau_0 = k t_r$, and $t = \tau/k$ the mean alter activity. The preferentiality parameter $\beta = t_r/\alpha_r$ (with $\alpha_r = \alpha + a_0$ and $\alpha$ a tunable parameter) interpolates between two regimes: random alter choice ($\beta \to 0$ and $\pi_a \sim 1/k$) and preferential alter selection ($\beta \to \infty$ and $\pi_a \sim a_r/\tau_r$).
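The dynamics of Eq. (3) can be simulated directly as a Pólya-urn-like process. The sketch below (hypothetical helper `simulate_ego`, plain standard library) draws each new event with weights $a + \alpha$; `random.choices` normalizes the weights, which supplies the denominator $\tau + k\alpha = \tau_r + k\alpha_r$ of the kernel.

```python
import random


def simulate_ego(k, a0, alpha, n_events, rng=None):
    """Simulate the ego network model of Eq. (3): starting from k alters with
    activity a0, each new event goes to an alter with activity a with
    probability (a + alpha) / (tau + alpha * k), where tau is the total activity.

    Requires a0 + alpha > 0 so that all weights are positive.
    Returns the list of final alter activities.
    """
    rng = rng or random.Random()
    activity = [a0] * k
    for _ in range(n_events):
        # random.choices normalizes the weights, which supplies the
        # denominator tau + alpha * k of the connection kernel
        weights = [a + alpha for a in activity]
        chosen = rng.choices(range(k), weights=weights)[0]
        activity[chosen] += 1
    return activity
```

For example, with $k = 200$, $a_0 = 1$, and 2000 events ($t_r = 10$), choosing $\alpha = 0.1$ gives $\beta \approx 9$ and a broad, heterogeneous activity distribution, while $\alpha = 1000$ gives $\beta \approx 0.01$ and a nearly Poissonian one.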
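The model can be treated analytically in the limit $\tau, k \to \infty$ with constant $t$ (SI Section S2). The probability $p_a$ that a randomly chosen alter has activity $a$ follows the master equation

$$(t_r + \alpha_r)\, d_t p_a = (a_r - 1 + \alpha_r)\, p_{a-1} - (a_r + \alpha_r)\, p_a, \tag{4}$$

with initial condition $p_a(a_0) = \delta_{a,a_0}$ and $d_t$ the derivative with respect to $t$. By introducing the probability generating function $g(z, t) = \sum_a p_a z^a$, Eq. (4) reduces to the partial differential equation

$$(t_r + \alpha_r)\, \partial_t g = (z - 1)\left( z\, \partial_z g + \alpha\, g \right), \tag{5}$$

with initial condition $g(z, a_0) = z^{a_0}$. Via the method of characteristics, $g$ takes the explicit form

$$g(z, t) = z^{a_0} \left[ 1 + \beta (1 - z) \right]^{-\alpha_r}, \tag{6}$$

from which we obtain the activity distribution $p_a$ in Eq. (2) iteratively by taking partial derivatives of $g$ with respect to $z$. The distribution $p_a$ has mean $t$ and variance $\sigma^2 = t_r(1 + \beta)$, leading to the dispersion index

$$d = \frac{\sigma^2 - t_r}{\sigma^2 + t_r} = \frac{\beta}{2 + \beta}. \tag{7}$$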
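The mean and variance quoted above follow from Eq. (6) through the standard identities for a probability generating function: with $h = \ln g$ and $g(1, t) = 1$, the mean is $h'(1)$ and the variance is $h''(1) + h'(1)$ (derivatives with respect to $z$). A short sketch of the calculation:

```latex
\begin{aligned}
h(z) &= \ln g(z,t) = a_0 \ln z - \alpha_r \ln\left[1 + \beta(1 - z)\right],\\
h'(z) &= \frac{a_0}{z} + \frac{\alpha_r \beta}{1 + \beta(1 - z)}, \qquad
h''(z) = -\frac{a_0}{z^2} + \frac{\alpha_r \beta^2}{\left[1 + \beta(1 - z)\right]^2},\\
\langle a \rangle &= h'(1) = a_0 + \alpha_r \beta = a_0 + t_r = t,\\
\sigma^2 &= h''(1) + h'(1) = \alpha_r \beta (1 + \beta) = t_r (1 + \beta),\\
d &= \frac{t_r(1 + \beta) - t_r}{t_r(1 + \beta) + t_r} = \frac{\beta}{2 + \beta}.
\end{aligned}
```

At the crossover $\beta = 1$ this gives $d = 1/3$, the boundary between heterogeneous and homogeneous regimes used in the Results.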

Fitting data and model
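We derive maximum likelihood estimates of the model parameter for empirical ego networks with degree $k$, minimum/maximum alter activity $a_0$ and $a_m$, and total/mean alter activity $\tau = \sum_i a_i$ and $t = \tau/k$ (for details see SI Section S3). Assuming that the $k$ alter activities $\{a_i\}$ are independent and identically distributed random variables following $p_a$ in the model, the likelihood $L_\alpha$ that the sample $\{a_i\}$ is generated by Eq. (2) for a given $\alpha$ is

$$L_\alpha = \prod_{i=1}^{k} p_{a_i},$$

which, with $t_r$ fixed by the data, is maximized by the estimate $\hat{\alpha}$ solving

$$\ln\left(1 + \frac{t_r}{\hat{\alpha}_r}\right) = F_{\hat{\alpha}}, \tag{8}$$

where $F_\alpha = \frac{1}{k} \sum_{i=1}^{k} \left[ \psi(a_i - a_0 + \alpha_r) - \psi(\alpha_r) \right]$ and $\psi$ is the digamma function, or, equivalently, by $\beta = e^{F_\alpha} - 1$.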
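Eq. (8) has a single unknown and can be solved by bisection. The sketch below is a hypothetical implementation (`fit_beta`, the search bracket, and the tolerance are our choices, not the authors'). It uses the identity $\psi(x + n) - \psi(x) = \sum_{j=0}^{n-1} 1/(x + j)$ for integer $n$, and relies on the fact that the score is positive for small $\alpha_r$; if it is still non-negative at the upper end of the bracket, the data are not over-dispersed enough for a finite maximum and the sketch returns the random-choice limit $\beta = 0$.

```python
import math


def fit_beta(activities, lo=1e-6, hi=1e5, tol=1e-10):
    """Maximum likelihood estimate of beta = t_r / alpha_r for one ego network,
    assuming integer alter activities drawn i.i.d. from p_a in Eq. (2).

    Solves ln(1 + t_r / alpha_r) = F_alpha with
    F_alpha = (1/k) sum_i [psi(a_i - a0 + alpha_r) - psi(alpha_r)].
    Returns (beta, alpha_r); (0.0, inf) signals the random-choice limit.
    """
    a0 = min(activities)
    k = len(activities)
    a_r = [a - a0 for a in activities]
    t_r = sum(a_r) / k
    if t_r <= 0:
        raise ValueError("need mean alter activity t > a0")

    def F(alpha_r):
        # for integer n, psi(x + n) - psi(x) = sum_{j=0}^{n-1} 1 / (x + j)
        return sum(sum(1.0 / (alpha_r + j) for j in range(n)) for n in a_r) / k

    def score(alpha_r):
        # (1/k) d ln L / d alpha: positive below the root, negative above it
        return F(alpha_r) - math.log1p(t_r / alpha_r)

    if score(hi) >= 0:
        return 0.0, math.inf
    # bisection on log(alpha_r), assuming score(lo) > 0
    x_lo, x_hi = math.log(lo), math.log(hi)
    while x_hi - x_lo > tol:
        mid = 0.5 * (x_lo + x_hi)
        if score(math.exp(mid)) > 0:
            x_lo = mid
        else:
            x_hi = mid
    alpha_r = math.exp(0.5 * (x_lo + x_hi))
    return math.expm1(F(alpha_r)), alpha_r   # beta = e^F - 1 (= t_r / alpha_r at the root)
```

An ego with activities `[1] * 10 + [30, 50]` is strongly heterogeneous and yields $\beta \gg 1$, whereas the under-dispersed `[5, 6] * 10` has no finite maximum and is returned as $\beta = 0$.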
A goodness-of-fit test allows us to quantify how plausible the hypothesis is that the empirical data are drawn from the model activity distribution in Eq. (2) (SI Section S3). We measure goodness of fit via the standard Kolmogorov-Smirnov statistic

$$D = \max_{a_0 \leq a \leq a_m} \left| \Delta P_a(t) \right|, \tag{9}$$

that is, the largest magnitude of the difference $\Delta P_a(t) = P_{\text{data}}[a' \leq a] - P_a(t)$ between the cumulative distribution of alter activity in the data, $P_{\text{data}}[a' \leq a]$, and that of the fitted model, $P_a(t) = \sum_{a'=a_0}^{a} p_{a'}(t)$, across all activities $a \in [a_0, a_m]$. We check the robustness of our results with three other measures from the Cramér-von Mises family of test statistics (for details see SI Section S3).
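Eq. (9) only requires the empirical and model cumulative distributions on the integer range $[a_0, a_m]$. The following sketch (hypothetical helper `ks_statistic`) takes the model distribution as any callable `pmf(a)`, for example Eq. (2) evaluated at the fitted parameter, and accumulates the model CDF from $a_0$ upwards as in the definition of $P_a(t)$.

```python
def ks_statistic(activities, pmf):
    """Kolmogorov-Smirnov distance D = max |P_data[a' <= a] - P_a| over
    a in [a0, am], as in Eq. (9).

    activities: integer alter activities of one ego network.
    pmf: callable a -> model probability p_a; its cumulative sum P_a is
    accumulated from a0 upwards.
    """
    values = sorted(activities)
    k, a0, am = len(values), values[0], values[-1]
    d_max, model_cdf, below = 0.0, 0.0, 0
    for a in range(a0, am + 1):
        model_cdf += pmf(a)
        # advance the empirical CDF: number of alters with activity <= a
        while below < k and values[below] <= a:
            below += 1
        d_max = max(d_max, abs(below / k - model_cdf))
    return d_max
```

For activities `[1, 1, 2, 3]` and a model that is uniform on $\{1, 2, 3\}$, the largest gap occurs at $a = 1$ ($1/2$ versus $1/3$), so $D = 1/6$.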
Given the sample $\{a_i\}$, we compute the estimate $\hat{\alpha}$ numerically from Eq. (8) and the statistic $D$ from Eq. (9), where the model activity distribution follows Eq. (2). From the fitted model we generate $n_{\text{sim}} = 2500$ simulated activity samples $\{a_i\}_{\text{sim}}$. For each simulated sample, we find its own estimate $\hat{\alpha}_{\text{sim}}$ and the corresponding statistic $D_{\text{sim}}$.
Then, the fraction of simulated statistics $D_{\text{sim}}$ larger than the data statistic $D$ is the p-value of the goodness-of-fit test based on $D$. If the p-value is large enough ($p > 0.1$, with 0.1 an arbitrary significance threshold), we do not rule out the hypothesis that our activity model describes the empirical data, and we consider that the ego network has a measurable preferentiality parameter $\beta$. We aim at obtaining large p-values (rather than small), since we want to keep the assumption that the model is a good description of the observed data (rather than reject it). Our goodness-of-fit test shows that 33-71% of all considered ego networks are well described by the model (or up to 42-88% for other test statistics; see SI Table S2).
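The procedure above is a parametric bootstrap. A generic sketch is given below; the name `bootstrap_p_value` and its callable arguments are hypothetical, and the fitting, sampling, and statistic functions (for instance an MLE of Eq. (8), draws from Eq. (2), and the KS distance of Eq. (9)) are left to the caller. The key detail is that each simulated sample is refitted before its statistic is computed.

```python
import random


def bootstrap_p_value(data, fit, sample, statistic, n_sim=2500, rng=None):
    """Parametric-bootstrap p-value of a goodness-of-fit statistic.

    fit(data) -> parameter estimate; sample(theta, n, rng) -> simulated data
    of size n drawn from the fitted model; statistic(data, theta) -> e.g. the
    KS distance D. Each simulated sample is refitted before computing D_sim.
    Returns the fraction of simulations with D_sim > D.
    """
    rng = rng or random.Random()
    theta = fit(data)
    d_data = statistic(data, theta)
    larger = 0
    for _ in range(n_sim):
        simulated = sample(theta, len(data), rng)
        theta_sim = fit(simulated)
        if statistic(simulated, theta_sim) > d_data:
            larger += 1
    return larger / n_sim
```

Following the text, an ego network would be kept as having a measurable $\beta$ when the returned value exceeds 0.1.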

S1 Communication data
We analyze several datasets of social interactions between individuals from a wide range of studies in the temporal networks literature (Table S1 and Fig. S1). Each dataset includes a set of communication events between anonymized individuals $i$ and $j$, ordered by their (hashed) timestamps. For each dataset, we construct temporal ego networks for each individual so that the network for ego $i$ contains all events where $i$ participates. Therefore, each event connecting nodes $i$ and $j$ appears both in the ego network where $j$ is an alter of ego $i$ and in the ego network where $i$ is an alter of ego $j$ (unless explicitly stated otherwise in Section S1.1). Table S1 lists basic properties of all datasets considered, starting with the system size $N_u$ (unfiltered number of egos) and number of events $V$ (all distinct contact events between egos and alters). We only consider egos with some level of heterogeneous alter activity, i.e., with mean alter activity $t$ larger than the minimum across its alters ($t > a_0$), leading to a reduced system size $N$ (filtered number of egos). Table S1 includes several properties of the filtered datasets: average degree $k$ (mean number of alters per ego), average strength $\tau$ (mean number of events per ego), average mean alter activity $t$ (mean number of events per alter per ego), and average minimum/maximum alter activity $a_0$ and $a_m$ (mean of lowest/highest alter activity per ego). We briefly describe below each dataset considered, including references to detailed studies and locations of publicly available data.

Table S1. Datasets used in this study. Characteristics of the available datasets, starting with system size $N_u$ (unfiltered number of egos) and number of events $V$ (all communication events between egos and alters). We only consider egos with mean alter activity larger than its minimum ($t > a_0$), leading to a system of size $N$ (filtered number of egos) with the following properties: average degree $k$ (mean number of alters per ego), average strength $\tau$ (mean number of events per ego), average mean alter activity $t$ (mean number of events per alter per ego), and average minimum/maximum alter activity $a_0$ and $a_m$ (mean of lowest/highest alter activity per ego). We include references to detailed studies of each dataset and locations of publicly available data.

Data is not publicly available, but has been extensively studied in the literature (see, for example, [1][2][3][4][5][6][7]).
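The construction of temporal ego networks and the $t > a_0$ filter can be sketched as follows. This is a minimal sketch under stated assumptions: events are `(timestamp, i, j)` tuples, every event is treated as undirected, self-events are dropped (the text mentions this filtering only for Enron), and the helper names `build_ego_networks`, `alter_activities`, and `keep_ego` are hypothetical.

```python
from collections import Counter, defaultdict


def build_ego_networks(events):
    """events: iterable of (timestamp, i, j) tuples with anonymized IDs.
    Returns {ego: [(timestamp, alter), ...]} sorted by time; every event
    between i and j enters both i's and j's ego network."""
    egos = defaultdict(list)
    for timestamp, i, j in events:
        if i == j:                      # skip self-events (assumed filtering)
            continue
        egos[i].append((timestamp, j))
        egos[j].append((timestamp, i))
    for sequence in egos.values():
        sequence.sort(key=lambda event: event[0])
    return dict(egos)


def alter_activities(sequence):
    """Activity a of each alter: number of events with the ego."""
    return Counter(alter for _, alter in sequence)


def keep_ego(sequence):
    """Filtering rule of Table S1: keep egos with mean alter activity t > a0."""
    acts = list(alter_activities(sequence).values())
    return sum(acts) / len(acts) > min(acts)
```

Egos whose alters all have the same activity (including egos with a single alter) have $t = a_0$ and are discarded, which is why the filtered size $N$ is smaller than $N_u$.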

S1.1 List of datasets
Short messages (Wu 1, 2 & 3). Dataset from a mobile phone operator including three sets of billing records from three companies (denoted 1, 2, and 3) over a 1-month period. Each event comprises a sender mobile phone number, a recipient mobile phone number (both anonymized), and a hashed timestamp with a precision of 1 second [8]. Data is publicly available in the Supplementary Information of [8].
Emails (Enron). Dataset of email communication from the Enron corporation during 1999-2003, which was made public as a result of legal action by the Federal Energy Regulatory Commission in the US. A subset of the corpus including 200,399 messages sent between 158 users was originally studied in 2004 [9]. In 2015 this corpus was corrected and published in raw form [24]. The data we use comes from the Koblenz network collection [10] and corresponds to 1,148,072 emails between 87,273 addresses, both inside and outside Enron. After filtering out events with equal sender and recipient, we obtain the slightly lower values of $N_u$ and $V$ in Table S1. Data is publicly available at http://konect.cc/networks/enron/.
Emails (Kiel). Dataset of log files of the email server at Kiel University, recording the source and destination of every email from or to a student account over a period of 112 days [11]. Data has also been analyzed in terms of temporal greedy walks in [12] (see Section S1.1.1 for data acknowledgments).
Emails (Uni). Dataset of log files of one of the main mail servers at an unnamed university, comprising email messages sent during a period of 83 days and connecting ∼10,000 users [13]. Data was reduced to the internal mail within the institution, leaving a set of 3,188 users exchanging 309,125 messages. The dataset has also been analyzed in terms of temporal greedy walks in [12]. The value of $V$ in [13] differs slightly when calculated directly from the available data (see Table S1).
Online messages (Facebook). Dataset on both friendship relationships and interactions for a large subset of the Facebook New Orleans social network, comprising over 60,000 anonymized users and over 800,000 logged interactions (wall posts) between users in a period of two years [16]. The dataset has also been analyzed in terms of temporal greedy walks in [12]. Data is publicly available at: http://socialnetworks.mpi-sws.org/data-wosn2009.html.
Values of $N_u$ and $V$ in [16] differ when calculated directly from the available data (see Table S1).
Online messages (Messages & Forum). Dataset from the social movie recommendation community Filmtipset (Sweden's largest, available since 2000), consisting of time-stamped communications (contact events) between 36,492 users during 7 years [18]. Available data corresponds to a user-to-user messaging channel where each user can privately send text messages to one other user at a time (Messages), and an open forum where users comment on posts of other users, with as many participants as are willing to join (Forum). The dataset was originally studied in [17], and has also been analyzed in terms of temporal greedy walks in [12] (see Section S1.1.1 for data acknowledgments).
Online messages (Dating). Dataset from pussokram.com, a Swedish online community primarily intended for romantic communication and targeted at adolescents and young adults, consisting of all activity during 512 days from 13 February 2001 to 10 July 2002 among roughly 30,000 users [19]. Timestamped contact events between users follow 4 modes of communication: private intra-community emails, guest book signing, friendship requests ('flirts'), and friendships. Data has also been analyzed in terms of temporal greedy walks in [12] (see Section S1.1.1 for data acknowledgments).
Online messages (College).Dataset of private messages sent on a Facebook-like online social network for students at the University of California, Irvine, from April to October 2004, where users could search the network for others and then initiate conversations based on their profile information [20,21].
Data includes the 1,899 students that sent or received at least one message on the site, comprising 59,835 online messages over 20,296 directed ties between these users.The dataset is hosted by Tore Opsahl at https://toreopsahl.com/datasets/#online_social_network and is also publicly available from the SNAP repository at https://snap.stanford.edu/data/CollegeMsg.html.
Copenhagen Networks Study (CNS call & sms). Dataset of multi-channel, phone-enabled social interactions from the Copenhagen Networks Study (CNS) [22,23]. The original study includes the activity of roughly 1,000 individuals during 2012-2013 via Bluetooth interactions, calls, and messages [22]. Data used here is a selected portion of the full dataset as described in [23]. The selected dataset includes call and short message logs between individuals, with data on timestamps of the call/message, anonymized user IDs, and call duration. We disregard missed calls, making the dataset smaller than the one in [23].
Data is publicly available via figshare in [25].

S1.2 Ego network properties, activity dispersion and connection kernel
In all considered datasets of communication, ego networks have heterogeneous structures and patterns of activity. For each ego network we measure the degree $k$, strength $\tau$, mean alter activity $t$, minimum alter activity $a_0$, and maximum alter activity $a_m$, and then examine how these measures vary across egos.
All properties show broad tails in their corresponding complementary cumulative distribution functions (CCDFs), i.e. the probability $P[X \geq x]$ that an ego has a value of property $X$ at least as large as a given value $x$ (see Fig. S1).
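As a small illustration of how such CCDFs can be computed from per-ego values (for example the list of degrees $k$ of all egos in a dataset), the following sketch returns $P[X \geq x]$ at each distinct observed value. The function name `ccdf` is hypothetical.

```python
def ccdf(values):
    """Empirical CCDF as a list of (x, P[X >= x]) for each distinct x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for idx, x in enumerate(xs):
        if idx == 0 or x != xs[idx - 1]:
            # first occurrence of x in sorted order: n - idx values are >= x
            points.append((x, (n - idx) / n))
    return points
```

For instance, `ccdf([1, 2, 2, 5])` gives `[(1, 1.0), (2, 0.75), (5, 0.25)]`; broad tails show up as a slow decay of the second coordinate on a log-log plot.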
To measure the variability in communication patterns between egos and alters (the heterogeneity of tie strengths in an ego network), we focus on the alter activity distribution $p_a$, the probability that a randomly chosen alter has activity $a$. Following [26], we quantify the spread of $p_a$ via the variance-to-mean ratio $\sigma_r^2/\mu_r$ by defining the dispersion index $d$ for each ego network in a dataset [Eq. (S1)], where $\mu_r = t_r = t - a_0$ is the mean alter activity relative to the minimum $a_0$, and $\sigma_r^2 = \sigma^2$ is the variance of alter activity (unchanged by the shift by $a_0$). The CCDF $P[d' \geq d]$ (fraction of egos having at least dispersion index $d$) varies smoothly with $d$ in all systems, meaning that there are egos with both narrow ($d \sim 0$) and broad ($d \sim 1$) alter activity distributions (Fig. S2).
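The exact form of Eq. (S1) is not reproduced in this text, so the following sketch should be read as an assumption: it uses the normalized variance-to-mean form $d = (\sigma_r^2 - \mu_r)/(\sigma_r^2 + \mu_r)$, which equals 0 when the variance matches the mean (as for a Poisson distribution) and approaches 1 for very broad distributions, consistent with the limits quoted above. The function name is hypothetical.

```python
from statistics import fmean, pvariance


def dispersion_index(activities):
    """Dispersion of one ego's alter activities (assumed normalised form).

    ASSUMPTION: d = (sigma_r^2 - mu_r) / (sigma_r^2 + mu_r); requires t > a0.
    """
    a0 = min(activities)
    rel = [a - a0 for a in activities]  # relative activities a_r = a - a0
    mu_r = fmean(rel)                   # mu_r = t - a0 = t_r
    var_r = pvariance(rel)              # equals the variance of a itself
    return (var_r - mu_r) / (var_r + mu_r)
```

With this normalization, an ego with alter activities `[1, 1, 1, 5]` (relative mean 1, variance 3) has $d = 0.5$, while `[3, 5]` (relative mean 1, variance 1) has $d = 0$. Under-dispersed egos would give slightly negative values.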
We also calculate the connection kernel $\pi_a$, the probability that an alter with current activity $a$ communicates once more with the ego. When averaged over time and over large enough subsets of ego-alter pairs with activity $a$ at some point in time, the average connection kernel $\pi_a$ increases with alter activity (Fig. S3). Apart from low values of $a$, $\pi_a$ is larger than the average baseline $1/k$ (expected if communication events were distributed uniformly at random among alters) and increases roughly linearly or faster than linearly for all datasets, depending on the activity range. This behavior indicates cumulative advantage: alters with high prior activity are more likely to communicate with the ego later on.
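A minimal single-ego estimator of the connection kernel can be sketched as follows, assuming the ego's events are available as a time-ordered list of alter identifiers and that all alters in that list start at activity 0. At each event, every alter currently at activity $a$ counts as one "exposure" at $a$, and the chosen alter contributes one "hit" at its pre-event activity, so $\pi_a \approx$ hits/exposures. The averaging over subsets of at least 50 egos used for Fig. S3 is not reproduced, and the function name is hypothetical.

```python
from collections import Counter


def connection_kernel(alter_sequence):
    """Estimate pi_a for one ego from its time-ordered list of alters.

    pi_a = (events whose alter had activity a just before the event)
           / (sum over events of the number of alters with activity a then).
    """
    activity = Counter()
    k = len(set(alter_sequence))
    n_at = Counter({0: k})      # number of alters currently at each activity
    hits, exposure = Counter(), Counter()
    for alter in alter_sequence:
        for a, n in n_at.items():
            exposure[a] += n
        a = activity[alter]
        hits[a] += 1
        # move the chosen alter from activity a to a + 1
        n_at[a] -= 1
        if n_at[a] == 0:
            del n_at[a]
        n_at[a + 1] += 1
        activity[alter] = a + 1
    return {a: hits[a] / exposure[a] for a in sorted(exposure)}
```

Comparing the returned values with the baseline $1/k$ mirrors the comparison in Fig. S3; a kernel that grows with $a$ signals cumulative advantage.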

S2 Model of alter activity
Consider a social ego network made up of one central individual (the ego) and its $k$ acquaintances (the alters), where a tie between ego and alter represents communication activity between individuals (i.e. calls/messages or online interactions). At a discrete event time $\tau$ (starting from $\tau_0$ up to the length of the observation window), each alter $i = 1, \ldots, k$ has an activity score $a_i(\tau)$ counting the number of times the ego and alter $i$ have communicated until time $\tau$. We take as initial condition $a_i(\tau_0) = a_0$ for all $i$, meaning that all alters have the same initial activity $a_0 \geq 0$, i.e. the minimum activity observed across alters. At each time $\tau$ of the model dynamics, a single alter $i$ communicates with the ego, such that $a_i(\tau + 1) = a_i(\tau) + 1$. Taking $\tau_0 = k a_0$, we ensure that event time is equal to the sum of all communication events in the ego network, i.e. $\tau = \sum_i a_i$ is the total communication activity. Scores are thus bounded by the growing interval
$$a_0 \leq a_i(\tau) \leq \tau - (k - 1)\, a_0.$$
We consider a cumulative-advantage dynamics (similar to Price's model [27][28][29]) tuned by a parameter $\alpha$: the probability $\pi_a(\tau)$ that an alter with previous activity $a_i(\tau) = a$ is active at time $\tau + 1$ is proportional to its past number of communications (shifted by $\alpha$),
$$\pi_a(\tau) = \frac{a + \alpha}{\tau + k\alpha}. \tag{S2}$$
The connection kernel in Eq. (S2) is well defined (at any time $\tau \geq \tau_0$) for any $\alpha$ larger than its minimum value $\alpha_0 = -a_0$, so we can also tune the model by the relative parameter $\alpha_r = \alpha - \alpha_0 = \alpha + a_0 > 0$.
Similarly, we define the relative alter activity $a_r = a - a_0$, relative event time $\tau_r = \tau - \tau_0 = \tau - k a_0$, and the relative mean alter activity $t_r = t - t_0 = t - a_0$ with $t = \tau/k$. Introducing the preferentiality parameter
$$\beta = \frac{t_r}{\alpha_r} \tag{S3}$$
allows us to rewrite Eq. (S2) as
$$\pi_a = \frac{1}{k}\, \frac{1 + \beta a_r / t_r}{1 + \beta}. \tag{S4}$$
As we will see in Section S2.1, the scale $\beta$ (or, alternatively, the rate $\beta^{-1}$) quantifies a crossover between regimes of behavior in alter activity. For $\beta \ll 1$ (i.e. $\alpha \to \infty$ for fixed $t$ and $a_0$), we have $\pi_a = 1/k$ for any $a_r$, and communication events are spread uniformly at random among alters. For $\beta \gg 1$ (i.e. $\alpha \to \alpha_0$ for fixed $t$ and $a_0$), the probability of communication is roughly proportional to activity, $\pi_a \to a_r/\tau_r$. In this way, the preferentiality parameter $\beta$ interpolates between a homogeneous regime where communication in the ego network is uniformly random ($\beta < 1$) and a heterogeneous regime where activity is driven by cumulative advantage ($\beta > 1$), with a crossover at $\beta = 1$ ($\alpha_r = t_r$).
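The model dynamics can be simulated directly. The sketch below assumes the kernel $\pi_a(\tau) = (a + \alpha)/(\tau + k\alpha)$ as the form of Eq. (S2), since it is normalized over alters, defined for $\alpha > -a_0$, and reproduces the two limits of Eq. (S4) discussed above. It also checks that the $\beta$-form of the kernel gives the same number. Function names are hypothetical, and the $O(k)$ cost per event of `random.choices` is acceptable only for small ego networks.

```python
import random


def kernel(a, tau, k, alpha):
    """Assumed form of Eq. (S2): pi_a(tau) = (a + alpha) / (tau + k alpha)."""
    return (a + alpha) / (tau + k * alpha)


def kernel_beta(a, t, k, a0, alpha):
    """Same kernel written with beta = t_r / alpha_r, as in Eq. (S4)."""
    a_r, t_r, alpha_r = a - a0, t - a0, alpha + a0
    beta = t_r / alpha_r
    return (1 + beta * a_r / t_r) / (k * (1 + beta))


def simulate_alter_activity(k, a0, alpha, t, rng=None):
    """Grow an ego network of k alters from a_i = a0 until mean activity t."""
    if alpha <= -a0:
        raise ValueError("alpha must exceed alpha_0 = -a0")
    rng = rng or random.Random()
    acts = [a0] * k
    for _ in range(round(k * (t - a0))):      # tau - tau_0 = k * t_r events
        # weights a_i + alpha; random.choices normalises by their sum,
        # which equals tau + k * alpha
        i = rng.choices(range(k), weights=[a + alpha for a in acts])[0]
        acts[i] += 1
    return acts
```

Running `simulate_alter_activity` with $\alpha$ close to $-a_0$ tends to concentrate events on a few alters, while large $\alpha$ keeps all activities close to $t$, in line with the two regimes of $\beta$.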

S2.1 Master equation for activity dynamics
We treat our model analytically by solving a master equation for the activity dynamics in the limit of large total alter activity $\tau \to \infty$ and large number of alters $k \to \infty$, such that the mean alter activity $t = \tau/k$ is kept constant. We denote by $p_a(\tau)$ the time-dependent probability that an alter chosen uniformly at random has activity $a$ at time $\tau$, i.e. the alter activity distribution. When a new communication event happens at time $\tau + 1$, each of the $k p_a$ alters with activity $a$ is chosen with probability $\pi_a$, in which case the group of alters with activity $a$ loses one member (since the alter's activity increases to $a + 1$). Likewise, each of the $k p_{a-1}$ alters with activity $a - 1$ is chosen with probability $\pi_{a-1}$, in which case the group gains one member (since that alter's activity increases to $a$). The master equation for $p_a$ is
$$p_a(\tau + 1) = p_a(\tau) + \pi_{a-1}(\tau)\, p_{a-1}(\tau) - \pi_a(\tau)\, p_a(\tau), \tag{S5}$$
with initial condition $p_a(\tau_0) = \delta_{a,a_0}$, and $p_a \equiv 0$ for $a < a_0$.
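For finite $k$, Eq. (S5) can simply be iterated. The following sketch does so under the assumption that the kernel is $\pi_a(\tau) = (a + \alpha)/(\tau + k\alpha)$ and that $k a_0$ and $k t$ are integers; the function name is hypothetical. Because every event moves probability mass from $a$ to $a + 1$, normalization is conserved, and the mean grows by exactly $1/k$ per event, so it stays equal to $t = \tau/k$.

```python
def master_equation(k, a0, alpha, t):
    """Iterate Eq. (S5) from tau_0 = k*a0 to tau = k*t and return {a: p_a}.

    Uses the kernel pi_a(tau) = (a + alpha) / (tau + k*alpha) (assumed Eq. (S2)).
    Assumes k*a0 and k*t are integers.
    """
    tau0, tau_end = k * a0, round(k * t)
    p = [1.0]  # p[j] = p_{a0 + j}; initial condition delta_{a, a0}
    for tau in range(tau0, tau_end):
        norm = tau + k * alpha
        # probability flux pi_a * p_a out of each activity class a
        flux = [(a0 + j + alpha) / norm * pj for j, pj in enumerate(p)]
        new = [pj - fj for pj, fj in zip(p, flux)] + [0.0]
        for j, fj in enumerate(flux):
            new[j + 1] += fj  # ... which flows into class a + 1
        p = new
    return {a0 + j: pj for j, pj in enumerate(p)}
```

This deterministic iteration is a convenient reference against which both stochastic simulations and the large-$k$ continuum solution can be compared.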
Taking the limit $\tau, k \to \infty$ (with $dt = 1/k \to 0$) and rescaling time to the fixed mean alter activity $t = \tau\, dt$, and noting that $k\pi_a = (a + \alpha)/(t + \alpha)$ by Eq. (S2), we can rewrite Eq. (S5) as a continuous master equation for the alter activity distribution $p_a(t)$,
$$d_t p_a = \frac{(a - 1 + \alpha)\, p_{a-1} - (a + \alpha)\, p_a}{t + \alpha}, \tag{S6}$$
with $d_t$ the derivative with respect to $t$. The initial time is $t_0 = \tau_0/k = a_0$, so the initial condition of Eq. (S6) is $p_a(t_0) = \delta_{a,a_0}$, with $p_a \equiv 0$ for $a < a_0$. We solve Eq. (S6) within a generating function formalism. We introduce the probability generating function (PGF) $g(z, t)$ associated to $p_a$,
$$g(z, t) = \sum_{a = a_0}^{\infty} p_a(t)\, z^a, \tag{S7}$$
which returns the probability $p_a$ by computing the $a$-th partial derivative with respect to $z$, $p_a(t) = \partial_z^a g(0, t)/a!$. Multiplying Eq. (S6) by $z^a$, summing over $a$, and manipulating dummy indices, we obtain a partial differential equation (PDE) for $g$,
$$(t + \alpha)\, \partial_t g + z(1 - z)\, \partial_z g = \alpha (z - 1)\, g, \tag{S8}$$
with initial condition $g(z, t_0) = z^{a_0}$.
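The linear PDE in Eq. (S8) can be solved with the method of characteristics. By introducing an auxiliary variable $s$, solving Eq. (S8) is equivalent to solving the system of ordinary differential (Lagrange-Charpit) equations for $t \equiv t(s)$, $z \equiv z(s)$ and $g \equiv g(s)$,
$$\frac{dt}{ds} = t + \alpha, \qquad \frac{dz}{ds} = z(1 - z), \qquad \frac{dg}{ds} = \alpha (z - 1)\, g, \tag{S9}$$
with $t(0) = t_0$, $z(0) = z_0$ and $g(0) = z_0^{a_0}$. In particular, the first equation gives $e^s = (t + \alpha)/\alpha_r = 1 + \beta$. Using the solutions of Eq. (S9) to substitute $s$ and $z_0$, we obtain an explicit expression for the PGF,
$$g(z, t) = \frac{z^{a_0}}{(1 + \beta - \beta z)^{\alpha_r}}, \tag{S10}$$
where we use the preferentiality parameter $\beta = t_r/\alpha_r$ (with $\alpha_r = \alpha + a_0$ and $t_r = t - a_0$).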
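A quick numerical sanity check, offered as a sketch: assuming the closed form $g(z, t) = z^{a_0}(1 + \beta - \beta z)^{-\alpha_r}$ with $\beta = (t - a_0)/\alpha_r$, central finite differences should make the residual of the PDE $(t + \alpha)\partial_t g + z(1 - z)\partial_z g - \alpha(z - 1)g$ vanish up to discretization error, and $g$ should satisfy $g(1, t) = 1$ and $g(z, t_0) = z^{a_0}$. Function names are hypothetical.

```python
def pgf(z, t, a0, alpha):
    """Closed-form PGF g(z, t) = z^a0 / (1 + beta - beta z)^alpha_r."""
    alpha_r, t_r = alpha + a0, t - a0
    beta = t_r / alpha_r
    return z ** a0 / (1 + beta - beta * z) ** alpha_r


def pde_residual(z, t, a0, alpha, h=1e-5):
    """Residual of (t + alpha) d_t g + z (1 - z) d_z g - alpha (z - 1) g.

    Derivatives are approximated by central finite differences of step h.
    """
    g = pgf(z, t, a0, alpha)
    g_t = (pgf(z, t + h, a0, alpha) - pgf(z, t - h, a0, alpha)) / (2 * h)
    g_z = (pgf(z + h, t, a0, alpha) - pgf(z - h, t, a0, alpha)) / (2 * h)
    return (t + alpha) * g_t + z * (1 - z) * g_z - alpha * (z - 1) * g
```

The residual stays at the level of finite-difference error (well below $10^{-6}$ for moderate parameters), which is a cheap way to catch sign or index slips when manipulating the PGF.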
Equating terms between Eq. (S7) and the Maclaurin series of Eq. (S10) lets us calculate the alter activity distribution $p_a$ explicitly by calculating partial derivatives of the PGF $g$ with respect to $z$. After some algebra, and by using the preferentiality parameter $\beta$, we obtain
$$p_a(t) = p_0\, \frac{a_r^{-1}}{B(a_r, \alpha_r)} \left(1 + \frac{1}{\beta}\right)^{-a_r} \tag{S11}$$
for $a_r > 0$ ($a > a_0$), with $B$ the beta function and
$$p_0 = p_{a_0}(t) = (1 + \beta)^{-\alpha_r} \tag{S12}$$
the fraction of alters with minimum activity.
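The distribution of Eqs. (S11)-(S12) is easy to evaluate in log space, which avoids overflow of the gamma functions. The sketch below assumes the forms written above (a shifted negative binomial in disguise) and requires $t > a_0$ so that $\beta > 0$; the function name is hypothetical. Summing it numerically reproduces normalization, the mean $t$, and the variance $t_r(1 + \beta)$ quoted below.

```python
from math import exp, lgamma, log1p


def activity_pmf(a, t, a0, alpha):
    """Alter activity distribution p_a(t) of Eqs. (S11)-(S12); needs t > a0."""
    a_r, t_r, alpha_r = a - a0, t - a0, alpha + a0
    beta = t_r / alpha_r
    if a_r < 0:
        return 0.0
    log_p0 = -alpha_r * log1p(beta)           # p_0 = (1 + beta)^(-alpha_r)
    if a_r == 0:
        return exp(log_p0)
    # a_r^{-1} / B(a_r, alpha_r) = Gamma(a_r + alpha_r) / (Gamma(a_r + 1) Gamma(alpha_r))
    log_pow = lgamma(a_r + alpha_r) - lgamma(a_r + 1) - lgamma(alpha_r)
    log_cut = -a_r * log1p(1 / beta)          # (1 + 1/beta)^(-a_r)
    return exp(log_p0 + log_pow + log_cut)
```

For example, with $t = 5$, $a_0 = 1$ and $\alpha = 0.5$ (so $t_r = 4$, $\alpha_r = 1.5$, $\beta = 8/3$), summing over $a = 1, \ldots, 2000$ gives total probability 1, mean 5 and variance $44/3$ to high precision.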
The $n$-th raw moment of $p_a$ can also be computed from Eq. (S10) as $m^{(n)} = (z\partial_z)^n g|_{z=1}$, leading to the mean $\mu = t$ (consistent with the definition of the model) and the variance $\sigma^2 = t_r(1 + \beta)$. Changing variables from $a$ to the relative alter activity $a_r = a - a_0$, we obtain the relative mean $\mu_r = t_r$ and variance $\sigma_r^2 = \sigma^2$ (since variance is location-invariant). This allows us to write the dispersion index $d$ of Eq. (S1) as a function of $\beta$ alone [Eq. (S13)]. Eq. (S11) has an intuitive behaviour as a function of the relative alter activity $a_r$, mean alter activity $t_r$, cumulative-advantage parameter $\alpha_r$, and minimum alter activity $a_0$ (Fig. S4). Even though the derivation of Eq. (S11) assumes $\tau, k \to \infty$ for fixed $t$, its functional form agrees very well with numerical simulations of the dynamical rule in Eq. (S2) for degree as low as $k = 100$ (Fig. S4, upper row), with some disagreement in the tail of the activity distribution for even lower $k = 10$ due to finite-size effects (Fig. S4, lower row). The first factor in Eq. (S11), $p_0$, shows that the fraction of alters with minimum activity decreases as time goes by, with a decay regulated by $\alpha_r$. The second factor, $a_r^{-1}/B(a_r, \alpha_r)$, is roughly a power law for intermediate values of activity $a_r$, with exponent regulated by $\alpha_r$. The third factor is an exponential cutoff for large activity $a_r$ at the scale $\beta$, which moves to the right as time $t_r$ increases. As we will see below, the behaviour of the activity distribution $p_a(t)$ becomes even more apparent by approximating Eq. (S11) in the heterogeneous ($\beta > 1$) and homogeneous ($\beta < 1$) regimes by a gamma or a Poisson distribution, respectively. Since these distributions have their own scaling forms, $\beta$ parametrises a crossover between regimes in terms of the scaling of the activity distribution. For large $\alpha_r$, the decay of the activity distribution $p_a$ is exponential and independent of $\alpha_r$ (Fig. S4, right panel).

In the homogeneous limit $\beta \to 0$ we have $\sigma_r^2 \to t_r$, which consistently recovers a dispersion $d \to 0$ [Eq. (S13)]. The scaling behaviour of the Poissonian activity distribution in Eq. (S16) becomes apparent in the limit $t_r \to \infty$ (i.e. large mean alter activity $t$ for fixed $a_0$). Using Stirling's approximation of the gamma function, $a_r! = \Gamma(a_r + 1) \simeq \sqrt{2\pi a_r}\, e^{-a_r} a_r^{a_r}$, and assuming that $p_a$ only takes significant values close to $a_r = t_r$, the activity distribution approaches a Gaussian distribution with mean $t_r$ and standard deviation $\sqrt{t_r}$,
$$p_a(t) \simeq \frac{1}{\sqrt{2\pi t_r}} \exp\left[-\frac{(a_r - t_r)^2}{2 t_r}\right]. \tag{S17}$$
When plotting $\sqrt{t_r}\, p_a$ vs. $(a_r - t_r)/\sqrt{t_r}$ for varying $t_r$, all curves collapse to the standard Gaussian distribution with mean 0 and standard deviation 1 (Fig. S5, bottom row). The Poisson distribution (and its Gaussian scaling property) is a very good approximation of the activity distribution even for relatively low $t_r$, as long as we are in the homogeneous regime of $\beta < 1$ (for example, Gaussian scaling fails in the bottom center plot of Fig. S5 for $t = 1000$ and $\alpha_r = 100$, since $\beta \approx 10$). Note that the asymptotic Gaussian scaling shape in the homogeneous regime is independent of $\alpha_r$, but it converges slowly as we increase $t_r$.
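The slow convergence to Gaussian scaling can be checked numerically with a short sketch (function names hypothetical): it computes the largest gap between the rescaled Poisson distribution $\sqrt{t_r}\, p_a$ and the standard normal density over $\pm 6$ standard deviations. The gap shrinks roughly like $1/\sqrt{t_r}$, as expected from the Poisson skewness $1/\sqrt{t_r}$.

```python
from math import exp, lgamma, log, pi, sqrt


def poisson_pmf(a_r, t_r):
    """Poisson probability of a_r with mean t_r, computed in log space."""
    return exp(a_r * log(t_r) - t_r - lgamma(a_r + 1))


def max_gaussian_scaling_error(t_r):
    """Largest gap between sqrt(t_r) * p_a and the standard normal density."""
    s = sqrt(t_r)
    err = 0.0
    for a_r in range(max(0, int(t_r - 6 * s)), int(t_r + 6 * s) + 1):
        x = (a_r - t_r) / s                    # rescaled activity
        phi = exp(-x * x / 2) / sqrt(2 * pi)   # standard normal density
        err = max(err, abs(s * poisson_pmf(a_r, t_r) - phi))
    return err
```

Comparing `max_gaussian_scaling_error(100)` with `max_gaussian_scaling_error(10000)` shows roughly a tenfold reduction, illustrating why the collapse in Fig. S5 (bottom row) improves only gradually with $t_r$.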
Overall, the model of alter activity in social ego networks defined by the dynamical rule in Eq. (S2) has two regimes of behavior in $(\alpha_r, t_r)$-space regulated by the preferentiality parameter $\beta = t_r/\alpha_r$ (Fig. S6, top). In the homogeneous regime of $\beta < 1$, the activity distribution $p_a(t)$ is asymptotically Poissonian with Gaussian scaling for increasing $\alpha_r$ and $t_r$, meaning that the ego spreads events homogeneously across its alters, with no strong dependence on the particular value of $\alpha_r$. In the heterogeneous regime […]

[…] Eq. (S2) (Fig. S7). We obtain samples $\{a_i\}$ from numerical simulations of the model for given $k$, $t$, and $a_0$, for several target values $\alpha_r^*$ of the model parameter. Then we plot the average of $F_\alpha - \ln(1 + \beta)$ over all realizations as a function of $\alpha_r$ to graphically locate a root, and we also compute $\hat\alpha_r$ numerically from Eq. (S21), which follows the distribution $P[\hat\alpha_r]$ over all realizations (inset in Fig. S7). The MLE procedure is quite accurate ($\hat\alpha_r \sim \alpha_r^*$) and thus consistent in the heterogeneous and crossover regimes. In the homogeneous regime, however, $\hat\alpha_r$ systematically underestimates the target value $\alpha_r^*$ for large $t$, where the functional form of $F_\alpha - \ln(1 + \beta)$ no longer depends much on $t$. It is instructive to see this MLE bias in light of the scaling property of the activity distribution. In the homogeneous regime, alter activities asymptotically scale like a Gaussian regardless of the value of $\alpha$, so even relatively large errors in estimating $\alpha$ lead to the same scaling form. In the heterogeneous regime, where activities have an $\alpha$-dependent gamma scaling, less bias means we can estimate the scaling form of empirical data more accurately.
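Eq. (S21) is not reproduced in this text, so the following is a sketch under an explicit assumption: setting the derivative of the log-likelihood of Eq. (S11) with respect to $\alpha_r$ to zero, at fixed $t_r$, gives (by our own derivation) the condition $F_\alpha = \ln(1 + \beta)$ with $F_\alpha = k^{-1}\sum_i \sum_{j=0}^{a_{r,i}-1} (\alpha_r + j)^{-1}$, i.e. the average digamma difference $\psi(a_{r,i} + \alpha_r) - \psi(\alpha_r)$. The sketch finds the root by bisection in $\log\alpha_r$ and ignores any truncation at $a_m$; function names and search bounds are hypothetical choices.

```python
from math import inf, log1p
from statistics import fmean


def score(alpha_r, rel, t_r):
    """F_alpha - ln(1 + beta), with F_alpha = mean_i sum_{j < a_r,i} 1/(alpha_r + j)."""
    f = fmean(sum(1.0 / (alpha_r + j) for j in range(a)) for a in rel)
    return f - log1p(t_r / alpha_r)


def mle_alpha_r(activities, lo=1e-6, hi=1e6, iters=200):
    """Root of score(alpha_r) by bisection in log(alpha_r); inf if no finite root."""
    a0 = min(activities)
    rel = [a - a0 for a in activities]  # relative activities a_r,i
    t_r = fmean(rel)
    if t_r == 0:
        raise ValueError("requires t > a0")
    if score(hi, rel, t_r) > 0:
        return inf  # no sign change up to hi: Poisson-like (homogeneous) limit
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if score(mid, rel, t_r) > 0:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

For large $\alpha_r$ the score behaves like $(t_r - \sigma_r^2)/(2\alpha_r^2)$, so a finite root exists only for over-dispersed samples; under-dispersed egos drift to $\hat\alpha_r \to \infty$, which is one way to see why the estimate is poorly constrained in the homogeneous regime.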

S3.2 Goodness-of-fit test
The MLE $\hat\alpha$ given implicitly by Eq. (S21) is the value of $\alpha$ maximizing the likelihood that the activity model of Section S2 produces a given empirical activity distribution. In addition, we need a goodness-of-fit (GOF) test quantifying how plausible the hypothesis is that the empirical data is drawn from the theoretical activity distribution $p_a(t)$ in Eq. (S11). Following [30], we measure goodness of fit by means of the distance between the activity distributions in model and data. (We have previously used this method to gauge the plausibility of several models of rank distributions in sports performance data [31]; for a rigorous criticism of the methods of [30] based on extreme value theory, see [32].) As distance metrics we choose four different test statistics [33]. The first one is the standard Kolmogorov-Smirnov (KS) statistic [34] […]

The GOF test is as follows: Given the sample $\{a_i\}$ from an empirical ego network, we compute the MLE $\hat\alpha$ numerically from Eq. (S21), as well as the associated data statistics $D$, $W^2$, $U^2$, and $A^2$ from Eqs. (S22)-(S25), where the model CDF $P_a(t)$ is computed numerically from $p_a(t)$ in Eq. (S11) with $\alpha = \hat\alpha$ (and $t$, $a_0$, and $a_m$ are taken from the data sample). From the model $p_a(t)$ we generate […]

[…] within ego networks (Fig. S10). To quantify this effect, we separate the observed period of activity of an ego network into two consecutive intervals with the same number of events ($I_1$ and $I_2$, see Fig. 1 in main text). We then independently estimate the preferentiality parameter for both the entire period ($\beta$) and for each of these two intervals ($\beta_1$ and $\beta_2$), leading to a preferentiality change $\Delta\beta = \beta_1 - \beta_2$.
We also measure alter turnover as the Jaccard similarity coefficient $J = |A_1 \cap A_2| / |A_1 \cup A_2|$ between the sets of alters $A_1$ and $A_2$ in both intervals (with $J = 0$ implying totally different alters in $I_1$ and $I_2$, and $J = 1$ exactly the same alters across intervals) [39]. Fig. S10 shows that the relative preferentiality change $\Delta\beta/\beta$ stays close to zero regardless of alter turnover: somewhat trivially for $J \sim 1$ (since alters are anyway the same people when moving from $I_1$ to $I_2$), but remarkably also for $J \sim 0$. In other words, the individual way in which each ego allocates communication activity among alters (driven by cumulative advantage or by random alter choice) persists in time despite potentially large changes in the identity makeup of their social networks.
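The split-half persistence analysis can be sketched for a single ego as follows. To keep the example short it swaps in a method-of-moments estimate of the preferentiality parameter, $\hat\beta = \sigma_r^2/t_r - 1$ (from the model relation $\sigma_r^2 = t_r(1 + \beta)$), instead of the maximum-likelihood estimate used in the text; the halves are split by event count, and function names are hypothetical.

```python
from collections import Counter
from math import nan
from statistics import fmean, pvariance


def beta_moments(activities):
    """Moment estimate of beta from sigma_r^2 = t_r (1 + beta); nan if t_r = 0."""
    a0 = min(activities)
    rel = [a - a0 for a in activities]
    t_r = fmean(rel)
    return pvariance(rel) / t_r - 1 if t_r > 0 else nan


def persistence(alter_sequence):
    """Alter turnover J and relative change (beta1 - beta2) / beta for one ego.

    alter_sequence: time-ordered alters of the ego's events, split into two
    consecutive halves with (almost) the same number of events.
    """
    half = len(alter_sequence) // 2
    I1, I2 = alter_sequence[:half], alter_sequence[half:]
    A1, A2 = set(I1), set(I2)
    J = len(A1 & A2) / len(A1 | A2)           # Jaccard similarity of alter sets
    beta = beta_moments(list(Counter(alter_sequence).values()))
    beta1 = beta_moments(list(Counter(I1).values()))
    beta2 = beta_moments(list(Counter(I2).values()))
    change = (beta1 - beta2) / beta if beta else nan
    return J, change
```

The moment estimator can become negative for under-dispersed egos and is noisier than the MLE, so it is only meant to illustrate the bookkeeping behind Fig. S10.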

Figure 1. Tie strengths in egocentric networks are heterogeneous and driven by cumulative advantage. (a) […]

Figure 2. Simple model of alter activity shows crossover in shape of social signatures. (a) In a modeled ego network of degree k, alters begin with activity a0 and engage in new communication events at event time τ with probability πa, where a is the alter's current activity and α a parameter interpolating behavior between cumulative advantage (α → −a0, top) and random choice (α → ∞, bottom; see MM and SI Section S2). These dynamics lead to an ego network with mean alter activity (i.e. time) t = τ/k. Plots and networks on the right are shown diagrammatically but correspond to k = 5, a0 = 1, α = −0.9 (10^3), and t = 3 (10^3) at the top (bottom). (b) Probability pa that an alter has activity a at time t, for varying t with α = −0.7 (9) at the top (bottom), k = 100 and a0 = 1. Numerical simulations (num) match well with analytical calculations (theo), indicating that cumulative advantage and random choice lead to broad and narrow activity distributions, respectively. (c) Phase diagram of activity dispersion d in terms of rescaled parameters αr = α + a0 and tr = t − a0. The preferentiality parameter β = tr/αr showcases a crossover between heterogeneous and homogeneous regimes at β = 1 (dashed line). The vertical gray dash-dotted lines are parameter values for plot (d). (d) Rescaled activity distribution pa for varying t and αr = 0.3 (10^3) at the top (bottom). Heterogeneous (homogeneous) regimes show gamma (Gaussian) scaling in pa. All simulations are averages over 10^4 realizations.

Figure 3. Model reveals diversity and persistence of social signatures. (a) Heat map of the number $N_{\alpha,t}$ of egos with […] (d) Number $N_{J,\Delta\beta}$ of egos with given alter turnover $J$ and relative preferentiality change $\Delta\beta/\beta$ when estimating $\beta$ in two consecutive intervals of activity ($I_1$ and $I_2$, see Fig. 1 and SI Section S3), calculated for egos in the Mobile (call) dataset (all channels in SI Fig. S10). We also show marginal number distributions of turnover ($N_J$) and relative preferentiality change ($N_{\Delta\beta}$). Social signatures are persistent in time at the level of individuals, regardless of alter turnover. (e) Distribution $p_{\Delta\beta}$ of relative preferentiality change for all studied datasets. Persistence of social signatures is systematic across communication channels.
Facebook dataset: Facebook links were crawled between December 29th, 2008 and January 3rd, 2009, starting from a single user and visiting friends with a breadth-first-search algorithm. Wall posts were then crawled between January 20th, 2009 and January 22nd, 2009 for all previously detected users. Wall post data spans from September 26th, 2006 to January 22nd, 2009.

Figure S1. Basic properties of communication datasets. Complementary cumulative distribution functions (CCDFs) $P[X \geq x]$ for several properties $X$ of ego networks in each dataset: degree k (number of alters of an ego), strength τ (number of events involving an ego), mean alter activity t (average number of events per alter of an ego), minimum activity a0 (minimum number of events with the same alter), and maximum activity am (maximum number of events with the same alter). All properties are heterogeneously distributed across egos and alters, with some differences between datasets.

Figure S2. Dispersion in communication activity. Complementary cumulative distribution function (CCDF) $P[d' \geq d]$, i.e. the fraction of egos having at least dispersion index d, in all considered datasets, calculated only for ego networks with more than 10 events (τ > 10). The average dispersion is displayed as a dashed line. Communication channels show broad variation in how egos allocate activity among alters.

Figure S3.Connection kernel in communication.Probability πa that an alter with current activity a communicates once more with the ego, averaged over time and subsets of at least 50 egos with degree k ≥ 2 for each a value, shown here for all considered datasets.The dashed line corresponds to the average baseline πa = 1/k when communication events are distributed randomly.The growth of the connection kernel πa with activity indicates cumulative advantage, where alters with high prior activity receive more communication.

Figure S4. Simple model of alter activity. Probability pa(t) of a randomly selected alter having activity a at time t, as a function of a = ar + a0 for varying t = tr + a0 and varying αr = α + a0, for fixed a0 = 1 and k = 10, 100 (bottom/top rows), in both numerical simulations of the model [Eq. (S2); dots] and its analytical solution [Eq. (S11); lines]. For given t and a0, the time evolution of pa [as defined by Eq. (S6)] reaches an αr-dependent asymptotic shape. (right) When αr → ∞ (α → ∞), pa converges to a Poisson distribution with mean and variance tr [Eq. (S16)]. (left) When αr → 0 (α → −a0), pa approaches a gamma distribution with shape αr and scale β = tr/αr [Eq. (S14)]. (middle) When αr = 1, the exponent of the power-law decay in the gamma distribution is αr − 1 = 0, so the activity distribution has a plateau of a values with relatively constant pa that grows with t. Eq. (S11) approximates numerical simulations very well, but fails at the tail for sufficiently low k. Simulations are averaged over 10^4 realizations.

S2.1.1 Heterogeneous regime (β > 1): Alter activity is gamma-distributed
We explore the limit $\alpha_r \to 0$ ($\alpha \to -a_0$ for fixed $a_0$) by considering a large activity $a_r \gg 0$ ($a \gg a_0$, i.e. the tail of the activity distribution) for small but fixed $\alpha_r$. Then, the beta function behaves as $B(a_r, \alpha_r) \simeq \Gamma(\alpha_r)\, a_r^{-\alpha_r}$ for given $\alpha_r$. The condition $\beta > 1$ leads to the approximations $(1 + \beta)^{-\alpha_r} \simeq \beta^{-\alpha_r}$ and $(1 + 1/\beta)^{-a_r} \simeq e^{-a_r/\beta}$ (from a first-order Taylor expansion of $\ln(1 + 1/\beta)$). Inserting into Eq. (S11) we obtain
$$p_a(t) = \frac{1}{\beta^{\alpha_r}\, \Gamma(\alpha_r)}\, a_r^{\alpha_r - 1} e^{-a_r/\beta}, \qquad \alpha_r \to 0, \tag{S14}$$
a gamma distribution with shape $\alpha_r$ and scale $\beta$. Then, the relative alter activity $a_r$ has mean $t_r$ and variance $t_r \beta$. Consistently, $\sigma_r^2 \to \infty$ as $\beta \to \infty$, implying a dispersion $d \to 1$ [see Eq. (S13)]. In the heterogeneous regime, where alters communicate with the ego with probability proportional to their previous activity, the activity distribution $p_a(t)$ has power-law behaviour with exponent $\alpha_r - 1$ and an exponential cutoff regulated by the scale $\beta$ (see, e.g., Fig. S4, left). The moment-generating function of the gamma distribution shows that Eq. (S14) has exponential scaling. Plugging the rescaled activity $\tilde a_r = a_r/\beta$ into Eq. (S14) leads to
$$\beta\, p_a(t) = \frac{1}{\Gamma(\alpha_r)}\, \tilde a_r^{\alpha_r - 1} e^{-\tilde a_r}, \tag{S15}$$
the standard gamma distribution (with shape $\alpha_r$ and scale 1). In a plot of $\beta p_a$ vs. $a_r/\beta$ for varying $t_r$ and fixed $\alpha_r$, all curves collapse to the standard form of Eq. (S15) (Fig. S5, top row). The gamma distribution (and its scaling property) is a very good approximation of the activity distribution even for relatively low activity $a_r$, as long as we are in the heterogeneous regime of $\beta > 1$ (for example, gamma scaling fails in the top right plot of Fig. S5 for $t = 2$ and $\alpha_r = 10$, since $\beta = 0.1$). The gamma scaling shape in the heterogeneous regime depends on $\alpha_r$, but remains a good approximation of the activity distribution even at the crossover $\beta = 1$.

S2.1.2 Homogeneous regime (β < 1): Alter activity is Poisson-distributed
In the limit $\alpha_r \to \infty$ ($\alpha \to \infty$ for fixed $a_0$), where alters communicate with the ego uniformly at random, the beta function behaves as $B(a_r, \alpha_r) \simeq \Gamma(a_r)\, \alpha_r^{-a_r}$ for given $a_r$. For $\beta < 1$, we can approximate $(1 + \beta)^{-\alpha_r} \simeq e^{-t_r}$ (from a first-order Taylor expansion of $\ln(1 + \beta)$, using $\alpha_r \beta = t_r$) and $(1 + 1/\beta)^{-a_r} \simeq \beta^{a_r}$. Then, the activity distribution converges to a Poisson distribution with mean and variance $t_r$,
$$p_a(t) = \frac{t_r^{a_r}\, e^{-t_r}}{a_r!}, \qquad \alpha_r \to \infty. \tag{S16}$$
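The quality of the two limiting forms can be probed pointwise with a short sketch that evaluates the exact distribution of Eqs. (S11)-(S12) (in terms of $a_r$), the gamma form of Eq. (S14), and the Poisson form of Eq. (S16). Function names and the chosen parameter values are illustrative only.

```python
from math import exp, lgamma, log, log1p


def exact_pmf(a_r, t_r, alpha_r):
    """Eqs. (S11)-(S12) written in terms of the relative activity a_r >= 0."""
    beta = t_r / alpha_r
    return exp(lgamma(a_r + alpha_r) - lgamma(a_r + 1) - lgamma(alpha_r)
               - alpha_r * log1p(beta) - a_r * log1p(1 / beta))


def gamma_approx(a_r, t_r, alpha_r):
    """Eq. (S14): gamma density with shape alpha_r and scale beta (a_r > 0)."""
    beta = t_r / alpha_r
    return exp((alpha_r - 1) * log(a_r) - a_r / beta
               - alpha_r * log(beta) - lgamma(alpha_r))


def poisson_approx(a_r, t_r):
    """Eq. (S16): Poisson distribution with mean and variance t_r."""
    return exp(a_r * log(t_r) - t_r - lgamma(a_r + 1))


def rel_err(approx, exact):
    return abs(approx - exact) / exact
```

With $\alpha_r = 0.3$ and $t_r = 300$ ($\beta = 1000$), the gamma form matches the exact value at $a_r = 100$ to better than 1% while the Poisson form is off by essentially 100%; with $\alpha_r = 10^4$ and $t_r = 10$ ($\beta = 10^{-3}$), the Poisson form matches at $a_r = 10$ to within about 0.1%.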

Figure S5. Crossover in scaling of alter activity. Probability pa(t) of a randomly selected alter having activity a at time t, as a function of ar = a − a0 for varying t = tr + a0 and varying αr = α + a0, for fixed a0 = 1 and k = 100, in both numerical simulations of the model [Eq. (S2); dots] and its analytical solution [Eq. (S11); lines]. (top) By plotting βpa vs. ar/β for varying tr and fixed αr, curves collapse to the standard gamma distribution [Eq. (S15); dashed lines]. This αr-dependent gamma scaling is valid in the heterogeneous regime β > 1, with β = tr/αr the scale parameter of the gamma distribution in Eq. (S14). Though only asymptotically correct, gamma scaling is a good approximation even at the crossover β = 1. (bottom) By plotting √tr pa vs. (ar − tr)/√tr for varying tr, curves collapse to the standard Gaussian distribution [Eq. (S17); dashed lines]. This Gaussian scaling is valid in the homogeneous regime β < 1 and becomes asymptotically more accurate with tr. Simulations are averaged over 10^4 realizations.

Figure S6. Scaling regimes in alter activity. (top) Phase diagram in (αr, tr)-space, showcasing the scaling regimes in alter activity of the model defined by Eq. (S2). (bottom) Probability pa(t) of a randomly selected alter having activity a at time t, as a function of ar = a − a0 for varying tr = t − a0 and varying αr = α + a0 such that β = tr/αr is constant, for fixed a0 = 1 and k = 100, in both numerical simulations of the model [Eq. (S2); dots] and the gamma [Eq. (S14)] and Poisson [Eq. (S16)] approximations. When β < 1 (right), activity is homogeneously distributed across alters and pa(t) is Poissonian with asymptotic Gaussian scaling. When β > 1 (left), pa(t) has gamma scaling for varying tr and fixed αr. For αr < 1 a few alters accumulate most activity, and for αr > 1 alter activity is more homogeneously distributed. Regimes are separated by a crossover at β = 1 where both gamma and Gaussian scaling forms fail slightly.

Figure S7. Consistency of maximum likelihood estimation. Numerical simulations of alter activity [according to Eq. (S2)] for several target values $\alpha_r^*$ of the model parameter and varying t = tr + a0, with a0 = 1 and k = 10^3. The quantity Fα − ln(1 + β) as a function of αr (an average over 10^3 simulations) has a root $\hat\alpha_r$ somewhere close to $\alpha_r^*$, in accordance with Eq. (S21). Insets show the kernel density estimate $P[\hat\alpha_r]$ of the MLE $\hat\alpha_r$ [as computed numerically from Eq. (S21)] over all simulations, which is centered around $\alpha_r^*$ for most parameter values. The MLE procedure recovers the target value $\alpha_r^*$ and is thus consistent, apart from a systematic underestimation for large t in the homogeneous regime.

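The maximum in the KS statistic $D$ is taken across all activities $a \in [a_0, a_m]$, where $a_0$ and $a_m$ are the minimum and maximum alter activities in the empirical ego network, respectively, and the model CDF is $P_a(t) = \sum_{a'=a_0}^{a} p_{a'}(t)$. The other three statistics belong to the Cramér-von Mises family of test statistics [35][36][37][38]: the Cramér-von Mises ($W^2$) statistic,
$$W^2 = \sum_{a=a_0}^{a_m} \Delta P_a^2\, p_a, \tag{S23}$$
the Watson ($U^2$) statistic,
$$U^2 = \sum_{a=a_0}^{a_m} [\Delta P_a - \overline{\Delta P}]^2\, p_a, \tag{S24}$$
and the Anderson-Darling ($A^2$) statistic,
$$A^2 = \sum_{a=a_0}^{a_m} \frac{\Delta P_a^2\, p_a}{P_a (1 - P_a)}, \tag{S25}$$
where $\Delta P_a^2$ and $\overline{\Delta P} = \sum_a \Delta P_a\, p_a$ are, respectively, the square and average of the CDF difference between model and data.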
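The four distances can be computed together in one pass over the observed activity range. The sketch below assumes the discrete forms above, weighted by the model probabilities $p_a$, with the model CDF accumulated from $a_0$ without renormalizing on $[a_0, a_m]$ and without sample-size prefactors. The procedure for generating synthetic samples is not reproduced in this text, so the `p_value` helper only encodes the common parametric-bootstrap rule (fraction of synthetic statistics at least as large as the empirical one), which is our assumption. Function names are hypothetical.

```python
from collections import Counter


def gof_statistics(sample, pmf, a0, am):
    """KS (D), Cramer-von Mises (W2), Watson (U2) and Anderson-Darling (A2).

    sample: empirical alter activities; pmf(a): model probability p_a(t).
    The model CDF is P_a = sum_{a'=a0}^{a} p_a' (no renormalisation on [a0, am]).
    """
    n = len(sample)
    counts = Counter(sample)
    p, dP, P_model = [], [], []
    cdf_data = cdf_model = 0.0
    for a in range(a0, am + 1):
        pa = pmf(a)
        cdf_data += counts[a] / n
        cdf_model += pa
        p.append(pa)
        P_model.append(cdf_model)
        dP.append(cdf_data - cdf_model)   # Delta P_a = P_data - P_model
    D = max(abs(x) for x in dP)
    W2 = sum(x * x * pa for x, pa in zip(dP, p))
    mean_dP = sum(x * pa for x, pa in zip(dP, p))
    U2 = sum((x - mean_dP) ** 2 * pa for x, pa in zip(dP, p))
    # skip terms where P_a(1 - P_a) vanishes
    A2 = sum(x * x * pa / (P * (1 - P))
             for x, pa, P in zip(dP, p, P_model) if 0 < P < 1)
    return D, W2, U2, A2


def p_value(observed, synthetic):
    """Fraction of synthetic statistics at least as large as the observed one."""
    return sum(s >= observed for s in synthetic) / len(synthetic)
```

A sample that follows the model CDF exactly yields zero for all four statistics, while a sample concentrated at $a_0$ under a uniform model gives $D = 0.75$ on four activity values.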

Figure S10. Persistence of preferentiality in communication data. Hexbin histogram across datasets of the number $N_{J,\Delta\beta}$ of egos with given alter turnover J and relative preferentiality change Δβ/β. We estimate the preferentiality parameter in the whole observation period (β) as well as in two consecutive intervals of activity spanning the period (β1 and β2, respectively, with Δβ = β1 − β2). We also show marginal number distributions of turnover ($N_J$) and relative preferentiality change ($N_{\Delta\beta}$). Social signatures are persistent in time at the level of individuals, regardless of alter turnover.

Table S2. Statistical significance of maximum likelihood estimation. Fraction $n_\bullet$ of ego networks satisfying the condition $p_\bullet > 0.1$ on the p-value $p_\bullet$ associated with the test statistics of Kolmogorov-Smirnov, Cramér-von Mises, Watson, and Anderson-Darling [$\bullet = D, W^2, U^2, A^2$, respectively; see Eqs. (S22)-(S25)]. Fractions $n_\bullet$ are calculated relative to the number $N$ of egos in each dataset under the condition $t > a_0$ (i.e. with any level of heterogeneity in their communication signatures). The model is able to reproduce observed data for most egos, at least according to some statistic. For large datasets, statistical significance is robust to the choice of statistic.

The KS statistic is
$$D = \max_a |\Delta P_a(t)|, \tag{S22}$$
that is, the largest magnitude of the difference $\Delta P_a(t) = P_{\rm data}[a_i \leq a] - P_a(t)$ between the cumulative distribution function (CDF) in data, $P_{\rm data}[a_i \leq a]$, and the CDF of the fitted model, $P_a(t) = \sum_{a'=a_0}^{a} p_{a'}(t)$.