Introduction

Social networks are key to the exchange of ideas, norms, and other cultural constructs in human society1, influencing the way we communicate2, support each other3,4, and form enduring communities5. Decades of research have focused on regularities in the patterns of relations among individuals6 as well as the drivers and mechanisms behind their origin7. One particularly prominent feature of social networks is the diversity of tie strengths8, where strong ties are typically embedded within social groups while weak ties are crucial for the cohesiveness of the network as a whole8,9,10. At the micro level, ego networks—the sets of social ties between an individual (the ego) and their family, friends, and acquaintances (the alters)—commonly feature a small core of close relationships. These close relationships are associated with high emotional intensity and they are surrounded by a larger number of weaker ties. The emergence of this characteristic structural pattern has been associated with constraints on maintaining social relationships, which include limited information processing capacity11, social cognition12,13,14, and time availability15,16,17.

Studies of human communication via mobile phones have shown that, in line with the above picture, there is a consistent, general pattern in egocentric networks where a small number of close alters receive a disproportionately large share of communication. Data on the frequency of mobile phone calls and text messages also indicate that within this general pattern, there are clear and persistent individual differences18,19,20,21,22: some people repeatedly focus most of their attention on a few close relationships, while others tend to distribute communication among their alters more evenly18. These differences are stable in time even under high personal network turnover. However, the mechanisms that generate such heterogeneity of tie strengths and its individual-level variation, as well as the generality of this pattern beyond mobile-phone-mediated communication, have not yet been established14,22,23,24.

Here, we explore multiple sets of data on recurring social interactions between millions of people to study heterogeneity in ego network tie strengths and its individual variation, and to shed light on the mechanisms behind this heterogeneity. These large-scale datasets contain metadata on different types of time-stamped interactions, from mobile phone calls to social media, spanning a time range from months to years. They are likely to reflect different aspects of social behavior: e.g., mobile-phone calls between friends, work-related emails, and messages on an Internet forum or dating website serve different purposes and may or may not reflect social relationships that also exist offline. Using social networks reconstructed from the interaction records in our data, we measure the distribution of tie strengths in a massive number of egocentric networks, focusing on how this distribution varies between individuals. We compare observations across several datasets representing different channels of communication and use our observations to construct a minimal, analytically tractable model of egocentric network growth that attributes heterogeneity in tie strengths and its individual variation to the balance between competing mechanisms of tie reinforcement.

We find systematic evidence of broad variation in the distributions of tie strengths in ego networks across all communication channels, including those channels that do not necessarily reflect offline social interactions. The majority of ego networks have heterogeneous tie strengths with varying amounts of heterogeneity, while a minority of individuals distribute their communication homogeneously. With the help of our model of egocentric network evolution, we attribute the amount of heterogeneity to a mechanism of cumulative advantage25,26,27, similar to proportional growth28 and preferential attachment29,30,31,32. Homogeneity, in turn, is associated with effectively random choice of alters for communication. The balance between these two mechanisms determines the dispersion of tie strengths in an egocentric network. This balance is captured in our model through a single preferentiality parameter that can be fitted to data for each ego. The distributions of fitted values of this parameter are remarkably similar across different datasets, indicating universal patterns of communication in channels that are very different in nature. Similarly to social signatures18, we also observe that at the level of individuals, the preferentiality parameter is a stable and persistent indicator of the distinctive way people shape their networks on a particular channel.

Results

We analyze data on recurring, time-stamped social interactions between millions of individuals across 16 communication channels, including phone call records, text messages, emails, and posts from social networks and online forums (Fig. 1). Data include, among others, anonymized metadata for 1.3B calls and 613M messages made by 6M people in a European country during 2007 (refs. 9,21,33–37), 431k emails by 57k students at Kiel University in 4 months38,39, and 850k wall posts on Facebook made by 45k users in New Orleans during 2006–2009 (refs. 39,40). Periods of observation vary widely, from 1 month of text message logs for 3 mobile phone companies41 to 7 years of private messages and open forum discussions on the Swedish movie recommendation website Filmtipset39,42,43 (for data details see Supplementary Information [SI] Section S1, Table S1, and Fig. S1). The analyzed data cover a wide range of population sizes and activity time scales, and come from a large enough variety of channels to include typical social contexts of human online communication.

Fig. 1: Tie strengths are heterogeneous and driven by cumulative advantage.

a Real-time contact sequence between an ego and its k alters (left) and timeline of communication activity a (right), for a selected ego in the CNS call dataset75,76 (data description in SI Section S1). Times are relative to the observation length, so close-by events appear as single lines (left) or sudden increases in a (right). The sequence is divided into two consecutive intervals with the same number of events (I1 and I2). With time, some alters communicate more than others. b Aggregated ego network (left) and alter activity distribution pa (right) for (a). The distribution has minimum activity a0, mean t, and standard deviation σ. c Complementary cumulative distribution function (CCDF) \(P[{a}^{{\prime} }\ge a]\) of the number of alters with at least activity a, for egos in each quartile range of the dispersion distribution pd and k ≥ 10, in the Mobile (call) dataset9,21,33,34,35,36,37 (all systems in SI Fig. S3). For larger dispersions, egos communicate with alters heterogeneously. d Dispersion distribution pd for data in (c), showing a majority of heterogeneous egos (all channels in SI Fig. S2). e Relative probability πa − 〈1/k〉 that an alter with activity a is contacted, averaged over time and egos in each quartile range of the dispersion distribution pd in (d) (all systems in SI Fig. S6). The baseline πa = 〈1/k〉 means alters are contacted at random (each a value corresponds to at least 30 egos and is normalized by the maximum activity am in the ego subset). For heterogeneous egos, the increasing tendency indicates cumulative advantage: alters with high prior activity receive more events. f CCDF \(P[{d}^{{\prime} }\ge d]\) of the number of egos having at least dispersion d, for 8.6M egos in 16 communication channels (SI Table S1 and SI Fig. S2; shown only for egos with more than 10 events). g Relative connection kernel πa − 〈1/k〉 for all datasets (each a value corresponds to at least 50 egos with k ≥ 2; see SI Figs. S4–S6).
Increasing trends indicate cumulative advantage in all channels.

Tie strengths are heterogeneous and driven by cumulative advantage

The total communication activity a (the number of calls, messages, or posts) between an individual, or ego, and each of the ego’s acquaintances, or alters, increases with time (Fig. 1a). Due to variability in communication patterns with different alters, aggregated ego networks typically have heterogeneous activities (or, equivalently, tie strengths). This heterogeneity leads to a broad alter activity distribution pa, defined as the probability that a randomly chosen alter has activity a at the end of the observation period. Following44, we characterize the spread of pa by the dispersion index d = (σ² − tr)/(σ² + tr), where σ² is the variance of pa and tr = t − a0 is its mean t relative to the minimum activity a0 in the ego network (Fig. 1b). We find that in our datasets most egos primarily communicate with a few alters, in agreement with previously observed patterns of mobile phone communication18,45 and online platform use46. These egos have networks with heterogeneous tie strengths, that is, broad activity distributions pa with large dispersion d, where most events are concentrated on the most communicative alters18,47 (Fig. 1c and SI Fig. S3). Note that in the following, because of their equivalence, we use the term social signature interchangeably for both individual activity distributions and the activity-rank curves of ref. 18. In addition to egos with heterogeneous tie strengths, all studied communication channels contain a smaller fraction of egos who distribute their communication more homogeneously among alters, leading to smaller values of d and narrower activity distributions. Indeed, the distribution pd of the dispersion indices over an entire dataset shows both over-dispersed egos (d ~ 1) and egos with more Poissonian social signatures (d ~ 0; Fig. 1d and SI Fig. S2).
Even egos with similar degrees or strength (total numbers of alters or events) can have heterogeneous or homogeneous activity distributions, which are thus not solely driven by differences in the total level of activity between individuals.
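To make the dispersion index concrete, here is a minimal Python sketch that computes d for one ego from the final activities of its alters. It assumes that σ² is the population variance of the alter activities (the text does not state whether a sample or population variance is used) and that a0 is the smallest observed activity; the function name `dispersion_index` is hypothetical.

```python
from statistics import fmean, pvariance


def dispersion_index(activities: list[int]) -> float:
    """Dispersion index d = (sigma^2 - t_r) / (sigma^2 + t_r) of one ego network.

    `activities` lists the final activity a (number of events) of each alter.
    sigma^2 is taken as the population variance of the activities (assumption),
    and t_r = t - a0 is their mean minus the minimum activity a0.
    """
    a0 = min(activities)
    t_r = fmean(activities) - a0
    var = float(pvariance(activities))
    if var + t_r == 0:
        # every alter has the same activity, so d = 0/0 is undefined
        raise ValueError("d is undefined when all alters have equal activity")
    return (var - t_r) / (var + t_r)


# One dominant alter gives d close to 1; an even split gives d < 0.
print(dispersion_index([1, 1, 1, 1, 12]))  # ~0.796
print(dispersion_index([3, 4, 5, 4, 4]))   # ~-0.429
```

Values near 1 correspond to the over-dispersed signatures of Fig. 1c and values near 0 to Poisson-like ones; a split that is more even than Poisson yields d < 0.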

In order to find plausible generative mechanisms behind the diversity of social signatures seen in human communication data, we calculate the probability πa that an alter with current activity a communicates once more with the ego, averaged over all events and alters in the observation period (SI Fig. S4). This measure is akin to the attachment kernel of growing networks48,49,50, which has been identified in many cases as a linear function of the degree51,52, and which has been applied in preferential attachment models28,29,30,53. We further restrict πa to the aggregated data of egos with given values of dispersion d (Fig. 1e and SI Fig. S6). When averaged over heterogeneous egos (large d), the connection kernel πa increases monotonically with a, indicating cumulative advantage as the way most individuals interact with their acquaintances. Homogeneous egos (low d), on the other hand, have a flatter and eventually decreasing kernel closer to the average baseline πa = 〈1/k〉 where events are allocated among alters uniformly, which can be modeled by random choice. Despite variations in the ratio of heterogeneous to homogeneous activity distributions across channels (signaled by different shapes of the dispersion distribution pd; Fig. 1f and SI Fig. S2), the connection kernel πa has qualitatively the same functional form for all datasets, and it even has a similar slope for a wide range of activity values (Fig. 1g and SI Fig. S4). The observed increasing kernels are also robust to the degree k of the ego network, with low degrees showing slightly higher levels of cumulative advantage (SI Fig. S5).
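The connection kernel πa can be estimated from an ego's time-ordered contact sequence in several ways. The sketch below is one plausible estimator, not necessarily the one behind Fig. 1e,g: it assumes that every alter of the aggregated ego network is present from the start with activity 0, and divides the number of times an alter with activity a is contacted by the number of alter-event pairs in which an alter had activity a. Under purely random choice this ratio equals 1/k, matching the baseline πa = 〈1/k〉. The function and variable names are hypothetical.

```python
from collections import Counter


def connection_kernel(events: list[str]) -> dict[int, float]:
    """Empirical pi_a from a time-ordered list of contacted alter IDs.

    Hypothetical estimator (assumption): all alters of the aggregated ego
    network start with activity 0. Before each event, every alter with current
    activity a counts as one 'opportunity' at a; the contacted alter also
    counts as a 'hit' at a. Then pi_a = hits[a] / opportunities[a].
    """
    activity = {alter: 0 for alter in set(events)}
    hits, opportunities = Counter(), Counter()
    for chosen in events:
        opportunities.update(activity.values())
        hits[activity[chosen]] += 1
        activity[chosen] += 1
    return {a: hits[a] / n for a, n in sorted(opportunities.items())}


events = ["A", "A", "B", "A", "A", "C", "A", "B"]
k = len(set(events))
# Relative kernel pi_a - 1/k: values above 0 hint at cumulative advantage.
for a, pi in connection_kernel(events).items():
    print(a, round(pi - 1 / k, 3))
```

Averaging such per-ego kernels over egos grouped by d would give curves comparable in spirit to Fig. 1e; the normalization by the maximum activity am used in the figure is omitted here.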

Modeling tie strength heterogeneity

To explore the simplest theoretical mechanisms that may give rise to the observed variability across ego networks, we consider minimal cumulative-advantage dynamics similar to Price’s model26,54, where the probability of communication between an ego and an alter depends on their prior communication activity and a tunable parameter α that modulates random alter choice (Fig. 2). We start with an undirected ego network of degree k where all alters have initial communication activity a0. After τ interactions, the probability πa that an alter with activity a interacts with the ego at event time τ + 1 is

$${\pi }_{a}=\frac{a+\alpha }{\tau+k\alpha }.$$
(1)

When the parameter α is small (close to its lower bound −a0), πa is nearly proportional to activity, so egos interact preferentially with the most active alters, following dynamics similar to stochastic processes driven by cumulative advantage27,28 and to preferential attachment in the evolution of connectivity29,32,53 and edge weights30 in growing networks. For large α, the connection kernel is flatter and alters are chosen nearly uniformly at random. The parameter α interpolates between heterogeneity and homogeneity in edge weights, even for ego networks with the same mean alter activity t = τ/k (Fig. 2a; for a detailed model description see Materials and Methods [MM] and SI Section S2).
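The dynamics of Eq. (1) are straightforward to simulate. The following Python sketch (hypothetical function name, standard-library `random` only) grows one ego network event by event until the mean alter activity reaches t, using the two α values of Fig. 2b. It illustrates the stated rule and is not the authors' simulation code.

```python
import random
from statistics import fmean, pvariance


def simulate_ego_network(k: int, a0: int, alpha: float, t: float,
                         rng: random.Random) -> list[int]:
    """Grow one ego network with Eq. (1): pi_a = (a + alpha) / (tau + k alpha).

    All k alters start with activity a0 (tau = k a0); events are added one at
    a time until the mean alter activity reaches t (tau = k t in total).
    """
    if alpha <= -a0:
        raise ValueError("need alpha > -a0 so that all weights a + alpha are positive")
    activity = [a0] * k
    for _ in range(round(k * t) - k * a0):
        # random.choices normalizes the weights, which supplies the
        # denominator tau + k * alpha of Eq. (1) at every step.
        weights = [a + alpha for a in activity]
        chosen = rng.choices(range(k), weights=weights)[0]
        activity[chosen] += 1
    return activity


rng = random.Random(42)
for alpha in (-0.7, 9.0):  # alpha_r = 0.3 (beta = 30) vs alpha_r = 10 (beta = 0.9)
    acts = simulate_ego_network(k=100, a0=1, alpha=alpha, t=10, rng=rng)
    print(alpha, fmean(acts), pvariance(acts))
```

With a0 = 1 and t = 10, α = −0.7 gives αr = 0.3 and β = 30, while α = 9 gives αr = 10 and β = 0.9, so the first run should show a much larger activity variance than the second at the same mean.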

Fig. 2: Simple model of alter activity shows crossover in shape of social signatures.

a In a modeled ego network of degree k, alters begin with activity a0 and engage in new communication events at event time τ with probability πa, where a is the alter’s current activity and α a parameter interpolating behavior between cumulative advantage (α → −a0, top) and random choice (α → ∞, bottom; see MM and SI Section S2). These dynamics lead to an ego network with mean alter activity (i.e. time) t = τ/k. Plots and networks on the right are shown diagrammatically but correspond to k = 5, a0 = 1, α = −0.9 (10³), and t = 3 (10³) at the top (bottom). b Probability pa that an alter has activity a at time t, for varying t with α = −0.7 (9) at the top (bottom), k = 100 and a0 = 1. Numerical simulations (num) match well with analytical calculations (theo), indicating that cumulative advantage and random choice, respectively, lead to broad or narrow activity distributions. c Phase diagram of activity dispersion d in terms of rescaled parameters αr = α + a0 and tr = t − a0. The preferentiality parameter β = tr/αr showcases a crossover between heterogeneous and homogeneous regimes at β = 1 (dashed line). The vertical gray dash-dotted lines are parameter values for plot (d). d Rescaled activity distribution pa for varying t and αr = 0.3 (10³) at the top (bottom). Heterogeneous (homogeneous) regimes show gamma (Gaussian) scaling in pa. All simulations are averages over 10⁴ realizations.

We solve the model analytically via a master equation for pa in the limit τ, k → ∞ (see MM and SI Section S2 for derivation). By introducing the preferentiality parameter β = tr/αr with tr = t − a0 and αr = α + a0, the activity distribution can be written as

$${p}_{a}={p}_{0}\frac{{a}_{r}^{-1}}{\mathrm{B}({a}_{r},{\alpha }_{r})}{\left(1+\frac{1}{\beta }\right)}^{-{a}_{r}},$$
(2)

where ar = a − a0, \({p}_{0}={\left(1+\beta \right)}^{-{\alpha }_{r}}\), and B(ar, αr) is the Euler beta function. Eq. (2) fits numerical simulations of the model very well, even for relatively low values of τ and k (Fig. 2b). The preferentiality parameter β, the ratio between the mean relative alter activity tr and the parameter αr that sets the weight of random alter choice, reveals a crossover in the behavior of the model at β = 1, corresponding to a dispersion d = β/(2 + β) (Fig. 2c; derivation in SI Section S2). For large β, dispersion increases (just like in the heterogeneous signatures of Fig. 1) and pa takes the broad shape of a gamma distribution. When β and d are small, the activity distribution approaches a Poisson distribution and scales like a Gaussian in the limit of large tr (Fig. 2d). This equivalence between β and d justifies our choice of the dispersion index as a measure of heterogeneity: d depends only on β and allows us to compare egos with different activity levels, while a quantity like the activity variance σ² = tr(1 + β) depends explicitly on mean activity.
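Eq. (2) and its stated moments can be checked numerically. The sketch below evaluates Eq. (2) in log space with `math.lgamma`, rewriting \({a}_{r}^{-1}/\mathrm{B}({a}_{r},{\alpha }_{r})\) as Γ(ar + αr)/[Γ(ar + 1)Γ(αr)], which also returns p0 at a = a0, and sums the distribution up to a truncation point `a_max` (a choice made here, large enough that the neglected tail is negligible for these parameters). Function names are hypothetical.

```python
import math


def activity_pmf(a: int, a0: int, t: float, alpha: float) -> float:
    """Eq. (2): probability that an alter has activity a at mean activity t."""
    a_r, t_r, alpha_r = a - a0, t - a0, alpha + a0
    if t_r <= 0 or alpha_r <= 0:
        raise ValueError("Eq. (2) needs t > a0 and alpha > -a0")
    if a_r < 0:
        return 0.0
    beta = t_r / alpha_r
    # a_r^{-1} / B(a_r, alpha_r) = Gamma(a_r + alpha_r) / (Gamma(a_r + 1) Gamma(alpha_r));
    # the right-hand side equals 1 at a_r = 0, so p_{a0} = p0 as required.
    log_coeff = math.lgamma(a_r + alpha_r) - math.lgamma(a_r + 1) - math.lgamma(alpha_r)
    log_p0 = -alpha_r * math.log1p(beta)
    return math.exp(log_p0 + log_coeff - a_r * math.log1p(1 / beta))


def moments(a0: int, t: float, alpha: float, a_max: int) -> tuple[float, float, float]:
    """Total probability, mean, and variance of Eq. (2), truncated at a_max."""
    support = range(a0, a_max + 1)
    probs = [activity_pmf(a, a0, t, alpha) for a in support]
    total = sum(probs)
    mean = sum(a * p for a, p in zip(support, probs))
    var = sum((a - mean) ** 2 * p for a, p in zip(support, probs))
    return total, mean, var


a0, t = 1, 10.0
for alpha in (-0.7, 9.0):
    beta = (t - a0) / (alpha + a0)
    total, mean, var = moments(a0, t, alpha, a_max=5000)
    d = (var - (t - a0)) / (var + (t - a0))
    print(f"beta={beta:.2f}  sum={total:.6f}  mean={mean:.4f}  "
          f"var={var:.3f} vs {(t - a0) * (1 + beta):.3f}  d={d:.4f} vs {beta / (2 + beta):.4f}")
```

For both parameter sets the mean should come out close to t = 10 and the variance close to tr(1 + β), so that the empirical d reproduces β/(2 + β).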

Model reveals diversity and persistence of social signatures

Empirical ego networks have broadly distributed degree and minimum/mean alter activities for all communication channels studied (see SI Table S1 and Fig. S1). With k, a0, and t fixed by the data, Eq. (2) becomes a single-parameter model, allowing us to derive maximum likelihood estimates for the preferentiality parameter β in each ego network (Fig. 3; see MM and SI Section S3 for details on the fitting process). After performing a goodness-of-fit test55,56,57 with both Kolmogorov-Smirnov and Cramér-von Mises test statistics58, we obtain β estimates for 33–71% of egos in each dataset, amounting to 6.57M individuals over 16 communication channels (SI Tables S2–S3). Values of the preferentiality parameter, capturing the shape of the social signature of an ego, cover a wide region in the (αr, tr) space and accumulate around the crossover β = 1 (Fig. 3a; compare with Fig. 2c; all datasets in SI Fig. S13). When all alter activities are aggregated over heterogeneous (β > 1) and homogeneous (β < 1) egos (Fig. 3b and SI Fig. S14), the resulting activity distributions have the same functional form as in Fig. 1c, which supports the crossover value d = 1/3 predicted by the model (d = β/(2 + β) at β = 1) as a principled estimate of the boundary between heterogeneous and homogeneous regimes in Fig. 1c–e.

Fig. 3: Model reveals diversity and persistence of social signatures.

a Heat map of the number Nα,t of egos with given values of αr = α + a0 and tr = t − a0 in the Mobile (call) dataset9,21,33–37 (data description in SI Section S1; all systems in SI Fig. S13). Most egos (95%) have a heterogeneous social signature. On the other side of the crossover β = 1, a few egos (5%) have more homogeneous tie strengths (SI Table S3). b CCDF \(P[{a}^{{\prime} }\ge a]\) of the number of alters having at least activity a, aggregated over all egos in the heterogeneous (β > 1) or homogeneous (β < 1) regime in data from (a) (all channels in SI Fig. S14). c CCDF \(P[1/{\beta }^{{\prime} }\ge 1/\beta ]\) of rate 1/β, estimated for 6.57M egos in 16 datasets of calls, messaging, and online interactions. All systems show a diversity of social signatures, with 66–99% of egos favouring a few of their alters, and 1–34% communicating homogeneously (SI Table S3 and SI Figs. S11–S12). d Number NJβ of egos with given alter turnover J and relative preferentiality change Δβ/β when estimating β in two consecutive intervals of activity (I1 and I2, see Fig. 1 and SI Section S3), calculated for egos in (a) (all channels in SI Fig. S15). We also show marginal number distributions of turnover (NJ) and relative preferentiality change (NΔβ). Social signatures are persistent in time at the level of individuals, regardless of alter turnover. e Distribution pΔβ of relative preferentiality change for all studied datasets. Persistence of social signatures is systematic across communication channels.

The heterogeneity of ego network tie strengths is well captured by the preferentiality parameter β, as it is a single number that encapsulates how each individual chooses which alters to interact with (cumulative advantage or effective random choice). Our data and model show that this parameter is broadly distributed (66–99% of ego networks in a dataset have heterogeneous and 1–34% homogeneous signatures; see SI Table S3). Yet its distribution has a similar functional shape across datasets representing different communication channels (Fig. 3c), both in the range of values and in the region of (αr, tr) space covered by the data (see SI Fig. S13). To explore whether β and the associated activity distribution pa are personal characteristics of each ego and not a product of random variation, we quantify their persistence by splitting the communication activity of an ego into two consecutive intervals18,19,20,21 (with the same number of events; see Fig. 1a) and fitting the model independently to each interval. The difference Δβ in preferentiality between the two intervals, relative to β for the whole observation period, is very small for most egos (Fig. 3d). When separating individuals by alter turnover in their ego networks, i.e. the Jaccard similarity coefficient J between sets of alters in both intervals, the mean of Δβ remains close to zero even for egos with high network turnover (J ~ 0; for details see SI Section S3 and SI Fig. S15). The persistence of the preferentiality parameter, found in all of our datasets regardless of communication channel (Fig. 3e) and irrespective of alter turnover, shows that it indeed captures intrinsic individual differences in social behavior.
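The persistence analysis rests on two simple quantities: the Jaccard similarity J between the alter sets of two consecutive intervals holding the same number of events, and the relative change Δβ/β. A minimal Python sketch follows. The helper names are hypothetical, the sign convention Δβ = β(I2) − β(I1) is an assumption, and the β values themselves would come from a fitting procedure such as the maximum-likelihood estimate described in Methods.

```python
def split_intervals(events: list[str]) -> tuple[list[str], list[str]]:
    """Split a time-ordered event list into consecutive intervals I1 and I2 with
    the same number of events (I2 gets one extra event if the total is odd)."""
    half = len(events) // 2
    return events[:half], events[half:]


def jaccard(first: set[str], second: set[str]) -> float:
    """Jaccard similarity J = |A ∩ B| / |A ∪ B| between two alter sets."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0


def relative_change(beta_1: float, beta_2: float, beta_all: float) -> float:
    """Relative preferentiality change (beta_2 - beta_1) / beta_all (assumed sign convention)."""
    return (beta_2 - beta_1) / beta_all


events = ["A", "B", "A", "C", "D", "A"]
i1, i2 = split_intervals(events)
print(i1, i2, jaccard(set(i1), set(i2)))  # ['A', 'B', 'A'] ['C', 'D', 'A'] 0.25
```

J close to 1 means the same alters appear in both intervals, while J ~ 0 corresponds to the high-turnover egos discussed above.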

Discussion

Our findings demonstrate that humans tend to build similar-looking personal networks on multiple online communication channels. The analysis of egocentric networks reveals a common heterogeneous pattern, in which a small group of alters receives a disproportionate amount of communication, yet with substantial inter-individual variation that is similar across all datasets. To capture this pattern and its variation, we have developed a parsimonious and analytically tractable model of ego network evolution, which incorporates a preferentiality parameter specific to each ego. This parameter quantifies the degree of heterogeneity in an ego’s personal network, reflecting the balance between two distinct mechanisms of tie reinforcement: cumulative advantage and random choice. Importantly, the distribution of fitted preferentiality parameter values characterizing individual social behavior is consistent across datasets from different channels, pointing to the presence of platform-independent universal patterns of communication.

This universality can be considered both expected and unexpected. In the case of people’s real social networks, loosely defined as relationships that exist in the offline world, it is not surprising that their structure, characterized by a small number of close relationships, is reflected in online communication as well, such as through mobile phone calls. The cumulative advantage mechanism that drives the dispersion of tie strength can be thought to effectively result from people putting more emphasis on their closest relationships, which arise in part due to similarities in any number of sociodemographic, behavioral, and intrapersonal characteristics59. Generally, the heterogeneity of tie strengths in ego networks has been attributed to cognitive, temporal, and other constraints11,12,13,15,16,17, and different personality traits60,61 and their relative stability have been proposed as one possible reason for the persistent individual variation in this heterogeneity20.

However, there is no a priori reason why the ego networks generated from work-related emails, dating website messages, or movie-related online forum discussions should exhibit similarities to those arising from mobile telephone communications. The nature of communication in these different contexts often pertains to a specific purpose and is limited to a subset of the ego’s alters62, who may even only be represented by online aliases. Nevertheless, despite these differences, the overall pattern of heterogeneous tie strengths and the distribution of the preferentiality parameter, which captures inter-individual variability, are remarkably similar across all datasets. This raises questions as to the underlying mechanisms driving these similarities.

One possibility is that our brain is simply wired to consistently shape our social networks in similar ways, independent of the specific medium of communication13,63. Alternatively, the reason may lie in the mechanisms of tie strength reinforcement: cumulative advantage may arise, e.g., because we have already participated in an online conversation with someone and it is easier to continue interacting with the same alter. In other words, while the mechanism of cumulative advantage effectively explains ego network tie strengths, it can arise for different reasons: emotional closeness of real relationships, or the ease of repeated interactions in online communication with aliases. A process potentially underlying cumulative advantage is homophily27,59,64. If individuals with similar traits communicate more often, as time goes by, alters with large activity will be those most similar to the ego, and also the ones most likely to interact with the ego again, leading to an increasing connection kernel. Random choice and a flat kernel, in turn, are consistent with a lack of similarity-based tie reinforcement. Observational data including individual traits (beyond the activity counts explored here) may allow us to further explore the explicit relationship between cumulative advantage and homophily65,66.

An alternative perspective to consider is one in which all forms of social connections, whether they occur in-person or virtually, with actual people or pseudonymous entities, are integral components of an egocentric network that encompasses all relationships of an individual. Then, the various communication media can be viewed as distinct dimensions that reflect specific facets of this overarching network. Subnetworks associated with each communication channel are then shaped by the ego’s channel preferences and may or may not contain the same alters (see, e.g., ref. 62). It is conceivable that the cognitive and time constraints on personal networks act across the whole set of communication channels. Then, each individual has their own way of allocating their available communication activity on the different channels. The selection of a communication channel is known to affect the capacity to sustain emotionally intense social relationships67, and it is plausible that channel-specific variations in an ego’s preferentiality parameter may reflect their ability (or inability) to manage channel-specific constraints that impact effective social bonding. This may offer additional insight into the debate surrounding competing theories such as media richness68 and communication naturalness63. Given that the utilized datasets represent distinct populations, it is yet to be determined whether the preferentiality parameter of each individual displays similar or divergent values across different media. Recent research suggests that the values of the preferentiality parameter are similar at least for calls and text messages21, but it is not certain if this finding generalizes to other channels.

It is also notable that the value of the preferentiality parameter of each ego appears to be stable in time, even in the face of personal network turnover. This suggests that the parameter may reflect a persistent individual trait that influences the structure of egocentric networks on various channels. This interpretation raises important questions about the possible links between an ego’s preferentiality parameter and their other personal characteristics, such as age, gender, and health, and whether preferentiality itself is subject to homophilous constraints. It is well established that the diversity of social relationships can serve as an indicator of increased longevity4, enhanced cognitive functioning during aging69, and greater resilience to disease70.

Variation in the preferentiality parameter within a population may also have important consequences at the network level. Egocentric network tie strengths and their variation are obviously related to the well-established heterogeneous distribution of tie strengths across the broader network (see, e.g., ref. 33). Moreover, if an ego’s parameter value reflects a personal trait, it may also correlate with their network role. For instance, in social media data, personality traits seem to correlate with the ability of an individual to increase their network size71, broker new relations between alters72, and participate in more communities73. Thus, a broad distribution of preferentiality parameter values among individuals may manifest as a macro-level network structure that reflects a broad array of roles and positions of individuals within the network. These observations highlight the potential for our findings to contribute to a broader understanding of the underlying mechanisms driving social network formation and individual behavior.

Methods

Model of alter activity

We consider minimal ego network dynamics where individuals allocate interactions via cumulative advantage and a tunable amount of random choice (for details see SI Section S2). At initial event time τ0 = ka0, with k the degree of the ego network, all alters have minimal activity a0. At any time τ ≥ τ0, the probability that an alter with activity a becomes active at time τ + 1 is

$${\pi }_{a}=\frac{{a}_{r}/{t}_{r}+{\beta }^{-1}}{k(1+{\beta }^{-1})},$$
(3)

with ar = a − a0, tr = t − a0, and t = τ/k the mean alter activity. The preferentiality parameter β = tr/αr (with αr = α + a0 and α a tunable parameter) interpolates between two regimes: random alter choice (β → 0 and πa ~ 1/k), and preferential alter selection (β → ∞ and πa ~ ar/τr with τr = τ − τ0).
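As a quick consistency check, Eq. (3) is an algebraic rewriting of Eq. (1): substituting a = ar + a0, τ = kt, and t + α = tr + αr, then dividing numerator and denominator by tr, gives the form above. The sketch below (hypothetical function names) evaluates both expressions for a small ego network, showing that they agree, that the kernel sums to 1 over alters, and that it approaches ar/τr as β → ∞.

```python
def kernel_eq1(a: float, tau: float, k: int, alpha: float) -> float:
    """Eq. (1): pi_a = (a + alpha) / (tau + k alpha)."""
    return (a + alpha) / (tau + k * alpha)


def kernel_eq3(a: float, tau: float, k: int, a0: float, alpha: float) -> float:
    """Eq. (3): the same kernel written with a_r, t_r, and beta = t_r / alpha_r."""
    t = tau / k
    a_r, t_r, alpha_r = a - a0, t - a0, alpha + a0
    beta = t_r / alpha_r
    return (a_r / t_r + 1 / beta) / (k * (1 + 1 / beta))


activities, a0, alpha = [1, 1, 3, 5], 1, 0.5
k, tau = len(activities), sum(activities)   # k = 4 alters, tau = 10 events
for a in activities:
    print(a, kernel_eq1(a, tau, k, alpha), kernel_eq3(a, tau, k, a0, alpha))
```

For alpha close to −a0 (β very large), `kernel_eq3(5, 10, 4, 1, -1 + 1e-9)` is close to ar/τr = 4/6, the preferential limit stated above.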

The model can be treated analytically in the limit τ, k → ∞ with constant t (SI Section S2). The probability pa that a randomly chosen alter has activity a follows the master equation

$${d}_{t}{p}_{a}=\frac{1}{t+\alpha }\left[(a-1+\alpha ){p}_{a-1}-(a+\alpha ){p}_{a}\right],$$
(4)

with initial condition \({p}_{a}({a}_{0})={\delta }_{a,{a}_{0}}\) and dt the derivative with respect to t. By introducing the probability generating function g(z, t) = ∑apaza, Eq. (4) reduces to

$${\partial }_{t}g=\frac{z-1}{t+\alpha }\left(z{\partial }_{z}g+\alpha g\right),$$
(5)

a partial differential equation with initial condition \(g(z,{a}_{0})={z}^{{a}_{0}}\). Via the method of characteristics, g takes the explicit form

$$g(z,t)={z}^{{a}_{0}}{\left[z+(1-z)\left(1+\beta \right)\right]}^{-{\alpha }_{r}},$$
(6)

from which we obtain the activity distribution pa in Eq. (2) iteratively by taking partial derivatives of g with respect to z. The distribution pa has mean t and variance σ² = tr(1 + β), leading to the dispersion index d = β/(2 + β).
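Eq. (6) can also be expanded in closed form instead of by repeated differentiation. The worked steps below use the binomial series \((1-x)^{-r}=\sum_{n\ge 0}\frac{\Gamma(n+r)}{\Gamma(r)\,n!}x^{n}\) for |x| < 1, a standard identity that the text does not spell out, to recover Eq. (2) and its first two moments; this is an equivalent route, not necessarily the one followed in SI Section S2.

$$
\begin{aligned}
z+(1-z)(1+\beta) &= (1+\beta)\left(1-\frac{\beta}{1+\beta}\,z\right),\\
g(z,t) &= (1+\beta)^{-\alpha_r}\,z^{a_0}\left(1-\frac{\beta}{1+\beta}\,z\right)^{-\alpha_r}
= p_0\sum_{a_r=0}^{\infty}\frac{\Gamma(a_r+\alpha_r)}{\Gamma(\alpha_r)\,\Gamma(a_r+1)}\left(1+\frac{1}{\beta}\right)^{-a_r} z^{a_0+a_r},\\
p_a &= p_0\,\frac{\Gamma(a_r+\alpha_r)}{\Gamma(\alpha_r)\,a_r\,\Gamma(a_r)}\left(1+\frac{1}{\beta}\right)^{-a_r}
= p_0\,\frac{a_r^{-1}}{\mathrm{B}(a_r,\alpha_r)}\left(1+\frac{1}{\beta}\right)^{-a_r}\quad (a_r\ge 1),\qquad p_{a_0}=p_0,\\
\langle a\rangle &= a_0+\alpha_r\beta = t,\qquad \sigma^2=\alpha_r\beta(1+\beta)=t_r(1+\beta),\qquad d=\frac{\sigma^2-t_r}{\sigma^2+t_r}=\frac{\beta}{2+\beta}.
\end{aligned}
$$

Here \(p_0=(1+\beta)^{-\alpha_r}\), the identity \(\beta/(1+\beta)=(1+1/\beta)^{-1}\) turns the power of the series variable into the last factor of Eq. (2), and the moments follow from differentiating \(\ln g(e^{s},t)\) once and twice at s = 0 using \(\alpha_r\beta=t_r\).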

Fitting data and model

We derive maximum likelihood estimates of the model parameter for empirical ego networks with degree k, minimum/maximum alter activity a0 and am, and total/mean alter activity \(\tau ={\sum }_{i}{a}_{i}\) and t = τ/k (for details see SI Section S3). Assuming that the k alter activities {ai} are independent and identically distributed random variables following pa in the model, the likelihood Lα of the sample {ai} under Eq. (2) for given α satisfies

$${d}_{\alpha }\ln {L}_{\alpha }=k\left[{F}_{\alpha }-\ln (1+\beta )\right],$$
(7)

where \({F}_{\alpha }=\frac{1}{k}{\sum }_{i}[\psi ({a}_{r}+{\alpha }_{r})-\psi ({\alpha }_{r})]\) is an average over all observed relative activities ar = ai − a0 of the digamma function ψ(α) = dαΓ(α)/Γ(α), i.e. the logarithmic derivative of the gamma function Γ(α). The α value that maximizes Lα is given implicitly by

$${\alpha }_{r}=\frac{{t}_{r}}{{e}^{{F}_{\alpha }}-1},$$
(8)

or, equivalently, by \(\beta={e}^{{F}_{\alpha }}-1\).
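Eq. (8) has no closed-form solution, but for integer activities the digamma difference in Fα reduces to a finite sum, \(\psi ({a}_{r}+{\alpha }_{r})-\psi ({\alpha }_{r})={\sum }_{j=0}^{{a}_{r}-1}1/({\alpha }_{r}+j)\), so the root of Eq. (7) can be found with the Python standard library alone. The sketch below bisects on a logarithmic scale for αr. The bracket, the iteration count, and the rule of returning β = 0 when the sample variance does not exceed tr (treated here, as an assumption, as the Poisson-like limit with no finite maximum) are choices made for illustration rather than details taken from the text, and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def score(alpha_r: float, rel: list[int]) -> float:
    """Eq. (7) divided by k: F_alpha - ln(1 + beta), with beta = t_r / alpha_r.

    For integer a_r, psi(a_r + alpha_r) - psi(alpha_r) equals the finite sum
    sum_{j=0}^{a_r - 1} 1 / (alpha_r + j), so no digamma routine is needed.
    """
    t_r = fmean(rel)
    f_alpha = fmean(sum(1.0 / (alpha_r + j) for j in range(a_r)) for a_r in rel)
    return f_alpha - math.log1p(t_r / alpha_r)


def fit_beta(activities: list[int], iterations: int = 200) -> float:
    """Maximum-likelihood beta for one ego network (root of Eq. (7), i.e. Eq. (8)).

    Returns 0.0 when the sample is not over-dispersed (variance <= t_r), which
    is treated here as the Poisson-like limit beta -> 0 (assumption).
    """
    a0 = min(activities)
    rel = [a - a0 for a in activities]          # relative activities a_r
    t_r = fmean(rel)
    if t_r == 0 or pvariance(rel) <= t_r:
        return 0.0
    lo, hi = 1e-9, 1e9            # bracket for alpha_r: score expected > 0 at lo, < 0 at hi
    for _ in range(iterations):   # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if score(mid, rel) > 0:
            lo = mid
        else:
            hi = mid
    return t_r / math.sqrt(lo * hi)   # beta = t_r / alpha_r


print(fit_beta([1, 1, 1, 1, 2, 3, 1, 15, 1, 40]))
print(fit_beta([4, 5, 4, 5, 4, 5]))  # 0.0
```

The first sample, dominated by two very active alters, should give a β above the crossover β = 1, while the evenly spread second sample falls back to the homogeneous limit.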

A goodness-of-fit test allows us to quantify how plausible the hypothesis is that the empirical data are drawn from the model activity distribution in Eq. (2) (SI Section S3). We measure goodness of fit via the standard Kolmogorov-Smirnov statistic

$$D=\mathop{\max }\limits_{{a}_{0}\le a\le {a}_{m}}| \Delta {P}_{a}|,$$
(9)

that is, the largest magnitude of the difference \(\Delta {P}_{a}(t)={P}_{\mathrm{data}}[{a}^{{\prime} }\le a]-{P}_{a}(t)\) between the cumulative distribution of alter activity in data, \({P}_{\mathrm{data}}[{a}^{{\prime} }\le a]\), and that of the fitted model, \({P}_{a}(t)=\mathop{\sum }\nolimits_{{a}^{{\prime} }={a}_{0}}^{a}{p}_{{a}^{{\prime} }}(t)\), across all activities a ∈ [a0, am]. We check the robustness of our results with three other measures from the Cramér-von Mises family of test statistics (for details see SI Section S3).

Given the sample {ai}, we compute the estimate α numerically from Eq. (8) and the statistic D from Eq. (9), where the model activity distribution follows Eq. (2). From the fitted model we generate \({n}_{\mathrm{sim}}=2500\) simulated activity samples \({\{{a}_{i}\}}_{\mathrm{sim}}\). For each simulated sample, we find its own estimate \({\alpha }_{\mathrm{sim}}\) and the corresponding statistic \({D}_{\mathrm{sim}}\). Then, the fraction of simulated statistics \({D}_{\mathrm{sim}}\) larger than the data statistic D is the p-value of the goodness-of-fit test based on D. If the p-value is large enough (p > 0.1, with 0.1 an arbitrary significance threshold), we do not rule out the hypothesis that our activity model emulates the empirical data, and we consider that the ego network has a measurable preferentiality parameter β. We aim at obtaining large p-values (rather than small), since we want to keep the assumption that the model is a good description of the observed data (rather than reject it). Our goodness-of-fit test shows that 33–71% of all considered ego networks are well described by the model (or up to 42–88% for other test statistics; see SI Table S2).
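The goodness-of-fit procedure can be sketched end to end in Python with the standard library. The block below is self-contained and therefore repeats a compact version of the maximum-likelihood fit. It samples from Eq. (2) by inverse-CDF lookup, treats β = 0 as the Poisson limit of Eq. (2), and applies the same minimum-subtraction to simulated samples as to the data. These choices, the truncation tolerance, and all function names are assumptions for illustration, not the authors' implementation.

```python
import bisect
import math
import random
from collections import Counter
from statistics import fmean, pvariance


def pmf(a_r: int, t_r: float, beta: float) -> float:
    """Eq. (2) in relative activity a_r = a - a0, with alpha_r = t_r / beta.
    beta == 0 is handled as the Poisson limit of Eq. (2) with mean t_r (assumption)."""
    if beta == 0:
        if t_r == 0:
            return 1.0 if a_r == 0 else 0.0
        return math.exp(-t_r + a_r * math.log(t_r) - math.lgamma(a_r + 1))
    alpha_r = t_r / beta
    return math.exp(-alpha_r * math.log1p(beta) - a_r * math.log1p(1 / beta)
                    + math.lgamma(a_r + alpha_r) - math.lgamma(a_r + 1)
                    - math.lgamma(alpha_r))


def fit_beta(rel: list[int]) -> float:
    """ML estimate of beta (Eq. (8)) by log-scale bisection; 0.0 if not over-dispersed."""
    t_r = fmean(rel)
    if t_r == 0 or pvariance(rel) <= t_r:
        return 0.0

    def score(alpha_r: float) -> float:
        # F_alpha - ln(1 + beta), using the finite-sum form of the digamma difference
        f = fmean(sum(1.0 / (alpha_r + j) for j in range(a)) for a in rel)
        return f - math.log1p(t_r / alpha_r)

    lo, hi = 1e-9, 1e9
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    return t_r / math.sqrt(lo * hi)


def ks_statistic(rel: list[int], beta: float) -> float:
    """Eq. (9): max |empirical CDF - model CDF| over 0 <= a_r <= max(rel)."""
    counts, k, t_r = Counter(rel), len(rel), fmean(rel)
    d = cdf_data = cdf_model = 0.0
    for a_r in range(max(rel) + 1):
        cdf_data += counts[a_r] / k
        cdf_model += pmf(a_r, t_r, beta)
        d = max(d, abs(cdf_data - cdf_model))
    return d


def model_cdf(t_r: float, beta: float, tol: float = 1e-12) -> list[float]:
    """Cumulative table of Eq. (2), truncated once the remaining tail is below tol."""
    cdf, total, a_r = [], 0.0, 0
    while total < 1 - tol and a_r < 10**6:
        total += pmf(a_r, t_r, beta)
        cdf.append(total)
        a_r += 1
    return cdf


def gof_p_value(activities: list[int], n_sim: int = 2500, seed: int = 0) -> float:
    """Fraction of model samples whose refitted statistic D_sim exceeds the data's D."""
    rng = random.Random(seed)
    a0 = min(activities)
    rel = [a - a0 for a in activities]
    beta = fit_beta(rel)
    d_data = ks_statistic(rel, beta)
    cdf = model_cdf(fmean(rel), beta)
    exceed = 0
    for _ in range(n_sim):
        # inverse-CDF sampling of len(rel) relative activities from the fitted model
        sim = [min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1) for _ in rel]
        sim_min = min(sim)
        sim_rel = [a - sim_min for a in sim]   # same min-subtraction as for the data
        exceed += ks_statistic(sim_rel, fit_beta(sim_rel)) > d_data
    return exceed / n_sim
```

Under this sketch, an ego would be kept as having a measurable β when `gof_p_value(activities) > 0.1`; the default `n_sim=2500` mirrors the number of simulated samples stated above.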

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.