Beyond Dunbar circles: a continuous description of social relationships and resource allocation

We discuss the structure of human relationship patterns in terms of a new formalism that allows us to study resource allocation problems where the cost of the resource may take continuous values. This is in contrast with the main focus of previous studies, where relationships were classified into a few discrete layers (known as Dunbar's circles) with the cost being the same within each layer. We show that with our continuum approach we can identify a parameter η that is the equivalent of the ratio of relationships between adjacent circles in the discrete case, with a value η ∼ 6. We confirm this prediction using three different datasets coming from phone records, face-to-face contacts, and interactions on Facebook. As the sample size increases, the distributions of estimated parameters smooth around the predicted value of η. The existence of a characteristic value of the parameter at the population level indicates that the model is capturing a seemingly universal feature of how humans manage relationships.
Our analyses also confirm earlier results showing the existence of social signatures arising from having to allocate finite resources into different relationships, and that the structure of online personal networks mirrors those in the off-line world.

www.nature.com/scientificreports/

There are, however, alternative ways of measuring tie strength which do not rely on a discrete scale like the one used with the circles. Good examples are frequency of contact 9, time spent together 17, or number of messages (information) exchanged 4,13. Even though some of these quantities could technically be regarded as discrete, the fact that they can take a huge number of values makes this viewpoint rather impractical. More importantly, these measures do not have clear upper and lower bounds (what is the shortest duration of a call to be considered a contact?) that would play the role of the first and last layers, respectively. This calls for a more general version of the model that would allow us to consider intensities of high granularity, possibly a continuum. On the other hand, such a model would be conceptually very general insofar as many resource allocation problems are of a continuous or quasi-continuous nature.
The purpose of this paper is therefore to introduce a general model in which the allocated amount of resource can take any positive real value. After going through the description of the model and its mathematical study, we apply it to three different datasets in which the intensity of personal relationships is measured with continuous variables: face-to-face contact time 18, number of messages between Facebook users 19, and number of phone calls exchanged 13. Our analyses unveil the existence of a structure similar to that found when intensities are considered as discrete categories, thus showing that there is no need to exogenously categorise the data to understand its structure. More importantly, we prove the existence of a new universal scale parameter η, which replaces (and is consistent with) the scale factor ∼3 ubiquitously found in the discrete scenario of social relationships.

Model description
We introduce our model from a completely abstract viewpoint, starting from an individual who must distribute a limited amount of some resource among an assortment of N different choices. We will denote by L the average number of different choices the individual makes and by S the average amount of resource invested in them (irrespective of which magnitude we use to measure it). In the particular example of the ego-networks that we will explore in more depth here, L represents links to alters and S the individual's cognitive resources devoted to keeping those links. At this point, however, we are not concerned with the precise nature of these two magnitudes, only with their existence and their limited values.
For the time being, let us assume that all possible choices can be classified into r different categories, each of them bearing a different cost (in terms of resource invested), s_max = s_1 > s_2 > ⋯ > s_r = s_min. A maximum entropy analysis shows that the probability that an individual chooses ℓ_k elements within the category k (k = 1, …, r) is given by 12

$$p(\ell_1,\ldots,\ell_r) = B(L,\bar{L}/N,N)\,\frac{L!}{\ell_1!\cdots\ell_r!}\;p_1^{\ell_1}\cdots p_r^{\ell_r}\;\delta\!\left(L,\sum_{k=1}^{r}\ell_k\right), \qquad (1)$$

where B(L, L̄/N, N) is the binomial distribution for the total number of choices, and

$$p_k = \frac{e^{-\hat{\mu} s_k}}{\sum_{j=1}^{r} e^{-\hat{\mu} s_j}}. \qquad (2)$$

Here δ(x, y) = 1 if x = y and 0 otherwise, and the parameter µ̂ = µ̂(σ) is determined by the equation

$$\sigma = \frac{\sum_{k=1}^{r} s_k\,e^{-\hat{\mu} s_k}}{\sum_{k=1}^{r} e^{-\hat{\mu} s_k}}. \qquad (3)$$

The cost is the only variable that distinguishes different choices, so in order to make discrete categories it is natural to split the whole range of costs uniformly. Thus,

$$s_k = s_{\max} - \frac{k-1}{r-1}\,(s_{\max}-s_{\min}), \qquad (4)$$

with k = 1 (k = r) corresponding to the most (least) costly category, following the standard convention used in previous studies. Substituting this form for s_k into (2), the probability distribution (1) becomes

$$p(\ell_1,\ldots,\ell_r) = B(L,\bar{L}/N,N)\,\frac{L!}{\ell_1!\cdots\ell_r!}\;\frac{e^{\mu\sum_k (k-1)\ell_k}}{\bigl(\sum_{j=1}^{r} e^{\mu(j-1)}\bigr)^{L}}\;\delta\!\left(L,\sum_{k=1}^{r}\ell_k\right), \qquad (5)$$

with µ ≡ µ̂(s_max − s_min)/(r − 1). This is nothing but the probability distribution of links in an ego-network that was obtained in Ref. 12, but our goal is to describe a continuum of levels, not these discrete categories. To that purpose we need to take the limit r → ∞ appropriately: levels will now be described by a continuous index t ≡ (k − 1)/(r − 1) ∈ [0, 1]. Notice that t = 0 corresponds to s_max and t = 1 to s_min, so the parameter t can also be interpreted as a sort of 'distance' to the corresponding choice. As a matter of fact, it is possible to parameterize everything in terms of cost rather than distance, by introducing s = 1 − t.
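As an illustration of the discrete construction, the following sketch (ours, not part of the original analysis; function names are arbitrary) computes the category weights proportional to e^{µ(k−1)} and the resulting expected circle sizes:

```python
import numpy as np

def layer_probabilities(r, mu):
    """Category weights p_k proportional to exp(mu*(k-1)) for k = 1..r,
    with k = 1 the most costly (innermost) category."""
    k = np.arange(r)                 # k - 1 = 0, 1, ..., r - 1
    w = np.exp(mu * k)               # unnormalised weights
    return w / w.sum()

def expected_layers(L, r, mu):
    """Expected choices per category (eps) and cumulative circles (chi)."""
    p = layer_probabilities(r, mu)
    eps = L * p                      # expected number of choices in category k
    chi = np.cumsum(eps)             # expected choices with cost >= s_k
    return eps, chi

# example: r = 4 circles and e^mu = 3 reproduce the familiar ~3 scaling
eps, chi = expected_layers(L=130, r=4, mu=np.log(3))
ratios = chi[1:] / chi[:-1]          # successive circle-size ratios
```

With these parameters the successive ratios approach 3 for the outer circles, in line with the scaling factor ubiquitously reported for Dunbar's circles.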
In order to proceed now with the distribution of links (5), it has to be realized that in this limit it becomes quite an unmanageable object: a path integral. There are two ways to circumvent this technical problem. The first one amounts to calculating the limit of averages. For the second one we should realise that, in the limit r → ∞, the only dependence on ℓ(t) is through the moment L₁, defined through

$$L = \sum_{k=1}^{r}\ell_k \to \int_0^1 \ell(t)\,dt, \qquad L_1 = \sum_{k=1}^{r} t_k\,\ell_k \to \int_0^1 t\,\ell(t)\,dt, \qquad (6)$$

so instead of dealing with a limit of (5) it is better to take the limit of the probability distribution

$$p(L, L_1) = W(L, L_1)\,e^{\mu(r-1)L_1}\Biggl(\sum_{j=1}^{r} e^{\mu(j-1)}\Biggr)^{-L}, \qquad (7)$$

where W(L, L₁) is a factor that only depends on L and L₁ and whose specific form does not concern us at this point. The first approach will be useful to obtain the expected distribution of choices as a function of their costs; with the second one we will derive a Bayesian estimate of the parameter µ(σ) in the limit r → ∞.
A continuum version. Using the distribution (5) it is possible to calculate ε_k, the expected number of choices from category k, as well as χ_k, the expected number of choices with costs larger than or equal to that of category k. The latter is what in the literature of ego-networks is referred to as social "circles" 20. It is straightforward to obtain the expression 12

$$\varepsilon_k = L\,\frac{e^{\mu(k-1)}}{\sum_{j=1}^{r} e^{\mu(j-1)}}, \qquad \chi_k = \sum_{j=1}^{k}\varepsilon_j = L\,\frac{e^{\mu k}-1}{e^{\mu r}-1}. \qquad (8)$$

Taking the limit r → ∞ transforms these expected values into their continuous counterparts:

$$\varepsilon(t) = L\,\frac{\eta\,e^{\eta t}}{e^{\eta}-1}, \qquad \chi(t) = L\,\frac{e^{\eta t}-1}{e^{\eta}-1}. \qquad (9)$$

In particular, χ(t)/L is the fraction of links whose "distance" to the individual is not larger than t. Finally, we must find the relationship between the continuum parameter η and the discrete parameter σ. In order to do that, we start off from Eq. (3) which, after substituting (4), becomes

$$\frac{s_{\max}-\sigma}{s_{\max}-s_{\min}} = \frac{e^{\mu}\bigl[(r-1)e^{r\mu}-r e^{(r-1)\mu}+1\bigr]}{(r-1)(e^{r\mu}-1)(e^{\mu}-1)}. \qquad (10)$$

The continuum limit (r → ∞, µ → 0 with η = µ(r − 1) = constant) of this expression yields the implicit equation

$$\bar{t} \equiv \frac{s_{\max}-\sigma}{s_{\max}-s_{\min}} = g(\eta) \equiv \frac{\eta e^{\eta}-e^{\eta}+1}{\eta\,(e^{\eta}-1)}, \qquad (11)$$

whose solution provides the sought-for dependence η = η(σ). Notice that

$$g(\eta) = \frac{e^{\eta}}{e^{\eta}-1}-\frac{1}{\eta}, \qquad (12)$$

and since g′(η) > 0 for all η ∈ ℝ, Eq. (11) has a unique solution for any 0 < t̄ < 1. As a matter of fact, η = 0 for t̄ = 1/2, whereas η > 0 for t̄ > 1/2 and η < 0 for t̄ < 1/2, hence η′(t̄) > 0 (see the plot of g(η) in Fig. 1).
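The implicit equation relating η to the mean relative cost is easy to solve numerically because g is strictly increasing; a minimal sketch (our naming; brentq is just one of several possible root finders) is:

```python
import numpy as np
from scipy.optimize import brentq

def g(eta):
    """g(eta) = e^eta/(e^eta - 1) - 1/eta, the mean of t under a density
    proportional to e^(eta*t) on [0, 1]; g(0) = 1/2 by continuity."""
    if abs(eta) < 1e-8:
        return 0.5 + eta / 12.0      # Taylor expansion around eta = 0
    return np.exp(eta) / (np.exp(eta) - 1.0) - 1.0 / eta

def solve_eta(t_bar):
    """Unique root of g(eta) = t_bar for 0 < t_bar < 1 (g is increasing)."""
    return brentq(lambda e: g(e) - t_bar, -500.0, 500.0)
```

The Taylor branch avoids the removable singularity at η = 0, mirroring the precautions described in "Methods".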
Connection with the theory of ego-networks. With the calculations above, we are now in a position to obtain a quantitative estimate of the parameter η that determines the distribution in the continuum, for the specific application to Dunbar's social circles in this limit. Recall that in the social circles interpretation the choices are links to alters of an ego, cost means cognitive cost, and the categories describe layers of emotional closeness of the corresponding relationships.
For large values of µ, Eq. (8) behaves as

$$\frac{\chi_{k+1}}{\chi_k} \approx e^{\mu}. \qquad (13)$$

This shows, on the one hand, that in the ordinary regime (µ > 0) the circles (quantified by χ_k) satisfy an approximate scaling relation, and on the other hand, that in the so-called "inverse" regime (µ < 0) the closest circle becomes overpopulated. Both behaviours have been properly documented in the literature 4,5,12,21,22. The corresponding analysis for the continuum model requires that we determine the asymptotic behaviour, for large η, of the logarithmic derivative of χ(t), namely

$$\frac{\chi'(t)}{\chi(t)} = \frac{\eta\,e^{\eta t}}{e^{\eta t}-1} \approx \eta. \qquad (14)$$

As the discrete version of the left-hand side is (χ_{k+1} − χ_k)/(χ_k Δt), a comparison between (13) and (14) in the ordinary regime leads to ηΔt ≈ e^µ − 1. Since Δt ≈ (r − 1)^{−1}, we obtain the equivalence

$$\eta \approx (r-1)\,(e^{\mu}-1). \qquad (15)$$

Equation (14) reveals that η is the true underlying scaling factor of the circles. Therefore, the equivalence just derived implies that the value of µ in the discrete model must depend on the total number of circles r. This fact has been overlooked in previous analyses of the original circles model because of the implicit assumption that there are r = 4 circles in the structure of ego-networks 4. If we set r = 4 in (15) and input the empirical scaling e^µ ≈ 3 observed in this model 4, we conclude that the scaling to be expected in a continuous setting of social relationships must be η ≈ 6. This is a concrete prediction of the continuum model that needs to be tested against actual data.
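The equivalence between the discrete and continuum parameters is a one-liner; as a sanity check (our code, hypothetical helper name), r = 4 circles with a scaling factor of 3 indeed yield η = 6:

```python
import numpy as np

def eta_from_discrete(r, mu):
    """Continuum parameter implied by a discrete model with r circles and a
    scaling factor e^mu between consecutive circles: eta = (r-1)(e^mu - 1)."""
    return (r - 1) * (np.exp(mu) - 1.0)

eta = eta_from_discrete(r=4, mu=np.log(3))   # e^mu = 3  ->  eta = 3 * 2 = 6
```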

Data analysis
In this section we will explore how this continuous model compares to actual data. We will use three datasets for this comparison: phone calls 13 , face-to-face contacts 18 , and interactions between Facebook users 19 . But before that we need to develop a formalism to make the fits and determine their confidence intervals.
Bayesian estimate of the scaling parameter. Starting from (7) and assuming a noninformative uniform prior for µ, it follows that, up to a normalising constant,

$$\pi(\mu \mid L, L_1) \propto e^{\mu(r-1)L_1}\Biggl(\sum_{j=1}^{r} e^{\mu(j-1)}\Biggr)^{-L}. \qquad (16)$$

In the continuum limit and using the definitions (6),

$$\pi(\eta \mid L, L_1) \propto e^{\eta L_1}\,h(\eta)^{-L}, \qquad (17)$$

where

$$h(\eta) = \int_0^1 e^{\eta t}\,dt = \frac{e^{\eta}-1}{\eta}. \qquad (18)$$

The limiting distribution (17) allows us to obtain η for any dataset as the maximum-likelihood estimate. Taking logarithms,

$$\ln \pi(\eta \mid L, L_1) = \eta L_1 - L \ln h(\eta) + \mathrm{const}. \qquad (19)$$

Differentiating with respect to η leads to the equation

$$\frac{L_1}{L} = \frac{e^{\eta}}{e^{\eta}-1}-\frac{1}{\eta}. \qquad (20)$$

Comparing this equation to (11) provides the interpretation

$$\frac{L_1}{L} = \frac{s_{\max}-\sigma}{s_{\max}-s_{\min}}. \qquad (21)$$

As in the discrete case, Eq. (21) enables us to estimate the value of the parameter η given the total cost per item σ from a set of empirical data. There is an important difference though: we now need to set the scale of costs, namely the values of s_max and s_min, using additional information on the dataset, a problem that did not arise in the discrete case because the first and last categories were fixed. Remember that s_max defines the largest possible cost that one can invest in one item, whereas s_min defines the least possible such cost. Once these parameters are known, t̄ is estimated as

$$\bar{t} = \frac{L_1}{L} = \frac{1}{L}\sum_{i=1}^{L}\frac{s_{\max}-s_i}{s_{\max}-s_{\min}}, \qquad (22)$$

where the s_i are the costs associated to each of the items i = 1, …, L, measured in the same units as s_max and s_min.
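A compact sketch of the resulting estimator (our function name; it assumes the cost scale [s_min, s_max] has already been fixed) first maps each cost to its 'distance' t_i and then solves the maximum-likelihood condition for η:

```python
import numpy as np
from scipy.optimize import brentq

def estimate_eta(costs, s_min, s_max):
    """Maximum-likelihood estimate of eta from per-alter costs (e.g. numbers
    of calls), given the cost scale [s_min, s_max] (assumed already known)."""
    s = np.asarray(costs, dtype=float)
    t = (s_max - s) / (s_max - s_min)    # 'distance' of each alter, in [0, 1]
    t_bar = t.mean()                     # L1 / L
    # solve g(eta) = t_bar, with g the mean of t under a density ~ e^(eta*t)
    g = lambda e: (np.exp(e) / (np.exp(e) - 1.0) - 1.0 / e
                   if abs(e) > 1e-8 else 0.5 + e / 12.0)
    return brentq(lambda e: g(e) - t_bar, -500.0, 500.0)
```

A balanced set of costs (mean distance 1/2) yields η ≈ 0, while a preponderance of cheap relationships pushes η above zero, as in the ordinary regime.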
For the confidence interval of the maximum-likelihood estimate of η we need to introduce the (unnormalised) posterior density

$$w(\eta \mid L, L_1) = e^{\eta L_1}\left(\frac{\eta}{e^{\eta}-1}\right)^{L}. \qquad (23)$$

Then the 1 − 2δ confidence interval for η, given L and L₁, is obtained through the cumulative distribution

$$\hat{W}(\eta \mid L, L_1) = \frac{\int_{-\infty}^{\eta} w(u \mid L, L_1)\,du}{\int_{-\infty}^{\infty} w(u \mid L, L_1)\,du}. \qquad (24)$$

More precisely, the confidence interval [η₋, η₊] is determined by solving the equations Ŵ(η₋|L, L₁) = δ and Ŵ(η₊|L, L₁) = 1 − δ (see "Methods" for numerical details). In what follows we choose a 95% confidence interval using δ = 0.025.
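Numerically, the interval can be obtained by integrating the posterior density and inverting its cumulative. The sketch below is ours (function names are arbitrary); it integrates over a finite window around the mode rather than the whole real line, which is one possible implementation choice:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def log_w(eta, L, L1):
    """Log of the unnormalised posterior w = e^(eta*L1) * (eta/(e^eta-1))^L;
    working in logs avoids overflow of the exponentials."""
    if abs(eta) < 1e-12:
        return eta * L1
    return eta * L1 - L * np.log(np.expm1(eta) / eta)

def g(eta):
    """Mean of t under the tilted density: g(eta) = e^eta/(e^eta-1) - 1/eta."""
    if abs(eta) < 1e-8:
        return 0.5 + eta / 12.0
    return np.exp(eta) / (np.exp(eta) - 1.0) - 1.0 / eta

def conf_interval(L, L1, delta=0.025, half_width=20.0):
    """(1 - 2*delta) interval [eta_minus, eta_plus] from the cumulative of w."""
    mode = brentq(lambda e: g(e) - L1 / L, -500.0, 500.0)   # ML estimate
    lo, hi = mode - half_width, mode + half_width
    shift = log_w(mode, L, L1)                # rescale to avoid overflow
    w = lambda e: np.exp(log_w(e, L, L1) - shift)
    total = quad(w, lo, hi, limit=200)[0]
    cdf = lambda e: quad(w, lo, e, limit=200)[0] / total
    eta_minus = brentq(lambda e: cdf(e) - delta, lo, mode)
    eta_plus = brentq(lambda e: cdf(e) - (1.0 - delta), mode, hi)
    return eta_minus, eta_plus
```

Locating the mode first both stabilises the exponentials and guarantees that each root bracketing for brentq contains a sign change.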

Mobile phones dataset.
We have obtained the first dataset to analyse from Ref. 13 (actually, the data were originally collected for another study 23). This dataset contains the phone activity of 24 individuals during 18 months. At the beginning of the study, all participants (12 males, 12 females, ages 17–19) were in their final year of secondary school, so that about six months later they transitioned into either university (18 of them) or the labour market. The data from the phones (which were given for free to the participants along with 500 free monthly voice minutes and unlimited text messages) were complemented with three questionnaires: one at the beginning of the study, another at the end of month 9, and a last one at the end of month 18. With this information, the authors were able to merge phone numbers that belonged to the same person and, most importantly, to conclude that the number of calls is a reliable estimate of the emotional closeness of the relationships (see Ref. 13 for details).
In the original study, the communication patterns of the participants were analysed by dividing the dataset into three time intervals (T₁, T₂, and T₃) of six months each. For each time interval, the number of calls from each ego to each alter was counted and the alters were subsequently ranked based on this number. Then, the curve representing the fraction of calls as a function of the alters' ranks was used as a fingerprint of the ego's communication pattern. The main result of that study is that, even though the composition of personal networks varies considerably over time, these patterns are consistent across the different time windows. The authors named these patterns social signatures and conjectured that they were likely a consequence of a constraint on the available resources (time and cognitive skills) necessary to manage relationships.
In order to analyse these data we first aggregate them into the same time windows, so that we end up with a list (per time window) of pairs (a_i, n_i) for each ego, where a_i is a given alter and n_i is the total number of calls made to that alter. As we explained in the "Bayesian estimate of the scaling parameter" section, prior to fitting the model we need to determine what s_min and s_max are for each participant (at each time window). To that end, we first select the minimum and the maximum number of calls each ego made every month to any alter. Then, s_min (respectively s_max) for each time window is defined as the sum of the monthly minima (respectively maxima) along the six-month period. The rationale for these definitions is that these would have been the minimum and maximum number of calls to an alter, had this alter been the same all along the time window. Once s_min and s_max have been determined, we filter out any interaction below s_min (alters receiving fewer calls do not qualify as true relationships) and fit the model as explained in the "Bayesian estimate of the scaling parameter" section. Figure 2 summarises our results. As we can see in panels A–D, the distributions of the parameter estimates are centred around values consistent with the predicted η ≈ 6 scaling (see the "Connection with the theory of ego-networks" section). Additionally, the model is able to capture individuals' nuances (panels a–d), and the fittings are, generally speaking, strikingly good (see Supplementary Information for a comprehensive set of figures, including fittings for every subject within every time window). Furthermore, we find a very high, significant correlation between the estimated parameter for each ego and the number of alters in his or her network (L).
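Such a correlation can be computed with a standard two-tailed Pearson test; in the sketch below the numbers are invented purely for illustration (the actual values are those reported in the text):

```python
import numpy as np
from scipy.stats import pearsonr

# hypothetical example: fitted parameters and network sizes for six egos
etas = np.array([4.2, 5.9, 6.3, 7.8, 9.1, 11.0])
sizes = np.array([8, 12, 14, 17, 21, 26])     # number of alters, L
r, p = pearsonr(etas, sizes)                  # two-tailed test by default
```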
More precisely: η_T1 ∼ L_T1 (r = 0.84, p < 10⁻⁶), η_T2 ∼ L_T2 (r = 0.52, p < 10⁻³), η_T3 ∼ L_T3 (r = 0.81, p < 10⁻⁵) and η_{T1∪T2∪T3} ∼ L_{T1∪T2∪T3} (r = 0.83, p < 10⁻⁶); Pearson's r coefficients, two-tailed tests. This fact further endorses the claim that the amount of resource available to form relationships is a seemingly fixed quantity that individuals spread according to the maximum entropy principle 12. Lastly, we analyse whether the parameter η may serve as a quantitative characterisation of the social signatures. In Ref. 13, the authors used the Jensen–Shannon divergence 24 (JSD) to measure the shape difference (distance) between signatures. Sticking to the notation in that reference, we will denote by d^{ij}_{ab} the JSD distance between the signature of ego i in time window a and ego j in time window b. This measure was used to compute the variation between the signatures of the same ego i in consecutive time windows as d^{ii}_{12} ≡ d^{self}_{12}(i) and d^{ii}_{23} ≡ d^{self}_{23}(i). For comparison, the authors also computed reference distances between the signatures of different egos, and found that these reference distances were consistently higher than the ones between signatures of the same ego 13.
We perform a parallel analysis using the relative change between two different values of η as a measure of the "distance" between them. That is, using the same notation, self-distances are obtained from the relative change between the values of η of the same ego in consecutive time windows, whereas reference distances are obtained from the relative change between the values of η of different egos. In Fig. 3 we show the resulting distributions of self-distances (d^self) and reference distances (d^ref). The distribution of d^ref is again consistently higher than that of d^self, which is confirmed by a Mann–Whitney U test yielding p < 10⁻³ (two-sided). Therefore, different egos tend to have a persistent value of η, just like they have a persistent social signature. Given that the central premise of our model is that the resources available to create relationships are limited (see the "Model description" section), this result reinforces the conjecture 13 that the existence of social signatures is a consequence of this very constraint.
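The distance computation can be sketched as follows (our function names; the exact normalisation of the relative change used in the paper may differ):

```python
from scipy.stats import mannwhitneyu

def relative_change(eta_a, eta_b):
    """One possible 'distance' between two parameter values."""
    return abs(eta_b - eta_a) / abs(eta_a)

def self_and_reference_distances(etas_t1, etas_t2):
    """Self-distances compare the same ego across two time windows;
    reference distances compare different egos."""
    n = len(etas_t1)
    d_self = [relative_change(etas_t1[i], etas_t2[i]) for i in range(n)]
    d_ref = [relative_change(etas_t1[i], etas_t2[j])
             for i in range(n) for j in range(n) if i != j]
    return d_self, d_ref
```

A one-sided test, mannwhitneyu(d_ref, d_self, alternative='greater'), then checks whether reference distances are systematically larger than self-distances.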
Face-to-face contacts dataset. In this section we analyse data from face-to-face interactions 18 that took place during a scientific conference in Turin, Italy, in 2009 (see "Methods"). The data were collected using proximity sensors that voluntary participants (n = 111, about 75% of the attendees) had embedded in their conference badges. The sensors recorded an interaction over intervals of 20 s when two or more participants were facing each other at less than about 1.5–2 m (see Refs. 18, 25–27 for technical details). With this information, we can build the network of interactions for each participant, using the time spent together as a proxy of the intensity of the implied relationships.
The high temporal resolution of the data permits us to characterise the values of s_min and s_max in several ways. One natural option is to aggregate the data over one day 18 and use a rule similar to the one we applied in the "Mobile phones dataset" section, that is, use the sum of the daily maxima of time spent with any alter as s_max, and the sum of the daily minima as s_min. However, during a conference many different time restrictions may apply to the attendees, such as having an agenda of presentations to attend or deliver. As a consequence, the aforementioned heuristic may not apply here, since it was very likely not entirely up to the participants with whom to spend their time at any given moment. Furthermore, we do not have any information on the interactions with the 25% of individuals who were at the venue but chose not to participate. These facts impose clear limitations on the conclusions we can draw from applying our model, and they are hardly avoidable. Therefore, we adopt a rather cautious position and do not aggregate the data into daily time windows. Instead, we simply take s_max as the maximum time spent (and recorded) with one alter during the whole conference, and s_min as the minimum one. Additionally, we exclude all participants who had fewer than five alters in their networks, ending up with a total of 95 valid cases.
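The cautious choice just described can be sketched as follows (our helper name; `ego_times` is a hypothetical mapping from alter to total recorded contact time):

```python
def conference_scale(ego_times, min_alters=5):
    """Whole-conference cost scale for one ego: s_min and s_max are the
    smallest and largest total contact times with any alter. Egos with
    fewer than five alters are discarded (None is returned)."""
    if len(ego_times) < min_alters:
        return None
    times = list(ego_times.values())
    return min(times), max(times)
```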
Our results (Fig. 4a) show a long-tailed distribution for the parameter estimates with a clear peak close, once again, to the predicted η ≈ 6, which suggests that the overall behaviour of the contact patterns agrees with our model. However, even though some fittings are quite good (see Fig. 4b), overall they are not as good as those of the mobile phones dataset (see Fig. S6 in the Supplementary Information). For comparison, we also carried out the analysis using the same approach as in the "Mobile phones dataset" section to set s_min and s_max. Figure S5 in the Supplementary Information collects the corresponding results, showing individual fits that are slightly worse and distributions of the parameter estimates centred around a higher value (η ≈ 14). It has to be taken into account that, as explained above, these data are inherently noisy, and assessing the intensity of the relationships (or even merely of the interactions) based solely on time spent together during a conference can be misleading. Ideally, we would need this type of data but from individuals in their daily lives, so that the recorded interactions would better correspond to decisions of the individuals. Nevertheless, even with the mentioned limitations, the model is still capable of capturing the patterns of face-to-face interactions to some extent.
Facebook dataset. If we compare the results from the "Mobile phones dataset" and "Face-to-face contacts dataset" sections (Figs. 2a, 4a) we can appreciate how, as the sample size increases, the distribution of the parameter estimates seems to smooth around a well-defined central value η ≈ 6. If that were the case, it would be a clear indication that the parameter of the model is indeed capturing a real feature of the way individuals manage relationships. To further explore this possibility, we analyse a larger dataset of interactions in Facebook 19. This dataset was obtained using a crawler in April 2008 and comprises data on roughly 3 million Facebook users and 23 million edges. Importantly, it also contains the number of interactions (photo comments or Wall posts) between users. The data are divided into four different time windows (relative to the time of the crawl): last month, last six months, last year, and all, the last of which contains all the interactions among the users since they established their links 19.
To analyse the structure of the personal networks in Facebook, the authors of that study filtered the data to retain only active, relevant users for whom the relative frequency of contact with all of their alters could be adequately assessed (see Ref. 19 for details). The resulting dataset contains about 90,000 users and 4.5 million links. Applying two different clustering techniques, k-means 28 and DBSCAN 29, they found that the structure of personal networks of Facebook users consists of a set of 4 concentric, inclusive circles according to the intensity of their links, and that the sizes of these circles exhibit a more or less constant scaling ratio close to 3, thus resembling what is found in offline social networks 4.
Since clustering algorithms find an optimal partition of personal networks into four circles with a scaling of approximately 3, our model should yield a distribution of parameters centred around η ≈ 6. In this case, for each individual s_max is simply given by his or her most intense interaction, and s_min by the least intense one; with this decision, we can use the original dataset without any further pre-processing. Figure 5 confirms our hypothesis, showing a smooth distribution with mean 8.25, median 7.17, and mode 5.48. Interestingly, the size of this sample allows us to find, for the first time, individuals exhibiting an inverse regime (η < 0). Specifically, we find 256 users, about 0.3% of the population, exhibiting this type of structure; to be precise, for only 7 of them (0.007%) does the 95% confidence interval exclude zero. In Fig. 5b,c we show representative fits of individuals in the standard and the inverse regime, respectively. Let us remark that not only does our model capture the typical structure of personal networks 19, but it also unveils that the inverse regime 12 can be found in digital communications as well, in spite of the fact that this is the last environment in which one would expect to find it, because of the inflation of contacts it usually favours.
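The per-user scale choice and the classification of regimes can be sketched as follows (our helper names; the significance rule follows the text, where an inverse regime is only called significant if the 95% confidence interval excludes zero):

```python
def cost_scale(interactions):
    """Per-user cost scale for the Facebook data: the least and most intense
    interactions directly define s_min and s_max (no pre-processing needed)."""
    return min(interactions), max(interactions)

def classify_regime(eta, ci_low, ci_high):
    """Label an ego's fitted structure given eta and its confidence interval."""
    if eta >= 0.0:
        return 'standard'
    return 'inverse (significant)' if ci_high < 0.0 else 'inverse'
```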

Discussion
In this paper we have presented an extension of the discrete model of costly allocation of resources introduced elsewhere 12, which treats the cost as a continuous variable. While our approach allows us to deal with any such problem of resource allocation, we have applied it to the case of the structure of personal networks when the intensity of emotional links is given by a continuous magnitude (time spent with the alter, number of phone calls, messages exchanged, etc.) which cannot be naturally classified into categories or layers of intensity. We have found that the behaviour of this continuous model is qualitatively identical to that of its discrete counterpart. Remarkably, our experimental results show that the estimates of the new parameter characterising the distribution of links (η ≈ 6) are consistent with the scaling relation between circles typically observed in discrete settings (µ ≈ 1, or e^µ ≈ 3). Consequently, one may wonder whether the organisation of personal networks has a discrete (as empirical evidence has suggested so far) or continuous nature. Given the abundant empirical evidence for the existence of discrete layers, we are inclined to think that the discretisation might be real, if only because of the natural human tendency to classify and the inherent difficulty of dealing with the continuum. However, this discretisation will hardly be perfect and may be subject to fluctuations. Moreover, even if the (psychological) organisation of the networks were perfectly discrete, it would be difficult for all people within the same layer to receive precisely the same attention (number of calls, contact time, and so on) at all times, which would cause continuous fluctuations. Let us emphasise that the two results are by no means incompatible, since our (continuous) model does not assume at any point that the distribution of intensities is continuous, only that it can be measured as such.
The model we have developed simply allows us to manage this type of data without having to make ad hoc assumptions on the number of layers. Importantly, the principles underlying both types of structures are indeed the same, namely that relationships are costly in terms of (cognitive) resources and that we have a limited amount of these resources to devote to them.
The continuum approach we have introduced here has its own drawbacks. Dispensing with the layers/circles allowed us to find a parameter that characterises the scaling of the distribution of resources in any situation, but the price to pay is that the scale in which the intensity of the relations is measured (i.e., s_min and s_max) has to be inferred from additional information on the problem. This creates a further challenge when fitting the data: decisions have to be made on plausible grounds, but other choices might be possible. This might well be one of the reasons why the individual fittings seem to be somewhat worse than the ones obtained with the discrete model 12, and it is an issue that deserves further attention.
On the other hand, it is important to realise that one of the assumptions of the model is that the effort devoted to relationships is a perfect indicator of their intensity. This must be weighed against the different types of information with which we have measured these efforts (number of calls, face-to-face contact time, and number of messages exchanged), which are nothing more than proxies for that effort. In particular, although contacts can be maintained using different means (phone calls, personal meetings, Facebook, etc.), in our analyses we focus on only one of them. Including data on contacts through every available means should improve the results. Another relevant issue is that, more likely than not, not all communications are equally intense, even if their duration is the same, which is a significant source of noise for our model. In any case, given the simplicity of the model and the particularities of the data, the fits are remarkably good. Furthermore, the aggregate distribution of the parameter estimates (which might compensate for individual errors) exhibits a clear shape centred around the expected value of η ≈ 6, a remarkable result in itself that makes it clear that our relationships exhibit the signature of a resource allocation problem.

Methods
All numerical analyses are carried out in Python with the packages scipy.optimize and scipy.integrate. The documentation of these packages can be found at https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fsolve.html.
To compute the integrals in Eq. (24) for finite values of u we use the function quad (Python). For u → ∞ we evaluate them using a Gauss–Laguerre quadrature with 150 points. Overflows due to exponentiation are avoided by evaluating the logarithm of the integrand, and the singularity at η = 0 is avoided by Taylor expanding e^{−ηr} and e^{−η} up to third order. Likewise, the singularity at η = 0 of (20) is avoided by using the Taylor expansion χ_k ≈ k/r + (k/2r)(e^η − 1)(k − r) for |e^η − 1| ≤ 10⁻⁶. The endpoints of the confidence interval [η₋, η₊] and Eq. (20) are solved for using the function fsolve with tolerance 10⁻⁶. The code used for these analyses is publicly available 30. The data for the analysis in the "Face-to-face contacts dataset" section have been downloaded from the SocioPatterns webpage 31 (last accessed 24 January 2019). They register face-to-face interactions that took place during the scientific conference "Hypertext 2009: 20th ACM Conference on Hypertext and Hypermedia" (http://www.ht2009.org/), held in Turin, Italy, between June 29th and July 1st, 2009.
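The numerical strategy just described, evaluating the logarithm of the integrand and handing the infinite tail to a Gauss–Laguerre rule, can be sketched as follows (our function names; the actual code is available in Ref. 30):

```python
import numpy as np

def log_integrand(eta, L, L1):
    """Logarithm of w(eta) = e^(eta*L1) * (eta/(e^eta - 1))^L; working with
    logarithms avoids overflow of the exponentials for large |eta|."""
    eta = np.asarray(eta, dtype=float)
    small = np.abs(eta) < 1e-6
    safe = np.where(small, 1.0, eta)             # placeholder avoids 0/0
    exact = eta * L1 - L * np.log(np.expm1(safe) / safe)
    taylor = eta * L1 - L * (eta / 2.0 + eta**2 / 24.0)  # around eta = 0
    return np.where(small, taylor, exact)

def tail_integral(a, L, L1, n=150):
    """Integral of w(eta) over [a, infinity) via Gauss-Laguerre quadrature:
    substituting eta = a + x factors out the weight e^(-x) explicitly."""
    x, wts = np.polynomial.laguerre.laggauss(n)
    vals = np.exp(log_integrand(a + x, L, L1) + x)   # undo the e^(-x) weight
    return float(np.sum(wts * vals))
```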
The Facebook dataset used to be available, upon request, at http://current.cs.ucsb.edu/socialnets/ under the name "Anonymous regional network A". However, as of April 24, 2021, the website is no longer available. We obtained the data thanks to Prof. Ben Zhao's kindness.