Introduction

Various constituents of social systems have been found to follow remarkable statistical regularities. Only the recent availability of relevant data has made it possible to unravel such features. By tracking bank notes or cell phones, it has been shown that humans follow simple and reproducible mobility patterns1,2. E-mail communication occurs in bursts, exhibiting a broad distribution of the times between successive messages of individuals (inter-event times)3,4. Recently, we have found that the activity of sending messages of individual users in two online communities exhibits long-term correlations5, characterized by power-law correlation functions obtained via standard Detrended Fluctuation Analysis (DFA).

In the present work we examine the relation between two empirical findings: the broad inter-event time distributions3,4 and the long-term persistence identified in communication activity5. To this end, we investigate the communication activity of the members of a social online community, paying special attention to timing, and study both long-term correlations in the communication and the clustering of successive messages. Here, the term clustering is used when events tend to occur in bursts, i.e. packages of many events are separated by long periods without events. For power-law inter-event times this happens on all scales. In other words, in the case of (temporal) clustering, the inter-event times are more inhomogeneous than for Poissonian statistics.

Long-term correlations have been found in the dynamics of many physical, technological and natural systems. They are characterized by a divergent correlation time, i.e. a power-law decaying auto-correlation function (for a review see6). Such correlations lead to a pronounced mountain-valley structure on all time scales, comprising irregular epochs of small and large values7. This type of persistence represents a surprising regularity, since it is present in data as different as DNA sequences, human heartbeat and climatological temperature records8,9,10. Long-term persistence in human-related data has been reported for highway traffic11,12, Wikipedia access13, Ethernet traffic14, finance and economy15,16,17, written language18,19, as well as physiological records9,20,21. Long-term correlations have also been found in human brain activity22,23,24 and human motor activity25, as well as in city growth26,27,28,29, biological networks30 and the spreading of disease31.

The distributions of inter-event times (times between successive messages) have been found to be rather broad and have been described by power laws3. If many short intervals are separated by a few long ones, the activity, measured as messages per unit time, exhibits persistence, i.e. epochs of large and small activity. We therefore investigate the relation between the long-term correlations in activity5 and the broad (power-law) distribution of inter-event times3. We test two possible scenarios: (i) In the first scenario, the long-term correlations found in the communication activity5 result from Levy type distributions, i.e. the correlations are solely due to the power-law inter-event time distribution (with exponents in a specific range)32. (ii) In the second scenario, the activity comprises ‘real’ correlations, i.e. the inter-event time distributions need not follow a power law, but the communication activity is not temporally independent; namely, it is long-term correlated.

We study the activity of sending messages based on detailed temporal data from a social online community and obtain the long-term correlation exponent H via DFA. The exponent H depends on the overall activity of the members: the more active the members, the larger the fluctuation exponent. It increases from an uncorrelated value H ≈ 0.5 for the least active members to H ≈ 0.90 for the most active ones. We then compare H with the corresponding exponents of randomized data and with a theoretical prediction relating correlations to clustering in the inter-event times. From the consistency among these three measures, we conclude that the long-term correlations found in the activity of sending messages of single users are a direct consequence of the power-law distributed inter-event times of the individuals. Thus, the burstiness of user activity explains the long-term correlations.

More interesting results are found when we consider the activity of the whole community as the sum of the activity of its members. Again we find non-trivial long-term correlations, with exponents H in the same range as for individual users. However, the origin of these correlations is not related to the distribution of inter-event times. We probe this by shuffling the activity data while preserving the distribution of inter-event times. This shuffling destroys the long-term correlations, implying that the correlations are not a byproduct of the broad distribution of inter-event times. We conclude that the whole system acts as a truly long-term correlated system, in which the correlations are not directly related to the Levy distribution of inter-event times.

We analyze the data of an online community (www.pussokram.com, POK33,34,35) covering its complete lifetime of 492 days, from February 2001 until June 2002. The data record the activity of almost 30,000 members, who sent more than 500,000 messages. The site was used for general social interaction and dating. The data consist of the times at which messages were sent and the anonymous identification numbers of senders and receivers. We have previously analyzed these data in5,36. In contrast to similar network data sets consisting only of snapshots, i.e. temporally aggregated social networks expressing who sent messages to whom, this data set provides the exact time at which each message was sent. For a discussion see37.

Before the site was shut down, members could log in and meet virtually. Such communities offer different ways of interacting. Usually, it is possible to choose favorites, i.e. certain members to whom a user feels committed. Such platforms also offer the possibility to discuss specific topics in groups with other members. We focus on messages sent among the members. They are similar to e-mails but have the advantage of being sent within a closed community, where no messages come from or go to the outside. Figure 1 illustrates patterns of sending messages for typical single users [a–d] and for the whole community [e]. The data are publicly available at http://lev.ccny.cuny.edu/hmakse/soft_data.html. Note that we do not consider here the QX dataset analyzed in5,36: it covers only 2 months, so the scaling of its inter-event time distribution is not reliable and we could not consistently determine the shape of this distribution.

Figure 1

Examples of activity of sending messages and overall activity in POK.

The vertical lines in (a) and (c) mark the instants at which two arbitrarily chosen members sent messages. Panels (b) and (d) show the corresponding records of the number of messages per day, x(t), of the same two members. Panel (e) depicts the record of the total number of messages sent per day by all members of POK. (a) and (b): member326 (M = 1023); (c) and (d): member9414 (M = 100).

Results

Study of correlations in individual activity

Applying DFA21,38,39, we have shown in5,36 that the individual activity records x(t), i.e. messages per unit time (per day or per week), exhibit long-term correlations. The fluctuation function provided by DFA scales as

$$F(\Delta t) \sim \Delta t^{H}, \tag{1}$$

where the exponent H is also known as the Hurst exponent. Long-term correlations are characterized by a power-law decaying auto-correlation function,

$$C(s) \equiv \frac{\langle [x(t) - \langle x \rangle]\,[x(t+s) - \langle x \rangle] \rangle}{\sigma_x^{2}} \sim s^{-\gamma}, \tag{2}$$

where 〈·〉 denotes the average, σx is the standard deviation of x(t) and γ is the correlation exponent (0 ≤ γ ≤ 1). In this case one finds 1/2 ≤ H ≤ 1, where larger exponents correspond to more pronounced long-term correlations. For uncorrelated or short-term correlated records (γ ≥ 1, or in general γ ≥ d, where d is the substrate dimension), the asymptotic fluctuation exponent is H = 1/2. In the range 0 ≤ γ ≤ 1, both exponents are related via

$$H = 1 - \frac{\gamma}{2}.$$

For an overview, we refer to6,39. DFAn removes polynomial trends of order n − 1 from the original record x(t) (equivalently, of order n from its integrated profile); for example, DFA2 copes with linear trends in x(t).

It is important to note that the DFA fluctuation function in Eq. (1) is not computed from the activity x(t) itself but from the integrated signal (profile) $y(t) = \sum_{t'=1}^{t} x(t')$. Thus, x(t) is analogous to the steps of a random walk and y(t) to its displacement. DFA then additionally detrends y(t) within windows of size Δt. The integration alone can give rise to long-term correlations when the intervals between events are power-law distributed. We will return to this result when explaining the long-term correlations in terms of burstiness.
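To make the procedure concrete, the following is a minimal pure-Python sketch of DFAn under our own assumptions (profile built from the mean-subtracted record, non-overlapping windows, a hand-written least-squares polynomial fit); it is not the implementation used for the figures, and the names `dfa` and `_detrended_sq_residual` are hypothetical.

```python
import math
import random


def _detrended_sq_residual(xs, ys, order):
    """Sum of squared residuals of a least-squares polynomial fit of degree `order`.

    Solves the normal equations by Gaussian elimination with partial pivoting,
    which is adequate for the low orders (0-3) used in DFA.
    """
    m = order + 1
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, m))) / A[r][r]
    return sum((y - sum(c * x ** j for j, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


def dfa(x, scales, order=2):
    """Fluctuation function F(s) of DFA-`order` for the record x(t).

    The profile y(t) is the cumulative sum of x(t) minus its mean. In each
    non-overlapping window of size s a polynomial of degree `order` is removed
    from y, i.e. trends of degree order-1 in x(t) (DFA2 removes linear trends).
    """
    mean = sum(x) / len(x)
    profile, acc = [], 0.0
    for v in x:
        acc += v - mean
        profile.append(acc)
    F = []
    for s in scales:
        if s < order + 2 or s > len(profile):
            raise ValueError("window size must satisfy order + 2 <= s <= len(x)")
        xs = [(i - (s - 1) / 2) / s for i in range(s)]  # centred, rescaled abscissa
        n_win = len(profile) // s
        total = sum(_detrended_sq_residual(xs, profile[w * s:(w + 1) * s], order)
                    for w in range(n_win))
        F.append(math.sqrt(total / (n_win * s)))
    return F


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    # For uncorrelated noise F(s) grows roughly as s**0.5.
    print(dfa(noise, [16, 32, 64, 128, 256], order=1))
```

Because the profile of a linear trend in x(t) is quadratic, DFA2 removes it completely while DFA1 does not; this is the sense in which the detrending order matters.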

We have measured the fluctuation exponents by least-squares fits of log F(Δt) vs. log Δt on the scales 10 < Δt < 70 weeks, conditional on the members' activity level, i.e. their total number of messages M, as in5. Figure 2 depicts the DFA results. We find that the least active members, who send very few messages during the period of data acquisition, exhibit uncorrelated behavior. The more messages the members send, the more correlated their activity. The fluctuation exponent H increases with M and reaches values up to H = 0.91 ± 0.04 (for sending messages; we disregard the last points, M > 400, whose error bars are too large). The uncorrelated behavior (H ≈ 0.5) at low activity is expected: for M ≈ 1–10 the data acquisition window is too short relative to the sparse activity to capture long-term correlations. Thus, the change from H = 0.5 to H = 0.91 is most probably a crossover effect caused by the finite acquisition time. In36 we proposed a model that reproduces the dependence of the fluctuation exponents on the activity level of the members. For receiving messages we find almost identical results36. We use weekly resolution in order to cope with possible weekly oscillations4,40,41,42.
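The fitting step can be sketched as follows. This is an illustration under stated assumptions: `fit_exponent` is an ordinary least-squares slope on log-log axes restricted to 10 < Δt < 70 (weeks, for weekly records), and `group_by_activity` is a hypothetical logarithmic binning of members by M; the text does not specify the exact activity classes used.

```python
import math
from collections import defaultdict


def fit_exponent(scales, F, s_min=10, s_max=70):
    """Least-squares slope of log F versus log s, using only s_min < s < s_max."""
    pts = [(math.log(s), math.log(f)) for s, f in zip(scales, F) if s_min < s < s_max]
    if len(pts) < 2:
        raise ValueError("need at least two scales inside the fitting range")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((a - mx) * (b - my) for a, b in pts)
    den = sum((a - mx) ** 2 for a, _ in pts)
    return num / den


def group_by_activity(message_counts):
    """Hypothetical activity classes: a member with M messages goes to class k,
    where 2**k <= M < 2**(k + 1). Members with M < 1 are skipped."""
    groups = defaultdict(list)
    for member, M in message_counts.items():
        if M >= 1:
            groups[M.bit_length() - 1].append(member)  # exact floor(log2 M)
    return dict(groups)
```

Using `int.bit_length` avoids the rounding issues of floating-point logarithms at exact powers of two.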

Figure 2

Fluctuation exponents of the communication activity sending messages.

The exponents are plotted as a function of the activity level M (final number of messages) for the original data (green circles), for shuffled data preserving the individual inter-event times (blue squares), and as expected from Eq. (4) (brown triangles down), using the exponents µ obtained from power-law fits to the inter-event time distributions in Fig. 4. The error bars were obtained by randomly dividing the members into 10 groups and repeating the analysis for each of them.

Similar long-term correlations have been found in traded values of stocks and in e-mail communication43,44. There, the fluctuation exponent increases with the mean trading activity of the corresponding stock or with the average number of e-mails, similar to our results.

Study of clustering in individual activity

The timing of human communication activity has been found to comprise bursts, in which many events occur within relatively short periods that are separated by long periods with few or no events at all. Such patterns can be characterized by the inter-event times, i.e. the times dt between successive messages. For e-mail communication, it has been argued that the probability density of dt follows a power law,

$$P(dt) \sim dt^{-\mu}, \tag{3}$$

with an exponent µ ≈ 1, as reported in3,45,46. As an explanation for such heavy tails in human dynamics, a queuing model has been suggested3, in which each individual executes tasks from a priority list. It has been confirmed that such a process can reproduce bursts of activity, i.e. clustering, see e.g.47,48. In contrast, a log-normal distribution has been found to describe the inter-event time distribution of the same e-mail data more appropriately49,50. We remark that fitting fat-tailed distributions is disputed51,52,53,54: there is consensus neither on a typical functional form nor on a proper fitting technique. Recently, a cascading Poisson process based on daily and weekly cycles has been proposed as the origin of slower-than-exponential decays of P(dt)4,42. We studied the cascading Poisson process in36.

In55, memory in sequences of dt has been studied for different data sets by characterizing the inter-event times with a burstiness parameter, which is based on their distribution, and a memory coefficient, which is their auto-correlation function at lag 1. The authors also locate the corresponding data sets in a phase diagram spanned by these two quantities. We note, however, that the quantification of long-term correlations in dt can be hindered by noise56,57.
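The two quantities can be sketched as follows, assuming the commonly used definitions B = (σ − m)/(σ + m), with m and σ the mean and standard deviation of dt, and the memory coefficient as the Pearson correlation between consecutive inter-event times. The function names are our own, and this is an illustration rather than the code of55.

```python
import statistics


def burstiness(dts):
    """B = (sigma - m) / (sigma + m): -1 for perfectly regular sequences,
    about 0 for Poissonian (exponential) inter-event times, approaching 1 for bursty ones."""
    m = statistics.fmean(dts)
    s = statistics.pstdev(dts)
    return (s - m) / (s + m)


def memory_coefficient(dts):
    """Lag-1 Pearson correlation between consecutive inter-event times."""
    a, b = dts[:-1], dts[1:]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sa, sb = statistics.pstdev(a), statistics.pstdev(b)
    if sa == 0 or sb == 0:
        raise ValueError("memory coefficient undefined for constant sequences")
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) * sa * sb)
```

Note that the memory coefficient only probes lag 1; it says nothing about the long-term correlations analyzed with DFA here.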

Next, we study the inter-event times dt between successive messages of individual members in the POK data and relate their statistics to the long-term correlations. The finding of long-term correlations raises the question of the origin of such persistence in social communication. From a statistical physics point of view, we consider two possible scenarios:

  1. In the first scenario, the intervals between the messages follow a power law3,58. Accordingly, the activity pattern comprises many short intervals and few long ones, implying persistent epochs of small and large activity. Depending on the exponent, this fractal-like clustering of the activity can lead to long-term correlations with H > 1/2 (see the analogous problem of the origin of long-term correlations in DNA sequences discussed in59). This scenario implies a direct link between the correlations in the activity and the distribution of inter-event times, which can be obtained analytically60. We call this scenario “Levy correlations”, since the actual activity need not be correlated per se; rather, the correlations arise as a byproduct of integrating, in the DFA formalism, a signal with power-law distributed inter-event times.

  2. In the second scenario, the intervals between the messages may or may not follow a power-law distribution, but the inter-event times are not independent of each other and comprise ‘real’ long-term persistence. For example, the distribution of inter-event times could be a stretched exponential (see recent work on extreme events in climatological records exhibiting long-term correlations56,61); then the only way to explain long-term correlations in the activity is through correlations among the inter-event times. We call this scenario “true correlations”, since the correlations are not related to the distribution of inter-event times but reflect ‘real’ correlations in the dynamics of the communication activity.
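Scenario 1 can be illustrated with a renewal process, i.e. events separated by independent, identically distributed inter-event times. The hypothetical generator below draws dt from a Pareto law with density P(dt) ∼ dt^(−µ) for dt ≥ dt_min (Python's `paretovariate` with shape µ − 1) and counts events per unit time within a finite window T, mimicking a finite acquisition period. Since the intervals are independent, any persistence that DFA detects in such a record can only come from the distribution itself.

```python
import random


def renewal_activity(mu, T, dt_min=1.0, seed=0):
    """Counts per unit time of a renewal process observed on [0, T).

    Inter-event times are i.i.d. with density ~ dt**(-mu) for dt >= dt_min
    (Pareto with shape mu - 1; requires mu > 1). Successive intervals are
    independent, so there are no 'true' correlations by construction.
    """
    if mu <= 1:
        raise ValueError("mu must exceed 1 for a normalizable density")
    rng = random.Random(seed)
    counts = [0] * T
    t = 0.0
    while True:
        t += dt_min * rng.paretovariate(mu - 1.0)  # paretovariate(alpha) >= 1
        if t >= T:
            return counts
        counts[int(t)] += 1
```

With dt_min = 1 each unit bin holds at most one event; a smaller dt_min allows several events per bin, closer to messages per week.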

One way to discriminate between these two scenarios is to shuffle the order of the inter-event times, which keeps their distribution intact. In the case of Levy type correlations, shuffling the inter-event times should not affect the long-term correlation properties of x(t); in the case of ‘real’ long-term correlations, it should destroy the (asymptotic) long-term correlations, since the memory resides in the arrangement of the inter-event times. In what follows, we investigate the activity of individual members and the activity of the whole POK community.

Study of inter-event distribution of individual members

Figure 2 shows the fluctuation exponents of individual members when we shuffle the data while preserving the distribution of inter-event times. This is done in the following steps: (i) extract the set of inter-event times of each user; (ii) shuffle the extracted inter-event times; (iii) rebuild the record of events (since the inter-event times do not add up to the entire period of data acquisition, the time of the first event is chosen such that the remaining time is split into two parts, one at the beginning and one at the end); (iv) repeat the analysis.
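Steps (i)–(iii) can be sketched as below. The text does not say how the leftover time is split between the beginning and the end; here the first event is placed uniformly at random within that slack, which is our assumption. `shuffle_surrogate` is a hypothetical name, and step (iv) would bin the returned times (e.g. per week) and rerun DFA.

```python
import random


def shuffle_surrogate(event_times, t_start, t_end, seed=0):
    """Surrogate event times with the same inter-event times in random order.

    (i) extract the inter-event times, (ii) shuffle them, (iii) rebuild the
    record. The time not covered by the inter-event times is split between the
    beginning and the end; the split point is drawn uniformly (an assumption).
    """
    times = sorted(event_times)
    if not times:
        return []
    if times[0] < t_start or times[-1] > t_end:
        raise ValueError("events must lie inside [t_start, t_end]")
    dts = [b - a for a, b in zip(times, times[1:])]      # step (i)
    rng = random.Random(seed)
    rng.shuffle(dts)                                      # step (ii)
    slack = (t_end - t_start) - sum(dts)
    new_times = [t_start + rng.uniform(0.0, slack)]       # step (iii)
    for dt in dts:
        new_times.append(new_times[-1] + dt)
    return new_times
```

Applied per member, this preserves each individual's inter-event distribution; applied to the pooled message times, it gives the community-level surrogate.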

The corresponding exponents also reach high values, almost as high as those of the original data, and do not drop for very active members. This agreement is a first indication of Levy correlations in single-user activity.

Further evidence is found by studying the distribution of inter-event times in the activity of each individual. Figure 3 shows the probability density P(dt) of the times between successive messages sent by the same user in the online community. A power-law regime spanning approximately two decades can be seen, with an exponent µ ≈ 1.5, which differs from the exponent µ ≈ 1 reported for e-mail communication3,46. One reason for this difference might be that only one user was considered in3, while µ depends on the activity level of the users, as we show below. In addition, here we study all messages of a closed community. The exponent we find is closer to the one reported for reply times (waiting times), i.e. the time individuals take between receiving a message and replying to the same communication partner. For reply times of e-mails and land mail, µw ≈ 1.5 has been reported3,62.

Figure 3

Probability density of inter-event times dt between successive messages sent by a single member of POK, in daily resolution.

The inter-event times are extracted for every individual who sent messages during the period of data acquisition and then pooled over all members. The dotted line at the top corresponds to the exponent µ = 1.5.

Since we found that the fluctuation exponent H depends on the activity level M, i.e. the total number of messages each member sends, we suspect that µ might also depend on M. Thus, Fig. 4 shows P(dt) for sending messages in POK (daily resolution) for groups of members with different activity levels, i.e. different total numbers of messages M. We find that P(dt) decays rather steeply for the most active members and much more slowly for the least active ones. Due to the finite size of the data, the functional form of the curves is not clear. If one assumes a power-law decay, the exponents lie roughly in the range 1 ≤ µ ≤ 3.

Figure 4

Probability density of inter-event times dt between successive messages sent by all individual members of POK in daily resolution.

The inter-event times are extracted for each individual and pooled over members with the same activity level M. The curve for the most active members is at the bottom, and the one for the least active members is at the top. The dotted lines correspond to power laws with exponents −1 (top) and −3 (bottom), i.e. µ = 1 and µ = 3.

As discussed above, a power-law distribution of inter-event times, Eq. (3), can lead to long-term correlations in the activity without requiring temporal dependencies between the intervals themselves. It can be shown that the long-term persistence of this point process is characterized by a fluctuation exponent that depends on µ according to23,32,60,63

$$H_\mu = \begin{cases} \mu/2, & 1 < \mu \le 2, \\ 2 - \mu/2, & 2 < \mu < 3, \\ 1/2, & \mu \ge 3, \end{cases} \tag{4}$$

see Fig. 5. Apart from detrending, DFA integrates the original record. Long periods without activity, caused by power-law inter-event times, are therefore reflected as long-term persistence in the integrated signal analyzed by DFA. In this case, the long-term correlations stem from long pauses drawn from Levy distributions, as expressed by the direct relation between correlations and Levy inter-event activity in Eq. (4).
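The mapping from µ to Hµ can be written as a small function. We assume here the piecewise form known for renewal processes with power-law intervals: Hµ = µ/2 for 1 < µ ≤ 2, Hµ = 2 − µ/2 for 2 < µ < 3, and Hµ = 1/2 for µ ≥ 3. This form reproduces the value Hµ ≈ 0.88 quoted below for µ ≈ 2.25, but `hurst_from_mu` is our illustrative name, not code from the cited works.

```python
def hurst_from_mu(mu):
    """Fluctuation exponent expected for i.i.d. inter-event times P(dt) ~ dt**(-mu).

    Assumed piecewise form: mu/2 for 1 < mu <= 2 (maximal persistence H = 1 at
    mu = 2), 2 - mu/2 for 2 < mu < 3, and the uncorrelated value 1/2 for mu >= 3.
    """
    if mu <= 1:
        raise ValueError("P(dt) ~ dt**(-mu) is not normalizable for mu <= 1")
    if mu <= 2:
        return mu / 2
    if mu < 3:
        return 2 - mu / 2
    return 0.5
```

For example, the pooled individual exponent µ ≈ 1.5 maps to Hµ = 0.75, and µ = 2.25 maps to Hµ = 0.875.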

Figure 5

Levy correlations and persistence.

Theoretical relation between the inter-event time distribution exponent µ and the fluctuation exponent Hµ according to Eq. (4).

By applying least-squares fits within the straight (power-law) range of P(dt) for sending in POK (Fig. 4), we obtain µ as a function of the activity level M and determine the corresponding fluctuation exponents Hµ expected from Eq. (4). We note that the curves in Fig. 4 are not always straight lines, which leads to large uncertainties in the estimated values of µ.
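One common way to perform such fits is sketched below: estimate P(dt) with logarithmically spaced bins and fit a straight line to log P versus log dt within a chosen range. The binning (5 bins per decade, geometric bin centres) and the names `log_binned_density` and `fit_mu` are our assumptions, and the caveats about fitting fat-tailed distributions discussed above apply.

```python
import math
from collections import Counter


def log_binned_density(dts, bins_per_decade=5):
    """Estimate P(dt) for positive dt with logarithmically spaced bins.

    Bin k covers [10**(k/b), 10**((k+1)/b)); its centre is the geometric mean of
    its edges. Counts are divided by the total sample size and the bin width.
    Non-positive dt (e.g. same-day messages at daily resolution) are skipped.
    """
    b = bins_per_decade
    counts = Counter(math.floor(b * math.log10(dt)) for dt in dts if dt > 0)
    n = len(dts)
    centers, dens = [], []
    for k in sorted(counts):
        lo, hi = 10 ** (k / b), 10 ** ((k + 1) / b)
        centers.append(math.sqrt(lo * hi))
        dens.append(counts[k] / (n * (hi - lo)))
    return centers, dens


def fit_mu(centers, dens, x_min, x_max):
    """mu from a least-squares fit of log P versus log dt within [x_min, x_max]."""
    pts = [(math.log(c), math.log(d)) for c, d in zip(centers, dens) if x_min <= c <= x_max]
    if len(pts) < 2:
        raise ValueError("need at least two bins inside the fitting range")
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    slope = sum((a - mx) * (c - my) for a, c in pts) / sum((a - mx) ** 2 for a, _ in pts)
    return -slope
```

Combined with Eq. (4), the fitted µ of each activity class yields the corresponding Hµ.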

Figure 2 depicts the fluctuation exponents Hµ from Eq. (4) in comparison with the values obtained from DFA. We find H ≈ Hµ over a large part of the M range. The exponents Hµ are also close to the exponents H of the shuffled records, in which the inter-event times are preserved. The fact that shuffling the signal while respecting the inter-event time distribution yields essentially the same fluctuation exponents indicates that the long-term correlations obtained with DFA originate from Levy correlations. This is further corroborated by the agreement between H from DFA and the prediction Hµ. From Fig. 2 we see that the three curves are in reasonable agreement. This supports the view that the correlations in single-user activity can be attributed to the power-law distribution of the inter-event times, in favor of Levy type correlations.

Study of whole community activity

Next, we investigate the activity of the community as a whole. Having studied the activity of single users, we now consider the number of messages sent by all members within a specified period of time. Figure 1(e) shows this activity temporally aggregated to one day. This allows us to test whether correlations emerge from collective behavior in the communication patterns at the level of the whole community.

For this study, we disregard who sends messages to whom and only consider the instants at which any message was sent. In order to obtain a sufficiently long record for DFA, we aggregate the data to messages per hour (instead of daily or weekly resolution). As can be seen in Fig. 1(e), the record contains oscillations4. Since such periodicities lead to erratic fluctuation functions39, we subtract the hourly averages over all days: $x_{\rm tot}(t) \to x_{\rm tot}(t) - \langle x_{\rm tot} \rangle_{t \bmod 24}$, where the average is taken over all hours with the same value of t mod 24, i.e. the same hour of the day.
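The deseasoning step can be sketched as follows, assuming the hourly record starts at hour 0 of a day so that t mod 24 is the hour of the day; `deseason_hourly` is a hypothetical name.

```python
def deseason_hourly(x, period=24):
    """Return x(t) - <x>_{t mod period}, the average taken over all t with the
    same phase (here: the same hour of the day)."""
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(x):
        sums[t % period] += v
        counts[t % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - means[t % period] for t, v in enumerate(x)]
```

A purely periodic daily profile is removed completely, while slower fluctuations, which DFA is meant to quantify, remain.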

The DFA fluctuation functions are shown in Fig. 6. The humps at scales around 20 hours in the DFA1 and DFA2 results are due to residual oscillations, i.e. oscillations that were not completely removed. On larger scales this effect vanishes and we find a fluctuation exponent Htot ≈ 0.9. The straight line obtained with DFA0 is due to the fact that the maximum exponent DFA0 can yield is 1, see39. More importantly, when the record of the whole community is shuffled while preserving the inter-event time distribution, the asymptotic scaling is $F(\Delta t) \sim \Delta t^{1/2}$ (dashed lines in Fig. 6). That is, in contrast to the result for individual activity, shuffling the signal of the whole community yields the uncorrelated exponent. The fact that the correlations vanish (H = 0.9 → H = 0.5) when the data are shuffled indicates that the long-term correlations found in the activity of the community as a whole are not due to Levy correlations. Instead, the correlations in the whole community are “true correlations”, arising as a manifestation of collective behavior at the scale of the entire community.

Figure 6

Fluctuation function of the record of messages sent by any member of POK.

The record is the same as in Fig. 1(e) but in hourly resolution. Prior to applying DFA, the record has been deseasoned according to $x_{\rm tot}(t) \to x_{\rm tot}(t) - \langle x_{\rm tot} \rangle_{t \bmod 24}$. The curves correspond to different DFA orders (DFA0–DFA2, from top to bottom), which determine the detrending capability. DFA2 eliminates linear trends in xtot(t)39. The dotted line at the bottom corresponds to a power law with exponent H = 0.5 and serves as a guide to the eye, while the continuous line at the top represents H = 0.9.

Another surprise appears when we calculate the distribution of inter-event times for the whole community. Here we define the inter-event time as the time between the sending of two consecutive messages by any members of the community. This contrasts with the single-user analysis (Fig. 4), where the inter-event time is defined as the time between two events of the same user. In a sense, P(dt) for the entire community captures the collective behavior emerging from the entire community as information travels through the network.
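The difference between the two definitions can be made explicit with a small sketch, assuming messages are given as (sender, timestamp) pairs; the function names are hypothetical.

```python
from collections import defaultdict


def individual_inter_event_times(messages):
    """Inter-event times between successive messages of the same sender, pooled over senders."""
    by_sender = defaultdict(list)
    for sender, t in messages:
        by_sender[sender].append(t)
    out = []
    for ts in by_sender.values():
        ts.sort()
        out.extend(b - a for a, b in zip(ts, ts[1:]))
    return out


def community_inter_event_times(messages):
    """Inter-event times between successive messages of any sender."""
    ts = sorted(t for _, t in messages)
    return [b - a for a, b in zip(ts, ts[1:])]
```

Because the community sequence interleaves the messages of all members, its inter-event times are much shorter than the individual ones, consistent with the narrower P(dt) reported below.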

Figure 7 displays the resulting probability density. We find a plateau up to 50 seconds followed by a power-law decay according to Eq. (3) with µ ≈ 2.25. Thus, the inter-event time distribution of the community as a whole is also of Levy type, like that of single users, albeit with a larger exponent. The larger exponent reflects the fact that P(dt) is narrower for the community than for individuals, as expected, since the pooled messages of many members are much denser in time.

Figure 7

Probability density of inter-event times dt between successive messages sent by any member of POK in seconds.

The dotted straight line corresponds to a power law with exponent −2.25 (i.e. µ = 2.25) and serves as a guide to the eye.

Converting the exponent µ ≈ 2.25 into Hµ via the Levy distribution model, Eq. (4), yields Hµ ≈ 0.88. Surprisingly, Eq. (4) thus seems to account for the persistence, as it does for individual activity. However, the main evidence of Fig. 6, namely that the correlations vanish when we shuffle the data, shows that even though Eq. (4) provides a good estimate of H, the long-term correlations are due to ‘real’ correlations and are not an artifact of integrating Levy type activity with DFA.

The long-term correlations found in the behavior of the entire community are more understandable than those in the activity of single members, since the activity of the community reflects the communication patterns of messages and information flowing through the whole system. The existence of H ≈ 0.9 at the community level, together with the indications that these correlations are real, is an interesting instance of the emergence of critical behavior in the collective dynamics of the system as a whole.

We conclude that, while at the individual level we find Levy correlations, the activity of the whole community comprises ‘real’ correlations, which are due to the (possibly correlated) superposition of the individuals' activities into a collective, self-organized information flow in the system. Such behavior is reminiscent of critical systems at phase transitions.

Discussion

We have studied the timing of communication in a social online community and found long-term persistence in the activity of sending messages both at the single-user level and at the level of the whole community. Furthermore, we have addressed the origin of these long-term correlations, i.e. whether they are Levy type or ‘real’ correlations. While Levy type correlations require power-law distributed inter-event times, ‘real’ long-term correlations are independent of the inter-event time distribution, since they are due to interdependencies in the activity.

Our work thus leaves open the question of the cause of the long-term persistence in the communication patterns at the community level. One possibility is that the temporal correlations are related to correlations in the network structure64,65. The persistence could also be due to social effects, i.e. the dynamics in the social network66 induces persistent fluctuations such as cascades. For example, a group of friends trying to make an appointment sends many messages within a relatively short time67; once they agree, the communication activity among the group drops. The activity patterns of individuals could be understood as a superposition of many such cascades. Alternatively, the persistence could be purely solipsistic, reflecting a state of mind23 that emerges from moods. More research is needed to thoroughly understand the interesting properties of human activity and its motives.

In conclusion, we have determined three exponents to characterize communication activity: (i) H, the fluctuation exponent of the original data; (ii) Hshuf, the fluctuation exponent of the data shuffled while preserving the inter-event times; (iii) Hµ, the fluctuation exponent expected from power-law distributed inter-event times. For single-user activity we find H ≈ Hshuf ≈ Hµ ≈ 0.9, which supports the hypothesis of Levy correlations, while for the collective behavior of the whole community we find H ≈ 0.9 ≠ Hshuf ≈ 0.5, revealing non-trivial long-term correlations and self-organization at the level of the whole system.

Finally, we mention a third scenario, which we leave for future work: the correlations may comprise more complex features. It has been shown that nonlinear correlations in multifractal data sets lead to power-law distributed inter-event times (of peaks over a threshold)68. In fact, in their Fig. 1(c), the authors of68 find a dependence of µ on the total number of events similar to the one we find for Hµ in our Fig. 2. Additional analysis is needed to fully characterize the multifractal properties69,70,71 of communication activity via e-mails or messages in online communities.