## Introduction

Social media have dramatically changed the way people produce, access and consume information [1], and there is increasing evidence that online discussions have the potential to impact society in unprecedented ways [2]. For example, the public debate around the COVID-19 pandemic has been accompanied by the so-called Infodemic, which is affecting the outcome of the vaccination campaign by increasing hesitancy [3,4,5]. Also, online discussions in the Reddit channel r/wallstreetbets induced many individuals to buy GameStop shares in opposition to the shorting operation carried out by hedge funds and professional investors. As a result, the market capitalization of the company increased by more than \$22 billion in just a few days [6]. It is therefore not surprising that there is renewed scientific interest in comprehending the mechanisms that drive information propagation.

Analyses of the propagation of information in social media reveal, at least qualitatively, similarities with other natural phenomena such as the firing of neurons [7,8] and earthquakes [9]. These processes are characterized by bursty activity patterns. The activity consists of point-like events in time, and bursts (or avalanches) of activity are defined as sequences of close-by events. Bursts are separated by long periods of low activity. Activity can be characterized at the macroscopic level by the distributions P(S) and P(T) of the size S and the duration T of avalanches [10,11,12,13,14,15]. In real-world systems P(S) and P(T) have a power-law decay for large values of their argument, i.e., P(S) ~ S^{−τ} and P(T) ~ T^{−α} [7,8,9,12,16,17,18]. This property is interpreted as evidence of the system operating at, or in the vicinity of, a critical point. This statement is supported by the theory of absorbing phase transitions, according to which, if the avalanche dynamics is at a critical point, then P(S) and P(T) must decay as power laws, see Eq. (3). Furthermore, in a process operating at criticality, the average size of avalanches with given duration must obey the hyperscaling relation 〈S〉 ~ T^γ, with γ = (α − 1)/(τ − 1) [16,19,20]. The specific values of the exponents τ and α typically differ across classes of systems. Their actual values are fundamental for the classification of systems into universality classes, i.e., an ontology of processes with conceptual and practical relevance [21].

Universality is the notion that nearly identical avalanche statistics are observed for a multitude of systems governed by different dynamical laws that nevertheless share some basic core mechanisms. Criticality instead refers to the fact that avalanche statistics are characterized by algebraic distributions. Classifying a system within a universality class is informative about the basic core mechanisms that drive the unfolding of the avalanches. Where information propagation (in general, and in online social media) is concerned, the issue of the existence of well-defined universality classes is far from settled. Existing analyses typically study data collected from a single source and over short observation windows. It is often found that distributions of avalanche size and duration obey power laws, but the estimated values of the exponents vary across studies: τ values range between τ ≈ 2 and τ ≈ 4 [13,14,22,23,24], whereas reported α values include α ≈ 3.6 [25] and α ≈ 2.5 [26,27]. Also, empirical studies reporting on correlations between size and duration of avalanches fail to find a power law [28,29]. This variability might be ascribed to multiple operative definitions of avalanches, which can be given in terms of hashtag time series [22,28] as well as reply trees or retweet chains [13,24,30]. Furthermore, regardless of the definition, the temporal resolution can affect the avalanche distribution [12,31].

As a consequence of the variability in the inferred distributions, uncertainty about representative theoretical models remains. In particular, it is an open problem to determine when and if models based on simple contagion are more appropriate than those based on complex contagion to describe the spreading of information online. Stemming from the similarity between the spreading of disease and information, a widely accepted paradigm is that information propagates according to a simple contagion process, where a single exposure to activity may be sufficient for its diffusion [10,13,22,28,32,33]. Simple contagion is at the core of many theoretical models of information propagation used in the literature, all displaying the critical properties of the mean-field branching process (BP), i.e., τ = 3/2 and α = 2 [34,35,36,37], see Methods. However, there are quite a few studies in favor of the complex contagion paradigm [38,39,40,41]. As originally introduced by Centola and Macy, in a complex contagion process the involvement of an individual in the propagation of information requires exposure from multiple acquaintances [42]. Complex contagion is exemplified by models such as the linear threshold model and the Random Field Ising Model (RFIM) [19,43], see Methods. Distinguishing between simple and complex contagion and, possibly, comprehending how they coexist within the same population [44], is fundamental to understanding the spreading of (mis)information in online social media [38,45].

In this work, we perform a large-scale study of (hash)tag time series from Twitter, Telegram, Weibo, Parler, StackOverflow and Delicious [see Methods and Supplementary Information (SI) A for details about the data sets]. We consider a total of 206,972,692 time series. In our study, a time series consists of all posts that carry the same topic identifier, such as a hashtag on Twitter. Taken cumulatively, our time series consist of 905,377,009 events, collected over periods even longer than 10 years. The Twitter data, collected specifically for this work, are fully available together with code to reproduce the results of this paper [46,47]. To define avalanches in a principled fashion, we adopt the approach inspired by percolation theory proposed in Ref. 31, see Methods. We provide evidence that social media share universal statistics of avalanches that are well described by power-law distributions. We also develop a novel statistical technique able to determine the level of criticality and complexity of individual time series, see Methods. We find that nearly 20% of the time series are less than 5% away from criticality. These account for 53% of all events in our data sets. At the aggregate level, each social medium displays a critical behavior that is compatible with the RFIM, indicating that, plausibly, processes compatible with complex contagion may play a preponderant role in information diffusion. A more detailed analysis reveals a more nuanced scenario, where about 50% of the individual time series are better explained in terms of a complex rather than a simple contagion process. A qualitative analysis of the most popular hashtags suggests that information concerning conversational topics, e.g., music or TV shows, spreads according to the rules of simple contagion, whereas information concerning political/societal controversies shows signatures of an underlying complex contagion process.

## Results

### Selection of temporal resolution

Here, an avalanche is defined as a maximal subset of contiguous events in a time series such that two consecutive events are separated by a time interval smaller than Δ. A proper choice Δ* of the time resolution Δ for the specific data set at hand is necessary to avoid significant distortion in the resulting avalanche statistics. This is true for synthetic time series generated by temporal point processes [31], but also for empirical time series such as those analyzed in this paper (see SI E for details). To determine the value of Δ* we use the principled method developed in Ref. 31 that identifies Δ* as the critical point of a one-dimensional percolation model, see Methods for details. Results are presented in Fig. 1. Values of Δ* for each data set are reported in the SI A; they vary substantially across data sets, from Δ* ≈ 1500 s for Twitter to Δ* ≈ 30,000 s for Telegram (Fig. 1b).

Once the time resolution is rescaled according to Δ → Δ/Δ*, the curves of the percolation strength for the different data sets exhibit a nearly identical quantitative behavior, see insets of Fig. 1. This fact suggests the possibility of seeing the propagation of information in social media as a universal process, with Δ* representing the natural resolution for observing information avalanches. Figure 2a, b shows the distributions of avalanche size and duration obtained by setting Δ = Δ*. Figure 2c shows the relation between average size and duration. The collapse of the curves relative to different data sets onto a single curve hints once more, at least when data are considered at the aggregate level, at processes belonging to the same universality class.

### Criticality and universality of avalanche statistics

The avalanche statistics of Fig. 2a–c seem well described by power laws, indicating that the underlying process is (nearly) critical, and that its universality class can be identified by estimating the value of the critical exponents τ, α, and γ, see Eq. (3) [21]. We rely on maximum likelihood estimation for τ and α [48]; linear regression on the logarithm of the relation 〈S〉 ~ T^γ is used to estimate γ. Results are reported in Fig. 2d, see SI C for details. The estimated exponent $$\hat{\tau }$$ is compatible with that of the mean-field RFIM universality class, i.e., τ = 9/4 [19]. The compatibility of the avalanche statistics with those of a homogeneous mean-field model is not surprising, given that in some social media there is no underlying network among users and in others there are mechanisms for the propagation of information that bypass it. For example, in Telegram all users who subscribe to a channel receive all messages sent from any other user of that channel, meaning that there is an all-to-all network among all users of the channel, as in the mean-field version of the RFIM. In StackOverflow there is no underlying network, as users do not follow each other; rather, they search for content using common tools offered by the platform. Even in Twitter, where users have follower–followee relationships, the network can be easily bypassed by the way the platform manages users’ feeds. There is an apparent mismatch between our estimates $$\hat{\alpha }$$ and $$\hat{\gamma }$$ and the RFIM predictions α = 7/2 and γ = 2 due to finite-size effects. To properly address this issue, we performed numerical simulations of the RFIM and measured the maximum likelihood estimators of τ and α. For consistency, we performed the same operation for the BP too. The results of Fig. 2 reveal that, overall, our data are compatible with the phenomenology of the RFIM and not with that of the BP.
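For concreteness, the maximum likelihood estimation of a power-law exponent can be sketched in a few lines. This is a minimal Hill-type estimator that treats avalanche sizes as continuous; it is not the full discrete-distribution pipeline of Ref. 48, and the function name is ours.

```python
import numpy as np

def powerlaw_exponent_mle(sizes, s_min):
    """Hill-type maximum likelihood estimate of tau in P(S) ~ S^{-tau},
    treating avalanche sizes as continuous and keeping only S >= s_min."""
    s = np.asarray([x for x in sizes if x >= s_min], dtype=float)
    return 1.0 + s.size / np.sum(np.log(s / s_min))

# sanity check on synthetic Pareto samples with tau = 2.5 and s_min = 1
rng = np.random.default_rng(0)
s = (1.0 - rng.random(100_000)) ** (-1.0 / 1.5)  # inverse-CDF sampling
print(round(powerlaw_exponent_mle(s, 1.0), 1))  # close to 2.5
```

The estimator follows from maximizing the likelihood of a Pareto density with lower cutoff s_min; its standard error decays as (τ − 1)/√n.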

The proximity of the exponents estimated across different data sets points to the existence of a genuine and distinctive universality class for information propagation in social media when considered at the aggregate level. In particular, this class seems to be different from that of the BP, often invoked as a representative of phenomena related to information diffusion. This universal scaling is a genuine feature of social media: if we repeat the same analysis on time series describing activity in very different types of systems, e.g., brain networks and earthquakes, avalanche size and duration still decay in a power-law fashion, but with radically different exponent values, see SI D for details. In particular, for neuronal avalanches in the brain, we recover exponents compatible with previous studies [8,49,50,51].

### Complexity of avalanche statistics

To assess if the statistical properties obtained on aggregate data are representative of individual time series, we develop a maximum likelihood method to fit the time series against the BP and the RFIM. The technique is inspired by the work of Ref. 48, see Methods for details. The method supports three different tests. First, it establishes the regime of a time series, depending on how the best estimate of the branching ratio parameter $$\hat{n}$$ compares to the critical value nc = 1 for the BP, or how the best estimate of the disorder parameter $$\hat{R}$$ compares to the critical value $${R}_{c}=\sqrt{2/\pi }\simeq 0.8$$ for the RFIM. Second, it evaluates the goodness of the individual fits via their p values. Similarly to the prescription of Ref. 48, we set the threshold for statistical significance equal to p = 0.1. We verified, however, that the outcome of the analysis is not greatly affected by the choice of the threshold value, see SI J. Third, it establishes whether a time series is better modeled by the BP or by the RFIM by comparing their likelihood.

Results of our analysis are reported in Figs. 3 and 4. Our method is applied only to time series that contain at least two avalanches larger than Smin = 10. These two avalanches must also have different sizes, so that P(S) has at least two non-zero values. Tests of robustness for different Smin values are reported in the SI J. In all systems we find that the best fitting parameter assumes values over a broad range, encompassing a large portion of the subcritical phase and the critical point of the models (Fig. 3a, b). The majority of events belong to a minority of time series giving rise to the largest avalanches. As a consequence, the large-scale behavior of each system is mainly determined by those few time series that are fitted in a narrow region of the parameter space close to the critical point for both the BP and the RFIM (insets of Fig. 3a, b). Also, our tests indicate that the vast majority of time series are well described by at least one of the two models (Fig. 4a). The model selection indicates that individual time series are divided into two nearly equally populated classes, one better described by the BP and the other by the RFIM (Fig. 4a). Simple and complex contagion thus coexist in social media, with only a mild dominance of complex over simple contagion (Fig. 3c). The individual-level analysis is not incompatible with the results obtained for the aggregate data (Fig. 2). If we aggregate data only from the time series that we attributed to the class of complex contagion, we consistently recover a power-law scaling compatible with that class for all avalanche sizes, see Fig. 3d. However, the aggregation of time series that are classified in the BP class generates a distribution characterized by a neat crossover from BP scaling for small avalanches to RFIM scaling for large avalanches (Fig. 3d). The mixture produces a universal distribution that is overall more compatible with the RFIM universality class than with the BP class (Fig. 2c).

## Discussion

We showed that temporal patterns characterizing bursts of activity in online social media are conveniently classified into two universality classes. This finding suggests that a few core mechanisms determine the large-scale behavior of information diffusion and that many peculiarities that characterize individual platforms are far less relevant. Also, in contrast with the vast majority of previous studies, where purely diffusive models have been considered [37], we showed that information propagation in social media is often better described by complex contagion dynamics. Complex contagion is here exemplified by the RFIM, an agent-based model of activation originally formulated to describe the para-to-ferromagnetic phase transition in metals [19]. Recast in the language proper to the description of information propagation [52], the RFIM prescribes that each agent (i) has a personal opinion, (ii) is subject to the social influence exerted by the agents she interacts with, and (iii) is also driven by an external force representing the public information about exogenous events. These appear reasonable assumptions for modeling many realistic discussions happening in social media. Figure 4b shows the 30 most popular Twitter hashtags identified by our method in either the simple or the complex contagion class. In the category of simple contagion, we find conversational topics, mostly related to music or cinema/TV shows. Hashtags belonging to the class of complex contagion either display periodic patterns or are related to political/controversial themes. This suggests the existence of a relation between the semantics of hashtags and the universality class of the corresponding time series. This qualitative picture fits with previous studies that have explicitly focused on the semantics of different hashtags in Twitter [45].
For both classes of information avalanches, we inferred that the dynamics underlying their generation is critical, a fact that provides theoretical ground for the surprising but remarkable robustness of our findings. The presence of a large portion of social media content that acquires popularity via complex contagion dynamics calls for a reconsideration of predictive algorithms relying on the temporal characteristics of the signal only, because these algorithms often neglect the semantics of hashtags and, even more frequently, the characteristics of the network over which they spread [53,54,55,56,57]. Both aspects are important for the successful characterization of the process underlying the propagation of information [38,45,58,59]. We further speculate that our results extend beyond the six platforms considered here. If so, there must be a mechanism that explains the universality shown by the data, involving critical dynamics that are independent of the peculiarities implemented in the individual platforms. Understanding where this mechanism is rooted and how to exploit it for the prediction of the propagation of information in online social media remain open challenges for future research.

## Methods

### Data

We build a time series for each (hash)tag appearing in the data at our disposal. A time series contains the times, i.e., {t1, t2, …}, when the (hash)tag is observed in the data.

Specifically, the Twitter data set is composed of 2,353,192,777 tweets corresponding to a 10% random sample of all tweets posted on Twitter during the observation window from October 1 to November 30, 2019. These data were collected via the Indiana University OSoME Decahose stream [60,61]. Telegram time series are extracted from a total of 317,224,715 messages, originally collected in Ref. 62. Parler time series are extracted from a total of 183,062,974 posts, originally collected in Ref. 63. Weibo time series are extracted from 226,841,249 posts, originally collected in Ref. 64. StackOverflow time series are extracted from a total of 46,947,635 questions and answers. Delicious time series were extracted from 7,034,524 user actions, originally collected in Ref. 65. Timestamps always have the temporal resolution of one second, except for the StackOverflow data set, whose temporal resolution is one millisecond.

We pre-process the data so that the number of events per unit time is roughly constant over the whole temporal window considered (see SI A for details) to obtain a corpus of 206,972,692 time series consisting of 905,377,009 total events.

### Selection of the temporal resolution

We follow the same procedure as in Ref. 31. Given a time series {t1, t2, …}, we define an avalanche starting at tb as a sequence of events {tb, tb+1, …, tb+S−1} such that tb − tb−1 > Δ, tb+S − tb+S−1 > Δ and tb+i − tb+i−1 ≤ Δ for all i = 1, …, S − 1, where Δ is the resolution parameter. The size S of an avalanche is the number of events within it, and the duration T is the time lag between the first and last event in the avalanche, i.e., T = tb+S−1 − tb. Depending on the value of Δ, the same time series decomposes into different avalanches.
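The avalanche definition above translates directly into code. The following is a minimal sketch, not the code released with the paper, and the function name is ours.

```python
import numpy as np

def extract_avalanches(times, delta):
    """Split a series of event times into avalanches: maximal runs of
    events whose consecutive inter-event gaps are all <= delta.
    Returns a list of (size S, duration T) pairs."""
    times = np.sort(np.asarray(times, dtype=float))
    if times.size == 0:
        return []
    # positions where a gap larger than delta separates two avalanches
    breaks = np.where(np.diff(times) > delta)[0] + 1
    return [(len(a), float(a[-1] - a[0])) for a in np.split(times, breaks)]

# toy series: two bursts separated by a long quiet period
print(extract_avalanches([0.0, 1.0, 2.5, 100.0, 101.0], delta=5.0))
# [(3, 2.5), (2, 1.0)]
```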

We identify the optimal resolution Δ* as the critical point of a one-dimensional percolation model that is used to describe the time series. Each time series in a data set is considered as an instance of the one-dimensional percolation model. We measure the size SM of the largest avalanche within each time series. We define the percolation strength P∞ and its associated susceptibility χ, respectively, as

$$\begin{array}{l}{P}_{\infty }=\langle {S}_{M}\rangle \\ \chi =\frac{\langle {S}_{M}^{2}\rangle -{\left\langle {S}_{M}\right\rangle }^{2}}{\langle {S}_{M}\rangle }\,,\end{array}$$
(1)

where 〈SM〉 and $$\langle {S}_{M}^{2}\rangle$$ are, respectively, the first and second moments of the distribution of the size of the largest avalanche SM across all time series in a data set. Δ* is computed as the resolution maximizing χ, i.e.,

$${{{\Delta }}}^{* }=\arg \max \,\chi ({{\Delta }})\ .$$
(2)

As time series with only one event introduce an offset in the measure of P∞ and are not informative with respect to the optimal resolution Δ*, i.e., SM = 1 for any Δ in these time series, we remove them from the sample and compute P∞ and χ considering only time series composed of at least two events.
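The selection of Δ* via Eqs. (1) and (2) can be sketched as follows. This is a minimal illustration over a toy grid of Δ values, with illustrative function names, not the pipeline used in the paper.

```python
import numpy as np

def largest_avalanche_size(times, delta):
    """Size S_M of the largest avalanche in one time series at resolution delta."""
    times = np.sort(np.asarray(times, dtype=float))
    breaks = np.where(np.diff(times) > delta)[0] + 1
    return max(len(a) for a in np.split(times, breaks))

def optimal_resolution(series_list, deltas):
    """Return the delta maximizing the susceptibility
    chi = (<S_M^2> - <S_M>^2) / <S_M>, computed over all time series
    with at least two events (single-event series are discarded)."""
    series_list = [s for s in series_list if len(s) >= 2]
    chi = []
    for d in deltas:
        sm = np.array([largest_avalanche_size(s, d) for s in series_list], float)
        chi.append((np.mean(sm ** 2) - np.mean(sm) ** 2) / np.mean(sm))
    return deltas[int(np.argmax(chi))]

# two toy series: chi vanishes when S_M is identical across series and
# peaks at the intermediate resolution where S_M fluctuates
print(optimal_resolution([[0, 1, 10, 11], [0, 1, 2, 3]], [0.5, 2, 20]))  # 2
```

At very small Δ every avalanche fragments (S_M ≈ 1 everywhere) and at very large Δ every series merges into one avalanche, so in both limits χ is suppressed; the susceptibility peaks in between, at the percolation critical point.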

Values of the optimal resolution Δ* are reported in SI A. Note that the avalanche statistics reported in Fig. 2 are obtained considering all avalanches, excluding the largest one of each time series. This choice is due to the well-known fact that in percolation theory the largest cluster obeys different statistics than the finite clusters [66].

### The branching process

In the BP, an initially active individual spreads activity to a random number of peers, who can in turn spread activity further [34]. The process continues for a number T of time steps, or generations, until there is a generation in which no individual further spreads activity. T is the duration of the avalanche. The size S of the avalanche is the total number of individuals activated during the avalanche. The average number of individuals who are activated by a single spreader is the branching ratio n, and the model is critical for n = nc = 1. The branching ratio is the only tunable parameter of the model.

Finite avalanches of activity in the BP obey the laws

$$\begin{array}{l}P(S)={S}^{-\tau }\,{\mathcal{D}}_{S}({S}^{\sigma }n^{\prime} )\\ P(T)={T}^{-\alpha }\,{\mathcal{D}}_{T}({T}^{1/z\nu }n^{\prime} )\\ \langle S\rangle (T)\propto {T}^{\gamma }\,,\end{array}$$
(3)

where 〈⋅〉 denotes the average over different avalanches, and P(S) and P(T) are the probability distributions of S and T, respectively. The functions $${\mathcal{D}}_{S}$$ and $${\mathcal{D}}_{T}$$ are known as scaling functions and introduce corrections at small values of their argument, where we have defined the reduced distance from the critical point $$n^{\prime} =| n-{n}_{c}| /{n}_{c}$$. The BP is characterized by the exponents τ = 3/2, α = 2 and γ = 2. The above exponents are not independent; rather, they are related by γ = 1/(σzν) = (α − 1)/(τ − 1). σ, z and ν are additional critical exponents that we do not explicitly consider in our analysis.
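A minimal simulation of the BP illustrates the avalanche observables S and T. This is an illustrative sketch assuming Poisson-distributed offspring (a common choice, not stated in the text), not the simulation code used in the paper.

```python
import numpy as np

def bp_avalanche(n, rng, max_size=10**7):
    """Simulate one avalanche of a branching process with Poisson(n)
    offspring per spreader. Returns (size S, duration T in generations);
    the size is capped to keep supercritical runs finite."""
    size, duration, active = 1, 1, 1
    while active > 0 and size < max_size:
        # total offspring of a generation with `active` spreaders
        active = rng.poisson(n * active)
        size += active
        if active > 0:
            duration += 1
    return size, duration

# in the subcritical regime the mean avalanche size is 1 / (1 - n)
rng = np.random.default_rng(42)
sizes = [bp_avalanche(0.5, rng)[0] for _ in range(20_000)]
print(round(np.mean(sizes), 1))  # close to 2.0
```

At n = nc = 1 the same routine produces power-law distributed sizes and durations with the exponents τ = 3/2 and α = 2 quoted above (cut off by max_size).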

### The Random Field Ising Model

We consider the mean-field formulation of the zero-temperature RFIM. Agent i is characterized by the state variable yi = ±1 indicating whether the agent is active, yi = +1, or not, yi = −1. Each agent i has a propensity hi ∈ (−∞, +∞) to become active. A large value of hi indicates that the agent is particularly prone to become active. Agents interact by means of ferromagnetic interactions that model social pressure, i.e., active neighbors push an inactive agent to become active. The whole system is further affected by public information that all agents have access to and that pushes users toward becoming active with intensity H ∈ (−∞, +∞). In the initial configuration, all agents are inactive. The external pressure H grows until the agent with the largest hi value becomes active. This change of state can trigger an avalanche of activity in the other nodes. Specifically, agent j becomes active if the following condition is met

$$H+{h}_{j}+{N}^{-1}\mathop{\sum}\limits_{k\ne j}{y}_{k} \, > \, 0\,,$$
(4)

where N is the system size and the mean-field formulation is expressed by the all-to-all interaction. Once in the active state, agents cannot change their state back to inactive. When an avalanche ends, the external pressure H grows again until a new user becomes active and triggers a new avalanche. The field is frozen during the unfolding of avalanches, meaning that avalanches are characterized by a time scale much shorter than the one characterizing the external pressure. In the long-term limit, when H = +∞, all agents become active. The size S of an avalanche is given by the number of users that are activated during the avalanche; its duration T is given by the number of activation rounds characterizing the avalanche.

The stochasticity of the model comes from the random nature of the propensities hi, extracted from a normal distribution with zero mean and standard deviation R. The choice of the normal distribution is quite standard both for ferromagnets and social systems [52]. R is the only tunable parameter of the model, and the model is critical for $$R={R}_{c}=\sqrt{2/\pi }$$. Avalanche statistics obey laws similar to those of Eq. (3). The functional form of the scaling functions, however, is not the same as in the BP; also, their argument is given in terms of the distance from the critical point of the RFIM, i.e., $$n^{\prime} =| n-{n}_{c}| /{n}_{c}$$ is replaced by $$R^{\prime} =| R-{R}_{c}| /{R}_{c}$$. The values of the critical exponents are τ = 9/4, α = 7/2 and γ = 2 [19]. In SI F, we show that the peculiar form of the scaling function $${\mathcal{D}}_{T}$$ introduces strong preasymptotic corrections to the functions P(T) and 〈S〉(T), affecting the measure of α and γ obtained through numerical simulations of the model.
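The quasi-static dynamics described above can be sketched as follows (avalanche sizes only, for brevity). For an inactive agent j with a agents already active, the interaction term in Eq. (4) equals (2a − N + 1)/N, which is what the inner condition below checks; this is an illustrative implementation with names of our choosing, not the code used in the paper.

```python
import numpy as np

def rfim_avalanche_sizes(N, R, rng):
    """Quasi-static sweep of the zero-temperature mean-field RFIM: the
    external field H is raised just enough to activate the most prone
    inactive agent, and the avalanche proceeds at frozen H while
    H + h_j + (2a - N + 1)/N > 0 holds, with a the number of active
    agents. Propensities h_i ~ Normal(0, R). Returns all avalanche sizes."""
    h = np.sort(rng.normal(0.0, R, N))[::-1]  # most prone agents first
    sizes, a = [], 0
    while a < N:
        H = -(h[a] + (2 * a - N + 1) / N) + 1e-12  # trigger agent a
        start = a
        a += 1
        while a < N and H + h[a] + (2 * a - N + 1) / N > 0:
            a += 1
        sizes.append(a - start)
    return sizes

# every agent is activated exactly once over the full sweep
sizes = rfim_avalanche_sizes(1000, 0.5, np.random.default_rng(1))
print(sum(sizes))  # 1000
```

For R below Rc ≈ 0.8, as in the example, the sweep typically contains a macroscopic avalanche spanning a finite fraction of the system, while near Rc the avalanche sizes become power-law distributed with exponent τ = 9/4.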

### Model selection

To ascribe each time series to a dynamical model, we first fit each model individually by maximizing its likelihood. We evaluate the p value of the fits and, if both hypotheses cannot be rejected, we select the best fit via the log-likelihood ratio test.

To perform the fit, we compare the probability distribution P(S) of the avalanche sizes identified in the time series with the conditional distributions of the avalanche size QRFIM(S∣R) and QBP(S∣n), respectively, obtained for the RFIM and the BP for given values of the parameters R and n. The construction of the model distributions Q requires discretizing the parameter space of the models. In this study R varies in the interval [0.025, 2.7] by steps of length dR = 0.025 and n varies in [0.02, 1.7] by steps of length dn = 0.015. dR (dn) represents the uncertainty on the parameter. Instead of sampling avalanches from the model at a precisely given value of R (n), we consider model instances corresponding to R (n) values uniformly distributed over an interval of length dR (dn) centered at R (n). The distribution Q corresponding to a specific value of the model parameter is constructed as the superposition of 500 distributions whose parameter values are randomly sampled from the corresponding interval. Fitting a time series to a model means estimating the best parameter with an accuracy of dR (dn) for the RFIM (BP).

Given the empirical distribution P and the model distributions Q, we evaluate the log-likelihood function

$$L(P\| Q)=\mathop{\sum}\limits_{S\ge {S}_{\min }}P(S)\log [Q(S)]\,.$$
(5)

The summation is performed over all avalanches with S ≥ Smin, a parameter we vary in our analysis. The distributions P and Q are normalized over the interval [Smin, ∞) to account for this fact. The best fit is obtained by finding the parameter value that maximizes the log-likelihood of Eq. (5). The maximization of the log-likelihood of Eq. (5) is equivalent to the minimization of the cross-entropy of the distribution Q relative to the distribution P. To avoid numerical problems in the estimation of the likelihood, we smooth the function Q. Details are provided in SI G.
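As an illustration, the selection of the best-fitting parameter via Eq. (5) reduces to a cross-entropy minimization over a grid of pre-computed model distributions. A minimal sketch with illustrative names, where P and each Q are dictionaries mapping sizes to probabilities:

```python
import numpy as np

def best_fit_parameter(P, Q_grid, params):
    """Return the parameter whose model distribution maximizes
    L(P||Q) = sum_S P(S) log Q(S), i.e., minimizes the cross-entropy.
    P and each Q map avalanche sizes to probabilities, both already
    normalized over the common support S >= S_min."""
    def log_likelihood(Q):
        # a tiny floor avoids log(0) for sizes absent from Q
        return sum(p * np.log(Q.get(S, 1e-300)) for S, p in P.items())
    return params[int(np.argmax([log_likelihood(Q) for Q in Q_grid]))]

# by Gibbs' inequality the cross-entropy is minimized by Q = P itself
P = {1: 0.5, 2: 0.3, 3: 0.2}
U = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
print(best_fit_parameter(P, [U, P], ["uniform", "match"]))  # match
```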

To assign a p value to a fit, we follow the prescription of Ref. 48. Indicating with Ztail/Z the fraction of avalanches with S ≥ Smin in the fitted time series, a synthetic sample of Z avalanches is created by sampling avalanches with S ≥ Smin from the selected model Q with probability Ztail/Z and by sampling avalanches with S < Smin from the empirical distribution with complementary probability. Each of these synthetic samples is fitted analogously to the original sample obtained from the time series. We compute the Kolmogorov–Smirnov (KS) distance between the empirical distribution P and the selected model Q, as well as between the synthetic samples and their best models. The p value of the fit is defined as the fraction of synthetic samples whose KS distance from the selected model is larger than the KS distance between the real sample and its best model. The hypothesis that the sample has been generated by a certain dynamical model, say the RFIM, cannot be rejected if the p value of the fit to that model is larger than a pre-established significance threshold. We set the threshold to 0.1 in the main text, following the prescription of Ref. 48. Tests of robustness against the choice of this parameter value are reported in SI J.
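A simplified sketch of the KS-based p value follows; it omits the semi-parametric mixing below Smin and the refit of each synthetic sample, evaluates the KS gap only at the sample points, and uses function names of our choosing.

```python
import numpy as np

def ks_distance(sample, model_cdf):
    """Max gap between the empirical CDF of `sample` and a model CDF
    callable, evaluated at the sample points."""
    s = np.sort(np.asarray(sample, dtype=float))
    ecdf = np.arange(1, s.size + 1) / s.size
    return float(np.max(np.abs(ecdf - model_cdf(s))))

def fit_p_value(sample, model_cdf, model_sampler, n_boot=200, rng=None):
    """Fraction of model-generated samples whose KS distance to the model
    exceeds that of the real sample."""
    rng = rng if rng is not None else np.random.default_rng()
    d_obs = ks_distance(sample, model_cdf)
    d_syn = [ks_distance(model_sampler(len(sample), rng), model_cdf)
             for _ in range(n_boot)]
    return float(np.mean([d > d_obs for d in d_syn]))

# a sample concentrated at 0.99 is strongly rejected by a uniform model
p = fit_p_value([0.99] * 50, lambda x: x, lambda n, r: r.random(n),
                n_boot=100, rng=np.random.default_rng(0))
print(p)  # 0.0
```

A sample actually drawn from the model would instead yield a p value that is approximately uniform on [0, 1], so thresholds such as 0.1 reject a well-specified model about 10% of the time.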

If one of the two hypotheses can be rejected but the other cannot, the non-rejected model automatically becomes the selected one. If both hypotheses can be rejected, the time series is classified as “None.” If, instead, neither hypothesis can be rejected, we select as the best model the one with the largest likelihood [48]. We neglect the possibility that a single time series could be described by a mixture of models. Empirical data are fitted only if the time series contains at least 50 events and at least 10 avalanches.

We validate our fitting procedure by applying it to synthetic distributions P generated by the RFIM or by the BP. Results are shown in SI I and confirm the ability of our procedure to identify the ground-truth model and the correct value of the parameter.

More details about the fitting and model selection protocol, including tests of robustness against the threshold on the p value and on Smin, are given in the SI.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.