Introduction

Misinformation has long represented a political, social, and economic risk. The digital age, with its new modes of communication, has exacerbated its reach, while mitigation strategies remain even more uncertain. Indeed, according to the World Economic Forum, massive digital misinformation is one of the main threats to our society1.

The diffusion of social media caused a paradigm shift in the creation and consumption of information: we passed from a selection process mediated by professionals (e.g., journalists) to a largely disintermediated one. Such disintermediation elicits users’ tendencies to a) select information adhering to their system of beliefs – i.e., confirmation bias – and b) form groups of like-minded people in which they polarize their opinions – i.e., echo chambers2,3,4,5,6,7.

In these settings, discussion among like-minded people seems to negatively influence users’ emotions and to reinforce group polarization8,9. Moreover, experimental evidence shows that confirmatory information is accepted even when it contains deliberately false claims10,11,12,13,14, while dissenting information is mainly ignored or may even increase group polarization15. Current solutions, such as debunking efforts or algorithmic solutions based on the reputation of the source, appear to be ineffective16,17. To complicate matters further, users on social media aim at maximizing the number of likes they receive (attention bulimia), so information, concepts, and debate often get flattened and oversimplified. In such a disintermediated environment, public opinion has to deal with a large amount of misleading information that may influence important decisions.

Computational social science18 is a powerful tool for better understanding the cognitive and social dynamics behind misinformation spreading1. Along this path, in the present work we address the evolution of online echo chambers by performing a comparative analysis of two distinct polarized communities on the Italian Facebook, i.e., science and conspiracy. We first analyze the temporal evolution of the size of each community and fit it with classical population growth models from biology and medicine. The behavior of users turns out to be similar for both categories, irrespective of the contents: after an almost exponential growth, both the science and the conspiracy community saturate towards a threshold size, in agreement with classical growth models.

Moreover, we analyze community behavior by accounting for the engagement and the emotional dynamics of users. Indeed, whether a news item, substantiated or not, is accepted as true by a user may be strongly affected by social norms and by how much it coheres with the community’s shared system of beliefs.

Users’ emotional behavior seems to be affected by their engagement within the community: a higher involvement in the echo chamber corresponds to a more negative emotional state. This phenomenon appears in both user categories. Moreover, we observe that, on average, more active users shift towards negativity faster than less active ones.

The paper is structured as follows. First, we analyze the structural evolution of both the science and the conspiracy community on the Italian Facebook. Then we examine sentiment at the level of the single user, and finally we study sentiment contagion within each of the two communities from a macroscopic point of view.

Results and Discussion

Community Evolution

Online social networks may favor the aggregation of individuals into communities of interest. For the particular case of science and conspiracy users on the Italian Facebook (refer to Section Methods for more details on data collection and classification), the emergence of two separate echo chambers has already been shown in a previous study11. However, little is known about the structural evolution of the two communities and about the role of users’ engagement in shaping them. To shed light on the determinants of group formation, as a first step we analyze and compare the temporal evolution of the sizes of the science and conspiracy communities by considering users’ commenting activity.

More in detail, we divide users into three categories:

  • U1 the set of all active users – i.e., all users who commented at least once,

  • U2 the set of all users that commented at least twice, and

  • U5 the set of all users that commented at least five times.

For each set of users we look at the temporal evolution of the science and conspiracy communities, defined as

$$S_i(t) = \left\{\, u \in U_i : \frac{s_u(t)}{n_u(t)} \geq 0.95 \right\}, \qquad C_i(t) = \left\{\, u \in U_i : \frac{c_u(t)}{n_u(t)} \geq 0.95 \right\},$$

where i ∈ {1, 2, 5}, nu(t) is the total number of comments made by user u up to day t, su(t) is the number of comments that user u made on science posts up to day t, cu(t) is the analogous number on conspiracy posts, and t ∈ {1, …, T}, with T equal to the number of days of observation (i.e., 835). The threshold of 0.95 for membership in a community is chosen in accordance with previous studies10,12.
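
As an illustration, this classification can be computed from raw comment data as in the following sketch (the data structure and the helper are hypothetical; the 0.95 threshold and the activity levels defining U1, U2, and U5 are those introduced above):

```python
from collections import defaultdict

THRESHOLD = 0.95             # membership threshold defined above
ACTIVITY_LEVELS = (1, 2, 5)  # U1, U2, U5

def communities_at_day(comments, day):
    """comments: iterable of (user_id, day, category) tuples, with category in
    {"science", "conspiracy"}; returns {i: (S_i, C_i)} computed up to `day`."""
    n = defaultdict(int)  # n_u: total comments of user u up to `day`
    s = defaultdict(int)  # s_u: comments on science posts
    c = defaultdict(int)  # c_u: comments on conspiracy posts
    for user, d, category in comments:
        if d > day:
            continue
        n[user] += 1
        if category == "science":
            s[user] += 1
        elif category == "conspiracy":
            c[user] += 1
    result = {}
    for i in ACTIVITY_LEVELS:
        U_i = {u for u, total in n.items() if total >= i}
        S_i = {u for u in U_i if s[u] / n[u] >= THRESHOLD}
        C_i = {u for u in U_i if c[u] / n[u] >= THRESHOLD}
        result[i] = (S_i, C_i)
    return result
```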

Figure 1(a) shows the temporal evolution of the size of the communities resulting from the previous classification. The dataset has been sampled at daily resolution over the period January 2010–April 2012, for a total of 835 days of observation. A similar global behavior emerges in all cases, although significant quantitative differences arise between C1 (or C2) and C5, as well as between C1 (or C2) and Si, i ∈ {1, 2, 5}. This phenomenon may be linked to the abundance of low-activity users inside the conspiracy communities, and for this reason in the next sections we restrict our attention to the most active communities, S5 and C5. We also compared the six sample distributions pairwise by means of the Kolmogorov-Smirnov test (see Table 1 for the test results). For each user typology, we reject the null hypothesis of equivalence between the science and conspiracy distributions at the 99% confidence level.
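
The pairwise comparisons can be reproduced with a standard two-sample Kolmogorov-Smirnov test; a minimal sketch is given below (the file names holding the six daily size series are hypothetical):

```python
from itertools import combinations

import numpy as np
from scipy.stats import ks_2samp

# Hypothetical files: one daily size series of length T = 835 per community.
size_series = {
    name: np.loadtxt(f"{name}_daily_size.txt")
    for name in ("S1", "S2", "S5", "C1", "C2", "C5")
}

for (name_a, a), (name_b, b) in combinations(size_series.items(), 2):
    stat, p_value = ks_2samp(a, b)
    # Reject equivalence of the two sample distributions at the 1% level.
    verdict = "reject" if p_value < 0.01 else "fail to reject"
    print(f"{name_a} vs {name_b}: D = {stat:.3f}, p = {p_value:.3g} -> {verdict}")
```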

Table 1 Results from Kolmogorov-Smirnov tests.
Figure 1

(a) Temporal evolution of the size of the communities S1 (solid violet), S2 (dotted orange), S5 (dashed pink), C1 (solid blue), C2 (dotted sea green), and C5 (dashed green). (b) Boxplots of the users’ mobility within each group. From left to right, results for C1, S1, C2, S2, C5, and S5.

In Fig. 1(b) we report summary statistics for the users’ mobility within each community by means of box-and-whisker plots19 (boxplots, for short). Black horizontal lines represent the median number of users entering or exiting the science and conspiracy communities. The colored boxes represent the interquartile ranges (i.e., the 25th–75th percentile ranges), which measure the dispersion and the skewness of each distribution: users entering the science and conspiracy communities (violet and blue boxes, respectively) and users exiting each community (green and orange boxes for science and conspiracy, respectively). The vertical lines (i.e., the whiskers) extend to the minimum and maximum values of the corresponding distribution once outliers and extreme values are removed from the data. Individual points represent the outliers of each distribution. From left to right, each set of boxplots corresponds to one user category (i.e., U1, U2, and U5).
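
The daily flows summarized by these boxplots can be obtained from the day-by-day membership sets; the sketch below assumes hypothetical `membership` mappings from day to the set of member ids (e.g., built with the classification sketch above):

```python
import matplotlib.pyplot as plt

def daily_flows(membership):
    """membership: dict {day: set of user ids}, with consecutive days.
    Returns the number of users entering and exiting the community per day."""
    days = sorted(membership)
    entering, exiting = [], []
    for prev, curr in zip(days, days[1:]):
        entering.append(len(membership[curr] - membership[prev]))
        exiting.append(len(membership[prev] - membership[curr]))
    return entering, exiting

# membership_C5 and membership_S5 are hypothetical day -> members mappings.
in_c5, out_c5 = daily_flows(membership_C5)
in_s5, out_s5 = daily_flows(membership_S5)

plt.boxplot([in_c5, out_c5, in_s5, out_s5])
plt.xticks([1, 2, 3, 4], ["C5 in", "C5 out", "S5 in", "S5 out"])
plt.ylabel("users per day")
plt.show()
```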

In all cases we notice a significant difference between users entering and users exiting a community, in favor of the former: more than 99% of the users’ flow consists of users entering a community.

These two results underline that the behavior of users is similar for both categories, irrespective of the contents. After an initial spike-like growth, the communities evolve at a nearly constant rate. Moreover, once a user enters a community, the probability of leaving it is very small.

To better characterize the temporal evolution of both communities, we fit the Gompertz growth model (GM) in (5), the Logistic models (LM3, LM5) in (6), and the Log-logistic model (LLM) in (7) to our sample distributions S5 and C5, which represent the temporal profiles of the more active users, i.e. those with at least 5 comments, affiliated with the science or the conspiracy community, respectively. The models are chosen on the basis of the observed evolution of the communities’ sizes, which is characterized by a first phase of rapid, approximately exponential growth followed by a more gradual one.

For each model we estimate its parameters through Nonlinear Least Squares (NLS; see Section Methods for more details on the fitting procedure). Fit results are shown in Fig. 2 for both science (panel a) and conspiracy (panel b). Four fits are superimposed on the original data (green dotted line): GM (bold orange line), LM3 (dotted violet line), LM5 (dashed-dotted blue line), and LLM (dashed purple line).

Figure 2

Fit of the temporal evolution of the size of science (a) and conspiracy (b) communities. We fitted the data with four growth models: GM (bold orange line), LM3 (dotted violet line), LM5 (dashed-dotted blue line), and LLM (dashed purple line).

As can be seen in Fig. 2, all models approximate the temporal evolution of the science and conspiracy community sizes well. To assess the quality of each fit and to identify the best one, we perform a series of Kolmogorov-Smirnov (KS) tests between the real data and each of the synthetic distributions, together with Maximum Likelihood Estimation (MLE). Results of the KS tests are reported in Table 2. At a significance level α = 0.01, we fail to reject the null hypothesis of equivalence of the two distributions in all cases. The Logistic model maximizes the log-likelihood for both S5 and C5.

Table 2 Results from Kolmogorov-Smirnov test.

The S-shaped behavior observed in the raw data, and then characterized by the growth-model fits, is reminiscent of the one observed in population growth, where after a first stage of rapid growth a saturation level is reached and the population stabilizes. Logistic and Gompertz growth models have found application in several fields, ranging from demography and sociology to biology and ecology20,21.

As the fit results suggest, both the science and the conspiracy community saturate towards a threshold size. Users who are deeply engaged in a community are more likely to become focused on a particular topic, and their increasing involvement in highly specific topics “isolates” them from the neighboring environment, which in this case is the broader world of knowledge. Interestingly, the conspiracy and science communities show the same size profiles.

To further assess the reliability of the model fits, we inspect the time evolution of the S5 and C5 community sizes with advanced spectral methodologies, which are useful to uncover significant oscillatory components beyond the strong growing trend that dominates both series. More precisely, we aim to identify trends, oscillatory components (both periodic and non-periodic), and background noise in our series, and finally to reconstruct the embedded true signal by summing the contributions of all its significant components. We chose non-parametric methods, such as singular-spectrum analysis and related methodologies22,23, in order to analyze the temporal evolution of our records with an approach that does not rely on fitting an assumed model to the data, with the goal of supporting the model fits with a completely different method. Indeed, the simultaneous and flexible application of more than one spectral tool ensures a reliable and robust analysis of temporal dynamics, especially when the signal-to-noise ratio is low and the sample length is finite. Moreover, a Monte-Carlo SSA (MCSSA)24,25 test is applied to assess the significance of the revealed oscillatory modes against both white and red background-noise null hypotheses. The reader is referred to Section Methods for further details on the applied methodology.

The behavior of both the conspiracy and the science time series turns out to be described by the first two temporal principal components (T-PCs), which in this case correspond to the trend. More in detail, the trends capture 96.16% and 95.44% of the total variance of the S5 and C5 series, respectively. We also extracted the significant reconstructed signals from the series and observed that they are quite similar to the trends (except for some boundary effects due to the finite sample length). Figure 3 shows the trends (dotted violet lines) and the reconstructed signals (dashed green lines) superimposed on the evolution in time of the S5 (panel a) and C5 (panel b) community sizes (orange lines). Boundary effects are visible, especially at the beginning of the series, but are negligible. The trends capture both the S5 and C5 temporal profiles and largely coincide with the reconstructed significant signals in both cases.

Figure 3

S5 (a) and C5 (b) dominant spectral components. Original series are shown in orange lines, trends in dotted violet lines, and significant signal reconstructions in dashed green lines.

As a further check, we pre-process the data, first by removing the trend and then by standardizing the resulting residual time series by the trend. Pre-processing is required because such a pervasive trend produces a high peak at zero frequency that dominates the power spectrum estimate and can hide possible higher-frequency cycles. No significant cycle is detected in the S5 and C5 series after trend removal. Figure 4 shows the detrended S5 and C5 time series (panel a) and the residual time series standardized by their trends (panel b). The apparent oscillating behavior visible in the raw data and in the detrended series (especially in C5, Fig. 4a) does not correspond to significant oscillatory modes, according to the Monte-Carlo SSA test. Moreover, both communities show a smoother profile after Jan 2011 (Fig. 4b), corresponding to the range t > 600 in Fig. 2; at that time, the growth of both S5 and C5 starts to slow down. Note that both pre-processed time series in Fig. 4b are standardized to zero mean and unit variance to ease the visual comparison between S5 and C5.
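
A minimal sketch of this pre-processing, assuming that `series` and its SSA-extracted `trend` are NumPy arrays of equal length and that “standardizing by the trend” means dividing the residual by the trend value:

```python
import numpy as np

def preprocess(series, trend):
    """Detrend, standardize the residual by the trend, and rescale to zero
    mean and unit variance (as in Fig. 4b). Assumes the trend is strictly positive."""
    detrended = series - trend        # Fig. 4a
    by_trend = detrended / trend      # residual relative to the trend
    return detrended, (by_trend - by_trend.mean()) / by_trend.std()
```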

Figure 4

Pre-processing procedure.

(a) Detrended S5 (solid pink) and C5 (solid green). (b) Standardized-by-trend S5 (solid pink) and C5 (solid green) residual time series. In panel b, the pre-processed series are standardized to zero mean and unit variance.

We can therefore infer that the trends alone determine the time evolution of our records. Thus, we compare the previously described model fits (GM, LM3, LM5, LLM) to the S5 and C5 trends only. No particular difference emerges between the science and conspiracy communities in terms of their growth, and the linear correlation between each community’s trend and each fitted model turns out to be very high in all cases, preventing us from identifying a clearly preferred fit, in agreement with the results reported in Table 2. The Pearson correlation coefficient is appropriate here since no significant cycle emerges from the spectral analysis of the S5 and C5 size records, which reduces the risk of underestimating a possible correlation at time-shifted versions of the original series.

Our analysis thus suggests that the two communities present strong similarities and that the behavior of users inside each of them is similar. Once they have selected their preferred group, users seem to undergo community dynamics that are similar in both the science and the conspiracy case, irrespective of the content.

Users’ Sentiment Analysis

We now zoom in on the emotional dynamics of the polarized groups. We approximate the emotional attitude of a user towards a piece of information they commented on by the sentiment of the comment text. We label the sentiment of each comment as negative (−1), neutral (0), or positive (+1). The classification is performed automatically by supervised machine learning; refer to Section Methods or to ref. 9 for more details.

Our aim is to characterize the emotional behavior of users as a function of their involvement inside the community. To do this we define three measures: the mean user sentiment (σi), the mean negative/positive difference of comments (δNP(i)), and the user sentiment polarization (ρσ(i)). The mean negative/positive difference of comments is defined as

$$\delta_{NP}(i) = \frac{1}{T_i}\sum_{j=1}^{T_i}\left[\mathrm{Neg}_j(i) - \mathrm{Pos}_j(i)\right],$$

where Ti is the number of days in which user i was active, Negj(i) is the number of i’s negative comments in day j, and Posj(i) is the number of i’s positive comments in day j. The user sentiment polarization ρσ(i) is defined in terms of Ni, ki, hi, which are respectively the number of all, negative, and neutral comments left by user i, and of li = Ni − ki − hi, the number of positive ones. Note that ρσ(i) ∈ [−1, 1], that it is equal to 0 if and only if li = ki or hi = Ni, that it is equal to 1 if and only if ki = Ni, and that it is equal to −1 if and only if li = Ni. Finally, σi is simply defined as the mean of the sentiment of all comments left by user i.
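
The three measures can be computed per user as in the sketch below; the closed form used for ρσ(i), namely (ki − li)/Ni, is an assumption that satisfies the properties listed above and stands in for the exact expression of the paper:

```python
from collections import defaultdict

def user_measures(comments):
    """comments: iterable of (user_id, day, sentiment), sentiment in {-1, 0, +1}.
    Returns {user_id: (sigma, delta_NP, rho)}."""
    by_user_day = defaultdict(lambda: defaultdict(list))
    for user, day, sent in comments:
        by_user_day[user][day].append(sent)

    measures = {}
    for user, days in by_user_day.items():
        all_s = [sent for day in days.values() for sent in day]
        N = len(all_s)
        k = sum(1 for sent in all_s if sent == -1)  # negative comments
        h = sum(1 for sent in all_s if sent == 0)   # neutral comments
        l = N - k - h                               # positive comments
        sigma = sum(all_s) / N                      # mean user sentiment
        # mean daily negative/positive difference over the T_i active days
        delta_np = sum(
            sum(1 for s in day if s == -1) - sum(1 for s in day if s == 1)
            for day in days.values()
        ) / len(days)
        rho = (k - l) / N                           # assumed closed form
        measures[user] = (sigma, delta_np, rho)
    return measures
```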

Figure 5 shows the average sentiment σi for all users (panel a), science users (panel b), and conspiracy users (panel c), as a function of the user engagement – i.e., the total number of comments left by each user. In the insets we report, for each of the three categories, the value of σi as a function of the number of comments for the most active users, i.e. those with at least 100 comments. We then regress the mean user sentiment σi against the logarithm of the number of comments. When all users are considered, σi becomes more negative as the number of comments increases, in all three panels. However, when we restrict our attention to the most active users, σi becomes more negative with increasing activity only in the science case, while the opposite holds in the other two cases.
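
The regressions reported in Figs 5, 6 and 7 are ordinary least-squares fits against the logarithm of the number of comments; a sketch, assuming `engagement` and `sigma` are NumPy arrays indexed by user, is:

```python
import numpy as np
from scipy.stats import linregress

# engagement[i]: total comments of user i; sigma[i]: mean sentiment of user i.
mask = engagement >= 1
slope, intercept, r, p, stderr = linregress(np.log(engagement[mask]), sigma[mask])
print(f"sigma ~ {intercept:.3f} + {slope:.3f} * log(comments)  (p = {p:.3g})")

# Same regression restricted to the most active users (at least 100 comments).
active = engagement >= 100
slope_a, intercept_a, *_ = linregress(np.log(engagement[active]), sigma[active])
```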

Figure 5

Mean final sentiment σi of all users (a), science users (b), and conspiracy users (c), as a function of the user engagement. In the insets we report the value of σi for those users with at least 100 comments.

Figure 6 shows the mean negative/positive difference of comments δNP(i) of all users (panel a), science users (panel b), and conspiracy users (panel c), as a function of the user engagement. In the insets we report, for each of the three categories, the value of δNP(i) as a function of the number of comments for those users with at least 100 comments. We regressed the mean negative/positive difference δNP(i) against the logarithm of the number of comments. δNP(i) measures the mean shift away from a situation of neutral equilibrium, in which the user either has only neutral comments or has the same number of positive and negative comments. A positive value of δNP(i) indicates that the user tends to have, on average, more negative than positive comments. From Fig. 6 we notice that δNP(i) tends to increase as the number of comments increases in all cases, underlining the fact that, on average, more active users shift towards negativity faster than less active ones. The rate of this increase in negativity is higher for users with more than 100 comments, and it is also higher for science users than for conspiracy users.

Figure 6

Mean negative/positive difference δNP(i) of all users (a), science users (b), and conspiracy users (c), as a function of the user engagement. In the insets we report the value of δNP(i) for those users with at least 100 comments.

Figure 7 displays the user sentiment polarization ρσ(i) of all users (panel a), science users (panel b), and conspiracy users (panel c), as a function of the user engagement. In the insets we show, for each of the three categories, the value of ρσ(i) as a function of the number of comments for those users with at least 100 comments. We regressed the user sentiment polarization ρσ(i) against the logarithm of the number of comments. Recall that ρσ(i) ranges in [−1, 1]: it is equal to 0 either if all comments are neutral or if there is the same number of negative and positive comments, and it tends towards the extremes when hi is small enough and one of ki, li strongly dominates the other (it equals 1 when all comments are negative and −1 when all comments are positive). Science users generally show a higher value of ρσ(i); however, conspiracy users with at least 100 comments tend to increase it with activity, in contrast with science ones.

Figure 7

User’s sentiment polarization ρσ(i) of all users (a), science users (b), and conspiracy users (c), as a function of the user engagement. In the insets we report the value of ρσ(i) for those users with at least 100 comments.

Engagement within the echo chamber affects users’ emotional dynamics: the more active a user is, the higher the tendency to express negative emotions when commenting. This holds for both user categories. Moreover, for both categories we observe that, on average, more active users shift towards negativity faster than less active ones. The rate of this increase in negativity is higher for users with more than 100 comments, and it is higher for science users than for conspiracy users. In terms of users’ sentiment polarization we observe some differences between the two categories: its value is generally higher for science users; however, very active science users tend to decrease their sentiment polarization as their activity increases, while conspiracy users tend to increase it.

Evolution of the Sentiment inside the Communities

We now focus on the collective sentiment of the two communities rather than on that of single users. Similarly to the single-user case, we define the community negative/positive difference of comments, δNP(C), in terms of T, the number of days of observation, Negj(C) and Posj(C), the numbers of negative and positive comments posted by users belonging to community C during day j, and MC, the maximum daily activity of community C, with C ∈ {Science, Conspiracy}. Analogously, the mean community sentiment polarization ρσ(C) is defined in terms of NC, kC, hC, which are respectively the number of all, negative, and neutral comments left by users of community C, and of lC = NC − kC − hC, the number of positive ones. Note that ρσ(C) ∈ [−1, 1].
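
A per-community sketch follows; as in the user-level case, the daily normalization by MC and the closed form (kC − lC)/NC for ρσ(C) are assumptions consistent with the description above:

```python
from collections import defaultdict

def community_measures(comments):
    """comments: iterable of (day, sentiment) for one community C, sentiment in
    {-1, 0, +1}. Returns (daily delta_NP values, their mean, rho)."""
    neg, pos, tot = defaultdict(int), defaultdict(int), defaultdict(int)
    k = h = N = 0
    for day, sent in comments:
        tot[day] += 1
        N += 1
        if sent == -1:
            neg[day] += 1
            k += 1
        elif sent == 0:
            h += 1
        else:
            pos[day] += 1
    M = max(tot.values())                 # maximum daily activity M_C
    days = sorted(tot)
    daily_delta = {d: (neg[d] - pos[d]) / M for d in days}   # assumed form
    mean_delta = sum(daily_delta.values()) / len(days)
    l = N - k - h
    rho = (k - l) / N                     # assumed closed form, as for users
    return daily_delta, mean_delta, rho
```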

Figure 8 displays the community negative/positive difference of comments as a function of the daily community activity for science users (left: panels a,c) and conspiracy users (right: panels b,d). The top panels (a,b) show the values obtained considering all users in the communities, while the bottom ones (c,d) consider only those users with at least 100 comments. We regressed the community negative/positive difference of comments (y-axes) against the logarithm of the number of comments posted inside the community on a given day (x-axes). The difference tends to increase for both communities; the science community shows a higher rate of increase in the most-active case, while the conspiracy community shows a higher rate in the general case.

Figure 8

Community negative/positive difference of comments as a function of the daily community activity for science users (a,c) and conspiracy users (b,d), for all users (a,b) and users with at least 100 comments (c,d).

Figure 9 shows the mean community sentiment polarization ρσ(C) as a function of the daily community activity for science users (left: panels a,c) and conspiracy users (right: panels b,d). The top panels (a,b) display the values obtained considering all users in the communities, while the bottom ones (c,d) consider only those users with at least 100 comments. As for Fig. 8, we regressed the mean community sentiment polarization against the logarithm of the number of comments posted inside the community on a given day. For the conspiracy community we notice a decrease in the value of ρσ(C) as the number of comments increases; moreover, this decrease is stronger when only the most active users are considered. The science community instead shows a decrease in the value of ρσ(C) for the most active users and a slight increase in the general case.

Figure 9

Mean community sentiment polarization as a function of the daily community activity for science users (a,c) and conspiracy users (b,d), for all users (a,b) and users with at least 100 comments (c,d).

The community sentiment is also affected by the cumulative activity of users (in terms of comments): when either community is more active, the shift towards negative comments is larger. A difference between the two echo chambers emerges if we restrict our attention to the most active users, i.e. those with at least 100 comments: in this case, science users show a higher rate of increase than conspiracy ones, contrary to the general case. Differently from the single-user case, the community sentiment polarization shows a marked decrease with higher activity in the conspiracy community; for the science community the process is slower when we consider only the most active users, and it is even reversed in the general case.

Conclusions

The Facebook environment is particularly suited for the emergence of polarized communities, or echo chambers, in which activity is limited to only one type of content. In this work, we characterize the behavior of users inside the echo chambers and the structural evolution of the communities, accounting for both users’ activity and the sentiment they express.

We first study the evolution of the size of the two communities by fitting daily-resolution data with three growth models, i.e. the Gompertz, Logistic, and Log-logistic models. We observe that both communities evolve in a similar way and that the behavior of users is similar irrespective of the difference in contents: after a first phase of rapid, approximately exponential growth, both community sizes grow more gradually until a threshold value is reached. This lack of exchange with the surrounding environment can be associated with the users’ extreme focus on a specific topic.

We then observe that both the users’ and the communities’ emotional behavior is affected by the users’ involvement in the echo chamber: a higher involvement corresponds to a more negative attitude. Moreover, for both categories, more active users show on average a faster shift towards negativity than less active ones. The rate of this increase is higher for users with more than 100 comments and higher for science users than for conspiracy users. The community sentiment polarization shows a stronger decrease with higher activity in the conspiracy community; for the science community the process is slower when only the most active users are considered, and it is even reversed in the general case.

Methods

Ethics Statement

The data collection process was carried out using the Facebook Graph application programming interface (API)26, which is publicly available. For the analysis (according to the specification settings of the API) we only used publicly available data (thus users with privacy restrictions are not included in the dataset). The pages from which we downloaded data are public Facebook entities and can be accessed by anyone. User content contributing to these pages is also public unless the user’s privacy settings specify otherwise, in which case it is not available to us.

Data Collection and Description

Using the approach described in ref. 10, with the support of several Facebook groups very active in debunking misinformation (Protesi di Complotto, Che vuol dire reale, La menzogna diventa verità e passa alla storia), we identified two main categories of pages: conspiracy theories, i.e., pages promoting contents neglected by mainstream media, and science information, i.e., pages diffusing scientific news and research advances whose sources are easy to check. Starting from this basic differentiation, we categorized the Facebook pages according to their contents and their self-description. The resulting dataset is composed of 73 public Italian Facebook pages, 34 diffusing scientific information and 39 diffusing conspiracy theories, and covers a timespan of 5 years, from 2010 to 2014. Table 3 summarizes the details of our data collection.

Table 3 Dataset description.

Growth Models

The Gompertz Growth Model is often used to describe growth phenomena characterized by a saturating, asymptotic behaviour rather than by unbounded growth. In this sense, the Gompertz function can be regarded as a special case of the more general logistic family, and it is nowadays applied in various research fields, such as biology, ecology, economics, marketing, and medicine. In oncology, in particular, the Gompertz sigmoid function has been used to model tumor growth27,28, interpreted as the expansion of a cellular population developing in a confined space where the availability of nutrients is limited. The model has two parameters: a, describing the intrinsic growth of the tumor, related to the mitosis rate, and b, describing the growth deceleration due to antiangiogenic processes. Let x(t) be the size of the tumor at time t; then

$$\frac{dx}{dt} = a\,x - b\,x\ln x.$$

For a given initial condition x(0) = x0 and known parameters a and b, the solution27 is

$$x(t) = \exp\!\left[\frac{a}{b} + \left(\ln x_0 - \frac{a}{b}\right)e^{-bt}\right].$$

The most general Logistic Growth Model that we use is given in (6). In that case, the first stage of growth is approximately exponential, and the growth rate then decreases until an asymptotic value is reached; the right-hand asymptote is approached less gradually than in the case of the Gompertz function. We used two variants of the Logistic model to fit our data: L3, which considers only the parameters (b, d, f) in (6), and L5, which is exactly (6).

Finally, the Log-Logistic Growth Model, given in (7), is the analogue of the Logistic model with time entering on a logarithmic scale.
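
For concreteness, the sketch below implements the Gompertz solution above together with commonly used three- and five-parameter logistic curves and a five-parameter log-logistic curve (lower asymptote c, upper asymptote d, slope b, location e, asymmetry f); these parametrizations are assumptions and need not coincide exactly with the forms in (6) and (7).

```python
import numpy as np

def gompertz(t, a, b, x0):
    """Solution of dx/dt = a*x - b*x*ln(x) with x(0) = x0."""
    return np.exp(a / b + (np.log(x0) - a / b) * np.exp(-b * t))

def logistic5(t, b, c, d, e, f):
    """Five-parameter logistic (assumed parametrization)."""
    return c + (d - c) / (1.0 + np.exp(b * (t - e))) ** f

def logistic3(t, b, d, e):
    """Three-parameter special case (c = 0, f = 1)."""
    return logistic5(t, b, 0.0, d, e, 1.0)

def loglogistic5(t, b, c, d, e, f):
    """Five-parameter log-logistic: same shape with time on a log scale (t > 0)."""
    return c + (d - c) / (1.0 + np.exp(b * (np.log(t) - np.log(e)))) ** f
```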

Nonlinear Least Square Fitting and Goodness of Fit

We use Nonlinear Least Squares (NLS)28 to estimate the parameters of the various models when fitting them to our data. Consider a set of n observations (t1, x1), …, (tn, xn) and a model function x = f(t, β) depending on m parameters β = (β1, …, βm), with n ≥ m. We want to find the vector β that minimizes the sum of squares

$$S(\beta) = \sum_{i=1}^{n} r_i^{2},$$

where the residuals ri are given by

$$r_i = x_i - f(t_i, \beta)$$

for i = 1, 2, …, n.

We tested the goodness of the fits by means of the Kolmogorov-Smirnov test.
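
A minimal fitting sketch with SciPy, using the Gompertz function defined above; `t` and `size` (the day index and the observed daily community size) are assumed to be NumPy arrays, and the initial guesses are purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import ks_2samp

p0 = (0.05, 0.01, max(size[0], 1.0))      # illustrative initial guess (a, b, x0)
params, cov = curve_fit(gompertz, t, size, p0=p0, maxfev=20000)
fitted = gompertz(t, *params)

# Residual sum of squares S(beta) and KS comparison between data and fit.
rss = np.sum((size - fitted) ** 2)
stat, p_value = ks_2samp(size, fitted)
print(f"a, b, x0 = {params}, RSS = {rss:.1f}, KS p-value = {p_value:.3f}")
```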

Advanced spectral analysis and trend extraction procedure

Singular-spectrum analysis (SSA) is a non-conventional spectral analysis method that provides insight into the unknown or partially known dynamics of a dynamical system22,23. More in detail, SSA decomposes the signal into a linear combination of variability modes, which are data-adaptive functions of time. Thus, unlike more traditional spectral approaches such as the classical Fourier decomposition, SSA does not rely on variability modes that are necessarily harmonic components. As a consequence, SSA provides a powerful de-noising filter that identifies the different components of the analyzed signal (trends, oscillatory patterns, harmonic and/or anharmonic oscillations, quasi-periodic phenomena) without making any assumption about the model generating the observed signal29. Moreover, SSA does not require any particular stationarity or ergodicity conditions.
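
A minimal SSA sketch (embedding into a trajectory matrix, singular value decomposition, and reconstruction of a chosen group of components by diagonal averaging); the series name and the window length used in the example call are illustrative:

```python
import numpy as np

def ssa(series, window):
    """Return the eigenvalues and a reconstruction function for the series."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    k = n - window + 1
    X = np.column_stack([x[i:i + window] for i in range(k)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s ** 2

    def reconstruct(components):
        """Diagonal averaging of the selected elementary matrices."""
        Xr = sum(s[j] * np.outer(U[:, j], Vt[j]) for j in components)
        rec, counts = np.zeros(n), np.zeros(n)
        for i in range(window):
            for j in range(k):
                rec[i + j] += Xr[i, j]
                counts[i + j] += 1
        return rec / counts

    return eigenvalues, reconstruct

# Example: trend of S5 as the sum of the first two components.
eigvals, reconstruct = ssa(s5_size, window=120)
trend = reconstruct([0, 1])
```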

In order to distinguish significant signal from random fluctuations (i.e., background noise), Monte-Carlo SSA (MCSSA) is applied. MCSSA is based on a Monte Carlo approach to the signal-to-noise separation problem, designed to overcome the limitations of the classical signal-extraction procedure, namely the identification of a simple gap in the eigenvalue spectrum24. Recent refinements of the method have been proposed to further improve robustness and reliability for short time series25. In the present work, MCSSA is applied to the S5 and C5 time series to establish whether they are distinguishable from linear stochastic processes, which are usually considered as noise. Both white- and red-noise null hypotheses are taken into account, since the most suitable kind of noise for social-science series analyzed with advanced spectral methodologies is still under debate.
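
In the same spirit, a simplified significance test against a red-noise (AR(1)) null hypothesis can be sketched as follows: surrogate AR(1) series matching the lag-1 autocorrelation and variance of the data are projected onto the data eigenvectors, and data eigenvalues exceeding the surrogate ensemble quantile are flagged as significant. This is only a stand-in for the full MCSSA procedure of refs 24 and 25.

```python
import numpy as np

def mcssa_test(series, window, n_surrogates=500, quantile=97.5, seed=0):
    """Flag SSA eigenvalues exceeding the AR(1) surrogate ensemble quantile."""
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n, k = len(x), len(x) - window + 1

    def lag_cov(y):
        Y = np.column_stack([y[i:i + window] for i in range(k)])
        return Y @ Y.T / k

    # Eigen-decomposition of the data lag-covariance matrix.
    eigval, eigvec = np.linalg.eigh(lag_cov(x))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    # AR(1) parameters of the data (lag-1 autocorrelation, innovation std).
    gamma = np.corrcoef(x[:-1], x[1:])[0, 1]
    sigma_e = x.std() * np.sqrt(1.0 - gamma ** 2)

    projections = np.empty((n_surrogates, window))
    for m in range(n_surrogates):
        y = np.empty(n)
        y[0] = rng.normal(0.0, x.std())
        for i in range(1, n):
            y[i] = gamma * y[i - 1] + rng.normal(0.0, sigma_e)
        # Surrogate variance captured along the DATA eigenvectors.
        projections[m] = np.diag(eigvec.T @ lag_cov(y) @ eigvec)

    threshold = np.percentile(projections, quantile, axis=0)
    return eigval, threshold, eigval > threshold
```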

Sentiment Classification

The sentiment classification is carried out as in ref. 9 and refers to the same dataset. We consider three values for the sentiment of each comment: negative (−1), neutral (0), and positive (+1). We perform an automatic sentiment classification based on supervised machine learning that consists of the following four steps: (i) a sample of texts is manually annotated with sentiment (in our case 20 K randomly selected comments are manually annotated by 22 native Italian speakers), (ii) the labeled set is used to train and tune a classifier, (iii) the classifier is evaluated on an independent test set or by cross-validation, and (iv) the classifier is applied to the whole set of texts. For more details on the classifier or on its performance refer to ref. 9.
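
As an illustration of steps (ii)-(iv), the sketch below trains and cross-validates a simple bag-of-words classifier with scikit-learn; the actual classifier of ref. 9 is a different system, and the file and column names here are hypothetical:

```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# (i) manually annotated sample: one comment per row, label in {-1, 0, 1}.
with open("annotated_comments.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
texts = [r["text"] for r in rows]
labels = [int(r["label"]) for r in rows]

# (ii) train and tune a classifier on the labeled set.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)

# (iii) evaluate by 10-fold cross-validation.
scores = cross_val_score(model, texts, labels, cv=10, scoring="accuracy")
print(f"cross-validated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# (iv) apply the classifier to the whole set of comments.
model.fit(texts, labels)
with open("all_comments.csv", newline="", encoding="utf-8") as f:
    all_texts = [r["text"] for r in csv.DictReader(f)]
predicted_sentiment = model.predict(all_texts)
```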

Additional Information

How to cite this article: Del Vicario, M. et al. Echo Chambers: Emotional Contagion and Group Polarization on Facebook. Sci. Rep. 6, 37825; doi: 10.1038/srep37825 (2016).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.