Main

Social media companies are struggling to control online health dis- and misinformation, for example, during the COVID-19 pandemic in 2020 (ref. 8). Online narratives tend to be nurtured in in-built community spaces that are a specific feature of platforms such as Facebook (for example, fan pages) but not Twitter (refs. 3, 16, 17, 18). Previous studies have pointed out that what is missing is a system-level understanding at the level of millions of people (ref. 13), whereas another study (ref. 14) has highlighted the need to understand the role of algorithms and bots in the amplification of risk among unwitting crowds.

Here we provide a system-level analysis of the multi-sided ecology of nearly 100 million individuals expressing views regarding vaccination, which are emerging from the approximately 3 billion users of Facebook from across countries, continents and languages (Figs. 1, 2). The segregation in Fig. 1a arises spontaneously. Individuals come together into interlinked clusters. Each cluster is a Facebook page and its members (that is, fans) who subscribe to, share and interact with the content and narratives of that Facebook page. A link from cluster A to B exists when A recommends B to all its members at the page level, as opposed to a page member simply mentioning a cluster. Each red node is a cluster of fans of a page with anti-vaccination content. Cluster size is given by the number of fans, for example, the page ‘RAGE Against the Vaccines’ has a size of approximately 40,000 members. Blue nodes are clusters that support vaccinations, for example, the page ‘The Gates Foundation’ has a size (that is, number of fans) of more than 1 million. Each green node is a page focused around vaccines or another topic—for example, a school parent association—that has become linked to the vaccine debate but for which the stance is still undecided. Support and potential recruitment of these green clusters (crowds) is akin to a battle for the ‘hearts and minds’ of individuals in insurgent warfare.
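The cluster-and-link bookkeeping described above can be sketched as a small directed graph. This is only an illustrative data structure under our own assumptions, not the pipeline used for the study: the `Cluster` class, the page identifiers, the `links` set and the size of the undecided page are all hypothetical, and the other two sizes merely echo the orders of magnitude quoted in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    stance: str  # "anti", "pro" or "undecided"
    size: int    # number of fans of the page


# Hypothetical clusters; sizes are illustrative placeholders.
clusters = {
    "anti_page": Cluster("anti_page", "anti", 40_000),
    "pro_page": Cluster("pro_page", "pro", 1_000_000),
    "parents_page": Cluster("parents_page", "undecided", 5_000),
}

# Directed page-level links (source recommends target); invented for illustration.
links = {
    ("anti_page", "parents_page"),
    ("parents_page", "anti_page"),
    ("pro_page", "parents_page"),
}


def total_size_by_stance(clusters):
    """Sum the number of fans over all clusters of each stance."""
    totals = defaultdict(int)
    for cluster in clusters.values():
        totals[cluster.stance] += cluster.size
    return dict(totals)


def out_links(links):
    """Count the outgoing page-level recommendations of each cluster."""
    counts = defaultdict(int)
    for source, _target in links:
        counts[source] += 1
    return dict(counts)


def link_colour(link, clusters):
    """A link takes the colour (stance) of its source node, as in Fig. 1a."""
    return clusters[link[0]].stance
```

Keeping the stance on the node rather than on the link mirrors the convention in Fig. 1a, in which a link inherits the colour of its source.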

Fig. 1: Online ecology of vaccine views.

a, Snapshot from 15 October 2019 of the connected component in the complex ecology of undecided (green), anti-vaccination (red) and pro-vaccination (blue) views comprising nearly 100 million individuals in clusters (pages) associated with the vaccine topic on Facebook. The colour segregation is an emergent effect (that is, not imposed). Cluster sizes are determined by the number of members of the Facebook page. Black rings show clusters with more than 50% out-link growth. Each link between nodes has the colour of the source node. b, Global spread of Fig. 1a for a small number of clusters. The ‘global ether’ represents clusters that remain global (grey). c, Anti-vaccination clusters have a stronger growth in cluster size. Each coloured dot is a node; data are from February–October 2019. d, Anti-vaccination individuals are an overall numerical minority compared with pro-vaccination individuals; however, anti-vaccination individuals form more separate clusters.

Fig. 2: Temporal evolution of online ecology.

a, Link growth during February–October 2019 for anti-vaccination (red; left) and pro-vaccination (blue; right) clusters. Anti-vaccination clusters successfully added many new links within the largest network patch and between network patches, despite the media ambience against anti-vaccination views during the measles outbreak in 2019. The underlying clusters are identical to Fig. 1a, that is, each network patch is a clustered region of clusters from Fig. 1a. b, Anti-vaccination clusters have a stronger growth in node eigencentrality—which indicates the influence of a node in a network—than pro-vaccination clusters. Data are from February–October 2019.

Seven unexpected features of this cluster network (Fig. 1) and its evolution (Fig. 2) together explain why negative views have become so robust and resilient, despite a considerable number of news stories that supported vaccination and were against anti-vaccination views during the measles outbreak of 2019 and recent efforts against anti-vaccination views from pro-vaccination clusters and Facebook.

First, although anti-vaccination clusters are smaller numerically (that is, have a minority total size, Fig. 1d) and have ideologically fringe opinions, anti-vaccination clusters have become central in terms of the positioning within the network (Fig. 1a). Specifically, whereas pro-vaccination clusters are confined to the smallest two of the three network patches (Fig. 2a), anti-vaccination clusters dominate the main network patch in which they are heavily entangled with a very large presence of undecided clusters (more than 50 million undecided individuals). This means that the pro-vaccination clusters in the smaller network patches may remain ignorant of the main conflict and have the wrong impression that they are winning.

Second, instead of the undecided population being passively persuaded by the anti- or pro-vaccination populations, undecided individuals are highly active: the undecided clusters have the highest growth of new out-links (Fig. 1a), followed by anti-vaccination clusters. Moreover, it is the undecided clusters who are entangled with the anti-vaccination clusters in the main network patch that tend to show this high out-link growth. These findings challenge our current thinking that undecided individuals are a passive background population in the battle for ‘hearts and minds’.

Third, anti-vaccination individuals form more than twice as many clusters compared with pro-vaccination individuals by having a much smaller average cluster size. This means that the anti-vaccination population provides a larger number of sites for engagement than the pro-vaccination population. This enables anti-vaccination clusters to entangle themselves in the network in a way that pro-vaccination clusters cannot. As a result, many anti-vaccination clusters manage to increase their network centrality (Fig. 2b) more than pro-vaccination clusters despite the media ambience that was against anti-vaccination views during 2019, and manage to reach better across the entire network (Fig. 2a).
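To make the notion of network centrality concrete, the following minimal sketch computes eigenvector centrality by power iteration on a toy graph. It rests on assumptions that are ours rather than the study's: the graph is treated as undirected, the four nodes are invented, and the iteration uses the shifted matrix A + I (which has the same eigenvectors as A) purely to guarantee convergence.

```python
import math


def eigencentrality(adjacency, iterations=200):
    """Eigenvector centrality by power iteration on (A + I).

    `adjacency` maps each node to the set of its neighbours (undirected).
    Adding the identity leaves the eigenvectors unchanged and avoids
    oscillation on bipartite graphs.
    """
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(scores[nb] for nb in neighbours)
            for node, neighbours in adjacency.items()
        }
        norm = math.sqrt(sum(v * v for v in updated.values()))
        scores = {node: v / norm for node, v in updated.items()}
    return scores


# Hypothetical four-node network: "a" links to everyone, "d" only to "a".
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
centrality = eigencentrality(toy)
```

In this toy network the best-connected node "a" receives the highest score and the leaf "d" the lowest, which illustrates how a cluster can raise its influence by adding links without growing in size.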

Fourth, our qualitative analysis of cluster content shows that anti-vaccination clusters offer a wide range of potentially attractive narratives that blend topics such as safety concerns, conspiracy theories and alternative health and medicine, and also now the cause and cure of the COVID-19 virus. This diversity in the anti-vaccination narratives is consistent with other reports in the literature (ref. 4). By contrast, pro-vaccination views are far more monothematic. Using aggregation mathematics and a multi-agent model, we have reproduced the ability of anti-vaccination support to form into an array of many smaller-sized clusters, each with its own nuanced opinion, from a population of individuals with diverse characteristics (Fig. 3b and Supplementary Information).

Fig. 3: Predictions and interventions.

a, Theoretical prediction for the future total size of anti-vaccination and pro-vaccination support without new interventions (coloured lines with 2σ bands from the simulation). Under the present conditions, it predicts that total anti-vaccination support reaches dominance in around 10 years. b, Top left, our theoretical model predicts that, as observed empirically, many smaller-sized anti-vaccination clusters form, with each cluster having its own nuanced type of narrative (for example, X, Y, Z) that surrounds a general topic (vaccines in this case). Bottom left, the predicted growth profile of individual clusters can be manipulated by altering the heterogeneity to delay the onset and decrease the growth. Bottom middle, pro-vaccination population B is predicted to overcome the anti-vaccination population, or persuade the undecided population, X, within a given network patch in time T by using Fig. 1a to identify and then engage with all the clusters. Bottom right, the link dynamics can be manipulated to prevent the spread of negative narratives. See Supplementary Information for all mathematical details.

Fifth, anti-vaccination clusters show the highest growth during the measles outbreak of 2019, whereas pro-vaccination clusters show the lowest growth (Fig. 1c). Some anti-vaccination clusters grow by more than 300%, whereas no pro-vaccination cluster grows by more than 100% and most clusters grow by less than 50%. This is again consistent with the anti-vaccination population being able to attract more undecided individuals by offering many different types of cluster, each with its own type of negative narrative regarding vaccines.

Sixth, medium-sized anti-vaccination clusters grow most. Whereas larger anti-vaccination clusters take up the attention of the pro-vaccination population, these smaller clusters can expand without being noticed. This finding challenges a broader theoretical notion of population dynamics that claims that groups grow through preferential attachment (that is, a larger size attracts more recruits). Therefore, a different theory is needed that generalizes the notion of size-dependent growth to include heterogeneity (Fig. 3b).

Seventh, geography (Fig. 1b) is a favourable factor for the anti-vaccination population. Anti-vaccination clusters either self-locate within cities, states or countries, or remain global. Figure 1b shows a small sample of the connectivity between localized and global clusters. Any two local clusters (for example, two US states) are typically interconnected through an ether of global clusters and so feel part of both a local and global campaign.

The complex cluster dynamics between undecided, anti-vaccination and pro-vaccination individuals (Figs. 1, 2) mean that traditional mass-action modelling (ref. 19) cannot be used reliably for predictions or policies. Mass-action models suggest that given the large pro-vaccination majority (Fig. 1d), the anti-vaccination clusters should shrink relative to pro-vaccination clusters under attrition, which is the opposite of what happened in 2019. Figure 3a shows the importance of these missing cluster dynamics using a simple computer simulation with mass-action interactions only between clusters, not populations. The simulation reproduces the increase in anti-vaccination support in 2019, and predicts that anti-vaccination views will dominate in approximately 10 years (Fig. 3a). These findings suggest a new theoretical framework to describe this ecology, and inform new policies that allow pro-vaccination entities, or the platform itself, to choose their preferred scale at which to intervene.

If the preferred intervention scale is that of individual clusters (Fig. 3b), then Fig. 1a can be used to identify and target the most central and potentially influential anti-vaccination clusters. Our clustering theory (see Supplementary Information) predicts that the growth rate of an influential anti-vaccination cluster can be reduced, and the onset time for future anti-vaccination (or connected undecided) clusters delayed, by increasing the heterogeneity within the cluster. This reduces parameter F of our theory, which captures the similarity of pairs of engaged individuals N in a particular narrative. The anti-vaccination (or connected undecided) cluster size C(t) is reduced to C(t) = N(1 − W([−2Ft/N] exp[−2Ft/N])/[−2Ft/N]), where W is the Lambert function (ref. 20), and the delayed onset time for a future nascent anti-vaccination (or connected undecided) cluster is t_onset = N/(2F). If instead the preferred intervention scale is that of network patches (single or connected; Fig. 3b), our theoretical framework predicts that the pro-vaccination population (B) can beat the anti-vaccination population or persuade the undecided population (X) within a given network patch S over time T by using Fig. 1a to identify and then proactively engage with the other clusters in S, irrespective of whether they are linked or not:

$$T=\frac{1}{x_{\rm c}(2-d_{\rm B}-d_{\rm X})}\left[4X+(B-X)\ln\frac{X(B-X+x_{\rm c})}{x_{\rm c}B}\right] \tag{1}$$

where d_B and d_X are the rates at which an average cluster of each population becomes inactive (for example, no more posts in the cluster), and B and X are the current total sizes of the respective populations (ref. 21). If instead the preferred intervention scale is the entire global ecology (Fig. 1a), this framework predicts the condition (r_link p)/(r_inactive q) < 1 to prevent the spreading of negative narratives (ref. 22; Fig. 3b), where r_link and r_inactive are the rates at which links are formed and become inactive between sets of clusters, p is the average rate at which a cluster shares material with another cluster and q is the average rate at which a cluster becomes inactive. Conversely, (r_link p)/(r_inactive q) > 1 is the predicted condition for system-wide spreading of intentional counter-messaging. As p and q are properties of a single average cluster and are probably more difficult to manipulate, the best intervention at this system-wide scale is to manipulate the rate at which links are created (r_link) and/or the rate at which links become inactive (r_inactive).
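The cluster-scale expression for C(t), the onset time N/(2F) and the system-scale spreading ratio stated above can be evaluated numerically. The sketch below is illustrative only: the function names and parameter values are our own, the Lambert function is evaluated with a simple bisection on its principal branch (a library routine such as `scipy.special.lambertw` could be used instead), and the grouping (r_link × p)/(r_inactive × q) is our reading of the ratio.

```python
import math


def lambert_w0(z, iterations=100):
    """Principal branch W0(z) for -1/e <= z <= 0, by bisection on w*exp(w) = z."""
    if not (-1.0 / math.e - 1e-12 <= z <= 0.0):
        raise ValueError("this sketch only handles -1/e <= z <= 0")
    lo, hi = -1.0, 0.0  # w*exp(w) is increasing on [-1, 0]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cluster_size(t, n, f):
    """C(t) = N(1 - W(-x exp(-x)) / (-x)) with x = 2Ft/N."""
    x = 2.0 * f * t / n
    if x == 0.0:
        return 0.0
    w = lambert_w0(-x * math.exp(-x))
    return n * (1.0 - w / (-x))


def onset_time(n, f):
    """Time N/(2F) at which a nascent cluster starts to grow."""
    return n / (2.0 * f)


def spreading_ratio(r_link, p, r_inactive, q):
    """(r_link * p) / (r_inactive * q); below 1 is the no-spreading condition."""
    return (r_link * p) / (r_inactive * q)


# Illustrative values only: lowering F delays the onset and shrinks C(t).
for f in (1.0, 0.4):
    print(f, onset_time(1000, f), cluster_size(1000, 1000, f))
```

Before the onset time the expression returns a cluster size of zero (up to rounding), because W(−x exp(−x)) = −x on the principal branch when x ≤ 1.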

Finally, we note that our analysis is incomplete and that other channels of influence should be explored. However, similar behaviours should arise in any online setting in which clusters can form. Our mathematical formulae are approximations. We could define links differently, for example, as numbers of members that clusters have in common. However, such information is not publicly available on Facebook. Furthermore, our previous study of a Facebook-like platform for which such information was available showed that the absence or presence of such a link between pages acts as a proxy for low or high numbers of common members. How people react to intervention is ultimately an empirical question (refs. 23, 24). One may also wonder about external agents or entities; however, clusters tend to police themselves for bot-like or troll behaviour. The crudely power law-like distribution of the cluster sizes of anti-vaccination clusters suggests that any top-down presence is not dominant.

Methods

We used clusters (Facebook pages) as the unit for our analysis (refs. 17, 18). Our cluster approach does not require any private information of individuals. The ForceAtlas2 layout of Gephi (Fig. 1a) simulates a physical system in which nodes (clusters) repel each other while links act as springs. It is colour-agnostic, that is, the colour segregation in Fig. 1a emerges spontaneously and is not in-built. Nodes that appear closer to each other have local environments that are more highly interconnected, whereas nodes that are far apart do not. Our data collection uses the same cluster snowballing methodology as described previously (refs. 17, 18), that is, a combination of automated processes and human subject-matter analysis. Each cluster (Facebook page) directly receives the feed of narratives and other material from that page and all members (fans) can engage in the discussions and posting activity. Figure 1b uses the declared location of each cluster. Derivations of the equations are provided in the Supplementary Information; they build on published results (refs. 20, 21, 22) and our approach complements other studies (refs. 25, 26, 27, 28, 29, 30, 31, 32, 33). Equation (1) is easily generalizable, but for simplicity we assume here a minimal model in which each pro-vaccination cluster has a narrative that persuades on average x_c members of each cluster X in each engagement, and the pro-vaccination cluster B picks a cluster X randomly within S. Equation (1) also applies to the full anti-vaccination–undecided ecology if we take the X-related quantities in equation (1) as weighted anti-vaccination–undecided values from Fig. 1a. The formula (r_link p)/(r_inactive q) < 1, to prevent spreading, accounts for the key feature of cluster interconnections that change over time and can be applied to spreading between anti-vaccination clusters, between undecided clusters, or between both anti-vaccination and undecided clusters using weighted values.

For the model in Fig. 3a, rates of cluster interaction are given to the first order by the relative number of links of each type, with Y–undecided interactions that yield more recruits for Y when Y is anti-vaccination than when Y is pro-vaccination (see Supplementary Information). The fidelity of these predictions is affected by the approximations of the model. For Fig. 3b, all parameters can be extracted from data or estimated from simulations. In the top left graph of Fig. 3b, two dimensions are shown for simplicity, for example, the degree of belief in government conspiracy and the degree of belief in alternative health, but similar plots emerge for other numbers of dimensions. In the bottom middle graph of Fig. 3b, the total initial size B (pro-vaccination population) plus size X (for example, anti-vaccination population) is kept constant. Although this leaves open the details of the conversion process for each X cluster, a previous study (ref. 30) has shown that such conversion within an online cluster occurs and can be rapid. T for mass-action theory would tend to decrease monotonically as B increases; however, our theory in equation (1) shows a counterintuitive dependence because smaller but finite numbers of X clusters take the pro-vaccination clusters longer to find. Only functional forms are shown (that is, no numbers) as the underlying formulae and models are not restricted by specific numerical choices of parameter values.
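The dependence of T on B at fixed B + X can be explored by evaluating equation (1) directly. The following is a minimal sketch under assumed, purely illustrative parameter values (a total of 1,000 individuals, 10 members persuaded per engagement, and both inactivity rates set to zero); the function name `time_to_engage` is our own, and the sketch simply rejects inputs for which the logarithm or the prefactor in equation (1) is undefined.

```python
import math


def time_to_engage(b, x, x_c, d_b=0.0, d_x=0.0):
    """T from equation (1) for population sizes B and X within one patch."""
    rate = x_c * (2.0 - d_b - d_x)
    log_arg = x * (b - x + x_c) / (x_c * b)
    if rate <= 0.0 or log_arg <= 0.0:
        raise ValueError("parameters fall outside the range where equation (1) is defined")
    return (4.0 * x + (b - x) * math.log(log_arg)) / rate


TOTAL = 1000  # B + X held constant, as in the bottom middle graph of Fig. 3b
X_C = 10      # illustrative number of members persuaded per engagement

for b in (500, 600, 750, 900, 950, 990):
    print(b, round(time_to_engage(b, TOTAL - b, X_C), 1))
```

With these illustrative values T first rises and then falls as B grows, rather than decreasing monotonically; the specific numbers carry no empirical meaning.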

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.