The drivers of global news spreading patterns

The web has radically changed the dissemination of information and the global spread of news. In this study, we aim to reconstruct the connectivity patterns among nations that shaped news propagation globally in 2022. We do this by analyzing a dataset of unprecedented size, containing 140 million news articles from 183 countries, related to 37,802 domains in the GDELT database. Unlike previous research, we focus on the sequential mention of events across various countries, thus incorporating a temporal dimension into the analysis of news dissemination networks. Our results show a significant imbalance in online news spreading. We identify news superspreaders forming a tightly interconnected rich club that exerts significant influence on the global news agenda. To further investigate the mechanisms underlying news dissemination and the shaping of global public opinion, we model countries' interactions using a gravity model that incorporates economic, geographical, and cultural factors. Consistent with previous studies, we find that countries' GDP is one of the main drivers shaping the worldwide news agenda.


Introduction
The rapid evolution of the web has profoundly transformed the global dissemination of news. Nowadays, news flows incessantly: media outlets promptly report unfolding events, and anyone can access this real-time information ecosystem. Furthermore, the Internet has intensified competition among news providers [1,2] as they strive to capture users' attention, contending with both traditional media and online news sources [3]. Advancements in data science have unlocked new possibilities for analyzing vast amounts of news content [4,5], with extensive research efforts devoted to studying the impact of social media platforms on news dissemination [6-8]. Particular emphasis has been placed on their influence on misinformation [9,10] and polarization [11-13]. In this dynamic environment, news spreads rapidly across the web, reaching numerous outlets in different countries and potentially shaping the news agendas of other nations [14,15].
Nevertheless, the reach and influence of the global news network vary significantly from one country to another. Factors such as culture, government censorship, language barriers, and the digital divide can impact news exchange between nations [16]. Therefore, a comprehensive understanding of how news circulates on the Internet yields valuable insights into the economic and cultural influences among countries. It allows us to characterize news spreading routes and the interdependence between different nations. Previous studies [17,18] have shed light on the key factors influencing news dissemination, including economic strength, geographic location, and cultural elements. These factors are pivotal in shaping the pathways through which news flows across countries. Additionally, studies [19-23] have revealed hierarchical structures and clustering patterns that are driven by economic growth, language, and political freedoms. Such findings emphasize the complex dynamics of the global news network, where certain countries or regions may have stronger connections due to shared economic interests, linguistic affinities, or similar political environments. By analyzing the patterns of news circulation, we can map the intricacies of global information exchange and gain insights into how countries interact and influence each other through news dissemination.
In this study, we aim to construct a comprehensive global network of news flows by leveraging the Global Database of Events, Language, and Tone (GDELT) for 2022 [24,25]. Our analysis focuses on the order in which news spreads across media outlets, moving beyond a simple examination of the frequency of countries' names in news articles. By studying the sequential mentions of events across countries, we aim to map the interactions and reciprocal influences among nations in disseminating news. To achieve this, our methodology identifies the initial source of each news event and traces subsequent mentions of the event by other countries. This approach allows us to unravel the paths through which news travels across the global media landscape and to gain insights into how news events unfold, propagate, and are shared among countries.
We leverage a massive dataset of 140 million news articles from 37,802 domains across 183 countries, sourced from GDELT. We find significant skewness and inequality in the resulting graph. Specifically, we observe a small set of dominant countries that form a well-connected network, indicating their influential role in disseminating news globally and shaping the global public agenda. To better understand the dynamics underlying the patterns of news diffusion across countries, we use a gravity model, which allows us to identify the fundamental drivers of the observed skewed distribution. Our analysis reveals two primary determinants: the gross domestic product (GDP) and the geographic proximity of the countries involved. Notably, nations with larger GDPs and closer geographical proximity are more strongly interconnected and wield substantial influence within the global news network. Furthermore, common languages play a salient role in shaping the flow of news across nations.

News and Countries
To assess the extent of news outlet activity across different countries, we rely on the Source-Country dataset, detailed in the Methods section, which allows us to identify news outlets together with their respective news articles. Figure 1(a) plots the number of news outlets against the number of news articles published by each country. It shows that countries with more news outlets tend to publish more news articles, following approximately a power law. We define the initial spreader(s) of an event as the country or countries that first mention it. Since the GDELT dataset has a time granularity of 15 minutes, multiple countries can be identified as initial spreaders of a single event, because two or more news outlets may have posted an article within the same 15-minute window.
For each event, we consider the number of countries in which that event was mentioned, and we call the distribution of these counts the virality distribution. Moreover, we define the spreader score of a country $i$ as the number of times $i$ belongs to the set of initial spreaders of an event. Figure 1(b) plots the spreader score of each country against the maximum number of countries reached by events first published by that country. It indicates that the United States (US) was the top initial spreader in 2022, significantly surpassing all other countries, followed by the United Kingdom (UK) and India (IN). Moreover, we deduce that countries with higher spreader scores are more likely to have their events go viral at least once. The inset in Figure 1(b) illustrates the overall distribution of virality, suggesting that most events tend to circulate within their own countries.
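The two definitions above can be illustrated with a minimal sketch. The record layout `(event, country, slot)`, the toy data, and the function name `spreaders_and_virality` are assumptions made for illustration; they are not the GDELT schema or the authors' code.

```python
from collections import defaultdict

# Hypothetical toy mentions: (event id, country, 15-minute time slot index).
mentions = [
    ("e1", "US", 0), ("e1", "UK", 0), ("e1", "IN", 2),
    ("e2", "IN", 5), ("e2", "IN", 6),
    ("e3", "US", 1), ("e3", "CA", 3),
]

def spreaders_and_virality(mentions):
    """Return (spreader_score, virality) for (event, country, slot) records."""
    first_slot = {}                # event -> earliest slot in which it appears
    countries = defaultdict(set)   # event -> set of countries mentioning it
    for event, country, slot in mentions:
        countries[event].add(country)
        if event not in first_slot or slot < first_slot[event]:
            first_slot[event] = slot

    # Initial spreaders: every country present in the earliest slot of the event.
    initial = defaultdict(set)
    for event, country, slot in mentions:
        if slot == first_slot[event]:
            initial[event].add(country)

    spreader_score = defaultdict(int)
    for event, spreaders in initial.items():
        for country in spreaders:
            spreader_score[country] += 1

    # Virality: number of unique countries in which the event is mentioned.
    virality = {event: len(cs) for event, cs in countries.items()}
    return dict(spreader_score), virality

spreader_score, virality = spreaders_and_virality(mentions)
```

In this toy example, event `e1` has two initial spreaders because the US and the UK mention it in the same slot, mirroring the 15-minute granularity discussed above.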

Structural Analysis on News Spreading
In this section, we aim to quantify the heterogeneity in the news-spreading process. To this end, we construct a directed weighted network $G = (V, E)$ from the GDELT data. The set of countries $V$ represents the nodes of the network.
A directed edge $(i, j) \in E$ is created if country $j$ mentions an event in the time window following a mention by country $i$. The weight $w_{ij}$ of each edge is the number of times this sequential mentioning occurs (see the Methods section for details on this construction). The resulting network consists of 183 countries and 23,033 edges. Its density, i.e. the proportion of existing connections among all possible ones, is 0.69, meaning that relatively few pairs of countries lack an exchange of information. Figure 2 illustrates the relationship between in-degree and out-degree, as well as between in-strength and out-strength. The in-degree and out-degree distributions reveal that most countries have many connections: at least once, most countries follow up on events covered by many other countries (in-degree) and have their own events followed up by many other countries (out-degree). This suggests that countries are actively involved in disseminating news and interact substantially with one another. The in-strength and out-strength distributions exhibit a similar pattern, but on a logarithmic scale: while most countries have a relatively high level of involvement in receiving or transmitting information, a few stand out with exceptionally high strength. These countries play a prominent role in spreading news, indicating a concentration of influence in the network. We also observe a high correlation between in-degree and out-degree ($\rho = 0.99$, $p < 2.2 \times 10^{-16}$). This can be explained by the high reciprocity of the network (i.e. the proportion of links that exist in both directions), which is 0.91. Furthermore, the weights of mutual edges tend to be similar, which implies that the extent of news diffusion between countries is well balanced. Please refer to Figure 5 in the supplementary information for additional details and visualizations.
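As a quick check of the figures quoted above, density and reciprocity can be computed as in the following sketch. The toy edge set and the function names are illustrative assumptions; only the node and edge counts (183 and 23,033) come from the text.

```python
def density(n_nodes, n_edges):
    """Share of existing directed links among the n(n-1) possible ones (no self-loops)."""
    return n_edges / (n_nodes * (n_nodes - 1))

def reciprocity(edges):
    """Proportion of directed links (i, j) whose reverse link (j, i) also exists."""
    return sum(1 for (i, j) in edges if (j, i) in edges) / len(edges)

# Density implied by the node and edge counts reported in the text.
reported_density = density(183, 23033)

# Hypothetical toy edge set: only US <-> UK is reciprocated.
toy_edges = {("US", "UK"), ("UK", "US"), ("US", "IN"), ("IN", "CA")}
toy_reciprocity = reciprocity(toy_edges)
```

With 183 nodes there are 183 × 182 = 33,306 possible directed links, so 23,033 edges correspond to a density of about 0.69, in agreement with the reported value.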
To examine the organizational principles of the network, we analyze how its topology relates to the associated weights. In line with previous studies [26], we compare the unweighted and weighted clustering coefficients to gain insight into the correlation between weights and network topology. Specifically, we define $C^w(s)$ as the weighted directed clustering coefficient [27] and $C(s)$ as its unweighted counterpart, both averaged over nodes with strength $s$ (see Methods for details about the coefficients). Figure 3(a) compares $C^w(s)$ and $C(s)$ as the strength increases. For low-strength nodes, $C^w(s)$ is approximately equal to $C(s)$, whereas for high-strength nodes $C^w(s)$ takes higher values than $C(s)$. This finding suggests that while higher-strength nodes tend to form fewer triangles in terms of network topology, the triangles they do form are more likely to consist of edges with higher weights. In other words, there is a core group of countries within which information flows preferentially, which is also suggested by Figure 6.
We further test this by detecting a rich-club phenomenon [28]. A rich club is a group of prominent and tightly interconnected nodes that exert control over the flow of information in the network [26,29-31]. Figure 3(b, c) compares the normalized unweighted and weighted rich-club coefficients (refer to the Methods section for further details). The normalized rich-club coefficient measures the extent to which high-degree nodes are more interconnected with each other than would be expected by chance under an appropriate null model. Consistent with the previous findings, we observe only a strong weighted rich-club ordering (with the coefficient significantly higher than 1 for high-degree nodes), confirming our earlier hypothesis. Specifically, the eight largest countries (according to their strength) within the rich club are the United States (US), the United Kingdom (UK), Canada (CA), India (IN), Russia (RS), Germany (GM), France (FR), and Ukraine (UP). These countries form a core of the weighted network in which information flow is significantly concentrated. Notably, and contrary to expectations, Ukraine belongs to the rich club. This is likely due to the extensive coverage of events concerning the war between Russia and Ukraine in 2022.
All of the previous analyses indicate that only a small set of countries is actively involved in news diffusion, while the participation of other countries is relatively minimal. However, it is also interesting to examine the individual role of each country in the network. To accomplish this, we employ the HITS algorithm [32] and compare the Authority and Hub scores of each node in Figure 4(a), using the edge weights to represent the intensity of interaction. The plot confirms that the United States (US), the United Kingdom (UK), Canada (CA), and India (IN) have a significant influence on news diffusion. Specifically, the United States emerges as the primary hub of the network, indicating its central role in disseminating news. Conversely, the United Kingdom assumes the primary authority position, suggesting its influence in shaping the information landscape. Canada and India play a mixed role, exhibiting both authority and hub characteristics to some extent. Overall, however, global news diffusion is dominated primarily by the United Kingdom and the United States.
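To make the hub and authority roles concrete, the sketch below runs a weighted HITS power iteration on a hypothetical toy network. The edge weights and the function name `weighted_hits` are illustrative assumptions; the sketch is not the authors' implementation and does not reproduce their data.

```python
def weighted_hits(weights, iterations=100):
    """Weighted HITS by power iteration.

    weights maps a directed edge (i, j) to its weight w_ij.
    Authority of j sums w_ij * hub(i); hub of i sums w_ij * authority(j).
    Both score vectors are rescaled to sum to 1 after each iteration.
    """
    nodes = sorted({node for edge in weights for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for (i, j), w in weights.items():
            new_auth[j] += w * hub[i]
        new_hub = {n: 0.0 for n in nodes}
        for (i, j), w in weights.items():
            new_hub[i] += w * new_auth[j]
        auth_total = sum(new_auth.values())
        hub_total = sum(new_hub.values())
        auth = {n: v / auth_total for n, v in new_auth.items()}
        hub = {n: v / hub_total for n, v in new_hub.items()}
    return hub, auth

# Hypothetical toy weights: the US mostly emits, the UK mostly receives.
toy_weights = {
    ("US", "UK"): 10, ("US", "CA"): 5, ("US", "IN"): 4,
    ("UK", "US"): 2, ("IN", "UK"): 3, ("CA", "UK"): 1,
}
hub, auth = weighted_hits(toy_weights)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

In this toy network the node with heavy outgoing links becomes the top hub and the node with heavy incoming links the top authority, which is the reading applied to the US and the UK above.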

Heterogeneity of diffusion and gravity model
In the previous section, we showed that only a small group of countries actively participates in the dissemination of information, indicating a high degree of heterogeneity in how news is spread. To further quantify this aspect, we compute the transition probability associated with each edge, i.e.
$$ p_{ij} = \frac{w_{ij}}{\sum_{k} w_{ik}}. $$
The transition probability $p_{ij}$ represents the proportion of information produced by country $i$ that flows to country $j$. We are primarily interested in understanding whether the observed behavior could emerge randomly. Figure 4(b) displays the disparity index $\gamma_i(k)$ [33] (see Methods); the blue line indicates the maximum value that $\gamma_i(k)$ can assume. The observed values indicate that news diffusion heterogeneity increases with the node's out-degree. This suggests that news diffusion does not occur randomly or uniformly across the network: if it did, we would expect a more consistent behavior, with $\gamma_i(k) \approx 1$ for any given $k$. The increase of $\gamma_i(k)$ with the out-degree implies that certain countries play a more influential role in news diffusion than others. This further supports the notion that news spreading is heterogeneous, with some countries having a more significant impact on disseminating information than others.
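The following sketch computes transition probabilities and a disparity index on toy data. Since the Methods definition of the disparity index is not reproduced in this text, the sketch assumes the commonly used form, the out-degree $k$ times the sum of squared transition probabilities; this form equals 1 for a uniform flow and approaches $k$ when a single link dominates. The toy weights and function names are illustrative.

```python
from collections import defaultdict

def transition_probabilities(weights):
    """p_ij = w_ij / sum_k w_ik for every directed edge (i, j)."""
    out_strength = defaultdict(float)
    for (i, _), w in weights.items():
        out_strength[i] += w
    return {(i, j): w / out_strength[i] for (i, j), w in weights.items()}

def disparity(weights):
    """Assumed form of the disparity index: k_i * sum_j p_ij ** 2."""
    p = transition_probabilities(weights)
    out_degree = defaultdict(int)
    squared_sum = defaultdict(float)
    for (i, _), p_ij in p.items():
        out_degree[i] += 1
        squared_sum[i] += p_ij ** 2
    return {i: out_degree[i] * squared_sum[i] for i in out_degree}

# Hypothetical toy weights: A spreads uniformly, B sends almost everything to A.
toy_weights = {
    ("A", "B"): 1, ("A", "C"): 1, ("A", "D"): 1,
    ("B", "A"): 98, ("B", "C"): 1, ("B", "D"): 1,
}
p = transition_probabilities(toy_weights)
gamma = disparity(toy_weights)
```

Here node `A` has a disparity of 1 (homogeneous flow), while node `B` is close to its out-degree of 3, the upper bound under this assumed definition.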
We propose using a gravity model (GM) applied to news dissemination to investigate the factors influencing the observed behaviour in news diffusion. This approach has been employed in some prior studies, albeit with slight variations in network considerations [23]. For our analysis, we use the CEPII gravity dataset [34], which provides comprehensive economic data on international trade and various aspects of the global economy. However, since this dataset is updated only until 2020, we merge it with 2022 GDP data obtained from the International Monetary Fund (IMF) [35]. This allows us to examine how economic variables, such as GDP, influence news flow between countries. Our model is
$$ \mathrm{logit}(p_{ij}) = \beta_0 + \beta_1 \log \mathrm{GDP}_i + \beta_2 \log \mathrm{GDP}_j + \beta_3 \log d_{ij} + \beta_4 C_{ij} + \beta_5 L_{ij} + \beta_6 S_{ij} + \varepsilon_{ij}, $$
where:
1. $\mathrm{GDP}_i$ is the GDP of country $i$, measured in billions of dollars;
2. $d_{ij}$ is the distance between the most populated cities of the two countries $i$ and $j$, measured in km;
3. $C_{ij}$ is a dummy variable equal to 1 when countries $i$ and $j$ share a common border;
4. $L_{ij}$ is a dummy variable equal to 1 when countries $i$ and $j$ share a common language;
5. $S_{ij}$ is a dummy variable equal to 1 when countries $i$ and $j$ share a common language spoken by at least 9% of the population.
Note that our dependent variable is the logit of $p_{ij}$, namely
$$ \mathrm{logit}(p_{ij}) = \log \frac{p_{ij}}{1 - p_{ij}}, $$
i.e. the log-odds of $i$ continuing the flow in country $j$.
Although estimating the parameters $\beta_k$, with $k \in \{1, \ldots, 6\}$, using standard Ordinary Least Squares (OLS) is common, it may underestimate the actual variance of the parameters [36], because errors may be correlated within countries or country pairs. To address this issue, we use OLS with robust clustered standard errors. This involves specifying a clustering variable (in our case $d_{ij}$) that identifies each country pair regardless of the direction, and it yields more reliable estimates of the parameters' uncertainty. After merging the CEPII and IMF datasets, we obtain a network of 172 nodes and 20,468 edges.
As reported with Table 1, the distance coefficient of the fitted model (approximately $-0.54$) is smaller in absolute value than the one typically observed in real trade networks (approximately 1). This difference can be attributed to the fact that news diffusion does not carry the same tangible cost as trade. News can travel easily and quickly across long distances without incurring significant transportation or logistical expenses. Thus, the impact of geographical distance on news diffusion is somewhat attenuated compared to its impact on trade flows. Additionally, the results indicate that economic power plays a role in news diffusion. Countries with higher GDP are more likely to be the target of the news flow, while they are less likely to be the source of the flow, although this effect is relatively weak ($\beta_1 \approx -0.13$).
Regarding cultural factors, the study reveals that the presence of a shared language has a substantial influence on the transmission of news. This is substantiated by the positive and statistically significant coefficients of the language variables ($L_{ij}$, $S_{ij}$) in the model. In summary, the analysis indicates that several factors, including geographic proximity, economic power, and cultural aspects like language, play a pivotal role in shaping the dynamics of news diffusion among nations. These findings emphasize the significance of considering these factors when examining the patterns of information flow on an international scale.
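A small worked example may help interpret the reported coefficients. The sketch below assumes, as is standard in gravity models, that distance enters the regression in logarithms, so that a coefficient acts as an elasticity of the odds. The function names are illustrative; the value −0.54 is the distance coefficient reported in the text, and −1 is the typical trade value it is compared against.

```python
import math

def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def odds_multiplier(beta, ratio):
    """Factor by which the odds change when a log-transformed regressor
    is multiplied by `ratio`, given its coefficient `beta`.

    Assumes the regressor enters the model as log(x), so the change in
    log-odds is beta * log(ratio).
    """
    return ratio ** beta

# Doubling the distance with the coefficient reported for news diffusion.
news_effect = odds_multiplier(-0.54, 2.0)
# Doubling the distance with a trade-like elasticity of -1.
trade_effect = odds_multiplier(-1.0, 2.0)
```

Under this assumption, doubling the distance between two countries multiplies the odds of a news flow by roughly 0.69, whereas a trade-like elasticity of −1 would halve them, which illustrates why distance is said to matter less for news than for trade.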

Conclusion
We adopt a novel approach, combining a network-based analysis with a gravity model, to investigate the organizational structure and primary drivers of news dissemination in the digital realm. In contrast to prior research, our study incorporates a temporal dimension into the analysis of news diffusion networks, leveraging a vast dataset provided by GDELT. By analyzing this dataset, we unveil a network that exhibits robust interconnections and reciprocal relationships, highlighting the presence of a core group of nations that hold significant influence over the dissemination of information.
Using a gravity model, we have identified the factors that drive the observed behavior, namely economic power, geographic location, and a shared language.
Possible limitations of our work concern the diffusion model that we use. An edge $(i, j)$ between two countries does not necessarily represent a direct influence of country $i$ on country $j$, i.e. we cannot be sure that $j$ mentions the event because $i$ did. Moreover, the Multilingual Source-Country dataset provides only an approximation of the geographic country of origin of news outlets.
Although our study uses a different dataset and modeling approach, and despite potential data limitations related to the ongoing Russia-Ukraine conflict, our findings align with prior research on news distribution. These results suggest that the core group of countries that play a prominent role in news distribution remains relatively stable over time. They also underscore the continued relevance of economic, geographic, and cultural factors in shaping news dissemination in the digital age.

GDELT Dataset
We used GDELT 2.0 to collect online news articles from January 1 to December 31, 2022, with a 15-minute update resolution. The Global Database of Events, Language, and Tone (GDELT) is an independent, non-profit project providing a vast, continuously updated global database of news and events in different languages. The dataset covers news articles written in English and 65 other languages, allowing us to delve deep into non-Western media. The data in GDELT 2.0 is organized into three main tables: Events, Mentions, and the Global Knowledge Graph (GKG). The Events table lists events with related general information. The Mentions table contains each news article referring to an event from the Events table. Finally, the GKG table provides detailed information about each article's actors, emotions, and themes. We used the Mentions table to track a story's trajectory and network structure as it flowed through the global media system from country to country. In the Mentions table, each mention (i.e., news article) of an event is given its own entry. This means that an event mentioned in 100 news articles will be listed 100 times in the Mentions table. The dataset resulting from this collection, described in Table 2, consists of 140 million news articles referencing 51 million unique events. To ensure data accuracy, we excluded news articles from domains not present in the Source-Country dataset, resulting in 125 million news articles with their respective country of origin. We explain GDELT's event identification process in the supplementary information.

Multilingual Source-Country Dataset
This dataset (gdelt-bq.extra.sourcesbycountry) estimates the country of origin for major online news outlets monitored by GDELT, using the primary geographic focus of the outlet over the monitored time frame. GDELT's documentation mentions challenges related to outlets with insufficient coverage volume or geographic emphasis and to regional wire services with bureaus in particular countries [37]. Despite these challenges, the dataset is considered a reasonable approximation of the geographic country of origin of these news outlets and is available on GDELT's website.
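The filtering step described above, keeping only articles whose domain appears in the Source-Country dataset, can be sketched as follows. The domain names, the record layout `(event id, article URL)`, and the helper names are hypothetical and serve only to illustrate the idea.

```python
from urllib.parse import urlparse

# Hypothetical excerpt of a domain -> country mapping.
source_country = {
    "example-news.com": "US",
    "example-daily.co.uk": "UK",
}

def country_of(url, mapping):
    """Return the country mapped to the URL's domain, or None if unknown."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return mapping.get(domain)

def attach_countries(mentions, mapping):
    """Keep only mentions whose domain has a known country of origin."""
    kept = []
    for event_id, url in mentions:
        country = country_of(url, mapping)
        if country is not None:
            kept.append((event_id, country))
    return kept

# Hypothetical mentions: (event id, article URL).
toy_mentions = [
    ("e1", "https://www.example-news.com/a"),
    ("e1", "https://unknown.org/x"),
    ("e2", "http://example-daily.co.uk/b"),
]
located = attach_countries(toy_mentions, source_country)
```

Mentions from unmapped domains are dropped rather than guessed, which mirrors the reduction from 140 to 125 million articles reported in the text.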

Network construction
Starting from our collected data, for each event $k$ we consider a directed weighted graph $G_k = (V_k, E_k)$. In particular, $V_k$ is the set of unique countries that mention $k$, while $(i, j) \in E_k$ if country $j$ mentions $k$ in the time step following a mention by country $i$. In more detail, consider two successive times $t_n, t_{n+1}$ (with $t_{n+1} - t_n \geq 15$ min) and let $C_k(t_n), C_k(t_{n+1})$ be the sets of countries that mention $k$ at times $t_n$ and $t_{n+1}$, respectively. If $T_k$ is the lifespan of event $k$, we have that
$$ E_k = \bigcup_{t_n \in T_k} C_k(t_n) \times C_k(t_{n+1}), $$
where the union runs over the successive mention times within $T_k$. Since it is possible to have multiple edges, we associate with each of them a weight $w^k_{ij}$ that represents its multiplicity. This procedure results in a collection of graphs $G_k$, one for each event. We obtain a final network $G = (V, E)$ defined as
$$ V = \bigcup_k V_k, \qquad E = \bigcup_k E_k, $$
and $w_{ij} = \sum_k w^k_{ij}$, i.e. we overlap all the networks associated with each event. Thus, $G$ is a directed weighted network in which an edge $(i, j)$ means that $j$ mentions an event right after $i$ a number of times equal to $w_{ij}$. To focus on the relationships among countries, we delete all loops, i.e. edges that start and end in the same country, obtaining a simple graph.
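The construction above can be sketched in a few lines. The sketch assumes that each event is given as a list of `(country, slot)` mentions and that the countries active in one time slot are linked to those active in the next occupied slot; the toy data and function names are illustrative, not the authors' code.

```python
from collections import defaultdict
from itertools import product

def event_edges(mentions):
    """Edges of one event: link the set of countries at each mention time
    to the set of countries at the next mention time.

    mentions is a list of (country, slot) pairs for a single event.
    Returns {(i, j): multiplicity}.
    """
    by_slot = defaultdict(set)
    for country, slot in mentions:
        by_slot[slot].add(country)
    slots = sorted(by_slot)
    weights = defaultdict(int)
    for earlier, later in zip(slots, slots[1:]):
        for i, j in product(by_slot[earlier], by_slot[later]):
            weights[(i, j)] += 1
    return weights

def aggregate(events):
    """Sum the per-event weights into one network and drop self-loops."""
    total = defaultdict(int)
    for mentions in events.values():
        for (i, j), w in event_edges(mentions).items():
            if i != j:  # remove loops to obtain a simple graph
                total[(i, j)] += w
    return dict(total)

# Hypothetical toy events.
toy_events = {
    "e1": [("US", 0), ("UK", 0), ("IN", 2), ("US", 3)],
    "e2": [("US", 1), ("IN", 4), ("IN", 5)],
}
network = aggregate(toy_events)
```

In event `e2`, India mentions the event in two consecutive slots, which would create a loop IN → IN; the aggregation step discards it, as described above.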

Weighted Clustering Coefficient
Clustering coefficients are widely used in the network science literature to study the interconnectedness of nodes' neighbours. However, fewer measures have been proposed for weighted directed networks. Since one of our aims is to study the organization of the news diffusion network, we consider the clustering coefficient proposed by Clemente and Grassi [27], which can be defined as
$$ C_i = \frac{\frac{1}{2}\left[(W + W^T)(A + A^T)^2\right]_{ii}}{s_i (d_i - 1) - 2 s_i^{\leftrightarrow}}, \tag{4} $$
where $A$ and $W$ are the unweighted and weighted adjacency matrices of the network, $s_i$ and $d_i$ are the total strength and total degree of node $i$, and $s_i^{\leftrightarrow}$ is the strength of bilateral arcs, i.e.
$$ s_i^{\leftrightarrow} = \sum_{j \neq i} a_{ij} a_{ji} \frac{w_{ij} + w_{ji}}{2}. $$
The numerator of (4) takes into account all directed triangles that node $i$ actually forms with its neighbours, weighted with the average weight of the links connecting node $i$ to its adjacent nodes $j$ and $k$. The denominator counts the total number of directed triangles that node $i$ could form, with weights properly taken into account.
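A minimal sketch of this coefficient follows. It implements our reading of the Clemente and Grassi definition, namely half the diagonal entry of $(W + W^T)(A + A^T)^2$ divided by $s_i(d_i - 1) - 2 s_i^{\leftrightarrow}$, with $s_i$ and $d_i$ taken as total (in plus out) strength and degree; this reading and the helper names are assumptions, and the matrices are assumed to have a zero diagonal.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def weighted_directed_clustering(W):
    """Per-node weighted directed clustering (assumed Clemente-Grassi form).

    W is a square weight matrix with zero diagonal; W[i][j] > 0 means i -> j.
    """
    n = len(W)
    A = [[1 if W[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    W_sym = [[W[i][j] + W[j][i] for j in range(n)] for i in range(n)]  # W + W^T
    A_sym = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]  # A + A^T
    M = matmul(W_sym, matmul(A_sym, A_sym))
    coefficients = []
    for i in range(n):
        strength = sum(W_sym[i])  # total strength: in-strength plus out-strength
        degree = sum(A_sym[i])    # total degree: in-degree plus out-degree
        bilateral = sum(A[i][j] * A[j][i] * (W[i][j] + W[j][i]) / 2
                        for j in range(n) if j != i)
        denominator = strength * (degree - 1) - 2 * bilateral
        coefficients.append(0.5 * M[i][i] / denominator if denominator > 0 else 0.0)
    return coefficients

# Fully reciprocated triangle: every possible directed triangle is closed.
full_triangle = weighted_directed_clustering([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
# Directed 3-cycle 0 -> 1 -> 2 -> 0: half of the possible triangles are closed.
cycle = weighted_directed_clustering([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
```

With unit weights the expression reduces to the familiar unweighted directed clustering coefficient, which gives 1 for the fully reciprocated triangle and 0.5 for the directed cycle.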

Rich-club
A rich club is a subgraph made up of the most prominent nodes of a network (from a topological or non-topological point of view) that are highly interconnected with each other. Many studies report that these nodes tend to be important for the overall structure and function of the network [29,38-40]. In more detail, given a weighted graph $G = (V, E)$ with binary adjacency matrix $A$ and weighted adjacency matrix $W$ (with positive weights $W_{ij} \in (0, +\infty)$ on existing links), the topological (i.e. unweighted) measure that detects a rich club with respect to nodes of degree $k$ is
$$ \phi(k) = \frac{2\, m_{>k}}{n_{>k}\,(n_{>k} - 1)}, $$
where $m_{>k}$ and $n_{>k}$ are the numbers of links and nodes of the subgraph $G_{>k} \subseteq G$ induced by the nodes with degree higher than $k$.
For weighted networks, the presence of a rich club must be detected with respect to a richness parameter $r$ that is used to rank the nodes. Common examples of richness parameters are degree, strength, or other measures of centrality [30,31]. Given such a measure, the weighted rich-club coefficient is defined as
$$ \phi^w(r) = \frac{W_{>r}}{\sum_{i=1}^{m_{>r}} w_i^{\mathrm{rank}}}, \tag{6} $$
where $W_{>r}$ is the total weight of the $m_{>r}$ links connecting nodes with richness higher than $r$, and $w_i^{\mathrm{rank}} \geq w_{i+1}^{\mathrm{rank}}$, with $i = 1, 2, \ldots, |E|$, are the weights of the links of the network ranked in decreasing order. Thus, (6) measures the fraction of weights shared by the richest nodes compared to the total amount they could share using the strongest links of the network.
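The two coefficients can be illustrated with the sketch below. It assumes the standard undirected form for the unweighted coefficient, $2 m_{>k} / (n_{>k}(n_{>k} - 1))$, and the usual weighted form in which the weight inside the club is divided by the sum of the equally many strongest weights of the whole network. The toy graph and function names are illustrative, and the normalization against a null model is not shown.

```python
from collections import defaultdict

def degrees(edges):
    """Degree of each node in an undirected graph given as {(u, v): weight}."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def rich_club(edges, k):
    """Unweighted rich-club coefficient among nodes with degree > k."""
    rich = {node for node, d in degrees(edges).items() if d > k}
    links = sum(1 for u, v in edges if u in rich and v in rich)
    n = len(rich)
    return 2 * links / (n * (n - 1)) if n > 1 else float("nan")

def weighted_rich_club(edges, richness, r):
    """Weight shared inside the club, relative to the same number of
    strongest links of the whole network."""
    rich = {node for node, value in richness.items() if value > r}
    club = [w for (u, v), w in edges.items() if u in rich and v in rich]
    if not club:
        return float("nan")
    strongest = sorted(edges.values(), reverse=True)[:len(club)]
    return sum(club) / sum(strongest)

# Hypothetical toy graph: a heavy triangle A-B-C with three light leaves.
toy_edges = {
    ("A", "B"): 10, ("A", "C"): 8, ("B", "C"): 9,
    ("A", "D"): 1, ("B", "E"): 1, ("C", "F"): 1,
}
phi_all = rich_club(toy_edges, 0)
phi_core = rich_club(toy_edges, 1)
phi_weighted = weighted_rich_club(toy_edges, degrees(toy_edges), 1)
```

In this toy graph the three high-degree nodes are fully connected and hold the three strongest links, so both coefficients equal 1 for the core, while the coefficient over all nodes is 0.4.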

Interactions between mutual edges
Consider the directed weighted network of news diffusion $G$. To understand whether mutual edges carry the same amount of information, we normalize the weights of the mutual edges of $G$ using
$$ p_{ij} = \frac{w_{ij}}{\sum_k w_{ik}}, $$
i.e. we compute the transition probabilities between countries. Therefore, if $p_{ij} \approx p_{ji}$, there is evidence that the flow of information is similar between source and target (and vice versa). Figure 5, which plots the weight of each edge (A,B) against the weight of its reverse edge (B,A), shows that connected countries generally do not differ much in how they share information, especially for high-weight links.

Core of countries
To understand how countries are organised in the network, we consider a notion of distance between nodes based on the idea that "close" countries are those with more interactions (i.e. a high weight). Therefore, we replace each weight $w_{ij}$ of $G$ with a transformed weight $d_{ij}$ that is small when $w_{ij}$ is large (10). This does not correspond to a distance in a mathematical sense (for example, $d_{ij} \neq d_{ji}$ in general), but the transformation allows us to compute distances that take into account the closeness of countries in terms of news diffusion. Figure 6 shows a heatmap of the shortest distances computed with the weights $d$. The rows and columns of the heatmap are ordered by out-strength.
Interestingly, countries with high out-strength tend to have low distances in the network. This suggests the existence of a core of countries in which information flows preferentially.
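The shortest distances behind such a heatmap can be computed with Dijkstra's algorithm, as in the sketch below. The transformation used here, an edge length of $1 / w_{ij}$, is an illustrative assumption: the exact transformation is not reproduced in this text, and any length that decreases with the weight would serve the same purpose. The toy weights and the function name are also hypothetical.

```python
import heapq
from collections import defaultdict

def shortest_distances(weights, source):
    """Dijkstra's algorithm with edge lengths 1 / w_ij (assumed transformation).

    weights maps a directed edge (i, j) to a positive weight w_ij.
    Returns the shortest distance from `source` to every reachable node.
    """
    adjacency = defaultdict(list)
    for (i, j), w in weights.items():
        adjacency[i].append((j, 1.0 / w))
    distance = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > distance.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, length in adjacency[node]:
            candidate = d + length
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distance

# Hypothetical toy weights: the indirect route US -> UK -> IN is heavily used.
toy_weights = {("US", "UK"): 100, ("UK", "IN"): 50, ("US", "IN"): 10}
from_us = shortest_distances(toy_weights, "US")
```

In this toy example the path through the UK (length 0.01 + 0.02) is shorter than the direct but weak link US → IN (length 0.1), which shows how strongly connected countries end up close to each other.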

Figure 1. Left panel (a) describes the number of associated news outlets and published news articles for each country. Right panel (b) shows how many times a country served as the initial news spreader of an event and the maximum virality of events spread by each country. The inset in (b) reports the virality distribution (i.e. the number of unique countries an event gets mentioned in) for all events regardless of the initial spreader country.

Figure 2. (a) Relation between in-degree and out-degree and (b) relation between in-strength and out-strength. In both cases, they are highly correlated.

Figure 3. (a) Comparison between unweighted and weighted clustering coefficients. The weighted one takes higher values for higher strength. (b) Normalized rich-club (RC) coefficient for each degree value. (c) Normalized weighted rich-club coefficient for each strength value.

Figure 4. (a) Comparison between (weighted) Hub and Authority scores. We observe that the US dominates as a Hub of the network, while the UK is the biggest Authority. (b) Disparity index $\gamma_i(k)$ versus the out-degree of the network's nodes. We observe that $\gamma_i(k)$ increases with the out-degree.

Figure 5. Comparison between weights of mutual edges. Only a sample of 2000 points is shown for readability.

Figure 6. Heatmap of shortest paths among the 40 countries with the highest out-strength. We use (10) as weights. The countries are ordered by out-strength.

Table 1. Result of the GM fitting. The obtained results indicate that our model explains approximately 59% of the variance in the dependent variable ($R^2 = 0.59$). The findings suggest that news flow tends to occur more frequently between countries that are geographically closer to each other. However, the coefficient for distance in the news diffusion model (approximately $-0.54$) is lower in absolute value compared to what is typically observed in real trade networks (which is approximately 1).