A calibrated measure to compare fluctuations of different entities across timescales

A common way to learn about a system's properties is to analyze temporal fluctuations in associated variables. However, conclusions based on fluctuations of a single entity can be misleading when drawn without proper reference to other comparable entities or when examined on only one timescale. Here we introduce a method that uses predictions from a fluctuation scaling law as a benchmark for the observed standard deviations. Differences from the benchmark (residuals) are aggregated across multiple timescales using Principal Component Analysis to reduce data dimensionality. The first component score is a calibrated measure of fluctuations: the reactivity RA of a given entity. We apply our method to activity records from the media industry using data from the Event Registry news aggregator: over 32M articles on selected topics published by over 8000 news outlets. Our approach distinguishes between different news outlet reporting styles: high reactivity points to activity fluctuations larger than expected, reflecting a bursty reporting style, whereas low reactivity suggests a relatively stable reporting style. Combining our method with the political bias detector Media Bias/Fact Check, we quantify the relative reporting styles, for different topics, of mainly US media sources grouped by political orientation. The results suggest that news outlets with a liberal bias tended to be the least reactive, while conservative news outlets were the most reactive.


Additional dataset info
Example timeseries can be found in Fig. S1. The numbers of articles and sources, as well as values of the TFS scaling exponent for selected time window sizes, can be found in Tab. S1.

Dataset cleaning

Types of erroneous timeseries
For effective aggregation of TFS residuals, it was necessary to check for outliers. We identified two types of outliers, corresponding to two common ER crawler malfunctions:

1. Unusually many of a given publisher's articles annotated with a given concept in a very short period (typically from a few minutes to a few hours). This is typically due to changes in website layouts that confuse a crawler in one of two ways: it may misidentify a batch of old articles as new, adding them with a new date, or its text extraction may become corrupted so that extracted texts include parts of other articles or unrelated content, causing incorrect semantic annotations. Timeseries corrupted in this way have an abnormally increased R(∆) for each ∆, as the induced shift of the residual depends only slightly on ∆; its most significant terms are constant in ∆ and much greater than 1.

2. Missing data in some period; the most common reason is the lack of a functional crawler (either not yet created or broken by a layout change). Timeseries corrupted in this way have an abnormally increased R(∆) for higher ∆ (because then α > 0.5).
Examples of timeseries with type 1 and type 2 errors can be found in Fig. S1. Detailed theoretical and numerical consideration of these two models can be found below.

Toy models for erroneous timeseries
In the main paper, we defined a TFS residual as:

$$R_s^k(\Delta) = \log_{10} \frac{y_s(\Delta)}{10^{\beta_k(\Delta)}\, x_s(\Delta)^{\alpha_k(\Delta)}} = \log_{10} y_s(\Delta) - \alpha_k(\Delta) \log_{10} x_s(\Delta) - \beta_k(\Delta) \tag{1}$$

Let $f^{(q,\Delta)}$ be a timeseries with a mean $\mu$ and a variance $\sigma^2$. We will now use it to construct toy models of the two mentioned corrupted timeseries and check their influence on the residual values:

1. In the first scenario, a large number of articles is added in a short period. Assuming that the period is shorter than the shortest window $\Delta$, we propose a toy model where one element is increased by some large value, $g^{(q=q_1,\Delta)} = f^{(q=q_1,\Delta)} + V$, and the other elements are unchanged: $g^{(q \neq q_1,\Delta)} = f^{(q,\Delta)}$.
This causes the mean value to change (introducing $\delta = \Delta/T$, the ratio of the time window size to the total observed period, and simplifying the notation with $f = f^{(q,\Delta)}$ and $g = g^{(q,\Delta)}$):

$$\langle g \rangle = \mu + \delta V \tag{2}$$

Similarly, taking $f^{(q_1,\Delta)} \approx \mu$ for the element hit by the spike:

$$\langle g^2 \rangle = \langle f^2 \rangle + \delta \left( 2\mu V + V^2 \right) \tag{3}$$

Thus the variance:

$$\sigma_g^2 = \langle g^2 \rangle - \langle g \rangle^2 \tag{4}$$
$$= \sigma^2 + \delta \left( 2\mu V + V^2 \right) - 2\delta\mu V - \delta^2 V^2 \tag{5}$$
$$= \sigma^2 + \delta(1-\delta)V^2 \tag{6}$$

Knowing that $V > 0$: $\sigma_g^2 > \sigma^2$. Putting Eqs. 2 and 6 into Eq. 1, we have:

$$R_g(\Delta) = R_f(\Delta) + \frac{1}{2}\log_{10}\left(1 + \frac{\delta(1-\delta)V^2}{\sigma^2}\right) - \alpha_k(\Delta)\log_{10}\left(1 + \frac{\delta V}{\mu}\right) \tag{7}$$

Timeseries corrupted in this way have an abnormally increased $R(\Delta)$ for each $\Delta$, as there is only a slight dependence on $\Delta$ in the last equation: for a large spike the shift behaves as $(\tfrac{1}{2} - \alpha)\log_{10}\delta$ plus terms constant in $\Delta$, and with $\alpha \approx \tfrac{1}{2}$ the $\delta$-dependence nearly cancels. The most significant terms are constant and $\gg 1$.
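The mean and variance corrections derived above for the spiked series can be checked numerically. A minimal sketch with synthetic Poisson activity standing in for a real outlet; the exact cross term $2\delta V (f_{q_1} - \mu)$ is retained in the code, which the derivation drops by assuming $f_{q_1} \approx \mu$:

```python
import numpy as np

# Toy model 1 (activity spike): add V to a single time window and compare
# the empirical mean/variance with the analytical corrections.
# delta = Delta/T is the inverse of the number of windows.
rng = np.random.default_rng(0)
f = rng.poisson(lam=5.0, size=1000).astype(float)  # stand-in activity series
N = f.size
delta = 1.0 / N
V = 500.0                                          # spike size

g = f.copy()
g[0] += V                                          # type-1 corruption

mu, var = f.mean(), f.var()
mu_g_pred = mu + delta * V
# Exact variance correction; the cross term 2*delta*V*(f[0] - mu) vanishes
# under the simplifying assumption f_{q1} ~= mu used in the text.
var_g_pred = var + 2 * delta * V * (f[0] - mu) + delta * (1 - delta) * V**2

assert np.isclose(g.mean(), mu_g_pred)
assert np.isclose(g.var(), var_g_pred)
```

Because the spike enters a single window exactly, the identities hold to floating-point precision, with no statistical assumption beyond $f_{q_1} \approx \mu$ for the dropped cross term.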
2. In the second scenario, for a period of time no articles are added. A toy model for this case is a timeseries $f^{(q,\Delta)}$ with a fraction $p$ of its values replaced with zeros:

$$h^{(q < q_2,\Delta)} = 0, \qquad h^{(q \ge q_2,\Delta)} = f^{(q,\Delta)}, \qquad \text{where } q_2 = pT \tag{8}$$

Then the mean (introducing $h = h^{(q,\Delta)}$):

$$\langle h \rangle = (1-p)\mu \tag{9}$$

Assuming the squared values of $f$ for $q < q_2$ and $q \ge q_2$ are equally distributed:

$$\langle h^2 \rangle = (1-p)\langle f^2 \rangle \tag{10}$$

Hence the variance:

$$\sigma_h^2 = \langle h^2 \rangle - \langle h \rangle^2 \tag{11}$$
$$= (1-p)\left(\sigma^2 + p\mu^2\right) \tag{12}$$

Putting Eqs. 9 and 12 into Eq. 1, we have:

$$R_h(\Delta) = R_f(\Delta) + \frac{1}{2}\log_{10}\left[(1-p)\left(1 + \frac{p\mu^2}{\sigma^2}\right)\right] - \alpha_k(\Delta)\log_{10}(1-p) \tag{13}$$

Timeseries corrupted in this way have an abnormally increased $R(\Delta)$ for higher $\Delta$ (because then $\alpha > 0.5$, so the combined term $(\tfrac{1}{2} - \alpha)\log_{10}(1-p)$ becomes positive).
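The missing-data model's mean and variance can be checked numerically as well. A minimal sketch with synthetic data; the tolerances are loose because the equal-distribution assumption holds only statistically:

```python
import numpy as np

# Toy model 2 (missing data): zero out a fraction p of the windows and
# compare with the predicted mean (1-p)*mu and variance (1-p)*(sigma^2 + p*mu^2).
rng = np.random.default_rng(1)
f = rng.poisson(lam=10.0, size=100_000).astype(float)
p = 0.3
q2 = int(p * f.size)                     # number of zeroed windows

h = f.copy()
h[:q2] = 0.0                             # type-2 corruption: a crawler gap

mu, var = f.mean(), f.var()
mu_h_pred = (1 - p) * mu
var_h_pred = (1 - p) * (var + p * mu**2)

# Loose tolerance: the zeroed and kept parts are only statistically identical.
assert np.isclose(h.mean(), mu_h_pred, rtol=0.02)
assert np.isclose(h.var(), var_h_pred, rtol=0.02)
```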
We checked the correctness of our toy models by modifying original timeseries with one of the two disturbances (activity spike or periods of missing data) and comparing them to the results predicted by our calculations. Visualizations of both models with different parameters are shown in Fig. S2. For the spike model, the results are almost exactly as predicted. For the missing-data model, there are minor deviations from the theory, caused by a violation of the assumption of equal activity distributions across the whole timeseries.

Figure S2. Effects of two types of disturbances (activity spike, periods of missing data) on a source position in TFS plots. Black points are original (filtered) data, colored points are activity timeseries modified with one of the two models with various parameters, and black curves are theoretical fits for the same range of parameter values.

Applying the models to clean the dataset
As a result of the discussion above, it is clear that considering corrections on only one timescale might be insufficient, as the correction can vary between time windows. To properly identify sources with corrupted timeseries, we performed Principal Component Analysis of a matrix of corrections calculated for all sources. For a keyword k, we constructed a matrix M_k with rows representing news outlets s and columns standing for corrections for each ∆. Fig. S3 visualizes the contributions of the corrections for each ∆ to the first four principal components (matrix Π), along with the cumulative explained variance ratio of the PCs. The results are very similar in all cases. The first four PCs explain more than 95% of the variance. The first PC has approximately equal contributions from corrections at all the analyzed time window sizes; in the second PC, contributions from short time window corrections have the opposite sign to contributions from long time windows. As the empirical composition of the two PCs coincides with the effects described above, we interpret sources with a high absolute PC1 value as type-1 corrupted, and sources with PC2 values associated with high positive corrections in longer time windows as type-2 corrupted (sources with oppositely signed PC2 values tend to add many articles in very short time periods, which we suspect reflects a publishing strategy rather than a crawler malfunction). However, erroneous timeseries cannot be perfectly separated from correct ones using this method, so the filtering needs to be performed using threshold values. Exact identification of timeseries with errors was time-consuming, as it often required analyzing the positions of unexpected peaks and/or browsing articles in a suspected period to evaluate whether a change in the character of the timeseries was a result of the errors described or of unconventional publisher behavior.
Thus we identified publishers with corrupted timeseries by inspecting randomly sampled timeseries with gradually less extreme values of PC1 and/or PC2. Experimentally, we found that a threshold value of 2.5 (on PC1 for type-1 errors, on PC2 for type-2 errors) guaranteed a satisfactory precision-recall trade-off. We discarded publishers with PC1 or PC2 above this value and reran the calculations of TFS exponents and residuals to account for changes in the results caused by the filtering procedure.
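The PCA-based flagging step can be sketched as follows. This is an illustrative sketch with a synthetic correction matrix: the 2.5 threshold follows the text, while the contamination patterns, matrix sizes, and the symmetric use of |PC1| and |PC2| are simplifying assumptions (the actual procedure treats the sign of PC2 asymmetrically and involves manual inspection):

```python
import numpy as np

# Rows: news outlets; columns: corrections for successive window sizes Delta.
# The first 5 rows mimic type-1 errors (corrections inflated at every Delta),
# the next 5 mimic type-2 errors (corrections inflated only at long windows).
rng = np.random.default_rng(42)
n_sources, n_windows = 500, 12
M = rng.normal(0.0, 0.5, size=(n_sources, n_windows))
M[:5] += 4.0
M[5:10, n_windows // 2:] += 4.0

# PCA via SVD of the column-centered matrix; scores = projections on the PCs.
Mc = M - M.mean(axis=0)
U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
scores = U[:, :2] * S[:2]            # (PC1, PC2) score per source

threshold = 2.5                      # threshold value from the text
suspect = (np.abs(scores[:, 0]) > threshold) | (np.abs(scores[:, 1]) > threshold)
clean_M = M[~suspect]                # corrections of the retained sources
print(f"flagged {suspect.sum()} of {n_sources} sources")
```

With this synthetic contamination, the two corruption patterns dominate the top two principal components, so all ten planted outliers land far beyond the threshold while the background sources stay well below it.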

Temporal Fluctuation Scaling in the Event Registry dataset
We reported the fit of the TFS for online news outlet activities in our previous paper. While outliers have only a slight impact on the value of the TFS exponent, here we report results that rely heavily on the correctness of the dataset. Because of this, we developed a method to discard news outlets with corrupted activity timeseries (see above). Fig. S4 shows an example TFS plot for news outlets publishing around the topic China, for the raw data (top-left) and for the cleaned data (top-right), with a time window size ∆ = 1 day. The number of publishers with abnormally high activity variance is significantly reduced in the cleaned data. Moreover, the two plots in the bottom panels of Fig. S4 show TFS residuals for the raw (bottom-left) and cleaned (bottom-right) datasets. Comparing the two top panels shows that outliers tend to drift towards higher fluctuations, as expected. Comparing the two bottom plots suggests that removing the outliers not only improves the linear fit (R² ≈ 0.85 for raw data; R² ≈ 0.95 for cleaned data) but also makes the cloud of points symmetrical with respect to the OX axis. TFS exponents for each concept for three selected time window sizes (∆ ∈ {1 hour, 1 day, 30 days}) can be found in Tab. S1. The dependence of the TFS exponent α_k on the time window size ∆ is the same as in the previous study and can be found in Figs. S4, S5, and S6.

Figure S6. Dependence of the TFS exponent α on time window size ∆ for other keywords (not related to countries). X-axis: ∆; Y-axis: α.
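For a single keyword and window size, the TFS fit and residuals of Eq. 1 amount to a least-squares line in log-log space across outlets. A minimal sketch on synthetic outlet data; the exponent 0.75 and the scatter level are assumptions, not values from the paper:

```python
import numpy as np

# Each point is an outlet s with mean activity x_s and standard deviation y_s;
# alpha and beta come from a least-squares line in log-log space, and the
# residuals are the vertical distances from that line.
rng = np.random.default_rng(7)
x = 10 ** rng.uniform(0.0, 3.0, size=400)                    # mean activities
y = 10 ** (0.75 * np.log10(x) + 0.1 + rng.normal(0.0, 0.05, size=400))

alpha, beta = np.polyfit(np.log10(x), np.log10(y), deg=1)    # TFS exponent, offset
residuals = np.log10(y) - alpha * np.log10(x) - beta         # per-outlet residuals

r2 = 1.0 - residuals.var() / np.log10(y).var()               # quality of the fit
assert 0.7 < alpha < 0.8
assert r2 > 0.9
```

Repeating the fit per window size ∆ yields the α(∆) curves shown in Figs. S4-S6, and stacking the residual columns over ∆ yields the correction matrix used for the outlier filtering above.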

Aggregated correlation matrices
Matrices from Figs. 3 and 5 of the main text were averaged over keywords and time window sizes to observe trends. When aggregating by concept groups, for a given pair $(k_1, k_2)$, a value in the averaged matrix $\bar{K}$ was calculated as the mean of the corresponding correlations over all time window sizes:

$$\bar{K}_{k_1 k_2} = \frac{1}{|D|} \sum_{\Delta \in D} C_{(k_1,\Delta),(k_2,\Delta)} \tag{14}$$

When aggregating in time window size groups, for a pair $(\Delta_1, \Delta_2)$, a value in the averaged matrix $\bar{T}$ was calculated as the mean over keywords:

$$\bar{T}_{\Delta_1 \Delta_2} = \frac{1}{|K|} \sum_{k \in K} C_{(k,\Delta_1),(k,\Delta_2)} \tag{15}$$

For the window size groups ($D_\mathrm{short}$/$D_\mathrm{medium}$/$D_\mathrm{long}$), the values were calculated analogously within each group; the averaging procedure was performed as in Eq. 16.

Figure S12. The median reactivity $C_c(k)$ for all sources from country c, for keywords k related to selected African and European countries. Square size is proportional to the total mean daily number of articles published by sources from country c on topic k; color represents the median reactivity $C_c(k) = \langle RA_s(k) \rangle_{s \in S_c}$ of sources from country c. Missing squares indicate that there were no publishers from a country that published at least 36 articles on a topic in our dataset. Red symbols correspond to topics that were reactively discussed in the country in 2018.

Figure S14. The median reactivity $C_c(k)$ for all sources from country c, for keywords k related to selected concepts from popular culture. Square size is proportional to the total mean daily number of articles published by sources from country c on topic k; color represents the median reactivity $C_c(k) = \langle RA_s(k) \rangle_{s \in S_c}$ of sources from country c. Missing squares indicate that there were no publishers from a country that published at least 36 articles on a topic in our dataset. Red symbols correspond to topics that were reactively discussed in the country in 2018.

Figure S15. The median reactivity $C_c(k)$ for all sources from country c, for keywords k related to the other concepts selected for the study. Square size is proportional to the total mean daily number of articles published by sources from country c on topic k; color represents the median reactivity $C_c(k) = \langle RA_s(k) \rangle_{s \in S_c}$ of sources from country c. Missing squares indicate that there were no publishers from a country that published at least 36 articles on a topic in our dataset. Red symbols correspond to topics that were reactively discussed in the country in 2018.

Figure S16. Spearman correlation matrix of variability measures. The median value of each measure was calculated for each country-keyword combination over all publishers from a country. Correlations were calculated over all possible country-keyword combinations. daily activity: total number of articles; median RA: median reactivity; median std: median of the standard deviation ($\sigma$); median fano: median of the Fano factor ($\sigma^2/\mu$); median coefv: median of the coefficient of variation ($\sigma/\mu$).

Figure S17. Hierarchical clustering of the correlation matrix of coefficients of variation for all concepts and time window sizes. Each row/column stands for one (k, ∆) combination. A color axis for the correlation matrix is on the right-hand side. Additional color labels on top of the matrix represent the time window size ∆ (lightest: 5 minutes, darkest: 30 days); color labels on the left side of the matrix represent different keywords k. Residuals for short (below 1 hour) and long (over 1 hour) time windows are clustered together; inside the two clusters, residuals for most of the keywords tend to be close to each other.

Figure S23. Comparison of relative medians of measures between bias groups for all keywords. Top-left: reactivity RA; top-right: standard deviation $\sigma$; bottom-left: coefficient of variation $\sigma/\mu$; bottom-right: Fano factor $\sigma^2/\mu$. Top panels: median value of the measure for all keywords by political bias. Bottom panels: results of pairwise Dunn median tests with a two-step Benjamini-Krieger-Yekutieli FDR adjustment; for the rest of the caption, see the main text.