A mathematical perspective on edge-centric brain functional connectivity

Edge time series are increasingly used in brain imaging to study the node functional connectivity (nFC) dynamics at the finest temporal resolution while avoiding sliding windows. Here, we lay the mathematical foundations for the edge-centric analysis of neuroimaging time series, explaining why a few high-amplitude cofluctuations drive the nFC across datasets. Our exposition also constitutes a critique of the existing edge-centric studies, showing that their main findings can be derived from the nFC under a static null hypothesis that disregards temporal correlations. Testing the analytic predictions on functional MRI data from the Human Connectome Project confirms that the nFC can explain most variation in the edge FC matrix, the edge communities, the large cofluctuations, and the corresponding spatial patterns. We encourage the use of dynamic measures in future research, which exploit the temporal structure of the edge time series and cannot be replicated by static null models.


I. INTRODUCTION
Functional connectivity (FC) refers to patterns of statistical dependence in brain activity, such as the blood oxygen level-dependent (BOLD) signal measured via functional magnetic resonance imaging (fMRI). Static FC is traditionally calculated over the course of an entire scan session and is an established technique of modern neuroimaging [1,2], with individual differences linked to brain disorders and cognitive states [3,4]. Time-varying FC, on the other hand, refers to time-resolved fluctuations in FC, typically estimated by fitting dynamic models [5–7,50] or by sliding windows [8,9]. Differences in time-varying FC at rest are also associated with a wide range of cognitive and behavioural traits, as well as psychiatric and neurological conditions (see [10] for a recent review of this rapidly growing field).
To avoid the temporal blurring caused by sliding windows [8], it is possible to analyse BOLD fluctuations at the resolution of single frames. Coactivation patterns (CAPs) [11] and point-process analyses [12] are early examples of this approach. Initially using a single seed region, both found that static seed-based FC maps can be reliably approximated by averaging only a few high-amplitude frames. Whole-brain extensions of both methods soon followed [13,14]. More recently, edge-centric approaches have generated excitement as they also go beyond single seeds and additionally decompose the entire FC matrix into its frame-wise contributions by omitting the traditional time averaging over the length of the experiment [15,16]. Such temporal unwrapping of the FC results in a large number of edge time series, each capturing the moment-to-moment cofluctuations of a pair of brain regions. Going one step further, one can measure the similarity between every pair of edges to obtain a large matrix, referred to as edge FC (eFC), as opposed to the traditional node-centric FC (nFC). The eFC is shown to be replicable, stable within individuals across multiple scanning sessions, and reliable across datasets [16]. Furthermore, clustering the eFC yields overlapping brain communities that could better suit the study of aspects of cognition and behaviour that transcend traditional disjoint brain parcellations. While CAPs focus on patterns of BOLD activity, edge-centric approaches focus on patterns of cofluctuations between all regional pairs. However, the analogous finding is that the static FC can be faithfully approximated by averaging a few instantaneous connectivity patterns. These are characterised by simultaneous large cofluctuations across all node pairs, measured as the root-sum-of-squares (RSS) over all the edge time series values corresponding to the same frame.
Such brief, intermittent, and high-amplitude cofluctuations drive the nFC, and the network structure over these time points contributes disproportionately to the overall modularity of the functional brain network [15].

Although the edge-centric decomposition of the nFC into its frame-wise constituents is mathematically exact, a comprehensive treatment of the statistical properties of the edge-centric measures is lacking, and there is a consensus on the need for appropriate null models [16,17]. A rigorous mathematical study is especially important since several widely-acknowledged publications have warned about the dangers of extracting structure from noise when studying static or time-varying FC, often using minimal null models to reproduce existing results [18–23]. The warnings concerning sampling variability are particularly relevant to edge-centric methods, as they represent an extreme case of a single-frame sliding-window approach. High-amplitude cofluctuations can be observed in temporally-uncorrelated synthetic time series, such that accounting only for static spatial correlations is sufficient to replicate key empirical findings [15, Fig. S4]. This observation has been interpreted as further evidence that large cofluctuations are not fMRI artefacts. However, it arguably raises an equally pressing conceptual concern: what information do current edge-centric measures provide beyond the nFC, if any?
Here, we tackle this question mathematically and present a theoretical explanation for the widespread occurrence of large cofluctuations across datasets and for why a few large events drive the nFC. This explanation rests on fundamental properties of subexponential distributions [24]. Further mathematical derivations clarify how the nFC eigenvalues shape the RSS distribution and how the leading nFC eigenvectors underpin the spatial correlation patterns expressed during high-amplitude events. The influence of functional modules on the eigenvalue distribution could explain why these events disappear when the modular structure is disrupted, as recently reported in [25]. Finally, we analytically show that the eFC matrix, the edge communities, the large cofluctuations, and the corresponding brain activity modes can all be predicted from the nFC without recourse to the edge-centric formulation. Many of these derivations are based on the null hypothesis of i.i.d. Gaussian variables that takes into account only the observed (static) spatial correlations and ignores temporal features. Under this assumption, and invoking results from random matrix theory [26], the edge time series variability is described by the sampling distribution of the nFC, known as the Wishart distribution [27]. Testing the analytic predictions using fMRI data from the Human Connectome Project (HCP) [28] shows that the null model is sufficient to replicate the vast majority of existing edge-centric features both qualitatively and quantitatively, as well as foundational properties of CAPs.

II. RESULTS
We present six main results showing that the existing findings based on edge time series [15,16] can be derived from the static nFC under the null hypothesis of i.i.d. multivariate Gaussian variables that preserve the observed static spatial correlations but not the temporal ones. The theoretical predictions were empirically tested using an HCP dataset comprising 100 subjects, preprocessed with the current standard HCP pipeline, both with and without global signal regression (GSR). The detailed derivations of the presented equations are given in the Methods section; the current section is reserved for a concise account of the key results.
A. The edge FC matrix can be derived analytically from the node FC

The eFC was introduced in [16] to quantify linear interactions between edges. For each pair of brain regions, an edge time series is computed as the element-wise product of the two regional z-scored time series. Thus, the values of each edge time series represent the instantaneous cofluctuation magnitudes between the corresponding pair of brain regions. The eFC is the edge-by-edge matrix obtained by computing the inner products between all pairs of edge time series, normalised to the [−1, 1] interval.
Our first result is that the eFC can be analytically derived from the nFC under the static Gaussian null hypothesis (Fig. 1a,b). As shown in Equations (8) to (11) of the Methods section, the (jk, lm) entry of the eFC matrix is obtained as a sum of pairwise products between the (j, k, l, m) entries of the nFC matrix, divided by a normalisation factor:

\mathrm{eFC}_{jk,lm} = \frac{r_{jk} r_{lm} + r_{jl} r_{km} + r_{jm} r_{kl}}{\sqrt{1 + 2 r_{jk}^2}\,\sqrt{1 + 2 r_{lm}^2}} .   (1)

Using the HCP dataset, the predicted eFC achieves an average Pearson correlation of r = 0.93 with the empirical eFC (r = 0.88 if GSR is applied). The distributions across 100 unrelated HCP subjects are shown in Fig. 1c. This is a significant improvement on the linear regression approach adopted in [16], which achieved an average Pearson correlation of r = 0.72 on pairs of edges not sharing any nodes, but performed poorly otherwise (r = 0.06). Moreover, by revealing the mathematical relationship between the eFC and the nFC, Equation (1) explains why the eFC is highly replicable, stable within individuals across multiple scan sessions, and consistent across datasets: it will be as long as the nFC is.
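As an illustration, Equation (1) can be implemented in a few lines. The sketch below is ours (the function name, toy matrix, and edge list are illustrative, not taken from the paper's code):

```python
import numpy as np

def predicted_efc(R, edges):
    """Predict eFC entries from the nFC matrix R via Equation (1)."""
    eFC = np.empty((len(edges), len(edges)))
    for a, (j, k) in enumerate(edges):
        for b, (l, m) in enumerate(edges):
            num = R[j, k] * R[l, m] + R[j, l] * R[k, m] + R[j, m] * R[k, l]
            den = np.sqrt(1 + 2 * R[j, k] ** 2) * np.sqrt(1 + 2 * R[l, m] ** 2)
            eFC[a, b] = num / den
    return eFC

# Toy example: a 3-node nFC and its three edges.
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, -0.3],
              [0.2, -0.3, 1.0]])
edges = [(0, 1), (0, 2), (1, 2)]
eFC = predicted_efc(R, edges)
```

As a sanity check, for a pair of identical edges the numerator reduces to 1 + 2r_jk², so the predicted eFC diagonal equals 1, consistent with the normalisation to the [−1, 1] interval.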
B. The edge communities can be predicted from the nFC

Clustering the eFC via the k-means algorithm was used to identify communities of co-fluctuating edges. These were then mapped back to individual nodes to obtain overlapping regional communities, offering a new way to study aspects of cognition and behaviour that transcend traditional disjoint brain parcellations [16]. Given the high similarity between the empirical and null eFC demonstrated in the previous section, it would be reasonable to expect that the edge communities be well recovered by the null model by applying the same community detection algorithm to the predicted eFC. Indeed, the agreement between the empirical and predicted community assignments is exact on 84% of the 19900 edges, and 74% if GSR is applied (Fig. 2a,b). At the node level, the similarity is even stronger. In [16], the similarity between two nodes, referred to as "edge cluster similarity", is measured as the fraction of all edge pairs starting from those two nodes (and reaching the same target) that are clustered together. Using the same procedure, the predicted edge cluster similarity achieves a Pearson correlation coefficient of r = 0.96 with the empirical one, and r = 0.95 if GSR is applied (Fig. 2c,d). Note that the rows and columns of the matrices in Fig. 2 have been rearranged to match the ordering of the 16 networks used in [16, Fig. 6] to facilitate a visual comparison. Given that the analysis is performed on the same HCP dataset, the small differences in the empirical results are most likely due to the different preprocessing pipelines (here we use the preprocessed data made available by the HCP; see the Methods section for details). This adds to the evidence that edge-centric measures are not strongly dependent on the preprocessing approach, and similarly speaks to the robustness of the null model predictions.
So far, the results in this subsection have been based on simulations. Let us now see how a mathematical derivation can provide further insight into these edge and node similarities. First, recall that the edge communities are obtained by clustering the eFC (e.g., via k-means). The main obstacle to a fully analytic approach is that the outcome of stochastic clustering algorithms cannot be entirely predicted from their input; however, since such algorithms are usually based on a distance metric, it is reasonable to expect that the smaller the distance between two rows of the eFC, the higher the probability that the corresponding edges are clustered together. Accordingly, we define the distance between two edges (jk) and (j'k') as the ℓ1 norm of the difference between the corresponding rows of the eFC. As shown in Equation (12) of the Methods section, this edge distance simplifies to an expression involving only the inner products of the z-scored BOLD signals, where, for a brain region i, the row vector z_i denotes its z-scored BOLD signal. What does this imply for the node communities? The similarity between two nodes i and j was measured in [16] as the fraction of all edge pairs starting from i and j (and reaching the same target) that are assigned to the same cluster. Here, the analogous analytic step is to compute the distance between nodes i and j as the sum of the distances between the edges starting from them. Crucially, the resulting node distance can be expressed in terms of the BOLD signal correlations, up to a constant term c that is independent of i and j (see Equation (13) in the Methods section).
This implies that the edge-cluster similarity [16] between nodes i and j can be predicted from the corresponding r_ij entry of the nFC, avoiding the memory-intensive computation of the eFC and the computationally intensive clustering algorithms (the space complexity of the eFC is O(N^4), i.e., it scales with the fourth power of the number of regions and requires over a terabyte of memory per subject for fine brain parcellations). Using the HCP dataset to test this prediction shows that the nFC alone achieves an average Pearson correlation of r = 0.76 with the empirical edge cluster similarity matrix shown in Fig. 2c. Once again, the static, node-centric, second-order features of the BOLD signal are sufficient to replicate key findings that appear at first to rely on the specific temporal sequence of BOLD cofluctuations at the single-frame resolution.
C. The null model reproduces the high similarity of the top RSS frames to the nFC

Let us now consider the root-sum-of-squares (RSS) of the edge time series introduced in [15]. The RSS is a univariate time series defined as the Euclidean norm of the edge time series vector at each frame. In other words, the RSS peaks when all the cofluctuations (i.e., the edge time series) are simultaneously high in absolute value, either positive or negative. The key finding in [15] is that only a small fraction of frames exhibiting large RSS values is required to explain a significant fraction of variance in the nFC, as well as the network's modular structure. Both results are perfectly reproduced by the static null model, as shown in Fig. 3a,b and Fig. S1. What is particularly remarkable is that the timings of the high-amplitude RSS events produced by the null model are arbitrary, and yet a small fraction of frames corresponding to these large cofluctuations is still sufficient to explain the observed nFC. Furthermore, Fig. 3c shows that the null model frames with the largest RSS also exhibit high similarity to the empirical frames with the largest RSS from the HCP dataset, despite occurring at entirely different times. Note that, unlike temporal dependencies, spatial correlations are necessary to reproduce these results: if the null model is chosen to be both temporally and spatially uncorrelated, high-RSS frames are no more similar to the empirical nFC than low-RSS frames (see Fig. S2). Another interesting note is that high-RSS frames exhibit the strongest average correlation with all other BOLD frames, and this average similarity decreases with the RSS magnitude, both in the empirical and null cases (see Fig. S3).
A theoretical explanation for these findings will be provided in Section II E and supported by detailed derivations in the Methods section. Interestingly, most of these points were also reported in [15, Fig. S4], where they were taken as evidence that large RSS events are not fMRI artefacts. While settling that methodological issue, these observations raise a conceptual concern: if matching the timing of the RSS events is not essential and the results can be replicated by the static null model, does the edge-centric approach provide any information about the time-varying connectivity that cannot be explained by the static nFC? We will address this question in the next section, by examining the statistical properties of the RSS.

D. The RSS distribution is determined by the nFC eigenvalues
Having established that the large RSS events are not an exclusive feature of neural signals, let us investigate how their ubiquitous appearance can be analytically explained and why the corresponding frames account for the largest fraction of variance in the nFC. As a first step, the RSS can be computed as the squared Euclidean norm of the z-scored BOLD signal z (full derivations are provided in the Methods section, see Equations (14) to (16)):

\mathrm{RSS}(t) = \sqrt{\sum_{j,k} \big( z_j(t)\, z_k(t) \big)^2} = \sum_j z_j(t)^2 = \lVert \mathbf{z}(t) \rVert^2 .   (4)

The intuition behind this equivalence is that summing over the pairwise products of the elements of a vector is the same operation performed when squaring a sum; in this case, the vector collects the squared BOLD signals, its pairwise products are the squared edge time series, and the squared sum is the squared Euclidean norm.
The key message is that, although it was introduced in [15] as the Euclidean norm of the edge time series, the RSS is mathematically equivalent to the squared Euclidean norm of the BOLD signal, i.e., a measure of the fluctuation of the BOLD signal amplitude over time. We can then proceed without resorting to the edge time series, which is not only convenient in practice but also shifts the conceptual focus back to the BOLD time series, which are more readily interpretable. For a large family of common (sub-Gaussian) distributions, the squared Euclidean norm of a random variable (RV) is heavy-tailed (more specifically, it is sub-exponential [29]). The RSS being a squared norm, large cofluctuations are then to be expected due to its heavy-tailed distribution (see Equation (28)), offering an explanation for the large RSS peaks observed in the BOLD time series.

[Figure 3 caption fragment: The same results hold more generally when the frames are ordered according to the corresponding RSS amplitude, either in descending or ascending order; here, the curves represent the average similarity over 100 subjects. c) The findings do not depend on the timing of the high-amplitude RSS events: the top frames generated by the null model exhibit high similarity to the top frames of the real HCP data, which occur at different times.]
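The equivalence between the RSS and the squared norm of the BOLD vector is easy to verify numerically. This minimal sketch sums over all ordered node pairs, as assumed in the derivation; the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10, 50
z = rng.standard_normal((N, T))            # stand-in for z-scored BOLD signals

# Edge time series: element-wise products for every ordered pair of nodes.
edge_ts = z[:, None, :] * z[None, :, :]    # shape (N, N, T)

# Root-sum-of-squares across all edges, frame by frame.
rss = np.sqrt((edge_ts ** 2).sum(axis=(0, 1)))

# The RSS equals the squared Euclidean norm of the BOLD vector at each frame.
sq_norm = (z ** 2).sum(axis=0)
assert np.allclose(rss, sq_norm)
```

The identity holds exactly here because the sum over all ordered pairs factorises as (Σ_j z_j²)².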
In the specific case of Gaussian variables (i.e., the null hypothesis), the RSS can be expressed as a sum of N independent Gamma(k = 1/2, θ = 2λ_i) variables, each related to an eigenvalue λ_i (i = 1, …, N) of the nFC matrix. This is summarised by the moment-generating function in Equation (27). The largest eigenvalues capture the distribution tail, and including smaller eigenvalues provides an increasingly complete characterisation of the empirical RSS distribution (Fig. 4b). This distribution can be used to test the statistical significance of the empirical RSS observed in the HCP dataset against the static null hypothesis of spatially correlated noise. Fig. 4a illustrates the convergence of the empirical RSS distribution to the null distribution as more and more time frames are observed (i.e., over longer fMRI sessions). When all the 1200 time frames available in the HCP data are utilised, the null hypothesis cannot be rejected for 58% of the participants at a 5% significance level, and for 90% of the participants after Bonferroni correction for multiple comparisons (p-values given by the two-sided Kolmogorov-Smirnov test).
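The eigenvalue decomposition of the null RSS can be checked by simulation. The sketch below compares the first two moments of the null RSS with those of the Gamma mixture; the correlation matrix is synthetic, standing in for an empirical nFC:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 200_000

# A random correlation matrix standing in for the nFC.
A = rng.standard_normal((N, 4 * N))
C = A @ A.T
d = np.sqrt(np.diag(C))
R = C / np.outer(d, d)
lam = np.linalg.eigvalsh(R)

# Null RSS: squared norm of i.i.d. Gaussian frames with covariance R.
z = np.linalg.cholesky(R) @ rng.standard_normal((N, T))
rss = (z ** 2).sum(axis=0)

# Equivalent mixture: sum of N independent Gamma(1/2, 2*lambda_i) variables.
gam = sum(rng.gamma(shape=0.5, scale=2 * lam_i, size=T) for lam_i in lam)

# Both match the analytic mean tr(R) = N and variance 2 * sum(lambda_i^2).
assert abs(rss.mean() - N) / N < 0.02
assert abs(gam.mean() - N) / N < 0.02
assert abs(rss.var() - 2 * (lam ** 2).sum()) / rss.var() < 0.1
```

Matching the full distributions (e.g., via a Kolmogorov-Smirnov test, as done in the paper) follows the same logic with more samples.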

E. nFC eigenvectors underpin spatial patterns of high BOLD activity
High-amplitude BOLD cofluctuations were observed to be underpinned by a particular spatial mode of brain activity in which default mode and control networks are anticorrelated with sensorimotor and attentional systems [15]. This particular spatial mode was defined as the first principal component of the BOLD activity and, as such, it can be obtained as the eigenvector associated with the largest eigenvalue of the static nFC matrix by mathematical equivalence and without recourse to null models (Fig. 5a).
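This equivalence between the first principal component and the leading nFC eigenvector can be demonstrated directly. A minimal sketch on synthetic data (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 8, 500
z = rng.standard_normal((N, T))
z = (z - z.mean(axis=1, keepdims=True)) / z.std(axis=1, keepdims=True)  # z-score

R = (z @ z.T) / T                 # sample nFC (correlation matrix)
_, V = np.linalg.eigh(R)          # eigenvalues in ascending order
leading = V[:, -1]                # eigenvector of the largest eigenvalue

U, _, _ = np.linalg.svd(z, full_matrices=False)
pc1 = U[:, 0]                     # first principal component of the data

# Identical up to sign: R = U S^2 U^T / T shares its eigenvectors with z's SVD.
assert np.isclose(abs(leading @ pc1), 1.0)
```

The design choice is standard PCA: the correlation matrix and the SVD of the z-scored data share the same left singular vectors, so no null model is needed.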
The only question left to answer is whether the RSS is predicted to peak when the BOLD activity aligns with the leading eigenvector. This can be proven to be true even beyond the static null hypothesis (see Equations (23) and (24) in the Methods section). An intuitive understanding can be gained once the RSS is seen as the fluctuation of the BOLD signal amplitude over time: high-amplitude frames have a larger variance, which is captured by a larger coefficient of the first principal component, since the latter is the vector that aligns with the direction of maximum variance (Fig. 5c). We can then refine our theoretical understanding of the RSS peaks: not only do they occur when the Euclidean norm of the BOLD signal vector is large but, most likely, when the vector is well aligned with the leading eigenvector of the nFC. However, the alignment with the leading eigenvector need not be perfect: in general, large RSS values can be expected whenever the BOLD vector is a mixture of the top eigenvectors (Fig. 5b and Fig. S4). Additional principal components are included as more large-RSS frames are averaged, suggesting why the top 5% of frames alone are sufficient for an almost perfect reconstruction of the nFC (Fig. 3). We have thus explained why the nFC estimates corresponding to frames with the largest RSS exhibit the highest similarity with the nFC. Moreover, since the nFC features multiple communities, high-RSS frames naturally reflect this property by exhibiting higher modularity than low-RSS frames, which are less similar to the nFC (see Fig. S1).
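A quick simulation illustrates this point: under the correlated Gaussian null, frames in the top RSS percentiles project far more strongly onto the leading eigenvector than the remaining frames. The correlation matrix below is synthetic, not HCP data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 12, 100_000

# A random correlation matrix standing in for the nFC.
A = rng.standard_normal((N, 3 * N))
C = A @ A.T
d = np.sqrt(np.diag(C))
R = C / np.outer(d, d)

lam, V = np.linalg.eigh(R)
z = np.linalg.cholesky(R) @ rng.standard_normal((N, T))

rss = (z ** 2).sum(axis=0)        # RSS as the squared norm of each frame
proj = (V[:, -1] @ z) ** 2        # squared projection on the leading eigenvector

# The top 5% RSS frames align far more with the leading eigenvector.
top = rss >= np.quantile(rss, 0.95)
assert proj[top].mean() > 1.5 * proj[~top].mean()
```

The margin in the final assertion is an arbitrary illustrative threshold; the analytic statement is the conditional expectation argument in the Methods section.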
One could speculate that the cofluctuation patterns corresponding to the leading nFC eigenvectors would be closely related to the empirical cofluctuation patterns obtained by clustering high-RSS frames as in [17]. For example, the leading nFC eigenvector has the highest similarity to all BOLD frames, and the corresponding cofluctuation pattern would resemble the most frequently occurring cluster.

F. A null model for binary edge time series
It was recently observed that thresholding the edge time series retains most of the information about the nFC, since averaging the binarised edge time series over time still yields a very accurate approximation of the nFC both at the voxel [14] and parcel level [30]. As a premise, the latter finding is empirically replicated here using the HCP dataset, with a resulting Pearson correlation of r = 0.98 between the average binarised edge time series and the nFC (average correlation over 100 unrelated subjects). How can this almost perfect correlation be explained? Furthermore, it was noted that the binary edge time series are highly constrained by the nFC [30]. What is the nature of such constraints? Both questions can be answered mathematically under the static Gaussian null hypothesis. Consider two brain regions or parcels j and k with z-scored BOLD activity Z_j(t) and Z_k(t), and their corresponding edge time series C_jk(t) = Z_j(t) Z_k(t). The thresholded edge time series, denoted B_jk(t), is equal to 1 when the cofluctuations between j and k are positive, i.e., when Z_j(t) and Z_k(t) have the same sign, and 0 otherwise. Under the null hypothesis, Z_j(t) and Z_k(t) are standard Gaussian RVs with correlation coefficient r_jk. If the two parcels are uncorrelated (r_jk = 0), each sign is positive or negative with equal probability and can be described mathematically as a Bernoulli(1/2) RV, which is a formal way to model a fair coin draw. The corresponding binary edge time series B_jk(t) can also be modelled as a fair coin draw, since the two signs are expected to agree half of the time. The situation clearly gets more complicated in the case of 200 correlated parcels, as in the HCP dataset considered here. Intuitively, we would expect B_jk(t) = 1 if the two parcels were perfectly correlated and B_jk(t) = 0 if they were perfectly anticorrelated, i.e., in consistent disagreement.

This intuition can be formalised mathematically and extended to all intermediate cases, as shown in Equation (30) of the Methods section. A trigonometric argument proves that B_jk(t) can be modelled as a biased coin, i.e., a Bernoulli(p_jk) RV with success probability

p_{jk} = \frac{1}{2} + \frac{\arcsin(r_{jk})}{\pi} .   (5)

This result reveals the exact analytic relationship between the binary edge time series and the nFC, answering the second question. Moreover, it is straightforward to show that Equation (5) also predicts the approximated nFC obtained by averaging the binary edge time series. This follows from the fact that the time average of ergodic processes (including the null model) converges to their expectation, and that the expectation of a Bernoulli RV is equal to its success probability, p_jk. Testing this analytic prediction on the HCP dataset confirms its accuracy (see Fig. S5b). Finally, we can explain the strong correlation between the original and approximated nFC by noting that p_jk in Equation (5) is approximately a linear function of r_jk for small values of the correlation (see Fig. S5a), which are the most frequent ones.
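Equation (5) is the classical Gaussian orthant probability and is easy to verify by simulation (a sketch; the sample size and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500_000
for r in (-0.8, -0.3, 0.0, 0.4, 0.9):
    cov = np.array([[1.0, r], [r, 1.0]])
    zj, zk = rng.multivariate_normal([0.0, 0.0], cov, size=T).T

    b = (zj * zk > 0)                      # binarised edge time series
    p_hat = b.mean()                       # time average of the binary series
    p_pred = 0.5 + np.arcsin(r) / np.pi    # Equation (5)
    assert abs(p_hat - p_pred) < 0.01
```

As predicted, the time average of the binary edge time series converges to the Bernoulli success probability p_jk for every correlation level.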

G. Relationship with coactivation patterns (CAPs)
CAPs [11,31] preceded edge-centric approaches in the study of spatial patterns of BOLD activity at the single-frame resolution. These patterns differ from established resting-state functional networks, and it was speculated that they may originate from a neuronal avalanching phenomenon. Both CAPs and edge-centric studies report a strong similarity between high-amplitude frames and the nFC, and both employ k-means to assign these frames to different clusters [11,17]. However, there are important differences between the two methods. CAPs are patterns of node coactivations (i.e., they are defined in node space), while the frame-wise FC patterns studied in edge-centric analyses are defined in edge space. Moreover, CAPs are seed-based, i.e., the frames to be clustered are selected based on the high activity of a single node rather than simultaneous high activity across all nodes. For example, choosing the posterior cingulate cortex (PCC) as the seed, Liu and Duyn [11] observed a strong correlation between the BOLD activity at times when the PCC is highly active and the corresponding seed-based FC map.
The static Gaussian null model employed in this study can predict this strong correlation, which is the fundamental conceptual and practical property guiding the selection of frames that are subsequently clustered into CAPs. As shown in Equation (32) of the Methods section, frames selected based on the high activity of a seed node are expected to exhibit BOLD activity patterns that reflect the corresponding column of the nFC.

[Figure 6 caption: The static Gaussian null model can explain the spatial similarity between high-amplitude frames based on a seed node and the corresponding correlation map, a core feature of coactivation patterns (CAPs). a) For each seed node (i.e., parcel), the frames are sorted in descending order based on the BOLD activity of the seed (horizontal axis). The Pearson correlation is computed between each frame and the seed-based correlation map (i.e., the column of the nFC matrix corresponding to the seed). The curves represent the average similarity over all 200 seed nodes and over 100 HCP subjects. As predicted analytically, the similarity is proportional to the BOLD activity and it decreases as lower-amplitude frames are considered (lower percentiles). The null model is in excellent agreement with the empirical results. b) Same as in panel a, but the BOLD activity of all the frames above a given threshold (percentile) is averaged before computing the correlation with the seed-based FC map. Only a small fraction of high-amplitude frames is required to explain most of the nFC variance, reproducing a well-known result in the CAPs literature [11].]

Furthermore, the
correlation is directly proportional to the activity level and thus peaks at the frames with the highest seed activity. This relationship is illustrated in Fig. 6a, where the frames are sorted in descending order based on the seed node activity. It is important to note that parcels are used instead of voxels, in alignment with the rest of the simulations herein; the mathematical results hold true in both cases. What happens when more frames are averaged? The answer is shown in Fig. 6b and follows intuitively from Fig. 6a. Let us again consider the frames sorted in descending order; as the activity threshold is lowered (i.e., set to a lower percentile), more frames are averaged. The first frames are best aligned with the seed FC vector (i.e., the nFC column corresponding to the seed node), and their average quickly converges to the same direction. The alignment (or similarity) reaches a plateau as the middle frames are added to the average, since they have low or zero correlation with the seed FC vector. Finally, as the last and most negatively correlated frames are added to the average, the similarity drops sharply. This explains and replicates the first findings of the seminal CAPs paper by Liu and Duyn [11].
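The prediction that high-seed-activity frames reflect the seed's nFC column can be sketched with synthetic correlated Gaussian data (the seed index and the 5% threshold are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, seed = 12, 200_000, 0

# A random correlation matrix standing in for the nFC.
A = rng.standard_normal((N, 3 * N))
C = A @ A.T
d = np.sqrt(np.diag(C))
R = C / np.outer(d, d)
z = np.linalg.cholesky(R) @ rng.standard_normal((N, T))

# Average the frames where the seed's activity is in its top 5%.
high = z[seed] >= np.quantile(z[seed], 0.95)
cap = z[:, high].mean(axis=1)

# The averaged pattern aligns with the seed-based FC map (nFC column).
similarity = np.corrcoef(cap, R[:, seed])[0, 1]
assert similarity > 0.9
```

This mirrors the conditional-expectation argument: under the Gaussian null, the expected frame given a high seed value is proportional to the seed's correlation map.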
To summarise, the analytic predictions in the case of CAPs differ from the edge-centric case in that the high-amplitude frames are best explained by individual nFC columns rather than its eigenvectors. However, there is an intuitive link between the two: if we select frames where a specific node is highly active (CAPs approach), the BOLD signal will most likely resemble the corresponding nFC column; if we select frames where all nodes are highly active (edge-centric approach), the BOLD signal will most likely resemble all the nFC columns, i.e., it will align with the leading nFC eigenvector.

III. DISCUSSION
We have presented mathematical proofs and numerical analyses of real HCP data supporting our claim that the static nFC is sufficient to replicate the main resting-state edge-centric findings in [15] and [16], both qualitatively and quantitatively, without relying on the edge time series or on any temporal correlations. Specifically, the eFC, the edge communities, the edge time series norm (RSS) distribution, and the spatial BOLD patterns underpinning large cofluctuations can all be predicted from the nFC under the null hypothesis of i.i.d. multivariate Gaussian variables. The inability to reject the null hypothesis for most of the 100 unrelated HCP subjects does not support the conclusion that these edge-centric metrics provide additional information beyond the nFC. These results are not an attempt to disprove the existence of finely timed neural events; they simply warn that the evidence provided by fMRI data may not be sufficient to reject simpler explanations, based on the established nFC literature, for the edge-centric features studied in [15] and [16]. In fact, previous influential studies have raised similar warnings in the context of sliding-window approaches to time-varying FC [18,20,21].
However, it would be premature to conclude that the nascent edge-centric approach has no merit, and we acknowledge the fast progress in its development and applications at the time of writing [17,25,30]. Particularly interesting is the influence of structural modules on the edge cofluctuations [25], which we briefly address in the Methods section. The size of the functional modules shapes the spectrum of the nFC: larger functional modules allow for larger eigenvalues, which underpin the high-amplitude cofluctuations. This offers a mathematical insight into the relationship between modular structure and large cofluctuations, and why the latter disappear if the modular structure is disrupted. While fully addressing the latest (and increasingly large number of) preprints is beyond the scope of this work, it is possible that model-based approaches will reveal the role of edge-centric properties in bridging brain structure and function. Indeed, temporally-unfolded (or point-wise) dependence measures have been instrumental in studying the structure-function relationship in canonical complex systems [32,33]; seeing the edge time series as point-wise mutual information under the Gaussian assumption could create new links to the existing literature. It would also be unreasonable to assume that the null hypothesis of i.i.d. variables is a good description of the BOLD signal, which is slowly varying and highly autocorrelated. Thus, the fact that such a null model is able to replicate the edge-centric features in [15] and [16] could be an indication that the temporal structure of the edge time series has not been fully exploited. Indeed, besides the notable cases of the synchronisation of the cofluctuations across subjects watching the same movies and the peak-to-peak interval distribution (see Fig. S6), most of the proposed edge-centric metrics are unaffected by time shuffling (i.e., they are static measures) and can thus be replicated by the i.i.d. null model.
Our mathematical derivations do not directly apply to sliding-window approaches, since windowed correlations are changed by time shuffling; however, it is important to remember that many common time-varying FC analysis pipelines have intermediate steps that alternately leverage and neglect temporal ordering [10]. For example, one might estimate sliding-window correlations (dynamic stage), apply k-means clustering to the resulting time-resolved FC matrices (static stage, since k-means ignores the temporal ordering of the windows), and then evaluate state properties such as dwell times and transition probabilities (dynamic stage). While dynamic measures cannot be predicted from the static nFC, it is plausible that static null models could reproduce some of the static measures involved, e.g. the state patterns found via k-means, but not their transition probabilities.
Let us note, however, that the role of null models in time-varying FC is a matter of current debate [18,34], and not all features that can be explained by null models are clinically irrelevant or to be dismissed. For example, using a small fraction of high-amplitude frames to approximate the nFC has been suggested as a way to compress the BOLD signal and alleviate the computational burden of analysing large fMRI datasets without compromising the prediction accuracy [14]. Future studies may find other useful criteria for filtering the frames using the edge time series. As a final contribution, we have analytically shown that the predominance of a few frames in shaping the nFC is to be expected: the nFC captures the BOLD signal variance, a second-order statistic whose frame-wise contributions are heavy-tailed (even if the BOLD signal were Gaussian). Therefore, the nFC is necessarily shaped by a few tail events corresponding to large-amplitude frames. These frames determine the direction of maximum variance and thus the first principal components of the BOLD signal, forming the leading eigenvectors of the nFC.
In conclusion, we have laid out the mathematical foundations for the edge-centric FC analysis with the goal of informing future studies, in an interplay with empirical observations and simulations. Future work could leverage this mathematical framework and focus on dynamic measures that cannot be easily explained by minimal null models like the one presented herein.

A. Definition of edge-centric FC
Functional connectivity is defined as the magnitude of the statistical dependence between pairs of brain parcels [35]. This dependence is typically estimated from their time series (here, the BOLD signal) using the Pearson correlation coefficient. Let N be the number of parcels, T be the number of recorded frames, and x_i = [x_i(1), . . . , x_i(T)] be the time series recorded from parcel i, with 1 ≤ i ≤ N. The correlation between two parcels i and j can be computed as
$$r_{ij} = \frac{1}{T}\, z_i z_j^\top,$$
where z_i and z_j are their z-scored time series row vectors, i.e., $z_i = (x_i - \mu_i)/\sigma_i$ (with μ_i and σ_i indicating the time-averaged mean and standard deviation). Repeating this procedure for all pairs of parcels results in a node-by-node (N × N) correlation matrix R = [r_ij], which is an estimate of the (node-centric) functional connectivity.
The edge time series between two parcels i and j is the vector resulting from the element-wise product of z_i and z_j, which encodes the magnitude of their cofluctuations over time:
$$c_{ij}(t) = z_i(t)\, z_j(t). \qquad (6)$$
We will denote the random variable (RV) associated with the edge time series c_ij(t) with a capital letter, i.e., C_ij(t).
The column vector of all the N² edge time series values at a given time t can be reshaped into an N × N matrix that is an instantaneous estimate of the dynamic functional connectivity based on a single frame. This matrix is symmetric since each edge time series is independent of the order of the node pair it involves, i.e., c_ij(t) = c_ji(t), ∀i, j. For this reason, only the upper-triangular portion is typically computed, for a total of N(N − 1)/2 edge time series instead of N².
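As an illustration, the edge time series can be computed with a few lines of NumPy. This is a minimal sketch on arbitrary toy data (the dimensions and surrogate signal are hypothetical, not HCP parameters); averaging each edge time series over frames recovers the corresponding Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 1000                      # parcels, frames (toy sizes)
x = rng.standard_normal((N, T))     # surrogate "BOLD" data

# z-score each parcel's time series
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

# edge time series: c_ij(t) = z_i(t) * z_j(t) for the upper-triangular pairs
i, j = np.triu_indices(N, k=1)      # N*(N-1)/2 node pairs
c = z[i] * z[j]                     # shape: (N*(N-1)/2, T)

# averaging each edge time series over frames recovers the Pearson correlation
R = np.corrcoef(z)
assert np.allclose(c.mean(axis=1), R[i, j])
```

This makes the "temporal unwrapping" explicit: the static correlation r_ij is exactly the time average of the edge time series c_ij(t).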
It is also possible to go one step further and estimate the statistical dependence between each pair of edge time series, where each edge corresponds to a pair of parcels. This fourth-order statistic results in a large N(N−1)/2 × N(N−1)/2 matrix named edge functional connectivity (eFC) [16], with normalised entries defined as
$$\mathrm{eFC}_{jk,lm} = \frac{\sum_t c_{jk}(t)\, c_{lm}(t)}{\sqrt{\sum_t c_{jk}(t)^2}\, \sqrt{\sum_t c_{lm}(t)^2}}. \qquad (7)$$

B. Null hypothesis

Let z be the N × T (parcels × frames) matrix of z-scored BOLD observations. Since our goal is to derive the (dynamic) edge-centric properties from the (static) nFC matrix R = [r_ij], we need to define a null hypothesis that discounts any temporal dependencies but retains the observed spatial correlations in R. A simple null hypothesis on the distribution of (the columns of) z that satisfies this criterion is Z(t) ∼ N(0, R), that is, i.i.d. multivariate Gaussian RVs with a covariance matrix R matching the observed nFC. If we denote the state of the system at time t as the column vector z(t) = [z_1(t), . . . , z_N(t)]^⊤, the null hypothesis simply states that z(t) is drawn from the same multivariate Gaussian distribution at each time t, independently of the other samples.
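The null model is straightforward to simulate. The sketch below uses a small hypothetical correlation matrix in place of an observed nFC: drawing each frame independently from N(0, R) yields a surrogate whose sample correlation converges to R as T grows.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 200_000

# toy nFC playing the role of the observed R (hypothetical values)
R = np.array([[1.0, 0.6, 0.2, 0.0],
              [0.6, 1.0, 0.3, 0.1],
              [0.2, 0.3, 1.0, 0.4],
              [0.0, 0.1, 0.4, 1.0]])

# null hypothesis: each frame z(t) is an independent draw from N(0, R)
z_null = rng.multivariate_normal(np.zeros(N), R, size=T).T   # (N, T)

# the sample correlation of the surrogate converges to R as T grows
R_hat = np.corrcoef(z_null)
assert np.max(np.abs(R_hat - R)) < 0.02
```

By construction, the surrogate has no temporal structure: permuting its frames leaves its distribution unchanged, which is exactly what makes it a static null model.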

C. Derivation of edge FC
With capital letters denoting RVs, the expected product between two edge time series C_jk(t) and C_lm(t) at time t is
$$E[C_{jk}(t) C_{lm}(t)] = \kappa\big(Z_j(t) Z_k(t) Z_l(t) Z_m(t)\big) + E[Z_j(t) Z_k(t)]\, E[Z_l(t) Z_m(t)] + E[Z_j(t) Z_l(t)]\, E[Z_k(t) Z_m(t)] + E[Z_j(t) Z_m(t)]\, E[Z_k(t) Z_l(t)], \qquad (8)$$
which follows from the definition of the joint cumulant κ(Z_j(t) Z_k(t) Z_l(t) Z_m(t)), also noting that the products involving the expectation of a single variable are equal to zero (i.e., E[Z_i(t)] = 0) since the z_i are z-scored. The expression of the expectation in terms of joint cumulants is sometimes referred to as the moment-cumulants formula [36,37]. The joint cumulant is equal to zero for Gaussian RVs [37], allowing a simplification of Equation (8) known as Isserlis' theorem [38]:
$$E[C_{jk}(t) C_{lm}(t)] = E[Z_j(t) Z_k(t)]\, E[Z_l(t) Z_m(t)] + E[Z_j(t) Z_l(t)]\, E[Z_k(t) Z_m(t)] + E[Z_j(t) Z_m(t)]\, E[Z_k(t) Z_l(t)]. \qquad (9)$$
If we additionally assume that Z(t) is ergodic (which does not preclude it from being a multivariate autoregressive process [18]), all the involved terms become independent of time and Equation (9) further simplifies to
$$E[C_{jk}(t) C_{lm}(t)] = r_{jk} r_{lm} + r_{jl} r_{km} + r_{jm} r_{kl}. \qquad (10)$$
In a mathematical sense, the expectation is to be intended over the population (ensemble), whereas the eFC is computed from a single session (sample). However, the ergodic assumption guarantees that sample estimates converge to the ensemble expectation as the number of time frames increases. We can then obtain an estimate of the eFC by substituting Equation (10) into Equation (7):
$$\mathrm{eFC}_{jk,lm} = \frac{r_{jk} r_{lm} + r_{jl} r_{km} + r_{jm} r_{kl}}{\sqrt{1 + 2 r_{jk}^2}\, \sqrt{1 + 2 r_{lm}^2}}. \qquad (11)$$
The i.i.d. multivariate Gaussian null model clearly satisfies the ergodic hypothesis since it has no memory. However, note that neither the i.i.d. nor the Gaussian properties are necessary since the derivations in this section assume ergodicity with the only additional constraint that the joint cumulant is equal to zero.
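The analytic eFC prediction derived above can be checked by Monte Carlo under the null model. The correlation matrix below is a hypothetical toy example; with a long enough i.i.d. Gaussian surrogate, the empirical eFC matches the closed-form expression entry by entry.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 1_000_000

# toy nFC (hypothetical values; any positive-definite correlation matrix works)
R = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])

# empirical eFC from a long i.i.d. Gaussian surrogate
z = rng.multivariate_normal(np.zeros(N), R, size=T).T
i, j = np.triu_indices(N, k=1)
c = z[i] * z[j]                               # edge time series
norm = np.sqrt((c ** 2).sum(axis=1))
eFC_emp = (c @ c.T) / np.outer(norm, norm)

# analytic prediction from the nFC alone (via Isserlis' theorem):
# eFC_{jk,lm} = (r_jk r_lm + r_jl r_km + r_jm r_kl)
#               / sqrt((1 + 2 r_jk^2) (1 + 2 r_lm^2))
E = len(i)
eFC_pred = np.empty((E, E))
for a in range(E):
    for b in range(E):
        ja, ka = i[a], j[a]
        jb, kb = i[b], j[b]
        num = R[ja, ka] * R[jb, kb] + R[ja, jb] * R[ka, kb] + R[ja, kb] * R[ka, jb]
        eFC_pred[a, b] = num / np.sqrt((1 + 2 * R[ja, ka] ** 2)
                                       * (1 + 2 * R[jb, kb] ** 2))

assert np.max(np.abs(eFC_emp - eFC_pred)) < 0.03
```

Note that the prediction uses only the static nFC: no edge time series, and no temporal information, enters the right-hand side.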

D. Derivation of edge communities
In order to formalise the intuition that two edges (jk) and (j′k′) with similar rows of the eFC are likely to be clustered together, let us define their distance d_{jk,j′k′} as the ℓ1 norm of the difference between the corresponding rows of the (unnormalised) eFC:
$$d_{jk,j'k'} = \sum_{l,m=1}^{N} \big| r_{jk} r_{lm} + r_{jl} r_{km} + r_{jm} r_{kl} - r_{j'k'} r_{lm} - r_{j'l} r_{k'm} - r_{j'm} r_{k'l} \big|. \qquad (12)$$
We now have the necessary ingredients to build a measure of similarity between the nodes, which can be used to predict the edge cluster similarity in [16]. There, the similarity between two nodes is measured as the frequency with which the corresponding edges are clustered together (having fixed the number of communities to 10). Instead of discrete assignments to 10 communities, Equation (12) provides a continuous measure of the distance between two edges. The distance between two nodes i and j can then be defined as the sum of the distances between the edges starting from i and j and reaching the same target:
$$d_{ij} = \sum_{k=1}^{N} d_{ik,jk} \le \Big[ 3 \sqrt{N} \sum_{l,m=1}^{N} |r_{lm}| \Big]\, \lVert R_i - R_j \rVert_2,$$
where we used the triangle and Cauchy-Schwarz inequalities and noted that the terms in the square brackets form a constant (independent of i and j). It is then apparent that the edge-cluster similarity [16] between nodes i and j can be approximated by the nFC. Once again, note that the i.i.d. Gaussian RV assumption can be relaxed since the derivations in this section are based on Equation (10), which requires ergodicity with the only additional constraint that the joint cumulant is equal to zero.
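The argument can be made concrete with a short numerical check. The bound below (a constant times the Euclidean distance between the corresponding nFC rows, provable with the triangle and Cauchy-Schwarz inequalities) is verified on a random toy correlation matrix standing in for the nFC.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 8

# toy correlation matrix standing in for the nFC (hypothetical data)
A = rng.standard_normal((N, 3 * N))
R = np.corrcoef(A)

def efc_row(R, j, k):
    # unnormalised eFC row of edge (j, k) over all pairs (l, m):
    # r_jk * r_lm + r_jl * r_km + r_jm * r_kl
    return R[j, k] * R + np.outer(R[j], R[k]) + np.outer(R[k], R[j])

def d_edge(R, j, k, jp, kp):
    # l1 distance between the unnormalised eFC rows of two edges
    return np.abs(efc_row(R, j, k) - efc_row(R, jp, kp)).sum()

# node distance: sum over edges from i and j reaching the same target k
i0, j0 = 0, 1
d_node = sum(d_edge(R, i0, k, j0, k) for k in range(N))

# bound via the triangle and Cauchy-Schwarz inequalities:
# d_ij <= [3 sqrt(N) sum_lm |r_lm|] * ||R_i - R_j||_2
bound = 3 * np.sqrt(N) * np.abs(R).sum() * np.linalg.norm(R[i0] - R[j0])
assert d_node <= bound
```

The key point is that the node distance is controlled by the distance between two rows of the static nFC, which is why the edge communities can be anticipated without the edge time series.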

E. Derivation of RSS from the BOLD signal
Recalling the definition of the edge time series c_ij(t) in Equation (6), the RSS defined in [15] can be approximated as the squared Euclidean norm of the z-scored BOLD signal, up to a constant factor:
$$\mathrm{RSS}(t) = \sqrt{\sum_{i<j} c_{ij}(t)^2} = \sqrt{\tfrac{1}{2}\Big[ \lVert z(t) \rVert^4 - \sum_i z_i(t)^4 \Big]} \approx \frac{\lVert z(t) \rVert^2}{\sqrt{2}}. \qquad (14)$$
The approximation does not rely on the i.i.d. assumption; it is valid under the Gaussian null hypothesis and, more generally, for distributions with finite kurtosis, including fMRI data. Under this assumption, ∥z(t)∥⁴ dominates Σ_i z_i(t)⁴ in Equation (14), as can be seen from the ratio of their expected values (shown here under the Gaussian null hypothesis):
$$\frac{E\big[\sum_i Z_i(t)^4\big]}{E\big[\lVert Z(t) \rVert^4\big]} = \frac{3N}{N^2 + 2\sum_{i,j} r_{ij}^2} = O(N^{-1}). \qquad (15)$$
The approximation in Equation (14) can be replaced by an exact equality if all the N² edge time series are included in the RSS definition (that is, all the (i, j) tuples, rather than only the pairs with i < j):
$$\sqrt{\sum_{i,j=1}^{N} c_{ij}(t)^2} = \sqrt{\Big(\sum_i z_i(t)^2\Big)\Big(\sum_j z_j(t)^2\Big)} = \lVert z(t) \rVert^2. \qquad (16)$$

F. Why are frames with the largest RSS most similar to the nFC?
Having rewritten the RSS as the squared Euclidean norm in Equation (14), we can more easily investigate the conditions underpinning the largest RSS fluctuations. Let us whiten the BOLD vector Z(t) to obtain the RV
$$W(t) = R^{-1/2} Z(t) \sim \mathcal{N}(0, I),$$
and let
$$R = U \Lambda U^\top$$
be the eigendecomposition of the correlation matrix R, with Λ = diag(λ_1, . . . , λ_N) and U being the unitary matrix of eigenvectors of R (since R is symmetric). Without loss of generality, assume that the eigenvalues are sorted in descending order, such that λ_1 is the largest eigenvalue and u_1 is the corresponding leading eigenvector. The RSS can be treated as a RV and rewritten in terms of W(t) and the eigenvector matrix U:
$$\mathrm{RSS}(t) \approx \frac{\lVert Z(t) \rVert^2}{\sqrt{2}} = \frac{1}{\sqrt{2}} \sum_{i=1}^{N} \lambda_i \langle u_i, W(t) \rangle^2 = \frac{\lVert W(t) \rVert^2}{\sqrt{2}} \sum_{i=1}^{N} \lambda_i \cos^2 \Theta_i(t), \qquad (19)$$
where u_i is the i-th eigenvector and Θ_i(t) is the RV representing the angle formed by the vectors u_i and W(t) at time t. Also note that ∥u_i∥² = 1 because U is unitary.
For any realisation w(t) with squared norm ∥w(t)∥², an upper bound on the RSS is obtained as
$$\mathrm{RSS}(t) \approx \frac{\lVert w(t) \rVert^2}{\sqrt{2}} \sum_{i=1}^{N} \lambda_i \cos^2 \theta_i(t) \le \frac{\lambda_1}{\sqrt{2}}\, \lVert w(t) \rVert^2,$$
noting that
$$\sum_{i=1}^{N} \cos^2 \theta_i(t) = 1$$
since the eigenvectors u_i form an orthonormal basis. The upper bound is reached when θ_1(t′) = 0 or θ_1(t′) = π, which implies that w(t′) = c u_1, where c is a constant. In other words, w(t′) is aligned (parallel or antiparallel) with the leading eigenvector u_1. When this happens, the BOLD signal vector z(t′) must also be aligned with u_1:
$$z(t') = R^{1/2} w(t') = c\, R^{1/2} u_1 = c \sqrt{\lambda_1}\, u_1.$$
We can then refine our theoretical understanding of the RSS peaks: not only do they occur when the Euclidean norm of the BOLD signal vector is large (as per Equation (14)) but, most likely, when it is well aligned with the leading eigenvector of the static nFC (see Fig. 5b). If the alignment were perfect at a frame t′, the instantaneous estimate of the nFC would be
$$z(t') z(t')^\top = c^2 \lambda_1\, u_1 u_1^\top,$$
i.e., an approximation of the nFC obtained from its leading eigenvector only (and independent of the sign of c). This approximation would achieve an average similarity (Pearson correlation coefficient) of r = 0.69 with the nFC over 100 unrelated participants of the HCP dataset (while, in practice, the highest similarity achieved by the top frame was r = 0.53). However, the alignment with u_1 need not be perfect: in general, large RSS values can be expected whenever the BOLD signal is a mixture of the top eigenvectors. Additional principal components are expressed as more large-RSS frames are averaged, suggesting why the top 5% of frames alone are sufficient for an almost perfect reconstruction of the nFC (Fig. 3).
We have thus explained why cofluctuation patterns corresponding to frames with the largest RSS exhibit the highest similarity with the nFC. These results are based on Equation (14) and hold true under the assumption of finite kurtosis (which also applies in the specific case of the null hypothesis, i.e., for Gaussian variables). The i.i.d. assumption is not required.
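Both ingredients of this argument (the norm approximation of the RSS and the alignment of large-RSS frames with the leading nFC eigenvector) can be verified on a toy surrogate. The sketch below uses a hypothetical one-factor model, not fMRI data; the loadings and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 50, 2000

# toy surrogate with one global factor (hypothetical loadings, not fMRI data)
g = rng.standard_normal(T)
x = 0.8 * g + 0.6 * rng.standard_normal((N, T))
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

# RSS over the upper-triangular edge time series vs. its norm approximation
i, j = np.triu_indices(N, k=1)
rss = np.sqrt(((z[i] * z[j]) ** 2).sum(axis=0))
rss_approx = (z ** 2).sum(axis=0) / np.sqrt(2)      # ||z(t)||^2 / sqrt(2)
rel_err = np.abs(rss - rss_approx) / rss_approx
assert rel_err.mean() < 0.05                        # close for moderate N

# frames with the largest RSS tend to align with the leading nFC eigenvector
R = np.corrcoef(z)
eigvals, U = np.linalg.eigh(R)                      # ascending order
u1 = U[:, -1]                                       # leading eigenvector
cos2 = (z.T @ u1) ** 2 / (z ** 2).sum(axis=0)       # squared cosine with u1
top = np.argsort(rss)[-20:]                         # 1% largest-RSS frames
assert cos2[top].mean() > cos2.mean()
```

The second assertion makes the geometric claim operational: the top-RSS frames are, on average, more aligned with u_1 than a typical frame.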

G. Null distribution of the RSS
The RSS can be written as a simple quadratic form,
$$\mathrm{RSS}(t) \approx \frac{\lVert Z(t) \rVert^2}{\sqrt{2}} = \frac{1}{\sqrt{2}}\, W(t)^\top R\, W(t),$$
which is known to follow a generalised χ² distribution under the null hypothesis of Gaussian variables [39]. The weights of the chi-square components are proportional to the eigenvalues of the nFC matrix, i.e., λ_1/√2, . . . , λ_N/√2. Another characterisation of this distribution is provided by Equation (19): under the null hypothesis, the inner product ⟨u_i, W(t)⟩ follows a standard normal distribution since W(t) ∼ N(0, I) and U is unitary. Therefore, ⟨u_i, W(t)⟩² follows a χ² distribution and each term (λ_i/√2) ⟨u_i, W(t)⟩² follows a Gamma distribution with shape 1/2 and scale √2 λ_i. The RSS is thus obtained as a sum of N independent Gamma-distributed RVs, each associated with one eigenvalue of the nFC. The tail of the RSS is best approximated by the RVs associated with the largest eigenvalues (which have the largest mean and variance), while including smaller eigenvalues provides an increasingly fuller characterisation of the whole distribution (Fig. 4b). The mean and variance of the RSS can be readily obtained from the properties of the Gamma distribution:
$$E[\mathrm{RSS}] = \frac{1}{\sqrt{2}} \sum_{i=1}^{N} \lambda_i = \frac{N}{\sqrt{2}}, \qquad \mathrm{Var}[\mathrm{RSS}] = \sum_{i=1}^{N} \lambda_i^2. \qquad (21)$$
Higher moments of the RSS null distribution can be derived from its moment-generating function:
$$M_{\mathrm{RSS}}(s) = \prod_{i=1}^{N} \big(1 - \sqrt{2}\, \lambda_i s\big)^{-1/2}. \qquad (27)$$

H. Why are large RSS fluctuations present in many datasets?
The moment-generating function in Equation (27) can be employed to show that the RSS is subexponential under the null hypothesis, which explains its heavy tail and the consequent large events [24,40]. Specifically, the subexponential character of the null RSS follows from the sufficient condition that its moment-generating function is finite in a neighbourhood of the origin:
$$M_{\mathrm{RSS}}(s) < \infty \quad \text{for } |s| < \frac{1}{\sqrt{2}\, \lambda_1}.$$
However, we can expect this behaviour under the more general hypothesis that the z-scored BOLD signal is sub-Gaussian, i.e., its tail decays at least as fast as that of a Gaussian RV (including, for example, any uniformly-bounded RVs). The reason is that the square of a sub-Gaussian RV is subexponential, and the sum of independent subexponential RVs is also subexponential. Therefore, since the RSS is closely approximated by a sum of squared RVs (as per Equation (14)), extreme events are to be expected under the general sub-Gaussian assumption for the BOLD signal, which offers an explanation for the large RSS fluctuations observed in most fMRI datasets.
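The spectral formulas for the mean and variance of the null RSS are easy to check by simulation. The sketch below uses a random toy correlation matrix in place of the nFC and computes the RSS through its norm approximation.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 6, 1_000_000

# toy correlation matrix standing in for the nFC (its eigenvalues sum to N)
A = rng.standard_normal((N, 4 * N))
R = np.corrcoef(A)
lam = np.linalg.eigvalsh(R)

# null RSS via its norm approximation, with z(t) ~ N(0, R) drawn i.i.d.
z = rng.multivariate_normal(np.zeros(N), R, size=T).T
rss = (z ** 2).sum(axis=0) / np.sqrt(2)

# spectral predictions: E[RSS] = N / sqrt(2), Var[RSS] = sum_i lambda_i^2
assert abs(rss.mean() - N / np.sqrt(2)) < 0.02
assert abs(rss.var() - (lam ** 2).sum()) / (lam ** 2).sum() < 0.02
```

Since the mean is fixed at N/√2 by the trace, only the spread of the eigenvalues (and hence the modular structure of the nFC) controls the size of the RSS fluctuations.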
I. How do nFC modules influence the edge cofluctuations?
Interestingly, Pope et al. [25] have recently reported a connection between the presence of structural modules and the occurrence of large events in the edge cofluctuations (RSS). Insofar as structural and functional modules are in agreement [41,42], we can explain these findings based on the nFC spectrum. How do functional modules shape the eigenspectrum of the nFC? In the ideal case of a block-diagonal matrix (with zeroes outside the blocks), the sum of the eigenvalues corresponding to each block coincides with the block size (since the diagonal elements are all ones and the trace is preserved under diagonalisation). As such, the largest eigenvalue is bounded by the size of the largest block, i.e., larger functional modules allow for larger eigenvalues. In turn, large eigenvalues underpin the high-amplitude cofluctuations, as shown in Section IV F. Therefore, if the size of the modules is reduced via randomisation of the structural connectivity as in [25,SI Fig. 3], the expected magnitude of the RSS cofluctuations will drop according to Equation (21). This offers a mathematical explanation for the lower RSS event count when the modular structure is disrupted.
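The trace argument is easy to verify numerically on an idealised block-diagonal correlation matrix; the module sizes below are arbitrary toy values.

```python
import numpy as np

# ideal modular nFC: perfectly correlated blocks of sizes 5, 3, 2 (toy example)
sizes = [5, 3, 2]
N = sum(sizes)
R = np.zeros((N, N))
start = 0
for s in sizes:
    R[start:start + s, start:start + s] = 1.0   # all-ones block
    start += s

lam = np.linalg.eigvalsh(R)
# each all-ones block of size s contributes eigenvalues (s, 0, ..., 0),
# so the largest eigenvalue equals the size of the largest module
assert np.isclose(lam[-1], max(sizes))
# and the eigenvalues sum to N (the trace is preserved under diagonalisation)
assert np.isclose(lam.sum(), N)
```

Shrinking the largest block therefore caps λ_1, which by Equation (21) caps the expected magnitude of the RSS cofluctuations.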

J. A null model for binary edge time series
Consider two parcels j and k with z-scored BOLD activity Z_j(t) and Z_k(t) and correlation coefficient r_jk as defined by the nFC matrix. Under the static Gaussian null hypothesis, Z_k(t) ∼ N(0, 1) and
$$Z_j(t) = r_{jk} Z_k(t) + \sqrt{1 - r_{jk}^2}\; V_j(t), \qquad (29)$$
where V_j ∼ N(0, 1), and the square-root coefficient ensures unit variance for Z_j(t) since both parcels are z-scored. The edge time series corresponding to the two parcels is C_jk(t) = Z_j(t) Z_k(t), as per Equation (6). By definition, the binary edge time series is equal to 1 when the cofluctuations between j and k are positive, i.e., when Z_j(t) and Z_k(t) have the same sign. In other words, the binary edge time series is a Bernoulli(p_jk) RV with probability
$$p_{jk} = P\big(Z_j(t) Z_k(t) > 0\big) = 2\, P(A), \qquad A = \Big\{ Z_k > 0,\; r_{jk} Z_k + \sqrt{1 - r_{jk}^2}\, V_j > 0 \Big\},$$
where the factor 2 follows from the symmetry between the two same-sign quadrants. From a geometric perspective, A is satisfied by any vector (Z_k, V_j) whose polar angle is between $\arctan\big(-r_{jk}/\sqrt{1 - r_{jk}^2}\big)$ and π/2. Therefore,
$$P(A) = \frac{1}{2\pi}\left(\frac{\pi}{2} + \arctan\frac{r_{jk}}{\sqrt{1 - r_{jk}^2}}\right) = \frac{1}{4} + \frac{\arcsin(r_{jk})}{2\pi}.$$
In conclusion, under the null hypothesis, the binary edge time series can be modelled as a Bernoulli(p_jk) RV with
$$p_{jk} = \frac{1}{2} + \frac{\arcsin(r_{jk})}{\pi}.$$
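The arcsine formula for p_jk is straightforward to confirm by simulating the pair construction in Equation (29); the correlation values below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 1_000_000
errors = []
for r in (-0.5, 0.0, 0.7):
    # null model for a parcel pair with correlation r, as in Equation (29)
    zk = rng.standard_normal(T)
    zj = r * zk + np.sqrt(1 - r ** 2) * rng.standard_normal(T)
    # binarised edge time series: 1 when the cofluctuation is positive
    p_emp = np.mean(zj * zk > 0)
    p_theory = 0.5 + np.arcsin(r) / np.pi
    errors.append(abs(p_emp - p_theory))
assert max(errors) < 0.005
```

This is the classical orthant probability for a bivariate Gaussian: a purely static quantity, fully determined by the single nFC entry r_jk.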

K. Relationship with coactivation patterns (CAPs)
Here, we will focus on explaining a well-known finding in the CAPs literature: choosing a seed node, Liu and Duyn [11] observed a strong correlation between the BOLD activity at frames where the seed is highly active and the corresponding seed-based FC. Note that the correlation is measured between two N-dimensional vectors (where N is the number of nodes): the first vector is the BOLD signal and the second vector is the column of the nFC matrix that corresponds to the chosen seed node. In the following, we will show that the i.i.d. Gaussian null model can predict the observed correlation. Let k be the seed node and z_k its z-scored BOLD time series. As usual, the associated random process is denoted with the capital letter Z_k(t). If we condition on a specific value of the seed node, say z*_k := Z_k(t*), the BOLD activity vector at the corresponding time t* is expected to align with the k-th column of the nFC (denoted as R_k):
$$E[Z(t^*) \mid Z_k(t^*) = z^*_k] = z^*_k\, R_k.$$
This is an elementary property of multivariate Gaussian RVs that follows directly from Equation (29):
$$E[Z_j(t^*) \mid Z_k(t^*) = z^*_k] = r_{jk}\, z^*_k \quad \text{for every parcel } j.$$
Perhaps less intuitively, the conditional correlation between the BOLD vector Z(t*) and the k-th column of the sample nFC (denoted as $\hat{R}_k$) is also proportional to the seed value z*_k. In order to prove it, let us first compute their expected inner product:
$$E\big[\langle Z(t^*), \hat{R}_k \rangle \,\big|\, Z_k(t^*) = z^*_k\big] = z^*_k \sum_{j=1}^{N} r_{jk}\, \hat{r}_{jk} \approx z^*_k\, \lVert R_k \rVert^2.$$
Similarly, the covariance (computed across parcels) can be approximated as
$$E\big[\mathrm{Cov}(Z(t^*), \hat{R}_k) \,\big|\, Z_k(t^*) = z^*_k\big] \approx z^*_k \left[ \frac{1}{N} \sum_{j} r_{jk}^2 - \Big( \frac{1}{N} \sum_{j} r_{jk} \Big)^{2} \right].$$
The expected sample covariance is directly proportional to z*_k and thus peaks at frames with the highest activity of the seed node k. This remains true after normalising the covariance to obtain the correlation coefficient, as shown in Fig. 6a.
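A quick simulation illustrates the prediction. The sketch below uses a hypothetical factor model with heterogeneous loadings in place of real BOLD data: the similarity between a frame's activity pattern and the seed FC map grows with the seed activity.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 30, 5000

# hypothetical factor model standing in for z-scored BOLD data
a = rng.uniform(0.2, 0.9, size=N)       # heterogeneous loadings (toy values)
a[0] = 0.9                              # strong loading for the seed
g = rng.standard_normal(T)
x = a[:, None] * g + np.sqrt(1 - a ** 2)[:, None] * rng.standard_normal((N, T))
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
R = np.corrcoef(z)

k = 0                                   # seed parcel
Rk = R[:, k]                            # seed-based FC map (k-th nFC column)

# similarity between each frame's activity pattern and the seed FC map
frame_corr = np.array([np.corrcoef(z[:, t], Rk)[0, 1] for t in range(T)])

# the similarity grows with the seed activity, so the most active seed
# frames resemble the seed FC map the most
order = np.argsort(z[k])
assert np.corrcoef(frame_corr, z[k])[0, 1] > 0.3
assert frame_corr[order[-100:]].mean() > frame_corr[order[:100]].mean()
```

As in the CAPs finding, averaging only the frames with the highest seed activity would therefore yield a good approximation of the seed-based FC map, here without any temporal structure in the data.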

L. Human Connectome Project fMRI Dataset
This study used openly-available and independently-acquired resting-state fMRI (rsfMRI) data from the Human Connectome Project (HCP) S1200 release [43]. In particular, we used the "100 unrelated subjects" dataset: a subset of 100 non-twin adult participants who were pre-selected by the HCP coordinators (54% female; mean age = 29.11 ± 3.67 years; age range, 22-36 years). The HCP study was approved by the Washington University Institutional Review Board, and informed consent was obtained from all participants. All subjects were scanned on a customized Siemens 3T "Connectome Skyra" with a 32-channel head coil, housed at Washington University in St. Louis. rsfMRI data was acquired in four runs of 15 minutes over a 2-day period, with eyes open and relaxed fixation on a projected bright cross-hair on a dark background (presented in a darkened room). Resting-state images were collected with the following parameters: gradient-echo EPI sequence, run duration = 14:33 min, TR = 720 ms, TE = 33.1 ms, flip angle = 52°, FOV = 208x180 mm (RO x PE), matrix = 104x90 (RO x PE), slice thickness = 2 mm, 2-mm isotropic voxel resolution, multi-band factor = 8, echo spacing = 0.58 ms, BW = 2290 Hz/Px.
M. Pre-processing and ICA-FIX denoising

Functional images in the HCP dataset were minimally pre-processed according to the pipeline described in [44]. In short, the data was corrected for gradient distortion, susceptibility distortion and motion, and then aligned to a corresponding T1-weighted image with one spline interpolation step. This volume was further corrected for intensity bias, normalised to a mean of 10000, projected to the 32k fs LR mesh (excluding outliers), and aligned to a common space using a multi-modal surface registration.
In addition, the preprocessed rsfMRI data was cleaned of structured noise through a process that pairs independent component analysis (MELODIC) with FIX to automatically remove non-neural spatiotemporal components (trained on 25 hand-labeled HCP subjects). The FIX approach and initial results of classification accuracy are detailed in [45], and the effects of the ICA + FIX cleanup (and optimal methods to remove the artefactual components from the data) are evaluated in detail in [46]. The cleaning pipeline is described more comprehensively in the HCP S1200 release reference manual (https://humanconnectome.org/study/hcp-young-adult/document/1200-subjects-data-release/) and the preprocessing and cleaning scripts are openly available on GitHub (https://github.com/Washington-University/HCPpipelines).
The Schaefer200 parcellation was used to define 200 areas on the cerebral cortex [47]. This functional parcellation was designed to optimise both local gradient and global similarity measures of the fMRI signal and is openly available in '32k fs LR' space for the HCP dataset. The nodes are mapped to the Yeo canonical functional networks [48]. The parcellated data was analysed both before and after regressing the global signal. The theoretical derivations and predictions hold and perform equally well in both cases, and we report any significant differences when they occur. Unless otherwise stated, the GSR results are shown in the figures since they are more directly comparable to those published in [15] and [16], noting in particular that GSR was performed in [15]. Although the ICA-FIX preprocessing pipeline used here is entirely different from those employed in [15] and [16], our results are in excellent agreement with the previously published ones.

V. DATA AVAILABILITY
The imaging data from the Human Connectome Project are publicly available and can be accessed after signing a data use agreement at https://db.humanconnectome.org. Source data are provided with this paper.

VI. CODE AVAILABILITY
The analysis was performed with MATLAB (MathWorks, Inc., version 2020b) and the code is made freely available on GitHub for reproducibility (https://github.com/LNov/eFC).
A permanent record is also made available on Zenodo (https://zenodo.org/record/6238564).

FIG. S1. The static Gaussian null model can reproduce the empirical network modularity, which can be interpreted as a measure of the segregation between the network systems. The employed q* variant of modularity is well suited for use with correlation matrices [49]. a) The empirical results match the finding published in [15, Fig. 1E]: the networks estimated using the top 5% of frames exhibit much higher modularity than those estimated using the bottom 5% of frames. The gap is accurately replicated by the null model. Each point corresponds to one of 100 unrelated subjects from the Human Connectome Project (HCP) dataset, with boxes indicating the quartiles and whisker length specified as 1.5 times the interquartile range. b) The same results hold more generally when the frames are ordered according to the corresponding RSS amplitude, either in descending or ascending order. Here, the curves represent the average over 100 subjects.

FIG. S2. Accounting for spatial correlations is essential to reproduce the empirical edge-centric findings. To prove that, we repeated the analysis reported in Fig. 3 but ignored the spatial correlations in addition to the temporal ones. As a result, the differences between frames exhibiting high and low cofluctuation magnitudes (RSS) vanished. a) When using spatially uncorrelated white noise, the small fraction of frames exhibiting the largest RSS is no longer able to explain most of the nFC variance (unlike the null model presented in the main text, which accounts for the observed spatial correlations encoded in the nFC matrix). The similarity is computed as the Pearson correlation coefficient between the nFC and the average FC estimated from the top and bottom 5% of the total frames. Each point corresponds to one of 100 unrelated subjects from the Human Connectome Project (HCP) dataset, with boxes indicating the quartiles and whisker length specified as 1.5 times the interquartile range. b) The same inability is shown more generally when the frames are ordered according to the corresponding RSS amplitude, either in descending or ascending order. Here, the curves represent the average similarity over 100 subjects. c) Unlike the spatially-correlated null model, the top frames generated by Gaussian white noise do not exhibit high similarity to the top frames of the real HCP data.

FIG. S3. Frames corresponding to large RSS values exhibit high similarity to the leading nFC eigenvectors. This similarity increases with the number of frames, as shown by comparing the empirical and synthetic results. a) Empirical results using the 1200 frames available in the HCP dataset, as in Fig. 5b in the main text. The similarity is measured as the Pearson correlation between the FC estimate from a single frame and each of the estimates from the four leading nFC eigenvectors. b) The same analysis was repeated on 10-times-longer synthetic data (i.e., 12 000 frames compared to the previously analysed 1200 frames) generated for each HCP subject using their empirical nFC as input to the null model. The empirical nFC of a subject was set as the covariance matrix of a multivariate Gaussian process, which was sampled to obtain a longer synthetic scan for that subject.