Abstract
One of the most well-established tools for modeling the brain is the functional connectivity network, which is constructed from pairs of interacting brain regions. While powerful, the network model is limited by the restriction that only pairwise dependencies are considered and potentially higher-order structures are missed. Here, we explore how multivariate information theory reveals higher-order dependencies in the human brain. We begin with a mathematical analysis of the O-information, showing analytically and numerically how it is related to previously established information-theoretic measures of complexity. We then apply the O-information to brain data, showing that synergistic subsystems are widespread in the human brain. Highly synergistic subsystems typically sit between canonical functional networks, and may serve an integrative role. We then use simulated annealing to find maximally synergistic subsystems, finding that such systems typically comprise ≈10 brain regions, recruited from multiple canonical brain systems. Though ubiquitous, highly synergistic subsystems are invisible when considering pairwise functional connectivity, suggesting that higher-order dependencies form a kind of shadow structure that has gone unrecognized by established network-based analyses. We assert that higher-order interactions in the brain represent an under-explored space that, accessible with tools of multivariate information theory, may offer novel scientific insights.
Introduction
Perhaps the most ubiquitous model used in complex systems is the network, which represents pairwise interactions between different elements of a system as directed or undirected graphs^{1,2}. While network models can be extremely powerful, they are also fundamentally limited by the rule that every interaction between elements is strictly bivariate. Hence, interactions between three or more nodes must be indirectly inferred, using methods such as motifs^{3}, transitivity or clustering coefficients^{4}, and mapping cores or mesoscale communities^{5,6}. Increasingly, statistical interactions involving more than two elements (termed higher-order interactions) are recognized as being a key feature of complex systems^{7,8}. This makes the task of recognizing and modeling higher-order structures an important, developing field. However, a lack of well-developed, formal tools, as well as the inherent computational and combinatorial difficulties associated with higher-order interactions, have limited their application. In neuroscience, higher-order interactions have been theoretically implicated as building blocks of complexity^{9,10} and functional integration^{11}. Empirically, they have been found at multiple scales, including in neuronal networks^{12,13,14,15,16,17}, electrophysiological signals^{18,19}, and fMRI BOLD data^{20,21,22}, where higher-order interactions have been proposed to relate to emergent mental phenomena and consciousness^{23}.
Recently, Rosas and Mediano^{24} proposed that information theory could be used to identify higherorder interactions in multivariate systems, and furthermore, that it is possible to disentangle qualitatively different kinds of interactions, characterized by pairwise redundant and synergistic modes of information sharing. Intuitively, redundant information corresponds to information that is copied over many different elements such that the observation of a single element resolves the corresponding uncertainty in all of the other elements. In contrast, synergistic information sharing occurs when uncertainty can only be resolved by considering the joint state of two or more variables. This space of redundant and synergistic interactions in the brain remains largely unexplored, as it comprises interactions that are typically inaccessible to a bivariate, functional connectivity network analysis. Synergy is of potential interest because it tracks the ability of the brain to generate novel information through the interactions of multiple brain regions (sometimes called information modification)^{25}. In studies of cortical neural networks, synergy has been associated with neural computation (the genesis of new information through a nontrivial interaction of multiple inputs)^{12,13,14,15,17,26}.
Much of the previous work on higherorder information in neuroscience has used the partial information decomposition (PID) framework^{27,28}, which provides a complete decomposition of the joint mutual information into atomic information components. While powerful, the PID framework has some fairly strict limitations that have hindered its adoption by the wider complex systems community. The first is that it requires partitioning a system into sources and targets, and does not allow analysis of the whole system qua itself. The second is that, due to the combinatorial explosion of information atoms, analysis of more than five or six elements is impossible. Given that even small systems can have hundreds, or even thousands of elements, this is a severe limitation. Finally, the PID is unusual in that, while it reveals the structure of multivariate information, actually calculating values from data requires an additional step: the selection of a redundancy function that quantifies some notion of redundant information. This is a surprisingly difficult task, as many redundancy functions have been proposed, and different choices can lead to radically different descriptions of the same system^{29,30}.
Rosas et al. introduced the O-information^{24} as an alternative measure, which gives an overall estimate of the extent to which a system is redundancy-dominated or synergy-dominated, without requiring the incredible computational cost or ad hoc choices required by the PID. The O-information reveals the global structure of the information-sharing dependencies in a system. Negative O-information indicates the presence of predominantly synergistic interactions, while positive O-information indicates predominantly redundant interactions. Despite its strong appeal as a quantitative metric, the origins and neural manifestations of O-information have remained elusive, if not enigmatic^{31}.
In this work, we apply a range of information-theoretic measures to resting-state fMRI data acquired from human cerebral cortex with the aim of identifying ensembles of regions (subsystems) that express specific modes of higher-order statistical dependencies. First, we introduce the mathematical machinery required to derive the O-information, and its interpretation in the context of multivariate information-sharing processes. We derive an analytic relationship between the O-information and other, more well-known, multivariate metrics such as the Tononi-Sporns-Edelman complexity^{10}. Then we apply multivariate information metrics to brain data and uncover the presence of abundant and widely distributed subsystems expressing synergy (negative O-information) across the entire cerebral cortex. Finally, we discuss what our insights reveal about the structure and functional roles of higher-order relations in brain activity.
Results
Theory
Integration, segregation, redundancy, synergy
A fundamental idea in modern theoretical neuroscience states that the nervous system maintains a balance between integration and segregation^{9}. The integration-segregation balance principle is based on the insight that the nervous system combines regional elements of functional specialization with system-wide functional integration. Considerable empirical work has gone into the neural integration-segregation hypothesis, and the ongoing balance of integrated and segregated dynamics has been found to be regulated by distinct neuromodulatory systems^{32,33} and to correlate with conscious awareness^{34,35,36,37,38}.
The segregation-integration spectrum is typically visualized as a one-dimensional space: on one extreme the system is totally disintegrated and every element is behaving entirely independently of all the others. On the other extreme is the case of total integration: every element synchronizes with every other element so that the whole system is densely connected. In the middle there is a complex regime where the system combines elements of independence and integration. As it was originally formulated, integration and segregation were discussed in the context of networks, and higher-order interactions were inferred via partitioning the system into subsets of varying numbers of nodes^{9}. These arguments predated the rigorous mathematical distinction between redundancy and synergy, introduced in the work of Williams and Beer almost two decades later^{27}. Building on these foundations, as well as the definition of O-information from Rosas et al.^{24}, we argue that the notion of integration can be expanded to include redundant integration and synergistic integration. The result is a rich space described by distinct dimensions of integration, segregation, redundancy, and synergy (although these do not form an orthogonal basis). This high-dimensional, qualitative configuration space may be viewed as an informational morphospace^{39,40,41} and provides a framework for the detailed comparison of different systems.
Information theory and higher-order information-sharing
In this section, we introduce the basics of information theory necessary to understand its application to higher-order relationships. For a more thorough introduction, readers may be interested in Cover & Thomas^{42}. The basic object of study in information theory is the entropy^{43}, which quantifies the uncertainty that we, as observers, have about the state of a variable X. If the states of X are drawn according to the probability distribution P(X = x) with support set \(\mathcal{X}\), then the entropy of X is:

$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x) \tag{1}$$
This classic formulation of entropy assumes that X is a discrete random variable, although for continuous data, the generalization to differential entropy is reasonably straightforward (see Sec. Gaussian Information Theory).
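As a concrete illustration, the discrete entropy just defined can be estimated from a sample with a simple plug-in (frequency-count) estimator. This is a minimal sketch, not the authors' code; the function name is ours, and we use natural logarithms so values are in nats, matching the units used later in the paper.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a discrete sample."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()          # empirical probabilities
    return float(-np.sum(p * np.log(p)))
```

For a fair coin, `entropy([0, 1, 0, 1])` returns ln 2 ≈ 0.693 nats, the maximum uncertainty for a binary variable.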
Now consider two variables X_{1} and X_{2}: how does knowing the state of X_{1} reduce our uncertainty (the entropy) about the state of X_{2}? The answer is given by the mutual information^{43}, which can be written in two mathematically equivalent forms:

$$I(X_1; X_2) = H(X_1) + H(X_2) - H(X_1, X_2) \tag{2}$$

$$I(X_1; X_2) = H(X_1, X_2) - \left( H(X_1 \mid X_2) + H(X_2 \mid X_1) \right) \tag{3}$$
The bivariate mutual information is often applied in the study of complex systems for the inference of functional connectivity networks (e.g., refs. ^{44,45,46,47,48}), which can reveal the structure of dyadic interactions between different elements^{49}. While functional connectivity networks are extremely powerful, they are fundamentally limited by their pairwise structure and are insensitive to higherorder interactions between three or more variables.
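The bivariate mutual information can be computed directly from the plug-in entropies of the marginals and the joint distribution. A self-contained sketch (function names are ours, values in nats):

```python
import numpy as np
from collections import Counter

def H(samples):
    """Plug-in Shannon entropy (nats) of a discrete sample."""
    p = np.array(list(Counter(samples).values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    return H(x) + H(y) - H(list(zip(x, y)))
```

Two independent fair bits give zero mutual information, while a variable shares ln 2 nats with a copy of itself.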
The natural place to begin an analysis of higher-order structures in neural data, then, is by attempting to generalize the mutual information to account for more than two variables. Unfortunately, there is no single unique generalization, and at least three are known to exist: the total correlation, the dual total correlation, and the interaction/co-information (which we will not explore in detail here)^{42}. The total correlation (also referred to as the integration in ref. ^{9}) is formally a straightforward generalization of Eq. (2):

$$TC(\mathbf{X}) = \sum_{i=1}^{N} H(X_i) - H(\mathbf{X}) = D_{KL}\left( P(\mathbf{X}) \,\Big\|\, \prod_{i=1}^{N} P(X_i) \right) \tag{4}$$
where X is a macro-variable comprised of an ensemble of multiple random variables, X = {X_{1}, X_{2}, …, X_{N}}, and D_{KL}(⋅) is the Kullback-Leibler divergence from prior distribution Q(x) to posterior distribution P(x):

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{5}$$
The total correlation is low when every variable is independent, and high when every variable is individually highly entropic but the joint state of the whole has low entropy. This occurs when the whole system is dominated by redundant interactions: the state of a single variable discloses a large amount of information about the state of every other variable.
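Under Gaussian assumptions (which the paper later applies to fMRI covariance matrices), the total correlation has a closed form: TC reduces to −(1/2) ln det R, where R is the correlation matrix. A minimal sketch (names are ours):

```python
import numpy as np

def total_correlation(cov):
    """Gaussian total correlation in nats.

    TC = sum of marginal entropies minus joint entropy, which for a
    covariance matrix S reduces to -0.5 * log(det(R)), where R is the
    correlation matrix obtained by standardizing S.
    """
    S = np.asarray(cov, dtype=float)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)
    return float(-0.5 * np.log(np.linalg.det(R)))
```

For two standardized Gaussians with correlation r, this returns −(1/2) ln(1 − r²), the familiar Gaussian mutual information; for an identity covariance it returns zero.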
The second generalization of mutual information is the dual total correlation, formally a generalization of Eq. (3):

$$DTC(\mathbf{X}) = H(\mathbf{X}) - \sum_{i=1}^{N} H(X_i \mid \mathbf{X}^{-i}) \tag{6}$$
where H(X_{i}∣X^{−i}) refers to the residual entropy^{50}: the uncertainty intrinsic to the i^{th} element of X that is not resolved by any other variable, or collection of variables, in X. The difference between the joint entropy and the sum of the residual entropies is all the entropy that is shared between at least two elements of X (i.e., is redundantly common to two or more elements). Curiously, while the total correlation monotonically increases as X transitions from randomness to synchrony, the dual total correlation is low both for totally random and totally synchronized systems, peaking when X is dominated by shared information.
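The dual total correlation has an analogous Gaussian estimator, using the identity H(X_i | X^{−i}) = H(X) − H(X^{−i}) to compute the residual entropies from sub-covariance matrices. A sketch (helper and function names are ours):

```python
import numpy as np

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    n = S.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(S))

def dual_total_correlation(cov):
    """Gaussian DTC: H(X) - sum_i H(X_i | X^{-i}), in nats."""
    S = np.asarray(cov, dtype=float)
    n = S.shape[0]
    joint = _h(S)
    # residual entropy of element i: H(X_i | X^{-i}) = H(X) - H(X^{-i})
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return float(joint - resid)
```

For two variables, DTC coincides with the total correlation (both equal the mutual information); for an independent system it is zero.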
Rosas et al.^{24} propose that the difference between TC(X) and DTC(X) (first explored by James and Crutchfield as the enigmatic information^{31}) could provide a measure of the overall balance between redundancy and synergy in multivariate systems: if TC(X) > DTC(X), then the global constraints on the system dominate and force a redundant dynamic, while if TC(X) < DTC(X) the system is dominated by information that is shared but not redundant. Rosas et al. rechristen this measure the organizational information:

$$\Omega(\mathbf{X}) = TC(\mathbf{X}) - DTC(\mathbf{X}) \tag{8}$$
In the specific case of three variables, Ω(X_{1}, X_{2}, X_{3}) is equivalent to the co-information^{24}, which Williams and Beer showed is itself equivalent to the redundancy minus the synergy^{51}. This is in keeping with the intuition that positive O-information implies a redundancy-dominated structure and negative O-information implies a synergy-dominated structure, although the link between Ω and the co-information holds only for three variables; the measures are not identical for larger sets.
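This sign convention can be checked on textbook discrete cases: the parity (XOR) bit of two fair coins is purely synergistic and yields Ω < 0, while three copies of one bit are purely redundant and yield Ω > 0. A self-contained sketch with plug-in estimators (names are ours):

```python
import numpy as np
from collections import Counter

def H(samples):
    """Plug-in Shannon entropy (nats) of a discrete sample."""
    p = np.array(list(Counter(samples).values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)))

def o_information(vars_):
    """Omega(X) = TC(X) - DTC(X) for a list of discrete variables (nats)."""
    n = len(vars_)
    joint = H(list(zip(*vars_)))
    tc = sum(H(v) for v in vars_) - joint
    # residual entropy of element i: H(X_i | X^{-i}) = H(X) - H(X^{-i})
    resid = sum(joint - H(list(zip(*(vars_[:i] + vars_[i + 1:])))) for i in range(n))
    dtc = joint - resid
    return tc - dtc

# XOR: the parity bit is purely synergistic, so Omega < 0
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor = [x ^ y for x, y in zip(a, b)]
```

For the XOR triad, Ω = −ln 2; for three copied bits, Ω = +ln 2, matching the redundancy-minus-synergy reading of the three-variable co-information.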
While the O-information has been applied in a variety of contexts (such as to questions about the aging brain^{22}, information flow in neuronal circuits^{52}, and music composition^{16}), there remains considerable uncertainty around how synergy should be intuitively understood. To help elucidate the answer, we relate the O-information to the original measure of integration/segregation balance proposed by Tononi, Sporns, and Edelman (the TSE complexity^{9}), and show that a geometric interpretation of the O-information exists that brings with it a novel perspective on redundancy and synergy.
The TSE complexity admits two formulations:

$$TSE(\mathbf{X}) = \sum_{i=1}^{\lfloor N/2 \rfloor} \left\langle I(\mathbf{X}^{\gamma}; \mathbf{X}^{-\gamma}) \right\rangle_{|\gamma| = i} \tag{9}$$

$$TSE(\mathbf{X}) = \sum_{k=1}^{N} \left( \frac{k}{N}\, TC(\mathbf{X}) - \left\langle TC(\mathbf{X}^{\gamma}) \right\rangle_{|\gamma| = k} \right) \tag{10}$$
The first (Eq. (9)) defines the TSE complexity as the average mutual information between the pairs of every possible bipartition of the system X. For every integer i between 1 and ⌊n/2⌋, we compute all possible subsets of X with i elements (notated by X^{γ}) and compute the mutual information between that set and its complement (X^{−γ}). The second equation (Eq. (10)) provides an alternative interpretation: the TSE complexity quantifies the difference, at every scale, between the expected integration of the scale if the system were fully integrated, and the actual integration of that scale (calculated as the average total correlation of every subset of size k). In this interpretation, the TSE complexity is highest when the smallest scales are relatively disintegrated, but the macroscales are relatively more integrated. This balance of integration and segregation is emblematic of TSE complexity. For a visualization of the TSE complexity calculation as the difference between the expected and empirical values, see Fig. 1.
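A brute-force Gaussian sketch of the first formulation (average mutual information across all bipartitions at each scale) follows; as the text notes, this is feasible only for small N. Names are ours, and all values are in nats:

```python
import numpy as np
from itertools import combinations

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def tse_complexity(cov):
    """Average bipartition MI, summed over bipartition sizes 1..N//2."""
    S = np.asarray(cov, dtype=float)
    n = S.shape[0]
    total = 0.0
    for k in range(1, n // 2 + 1):
        mis = []
        for gamma in combinations(range(n), k):
            rest = [j for j in range(n) if j not in gamma]
            # I(X^gamma ; X^-gamma) = H(X^gamma) + H(X^-gamma) - H(X)
            mis.append(_h(S[np.ix_(gamma, gamma)]) + _h(S[np.ix_(rest, rest)]) - _h(S))
        total += float(np.mean(mis))
    return total
```

An independent system scores zero (every bipartition MI vanishes), while any correlated system scores positive.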
Computing the full TSE complexity itself requires analyzing every possible subsystem (or bipartition) of X: an insurmountable task for all but the smallest networks, as the combinatorics grow super-exponentially. A useful approximation is to look only at the second-to-top layer of the full TSE complexity summation, which only requires finding the average total correlation for the N sets X^{−i} (where X^{−i} denotes X with X_{i} excluded). We refer to this measure as the description complexity of X^{10,53}. Formally:

$$C(\mathbf{X}) = TC(\mathbf{X}) - \frac{TC(\mathbf{X})}{N} - \mathbb{E}\left[ TC(\mathbf{X}^{-i}) \right] \tag{11}$$
The definition of C(X) can be understood as the successive pruning of information: the first term, TC(X), is the total integration of X. The second term, − TC(X)/N, is the expected decrease in integration associated with removing a single element (on average). Finally, \(\mathbb{E}[TC(\mathbf{X}^{-i})]\) is the actual decrease in integration associated with removing each element on its own. C, then, computes the difference between the expected decrease in integration associated with removing a single node and the actual decrease. C has several obvious conceptual parallels with the DTC, and there is indeed an analytic relationship between DTC and C (for proof, see Supplementary Note 1):

$$DTC(\mathbf{X}) = N \cdot C(\mathbf{X}) \tag{12}$$
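The pruning construction of C(X), and its relationship to the DTC, can be verified numerically for Gaussian systems. The sketch below (our construction; function names are ours) checks that DTC(X) = N · C(X) holds under these definitions:

```python
import numpy as np

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def tc(S):
    """Gaussian total correlation: marginal entropies minus joint entropy."""
    return sum(_h(S[i:i + 1, i:i + 1]) for i in range(S.shape[0])) - _h(S)

def dtc(S):
    """Gaussian dual total correlation via H(X_i|X^{-i}) = H(X) - H(X^{-i})."""
    n = S.shape[0]
    joint = _h(S)
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return joint - resid

def description_complexity(S):
    """C(X): expected minus actual drop in integration when one node is removed."""
    n = S.shape[0]
    drop = [tc(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n)]
    return tc(S) - tc(S) / n - float(np.mean(drop))
```

On any positive-definite covariance matrix, `dtc(S)` and `S.shape[0] * description_complexity(S)` agree to numerical precision.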
This result was independently derived in ref. ^{54}. The relationship between DTC and C allows us to rewrite the O-information purely in terms of total correlations:

$$\Omega(\mathbf{X}) = (2 - N)\, TC(\mathbf{X}) + \sum_{i=1}^{N} TC(\mathbf{X}^{-i}) = TC(\mathbf{X}) - N \cdot C(\mathbf{X}) \tag{13}$$
This allows us to reconceptualize redundancy- and synergy-dominance in terms of just redundancy: synergistic information is information that is redundantly present in large ensembles of elements considered jointly but not in any subset of those ensembles. This is conceptually very similar to the definition of synergy provided by the partial information decomposition^{27}, which defines synergy in terms of redundant information shared by higher-order collections of elements. We can also propose a geometric interpretation of the sign of the O-information: based on Eqs. (8) and (13), we can see that Ω(X) < 0 ⇔ TC/N < C and Ω(X) > 0 ⇔ TC/N > C. This means that a system X is synergy-dominated if the removal of a single element (on average) decreases the integration of the remaining N − 1 elements more than would be expected in the null case of a totally integrated system. The two possible cases (redundancy-dominated, with Ω > 0, and synergy-dominated, with Ω < 0) are visualized and discussed in the context of the TSE complexity in Fig. 1.
The framing of the O-information in terms of the change to integration after removal of individual elements also has conceptual links to the so-called gradients of O-information^{55}. Scagliarini et al. explore how individual elements can contribute redundantly or synergistically to the O-information, defining the gradient as the difference between the O-information of an ensemble X and the O-information when single elements X_{i} are excluded. While a detailed analytic exploration of the link is beyond the scope of this paper, these gradients yield valuable insights into the structure of higher-order dependencies in complex systems.
Another heuristic approximation of the TSE complexity is the sum of the total correlation and dual total correlation. Following the notation from Rosas et al.:

$$\Sigma(\mathbf{X}) = TC(\mathbf{X}) + DTC(\mathbf{X}) \tag{14}$$
James et al. previously termed this measure the exogenous information and described it as a "very mutual information": quantifying all of the shared dependencies between each single variable and every other subset of the system:

$$\Sigma(\mathbf{X}) = \sum_{i=1}^{N} I(X_i; \mathbf{X}^{-i}) \tag{15}$$
Given the obvious similarity to Eq. (9), Rosas et al. hypothesized that Σ(X) ∝ TSE(X), which was verified to hold in simple simulations with small N^{24}. By leveraging the Gaussian assumptions here, we can empirically estimate the correlation between TSE and exogenous information and assess how well the relationship holds as N gets large. Figure 2 confirms the strong correlations of TSE complexity with both TC + DTC and DTC alone. These correlations hold over a range of subset sizes, from three to fifteen elements.
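The hypothesized proportionality between Σ(X) and TSE(X) can be probed with a small Monte Carlo under Gaussian assumptions. This sketch is our construction, not the paper's pipeline: it samples 6-node systems whose coupling to a shared latent signal varies, then correlates the two measures across systems:

```python
import numpy as np
from itertools import combinations

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def tc(S):
    return sum(_h(S[i:i + 1, i:i + 1]) for i in range(S.shape[0])) - _h(S)

def dtc(S):
    n = S.shape[0]
    joint = _h(S)
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return joint - resid

def tse(S):
    """Average bipartition MI summed over scales 1..N//2."""
    n = S.shape[0]
    total = 0.0
    for k in range(1, n // 2 + 1):
        mis = []
        for g in combinations(range(n), k):
            r = [j for j in range(n) if j not in g]
            mis.append(_h(S[np.ix_(g, g)]) + _h(S[np.ix_(r, r)]) - _h(S))
        total += float(np.mean(mis))
    return total

# systems with varying coupling to one latent signal -> varying integration
rng = np.random.default_rng(0)
tses, sigmas = [], []
for _ in range(100):
    w = rng.uniform(0.0, 1.0)
    X = rng.normal(size=(100, 6)) + w * rng.normal(size=(100, 1))
    S = np.corrcoef(X, rowvar=False)
    tses.append(tse(S))
    sigmas.append(tc(S) + dtc(S))
r = np.corrcoef(tses, sigmas)[0, 1]
```

Across these simulated systems, `r` comes out strongly positive, consistent with the correlation shown in Fig. 2.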
fMRI results
We set out to identify subsystems (subsets of dynamically interacting elements) that express negative O-information (synergy) in the human brain. Leveraging Gaussian assumptions^{42} (see Methods), multivariate information-theoretic measures can be estimated from covariance (correlation) matrices expressing empirically recorded functional connectivity (FC). We computed long-time averages of FC derived from two normative samples of human resting-state fMRI, the Human Connectome Project (main data set^{56}) and an open-source multimodal MRI dataset for Microstructure-Informed Connectomics (MICA-MICs; replication data set^{57}). For both data sets we computed a single FC matrix (HCP: 95 participants, 4 runs each; MICA-MICs: 50 participants, 1 run each). Both FC matrices covered the entire cerebral cortex parcellated into a common set of 200 nodes^{58}, and node time series were derived from BOLD signals after performing global signal regression, which removes signal components that are common to all nodes in the system, i.e., globally redundant (Supplementary Fig. 1)^{59}. For a brief discussion of global signal regression in this context, see Supplementary Fig. 6.
Computing the O-information on the full-size 200-node FC matrix results in positive quantities for both data sets (HCP: Ω = 79.16 nats; MICA: Ω = 46.69 nats), indicating that the full structure is redundancy-dominated, which might potentially obscure the presence of higher-order, synergistic correlations. We asked if smaller subsets of nodes were present within the full-size FC that generated synergy, or negative O-information. Random sampling of small subsets (between 3 and 16 nodes) indeed yields abundant subsets that express negative O-information (Fig. 3a). Their relative abundance declines rapidly with growing subset size, reflecting the increasing dominance of redundant information and exhaustive capture of unique information. While synergistic subsets account for rapidly diminishing fractions of all subsets, their total number can be non-negligible (10-node subsets: 0.41 percent of samples, corresponding to approximately 9.23 × 10^{13} subsets). In a large random sample of 10-node subsets, the O-information is positively correlated with TSE complexity (Fig. 3b; ρ = 0.642, p = 0; HCP data). Focusing on a separate random sample of 5000 10-node subsets with negative O-information, we asked if the frequency with which pairs of nodes participate in such subsets is related to their pairwise FC. Indeed, the absolute pairwise FC is strongly negatively correlated with the frequency of participation in synergistic subsets (ρ = − 0.504, p = 0, HCP; ρ = − 0.485, p = 0, MICA; Fig. 3c). This indicates that strongly positive or negative FC between two nodes makes their joint inclusion in a synergistic subset unlikely, while node pairs with low FC magnitude could either be truly disintegrated, or participating in a highly synergistic subsystem.
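The random-sampling procedure can be sketched with the Gaussian estimator: draw k-node subsets of the FC matrix, compute Ω on each submatrix, and count the synergistic fraction. Function names and defaults below are illustrative, not the paper's code:

```python
import numpy as np

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def o_information(S):
    """Gaussian O-information: TC(X) - DTC(X), in nats."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    joint = _h(S)
    tc = sum(_h(S[i:i + 1, i:i + 1]) for i in range(n)) - joint
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return tc - (joint - resid)

def synergistic_fraction(fc, k, n_samples=1000, seed=0):
    """Fraction of random k-node subsets of an FC matrix with Omega < 0."""
    rng = np.random.default_rng(seed)
    n = fc.shape[0]
    hits = 0
    for _ in range(n_samples):
        idx = rng.choice(n, size=k, replace=False)
        if o_information(fc[np.ix_(idx, idx)]) < 0:
            hits += 1
    return hits / n_samples
```

Note that for any two-variable system Ω is identically zero (TC and DTC both reduce to the mutual information), so synergy and redundancy are only distinguishable for triplets and above.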
Participation of nodes in randomly sampled synergistic subsets varies systematically across the cortex. Over a large random sample of 100,000 10-node subsets, all nodes participate at least once, with several nodes participating in more than 10,000 distinct synergistic subsets. Hence, the complete repertoire of co-expressed synergistic subsets covers the entire cortex, with some overlap between subsets, centered on high-participation nodes that form "focal points" or clusters (Fig. 4a). Projecting the participation of individual nodes (brain regions) onto the cortical surface shows significant consistency between HCP (Fig. 4b) and MICA data (Supplementary Fig. 4) (the two maps are correlated with ρ = 0.579, p = 2.5 × 10^{−19}, between the two data sets). Functional systems^{60} distribute unevenly as well, with the highest frequencies of participation found in the frontoparietal (FP) system, for synergistic subsets of 10 nodes (HCP: Fig. 4c; MICA: Supplementary Fig. 2). For larger subset sizes, participation of limbic (LIM) regions dominates over FP regions.
Combinatorics prevent exhaustive exploration of subsets of even modest sizes, and the random sampling strategy employed so far is likely to miss subsets that express maximal synergy. To identify subsets with maximally negative O-information (maximal synergy), we used an optimization algorithm based on simulated annealing (references and details are contained in the Methods section). Multiple runs of the algorithm yielded consistent and highly similar outcomes (Supplementary Fig. 3), indicating convergence of the optimization while again highlighting the existence of a large reservoir of non-identical (degenerate) subsets, all expressing highly negative O-information. Deploying this algorithm while varying subset sizes between 3 and 30 nodes, we identified large numbers of subsets that express highly negative O-information, for subset sizes 3–24 nodes (HCP; Fig. 5a) and 3–27 nodes (MICA; Supplementary Fig. 4). No synergistic subsets are found for subset sizes greater than 27 nodes, as redundancy starts to overwhelm the unique informational contributions of individual nodes at larger subset sizes.
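A minimal version of such an optimization is a swap-based simulated annealing over node subsets, minimizing Ω with a geometric cooling schedule. This is a sketch of the general technique; the actual Methods may differ, and all parameter values and names here are illustrative:

```python
import numpy as np

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def o_information(S):
    """Gaussian O-information: TC(X) - DTC(X), in nats."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    joint = _h(S)
    tc = sum(_h(S[i:i + 1, i:i + 1]) for i in range(n)) - joint
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return tc - (joint - resid)

def anneal_min_o(fc, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Swap-based simulated annealing minimizing Omega over k-node subsets."""
    rng = np.random.default_rng(seed)
    n = fc.shape[0]
    current = list(rng.choice(n, size=k, replace=False))
    cur_o = o_information(fc[np.ix_(current, current)])
    best, best_o = list(current), cur_o
    t = t0
    for _ in range(steps):
        # propose: swap one member for a node outside the subset
        cand = list(current)
        outside = [j for j in range(n) if j not in cand]
        cand[rng.integers(k)] = outside[rng.integers(len(outside))]
        cand_o = o_information(fc[np.ix_(cand, cand)])
        # accept improvements always, worsenings with Boltzmann probability
        if cand_o < cur_o or rng.random() < np.exp((cur_o - cand_o) / t):
            current, cur_o = cand, cand_o
            if cur_o < best_o:
                best, best_o = list(current), cur_o
        t *= cooling
    return best, best_o
```

Repeated runs with different seeds would, as in the text, return a reservoir of distinct near-optimal subsets rather than a single solution.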
To validate that our optimization algorithm was observing truly synergistic ensembles, we tested each optimized subsystem against a null (see Materials and Methods E). Since the ensemble size k is fixed by the optimization algorithm, it is possible that the apparent synergy of that ensemble is actually due to some subset of nodes within that ensemble (for example, a system of three synergistic elements and two independent elements will appear to be a synergistic system of k = 5; however, the real synergy is only in the three entangled elements). To ensure that all elements in the ensemble contributed to the synergy, we only considered a set valid if it was impossible to remove any node without the O-information increasing (i.e., the contribution of each element was synergy-dominated). We found that, for small values of k, the vast majority of optimized ensembles were valid (≈99.08% for ensembles of size four, ≈92.92% for ensembles of size six, ≈84.14% for ensembles of size eight, and ≈64.04% for the maximally synergistic ensemble size of ten). For collections much larger than ten, the proportion of valid systems decreased rapidly: for ensembles of size fifteen, only ≈0.04% were valid, and there were no valid ensembles of size greater than fifteen, despite the fact that the simulated annealing algorithm returned a large number of results with Ω < 0. This implies that, although these larger subsystems are synergy-dominated, that synergy is restricted to a core set of components and not attributable to the whole.
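The validity criterion described above can be implemented directly: a subset passes only if it is synergy-dominated and removing any single node increases the O-information. A sketch (names are ours; the synergistic test case, two independent Gaussians plus their noisy sum, is a standard construction, not data from the paper):

```python
import numpy as np

def _h(S):
    """Differential entropy (nats) of a Gaussian with covariance S."""
    S = np.atleast_2d(S)
    return 0.5 * np.log((2 * np.pi * np.e) ** S.shape[0] * np.linalg.det(S))

def o_information(S):
    """Gaussian O-information: TC(X) - DTC(X), in nats."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    joint = _h(S)
    tc = sum(_h(S[i:i + 1, i:i + 1]) for i in range(n)) - joint
    resid = sum(joint - _h(np.delete(np.delete(S, i, 0), i, 1)) for i in range(n))
    return tc - (joint - resid)

def is_wholly_synergistic(S):
    """Valid if Omega < 0 and removing any one node raises the O-information."""
    S = np.asarray(S, dtype=float)
    omega = o_information(S)
    if omega >= 0:
        return False
    for i in range(S.shape[0]):
        sub = np.delete(np.delete(S, i, 0), i, 1)
        if o_information(sub) <= omega:   # removal failed to increase Omega
            return False
    return True
```

For example, the covariance of X1, X2 independent and X3 = X1 + X2 + noise passes the test, while an equicorrelated (redundant) triad fails it.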
Minimal O-information was achieved for subsets comprising ~10 nodes for both data sets. Mapping subsets of nodes expressing near-minimal O-information onto a surface plot of the cerebral cortex reveals consistent topography. Figure 5b shows the frequency with which individual cortical parcels (nodes) were identified across 5000 runs of the optimization algorithm, yielding 4021 unique solutions (HCP, Fig. 5b; 4166 unique solutions for MICA data, Supplementary Fig. 4b). Brain-wide nodal frequencies are significantly correlated across HCP and MICA data sets (Spearman's ρ = 0.522, p = 2.2 × 10^{−15}). When mapping these nodal frequencies to seven canonical resting-state functional systems^{60}, we find that each of these seven systems contributes, but to a different extent. In HCP data, for optimally synergistic 10-node subsets, the visual, frontoparietal, and default mode networks are over-represented, while only the FP system appears over-represented in the MICA data (Supplementary Fig. 5).
The nature of negative O-information (synergy) requires that individual nodes make largely unique (non-redundant) contributions to the multivariate information metric. This suggests that nodes derived from different, informationally distinct (intrinsically redundant, but extrinsically non-redundant) functional communities might be favored as constituents of synergistic subsets. To test this hypothesis, we created sets of 20,000 randomly sampled subsets that were comprised of nodes derived from between 1 and all 7 canonical functional systems (HCP, Fig. 5c; MICA, Supplementary Fig. 4c). The mean O-information, across all randomly chosen subsets, was found to be positive regardless of how many FC systems were included in the subsets. For samples derived from just 1 FC system, the O-information was most positive (i.e., subsets were most redundancy-dominated) for visual, somatomotor, and attention systems, and least redundancy-dominated for default, frontoparietal, and limbic systems. Importantly, the mean O-information decreased, and the fraction of synergistic subsets increased, as subsets were sampled from larger numbers of canonical systems. No subset derived from a single functional system was capable of expressing synergy. Subsets spanning 6 or 7 canonical systems were most likely to express synergy, as indexed by the fraction of negative O-information encountered in the sample. The finding supports the notion that dividing the brain into canonical functional systems prioritizes grouping nodes by redundant over synergistic information, hence missing a potentially important substrate for neural computation.
Discussion
In this paper, we have shown how the O-information^{24}, a measure of higher-order interactions in multivariate data, can reveal synergistic ensembles of brain regions that are invisible to bivariate functional connectivity analyses. Our primary theoretical result is to provide a geometric interpretation of the O-information. The interpretation unifies multiple disparate measures of multivariate information into a single framework, built around the Tononi-Sporns-Edelman complexity^{9}. By rewriting Ω(X) and DTC(X) in terms of the total correlation between multiple subsets of X, we find that synergy occurs when removing any single element causes the system to become less integrated, and crucially, more so than would be expected if structure were uniformly distributed over X. Said differently, synergy can be intuitively understood as that integration that is present in the whole but not in smaller subsets (in this case, the N subsets created by removing each X_{i}). In this sense, synergy captures how the whole can be greater than the sum of its parts^{61}. This intuition is conceptually similar to the formal definition of synergy from the partial information decomposition framework^{27}, which defines synergy as the information left over when everything accessible in simpler combinations of sources has been accounted for. The exclusive use of total correlations also allows us to consider the O-information purely in terms of Kullback-Leibler divergences from independent to joint probability distributions (Eq. (5)). This shows us that all of these measures can be understood in the context of inferences about structure (relative to a disintegrated prior). In the context of synergy, the extra information in the joint state is information about something: specifically, about the relative likelihood of a configuration with respect to the maximum entropy case.
Applied to two separate fMRI brain data sets, we find that synergistic subsets of brain regions are ubiquitous and abundant, spanning scales between 3 and 25 regions and extending over the entire cerebral cortex. While redundant interactions dominate functional connectivity at larger subset sizes, the application of multivariate information measures demonstrates a previously hidden repertoire of synergistic ensembles, each integrating diverse and distinct sources of information. Recent work by Luppi et al.^{21} proposed a synergistic core to the human brain where complex processing occurs. While we found that there is significant over-representation of specific regions (including portions reported by Luppi et al., such as prefrontal cortex, occipital pole, the precuneus, and cingulate regions), synergy-dominated subsystems could include any region of cortex, although some regions contribute more reliably than others. This suggests that synergy is a widespread property of multivariate information emerging from resting-state brain activity. While there is discrepancy between our results and those of Luppi et al., this difference is likely a reflection of the different analytical pipelines, rather than a true conflict. Their approach was based on decomposing the temporal mutual information, which considers dependencies between past and future, while our approach does not. Luppi et al. also only considered pairs of regions co-evolving together, while we considered larger ensembles. Our approach brings into view synergies of a much higher order than would be possible in the approach by Luppi et al. Finally, the prior analysis is based on a generalization of the partial information decomposition and requires choosing one of several redundancy functions. It is unknown whether the reported results would hold for all plausible redundant information functions or not. Consequently, different results likely reflect different kinds of synergies (temporal, pairwise vs. instantaneous, higher-order) that can coexist in the brain.
Information-theoretic measures are not the only approach to higher-order structure in brain activity. Recently, Santoro et al.^{62} proposed a metric they term hypercoherence, which describes higher-order cofluctuations between sets of three and four regions. Based on the edge time series framework^{63,64,65}, hypercoherence defines higher-order activity in terms of simplicial complexes of coherent activations. When applied to brain data, Santoro et al. found that hypercoherence was relatively lower in systems that we found to be relatively higher in synergy (frontal and default mode regions), and relatively higher in regions we found to have high average O-information (somatomotor regions). We conjecture that the hypercoherence framework is preferentially sensitive to redundancies rather than synergies. This would be consistent with recent results from Varley et al., who found that pairwise cofluctuations were positively correlated with redundant information and anticorrelated with synergistic information^{66}. An avenue of future research may be to apply the filtering approach of Santoro et al. to the O-information or other multivariate information measures.
Interestingly, the randomly sampled ensembles that were most likely to be synergy-dominated were those involving nodes that spanned multiple canonical subsystems, while sets of regions drawn entirely from one system were strongly dominated by redundancy (Fig. 5c). This is consistent with the hypothesis that functional connectivity, when viewed entirely as a set of bivariate interactions, is largely sensitive to redundant, but insensitive to synergistic, dependencies between brain regions. Consequently, the functional connectivity matrix is not a complete map of the statistical structure in a dataset, but only of those dependencies characterized by redundancy. This is consistent with findings from Ince^{67}, Finn and Lizier^{68}, and Varley et al.^{66}, who argued that bivariate correlations are intrinsically redundancy-dominated. Higher-order synergies represent, in a sense, a kind of shadow structure, and are consequently missed by network-focused approaches that omit higher-order interactions. This hypothesis finds some support in ref. ^{21}, which reported that the distribution of synergies was anticorrelated with the functional connectivity network structure, while the distribution of redundancies was positively correlated.
Given the novelty of tools like the O-information, the significance of these synergistic dependencies remains almost entirely unknown, although the small number of studies to date suggests intriguing patterns. One study found alterations to the redundancy/synergy bias across the human lifespan^{22}, while others have suggested that loss of consciousness induced by propofol is associated with decreased synergistic dynamics^{20}. Future avenues of work include deeper analyses of how higher-order dynamics change between rest and task conditions, in cases of psychopathology or brain injury, and in non-human animals. We should note that, in the context of the O-information, synergy is not necessarily a causal measure: in related contexts, synergy has been discussed as a measure of computation in neural circuits by Sherrill et al., Newman et al., and others^{13,14,15,17,26,69}, although it remains unexplored exactly how these two approaches relate to each other. The O-information is an atemporal measure, sensitive to instantaneous, higher-order correlation structures, but with no notion of dynamics or of a flow from past to future. In contrast, the work by Sherrill et al. is carried out in the context of information dynamics^{25} and considers how the past informs the future. Future research may explore how a synergistic correlation structure might facilitate computations within the system over time.
In addition to the insights into synergy specifically, the results presented here have implications for researchers interested in multivariate information-theoretic analyses more broadly. For example, the TSE complexity has long been an object of theoretical interest^{54}, but its intractable combinatorics have limited its applicability to empirical data (although its use is not unheard of^{70}). The finding that the exogenous information Σ(X) ∝ TSE(X) for reasonably large N (first reported in ref. ^{24}), a tighter correspondence than the original heuristic C provides, opens the door to applications in experimental neuroscience. The nature of this correlation merits further study as well. One outstanding question is how redundancies and synergies in the data differentially influence the relationship between Σ(X) and TSE(X). Unlike the O-information, the S-information does not obviously link to redundancies or synergies, and so how these kinds of integration impact the relationship to TSE remains unknown. Future work developing generative models with precisely controllable distributions of redundancies and synergies may shed light on this question.
In a broader scientific context, our work contributes to the increasing interest in higher-order interactions beyond the standard, pairwise network model^{8,71}. The information-theoretic approach (such as the work reported here, as well as in refs. ^{16,20,21,52,69,72}) is based largely on statistical inference, while alternative frameworks based on simplicial complexes, algebraic topology, and hypergraphs have been developed largely in parallel^{7,41,73,74,75,76}. How these different mathematical frameworks relate to each other remains an open question, and a more unified approach to understanding higher-order interactions, in terms of both topology and statistical inference, is an alluring prospect.
The optimization of maximally synergistic subsets via simulated annealing can be thought of as an attempt to find a maximally efficient, dimensionally reduced representation of a potentially large data set: when modeling a system, it is generally desirable to capture as many statistical dependencies as possible with the fewest required degrees of freedom. By finding a representation that incorporates synergies while simultaneously pruning redundant information that would be double counted, we can attempt to build the most computationally efficient model of a system under study^{77,78}. While dimensionality reduction and feature selection algorithms are widespread in many computational sciences, a rigorous treatment of the ways that synergistic and redundant information can inform the analysis of brain dynamics and functional networks remains a space of active development (for an example, see refs. ^{78,79}).
The O-information scales far more gracefully than related measures of synergistic information (such as the partial information decomposition, which is practically impossible to apply to systems larger than 5 elements^{28}). However, the combinatorics associated with assessing every possible subsystem become intractable as the system size grows, an issue first noted for the TSE complexity. In standard functional and effective network research, it is common to compute all pairwise interactions (a number that grows only as N^{2}) and then filter out spurious edges as needed^{79}; the analogous exhaustive enumeration of subsystems may be possible for very small systems, but is intractable for larger ones. If one can pre-select a set of elements, then the computation of the O-information is trivial for up to hundreds of items. However, the requirement to select subsets of interest can itself be computationally intensive and time-consuming. Consequently, heuristic strategies such as optimization, random sampling, or pre-filtering subsystems to exclude collections of elements will be required.
Since the O-information is a measure of relative redundancy/synergy dominance, in highly redundant data synergistic structures may not be strong enough to dominate the signal, resulting in a positive (redundancy-dominated) O-information. By adding increasing amounts of low-frequency redundancy to the BOLD data and re-running the optimizations, we found that the maximally synergistic subsets extracted from the uncontaminated data became impossible to retrieve (see Supplementary Fig. 6). Those synergies still existed; they were merely swamped by redundancy and made invisible. Adding global signal back in this way provides new insight into a commonly used step in fMRI preprocessing: global signal regression (GSR)^{80}. We argue that GSR can be understood as scrubbing global redundancies from the data, and in doing so it may reveal previously buried synergies that would not have been accessible in the original, unprocessed data.
One limitation of this study is that it is hard to disambiguate between information that reflects computation in neural tissue and information attributable to the vascular physiology of the BOLD signal. Recent work by Colenbier et al. has shown that there are synergistic interactions between the global signal, blood arrival times, and functional connectivity structure^{59}. Since the pairwise covariance forms the foundation of the multivariate Gaussian entropy estimator, it is likely that the same confounds influence the estimates of entropy and mutual information. Future work replicating these results using electrophysiological recordings such as M/EEG should help untangle this issue. Another limitation is that the O-information operates on static distributions: every frame is assumed to have been drawn from an unchanging multivariate Gaussian distribution, with no memory or dynamics from moment to moment. This is a standard assumption in functional connectivity analyses, although there is growing interest in its limitations and in the need for analyses that explicitly account for dynamics^{81}. The field of information dynamics provides a number of relevant analyses^{82,83}, and there is already interest in higher-order dynamics in the brain: in addition to the aforementioned work by Luppi et al., recent work by Faes et al. proposed a derivative of the O-information for rhythmic processes (the O-information rate, or OIR)^{84}. The OIR has been used to describe brain-heart interaction dynamics and opens up the frequency domain, in addition to the time domain, to higher-order informational analysis. Similarly, an application of the O-information to the dynamic measure of transfer entropy has been proposed and applied to optimizing ensembles of maximally synergistic or redundant neurons^{52}. Both of these measures could be incorporated into a pipeline like the one described here and may shed light on the similarities (and differences) between dynamic and static analyses.
Despite its limitations, however, we are confident that the classic, static O-information likely contains a wealth of as-yet unexplored structure and will continue to provide insights into brain structure and function.
In this article, we demonstrate how an information-theoretic measure of multivariate interactions, the O-information, can be used to uncover higher-order interactions in human brain dynamics. We analytically show that the O-information can be related to an older measure of systemic complexity, the TSE complexity, and from this derive a novel geometric interpretation of redundancy- and synergy-dominated systems. With a combination of random sampling and optimization, we show that a large number of subsystems displaying synergistic dynamics exist in the human brain, and that these systems form a highly distributed shadow structure that is entirely overlooked in standard, bivariate functional connectivity models. We conclude that the space of higher-order interactions in the human brain represents a large and under-explored area of study with rich potential for new discoveries and experimental work.
Methods
Gaussian information theory
In this paper, we focus on higher-order information-sharing in fMRI BOLD signals. Since BOLD data is continuous (rather than discrete), to quantify the entropy of a continuous signal we use a generalization of the classic, discrete Shannon entropy (Eq. (1)): the differential entropy:

\(H(X)=-\int P(x)\ln P(x)\,dx\)
Computing the differential entropy from empirical data is generally difficult, as it requires estimating P(x). However, if one is willing to assume multivariate normality, closed-form estimators of the Gaussian joint entropy can be leveraged.
Prior work has established that BOLD data is well-modeled by multivariate Gaussian distributions^{85,86} and that more complex, highly parameterized models provide little additional benefit^{87}. While information theory was originally formalized in the context of discrete random variables, in the specific case of Gaussian random variables closed-form estimators exist for almost all of the standard information measures (for an accessible review, see the supplementary material of ref. ^{82}). For a univariate Gaussian random variable \(X \sim {\mathcal{N}}(\mu ,\sigma )\), the entropy (given in nats) is defined as:

\(H(X)=\frac{1}{2}\ln (2\pi e{\sigma }^{2})\)
For a multivariate Gaussian random variable X = {X_{1}, X_{2}, . . . X_{N}}, the joint entropy is given by:

\(H({{\bf{X}}})=\frac{1}{2}\ln \left[{(2\pi e)}^{N}| \Sigma | \right]\)
where ∣Σ∣ refers to the determinant of the covariance matrix of X. The bivariate mutual information (in nats) between X_{1} and X_{2} is:

\(I({X}_{1};{X}_{2})=-\frac{1}{2}\ln (1-{\rho }^{2})\)
where ρ is the Pearson correlation coefficient between X_{1} and X_{2}. Note that, since the mutual information is a function of ρ for Gaussian variables, this special case of mutual information is not sensitive to non-linear relationships in the data in the way that non-parametric estimators are. Finally, the Gaussian estimator for the total correlation is:

\(TC({{\bf{X}}})=\frac{1}{2}\ln \left(\frac{\mathop{\prod }\nolimits_{i = 1}^{N}{\Sigma }_{ii}}{| \Sigma | }\right)\)
From these, it is possible to calculate all of the measures described above (dual total correlation, description complexity, O-information, and TSE complexity) for multivariate Gaussian variables. While the assumption of linearity that comes with a parametric Gaussian model can be limiting, the standard technique for assessing functional connectivity (the Pearson correlation coefficient) makes identical assumptions, so our work is consistent with the assumptions made in standard FC analysis.
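As an illustration of these closed-form estimators, the following Python sketch (illustrative only; the paper's supplementary code is MATLAB) implements the univariate entropy, joint entropy, and mutual information formulas above, and checks that the correlation-based mutual information matches H(X_{1}) + H(X_{2}) − H(X_{1}, X_{2}):

```python
import numpy as np

LN_2PIE = np.log(2 * np.pi * np.e)

def entropy_1d(var):
    """Differential entropy (nats) of a univariate Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def entropy_nd(cov):
    """Joint entropy (nats) of a multivariate Gaussian with covariance `cov`."""
    n = cov.shape[0]
    return 0.5 * (n * LN_2PIE + np.linalg.slogdet(cov)[1])

def mutual_information(rho):
    """I(X1; X2) in nats for jointly Gaussian variables with correlation `rho`."""
    return -0.5 * np.log(1 - rho ** 2)

# The closed-form MI agrees with H(X1) + H(X2) - H(X1, X2):
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
mi_from_entropies = entropy_1d(1.0) + entropy_1d(1.0) - entropy_nd(cov)
assert np.isclose(mutual_information(rho), mi_from_entropies)
```

Using the log-determinant (`slogdet`) rather than the raw determinant keeps the joint entropy numerically stable for larger covariance matrices.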
Building an intuitive understanding of synergy in the context of linear systems is difficult, since a multivariate Gaussian is defined entirely in terms of pairwise covariances. Barrett showed that higher-order synergies can exist in purely Gaussian systems and that redundancy is related to the mutual information^{88}; even in linear systems, then, beyond-pairwise dependencies can exist. This can be partly understood by recognizing that the multivariate Gaussian is the maximum entropy distribution subject to the constraints of pairwise covariance^{42}. So, while pairwise linear relationships are enough to uniquely specify the distribution, they do not rule out the possibility that beyond-pairwise relationships exist; they do, however, fix the structure of those higher-order dependencies.
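A minimal toy example of this point (our own construction, not a system analyzed in the paper): with X_{1} and X_{2} independent unit-variance sources and X_{3} a noisy sum of the two, the resulting Gaussian triplet has negative O-information, i.e., it is synergy-dominated despite being fully specified by pairwise covariances. A Python sketch using the Gaussian estimators above:

```python
import numpy as np

LN_2PIE = np.log(2 * np.pi * np.e)

def joint_entropy(cov):
    """Entropy (nats) of a multivariate Gaussian with covariance `cov`."""
    n = cov.shape[0]
    return 0.5 * (n * LN_2PIE + np.linalg.slogdet(cov)[1])

def o_information(cov):
    """Omega(X) = TC(X) - DTC(X) from the closed-form Gaussian estimators."""
    n = cov.shape[0]
    joint = joint_entropy(cov)
    marginals = sum(0.5 * (LN_2PIE + np.log(cov[i, i])) for i in range(n))
    residuals = sum(
        joint_entropy(cov[np.ix_([j for j in range(n) if j != i],
                                 [j for j in range(n) if j != i])])
        for i in range(n))
    tc = marginals - joint
    dtc = residuals - (n - 1) * joint
    return tc - dtc

# X1, X2 independent; X3 ~ X1 + X2 plus a little noise (so the covariance
# stays non-singular). All structure here is purely linear and pairwise.
cov = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0],
                [1.0, 1.0, 2.1]])
omega = o_information(cov)   # negative: the triplet is synergy-dominated
```

Knowing X_{3} alone tells you little about X_{1} or X_{2} individually, but the three variables jointly constrain each other, which is exactly the "whole greater than the parts" signature that Ω < 0 detects.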
Datasets
Two independent resting-state fMRI data sets were employed in the empirical analyses, one derived from the Human Connectome Project (HCP data^{56}) and the other from a recently published open-source repository (MICA^{57}). The HCP data, derived from a set of 100 unrelated subjects, have been used in several previous studies (for a more detailed description, see ref. ^{89}). All participants provided informed consent, and the Washington University Institutional Review Board approved all of the study protocols and procedures. A Siemens 3T Connectom Skyra equipped with a 32-channel head coil was used to collect data. Resting-state functional MRI (rs-fMRI) data were acquired during four scans on two separate days, using a gradient-echo echo-planar imaging (EPI) sequence (scan duration: 14:33 min; eyes open). Acquisition parameters of TR = 720 ms, TE = 33.1 ms, 52^{∘} flip angle, isotropic voxel resolution = 2 mm, and multiband factor = 8 were used for data collection. A parcellation scheme covering the cerebral cortex, developed in ref. ^{58}, was used to map functional data to 200 regions. This parcellation can also be aligned to the canonical resting-state networks found in ref. ^{60}.
Of the 100 unrelated subjects considered in the original dataset, 95 were retained for inclusion in the empirical analysis in this study. Exclusion criteria, established before the present study was conducted, were based on the mean and mean absolute deviation of the relative root mean square (RMS) motion across either the four resting-state MRI scans or one diffusion MRI scan, yielding four summary motion measures. Subjects that exceeded 1.5 times the interquartile range (in the adverse direction) of the distribution of two or more of these measures were excluded. Following these criteria, four subjects were excluded. One additional subject was excluded due to a software error during diffusion MRI processing. The remaining 95 subjects were 56% female, with a mean age of 29.29 ± 3.66 years and an age range of 22 to 36 years.
The MICA dataset includes 50 unrelated subjects, all of whom provided written informed consent. The study was approved by the Ethics Committee of the Montreal Neurological Institute and Hospital. Resting-state data were collected in a single scan session using a 3T Siemens Magnetom Prisma-Fit with a 64-channel head coil. Resting-state scans lasted 7 min, during which participants were instructed to look at a fixation cross. Imaging was completed with an EPI sequence, with acquisition parameters of TR = 600 ms, TE = 48 ms, 52^{∘} flip angle, isotropic voxel resolution = 3 mm, and multiband factor 6. The parcellation used for this dataset was the same as for the HCP data (described above).
Preprocessing
Minimal preprocessing of the HCP rs-fMRI data followed these steps^{90}: (1) distortion, susceptibility, and motion correction; (2) registration to subjects’ respective T1-weighted data; (3) bias and intensity normalization; (4) projection onto the 32k_fs_LR mesh; and (5) alignment to common space with a multimodal surface registration^{91}. These steps produced an ICA+FIX time series in the CIFTI grayordinate coordinate system. Two additional preprocessing steps were performed: (6) global signal regression and (7) detrending and band-pass filtering (0.008 to 0.08 Hz)^{92}. After confound regression and filtering, the first and last 50 frames of the time series were discarded, resulting in a final scan length of 13.2 min (1100 frames).
Preprocessing of the MICA dataset was performed as described in ref. ^{57} for resting-state data. Briefly, the data were passed through the Micapipe^{93} processing pipeline, which includes motion and distortion correction, as well as FSL’s ICA-FIX tool trained with an in-house classifier. Time series were projected to each subject’s FreeSurfer surface, where nodes were also defined. Further details about the processing pipeline can be found in ref. ^{93}. In addition to the other preprocessing steps in this pipeline, the data were global-signal regressed.
For calculating the covariance matrix used in computing the O-information, total correlation, and dual total correlation, the functional data from all scans and all subjects were combined to create a single COV or FC matrix. Aggregation was carried out by appending the nodal time series across all subjects and runs and then calculating a single Pearson correlation for each node pair. An alternative approach (taking the mean over the single-run, single-subject COV/FC matrices) yielded virtually identical results. Following preprocessing and using the common 200-node parcellation of the cerebral cortex, the mean COV/FC matrices for the HCP and MICA data sets were highly correlated (R = 0.851, p = 0).
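The aggregation step can be sketched as follows (a Python illustration with toy dimensions; the actual analysis used MATLAB on 200-node parcellated BOLD runs):

```python
import numpy as np

def pooled_fc(runs):
    """Aggregate per-run node time series into one FC/COV matrix.

    `runs`: list of (time, nodes) arrays. Each run is z-scored before
    concatenation so no single run or subject dominates the pooled estimate.
    """
    z_runs = [(ts - ts.mean(axis=0)) / ts.std(axis=0) for ts in runs]
    pooled = np.vstack(z_runs)                 # append time series across runs
    return np.corrcoef(pooled, rowvar=False)   # one Pearson r per node pair

# toy data: 4 runs of 1100 frames over 10 hypothetical nodes
rng = np.random.default_rng(1)
runs = [rng.normal(size=(1100, 10)) for _ in range(4)]
fc = pooled_fc(runs)
```

The resulting matrix is symmetric with a unit diagonal, and sub-matrices for any subset of nodes can be sliced out directly for the subset-level information measures.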
Random sampling and optimization
Subsets of regions were selected from the full-size (200 nodes/regions) FC matrices in two ways: by random sampling and by search through optimization. Random sampling is simple to implement, but because of the vast repertoire of potential subsets (\(\binom{N}{k}\)) it cannot fully disclose the extent of variation in the informational measures present in the data. Instead, search under an objective function (optimization) can guide exploration toward specific subspaces enriched in subsets with distinct informational signatures.
To perform optimizations, we implemented a variant of simulated annealing^{94}. As objectives, we chose multivariate informational measures such as the O-information (OI), total correlation (TC), and dual total correlation (DTC), each of which could be maximized or minimized. Each run of the simulated annealing algorithm was carried out on one FC matrix and for one subset size. We carried out 5000 runs, with subset sizes ranging from 3 to 30 nodes. A random selection of nodes of the given subset size initiated each run. The corresponding covariance matrix was extracted from the full COV/FC matrix and used to compute the information-theoretic metric of interest. The composition of the subset was then varied, and variations were selected under the objective function. Annealing operates by selecting variations stochastically, depending on a temperature parameter that determines the amount of noise permitted in the selection process. Initially, the temperature is high, resulting in somewhat random exploration of the landscape. As the temperature is lowered, the optimization becomes more deterministic, focusing more and more on local gradient descent. Each run proceeded for a maximum of 10,000 steps. At each step, a new set of nodes was generated by randomly replacing nodes, with the number of replacements drawn from a normal distribution (frequencies of 1-, 2-, and 3-element flips were 0.68, 0.27, and 0.04, respectively). A new covariance matrix was computed for the new set of nodes, and the objective function was calculated for that set. The new set was retained if its cost was lower than that of the current set, or if a random number drawn from the uniform distribution between 0 and 1 was less than \(\exp (-({C}_{n}-C)/{T}_{c})\), where C_{n} is the cost of the new set of nodes, C is the cost of the current set of nodes, and T_{c} is the current temperature.
At each step, the current temperature decays to a fraction of the initial temperature, as a function of the number of steps completed:
where T_{c} is the current temperature, T_{0} is the initial temperature (set to T_{0} = 1), T_{exp} governs the steepness of the temperature gradient, and h is the current iteration step. By decreasing the temperature at every step, the algorithm becomes progressively more deterministic.
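The annealing loop can be sketched as follows (a simplified Python illustration, not the MATLAB implementation in Supplementary Software 1; the exponential cooling form and the flip probabilities here are stand-ins, with the flip frequencies renormalized to sum to 1):

```python
import numpy as np

LN_2PIE = np.log(2 * np.pi * np.e)

def joint_entropy(cov):
    n = cov.shape[0]
    return 0.5 * (n * LN_2PIE + np.linalg.slogdet(cov)[1])

def o_information(cov):
    """Omega(X) = TC(X) - DTC(X) for a Gaussian with covariance `cov`."""
    n = cov.shape[0]
    joint = joint_entropy(cov)
    marginals = sum(0.5 * (LN_2PIE + np.log(cov[i, i])) for i in range(n))
    residuals = sum(
        joint_entropy(cov[np.ix_([j for j in range(n) if j != i],
                                 [j for j in range(n) if j != i])])
        for i in range(n))
    return (marginals - joint) - (residuals - (n - 1) * joint)

def anneal_subset(full_cov, k, steps=1000, t0=1.0, decay=0.005, seed=0):
    """Search for a maximally synergistic (minimal-Omega) k-node subset."""
    rng = np.random.default_rng(seed)
    n = full_cov.shape[0]
    current = rng.choice(n, size=k, replace=False)
    cost = o_information(full_cov[np.ix_(current, current)])
    for h in range(steps):
        temp = t0 * np.exp(-decay * h)        # cooling schedule (form assumed)
        # propose a new subset by swapping out a small number of nodes
        n_swap = rng.choice([1, 2, 3], p=[0.68, 0.27, 0.05])
        proposal = current.copy()
        swap_out = rng.choice(k, size=n_swap, replace=False)
        pool = np.setdiff1d(np.arange(n), current)
        proposal[swap_out] = rng.choice(pool, size=n_swap, replace=False)
        new_cost = o_information(full_cov[np.ix_(proposal, proposal)])
        # Metropolis rule: keep improvements; sometimes keep worse solutions
        if new_cost < cost or rng.random() < np.exp(-(new_cost - cost) / temp):
            current, cost = proposal, new_cost
    return current, cost

# toy usage: a random 20-node covariance, searching for a synergistic 5-subset
rng = np.random.default_rng(2)
a = rng.normal(size=(20, 20))
toy_cov = a @ a.T + 20 * np.eye(20)
subset, omega = anneal_subset(toy_cov, k=5, steps=300)
```

Maximizing redundancy instead simply means negating the objective (or accepting higher-Ω proposals), which is the sense in which each measure "could be maximized or minimized" above.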
Null model
Given some set X of k nodes with Ω(X) < 0, it is possible that not every X_{i} ∈ X actually contributes to the synergy (for example, if some X_{i} are independent of every other node). However, such a set may still be returned by the simulated annealing algorithm as a solution for an optimally synergistic subset of k nodes. To ensure that each reported subset was irreducibly synergistic, each node in the subset was removed in turn by setting the Pearson correlation of that node with all other nodes to zero. After removal of a node, the O-information was recalculated. If removal of any node decreased the O-information of the subset (i.e., yielded a more synergy-dominated remainder), then the subset was considered reducible to a k − 1 subset, and was not included in further analyses.
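This reducibility check can be sketched as follows (Python, with hypothetical covariance matrices of our own construction; the lesioning here zeroes covariances rather than correlations, which is equivalent for z-scored data):

```python
import numpy as np

LN_2PIE = np.log(2 * np.pi * np.e)

def joint_entropy(cov):
    n = cov.shape[0]
    return 0.5 * (n * LN_2PIE + np.linalg.slogdet(cov)[1])

def o_information(cov):
    n = cov.shape[0]
    joint = joint_entropy(cov)
    marginals = sum(0.5 * (LN_2PIE + np.log(cov[i, i])) for i in range(n))
    residuals = sum(
        joint_entropy(cov[np.ix_([j for j in range(n) if j != i],
                                 [j for j in range(n) if j != i])])
        for i in range(n))
    return (marginals - joint) - (residuals - (n - 1) * joint)

def is_irreducible(cov):
    """True if no single-node 'lesion' makes the subset more synergistic."""
    omega = o_information(cov)
    for i in range(cov.shape[0]):
        lesioned = cov.copy()
        lesioned[i, :] = 0.0                  # cut the node's dependencies...
        lesioned[:, i] = 0.0
        lesioned[i, i] = cov[i, i]            # ...but keep its own variance
        if o_information(lesioned) < omega:   # lesion lowered Omega
            return False                      # reducible to a (k - 1)-subset
    return True

# a synergy-dominated triplet: every node contributes, so it is irreducible
triplet = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0],
                    [1.0, 1.0, 2.1]])

# the same triplet plus a fourth node that is only redundantly coupled to X1:
# lesioning node 4 lowers Omega, so the 4-set reduces to the triplet
four_set = np.array([[1.0, 0.0, 1.0, 0.5],
                     [0.0, 1.0, 1.0, 0.0],
                     [1.0, 1.0, 2.1, 0.5],
                     [0.5, 0.0, 0.5, 1.0]])
```

Only subsets passing this check were retained, so the reported annealing solutions are ones in which every node contributes to the measured synergy.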
Statistics and reproducibility
All statistics were computed using MATLAB 2020 and MATLAB 2021. The code for reproducing results is provided in Supplementary Software 1. Covariance matrices were computed from z-scored BOLD time series and symmetrized by averaging each matrix with its own transpose. Information-theoretic estimators were computed using the formulae given in Sec. Gaussian Information Theory (all code provided).
Random sampling of ensembles was performed for ensembles of sizes 3–16, with 100,000 samples drawn for each size. Annealing was performed using the provided code, with 5000 replications for each ensemble size. All correlations were computed using Spearman’s ρ.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All the data used here are available from the Human Connectome Project^{56} (http://www.humanconnectomeproject.org/) and the Microstructure-Informed Connectomics Project^{57} (https://osf.io/j532r/). Data for reproducing Figs. 3 and 5a, c are included in Supplementary Data 1 and Supplementary Data 2, respectively.
Code availability
MATLAB code for computing the TC, DTC, O-information, and S-information, as well as the simulated annealing, is attached to this manuscript as Supplementary Software 1.
Change history
07 June 2023
A Correction to this paper has been published: https://doi.org/10.1038/s4200302304984y
References
Barabási, A.-L. & Pósfai, M. Network Science (Cambridge University Press, 2016).
Menczer, F., Fortunato, S. & Davis, C. A. A First Course in Network Science (Cambridge University Press, 2020).
Sporns, O. & Kötter, R. Motifs in brain networks. PLOS Biol. 2, e369 (2004).
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998).
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
Betzel, R. F. Community detection in network neuroscience. https://arxiv.org/abs/2011.06723 (2020).
Battiston, F. et al. Networks beyond pairwise interactions: structure and dynamics. Phys. Rep. 874, 1–92 (2020).
Battiston, F. et al. The physics of higher-order interactions in complex systems. Nat. Phys. 17, 1093–1098 (2021).
Tononi, G., Sporns, O. & Edelman, G. M. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl Acad. Sci. USA 91, 5033–5037 (1994).
Tononi, G., Edelman, G. M. & Sporns, O. Complexity and coherency: integrating information in the brain. Trends Cogn. Sci. 2, 474–484 (1998).
Tononi, G. & Edelman, G. M. Schizophrenia and the mechanisms of conscious integration. Brain Res. Rev. 31, 391–400 (2000).
Timme, N. M. et al. High-degree neurons feed cortical computations. PLOS Comput. Biol. 12, e1004858 (2016).
Faber, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Computation is concentrated in rich clubs of local cortical networks. Netw. Neurosci. 3, 1–21 (2018).
Sherrill, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Partial information decomposition reveals that synergistic neural integration is greater downstream of recurrent information flow in organotypic cortical cultures. PLOS Comput. Biol. 17, e1009196 (2021).
Sherrill, S. P., Timme, N. M., Beggs, J. M. & Newman, E. L. Correlated activity favors synergistic processing in local cortical networks in vitro at synaptically relevant timescales. Netw. Neurosci. 4, 678–697 (2020).
Scagliarini, T., Marinazzo, D., Guo, Y., Stramaglia, S. & Rosas, F. E. Quantifying high-order interdependencies on individual patterns via the local O-information: theory and applications to music analysis. Phys. Rev. Res. 4, 013184 (2022).
Varley, T. F., Sporns, O., Schaffelhofer, S., Scherberger, H. & Dann, B. Information-processing dynamics in neural networks of macaque cerebral cortex reflect cognitive state and behavior. Proc. Natl Acad. Sci. USA 120, e2207677120 (2023).
Rosas, F. E. et al. Reconciling emergences: an information-theoretic approach to identify causal emergence in multivariate data. PLOS Comput. Biol. 16, e1008289 (2020).
Varley, T., Sporns, O., Puce, A. & Beggs, J. Differential effects of propofol and ketamine on critical brain dynamics. PLOS Comput. Biol. 16, e1008418 (2020).
Luppi, A. I. et al. A synergistic workspace for human consciousness revealed by integrated information decomposition. https://doi.org/10.1101/2020.11.25.398081 (2020).
Luppi, A. I. et al. A synergistic core for human brain evolution and cognition. Nat. Neurosci. 25, 771–782 (2022).
Gatica, M. et al. High-order interdependencies in the aging brain. Brain Connect. 11, 734–744 (2021).
Luppi, A. I. et al. What it is like to be a bit: an integrated information decomposition account of emergent mental phenomena. Neurosci. Conscious. 2021, niab027 (2021).
Rosas, F., Mediano, P. A. M., Gastpar, M. & Jensen, H. J. Quantifying high-order interdependencies via multivariate extensions of the mutual information. Phys. Rev. E 100, 032305 (2019).
Lizier, J. T., Flecker, B. & Williams, P. L. Towards a synergy-based approach to measuring information modification. https://arxiv.org/abs/1303.3440 (2013).
Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the dynamics of neural information processing with multivariate information decomposition. Entropy 24, 930 (2022).
Williams, P. L. & Beer, R. D. Nonnegative decomposition of multivariate information. https://arxiv.org/abs/1004.2515 (2010).
Gutknecht, A. J., Wibral, M. & Makkeh, A. Bits and pieces: understanding information decomposition from part-whole relationships and formal logic. Proc. R. Soc. A 477, 20210110 (2021).
Kolchinsky, A. A novel approach to the partial information decomposition. Entropy 24, 403 (2022).
Kay, J. W., Schulz, J. M. & Phillips, W. A. A comparison of partial information decompositions using data from real and simulated layer 5b pyramidal cells. Entropy 24, 1021 (2022).
James, R. G., Ellison, C. J. & Crutchfield, J. P. Anatomy of a bit: information in a time series observation. Chaos: Interdiscip. J. Nonlinear Sci. 21, 037109 (2011).
Deco, G., Tononi, G., Boly, M. & Kringelbach, M. L. Rethinking segregation and integration: contributions of wholebrain modelling. Nat. Rev. Neurosci. 16, 430–439 (2015).
Shine, J. M. Neuromodulatory influences on integration and segregation in the brain. Trends Cogn. Sci. 23, 572–583 (2019).
Massimini, M. et al. Breakdown of cortical effective connectivity during sleep. Science 309, 2228–2232 (2005).
Casali, A. G. et al. A theoretically based index of consciousness independent of sensory processing and behavior. Sci. Transl. Med. 5, 198ra105 (2013).
Sarasso, S. et al. Consciousness and complexity: a consilience of evidence. Neurosci. Conscious. https://doi.org/10.1093/nc/niab023 (2021).
Luppi, A. I. et al. Consciousnessspecific dynamic interactions of brain integration and functional diversity. Nat. Commun. 10, 1–12 (2019).
Luppi, A. I. et al. LSD alters dynamic integration and segregation in the human brain. NeuroImage 227, 117653 (2021).
McGhee, G. R. Theoretical morphology: the concept and its applications. Short. Courses Paleontol. 4, 87–102 (1991).
AvenaKoenigsberger, A., Goñi, J., Solé, R. & Sporns, O. Network morphospace. J. R. Soc. Interface 12, 20140881 (2015).
Varley, T. F., Denny, V., Sporns, O. & Patania, A. Topological analysis of differential effects of ketamine and propofol anaesthesia on brain dynamics. R. Soc. Open Sci. 8, 201971 (2021).
Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, 2012).
Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
Friston, K. J. Functional and effective connectivity in neuroimaging: a synthesis. Hum. Brain Mapp. 2, 56–78 (1994).
van Diessen, E. et al. Opportunities and methodological challenges in EEG and MEG resting state functional brain network research. Clin. Neurophysiol. 126, 1468–1481 (2015).
Ursino, M., Ricci, G. & Magosso, E. Transfer entropy as a measure of brain connectivity: a critical analysis with the help of neural mass models. Front. Comput. Neurosci. 14, 45 (2020).
Barnett, L., Muthukumaraswamy, S. D., CarhartHarris, R. L. & Seth, A. K. Decreased directed functional connectivity in the psychedelic state. NeuroImage 209, 116462 (2020).
Fornito, A., Zalesky, A. & Bullmore, E. Fundamentals of Brain Network Analysis (Academic Press, 2016).
Sporns, O. Networks of the Brain (The MIT Press, 2010).
Abdallah, S. A. & Plumbley, M. D. A measure of statistical complexity based on predictive information with application to finite spin systems. Phys. Lett. A 376, 275–281 (2012).
Williams, P. L. & Beer, R. D. Generalized measures of information transfer. https://arxiv.org/abs/1102.1507 (2011).
Stramaglia, S., Scagliarini, T., Daniels, B. C. & Marinazzo, D. Quantifying dynamical high-order interdependencies from the O-information: an application to neural spiking dynamics. Front. Physiol. 11, 595736 (2021).
Sporns, O., Tononi, G. & Edelman, G. M. Theoretical neuroanatomy and the connectivity of the cerebral cortex. Behav. Brain Res. 135, 69–74 (2002).
Ay, N., Olbrich, E., Bertschinger, N. & Jost, J. A unifying framework for complexity measures of finite systems. ECCS’06: Proceedings of the European Conference on Complex Systems 2006 (2006).
Scagliarini, T. et al. Gradients of O-information: low-order descriptors of high-order dependencies. http://arxiv.org/abs/2207.03581 (2022).
Van Essen, D. C. et al. The WU-Minn human connectome project: an overview. NeuroImage 80, 62–79 (2013).
Royer, J. et al. An open MRI dataset for multiscale neuroscience. Sci. Data 9, 569 (2021).
Schaefer, A. et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018).
Colenbier, N. et al. Disambiguating the role of blood flow and global signal with partial information decomposition. NeuroImage 213, 116699 (2020).
Yeo, B. T. et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165 (2011).
Griffith, V. & Harel, J. Irreducibility is minimum synergy among parts. https://arxiv.org/abs/1311.7442 (2013).
Santoro, A., Battiston, F., Petri, G. & Amico, E. Higher-order organization of multivariate time series. Nat. Phys. 19, 221–229 (2023).
Zamani Esfahlani, F. et al. High-amplitude cofluctuations in cortical activity drive functional connectivity. Proc. Natl Acad. Sci. USA 117, 28393–28401 (2020).
Faskowitz, J., Esfahlani, F. Z., Jo, Y., Sporns, O. & Betzel, R. F. Edge-centric functional network representations of human cerebral cortex reveal overlapping system-level architecture. Nat. Neurosci. 23, 1644–1654 (2020).
Betzel, R. F., Cutts, S. A., Greenwell, S., Faskowitz, J. & Sporns, O. Individualized event structure drives individual differences in whole-brain functional connectivity. NeuroImage 252, 118993 (2022).
Varley, T. F., Pope, M., Puxeddu, M. G., Faskowitz, J. & Sporns, O. Partial entropy decomposition reveals higher-order structures in human brain activity. http://arxiv.org/abs/2301.05307 (2023).
Ince, R. A. A. The partial entropy decomposition: decomposing multivariate entropy and mutual information via pointwise common surprisal. https://arxiv.org/abs/1702.01591 (2017).
Finn, C. & Lizier, J. T. Generalised measures of multivariate information content. Entropy 22, 216 (2020).
Varley, T. F. Decomposing past and future: integrated information decomposition based on shared probability mass exclusions. https://arxiv.org/abs/2202.12992 (2022).
Timme, N. M. et al. Criticality maximizes complexity in neural tissue. Front. Physiol. 7, 425 (2016).
Rosas, F. E. et al. Disentangling high-order mechanisms and high-order behaviours in complex systems. Nat. Phys. 18, 476–477 (2022).
Varley, T. F. & Kaminski, P. Untangling synergistic effects of intersecting social identities with partial information decomposition. Entropy 24, 1387 (2022).
Sizemore, A. E., Phillips-Cremins, J., Ghrist, R. & Bassett, D. S. The importance of the whole: topological data analysis for the network neuroscientist. Netw. Neurosci. 3, 656–673 (2019).
Saggar, M. et al. Towards a new approach to reveal dynamical organization of the brain using topological data analysis. Nat. Commun. 9, 1399 (2018).
Billings, J., Saggar, M., Hlinka, J., Keilholz, S. & Petri, G. Simplicial and topological descriptions of human brain dynamics. Netw. Neurosci. 5, 549–568 (2021).
Stolz, B. J., Emerson, T., Nahkuri, S., Porter, M. A. & Harrington, H. A. Topological data analysis of task-based fMRI data from experiments on schizophrenia. J. Phys. Complex. 2, 035006 (2021).
Varley, T. F. & Hoel, E. Emergence as the conversion of information: a unifying theory. Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci. 380, 20210150 (2022).
Wollstadt, P., Schmitt, S. & Wibral, M. A rigorous information-theoretic definition of redundancy and relevancy in feature selection based on partial information decomposition. https://arxiv.org/abs/2105.04187 (2021).
Novelli, L., Wollstadt, P., Mediano, P., Wibral, M. & Lizier, J. T. Largescale directed network inference with multivariate transfer entropy and hierarchical statistical testing. Netw. Neurosci. 3, 827–847 (2019).
Liu, T. T., Nalci, A. & Falahpour, M. The global signal in fMRI: nuisance or information? NeuroImage 150, 213–229 (2017).
Novelli, L. & Razi, A. A mathematical perspective on edge-centric functional connectivity. http://arxiv.org/abs/2106.10631 (2021).
Lizier, J. T. JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. https://arxiv.org/pdf/1408.3270.pdf (2014).
Bossomaier, T., Barnett, L., Harré, M. & Lizier, J. T. An Introduction to Transfer Entropy: Information Flow in Complex Systems (Springer, 2016).
Faes, L. et al. A new framework for the time and frequency-domain assessment of high-order interactions in networks of random processes. IEEE Trans. Signal Process. 70 (2022).
Hlinka, J., Paluš, M., Vejmelka, M., Mantini, D. & Corbetta, M. Functional connectivity in resting-state fMRI: Is linear correlation sufficient? NeuroImage 54, 2218–2225 (2011).
Liégeois, R., Yeo, B. T. T. & Van De Ville, D. Interpreting null models of resting-state functional MRI dynamics: not throwing the model out with the hypothesis. NeuroImage 243, 118518 (2021).
Schulz, M. A. et al. Different scaling of linear models and deep learning in UK Biobank brain images versus machine-learning datasets. Nat. Commun. 11, 4238 (2020).
Barrett, A. B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 91, 052802 (2015).
Sporns, O., Faskowitz, J., Teixeira, A. S., Cutts, S. A. & Betzel, R. F. Dynamic expression of brain functional systems disclosed by fine-scale analysis of edge time series. Netw. Neurosci. 5, 405–433 (2021).
Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105–124 (2013).
Robinson, E. C. et al. MSM: a new flexible framework for Multimodal Surface Matching. NeuroImage 100, 414–426 (2014).
Parkes, L., Fulcher, B., Yücel, M. & Fornito, A. An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage 171, 415–436 (2018).
Cruces, R. R. et al. Micapipe: a pipeline for multimodal neuroimaging and connectome analysis. Neuroimage 263, 119612 (2022).
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953).
Acknowledgements
T.F.V. and M.P. are supported by the NSF-NRT grant 1735095, Interdisciplinary Training in Complex Networks and Systems. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
T.F.V., M.E.P., and O.S. conceived of the project. T.F.V. and O.S. performed the formal, mathematical analysis. M.E.P. and O.S. analyzed the data. J.F. preprocessed and collated the fMRI data. T.F.V. and M.E.P. wrote the initial manuscript. O.S. and J.F. provided editorial feedback. O.S. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary handling editors: Enzo Tagliazucchi and George Inglis. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Varley, T.F., Pope, M., Faskowitz, J. et al. Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex. Commun Biol 6, 451 (2023). https://doi.org/10.1038/s42003-023-04843-w
DOI: https://doi.org/10.1038/s42003-023-04843-w
This article is cited by
- Identifying Diagnostic Biomarkers for Autism Spectrum Disorder From Higher-order Interactions Using the PED Algorithm. Neuroinformatics (2024)
- Behavioural relevance of redundant and synergistic stimulus information between functionally connected neurons in mouse auditory cortex. Brain Informatics (2023)
- Neural complexity through a nonextensive statistical–mechanical approach of human electroencephalograms. Scientific Reports (2023)