Unfolding the multiscale structure of networks with dynamical Ollivier-Ricci curvature

Gosztolai, Adam; Arnaudon, Alexis

doi:10.1038/s41467-021-24884-1

Article
Open access
Published: 27 July 2021

Nature Communications volume 12, Article number: 4561 (2021)

Abstract

Describing networks geometrically through low-dimensional latent metric spaces has helped design efficient learning algorithms, unveil network symmetries and study dynamical network processes. However, latent space embeddings are limited to specific classes of networks because incompatible metric spaces generally result in information loss. Here, we study arbitrary networks geometrically by defining a dynamic edge curvature measuring the similarity between pairs of dynamical network processes seeded at nearby nodes. We show that the evolution of the curvature distribution exhibits gaps at characteristic timescales indicating bottleneck-edges that limit information spreading. Importantly, curvature gaps are robust to large fluctuations in node degrees, encoding communities until the phase transition of detectability, where spectral and node-clustering methods fail. Using this insight, we derive geometric modularity to find multiscale communities based on deviations from constant network curvature in generative and real-world networks, significantly outperforming most previous methods. Our work suggests using network geometry for studying and controlling the structure of and information spreading on networks.

Introduction

Real-world networks are rarely embedded in physical or Euclidean spaces, which complicates their analysis. Therefore, previous works have typically assumed that the network’s nodes lie in a latent metric space¹. A well-chosen metric space can provide a ‘geometric backbone’ to allow the correct representation of node similarities, and to study symmetries and dynamical processes on networks at a fundamental level. For example, assuming an underlying manifold structure² permits the efficient functioning of clustering algorithms based on Euclidean geometric features such as k-means or expectation maximisation³. Similarly, the hyperbolic space of constant negative curvature provides a natural parametrisation of complex networks to unveil their self-similar clusters across scales^4,5. Besides, networks can also be embedded based on a suitable pseudo-distance metric between dynamical network processes, which has helped reveal their functional organisation^6,7. However, in general, there is no guarantee that a network is compatible with a given metric space without suffering significant distortion⁸. Yet, a network may have several, not necessarily self-similar, geometric representations owing to multiscale structure, arising, for example, from clusters at multiple resolutions⁹. Thus, there is a need for a geometric notion that does not rely on predefined embedding spaces, yet allows unfolding the multiscale structure of a general class of networks.

A promising alternative to embeddings is to define geometry based on a notion of curvature, such as the Ollivier–Ricci (OR) curvature¹⁰, which intuitively speaking measures the deviation of the graph from being locally grid-like analogously to being ’flat’ in continuous spaces. ‘Flatness’ of a network can be understood in terms of its local connectivity: the distance of a pair of nodes is the same as the average distance of their neighbourhoods. Thus positive (or negative) OR curvature of an edge indicates that it resides in a region of the graph that is more (or less) connected than a grid. In addition to the intuitive definition, the OR curvature does not impose geometry through embedding but induces an effective network geometry with a precise interpretation in limiting cases. In fact, it is the only one among several discrete curvature notions^11,12 known to converge rigorously to the Ricci curvature of a Riemannian manifold¹³. The OR curvature has also been linked to graph-theoretical objects by deriving formal bounds on the local clustering coefficient and the spectrum of the graph Laplacian^14,15. Moreover, the OR curvature of an edge is intrinsically linked to network-level robustness to edge removal, which has led to advances in applications such as studying the fragility of economic networks¹⁶ or characterising the human brain structural connectivity¹⁷.

However, despite recent clustering heuristics based on the OR curvature^18,19, several of its properties have hindered its widespread adoption to study network clusters. Firstly, since the OR curvature depends on structural neighbourhoods, related clustering methods (including the Ricci flow method¹⁹) lack a resolution parameter to tune the geometry to unveil multiscale structure in real-world networks. Multiscale clustering has been the subject of intense research and several methods and heuristics have been proposed, along with a parallel list of goodness measures for community structures. These include, without claim of exhaustivity, methods based on statistical mechanical models^20,21, normalised cut²², nonnegative matrix factorisation²³, modularity^24,25,26 and extensions thereof using random walks and diffusion processes^9,27 as well as methods based on graph signal processing²⁸. The second shortcoming of the classical OR curvature of an edge is that it is a local quantity, which depends on the degrees of its endpoints¹⁴. Thus, it likely provides a suboptimal geometric representation of sparse networks (including many real-world networks where each node connects only to a few others) in which node degrees vary widely. In fact, the classical OR curvature is related to the spectral gap of the graph Laplacian¹⁵, the central object of spectral clustering methods²⁹, which no longer indicates clusters in sparse graphs³⁰, similarly to other node-clustering methods²⁴. This lack of robustness of the OR curvature for sparse networks also precludes its use for studying the limit of information spreading in graphs³¹, which is linked to a phase transition occurring as the community structure gets weaker and becomes abruptly undetectable^31,32,33.

In other words, there is a need for a geometric notion that does not rely on embeddings and is capable of generating a family of geometric representations to encode multiscale clusters as increasingly coarse features. Moreover, these features should robustly signal network clusters at different scales until the fundamental limit of their detection. Such a notion would hold the promise to describe multiscale structures of graphs without the need for statistical null models and to open new avenues to study and control information spreading phenomena using network geometry.

Results

Dynamical OR curvature from graph diffusion

We address this need by combining two distinct frameworks—network-driven dynamical processes and geometry with OR curvature. The spreading of network-driven dynamical processes is shaped by the heterogeneity of the network connectivity. In turn, one may infer the network structure by observing properties of their evolution. We focus on Markov diffusion processes^9,34,35,36, a class of linear dynamical systems which is rich enough to capture several properties of nonlinear processes on networks^37,38. Let us consider a connected network of n nodes and m edges weighted by pairwise distances w_ij. We construct a continuous time diffusion on the network by the standard procedure²⁹ of defining the normalised graph Laplacian matrix L ≔ K⁻¹(K − A), where K is the diagonal matrix of node degrees with K_ii = ∑_jA_ij and A is the weighted adjacency matrix encoding similarities between nodes. To obtain non-negative similarities from node-to-node distances, one may simply take ${A}_{ij}={\max }_{uv}{w}_{uv}-{w}_{ij}$ or ${A}_{ij}={e}^{-{w}_{ij}}$, with the latter more strongly penalising distant points. Then, the probability measure of the diffusion started from the unit mass δ_i on node i (Fig. 1a, b) evolves according to

$$\mathbf{p}_{i}(\tau) = \delta_{i}\, e^{-\tau \mathbf{L}}\ .$$

(1)
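As a minimal sketch of Eq. (1), the diffusion measure can be evaluated without a matrix-exponential routine. Because L = I − K⁻¹A, we have exp(−τL) = e^(−τ) Σ_k (τ^k/k!) P^k with P = K⁻¹A, i.e. a Poisson-weighted average of random-walk steps. The function name `diffusion_measure`, the truncation rule and the dense list-of-lists adjacency format are illustrative assumptions, not part of the original method; the truncated series is an approximation that is only suitable for moderate τ.

```python
import math


def diffusion_measure(adjacency, source, tau):
    """Approximate p_source(tau) = delta_source * exp(-tau * L), L = I - K^{-1} A.

    Uses exp(-tau L) = exp(-tau) * sum_k (tau^k / k!) * P^k with P = K^{-1} A.
    Assumption: tau is moderate (exp(-tau) must not underflow, so tau < ~700).
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    # Truncate the Poisson series well beyond its mean (a heuristic choice).
    terms = int(tau + 12.0 * math.sqrt(tau) + 30)
    walk = [1.0 if u == source else 0.0 for u in range(n)]  # delta_source * P^0
    weight = math.exp(-tau)                                  # Poisson weight, k = 0
    measure = [weight * x for x in walk]
    for k in range(1, terms + 1):
        # One random-walk step: row vector times P.
        walk = [sum(walk[u] * adjacency[u][v] / degrees[u] for u in range(n))
                for v in range(n)]
        weight *= tau / k
        measure = [m + weight * x for m, x in zip(measure, walk)]
    return measure
```

On a four-node path graph, for example, the measure equals δ_i at τ = 0 and approaches the degree-proportional stationary distribution (1/6, 2/6, 2/6, 1/6) for large τ.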

**Fig. 1: Dynamical Ollivier–Ricci curvature capturing the spreading of diffusion processes.**

In analogy to the Ricci curvature on a manifold, the classical OR curvature^10,39 measures the distance of one-step neighbourhoods of a pair of nodes i, j relative to the geodesic (shortest path) distance of i, j (see Supplementary Note 1 for background). Here, instead of structural neighbourhoods we consider distributions generated by diffusion processes across scales τ. Specifically, we start a diffusion process at each node i = 1, …, n to obtain a set of measures p_i(τ). We then define the dynamic OR curvature of an edge as the distance of the pair of measures started at its endpoints relative to the weight of the edge

$$\kappa_{ij}(\tau) := 1 - \frac{\mathcal{W}_{1}(\mathbf{p}_{i}(\tau),\ \mathbf{p}_{j}(\tau))}{w_{ij}}\ ,$$

(2)

whenever ij is an edge and 0 otherwise. Intuitively, Eq. (2) reflects the overlap of diffusions over time when started w_ij distance apart, measured by $\mathcal{W}_{1}$, the optimal transport distance⁴⁰. The latter is obtained as a solution to a minimisation problem (Eq. (11) in the “Methods” section) yielding the least cost of transporting the measure p_i(τ) to p_j(τ) by the optimal transport plan ζ(τ). The entries of the optimal transport matrix are shown in Fig. 1c, d, representing the quantity of mass moved between each pair of nodes u and v along their connecting geodesic of length d_uv (Fig. 1e).
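To make Eq. (2) concrete, the following sketch evaluates the dynamical curvature on an unweighted path graph only. This restriction is an assumption made for illustration: on a path, the optimal transport distance has a closed form (the mass crossing each edge is the difference of cumulative masses), so no linear-programme solver is needed, whereas general graphs require solving the minimisation problem referred to above. The helper names (`path_adjacency`, `w1_on_path`, `dynamic_curvature`) and the Poisson-series diffusion are hypothetical choices, not the authors' implementation.

```python
import math


def path_adjacency(n):
    """Adjacency matrix of an unweighted path graph 0 - 1 - ... - (n - 1)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def diffusion_measure(adjacency, source, tau):
    """p_source(tau) via exp(-tau L) = exp(-tau) * sum_k tau^k / k! * P^k (truncated)."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    walk = [1.0 if u == source else 0.0 for u in range(n)]
    weight = math.exp(-tau)
    measure = [weight * x for x in walk]
    for k in range(1, int(tau + 12.0 * math.sqrt(tau) + 30) + 1):
        walk = [sum(walk[u] * adjacency[u][v] / degrees[u] for u in range(n))
                for v in range(n)]
        weight *= tau / k
        measure = [m + weight * x for m, x in zip(measure, walk)]
    return measure


def w1_on_path(p, q, edge_lengths):
    """W1 on a path: sum over edges (k, k+1) of length * |F_p(k) - F_q(k)|."""
    total, cumulative = 0.0, 0.0
    for k, length in enumerate(edge_lengths):
        cumulative += p[k] - q[k]          # net mass that must cross edge (k, k+1)
        total += length * abs(cumulative)
    return total


def dynamic_curvature(adjacency, edge_lengths, i, tau):
    """kappa_ij(tau) = 1 - W1(p_i, p_j) / w_ij for the path edge (i, i + 1)."""
    p_i = diffusion_measure(adjacency, i, tau)
    p_j = diffusion_measure(adjacency, i + 1, tau)
    return 1.0 - w1_on_path(p_i, p_j, edge_lengths) / edge_lengths[i]
```

With `path_adjacency(6)` and unit edge lengths, the curvature of the middle edge is 0 at τ = 0 (disjoint point masses one edge apart) and approaches 1 for large τ, when both diffusions reach the same stationary state.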

By contrast to Eq. (2), previous works have typically defined the classical OR curvature based on one-step transition probabilities of lazy random walks p_i(τ) ≃ p_i = αI + (1 − α)δ_iK⁻¹A. In this definition, scale is introduced via a laziness (or idleness) parameter α, varying the importance of local neighbourhoods relative to w_ij, which in effect introduces self-loops in the graph. Our definition (Eq. (2)) replaces one-step neighbourhoods by probability measures supported on the whole graph, with the timescale τ of the diffusion process playing the role of the scale parameter. As expected, Eq. (2) recovers the classical OR curvature¹⁰ as a first-order approximation. In addition, the dynamical OR curvature inherits the geometric intuition of the classical definition. Analogously to the Ricci curvature on planes, spheres and hyperboloids, κ_ij(τ) is zero on grids but positive and negative on clique-like and tree-like networks, respectively, for all finite scales τ (Supplementary Fig. 1b, c). In the following, we are interested in studying the curvature distribution across edges when the network structure deviates from these canonical topologies.

Edge curvature gap indicates differences in the rate of information spreading

Most real-world networks exhibit organisation on several scales. As an illustration Fig. 1a, b shows an unweighted stochastic block model (SBM) network⁴¹ of four clusters and two nontrivial scales. To construct these scales, we drew edges independently between clusters with different probabilities (0.1 or 0.02). We varied cluster sizes and within-cluster edge probabilities but ensured that the latter remained sufficiently high to easily visualise clusters (see Fig. 1 for parameters). We show that this multiscale structure can be revealed by scanning through a finite range of scales τ and studying snapshots of curvature distribution across edges.

The characteristic scales of a network are related to the overlap between pairs of diffusion measures p_i(τ), p_j(τ). This overlap depends on the starting points i, j and on network clusters which can confine diffusions on well-connected regions for long times before reaching the stationary state π^9,34,35,42, given by π_i = K_ii/∑_iK_ii. This transient phenomenon is reflected by the structure of the optimal transport matrix ζ(τ). If i, j lie within the same cluster, the measures quickly overlap (Fig. 1a) and only diagonal entries of ζ(τ) are positive (Fig. 1c), weighing only short, within-cluster geodesics. By contrast, started at different clusters, the measures remain almost disjoint (Fig. 1b) and ζ(τ) is forced to select longer geodesics (Fig. 1d, e), reflected by the large entries in the off-diagonal block.

The evolution of the edge curvature κ_ij(τ) (Fig. 1f) aggregates the information in ζ(τ) into a single number that is related to the rate of mass exchange between clusters at a given scale. We see in Fig. 1f that, initially, when all nodes support disjoint point masses and the diffusions have not yet mixed, $\lim_{\tau \to 0} \kappa_{ij}(\tau) = 1 - \mathcal{W}_{1}(\delta_{i}, \delta_{j})/d_{ij} = 0$. At the other extreme, as the diffusions reach stationary state, $\lim_{\tau \to \infty} \kappa_{ij}(\tau) = 1 - \mathcal{W}_{1}(\boldsymbol{\pi},\ \boldsymbol{\pi})/d_{ij} = 1$. At intermediate scales, the curvature can take values between 1 and some finite negative number depending on the graph¹⁵. We find that, as the curvature of an edge evolves, the scale at which it approaches unity indicates how easy it is to propagate information between clusters. More precisely, in the “Methods” section, we prove that this scale gives an upper bound on the mixing time $\tau_{ij}^{\mathsf{mix}}$ of the diffusion pair, namely,

$$\begin{array}{lll}\tau_{ij}^{\mathsf{mix}} &:= &\frac{1}{2}\sum\limits_{uv} |\zeta_{uv}(\tau) - \zeta_{uv}(\infty)| \\ &\le &\min\{\tau :\ \kappa_{ij}(\tau) \ge 0.75\},\end{array}$$

(3)

where ζ(τ) is the optimal transport plan with marginals p_i(τ) and p_j(τ). Note that κ_ij(τ) ≥ 0.75 does not imply that the corresponding diffusion processes have approached stationary state independently, but only that they exchange negligible mass at that or larger scales.

Importantly, a gap in the distribution of curvatures appears when the curvature exceeds 0.75 for some edges while being <0.75 for others, indicating a network bottleneck that limits mass flow. To illustrate this, Fig. 1f shows three bundles of edges: the edges in the bundle with the most positive curvature lie within clusters, while the edges in the other two bundles lie between clusters. Figure 1g, h show the two scales in Fig. 1f ($\log \tau = 0.15,\ 0.43$) where the curvature has exceeded 0.75 for a given bundle, indicating that the diffusions are well mixed across the corresponding edges, but not across other edges whose curvature is <0.75. The latter mark bottleneck edges which lie between the expected partitions with four and two clusters, respectively. This simple example shows the importance of the scale parameter τ in our curvature definition to capture network scales. Let us emphasise that the characteristic scales revealing network clusters in our framework are indicated by curvature gaps, i.e., differences in the relative magnitude of curvatures. This is unlike some previous works^18,19, where clusters were identified based on finding negatively curved edges between clusters. Before applying this idea to real networks, we take a closer look at the curvature gap in the theoretical context of the stochastic block model.

Curvature gap is a robust indicator of clusters in stochastic block models

Since in our example any pair of diffusions are supported by one (Fig. 1a) or two (Fig. 1b) clusters, we focus on the subgraph G induced by two clusters (Fig. 2a). Let us simplify one more step and assume that G is a realisation of $\mathcal{G} = \mathrm{SSBM}(n/2,\ p_{\mathsf{in}},\ p_{\mathsf{out}})$, the symmetric SBM composed of two planted partitions of equal size. Edges are generated independently with probability p_in within-clusters and probability p_out between-clusters. This symmetry assumption is not necessary in general, as illustrated by our other examples, but it allows us to make links to known theoretical results. We will denote the ground truth as $C_{i}^{*} \in \{1,\ -1\}$ for each node i and define $\bar{k} = n(p_{\mathsf{in}} + p_{\mathsf{out}})/2$ as the average degree.
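The symmetric SBM described above is straightforward to sample. The sketch below is one possible way to do so, assuming an undirected graph without self-loops and planted labels +1 for the first half of the nodes and −1 for the rest; the function names and the `seed` argument are illustrative.

```python
import random


def sample_symmetric_sbm(n, p_in, p_out, seed=None):
    """Sample an undirected two-block SBM with (near-)equal block sizes.

    Returns (edges, labels): edges is a list of pairs (i, j) with i < j and
    labels[i] in {+1, -1} is the planted assignment C_i^*.
    """
    rng = random.Random(seed)
    half = n // 2
    labels = [1] * half + [-1] * (n - half)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Within-cluster pairs use p_in, between-cluster pairs use p_out.
            prob = p_in if labels[i] == labels[j] else p_out
            if rng.random() < prob:
                edges.append((i, j))
    return edges, labels


def average_degree(n, p_in, p_out):
    """Expected average degree k_bar = n * (p_in + p_out) / 2, as in the text."""
    return n * (p_in + p_out) / 2
```

The double loop costs O(n²) time, which is adequate for small experiments; faster samplers exist for large sparse graphs.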

**Fig. 2: Edge curvature gap indicates the presence of clusters where spectral clustering fails.**

Classical spectral clustering methods²⁹ perform well for dense graphs (Fig. 2a), where $\bar{k}$ is an increasing function of n. This suppresses fluctuations for large n causing a spectral gap to appear when the eigenvalue λ_c of the Laplacian matrix L of G separates from bulk eigenvalues arising from randomness²⁹ (Fig. 2c). In this dense regime, λ_c is well approximated by $\langle \lambda_{c} \rangle_{\mathcal{G}} = 2p_{\mathsf{out}}/(p_{\mathsf{in}} + p_{\mathsf{out}})$, the second eigenvalue of the ensemble averaged Laplacian $\langle \mathbf{L} \rangle_{\mathcal{G}}$ (see Supplementary Note 2). Since λ_c can be identified due to the spectral gap, clustering involves simply labelling nodes by the sign of the entries of the corresponding eigenvector $\phi_{c}(u) = 1/\sqrt{n}$ when $C_{u}^{*} = 1$ and $-1/\sqrt{n}$ when $C_{u}^{*} = -1$. However, for sparse graphs (Fig. 2b), where $\bar{k}$ is constant (independent of n), the spectral gap ceases to exist⁴³ (Fig. 2d). Thus, spectral algorithms relying on identifying λ_c perform no better than chance. To perform clustering in this regime, one needs to go beyond spectral clustering using, for example, the belief propagation method in statistical physics or the related non-backtracking operator whose spectrum is better behaved^31,33.

To see how robustly the dynamical OR curvature indicates the presence of clusters in the symmetric SBM, we define the curvature gap as the difference between the mean curvatures of within- and between-edges at a given scale

$$\Delta\kappa(\tau) := \frac{1}{\sigma}\left| \langle \kappa_{ij}(\tau) \rangle_{C_{i}^{*} = C_{j}^{*}} - \langle \kappa_{ij}(\tau) \rangle_{C_{i}^{*} \ne C_{j}^{*}} \right|.$$

(4)

Here the averages are over within and between-edges, normalised by $\sigma = \sqrt{\frac{1}{2}\left(\sigma_{\mathrm{within}}^{2} + \sigma_{\mathrm{between}}^{2}\right)}$ in terms of the standard deviations of both sets of curvatures. This measure is adapted from the sensitivity index in signal detection theory, known to be, asymptotically, the most powerful statistical test for discriminating two distributions⁴⁴. A large curvature gap Δκ(τ) indicates that the within and between edges have curvatures different enough for the clusters to be recovered (Fig. 2e, f). Correspondingly, in the limits τ → 0, ∞ where the curvatures are uniform across the graph Δκ(τ) vanishes and, likewise, in the absence of structure (p_in ≈ p_out in the Erdős–Rényi (ER) limit) we have Δκ(τ) = 0 for all τ (Fig. 2g). At intermediate scales, we find that the maximal curvature gap occurs at a scale τ_κ at which the curvatures of within-edges are κ_ij(τ_κ) ≈ 0.75. In agreement with Eq. (3), this indicates well-mixed diffusions across these edges relative to low-curvature bottleneck edges between clusters, which indicate incomplete mixing. We also find that $\tau_{\kappa} \approx \lambda_{c}^{-1}$ (Fig. 2e, f). These results show that a positive curvature gap is associated with the presence of clusters.
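Given the curvatures of within- and between-edges at one scale, Eq. (4) reduces to a few lines. The sketch below assumes population (rather than sample) standard deviations, which the text does not specify, and returns infinity or zero when both variances vanish; the function name `curvature_gap` is illustrative.

```python
import math
import statistics


def curvature_gap(within, between):
    """Curvature gap of Eq. (4) from two lists of edge curvatures at a fixed tau.

    Assumption: sigma_within and sigma_between are population standard deviations.
    """
    mean_difference = abs(statistics.mean(within) - statistics.mean(between))
    sigma = math.sqrt(0.5 * (statistics.pvariance(within)
                             + statistics.pvariance(between)))
    if sigma == 0.0:
        # Degenerate case (all curvatures identical within each set).
        return math.inf if mean_difference > 0.0 else 0.0
    return mean_difference / sigma
```

For example, within-edge curvatures (0.8, 0.6) and between-edge curvatures (0.2, 0.0) have means 0.7 and 0.1 and a pooled standard deviation of 0.1, giving a gap of 6.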

What is the minimum curvature gap needed to detect clusters? Previous works on the limits of cluster detection have shown that if the clusters are too weak (high r ≔ p_out/p_in) or the graph too sparse (low $\bar{k}$), no clustering algorithm performs better than chance, or distinguishes G from an Erdős–Rényi graph (r = 1). This is known as the limit of weak-recovery or detection and is characterised by the Kesten–Stigum (KS) threshold $r = r_{\mathrm{KS}} = (\bar{k} - \sqrt{\bar{k}})/(\bar{k} + \sqrt{\bar{k}})$^31,32,45.
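The KS threshold is simple to evaluate numerically. The following sketch computes r_KS for a given average degree and inverts the definitions of k̄ and r to obtain the edge probabilities of the symmetric SBM; the function names are illustrative, and `is_detectable` assumes the assortative case, where detection is possible for r below the threshold.

```python
import math


def ks_threshold(k_bar):
    """Kesten-Stigum threshold r_KS = (k - sqrt(k)) / (k + sqrt(k))."""
    root = math.sqrt(k_bar)
    return (k_bar - root) / (k_bar + root)


def edge_probabilities(n, k_bar, r):
    """Invert k_bar = n (p_in + p_out) / 2 and r = p_out / p_in."""
    p_in = 2.0 * k_bar / (n * (1.0 + r))
    return p_in, r * p_in


def is_detectable(k_bar, r):
    """True if the density ratio r lies below the KS threshold (assortative case)."""
    return r < ks_threshold(k_bar)
```

For instance, an average degree of 9 gives r_KS = (9 − 3)/(9 + 3) = 0.5, so planted partitions with r = 0.4 are in principle detectable while those with r = 0.6 are not.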

To study this limit, we sampled 20 networks from $\mathcal{G}$ for a range of $\bar{k}$ and r. For each sample, we computed the maximal curvature gap $\Delta\kappa^{*} := \max\limits_{\tau} \Delta\kappa(\tau)$ and formed the ensemble average quantity $\langle \Delta\kappa^{*} \rangle_{\mathcal{G}}$. As r increases for a given $\bar{k}$ we observe that $\langle \Delta\kappa^{*} \rangle_{\mathcal{G}}$ decreases exponentially until a certain noise level (Fig. 2h). The critical edge density ratio $r_{\bar{k}}^{*}$ can be estimated as the smallest r where $\langle \Delta\kappa^{*} \rangle_{\mathcal{G}}$ drops below a threshold background noise level, estimated here at 0.035 (black horizontal line). This choice of threshold is not absolute, as it is affected by the finite-size effect of the SBM graphs. An analytical derivation of this threshold is beyond the scope of this work, but our numerical experiment clearly shows that the curvature gap detects a signal from the planted partitions up to the KS limit (Fig. 2i).

Geometric cluster detection in the sparse regime

Given that the curvature gap (Eq. (4)) indicates the presence of clusters until the fundamental KS limit, we asked if this information could be used to recover the ground truth partition. The definition of curvature gap (Eq. (4)) suggests looking for equilibrium configurations of the unit-temperature Boltzmann distribution over the cluster assignments C,

$$\mathbb{P}(C \mid \boldsymbol{\kappa}) \propto e^{\sum_{ij} \kappa_{ij}(\tau)\, \delta(C_{i}, C_{j})}\ ,$$

(5)

where κ is a matrix with entries κ_ij, the sum is over all edges ij and δ(C_i, C_j) = 1 if C_i = C_j and 0 otherwise. The distribution involves only within-edges because finding those is equivalent to finding between-edges, up to a normalisation factor.

The distribution $\mathbb{P}(C \mid \boldsymbol{\kappa})$ is important because all of its equilibrium states are equivalent and correlate with the ground truth partition of the symmetric SBM $\mathcal{G}$. To see this, we connect $\mathbb{P}(C \mid \boldsymbol{\kappa})$ to the posterior distribution $\mathbb{P}(C \mid G)$ of the cluster assignments given the graph drawn from $\mathcal{G}$. In the sparse regime, the likelihood of observing G with a given cluster assignment C is

$$\mathbb{P}(G \mid C) \propto \prod\limits_{ij} \left(\frac{p_{\mathsf{in}}}{p_{\mathsf{out}}}\right)^{\delta(C_{i}, C_{j})} \propto \mathbb{P}(C \mid G)$$

(6)

(see Eq. (15) in the “Methods” section). The second part of Eq. (6) results from Bayes’ theorem using a uniform prior on C, since a priori all configurations are equally likely. It has been previously shown³¹ that P(C∣G) is equivalent to the Boltzmann distribution of an Ising model with constant interaction strength

$$\mathbb{P}(C \mid G) \propto e^{\beta \sum_{ij} \delta(C_{i}, C_{j})}$$

(7)

with inverse temperature $\beta = \log(p_{\mathsf{in}}/p_{\mathsf{out}}) \approx p_{\mathsf{in}} - p_{\mathsf{out}}$. Note that the equilibrium state (1, 1, …, 1) is trivial, assigning all nodes to one cluster. However, asymptotically (n → ∞) the probability of this state vanishes and the Boltzmann distribution is uniform over all other configurations with group sizes n/2 and p_out n/2 between-edges³¹. The fact that one of these states is the ground truth partition, and that all equilibrium states of Eq. (7) are equivalent up to a permutation of nodes within clusters, means they are indistinguishable from the ground truth partition.

Due to the equivalence between Eqs. (6) and (7), to prove the equivalence between Eqs. (5) and (6) we show that Eq. (5) can also be reduced to Eq. (7). The main insight is that the dynamical OR curvature (Eq. (2)) is constructed using pairs of diffusions, as opposed to single diffusions used in previous studies^9,20,35. Thus, eigenmodes arising from random fluctuations, which would otherwise confound methods relying on the spectrum of the Laplacian, are reflected equally in the spectrum of both diffusions and cancel out upon taking differences over all adjacent node pairs. This allows recovering the community eigenvector ϕ_c even in the sparse regime where the spectral gap vanishes and λ_c is no longer identifiable from the spectrum (Fig. 2d). To see this, we consider the difference between pairs of diffusions and use the spectral expansion to write $\sum_{ij} (p_{i}^{u}(\tau) - p_{j}^{u}(\tau)) = \sum_{s} e^{-\lambda_{s}\tau} \phi_{s}(u) \Delta\phi_{s}$ where

$$\Delta\phi_{s} := \sum\limits_{ij} \left(\phi_{s}(i) - \phi_{s}(j)\right)\ .$$

(8)

We find that, instead of looking at the eigenvalue distribution (Fig. 2d), the community eigenvector ϕ_c can be recovered by the relative amplitude of Δϕ_s. Indeed, on a single SBM realisation, Δϕ_s is large for only a few eigenvectors ϕ_s and diminishes for others (Fig. 3a). Importantly, those and only those eigenvectors with large Δϕ_s correlate strongly with the ground truth (Fig. 3a inset). As seen in Fig. 3b, the best eigenvector is not ϕ₂, i.e., the one whose eigenvalue is second in the spectrum and is used by spectral clustering methods, but the one whose eigenvalue is inside the bulk in Fig. 2d and thus cannot be identified by looking at the spectrum alone. The correlation with the ground truth for ϕ_c with the highest Δϕ_s, averaged over 50 SBM realisations, remains close to the highest achievable among all eigenvectors as the KS bound is approached. Meanwhile, ϕ₂, the eigenvector used by spectral clustering methods, is suboptimal (Fig. 3c). We also found that, close to the KS bound, a few other eigenvectors with similarly high Δϕ_s often appear, suggesting an improved clustering method combining several top eigenvectors, but this is out of scope here.

**Fig. 3: Detecting communities using pairs of diffusions near the weak recovery limit.**

To express the curvature in the exponent of Eq. (5) we use the dual formulation of the optimal transport distance (Eq. (12) in “Methods” section). The fact that Δϕ_c dominates the contribution from other eigenvectors allows us to approximate $\sum_{ij} (p_{i}^{u}(\tau) - p_{j}^{u}(\tau)) = e^{-\lambda_{c}\tau} \phi_{c} \Delta\phi_{c} + \epsilon_{\phi} \propto e^{-\lambda_{c}\tau} \phi_{c} + \epsilon_{\phi}$, where ϵ_ϕ is an asymptotically small term. We use this expression, together with the duality formula (Eq. (12)) to express Eq. (5). Finally, in the sparse regime, we may make a tree-like approximation of the neighbourhoods of i and j to find that Eq. (5) reduces to

$$\mathbb{P}(C \mid \boldsymbol{\kappa}) \propto e^{|p_{\mathsf{in}} - p_{\mathsf{out}}| \sum_{ij} \delta(C_{i}, C_{j})}.$$

(9)

We refer the reader to the “Methods” section for details. Eq. (9) is the same as Eq. (7) when the communities are assortative (p_in > p_out). We then conclude that the curvatures encode the communities of the symmetric SBM and allow it to be recovered until close to the Kesten–Stigum bound.

In the next section, we present a clustering algorithm based on this insight that can find multiscale clusters in real-world networks.

Geometric modularity for the multiscale clustering of networks

To exploit the property of the dynamical OR curvature to give multiple geometric representations, we develop a multiscale graph clustering algorithm for real-world networks. Using Eq. (5), we introduce the geometric modularity function

$$Q_{\kappa}(C, \tau) = \frac{1}{2m_{\kappa}} \sum\limits_{ij} (\kappa_{ij}(\tau) - \kappa_{0})\, \delta(C_{i}, C_{j})\ ,$$

(10)

where 2m_κ = ∑_ij∣κ_ij∣ is a normalisation factor and $\kappa_{0} = \max_{ij} \kappa_{ij}(\tau_{\mathsf{min}})$ is a constant ensuring that all edges have small non-positive curvature at the smallest computed scale $\tau_{\mathsf{min}}$. Hence optimising Eq. (10) at small times yields separate communities for each node whereas at large times, when κ_ij(τ) → 1 for all ij, all nodes are merged to a single community. At intermediate scales, the curvatures will have negative and positive values on different edges, making the detection of non-trivial clusters possible without a statistical null-model. This is in contrast to classical modularity²⁴, which minimises the expected number of edges between clusters, and requires a statistical null-model (typically the configuration model), which can hinder identifying functional communities based on dynamics⁶.
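For a fixed scale τ, Eq. (10) can be evaluated directly from the edge curvatures. The sketch below assumes the curvatures are stored in a dictionary keyed by undirected edges (i, j), each listed once; since the same convention is used in the numerator and in the normalisation, counting each edge once or twice gives the same value. The name `geometric_modularity` and the data layout are illustrative, and κ₀ is passed in rather than computed.

```python
def geometric_modularity(curvatures, labels, kappa_0=0.0):
    """Evaluate Q_kappa(C, tau) of Eq. (10) for one cluster assignment.

    curvatures: dict mapping an undirected edge (i, j) to kappa_ij(tau).
    labels:     sequence with labels[i] the cluster C_i of node i.
    kappa_0:    constant offset (max_ij kappa_ij at the smallest scale).
    """
    normalisation = sum(abs(kappa) for kappa in curvatures.values())
    # Only edges whose endpoints share a cluster contribute (delta(C_i, C_j) = 1).
    within = sum(kappa - kappa_0
                 for (i, j), kappa in curvatures.items()
                 if labels[i] == labels[j])
    return within / normalisation
```

On a four-node path with curvatures 0.5, −0.5, 0.5 and κ₀ = 0, cutting the negatively curved middle edge gives Q_κ = 2/3, compared with 1/3 for a single cluster and 0 for singletons, so the bottleneck edge is the preferred cut.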

To detect robust partitions at several scales, we compute the curvature distribution at scales τ spanning the entire dynamical range of the curvature and, at each τ, we sample the cluster landscape Q_κ(C, τ) by optimising Eq. (10) using the Louvain algorithm^46,47 with 10² random initialisations. At a given τ, we take the cluster assignment with the highest geometric modularity and deem it robust if it has a low variation of information VI_τ against 50 other randomly chosen cluster assignments at this scale, as well as low variation of information $\mathrm{VI}_{\tau\tau'}$ against the best cluster assignments at nearby scales $\tau'$. As an example, we show in Fig. 4a the result of this computation on our four-partition SBM graph with two hard-coded scales. We clearly see two large plateaus with low VI_τ and $\mathrm{VI}_{\tau\tau'}$, corresponding to robust clusters, shown in Fig. 4b, c. At the smallest scales we find no robust communities, as shown by the sharp increase in the number of communities and the large VI_τ. We compared geometric modularity to other clustering methods on the SBM and LFR generative benchmark graphs, achieving near-state-of-the-art accuracy in both cases, close to the theoretical limit (see Supplementary Fig. 2 and Supplementary Note 3 for details). Notably, our method performs substantially better than the Ricci flow¹⁹ method based on the classical OR curvature, reinforcing our theoretical insight that combining diffusion processes and OR curvature allows surpassing the limitations of previous OR curvature-based methods.
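The variation of information between two cluster assignments can be computed from their label counts. The sketch below uses the standard definition VI(A, B) = 2H(A, B) − H(A) − H(B) in nats; whether the values reported here are normalised (for example by log n) is not stated in the text, so the unnormalised form is an assumption, as is the function name.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Unnormalised variation of information (in nats) between two label lists."""
    n = len(labels_a)

    def entropy(counts):
        # Shannon entropy of the empirical distribution given by the counts.
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_a = entropy(Counter(labels_a))
    h_b = entropy(Counter(labels_b))
    h_joint = entropy(Counter(zip(labels_a, labels_b)))
    return 2.0 * h_joint - h_a - h_b
```

The measure is zero for identical partitions (up to relabelling) and grows as the partitions become less alike; two independent balanced bipartitions of four nodes, for example, give 2 log 2.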

**Fig. 4: Clustering networks based on multiscale geometric modularity.**

Our algorithm involves several steps including computing the geodesic distance matrix, computing the diffusions (Eq. (1)) starting from all nodes, computing the curvatures (Eq. (2)) for all edges and running the Louvain algorithm. We discuss the complexity in detail in Supplementary Note 5. Briefly, the step with the highest complexity is the computation of the edge curvatures, which using exact algorithms runs in time O(mn^(5/2)), meaning that the computation time grows at most as M·mn^(5/2) for a positive constant M and for m, n sufficiently large. This complexity arises since the computation involves solving a linear programme of complexity O(n^(5/2)) for each of the m edges. On sparse networks this is on par with other random walk^9,20 and probabilistic^21,25,27 methods. However, for small times a complexity close to O(mn) can be achieved by ‘trimming’ the probability measures, i.e., reducing their support size by ignoring the mass on nodes below a certain cutoff value (Supplementary Fig. 4b). For large times, the complexity reduces to O(mn) by replacing the optimal transport distance by the regularised Sinkhorn distance⁴⁸ (Supplementary Fig. 4c). These computational speedups, together with the parallelised implementation of the algorithm, mean that it scales well to moderately sized graphs (~10⁴ nodes).
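The trimming step can be illustrated as follows. This is a sketch under the assumption that the retained mass is renormalised to sum to one after discarding entries below the cutoff; the text only states that mass below the cutoff is ignored, and the function name and sparse dictionary output are illustrative.

```python
def trim_measure(measure, cutoff):
    """Drop entries of a probability measure below `cutoff` and renormalise.

    Returns a dict {node: mass} over the reduced support, so that a subsequent
    optimal transport problem involves fewer nodes.
    Assumption: renormalisation of the remaining mass (not stated in the text).
    """
    kept = {node: mass for node, mass in enumerate(measure) if mass >= cutoff}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("cutoff removes all mass")
    return {node: mass / total for node, mass in kept.items()}
```

At small τ most of the mass sits on a few nodes near the seed, so the trimmed support, and with it the size of each transport problem, is much smaller than n.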

Due to the link between high edge curvature and a well-mixed state (Eq. (3)), we expected that at robust scales the clusters would correspond to those regions which have a high amount of redundant information, and thus can be disconnected without affecting the dynamics within them. To see this, we applied this clustering algorithm to the European power grid graph in Fig. 4d–f, an unweighted network of major electrical lines, which has been previously analysed for robustness⁴⁹, multiscale communities⁵⁰ and centrality³⁶. The multiscale community structure can be clearly seen from the many minima of the VI_τ function in Fig. 4d. We display two scales in Fig. 4e, f, which unfold parts of the power grid that were historically developed independently. The smaller scale (at around $\log \tau = -0.95$) marks economic or historical unions and states (Scandinavia, Benelux, Czechoslovakia, Balkans, etc.). Likewise, the larger scale (at around $\log \tau = -0.5$) divides historical Eastern and Western Europe. Interestingly, the boundary in Germany runs along the Iron Curtain, which also demarcates the regions between major electricity companies.

Finally, we analysed a recent dataset of homeobox gene expression in single neurons of C. elegans⁵¹. The authors in ref. ⁵¹ found based on a multivariate linear regression that the homeobox gene expression profile in a given anatomical neuron class can explain on average 74% of the expression level of the remaining genes in that neuron class. We therefore asked whether the homeobox gene expression profile has sufficient information to cluster neurons into their known anatomical classes.

The data contain a binary feature vector for each of the 301 neurons, indicating the presence of a protein expressed by any of the 105 homeobox genes in the given neuron. To convert this data into a graph with neurons as nodes, we first eliminated all homeobox genes expressed in none or in more than 90% of the neurons, retaining 67 homeobox genes. We then constructed an all-to-all graph adjacency matrix weighted by the Jaccard similarity index between the expression profiles of neurons. To increase the number of edges with negative curvature, and thus improve the detection at the smallest scales, we sparsified this network using a geometric sparsification method⁵² with parameter γ = 0.01. This method retains at most a fraction γ of the edges of the original graph, in the form of a minimum spanning tree augmented by edges relevant for preserving the local or global geometry of the graph.
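The gene filtering and the Jaccard-weighted adjacency matrix described above can be sketched in a few lines. This is a minimal illustration with hypothetical function names, assuming the data are given as a list of 0/1 rows (one per neuron); the geometric sparsification step of ref. ⁵² is not shown.

```python
def filter_genes(profiles, max_fraction=0.9):
    """Drop genes (columns) expressed in no neuron or in more than
    `max_fraction` of the neurons. `profiles` is a list of 0/1 rows."""
    n_neurons = len(profiles)
    n_genes = len(profiles[0])
    counts = [sum(row[g] for row in profiles) for g in range(n_genes)]
    keep = [g for g, c in enumerate(counts) if 0 < c <= max_fraction * n_neurons]
    return [[row[g] for g in keep] for row in profiles]


def jaccard(x, y):
    """Jaccard index |x AND y| / |x OR y| of two binary vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def jaccard_adjacency(profiles):
    """Dense, symmetric adjacency matrix with zero diagonal."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = jaccard(profiles[i], profiles[j])
    return adj
```

Removing genes that are expressed everywhere or nowhere avoids features that cannot distinguish neurons; the convention of returning zero similarity for two all-zero profiles is our own choice.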

The results of our clustering algorithm on this graph are shown in Fig. 4g and compared with the result of Markov stability⁹, a multiscale method based on the persistence of diffusions, in Fig. 4h. Geometric modularity obtains a large range of robust scales with highly similar clusters, as shown by the low VI_τ and $\mathrm{VI}_{\tau\tau'}$. These scales correlate closely with the known ground truth of 117 anatomical neuron classes (Fig. 4i). In contrast, for Markov stability⁹, the scales with low VI_τ overfit the graph, finding too many clusters (Fig. 4h), which correlate less with the ground truth (Fig. 4i). Likewise, hierarchical clustering fails to identify the ground truth communities⁵¹. We also compared our result with that obtained from a broad range of clustering methods, finding that other methods either overfit the neuron classes or found too few clusters (Supplementary Note 4 and Supplementary Fig. 3). Although the wavelet method of Tremblay and Borgnat returned a clustering near the ground truth, this scale was not identifiable based on their stability metric (Supplementary Fig. 3b). In Fig. 4j we superimpose the best clustering from geometric modularity on the ground truth. We observe few differences, apart from VA and AS nodes, as well as VD and DD nodes, often being clustered together. A careful look reveals a close biological relationship between these classes; all four classes correspond to motor neurons, with pairs expressing the same neurotransmitter: VA and AS express acetylcholine, whereas VD and DD express gamma-aminobutyric acid (GABA). These novel results give direct quantitative support to the claim that homeobox gene expression patterns encode structural neuron types. We also observe other stable partitions at larger scales, but they did not correlate with the ground truth.

Overall, these results give a strong demonstration that our method is able to find stable clusters in sparse graphs, and provide meaningful insights into distinct types of real-world networks.

Discussion

Real-world networks often exhibit community structure on multiple scales, due to differences between the rates of information propagation in regions of the network on various timescales. We introduced the concept of dynamical OR curvature, which defines a scale-dependent geometry from the evolution of pairs of diffusion processes on the network. We showed that the edge curvature carries a precise meaning in this context, bounding the rate of information flow across edges. Consequently, gaps in the edge curvature distribution arising from differences between edge curvatures within and between regions indicate network bottlenecks. Systematically finding these gaps in the edge curvature distribution captures progressively coarser community features as the diffusion processes evolve. This result does not rely on the dynamics being linear diffusions, making it suitable to study the interaction of arbitrary dynamical processes. We expect that, in the future, this approach can be used to tune the geometry of the graph to control the flux or interaction of network-driven dynamical processes, leading, for example, to insights into metapopulation models⁵³ and into synchronisation problems, such as a better understanding of the coexistence of chimera states^54,55.

Unlike previous geometric approaches, which rely on embedding a network into a particular latent metric space^5,6,7,35, our approach constructs an effective object: the weighted and signed edge curvature matrix. Whilst the dynamical OR curvature does not require the specific assumptions used by latent space approaches, it is worth noting that it is constructed on the metric space formed by all the shortest paths of the graph. This property suggests links with the field of fractal geometry, which studies scaling properties of graphs using the shortest path metric⁵⁶. Thus, in graph families such as complex networks whose fractal geometry can be characterised⁵⁷, one can expect relationships between coarse-graining schemes based on box-covering techniques and aggregating clusters based on similar dynamical OR edge curvatures, which could be exploited for controlling the multifractal geometry of these networks.

Although diffusion processes constructed from the graph Laplacian have been explored for network clustering^9,27,35, our work differs in the use of diffusion pairs, as opposed to single diffusions, to construct the curvature. Diffusion pairs are implicitly coupled through the graph and pick up random variations independently, which can be exploited to average out non-informative fluctuations. On stochastic block models, this feature allows the curvature gap to robustly indicate clusters in the sparse regime down to the fundamental limit, where clustering methods relying on the spectral gap in the Laplacian fail⁴³. We also found a new measure of eigenvalue quality, able to select the best eigenvector to be used in spectral methods. Interestingly, the edge curvatures are defined on the set of shortest paths, which cannot contain the same edge twice and therefore form a subset of the set of non-backtracking walks. Our results are therefore consistent with previous works on the limits of cluster detection using statistical physics objects, including the spectrum of the non-backtracking operator³³ or related message passing approaches³¹. We expect this insight to provide a new avenue to study the fundamental limits of efficient clustering from a geometric perspective.

Finally, we introduced the notion of geometric modularity to build an easy-to-use multiscale clustering algorithm. Notably, our algorithm achieved near-state-of-the-art performance on sparse SBM graphs, better than methods relying on the spectral gap in the Laplacian matrix (e.g., spectral clustering²⁹ and edge-betweenness²⁵) as well as those relying on the classical OR curvature¹⁹. This confirms that combining diffusions and OR geometry allows surpassing the limitations of these methods, which work well only on dense graphs. We also found robust and interpretable communities on multiple scales in real-world networks without a tendency to overfit. Overall, we expect our insights connecting dynamical processes, geometry and network clustering to open new avenues to studying and controlling the structural and dynamical properties of networks.

Methods

Optimal transport distance

To measure the distance between a pair of measures p_i(τ) and p_j(τ) we use the optimal transport distance⁴⁰ (also known as 1-Wasserstein or earth-mover distance), defined as

$$\mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau)) = \min_{\boldsymbol{\zeta}} \sum_{uv} d_{uv}\,\zeta_{uv}\,, \quad \text{subject to} \quad \sum_{v} \zeta_{uv} = p_i^u(\tau)\,, \quad \sum_{u} \zeta_{uv} = p_j^v(\tau)\,.$$

(11)

The constraints in Eq. (11) ensure that the optimal transport plan $\boldsymbol{\zeta}(\tau) \in \mathbb{R}^{n\times n}$ is a coupling of the measures p_i(τ), p_j(τ), i.e., ζ(τ) is a joint distribution that admits p_i(τ) and p_j(τ) as marginals.
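Eq. (11) is a linear programme and can be handed to a generic solver. The sketch below uses `scipy.optimize.linprog` for illustration; the function name is ours, the non-negativity of the plan (implicit in ζ being a joint distribution) is imposed through the bounds, and this dense formulation is only practical for small n.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_1(p, q, dist):
    """Exact W1(p, q) from the linear programme of Eq. (11).

    p, q : 1-D arrays of length n that each sum to one.
    dist : (n, n) array of geodesic distances d_uv.
    """
    n = len(p)
    # Flatten the plan row-wise: zeta[u, v] sits at index u * n + v.
    row_sums = np.kron(np.eye(n), np.ones((1, n)))   # sum_v zeta_uv = p_u
    col_sums = np.kron(np.ones((1, n)), np.eye(n))   # sum_u zeta_uv = q_v
    result = linprog(
        c=np.asarray(dist, dtype=float).ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),          # the plan must be non-negative
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun
```

Specialised optimal transport solvers exploit the structure of the constraint matrix and are considerably faster than a general-purpose solver on large problems.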

An equivalent formulation of this distance can be constructed from the Kantorovich–Rubinstein duality⁴⁰, given by

$$\mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau)) = \sup_{f} \sum_{u} f(u)\left[p_i^u(\tau) - p_j^u(\tau)\right]$$

(12)

where the supremum is taken over all 1-Lipschitz functions f on the graph, that is,

$$| f(u)-f(v)| \le {d}_{uv}$$

(13)

for any node pair u, v.
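As a small check of the equivalence between Eq. (11) and Eq. (12), consider a toy example of our own (not taken from the paper): the path graph with nodes 1, 2, 3 and unit-length edges 12 and 23, so that d_12 = d_23 = 1 and d_13 = 2, with measures p_i = (1/2, 1/2, 0) and p_j = (0, 1/2, 1/2). The plan that moves mass 1/2 from node 1 to node 2 and mass 1/2 from node 2 to node 3 satisfies the constraints of Eq. (11) and gives

$$\sum_{uv} d_{uv}\,\zeta_{uv} = \tfrac{1}{2}\,d_{12} + \tfrac{1}{2}\,d_{23} = 1\,,$$

so that the optimal transport distance is at most 1. Conversely, the function f(u) = −u satisfies Eq. (13) and gives

$$\sum_{u} f(u)\left[p_i^u - p_j^u\right] = \tfrac{1}{2}f(1) - \tfrac{1}{2}f(3) = \tfrac{1}{2}(-1) - \tfrac{1}{2}(-3) = 1\,,$$

so that the distance is at least 1. The two bounds coincide, and the distance between the two measures equals 1 in both formulations.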

Upper bound on the mixing time in terms of curvature

Here we prove inequality (3), which gives an upper bound on the mixing time of the coupled diffusions with measures p_i(τ), p_j(τ) in terms of the dynamical OR curvature. The ϵ-mixing time is defined as the smallest τ at which the law of the coupled process, the optimal transport plan ζ(τ), is within an ϵ radius of the stationary distribution

$$\tau_{ij}(\epsilon) := \min\{\tau :\ \|\boldsymbol{\zeta}(\tau) - \boldsymbol{\zeta}(\infty)\|_{\mathrm{TV}} \le \epsilon\},$$

(14)

where the notion of “close to stationarity” is quantified by the total variation distance $\|\boldsymbol{\zeta}(\tau) - \boldsymbol{\zeta}(\infty)\|_{\mathrm{TV}} := \frac{1}{2}\sum_{uv} |\zeta_{uv}(\tau) - \zeta_{uv}(\infty)|$. Since p_i(τ) and p_j(τ) are marginals of ζ(τ) we have that

$$\begin{aligned}\tau_{ij}(\epsilon) &= \min\{\tau :\ \|\mathbf{p}_i(\tau) - \boldsymbol{\pi}\|_{\mathrm{TV}} + \|\mathbf{p}_j(\tau) - \boldsymbol{\pi}\|_{\mathrm{TV}} \le \epsilon\}\\ &= \min\{\tau :\ \|\mathbf{p}_i(\tau) - \mathbf{p}_j(\tau)\|_{\mathrm{TV}} \le \epsilon\}\,,\end{aligned}$$

where we used the independence of the diffusion processes. From here, we may follow ref. ⁵⁸ and use the Csiszár-Kullback-Pinsker inequality for the optimal transport distance

$$\|\mathbf{p}_i(\tau) - \mathbf{p}_j(\tau)\|_{\mathrm{TV}} \le (1/d_0)\,\mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau))\,,$$

where $d_0 = \min_{ij} d_{ij}$ is a global graph constant, which can therefore be absorbed into ϵ. This gives an upper bound

$$\begin{aligned}\tau_{ij}(\epsilon') &\le \min\{\tau :\ \mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau)) \le \epsilon'\}\\ &= \min\{\tau :\ \kappa_{ij}(\tau) \ge 1 - \epsilon'\}\,,\end{aligned}$$

with $\epsilon' = d_0\epsilon$, which is what we set out to show. Note that choosing any $\epsilon' \in (0, 1/2)$ ensures an exponential convergence rate to the stationary measure⁵⁹ and, by convention, we take the middle of this range and define $\tau_{ij}^{\mathsf{mix}} := \tau_{ij}^{\mathsf{mix}}(1/4)$ to obtain Eq. (3).
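When the curvature of an edge is only available on a grid of scales, the bound above can be read off as the first sampled τ at which κ_ij(τ) reaches 1 − ϵ′. A minimal sketch, with a hypothetical function name and assuming the curvature has already been computed at each scale; the result is only as fine as the grid:

```python
def mixing_time_bound(taus, kappas, eps=0.25):
    """Smallest sampled tau with kappa(tau) >= 1 - eps, or None if never reached.

    taus   : scales at which the edge curvature was evaluated.
    kappas : curvature of one edge at those scales, in the same order.
    eps    : tolerance; 1/4 corresponds to the convention used in the text.
    """
    for tau, kappa in sorted(zip(taus, kappas)):
        if kappa >= 1.0 - eps:
            return tau
    return None
```

With the default tolerance, an edge whose curvature first reaches 3/4 at τ = 1 on the sampled grid is assigned the bound 1.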

Connection between geometric modularity and the symmetric stochastic block model

In this section, we prove that the Boltzmann distribution of cluster assignments given the edge curvatures $\mathbb{P}(C|\boldsymbol{\kappa})$ (Eq. (5)) has equilibrium states which are indistinguishable from the ground truth partition of the SBM. We show this by reducing $\mathbb{P}(C|\boldsymbol{\kappa})$, as well as the posterior distribution $\mathbb{P}(C|G)$, to the same constant interaction Ising model (Eq. (7)). In the remainder of this section we work in the sparse regime, where p_in, p_out = O(1/n).

First, we recap the well-known equivalence of the SBM and the Ising model³¹. Let E denote the set of edges. The probability distribution of the symmetric SBM for two clusters can be written as⁴¹

$$\begin{aligned}\mathbb{P}(G|C) &= p_{\mathsf{out}}^{e}\,(1-p_{\mathsf{out}})^{\binom{n}{2}-e} \prod_{ij\in E}\left(\frac{p_{\mathsf{in}}}{p_{\mathsf{out}}}\right)^{\delta(C_i,C_j)} \prod_{ij\notin E}\left(\frac{1-p_{\mathsf{in}}}{1-p_{\mathsf{out}}}\right)^{\delta(C_i,C_j)}\\ &\propto \prod_{ij\in E}\left(\frac{p_{\mathsf{in}}}{p_{\mathsf{out}}}\right)^{\delta(C_i,C_j)}\end{aligned}$$

(15)

where e is the total number of edges and in the last line we used that the effect of non-edges is weak in the sparse regime. Therefore, by Bayes’ theorem with uniform prior one obtains the posterior distribution ${\mathbb{P}}(C| G)\propto {\mathbb{P}}(G| C)$. As a result, the probability of clusters generated by the SBM is equivalent to the Ising model with uniform interaction with Boltzmann distribution given by Eq. (7)³¹.
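The Ising form can be made explicit by exponentiating the last line of Eq. (15). The rewriting below is our own gloss: the symbol J is introduced here for illustration only, and the precise normalisation of Eq. (7) is not reproduced in this section.

$$\mathbb{P}(C|G) \propto \prod_{ij\in E}\left(\frac{p_{\mathsf{in}}}{p_{\mathsf{out}}}\right)^{\delta(C_i,C_j)} = \exp\left[J\sum_{ij\in E}\delta(C_i,C_j)\right], \qquad J := \log\frac{p_{\mathsf{in}}}{p_{\mathsf{out}}}\,.$$

Every edge thus contributes the same coupling J, which is positive when p_in > p_out, so that assignments placing the two endpoints of an edge in the same cluster are favoured.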

Second, we reduce the Boltzmann distribution of clusters given the edge curvature to the same Ising model in Eq. (7). From Eq. (5) we have

$$\begin{aligned}\mathbb{P}(C|\boldsymbol{\kappa}) &\propto e^{\sum_{ij}\kappa_{ij}(\tau)\,\delta(C_i,C_j)}\\ &\propto e^{\sum_{ij}\left[1-\mathcal{W}_1(\mathbf{p}_i(\tau),\,\mathbf{p}_j(\tau))\right]\delta(C_i,C_j)}\,,\end{aligned}$$

(16)

where in the last line we used the definition of the curvature in Eq. (2). Comparing Eq. (16) with Eq. (7), note that $1-\mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau))$ is non-constant and has a non-linear dependence on the scale τ. However, it is possible to express it in terms of p_in, p_out to make the connection to the Ising model. Let us write the diffusion measures in Eq. (1) in terms of the spectral decomposition of L as

$${p}_{i}^{k}(\tau )=\mathop{\sum }\limits_{s=1}^{n}{e}^{-{\lambda }_{s}\tau }{\phi }_{s}(k){\phi }_{s}(i)\ .$$

(17)
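Eq. (17) translates directly into a few lines of NumPy. The sketch below assumes, for illustration, that L is the symmetric combinatorial Laplacian D − A of an undirected graph (the exact Laplacian of Eq. (1) is not restated in this section); the function name is ours.

```python
import numpy as np


def diffusion_measures(adjacency, tau):
    """Return the matrix P with P[i, k] = p_i^k(tau), built as in Eq. (17)."""
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency   # assumed L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)             # columns are phi_s
    # sum_s exp(-lambda_s tau) phi_s(k) phi_s(i)
    return (eigvecs * np.exp(-eigvals * tau)) @ eigvecs.T
```

Under this assumption each row of the returned matrix is a probability measure seeded at one node; it equals a delta function at τ = 0 and relaxes to the uniform stationary state on a connected graph as τ grows, with the slowest decay governed by the smallest non-zero eigenvalue.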

At this point let us remark that in the dense regime where p_in, p_out = O(1), the first two eigenmodes (λ₁, ϕ₁) and (λ_c, ϕ_c) dominate and the second eigenmode contains the anti-symmetric eigenvector ${\phi }_{c}(u)=1/\sqrt{n}$ when C_u = 1 and $-1/\sqrt{n}$ when C_u = 2 that is associated with the community structure (Fig. 2c). Thus, one can follow spectral clustering methods²⁹ to find the sparsest cut between clusters using ϕ_c. In contrast, in the sparse regime, the dominant eigenmodes will be driven by random fluctuations in the node degrees across the graph⁶⁰, thus spectral clustering algorithms based on L are suboptimal (Fig. 2d).

However, the coupled diffusion pair allows for cancelling out random fluctuations in their spectrum. To see this, consider for a between-edge ij the difference

$$\begin{aligned}\sum_{ij\in E}\left(p_i^k(\tau) - p_j^k(\tau)\right) &= \sum_{ij\in E}\sum_{s=1}^{n} e^{-\lambda_s\tau}\phi_s(k)\left[\phi_s(i) - \phi_s(j)\right]\\ &= \sum_{s=1}^{n} e^{-\lambda_s\tau}\phi_s(k)\,\Delta\phi_s\,,\end{aligned}$$

(18)

where Δϕ_s is defined in Eq. (8). The first term involves the constant eigenvector ϕ₁ corresponding to the stationary state. Therefore, ϕ₁(i) = ϕ₁(j) for all ij and thus its contribution cancels out when taking differences. Further, for eigenvectors ϕ_s with s ≠ 1, c we have asymptotically (n → ∞) that (Fig. 3)

$${{\Delta }}{\phi }_{s}\to 0$$

As a result, the only contribution we are left with is coming from the anti-symmetric eigenmode (λ_c, ϕ_c). Thus we have that

$$\sum_{ij\in E}\left(p_i^u(\tau) - p_j^u(\tau)\right) = \begin{cases}\epsilon_\phi\,, & \text{if } C_i = C_j\,,\\ e^{-\lambda_c\tau}\phi_c(u)\,\Delta\phi_c + \epsilon_\phi\,, & \text{if } C_i \ne C_j\,,\end{cases}$$

(19)

where ϵ_ϕ represents the contribution from the random eigenvectors which is negligible in the limit n → ∞.

To compute $\mathcal{W}_1$ in the exponent of Eq. (16), we use Kantorovich–Rubinstein duality (Eq. (12)). Using Eq. (19) in Eq. (12) and ignoring asymptotically small terms, we consider the quantity

$$\begin{aligned}
\sum_{ij\in E}\sum_{u} f(u)\left[p_i^u(\tau) - p_j^u(\tau)\right] &= \sum_{ij\in E} e^{-\lambda_c\tau}\sum_{u} f(u)\,\phi_c(u)\\
&= \sum_{ij\in E}\frac{e^{-\lambda_c\tau}}{n}\left[\sum_{u:\ C_u=1} f(u) - \sum_{u:\ C_u=2} f(u)\right]\\
&= \sum_{ij\in E}\frac{e^{-\lambda_c\tau}}{n}\left[\sum_{u:\ C_u=1}(f(u)-f(i)) - \sum_{u:\ C_u=2}(f(u)-f(j))\right.\\
&\qquad\left. + \sum_{u:\ C_u=1} f(i) - \sum_{u:\ C_u=2} f(j)\right].
\end{aligned}$$

(20)

In the sparse regime, we may make a tree-like approximation in the neighbourhood of i. This means that the number of neighbours of i at distance q inside the cluster is ${p}_{{\mathsf{in}}}^{q}{(n/2)}^{q}$, ignoring terms of order O(1/n) and beyond. Considering only nodes at unit distance (q = 1), we approximate Eq. (20) as

$$\begin{aligned}
&\sum_{ij\in E}\frac{e^{-\lambda_c\tau}}{n}\left[\sum_{\substack{u:\ C_u=1\\ u\sim i}}(f(u)-f(i)) - \sum_{\substack{u:\ C_u=2\\ u\sim i}}(f(u)-f(i))\right.\\
&\qquad + \sum_{\substack{u:\ C_u=1\\ u\sim j}}(f(u)-f(j)) - \sum_{\substack{u:\ C_u=2\\ u\sim j}}(f(u)-f(j))\\
&\qquad\left. + \sum_{\substack{u:\ C_u=1\\ u\sim i}} f(i) - \sum_{\substack{u:\ C_u=2\\ u\sim i}} f(i) + \sum_{\substack{u:\ C_u=1\\ u\sim j}} f(j) - \sum_{\substack{u:\ C_u=2\\ u\sim j}} f(j)\right]\\
&= \sum_{ij\in E}\frac{e^{-\lambda_c\tau}}{n}\left[\sum_{\substack{u:\ C_u=1\\ u\sim i}}(f(u)-f(i)) - \sum_{\substack{u:\ C_u=2\\ u\sim i}}(f(u)-f(i))\right.\\
&\qquad + \sum_{\substack{u:\ C_u=1\\ u\sim j}}(f(u)-f(j)) - \sum_{\substack{u:\ C_u=2\\ u\sim j}}(f(u)-f(j))\\
&\qquad\left. + \frac{n}{2}\,p_{\mathsf{in}}\,(f(i)-f(j)) - \frac{n}{2}\,p_{\mathsf{out}}\,(f(i)-f(j))\right].
\end{aligned}$$

Then, taking the supremum over all 1-Lipschitz functions f, we obtain

$$\begin{aligned}&\sum_{ij\in E}\mathcal{W}_1(\mathbf{p}_i(\tau),\, \mathbf{p}_j(\tau))\,\delta(C_i, C_j)\\ &\quad\approx \sum_{ij\in E} e^{-\lambda_c\tau}\,(p_{\mathsf{in}}+p_{\mathsf{out}})\left(1+\frac{|p_{\mathsf{in}}-p_{\mathsf{out}}|}{2(p_{\mathsf{in}}+p_{\mathsf{out}})}\right)\delta(C_i,C_j)\end{aligned}$$

(21)

Substituting this into Eq. (16) and noting that p_in + p_out is constant we obtain at a fixed τ

$$\mathbb{P}(C|\boldsymbol{\kappa}) \propto \exp\left[\left(\frac{|p_{\mathsf{in}}-p_{\mathsf{out}}|}{2(p_{\mathsf{in}}+p_{\mathsf{out}})}\right)\sum_{ij\in E}\delta(C_i, C_j)\right],$$

which up to a constant of proportionality equals the expression in Eq. (9).

Data availability

The data generated in this study is available at https://dataverse.harvard.edu/dataverse/geometric_clustering/.

Code availability

The code to reproduce the results in our paper and to perform geometric modularity optimisation is available at https://doi.org/10.5281/zenodo.5031276.

References

1. Boguñá, M. et al. Network geometry. Nat. Rev. Phys. 3, 114–135 (2021).
2. Tenenbaum, J. B., de Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
3. Ding, C., He, X., Zha, H. & Simon, H. D. Adaptive dimension reduction for clustering high dimensional data. In 2002 IEEE International Conference on Data Mining, 2002, Proceedings, 147–154, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1183878 (2002).
4. Serrano, M. A., Krioukov, D. & Boguñá, M. Self-similarity of complex networks and hidden metric spaces. Phys. Rev. Lett. 100, 078701 (2008).
5. García-Pérez, G., Boguñá, M. & Serrano, M. Á. Multiscale unfolding of real networks by geometric renormalization. Nat. Phys. 14, 583–589 (2018).
6. De Domenico, M. Diffusion geometry unravels the emergence of functional clusters in collective phenomena. Phys. Rev. Lett. 118, 168301 (2017).
7. Brockmann, D. & Helbing, D. The hidden geometry of complex, network-driven contagion phenomena. Science 342, 1337–1342 (2013).
8. Matousek, J. Lectures on Discrete Geometry, Vol. 212 (Springer, New York, 2013).
9. Delvenne, J.-C., Yaliraki, S. N. & Barahona, M. Stability of graph communities across time scales. Proc. Natl. Acad. Sci. USA 107, 12755–12760 (2010).
10. Ollivier, Y. Ricci curvature of Markov chains on metric spaces. J. Funct. Anal. 256, 810–864 (2009).
11. Sturm, K.-T. On the geometry of metric measure spaces. Acta Math. 196, 65–131 (2006).
12. Lott, J. & Villani, C. Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. 169, 903–991 (2009).
13. van der Hoorn, P., Cunningham, W. J., Lippner, G., Trugenberger, C. & Krioukov, D. Ollivier–Ricci curvature convergence in random geometric graphs. Phys. Rev. Research 3, 013211 (2021).
14. Jost, J. & Liu, S. Ollivier’s Ricci curvature, local clustering and curvature-dimension inequalities on graphs. Discret. Comput. Geom. 51, 300–322 (2014).
15. Bauer, F., Jost, J. & Liu, S. Ollivier–Ricci curvature and the spectrum of the normalized graph Laplace operator. Math. Res. Lett. 19, 1185–1205 (2012).
16. Sandhu, R. S., Georgiou, T. T. & Tannenbaum, A. R. Ricci curvature: an economic indicator for market fragility and systemic risk. Sci. Adv. 2, e1501495 (2016).
17. Farooq, H., Chen, Y., Georgiou, T. T., Tannenbaum, A. & Lenglet, C. Network curvature as a hallmark of brain structural connectivity. Nat. Commun. 10, 4937 (2019).
18. Sia, J., Jonckheere, E. & Bogdan, P. Ollivier–Ricci curvature-based method to community detection in complex networks. Sci. Rep. 9, 9800 (2019).
19. Ni, C.-C., Lin, Y.-Y., Luo, F. & Gao, J. Community detection on networks with Ricci flow. Sci. Rep. 9, 9984 (2019).
20. Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 105, 1118–1123 (2008).
21. Zhang, P. & Moore, C. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc. Natl. Acad. Sci. USA 111, 18144–18149 (2014).
22. Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (2000).
23. Du, R., Kuang, D., Drake, B. & Park, H. Hierarchical community detection via rank-2 symmetric nonnegative matrix factorization. Comput. Soc. Netw. 4, 7 (2017).
24. Newman, M. E. J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103, 8577–8582 (2006).
25. Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002).
26. Arenas, A., Fernandez, A. & Gomez, S. Analysis of the structure of complex networks at different resolution levels. New J. Phys. 10, 053039 (2008).
27. Reichardt, J. & Bornholdt, S. Statistical mechanics of community detection. Phys. Rev. E 74, 016110 (2006).
28. Tremblay, N. & Borgnat, P. Graph wavelets for multiscale community mining. IEEE Trans. Signal Process. 62, 5227–5239 (2014).
29. Chung, F. R. K. Spectral Graph Theory, Vol. 92 (American Mathematical Society, Providence, 1997).
30. Nadakuditi, R. R. & Newman, M. E. J. Graph spectra and the detectability of community structure in networks. Phys. Rev. Lett. 108, 188701 (2012).
31. Decelle, A., Krzakala, F., Moore, C. & Zdeborová, L. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011).
32. Abbé, E. & Sandon, C. Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 670–688 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7354380 (2015).
33. Massoulié, L. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC’14, 694–703 (2014).
34. Gfeller, D. & De Los Rios, P. Spectral coarse graining of complex networks. Phys. Rev. Lett. 99, 038701 (2007).
35. Coifman, R. R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. USA 102, 7426–7431 (2005).
36. Arnaudon, A., Peach, R. L. & Barahona, M. Scale-dependent measure of network centrality from diffusion dynamics. Phys. Rev. Res. 2, 033104 (2020).
37. Schaub, M. T., Billeh, Y. N., Anastassiou, C. A., Koch, C. & Barahona, M. Emergence of slow-switching assemblies in structured neuronal networks. PLoS Comput. Biol. 11, e1004196–28 (2015).
38. Young, H. P. Innovation diffusion in heterogeneous populations: contagion, social influence, and social learning. Am. Econ. Rev. 99, 1899–1924 (2009).
39. Veysseire, L. Coarse Ricci curvature for continuous-time Markov processes. Preprint at https://arxiv.org/abs/1202.0420 (2012).
40. Villani, C. Optimal Transport: Old and New. (Springer, 2009).
41. Holland, P. W., Laskey, K. B. & Leinhardt, S. Stochastic blockmodels: first steps. Soc. Netw. 5, 109–137 (1983).
42. Gosztolai, A., Carrillo, J. A. & Barahona, M. Collective search with finite perception: transient dynamics and search efficiency. Front. Phys. 6, 153 (2019).
43. Kawamoto, T. & Kabashima, Y. Limitations in the spectral method for graph partitioning: detectability threshold and localization of eigenvectors. Phys. Rev. E 91, 062803 (2015).
44. Kay, S. M. Fundamentals of Statistical Signal Processing: estimation theory (Prentice Hall, New Jersey, 1993).
45. Mossel, E., Neeman, J. & Sly, A. A proof of the block model threshold conjecture. Combinatorica 38, 665–708 (2018).
46. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
47. PyGenStability: unsupervised clustering with generalised Louvain and Markov stability. GitHub https://github.com/barahona-research-group/PyGenStability (2021).
48. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Proceedings of the 26th International Conference on Neural Information Processing Systems (eds Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. W.) Vol. 2 (Curran Associates Inc., 2013).
49. Rosas-Casals, M., Valverde, S. & Solé, R. V. Topological vulnerability of the European power grid under errors and attacks. Int. J. Bifurcat. Chaos 17, 2465–2475 (2007).
50. Schaub, M. T., Delvenne, J.-C., Yaliraki, S. N. & Barahona, M. Markov dynamics as a zooming lens for multiscale community detection: non clique-like communities and the field-of-view limit. PLOS ONE 7, 1–11 (2012).
51. Reilly, M. B., Cros, C., Varol, E., Yemini, E. & Hobert, O. Unique homeobox codes delineate all the neuron classes of C. elegans. Nature 584, 595–601 (2020).
52. Beguerisse-Diaz, M., Vangelov, B. & Barahona, M. Finding role communities in directed networks using role-based similarity, Markov stability and the relaxed minimum spanning tree. In 2013 IEEE Global Conference on Signal and Information Processing, 937–940 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6736792 (2013).
53. Davis, J. T., Perra, N., Zhang, Q., Moreno, Y. & Vespignani, A. Phase transitions in information spreading on structured populations. Nat. Phys. 16, 590–596 (2020).
54. Sawicki, J., Omelchenko, I., Zakharova, A. & Schöll, E. Chimera states in complex networks: interplay of fractal topology and delay. Eur. Phys. J. Spec. Top. 226, 1883–1892 (2017).
55. Chouzouris, T. et al. Chimera states in brain networks: empirical neural vs. modular fractal connectivity. Chaos 28, 045112 (2018).
56. Song, C., Havlin, S. & Makse, H. A. Origins of fractality in the growth of complex networks. Nat. Phys. 2, 275–281 (2006).
57. Xue, Y. & Bogdan, P. Reliable multi-fractal characterization of weighted complex networks: algorithms and implications. Sci. Rep. 7, 7487 (2017).
58. Paulin, D. Mixing and concentration by Ricci curvature. J. Funct. Anal. 270, 1623–1662 (2016).
59. Levin, D. A., Peres, Y. & Wilmer, E. L. Markov Chains and Mixing Times (American Mathematical Society, Providence, 2006).
60. Krivelevich, M. & Sudakov, B. The largest eigenvalue of sparse random graphs. Combinatorics, Probability and Computing 12, 61–72 (2003).


Acknowledgements

A.G. acknowledges support from an HFSP Cross-disciplinary Postdoctoral Fellowship (LT000669/2020-C). This study was supported by funding to the Blue Brain Project, a research center of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government’s ETH Board of the Swiss Federal Institutes of Technology. We thank Mauricio Barahona for insightful discussions on this topic, Jonas Braun and István Tomon for their helpful comments on the manuscript and Daniel Morales for inspiring us to analyse the C. elegans dataset. We also thank the three anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

Neuroengineering Laboratory, Brain Mind Institute & Interfaculty Institute of Bioengineering, EPFL, Lausanne, Switzerland
Adam Gosztolai
Department of Mathematics, Imperial College London, London, UK
Alexis Arnaudon
Blue Brain Project, École polytechnique fédérale de Lausanne (EPFL), Geneva, Switzerland
Alexis Arnaudon

Authors

Adam Gosztolai
Alexis Arnaudon

Contributions

A.G. and A.A. contributed equally to this work.

Corresponding author

Correspondence to Adam Gosztolai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks Allen Tannenbaum, Paul Bogdan and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gosztolai, A., Arnaudon, A. Unfolding the multiscale structure of networks with dynamical Ollivier-Ricci curvature. Nat Commun 12, 4561 (2021). https://doi.org/10.1038/s41467-021-24884-1

Received: 08 February 2021
Accepted: 06 July 2021
Published: 27 July 2021
DOI: https://doi.org/10.1038/s41467-021-24884-1

This article is cited by

Multi-scale geometric network analysis identifies melanoma immunotherapy response gene modules
- Kevin A. Murgas
- Rena Elkin
- Allen R. Tannenbaum
Scientific Reports (2024)
Multi-omic integrated curvature study on pan-cancer genomic data
- Jiening Zhu
- Anh Phong Tran
- Allen Tannenbaum
Mathematics of Control, Signals, and Systems (2024)
Community detection in networks by dynamical optimal transport formulation
- Daniela Leite
- Diego Baptista
- Caterina De Bacco
Scientific Reports (2022)
Emergent time, cosmological constant and boundary dimension at infinity in combinatorial quantum gravity
- C. A. Trugenberger
Journal of High Energy Physics (2022)
