Real-world networks are rarely embedded in physical or Euclidean spaces, which complicates their analysis. Therefore, previous works have typically assumed that the network’s nodes lie in a latent metric space1. A well-chosen metric space can provide a ‘geometric backbone’ to allow the correct representation of node similarities, and to study symmetries and dynamical processes on networks at a fundamental level. For example, assuming an underlying manifold structure2 permits the efficient functioning of clustering algorithms based on Euclidean geometric features such as k-means or expectation maximisation3. Similarly, the hyperbolic space of constant negative curvature provides a natural parametrisation of complex networks to unveil their self-similar clusters across scales4,5. Besides, networks can also be embedded based on a suitable pseudo-distance metric between dynamical network processes, which has helped reveal their functional organisation6,7. However, in general, there is no guarantee that a network is compatible with a given metric space without suffering significant distortion8. Yet, a network may have several, not necessarily self-similar, geometric representations owing to multiscale structure, arising, for example, from clusters at multiple resolutions9. Thus, there is a need for a geometric notion that does not rely on predefined embedding spaces, yet allows unfolding the multiscale structure of a general class of networks.

A promising alternative to embeddings is to define geometry based on a notion of curvature, such as the Ollivier–Ricci (OR) curvature10, which intuitively speaking measures the deviation of the graph from being locally grid-like analogously to being ’flat’ in continuous spaces. ‘Flatness’ of a network can be understood in terms of its local connectivity: the distance of a pair of nodes is the same as the average distance of their neighbourhoods. Thus positive (or negative) OR curvature of an edge indicates that it resides in a region of the graph that is more (or less) connected than a grid. In addition to the intuitive definition, the OR curvature does not impose geometry through embedding but induces an effective network geometry with a precise interpretation in limiting cases. In fact, it is the only one among several discrete curvature notions11,12 known to converge rigorously to the Ricci curvature of a Riemannian manifold13. The OR curvature has also been linked to graph-theoretical objects by deriving formal bounds on the local clustering coefficient and the spectrum of the graph Laplacian14,15. Moreover, the OR curvature of an edge is intrinsically linked to network-level robustness to edge removal, which has led to advances in applications such as studying the fragility of economic networks16 or characterising the human brain structural connectivity17.

However, despite recent clustering heuristics based on the OR curvature18,19, several of its properties have hindered its widespread adoption to study network clusters. Firstly, since the OR curvature depends on structural neighbourhoods, related clustering methods (including the Ricci flow method19) lack a resolution parameter to tune the geometry to unveil multiscale structure in real-world networks. Multiscale clustering has been the subject of intense research and several methods and heuristics have been proposed, along with a parallel list of goodness measures for community structures. These include, without claim of exhaustivity, methods based on statistical mechanical models20,21, normalised cut22, nonnegative matrix factorisation23, modularity24,25,26 and extensions thereof using random walks and diffusion processes9,27 as well as methods based on graph signal processing28. The second shortcoming of the classical OR curvature of an edge is that it is a local quantity, which depends on the degrees of its endpoints14. Thus, it likely provides a suboptimal geometric representation of sparse networks—including many real-world networks where each node connects only to a few others—in which node degrees vary widely. In fact, the classical OR curvature is related to the spectral gap of the graph Laplacian15, the central object of spectral clustering methods29, which no longer indicates clusters in sparse graphs30, similarly to other nodes clustering methods24. This lack of robustness of the OR curvature for sparse networks also precludes its use for studying the limit of information spreading in graphs31, which is linked to a phase transition occurring as the community structure gets weaker and becomes abruptly undetectable31,32,33.

In other words, there is a need for a geometric notion that does not rely on embeddings, is capable of generating a family of geometric representations to encode multiscale clusters as increasingly coarser features. Moreover, these features should robustly signal network clusters at different scales until the fundamental limit of their detection. Such a notion would hold the premise to describe multiscale structures of graphs without the need for statistical null models and to open new avenues to study and control information spreading phenomena using network geometry.


Dynamical OR curvature from graph diffusion

We address this need by combining two distinct frameworks—network-driven dynamical processes and geometry with OR curvature. The spreading of network-driven dynamical processes is shaped by the heterogeneity of the network connectivity. In turn, one may infer the network structure by observing properties of their evolution. We focus on Markov diffusion processes9,34,35,36, a class of linear dynamical systems which is rich enough to capture several properties of nonlinear processes on networks37,38. Let us consider a connected network of n nodes and m edges weighted by pairwise distances wij. We construct a continuous time diffusion on the network by the standard procedure29 of defining the normalised graph Laplacian matrix LK−1(K − A), where K is the diagonal matrix of node degrees with Kii = ∑jAij and A is the weighted adjacency matrix encoding similarities between nodes. To obtain non-negative similarities from node-to-node distances, one may simply take \({A}_{ij}={\max }_{uv}{w}_{uv}-{w}_{ij}\) or \({A}_{ij}={e}^{-{w}_{ij}}\), with the latter more strongly penalising distant points. Then, the probability measure of the diffusion started from the unit mass δi on node i (Fig. 1a, b) evolves according to

$${{{{{{{{\bf{p}}}}}}}}}_{i}(\tau )={\delta }_{i}{e}^{-\tau {{{{{{{\bf{L}}}}}}}}}\ .$$
Fig. 1: Dynamical Ollivier–Ricci curvature capturing the spreading of diffusion processes.
figure 1

a Snapshot at time \({{{{{{\mathrm{log}}}}}}}\,\tau =0.15\) of a pair of diffusion measures pi(τ) and pj(τ) started at nodes i, j of a stochastic block model network (with four clusters of sizes 30, 40, 35, 50 and respective edge probabilities 0.7, 0.8, 0.9, 0.6 within clusters, and 0.1, 0.02 between clusters). When i and j are in the same cluster, the measures overlap significantly. The size of half-circles is proportional to the amount of mass on the respective nodes. b For \(i^{\prime} ,\ j^{\prime}\) in different clusters the measures remain largely disjoint. c Optimal transport plan ζ(τ) superimposed with pi(τ), pj(τ). When i, j (coloured dashed lines) lie in the same cluster only diagonal elements ζuu are positive, meaning only geodesics within a cluster transport significant mass. The white dashed lines correspond to the four clusters. d When the diffusions started at nodes \(i^{\prime}\) and \(j^{\prime}\) in different clusters only entries ζuv with u and v in the corresponding clusters have significant nonzero weight. e Geodesic distance matrix duv showing the block structure of the network. f The evolution of the edge curvatures κ(τ) (Eq. (2)) against time. There are three distinct curvature bundles and in the insets we manually highlighted the edges to reveal their position in the graph. The shade of grey within the bundles reflect the density of edges. Here κij(τ)  0.75 marked by the red dashed line indicates scales when local mixing occurs between the corresponding diffusion pairs. The dashed vertical lines show two such scales (\({{{{{{\mathrm{log}}}}}}}\,\tau =0.15,\ 0.43\)). g, h Graph edges coloured by the curvature reveals clusters at the two scales.

In analogy to the Ricci curvature on a manifold, the classical OR curvature10,39 measures the distance of one-step neighbourhoods of a pair of nodes i, j relative the geodesic (shortest path) distance of i, j (see Supplementary Note 1 for background). Here, instead of structural neighbourhoods we consider distributions generated by diffusion processes across scales τ. Specifically, we start a diffusion process at each node i = 1, …, n to obtain a set of measures pi(τ). We then define the dynamic OR curvature of an edge as the distance of the pair of measures started at its endpoints relative to the weight of the edge

$${\kappa }_{ij}(\tau ):= 1-\frac{{{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))}{{w}_{ij}}\ ,$$

whenever ij is an edge and 0 otherwise. Intuitively, Eq. (2) reflects the overlap of diffusions over time when started wij distance apart, measured by \({{{{{{{{\mathcal{W}}}}}}}}}_{1}\), the optimal transport distance40. The latter is obtained as a solution to a minimisation problem (Eq. (11) in the “Methods” section) yielding the least cost of transporting the measure pi(τ) to pj(τ) by the optimal transport plan ζ(τ). The entries of the optimal transport matrix are shown on Fig. 1c, d representing the quantity of mass moved between each pair of nodes u and v along their connecting geodesic of length duv (Fig. 1e).

By contrast to Eq. (2), previous works have typically defined the classical OR curvature based on one-step transition probabilities of lazy random walks pi(τ) pi = αI + (1 − α)δiK−1A. In this definition, scale is introduced via a laziness (or idleness) parameter α, varying the importance of local neighbourhoods relative to wij, which in effect introduces self-loops in the graph. Our definition (Eq. (2)) replaces one-step neighbourhoods by probability measures supported on the whole graph, with the timescale τ of the diffusion process playing the role of the scale parameter. As expected, Eq. (2) recovers the classical OR curvature10 as a first-order approximation. In addition, the dynamical OR curvature inherits the geometric intuition of the classical definition. Analogously to the Ricci curvature on planes, spheres and hyperboloids, κij(τ) is zero on grids but positive and negative on cliques-like and tree-like networks, respectively, for all finite scales τ (Supplementary Fig. 1b, c). In the following, we are interested in studying the curvature distribution across edges when the network structure deviates from these canonical topologies.

Edge curvature gap differences in rate of information spreading

Most real-world networks exhibit organisation on several scales. As an illustration Fig. 1a, b shows an unweighted stochastic block model (SBM) network41 of four clusters and two nontrivial scales. To construct these scales, we drew edges independently between clusters with different probabilities (0.1 or 0.02). We varied cluster sizes and within-cluster edge probabilities but ensured that the latter remained sufficiently high to easily visualise clusters (see Fig. 1 for parameters). We show that this multiscale structure can be revealed by scanning through a finite range of scales τ and studying snapshots of curvature distribution across edges.

The characteristic scales of a network are related to the overlap between pairs of diffusion measures pi(τ), pj(τ). This overlap depends on the starting points i, j and on network clusters which can confine diffusions on well-connected regions for long times before reaching the stationary state π9,34,35,42, given by πi = Kii/∑iKii. This transient phenomenon is reflected by the structure of the optimal transport matrix ζ(τ). If i, j lie within the same cluster, the measures quickly overlap (Fig. 1a) and only diagonal entries of ζ(τ) are positive (Fig. 1c), weighing only short, within-cluster geodesics. By contrast, started at different clusters, the measures remain almost disjoint (Fig. 1b) and ζ(τ) is forced to select longer geodesics (Fig. 1d, e), reflected by the large entries in the off-diagonal block.

The evolution of the edge curvature κij(τ) (Fig. 1f) aggregates the information in ζ(τ) into a single number that is related to the rate of mass exchange between clusters at a given scale. We see in Fig. 1f that, initially, when all nodes support disjoint point masses and the diffusions have not yet mixed, \({{{{{{{{\rm{lim}}}}}}}}}_{\tau \to 0}{\kappa }_{ij}(\tau )\to 1-{{{{{{{{\mathcal{W}}}}}}}}}_{1}({\delta }_{i},{\delta }_{j})/{d}_{ij}=0\). At the other extreme, as the diffusions reach stationary state, \({{{{{{{{\rm{lim}}}}}}}}}_{\tau \to \infty }{\kappa }_{ij}(\tau )\to 1-{{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{\boldsymbol{\pi }}}}}}}},\ {{{{{{{\boldsymbol{\pi }}}}}}}})/{d}_{ij}=1\). At intermediate scales, the curvature can take values between 1 and some finite negative number depending on the graph15. We find that, as the curvature of an edge evolves, the scale at which it approaches unity indicates how easy it is to propagate information between clusters. More precisely, in the “Methods” section, we prove that this scale gives an upper bound on the mixing time \({\tau }_{ij}^{{\mathsf{mix}}}\) of the diffusion pair, namely,

$$\begin{array}{lll}{\tau }_{ij}^{{\mathsf{mix}}}&:= &\frac{1}{2}\mathop{\sum}\limits_{uv}| {\zeta }_{uv}(\tau )-{\zeta }_{uv}(\infty )| \\ &\le &\min \{\tau :\ {\kappa }_{ij}(\tau )\ge 0.75\},\end{array}$$

where ζ(τ) is the optimal transport plan with marginals pi(τ) and pj(τ). Note that κij(τ) ≥ 0.75 does not imply that the corresponding diffusion processes have approached stationary state independently, but only that they exchange negligible mass at that or larger scales.

Importantly, a gap in the distribution of curvatures appears when the curvature exceeds 0.75 for some edges while being <0.75 for others indicating a network bottleneck that limits mass flow. To illustrate this, Fig. 1f shows three bundles of edges, the edges within the bundle with most positive curvature are found within clusters, while edges within the other two bundles lie between clusters. Figure 1g, h show the two scales on Fig. 1f (\({{{{{{\mathrm{log}}}}}}}\,\tau =0.15,\ 0.43\)) where the curvature has exceeded 0.75 for a given bundle, indicating the diffusions are well mixed across the corresponding edges, but not across other edges whose curvature is <0.75. The latter mark bottleneck edges which lie between the expected partitions with four and two clusters, respectively. This simple example shows the importance of the scale parameter τ in our curvature definition to capture network scales. Let us emphasise that the characteristic scales revealing network clusters in our framework are indicated by curvature gaps, i.e., differences in the relative magnitude of curvatures. This is unlike some previous works18,19, where clusters were identified based on finding negatively curved edges between clusters. Before applying this idea to real networks, we take a closer look at the curvature gap in the theoretical context of the stochastic block model.

Curvature gap is a robust indicator of clusters in stochastic block models

Since in our example any pair of diffusions are supported by one (Fig. 1a) or two (Fig. 1b) clusters, we focus on the subgraph G induced by two clusters (Fig. 2a). Let us simplify one more step and assume that G is a realisation of \({{{{{{{\mathcal{G}}}}}}}}=\,{{\mbox{SSBM}}}\,(n/2,\ {p}_{{\mathsf{in}}},\ {p}_{{\mathsf{out}}})\), the symmetric SBM composed of two planted partitions of equal size. Edges are generated independently with probability pin within-clusters and probability pout between-clusters. This symmetry assumption is not necessary in general, as illustrated by our other examples, but it allows us to make links to known theoretical results. We will denote the ground truth as \({C}_{i}^{* }\in \{1,\ -1\}\) for each node i and define \(\bar{k}=n({p}_{{\mathsf{in}}}+{p}_{{\mathsf{out}}})/2\) as the average degree.

Fig. 2: Edge curvature gap indicates the presence of clusters where spectral clustering fails.
figure 2

Two-partition symmetric SBM graph in the a dense regime (pin = 0.5, pout = 0.1) and b sparse regime (pin = 8/n, pout = 0.5/n). Edges are coloured by the curvature (n = 100, \({{{{{{\mathrm{log}}}}}}}\,\tau =0.83\)). c, d The histogram of eigenvalues obtained from five SBM realisations in dense and sparse regime, respectively. In the dense regime, the eigenvalue λc corresponding to the community structure is well separated from the bulk eigenvalues, but overlaps in the sparse regime. e, f The evolution of edge curvatures driven against diffusion time. A gap between the curvatures of within-edges (orange) and between-edges (grey) is associated with the presence of clusters. When κij > 0.75 (horizontal dashed line) the diffusions are well mixed across the respective edges. The curvature gap is maximal at \({\tau }_{\kappa }\approx {\lambda }_{c}^{-1}\) (orange and black vertical lines). g There is no curvature gap in the limiting ER graph (inset, pin = pout = (8 + 0.5)/(2n)). h Maximal curvature gap averaged over 20 SBM realisations for each fixed \(\bar{k}\) with 104 nodes, against edge density ratio. The horizontal line marks the estimated background noise level. The intersection of this line with the mean curvature gap defines \({r}_{\bar{k}}^{* }\), the largest possible edge density ratio to detect clusters. Shaded error bars indicate one standard deviation from the mean. i Phase diagram of critical edge density ratio against average degree. The numerically obtained critical edge density ratios computed from the curvature gap are superimposed with the theoretical Kesten–Stigum detection limit (dashed line) and show excellent agreement. Shaded error bars indicate one standard deviation from the mean. Grey shaded area denotes the regime where detection is possible.

Classical spectral clustering methods29 perform well for dense graphs (Fig. 2a), where \(\bar{k}\) is an increasing function of n. This suppresses fluctuations for large n causing a spectral gap to appear when the eigenvalue λc of the Laplacian matrix L of G separates from bulk eigenvalues arising from randomness29 (Fig. 2c). In this dense regime, λc is well approximated by \({\langle {\lambda }_{c}\rangle }_{{{{{{{{\mathcal{G}}}}}}}}}=2{p}_{{\mathsf{out}}}/({p}_{{\mathsf{in}}}+{p}_{{\mathsf{out}}})\), the second eigenvalue of the ensemble averaged Laplacian \({\langle {{{{{{{\bf{L}}}}}}}}\rangle }_{{{{{{{{\mathcal{G}}}}}}}}}\) (see Supplementary Note 2). Since λc can be identified due to the spectral gap, clustering involves simply labelling nodes by the sign of the entries of the corresponding eigenvector \({\phi }_{c}(u)=1/\sqrt{n}\) when \({C}_{u}^{* }=1\) and \(-1/\sqrt{n}\) when \({C}_{u}^{* }=-1\). However, for sparse graphs (Fig. 2b), where \(\bar{k}\) is constant (independent of n), the spectral gap ceases to exist43 (Fig. 2d). Thus, spectral algorithms relying on identifying λc perform no better than chance. To perform clustering in this regime, one needs to go beyond spectral clustering using, for example, the belief propagation method in statistical physics or the related non-backtracking operator whose spectrum is better behaved31,33.

To see how robustly the dynamical OR curvature indicates the presence of clusters in the symmetric SBM, we define the curvature gap as the difference between the mean curvatures of within- and between-edges at a given scale

$${{\Delta }}\kappa (\tau ):= \frac{1}{\sigma }\left|{\langle {\kappa }_{ij}(\tau )\rangle }_{{C}_{i}^{* } = {C}_{j}^{* }}-{\langle {\kappa }_{ij}(\tau )\rangle }_{{C}_{i}^{* }\ne {C}_{j}^{* }}\right|.$$

Here the averages are over within and between-edges, normalised by \(\sigma =\sqrt{\frac{1}{2}\left({\sigma }_{{{{{{{{\rm{within}}}}}}}}}^{2}+{\sigma }_{{{{{{{{\rm{between}}}}}}}}}^{2}\right)}\) in terms of the standard deviations of both sets of curvatures. This measure is adapted from the sensitivity index in signal detection theory, known to be, asymptotically, the most powerful statistical test for discriminating two distributions44. Large curvature gap Δκ(τ) indicates that the within and between edges have curvatures different enough for the clusters to be recovered (Fig. 2e, f). Correspondingly, in the limits τ → 0,  where the curvatures are uniform across the graph Δκ(τ) vanishes and, likewise, in the absence of structure (pin ≈ pout in the Erdős–Rényi (ER) limit) we have Δκ(τ) = 0 for all τ (Fig. 2g). At intermediate scales, we find that the scale of maximal curvature gap occurs at τκ at which point the curvatures of within-edges is κij(τκ) ≈ 0.75. In agreement with Eq. (3), this indicates well-mixed diffusions across these edges relative to low-curvature bottleneck edges between clusters, which indicate incomplete mixing. We also find that \({\tau }_{\kappa }\approx {\lambda }_{c}^{-1}\) (Fig. 2e, f). These results show that positive curvature gap is associated with the presence of clusters.

What is the minimum curvature gap needed to detect clusters? Previous works on the limits of cluster detection has shown that if the clusters are too weak (high rpout/pin) or the graph too sparse (low \(\bar{k}\)), no clustering algorithm performs better than chance, or distinguish G from an Erdős–Rényi graph (r = 1). This is known as the limit of weak-recovery or detection and is characterised by the Kesten–Stigum (KS) threshold \(r={r}_{{{\mbox{KS}}}}=(\bar{k}-\sqrt{\bar{k}})/(\bar{k}+\sqrt{\bar{k}})\)31,32,45.

To study this limit, we sampled 20 networks from \({{{{{{{\mathcal{G}}}}}}}}\) for a range of \(\bar{k}\) and r. For each sample, we computed the maximal curvature gap \({{\Delta }}{\kappa }^{* }:= \mathop{\max }\limits_{\tau }{{\Delta }}\kappa (\tau )\) and formed the ensemble average quantity \({\langle {{\Delta }}{\kappa }^{* }\rangle }_{{{{{{{{\mathcal{G}}}}}}}}}\). As r increases for a given \(\bar{k}\) we observe that \({\langle {{\Delta }}{\kappa }^{* }\rangle }_{{{{{{{{\mathcal{G}}}}}}}}}\) decreases exponentially until a certain noise level (Fig. 2h). The critical edge density ratio \({r}_{\bar{k}}^{* }\) can be estimated as the smallest r where \({\langle {{\Delta }}{\kappa }^{* }\rangle }_{{{{{{{{\mathcal{G}}}}}}}}}\) dropped below a threshold background noise level, estimated here at 0.035 (black horizontal line). This choice of threshold is not absolute, as it is affected by the finite-size effect of the SBM graphs. An analytical derivation of this threshold is out of scope of this work, but our numerical experiment clearly shows that the curvature gap detects a signal from the planted partitions up to the KS limit (Fig. 2i).

Geometric cluster detection in the sparse regime

Given that the curvature gap (Eq. (4)) indicates the presence of clusters until the fundamental KS limit, we asked if this information could be used to recover the ground truth partition. The definition of curvature gap (Eq. (4)) suggests looking for equilibrium configurations of the unit-temperature Boltzmann distribution over the cluster assignments C,

$${\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\propto {e}^{{\sum }_{ij}{\kappa }_{ij}(\tau )\delta ({C}_{i},{C}_{j})}\ ,$$

where κ is a matrix with entries κij, the sum is over all edges ij and δ(Ci, Cj) = 1 if Ci = Cj and 0 otherwise. The distribution involves only within-edges because finding those is equivalent to finding between-edges, up to a normalisation factor.

The distribution \({\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\) is important because all of its equilibrium states are equivalent and correlate with the ground truth partition of the symmetric SBM \({{{{{{{\mathcal{G}}}}}}}}\). To see this, we connect \({\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\) to the posterior distribution \({\mathbb{P}}(C| G)\) of the cluster assignments given the graph drawn from \({{{{{{{\mathcal{G}}}}}}}}\). In the sparse regime, the likelihood of observing G with a given cluster assignment C is

$${\mathbb{P}}(G| C)\propto \mathop{\prod}\limits_{ij}{\left(\frac{{p}_{{\mathsf{in}}}}{{p}_{{\mathsf{out}}}}\right)}^{\delta ({C}_{i},{C}_{j})}\propto {\mathbb{P}}(C| G)$$

(see Eq. (15) in the “Methods” section). The second part of Eq. (6) results from Bayes’ theorem using a uniform prior on C, since a priori all configurations are equally likely. It has been previously shown31 that P(CG) is equivalent to the Boltzmann distribution of an Ising model with constant interaction strength

$${\mathbb{P}}(C| G)\propto {e}^{\beta {\sum }_{ij}\delta ({C}_{i},{C}_{j})}$$

with inverse temperature \(\beta ={{{{{{\mathrm{log}}}}}}}\,({p}_{{\mathsf{in}}}/{p}_{{\mathsf{out}}})\approx {p}_{{\mathsf{in}}}-{p}_{{\mathsf{out}}}\). Note that the equilibrium state (1, 1, …, 1) is trivial assigning all nodes to one cluster. However, asymptotically (n → ) the probability of this state vanishes and the Boltzmann distribution is uniform over all other configurations with group sizes n/2 and poutn/2 between-edges31. The fact that one of these states is the ground truth partition, and all equilibrium states of Eq. (7) are equivalent up to a permutation of nodes within clusters means they are indistinguishable from the ground truth partition.

Due to the equivalence between Eqs. (6) and (7), to prove the equivalence between Eqs. (5) and (6) we show that Eq. (5) can also be reduced to Eq. (7). The main insight is that the dynamical OR curvature (Eq. (2)) is constructed using pairs of diffusions, as opposed to single diffusions used in previous studies9,20,35. Thus, eigenmodes arising from random fluctuations, which would otherwise confound methods relying on the spectrum of the Laplacian, are reflected equally in the spectrum of both diffusions and cancel out upon taking differences over all adjacent node pairs. This allows recovering the community eigenvector ϕc even in the sparse regime where the spectral gap vanishes and λc is no longer identifiable from the spectrum (Fig. 2d). To see this, we consider the difference between a pairs of diffusions and use the spectral expansion to write \({\sum }_{ij}({p}_{i}^{u}(\tau )-{p}_{j}^{u}(\tau ))={\sum }_{s}{e}^{-{\lambda }_{s}\tau }{\phi }_{s}(u){{\Delta }}{\phi }_{s}\) where

$${{\Delta }}{\phi }_{s}:= \mathop{\sum}\limits_{ij}\left({\phi }_{s}(i)-{\phi }_{s}(j)\right)\ .$$

We find that, instead of looking at the eigenvalue distribution (Fig. 2d), the community eigenvector ϕc can be recovered by the relative amplitude of Δϕs. Indeed, on a single SBM realisation, Δϕs is large for only a few eigenvectors ϕs and diminishes for others Fig. 3a. Importantly, those and only those eigenvectors with large Δϕs correlate strongly with the ground truth (Fig. 3a inset). As seen in Fig. 3b, the best eigenvector is not ϕ2, i.e., the one whose eigenvalue is second in the spectrum and is used by spectral clustering methods, but the one whose eigenvalue is inside the bulk in Fig. 2d and thus cannot be identified by looking at the spectrum alone. The correlation with the ground truth for ϕc with the highest Δϕs averaged over 50 SBM realisations remains close to the highest achievable among all eigenvectors as the KS bound is approached. Meanwhile, ϕ2, the eigenvector used by spectral clustering methods is suboptimal (Fig. 3c). We also found that, close to the KS bound, often a few other eigenvectors with similarly high Δϕs appear, suggesting an improved clustering method combining several top eigenvectors, but this is out of scope here.

Fig. 3: Detecting communities using pairs of diffusions near the weak recovery limit.
figure 3

a Difference in eigenvectors Δϕs (Eq. (8)) between diffusion processes started at adjacent nodes for a single sparse SBM network (pin = 3/n, pout = 0.5/n, n = 104). Each dot marks (λs, Δϕs) for the 50 smallest eigenvectors, coloured by the correlation of the corresponding eigenvector with the ground truth, shown in the inset. b The eigenvector ϕc with the highest Δϕs encodes the cluster structure (solid line), whereas the second eigenvector ϕ2, used by spectral clustering methods, are driven by high random fluctuations. c Correlation of eigenvectors with ground truth against distance to KS limit (n = 105, \(\bar{k}=3\)). The eigenvector identified by the highest Δϕs approaches the correlation with the ground truth of the best eigenvector in the spectrum. All eigenvectors become uncorrelated with the ground truth at the KS limit. Error bars indicate one standard deviation from the mean.

To express the curvature in the exponent of Eq. (5) we use the dual formulation of the optimal transport distance (Eq. (12) in “Methods” section). The fact that Δϕc dominates the contribution from other eigenvectors, allows us to approximate \({\sum }_{ij}({p}_{i}^{u}(\tau )-{p}_{j}^{u}(\tau ))={e}^{-{\lambda }_{c}\tau }{\phi }_{c}{{\Delta }}{\phi }_{c}+{\epsilon }_{\phi }\propto {e}^{-{\lambda }_{c}\tau }{\phi }_{c}+{\epsilon }_{\phi }\), where ϵϕ is an asymptotically small term. We use this expression, together with the duality formula (Eq. (12)) to express Eq. (5). Finally, in the sparse regime, we may make a tree-like approximation of the neighbourhoods of i and j to find that Eq. (5) reduces to

$${\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\propto {e}^{| {p}_{{\mathsf{in}}}-{p}_{{\mathsf{out}}}| {\sum }_{ij}\delta ({C}_{i},{C}_{j})}.$$

We refer the reader to the “Methods” section for details. Eq. (9) is the same as Eq. (7) when the communities are assortative (pin > pout). We then conclude that the curvatures encode the communities of the symmetric SBM and allow it to be recovered until close to the Kesten–Stigum bound.

In the next section, we present a clustering algorithm based on this insight that can find multiscale clusters in real-world networks.

Geometric modularity for the multiscale clustering of networks

To exploit the property of the dynamical OR curvature to give multiple geometric representations, we develop a multiscale graph clustering algorithm for real-world networks. Using Eq. (5), we introduce the geometric modularity function

$${Q}_{\kappa }(C,\tau )=\frac{1}{2{m}_{\kappa }}\mathop{\sum}\limits_{ij}({\kappa }_{ij}(\tau )-{\kappa }_{0})\delta ({C}_{i},{C}_{j})\ ,$$

where 2mκ = ∑ijκij is a normalisation factor and \({\kappa }_{0}={\max }_{ij}{\kappa }_{ij}({\tau }_{{\mathsf{min}}})\) is a constant ensuring that all edges have small non-positive curvature at the smallest computed scale \({\tau }_{{\mathsf{min}}}\). Hence optimising Eq. (10) at small times yields separate communities for each node whereas at large times, when κij(τ) → 1 for all ij, all nodes are merged to a single community. At intermediate scales, the curvatures will have negative and positive values on different edges, making the detection of non-trivial clusters possible without a statistical null-model. This is in contrast to classical modularity24, which minimises the expected number of edges between clusters, and requires a statistical null-model (typically the configuration model), which can hinder identifying functional communities based on dynamics6.

To detect robust partitions at several scales, we compute the curvature distribution at scales τ spanning the entire dynamical range of the curvature and, at each τ, we sample the cluster landscape Qκ(C, τ) by optimising Eq. (10) using the Louvain algorithm46,47 with 102 random initialisations. At a given τ, we take the cluster with the highest geometric modularity and deem it robust if it has a low variation of information VIτ against 50 other randomly chosen clusters at this scale, as well as low variation of information \({{{\mbox{VI}}}}_{\tau \tau ^{\prime} }\) against the best cluster assignments at nearby scales \(\tau ^{\prime}\). As an example, we show in Fig. 4a the result of this computation on our four-partition SBM graph with two hard-coded scales. We clearly see two large plateaus with low VIτ and \({{{\mbox{VI}}}}_{\tau \tau ^{\prime} }\), corresponding to robust clusters, shown in Fig. 4b, c. At the smallest scales we find no robust communities shown by the sharp increase in the number of communities and the large VIτ. We compared geometric modularity to other clustering methods on the SBM and LFR generative benchmark graphs achieving near-state-of-the-art accuracy in both cases, close to the theoretical limit (see Supplementary Fig. 2 and Supplementary Note 3 for details). Notably, our method performs substantially better than the Ricci flow19 method based on the classical OR curvature reinforcing our theoretical insight that combining diffusion processes and OR curvature allows surpassing the limitations of previous OR curvature-based methods.

Fig. 4: Clustering networks based on multiscale geometric modularity.
figure 4

a Clustering statistics computed based on 102 Louvain realisations for the multiscale stochastic block model graph. Vertical dashed lines show scales at which stable clusters are detected based on low variation of information at a given scale and persistent low variation of information between Louvain realisations across scales. The communities obtained are shown on b for scales \({{{{{{\mathrm{log}}}}}}}\,\tau =-2.3\) and c for \({{{{{{\mathrm{log}}}}}}}\,\tau =-1\). Edges are coloured by the curvatures κij(τ) at the respective scales. d Clustering statistics for the European power grid. Two representative stable scales are shown e for \({{{{{{\mathrm{log}}}}}}}\,\tau =-0.95\) and f for \({{{{{{\mathrm{log}}}}}}}\,\tau =-0.5\). Node colours show community membership. g Clustering statistics for and the network of C. elegans single-neuron homeobox gene expressions show a plateau of stable scales with very similar partitions. h Clustering statistics obtained with Markov stability shows stable scales only at small times with single-node communities, indicating overfitting, and many non-robust partitions at larger scales with high variation of information. i Distance from ground truth based on structural neuronal types or the predicted clusters. Geometric modularity obtained significantly better performance than Markov stability. j Clustering of the C. elegans homeobox gene expression data obtained from geometric modularity optimisation superimposed with the ground truth.

Our algorithm involves several steps including computing the geodesic distance matrix, computing the diffusions (Eq. (1)) starting from all nodes, computing the curvatures (Eq. (2)) for all edges and running the Louvain algorithm. We discuss the complexity in detail in Supplementary Note 5. Briefly, the step with highest complexity is the computation of the edge curvatures which using exact algorithms runs in time O(mn5/2), which denotes that the computation time grows at most as Mn5/2 for a positive constant M and for m, n sufficiently large. This complexity arises since the computation involves solving a linear programme of complexity O(n5/2) for each of the m edges. On sparse networks this is on par with other random walk9,20 and probabilistic21,25,27 methods. However, for small times a complexity close to O(mn) can be achieved by ’trimming’ the probability measures, i.e., reducing their support size by ignoring the mass on nodes below a certain cutoff value (Supplementary Fig. 4b). For large times, a complexity reduces to O(mn) by replacing the the optimal transport distance by the regularised Sinkhorn distance48 (Supplementary Fig. 4c). This computational speedups together with the parallelised implementation of algorithm means that it scales well to moderately sized graphs (~104 nodes).

Due to the link between high edge curvature and well-mixed state (Eq. (3)), we expected that at robust scales the clusters will correspond to those regions which have a high amount of redundant information, and thus can be disconnected without affecting the dynamics within them. To see this, we applied this clustering algorithm to the European power grid graph in Fig. 4d–f, an unweighted network of major electrical lines, which has been previously analysed for robustness49, multiscale communities50 and centrality36. The multiscale community structure can be clearly seen with the many minima of the VIτ function in Fig. 4d. We displayed two scales in Fig. 4e, f which unfold parts of the power grid which have been historically independently developed. The smaller scale (at around \({{{{{{\mathrm{log}}}}}}}\,\tau =-0.95\)) marks economical or historical unions and states (Skandinavia, Benelux, Czechoslovakia, Balkans, etc.). Likewise, the larger scale (at around \({{{{{{\mathrm{log}}}}}}}\,\tau =-0.5\)) divides historical Eastern–Western Europe. Interestingly with the boundary in Germany runs along the iron curtain, which also demarcates the regions between major electricity companies.

Finally, we analysed a recent dataset of homeobox gene expression in single neurons of C. elegans51. The authors in ref. 51 found based on a multivariate linear regression that the homeobox gene expression profile in a given anatomical neuron class can explain on average 74% of the expression level of the remaining genes in that neuron class. We therefore asked whether the homeobox gene expression profile has sufficient information to cluster neurons into their known anatomical classes.

The data contains a binary feature vector for each of the 301 neurons, indicating the presence of a protein expressed by any of the 105 homeobox genes in the given neuron. To convert this data into a graph with nodes being neurons, we first eliminated all homeobox genes co-expressed in none or more than 90% of the neurons to retain 67 homeobox genes. We then constructed an all-to-all graph adjacency matrix weighted by the Jaccard similarity index between expression profiles of neurons. To increase the number of edges with negative curvature, thus improve the detection at the smallest scales, we sparsified this network using a geometric sparsification method52 with parameter γ = 0.01. This method retains at most a fraction γ edges of the original graph as minimum spanning tree augmented by edges relevant for preserving local or global geometry of the graph.

The results of our clustering algorithm on this graph is shown in Fig. 4g and compared with the result of Markov stability9 on Fig. 4h, a multiscale method based on persistence of diffusions. Geometric modularity obtains a large range of robust scales with highly similar clusters—as shown by the low VIτ and \({{{\mbox{VI}}}}_{\tau \tau ^{\prime} }\). These scales correlate closely with the known ground truth of 117 anatomical neuron classes (Fig. 4i). In contrast, for Markov stability9, the scales with low VIτ overfit the graph finding too many clusters (Fig. 4h) which correlate less with the ground truth (Fig. 4i). Likewise, hierarchical clustering fails to identify the ground truth communities51. We also compared our result with that obtained from a broad range of clustering methods finding that other methods either overfit the neuron classes or found too few partitions (Supplementary Note 4 and Supplementary Fig. 3). Althouth the wavelet method of Tremblay and Borgnat returned a clustering near the ground truth, this scale was not identifiable based on their stability metric (Supplementary Fig. 3b). On Fig. 4j we superimpose the best clustering from geometric modularity against the ground-truth. We observe little differences, apart from VA and AS nodes as well as VD and DD often clustered together. Careful look reveals close biological relationship between these classes; all four classes correspond to motor neurons, with pairs expressing the same neurotransmitters—VA, AS expressing acetylcholine and VD, DD expressing gamma-aminobutyric acid (GABA). These novel results give direct quantitative support to the claim that homeobox gene expression patterns encode structural neuron types. We also observe other stable partitions at larger scale, but they did not correlate the ground-truth.

Overall, these results give a strong demonstration that our method is able to find stable clusters in sparse graphs, and provide meaningful insights into distinct types of real-world networks.


Real-world networks often exhibit community structure on multiple scales, due to differences between the rates of information propagation in regions the network on various timescales. We introduced the concept of dynamical OR curvature which defines a scale-dependent geometry from the evolution of pairs of diffusion processes on the network. We showed that the edge curvature carries a precise meaning in this context bounding the rate of information flow across edges. Consequentially, gaps in the edge curvature distribution arising from differences between edge curvatures within and between regions indicate network bottlenecks. Systematically finding these gaps in the edge curvature distribution captures progressively coarser community features as the diffusion processes evolve. This result does not rely on the dynamics being linear diffusions, making it suitable to study the interaction of arbitrary dynamical processes. We expect that, in the future, this approach can be used to tune the geometry of the graph to control the flux or interaction of network-driven dynamical processes, for example, leading to insights to metapopulation models53 and synchronisation problem, for example, to better understand the coexistence of chimera states54,55.

Unlike previous geometric approaches, which rely on embedding a network into a particular latent metric space5,6,7,35, our approach constructs an effective object - the weighted and signed edge curvature matrix. Whilst not requiring specific assumptions used by latent space approaches it is worth noting that the dynamical OR curvature is constructed on the metric space formed by all the shortest paths of the graph. This property suggests links with the field of fractal geometry which studies scaling properties of graphs using the shortest path metric56. Thus, in graph families such as complex networks whose fractal geometry can be characterised57 one can expect relationships between coarse-graining schemes based on box-covering techniques and aggregating clusters based on similar dynamical OR edge curvatures, which could be exploited for controlling the multifractal geometry of these networks.

Although diffusion processes constructed from the graph Laplacian have been explored for network clustering9,27,35, our work differs in the use of diffusion pairs, as opposed to single diffusions, to construct the curvature. Diffusion pairs are implicitly coupled through the graph and pick up random variations independently, which can be exploited to average out non-informative fluctuations. On stochastic block models, this feature allows the curvature gap to robustly indicate clusters in the sparse regime down to the fundamental limit, where clustering methods relying on the spectral gap in the Laplacian fail43. We also found a new measure of eigenvalue quality, able to select the best eigenvector to be used in spectral methods. Interestingly, the edge curvatures are defined on the set of shortest paths which cannot contain the same edge twice, a subset of the set of non-backtracking walks. Our results are therefore consistent with previous works on the limits of cluster detection using statistical physics objects including the spectrum of non-backtracking operator33 or related message passing approaches31. We expect this insight to provide a new avenue to study the fundamental limits of efficient clustering from a geometric perspective.

Finally, we introduced the notion of geometric modularity to build an easy-to-use multiscale clustering algorithm. Notably, our algorithm achieved near-state-of-the-art performance in sparse SBM graphs, better than methods relying on the spectral gap in the Laplacian matrix (e.g., spectral clustering29 and edge-betweenness25) as well as those relying on the classical OR curvature19. This confirms that combining diffusions and OR geometry allows surpassing the limitations of these methods, which work well only on dense graphs. We also found robust and interpretable communities on multiple scales in real-world networks without the tendency of overfitting. Overall, we expect our insights connecting dynamical processes, geometry and network clustering to open new avenues to studying and controlling the structural and dynamical properties of networks.


Optimal transport distance

To measure the distance between a pair of measures pi(τ) and pj(τ) we use the optimal transport distance40 (also known as 1-Wasserstein or earth-mover distance), defined as

$${{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau )) =\mathop{{{{{\mathrm{min}}}}} }\limits_{{{{{{{{\boldsymbol{\zeta }}}}}}}}}\mathop{\sum}\limits_{uv}{d}_{uv}{\zeta }_{uv}\ ,\\ \,{{\mbox{subject to}}}\,\ \mathop{\sum}\limits_{v}{\zeta }_{uv} ={p}_{i}^{u}(\tau )\ ,\quad \mathop{\sum}\limits_{u}{\zeta }_{uv}={p}_{j}^{v}(\tau )\ .$$

The constraints in Eq. (11) ensure that the optimal transport plan \({{{{{{{\boldsymbol{\zeta }}}}}}}}(\tau )\in {{\mathbb{R}}}^{n\times n}\) is a coupling of the measures pi(τ), pj(τ), i.e., ζ(τ) is a joint distribution that admits pi(τ) and pj(τ) as marginals.

An equivalent formulation of this distance can be constructed from the Kantorovich–Rubinstein duality40, given by

$${{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))= \mathop{{{{{\mathrm{sup}}}}}}\limits_{f} \mathop{\sum}\limits_{u}f(u)[{p}_{i}^{u}(\tau )-{p}_{j}^{u}(\tau )]$$

where the supremum is taken over all 1-Lipschitz functions f on the graph, that is,

$$| f(u)-f(v)| \le {d}_{uv}$$

for any node pair u, v.

Upper bound on the mixing time in terms of curvature

Here we prove inequality (3), which gives an upper bound on the mixing time of the coupled diffusions with measures pi(τ), pj(τ) in terms the dynamical OR curvature. The ϵ-mixing time is defined as the smallest τ where the law of the coupled process, the optimal transport plan ζ(τ), is within an ϵ radius of the stationary distribution

$${\tau }_{ij}(\epsilon ):= \min \{\tau :\ | | {{{{{{{\boldsymbol{\zeta }}}}}}}}(\tau )-{{{{{{{\boldsymbol{\zeta }}}}}}}}(\infty )| {| }_{{{\mbox{TV}}}}\le \epsilon \},$$

where the notion of “close to stationarity” is quantified by the total variation distance \(| | {{{{{{{\boldsymbol{\zeta }}}}}}}}(\tau )-{{{{{{{\boldsymbol{\zeta }}}}}}}}(\infty )| {| }_{{{\mbox{TV}}}}:= \frac{1}{2}{\sum }_{uv}| {\zeta }_{uv}(\tau )-{\zeta }_{uv}(\infty )|\). Since pi(τ) and pj(τ) are marginals of ζ(τ) we have that

$${\tau }_{ij}(\epsilon ) =\min \{\tau :\ | | {{{{{{{{\bf{p}}}}}}}}}_{i}(\tau )-{{{{{{{\boldsymbol{\pi }}}}}}}}| {| }_{{{\mbox{TV}}}}+| | {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau )-{{{{{{{\boldsymbol{\pi }}}}}}}}| {| }_{{{\mbox{TV}}}}\le \epsilon \}\\ =\min \{\tau :\ | | {{{{{{{{\bf{p}}}}}}}}}_{i}(\tau )-{{{{{{{{\bf{p}}}}}}}}}_{j}(\tau )| {| }_{{{\mbox{TV}}}}\le \epsilon \}\ ,$$

where we used the independence of the diffusion processes. From here, we may follow ref. 58 and use the Csiszár-Kullback-Pinsker inequality for the optimal transport distance

$$| | {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau )-{{{{{{{{\bf{p}}}}}}}}}_{j}(\infty )| {| }_{{{\mbox{TV}}}}\le (1/{d}_{0}){{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))\ ,$$

where \({d}_{0}=\mathop{\min }\limits_{ij}{d}_{ij}\) is a global graph constant, which can therefore be absorbed into ϵ. This gives an upper bound

$${\tau }_{ij}(\epsilon ^{\prime} ) \le \min \{\tau :\ {{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))\le \epsilon ^{\prime} \}\\ =\min \{\tau :\ {\kappa }_{ij}(\tau )\ge 1-\epsilon ^{\prime} \}\ ,$$

with \(\epsilon ^{\prime} ={d}_{0}\epsilon\) which is what we set out to show. Note that choosing any \(\epsilon ^{\prime} \in (0,1/2)\) ensures exponential convergence rate to the stationary measure59 and by convention, we take the middle of this range and define \({\tau }_{ij}^{{\mathsf{mix}}}:= {\tau }_{ij}^{{\mathsf{mix}}}(1/4)\) to obtain Eq. (3).

Connection between geometric modularity and the symmetric stochastic block model

In this section, we prove that the Boltzmann distribution of cluster assignments given the edge curvatures \({\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\) (Eq. (5)) has equilibrium states which are indistinguishable from the ground truth partition of the SBM. We show this by reducing \({\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\) as well as the posterior distribution \({\mathbb{P}}(C| G)\), to the same constant interaction Ising model (Eq. (7)). In the remainder of this section we work in the sparse regime, where pin, pout = O(1/n).

First, we recap the well-known equivalence of the SBM and the Ising model31. Let E denote the set of edges. The probability distribution of the symmetric SBM for two clusters can be written as41

$${\mathbb{P}}(G| C)=\; {p}_{{\mathsf{out}}}^{e}{(1-{p}_{{\mathsf{out}}})}^{\left(n\atop2\right)-e} \\ \times \mathop{\prod}\limits_{ij\in E}{\left(\frac{{p}_{{\mathsf{in}}}}{{p}_{{\mathsf{out}}}}\right)}^{\delta ({C}_{i},{C}_{j})}\mathop{\prod}\limits_{ij\notin E}{\left(\frac{1-{p}_{{\mathsf{in}}}}{1-{p}_{{\mathsf{out}}}}\right)}^{\delta ({C}_{i},{C}_{j})}\\ \propto \mathop{\prod}\limits_{ij\in E}{\left(\frac{{p}_{{\mathsf{in}}}}{{p}_{{\mathsf{out}}}}\right)}^{\delta ({C}_{i},{C}_{j})}$$

where e is the total number of edges and in the last line we used that the effect of non-edges is weak in the sparse regime. Therefore, by Bayes’ theorem with uniform prior one obtains the posterior distribution \({\mathbb{P}}(C| G)\propto {\mathbb{P}}(G| C)\). As a result, the probability of clusters generated by the SBM is equivalent to the Ising model with uniform interaction with Boltzmann distribution given by Eq. (7)31.

Second, we reduce the Boltzmann distribution of clusters given the edge curvature to same Ising model in Eq. (7). From Eq. (5) we have

$${\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}}) \propto {e}^{{\sum }_{ij}{\kappa }_{ij}(\tau )\delta ({C}_{i},{C}_{j})}\\ \propto {e}^{{\sum }_{ij}[1-{{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),{{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))]\delta ({C}_{i},{C}_{j})}\ ,$$

where in the last line we used the definition of the curvature in Eq. (2). Comparing Eq. (16) with Eq. (7) note that \(1-{{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))\) is non-constant and has a non-linear dependence on the scale τ. However, it is possible to express it in terms of pin, pin to make the connection to the Ising model. Let us write the diffusion measures in Eq. (1) in terms of the spectral decomposition of L as

$${p}_{i}^{k}(\tau )=\mathop{\sum }\limits_{s=1}^{n}{e}^{-{\lambda }_{s}\tau }{\phi }_{s}(k){\phi }_{s}(i)\ .$$

At this point let us remark that in the dense regime where pin, pout = O(1), the first two eigenmodes (λ1, ϕ1) and (λc, ϕc) dominate and the second eigenmode contains the anti-symmetric eigenvector \({\phi }_{c}(u)=1/\sqrt{n}\) when Cu = 1 and \(-1/\sqrt{n}\) when Cu = 2 that is associated with the community structure (Fig. 2c). Thus, one can follow spectral clustering methods29 to find the sparsest cut between clusters using ϕc. In contrast, in the sparse regime, the dominant eigenmodes will be driven by random fluctuations in the node degrees across the graph60, thus spectral clustering algorithms based on L are suboptimal (Fig. 2d).

However, the coupled diffusion pair allows for cancelling out random fluctuations in their spectrum. To see this, consider for a between-edge ij the difference

$$\mathop{\sum}\limits_{ij\in E}{p}_{i}^{k}(\tau )-{p}_{j}^{k}(\tau ) =\mathop{\sum}\limits_{ij\in E}\mathop{\sum }\limits_{s=1}^{n}{e}^{-{\lambda }_{s}\tau }{\phi }_{s}(k)[{\phi }_{s}(i)-{\phi }_{s}(j)]\\ =\mathop{\sum }\limits_{s=1}^{n}{e}^{-{\lambda }_{s}\tau }{\phi }_{s}(k){{\Delta }}{\phi }_{s}\ ,$$

where Δϕs is defined in Eq. (8). The first term involves the constant eigenvector ϕ1 corresponding to the stationary state. Therefore, ϕ1(i) = ϕ1(j) for all ij and thus its contributions cancels out when taking differences. Further, for eigenvectors ϕs with s ≠ 1, c we have asymptotically (n → ) that (Fig. 3)

$${{\Delta }}{\phi }_{s}\to 0$$

As a result, the only contribution we are left with is coming from the anti-symmetric eigenmode (λc, ϕc). Thus we have that

$$\mathop{\sum}\limits_{ij\in E}({p}_{i}^{u}(\tau )-{p}_{j}^{u}(\tau ))=\left\{\begin{array}{l}{\epsilon }_{\phi },\,{{\mbox{if}}}\,{C}_{i}={C}_{j},\hfill \\ {e}^{-{\lambda }_{c}\tau }{\phi }_{c}{{\Delta }}{\phi }_{c}+{\epsilon }_{\phi },\,{{\mbox{if}}}\,{C}_{i}\;\ne\; {C}_{j}\ ,\end{array}\right.$$

where ϵϕ represents the contribution from the random eigenvectors which is negligible in the limit n → .

To compute \({{{{{{{{\mathcal{W}}}}}}}}}_{1}\) in the exponent of Eq. (16), we use Kantorovich–Rubinstein duality (Eq. (12)). Using Eq. (19) in Eq. (12) and ignoring asymptotically small terms, we consider the quantity

$$\mathop{\sum}\limits_{ij\in E}\mathop{\sum}\limits_{u}f(u)\left[{p}_{i}^{u}(\tau )-{p}_{j}^{u}(\tau )\right]\\ =\mathop{\sum}\limits_{ij\in E}{e}^{-{\lambda }_{c}\tau }\mathop{\sum}\limits_{u}f(u){\phi }_{c}(u)\\ =\mathop{\sum}\limits_{ij\in E}\frac{{e}^{-{\lambda }_{c}\tau }}{n}\left[\mathop{\sum}\limits_{u:\ {C}_{u}=1}f(u)-\mathop{\sum}\limits_{u:\ {C}_{u}=2}f(u)\right]\\ =\mathop{\sum}\limits_{ij\in E}\frac{{e}^{-{\lambda }_{c}\tau }}{n}\left[\mathop{\sum}\limits_{u:\ {C}_{u}=1}(f(u)-f(i))-\mathop{\sum}\limits_{u:\ {C}_{u}=2}(f(u)-f(j))\right.\\ \quad\,+\left.\mathop{\sum}\limits_{u:\ {C}_{u}=1}f(i)-\mathop{\sum}\limits_{u:\ {C}_{u}=2}f(j)\right].$$

In the sparse regime, we may make a tree-like approximation in the neighbourhood of i. This means that the number of neighbours of i at distance q inside the cluster is \({p}_{{\mathsf{in}}}^{q}{(n/2)}^{q}\), ignoring terms of order O(1/n) and beyond. Considering only nodes at unit distance (q = 1), we approximate Eq. (20) as

$$\mathop{\sum}\limits_{ij\in E}\frac{{e}^{-{\lambda }_{c}\tau }}{n}\left[\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim i}(f(u)-f(i))-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim i}(f(u)-f(i))\right.\\ \quad\,+\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim j}(f(u)-f(j))-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim j}(f(u)-f(j))\\ \quad\,+\left.\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim i}f(i)-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim i}f(i)+\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim j}f(j)-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim j}f(j)\right] \\ =\mathop{\sum}\limits_{ij\in E}\frac{{e}^{-{\lambda }_{c}\tau }}{n}\left[\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim i}(f(u)-f(i))-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim i}(f(u)-f(i))\right.\\ \quad\,+\mathop{\sum}\limits_{u:\ {C}_{u}=1\atop u \sim j}(f(u)-f(j))-\mathop{\sum}\limits_{u:\ {C}_{u}=2\atop u \sim j\\ }(f(u)-f(j))\\ \quad\,+\left.\frac{n}{2}{p}_{{\mathsf{in}}}(f(i)-f(j))-\frac{n}{2}{p}_{{\mathsf{out}}}(f(i)-f(j))\right].$$

Then, taking the supremum over all 1-Lipschitz functions f, we obtain

$$\begin{array}{ll}&\mathop{\sum}\limits_{ij\in E}{{{{{{{{\mathcal{W}}}}}}}}}_{1}({{{{{{{{\bf{p}}}}}}}}}_{i}(\tau ),\ {{{{{{{{\bf{p}}}}}}}}}_{j}(\tau ))\delta ({C}_{i},\ {C}_{j})\\ &\approx \mathop{\sum}\limits_{ij\in E}{e}^{-{\lambda }_{c}\tau }({p}_{{\mathsf{in}}}+{p}_{{\mathsf{out}}})\left(1+\frac{| {p}_{{\mathsf{in}}}-{p}_{{\mathsf{out}}}| }{2({p}_{{\mathsf{in}}}+{p}_{{\mathsf{out}}})}\right)\delta ({C}_{i},{C}_{j})\end{array}$$

Substituting this into Eq. (16) and noting that pin + pout is constant we obtain at a fixed τ

$${\mathbb{P}}(C| {{{{{{{\boldsymbol{\kappa }}}}}}}})\propto \exp \left[\left(\frac{| {p}_{{\mathsf{in}}}-{p}_{{\mathsf{out}}}| }{2({p}_{{\mathsf{in}}}+{p}_{{\mathsf{out}}})}\right)\mathop{\sum}\limits_{ij\in E}\ \delta ({C}_{i},\ {C}_{j})\right],$$

which up to a constant of proportionality equals the expression in Eq. (9).