First-passage times to quantify and compare structural correlations and heterogeneity in complex systems

Bassolas, Aleix; Nicosia, Vincenzo

doi:10.1038/s42005-021-00580-w

Open access
Published: 15 April 2021

Communications Physics volume 4, Article number: 76 (2021)


Abstract

Virtually all the emergent properties of complex systems are rooted in the non-homogeneous nature of the behaviours of their elements and of the interactions among them. However, heterogeneity and correlations appear simultaneously at multiple relevant scales, making it hard to devise a systematic approach to quantify them. We develop here a scalable and non-parametric framework to characterise the presence of heterogeneity and correlations in a complex system, based on normalised mean first passage times between preassigned classes of nodes. We showcase a variety of concrete applications, including the quantification of polarisation in the UK Brexit referendum and the roll-call votes in the US Congress, the identification of key players in disease spreading, and the comparison of spatial segregation of US cities. These results show that the diffusion structure of a system is indeed a defining aspect of the complexity of its organisation and functioning.

Introduction

The elements of a variety of complex systems can be naturally associated with one of a small number of classes or categories. Typical examples include the organisation to which an individual belongs¹, the political party of a voter², the income level of a household³ or the functional group of a neuron⁴. Quite often, the co-existence of nodes belonging to different classes and the interactions among those classes play a fundamental role in the functioning of a system. For instance, economic and ethnic segregation in cities is known to be associated with the emergence of social inequalities^5,6. Similarly, the organisation of neural cells in the brain and the intricate patterns of relations among different functional areas are known to be responsible for the large variety of cognitive tasks that we as humans are able to perform^7,8. However, obtaining a robust and non-parametric quantification of the heterogeneity of class distributions, especially in systems consisting of a large number of interconnected discrete components, is still an outstanding problem.

The statistics of random walks on interaction graphs have been successfully employed to study the structure of a variety of complex systems^9,10,11,12, and have been used to model and characterise transportation^13,14, biological^15,16 and financial systems¹⁷. It is now established that the diffusion structure of a graph is intimately connected with many of the meaningful aspects of a complex network^18,19, including the existence of communities^20,21,22, the distribution of node roles²³, the global navigability of a system²⁴ and the variability of structural properties in temporal graphs²⁵. Interesting insights into the relation between the structure of a network and the dynamics on it have come from the analysis of transient and long trajectories of random walks on graphs, including the behaviour of the entropy rate^26,27,28, the statistics of first passage and coverage times^12,29,30,31,32 and the systematic study of their fluctuations³³. However, the potential usefulness of random walks to quantify the heterogeneity of class distributions on networks, either in spatially embedded systems or in high-dimensional networks, has only recently been hinted at^33,34,35.

We propose here a principled methodology to quantify the presence of correlations and heterogeneity in the distribution of classes or categories in a complex system. The method is based on the statistics of passage times of a uniform random walk on the graph of interconnections among the units of the system. For instance, in the case of a social system we can construct a graph among individuals based on the observed relations or contacts among them. Similarly, in an urban system we can consider the network of adjacency between census tracts or the connections among census tracts due to human mobility flows. By normalising class mean first passage times (CMFPT) with respect to a null-model where classes are reassigned to nodes uniformly at random, we can effectively quantify and compare the heterogeneity of class distributions in systems of different nature, size and shape.

It is important to stress that, although random walks have been widely used to identify modules and communities of nodes based on how tightly connected they are^20,21,22,36, here we focus on systems whose nodes have preassigned classes that do not necessarily coincide with their location in the graph. We thus study the dual problem of quantifying the heterogeneity and correlations induced by a fixed and exogenous assignment of nodes to classes. We test our framework on a variety of systems with simple geometries and ad-hoc class assignments, and then apply it to three real-world scenarios, namely the quantification of polarisation in the Brexit referendum and in the US Congress since 1926, the role of face-to-face interactions among individuals in the spread of an epidemic, and the relation between economic segregation and prevalence of crime in the 53 US cities with more than one million citizens.

Results

Let us consider a graph G(V, E) with ∣E∣ = K edges on ∣V∣ = N nodes and adjacency matrix A = {a_ij}, and a given colouring function $f:V\to {\mathcal{C}},\quad {\mathcal{C}}=\{1,2,\ldots ,C\}$, which associates each node i of G with a discrete label f(i) = c_i. Let us also consider a random walk on G, defined by the transition matrix Π = {π_ij}, where π_ij is the probability that a walker at node i jumps to node j in one step (see Fig. 1a). In general, Π could be any row-stochastic transition matrix, but in the following we will consider only uniform random walks, for which π_ij = a_ij/k_i, where k_i = ∑_j a_ij is the degree of node i.
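To make the uniform walk concrete, here is a minimal NumPy sketch that builds Π from a dense adjacency matrix, assuming an undirected graph without isolated nodes. The helper name `uniform_transition_matrix` and the small path graph are illustrative, not part of the original work.

```python
import numpy as np

def uniform_transition_matrix(A):
    """Uniform random walk: pi_ij = a_ij / k_i, with k_i = sum_j a_ij."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                  # (weighted) degree of each node
    if np.any(k == 0):
        raise ValueError("isolated nodes have no outgoing transitions")
    return A / k[:, None]              # divide row i by k_i -> row-stochastic

# Path graph 0 - 1 - 2 - 3: node 1 jumps to node 0 or node 2 with probability 1/2,
# while the end node 3 always jumps to node 2.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Pi = uniform_transition_matrix(A)
```

For large graphs one would normally keep A in a sparse format; the dense version is used here only for readability.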

**Fig. 1: Characterising colour distributions in regular lattices.**

The method we propose builds on the classical research on mean first passage times (MFPT) between pairs of nodes in a graph^9,10,11,12, and focuses instead on the distribution of CMFPT, i.e., the expected number of steps τ_αβ needed by a random walker to visit for the first time a node of a certain class β when it starts from a node of class α (see Fig. 1a). Consider as an example the two regular square lattices shown in Fig. 1b, c. In both graphs nodes are divided into five classes of equal size. In the graph in Fig. 1b the class to which a node belongs is chosen uniformly at random, while in Fig. 1c the nodes are grouped into a small number of homogeneous clusters. We are interested here in the statistical properties of the symbolic dynamics ${W}_{i}=\{{c}_{{i}_{0}},{c}_{{i}_{1}},\ldots \}$ of node labels or classes visited by the walk W when starting from i. Intuitively, we expect that, for long-enough times, all the trajectories of random walks starting from each of the N nodes of the graph in Fig. 1b will be associated with the same symbolic dynamics, thus becoming indistinguishable. Indeed, that system has neither marked structural heterogeneity, since all the nodes have the same degree (except for the nodes at the border of the grid), nor inhomogeneity in class assignments, since the probability for a node to belong to a certain class does not depend on its position in the graph or on the classes of its neighbours. In particular, a random walker starting at any node belonging to, say, the light-blue class (see nodes 1 and 2 in Fig. 1b) will require on average a small number of steps to hit an orange node, since orange nodes can be found in the vicinity of any blue node.

If the association of nodes to classes induces compact clusters, as in the lattice shown in Fig. 1c, then the statistical properties of the symbolic dynamics W_i will heavily depend on the starting node i, despite the fact that almost all the nodes have identical degree. In particular, a random walker starting in the blue cluster at the top-left corner of the graph (node 3) will in general require a very large number of steps before hitting for the first time an orange node. Conversely, a random walker starting at node 4 will hit a node in the orange cluster in a much smaller number of steps, just because one of its immediate neighbours is indeed orange.
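The symbolic dynamics W_i and the first-passage times discussed above can be estimated by simulating the walk directly. The sketch below is a Monte Carlo cross-check added for illustration only (the paper itself relies on the exact linear-algebra solution introduced next); the function names, the path graph and the class labels `'a'` and `'b'` are hypothetical.

```python
import numpy as np

def symbolic_walk(Pi, labels, start, n_steps, rng):
    """Classes c_{i_0}, c_{i_1}, ... visited by a walk on Pi started at node `start`."""
    node, seq = start, [labels[start]]
    for _ in range(n_steps):
        node = rng.choice(len(labels), p=Pi[node])   # one step of the walk
        seq.append(labels[node])
    return seq

def mc_first_passage(Pi, labels, start, alpha, n_walks, rng):
    """Monte Carlo estimate of the mean number of steps to first reach class alpha.
    If `start` already belongs to class alpha, this estimates the return time instead."""
    times = np.empty(n_walks)
    for w in range(n_walks):
        node, t = start, 0
        while True:
            node = rng.choice(len(labels), p=Pi[node])
            t += 1
            if labels[node] == alpha:
                break
        times[w] = t
    return times.mean()

# Path 0 - 1 - 2 - 3 with node 0 in class 'a' and the other nodes in class 'b'.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
Pi = A / A.sum(axis=1)[:, None]
labels = ['a', 'b', 'b', 'b']
rng = np.random.default_rng(42)
print(symbolic_walk(Pi, labels, 3, 10, rng))
print(mc_first_passage(Pi, labels, 3, 'a', 4000, rng))   # close to the exact value 9
```

Simulation converges slowly and is only practical for small graphs, which is why exact solutions of the self-consistent equations are preferable.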

Class mean first passage times

A quantity of interest in the study of a symbolic dynamics over graphs is the expected time needed to hit a certain node for the first time, usually known in the literature as the hitting time or MFPT²⁹. We denote by T_i,α the MFPT from node i to class α, i.e., the expected number of steps needed by a random walk starting on i to visit for the first time any node j such that f(j) = α. Following the formalism used to derive the MFPTs between nodes in a graph¹⁸, we can write a set of self-consistent equations for T_i,α:

$${T}_{i,\alpha }=1+\mathop{\sum }\limits_{j=1}^{N}\left(1-{\delta }_{f(j),\alpha }\right){\pi }_{ij}{T}_{j,\alpha }.$$

(1)

The analytic solution for Eq. (1) depends only on the structure of the graph and on the colouring function f. Let us denote by T_α the column vector of hitting times from nodes of class β ≠ α to nodes of class α. By convention we set ${\{{T}_{\alpha }\}}_{i}=0$ if f(i) = α. The self-consistent equation for hitting times can then be written in vector form as follows:

$${T}_{\alpha }={D}_{\overline{\alpha }}+{{{\Pi }}}_{\overline{\alpha }}{T}_{\alpha }$$

where ${{{\Pi }}}_{\overline{\alpha }}$ is the transition matrix of the walk where all the rows and columns corresponding to nodes of class α are set to zero. We denote by D_α the indicator vector of nodes belonging to class α, and by ${D}_{\overline{\alpha }}={{\boldsymbol{1}}}_{N}-{D}_{\alpha }$ the indicator vector of nodes not belonging to class α, i.e., ${\{{D}_{\overline{\alpha }}\}}_{i}=1-{\delta }_{f(i),\alpha }$. This leads to the solution:

$${T}_{\alpha }={\left[I-{{{\Pi }}}_{\overline{\alpha }}\right]}^{-1}{D}_{\overline{\alpha }}$$

(2)
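Eq. (2) translates directly into a linear solve. The following is a minimal sketch, assuming a dense row-stochastic NumPy matrix `Pi` and that class α can be reached from every node (otherwise I − Π_ᾱ is singular); the function name `class_hitting_times` is illustrative. Solving the linear system is numerically preferable to computing the inverse explicitly.

```python
import numpy as np

def class_hitting_times(Pi, labels, alpha):
    """Vector T_alpha of Eq. (2); entries of nodes in class alpha are 0 by convention."""
    Pi = np.asarray(Pi, dtype=float)
    D_bar = (np.asarray(labels) != alpha).astype(float)  # indicator of nodes not in alpha
    Pi_bar = Pi * np.outer(D_bar, D_bar)    # zero the rows and columns of class-alpha nodes
    N = len(D_bar)
    # Solve [I - Pi_bar] T = D_bar rather than inverting the matrix explicitly.
    return np.linalg.solve(np.eye(N) - Pi_bar, D_bar)

# Path 0 - 1 - 2 - 3 with only node 0 in class 'a'.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
Pi = A / A.sum(axis=1)[:, None]
T_a = class_hitting_times(Pi, ['a', 'b', 'b', 'b'], 'a')   # approximately [0, 5, 8, 9]
```

The values 5, 8 and 9 for nodes 1, 2 and 3 can be checked by hand from Eq. (1).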

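The average class MFPT τ_αβ from class α to class β is then obtained by averaging T_β over the nodes of class α:

$${\tau }_{\alpha \beta }=\frac{1}{{N}_{\alpha }}{D}_{\alpha }^{\top }{T}_{\beta }=\frac{1}{{N}_{\alpha }}\mathop{\sum }\limits_{j=1}^{N}{\{{D}_{\alpha }\}}_{j}{\{{T}_{\beta }\}}_{j}$$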

(3)

where N_α is the number of nodes belonging to class α.
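Combining Eqs. (2) and (3) gives the full C × C matrix of CMFPTs. The sketch below (hypothetical helper `class_mfpt_matrix`, uniform walk assumed) fills only the off-diagonal entries, since the diagonal requires the return times of Eq. (5); on a four-node path it already shows that τ_αβ ≠ τ_βα in general.

```python
import numpy as np

def class_mfpt_matrix(A, labels):
    """Matrix tau[a, b] of Eq. (3) for a uniform random walk on adjacency matrix A.
    Diagonal entries are left at 0: return times need Eq. (5) instead."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    Pi = A / A.sum(axis=1)[:, None]                   # pi_ij = a_ij / k_i
    classes = list(np.unique(labels))
    N = len(labels)
    tau = np.zeros((len(classes), len(classes)))
    for b, beta in enumerate(classes):
        D_bar = (labels != beta).astype(float)
        T_beta = np.linalg.solve(np.eye(N) - Pi * np.outer(D_bar, D_bar), D_bar)  # Eq. (2)
        for a, alpha in enumerate(classes):
            if a != b:
                D_alpha = (labels == alpha).astype(float)
                tau[a, b] = D_alpha @ T_beta / D_alpha.sum()             # Eq. (3)
    return classes, tau

# Path 0 - 1 - 2 - 3 with only node 0 in class 'a'.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
classes, tau = class_mfpt_matrix(A, ['a', 'b', 'b', 'b'])
# tau[0, 1] (tau_ab) is 1, while tau[1, 0] (tau_ba) is 22/3.
```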

The return time to class α, which is the expected number of steps needed by a walker starting on a node of class α to hit a node of class α again (possibly its own starting point), can be computed in a similar way. The forward equation for the return time R_i,α of a walk starting at a node i of class α reads:

$${R}_{i,\alpha }=\mathop{\sum }\limits_{j=1}^{N}{\delta }_{f(j),\alpha }{\pi }_{ij}+\mathop{\sum }\limits_{j=1}^{N}(1-{\delta }_{f(j),\alpha }){\pi }_{ij}(1+{T}_{j,\alpha })$$

(4)

where the first contribution accounts for the neighbours of node i that actually belong to class α, while the second contribution corresponds to walks passing through immediate neighbours of i not belonging to class α. The equation can be written in a compact form as follows:

$${R}_{\alpha }={{{\Pi }}}_{\alpha \alpha }{D}_{\alpha }+{{{\Pi }}}_{\alpha \overline{\alpha }}\left[{T}_{\alpha }+{D}_{\overline{\alpha }}\right]$$

(5)

where R_α is the vector of return times to class α, such that ${\left\{{R}_{\alpha }\right\}}_{i}=0$ if f(i) ≠ α, and T_α is the vector of MFPT to class α from nodes that do not belong to class α, as above. Here we denote by Π_αα the transition matrix of the walk restricted to nodes of class α, i.e., whose generic element π_ij is set to 0 if either i or j does not belong to class α. Similarly, ${{{\Pi }}}_{\alpha \overline{\alpha }}$ is the transition matrix restricted to links from nodes of class α to nodes not in class α.
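A sketch of Eq. (5) under the same assumptions (dense NumPy matrices, uniform walk; `class_return_times` is an illustrative name). As an independent sanity check we use Kac's lemma, a standard result not discussed in the text: for a uniform walk on a connected undirected graph, the return time to a set S, averaged over the nodes of S with weights proportional to their degree, equals 2K divided by the total degree of S.

```python
import numpy as np

def class_return_times(Pi, labels, alpha):
    """Vector R_alpha of Eq. (5); entries of nodes outside class alpha are 0."""
    Pi = np.asarray(Pi, dtype=float)
    D_alpha = (np.asarray(labels) == alpha).astype(float)
    D_bar = 1.0 - D_alpha
    N = len(D_alpha)
    T_alpha = np.linalg.solve(np.eye(N) - Pi * np.outer(D_bar, D_bar), D_bar)  # Eq. (2)
    Pi_aa = Pi * np.outer(D_alpha, D_alpha)       # links inside class alpha
    Pi_a_abar = Pi * np.outer(D_alpha, D_bar)     # links from alpha to other classes
    return Pi_aa @ D_alpha + Pi_a_abar @ (T_alpha + D_bar)                   # Eq. (5)

A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
k = A.sum(axis=1)
Pi = A / k[:, None]
labels = np.array(['a', 'b', 'b', 'b'])
R_b = class_return_times(Pi, labels, 'b')         # approximately [0, 1.5, 1, 1]
# Kac's lemma: degree-weighted mean return time equals 2K / (total degree of the class).
in_b = labels == 'b'
print(np.isclose(k[in_b] @ R_b[in_b] / k[in_b].sum(), k.sum() / k[in_b].sum()))  # True
```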

By solving Eqs. (2) and (5) for the grid lattice shown in Fig. 1b we obtain the distribution of CMFPTs provided in Fig. 2a, while for the lattice with clusters in Fig. 1c we obtain the values shown in Fig. 2b. Notice that, as expected, the CMFPTs in the lattice with compact clusters are in general much higher than those observed in the same lattice with uniformly random class assignments. Moreover, in both cases we have in general τ_αβ ≠ τ_βα, since CMFPTs depend primarily on the shape and size of clusters and on the actual fine-grained arrangement of colours.

**Fig. 2: Class mean first passage times (CMFPT) in synthetic colour distributions.**

One could argue that the situation shown in Fig. 1b could potentially be represented by an equivalent mean-field network consisting of a clique of just five nodes, one for each class, in which the probability to jump in one step from a class to any of the other ones is the same for all the classes. However, such a mean-field approximation (see Methods for details) only retains information about the average probability of jumping between two classes in one time-step, and thus discards all the (possibly relevant) structural information contained in the underlying graph. In particular, in a mean-field model of C nodes in which each class is represented as a supernode, the actual distribution of distances among classes is grossly underestimated and flattened, resulting in an underestimation of CMFPTs. For instance, if we use the mean-field approximation for the lattice with random colour assignments shown in Fig. 1b, we get the solutions shown in Fig. 2c, where the values of CMFPT between different classes are substantially smaller than those computed above using Eq. (2). The analysis of suitable mean-field approximations for the computation of CMFPTs is actually quite intricate, and will be thoroughly addressed in a forthcoming paper.

Simple geometries

We compute here the distribution of CMFPT for some simple geometries with simple class assignments. These examples aim at showing that CMFPT depends heavily on class assignment and on the way nodes belonging to the same class are arranged. In all these cases, the symmetric nature of colour assignments will allow us to perform the computations on a minimal weighted graph whose distribution of CMFPT is identical to that of the original system. Notice that those minimal weighted graphs provide exact solutions for the original colour assignment they represent, and should not be confused with mean-field approximations.

The first example is that of a two-dimensional square lattice with periodic boundary conditions (a torus), whose nodes are organised in alternate stripes of black and white nodes, as shown in Fig. 3a. Thanks to the symmetric nature of this specific colour assignment, a uniform random walk on that graph is effectively equivalent to a random walk on the weighted minimal two-node graph shown in Fig. 3b, with the transition matrix:

$${{\Pi }}=\left[\begin{array}{ll}\frac{1}{2}&\frac{1}{2}\\ \frac{1}{2}&\frac{1}{2}\end{array}\right]$$

The hitting time to the black node in the minimal equivalent graph when starting from the white node can be written as follows:

$${T}_{\circ ,\bullet }=1+\frac{1}{2}{T}_{\circ ,\bullet }$$

which gives T_∘,• = 2 and, by symmetry, also T_•,∘ = 2.
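The two-node reduction can be checked against the full lattice. This sketch (our own, with an illustrative 6 × 6 torus) builds the periodic lattice with alternating row colours and solves Eq. (2) on it; every white node should have hitting time 2 to the black class, matching T_∘,• = 2.

```python
import numpy as np

def torus_adjacency(L):
    """L x L square lattice with periodic boundaries; node i = x * L + y."""
    A = np.zeros((L * L, L * L))
    for x in range(L):
        for y in range(L):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                A[x * L + y, ((x + dx) % L) * L + (y + dy) % L] = 1.0
    return A

L = 6                                      # even L, so the stripes wrap around consistently
A = torus_adjacency(L)
Pi = A / A.sum(axis=1)[:, None]
stripes = np.repeat(np.arange(L) % 2, L)   # row x gets colour x % 2 (1 = black, 0 = white)
D_bar = (stripes != 1).astype(float)       # nodes not in the black class
T_black = np.linalg.solve(np.eye(L * L) - Pi * np.outer(D_bar, D_bar), D_bar)  # Eq. (2)
print(T_black[stripes == 0])               # every entry is (numerically) 2
```

Each white node has two white neighbours in its own row and two black neighbours in the adjacent rows, so every step hits the black class with probability 1/2, which gives a mean of 2 steps.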

**Fig. 3: Characterisation of colour distributions in simple geometries through class mean first passage times (CMFPT).**

As a second example we consider a finite chain of white nodes surrounded by black nodes, as shown in Fig. 3c. We are interested here in showing how the length L of a linear cluster of a given colour influences the distribution of CMFPT across the cluster, so we will focus on the CMFPT from nodes of class ∘ to nodes of class •. In this case the system has a mirror symmetry, which effectively allows us to focus on the $\lceil L/2\rceil$ nodes in either half of the chain. The only caveat is that weighted minimal graphs associated with chains with even length L are slightly different from those associated with chains with odd length, as shown in Fig. 3d. For L ≥ 6, the closed expression for the distribution of CMFPT from each node in the chain is:

$${T}_{k,\bullet }=\left[\frac{4+{A}_{M-1}}{4-{B}_{M-1}}\right]\mathop{\prod }\limits_{\ell =1}^{k}{B}_{M-\ell }+\mathop{\sum }\limits_{j=1}^{k-1}{A}_{M-k+j}\mathop{\prod }\limits_{\ell =0}^{j-1}{B}_{M-k+\ell }$$

(6)

where $M=\left\lfloor \frac{L-1}{2}\right\rfloor$, and A_k and B_k are two rational sequences whose form depends on whether the length of the chain is even or odd. More details about the derivation are provided in Supplementary Note 1, where we also compute the distributions for 1 ≤ L < 6. In Fig. 3e we report both the distribution of T_k,• across the chain and the average CMFPT to nodes of class •. Notice that when L ≫ 1, ⟨T_•⟩ converges to 2, which is the same value obtained in the case of a lattice with rows of alternate colours seen above, as expected.

As a final example we consider the infinite cylinder shown in Fig. 3f, where the nodes in the upper M + 1 rows are of class ∘ and those in the bottom row are of class •. The aim of this example is to show the behaviour of CMFPT as a cluster becomes deeper, i.e., as the nodes in the cluster of a certain class are placed farther away from the frontier with the other cluster. For our purposes, this geometry is effectively equivalent to the linear chain of nodes shown in Fig. 3g, where each row of the original graph is represented by a single node in the minimal graph, with an appropriately weighted self-loop. We can write a set of self-consistent equations for the hitting time to class • for a walk started on each of the rows k = 0, 1, …, M:

$$\begin{aligned}{T}_{0,\bullet }&=2+\frac{1}{2}{T}_{1,\bullet }\\ {T}_{k,\bullet }&=2+\frac{1}{2}\left[{T}_{k-1,\bullet }+{T}_{k+1,\bullet }\right],\quad k=1,\ldots ,M-1\\ {T}_{M,\bullet }&=3+{T}_{M-1,\bullet }\end{aligned}$$

(7)

whose solution is:

$${T}_{k,\bullet }=(k+1)(3+4M-2k),\quad k=0,1,\ldots ,M-1$$

(8)

and

$${T}_{M,\bullet }=3+M(5+2M)$$

(9)

The full derivation is reported in Supplementary Note 1.
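Eqs. (8) and (9) can be verified numerically by solving the tridiagonal system (7) directly. The following sketch does so for a few heights M and also checks the average of Eq. (10); the function names are hypothetical and the check assumes M ≥ 1.

```python
import numpy as np

def cylinder_hitting_times(M):
    """Solve the linear system (7) for T_{k,black}, k = 0, ..., M (M >= 1)."""
    n = M + 1
    C = np.zeros((n, n))
    rhs = np.full(n, 2.0)
    C[0, 0], C[0, 1] = 1.0, -0.5                    # T_0 = 2 + T_1 / 2
    for k in range(1, M):                           # T_k = 2 + (T_{k-1} + T_{k+1}) / 2
        C[k, k - 1], C[k, k], C[k, k + 1] = -0.5, 1.0, -0.5
    C[M, M - 1], C[M, M], rhs[M] = -1.0, 1.0, 3.0   # T_M = 3 + T_{M-1}
    return np.linalg.solve(C, rhs)

def cylinder_closed_form(M):
    """Eqs. (8) and (9)."""
    k = np.arange(M)
    return np.append((k + 1) * (3 + 4 * M - 2 * k), 3 + M * (5 + 2 * M))

for M in (1, 5, 20):
    T = cylinder_hitting_times(M)
    mean_eq10 = (8 * M**3 + 33 * M**2 + 43 * M + 18) / (6 * (M + 1))
    print(M, np.allclose(T, cylinder_closed_form(M)), np.isclose(T.mean(), mean_eq10))
```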

In Fig. 3h we show the scaling of the average MFPT to class •, defined as follows:

$$\langle {T}_{\bullet }\rangle =\frac{1}{M+1}\mathop{\sum }\limits_{k=0}^{M}{T}_{k,\bullet }=\frac{8{M}^{3}+33{M}^{2}+43M+18}{6(M+1)}$$

(10)

$$\simeq \frac{1}{6}\left(8{M}^{2}+25M\right)$$

(11)

where the approximation is accurate for M ≫ 1. This means that the average MFPT to class • scales as the square of the height of the cylinder. In Fig. 3h we also show the scaling of the second moment of T_k,•:

$$\langle {T}_{\bullet }^{2}\rangle =\frac{32{M}^{5}+355{M}^{4}}{15(M+1)}+o({M}^{2})\simeq \frac{32{M}^{4}+323{M}^{3}}{15}$$

(12)

which indicates that the standard deviation of T_k,• also grows as O(M²). In other words, the deeper a cluster, the higher the CMFPTs to its border and the wider the distribution of CMFPTs from the individual nodes of the cluster to the border.
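Two consequences of Eqs. (10) and (12) can be made explicit with a few lines of algebra (our own derivation, not taken from the Supplementary Notes). First, the numerator of Eq. (10) is divisible by M + 1, so the average is an exact quadratic and Eq. (11) only drops its constant term:

$$\langle {T}_{\bullet }\rangle =\frac{(M+1)(8{M}^{2}+25M+18)}{6(M+1)}=\frac{8{M}^{2}+25M}{6}+3$$

Second, the leading terms of Eqs. (11) and (12) fix the prefactor of the O(M²) standard deviation for M ≫ 1:

$${\sigma }_{\bullet }^{2}=\langle {T}_{\bullet }^{2}\rangle -{\langle {T}_{\bullet }\rangle }^{2}\simeq \left(\frac{32}{15}-\frac{16}{9}\right){M}^{4}=\frac{16}{45}{M}^{4},\qquad {\sigma }_{\bullet }\simeq \frac{4}{3\sqrt{5}}{M}^{2}\approx 0.60\,{M}^{2}$$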

Synthetic colourings in two-dimensional lattices

We study here the distribution of CMFPTs in a two-dimensional square lattice with N = L × L nodes, where each node is associated with one of two possible classes, namely • or ∘, depending on its position in the lattice. Without loss of generality, we denote by $r=\frac{{N}_{\bullet }}{N}$ the relative abundance of • nodes. We then assign N_• nodes to class • by sampling their coordinates (x, y) from a symmetric two-dimensional Gaussian distribution centred in the middle of the lattice, with standard deviation σ along each axis (the distribution of y has the same form):

$$P(x)=\frac{1}{\sigma \sqrt{2\pi }}{e}^{-{\left(x-L/2\right)}^{2}/2{\sigma }^{2}},$$

(13)

By tuning r and σ, this model allows for a continuous transition between homogeneous distributions of colours and patterns with strong class segregation. In Fig. 4a–f we show sample sketches of what the spatial distribution of colours looks like for different values of σ and r. For very large values of σ, the picture approaches a homogeneous distribution, regardless of the relative abundance of the two colours. As σ decreases, instead, the nodes of class • become more strongly clustered around the centre. The role of the relative abundance of colours is evident from the comparison of Fig. 4b and c, which show two configurations with the same σ for r = 0.9 and r = 0.5, respectively. In particular, the relative abundance of the two colours also plays a non-trivial role in determining the degree of mixing between the two classes, since more ∘ nodes can be found inside the • cluster for r = 0.5 than for r = 0.9.
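A minimal sketch of the synthetic colouring. Eq. (13) only specifies the sampling density, so the handling of repeated or out-of-lattice draws is our assumption: here the N_• = round(rN) black sites are drawn without replacement, with probability proportional to the Gaussian density evaluated at each lattice site. `gaussian_colouring` and its `seed` argument are illustrative names.

```python
import numpy as np

def gaussian_colouring(L, r, sigma, seed=None):
    """Labels (1 = black, 0 = white) for the N = L * L nodes of a square lattice.

    Assumption: round(r * N) distinct sites are drawn without replacement, each with
    probability proportional to a symmetric 2D Gaussian of width sigma centred on the
    lattice (node i sits at (x, y) = divmod(i, L))."""
    rng = np.random.default_rng(seed)
    N = L * L
    x, y = np.divmod(np.arange(N), L)
    c = (L - 1) / 2                             # centre of the lattice (0-indexed)
    w = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
    w = np.maximum(w, 1e-300)                   # avoid exact zeros for very small sigma
    n_black = int(round(r * N))
    black = rng.choice(N, size=n_black, replace=False, p=w / w.sum())
    labels = np.zeros(N, dtype=int)
    labels[black] = 1
    return labels

labels = gaussian_colouring(20, 0.3, 3.0, seed=1)   # 120 black nodes near the centre
```

The small floor on the weights only matters for very small σ, where it lets the remaining black nodes spill uniformly onto sites whose Gaussian weight underflows to zero.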

**Fig. 4: Characterisation of colour distributions in 2D lattices.**

A summary of the interplay between these two parameters is reported in Fig. 4g. Since different values of r produce systems with a different level of colour imbalance, we need to properly normalise the values of CMFPT measured in each configuration. We use the normalised CMFPT ${\widetilde{\tau }}_{\bullet \to \circ }$ and ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$, where ${\widetilde{\tau }}_{\alpha \beta }=\frac{{\tau }_{\alpha \beta }}{{\tau }_{\alpha \beta }^{{\rm{null}}}}$, and ${\tau }_{\alpha \beta }^{{\rm{null}}}$ is the CMFPT from class α to class β in a null-model where the structure of the graph is preserved but nodes are assigned to classes uniformly at random, preserving the relative abundance of each class (see Methods). The symbols in Fig. 4g correspond to the values of ${\widetilde{\tau }}_{\bullet \to \circ }$ and ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$ obtained for a variety of values of r and σ (the points corresponding to the configurations in Fig. 4a–f are labelled accordingly).
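The normalisation by the null model can be sketched as follows, assuming that τ^null_αβ is estimated as the mean CMFPT over `n_null` random permutations of the labels on the same graph (see Methods for the authors' exact procedure). The ring example and the function names are hypothetical.

```python
import numpy as np

def cmfpt(Pi, labels, alpha, beta):
    """tau_{alpha beta} of Eq. (3) for a given labelling (alpha != beta)."""
    N = len(labels)
    D_bar = (labels != beta).astype(float)
    T_beta = np.linalg.solve(np.eye(N) - Pi * np.outer(D_bar, D_bar), D_bar)  # Eq. (2)
    return T_beta[labels == alpha].mean()

def normalised_cmfpt(A, labels, alpha, beta, n_null=200, seed=None):
    """tau_{alpha beta} / tau^null_{alpha beta}; the null model keeps the graph and
    shuffles the labels uniformly at random, preserving the size of each class."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    Pi = A / A.sum(axis=1)[:, None]
    tau = cmfpt(Pi, labels, alpha, beta)
    tau_null = np.mean([cmfpt(Pi, rng.permutation(labels), alpha, beta)
                        for _ in range(n_null)])
    return tau / tau_null

# Ring of 20 nodes split into two compact halves.
N = 20
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
halves = np.array([0] * 10 + [1] * 10)
print(normalised_cmfpt(A, halves, 0, 1, seed=0))   # well above 1: segregated classes
```

Two compact halves of a ring give a normalised CMFPT well above 1, while perfectly alternating labels give a value below 1, since each node is then adjacent to the other class.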

For large values of σ, we expect a more homogeneous distribution of the two colours (see Fig. 4a), and indeed we have ${\widetilde{\tau }}_{\bullet \to \circ }\simeq 1$, meaning that the distribution of CMFPT from • nodes is compatible with the one observed in the corresponding null-model. At the same time, the ratio ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$ is close to 1 as well, meaning that the normalised CMFPTs of the two classes are indistinguishable. As σ decreases, the • cluster becomes more prominent, but the actual relation between ${\widetilde{\tau }}_{\bullet \to \circ }$ and ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$ will depend on the value of r. In particular, if r > 0.5, i.e., nodes of class • are the majority (see Fig. 4b, d), the ratio ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$ normally remains larger than 1. This is again expected, since a deeper cluster of • nodes causes an increase in the CMFPT from • to ∘ nodes, in line with the increase in CMFPT observed in the simple geometry shown in Fig. 3c. Conversely, if r < 0.5 then • nodes are interspersed within a large cluster of ∘ nodes (see Fig. 4e), making it harder for a walker started at a ∘ node to find a • node. This results in values of ${\widetilde{\tau }}_{\circ \to \bullet }$ larger than 1, i.e., CMFPTs much longer than in the corresponding null-model, and in ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }\,<\,1$. There are some particular situations (see Fig. 4c, f) in which, despite the increase in ${\widetilde{\tau }}_{\bullet \to \circ }$, the ratio ${\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet }$ remains close to 1. This is because in these cases the two classes are distributed in a similar fashion, and the depths of the two clusters are comparable.

It is worth noting that the $({\widetilde{\tau }}_{\bullet \to \circ },{\widetilde{\tau }}_{\bullet \to \circ }/{\widetilde{\tau }}_{\circ \to \bullet })$ phase space provides an intuitive interpretation and a powerful visualisation of the heterogeneity of the distributions of two classes, and could thus be useful for characterising the spatial heterogeneity of generic binary feature distributions and point statistics^37,38.

Polarisation and segregation in voting dynamics

Among the variety of social dynamics that exhibit complex behaviours, voting is possibly one of the most interesting. And not just because anything concerning politics can spur endless and ferocious discussions, but also because voting patterns are the result of the interplay of a variety of factors that are generally difficult to model in an accurate way, including socio-economic and cultural background and spatial and temporal correlations^2,39. We focus here on two examples where voting dynamics result in the emergence of heterogeneity and correlations, namely the spatial clustering of Leave/Remain voters in the Brexit referendum and the polarisation of opinions of roll-call votes in the US Congress.

Spatial heterogeneity of Brexit vote

Here we show how inter-class MFPTs can be used to quantify the spatial segregation of voting patterns. We consider the results of the so-called Brexit referendum, held in the United Kingdom in 2016 to decide whether to leave the European Union. The referendum had a turnout of 72.2%, and 51.9% of the voters expressed the preference to leave the EU. The results of the referendum have been analysed in several works⁴⁰ that have outlined interesting correlations of voting preference with a variety of socio-economic indicators, including age, income level, unemployment and level of education⁴¹. One of the most intriguing aspects of the results was that the vote was highly segregated. Indeed, highly urbanised areas, as well as the majority of constituencies in Scotland, voted preferentially for Remain, while the rest of the country expressed a preference to Leave.

We constructed the planar graph of constituencies in Great Britain (i.e., all the mainland constituencies in England, Wales and Scotland, leaving out the few constituencies in Northern Ireland), where each node is associated with a constituency and a link between two nodes exists if the corresponding constituencies border each other. We assigned each node to either Leave or Remain according to which option won the majority of votes in the corresponding constituency. We then computed the normalised average CMFPTs between Remain (R) and Leave (L) constituencies, obtaining the values shown in Table 1. It is worth noting that while the normalised return time to each class is close to 1 in both cases, i.e., it is consistent with the corresponding null-model where classes are reassigned to nodes uniformly at random, the inter-class MFPTs exhibit a pronounced disparity between the two classes. In particular, the normalised CMFPT from Leave to Remain ${\widetilde{\tau }}_{LR}$ (2.827) is much smaller than its counterpart ${\widetilde{\tau }}_{RL}$ (12.304). This means that, on average, in this graph it is much easier for a random walker starting at a node whose citizens expressed a majority of votes for Leave to arrive at a node where people preferentially voted for Remain than the other way around. Put another way, it was much easier for a Leave supporter wandering through the graph to meet a Remain supporter than the other way around.

Table 1 Normalised class mean first passage times (CMFPT) between constituencies that voted for Leave (L) or Remain (R) in the Brexit referendum, when classes are assigned according to the option that received the majority of the votes.

A possible interpretation of this result is the presence of a structural reinforcement of segregation. Indeed, if we assume that voters could influence each other by discussing the matter of the referendum with voters holding opposite opinions, then a person determined to vote for Remain would have had a much harder time finding a Leave supporter to convince. On the contrary, Leave supporters would have been able to find Remain supporters much more easily, as Leave constituencies are on average closer (in terms of MFPT) to Remain constituencies than the other way around.

However, this picture is completely overturned if we take into account the actual percentage of votes for Leave and Remain in each constituency, instead of recording only which option obtained the majority. We considered an ensemble of colour assignments obtained by assigning colour L to node i with probability p_L(i) equal to the percentage of Leave voters in constituency i. For instance, if in a given constituency i we had 45% of votes for Leave, then node i is assigned to class L with probability 0.45, and to class R with probability 0.55. We computed the inter-class MFPTs in this ensemble of colourings, and normalised them by the corresponding values in a null-model where we reassigned vote proportions among nodes uniformly at random. The results are reported in Table 2.
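The probabilistic colourings can be sketched as below. We assume that ensemble averages of τ_LR and τ_RL are taken first and then divided by the corresponding averages in the null ensemble, in which the vote proportions are permuted across nodes; the ring graph and the vote shares are invented for illustration, and `ensemble_cmfpt` is a hypothetical name.

```python
import numpy as np

def ensemble_cmfpt(Pi, p_leave, n_samples, rng, shuffle=False):
    """Average (tau_LR, tau_RL) over colourings where node i is Leave with probability
    p_leave[i]; with shuffle=True the proportions are first permuted across nodes."""
    p_leave = np.asarray(p_leave, dtype=float)
    N = len(p_leave)
    tau_LR, tau_RL = [], []
    for _ in range(n_samples):
        p = rng.permutation(p_leave) if shuffle else p_leave
        is_leave = rng.random(N) < p
        if is_leave.all() or not is_leave.any():
            continue                      # skip samples in which one class is empty
        for src, out in ((is_leave, tau_LR), (~is_leave, tau_RL)):
            D_bar = src.astype(float)     # with two classes, "not target" = source class
            T = np.linalg.solve(np.eye(N) - Pi * np.outer(D_bar, D_bar), D_bar)  # Eq. (2)
            out.append(T[src].mean())                                           # Eq. (3)
    return np.mean(tau_LR), np.mean(tau_RL)

# Hypothetical ring of 30 "constituencies" with invented Leave shares.
N = 30
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0
Pi = A / A.sum(axis=1)[:, None]
p_leave = np.r_[np.full(15, 0.7), np.full(15, 0.3)]
rng = np.random.default_rng(0)
data = ensemble_cmfpt(Pi, p_leave, 200, rng)
null = ensemble_cmfpt(Pi, p_leave, 200, rng, shuffle=True)
print(data[0] / null[0], data[1] / null[1])   # normalised tau_LR and tau_RL
```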

Table 2 Normalised class mean first passage times (CMFPT) between constituencies that voted for Leave (L) or Remain (R) in the Brexit referendum, computed over the ensemble of colour assignments in which a node is assigned to Leave (Remain) with a probability equal to the fraction of voters supporting Leave (Remain) in that constituency.

It is evident that, by taking into account the actual distribution of voters in each constituency, the pattern of inter-class MFPT becomes practically indistinguishable from the one we would observe in the corresponding null-model. This means that there was no significant spatial segregation effect in the Brexit vote, and indicates that the reasons for supporting Leave or Remain were most probably linked to socio-economic characteristics rather than to geographical ones.

Polarisation of roll-call votes

In this section we show how ${\widetilde{\tau }}_{\alpha \beta }$ can be used to quantify the level of polarisation in roll-call votes in the US Congress^42,43,44, and to keep track of its evolution over time. We considered the full data set of party affiliations and individual roll-call votes of the members of the US Congress, and we built a weighted graph among members of each chamber in each term between 1929 and 2016, where the weight of the edge connecting two nodes is equal to the number of roll calls in which the votes of the corresponding members coincided. We assigned each node to either Republicans, Democrats or Others, according to the party to which they belonged, but we focused exclusively on the CMFPT between Republicans and Democrats, since members of other parties are normally a rather small minority, if present at all.
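A sketch of the co-voting graph, assuming the roll-call data are encoded as a members × roll-calls matrix with +1 for yea, −1 for nay and 0 for absent or abstaining. How non-votes are treated is our assumption (they never count as agreement), and `covoting_graph` is an illustrative name.

```python
import numpy as np

def covoting_graph(votes):
    """W[i, j] = number of roll calls in which members i and j cast the same vote.

    votes: (members x roll calls) array with +1 = yea, -1 = nay, 0 = no vote.
    Assumption: non-votes never count as agreement; self-loops are dropped."""
    V = np.asarray(votes)
    yea = (V == 1).astype(float)
    nay = (V == -1).astype(float)
    W = yea @ yea.T + nay @ nay.T        # pairwise counts of identical yea/nay votes
    np.fill_diagonal(W, 0.0)
    return W

votes = np.array([[1, 1, -1],
                  [1, -1, -1],
                  [0, 1, 1]])
W = covoting_graph(votes)   # W[0, 1] = 2, W[0, 2] = 1, W[1, 2] = 0
```

The weighted uniform walk then follows from π_ij = w_ij / ∑_j w_ij; members with no agreements at all would have to be removed first, since their rows sum to zero.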

It is worth noting that, unlike the synthetic networks we have studied so far, these networks do not admit a natural embedding in a metric space. Intuitively, we expect that members of both the Senate and the House of Representatives would, in general, be more likely to vote as other members of their party, giving rise to fairly well-defined clusters. However, the situation is not always that clear. In Fig. 5a–c we show the networks of Senate members observed in three different terms from the last 30 years. It is evident that stronger connections between members of the same party appear as time passes. Moreover, when comparing the 112th and 115th Congresses, we can see that the party holding the majority tends to be more heavily connected than the other one. This is also reflected in the relative size of nodes, which is proportional to the corresponding CMFPT to the other party. Just by looking at these three graphs we would be inclined to think that the level of polarisation in the US Congress has increased over time.

**Fig. 5: Roll-call polarisation in the Senate and the US House of Representatives.**

For each graph, we also show the distributions of normalised MFPTs ${\widetilde{T}}_{i,\alpha }$ from each node i to each of the two parties, Democrats and Republicans. By looking at these distributions on the same scale, it is evident that polarisation, understood as the relative distance between nodes of different parties as measured by inter-class MFPT, has increased substantially in the last 30 years. In particular, the separation between the intra- and inter-class MFPT distributions has grown dramatically, to the point that in the 115th Congress the distributions of intra-class and inter-class MFPT are clearly separated. When comparing the 112th and 115th Congresses, we also observe that the changes in the shape and position of the MFPT distributions seem to depend on which party holds the majority of the seats. In particular, when Democrats have the majority, the CMFPT from Democrats to Republicans is higher than the one from Republicans to Democrats. Conversely, when Republicans hold the majority the situation is reversed, suggesting that the majority party is the main driver of polarisation.

In Fig. 5d, e, we show the evolution of the average ${\widetilde{\tau }}_{\alpha \beta }$ between the two main US parties in both the Senate and the House of Representatives. The dashed lines indicate the inter-class MFPT, and support the intuition that polarisation has increased dramatically in recent years. The return times ${\widetilde{\tau }}_{\alpha \alpha }$ (solid lines) are instead far more stable, and their small oscillations depend only on which party holds the majority, since the majority party normally yields lower values of ${\widetilde{\tau }}_{\alpha \alpha }$.

We quantify the overall polarisation in the Senate and in the House of Representatives by computing the average inter-class passage time $({\widetilde{\tau }}_{D\to R}+{\widetilde{\tau }}_{R\to D})/2$, which estimates how close the voting behaviours of the two main parties are in each Congress. The temporal evolution of $({\widetilde{\tau }}_{D\to R}+{\widetilde{\tau }}_{R\to D})/2$ is shown in Fig. 5f, where each point is coloured according to the party holding the majority of seats in that term. Again, we observe a clear increase of polarisation after the 1970s in both chambers of Congress, in very good agreement with prior works^42,43,44.

We next examine whether, as hypothesised, the increasing polarisation observed in Fig. 5f is driven by the party holding the majority of seats in each term. We computed the ratio of inter-class passage times ${\widetilde{\tau }}_{{\rm{Maj}}\to {\rm{Min}}}/{\widetilde{\tau }}_{{\rm{Min}}\to {\rm{Maj}}}$, where ${\widetilde{\tau }}_{{\rm{Maj}}\to {\rm{Min}}}$ is the normalised inter-class MFPT from the majority party to the minority party and ${\widetilde{\tau }}_{{\rm{Min}}\to {\rm{Maj}}}$ is the normalised inter-class MFPT from the minority to the majority. The values of ${\widetilde{\tau }}_{{\rm{Maj}}\to {\rm{Min}}}/{\widetilde{\tau }}_{{\rm{Min}}\to {\rm{Maj}}}$ (Fig. 5g) are indeed relatively stable over time, with the vast majority of points lying on or above the solid grid line at a ratio of 1, which corresponds to a perfectly symmetric separation between the two parties. This indicates that in most terms over the last 80 years the majority party has been the main driver of polarisation. These results demonstrate that ${\widetilde{\tau }}_{\alpha \beta }$ is a robust measure of polarisation. We argue that the same framework could easily be used in other contexts, including the polarisation of discussions in (online) social networks^45,46 or candidates switching between political parties⁴⁷. In particular, it could also be possible to identify the agents or individuals who contribute most to polarisation by ranking nodes according to their CMFPT to other classes.
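Both summary quantities used above reduce to simple arithmetic on the normalised inter-class MFPTs of a single term. The sketch below computes the polarisation index plotted in Fig. 5f and the majority-to-minority ratio plotted in Fig. 5g; the function name `polarisation_summary` and the dictionary layout are hypothetical.

```python
def polarisation_summary(tau, majority):
    """Return (polarisation index, majority/minority ratio) for one term.

    tau:      dict with keys ("D", "R") and ("R", "D"), holding the normalised
              inter-class MFPTs from Democrats to Republicans and vice versa.
    majority: "D" or "R", the party holding the majority of seats.
    """
    minority = "R" if majority == "D" else "D"
    # (tau_{D->R} + tau_{R->D}) / 2: overall distance between the two parties.
    index = (tau[("D", "R")] + tau[("R", "D")]) / 2
    # tau_{Maj->Min} / tau_{Min->Maj}: values above 1 mean the majority party
    # is farther from the minority than the minority is from the majority.
    ratio = tau[(majority, minority)] / tau[(minority, majority)]
    return index, ratio


# Illustrative values only, not taken from the roll-call data set.
print(polarisation_summary({("D", "R"): 1.5, ("R", "D"): 1.25}, majority="D"))
```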

Contact assortativity and relation with epidemic spreading

Understanding the mixing between groups of individuals in a network can reveal a great deal about social dynamics, including the role played by different individuals in the transmission of diseases^1,48,49,50. As another case study, we use CMFPTs to characterise how different groups of people interact in three face-to-face contact networks, namely the contacts in a hospital, in a school and in an enterprise, obtained from the SocioPatterns project data set^1,51,52,53.

The definition of groups or classes is specific to each system, e.g., the role of a person in the hospital, the school class in the school, and the department in which a person works in the enterprise. The weight of the undirected edge connecting two nodes in each graph is equal to the number of contacts between the corresponding individuals. By looking at the dynamics of two simple epidemic models, namely the susceptible–infected–susceptible (SIS) and the susceptible–infected–recovered (SIR) models, we show here that the distribution of CMFPT in each graph provides relevant information about the dynamics of disease spreading in the system. In both models, an infected individual i selects one of its neighbours j with probability $w_{ij}/\sum_{j}w_{ij}$ and infects it with probability β. Afterwards, with probability μ, each infected individual either recovers (SIR model) or becomes susceptible again (SIS model).
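The spreading rule described above can be sketched as a single synchronous update. This is a minimal illustration under our own assumptions: all infected nodes act simultaneously, and only nodes that were already infected at the start of the step can recover in that step. The function `epidemic_step` and the dictionary-based graph format are hypothetical.

```python
import random


def epidemic_step(weights, state, beta, mu, model="SIR", rng=random):
    """One synchronous step of the SIR/SIS dynamics described in the text.

    weights: dict node -> dict neighbour -> w_ij (number of contacts).
    state:   dict node -> "S", "I" or "R".
    Each infected node i picks one neighbour j with probability w_ij / s_i and,
    if j is susceptible, infects it with probability beta. Then each node that
    was infected at the start of the step leaves the infected state with
    probability mu, becoming "R" (SIR) or "S" again (SIS).
    """
    infected = [i for i, s in state.items() if s == "I"]
    new_state = dict(state)
    for i in infected:
        neighbours = list(weights[i])
        if not neighbours:
            continue
        # Weighted choice of a single contact, proportional to w_ij.
        j = rng.choices(neighbours, weights=[weights[i][n] for n in neighbours])[0]
        if state[j] == "S" and rng.random() < beta:
            new_state[j] = "I"
    for i in infected:
        if rng.random() < mu:
            new_state[i] = "R" if model == "SIR" else "S"
    return new_state
```

Iterating this function from a single infected seed and recording, at each step, the number of infected nodes per class yields the per-class epidemic curves from which peak times and stationary prevalences can be read off.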

In Fig. 6a–c, we report the matrices of ${\widetilde{\tau }}_{\alpha \beta }$ for the school, the enterprise and the hospital, respectively. In the school network, we observe a consistent pattern of lower values of the normalised intra-class CMFPT ${\widetilde{\tau }}_{\alpha \alpha }$, which is most probably due to the much higher number of contacts among individuals in the same classroom compared to individuals in other classrooms. Notwithstanding this general pattern, some classes are more tightly connected to each other than to other close-by classes, as in the case of ce1b and ce2b. In the enterprise network too, we distinguish a clear pattern where, for each class α, the value of ${\widetilde{\tau }}_{\alpha \alpha }$ is normally much smaller than ${\widetilde{\tau }}_{\alpha \beta }$ for α ≠ β. Yet, there are some interesting deviations, as in the case of SFLE. This department plays a role similar to that of teachers in the school network (i.e., it contains people who tend to interact with more than one department), and all the other departments exhibit similar values of inter-class CMFPT to it. Finally, in the hospital contact network, patients seem to be the most isolated class, while the paramedical and administrative staff display lower CMFPT to all the other classes.

**Fig. 6: Class mean first passage times and the spread of epidemics in contact networks.**

We also found an interesting relation between the pattern of CMFPT in each network and the spreading dynamics on the same graph. We ran a large number of simulations of the SIR and SIS models, seeding the disease in each of the nodes of a network, and computing the number of time steps needed for the epidemic to reach its peak in each of the classes, as a function of the class to which the seed node belongs. In Fig. 6d–f we report, as a function of ${\widetilde{\tau }}_{\alpha \beta }$, the average number of steps t_peak until the peak of the epidemic in class β is reached in a SIR model seeded at a node of class α. Interestingly, the time to the peak in class β is an increasing function of ${\widetilde{\tau }}_{\alpha \beta }$, and the rank correlation between the two variables is consistently high and significant in all three systems.
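The rank correlation mentioned above can be computed, for instance, as a Spearman coefficient, i.e., the Pearson correlation of the ranks. The text does not specify which rank-correlation coefficient was used, so the following self-contained sketch (hypothetical helpers `ranks` and `spearman`, with tied values given their average rank) is only one plausible choice.

```python
def ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation between the ranks of x and y.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Here `x` would hold the values of ${\widetilde{\tau }}_{\alpha \beta }$ for all ordered class pairs and `y` the corresponding average peak times; a significance test (e.g. by permuting `y`) is not included.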

The results on the SIS model complement the picture observed for the SIR model. In Fig. 6g–i we show that the class mean return times ${\widetilde{\tau }}_{\alpha \alpha }$ are strongly correlated with the fraction ρ_α of infected individuals of that class in the stationary state. In particular, the lower the value of ${\widetilde{\tau }}_{\alpha \alpha }$, the larger the fraction of infected individuals in class α in the endemic state, suggesting that the steady-state dynamics is predominantly driven by interactions among individuals in the same class. In Supplementary Table 1 we report these correlations for a wider range of β and μ. As shown in detail in Supplementary Figs. 1 and 2, the observed correlations between ρ_α and ${\widetilde{\tau }}_{\alpha \beta }$ are consistently higher than the correlations with either the total number of edges between two classes or the total fraction of edges from class α to class β. These results, as well as the correlations for a wider range of parameters and contact networks, can be found in Supplementary Note 2.

Residential and dynamical urban economic segregation

Socio-economic segregation has an enormous impact on city livability, and many different measures to quantify it have been proposed in recent years. However, most of those measures, with a few notable exceptions⁵⁴, focus on first-neighbour information and disregard the role of the mobility of citizens^55,56,57,58. We test here the potential of CMFPTs to quantify urban economic segregation in large metropolitan areas, taking into account the daily mobility patterns of individuals.

We considered the 53 US cities with more than one million inhabitants, and census information about the number of households in each of the 16 income categories defined by the US Census Bureau (see Supplementary Table 2 and the 2017 American Community Survey⁵⁹), where Class 1 is the lowest income and Class 16 the highest. We constructed two different graphs among census tracts, namely the graph of tract adjacency and the graph of daily workplace commuting⁶⁰. The former graph is undirected and unweighted, and is better suited to measure the so-called residential segregation, i.e., the extent to which people with similar levels of income tend to live in nearby areas. The commuting graph, instead, is directed and weighted, and the weight of a link going from i to j is given by the sum of the residents of i working in j and the residents of j working in i, in order to mimic the daily mobility of citizens.

The fact that households of more than one category are present in each block group prevents us from assigning a single class to each unit, so we computed the distribution of CMFPT by averaging over a large number of realisations of class assignments. In each realisation, the class of each node in the graph is sampled from the distribution of households in the corresponding census tract, so that the probability that node i is assigned to class α in a given realisation is equal to $m_{i,\alpha}/\sum_{\forall \beta} m_{i,\beta}$, where $m_{i,\alpha}$ is the number of people of category α living in node i. We took a slightly different approach for the commuting graph, as described in³⁴, where the population attributed to node i is a combination of the resident population at i and the number of commuters working at i:

$${\widetilde{m}}_{i,\alpha }={m}_{i,\alpha }+{\mathop{\sum}\limits _{j}}{\omega }_{ji}\frac{{m}_{j,\alpha }}{{\sum }_{\forall \beta }{m}_{j,\beta }},$$

(14)

where ω_ji is the weight from node j to node i in the commuting graph, i.e., the total number of people who live in j and commute to their workplace in i. In this way we account for the fact that commuters effectively contribute to the diversity of an area, as they spend a considerable amount of time there and interact with other commuters coming from different areas. For each realisation of a class assignment, we run 10⁴–10⁶ random walkers from each of the nodes. Then, we obtain the CMFPT τ_αβ for each ordered pair of classes (α, β) by averaging over all nodes and realisations of class assignments, and we analyse the normalised CMFPT ${\widetilde{\tau }}_{\alpha \beta }={\tau }_{\alpha \beta }/{\tau }_{\alpha \beta }^{{\rm{null}}}$.
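Both ingredients of the class assignment described above, the effective populations of Eq. (14) and the sampling of one class per node, can be sketched as follows. The data layout (dictionaries keyed by tract id) and the helper names are hypothetical, and every node is assumed to have a non-zero total population.

```python
import random


def effective_population(m, omega):
    """Eq. (14): resident counts plus commuters, split by income class.

    m:     dict node -> list of counts per class (m_{i,alpha}).
    omega: dict (j, i) -> number of people living in j and working in i.
    Commuters from j are distributed across classes in proportion to the
    class composition of their home node j.
    """
    m_eff = {i: [float(x) for x in counts] for i, counts in m.items()}
    for (j, i), w in omega.items():
        total_j = sum(m[j])
        for a, m_ja in enumerate(m[j]):
            m_eff[i][a] += w * m_ja / total_j
    return m_eff


def sample_assignment(m, rng=random):
    """One realisation: node i gets class alpha w.p. m_{i,alpha} / sum_beta m_{i,beta}."""
    return {i: rng.choices(range(len(counts)), weights=counts)[0]
            for i, counts in m.items()}


m = {"a": [10, 0], "b": [5, 5]}
print(effective_population(m, {("b", "a"): 4}))
```

For the adjacency graph one would call `sample_assignment(m)` directly; for the commuting graph, `sample_assignment(effective_population(m, omega))`. Repeating this many times and averaging the resulting CMFPTs gives the realisation-averaged τ_αβ.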

In Fig. 7a, b we show the profiles of ${\widetilde{\tau }}_{\alpha \beta }$ from each of the 16 classes computed over the adjacency graph for Detroit and Boston, respectively. It is worth noting that the categories at the two extremes (i.e., the poorest and the wealthiest) exhibit quite similar patterns in the two cities. Both are characterised by larger values of CMFPT from any of the other classes, meaning that these two classes are in general more isolated from the rest of the population, with most of the high-income classes appearing slightly more isolated in Detroit than in Boston. Moreover, in both cities all classes have a virtually identical value of CMFPT to Class 9, and very similar values to Classes 8 and 10, which indicates that these middle-income classes play a pivotal role in the spatial distribution of income. However, despite the similar qualitative behaviour in the two cities, there are some noticeable quantitative differences. First, the values of normalised CMFPT are significantly larger in Detroit than in Boston. Second, in Detroit we observe a strong dependence of CMFPT on the class of the starting node, while in Boston the average number of steps required to reach a class below 12 is almost constant regardless of the class of the origin node. These quantitative differences suggest that the spatial distribution of income in Detroit is more heterogeneous than in Boston, a conclusion in line with the classical literature on economic segregation in the US^3,61,62,63.

**Fig. 7: Class mean first passage times and urban income inequality.**

However, in large metropolitan areas most of the daily activities of individuals take place far away from home, due to urbanisation pressure and to the decentralisation of productive sectors, so residential segregation can hardly tell the whole story. Indeed, in recent years there has been increasing interest in quantifying social and economic segregation by taking mobility patterns into account^64,65,66. This is straightforward within our framework: we simply let the walkers move on the mobility graph instead of the adjacency network between census tracts. We have computed ${\widetilde{\tau }}_{\alpha \beta }$ on the mobility graph of each US city in the data set, as obtained from workplace commuting information. The results are shown in Fig. 7c, d, and provide an interesting picture of the differences between residential and mobility-focused segregation. First of all, in both cities ${\widetilde{\tau }}_{\alpha \beta }$ depends only weakly on the origin class α and mainly on the destination class β. This is most likely due to the fact that people of different backgrounds commute to similar areas, such as the city centre and industrial sites. Still, the two cities display distinct organisations of CMFPTs. In Detroit, low-income classes are much more isolated (higher ${\widetilde{\tau }}_{\alpha \beta }$) than high-income classes, which exhibit systematically lower values of ${\widetilde{\tau }}_{\alpha \beta }$. In Boston, instead, ${\widetilde{\tau }}_{\alpha \beta }$ is almost flat, with no important dependence on either the source or the destination class.

The profiles of ${\widetilde{\tau }}_{\alpha \beta }$ that we show in Fig. 7a–d provide an overall clear picture of the distribution of CMFPT in a metropolitan area, but do not allow us to easily compare two cities in a systematic manner. Hence, we devised two synthetic indices $\overline{{\xi }^{{\rm{out}}}}$ and $\overline{{\xi }^{{\rm{in}}}}$ that summarise the information on spatial income heterogeneity in a single number. The idea behind these quantities is that an income class α is more heterogeneously distributed if there is a large difference between the median CMFPT from α to the income classes immediately adjacent to α and the median CMFPT from α to any other class. We therefore consider the discrepancy between the local and global median of CMFPT from class α:

$${\xi }_{\alpha }^{{\rm{out}}}=| {\overline{\widetilde{\tau }}}_{{\alpha }_{nn}}^{{\rm{out}}}-{\overline{\widetilde{\tau }}}_{\alpha }^{{\rm{out}}}|$$

where ${\overline{\widetilde{\tau }}}_{\alpha }^{{\rm{out}}}$ is the median of ${\widetilde{\tau }}_{\alpha \beta }$ over all β ≠ α, and ${\overline{\widetilde{\tau }}}_{{\alpha }_{nn}}^{{\rm{out}}}$ is the median of ${\widetilde{\tau }}_{\alpha \beta }$ over the nearest neighbours β ∈ {α − 1, α, α + 1} of α; for α = 1 and α = 16 we only consider the median over {α, α + 1} and {α − 1, α}, respectively. Similarly, for the discrepancy between the local and global median of CMFPT to class α we define:

$${\xi }_{\alpha }^{{\rm{in}}}=| {\overline{\widetilde{\tau }}}_{{\alpha }_{nn}}^{{\rm{in}}}-{\overline{\widetilde{\tau }}}_{\alpha }^{{\rm{in}}}|$$

where ${\overline{\widetilde{\tau }}}_{\alpha }^{{\rm{in}}}$ and ${\overline{\widetilde{\tau }}}_{{\alpha }_{nn}}^{{\rm{in}}}$ are now the median of ${\widetilde{\tau }}_{\beta \alpha }$ when α ≠ β and the median of ${\widetilde{\tau }}_{\beta \alpha }$ from its nearest neighbours β ∈ {α − 1, α, α + 1}, respectively.

In Fig. 7e we show the ranking of US cities induced by $\langle \xi \rangle =(\overline{{\xi }^{{\rm{in}}}}+\overline{{\xi }^{{\rm{out}}}})/2$, i.e., the average discrepancy between local and global CMFPT from/to each income class. In general, larger values of 〈ξ〉 indicate more pronounced segregation. On the left-hand side of Fig. 7e the cities are ranked according to 〈ξ〉 in the adjacency graph of census tracts, while on the right-hand side the ranking is based on 〈ξ〉 in the commuting graph. Interestingly, Detroit ranks first by residential segregation, closely followed by other cities traditionally known for their high levels of segregation, such as Milwaukee and Cleveland. Conversely, Boston is at the bottom of the ranking. However, the ranking changes substantially if we consider instead the mobility graph, and what we call dynamical segregation, as reported on the right-hand side of Fig. 7e. For instance, Baltimore (which ranks high for residential segregation) drops to a mid-rank position, while cities like Buffalo or Indianapolis, where residential segregation is not that high, rise to the top of the dynamical segregation ranking.
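The indices $\xi^{\rm out}_\alpha$, $\xi^{\rm in}_\alpha$ and 〈ξ〉 can be sketched in a few lines. Classes are indexed from 0 here (so the two ends are 0 and C − 1 rather than 1 and 16), the input is a hypothetical C × C nested list `tau` of normalised CMFPTs, and aggregating the per-class values into $\overline{\xi^{\rm out}}$ and $\overline{\xi^{\rm in}}$ with an arithmetic mean is our assumption, since the text does not state the aggregation explicitly.

```python
from statistics import mean, median


def xi_indices(tau):
    """Per-class discrepancies xi_out, xi_in and the city-level summary <xi>.

    tau: square nested list, tau[a][b] = normalised CMFPT from class a to b,
         with classes ordered by income (index 0 = lowest income).
    """
    C = len(tau)
    xi_out, xi_in = [], []
    for a in range(C):
        nn = range(max(a - 1, 0), min(a + 2, C))   # {a-1, a, a+1}, clipped at the ends
        others = [b for b in range(C) if b != a]   # global median uses all b != a
        xi_out.append(abs(median(tau[a][b] for b in nn)
                          - median(tau[a][b] for b in others)))
        xi_in.append(abs(median(tau[b][a] for b in nn)
                         - median(tau[b][a] for b in others)))
    # Assumption: class-level values are averaged with the arithmetic mean.
    xi_avg = (mean(xi_in) + mean(xi_out)) / 2
    return xi_out, xi_in, xi_avg
```

A perfectly uniform profile (all entries equal) gives 〈ξ〉 = 0, and cities can then be ranked by sorting on the third returned value.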

The most interesting aspect of 〈ξ〉 is that it captures some of the most undesirable consequences of income segregation, namely the incidence of different types of crime, obtained from⁶⁷. In Fig. 7f–h we report the correlation between the per-capita incidence of assaults, violent crimes and robberies and the levels of segregation measured on the adjacency and on the commuting graph of the cities in our data set. Interestingly, both indices display a significant correlation with all three types of crime. Moreover, the index computed over the commuting network displays a stronger correlation in all three cases, reinforcing the idea that quantifying segregation while disregarding mobility can lead to distorted conclusions. For instance, the city at the top of the dynamical segregation ranking (Memphis) displays the highest per-capita incidence of violent crimes among the largest US cities, although it is placed in the second quartile of the ranking by residential segregation. For comparison, we report in Supplementary Note 3 and Supplementary Figs. 3–5 the correlations of traditional segregation metrics (i.e., the spatial Gini coefficient and Moran's I index) with the same crime indicators, showing that they do not reach correlations as high as those obtained with 〈ξ〉. Besides their lower correlations, both the spatial Gini coefficient and Moran's I index are symmetric quantities, and as such they fail to capture the intrinsic asymmetry between income classes revealed by Fig. 7a–d.

We can extend the framework developed in Fig. 4 to the case of multiple classes or, in the present case, income categories. To do so we average the profiles ${\widetilde{\tau }}_{\alpha \beta }$ to obtain the following quantities:

$$\begin{array}{lll}{\widetilde{\tau }}_{\alpha \to O}&=&\frac{\mathop{\sum}\limits_{\beta \ne \alpha }{\widetilde{\tau }}_{\alpha \beta }}{{N}_{i}-1},\\ {\widetilde{\tau }}_{O\to \alpha }&=&\frac{\mathop{\sum}\limits_{\beta \ne \alpha }{\widetilde{\tau }}_{\beta \alpha }}{{N}_{i}-1},\end{array}$$

that are, respectively, the average CMFPT from class α to all the other classes and from all the other classes to class α. Here N_i is the total number of income classes in a city. For a given city, we can place each class α in the (${\widetilde{\tau }}_{\alpha \to O},{\widetilde{\tau }}_{\alpha \to O}/{\widetilde{\tau }}_{O\to \alpha }$) plane, as shown in Fig. 8a for each of the income classes in Detroit and Boston (here ${\widetilde{\tau }}_{\alpha \beta }$ is calculated on the adjacency graphs). As noted previously, Boston exhibits lower levels of segregation than Detroit, and indeed most of the classes in Boston (in shades of blue) are clustered around the point (1, 1), and are associated with lower values of ${\widetilde{\tau }}_{\alpha \to O}$ than in Detroit (shades of green). The classes in Detroit, instead, display larger values of ${\widetilde{\tau }}_{\alpha \to O}$ and a larger variability along the y-axis. Still, both cities show a similar qualitative behaviour: ${\widetilde{\tau }}_{\alpha \to O}/{\widetilde{\tau }}_{O\to \alpha }$ is smaller than 1 for low-income classes, rises above 1 for intermediate-income classes, and decreases again for high-income classes. Such a transition could be explained by the fact that middle-income classes are in general more homogeneously distributed across a city, while low-income and high-income ones tend to form isolated clusters. If middle-income classes are evenly distributed across the city, they are easy to reach from the other classes, but reaching low- and high-income classes from them is much harder, which leads to ${\widetilde{\tau }}_{\alpha \to O}\,> \,{\widetilde{\tau }}_{O\to \alpha }$. To validate this hypothesis, we show in Fig. 8b–g the probability p_α of finding a given class α in a spatial unit, for Class 1 (b, e), Class 8 (c, f) and Class 16 (d, g), in Boston (b–d) and Detroit (e–g). Indeed, Class 1 (extremely low income) seems to be clustered around the centre of both cities, especially in Detroit, while Class 16 is mostly concentrated in the peripheries. Notice that Class 8 (median income levels) does not display any clearly visible isolation. We provide the location of the cities studied in the (${\widetilde{\tau }}_{\alpha \to O},{\widetilde{\tau }}_{\alpha \to O}/{\widetilde{\tau }}_{O\to \alpha }$) plane for several income classes in Supplementary Fig. 6.
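The coordinates used in Fig. 8a follow directly from the two averages defined above. In this sketch (hypothetical function name, input as a C × C nested list of normalised CMFPTs) C plays the role of N_i, so both averages divide by C − 1.

```python
def class_coordinates(tau):
    """Place each class in the (tau_{a->O}, tau_{a->O} / tau_{O->a}) plane.

    tau[a][b] is the normalised CMFPT from class a to class b. With C classes,
    tau_{a->O} averages row a over b != a, and tau_{O->a} averages column a
    over b != a; the diagonal (return times) is excluded from both.
    """
    C = len(tau)
    coords = []
    for a in range(C):
        out_avg = sum(tau[a][b] for b in range(C) if b != a) / (C - 1)
        in_avg = sum(tau[b][a] for b in range(C) if b != a) / (C - 1)
        coords.append((out_avg, out_avg / in_avg))
    return coords


# Two toy classes: class 0 is hard to leave but easy to reach (illustrative only).
print(class_coordinates([[1.0, 2.0], [1.0, 1.0]]))
```

A ratio above 1 flags a class that is easy to reach but from which the other classes are hard to reach, which is the signature discussed above for middle-income classes.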

**Fig. 8: Characterisation of residential segregation in Boston and Detroit through class mean first passage times (CMFPT).**

Overall, our approach to spatial segregation based on the diffusion of random walks is not only a natural extension of the recent multiscalar approaches introduced to characterise residential segregation⁶⁸, but also allows us to define a dynamical segregation that incorporates mobility into the analysis, as recently discussed, for instance, in refs. ^64,66.

Conclusions

We have shown here that the information captured by the distribution of inter-class MFPTs can be used not just as a way to detect the presence of anisotropy and correlations in the properties of nodes, but also as a reliable proxy for the dynamics and emergent behaviours of a complex system. One of the most interesting aspects of the measures of heterogeneity, polarisation and segregation that we have introduced in this work is that they take into account microscopic, mesoscopic and global relations among classes, due to the fact that random walks in principle integrate information about paths of all possible lengths. Another relevant property of the measures of segregation based on CMFPT is that they are non-parametric and correctly normalised with respect to a meaningful null-model, hence allowing us to compare on equal grounds the heterogeneity of class distributions in systems of different sizes, which is where most classical segregation indices fail^56,68. Even more importantly, the profiles of inter-class MFPTs are not symmetric with respect to classes, and provide fine-grained information about which classes are most responsible for the emergence of polarisation and heterogeneity. In this respect, it would be worth exploring how the simple measure of polarisation that we proposed can be extended to the case of more than two classes.

The fact that measures of class heterogeneity based on random-walk statistics correlate quite well with some of the intrinsic dynamics happening in social networks (e.g., the spread of an epidemic) and with some exogenous processes mediated by the underlying graph (e.g., the incidence of crime in a city) confirms that the profiles of CMFPT are a useful toolbox for the targeted mitigation of the undesired effects of these dynamics. For instance, the groups of a social network that are more central according to ${\widetilde{\tau }}_{\alpha \beta }$ might be the best candidates for early vaccination aimed at slowing down an epidemic. At the same time, the definition of dynamical segregation based on walks on mobility graphs, and the fact that it correlates quite substantially with crime, potentially paves the way for a redefinition of the traditional role attributed to residential segregation, in favour of a more balanced view that takes into account the activity patterns of citizens together with the spatial distribution of their dwellings.

The generality of the methodology proposed in this paper and its applicability to different classical problems in complexity science establish a concrete link between classical statistical physics and modern complexity science, and have the potential to provide new interesting insights about the relation between structure and dynamics of complex systems.

Methods

Mean-field approximation

The expressions for CMFPT and class return time provided in Eqs. (2) and (5) are exact, but they have the drawback of being computationally intensive for graphs with a large number of nodes. It is possible to construct C-class mean-field approximations of these expressions, by representing the behaviour of all the nodes of a class with a single node, and looking at the graph of node classes. If we denote by π_αβ the probability for a walker to jump in one step from any node of class α to any node of class β, the general mean-field equation for CMFPT reads:

$${T}_{\beta \alpha }^{{\rm{MF}}}={D}_{\overline{\alpha }}+{\mathop{\sum}\limits_{\gamma \ne \alpha }}{\pi }_{\beta \gamma }{T}_{\gamma \alpha }$$

which can be written in compact form as follows:

$${T}_{\alpha }^{{\rm{MF}}}={D}_{\overline{\alpha }}+{{{\Pi }}}_{\overline{\alpha }}{T}_{\alpha }^{{\rm{MF}}}$$

(15)

where by definition the α-th component of ${T}_{\alpha }^{{\rm{MF}}}$ is zero, ${\{{T}_{\alpha }^{{\rm{MF}}}\}}_{\alpha }=0$, and ${{{\Pi }}}_{\overline{\alpha }}$ is the class transition matrix in which the row and the column corresponding to class α are set equal to zero, as above. Notice that π_αβ is the fraction of edges from nodes of class α that lead to nodes of class β, defined as follows:

$${\pi }_{\alpha \beta }=\left\{\begin{array}{ll}\frac{{e}_{\alpha \beta }}{2{e}_{\alpha \alpha }+{\sum }_{\gamma \ne \alpha }{e}_{\alpha \gamma }}&\alpha \,\ne\, \beta \\ \frac{2{e}_{\alpha \alpha }}{2{e}_{\alpha \alpha }+{\sum }_{\gamma \ne \alpha }{e}_{\alpha \gamma }}&\alpha \,=\,\beta \end{array}\right.$$

(16)

where e_αβ is the total number of edges from nodes of class α to nodes of class β. Solving Eq. (15) for ${T}_{\alpha }^{{\rm{MF}}}$ we obtain:

$${T}_{\alpha }^{{\rm{MF}}}={\left[I-{{{\Pi }}}_{\overline{\alpha }}\right]}^{-1}{D}_{\overline{\alpha }}$$

Notice that this equation is formally identical to Eq. (2), with the only difference that it deals with MFPTs from all other classes to class α, while Eq. (2) provides the MFPTs from all the nodes in other classes to class α.

Similarly, the return times in the C-class mean-field approximation can be computed starting from Eq. (4) for each of the classes, obtaining:

$${R}_{\alpha }=1+{\mathop{\sum}\limits_{\beta \ne \alpha }}{\pi }_{\alpha \beta }{T}_{\beta \alpha }$$

(17)
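A compact NumPy sketch of Eqs. (15)–(17). Two choices here are our assumptions: the vector $D_{\overline{\alpha}}$ (defined in an earlier section) is taken to be a vector of ones with a zero in the entry of class α, and π_αβ is obtained by normalising each row of a symmetric class edge-count matrix `e` so that it sums to one, with intra-class edges counted twice. The helper names are hypothetical.

```python
import numpy as np


def class_transition_matrix(e):
    """Class-level jump probabilities from class edge counts (cf. Eq. 16).

    e[a][b] is the number of edges between classes a and b (symmetric);
    e[a][a] counts intra-class edges once, so it enters twice (two endpoints).
    Each row is normalised to sum to one.
    """
    e = np.asarray(e, dtype=float)
    stubs = e + np.diag(np.diag(e))              # diagonal becomes 2 e_aa
    return stubs / stubs.sum(axis=1, keepdims=True)


def mean_field_times(Pi):
    """Eqs. (15)-(17): T[b, a] = mean-field CMFPT from class b to class a,
    R[a] = mean-field return time to class a."""
    C = len(Pi)
    T = np.zeros((C, C))
    R = np.zeros(C)
    for a in range(C):
        keep = np.arange(C) != a
        Pi_bar = Pi.copy()
        Pi_bar[a, :] = 0.0                       # zero row and column of class a
        Pi_bar[:, a] = 0.0
        D = keep.astype(float)                   # assumed: ones, zero at class a
        T[:, a] = np.linalg.solve(np.eye(C) - Pi_bar, D)
        R[a] = 1.0 + Pi[a, keep] @ T[keep, a]    # Eq. (17)
    return T, R


Pi = class_transition_matrix([[0, 1], [1, 2]])
T, R = mean_field_times(Pi)
print(Pi, T, R, sep="\n")
```

In this toy example class 1 owns 5 of the 6 edge endpoints, and the computed return times (6 for class 0, 1.2 for class 1) are the inverses of those endpoint fractions, as expected for a walk on a small undirected class graph.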

Normalisation of CMFPT distributions

As we will see in the following sections, the statistics of CMFPT depend substantially on the shape, size, and organisation of class assignments. To allow a fair comparison of CMFPT in systems with different sizes and shapes, we compute the normalised inter-class MFPT ${\widetilde{\tau }}_{\alpha \beta }={\tau }_{\alpha \beta }/{\tau }_{\alpha \beta }^{{\rm{null}}}$. This quantity is the ratio between the expected number of time steps needed by a walk that starts on a node of class α to reach a node of class β for the first time, and the corresponding value in a null-model where classes are assigned to nodes uniformly at random, while preserving their relative abundance. Notice that if classes are distributed uniformly across the system, then ${\widetilde{\tau }}_{\alpha \beta }\approx 1,\forall \alpha ,\beta$, while in the presence of spatial correlations ${\widetilde{\tau }}_{\alpha \beta }$ will deviate from 1. By using ${\widetilde{\tau }}_{\alpha \beta }$ we properly take into account several common confounding factors, including an uneven abundance of classes (colours), differences in size and shape, and the effect of borders, thus making it possible to compare different systems on common grounds. Depending on the size of the system, ${\tau }_{\alpha \beta }^{{\rm{null}}}$ is computed as the average over 10³–10⁵ independent assignments of classes to the nodes of the original graph.
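A minimal NumPy sketch of this normalisation for small graphs, where MFPTs can be computed exactly. Since Eqs. (2), (3) and (5) are not reproduced in this section, we assume their standard forms: $T_{i,\beta}$ solves $T_i = 1 + \sum_j \pi_{ij} T_j$ for nodes outside class β, with $T_i = 0$ inside it; τ_αβ averages $T_{i,\beta}$ over the nodes of class α; and diagonal entries are class return times. The null model is realised by permuting the labels, which keeps the abundance of each class fixed. The graph is assumed to be strongly connected, and all function names are hypothetical.

```python
import numpy as np


def mfpt_to_class(P, labels, beta):
    """MFPT T_i from every node i to the set of nodes of class beta.

    Solves T_i = 1 + sum_j P_ij T_j for i outside class beta, with T_i = 0 inside.
    """
    free = labels != beta
    n_free = int(free.sum())
    A = np.eye(n_free) - P[np.ix_(free, free)]
    T = np.zeros(len(labels))
    T[free] = np.linalg.solve(A, np.ones(n_free))
    return T


def cmfpt(P, labels, classes):
    """tau[a, b]: average over the nodes of class a of the MFPT to class b.

    Diagonal entries are class return times: one step away, then back to class a.
    """
    tau = np.zeros((len(classes), len(classes)))
    for b_idx, b in enumerate(classes):
        T = mfpt_to_class(P, labels, b)
        for a_idx, a in enumerate(classes):
            rows = labels == a
            vals = 1.0 + P[rows] @ T if a == b else T[rows]
            tau[a_idx, b_idx] = vals.mean()
    return tau


def normalised_cmfpt(P, labels, n_null=100, seed=0):
    """tau / tau_null, with tau_null averaged over random label permutations."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    tau = cmfpt(P, labels, classes)
    tau_null = np.mean([cmfpt(P, rng.permutation(labels), classes)
                        for _ in range(n_null)], axis=0)
    return tau / tau_null
```

On a complete graph every class assignment with the same abundances is equivalent, so the normalised values are all 1; departures from 1 on a real graph therefore reflect the placement of classes rather than their sizes.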

Computation of CMFPT distributions in real-world networks

In the following we compute and use the distribution of CMFPT in a variety of synthetic and real-world networks. If a system is naturally represented by an unweighted network, the elements of the transition matrix of the random walker are set to ${\pi }_{ij}=\frac{{a}_{ij}}{{k}_{i}}$, where a_ij = 1 if there is an edge from node i to node j and a_ij = 0 otherwise, and $k_i=\sum_{j}a_{ij}$ is the (out-)degree of node i. If the network is weighted, instead, the elements of the transition matrix are ${\pi }_{ij}=\frac{{w}_{ij}}{{s}_{i}}$, where w_ij is the weight of the edge connecting node i to node j, and $s_i=\sum_{j}w_{ij}$ is the (out-)strength of node i. For networks with up to N ~ 10⁴ nodes, we solved Eqs. (2) and (5) exactly, using standard linear algebra packages. For larger graphs we resorted instead to Monte-Carlo simulations, where we estimated the value of T_i,α for each node i of the graph as the average over 10⁴–10⁶ random walks originating at that node, and then obtained τ_αβ using Eq. (3).
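For graphs too large for an exact solution, the Monte-Carlo estimate of $T_{i,\alpha}$ described above amounts to averaging first-hitting times over many independent walks. The pure-Python sketch below builds the transition matrix $\pi_{ij} = w_{ij}/s_i$ and estimates the MFPT from one node. It assumes every node has positive out-strength, always counts at least one step (so a walker starting inside the target class measures a return time), and uses hypothetical function names.

```python
import random


def transition_matrix(W):
    """Row-normalise a (weighted) adjacency matrix: pi_ij = w_ij / s_i.

    Assumes every row has positive out-strength s_i (no dangling nodes).
    """
    return [[w / sum(row) for w in row] for row in W]


def mc_mfpt(P, labels, start, beta, n_walks=10_000, rng=random):
    """Monte-Carlo estimate of T_{start,beta}: the average number of steps a
    walker started at `start` needs to first reach a node of class beta."""
    nodes = range(len(P))
    total = 0
    for _ in range(n_walks):
        i, steps = start, 0
        while True:
            i = rng.choices(nodes, weights=P[i])[0]   # one step of the walk
            steps += 1
            if labels[i] == beta:
                break
        total += steps
    return total / n_walks


# Path graph 0-1-2 with node 2 in class 1: the exact MFPT from node 0 is 4.
P = transition_matrix([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(mc_mfpt(P, [0, 0, 1], start=0, beta=1, rng=random.Random(1)))
```

The statistical error of the estimate decreases as the inverse square root of `n_walks`; for the 10⁴–10⁶ walkers per node used in the paper one would normally vectorise the walks or use compiled code rather than this explicit loop.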

Data availability

The roll-call data were obtained from⁶⁹. The contact networks used to model epidemic spreading were obtained from⁵¹, the income data for US cities from⁵⁹, the commuting data from⁶⁰ and the crime data from⁶⁷. All correspondence should be addressed to V.N.

Code availability

The programmes to compute class mean first passage times are available at https://mygit.katolaz.net/covid_19_ethnicity/rw-segregation and can be used, modified and distributed under the terms of the MIT/X11 Open Source License.

References

Vanhems, P. et al. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE 8, e73970 (2013).
Fernández-Gracia, J., Suchecki, K., Ramasco, J. J., San Miguel, M. & Eguíluz, V. M. Is the voter model a model for voters? Phys. Rev. Lett. 112, 158701 (2014).
Jargowsky, P. A. Take the money and run: economic segregation in US metropolitan areas. Am. Sociol. Rev. 61, 984–998 (1996).
Yan, G. et al. Network control principles predict neuron function in the Caenorhabditis elegans connectome. Nature 550, 519–523 (2017).
Article ADS Google Scholar
Barthelemy, M. The Structure and Dynamics of Cities (Cambridge University Press, Cambridge, UK, 2016).
Book MATH Google Scholar
Batty, M. The New Science of Cities (MIT Press, Cambridge, MA, 2017).
Google Scholar
Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci 10, 186–198 (2009).
Article Google Scholar
Bassett, D. S. & Sporns, O. Network neuroscience. Nat. Neurosci. 20, 353–364 (2017).
Article Google Scholar
Noh, J. D. & Rieger, H. Random walks on complex networks. Phys. Rev. Lett. 92, 118701 (2004).
Article ADS Google Scholar
Zhang, Z., Julaiti, A., Hou, B., Zhang, H. & Chen, G. Mean first-passage time for random walks on undirected networks. Eur. Phys. J. B 84, 691–697 (2011).
Article ADS Google Scholar
Hwang, S., Lee, D.-S. & Kahng, B. First passage time for random walks in heterogeneous networks. Phys. Rev. Lett. 109, 088701 (2012).
Article ADS Google Scholar
Bonaventura, M., Nicosia, V. & Latora, V. Characteristic times of biased random walks on complex networks. Phys. Rev. E 89, 012803 (2014).
Article ADS Google Scholar
De Domenico, M., Solé-Ribalta, A., Gómez, S. & Arenas, A. Navigability of interconnected networks under random failures. Proc. Natl Acad. Sci. USA 111, 8351–8356 (2014).
Article ADS MathSciNet MATH Google Scholar
Bassolas, A., Gallotti, R., Lamanna, F., Lenormand, M. & Ramasco, J. J. Scaling in the recovery of urban transportation systems from massive events. Sci. Rep. 10, 1–13 (2020).
Article Google Scholar
Codling, E. A., Plank, M. J. & Benhamou, S. Random walk models in biology. J. R. Soc. Interface 5, 813–834 (2008).
Article Google Scholar
Nicosia, V., Skardal, P. S., Arenas, A. & Latora, V. Collective phenomena emerging from the interactions between dynamical processes in multiplex networks. Phys. Rev. Lett. 118, 138302 (2017).
Article ADS Google Scholar
Bacry, E., Delour, J. & Muzy, J.-F. Modelling financial time series using multifractal random walks. Physica A Stat. Mech. Appl. 299, 84–92 (2001).
Article ADS MATH Google Scholar
Masuda, N., Porter, M. A. & Lambiotte, R. Random walks and diffusion on networks. Phys. Rep. 716-717, 1 – 58 (2017).
Article MathSciNet MATH Google Scholar
Zhang, Z., Shan, T. & Chen, G. Random walks on weighted networks. Phys. Rev. E 87, 012112 (2013).
Article ADS Google Scholar
Pons, P. & Latapy, M. Computing communities in large networks using random walks. Int. Symposium Computer Inf. Sci., 284–293 (2005).
Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proc. Natl Acad. Sci. USA 105, 1118–1123 (2008).
Article ADS Google Scholar
Lambiotte, R., Delvenne, J.-C. & Barahona, M. Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans. Netw. Sci. Eng. 1, 76–90 (2014).
Article MathSciNet Google Scholar
Newman, M. E. A measure of betweenness centrality based on random walks. Soc. Networks 27, 39–54 (2005).
Article ADS Google Scholar
De Domenico, M., Lancichinetti, A., Arenas, A. & Rosvall, M. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Phys. Rev. X 5, 011027 (2015).
Google Scholar
Hoffmann, T., Porter, M. A. & Lambiotte, R. Random walks on stochastic temporal networks. In Understanding Complex Systems, pp. 295–313 (Springer, Berlin/Heidelberg, Germany, 2013).
Gómez-Gardeñes, J. & Latora, V. Entropy rate of diffusion processes on complex networks. Phys. Rev. E 78, 065102 (2008).
Article ADS Google Scholar
Burda, Z., Duda, J., Luck, J. M. & Waclaw, B. Localization of the maximal entropy random walk. Phys. Rev. Lett. 102, 160602 (2009).
Article ADS Google Scholar
Sinatra, R., Gómez-Gardeñes, J., Lambiotte, R., Nicosia, V. & Latora, V. Maximal-entropy random walks in complex networks with limited information. Phys. Rev. E 83, 030103 (2011).
Article ADS Google Scholar
Redner, S. A Guide to First-passage Processes (Cambridge University Press, 2001).
Condamin, S., Bénichou, O. & Moreau, M. First-passage times for random walks in bounded domains. Phys. Revi. Lett. 95, 260601 (2005).
Article ADS Google Scholar
Condamin, S., Tejedor, V., Voituriez, R., Bénichou, O. & Klafter, J. Probing microscopic origins of confined subdiffusion by first-passage observables. Proc. Natl Acad. Sci. USA 105, 5675–5680 (2008).
Article ADS Google Scholar
Fronczak, A. & Fronczak, P. Biased random walks in complex networks: the role of local navigation rules. Phys. Rev. E 80, 016107 (2009).
Article ADS Google Scholar
Nicosia, V., Domenico, M. D. & Latora, V. Characteristic exponents of complex networks. Europhys. Lett. 106, 58005 (2014).
Article ADS Google Scholar
Bassolas, A., Sousa, S. & Nicosia, V. Diffusion segregation and the disproportionate incidence of covid-19 in African American communities. J. R Soc. Interface 18, 20200961 (2021).
Article Google Scholar
Sousa, S. & Nicosia, V. Quantifying ethnic segregation in cities through random walks. arXiv. Preprint at http://arxiv.org/abs/2010.10462 (2020).
Kuncheva, Z. & Montana, G. Community detection in multiplex networks using locally adaptive random walks. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 1308–1315 (2015).
Marcon, E. & Puech, F. Evaluating the geographic concentration of industries using distance-based methods. J. Econ. Geogr. 3, 409–428 (2003).
Article Google Scholar
Marcon, E. & Puech, F. Measures of the geographic concentration of industries: improving distance-based methods. J. Econ. Geogr. 10, 745–762 (2010).
Article Google Scholar
Braha, D. & De Aguiar, M. A. Voting contagion: modeling and analysis of a century of us presidential elections. PloS One 12, e0177970 (2017).
Article Google Scholar
Stolz, B., Harrington, H. & Porter, M. A. The topological ‘shape’ of Brexit. Available at SSRN 2843662 (2016).
Dlotko, P., Rudkin, S. & Qiu, W. An economic topology of the Brexit vote. arXiv. Preprint at http://arxiv.org/abs/1909.03490 (2019).
Waugh, A. S., Pei, L., Fowler, J. H., Mucha, P. J. & Porter, M. A. Party polarization in congress: a network science approach. arXiv. Preprint at http://arxiv.org/abs/0907.3509v3 (2011).
Hirano, S., Snyder, J. M. Jr, Ansolabehere, S. D. & Hansen, J. M. Primary elections and partisan polarization in the US Congress. Q. J. Polit. Sci. 5, 169–91 (2010).
Article Google Scholar
Neal, Z. P. A sign of the times? Weak and strong polarization in the US Congress, 1973–2016. Soc. Networks 60, 103–112 (2020).
Article Google Scholar
Guerra, P. H. C., Meira, W. Jr, Cardie, C. & Kleinberg, R. A measure of polarization on social media networks based on community boundaries. In ICWSM (2013).
Matakos, A., Terzi, E. & Tsaparas, P. Measuring and moderating opinion polarization in social networks. Data Min. Knowl. Discov. 31, 1480–1505 (2017).
Article MathSciNet MATH Google Scholar
Faustino, J., Barbosa, H., Ribeiro, E. & Menezes, R. A data-driven network approach for characterization of political parties’ ideology dynamics. Appl. Netw. Sci. 4, 48 (2019).
Article Google Scholar
Starnini, M., Machens, A., Cattuto, C., Barrat, A. & Pastor-Satorras, R. Immunization strategies for epidemic processes in time-varying contact networks. J. Theor. Biol. 337, 89–100 (2013).
Article MathSciNet MATH Google Scholar
Barrat, A., Cattuto, C., Tozzi, A. E., Vanhems, P. & Voirin, N. Measuring contact patterns with wearable sensors: methods, data characteristics and applications to data-driven simulations of infectious diseases. Clin. Microbiol. Infect. 20, 10–16 (2014).
Article Google Scholar
Kiti, M. C. et al. Quantifying social contacts in a household setting of rural kenya using wearable proximity sensors. EPJ Data Sci. 5, 1–21 (2016).
Article Google Scholar
DATASETS SocioPatterns.org. http://www.sociopatterns.org/datasets/ (2016). Accessed 30 September 2020.
Génois, M. et al. Data on face-to-face contacts in an office building suggest a low-cost vaccination strategy based on community linkers. Netw. Sci. 3, 326–347 (2015).
Article Google Scholar
G’enois, M. & Barrat, A. Can co-location be used as a proxy for face-to-face contacts? EPJ Data Sci. 7, 11 (2018).
Article Google Scholar
Barter, E. & Gross, T. Manifold cities: social variables of urban areas in the UK. Proc. R. Soc. A 475, 20180615 (2019).
Article ADS Google Scholar
Winship, C. A revaluation of indexes of residential segregation. Soc. Forces 55, 1058–1066 (1977).
Article Google Scholar
Reardon, S. F. & O’Sullivan, D. Measures of spatial segregation. Sociol. Methodol. 34, 121–162 (2004).
Article Google Scholar
Ballester, C. & Vorsatz, M. Random walk-based segregation measures. Rev. Econ. Stat. 96, 383–401 (2014).
Article Google Scholar
Louf, R. & Barthelemy, M. Patterns of residential segregation. PloS One 11, e0157476 (2016).
Article Google Scholar
Manson, S., Schroeder, J., Riper, D. V. & Ruggles, S. IPUMS National Historical Geographic Information System: Version 14.0 [Database]. Minneapolis, MN: IPUMS. 2019. https://doi.org/10.18128/D050.V14.0
Longitudinal Employer-Household Dynamics. https://lehd.ces.census.gov/ (2016). Accessed 30 May 2020.
Logan, J. R. The persistence of segregation in the 21st century metropolis. City Community 12, 160–168 (2013).
The Rise of Residential Segregation by Income∣Pew Research Center. https://www.pewsocialtrends.org/2012/08/01/the-rise-of-residential-segregation-by-income/ (2016). Accessed 30 May 2020.
Waitzman, N. J. & Smith, K. R. Separate but lethal: the effects of economic segregation on mortality in metropolitan America. Milbank Q. 76, 341–373 (1998).
Article Google Scholar
Le Roux, G., Vallée, J. & Commenges, H. Social segregation around the clock in the Paris region (France). J. Trans. Geogr. 59, 134–145 (2017).
Article Google Scholar
Petrović, A., van Ham, M. & Manley, D. Multiscale measures of population: within-and between-city variation in exposure to the sociospatial context. Ann. Am. Assoc. Geogr. 108, 1057–1074 (2018).
Google Scholar
Randon-Furling, J., Olteanu, M. & Lucquiaud, A. From urban segregation to spatial structure detection. Environ. Plan. B Urban Anal. City Sci. 47, 645–661 (2020).
Article Google Scholar
Uniform Crime Reporting (UCR) Program – FBI. https://ucr.fbi.gov/crime-in-the-u.s/ (2016). Accessed 30 September 2020.
Olteanu, M., Randon-Furling, J. & Clark, W. A. Segregation through the multiscalar lens. Proc. Natl Acad. Sci. USA 116, 12250–12254 (2019).
Article Google Scholar
Data—Voteview. https://voteview.com/data (2016). Accessed 30 September 2020.

Acknowledgements

A.B. and V.N. acknowledge support from the EPSRC New Investigator Award Grant No. EP/S027920/1. This work made use of the MidPLUS cluster, EPSRC Grant No. EP/K000128/1. This research utilised Queen Mary’s Apocrita HPC facility, supported by QMUL Research-IT (https://doi.org/10.5281/zenodo.438045).

Author information

Authors and Affiliations

School of Mathematical Sciences, Queen Mary University of London, London, UK
Aleix Bassolas & Vincenzo Nicosia

Contributions

A.B. and V.N. devised the study. A.B. performed the simulations and computations. A.B. and V.N. developed the methods, analysed the results, prepared the figures and the visual material, wrote the paper and approved the final submitted version.

Corresponding author

Correspondence to Vincenzo Nicosia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information and the Peer Review File accompany the online version of this article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bassolas, A., Nicosia, V. First-passage times to quantify and compare structural correlations and heterogeneity in complex systems. Commun Phys 4, 76 (2021). https://doi.org/10.1038/s42005-021-00580-w

Received: 04 December 2020
Accepted: 12 March 2021
Published: 15 April 2021
DOI: https://doi.org/10.1038/s42005-021-00580-w
