Introduction

Considerable research during the past few decades has aimed to understand spreading dynamics on networks1,2,3,4,5—a widespread phenomenon that occurs in diverse settings that range from biological epidemics6,7,8 to collective social processes such as social movements9 and innovation diffusion10. To study spreading, it is useful to contrast two classes of networks: ‘geometric networks,’ in which nodes lie in a metric space and are connected by short-range ‘geometric edges’ that are constrained by the nodes’ locations (for example, lattices that describe discretized partial differential equations11), and networks that are not geometric, in the sense that their edges are not constrained or defined by distances between nodes. Although the embedding of nodes in a metric space is ubiquitous for spatial networks on Earth’s surface12, recent studies have explored the mapping of nodes in a network to locations in a (potentially) latent and (typically) low-dimensional metric space for an extensive variety of applications. Such applications include inferring missing and spurious edges in networks13,14,15,16; efficiently routing information across the internet17,18; identifying node-specific attributes that are responsible for edge formation in social networks19; and nonlinear dimension reduction of proximity networks inferred from point-cloud data (for example, images, videos and time series) for data storage and signal-processing applications20,21,22,23,24,25,26,27.

When dynamics such as contagions occur on a geometrically embedded network, it is fundamental to question the extent to which the dynamics follow the underlying low-dimensional structure. This question is particularly important and difficult for geometric networks that are supplemented with long-range ‘non-geometric edges,’ which directly connect nodes that are distant from each other with respect to an underlying metric space. Long-range edges arise in numerous applications, either by chance (for example, subways that connect distant parts of cities)12 or as a result of merging distinct layers in multilayer networks28. In some scenarios, they can also be construed as a source of ‘noise’ in an otherwise geometric network (for example, when edges arise due to the presence of noise for inferred proximity networks25,26). They also play important roles in small-world network models29 such as Watts–Strogatz30, Newman–Watts31 and Kleinberg32 networks. Because we are interested in the geometric embeddedness of such networks, we use the term ‘noisy geometric networks’ for networks that include non-geometric edges as supplements to geometric edges. (See Figs 1 and 2 for examples.)

Figure 1: Examples of noisy geometric networks.

Nodes are embedded in three manifolds: (a) a ring (1D) embedded as a circle in $\mathbb{R}^2$; (b) a spherical surface (2D) in $\mathbb{R}^3$; and (c) a bounded plane (2D) embedded (nonlinearly) in $\mathbb{R}^3$ in a configuration known as the ‘swiss roll’22. Given a network with ‘geometric edges’ (blue), in a and b, we add ‘non-geometric edges’ (red) uniformly at random. In c, by contrast, we add noise to the nodes’ locations in the ambient space and place edges between nodes that are nearby in that space. In this scenario, we interpret edges between nodes that are nearby with respect to the ambient space, but not the manifold, as the non-geometric edges.

Figure 2: Wavefront propagation and the appearance of new clusters.

(a) Contagions on a noisy geometric network containing geometric edges along a manifold (in this case, a two-dimensional lattice, which we indicate with the blue edges) and non-geometric edges (red edges), which introduce shortcuts in the network. We study two phenomena in the evolution of contagion clusters (shaded areas): ‘wavefront propagation’ (WFP) describes the outward expansion of a contagion cluster’s boundary, and the ‘appearance of a new contagion cluster’ (ANC) occurs when a contagion spreads exclusively along non-geometric edges (dashed arrow). (b,c) We examine WFP and ANC for the Watts threshold model (WTM)44 for complex contagions by studying node activation times (that is, the times at which nodes adopt the contagion), which depend on the WTM thresholds {Ti}, which we take to be identical for every node (that is, Ti=T for all i). (b) For small T, frequent ANC leads to rapid dissemination of a contagion. (c) For moderate T, little to no ANC occurs and WFP leads to slow dissemination. For large T, there is no spreading. For a given network, activation times across multiple realizations of a contagion (with varying initial conditions) map the nodes to a point cloud via what we call a ‘WTM map’.

The presence of long-range edges can significantly alter how processes spread30,31,32,33. For example, it is traditional to characterize contagions in a geometric setting using ‘wavefront propagation’ (WFP)3, which agrees with the qualitative properties of historical epidemics such as the Black Death34. By contrast, refs 6, 7, 8 (and numerous other sources1) have highlighted that modern biological epidemics tend to be dominated by long-range transportation networks, such as airline networks or railway networks, rather than by geographic proximity. Spreading across long-range edges can result in the ‘appearance of new clusters’ (ANC) of a contagion that are spatially distant, which is an important phenomenon in the dynamics of recent global epidemics35. Indeed, it has been reported that prominent strains of influenza (for example, H1N1/09) exhibited a pattern of ‘skip-and-resurgence’ (in which some countries avoided outbreaks in some years) during recent worldwide outbreaks36. In addition, long-range edges can also have significant effects on social contagions37,38,39,40. Given the (either implicit or explicit) geometric embeddedness for so many of the networks on which ideas and diseases spread1,12, an improved understanding of contagions on noisy geometric networks is important for numerous applications, which range from the identification of influential spreaders of information41 to control of biological epidemics42,43.

WFP and ANC can be very different in social versus biological contagions. One important difference arises from phenomena such as social reinforcement37,38,39,40, which occurs only for social contagions. In Fig. 2, we illustrate the prominent effect of social reinforcement for the Watts threshold model (WTM)44 of social contagions. The WTM is a generalization of bootstrap percolation45 and is based on the idea that each node i has some threshold Ti≥0 (refs 46, 47) for adopting a social contagion (that is, for becoming infected). The threshold dynamics gives rise to the characterization of the WTM as a so-called ‘complex’ contagion, because the dynamics at each node i depend on the states of all neighbouring nodes, and it might be necessary for multiple neighbours to be infected before node i adopts a contagion. Importantly, for some threshold values, WFP can dominate ANC even in the presence of many ‘noisy’ edges, a phenomenon that has widespread applications (see Discussion and our Supplementary Discussion).

In the present paper, we study bifurcations in WFP and ANC dynamics by examining data that are generated by several contagions on a given noisy geometric network. Our methodology is grounded in the field of computational topology48,49, and we note that there has been rapidly intensifying interest (see, for example refs 50, 51, 52, 53) in using tools from computational topology to study structural features in networks and for machine learning54. In taking this perspective, we introduce a map from the network nodes to points in a metric space based on contagion dynamics. By analogy to diffusion maps24 and similar ideas in nonlinear dimension reduction and manifold learning20,21,22,23,24,25,26,27, we use the term ‘contagion maps’ for these maps. We investigate the topology, geometry and dimensionality properties of these maps, and we find for the contagion regime that predominantly exhibits WFP versus ANC that these properties correspond to the manifold that underlies the noisy geometric network. We examine both synthetic and empirical networks, including a transit system in London (see the section ‘Contagions on a London transit network’, Supplementary Note 1 and Supplementary Figs 1–5). Given that the manifold structure in a contagion map can reflect the underlying manifold structure of a noisy geometric network, contagion maps also help for the identification of such underlying structure. This has numerous applications, including the denoising of networks (see Supplementary Note 2 and Supplementary Figs 6 and 7).

Results

Noisy geometric networks

Noisy geometric networks are a class of networks that arise from geometric networks12 but also include non-geometric, ‘noisy’ edges. Consider a set of network nodes that have intrinsic locations in a metric space. We restrict our attention to nodes that lie on a manifold that is embedded in an ambient space. We use the term ‘node-to-node distance’ to refer to the distance between nodes in this embedding space, which we equip with the Euclidean norm (although one can also use other metrics17). To create a noisy geometric network, we place the nodes in the underlying manifold and add two families of edges: (1) a set of geometric edges, which connect nodes i and j when they are sufficiently close to one another (that is, the length of the shortest path along the manifold that connects the two nodes is less than some distance threshold); and (2) a set of non-geometric edges, which we place using some random process between pairs of nodes (i, j), where i≠j and the pair is not already joined by a geometric edge. In Fig. 1a,b, we show examples of constructing noisy geometric networks by adding non-geometric edges uniformly at random. In Fig. 1c, we show a construction that is motivated by nonlinear dimension reduction of point-cloud data22,23,24,25,26.

As an illustrative example, consider the noisy ring lattice in Fig. 1a, which is similar to the Newman–Watts variant of the Watts–Strogatz small-world model30,31. Specifically, we consider N nodes that are uniformly spaced along the unit circle in $\mathbb{R}^2$. We then add geometric edges so that every node i is connected to its d(G) nearest-neighbour nodes. (Note that there are no self-edges.) We then add d(NG) non-geometric edges to each node and connect the ends of these edges (that is, the stubs) uniformly at random while avoiding self-edges and multi-edges. The resulting network is a (d(G)+d(NG))-regular network that contains Nd(G)/2 geometric edges and Nd(NG)/2 non-geometric edges. We can thus specify this class of random networks using three parameters: N, d(G) and d(NG). It is also useful to define the ratio α=d(NG)/d(G) of non-geometric to geometric edges. Our construction assumes that N and d(G) are even. In Fig. 1a, we depict a noisy ring network with N=20 and (d(G), d(NG))=(4, 2). In Supplementary Note 3 (see also Supplementary Figs 8 and 9), we study models of noisy geometric networks on a ring manifold that incorporate heterogeneity in the nodes’ degrees and/or locations.
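The construction of a noisy ring lattice can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' published code: the function names (`edge`, `ring_edges`, `match_stubs`, `noisy_ring_lattice`) are hypothetical, and the stub matching uses a simple greedy pairing with restarts, which is an assumption made for brevity and is not guaranteed to sample matchings exactly uniformly.

```python
import random


def edge(u, v):
    """Store an undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)


def ring_edges(n, d_g):
    """Geometric edges: each node joins its d_g nearest neighbours on the ring."""
    return {edge(i, (i + s) % n) for i in range(n) for s in range(1, d_g // 2 + 1)}


def match_stubs(n, d_ng, forbidden, rng):
    """Greedily pair stubs (assumed scheme); return None if the attempt gets stuck."""
    stubs = [i for i in range(n) for _ in range(d_ng)]
    rng.shuffle(stubs)
    edges = set()
    while stubs:
        u = stubs.pop()
        # Partners that avoid self-edges, multi-edges and existing geometric edges.
        options = [k for k, v in enumerate(stubs)
                   if v != u and edge(u, v) not in forbidden and edge(u, v) not in edges]
        if not options:
            return None
        v = stubs.pop(rng.choice(options))
        edges.add(edge(u, v))
    return edges


def noisy_ring_lattice(n, d_g, d_ng, seed=0, max_tries=1000):
    """Return (geometric_edges, non_geometric_edges) for N=n, d(G)=d_g, d(NG)=d_ng."""
    rng = random.Random(seed)
    geometric = ring_edges(n, d_g)
    for _ in range(max_tries):
        non_geometric = match_stubs(n, d_ng, geometric, rng)
        if non_geometric is not None:
            return geometric, non_geometric
    raise RuntimeError("no valid stub matching found")


# Parameters of the network depicted in Fig. 1a.
geometric, non_geometric = noisy_ring_lattice(n=20, d_g=4, d_ng=2)
print(len(geometric), len(non_geometric))  # N*d_g/2 = 40 and N*d_ng/2 = 20
```

Because every stub is paired and no self-edges or repeated edges are allowed, each node ends up with exactly d(G)+d(NG) neighbours, which matches the regularity stated in the text.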

Watts threshold model

We analyse a well-known dynamical system for social contagions: the WTM for complex contagions44. In addition to allowing analytical tractability, we have two other motivations for using the WTM. First, WTM contagions yield ‘filtrations’ of a network and thereby allow us to develop a methodology grounded in computational topology48,49,50,51,52,53. Second, the WTM is a simple-but-insightful model for social influence that has the virtue of explicitly considering social reinforcement37,38,39.

We define a WTM contagion as follows. Given an unweighted network (which we represent using an adjacency matrix A) with a set of nodes and a set of edges, we let ηi(t) denote the state of node i at time t, where ηi(t)=1 indicates adoption (that is, infection) and ηi(t)=0 indicates non-adoption. We initialize a contagion at time t=0 by choosing a set of nodes, which we call the ‘contagion seed,’ and setting ηi(0)=1 for each node i in the seed and ηi(0)=0 for all other nodes. We consider synchronous updating in discrete time4, so a node i that has not already adopted the contagion at time t (that is, ηi(t)=0) will adopt it during the next time step (that is, ηi(t+1)=1) if and only if fi>Ti, where Ti is a node-specific adoption threshold, $f_i = d_i^{-1}\sum_j A_{ij}\eta_j(t)$ denotes the fraction of neighbours that are infected and $d_i = \sum_j A_{ij}$ is the degree of node i. (Note that this is a slight modification from the original WTM44, which uses the adoption criterion fi≥Ti.) We repeat this process until the system reaches an equilibrium point at some time t*<N (that is, no further adoptions occur). For each node i, we let x(i) denote the node’s ‘activation time,’ which is the time t at which the node adopts the contagion. Given {Ti} and the contagion seed, a WTM contagion on a network is a deterministic process. In addition, a node’s adoption of the contagion is irreversible (that is, there is no unadoption in this model), so the dynamics are monotonic in the sense that the subset of infected nodes at time t is non-decreasing with time (that is, every node that is infected at time t remains infected at all later times). One can thus use the contagion to construct a ‘filtration’ of the network nodes. (See refs 48, 49 and our discussion in Supplementary Note 4.)
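The update rule above translates directly into a short simulation. The following is a minimal sketch, assuming a common threshold T for all nodes and a network stored as a dictionary of neighbour sets; the function name `wtm_activation_times` and the toy ring network are illustrative choices, not part of the original work.

```python
import math


def wtm_activation_times(neighbors, seed, threshold):
    """Run a WTM contagion with synchronous updates; return activation times.

    neighbors: dict mapping each node to the set of its neighbours.
    seed: iterable of initially infected nodes (the contagion seed).
    threshold: common adoption threshold T (adopt when infected fraction > T).
    """
    infected = set(seed)
    times = {i: (0 if i in infected else math.inf) for i in neighbors}
    t = 0
    while True:
        # Evaluate every uninfected node against the state at time t.
        adopters = [
            i for i, nbrs in neighbors.items()
            if i not in infected and nbrs
            and sum(j in infected for j in nbrs) / len(nbrs) > threshold
        ]
        if not adopters:  # equilibrium: no further adoptions
            return times
        t += 1
        for i in adopters:
            times[i] = t
        infected.update(adopters)


# Hypothetical toy network: a ring of 6 nodes, each joined to its 2 nearest neighbours.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(wtm_activation_times(ring, seed={0}, threshold=0.0))
print(wtm_activation_times(ring, seed={0}, threshold=0.5))
```

With T=0 the contagion sweeps around the ring one node per step in each direction, whereas with T=0.5 a single infected neighbour out of two is not enough (the criterion is strict), so only the seed is ever infected and all other activation times stay infinite.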

Contagion maps

We study contagion maps based on WTM contagions, and we refer to these maps as ‘WTM maps.’ A WTM map is a nonlinear map of nodes in a network to a point cloud in a metric space, based on the activation times from several realizations of a WTM contagion. Given J realizations of a WTM contagion on a network with different initial conditions, the associated WTM map is a function from the set of nodes to $\mathbb{R}^J$ that records the activation time xj(i) of the ith node in the jth realization. More precisely, we define a ‘regular’ WTM map as i ↦ z(i), where z(i)=[x1(i), x2(i), …, xJ(i)]T. In practice, we enumerate the contagions j=1, 2, …, J≤N, and we initialize the jth contagion at a contagion seed that contains node j for each j. (Note that one can select any J nodes as seeds by relabelling the nodes.) In addition to the regular WTM map, we also define ‘reflected’ and ‘symmetric’ versions of the WTM map for the subset of nodes {1, 2, …, J}. The reflected WTM map exchanges the roles of the two indices (it records, for node i, the activation times of the nodes 1, 2, …, J in the ith realization), and the symmetric WTM map combines the regular and reflected maps. For a given network and a given set {Ti} of thresholds, the regular, reflected and symmetric WTM maps are deterministic.

The choice of contagion seeds plays a crucial role in determining the dynamics of WTM contagions and a WTM map. In practice, we use J=N realizations of a WTM contagion for an N-node network, for which we initialize the jth realization with a contagion seed that includes node j and its network neighbours. We use the term ‘cluster seeding’ to describe this type of initial condition, which we illustrate in Fig. 3. By contrast, we use the term ‘node seeding’ to refer to the initialization of a contagion at a single node (that is, the seed of the jth realization is node j alone). In addition, note that setting J=N means that every node seeds one realization, so the complete set of nodes is mapped by all versions of a WTM map. In Supplementary Note 5 (see also Supplementary Fig. 10 and Supplementary Table 1), we derive the typical computational complexity for constructing a WTM map in terms of the network size (including the number M of edges). We have made our code for constructing WTM maps publicly available. (See the Methods section ‘Data and code availability’.)
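To make the construction concrete, the following sketch builds a regular WTM map with cluster seeding and J=N realizations. It is an illustration under stated assumptions, not the authors' released code: nodes are assumed to be labelled 0, 1, …, N−1, all nodes share one threshold T, the helper names `activation_times` and `regular_wtm_map` are hypothetical, and infinite activation times are replaced by 2N (the convention that the text adopts for visualization).

```python
def activation_times(neighbors, seed, threshold):
    """Synchronous WTM contagion; nodes that never adopt are absent from the result."""
    infected = set(seed)
    times = {i: 0 for i in infected}
    t = 0
    while True:
        # A node adopts when its infected fraction strictly exceeds the threshold.
        adopters = [i for i, nbrs in neighbors.items()
                    if i not in infected
                    and sum(j in infected for j in nbrs) > threshold * len(nbrs)]
        if not adopters:
            return times
        t += 1
        times.update((i, t) for i in adopters)
        infected.update(adopters)


def regular_wtm_map(neighbors, threshold):
    """Return rows z[i] = [x_1(i), ..., x_N(i)] using cluster seeding."""
    nodes = sorted(neighbors)  # assumed to be 0, 1, ..., N-1
    fill = 2 * len(nodes)      # stand-in for infinite activation times
    columns = []
    for j in nodes:
        seed = {j} | set(neighbors[j])  # node j and its network neighbours
        times = activation_times(neighbors, seed, threshold)
        columns.append([times.get(i, fill) for i in nodes])
    # Transpose so that row i collects node i's activation times over all realizations.
    return [[columns[j][i] for j in range(len(nodes))] for i in range(len(nodes))]


# Hypothetical toy network: a ring of 8 nodes with no non-geometric edges.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
z = regular_wtm_map(ring, threshold=0.0)
print(z[0])
```

Each row of the result is one point of the point cloud. On this toy ring with T=0, the entry for realization j is the ring distance between node 0 and node j minus one (and zero inside the seed), which illustrates why activation times behave like a distance when the seeds are small.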

Figure 3: Contagion initialized with cluster seeding.

A WTM contagion on a noisy ring lattice in which each node has d(G)=4 geometric edges and d(NG)=1 non-geometric edge. We initialize the contagion at time t=0 by setting node s and its network neighbours as infected (indicated by the light-blue nodes and edges). This results in two contagion clusters: C1 and C2. At time t=1, depending on the WTM thresholds {Ti}, additional nodes can adopt the contagion either via WFP and/or via ANC. As indicated by the orange nodes and edges, nodes that are in the ‘boundary’ of C1 can adopt the contagion via WFP travelling around the underlying ring lattice. We illustrate this idea further in the magnifying box, where nodes a and b in the boundary of C1 can potentially become infected in the first time step. Alternatively, nodes that share only a non-geometric edge with a contagion seed can potentially become infected via ANC (as indicated by the dark-blue nodes and dashed edges).

We now motivate our choice for contagion initialization. The requirement that the jth contagion seed contains node j is convenient because it allows us to think of the activation time xj(i) as a notion of distance from node j to node i (that is, it describes the time that is required for a contagion to travel from node j to i). This choice is akin to the diffusion distance24 and commute-time distance55 derived from diffusion dynamics (although the latter is known to have shortcomings for certain classes of networks56). To illustrate this point, suppose that contagion seeds are individual nodes (that is, the jth seed is node j alone for each j), and suppose that we construct the WTM map with Ti=T=0 for each node i. In this case, the activation time exactly recovers the length of the shortest path between nodes i and j, and this in turn defines a metric on the discrete space of nodes. In fact, the N × N matrix of activation times is a dissimilarity matrix, which is central to many algorithms for dimension reduction22,23,24,25,26 (including Isomap22, which implements the mapping of nodes based on shortest paths). Letting T>0 and still assuming that each activation time is finite, we show in Supplementary Note 4 that the symmetric WTM map induces a metric on the node set. More generally, we show that a set of ‘filtrations’ induces a metric under certain conditions. Consequently, we find that one can also use topological data analysis of networks to study the embedding geometry of networks.

Although node seeding has convenient mathematical properties, cluster seeding is very useful in practice because it can allow a contagion to infect a larger fraction of the nodes in a network. When Ti>0 for each node i, it is common for WTM contagions to reach equilibria that do not saturate the network with a contagion. This implies that some activation times xj(i) are infinite. Activation times of infinity pose a problem, because WTM maps are well defined only for activation times that are finite (see the section ‘Activation times of infinity in WTM maps’). Contagions initialized with clusters of a contagion are more likely to spread than those that are initialized at a single node57, so cluster seeding increases the range of threshold choices that yield activation times that are finite. Although WTM maps that we construct using cluster seeding no longer automatically induce a metric on the node set, one can still construe xj(i) as a distance from node j to i if the contagion seeds are sufficiently small (that is, if each seed contains far fewer than N nodes).

WTM contagions on noisy ring lattices

To guide our experiments on using WTM maps to study WFP and ANC on noisy ring lattices, we conduct a bifurcation analysis for WTM contagions with Ti=T that are initialized with cluster seeding. We present our analysis in detail in the section ‘Bifurcation analysis’ and in Supplementary Note 6, and we summarize our results here.

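Our primary results are two sequences of critical values for the WTM threshold T that depend on the non-geometric degree d(NG) and geometric degree d(G). These critical values determine the presence versus absence of WFP and ANC, as well as their rates. The qualitative features of ANC behaviour are determined by the thresholds

$$T_k^{(\mathrm{ANC})} = \frac{d^{(\mathrm{NG})} - k}{d^{(\mathrm{G})} + d^{(\mathrm{NG})}}, \qquad k = 0, 1, \ldots, d^{(\mathrm{NG})}. \tag{1}$$

Whenever $T_{k+1}^{(\mathrm{ANC})} \le T < T_k^{(\mathrm{ANC})}$, a node requires at least (d(NG)−k) neighbours from non-geometric edges to be infected before it adopts the contagion. This subsequently determines the rate at which new clusters of contagion appear. For $T \ge T_0^{(\mathrm{ANC})}$, there is no ANC. The qualitative features of WFP are determined by the thresholds

$$T_k^{(\mathrm{WFP})} = \frac{d^{(\mathrm{G})}/2 - k}{d^{(\mathrm{G})} + d^{(\mathrm{NG})}}, \qquad k = 0, 1, \ldots, d^{(\mathrm{G})}/2, \tag{2}$$

where a wavefront propagates at a speed of k+1 nodes per time step for $T_{k+1}^{(\mathrm{WFP})} \le T < T_k^{(\mathrm{WFP})}$. For $T \ge T_0^{(\mathrm{WFP})}$, there is no WFP.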

In Fig. 4a, we show a bifurcation diagram that summarizes the WTM dynamics for various values of the contagion threshold T and ratio α=d(NG)/d(G) of non-geometric edges to geometric edges. The dashed and solid curves, respectively, describe equations (1) and (2) for k=0. That is, T0(ANC)=α/(1+α) and T0(WFP)=1/(2+2α), which intersect at (α, T)=(1/2, 1/3) and yield four regimes of contagion dynamics that we characterize by the presence versus absence of WFP and ANC. In Fig. 4b, we plot equations (1) and (2) with other k values for d(G)=6, where we note that lower curves correspond to larger k. Observe that increasing T for fixed α leads to slower WFP and less-frequent ANC. In particular, for (d(G), d(NG))=(6, 2) (which implies that α=1/3), we find four qualitatively different regimes of WFP and ANC traits (see the regions that we label I–IV).
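As a worked check of these critical values, the sketch below tabulates the two threshold sequences and reports which spreading modes are available for a given T. It assumes the forms T_k(ANC)=(d(NG)−k)/(d(G)+d(NG)) and T_k(WFP)=(d(G)/2−k)/(d(G)+d(NG)), which agree with the k=0 value T0(WFP)=1/(2+2α) and with the intersection point (α, T)=(1/2, 1/3) quoted in the text; the function names are hypothetical.

```python
from fractions import Fraction


def anc_thresholds(d_g, d_ng):
    """Assumed critical thresholds for the appearance of new clusters (k = 0, 1, ...)."""
    total = d_g + d_ng
    return [Fraction(d_ng - k, total) for k in range(d_ng + 1)]


def wfp_thresholds(d_g, d_ng):
    """Assumed critical thresholds for wavefront propagation (k = 0, 1, ...)."""
    total = d_g + d_ng
    return [Fraction(d_g // 2 - k, total) for k in range(d_g // 2 + 1)]


def spreading_modes(threshold, d_g, d_ng):
    """Return (has_wfp, has_anc) for a common threshold T, using the k = 0 values."""
    has_wfp = threshold < wfp_thresholds(d_g, d_ng)[0]
    has_anc = threshold < anc_thresholds(d_g, d_ng)[0]
    return has_wfp, has_anc


# The four thresholds used in the text for (d(G), d(NG)) = (6, 2).
for T in (0.05, 0.2, 0.3, 0.45):
    print(T, spreading_modes(T, d_g=6, d_ng=2))
```

For (d(G), d(NG))=(6, 2), the k=0 values are T0(ANC)=1/4 and T0(WFP)=3/8, so T=0.3 permits WFP but not ANC and T=0.45 permits neither, consistent with the regimes III and IV described in the text.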

Figure 4: Bifurcation analysis for WTM contagions on a noisy ring lattice.

(a) We plot the critical thresholds for k=0 given by equation (1) (dashed curve) and equation (2) (solid curve) versus the ratio α=d(NG)/d(G) of non-geometric to geometric edges. These curves divide the parameter space into four qualitatively different contagion regimes, which we characterize by the presence versus absence of WFP and ANC. (b) Equations (1) and (2) for other values of k further describe WFP and ANC, and we show them for d(G)=6. Note that the curves become lower with increasing k. Fixing (d(G), d(NG))=(6, 2), which yields α=1/3, we find four contagion regimes (which we label using the symbols I–IV), where increasing T corresponds to slower WFP and less-frequent ANC. (c) For N=200 and T∈{0.05, 0.2, 0.3, 0.45}, we plot the contagion size q(t) versus time t for one realization of a WTM contagion with cluster seeding (that is, q(0)=1+d(G)+d(NG)=9). We observe, as expected, that the growth rate decreases with T. In particular, for regime III (for example, T=0.3), the contagion spreads strictly via WFP, which initially spreads at a rate of 1 node per time step (both clockwise and counterclockwise along the ring) but eventually accelerates to d(G)/2 nodes per time step. As we show using the labelled black lines, we predict and observe linear growth for q(t) when the contagion spreads by WFP and no ANC and either q(t)≈1 or q(t)≈N. (See the section ‘Bifurcation analysis’ and Supplementary Note 6.) (d) We plot the number of contagion clusters C(t) versus t. As expected, C(t) only increases above its initial value of C(0)=1+d(NG)=3 for regimes I and II (for which T<T0(ANC)). There is no spreading in regime IV.

In Fig. 4c,d, we illustrate dynamics from these regimes by choosing T∈{0.05, 0.2, 0.3, 0.45} and plotting the size q(t) of the contagion (see Fig. 4c) and the number of contagion clusters C(t) (see Fig. 4d) versus time t. Note that the number C(t) of contagion clusters is equal to the number of connected components in the subgraph of the original network that only includes infected nodes and geometric edges. The values of q(t) and C(t) that we determine numerically (for N=200) agree with our analysis. For T=0.05, the WTM contagion saturates the network (that is, q(t)→N) very rapidly due in part to the appearance of many contagion clusters early in the contagion process. For T=0.2, the contagion saturates the network relatively rapidly due to the appearance of some new contagion clusters. For T=0.3, the contagion saturates the network slowly, as no new contagion clusters appear, and the contagion spreads only via WFP. For T=0.45, the contagion does not saturate the network, as neither WFP nor ANC occurs.
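The definition of C(t) as a count of connected components can be sketched as follows. This is a minimal illustration that assumes the geometric edges are stored as a dictionary of neighbour sets; the function name `count_contagion_clusters` and the toy ring network are hypothetical.

```python
def count_contagion_clusters(infected, geometric_neighbors):
    """Count connected components among infected nodes, using geometric edges only."""
    unvisited = set(infected)
    clusters = 0
    while unvisited:
        clusters += 1
        stack = [unvisited.pop()]  # start a depth-first search in a new cluster
        while stack:
            u = stack.pop()
            for v in geometric_neighbors[u]:
                if v in unvisited:
                    unvisited.remove(v)
                    stack.append(v)
    return clusters


# Hypothetical toy ring of 20 nodes with d(G) = 2 geometric edges per node.
ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}

# Cluster seeding at node 0 with two assumed non-geometric neighbours, 7 and 13:
# the seed forms one cluster around node 0 plus one isolated cluster per shortcut.
seed = {19, 0, 1, 7, 13}
print(count_contagion_clusters(seed, ring))
```

In this toy seed the count equals 1+d(NG)=3, which mirrors the initial value C(0) reported for cluster seeding in Fig. 4d. Non-geometric edges are deliberately ignored, so two infected regions that are linked only by a shortcut still count as separate clusters.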

Analysing WTM maps for noisy ring lattices

In this section, we analyse symmetric WTM maps for noisy ring lattices in several ways: geometrically, topologically and in terms of dimensionality. Our point-cloud analytics identify parameter regimes in which characteristics of a network’s underlying manifold also appear in the WTM maps. This makes it possible to do manifold learning and to assess the extent to which a contagion exhibits WFP (along a network’s underlying manifold) versus ANC.

In Fig. 5, we study WTM maps for a noisy ring lattice with N=200 and (d(G), d(NG))=(6, 2). We give each node i an intrinsic location w(i)=[cos(2πi/N), sin(2πi/N)]T on the unit circle. In Fig. 5a, we illustrate the point clouds that result from WTM maps with thresholds of T∈{0.05, 0.2, 0.3, 0.45}, which correspond to the four regimes of contagion dynamics that are predicted by equations (1) and (2) for α=1/3. (See labels I–IV in Fig. 4b.) To visualize the N-dimensional point clouds {z(i)}, we use principal component analysis (PCA) to project onto $\mathbb{R}^2$ (refs 22, 26, 58). The colour of each node at location w(i) and point z(i) reflects the activation time for node i during one realization of the WTM contagion that we use to generate the WTM map. In particular, dark-blue nodes (points) indicate the contagion seed under cluster seeding. Grey nodes (points) never adopt the contagion and thus have activation times that are infinite. For practical purposes, we set these activation times to be 2N rather than ∞. (See the Methods section ‘Activation times of infinity in WTM maps’ for additional discussion.) Regime III is the regime for which the point cloud {z(i)} appears to best resemble (up to rotation) the nodes’ intrinsic locations {w(i)}. This is expected, as this regime corresponds to WFP and no ANC. (In other words, the contagion follows the network’s underlying manifold.)

Figure 5: Contagion maps applied to noisy ring lattices.

Symmetric WTM maps were applied to a noisy ring lattice with N=200 and (d(G), d(NG))=(6, 2). (a) We show point clouds for WTM maps with T∈{0.05, 0.2, 0.3, 0.45}, which correspond, respectively, to regimes I–IV in Fig. 4b. For visualization purposes, we show two-dimensional projections of the N-dimensional point clouds after applying principal component analysis (PCA)26,58. (b) We show one realization of the contagion that we used to construct the WTM maps in a. The colour of each point in a (and of the corresponding node in b) indicates the node’s activation time from this one realization. Nodes in the contagion seed are dark blue, and nodes that never adopt the contagion are grey. (c) As we discuss in the text, we analyse point clouds that result from WTM maps with respect to three criteria: geometry through a Pearson correlation coefficient ρ; dimensionality through the embedding dimension P; and topology through the difference Δ of lifetimes. (See the main text as well as the Methods section.) The vertical dashed lines in c indicate the predicted bifurcations in contagion dynamics from equations (1) and (2) (see Fig. 4b). Note that there are activation times that are infinite for T≥T0(WFP)=3/8 (shaded region in c). As expected for regime III, ρ≈1, P≈2 and large Δ indicate that the geometry, dimensionality and topology, respectively, of the point cloud recover those of a ring manifold. See sections ‘Geometry of WTM maps’, ‘Dimensionality of WTM maps’ and ‘Topology of WTM maps’ as well as Supplementary Note 7 for discussions of these approaches for analysing point clouds.

In Fig. 5c, we summarize the characteristics of WTM maps for different thresholds T∈[0, 0.6]. For each threshold, we analyse manifold structure in a point cloud by studying geometry through a Pearson correlation coefficient ρ; dimensionality through an approximate embedding dimension P; and topology through Δ, which denotes the difference in lifetimes for the two most persistent 1-cycles in a Vietoris–Rips filtration48,49. Large values of Δ indicate the presence of a single dominant 1-cycle (that is, a ring) in a point cloud. See sections ‘Geometry of WTM maps’, ‘Dimensionality of WTM maps’ and ‘Topology of WTM maps’ as well as Supplementary Note 7 and Supplementary Figs 11–13 for additional discussion of our analysis of point clouds.

As expected from our analysis, for regime III (which exhibits WFP but no ANC), we identify characteristics of the manifold in the point clouds that result from WTM maps. Namely, for regime III, the point cloud has similar geometry (indicated by large ρ), embedding dimension (indicated by P=2) and topology (indicated by large Δ) as the network’s underlying ring manifold.

In Fig. 6, we analyse WTM maps applied to noisy ring lattices for various values of α=d(NG)/d(G). Specifically, we show values for ρ, P and Δ for N=200, d(G)=20, various T and various d(NG). We show using the dashed and solid curves, respectively, that the transitions between the qualitatively different regions of these properties closely resemble the bifurcation structure from equations (1) and (2) with k=0. In particular, when there is WFP but no ANC, we are able to consistently identify the geometry, embedding dimension and topology of the underlying manifold of the noisy ring lattice using the WTM map. When there is both WFP and ANC, the extent to which a contagion adheres to the network’s underlying manifold depends on α and T, and we can quantify this extent using the point-cloud measures ρ, P and Δ. We illustrate our observations further in Fig. 6d by fixing α=1/3 and plotting ρ, P and Δ as a function of the threshold T. We show results for (d(G), d(NG))=(6, 2) (blue dashed curves) and (d(G), d(NG))=(24, 8) (red solid curves). Observe that the latter curve is smoother than the former one. The latter curve yields values of ρ, P and Δ that better reflect the underlying ring manifold. By contrast, increasing the number N of nodes increases the contrast (that is, as observed through ρ, P and Δ) between the region that predominantly exhibits WFP and the other regions.

Figure 6: Analysing manifold structure in contagion maps.

We analyse the point clouds of WTM maps for various thresholds T for noisy ring lattices with N=200 and various ratios α=d(NG)/d(G). (As an example, we show results for d(G)=20 and various values of d(NG).) For each point cloud, we study (a) geometry through ρ, (b) dimensionality through P and (c) topology through Δ (see the text and the Methods section). The transitions between qualitatively different structures in the WTM maps (that is, as seen through ρ, P and Δ) closely resemble the bifurcation structure from equations (1) and (2), which we show for k=0 using solid and dashed curves, respectively. In d, we fix α=1/3 and plot ρ, P and Δ as a function of threshold T. We show results for (d(G), d(NG))=(6, 2) (blue dashed curves) and (24, 8) (red solid curves). Note that there are activation times that are infinite for T≥T0(WFP)=3/8 (shaded region in d). The arrows indicate the ρ, P and Δ values that we obtain for the embedding of nodes based on shortest paths, which (as we discuss in the text) one can construe as a variant of the dimension-reduction algorithm Isomap22.

To give some perspective on the performance of WTM maps for identifying a noisy geometric network’s underlying manifold even in the presence of many non-geometric edges, we use the arrows in Fig. 6d to indicate the values of ρ, P and Δ for a mapping of nodes based on shortest paths, which one can construe as a variant of the dimension-reduction algorithm Isomap22 (which we apply to an unweighted network rather than to a point cloud). Specifically, we use a WTM map with node seeding and T=0 (as we discussed in the section ‘Contagion maps’).

In Supplementary Note 8, we describe additional numerical results that compare a WTM map with Isomap22 and a Laplacian eigenmap23 for generalizations of the noisy ring lattice by (1) allowing the node locations to be a random sampling of points on the unit circle and (2) allowing heterogeneity in their geometric and non-geometric degrees. We define these other network structures in Supplementary Note 2. Our results (see Supplementary Figs 14–21) reveal large parameter regimes in which the ring manifold that underlies the noisy ring lattice is much more apparent (that is, as indicated by large ρ, small P and large Δ) for maps based on WTM contagions versus those based on shortest-path or diffusion dynamics (that is, as in the Laplacian eigenmap). We stress that any applications of dimension reduction (for example, manifold learning) in networks should use an approach that is appropriate for the question of interest. This is why we use contagions in this paper instead of other types of spreading dynamics.

Contagions on a London transit network

In addition to synthetic networks, we study WTM maps for a London transit network (see Fig. 7a). Nodes in the network represent intersections of known latitude and longitude (their coordinates are {w(i)}), geometric edges represent roads (from data used in ref. 59) and non-geometric edges represent metropolitan lines (from data used in ref. 60). We have made the network publicly available (see the Methods section ‘Data and code availability’). We present our results in detail in Supplementary Note 1, and we summarize them here.

Figure 7: Complex contagions on a London transit system.

(a) London transit network with N=2,217 nodes (that is, intersections), 2,854 roads59 (which we interpret as geometric edges) and 15 metropolitan lines60 (which we interpret as non-geometric edges). (b) Node activation times for a WTM contagion initialized with cluster seeding illustrate that, for small T, contagions spread quickly by skipping across the metro lines; this leads to ANC. (c) In contrast, for moderate T, the contagion spreads via slow WFP. (d) Although not all contagions exhibit such extreme sensitivity to T (see Supplementary Note 5), the dependence of ANC and WFP on T is captured by the geometry of WTM maps if one appropriately handles the activation times that are infinite (that is, for nodes that never adopt the contagion). See the discussion in the text. The curves with symbols indicate the values of ρ for WTM maps, and the horizontal dotted and dashed lines, respectively, indicate ρ for the mapping of nodes based on shortest-path distances (that is, as in the Isomap algorithm22) and a two-dimensional Laplacian eigenmap23.

Our central finding is that the qualitative dynamical regimes that we observe for synthetic noisy geometric networks also occur in the London transit network. More specifically, we observe both WFP and ANC. In addition, as we illustrate in Fig. 7b,c, these phenomena can be very sensitive to the WTM threshold T. We study WFP and ANC by examining the geometry of WTM maps. However, we do not study their one-dimensional (1D) homology, as computations of homology (which remain a very active area of research61,62) have a much higher computational cost than our calculations of geometry and dimensionality.

In Fig. 7d, we plot the Pearson correlation coefficient ρ that compares the distance between mapped nodes with their actual distance from each other (according to latitude and longitude) for various values of T. We show results for the regular, reflected and symmetric versions of a WTM map (curves with symbols), and the horizontal dotted and dashed lines, respectively, give ρ for the mapping of nodes based on shortest-path distances (that is, as in the Isomap algorithm22) and a two-dimensional Laplacian eigenmap23. For each type of WTM map, we handle the activation times that are infinite (see the Methods section ‘Activation times of infinity in WTM maps’) using two methods. In the method that we label ‘full,’ we keep the entire matrix that encodes activation times, and we set the activation times that are infinite to be 2N. (Recall that we used this approach when studying WTM maps for synthetic networks.) In the method that we label ‘part,’ we neglect contagions that do not saturate a network, so we use only a portion of the values in the matrix that encodes activation times. In Fig. 7d, we see that these choices give contrasting results. For the ‘full’ option, activation times of infinity (which arise whenever a contagion fails to saturate the network) distort the WTM map and lead to a drop in ρ. In contrast, the ‘part’ method neglects activation times of infinity, and we find that there is a range of T values for which there is a pronounced increase in ρ. Such improved agreement between the geometry of WTM contagions and the transit network’s inherent latitudinal and longitudinal embedding on Earth’s surface is characteristic of an increase in WFP versus ANC. Interestingly, we find that the small node degrees (for example, 〈di〉≈2.59) and the significant heterogeneity (for example, with respect to node locations, node degrees and the length of roads) in the London transit network cause WFP and ANC to be extremely sensitive to the value of T for only a few of the contagion seeds (see Supplementary Note 1 and Supplementary Fig. 5). Nevertheless, as we have demonstrated, such minority cases still have a significant effect on WTM maps.

Our numerical experiments for the London transit network highlight additional complexities that can arise for networks that are constructed from empirical data, and they offer complementary insights to our investigation of synthetic networks. In particular, the synthetic networks that we examine either are homogeneous or are only slightly heterogeneous, so the WFP and ANC behaviour tends to be similar for contagions that are initialized in different parts of a network. This is not the case for the London transit network, which has significant heterogeneity and very small node degrees (which seem to exacerbate the effect of heterogeneity). Infections that start in some parts of the network have rather different properties than those that start in others, and one also needs to consider multiple strategies for how to handle activation times of infinity. There are also other interesting phenomena that our approach can examine for heterogeneous networks. For example, in Supplementary Note 1, we study the geometry of WTM contagions for individual nodes (rather than averaging our results over an entire network) in what amounts to an ‘egocentric’ analysis of geometry. We find that the local geometry of WTM maps (and hence of contagions) at a given node relates strongly to its proximity to a metro line.

Discussion

Many empirical networks include a combination of geometric edges between nearby nodes and non-geometric, long-range edges12. Such situations can arise when nodes are restricted by their locations in a physical space (such as in a city) or in terms of latent underlying spaces16,17,18,19,20,21,22,23,24,25,26. When considering a spreading process on a noisy geometric network, it is important to understand the extent to which a contagion follows the underlying structure. (One can also consider the possibility of WFP in a latent structure6, which need not look like WFP with ordinary observations.) To address this question, we conducted a detailed investigation using the WTM of complex contagions (with uniform threshold T) on noisy geometric networks. The spreading dynamics exhibit both WFP, which follows the underlying manifold structure of a network, and ANC of the contagion in distant locations. To investigate the extent to which a WTM contagion adheres to a network’s underlying manifold, we introduced the notion of WTM maps (and contagion maps more generally) and showed that, when a contagion spreads predominantly via WFP, WTM maps recover the topology, geometry and dimensionality of a network’s underlying manifold even in the presence of many non-geometric (that is, ‘noisy’) edges.

Our methodology of constructing and analysing contagion maps has important implications not only for the analysis, modelling and control of contagions, but also for other dynamics63,64,65 that can be used to construct filtrations of networks. Moreover, by studying manifold structure in contagion maps, we have shown that such maps can also be used to identify and study manifold structure in networks. We have compared WTM maps with Laplacian eigenmaps23 and Isomaps22 (see Supplementary Note 8 for additional discussion) and found that WTM maps—which are based on a nonlinear and nonconservative dynamical process—yield results that contrast with those from the other methods. This is sensible, as nonconservative and conservative dynamics (for example, diffusion) are known to give different results for which nodes are central66 and what network structures constitute bottlenecks to the dynamics67.

In the Supplementary Discussion, we further consider the implications of our work for three important fields of research: (i) studying contagions and other dynamics from the perspective of high-dimensional data analysis (that is, computational topology and nonlinear dimension reduction), (ii) identifying low-dimensional (for example, manifold) structure in networks and (iii) identifying low-dimensional (for example, manifold) structure in point-cloud data.

Methods

Data and code availability

The London transit network that we study in the section ‘Contagions on a London transit network’ and the code that we use to construct WTM maps are available as Supplementary Data 1 and Supplementary Software 1, respectively.

Bifurcation analysis

To guide our study of WTM maps, we set Ti=T for each node i, and we perform a bifurcation analysis of WTM contagions on noisy ring lattices. In particular, we investigate the dependence of ANC and WFP on the contagion threshold T and on the network parameters d(G), d(NG) and N. In Fig. 3, we illustrate ANC and WFP for this class of networks with d(G)=4, d(NG)=1 and N=40 by considering a WTM contagion at time t=0. The light-blue nodes are in the contagion seed, which is centred at node s. Because node s is incident to both geometric and non-geometric edges, the contagion is initialized with 1+d(NG)=2 contagion clusters. We denote these clusters by C1 and C2. Cluster C1 is more likely to grow via WFP than C2. The orange nodes in Fig. 3 are what we call contagion cluster C1’s ‘boundary’, that is, the set of nodes that have yet to adopt the contagion but that are exposed to it via a geometric edge that is incident to an infected node in C1. As we show in the magnification on the right, nodes in the boundary can adopt the contagion via WFP. Nodes that are not infected and not on the boundary can become infected via ANC. (See the dark-blue nodes and dashed edges.)
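
As a rough illustration of the set-up in Fig. 3, the following Python sketch simulates a WTM contagion with cluster seeding on a ring lattice with d(G)=4, d(NG)=1 and N=40. It is not the code in Supplementary Software 1. The function names are hypothetical, the updates are assumed to be synchronous, a node is assumed to adopt the contagion when its fraction of infected neighbours exceeds T, and (purely to keep the example deterministic) each node's single non-geometric edge connects it to the antipodal node instead of being generated uniformly at random.

```python
import math


def noisy_ring_lattice(n, d_g, ng_edges):
    """Adjacency sets for n ring nodes with d_g geometric neighbours each
    (d_g / 2 on either side) plus the supplied non-geometric edges."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for step in range(1, d_g // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i, j in ng_edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def wtm_activation_times(adj, seed_node, threshold):
    """Synchronous WTM contagion. Returns each node's activation time
    (math.inf for nodes that never adopt the contagion)."""
    infected = {seed_node} | adj[seed_node]  # cluster seeding: node plus neighbours
    times = [math.inf] * len(adj)
    for i in infected:
        times[i] = 0
    t = 0
    while True:
        t += 1
        # Nodes whose fraction of infected neighbours exceeds the threshold.
        newly = [
            i for i in range(len(adj))
            if i not in infected and adj[i]
            and sum(j in infected for j in adj[i]) / len(adj[i]) > threshold
        ]
        if not newly:
            return times
        for i in newly:
            times[i] = t
        infected.update(newly)


# Hypothetical example: N=40, d(G)=4, d(NG)=1, antipodal non-geometric edges.
N = 40
adj = noisy_ring_lattice(N, 4, [(i, i + N // 2) for i in range(N // 2)])
for T in (0.1, 0.3, 0.45):
    times = wtm_activation_times(adj, seed_node=0, threshold=T)
    print(T, sum(math.isfinite(x) for x in times), "nodes adopt")
```

Collecting such activation-time vectors over many seed nodes gives the kind of matrix from which a WTM map is built.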

If node i adopts a contagion via ANC, then by definition it is not in the boundary of a contagion cluster, so its neighbours due to geometric edges have yet to adopt the contagion. Consequently, node i potentially has 0, 1, …, d(NG) neighbours that are infected, and its fraction of infected neighbours is restricted to the set $\{0,\ 1/(d^{(\mathrm{G})}+d^{(\mathrm{NG})}),\ \dots,\ d^{(\mathrm{NG})}/(d^{(\mathrm{G})}+d^{(\mathrm{NG})})\}$. This observation yields the critical thresholds

$$T_k^{(\mathrm{ANC})} = \frac{d^{(\mathrm{NG})} - k}{d^{(\mathrm{G})} + d^{(\mathrm{NG})}}, \qquad k = 0, 1, \dots, d^{(\mathrm{NG})}. \tag{1}$$

The contagion dynamics changes abruptly at the critical values of T, so the qualitative dynamics of ANC for any $T \in [T_{k+1}^{(\mathrm{ANC})}, T_k^{(\mathrm{ANC})})$ are similar to each other, but there are abrupt changes at the end points of the interval. In particular, whenever $T \in [T_{k+1}^{(\mathrm{ANC})}, T_k^{(\mathrm{ANC})})$, a node requires at least (d(NG)−k) neighbours due to non-geometric edges to be infected before it adopts the contagion. In Supplementary Note 6, we study the probability that a node has exactly (d(NG)−k) infected non-geometric neighbours at time t. For large networks, this probability is approximately $\binom{d^{(\mathrm{NG})}}{k}\,[q(t)/N]^{d^{(\mathrm{NG})}-k}\,[1-q(t)/N]^{k}$, where q(t) denotes the number of nodes that have adopted the contagion at or before time t. Note that the probability is an expectation over the ensemble of noisy ring lattices, because it uses the fact that non-geometric edges are generated uniformly at random in our model. Therefore, it does not matter which of the q(t) nodes happen to be infected.
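
The ANC thresholds can be tabulated directly. The sketch below assumes that they take the form (d(NG)−k)/(d(G)+d(NG)) for k=0, 1, …, d(NG), which is what the restriction on the fraction of infected neighbours implies, and it assumes a binomial approximation (each non-geometric neighbour is infected independently with probability q(t)/N) for the number of infected non-geometric neighbours. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def anc_thresholds(d_g, d_ng):
    """Critical ANC thresholds (d_ng - k) / (d_g + d_ng) for k = 0, ..., d_ng."""
    return [Fraction(d_ng - k, d_g + d_ng) for k in range(d_ng + 1)]


def prob_infected_ng_neighbours(m, d_ng, q, n):
    """Assumed binomial approximation: probability that exactly m of a node's
    d_ng non-geometric neighbours are among the q infected nodes (n nodes in total)."""
    p = q / n
    return comb(d_ng, m) * p**m * (1 - p) ** (d_ng - m)


# Example with d(G)=20 and d(NG)=5 (so alpha=1/4) on N=200 nodes.
print(anc_thresholds(20, 5))
print(prob_infected_ng_neighbours(2, 5, q=20, n=200))
```

Using exact fractions avoids floating-point ambiguity when T sits exactly on a critical value.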

Turning to WFP, we now study contagion transmissions exclusively across geometric edges. That is, given a node i in a contagion cluster’s boundary, we assume that the node’s neighbours due to non-geometric edges are not infected. Naturally, this assumption does not always hold, but it is insightful to first examine this ideal case and then consider more general situations as perturbations of such a baseline analysis of WFP.

To facilitate our discussion, we will use the example contagion illustrated in Fig. 3. In particular, we consider WFP in the clockwise direction for cluster C1. Nodes a, b and c are exposed, respectively, to 2, 1 and 0 nodes that have adopted the contagion, so their fractions of neighbours that are infected are fa=2/5, fb=1/5 and fc=0/5, respectively. Note that we assume that the non-geometric edges for nodes a, b and c are incident to nodes that are not infected (that is, which have not adopted the contagion). Because node i adopts the contagion only if fi>T, one of three situations can occur at time t=1: (1) if 0≤T<1/5, then nodes a and b adopt the contagion; (2) if 1/5≤T<2/5, then node a adopts the contagion; and (3) if 2/5≤T, then the contagion cluster C1 does not increase in size via WFP. Node c cannot adopt the contagion via WFP at time t=1 for any T≥0. We find that WFP is governed by the critical thresholds

$$T_k^{(\mathrm{WFP})} = \frac{d^{(\mathrm{G})}/2 - k}{d^{(\mathrm{G})} + d^{(\mathrm{NG})}}, \qquad k = 0, 1, \dots, d^{(\mathrm{G})}/2, \tag{2}$$

where a wavefront propagates at a speed of k+1 nodes per time step for $T \in [T_{k+1}^{(\mathrm{WFP})}, T_k^{(\mathrm{WFP})})$. For T≥T0(WFP), there is no WFP.
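
To make the three cases in the example concrete, the sketch below assumes WFP thresholds of the form (d(G)/2−k)/(d(G)+d(NG)), which reproduces the values 2/5 and 1/5 for d(G)=4 and d(NG)=1, and it returns the idealized wavefront speed under the assumption that no non-geometric neighbours are infected. The function names are hypothetical.

```python
from fractions import Fraction


def wfp_thresholds(d_g, d_ng):
    """Critical WFP thresholds (d_g/2 - k) / (d_g + d_ng) for k = 0, ..., d_g/2."""
    half = d_g // 2
    return [Fraction(half - k, d_g + d_ng) for k in range(half + 1)]


def wfp_speed(threshold, d_g, d_ng):
    """Idealized wavefront speed (nodes per time step on one side of a cluster),
    ignoring infected non-geometric neighbours; 0 means no WFP."""
    crit = wfp_thresholds(d_g, d_ng)
    for k in range(len(crit) - 1):
        if crit[k + 1] <= threshold < crit[k]:
            return k + 1
    return 0


# The example of Fig. 3: d(G)=4 and d(NG)=1.
for T in (Fraction(1, 10), Fraction(3, 10), Fraction(2, 5)):
    print(T, wfp_speed(T, 4, 1))
```

For α=1/3, both (d(G), d(NG))=(6, 2) and (24, 8) give a largest threshold of 3/8, which matches the shaded region described for Fig. 6d.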

We now include additional discussion of the assumptions in our analysis of WFP. Specifically, when considering whether or not node i in a contagion cluster’s boundary will become infected, we assumed that its non-geometric edges are not incident to an infected node. Obviously, this assumption is valid for d(NG)=0. However, as we discuss in Supplementary Note 6, the expected probability (over an ensemble of noisy geometric networks with non-geometric edges generated uniformly at random) that a node’s non-geometric edge is incident to an infected node is q(t)/(N−1). Similarly, the probability that a node has d(NG) non-geometric neighbours and that none of them are infected is approximately $[1-q(t)/N]^{d^{(\mathrm{NG})}}$, which is therefore the probability that our assumption is valid. In particular, whenever $q(t) \ll N$, which necessarily requires $N \gg 1$ and describes the scenario of an early stage of a contagion on a large network, the probability that our assumption is valid is approximately equal to 1. Therefore, equation (2) accurately describes the speed of WFP in this scenario with high probability. (Note that we also assume that $d^{(\mathrm{NG})} \ll N$, so there cannot be too many non-geometric edges.)

Equation (2), which one can construe as a ‘local’ result, is also very useful for predicting the ‘global’ behaviour of WFP. To see this, we make the following two observations: (1) if a contagion cannot spread when q(t)≪N, then it will not reach a state in which q(t) is comparable to N; and (2) if a contagion does spread for q(t)≪N, then it will also spread when q(t) is larger, because an increase in q(t) will help promote further spreading. Specifically, the presence of a node in the boundary with infected non-geometric neighbours can accelerate WFP by allowing the node to adopt the contagion with fewer infected geometric neighbours than equation (2) would predict. In fact, when the contagion size is large (that is, when q(t)≈N), we find that the WFP speed accelerates up to d(G)/2 nodes per time step (that is, all nodes in the boundary on one side of the contagion cluster become infected upon each time step). Similar accelerated WFP has also been observed for other applications including species dispersion68. See Supplementary Note 6 for further discussion.

In Supplementary Note 3, we use a perturbative approach to generalize our bifurcation analysis to a family of synthetic noisy geometric networks with slight heterogeneities. In our generalizations, we examine the WFP and ANC behaviour of WTM contagions at each node. When the nodes are identical (that is, as in the synthetic ring lattice), the contagion behaviour is uniform across a network; this leads to the bifurcation diagram in Fig. 4. When there is heterogeneity, the contagion behaviour at each node varies across a network. However, if the amount of heterogeneity is small, then one can construct a perturbed bifurcation diagram in which the boundaries between contagion regimes are thickened. That is, as one varies T or α, the transition from one regime (for example, WFP and no ANC) to another (for example, WFP and ANC) still occurs, but it does not occur simultaneously for each node.

Activation times of infinity in WTM maps

When studying WTM maps, one needs a strategy for dealing with activation times that are infinite (which in some cases might be useful for identifying outliers and in other cases might be problematic). After constructing a map of the nodes to a point cloud {x(i)}, the distance between points x(i) and x(j) for i≠j can be infinite or even undefined, which complicates any subsequent analyses of the point cloud. Such an issue can also arise for distances that are derived from shortest paths or the commute time for diffusion, so algorithms for mapping networks often assume that a network consists of a single connected component22,23. Distances that are infinite are not an issue for diffusion maps24, because the nodes are mapped to a bounded metric space whose diameter is equal to twice the maximum of the heat kernel.

For complex contagions, activation times that are infinite arise not only due to disconnected networks, but also for networks that are ‘disconnected’ with respect to the contagion dynamics. In the present work, we use two methods for handling activation times that are infinite: we either set these activation times to be large but finite (specifically, we choose 2N), or we neglect the contagions that lead to activation times that are infinite by restricting the map to a subset of contagions (that is, to the contagions that saturate the network). We note in passing (although we do not explore the strategy in the present manuscript) that there exist maps such as d↦d/(d+1)∈[0,1] that map an unbounded metric space to a topologically equivalent metric space that is bounded. This ought to be useful for some situations.
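
The two strategies can be sketched as follows, assuming that the activation times are stored as a list of rows with one row per contagion and one column per node. This layout, the function names and the toy values are assumptions made for illustration. The third helper illustrates the bounded map d/(d+1), with an infinite distance sent to 1.

```python
import math


def full_method(times, n):
    """'Full': keep every contagion and replace infinite activation times by 2N."""
    return [[2 * n if math.isinf(x) else x for x in row] for row in times]


def part_method(times):
    """'Part': keep only the contagions (rows) that saturate the network."""
    return [row for row in times if all(math.isfinite(x) for x in row)]


def bounded(d):
    """Map a distance in [0, inf] to d / (d + 1) in [0, 1]."""
    return 1.0 if math.isinf(d) else d / (d + 1)


# Hypothetical activation times: rows are contagions, columns are nodes.
times = [
    [0, 1, 2, 3],
    [1, 0, math.inf, math.inf],
    [2, 1, 0, 1],
]
print(full_method(times, n=4))
print(part_method(times))
print([bounded(x) for x in times[1]])
```

Note that the ‘part’ variant changes the dimension of the ambient space of the point cloud, because each discarded contagion removes one coordinate from every mapped node.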

Geometry of WTM maps

To quantify the similarity of the geometry of a WTM map to that of the nodes on the underlying manifold of a noisy geometric network, we calculate the Pearson correlation coefficient ρ that relates node-to-node distances in the WTM map to node-to-node distances between the nodes’ locations. In Fig. 5, we compare the geometry of {z(i)} (see Fig. 5a) with that of the nodes’ locations {w(i)} (see Fig. 5b) by computing ρ for the node-to-node distances in the two point clouds (that is, the distance between z(i) and z(j) and the distance between w(i) and w(j) for each pair of nodes i≠j). We conduct our comparison with respect to the ambient spaces in which the points of {z(i)} and {w(i)} lie. See Supplementary Note 7 for further discussion.
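
A minimal sketch of this comparison is given below. It assumes Euclidean distances in each ambient space and uses hypothetical function names; the toy point clouds are a ring of 12 points and a rescaled copy of it embedded in three dimensions, so the two sets of pairwise distances are proportional.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def geometry_rho(z_points, w_points):
    """Correlate node-to-node distances in the map with those of the node locations."""
    return pearson(pairwise_distances(z_points), pairwise_distances(w_points))


# Hypothetical check: node locations w on the unit circle, mapped points z.
w = [(math.cos(2 * math.pi * i / 12), math.sin(2 * math.pi * i / 12)) for i in range(12)]
z = [(3 * x, 3 * y, 0.0) for x, y in w]
print(geometry_rho(z, w))
```

Because ρ is invariant under a uniform rescaling of either point cloud, it compares the shape of the two geometries and not their absolute sizes.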

Dimensionality of WTM maps

We study the dimensionality by examining the residual variance22,58 of the point cloud {z(i)} and computing the smallest dimension such that we lose less than 5% of the variance when projecting to a lower dimension using PCA22,26,58. We refer to this dimension as the ‘embedding dimension’ P. Specifically, we estimate the embedding dimension P of a WTM map by studying p-dimensional projections of the WTM map obtained via PCA for different values of p∈{1, 2, …}. For each projection, we calculate the residual variance Rp=1−(ρ(p))² (refs 22, 58), where ρ(p) denotes the Pearson correlation coefficient that quantifies the geometric similarity between the p-dimensional projection and the unprojected WTM map (see the section ‘Geometry of WTM maps’). We define the embedding dimension P as the smallest dimension p such that Rp<0.05. See Supplementary Note 7 for further discussion.
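
The selection rule for P can be sketched as follows. The sketch assumes that the correlation coefficients ρ(p) between each p-dimensional PCA projection and the unprojected map have already been computed (the PCA step itself is not shown), and the function names and example values are hypothetical.

```python
def residual_variance(rho_p):
    """Residual variance R_p = 1 - rho_p**2 for one projection dimension."""
    return 1.0 - rho_p**2


def embedding_dimension(rho_by_p, tol=0.05):
    """Smallest p with R_p < tol, where rho_by_p[0] is the value for p=1,
    rho_by_p[1] the value for p=2, and so on. Returns None if no p qualifies."""
    for p, rho_p in enumerate(rho_by_p, start=1):
        if residual_variance(rho_p) < tol:
            return p
    return None


# Hypothetical correlations for p = 1, 2, 3.
print(embedding_dimension([0.80, 0.98, 0.999]))
```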

Topology of WTM maps

We study the topology of a WTM map by examining the persistence diagram of a Vietoris–Rips filtration that is generated by the point cloud {z(i)} (see refs 48, 49). For our experiments involving a noisy ring lattice, we are interested primarily in assessing the presence versus absence of a ring topology in a WTM map. We thus study the persistent homology of a WTM map by examining a Vietoris–Rips filtration using the software package Perseus69. We calculate persistent 1D features (that is, 1-cycles) for the point cloud, and record the difference Δ=l1−l2 between the two largest lifetimes of such 1D features. We normalize all lifetimes by the diameter of the point cloud so that Δ, l1, l2∈[0,1]. (Note that sometimes it can be preferable to use the ‘bottleneck distance’ between persistence diagrams70 rather than Δ.) See Supplementary Note 7 for further discussion.
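
Given the birth and death values of the 1-cycles from a persistence computation (obtained separately, for example with Perseus; no persistence software is called here), Δ can be computed as in the sketch below. The function name, the input format and the convention that a missing feature has lifetime 0 are assumptions made for illustration.

```python
def ring_delta(intervals, diameter):
    """Difference between the two largest lifetimes of 1D features,
    with lifetimes normalized by the diameter of the point cloud."""
    lifetimes = sorted(
        ((death - birth) / diameter for birth, death in intervals),
        reverse=True,
    )
    l1 = lifetimes[0] if lifetimes else 0.0
    l2 = lifetimes[1] if len(lifetimes) > 1 else 0.0  # assumption: absent feature -> 0
    return l1 - l2


# Hypothetical (birth, death) pairs for 1-cycles of a point cloud with diameter 2.
print(ring_delta([(0.1, 0.9), (0.2, 0.3), (0.25, 0.3)], diameter=2.0))
```

A value of Δ near 1 indicates a single dominant 1-cycle, as one expects for a ring, whereas a value near 0 indicates either no persistent 1-cycle or several of comparable lifetime.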

Additional information

How to cite this article: Taylor, D. et al. Topological data analysis of contagion maps for examining spreading processes on networks. Nat. Commun. 6:7723 doi: 10.1038/ncomms8723 (2015).