Single-trajectory map equation

Kawamoto, Tatsuro

doi:10.1038/s41598-023-33880-y

Article
Open access
Published: 22 April 2023

Scientific Reports volume 13, Article number: 6597 (2023)

Abstract

Community detection, the process of identifying module structures in complex systems represented on networks, is an effective tool in various fields of science. The map equation, which is an information-theoretic framework based on the random walk on a network, is a particularly popular community detection method. Despite its outstanding performance in many applications, the inner workings of the map equation have not been thoroughly studied. Herein, we revisit the original formulation of the map equation and address the existence of its “raw form,” which we refer to as the single-trajectory map equation. This raw form sheds light on many details behind the principle of the map equation that are hidden in the steady-state limit of the random walk. Most importantly, the single-trajectory map equation provides a more balanced community structure, naturally reducing the tendency of the overfitting phenomenon in the map equation.

Introduction

Community detection plays a vital role in various disciplines of science dealing with network data. It is an unsupervised learning task to classify the node set in a network into groups, or modules. Each subgraph that is identified as a module has no significant internal structure, while we expect each module to be structurally distinct. Thus, community detection provides a concise explanation of the dataset by ignoring detailed structures within a module as noise while preserving a significant macroscopic organisation as a module structure^1,2,3,4. Analogous to other machine-learning tasks, because it is often not apparent which parts of the dataset represent noise, community detection algorithms suffer from overfitting and underfitting problems⁵.

The map equation⁶ is a popular community detection method for networks and is formulated as a minimization problem of an information-theoretic objective function describing the average code length of the random walk. Infomap⁷, an implementation of a greedy optimisation of the map equation, has often been used to analyze real-world datasets. Furthermore, there are several extensions of the map equation itself^{8,9,10,11,12,13,14,15,16,17,18,19}, mainly focusing on incorporating higher-order network information. However, the map equation is prone to overfitting, particularly for sparse networks. Although the map equation provides an optimal module structure for the description of the random walk on a network, the resulting module structure may be excessively fine. For example, as illustrated in Fig. 1 (left), many small modules are often identified in addition to a few large modules. When too many modules consisting of only a few nodes are identified, the partition is difficult to interpret as a concise explanation of the dataset.

This study revisits the map equation and considers its raw form, which we refer to as the single-trajectory map equation. Its objective function is the average code length of (not necessarily random) walkers with finite path lengths. The concept of the single-trajectory map equation already appears in the original paper⁶ for a schematic description of the map equation. Nevertheless, this raw form has never been actively studied or utilised although it is a valuable variant with a mechanism to prune small modules and prevent the map equation from overfitting (as depicted in Fig. 1 (right)).

The emergence of small or highly unbalanced modules has been discussed in various contexts in the community detection literature. It is often considered to be the nature of real-world datasets^20,21,22, or a phenomenon that occurs because of the implementation details of an algorithm^21,23. In the context of the inference problem, the emergence of small modules is interpreted as artefacts caused by overfitting. Optimization-based (or maximum likelihood-based) methods are typically prone to overfitting, whereas methods based on Bayesian formulations avoid partitioning a network or subgraph where there is no statistically significant internal structure^24,25, that is, they avoid generating small modules. The map equation also has a Bayesian counterpart^16,19. Regardless of the underlying mechanism, the pruning of small modules is sometimes preferred in practice because it provides a more concise explanation of the network. A similar issue can be found in the regression problem in supervised learning²⁶. Although the ridge regression is a principled method, many variables with extremely small coefficients are often assessed as significant. In contrast, the lasso regression prunes such variables and provides a concise description of the dataset.

The single-trajectory map equation is a variant of the map equation that achieves a partition at a coarser resolution scale. Our approach differs from that of the hierarchical map equation⁸, which achieves finer-resolution partitions²⁷. Although a high-resolution community detection method is useful when the network consists of several small modules, a method with a coarser resolution is also needed when an algorithm suffers from overfitting. For bipartite networks, Ref. 17 showed that a coarser resolution can be obtained by incorporating the bipartiteness property. A different resolution scale can also be obtained by introducing the “Markov time”^12,28, which is an external parameter of the random walk. However, as shown in the following, the framework of the map equation can intrinsically prune small modules when formulated as the single-trajectory map equation, and balanced-size modules can be identified in a principled manner.

Results

Revisiting the map equation

We proceed with the step-by-step formulation of the map equation. A prominent characteristic of the map equation is the hierarchical encoding scheme for the random walk using multiple codebooks that takes account of module structure in a network. As a specific example, let us consider the encoding of a trajectory in a network as shown in Fig. 2a. We let ${\varvec{\zeta }} = \{ \zeta _{0}, \dots , \zeta _{T-1} \}$ be a trajectory of a walker, where $\zeta _{t} \in V$ is the tth visited node. We also consider a partition ${\varvec{\sigma }} = \{ \sigma _{1}, \dots , \sigma _{N} \}$ of node set V ($|V| = N$), where $\sigma _{i} \in \{1, \dots , K\}$ is the module label of node $i \in V$. K is the number of modules. A trajectory of a walker is encoded using two types of codebooks: inter-module and intra-module codebooks. The inter-module codebook describes the transitions of a walker moving into another module. In contrast, each intra-module codebook describes the walker transiting between nodes within the module or exiting the module.

The actual codewords for a trajectory based on the Huffman coding^29,30 are shown in Fig. 2a. Starting with the code “10” for the module that the walker has first visited, the trajectory is described by indicating the visited nodes. Every time the walker moves to a different module, the exiting code of the previous module and the entering code of the next module are consumed.
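The codeword lengths used in such an encoding can be illustrated with a minimal sketch of binary Huffman coding. The function name `huffman_code_lengths` and the toy usage counts are hypothetical; the sketch returns only the lengths (not the codewords themselves) and does not attempt to reproduce the specific codes of Fig. 2a, which also depend on tie-breaking.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: codeword length in bits} for a binary Huffman code.

    `weights` maps each symbol (a node, a module label, or an exit marker)
    to a non-negative usage count or frequency.
    """
    if not weights:
        return {}
    if len(weights) == 1:
        # Assumption of this sketch: a one-symbol codebook still uses one bit.
        return {symbol: 1 for symbol in weights}
    tie = count()  # tie-breaker so that the heap never compares symbol lists
    heap = [(w, next(tie), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees makes every codeword below them one bit longer.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths
```

Frequently used symbols receive shorter codewords, which is why a partition that keeps the exit and module codewords rare can compress a trajectory.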

In general, given a trajectory ${\varvec{\zeta }}$ of length T and module assignments ${\varvec{\sigma }}$, the average code length ${\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) $ is expressed as

$$\begin{aligned}&{\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \frac{1}{T} \Biggl ( \sum _{t=0}^{T-1} \ell _{0}\left( \zeta _{t}, \sigma _{\zeta _{t}} \right) + \sum _{\begin{array}{c} t=0 \\ (\sigma _{\zeta _{t}} \ne \sigma _{\zeta _{t+1}}) \end{array} }^{T-2} \biggl ( \ell _{0}\left( \curvearrowright , \sigma _{\zeta _{t}} \right) + \ell _{1}\left( \sigma _{\zeta _{t+1}} \right) \biggr ) +\ell _{1}\left( \sigma _{\zeta _{0}} \right) \Biggr ). \end{aligned}$$

(1)

Here, we denote $\ell _{0}\left( i, \sigma \right) $ as the length of the code in an intra-module codebook, indicating that a walker visits node $i \in V$ in module $\sigma $; $\ell _{0}\left( \curvearrowright , \sigma \right) $ as the length of the code in an intra-module codebook, indicating that a walker exits module $\sigma $; and $\ell _{1}\left( \sigma \right) $ as the length of the code in an inter-module codebook, indicating that a walker enters module $\sigma $. In Eq. (1), the first summation represents the code length for visited nodes and the second summation represents the code length for transitions between modules. The last term is the code length for the module at the starting point, with a negligible contribution when $T \gg 1$. It is important that the codebooks are coupled; that is, because an exiting code from a module belongs to an intra-module codebook, transitions between modules affect the encoding of transitions within each module.
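Equation (1) can be evaluated directly once the codeword lengths are given. The following is a minimal sketch under the assumption that the lengths are supplied as plain dictionaries; the names `average_code_length`, `ell0`, `ell0_exit`, and `ell1` are illustrative and not taken from the paper's code.

```python
def average_code_length(traj, sigma, ell0, ell0_exit, ell1):
    """Average code length of Eq. (1), in bits per step.

    traj      : sequence of visited nodes (zeta_0, ..., zeta_{T-1})
    sigma     : dict node -> module label
    ell0      : dict node -> length of its codeword in its intra-module codebook
    ell0_exit : dict module -> length of the exit codeword of that module
    ell1      : dict module -> length of its codeword in the inter-module codebook
    """
    T = len(traj)
    # Last term of Eq. (1): codeword for the module of the starting point.
    total = ell1[sigma[traj[0]]]
    for t, node in enumerate(traj):
        # First summation: codeword for every visited node.
        total += ell0[node]
        # Second summation: exit and enter codewords when the module changes.
        if t + 1 < T and sigma[node] != sigma[traj[t + 1]]:
            total += ell0_exit[sigma[node]] + ell1[sigma[traj[t + 1]]]
    return total / T
```

For a four-step toy trajectory that crosses between two modules once, the total is the sum of four node codewords, one exit codeword, and two module codewords (the starting module and the entered module), divided by 4.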

The principle of the map equation framework is that the compression of the average code length through the hierarchical coding reveals a module structure as an optimal partition ${\varvec{\sigma }}$. Readers might believe that the introduction of codewords for transitions between modules simply makes the code length longer. However, such a hierarchical encoding scheme can compress the average code length because it allows us to assign shorter codewords for visited nodes; for example, although the code “0” is assigned to two different nodes in Fig. 2a, they are distinguishable because they belong to different modules. Therefore, when a trajectory rarely consumes the codewords for transitions between modules, the average code length can be compressed more efficiently.

Equation (1) can also be expressed using visiting frequencies as follows:

$$\begin{aligned}&{\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \left( \frac{1}{T} + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } \right) {\mathscr {H}}_{1} + \sum _{\sigma } \left( \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } \right) {\mathscr {H}}^{\sigma }_{0}, \end{aligned}$$

(2)

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\mathscr {H}}_{1} = \sum _{\sigma ^{\prime }} \frac{\sum _{\sigma }\hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma }}{ 1/T + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } } \ell _{1}\left( \sigma ^{\prime } \right) + \frac{ 1/T }{ 1/T + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{{\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } } \ell _{1}\left( \sigma _{\zeta _{0}} \right) \\ {\mathscr {H}}^{\sigma }_{0} = \sum _{i \in \sigma } \frac{\hat{{\textsf{p}}}_{i}}{ \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } } \ell _{0}\left( i, \sigma \right) + \frac{ \sum _{\sigma ^{\prime }} \hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma } }{ \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } } \ell _{0}\left( \curvearrowright , \sigma \right) \end{array}\right. }, \end{aligned}$$

(3)

where $\sum _{i \in \sigma }$ represents the sum over the node set in module $\sigma $. $\hat{{\textsf{p}}}_{i}$ is the visiting frequency of node $i \in V$ and $\hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma }$ is the joint transition frequency from module $\sigma $ to module $\sigma ^{\prime }$, i.e.,

$$\begin{aligned} \hat{{\textsf{p}}}_{i}&= \frac{1}{T} \sum _{t=0}^{T-1} \delta _{i, \zeta _{t}}, \end{aligned}$$

(4)

$$\begin{aligned} \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }&= {\left\{ \begin{array}{ll} \frac{1}{T} \sum _{t=0}^{T-2} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{t+1}}} \delta _{\sigma , \sigma _{\zeta _{t}}} &{} (\text {for } \sigma \ne \sigma ^{\prime })\\ 0 &{} (\text {for } \sigma = \sigma ^{\prime }) \end{array}\right. }, \end{aligned}$$

(5)

where $\delta _{ab}$ represents the Kronecker delta. ${\mathscr {H}}^{\sigma }_{0}$ and ${\mathscr {H}}_{1}$ are conditional average code lengths within the intra- and inter-module codebooks, respectively.
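The empirical frequencies of Eqs. (4) and (5) are simple counts over the trajectory. A minimal sketch, with the hypothetical function name `empirical_frequencies` and with module transitions keyed as (destination, origin) to match the index order $\sigma ^{\prime } \sigma $:

```python
from collections import Counter


def empirical_frequencies(traj, sigma):
    """Return (p_node, p_module) following Eqs. (4) and (5).

    p_node[i]        : visiting frequency of node i
    p_module[(b, a)] : frequency of transitions from module a to module b (a != b)
    """
    T = len(traj)
    visit_counts = Counter(traj)
    jump_counts = Counter()
    for here, there in zip(traj, traj[1:]):
        if sigma[here] != sigma[there]:
            jump_counts[(sigma[there], sigma[here])] += 1
    p_node = {i: n / T for i, n in visit_counts.items()}
    p_module = {key: n / T for key, n in jump_counts.items()}
    return p_node, p_module
```

Transitions within a module are not stored, which corresponds to the zero entries for $\sigma = \sigma ^{\prime }$ in Eq. (5).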

Recall that the random walk is a stochastic variable; there is no such thing as a single (finite-length) trajectory representing the random walk. Therefore, instead of a specific trajectory, we consider the expected average code length ${\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] $ in the map equation, where ${\varvec{Z}}$ is the stochastic variable representing the random walk; in other words, ${\mathbb {E}}_{{\varvec{Z}}}[\cdots ]$ is the ensemble average over all possible trajectories (say, from all possible starting points). We assume that the trajectory length T is sufficiently large that the random walk is in a steady state. When the network is strongly connected, the empirical frequencies are converted to the corresponding steady-state probabilities. $\sum _{\sigma } \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }$ is converted to the entering probability into module $\sigma ^{\prime }$ denoted by $q_{\sigma ^{\prime } \curvearrowleft }$; $\sum _{\sigma ^{\prime }} \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }$ to the exiting probability from module $\sigma $ denoted by $q_{\sigma \curvearrowright }$; and $\hat{{\textsf{p}}}_{i}$ to the visiting probability of node i denoted by $q_{i}$. The conditional average code lengths ${\mathscr {H}}^{\sigma }_{0}$ and ${\mathscr {H}}_{1}$ are also converted to the expectations. According to Shannon’s source coding theorem^30,31, these expectations are respectively bounded by the Shannon entropies,

$$\begin{aligned} H_{1}\left( \{ q_{\sigma \curvearrowleft } \} \right)&= -\sum _{\sigma =1}^{K} \frac{q_{\sigma \curvearrowleft }}{q_{\curvearrowleft }} \log \frac{q_{\sigma \curvearrowleft }}{q_{\curvearrowleft }}, \end{aligned}$$

(6)

$$\begin{aligned} H^{\sigma }_{0}\left( q_{\sigma \curvearrowright }, \{ q_{i} \}_{i\in \sigma } \right)&= -\frac{q_{\sigma \curvearrowright }}{p^{\sigma }_{\circlearrowright }} \log \frac{q_{\sigma \curvearrowright }}{p^{\sigma }_{\circlearrowright }} -\sum _{i \in \sigma } \frac{q_{i}}{p^{\sigma }_{\circlearrowright }} \log \frac{q_{i}}{p^{\sigma }_{\circlearrowright }}, \end{aligned}$$

(7)

where $q_{\curvearrowleft } = \sum _{\sigma =1}^{K} q_{\sigma \curvearrowleft }$, $p^{\sigma }_{\circlearrowright } = q_{\sigma \curvearrowright } + \sum _{i \in \sigma } q_{i}$, and $\log $ is the logarithm with base 2. Then, the expected average code length of the random walk is bounded from below as follows:

$$\begin{aligned} L\left( {\varvec{\sigma }} \right)&= q_{\curvearrowleft } H_{1}\left( \{ q_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } p^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( q_{\sigma \curvearrowright }, \{ q_{i} \}_{i\in \sigma } \right) \le {\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] . \end{aligned}$$

(8)

Note here that the contribution from the starting points of the random walk is excluded. This lower bound asymptotically coincides with the expected average code length itself as $T \rightarrow \infty $. This is the objective function of the map equation and the node partition ${\varvec{\sigma }}$ is optimised so that $L\left( {\varvec{\sigma }} \right) $ is minimised.
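To make Eq. (8) concrete, the sketch below evaluates $L\left( {\varvec{\sigma }} \right) $ for an undirected, unweighted network without teleportation. It relies on the standard fact that the steady-state visiting probability of a node is then proportional to its degree, so the exit probability of a module is proportional to the number of edges leaving it; these restrictions and the function names are assumptions of the sketch and are not part of Infomap.

```python
from math import log2


def plogp(x):
    """Return x * log2(x), with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, sigma):
    """Two-level map equation L(sigma) of Eq. (8), in bits.

    edges : list of undirected edges (u, v), each listed once
    sigma : dict node -> module label
    """
    two_m = 2 * len(edges)
    degree = {}  # node -> degree
    cut = {}     # module -> number of edge ends that leave the module
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            degree[a] = degree.get(a, 0) + 1
            if sigma[a] != sigma[b]:
                cut[sigma[a]] = cut.get(sigma[a], 0) + 1

    q_exit = {s: n / two_m for s, n in cut.items()}
    q_total = sum(q_exit.values())
    # q * H_1 of Eq. (6); entering and exiting probabilities coincide here.
    code_length = plogp(q_total) - sum(plogp(q) for q in q_exit.values())
    for s in set(sigma[i] for i in degree):
        q_s = q_exit.get(s, 0.0)
        q_nodes = [k / two_m for i, k in degree.items() if sigma[i] == s]
        p_s = q_s + sum(q_nodes)
        # p^sigma * H_0^sigma of Eq. (7)
        code_length += plogp(p_s) - plogp(q_s) - sum(plogp(q) for q in q_nodes)
    return code_length
```

With a single module, the inter-module term vanishes and the value reduces to the entropy of the degree-proportional visiting probabilities.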

The assumption that the network is strongly connected plays a vital role in the aforementioned derivation. If this is not the case, the trajectory length T cannot be sufficiently large. Then, the contribution from the starting points of the random walk may not be negligible in ${\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] $. Therefore, we can say that the map equation evaluates the code length of the “flow.” It is a stochastic variable representing the ensemble of transitions, and it has no information about the starting points of the random walk by definition (as discussed below, this distinction becomes more prominent when we consider the --flow-model rawdir option in Infomap). The only input for the map equation is a network because the connectivity of nodes fully characterises the flow. By introducing so-called teleportation³², which moves the walker to another node randomly with a certain probability, we can always let the trajectory length T be infinitely large and make the flow ergodic⁶. Therefore, the map equation is not essentially limited to strongly connected networks.

Single-trajectory map equation

The average code length ${\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) $ of a trajectory is the raw form of the objective function in the map equation. When we have multiple trajectories $\{{\varvec{\zeta }}_{a}\} := \{{\varvec{\zeta }}_{1}, \dots , {\varvec{\zeta }}_{M}\}$ on a common node set, analogous to the expected average code length ${\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] $, we consider the following mean average code length:

$$\begin{aligned} \overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) = \frac{1}{M} \sum _{a=1}^{M} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) . \end{aligned}$$

(9)

The trajectories may have different lengths. Similar to $L\left( {\varvec{\sigma }} \right) $, this mean average code length can be used as an objective function to be minimised to determine the optimal module assignments of nodes. We refer to such an optimisation method as the single-trajectory map equation. Note that the trajectories $\{{\varvec{\zeta }}_{a}\}$ are provided as inputs in Eq. (9); unlike the map equation, there is no need to assume that they are generated (or simulated) from random walks, although one can consider simulated walks as trajectories.

The average code lengths in the summation of Eq. (9) are not independent because they share codebooks. To illustrate this, let us consider two trajectories as shown in Fig. 2b. Although the trajectory ${\varvec{\zeta }}_{1}$ is identical to ${\varvec{\zeta }}$ in Fig. 2a, the codes describing them are different because, owing to the existence of trajectory ${\varvec{\zeta }}_{2}$, we must also assign codewords to nodes that ${\varvec{\zeta }}_{1}$ does not pass through. In contrast, nodes that no trajectory passes through do not contribute to the average code lengths, reflecting the fact that trajectories of finite length are considered. Those nodes should not have any module labels because the trajectories provide no information about them.

As we have seen, $L\left( {\varvec{\sigma }} \right) $ and $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ are conceptually different. In the map equation, $L\left( {\varvec{\sigma }} \right) $ is the expected average code length for the flow that is completely specified by the transition probabilities. We can also modify the transitions using teleportation to make the random walk ergodic. By contrast, $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ does not have such a stochasticity. It is the mean of the actual average code lengths, where each element corresponds to a single trajectory. Furthermore, $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ depends explicitly on the coding scheme applied, e.g., the Huffman coding, Shannon–Fano coding³⁰, etc. Quantitatively, the contribution from the last term in Eq. (1) mainly makes the minimization of $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ distinct from that of $L\left( {\varvec{\sigma }} \right) $. The codeword for the module that is required to specify the starting point of a trajectory makes the coding using multiple codebooks less efficient. Recall that an efficient compression is achieved when the inter-module codebook is not frequently used. This implies that the introduction of module labels is more costly in $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ and the single-trajectory map equation avoids generating many small modules.

The single-trajectory map equation searches for the node partition ${\varvec{\sigma }}$ that achieves the optimal compression for the description of trajectories under a certain coding scheme. The optimality of the coding scheme itself is not required for the effectiveness of the method. Therefore, we can use different types of coding for the intra-module and inter-module codebooks. For example, we can introduce a heterogeneous coding where the code lengths are multiplied by a constant factor $\lambda > 0$ for the codewords in the inter-module codebook. That is, given a trajectory ${\varvec{\zeta }}$ and a partition ${\varvec{\sigma }}$, Eq. (1) is modified to

$$\begin{aligned}&{\mathscr {L}}_{\lambda }\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \frac{1}{T} \Biggl ( \sum _{t=0}^{T-1} \ell _{0}\left( \zeta _{t}, \sigma _{\zeta _{t}} \right) + \sum _{\begin{array}{c} t=0 \\ (\sigma _{\zeta _{t}} \ne \sigma _{\zeta _{t+1}}) \end{array} }^{T-2} \biggl ( \ell _{0}\left( \curvearrowright , \sigma _{\zeta _{t}} \right) + \lambda \ell _{1}\left( \sigma _{\zeta _{t+1}} \right) \biggr ) +\lambda \ell _{1}\left( \sigma _{\zeta _{0}} \right) \Biggr ). \end{aligned}$$

(10)

Here, $\lambda $ is a hyperparameter that penalises the emergence of modules when such modules are relatively inefficient for the compression of the code length.
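As a small worked consequence (a sketch that follows by subtracting Eq. (1) from Eq. (10) term by term, not an additional result of the paper), the heterogeneous coding only reweights the inter-module codewords that a trajectory actually consumes:

$$\begin{aligned} {\mathscr {L}}_{\lambda }\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) - {\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \frac{\lambda - 1}{T} \Biggl ( \ell _{1}\left( \sigma _{\zeta _{0}} \right) + \sum _{\begin{array}{c} t=0 \\ (\sigma _{\zeta _{t}} \ne \sigma _{\zeta _{t+1}}) \end{array} }^{T-2} \ell _{1}\left( \sigma _{\zeta _{t+1}} \right) \Biggr ). \end{aligned}$$

For instance, for a hypothetical trajectory with $T = 4$ that changes module once, with every inter-module codeword one bit long, the bracket equals 2 bits, so $\lambda = 2$ adds $(2 - 1) \cdot 2 / 4 = 0.5$ bits per step, and $\lambda = 1$ recovers Eq. (1).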

We can also derive a lower bound for the actual code length using Shannon’s source coding theorem, similar to how $L\left( {\varvec{\sigma }} \right) $ was such an estimate for the random walk in the steady-state limit. To this end, we consider the average code length of the concatenated code, $\sum _{a=1}^{M} T_{a} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) /\sum _{a=1}^{M} T_{a}$, where $T_{a}$ is the length of the ath trajectory; equivalently $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ when all trajectories have the same length. We regard the empirical frequencies $\hat{{\textsf{p}}}_{i}$ and $\hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }$ as the true probabilities for the stochastic variables indicating the codewords and the concatenated code as the expected code length. Then, the conditional average code lengths in Eq. (3) are bounded from below by the Shannon entropies³⁰ with the empirical frequencies. Therefore, the average code length of the concatenated code is bounded as follows:

$$\begin{aligned} \underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \hat{{\textsf{q}}}_{\curvearrowleft } H_{1}\left( \{ \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( \hat{{\textsf{q}}}_{\sigma \curvearrowright }, \{ \hat{{\textsf{q}}}_{i} \}_{i\in \sigma } \right) \le \frac{1}{\sum _{a=1}^{M} T_{a}}\sum _{a=1}^{M} T_{a} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) , \end{aligned}$$

(11)

where

$$\begin{aligned} \hat{{\textsf{q}}}_{i}&= \frac{1}{\sum _{a=1}^{M} T_{a}} \sum _{a=1}^{M} \sum _{t=0}^{T_{a}-1} \delta _{i, \zeta _{a t}}, \nonumber \\ \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }&= {\left\{ \begin{array}{ll} \displaystyle \frac{1}{\sum _{a=1}^{M} T_{a}} \sum _{a=1}^{M} \sum _{t=0}^{T_{a}-2} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a t+1}}} \delta _{\sigma , \sigma _{\zeta _{a t}}} &{} (\sigma \ne \sigma ^{\prime })\\ 0 &{} (\sigma = \sigma ^{\prime }) \end{array}\right. }, \nonumber \\ \hat{{\textsf{q}}}_{\sigma ^{\prime } \curvearrowleft }&= \frac{ \sum _{a=1}^{M} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a 0}} } }{\sum _{a=1}^{M} T_{a}} + \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }, \hat{{\textsf{q}}}_{\sigma \curvearrowright } = \sum _{\sigma ^{\prime }} \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }, \nonumber \\ \hat{{\textsf{q}}}_{\curvearrowleft }&= \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowleft }, \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } = \hat{{\textsf{q}}}_{\sigma \curvearrowright } + \sum _{i \in \sigma } \hat{{\textsf{q}}}_{i}. \end{aligned}$$

(12)

We regard $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ as an alternative objective function for the single-trajectory map equation. $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ is independent of the coding scheme and its minimization is computationally more efficient than that of $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ because we do not need to construct the codebooks explicitly. Note that $\hat{{\textsf{q}}}_{\sigma \curvearrowleft }$ and $\hat{{\textsf{q}}}_{\sigma \curvearrowright }$ may not coincide in Eq. (12), whereas $q_{\sigma \curvearrowleft } = q_{\sigma \curvearrowright }$ for any module in $L\left( {\varvec{\sigma }} \right) $ owing to the detailed balance condition of the random walk in the steady state. Analogous to ${\mathscr {L}}_{\lambda }\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) $ in Eq. (10), we can also consider a heterogeneous coding in $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $, i.e.,

$$\begin{aligned} \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \lambda \, \hat{{\textsf{q}}}_{\curvearrowleft } H_{1}\left( \{ \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( \hat{{\textsf{q}}}_{\sigma \curvearrowright }, \{ \hat{{\textsf{q}}}_{i} \}_{i\in \sigma } \right) . \end{aligned}$$

(13)

Interestingly, the method using $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ is also related to a variant of the map equation that is implemented in Infomap as an option named --flow-model rawdir. In this variant of the map equation, we consider the flow based on the set of transition probabilities induced by the edges (i.e., not the random walk on the network). The corresponding objective function is in fact equivalent to $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ where we ignore the codewords for the initial module and the initial node in each trajectory (consequently, the total length of trajectories $\sum _{a=1}^{M} T_{a}$ is also modified to $\sum _{a=1}^{M} T_{a} - M$). A summary table of the average code lengths is shown in Supplementary Table 1 (Section S1).
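The entropy-based objective of Eqs. (11) to (13) can be computed from trajectory counts alone, without constructing codebooks. The following is a minimal sketch; the function names and the `lam` argument (the factor $\lambda $ of Eq. (13), with `lam=1.0` giving Eq. (11)) are illustrative choices and do not reproduce the author's published implementation.

```python
from collections import Counter
from math import log2


def plogp(x):
    """Return x * log2(x), with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def lower_bound_code_length(trajs, sigma, lam=1.0):
    """Entropy-based objective of Eq. (13) in bits; lam=1.0 gives Eq. (11)."""
    total_T = sum(len(z) for z in trajs)
    visits = Counter()  # node visit counts
    enter = Counter()   # module-entering counts, including starting modules
    leave = Counter()   # module-exiting counts
    for z in trajs:
        visits.update(z)
        enter[sigma[z[0]]] += 1  # first term of q_{sigma' enter} in Eq. (12)
        for here, there in zip(z, z[1:]):
            if sigma[here] != sigma[there]:
                leave[sigma[here]] += 1
                enter[sigma[there]] += 1

    q_enter = [n / total_T for n in enter.values()]
    # q_enter_total * H_1: inter-module codebook term
    index_term = plogp(sum(q_enter)) - sum(plogp(q) for q in q_enter)

    module_term = 0.0
    for s in set(sigma[i] for i in visits):
        q_exit = leave[s] / total_T
        q_nodes = [n / total_T for i, n in visits.items() if sigma[i] == s]
        p_s = q_exit + sum(q_nodes)
        # p^sigma * H_0^sigma: intra-module codebook term
        module_term += plogp(p_s) - plogp(q_exit) - sum(plogp(q) for q in q_nodes)
    return lam * index_term + module_term
```

Because each trajectory contributes a starting-module term to the entering frequencies, the entering and exiting frequencies of a module generally differ here, and short trajectories make the inter-module term comparatively expensive.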

Before moving on, let us compare how the minimizations of $L( {\varvec{\sigma }})$, $\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )$, and $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ differ using a simple example. We consider a trajectory where a walker visits each node exactly once on a path. Figure 3 shows the results obtained through the exact minimization of the objective functions. It quantifies how the average code lengths approach a common value as N increases, because the contribution from the starting point of the trajectory becomes negligible. ${\mathscr {L}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) $ and $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) $ quickly approach each other, whereas $L( {\varvec{\sigma }})$ converges relatively slowly, implying that the contribution from the codeword of the initial module can be considerable. We also confirmed that the single-trajectory map equation indeed tends to identify a smaller number of modules, and the resulting partitions can vary depending on the coding scheme applied.

The exact minimization of the (expected) average code length is not computationally feasible unless a dataset is extremely small, and thus, we must rely on approximate heuristics in practice. The greedy heuristic implemented in Infomap is commonly used for the map equation. Therefore, we implemented the optimisation for the single-trajectory map equation as a wrapper of Infomap. That is, we first run Infomap to obtain the initial node partition and then reduce overfitting by pruning small modules based on $\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )$ or $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) $ as a fine-tuning process; our fine-tuning algorithm is also a greedy heuristic. In the following, we refer to this algorithm as Infomap+, and the implementation code is publicly available³³. Further details of the algorithm are described in the Methods section.
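The general idea of fine-tuning an initial partition by greedy merging can be sketched as follows. This is a generic illustration only and is not the Infomap+ algorithm, whose actual procedure is given in the Methods section and the published code; the function `greedy_merge` and its `objective` callable (any function that maps a partition to a code length) are hypothetical.

```python
def greedy_merge(sigma, objective):
    """Repeatedly merge the pair of modules that lowers `objective` the most.

    sigma     : dict node -> module label (the initial partition; not modified)
    objective : callable taking such a dict and returning a number to minimise
    Returns (partition, objective value) once no merge gives an improvement.
    """
    sigma = dict(sigma)
    best = objective(sigma)
    while True:
        labels = sorted(set(sigma.values()))
        candidate, candidate_value = None, best
        for x, a in enumerate(labels):
            for b in labels[x + 1:]:
                # Trial partition in which module b is absorbed into module a.
                trial = {i: (a if s == b else s) for i, s in sigma.items()}
                value = objective(trial)
                if value < candidate_value:
                    candidate, candidate_value = trial, value
        if candidate is None:
            return sigma, best
        sigma, best = candidate, candidate_value
```

In this sketch every pair of modules is tried in each round, which is only practical for a small number of modules; a practical implementation would restrict the candidates, for example to small modules and their neighbours.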

Experiments

This section demonstrates that the single-trajectory map equation prevents overfitting using datasets represented as networks and a real-world dataset as a set of trajectories. A network is a special case of trajectory datasets because each directed edge can be regarded as a trajectory with length $T=2$. We treat each edge as a pair of directed edges in both directions for an undirected network. All networks considered are weakly connected.
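The conversion of a network into a set of length-2 trajectories described above can be written in a few lines. A minimal sketch, with the hypothetical function name `edges_to_trajectories`:

```python
def edges_to_trajectories(edges, directed=False):
    """Treat each directed edge as a trajectory of length T = 2.

    For an undirected network, each edge yields two trajectories,
    one in each direction.
    """
    trajectories = []
    for u, v in edges:
        trajectories.append((u, v))
        if not directed:
            trajectories.append((v, u))
    return trajectories
```

An undirected network with m edges therefore gives M = 2m trajectories with a total length of 4m.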

For the network datasets, we can also consider simulated walks on the underlying network as the input trajectories. In this setting, we would need to specify the type of simulated walks and choose the values of T and M as hyperparameters. Herein, however, we do not consider the simulated walks and treat the edge set directly as the set of trajectories.

Network datasets

We first consider synthetic networks that are generated by the stochastic block model (SBM)^25,37,38,39, which is a random graph model having a planted (pre-assigned) module structure. This is a canonical model that is used for analyses in community detection. We particularly consider the so-called symmetric SBM that has two equally sized planted modules. Each pair of nodes in the same planted module is connected with probability $p_{\textrm{in}}$ and each pair of nodes in different planted modules is connected with probability $p_{\textrm{out}}$. The symmetric SBM is commonly parameterized by the average degree c and the fuzziness of module structure $\epsilon = p_{\textrm{out}}/p_{\textrm{in}}$ instead of $p_{\textrm{in}}$ and $p_{\textrm{out}}$. The detection of planted modules is easier when $\epsilon $ is small because the module structure is clearer. Even when $\epsilon < 1$, there exists a critical value of $\epsilon $ above which it becomes impossible to identify the planted module structure better than by chance; this is known as the detectability limit^{24,34,35,36,39,40} (in the limit $N \rightarrow \infty $). For these networks, the single-trajectory map equation cannot be the best method, because Bayesian inference methods based on the SBM can avoid overfitting altogether.
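A symmetric SBM instance can be sampled as in the sketch below. The conversion from $(c, \epsilon )$ to $(p_{\textrm{in}}, p_{\textrm{out}})$ uses the exact expected degree for finite N, $c = (N/2 - 1)\,p_{\textrm{in}} + (N/2)\,p_{\textrm{out}}$; this convention, the requirement that N be even, and the function name `symmetric_sbm` are assumptions of the sketch, since the paper's exact convention is not stated here.

```python
import random


def symmetric_sbm(n, c, eps, seed=None):
    """Sample a two-module symmetric SBM; return (edges, planted labels).

    n    : number of nodes (assumed even)
    c    : target average degree
    eps  : fuzziness p_out / p_in
    seed : optional seed for reproducibility
    """
    rng = random.Random(seed)
    half = n // 2
    planted = [0 if i < half else 1 for i in range(n)]
    # Expected degree: (half - 1) * p_in + half * p_out, with p_out = eps * p_in.
    p_in = c / ((half - 1) + eps * half)
    p_out = eps * p_in
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if planted[i] == planted[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, planted
```

The double loop costs of order $N^{2}$ operations, which is adequate for networks of a few hundred nodes such as the $N = 300$ example.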

Figure 4 shows the results of community detection based on the map equation and the single-trajectory map equation applied to the SBM. Each point represents the relative module size, which is defined as $\sum _{i=1}^{N} \delta _{\sigma _{i}, \sigma }/N$ for module $\sigma $. The results based on modularity maximization (the Louvain⁴¹ and Leiden⁴² algorithms) are also shown for comparison. The --two-level option in Infomap corresponds to the method introduced in the original paper on the map equation.
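The relative module size plotted for each module is simply the fraction of nodes carrying that label. A minimal sketch, with the hypothetical function name `relative_module_sizes`:

```python
from collections import Counter


def relative_module_sizes(sigma):
    """Return {module label: fraction of nodes assigned to that module}.

    sigma : dict node -> module label
    """
    n = len(sigma)
    return {s: k / n for s, k in Counter(sigma.values()).items()}
```

A partition that recovers two equally sized planted modules gives two values of 0.5, whereas overfitting shows up as additional values close to zero.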

Infomap (incorrectly) identifies several small modules even when the module structure is relatively clear, whereas Infomap+ prunes such small modules and identifies the equally-sized modules. The network plots in Fig. 1 are the results of the same experiment but with the SBM parameters $N=300$, $c=8$, and $\epsilon =0.1$. Although the modularity-based algorithms also identify small modules, Infomap is more prone to overfitting in the region where $\epsilon $ is small. This phenomenon can be described by the map equation having a finer resolution limit²⁷ compared with that of the modularity⁴³, i.e., the map equation can identify smaller modules. Note, however, that the analysis of the resolution limit is based on an extreme-case example that has a well-defined module structure; it does not describe the whole behaviour in Fig. 4. In the region of $\epsilon $ above the detectability limit ($\epsilon \approx 0.15$⁴⁰), the modularity-based algorithms subdivide the planted modules into a number of smaller-sized modules. This is problematic because a practitioner can hardly realise when the resulting partition is due to overfitting. In contrast, most of the map equation-based algorithms do not partition a network in that region, implying that they avoid overfitting.

Although we showed the relative module sizes obtained by the algorithms, readers might wonder whether the identified modules are actually consistent with the planted ones. In Supplementary Fig. 1 (Section S2), we confirmed that the inferred and planted module structures are indeed highly consistent when the number of modules is correctly estimated. Note also that community detection algorithms generally suffer from overfitting and underfitting more severely when the average degree c is smaller. Therefore, all the methods considered here are expected to perform less accurately when c is extremely small.

The experiments here can be conducted for larger networks. In that case, however, some of the plots in Fig. 4 would be unnecessarily difficult to read because we would have many more points due to overfitting. Moreover, the comparison with the previously known result on the detectability limit may be difficult for larger networks, because it is observed in⁴⁰ that algorithmic detectability limits of greedy algorithms can be size-dependent.

We then apply the algorithms for the single-trajectory map equation to real-world networks. Figure 5 shows the relative module sizes obtained using Infomap and Infomap+. It shows that small modules are pruned, yet larger modules remain identified in most of the cases with Infomap+. Although all variants of Infomap+ often provide similar partitions, empirically, the Huffman coding method finds a good balance of module sizes in real-world networks. The datasets considered here are often analyzed in the literature on community detection. For example, readers can compare the results here with those of Bayesian inference methods reported in^{5,44,45,46,47}.

In Fig. 5, the value of the hyperparameter $\lambda $, which acts as a resolution parameter, is adjusted for each network so that the size of the smallest module is not less than $\min \{3, N/100\}$ (this adjustment can be performed automatically). The selected values of $\lambda $ and the details of experimental settings and datasets are provided in Supplementary Table 2 (Section S3). We also examined the $\lambda $-dependency in Fig. 6, and we found that the number of modules varies within $1 \le \lambda < 2$ in many datasets; in the Methods section, we show that $\lambda =2$ is a practical upper bound according to the resolution limit. Note that the threshold $\min \{3, N/100\}$ is only a reference to determine a reasonable value of $\lambda $; when Infomap+ excessively prunes modules, one can directly tune $\lambda $ to resolve the underfitting problem.
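The text does not spell out how the automatic adjustment is carried out. One possible sketch, assuming a simple scan over an increasing grid of $\lambda $ values and a user-supplied detector, is shown below. The `detect` callable, the grid, and the toy detector are all hypothetical stand-ins, not the procedure used for Fig. 5.

```python
from collections import Counter


def smallest_module_size(sigma):
    """Size (number of nodes) of the smallest module in a label sequence."""
    return min(Counter(sigma).values())


def select_lambda(detect, n_nodes, lambdas=(1.0, 1.2, 1.4, 1.6, 1.8)):
    """Return the first lambda whose partition meets the size threshold.

    detect: hypothetical callable, lambda -> sequence of module labels.
    """
    threshold = min(3, n_nodes / 100)
    sigma = None
    for lam in lambdas:
        sigma = detect(lam)
        if smallest_module_size(sigma) >= threshold:
            return lam, sigma
    # No grid value met the threshold; report the last attempt.
    return lambdas[-1], sigma


def toy_detect(lam):
    """Stand-in detector: a 2-node module survives only for lam < 1.4."""
    if lam < 1.4:
        return [0] * 200 + [1] * 198 + [2] * 2
    return [0] * 200 + [1] * 200


lam, sigma = select_lambda(toy_detect, n_nodes=400)
print(lam, len(set(sigma)))
```

Scanning from small to large $\lambda $ returns the least aggressive pruning that satisfies the threshold, which matches the remark that the threshold is only a reference value.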

Bike-sharing dataset: application to a set of trajectories

Finally, we compare the methods using a dataset of a bike-sharing service in London^48,49, which is a dataset consisting of trajectories (sequences of bike stations visited) that individual bikes have travelled in a day (see Supplementary Information (Section S4) for the details of the dataset); thus, $T_{a}$ is the number of stations that bike a has visited. Figure 7a illustrates trajectories of three bikes in the dataset. Community detection of the trajectories identifies the area within which a bike is often used. Figure 7b shows the partition obtained by minimising $L( {\varvec{\sigma }})$ using Infomap; here, we constructed a network by decomposing each trajectory into a set of edges between successive pairs of stations. As a result, we obtained eight modules; in addition to four large modules, several modules consisting of only a few stations are also identified. In Fig. 7c, which shows the partition obtained by minimising $\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )$, we no longer observe the small modules. Although Fig. 7c is of the Huffman coding method ($\lambda =1$), we obtain the same partition with the Shannon–Fano coding method ($\lambda =1$) and with the method minimising $\underline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )$ ($\lambda =1.8$).
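The decomposition of trajectories into edges between successive stations, used to build the network for Infomap, can be sketched as follows. Whether repeated transitions are aggregated into weights is an assumption here, and the station labels are made up for illustration.

```python
from collections import Counter


def trajectories_to_edges(trajectories):
    """Count directed edges between successive stations over all trajectories."""
    edge_weights = Counter()
    for path in trajectories:
        # zip(path, path[1:]) yields each pair of consecutive stations.
        for source, target in zip(path, path[1:]):
            edge_weights[(source, target)] += 1
    return edge_weights


# Two toy trajectories over hypothetical stations "A", "B", "C".
trajectories = [["A", "B", "C"], ["B", "C", "A", "B"]]
print(trajectories_to_edges(trajectories))
```

Note that this step discards which trajectory each edge came from, which is exactly the information that the trajectory-based objective retains.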

Discussion

This study revisited the formulation of the map equation and shed light on many details hidden in its principle. We addressed the fact that the encoding of trajectories is qualitatively distinct from the encoding of the flow on a network and proposed the single-trajectory map equation. Importantly, the proposed method can prune small modules and prevent overfitting.

The single-trajectory map equation provides a more balanced community structure compared with the map equation. Although balanced partitions may not always be desirable, they are often beneficial because we can prune spurious modules due to overfitting, as demonstrated in Fig. 1. Furthermore, the analysis in the Methods section implies that the single-trajectory map equation is not prone to underfitting compared with the map equation because their resolution limits are almost the same when $\lambda =1$.

Readers might wonder if the present approach is distinct from other variants of the map equation, such as the one with the Markov-time parameter^12,28 and the Bayesian formulation of the map equation^16,19, which is an improved teleportation method⁵⁰. To clarify this point, we also conducted experiments analogous to those described in the Experiments section using these methods in Supplementary Information (Section S5). In some cases, these methods also yield partitions similar to those in Figs. 4 and 5. However, they are apparently not particularly suitable for pruning small modules: depending on the method, the results are either too sensitive or too insensitive to the choice of hyperparameters, so the optimal values must be searched for on a finer scale or over a wider range, respectively. By contrast, balanced partitions are often obtained without tuning the hyperparameter $\lambda $ in the single-trajectory map equation. We also emphasise that the single-trajectory map equation is not a generalisation of the map equation, but its raw form, and overfitting is avoided using the principle of the map equation itself.

The bootstrapping method⁵¹ is another approach for avoiding overfitting. However, this approach is computationally expensive¹⁹ and a comparison with the present approach is not straightforward because the output is a population of partitions. A more detailed study of the qualitative and quantitative relationships between the single-trajectory map equation and other variants of the map equation is left for future work. Furthermore, because the single-trajectory map equation is a trajectory-based approach, its relationship with the memory-network extension^10,13 of the map equation is another potential research direction, as both take a set of trajectories as the input.

The time complexity of the optimisation algorithm is a major issue in the single-trajectory map equation. Whereas the lower bound $\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ can be optimised as efficiently as Infomap, explicit construction of the codebooks is required for the actual average code length $\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $. Although our implementations of Infomap+ run within a reasonable amount of time for fairly large datasets as demonstrated in Supplementary Information Fig. 6, an improved implementation is also left for future work.

Methods

Optimisation algorithm

Herein, we explain the implementation details of the greedy heuristic. A typical greedy heuristic for community detection, including Infomap, iteratively merges two or more modules that improve the value of an objective function^41,42,52, and equally-sized modules are often preferentially merged²³. Such an update rule does not effectively compress the average code length at the stage of fine-tuning. This is not surprising because the initial partition is located at a local or global minimum of the objective function in the map equation, which may also be a local minimum in the single-trajectory map equation. Moreover, there is no reason that equally-sized modules should be preferentially merged. Although we typically have a few large and many small modules as the initial partition, it is unlikely that merging those small modules provides better compression of the average code length. Therefore, instead, we iteratively merge the smallest module and its most tightly-connected module regardless of the resulting value of the average code length until only one module is left; among the partitions that form with this merging process, we accept the partition that achieves the minimum average code length. Given an initial partition, this algorithm is deterministic. Although this algorithm is straightforward and the resulting partition may not be the global optimum, an improved compression of the average code length can be achieved by pruning small modules without being trapped in local minima of the objective function.
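The merging procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the released implementation: "smallest" is taken to mean the fewest nodes, "most tightly-connected" the largest number of connecting edges, ties are broken by label order, and the objective is passed in as a callable. The demo uses negative modularity purely as a stand-in objective, since evaluating the actual average code length requires the codebooks.

```python
from collections import Counter


def prune_by_merging(edges, sigma, objective):
    """Merge the smallest module into its most connected neighbour until one
    module is left; return the visited partition with the lowest objective.

    edges: list of (i, j) pairs; sigma: dict node -> integer module label;
    objective: callable mapping a partition dict to a value to minimise.
    """
    sigma = dict(sigma)
    best_sigma, best_value = dict(sigma), objective(sigma)
    while len(set(sigma.values())) > 1:
        sizes = Counter(sigma.values())
        smallest = min(sizes, key=lambda m: (sizes[m], m))
        # Count edges between the smallest module and every other module.
        links = Counter()
        for i, j in edges:
            mi, mj = sigma[i], sigma[j]
            if mi == smallest and mj != smallest:
                links[mj] += 1
            elif mj == smallest and mi != smallest:
                links[mi] += 1
        if links:
            target = max(links, key=lambda m: (links[m], -m))
        else:
            # Disconnected module: fall back to the next-smallest module.
            target = min((m for m in sizes if m != smallest),
                         key=lambda m: (sizes[m], m))
        for node in sigma:
            if sigma[node] == smallest:
                sigma[node] = target
        # Merge regardless of the objective, but remember the best partition.
        value = objective(sigma)
        if value < best_value:
            best_sigma, best_value = dict(sigma), value
    return best_sigma, best_value


def negative_modularity(edges):
    """Stand-in objective (NOT the code length): minus Newman's modularity."""
    n_edges = len(edges)

    def objective(sigma):
        internal, degree = Counter(), Counter()
        for i, j in edges:
            degree[sigma[i]] += 1
            degree[sigma[j]] += 1
            if sigma[i] == sigma[j]:
                internal[sigma[i]] += 1
        return -sum(internal[m] / n_edges - (degree[m] / (2 * n_edges)) ** 2
                    for m in degree)

    return objective


# Two triangles joined by one edge, plus a pendant node 6 in its own module.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
sigma0 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
best_sigma, best_value = prune_by_merging(edges, sigma0, negative_modularity(edges))
print(best_sigma, best_value)
```

Because every merge is accepted and only the best visited partition is returned, the procedure always terminates after at most K - 1 merges for K initial modules.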

When we use the lower bound $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ as the objective function, the greedy update can be performed as done in Infomap. The expanded form of $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ is

$$\begin{aligned}{}&\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) = \lambda \hat{{\textsf{q}}}_{\curvearrowleft } \log \hat{{\textsf{q}}}_{\curvearrowleft } -\lambda \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \log \hat{{\textsf{q}}}_{\sigma \curvearrowleft } + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } \log \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } - \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowright } \log \hat{{\textsf{q}}}_{\sigma \curvearrowright } - \sum _{i=1}^{N} \hat{{\textsf{q}}}_{i} \log \hat{{\textsf{q}}}_{i}. \end{aligned}$$

(14)

The last term in Eq. (14) is independent of partition. Therefore, when we merge two modules, we only need to keep track of changes in $\hat{{\textsf{q}}}_{\sigma \curvearrowleft }$, $\hat{{\textsf{q}}}_{\sigma \curvearrowright }$, and $\hat{{\textsf{p}}}^{\sigma }_{\circlearrowright }$, which are defined in Eq. (12). In these quantities, $\sum _{a=1}^{M} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a 0}} }$ is the population of the starting-point nodes in module $\sigma $, $\sum _{\sigma } \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }$ and $\sum _{\sigma ^{\prime }} \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }$ are the sums of the populations of the transitions across modules in the set of trajectories, and $\sum _{i \in \sigma } \hat{{\textsf{q}}}_{i}$ is the sum of the node-visiting frequencies in module $\sigma $. They are O(K) quantities, such that we can efficiently compute the change in $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ when two modules are merged.

When we use the actual average code length $\overline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ as the objective function, the greedy update cannot be computed as efficiently as for $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $. When two modules are merged, we need to reconstruct the intra-module codebook of the target module as well as the inter-module codebook to compute the updated code length. The time complexity of constructing a codebook depends on the specific coding scheme applied. In Supplementary Fig. 6, we show the running times of Infomap and Infomap+ on the SBM; herein, we used the Infomap API⁵³ (a C++-based implementation with a Python wrapper) for Infomap and our Python-based implementation³³ for Infomap+.

Resolution limit

Readers might consider that the pruning effect implies that the proposed method is prone to underfitting. To examine this issue, we derive the resolution limit of the single-trajectory map equation focusing on $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ and network datasets. The resolution limit is the smallest module size that the method can identify given a network size such as the total number of edges.

The following analysis shows that, although the method has a relatively coarser-resolution scale compared with the standard or hierarchical map equation, it is still a high-resolution method. The analysis also provides a theoretical explanation of some of the empirical results we obtained through the experiments in the Experiments section and an implication for the range that the hyperparameter $\lambda $ should take.

General form

We closely follow the derivation in²⁷, which is applied to undirected networks. The present resolution limit is for directed networks and the considered objective function is $\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ instead of $L\left( {\varvec{\sigma }} \right) $.

We first rewrite the empirical frequencies of the walkers and the objective function in the single-trajectory map equation in terms of network statistics. When the input trajectories are the edges in a directed network (i.e., the number of trajectories M is the number of directed edges), we have

$$\begin{aligned} \hat{{\textsf{q}}}_{\sigma \curvearrowright }&= \frac{\ell ^{\textrm{out}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{\sigma \curvearrowleft } = \frac{\ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma }}{2M} + \frac{\ell ^{\textrm{in}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{\curvearrowleft } = \frac{M+C}{2M}, \nonumber \\ \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright }&= \frac{\ell ^{\textrm{out}}_{\sigma }}{2M} + \frac{2 \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{i} = \frac{d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i}}{2M}, \end{aligned}$$

(15)

where $\ell _{\sigma }$ is the number of directed edges within module $\sigma $; $\ell ^{\textrm{in}}_{\sigma }$ and $\ell ^{\textrm{out}}_{\sigma }$ are the numbers of in-coming and out-going edges of module $\sigma $, respectively; $d^{\textrm{in}}_{i}$ and $d^{\textrm{out}}_{i}$ are the in- and out-degrees of node i; and C is the cut size of the network, i.e., the total number of directed edges that are crossing different modules. Using Eq. (15), the objective function is recast as

$$\begin{aligned} \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \frac{1}{2M} \biggl ( \lambda (M+C) \log (M+C) + C + \sum _{\sigma } \underline{{\mathscr {L}}}^{\sigma }_{\lambda } + 2M \nonumber \\&\quad - \sum _{i=1}^{N} \left( d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i} \right) \log \left( d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i} \right) \biggr ), \end{aligned}$$

(16)

where

$$\begin{aligned} \underline{{\mathscr {L}}}^{\sigma }_{\lambda }&= -\lambda \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma } \right) \log \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma } \right) \nonumber \\&\quad + 2 \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \frac{1}{2}\ell ^{\textrm{in}}_{\sigma } \right) \log \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \frac{1}{2} \ell ^{\textrm{in}}_{\sigma } \right) - \ell ^{\textrm{out}}_{\sigma } \log \ell ^{\textrm{out}}_{\sigma }. \end{aligned}$$

(17)
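As a consistency check on Eqs. (14) to (17), the sketch below evaluates the lower bound in two ways for a directed edge list: once by inserting the frequencies of Eq. (15) into Eq. (14), and once from the recast form of Eqs. (16) and (17). It assumes base-2 logarithms (consistent with the constant terms $C$ and $2M$ in Eq. (16)) and the convention $0 \log 0 = 0$; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def xlog2x(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def network_statistics(edges, sigma):
    """Internal/out-going/in-coming edge counts per module, and node degrees."""
    internal, out, inc, degree = Counter(), Counter(), Counter(), Counter()
    for i, j in edges:
        degree[i] += 1  # contributes to the out-degree of i
        degree[j] += 1  # contributes to the in-degree of j
        if sigma[i] == sigma[j]:
            internal[sigma[i]] += 1
        else:
            out[sigma[i]] += 1
            inc[sigma[j]] += 1
    return internal, out, inc, degree


def lower_bound_eq16(edges, sigma, lam=1.0):
    """Evaluate Eq. (16) with the module terms of Eq. (17)."""
    n_edges = len(edges)
    internal, out, inc, degree = network_statistics(edges, sigma)
    cut = sum(out.values())
    module_terms = 0.0
    for m in set(sigma.values()):
        ell, ell_out, ell_in = internal[m], out[m], inc[m]
        module_terms += (-lam * xlog2x(ell + ell_out + ell_in)
                         + 2 * xlog2x(ell + ell_out + 0.5 * ell_in)
                         - xlog2x(ell_out))
    total = (lam * xlog2x(n_edges + cut) + cut + module_terms + 2 * n_edges
             - sum(xlog2x(d) for d in degree.values()))
    return total / (2 * n_edges)


def lower_bound_eq14(edges, sigma, lam=1.0):
    """Evaluate Eq. (14) after inserting the frequencies of Eq. (15)."""
    n_edges = len(edges)
    two_m = 2 * n_edges
    internal, out, inc, degree = network_statistics(edges, sigma)
    cut = sum(out.values())
    value = lam * xlog2x((n_edges + cut) / two_m)
    for m in set(sigma.values()):
        ell, ell_out, ell_in = internal[m], out[m], inc[m]
        value -= lam * xlog2x((ell + ell_out + ell_in) / two_m)
        value += xlog2x((2 * ell + 2 * ell_out + ell_in) / two_m)
        value -= xlog2x(ell_out / two_m)
    value -= sum(xlog2x(d / two_m) for d in degree.values())
    return value


# Two triangles joined by one edge; each undirected edge becomes two directed edges.
undirected = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
edges = undirected + [(j, i) for i, j in undirected]
sigma = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(lower_bound_eq16(edges, sigma), lower_bound_eq14(edges, sigma))
```

The two evaluations agree because the $\log 2M$ contributions cancel across the terms of Eq. (14), leaving only the count-based expressions of Eqs. (16) and (17).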

In the resolution-limit analysis, we consider two well-defined modules and derive the condition under which their merging is favoured (i.e., the modules are not resolved) for better optimisation of the objective function. Thus, we evaluate the condition such that the difference in the objective function $\Delta \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) $ becomes negative when two modules are merged. We denote the labels of two well-defined modules as A and B and the merged module as AB. We also denote the change in $\sum _{\sigma } \underline{{\mathscr {L}}}^{\sigma }_{\lambda }$ through the update as R, i.e.,

$$\begin{aligned} R = \underline{{\mathscr {L}}}^{AB}_{\lambda } - \underline{{\mathscr {L}}}^{A}_{\lambda } - \underline{{\mathscr {L}}}^{B}_{\lambda }. \end{aligned}$$

(18)

Here, R is a local quantity that depends only on the variables within/around modules A and B. When two well-defined modules are merged, the cut size is decreased by a small $\delta $ ($\delta \ll M + C$). The difference in the objective function based on the update is

$$\begin{aligned} \Delta \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \frac{1}{2M} \biggl ( \lambda (M+C-\delta ) \log (M+C-\delta ) - \lambda (M+C) \log (M+C) - \delta + R \biggr ) \nonumber \\&\simeq \frac{1}{2M} \biggl ( -\delta \lambda \log \left( e (M+C) \right) - \delta + R \biggr ), \end{aligned}$$

(19)

where e is the base of the natural logarithm. Therefore, the resolution limit is generally expressed as

$$\begin{aligned} R \lesssim \delta \biggl ( 1 + \lambda \log \left( e (M+C) \right) \biggr ). \end{aligned}$$

(20)
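The approximation in the second line of Eq. (19) can be reconstructed as a first-order expansion in $\delta $. The step below is a sketch of that reasoning, writing $x = M + C$ and using $\delta \ll x$:

$$\begin{aligned} (x-\delta ) \log (x-\delta ) - x \log x \simeq -\delta \frac{d}{dx} \left( x \log x \right) = -\delta \left( \log x + \log e \right) = -\delta \log \left( e x \right) . \end{aligned}$$

Multiplying by $\lambda $ and requiring that the right-hand side of Eq. (19) be negative then gives the condition in Eq. (20).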

In the map equation, the cut size C is the only global term that is responsible for the resolution limit (see equation (11) in²⁷). By contrast, the single-trajectory map equation has the total number of directed edges M as another global term in Eq. (20). Note, however, that the contribution from M is logarithmic, implying that the single-trajectory map equation is still a high-resolution method. Next, we will derive a more explicit scaling.

Ring of cliques

It is common to consider a “ring of cliques” in a resolution-limit analysis, as illustrated in Fig. 8a. We consider m cliques (each of which consists of n nodes) and connect each with a single edge to form a ring. This is an undirected network. Again, we treat each undirected edge as a pair of directed edges in both directions. We regard each clique as a module, and using this example, derive the resolution limit in a more explicit form.

When we merge two of these cliques, the cut size is decreased by 2. We denote $\ell _{\sigma } = n (n-1) = \ell $ for an arbitrary module. Assuming that $\ell \gg 1$, we have

$$\begin{aligned} R \simeq -2 (\lambda -2) (\ell + 3) + 2(\lambda -1) \left( \log (e\ell ) \right) . \end{aligned}$$

(21)

Substituting Eq. (21) into Eq. (20), we obtain

$$\begin{aligned} \frac{\ell ^{\lambda -1} 2^{(2-\lambda )(\ell +3)-1}}{e} \lesssim \left( M+C \right) ^{\lambda }. \end{aligned}$$

(22)

Each clique is resolved as a module unless $(M+C)^{\lambda }$ is larger than the left-hand side of Eq. (22), which is an exponentially growing function with respect to the clique size $\ell $.
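The ring-of-cliques result can be checked numerically. The sketch below computes R exactly from Eq. (17), compares it with the approximation in Eq. (21), and evaluates the condition of Eq. (20) with $\delta = 2$. It assumes base-2 logarithms, $\ell ^{\textrm{out}}_{\sigma } = \ell ^{\textrm{in}}_{\sigma } = 2$ for every clique, and $M + C = m(\ell + 4)$ (from $M = m(\ell + 2)$ and $C = 2m$); these counts are derived here from the construction and the function names are hypothetical.

```python
from math import e, log2


def xlog2x(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def module_term(ell, ell_out, ell_in, lam):
    """Per-module term of Eq. (17)."""
    return (-lam * xlog2x(ell + ell_out + ell_in)
            + 2 * xlog2x(ell + ell_out + 0.5 * ell_in)
            - xlog2x(ell_out))


def exact_r(n, lam):
    """R of Eq. (18) for two adjacent cliques of n nodes in the ring."""
    ell = n * (n - 1)                       # directed edges inside one clique
    single = module_term(ell, 2, 2, lam)    # one ring edge on each side
    # Merging turns the connecting edge into two internal directed edges.
    merged = module_term(2 * ell + 2, 2, 2, lam)
    return merged - 2 * single


def approx_r(n, lam):
    """Large-ell approximation of R, Eq. (21)."""
    ell = n * (n - 1)
    return -2 * (lam - 2) * (ell + 3) + 2 * (lam - 1) * log2(e * ell)


def cliques_resolved(n, m, lam):
    """True if merging is not favoured, i.e. Eq. (20) fails with delta = 2."""
    ell = n * (n - 1)
    m_plus_c = m * (ell + 4)                # M = m (ell + 2), C = 2 m
    return exact_r(n, lam) > 2 * (1 + lam * log2(e * m_plus_c))


for lam in (1.0, 1.5, 2.5):
    print(lam, exact_r(10, lam), approx_r(10, lam), cliques_resolved(10, 100, lam))
```

For cliques of ten nodes in a ring of 100, the cliques are resolved at $\lambda = 1$ but not at $\lambda = 2.5$, where R itself is negative.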

Figure 8b depicts the resolution limits of the single-trajectory map equation, together with those of the map equation²⁷ and modularity⁴³. Although n and m are integers, we treat them as real numbers to highlight the scaling of each resolution limit. The resolution limit with $\lambda =1$ is extremely close to that of the map equation. Therefore, the single-trajectory map equation is not prone to underfitting compared with the map equation.

When $\lambda $ is large, modules with a small n are not resolved for any network size. However, the limit rapidly disappears as n becomes larger, whereas the resolution limit of the modularity disappears relatively slowly. This dependency of the resolution limit partially explains the favourable behaviour of the single-trajectory map equation, i.e., small modules are pruned yet large modules continue to be identified. However, as pointed out in the main text, the resolution limit does not describe the full behaviour of the single-trajectory map equation; it is not $\lambda $ that plays a critical role in the method and $\lambda =1$ is often sufficient to avoid overfitting.

On the left-hand side of Eq. (22), the leading coefficient in the exponent, $2-\lambda $, is no longer positive when $\lambda \ge 2$. In this case, a clique will not be resolved as a module for any network size regardless of its size n, i.e., the ability as a community detection method will be completely lost. This transition implies that the optimal value of $\lambda $ is usually located within $1 \le \lambda < 2$, which is indeed consistent with our experimental results in Fig. 6.

Data availability

The karate club, Les Miserables, political books, football, and power grid datasets were downloaded from http://www-personal.umich.edu/~mejn/netdata/. The C-elegans-frontal, email-Eu-core, and wiki-Vote datasets were downloaded from http://konect.cc/networks/. The political blogs dataset was downloaded from https://snap.stanford.edu/data/. The E. coli transcription dataset was downloaded from https://www.weizmann.ac.il/mcb/UriAlon/e-coli-transcription-network. The bike-sharing dataset was downloaded from https://github.com/konstantinklemmer/bikecommclust.

Code availability

The codes for the single-trajectory map equation are available at https://github.com/tatsuro-kawamoto/single-trajectory_map_equation.

References

1. Schaeffer, S. E. Graph clustering. Comput. Sci. Rev. 1, 27–64 (2007).
2. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
3. Fortunato, S. & Hric, D. Community detection in networks: A user guide. Phys. Rep. 659, 1–44 (2016).
4. Jin, D. et al. A survey of community detection approaches: From statistical modeling to deep learning. IEEE Trans. Knowl. Data Eng. 35(2), 1149–1170 (2023).
5. Ghasemian, A., Hosseinmardi, H. & Clauset, A. Evaluating overfit and underfit in models of network community structure. IEEE Trans. Knowl. Data Eng. 32, 1722–1735 (2020).
6. Rosvall, M. & Bergstrom, C. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. U.S.A. 105, 1118–1123 (2008).
7. https://www.mapequation.org/.
8. Rosvall, M. & Bergstrom, C. T. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6, e18209 (2011).
9. Viamontes Esquivel, A. & Rosvall, M. Compression of flow can reveal overlapping-module organization in networks. Phys. Rev. X 1, 021025 (2011).
10. Rosvall, M. et al. Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 4630 (2014).
11. De Domenico, M., Lancichinetti, A., Arenas, A. & Rosvall, M. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems. Phys. Rev. X 5, 011027 (2015).
12. Kheirkhahzadeh, M., Lancichinetti, A. & Rosvall, M. Efficient community detection of network flows for varying Markov times and bipartite networks. Phys. Rev. E 93, 032309 (2016).
13. Edler, D., Bohlin, L. & Rosvall, M. Mapping higher-order network flows in memory and multilayer networks with infomap. Algorithms 10, 112 (2017).
14. Aslak, U., Rosvall, M. & Lehmann, S. Constrained information flows in temporal networks reveal intermittent communities. Phys. Rev. E 97, 062312 (2018).
15. Emmons, S. & Mucha, P. J. Map equation with metadata: Varying the role of attributes in community detection. Phys. Rev. E 100, 022301 (2019).
16. Smiljanić, J., Edler, D. & Rosvall, M. Mapping flows on sparse networks with missing links. Phys. Rev. E 102, 012302 (2020).
17. Blöcker, C. & Rosvall, M. Mapping flows on bipartite networks. Phys. Rev. E 102, 052305 (2020).
18. Eriksson, A., Edler, D., Rojas, A., de Domenico, M. & Rosvall, M. How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs. Commun. Phys. 4, 1–12 (2021).
19. Smiljanić, J., Blöcker, C., Edler, D. & Rosvall, M. Mapping flows on weighted and directed networks with incomplete observations. J. Complex Netw. 9 (2021).
20. Arenas, A., Danon, L., Diaz-Guilera, A., Gleiser, P. M. & Guimera, R. Community analysis in social networks. Eur. Phys. J. B 38, 373–380 (2004).
21. Clauset, A., Newman, M. E. J. & Moore, C. Finding community structure in very large networks. Phys. Rev. E 70, 066111 (2004).
22. Leskovec, J., Lang, K. J., Dasgupta, A. & Mahoney, M. W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 29–123 (2009).
23. Wakita, K. & Tsurumi, T. Finding community structure in mega-scale social networks: [extended abstract]. In Proceedings of the 16th International Conference on World Wide Web, WWW ’07, 1275–1276 (Association for Computing Machinery, New York, NY, USA, 2007).
24. Moore, C. The computer science and physics of community detection: landscapes, phase transitions, and hardness. arXiv preprint arXiv:1702.00467 (2017).
25. Peixoto, T. P. Bayesian stochastic blockmodeling. In Advances in Network Clustering and Blockmodeling (eds Doreian, P., Batagelj, V. & Ferligoj, A.) (Wiley, New York, 2019).
26. Hastie, T. J., Tibshirani, R. J. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics (Springer, New York, 2009).
27. Kawamoto, T. & Rosvall, M. Estimating the resolution limit of the map equation in community detection. Phys. Rev. E 91, 012809 (2015).
28. Schaub, M. T., Lambiotte, R. & Barahona, M. Encoding dynamics for multiscale community detection: Markov time sweeping for the map equation. Phys. Rev. E 86, 026112 (2012).
29. MacKay, D. J. Information Theory, Inference and Learning Algorithms (Cambridge University Press, Cambridge, 2003).
30. Cover, T. M. Elements of Information Theory (Wiley, New York, 1999).
31. Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
32. Brin, S. & Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 107–117 (1998).
33. https://github.com/authorname/single-trajectory_map_equation.
34. Decelle, A., Krzakala, F., Moore, C. & Zdeborová, L. Inference and phase transitions in the detection of modules in sparse networks. Phys. Rev. Lett. 107, 065701 (2011).
35. Mossel, E., Neeman, J. & Sly, A. Reconstruction and estimation in the planted partition model. Probab. Theory Relat. Fields 162, 431–461 (2015).
36. Massoulié, L. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC ’14, 694–703 (ACM, New York, 2014).
37. Holland, P. W., Laskey, K. B. & Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 5, 109–137 (1983).
38. Wang, Y. J. & Wong, G. Y. Stochastic blockmodels for directed graphs. J. Am. Stat. Assoc. 82, 8–19 (1987).
39. Abbe, E. Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 18, 1–86 (2018).
40. Kawamoto, T. & Kabashima, Y. Counting the number of metastable states in the modularity landscape: Algorithmic detectability limit of greedy algorithms in community detection. Phys. Rev. E 99, 010301 (2019).
41. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
42. Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 9, 1–12 (2019).
43. Fortunato, S. & Barthélemy, M. Resolution limit in community detection. Proc. Natl. Acad. Sci. U.S.A. 104, 36–41 (2007).
44. Peixoto, T. P. Model selection and hypothesis testing for large-scale network models with overlapping groups. Phys. Rev. X 5, 011033 (2015).
45. Peixoto, T. P. Nonparametric Bayesian inference of the microcanonical stochastic block model. Phys. Rev. E 95, 012317 (2017).
46. Kawamoto, T. & Kabashima, Y. Cross-validation estimate of the number of clusters in a network. Sci. Rep. 7, 3327 (2017).
47. Kawamoto, T. & Kabashima, Y. Comparative analysis on the selection of number of clusters in community detection. Phys. Rev. E 97, 022315 (2018).
48. Munoz-Mendez, F., Klemmer, K., Han, K. & Jarvis, S. Community structures, interactions and dynamics in London’s bicycle sharing network. In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers - UbiComp ’18, 1015–1023 (ACM Press, New York, New York, USA, 2018). http://dl.acm.org/citation.cfm?doid=3267305.3274156.
49. https://github.com/konstantinklemmer/bikecommclust.
50. Lambiotte, R. & Rosvall, M. Ranking and clustering of nodes in networks with smart teleportation. Phys. Rev. E 85, 056107 (2012).
51. Rosvall, M. & Bergstrom, C. T. Mapping change in large networks. PLoS ONE 5, 1–7 (2010).
52. Clauset, A., Moore, C. & Newman, M. E. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98–101 (2008).
53. https://mapequation.github.io/infomap/python/.

Acknowledgements

The author is grateful to Martin Rosvall for inspiring discussions. The author also thanks Christopher Blöcker and Teruyoshi Kobayashi for fruitful comments. This work was supported by JST ACT-X Grant No. JPMJAX21A8 and JSPS KAKENHI No. 19H01506.

Author information

Authors and Affiliations

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064, Japan
Tatsuro Kawamoto

Contributions

T.K. designed the research, conducted analytical calculations, analyzed real data, implemented the code, and wrote the manuscript.

Corresponding author

Correspondence to Tatsuro Kawamoto.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kawamoto, T. Single-trajectory map equation. Sci Rep 13, 6597 (2023). https://doi.org/10.1038/s41598-023-33880-y

Received: 01 December 2022
Accepted: 20 April 2023
Published: 22 April 2023
DOI: https://doi.org/10.1038/s41598-023-33880-y
