Introduction

Community detection plays a vital role in various disciplines of science dealing with network data. It is an unsupervised learning task to classify the node set in a network into groups, or modules. Each subgraph that is identified as a module has no significant internal structure, while we expect each module to be structurally distinct. Thus, community detection provides a concise explanation of the dataset by ignoring detailed structures as noise within a module while preserving a significant macroscopic organisation as a module structure1,2,3,4. Analogous to other machine-learning tasks, because it is often not apparent which parts of the dataset represent noise, community detection algorithms suffer from overfitting and underfitting problems5.

The map equation6 is a popular community detection method for networks and is formulated as a minimization problem of an information-theoretic objective function describing the average code length of the random walk. Infomap7, an implementation of a greedy optimisation of the map equation, has often been used to analyze real-world datasets. Furthermore, there are several extensions of the map equation itself8,9,10,11,12,13,14,15,16,17,18,19, mainly focusing on incorporating higher-order network information. However, the map equation is prone to overfitting, particularly for sparse networks. Although the map equation provides an optimal module structure for the description of the random walk on a network, an excessively fine module structure may be obtained. For example, as illustrated in Fig. 1 (left), many small modules are often identified in addition to a few large modules. When too many modules consisting of only a few nodes are identified, it is difficult to interpret the result as a concise explanation of the dataset.

This study revisits the map equation and considers its raw form, which we refer to as the single-trajectory map equation. Its objective function is the average code length of (not necessarily random) walkers with finite path lengths. The concept of the single-trajectory map equation already appears in the original paper6 for a schematic description of the map equation. Nevertheless, this raw form has never been actively studied or utilised although it is a valuable variant with a mechanism to prune small modules and prevent the map equation from overfitting (as depicted in Fig. 1 (right)).

Figure 1

Community detection based on the map equation and the single-trajectory map equation applied to a synthetic network. The nodes in the same colour belong to the same module. See “Experiments” section for details.

The emergence of small or highly unbalanced modules has been discussed in various contexts in the community detection literature. It is often considered to be the nature of real-world datasets20,21,22, or a phenomenon that occurs because of the implementation details of an algorithm21,23. In the context of the inference problem, the emergence of small modules is interpreted as artefacts caused by overfitting. Optimization-based (or maximum likelihood-based) methods are typically prone to overfitting, whereas methods based on Bayesian formulations avoid partitioning a network or subgraph where there is no statistically significant internal structure24,25, that is, they avoid generating small modules. The map equation also has a Bayesian counterpart16,19. Regardless of the underlying mechanism, the pruning of small modules is sometimes preferred in practice because it provides a more concise explanation of the network. A similar issue can be found in the regression problem in supervised learning26. Although the ridge regression is a principled method, many variables with extremely small coefficients are often assessed as significant. In contrast, the lasso regression prunes such variables and provides a concise description of the dataset.

The single-trajectory map equation is a variant of the map equation that achieves a partition at a coarser resolution scale. Our approach differs from that of the hierarchical map equation8 that achieves finer-resolution partitions27. Although a high-resolution community detection method is useful when the network consists of several small modules, a method with a coarser resolution is also needed when an algorithm suffers from overfitting. For bipartite networks, ref. 17 showed that a coarser resolution can be obtained by incorporating the bipartiteness property. A different resolution scale can also be obtained by introducing the “Markov time”12,28, which is an external parameter of the random walk. However, as shown in the following, the framework of the map equation can intrinsically prune small modules when formulated as the single-trajectory map equation, and balanced-size modules can be identified in a principled manner.

Results

Revisiting the map equation

We proceed with the step-by-step formulation of the map equation. A prominent characteristic of the map equation is the hierarchical encoding scheme for the random walk using multiple codebooks that takes account of module structure in a network. As a specific example, let us consider the encoding of a trajectory in a network as shown in Fig. 2a. We let \({\varvec{\zeta }} = \{ \zeta _{0}, \dots , \zeta _{T-1} \}\) be a trajectory of a walker, where \(\zeta _{t} \in V\) is the tth visited node. We also consider a partition \({\varvec{\sigma }} = \{ \sigma _{1}, \dots , \sigma _{N} \}\) of node set V (\(|V| = N\)), where \(\sigma _{i} \in \{1, \dots , K\}\) is the module label of node \(i \in V\). K is the number of modules. A trajectory of a walker is encoded using two types of codebooks: inter-module and intra-module codebooks. The inter-module codebook describes the transitions of a walker moving into another module. In contrast, each intra-module codebook describes the walker transiting between nodes within the module or exiting the module.

Figure 2

Trajectories (yellow solid lines) on a network and their encoding for a given node partition. Nodes in the same module have the same symbol and colour representations. The codewords in each codebook are listed on the right. Whereas there is only one trajectory in (a), another trajectory is added in (b). The average code length of each trajectory is shown at the bottom.

The actual codewords for a trajectory based on the Huffman coding29,30 are shown in Fig. 2a. Starting with the code “10” for the module that the walker has first visited, the trajectory is described by indicating the visited nodes. Every time the walker moves to a different module, the exiting code of the previous module and the entering code of the next module are consumed.
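As an illustration of how codeword lengths can be obtained for a single codebook, the following is a minimal sketch of Huffman coding. The function name `huffman_code_lengths`, the symbol names, and the toy usage counts are hypothetical and do not correspond to the codebooks in Fig. 2; the convention of assigning one bit to a codebook with a single symbol is also an assumption.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return a dict symbol -> Huffman codeword length in bits.

    freqs : dict symbol -> positive usage count (or frequency).
    """
    if len(freqs) == 1:
        # Assumed convention: a lone symbol still costs one bit.
        return {s: 1 for s in freqs}
    tie = count()  # tie-breaker so that heap entries never compare lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol below the
        # merged node gains one bit.
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


# Toy intra-module codebook: three nodes and the exit codeword.
toy_lengths = huffman_code_lengths({"a": 4, "b": 2, "c": 1, "exit": 1})
```

Frequently used symbols receive short codewords, which is why a module that is rarely exited can describe its nodes cheaply.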

In general, given a trajectory \({\varvec{\zeta }}\) of length T and module assignments \({\varvec{\sigma }}\), the average code length \({\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) \) is expressed as

$$\begin{aligned}&{\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \frac{1}{T} \Biggl ( \sum _{t=0}^{T-1} \ell _{0}\left( \zeta _{t}, \sigma _{\zeta _{t}} \right) + \sum _{\begin{array}{c} t=0 \\ (\sigma _{\zeta _{t}} \ne \sigma _{\zeta _{t+1}}) \end{array} }^{T-2} \biggl ( \ell _{0}\left( \curvearrowright , \sigma _{\zeta _{t}} \right) + \ell _{1}\left( \sigma _{\zeta _{t+1}} \right) \biggr ) +\ell _{1}\left( \sigma _{\zeta _{0}} \right) \Biggr ). \end{aligned}$$
(1)

Here, we denote \(\ell _{0}\left( i, \sigma \right) \) as the length of the code in an intra-module codebook, indicating that a walker visits node \(i \in V\) in module \(\sigma \); \(\ell _{0}\left( \curvearrowright , \sigma \right) \) as the length of the code in an intra-module codebook, indicating that a walker exits module \(\sigma \); and \(\ell _{1}\left( \sigma \right) \) as the length of the code in an inter-module codebook, indicating that a walker enters module \(\sigma \). In Eq. (1), the first summation represents the code length for visited nodes and the second summation represents the code length for transitions between modules. The last term is the code length for the module at the starting point, with a negligible contribution when \(T \gg 1\). It is important that the codebooks are coupled; that is, because an exiting code from a module belongs to an intra-module codebook, transitions between modules affect the encoding of transitions within each module.
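The three contributions in Eq. (1) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `average_code_length`, the dictionary-based inputs, and the toy codeword lengths are hypothetical and are not the codebooks of Fig. 2.

```python
def average_code_length(traj, sigma, ell0_node, ell0_exit, ell1):
    """Average code length of one trajectory, following Eq. (1).

    traj      : list of visited nodes (zeta_0, ..., zeta_{T-1})
    sigma     : dict node -> module label
    ell0_node : dict (node, module) -> codeword length, intra-module codebook
    ell0_exit : dict module -> length of that module's exit codeword
    ell1      : dict module -> codeword length in the inter-module codebook
    """
    T = len(traj)
    # First sum: one intra-module codeword per visited node.
    total = sum(ell0_node[(i, sigma[i])] for i in traj)
    # Second sum: exit and enter codewords whenever the module changes.
    for a, b in zip(traj, traj[1:]):
        if sigma[a] != sigma[b]:
            total += ell0_exit[sigma[a]] + ell1[sigma[b]]
    # Last term: codeword of the module at the starting point.
    total += ell1[sigma[traj[0]]]
    return total / T


# Toy example with hypothetical codeword lengths (in bits).
sigma = {"a": 1, "b": 1, "c": 2, "d": 2}
ell0_node = {("a", 1): 1, ("b", 1): 2, ("c", 2): 1, ("d", 2): 2}
ell0_exit = {1: 2, 2: 2}
ell1 = {1: 1, 2: 1}
avg = average_code_length(["a", "b", "c", "d"], sigma, ell0_node, ell0_exit, ell1)
```

In this toy case the node codewords contribute 6 bits, the single module change contributes 3 bits, and the starting module contributes 1 bit, giving 10/4 = 2.5 bits per step.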

The principle of the map equation framework is that the compression of the average code length through the hierarchical coding reveals a module structure as an optimal partition \({\varvec{\sigma }}\). Readers might believe that the introduction of codewords for transitions between modules simply makes the code length longer. However, such a hierarchical encoding scheme can compress the average code length because it allows us to assign shorter codewords for visited nodes; for example, although the code “0” is assigned to two different nodes in Fig. 2a, they are distinguishable because they belong to different modules. Therefore, when a trajectory rarely consumes the codewords for transitions between modules, the average code length can be compressed more efficiently.

Equation (1) can also be expressed using visiting frequencies as follows:

$$\begin{aligned}&{\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \left( \frac{1}{T} + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } \right) {\mathscr {H}}_{1} + \sum _{\sigma } \left( \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } \right) {\mathscr {H}}^{\sigma }_{0}, \end{aligned}$$
(2)
$$\begin{aligned}&{\left\{ \begin{array}{ll} {\mathscr {H}}_{1} = \sum _{\sigma ^{\prime }} \frac{\sum _{\sigma }\hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma }}{ 1/T + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } } \ell _{1}\left( \sigma ^{\prime } \right) + \frac{ 1/T }{ 1/T + \sum _{{\tilde{\sigma }}^{\prime }, {\tilde{\sigma }}} \hat{{\textsf{p}}}_{{\tilde{\sigma }}^{\prime } {\tilde{\sigma }} } } \ell _{1}\left( \sigma _{\zeta _{0}} \right) \\ {\mathscr {H}}^{\sigma }_{0} = \sum _{i \in \sigma } \frac{\hat{{\textsf{p}}}_{i}}{ \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } } \ell _{0}\left( i, \sigma \right) + \frac{ \sum _{\sigma ^{\prime }} \hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma } }{ \sum _{j \in \sigma } \hat{{\textsf{p}}}_{j} + \sum _{{\tilde{\sigma }}^{\prime }} \hat{{\textsf{p}}}_{ {\tilde{\sigma }}^{\prime } \sigma } } \ell _{0}\left( \curvearrowright , \sigma \right) \end{array}\right. }, \end{aligned}$$
(3)

where \(\sum _{i \in \sigma }\) represents the sum over the node set in module \(\sigma \). \(\hat{{\textsf{p}}}_{i}\) is the visiting frequency of node \(i \in V\) and \(\hat{{\textsf{p}}}_{ \sigma ^{\prime } \sigma }\) is the joint transition frequency from module \(\sigma \) to module \(\sigma ^{\prime }\), i.e.,

$$\begin{aligned} \hat{{\textsf{p}}}_{i}&= \frac{1}{T} \sum _{t=0}^{T-1} \delta _{i, \zeta _{t}}, \end{aligned}$$
(4)
$$\begin{aligned} \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }&= {\left\{ \begin{array}{ll} \frac{1}{T} \sum _{t=0}^{T-2} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{t+1}}} \delta _{\sigma , \sigma _{\zeta _{t}}} &{} (\text {for } \sigma \ne \sigma ^{\prime })\\ 0 &{} (\text {for } \sigma = \sigma ^{\prime }) \end{array}\right. }, \end{aligned}$$
(5)

where \(\delta _{ab}\) represents the Kronecker delta. \({\mathscr {H}}^{\sigma }_{0}\) and \({\mathscr {H}}_{1}\) are conditional average code lengths within the intra- and inter-module codebooks, respectively.
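The empirical frequencies in Eqs. (4) and (5) amount to simple counting. The sketch below assumes a trajectory given as a list of nodes and a partition given as a dictionary; the function name `empirical_frequencies` and the key convention `(sigma_prime, sigma)` for a move from module sigma into module sigma_prime are illustrative choices.

```python
from collections import Counter


def empirical_frequencies(traj, sigma):
    """Visiting and module-transition frequencies of Eqs. (4) and (5)."""
    T = len(traj)
    # Eq. (4): fraction of steps spent on each node.
    p_node = {i: n / T for i, n in Counter(traj).items()}
    # Eq. (5): only moves between different modules are counted.
    moves = Counter(
        (sigma[b], sigma[a])  # key (sigma', sigma): into sigma' from sigma
        for a, b in zip(traj, traj[1:])
        if sigma[a] != sigma[b]
    )
    p_mod = {key: n / T for key, n in moves.items()}
    return p_node, p_mod


sigma = {"a": 1, "b": 1, "c": 2, "d": 2}
p_node, p_mod = empirical_frequencies(["a", "b", "c", "d", "a"], sigma)
```

Note that both frequencies are normalised by T, although only T - 1 transitions exist, which matches the prefactor 1/T in Eq. (5).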

Recall that the random walk is a stochastic variable; there is no such thing as a single (finite-length) trajectory representing the random walk. Therefore, instead of a specific trajectory, we consider the expected average code length \({\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] \) in the map equation, where \({\varvec{Z}}\) is the stochastic variable representing the random walk; in other words, \({\mathbb {E}}_{{\varvec{Z}}}[\cdots ]\) is the ensemble average over all possible trajectories (say, from all possible starting points). We assume that the trajectory length T is sufficiently large that the random walk is in a steady state. When the network is strongly connected, the empirical frequencies are converted to the corresponding steady-state probabilities. \(\sum _{\sigma } \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }\) is converted to the entering probability into module \(\sigma ^{\prime }\) denoted by \(q_{\sigma ^{\prime } \curvearrowleft }\); \(\sum _{\sigma ^{\prime }} \hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }\) to the exiting probability from module \(\sigma \) denoted by \(q_{\sigma \curvearrowright }\); and \(\hat{{\textsf{p}}}_{i}\) to the visiting probability of node i denoted by \(q_{i}\). The conditional average code lengths \({\mathscr {H}}^{\sigma }_{0}\) and \({\mathscr {H}}_{1}\) are also converted to the expectations. According to Shannon’s source coding theorem30,31, these expectations are respectively bounded from below by the Shannon entropies,

$$\begin{aligned} H_{1}\left( \{ q_{\sigma \curvearrowleft } \} \right)&= -\sum _{\sigma =1}^{K} \frac{q_{\sigma \curvearrowleft }}{q_{\curvearrowleft }} \log \frac{q_{\sigma \curvearrowleft }}{q_{\curvearrowleft }}, \end{aligned}$$
(6)
$$\begin{aligned} H^{\sigma }_{0}\left( q_{\sigma \curvearrowright }, \{ q_{i} \}_{i\in \sigma } \right)&= -\frac{q_{\sigma \curvearrowright }}{p^{\sigma }_{\circlearrowright }} \log \frac{q_{\sigma \curvearrowright }}{p^{\sigma }_{\circlearrowright }} -\sum _{i \in \sigma } \frac{q_{i}}{p^{\sigma }_{\circlearrowright }} \log \frac{q_{i}}{p^{\sigma }_{\circlearrowright }}, \end{aligned}$$
(7)

where \(q_{\curvearrowleft } = \sum _{\sigma =1}^{K} q_{\sigma \curvearrowleft }\), \(p^{\sigma }_{\circlearrowright } = q_{\sigma \curvearrowright } + \sum _{i \in \sigma } q_{i}\), and \(\log \) is the logarithm with base 2. Then, the expected average code length of the random walk is bounded from below as follows:

$$\begin{aligned} L\left( {\varvec{\sigma }} \right)&= q_{\curvearrowleft } H_{1}\left( \{ q_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } p^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( q_{\sigma \curvearrowright }, \{ q_{i} \}_{i\in \sigma } \right) \le {\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] . \end{aligned}$$
(8)

Note here that the contribution from the starting points of the random walk is excluded. This lower bound asymptotically coincides with the expected average code length itself as \(T \rightarrow \infty \). This is the objective function of the map equation and the node partition \({\varvec{\sigma }}\) is optimised so that \(L\left( {\varvec{\sigma }} \right) \) is minimised.
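To make Eqs. (6) to (8) concrete, the following sketch evaluates \(L({\varvec{\sigma }})\) from given steady-state probabilities. The function names and the numerical values in the usage example are hypothetical and are not derived from a specific network; computing the probabilities from a network (e.g., by power iteration) is left out.

```python
import math


def entropy_bits(weights):
    """Base-2 Shannon entropy of the distribution obtained by normalising weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def map_equation(q_node, sigma, q_enter, q_exit):
    """L(sigma) of Eq. (8).

    q_node  : dict node -> visiting probability q_i
    sigma   : dict node -> module label
    q_enter : dict module -> entering probability
    q_exit  : dict module -> exiting probability
    """
    q_in = sum(q_enter.values())
    # Inter-module term, Eq. (6); it vanishes if modules are never entered.
    L = q_in * entropy_bits(list(q_enter.values())) if q_in > 0 else 0.0
    for m in set(sigma.values()):
        # Intra-module term, Eq. (7): exit codeword plus member nodes.
        weights = [q_exit.get(m, 0.0)] + [q for i, q in q_node.items() if sigma[i] == m]
        p_circ = sum(weights)
        if p_circ > 0:
            L += p_circ * entropy_bits(weights)
    return L


q_node = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
L_one = map_equation(q_node, {0: 0, 1: 0, 2: 0, 3: 0}, {0: 0.0}, {0: 0.0})
L_two = map_equation(q_node, {0: 0, 1: 0, 2: 1, 3: 1}, {0: 0.1, 1: 0.1}, {0: 0.1, 1: 0.1})
```

With a single module, the code length reduces to the entropy of the visiting probabilities (2 bits here). With the illustrative entering and exiting probabilities of 0.1, the two-module partition gives a slightly shorter description.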

The assumption that the network is strongly connected plays a vital role in the aforementioned derivation. If this is not the case, the trajectory length T cannot be sufficiently large. Then, the contribution from the starting points of the random walk may not be negligible in \({\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] \). Therefore, we can say that the map equation evaluates the code length of the “flow.” The flow is a stochastic variable representing the ensemble of transitions, and it has no information about the starting points of the random walk by definition (as discussed below, this distinction becomes more prominent when we consider the --flow-model rawdir option in Infomap). The only input for the map equation is a network because the connectivity of nodes fully characterises the flow. By introducing so-called teleportation32 to the random walk, which moves the walker to another node randomly with a certain probability, we can always let the trajectory length T be infinitely large and make the flow ergodic6. Therefore, the map equation is not essentially limited to strongly connected networks.

Single-trajectory map equation

The average code length \({\mathscr {L}}\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) \) of a trajectory is the raw form of the objective function in the map equation. When we have multiple trajectories \(\{{\varvec{\zeta }}_{a}\} := \{{\varvec{\zeta }}_{1}, \dots , {\varvec{\zeta }}_{M}\}\) on a common node set, analogous to the expected average code length \({\mathbb {E}}_{{\varvec{Z}}}\left[ {\mathscr {L}}\left( {\varvec{Z}}, {\varvec{\sigma }} \right) \right] \), we consider the following mean average code length:

$$\begin{aligned} \overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) = \frac{1}{M} \sum _{a=1}^{M} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) . \end{aligned}$$
(9)

The trajectories may have different lengths. Similar to \(L\left( {\varvec{\sigma }} \right) \), this mean average code length can be used as an objective function whose minimization determines the optimal module assignments of nodes. We refer to such an optimisation method as the single-trajectory map equation. Note that the trajectories \(\{{\varvec{\zeta }}_{a}\}\) are provided as inputs in Eq. (9); unlike the map equation, there is no need to assume that they are generated (or simulated) from random walks, although one can consider simulated walks as trajectories.

The average code lengths in the summation of Eq. (9) are not independent because they share codebooks. To illustrate this, let us consider two trajectories as shown in Fig. 2b. Although the trajectory \({\varvec{\zeta }}_{1}\) is identical to \({\varvec{\zeta }}\) in Fig. 2a, the codes describing them are different because we must assign codewords for the nodes that \({\varvec{\zeta }}_{1}\) does not go through due to the existence of trajectory \({\varvec{\zeta }}_{2}\). In contrast, nodes that no trajectory passes through do not contribute to the average code lengths, reflecting the fact that trajectories of finite lengths are considered. Those nodes should not have any module labels because the trajectories provide no information about them.

As we have seen, \(L\left( {\varvec{\sigma }} \right) \) and \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) are conceptually different. In the map equation, \(L\left( {\varvec{\sigma }} \right) \) is the expected average code length for the flow that is completely specified by the transition probabilities. We can also modify the transitions using teleportation to make the random walk ergodic. By contrast, \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) does not have such a stochasticity. It is the mean of the actual average code lengths, where each element corresponds to a single trajectory. Furthermore, \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) depends explicitly on the coding scheme applied, e.g., the Huffman coding, Shannon–Fano coding30, etc. Quantitatively, the contribution from the last term in Eq. (1) mainly makes the minimization of \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) distinct from that of \(L\left( {\varvec{\sigma }} \right) \). The codeword for the module that is required to specify the starting point of a trajectory makes the coding using multiple codebooks less efficient. Recall that an efficient compression is achieved when the inter-module codebook is not frequently used. This implies that the introduction of module labels is more costly in \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) and the single-trajectory map equation avoids generating many small modules.

The single-trajectory map equation searches for the node partition \({\varvec{\sigma }}\) that achieves the optimal compression for the description of trajectories under a certain coding scheme. The optimality of the coding scheme itself is not required for the effectiveness of the method. Therefore, we can use different types of coding for the intra-module and inter-module codebooks. For example, we can introduce a heterogeneous coding where the code lengths are multiplied by a constant factor \(\lambda > 0\) for the codewords in the inter-module codebook. That is, given a trajectory \({\varvec{\zeta }}\) and a partition \({\varvec{\sigma }}\), Eq. (1) is modified to

$$\begin{aligned}&{\mathscr {L}}_{\lambda }\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) = \frac{1}{T} \Biggl ( \sum _{t=0}^{T-1} \ell _{0}\left( \zeta _{t}, \sigma _{\zeta _{t}} \right) + \sum _{\begin{array}{c} t=0 \\ (\sigma _{\zeta _{t}} \ne \sigma _{\zeta _{t+1}}) \end{array} }^{T-2} \biggl ( \ell _{0}\left( \curvearrowright , \sigma _{\zeta _{t}} \right) + \lambda \ell _{1}\left( \sigma _{\zeta _{t+1}} \right) \biggr ) +\lambda \ell _{1}\left( \sigma _{\zeta _{0}} \right) \Biggr ). \end{aligned}$$
(10)

Here, \(\lambda \) is a hyperparameter that penalises the emergence of modules when such modules are relatively inefficient for the compression of the code length.

We can also derive a lower bound for the actual code length using Shannon’s source coding theorem, similar to how \(L\left( {\varvec{\sigma }} \right) \) was such an estimate for the random walk in the steady-state limit. To this end, we consider the average code length of the concatenated code, \(\sum _{a=1}^{M} T_{a} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) /\sum _{a=1}^{M} T_{a}\), where \(T_{a}\) is the length of the ath trajectory; equivalently \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) when all trajectories have the same length. We regard the empirical frequencies \(\hat{{\textsf{p}}}_{i}\) and \(\hat{{\textsf{p}}}_{\sigma ^{\prime } \sigma }\) as the true probabilities for the stochastic variables indicating the codewords and the concatenated code as the expected code length. Then, the conditional average code lengths in Eq. (3) are bounded from below by the Shannon entropies30 with the empirical frequencies. Therefore, the average code length of the concatenated code is bounded as follows:

$$\begin{aligned} \underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \hat{{\textsf{q}}}_{\curvearrowleft } H_{1}\left( \{ \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( \hat{{\textsf{q}}}_{\sigma \curvearrowright }, \{ \hat{{\textsf{q}}}_{i} \}_{i\in \sigma } \right) \le \frac{1}{\sum _{a=1}^{M} T_{a}}\sum _{a=1}^{M} T_{a} {\mathscr {L}}\left( {\varvec{\zeta }}_{a}, {\varvec{\sigma }} \right) , \end{aligned}$$
(11)

where

$$\begin{aligned} \hat{{\textsf{q}}}_{i}&= \frac{1}{\sum _{a=1}^{M} T_{a}} \sum _{a=1}^{M} \sum _{t=0}^{T_{a}-1} \delta _{i, \zeta _{a t}}, \nonumber \\ \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }&= {\left\{ \begin{array}{ll} \displaystyle \frac{1}{\sum _{a=1}^{M} T_{a}} \sum _{a=1}^{M} \sum _{t=0}^{T_{a}-2} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a t+1}}} \delta _{\sigma , \sigma _{\zeta _{a t}}} &{} (\sigma \ne \sigma ^{\prime })\\ 0 &{} (\sigma = \sigma ^{\prime }) \end{array}\right. }, \nonumber \\ \hat{{\textsf{q}}}_{\sigma ^{\prime } \curvearrowleft }&= \frac{ \sum _{a=1}^{M} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a 0}} } }{\sum _{a=1}^{M} T_{a}} + \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }, \hat{{\textsf{q}}}_{\sigma \curvearrowright } = \sum _{\sigma ^{\prime }} \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }, \nonumber \\ \hat{{\textsf{q}}}_{\curvearrowleft }&= \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowleft }, \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } = \hat{{\textsf{q}}}_{\sigma \curvearrowright } + \sum _{i \in \sigma } \hat{{\textsf{q}}}_{i}. \end{aligned}$$
(12)

We regard \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) as an alternative objective function for the single-trajectory map equation. \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) is independent of the coding scheme and its minimization is computationally more efficient than that of \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) because we do not need to construct the codebooks explicitly. Note that \(\hat{{\textsf{q}}}_{\sigma \curvearrowleft }\) and \(\hat{{\textsf{q}}}_{\sigma \curvearrowright }\) may not coincide in Eq. (12), whereas \(q_{\sigma \curvearrowleft } = q_{\sigma \curvearrowright }\) for any module in \(L\left( {\varvec{\sigma }} \right) \) owing to the detailed balance condition of the random walk in the steady state. Analogous to \({\mathscr {L}}_{\lambda }\left( {\varvec{\zeta }}, {\varvec{\sigma }} \right) \) in Eq. (10), we can also consider a heterogeneous coding in \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \), i.e.,

$$\begin{aligned} \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \lambda \, \hat{{\textsf{q}}}_{\curvearrowleft } H_{1}\left( \{ \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \} \right) + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } H^{\sigma }_{0}\left( \hat{{\textsf{q}}}_{\sigma \curvearrowright }, \{ \hat{{\textsf{q}}}_{i} \}_{i\in \sigma } \right) . \end{aligned}$$
(13)
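The quantities in Eq. (12) and the objective in Eq. (13) can be computed directly from counts, without constructing any codebook. The following is a minimal sketch, assuming trajectories are given as non-empty lists of nodes; the function names are hypothetical, and this is not the publicly available Infomap+ implementation. Setting `lam=1.0` corresponds to Eq. (11).

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Base-2 Shannon entropy of the distribution obtained by normalising counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def lower_bound_code_length(trajs, sigma, lam=1.0):
    """Lower-bound objective of Eq. (13) for a list of trajectories."""
    total_T = sum(len(z) for z in trajs)
    visits, enters, exits = Counter(), Counter(), Counter()
    for z in trajs:
        enters[sigma[z[0]]] += 1  # starting-module term of Eq. (12)
        visits.update(z)
        for a, b in zip(z, z[1:]):
            if sigma[a] != sigma[b]:
                exits[sigma[a]] += 1
                enters[sigma[b]] += 1
    # Inter-module term, weighted by lambda.
    q_in = sum(enters.values()) / total_T
    L = lam * q_in * entropy_bits(list(enters.values()))
    # Intra-module terms, only for modules that contain visited nodes.
    for m in set(sigma[i] for i in visits):
        counts = [exits[m]] + [n for i, n in visits.items() if sigma[i] == m]
        L += (sum(counts) / total_T) * entropy_bits(counts)
    return L


path = [[0, 1, 2, 3]]  # a walker visiting each node of a 4-node path once
one_module = lower_bound_code_length(path, {0: 0, 1: 0, 2: 0, 3: 0})
two_modules = lower_bound_code_length(path, {0: 0, 1: 0, 2: 1, 3: 1})
two_modules_lam2 = lower_bound_code_length(path, {0: 0, 1: 0, 2: 1, 3: 1}, lam=2.0)
```

For this toy path, keeping all nodes in one module costs 2 bits per step, whereas splitting it into two modules costs about 2.19 bits; a larger \(\lambda \) increases the cost of the split further.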

Interestingly, the method using \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) is also related to a variant of the map equation that is implemented in Infomap as an option named --flow-model rawdir. In this variant of the map equation, we consider the flow based on the set of transition probabilities induced by the edges (i.e., not the random walk on the network). The corresponding objective function is in fact equivalent to \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) where we ignore the codewords for the initial module and the initial node in each trajectory (consequently, the total length of trajectories \(\sum _{a=1}^{M} T_{a}\) is also modified to \(\sum _{a=1}^{M} T_{a} - M\)). A summary table of the average code lengths is shown in Supplementary Table 1 (Section S1).

Figure 3

Average code lengths for a trajectory on a path (illustrated at the top) obtained by minimising \({\mathscr {L}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) \) (“Huffman”: Huffman coding, “Shannon–Fano”: Shannon–Fano coding) and \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) \) (“lower bound”). The expected average code length \(L( {\varvec{\sigma }})\) (“map equation”) based on the set of transition probabilities induced by the edges, i.e., the --flow-model rawdir option, is also shown. The number of detected modules in each method is indicated at the top of each bar.

Before moving on, let us compare how the minimizations of \(L( {\varvec{\sigma }})\), \(\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\), and \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) differ using a simple example. We consider a trajectory where a walker visits each node exactly once on a path. Figure 3 shows the results obtained through the exact minimization of the objective functions. It quantifies how the average code lengths approach a common value as N increases, because the contribution from the starting point of the trajectory becomes negligible. \({\mathscr {L}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) \) and \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) \) quickly approach each other, whereas \(L( {\varvec{\sigma }})\) converges relatively slowly, implying that the contribution from the codeword of the initial module can be considerable. We also confirmed that the single-trajectory map equation indeed tends to identify a smaller number of modules, and the resulting partitions can vary depending on the coding scheme applied.

The exact minimization of the (expected) average code length is not computationally feasible unless a dataset is extremely small, and thus, we must rely on approximate heuristics in practice. The greedy heuristic implemented in Infomap is commonly used for the map equation. Therefore, we implemented the optimisation for the single-trajectory map equation as a wrapper of Infomap. That is, we first run Infomap to obtain the initial node partition, and then reduce overfitting by pruning small modules based on \(\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\) or \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; {\varvec{\zeta }} \right) \) as a fine-tuning process; our fine-tuning algorithm is also a greedy heuristic. In the following, we refer to this algorithm as Infomap+, and the implementation code is publicly available33. Further details of the algorithm are described in the Methods section.

Experiments

This section demonstrates that the single-trajectory map equation prevents overfitting using datasets represented as networks and a real-world dataset as a set of trajectories. A network is a special case of trajectory datasets because each directed edge can be regarded as a trajectory with length \(T=2\). We treat each edge as a pair of directed edges in both directions for an undirected network. All networks considered are weakly connected.
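The conversion of a network into a set of length-2 trajectories can be sketched as follows. The function name `edges_to_trajectories` and the edge-list input format are assumptions made for illustration.

```python
def edges_to_trajectories(edges, directed=False):
    """Turn each edge into a trajectory of length T = 2.

    For an undirected network, each edge yields two trajectories,
    one per direction.
    """
    trajs = []
    for u, v in edges:
        trajs.append([u, v])
        if not directed:
            trajs.append([v, u])
    return trajs


trajs = edges_to_trajectories([(0, 1), (1, 2)])
```

Because every trajectory has only two steps, the codeword for the starting module is consumed once per edge direction, so its contribution to the code length is never negligible in this setting.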

For the network datasets, we can also consider simulated walks on the underlying network as the input trajectories. In this setting, we would need to specify the type of simulated walks and choose the values of T and M as hyperparameters. Herein, however, we do not consider the simulated walks and treat the edge set directly as the set of trajectories.

Network datasets

We first consider synthetic networks that are generated by the stochastic block model (SBM)25,37,38,39, which is a random graph model having a planted (pre-assigned) module structure. This is a canonical model that is used for analyses in community detection. We particularly consider the so-called symmetric SBM that has two equally-sized planted modules. Each pair of nodes in the same planted module is connected with probability \(p_{\textrm{in}}\) and each pair of nodes in different planted modules is connected with probability \(p_{\textrm{out}}\). The symmetric SBM is commonly parameterized by the average degree c and the fuzziness of module structure \(\epsilon = p_{\textrm{out}}/p_{\textrm{in}}\) instead of \(p_{\textrm{in}}\) and \(p_{\textrm{out}}\). The detection of planted modules is easier when \(\epsilon \) is small because the module structure is clearer. Even when \(\epsilon < 1\), there exists a critical value of \(\epsilon \) above which it becomes impossible to identify the planted module structure better than by chance; this is known as the detectability limit24,34,35,36,39,40 (in the limit \(N \rightarrow \infty \)). For these networks, the single-trajectory map equation cannot be expected to be the best method, because Bayesian inference methods based on the SBM can avoid overfitting altogether.
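A symmetric SBM instance can be generated with the standard library alone. The sketch below is illustrative: the function name is hypothetical, and the conversion from (c, ε) to the connection probabilities uses the large-N approximation c ≈ N(p_in + p_out)/2, which is an assumption that is not stated in the text.

```python
import random


def symmetric_sbm(N, c, eps, seed=None):
    """Sample an undirected symmetric SBM with two equally-sized planted modules.

    Assumes the large-N relation c = N * (p_in + p_out) / 2 with
    eps = p_out / p_in, i.e. p_in = 2c / (N * (1 + eps)).
    Returns (edges, planted), where planted[i] is the planted label of node i.
    """
    rng = random.Random(seed)
    p_in = 2.0 * c / (N * (1.0 + eps))
    p_out = eps * p_in
    planted = [0 if i < N // 2 else 1 for i in range(N)]
    edges = []
    # Each unordered node pair is connected independently.
    for i in range(N):
        for j in range(i + 1, N):
            p = p_in if planted[i] == planted[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, planted


edges, planted = symmetric_sbm(200, 8, 0.1, seed=0)
mean_degree = 2 * len(edges) / len(planted)
```

The loop over all node pairs scales as N squared, which is acceptable for networks of a few thousand nodes; a sparse sampling scheme would be preferable for larger N.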

Figure 4

Performance of modularity maximization methods (the Louvain and Leiden algorithms), Infomap (--two-level), and algorithms for the single-trajectory map equation based on \(\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\) and \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) (“Infomap+”) on the symmetric SBM (\(N=1,000\), \(c=12\)). We generated five instances of the SBM for various fuzziness of the module structure \(\epsilon \) and plotted the distribution of the resulting relative module sizes. Herein, we set \(\lambda = 1\). The performances of all algorithms change around the algorithmic detectability limit \(\epsilon \approx 0.15\), which is distinct from the information-theoretic detectability limit located at \(\epsilon = (\sqrt{c}-1)/(\sqrt{c}+1) \simeq 0.55\)34,35,36.

Figure 4 shows the results of community detection based on the map equation and the single-trajectory map equation applied to the SBM. Each point represents the relative module size, which is defined as \(\sum _{i=1}^{N} \delta _{\sigma _{i}, \sigma }/N\) for module \(\sigma \). The results based on modularity maximization (the Louvain41 and Leiden42 algorithms) are also shown for comparison. The --two-level option in Infomap indicates that we use the two-level method introduced in the original paper for the map equation.
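As a minimal illustration of the definition above, the relative module sizes can be computed from a label vector as follows. The function name is hypothetical and the labels are assumed to be given as a sequence with one module label per node.

```python
from collections import Counter


def relative_module_sizes(labels):
    """Map each module label to the fraction of nodes assigned to it."""
    n_nodes = len(labels)
    counts = Counter(labels)
    return {module: size / n_nodes for module, size in counts.items()}
```

Each value in the returned dictionary corresponds to one point in Fig. 4, and the values sum to one by construction.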

Infomap (incorrectly) identifies several small modules even when the module structure is relatively clear, whereas Infomap+ prunes such small modules and identifies the equally-sized modules. The network plots in Fig. 1 are the results of the same experiment but with the SBM parameters \(N=300\), \(c=8\), and \(\epsilon =0.1\). Although the modularity-based algorithms also identify small modules, Infomap is more prone to overfitting in the region where \(\epsilon \) is small. This phenomenon can be described by the map equation having a finer resolution limit27 compared with that of the modularity43, i.e., the map equation can identify smaller modules. Note, however, that the analysis of the resolution limit is based on an extreme-case example that has a well-defined module structure; it does not describe the whole behaviour in Fig. 4. In the region of \(\epsilon \) above the detectability limit (\(\epsilon \approx 0.15\)40), the modularity-based algorithms subdivide the planted modules into a number of smaller-sized modules. This is problematic because a practitioner can hardly tell whether the resulting partition is due to overfitting. In contrast, most of the map equation-based algorithms do not partition a network in that region, implying that they avoid overfitting.

Although we showed the relative module sizes obtained by the algorithms, readers might wonder whether the identified modules are actually consistent with the planted ones. In Supplementary Fig. 1 (Section S2), we confirmed that the inferred and planted module structures are indeed highly consistent when the number of modules is correctly estimated. Note also that community detection algorithms generally suffer from overfitting and underfitting more severely when the average degree c is smaller. Therefore, all the methods considered here are expected to perform less accurately when c is extremely small.

The experiments here can be conducted for larger networks. In that case, however, some of the plots in Fig. 4 would be unnecessarily difficult to read because we would have many more points due to overfitting. Moreover, the comparison with the previously known result on the detectability limit may be difficult for larger networks, because it is observed in40 that algorithmic detectability limits of greedy algorithms can be size-dependent.

Figure 5

Relative module sizes obtained by Infomap and Infomap+ for real-world networks. The number of identified modules is depicted at the top of each result. The dashed line represents 0.01.

We then apply the algorithms for the single-trajectory map equation to real-world networks. Figure 5 shows the relative module sizes obtained using Infomap and Infomap+. It shows that small modules are pruned, yet larger modules remain identified in most of the cases with Infomap+. Although all variants of Infomap+ often provide similar partitions, empirically, the Huffman coding method finds a good balance of module sizes in real-world networks. The datasets considered here are often analyzed in the literature on community detection. For example, readers can compare the results here with those of Bayesian inference methods reported in5,44,45,46,47.

In Fig. 5, the value of the hyperparameter \(\lambda \), which acts as a resolution parameter, is adjusted for each network so that the size of the smallest module is not less than \(\min \{3, N/100\}\) (this adjustment can be performed automatically). The selected values of \(\lambda \) and the details of experimental settings and datasets are provided in Supplementary Table 2 (Section S3). We also examined the \(\lambda \)-dependency in Fig. 6, and we found that the number of modules varies within \(1 \le \lambda < 2\) in many datasets; in the Methods section, we show that \(\lambda =2\) is a practical upper bound according to the resolution limit. Note that the threshold \(\min \{3, N/100\}\) is only a reference to determine a reasonable value of \(\lambda \); when Infomap+ excessively prunes modules, one can directly tune \(\lambda \) to resolve the underfitting problem.
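One possible way to automate this adjustment is sketched below. The procedure is an assumption, not taken from the text: it scans a user-supplied grid of candidate values in increasing order and returns the first \(\lambda \) whose partition satisfies the threshold. The callable `partition_for_lambda` is a hypothetical stand-in for a run of Infomap+ at a given \(\lambda \).

```python
from collections import Counter


def smallest_module_size(labels):
    """Number of nodes in the smallest module of a partition."""
    return min(Counter(labels).values())


def select_lambda(partition_for_lambda, n_nodes, candidates):
    """Return (lambda, labels) for the first candidate meeting the threshold.

    partition_for_lambda: hypothetical callable, lambda -> sequence of labels.
    The threshold min{3, N/100} follows the text; the ascending scan over
    `candidates` is an assumption of this sketch.
    """
    threshold = min(3, n_nodes / 100)
    for lam in sorted(candidates):
        labels = partition_for_lambda(lam)
        if smallest_module_size(labels) >= threshold:
            return lam, labels
    return None  # no candidate satisfied the threshold
```

Scanning from small to large values keeps \(\lambda \) as close to 1 as the threshold allows, which limits the risk of excessive pruning.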

Figure 6

Number of modules identified for each value of the hyperparameter \(\lambda \) in Infomap+.

Bike-sharing dataset: application to a set of trajectories

Finally, we compare the methods using a dataset of a bike-sharing service in London48,49, which is a dataset consisting of trajectories (sequences of bike stations visited) that individual bikes have travelled in a day (see Supplementary Information (Section S4) for the details of the dataset); thus, \(T_{a}\) is the number of stations that bike a has visited. Figure 7a illustrates trajectories of three bikes in the dataset. Community detection of the trajectories identifies the area within which a bike is often used. Figure 7b shows the partition obtained by minimising \(L( {\varvec{\sigma }})\) using Infomap; here, we constructed a network by decomposing each trajectory into a set of edges between successive pairs of stations. As a result, we obtained eight modules; in addition to four large modules, several modules consisting of only a few stations are also identified. In Fig. 7c which shows the partitions obtained by minimising \(\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\), we no longer observe the small modules. Although Fig. 7c is of the Huffman coding method (\(\lambda =1\)), we obtain the same partition with the Shannon–Fano coding method (\(\lambda =1\)) and with the method minimising \(\underline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\) (\(\lambda =1.8\)).
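The network construction used for Fig. 7b, in which each trajectory is decomposed into edges between successive pairs of stations, can be sketched as follows. Keeping the edge direction and counting repeated transitions as weights are assumptions of this sketch; the function name is hypothetical.

```python
from collections import Counter


def trajectories_to_edges(trajectories):
    """Count transitions between successive stations over all trajectories.

    trajectories: iterable of sequences of station identifiers.
    Returns a Counter mapping (source, target) to the number of transitions.
    """
    edge_weights = Counter()
    for stations in trajectories:
        # zip pairs each station with its successor in the same trajectory.
        for source, target in zip(stations, stations[1:]):
            edge_weights[(source, target)] += 1
    return edge_weights
```

A trajectory that visits a single station contributes no edge, so it influences only the trajectory-based objective and not the network constructed here.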

Figure 7

Community detection of the bike-sharing dataset. (a) Three trajectories in the dataset, where each point (node) represents the location of a bike station. The partitions of the stations are obtained by minimising (b) \(L( {\varvec{\sigma }})\) (detected 8 modules) and (c) \(\overline{{\mathscr {L}}}( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} )\) based on the Huffman coding (detected 4 modules). The stations in the same module are indicated in the same colour.

Discussion

This study revisited the formulation of the map equation and shed light on many details hidden in its principle. We addressed the fact that the encoding of trajectories is qualitatively distinct from the encoding of the flow on a network and proposed the single-trajectory map equation. Importantly, the proposed method can prune small modules and prevent overfitting.

The single-trajectory map equation provides a more balanced community structure compared with the map equation. Whereas balanced partitions may not always be desirable, they are often beneficial because we can prune spurious modules due to overfitting as demonstrated in Fig. 1. Furthermore, the analysis in the Methods section implies that the single-trajectory map equation is not prone to underfitting compared with the map equation because their resolution limits are almost the same when \(\lambda =1\).

Readers might wonder if the present approach is distinct from other variants of the map equation, such as the one with the Markov-time parameter12,28 and the Bayesian formulation of the map equation16,19 which is an improved teleportation method50. To clarify this point, we also conducted experiments analogous to those described in the Experiments section using these methods in Supplementary Information (Section S5). In some cases, these methods also yield partitions similar to those in Figs. 4 and 5. However, they appear less suitable for pruning small modules: depending on the method, the result is either too sensitive or too insensitive to the choice of the hyperparameter, so that the optimal value must be searched for on a finer scale or over a wider range, respectively, whereas balanced partitions are often obtained without tuning the hyperparameter \(\lambda \) in the single-trajectory map equation. We also emphasise that the single-trajectory map equation is not a generalisation of the map equation, but its raw form, and overfitting is avoided using the principle of the map equation itself.

The bootstrapping method51 is another approach for avoiding overfitting. However, this approach is computationally expensive19 and a comparison with the present approach is not straightforward because the output is a population of partitions. A more detailed study of the qualitative and quantitative relationships between the single-trajectory map equation and other variants of the map equation is left for future work. Furthermore, because the single-trajectory map equation is a trajectory-based approach, its relationship with the memory-network extension10,13 of the map equation is another potential research direction because both take a set of trajectories as the input.

The time complexity of the optimisation algorithm is a major issue in the single-trajectory map equation. Whereas the lower bound \(\underline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) can be optimised as efficiently as Infomap, explicit construction of the codebooks is required for the actual average code length \(\overline{{\mathscr {L}}}\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \). Although our implementations of Infomap+ run within a reasonable amount of time for fairly large datasets as demonstrated in Supplementary Information Fig. 6, an improved implementation is also left for future work.

Methods

Optimisation algorithm

Herein, we explain the implementation details of the greedy heuristic. A typical greedy heuristic for community detection, including Infomap, iteratively merges two or more modules that improve the value of an objective function41,42,52, and equally-sized modules are often preferentially merged23. Such an update rule does not effectively compress the average code length at the stage of fine-tuning. This is not surprising because the initial partition is located at a local or global minimum of the objective function in the map equation, which may also be a local minimum in the single-trajectory map equation. Moreover, there is no reason that equally-sized modules should be preferentially merged. Although we typically have a few large and many small modules as the initial partition, it is unlikely that merging those small modules with one another provides better compression of the average code length. Therefore, instead, we iteratively merge the smallest module and its most tightly-connected module regardless of the resulting value of the average code length until only one module is left; among the partitions that form with this merging process, we accept the partition that achieves the minimum average code length. Given an initial partition, this algorithm is deterministic. Although this algorithm is straightforward and the resulting partition may not be the global optimum, an improved compression of the average code length can be achieved by pruning small modules without being trapped in local minima of the objective function.
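The merging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: "most tightly-connected" is taken to mean the largest number of edges to the smallest module, ties are broken by sorted label order, a module without any external edge is merged with the next-smallest module, and the objective is passed in as a hypothetical callable `code_length`.

```python
from collections import Counter, defaultdict


def prune_by_merging(labels, edges, code_length):
    """Merge the smallest module repeatedly and keep the best partition.

    labels: dict node -> module label (the initial partition).
    edges: list of (u, v) pairs.
    code_length: hypothetical callable, dict of labels -> float.
    Returns (best_labels, best_value).
    """
    current = dict(labels)
    best_labels, best_value = dict(current), code_length(current)
    while len(set(current.values())) > 1:
        sizes = Counter(current.values())
        # Smallest module; sorted() makes tie-breaking deterministic.
        smallest = min(sorted(sizes), key=sizes.get)
        # Count edges between the smallest module and each other module.
        links = defaultdict(int)
        for u, v in edges:
            mu, mv = current[u], current[v]
            if mu == smallest and mv != smallest:
                links[mv] += 1
            elif mv == smallest and mu != smallest:
                links[mu] += 1
        if links:
            target = max(sorted(links), key=links.get)
        else:
            # Assumption: an isolated module joins the next-smallest one.
            others = [m for m in sorted(sizes) if m != smallest]
            target = min(others, key=sizes.get)
        # Merge regardless of whether the objective improves.
        current = {node: target if module == smallest else module
                   for node, module in current.items()}
        value = code_length(current)
        if value < best_value:
            best_labels, best_value = dict(current), value
    return best_labels, best_value
```

Because every merge is accepted and only the final selection uses the objective, the procedure visits at most K partitions for an initial partition with K modules.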

When we use the lower bound \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) as the objective function, the greedy update can be performed as done in Infomap. The expanded form of \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) is

$$\begin{aligned}{}&\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) = \lambda \hat{{\textsf{q}}}_{\curvearrowleft } \log \hat{{\textsf{q}}}_{\curvearrowleft } -\lambda \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowleft } \log \hat{{\textsf{q}}}_{\sigma \curvearrowleft } + \sum _{\sigma } \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } \log \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright } - \sum _{\sigma } \hat{{\textsf{q}}}_{\sigma \curvearrowright } \log \hat{{\textsf{q}}}_{\sigma \curvearrowright } - \sum _{i=1}^{N} \hat{{\textsf{q}}}_{i} \log \hat{{\textsf{q}}}_{i}. \end{aligned}$$
(14)

The last term in Eq. (14) is independent of partition. Therefore, when we merge two modules, we only need to keep track of changes in \(\hat{{\textsf{q}}}_{\sigma \curvearrowleft }\), \(\hat{{\textsf{q}}}_{\sigma \curvearrowright }\), and \(\hat{{\textsf{p}}}^{\sigma }_{\circlearrowright }\), which are defined in Eq. (12). In these quantities, \(\sum _{a=1}^{M} \delta _{\sigma ^{\prime }, \sigma _{\zeta _{a 0}} }\) is the population of the starting-point nodes in module \(\sigma \), \(\sum _{\sigma } \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }\) and \(\sum _{\sigma ^{\prime }} \hat{{\textsf{q}}}_{\sigma ^{\prime } \sigma }\) are the sums of the populations of the transitions across modules in the set of trajectories, and \(\sum _{i \in \sigma } \hat{{\textsf{q}}}_{i}\) is the sum of the node-visiting frequencies in module \(\sigma \). They are O(K) quantities, such that we can efficiently compute the change in \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) when two modules are merged.

When we use the actual average code length \(\overline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) as the objective function, the greedy update cannot be computed as efficiently as for \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \). When two modules are merged, we need to reconstruct the intra-module codebook of the target module as well as the inter-module codebook to compute the updated code length. The time complexity of constructing a codebook depends on the specific coding scheme applied. In Supplementary Fig. 6, we show the running times of Infomap and Infomap+ on the SBM; herein, we used the Infomap API53 (a C++-based implementation with a Python wrapper) for Infomap and our Python-based implementation33 for Infomap+.

Resolution limit

Readers might consider that the pruning effect implies that the proposed method is prone to underfitting. To examine this issue, we derive the resolution limit of the single-trajectory map equation focusing on \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) and network datasets. The resolution limit is the smallest module size that the method can identify given a network size such as the total number of edges.

The following analysis shows that, although the method has a somewhat coarser resolution scale than the standard or hierarchical map equation, it is still a high-resolution method. The analysis also provides a theoretical explanation of some of the empirical results we obtained through the experiments in the Experiments section and has an implication for the range that the hyperparameter \(\lambda \) should take.

General form

We closely follow the derivation in27, which is applied to undirected networks. The present resolution limit is for directed networks and the considered objective function is \(\underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right) \) instead of \(L\left( {\varvec{\sigma }} \right) \).

We first rewrite the empirical frequencies of the walkers and the objective function in the single-trajectory map equation in terms of network statistics. When the input trajectories are the edges in a directed network (i.e., the number of trajectories M is the number of directed edges), we have

$$\begin{aligned} \hat{{\textsf{q}}}_{\sigma \curvearrowright }&= \frac{\ell ^{\textrm{out}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{\sigma \curvearrowleft } = \frac{\ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma }}{2M} + \frac{\ell ^{\textrm{in}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{\curvearrowleft } = \frac{M+C}{2M}, \nonumber \\ \hat{{\textsf{p}}}^{\sigma }_{\circlearrowright }&= \frac{\ell ^{\textrm{out}}_{\sigma }}{2M} + \frac{2 \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma }}{2M}, \quad \hat{{\textsf{q}}}_{i} = \frac{d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i}}{2M}, \end{aligned}$$
(15)

where \(\ell _{\sigma }\) is the number of directed edges within module \(\sigma \); \(\ell ^{\textrm{in}}_{\sigma }\) and \(\ell ^{\textrm{out}}_{\sigma }\) are the numbers of in-coming and out-going edges of module \(\sigma \), respectively; \(d^{\textrm{in}}_{i}\) and \(d^{\textrm{out}}_{i}\) are the in- and out-degrees of node i; and C is the cut size of the network, i.e., the total number of directed edges that are crossing different modules. Using Eq. (15), the objective function is recast as

$$\begin{aligned} \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \frac{1}{2M} \biggl ( \lambda (M+C) \log (M+C) + C + \sum _{\sigma } \underline{{\mathscr {L}}}^{\sigma }_{\lambda } + 2M \nonumber \\&\quad - \sum _{i=1}^{N} \left( d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i} \right) \log \left( d^{\textrm{in}}_{i} + d^{\textrm{out}}_{i} \right) \biggr ), \end{aligned}$$
(16)

where

$$\begin{aligned} \underline{{\mathscr {L}}}^{\sigma }_{\lambda }&= -\lambda \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma } \right) \log \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \ell ^{\textrm{in}}_{\sigma } \right) \nonumber \\&\quad + 2 \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \frac{1}{2}\ell ^{\textrm{in}}_{\sigma } \right) \log \left( \ell _{\sigma } + \ell ^{\textrm{out}}_{\sigma } + \frac{1}{2} \ell ^{\textrm{in}}_{\sigma } \right) - \ell ^{\textrm{out}}_{\sigma } \log \ell ^{\textrm{out}}_{\sigma }. \end{aligned}$$
(17)
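Equations (15)–(17) can be evaluated directly from a directed edge list and a partition, as in the sketch below. The function names are hypothetical. Logarithms are taken to base 2, which is the base for which the constant terms \(C + 2M\) in Eq. (16) follow from Eqs. (14) and (15), and \(0 \log 0\) is treated as 0.

```python
import math
from collections import defaultdict


def xlog2x(x):
    """x * log2(x), with the convention 0 * log 0 = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def module_statistics(edges, labels):
    """Return (inside, incoming, outgoing, cut) counts of directed edges."""
    inside, incoming, outgoing = defaultdict(int), defaultdict(int), defaultdict(int)
    cut = 0
    for u, v in edges:
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
        else:
            outgoing[labels[u]] += 1
            incoming[labels[v]] += 1
            cut += 1
    return inside, incoming, outgoing, cut


def module_term(l, l_in, l_out, lam):
    """Per-module contribution, Eq. (17)."""
    return (-lam * xlog2x(l + l_out + l_in)
            + 2.0 * xlog2x(l + l_out + 0.5 * l_in)
            - xlog2x(l_out))


def lower_bound_code_length(edges, labels, lam=1.0):
    """Objective function in the form of Eq. (16)."""
    edges = list(edges)
    m = len(edges)  # M: number of directed edges (trajectories)
    inside, incoming, outgoing, cut = module_statistics(edges, labels)
    degree = defaultdict(int)  # d_in + d_out for every node
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = lam * xlog2x(m + cut) + cut + 2 * m
    for s in set(labels.values()):
        total += module_term(inside[s], incoming[s], outgoing[s], lam)
    total -= sum(xlog2x(d) for d in degree.values())
    return total / (2 * m)
```

An undirected network can be passed by listing each edge in both directions, as is done for the ring of cliques later in this section.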

In the resolution-limit analysis, we consider two well-defined modules and derive the condition under which their merging is favoured (i.e., the modules are not resolved) for better optimisation of the objective function. Thus, we evaluate the condition such that the difference in the objective function \(\Delta \underline{{\mathscr {L}}}^{\sigma }_{\lambda }\) becomes negative when two modules are merged. We denote the labels of two well-defined modules as A and B and the merged module as AB. We also denote the change in \(\sum _{\sigma } \underline{{\mathscr {L}}}^{\sigma }_{\lambda }\) through the update as R, i.e.,

$$\begin{aligned} R = \underline{{\mathscr {L}}}^{AB}_{\lambda } - \underline{{\mathscr {L}}}^{A}_{\lambda } - \underline{{\mathscr {L}}}^{B}_{\lambda }. \end{aligned}$$
(18)

Here, R is a local quantity that depends only on the variables within/around modules A and B. When two well-defined modules are merged, the cut size is decreased by a small \(\delta \) (\(\delta \ll M + C\)). The difference in the objective function based on the update is

$$\begin{aligned} \Delta \underline{{\mathscr {L}}}_{\lambda }\left( {\varvec{\sigma }}; \{{\varvec{\zeta }}_{a}\} \right)&= \frac{1}{2M} \biggl ( \lambda (M+C-\delta ) \log (M+C-\delta ) - \lambda (M+C) \log (M+C) - \delta + R \biggr ) \nonumber \\&\simeq \frac{1}{2M} \biggl ( -\delta \lambda \log \left( e (M+C) \right) - \delta + R \biggr ), \end{aligned}$$
(19)

where e is the base of the natural logarithm. Therefore, the resolution limit is generally expressed as

$$\begin{aligned} R \lesssim \delta \biggl ( 1 + \lambda \log \left( e (M+C) \right) \biggr ). \end{aligned}$$
(20)
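The first-order approximation in the second line of Eq. (19), \((x-\delta ) \log (x-\delta ) - x \log x \simeq -\delta \log (e x)\) with \(x = M + C\), can be checked numerically. The following sketch assumes base-2 logarithms; the function names are hypothetical.

```python
import math


def exact_change(x, delta):
    """(x - delta) * log2(x - delta) - x * log2(x)."""
    return (x - delta) * math.log2(x - delta) - x * math.log2(x)


def first_order_change(x, delta):
    """-delta * log2(e * x), the first-order expansion in delta / x."""
    return -delta * math.log2(math.e * x)
```

The discrepancy between the two functions is of order \(\delta ^{2}/x\), so it is negligible whenever \(\delta \ll M + C\).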

In the map equation, the cut size C is the only global term that is responsible for the resolution limit (see equation (11) in27). By contrast, the single-trajectory map equation has the total number of directed edges M as another global term in Eq. (20). Note, however, that the contribution from M is logarithmic, implying that the single-trajectory map equation is still a high-resolution method. Next, we will derive a more explicit scaling.

Ring of cliques

It is common to consider a “ring of cliques” in a resolution-limit analysis, as illustrated in Fig. 8a. We consider m cliques (each of which consists of n nodes) and connect each with a single edge to form a ring. This is an undirected network. Again, we treat each undirected edge as a pair of directed edges in both directions. We regard each clique as a module, and using this example, derive the resolution limit in a more explicit form.

Figure 8

Ring of cliques and its resolution limit. (a) Network plot with \(n=5\) and \(m=8\), and (b) the resolution limits of the map equation (light blue line), the single-trajectory map equation with \(\lambda =(1, 1.25, 1.5)\) (dark blue lines), and the modularity (dashed grey line). Each line represents the phase boundary above which a clique is not resolved as a module because the network is too large (e.g., the marked region represents the undetectable region in the map equation).

When we merge two of these cliques, the cut size is decreased by 2. We denote \(\ell _{\sigma } = n (n-1) = \ell \) for an arbitrary module. Assuming that \(\ell \gg 1\), we have

$$\begin{aligned} R \simeq -2 (\lambda -2) (\ell + 3) + 2(\lambda -1) \left( \log (e\ell ) \right) . \end{aligned}$$
(21)
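Equation (21) can be compared against the exact value of R obtained from Eqs. (17) and (18). The sketch below assumes base-2 logarithms and a ring with at least three cliques, so that each clique has \(\ell ^{\textrm{in}}_{\sigma } = \ell ^{\textrm{out}}_{\sigma } = 2\) and a merged pair has \(2\ell + 2\) internal directed edges with the same two in-coming and two out-going edges; the function names are hypothetical.

```python
import math


def xlog2x(x):
    """x * log2(x), with the convention 0 * log 0 = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def module_term(l, l_in, l_out, lam):
    """Per-module contribution, Eq. (17)."""
    return (-lam * xlog2x(l + l_out + l_in)
            + 2.0 * xlog2x(l + l_out + 0.5 * l_in)
            - xlog2x(l_out))


def exact_R(n, lam):
    """R of Eq. (18) for two adjacent n-node cliques in the ring."""
    ell = n * (n - 1)  # directed edges inside one clique
    single = module_term(ell, 2, 2, lam)
    merged = module_term(2 * ell + 2, 2, 2, lam)
    return merged - 2.0 * single


def approx_R(n, lam):
    """Large-ell approximation, Eq. (21)."""
    ell = n * (n - 1)
    return (-2.0 * (lam - 2.0) * (ell + 3)
            + 2.0 * (lam - 1.0) * math.log2(math.e * ell))
```

Already for cliques of ten nodes the two expressions differ by a small fraction of their value, which supports the use of Eq. (21) in the subsequent scaling argument.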

Substituting Eq. (21) into Eq. (20), we obtain

$$\begin{aligned} \frac{\ell ^{\lambda -1} 2^{(2-\lambda )(\ell +3)-1}}{e} \lesssim \left( M+C \right) ^{\lambda }. \end{aligned}$$
(22)

Each clique is resolved as a module unless \((M+C)^{\lambda }\) is larger than the left-hand side of Eq. (22), which is an exponentially growing function with respect to the clique size \(\ell \).
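The condition in Eq. (22) can be evaluated for a given ring of cliques as sketched below. The sketch relies on two assumptions that are derived here rather than stated in the text: with each undirected edge counted as two directed edges, \(M = m(\ell + 2)\) and \(C = 2m\), so that \(M + C = m(\ell + 4)\); and logarithms are taken to base 2. Both sides are compared in the log domain to avoid overflow for large cliques. The function name is hypothetical.

```python
import math


def clique_is_resolved(n, m, lam=1.0):
    """Check whether Eq. (22) predicts that n-node cliques stay separate.

    n: nodes per clique (n >= 2); m: number of cliques in the ring.
    Merging is favoured when the left-hand side of Eq. (22) does not
    exceed (M + C) ** lam, so the cliques are resolved in the other case.
    """
    ell = n * (n - 1)       # directed edges inside one clique
    total = m * (ell + 4)   # M + C under the counting assumed above
    log_lhs = ((lam - 1.0) * math.log2(ell)
               + (2.0 - lam) * (ell + 3)
               - 1.0 - math.log2(math.e))
    log_rhs = lam * math.log2(total)
    return log_lhs > log_rhs
```

For the network in Fig. 8a (\(n=5\), \(m=8\)), this check indicates that the cliques are resolved at \(\lambda = 1\) but not at \(\lambda = 2\).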

Figure 8b depicts the resolution limits of the single-trajectory map equation, together with those of the map equation27 and modularity43. Although n and m are integers, we treat them as real numbers to highlight the scaling of each resolution limit. The resolution limit with \(\lambda =1\) is extremely close to that of the map equation. Therefore, the single-trajectory map equation is not prone to underfitting compared with the map equation.

When \(\lambda \) is large, modules with a small n are not resolved for any network size. However, the limit rapidly disappears as n becomes larger, whereas the resolution limit of the modularity disappears relatively slowly. This dependency of the resolution limit partially explains the favourable behaviour of the single-trajectory map equation, i.e., small modules are pruned yet large modules continue to be identified. However, as pointed out in the main text, the resolution limit does not describe the full behaviour of the single-trajectory map equation; it is not \(\lambda \) that plays a critical role in the method and \(\lambda =1\) is often sufficient to avoid overfitting.

In the left-hand side of Eq. (22), the leading coefficient in the exponent, \(2-\lambda \), vanishes at \(\lambda = 2\) and is negative for larger \(\lambda \). For \(\lambda \ge 2\), a clique will not be resolved as a module for any network size regardless of its size n, i.e., the ability as a community detection method will be completely lost. This transition implies that the optimal value of \(\lambda \) is usually located within \(1 \le \lambda < 2\), which is indeed consistent with our experimental results in Fig. 6.