Introduction

Network science has been acknowledged as a crosscutting discipline that pursues the universal nature of network characteristics. Fundamental research on network topologies has been particularly significant for many applications1. Namely, the knowledge in network topologies is beneficial not only for a specific field but also for various areas in science, engineering, and technologies. Examples include the Internet, telecommunications, ecological systems, and transport networks2,3,4. It is necessary to analyze structural properties, such as the average path lengths or node degrees5,6,7. Furthermore, there is a strong demand to explore transport properties, which are the distribution of agents and their traffic in topological networks, where agents usually represent creatures, humans, or objects. Graph theory8,9 and queuing theory10,11 are effective tools for the theoretical analysis of traffic transportation12,13,14,15. However, the applications of these theories to complex real-world transport networks are limited. As a remedial measure, cellular automata16,17 or multi-agent simulations have been utilized as supportive but powerful approaches to complex network problems18,19,20.

Elucidating the effect of agents’ congestion avoidance on their collective dynamics in basic topologies can help deepen the understanding of the mechanism of “traffic jams” in various network sciences. However, such an effect has not been fully discussed in multivariate statistics. In studies of traffic transport networks, there have been two different approaches to analyzing network problems: conceptual and semi-empirical. The former examines the basic topologies or conceptual network models to grasp the essence, whereas the latter approach develops generic models and adapts them to complicated phenomena by optimizing parameters according to experimental data. Many related studies have primarily focused on applications in engineering; thus, they used semi-empirical approaches21,22,23,24. For example, several studies have parameterized multiple types of traffic information [e.g., incidents, congestion, and travel time] and calculated cost-minimization functions using optimization algorithms to improve traffic efficiency21,22. One study used traffic information as the cost of searching the shortest routes23 using the A-star algorithm25. Other studies applied the Bayesian network24 or logit regressions26 to a set of traffic information to determine the relationship between travel time and drivers’ preferences on real-world road networks. Briefly, many previous studies can be largely classified into three methodologies according to their application purposes: optimizations, graph theory, and regressions.

Several studies have applied multivariate statistics to traffic transport networks27,28,29. Notably, the literature29 presented an optimization of travel time by using traffic information provision. They focused on a general stochastic network that connects multiple origin-destination pairs, calculated the variance-covariance matrix of the route flow in the system, and utilized it with a network equilibrium model30,31,32,33. They demonstrated the shortening of the travel time in a road network in a city by using the stationary conditions of stochastic network equilibrium. Although the literature’ s study29 is impressive because it designed a multivariate statistical model of agent dynamics with traffic information provision, it assumed several constraints for application purposes. In particular, agents decide on their destinations only due to exogenous factors. In other words, traffic information is given to agents deterministically, and the resulting route choices of agents do not provide feedback into the input traffic information. This condition is required for the system to satisfy the multivariate normal distribution (mnd)34,35,36,37, which is indispensable for their optimization using stochastic network equilibrium. Unfortunately, real-world traffic networks are primarily endogenous. Congestion information is stochastically given to agents, and the resulting congestion status is often fed back into the input congestion information. It is known that the relationship between stochastic input variables and output distribution, that is, endogenous or exogenous of the network, seriously affects system dynamics. Because of these complexities, to the best of our knowledge, no study has fully discussed the effects of providing congestion information on the fundamental characteristics of endogenous traffic networks in terms of multivariate statistics.

This study aims to clarify the effect of agents’ congestion avoidance due to congestion information provision on traffic properties in a network topology. We focused on a star topology, in which a central primary node is connected to multiple secondary nodes that are isolated from each other. We can find star topologies in various fields. A star topology is one of seven basic networks and analyzing the flow of data in a star topology is essential to computer networks. Collective dynamics in a star topology have attracted attention in epidemiological or ecological metapopulations. In addition to these physical networks, a star topology is often used as a conceptual decision-making model by individuals with multiple choices. The convolution of numerous star topologies yields neural networks. Accordingly, clarifying the general characteristic of a star topology will also contribute to non-specific disciplines. In this study, we investigated the dynamics of a stochastic transportation network in which each agent at the primary node stochastically selects one of the secondary nodes according to the preference of agents for each secondary node. Here, the preference is modeled as the sum of the two components, one of which is positively proportional to the fixed access rate unique to the secondary node; the other is negatively proportional to the congestion rate of the secondary node. In fact, this type of model has often been implemented in collective dynamics. Specifically, the stochastic model that employs the sum of two terms, where the first term represents a fixed preference and the second term corrects the first term, can be found in studies on the behavior of ants38 and vehicles with congestion avoidance39,40; our model can correspond to a specific case of the models in these studies. Note that “preference” is rephrased with “pheromone” in38. In this manner, our approach has gained some recognition as a heuristic model in application fields. A detailed description of the preferences is presented in Section "Model". Throughout the paper, we refer to this relationship as “the preference affected by congestion” for a clear explanation. In numerical tests, we examined the following two scenarios: (1) each agent can access the same secondary node repeatedly and (2) each agent can access each secondary node only once. We refer to the former scenario as Scenario-1 and the latter as Scenario-2. We investigated the uniformity of agent distribution in the stationary state in Scenario-1 and measured the travel time for all agents visiting all the nodes in Scenario-2. This paper discusses the observations in terms of multivariate statistics.

A key feature of our model is that the system stochastically provides congestion information to agents. The agent distribution or expected number of agents at a secondary node in a stationary state depends on fixed preferences and the congestion rate, which is in proportion to the number of agents in each node. Accordingly, the number of agents operates as both input and output variables. The resulting agent distribution among the secondary nodes feeds back into the stochastic input variable regarding the congestion rate; the network is purely endogenous. In multivariate statistics, the endogenous system has been challenging to study because it often shows irregular behaviors that do not necessarily follow standard statistical models including the mnd. Therefore, this study investigates whether we can apply the framework of multivariate statistics to such an endogenous traffic network with a star topology. Furthermore, we theoretically derive a variance-covariance matrix without assuming that the agent distribution follows mnd, to discuss the generic case involving small systems. Section "Model" provides more details of our model. The results of this conceptual study can be helpful for many related scientific areas. Scenario-1 can provide a new perspective on metapopulation networks41,42. Scenario-2 reports an applied case of the traveling salesman problems43, where we consider the effect of congestion information provision on facilitating travel on a star network topology. To take a more familiar example, the finding for Scenario-1 can serve as a guide for applying the results to crowd management44 at event venues; consider a situation where visitors are dispersed across several local areas within an event venue. Event managers are typically expected to ensure that visitors are equally distributed across local areas to mitigate the risk of infection or accidents caused by dense crowds. Accordingly, our interest falls on the uniformity of visitor distribution among multiple event venues in Scenario-1. In contrast, the application of Scenario-2 is exemplified by a health examination, known as a typical optimization problem in task scheduling45,46,47. Here, the agents represent patients, and secondary nodes represent examination booths corresponding to different examination items; accordingly, our interest falls on the travel time for all patients visiting all the examination items in Scenario-2. These examples are critical because they can be direct applications of the findings of this study. Therefore, as mentioned above, we decided to measure the characteristics of interest in each scenario: the uniformity of the agent distribution in the steady state for Scenario-1, and the travel time of agents visiting all nodes for Scenario-2.

The remainder of this paper is organized as follows. In section "Methods", we provide a detailed description of the target system. We also derive analytical solutions for a specific case in which agents are provided with no traffic information by solving the state-transition equations. We then report the results of small preliminary simulations in both scenarios to clarify the following two characteristics: (a) the equalization of network usage owing to congestion avoidance of agents observed in Scenario-1. (b) the shortening effect of travel time due to the equalization effect observed in Scenario-2. Considering the results of the preliminary tests, we define a criterion indicating the uniform distribution of agents based on the above-mentioned analytical solution as a preparation for the subsequent sections. In Section "Results", we investigate the uniformity of agent distributions in Scenario-1. We observed that the uniformity of agent distribution shows three types of nonlinear dependence on the increase in nodes. We also report that congestion avoidance linearizes the travel time irrespective of the degree of reference to the congestion information in Scenario-2. In contrast, the travel time increases exponentially with the number of secondary nodes when not referring to congestion information at all. In Section "Discussion", we derive a theoretical model based on multivariate statistics and compare it with the simulation results, reporting that our model clearly describes the observed characteristic dependence in Scenario-1. In particular, our analysis indicates the following: the balance between the equalization of network usage by avoiding congestion and the covariance caused by mutually referring to congestion information determines overall uniformity. We also discuss why the travel time step exponentially increases when agents are provided with no congestion information and are linearized otherwise. Section "Conclusion" summarizes the results and concludes the study.

Figure 1
figure 1

Schematic of the target system.

Methods

Model

Consider a star topology with a primary node connected to M isolated secondary nodes with different preferences. We denote the total number of agents and the number of agents of the ith secondary node at the t discrete-time step and the fixed preference for the ith secondary node by agents as \(N_v\), \(N_i(t)\), and \(w_i\), respectively. For ease of understanding, we refer to \(w_{i}\) as a weight or the ith preference. Each agent at the primary node stochastically chooses one of the secondary nodes by referring to the preferences affected by the congestion rate of each secondary node. Accordingly, the probability that an agent moves from the primary node to the ith secondary node is expressed as

$$\begin{aligned} r_{i}:= & {} \frac{A}{\xi } \cdot \Biggl \{{w}_{i} + \alpha \biggl ( 1 - \frac{N_{i}(t)}{N_{v}}\biggr )\Biggr \}, \nonumber \\ \xi= & {} \sum _{i=1}^{M} \Biggl \{ {w}_{i} + \alpha \biggl ( 1 - \frac{N_{i}(t)}{N_{v}}\biggr ) \Biggr \}, \end{aligned}$$
(1)

where the parameter A determines the moving rate from the primary node to either of the secondary nodes and controls the outflow of agents among \(N_{v}\) agents from the primary to secondary nodes. By contrast, we set the return rate from the secondary nodes to the primary node as the l multiple of A; in other words, parameter l controls the inflow from the secondary nodes to the primary node. Parameter \(\alpha\) determines the effect of avoiding congestion on the decline in preference \(w_i\). Figure 1A shows the schematic of the target system from the viewpoint of decision-making; Each layer in the horizontal direction represents all possible choices in decision-making by an identical agent. For instance, the yellow \(L_j\)th layer has M secondary nodes that the jth agent can select. \(s_{i}\) indicates the primary node when \(i~=~0\), and otherwise indicates the ith secondary node. Figure 1B depicts all the possible states that an agent can take, corresponding to the yellow layer in Fig. 1A.

In the target system, all agents begin at the primary node. Each agent at the primary node stochastically selects one of the secondary nodes by referring to the preferences affected by the congestion rate of each secondary node. We examined the following two scenarios: (1) each agent can access the same secondary node repeatedly and (2) each agent can access each secondary node only once. We refer to the former scenario as Scenario-1 and the latter as Scenario-2. We investigated the uniformity of agent distribution in the stationary state in Scenario-1 and measured the travel time for all agents visiting all the nodes in Scenario-2. In Scenario-1, the average outflow of the number of agents moving from the primary to secondary nodes is controlled by parameter A. By contrast, in Scenario-2, the total probability is normalized among all the secondary nodes, and the agents leave out the probabilities of choosing the already visited secondary nodes. If an already visited node is selected in a trial, the system discards the trial; it can be considered a call-loss system in Scenario-2. All the agents are updated simultaneously in both scenarios. We introduce the following physical value U to evaluate usage in each node:

$$\begin{aligned} U:= & {} \frac{U_{f}}{N_{v}}, \end{aligned}$$
(2)

where \(U_{f}\) represents the number of agents in a secondary node after reaching a stationary state, and \(N_{v}\) denotes the total number of agents. In general, it is difficult to theoretically prove the existence of the stationary state in endogenous traffic networks. In this study, we measured U in a wide parameter range and confirmed that the system reaches the stationary state within the measured ranges; all simulation results in this study were measured after the system was confirmed to reach the stationary state. Regarding travel time in Scenario-2, we count the number of time steps required for all agents to complete visiting at all secondary nodes and refer to this as \(T_s\). Accordingly, we investigate the dependence of these two features U and \(T_{s}\) on the major parameters of the system. We then discuss these mechanisms from the perspective of multivariate statistics. The method for evaluating the uniformity of the distribution is described in Section "Evaluation of the uniformity of agent distribution".

Agent distribution

As depicted in Fig. 1B, each agent assumes one of the \(M+1\) possible states. The geometric feature of the star topology leads to the following rules: the state at which an agent exists on the primary node, \(S_0\), can be transitioned from one of the states between \(S_0\) and \(S_M\). In addition, each of the states between \(S_1\) and \(S_M\) only transitioned from state \(S_0\). Accordingly, the state at time step \(n+1\) is determined by the transitions from one of the \(M+1\) states at time step n; therefore, the state transition equations between time steps \(n+1\) and n can be described as \({\textbf{S}}^{(n+1)} = {\textbf{T}}{\textbf{S}}^{(n)}\), where \({\textbf{S}}^{(n)}\) is a state vector at time step n, represented by

$$\begin{aligned} {\textbf{S}}^{(n)}:= & {} \displaystyle \begin{bmatrix}S_0^n\\ S_1^n\\ S_2^n\\ \vdots \\ S_M^n\end{bmatrix}, \end{aligned}$$
(3)

where \(S^{n}_{x}\) (\(x=0,1,\cdots ,M\)) represents the state that an agent exists on the xth node at time step n; each element of \({\textbf{S}}^{(n)}\) corresponds to the state depicted in Fig. 1B. \({\textbf{T}}\) is a state transition matrix of size \(M+1 \times M+1\). The element \({e}_{ij}\) of \({\textbf{T}}\) in the ith column and the jth row is expressed as

$$\begin{aligned} e_{ij}= & {} {\left\{ \begin{array}{ll} ~1-A &{} (i~=~0~,~j~=~0), \\ ~lA &{} (i~=~0~,~1~\le ~j~\le ~M), \\ ~1-lA &{} (i~=~j~=k, ~1~\le ~k~\le ~M), \\ ~r_{i} &{} (1~\le ~i~\le ~M,~j~=~0), \\ ~0 &{} (otherwise). \end{array}\right. } \end{aligned}$$
(4)

The relationship \({\textbf{S}}^{(n+1)} \approx {\textbf{S}}^{(n)}\) was established after reaching a stationary state. The exact expression of the state vector \({\textbf{S}}^{(n)}\) is obtained by solving the state transition equations as follows:

$$\begin{aligned} {\textbf{S}}^{(n)}= & {} \displaystyle \begin{bmatrix}S_0^n\\ S_1^n\\ S_2^n\\ \vdots \\ S_M^n\end{bmatrix}~~=~~c/(l+1) \times \begin{bmatrix}1\\ r_{1}/(lA)\\ r_{2}/(lA)\\ \vdots \\ r_{M}/(lA)\end{bmatrix}, \end{aligned}$$
(5)

where c is the indeterminate parameter. Although \({\textbf{S}}^{(n)}\) is a state vector, we can regard it as a probability distribution by setting \(c=l\), which is derived from the normalization condition of \(|{\textbf{S}}^{(n)}|=1\). However, we can see the difficulty of solving this problem; In case \(\alpha \ne 0\), the system becomes endogenous; Namely, the number of agents at a stationary state is designated by the state \({\textbf{S}}^{(n)}\) whereas \({\textbf{S}}^{(n)}\) itself includes the number of agents \(N_{i}(t)\), as confirmed by the right-hand side of Eq. (1). The agent distribution at the steady state can be obtained because of this autonomous self-optimization. Let us elaborate on this point further. If the system is exogenous, it is possible to estimate the agent distribution using Eq. (1) and Eq. (5) after reaching the stationary state; however, our system is endogenous when \(\alpha \ne 0\). Namely, the resulting agent distribution is fed back into the input congestion information in the second term in the curly bracket of Eq. (1). In such cases, we must consider the case that the deviation \(\epsilon _c\) of the indeterminate parameter c exists as \(c = \acute{c} + \epsilon _c\) while \(\epsilon _c\) is sufficiently small that the relation \({\textbf{S}}^{(n+1)} \approx {\textbf{S}}^{(n)}\) still holds. Because the system is endogenous, \(\epsilon _c\) propagates and amplifies in the time direction, affecting the state of the system after a long period; eventually, \(\epsilon _c\) can cause the nonlinearity of the system. However, because \(\epsilon _c\) is unpredictable, it is impossible to estimate its effect theoretically. Therefore, performing numerical simulations is essential for investigating the system dynamics. In addition, the solution always includes an indeterminate parameter. In a wider sense, the target system can be said to be the Diophantine problem48. Meanwhile, in the case of \(\alpha =0\), the second term in the curly braces in Eq. (1) is always zero. Therefore, the system becomes exogenous, and we can obtain the exact solutions for Scenario-1 by setting c to l such that \({\textbf{S}}^{(n)}\) satisfies the normalization of \(|{\textbf{S}}^{(n)}|=1\).

Figure 2
figure 2

(a) Comparison of simulations with our theoretical model in Eq. (5) when setting \(\alpha = 0\) for different values of the parameter l between 0 and 3.0, (b) comparison of simulations at \(\alpha = 2.5\) for different values of l between 0 and 3.0 with the model of Eq. (5) at \(\alpha = 0\), and (c) comparison of simulations at \(l = 2.0\) for different values of \(\alpha\) between 0 and 3.0 with the model in Eq. (5) while keeping \(\alpha = 0\).

As a preliminary test for Scenario-1, we focused on a problem using \(M=3\). We set the preferences (\(w_{1}\), \(w_{2}\), \(w_{3}\)) to (1, 2, 3), where the subscript x of \({w}_{x}\) corresponds to the index of the secondary node. We set the parameters (\(N_v\), A) to (1,000, 0.1) in this test. We set the number of time steps for each simulation to 1,000, which was confirmed to be a sufficient number of time steps for reaching a stationary state in preliminary tests. Then, we measured the usage of each secondary node for different values of l and \(\alpha\). Figure 2a shows the comparison of simulations with our theoretical model in Eq. (5) when \(\alpha = 0\), for different values of the parameter l between 0 and 3.0. The solid lines and circle symbols in Fig. 2a respectively represent the simulations and model values, which agreed well in every measured range. As previously mentioned, we cannot obtain analytical solutions for all cases of \(\alpha \ne 0\) because the second term in the curly braces of Eq. (1) includes the number of agents \(N_{i}(t)\). Accordingly, we performed simulations for \(\alpha = 2.5\) as an example of \(\alpha \ne 0\) for different values of l between 0 and 3.0 and compared them with the model values of Eq. (5) with \(\alpha = 0\). The results are presented in Fig. 2(b) on a logarithmic scale, where the solid lines indicate the simulation results and the symbols represent the plots of the model in Eq. (5): Here, we made some interesting observations; as shown in Fig. 2b, the usage U of the first node was found to approach that of the second node, which had a neutral preference. In addition, U in the third node, which had the largest preference, approached the second node from the opposite direction. Namely, the usages of the three nodes were almost equal regardless of parameter l.

Similarly, we performed simulations for \(l=2.0\) for different values of \(\alpha\) between 0 and 3.0 and compared them with the model values of Eq. (5) with \(\alpha = 0\). The results are presented in Fig. 2c. As with the results in the l direction, the usage U of the first node was found to approach the second node, and that of the third node approached the second node from the opposite direction as \(\alpha\) increased. The increase in \(\alpha\) mitigated the imbalance and facilitated the equalization of usage U among nodes. To summarize the results of (b) and (c), we can say the following: even though the secondary nodes were set to have non-uniform preferences, the network behaved uniformly under specific conditions. In this paper, we refer to this observation as the “equalization effect”. In Section "Evaluation of the uniformity of agent distribution", we introduce a criterion to evaluate the uniformity of the agent distribution resulting from the equalization effect.

Evaluation of the uniformity of agent distribution

We define the imbalance of the system as the deviation of the state vector \({\textbf{S}}^{(n)}\) in a stationary state from an ideal state vector \({\textbf{S}}^{(n)}_h\), where each secondary node has the same number of agents. We directly obtain the state vector \({\textbf{S}}^{(n)}_h\) by replacing \(N_{i}(t)\) in Eq. (1) with \(N_{v}/M\). The total number of agents is equally divided by the number of secondary nodes, as follows:

$$\begin{aligned} {\textbf{S}}^{(n)}_h~~:= & {} ~~ \begin{bmatrix}S_{h, 0}^n\\ S_{h, 1}^n\\ S_{h, 2}^n\\ \vdots \\ S_{h, M}^n\end{bmatrix} = l/(l+1) \times \begin{bmatrix}1\\ \overline{\langle {r}_{1}\rangle }/(lA)\\ \overline{\langle {r}_{2}\rangle }/(lA)\\ \vdots \\ \overline{\langle {r}_{M}\rangle }/(lA)\end{bmatrix}, \nonumber \\ \overline{\langle {r}_{i}\rangle }:= & {} \frac{A}{\xi } \cdot \Biggl \{{w}_{i} + \alpha \biggl ( 1 - \frac{1}{M}\biggr )\Biggr \}, \nonumber \\ \xi= & {} \sum _{i=1}^{M} \Biggl \{ {w}_{i} + \alpha \biggl ( 1 - \frac{1}{M}\biggr ) \Biggr \}. \end{aligned}$$
(6)

By using Eq. (5) and Eq. (6), we define the imbalance ratio \(I_h\) as follows:

$$\begin{aligned} I_h := \frac{100}{M} \sum _{i=1}^{M} \frac{\sqrt{(S_{i}^{n} - S_{h, i}^{n})^2}}{S_{h, i}^n} , \end{aligned}$$
(7)

where \(S_{x}^{n}\) represents the xth element of the state vector \({\textbf{S}}^{(n)}\), and \(S_{h, x}^{n}\) indicates the xth element of an ideal state vector \({\textbf{S}}_{h}^{(n)}\). We introduced a constant value of 100 to express \(I_h\) as a percentage.

Travel efficiency on the network

As a preliminary test for Scenario-2, we investigated \(T_s\), the number of time steps required for all agents to complete visiting all secondary nodes for the system with \(M = 3\), similarly to Scenario-1. We set the parameters (\(N_v\), M, A, l, \(\alpha\)) to (1,000, 3, 0.1, 2.0, 2.5). Figure 3A shows the change in the usage U of each secondary node during \(T_{s}\). The blue chain lines indicate the results in the case of \(\alpha =0\), where each agent stochastically chooses its destination by referring only to fixed preferences. By contrast, the red solid lines represent the results when \(\alpha = 2.5\), where each agent stochastically chooses its destination by referring to the preferences affected by the congestion rate of each secondary node. The circular, square, and triangular symbols represent the first, second, and third nodes, respectively. Importantly, usage U tends to change moderately and is distributed equally among the three nodes in the case of \(\alpha = 2.5\) compared to the case of \(\alpha = 0\). This result indicates that congestion avoidance by agents can mitigate the imbalance of usage among secondary nodes in Scenario-2, facilitate the effective use of secondary nodes and improve the travel efficiency of the network. Figure 3B shows the dependence of \(T_s\) on the parameter \(\alpha\). We can confirm that \(T_{s}\) drastically decreases soon after \(\alpha\) becomes greater than zero, reaching a plateau at around \(\alpha > 1.5\).

In summary, it was confirmed from two preliminary tests for Scenario-1 and Scenario-2 that avoiding congestion can promote the equalization effect in both steady-state agent distribution and travel efficiency. Accordingly, in the following sections, we examine the equalization effect for general cases through simulations over a wider range of the number of secondary nodes, M, agents \(N_{v}\), and the parameter \(\alpha\).

Figure 3
figure 3

(A) Change of usage U of each node during the time taken for all agents to finish traveling to all the nodes, and (B) Dependence of the travel time for all agents visiting all the secondary nodes on the parameter \(\alpha\).

Figure 4
figure 4

Dependence of imbalance ratio \(I_h\) on the parameter M in different values of \(N_v\) and \(\alpha\) in Scenario-1.

Results

Simulation results for Scenario-1

Figure 4 shows the dependence of the imbalance ratio \(I_h\) on parameter M for different values of \(N_v\) and \(\alpha\) in Scenario-1. The dependence of \(I_{h}\) on M can be divided into three cases. The black line represents the case of uniform preferences with \(\alpha = 0\), where we set weight \(w_{i}\) to 1 for all i. We refer to the results of the black lines as (A). The blue line represents the case of weighted preferences with \(\alpha = 0\), where we set weight \(w_{i}\) to i for the ith secondary node. We refer to the results of the blue lines as (B). The red lines represent the cases of weighted preferences with \(\alpha \ne 0\), where we set weight \(w_{i}\) to i for the ith secondary node, as in case (B). We refer to the results of the red lines as (C). In the legend, (\(W_t\), \(\alpha\), \(N_v\)) represents the set of preference types, the parameter \(\alpha\), and the number of agents \(N_v\), where W indicates weighted preferences and U represents uniform preferences. Specifically, in case (A), we measured the dependence of \(I_{h}\) on parameter M for different values of \(N_{v}\) between 1, 000, 2, 000, 5, 000, 10, 000, and 30, 000 while keeping \(\alpha\) constant at zero. It was observed that \(I_{h}\) decreases as \(N_{v}\) increases, and we obtain four different black curves. In case (B), we set the same parameters as (A) for \(N_{v}=1,000\), except for the preferences mentioned above. In this case, \(I_h\) was observed to be greater than in the case of (A) with \(N_{v}=1,000\) in all areas of parameter M. In case (C), we set the same conditions as in case (B), except for \(\alpha \ne 0\). Consequently, \(I_h\) surges in a small area of \(\alpha\) and increases as its minimum value M increases. Thereafter, \(I_h\) converges to case (A) when \(N_{v}=1,000\), regardless of the degree of \(\alpha\), and the strength of the surges in \(I_h\) becomes larger as \(\alpha\) decreases. The reasons for the peculiar dependencies shown in Fig. 4 from the viewpoint of the multivariate statistics are discussed in Section "Discussion".

Figure 5
figure 5

Dependence of travel time steps \(T_s\) on parameters \(\alpha\), M, and \(N_v\) in Scenario-2.

Simulation results for Scenario-2

We measured the dependence of travel time steps \(T_{s}\) on parameters \(\alpha\), M, and \(N_{v}\) in Scenario-2. First, we set \(N_{v}\) to 1, 000 and then measured the dependence of \(T_{s}\) on parameters \(\alpha\), and M. Figure 5a shows the dependence of \(T_{s}\) on parameter \(\alpha\) for four different values of M from 3 to 49 on the semilogarithmic scale in the case of weighted preferences, where we set the weight \(w_{i}\) to i for the ith secondary node. As observed in the preliminary tests with \(M=3\), \(T_{s}\) decreases soon after \(\alpha\) becomes greater than zero and reaches a plateau in all cases of M. It was also observed that the difference in \(T_s\) between the cases of \(\alpha =0\) and \(\alpha \ne 0\) increases as M increases. Figure 5b shows the dependence of \(T_{s}\) on the parameter M in various \(\alpha\) ranges between 0 and 3.0 on the linear scale in the cases of both weighted and uniform preferences. Notably, \(T_{s}\) was observed to increase exponentially when the preferences were weighted with \(\alpha =0\). By contrast, \(T_s\) linearly increased to a similar degree when the preferences were otherwise weighted with \(\alpha \ne 0\) or uniform. An interesting point is that we obtain the same results as uniform preferences when \(\alpha \ne 0\), which is almost irrelevant to the degree of parameter \(\alpha\). In addition, Fig. 5c shows the dependence of \(T_{s}\) on the number of agents \(N_{v}\) for \(\alpha = 0\) and for \(\alpha = 2.5\), which is a representative case of \(\alpha \ne 0\). By fitting each case using a function of \(y=a\mathrm{log(x)}+b\) where a and b are constants, it was found that \(T_{s}\) increased more moderately when \(\alpha \ne 0\) compared to when \(\alpha = 0\); We confirmed from Fig. 5c that the stability of the network in the \(N_{v}\) direction also increased because of the equalization effect.

Discussion

Multivariate statistics for agent distribution on a star-topology

Multivariate statistics have been utilized in a wide range of physics, as exemplified by the singular value decomposition of quantum spectra fluctuation49, occupancy correlations in lattice gas model50, replica analysis51, and data analysis in experimental particle physics52. In this section, we describe the uncertainty of imbalance ratio \(I_{h}\) from the viewpoint of multivariate statistics. In the definition of the imbalance ratio \(I_{h}\) in Eq. (7), \(S_{h, i}^{n}\) is a deterministic value because \(S_{h, i}^{n}\) holds only constant parameters: preference \(w_{i}\), control parameter \(\alpha\), and number of secondary nodes M. By contrast, \(S_{i}^n\) is a stochastic variable because it holds \(N_{i}(t)\), as confirmed from Eq. (1) and Eq. (5); accordingly, \(I_h\) can be represented as a function of the set of stochastic variables \(S_{i}^{n}\) (\(i=1,2,\cdots ,M\)) as follows:

$$\begin{aligned} I_h(S_1, S_2, \cdots , S_i,\cdots , S_M), \end{aligned}$$
(8)

where \(S_{i}\) is an abbreviation for \(S_{i}^{n}\) and \(I_h\) is a mapping from the set of (\(S_{1}\), \(S_{2}\), \(\cdots\), \(S_M\)) from a mathematical perspective. According to the multivariate statistics, the propagation of uncertainty \(\sigma _{I_h}\) of \(I_h\) is described as follows:

$$\begin{aligned} \sigma _{I_h}= & {} \sqrt{D}, \end{aligned}$$
(9)
$$\begin{aligned} D= & {} \begin{pmatrix} \frac{\partial I_h}{\partial S_1} \\ \frac{\partial I_h}{\partial S_2} \\ \vdots \\ \frac{\partial I_h}{\partial S_M} \\ \end{pmatrix}^t \begin{pmatrix} {\sigma _{1}}^2&{} {\sigma _{12}} &{} {\sigma _{13}} &{} \cdots &{} {\sigma _{1M}} \\ {\sigma _{21}} &{} {\sigma _2}^2 &{} {\sigma _{23}} &{} \cdots &{} {\sigma _{2M}} \\ {\sigma _{31}} &{} {\sigma _{32}} &{} {\sigma _3}^2 &{} \cdots &{} {\sigma _{3M}} \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ {\sigma _{M1}} &{} {\sigma _{M2}} &{} {\sigma _{M3}} &{} \cdots &{} {\sigma _M}^2 \\ \end{pmatrix} \begin{pmatrix} \frac{\partial I_h}{\partial S_1} \\ \frac{\partial I_h}{\partial S_2} \\ \vdots \\ \frac{\partial I_h}{\partial S_M} \\ \end{pmatrix}, \end{aligned}$$
(10)

where \({\partial I_h}/{\partial S_i}\) represents the partial derivative of \(I_h\) with respect to variable \(S_i\). The center bracket represents the variance-covariance matrix of the system (hereafter referred to as \({\textbf{E}}\)). \({\sigma _{i}}^2\) represents the variance of \(I_h\) with respect to variable \(S_{i}\), and \(\sigma _{ij}\) indicates the covariance of \(I_h\) between variables \(S_{i}\) and \(S_{j}\). Equation (9) can be expressed in scalar form as

$$\begin{aligned} \sigma _{I_h} = \sqrt{\sum _{i=1}^{M} \biggl (\frac{\partial I_h}{\partial S_i}\biggr )^{2}\sigma _{i}^2 + \sum _{i \ne j}^{M} \biggl (\frac{\partial I_h}{\partial S_i}\biggr ) \biggl (\frac{\partial I_h}{\partial S_j}\biggr )~\sigma _{ij} }. \end{aligned}$$
(11)

The first and second terms inside the square root of Eq. (11) are the contributions of the diagonal and non-diagonal components of the variance-covariance matrix \({\textbf{E}}\), respectively. In matrix \({\textbf{E}}\), the covariance \(\sigma _{ij}\) becomes zero when the variables \(N_{i}\) and \(N_{j}\) are uncorrelated, and the second term inside the square root of Eq. (11) vanishes if all the variables are uncorrelated.

According to the theory of errors, the measured value \(I_{h}\) can be decomposed into the mean value \(\langle I_h \rangle\) and the uncertainty \(\sigma _{I_h}\) as \(I_{h}\) = \(\langle I_h \rangle + \sigma _{I_h}\). When \(\alpha =0\), \(S_{i}\) and \(S_{h, i}^n\) become equal since the second terms in \(r_i\) in Eq. (1) and \(\langle r_{i} \rangle\) in Eq. (6) vanish; from the definition of \(I_h\) in Eq. (7), the mean \(\langle I_{h} \rangle\) can be estimated as zero. Meanwhile, when \(\alpha \ne 0\), the number of agents in each secondary node is equalized after reaching a stationary state owing to the equalization effect. Accordingly, we can assume that the difference between \(S_{i}^{n}\) and \(S_{h, i}^n\) becomes sufficiently small to be negligible, and the relationship of \(S_{i}^{n} \approx S_{h, i}^n\) can be established for a sufficiently large \(\alpha\). In other words, \(\langle I_h \rangle\) can be approximately zero or a small number \(\epsilon\). In summary, we obtain the following relationship:

$$\begin{aligned} I_h = {\left\{ \begin{array}{ll} \sigma _{I_h}&{} (\alpha = 0)\\ \sigma _{I_h} + \epsilon &{} (\alpha \ne 0). \end{array}\right. } \end{aligned}$$
(12)

When \(\alpha =0\), each agent chooses its direction only by referencing fixed preferences. In this case, all resulting statistics of (\(N_1\), \(N_2\), \(\cdots\), \(N_M\)) in the secondary nodes are uncorrelated. Because of the linear transformation relationship, all stochastic variables (\(S_1\), \(S_2\), \(\cdots\), \(S_M\)) are also uncorrelated. Accordingly, the second term in the square root of Eq. (11) vanishes, the covariances in the non-diagonal components of \({\textbf{E}}\) are zero; we obtain the value of parameter \(\sigma _{I_h}\) only from the first term in Eq. (11). Because we can calculate \({\partial I_{h}}/{\partial S_i}\) by differentiating Eq. (7) with respect to \(S_{i}\), the remaining parameter that must be estimated is the deviation \(\sigma _{i}\). By contrast, when \(\alpha \ne 0\), we need to estimate \(\sigma _{ij}\) in addition to \(\sigma _{i}\) because of the emergence of the correlations among secondary nodes. Here, each agent selects the ith node from among the M secondary nodes with probability \(r_{i}\) and avoids the ith node with probability \(1 - r_{i}\) at each time step, as mentioned in Section "Model". Therefore, the system follows a Bernoulli trial, where the deviation is represented by \(\sigma = \sqrt{Np(1 - p)}\). N indicates the number of statistics and r represents the probability. The details of the derivations of \(\sigma _{i}\) and \(\sigma _{ij}\) in each case are given as follows.

In case (A), we can replace p with \(S_{h, i}^n\) because it represents the probability of finding an agent in the ith secondary node in a stationary state. In addition, the total number of statistics is proportional to \(N_{v} M\), which is the number of secondary nodes multiplied by the total number of agents in the target system. We introduce a constant parameter \(\sigma _{0}\). Consequently, an approximation for the uncertainty \(\sigma _{i}\) is represented as follows:

$$\begin{aligned} \sigma _{i} \approx \sigma _{0}\sqrt{N_v M {S_{h,i}^n} (1- {S_{h,i}^n})}. \end{aligned}$$
(13)

In cases (B) and (C), it is necessary to modify Eq. (13) owing to weighted preferences. The serial indices are set to the secondary nodes as linearly weighted preferences; the ith node has weight \(i/W_{s}\), where \(W_{s}\) is the sum of the numbers from 1 to M, which is \(M(M +1)/2\). When two variables X and Y have a linear relationship \(Y = aX + b\), their variances \(\sigma _{Y}^{2}\) and \(\sigma _{X}^{2}\) satisfy \(\sigma _{Y}^2 = a^{2}\sigma _{X}^2\). Therefore, the variance of the ith secondary node for weighted preferences differs from that for uniform preferences. We estimate the variance in the ith secondary node as follows: We consider a linear transformation of \(S_{h, i}^{n}\) from the state vector \(\overline{S_{h, i}^n}\) as \(a = (S_{h,i}^{n}-b) / \overline{S_{h, i}^{n}}\), where \(\overline{S_{h, i}^{n}}\) is a state vector with uniform preferences of \(\alpha =0\) obtained by setting \(w_i\) to 1 for all i, which is expressed as \(\overline{S_{h, i}^n} = 1/(l+1)M\). We modify Eq. (13) to be proportional to parameter a as follows:

$$\begin{aligned} \sigma _{i} \approx \mu M (S_{h,i}^{n}-b) \sqrt{N_v M {S_{h, i}^{n}} (1- {S_{h, i}^{n}})}, \end{aligned}$$
(14)

where we introduce a constant factor \(\mu\) in addition to the parameter b for a simple expression.

In case (C), it is necessary to calculate the covariance between the secondary nodes. In this case, agents choose their destinations by referring to the congestion rates of all secondary nodes, which suggests that the statistics in a secondary node depend on the congestion status of the other secondary nodes. In other words, the statistics of the different secondary nodes correlate with each other. Accordingly, it is necessary to calculate both the diagonal and non-diagonal components of the variance-covariance matrix \({\textbf{E}}\). We estimate the values of the covariance components of \({\textbf{E}}\) as follows: First, we recall that there is a general relationship between the two correlating stochastic variables \(S_i\) and \(S_j\) as \(\sigma _{ij}= \langle S_{i} S_{j} \rangle - \eta _{i}\eta _{j}\), where \(\sigma _{ij}\) is the covariance between \(S_i\) and \(S_j\), \(\eta _{i}\) and \(\eta _{j}\) represent the means of \(S_{i}\) and \(S_{j}\), and \(\langle S_{i} S_{j} \rangle\) is the mean of the multiples of \(S_{i}\) and \(S_{j}\). We evaluate \(\eta _{x}~(x=i,j)\) by its arithmetic mean, which can be obtained by summing up \(S_{h,i}^n\) for i from 1 to M and dividing it by M; the resulting \(\eta _{x}\) is \(1/(l+1)M\), which yields the relationship \(\sigma _{ij} = \langle S_{i} S_{j} \rangle - \{(l+1)M\}^{-2}\). Subsequently, because \(\langle S_{i} S_{j} \rangle\) is the expected value of \(S_{i}S_{j}\), it is evaluated by multiplying state \(S_{i}S_{j}\) by probability \(S_{h,i}^{n}S_{h,j}^{n}\). Here, \(S_{x}\) includes an indeterminate parameter c that characterizes the system as a Diophantine problem, as shown in Eq. (5) in Section "Agent distribution". We mentioned that \(c=l\) is required for normalization when we refer to \(S_x\) as the probability; however, the parameter c remains unspecified when referring to \(S_{x}\) as a system state. It is necessary to explicitly represent the parameter c when describing an arbitrary state \(S_x\) because parameter c may be a major contributor to the observed nonlinear phenomena. Because \(S_{h,i}^{n}\) is already normalized, as in Eq. (6), we can express \(S_{i}\) by \(S_{h,i}^{n}\) as \(S_{i} \approx {\bar{c}}S_{h, i}^{n}\), where \({\bar{c}}=c/l\). The mean \(\langle S_{i} S_{j} \rangle\) can be expressed as \(({\bar{c}}S_{h,i}^{n} S_{h,j}^{n})^2\). In this stage, the covariance can be represented as \(\sigma _{ij} = ({\bar{c}}S_{h,i}^{n} S_{h,j}^{n})^2 - \{(l+1)M\}^{-2}\).

Recall that the dependence of \(I_{h}\) on M in case (C) was observed to approach that in case (A) as parameter \(\alpha\) increased. Specifically, the surges of \(I_h\) in a small area of M were observed to decrease as \(\alpha\) increased, as shown in Fig. 4; agents were equally distributed among secondary nodes as the parameter \(\alpha\) increased. This corresponds to the fact that the agent distribution in a stationary state gets closer to the uniform distribution as \(\alpha\) increases. We can say that the correlations among different secondary nodes become negligible when agents are kept equally distributed among secondary nodes, compared to the case of a biased agent distribution. This is further explained as follows: when the amount of usage in different secondary nodes is equal among secondary nodes, the second terms in the bracket of the numerator in Eq. (1) for all secondary nodes have approximately equal values. Thus, the difference in their congestion rates hardly contributes to the agents’ decision-making to choose destinations. In brief, the covariance decreases as \(\alpha\) increases. To reflect this, we introduced a phenomenological scale parameter \(C_d\) that controls the intensity of \(\sigma _{ij}\). Consequently, we obtained the following expression for variance \(\sigma _{ij}\):

$$\begin{aligned} \sigma _{ij} \approx C_d\biggl \{ \bigl ({\bar{c}}~S_{h,i}^{n} S_{h,j}^{n}\bigr )^2 - \frac{1}{(l+1)^2M^2}\biggr \}. \end{aligned}$$
(15)

In summary, our theoretical model has scale parameters of \(\sigma _{0}\) for case (A), \(\mu\) for case (B), and (\({\bar{c}}\), \(C_d\), \(\mu\)) for case (C). We determined these scale parameters by fitting the experimental data because the scale of the system state is indeterminate, as represented by parameter c in Eq. (5). Specifically, in case (A), we first determined \(\sigma _{0}\) by fitting the case of \(N_v=1,000\) using least squares and used the same value when plotting model value of \(I_{h}\) in other cases of \(N_v=2,000\), \(N_v=5,000\), \(N_v=10,000\), and \(N_v=30,000\). In case (B), we determined \(\mu\) in a manner similar to that in case (A). In case (C), we searched for the optimal condition of (\({\bar{c}}\), \(C_d\), \(\mu\)) that reproduces the measurements shown in Fig. 4. In addition, we investigated the dependence of the phenomenological scale parameter \(C_d\) on parameter \(\alpha\).

Figure 6
figure 6

(a) Comparison of our models with the simulation results in cases of (A), (B), and (C) in Fig. 4 in Scenario-1, (b) dependence of parameter \(C_{d}\) on parameter \(\alpha\), (c) comparison of our models with the simulation results in two cases of uniform preferences and the weighted preferences with \(\alpha =0\) in Scenario-2, and (d) dependence of parameter \(C_s\) on parameter \(\alpha\).

Figure 6a shows the comparisons of our theoretical models from Eq. (13) to Eq. (15) with the measurements shown in Fig. 4. Each solid line corresponds to the result with the same color and symbol in Fig. 4. The theoretical models show good agreement with measurements in all three cases from (A) to (C). The resulting \(\sigma _{0}\) obtained by fitting the measurement in the case of (A) with \(N_{v} = 1,000\) was \(1.316 \times 10^{2}\) in this test. The other plots for various \(N_{v}\) in case (A) were obtained using the determined \(\sigma _{0}\). As a result of fittings, the optimal values of \(\mu\) for case (B) and \(({\bar{c}}, \mu )\) for case (C) were obtained as \(3.072\times 10^{-4}\) and (\(1.0 \times 10^{3}\), \(2.258 \times 10^{-4}\)), respectively. Note that the parameter \(\epsilon\) in Eq. (12) was confirmed to be zero.

The model in case (B) calculates only the diagonal components of the variance-covariance matrix \({\textbf{E}}\). On the other hand, the model in case (C) calculates the non-diagonal components of \({\textbf{E}}\) as well as the diagonal components using a generic statistical relationship: \(\sigma _{ij} = \langle S_{i} S_{j} \rangle - \eta _{i}\eta _{j}\) since it is expected that the statistics in different secondary nodes correlate from each other as a result of agents mutually referring to the congestion rate of the secondary nodes. In this respect, we can confirm from Fig. 6a that the contributions from the non-diagonal components of \({\textbf{E}}\) reproduce the characteristic surges of \(I_{h}\) in small areas of the parameter M and converge to the case of uniform preferences. This suggests that mutually referring to the congestion information primarily causes the surges of \(I_h\), that is, the deterioration of uniformity. In other words, referencing congestion information can make the uniformity worse rather than better if the degree of reference to the congestion information is insufficient in small systems.

As mentioned in the Introduction section, the network becomes endogenous when the system provides agents with congestion information to balance agents among nodes because the resulting agent distribution feeds back into the input congestion rates; hence, our findings can be helpful when controlling such endogenous traffic networks that provide agents with congestion information as traffic information in real-world cases. In addition, Fig. 6b shows a plot of the values of \(C_{d}\) for different values of \(\alpha\) in case (C), where \(D=1.3 \times 10^{-6}\). It was found that parameter \(C_{d}\) is approximately proportional to the reciprocal of the square of parameter \(\alpha\). We can confirm the following from Fig. 6b. When agents avoid congestion linearly to the congestion rate with the scale of \(\alpha\), the non-diagonal components of the variance-covariance matrix varies approximately depending on the inverse square of the parameter \(\alpha\). This finding also serves as a guide for applying the results to crowd management at event venues. Consider a situation where visitors are dispersed across several local areas within an event venue. Event managers are typically expected to ensure that visitors are equally distributed across local areas to mitigate the risk of infection or accidents caused by dense crowds. However, determining how forcefully visitors need to be guided is difficult, since the extent to which visitors will obediently follow a guide is unknown. Our results indicate that if visitors respond linearly to congestion information in proportion to parameter \(\alpha\), the uniformity of visitor distribution improves as \(\alpha\) increases. This effect can be attributed to the fact that the degree of mutual reference, i.e., the potential source of the surge in imbalance, decreases approximately depending on the inverse square of parameter \(\alpha\). This knowledge can be used as an indicator for event managers to successfully distribute visitors by comprehending how strongly visitors respond to the information provided to them.

Consequently, our theoretical model accurately describes the mechanism of the target system; our analysis corroborated that the balance between the equalization of network usage by avoiding congestion and the amplification of covariance caused by a mutual reference to congestion information determines the overall uniformity of a network with star topology.

Figure 7
figure 7

(a) Dependence of \(\Sigma _1\) and \(\Sigma _2\) on parameter M, and (b) dependence of the total time steps \(T_s^h\) required for hopping from the primary to secondary nodes on parameter M when weighted preferences in respective cases of \(\alpha = 0\) and \(\alpha \ne 0\).

Traveling time in Scenario-2

The most straightforward way to reproduce the travel time steps \(T_{s}\) in Scenario-2 is to assume that the times for moving from the primary to the secondary node and returning from the secondary node are proportional to the reciprocals of the hopping and return probabilities. For example, when uniform preferences with \(\alpha =0\), the hopping probability is given by 1/M for each secondary node, and the returning probability is given by lA, as explained in Section "Model". Hence, the travel time step required to visit a cell, \(t_s\), can be described as \(t_{s} = \eta M^{-1} + \zeta {lA}^{-1}\), where \(\eta\) and \(\zeta\) are the coefficients. By summing \(t_s\) from 1 to M, we can obtain the expression of \(T_{s}\) as follows:

$$\begin{aligned} T_s = \eta + \zeta \frac{M}{lA} . \end{aligned}$$
(16)

In the case of weighted preferences, we can calculate the travel time steps \(T_{s}\) similarly for uniform preferences after replacing 1/M with the inverse of the probability \(S_{h, i}^n\). Specifically,

$$\begin{aligned} T_s = \eta \sum _{i=1}^{M} \frac{1}{S_{h,i}^{n}} + \zeta \frac{M}{lA}. \end{aligned}$$
(17)

When \(\alpha =0\), Eq. (17) can be further broken down as

$$\begin{aligned} T_s= & {} \eta \frac{(l+1)M(M+1)}{2} \sum _{i=1}^{M} \frac{1}{i} + \zeta \frac{M}{lA} \nonumber \\\approx & {} \eta \frac{(l+1)M(M+1)}{2} \biggl (\textrm{ln} M + \gamma + \frac{1}{2M} \biggr ) + \zeta \frac{M}{lA}, \end{aligned}$$
(18)

where we use the mathematical relationship of \(\sum _{i=1}^{k} 1/i \approx \textrm{ln} k + \gamma + 1/2k\); \(\gamma\) represents the Euler–Mascheroni constant, which nearly equals 0.57721566453.

In Fig. 6c, the solid red line with the star symbol and the solid blue line with the circle symbol, respectively, indicate the measurements of uniform preferences and weighted preferences with \(\alpha =0\). The dashed lines represent the fitting lines obtained using Eq. (17) and Eq. (18). We confirmed that our model describes the variation in the traveling time step in Scenario-2. Figure 6d shows the dependence of the ratio of \(\eta\) or \(\zeta\) on the sum of the parameters \(\alpha\) in the case of the weighted preferences obtained by fitting the measurements by our model in Eq. (18) for all cases of \(\alpha \ne 0\), where \(C_{d}\) represents \(\eta /(\eta +\zeta )\) or \(\zeta /(\eta +\zeta )\). It was confirmed that the relative ratio of \(\zeta\), which is the coefficient of the return time, increases, and the relative ratio of \(\eta\), which is the coefficient of the hopping time from primary to secondary, decreases; the time required for the outward trip was confirmed to decrease as a result of agents avoiding congestion due to the increase in parameter \(\alpha\).

In addition, we can show the exponentiality and linearity of time steps \(T_s\) in the respective cases of weighted preferences with \(\alpha =0\) and \(\alpha \ne 0\) from a different angle as follows: As mentioned in Section "Model", the total probability is normalized for all the secondary nodes, and agents lose the probabilities of choosing the already-visited secondary nodes; if an already-visited node is selected in a trial, the system discards the trial. Therefore, the probability of hopping to any of the secondary nodes at the time of kth visiting, \(P_{u}\), is given as a complement of the sum of the probabilities of hopping to one of the secondary nodes that have been already visited before the kth visiting, \(P_{a}\), that is, \(P_{u}+P_{a}=1\), where the subscripts u and a indicate the “unvisited” and “already-visited” nodes, respectively. When an agent avoids congestion owing to \(\alpha \ne 0\), the probability of hopping to one of the secondary nodes becomes equalized and can be approximately expressed as 1/M; \(P_a\) at the time of the kth visit \((k=1,2\cdots , M)\) is given as k/M, and \(P_u\) is expressed as \((1-k/M)\). Here, we assume that the average time step required to hop from the primary node to one of the secondary nodes at the kth visit, \(t_s^{k}\), is proportional to the inverse of the value of \(P_u\) at the kth visit; \(t_s^{k}\) can be expressed as \(\eta (1-k/M)^{-1}\). Accordingly, the total time steps required for hopping from the primary node to the secondary nodes, \(T_{s}^{h}\), can be obtained by summing up \(t_s^{k}\) for k; A simple calculation leads to:

$$\begin{aligned} {T_{s}^{h} = \eta M \sum _{k=1}^{M}\frac{1}{k}. } \end{aligned}$$
(19)

Here, the summation in Eq. (19) can be approximated as \(\sum _{k=1}^{M} 1/k \approx \textrm{ln} M + \gamma + 1/2M\), similar to Eq. (18). We refer to the summation in Eq. (19) as \(\Sigma _{1}\). As we can observe from Euler’s approximation, \(\Sigma _{1}\) shows a moderate dependence on the parameter M, as shown by the blue line in Fig. 7a. Consequently, the component \(\eta M\) of Eq. (19) is confirmed to be dominant in the M-dependence of \(T_s^h\), as indicated by the blue line with the circle symbol in Fig. 7b. Accordingly, the linearity of \(T_{s}\) is confirmed for weighted preferences with \(\alpha \ne 0\).

Meanwhile, when \(\alpha = 0\), agents move to one of the secondary nodes only according to the fixed preference i for the ith secondary node. Because the secondary node with a larger weight is preferentially visited, \(P_a\) at the time of the kth visit \((k=1,2\cdots , M)\) can be expressed as \(\sum _{j=1}^{k}(M-j+1)/W\) as a typical case, where W is the sum of the preference i from 1 to M, which is \(M(M+1)/2\). In this case, \(P_u\) is expressed as \(1-\sum _{j=1}^{k}(M-j+1)/W\). We assume that the average time steps \(t_{s}^k\) is proportional to the inverse of the value of \(P_u\) at the kth visit, similar to Eq. (19). Then, \(T_s^h\) is calculated by summing \(t_s^k\) for k; the resulting expression of \(T_{s}^{h}\) is obtained as follows:

$$\begin{aligned} {T_{s}^{h} = \eta M(M+1)\sum _{i=0}^{M-1}\frac{1}{M(M+1)-i(2M-i+1)}. } \end{aligned}$$
(20)

Here, we refer to the summation of Eq. (20) as \(\Sigma _{2}\): The dashed red line in Fig. 7(a) shows the dependence of \(\Sigma _{2}\) on the parameter M, which becomes sluggish at approximately \(M=25\), and then converges to one. Consequently, component \(\eta M(M+1)\) in Eq. (20) is confirmed to be dominant in the M-dependence of \(T_{s}^h\), as indicated by the red line with the square symbol in Fig. 7b. Accordingly, the exponentiality of \(T_{s}\) is confirmed for weighted preferences, with \(\alpha = 0\). Notably, we describe all the cases observed in Fig. 5b using the theoretical models represented by Eq. (16) into Eq. (20): Uniform preferences with \(\alpha =0\), weighted preferences with \(\alpha =0\), and weighted preferences with \(\alpha \ne 0\).

A star topology is often used as a conceptual decision-making model by individuals with multiple choices. An example of the application of Scenario-2 is a health examination. In this case, the agents represent patients and M secondary nodes represent examination booths corresponding to M different examination items. Additionally, \(w_{i}\) represents the priority of the ith examination item. When \(\alpha = 0\), the patient tries to visit the examination booths in a predetermined order. Whereas, when \(\alpha \ne 0\), the patient will preferentially select and move to the vacant booths. Our results indicate that by considering congestion information and allowing patients to be preferentially selected for examination in vacant examination sites, we can linearize the time it takes for everyone to complete their examination. On the other hand, from another perspective, the relationship between \(P_u\) and \(P_a\) is similar to that between the fatigue from visiting congested nodes and the motivation to leave the primary node; \(P_a\) increases by approximately 1/M every time returning to the primary node when the preferences were weighted with \(\alpha \ne 0\) or uniform. By contrast, \(P_a\) is \(\sum _{j=1}^{k} (M-j + 1)/W\) at the time of the kth visit when weighted preferences with \(\alpha =0\) because every agent tends to visit the nodes with higher preferences. Namely, the nodes with higher preferences get more congested. If we assume that the fatigue from visiting a node is proportional to the degree of congestion in the node, \(P_a\) represents accumulated fatigue due to visiting congested nodes. Our results suggest that fatigue moderately increases when \(\alpha \ne 0\) because agents can avoid congestion; however, it drastically builds up when \(\alpha = 0\) because they face congestion. The discussion here suggests that our results can be applied to problems in a variety of fields by interpreting the physical meaning of parameters \(P_u\) and \(P_a\) from different angles.

Although this paper focuses on basic star topologies, our findings contribute to the understanding of complex networks. Generally, several complex networks can be decomposed into multiple superimposed or connected star topologies. Consider an undirected weighted random graph with \(M+1\) nodes. Connecting the ith node to another stochastically selected node from the remaining M nodes to generate an edge is equivalent to creating a combination of primary and secondary nodes in our system. Therefore, in the case where weights are assigned to nodes, in a manner similar to that of Scenario-1 of this study, the nonlinearity shown in Fig. 4 might be observed in the random graph at an equilibrium state. Previously, the relationship between the preferences of agents and the topology of networks has been studied in statistical physics involving complex networks54; our study can contribute not only to a single-star topology but also to complex networks.

Conclusion

The importance of fundamental research on network topologies has been widely acknowledged in many scientific areas. This study examined the effect of congestion avoidance of agents given congestion information on optimizing traffic in a star network topology. We investigated the dynamics of a stochastic transportation network in which each agent at the primary node stochastically selects one of the secondary nodes by referring to the declining preferences based on the congestion rate of the secondary nodes. We examined the following two scenarios: each agent can repeatedly access the same secondary node, or each agent can access each secondary node only once. We refer to the former scenario as Scenario-1 and the latter as Scenario-2. We measured the uniformity of agent distribution in the stationary state in Scenario-1, and we measured the travel time for all agents visiting all the nodes in Scenario-2. The findings of this study are summarized as follows.

In Scenario-1, the uniformity of agent distribution was found to show three types of nonlinear dependences on the increase of nodes. We found that multivariate statistics describe these characteristic dependences well, revealing the existence of the optimal number of secondary nodes that makes the agent distribution most uniform. Our theoretical analysis corroborates that the balance between the equalization of network usage by avoiding congestion and the amplification of the covariance caused by mutual reference to congestion information determines the overall uniformity of the star network topology. This further indicates the following: Referencing congestion information can make the uniformity of networks worse rather than better if the degree of reference to the congestion information is insufficient in small systems; this finding can be helpful when controlling the endogenous traffic networks that provide agents with congestion information as traffic information in real-world cases. In addition, our analysis shows that if agents linearly respond to congestion information in proportion to a scale parameter, the uniformity of agent distribution improves as the parameter increases because the degree of mutual reference, i.e., the potential source of the surge in imbalance, decreases approximately proportional to the inverse square of the parameter. This knowledge can be used as an indicator for event managers to successfully distribute visitors among local areas in the event venue by looking at how strongly visitors respond to the information provided.

In Scenario-2, we discovered that congestion avoidance linearizes the travel time irrespective of the degree of reference to the congestion information. In contrast, the travel time increases exponentially with the number of secondary nodes when not referring to congestion information at all. Our theoretical models clearly explain the linearity and exponentiality in their respective cases. Using the case of a health examination as an example for Scenario 2, we demonstrated that allowing patients to be preferentially selected for examination in vacant examination sites can linearize the time it takes for everyone to complete their examination. In future work, the physical parameters will be interpreted from different angles in both scenarios to develop further applications. Consequently, we successfully described the optimization effect of congestion-avoiding behavior on the collective dynamics of agents in star topologies.