## Abstract

We study a simple model for social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm that changes to a bad one with a certain probability. Each agent stochastically selects one of two methods, random search (individual learning) or copying information from other agents (social learning), with which he/she seeks the good arm. The fitness of an agent is the probability that he/she knows the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. It is shown that the fitness of an agent with ESS is superior to that of an asocial learner when the success probability of social learning is greater than a threshold determined from the probability of success of individual learning, the probability of change of state of the rMAB, and the number of agents. The ESS Nash equilibrium is a solution to Rogers’ paradox.

## Introduction

One of the differences between human beings and other animals is that the former transfer their predecessors’ experience and wisdom in the form of knowledge^{1}. Social learning, that is, learning from the experience of others, is advantageous compared to individual learning^{2,3,4}. Without social learning, everybody would have to learn everything for themselves^{2}. In other words, individual learning costs more than social learning does^{2,3,4}. Therefore, Rogers’ finding that social learning is not necessarily more advantageous than individual learning is counterintuitive^{5}. This is now called Rogers’ paradox.

Rogers’ conclusion seems very strange in light of our experience^{4}. Several attempts have been made to solve Rogers’ paradox in social learning. Boyd and Richerson^{2} pointed out that Rogers’ paradox is *not* a paradox when the only benefit of social learning is to avoid learning costs. Further, on analysing two models where social learning reduces individual-learning costs and improves the information obtained through the latter, they concluded that social learning can be adaptive. Enquist *et al*.^{3} advocated a learning form called critical social learning, which is social learning supplemented by individual learning. They discussed using rate equations and succeeded in solving the paradox. Rendell *et al*.^{4} studied the relative merits of several learning strategies by using a spatially explicit stochastic model.

The concept of adaptive information filtering^{3, 6} has been proposed as key to the effective working of social learning. It indicates that each member effectively learns good-quality information provided by other members. For example, in a famous tournament by Rendell *et al*.^{6}, the strategy discountmachine, which performed the most effective social learning, won over the other strategies that combined individual learning and social learning.

In this study, we propose a stochastic model to solve Rogers’ paradox in the framework of a restless multiarmed bandit (rMAB) used in that tournament. The objective of this study is to analyse equilibrium social learning in an rMAB. An rMAB is analogous to the “one-armed bandit” slot machine but with multiple “arms”, each with a distinct payoff. We call an arm with a high payoff a good arm. The term “restless” means that the payoffs change randomly. Agents maximise their payoffs by exploiting an arm, searching for a good arm at random (individual learning), or copying an arm exploited by other agents (social learning). Because the rMAB is simple in structure and general, we believe that it is an appropriate framework in which to consider Rogers’ paradox.

As a model for social-learning collectives, Bolton and Harris studied an agent system in a multi-armed bandit^{7}. They assumed that the agents could know all information of other agents and obtained a socially optimal experiment (learning) strategy. In the present study, we consider the bounded rationality of agents, who can access the results of their respective choices only. In addition, we assume that the environment (i.e., the rMAB) changes randomly. We obtain the socially optimal and equilibrium learning strategies.

## Model

We make the model as simple as possible and incorporate the property of adaptive filtering of information into it. A mathematical overview of the model is given in the Methods section.

The rMAB has only one good arm and infinitely many bad arms. There are *N* agents labeled by *n* = 1, …, *N*. In each turn, an agent (say, agent *n*) is randomly chosen. He/she exploits his/her arm and obtains payoff 1 if he/she knows a good arm. If he/she does not know a good arm, he/she randomly searches for it (individual learning) with probability 1 − *r*_{n}, or copies the information of other agents’ good arms (social learning) with probability *r*_{n}. In the random search, the probability that he/she successfully finds a good arm is denoted as *q*_{I}. On the other hand, we assume^{8} that the copy process succeeds with probability *q*_{O} if there is at least one agent who knows a good arm, and fails if no agent knows a good arm. Then, with probability *q*_{C}/*N*, the good arm changes to a bad one and another good arm appears. If a good arm changes to a bad one, the agents who knew the arm are forced to forget it and to know a bad one. See Fig. 1. The difference with our previous model^{8} is that there are *M* good arms in the previous model, whereas in the present model there is only one good arm.
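The update rule described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' code: the function name `simulate_fitness`, the burn-in length, and the choice to record the payoff at the start of the chosen agent's turn are all assumptions made for illustration.

```python
import random

def simulate_fitness(rs, q_c, q_i, q_o, turns=200_000, burn_in=20_000, seed=0):
    """Estimate each agent's long-run average payoff (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(rs)
    knows = [False] * n      # sigma_n: does agent n know the good arm?
    payoff = [0] * n         # payoff accumulated after the burn-in period
    visits = [0] * n         # number of recorded turns per agent
    for t in range(turns):
        k = rng.randrange(n)             # one agent is chosen at random
        if t >= burn_in:
            visits[k] += 1
            payoff[k] += knows[k]        # payoff 1 only if the good arm is known
        if not knows[k]:
            if rng.random() < rs[k]:
                # social learning: possible only if some agent knows the good arm
                if any(knows) and rng.random() < q_o:
                    knows[k] = True
            elif rng.random() < q_i:
                # individual learning: random search succeeds with probability q_I
                knows[k] = True
        if rng.random() < q_c / n:
            # the good arm turns bad; every agent who knew it forgets
            knows = [False] * n
    return [p / v if v else 0.0 for p, v in zip(payoff, visits)]

# Illustrative parameter values only (not taken from the paper).
estimates = simulate_fitness([0.3] * 10, q_c=0.1, q_i=0.2, q_o=0.5, turns=50_000, burn_in=5_000)
print(estimates)
```

In this sketch, if every agent is given *r* = 1 and nobody knows the good arm initially, no agent can ever find it, so the estimated payoff is exactly zero for all agents.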

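Let *σ*_{n} be a random variable defined by

$${\sigma}_{n}=\begin{cases}1 & \text{if agent } n \text{ knows the good arm},\\ 0 & \text{otherwise}.\end{cases}$$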

This is simply the payoff for agent *n*. For each turn *t*, we have a joint probability function *P*(*σ*_{1}, …, *σ*_{N}|*t*), which evolves in *t* according to the aforementioned rule. To exclude trivial results, we assume^{8} that *q*_{C}, *q*_{I} and *q*_{O} are positive and that the *r*_{n}s are less than 1. Then, in the long run, we have the unique steady probability function $P({\sigma}_{1},\cdots ,{\sigma}_{N})={\mathrm{lim}}_{t\to \infty}P({\sigma}_{1},\cdots ,{\sigma}_{N}|t)$. Now, we shall introduce the expected payoff for each agent in the steady state,

$${w}_{n}=\sum_{{\sigma}_{1},\dots ,{\sigma}_{N}}{\sigma}_{n}\,P({\sigma}_{1},\cdots ,{\sigma}_{N}).$$

This quantity depends on the parameters *N*, *q*_{C}, *q*_{I}, *q*_{O}, and the *r*_{n}s. We regard *w*_{n} mainly as a function of the *r*_{n}s. We denote this function by *w*(*r*_{n}, ${\overline{r}}_{n}$), where

$${\overline{r}}_{n}=\frac{1}{N-1}\sum_{k\ne n}{r}_{k}.$$

Thus, we have *w*_{n} = *w*(*r*_{n}, ${\overline{r}}_{n}$) for each *n* = 1, …, *N*.
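As a small illustration of the second argument of *w*, the sketch below computes the mean social-learning probability of the other agents for every agent at once. It assumes that the barred quantity is the average over the agents other than *n*, consistent with the relation (*s* − *r*_{n})/(*N* − 1) used in the Methods section; the function name `mean_of_others` is hypothetical.

```python
def mean_of_others(rs):
    """Return [r_bar_1, ..., r_bar_N], where r_bar_n averages r_k over k != n."""
    n = len(rs)
    if n < 2:
        raise ValueError("need at least two agents")
    s = sum(rs)                          # s = r_1 + ... + r_N
    return [(s - r) / (n - 1) for r in rs]

# Three agents with illustrative social-learning probabilities.
r_bars = mean_of_others([0.0, 0.3, 0.6])
print(r_bars)
```

When all agents share the same probability, each agent's own value and the mean of the others coincide, which is the situation on the diagonal of the strategy space.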

In this study, we treat *w*_{n} as the fitness for agent *n*.

## Results and Discussion

### Pure Strategies and Rogers’ Paradox

In the present study, the strategy of agent *n* refers to the social learning probability, *r*_{n}. We call *r*_{n} = 0, 1 pure strategies and 0 < *r*_{n} < 1 mixed strategies.

First, we confirm that Rogers’ paradox occurs when agents adopt pure strategies. We shall divide *N* agents into two groups. The first group consists of *N*_{I} individual learners (*r*_{k} = 0, *k* = 1, …, *N*_{I}). The second group consists of *N*_{S} = *N* − *N*_{I} social learners (*r*_{k} = 1, *k* = *N*_{I} + 1, …, *N*_{I} + *N*_{S}). The corresponding fitnesses per agent, which we denote respectively as *w*_{I} and *w*_{S}, are given by

where *a* is defined in equation (4).

When *q*_{O} ≤ *q*_{I}, we have *w*_{I} > *w*_{S}. Therefore, in this case, individual learning is always favourable over social learning.

Now, we consider the *q*_{O} > *q*_{I} case. Figure 2 is the plot of *w*_{I} and *w*_{S} for sufficiently large *N*.

When the proportion of social learners is small, social learning is effective. However, as the proportion of social learners increases, *w*_{S} monotonically decreases and tends to zero. Thus, Rogers’ paradox occurs.

It is important to note that *w*_{I} < *w*_{S} is true when *N*_{I}/*N* is finite, with a sufficiently large *N*. This is because, as *N* → ∞, we have *w*_{I} → *q*_{I}/(*q*_{C} + *q*_{I}) and *w*_{S} → *q*_{O}/(*q*_{C} + *q*_{O}).
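The two large-*N* limits quoted above share the form *q*/(*q*_{C} + *q*), which increases with *q*, so the social-learner limit exceeds the individual-learner limit whenever the copy success probability is the larger one. A short numerical check, with parameter values chosen purely for illustration and a hypothetical helper name `limiting_fitness`:

```python
def limiting_fitness(q_learn, q_c):
    """Large-N limit q / (q_C + q) for a learner whose success probability is q."""
    return q_learn / (q_c + q_learn)

# Illustrative values only, chosen so that q_O > q_I.
q_c, q_i, q_o = 0.1, 0.2, 0.5
w_i_limit = limiting_fitness(q_i, q_c)   # limit for individual learners
w_s_limit = limiting_fitness(q_o, q_c)   # limit for social learners
print(round(w_i_limit, 4), round(w_s_limit, 4))   # prints 0.6667 0.8333
```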

### Nash Equilibrium and Rogers’ Paradox

Let us assume that each agent adopts a mixed strategy, that is, for each *n* = 1, …, *N*, the social-learning probability, *r*_{n}, is an arbitrary number between 0 and 1. This means that agent *n* performs social learning with probability *r*_{n} and individual learning with probability 1 − *r*_{n}. The learning mode that he/she chooses would be decided stochastically and automatically.

We consider the *N*-tuple, (*r*_{1}, …, *r*_{N}), of the social-learning probabilities. This is a point in the *N*-dimensional unit cube *J* = [0, 1] × … × [0, 1]. *J* is regarded as the space of *N*-tuples of mixed strategies. For each point in *J*, a joint probability function *P*(*σ*_{1}, …, *σ*_{N}) is determined and an *N*-tuple, (*w*_{1}, …, *w*_{N}), of the fitness functions of the agents is calculated.

Now, imagine that agent *n* maximises *w*_{n} by adjusting *r*_{n} for fixed *r*_{k}s (*k* ≠ *n*). It is not difficult to show that the maximum point is unique (Fig. 3) and expressed as

where

We note that *f*(*r*) → 0 as *q*_{O} → *q*_{I} + 0. Next, we introduce the function,

This is a continuous function mapping from the *N*-dimensional unit cube *J* into itself. As shown in the Methods section, the fixed point of *F* is unique and is on the diagonal line of *J*,

where *r*_{Nash} is a function of *q*_{C}, *q*_{I}, *q*_{O} and *N*. The value of *r*_{Nash} is explicitly given by

where

The entity *r*_{Nash} has the following properties (see the Methods section): (i) 0 ≤ *r*_{Nash} < 1, (ii) *r*_{Nash} → 0 as (*q*_{O} − *q*_{I})*N* − (*a* + *q*_{O}) → 0, and (iii) the fixed point (*r*_{Nash}, …, *r*_{Nash}) is the unique Nash equilibrium point in *J*. Figure 4 is a schematic explanation of the Nash equilibrium point.

Moreover, the corresponding mixed strategy is an evolutionarily stable strategy (ESS)^{9} because the fixed point is a Nash equilibrium point in the strong sense,

Further, it is an ESS in the sense of Thomas^{10}, because the inequality,

is true.

Now, we consider the two fitness functions, *w*_{I} and *w*_{N} = *w*(*r*_{Nash}, *r*_{Nash}). As shown in the Methods section, the inequality *w*_{N} > *w*_{I} holds if and only if (*q*_{O} − *q*_{I})*N* > *a* + *q*_{O}. See also Fig. 5. The Nash equilibrium point is usually regarded as a stable point in the sense that no agent has an incentive to change his/her strategy. Therefore, this inequality shows that the mixed strategy *r*_{n} = *r*_{Nash} (*n* = 1, …, *N*) can outperform the pure strategy of individual learning. This solves Rogers’ paradox. We note that the Nash equilibrium point is realised as a mixed strategy of social learning and individual learning.

### Pareto Optimality

Pareto optimality is an important concept alongside Nash equilibrium. Thus, we consider Pareto optimality in our model. We shall adopt a *natural* definition of the Pareto-optimal point in *J* as the maximum point of the function, ${\sum}_{k=1}^{N}{w}_{k}$. We can show that the maximum point is unique and is on the diagonal line of *J*,

where *r*_{Pareto} is a function of *q*_{C}, *q*_{I}, *q*_{O}, and *N*. The value of *r*_{Pareto} is explicitly given by

where

Further, *r*_{Pareto} has the following properties: (i) 0 ≤ *r*_{Pareto} < 1, (ii) *r*_{Pareto} → 0 as (*q*_{O} − *q*_{I})*N* − (*a* + *q*_{O}) → 0, (iii) *r*_{Pareto} < *r*_{Nash} if and only if (*q*_{O} − *q*_{I})*N* > *a* + *q*_{O} (see the Methods section), and (iv) the point (*r*_{Pareto}, …, *r*_{Pareto}) is the Pareto-optimal point in *J*. Here, by Pareto optimality, we mean that the following statement is true: “if an agent succeeds in increasing his/her fitness by changing his/her social-learning probability from *r*_{Pareto} to *r*_{Pareto} + *δr* with *δr* ≠ 0, then another agent’s fitness certainly decreases”. Such a *δr* exists when *r*_{Pareto} > 0 and no *δr* exists when *r*_{Pareto} = 0. The statement is correct in both cases.

We define the Pareto fitness function, *w*_{P} = *w*(*r*_{Pareto}, *r*_{Pareto}). Then, we have the inequality *w*_{P} > *w*_{N} if and only if (*q*_{O} − *q*_{I})*N* > *a* + *q*_{O} (see Fig. 5 and the Methods section). This is trivial by the definition of the Pareto-optimal point. Thus, we have established the relation among fitness functions,

## Concluding Remarks

We have proposed a stochastic model of *N* agents and an rMAB. The unique Nash equilibrium point in the mixed strategy space *J* has been presented and shown to be an ESS in the sense of Thomas^{10}. The corresponding fitness *w*_{N} per agent is greater than the fitness *w*_{I} for an individual learner, provided that (*q*_{O} − *q*_{I})*N* > *a* + *q*_{O}. This solves Rogers’ paradox.

In this study, we concentrated on steady states. This is valid if the system relaxes quickly to the steady state (see the Methods section). However, if the *r*_{n}s change faster than the relaxation to the steady state, non-trivial dynamics are introduced. It may be possible that our system has dynamics for which the Nash equilibrium point is stable.

As a future research subject, we propose an experimental study of human collectives in an rMAB. There have been several attempts in this direction^{11,12,13}, whose target has been the improvement of performance by social learning, that is, the collective intelligence effect. Since we have shown that there is an ESS Nash equilibrium in the social-learning agent system in an rMAB, it is interesting to experimentally examine whether the prediction is realised. As a first step, the interactive rMAB game might be a suitable environment where one human competes with many other mixed-strategy agents with *r* = *r*_{Nash}. We can check whether the social-learning rate of people is the same as *r*_{Nash}. Second, when many people compete, the Nash equilibrium emerges as the model parameter *q*_{I} changes. Meanwhile, we might be able to detect some phase-transitive behaviour^{8}.

As for theoretical research, the stage of our analysis is far from mature. In the present work, we have studied the game of rMAB in the steady state of the system. However, when the relaxation time of the system discussed in the Methods section is not small enough, the assumption of steadiness is unrealistic in the laboratory experiment. Thus, we need to develop a *t*-dependent theory. It might be a difficult problem. We believe that the research direction is fruitful.

## Methods

### Mathematical Overview of the Model

For simplicity we use the following notation,

$$\vec{\sigma}=({\sigma}_{1},\dots ,{\sigma}_{N}).$$

Our model develops in *t* according to an agent action and the subsequent state change of the rMAB. This is a Markov process^{14}. The probability of change $\vec{\sigma}\to {\vec{\sigma}}^{\prime}$ is described by the transition probability matrix^{14},

where

The joint probability function $P(\vec{\sigma}|t)=P({\sigma}_{1},\dots ,{\sigma}_{N}|t)$ satisfies the Chapman-Kolmogorov equation^{14},

Our assumption is that *q*_{C}, *q*_{I}, *q*_{O} > 0 and *r*_{n} < 1 (*n* = 1, …, *N*). In this case, the matrix *T* is shown to be irreducible and primitive^{15}. Then, the Perron-Frobenius theory^{15} ensures that (i) *λ*_{1} = 1 is an eigenvalue of *T* of multiplicity 1 and the steady probability function $P(\vec{\sigma})$ is a corresponding eigenvector, (ii) the set {|*λ*_{i}|}_{i≥2} of absolute values of eigenvalues of *T* other than *λ*_{1} has an upper bound *ρ* < 1. When the *r*_{n}s are fixed, we have a time-homogeneous Markov process^{14}, that is, the matrix *T* does not depend on *t*. Therefore, for any initial probability function $P(\vec{\sigma}|0)$, we have the unique limit $P(\vec{\sigma})={\mathrm{lim}}_{t\to \infty}P(\vec{\sigma}|t)$. Then, it is not difficult to derive equation (3) using $P(\vec{\sigma})$.

The convergence $P(\vec{\sigma}|t)\to P(\vec{\sigma})$ is exponential, $|P(\vec{\sigma}|t)-P(\vec{\sigma})|\sim {\rho}^{t}$. Writing ${\rho}^{t}={e}^{-t/\tau}$, this means that the relaxation time is $\tau =-1/\mathrm{log}\,\rho$, which is positive because *ρ* < 1. Thus, when no agent changes his/her social learning probability over a much longer period than *τ*, the fitness per agent per turn is almost exactly equal to the value of the function *w* in equation (3).
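The convergence argument can be illustrated on the smallest possible example. The sketch below is not the model's 2^{N}-state transition matrix; it is a generic two-state chain with hypothetical per-turn probabilities `p_up` (state 0 to state 1) and `p_down` (state 1 to state 0), for which the second eigenvalue is 1 − `p_up` − `p_down`. It shows power iteration reaching the steady probability and the corresponding relaxation time.

```python
import math

def stationary_two_state(p_up, p_down, steps=10_000, tol=1e-14):
    """Power iteration for the steady probability of state 1 in a two-state chain."""
    prob_one = 0.0                       # start in state 0 with certainty
    for _ in range(steps):
        nxt = prob_one * (1.0 - p_down) + (1.0 - prob_one) * p_up
        if abs(nxt - prob_one) < tol:    # stop once the update no longer moves
            return nxt
        prob_one = nxt
    return prob_one

def relaxation_time(p_up, p_down):
    """tau = -1 / log(rho), with rho = |1 - p_up - p_down| assumed to lie in (0, 1)."""
    rho = abs(1.0 - p_up - p_down)
    return -1.0 / math.log(rho)

p_up, p_down = 0.2, 0.1                  # hypothetical values for illustration
steady = stationary_two_state(p_up, p_down)
tau = relaxation_time(p_up, p_down)
print(steady, tau)
```

For this toy chain the steady probability equals `p_up / (p_up + p_down)` regardless of the starting state, mirroring the uniqueness of the limit stated above.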

### Existence of a Fixed Point of *F*

Since the *N*-dimensional cube *J* = [0, 1] × … × [0, 1] is a compact, convex set and *F* is a continuous function mapping from *J* into itself, Brouwer’s fixed-point theorem^{16} guarantees that there exists a fixed point of *F* in *J*.

### A Fixed Point of *F* is a Nash Equilibrium Point, and Vice Versa

Let (*r*_{1}, …, *r*_{N}) be a fixed point of *F*, that is, *r*_{n} = *f*(${\overline{r}}_{n}$) for each *n* = 1, …, *N*. Since *r* = *f*(${\overline{r}}_{n}$) is the unique maximal point of *w*(*r*, ${\overline{r}}_{n}$), we have *w*(*r*_{n} + *δr*, ${\overline{r}}_{n}$) < *w*(*r*_{n}, ${\overline{r}}_{n}$) for each *n* = 1, …, *N* when *δr* ≠ 0. Thus, (*r*_{1}, …, *r*_{N}) is a Nash equilibrium point. Conversely, let (*r*_{1}, …, *r*_{N}) be a Nash equilibrium point, that is, *r* = *r*_{n} is a maximal point of *w*(*r*, ${\overline{r}}_{n}$) for each *n* = 1, …, *N*. Since *r* = *f*(${\overline{r}}_{n}$) is the unique maximal point of *w*(*r*, ${\overline{r}}_{n}$) (see Fig. 3), we have *r*_{n} = *f*(${\overline{r}}_{n}$). Thus, (*r*_{1}, …, *r*_{N}) is a fixed point of *F*.

### Uniqueness of the Fixed Point of *F*

When *q*_{O} ≤ *q*_{I}, we have the unique fixed point (0, …, 0).

Next, we consider the *q*_{O} > *q*_{I} case.

Let (*r*_{1}, …, *r*_{N}) be a fixed point of *F*. Since ${\overline{r}}_{n}$ = (*s* − *r*_{n})/(*N* − 1) with $s={\sum}_{k=1}^{N}{r}_{k}$, all the *r*_{n}s satisfy the common relation,

Figure 6(b) is a plot of the function *g*(*r*).

This is a strictly increasing concave function for *s* − (*N* − 1)*r*^{*} ≤ *r* ≤ *s* − (*N* − 1)*r*_{*}, where

It is not difficult to show that *r*_{*} < *r*^{*} < 1. The maximum value of the derivative *g*′(*r*) is 1/2, which is realised at *r* = *s* − (*N* − 1)*r*^{*}. Thus, $\tilde{g}(r)=r-g(r)$ is a strictly increasing function such that $\tilde{g}(0)\le 0\le \tilde{g}(1)$. Therefore, there is only one zero, *r*_{0}, of the function $\tilde{g}(r)$ in the interval 0 ≤ *r* ≤ 1. Then, we conclude that *r*_{1} = … = *r*_{N} = *r*_{0}.

Now we have *s* = *Nr*_{0}. Therefore, *r*_{0} is a solution of the equation,

Figure 6(a) is a plot of the function *f*(*r*). The function *f*(*r*) is a decreasing function. Thus, *h*(*r*) = *r* − *f*(*r*) is a strictly increasing function such that *h*(0) ≤ 0 ≤ *h*(1). Therefore, the function *h*(*r*) possesses only one zero, *r*_{Nash}, such that 0 ≤ *r*_{Nash} < 1. Thus, we have *r*_{0} = *r*_{Nash}. This proves the uniqueness of the Nash equilibrium point.
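Because *h*(*r*) = *r* − *f*(*r*) is strictly increasing with *h*(0) ≤ 0 ≤ *h*(1), its single zero can be located by bisection. The following sketch illustrates that step only: the response function `example_f` is a hypothetical decreasing function chosen for the demonstration and is not the paper's *f*, and the helper name `unique_fixed_point` is likewise an assumption.

```python
def unique_fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on h(r) = r - f(r), assuming f is continuous and non-increasing on [0, 1]."""
    def h(r):
        return r - f(r)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid         # the zero lies to the right of mid
        else:
            hi = mid         # the zero lies at or to the left of mid
    return 0.5 * (lo + hi)

def example_f(r):
    """Hypothetical decreasing response function, NOT the paper's f."""
    return 0.5 * (1.0 - r)

r_star = unique_fixed_point(example_f)
print(r_star)                # close to 1/3, the solution of r = 0.5 * (1 - r)
```

The same routine would apply to any decreasing response function mapping [0, 1] into itself, since monotonicity of *h* is all that the bisection relies on.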

### Inequality *w*_{P} > *w*_{N} > *w*_{I}

It is sufficient to consider the (*q*_{O} − *q*_{I})*N* > *a* + *q*_{O} case. Then, *r*_{Nash} satisfies $r=\overline{f}(r)$. We introduce the following function,

It is not difficult to check that 1/(1 − *r*_{Nash}) is the larger root of *k*(*u*). We note that *k*(1) < 0.

Next, we define *r*_{I} as

This entity has the following properties: (i) 0 < *r*_{I} < 1, (ii) *w*(*r*_{I}, *r*_{I}) = *w*_{I}, and (iii) *k*(1/(1 − *r*_{I})) > 0. On the other hand, it is elementary to show that *k*(1/(1 − *r*_{Pareto})) < 0. Thus, we conclude that *r*_{Pareto} < *r*_{Nash} < *r*_{I}.

Now, *r*_{Pareto} is the maximal point of *w*(*r*, *r*). Therefore, we have the inequality *w*(*r*_{Pareto}, *r*_{Pareto}) > *w*(*r*_{Nash}, *r*_{Nash}) > *w*(*r*_{I}, *r*_{I}), that is, *w*_{P} > *w*_{N} > *w*_{I}.

## Additional Information

**Publisher's note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Boyd, R. & Richerson, P. J. *Culture and the Evolutionary Process* (University of Chicago Press, Chicago, 1985).
2. Boyd, R. & Richerson, P. J. Why does culture increase human adaptability? *Ethol. Sociobiol.* **16**, 125–143, doi:10.1016/0162-3095(94)00073-G (1995).
3. Enquist, M., Eriksson, K. & Ghirlanda, S. Critical social learning: A solution to Rogers’s paradox of nonadaptive culture. *Am. Anthropol.* **109**, 727–734, doi:10.1525/aa.2007.109.4.727 (2007).
4. Rendell, L., Fogarty, L. & Laland, K. N. Rogers’ paradox recast and resolved: Population structure and the evolution of social learning strategies. *Evolution* **64**, 534–548, doi:10.1111/j.1558-5646.2009.00817.x (2010).
5. Rogers, A. R. Does biology constrain culture? *Am. Anthropol.* **90**, 819–831, doi:10.1525/aa.1988.90.4.02a00030 (1988).
6. Rendell, L. *et al*. Why copy others? Insights from the social learning strategies tournament. *Science* **328**, 208–213, doi:10.1126/science.1184719 (2010).
7. Bolton, P. & Harris, C. Strategic experimentation. *Econometrica* **67**, 349–374, doi:10.1111/1468-0262.00022 (1999).
8. Mori, S., Nakayama, K. & Hisakado, M. Phase transition of social learning collectives and the echo chamber. *Phys. Rev. E* **94**, 052301, doi:10.1103/PhysRevE.94.052301 (2016).
9. Maynard-Smith, J. *Evolution and the Theory of Games* (Cambridge University Press, Cambridge, 1982).
10. Thomas, B. Evolutionary stability: states and strategies. *Theor. Popul. Biol.* **26**, 49–67 (1984).
11. Toyokawa, W., Kim, H. & Kameda, T. Human collective intelligence under dual exploration-exploitation dilemmas. *PloS One* **9**, e95789, doi:10.1371/journal.pone.0095789 (2014).
12. Kameda, T. & Nakanishi, D. Cost-benefit analysis of social/cultural learning in a nonstationary uncertain environment: An evolutionary simulation and an experiment with human subjects. *Evol. Hum. Behav.* **23**, 373–393, doi:10.1016/S1090-5138(02)00101-0 (2002).
13. Yoshida, S., Hisakado, M. & Mori, S. Interactive restless multi-armed bandit game and swarm intelligence effect. *New Generat. Comput.* **34**, 291–306, doi:10.1007/s00354-016-0306-y (2016).
14. Stroock, D. W. *An Introduction to Markov Processes* (Springer-Verlag, Heidelberg, 2014).
15. Meyer, C. D. *Matrix Analysis and Linear Algebra* (SIAM, 2000).
16. Granas, A. & Dugundji, J. *Fixed Point Theory* (Springer-Verlag, New York, 2003).

## Acknowledgements

We would like to thank Editage (www.editage.jp) for English language editing. This work was supported by JSPS KAKENHI Grant Number 17K00347.

## Author information

### Affiliations

#### Department of Mathematics, Faculty of Science, Shinshu University, Asahi 3-1-1, Matsumoto, Nagano, 390-8621, Japan

- Kazuaki Nakayama

#### Fintech Lab. LLC Meguro, Tokyo, 153-0051, Japan

- Masato Hisakado

#### Department of Physics, Faculty of Science, Kitasato University, Kitasato 1-15-1, Sagamihara, Kanagawa, 252-0373, Japan

- Shintaro Mori

### Contributions

S.M. and M.H. conceived the model. K.N. performed a theoretical analysis. All authors contributed to analysing and interpreting the results and to writing the manuscript.

### Competing Interests

The authors declare that they have no competing interests.

### Corresponding author

Correspondence to Kazuaki Nakayama.

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.