Structural transition in the collective behavior of cognitive agents

Living organisms process information to interact and adapt to their surroundings with the goal of finding food, mating, or averting hazards. The structure of their environment has profound repercussions, both by selecting their internal architecture and by inducing adaptive responses to environmental cues and stimuli. Adaptive collective behavior underpinned by specialized optimization strategies is ubiquitous in the natural world. We develop a minimal model of agents that explore their environment by means of sampling trajectories. The spatial information stored in the sampling trajectories is our minimal definition of a cognitive map. We find that, as cognitive agents build and update their internal, cognitive representation of the causal structure of their environment, complex patterns emerge in the system, where the onset of pattern formation relates to the spatial overlap of cognitive maps. Exchange of information among the agents leads to an order-disorder transition. As a result of the spontaneous breaking of translational symmetry, a Goldstone mode emerges, which points at a collective mechanism of information transfer among cognitive organisms. These findings may be generally applicable to the design of decentralized, artificial-intelligence swarm systems.

additional moves in response. The player, in other words, will sample hypothetical sequences of steps (i.e. trajectories) in the abstract space of moves within the chessboard.
The player's strategy can be cast into a simple, general form: the maximization of future options to move without losing the king. The number of moves the player is able to contemplate ahead is a direct numerical measure of her cognitive competence. In the present study, we adopt a straightforward generalization of this measure by defining cognitive competence as the ability of an agent to determine the number of possible moves within a given environment. This ability depends upon the agent's cognitive map, and we may assume that, similarly to the chess player, the agent will seek to maximize its number of future options to move.
We will now describe a mechanistic implementation of the above ideas which has the twofold advantage of (i) making our ideas concrete, and (ii) providing a connection to the field of active matter 10,11 . We consider a set of freely mobile, identical, spherical particles with diameter σ. Their only 'cognitive' activity is to explore the surrounding space. This exploration is performed via hypothetical random walks of a certain length, starting from the agent's current position; by evaluating the shape of such walks, agents gain knowledge about the location of other particles or confining boundaries. More precisely, each agent performs a fixed number N ω of such walks and evaluates how elongated the hypothetical trajectory of each walk is, thus allowing the inference of the location of confining objects, where the trajectory is likely to be more compact, and of empty areas, where the trajectory can be more elongated and spacious. As a measure for quantifying the configuration of each hypothetical trajectory we use the radius of gyration (see Methods). Clearly, the larger the cognitive competence of the agent (i.e. the longer these hypothetical walks can be made), the larger the cognitive map of the environment will be.
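The exploration step described above can be sketched numerically. In the following minimal sketch, the function names `sample_walk` and `radius_of_gyration` are our illustrative choices, and the walks are free Gaussian walks with no reflection off obstacles:

```python
import numpy as np

def sample_walk(r0, n_steps, step=1.0, rng=None):
    """Hypothetical 2D random walk of n_steps starting at r0."""
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(scale=step, size=(n_steps, 2))
    return r0 + np.cumsum(steps, axis=0)

def radius_of_gyration(traj):
    """R_g of a trajectory: rms distance of its points from their centroid."""
    centroid = traj.mean(axis=0)
    return np.sqrt(((traj - centroid) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
r0 = np.zeros(2)
# A walk reflected early by an obstacle stays compact; a free walk spreads out,
# so R_g discriminates confined regions from open ones.
rg = [radius_of_gyration(sample_walk(r0, 200, rng=rng)) for _ in range(100)]
print(np.mean(rg))
```

An agent would compare such R_g values across its N ω sampled walks to infer where space is open and where it is confined.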
A formal analogy between random walks and ideal polymers 12,13 can help us understand what to expect. If the random walks were really executed, with no crossings allowed, the resulting interaction would be analogous to the repulsion experienced between star-polymer molecules with N ω chains. The mutual interaction of polymer chains is known to be characterized by isotropically repulsive entropic forces 14 . We will demonstrate below that by replacing the real random walks with hypothetical, sampling walks, hence introducing cognitive maps into the system, the collective behavior changes dramatically.
For each agent, we consider a set of random, hypothetical sampling trajectories, {Γ τ (t)}, each of total duration τ, that the agent may traverse to explore its environment. Following earlier work in which these hypothetical walks were introduced 15 , we explicitly indicate the dependence on time t because the cognitive map is dynamically updated as information is acquired. Starting from its initial position, r 0 , the region probed by the agent, and thus the size of its cognitive map, is directly proportional to τ.
We can build a probabilistic description of this cognitive map by considering the probability density function P(Γ τ (t)|r 0 ) associated with an ensemble of trajectories all starting from r 0 . Our probabilistic description should however represent mathematically the information acquired in building a cognitive map. According to Shannon 16,17 , −PlnP is the most general functional form that obeys the constraints of continuity, nonnegativity, and additivity of information. It is then natural to express the information content stored in the cognitive map as

S(r, t) = −k B ∫ DΓ τ (t) P(Γ τ (t)|r) ln P(Γ τ (t)|r),    (1)

which is a path integral over the hypothetical trajectories of an agent at position r, building up the cognitive map {Γ τ (t)} (see Methods and refs. 15,18 ), and where k B is Boltzmann's constant, giving S dimensions of entropy. The central assumption of the present work shall be that intelligent agents tend to maximize the information content S stored in their cognitive maps. This assumption, together with Eq. (1), immediately implies that a cognitive agent tends to maximize the diversity of possible future trajectories. As a cognitive agent acquires information about the external environment, which is by nature of the process limited and partial, the most unbiased decision the agent can make is the one corresponding to the maximum of entropy, because it uses all the available information without any additional assumptions. Mathematically, it assigns a positive weight to every situation which is not excluded by the given information 19,20 .
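A coarse-grained numerical stand-in for this path entropy can be obtained by binning the endpoints of sampled walks and evaluating −Σ p ln p (a plug-in estimate; the grid size and the choice of endpoints as the trajectory statistic are our simplifications, and k B is set to 1):

```python
import numpy as np

def path_entropy_estimate(endpoints, bins=10, extent=20.0):
    """Plug-in estimate of -sum p ln p over a coarse-grained trajectory statistic.

    Bins walk endpoints on a grid; an agent near open space spreads its
    endpoints over more bins and therefore scores a higher entropy.
    """
    H, _, _ = np.histogram2d(endpoints[:, 0], endpoints[:, 1],
                             bins=bins, range=[[-extent, extent]] * 2)
    p = H.ravel() / H.sum()
    p = p[p > 0]                      # 0 ln 0 -> 0 by continuity
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(1)
free = rng.normal(scale=5.0, size=(2000, 2))   # endpoints of unconfined walks
walled = np.abs(free)                          # folded into one quadrant ("walls")
print(path_entropy_estimate(free), path_entropy_estimate(walled))
```

Confinement folds the endpoint distribution into fewer bins, so the walled case scores a lower entropy, consistent with the idea that open space maximizes the diversity of future trajectories.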
Maximization of S can then be represented by a force acting on the agent of the form

F(r; τ) = θ ∇ r S(r, τ),    (2)

where the coupling parameter θ (with dimensions of temperature) quantifies the cognitive competence of the agent, that is, how strongly the agent responds to the environment (see Methods). In order to maximize the information content of the cognitive map, an agent will move by following the gradient of S.
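Such an entropic force can be estimated numerically as a finite-difference gradient of a sampled entropy. In the sketch below, `entropy_at` and `cognitive_force` are our illustrative helpers: the walks are free Gaussian walks reflected by a single wall at x = wall_x, and common random numbers are used to stabilize the difference:

```python
import numpy as np

def entropy_at(r0, wall_x, n_walks=1000, n_steps=50, rng=None):
    """Endpoint entropy for walks launched at r0 and reflected at x = wall_x."""
    rng = np.random.default_rng() if rng is None else rng
    ends = r0 + rng.normal(size=(n_walks, n_steps, 2)).sum(axis=1)
    ends[:, 0] = np.where(ends[:, 0] > wall_x, 2 * wall_x - ends[:, 0], ends[:, 0])
    H, _, _ = np.histogram2d(ends[:, 0], ends[:, 1], bins=12,
                             range=[[r0[0] - 25, r0[0] + 25]] * 2)
    p = H.ravel() / H.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def cognitive_force(r0, theta, wall_x, h=1.0, seed=0):
    """F = theta * grad S, via central differences with common random numbers."""
    f = np.empty(2)
    for k in range(2):
        dp, dm = np.zeros(2), np.zeros(2)
        dp[k], dm[k] = h, -h
        sp = entropy_at(r0 + dp, wall_x, rng=np.random.default_rng(seed))
        sm = entropy_at(r0 + dm, wall_x, rng=np.random.default_rng(seed))
        f[k] = theta * (sp - sm) / (2 * h)
    return f

F = cognitive_force(np.array([8.0, 0.0]), theta=1.0, wall_x=10.0)
print(F)  # the x-component should push the agent away from the wall at x = 10
```

Near the wall, more of the endpoint distribution is folded back and the entropy drops, so the gradient, and hence the force, points into open space.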
An intuitive understanding of the principle stated above can be gained by considering again the predator-prey system discussed above. The prey will choose a path which maximizes its future available options (and hence its survival probability). More formally, the approach presented here is indebted to a number of contributions in complex-systems theory. The well-informed reader will recognize echoes of Kauffman's hypothesized 'fourth law of thermodynamics' 21 , stating that autonomous agents maximize the average secular construction of diversity 21 , e.g. organisms tend to increase the diversity of their organization. In the context of information-processing systems, related approaches have been expressed by Linsker 22 with his "Infomax" principle, which was used to demonstrate the emergence of structure in models of neural architecture [23][24][25] , and by Ay et al. 26 with the maximization of predictive information (a relation between future states and past ones); related ideas appear in biological infotaxis 2 , sensorimotor systems 27 , and control theory 15 . Our approach is based on the idea that cognitive systems entail some mechanism of prediction 28,29 . We therefore consider a finite duration τ of the hypothetical trajectories.
The motivation for our approach is that an optimal information-processing dynamics should indicate the level of competence of the agent to respond to complex stresses and stimuli. To name only a few examples where maximization of information or entropy has been found empirically and might constitute a fundamental mechanism: maximization of information has been measured as a characteristic of human cognition 30,31 ; a pair-wise maximum entropy model accurately describes resting-state human brain activity 32 ; patients with ADHD exhibit reduced signal entropy as compared with healthy individuals 33 .

Figure 1 shows a schematic of a few agents moving on a two-dimensional space, while the vertical dimension represents time. Agents interact with each other and with the environment. Each agent explores the available configuration space and acquires information about its structure; in so doing it builds its cognitive map, and optimizes its behavior by responding to its surroundings. The hypothetical trajectories, exploring the available space, have an envelope characterized by a spatial extension λ and a temporal extension τ. In the overlap regions of the forward cones the agents have a probability to collide. This possibility gives rise to the effective force F(r; τ). The overlap regions and the corresponding effective forces appear when the distance between any two agents becomes shorter than the average linear length of the agents' hypothetical trajectories, which relates to the size of the cognitive map.
Our definition of information entropy  satisfies the following criteria. First,  is based on the information content of the system because agents retrieve and process information about the presence of other agents. Second, it does not require any specific goal or strategy, such as rules for taxis of bacteria in chemo-attractant concentration fields. Third, it obeys the laws of information theory and information processing, essential to build cognitive maps. Fourth, it obeys causality because the current state of the cognitive map influences the agent's future dynamics.

Results
We carried out simulations of N identical agents in a two-dimensional, continuous system of size L×L, where agents interact with each other via the cognitive force F and via hard-core repulsion when their distance is less than the agent's diameter σ. The agents' configurations evolve continuously from a random initial distribution towards steady-state configurations for different sizes λ of the cognitive map. We define the size of a cognitive map as the average distance between start and end of the hypothetical sampling trajectories,

λ ≡ (1/N Ω ) Σ ω |r ω (τ) − r ω (0)|,

where N Ω is the total number of hypothetical sampling trajectories. Figure 2 shows the steady-state configurations of the system as the size λ of the map increases. At low values of the cognitive map size λ with respect to the inter-agent separation, most agents are isolated and randomly distributed throughout the system (Fig. 2a). The agents try to stay as far apart as possible from each other in an attempt to maximize their available space. For a horizon of linear size λ, the available space scales as λ 2 . As λ increases, we observe the spontaneous formation of short linear chains of agents (Fig. 2b). At λ = 5.6σ, the chains grow longer and outline a labyrinthine pattern in the system (Fig. 2c). The emergence of this spatial organization can be understood as a way to increase the available configuration space in the direction normal to the chain-like structures. Consider chains of typical length ℓ. Once chains are formed, the space available to their horizons scales approximately as λℓ. Thus, the ratio of available space between chains and the disordered configuration scales approximately as ℓ/λ. The agents can therefore increase their available space by increasing ℓ, forming long chains.
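The map size λ, defined as the mean start-to-end distance over the sampled walks, can be computed directly. A minimal sketch, assuming free Gaussian walks with unit step variance per axis (the function name `map_size` is ours):

```python
import numpy as np

def map_size(r0, n_walks, n_steps, rng):
    """lambda: mean start-to-end distance over N_Omega hypothetical walks."""
    ends = r0 + rng.normal(size=(n_walks, n_steps, 2)).sum(axis=1)
    return np.linalg.norm(ends - r0, axis=1).mean()

rng = np.random.default_rng(2)
lam = map_size(np.zeros(2), n_walks=4000, n_steps=100, rng=rng)
# For a free 2D walk of n unit-variance steps, the mean end-to-end distance is
# sqrt(pi/2 * n), about 12.5 for n = 100, so lambda grows as sqrt(tau).
print(lam)
```

This diffusive sqrt(τ) scaling is what makes τ (the agent's planning horizon) control λ, and hence the onset of map overlap.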
Upon further increase of λ, the pattern continuously turns into a cellular structure (Fig. 2d,e), which we find well developed at λ = 10.8σ (Fig. 2f). Consider now the (entropically) advantageous strategy of agents to form chains (e.g. for λ = 5.6σ). With increasing λ, agents tend to arrange themselves so as to keep larger distances from nearby chains of agents. Due to the fixed filling fraction and size of the system, this leads to chains connecting at joints and ultimately to cellular structures of roughly hexagonal symmetry, which provides the optimal tiling of the plane.
A similar sequence of patterns can be observed when we vary the filling fraction, φ ≡ Nπσ 2 /(4L 2 ). The phase diagram of the system is shown in Fig. 3a. The transition line from short chains to more complex patterns is well fitted by a relation φ ∝ λ −2 , which suggests that the transition is triggered as the mean inter-agent distance becomes comparable to the cognitive map size λ.
In order to analyze the complex morphology of the patterns, we employ the anisotropy parameter α, defined in terms of the eigenvalues β 1 and β 2 of the Minkowski tensor 34,35 , which is a measure of the anisotropy of configurations of particles (see Methods for more details). Figure 3a shows as a heat map the association of the phase diagram with the anisotropy α of the configurational patterns. At fixed filling fraction φ, the system exhibits the largest anisotropy α when linear chains start to connect with each other, for intermediate values of the size λ of the cognitive map. In contrast, at low λ, where agents are isolated, the system is trivially isotropic. At large values of λ, where cognitive maps significantly overlap and cellular patterns emerge (Fig. 2f), the associated anisotropy decreases to values that are however larger than in the case of low λ. This indicates that the system regains isotropy on the larger scale of the cells. Figure 3b shows the anisotropy α of the pattern. It exhibits a sharp maximum at λ ≈ 5.5σ, where the linear chains are most pronounced and the system is at the threshold of forming the labyrinthine patterns. The transition from isolated particles to cellular structures appears to be continuous, as no structural or dynamical observable shows discontinuous behavior. The transition occurs when α is considerably larger than zero, that is, for λ ≳ 2σ.

Our results so far illustrate that as the cognitive maps of the agents overlap, interesting collective behavior emerges. A moment's reflection will show that agents perceive each other's presence via their respective cognitive maps, and thus information about each other's presence must be exchanged; this information flow, in turn, dynamically modifies the cognitive maps of the agents. Quantifying the information flow among agents will instruct us on the origin of the collective behavior.
This information flow can be quantified via the notion of mutual information [36][37][38][39][40][41][42] . The mutual information for two random variables a and b is given by

I(a; b) = Σ a,b P(a, b) ln [P(a, b)/(P(a)P(b))],

where P(a) represents the probability distribution of a, and P(a, b) is the joint probability distribution. Because we are interested in isolating the causal interaction between agents that underpins the update of the cognitive maps, we consider the positions (x i (t), y i (t)) of the i-th agent at time t and compute the pairwise term I ij = I(x i ; x j ) + I(y i ; y j ). The total mutual information is then

I = (1/N p ) Σ 〈i,j〉 I ij ,    (3)

where 〈i, j〉 represent all pairs of agents which are within a local neighborhood of distance 4σ (larger values of this cutoff do not qualitatively change the results), and N p is the total number of neighbor pairs (i, j) included in the sum in Eq. (3). Figure 4(a) shows the dependence of the mutual information I on the size of the cognitive map λ. At very small λ, the cognitive agent system exhibits a mutual information which is nearly vanishing, on account of the nearly independent motion of each agent. As the size of the cognitive map increases beyond λ ≈ 2.0σ, I increases steadily, while the system develops labyrinthine and cellular patterns. At λ ≈ 9.5σ, I reaches a plateau value, corresponding to well-developed cellular patterns. These results show that upon increasing the size of the cognitive maps, an indirect exchange of information takes place among the agents, which in turn leads to the formation of complex structures.
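A standard histogram (plug-in) estimate of the mutual information can be sketched as follows; the bin count is an assumption, and the estimate carries a small positive bias for finite samples:

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of I(a;b) = sum P(a,b) ln[P(a,b)/(P(a)P(b))], in nats."""
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab /= pab.sum()
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)    # marginal distributions
    mask = pab > 0                               # 0 ln 0 -> 0 by continuity
    return (pab[mask] * np.log(pab[mask] / np.outer(pa, pb)[mask])).sum()

rng = np.random.default_rng(3)
x = rng.normal(size=20000)
independent = rng.normal(size=20000)
coupled = x + 0.3 * rng.normal(size=20000)   # strongly correlated with x
print(mutual_information(x, independent), mutual_information(x, coupled))
```

Applied to the coordinate time series of neighboring agents, this is the kind of estimator that yields the pairwise terms entering the total mutual information.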
The information exchange among the agents is correlated with their structural transition. To quantify the degree of correlated motion, we turn to a standard tool for the analysis of the dynamic response of many-body systems 43 : the displacement covariance matrix 44 C ij ≡ 〈δr i (t)⋅δr j (t)〉 t , where δr i (t) ≡ r i (t) − 〈r i 〉, and the angle brackets with subscript t indicate an average over time; the eigenvalues and eigenvectors of C ij provide information about the coherent motion in the system. Provided that the particles have well-defined average positions 〈r i 〉 -and this is in fact the case once complex patterns emerge- the spectral properties of C ij can describe effective excitation modes 45 . Figure 5a shows the eigenvalues of C ij for δx and δy, and the comparison with the uncorrelated motion generated with a random-matrix model of Gaussian-distributed displacements. The first mode of the cognitive system lies considerably above the random Gaussian model, and corresponds to a large-wavelength mode propagating through the system. This collective mode is shown in Fig. 5b. This is the Goldstone mode associated with the structural transition and corresponds to excitations of the ordered state. The situation is reminiscent of equilibrium systems, where the spontaneous breaking of translational symmetry is associated with the emergence of a massless Goldstone mode, propagating through the system with a scale-free correlation length. Goldstone modes have also been identified in models and observations of active collective behavior [46][47][48] . In practice, this means that in certain configurations of the system some fluctuations propagate very quickly throughout the system, and they do not depend on the system size.
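The covariance analysis can be sketched numerically: a drift shared by all agents produces a leading eigenvalue far above the bulk expected for uncorrelated Gaussian displacements. The array shapes, the synthetic drift, and the function name `covariance_modes` are our illustration, not the paper's data:

```python
import numpy as np

def covariance_modes(positions):
    """Eigen-decomposition of C_ij = <dr_i . dr_j>_t from a trajectory array.

    positions has shape (T, N, 2); C is the N x N covariance of displacements
    about each agent's time-averaged position, with x and y contributions summed.
    """
    dr = positions - positions.mean(axis=0)          # (T, N, 2) displacements
    C = np.einsum('tia,tja->ij', dr, dr) / len(positions)
    evals, evecs = np.linalg.eigh(C)
    return evals[::-1], evecs[:, ::-1]               # descending order

rng = np.random.default_rng(4)
T, N = 2000, 30
base = rng.normal(size=(T, N, 2))                    # independent jitter
drift = rng.normal(size=(T, 1, 1))                   # shared collective mode
evals_rand, _ = covariance_modes(base)
evals_coll, _ = covariance_modes(base + drift)
print(evals_rand[0], evals_coll[0])
```

Comparing the leading eigenvalue against the purely random case is the same diagnostic used to identify the collective mode in Fig. 5a.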
Figure 6 shows the spatial correlation of displacements between agents defined via the correlation function G(d) = 〈δr i (t)⋅δr j (t)〉 t, (i,j) , where d ≡ |r i (t) − r j (t)|, and the angle brackets with subscript t, (i, j) indicate an average over time and pairs of agents (i, j) separated by distance d. The correlation in the agents' motion decays with distance with a power law (the oscillations with decaying amplitudes are due to the cellular pattern of the system).
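An estimator for G(d) bins agent pairs by their mean separation and averages the displacement dot products. The synthetic data below (displacements smoothed over a window of 7 lattice units, an assumption of ours) produces correlations that decay with distance:

```python
import numpy as np

def displacement_correlation(positions, d_bins):
    """G(d): mean dr_i . dr_j over agent pairs, binned by mean separation d."""
    dr = positions - positions.mean(axis=0)                    # (T, N, 2)
    mean_pos = positions.mean(axis=0)                          # (N, 2)
    N = mean_pos.shape[0]
    i, j = np.triu_indices(N, k=1)
    d = np.linalg.norm(mean_pos[i] - mean_pos[j], axis=1)      # pair separations
    corr = np.einsum('tpa,tpa->p', dr[:, i], dr[:, j]) / len(positions)
    G = np.full(len(d_bins) - 1, np.nan)
    which = np.digitize(d, d_bins) - 1
    for b in range(len(G)):
        if np.any(which == b):
            G[b] = corr[which == b].mean()
    return G

rng = np.random.default_rng(5)
T, N = 4000, 40
mean_pos = np.stack([np.arange(N, dtype=float), np.zeros(N)], axis=1)
noise = rng.normal(size=(T, N + 6, 2))
# smooth over 7 neighboring agents -> correlations decay over ~7 lattice units
smooth = np.stack([noise[:, k:k + N] for k in range(7)]).mean(axis=0)
positions = mean_pos + smooth
G = displacement_correlation(positions, d_bins=np.array([0.5, 1.5, 10.5, 20.5]))
print(G)
```

In the simulated agent system the same estimator, applied with finer bins, yields the power-law decay (modulated by the cellular pattern) reported in Fig. 6.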
In summary, our study furnishes a first step towards the understanding of nonequilibrium transitions in a system of cognitive agents that dynamically interact with their environment, and respond to it with cognitive competence by maximizing the information content of their cognitive maps. The transition from isolated particles to complex patterns is characterized by different degrees of overlap of the cognitive maps. The continuous change of the mutual information I as the system develops complex patterns, together with the change of the anisotropy parameter α, points at a transition in cognitive-agent systems. We have identified a Goldstone mode propagating through the system that is generated by the spontaneous symmetry breaking of the structural transition as complex patterns emerge.
Apart from its significance for investigations of complex organisms whose active response to environmental stimuli is based on various levels of cognition, from eusocial insects to mammals, our results are relevant to artificial systems like autonomous micro-robots and swarm robotics 49,50 , which are explicitly designed to autonomously mimic the collective behavior of living organisms.

Simulations. Every agent obeys the following equation of motion:

m dv/dt = −γv + F(r; τ) + h(r),    (4)
where v is the velocity of the agent, m its mass, γ the viscous drag, F(r;τ) the cognitive force, and h(r) is the short-range repulsion among agents, modeled via a repulsive linear spring when |r i − r j | < σ, where σ is the hard-core diameter of the agents.
As described above, the calculation of the cognitive force F(r; τ) [Eq. (2)] requires calculating a set of hypothetical sampling trajectories {Γ τ (t)}. The simulation algorithm is based on the following two steps: (i) generation of the hypothetical trajectories, resulting in the construction of the cognitive map, and computation of the cognitive force F(r; τ) (see below for details); (ii) update of the agent's position according to Eq. (4). During the generation of the hypothetical trajectories, all agents in the system remain fixed in their current positions. The dynamics of the system evolve by repeating the two steps above.
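The two-step loop can be sketched as follows. This is a minimal sketch: the cognitive force is a placeholder (identically zero here, standing in for the walk-sampling step (i)), the spring constant k and time step are illustrative, and a plain Euler integrator is used:

```python
import numpy as np

def spring_repulsion(r, sigma=1.0, k=50.0):
    """Pairwise repulsive linear spring h(r), active for overlapping hard cores."""
    N = len(r)
    f = np.zeros_like(r)
    for i in range(N):
        d = r - r[i]                                  # vectors from i to each j
        dist = np.linalg.norm(d, axis=1)
        mask = (dist < sigma) & (dist > 0)            # overlapping neighbors only
        overlap = (sigma - dist[mask])[:, None]
        f[i] = -(k * overlap * d[mask] / dist[mask, None]).sum(axis=0)
    return f

def step(r, v, cognitive_force, dt=1e-2, m=1.0, gamma=1.0):
    """One update of m dv/dt = -gamma v + F(r;tau) + h(r), Eq. (4)."""
    F = cognitive_force(r)           # step (i): agents frozen while F is evaluated
    a = (-gamma * v + F + spring_repulsion(r)) / m
    v = v + dt * a                   # step (ii): integrate the equation of motion
    r = r + dt * v
    return r, v

rng = np.random.default_rng(6)
r = rng.uniform(0, 10, size=(8, 2))
v = np.zeros_like(r)
for _ in range(100):
    r, v = step(r, v, cognitive_force=lambda r: np.zeros_like(r))
```

Replacing the placeholder with an entropy-gradient estimate over sampled walks recovers the full algorithm.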
The agents are initially placed randomly with a uniform distribution within the system and without any overlap of the agents' hard cores. The system size is, unless otherwise specified, fixed at L = 80σ.
Construction of the cognitive map. The calculation of P(Γ τ (t)|r 0 ) is performed by generating hypothetical sampling trajectories, each of which represents a virtual evolution during the time interval [0, τ] of the agent, with constraints fixed at the present configuration and not depending on time. The hypothetical trajectories are generated using Langevin dynamics,

m dv/dt = −γv + h(r) + ξ(t),

where v, m, γ, and h(r) have the same meaning as in Eq. (4), and ξ(t) is a random noise with zero mean and 〈ξ i (t)ξ j (t′)〉 = 2γk B Tδ ij δ(t − t′).
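A minimal Euler-Maruyama sketch of such a sampling trajectory follows; reflections off other agents are omitted here, and all parameter values are illustrative:

```python
import numpy as np

def hypothetical_trajectory(r0, tau, dt, m=1.0, gamma=1.0, kT=1.0, rng=None):
    """Underdamped Langevin sampling walk: m dv/dt = -gamma v + xi(t),
    with <xi_i(t) xi_j(t')> = 2 gamma kT delta_ij delta(t - t')."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(tau / dt)
    r, v = np.empty((n + 1, 2)), np.zeros(2)
    r[0] = r0
    noise_scale = np.sqrt(2 * gamma * kT / dt)      # discretized white noise
    for k in range(n):
        xi = noise_scale * rng.normal(size=2)
        v += dt * (-gamma * v + xi) / m
        r[k + 1] = r[k] + dt * v
    return r

# An ensemble of N_Omega such walks, launched from the agent's current position,
# is the raw material of the cognitive map.
trajs = [hypothetical_trajectory(np.zeros(2), tau=5.0, dt=0.01,
                                 rng=np.random.default_rng(s)) for s in range(50)]
```

The noise amplitude follows from the fluctuation-dissipation relation quoted above; in the full model each walk would additionally be reflected elastically off the frozen agents.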
Any interaction of a hypothetical sampling trajectory with another agent is hard-core repulsive, that is, the trajectory is reflected elastically by the other agent.

Derivation of the cognitive force.
Here we show the derivation of an expression for the entropic force we use to calculate the system's dynamics. This derivation is adapted and simplified from ref. 15. We start from Eqs. (1) and (2), consider the gradient in Eq. (2) with respect to the position-space coordinates at the present time, r(t = 0) = r(0), and arrive at

F(r; τ) = −θk B ∫ DΓ τ [∇ r(0) Pr(Γ τ |r(0))] [ln Pr(Γ τ |r(0)) + 1].

We can assume deterministic behavior within one small sub-interval [t, t + ε]. Therefore a conditional path probability can be decomposed into the probabilities of its intervals in the following way:

Pr(Γ τ |r(0)) = Pr(Γ τ−ε |r(ε)) Pr(Γ ε |r(0)),

where Γ ε denotes a path of length ε and τ = Nε. Accordingly, we can express the gradient of the probability as

∇ r(0) Pr(Γ τ |r(0)) = Pr(Γ τ−ε |r(ε)) ∇ r(0) Pr(Γ ε |r(0)).

Since Γ ε can be seen as the path from r(0) to r(ε) in one step, the gradient in probability of jumping from r(0) to r(ε) with respect to r(0) is equal to the negative gradient in probability of jumping from r(0) to r(ε) with respect to r(ε):

∇ r(0) Pr(Γ ε |r(0)) = −∇ r(ε) Pr(Γ ε |r(0)).

By Taylor expanding the position we find r(ε) ≈ r(0) + εv(0). To estimate the probabilities Pr(Γ τ |r(0)) we use N Ω Brownian trajectories exploring the available space for a finite time (horizon) τ. Every sampling trajectory starts from the current system state r(0). We assign a uniform probability to all paths within a neighborhood of a sampled path, based on the volume Ω n explored by that trajectory.

Minkowski tensor analysis. The Minkowski tensor

W ≡ ∫ ∂C G 2 (r ⊙ n) dA

provides a measure of anisotropic morphologies 34,35 . It is a second-rank symmetric tensor, where G 2 = (κ 1 + κ 2 )/2 is the local curvature, r the position vector, n the normal vector to the surface ∂C of a body, and a ⊙ b ≡ (a ⊗ b + b ⊗ a)/2 is the symmetric tensor product of vectors a and b. The anisotropy parameter α ≡ 1 − β 1 /β 2 , where β 1 and β 2 are the smallest and largest eigenvalues of W, respectively, gives a measure of anisotropy of the pattern.
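To illustrate how an eigenvalue-ratio anisotropy parameter discriminates chain-like from isotropic configurations, the sketch below uses a simplified orientation tensor built from nearest-neighbor bond directions in place of the interfacial Minkowski tensor; the function `anisotropy` and the combination 1 − β 1 /β 2 (with β 1 ≤ β 2 ) are our simplified stand-ins:

```python
import numpy as np

def anisotropy(points):
    """alpha = 1 - beta_min/beta_max from the eigenvalues of a second-rank
    orientation tensor Q = <u (x) u> over nearest-neighbor bond directions.
    (A simplified stand-in for the interfacial Minkowski tensor of the text.)
    """
    d = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(d, axis=-1)
    np.fill_diagonal(dist, np.inf)                 # exclude self-distances
    nn = dist.argmin(axis=1)                       # nearest neighbor of each point
    u = d[np.arange(len(points)), nn]              # bond vectors
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    Q = (u[:, :, None] * u[:, None, :]).mean(axis=0)
    beta = np.linalg.eigvalsh(Q)                   # ascending: beta_1 <= beta_2
    return 1 - beta[0] / beta[1]

rng = np.random.default_rng(7)
chain = np.stack([np.arange(30.0), 0.05 * rng.normal(size=30)], axis=1)
gas = rng.uniform(0, 30, size=(200, 2))
print(anisotropy(chain), anisotropy(gas))
```

A nearly straight chain has all bonds aligned, so one eigenvalue dominates and the parameter approaches 1; a structureless gas of points gives nearly equal eigenvalues and a value near 0, mirroring the behavior of α across the chain-forming transition.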