Introduction

The evolution of complexity is one of the most striking characteristics of life throughout its hierarchy1,2. The resulting complexity, that is, the system’s increase in its components and multifaceted interactions among them, often entails adaptation to the environment, known as complex adaptive systems3,4,5. At the organismal level, theoretical works based on Fisher’s geometric model of adaptation predict that a population of organisms climbing a hill in a fitness landscape evolves a complex polygenic trait based on a few major genes and many minor genes6,7,8,9. Typically, the advent of major genes with large phenotypic effects is followed by the subsequent evolution of minor genes with small phenotypic effects.

Complex adaptive systems beyond the single organismal level, such as multicellularity and social organization, show similar multi-component regulations: The systems seem to take a layered form in which their core components, often considered as evolutionary innovation and hence with large phenotypic effects, are buffered by additional, regulatory components with relatively small phenotypic effects10. The core components generally involve interactions between group members that have synergistic effects on the performance of the group, while the regulatory components support the function of core components (note that the regulatory is relative to the core). Self-organization itself does not guarantee adaptation10,11,12, and previous studies have shown how supplementation of accessory regulation to core components of self-organization can lead to variable system outputs10,13,14,15,16 and make the systems adapt to species-specific17,18,19 or within-species changing environments19,20. Being biologically implemented, these adaptive modifications should be attributed to the evolution of regulatory components that improve the self-organizing property of core components. However, the underlying tenet of gradual evolution has often been hampered by the classic problem of maladaptive intermediate, which raises a concern that a system equipped only with a core component (i.e., no regulatory components) may suffer lowered fitness compared to those without the core component (see Discussion). Therefore, deciphering the evolutionary order of appearance of regulatory components relative to core components will contribute to a better understanding of the evolutionary mechanisms that add complexity to natural biological systems21.

In this study, we address this issue by considering social insect colonies as a model system. Social insects stand at one of the pinnacles of biological complexity. Their colonies are characterized by highly coordinated systems, often likened to swarm intelligence, in which microscopic interactions of nestmate individuals collectively produce diverse macroscopic phenomena10,22,23,24. A typical example is found in mass foraging by ants with trail pheromones25,26. The core component of this system is indirect pheromone communication among nestmates according to the following algorithm: Once a worker finds food, she puts a chemical marker on the ground while carrying the food to her nest; nestmate workers are recruited to the marker and follow the trail toward the food, and lay the same marker to reinforce the trail. The system is strengthened by the balance between positive (trail reinforcement) and negative (trail decay) feedbacks depending on changing availability of pheromones, which was the inspiration of Ant Colony Optimization (ACO27), a computational metaheuristic for solving hard combinatorial optimization problems.

We took a constructive approach with both robotic and computational systems that mimic this foraging system, focusing on its logistic aspects28,29. The use of artificial systems mimicking collective behaviors of animals can provide not only controlled experiments that would be impossible in real organisms, but also greater realism through embodiment, which is often abstracted in analytical and purely simulational studies30,31,32,33. Examples include the demonstration of evolutionary emergence of cooperative behaviors34,35,36,37,38 and rigorous testing of kin selection theory39. During the development of the real robotic system, we faced a problem of how to deal with overcrowding on the trail. Because the use of the trail inevitably puts robots (as it does ants) into traffic-jam-like overcrowding, an accessory regulation that supports efficient pheromone communication is required in both systems. As a solution to the robotic overcrowding, we heuristically introduced a set of collision-avoidance behaviors in the robots, which improved their group-foraging performance29,40. These behaviors constitute an overall traffic rule, such that inbound (food-to-nest) robots are always given priority over outbound (nest-to-food) robots. Interestingly, similar collision-avoidance rules have been reported in some ants41 (see Discussion). Although the ants and our robots are obviously different in many respects, the two systems share the same property of layered complexity: the core component (pheromone-mediated group foraging) is supported by regulatory traits (traffic rule that improves the group foraging). Therefore, we asked how such complex adaptive systems that would have been malfunctional without accessory regulations were achieved through adaptive evolution.

Using a simulated system that precisely modeled the dynamic properties of real robots29 and a field setup that mimics the foraging of arboreal ants on tree branches or of subterranean ants and termites inside tunnels, we first confirmed that the algorithm of pheromone communication alone was not sufficient to establish effective recruitment due to the overcrowding on the pheromone trail. The additional traffic rule was required to solve this problem by keeping the robots from being stuck on the trail. Next, we performed evolutionary population genetic simulations that allowed for mutation and selection to occur simultaneously in pheromone responsiveness and collision-avoidance behaviors of the robots. In accordance with the conventional notions of polygenic traits, it initially seemed reasonable to assume that such regulations evolve after the system’s core component has been established, simply because the former rely on the latter for their functions. However, in most simulation runs, the priority-giving behavior for collision-avoidance did not arise as a consequence of pheromone communication, but arose in advance, taking the form of selectively neutral behaviors in the absence of the self-organizing capacity. The behaviors then assisted the pheromone responsiveness trait to arise and become fixed in the population. Finally, we confirmed the above results with a population genetic analysis hybridized with simulated distributions of fitness values.

Results

Experimental setup

Our virtual robot swarms, each consisting of 30 simulated robots, searched for the food located on one end of a rectangular field (900 mm × 9000 mm) surrounded by walls and enclosing their nest-site on the other end (Fig. 1a). The algorithm for group foraging behaviors of the robot swarms29 is described as state transitions of each robot among three behaviors: S1, searching; S2, carrying food (inbound) and recruiting (laying scent); and S3, being recruited (outbound, following scent), which work as follows (Fig. 1b): once a searching robot (state S1) finds food, it starts to secrete a chemical compound on the ground while returning to its nest with a virtual food item (state S2). If the robots can detect the resulting chemical trail as a trail pheromone, they then follow the trail toward the food (state S3, regarded as successful recruitment). State S3 is functional only in the presence of the ability to detect pheromones. As an accessory regulation, we implemented a collision-avoidance behavior: When two robots collide, both take one of two reactions: ‘Stay’ (stop moving for a given time) or ‘Leave’ (move backward by a given distance). We assumed that the robots’ reactions depend on their state (for different assumptions, see Discussion). When colliding robots with different states take different reactions, they can be regarded as obeying a traffic rule, i.e., the robot with the reaction ‘Leave’ gives priority to the robot with the reaction ‘Stay’ (Fig. 1c). To allow for the robotic swarms to evolve, we made a simple assumption that each robot has its own haploid genome consisting of four loci, b1, b2, b3, and p, each of which has a binary allelic state (0 or 1). The resulting multilocus genotype is described as {b1,b2,b3;p}. The locus bi (i \(\in\) {1, 2, 3}) defines the collision-avoidance behavior (i.e., Stay = 0, Leave = 1) taken by the robot with state Si. That is, if a robot is currently in state Si, then its collision behavior is bi. The locus p defines the ability to detect pheromone.

Fig. 1
figure 1

Experimental setup. a The foraging arena used in the simulations. b The behavioral algorithm for individual robots with their state transitions (S1S3). c The collision-avoidance behaviors

Measuring fitness

We measured biological fitness of simulated clonal swarms resulting from their multilocus genotypes ({0,0,0;0} – {1,1,1;1}, 24 = 16 in total, Supplementary Table 1) to map a multidimensional fitness landscape. The total number of times the robots go back to their nest from the food in a given time (see Methods) was considered as a measure of swarm fitness. The ability to detect pheromone (p = 1) alone did not result in higher fitness (Fig. 2), although recruitment (S3) occurred (illustrated as the presence of orange bars in Supplementary Figure 1). The reaction ‘Leave’ at locus b1 contributed to the fitness increase regardless of the presence of pheromone responsiveness. Interestingly, particular sets of behavioral genotypes together with the pheromone responsiveness remarkably improved the swarm fitness, which was achieved through successful recruitment. Among the 16 multilocus genotypes, the genotype {1,0,1;1} showed the highest swarm fitness, in which a traffic rule was established with outbound robots (b3 = 1) giving priority to inbound robots (b2 = 0) on the pheromone trail. This result was consistent with our previous real robot experiment40. To confirm that the traffic rule helped the swarms to avoid overcrowding, we counted the number of collisions during each simulation run. The traffic rule on the pheromone trail (b2 = 0, b3 = 1), together with b1 = 1, strengthened the pheromone communication by reducing the occurrence of collision (Supplementary Figure 2).

Fig. 2
figure 2

Measures of group foraging performance. Each circle in the bubble plot represents the total number of foraging bouts (y-axis; as a measure of swarm fitness) shown by swarms with the multilocus genotype (x-axis), with its size corresponding to the number of observations (n = 100,000 trials for each multilocus genotype)

Evolutionary simulations

We next conducted population genetic simulations to trace an evolutionary trajectory of the robotic population that initially had the genotype {0,0,0;0}, that is, without pheromone communication and priority-giving behaviors. For simplicity, all robots in a single swarm were assumed to have the same genotype, and genetic variation was permitted only among swarms. In terms of sociobiology, only a single robot in the swarm clonally (with mutations) produces foundress robots of the next generation depending on the swarm fitness, each of which then clonally reproduces (without mutations) to form a new swarm. Consequently, the average relatedness of nestmates within a swarm is strictly 1.

Throughout the simulation replicates, as expected, the populations eventually became fixed at the genotype {1,0,1;1}; that is, the swarms successfully evolved the pheromone-mediated collective behavior together with the traffic rule (Supplementary Movie 1). Then we inspected the detail of each evolutionary trajectory. The population of swarms first became dominated quickly (frequency ≥ 0.985, fixed in most cases, Supplementary Table 2) by the genotype {1,0,0;0}, in which the reaction ‘Leave’ by randomly searching robots (b1 = 1) was not related to pheromone communication (Supplementary Movie 1). Starting from the population dominated by the genotype {1,0,0;0} and assuming the shortest path, the robot swarms had to experience 0 → 1 mutations at both loci b3 and p before fixation of the genotype {1,0,1;1}. Surprisingly, in most (44/50) of the simulated genealogies leading to the final genotype {1,0,1;1}, the 0 → 1 mutations occurred first at the behavior locus b3, followed by the pheromone-responsiveness locus p (Fig. 3a). The intermediate genotype {1,0,1;0} never became fixed; it served as the source of the final genotype when it had an intermediate (0.005–0.685) frequency in the population (Fig. 3b, Supplementary Table 2). Among the rest of the simulated genealogies, five had the 0 → 1 mutation occurring first at locus p and then at locus b3. The intermediate genotype {1,0,0;1} remained at a low frequency (0.01–0.02). One genealogy did not take the shortest path from {1,0,0;0} to {1,0,1;1}; its sequence was {1,0,0;0} → ​{1,1,0;0} → ​{1,1,1;0} → ​{1,1,1;1} → ​{1,0,1;1}. Mutations at the behavior loci were observed both before and after the 0 → 1 mutation at locus p. The irregularity can be explained by an additional 0 → 1 mutation at locus b2 before the 0 → 1 mutation at locus b3 while the dominant pattern of evolutionary precedence of b3 = 1 over p = 1 remained intact.

Fig. 3
figure 3

Simulated evolutionary dynamics. a In most cases, the population of the ancestral genotype {0,0,0;0} (dark gray) was quickly taken over by the genotype {1,0,0;0} (light gray); the genotype {1,0,1;0} (yellow) as well as other genotypes then arose but remained at low frequencies; and the final genotype {1,0,1;1} (red) arose from {1,0,1;0} and quickly became fixed. The whole population showed a state transition from all-{1,0,0;0} to all-{1,0,1;1}. b Selected genealogies of the genotype {1,0,1;0} during generations 192 and 201 shown in a. Circles denote swarms, with colors as in a. This includes the genealogy leading to the final genotype {1,0,1;1}. The regulatory trait b3 = 1 predated the evolution of pheromone responsiveness trait p = 1 along the focal genealogy

Interpreting the evolutionary process

The observed bias toward evolutionary precedence of the regulatory trait over the core pheromone-responsiveness trait can be explained using the fitness landscape as follows (Fig. 4a): One of the two possible intermediate genotypes, {1,0,0;1}, actually had lower swarm fitness than the genotype {1,0,0;0} because of overcrowding on the pheromone trail (Supplementary Movie 1), whereas the other intermediate {1,0,1;0} gave the same fitness in principle because locus b3 gives a selectively neutral trait by definition, as long as the pheromone-responsiveness is absent (i.e., p = 0) (Supplementary Movie 1). Consequently, the genotype {1,0,1;0} is more likely to arise first.

Fig. 4
figure 4

a Schematic diagram of fitness landscape involving the core (p) and regulatory (b3) traits of the self-organizing capacity. Starting from the genotype {1,0,0;0} (light gray), the swarm fitness of the genotype {1,0,0;1} (blue) is generally lower than that of the genotype {1,0,1;0} (yellow), making the latter more likely to arise first (depicted by the thicker arrow) and to serve as the precursor of the final genotype {1,0,1;1} (red). Height of the bars corresponds to swarm fitness in Supplementary Figure 1. b Resampled distribution of relative swarm fitness of the neutral intermediate {1,0,1;0} (yellow), inferior intermediate {1,0,0;1} (blue), and final {1,0,1;0} (red) genotypes, compared with the original genotype {1,0,0;0} (− − −). The distributions were incorporated into the mathematical analysis

The evolutionary process with selectively neutral or even inferior intermediates is known as stochastic tunneling42 (see Discussion). To evaluate how common the observed evolutionary precedence of the regulatory trait (45/50; note that we tentatively included the irregular genealogy described above) is, we applied the population genetic formulation of stochastic tunneling43 that analytically yields the point estimate of waiting time to fixation of the genotype {1,0,1;1}. Starting from the population fixed at the genotype {1,0,0;0}, the evolving population takes one of the two possible shortest evolutionary paths—that is, the path with the intermediate genotype {1,0,1;0} (b3 = 1 first) and that with the alternative intermediate {1,0,0;1} (p = 1 first)—and the realized path could be predicted as the one with the shorter waiting time by comparing respective estimates. Among the parameters of the analytical model were the relative fitness (compared to the original genotype {1,0,0;0}) of the neutral intermediate {1,0,1;0}, the inferior intermediate {1,0,0;1}, and the final genotype {1,0,1;1}, which had to be derived empirically from the evolutionary simulations. Therefore, we incorporated resampled distributions of relative mean swarm fitness (Fig. 4b) into the analytical model (see Methods and Supplementary Note 1 for details).

We generated 1000 sets of 50 evolutionary outcomes that the populations were expected to show. The 50 outcomes were typically dominated by those with the intermediate genotype {1,0,1;0}, and their frequency distribution (calculated over the 1000 sets) included the observed frequency (45/50) at the borderline of the 95% interval between the 2.5th34 and 97.5th44 percentiles (Supplementary Figure 3). The rare outcomes with the inferior intermediate (i.e., the precedence of pheromone responsiveness) were explained by the stochasticity in the swarm fitness that sometimes resulted in the higher mean relative swarm fitness of this genotype than of the neutral intermediate (Fig. 4b). Moreover, the waiting time to fixation averaged over the 50 runs (157.68 generations) again fell within the 95% interval between the 2.5th (145.54) and 97.5th (354.76) percentiles of the distribution of the mean waiting time (Supplementary Figure 4). These analyses support the view that the phenomenon observed in our evolutionary simulations is well-captured quantitatively by the theory of stochastic tunneling, and that the evolution of self-organizing capacity in our robotic system was facilitated by this evolutionary process.

Discussion

One of the long-standing debates in evolutionary biology ever since Charles Darwin is based on the observation that the complexity of life we can find today seems too sophisticated to be achieved through gradual evolution44,45. Advocates of saltationism or macromutationism claim that gradualism cannot account for the incipient stages of complex adaptive traits because the intermediate forms must be maladaptive45. The debate has consequently motivated further work on the genetic and developmental basis of organismal complexity, leading to the rise of present-day evolutionary developmental biology (reviewed in e.g., refs. 46,47). The current consensus is that both views are correct depending on systems, and that the real evolutionary processes are fairly complex48. These contrasting views on the origin of complexity can also be applied to the evolution of social systems, which motivated our present study. In our robotic swarms, the intermediate maladaptive stage corresponds to the pheromone-based recruitment without the traffic rule. By taking a standard population genetic approach, we demonstrated that the resulting fitness valley can be dynamically bypassed through a selectively neutral alternative, that is, cryptic priority-giving behavior without pheromone-based recruitment. Here, the regulatory mechanism was not an evolutionary follower of the core component; instead, it preceded the establishment of the core component and assisted it. The evolutionary precedence of some regulatory mechanisms originated as selectively neutral traits (neutral from their regulatory functions) might be a general component feature of complex adaptive systems, as long as the systems’ core components alone are maladaptive and the accessory regulations rely for their functions on the core components. It does not rule out the possibility that the regulatory mechanisms can arise after the establishment of the core components. The resulting layers of accessory regulations should contribute to the complexity of life.

The role of the cryptic regulatory mechanisms can be understood by employing the structuralist concept of exaptation49, a formal definition of so-called preadaptation. Exaptation refers to the evolution of traits that had originated by the selective advantage other than their current use (pre-aptation) or that had originally been non-adaptive (non-aptation or spandrel); both were subsequently co-opted into current adaptive use. The regulatory behavior of our system may provide an example of non-aptation because, by definition, the trait b3 = 1 was selectively neutral at its origin. A biologically realistic interpretation of this selective neutrality in the absence of pheromone communication would be that the trait b3 involves the behavior that is specifically released when colliding with the food-laden agent (i.e., with state S2). It should be noted here that we can also consider different genetic coding of the three behavioral traits (b1b3). On the basis of another biologically reasonable assumption that the trait b3 = 1 (or b2 = b3 = 1) is a byproduct following the expression of b1 = 1 (i.e., pleiotropy), again we can expect preadaptive precedence (or spandrel, i.e., a neutral byproduct of previous adaptation in other contexts50) of the regulatory mechanism in this foraging system, otherwise the fitness valley would arise owing to overcrowding (Supplementary Note 2).

In our simulations, the bypassing of the fitness valley was driven by the evolutionary process called stochastic tunneling. This was originally proposed as a mathematical description of cancer initiation42,51, and the same mechanism was independently found in an analysis of the evolution of cis-regulatory elements (genetic regions that regulate gene transcription in the physical vicinity of the target gene)52. Taking cancer initiation as an example, its somatic evolution is characterized by a two-step process leading to biallelic loss-of-function mutations at the tumor suppressor gene. The first mutation at a single allele is either selectively neutral or disadvantageous (through chromosomal instability) for cell proliferation, and the second mutation at the counterpart allele of this diploid locus triggers increased proliferation as tumor cells. Through stochastic tunneling, the population of cells can shift from the all-intact state to the all-tumor state without experiencing the all-intermediate state. The situation is very similar to our model, except that the genes involved are more than one in our clonal robots. The population of robot swarms favored the neutral intermediate over the disadvantageous alternative, which was easily explained by the comparison of waiting time estimates between the two tunneling routes. The role of stochastic tunneling in the origins of more complex systems, especially those with recombination21, epistasis53,54,55, or indirect genetic effects54,55, deserves further study.

A potential concern about applying our rather retrospective approach (i.e., the genetic encoding was made after the best set of behaviors thus far had been known heuristically, see Introduction) to real complex adaptive systems would be that biological systems cannot tell a priori what should serve as regulatory mechanisms before the emergence of a core system. Nevertheless, we can predict that complex adaptive systems should be found more frequently in systems allowing more capacity for neutral genetic variations as a source of exaptation. The role of cryptic genetic variations in the emergence of evolutionary novelty is well acknowledged in current evolutionary biology56. Such cryptic variations would help a primitive system to avoid crossing the fitness valley by providing selectively neutral alternatives. In the context of social organizations, a previous theoretical study predicted that genes with social effects should harbor more variations within a population owing to weaker selection pressure on indirect genetic effects57. Meanwhile, computational studies that focused on the architecture of biological systems, such as genetic and neural networks58,59,60 and secondary structures of RNA molecules61, have acknowledged the importance of neutral variations as evolutionary capacitors. By highlighting the importance of exaptation and neutral genetic variations for complex adaptive systems, our study bridges an apparent gap between computational and macro-biological studies on the evolution of biological self-organization. Phylogenetic comparative methods might help to test our prediction empirically by reconstructing multi-trait evolutionary processes that lead to extant complex adaptive systems.

In our evolutionary model, there is another exaptation regarding the pheromone-communication ability. That is, we implicitly assumed that the inbound robots had already secreted some chemical substance on the ground, and that pheromone communication was achieved upon acquisition of the detection ability as a cue (i.e., 0 → 1 mutation at the pheromone-related trait p). In the study of animal communication, there are two models describing the evolutionary origin of communication: sender-precursor model and receiver-precursor model62. The former model assumes that there was initially a material (visual, chemical, or tactile) emitted from the sender-to-be, which are then utilized by the receiver as a cue. We followed this scenario by assuming that there were already some chemicals secreted by inbound agents. The latter model assumes that there was initially a sensory characteristic of the receiver-to-be agents (also known as sensory bias), which are then co-opted into the biased communications with specific interactants. Although it is beyond the scope of our present study, our bottom-up approach with embodied agents would also be suitable to study the evolutionary origins of communication systems (see ref. 35).

Our simulated robots favored the traffic rule under which inbound robots had priority over outbound robots on the trail. In real social insects, previous studies reported traffic-rule-like behaviors shown by ant foragers along their foraging trails (reviewed in ref. 41). Examples include three-lane bidirectional traffic flow63, alternating clusters of inbound and outbound ants facing a traffic bottleneck64, inbound leaf-laden ants followed by unladen ants65, and alternative route selection through inbound–outbound behavioral interactions at the junction66. In those examples, macroscopic traffic flow emerges from microscopic behavioral rules where inbound ants are given priority over less-loaded outbound ants. Using embodied robots supplied with virtual food, our study demonstrated that such traffic rules do have an adaptive significance for efficient logistics, in addition to their mechanism (proximate cause) in which real inbound ants have less maneuverability due to food load41. It should be noted here that the rectangular field setup used in our simulations was not the primary factor giving an advantage of having the traffic rule. Our previous study using real robots40 showed that the same traffic rule {1,0,1;1} remained essential for the pheromone-mediated group foraging with fewer (n = 4–10) robots on a field of different shape (3600 mm × 3600 mm). Our field setup might be applied not only to the group foraging behaviour (see Introduction) but also to the traffic flow inside ant nests, where numerous ants have to manage overcrowding along their underground galleries. The effects of robot density and field shape on the evolutionary order of core (p = 1) and regulatory (b3 = 1) components deserve further study.

To obey the traffic rule, the trail-following robots (i.e., with state S3) need to put the priority-giving behavior temporarily above the core pheromone responsiveness. This temporal irregularity becomes evident when the reaction ‘Leave’ often moves the robot away from the pheromone trail (Supplementary Movie 1). The priority-giving behavior is released after the direct experience of collision with inbound robots, while the pheromone responsiveness can be regarded as following socially available information of food location. Therefore, the adaptive use of the traffic rule might be a situation whereby the robots flexibly prioritize direct social information (collision suggesting traffic jam) over indirect social information (pheromone trail) depending on their internal state during behavioral decision-making. Recent studies have revealed how ants make decisions under such conflicting information (reviewed in ref. 67). The use of multiple information sources and their integration during collective decision-making would be of particular interest in future study.

An advantage of taking constructive approaches with embodied agents is that we can incorporate physical consequences derived from properties of the agents and their abiotic and biotic interactions. Some of them might manifest as physical constraints hindering adaptation, such as the overcrowding we observed, but a more positive aspect would be greater, often unpredictable, degrees of freedom (or evolvability) supplied to the dynamic systems. Previous studies of experimental evolution with swarm robots have revealed various aspects of such consequences, including coordinated behaviors (e.g. ref. 68) and self-organized division of labor69 (see also Introduction). The evolutionary convergence of traffic rules between ants and our robots, together with those earlier studies, clearly indicates that a collaboration between macro-biology and swarm robotics provides a promising avenue to elucidate the evolutionary and developmental processes leading to the complexity of social life, as well as a hopeful engineering application to solve our real-world problems.

Methods

The basic algorithm for the pheromone-mediated group foraging behavior and the priority-giving behavior, as described in Results, were originally designed to control real robots29. The real robot system, named ARGOS-02 (‘Ant’elligent Robot Group Operating System, note that our system is different from another swarm robot system named ARGoS70), used an aqueous solution of ethanol as the trail pheromone. ARGOS-02 is a modified version of the original ARGOS-0128, using two sets (instead of the original one) of micro-pumps and tanks to secrete the pheromone at arbitrary concentrations. The agent-based simulation platform was developed in-house, written in C28,29,40 (see also Code availability). Dynamic properties of this simulated system have been well validated by comparing with real robot systems28,29,40. Briefly, this system is intended to emulate shape characteristics of our robots (rigid-body objects with 150 mm and a maximum speed of 100 mm/s) and the experimental field, evaporation dynamics of the pheromone (with coefficients of evaporation and diffusion) and its sensing, and the collision process described by simple contact of a robot with walls and other robots (for a full description, see ref. 29).

Experimental setup

A simulated swarm consisted of n = 30 robots, which were provided with a rectangular field of 900 mm × 9000 mm surrounded by walls, together with a food source (ø 300 mm) and a nest (ø 1000 mm) on opposite ends (Fig. 1a). The food emitted light, which a robot could detect within a radius of 600 mm from its center. The nest location was made available to all robots by provision of an infrared light directly above it. The initial positions of the robots were set randomly on the field, and each swarm was allowed to forage for 12,000 time steps (20 min for the real robots).

Evolutionary simulations

The evolving population consisted of N = 200 non-interacting swarms (i.e., no resource competition among swarms). We assumed a Wright–Fisher population with a constant size. The genetic coding of the traits is described in the main text. As a prior state to the pheromone-related trait p, we implicitly assumed that the inbound robots had already secreted some chemical substance on the ground, and that pheromone communication was achieved upon acquisition of the detection ability as a cue (see Discussion). A swarm was selected as a (clonal) parent for the next generation’s swarm in proportion to its fitness, so that a swarm with the better foraging performance had the greater chance of producing offspring clonally (i.e., being selected as a parent swarm in the simulations). In the next generation, bi and p of all genomes in an offspring swarm mutated to the other value (0 → 1 or 1 → 0) with a probability µ = 0.001. The low value of mutation rate, compared to conventional evolutionary simulation studies used in computer sciences, was intended to approximate to real biological systems (i.e., it should take relatively long generations for a polygenic system to reach the optimal state). On the basis of the preliminary observations that the genotype {1,0,1;1} is uninvadable by the other genotypes, each simulation run continued until the population became fixed at that genotype. Genotypes of the parent swarms of the genotype {1,0,1;1} that became fixed were determined by direct assessment of the genealogies.

Analytical solutions of time to fixation

We considered the time until the genotype became fixed {1,0,1;1}, starting from the population of the genotype {1,0,0;0}. The time to fixation can be obtained analytically by using population genetic formulations43. Four kinds of genotypes were considered: the original genotype {1,0,0;0}, the two intermediates {1,0,1;0} (neutral) and {1,0,0;1} (inferior), and the final genotype {1,0,1;1}.

We calculated the time (generations) until the final genotype became fixed, starting from the population with the original genotype. The original-to-intermediate transition and the intermediate-to-final transition at a swarm occur with the same probability µ, which corresponds to the mutation rate of a single locus. We obtained distributions of the relative fitness (compared to the original) of the neutral intermediate (denoted by r0), the inferior intermediate (r), and the final genotype (a) using data from our evolutionary simulations, as described in Supplementary Note 2.

The probability of tunneling can be given by T = [1 – U(rx)](1 – v1), where N is the number of swarms, U(rx) is the probability of fixation of the intermediate genotype with relative swarm fitness rx (x \(\in\) {0, –}), and v1 is the probability of non-appearance or extinction of the final genotype lineage arising from a single swarm of the intermediate genotype with relative swarm fitness rx43. The expected time until the final genotype is given by

$$E\left[ t \right] = \frac{T}{{\left( {T + S_1} \right)^2}} + \frac{{S_1(S_1 + S_2 + T)}}{{S_2\left( {T + S_1} \right)^2}}$$

where \(S_1\) and \(S_2\) describe the probabilities that primary and secondary mutations, respectively, become fixed. In the equation, the first term represents the contribution from tunneling paths (original → final) and the second term the contribution from the sequential paths (original → intermediate → final)43. Given that a shorter time results in a higher probability of realization, the evolutionary paths are expected to go through the intermediate genotype with the highest relative fitness when there are more than one candidate tunneling paths. See Supplementary Note 1 for details.

The main codes of the simulations were implemented in C and R. To shorten the calculation time, we used parallel computation realized by OpenMP. The evolution of swarms was managed by Python programs. For the analytical calculation, we used Mathematica, especially for large matrix calculations.

Code availability

The source code is available at https://github.com/SWARM-ARGOS/Harsh_Mistress/.