Mode switching in organisms for solving explore-versus-exploit problems

Biswas, Debojyoti; Lamperski, Andrew; Yang, Yu; Hoffman, Kathleen; Guckenheimer, John; Fortune, Eric S.; Cowan, Noah J.

doi:10.1038/s42256-023-00745-y

Article
Open access
Published: 26 October 2023

Nature Machine Intelligence volume 5, pages 1285–1296 (2023)

Abstract

Trade-offs between producing costly movements for gathering information (‘explore’) and using previously acquired information to achieve a goal (‘exploit’) arise in a wide variety of problems, including foraging, reinforcement learning and sensorimotor control. Determining the optimal balance between exploration and exploitation is computationally intractable, necessitating heuristic solutions. Here we show that the electric fish Eigenmannia virescens uses a salience-dependent mode-switching strategy to solve the explore–exploit conflict during a refuge-tracking task in which the same category of movement (fore-aft swimming) is used for both gathering information and achieving task goals. The fish produced distinctive non-Gaussian distributions of movement velocities characterized by sharp peaks for slower, task-oriented ‘exploit’ movements and broad shoulders for faster ‘explore’ movements. The measures of non-normality increased with increased sensory salience, corresponding to a decrease in the prevalence of fast explore movements. We found the same sensory salience-dependent mode-switching behaviour across ten phylogenetically diverse organisms, from amoebae to humans, performing tasks such as postural balance and target tracking. We propose a state-uncertainty-based mode-switching heuristic that reproduces the distinctive velocity distribution, rationalizes modulation by sensory salience and outperforms the classic persistent excitation approach while using less energy. This mode-switching heuristic provides insights into purposeful exploratory behaviours in organisms, as well as a framework for more efficient state estimation and control of robots.

Main

Organisms show complex patterns of movement that arise from the interplay between obtaining information (‘explore’)^1,2,3 and using current information (‘exploit’)⁴. Exploratory movements to gain information and exploitative movements to achieve the task at hand are often mediated by the same motor systems. For example, the weakly electric glass knifefish (Eigenmannia virescens) produces both information-seeking exploratory movements^2,3,5 and goal-driven exploitative movements to remain within a refuge^6,7 using the same ribbon-fin locomotor system^8,9. Both of these types of movement occur in a single linear dimension, along the rostrocaudal axis. This behaviour makes E. virescens an excellent model system with which to investigate the interplay between explore and exploit movements: within a fixed refuge, fish produce ancillary back-and-forth exploratory movements to sense the refuge^2,3,5, but these back-and-forth (explore) movements conflict with the corrective movements (exploit) required for station keeping.

Resolving this conflict between explore movements⁵ and goal-directed exploit movements is a computationally intractable optimization problem^10,11,12. How do organisms resolve the explore–exploit conflict? A simple heuristic to solve this problem would be for an organism to perform goal-directed exploit movements while superimposing continuous small exploratory sensing movements, in other words, to use a persistent excitation approach¹³. Indeed, this heuristic has proven effective (if suboptimal) as an engineering approach to solve the explore–exploit problem of identifying states and parameters of a dynamical system during task execution¹³. If organisms were to employ such a strategy, they would produce movement statistics that correspond to a single behavioural mode (for example, a single-component Gaussian distribution) that continuously superimposes explore and exploit behaviour.

In contrast, we discovered that E. virescens does not use a persistent excitation strategy; instead, it shows a mode-switching strategy between fast, active-sensing movements (explore) and slow, corrective movements (exploit). This mode switching is modulated by sensory salience (Figs. 1 and 2). To assess the generality of this mode-switching strategy we investigated ten additional tasks performed by ten species ranging from amoebae to humans^{14,15,16,17,18,19,20,21,22}, using five major sensing modalities: vision, audition, olfaction, tactile sensing and electrosensation (Fig. 3). On the basis of this extensive reanalysis, we found that such mode switching, and its dependence on sensory salience, is found across diverse behaviours, taxa and sensing modalities (Fig. 4). Inspired by this widespread biological strategy, we propose an engineering heuristic for selecting behavioural modes based on state uncertainty (Fig. 5), and show that this heuristic captures key features of mode switching found across organismal models. Furthermore, we show that this mode-switching heuristic can achieve better task-level performance, and do so with less control effort, than the conventional persistent-excitation strategy.

**Fig. 1: Velocity distributions are broad-shouldered.**

**Fig. 2: Bursts of faster movements are more common in lights-off trials than in lights-on trials.**

**Fig. 3: Diversity of organisms and sensing modalities used for analysing mode-switching strategies.**

**Fig. 4: Broad-shouldered velocity distribution is found across taxa, behaviours and sensing modalities.**

**Fig. 5: Template model illustrating three different exploration strategies.**

E. virescens exhibited fast and slow behavioural modes

We examined the behaviour of individual E. virescens as they performed untrained station keeping within a fixed refuge (Fig. 1a,b). Station keeping requires only small corrective movements; therefore, any significant movements by the fish are attributed to information-seeking, exploratory movement^2,3,5. Previous work²³ has demonstrated that E. virescens use both vision and electrosense for station keeping. Hence varying the light level is an experimental mechanism to examine the effect of visual salience on the selection between explore and exploit movements.

We measured the movements of five individual fish in 40 s duration station-keeping trials, in two lighting conditions: lights ‘off’ trials had low illumination (~0.3 lx; Supplementary Video 1) and lights ‘on’ trials had bright illumination (~80 lx; Supplementary Video 2). We conducted between seven and ten trials per condition per fish. We discarded trials in which the fish changed its swimming direction or exited the refuge. Consistent with previous studies^2,5, fish moved significantly more in lights-off conditions than in lights-on conditions (Fig. 1c,d and Extended Data Fig. 1a–d). However, these previous analyses^2,5 focused on tracking performance using analytical methods including Fourier analysis and root-mean-square (RMS) metrics that masked the temporal structure of the active-sensing movements that we seek to understand in this paper.

We found the patterns of fish swimming velocities were consistent with a mode-switching strategy. The distribution of velocities (v) featured a sharp peak around v = 0 with ‘broad’ shoulders for faster movements (Fig. 1e,f, right). These empirical distributions differed from a Gaussian distribution in two ways: (1) the distinct central peak and (2) the broad shoulders corresponding to the faster movements. The central peak (near-zero velocity) represents slow movement and the broad shoulders represent faster movement. These behavioural modes are associated with exploit and explore, respectively, as discussed in greater detail below.

The two behavioural modes were significantly better approximated by three-component Gaussian mixture models (GMMs) than by single-component models (Fig. 1e,f, right). This was shown by three measures, namely Kullback–Leibler divergence, Bayesian information criterion and closeness of quantile–quantile plots to the reference line (Extended Data Table 1, Extended Data Fig. 1e,f and Supplementary Fig. 1). The three-component GMMs generally comprised a sharp central Gaussian peak, capturing slow, task-oriented station-keeping movements, and two Gaussian ‘shoulders’, capturing faster, positive (forwards) and negative (backwards) exploratory movements. We found that only modest improvements in the fit of the GMMs occurred when using more than three components (Extended Data Fig. 1g).
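The model comparison described above can be illustrated with a minimal sketch. It assumes synthetic velocities (a hypothetical mixture, not the fish data), a hand-written expectation-maximization routine named `fit_gmm_1d`, and the Bayesian information criterion as the only fit measure; the Kullback–Leibler and quantile–quantile comparisons are not reproduced here.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture evaluated at x."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def fit_gmm_1d(data, k, iters=100, min_std=1e-3):
    """Fit a k-component 1D Gaussian mixture by expectation-maximization."""
    n = len(data)
    ordered = sorted(data)
    # Initial means at evenly spaced quantiles; shared spread; equal weights.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    centre = sum(data) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)  # floor avoids collapse
    log_lik = sum(math.log(mixture_pdf(x, weights, means, stds) or 1e-300) for x in data)
    return weights, means, stds, log_lik


def bic(log_lik, k, n):
    """BIC for a 1D mixture with k means, k variances and k - 1 free weights."""
    return (3 * k - 1) * math.log(n) - 2.0 * log_lik


# Synthetic velocities (assumed values): sharp central peak plus two shoulders.
rng = random.Random(0)
velocities = []
for _ in range(1000):
    draw = rng.random()
    if draw < 0.6:
        velocities.append(rng.gauss(0.0, 0.05))   # slow, station-keeping mode
    elif draw < 0.8:
        velocities.append(rng.gauss(1.0, 0.4))    # fast forwards movements
    else:
        velocities.append(rng.gauss(-1.0, 0.4))   # fast backwards movements

weights_1, means_1, stds_1, ll_1 = fit_gmm_1d(velocities, k=1)
weights_3, means_3, stds_3, ll_3 = fit_gmm_1d(velocities, k=3)
bic_1 = bic(ll_1, 1, len(velocities))
bic_3 = bic(ll_3, 3, len(velocities))
print("BIC, 1 component:", round(bic_1, 1))
print("BIC, 3 components:", round(bic_3, 1))
```

A lower BIC indicates the preferred model. In practice a library implementation of Gaussian mixtures would normally replace the hand-written routine.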

The fast-mode movements increased in frequency in lights-off trials, increasing the relative prominence of the ‘shoulders’. For example, Fig. 1e,f shows representative data from one fish in which there were 48 fast movements with lights off (Fig. 1e, left) but only 13 fast movements with lights on (Fig. 1f, left). Interestingly, the overall higher proportion of fast velocities in lights-off trials leads to a surprising result, namely, higher kurtosis values for lights-on versus lights-off trials (Extended Data Fig. 1h). In other words, the increase in frequency of fast motions in the dark leads to a decrease in the relative prominence of the central, task-oriented velocity peak at v = 0, so that the overall distribution is closer to Gaussian and the kurtosis trends towards 3.
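The counter-intuitive kurtosis trend can be checked with a short calculation. The sketch below assumes a simplified two-component picture in which both modes are zero-mean Gaussians (a narrow slow mode and a broad fast mode); the standard deviations and the function name `mixture_kurtosis` are illustrative choices, not values from the experiments.

```python
def mixture_kurtosis(p_fast, std_slow=0.05, std_fast=1.0):
    """Kurtosis of (1 - p_fast) * N(0, std_slow^2) + p_fast * N(0, std_fast^2)."""
    p_slow = 1.0 - p_fast
    # The mixture is zero-mean, so raw moments equal central moments.
    second_moment = p_slow * std_slow ** 2 + p_fast * std_fast ** 2
    # The fourth moment of a zero-mean Gaussian is 3 * std^4.
    fourth_moment = 3.0 * (p_slow * std_slow ** 4 + p_fast * std_fast ** 4)
    return fourth_moment / second_moment ** 2


# As the fast mode takes up more of the trial, kurtosis falls towards 3.
for p_fast in (0.1, 0.3, 0.6, 1.0):
    print(p_fast, round(mixture_kurtosis(p_fast), 2))
```

Under these assumptions, a larger share of fast movements dilutes the central peak and the kurtosis approaches the Gaussian value of 3, matching the direction of the lights-off result.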

We found the trend towards a Gaussian distribution of movement velocities in lights-off trials (reduced sensory salience) surprising because the exploratory movements for actively sensing the environment are associated with a nonlinear requirement^24,25 to make movements that are potentially in conflict with task goals. Therefore, our initial hypothesis, that this nonlinearity would produce increased deviation from a Gaussian velocity distribution as sensory salience was reduced, was not supported. Our initial intuition failed because we did not appreciate that decreases in sensory salience drive the selection of explore behaviour, and that this behaviour is itself approximately Gaussian, ultimately reducing the relative prominence of the task-oriented central peak.

Interestingly, reanalysis of data from a previous study of exploratory movements in a similar refuge-tracking paradigm in E. virescens shows the same relationship between sensory salience and changes in velocity profiles, but for modulations of a different sensory modality, namely, electrosensation³. In these previous experiments, artificially generated electrical signals were used to diminish the salience of electrosensory information as the electric fish performed the refuge tracking. Our reanalysis of these published data (see Supplementary Material and Methods for details) showed that fish exhibited the distinctive non-Gaussian distribution of velocities. Moreover, the velocity distributions were modulated in relation to electrosensory salience: lower kurtosis values (corresponding to more normal distributions) occurred in experimental trials with added artificial electrosensory ‘jamming’ signals (Extended Data Fig. 2a–f).

Sensory salience drives explore–exploit mode switching

How do changes in sensory salience drive changes in mode switching? To investigate this question, we segregated the velocity trajectories into ‘S’, a slow-velocity mode (exploit) comprising task-oriented, station-keeping movements, and ‘F’, a fast-velocity mode (explore) comprising large positive (forwards) and negative (backwards) movement velocities (Fig. 2a,b, Extended Data Fig. 3 and Supplementary Fig. 2; see Methods for different clustering algorithms used).

Fish produced slow- and fast-velocity modes of movements in both lights-on and lights-off trials. We computed the residence time in each behavioural mode as the proportion of the 40 s trial duration spent in that mode (the residence times in the slow and fast modes therefore sum to unity). The residence time τ_s in the slow exploit mode was significantly higher (>1.7 times) in lights-on trials than in lights-off trials (Fig. 2c). In contrast, the residence time in the fast explore mode (1 − τ_s) was higher in lights-off than in lights-on trials.

Fish switched between slow and fast modes more frequently in lights-off trials than in lights-on trials (Fig. 2d). Modelling the switching between slow (S) and fast (F) modes as a two-state Markov process, we found that the transition rate S → F was significantly lower in lights-on than in lights-off trials, that is, the slow (exploit) state was visited more frequently in the lights-on trials compared with lights-off trials (Extended Data Fig. 3b). This salience-dependent modulation of switching frequency was the key mechanism by which movement velocity distributions trended towards a Gaussian distribution as a function of decreased sensory salience.
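As an illustration of these mode statistics, the sketch below computes the slow-mode residence fraction and per-sample transition probabilities from a sequence of mode labels. It assumes the velocity samples have already been clustered into 'S' and 'F' labels, and it uses a discrete-time maximum-likelihood estimate of the two-state Markov transitions, which may differ from the estimator used in the Methods. The function name `mode_statistics` and the toy label sequence are hypothetical.

```python
def mode_statistics(labels):
    """Residence fraction and transition probabilities for an S/F label sequence."""
    n = len(labels)
    residence_slow = labels.count("S") / n
    counts = {("S", "S"): 0, ("S", "F"): 0, ("F", "S"): 0, ("F", "F"): 0}
    # Count transitions between consecutive samples.
    for current, following in zip(labels, labels[1:]):
        counts[(current, following)] += 1
    from_slow = counts[("S", "S")] + counts[("S", "F")]
    from_fast = counts[("F", "S")] + counts[("F", "F")]
    return {
        "residence_slow": residence_slow,
        "residence_fast": 1.0 - residence_slow,
        # Probability per sample of leaving the slow (or fast) mode.
        "p_slow_to_fast": counts[("S", "F")] / from_slow if from_slow else float("nan"),
        "p_fast_to_slow": counts[("F", "S")] / from_fast if from_fast else float("nan"),
        "n_switches": counts[("S", "F")] + counts[("F", "S")],
    }


# Toy label sequence (assumed, one label per velocity sample).
labels = list("SSSFFSSSSF")
stats = mode_statistics(labels)
print(stats)
```

Dividing a per-sample transition probability by the sampling interval gives an approximate transition rate when that probability is small.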

Mode-switching across taxa, behaviours and sensory modalities

Is this mode-switching solution to the explore-versus-exploit problem found in other species, in other categories of behavioural tasks and in control systems that rely on other sensing modalities?

To answer this question, we analysed published data for an additional ten species, representing a wide phylogenetic range of taxa, from single-celled organisms to humans, involving categorically different tasks and sensorimotor regimes^{3,14,15,16,17,18,19,20,22}. These taxonomically diverse species were selected to encompass a wide range of behaviours that rely on a broad range of sensory systems (Fig. 3). For every example we examined, we found the same distinctive non-normal distribution of velocities, with a peak at low-velocity movements and broad shoulders for higher-velocity movements (Fig. 4).

For example, postural sway movements in humans are thought to prevent the fading of postural state information during balance²⁶. Our reanalysis of quiescent stance data¹⁷ revealed evidence of mode switching (Fig. 4a) that is remarkably similar to our findings in electric fish. In the quiescent stance task, human participants used visual and tactile feedback to maintain an upright posture. The distribution of sway velocities revealed a distinct peak at low velocities corresponding to the task goal, and broad shoulders for higher velocities produced by the exploratory movements; the velocity statistics were better captured by a GMM than by a normal distribution. Furthermore, the velocity distribution showed the same surprising relation to changes in salience as seen in the electric fish E. virescens: when sensory salience was decreased, the distribution became more Gaussian and the switching frequency increased (Extended Data Fig. 4a–d).

Mode switching was also observed in invertebrate species. For example, the Carolina sphinx hawkmoth (Manduca sexta) uses somatosensory feedback from its proboscis to detect the curvature of flowers when searching for nectaries at dawn and dusk²⁰. In this search behaviour, which has dynamics that are qualitatively similar to vibrissal sensing in rats²⁷, the moth sweeps its proboscis across the surface of the flower using a combination of slow- and high-velocity movements. Our analysis of the distribution of the rate of change of radial orientation angle (the angle between the proboscis tip trajectory and the radial axis of the flower) before the insertion of the proboscis tip into the nectary shows the characteristic sharp peak with broad shoulders (Fig. 4g) that is captured by a GMM. Experimental changes in the shape of artificial flowers that degrade the salience of the curvature of the flower surface²⁰ resulted in a decrease of the kurtosis value of the proboscis angular velocity distribution (Extended Data Fig. 5a–f), similar to how both E. virescens and humans responded to changes in sensory salience.

The fact that this salience-based, mode-switching strategy was found in two distantly related classes (Mammalia and Insecta) performing very different behaviours, using different sensorimotor systems, suggests that the strategy emerged as a convergent solution to the explore-versus-exploit problem. We found additional evidence of convergence of this solution in reanalysis of eight additional datasets: visual saccades in humans (Fig. 4b and Supplementary Fig. 3)²¹ and in house mice Mus musculus (Fig. 3c and Supplementary Fig. 4)¹⁵, movements of the pinnae of echolocating big brown bats Eptesicus fuscus (Fig. 4d and Supplementary Fig. 5)¹⁶, olfaction in eastern moles Scalopus aquaticus (Fig. 4e and Supplementary Fig. 6)¹⁴ and American cockroaches Periplaneta americana (Fig. 4f and Extended Data Fig. 4e–f)¹⁸, and visual tracking of a swaying flower in three species of hawkmoths (M. sexta, Deilephila elpenor and Macroglossum stellatarum; Fig. 4h and Extended Data Fig. 5g–p)¹⁹. The discovery of similar, parsimonious velocity distributions across taxa, behaviours and sensing modalities, with consistent dependency on sensory salience, was surprising.

Intriguingly, our analysis of the dynamics of transverse exploration by pseudopods of amoebae²² (Amoeba proteus and Metamoeba leningradensis) reveals similar GMM velocity distributions in response to an electric field (Extended Data Fig. 6). Although our modelling approach (see next section) includes inertial dynamics, which cannot be directly applied to movement of organisms in the low-Reynolds-number regimes occupied by single-celled and other microscopic organisms, these observations are consistent with a mode-switching strategy for the control of movement in these amoebae.

The examples described above include a broad phylogenetic array of organisms that perform a variety of behavioural tasks using different control and morphophysiological systems. Just as these behavioural systems evolved within each of the lineages represented in our reanalyses, we suggest that mode switching probably evolved independently in each lineage as well. In other words, the similarities we found across taxa are the result of convergent evolution towards a common solution—mode switching—for the explore-versus-exploit problem.

Heuristic model of the mode-switching strategy

Why might animals use mode switching, rather than the simpler heuristic of applying continual, low-amplitude exploratory inputs used by control engineers¹³? To address this question, we propose a parsimonious heuristic model that comprises a nonlinear motion-dependent sensor, a linear musculoskeletal plant, a state estimator (also known as an observer²⁸) and a mode-switching controller (Fig. 5a). For the musculoskeletal plant, we assumed a simplified second-order Newtonian model^9,24:

$$\begin{array}{rcl}\frac{{\mathrm{d}}x(t)}{{\mathrm{d}}t}&=&v(t),\\ \frac{{\mathrm{d}}v(t)}{{\mathrm{d}}t}&=&-\frac{b}{m}v(t)+\frac{1}{m}u(t)+{w}_{1}(t).\end{array}$$

(1)

Here t is dimensionless time, x(t) is dimensionless position, v(t) is dimensionless velocity, u(t) is the controller input and w₁(t) is process noise. The process noise includes noise due to physical disturbances^29,30 as well as motor noise³¹. The system parameters m and b represent unitless mass and viscous damping, respectively.
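A minimal numerical sketch of equation (1) follows. It assumes a fixed-step Euler–Maruyama discretization and treats w₁(t) as white noise of intensity `noise_std`; the function name `simulate_plant`, the step size and the parameter values are illustrative assumptions rather than the settings used for Fig. 5.

```python
import math
import random


def simulate_plant(u, n_steps, dt=1e-3, x0=0.0, v0=0.0, m=1.0, b=1.0,
                   noise_std=0.0, seed=0):
    """Integrate dx/dt = v, dv/dt = -(b/m) v + u/m + w1 with Euler-Maruyama steps.

    `u` is a function of time returning the control input.
    """
    rng = random.Random(seed)
    x, v = x0, v0
    xs, vs = [x], [v]
    for k in range(n_steps):
        t = k * dt
        accel = -(b / m) * v + u(t) / m
        # Process noise enters the velocity equation, scaled by sqrt(dt).
        noise = noise_std * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x, v = x + v * dt, v + accel * dt + noise
        xs.append(x)
        vs.append(v)
    return xs, vs


# Unforced, noise-free check: velocity should decay as exp(-(b/m) t).
xs, vs = simulate_plant(lambda t: 0.0, n_steps=1000, v0=1.0)
print("v(1) =", round(vs[-1], 4), "expected about", round(math.exp(-1.0), 4))
print("x(1) =", round(xs[-1], 4), "expected about", round(1.0 - math.exp(-1.0), 4))
```

With no input and no noise the damping term alone governs the motion, which gives a simple analytic check on the integrator before any controller is attached.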

The key feature of the model is that the nonlinear sensory system (that is, the ‘motion-dependent sensor’) embodies the high-pass filtering (that is, fading or adapting) characteristics found across biological sensory systems^{26,32,33,34,35,36,37,38}. This sensory system model (‘motion-dependent sensor’ in Fig. 5a) assumes nonlinear measurements that decay to zero over time in the face of constant stimuli:

$$y(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}t}s(x(t))+{w}_{2}(t)=g(x)v(t)+{w}_{2}(t).$$

(2)

Here s(x) is the position-dependent sensory stimulus experienced by the organism, g(x) is the spatial derivative of the sensory stimulus (ds(x)/dx), and w₂(t) is the sensory noise. The controller includes a state-feedback-based task-level control policy, $-f(\hat{x},\hat{v})$, which exploits previously collected sensory information; that information is parsimoniously encoded (1) in estimates of the position and velocity $(\hat{x},\hat{v})$, and (2) in an ongoing measure of uncertainty, M (based on the covariance of position and velocity estimates; see Methods). Previous theoretical work has demonstrated that exploratory movements are required for state estimation in control systems that rely on such high-pass (that is, fading) sensors^24,25. Hence, the controller also includes an active-sensing control policy, u_a(t), that seeks to gain new information through exploratory movements.
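The fading property of equation (2) can be made concrete with a small sketch. It assumes a hypothetical stimulus profile s(x) = sin(x), so that g(x) = cos(x); the actual stimulus profile is not specified in this section, and the function names are illustrative.

```python
import math


def stimulus(x):
    """Hypothetical position-dependent stimulus s(x); an assumption for illustration."""
    return math.sin(x)


def stimulus_gradient(x):
    """Spatial derivative g(x) = ds/dx of the hypothetical stimulus."""
    return math.cos(x)


def sensor_output(x, v, noise=0.0):
    """Motion-dependent measurement y = g(x) v + w2 from equation (2)."""
    return stimulus_gradient(x) * v + noise


# A stationary organism (v = 0) receives no position information, wherever it is.
y_still = sensor_output(0.7, 0.0)

# Chain-rule check: d/dt s(x(t)) equals g(x) v along the path x(t) = 0.3 + 0.5 t.
t, h, v = 1.0, 1e-5, 0.5
x_at = lambda time: 0.3 + v * time
finite_difference = (stimulus(x_at(t + h)) - stimulus(x_at(t - h))) / (2.0 * h)
y_moving = sensor_output(x_at(t), v)
print(y_still, finite_difference, y_moving)
```

Because the noise-free output vanishes whenever v = 0, an estimator that relies on this sensor gains information about position only while the organism moves, which is why exploratory movement is needed.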

To find the optimal balance between exploit and explore components, for a given admissible control policy π and a given weight r on the input $u\in\mathcal{U}$ (the action space), we can define the average steady-state cost function:

$$J_{\pi}=\limsup_{t\to\infty}\mathbb{E}[x(t)^{2}+ru(t)^{2}]$$

(3)

where $\mathbb{E}$ is the expectation computed over all the trajectories induced by the admissible control policy π. Note that a control policy is admissible if it depends causally on the sensor and actuator data. We chose the cost function J_π as a weighted combination of steady-state tracking error and control effort. Even with complete knowledge of the system states, computation of the optimal solution $J_{\pi^{*}}=\inf_{\pi\in\Pi}J_{\pi}$, where Π is the set of all admissible control policies, is only tractable in the case of linear systems or systems with finite state and action spaces³⁹. As the system is partially observed, existing approaches to optimal control require the solution to an optimal filtering problem and then formulate feedback laws on the filter states³⁹. The filtering problem requires computation of the conditional probability $\mathbb{P}\left(\begin{bmatrix}x(t)\\ v(t)\end{bmatrix}\,\middle|\,y(\tau),u(\tau)\ \forall\tau\le t\right)$. However, due to nonlinearity in the measurement (equation (2)), there is no tractable method to compute this conditional probability, and so heuristic strategies are required. We tested three exploratory movement heuristics for the controller to find an approximate answer to this intractable optimal control problem.
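For a simulated trajectory, the cost in equation (3) can be approximated numerically. The sketch below replaces the steady-state expectation with a time average over one sampled trajectory, which assumes the closed-loop process is ergodic and that the samples are taken after transients have decayed; the function names and the toy data are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(value * value for value in values) / len(values))


def empirical_cost(positions, inputs, r):
    """Time-average estimate of E[x^2 + r u^2] from sampled x(t) and u(t)."""
    total = sum(x * x + r * u * u for x, u in zip(positions, inputs))
    return total / len(positions)


# Toy trajectory (assumed values), with the target position at x = 0.
positions = [1.0, -1.0, 1.0, -1.0]
inputs = [2.0, 0.0, -2.0, 0.0]
cost = empirical_cost(positions, inputs, r=0.5)
e_rms = rms(positions)   # tracking error
u_rms = rms(inputs)      # control effort
print(cost, e_rms, u_rms)
```

Written this way, the estimated cost equals e_RMS² + r·u_RMS², so the weight r sets how much control effort is traded against tracking error.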

(1) Zero excitation: this is a passive strategy (that is, no exploration) in which the system provides no input excitation for the actuation forces (u_a(t) = 0 for all t). This is a conventional state-feedback controller.
(2) Persistent excitation: this scheme tests a common continuous exploration strategy used in the field of adaptive control¹³. The controller continually injects a Gaussian input u_a(t).
(3) Triggered excitation: this mode-switching strategy depends on lower and upper thresholds, T_min and T_max; the controller injects Gaussian input only when the uncertainty in the state estimator M exceeds T_max, and then continues to inject input until this uncertainty drops below a lower threshold, T_min (Fig. 5b).
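The switching rule of the triggered-excitation strategy can be sketched as a two-threshold (hysteresis) trigger. The example below assumes the uncertainty measure M is already available as a sampled trace; the trace values, the thresholds and the function names are hypothetical, and the state estimator that would produce M is not implemented here.

```python
import random


def triggered_excitation_modes(uncertainty, t_min, t_max, active=False):
    """Return, per sample, whether active sensing is on under hysteresis switching."""
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    modes = []
    for m in uncertainty:
        if not active and m > t_max:
            active = True       # uncertainty too high: start exploring
        elif active and m < t_min:
            active = False      # uncertainty low enough: return to exploiting
        modes.append(active)
    return modes


def active_sensing_input(active, rng, std=1.0):
    """Gaussian exploratory input u_a while active, zero otherwise."""
    return rng.gauss(0.0, std) if active else 0.0


# Hypothetical uncertainty trace M(t) and thresholds.
trace = [0.2, 0.5, 0.9, 1.1, 0.8, 0.5, 0.25, 0.4, 1.2]
modes = triggered_excitation_modes(trace, t_min=0.3, t_max=1.0)
rng = random.Random(0)
u_a = [active_sensing_input(is_active, rng) for is_active in modes]
print(modes)

# With thresholds below every value of M, the trigger stays on throughout,
# which corresponds to persistent excitation.
always_on = triggered_excitation_modes(trace, t_min=-1.0, t_max=0.0)
```

The gap between the two thresholds keeps the controller from chattering between modes when M hovers near a single switching level.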

As previous theoretical work has shown²⁵, the zero-excitation strategy (that is, traditional state-estimate feedback) cannot minimize the state estimation error and hence, not surprisingly, results in poor tracking performance (Fig. 5c), thus illustrating the need for an additional active-sensing component in the face of adaptive sensing and perceptual fading. The persistent-excitation and triggered-excitation strategies both facilitate substantially better position control than does the zero-excitation strategy (Fig. 5c,d). Although these two strategies resulted in comparable tracking errors (e_RMS; Fig. 5b,c), the triggered excitation was more efficient, requiring substantially lower control effort (u_RMS; Fig. 5g,h). Moreover, unlike the persistent-excitation strategy, triggered excitation generated a distinctive broad-shouldered velocity distribution that featured a sharp peak near zero, with broad shoulders corresponding to bursts of fast movement (Fig. 5j,k). This distribution was strikingly similar to experimental observations across organisms (Figs. 1 and 4 and Extended Data Figs. 4–6), suggesting that such broad shoulders are a signature (if not definitive proof) of a mode-switching strategy.

We showed that active exploration is essential for better tracking performance as it improves state estimation. But, there is a point of diminishing returns: although higher (more energetic) active excitation can result in excellent state estimation, there is a point beyond which these additional active-sensing movements lead to greater tracking errors.

To contrast persistent excitation with triggered excitation, we performed a numerical study to obtain the variance of the active-sensing signal u_a(t) that minimizes the RMS tracking error for the persistent excitation strategy, e_PE,min. Note that persistent excitation is the limiting case of triggered excitation with extremely low threshold values (that is, ensuring that the active-sensing mode is always ‘on’). With that optimum stimulation obtained from persistent excitation, we next performed a parameter sweep over the threshold pair (T_max, T_min) in the triggered excitation. We discovered that the choice of thresholds in the triggered-excitation strategy plays an important role: with the right choice of parameters we could achieve better tracking performance (Fig. 5m) at reduced control effort (Fig. 5n). The choice of thresholds also shapes the velocity to best extract sensory information; with low thresholds, the statistics approach those of persistent excitation, whereas high thresholds lead to velocity distributions with higher kurtosis (departure from normality), while requiring less control effort (Extended Data Fig. 7a–c).

How does sensory salience affect performance of the triggered excitation (mode switching) heuristic? To simulate the effects of changes in sensory salience, we parametrically varied the sensory noise variance while keeping constant the switching thresholds T_min and T_max. As the sensory noise variance was increased (simulating a decrease in salience), the kurtosis value of the velocity distribution decreased, numerically approaching normality in the limit of high sensory noise (Extended Data Fig. 7d–g). This trend of decreased kurtosis in the face of increased noise variance captures the widespread observation that, in animals, the velocity statistics tend towards a Gaussian distribution as sensory salience is decreased. Moreover, the underlying mechanism, namely, increasing frequency of bursts of exploratory movements, matches our experimental observations in E. virescens, which performed more frequent transitions to fast movements and spent less time in the slow mode in the lights-off trials than in the lights-on trials (Fig. 2c,d). These analyses clarify that this trend towards a Gaussian distribution with decreased salience is an epiphenomenon of mode switching: as the frequency of fast movement bursts increases, it overwhelms the task-oriented movements, diminishing the prominence of the central peak.

Discussion

We examined explore–exploit trade-offs in the context of goal-directed motor behaviours, such as station keeping, postural balance and plume tracking, that require active, exploratory movements to enhance sensation. We discovered that the velocity distributions that emerge from the interplay between exploratory movements and goal-directed control are broad-shouldered across taxa, and that this distinctive distribution of movements is robustly modulated by sensory salience. The bouts of ancillary movements that comprise the broad shoulders of these velocity distributions are commonly described as ‘active sensing’, that is, the expenditure of energy by organisms for the purpose of sensing⁴⁰, for example, the ancillary movements described here. Active sensing also includes the emission of energetically costly signals such as electric fields by weakly electric fishes⁴¹ and echolocation calls in dolphins, birds and bats^42,43,44,45. Active-sensing research in humans, in relation to touch, was popularized in the 1960s by J. J. Gibson¹, and the original ideas date back at least to the eighteenth century (for a historical account, see ref. ⁴⁰).

Surprisingly, active sensing is largely avoided in engineering design despite being ubiquitous in animals. The performance of engineered systems may benefit from the generation of movement for improved sensing. An algorithm known as Ergodic Information Harvesting (EIH)³ could be used to control movements for sensing in artificial systems. This algorithm balances the energetic costs of generating movements against the expected reduction in sensory entropy. The EIH has been tested in relation to several animal model systems, and produces plausible animal trajectories³.

Interestingly, the EIH algorithm produces the opposite trend in kurtosis of velocity distributions in relation to sensory salience (Extended Data Fig. 2g–l and Extended Data Table 3) to the one we observed in our experiments, in our reanalysis of previous data and in our model, in which, as sensory salience decreases, there is an increase in active-sensing movements but a decrease in kurtosis (Extended Data Table 2). That EIH instead leads to an increase in kurtosis occurs in part because EIH generates continuous-sensing movements, and does not incorporate mode switching. A refined EIH model that generates the temporally distinct periods of sensing movements that characterize mode switching would better reflect our findings in animals, and is a promising strategy for improving the performance of robotic control systems.

How mode switching is manifest across the diverse biological systems we examined is a compelling open question. Many of these control systems have evolved via convergent evolution in which adaptive strategies emerge independently across lineages. One result of convergent adaptation is that species often rely on idiosyncratic features, such as feathers or skin flaps, to achieve the same adaptive strategy, such as flapping flight. We infer that the mechanisms for mode switching are present in control systems that range from subcellular systems⁴⁶ to neural systems in vertebrates.

The mechanisms for mode switching in vertebrate nervous systems may emerge at different levels within sensorimotor control pathways. For example, neurophysiological recordings show that sensory salience can be encoded in brain circuits via synchronization and desynchronization of spiking activity⁴⁷. Such population coding of salience^48,49,50, when coupled with a threshold, could trigger discrete bursts of motor activity for sensing⁸. Motor circuits for the production of discrete bursts of movement occur in spinal circuits⁵¹.

These discrete bursts of movements could arise from reflex-like, threshold-based activity in animals, akin to how Mauthner cells trigger a cascade of motor activity when sensory inputs exceed a threshold⁵². A key difference between reflex-like, threshold-based behaviours and the mode switching we describe in this paper would be that the signal in question would arise from an internal representation of sensory uncertainty, rather than from the overall level of sensory excitation. Such a reflex-like action could produce stereotyped forms of interactions with the external environment in relation to sensing⁸.

A common engineering approach to sensing and control is to add sensors and improve sensor performance, particularly at low frequencies, effectively side-stepping the need for active-sensing movements altogether. Such improved sensing enhances observability without relying on movement. In stark contrast, organismal sensor systems are almost invariably adapting (high pass), necessitating active sensing. Irrespective of whether organisms have achieved an optimal solution to the control problem (or instead are limited by evolutionary constraints on sensor performance), the widespread convergent evolution of a common active-sensing strategy nevertheless suggests an alternative engineering design paradigm. The confluence of adapting sensors⁵³ and the uncertainty-triggered mode-switching heuristic presented in this paper provides a new roadmap for movement control of robotic systems.

In this paper, the explore–exploit trade-off arises from the need for active-state estimation²⁸ in a subset of tasks in which movement is used both for acquisition of information and achieving task goals. However, similar trade-offs arise in a wide variety of potentially more complex behaviours. For example, in foraging where the resources are found in patchy distributions, organisms balance the trade-offs between exploiting a local food source, exploring for distant sources⁵⁴ and the costs of predation across the habitat⁵⁵. Similarly, reinforcement learning involves choosing whether to adhere to a familiar option with a known reward or to take the risk of exploring unknown options that can lead to increased rewards over the longer term⁵⁶. We do not have direct evidence that the broad-shouldered feature we have identified in the animal movements described here (Figs. 1 and 4), which reflects mode switching, is also found in these behavioural domains across taxa. Recent evidence from studies of human reinforcement learning, however, appears to be consistent with mode-switching behaviour.

Methods

Tracking of glass knifefish

Subjects

We obtained adult, weakly electric, glass knifefish E. virescens (10–15 cm in length) from commercial vendors, and housed the fish according to the published guidelines⁵⁷. The water temperature in the experimental tank was kept between 24 °C and 27 °C, and conductivity ranged from 10 μS cm⁻¹ to 150 μS cm⁻¹. Fish were transferred from the holding tank to the experimental tank 12–24 h before the experiments to allow for acclimation. All experimental procedures were approved by the Johns Hopkins Animal Care and Use Committee, and followed guidelines established by the National Research Council and the Society for Neuroscience.

Experimental apparatus

The experimental apparatus was similar to that used in previous studies^2,6,8,23,58. The refuge was machined from a 152-mm-long segment of 46 × 50 mm rectangular PVC tubing, with the bottom surface removed to allow the camera to record the ventral view of the fish. On both sides of the refuge, a series of six rectangular windows (6 mm wide × 31 mm high, spaced 19 mm apart) was machined to provide visual and electrosensory cues.

A computer sent designed digitized input stimuli (25 Hz) from LabVIEW (National Instruments) to a Field Programmable Gate Array based controller for a stepper motor (STS-0620-R, HW W Technologies). The stepper motor drove a linear actuator, leading to the one-degree-of-freedom refuge movement in real time. A video camera (pco.1200, PCO AG) captured fish movements through mirror reflection at 100 Hz. The captured frames (width × height, 1,280 pixels × 276 pixels) were saved as 16 bit .tif files via camera application software (pco.camware, PCO AG).

Experimental procedure

The experiments were conducted in two illuminance levels—around 0.3 lx (lights off) and 80 lx (lights on). Each trial lasted for 60 s. During the initial 10 s of each trial, the refuge was actuated to follow a 0.45 Hz sinusoidal trajectory, the amplitude of which was gradually increased to 3 cm, and then decreased to 0 at the end of the 10 s interval, in a similar fashion as described in ref. ². After the initiation phase, the refuge remained stationary for 40 s, finally followed by a termination phase for 10 s, during which the refuge was actuated in a similar fashion as during the initiation phase.

Tracking algorithm

To observe fine details of the fish movement, we used a high frame rate in our video recordings. High tracking accuracy was essential as the position and velocity data were likely to be contaminated by measurement noise. To ensure high tracking accuracy, the refuge and fish position were analysed by custom video tracking software⁵⁹ developed by Balázs P. Vágvölgyi from the Laboratory for Computational Sensing and Robotics, at Johns Hopkins University.

The tracking algorithm worked in two phases. The first phase was template matching, which roughly located the targets (fish or refuge). In the first frame of a given video, a rectangular region was manually selected around the target to create a template. On subsequent frames, a neighbourhood region around the template (±20 pixels) from the previous frame was selected for the computation of a normalized two-dimensional cross-correlation matrix. If the target changed its orientation in the new frames, the new image frame was first rotated to match the orientation of the template from the previous frame before the normalized two-dimensional cross-correlation was computed. If needed, areas of the image were sampled (then scaled and interpolated if necessary) with subpixel accuracy.
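The template-matching phase can be illustrated with a minimal sketch. This is not the tracking software used in the study: it assumes a zero-mean form of the normalized cross-correlation, integer-pixel offsets only, and no rotation or subpixel refinement. The function names `ncc` and `best_match` and the toy frame are hypothetical.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    p = [value for row in patch for value in row]
    t = [value for row in template for value in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t))
    return num / den if den > 0 else 0.0      # flat patches score 0

def best_match(image, template, previous, radius=20):
    """Search top-left corners within +/- radius pixels of the previous location."""
    th, tw = len(template), len(template[0])
    rows, cols = len(image), len(image[0])
    best_score, best_pos = -2.0, previous
    for r in range(max(0, previous[0] - radius), min(rows - th, previous[0] + radius) + 1):
        for c in range(max(0, previous[1] - radius), min(cols - tw, previous[1] + radius) + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos

# Hypothetical 12 x 12 frame with a 2 x 2 target whose top-left corner is at (5, 7)
template = [[1, 2], [3, 4]]
frame = [[0] * 12 for _ in range(12)]
for i in range(2):
    for j in range(2):
        frame[5 + i][7 + j] = template[i][j]
score, position = best_match(frame, template, previous=(4, 6), radius=3)
```

Restricting the search to a small neighbourhood of the previous location keeps the cost per frame low, which matters at a 100 Hz frame rate.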

After creating the template, the second phase applied the Levenberg–Marquardt algorithm to find the global maximum of the normalized cross-correlation function. This step produced a match between the template and target at each frame, with subpixel accuracy. We performed extensive preliminary testing and analysis to confirm that the remaining measurement errors had smaller variance than the stochastic movements of the fish.

Data processing

The tracking algorithm stored the fish position in both horizontal and vertical directions (originally in pixels, along with the respective pixel-to-metre conversion factor) and the angle of orientation (in degrees) in .csv files. We used only the data while the refuge was stationary (40 s, 4,000 data points in total) for each trial. To further reduce the measurement noise, the position data were filtered through a Butterworth zero-phase distortion filter (filtfilt command in MATLAB) with a 10 Hz cut-off frequency. Fish velocity in the horizontal direction was computed as forward differences of the horizontal position time series.
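As a small illustration of the last step, the sketch below computes forward differences of a position series sampled at 100 Hz. Dividing by the sampling interval to obtain a velocity is an assumption here, the low-pass filtering step is omitted, and the function name and sample values are hypothetical.

```python
def forward_difference_velocity(position, dt=0.01):
    """Velocity estimate v[k] = (x[k+1] - x[k]) / dt; one sample shorter than the input."""
    return [(later - earlier) / dt for earlier, later in zip(position, position[1:])]

# Hypothetical horizontal positions in metres, sampled every 0.01 s
x = [0.000, 0.001, 0.003, 0.006]
v = forward_difference_velocity(x)   # approximately 0.1, 0.2 and 0.3 m/s
```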

Identification and characterization of behavioural modes

For the identification of the behavioural modes, we used three different clustering approaches—(1) a GMM with inflection-point-based clustering, (2) hidden Markov model (HMM)-based clustering, and (3) a GMM with maximum a posteriori probability (MAP)-based clustering.

For GMM with inflection-point-based clustering, the velocity (v) data from each individual fish at a specific lighting condition were clustered into three components, slow, fast positive and fast negative (Extended Data Fig. 3a,b), using two velocity thresholds, v_L and v_H (v_L < v_H), resulting in two behavioural modes, slow and fast (fast positive and fast negative were combined). The velocity threshold values were computed by finding the inflection points of the GMM fit to the velocity data, f_GMM, specific to a lighting condition. To identify the inflection points of f_GMM, we numerically computed its second-order derivative with respect to velocity, $f''_{\mathrm{GMM}}$, and located the first and the last indices of the array $f''_{\mathrm{GMM}}/f_{\mathrm{GMM}}$ at which the condition $f''_{\mathrm{GMM}}/f_{\mathrm{GMM}} < c$ was satisfied for a given ad hoc constant c, selected as described below. This method separated the central peak of the velocity distribution f_GMM, located around zero velocity, from the broad shoulders. We chose c = 0.005 for all the individual fish irrespective of the lighting conditions, except for fish 1 lights-off trials (c = 0.02); the different c value for fish 1 lights-off trials was chosen so that the relative area under the central peak of the distribution was less than 0.6 (similar to other fish during lights-off trials). For further analysis of these behavioural modes, we assumed a continuous-time Markov chain model (Extended Data Fig. 3c,d). For infinitesimal dt, the transition probabilities from state i to state j are given in terms of the transition rates Q_ij as follows:

$$\,{{\mbox{Pr}}}\,(v(t+{\mathrm{d}}t)=j| v(t)=i)={Q}_{ij}{\mathrm{d}}t+O({\mathrm{d}}t^2),\quad i\ne j$$

(4a)

$${Q}_{ii}=-\mathop{\sum}\limits_{j\ne i}{Q}_{ij}$$

(4b)

where O denotes order, and the probability matrix P with p_ij = Pr(v(t) = j∣v(0) = i) and transition rate matrix Q with entries Q_ij satisfy the first-order differential equation

$$\frac{{\mathrm{d}}}{{\mathrm{d}}t}{{{\bf{P}}}}={{{\bf{PQ}}}},$$

(5)

whose solution is given by

$${{{\bf{P}}}}={{\rm{e}}}^{t{{{\bf{Q}}}}}=\mathop{\sum }\limits_{n=0}^{\infty }\frac{{(t{{{\bf{Q}}}})}^{n}}{n!}.$$

(6)

For every trial from each individual fish, we computed the probability matrix P with entries p_ij, i = 1, 2, j = 1, 2, where states 1 and 2 correspond to slow (exploit) and fast (explore) modes, respectively. We used the first-order approximation to the matrix exponential in equation (6), ${{{\bf{Q}}}}\approx \frac{1}{h}({{{\bf{P}}}}-{{{\bf{I}}}})$, to compute the transition rates between slow and fast modes in each trial from the respective probability matrix, P. Here h = 0.01 s is the time step and I is the 2 × 2 identity matrix.
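The thresholding and rate-estimation steps of this approach can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: the mixture parameters and velocity samples are hypothetical, the thresholds are taken as the ends of the contiguous run around the mode on which f''_GMM/f_GMM < c (one possible reading of the procedure), and P is estimated from one-step transition frequencies, which the text does not specify. States are indexed 0 (slow) and 1 (fast) rather than 1 and 2.

```python
import math

def gmm_pdf(v, weights, means, sigmas):
    """Density of a one-dimensional Gaussian mixture at velocity v."""
    return sum(w * math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sigmas))

def central_peak_thresholds(grid, pdf, c):
    """Bounds (v_L, v_H) of the contiguous run around the mode where f''/f < c."""
    dv = grid[1] - grid[0]
    last = len(grid) - 2                       # last index with a central difference
    def ratio(i):                              # finite-difference f'' divided by f
        return (pdf[i - 1] - 2 * pdf[i] + pdf[i + 1]) / dv ** 2 / pdf[i]
    lo = hi = max(range(1, last + 1), key=pdf.__getitem__)
    while lo > 1 and ratio(lo - 1) < c:
        lo -= 1
    while hi < last and ratio(hi + 1) < c:
        hi += 1
    return grid[lo], grid[hi]

def transition_rates(modes, h=0.01):
    """Empirical one-step matrix P from a 0/1 mode sequence, then Q = (P - I) / h."""
    counts = [[0, 0], [0, 0]]
    for a, b in zip(modes, modes[1:]):
        counts[a][b] += 1
    P = [[n / sum(row) for n in row] for row in counts]    # assumes both modes occur
    Q = [[(P[i][j] - (1.0 if i == j else 0.0)) / h for j in range(2)] for i in range(2)]
    return P, Q

# Hypothetical three-component fit (slow, fast negative, fast positive), arbitrary units
weights, means, sigmas = (0.5, 0.25, 0.25), (0.0, -2.5, 2.5), (0.5, 1.5, 1.5)
grid = [i * 0.01 for i in range(-800, 801)]
pdf = [gmm_pdf(v, weights, means, sigmas) for v in grid]
v_L, v_H = central_peak_thresholds(grid, pdf, c=0.005)

# Hypothetical velocity samples: 0 = slow (exploit), 1 = fast (explore)
velocity = [0.1, -0.2, 0.3, 1.9, 2.4, 0.2, 0.0, -0.1, 0.4, -2.2]
modes = [0 if v_L <= v <= v_H else 1 for v in velocity]
P, Q = transition_rates(modes)
```

Each row of Q sums to zero, in agreement with equation (4b), because each row of P sums to one.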

For HMM clustering, we combined all the positional trial data (x_t) from all five fish at a specific lighting condition along with their negatives (−x_t). The subscript t denotes time. We included the negative data to eliminate any directional bias. We assumed that the observed measurements of position, x_t, follow a homogeneous Markov switching first-order autoregressive model:

$${x}_{t}={\alpha }_{0}^{{s}_{t}}+{\alpha }_{1}^{{s}_{t}}{x}_{t-1}+\sigma {\varepsilon }_{t}$$

(7)

where the superscript s_t ∈ {1, 2, 3} refers to the hidden discrete state, the coefficients $\alpha^{s_t}_k$, $k \in \{0,1\}$, are model parameters, ε_t is Gaussian white noise, and $\sigma$ scales the noise (it is the noise standard deviation when ε_t has unit variance). We fit this model using the NHMSAR package in R.
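To make the generative structure of equation (7) concrete, the following sketch simulates such a process. It does not reproduce the NHMSAR fitting procedure. The coefficient values, the single `stay_prob` parameter that defines the time-invariant switching probabilities, and the function name are all hypothetical, and states are indexed from 0.

```python
import random

def simulate_ms_ar1(n, alpha0, alpha1, sigma, stay_prob=0.95, seed=0):
    """Simulate x_t = alpha0[s_t] + alpha1[s_t] * x_{t-1} + sigma * eps_t."""
    rng = random.Random(seed)
    n_states = len(alpha0)
    state, x = 0, 0.0
    states, xs = [], []
    for _ in range(n):
        # Homogeneous Markov chain: the switching probabilities do not depend on time
        if rng.random() > stay_prob:
            state = rng.choice([k for k in range(n_states) if k != state])
        x = alpha0[state] + alpha1[state] * x + sigma * rng.gauss(0.0, 1.0)
        states.append(state)
        xs.append(x)
    return states, xs

# Hypothetical parameters for three hidden states
states, xs = simulate_ms_ar1(n=500, alpha0=[0.0, 0.05, -0.05],
                             alpha1=[0.9, 0.99, 0.99], sigma=0.01)
```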

The HMM fitting resulted in three clusters similar to slow, fast positive and fast negative, as obtained with the GMM with inflection-point-based clustering method (Extended Data Fig. 3e–h). Finally, by combining the fast positive and fast negative clusters, we obtained two behavioural modes (fast and slow) for further computation of switching frequency and residence time.

In GMM with MAP-based clustering, three-component GMMs were fitted to the velocity data from each individual fish at a specific lighting condition. We assigned the cluster index for each data point based on the maximum a posteriori probability using Bayes’ rule. This method required a post hoc assignment of which cluster or clusters correspond to the ‘slow’ behavioural mode to compute residence time; see, for example, Extended Data Fig. 3i–m.
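A minimal sketch of the MAP assignment is given below, assuming the mixture has already been fitted. The parameter values and velocity samples are hypothetical, and cluster 0 is designated post hoc as the slow mode purely for illustration.

```python
import math

def map_cluster(v, weights, means, sigmas):
    """Index of the component with the largest posterior probability given v."""
    # By Bayes' rule the posterior is proportional to weight times likelihood;
    # the shared factor 1/sqrt(2*pi) does not affect the argmax and is dropped.
    scores = [w * math.exp(-0.5 * ((v - m) / s) ** 2) / s
              for w, m, s in zip(weights, means, sigmas)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical fitted parameters: slow, fast negative, fast positive
weights, means, sigmas = (0.5, 0.25, 0.25), (0.0, -3.0, 3.0), (0.5, 1.5, 1.5)
velocity = [0.1, -0.2, 2.8, -2.8, 0.05]
labels = [map_cluster(v, weights, means, sigmas) for v in velocity]

SLOW = 0   # post hoc choice of which cluster counts as the slow mode
residence_fraction = sum(label == SLOW for label in labels) / len(labels)
```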

All analysis was performed using code written in R and MATLAB.

Simulation

Sensory adaptation is a robustly observed phenomenon among organisms ranging from unicellular amoebae^37,38 to humans²⁶ where the sensory systems stop responding to constant stimuli. Here we modelled this adaptive/high-pass nature of the sensory receptors as a ‘motion-dependent sensor’ for which we assumed a nonlinear measurement model^24,25 with sensory noise w₂(t):

$$y(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}t}s(x(t))+{w}_{2}(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}x}s(x(t))v(t)+{w}_{2}(t)=g(x)v(t)+{w}_{2}(t).$$

(8)

Here, s(x) is the position-dependent sensory scene experienced by the organism. For the present study, we assumed a quadratic sensory scene $s(x)=\frac{1}{2}\alpha {x}^{2}+\beta x$ with non-zero constant sensory-scene parameters α and β. This assumption on sensory scene yields g(x) = αx + β, a linear function of position, x.
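The consequence of this measurement model for state estimation can be seen from its partial derivatives, which an estimator that linearizes the measurement would use. The sketch below uses the simulation values α = 3 and β = 5; the function names are hypothetical.

```python
ALPHA, BETA = 3.0, 5.0      # sensory-scene parameters used in the simulations

def g(x):
    """Spatial derivative of the quadratic sensory scene."""
    return ALPHA * x + BETA

def measurement(x, v, noise=0.0):
    """Motion-dependent measurement y = g(x) * v + w2 of equation (8)."""
    return g(x) * v + noise

def measurement_jacobian(x, v):
    """Partial derivatives of the noise-free measurement with respect to (x, v)."""
    return (ALPHA * v, g(x))

H_still = measurement_jacobian(0.2, 0.0)    # no sensitivity to position at rest
H_moving = measurement_jacobian(0.2, 0.5)   # movement restores sensitivity to position
```

When v = 0, the derivative with respect to position vanishes, so the instantaneous measurement carries no information about x. This is why movement is needed for sensing under this model.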

Due to the presence of the nonlinearity in the measurement, we used an extended Kalman filter for state estimation, a common heuristic. For the state-feedback component, we applied $f(\hat{x},\hat{v})={k}_{1}\hat{x}+{k}_{2}\hat{v}$ where $k_1$ and $k_2$ are the position and velocity feedback constants, respectively. In the triggered-excitation scheme, for the uncertainty measure (M) we used the trace of the state estimation error covariance matrix, Tr(P(t)). When the uncertainty measure Tr(P(t)) rose above a maximum threshold, T_max, the controller generated an active-sensing component, u_a(t), as a Gaussian input with fixed power spectral density, and it continued to inject this input until Tr(P(t)) dropped below a lower threshold, T_min. At this point, the controller switched back to the traditional state-feedback form. For the persistent-excitation scheme, the controller continued to inject a Gaussian input u_a(t) for all time.
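The two-threshold switching rule of the triggered-excitation scheme can be sketched in isolation from the filter and the plant. The sequence of Tr(P) values, the threshold values and the function name below are hypothetical.

```python
def next_mode(exploring, trace_P, T_max, T_min):
    """Return True while the active-sensing input should be injected."""
    if not exploring and trace_P > T_max:
        return True          # uncertainty too high: start injecting u_a
    if exploring and trace_P < T_min:
        return False         # uncertainty low again: state feedback only
    return exploring         # otherwise keep the current mode

# Hypothetical sequence of Tr(P) values
traces = [0.003, 0.006, 0.009, 0.007, 0.005, 0.003, 0.005]
exploring, history = False, []
for value in traces:
    exploring = next_mode(exploring, value, T_max=0.008, T_min=0.004)
    history.append(exploring)
# history: False, False, True, True, True, False, False
```

Using two thresholds rather than one introduces hysteresis: once triggered, the explore mode persists until the uncertainty has fallen well below the level that triggered it.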

To obtain the critical excitation level of the active-sensing component u_a,crit(t) for optimum tracking performance in persistent excitation, we chose 30 logarithmically spaced variance values of u_a(t) from 1 to 100. From the mean of 100 independent simulations for each variance value, we obtained u_a,crit(t) ≈ 9.33, which achieved the minimum RMS tracking error of e_PE,min ≈ 0.071 and RMS control effort u_PE ≈ 10. Using this critical value for the excitation u_a,crit(t), we studied the effect of thresholds in triggered excitation by varying T_max and T_min/T_max linearly from 4 × 10⁻³ to 10 × 10⁻³, and from 0.5 to 1, respectively, and performed 300 independent simulations for each pair of values.

The system parameters were chosen from previous studies^6,24 as follows: b = 1.7, m = 1, α = 3, β = 5, ${k}_{1}=m{\omega }_{n}^{2}$, k₂ = (2mζω_n + b), ζ = 0.56 and ω_n = 1.05 × 2π. The process noise, w₁(t) and sensor noise, w₂(t) were chosen as fixed Gaussian noise inputs with variances 0.03 and 10, respectively.
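For reference, the feedback constants implied by these parameter values can be evaluated directly from the stated formulas:

```python
import math

m, b = 1.0, 1.7                     # parameters as listed above
zeta = 0.56
omega_n = 1.05 * 2 * math.pi        # about 6.597

k1 = m * omega_n ** 2               # position feedback constant, about 43.5
k2 = 2 * m * zeta * omega_n + b     # velocity feedback constant, about 9.09
```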

Statistics

All statistical analyses were performed with the sign test and the Mann–Whitney–Wilcoxon test using custom code written in R (version 4.3.0, R Core Team) and MATLAB (version 9.12 (R2022a), MathWorks). For all tests, the significance level was set to 0.05. The experimental and simulation data are provided as either mean plus or minus the standard deviation (μ ± s.d.) or mean plus or minus the standard error of the mean (μ ± s.e.m.).
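As an illustration of how a sign test yields a P value, the sketch below implements an exact one-sided version. It is not the custom code used for the analysis. The paired differences are hypothetical, and the one-sided form and the handling of ties (dropped) are assumptions.

```python
from math import comb

def sign_test_one_sided(differences):
    """Exact P(at least this many positive signs) under a zero-median null."""
    nonzero = [d for d in differences if d != 0]       # ties are dropped
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical paired differences for seven subjects, all positive
diffs = [0.8, 1.1, 0.3, 0.9, 0.5, 1.4, 0.2]
p = sign_test_one_sided(diffs)     # (1/2)**7, below the 0.05 significance level
```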

Data availability

An archived version of the datasets supporting this article is available through the Johns Hopkins University Data Archive at https://doi.org/10.7281/T1/QS3QFT (ref. ⁶⁰).

Code availability

An archived version of the analysis codes supporting this article is available through the Johns Hopkins University Data Archive at https://doi.org/10.7281/T1/QS3QFT (ref. ⁶⁰).

References

Gibson, J. J. Observations on active touch. Psychol. Rev. 69, 477–491 (1962).
Biswas, D. et al. Closed-loop control of active sensing movements regulates sensory slip. Curr. Biol. 28, 4029–4036 (2018).
Chen, C., Murphey, T. D. & MacIver, M. A. Tuning movement for sensing in an uncertain world. eLife 9, e52371 (2020).
Soatto, S. in Machine Learning for Computer Vision (eds. Cipolla, R. et al.) 17–48 (Springer, 2013).
Stamper, S. A., Roth, E., Cowan, N. J. & Fortune, E. S. Active sensing via movement shapes spatiotemporal patterns of sensory feedback. J. Exp. Biol. 215, 1567–1574 (2012).
Cowan, N. J. & Fortune, E. S. The critical role of locomotion mechanics in decoding sensory systems. J. Neurosci. 27, 1123–1128 (2007).
Rose, G. J. & Canfield, J. G. Longitudinal tracking responses of the weakly electric fish, Sternopygus. J. Comp. Physiol. A 171, 791–798 (1993).
Uyanik, I., Stamper, S. A., Cowan, N. J. & Fortune, E. S. Sensory cues modulate smooth pursuit and active sensing movements. Front. Behav. Neurosci. 13, 59 (2019).
Sefati, S. et al. Mutually opposing forces during locomotion can eliminate the tradeoff between maneuverability and stability. Proc. Natl Acad. Sci. USA 110, 18798–18803 (2013).
Golovin, D. & Krause, A. Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res. 42, 427–486 (2011).
Cooper, G. F. The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42, 393–405 (1990).
Blondel, V. D. & Tsitsiklis, J. N. A survey of computational complexity results in systems and control. Automatica 36, 1249–1274 (2000).
Narendra, K. S. & Annaswamy, A. M. Persistent excitation in adaptive systems. Int. J. Control 45, 127–160 (1987).
Catania, K. C. Stereo and serial sniffing guide navigation to an odour source in a mammal. Nat. Commun. 4, 1–8 (2013).
Michaiel, A. M., Abe, E. T. & Niell, C. M. Dynamics of gaze control during prey capture in freely moving mice. eLife 9, e57458 (2020).
Wohlgemuth, M. J., Kothari, N. B. & Moss, C. F. Action enhances acoustic cues for 3-D target localization by echolocating bats. PLoS Biol. 14, e1002544 (2016).
Kiemel, T., Oie, K. S. & Jeka, J. J. Multisensory fusion and the stochastic structure of postural sway. Biol. Cyber. 87, 262–277 (2002).
Lockey, J. K. & Willis, M. A. One antenna, two antennae, big antennae, small: total antennae length, not bilateral symmetry, predicts odor-tracking performance in the American cockroach Periplaneta americana. J. Exp. Biol. 218, 2156–2165 (2015).
Stöckl, A. L., Kihlström, K., Chandler, S. & Sponberg, S. Comparative system identification of flower tracking performance in three hawkmoth species reveals adaptations for dim light vision. Phil. Trans. R. Soc. Lond. B 372, 20160078 (2017).
Deora, T., Ahmed, M. A., Daniel, T. L. & Brunton, B. W. Tactile active sensing in an insect plant pollinator. J. Exp. Biol. 224, jeb239442 (2021).
Hauperich, A.-K., Young, L. K. & Smithson, H. E. What makes a microsaccade? A review of 70 years of research prompts a new detection method. J. Eye Mov. Res. 12, 1–22 (2019).
De la Fuente, I. M. et al. Evidence of conditioned behavior in amoebae. Nat. Commun. 10, 1–12 (2019).
Sutton, E. E., Demir, A., Stamper, S. A., Fortune, E. S. & Cowan, N. J. Dynamic modulation of visual and electrosensory gains for locomotor control. J. R. Soc. Interface 13, 20160057 (2016).
Kunapareddy, A. & Cowan, N. J. Recovering observability via active sensing. In Proc. Annual American Control 2821–2826 (IEEE, 2018).
Sontag, E. D., Biswas, D. & Cowan, N. J. An observability result related to active sensing. Preprint at https://arxiv.org/abs/2210.03848 (2022).
Fabre, M. et al. Large postural sways prevent foot tactile information from fading: neurophysiological evidence. Cereb. Cortex. Commun. 2, tgaa094 (2020).
Arkley, K., Grant, R. A., Mitchinson, B. & Prescott, T. J. Strategy change in vibrissal active sensing during rat locomotion. Curr. Biol. 24, 1507–1512 (2014).
Yang, S. C.-H., Wolpert, D. M. & Lengyel, M. Theoretical perspectives on active sensing. Curr. Opin. Behav. Sci. 11, 100–108 (2016).
Matthews, M. & Sponberg, S. Hawkmoth flight in the unsteady wakes of flowers. J. Exp. Biol. 221, jeb179259 (2018).
Tritico, H. M. & Cotel, A. J. The effects of turbulent eddies on the stability and critical swimming speed of creek chub Semotilus atromaculatus. J. Exp. Biol. 213, 2284–2293 (2010).
Harris, C. M. & Wolpert, D. M. Signal-dependent noise determines motor planning. Nature 394, 780–784 (1998).
Nelson, M., Xu, Z. & Payne, J. Characterization and modeling of p-type electrosensory afferent responses to amplitude modulations in a wave-type electric fish. J. Comp. Physiol. A 181, 532–544 (1997).
Lee, J. et al. Templates and anchors for antenna-based wall following in cockroaches and robots. IEEE Trans. Robot. 24, 130–143 (2008).
Jun, J. J., Longtin, A. & Maler, L. Active sensing associated with spatial learning reveals memory-based attention in an electric fish. J. Neurophysiol. 115, 2577–2592 (2016).
Clarke, S. E., Naud, R., Longtin, A. & Maler, L. Speed-invariant encoding of looming object distance requires power law spike rate adaptation. Proc. Natl Acad. Sci. USA 110, 13624–13629 (2013).
Clarke, S. E., Longtin, A. & Maler, L. A neural code for looming and receding motion is distributed over a population of electrosensory on and off contrast cells. J. Neurosci. 34, 5583–5594 (2014).
Takeda, K. et al. Incoherent feedforward control governs adaptation of activated ras in a eukaryotic chemotaxis pathway. Sci. Signal. 5, ra2 (2012).
Biswas, D., Devreotes, P. N. & Iglesias, P. A. Three-dimensional stochastic simulation of chemoattractant-mediated excitability in cells. PLoS Comput. Biol. 17, e1008803 (2021).
Bertsekas, D. Dynamic Programming and Optimal Control Vol. 1 (Athena Scientific, 2012).
Zweifel, N. O. & Hartmann, M. J. Defining ‘active sensing’ through an analysis of sensing energetics: homeoactive and alloactive sensing. J. Neurophys. 124, 40–48 (2020).
Bullock, T. H. Electroreception. Annu. Rev. Neurosci. 5, 121–170 (1982).
Brinkløv, S., Elemans, C. P. & Ratcliffe, J. M. Oilbirds produce echolocation signals beyond their best hearing range and adjust signal design to natural light conditions. R. Soc. Open Sci. 4, 170255 (2017).
Nelson, M. E. & MacIver, M. A. Sensory acquisition in active sensing systems. J. Comp. Physiol. A 192, 573–586 (2006).
Snyder, J. B., Nelson, M. E., Burdick, J. W. & MacIver, M. A. Omnidirectional sensory and motor volumes in electric fish. PLoS Biol. 5, e301 (2007).
Ghose, K. & Moss, C. F. Steering by hearing: a bat’s acoustic gaze is linked to its flight motor output by a delayed, adaptive linear law. J. Neurosci. 26, 1704–1710 (2006).
Berg, H. C. & Brown, D. A. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239, 500–504 (1972).
Benda, J., Longtin, A. & Maler, L. A synchronization–desynchronization code for natural communication signals. Neuron 52, 347–358 (2006).
Metzen, M. G., Hofmann, V. & Chacron, M. J. Neural synchrony gives rise to amplitude- and duration-invariant encoding consistent with perception of natural communication stimuli. Front. Neurosci. 14, 79 (2020).
Hofmann, V. & Chacron, M. J. Population coding and correlated variability in electrosensory pathways. Front. Integr. Neurosci. 12, 56 (2018).
Grewe, J., Kruscha, A., Lindner, B. & Benda, J. Synchronous spikes are necessary but not sufficient for a synchrony code in populations of spiking neurons. Proc. Natl Acad. Sci. USA 114, E1977–E1985 (2017).
Fetcho, J. R., Higashijima, S.-i & McLean, D. L. Zebrafish and motor control over the last decade. Brain. Res. Rev. 57, 86–93 (2008).
Tabor, K. M. et al. Direct activation of the Mauthner cell by electric field pulses drives ultrarapid escape responses. J. Neurophysiol. 112, 834–844 (2014).
Orchard, G. et al. HFirst: a temporal approach to object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2028–2040 (2015).
Krebs, J. R., Kacelnik, A. & Taylor, P. Test of optimal sampling by foraging great tits. Nature 275, 27–31 (1978).
Cerri, R. D. & Fraser, D. F. Predation and risk in foraging minnows: balancing conflicting demands. Am. Nat. 121, 552–561 (1983).
Kaelbling, L. P., Littman, M. L. & Moore, A. W. Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996).
Hitschfeld, É. M., Stamper, S. A., Vonderschen, K., Fortune, E. S. & Chacron, M. J. Effects of restraint and immobilization on electrosensory behaviors of weakly electric fish. ILAR J. 50, 361–372 (2009).
Roth, E., Zhuang, K., Stamper, S. A., Fortune, E. S. & Cowan, N. J. Stimulus predictability mediates a switch in locomotor smooth pursuit performance for Eigenmannia virescens. J. Exp. Biol. 214, 1170–1180 (2011).
Vágvölgyi, B. P. General tracker. GitHub https://github.com/vagvolgyi/general_tracker (2021).
Biswas, D. et al. Data and code associated with the publication: Mode switching in organisms for solving explore-vs-exploit problem. Johns Hopkins Research Data Repository https://doi.org/10.7281/T1/QS3QFT6 (2023).


Acknowledgements

We thank T. Kiemel (UMD) for providing human balance data and B. P. Vágvölgyi (JHU) for developing the tracking software used in this work. We thank C. F. Moss (JHU), V. P. Sharma (GT) and S. Sponberg (GT) for suggesting relevant articles for the reanalysis of the animal locomotion data, and S. L. Poynton (JHMI) for critical feedback on the paper. This work was supported by the Office of Naval Research under grant no. N00014-21-1-2431 (N.J.C.) and the National Science Foundation under grant no. 2011619 (N.J.C.).

Author information

Authors and Affiliations

Laboratory for Computational Sensing and Robotics, Johns Hopkins University, Baltimore, MD, USA
Debojyoti Biswas, Yu Yang & Noah J. Cowan
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA
Andrew Lamperski
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Yu Yang & Noah J. Cowan
Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD, USA
Kathleen Hoffman
Department of Mathematics, Cornell University, Ithaca, NY, USA
John Guckenheimer
Federated Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, USA
Eric S. Fortune

Authors

Debojyoti Biswas
Andrew Lamperski
Yu Yang
Kathleen Hoffman
John Guckenheimer
Eric S. Fortune
Noah J. Cowan

Contributions

Conceptualization, all authors. Methodology, all authors. Software, D.B., A.L., K.H., Y.Y. and J.G. Formal analysis, D.B., A.L., K.H., Y.Y. and J.G. Investigation, Y.Y. Data curation, D.B. and Y.Y. Writing—original draft, all authors. Writing—review and editing, all authors. Visualization, D.B. with other authors. Supervision, N.J.C.

Corresponding authors

Correspondence to Debojyoti Biswas or Noah J. Cowan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Leonard Maler, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Liesbeth Venema, in collaboration with the Nature Machine Intelligence team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of fish movement in lights-on trials versus lights-off trials.

(a) Velocity trace of the truncated trial data (black) with traces from other trials (gray) for the same fish from Fig. 1c. (b) Histogram of all trials (n = 7) using the same length scale, and the three-component Gaussian mixture model (GMM) fit. (c) Magnitude of the discrete Fourier transform of velocity traces from (a) with the total number of trials (n) indicated next to the plots. The solid line and the shaded region denote the mean and the standard error of the mean, respectively. (d) Box and whisker plot showing RMS values of individual trials for all fish (N = 5) with colors the same as in (c). Mean RMS velocity across trials for all fish in lights-off trials was greater than in lights-on trials (one-sided p-values are 0.0004, 0.0001, 0.0036, 0.0003, and 0.0001, respectively). (e, f) Q-Q plots from a single representative fish (fish 1) comparing the velocity data from lights-off (e) and lights-on trials (f) with theoretical quantiles from a normal fit (magenta) and GMM fit (blue), respectively. The smaller deviation from the reference line (black dashed) for GMM fits with three components than for the normal fit indicated that the GMM fitted the data better. The Q-Q plots also showed that the lights-off trial data were closer to normal distribution fits than were those of lights-on trial data. (g) Cumulative difference in Bayesian information criterion (ΔBIC) values for varying number of components in GMM (top: lights off; bottom: lights on). The gray dashed line corresponds to three-component GMM. (h) Box and whisker plot showing kurtosis values of individual trials for all fish (N = 5) with colors the same as in (c). For all fish, the mean kurtosis across trials in lights-on trials was greater than in lights-off trials (one-sided p-values are 0.0034, 0.0188, 0.0005, 0.0445, and 0.0068, respectively).
All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. The total number of lights-off trials (n) for fish 1 and 3 was 7, and for the rest, it was 10 trials per condition per fish. All p-values were calculated using the Mann-Whitney-Wilcoxon test.

Extended Data Fig. 2 Reanalysis of the experimental and simulated trajectories from Chen et al.³.

(a) Experimental velocity traces (n = 10) with ‘jamming’ of the electrosensory system in Eigenmannia virescens, which decreased the salience and reliability of electrosensory navigation. (b) Corresponding histogram, with the kurtosis value, κ. The magenta dashed and the blue solid curves correspond to a normal and GMM fit with three components, respectively. (c) Q-Q plots comparing the sample velocity data for all trials (n = 10) from (a) with theoretical quantiles from the same normal (magenta) and GMM fit (blue) from (b). Clearly, the GMM fit was better than the normal. See Extended Data Table 2 for statistical details. (d–f) Experimental velocity traces (n = 10) with jamming electrode off (d), corresponding histogram (e) with the kurtosis value, κ, and the Q-Q plots (f). (g–i) Simulated velocity traces (n = 9) using the ergodic information harvesting (EIH) algorithm for weak signal (SNR ≤ 30 dB, equivalent to jamming amplitude ≥ 10 mA) (g), corresponding histogram (h) with the kurtosis value, κ, and the Q-Q plots (i). (j–l) Simulated velocity traces (n = 9) using the EIH algorithm for strong signal (SNR ≥ 50 dB, equivalent to absence of jamming) (j), corresponding histogram (k) with the kurtosis value, κ, and the Q-Q plots (l). Colors and styles are the same as in (a–c). Mean ± SD of the respective RMS values of experimental velocity traces (v_RMS) are shown next to plots of (a,d). The fitting performances of GMM and the normal distribution for simulated trajectories were comparable. See Extended Data Table 3 for statistical details.

Extended Data Fig. 3 Clustering of velocity data into different behavioral modes.

(a,b) Representative velocity histograms of lights-off (a) and lights-on (b) trials from the same fish with three clusters: slow (orange), fast positive (green) and fast negative (blue). The clustering was based on identifying velocity thresholds v_L and v_H on the Gaussian mixture model (GMM) fit (grey line) as indicated by blue and green markers, respectively (see Methods for details). (c,d) Top: two-state (F: fast velocity, both positive and negative combined; S: slow velocity) Markov process showing mean transition rates for a representative individual in lights-off (c) and lights-on trials (d). Bottom: transition rates corresponding to state transitions F → S in (c) and S → F in (d), for lights-off (black) and lights-on (red) trials. The transition rate for F → S was higher in lights-on trials (one-sided p-values are 0.0068, 0.0001, 0.0046, 0.0023, and 0.0093, respectively) whereas the rate for S → F was higher in lights-off trials (one-sided p-values are 0.0006, 0.0001, 0.0001, 0.0018, and 0.0001, respectively). (e-h) Velocity histograms (e,f) and traces (g,h) from lights-off (e,g) and lights-on (f,h) trials from the same fish from (a,b) showing three clusters using hidden Markov model (HMM)-based clustering. The colors are the same as in (a,b). (i-l) Velocity histograms (i,j) and traces (k,l) from lights-off (i,k) and lights-on (j,l) trials from the same fish from (a,b) showing three clusters using maximum a posteriori (MAP) clustering based on three-component GMM fits. The probability density functions (pdf) of respective components are shown as dashed lines in (i,j). (m) Box and whisker plots showing residence time in slow mode, computed as the percent of the trial duration (40 s), for lights-off (black) and lights-on (red) trials computed using different clustering algorithms. For details see Methods.
For all the clustering algorithms, the computed residence time was significantly higher during lights-on trials than lights-off trials (one sided p-values are 0.0002, 0.0034, 0.0001, and 0.0001, respectively). All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. In (c,d) the total number of lights-off trials (n) for fish 1 and 3 was 7, and for the rest, it was 10 trials per condition per fish. All p-values were calculated using the Mann-Whitney-Wilcoxon test.
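The threshold-based clustering and the two-state summary described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (see Methods for that): it assumes vL and vH are the lower and upper velocity bounds of the slow mode, and it estimates each transition rate as the number of exits from a state divided by the time spent in that state. The function names, the sample trace and the sampling interval are hypothetical.

```python
def label_modes(velocities, v_low, v_high):
    """Label a sample 'S' (slow) if v_low <= v <= v_high, otherwise 'F' (fast)."""
    return ["S" if v_low <= v <= v_high else "F" for v in velocities]


def transition_rates(labels, dt):
    """Estimate exit rates (1/s) of a two-state process from a label sequence."""
    exits = {"S": 0, "F": 0}
    time_in = {"S": 0.0, "F": 0.0}
    for cur, nxt in zip(labels, labels[1:]):
        time_in[cur] += dt          # time spent in the current state
        if nxt != cur:
            exits[cur] += 1         # one transition out of the current state
    nan = float("nan")
    return {
        "S->F": exits["S"] / time_in["S"] if time_in["S"] else nan,
        "F->S": exits["F"] / time_in["F"] if time_in["F"] else nan,
    }


def slow_residence_percent(labels):
    """Residence time in slow mode as a percentage of the trial duration."""
    return 100.0 * labels.count("S") / len(labels)


# Hypothetical velocity trace sampled every 0.5 s, with thresholds at -1 and +1.
velocities = [0.1, 0.2, 2.0, 2.5, 0.0, -0.1, -3.0, 0.1]
labels = label_modes(velocities, v_low=-1.0, v_high=1.0)
rates = transition_rates(labels, dt=0.5)
residence = slow_residence_percent(labels)
print(labels, rates, residence)
```

Under this reading, a higher F → S rate and a lower S → F rate both lengthen the residence time in slow mode, which is the direction reported for lights-on trials.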

Extended Data Fig. 4 Reanalysis of the postural sway in humans, Homo sapiens¹⁷, and odor plume tracking response of the American cockroach, Periplaneta americana¹⁸, show evidence in support of a sensory-salience-dependent mode-switching strategy.

(a,b) Representative temporal traces of mediolateral movement of the center of mass, ML CoM (a), and the histograms of ML CoM velocities (b) for different experimental conditions: both vision and touch (top), only touch (middle) and only vision (bottom). The magenta dashed and the blue solid curves in (b) correspond to a normal fit and a three-component GMM fit, respectively. The dataset analyzed here, comprising 7 subjects (N) with 2-3 replicate trials per experimental condition, was collected at 50 Hz. (c,d) Comparison of the RMS velocities (c) and switching frequency (d) for different experimental conditions. Different shades of gray denote different human subjects. The p-values were computed using the sign test. (e,f) Representative temporal traces of the lateral head movement (e) and the histograms of the lateral velocities (f) for different antenna lengths, as indicated. The colors of the fitted curves are the same as in (a,b). The dataset analyzed here was collected at 30 Hz but was later subsampled at 15 Hz by the original study authors. The kurtosis (κ) values and the total number of trajectories (N, single trajectory per subject) analyzed are indicated next to the respective panels in (f). (g,h) Comparison of the RMS lateral velocities (g) and residence time in slow mode (h) for different experimental conditions. All box and whisker plots include the median line; the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are points lying more than 1.5 × IQR beyond the box. Sample sizes (n) are shown in each boxplot. The one-sided p-values were computed using the Mann-Whitney-Wilcoxon test.

Extended Data Fig. 5 Reanalysis of the tactile response in the crepuscular hawkmoth, Manduca sexta²⁰, and active exploratory movement of three different species of hawkmoths during flower tracking¹⁹ shows similar broad-shouldered velocity distributions.

(a-f) Histograms of relative radial angular velocity for different flower shapes, as indicated. ‘C’ is the curvature parameter describing the lateral traces of the corollas, for the first (a-c) and seventh (d-f, early-learning) visit. The magenta dashed and the blue solid curves in (d) correspond to a normal fit and a three-component GMM fit, respectively. The kurtosis (κ) values and the total number of trajectories (N) analyzed are indicated next to the respective panels in (a-f). The dataset analyzed here was collected at 100 Hz. For the present study, we focused on the data from the pre-feeding phases only. (g-l) Histograms of active exploratory velocity at low (g-i: 15 lx) and high (j-l: 300 lx) illumination levels in three different species of hawkmoth: nocturnal Deilephila elpenor (g,j), diurnal Macroglossum stellatarum (h,k), and crepuscular Manduca sexta (i,l). Colors of the fits are the same as in (a-f). The kurtosis (κ) values and the total number of hawkmoths (N) analyzed are indicated. (m-o) Box and whisker plots showing the RMS active exploratory velocity for all three species of hawkmoths analyzed at different illumination levels. All box and whisker plots include the median line; the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are points lying more than 1.5 × IQR beyond the box. Sample sizes (n) are shown in each boxplot. The one-sided p-values were computed using the Mann-Whitney-Wilcoxon test. The datasets analyzed here were collected at 100 Hz with one trial per hawkmoth (N = n).

Extended Data Fig. 6 Reanalysis of the directed movement of Amoeba proteus and Metamoeba leningradensis in response to an electric field (galvanotaxis)²².

(a-c) Migration trajectories (N = 50) of A. proteus (a), temporal traces of velocities (b) in the direction transverse to the applied electric field, derived from the migration data, and the corresponding velocity histogram (c) with the kurtosis (κ) value. The magenta dashed and the blue solid curves in (c) correspond to a normal fit and a three-component GMM fit, respectively. (d-f) Galvanotaxis response in M. leningradensis. The panels are the same as in (a-c). Image data for both datasets were collected at 0.1 Hz.

Extended Data Fig. 7 Effect of thresholds, $\mathrm{T}_{\min}$ and $\mathrm{T}_{\max}$, in the Triggered Excitation strategy.

(a-c) Heatmaps showing mean RMS tracking error in Triggered Excitation (e_TE, a), mean RMS control effort (u_RMS, b) and mean kurtosis (κ, c) of the resultant velocity distributions from 100 independent simulations, at the critical excitation level corresponding to the minimum RMS tracking error in Persistent Excitation ($\mathrm{e}_{\mathrm{PE},\min}$ in Fig. ??l), as thresholds $\mathrm{T}_{\min}$ and $\mathrm{T}_{\max}$ were varied. The dashed line in (a-c) shows the phase transition based on the difference between the tracking errors in Triggered and Persistent Excitation. The region inside the line corresponds to the parameter space where the tracking error in Persistent Excitation is less than that in Triggered Excitation, whereas outside this region Triggered Excitation performs better. (d) Variation of kurtosis, κ (green, left y-axis), and Kullback-Leibler (K-L) divergence (right y-axis) of the normal (magenta dashed) and Gaussian mixture model (blue solid) fits to the velocity distribution with sensor noise variance, σ². (e,f) Velocity histograms with kurtosis values for sensor noise variance σ² = 0.70 and 2, respectively, as indicated by (i) and (ii) in (d). (g) Variation of RMS control effort (u_RMS) with sensor noise variance, σ². The shaded regions in (d,g) denote the respective standard deviations (n = 25 independent simulations per σ²).
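One way to picture the role of the two thresholds is as a hysteresis band on a scalar measure of state uncertainty. The sketch below assumes that exploratory excitation switches on when the uncertainty rises above T_max and switches off once it falls below T_min. That reading, the function name and the example trace are assumptions made for illustration; the actual model is specified in Methods.

```python
def triggered_excitation(uncertainty_trace, t_min, t_max):
    """Return a per-sample flag: True while exploratory excitation is active.

    Assumed rule: start exploring when uncertainty > t_max,
    stop exploring when uncertainty < t_min (hysteresis band).
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    exploring = False
    flags = []
    for p in uncertainty_trace:
        if not exploring and p > t_max:
            exploring = True        # uncertainty too high: trigger excitation
        elif exploring and p < t_min:
            exploring = False       # uncertainty low enough: return to exploit
        flags.append(exploring)
    return flags


# Hypothetical uncertainty values over eight time steps.
trace = [0.2, 0.6, 1.1, 0.9, 0.5, 0.25, 0.4, 1.2]
flags = triggered_excitation(trace, t_min=0.3, t_max=1.0)
print(flags)
```

Widening the gap between the two thresholds makes each exploratory bout last longer once triggered, while raising T_max makes bouts rarer; this is the parameter space that the heatmaps in (a-c) scan.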

Extended Data Table 1 Comparison of Kullback-Leibler (K-L) divergence and Bayesian information criterion (BIC) for the three-component Gaussian mixture model (GMM) and normal fits to the velocity data of Eigenmannia virescens during lights-off and lights-on trials
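For reference, the two fit-quality measures named in these table captions have standard definitions, written out below. The direction of the K-L divergence (empirical distribution P against fitted model Q) and its histogram-bin form are assumptions, since the captions do not state them; the parameter counts follow directly from the model definitions.

```latex
% Kullback-Leibler divergence of a fitted density Q from the empirical
% velocity distribution P, discretised over histogram bins i (assumed form)
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P_i \ln \frac{P_i}{Q_i}

% Bayesian information criterion for a model with k free parameters,
% n samples and maximised likelihood \hat{L}
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}

% Free parameters for one-dimensional velocity data:
%   normal fit:           k = 2  (mean, variance)
%   three-component GMM:  k = 8  (3 means, 3 variances, 2 independent weights)
```

Lower values of either measure indicate a better fit. Because BIC penalizes the six extra parameters of the GMM, a lower BIC for the GMM means the improvement in likelihood outweighs the added model complexity.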


Extended Data Table 2 Summary of the reanalysis of the published data across nine different species^{3,14,15,16,17,18,19,20,21} showing Kullback-Leibler (K-L) divergence and Bayesian information criterion (BIC) for the normal and the three-component Gaussian mixture model (GMM) fits to the velocity data and the kurtosis value (κ)


Extended Data Table 3 Summary of comparison between different strategies of template model and Ergodic Information Harvesting (EIH) model showing Kullback-Leibler (K-L) divergence and Bayesian information criterion (BIC) for the normal and the three-component Gaussian mixture model (GMM) fits to the simulated velocity data and the kurtosis value (κ)


Supplementary information

Supplementary Information

Supplementary Material and Methods, Figs. 1–6 and video information.

Supplementary Video 1

Fish behaviour during lights-off trials, related to Figs. 1 and 2. The top panel depicts the fish movement (bottom view) inside a stationary refuge from a representative trial. The bright ventral patch on the fish was tracked (circular marker). The original video was recorded at 100 frames per second. The middle and the bottom panels show the position (middle) and velocity (bottom) traces along the rostrocaudal axis (that is, x axis). The colours denote different behavioural modes: orange, slow; green, fast positive; blue, fast negative.

Supplementary Video 2

Fish behaviour during lights-on trials, related to Figs. 1 and 2. The format is the same as in Supplementary Video 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Biswas, D., Lamperski, A., Yang, Y. et al. Mode switching in organisms for solving explore-versus-exploit problems. Nat Mach Intell 5, 1285–1296 (2023). https://doi.org/10.1038/s42256-023-00745-y


Received: 14 March 2023
Accepted: 15 September 2023
Published: 26 October 2023
Issue Date: November 2023
DOI: https://doi.org/10.1038/s42256-023-00745-y
