Main

Organisms show complex patterns of movement that arise from the interplay between obtaining information (‘explore’)1,2,3 and using current information (‘exploit’)4. Exploratory movements to gain information and exploitative movements to achieve the task at hand are often mediated by the same motor systems. For example, the weakly electric glass knifefish (Eigenmannia virescens) produces both information-seeking exploratory movements2,3,5 and goal-driven exploitative movements to remain within a refuge6,7 using the same ribbon-fin locomotor system8,9. Both of these types of movement occur in a single linear dimension, along the rostrocaudal axis. This behaviour makes E. virescens an excellent model system with which to investigate the interplay between explore and exploit movements: within a fixed refuge, fish produce ancillary back-and-forth exploratory movements to sense the refuge2,3,5, but these back-and-forth (explore) movements conflict with the corrective movements (exploit) required for station keeping.

Resolving this conflict between explore movements5 and goal-directed exploit movements is a computationally intractable optimization problem10,11,12. How do organisms resolve the explore–exploit conflict? A simple heuristic to solve this problem would be for an organism to perform goal-directed exploit movements while superimposing continuous small exploratory sensing movements; in other words, to use a persistent excitation approach13. Indeed, this heuristic has proven effective (if suboptimal) as an engineering approach to solve the explore–exploit problem of identifying states and parameters of a dynamical system during task execution13. If organisms were to employ such a strategy, they would produce movement statistics that correspond to a single behavioural mode (for example, a single-component Gaussian distribution) that continuously superimposes explore and exploit behaviour.

In contrast, we discovered that E. virescens does not use a persistent excitation strategy; instead, it shows a mode-switching strategy between fast, active-sensing movements (explore) and slow, corrective movements (exploit). This mode switching is modulated by sensory salience (Figs. 1 and 2). To assess the generality of this mode-switching strategy we investigated ten additional tasks performed by ten species ranging from amoebae to humans14,15,16,17,18,19,20,21,22, using five major sensing modalities: vision, audition, olfaction, tactile sensing and electrosensation (Fig. 3). On the basis of this extensive reanalysis, we found that such mode switching, and its dependence on sensory salience, occurs across diverse behaviours, taxa and sensing modalities (Fig. 4). Inspired by this widespread biological strategy, we propose an engineering heuristic for selecting behavioural modes based on state uncertainty (Fig. 5), and show that this heuristic captures key features of mode switching found across organismal models. Furthermore, we show that this mode-switching heuristic can achieve better task-level performance, and do so with less control effort, than the conventional persistent-excitation strategy.

Fig. 1: Velocity distributions are broad-shouldered.

a,b, Side view (schematic; a) and bottom view (infrared image; b) of a fish inside a stationary refuge. The bright ventral patch on the fish was tracked (green dot) as the fish swam inside the refuge (magenta dot). c,d, Position traces during lights-off trials (n = 7; c) and during lights-on trials (n = 10; d) from a single representative fish. e,f, Corresponding velocity traces (left) and velocity histograms (right) over the same range of velocities for lights-off trials (e) and lights-on trials (f), with the kurtosis value κ indicated. Note that the time and probability scales of the horizontal axes are shown below in f. A three-component GMM (blue solid curve) fits the data better than the normal fit (magenta dashed curve) as indicated by the lower Kullback–Leibler divergence and Bayesian information criterion values; see Extended Data Table 1 for statistical details. One lights-off trial with large positive velocity is truncated; see Extended Data Fig. 1a,b for full version.

Fig. 2: Bursts of faster movements are more common in lights-off trials than in lights-on trials.

a,b, Fish showed two distinct behavioural modes, slow movement and fast movement, as seen in the velocity (top) and position traces (bottom) of representative trials from the same fish under lights-off (a) and lights-on (b) conditions. c, The residence time in the slow mode, computed as the percent of the trial duration (40 s), was significantly higher during lights-on trials than lights-off trials (one-sided P values are 0.0002, 0.0001, 0.0001, 0.0014 and 0.0001, respectively). d, The switching between the fast mode (positive and negative combined) and slow mode was significantly more frequent in lights-off trials (black) than in lights-on trials (red) (one-sided P values are 0.0037, 0.0070, 0.0008, 0.0045 and 0.0001, respectively). All box-and-whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ±1.5 × IQR. For both c and d, the P values were calculated using the Mann–Whitney–Wilcoxon test. The total number of lights-off trials (n) for fish 1 and 3 was 7, and for the rest it was 10 trials per condition per fish.

Fig. 3: Diversity of organisms and sensing modalities used for analysing mode-switching strategies.

The major sensing modalities (vision, tactile sensing, audition, olfaction and electrosensation) used for task execution are listed next to the organisms (four species of mammals, a species of ray-finned fish, four species of insects and two species of amoebae). The effect of sensory salience is studied for organisms marked with a star. Behaviours and movement statistics of the organisms are shown in greater detail in Fig. 4.

Fig. 4: Broad-shouldered velocity distribution is found across taxa, behaviours and sensing modalities.

Reanalysis of data from eight previous studies reveals a convergent statistical structure of movements across a range of organisms and behaviours. a, Postural sway in humans (Homo sapiens) during maintenance of quiet upright stance17. b, Microsaccades in humans (H. sapiens) during fixated gaze21. c, Bilateral eye movements in mice (M. musculus) during prey (cricket) capture15. d, Pinnae movements in big brown bats (E. fuscus) while echolocating prey (mealworm)16. e, Olfactory-driven head movements in eastern moles (S. aquaticus) in response to food (earthworms)14. f, Odour plume tracking in American cockroaches (P. americana) in response to sex pheromone (periplanone B)18. g, Tactile sensing by Carolina sphinx hawkmoth (M. sexta) while searching for a flower nectary20. h, Visual tracking of swaying flower by hawkmoths19 (three species; only data from elephant hawkmoth, D. elpenor is shown; for the remainder, see Extended Data Fig. 5g–p). The second column shows representative temporal traces of the active exploratory movements and the third column shows the respective velocity traces. The fourth column presents velocity histograms showing that, unlike the normal distribution (magenta dashed curve), the three-component GMM (blue solid curve) captures the broad-shouldered nature of the velocity data across species, behaviours and sensing modalities. See Supplementary Materials and Methods for detailed method and Extended Data Table 2 for statistical details. Mice eye image in c adapted with permission from ref. 15 under a Creative Commons License CC BY 4.0. ML CoM, Mediolateral Center of Mass.

Fig. 5: Template model illustrating three different exploration strategies.

a, Schematic of the triggered-excitation (mode-switching) strategy. The musculoskeletal plant for animal locomotion has two states: position (x) and velocity (v). The state estimator has access to noisy measurements from a nonlinear adaptive sensor (g(x)v). The state estimator (extended Kalman filter) is designed to work in tandem with the mode-switching controller. The controller output, u, comprises both state feedback (\(-f(\hat{x},\hat{v})\)) and an active-sensing component (ua(t)). See Methods for details. b, Triggered-excitation (TE) strategy showing the temporal traces of measure of uncertainty (M, trace of the error covariance) with threshold levels Tmin = 4.8 × 10⁻³ and Tmax = 6 × 10⁻³ (top), input u (bottom; active-sensing component, ua(t): light blue; state-feedback component, \(-f(\hat{x},\hat{v})\): dark blue). The triggering started as M exceeded Tmax and the triggering continued until M dropped below Tmin. c–e, Simulated position traces (actual states: teal, estimated states: red) using three different exploratory movement strategies for ua(t): zero excitation (c), persistent excitation (d) and triggered excitation (e) along with the state feedback. The respective RMS values of the tracking error (eRMS) are shown in the panels. f–k, Controller input traces (f–h) and velocity histograms (i–k) for various schemes as in c–e. Respective RMS values of the inputs (uRMS) are shown in f–h. In i–k, the fits with a normal distribution (magenta dashed) and three-component GMM (blue solid) are shown along with the respective kurtosis (κ) values. l, Effect of exploratory movement (variance of ua(t)) on tracking error (e) and control effort (u) in persistent excitation. The minimum RMS tracking error and the corresponding control effort are denoted as ePE,min and uPE,min, respectively. m,n, Effect of threshold pair (Tmax, Tmin) in triggered excitation on tracking error (m) and control effort (n) at the optimum variance of ua(t) corresponding to ePE,min. The solid lines in m and n show the respective mean and the shaded regions in m correspond to respective s.e.m. (n = 300 independent simulations). Note that with the right choice of threshold pair, the triggered-excitation scheme can achieve lower tracking error with substantially lower control effort.

E. virescens exhibited fast and slow behavioural modes

We examined the behaviour of individual E. virescens as they performed untrained station keeping within a fixed refuge (Fig. 1a,b). Station keeping requires only small corrective movements; therefore, any significant movements by the fish are attributed to information-seeking, exploratory movement2,3,5. Previous work23 has demonstrated that E. virescens use both vision and electrosense for station keeping. Hence varying the light level is an experimental mechanism to examine the effect of visual salience on the selection between explore and exploit movements.

We measured the movements of five individual fish in 40 s duration station-keeping trials, in two lighting conditions: lights ‘off’ trials had low illumination (~0.3 lx; Supplementary Video 1) and lights ‘on’ trials had bright illumination (~80 lx; Supplementary Video 2). We conducted between seven and ten trials per condition per fish. We discarded trials in which the fish changed its swimming direction or exited the refuge. Consistent with previous studies2,5, fish moved significantly more in lights-off conditions than in lights-on conditions (Fig. 1c,d and Extended Data Fig. 1a–d). However, these previous analyses2,5 focused on tracking performance using analytical methods including Fourier analysis and root-mean-square (RMS) metrics that masked the temporal structure of the active-sensing movements that we seek to understand in this paper.

We found the patterns of fish swimming velocities were consistent with a mode-switching strategy. The distribution of velocities (v) featured a sharp peak around v = 0 with ‘broad’ shoulders for faster movements (Fig. 1e,f, right). These empirical distributions differed from a Gaussian distribution in two ways: (1) the distinct central peak and (2) the broad shoulders corresponding to the faster movements. The central peak (near-zero velocity) represents slow movement and the broad shoulders represent faster movement. These behavioural modes are associated with exploit and explore, respectively, as discussed in greater detail below.

The two behavioural modes were significantly better approximated by three-component Gaussian mixture models (GMMs) than by single-component models (Fig. 1e,f, right). This was shown by three measures, namely Kullback–Leibler divergence, Bayesian information criterion and closeness of quantile–quantile plots to the reference line (Extended Data Table 1, Extended Data Fig. 1e,f and Supplementary Fig. 1). The three-component GMMs generally comprised a sharp central Gaussian peak, capturing slow, task-oriented station-keeping movements, and two Gaussian ‘shoulders’, capturing faster, positive (forwards) and negative (backwards) exploratory movements. We found that only modest improvements in the fit of the GMMs occurred when using more than three components (Extended Data Fig. 1g).
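
For reference, two of the model-comparison measures named above have standard textbook definitions. The following is a sketch in generic notation; the symbols k, n, \(\hat{L}\), p and q are introduced here for illustration and are not taken from the Methods:

$$\mathrm{BIC}=k\ln n-2\ln \hat{L},\qquad {D}_{\mathrm{KL}}(p\parallel q)=\int p(v)\ln \frac{p(v)}{q(v)}\,{\mathrm{d}}v.$$

Here n is the number of velocity samples, \(\hat{L}\) is the maximized likelihood of the fitted model and k is its number of free parameters: k = 2 for a single Gaussian (mean and variance) and k = 8 for a three-component one-dimensional GMM (three means, three variances and two independent mixing weights). In the divergence, p stands for the empirical velocity density and q for the fitted model density. Lower values of either measure indicate a better fit; because the BIC penalizes the six extra parameters, a GMM is preferred under this criterion only when its gain in likelihood outweighs that penalty.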

The fast-mode movements increased in frequency in lights-off trials, increasing the relative prominence of the ‘shoulders’. For example, Fig. 1e,f shows representative data from one fish in which there were 48 fast movements with lights off (Fig. 1e, left) but only 13 fast movements with lights on (Fig. 1f, left). Interestingly, the overall higher proportion of fast velocities in lights-off trials leads to a surprising result, namely, higher kurtosis values for lights-on versus lights-off trials (Extended Data Fig. 1h). In other words, the increase in frequency of fast motions in the dark leads to a decrease in the relative prominence of the central, task-oriented velocity peak at v = 0, so that the overall distribution is closer to Gaussian and the kurtosis trends towards 3.
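
The link between shoulder prominence and kurtosis can be illustrated with a toy calculation. The sketch below computes the kurtosis of a symmetric three-component Gaussian mixture in closed form; the function name `mixture_kurtosis` and all parameter values are hypothetical and are not fitted to the fish data.

```python
def mixture_kurtosis(p_fast, sd_slow=0.1, mu_fast=1.0, sd_fast=0.5):
    """Kurtosis (fourth moment / squared second moment) of a symmetric mixture:
    a slow central peak N(0, sd_slow^2) with weight 1 - p_fast, and two fast
    shoulders N(+mu_fast, sd_fast^2) and N(-mu_fast, sd_fast^2) with weight
    p_fast / 2 each. All default values are illustrative assumptions."""
    if not 0.0 <= p_fast <= 1.0:
        raise ValueError("p_fast must lie in [0, 1]")
    # The mixture is symmetric about zero, so its mean is zero and the raw
    # moments below are also the central moments.
    m2_slow = sd_slow ** 2
    m4_slow = 3.0 * sd_slow ** 4
    # Raw moments of N(mu, s^2): E[v^2] = mu^2 + s^2,
    # E[v^4] = mu^4 + 6 mu^2 s^2 + 3 s^4 (identical for +mu and -mu).
    m2_fast = mu_fast ** 2 + sd_fast ** 2
    m4_fast = mu_fast ** 4 + 6.0 * mu_fast ** 2 * sd_fast ** 2 + 3.0 * sd_fast ** 4
    m2 = (1.0 - p_fast) * m2_slow + p_fast * m2_fast
    m4 = (1.0 - p_fast) * m4_slow + p_fast * m4_fast
    return m4 / m2 ** 2


# Shift weight from the central peak to the shoulders and watch the kurtosis.
for p in (0.1, 0.3, 0.5):
    print(p, round(mixture_kurtosis(p), 2))
```

With these assumed parameters, the kurtosis falls as the shoulder weight grows from 0.1 to 0.5, which mirrors the direction of the lights-on versus lights-off comparison; the numerical values themselves carry no empirical meaning.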

We found the trend towards a Gaussian distribution of movement velocities in lights-off trials (reduced sensory salience) to be surprising because the exploratory movements for actively sensing the environment are associated with a nonlinear requirement24,25 to make movements that are potentially in conflict with task goals. Therefore, our initial hypothesis, that this nonlinearity would produce increased deviation from a Gaussian velocity distribution as sensory salience was reduced, was not supported. Our initial intuition failed because we did not appreciate that decreases in sensory salience drive the selection of explore behaviour, and that behaviour itself is approximately Gaussian, ultimately reducing the relative prominence of the task-oriented central peak.

Interestingly, reanalysis of data from a previous study of exploratory movements in a similar refuge-tracking paradigm in E. virescens shows the same relationship between sensory salience and changes in velocity profiles, but for modulations of a different sensory modality, namely, electrosensation3. In these previous experiments, artificially generated electrical signals were used to diminish the salience of electrosensory information as the electric fish performed the refuge tracking. Our reanalysis of these published data (see Supplementary Material and Methods for details) showed that fish exhibited the distinctive non-Gaussian distribution of velocities. Moreover, the velocity distributions were modulated in relation to electrosensory salience: lower kurtosis values (corresponding to more normal distributions) occurred in experimental trials with added artificial electrosensory ‘jamming’ signals (Extended Data Fig. 2a–f).

Sensory salience drives explore–exploit mode switching

How do changes in sensory salience drive changes in mode switching? To investigate this question, we segregated the velocity trajectories into ‘S’, a slow-velocity mode (exploit) comprising task-oriented, station-keeping movements, and ‘F’, a fast-velocity mode (explore) comprising large positive (forwards) and negative (backwards) movement velocities (Fig. 2a,b, Extended Data Fig. 3 and Supplementary Fig. 2; see Methods for different clustering algorithms used).

Fish produced slow- and fast-velocity modes of movements in both lights-on and lights-off trials. We computed the residence time in each behavioural mode as the proportion of the trial duration (40 s) spent in that mode (note that the residence times in the slow and fast modes add up to unity). The residence time τs in the slow exploit mode was significantly higher (>1.7 times) in lights-on trials than in lights-off trials (Fig. 2c). In contrast, the residence time in the fast explore mode (1 − τs) was higher in lights-off than in lights-on trials.

Fish switched between slow and fast modes more frequently in lights-off trials than in lights-on trials (Fig. 2d). From the computation of the transition rates between slow (S) and fast (F) modes as a two-state Markov process, we found that the transition rate S → F was significantly lower in lights-on versus lights-off trials, that is, the slow (exploit) state was visited more frequently in the lights-on trials compared with lights-off trials (Extended Data Fig. 3b). This salience-dependent modulation of switching frequency was the key mechanism by which movement velocity distributions trended towards a Gaussian distribution as a function of decreased sensory salience.
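
The residence-time and switching measures can be sketched as follows. This minimal example labels samples with a fixed speed threshold, which is a simplifying assumption (the analyses above used clustering algorithms; see Methods), and it estimates per-sample transition probabilities rather than continuous-time rates. The function name `mode_statistics` and the threshold value are hypothetical.

```python
def mode_statistics(velocities, speed_threshold):
    """Label each velocity sample as slow ('S') or fast ('F') and summarize
    the resulting two-state sequence.

    Returns a dict with the slow-mode residence fraction (tau_s), the number
    of S<->F switches, and the empirical per-sample transition probabilities
    P(S->F) and P(F->S)."""
    if not velocities:
        raise ValueError("need at least one velocity sample")
    # Assumed classifier: a sample is 'fast' if its speed exceeds the threshold.
    labels = ["F" if abs(v) > speed_threshold else "S" for v in velocities]
    tau_s = labels.count("S") / len(labels)

    # Count transitions between consecutive samples.
    counts = {("S", "S"): 0, ("S", "F"): 0, ("F", "S"): 0, ("F", "F"): 0}
    for current, following in zip(labels, labels[1:]):
        counts[(current, following)] += 1

    from_s = counts[("S", "S")] + counts[("S", "F")]
    from_f = counts[("F", "F")] + counts[("F", "S")]
    return {
        "tau_s": tau_s,
        "switches": counts[("S", "F")] + counts[("F", "S")],
        "p_sf": counts[("S", "F")] / from_s if from_s else float("nan"),
        "p_fs": counts[("F", "S")] / from_f if from_f else float("nan"),
    }


# Made-up velocity samples for illustration only.
example = [0.0, 0.1, 2.0, -1.5, 0.05, 0.0, 3.0, 0.1]
print(mode_statistics(example, speed_threshold=1.0))
```

The fast residence fraction is simply 1 − `tau_s`, so a single number per trial suffices for the comparison between lighting conditions.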

Mode-switching across taxa, behaviours and sensory modalities

Is this mode-switching solution to the explore-versus-exploit problem found in other species, in other categories of behavioural tasks and in control systems that rely on other sensing modalities?

To answer this question, we analysed published data for an additional ten species, representing a wide phylogenetic range of taxa, from single-celled organisms to humans, involving categorically different tasks and sensorimotor regimes3,14,15,16,17,18,19,20,22. These taxonomically diverse species were selected to encompass a wide range of behaviours that rely on a broad range of sensory systems (Fig. 3). For every example we examined, we found the same distinctive non-normal distribution of velocities, with a peak at low-velocity movements and broad shoulders for higher-velocity movements (Fig. 4).

For example, postural sway movements in humans are thought to prevent the fading of postural state information during balance26. Our reanalysis of quiescent stance data17 revealed evidence of mode switching (Fig. 4a) that is remarkably similar to our findings in electric fish. In the quiescent stance task, human participants used visual and tactile feedback to maintain an upright posture. The distribution of sway velocities revealed a distinct peak at low velocities corresponding to the task goal, and broad shoulders for higher velocities produced by the exploratory movements; the velocity statistics were better captured by a GMM than by a normal distribution. Furthermore, the velocity distribution showed the same surprising relation to changes in salience: it became more Gaussian, and the switching frequency increased, when sensory salience was decreased (Extended Data Fig. 4a–d), as seen in the electric fish E. virescens.

Mode switching was also observed in invertebrate species. For example, the Carolina sphinx hawkmoth (Manduca sexta) uses somatosensory feedback from its proboscis to detect the curvature of flowers when searching for nectaries at dawn and dusk20. In this search behaviour, which has dynamics that are qualitatively similar to vibrissal sensing in rats27, the moth sweeps its proboscis across the surface of the flower using a combination of low- and high-velocity movements. Our analysis of the distribution of the rate of change of radial orientation angle (angle between the proboscis tip trajectory and the radial axis of the flower) before the insertion of the proboscis tip into the nectary shows the characteristic sharp peak with broad shoulders (Fig. 4g) that is captured by a GMM. Experimental changes in the shape of artificial flowers that degrade the salience of the curvature of the flower surface20 resulted in a decrease of the kurtosis value of the proboscis angular velocity distribution (Extended Data Fig. 5a–f), similar to how both E. virescens and humans responded to changes in sensory salience.

The fact that this salience-based, mode-switching strategy was found in two distantly related classes (mammalia and insecta) performing very different behaviours, using different sensorimotor systems, suggests that the strategy emerged as a convergent solution to the explore-versus-exploit problem. We found additional evidence of convergence of this solution in reanalysis of eight additional datasets: visual saccades in humans (Fig. 4b and Supplementary Fig. 3)21 and in house mice Mus musculus (Fig. 4c and Supplementary Fig. 4)15, movements of the pinnae of echolocating big brown bats Eptesicus fuscus (Fig. 4d and Supplementary Fig. 5)16, olfaction in eastern moles Scalopus aquaticus (Fig. 4e and Supplementary Fig. 6)14 and American cockroaches Periplaneta americana (Fig. 4f and Extended Data Fig. 4e–f)18, and visual tracking of a swaying flower in three species of hawkmoths (M. sexta, Deilephila elpenor and Macroglossum stellatarum; Fig. 4h and Extended Data Fig. 5g–p)19. The discovery of similar, parsimonious velocity distributions across taxa, behaviours and sensing modalities, with consistent dependency on sensory salience, was surprising.

Intriguingly, our analysis of the dynamics of transverse exploration by pseudopods of amoebae22 (Amoeba proteus and Metamoeba leningradensis) reveals similar GMM velocity distributions in response to an electric field (Extended Data Fig. 6). Although our modelling approach (see next section) includes inertial dynamics, which cannot be directly applied to movement of organisms in the low-Reynolds-number regimes occupied by single-celled and other microscopic organisms, these observations are consistent with a mode-switching strategy for the control of movement in these amoebae.

The examples described above include a broad phylogenetic array of organisms that perform a variety of behavioural tasks using different control and morphophysiological systems. Just as these behavioural systems evolved within each of the lineages represented in our reanalyses, we suggest that mode switching probably evolved independently in each lineage as well. In other words, the similarities we found across taxa are the result of convergent evolution towards a common solution—mode switching—for the explore-versus-exploit problem.

Heuristic model of the mode-switching strategy

Why might animals use mode switching, rather than the simpler heuristic of applying continual, low-amplitude exploratory inputs used by control engineers13? To address this question, we propose a parsimonious heuristic model that comprises a nonlinear motion-dependent sensor, a linear musculoskeletal plant, a state estimator (also known as an observer28) and a mode-switching controller (Fig. 5a). For the musculoskeletal plant, we assumed a simplified second-order Newtonian model9,24:

$$\begin{array}{rcl}\frac{{\mathrm{d}}x(t)}{{\mathrm{d}}t}&=&v(t),\\ \frac{{\mathrm{d}}v(t)}{{\mathrm{d}}t}&=&-\frac{b}{m}v(t)+\frac{1}{m}u(t)+{w}_{1}(t).\end{array}$$
(1)

Here t is dimensionless time, x(t) is dimensionless position, v(t) is dimensionless velocity, u(t) is the controller input and w1(t) is process noise. The process noise includes noise due to physical disturbances29,30 as well as motor noise31. The system parameters m and b represent unitless mass and viscous damping, respectively.
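
As an illustration of equation (1), the plant can be stepped forward numerically. The sketch below uses a forward-Euler scheme and omits the process noise w1(t) so that the result is deterministic; the function name `simulate_plant`, the step size and the parameter values are hypothetical choices, not those used in the simulations reported here.

```python
import math


def simulate_plant(u_of_t, m=1.0, b=1.0, dt=0.001, t_end=1.0, x0=0.0, v0=0.0):
    """Forward-Euler integration of the noise-free plant
        dx/dt = v,    dv/dt = -(b/m) v + u/m,
    where u_of_t(t) returns the controller input at time t.
    Returns the final (position, velocity)."""
    n_steps = round(t_end / dt)
    x, v = x0, v0
    for k in range(n_steps):
        u = u_of_t(k * dt)
        # Update both states from the values at the start of the step.
        x, v = x + dt * v, v + dt * (-(b / m) * v + u / m)
    return x, v


# With zero input, velocity should decay as v0 * exp(-(b/m) t).
x_end, v_end = simulate_plant(lambda t: 0.0, m=1.0, b=2.0, v0=1.0)
print(v_end, math.exp(-2.0))
```

With a constant input the velocity settles at u/b, which is one quick way to check that m and b are playing the roles of mass and viscous damping.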

The key feature of the model is that the nonlinear sensory system (that is, the ‘motion-dependent sensor’) embodies the high-pass filtering (that is, fading or adapting) characteristics found across biological sensory systems26,32,33,34,35,36,37,38. This sensory system model (‘motion-dependent sensor’ in Fig. 5a) assumes nonlinear measurements that decay to zero over time in the face of constant stimuli:

$$y(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}t}s(x(t))+{w}_{2}(t)=g(x)v(t)+{w}_{2}(t).$$
(2)

Here s(x) is the position-dependent sensory stimulus experienced by the organism, g(x) is the spatial derivative of the sensory stimulus (ds(x)/dx), and w2(t) is the sensory noise. The controller includes a state-feedback-based task-level control policy, \(-f(\hat{x},\hat{v})\), which exploits previously collected sensory information; that information is parsimoniously encoded (1) in estimates of the position and velocity \((\hat{x},\hat{v})\), and (2) in an ongoing measure of uncertainty, M (based on the covariance of position and velocity estimates; see Methods). Previous theoretical work has demonstrated that exploratory movements are required for state estimation in control systems that rely on such high-pass (that is, fading) sensors24,25. Hence, the controller also includes an active-sensing control policy, ua(t), that seeks to gain new information through exploratory movements.
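
One way to see why such a sensor demands exploratory movement is to linearize equation (2), as an extended Kalman filter does. The following is a sketch of that standard step rather than a derivation taken from the Methods; the symbols h and H are introduced here for illustration. Writing the noise-free measurement as h(x, v) = g(x)v, its Jacobian with respect to the state is

$$H(x,v)=\left[\begin{array}{cc}\frac{\partial h}{\partial x}&\frac{\partial h}{\partial v}\end{array}\right]=\left[\begin{array}{cc}{g}^{{\prime} }(x)\,v&g(x)\end{array}\right].$$

At rest (v = 0) the first entry vanishes, so to first order the measurement carries no information about the position x. Only when the organism moves (v ≠ 0) at a location where \({g}^{{\prime} }(x)\ne 0\) does y become sensitive to x.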

To find the optimal balance between exploit and explore components, for a given admissible control policy, π, and a given weight, r, for an input, \(u\in {{{\mathcal{U}}}}\) (action space), we can define the average steady-state cost function:

$$J_{\pi}=\limsup_{t\to\infty}\mathbb{E}[x(t)^{2}+ru(t)^{2}]$$
(3)

where \({\mathbb{E}}\) is the expectation computed over all the trajectories induced by the admissible control policy, π. Note that a control policy is admissible if it depends causally on the sensor and actuator data. We chose the cost function, Jπ, as a weighted combination of steady-state tracking error and control effort. Even with complete knowledge of the system states, computation of the optimal solution \({J}_{{\pi }^{* }}=\mathop{\inf }\limits_{\pi \in {{\varPi }}}{J}_{\pi }\), where Π is the set of all admissible control policies, is only tractable in the case of linear systems or systems with finite state and action spaces39. As the system is partially observed, existing approaches to optimal control require the solution to an optimal filtering problem and then formulate feedback laws on the filter states39. The filtering problem requires computation of the conditional probability \({\mathbb{P}}\left(\left[\begin{array}{c}x(t)\\ v(t)\end{array}\right]\left\vert \right.\,y(\tau ),u(\tau )\,\forall \tau \le t\right)\). However, due to nonlinearity in the measurement (equation (2)), there is no tractable method to compute this conditional probability, and so heuristic strategies are required. We tested three exploratory movement heuristics for the controller to find an approximate answer to this intractable optimal control problem.
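
In a simulation, the cost in equation (3) can be approximated from sampled trajectories. The sketch below replaces the expectation and the limit with a time average over one finite record, which is an assumption of this example (it presumes the record is long and representative); the function names `rms` and `empirical_cost` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square of a non-empty sequence of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def empirical_cost(x, u, r):
    """Time-average estimate of E[x^2 + r u^2] from equal-length sequences
    of positions x and controller inputs u, with effort weight r."""
    if len(x) != len(u) or not x:
        raise ValueError("x and u must be non-empty and of equal length")
    return sum(xi * xi + r * ui * ui for xi, ui in zip(x, u)) / len(x)


# Made-up samples for illustration only.
x_samples = [1.0, -1.0, 2.0]
u_samples = [0.5, 0.0, -0.5]
cost = empirical_cost(x_samples, u_samples, r=2.0)
# The same quantity written in terms of RMS tracking error and RMS effort.
print(cost, rms(x_samples) ** 2 + 2.0 * rms(u_samples) ** 2)
```

For station keeping about x = 0, this estimate equals the squared RMS tracking error plus r times the squared RMS control effort, which ties the cost to the eRMS and uRMS values reported in Fig. 5.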

(1) Zero excitation: this is a passive strategy (that is, no exploration) in which the system provides no input excitation for the actuation forces (ua(t) = 0 for all t). This is a conventional state-feedback controller.

(2) Persistent excitation: this scheme tests a common continuous exploration strategy used in the field of adaptive control13. The controller continually injects a Gaussian input ua(t).

(3) Triggered excitation: this mode-switching strategy depends on lower and upper thresholds, Tmin and Tmax; the controller injects Gaussian input only when the uncertainty in the state estimator M exceeds Tmax, and then continues to inject input until this uncertainty drops below a lower threshold, Tmin (Fig. 5b).
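
The switching rule in the triggered-excitation strategy is a two-threshold hysteresis. The sketch below applies that rule to a given sequence of uncertainty values; in the closed-loop model M itself depends on the injected input, so treating M as a fixed sequence is a simplification. The function name `triggered_excitation_mask` is hypothetical, and the default thresholds are the values quoted in the caption of Fig. 5b.

```python
def triggered_excitation_mask(uncertainty, t_min=4.8e-3, t_max=6.0e-3):
    """Return one boolean per sample: True while active sensing is 'on'.

    Active sensing starts when the uncertainty M exceeds t_max and stays on
    until M drops below t_min (two-threshold hysteresis)."""
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    active = False
    mask = []
    for m_value in uncertainty:
        if not active and m_value > t_max:
            active = True       # uncertainty too high: start exploring
        elif active and m_value < t_min:
            active = False      # uncertainty low again: stop exploring
        mask.append(active)
    return mask


# Made-up uncertainty trace for illustration only.
trace = [5.0e-3, 5.9e-3, 6.1e-3, 5.5e-3, 4.9e-3, 4.7e-3, 5.5e-3, 6.2e-3]
print(triggered_excitation_mask(trace))
```

Setting both thresholds very low keeps the mask permanently on once triggered, which recovers persistent excitation; the gap between the two thresholds prevents rapid chattering when M hovers near a single threshold.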

As previous theoretical work has shown25, the zero-excitation strategy (that is, traditional state-estimate feedback) cannot minimize the state estimation error and hence, not surprisingly, results in poor tracking performance (Fig. 5c), thus illustrating the need for an additional active-sensing component in the face of adaptive sensing and perceptual fading. The persistent-excitation and triggered-excitation strategies both facilitate substantially better position control than does the zero-excitation strategy (Fig. 5d,e). Although these two strategies resulted in comparable tracking errors (eRMS; Fig. 5d,e), the triggered excitation was more efficient, requiring substantially lower control effort (uRMS; Fig. 5g,h). Moreover, unlike the persistent-excitation strategy, triggered excitation generated a distinctive broad-shouldered velocity distribution that featured a sharp peak near zero, with broad shoulders corresponding to bursts of fast movement (Fig. 5j,k). This distribution was strikingly similar to experimental observations across organisms (Figs. 1 and 4 and Extended Data Figs. 4–6), suggesting that such broad shoulders are a signature (if not definitive proof) of a mode-switching strategy.

We showed that active exploration is essential for better tracking performance as it improves state estimation. But there is a point of diminishing returns: although higher (more energetic) active excitation can result in excellent state estimation, there is a point beyond which these additional active-sensing movements lead to greater tracking errors.

To compare persistent excitation with triggered excitation, we performed a numerical study to obtain the variance of the active-sensing signal ua(t) that minimizes the RMS tracking error for the persistent excitation strategy, ePE,min. Note that persistent excitation is the limiting case of triggered excitation with extremely low threshold values (that is, ensuring that the active-sensing mode is always ‘on’). With that optimum stimulation obtained from persistent excitation, we next performed a parameter sweep involving threshold pair (Tmax, Tmin) in the triggered excitation. We discovered that the choice of thresholds in the triggered-excitation strategy plays an important role: with the right choice of parameters we could achieve better tracking performance (Fig. 5m) at reduced control effort (Fig. 5n). The choice of thresholds also shapes the velocity to best extract sensory information; with low thresholds, the statistics approach those of the persistent excitation, whereas high thresholds lead to velocity distributions with higher kurtosis (departure from normality), while requiring less control effort (Extended Data Fig. 7a–c).

How does sensory salience affect performance of the triggered excitation (mode switching) heuristic? To simulate the effects of changes in sensory salience, we parametrically varied the sensory noise variance while keeping constant the switching thresholds Tmin and Tmax. As the sensory noise variance was increased (simulating a decrease in salience), the kurtosis value of the velocity distribution decreased, numerically approaching normality in the limit of high sensory noise (Extended Data Fig. 7d–g). This trend of decreased kurtosis in the face of increased noise variance captures the widespread observation that, in animals, the velocity statistics tend towards a Gaussian distribution as sensory salience is decreased. Moreover, the underlying mechanism, namely, increasing frequency of bursts of exploratory movements, matches our experimental observations in E. virescens, which performed more frequent transitions to fast movements and spent less time in the slow mode in the lights-off trials than in the lights-on trials (Fig. 2c,d). These analyses clarify that this trend towards a Gaussian distribution with decreased salience is an epiphenomenon of mode switching: as the frequency of fast movement bursts increases, it overwhelms the task-oriented movements, diminishing the prominence of the central peak.

Discussion

We examined explore–exploit trade-offs in the context of goal-directed motor behaviours, such as station keeping, postural balance and plume tracking, that require active, exploratory movements to enhance sensation. We discovered that the velocity distributions that emerge from the interplay between exploratory movements and goal-directed control are broad-shouldered across taxa, and that this distinctive distribution of movements is robustly modulated by sensory salience. The bouts of ancillary movements that comprise the broad shoulders of these velocity distributions are commonly described as ‘active sensing’, that is, the expenditure of energy by organisms for the purpose of sensing40, for example, ancillary movements described here. Active sensing also includes the emission of energetically costly signals such as electric fields by weakly electric fishes41 and echolocation calls in dolphins, birds and bats42,43,44,45. Active-sensing research in humans, in relation to touch, was popularized in the 1960s by J. J. Gibson1, and the original ideas date back at least to the eighteenth century (for a historical account, see ref. 40).

Surprisingly, active sensing is largely avoided in engineering design despite being ubiquitous in animals. The performance of engineered systems may benefit from the generation of movement for improved sensing. An algorithm known as Ergodic Information Harvesting (EIH)3 could be used to control movements for sensing in artificial systems. This algorithm balances the energetic costs of generating movements against the expected reduction in sensory entropy. The EIH has been tested in relation to several animal model systems, and produces plausible animal trajectories3.

Interestingly, the EIH algorithm produces a trend in the kurtosis of velocity distributions in relation to sensory salience (Extended Data Fig. 2g–l and Extended Data Table 3) that is opposite to the one we observed in our experiments, in our reanalysis of previous data and in our model: as sensory salience decreases, there is an increase in active-sensing movements but also an increase in kurtosis (Extended Data Table 2). This increase in kurtosis occurs in part because EIH generates continuous sensing movements and does not incorporate mode switching. A refined EIH model that generates the temporally distinct periods of sensing movements that characterize mode switching would better reflect our findings in animals, and is a promising strategy for improving the performance of robotic control systems.

How mode switching is manifest across the diverse biological systems we examined is a compelling open question. Many of these control systems have evolved via convergent evolution in which adaptive strategies emerge independently across lineages. One result of convergent adaptation is that species often rely on idiosyncratic features, such as feathers or skin flaps, to achieve the same adaptive strategy, such as flapping flight. We infer that the mechanisms for mode switching are present in control systems that range from subcellular systems46 to neural systems in vertebrates.

The mechanisms for mode switching in vertebrate nervous systems may emerge at different levels within sensorimotor control pathways. For example, neurophysiological recordings show that sensory salience can be encoded in brain circuits via synchronization and desynchronization of spiking activity47. Such population coding of salience48,49,50, when coupled with a threshold, could trigger discrete bursts of motor activity for sensing8. Motor circuits for the production of discrete bursts of movement occur in spinal circuits51.

These discrete bursts of movements could arise from reflex-like, threshold-based activity in animals, akin to how Mauthner cells trigger a cascade of motor activity when sensory inputs exceed a threshold52. A key difference between reflex-like, threshold-based behaviours and the mode switching we describe in this paper would be that the signal in question would arise from an internal representation of sensory uncertainty, rather than from the overall level of sensory excitation. Such a reflex-like action could produce stereotyped forms of interactions with the external environment in relation to sensing8.

A common engineering approach to sensing and control is to add sensors and improve sensor performance, particularly at low frequencies, effectively side-stepping the need for active-sensing movements altogether. Such improved sensing enhances observability without relying on movement. In stark contrast, organismal sensor systems are almost invariably adapting (high pass), necessitating active sensing. Irrespective of whether organisms have achieved an optimal solution to the control problem (or instead are limited by evolutionary constraints on sensor performance), the widespread convergent evolution of a common active-sensing strategy nevertheless suggests an alternative engineering design paradigm. The confluence of adapting sensors53 and the uncertainty-triggered mode-switching heuristic presented in this paper provides a new roadmap for movement control of robotic systems.

In this paper, the explore–exploit trade-off arises from the need for active-state estimation28 in a subset of tasks in which movement is used both for acquisition of information and achieving task goals. However, similar trade-offs arise in a wide variety of potentially more complex behaviours. For example, in foraging where the resources are found in patchy distributions, organisms balance the trade-offs between exploiting a local food source, exploring for distant sources54 and the costs of predation across the habitat55. Similarly, reinforcement learning involves choosing whether to adhere to a familiar option with a known reward or to take the risk of exploring unknown options that can lead to increased rewards over the longer term56. We do not have direct evidence that the broad-shouldered feature we have identified in the animal movements described here (Figs. 1 and 4), which reflects the manifestation of mode switching, is also found in these behavioural domains across taxa. Recent evidence from studies of human reinforcement learning, however, appears to be consistent with mode-switching behaviour.

Methods

Tracking of glass knifefish

Subjects

We obtained adult, weakly electric, glass knifefish E. virescens (10–15 cm in length) from commercial vendors, and housed the fish according to the published guidelines57. The water temperature in the experimental tank was kept between 24 °C and 27 °C, and conductivity ranged from 10 μS cm−1 to 150 μS cm−1. Fish were transferred from the holding tank to the experimental tank 12–24 h before the experiments to allow for acclimation. All experimental procedures were approved by the Johns Hopkins Animal Care and Use Committee, and followed guidelines established by the National Research Council and the Society for Neuroscience.

Experimental apparatus

The experimental apparatus was similar to that used in previous studies2,6,8,23,58. The refuge was machined from a 152-mm-long segment of 46 × 50 mm rectangular PVC tubing, with the bottom surface removed to allow the camera to record the ventral view of the fish. On both sides of the refuge, a series of 6 rectangular windows (6 mm wide × 31 mm high, spaced 19 mm apart) were machined, through which to provide visual and electrosensory cues.

A computer sent designed digitized input stimuli (25 Hz) from LabVIEW (National Instruments) to a Field Programmable Gate Array based controller for a stepper motor (STS-0620-R, HW W Technologies). The stepper motor drove a linear actuator, leading to the one-degree-of-freedom refuge movement in real time. A video camera (pco.1200, PCO AG) captured fish movements through mirror reflection at 100 Hz. The captured frames (width × height, 1,280 pixels × 276 pixels) were saved as 16 bit .tif files via camera application software (pco.camware, PCO AG).

Experimental procedure

The experiments were conducted in two illuminance levels—around 0.3 lx (lights off) and 80 lx (lights on). Each trial lasted for 60 s. During the initial 10 s of each trial, the refuge was actuated to follow a 0.45 Hz sinusoidal trajectory, the amplitude of which was gradually increased to 3 cm, and then decreased to 0 at the end of the 10 s interval, in a similar fashion as described in ref. 2. After the initiation phase, the refuge remained stationary for 40 s, finally followed by a termination phase for 10 s, during which the refuge was actuated in a similar fashion as during the initiation phase.

Tracking algorithm

To observe fine details of the fish movement, we used a high frame rate in our video recordings. High tracking accuracy was essential as the position and velocity data were likely to be contaminated by measurement noise. To ensure high tracking accuracy, the refuge and fish position were analysed by custom video tracking software59 developed by Balázs P. Vágvölgyi from the Laboratory for Computational Sensing and Robotics, at Johns Hopkins University.

The tracking algorithm worked in two phases. The first phase was template matching, which roughly located the targets (fish or refuge). In the first frame of a given video, a rectangular region was manually selected around the target to create a template. On subsequent frames, a neighbourhood region around the template (±20 pixels) from the previous frame was selected for the computation of a normalized two-dimensional cross-correlation matrix. If the target changed its orientation in the new frames, before computing the normalized two-dimensional cross-correlation, the new image frame was first rotated to match the orientation of the template from previous frame. If needed, the areas of the image were sampled (then scaled and interpolated if necessary) with subpixel accuracy.
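The coarse template-matching phase can be sketched as follows. This is a minimal illustration, not the custom tracking software used in this study: it assumes greyscale frames stored as nested lists, omits the rotation and subpixel sampling steps, and uses hypothetical function names (`ncc`, `match_template`). The default search radius mirrors the ±20 pixel neighbourhood described above.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists."""
    a = [p for row in patch for p in row]
    b = [t for row in template for t in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    # A constant patch has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def match_template(image, template, prev_rc, radius=20):
    """Search within +/- radius pixels of the previous top-left corner.

    Returns the (row, col) of the best-matching patch and its NCC score.
    """
    th, tw = len(template), len(template[0])
    height, width = len(image), len(image[0])
    r0, c0 = prev_rc
    best_score, best_rc = -2.0, prev_rc  # NCC is bounded below by -1
    for r in range(max(0, r0 - radius), min(height - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(width - tw, c0 + radius) + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc, best_score
```

The integer location returned here would serve only as the starting point for the second, subpixel refinement phase.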

After creating the template, the second phase applied the Levenberg–Marquardt algorithm to find the global maximum of the normalized cross-correlation function. This step produced a match between the template and target at each frame, with subpixel accuracy. We performed extensive preliminary testing and analysis to confirm that the remaining measurement errors had smaller variance than the stochastic movements of the fish.

Data processing

The tracking algorithm stored the fish position in both horizontal and vertical directions (originally in pixels, along with the respective pixel-to-metre conversion factor) and the angle of orientation (in degrees) in .csv files. We used only the data while the refuge was stationary (40 s, 4,000 data points in total) for each trial. To further reduce the measurement noise, the position data were filtered through a Butterworth zero-phase distortion filter (filtfilt command in MATLAB) with a 10 Hz cut-off frequency. Fish velocity in the horizontal direction was computed as forward differences of the horizontal position time series.
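As a minimal sketch of the final step, the snippet below computes velocity by forward differences from a position series that is assumed to have already been converted to metres and low-pass filtered. Dividing by the sampling interval (0.01 s at 100 Hz) is our assumption about how the differences were scaled, and the function name is hypothetical.

```python
def forward_difference_velocity(position, dt=0.01):
    """Forward-difference velocity: v[k] = (x[k+1] - x[k]) / dt.

    position : list of positions sampled every dt seconds
    dt       : sampling interval in seconds (0.01 s for 100 Hz video)
    The result has one fewer sample than the input.
    """
    return [(nxt - cur) / dt for cur, nxt in zip(position, position[1:])]


if __name__ == "__main__":
    x = [0.000, 0.001, 0.003, 0.003]  # illustrative positions in metres
    print(forward_difference_velocity(x))
```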

Identification and characterization of behavioural modes

For the identification of the behavioural modes, we used three different clustering approaches—(1) a GMM with inflection-point-based clustering, (2) hidden Markov model (HMM)-based clustering, and (3) a GMM with maximum a posteriori probability (MAP)-based clustering.

For GMM with inflection-point-based clustering, the velocity (v) data from each individual fish at a specific lighting condition were clustered into three components, slow, fast positive and fast negative (Extended Data Fig. 3a,b), using two velocity thresholds, vL and vH (vL < vH), resulting in two behavioural modes—slow and fast (fast positive and negative were combined). The velocity threshold values were computed by finding the inflection points of the GMM fits to the velocity data, fGMM, specific to a lighting condition. To numerically identify the inflection points of fGMM, we numerically computed the spatial second-order derivative of fGMM (\({f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }\)), and located the first and the last indices of the array \({{f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }}/{{f}_{{{{\rm{GMM}}}}}}\) such that the condition \({{f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }}/{{f}_{{{{\rm{GMM}}}}}} < c\) was satisfied for a given ad hoc constant c, selected as described below. This method separated the central peak of the fGMM, velocity distribution around zero velocity, from the broad shoulders. We chose c = 0.005 for all the individual fish irrespective of the lighting conditions, except for fish 1 lights-off trials (c = 0.02); the different c value for fish 1 lights-off trials was chosen so that the relative area under the central peak of the distribution was less than 0.6 (similar to other fish during lights-off trials). For further analysis of these behavioural modes, we assumed a continuous-time Markov chain model (Extended Data Fig. 3c,d). For infinitesimal dt, the transition probabilities from state i to state j, Qij are given as follows:

$$\,{{\mbox{Pr}}}\,(v(t+{\mathrm{d}}t)=j| v(t)=i)={Q}_{ij}{\mathrm{d}}t+O({\mathrm{d}}t^2),\quad i\ne j$$
(4a)
$${Q}_{ii}=-\mathop{\sum}\limits_{j\ne i}{Q}_{ij}$$
(4b)

where O denotes order, and the probability matrix P with pij = Pr(v(t) = j | v(0) = i) and transition rate matrix Q with entries Qij satisfy the first-order differential equation

$$\frac{{\mathrm{d}}}{{\mathrm{d}}t}{{{\bf{P}}}}={{{\bf{PQ}}}},$$
(5)

whose solution is given by

$${{{\bf{P}}}}={{\rm{e}}}^{t{{{\bf{Q}}}}}=\mathop{\sum }\limits_{n=0}^{\infty }\frac{{(t{{{\bf{Q}}}})}^{n}}{n!}.$$
(6)

For every trial from each individual fish, we computed the probability matrix P with entries pij, i = 1, 2, j = 1, 2, where states 1 and 2 correspond to slow (exploit) and fast (explore) modes, respectively. To compute the transition rates between slow and fast modes in each trial from the respective probability matrix P, we used the first-order approximation to the matrix exponential in equation (6), \({{{\bf{Q}}}}\approx \frac{1}{h}({{{\bf{P}}}}-I)\). Here, h = 0.01 s is the time step and I is the 2 × 2 identity matrix.
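The rate computation can be sketched as follows. Truncating the series in equation (6) after the linear term at t = h gives P ≈ I + hQ, which rearranges to the approximation for Q used here. The sketch assumes that P is estimated from the empirical one-step transition frequencies of a labelled mode sequence; that estimation step, the 0/1 labelling and the function name are our assumptions rather than a description of the original analysis code.

```python
def transition_rates(modes, h=0.01):
    """Estimate P and Q ~ (P - I) / h from a two-state mode sequence.

    modes : sequence of labels, 0 = slow (exploit), 1 = fast (explore),
            sampled every h seconds
    h     : time step in seconds
    Returns (P, Q) as 2 x 2 nested lists.
    """
    counts = [[0, 0], [0, 0]]
    for cur, nxt in zip(modes, modes[1:]):
        counts[cur][nxt] += 1

    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # State never left or entered: use the identity row (zero rates).
            P.append([1.0 if j == i else 0.0 for j in range(2)])
        else:
            P.append([c / total for c in row])

    Q = [[(P[i][j] - (1.0 if i == j else 0.0)) / h for j in range(2)]
         for i in range(2)]
    return P, Q


if __name__ == "__main__":
    P, Q = transition_rates([0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
    print(P)
    print(Q)
```

By construction each row of Q sums to zero, consistent with equation (4b), and the off-diagonal entries are the slow-to-fast and fast-to-slow rates in units of s−1.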

For HMM clustering, we combined all the positional trial data (xt) from all the five fish at a specific lighting condition along with their negatives (−xt). The subscript t is a variable representing time. We included the negative data to eliminate any directional bias. We assumed that the observed measurements of position, xt, follow a homogeneous Markov switching first-order autoregressive model:

$${x}_{t}={\alpha }_{0}^{{s}_{t}}+{\alpha }_{1}^{{s}_{t}}{x}_{t-1}+\sigma {\varepsilon }_{t}$$
(7)

where the superscript \({s}_{t}\in \{1,2,3\}\) refers to the hidden discrete state, the coefficients \({\alpha }_{k}^{{s}_{t}}\), k ∈ {0, 1}, are model parameters, εt is Gaussian white noise and σ is the standard deviation of the noise. We fit this model using the NHMSAR package in R.
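To make the generative structure of equation (7) concrete, the sketch below simulates a Markov-switching first-order autoregressive process. It does not reproduce the NHMSAR fitting procedure; the transition matrix, coefficients, zero-based state labels and function name are hypothetical values chosen for illustration.

```python
import random


def simulate_ms_ar1(n, alpha0, alpha1, sigma, trans, seed=0):
    """Simulate x_t = alpha0[s_t] + alpha1[s_t] * x_{t-1} + sigma * eps_t.

    alpha0, alpha1 : per-state AR coefficients (one entry per hidden state)
    sigma          : scale applied to unit-variance Gaussian noise
    trans          : row-stochastic transition matrix of the hidden chain
    Hidden states are labelled 0, 1, 2 here (1, 2, 3 in the text).
    """
    rng = random.Random(seed)
    state, x = 0, 0.0
    states, xs = [], []
    for _ in range(n):
        # Draw the next hidden state from the current state's row.
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
        x = alpha0[state] + alpha1[state] * x + sigma * rng.gauss(0.0, 1.0)
        states.append(state)
        xs.append(x)
    return states, xs


if __name__ == "__main__":
    trans = [[0.98, 0.01, 0.01], [0.05, 0.95, 0.00], [0.05, 0.00, 0.95]]
    states, xs = simulate_ms_ar1(
        1000, alpha0=[0.0, 0.02, -0.02], alpha1=[0.9, 1.0, 1.0],
        sigma=0.01, trans=trans)
    print(states[:10], xs[:3])
```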

The HMM fitting resulted in three clusters similar to slow, fast positive and fast negative, as obtained with the GMM with inflection-point-based clustering method (Extended Data Fig. 3e–h). Finally, by combining fast positive and fast negative, we obtained two behavioural modes, fast and slow, for further computation of switching frequency and residence time.

In GMM with MAP-based clustering, GMM models with three components were fitted to the velocity data from each individual fish at a specific lighting condition. We assigned the cluster index for each data point based on the maximum a posteriori probability using Bayes’ rule. This method required a post hoc assignment of which cluster or clusters correspond to the ‘slow’ behavioural mode to compute residence time; see, for example, Extended Data Fig. 3i–m.
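The MAP assignment step can be sketched as follows, assuming that a three-component GMM has already been fitted. The weights, means and standard deviations below are hypothetical placeholders, not fitted values from this study, and the function names are illustrative.

```python
import math


def normal_pdf(v, mean, sd):
    """Density of a univariate Gaussian at v."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def map_cluster(v, weights, means, sds):
    """Return (index of the MAP component, list of posterior probabilities).

    Bayes' rule: posterior_k is proportional to weight_k * pdf_k(v).
    """
    joint = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


if __name__ == "__main__":
    # Hypothetical components: 0 = slow, 1 = fast positive, 2 = fast negative.
    weights, means, sds = [0.6, 0.2, 0.2], [0.0, 3.0, -3.0], [0.5, 1.5, 1.5]
    for v in (0.0, 4.0, -4.0):
        print(v, map_cluster(v, weights, means, sds)[0])
```

Because the component indices returned by a fitting routine are arbitrary, deciding which index corresponds to the slow mode remains a post hoc step, as noted in the text.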

All analysis was performed using code written in R and MATLAB.

Simulation

Sensory adaptation is a robustly observed phenomenon among organisms ranging from unicellular amoebae37,38 to humans26 where the sensory systems stop responding to constant stimuli. Here we modelled this adaptive/high-pass nature of the sensory receptors as a ‘motion-dependent sensor’ for which we assumed a nonlinear measurement model24,25 with sensory noise w2(t):

$$y(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}t}s(x(t))+{w}_{2}(t)=\frac{{\mathrm{d}}}{{\mathrm{d}}x}s(x(t))v(t)+{w}_{2}(t)=g(x)v(t)+{w}_{2}(t).$$
(8)

Here, s(x) is the position-dependent sensory scene experienced by the organism. For the present study, we assumed a quadratic sensory scene \(s(x)=\frac{1}{2}\alpha {x}^{2}+\beta x\) with non-zero constant sensory-scene parameters α and β. This assumption on sensory scene yields g(x) = αx + β, a linear function of position, x.
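A short numerical sketch of equation (8) with the quadratic sensory scene shows why such a sensor requires movement: when the velocity is zero, the noise-free measurement is zero regardless of position. The default values of α and β are those listed with the system parameters in this section; the function names are hypothetical.

```python
def sensor_gain(x, alpha=3.0, beta=5.0):
    """g(x) = d s / d x = alpha * x + beta for s(x) = 0.5*alpha*x^2 + beta*x."""
    return alpha * x + beta


def measurement(x, v, w2=0.0, alpha=3.0, beta=5.0):
    """Motion-dependent measurement y = g(x) * v + w2 (equation (8))."""
    return sensor_gain(x, alpha, beta) * v + w2


if __name__ == "__main__":
    print(measurement(1.0, 0.0))  # stationary: no position information
    print(measurement(1.0, 2.0))  # moving: output scales with g(x)
```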

Due to the presence of the nonlinearity in the measurement, we used an extended Kalman filter for state estimation, a common heuristic. For the state-feedback component, we applied \(f(\hat{x},\hat{v})={k}_{1}\hat{x}+{k}_{2}\hat{v}\), where k1 and k2 are the position and velocity feedback constants, respectively. In the triggered-excitation scheme, for the uncertainty measure (M) we used the trace of the state estimation error covariance matrix, Tr(P(t)). When the uncertainty measure Tr(P(t)) rose above a maximum threshold, Tmax, the controller generated an active-sensing component, ua(t), as a Gaussian input with fixed power spectral density, and continued to inject this input until Tr(P(t)) dropped below a lower threshold, Tmin. At this point, the controller switched back to the traditional state-feedback form. For the persistent-excitation scheme, the controller continued to inject a Gaussian input ua(t) for all time.
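The two-threshold switching rule amounts to hysteresis on the uncertainty measure, which can be sketched in isolation from the estimator and plant. The snippet below takes a precomputed sequence of Tr(P(t)) values and returns the active mode at each step; the mode labels, function names and the example trace are hypothetical, and no extended Kalman filter is implemented here.

```python
def next_mode(mode, uncertainty, t_min, t_max):
    """Hysteresis switch driven by an uncertainty measure such as Tr(P).

    exploit -> explore when uncertainty rises above t_max;
    explore -> exploit when uncertainty drops below t_min;
    otherwise the current mode is kept.
    """
    if mode == "exploit" and uncertainty > t_max:
        return "explore"
    if mode == "explore" and uncertainty < t_min:
        return "exploit"
    return mode


def mode_sequence(trace_values, t_min, t_max, initial="exploit"):
    """Apply the switch to a sequence of uncertainty values."""
    mode, modes = initial, []
    for value in trace_values:
        mode = next_mode(mode, value, t_min, t_max)
        modes.append(mode)
    return modes


if __name__ == "__main__":
    trace = [0.003, 0.005, 0.009, 0.006, 0.0045, 0.003, 0.005]
    print(mode_sequence(trace, t_min=0.004, t_max=0.008))
```

Values between the two thresholds leave the mode unchanged, so the controller does not chatter between modes when the uncertainty hovers near a single threshold.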

To obtain the critical excitation level of the active-sensing component ua,crit(t) for optimum tracking performance in persistent excitation, we chose 30 logarithmically spaced variance values of ua(t) from 1 to 100. From the mean of 100 independent simulations for each variance value, we obtained ua,crit(t) ≈ 9.33, which achieved the minimum RMS tracking error of ePE,min ≈ 0.071 and RMS control effort uPE ≈ 10. Using this critical value for the excitation ua,crit(t), we studied the effect of thresholds in triggered excitation by varying Tmax and Tmin/Tmax linearly from 4 × 10−3 to 10 × 10−3, and 0.5 to 1, respectively, and performed 300 independent simulations for each pair of values.

The system parameters were chosen from previous studies6,24 as follows: b = 1.7, m = 1, α = 3, β = 5, \({k}_{1}=m{\omega }_{n}^{2}\), k2 = (2mζωn + b), ζ = 0.56 and ωn = 1.05 × 2π. The process noise, w1(t) and sensor noise, w2(t) were chosen as fixed Gaussian noise inputs with variances 0.03 and 10, respectively.

Statistics

All statistical analyses were performed with the sign test and the Mann–Whitney–Wilcoxon test using custom code written in R (version 4.3.0, R Core Team) and MATLAB (version 9.12 (R2022a), MathWorks). For all tests, the significance level was set to 0.05. The experimental and simulation data are provided as either mean plus or minus the standard deviation (μ ± s.d.) or mean plus or minus the standard error of the mean (μ ± s.e.m.).