Abstract
Trade-offs between producing costly movements for gathering information (‘explore’) and using previously acquired information to achieve a goal (‘exploit’) arise in a wide variety of problems, including foraging, reinforcement learning and sensorimotor control. Determining the optimal balance between exploration and exploitation is computationally intractable, necessitating heuristic solutions. Here we show that the electric fish Eigenmannia virescens uses a salience-dependent mode-switching strategy to solve the explore–exploit conflict during a refuge-tracking task in which the same category of movement (fore-aft swimming) is used for both gathering information and achieving task goals. The fish produced distinctive non-Gaussian distributions of movement velocities characterized by sharp peaks for slower, task-oriented ‘exploit’ movements and broad shoulders for faster ‘explore’ movements. The measures of non-normality increased with increased sensory salience, corresponding to a decrease in the prevalence of fast explore movements. We found the same sensory salience-dependent mode-switching behaviour across ten phylogenetically diverse organisms, from amoebae to humans, performing tasks such as postural balance and target tracking. We propose a state-uncertainty-based mode-switching heuristic that reproduces the distinctive velocity distribution, rationalizes modulation by sensory salience and outperforms the classic persistent excitation approach while using less energy. This mode-switching heuristic provides insights into purposeful exploratory behaviours in organisms, as well as a framework for more efficient state estimation and control of robots.
Main
Organisms show complex patterns of movement that arise from the interplay between obtaining information (‘explore’)1,2,3 and using current information (‘exploit’)4. Exploratory movements to gain information and exploitative movements to achieve the task at hand are often mediated by the same motor systems. For example, the weakly electric glass knifefish (Eigenmannia virescens) produces both information-seeking exploratory movements2,3,5 and goal-driven exploitative movements to remain within a refuge6,7 using the same ribbon-fin locomotor system8,9. Both of these types of movement occur in a single linear dimension, along the rostrocaudal axis. This behaviour makes E. virescens an excellent model system with which to investigate the interplay between explore and exploit movements: within a fixed refuge, fish produce ancillary back-and-forth exploratory movements to sense the refuge2,3,5, but these back-and-forth (explore) movements conflict with the corrective movements (exploit) required for station keeping.
Resolving this conflict between explore movements5 and goal-directed exploit movements is a computationally intractable optimization problem10,11,12. How do organisms resolve the explore–exploit conflict? A simple heuristic to solve this problem would be for an organism to perform goal-directed exploit movements while superimposing continuous small exploratory sensing movements, in other words, to use a persistent excitation approach13. Indeed, this heuristic has proven effective (if suboptimal) as an engineering approach to solve the explore–exploit problem of identifying states and parameters of a dynamical system during task execution13. If organisms were to employ such a strategy, they would produce movement statistics that correspond to a single behavioural mode (for example, a single-component Gaussian distribution) that continuously superimposes explore and exploit behaviour.
In contrast, we discovered that E. virescens does not use a persistent excitation strategy; instead, it shows a mode-switching strategy between fast, active-sensing movements (explore) and slow, corrective movements (exploit). This mode switching is modulated by sensory salience (Figs. 1 and 2). To assess the generality of this mode-switching strategy we investigated ten additional tasks performed by ten species ranging from amoebae to humans14,15,16,17,18,19,20,21,22, using five major sensing modalities: vision, audition, olfaction, tactile sensing and electrosensation (Fig. 3). On the basis of this extensive reanalysis, we found that such mode switching, and its dependence on sensory salience, occurs across diverse behaviours, taxa and sensing modalities (Fig. 4). Inspired by this widespread biological strategy, we propose an engineering heuristic for selecting behavioural modes based on state uncertainty (Fig. 5), and show that this heuristic captures key features of mode switching found across organismal models. Furthermore, we show that this mode-switching heuristic can achieve better task-level performance, and do so with less control effort, than the conventional persistent-excitation strategy.
E. virescens exhibited fast and slow behavioural modes
We examined the behaviour of individual E. virescens as they performed untrained station keeping within a fixed refuge (Fig. 1a,b). Station keeping requires only small corrective movements; therefore, any significant movements by the fish are attributed to information-seeking, exploratory movement2,3,5. Previous work23 has demonstrated that E. virescens use both vision and electrosense for station keeping. Hence varying the light level is an experimental mechanism to examine the effect of visual salience on the selection between explore and exploit movements.
We measured the movements of five individual fish in 40 s duration station-keeping trials, in two lighting conditions: lights ‘off’ trials had low illumination (~0.3 lx; Supplementary Video 1) and lights ‘on’ trials had bright illumination (~80 lx; Supplementary Video 2). We conducted between seven and ten trials per condition per fish. We discarded trials in which the fish changed its swimming direction or exited the refuge. Consistent with previous studies2,5, fish moved significantly more in lights-off conditions than in lights-on conditions (Fig. 1c,d and Extended Data Fig. 1a–d). However, these previous analyses2,5 focused on tracking performance using analytical methods including Fourier analysis and root-mean-square (RMS) metrics that masked the temporal structure of the active-sensing movements that we seek to understand in this paper.
We found the patterns of fish swimming velocities were consistent with a mode-switching strategy. The distribution of velocities (v) featured a sharp peak around v = 0 with ‘broad’ shoulders for faster movements (Fig. 1e,f, right). These empirical distributions differed from a Gaussian distribution in two ways: (1) the distinct central peak and (2) the broad shoulders corresponding to the faster movements. The central peak (near-zero velocity) represents slow movement and the broad shoulders represent faster movement. These behavioural modes are associated with exploit and explore, respectively, as discussed in greater detail below.
The two behavioural modes were significantly better approximated by three-component Gaussian mixture models (GMMs) than by single-component models (Fig. 1e,f, right). This was shown by three measures, namely Kullback–Leibler divergence, Bayesian information criterion and closeness of quantile–quantile plots to the reference line (Extended Data Table 1, Extended Data Fig. 1e,f and Supplementary Fig. 1). The three-component GMMs generally comprised a sharp central Gaussian peak, capturing slow, task-oriented station-keeping movements, and two Gaussian ‘shoulders’, capturing faster, positive (forwards) and negative (backwards) exploratory movements. We found that only modest improvements in the fit of the GMMs occurred when using more than three components (Extended Data Fig. 1g).
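The model comparison described above can be illustrated with a minimal sketch that scores a single Gaussian against a three-component mixture using the Bayesian information criterion. Everything in it is an assumption made for illustration: the data are synthetic, the mixture parameters in `true_components` are hypothetical, and the mixture is evaluated at its generating parameters rather than fitted by expectation maximization, which slightly understates how well a fitted GMM would score.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_log_likelihood(data, components):
    """components: list of (weight, mean, sigma) tuples whose weights sum to 1."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, mu, s) for w, mu, s in components))
        for x in data
    )

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

# Hypothetical velocity mixture: sharp central peak plus two broad shoulders.
true_components = [(0.6, 0.0, 0.2), (0.2, 2.0, 1.0), (0.2, -2.0, 1.0)]
weights = [c[0] for c in true_components]

rng = random.Random(1)
data = []
for _ in range(2000):
    _, mu, sigma = rng.choices(true_components, weights=weights)[0]
    data.append(rng.gauss(mu, sigma))

# Single Gaussian with maximum-likelihood mean and standard deviation.
n = len(data)
mean = sum(data) / n
std = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

bic_single = bic(mixture_log_likelihood(data, [(1.0, mean, std)]), 2, n)
# Three means + three standard deviations + two free weights = 8 parameters.
bic_gmm3 = bic(mixture_log_likelihood(data, true_components), 8, n)

print("BIC single Gaussian:", round(bic_single, 1))
print("BIC three-component mixture:", round(bic_gmm3, 1))
```

For data with a sharp peak and broad shoulders, the mixture should obtain the lower BIC despite its six extra parameters.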
The fast-mode movements increased in frequency in lights-off trials, increasing the relative prominence of the ‘shoulders’. For example, Fig. 1e,f shows representative data from one fish in which there were 48 fast movements with lights off (Fig. 1e, left) but only 13 fast movements with lights on (Fig. 1f, left). Interestingly, the overall higher proportion of fast velocities in lights-off trials leads to a surprising result, namely, higher kurtosis values for lights-on versus lights-off trials (Extended Data Fig. 1h). In other words, the increase in frequency of fast motions in the dark leads to a decrease in the relative prominence of the central, task-oriented velocity peak at v = 0, so that the overall distribution is closer to Gaussian and the kurtosis trends towards 3.
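The kurtosis argument can be checked with a short worked calculation. The sketch below is a simplification, not the analysis used in the study: it collapses the two shoulders into one broad zero-mean Gaussian, and the standard deviations (1 for the slow mode, 5 for the fast mode) are hypothetical. For such a mixture the kurtosis has a closed form, because the fourth moment of a zero-mean Gaussian is 3σ⁴.

```python
def mixture_kurtosis(p_fast, sigma_slow, sigma_fast):
    """Kurtosis of a zero-mean two-Gaussian mixture (a single Gaussian gives 3)."""
    p_slow = 1.0 - p_fast
    second_moment = p_slow * sigma_slow**2 + p_fast * sigma_fast**2
    # Each zero-mean Gaussian component contributes a fourth moment of 3 * sigma**4.
    fourth_moment = 3.0 * (p_slow * sigma_slow**4 + p_fast * sigma_fast**4)
    return fourth_moment / second_moment**2

# Increase the fraction of time spent in the fast (explore) mode.
for p_fast in (0.1, 0.3, 0.5, 0.9):
    print(p_fast, round(mixture_kurtosis(p_fast, 1.0, 5.0), 2))
```

Over the range shown, the kurtosis falls towards 3 as the fast fraction grows, mirroring the lights-off result. For very small fast fractions the kurtosis initially rises from 3, so the monotonic trend holds only once fast movements are reasonably frequent.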
We found the trend towards a Gaussian distribution of movement velocities in lights-off trials (reduced sensory salience) surprising because the exploratory movements for actively sensing the environment are associated with a nonlinear requirement24,25 to make movements that are potentially in conflict with task goals. Therefore, our initial hypothesis (that this nonlinearity would produce increased deviation from a Gaussian velocity distribution as sensory salience was reduced) was not supported. Our initial intuition failed because we did not appreciate that decreases in sensory salience drive the selection of explore behaviour, and that this behaviour itself is approximately Gaussian, ultimately reducing the relative prominence of the task-oriented central peak.
Interestingly, reanalysis of data from a previous study of exploratory movements in a similar refuge-tracking paradigm in E. virescens shows the same relationship between sensory salience and changes in velocity profiles, but for modulations of a different sensory modality, namely, electrosensation3. In these previous experiments, artificially generated electrical signals were used to diminish the salience of electrosensory information as the electric fish performed the refuge tracking. Our reanalysis of these published data (see Supplementary Material and Methods for details) showed that fish exhibited the distinctive non-Gaussian distribution of velocities. Moreover, the velocity distributions were modulated in relation to electrosensory salience: lower kurtosis values (corresponding to more normal distributions) occurred in experimental trials with added artificial electrosensory ‘jamming’ signals (Extended Data Fig. 2a–f).
Sensory salience drives explore–exploit mode switching
How do changes in sensory salience drive changes in mode switching? To investigate this question, we segregated the velocity trajectories into ‘S’, a slow-velocity mode (exploit) comprising task-oriented, station-keeping movements, and ‘F’, a fast-velocity mode (explore) comprising large positive (forwards) and negative (backwards) movement velocities (Fig. 2a,b, Extended Data Fig. 3 and Supplementary Fig. 2; see Methods for different clustering algorithms used).
Fish produced slow- and fast-velocity modes of movements in both lights-on and lights-off trials. We computed the residence time in each behavioural mode as the proportion of the 40 s trial duration spent in that mode (note that the residence times in the slow and fast modes add up to unity). The residence time τs in the slow exploit mode was significantly higher (>1.7 times) in lights-on trials than in lights-off trials (Fig. 2c). In contrast, the residence time in the fast explore mode (1 − τs) was higher in lights-off than in lights-on trials.
Fish switched between slow and fast modes more frequently in lights-off trials than in lights-on trials (Fig. 2d). From the computation of the transition rates between slow (S) and fast (F) modes as a two-state Markov process, we found that the transition rate S → F was significantly lower in lights-on versus lights-off trials, that is, the slow (exploit) state was visited more frequently in the lights-on trials compared with lights-off trials (Extended Data Fig. 3b). This salience-dependent modulation of switching frequency was the key mechanism by which movement velocity distributions trended towards a Gaussian distribution as a function of decreased sensory salience.
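The residence-time and transition measures can be sketched as follows. This is a minimal illustration under stated assumptions: a fixed speed threshold stands in for the clustering algorithms described in Methods, the function names and sample values are hypothetical, and the transitions are estimated as per-sample probabilities of a discrete-time two-state Markov chain rather than as continuous-time rates.

```python
def label_modes(velocities, speed_threshold):
    """Label each sample 'S' (slow, exploit) or 'F' (fast, explore) by speed."""
    return ["F" if abs(v) > speed_threshold else "S" for v in velocities]

def residence_and_transitions(labels):
    """Return (tau_s, P(S->F), P(F->S)) for a sequence of 'S'/'F' labels."""
    tau_s = labels.count("S") / len(labels)
    counts = {("S", "S"): 0, ("S", "F"): 0, ("F", "S"): 0, ("F", "F"): 0}
    for prev, curr in zip(labels, labels[1:]):
        counts[(prev, curr)] += 1
    from_s = counts[("S", "S")] + counts[("S", "F")]
    from_f = counts[("F", "S")] + counts[("F", "F")]
    p_sf = counts[("S", "F")] / from_s if from_s else 0.0
    p_fs = counts[("F", "S")] / from_f if from_f else 0.0
    return tau_s, p_sf, p_fs

# Hypothetical velocity samples (arbitrary units) and threshold.
velocities = [0.1, -0.2, 3.0, 2.5, 0.0, 0.1, -4.0, 0.2]
labels = label_modes(velocities, speed_threshold=1.0)
tau_s, p_sf, p_fs = residence_and_transitions(labels)
print(labels, tau_s, p_sf, p_fs)
```

The fast-mode residence time is then `1 - tau_s`, matching the convention that the two residence times sum to unity.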
Mode-switching across taxa, behaviours and sensory modalities
Is this mode-switching solution to the explore-versus-exploit problem found in other species, in other categories of behavioural tasks and in control systems that rely on other sensing modalities?
To answer this question, we analysed published data for an additional ten species, representing a wide phylogenetic range of taxa, from single-celled organisms to humans, involving categorically different tasks and sensorimotor regimes3,14,15,16,17,18,19,20,22. These taxonomically diverse species were selected to encompass a wide range of behaviours that rely on a broad range of sensory systems (Fig. 3). For every example we examined, we found the same distinctive non-normal distribution of velocities, with a peak at low-velocity movements and broad shoulders for higher-velocity movements (Fig. 4).
For example, postural sway movements in humans are thought to prevent the fading of postural state information during balance26. Our reanalysis of quiescent stance data17 revealed evidence of mode switching (Fig. 4a) that is remarkably similar to our findings in electric fish. In the quiescent stance task, human participants used visual and tactile feedback to maintain an upright posture. The distribution of sway velocities revealed a distinct peak at low velocities corresponding to the task goal, and broad shoulders for higher velocities produced by the exploratory movements; the velocity statistics were better captured by a GMM than by a normal distribution. Furthermore, the velocity distribution showed the same surprising relation to changes in salience: when sensory salience was decreased, the distribution became more Gaussian and the switching frequency increased (Extended Data Fig. 4a–d), as seen in the electric fish E. virescens.
Mode switching was also observed in invertebrate species. For example, the Carolina sphinx hawkmoth (Manduca sexta) uses somatosensory feedback from its proboscis to detect the curvature of flowers when searching for nectaries at dawn and dusk20. In this search behaviour, which has dynamics that are qualitatively similar to vibrissal sensing in rats27, the moth sweeps its proboscis across the surface of the flower using a combination of slow- and high-velocity movements. Our analysis of the distribution of the rate of change of radial orientation angle (angle between the proboscis tip trajectory and the radial axis of the flower) before the insertion of the proboscis tip into the nectary shows the characteristic sharp peak with broad shoulders (Fig. 4g) that is captured by a GMM. Experimental changes in the shape of artificial flowers that degrade the salience of the curvature of the flower surface20 resulted in a decrease of the kurtosis value of the proboscis angular velocity distribution (Extended Data Fig. 5a–f), similar to how both E. virescens and humans responded to changes in sensory salience.
The fact that this salience-based, mode-switching strategy was found in two distantly related classes (mammalia and insecta) performing very different behaviours, using different sensorimotor systems, suggests that the strategy emerged as a convergent solution to the explore-versus-exploit problem. We found additional evidence of convergence of this solution in reanalysis of eight additional datasets: visual saccades in humans (Fig. 4b and Supplementary Fig. 3)21 and in house mice Mus musculus (Fig. 4c and Supplementary Fig. 4)15, movements of the pinnae of echolocating big brown bats Eptesicus fuscus (Fig. 4d and Supplementary Fig. 5)16, olfaction in eastern moles Scalopus aquaticus (Fig. 4e and Supplementary Fig. 6)14 and American cockroaches Periplaneta americana (Fig. 4f and Extended Data Fig. 4e–f)18, and visual tracking of a swaying flower in three species of hawkmoths (M. sexta, Deilephila elpenor and Macroglossum stellatarum; Fig. 4h and Extended Data Fig. 5g–p)19. The discovery of similar, parsimonious velocity distributions across taxa, behaviours and sensing modalities, with a consistent dependency on sensory salience, was surprising.
Intriguingly, our analysis of the dynamics of transverse exploration by pseudopods of amoebae22 (Amoeba proteus and Metamoeba leningradensis) reveals similar GMM velocity distributions in response to an electric field (Extended Data Fig. 6). Although our modelling approach (see next section) includes inertial dynamics, which cannot be directly applied to movement of organisms in the low-Reynolds-number regimes occupied by single-celled and other microscopic organisms, these observations are consistent with a mode-switching strategy for the control of movement in these amoebae.
The examples described above include a broad phylogenetic array of organisms that perform a variety of behavioural tasks using different control and morphophysiological systems. Just as these behavioural systems evolved within each of the lineages represented in our reanalyses, we suggest that mode switching probably evolved independently in each lineage as well. In other words, the similarities we found across taxa are the result of convergent evolution towards a common solution—mode switching—for the explore-versus-exploit problem.
Heuristic model of the mode-switching strategy
Why might animals use mode switching, rather than the simpler heuristic of applying continual, low-amplitude exploratory inputs used by control engineers13? To address this question, we propose a parsimonious heuristic model that comprises a nonlinear motion-dependent sensor, a linear musculoskeletal plant, a state estimator (also known as an observer28) and a mode-switching controller (Fig. 5a). For the musculoskeletal plant, we assumed a simplified second-order Newtonian model9,24:
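One form consistent with the symbols defined in the next paragraph, written here as an assumed reconstruction and labelled equation (1), is:

\[
\frac{dx}{dt} = v(t), \qquad m\,\frac{dv}{dt} + b\,v(t) = u(t) + w_{1}(t) \qquad (1)
\]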
Here t is dimensionless time, x(t) is dimensionless position, v(t) is dimensionless velocity, u(t) is the controller input and w1(t) is process noise. The process noise includes noise due to physical disturbances29,30 as well as motor noise31. The system parameters m and b represent unitless mass and viscous damping, respectively.
The key feature of the model is that the nonlinear sensory system (that is, the ‘motion-dependent sensor’) embodies the high-pass filtering (that is, fading or adapting) characteristics found across biological sensory systems26,32,33,34,35,36,37,38. This sensory system model (‘motion-dependent sensor’ in Fig. 5a) assumes nonlinear measurements that decay to zero over time in the face of constant stimuli:
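A measurement model with this fading property, again an assumed form chosen to match the definitions given next and labelled equation (2), is:

\[
y(t) = \frac{d}{dt}\,s\big(x(t)\big) + w_{2}(t) = g\big(x(t)\big)\,v(t) + w_{2}(t) \qquad (2)
\]

When the organism is stationary (v = 0) the stimulus term vanishes and the measurement y(t) contains only noise, which is why movement is needed to sense position.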
Here s(x) is the position-dependent sensory stimulus experienced by the organism, g(x) is the spatial derivative of the sensory stimulus (ds(x)/dx), and w2(t) is the sensory noise. The controller includes a state-feedback-based task-level control policy, \(-f(\hat{x},\hat{v})\), which exploits previously collected sensory information; that information is parsimoniously encoded (1) in estimates of the position and velocity \((\hat{x},\hat{v})\), and (2) in an ongoing measure of uncertainty, M (based on the covariance of position and velocity estimates; see Methods). Previous theoretical work has demonstrated that exploratory movements are required for state estimation in control systems that rely on such high-pass (that is, fading) sensors24,25. Hence, the controller also includes an active-sensing control policy, ua(t), that seeks to gain new information through exploratory movements.
To find the optimal balance between exploit and explore components, for a given admissible control policy, π, and a given weight, r, for an input, \(u\in {{{\mathcal{U}}}}\) (action space), we can define the average steady-state cost function:
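A quadratic form consistent with the description that follows is given here as an assumption, taking the station-keeping goal to be x = 0, and labelled equation (3):

\[
J_{\pi} = \limsup_{T\to\infty}\ \frac{1}{T}\,{\mathbb{E}}\left[\int_{0}^{T}\left(x(t)^{2} + r\,u(t)^{2}\right)dt\right] \qquad (3)
\]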
where \({\mathbb{E}}\) is the expectation computed over all the trajectories induced by the admissible control policy, π. Note that a control policy is admissible if it depends causally on the sensor and actuator data. We chose the cost function, Jπ, as a weighted combination of steady-state tracking error and control effort. Even with complete knowledge of the system states, computation of the optimal solution \({J}_{{\pi }^{* }}=\mathop{\inf }\limits_{\pi \in {{\varPi }}}{J}_{\pi }\), where Π is the set of all admissible control policies, is only tractable in the case of linear systems or systems with finite state and action spaces39. As the system is partially observed, existing approaches to optimal control require the solution to an optimal filtering problem and then formulate feedback laws on the filter states39. The filtering problem requires computation of the conditional probability \({\mathbb{P}}\left(\left[\begin{array}{c}x(t)\\ v(t)\end{array}\right]\left\vert \right.\,y(\tau ),u(\tau )\,\forall \tau \le t\right)\). However, due to nonlinearity in the measurement (equation (2)), there is no tractable method to compute this conditional probability, and so heuristic strategies are required. We tested three exploratory movement heuristics for the controller to find an approximate answer to this intractable optimal control problem.
(1) Zero excitation: this is a passive strategy (that is, no exploration) in which the system provides no input excitation for the actuation forces (ua(t) = 0 for all t). This is a conventional state-feedback controller.
(2) Persistent excitation: this scheme tests a common continuous exploration strategy used in the field of adaptive control13. The controller continually injects a Gaussian input ua(t).
(3) Triggered excitation: this mode-switching strategy depends on lower and upper thresholds, Tmin and Tmax; the controller injects Gaussian input only when the uncertainty in the state estimator M exceeds Tmax, and then continues to inject input until this uncertainty drops below a lower threshold, Tmin (Fig. 5b).
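The switching logic of the three heuristics above can be sketched as a hysteresis rule on the uncertainty measure M. This is a minimal illustration, not the simulation code used in the study: the class name, parameter names and numerical values are hypothetical, and the state estimator that would supply M is omitted.

```python
import random

class TriggeredExcitation:
    """Hysteresis switch: explore once M exceeds t_max, stop once M drops below t_min."""

    def __init__(self, t_min, t_max, sigma, seed=0):
        if not t_min < t_max:
            raise ValueError("t_min must be below t_max")
        self.t_min, self.t_max, self.sigma = t_min, t_max, sigma
        self.exploring = False
        self._rng = random.Random(seed)

    def step(self, uncertainty):
        """Update the mode from the current uncertainty and return u_a for this step."""
        if self.exploring and uncertainty < self.t_min:
            self.exploring = False
        elif not self.exploring and uncertainty > self.t_max:
            self.exploring = True
        # Gaussian active-sensing input while exploring, otherwise no excitation.
        return self._rng.gauss(0.0, self.sigma) if self.exploring else 0.0

def trace_modes(controller, uncertainties):
    """Return the explore flag after each update, for inspection."""
    modes = []
    for m in uncertainties:
        controller.step(m)
        modes.append(controller.exploring)
    return modes

# Hypothetical uncertainty trace that rises above t_max and later falls below t_min.
trace = [0.2, 0.6, 1.1, 0.8, 0.5, 0.2, 0.6]
print(trace_modes(TriggeredExcitation(t_min=0.3, t_max=1.0, sigma=1.0), trace))
```

With this rule, thresholds set below any attainable uncertainty keep the explore mode always on, which corresponds to persistent excitation, whereas an infinite upper threshold never triggers it, which corresponds to zero excitation.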
As previous theoretical work has shown25, the zero-excitation strategy (that is, traditional state-estimate feedback) cannot minimize the state estimation error and hence, not surprisingly, results in poor tracking performance (Fig. 5c), thus illustrating the need for an additional active-sensing component in the face of adaptive sensing and perceptual fading. The persistent-excitation and triggered-excitation strategies both facilitate substantially better position control than does the zero-excitation strategy (Fig. 5c,d). Although these two strategies resulted in comparable tracking errors (eRMS; Fig. 5b,c), the triggered excitation was more efficient, requiring substantially lower control effort (uRMS; Fig. 5g,h). Moreover, unlike the persistent-excitation strategy, triggered excitation generated a distinctive broad-shouldered velocity distribution that featured a sharp peak near zero, with broad shoulders corresponding to bursts of fast movement (Fig. 5j,k). This distribution was strikingly similar to experimental observations across organisms (Figs. 1 and 4 and Extended Data Figs. 4–6), suggesting that such broad shoulders are a signature (if not definitive proof) of a mode-switching strategy.
We showed that active exploration is essential for better tracking performance as it improves state estimation. However, there is a point of diminishing returns: although higher (more energetic) active excitation can result in excellent state estimation, there is a point beyond which these additional active-sensing movements lead to greater tracking errors.
To contrast persistent excitation with triggered excitation, we performed a numerical study to obtain the variance of the active-sensing signal ua(t) that minimizes the RMS tracking error for the persistent excitation strategy, ePE,min. Note that persistent excitation is the limiting case of triggered excitation with extremely low threshold values (that is, ensuring that the active-sensing mode is always ‘on’). With that optimum stimulation obtained from persistent excitation, we next performed a parameter sweep involving the threshold pair (Tmax, Tmin) in the triggered excitation. We discovered that the choice of thresholds in the triggered-excitation strategy plays an important role: with the right choice of parameters we could achieve better tracking performance (Fig. 5m) at reduced control effort (Fig. 5n). The choice of thresholds also shapes the velocity to best extract sensory information; with low thresholds, the statistics approach those of the persistent excitation, whereas high thresholds lead to velocity distributions with higher kurtosis (departure from normality), while requiring less control effort (Extended Data Fig. 7a–c).
How does sensory salience affect performance of the triggered excitation (mode switching) heuristic? To simulate the effects of changes in sensory salience, we parametrically varied the sensory noise variance while keeping constant the switching thresholds Tmin and Tmax. As the sensory noise variance was increased (simulating a decrease in salience), the kurtosis value of the velocity distribution decreased, numerically approaching normality in the limit of high sensory noise (Extended Data Fig. 7d–g). This trend of decreased kurtosis in the face of increased noise variance captures the widespread observation that, in animals, the velocity statistics tend towards a Gaussian distribution as sensory salience is decreased. Moreover, the underlying mechanism, namely, increasing frequency of bursts of exploratory movements, matches our experimental observations in E. virescens, which performed more frequent transitions to fast movements and spent less time in the slow mode in the lights-off trials than in the lights-on trials (Fig. 2c,d). These analyses clarify that this trend towards a Gaussian distribution with decreased salience is an epiphenomenon of mode switching: as the frequency of fast movement bursts increases, it overwhelms the task-oriented movements, diminishing the prominence of the central peak.
Discussion
We examined explore–exploit trade-offs in the context of goal-directed motor behaviours, such as station keeping, postural balance and plume tracking, that require active, exploratory movements to enhance sensation. We discovered that the velocity distributions that emerge from the interplay between exploratory movements and goal-directed control are broad-shouldered across taxa, and that this distinctive distribution of movements is robustly modulated by sensory salience. The bouts of ancillary movements that comprise the broad shoulders of these velocity distributions are commonly described as ‘active sensing’, that is, the expenditure of energy by organisms for the purpose of sensing40, for example, the ancillary movements described here. Active sensing also includes the emission of energetically costly signals such as electric fields by weakly electric fishes41 and echolocation calls in dolphins, birds and bats42,43,44,45. Active-sensing research in humans, in relation to touch, was popularized in the 1960s by J. J. Gibson1, and the original ideas date back at least to the eighteenth century (for a historical account, see ref. 40).
Surprisingly, active sensing is largely avoided in engineering design despite being ubiquitous in animals. The performance of engineered systems may benefit from the generation of movement for improved sensing. An algorithm known as Ergodic Information Harvesting (EIH)3 could be used to control movements for sensing in artificial systems. This algorithm balances the energetic costs of generating movements against the expected reduction in sensory entropy. The EIH has been tested in relation to several animal model systems, and produces plausible animal trajectories3.
Interestingly, the EIH algorithm produces the opposite trend in kurtosis of velocity distributions in relation to sensory salience (Extended Data Fig. 2g–l and Extended Data Table 3) to the one that we observed in our experiments, in our reanalysis of previous data and in our model: as sensory salience decreases, there is an increase in active-sensing movements but a decrease in kurtosis (Extended Data Table 2). That EIH leads to a decrease in kurtosis occurs in part because EIH generates continuous-sensing movements, and does not incorporate mode switching. A refined EIH model that generates the temporally distinct periods of sensing movements that characterize mode switching would better reflect our findings in animals, and is a promising strategy for improving the performance of robotic control systems.
How mode switching is manifest across the diverse biological systems we examined is a compelling open question. Many of these control systems have evolved via convergent evolution in which adaptive strategies emerge independently across lineages. One result of convergent adaptation is that species often rely on idiosyncratic features, such as feathers or skin flaps, to achieve the same adaptive strategy, such as flapping flight. We infer that the mechanisms for mode switching are present in control systems that range from subcellular systems46 to neural systems in vertebrates.
The mechanisms for mode switching in vertebrate nervous systems may emerge at different levels within sensorimotor control pathways. For example, neurophysiological recordings show that sensory salience can be encoded in brain circuits via synchronization and desynchronization of spiking activity47. Such population coding of salience48,49,50, when coupled with a threshold, could trigger discrete bursts of motor activity for sensing8. Motor circuits for the production of discrete bursts of movement occur in spinal circuits51.
These discrete bursts of movements could arise from reflex-like, threshold-based activity in animals, akin to how Mauthner cells trigger a cascade of motor activity when sensory inputs exceed a threshold52. A key difference between reflex-like, threshold-based behaviours and the mode switching we describe in this paper would be that the signal in question would arise from an internal representation of sensory uncertainty, rather than from the overall level of sensory excitation. Such a reflex-like action could produce stereotyped forms of interactions with the external environment in relation to sensing8.
A common engineering approach to sensing and control is to add sensors and improve sensor performance, particularly at low frequencies, effectively side-stepping the need for active-sensing movements altogether. Such improved sensing enhances observability without relying on movement. In stark contrast, organismal sensor systems are almost invariably adapting (high pass), necessitating active sensing. Irrespective of whether organisms have achieved an optimal solution to the control problem (or instead are limited by evolutionary constraints on sensor performance), the widespread convergent evolution of a common active-sensing strategy nevertheless suggests an alternative engineering design paradigm. The confluence of adapting sensors53 and the uncertainty-triggered mode-switching heuristic presented in this paper provides a new roadmap for movement control of robotic systems.
In this paper, the explore–exploit trade-off arises from the need for active-state estimation28 in a subset of tasks in which movement is used both for acquisition of information and achieving task goals. However, similar trade-offs arise in a wide variety of potentially more complex behaviours. For example, in foraging where the resources are found in patchy distributions, organisms balance the trade-offs between exploiting a local food source, exploring for distant sources54 and the costs of predation across the habitat55. Similarly, reinforcement learning involves choosing whether to adhere to a familiar option with a known reward or to take the risk of exploring unknown options that can lead to increased rewards over the longer term56. We do not have direct evidence that the broad-shouldered feature we have identified in the animal movements described here (Figs. 1 and 4), which reflects mode switching, is also found in these behavioural domains across taxa. Recent evidence from studies of human reinforcement learning, however, appears to be consistent with mode-switching behaviour.
Methods
Tracking of glass knifefish
Subjects
We obtained adult, weakly electric, glass knifefish E. virescens (10–15 cm in length) from commercial vendors, and housed the fish according to the published guidelines57. The water temperature in the experimental tank was kept between 24 °C and 27 °C, and conductivity ranged from 10 μS cm−1 to 150 μS cm−1. Fish were transferred from the holding tank to the experimental tank 12–24 h before the experiments to allow for acclimation. All experimental procedures were approved by the Johns Hopkins Animal Care and Use Committee, and followed guidelines established by the National Research Council and the Society for Neuroscience.
Experimental apparatus
The experimental apparatus was similar to that used in previous studies2,6,8,23,58. The refuge was machined from a 152-mm-long segment of 46 × 50 mm rectangular PVC tubing, with the bottom surface removed to allow the camera to record the ventral view of the fish. A series of six rectangular windows (6 mm wide × 31 mm high, spaced 19 mm apart) was machined into each side of the refuge to provide visual and electrosensory cues.
A computer sent designed digitized input stimuli (25 Hz) from LabVIEW (National Instruments) to a Field Programmable Gate Array based controller for a stepper motor (STS-0620-R, HW W Technologies). The stepper motor drove a linear actuator, leading to the one-degree-of-freedom refuge movement in real time. A video camera (pco.1200, PCO AG) captured fish movements through mirror reflection at 100 Hz. The captured frames (width × height, 1,280 pixels × 276 pixels) were saved as 16 bit .tif files via camera application software (pco.camware, PCO AG).
Experimental procedure
The experiments were conducted in two illuminance levels—around 0.3 lx (lights off) and 80 lx (lights on). Each trial lasted for 60 s. During the initial 10 s of each trial, the refuge was actuated to follow a 0.45 Hz sinusoidal trajectory, the amplitude of which was gradually increased to 3 cm, and then decreased to 0 at the end of the 10 s interval, in a similar fashion as described in ref. 2. After the initiation phase, the refuge remained stationary for 40 s, finally followed by a termination phase for 10 s, during which the refuge was actuated in a similar fashion as during the initiation phase.
Tracking algorithm
To observe fine details of the fish movement, we used a high frame rate in our video recordings. High tracking accuracy was essential as the position and velocity data were likely to be contaminated by measurement noise. To ensure high tracking accuracy, the refuge and fish position were analysed by custom video tracking software59 developed by Balázs P. Vágvölgyi from the Laboratory for Computational Sensing and Robotics, at Johns Hopkins University.
The tracking algorithm worked in two phases. The first phase was template matching, which roughly located the targets (fish or refuge). In the first frame of a given video, a rectangular region was manually selected around the target to create a template. On subsequent frames, a neighbourhood region around the template (±20 pixels) from the previous frame was selected for the computation of a normalized two-dimensional cross-correlation matrix. If the target changed its orientation in the new frames, before computing the normalized two-dimensional cross-correlation, the new image frame was first rotated to match the orientation of the template from previous frame. If needed, the areas of the image were sampled (then scaled and interpolated if necessary) with subpixel accuracy.
After creating the template, the second phase applied the Levenberg–Marquardt algorithm to find the global maximum of the normalized cross-correlation function. This step produced a match between the template and target at each frame, with subpixel accuracy. We performed extensive preliminary testing and analysis to confirm that the remaining measurement errors had smaller variance than the stochastic movements of the fish.
Data processing
The tracking algorithm stored the fish position in both horizontal and vertical directions (originally in pixels, along with the respective pixel-to-metre conversion factor) and the angle of orientation (in degrees) in .csv files. We used only the data while the refuge was stationary (40 s, 4,000 data points in total) for each trial. To further reduce the measurement noise, the position data were filtered through a Butterworth zero-phase distortion filter (filtfilt command in MATLAB) with a 10 Hz cut-off frequency. Fish velocity in the horizontal direction was computed as forward differences of the horizontal position time series.
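The smoothing and differentiation steps above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the authors' MATLAB code: the filter order is not stated in the text, so a second-order Butterworth section is assumed here, and the forward-backward pass omits the edge padding that MATLAB's filtfilt applies. All function names are hypothetical.

```python
import math

def butter2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass section (bilinear transform).
    The order is an assumption; the text gives only the 10 Hz cut-off."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cosw) / (2.0 * a0), (1.0 - cosw) / a0, (1.0 - cosw) / (2.0 * a0)]
    a = [1.0, -2.0 * cosw / a0, (1.0 - alpha) / a0]
    return b, a

def lfilter(b, a, x):
    """Causal filtering with zero initial conditions (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def zero_phase_filter(b, a, x):
    """Filter forwards, then backwards, so that the phase shifts cancel."""
    forward = lfilter(b, a, x)
    backward = lfilter(b, a, forward[::-1])
    return backward[::-1]

def forward_difference(position, fs_hz):
    """Velocity estimate v[i] = (x[i+1] - x[i]) * fs; one sample shorter than x."""
    return [(position[i + 1] - position[i]) * fs_hz
            for i in range(len(position) - 1)]

# Example: smooth a short synthetic trace sampled at 100 Hz, then differentiate.
b, a = butter2_lowpass(10.0, 100.0)
trace = [0.01 * math.sin(2.0 * math.pi * 0.5 * i / 100.0) for i in range(400)]
velocity = forward_difference(zero_phase_filter(b, a, trace), 100.0)
```

Because the sketch starts both passes from zero initial conditions, the first and last few tens of samples carry start-up transients and would need padding or trimming in a real analysis.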
Identification and characterization of behavioural modes
For the identification of the behavioural modes, we used three different clustering approaches—(1) a GMM with inflection-point-based clustering, (2) hidden Markov model (HMM)-based clustering, and (3) a GMM with maximum a posteriori probability (MAP)-based clustering.
For GMM with inflection-point-based clustering, the velocity (v) data from each individual fish at a specific lighting condition were clustered into three components, slow, fast positive and fast negative (Extended Data Fig. 3a,b), using two velocity thresholds, vL and vH (vL < vH), resulting in two behavioural modes—slow and fast (fast positive and negative were combined). The velocity threshold values were computed by finding the inflection points of the GMM fits to the velocity data, fGMM, specific to a lighting condition. To numerically identify the inflection points of fGMM, we numerically computed the spatial second-order derivative of fGMM (\({f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }\)), and located the first and the last indices of the array \({{f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }}/{{f}_{{{{\rm{GMM}}}}}}\) such that the condition \({{f}_{{{{\rm{GMM}}}}}^{\,\prime\prime }}/{{f}_{{{{\rm{GMM}}}}}} < c\) was satisfied for a given ad hoc constant c, selected as described below. This method separated the central peak of the fGMM, velocity distribution around zero velocity, from the broad shoulders. We chose c = 0.005 for all the individual fish irrespective of the lighting conditions, except for fish 1 lights-off trials (c = 0.02); the different c value for fish 1 lights-off trials was chosen so that the relative area under the central peak of the distribution was less than 0.6 (similar to other fish during lights-off trials). For further analysis of these behavioural modes, we assumed a continuous-time Markov chain model (Extended Data Fig. 3c,d). For infinitesimal dt, the transition probabilities from state i to state j, Qij are given as follows:
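\[\Pr \left(v(t+dt)=j\,| \,v(t)=i\right)={\delta }_{ij}+{Q}_{ij}\,dt+O(d{t}^{2}),\]
where O denotes order, δij is the Kronecker delta, and the probability matrix P with pij = Pr(v(t) = j∣v(0) = i) and transition rate matrix Q with entries Qij satisfy the first-order differential equation
\[\frac{d{{{\bf{P}}}}(t)}{dt}={{{\bf{P}}}}(t)\,{{{\bf{Q}}}},\]
whose solution is given by
\[{{{\bf{P}}}}(t)={e}^{{{{\bf{Q}}}}t}.\qquad (6)\]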
For every trial from each individual fish, we computed the probability matrix P with entries pij, i = 1, 2, j = 1, 2, where states 1 and 2 correspond to the slow (exploit) and fast (explore) modes, respectively. We used the first-order approximation to the matrix exponential in equation (6), \({{{\bf{Q}}}}\approx \frac{1}{h}({{{\bf{P}}}}-I)\), to compute the transition rates between slow and fast modes in each trial from the respective probability matrix, P. Here h = 0.01 s is the time step and I is the 2 × 2 identity matrix.
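The thresholding and rate estimation described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' implementation: the GMM parameters are taken as given, the second derivative is approximated by central differences on a uniform grid, and P is estimated by counting one-step transitions, which is one plausible reading of the text. States are indexed 0 (slow) and 1 (fast) rather than 1 and 2, and all function names are hypothetical.

```python
import math

def gmm_pdf(v, weights, means, sds):
    """Density of a one-dimensional Gaussian mixture at velocity v."""
    return sum(w * math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, m, s in zip(weights, means, sds))

def inflection_thresholds(weights, means, sds, c, v_min, v_max, n=2001):
    """Return (vL, vH): the first and last grid velocities where f''/f < c."""
    dv = (v_max - v_min) / (n - 1)
    grid = [v_min + i * dv for i in range(n)]
    f = [gmm_pdf(v, weights, means, sds) for v in grid]
    hits = [i for i in range(1, n - 1)
            if f[i] > 0.0 and (f[i - 1] - 2.0 * f[i] + f[i + 1]) / dv ** 2 / f[i] < c]
    if not hits:
        raise ValueError("no grid point satisfies f''/f < c")
    return grid[hits[0]], grid[hits[-1]]

def label_modes(velocities, v_low, v_high):
    """0 = slow (exploit) inside [v_low, v_high]; 1 = fast (explore) outside."""
    return [0 if v_low <= v <= v_high else 1 for v in velocities]

def transition_matrix(states, n_states=2):
    """Row-normalized counts of one-step transitions between modes."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([k / total if total else 0.0 for k in row])
    return matrix

def rate_matrix(P, h):
    """First-order estimate Q ~ (P - I) / h for time step h."""
    n = len(P)
    return [[(P[i][j] - (1.0 if i == j else 0.0)) / h for j in range(n)]
            for i in range(n)]

# Example with a toy mode sequence sampled at h = 0.01 s.
P = transition_matrix([0, 0, 1, 1, 0, 0, 0, 1])
Q = rate_matrix(P, 0.01)  # off-diagonal entries are rates in 1/s
```

Each row of the estimated Q sums to zero, as expected for a rate matrix, so the off-diagonal entries alone give the slow-to-fast and fast-to-slow rates.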
For HMM clustering, we combined all the positional trial data (xt) from all the five fish at a specific lighting condition along with their negatives (−xt). The subscript t is a variable representing time. We included the negative data to eliminate any directional bias. We assumed that the observed measurements of position, xt, follow a homogeneous Markov switching first-order autoregressive model:
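\[{x}_{t}={\alpha }_{0}^{{s}_{t}}+{\alpha }_{1}^{{s}_{t}}\,{x}_{t-1}+\sigma \,{\varepsilon }_{t},\]
where the superscript st ∈ {1, 2, 3} refers to the hidden discrete state, the coefficients \({\alpha }_{k}^{{s}_{t}}\), k ∈ {0, 1}, are model parameters, εt is unit-variance Gaussian white noise and σ is the noise standard deviation. We fit this model using the NHMSAR package in R.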
The HMM fitting resulted in three clusters similar to slow, fast positive and fast negative, as obtained with the GMM with inflection-point-based clustering method (Extended Data Fig. 3e–h). Finally by combining fast positive and negative, we ended up with two behavioural modes—fast and slow—for further computation of switching frequency and residence time.
In GMM with MAP-based clustering, three-component GMMs were fitted to the velocity data from each individual fish at a specific lighting condition. We assigned the cluster index for each data point based on the maximum a posteriori probability using Bayes’ rule. This method required a post hoc assignment of which cluster or clusters corresponded to the ‘slow’ behavioural mode to compute residence time; see, for example, Extended Data Fig. 3i–m.
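As an illustration of the MAP assignment, the sketch below labels each velocity sample with the mixture component that has the largest posterior probability, and then computes the fraction of samples in the cluster or clusters designated as slow. It is a Python outline under assumptions: the mixture parameters shown are invented for the example rather than fitted values, and the function names are hypothetical.

```python
import math

def normal_pdf(v, mean, sd):
    """Density of a normal distribution with the given mean and s.d."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def map_cluster(v, weights, means, sds):
    """Index of the component maximizing weight * likelihood.
    By Bayes' rule this is also the maximizer of the posterior probability,
    because the normalizing constant is the same for every component."""
    scores = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
    return max(range(len(scores)), key=scores.__getitem__)

def residence_fraction(labels, slow_clusters):
    """Fraction of samples whose cluster was designated 'slow' post hoc."""
    return sum(1 for k in labels if k in slow_clusters) / len(labels)

# Example with made-up parameters: a narrow central component and two shoulders.
weights, means, sds = [0.5, 0.25, 0.25], [0.0, -3.0, 3.0], [0.5, 1.0, 1.0]
labels = [map_cluster(v, weights, means, sds) for v in [0.0, -3.0, 3.0, 0.2]]
slow_share = residence_fraction(labels, slow_clusters={0})
```

The post hoc step in the text corresponds to the choice of `slow_clusters`: the fit itself does not say which components represent the slow mode.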
All analysis was performed using code written in R and MATLAB.
Simulation
Sensory adaptation is a robustly observed phenomenon among organisms ranging from unicellular amoebae37,38 to humans26 where the sensory systems stop responding to constant stimuli. Here we modelled this adaptive/high-pass nature of the sensory receptors as a ‘motion-dependent sensor’ for which we assumed a nonlinear measurement model24,25 with sensory noise w2(t):
\[y(t)=\frac{d}{dt}s(x)+{w}_{2}(t)=g(x)\,v+{w}_{2}(t).\]
Here, s(x) is the position-dependent sensory scene experienced by the organism, v is the velocity and g(x) = ds/dx. For the present study, we assumed a quadratic sensory scene \(s(x)=\frac{1}{2}\alpha {x}^{2}+\beta x\) with non-zero constant sensory-scene parameters α and β. This assumption on the sensory scene yields g(x) = αx + β, a linear function of position, x.
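As a worked step, and assuming that the motion-dependent measurement takes the form y = g(x)v + w2 with g(x) = ds/dx and state (x, v), the quadratic scene gives the following. The symbol H for the measurement Jacobian is introduced here for illustration only.

```latex
\begin{aligned}
g(x) &= \frac{d}{dx}\left(\tfrac{1}{2}\alpha x^{2} + \beta x\right) = \alpha x + \beta,\\
y &= (\alpha x + \beta)\,v + w_{2},\\
H &= \begin{bmatrix} \dfrac{\partial y}{\partial x} & \dfrac{\partial y}{\partial v} \end{bmatrix}
   = \begin{bmatrix} \alpha v & \alpha x + \beta \end{bmatrix}.
\end{aligned}
```

Under this reading, the first entry of H vanishes when v = 0, so a linearized estimator receives position information from the measurement only while the organism is moving.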
Due to the presence of the nonlinearity in the measurement, we used an extended Kalman filter for state estimation, a common heuristic. For the state-feedback component, we applied \(f(\hat{x},\hat{v})={k}_{1}\hat{x}+{k}_{2}\hat{v}\), where \(k_1\) and \(k_2\) are the position and velocity feedback constants, respectively. In the triggered-excitation scheme, we used the trace of the state estimation error covariance matrix, Tr(P(t)), as the uncertainty measure (M). When Tr(P(t)) rose above an upper threshold, Tmax, the controller generated an active-sensing component, ua(t), as a Gaussian input with fixed power spectral density, and it continued to inject this input until Tr(P(t)) dropped below a lower threshold, Tmin. At that point, the controller switched back to the traditional state-feedback form. In the persistent-excitation scheme, the controller injected a Gaussian input ua(t) for all time.
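The switching rule amounts to a hysteresis between the two thresholds. The sketch below shows only that logic in Python, driven by a made-up sequence of Tr(P(t)) values; it does not include the extended Kalman filter or the plant, and the function names and numerical values are hypothetical.

```python
import random

def update_mode(exciting, trace_p, t_min, t_max):
    """Hysteresis: start exciting above t_max, stop once below t_min."""
    if not exciting and trace_p > t_max:
        return True
    if exciting and trace_p < t_min:
        return False
    return exciting

def mode_sequence(trace_values, t_min, t_max):
    """Explore (True) / exploit (False) flag for each uncertainty sample."""
    exciting = False
    modes = []
    for trace_p in trace_values:
        exciting = update_mode(exciting, trace_p, t_min, t_max)
        modes.append(exciting)
    return modes

def active_sensing_input(exciting, std, rng):
    """Gaussian excitation while exploring, zero while exploiting."""
    return rng.gauss(0.0, std) if exciting else 0.0

# Example: uncertainty rises past t_max, falls below t_min, then rises again.
rng = random.Random(0)
modes = mode_sequence([1.0, 3.0, 6.0, 5.0, 3.0, 1.5, 3.0, 6.0], t_min=2.0, t_max=5.0)
inputs = [active_sensing_input(m, std=1.0, rng=rng) for m in modes]
```

Keeping Tmin strictly below Tmax prevents rapid chattering between modes when the uncertainty hovers near a single threshold.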
To obtain the critical excitation level of the active-sensing component, ua,crit(t), for optimum tracking performance in persistent excitation, we chose 30 logarithmically spaced variance values of ua(t) from 1 to 100. From the mean of 100 independent simulations for each variance value, we obtained ua,crit(t) ≈ 9.33, which achieved the minimum RMS tracking error of ePE,min ≈ 0.071 with an RMS control effort of uPE ≈ 10. Using this critical value for the excitation, we studied the effect of the thresholds in triggered excitation by varying Tmax and Tmin/Tmax linearly from 4 × 10−3 to 10 × 10−3 and from 0.5 to 1, respectively, and performed 100 independent simulations for each pair of values.
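The variance sweep for persistent excitation can be outlined as a grid search over logarithmically spaced values, averaging the error over repeated runs at each value. The Python sketch below shows the bookkeeping only: `toy_simulate` is a hypothetical stand-in with an arbitrary U-shaped error curve, not the simulation used in the study, and its output should not be compared with the reported values.

```python
import math

def logspace(lo, hi, n):
    """n values spaced evenly in log10 between lo and hi, inclusive."""
    a, b = math.log10(lo), math.log10(hi)
    return [10.0 ** (a + (b - a) * i / (n - 1)) for i in range(n)]

def best_excitation(variances, simulate, n_runs):
    """Variance with the smallest mean error over n_runs independent runs."""
    best_var, best_err = None, float("inf")
    for var in variances:
        mean_err = sum(simulate(var, seed) for seed in range(n_runs)) / n_runs
        if mean_err < best_err:
            best_var, best_err = var, mean_err
    return best_var, best_err

def toy_simulate(variance, seed):
    """Hypothetical placeholder for one simulation run returning an RMS error.
    The seed is ignored here; a real run would use it for the noise inputs."""
    return (math.log10(variance) - 0.5) ** 2 + 0.07

variances = logspace(1.0, 100.0, 30)
critical_variance, min_error = best_excitation(variances, toy_simulate, n_runs=100)
```

Replacing `toy_simulate` with a function that runs the closed-loop model once and returns its RMS tracking error would turn this outline into the sweep described in the text.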
The system parameters were chosen from previous studies6,24 as follows: b = 1.7, m = 1, α = 3, β = 5, \({k}_{1}=m{\omega }_{n}^{2}\), k2 = (2mζωn − b), ζ = 0.56 and ωn = 1.05 × 2π. The process noise, w1(t), and sensor noise, w2(t), were chosen as fixed Gaussian noise inputs with variances 0.03 and 10, respectively.
Statistics
All statistical analyses were performed with the sign test and the Mann–Whitney–Wilcoxon test using custom code written in R (version 4.3.0; R Core Team) and MATLAB (version 9.12, R2022a; MathWorks). For all tests, the significance level was set to 0.05. The experimental and simulation data are reported as either mean ± standard deviation (μ ± s.d.) or mean ± standard error of the mean (μ ± s.e.m.).
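For readers unfamiliar with the sign test, the sketch below computes an exact one-sided p-value from paired differences using only the Python standard library. It is an illustration of the test itself, not the authors' R or MATLAB code; the function name is hypothetical and zero differences are dropped, which is a common convention assumed here.

```python
from math import comb

def sign_test_one_sided(differences):
    """Exact one-sided sign test: probability of at least the observed number
    of positive differences if positive and negative were equally likely."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Example: five paired differences that all have the same sign.
p_value = sign_test_one_sided([0.4, 1.1, 0.3, 0.9, 0.2])
```

With five pairs that all change in the same direction, the smallest attainable one-sided p-value is 1/32, which lies below the 0.05 significance level used in the text.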
Data availability
An archived version of the datasets supporting this article is available through the Johns Hopkins University Data Archive at https://doi.org/10.7281/T1/QS3QFT (ref. 60).
Code availability
An archived version of the analysis code supporting this article is available through the Johns Hopkins University Data Archive at https://doi.org/10.7281/T1/QS3QFT (ref. 60).
References
Gibson, J. J. Observations on active touch. Psychol. Rev. 69, 477–491 (1962).
Biswas, D. et al. Closed-loop control of active sensing movements regulates sensory slip. Curr. Biol. 28, 4029–4036 (2018).
Chen, C., Murphey, T. D. & MacIver, M. A. Tuning movement for sensing in an uncertain world. eLife 9, e52371 (2020).
Soatto, S. in Machine Learning for Computer Vision (eds. Cipolla, R. et al.) 17–48 (Springer, 2013).
Stamper, S. A., Roth, E., Cowan, N. J. & Fortune, E. S. Active sensing via movement shapes spatiotemporal patterns of sensory feedback. J. Exp. Biol. 215, 1567–1574 (2012).
Cowan, N. J. & Fortune, E. S. The critical role of locomotion mechanics in decoding sensory systems. J. Neurosci. 27, 1123–1128 (2007).
Rose, G. J. & Canfield, J. G. Longitudinal tracking responses of the weakly electric fish, Sternopygus. J. Comp. Physiol. A 171, 791–798 (1993).
Uyanik, I., Stamper, S. A., Cowan, N. J. & Fortune, E. S. Sensory cues modulate smooth pursuit and active sensing movements. Front. Behav. Neurosci. 13, 59 (2019).
Sefati, S. et al. Mutually opposing forces during locomotion can eliminate the tradeoff between maneuverability and stability. Proc. Natl Acad. Sci. USA 110, 18798–18803 (2013).
Golovin, D. & Krause, A. Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res. 42, 427–486 (2011).
Cooper, G. F. The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42, 393–405 (1990).
Blondel, V. D. & Tsitsiklis, J. N. A survey of computational complexity results in systems and control. Automatica 36, 1249–1274 (2000).
Narendra, K. S. & Annaswamy, A. M. Persistent excitation in adaptive systems. Int. J. Control 45, 127–160 (1987).
Catania, K. C. Stereo and serial sniffing guide navigation to an odour source in a mammal. Nat. Commun. 4, 1–8 (2013).
Michaiel, A. M., Abe, E. T. & Niell, C. M. Dynamics of gaze control during prey capture in freely moving mice. eLife 9, e57458 (2020).
Wohlgemuth, M. J., Kothari, N. B. & Moss, C. F. Action enhances acoustic cues for 3-D target localization by echolocating bats. PLoS Biol. 14, e1002544 (2016).
Kiemel, T., Oie, K. S. & Jeka, J. J. Multisensory fusion and the stochastic structure of postural sway. Biol. Cyber. 87, 262–277 (2002).
Lockey, J. K. & Willis, M. A. One antenna, two antennae, big antennae, small: total antennae length, not bilateral symmetry, predicts odor-tracking performance in the American cockroach Periplaneta americana. J. Exp. Biol. 218, 2156–2165 (2015).
Stöckl, A. L., Kihlström, K., Chandler, S. & Sponberg, S. Comparative system identification of flower tracking performance in three hawkmoth species reveals adaptations for dim light vision. Phil. Trans. R. Soc. Lond. B 372, 20160078 (2017).
Deora, T., Ahmed, M. A., Daniel, T. L. & Brunton, B. W. Tactile active sensing in an insect plant pollinator. J. Exp. Biol. 224, jeb239442 (2021).
Hauperich, A.-K., Young, L. K. & Smithson, H. E. What makes a microsaccade? A review of 70 years of research prompts a new detection method. J. Eye Mov. Res. 12, 1–22 (2019).
De la Fuente, I. M. et al. Evidence of conditioned behavior in amoebae. Nat. Commun. 10, 1–12 (2019).
Sutton, E. E., Demir, A., Stamper, S. A., Fortune, E. S. & Cowan, N. J. Dynamic modulation of visual and electrosensory gains for locomotor control. J. R. Soc. Interface 13, 20160057 (2016).
Kunapareddy, A. & Cowan, N. J. Recovering observability via active sensing. In Proc. Annual American Control Conference 2821–2826 (IEEE, 2018).
Sontag, E. D., Biswas, D. & Cowan, N. J. An observability result related to active sensing. Preprint at https://arxiv.org/abs/2210.03848 (2022).
Fabre, M. et al. Large postural sways prevent foot tactile information from fading: neurophysiological evidence. Cereb. Cortex. Commun. 2, tgaa094 (2020).
Arkley, K., Grant, R. A., Mitchinson, B. & Prescott, T. J. Strategy change in vibrissal active sensing during rat locomotion. Curr. Biol. 24, 1507–1512 (2014).
Yang, S. C.-H., Wolpert, D. M. & Lengyel, M. Theoretical perspectives on active sensing. Curr. Opin. Behav. Sci. 11, 100–108 (2016).
Matthews, M. & Sponberg, S. Hawkmoth flight in the unsteady wakes of flowers. J. Exp. Biol. 221, jeb179259 (2018).
Tritico, H. M. & Cotel, A. J. The effects of turbulent eddies on the stability and critical swimming speed of creek chub Semotilus atromaculatus. J. Exp. Biol. 213, 2284–2293 (2010).
Harris, C. M. & Wolpert, D. M. Signal-dependent noise determines motor planning. Nature 394, 780–784 (1998).
Nelson, M., Xu, Z. & Payne, J. Characterization and modeling of p-type electrosensory afferent responses to amplitude modulations in a wave-type electric fish. J. Comp. Physiol. A 181, 532–544 (1997).
Lee, J. et al. Templates and anchors for antenna-based wall following in cockroaches and robots. IEEE Trans. Robot. 24, 130–143 (2008).
Jun, J. J., Longtin, A. & Maler, L. Active sensing associated with spatial learning reveals memory-based attention in an electric fish. J. Neurophysiol. 115, 2577–2592 (2016).
Clarke, S. E., Naud, R., Longtin, A. & Maler, L. Speed-invariant encoding of looming object distance requires power law spike rate adaptation. Proc. Natl Acad. Sci. USA 110, 13624–13629 (2013).
Clarke, S. E., Longtin, A. & Maler, L. A neural code for looming and receding motion is distributed over a population of electrosensory on and off contrast cells. J. Neurosci. 34, 5583–5594 (2014).
Takeda, K. et al. Incoherent feedforward control governs adaptation of activated ras in a eukaryotic chemotaxis pathway. Sci. Signal. 5, ra2 (2012).
Biswas, D., Devreotes, P. N. & Iglesias, P. A. Three-dimensional stochastic simulation of chemoattractant-mediated excitability in cells. PLoS Comput. Biol. 17, e1008803 (2021).
Bertsekas, D. Dynamic Programming and Optimal Control Vol. 1 (Athena Scientific, 2012).
Zweifel, N. O. & Hartmann, M. J. Defining ‘active sensing’ through an analysis of sensing energetics: homeoactive and alloactive sensing. J. Neurophys. 124, 40–48 (2020).
Bullock, T. H. Electroreception. Annu. Rev. Neurosci. 5, 121–170 (1982).
Brinkløv, S., Elemans, C. P. & Ratcliffe, J. M. Oilbirds produce echolocation signals beyond their best hearing range and adjust signal design to natural light conditions. R. Soc. Open Sci. 4, 170255 (2017).
Nelson, M. E. & MacIver, M. A. Sensory acquisition in active sensing systems. J. Comp. Physiol. A 192, 573–586 (2006).
Snyder, J. B., Nelson, M. E., Burdick, J. W. & MacIver, M. A. Omnidirectional sensory and motor volumes in electric fish. PLoS Biol. 5, e301 (2007).
Ghose, K. & Moss, C. F. Steering by hearing: a bat’s acoustic gaze is linked to its flight motor output by a delayed, adaptive linear law. J. Neurosci. 26, 1704–1710 (2006).
Berg, H. C. & Brown, D. A. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239, 500–504 (1972).
Benda, J., Longtin, A. & Maler, L. A synchronization–desynchronization code for natural communication signals. Neuron 52, 347–358 (2006).
Metzen, M. G., Hofmann, V. & Chacron, M. J. Neural synchrony gives rise to amplitude- and duration-invariant encoding consistent with perception of natural communication stimuli. Front. Neurosci. 14, 79 (2020).
Hofmann, V. & Chacron, M. J. Population coding and correlated variability in electrosensory pathways. Front. Integr. Neurosci. 12, 56 (2018).
Grewe, J., Kruscha, A., Lindner, B. & Benda, J. Synchronous spikes are necessary but not sufficient for a synchrony code in populations of spiking neurons. Proc. Natl Acad. Sci. USA 114, E1977–E1985 (2017).
Fetcho, J. R., Higashijima, S.-i & McLean, D. L. Zebrafish and motor control over the last decade. Brain. Res. Rev. 57, 86–93 (2008).
Tabor, K. M. et al. Direct activation of the Mauthner cell by electric field pulses drives ultrarapid escape responses. J. Neurophysiol. 112, 834–844 (2014).
Orchard, G. et al. HFirst: a temporal approach to object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2028–2040 (2015).
Krebs, J. R., Kacelnik, A. & Taylor, P. Test of optimal sampling by foraging great tits. Nature 275, 27–31 (1978).
Cerri, R. D. & Fraser, D. F. Predation and risk in foraging minnows: balancing conflicting demands. Am. Nat. 121, 552–561 (1983).
Kaelbling, L. P., Littman, M. L. & Moore, A. W. Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996).
Hitschfeld, É. M., Stamper, S. A., Vonderschen, K., Fortune, E. S. & Chacron, M. J. Effects of restraint and immobilization on electrosensory behaviors of weakly electric fish. ILAR J. 50, 361–372 (2009).
Roth, E., Zhuang, K., Stamper, S. A., Fortune, E. S. & Cowan, N. J. Stimulus predictability mediates a switch in locomotor smooth pursuit performance for Eigenmannia virescens. J. Exp. Biol. 214, 1170–1180 (2011).
Vágvölgyi, B. P. General tracker. GitHub https://github.com/vagvolgyi/general_tracker (2021).
Biswas, D. et al. Data and code associated with the publication: Mode switching in organisms for solving explore-vs-exploit problem. Johns Hopkins Research Data Repository https://doi.org/10.7281/T1/QS3QFT (2023).
Acknowledgements
We thank T. Kiemel (UMD) for providing human balance data and B. P. Vágvölgyi (JHU) for developing the tracking software used in this work. We thank C. F. Moss (JHU), V. P. Sharma (GT) and S. Sponberg (GT) for suggesting relevant articles for the reanalysis of the animal locomotion data, and S. L. Poynton (JHMI) for critical feedback on the paper. This work was supported by the Office of Naval Research under grant no. N00014-21-1-2431 (N.J.C.) and the National Science Foundation under grant no. 2011619 (N.J.C.).
Author information
Contributions
Conceptualization, all authors. Methodology, all authors. Software, D.B., A.L., K.H., Y.Y. and J.G. Formal analysis, D.B., A.L., K.H., Y.Y. and J.G. Investigation, Y.Y. Data curation, D.B. and Y.Y. Writing—original draft, all authors. Writing—review and editing, all authors. Visualization, D.B. with other authors. Supervision, N.J.C.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Leonard Maler, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Liesbeth Venema, in collaboration with the Nature Machine Intelligence team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Comparison of fish movement in lights-on trials versus lights-off trials.
(a) Velocity trace of the truncated trial data (black) with traces from other trials (gray) for the same fish from Fig. 1c. (b) Histogram of all trials (n = 7) using same length scale, and the three-component Gaussian mixture model (GMM) fit. (c) Magnitude of discrete Fourier transform of velocity traces from (a) with total number of trials (n) indicated next to the plots. The solid line and the shaded region denote mean and the standard error of mean, respectively. (d) Box and whisker plot showing RMS values of individual trials for all fish (N = 5) with colors the same as in (c). Mean RMS velocity across trials for all fish in lights-off trials was greater than in lights-on trials (one-sided p-values are 0.0004, 0.0001, 0.0036, 0.0003, and 0.0001, respectively). (e, f) Q-Q plots from a single representative fish (fish 1) comparing the velocity data from lights-off (e) and lights-on (f) trials with theoretical quantiles from a normal fit (magenta) and GMM fit (blue), respectively. The smaller deviation from the reference line (black dashed) for the three-component GMM fits than for the normal fits suggested that the former fitted the data better. The Q-Q plots also showed that the lights-off trial data were closer to normal distribution fits than were the lights-on trial data. (g) Cumulative difference in Bayesian information criterion (ΔBIC) values for varying number of components in GMM (top: lights off; bottom: lights on). The gray dashed line corresponds to three-component GMM. (h) Box and whisker plot showing kurtosis values of individual trials for all fish (N = 5) with colors the same as in (c). For all fish, the mean kurtosis value across trials was greater in lights-on trials than in lights-off trials (one-sided p-values are 0.0034, 0.0188, 0.0005, 0.0445, and 0.0068, respectively).
All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. The total number of lights-off trials (n) for fish 1 and 3 was 7, and for the rest, it was 10 trials per condition per fish. All p-values were calculated using the Mann-Whitney-Wilcoxon test.
Extended Data Fig. 2 Reanalysis of the experimental and simulated trajectories from Chen et al.3.
(a) Experimental velocity traces (n = 10) with ‘jamming’ of the electrosensory system in Eigenmannia virescens, which decreased the salience and reliability of electrosensory navigation. (b) Corresponding histogram, with the kurtosis value, κ. The magenta dashed and the blue solid curves correspond to a normal and GMM fit with three components, respectively. (c) Q-Q plots comparing the sample velocity data for all trials (n = 10) from (a) with theoretical quantiles from the same normal (magenta) and GMM fit (blue) from (b). Clearly, the GMM fit was better than the normal. See Extended Data Table 2 for statistical details. (d–f) Experimental velocity traces (n = 10) with jamming electrode off (d), corresponding histogram (e) with the kurtosis value, κ, and the Q-Q plots (f). (g–i) Simulated velocity traces (n = 9) using the Ergodic Information Harvesting (EIH) algorithm for weak signal (SNR ≤ 30 dB, equivalent to jamming amplitude ≥ 10 mA) (g), corresponding histogram (h) with the kurtosis value, κ, and the Q-Q plots (i). (j–l) Simulated velocity traces (n = 9) using the EIH algorithm for strong signal (SNR ≥ 50 dB, equivalent to absence of jamming) (j), corresponding histogram (k) with the kurtosis value, κ, and the Q-Q plots (l). Colors and styles are the same as in (a–c). Mean ± SD of the respective RMS values of experimental velocity traces (vRMS) are shown next to plots of (a,d). The fitting performances of GMM and the normal distribution for simulated trajectories were comparable. See Extended Data Table 3 for statistical details.
Extended Data Fig. 3 Clustering of velocity data into different behavioral modes.
(a,b) Representative velocity histograms of lights-off (a) and lights-on (b) trials from the same fish with three clusters: slow (orange), fast positive (green) and fast negative (blue). The clustering was based on identifying velocity thresholds vL and vH on the Gaussian mixture model (GMM) fit (grey line) as indicated by blue and green markers, respectively (see Methods for details). (c,d) Top: two-state (F: fast velocity both positive and negative combined and S: slow velocity) Markov process showing mean transition rates for a representative individual in lights-off (c) and lights-on trials (d). Bottom: transition rates corresponding to state transitions: F → S in (c) and S → F in (d), respectively for lights-off (black) and lights-on (red) trials. The transition rate for F → S was higher in lights-on trials (one-sided p-values are 0.0068, 0.0001, 0.0046, 0.0023, and 0.0093, respectively) whereas the rate for S → F was higher in lights-off trials (one-sided p-values are 0.0006, 0.0001, 0.0001, 0.0018, and 0.0001, respectively). (e-h) Velocity histograms (e,f) and traces (g,h) from lights-off (e,g) and lights-on (f,h) trials from the same fish from (a,b) showing three clusters using Hidden Markov model (HMM) based clustering. The colors are the same as in (a,b). (i-l) Velocity histograms (i,j) and traces (k,l) from lights-off (i,k) and lights-on (j,l) trials from the same fish from (a,b) showing three clusters using maximum a posteriori (MAP) clustering based on three-component GMM fits. The probability density functions (pdf) of respective components are shown as dashed lines in (i,j). (m) Box and whisker plots showing residence time in slow mode, computed as the percent of the trial duration (40 s), for lights-off (black) and lights-on (red) trials computed using different clustering algorithms. For details see Methods.
For all the clustering algorithms, the computed residence time was significantly higher during lights-on trials than lights-off trials (one sided p-values are 0.0002, 0.0034, 0.0001, and 0.0001, respectively). All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. In (c,d) the total number of lights-off trials (n) for fish 1 and 3 was 7, and for the rest, it was 10 trials per condition per fish. All p-values were calculated using the Mann-Whitney-Wilcoxon test.
Extended Data Fig. 4 Reanalysis of the postural sway in humans, Homo sapiens17, and odor plume tracking response of American cockroach, Periplaneta americana18 shows evidence in support of sensory salience-dependent mode-switching strategy.
(a,b) Representative temporal traces of mediolateral movement of center of mass, ML CoM (a) and the histograms of ML CoM velocities (b) for different experimental conditions: both vision and touch (top), only touch (middle) and only vision (bottom). The magenta dashed and the blue solid curves in (b) correspond to a normal and GMM fit with three components, respectively. The dataset analyzed here, comprising 7 subjects (N) with 2-3 replicate trials per experimental condition, was collected at 50 Hz. (c,d) Comparison of the RMS velocities (c) and switching frequency (d) for different experimental conditions. Different shades of gray denote different human subjects. The p-values were computed using the sign test. (e,f) Representative temporal traces of the lateral head movement (e) and the histograms of the lateral velocities (f) for different antennae lengths as indicated. The colors of fitted curves are the same as in (a,b). The dataset analyzed here was collected at 30 Hz but was later subsampled at 15 Hz by the original study authors. The kurtosis (κ) values and the total number of trajectories (N, single trajectory per subject) analyzed are indicated next to the respective panels in (f). (g,h) Comparison of the RMS lateral velocities (g) and residence time in slow mode (h) for different experimental conditions. All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. Sample sizes (n) are shown in each boxplot. The one-sided p-values were computed using the Mann-Whitney-Wilcoxon test.
Extended Data Fig. 5 Reanalysis of the tactile response in crepuscular hawkmoth, Manduca sexta20, and active exploratory movement of three different species of hawkmoths during flower tracking19 shows similar broad-shouldered velocity distributions.
(a-f) Histograms of relative radial angular velocity for different shapes of the flower as indicated. ‘C’ is the curvature parameter for the description of the lateral traces of the corollas for first (a-c) and seventh (d-f, early-learning) visit. The magenta dashed and the blue solid curves in (a-f) correspond to a normal and GMM fit with three components, respectively. The kurtosis (κ) values and the total number of trajectories (N) analyzed are indicated next to the respective panels in (a-f). The dataset analyzed here was collected at 100 Hz. For the present study, we focused on the data during the pre-feeding phases only. (g-l) Histograms of active exploratory velocity at low (g-i: 15 lx) and high (j-l: 300 lx) illumination level in three different species of hawkmoth: nocturnal Deilephila elpenor (g,j), diurnal Macroglossum stellatarum (h,k), and crepuscular Manduca sexta (i,l). Colors of the fits are the same as in (a-f). The kurtosis (κ) values and the total number of hawkmoths (N) analyzed are indicated. (m-o) Box and whisker plots showing the RMS active exploratory velocity for all three species of hawkmoths analyzed at different illumination levels. All box and whisker plots include the median line, the box denotes the interquartile range (IQR), whiskers denote the rest of the data distribution and outliers are denoted by points greater than ± 1.5 × IQR. Sample sizes (n) are shown in each boxplot. The one-sided p-values were computed using the Mann-Whitney-Wilcoxon test. The datasets analyzed here were collected at 100 Hz with one trial per hawkmoth (N = n).
Extended Data Fig. 6 Reanalysis of the directed movement of Amoeba proteus and Metamoeba leningradensis in response to an electric field (galvanotaxis)22.
(a-c) Migration trajectories (N = 50) of A. proteus (a), temporal traces of velocities (b) in the direction transverse to the applied electric field, derived from the migration data, and the corresponding velocity histogram (c) with kurtosis (κ) value. The magenta dashed and blue solid curves in (c) correspond to a normal fit and a GMM fit with three components, respectively. (d-f) Galvanotaxis response in M. leningradensis. The panels are the same as in (a-c). Image data for both datasets were collected at 0.1 Hz.
Extended Data Fig. 7 Effect of thresholds, \({\rm{T}}_{\min}\) and \({\rm{T}}_{\max}\), in the Triggered Excitation strategy.
(a-c) Heatmaps showing the mean RMS tracking error in Triggered Excitation (\({\rm{e}}_{{\rm{TE}}}\), a), the mean RMS control effort (\({\rm{u}}_{{\rm{RMS}}}\), b) and the mean kurtosis (κ, c) of the resultant velocity distributions from 100 independent simulations as the thresholds \({\rm{T}}_{\min}\) and \({\rm{T}}_{\max}\) were varied. The simulations were run at the critical excitation level corresponding to the minimum RMS tracking error in Persistent Excitation (\({\rm{e}}_{{\rm{PE}},\min}\) in Fig. ??l). The dashed line in (a-c) shows the phase transition based on the difference between the tracking errors in Triggered and Persistent Excitation. The region enclosed by the line corresponds to the parameter space in which Persistent Excitation yields a lower tracking error than Triggered Excitation; outside this region, Triggered Excitation performs better. (d) Variation with sensor noise variance, σ², of the kurtosis, κ (green, left y-axis), and of the Kullback-Leibler (K-L) divergence (right y-axis) of the normal distribution (magenta dashed) and Gaussian mixture model (blue solid) fits to the velocity distribution. (e,f) Velocity histograms with kurtosis values for sensor noise variance σ² = 0.70 and 2, respectively, as indicated by (i) and (ii) in (d). (g) Variation of the RMS control effort (\({\rm{u}}_{{\rm{RMS}}}\)) with sensor noise variance, σ². The shaded regions in (d,g) denote the respective standard deviations (n = 25 independent simulations per σ²).
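The K-L divergence between a velocity distribution and a fitted normal distribution, as plotted in (d), can be approximated from a histogram. The sketch below is one plausible discretization, not the authors' procedure: it assumes the divergence is taken from the empirical histogram P to the fitted model Q, that is D(P‖Q), and the bin count, function name and synthetic data are hypothetical.

```python
import numpy as np


def kl_divergence_from_normal(samples, bins=50):
    """Discrete D(P || Q): P is the histogram, Q a moment-matched normal."""
    samples = np.asarray(samples, dtype=float)
    counts, edges = np.histogram(samples, bins=bins)
    p = counts / counts.sum()

    # Evaluate the fitted normal density at bin centers and renormalize
    # so that Q is a probability vector over the same bins as P.
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    mu, sigma = samples.mean(), samples.std()
    density = np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    density /= sigma * np.sqrt(2.0 * np.pi)
    q = density * width
    q = q / q.sum()

    # Empty bins contribute zero to the sum (0 * log 0 is taken as 0).
    nonzero = p > 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / q[nonzero])))


# Synthetic illustration only: peaked, broad-shouldered data versus Gaussian.
rng = np.random.default_rng(1)
n = 20000
is_fast = rng.random(n) < 0.2
mixed = np.where(is_fast, rng.normal(0.0, 2.0, n), rng.normal(0.0, 0.2, n))
gaussian = rng.normal(0.0, 1.0, n)

kl_mixed = kl_divergence_from_normal(mixed)
kl_gaussian = kl_divergence_from_normal(gaussian)
print("K-L, mixed modes vs normal fit:", kl_mixed)
print("K-L, Gaussian vs normal fit:   ", kl_gaussian)
```

A larger divergence for the mixed-mode sample indicates that a single normal distribution describes it poorly; the same function could be pointed at a mixture-model density to compare the two fits.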
Supplementary information
Supplementary Information
Supplementary Material and Methods, Figs. 1–6 and video information.
Supplementary Video 1
Fish behaviour during lights-off trials, related to Figs. 1 and 2. The top panel depicts the fish movement (bottom view) inside a stationary refuge from a representative trial. The bright ventral patch on the fish was tracked (circular marker). The original video was recorded at 100 frames per second. The middle and bottom panels show the position (middle) and velocity (bottom) traces along the rostrocaudal axis (that is, the x axis). The colours denote different behavioural modes: orange, slow; green, fast positive; blue, fast negative.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Biswas, D., Lamperski, A., Yang, Y. et al. Mode switching in organisms for solving explore-versus-exploit problems. Nat Mach Intell 5, 1285–1296 (2023). https://doi.org/10.1038/s42256-023-00745-y
DOI: https://doi.org/10.1038/s42256-023-00745-y