Gait switching and targeted navigation of microswimmers via deep reinforcement learning

Zou, Zonghao; Liu, Yuexin; Young, Y.-N.; Pak, On Shun; Tsang, Alan C. H.

doi:10.1038/s42005-022-00935-x


Article
Open access
Published: 21 June 2022


Communications Physics volume 5, Article number: 158 (2022) Cite this article


Abstract

Swimming microorganisms switch between locomotory gaits to enable complex navigation strategies such as run-and-tumble to explore their environments and search for specific targets. This ability of targeted navigation via adaptive gait-switching is particularly desirable for the development of smart artificial microswimmers that can perform complex biomedical tasks such as targeted drug delivery and microsurgery in an autonomous manner. Here we use a deep reinforcement learning approach to enable a model microswimmer to self-learn effective locomotory gaits for translation, rotation and combined motions. The Artificial Intelligence (AI) powered swimmer can switch between various locomotory gaits adaptively to navigate towards target locations. The multimodal navigation strategy is reminiscent of gait-switching behaviors adopted by swimming microorganisms. We show that the strategy advised by AI is robust to flow perturbations and versatile in enabling the swimmer to perform complex tasks such as path tracing without being explicitly programmed. Taken together, our results demonstrate the vast potential of these AI-powered swimmers for applications in unpredictable, complex fluid environments.

Introduction

Swimming microorganisms have evolved versatile navigation strategies by switching their locomotory gaits in response to their surroundings¹. Their navigation strategies typically involve switching between translation and rotation modes such as run-and-tumble and reverse-and-flick in bacteria^2,3,4,5, as well as run-stop-shock and run-and-spin in eukaryotes^6,7. Such an adaptive, multimodal gait-switching ability is particularly desirable for biomedical applications of artificial microswimmers such as targeted drug delivery and microsurgery^8,9,10,11,12, which require navigation towards target locations in biological media with uncontrolled and/or unpredictable environmental factors^13,14,15.

Pioneering works by Purcell and subsequent studies demonstrated how simple reconfigurable systems with ingenious locomotory gaits can generate net translation and rotation, given the stringent constraints for locomotion at low Reynolds numbers¹⁶. Yet, the design of locomotory gaits becomes increasingly intractable when more sophisticated maneuvers are required or environmental perturbations are present. Existing microswimmers are therefore typically designed with fixed locomotory gaits and rely on manual interventions for navigation^{8,17,18,19,20,21}. Developing microswimmers that can navigate complex environments autonomously, with adaptive locomotory strategies similar to those of biological cells, remains an unresolved challenge. Modular microrobotics and the use of soft active materials^22,23 have been proposed to address the challenge.

More recently, the rapid development of artificial intelligence (AI) and its applications in locomotion problems^{24,25,26,27,28,29} have opened different paths towards designing the next generation of smart microswimmers^30,31. Various machine learning approaches have enabled the navigation of active particles in the presence of background flows^32,33, thermal fluctuations^34,35, and obstacles³⁶. As minimal models, the microswimmers are often modeled as active particles with prescribed self-propelling velocities and certain degrees of freedom for speed variation and re-orientation. However, the complex adjustments in locomotory gaits required for such adaptations are typically not accounted for. Recent studies have begun to examine how different machine learning techniques enable reconfigurable microswimmers to evolve effective gaits for self-propulsion³⁷ and chemotactic response³⁸.

Here, we combine reinforcement learning (RL) with an artificial neural network to enable a simple reconfigurable system to perform complex maneuvers in a low-Reynolds-number environment. We show that the deep RL framework empowers a microswimmer to adapt its locomotory gaits in accomplishing sophisticated tasks including targeted navigation and path tracing, without being explicitly programmed. The multimodal gait-switching strategies are reminiscent of those adopted by swimming microorganisms. Furthermore, we examine the performance of these locomotion strategies against perturbations by background flows. The results showcase the versatility of AI-powered swimmers and their robustness in media with uncontrolled environmental factors.

Results and discussion

Model reconfigurable system

We consider a simple reconfigurable system consisting of three spheres with radius R and centers r_i (i = 1, 2, 3) connected by two arms with variable lengths and orientations as shown in Fig. 1a. This setup generalizes previous swimmer models proposed by Najafi and Golestanian³⁹ and Ledesma-Aguilar et al.⁴⁰ by allowing more degrees of freedom. The interaction between the system and the surrounding viscous fluid is modeled by low-Reynolds-number hydrodynamics, imposing stringent constraints on the locomotive capability of the system. Unlike the traditional paradigm where the locomotory gaits are prescribed in advance^{39,40,41,42,43,44}, here we exploit a deep RL framework to enable the system to self-learn a set of locomotory gaits to swim along a target direction, θ_T. We employ a deep neural network based on the Actor-Critic structure and implement the Proximal Policy Optimization (PPO) algorithm^29,45 to train and update the agent (i.e., AI) in charge of the decision-making process (Fig. 1b). The deep RL framework here extends previous studies from discrete action spaces to continuous action spaces^32,35,37,46, enhancing the swimmer’s capability in developing more versatile locomotory gaits for complex navigation tasks (see the “Methods” section for implementation details of the Actor-Critic neural network and PPO algorithm).

**Fig. 1: Schematics of the model microswimmer and the deep neural network with Actor-Critic structure.**

Hydrodynamic interactions

The interaction between the spheres and their surrounding fluid is governed by the Stokes equations (∇p = μ∇²u, ∇ ⋅ u = 0). Here, p, μ and u represent, respectively, the pressure, dynamic viscosity, and velocity field. In this low Reynolds number regime, the velocities of the spheres V_i and the forces F_i acting on them can be related linearly as

$$\mathbf{V}_{i}=\mathbf{G}_{ij}\mathbf{F}_{j},$$

(1)

where G_ij is the Oseen tensor^47,48,49 given by

$$\mathbf{G}_{ij}=\begin{cases}\dfrac{1}{6\pi \mu R}\,\mathbf{I}, & i=j,\\[4pt] \dfrac{1}{8\pi \mu |\mathbf{r}_{i}-\mathbf{r}_{j}|}\left(\mathbf{I}+\hat{\mathbf{r}}_{ij}\hat{\mathbf{r}}_{ij}\right), & i\neq j.\end{cases}$$

(2)

Here, I is the identity matrix and $\hat{\mathbf{r}}_{ij}=(\mathbf{r}_{i}-\mathbf{r}_{j})/|\mathbf{r}_{i}-\mathbf{r}_{j}|$ denotes the unit vector between spheres i and j. The torque acting on sphere i is calculated by T_i = r_i × F_i. The rates of actuation of the arm lengths $\dot{L}_{1}$, $\dot{L}_{2}$ and the intermediate angle $\dot{\theta}_{31}$ can be expressed in terms of the velocities of the spheres V_i. The kinematics of the swimmer is fully determined upon applying the force-free (∑_iF_i = 0) and torque-free (∑_iT_i = 0) conditions. The Oseen tensor hydrodynamic description is valid when the spheres are not in close proximity (R ≪ L). We therefore constrain the arm and angle contractions such that 0.6L ≤ L₁, L₂ ≤ L and 2π/3 ≤ θ₃₁ ≤ 4π/3.
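As an illustration of Eqs. (1) and (2), the following minimal sketch assembles the mobility matrix for three spheres in the plane and applies it to a set of forces. The function name `oseen_mobility`, the values R = 0.1 and μ = 1, and the sample forces are assumptions chosen for illustration; they are not taken from the authors' code.

```python
import numpy as np

def oseen_mobility(positions, radius, mu=1.0):
    """Assemble the 2n x 2n planar mobility matrix G of Eq. (2)."""
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    G = np.zeros((2 * n, 2 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                # Self-mobility: Stokes drag on an isolated sphere
                block = np.eye(2) / (6.0 * np.pi * mu * radius)
            else:
                # Pair interaction through the Oseen tensor
                d = positions[i] - positions[j]
                dist = np.linalg.norm(d)
                r_hat = d / dist
                block = (np.eye(2) + np.outer(r_hat, r_hat)) / (8.0 * np.pi * mu * dist)
            G[2 * i:2 * i + 2, 2 * j:2 * j + 2] = block
    return G

# Three collinear spheres one arm length apart (assumed test configuration)
positions = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
G = oseen_mobility(positions, radius=0.1)

# Stacked forces (F1x, F1y, F2x, F2y, F3x, F3y); these sum to zero
forces = np.array([1.0, 0.0, -2.0, 0.0, 1.0, 0.0])
velocities = G @ forces  # Eq. (1)
```

The matrix is symmetric, and its off-diagonal blocks decay as the inverse of the sphere separation, which is why the description requires R ≪ L.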

The actuation rates of the arm lengths $\dot{L}_{1},\dot{L}_{2}$ can be expressed in terms of the relative velocities of the spheres parallel to the arm orientations:

$$(\mathbf{V}_{2}-\mathbf{V}_{1})\cdot \hat{\mathbf{r}}_{21}=\dot{L}_{1},$$

(3)

$$(\mathbf{V}_{3}-\mathbf{V}_{2})\cdot \hat{\mathbf{r}}_{32}=\dot{L}_{2}.$$

(4)

The actuation rate of the intermediate angle $\dot{\theta}_{31}$ can be expressed in terms of the relative velocities of the spheres perpendicular to the arm orientations:

$$(\mathbf{V}_{2}-\mathbf{V}_{1})\cdot \frac{d\hat{\mathbf{r}}_{21}}{d\theta_{1}}=L_{1}\dot{\theta}_{1},$$

(5)

$$(\mathbf{V}_{3}-\mathbf{V}_{2})\cdot \frac{d\hat{\mathbf{r}}_{32}}{d\theta_{2}}=L_{2}\dot{\theta}_{2},$$

(6)

$$\dot{\theta}_{1}-\dot{\theta}_{2}=\dot{\theta}_{31},$$

(7)

where $\dot{\theta}_{1}$ and $\dot{\theta}_{2}$ are the arm rotation speeds. Together with the Oseen tensor description of the hydrodynamic interaction between the spheres, Eqs. (1) and (2), and the overall force-free and torque-free conditions, the kinematics of the swimmer is fully determined.
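One way to see that the kinematics is fully determined is to write Eqs. (1)–(7) and the force-free and torque-free conditions as a 6 × 6 linear system for the six planar force components. The sketch below does this under stated assumptions: arm 1 points from sphere 1 to sphere 2 at angle θ₁, arm 2 points from sphere 2 to sphere 3 at angle θ₂, and the names `mobility` and `solve_kinematics` and the value R = 0.1 are hypothetical.

```python
import numpy as np

def mobility(pos, R, mu=1.0):
    """Planar Oseen mobility matrix, Eq. (2), for spheres at positions pos."""
    n = len(pos)
    G = np.zeros((2 * n, 2 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                blk = np.eye(2) / (6 * np.pi * mu * R)
            else:
                d = pos[i] - pos[j]
                dist = np.linalg.norm(d)
                blk = (np.eye(2) + np.outer(d, d) / dist**2) / (8 * np.pi * mu * dist)
            G[2 * i:2 * i + 2, 2 * j:2 * j + 2] = blk
    return G

def solve_kinematics(r1, L1, L2, th1, th2, dL1, dL2, dth31, R=0.1):
    """Return sphere positions, forces and velocities (each 3 x 2)."""
    e1 = np.array([np.cos(th1), np.sin(th1)])    # unit vector along arm 1
    e2 = np.array([np.cos(th2), np.sin(th2)])    # unit vector along arm 2
    n1 = np.array([-np.sin(th1), np.cos(th1)])   # d e1 / d theta_1
    n2 = np.array([-np.sin(th2), np.cos(th2)])   # d e2 / d theta_2
    p1 = np.asarray(r1, dtype=float)
    pos = np.array([p1, p1 + L1 * e1, p1 + L1 * e1 + L2 * e2])
    G = mobility(pos, R)

    C = np.zeros((3, 6))                         # acts on stacked (V1, V2, V3)
    C[0, 0:2], C[0, 2:4] = -e1, e1               # Eq. (3)
    C[1, 2:4], C[1, 4:6] = -e2, e2               # Eq. (4)
    C[2, 0:2] = -n1 / L1                         # Eqs. (5)-(7) combined
    C[2, 2:4] = n1 / L1 + n2 / L2
    C[2, 4:6] = -n2 / L2

    force_free = np.tile(np.eye(2), 3)           # sum_i F_i = 0
    torque_free = np.concatenate([[-p[1], p[0]] for p in pos])[None, :]
    A = np.vstack([C @ G, force_free, torque_free])
    b = np.array([dL1, dL2, dth31, 0.0, 0.0, 0.0])
    F = np.linalg.solve(A, b)                    # six force components
    V = G @ F                                    # Eq. (1)
    return pos, F.reshape(3, 2), V.reshape(3, 2)

# Assumed sample configuration and actuation rates
pos, F, V = solve_kinematics((0.0, 0.0), 1.0, 0.8, 0.2, -0.1,
                             dL1=0.3, dL2=-0.2, dth31=0.1)
```

Advancing the sphere positions with `V` over one action step Δt would then give the swimmer displacement; the time-stepping scheme itself is not specified in the text and is left out here.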

In presenting our results, we scale lengths by the fully extended arm length L, velocities by a characteristic actuation rate of the arm V_c, and hence time by L/V_c and forces by μLV_c (see Non-dimensionalization under Supplementary methods).

Targeted navigation

We first use the deep RL framework to train the model system to swim along a target direction θ_T, given an arbitrary initial swimmer orientation θ_o. The swimmer’s orientation is defined based on the relative position between the swimmer’s centroid r_c = ∑_ir_i/3 and r₁ as $\theta_{\rm o}=\arg(\mathbf{r}_{\rm c}-\mathbf{r}_{1})$ (Fig. 1).

In the RL algorithm, the state s ∈ (r₁, L₁, L₂, θ₁, θ₂) of the system is specified by the sphere center r₁, arm lengths L₁, L₂, and arm orientations θ₁, θ₂. The observation $o\in (L_{1},L_{2},\theta_{31},\cos\theta_{\rm d},\sin\theta_{\rm d})$ is extracted from the state, where θ₃₁ is the intermediate angle and θ_d = θ_T − θ_o is the difference between the target direction θ_T and the swimmer’s orientation θ_o; note that the angle difference is expressed in terms of $(\cos\theta_{\rm d},\sin\theta_{\rm d})$ to avoid discontinuity in the orientation space. The AI decides the swimmer’s next action based on the observation using the Actor neural network: for each action step Δt, the swimmer performs an action $a\in (\dot{L}_{1},\dot{L}_{2},\dot{\theta}_{31})$ by actuating its two arms, leading to swimmer displacement. To quantify the success of a given action, the reward is measured by the displacement of the swimmer’s centroid along the target direction, $r_{t}=(\mathbf{r}_{{\rm c}_{t+1}}-\mathbf{r}_{{\rm c}_{t}})\cdot (\cos\theta_{\rm T},\,\sin\theta_{\rm T})$.
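The mapping from state to observation and the reward can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the arm directions follow the convention that arm 1 runs from sphere 1 to sphere 2 and arm 2 from sphere 2 to sphere 3, and the relation θ₃₁ = π + θ₁ − θ₂ is an assumed convention chosen so that θ₃₁ = π for aligned spheres and Eq. (7) holds.

```python
import numpy as np

def sphere_positions(r1, L1, L2, th1, th2):
    """Sphere centers r1, r2, r3 from the state variables."""
    r1 = np.asarray(r1, dtype=float)
    r2 = r1 + L1 * np.array([np.cos(th1), np.sin(th1)])
    r3 = r2 + L2 * np.array([np.cos(th2), np.sin(th2)])
    return np.array([r1, r2, r3])

def observe(state, theta_T):
    """Observation (L1, L2, theta_31, cos(theta_d), sin(theta_d))."""
    r1, L1, L2, th1, th2 = state
    pos = sphere_positions(r1, L1, L2, th1, th2)
    rc = pos.mean(axis=0)                      # centroid r_c
    d = rc - pos[0]
    theta_o = np.arctan2(d[1], d[0])           # theta_o = arg(r_c - r_1)
    theta_31 = np.pi + th1 - th2               # assumed sign convention
    theta_d = theta_T - theta_o
    return np.array([L1, L2, theta_31, np.cos(theta_d), np.sin(theta_d)])

def reward(rc_t, rc_next, theta_T):
    """Centroid displacement projected onto the target direction."""
    direction = np.array([np.cos(theta_T), np.sin(theta_T)])
    return float(np.dot(np.asarray(rc_next) - np.asarray(rc_t), direction))

# Aligned swimmer pointing along +x, target direction +y
state = ((0.0, 0.0), 1.0, 1.0, 0.0, 0.0)
obs = observe(state, theta_T=np.pi / 2)
r = reward((0.0, 0.0), (0.1, 0.2), theta_T=np.pi / 2)
```

Encoding θ_d through its cosine and sine means that angle differences of −π and π produce the same observation, which is the discontinuity the text refers to.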

We divide the training process into a total of N_e episodes, with each episode consisting of N_l = 150 learning steps. To ensure a full exploration of the observation space o, both the initial swimmer state s and the target direction θ_T are randomized in each episode. Based on the training results after every 20 episodes, the critic neural network updates the AI to maximize the expected long-term rewards E[R_t=0∣π_θ], where π_θ is the stochastic control policy, $R_{t}=\sum_{t'=t}^{\infty}\gamma^{t'-t}r_{t'}$ is the infinite-horizon discounted future return, and γ is the discount factor measuring the greediness of the algorithm^45,50. A large discount factor γ = 0.99 is set here to ensure farsightedness of the algorithm. As the episodes proceed, the Actor-Critic structure progressively trains the AI and thereby enhances the performance of the swimmer.
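For a recorded sequence of rewards, the discounted return can be evaluated with a backward recursion, R_t = r_t + γR_{t+1}. The sketch below truncates the infinite sum at the end of the recorded sequence, which is an assumption of this illustration; how the authors handle the tail is not stated here.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Discounted return R_t for every step of a finite reward sequence."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Sweep backwards so each R_t reuses R_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

With γ = 0.99 a reward received 100 steps later still carries a weight of about 0.37, so most of a 150-step episode contributes to each return.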

In Fig. 2 (Supplementary Movie 1) we visualize the navigation of a trained swimmer along a target direction θ_T, given a substantially different initial orientation, θ_o. The swimmer’s targeted navigation is accomplished in three stages: (1) in the initial phase (blue curve and regime), the swimmer employs “steering” gaits primarily for re-orientation, followed by (2) the “transition” phase (red curve and regime), in which the swimmer continues to adjust its direction while self-propelling, before reaching (3) the “translation” phase (green curve and regime), in which the re-orientation is complete and the swimmer simply self-propels along the target direction. This example illustrates how an AI-powered reconfigurable system evolves a multimodal navigation strategy without being explicitly programmed or relying on any prior knowledge of low-Reynolds-number locomotion. We next analyze the locomotory gaits in each mode in the evolved strategy.

**Fig. 2: Example of target navigation utilizing three distinct locomotory gaits.**

Multimodal locomotory gaits

Here we examine the details of the locomotory gaits acquired by the swimmer for targeted navigation in the steering, transition, and translation modes. We distinguish these gaits by visualizing their configurational changes in the three-dimensional (3D) configuration space of the swimmer (L₁, L₂, θ₃₁) in Fig. 3. Here we utilize an example of a swimmer navigating towards a target direction with ∣θ_d∣ > π/2 to illustrate the switching between different locomotory gaits (Fig. 3a, Supplementary Movies 2 and 3). The swimmer needs to re-orient itself in the counter-clockwise direction in this example; an example for the case of clockwise rotation is included in Supplementary Note 1 (Supplementary Fig. 1, Movies 7 and 8). The dots in Fig. 3a represent configurations at different action steps. The configurations for the steering (blue dots), transition (red dots), and translation (green dots) gaits are clustered in different regions in the configuration space. A representative sequence of configurational changes for each mode of gaits is shown as a solid line to aid visualization (Fig. 3a).

**Fig. 3: Analysis of configurational changes revealing three distinct modes of locomotory gaits.**

We further examine the evolution of L₁, L₂, and θ₃₁ using the representative sequences of configurational changes identified in Fig. 3a for each mode of gaits. For the steering gaits (Fig. 3b, blue lines and Fig. 3d, blue box), the swimmer repeatedly extends and contracts L₂ and θ₃₁, but keeps L₁ constant (the left arm rests in the fully contracted state). The steering gaits thus reside in the L₂−θ₃₁ plane in Fig. 3a (blue line). The large variation in θ₃₁ generates net rotation, substantially re-orienting the swimmer with a relatively small net translation (Fig. 3c). For the transition gaits (Fig. 3b, red lines and Fig. 3d, red box), the swimmer repeatedly extends and contracts all of L₁, L₂ and θ₃₁, leading to significant amounts of both net rotation and translation (Fig. 3c). In the configuration space (Fig. 3a), the transition gaits tilt into the L₁−L₂ plane with an average θ₃₁ less than π (red line). Compared with the steering gaits, the variation of θ₃₁ becomes more restricted (Fig. 3b), resulting in smaller net rotation for fine tuning of the swimmer’s orientation in the transition phase. Finally, for the translation gaits (Fig. 3b, green lines and Fig. 3d, green box), the swimmer’s orientation is aligned with the target direction (θ_d ≈ 0); the swimmer repeatedly extends and contracts L₁ and L₂, while keeping θ₃₁ close to π (i.e., all three spheres of the swimmer are aligned), resembling the swimming gaits of Najafi–Golestanian swimmers^39,51. In the configuration space (Fig. 3a), the translation gaits reside largely in the L₁−L₂ plane with an approximately zero average θ₃₁, generating the maximum net translation with minimal net rotation (Fig. 3c). The details of the gait categorization are summarized under Supplementary methods.

It is noteworthy that the multimodal navigation strategy emerges solely from the AI without relying on prior knowledge of locomotion. The switching between rotation, transition, and translation gaits is analogous to the switching between turning and running modes observed in bacterial locomotion^2,5. These results demonstrate how an AI-powered swimmer, without being explicitly programmed, self-learns complex locomotory gaits from rich action and configuration spaces and undergoes autonomous gait switching in accomplishing targeted navigation.

Performance evaluation

Here we investigate the improvement of the swimmer’s performance with an increased number of training episodes N_e. At the initial stage of training with a small N_e, the swimmer may fail to identify the right sets of locomotory gaits to achieve targeted navigation due to insufficient training. Continued training with an increased number of episodes enables the swimmer to identify better locomotory gaits to complete navigation tasks. Here we measure the improvement of the swimmer’s performance with increased N_e by three locomotion tests: (1) Random target test: the swimmer is assigned a target direction selected randomly from a uniform distribution in [0, 2π]; (2) Rotation test: the swimmer is assigned a target direction with a large angle of difference from the swimmer’s orientation (i.e., θ_d = ±π/2); (3) Translation test: the swimmer is assigned a target direction equal to the swimmer’s orientation (i.e., θ_d = 0). A test is considered to be successful if the swimmer travels along the target direction for a distance of 5 units in 10,000 action steps. These tests ensure that the trained swimmer acquires a set of effective locomotory gaits to swim along any specified direction with robust rotation and translation.
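The success criterion can be checked on a recorded centroid trajectory. The sketch below adopts one possible reading of the criterion, namely that the centroid displacement projected onto the target direction reaches 5 units within 10,000 action steps; the function name and the synthetic trajectory are hypothetical.

```python
import numpy as np

def navigation_succeeded(centroids, theta_T, distance=5.0, max_steps=10_000):
    """True if the projected centroid displacement reaches `distance`."""
    # Keep the initial position plus at most max_steps action steps
    centroids = np.asarray(centroids, dtype=float)[: max_steps + 1]
    direction = np.array([np.cos(theta_T), np.sin(theta_T)])
    progress = (centroids - centroids[0]) @ direction
    return bool(np.any(progress >= distance))

# Synthetic trajectory drifting along +x by 0.001 units per action step
trajectory = np.column_stack([0.001 * np.arange(10_001), np.zeros(10_001)])
ok_forward = navigation_succeeded(trajectory, theta_T=0.0)
ok_backward = navigation_succeeded(trajectory, theta_T=np.pi)
```

A success rate over 100 trials would then be the fraction of trials for which this check returns `True`.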

We consider the success rates of the three tests over 100 trials (Fig. 4). For N_e = 3 × 10⁴, success rates of around 90% are obtained for the three tests. When N_e is increased to 9 × 10⁴, the swimmer masters translation with a 100% success rate but still needs more training for rotation. When N_e is increased further to 15 × 10⁴, the swimmer obtains 100% success rates for all tests. This result demonstrates the continuous improvement in the robustness of targeted navigation with increased N_e up to 15 × 10⁴. As we further increase N_e, we find the relationship between N_e and performance to be non-monotonic. When the total number of training episodes is much greater than 15 × 10⁴, the overall success rate begins to drop and eventually fluctuates around 95%. We selected the trained result at N_e = 15 × 10⁴ for the best overall performance.

**Fig. 4: Analysis of the swimmer’s performance with increasing number of episodes.**

To better understand the swimmer’s training process, we also varied the number of steps in each episode, N_l. For a range from 100 to 300 and a fixed total number of episodes N_e, we found that N_l = 150 provides the most efficient way to balance translation and rotation and requires the fewest action steps to complete both the rotation and translation tests. We remark that, when N_l = 100, the swimmer was only able to translate but not to rotate, indicating the significant role N_l plays in learning.

Lastly, we remark that the swimmer appears to require more training, both in N_e and N_l, to learn rotation compared to translation. This may be attributed to the inherent complexity of rotation gaits, where the swimmer needs to actuate its intermediate angle in addition to the actuation of the two arms required in translation gaits.

Path tracing–"SWIM"

Next we showcase the swimmer’s capability in tracing complex paths in an autonomous manner. To illustrate, the swimmer is tasked to trace out the English word “SWIM” (Fig. 5, Supplementary Movie 4). We note that the hydrodynamic calculations required to design the locomotory gaits to trace such complex paths quickly become intractable as the complexity increases. Here, instead of explicitly programming the gaits of the swimmer, we only select target points (p_i, i = 1, 2, …, 17, red spots in Fig. 5) as landmarks and require the swimmer to navigate towards these landmarks with its own AI, with the target direction at action step t + 1 given by $\theta_{T_{t+1}}=\arg(\mathbf{p}_{i}-\mathbf{r}_{c_{t}})$. The swimmer is assigned the next target point p_i+1 when its centroid is within a certain threshold (0.1 of the fully extended arm length) from p_i. The sequential completion of these navigation tasks enables the swimmer to successfully trace out the word “SWIM” with high accuracy (Fig. 5, Supplementary Movie 4). In accomplishing this task, the swimmer switches between the three modes of locomotory gaits autonomously to swim towards individual target points and turn around the corners of the path based on the AI-powered navigation strategy. It is noteworthy that the swimmer is able to navigate around some corners (e.g., at target points 4 and 6) without activating the steering gaits, which are employed for corners with more acute angles (e.g., at target points 8, 14, and 16). While past approaches based on detailed hydrodynamic calculations, manual interventions, or other control methods may also complete such tasks, here we present reinforcement learning as an alternative approach in accomplishing these complex maneuvers in a more autonomous manner.
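The landmark-following rule described above can be sketched as a small helper that returns the current target direction and advances to the next landmark once the centroid is within the threshold. The function name and the two sample landmarks are hypothetical; the actual 17 landmark coordinates are not given in the text.

```python
import numpy as np

def next_target(rc, waypoints, i, threshold=0.1):
    """Return (theta_T, i) for centroid rc and current landmark index i."""
    rc = np.asarray(rc, dtype=float)
    waypoints = np.asarray(waypoints, dtype=float)
    # Move on to the next landmark once the current one is reached
    if np.linalg.norm(waypoints[i] - rc) < threshold and i + 1 < len(waypoints):
        i += 1
    d = waypoints[i] - rc
    return float(np.arctan2(d[1], d[0])), i   # theta_T = arg(p_i - r_c)

waypoints = [[1.0, 0.0], [1.0, 1.0]]          # assumed sample landmarks
theta_far, i_far = next_target((0.0, 0.0), waypoints, 0)
theta_near, i_near = next_target((0.95, 0.0), waypoints, 0)
```

Only the target direction is handed to the trained policy; the gaits used to reach each landmark are chosen by the swimmer itself.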

**Fig. 5: Demonstration of complex navigation capability of Artificial Intelligence powered swimmer.**

Robustness against flows

Last, we examine the performance of targeted navigation under the influence of flows (Fig. 6a, b, Supplementary Movies 5, 6). In particular, to determine to what extent the AI-powered swimmer is capable of maintaining its target direction against flow perturbations, we use the same AI-powered swimmer trained without any background flow, and impose a rotational flow generated by a rotlet at the origin^47,48, u_∞ = −γ × r/r³, where γ = γe_z prescribes the strength of the rotlet in the z-direction and r = ∣r∣ is the magnitude of the position vector r from the origin (see the section “Simulations of background flow” under Supplementary methods). Here the AI-powered swimmer is tasked to navigate towards the positive x-direction under flow perturbations due to the rotlet. We examine how the swimmer adapts to the background flow when performing this task. For comparison, we contrast the resulting motion of the AI-powered swimmer with that of an untrained swimmer (i.e., a Najafi–Golestanian (NG) swimmer that performs only fixed locomotory gaits without any adaptivity³⁹). Without the background flow, both swimmers self-propel with the same speed. Both swimmers are initially placed close to the rotlet with r_c = −5e_x and we sample their performance with three different initial orientations: $\theta_{o_{0}}=-\pi/3$, 0, and π/3, under different flow strengths. Under a relatively weak flow (γ = 0.15, Fig. 6a, Supplementary Movie 5), the AI-powered swimmer is capable of navigating towards the positive x-direction against flow perturbations regardless of its initial orientation. In contrast, the trajectories of the NG swimmer are largely determined passively by the rotlet flow, depending on the initial orientation of the swimmer. For an increased flow strength (γ = 1.5, Fig. 6b, Supplementary Movie 6), the NG swimmer completely loses control of its direction and is scattered by the rotlet into different directions, again due to the absence of any adaptivity. Under such a strong flow, the AI-powered swimmer initially circulates around the rotlet but eventually manages to escape from it, navigating to the positive x-direction successfully with similar trajectories for all initial orientations. We note that the vorticity experienced by the swimmer in this case is comparable with typical re-orientation rates of the AI-powered swimmer. We also remark that when navigating under flow perturbations, the AI-powered swimmer adopts the transition gaits to constantly re-orient itself towards the positive x-direction and eventually self-propels along that direction. These results showcase the AI-powered swimmer’s capability in adapting its locomotory gaits to navigate robustly against flows.
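For planar positions, the rotlet flow u_∞ = −γ × r/r³ with γ = γe_z reduces to u_∞ = γ(y, −x)/r³, since e_z × (x, y) = (−y, x). The sketch below evaluates this expression; the function name is hypothetical, and how the flow enters the swimmer's equations of motion is described in the Supplementary methods and is not reproduced here.

```python
import numpy as np

def rotlet_velocity(pos, gamma):
    """Planar rotlet flow u = -gamma e_z x r / r^3 at position pos."""
    x, y = np.asarray(pos, dtype=float)
    r = np.hypot(x, y)
    # e_z x (x, y) = (-y, x), so the minus sign gives (y, -x)
    return gamma * np.array([y, -x]) / r**3

u_unit = rotlet_velocity((1.0, 0.0), gamma=1.0)
u_start = rotlet_velocity((-5.0, 0.0), gamma=1.5)   # initial centroid r_c = -5 e_x
```

The flow speed decays as 1/r², so at the initial position r_c = −5e_x the strong rotlet (γ = 1.5) moves the fluid at a dimensionless speed of 0.06.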

**Fig. 6: Analysis of the performance of targeted navigation under the influence of flows.**

Conclusions

In this work, we present a deep RL approach to enable navigation of an artificial microswimmer via gait switching advised by the AI. In contrast to previous works that considered active particles with prescribed self-propelling velocities as minimal models^32,34,35 or simple one-dimensional swimmers^37,38,46, here we demonstrate how a reconfigurable system can learn complex locomotory gaits from rich and continuous action spaces to perform sophisticated maneuvers. Through RL, the swimmer develops distinct locomotory gaits for a multimodal (i.e., steering, transition, and translation) navigation strategy. The AI-powered swimmer can adapt its locomotory gaits in an autonomous manner to navigate towards arbitrary directions. Furthermore, we show that the swimmer can navigate robustly under the influence of flows and trace convoluted paths. Instead of explicitly programming a swimmer to perform these tasks in the traditional approach, the swimmer is advised by the AI to perform complex locomotory gaits and autonomous gait switching in accomplishing these navigation tasks. The multimodal strategy employed by the AI-powered swimmer is reminiscent of the run-and-tumble in bacteria^2,5. Taken together, our results showcase the vast potential of this deep RL approach in realizing adaptivity similar to that of biological organisms for robust locomotive capabilities. Such adaptive behaviors are crucial for future biomedical applications of artificial microswimmers in complex media with uncontrolled and/or unpredictable environmental factors.

We finally discuss several possibilities for subsequent investigations based on this deep RL approach. While we demonstrate only planar motion in this work, the approach can be readily extended to three-dimensional navigation by allowing out-of-plane rotation of the swimmer’s arms with expanded observation and action spaces for the additional degrees of freedom. Moreover, the deep RL framework is not tied to any specific swimmer; a simple multi-sphere system is used in this work for illustration, and the same framework applies to other reconfigurable systems. We also remark that the AI-powered swimmer is able to overcome some influences of flows even though such flows were absent in the training. Subsequent investigations including the flow perturbation in the training may lead to even more powerful AI that could exploit the flows to further enhance the navigation strategies. Another practical aspect to consider is the effect of Brownian noise^52,53,54. Specifically, the characterization of the effect of thermal fluctuations in both the training process of the swimmer and its resulting navigation performance is currently underway. In addition to flow and thermal fluctuations, other environmental factors, including the presence of physical boundaries and obstacles, may be addressed in a similar manner in future studies. The deep RL approach here opens an alternative path towards designing adaptive microswimmers with robust locomotive and navigation capabilities in more complex, realistic environments.

Methods

Here we briefly explain the Proximal Policy Optimization (PPO) algorithm we used to train our AI-powered swimmer.

In the PPO algorithm, the agent’s motion control is managed by a neural network with an Actor-Critic structure. The Actor network can be considered as a stochastic control policy π_θ(a_t∣o_t), which generates an action a_t given an observation o_t following a Gaussian distribution. Here θ represents all the parameters of the Actor neural network. The Critic network is used to compute the value function V_ϕ by assuming the agent starts at an observation o and acts according to a particular policy π_θ. The parameters of the Critic network are denoted by ϕ.
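Sampling from a Gaussian policy and recording the sampling probability can be sketched as below. The sketch assumes a diagonal Gaussian over the three action components with a given mean and standard deviation; in the actual framework the mean would be produced by the Actor network, and the function names and numerical values here are hypothetical.

```python
import numpy as np

def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian evaluated at `action`."""
    z = (action - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi)))

def sample_action(mean, std, rng):
    """Draw an action a_t and return it with its log-probability."""
    action = rng.normal(mean, std)
    return action, gaussian_log_prob(action, mean, std)

rng = np.random.default_rng(0)
mean = np.zeros(3)            # stand-in for the Actor output (assumed)
std = np.full(3, 0.5)         # assumed exploration noise
action, logp = sample_action(mean, std, rng)
```

Storing the log-probability at sampling time is what later allows the probability ratio between the updated and the old policy to be formed.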

To effectively train the swimmer, we divide the total training process into episodes. Each episode can be considered as one round, which terminates after a fixed number of training steps (N_l = 150). To ensure full exploration of the observation space, we randomly initialize the swimmer’s geometric configuration (L₁, L₂, θ₁, θ₂) and the target direction (θ_T) at the beginning of each episode.

At time t, the agent receives its current observation o_t and samples an action a_t based on the policy π_θ. Given a_t, the swimmer interacts with its surroundings, and the next state s_t+1 and reward r_t are calculated. The next observation o_t+1 extracted from s_t+1 is sent to the agent for the next iteration. All the observations, actions, rewards and sampling probabilities are stored for the agent’s update. The update process begins after running a fixed number of episodes, N_E = 20 (the total number of training steps per update is therefore N = N_E × N_l = 3000). The goal of the update is to optimize θ so that the expected long-term reward J(π_θ) = E[R_t=0∣π_θ] is maximized.

The expectation is taken with respect to each running episode, τ. Here, we use the infinite-horizon discounted returns $R_{t}=\sum_{t'=t}^{\infty}\gamma^{t'-t}r_{t'}$, where γ is the discount factor measuring the greediness of the algorithm. We set γ = 0.99 to ensure its farsightedness. To solve this optimization problem, we use the typical policy gradient approach, which estimates ∇_θJ(π_θ). More specifically, we implemented the clipped advantage PPO algorithm to avoid large changes in each gradient update. We estimated the surrogate objective J(π_θ) by clipping the probability ratio r(θ) times the advantage function $\hat{A}_{t}$. The probability ratio measures the probability of selecting an action under the current policy relative to the old policy ($r(\theta)=\frac{\pi_{\theta}(a|o)_{N\times 1}}{\pi_{\theta_{\rm old}}(a|o)_{N\times 1}}$). The advantage function $\hat{A}_{t}$ describes the relative advantage of taking an action a based on an observation o over a randomly selected action and is calculated by subtracting the value function V_N×1 from the discounted return R_N×1 ($\hat{A}_{t}=R_{N\times 1}-V_{N\times 1}$).
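The clipped surrogate objective can be sketched in a few lines. The sketch assumes the sampling probabilities are stored as log-probabilities and uses ϵ = 0.2 as a placeholder; the text only states that ϵ is a constant, so this value and the function name are assumptions.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, returns, values, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    advantage = returns - values                 # A = R - V
    ratio = np.exp(logp_new - logp_old)          # r(theta)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))

# Two samples: ratio 1.5 with positive advantage, ratio 0.5 with negative advantage
logp_new = np.log(np.array([1.5, 0.5]))
logp_old = np.zeros(2)
l_clip = clipped_surrogate(logp_new, logp_old,
                           returns=np.array([1.0, -1.0]),
                           values=np.zeros(2))
```

In this example the first term is capped at 1.2 and the second at −0.8, so the policy gains nothing from moving the ratio further outside the interval [1 − ϵ, 1 + ϵ].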

We then update the parameters θ, ϕ via a standard gradient-based optimizer, Adam. The full details of our implementation are included in Algorithms 1 and 2 below. Here, K is the total number of epochs, N_l is the number of steps in one episode, and N is the total number of steps for each update. The PPO algorithm uses fixed-length trajectory segments τ. During each iteration, each of N_A parallel actors collects T time steps of data; we then construct the surrogate loss on these N_AT time steps of data and optimize it with Adam for K epochs.

In the following we present the algorithm tables for the PPO algorithm employed in this work. We refer the readers to classical monographs for more details⁴⁵.

Algorithm 1

Environment

1:	for time step t = 0, 1, … do
2:	  if mod(t, N_l) = 0 then
3:	    Reset state s_t
4:	    Compute observation o_t
5:	  end if
6:	  Sample action a_t from policy π_θ
7:	  Evaluate the next state s_{t+1} and reward r_t following the swimmer’s hydrodynamics
8:	  Compute the next observation o_{t+1} from state s_{t+1}
9:	  if t = 0 or mod(t, N) ≠ 0 then
10:	    Append observation o_{t+1}, action a_t, reward r_t and action sampling probability π_θ(a_t∣o_t) to the observation list o_N×5, action list a_N×3, reward list R_N×1 and action sampling probability list $\pi_{\theta_{\mathrm{old}}}(a|o)_{N\times 1}$
11:	  else
12:	    Update the Agent using Algorithm 2
13:	  end if
14:	end for
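The data-collection loop of Algorithm 1 can be sketched in Python as below. Everything about the environment here is a hypothetical stand-in: a one-dimensional toy state replaces the swimmer's hydrodynamics, the policy is uniform over two actions, and the sketch stores the observation each action was sampled from. The agent update is left to the caller once the buffer of N transitions is full.

```python
import random


def reset(rng):
    # Hypothetical toy state: a 1D position drawn near the origin.
    return rng.uniform(-1.0, 1.0)


def step(state, action):
    # Toy dynamics standing in for the swimmer's hydrodynamics.
    next_state = state + 0.1 * action
    reward = next_state - state  # displacement along +x
    return next_state, reward


def observe(state):
    return (state,)


def random_policy(obs, rng):
    # Uniform policy over two actions; returns the action and its probability.
    return rng.choice((-1, 1)), 0.5


def collect_rollout(policy, n_total, n_episode, seed=0):
    """Collect n_total transitions, resetting the state every n_episode steps."""
    rng = random.Random(seed)
    buffer = {"obs": [], "act": [], "rew": [], "prob": []}
    resets = 0
    state = None
    for t in range(n_total):
        if t % n_episode == 0:  # start of a new episode
            state = reset(rng)
            resets += 1
        obs = observe(state)
        action, prob = policy(obs, rng)      # sample a_t and record pi(a_t | o_t)
        state, reward = step(state, action)  # advance the toy environment
        buffer["obs"].append(obs)
        buffer["act"].append(action)
        buffer["rew"].append(reward)
        buffer["prob"].append(prob)
    return buffer, resets


if __name__ == "__main__":
    buffer, resets = collect_rollout(random_policy, n_total=10, n_episode=4)
    print(len(buffer["rew"]), resets)
```

The stored probabilities play the role of the old-policy probabilities in the ratio r(θ) during the subsequent update.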

Algorithm 2

Proximal Policy Optimization, Actor-Critic, Update the Agent

1:	Input: Initial policy parameter θ, initial value function parameter ϕ
2:	for k = 0, 1, 2, …, K do
3:	  Compute infinite-horizon discounted returns R_N×1
4:	  Evaluate expected returns V_N×1 using observations o_N×5 and value function V_ϕ
5:	  Compute the advantage function: $\hat{A}_{t}=R_{N\times 1}-V_{N\times 1}$
6:	  Evaluate the probability for policy π_θ using observations o_N×5 and actions a_N×3, and store it in π_θ(a∣o)_N×1
7:	  Compute the probability ratio: $r(\theta)=\frac{\pi_{\theta}(a|o)_{N\times 1}}{\pi_{\theta_{\mathrm{old}}}(a|o)_{N\times 1}}$
8:	  Compute the clipped surrogate loss function: $L^{\mathrm{CLIP}}(\theta)=\mathbb{E}[\min(r(\theta)\hat{A}_{t},\,\mathrm{clip}(r(\theta),1-\epsilon,1+\epsilon)\hat{A}_{t})]$, with constant ϵ
9:	  Compute the value-function loss: $L^{\mathrm{VF}}(\phi)=\frac{1}{2}\mathbb{E}[(R_{N\times 1}-V_{N\times 1})^{2}]$
10:	  Compute the entropy loss: L^S = αS[π_θ], with constant α
11:	  Compute the total loss: L(θ, ϕ) = −L^CLIP(θ) + L^VF(ϕ) − L^S
12:	  Optimize surrogate L with respect to (θ, ϕ), with K epochs and minibatch size M ≤ N_A T, where N_A is the number of parallel actors and T is the number of time steps collected by each actor
13:	  θ_old ← θ, ϕ_old ← ϕ
14:	end for
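The loss terms in steps 5 to 11 of Algorithm 2 can be evaluated on plain Python lists as in the sketch below. The defaults ϵ = 0.2 and α = 0.01 are assumed placeholder values, since the tables only state that both are constants. The entropy S[π_θ] is passed in as a number rather than computed from a policy network, and the name `ppo_losses` is hypothetical.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_losses(new_probs, old_probs, returns, values, entropy, eps=0.2, alpha=0.01):
    """Return (L_CLIP, L_VF, L_S, total) following steps 5-11 of Algorithm 2.

    Assumptions: eps and alpha are placeholder constants; entropy is supplied by the caller.
    """
    n = len(returns)
    adv = [r - v for r, v in zip(returns, values)]          # A_t = R_t - V_t
    ratios = [p / q for p, q in zip(new_probs, old_probs)]  # r(theta)
    # Clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    l_clip = sum(
        min(r * a, clip(r, 1.0 - eps, 1.0 + eps) * a) for r, a in zip(ratios, adv)
    ) / n
    l_vf = 0.5 * sum(a * a for a in adv) / n  # value-function loss
    l_s = alpha * entropy                     # entropy term
    total = -l_clip + l_vf - l_s
    return l_clip, l_vf, l_s, total


if __name__ == "__main__":
    print(ppo_losses([0.6, 0.2], [0.4, 0.4], [1.0, 0.0], [0.0, 1.0], entropy=0.0))
```

In a full implementation the total loss would be differentiated with respect to (θ, ϕ) by an automatic-differentiation library and passed to Adam; this sketch only evaluates the scalar values.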

Data availability

The data and Supplementary movies 1–8 that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability

The codes that support the findings of this study are available from the corresponding author upon reasonable request.

References

Lauga, E. & Powers, T. R. The hydrodynamics of swimming microorganisms. Rep. Prog. Phys. 72, 096601 (2009).
Berg, H. C. & Brown, D. A. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239, 500–504 (1972).
Stocker, R., Seymour, J. R., Samadani, A., Hunt, D. E. & Polz, M. F. Rapid chemotactic response enables marine bacteria to exploit ephemeral microscale nutrient patches. Proc. Natl Acad. Sci. USA 105, 4209–4214 (2008).
Xie, L., Altindal, T., Chattopadhyay, S. & Wu, X.-L. Bacterial flagellum as a propeller and as a rudder for efficient chemotaxis. Proc. Natl Acad. Sci. USA 108, 2246–2251 (2011).
Ipiña, E. P., Otte, S., Pontier-Bres, R., Czerucka, D. & Peruani, F. Bacteria display optimal transport near surfaces. Nat. Phys. 15, 610–615 (2019).
Wan, K. Y. & Goldstein, R. E. Time irreversibility and criticality in the motility of a flagellate microorganism. Phys. Rev. Lett. 121, 058103 (2018).
Tsang, A. C. H., Lam, A. T. & Riedel-Kruse, I. H. Polygonal motion and adaptable phototaxis via flagellar beat switching in the microswimmer Euglena gracilis. Nat. Phys. 14, 1216–1222 (2018).
Gao, W. et al. Cargo-towing fuel-free magnetic nanoswimmers for targeted drug delivery. Small 8, 460–467 (2012).
Zhang, L. et al. Characterizing the swimming properties of artificial bacterial flagella. Nano Lett. 9, 3663–3667 (2009).
Ghosh, A. & Fischer, P. Controlled propulsion of artificial magnetic nanostructured propellers. Nano Lett. 9, 2243–2245 (2009).
Ceylan, H. et al. 3D-printed biodegradable microswimmer for theranostic cargo delivery and release. ACS Nano 13, 3353–3362 (2019).
Huang, T.-Y. et al. 3D printed microtransporters: compound micromachines for spatiotemporally controlled delivery of therapeutic agents. Adv. Mater. 27, 6644–6650 (2015).
Nassif, X., Bourdoulous, S., Eugène, E. & Couraud, P.-O. How do extracellular pathogens cross the blood–brain barrier? Trends Microbiol. 10, 227–232 (2002).
Celli, J. P. et al. Helicobacter pylori moves through mucus by reducing mucin viscoelasticity. Proc. Natl Acad. Sci. USA 106, 14321–14326 (2009).
Mirbagheri, S. A. & Fu, H. C. Helicobacter pylori couples motility and diffusion to actively create a heterogeneous complex medium in gastric mucus. Phys. Rev. Lett. 116, 198101 (2016).
Purcell, E. M. Life at low Reynolds number. Am. J. Phys. 45, 3–11 (1977).
Hu, W., Lum, G. Z., Mastrangeli, M. & Sitti, M. Small-scale soft-bodied robot with multimodal locomotion. Nature 554, 81–85 (2018).
Ohm, C., Brehmer, M. & Zentel, R. Liquid crystalline elastomers as actuators and sensors. Adv. Mater. 22, 3366–3387 (2010).
Dai, B. et al. Programmable artificial phototactic microswimmer. Nat. Nanotechnol. 11, 1087–1092 (2016).
Palagi, S. et al. Structured light enables biomimetic swimming and versatile locomotion of photoresponsive soft microrobots. Nat. Mater. 15, 647 (2016).
von Rohr, A., Trimpe, S., Marco, A., Fischer, P. & Palagi, S. Gait learning for soft microrobots controlled by light fields. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 6199–6206 (IEEE, 2018).
Huang, H.-W., Sakar, M. S., Petruska, A. J., Pané, S. & Nelson, B. J. Soft micromachines with programmable motility and morphology. Nat. Commun. 7, 12263 (2016).
Huang, H.-W. et al. Adaptive locomotion of artificial microswimmers. Sci. Adv. 5, eaau1532 (2019).
Reddy, G., Celani, A., Sejnowski, T. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113, E4877–E4884 (2016).
Reddy, G., Wong-Ng, J., Celani, A., Sejnowski, T. J. & Vergassola, M. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018).
Gazzola, M., Tchieu, A. A., Alexeev, D., de Brauer, A. & Koumoutsakos, P. Learning to school in the presence of hydrodynamic interactions. J. Fluid Mech. 789, 726–749 (2016).
Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. Chaos 29, 103138 (2019).
Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. USA 115, 5849–5854 (2018).
Jiao, Y. et al. Learning to swim in potential flow. Phys. Rev. Fluids 6, 050505 (2021).
Cichos, F., Gustavsson, K., Mehlig, B. & Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2, 94–103 (2020).
Tsang, A. C. H., Demir, E., Ding, Y. & Pak, O. S. Roads to smart artificial microswimmers. Adv. Intell. Syst. 2, 1900137 (2020).
Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004 (2017).
Alageshan, J. K., Verma, A. K., Bec, J. & Pandit, R. Machine learning strategies for path-planning microswimmers in turbulent flows. Phys. Rev. E 101, 043110 (2020).
Schneider, E. & Stark, H. Optimal steering of a smart active particle. Europhys. Lett. 127, 64003 (2019).
Muiños-Landin, S., Fischer, A., Holubec, V. & Cichos, F. Reinforcement learning with artificial microswimmers. Sci. Robot. 6, eabd9285 (2021).
Yang, Y., Bevan, M. A. & Li, B. Micro/nano motor navigation and localization via deep reinforcement learning. Adv. Theory Simul. 3, 2000034 (2020).
Tsang, A. C. H., Tong, P. W., Nallan, S. & Pak, O. S. Self-learning how to swim at low Reynolds number. Phys. Rev. Fluids 5, 074101 (2020).
Hartl, B., Hübl, M., Kahl, G. & Zöttl, A. Microswimmers learning chemotaxis with genetic algorithms. Proc. Natl Acad. Sci. USA 118, e2019683118 (2021).
Najafi, A. & Golestanian, R. Simple swimmer at low Reynolds number: three linked spheres. Phys. Rev. E 69, 062901 (2004).
Ledesma-Aguilar, R., Löwen, H. & Yeomans, J. A circle swimmer at low Reynolds number. Eur. Phys. J. E 35, 1–9 (2012).
Avron, J. E., Kenneth, O. & Oaknin, D. H. Pushmepullyou: an efficient micro-swimmer. New J. Phys. 7, 234 (2005).
Golestanian, R. & Ajdari, A. Stochastic low Reynolds number swimmers. J. Phys. Condens. Matter 21, 204104 (2009).
Alouges, F., DeSimone, A., Giraldi, L. & Zoppello, M. Self-propulsion of slender micro-swimmers by curvature control: N-link swimmers. Int. J. Nonlinear Mech. 56, 132–141 (2013).
Wang, Q. Optimal strokes of low Reynolds number linked-sphere swimmers. Appl. Sci. 9, 4023 (2019).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at arXiv:1707.06347 (2017).
Liu, Y., Zou, Z., Tsang, A. C. H., Pak, O. S. & Young, Y.-N. Mechanical rotation at low Reynolds number via reinforcement learning. Phys. Fluids 33, 062007 (2021).
Happel, J. & Brenner, H. Low Reynolds Number Hydrodynamics: with Special Applications to Particulate Media (Noordhoff International Publishing, 1973).
Kim, S. & Karrila, S. J. Microhydrodynamics: Principles and Selected Applications (Dover, New York, 2005).
Dhont, J. An Introduction to Dynamics of Colloids (Elsevier, 1996).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998).
Golestanian, R. & Ajdari, A. Analytic results for the three-sphere swimmer at low Reynolds number. Phys. Rev. E 77, 036308 (2008).
Howse, J. R. et al. Self-motile colloidal particles: From directed propulsion to random walk. Phys. Rev. Lett. 99, 048102 (2007).
Lobaskin, V., Lobaskin, D. & Kulić, I. M. Brownian dynamics of a microswimmer. Eur. Phys. J.: Spec. Top. 157, 149–156 (2008).
Dunkel, J. & Zaid, I. M. Noisy swimming at low Reynolds numbers. Phys. Rev. E 80, 021903 (2009).


Acknowledgements

Funding support by the National Science Foundation (Grant Nos. 1830958 and 1931292 to O.S.P. and Grant Nos. 1614863 and 1951600 to Y.-N.Y.) is gratefully acknowledged. Y.-N.Y. acknowledges support from Flatiron Institute, part of Simons Foundation. A.C.H.T. acknowledges funding support from the Croucher Foundation. Z.Z. and O.S.P. acknowledge the use of computational resources at the WAVE computing facility (enabled by the E.L. Wiegand Foundation) at Santa Clara University. We also thank Yi Fang for useful discussion.

Author information

These authors contributed equally: Zonghao Zou, Yuexin Liu.

Authors and Affiliations

Department of Mechanical Engineering, Santa Clara University, Santa Clara, CA, 95053, USA
Zonghao Zou & On Shun Pak
Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, 07102, USA
Yuexin Liu & Y.-N. Young
Department of Mechanical Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
Alan C. H. Tsang

Authors

Zonghao Zou
Yuexin Liu
Y.-N. Young
On Shun Pak
Alan C. H. Tsang

Contributions

Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. designed research; Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. performed research; Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. analyzed data; and Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. wrote the paper.

Corresponding authors

Correspondence to Y.-N. Young, On Shun Pak or Alan C. H. Tsang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Physics thanks Giovanni Volpe and the other, anonymous, reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Movie 1

Supplementary Movie 2

Supplementary Movie 3

Supplementary Movie 4

Supplementary Movie 5

Supplementary Movie 6

Supplementary Movie 7

Supplementary Movie 8

Supplementary material

Description of Additional Supplementary Files

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Zou, Z., Liu, Y., Young, YN. et al. Gait switching and targeted navigation of microswimmers via deep reinforcement learning. Commun Phys 5, 158 (2022). https://doi.org/10.1038/s42005-022-00935-x


Received: 07 February 2022
Accepted: 31 May 2022
Published: 21 June 2022
DOI: https://doi.org/10.1038/s42005-022-00935-x

