Abstract
Swimming microorganisms switch between locomotory gaits to enable complex navigation strategies, such as run-and-tumble, as they explore their environments and search for specific targets. This ability of targeted navigation via adaptive gait-switching is particularly desirable for the development of smart artificial microswimmers that can perform complex biomedical tasks, such as targeted drug delivery and microsurgery, in an autonomous manner. Here we use a deep reinforcement learning approach to enable a model microswimmer to self-learn effective locomotory gaits for translation, rotation, and combined motions. The Artificial Intelligence (AI)-powered swimmer can switch between various locomotory gaits adaptively to navigate towards target locations. The multimodal navigation strategy is reminiscent of gait-switching behaviors adopted by swimming microorganisms. We show that the strategy advised by AI is robust to flow perturbations and versatile, enabling the swimmer to perform complex tasks such as path tracing without being explicitly programmed. Taken together, our results demonstrate the vast potential of these AI-powered swimmers for applications in unpredictable, complex fluid environments.
Introduction
Swimming microorganisms have evolved versatile navigation strategies by switching their locomotory gaits in response to their surroundings1. Their navigation strategies typically involve switching between translation and rotation modes such as run-and-tumble and reverse-and-flick in bacteria2,3,4,5, as well as run-stop-shock and run-and-spin in eukaryotes6,7. Such an adaptive, multimodal gait-switching ability is particularly desirable for biomedical applications of artificial microswimmers such as targeted drug delivery and microsurgery8,9,10,11,12, which require navigation towards target locations in biological media with uncontrolled and/or unpredictable environmental factors13,14,15.
Pioneering works by Purcell and subsequent studies demonstrated how simple reconfigurable systems with ingenious locomotory gaits can generate net translation and rotation, given the stringent constraints for locomotion at low Reynolds numbers16. Yet, the design of locomotory gaits becomes increasingly intractable when more sophisticated maneuvers are required or environmental perturbations are present. Existing microswimmers are therefore typically designed with fixed locomotory gaits and rely on manual interventions for navigation8,17,18,19,20,21. It remains an unresolved challenge to develop microswimmers with adaptive locomotory strategies, similar to those of biological cells, that can navigate complex environments autonomously. Modular microrobotics and the use of soft active materials22,23 have been proposed to address this challenge.
More recently, the rapid development of artificial intelligence (AI) and its applications in locomotion problems24,25,26,27,28,29 have opened different paths towards designing the next generation of smart microswimmers30,31. Various machine learning approaches have enabled the navigation of active particles in the presence of background flows32,33, thermal fluctuations34,35, and obstacles36. In these minimal models, the microswimmers are often treated as active particles with prescribed self-propelling velocities and certain degrees of freedom for speed variation and re-orientation. However, the complex adjustments in locomotory gaits required for such adaptations are typically not accounted for. Recent studies have begun to examine how different machine learning techniques enable reconfigurable microswimmers to evolve effective gaits for self-propulsion37 and chemotactic response38.
Here, we combine reinforcement learning (RL) with an artificial neural network to enable a simple reconfigurable system to perform complex maneuvers in a low-Reynolds-number environment. We show that the deep RL framework empowers a microswimmer to adapt its locomotory gaits in accomplishing sophisticated tasks, including targeted navigation and path tracing, without being explicitly programmed. The multimodal gait-switching strategies are reminiscent of those adopted by swimming microorganisms. Furthermore, we examine the performance of these locomotion strategies against perturbations by background flows. The results showcase the versatility of AI-powered swimmers and their robustness in media with uncontrolled environmental factors.
Results and discussion
Model reconfigurable system
We consider a simple reconfigurable system consisting of three spheres with radius R and centers ri (i = 1, 2, 3) connected by two arms with variable lengths and orientations as shown in Fig. 1a. This setup generalizes previous swimmer models proposed by Najafi and Golestanian39 and Ledesma-Aguilar et al. 40 by allowing more degrees of freedom. The interaction between the system and the surrounding viscous fluid is modeled by low Reynolds number hydrodynamics, imposing stringent constraints on the locomotive capability of the system. Unlike the traditional paradigm where the locomotory gaits are prescribed in advance39,40,41,42,43,44, here we exploit a deep RL framework to enable the system to self-learn a set of locomotory gaits to swim along a target direction, θT. We employ a deep neural network based on the Actor-Critic structure and implement the Proximal Policy Optimization (PPO) algorithm29,45 to train and update the agent (i.e., AI) in charge of the decision making process (Fig. 1b). The deep RL framework here extends previous studies from discrete action spaces to continuous action spaces32,35,37,46, enhancing the swimmer’s capability in developing more versatile locomotory gaits for complex navigation tasks (see the “Methods” section for implementation details of the Actor-Critic neural network and PPO algorithm).
Hydrodynamic interactions
The interaction between the spheres and their surrounding fluid is governed by the Stokes equation (∇p = μ∇²u, ∇ ⋅ u = 0). Here, p, μ and u represent, respectively, the pressure, dynamic viscosity, and velocity field. In this low Reynolds number regime, the velocities of the spheres Vi and the forces Fi acting on them can be related linearly as

$$\mathbf{V}_i = \sum_{j} \mathbf{G}_{ij} \cdot \mathbf{F}_j, \qquad (1)$$

where Gij is the Oseen tensor47,48,49 given by

$$\mathbf{G}_{ij} = \begin{cases} \dfrac{\mathbf{I}}{6\pi\mu R}, & i = j,\\[6pt] \dfrac{1}{8\pi\mu|\mathbf{r}_i - \mathbf{r}_j|}\left(\mathbf{I} + \hat{\mathbf{r}}_{ij}\hat{\mathbf{r}}_{ij}\right), & i \ne j. \end{cases} \qquad (2)$$
Here, I is the identity matrix and \(\hat{\mathbf{r}}_{ij}=(\mathbf{r}_i-\mathbf{r}_j)/|\mathbf{r}_i-\mathbf{r}_j|\) denotes the unit vector between spheres i and j. The torque acting on sphere i is calculated by Ti = ri × Fi. The rates of actuation of the arm lengths \(\dot{L}_1\), \(\dot{L}_2\) and the intermediate angle \(\dot{\theta}_{31}\) can be expressed in terms of the velocities of the spheres Vi. The kinematics of the swimmer is fully determined upon applying the force-free (∑iFi = 0) and torque-free (∑iTi = 0) conditions. The Oseen tensor hydrodynamic description is valid when the spheres are not in close proximity (R ≪ L). We therefore constrain the arm and angle contractions such that 0.6L ≤ L1, L2 ≤ L and 2π/3 ≤ θ31 ≤ 4π/3.
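For concreteness, the following is a minimal numpy sketch of the mobility relation in Eqs. (1) and (2); the function names and the default values of μ and R are illustrative assumptions, not the published implementation.

```python
import numpy as np

def oseen_tensor(r_i, r_j, mu=1.0, R=0.1):
    """Pairwise mobility block G_ij of Eq. (2): Stokes self-mobility for i = j,
    Oseen interaction for i != j (valid when |r_i - r_j| >> R)."""
    d = np.asarray(r_i, float) - np.asarray(r_j, float)
    dist = np.linalg.norm(d)
    if dist == 0.0:  # i == j: drag of an isolated sphere
        return np.eye(3) / (6.0 * np.pi * mu * R)
    r_hat = d / dist
    return (np.eye(3) + np.outer(r_hat, r_hat)) / (8.0 * np.pi * mu * dist)

def sphere_velocities(positions, forces, mu=1.0, R=0.1):
    """Eq. (1): velocity of each sphere, V_i = sum_j G_ij . F_j."""
    return np.array([
        sum(oseen_tensor(ri, rj, mu, R) @ Fj for rj, Fj in zip(positions, forces))
        for ri in positions
    ])
```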
The actuation rate of the arm lengths \({\dot{L}}_{1},{\dot{L}}_{2}\) can be expressed in terms of the relative velocities of the spheres parallel to the arm orientations:
The actuation rate of the intermediate angle \({\dot{\theta }}_{31}\) can be expressed in terms of the relative velocities of the spheres perpendicular to the arm orientations:
where \({\dot{\theta }}_{1}\) and \({\dot{\theta }}_{2}\) are the arm rotation speeds. Together with the Oseen tensor description of the hydrodynamic interaction between the spheres, Eqs. (1) and (2) in the main text, and the overall force-free and torque-free conditions, the kinematics of the swimmer is fully determined.
In presenting our results, we scale lengths by the fully extended arm length L, velocities by a characteristic actuation rate of the arm Vc, and hence time by L/Vc and forces by μLVc (see Non-dimensionalization under Supplementary methods).
Targeted navigation
We first use the deep RL framework to train the model system to swim along a target direction θT, given an arbitrary initial orientation θo of the swimmer. The swimmer's orientation is defined based on the relative position between the swimmer's centroid rc = ∑iri/3 and r1 as \(\theta_{\rm o}=\arg(\mathbf{r}_{\rm c}-\mathbf{r}_1)\) (Fig. 1).
In the RL algorithm, the state s ∈ (r1, L1, L2, θ1, θ2) of the system is specified by the sphere center r1, arm lengths L1, L2, and arm orientations θ1, θ2. The observation \(o\in (L_1,L_2,\theta_{31},\cos\theta_{\rm d},\sin\theta_{\rm d})\) is extracted from the state, where θ31 is the intermediate angle and θd = θT−θo is the difference between the target direction θT and the swimmer's orientation θo; note that the angle difference is expressed in terms of \((\cos\theta_{\rm d},\sin\theta_{\rm d})\) to avoid discontinuity in the orientation space. The AI decides the swimmer's next action based on the observation using the Actor neural network: for each action step Δt, the swimmer performs an action \(a\in (\dot{L}_1,\dot{L}_2,\dot{\theta}_{31})\) by actuating its two arms, leading to swimmer displacement. To quantify the success of a given action, the reward is measured by the displacement of the swimmer's centroid along the target direction, \(r_t=(\mathbf{r}_{{\rm c}_{t+1}}-\mathbf{r}_{{\rm c}_t})\cdot (\cos\theta_{\rm T},\,\sin\theta_{\rm T})\).
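As an illustration of how the observation and reward defined above can be assembled, a minimal sketch follows (planar positions are assumed to be numpy arrays; the function names are illustrative, not the authors' code):

```python
import numpy as np

def observation(L1, L2, theta31, theta_T, r_c, r_1):
    """Observation o = (L1, L2, theta31, cos(theta_d), sin(theta_d)),
    with theta_d = theta_T - theta_o and theta_o = arg(r_c - r_1)."""
    theta_o = np.arctan2(r_c[1] - r_1[1], r_c[0] - r_1[0])
    theta_d = theta_T - theta_o
    return np.array([L1, L2, theta31, np.cos(theta_d), np.sin(theta_d)])

def reward(r_c_next, r_c_prev, theta_T):
    """Centroid displacement projected onto the target direction."""
    e_T = np.array([np.cos(theta_T), np.sin(theta_T)])
    return (r_c_next - r_c_prev) @ e_T
```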
We divide the training process into a total of Ne episodes, with each episode consisting of Nl = 150 learning steps. To ensure a full exploration of the observation space o, both the initial swimmer state s and the target direction θT are randomized in each episode. Based on the training results, the Actor-Critic neural networks are updated after every 20 episodes to maximize the expected long-term reward E[Rt=0∣πθ], where πθ is the stochastic control policy, \(R_t=\sum_{t^{\prime}=t}^{\infty}\gamma^{t^{\prime}-t}r_{t^{\prime}}\) is the infinite-horizon discounted future return, and γ is the discount factor measuring the greediness of the algorithm45,50. A large discount factor γ = 0.99 is set here to ensure farsightedness of the algorithm. As the episodes proceed, the Actor-Critic structure progressively trains the AI and thereby enhances the performance of the swimmer.
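A short sketch of how the discounted return Rt can be computed from the per-step rewards of one episode; truncating the infinite horizon at the episode end is an assumption of this sketch:

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """R_t = sum_{t' >= t} gamma^(t' - t) r_{t'}, accumulated backwards
    over one episode (the infinite horizon is truncated at episode end)."""
    R = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        R[t] = running
    return R
```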
In Fig. 2 (Supplementary Movie 1) we visualize the navigation of a trained swimmer along a target direction θT, given a substantially different initial orientation θo. The swimmer's targeted navigation is accomplished in three stages: (1) in the initial phase (blue curve and regime), the swimmer employs “steering” gaits primarily for re-orientation, followed by (2) the “transition” phase (red curve and regime), in which the swimmer continues to adjust its direction while self-propelling, before reaching (3) the “translation” phase (green curve and regime), in which the re-orientation is complete and the swimmer simply self-propels along the target direction. This example illustrates how an AI-powered reconfigurable system evolves a multimodal navigation strategy without being explicitly programmed or relying on any prior knowledge of low-Reynolds-number locomotion. We next analyze the locomotory gaits in each mode of the evolved strategy.
Multimodal locomotory gaits
Here we examine the details of the locomotory gaits acquired by the swimmer for targeted navigation in the steering, transition, and translation modes. We distinguish these gaits by visualizing their configurational changes in the three-dimensional (3D) configuration space of the swimmer (L1, L2, θ31) in Fig. 3. Here we utilize an example of a swimmer navigating towards a target direction with ∣θd∣ > π/2 to illustrate the switching between different locomotory gaits (Fig. 3a, Supplementary Movies 2 and 3). The swimmer needs to re-orient itself in the counter-clockwise direction in this example; an example for the case of clockwise rotation is included in Supplementary Note 1 (Supplementary Fig. 1, Movies 7 and 8). The dots in Fig. 3a represent configurations at different action steps. The configurations for the steering (blue dots), transition (red dots), and translation (green dots) gaits are clustered in different regions of the configuration space. A representative sequence of configurational changes for each mode of gaits is shown as a solid line to aid visualization (Fig. 3a).
We further examine the evolution of L1, L2, and θ31 using the representative sequences of configurational changes identified in Fig. 3a for each mode of gaits. For the steering gaits (Fig. 3b, blue lines and Fig. 3d, blue box), the swimmer repeatedly extends and contracts L2 and θ31, but keeps L1 constant (the left arm rests in the fully contracted state). The steering gaits thus reside in the L2−θ31 plane in Fig. 3a (blue line). The large variation in θ31 generates net rotation, substantially re-orienting the swimmer with a relatively small net translation (Fig. 3c). For the transition gaits (Fig. 3b, red lines and Fig. 3d, red box), the swimmer repeatedly extends and contracts L1, L2, and θ31, leading to significant amounts of both net rotation and translation (Fig. 3c). In the configuration space (Fig. 3a), the transition gaits tilt into the L1−L2 plane with an average θ31 less than π (red line). Compared with the steering gaits, the variation of θ31 becomes more restricted (Fig. 3b), resulting in a smaller net rotation for fine tuning of the swimmer's orientation in the transition phase. Finally, for the translation gaits (Fig. 3b, green lines and Fig. 3d, green box), the swimmer's orientation is aligned with the target direction (θd ≈ 0); the swimmer repeatedly extends and contracts L1 and L2, while keeping θ31 close to π (i.e., all three spheres of the swimmer are aligned), resembling the swimming gaits of Najafi–Golestanian swimmers39,51. In the configuration space (Fig. 3a), the translation gaits reside largely in the L1−L2 plane with θ31 remaining approximately constant at π, generating the maximum net translation with minimal net rotation (Fig. 3c). The details of the gait categorization are summarized under Supplementary methods.
It is noteworthy that the multimodal navigation strategy emerges solely from the AI without relying on prior knowledge of locomotion. The switching between steering, transition, and translation gaits is analogous to the switching between turning and running modes observed in bacterial locomotion2,5. These results demonstrate how an AI-powered swimmer, without being explicitly programmed, self-learns complex locomotory gaits from rich action and configuration spaces and undergoes autonomous gait switching in accomplishing targeted navigation.
Performance evaluation
Here we investigate the improvement of the swimmer's performance with an increased number of training episodes Ne. At the initial stage of training with a small Ne, the swimmer may fail to identify the right sets of locomotory gaits to achieve targeted navigation due to insufficient training. Continued training with an increased number of episodes enables the swimmer to identify better locomotory gaits to complete the navigation tasks. Here we measure the improvement of the swimmer's performance with increased Ne by three locomotion tests: (1) Random target test: the swimmer is assigned a target direction selected randomly from a uniform distribution in [0, 2π]; (2) Rotation test: the swimmer is assigned a target direction with a large angular difference from the swimmer's orientation (i.e., θd = ±π/2); (3) Translation test: the swimmer is assigned a target direction equal to the swimmer's orientation (i.e., θd = 0). A test is considered successful if the swimmer travels along the target direction for a distance of 5 units within 10,000 action steps. These tests ensure that the trained swimmer acquires a set of effective locomotory gaits to swim along any specified direction with robust rotation and translation.
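A hedged sketch of the success criterion used in these tests; `env` and `policy` are hypothetical stand-ins for the swimmer simulation and the trained Actor network, since the actual interfaces are not specified here:

```python
import numpy as np

def run_test(env, policy, theta_T, max_steps=10_000, target_distance=5.0):
    """A test succeeds if the centroid travels at least 5 units (in arm
    lengths L) along the target direction within 10,000 action steps."""
    e_T = np.array([np.cos(theta_T), np.sin(theta_T)])
    obs, r_c0 = env.reset(theta_T)           # hypothetical reset interface
    for _ in range(max_steps):
        obs, r_c = env.step(policy(obs))      # hypothetical step interface
        if (r_c - r_c0) @ e_T >= target_distance:
            return True
    return False

# Example: success rate of the random-target test over 100 trials
# rate = np.mean([run_test(env, policy, np.random.uniform(0, 2 * np.pi))
#                 for _ in range(100)])
```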
We consider the success rates of the three tests over 100 trials (Fig. 4). For Ne = 3 × 10⁴, success rates of around 90% are obtained for the three tests. When Ne is increased to 9 × 10⁴, the swimmer masters translation with a 100% success rate but still needs more training for rotation. When Ne is increased further to 15 × 10⁴, the swimmer obtains 100% success rates for all tests. This result demonstrates the continuous improvement in the robustness of targeted navigation with increased Ne up to 15 × 10⁴. As we further increase Ne, we find the relationship between Ne and performance to be non-monotonic: for a total number of training episodes much greater than Ne = 15 × 10⁴, the overall success rate begins to drop and eventually fluctuates around 95%. We selected the trained result at Ne = 15 × 10⁴ for the best overall performance.
To better understand the swimmer’s training process, we also varied the number of steps in each episode, Nl. For Nl ranging from 100 to 300 with a fixed total number of episodes Ne, we found that Nl = 150 provides the most efficient balance between translation and rotation, requiring the least number of action steps to complete both the rotation and translation tests. We remark that, when Nl = 100, the swimmer was only able to translate but not to rotate, indicating the significant role Nl plays in learning.
Lastly, we remark that the swimmer appears to require more training, in both Ne and Nl, to learn rotation compared with translation. This may be attributed to the inherent complexity of the rotation gaits, in which the swimmer needs to actuate its intermediate angle in addition to the actuation of the two arms required in the translation gaits.
Path tracing–“SWIM”
Next we showcase the swimmer’s capability in tracing complex paths in an autonomous manner. To illustrate, the swimmer is tasked to trace out the English word “SWIM” (Fig. 5, Supplementary Movie 4). We note that the hydrodynamic calculations required to design locomotory gaits that trace such complex paths become quickly intractable as the path complexity increases. Here, instead of explicitly programming the gaits of the swimmer, we only select target points (pi, i = 1, 2, . . . , 17, red spots in Fig. 5) as landmarks and require the swimmer to navigate towards these landmarks with its own AI, with the target direction at action step t + 1 given by \(\theta_{T_{t+1}}=\arg(\mathbf{p}_i-\mathbf{r}_{c_t})\). The swimmer is assigned the next target point pi+1 when its centroid is within a certain threshold (0.1 of the fully extended arm length) of pi. Completing these multiple navigation tasks sequentially enables the swimmer to trace out the word “SWIM” with high accuracy (Fig. 5, Supplementary Movie 4). In accomplishing this task, the swimmer switches between the three modes of locomotory gaits autonomously to swim towards individual target points and turn around the corners of the path based on the AI-powered navigation strategy. It is noteworthy that the swimmer is able to navigate around some corners (e.g., at target points 4 and 6) without activating the steering gaits, which are employed for corners with more acute angles (e.g., at target points 8, 14, and 16). While past approaches based on detailed hydrodynamic calculations, manual interventions, or other control methods may also complete such tasks, here we present reinforcement learning as an alternative approach to accomplishing these complex maneuvers in a more autonomous manner.
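The landmark logic described above can be sketched as follows (the function name and signature are illustrative; the 0.1 L switching threshold follows the text):

```python
import numpy as np

def target_direction(landmarks, r_c, current, threshold=0.1):
    """Return the target direction toward the current landmark p_i and the
    (possibly advanced) landmark index; the swimmer moves on to p_{i+1}
    once its centroid is within `threshold` (0.1 L) of p_i."""
    if (np.linalg.norm(landmarks[current] - r_c) < threshold
            and current + 1 < len(landmarks)):
        current += 1
    p = landmarks[current]
    theta_T = np.arctan2(p[1] - r_c[1], p[0] - r_c[0])
    return theta_T, current
```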
Robustness against flows
Lastly, we examine the performance of targeted navigation under the influence of flows (Fig. 6a, b, Supplementary Movies 5, 6). In particular, to determine to what extent the AI-powered swimmer is capable of maintaining its target direction against flow perturbations, we use the same AI-powered swimmer trained without any background flow and impose a rotational flow generated by a rotlet at the origin47,48, u∞ = −γ × r/r³, where γ = γez prescribes the strength of the rotlet in the z-direction and r = ∣r∣ is the magnitude of the position vector r from the origin (see the section “Simulations of background flow” under Supplementary methods). Here the AI-powered swimmer is tasked to navigate towards the positive x-direction under flow perturbations due to the rotlet. We examine how the swimmer adapts to the background flow when performing this task. For comparison, we contrast the resulting motion of the AI-powered swimmer with that of an untrained swimmer (i.e., a Najafi–Golestanian (NG) swimmer that performs only fixed locomotory gaits without any adaptivity39). Without the background flow, both swimmers self-propel with the same speed. Both swimmers are initially placed close to the rotlet with rc = −5ex, and we sample their performance with three different initial orientations, \(\theta_{o_0}=-\pi/3\), 0, and π/3, under different flow strengths. Under a relatively weak flow (γ = 0.15, Fig. 6a, Supplementary Movie 5), the AI-powered swimmer is capable of navigating towards the positive x-direction regardless of its initial orientation, despite the flow perturbations. In contrast, the trajectories of the NG swimmer are largely influenced by the rotlet flow passively, depending on the initial orientation of the swimmer. For an increased flow strength (γ = 1.5, Fig. 6b, Supplementary Movie 6), the NG swimmer completely loses control of its direction and is scattered by the rotlet into different directions, again due to the absence of any adaptivity. Under such a strong flow, the AI-powered swimmer initially circulates around the rotlet but eventually manages to escape from it, navigating towards the positive x-direction successfully with similar trajectories for all initial orientations. We note that the vorticity experienced by the swimmer in this case is comparable with typical re-orientation rates of the AI-powered swimmer. We also remark that, when navigating under flow perturbations, the AI-powered swimmer adopts the transition gaits to constantly re-orient itself towards the positive x-direction and eventually self-propels along that direction. These results showcase the AI-powered swimmer’s capability in adapting its locomotory gaits to navigate robustly against flows.
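A minimal sketch of the imposed background flow, following the rotlet expression above (the default γ value follows the weak-flow case of Fig. 6; the function name is illustrative):

```python
import numpy as np

def rotlet_flow(r, gamma=0.15):
    """Background flow u_inf = -gamma x r / |r|^3 of a rotlet at the origin,
    with gamma = gamma * e_z (Fig. 6 uses gamma = 0.15 and 1.5)."""
    gamma_vec = np.array([0.0, 0.0, gamma])
    r = np.asarray(r, float)
    return -np.cross(gamma_vec, r) / np.linalg.norm(r) ** 3
```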
Conclusions
In this work, we present a deep RL approach to enable the navigation of an artificial microswimmer via gait switching advised by the AI. In contrast to previous works that considered active particles with prescribed self-propelling velocities as minimal models32,34,35 or simple one-dimensional swimmers37,38,46, here we demonstrate how a reconfigurable system can learn complex locomotory gaits from rich and continuous action spaces to perform sophisticated maneuvers. Through RL, the swimmer develops distinct locomotory gaits for a multimodal (i.e., steering, transition, and translation) navigation strategy. The AI-powered swimmer can adapt its locomotory gaits in an autonomous manner to navigate towards arbitrary directions. Furthermore, we show that the swimmer can navigate robustly under the influence of flows and trace convoluted paths. Instead of explicitly programming the swimmer to perform these tasks, as in the traditional approach, the swimmer is advised by the AI to perform complex locomotory gaits and autonomous gait switching in accomplishing these navigation tasks. The multimodal strategy employed by the AI-powered swimmer is reminiscent of the run-and-tumble in bacteria2,5. Taken together, our results showcase the vast potential of this deep RL approach in realizing adaptivity similar to that of biological organisms for robust locomotive capabilities. Such adaptive behaviors are crucial for future biomedical applications of artificial microswimmers in complex media with uncontrolled and/or unpredictable environmental factors.
We finally discuss several possibilities for subsequent investigations based on this deep RL approach. While we demonstrate only planar motion in this work, the approach can be readily extended to three-dimensional navigation by allowing out-of-plane rotation of the swimmer’s arms, with expanded observation and action spaces for the additional degrees of freedom. Moreover, the deep RL framework is not tied to any specific swimmer; a simple multi-sphere system is used in this work for illustration, and the same framework applies to other reconfigurable systems. We also remark that the AI-powered swimmer is able to overcome some influences of flows even though such flows were absent in the training. Subsequent investigations that include flow perturbations in the training may lead to an even more powerful AI that could exploit the flows to further enhance the navigation strategies. Another practical aspect to consider is the effect of Brownian noise52,53,54. Specifically, the characterization of the effect of thermal fluctuations on both the training process of the swimmer and its resulting navigation performance is currently underway. In addition to flow and thermal fluctuations, other environmental factors, including the presence of physical boundaries and obstacles, may be addressed in a similar manner in future studies. The deep RL approach here opens an alternative path towards designing adaptive microswimmers with robust locomotive and navigation capabilities in more complex, realistic environments.
Methods
Here we briefly explain the Proximal Policy Optimization (PPO) algorithm used to train our AI-powered swimmer.
In the PPO algorithm, the agent’s motion control is managed by a neural network with an Actor-Critic structure. The Actor network can be considered as a stochastic control policy πθ(at∣ot): it generates an action at, given an observation ot, following a Gaussian distribution. Here θ represents all the parameters of the Actor neural network. The Critic network is used to compute the value function Vϕ, assuming the agent starts at an observation o and acts according to a particular policy πθ. The parameters of the Critic network are represented as ϕ.
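A minimal PyTorch sketch of such an Actor-Critic pair, with a Gaussian policy over the three actuation rates and a scalar value function; the hidden-layer sizes and the state-independent standard deviation are assumptions for illustration, not the architecture reported in this work:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy pi_theta(a|o): maps the 5-dim observation to the mean of
    the 3-dim action (L1_dot, L2_dot, theta31_dot); hidden sizes are illustrative."""
    def __init__(self, obs_dim=5, act_dim=3, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                nn.Linear(hidden, hidden), nn.Tanh(),
                                nn.Linear(hidden, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def dist(self, obs):
        return torch.distributions.Normal(self.mu(obs), self.log_std.exp())

class Critic(nn.Module):
    """Value function V_phi(o)."""
    def __init__(self, obs_dim=5, hidden=64):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.v(obs).squeeze(-1)

# Sampling an action and its log-probability for one observation:
# d = actor.dist(obs); a = d.sample(); logp = d.log_prob(a).sum(-1)
```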
To effectively train the swimmer, we divide the total training process into episodes. Each episode can be considered as one round, which terminates after a fixed number of training steps (Nl = 150). To ensure full exploration of the observation space, we randomly initialize the swimmer’s geometric configuration (L1, L2, θ1, θ2) and the target direction θT at the beginning of each episode.
At time t, the agent receives its current observation ot and samples an action at based on the policy πθ. Given at, the swimmer interacts with its surroundings and calculates the next state st+1 and reward rt. The next observation ot+1, extracted from st+1, is sent to the agent for the next iteration. All the observations, actions, rewards, and sampling probabilities are stored for the agent’s update. The update process begins after running a fixed number of episodes, NE = 20 (the total number of training steps per update is therefore N = NE × Nl = 3000). The goal of the update is to optimize θ so that the expected long-term reward J(πθ) = E[Rt=0∣πθ] is maximized.
The expectation is taken with respect to each running episode, τ. Here, we use the infinite-horizon discounted return \(R_t=\sum_{t^{\prime}=t}^{\infty}\gamma^{t^{\prime}-t}r_{t^{\prime}}\), where γ is the discount factor measuring the greediness of the algorithm. We set γ = 0.99 to ensure farsightedness. To solve this optimization problem, we use the standard policy-gradient estimate ∇θJ(πθ). More specifically, we implemented the clipped-advantage PPO algorithm to avoid large changes in each gradient update. We estimate the surrogate objective J(πθ) by clipping the probability ratio r(θ) times the advantage function \(\hat{A}_t\). The probability ratio measures the probability of selecting an action under the current policy relative to the old policy, \(r(\theta)=\pi_{\theta}(a|o)_{N\times 1}/\pi_{\theta_{\rm old}}(a|o)_{N\times 1}\). The advantage function \(\hat{A}_t\) describes the relative advantage of taking an action a based on an observation o over a randomly selected action, and is calculated by subtracting the value function VN×1 from the discounted return RN×1 (\(\hat{A}_t=R_{N\times 1}-V_{N\times 1}\)).
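Combining the quantities above with the loss terms listed in Algorithm 2 below, a sketch of the per-minibatch PPO loss (reusing the Actor/Critic sketch above; the clipping parameter ε and the entropy weight α are illustrative values):

```python
import torch

def ppo_losses(actor, critic, obs, actions, logp_old, returns, eps=0.2, alpha=0.01):
    """Clipped-surrogate PPO loss for one minibatch of stored rollout data."""
    dist = actor.dist(obs)
    logp = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(logp - logp_old)                  # r(theta) = pi_theta / pi_theta_old
    adv = returns - critic(obs).detach()                # A_hat = R - V
    l_clip = torch.min(ratio * adv,
                       torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()
    l_vf = 0.5 * (returns - critic(obs)).pow(2).mean()  # value-function loss
    l_s = alpha * dist.entropy().sum(-1).mean()         # entropy bonus
    return -l_clip + l_vf - l_s                         # total loss L(theta, phi)
```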
We then update the parameters θ and ϕ via a typical gradient-descent algorithm, the Adam optimizer. The full details of our implementation are included in Algorithms 1 and 2 below. Here, K is the total number of epochs, Nl is the number of steps in one episode, and N is the total number of steps for each update. The PPO algorithm uses fixed-length trajectory segments τ. During each iteration, each of the NA parallel actors collects T time steps of data; we then construct the surrogate loss on these NAT time steps of data and optimize it with Adam for K epochs.
In the following we present the algorithm tables for the PPO algorithm employed in this work. We refer the readers to classical monographs for more details45.
Algorithm 1
Environment
1: for time step t = 0, 1, . . . do
2:   if mod(t, Nl) = 0 then
3:     Reset state st
4:     Compute observation ot
5:   end if
6:   Sample action at from policy πθ
7:   Evaluate the next state st+1 and reward rt following the swimmer’s hydrodynamics
8:   Compute the next observation ot+1 from state st+1
9:   if t = 0 or mod(t, N) ≠ 0 then
10:     Append observation ot+1, action at, reward rt and action sampling probability πθ(at∣ot) to the observation list oN×5, action list aN×3, reward list RN×1 and action sampling probability list \(\pi_{\theta_{\rm old}}(a|o)_{N\times 1}\)
11:   else
12:     Update the Agent using Algorithm 2
13:   end if
14: end for
Algorithm 2
Proximal Policy Optimization, Actor-Critic, Update the Agent
1: Input: initial policy parameters θ, initial value function parameters ϕ
2: for k = 0, 1, 2, …, K do
3:   Compute the infinite-horizon discounted returns RN×1
4:   Evaluate the expected returns VN×1 using the observations oN×5 and the value function Vϕ
5:   Compute the advantage function: \(\hat{A}_t=R_{N\times 1}-V_{N\times 1}\)
6:   Evaluate the probability for policy πθ using the observations oN×5 and actions aN×3, and store the probability in πθ(a∣o)N×1
7:   Compute the probability ratio: \(r(\theta)=\pi_{\theta}(a|o)_{N\times 1}/\pi_{\theta_{\rm old}}(a|o)_{N\times 1}\)
8:   Compute the clipped surrogate loss function: \(L^{\rm CLIP}(\theta)={\mathbb{E}}[\min(r(\theta)\hat{A}_t,\,{\rm clip}(r(\theta),1-\epsilon,1+\epsilon)\hat{A}_t)]\), with constant ϵ
9:   Compute the value-function loss: \(L^{\rm VF}(\phi)=\frac{1}{2}{\mathbb{E}}[(R_{N\times 1}-V_{N\times 1})^2]\)
10:   Compute the entropy loss: LS = αS[πθ], with constant α
11:   Compute the total loss: L(θ, ϕ) = −LCLIP(θ) + LVF(ϕ) − LS
12:   Optimize the surrogate L with respect to (θ, ϕ), with K epochs and minibatch size M ≤ NAT, where NA is the number of parallel actors and T is the number of time steps collected per actor
13:   θold ← θ, ϕold ← ϕ
14: end for
Code availability
The codes that support the findings of this study are available from the corresponding author upon reasonable request.
References
Lauga, E. & Powers, T. R. The hydrodynamics of swimming microorganisms. Rep. Prog. Phys. 72, 096601 (2009).
Berg, H. C. & Brown, D. A. Chemotaxis in Escherichia coli analysed by three-dimensional tracking. Nature 239, 500–504 (1972).
Stocker, R., Seymour, J. R., Samadani, A., Hunt, D. E. & Polz, M. F. Rapid chemotactic response enables marine bacteria to exploit ephemeral microscale nutrient patches. Proc. Natl Acad. Sci. USA 105, 4209–4214 (2008).
Xie, L., Altindal, T., Chattopadhyay, S. & Wu, X.-L. Bacterial flagellum as a propeller and as a rudder for efficient chemotaxis. Proc. Natl Acad. Sci. USA 108, 2246–2251 (2011).
Ipiña, E. P., Otte, S., Pontier-Bres, R., Czerucka, D. & Peruani, F. Bacteria display optimal transport near surfaces. Nat. Phys. 15, 610–615 (2019).
Wan, K. Y. & Goldstein, R. E. Time irreversibility and criticality in the motility of a flagellate microorganism. Phys. Rev. Lett. 121, 058103 (2018).
Tsang, A. C. H., Lam, A. T. & Riedel-Kruse, I. H. Polygonal motion and adaptable phototaxis via flagellar beat switching in the microswimmer euglena gracilis. Nat. Phys. 14, 1216–1222 (2018).
Gao, W. et al. Cargo-towing fuel-free magnetic nanoswimmers for targeted drug delivery. Small 8, 460–467 (2012).
Zhang, L. et al. Characterizing the swimming properties of artificial bacterial flagella. Nano Lett. 9, 3663–3667 (2009).
Ghosh, A. & Fischer, P. Controlled propulsion of artificial magnetic nanostructured propellers. Nano Lett. 9, 2243–2245 (2009).
Ceylan, H. et al. 3d-printed biodegradable microswimmer for theranostic cargo delivery and release. ACS Nano 13, 3353–3362 (2019).
Huang, T.-Y. et al. 3D printed microtransporters: compound micromachines for spatiotemporally controlled delivery of therapeutic agents. Adv. Mater. 27, 6644–6650 (2015).
Nassif, X., Bourdoulous, S., Eugène, E. & Couraud, P.-O. How do extracellular pathogens cross the blood–brain barrier? Trends Microbiol. 10, 227–232 (2002).
Celli, J. P. et al. Helicobacter pylori moves through mucus by reducing mucin viscoelasticity. Proc. Natl Acad. Sci. USA 106, 14321–14326 (2009).
Mirbagheri, S. A. & Fu, H. C. Helicobacter pylori couples motility and diffusion to actively create a heterogeneous complex medium in gastric mucus. Phys. Rev. Lett. 116, 198101 (2016).
Purcell, E. M. Life at low Reynolds number. Am. J. Phys. 45, 3–11 (1977).
Hu, W., Lum, G. Z., Mastrangeli, M. & Sitti, M. Small-scale soft-bodied robot with multimodal locomotion. Nature 554, 81–85 (2018).
Ohm, C., Brehmer, M. & Zentel, R. Liquid crystalline elastomers as actuators and sensors. Adv. Mater. 22, 3366–3387 (2010).
Dai, B. et al. Programmable artificial phototactic microswimmer. Nat. Nanotechnol. 11, 1087–1092 (2016).
Palagi, S. et al. Structured light enables biomimetic swimming and versatile locomotion of photoresponsive soft microrobots. Nat. Mater. 15, 647 (2016).
von Rohr, A., Trimpe, S., Marco, A., Fischer, P. & Palagi, S. Gait learning for soft microrobots controlled by light fields. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 6199–6206 (IEEE, 2018).
Huang, H.-W., Sakar, M. S., Petruska, A. J., Pané, S. & Nelson, B. J. Soft micromachines with programmable motility and morphology. Nat. Commun. 7, 12263 (2016).
Huang, H.-W. et al. Adaptive locomotion of artificial microswimmers. Sci. Adv. 5, eaau1532 (2019).
Reddy, G., Celani, A., Sejnowski, T. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113, E4877 – E4884 (2016).
Reddy, G., Wong-Ng, J., Celani, A., Sejnowski, T. J. & Vergassola, M. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018).
Gazzola, M., Tchieu, A. A., Alexeev, D., de Brauer, A. & Koumoutsakos, P. Learning to school in the presence of hydrodynamic interactions. J. Fluid Mech. 789, 726–749 (2016).
Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: optimal point-to-point navigation in 2d turbulent flows using reinforcement learning. Chaos 29, 103138 (2019).
Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. USA 115, 5849–5854 (2018).
Jiao, Y. et al. Learning to swim in potential flow. Phys. Rev. Fluids 6, 050505 (2021).
Cichos, F., Gustavsson, K., Mehlig, B. & Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2, 94–103 (2020).
Tsang, A. C. H., Demir, E., Ding, Y. & Pak, O. S. Roads to smart artificial microswimmers. Adv. Intell. Syst. 2, 1900137 (2020).
Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004 (2017).
Alageshan, J. K., Verma, A. K., Bec, J. & Pandit, R. Machine learning strategies for path-planning microswimmers in turbulent flows. Phys. Rev. E 101, 043110 (2020).
Schneider, E. & Stark, H. Optimal steering of a smart active particle. Europhys. Lett. 127, 64003 (2019).
Muiños-Landin, S., Fischer, A., Holubec, V. & Cichos, F. Reinforcement learning with artificial microswimmers. Sci. Robot. 6, eabd9285 (2021).
Yang, Y., Bevan, M. A. & Li, B. Micro/nano motor navigation and localization via deep reinforcement learning. Adv. Theory Simul. 3, 2000034 (2020).
Tsang, A. C. H., Tong, P. W., Nallan, S. & Pak, O. S. Self-learning how to swim at low Reynolds number. Phys. Rev. Fluids 5, 074101 (2020).
Hartl, B., Hübl, M., Kahl, G. & Zöttl, A. Microswimmers learning chemotaxis with genetic algorithms. Proc. Natl Acad. Sci. USA 118, e2019683118 (2021).
Najafi, A. & Golestanian, R. Simple swimmer at low Reynolds number: three linked spheres. Phys. Rev. E 69, 062901 (2004).
Ledesma-Aguilar, R., Löwen, H. & Yeomans, J. A circle swimmer at low Reynolds number. Eur. Phys. J. E 35, 1–9 (2012).
Avron, J. E., Kenneth, O. & Oaknin, D. H. Pushmepullyou: an efficient micro-swimmer. New J. Phys. 7, 234 (2005).
Golestanian, R. & Ajdari, A. Stochastic low Reynolds number swimmers. J. Phys. Condens. Matter 21, 204104 (2009).
Alouges, F., DeSimone, A., Giraldi, L. & Zoppello, M. Self-propulsion of slender micro-swimmers by curvature control: N-link swimmers. Int. J. Nonlinear Mech. 56, 132–141 (2013).
Wang, Q. Optimal strokes of low reynolds number linked-sphere swimmers. Appl. Sci. 9, 4023 (2019).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at arXiv:1707.06347 (2017).
Liu, Y., Zou, Z., Tsang, A. C. H., Pak, O. S. & Young, Y.-N. Mechanical rotation at low Reynolds number via reinforcement learning. Phys. Fluids 33, 062007 (2021).
Happel, J. & Brenner, H. Low Reynolds Number Hydrodynamics: with Special Applications to Particulate Media (Noordhoff International Publishing, 1973).
Kim, S. & Karrila, S. J. Microhydrodynamics: Principles and Selected Applications (Dover, New York, 2005).
Dhont, J. An Introduction to Dynamics of Colloids (Elsevier, 1996).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998).
Golestanian, R. & Ajdari, A. Analytic results for the three-sphere swimmer at low Reynolds number. Phys. Rev. E 77, 036308 (2008).
Howse, J. R. et al. Self-motile colloidal particles: From directed propulsion to random walk. Phys. Rev. Lett. 99, 048102 (2007).
Lobaskin, V., Lobaskin, D. & Kulić, I. M. Brownian dynamics of a microswimmer. Eur. Phys. J. Spec. Top. 157, 149–156 (2008).
Dunkel, J. & Zaid, I. M. Noisy swimming at low Reynolds numbers. Phys. Rev. E 80, 021903 (2009).
Acknowledgements
Funding support by the National Science Foundation (Grant Nos. 1830958 and 1931292 to O.S.P. and Grant Nos. 1614863 and 1951600 to Y.-N.Y.) is gratefully acknowledged. Y.-N.Y. acknowledges support from Flatiron Institute, part of Simons Foundation. A.C.H.T. acknowledges funding support from the Croucher Foundation. Z.Z. and O.S.P. acknowledge the use of computational resources at the WAVE computing facility (enabled by the E.L. Wiegand Foundation) at Santa Clara University. We also thank Yi Fang for useful discussion.
Author information
Authors and Affiliations
Contributions
Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. designed research; Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. performed research; Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. analyzed data; and Z.Z., Y.L., Y.-N.Y., O.S.P., and A.C.H.T. wrote the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Physics thanks Giovanni Volpe and the other, anonymous, reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zou, Z., Liu, Y., Young, YN. et al. Gait switching and targeted navigation of microswimmers via deep reinforcement learning. Commun Phys 5, 158 (2022). https://doi.org/10.1038/s42005-022-00935-x