Soaring birds often rely on ascending thermal plumes (thermals) in the atmosphere as they search for prey or migrate across large distances [1–4]. The landscape of convective currents is rugged and shifts on timescales of a few minutes as thermals constantly form, disintegrate or are transported away by the wind [5,6]. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning [7] provides an appropriate framework in which to identify an effective navigational strategy as a sequence of decisions made in response to environmental cues. Here we use reinforcement learning to train a glider in the field to navigate atmospheric thermals autonomously. We equipped a glider of two-metre wingspan with a flight controller that precisely controlled the bank angle and pitch, modulating these at intervals with the aim of gaining as much lift as possible. A navigational strategy was determined solely from the glider’s pooled experiences, collected over several days in the field. The strategy relies on on-board methods to accurately estimate the local vertical wind accelerations and the roll-wise torques on the glider, which serve as navigational cues. We establish the validity of our learned flight policy through field experiments, numerical simulations and estimates of the noise in measurements caused by atmospheric turbulence. Our results highlight the role of vertical wind accelerations and roll-wise torques as effective mechanosensory cues for soaring birds and provide a navigational strategy that is directly applicable to the development of autonomous soaring vehicles.
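The decision loop described in the abstract (sense the vertical wind acceleration and roll-wise torque, then adjust the bank angle) can be illustrated with a generic tabular Q-learning update. This is only a sketch under stated assumptions and is not the authors' training procedure: the sign-based discretization of the cues, the learning rate, the discount factor and all function names are hypothetical, while the bank angles of 0°, ±15° and ±30° and the 15° steps are taken from the extended data captions.

```python
import random
from collections import defaultdict

BANK_ANGLES = (-30, -15, 0, 15, 30)  # degrees, as listed in the extended data caption
ACTIONS = (-15, 0, 15)               # change in bank angle at each decision


def sign_bin(x, threshold):
    """Coarse-grain a cue into -1, 0 or +1 (an assumed discretization)."""
    return (x > threshold) - (x < -threshold)


def feasible_actions(bank):
    """Bank-angle changes that keep the glider within the allowed angles."""
    return [d for d in ACTIONS if bank + d in BANK_ANGLES]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice; state = (accel_bin, torque_bin, bank_angle)."""
    actions = feasible_actions(state[2])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda d: q[(state, d)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.98):
    """One-step Q-learning update (alpha and gamma are assumed values)."""
    best_next = max(q[(next_state, d)] for d in feasible_actions(next_state[2]))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical single transition: positive acceleration cue, negative torque cue.
q = defaultdict(float)
state = (sign_bin(0.4, 0.1), sign_bin(-0.2, 0.1), 0)
next_state = (1, -1, 15)
q_update(q, state, 15, reward=1.0, next_state=next_state)
```

In this sketch the reward would stand for the lift gained over one decision interval; how the reward is actually defined and shaped is covered in the paper's Methods and Supplementary Information, not here.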
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Newton, I. Migration Ecology of Soaring Birds 1st edn (Elsevier, Amsterdam, 2008).
Shamoun-Baranes, J., Leshem, Y., Yom-tov, Y. & Liechti, O. Differential use of thermal convection by soaring birds over central Israel. Condor 105, 208–218 (2003).
Weimerskirch, H., Bishop, C., Jeanniard-du-Dot, T., Prudor, A. & Sachs, G. Frigate birds track atmospheric conditions over months-long transoceanic flights. Science 353, 74–78 (2016).
Pennycuick, C. J. Thermal soaring compared in three dissimilar tropical bird species, Fregata magnificens, Pelecanus occidentalis and Coragyps atratus. J. Exp. Biol. 102, 307–325 (1983).
Garratt, J. R. The Atmospheric Boundary Layer (Cambridge Univ. Press, Cambridge, 1994).
Lenschow, D. H. & Stephens, P. L. The role of thermals in the atmospheric boundary layer. Boundary-Layer Meteorol. 19, 509–532 (1980).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 1st edn (MIT Press, Cambridge, 1998).
Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Kim, H. J., Jordan, M. I., Sastry, S. & Ng, A. in Advances in Neural Information Processing Systems Vol. 16 (eds Thrun, S. et al.) 799–806 (MIT Press, Cambridge, 2004).
Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1–40 (2016).
Allen, M. J. & Lin, V. Guidance and control of an autonomous soaring vehicle with flight test results. In 45th AIAA Aerospace Sciences Meeting and Exhibit 2007-867 (AIAA, 2007).
Edwards, D. J. Implementation details and flight test results of an autonomous soaring controller. In AIAA Guidance, Navigation and Control Conference and Exhibit 2008-7244 (AIAA, 2008).
Edwards, D. J. Autonomous Soaring: The Montague Cross Country Challenge. PhD thesis, North Carolina State Univ. (2010).
Ákos, Z., Nagy, M., Leven, S. & Vicsek, T. Thermal soaring flight of birds and unmanned aerial vehicles. Bioinspir. Biomim. 5, 045003 (2010).
Doncieux, S., Mouret, J. B. & Meyer, J.-A. Soaring behaviors in UAVs: ‘animat’ design methodology and current results. In 3rd US–European Competition and Workshop on Micro Air Vehicle Systems (MAV07) and European Micro Air Vehicle Conference and Flight Competition (EMAV2007) (2007); http://www.isir.upmc.fr/files/2007ACTI734.pdf.
Wharington, J. & Herszberg, I. Control of a high endurance unmanned aerial vehicle. In 21st Congress of International Council of the Aeronautical Sciences 98-3.7.1 (ICAS, 1998).
Chung, J. J., Lawrance, N. R. J. & Sukkarieh, S. Learning to soar: resource-constrained exploration in reinforcement learning. Int. J. Robot. Res. 34, 158–172 (2015).
Reddy, G., Celani, A., Sejnowski, T. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113, E4877–E4884 (2016).
Yeung, P. K. & Pope, S. B. Lagrangian statistics from direct numerical simulations of isotropic turbulence. J. Fluid Mech. 207, 531–586 (1989).
Voth, G. A., La Porta, A., Crawford, A. M., Alexander, J. & Bodenschatz, E. Measurement of particle accelerations in fully developed turbulence. J. Fluid Mech. 469, 121–160 (2002).
Tennekes, H. & Lumley, J. L. A First Course in Turbulence (MIT Press, Cambridge, 1972).
Reichmann, H. Cross-Country Soaring (Thomson Publications, Santa Monica, 1988).
Ng, A. Y., Harada, D. & Russell, S. J. Policy invariance under reward transformations: theory and application to reward shaping. In Proc. 16th International Conference on Machine Learning (eds Bratko, I. & Dzeroski, S.) 278–287 (Morgan Kaufmann, San Francisco, 1999).
MacCready, P. B. J. Optimum airspeed selector. Soaring 1958, 10–11 (1958).
Horvitz, N. et al. The gliding speed of migrating birds: slow and safe or fast and risky? Ecol. Lett. 17, 670–679 (2014).
Cochrane, J. H. MacCready theory with uncertain lift and limited altitude. Tech. Soaring 23, 88–96 (1999).
Frisch, U. Turbulence: The Legacy of A. N. Kolmogorov (Cambridge Univ. Press, Cambridge, 1995).
This work was supported by Simons Foundation grant 340106 (to M.V.) and NSF grant NCS-FO-1735004 (to T.J.S.).
Nature thanks M. Chertkov and the other anonymous reviewer(s) for their contribution to the peer review of this work.
The authors declare no competing interests.
Extended data figures and tables
Three-dimensional and top views of the glider’s trajectory as it executes the learned strategy for thermals (labelled ‘s’) or a random policy that takes actions with equal probability (labelled ‘r’). The trajectories are coloured according to the instantaneous vertical ground velocity uz. The green (red) dot shows the start (end) point of the trajectory. Trajectories s1, s2 and r1 last for 3 min each, whereas s3 lasts for about 8 min.
The forces on a glider and the definitions of the various angles that determine the glider’s motion.
a, Sample trajectory of a glider’s pitch and its vertical velocity with respect to ground (uz) in a case in which the feedback control over the pitch is reduced in order to exaggerate the pitch oscillations. The blue line shows the measured uz, and the orange line is uz obtained after subtracting the contributions from longitudinal motions of the glider (see Supplementary Information). b, The blue line shows the average change in uz when a particular action is taken (labelled above each panel), averaged over n 3-s intervals. The 13 panels correspond to the 13 possible bank angle changes: starting from an angle of 0°, ±15° or ±30°, the bank angle is increased by 15°, decreased by 15° or kept the same. The green dashed line shows the prediction from the model, whereas the orange line is the estimated wz. The axis on the right shows the averaged pitch (red dashed line).
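The count of 13 panels follows from the five bank angles and three possible actions, once the two moves that would take the bank angle beyond ±30° are excluded. A minimal sketch of that enumeration (the names `BANK_ANGLES`, `ACTIONS` and `allowed_transitions` are illustrative and do not come from the paper):

```python
# Bank angles (degrees) and bank-angle changes named in the caption.
BANK_ANGLES = (-30, -15, 0, 15, 30)
ACTIONS = (-15, 0, 15)  # decrease, keep, increase


def allowed_transitions(angles=BANK_ANGLES, actions=ACTIONS):
    """Return (start_angle, change) pairs whose end angle is still allowed."""
    allowed = set(angles)
    return [(a, d) for a in angles for d in actions if a + d in allowed]


transitions = allowed_transitions()
print(len(transitions))  # 13: 5 angles x 3 actions, minus 2 out-of-range moves
```

The two excluded pairs are a further decrease from −30° and a further increase from +30°.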
Extended Data Fig. 4 The estimated vertical wind acceleration is unbiased after accounting for the glider’s longitudinal motion.
a, The averaged vertical wind acceleration az in units of its standard deviation. az, plotted as in Extended Data Fig. 3b, is shown with (blue line) and without (orange line) accounting for the glider’s longitudinal motions. The axis on the right shows the airspeed (green dashed line). b, Probability density functions (PDFs) of az for the different bank angle changes. The black dashed line shows the median.
Extended Data Fig. 5 The estimated roll-wise torque is unbiased after accounting for the effects of feedback control and glider aerodynamics.
a, The averaged evolution of the bank angle shown as in Extended Data Fig. 3b. The blue line shows the measured bank angle and the dashed orange line shows the best-fit line obtained from simultaneously fitting the 13 blue curves to the prediction (see Supplementary Information). b, PDFs of the roll-wise torque ω (in units of its standard deviation) for the different bank angle changes. The black dashed line shows the median value.
The root-mean-square vertical wind velocity measured in the field is pooled from about 240 3-min trials collected over 9 days. The dashed red line shows the threshold criterion imposed when measuring the performance of the strategy in the field (see Methods).
This file contains: (1) on-board estimation of the navigational cues; (2) reward shaping and policy invariance; and (3) noisy gradient sensing in the turbulent atmospheric boundary layer.
About this article
Cite this article
Reddy, G., Wong-Ng, J., Celani, A. et al. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018). https://doi.org/10.1038/s41586-018-0533-0
- Bank Angle
- Vertical Wind Velocity
- Mechanosensory Cues
- Flight Controller
- Thermal Soaring