Glider soaring via reinforcement learning in the field

Reddy, Gautam; Wong-Ng, Jerome; Celani, Antonio; Sejnowski, Terrence J.; Vergassola, Massimo

doi:10.1038/s41586-018-0533-0

Letter
Published: 19 September 2018

Nature volume 562, pages 236–239 (2018)

Abstract

Soaring birds often rely on ascending thermal plumes (thermals) in the atmosphere as they search for prey or migrate across large distances^1,2,3,4. The landscape of convective currents is rugged and shifts on timescales of a few minutes as thermals constantly form, disintegrate or are transported away by the wind^5,6. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning^7 provides an appropriate framework in which to identify an effective navigational strategy as a sequence of decisions made in response to environmental cues. Here we use reinforcement learning to train a glider in the field to navigate atmospheric thermals autonomously. We equipped a glider of two-metre wingspan with a flight controller that precisely controlled the bank angle and pitch, modulating these at intervals with the aim of gaining as much lift as possible. A navigational strategy was determined solely from the glider’s pooled experiences, collected over several days in the field. The strategy relies on on-board methods to accurately estimate the local vertical wind accelerations and the roll-wise torques on the glider, which serve as navigational cues. We establish the validity of our learned flight policy through field experiments, numerical simulations and estimates of the noise in measurements caused by atmospheric turbulence. Our results highlight the role of vertical wind accelerations and roll-wise torques as effective mechanosensory cues for soaring birds and provide a navigational strategy that is directly applicable to the development of autonomous soaring vehicles.
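The decision loop described in the abstract can be illustrated with a minimal sketch. It assumes a state made of a discretised vertical wind acceleration, a discretised roll-wise torque and the current bank angle, with actions that change the bank angle by 15° or leave it unchanged (the angles are taken from the Extended Data captions). The three-level cue bins, the learning rate, the discount factor, the reward signal and the use of tabular Q-learning are all illustrative assumptions, not the authors' stated implementation.

```python
import random

BANK_ANGLES = [-30, -15, 0, 15, 30]  # degrees, as listed in the Extended Data captions
ACTIONS = [-15, 0, 15]               # change in bank angle per decision step
CUE_BINS = [-1, 0, 1]                # hypothetical three-level discretisation of each cue


def allowed_actions(bank):
    """Actions that keep the bank angle within the permitted range."""
    return [a for a in ACTIONS if -30 <= bank + a <= 30]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice; state = (a_z bin, torque bin, bank angle)."""
    options = allowed_actions(state[2])
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.98):
    """One tabular Q-learning update (a generic stand-in); returns the new value."""
    best_next = max(q.get((next_state, a), 0.0)
                    for a in allowed_actions(next_state[2]))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    rng = random.Random(0)
    q = {}
    state, next_state = (1, 0, 0), (0, 0, 15)
    # Hypothetical reward of 1.0 for a step that gained lift.
    q_update(q, state, 15, 1.0, next_state)
    print(choose_action(q, state, epsilon=0.0, rng=rng))
```

Restricting the action set at ±30° keeps the sketch consistent with the bank-angle range quoted in the captions; everything else is a placeholder to be replaced by the quantities defined in the Methods.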

**Fig. 1: Soaring in the field by using turbulent navigational cues.**

**Fig. 2: Convergence of the learning algorithm and the learned strategy for navigating thermal plumes.**

**Fig. 3: Performance of the learned strategy and its dependence on the wingspan.**

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Newton, I. Migration Ecology of Soaring Birds 1st edn (Elsevier, Amsterdam, 2008).
2. Shamoun-Baranes, J., Leshem, Y., Yom-Tov, Y. & Liechti, O. Differential use of thermal convection by soaring birds over central Israel. Condor 105, 208–218 (2003).
3. Weimerskirch, H., Bishop, C., Jeanniard-du-Dot, T., Prudor, A. & Sachs, G. Frigate birds track atmospheric conditions over months-long transoceanic flights. Science 353, 74–78 (2016).
4. Pennycuick, C. J. Thermal soaring compared in three dissimilar tropical bird species, Fregata magnificens, Pelecanus occidentalis and Coragyps atratus. J. Exp. Biol. 102, 307–325 (1983).
5. Garratt, J. R. The Atmospheric Boundary Layer (Cambridge Univ. Press, Cambridge, 1994).
6. Lenschow, D. H. & Stephens, P. L. The role of thermals in the atmospheric boundary layer. Boundary-Layer Meteorol. 19, 509–532 (1980).
7. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 1st edn (MIT Press, Cambridge, 1998).
8. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).
9. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
10. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
11. Kim, H. J., Jordan, M. I., Sastry, S. & Ng, A. in Advances in Neural Information Processing Systems Vol. 16 (eds Thrun, S. et al.) 799–806 (MIT Press, Cambridge, 2004).
12. Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1–40 (2016).
13. Allen, M. J. & Lin, V. Guidance and control of an autonomous soaring vehicle with flight test results. In 45th AIAA Aerospace Sciences Meeting and Exhibit 2007-867 (AIAA, 2007).
14. Edwards, D. J. Implementation details and flight test results of an autonomous soaring controller. In AIAA Guidance, Navigation and Control Conference and Exhibit 2008-7244 (AIAA, 2008).
15. Edwards, D. J. Autonomous Soaring: The Montague Cross Country Challenge. PhD thesis, North Carolina State Univ. (2010).
16. Ákos, Z., Nagy, M., Leven, S. & Vicsek, T. Thermal soaring flight of birds and unmanned aerial vehicles. Bioinspir. Biomim. 5, 045003 (2010).
17. Doncieux, S., Mouret, J. B. & Meyer, J.-A. Soaring behaviors in UAVs: ‘animat’ design methodology and current results. In 3rd US–European Competition and Workshop on Micro Air Vehicle Systems (MAV07) and European Micro Air Vehicle Conference and Flight Competition (EMAV2007) (2007); http://www.isir.upmc.fr/files/2007ACTI734.pdf.
18. Wharington, J. & Herszberg, I. Control of a high endurance unmanned aerial vehicle. In 21st Congress of International Council of the Aeronautical Sciences 98-3.7.1 (ICAS, 1998).
19. Chung, J. J., Lawrance, N. R. J. & Sukkarieh, S. Learning to soar: resource-constrained exploration in reinforcement learning. Int. J. Robot. Res. 34, 158–172 (2015).
20. Reddy, G., Celani, A., Sejnowski, T. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113, E4877–E4884 (2016).
21. Yeung, P. K. & Pope, S. B. Lagrangian statistics from direct numerical simulations of isotropic turbulence. J. Fluid Mech. 207, 531–586 (1989).
22. Voth, G. A., La Porta, A., Crawford, A. M., Alexander, J. & Bodenschatz, E. Measurement of particle accelerations in fully developed turbulence. J. Fluid Mech. 469, 121–160 (2002).
23. Tennekes, H. & Lumley, J. L. A First Course in Turbulence (MIT Press, Cambridge, 1972).
24. Reichmann, H. Cross-Country Soaring (Thomson Publications, Santa Monica, 1988).
25. Ng, A. Y., Harada, D. & Russell, S. J. Policy invariance under reward transformations: theory and application to reward shaping. In Proc. 16th International Conference on Machine Learning (eds Bratko, I. & Dzeroski, S.) 278–287 (Morgan Kaufmann, San Francisco, 1999).
26. MacCready, P. B. J. Optimum airspeed selector. Soaring 1958, 10–11 (1958).
27. Horvitz, N. et al. The gliding speed of migrating birds: slow and safe or fast and risky? Ecol. Lett. 17, 670–679 (2014).
28. Cochrane, J. H. MacCready theory with uncertain lift and limited altitude. Tech. Soaring 23, 88–96 (1999).
29. Frisch, U. Turbulence: The Legacy of A. N. Kolmogorov (Cambridge Univ. Press, Cambridge, 1995).

Acknowledgements

This work was supported by Simons Foundation grant 340106 (to M.V.) and NSF grant NCS-FO-1735004 (to T.J.S.).

Reviewer information

Nature thanks M. Chertkov and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

These authors contributed equally: Gautam Reddy, Jerome Wong-Ng

Authors and Affiliations

Department of Physics, University of California, San Diego, La Jolla, CA, USA
Gautam Reddy, Jerome Wong-Ng & Massimo Vergassola
The Abdus Salam International Center for Theoretical Physics, Trieste, Italy
Antonio Celani
The Salk Institute for Biological Studies, La Jolla, CA, USA
Terrence J. Sejnowski
Division of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
Terrence J. Sejnowski

Contributions

All authors were involved in designing the study and drafting the final manuscript. G.R. and J.W.N. performed the experiments and analysed the data. G.R., A.C. and M.V. contributed to the theoretical results.

Corresponding author

Correspondence to Massimo Vergassola.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Sample trajectories obtained in the field.

Three-dimensional and top views of the glider’s trajectory are shown as it executes the learned strategy for thermals (labelled ‘s’) or a random policy that takes actions with equal probability (labelled ‘r’). The trajectories are coloured according to the instantaneous vertical ground velocity u_z. The green (red) dot shows the start (end) point of the trajectory. Trajectories s1, s2 and r1 last for 3 min each, whereas s3 lasts for about 8 min.

Extended Data Fig. 2 Force–body diagram of a glider.

The forces on a glider and the definitions of the various angles that determine the glider’s motion.

Extended Data Fig. 3 Modelling the longitudinal motion of the glider.

a, Sample trajectory of a glider’s pitch and its vertical velocity with respect to ground (u_z) in a case in which the feedback control over the pitch is reduced in order to exaggerate the pitch oscillations. The blue line shows the measured u_z, and the orange line is u_z obtained after subtracting the contributions from longitudinal motions of the glider (see Supplementary Information). b, The blue line shows the average change in u_z when a particular action is taken (labelled above each panel), averaged over n 3-s intervals. The 13 panels correspond to the 13 possible bank angle changes from the angles 0°, ±15° and ±30°, obtained by increasing the bank angle by 15°, decreasing it by 15° or keeping it unchanged. The green dashed line shows the prediction from the model whereas the orange line is the estimated w_z. The axis on the right shows the averaged pitch (red dashed line).
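The count of 13 bank angle changes follows from the five bank angles and three actions once changes that would leave the ±30° range are excluded. A minimal sketch of that enumeration, with function and variable names chosen here for illustration only:

```python
BANK_ANGLES = (-30, -15, 0, 15, 30)  # degrees
STEP = 15                            # size of one bank-angle change, degrees


def bank_angle_changes(angles=BANK_ANGLES, step=STEP):
    """List every (start, end) pair reachable by -step, 0 or +step within range."""
    lowest, highest = min(angles), max(angles)
    changes = []
    for start in angles:
        for delta in (-step, 0, step):
            end = start + delta
            if lowest <= end <= highest:
                changes.append((start, end))
    return changes


if __name__ == "__main__":
    changes = bank_angle_changes()
    print(len(changes))  # 13
```

The three interior angles each allow three changes and the two extreme angles allow two each, giving 3 × 3 + 2 × 2 = 13.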

Extended Data Fig. 4 The estimated vertical wind acceleration is unbiased after accounting for the glider’s longitudinal motion.

a, The averaged vertical wind acceleration a_z in units of its standard deviation. a_z, plotted as in Extended Data Fig. 3b, is shown with (blue line) and without (orange line) accounting for the glider’s longitudinal motions. The axis on the right shows the airspeed (green dashed line). b, Probability density functions (PDFs) of a_z for the different bank angle changes. The black dashed line shows the median.

Extended Data Fig. 5 The estimated roll-wise torque is unbiased after accounting for the effects of feedback control and glider aerodynamics.

a, The averaged evolution of the bank angle shown as in Extended Data Fig. 3b. The blue line shows the measured bank angle and the dashed orange line shows the best-fit line obtained from simultaneously fitting the 13 blue curves to the prediction (see Supplementary Information). b, PDFs of the roll-wise torque ω (in units of its standard deviation) for the different bank angle changes. The black dashed line shows the median value.

Extended Data Fig. 6 The distribution of the strength of vertical currents observed in the field.

The root-mean-square vertical wind velocity measured in the field is pooled from about 240 3-min trials collected over 9 days. The dashed red line shows the threshold criterion imposed when measuring the performance of the strategy in the field (see Methods).
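As a rough illustration of the quantity plotted here, the sketch below computes a root-mean-square vertical wind velocity per trial and keeps the trials that exceed a threshold. The direction of the criterion (keeping trials above the threshold), the threshold value and the sample data are assumptions made for the example; the actual criterion is the one defined in the Methods.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of vertical wind velocity samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def trials_above_threshold(trials, threshold):
    """Keep trials whose RMS vertical wind velocity exceeds `threshold`."""
    return [trial for trial in trials if rms(trial) > threshold]


if __name__ == "__main__":
    # Hypothetical samples (m/s) from two trials; not measured data.
    weak = [0.1, -0.1, 0.1, -0.1]
    strong = [1.0, -1.0, 1.0, -1.0]
    kept = trials_above_threshold([weak, strong], threshold=0.5)
    print(len(kept))
```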

Extended Data Table 1 Parameter values

Supplementary information

Supplementary Information

This file contains: (1) on-board estimation of the navigational cues; (2) reward shaping and policy invariance; and (3) noisy gradient sensing in the turbulent atmospheric boundary layer.
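Item (2) refers to potential-based reward shaping, for which Ng, Harada and Russell showed that adding γΦ(s′) − Φ(s) to the reward leaves the optimal policy unchanged. The sketch below checks the underlying telescoping identity numerically on one trajectory: the shaped discounted return differs from the original by γ^T Φ(s_T) − Φ(s_0), which does not depend on the actions taken. The potential, rewards and states are made-up values for illustration, not those used in the study.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shaped_rewards(rewards, states, potential, gamma):
    """Apply r'_t = r_t + gamma * Phi(s_{t+1}) - Phi(s_t).

    `states` must contain one more entry than `rewards`.
    """
    return [
        r + gamma * potential(states[t + 1]) - potential(states[t])
        for t, r in enumerate(rewards)
    ]


if __name__ == "__main__":
    gamma = 0.9
    rewards = [1.0, 0.5, -0.2]           # hypothetical rewards
    states = [0, 1, 2, 3]                # hypothetical state labels
    potential = lambda s: 2.0 * s        # hypothetical potential Phi

    original = discounted_return(rewards, gamma)
    shaped = discounted_return(
        shaped_rewards(rewards, states, potential, gamma), gamma
    )
    offset = gamma ** len(rewards) * potential(states[-1]) - potential(states[0])
    print(abs((shaped - original) - offset) < 1e-9)
```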

About this article

Cite this article

Reddy, G., Wong-Ng, J., Celani, A. et al. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018). https://doi.org/10.1038/s41586-018-0533-0

Received: 20 February 2018
Accepted: 20 July 2018
Published: 19 September 2018
Issue Date: 11 October 2018
DOI: https://doi.org/10.1038/s41586-018-0533-0
