Supervised machine learning for analysing spectra of exoplanetary atmospheres



The use of machine learning is becoming ubiquitous in astronomy [1–3], but remains rare in the study of the atmospheres of exoplanets. Given the spectrum of an exoplanetary atmosphere, a multi-parameter space is swept through in real time to find the best-fit model [4–6]. Known as atmospheric retrieval, this technique originates in the Earth and planetary sciences [7]. Such methods are very time-consuming, and by necessity there is a compromise between physical and chemical realism and computational feasibility. Machine learning has previously been used to determine which molecules to include in the model, but the retrieval itself was still performed using standard methods [8]. Here, we report an adaptation of the ‘random forest’ method of supervised machine learning [9,10], trained on a precomputed grid of atmospheric models, which retrieves full posterior distributions of the abundances of molecules and the cloud opacity. The use of a precomputed grid allows a large part of the computational burden to be shifted offline. We demonstrate our technique on a transmission spectrum of the hot gas-giant exoplanet WASP-12b using a five-parameter model (temperature, a constant cloud opacity and the volume mixing ratios or relative abundances of molecules of water, ammonia and hydrogen cyanide) [11]. We obtain results consistent with the standard nested-sampling retrieval method. We also estimate the sensitivity of the measured spectrum to the model parameters, and we are able to quantify the information content of the spectrum. Our method can be straightforwardly applied using more sophisticated atmospheric models to interpret an ensemble of spectra without having to retrain the random forest.
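The idea of training a random forest offline on a precomputed grid and then reading off a distribution of parameter values for an observed spectrum can be illustrated with a minimal, pure-Python sketch. Everything in it is an assumption made for illustration: `toy_spectrum` is a made-up two-parameter forward model (not the five-parameter model used in the paper), the tiny forest is a simplified stand-in for a production implementation, and treating the per-tree predictions as approximate posterior samples is one simple choice rather than a confirmed description of the authors' procedure.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible
BAND = [0.1, 0.4, 1.0, 0.7, 0.3, 0.1]  # made-up band shape over 6 wavelength bins

def to_physical(u):
    """Map unit-interval parameters to (temperature in K, log10 mixing ratio)."""
    return 500.0 + 2400.0 * u[0], -13.0 + 12.0 * u[1]

def toy_spectrum(temperature, log_x):
    """Hypothetical forward model; a stand-in for a real atmospheric model."""
    scale = temperature / 1000.0
    strength = (log_x + 13.0) / 12.0
    return [scale * (1.0 + b * strength) for b in BAND]

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sse(rows):
    """Squared deviation from the mean, summed over all output parameters."""
    means = mean_vector(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, means))

def build_tree(X, Y, depth=0, max_depth=8, min_leaf=5):
    """Regression tree: a leaf is a mean parameter vector, a node is a tuple."""
    if depth >= max_depth or len(X) < 2 * min_leaf:
        return mean_vector(Y)
    best = None
    n_features = len(X[0])
    # Random feature subset at each split (the 'random' in random forest).
    for f in rng.sample(range(n_features), max(1, n_features // 2)):
        values = sorted(set(x[f] for x in X))
        for thr in rng.sample(values, min(10, len(values))):
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([Y[i] for i in left]) + sse([Y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return mean_vector(Y)
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [Y[i] for i in left],
                       depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in right], [Y[i] for i in right],
                       depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def train_forest(X, Y, n_trees=20):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap resample
        forest.append(build_tree([X[i] for i in idx], [Y[i] for i in idx]))
    return forest

# Offline step: precompute a grid of noisy model spectra and train once.
params = [[rng.random(), rng.random()] for _ in range(300)]
spectra = [[v + rng.gauss(0.0, 0.02) for v in toy_spectrum(*to_physical(p))]
           for p in params]
forest = train_forest(spectra, params)

# Online step: each tree votes; the spread of votes approximates a posterior.
observed = toy_spectrum(*to_physical([0.5, 0.5]))
samples = [to_physical(predict_tree(tree, observed)) for tree in forest]
median_temperature = statistics.median([s[0] for s in samples])
median_log_x = statistics.median([s[1] for s in samples])
print(median_temperature, median_log_x)
```

Because the forest is trained once on the grid, the online step for any further spectrum is only a set of tree traversals, which is the sense in which the computational burden is shifted offline.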


Fig. 1: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.
Fig. 2: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the nested-sampling retrieval.
Fig. 3: True versus random forest predicted values of the five parameters in our transmission spectrum model.
Fig. 4: Feature importance plots associated with the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.


References

  1. Banerji, M. et al. Galaxy Zoo: reproducing galaxy morphologies via machine learning. Mon. Not. R. Astron. Soc. 406, 342–353 (2010).
  2. Graff, P., Feroz, F., Hobson, M. P. & Lasenby, A. SKYNET: an efficient and robust neural network training tool for machine learning in astronomy. Mon. Not. R. Astron. Soc. 441, 1741–1759 (2014).
  3. Pearson, K. A., Palafox, L. & Griffith, C. A. Searching for exoplanets using artificial intelligence. Mon. Not. R. Astron. Soc. 474, 478–491 (2018).
  4. Madhusudhan, N. & Seager, S. A temperature and abundance retrieval method for exoplanet atmospheres. Astrophys. J. 707, 24–39 (2009).
  5. Benneke, B. & Seager, S. Atmospheric retrieval for super-Earths: uniquely constraining the atmospheric composition with transmission spectroscopy. Astrophys. J. 753, 100 (2012).
  6. Line, M. R. et al. Information content of exoplanetary transit spectra: an initial look. Astrophys. J. 749, 93 (2012).
  7. Rodgers, C. D. Inverse Methods for Atmospheric Sounding: Theory and Practice (World Scientific, Singapore, 2000).
  8. Waldmann, I. P. Dreaming of atmospheres. Astrophys. J. 820, 107 (2016).
  9. Ho, T. K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998).
  10. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  11. Heng, K. & Kitzmann, D. The theory of transmission spectra revisited: a semi-analytical method for interpreting WFC3 data and an unresolved challenge. Mon. Not. R. Astron. Soc. 470, 2972–2981 (2017).
  12. Kreidberg, L. et al. A detection of water in the transmission spectrum of the hot Jupiter WASP-12b and implications for its atmospheric composition. Astrophys. J. 814, 66 (2015).
  13. Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. Classification and Regression Trees (Chapman & Hall/CRC, Boca Raton, 1984).
  14. Kelleher, J. D., Mac Namee, B. & D’Arcy, A. Fundamentals of Machine Learning for Predictive Data Analytics (MIT Press, Cambridge, MA, 2015).
  15. Criminisi, A., Shotton, J. & Konukoglu, E. Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning Technical Report TR-2011-114 (Microsoft Research, 2011).
  16. Trotta, R. Bayes in the sky: Bayesian inference and model selection in cosmology. Contemp. Phys. 49, 71–104 (2008).
  17. Skilling, J. et al. Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833–859 (2006).
  18. Feroz, F. & Hobson, M. P. Multimodal nested sampling: an efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses. Mon. Not. R. Astron. Soc. 384, 449–463 (2008).
  19. Batalha, N. E. & Line, M. R. Information content analysis for selection of optimal JWST observing modes for transiting exoplanet atmospheres. Astron. J. 153, 151 (2017).
  20. Howe, A. R., Burrows, A. & Deming, D. An information-theoretic approach to optimize JWST observations and retrievals of transiting exoplanet atmospheres. Astrophys. J. 835, 96 (2017).
  21. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, New York, 2001).
  22. Sznitman, R., Becker, C., Fleuret, F. & Fua, P. Fast object detection with entropy-driven evaluation. In Proc. 2013 IEEE Conference on Computer Vision and Pattern Recognition 3270–3277 (IEEE, 2013).
  23. Zikic, D., Glocker, B. & Criminisi, A. Encoding atlases by randomized classification forests for efficient multi-atlas label propagation. Med. Image Anal. 18, 1262–1273 (2014).
  24. Rieke, N. et al. Surgical tool tracking and pose estimation in retinal microsurgery. In Proc. 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 266–273 (Lecture Notes in Computer Science 9349, Springer, 2015).
  25. Zhang, L., Varadarajan, J., Suganthan, P. N., Ahuja, N. & Moulin, P. Robust visual tracking using oblique random forests. In Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition 5825–5834 (IEEE, 2017).
  26. Greene, T. P. et al. Characterizing transiting exoplanet atmospheres with JWST. Astrophys. J. 817, 17 (2016).
  27. Marley, M. S. et al. Atmospheric, evolutionary, and spectral models of the brown dwarf Gliese 229 B. Science 272, 1919–1921 (1996).
  28. Burrows, A. et al. A non-gray theory of extrasolar giant planets and brown dwarfs. Astrophys. J. 491, 856–875 (1997).
  29. Baraffe, I., Chabrier, G., Allard, F. & Hauschildt, P. H. Evolutionary models for low-mass stars and brown dwarfs: uncertainties and limits at very young ages. Astron. Astrophys. 382, 563–572 (2002).
  30. Kitzmann, D. & Heng, K. Optical properties of potential condensates in exoplanetary atmospheres. Mon. Not. R. Astron. Soc. 475, 94–107 (2018).
  31. Feroz, F., Hobson, M. P. & Bridges, M. MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics. Mon. Not. R. Astron. Soc. 398, 1601–1614 (2009).
  32. Buchner, J. et al. X-ray spectral modelling of the AGN obscuring region in the CDFS: Bayesian model selection and catalogue. Astron. Astrophys. 564, A125 (2014).
  33. Barber, R. J., Tennyson, J., Harris, G. J. & Tolchenov, R. N. A high-accuracy computed water line list. Mon. Not. R. Astron. Soc. 368, 1087–1094 (2006).
  34. Barber, R. J. et al. ExoMol line lists—III. An improved hot rotation-vibration line list for HCN and HNC. Mon. Not. R. Astron. Soc. 437, 1828–1835 (2014).
  35. Yurchenko, S. N., Barber, R. J. & Tennyson, J. A variationally computed line list for hot NH3. Mon. Not. R. Astron. Soc. 413, 1828–1834 (2011).
  36. Yurchenko, S. N., Tennyson, J., Barber, R. J. & Thiel, W. Vibrational transition moments of CH4 from first principles. J. Mol. Spectrosc. 291, 69–76 (2013).
  37. Yurchenko, S. N. & Tennyson, J. ExoMol line lists—IV. The rotation-vibration spectrum of methane up to 1500 K. Mon. Not. R. Astron. Soc. 440, 1649–1661 (2014).
  38. Rothman, L. S. et al. The HITRAN molecular spectroscopic database and HAWKS (HITRAN atmospheric workstation): 1996 edition. J. Quant. Spectrosc. Radiat. Transf. 60, 665–710 (1998).
  39. Grimm, S. L. & Heng, K. HELIOS-K: an ultrafast, open-source opacity calculator for radiative transfer. Astrophys. J. 808, 182 (2015).
  40. Hebb, L. et al. WASP-12b: the hottest transiting extrasolar planet yet discovered. Astrophys. J. 693, 1920–1928 (2009).
  41. Burrows, A. & Sharp, C. M. Chemical equilibrium abundances in brown dwarf and extrasolar giant planet atmospheres. Astrophys. J. 512, 843–863 (1999).
  42. Heng, K. & Tsai, S.-M. Analytical models of exoplanetary atmospheres. III. Gaseous C-H-O-N chemistry with nine molecules. Astrophys. J. 829, 104 (2016).
  43. Line, M. R. et al. A systematic retrieval analysis of secondary eclipse spectra. I. A comparison of atmospheric retrieval techniques. Astrophys. J. 775, 137 (2013).
  44. Waldmann, I. P. et al. Tau-REx I: a next generation retrieval code for exoplanetary atmospheres. Astrophys. J. 802, 107 (2015).
  45. Lavie, B. et al. HELIOS–RETRIEVAL: an open-source, nested sampling atmospheric retrieval code; application to the HR 8799 exoplanets and inferred constraints for planet formation. Astron. J. 154, 91 (2017).
  46. Sharp, C. M. & Burrows, A. Atomic and molecular opacities for brown dwarf and giant planet atmospheres. Astrophys. J. 168, 140–166 (2007).



Acknowledgements

We acknowledge partial financial support from the Center for Space and Habitability (P.M.-N. and K.H.), the University of Bern International 2021 PhD Fellowship (C.F.), the PlanetS National Center of Competence in Research (K.H.), the Swiss National Science Foundation (R.S., C.F. and K.H.), the European Research Council via a Consolidator Grant (K.H.) and the Swiss-based MERAC Foundation (K.H.).

Author contributions

P.M.-N. led the development of computer codes used for this study, performed the machine-learning-related calculations, participated in the experimental design and made the majority of the figures. C.F. computed the grid of atmospheric models used as the training set, participated in the experimental design and performed the nested-sampling retrievals. R.S. co-led the scientific vision and experimental design and co-wrote the manuscript. K.H. co-led the scientific vision and experimental design and led the writing and typesetting of the manuscript.

Author information



Corresponding author

Correspondence to Kevin Heng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Table 1, Supplementary Figures 1–3

About this article

Cite this article

Márquez-Neila, P., Fisher, C., Sznitman, R. et al. Supervised machine learning for analysing spectra of exoplanetary atmospheres. Nat Astron 2, 719–724 (2018).
