Letter

Supervised machine learning for analysing spectra of exoplanetary atmospheres



The use of machine learning is becoming ubiquitous in astronomy [1–3], but remains rare in the study of the atmospheres of exoplanets. Given the spectrum of an exoplanetary atmosphere, a multi-parameter space is swept through in real time to find the best-fit model [4–6]. Known as atmospheric retrieval, this technique originates in the Earth and planetary sciences [7]. Such methods are very time-consuming, and by necessity there is a compromise between physical and chemical realism and computational feasibility. Machine learning has previously been used to determine which molecules to include in the model, but the retrieval itself was still performed using standard methods [8]. Here, we report an adaptation of the ‘random forest’ method of supervised machine learning [9,10], trained on a precomputed grid of atmospheric models, which retrieves full posterior distributions of the abundances of molecules and the cloud opacity. The use of a precomputed grid allows a large part of the computational burden to be shifted offline. We demonstrate our technique on a transmission spectrum of the hot gas-giant exoplanet WASP-12b using a five-parameter model (temperature, a constant cloud opacity and the volume mixing ratios or relative abundances of molecules of water, ammonia and hydrogen cyanide) [11]. We obtain results consistent with the standard nested-sampling retrieval method. We also estimate the sensitivity of the measured spectrum to the model parameters, and we are able to quantify the information content of the spectrum. Our method can be straightforwardly applied using more sophisticated atmospheric models to interpret an ensemble of spectra without having to retrain the random forest.
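The idea of training a random forest on a precomputed grid and then reading off a distribution of parameter values can be illustrated with a small sketch. This is not the authors' code. It assumes a made-up two-parameter forward model (`toy_spectrum`, with hypothetical `abundance` and `cloud` parameters and invented `BAND_WEIGHTS`) in place of a real atmospheric model, and a simplified forest that thins the candidate split thresholds for speed. It uses the spread of per-tree predictions as a crude stand-in for a posterior distribution. All function names are illustrative.

```python
import random
import statistics

# Made-up band strengths: how strongly each of 4 spectral bins responds
# to the toy "abundance" parameter. Purely illustrative.
BAND_WEIGHTS = [0.1, 1.0, 0.6, 0.2]


def toy_spectrum(abundance, cloud, rng, noise=0.02):
    """Hypothetical forward model: 2 parameters -> noisy 4-bin spectrum."""
    return [w * abundance + 0.2 * cloud + rng.gauss(0.0, noise)
            for w in BAND_WEIGHTS]


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(X, y, rng, depth=0, max_depth=6, min_leaf=5):
    """Grow one regression tree; internal nodes are tuples, leaves are floats."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    n_features = len(X[0])
    best = None
    # Consider only a random subset of the features at each node.
    for f in rng.sample(range(n_features), max(1, n_features // 2)):
        values = sorted({row[f] for row in X})
        step = max(1, len(values) // 12)  # thin out candidate thresholds
        for t in values[::step]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split: make this node a leaf
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f, t,
        build_tree([X[i] for i in li], [y[i] for i in li],
                   rng, depth + 1, max_depth, min_leaf),
        build_tree([X[i] for i in ri], [y[i] for i in ri],
                   rng, depth + 1, max_depth, min_leaf),
    )


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, rng, n_trees=20):
    """Train each tree on a bootstrap resample of the grid."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


rng = random.Random(0)

# "Offline" step: precompute a grid of noisy model spectra with known parameters.
grid_params = [(rng.random(), rng.random()) for _ in range(200)]
X_train = [toy_spectrum(a, c, rng) for a, c in grid_params]
y_train = [a for a, _ in grid_params]  # retrieve the abundance-like parameter
forest = train_forest(X_train, y_train, rng)

# "Online" step: apply the trained forest to a mock observed spectrum.
observed = toy_spectrum(0.7, 0.3, rng)
tree_preds = [predict_tree(tree, observed) for tree in forest]
estimate = statistics.mean(tree_preds)
spread = statistics.stdev(tree_preds)
print(f"retrieved abundance parameter: {estimate:.2f} +/- {spread:.2f}")
```

Training is the expensive step and happens once. The retrieval itself is then a cheap pass through the trees, which is the sense in which the computational burden moves offline. A real application would presumably use an established random forest implementation, a physically motivated model grid and a more careful construction of the posterior than the per-tree spread used here.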

Fig. 1: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.
Fig. 2: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the nested-sampling retrieval.
Fig. 3: True versus random forest predicted values of the five parameters in our transmission spectrum model.
Fig. 4: Feature importance plots associated with the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.

References

  1. Banerji, M. et al. Galaxy Zoo: reproducing galaxy morphologies via machine learning. Mon. Not. R. Astron. Soc. 406, 342–353 (2010).
  2. Graff, P., Feroz, F., Hobson, M. P. & Lasenby, A. SKYNET: an efficient and robust neural network training tool for machine learning in astronomy. Mon. Not. R. Astron. Soc. 441, 1741–1759 (2014).
  3. Pearson, K. A., Palafox, L. & Griffith, C. A. Searching for exoplanets using artificial intelligence. Mon. Not. R. Astron. Soc. 474, 478–491 (2018).
  4. Madhusudhan, N. & Seager, S. A temperature and abundance retrieval method for exoplanet atmospheres. Astrophys. J. 707, 24–39 (2009).
  5. Benneke, B. & Seager, S. Atmospheric retrieval for super-Earths: uniquely constraining the atmospheric composition with transmission spectroscopy. Astrophys. J. 753, 100 (2012).
  6. Line, M. R. et al. Information content of exoplanetary transit spectra: an initial look. Astrophys. J. 749, 93 (2012).
  7. Rodgers, C. D. Inverse Methods for Atmospheric Sounding: Theory and Practice (World Scientific, Singapore, 2000).
  8. Waldmann, I. P. Dreaming of atmospheres. Astrophys. J. 820, 107 (2016).
  9. Ho, T. K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998).
  10. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
  11. Heng, K. & Kitzmann, D. The theory of transmission spectra revisited: a semi-analytical method for interpreting WFC3 data and an unresolved challenge. Mon. Not. R. Astron. Soc. 470, 2972–2981 (2017).
  12. Kreidberg, L. et al. A detection of water in the transmission spectrum of the hot Jupiter WASP-12b and implications for its atmospheric composition. Astrophys. J. 814, 66 (2015).
  13. Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. Classification and Regression Trees (Chapman & Hall/CRC, Boca Raton, 1984).
  14. Kelleher, J. D., Mac Namee, B. & D’Arcy, A. Fundamentals of Machine Learning for Predictive Data Analytics (MIT Press, Cambridge, MA, 2015).
  15. Criminisi, A., Shotton, J. & Konukoglu, E. Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning Technical Report TR-2011-114 (Microsoft Research, 2011).
  16. Trotta, R. Bayes in the sky: Bayesian inference and model selection in cosmology. Contemp. Phys. 49, 71–104 (2008).
  17. Skilling, J. et al. Nested sampling for general Bayesian computation. Bayesian Anal. 1, 833–859 (2006).
  18. Feroz, F. & Hobson, M. P. Multimodal nested sampling: an efficient and robust alternative to Markov Chain Monte Carlo methods for astronomical data analyses. Mon. Not. R. Astron. Soc. 384, 449–463 (2008).
  19. Batalha, N. E. & Line, M. R. Information content analysis for selection of optimal JWST observing modes for transiting exoplanet atmospheres. Astron. J. 153, 151 (2017).
  20. Howe, A. R., Burrows, A. & Deming, D. An information-theoretic approach to optimize JWST observations and retrievals of transiting exoplanet atmospheres. Astrophys. J. 835, 96 (2017).
  21. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, New York, 2001).
  22. Sznitman, R., Becker, C., Fleuret, F. & Fua, P. Fast object detection with entropy-driven evaluation. In Proc. 2013 IEEE Conference on Computer Vision and Pattern Recognition 3270–3277 (IEEE, 2013).
  23. Zikic, D., Glocker, B. & Criminisi, A. Encoding atlases by randomized classification forests for efficient multi-atlas label propagation. Med. Image Anal. 18, 1262–1273 (2014).
  24. Rieke, N. et al. Surgical tool tracking and pose estimation in retinal microsurgery. In Proc. 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 266–273 (Lecture Notes in Computer Science 9349, Springer, 2015).
  25. Zhang, L., Varadarajan, J., Suganthan, P. N., Ahuja, N. & Moulin, P. Robust visual tracking using oblique random forests. In Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition 5825–5834 (IEEE, 2017).
  26. Greene, T. P. et al. Characterizing transiting exoplanet atmospheres with JWST. Astrophys. J. 817, 17 (2016).
  27. Marley, M. S. et al. Atmospheric, evolutionary, and spectral models of the brown dwarf Gliese 229 B. Science 272, 1919–1921 (1996).
  28. Burrows, A. et al. A non-gray theory of extrasolar giant planets and brown dwarfs. Astrophys. J. 491, 856–875 (1997).
  29. Baraffe, I., Chabrier, G., Allard, F. & Hauschildt, P. H. Evolutionary models for low-mass stars and brown dwarfs: uncertainties and limits at very young ages. Astron. Astrophys. 382, 563–572 (2002).
  30. Kitzmann, D. & Heng, K. Optical properties of potential condensates in exoplanetary atmospheres. Mon. Not. R. Astron. Soc. 475, 94–107 (2018).
  31. Feroz, F., Hobson, M. P. & Bridges, M. MultiNest: an efficient and robust Bayesian inference tool for cosmology and particle physics. Mon. Not. R. Astron. Soc. 398, 1601–1614 (2009).
  32. Buchner, J. et al. X-ray spectral modelling of the AGN obscuring region in the CDFS: Bayesian model selection and catalogue. Astron. Astrophys. 564, A125 (2014).
  33. Barber, R. J., Tennyson, J., Harris, G. J. & Tolchenov, R. N. A high-accuracy computed water line list. Mon. Not. R. Astron. Soc. 368, 1087–1094 (2006).
  34. Barber, R. J. et al. ExoMol line lists—III. An improved hot rotation-vibration line list for HCN and HNC. Mon. Not. R. Astron. Soc. 437, 1828–1835 (2014).
  35. Yurchenko, S. N., Barber, R. J. & Tennyson, J. A variationally computed line list for hot NH3. Mon. Not. R. Astron. Soc. 413, 1828–1834 (2011).
  36. Yurchenko, S. N., Tennyson, J., Barber, R. J. & Thiel, W. Vibrational transition moments of CH4 from first principles. J. Mol. Spectrosc. 291, 69–76 (2013).
  37. Yurchenko, S. N. & Tennyson, J. ExoMol line lists—IV. The rotation-vibration spectrum of methane up to 1500 K. Mon. Not. R. Astron. Soc. 440, 1649–1661 (2014).
  38. Rothman, L. S. et al. The HITRAN molecular spectroscopic database and HAWKS (HITRAN atmospheric workstation): 1996 edition. J. Quant. Spectrosc. Radiat. Transf. 60, 665–710 (1998).
  39. Grimm, S. L. & Heng, K. HELIOS-K: an ultrafast, open-source opacity calculator for radiative transfer. Astrophys. J. 808, 182 (2015).
  40. Hebb, L. et al. WASP-12b: the hottest transiting extrasolar planet yet discovered. Astrophys. J. 693, 1920–1928 (2009).
  41. Burrows, A. & Sharp, C. M. Chemical equilibrium abundances in brown dwarf and extrasolar giant planet atmospheres. Astrophys. J. 512, 843–863 (1999).
  42. Heng, K. & Tsai, S.-M. Analytical models of exoplanetary atmospheres. III. Gaseous C-H-O-N chemistry with nine molecules. Astrophys. J. 829, 104 (2016).
  43. Line, M. R. et al. A systematic retrieval analysis of secondary eclipse spectra. I. A comparison of atmospheric retrieval techniques. Astrophys. J. 775, 137 (2013).
  44. Waldmann, I. P. et al. Tau-REx I: a next generation retrieval code for exoplanetary atmospheres. Astrophys. J. 802, 107 (2015).
  45. Lavie, B. et al. HELIOS–RETRIEVAL: an open-source, nested sampling atmospheric retrieval code; application to the HR 8799 exoplanets and inferred constraints for planet formation. Astron. J. 154, 91 (2017).
  46. Sharp, C. M. & Burrows, A. Atomic and molecular opacities for brown dwarf and giant planet atmospheres. Astrophys. J. Suppl. Ser. 168, 140–166 (2007).


Acknowledgements

We acknowledge partial financial support from the Center for Space and Habitability (P.M.-N. and K.H.), the University of Bern International 2021 PhD Fellowship (C.F.), the PlanetS National Center of Competence in Research (K.H.), the Swiss National Science Foundation (R.S., C.F. and K.H.), the European Research Council via a Consolidator Grant (K.H.) and the Swiss-based MERAC Foundation (K.H.).

Author contributions

P.M.-N. led the development of computer codes used for this study, performed the machine-learning-related calculations, participated in the experimental design and made the majority of the figures. C.F. computed the grid of atmospheric models used as the training set, participated in the experimental design and performed the nested-sampling retrievals. R.S. co-led the scientific vision and experimental design and co-wrote the manuscript. K.H. co-led the scientific vision and experimental design and led the writing and typesetting of the manuscript.

Author information

Corresponding author

Correspondence to Kevin Heng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table 1, Supplementary Figures 1–3

About this article

Cite this article

Márquez-Neila, P., Fisher, C., Sznitman, R. et al. Supervised machine learning for analysing spectra of exoplanetary atmospheres. Nat Astron 2, 719–724 (2018).
