A recurrent neural network for classification of unevenly sampled variable stars

Abstract

Astronomical surveys of celestial sources produce streams of noisy time series measuring flux versus time (‘light curves’). Unlike in many other physical domains, however, large (and source-specific) temporal gaps in data arise naturally due to intranight cadence choices as well as diurnal and seasonal constraints [1–5]. With nightly observations of millions of variable stars and transients from upcoming surveys [4,6], efficient and accurate discovery and classification techniques on noisy, irregularly sampled data must be employed with minimal human-in-the-loop involvement. Machine learning for inference tasks on such data traditionally requires the laborious hand-coding of domain-specific numerical summaries of raw data (‘features’) [7]. Here, we present a novel unsupervised autoencoding recurrent neural network [8] that makes explicit use of sampling times and known heteroskedastic noise properties. When trained on optical variable star catalogues, this network produces supervised classification models that rival other best-in-class approaches. We find that autoencoded features learned in one time-domain survey perform nearly as well when applied to another survey. These networks can continue to learn from new unlabelled observations and may be used in other unsupervised tasks, such as forecasting and anomaly detection.
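
As an illustration of what using sampling times and heteroskedastic noise explicitly could look like in practice, the sketch below prepares an irregularly sampled light curve as (time gap, value) pairs and scores a reconstruction with an inverse-variance weighted squared error. This is a minimal sketch under stated assumptions, not the implementation described in the article: the function names `to_network_inputs` and `weighted_mse`, the choice of time differences as inputs, and the normalization by the sum of weights are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def to_network_inputs(
    times: Sequence[float], values: Sequence[float]
) -> List[Tuple[float, float]]:
    """Pair each measurement with the time elapsed since the previous one.

    Assumption for this sketch: the sequence model consumes (delta_t, value)
    pairs, so gaps between observations are visible to it directly.
    The first gap is defined as zero.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    pairs: List[Tuple[float, float]] = []
    previous = times[0] if times else 0.0
    for t, v in zip(times, values):
        pairs.append((t - previous, v))
        previous = t
    return pairs


def weighted_mse(
    observed: Sequence[float],
    reconstructed: Sequence[float],
    errors: Sequence[float],
) -> float:
    """Squared reconstruction error weighted by 1 / sigma^2 per point.

    Assumption for this sketch: each measurement has a known, strictly
    positive uncertainty, and the result is normalized by the sum of weights
    so that noisier points contribute less to the loss.
    """
    if not (len(observed) == len(reconstructed) == len(errors)):
        raise ValueError("all inputs must have the same length")
    weights = [1.0 / (e * e) for e in errors]
    total = sum(
        w * (o - r) ** 2 for w, o, r in zip(weights, observed, reconstructed)
    )
    return total / sum(weights)


if __name__ == "__main__":
    # Three observations with uneven spacing (arbitrary illustrative numbers).
    print(to_network_inputs([0.0, 1.0, 4.0], [10.0, 10.5, 9.5]))
    # The second point has twice the uncertainty, so its residual counts less.
    print(weighted_mse([1.0, 2.0], [1.0, 4.0], [1.0, 2.0]))
```

Weighting by inverse variance is one common way to keep a reconstruction loss from being dominated by the noisiest measurements; other weightings would fit the same interface.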

Fig. 1: Diagram of an RNN encoder–decoder architecture for irregularly sampled time series data.
Fig. 2: Example autoencoder reconstructions of ASAS light curves from 64-dimensional feature representation.
Fig. 3: Confusion matrices for autoencoder-feature random forest classifiers for labelled variable star light curves for each survey.

References

  1. Levine, A. M. et al. First results from the All-Sky Monitor on the Rossi X-Ray Timing Explorer. Astrophys. J. Lett. 469, L33–L36 (1996).

  2. Pojmanski, G. The All Sky Automated Survey. Catalog of variable stars. I. 0h–6h quarter of the southern hemisphere. Acta Astronomica 52, 397–427 (2002).

  3. Murphy, T. et al. VAST: an ASKAP survey for variables and slow transients. Publ. Astron. Soc. Aust. 30, e006 (2013).

  4. Ridgway, S. T., Matheson, T., Mighell, K. J., Olsen, K. A. & Howell, S. B. The variable sky of deep synoptic surveys. Astrophys. J. 796, 53 (2014).

  5. Djorgovski, S. et al. Real-time data mining of massive data streams from synoptic sky surveys. Future Gener. Comput. Syst. 59, 95–104 (2016).

  6. Kantor, J. Transient alerts in LSST. in The Third Hot-wiring the Transient Universe Workshop (eds Wozniak, P. R., Graham, M. J., Mahabal, A. A. and Seaman, R.) 19–26 (Los Alamos National Laboratory, 2014).

  7. Bloom, J. S. & Richards, J. W. Data mining and machine learning in time-domain discovery and classification. in Advances in Machine Learning and Data Mining for Astronomy (eds Way, M. J., Scargle, J. D., Ali, K. M. and Srivastava, A. N.) 89–112 (CRC, New York, 2012).

  8. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

  9. Richards, J. W. et al. Construction of a calibrated probabilistic classification catalog: application to 50k variable sources in the All-Sky Automated Survey. Astrophys. J. Suppl. Ser. 203, 32 (2012).

  10. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

  11. Richards, J. W. et al. On machine-learned classification of variable stars with sparse and noisy time-series data. Astrophys. J. 733, 10 (2011).

  12. Naul, B., van der Walt, S., Crellin-Quick, A., Bloom, J. S. & Pérez, F. Cesium: open-source platform for time-series inference. in Proc. 15th Python in Science Conf. (eds Benthall, S. and Rostrup, S.) 27–35 (SciPy, Austin, TX, 2016).

  13. Nun, I. et al. FATS: Feature Analysis for Time Series. Preprint at https://arxiv.org/abs/1506.00010 (2015).

  14. Dubath, P. et al. Random forest automated supervised classification of Hipparcos periodic variable stars. Mon. Not. R. Astron. Soc. 414, 2602–2617 (2011).

  15. Nun, I., Pichara, K., Protopapas, P. & Kim, D.-W. Supervised detection of anomalous light curves in massive astronomical catalogs. Astrophys. J. 793, 23 (2014).

  16. Miller, A. A. et al. A machine-learning method to infer fundamental stellar parameters from photometric light curves. Astrophys. J. 798, 122 (2015).

  17. Kügler, S. D., Gianniotis, N. & Polsterer, K. L. Featureless classification of light curves. Mon. Not. R. Astron. Soc. 451, 3385–3392 (2015).

  18. Kim, D.-W. & Bailer-Jones, C. A. A package for the automated classification of periodic variable stars. Astron. Astrophys. 587, A18 (2016).

  19. Sesar, B. et al. Exploring the variable sky with LINEAR. II. Halo structure and substructure traced by RR Lyrae stars to 30 kpc. Astron. J. 146, 21 (2013).

  20. Palaversa, L. et al. Exploring the variable sky with LINEAR. III. Classification of periodic light curves. Astron. J. 146, 101 (2013).

  21. Alcock, C. et al. The MACHO project LMC variable star inventory. II. LMC RR Lyrae stars—pulsational characteristics and indications of a global youth of the LMC. Astron. J. 111, 1146–1155 (1996).

  22. Mackenzie, C., Pichara, K. & Protopapas, P. Clustering-based feature learning on variable stars. Astrophys. J. 820, 138 (2016).

  23. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  24. Charnock, T. & Moss, A. Deep recurrent neural networks for supernovae classification. Preprint at https://arxiv.org/abs/1606.07442 (2016).

  25. Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Preprint at https://arxiv.org/abs/1606.01865 (2016).

  26. Lipton, Z. C., Kale, D. C., Elkan, C. & Wetzell, R. Learning to diagnose with LSTM recurrent neural networks. Preprint at https://arxiv.org/abs/1511.03677 (2015).

  27. Friedman, J. H. & Silverman, B. W. Flexible parsimonious smoothing and additive modeling. Technometrics 31, 3–21 (1989).

  28. Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Preprint at https://arxiv.org/abs/1406.1078 (2014).

  29. Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Sig. Process. 45, 2673–2681 (1997).

  30. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).

Acknowledgements

We thank Y. LeCun and F. El Gabaly for helpful discussions and A. Culich for computational assistance. This work is supported by the Gordon and Betty Moore Foundation Data-Driven Discovery and National Science Foundation BIGDATA grant number 1251274. Computation was provided by the Pacific Research Platform programme through the National Science Foundation Office of Advanced Cyberinfrastructure (number 1541349), Office of Cyberinfrastructure (number 1246396), University of California Office of the President, Calit2 and Berkeley Research Computing at University of California Berkeley.

Author information

Contributions

B.N. implemented and trained the networks, assembled the machine learning results and generated the first drafts of the paper and figures. J.S.B. conceived of the project, assembled the astronomical light curves and oversaw the supervised training portions. F.P. provided theoretical input. S.v.d.W. discussed the results and commented on the paper.

Corresponding author

Correspondence to Brett Naul.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Supplementary Text, Supplementary Figures 1–11 and Supplementary References

About this article

Cite this article

Naul, B., Bloom, J.S., Pérez, F. et al. A recurrent neural network for classification of unevenly sampled variable stars. Nat Astron 2, 151–155 (2018). https://doi.org/10.1038/s41550-017-0321-z
