Astronomical surveys of celestial sources produce streams of noisy time series measuring flux versus time (‘light curves’). Unlike in many other physical domains, however, large (and source-specific) temporal gaps in data arise naturally due to intranight cadence choices as well as diurnal and seasonal constraints [1–5]. With nightly observations of millions of variable stars and transients from upcoming surveys [4,6], efficient and accurate discovery and classification techniques on noisy, irregularly sampled data must be employed with minimal human-in-the-loop involvement. Machine learning for inference tasks on such data traditionally requires the laborious hand-coding of domain-specific numerical summaries of raw data (‘features’) [7]. Here, we present a novel unsupervised autoencoding recurrent neural network [8] that makes explicit use of sampling times and known heteroskedastic noise properties. When trained on optical variable star catalogues, this network produces supervised classification models that rival other best-in-class approaches. We find that autoencoded features learned in one time-domain survey perform nearly as well when applied to another survey. These networks can continue to learn from new unlabelled observations and may be used in other unsupervised tasks, such as forecasting and anomaly detection.
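As a rough illustration of what "explicit use of sampling times and known heteroskedastic noise" could look like in practice, the sketch below builds a per-sample input of (time gap, flux) pairs and scores a reconstruction with an uncertainty-weighted squared error. It is a minimal sketch under stated assumptions, not the network described here: the function names `make_inputs` and `weighted_mse`, the choice of time differences as the timing input, the specific form of the weighting, and the toy numbers are all hypothetical.

```python
import numpy as np

def make_inputs(times, fluxes):
    """Pair each flux value with the time elapsed since the previous sample.

    Assumption: irregular sampling is exposed to the model as time gaps;
    the first gap is defined as 0.
    """
    times = np.asarray(times, dtype=float)
    fluxes = np.asarray(fluxes, dtype=float)
    dt = np.diff(times, prepend=times[0])
    return np.stack([dt, fluxes], axis=1)  # shape (n_samples, 2)

def weighted_mse(fluxes, reconstruction, sigmas):
    """Mean squared residual in units of each point's measurement error.

    Assumption: noisier points (larger sigma) should contribute less to
    the reconstruction loss.
    """
    fluxes = np.asarray(fluxes, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.mean(((fluxes - reconstruction) / sigmas) ** 2))

# Toy light curve with a large seasonal-style gap before the last point.
times = [0.0, 0.9, 1.1, 30.4]
fluxes = [10.0, 10.5, 9.8, 10.2]
sigmas = [0.1, 0.1, 0.5, 0.2]

inputs = make_inputs(times, fluxes)
# A reconstruction that misses every point by exactly one sigma.
reconstruction = np.asarray(fluxes) + np.asarray(sigmas)
loss = weighted_mse(fluxes, reconstruction, sigmas)
```

In this toy case every point is off by one standard deviation, so each contributes equally to the loss even though the absolute errors differ by a factor of five; an unweighted loss would instead be dominated by the noisiest measurement.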
Levine, A. M. et al. First results from the All-Sky Monitor on the Rossi X-Ray Timing Explorer. Astrophys. J. Lett. 469, L33–L36 (1996).
Pojmanski, G. The All Sky Automated Survey. Catalog of variable stars. I. 0h–6h quarter of the southern hemisphere. Acta Astronomica 52, 397–427 (2002).
Murphy, T. et al. VAST: an ASKAP survey for variables and slow transients. Publ. Astron. Soc. Aust. 30, e006 (2013).
Ridgway, S. T., Matheson, T., Mighell, K. J., Olsen, K. A. & Howell, S. B. The variable sky of deep synoptic surveys. Astrophys. J. 796, 53 (2014).
Djorgovski, S. et al. Real-time data mining of massive data streams from synoptic sky surveys. Future Gener. Comput. Syst. 59, 95–104 (2016).
Kantor, J. Transient alerts in LSST. in The Third Hot-wiring the Transient Universe Workshop (eds Wozniak, P. R., Graham, M. J., Mahabal, A. A. & Seaman, R.) 19–26 (Los Alamos National Laboratory, 2014).
Bloom, J. S. & Richards, J. W. Data mining and machine learning in time-domain discovery and classification. in Advances in Machine Learning and Data Mining for Astronomy (eds Way, M. J., Scargle, J. D., Ali, K. M. & Srivastava, A. N.) 89–112 (CRC, New York, 2012).
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Richards, J. W. et al. Construction of a calibrated probabilistic classification catalog: application to 50k variable sources in the All-Sky Automated Survey. Astrophys. J. Suppl. Ser. 203, 32 (2012).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Richards, J. W. et al. On machine-learned classification of variable stars with sparse and noisy time-series data. Astrophys. J. 733, 10 (2011).
Naul, B., van der Walt, S., Crellin-Quick, A., Bloom, J. S. & Pérez, F. Cesium: open-source platform for time-series inference. in Proc. 15th Python in Science Conf. (eds Benthall, S. & Rostrup, S.) 27–35 (SciPy, Austin, TX, 2016).
Nun, I. et al. FATS: Feature Analysis for Time Series. Preprint at https://arxiv.org/abs/1506.00010 (2015).
Dubath, P. et al. Random forest automated supervised classification of Hipparcos periodic variable stars. Mon. Not. R. Astron. Soc. 414, 2602–2617 (2011).
Nun, I., Pichara, K., Protopapas, P. & Kim, D.-W. Supervised detection of anomalous light curves in massive astronomical catalogs. Astrophys. J. 793, 23 (2014).
Miller, A. A. et al. A machine-learning method to infer fundamental stellar parameters from photometric light curves. Astrophys. J. 798, 122 (2015).
Kügler, S. D., Gianniotis, N. & Polsterer, K. L. Featureless classification of light curves. Mon. Not. R. Astron. Soc. 451, 3385–3392 (2015).
Kim, D.-W. & Bailer-Jones, C. A. A package for the automated classification of periodic variable stars. Astron. Astrophys. 587, A18 (2016).
Sesar, B. et al. Exploring the variable sky with LINEAR. II. Halo structure and substructure traced by RR Lyrae stars to 30 kpc. Astron. J. 146, 21 (2013).
Palaversa, L. et al. Exploring the variable sky with LINEAR. III. Classification of periodic light curves. Astron. J. 146, 101 (2013).
Alcock, C. et al. The MACHO project LMC variable star inventory. II. LMC RR Lyrae stars—pulsational characteristics and indications of a global youth of the LMC. Astron. J. 111, 1146–1155 (1996).
Mackenzie, C., Pichara, K. & Protopapas, P. Clustering-based feature learning on variable stars. Astrophys. J. 820, 138 (2016).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Charnock, T. & Moss, A. Deep recurrent neural networks for supernovae classification. Preprint at https://arxiv.org/abs/1606.07442 (2016).
Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y. Recurrent neural networks for multivariate time series with missing values. Preprint at https://arxiv.org/abs/1606.01865 (2016).
Lipton, Z. C., Kale, D. C., Elkan, C. & Wetzell, R. Learning to diagnose with LSTM recurrent neural networks. Preprint at https://arxiv.org/abs/1511.03677 (2015).
Friedman, J. H. & Silverman, B. W. Flexible parsimonious smoothing and additive modeling. Technometrics 31, 3–21 (1989).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. Preprint at https://arxiv.org/abs/1406.1078 (2014).
Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Sig. Process. 45, 2673–2681 (1997).
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
We thank Y. LeCun and F. El Gabaly for helpful discussions and A. Culich for computational assistance. This work is supported by the Gordon and Betty Moore Foundation Data-Driven Discovery and National Science Foundation BIGDATA grant number 1251274. Computation was provided by the Pacific Research Platform programme through the National Science Foundation Office of Advanced Cyberinfrastructure (number 1541349), Office of Cyberinfrastructure (number 1246396), University of California Office of the President, Calit2 and Berkeley Research Computing at University of California Berkeley.
Electronic supplementary material
Supplementary Text, Supplementary Figures 1–11 and Supplementary References