Abstract
All physical laws are described as mathematical relationships between state variables. These variables give a complete and non-redundant description of the relevant system. However, despite the prevalence of computing power and artificial intelligence, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modelling physical phenomena still rely on the assumption that the relevant state variables are already known. A longstanding question is whether it is possible to identify state variables from only high-dimensional observational data. Here we propose a principle for determining how many state variables an observed system is likely to have, and what these variables might be. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables.
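The notion of intrinsic dimension invoked here can be estimated from data with classical nearest-neighbour methods, such as the maximum-likelihood estimator of Levina and Bickel cited in the reference list. The following is a minimal illustrative sketch of that estimator, not the paper's own neural-network pipeline; the function name and the brute-force distance computation are our own choices for brevity.

```python
import numpy as np

def mle_intrinsic_dimension(points, k=10):
    """Levina-Bickel maximum-likelihood intrinsic dimension estimate.

    points: (n, D) array of n samples in ambient dimension D.
    k: number of nearest neighbours used per sample.
    """
    n = len(points)
    # Pairwise Euclidean distances (fine for small n; use a KD-tree for large n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbours
    estimates = []
    for i in range(n):
        T = np.sort(dists[i])[:k]  # distances to the k nearest neighbours, ascending
        # Local MLE of dimension around point i: inverse mean log distance ratio.
        m = (k - 1) / np.sum(np.log(T[k - 1] / T[: k - 1]))
        estimates.append(m)
    return float(np.mean(estimates))
```

For data sampled from a low-dimensional manifold embedded in a high-dimensional ambient space (for example, a 2D Gaussian cloud padded with zero coordinates), the estimate recovers a value near the manifold dimension rather than the ambient dimension.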
Code availability
The open-source code to reproduce our training and evaluation results is available at the Zenodo repository52 and GitHub (https://github.com/BoyuanChen/neural-state-variables).
References
Anderson, P. W. More is different. Science 177, 393–396 (1972).
Thompson, J. M. T. & Stewart, H. B. Nonlinear Dynamics and Chaos (Wiley, 2002).
Hirsch, M. W., Smale, S. & Devaney, R. L. Differential Equations, Dynamical Systems, and an Introduction to Chaos (Academic, 2012).
Kutz, J. N., Brunton, S. L., Brunton, B. W. & Proctor, J. L. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016).
Evans, J. & Rzhetsky, A. Machine science. Science 329, 399–400 (2010).
Fortunato, S. et al. Science of science. Science 359, eaao0185 (2018).
Bongard, J. & Lipson, H. Automated reverse engineering of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 104, 9943–9948 (2007).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
King, R. D., Muggleton, S. H., Srinivasan, A. & Sternberg, M. Structure–activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proc. Natl Acad. Sci. USA 93, 438–442 (1996).
Waltz, D. & Buchanan, B. G. Automating science. Science 324, 43–44 (2009).
King, R. D. et al. The robot scientist Adam. Computer 42, 46–54 (2009).
Langley, P. BACON: a production system that discovers empirical laws. In Proc. Fifth International Joint Conference on Artificial Intelligence Vol. 1 344 (Morgan Kaufmann, 1977).
Langley, P. Rediscovering physics with BACON.3. In Proc. Sixth International Joint Conference on Artificial Intelligence Vol. 1 505–507 (Morgan Kaufmann, 1979).
Crutchfield, J. P. & McNamara, B. Equations of motion from a data series. Complex Syst. 1, 417–452 (1987).
Kevrekidis, I. G. et al. Equation-free, coarse-grained multiscale computation: enabling microscopic simulators to perform system-level analysis. Commun. Math. Sci. 1, 715–762 (2003).
Yao, C. & Bollt, E. M. Modeling and nonlinear parameter estimation with Kronecker product representation for coupled oscillators and spatiotemporal systems. Physica D 227, 78–99 (2007).
Rowley, C. W., Mezić, I., Bagheri, S., Schlatter, P. & Henningson, D. S. Spectral analysis of nonlinear flows. J. Fluid Mech. 641, 115–127 (2009).
Schmidt, M. D. et al. Automated refinement and inference of analytical models for metabolic networks. Phys. Biol. 8, 055011 (2011).
Sugihara, G. et al. Detecting causality in complex ecosystems. Science 338, 496–500 (2012).
Ye, H. et al. Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proc. Natl Acad. Sci. USA 112, E1569–E1576 (2015).
Daniels, B. C. & Nemenman, I. Automated adaptive inference of phenomenological dynamical models. Nat. Commun. 6, 8133 (2015).
Daniels, B. C. & Nemenman, I. Efficient inference of parsimonious phenomenological models of cellular dynamics using S-systems and alternating regression. PLoS ONE 10, e0119821 (2015).
Benner, P., Gugercin, S. & Willcox, K. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev. 57, 483–531 (2015).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).
Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).
Mrowca, D. et al. Flexible neural representation for physics prediction. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) (Curran Associates, 2018).
Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
Baldi, P. & Hornik, K. Neural networks and principal component analysis: learning from examples without local minima. Neural Netw. 2, 53–58 (1989).
Hinton, G. E. & Zemel, R. S. Autoencoders, minimum description length, and Helmholtz free energy. Adv. Neural Inf. Process. Syst. 6, 3 (1994).
Masci, J., Meier, U., Cireşan, D. & Schmidhuber, J. Stacked convolutional autoencoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks 52–59 (Springer, 2011).
Bishop, C. M. Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995).
Camastra, F. & Staiano, A. Intrinsic dimension estimation: advances and open problems. Inf. Sci. 328, 26–41 (2016).
Campadelli, P., Casiraghi, E., Ceruti, C. & Rozza, A. Intrinsic dimension estimation: relevant techniques and a benchmark framework. Math. Probl. Eng. 2015, 759567 (2015).
Levina, E. & Bickel, P. J. Maximum likelihood estimation of intrinsic dimension. In Proc. 17th International Conference on Neural Information Processing Systems 777–784 (MIT Press, 2005).
Rozza, A., Lombardi, G., Ceruti, C., Casiraghi, E. & Campadelli, P. Novel high intrinsic dimensionality estimators. Mach. Learn. 89, 37–65 (2012).
Ceruti, C. et al. DANCo: an intrinsic dimensionality estimator exploiting angle and norm concentration. Pattern Recognit. 47, 2569–2581 (2014).
Hein, M. & Audibert, J.-Y. Intrinsic dimensionality estimation of submanifolds in Rd. In Proc. 22nd International Conference on Machine Learning 289–296 (Association for Computing Machinery, 2005).
Grassberger, P. & Procaccia, I. in The Theory of Chaotic Attractors 170–189 (Springer, 2004).
Pukrittayakamee, A. et al. Simultaneous fitting of a potential-energy surface and its corresponding force fields using feedforward neural networks. J. Chem. Phys. 130, 134101 (2009).
Wu, J., Lim, J. J., Zhang, H., Tenenbaum, J. B. & Freeman, W. T. Physics 101: learning physical object properties from unlabeled videos. In Proc. British Machine Vision Conference (BMVC) (eds Wilson, R. C. et al.) 39.1–39.12 (BMVA Press, 2016).
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Lutter, M., Ritter, C. & Peters, J. Deep Lagrangian networks: using physics as model prior for deep learning. In International Conference on Learning Representations (2019).
Bondesan, R. & Lamacraft, A. Learning symmetries of classical integrable systems. Preprint at https://arxiv.org/abs/1906.04645 (2019).
Greydanus, S., Dzamba, M. & Yosinski, J. Hamiltonian neural networks. Preprint at https://arxiv.org/abs/1906.01563 (2019).
Swischuk, R., Kramer, B., Huang, C. & Willcox, K. Learning physics-based reduced-order models for a single-injector combustion process. AIAA J. 58, 2658–2672 (2020).
Lange, H., Brunton, S. L. & Kutz, J. N. From Fourier to Koopman: spectral methods for long-term time series prediction. J. Mach. Learn. Res. 22, 1–38 (2021).
Mallen, A., Lange, H. & Kutz, J. N. Deep probabilistic Koopman: long-term time-series forecasting under periodic uncertainties. Preprint at https://arxiv.org/abs/2106.06033 (2021).
Chen, B. et al. Dataset for the paper titled Discovering State Variables Hidden in Experimental Data (1.0). Zenodo https://doi.org/10.5281/zenodo.6653856 (2022).
Chen, B. et al. BoyuanChen/neural-state-variables (v1.0). Zenodo https://doi.org/10.5281/zenodo.6629185 (2022).
Acknowledgements
This research was supported in part by NSF AI Institute for Dynamical Systems grant 2112085 (to H.L.), DARPA MTO Lifelong Learning Machines (L2M) Program HR0011-18-2-0020 (to H.L.), NSF NRI grant 1925157 (to H.L.), NSF DMS grant 1937254 (to Q.D.), NSF DMS grant 2012562 (to Q.D.), NSF CCF grant 1704833 (to Q.D.) and DOE ASCR grant DE-SC0022317 (to Q.D.).
Author information
Authors and Affiliations
Contributions
B.C. and H.L. proposed the research; B.C., K.H., H.L. and Q.D. performed experiments and numerical analysis; B.C. and K.H. designed the algorithms; B.C., K.H., I.C. and S.R. collected the dataset; B.C., K.H., H.L. and Q.D. wrote the paper; all authors provided feedback.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Bryan Daniels and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 What state variables describe these dynamical systems?
What state variables describe these dynamical systems? Identifying state variables from raw observational data is a precursor to discovering physical laws. The key challenge is to determine how many variables give a complete and non-redundant description of the system's states, what the candidate variables are, and how the variables depend on each other. Our work studies how to retrieve possible sets of state variables from data distributions non-linearly embedded in the ambient space.
Extended Data Fig. 2 PCA and Neural State Variables visualization.
PCA and Neural State Variables visualization. Here we visualize the symmetrical structures encoded in the Neural State Variables of the single pendulum (A) and the rigid double pendulum (B) after applying the PCA algorithm to them. The colors represent the values of different physical variables. The x axis and y axis represent different components of the Neural State Variables.
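The projection step behind this visualization can be sketched with a plain SVD-based PCA. This is a minimal illustrative sketch, assuming the learned Neural State Variables have been collected into an array of latent vectors (one row per video frame); the function name and array layout are our own choices, not the paper's released code.

```python
import numpy as np

def pca_project(latents, n_components=2):
    """Project latent vectors onto their top principal components.

    latents: (n, d) array, e.g. Neural State Variables over n frames.
    Returns the (n, n_components) scores used for 2D scatter plots.
    """
    centered = latents - latents.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions,
    # ordered by explained variance.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T
```

Coloring the resulting 2D scores by a known physical variable (for example, a pendulum angle), as in the figure, is what reveals whether the learned variables organize into physically meaningful structure.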
Supplementary information
Supplementary Information
Supplementary Figs. 1–12, Discussion and Tables 1–10.
Supplementary Video 1
Overview video.
Supplementary Video 2
ID estimation results.
Supplementary Video 3
Long-term prediction stability results.
Supplementary Video 4
Robust prediction results for the single-pendulum system.
Supplementary Video 5
Robust prediction results for the double-pendulum system.
Source data
Source Data Fig. 3
Source data to reproduce the figure results.
Source Data Fig. 4
Source data to reproduce the figure results.
Source Data Fig. 5
Source data to reproduce the figure results.
Source Data Fig. 6
Source data to reproduce the figure results.
Source Data Extended Data Fig. 2
Source data to produce Extended Data Fig. 2.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, B., Huang, K., Raghupathi, S. et al. Automated discovery of fundamental variables hidden in experimental data. Nat Comput Sci 2, 433–442 (2022). https://doi.org/10.1038/s43588-022-00281-6