Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities

Abstract

The execution and analysis of complex experiments are challenged by the vast dimensionality of the underlying parameter spaces. Although an increase in data-acquisition rates should allow broader querying of the parameter space, the complexity of experiments and the subtle dependence of the model function on input parameters remain daunting owing to the sheer number of variables. New strategies for autonomous data acquisition are being developed, with one promising direction being the use of Gaussian process regression (GPR). GPR is a fast, robust, non-parametric method for function approximation and uncertainty quantification that can be applied directly to autonomous data acquisition. We review GPR-driven autonomous experimentation and illustrate its functionality using real-world examples from large experimental facilities in the USA and France. We introduce the basics of a GPR-driven autonomous loop with a focus on Gaussian processes, and then turn to the infrastructure that needs to be built around GPR to close the loop. Finally, the case studies we discuss show that Gaussian-process-based autonomous data acquisition is a widely applicable method that can facilitate the optimal use of instruments and facilities by enabling the efficient acquisition of high-value datasets.
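
As a minimal illustration of the two quantities GPR delivers, the sketch below computes the posterior mean (the function approximation) and the posterior variance (the uncertainty estimate) of a zero-mean Gaussian process with a squared-exponential kernel, using the standard formulas m(x*) = k*ᵀ(K + σ²I)⁻¹y and σ²(x*) = k(x*, x*) − k*ᵀ(K + σ²I)⁻¹k*. The kernel choice, the fixed hyperparameter values and all function names are illustrative assumptions, not gpCAM's implementation; in practice the hyperparameters would be trained, for example by maximizing the log marginal likelihood.

```python
import numpy as np

def sq_exp_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP evaluated at x_test."""
    K = sq_exp_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test, **kernel_args)        # (n_train, n_test)
    prior_var = np.diag(sq_exp_kernel(x_test, x_test, **kernel_args))
    L = np.linalg.cholesky(K)                                  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                                # v.T @ v = K_s^T K^{-1} K_s
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)                          # clip tiny negative round-off

# Usage: noise-free samples of sin(x), queried on a coarse grid
x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 5.0, 11)
mean, var = gp_posterior(x_train, y_train, x_test, length_scale=1.0)
```

The Cholesky factorization is used instead of an explicit matrix inverse for numerical stability; the cost still grows as O(n³) with the number of measurements n.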

Key points

  • Gaussian process regression (GPR) is a robust statistical, non-parametric technique for uncertainty quantification and function approximation.

  • GPR can be applied directly to autonomous and optimal data acquisition.

  • GPR provides straightforward ways to inject domain knowledge and can easily be customized for feature finding.

  • The gpCAM software tool provides a simple way for practitioners to use GPR for autonomous experimentation.
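
The claim that GPR applies directly to autonomous data acquisition can be illustrated with the simplest closed loop: model the measurements taken so far with a GP, evaluate the posterior variance over candidate positions, and measure next where that variance is largest. This sketch assumes a fixed squared-exponential kernel, a hypothetical measure() function standing in for the instrument, and purely uncertainty-driven exploration; it does not use gpCAM, and real deployments would train the kernel and, as the key points note, may customize the kernel and acquisition function to inject domain knowledge or target features.

```python
import numpy as np

def rbf(a, b, length_scale=0.1):
    # Squared-exponential covariance between 1-D point sets a and b (unit prior variance)
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def posterior_variance(x_meas, x_cand, noise_var=1e-6, length_scale=0.1):
    # GP posterior variance at the candidates: 1 - diag(K_s^T K^{-1} K_s).
    # It depends only on where we have measured, not on the measured values.
    K = rbf(x_meas, x_meas, length_scale) + noise_var * np.eye(len(x_meas))
    K_s = rbf(x_meas, x_cand, length_scale)
    return 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)

def measure(x):
    # Hypothetical stand-in for a real instrument measurement at position x
    return np.sin(6.0 * x)

def autonomous_loop(n_steps=10, n_cand=201):
    candidates = np.linspace(0.0, 1.0, n_cand)  # allowed measurement positions
    x_meas = np.array([0.5])                     # initial measurement
    y_meas = measure(x_meas)                     # values would feed the GP mean in a full loop
    for _ in range(n_steps):
        var = posterior_variance(x_meas, candidates)
        x_next = candidates[np.argmax(var)]      # acquisition: maximum uncertainty
        x_meas = np.append(x_meas, x_next)
        y_meas = np.append(y_meas, measure(x_next))
    return x_meas, y_meas

x_meas, y_meas = autonomous_loop()
print(np.sort(x_meas))
```

Because the posterior variance of a GP depends only on the measurement positions, this purely exploratory loop amounts to a space-filling design; acquisition functions that also use the posterior mean are what let a loop concentrate on features of interest.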

Fig. 1: Schematic of an example autonomous experiment at a synchrotron radiation beam line.
Fig. 2: Key elements for a successful application of gpCAM to infrared mapping.
Fig. 3: Autonomous identification of sample phases, and convergence towards the ground truth for a surrogate sample measurement whose ground-truth spectrum consists of 8,000 points acquired on a uniform mesh.
Fig. 4: Commissioning run of gpCAM at the triple-axis spectrometer ThALES (ref. 57) at the Institut Laue-Langevin in Grenoble, France.

Code availability

The gpCAM code for autonomous steering associated with this Review is available at https://doi.org/10.11578/dc.20210217.5 and https://bitbucket.org/MarcusMichaelNoack/gpcam, and can be installed with the command pip install gpCAM. Any updates will be published in the repository and on the Python Package Index (PyPI). The Takin software is available at https://doi.org/10.1016/j.softx.2021.100667.

References

  1. Peirce, C. S. The fixation of belief. Pop. Sci. Mon. 12, 1–15 (1877).

  2. Peirce, C. S. & Menand, L. How to make our ideas clear. Pop. Sci. Mon. 12, 286–302 (1878).

  3. McKay, M. D., Beckman, R. J. & Conover, W. J. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979).

  4. Fisher, R. A. The arrangement of field experiments. In Breakthroughs in Statistics 82–91 (Springer, 1992).

  5. Settles, B. Active learning literature survey. Technical Reports (University of Wisconsin-Madison, Department of Computer Sciences, 2009).

  6. Krishnakumar, A. Active learning literature survey. Technical Reports 42 (University of California Santa Cruz, 2007).

  7. van de Schoot, R. et al. Bayesian statistics and modelling. Nat. Rev. Methods Primers 1, 1–26 (2021).

  8. Noack, M. M. et al. A Kriging-based approach to autonomous experimentation with applications to X-ray scattering. Sci. Rep. 9, 11809 (2019).

  9. Noack, M. M., Doerk, G. S., Li, R., Fukuto, M. & Yager, K. G. Advances in Kriging-based autonomous X-ray scattering experiments. Sci. Rep. 10, 1325 (2020).

  10. Noack, M. & Zwart, P. Computational strategies to increase efficiency of Gaussian-process-driven autonomous experiments. In 2019 IEEE/ACM 1st Annual Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP) 1–7 (IEEE, 2019).

  11. Noack, M. M. et al. Autonomous materials discovery driven by Gaussian process regression with inhomogeneous measurement noise and anisotropic kernels. Sci. Rep. 10, 17663 (2020).

  12. Wiegart, L. et al. Instrumentation for in situ/operando X-ray scattering studies of polymer additive manufacturing processes. Synchrotron Radiat. News 32, 20–27 (2019).

  13. Frazier, P. I. Bayesian optimization. Recent Adv. Optim. Model. Contemp. Probl. https://doi.org/10.1287/educ.2018.0188 (2018).

  14. Noack, M. gpcam version 6. bitbucket https://bitbucket.org/MarcusMichaelNoack/gpcam (2021).

  15. Noack, M. M. & Funke, S. W. Hybrid genetic deflated Newton method for global optimisation. J. Comput. Appl. Math. 325, 97–112 (2017).

  16. Hobson, A. & Cheng, B.-K. A comparison of the Shannon and Kullback information measures. J. Stat. Phys. 7, 301–310 (1973).

  17. Noack, M. M. & Sethian, J. A. Advanced stationary and non-stationary Kernel designs for domain-aware Gaussian processes. Preprint at https://arxiv.org/abs/2102.03432 (2021).

  18. Fratzl, P. Small-angle scattering in materials science — a short review of applications in alloys, ceramics and composite materials. J. Appl. Crystallogr. 36, 397–404 (2003).

  19. Dubcek, P. Nanostructures as seen by the SAXS. Vacuum 80, 92–97 (2005).

  20. Yager, K. G., Zhang, Y., Lu, F. & Gang, O. Periodic lattices of arbitrary nano-objects: modeling and applications for self-assembled systems. J. Appl. Crystallogr. 47, 118–129 (2014).

  21. Liu, J. et al. The impact of alterations in lignin deposition on cellulose organization of the plant cell wall. Biotechnol. Biofuels 9, 126 (2016).

  22. Paris, O. From diffraction to imaging: new avenues in studying hierarchical biological tissues with X-ray microbeams (review). Biointerphases 3, FB16 (2008).

  23. Aghamohammadzadeh, H., Newton, R. H. & Meek, K. M. X-ray scattering used to map the preferred collagen orientation in the human cornea and limbus. Structure 12, 249–256 (2004).

  24. Liu, J. et al. Amyloid structure exhibits polymorphism on multiple length scales in human brain tissue. Sci. Rep. 6, 33079 (2016).

  25. Weaver, J. C. et al. The stomatopod dactyl club: a formidable damage-tolerant biological hammer. Science 336, 1275–1280 (2012).

  26. Wang, Q. et al. Phase transformations and structural developments in the radular teeth of Cryptochiton stelleri. Adv. Funct. Mater. 23, 2908–2917 (2013).

  27. Meredith, J. C., Smith, A. P., Karim, A. & Amis, E. J. Combinatorial materials science for polymer thin-film dewetting. Macromolecules 33, 9747–9756 (2000).

  28. Stafford, C. M., Roskov, K. E., Epps III, T. H. & Fasolka, M. J. Generating thickness gradients of thin polymer films via flow coating. Rev. Sci. Instrum. 77, 023908 (2006).

  29. Smith, A. P., Douglas, J. F., Meredith, J. C., Amis, E. J. & Karim, A. High-throughput characterization of pattern formation in symmetric diblock copolymer films. J. Polym. Sci. B 39, 2141–2158 (2001).

  30. Davis, R. L., Jayaraman, S., Chaikin, P. M. & Register, R. A. Creating controlled thickness gradients in polymer thin films via flowcoating. Langmuir 30, 5637–5644 (2014).

  31. Meredith, J. C., Karim, A. & Amis, E. J. High-throughput measurement of polymer blend phase behavior. Macromolecules 33, 5760–5762 (2000).

  32. Roberson, S. V., Fahey, A. J., Sehgal, A. & Karim, A. Multifunctional ToF-SIMS: combinatorial mapping of gradient energy substrates. Appl. Surf. Sci. 200, 150–164 (2002).

  33. Berry, B. C. et al. Versatile platform for creating gradient combinatorial libraries via modulated light exposure. Rev. Sci. Instrum. 78, 072202 (2007).

  34. Smith, A. P., Sehgal, A., Douglas, J. F., Karim, A. & Amis, E. J. Combinatorial mapping of surface energy effects on diblock copolymer thin film ordering. Macromol. Rapid Commun. 24, 131–135 (2003).

  35. Toth, K., Osuji, C. O., Yager, K. G. & Doerk, G. S. Electrospray deposition tool: creating compositionally gradient libraries of nanomaterials. Rev. Sci. Instrum. 91, 013701 (2020).

  36. Holman, H.-Y. N., Bechtel, H. A., Hao, Z. & Martin, M. C. Synchrotron IR spectromicroscopy: chemistry of living cells. Anal. Chem. 82, 8757–8765 (2010).

  37. Holman, H.-Y. N. et al. Real-time characterization of biogeochemical reduction of Cr (VI) on basalt surfaces by SR-FTIR imaging. Geomicrobiol. J. 16, 307–324 (1999).

  38. Holman, H.-Y. N. et al. Catalysis of PAH biodegradation by humic acid shown in synchrotron infrared studies. Environ. Sci. Technol. 36, 1276–1280 (2002).

  39. Mason, O. U. et al. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill. ISME J. 6, 1715–1727 (2012).

  40. Holman, H.-Y. N. et al. Real-time molecular monitoring of chemical environment in obligate anaerobes during oxygen adaptive response. Proc. Natl Acad. Sci. USA 106, 12599–12604 (2009).

  41. Hazen, T. C. et al. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330, 204–208 (2010).

  42. Bælum, J. et al. Deep-sea bacteria enriched by oil and dispersant from the Deepwater Horizon spill. Environ. Microbiol. 14, 2405–2416 (2012).

  43. Benning, L. G., Phoenix, V., Yee, N. & Konhauser, K. The dynamics of cyanobacterial silicification: an infrared micro-spectroscopic investigation. Geochim. Cosmochim. Acta 68, 743–757 (2004).

  44. Benning, L. G., Phoenix, V., Yee, N. & Tobin, M. Molecular characterization of cyanobacterial silicification using synchrotron infrared micro-spectroscopy. Geochim. Cosmochim. Acta 68, 729–741 (2004).

  45. Yee, N., Benning, L. G., Phoenix, V. R. & Ferris, F. G. Characterization of metal-cyanobacteria sorption reactions: a combined macroscopic and infrared spectroscopic investigation. Environ. Sci. Technol. 38, 775–782 (2004).

  46. Probst, A. J. et al. Tackling the minority: sulfate-reducing bacteria in an archaea-dominated subsurface biofilm. ISME J. 7, 635–651 (2013).

  47. Valdespino-Castillo, P. M. et al. Exploring biogeochemistry and microbial diversity of extant microbialites in Mexico and Cuba. Front. Microbiol. 9, 510 (2018).

  48. Valdespino-Castillo, P. M. et al. Interplay of microbial communities with mineral environments in coralline algae. Sci. Total Environ. 757, 143877 (2021).

  49. Holman, E. et al. Autonomous adaptive data acquisition for scanning hyperspectral imaging. Commun. Biol. 3, 684 (2020).

  50. Davies, T. & Fearn, T. Back to basics: the principles of principal component analysis. Spectrosc. Eur. 16, 20 (2004).

  51. Melton, C. N. et al. K-means-driven Gaussian process data collection for angle-resolved photoemission spectroscopy. Mach. Learn. Sci. Technol. 1, 045015 (2020).

  52. Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).

  53. Squires, G. L. Introduction to the Theory of Thermal Neutron Scattering (Cambridge Univ. Press, 2012).

  54. Weber, T. Takin 2 (software). GitLab https://code.ill.fr/scientific-software/takin (2021).

  55. Weber, T. Update 2.0 to “Takin: an open-source software for experiment planning, visualisation, and data analysis”, (PII: S2352711016300152). SoftwareX 14, 100667 (2021).

  56. Bostwick, A. et al. Band structure and many body effects in graphene. Eur. Phys. J. Spec. Top. 148, 5–13 (2007).

  57. Boehm, M. et al. ThALES – Three Axis Low Energy Spectroscopy for highly correlated electron systems. Neutron News 26, 18–21 (2015).

Acknowledgements

The work was partly funded through the Center for Advanced Mathematics for Energy Research Applications (CAMERA), which is jointly funded by the Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) within the Department of Energy’s Office of Science, as well as by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory, under US Department of Energy contract no. DE-AC02-05CH11231. This research used resources of the Center for Functional Nanomaterials and the National Synchrotron Light Source II, which are US DOE Office of Science facilities, at Brookhaven National Laboratory under contract no. DE-SC0012704. This research also used resources of the Berkeley Synchrotron Infrared Structural Biology (BSISB) Imaging Program, funded by the US Department of Energy, Office of Biological and Environmental Research, under contract no. DE-AC02-05CH11231. The Advanced Light Source is supported by the Director, Office of Science, and the Office of Basic Energy Sciences. Both the ALS and BSISB were supported through contract no. DE-AC02-05CH11231. K.C.E. and C.B.M. acknowledge support from the Office of Naval Research Multidisciplinary University Research Initiative Award ONR N00014-18-1-2497. K.C.E. acknowledges support from the NSF Graduate Research Fellowship Program under grant no. DGE-1321851. This work is based on experiments performed at the Institut Laue-Langevin (ILL) in Grenoble, France. The collected datasets have the DOIs 10.5291/ILL-DATA.TEST-3123 and in part 10.5291/ILL-DATA.4-01-1643. The authors thank E. Villard, P. Chevalier and J. Locatelli for technical support at the ThALES spectrometer. C. N. Melton (author of ref. 51) performed the K-means cluster-based GP collection simulations.

Author information

Contributions

M.M.N. wrote the initial drafts of the introduction and the technical sections, devised the algorithm used, formulated the required mathematics, and implemented the computer codes (gpCAM). P.H.Z. designed, coordinated and collaborated on the development of basic computational strategies in gpCAM and on its use in SR-FTIR microscopy and ARPES experiments and took part in writing and editing this manuscript. D.M.U. designed, configured and implemented codes associated with convnets for reverse image search and wrote the related section. M.F. and K.G.Y. planned, supervised and coordinated experiments at Brookhaven National Laboratory’s National Synchrotron Light Source II, and wrote the related section. M.F., K.G.Y., E.H.R.T., R.L., G.F. and M.Z. performed X-ray scattering experiments at the National Synchrotron Light Source II, including beamline operation and data analytics. K.C.E. and C.B.M. prepared nanoplatelet materials. A.S. and G.S.D. prepared chemical templates and self-assembled films. E.R. planned and led the ARPES measurements at the Advanced Light Source, and wrote the related section. H.-Y.N.H. led the SR-FTIR measurements, coordinated the simulations and wrote the initial draft of the related section. S.L. designed and performed the PCA-based GP collection simulations and wrote the related section. L.C. designed the simulations and wrote the related section. Y.L.G. and T.W. customized gpCAM for use at the ThALES spectrometer. T.W. developed and performed preparatory simulations with gpCAM using theoretical dynamical structure factor models for neutron scattering. T.W. planned and T.W., M.B., P.S. and P.M. performed the first autonomous commissioning experiment at ThALES measuring the magnons in the chiral magnet MnSi. M.B. proposed and M.B., T.W., P.S. and P.M. performed the second autonomous commissioning experiment at ThALES, the results of which are shown in Fig. 4. The sample for the first experiment (MnSi) was provided by A. Bauer; the sample for the second autonomous commissioning experiment was provided by M.B. T.W. analysed the data of the first experiment (MnSi, not shown) and M.B. analysed the data of the second experiment (Fig. 4). M.B. and T.W. wrote the text of the corresponding section in equal parts. J.A.S. supervised the development of the mathematics and the implementation of the code, and revised and improved the manuscript. All authors commented on the manuscript and revised it repeatedly.

Corresponding author

Correspondence to Marcus M. Noack.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Physics thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Uncertainty quantification

The quantitative characterization of uncertainties in computational and real-world applications.

Himmelblau’s function

A common test function in the optimization community.
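
Himmelblau's function is f(x, y) = (x² + y − 11)² + (x + y² − 7)². It has four minima, all with f = 0, which makes it a convenient two-dimensional benchmark with several equally good optima. The snippet below simply evaluates the function at its four known minima (three of them quoted to six decimal places); it is an illustration, not code from the Review.

```python
import numpy as np

def himmelblau(x, y):
    """Himmelblau's function; works on scalars or NumPy arrays."""
    return (x**2 + y - 11.0) ** 2 + (x + y**2 - 7.0) ** 2

# The four minima, all with f = 0 (the last three rounded to 6 decimals)
minima = [(3.0, 2.0), (-2.805118, 3.131312), (-3.779310, -3.283186), (3.584428, -1.848126)]
for mx, my in minima:
    print(f"f({mx}, {my}) = {himmelblau(mx, my):.2e}")

# Evaluated on a grid, e.g. as a synthetic ground truth for testing acquisition strategies
xx, yy = np.meshgrid(np.linspace(-5.0, 5.0, 101), np.linspace(-5.0, 5.0, 101))
grid_values = himmelblau(xx, yy)
```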

Principal component analysis

A dimensionality-reduction technique that finds an orthonormal basis ordered by the fraction of the data variance each basis vector captures; retaining only the first few basis vectors typically preserves most of the variance of the dataset while substantially reducing its dimensionality.
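
A minimal sketch of PCA via the singular value decomposition of the mean-centred data matrix: the rows of Vt are the orthonormal basis vectors, and the squared singular values give the variance each one captures. In a hyperspectral measurement each row of X could be one spectrum; the synthetic two-peak spectra used here are invented purely for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix X (n_samples x n_features)."""
    X_centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]            # orthonormal basis vectors (rows)
    scores = X_centred @ components.T         # reduced-dimensional representation
    explained = S**2 / np.sum(S**2)           # fraction of variance per component
    return scores, components, explained[:n_components]

# Usage: 100 synthetic spectra (500 channels) mixing two peaks, plus small noise
rng = np.random.default_rng(0)
channels = np.linspace(0.0, 1.0, 500)
basis = np.stack([np.exp(-((channels - 0.3) / 0.05) ** 2),
                  np.exp(-((channels - 0.7) / 0.05) ** 2)])
weights = rng.uniform(0.0, 1.0, size=(100, 2))
X = weights @ basis + 0.01 * rng.standard_normal((100, 500))
scores, components, explained = pca(X, n_components=2)
print(scores.shape, explained.sum())
```

Keeping only the first two columns of scores reduces each 500-channel spectrum to two numbers while retaining nearly all of its variance.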

Non-negative matrix factorization

A linear-algebra technique that approximately factorizes a non-negative matrix into the product of two matrices with no negative elements.
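
One common way to compute such a factorization, shown here as an illustrative sketch rather than as the method used in the Review, is the Lee–Seung multiplicative update rule for the Frobenius-norm objective ||V − WH||². Because each update multiplies by a ratio of non-negative quantities, W and H remain non-negative automatically. The function name, random initialization and iteration count are assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate non-negative V (m x n) as W @ H with W, H >= 0,
    using Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, size=(m, rank))   # strictly positive initial guesses
    H = rng.uniform(0.1, 1.0, size=(rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # update keeps H non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)    # update keeps W non-negative
    return W, H

# Usage: an exactly rank-2 non-negative matrix, e.g. 30 spectra x 40 channels
rng = np.random.default_rng(1)
V = rng.uniform(0.0, 1.0, (30, 2)) @ rng.uniform(0.0, 1.0, (2, 40))
W, H = nmf(V, rank=2)
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```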

Bump function

A function that is both smooth and compactly supported.
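
The textbook example of a bump function is φ(x) = exp(−1/(1 − x²)) for |x| < 1 and φ(x) = 0 otherwise: it is infinitely differentiable everywhere, including at x = ±1, yet vanishes identically outside [−1, 1]. The vectorized NumPy sketch below is illustrative; this particular functional form is the example's choice, not necessarily the one used in the Review.

```python
import numpy as np

def bump(x):
    """Classic bump function: exp(-1 / (1 - x^2)) for |x| < 1, zero elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) < 1.0
    safe = np.where(inside, x, 0.0)   # avoid division by zero outside the support
    return np.where(inside, np.exp(-1.0 / (1.0 - safe**2)), 0.0)

xs = np.linspace(-1.5, 1.5, 7)
print(bump(xs))
```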

Surrogate model

An approximate model used in place of the actual model when the latter is difficult or costly to evaluate.

Linear interpolation with Voronoi tessellation

A technique for function approximation and automated data acquisition.

Triple-axis spectrometers

A neutron spectrometer that selects the wavelength (and hence the energy) of the neutrons both before and after they scatter from the sample, thereby directly probing the energy- and momentum-resolved response of a material.
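
The quantities a triple-axis spectrometer probes follow from elementary neutron kinematics: the neutron energy is E = ħ²k²/(2mₙ) ≈ 2.0721 meV Å² × k², the energy transfer to the sample is Eᵢ − E_f, and the momentum-transfer magnitude satisfies |Q|² = kᵢ² + k_f² − 2kᵢk_f cos 2θ, where 2θ is the scattering angle at the sample. The sketch below computes both for one spectrometer setting; it is a hypothetical helper that ignores instrument resolution and the sample-orientation angles needed to obtain the full Q vector in the crystal frame.

```python
import numpy as np

HBAR2_OVER_2MN = 2.0721  # hbar^2 / (2 m_neutron) in meV * Angstrom^2 (rounded)

def neutron_energy(k):
    """Neutron kinetic energy in meV for wavenumber k in 1/Angstrom."""
    return HBAR2_OVER_2MN * k**2

def tas_point(ki, kf, two_theta_deg):
    """Energy transfer (meV) and |Q| (1/Angstrom) for incident and final
    wavenumbers ki, kf and sample scattering angle 2theta in degrees."""
    energy_transfer = neutron_energy(ki) - neutron_energy(kf)
    two_theta = np.radians(two_theta_deg)
    q = np.sqrt(ki**2 + kf**2 - 2.0 * ki * kf * np.cos(two_theta))
    return energy_transfer, q

e, q = tas_point(ki=1.5, kf=1.3, two_theta_deg=60.0)
print(f"energy transfer = {e:.3f} meV, |Q| = {q:.3f} 1/Angstrom")
```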

Delaunay triangulation

A triangulation of a set of points in which no point of the set lies inside the circumcircle of any triangle.
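
The defining empty-circumcircle property can be checked directly: compute each triangle's circumcircle and verify that no other point lies strictly inside it. The checker below is an illustrative sketch with hypothetical function names, not an efficient triangulation algorithm; production code would normally call a library routine such as scipy.spatial.Delaunay.

```python
import numpy as np

def circumcircle(a, b, c):
    """Centre and radius of the circle through 2-D points a, b, c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    centre = np.array([ux, uy])
    return centre, np.linalg.norm(centre - np.asarray(a))

def is_delaunay(points, triangles, tol=1e-12):
    """True if no point lies strictly inside the circumcircle of any triangle."""
    points = np.asarray(points, dtype=float)
    for tri in triangles:
        centre, radius = circumcircle(*points[list(tri)])
        dist = np.linalg.norm(points - centre, axis=1)
        others = np.setdiff1d(np.arange(len(points)), tri)  # exclude the triangle's vertices
        if np.any(dist[others] < radius - tol):
            return False
    return True

# Two triangulations of the same four points, split along different diagonals
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.2, 1.2]])
print(is_delaunay(pts, [(0, 1, 2), (1, 3, 2)]))  # True: diagonal between points 1 and 2
print(is_delaunay(pts, [(0, 1, 3), (0, 3, 2)]))  # False: point 2 lies inside circle of 0-1-3
```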

About this article

Cite this article

Noack, M.M., Zwart, P.H., Ushizima, D.M. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nat Rev Phys 3, 685–697 (2021). https://doi.org/10.1038/s42254-021-00345-y

  • DOI: https://doi.org/10.1038/s42254-021-00345-y
