The execution and analysis of complex experiments are challenged by the vast dimensionality of the underlying parameter spaces. Although increasing data-acquisition rates should allow broader querying of the parameter space, the complexity of experiments and the subtle dependence of the model function on input parameters remain daunting owing to the sheer number of variables. New strategies for autonomous data acquisition are being developed, and one promising direction is the use of Gaussian process regression (GPR). GPR is a quick, non-parametric and robust method for function approximation and uncertainty quantification that can be applied directly to autonomous data acquisition. We review GPR-driven autonomous experimentation and illustrate its functionality using real-world examples from large experimental facilities in the USA and France. We introduce the basics of a GPR-driven autonomous loop, with a focus on Gaussian processes, and then turn to the infrastructure that needs to be built around GPR to create a closed loop. Finally, the case studies we discuss show that Gaussian-process-based autonomous data acquisition is a widely applicable method that can facilitate the optimal use of instruments and facilities by enabling the efficient acquisition of high-value datasets.
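The two ingredients named in the abstract, function approximation and uncertainty quantification, can be seen in a minimal NumPy sketch of GPR and of the simplest autonomous loop built on it. This is an illustration under stated assumptions, not the gpCAM implementation. The squared-exponential kernel, its fixed hyperparameters, the hypothetical 1D `measure` function standing in for an instrument, and the pure-exploration rule (measure next wherever the posterior variance is largest) are all choices made here for brevity.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.2, signal_var=1.0):
    """Squared-exponential covariance between 1D point sets a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = sq_exp_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


def measure(x):
    """Hypothetical stand-in for an instrument measurement (an assumption)."""
    return np.sin(6.0 * x) + 0.5 * x


def autonomous_loop(n_steps=10):
    """Repeatedly measure where the GP is most uncertain."""
    candidates = np.linspace(0.0, 1.0, 201)
    x_data = np.array([0.0, 1.0])  # two seed measurements at the domain edges
    y_data = measure(x_data)
    for _ in range(n_steps):
        _, var = gp_posterior(x_data, y_data, candidates)
        x_next = candidates[np.argmax(var)]  # most uncertain location
        x_data = np.append(x_data, x_next)
        y_data = np.append(y_data, measure(x_next))
    return x_data, y_data
```

In a real experiment the kernel hyperparameters would normally be trained on the data (for example, by maximizing the marginal log-likelihood) rather than fixed. The acquisition rule would also typically be tailored to the scientific goal instead of pure variance reduction.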
Gaussian process regression (GPR) is a robust statistical, non-parametric technique for uncertainty quantification and function approximation.
GPR can be applied directly to autonomous and optimal data acquisition.
GPR provides straightforward ways to inject domain knowledge and can easily be customized for feature finding.
The gpCAM software tool provides a simple way for practitioners to use GPR for autonomous experimentation.
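The key points mention injecting domain knowledge and customizing GPR for feature finding. Both can be shown in a few lines under assumptions made purely for illustration: a 2D input space, a squared-exponential kernel with one length scale per axis, and an upper-confidence-bound (UCB) acquisition function. None of these names or functions is claimed to be gpCAM's API. The sketch encodes an expected anisotropy in the kernel and ranks candidate measurements by predicted signal plus a multiple of the posterior standard deviation.

```python
import numpy as np


def anisotropic_kernel(A, B, length_scales, signal_var=1.0):
    """Squared-exponential kernel with one length scale per input dimension.

    A short length scale along an axis encodes the prior belief that the
    measured signal changes quickly in that direction.
    """
    ls = np.asarray(length_scales, dtype=float)
    d = (A[:, None, :] - B[None, :, :]) / ls
    return signal_var * np.exp(-0.5 * np.sum(d**2, axis=-1))


def posterior(X, y, Xs, length_scales, signal_var=1.0, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at the rows of Xs."""
    K = anisotropic_kernel(X, X, length_scales, signal_var) + noise_var * np.eye(len(X))
    Ks = anisotropic_kernel(X, Xs, length_scales, signal_var)
    mean = Ks.T @ np.linalg.solve(K, y)
    # A stationary kernel has prior variance signal_var at every point.
    var = signal_var - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.maximum(var, 0.0)


def ucb(mean, var, kappa=2.0):
    """Upper confidence bound: predicted signal plus kappa standard deviations."""
    return mean + kappa * np.sqrt(var)


def next_point(X, y, candidates, length_scales, kappa=2.0):
    """Return the candidate row that maximizes the UCB acquisition."""
    mean, var = posterior(X, y, candidates, length_scales)
    return candidates[np.argmax(ucb(mean, var, kappa))]
```

With `length_scales=(0.5, 0.1)`, the model expects the signal to vary faster along the second axis, so information from a measurement spreads less far in that direction. Increasing `kappa` shifts the selection from exploiting high predicted values toward exploring uncertain regions.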
The gpCAM code for autonomous steering associated with this Review is available at https://doi.org/10.11578/dc.20210217.5 and https://bitbucket.org/MarcusMichaelNoack/gpcam, and can be installed via pip install gpCAM. Any updates will be published in the repository and on the Python Package Index (PyPI). The Takin software is available at https://doi.org/10.1016/j.softx.2021.100667.
Peirce, C. S. The fixation of belief. Pop. Sci. Mon. 12, 1–15 (1877).
Peirce, C. S. & Menand, L. How to make our ideas clear. Pop. Sci. Mon. 12, 286–302 (1878).
McKay, M. D., Beckman, R. J. & Conover, W. J. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 239–245 (1979).
Fisher, R. A. The arrangement of field experiments. In Breakthroughs in Statistics 82–91 (Springer, 1992).
Settles, B. Active learning literature survey. Technical Reports (University of Wisconsin-Madison, Department of Computer Sciences, 2009).
Krishnakumar, A. Active learning literature survey. Technical Reports 42 (University of California Santa Cruz, 2007).
van de Schoot, R. et al. Bayesian statistics and modelling. Nat. Rev. Methods Primers 1, 1–26 (2021).
Noack, M. M. et al. A Kriging-based approach to autonomous experimentation with applications to X-ray scattering. Sci. Rep. 9, 11809 (2019).
Noack, M. M., Doerk, G. S., Li, R., Fukuto, M. & Yager, K. G. Advances in Kriging-based autonomous X-ray scattering experiments. Sci. Rep. 10, 1325 (2020).
Noack, M. & Zwart, P. Computational strategies to increase efficiency of Gaussian-process-driven autonomous experiments. In 2019 IEEE/ACM 1st Annual Workshop on Large-scale Experiment-in-the-Loop Computing (XLOOP) 1–7 (IEEE, 2019).
Noack, M. M. et al. Autonomous materials discovery driven by Gaussian process regression with inhomogeneous measurement noise and anisotropic kernels. Sci. Rep. 10, 17663 (2020).
Wiegart, L. et al. Instrumentation for in situ/operando X-ray scattering studies of polymer additive manufacturing processes. Synchrotron Radiat. News 32, 20–27 (2019).
Frazier, P. I. Bayesian optimization. Recent Adv. Optim. Model. Contemp. Probl. https://doi.org/10.1287/educ.2018.0188 (2018).
Noack, M. gpcam version 6. bitbucket https://bitbucket.org/MarcusMichaelNoack/gpcam (2021).
Noack, M. M. & Funke, S. W. Hybrid genetic deflated Newton method for global optimisation. J. Comput. Appl. Math. 325, 97–112 (2017).
Hobson, A. & Cheng, B.-K. A comparison of the Shannon and Kullback information measures. J. Stat. Phys. 7, 301–310 (1973).
Noack, M. M. & Sethian, J. A. Advanced stationary and non-stationary Kernel designs for domain-aware Gaussian processes. Preprint at https://arxiv.org/abs/2102.03432 (2021).
Fratzl, P. Small-angle scattering in materials science — a short review of applications in alloys, ceramics and composite materials. J. Appl. Crystallogr. 36, 397–404 (2003).
Dubcek, P. Nanostructures as seen by the SAXS. Vacuum 80, 92–97 (2005).
Yager, K. G., Zhang, Y., Lu, F. & Gang, O. Periodic lattices of arbitrary nano-objects: modeling and applications for self-assembled systems. J. Appl. Crystallogr. 47, 118–129 (2014).
Liu, J. et al. The impact of alterations in lignin deposition on cellulose organization of the plant cell wall. Biotechnol. Biofuels 9, 126 (2016).
Paris, O. From diffraction to imaging: new avenues in studying hierarchical biological tissues with X-ray microbeams (review). Biointerphases 3, FB16 (2008).
Aghamohammadzadeh, H., Newton, R. H. & Meek, K. M. X-ray scattering used to map the preferred collagen orientation in the human cornea and limbus. Structure 12, 249–256 (2004).
Liu, J. et al. Amyloid structure exhibits polymorphism on multiple length scales in human brain tissue. Sci. Rep. 6, 33079 (2016).
Weaver, J. C. et al. The stomatopod dactyl club: a formidable damage-tolerant biological hammer. Science 336, 1275–1280 (2012).
Wang, Q. et al. Phase transformations and structural developments in the radular teeth of Cryptochiton stelleri. Adv. Funct. Mater. 23, 2908–2917 (2013).
Meredith, J. C., Smith, A. P., Karim, A. & Amis, E. J. Combinatorial materials science for polymer thin-film dewetting. Macromolecules 33, 9747–9756 (2000).
Stafford, C. M., Roskov, K. E., Epps III, T. H. & Fasolka, M. J. Generating thickness gradients of thin polymer films via flow coating. Rev. Sci. Instrum. 77, 023908 (2006).
Smith, A. P., Douglas, J. F., Meredith, J. C., Amis, E. J. & Karim, A. High-throughput characterization of pattern formation in symmetric diblock copolymer films. J. Polym. Sci. B 39, 2141–2158 (2001).
Davis, R. L., Jayaraman, S., Chaikin, P. M. & Register, R. A. Creating controlled thickness gradients in polymer thin films via flowcoating. Langmuir 30, 5637–5644 (2014).
Meredith, J. C., Karim, A. & Amis, E. J. High-throughput measurement of polymer blend phase behavior. Macromolecules 33, 5760–5762 (2000).
Roberson, S. V., Fahey, A. J., Sehgal, A. & Karim, A. Multifunctional ToF-SIMS: combinatorial mapping of gradient energy substrates. Appl. Surf. Sci. 200, 150–164 (2002).
Berry, B. C. et al. Versatile platform for creating gradient combinatorial libraries via modulated light exposure. Rev. Sci. Instrum. 78, 072202 (2007).
Smith, A. P., Sehgal, A., Douglas, J. F., Karim, A. & Amis, E. J. Combinatorial mapping of surface energy effects on diblock copolymer thin film ordering. Macromol. Rapid Commun. 24, 131–135 (2003).
Toth, K., Osuji, C. O., Yager, K. G. & Doerk, G. S. Electrospray deposition tool: creating compositionally gradient libraries of nanomaterials. Rev. Sci. Instrum. 91, 013701 (2020).
Holman, H.-Y. N., Bechtel, H. A., Hao, Z. & Martin, M. C. Synchrotron IR spectromicroscopy: chemistry of living cells. Anal. Chem. 82, 8757–8765 (2010).
Holman, H.-Y. N. et al. Real-time characterization of biogeochemical reduction of Cr (VI) on basalt surfaces by SR-FTIR imaging. Geomicrobiol. J. 16, 307–324 (1999).
Holman, H.-Y. N. et al. Catalysis of PAH biodegradation by humic acid shown in synchrotron infrared studies. Environ. Sci. Technol. 36, 1276–1280 (2002).
Mason, O. U. et al. Metagenome, metatranscriptome and single-cell sequencing reveal microbial response to Deepwater Horizon oil spill. ISME J. 6, 1715–1727 (2012).
Holman, H.-Y. N. et al. Real-time molecular monitoring of chemical environment in obligate anaerobes during oxygen adaptive response. Proc. Natl Acad. Sci. USA 106, 12599–12604 (2009).
Hazen, T. C. et al. Deep-sea oil plume enriches indigenous oil-degrading bacteria. Science 330, 204–208 (2010).
Bælum, J. et al. Deep-sea bacteria enriched by oil and dispersant from the Deepwater Horizon spill. Environ. Microbiol. 14, 2405–2416 (2012).
Benning, L. G., Phoenix, V., Yee, N. & Konhauser, K. The dynamics of cyanobacterial silicification: an infrared micro-spectroscopic investigation. Geochim. Cosmochim. Acta 68, 743–757 (2004).
Benning, L. G., Phoenix, V., Yee, N. & Tobin, M. Molecular characterization of cyanobacterial silicification using synchrotron infrared micro-spectroscopy. Geochim. Cosmochim. Acta 68, 729–741 (2004).
Yee, N., Benning, L. G., Phoenix, V. R. & Ferris, F. G. Characterization of metal-cyanobacteria sorption reactions: a combined macroscopic and infrared spectroscopic investigation. Environ. Sci. Technol. 38, 775–782 (2004).
Probst, A. J. et al. Tackling the minority: sulfate-reducing bacteria in an archaea-dominated subsurface biofilm. ISME J. 7, 635–651 (2013).
Valdespino-Castillo, P. M. et al. Exploring biogeochemistry and microbial diversity of extant microbialites in Mexico and Cuba. Front. Microbiol. 9, 510 (2018).
Valdespino-Castillo, P. M. et al. Interplay of microbial communities with mineral environments in coralline algae. Sci. Total Environ. 757, 143877 (2021).
Holman, E. et al. Autonomous adaptive data acquisition for scanning hyperspectral imaging. Commun. Biol. 3, 684 (2020).
Davies, T. & Fearn, T. Back to basics: the principles of principal component analysis. Spectrosc. Eur. 16, 20 (2004).
Melton, C. N. et al. K-means-driven Gaussian process data collection for angle-resolved photoemission spectroscopy. Mach. Learn. Sci. Technol. 1, 045015 (2020).
Cao, Y. et al. Unconventional superconductivity in magic-angle graphene superlattices. Nature 556, 43–50 (2018).
Squires, G. L. Introduction to the Theory of Thermal Neutron Scattering (Cambridge Univ. Press, 2012).
Weber, T. Takin 2 (software). GitLab https://code.ill.fr/scientific-software/takin (2021).
Weber, T. Update 2.0 to “Takin: an open-source software for experiment planning, visualisation, and data analysis”, (PII: S2352711016300152). SoftwareX 14, 100667 (2021).
Bostwick, A. et al. Band structure and many body effects in graphene. Eur. Phys. J. Spec. Top. 148, 5–13 (2007).
Boehm, M. et al. ThALES – Three Axis Low Energy Spectroscopy for highly correlated electron systems. Neutron News 26, 18–21 (2015).
The work was partly funded through the Center for Advanced Mathematics for Energy Research Applications (CAMERA), which is jointly funded by the Advanced Scientific Computing Research (ASCR) and Basic Energy Sciences (BES) within the Department of Energy’s Office of Science, as well as by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory, under US Department of Energy contract no. DE-AC02-05CH11231. This research used resources of the Center for Functional Nanomaterials and the National Synchrotron Light Source II, which are US DOE Office of Science facilities, at Brookhaven National Laboratory under contract no. DE-SC0012704. This research also used resources of the Berkeley Synchrotron Infrared Structural Biology (BSISB) Imaging Program, funded by the US Department of Energy, Office of Biological and Environmental Research, under contract no. DE-AC02-05CH11231. The Advanced Light Source is supported by the Director, Office of Science, and the Office of Basic Energy Sciences. Both the ALS and BSISB were supported through contract no. DE-AC02-05CH11231. K.C.E. and C.B.M. acknowledge support from the Office of Naval Research Multidisciplinary University Research Initiative Award ONR N00014-18-1-2497. K.C.E. acknowledges support from the NSF Graduate Research Fellowship Program under grant no. DGE-1321851. This work is based on experiments performed at the Institut Laue-Langevin (ILL) in Grenoble, France. The collected datasets have the DOIs 10.5291/ILL-DATA.TEST-3123 and in part 10.5291/ILL-DATA.4-01-1643. The authors thank E. Villard, P. Chevalier and J. Locatelli for technical support at the ThALES spectrometer. C. N. Melton (author of ref. 51) performed the K-means cluster-based GP collection simulations.
The authors declare no competing interests.
Peer review information
Nature Reviews Physics thanks the anonymous reviewers for their contribution to the peer review of this work.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- Uncertainty quantification
The quantitative characterization of uncertainties in computational and real-world applications.
- Himmelblau’s function
A two-dimensional test function with four local minima, commonly used to benchmark optimization algorithms.
- Principal component analysis
A dimensionality reduction technique that finds an orthonormal basis ordered by how much of the data's variance lies along each basis vector; retaining only the first few basis vectors typically preserves most of the variance of the dataset while substantially reducing data dimensionality.
- Non-negative matrix factorization
A linear algebra technique that approximately factorizes a non-negative matrix into two matrices with no negative elements.
- Bump function
A function that is both smooth and compactly supported.
- Surrogate model
An approximate model used in place of the actual model when the latter is difficult or costly to evaluate.
- Linear interpolation with Voronoi tessellation
A technique for function approximation and automated data acquisition.
- Triple-axis spectrometers
A spectrometer that selects the wavelength of the neutrons both before and after they scatter from the sample, thereby directly probing the energy and momentum response of a material.
- Delaunay triangulation
A triangulation of a point set such that no point of the set lies inside the circumcircle of any of the triangles.
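Two of the glossary entries translate directly into a few lines of NumPy. The sketch below is illustrative and not taken from the article. It writes out Himmelblau's function in its standard form, f(x, y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2, which reaches its minimum value of 0 at four points, one of them (3, 2). It also implements principal component analysis via the singular value decomposition of mean-centered data and returns the fraction of variance explained by each retained component.

```python
import numpy as np


def himmelblau(x, y):
    """Himmelblau's function; its four local minima all have value 0."""
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2


def pca(data, n_components):
    """Project rows of `data` onto its first principal components.

    Returns the reduced data and the fraction of the total variance that
    each retained component explains.
    """
    centered = data - data.mean(axis=0)
    # The right singular vectors form the orthonormal principal basis,
    # already ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]
```

For a hyperspectral image, one would typically reshape the data cube to a (pixels, channels) matrix before calling `pca`, so that each spectrum becomes one row.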
About this article
Cite this article
Noack, M.M., Zwart, P.H., Ushizima, D.M. et al. Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities. Nat Rev Phys 3, 685–697 (2021). https://doi.org/10.1038/s42254-021-00345-y