Introduction

Multimodal imaging methods underpin multiple areas of fundamental and applied sciences. Conventional intermittent contact mode atomic force microscopy yields topographic, phase, and error signals that highlight different aspects of surface structure1,2,3. In combination with detection modes such as electrostatic4,5,6, magnetic7,8,9, and Kelvin probe force microscopy10,11,12,13,14, these techniques offer multiple channels carrying information on dissimilar aspects of materials functionality. In optical imaging in biology, specific dyes are used to highlight different elements of cell structure and are visualized with different color filters or, in hyperspectral methods, different spectral ranges. In energy-dispersive electron microscopy and electron energy loss spectroscopy (EELS)15,16, different energy ranges highlight concentrations of individual elements17.

In many cases, imaging is used to define objects of interest for more detailed studies18,19,20,21. In scanning probe microscopy (SPM), the structural or functional images can be used to select locations for force-distance or current-voltage measurements22, or for local sampling for chemical studies. In optical and scanning electron microscopy, the imaging data can be used to select locations for, e.g., nanoindentation23. In mass spectrometry, the sampling points are often selected based on optical or SPM imaging24,25. This paradigm of imaging followed by selection of specific location(s) for detailed studies is common across physical, chemical, and biological imaging. Currently, these studies are often guided by human operator intuition via a classical point-and-click approach, which makes the process slow and heavily biased by operator experience and expectations. An alternative approach is that of dense grid-based measurements, such as force-volume26, piezoresponse spectroscopy, piezoresponse nonlinearity measurements in SPM27,28,29,30,31, hyperspectral EELS measurements in scanning transmission electron microscopy (STEM)32, photoluminescence lifetime measurements in optical microscopy19, or electron diffraction measurements in electron microscopy29. However, grid measurements tend to be time consuming and are often limited or impossible in circumstances where the probe or the sample degrades rapidly during measurement.

An alternative in multimodal imaging is thus naturally of interest, enabling a spectroscopy workflow within an automated experiment framework. In this framework, the locations for spectroscopic studies are selected based on the features of interest in the multimodal image. Here, the direct problem, performing measurements at a known location of interest, can be addressed via (by now) standard computer vision algorithms. For example, we can choose specific objects, such as domain walls or molecules, as locations for detailed spectroscopic measurements18,21,33,34,35,36.

However, the inverse problem, i.e., discovering the features of interest in the right channel (e.g., the topography, piezoresponse, or conductivity image) that are most predictive of the behaviors of interest, is poorly amenable to human operation. For example, we aim to discover which microstructural element has the best predictive capacity for a functional property encoded in the polarization hysteresis loop or resonance frequency hysteresis loop, such as maximal loop area, imprint bias, or more complex functionals of the loop shape. For unimodal imaging, this approach has recently been demonstrated for STEM-EELS, 4D STEM, and band excitation piezoresponse spectroscopy (BEPS)27,32,37,38. In these studies, we discovered which features in image space are most predictive of specific functionalities determined via spectral measurements, for example the localization of hysteresis loops with maximal area at specific domain walls or the emergence of low-energy plasmons at the edges of 2D material flakes.

Here, we develop a framework for the automated discovery of the best predictive channel in multimodal imaging for the behavior of interest within a spectroscopic data set. Traditionally, such analysis is based on physical intuition using a priori expected physical relationships. However, this approach often leads to significant operator biases and can preclude the discovery of the phenomena of interest. The experimental framework developed here instead identifies, during the measurement, the channel that offers the best predictability for the behavior of interest. We have chosen to illustrate this using piezoresponse force microscopy (PFM) as a method that allows multichannel imaging and an extensive set of spectroscopies39. However, the approach is universal and applies to other forms of multimodal imaging.

Results

Model materials

As model systems, we explored three thin film samples: lead titanate (PTO)40, lead zirconate titanate (PZT), and bismuth ferrite (BFO), all grown on SrRuO3 layers. Band excitation piezoresponse force microscopy (BEPFM) measurements were performed on these films to investigate their domain structure; the results are shown in Fig. 1. The PTO thin film exhibits both 180° ferroelectric domain structures, seen as dark and bright domains in the phase image (Fig. 1b), and non-180° ferroelastic domain structures, seen as dark and bright stripe domains in the amplitude image (Fig. 1a). The ferroelastic domains exhibit different strain and elastic properties due to the variation in crystallographic orientation, resulting in visible domain contrast in the resonance frequency image (Fig. 1c). The topography image (Fig. 1d) also shows the ferroelastic domain features. In contrast, the PZT thin film exhibits only non-180° ferroelastic domain structures, visible in the BEPFM amplitude (Fig. 1e), phase (Fig. 1f), resonance frequency (Fig. 1g), and topography (Fig. 1h) images. The BFO film predominantly shows a 180° ferroelectric domain structure (Fig. 1i, j), with domain wall contrast also visible in the resonance frequency image (Fig. 1k). Notably, a few ferroelastic domains with weak contrast also appear in the amplitude image (Fig. 1i).

Fig. 1: Band excitation piezoresponse force microscopy (BEPFM) image results of three model samples.

a–d BEPFM amplitude, phase, resonance frequency, and topography images of the PTO sample, respectively. e–h BEPFM amplitude, phase, resonance frequency, and topography images of the PZT sample, respectively. i–l BEPFM amplitude, phase, resonance frequency, and topography images of the BFO sample, respectively.

Multiple-channel deep kernel learning

Next, we perform a multiple-channel deep kernel learning (DKL) measurement utilizing ensembles of DKL models and a basic reinforcement learning policy. Earlier, we showed how combining a structured Gaussian process with an epsilon-greedy policy allows one to learn a correct model of the system’s behavior and use it to drive the exploration of the configuration space41,42. However, that approach is limited to low-dimensional spaces and is not suitable for structure-property relationship problems in multimodal imaging. Here we use DKL43, a hybrid of a neural network and a Gaussian process, to circumvent the dimensionality problem. As the fully Bayesian implementation of DKL is computationally too slow for real-time feedback and control, we approximate it with ensembles of DKL models44. In this setup, each neural network in the ensemble is initialized independently, resulting in different embeddings connected to separate Gaussian processes, and the final prediction for each channel is an ensemble average.
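The ensemble-averaging step can be illustrated with a minimal sketch. It assumes that each ensemble member returns a predictive mean and variance at every candidate point; the function name `ensemble_average` and the use of the law of total variance to combine the members are illustrative assumptions, not a description of the published code.

```python
import numpy as np

def ensemble_average(means, variances):
    """Combine per-model predictions into one prediction per point.

    means, variances: arrays of shape (n_models, n_points).
    Assumption: members are weighted equally, and the combined variance is the
    mean member variance plus the spread of the member means.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = means.mean(axis=0)
    variance = variances.mean(axis=0) + means.var(axis=0)
    return mean, variance

# Two hypothetical ensemble members, two candidate points.
mean, variance = ensemble_average([[1.0, 2.0], [3.0, 2.0]],
                                  [[0.5, 0.1], [0.5, 0.3]])
```

Under this assumption, points where the members disagree receive a larger combined variance than points where they agree, even if each member is individually confident.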

The process of channel learning with ensemble-DKL is shown in Fig. 2a, b. The BEPFM images including amplitude, phase, frequency, and topography are used as four possible input channels. Each image is featurized by splitting it into patches that are used as inputs. The behavior of interest is encoded in polarization or resonance frequency hysteresis loops for each patch as a scalar target. Here we use the hysteresis loop area, but any functional of the spectroscopic signal can be selected. At the beginning of the channel learning experiment, a small, custom-defined number of warm-up steps is taken, at which a separate ensemble of DKL models is trained for each channel. In this process, the channel that produces the lowest mean predictive uncertainty on the unmeasured points is given a positive reward. This rewarded model is also used to derive the next measurement point corresponding to the largest uncertainty value in the prediction. After the warm-up steps, an epsilon-greedy policy45 is used to sample a single channel at each exploration step and derive the next measurement point.
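A minimal sketch of the featurization and of the scalar target follows. The helper names `extract_patches` and `loop_area`, the patch stride, and the shoelace formula for the enclosed area are assumptions made for illustration; the source specifies only that images are split into patches and that the loop area is used as the target.

```python
import numpy as np

def extract_patches(image, patch_size, step=1):
    """Split a 2D image into flattened square patches.

    Returns (patches, centres): patches has shape (n_patches, patch_size**2),
    centres holds the (row, col) pixel at the middle of each patch.
    """
    image = np.asarray(image, dtype=float)
    half = patch_size // 2
    rows, cols = image.shape
    patches, centres = [], []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patches.append(image[r:r + patch_size, c:c + patch_size].ravel())
            centres.append((r + half, c + half))
    return np.array(patches), np.array(centres)

def loop_area(voltage, response):
    """Area enclosed by a closed hysteresis loop (shoelace formula)."""
    v = np.asarray(voltage, dtype=float)
    p = np.asarray(response, dtype=float)
    return 0.5 * abs(np.dot(v, np.roll(p, -1)) - np.dot(p, np.roll(v, -1)))

# Toy data: a 4 x 4 image and a square loop traversed once.
patches, centres = extract_patches(np.arange(16).reshape(4, 4), patch_size=2)
area = loop_area([-1, 1, 1, -1], [-1, -1, 1, 1])
```

Each patch is paired with the spectrum measured at its centre, so one row of `patches` and one value of `loop_area` form a single training example for a given channel.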

Fig. 2: BEPFM experimental process driven by ensemble DKL.

a Ensemble DKL workflow. b BEPFM image channels are used to predict a functional property, e.g., polarization loop area or resonance frequency loop area. The image data have four channels: amplitude (Channel 1), phase (Channel 2), resonance frequency (Channel 3), and topography (Channel 4). The goal is to identify the best channel for predicting the functional property. c A schematic showing the hardware connected in the workflow, including a Cypher SPM, a PC, and a GPU server.

We implement this ensemble-DKL workflow on an Oxford Instruments Asylum Research Cypher microscope. As shown in Fig. 2c, to accelerate the DKL training and prediction we send the real-time measurement data to an Nvidia DGX-2 GPU server for analysis. Specifically, the custom DKL code written in JAX46 runs in a Docker container residing on the GPU server. Via a combination of port forwarding and socket programming, data are sent directly from the instrument computer to the DGX-2 device without file I/O and then processed within the container, taking advantage of the processing capabilities of the server. For the data transfer, we utilize the mlsocket package (https://pypi.org/project/mlsocket/), which is a wrapper around the low-level Python socket interface and enables sending and receiving of NumPy arrays. The server houses 16 Nvidia Tesla V100 GPUs, each with 32 GB of memory, enabling the different ensemble models to run in parallel. Practically, we select between a multi-GPU “parallel” and a single-GPU “vectorized” approach for the ensemble-DKL training based on the size of the image patch and the complexity/depth of the neural network. For an image patch size of 20 × 20, a 3-layer fully connected neural network, and 20 ensemble models, each iteration takes ~30 s on a single GPU, whereas, for comparison, the same iteration takes ~300 s on the CPU. As such, the connection to edge computing is critical for the efficiency and viability of the proposed workflow.
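The idea of moving arrays over a socket without intermediate files can be sketched with the Python standard library alone. The code below does not use or reproduce the mlsocket API; the helpers `send_array` and `recv_array` and the length-prefixed framing are assumptions chosen for illustration.

```python
import io
import socket
import struct

import numpy as np

def send_array(sock, array):
    """Serialize an array in memory and send it with an 8-byte length prefix."""
    buffer = io.BytesIO()
    np.save(buffer, np.asarray(array), allow_pickle=False)
    payload = buffer.getvalue()
    sock.sendall(struct.pack("!Q", len(payload)) + payload)

def _recv_exact(sock, n_bytes):
    """Read exactly n_bytes from the socket or raise if it closes early."""
    data = bytearray()
    while len(data) < n_bytes:
        chunk = sock.recv(n_bytes - len(data))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        data.extend(chunk)
    return bytes(data)

def recv_array(sock):
    """Receive one length-prefixed array sent by send_array."""
    (length,) = struct.unpack("!Q", _recv_exact(sock, 8))
    return np.load(io.BytesIO(_recv_exact(sock, length)), allow_pickle=False)

# Local demonstration with a connected socket pair standing in for the
# instrument PC (sender) and the GPU server (receiver).
instrument, server = socket.socketpair()
send_array(instrument, np.arange(12, dtype=np.float32).reshape(3, 4))
received = recv_array(server)
instrument.close()
server.close()
```

In a real deployment the two ends would be a TCP client and server on different machines, reached through the forwarded port; the framing logic stays the same.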

Automated experiments

Here, we performed two sets of measurements on the three model samples, using the polarization-voltage loop area and the frequency-voltage loop area as target property descriptors; these measure the energy loss during switching and voltage-induced irreversible dynamics, respectively. First, a small number of randomly sampled points are measured as seed points for training. In these measurements, we start with 0.25% of the total measurement points as the seed data for DKL training, then perform 20 warm-up steps and 200 exploration steps. In the warm-up steps, each channel is trained in parallel and the one with the lowest mean uncertainty is used to derive the next measurement point. After the warm-up, a single channel is sampled at each step according to the epsilon-greedy policy, with epsilon decreased uniformly (“annealed”) from 0.4 to 0.1 over the 200 exploration steps.
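The selection logic of the warm-up and exploration stages can be sketched as follows. All function names are hypothetical, the annealing is assumed to be linear in the step index, and the exploratory draw is assumed to be uniform over all channels; how rewards are updated during exploration is not reproduced here.

```python
import numpy as np

def warmup_best_channel(uncertainties, measured):
    """Channel with the lowest mean predictive uncertainty on unmeasured points.

    uncertainties: shape (n_channels, n_points); measured: boolean mask (n_points,).
    """
    unc = np.asarray(uncertainties, dtype=float)
    unmeasured = ~np.asarray(measured, dtype=bool)
    return int(np.argmin(unc[:, unmeasured].mean(axis=1)))

def next_measurement(uncertainty, measured):
    """Index of the unmeasured point with the largest predictive uncertainty."""
    masked = np.where(np.asarray(measured, dtype=bool), -np.inf,
                      np.asarray(uncertainty, dtype=float))
    return int(np.argmax(masked))

def annealed_epsilon(step, n_steps, eps_start=0.4, eps_end=0.1):
    """Linearly decrease epsilon from eps_start (step 0) to eps_end (last step)."""
    if n_steps <= 1:
        return eps_end
    return eps_start + (eps_end - eps_start) * step / (n_steps - 1)

def select_channel(rewards, epsilon, rng):
    """Epsilon-greedy: a random channel with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(rewards)))
    return int(np.argmax(rewards))

# Toy usage: four channels, hypothetical accumulated rewards.
rng = np.random.default_rng(0)
rewards = [5.0, 3.0, 1.0, 0.0]
chosen = [select_channel(rewards, annealed_epsilon(k, 200), rng) for k in range(200)]
```

With these settings most steps exploit the best-rewarded channel, while the remaining steps keep sampling the other channels so that their uncertainty curves continue to be updated.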

Figure 3 shows the evolution of channel reward, mean predictive uncertainty, and channel selection during the ensemble-DKL driven measurement for the three samples. For the PTO sample, when the target property is the polarization-voltage loop area (Fig. 3a), the amplitude channel shows the highest reward and the phase channel is second best. Although the resonance frequency channel shows a very low reward (Fig. 3a), the evolution of uncertainty (Fig. 3b) indicates that the predictive uncertainty based on this channel gradually decreases during the experiment, which implies that the elastic variation displayed in the frequency image has an effect on polarization dynamics. In contrast, the topography channel shows both a low reward (Fig. 3a) and no decrease in predictive uncertainty (Fig. 3b). When the frequency-voltage loop area is used as the target property, we observe an increase in the rewards of the resonance frequency and phase channels at the end of the experiment (Fig. 3c), accompanied by a faster decrease in the predictive uncertainty of these channels (Fig. 3d). The behavior of the resonance frequency channel reflects the direct correlation between the property extracted from the loops and the image data. The behavior of the phase channel can be understood as an electrostatic effect on the detected cantilever resonance frequency, where the up- and down-polarized domains (shown as dark and bright contrast in the phase image) may be associated with different surface charge states that induce different electrostatic effects.

Fig. 3: Experimental process of three model samples when the functional property is set as polarization loop area and resonance frequency loop area.

a–l Results for PTO, PZT, and BFO samples, respectively. The first and second columns show the evolution of channel reward and the mean predictive uncertainty as a function of experimental steps for the ensemble DKL when the functional property is the polarization loop area. In this case, ensemble DKL identifies the channel with the best structure-polarization loop area relationship. The third and fourth columns show the evolution of channel reward and the mean predictive uncertainty as a function of experimental steps for the ensemble DKL when the functional property is the resonance frequency loop area. In this case, ensemble DKL identifies the channel with the best structure-frequency loop area relationship. In these measurements, the ensemble DKL analysis starts with 0.25% of the total number of points available for measurement and uses 20 warm-up steps. During the warm-up steps, all four channels are evaluated in parallel and the one with the lowest uncertainty is used for the next evaluation point. After the warm-up phase, we sample a single channel at each step based on the epsilon-greedy policy, with epsilon decreased uniformly from 0.4 to 0.1 during the 200 steps. In the uncertainty evolution plots, the color represents the selected channel at a specific step, which allows us to visualize the correlation between channel selection and uncertainty changes.

The PZT results (Fig. 3e–h) are very similar to those of PTO. We ascribe this similarity to the fact that most of the variability of the phenomena on epitaxial film surfaces is related to the ferroelastic domain structure. However, note that the predictive uncertainty from the topography channel (Fig. 3f, h) slightly decreases during the experiment for the PZT sample.

For the BFO results (Fig. 3i–l), when the polarization-voltage loop area is used as the target property, the reward of the amplitude channel (Fig. 3i) quickly stabilizes around 0.5–0.6 after ~50 exploration steps, while the other channel rewards drop quickly. Interestingly, the predictive uncertainties of the four channels are distinct (Fig. 3j): the uncertainty corresponding to the amplitude channel keeps decreasing, the phase channel uncertainty is also very low but shows a slight increase in the middle of the measurement, and the uncertainties corresponding to the frequency and topography channels are very high. When the resonance frequency-voltage loop area is used as the target property, the evolution of channel reward and uncertainty (Fig. 3k, l) is similar to that obtained with the polarization-voltage loop area. This is most likely because both phenomena are related to the ferroelectric domains.

After the ensemble-DKL exploration measurement, we can use the ensemble-DKL model to predict the target property at unmeasured points. Notably, a prediction can be made from each channel. Figures 4 and 5 show the predictions of the polarization-voltage loop area and the frequency-voltage loop area, respectively, for the three samples from each channel. For the PTO and PZT samples, the predictions from topography (Figs. 4d, h and 5d, h) display some features that also appear in the predictions from the other channels, presumably because the ferroelastic domains are also visible in topography.

Fig. 4: Ensemble DKL prediction of polarization loop area from each channel after experiment.

a–l Ensemble DKL predictions of PTO, PZT, and BFO, respectively. a, e, i DKL prediction of polarization loop area from amplitude image; b, f, j DKL prediction of polarization loop area from phase image; c, g, k DKL prediction of polarization loop area from resonance frequency image; d, h, l DKL prediction of polarization loop area from topography image.

Fig. 5: Ensemble DKL prediction of frequency loop area from each channel after experiment.

a–l Ensemble DKL predictions of PTO, PZT, and BFO, respectively. a, e, i DKL prediction of frequency loop area from amplitude image; b, f, j DKL prediction of frequency loop area from phase image; c, g, k DKL prediction of frequency loop area from resonance frequency image; d, h, l DKL prediction of frequency loop area from topography image.

The model selection during the exploration steps is based on both the current channel reward (partially from the warm-up steps) and the exploration/exploitation balance of the epsilon-greedy policy. For the PTO and PZT results with the frequency-voltage loop area as the target property, even though ensemble-DKL initially used the amplitude channel, the frequency channel reward starts increasing at the end of the measurements (Fig. 3c, g) and the frequency channel uncertainty decreases faster than that of the amplitude channel in some cases (Fig. 3d, h). Therefore, to investigate the channel behaviors in more detail when using the frequency-voltage loop area as the target property, we performed an additional experiment with more exploration steps and a different exploration/exploitation rate in the epsilon-greedy policy. In this measurement, we perform 20 warm-up steps and 480 exploration steps. During the exploration steps, a single channel is sampled at each step with epsilon decreased uniformly from 0.9 to 0.01. Compared to the previous measurements, epsilon is larger at the beginning of the measurement and smaller at the end, corresponding to a higher exploration rate at the beginning and a lower one at the end. The results are shown in Fig. 6. In this measurement, the other channels are used more frequently (Fig. 6f) because of the higher exploration rate at the beginning of the measurement, so we can observe their evolution in more detail. We observe an obvious increase in the reward of the phase channel (Fig. 6e) and the fastest decrease in uncertainty for the phase channel prediction (Fig. 6f), probably because of the electrostatic effect mentioned above.
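The difference between the two exploration schedules can be made concrete with a short calculation. Assuming a linear schedule and an independent exploratory draw at each step (both assumptions, as is the helper name `expected_random_steps`), the expected number of exploratory steps is simply the sum of epsilon over the schedule.

```python
def expected_random_steps(eps_start, eps_end, n_steps):
    """Sum of epsilon over a linear schedule, i.e. the expected number of
    steps at which the policy makes a random (exploratory) channel choice."""
    if n_steps <= 1:
        return eps_end * n_steps
    return sum(eps_start + (eps_end - eps_start) * k / (n_steps - 1)
               for k in range(n_steps))

first_schedule = expected_random_steps(0.4, 0.1, 200)    # 0.4 -> 0.1 over 200 steps
second_schedule = expected_random_steps(0.9, 0.01, 480)  # 0.9 -> 0.01 over 480 steps
```

Under these assumptions the first schedule yields about 50 exploratory steps out of 200, whereas the second yields about 218 out of 480, concentrated at the start of the measurement.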

Fig. 6: Experimental process exploring structure-resonance frequency loop area relationship in PZT sample with a different exploration and exploitation balance.

a–d BEPFM amplitude, phase, resonance frequency, and topography images. e Evolution of channel reward as a function of experimental steps. f Evolution of mean predictive uncertainty as a function of experimental steps. In this measurement, the ensemble DKL analysis starts with 0.4% of the data and uses 20 warm-up steps. After warm-up, the channel at each step is sampled with the epsilon-greedy policy, with epsilon decreased uniformly from 0.9 to 0.01 during the 480 steps. Here, the different epsilon as compared with the previous measurements leads to a higher exploration chance (due to larger epsilon) at the beginning and a higher exploitation chance (due to smaller epsilon) at the end of the measurement. g The measurement locations determined by DKL, shown on the amplitude map. Note that the measurement points are concentrated at the a-x domain boundaries, at which the polarization tilting can give rise to enhanced responses.

Discussion

To summarize, we have implemented an ensemble-DKL driven automated PFM for the identification of the channel with best predictive capacity, i.e., the channel for the most accurate reconstruction of target property encoded in spectroscopic data. This approach identifies the BEPFM image channel with the most predictive power for a target property of interest during measurement, which is also an indication of the strongest correlation between this BEPFM image channel and the target property.

Here, we implement this approach in BEPFM and piezoresponse spectroscopy measurements, and illustrate its application in exploring the structure-property relationships in three thin film materials with various ferroelectric and ferroelastic properties. To accelerate the ensemble-DKL training and prediction, we also develop an approach enabling real-time data transfer between the microscope PC and a GPU server, which allows the GPU server to analyze the microscope results on the fly. This workflow and approach are universal and can also be applied to other imaging and spectroscopic characterization methods, e.g., electron microscopy, optical microscopy, and mass spectrometry imaging.

Methods

The band excitation piezoresponse force microscopy measurements were performed on an Oxford Instruments Asylum Research Cypher AFM system using an ElectriMulti75-G Budget Sensors tip (Pt/Ir coated), with excitation over a band of frequencies near the resonance frequency to track the resonance frequency shift.

Machine learning code is available at https://github.com/yongtaoliu/Ensemble-DKL. Real-time machine learning analysis during automated experiments was performed in a Docker container residing on a server with 16 Nvidia Tesla V100 GPUs. Data were sent directly from the instrument computer to the GPU server via a combination of port forwarding and socket programming.