Introduction

In an age where advanced material characterization techniques provide access to a wide range of information, the main challenge lies in extracting meaningful insight from the large amount of data collected. With the ever-increasing rate of data generation by scientific instruments, the necessary data organization and analysis techniques have also advanced considerably.1,2,3,4,5,6,7 The same holds for high-throughput combinatorial searches in materials science and chemistry, in which a large number of parallel or complementary properties may be measured as a function of composition or processing variables. Such data are used to identify previously unknown trends or materials-property relationships, but analyzing and visualizing them efficiently remains a problem.8

As far as Scanning Probe Microscopy (SPM) is concerned, recent developments in functional imaging modes have been paralleled by a growing interest in studying various local phenomena. Examples include electrical conduction at ferroelectric/ferroelastic domain walls,9,10,11,12 polarization dynamics in ferroelectrics,13,14,15,16,17,18 temperature-, time-, and voltage-dependent studies of the ergodicity (time dependence) of polarization in relaxor ferroelectrics,14,19 and local magnetoelectricity.20,21,22,23,24,25,26,27,28 Probing such local phenomena requires measuring a response $I_{ij}(x_i, P_j)$ on an X × Y grid as a function of a spectral parameter $P_j$ ($j = 1, \ldots, M$), where $x_i$ is the spatial coordinate index ($i = 1, \ldots, N$; $N = X \times Y$). Such spectroscopic data can be acquired in two formats: (i) point-by-point acquisition of the response as a function of $P_j$, or (ii) a sequence of scans at different values of $P_j$. The choice between the two largely depends on the time needed to stabilize the spectral parameter relative to the scanner drift. In most cases the spectral parameter is voltage, which can be varied quickly, enabling drift-free point-by-point acquisition. However, when the spectral variable of interest is temperature, time, or magnetic field, each step of the spectral variable takes longer. In such cases the sequential scan format is more suitable, since drift can be tracked more reliably via simultaneously acquired topography images.
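For the sequential scan format, the data-matrix can be assembled by flattening each of the M images into one column, so that row i holds the full spectral response of pixel $x_i$. The following is a minimal NumPy sketch, assuming all scans share the same X × Y grid and are already registered; `build_data_matrix` and `column_to_map` are hypothetical helper names, not part of any published code for this work:

```python
import numpy as np


def build_data_matrix(images):
    """Stack M sequential scans (each X x Y) into an N x M matrix, N = X * Y.

    Row i holds the response at spatial location x_i for every spectral
    step P_j; column j is the j-th image flattened in row-major order.
    """
    stack = np.stack([np.asarray(img, dtype=float) for img in images], axis=-1)
    X, Y, M = stack.shape
    return stack.reshape(X * Y, M), (X, Y)


def column_to_map(column, shape):
    """Fold a length-N column (e.g. a score vector) back onto the X x Y grid."""
    return np.asarray(column).reshape(shape)


# Usage with synthetic placeholder data: 20 scans of 64 x 64 pixels.
rng = np.random.default_rng(0)
scans = [rng.normal(size=(64, 64)) for _ in range(20)]
I, grid_shape = build_data_matrix(scans)
print(I.shape)  # (4096, 20)
```

Because the flattening is row-major, any length-N vector derived later can be folded back into an X × Y map with the same grid shape.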

One application of such spectroscopic techniques is the local study of the magnetoelectric (ME) effect in multiferroic composites. Such composites are a realistic route to the envisioned room-temperature device applications.29,30,31 The ME effect in these composites is stress-mediated: the field-induced strain in one phase exerts a stress on the adjacent phase, which in turn changes the order parameter (polarization or magnetization) of that phase.32,33,34,35,36 Notably, the performance of such composites, gauged by the ME coupling coefficient α, is highly sensitive to microstructure.37,38,39,40 Moreover, modeling predicts complex domain evolution in composites.41,42,43 A piezoresponse force microscopy (PFM) investigation of the direct ME effect is expected to reveal a strong modulation of the local piezoresponse (i.e., polarization) as a function of the externally applied magnetic field,20,21,22,23,24,25,26,27,28 giving indirect insight into the spatial distribution of the local magneto-electro-mechanical interactions.

In the present work, two particulate composites were studied, each combining BaTiO3 (piezoelectric phase) with a different ferrite (magnetostrictive phase). An applied magnetic field H alters the local polarization as:

$$\Delta P_i = \alpha_{ij}\Delta H_j$$
(1)

where $\alpha_{ij}$ is the ME coefficient. This change in polarization should lead to a change in the local piezoelectric coefficient, owing to its dependence on polarization:44

$$d_{kij} = Q_{ijml}(P_l\varepsilon _{mk} + P_m\varepsilon_{lk})$$
(2)

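where $\varepsilon_{mk}$ is the dielectric permittivity tensor and $Q_{ijml}$ is the electrostriction coefficient tensor. Substituting equation 1 into equation 2, and treating $Q_{ijml}$ and $\varepsilon_{mk}$ as independent of the magnetic field, the magnetoelectrically induced change in $d_{kij}$ can be expressed as: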

$$\Delta d_{kij} = Q_{ijml}(\alpha _{ln}\Delta H_{n}\varepsilon _{mk} + \alpha _{mn}\Delta H_{n}\varepsilon _{lk})$$
(3)
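Because equation 2 is linear in P, equation 3 is simply equation 2 evaluated at $P + \Delta P$ minus its value at $P$, with $\Delta P_l = \alpha_{ln}\Delta H_n$ from equation 1. The NumPy sketch below checks this index bookkeeping numerically with `np.einsum`; the tensors are random placeholders (assumption: no crystal-symmetry constraints are imposed, and none of the values are material constants), and `d_coeff` and `delta_d_me` are hypothetical names:

```python
import numpy as np


def d_coeff(Q, P, eps):
    """Equation 2: d_kij = Q_ijml (P_l eps_mk + P_m eps_lk)."""
    return (np.einsum("ijml,l,mk->kij", Q, P, eps)
            + np.einsum("ijml,m,lk->kij", Q, P, eps))


def delta_d_me(Q, alpha, dH, eps):
    """Equation 3: Delta d_kij = Q_ijml (alpha_ln dH_n eps_mk + alpha_mn dH_n eps_lk)."""
    return (np.einsum("ijml,ln,n,mk->kij", Q, alpha, dH, eps)
            + np.einsum("ijml,mn,n,lk->kij", Q, alpha, dH, eps))


# Random illustrative tensors (not material constants of BaTiO3 or the ferrites).
rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 3, 3, 3))
eps = rng.normal(size=(3, 3))
alpha = rng.normal(size=(3, 3))
P0 = rng.normal(size=3)
dH = rng.normal(size=3)

dP = alpha @ dH  # equation 1: Delta P_l = alpha_ln Delta H_n
lhs = d_coeff(Q, P0 + dP, eps) - d_coeff(Q, P0, eps)
rhs = delta_d_me(Q, alpha, dH, eps)
print(np.allclose(lhs, rhs))  # True
```

The agreement holds only because Q and ε are held fixed; a field-dependent permittivity would add further terms to equation 3.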

In the variable-field PFM experiment, one effectively measures $\Delta d_{\mathrm{eff}}(x, y) = f(x, y, H_0, E_0, \Delta H)$,32 so a direct observation of equation 3 is not straightforward. Despite this limitation, PFM can reveal the strength of the ME effect on a local scale, which is useful for relating the ME coupling to the material properties and the microstructure.

Equation 3 can be probed experimentally by measuring the local piezoresponse as a function of magnetic field, i.e., by variable magnetic-field PFM. In the present work this was realized with the sequential spectroscopy approach, in which PFM images were acquired in sequence at different in-plane magnetic fields. The drawback of this approach is that the long acquisition time per image restricts the number of magnetic-field steps. The total acquisition time is the sum of the individual scan times and the time needed to ramp the field:

$$T_{total} = M \cdot T_{scan} + \frac{| {H_{end} - H_{start}}|}{r}$$
(4)

where $T_{scan}$ is the time per scan, $M$ the number of magnetic-field steps, $r$ the ramp rate, and $H_{start}$ and $H_{end}$ the starting and end magnetic fields, respectively.

Equation 4 shows that, with all other parameters fixed, the total acquisition time grows linearly with M, which inevitably limits how large M can be chosen. In the present case, with $-H_{start} = H_{end} = 6000$ Oe and r = 600 Oe/min, an uninterrupted sequence with M = 20 takes about 3-4 h. A longer acquisition time is impractical for manual acquisition, and too long for automated acquisition because of the risk of large thermal drifts. We opted for automated acquisition and had to discard some images from the sequence because of excessive drift and/or noise, which reduced M below 20.
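Plugging the quoted values into equation 4 makes the trade-off concrete. The sketch below reads "$-H_{start} = H_{end} = 6000$ Oe" as a single sweep from −6000 Oe to +6000 Oe (an assumption), and back-calculates the time per scan implied by a 3-4 h total. $T_{scan}$ itself is not stated in the text, so those numbers are illustrative, not measured; `total_time_min` is a hypothetical helper:

```python
def total_time_min(M, t_scan_min, h_start_oe, h_end_oe, ramp_oe_per_min):
    """Equation 4: T_total = M * T_scan + |H_end - H_start| / r (in minutes)."""
    return M * t_scan_min + abs(h_end_oe - h_start_oe) / ramp_oe_per_min


# Values quoted in the text; the sweep is assumed to run from -6000 Oe to +6000 Oe.
H_START, H_END, RAMP, M = -6000.0, 6000.0, 600.0, 20

ramp_time = total_time_min(0, 0.0, H_START, H_END, RAMP)
print(ramp_time)  # 20.0 (minutes)

# Back-calculated (not measured) time per scan implied by a 3-4 h total.
for total_h in (3, 4):
    t_scan = (total_h * 60 - ramp_time) / M
    print(total_h, t_scan)  # prints "3 8.0", then "4 11.0"
```

Under these assumptions the ramp contributes only about 20 min, so the total is dominated by the M · T_scan term, and each extra field step costs roughly one more scan time.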

The task at hand is to efficiently extract the actual material response from $I_{ij}$ without any a priori knowledge of the functional form of the response. In addition to the material response, the acquired data $I_{ij}$ are expected to contain noise, as well as additional non-stochastic variations unrelated to the phenomena of interest. Such variations can arise from (i) varying PFM tip-sample contact conditions,45,46,47 and/or (ii) relative drifts between the images (columns of $I_{ij}$). Linear drift between the images is easily removed as part of the preprocessing (see supplementary material); removing non-linear drift, however, is not always trivial. Since we are interested in extracting the ME-induced variation of the piezoresponse, it is useful to frame the problem like a time-series prediction or blind source-separation task,48 in which the different non-stochastic variations, including the material response, are expected to be mixed together in the measured signal. The corresponding latent-variable model can be written as:

$$I_{ij} = a_{ik}R_{kj} + n$$
(5)

where $I_{ij}$ ($N \times M$) is the data-matrix, $n$ represents white noise, $R_{kj}$ encompasses the actual material response as well as the non-stochastic components, and $a_{ik}$ is the weight matrix giving the weight of each component at each spatial location. Two physical constraints apply to $R_{kj}$ and $a_{ik}$: (i) the material response in $R_{kj}$, which is indirectly related to magnetostriction, should be symmetric in magnetic field, since magnetostrictive strain does not change sign when the field is reversed, and (ii) it should be localized within the piezoactive (ferroelectric) phase.
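The linear-drift removal mentioned above is described only in the supplementary material. As a hedged illustration, the sketch below assumes that the linear drift amounts to a rigid, integer-pixel translation between images and corrects it with phase correlation, a standard registration technique swapped in here; it is not necessarily the authors' procedure. `estimate_shift` and `align_stack` are hypothetical names. Note that `np.roll` wraps pixels around the edges, so in real data the wrapped border rows and columns should be cropped:

```python
import numpy as np


def estimate_shift(ref, img):
    """Integer-pixel shift s such that img ~= np.roll(ref, s, axis=(0, 1)).

    Phase correlation: the normalized cross-power spectrum of two circularly
    shifted images is a pure phase ramp whose inverse FFT peaks at the shift.
    """
    F_ref = np.fft.fft2(ref)
    F_img = np.fft.fft2(img)
    cross = F_img * np.conj(F_ref)
    cross /= np.abs(cross) + 1e-12          # keep only the phase information
    corr = np.fft.ifft2(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    size = np.array(corr.shape)
    wrap = peak > size // 2
    peak[wrap] -= size[wrap]                # map large indices to negative shifts
    return tuple(int(s) for s in peak)


def align_stack(images):
    """Align every image in a sequence to the first one (rigid, integer-pixel)."""
    ref = images[0]
    aligned = []
    for img in images:
        s = estimate_shift(ref, img)
        aligned.append(np.roll(img, (-s[0], -s[1]), axis=(0, 1)))
    return aligned
```

Non-linear drift (e.g. scan-line dependent distortion) is not captured by a single translation, which is consistent with the residual interface artifacts discussed later.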

The value of N can be made sufficiently large, but the choice of M is limited. This effectively rules out higher-order statistics (e.g., Independent Component Analysis, ICA), because the sampling is insufficient to characterize the marginal densities of the data-vector components (rows of $I_{ij}$).49 Moreover, if the physical phenomenon (ME effect) is weak, it can be further obscured in the data. In such circumstances it is advisable to restrict the analysis to low-order statistics, for which Principal Component Analysis (PCA) is a simpler alternative. In the present case we use non-central (uncentered) moments for PCA, based on the intuition that the material response lies farther from the origin in the feature-space and therefore contributes most to the univariate moment that PCA maximizes.50 However, unlike ICA, PCA is constrained to orthogonal bases and cannot correctly fit data with a non-orthogonal density distribution.51 In other words, PCA is prone to artifacts when more than one independent component is present in the data. To monitor this, we first label the data using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and visualize the labeled data by means of a non-linear projection in 3D. The visualization of the labeled data allows us to assess the outcomes of PCA.
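PCA with non-central moments amounts to an SVD of the raw, uncentered data-matrix: the right singular vectors are the components (length-M field-dependence patterns), the scaled left singular vectors are the spatial scores, and $s_k^2/N$ is the non-central second moment along component k. The authors' exact implementation is not given; this is a minimal NumPy sketch under that reading, with `noncentral_pca` as a hypothetical name. Component signs are arbitrary, as in any SVD:

```python
import numpy as np


def noncentral_pca(I, n_components):
    """PCA on the uncentered data-matrix I (N x M) via SVD.

    Because the mean is NOT subtracted, each component maximizes the
    non-central second moment (1/N) * sum_i (x_i . v)^2 rather than the
    variance, so points far from the origin dominate the leading axes.
    """
    U, s, Vt = np.linalg.svd(I, full_matrices=False)
    components = Vt[:n_components]                    # (k, M): field-dependence patterns
    scores = U[:, :n_components] * s[:n_components]   # (N, k): spatial scores
    moments = s ** 2 / I.shape[0]                     # non-central moment per component
    return components, scores, moments
```

Each column of `scores` can be reshaped to X × Y to give a score map, and plotting `moments` against the component index gives a moment-vs-component curve.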

Results and discussion

Data-structure visualization

With $I_{ij}$ as the data-matrix, each spatial location is a point in an M-dimensional data-space with M > 3. A 3D projection of this multi-dimensional data is needed to characterize the data-structure, and serves as a starting point for the subsequent analysis. We used t-distributed Stochastic Neighbor Embedding (t-SNE)52 for the 3D projection of the M-dimensional data (Fig. 1).
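The 3D projection in this work was produced with a MATLAB t-SNE tool. For readers working in Python, scikit-learn's `TSNE` class offers an equivalent projection. The sketch below is a swapped-in alternative, not the authors' tool, and the perplexity and initialization are assumed values; `embed_3d` is a hypothetical name:

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_3d(I, perplexity=30.0, seed=0):
    """Project the M-dimensional rows of I (N x M) onto 3D with t-SNE.

    Each row (one pixel's field-dependent response) becomes one 3D point;
    nearby rows in the M-dimensional space stay nearby in the embedding.
    """
    tsne = TSNE(n_components=3, perplexity=perplexity, init="pca",
                random_state=seed)
    return tsne.fit_transform(I)
```

The perplexity must be smaller than the number of rows. For a full 512 × 512 scan (about 262,000 rows) the embedding is slow, so subsampling rows for visualization is one option; this is an assumption, not a step described in the text.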

Fig. 1

Labeled 3D representation of the data for (a) the BTO-BaM and (b) the BTO-CFO system, based on the clusters identified by DBSCAN (gray: cluster-1, red: cluster-2, blue: cluster-3)

t-SNE is designed to preserve the local structure of high-dimensional data in its low-dimensional representation. Hence, visually observable clusters in the 3D projections should correspond to the individual non-stochastic components present in the data. The data-structures of the two material systems show different traits. The data for the BaTiO3–BaFe12O19 system (BTO-BaM) hardly show any clusters, suggesting a weaker presence of individual non-stochastic components. In contrast, the BaTiO3–CoFe2O4 system (BTO-CFO) data show a clear distinction between different groups of data-points. Next, we apply DBSCAN to identify these clusters rigorously, regardless of how clearly they appear in the projection.

Density based clustering

Similar responses tend to cluster in the data-space, and a suitable criterion for defining a cluster is the density-based connectivity of the corresponding data-points.53 DBSCAN identifies clusters as connected regions of high local density, without requiring the number of clusters in advance and while flagging low-density points as outliers, which makes it well suited to the current data-sets. Its two input parameters, the neighborhood radius ϵ (Euclidean) and the minimum number of neighborhood points p, were determined from the data-sets. First, p was fixed heuristically for each data-set based on the total number of data-points; the average p-nearest-neighbor distance over the entire data-set was then determined and used as ϵ. With these parameters, three clusters were identified in each data-set, as shown in Fig. 1.
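The parameter heuristic above can be sketched with scikit-learn, whose `DBSCAN` the authors used. Two details are assumptions: "average p-nearest-neighbor distance" is read as the mean distance to each point's p-th nearest neighbor (excluding the point itself), and p is passed directly as `min_samples`, which in scikit-learn counts the point itself. `dbscan_knn_eps` is a hypothetical name:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors


def dbscan_knn_eps(I, p):
    """Cluster rows of I with DBSCAN, setting eps to the mean p-NN distance.

    p: minimum number of neighborhood points, fixed heuristically beforehand.
    Returns (labels, eps); label -1 marks outliers (noise points).
    """
    nn = NearestNeighbors(n_neighbors=p).fit(I)
    dist, _ = nn.kneighbors()       # no query passed -> each point excludes itself
    eps = dist[:, -1].mean()        # average distance to the p-th nearest neighbor
    labels = DBSCAN(eps=eps, min_samples=p).fit_predict(I)
    return labels, eps
```

Setting ϵ to the typical p-th neighbor distance makes roughly the denser half of the points core points, so sparse regions such as drift-affected interface pixels tend to end up as outliers (label −1).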

Visual comparison of the PFM images (Fig. 2a,b) with the cluster domain maps (Fig. 2c,d) of both systems shows that cluster-2 corresponds to the ferroelectric phase and should therefore represent the material response. Cluster-1, in both cases, is localized in an area dispersed around the interface and corresponds to the DBSCAN outliers. The data-points of the magnetostrictive phase in both systems are classified as cluster-3. Figure 2e,f shows the mean-vectors of each cluster, which reveal interesting nuances. In both cases, the cluster-3 mean-vector carries no physical significance, as it corresponds to the piezo-inactive magnetic phase. The cluster-2 mean-vector, in contrast, should ideally reveal the average magnetic-field dependence of the piezoresponse. For BTO-CFO, a clear trend in the piezoresponse is visible, which roughly fulfills the physical constraint of symmetry with respect to magnetic field. For BTO-BaM, although the corresponding mean-vector is also symmetric with respect to magnetic field, the relative variation of the piezoresponse is subtle. These nuances in the mean-vectors, combined with the features of the data-structure, suggest that the local ME effect observed in BTO-BaM is weaker than that in BTO-CFO.
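Domain maps and cluster mean-vectors follow directly from the per-pixel labels: the labels are folded back onto the scan grid, and the rows of the data-matrix sharing a label are averaged. A minimal NumPy sketch with hypothetical helper names. The correspondence between DBSCAN's integer labels and the cluster-1/2/3 naming used here is an assumption; DBSCAN numbers clusters arbitrarily and marks outliers as −1:

```python
import numpy as np


def cluster_domain_map(labels, grid_shape):
    """Fold the per-pixel cluster labels back onto the X x Y scan grid."""
    return np.asarray(labels).reshape(grid_shape)


def cluster_mean_vectors(I, labels):
    """Mean field-dependence (length-M vector) of each cluster label.

    DBSCAN outliers carry the label -1 and are treated as their own group.
    """
    labels = np.asarray(labels)
    return {int(lab): I[labels == lab].mean(axis=0) for lab in np.unique(labels)}
```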

Fig. 2

Spatial locations of the identified clusters (domain maps) for (c) system-1 (BTO-BaM) and (d) system-2 (BTO-CFO). (e, f) Mean-vectors of the clusters for system-1 and system-2, respectively. The corresponding PFM images of (a) BTO-BaM and (b) BTO-CFO are shown for reference

Labeling the data (Fig. 1) shows that the non-stochastic components in $R_{kj}$ have been classified into isolated high-density clusters in the feature-space (cluster-2, cluster-3). The fact that the outliers in both cases are localized around the interface suggests that they correspond to a drift-induced variation in piezoresponse. Because of the residual non-linear drift in the images, data-points at the interface undergo a larger variation, since the piezoresponse differs strongly on either side of the interface. Following this reasoning, BTO-BaM has a relatively larger non-linear drift, so that more data-points are affected. The interface points in BTO-CFO are apparently less affected by non-linear drift and hence deviate only nominally from the material response (cluster-2). This conclusion is further corroborated by the closer resemblance between the cluster-1 and cluster-2 mean-vectors in BTO-CFO (Fig. 2f).

Principal component analysis

Figure 3 shows the results of the Singular Value Decomposition (SVD) of the BTO-BaM data-matrix. The 1st principal component corresponds to the direction of the highest univariate second moment (equivalent to the variance for centered data), followed by components in decreasing order of moment. The score maps show the spatial distribution of the score of each corresponding pattern. The 1st component, which has a higher score in the ferroelectric phase, fulfills the physical requirement of horizontal symmetry, i.e., symmetry with respect to the magnetic field (Fig. 3a,e), whereas the 2nd and 3rd components do not show any obvious physical correlation (Fig. 3b,c,f,g). Interestingly, the 4th component appears more relevant (Fig. 3d,h), but its moment is too small (Fig. 5a) for it to be considered significant.
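The symmetry of a component with respect to the magnetic field is judged visually in the text. A simple quantitative check, offered here as a hypothetical metric rather than the authors' criterion, compares a component $v(H)$ with its mirror image $v(-H)$. The sketch assumes each field value occurs once (a field loop visiting values twice would first need averaging) and that the sampled field range is symmetric about zero; `field_asymmetry` is a hypothetical name:

```python
import numpy as np


def field_asymmetry(H, v):
    """Relative deviation of v(H) from its mirror image v(-H).

    0 means perfectly even in H; a purely odd response gives 2. Values at
    -H that fall between samples are linearly interpolated; np.interp clamps
    at the ends, so the sampled range should be symmetric about 0.
    """
    H = np.asarray(H, dtype=float)
    v = np.asarray(v, dtype=float)
    order = np.argsort(H)
    H_sorted, v_sorted = H[order], v[order]
    mirrored = np.interp(-H_sorted, H_sorted, v_sorted)
    return np.linalg.norm(v_sorted - mirrored) / np.linalg.norm(v_sorted)
```

Applied to each principal component, a value near 0 flags a candidate material response, while components with large values (such as drift-related ones) would fail constraint (i).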

Fig. 3

The first four principal components (e–h) and the corresponding score maps (a–d), decomposed from the data-matrix of system-1 (bright and dark contrast correspond to high and low score values, respectively)

Fig. 4

The first four principal components (e–h) and the corresponding score maps (a–d), decomposed from the data-matrix of system-2 (bright and dark contrast correspond to high and low score values, respectively)

For the SVD of the BTO-CFO data-matrix (Fig. 4), the 1st component indeed shows the expected field dependence (Fig. 4a,e), but the 2nd component also captures a significant related feature of the data (Fig. 4f), which is localized within the ferroelectric phase (Fig. 4b). The 3rd and 4th components have negligible moments (Fig. 4c,d,g,h) and are therefore excluded from further analysis.

Fig. 5

Moment vs. component index for (a) BTO-BaM and (b) BTO-CFO, showing the decreasing non-central univariate moment of successive components

Discussion

To understand the different PCA results obtained for the two material systems, we consider the basic properties of PCA: maximization of the univariate second moments and minimization of the bivariate second moments. For a centered data-set, the univariate moment along a component axis is the average squared distance of the data-points from their mean, projected onto that axis. In the present case, however, the data are not centered, so this distance is measured from the origin, and the data-points lying farthest from it contribute most to the univariate moment. In reality, the piezoresponse of the magnetostrictive phase is much smaller than that of the ferroelectric phase, so the corresponding data-points lie closer to the origin. As a result, the first principal component is predominantly determined by the data-points of the ferroelectric phase. To fulfill its second requirement, the minimization of the bivariate second moments, PCA looks for an axis with the minimum average squared orthogonal distance from the data-points. In other words, it looks for a regression line through the data, along the direction of maximal univariate moment, that also passes through the origin. Intuitively, the data-points of the magnetostrictive phase, which have a relatively high stochastic spread, affect the minimization of the bivariate moments less than the data-points of the ferroelectric phase.

Based on these theoretical arguments, the PCA outcomes can be analyzed. In the case of BTO-BaM, the blue cluster lies closest to the origin, whereas the red cluster lies farthest from it. Hence, the 1st component essentially represents the red cluster, as is evident from the spatial correlation as well as from its qualitative resemblance to the corresponding mean-vector (Fig. 2e). Next, the drift-induced spread of the data-points at the interface determines the direction of the next highest moment, namely the 2nd component. Consistently, the score of the 2nd component is also higher near the interface (Fig. 3b). The case of BTO-CFO requires a different analysis. To understand the difference, we refer to the mapped data-structure of BTO-CFO (Fig. 1b). The red cluster in BTO-CFO has a large and uniform spread in the feature-space. This signifies that the field dependence of the piezoresponse deviates continuously within the ferroelectric phase, with the largest deviation occurring at the interface (DBSCAN outliers). In other words, there is a continuous spread in the data-points lying far from the origin, so that the 1st and 2nd components divide themselves optimally between the two main features of the data, namely the field-dependent pattern of the piezoresponse and the spread of that pattern. It is interesting to note how efficiently PCA accounts for both the main feature and the spread in both cases, despite the different nature of the spread.
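The argument can be illustrated with a toy example on synthetic data (not measured data). A hypothetical "near" cluster sits at the origin with isotropic stochastic spread, standing in for the magnetostrictive phase. A hypothetical "far" cluster lies along a direction u with an additional spread along w, standing in for a ferroelectric response whose field dependence varies across the phase, as in BTO-CFO. Uncentered PCA then places the 1st component along u and the 2nd along the spread direction w:

```python
import numpy as np

rng = np.random.default_rng(6)
M = 10
u = np.zeros(M)
u[0] = 1.0          # direction of the far "ferroelectric" cluster
w = np.zeros(M)
w[1] = 1.0          # direction along which that cluster spreads

# Near-origin cluster: weak response with isotropic stochastic spread
near = rng.normal(0.0, 0.3, size=(500, M))
# Far cluster: strong response along u, small isotropic noise, extra spread along w
far = (5.0 * u + rng.normal(0.0, 0.1, size=(500, M))
       + rng.normal(0.0, 1.0, size=(500, 1)) * w)

I = np.vstack([near, far])
_, s, Vt = np.linalg.svd(I, full_matrices=False)   # uncentered PCA
print(abs(Vt[0] @ u), abs(Vt[1] @ w))              # both values should be close to 1
```

Removing the spread along w from the far cluster would leave the 2nd component to pick up only the weak isotropic noise, which mirrors the difference between the two systems discussed above.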

PCA therefore appears well suited to extracting the physical phenomena from the data-structures considered here. The observed magnetic-field dependence of the piezoresponse does not appear to have a non-physical origin: it is localized within the ferroelectric phase, and we have also cross-checked the simultaneously acquired topography response using PCA (see supplementary material). We further conclude that the two systems show different types of magnetic-field dependence: BTO-BaM shows a single but weak dependence, whereas BTO-CFO shows a stronger dependence on magnetic field that deviates nominally within the ferroelectric phase. This deviation could be associated with varying degrees of ME coupling due to a varying texture of BaTiO3.

In conclusion, by combining PCA and DBSCAN, we were able to extract physical meaning from the variable magnetic-field PFM data-sets. Using experimental arguments, we also highlighted the inherent ‘small-data’ problem of sequential imaging data-sets, which needs to be addressed given the general interest in such experiments. The 3D data representation obtained using t-SNE revealed a varied distribution of the data-points, highlighting the sparse nature of the data-vectors. The data-points were efficiently clustered using DBSCAN. The labeled data-structures of both systems clearly showed data-points corresponding to the ferroelectric phase, the magnetic phase, and the interface. A key difference between the data-structures of the two systems was their secondary spread. Despite their individual complexities, both data-sets were optimally analyzed by PCA. This suggests that second-order non-central moments should provide optimal feature extraction for such data-sets. As a general physical conclusion, the locally observed ME effect is weaker in the hexaferrite-based composite than in the spinel-ferrite-based composite. This difference could be associated with the fact that in the hexaferrite the number of possible easy axes per crystallite, along which the magnetostriction is highest, is limited to 2, whereas in spinel ferrites there are 4 such axes, leading to a more uniform stress distribution at the interface.

Methods

The studied systems are particulate composites prepared by a standard ceramic processing route.23 The ceramic disks were ground to a thickness below 0.5 mm so that a high magnetic field could be applied in a confined region. The studied surfaces were then finely polished with a grit size down to 0.25 µm.

The measurements were carried out using a commercial scanning probe microscope (MFP-3D, Asylum Research). PFM amplitude images were collected in dual amplitude resonance tracking mode (DART-PFM™), which keeps the operating frequency close to the contact resonance frequency. Fresh, uncoated, doped silicon cantilevers (SEIHR, Nanosensors) with a spring constant k = 15 N/m and a free-air resonance frequency f = 130–250 kHz were used. The corresponding contact resonance frequency was about 600 kHz. The cantilever type was selected by trial and error to provide optimal PFM contrast. A commercial variable magnetic field module (VFM2, Asylum Research) was used to apply in-plane magnetic fields up to 3 kOe. The scan resolution was 512 × 512 pixels.

The 3D t-SNE representation was obtained using an open-source MATLAB tool distributed by Laurens van der Maaten. DBSCAN was implemented using the open-source Python library scikit-learn (scikit-learn.org).54

Code availability

The codes generated during the study are available from the corresponding author (H. Trivedi) upon reasonable request. The codes are not publicly available because they contain information that could compromise research participants' consent.

Data availability

The datasets generated and analyzed during the study are made available as supplementary information files.