# Characterizing transition-metal dichalcogenide thin-films using hyperspectral imaging and machine learning

## Abstract

Atomically thin polycrystalline transition-metal dichalcogenides (TMDs) are relevant to both fundamental science investigation and applications. TMD thin-films present uniquely difficult challenges to effective nanoscale crystalline characterization. Here we present a method to quickly characterize the nanocrystalline grain structure and texture of monolayer WS2 films using scanning nanobeam electron diffraction coupled with multivariate statistical analysis of the resulting data. Our analysis pipeline is highly generalizable and is a useful alternative to the time consuming, complex, and system-dependent methodology traditionally used to analyze spatially resolved electron diffraction measurements.

## Introduction

Transition-metal dichalcogenides (TMDs) display emergent properties when reduced to single, two-dimensional (2D) layers. A transition from indirect to direct band gap1,2, the emergence of charge density waves3,4, an increase in mobility5,6,7, and the presence of valley polarization8,9,10 are a few of the important properties that are manifested in the monolayer limit.

Polycrystalline TMD thin films can be grown at wafer scale and lend themselves to scalability11,12. These films have a high density of intrinsic grain boundaries and other defects that can influence physical properties and drive exotic correlated electron effects and emergent phenomena4. In this communication, we characterize large-area polycrystalline thin-films of WS2 using scanning nanobeam electron diffraction, also called four-dimensional scanning transmission electron microscopy (4DSTEM), to identify the local crystalline texture and structure. We employ advanced multivariate statistical analysis (MVA) techniques to rapidly extract pertinent information, namely the grain structure of the WS2 films, from the complex, high-dimensional 4DSTEM data.

WS2 films are grown directly on electron transparent SiN membranes, resting on Si supports, using a previously described technique13. Samples are prepared by depositing a coating of 10 nm of SiO2 on the SiN membrane, as well as the back and edges of the support window using plasma-enhanced atomic layer deposition (PE-ALD). This provides an ideal growth substrate on the electron transparent window and protects the Si support frame from chemical conversion during subsequent steps. 2 nm of WO3 is deposited onto the substrates using PE-ALD. The metal-oxide precursor is converted to WS2 in a dry (< 10 ppm water) tube furnace at 800 °C using H2S as a chalcogenization agent.

In a 4DSTEM experiment (Fig. 1a), we acquire diffraction data over a wide area of the sample14,15,16. This is in contrast to traditional dark-field (DF) TEM imaging, where a physical aperture is placed in the diffraction plane of the instrument at the location of a Bragg spot, resulting in an image formed by Bragg scattered electrons that have passed through the aperture. DF-TEM characterization uses a series of aperture images, acquired at several aperture positions, to construct a map of the spatial distribution of the crystalline grains in a sample17,18. In contrast, 4DSTEM simultaneously acquires all possible aperture positions, including those that do not fall directly on a Bragg peak.
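Because every aperture position is recorded at once, a dark-field image can be formed after the fact by summing the detector intensity inside a software-defined aperture. The following is a minimal sketch of that idea, not the acquisition or analysis code used in this work; the array layout `(nx, ny, kx, ky)`, the function name `virtual_dark_field`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def virtual_dark_field(data4d, center, radius):
    """Sum the intensity inside a circular virtual aperture at every probe position.

    data4d : array with the assumed layout (nx, ny, kx, ky)
    center : (row, col) of the aperture on the detector, in pixels
    radius : aperture radius, in detector pixels
    """
    kx, ky = data4d.shape[2:]
    rows, cols = np.ogrid[:kx, :ky]
    aperture = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    # Contract the two detector axes against the aperture mask.
    return np.tensordot(data4d, aperture.astype(data4d.dtype), axes=([2, 3], [0, 1]))

# Synthetic 4x4 scan: only the first two scan columns scatter into a
# "Bragg spot" at detector pixel (10, 20).
data = np.zeros((4, 4, 32, 32))
data[:, :2, 10, 20] = 1.0
image = virtual_dark_field(data, center=(10, 20), radius=2)
```

Moving `center` to another Bragg spot yields the dark-field image for a different grain orientation from the same data set, with no further acquisition.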

Figure 1b shows a conventional STEM image of a WS2 thin-film acquired using an annular dark field (ADF) detector. The contrast in this image indicates differences in thickness, mass density, and local crystallography of the sample. The bright regions are the thin film, the dark regions are voids, and the very bright spots are regions of contamination. The sample presumably has a distribution of grain sizes and orientations, but this is not directly apparent from the STEM image of Fig. 1b.

We now investigate the same WS2 film via 4DSTEM. Our attempts to study crystallinity and order in these films using atomic resolution STEM failed to provide satisfactory results due to the 20 nm thick SiN support and significant beam damage to the WS2 monolayers. Increasing the probe size, thus reducing the STEM resolution, allows a reduction of the electron flux below the damage threshold while collecting information about crystalline order through spatially resolved diffraction patterns. Furthermore, while the quality of HRSTEM results is affected by contamination, the diffraction patterns used in this 4DSTEM study are relatively insensitive to the amorphous contamination spots seen in Fig. 1b.

The red circles in the inset of Fig. 1b correspond to approximate positions of the probe during the 4DSTEM mapping. Figure 1c shows a visualization of the associated raw diffraction pattern data. Each red box in Fig. 1c presents spectral data collected from the corresponding spatial pixel (red circles) in Fig. 1b. It is apparent that each diffraction pattern has a mixture of two main features: sharp, bright spots arranged in an approximately hexagonal pattern arising from Bragg scattering from the crystalline planes of the thin-film and a diffuse component with approximate azimuthal symmetry that arises from the amorphous support substrate (the highly saturated central spot from unscattered electrons contains no useful information and has been masked in Fig. 1c).

The data in Fig. 1c hint at differently oriented crystallites (i.e. domains) with hexagonal (sixfold) symmetry within the sample. However, the true rotational symmetry and detailed domain structure are not easily individually identified and assigned by eye. In fact, the rotational symmetry in this specimen is not sixfold at all (MVA analysis reveals the domains have threefold crystalline symmetry). This illustrates the general difficulty of directly visualizing or assigning unambiguous meaning to higher dimensional data, particularly 4DSTEM data sets.

Figure 2 shows the results of a traditional analysis of the 4DSTEM data. Bragg peaks are detected in 10 randomly chosen diffraction patterns using difference of Gaussians (DoG) blob detection. The detected blobs are extracted and averaged together to create an exemplar for the diffraction spots, which is then used as a template. Bragg peaks are detected in each diffraction pattern of the 4DSTEM data using cross-correlation matching of the template. This preliminary set of diffraction peaks is refined by removing any matches that fall outside of a well-defined range of reciprocal space radii (3.43 nm⁻¹ ≤ q ≤ 3.94 nm⁻¹), corresponding to the in-plane reciprocal lattice constant of WS2 (q0 = 3.67 nm⁻¹). The image shown in Fig. 2 is generated by drawing lines corresponding to the orientations of all Bragg peaks at each spatial pixel. The color scale indicates the angle, in degrees, of each Bragg reflection (modulo 60 degrees).
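The last two steps of this traditional analysis, the radial filter and the folding of peak angles modulo 60 degrees, can be sketched as follows. This is an illustration only: the function name `filter_and_fold` and the input format (peak positions in nm⁻¹ measured from the central beam) are assumptions, and the peak detection itself is not reproduced here.

```python
import numpy as np

def filter_and_fold(peaks_q, q_min=3.43, q_max=3.94, fold_deg=60.0):
    """Keep peaks inside the radial window and fold their azimuthal angle.

    peaks_q : (n, 2) array of (qx, qy) positions in nm^-1, measured from
              the central beam (hypothetical input format)
    """
    peaks_q = np.asarray(peaks_q, dtype=float)
    radius = np.hypot(peaks_q[:, 0], peaks_q[:, 1])
    keep = (radius >= q_min) & (radius <= q_max)
    angles = np.degrees(np.arctan2(peaks_q[keep, 1], peaks_q[keep, 0]))
    return np.mod(angles, fold_deg)

# Three reflections of one grain at q0 = 3.67 nm^-1, plus one spurious
# match at a radius outside the accepted window.
q0 = 3.67
true_angles = np.radians([10.0, 70.0, 130.0])
peaks = np.c_[q0 * np.cos(true_angles), q0 * np.sin(true_angles)]
peaks = np.vstack([peaks, [2.0, 0.0]])
folded = filter_and_fold(peaks)
```

All reflections belonging to one hexagonal pattern collapse onto a single folded angle, which is why this representation cannot distinguish threefold from sixfold symmetry.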

Figure 2 indicates that the specimen is composed of many small grains with lateral sizes on the order of ten nanometers. The majority of the domains do not overlap, but there are regions on the sample with multiple crystallographic orientations at a single spatial location. The result presented in Fig. 2, while striking, is time-consuming to construct, and the data analysis relies on a priori knowledge of the crystal structure, implying that the methodology is not necessarily generalizable to other systems. Furthermore, even though the image presented in Fig. 2 has a significant reduction in size and dimension compared to the original data set, there is still too much information density to allow the facile extraction of the most relevant properties of the system under investigation (e.g. the precise distribution of grain sizes and orientations).

MVA techniques are extremely useful for tackling the problems of dimensional reduction and information extraction from complex data sets19,20,21,22,23,24. These statistical techniques result in a simplified representation of high-dimensional data consisting of a small number of low dimensional components which convey the general trends in the original data set. Some preliminary attempts have been made to approach 4DSTEM analysis using MVA, with varying degrees of success24,25. With this in mind, we apply MVA methodology, using the approach outlined in Fig. 3. Prior to MVA decomposition, the data are pre-treated. Many MVA techniques are highly sensitive to small shifts and outliers in the data, which can either be a blessing or a curse. In order to minimize artefacts in the MVA output, the data are first aligned (there are small shifts in the diffraction patterns recorded at different spatial pixels) and outliers are removed (e.g. cosmic rays result in “hot pixels” and the intensity from the central beam is both highly saturated and much greater than the intensity from scattered electrons).

The shifts between diffraction patterns are calculated by cross-correlation of the central beam followed by interpolation of the data to bring it into registration. The cross-correlation is enhanced by applying a noise reducing Gaussian filter followed by an edge-finding Sobel filter to the data before calculation of the cross-correlation coefficient26.
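A minimal sketch of this registration step is given below. It makes two simplifying assumptions that differ from the procedure described above: the shift is estimated only to the nearest pixel, and registration would be done with `np.roll` rather than by interpolation. The function names and the `sigma` value are illustrative.

```python
import numpy as np
from scipy import ndimage

def edge_image(pattern, sigma=2.0):
    """Gaussian smoothing followed by the Sobel gradient magnitude."""
    smooth = ndimage.gaussian_filter(pattern.astype(float), sigma)
    return np.hypot(ndimage.sobel(smooth, axis=0), ndimage.sobel(smooth, axis=1))

def estimate_shift(reference, pattern, sigma=2.0):
    """Integer (row, col) shift such that np.roll(pattern, shift) matches reference."""
    a = edge_image(reference, sigma)
    b = edge_image(pattern, sigma)
    # Circular cross-correlation computed through the FFT.
    xcorr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))

# Synthetic central beam: a disk, and a copy displaced by (3, -2) pixels.
rows, cols = np.ogrid[:64, :64]
reference = ((rows - 32) ** 2 + (cols - 32) ** 2 <= 36).astype(float)
pattern = np.roll(reference, (3, -2), axis=(0, 1))
shift = estimate_shift(reference, pattern)
aligned = np.roll(pattern, shift, axis=(0, 1))
```

The edge filter makes the correlation sensitive to the rim of the central disk rather than to its saturated interior, which is the motivation for applying it before the cross-correlation.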

Hot pixels are removed using a 3 × 3 median filter and the intensity distribution from the central beam is masked with a circular disk. The final step of data pre-treatment is rebinning each diffraction pattern, making the size of the data more manageable and reducing the computation time for subsequent steps.
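These three pre-treatment operations can be sketched for a single diffraction pattern as follows. The function name `pretreat`, the mask radius, and the binning factor are assumptions chosen for the example, and the central beam is assumed to sit at the center of the detector after alignment.

```python
import numpy as np
from scipy import ndimage

def pretreat(pattern, mask_radius, bin_factor):
    """Median-filter hot pixels, mask the central beam, and rebin by summation."""
    cleaned = ndimage.median_filter(pattern.astype(float), size=3)
    n_rows, n_cols = cleaned.shape
    rows, cols = np.ogrid[:n_rows, :n_cols]
    # Assumption: the central beam is centered on the detector after alignment.
    center = ((n_rows - 1) / 2, (n_cols - 1) / 2)
    beam = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= mask_radius ** 2
    cleaned = np.where(beam, 0.0, cleaned)
    # Rebin: crop to a multiple of bin_factor, then sum each block.
    r, c = n_rows // bin_factor, n_cols // bin_factor
    cropped = cleaned[: r * bin_factor, : c * bin_factor]
    return cropped.reshape(r, bin_factor, c, bin_factor).sum(axis=(1, 3))

# Flat synthetic pattern with one hot pixel and a saturated central disk.
pattern = np.ones((64, 64))
pattern[5, 7] = 1e4
rr, cc = np.ogrid[:64, :64]
pattern[(rr - 31.5) ** 2 + (cc - 31.5) ** 2 <= 16] = 1e5
out = pretreat(pattern, mask_radius=6, bin_factor=4)
```

Rebinning by a factor of 4 along each detector axis reduces the number of spectral pixels by a factor of 16, which directly shortens the subsequent decomposition.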

The first step of the MVA portion of the data analysis workflow is to decompose the data into a new basis using principal component analysis (PCA)20,27. The Principal Components (PCs) are orthogonal and describe which parts of the data contain the most variance. The first PC accounts for the most variance in the data, the second PC accounts for the second most variance, and so forth. The PC basis is constructed from the data covariance matrix, $$\mathbf{C}\left(\mathbf{k}\right)$$, given by:

$$\mathbf{C}\left(\mathbf{k}\right)= \sum_{{\varvec{x}}}\left(\mathbf{D}\left(\mathbf{x},\mathbf{k}\right)-{\overline{\mathrm{D}}}(\mathbf{k})\right){\left(\mathbf{D}\left(\mathbf{x},\mathbf{k}\right)-{\overline{\mathrm{D}}}(\mathbf{k})\right)}^{\mathbf{T}}$$
(1)

$$\mathbf{D}\left(\mathbf{x},\mathbf{k}\right)$$, the as-acquired data set, is a function of two spatial directions ($$\mathbf{x}$$) and two spectral dimensions ($$\mathbf{k}$$), $${\overline{\mathrm{D}}}(\mathbf{k})$$ is the mean over the spatial dimensions, and T denotes the matrix transpose. The PC basis vectors, $${P}_{\alpha }(\mathbf{k})$$, are the eigenvectors of $$\mathbf{C}\left(\mathbf{k}\right)$$. In the PC basis, the data are represented as:

$$\mathbf{D}\left(\mathbf{x},\mathbf{k}\right)= \sum_{\alpha =1}^{\mathrm{N}}{a}_{\alpha }\left(\mathbf{x}\right){P}_{\alpha }\left(\mathbf{k}\right)$$
(2)

where $${a}_{\alpha }\left({\varvec{x}}\right)$$ are the spatially varying weight coefficients and N is the dimension of the raw data. N is either the total number of spatial pixels or total number of spectral pixels, whichever is smaller. Traditionally, the weights (which, for the data discussed here, are real-space images) are called the PC loadings, and the PCs (which, for the data discussed here, are diffraction patterns) are called the PC factors.
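In practice the decomposition of Eqs. (1) and (2) can be obtained from a singular value decomposition of the mean-subtracted data, which avoids forming the covariance matrix explicitly. The sketch below is illustrative rather than the code used in this work. It makes the mean subtraction explicit and assumes a covariance normalized by the number of spatial pixels; the function name `pca_decompose` and the array layout are assumptions.

```python
import numpy as np

def pca_decompose(data4d):
    """PCA of a 4D data set with the assumed layout (nx, ny, kx, ky).

    Returns the spatial mean pattern, the loadings a_alpha(x), the factors
    P_alpha(k), and the covariance eigenvalues (1/N_x normalization assumed).
    """
    nx, ny, kx, ky = data4d.shape
    X = data4d.reshape(nx * ny, kx * ky).astype(float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = U * s                    # columns are real-space weight maps
    factors = Vt                        # rows are orthonormal diffraction patterns
    eigenvalues = s ** 2 / X.shape[0]   # sorted from largest to smallest
    return mean, loadings, factors, eigenvalues

rng = np.random.default_rng(0)
data = rng.random((6, 5, 4, 4))         # 30 spatial pixels, 16 spectral pixels
mean, loadings, factors, eigenvalues = pca_decompose(data)
reconstructed = mean + loadings @ factors
```

Here N = 16, the smaller of the number of spatial and spectral pixels, and keeping all N components reproduces the flattened data to numerical precision.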

Figure 4 shows the primary features of the PCA decomposition. The first several components (1–5) are the most important and they indicate clear spatial structure. We observe a rotationally symmetric component, related to the mean response of the sample, as well as azimuthally varying ring-shaped components which describe the intensity of the Bragg spots throughout the sample. The next components (15–19) have less clear spatial structure, but decidedly more complex spectral structure; these components describe complicated intensity variations of the diffuse background. Components 50–54 have no discernable spatial structure and an intricate spectral structure that has no immediately clear meaning. The final components (500–504) have no structure either spatially or spectrally and show the descent of the components into random noise.

A fundamental assumption of PCA is that a data set can be described to a high degree of precision by retaining only N′ ≪ N components. It is assumed that the most important parts of the data (those with the highest variance) reside in the earlier components, while the later components contain primarily high-frequency noise, similar to Fourier decomposition and compression; inspection of Fig. 4 suggests that this assumption is valid. In this case, the reconstructed model of the data, $${\mathbf{M}}_{\mathrm{P}\mathrm{C}\mathrm{A}}$$, is given by:

$$\mathbf{M}_{\mathrm{PCA}}\left(\mathbf{x},\mathbf{k};{\mathrm{N}}^{\prime}\right)= \sum_{\alpha =1}^{{\mathrm{N}}^{\prime}}{a}_{\alpha }\left(\mathbf{x}\right){P}_{\alpha }\left(\mathbf{k}\right)$$
(3)

The PCA components are orthogonal and thus do not necessarily describe physical processes. In order to decompose the PCA components into a new basis that more accurately reflects the physical reality of the sample, we employ independent component analysis (ICA) unmixing to perform blind source separation (BSS) of the spatial PCA loadings28,29,30.

BSS assumes the conjecture that if signals are from distinct physical processes, those signals will be statistically independent. The crux of the method is the reasonable (but logically unwarranted) assumption that this conjecture can be reversed; namely, BSS assumes that if signals are statistically independent, then they originate from different physical processes.

Two signals, X and Y, are uncorrelated if $$\langle XY\rangle = \langle X\rangle \langle Y\rangle$$, where the brackets denote the expectation value. Two signals are statistically independent if $$\langle {X}^{p}{Y}^{q}\rangle = \langle {X}^{p}\rangle \langle {Y}^{q}\rangle$$ for all positive integers p and q. Statistical independence is related to correlation but is a stronger condition. For example, the x and y coordinates of a body in uniform circular motion are uncorrelated but not statistically independent.
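The circular-motion example can be checked numerically. The short sketch below samples one full period of a unit circle and evaluates the two conditions for p = q = 1 and for p = q = 2; the variable names are illustrative.

```python
import numpy as np

# One full period of uniform circular motion on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 100000, endpoint=False)
x, y = np.cos(theta), np.sin(theta)

# p = q = 1: <XY> - <X><Y> vanishes, so x and y are uncorrelated.
corr_gap = np.mean(x * y) - np.mean(x) * np.mean(y)

# p = q = 2: <X^2 Y^2> = 1/8 while <X^2><Y^2> = 1/4, so the independence
# condition fails and the gap is -1/8.
indep_gap = np.mean(x ** 2 * y ** 2) - np.mean(x ** 2) * np.mean(y ** 2)
```

The second-order test passes while the fourth-order test fails, which is exactly the kind of higher-order structure that ICA exploits and PCA ignores.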

The goal of ICA is to un-mix a set of components into a new basis that has maximal statistical independence. We use a subset of the PCA loadings (real space images) as the set of components for separation. In ICA, the model of the data is given by:

$$\mathbf{M}_{\mathrm{ICA}}\left(\mathbf{x},\mathbf{k};{\mathrm{N}}^{\prime\prime}\right)= \sum_{\alpha =1}^{{\mathrm{N}}^{\prime\prime}}{c}_{\alpha }\left(\mathbf{k}\right)\;{I}_{\alpha }\left(\mathbf{x}\right)$$
(4)

where $${\mathbf{M}}_{\mathrm{I}\mathrm{C}\mathrm{A}}$$ is the new ICA model which is again a function of two spatial directions x and two spectral dimensions k, $${c}_{\alpha }\left(\mathbf{k}\right)$$ are the spectrally varying weight coefficients, $${I}_{\alpha }\left(\mathbf{x}\right)$$ are the spatially varying independent component maps, and $$\mathrm{N}\mathrm{^{\prime}}\mathrm{^{\prime}}$$ is the reduced dimension of the independent component space. Each new independent component is constructed as a linear combination of principal components,

$${I}_{\alpha }\left(\mathbf{x}\right)= \sum_{\beta =1}^{{\mathrm{N}}^{\mathrm{^{\prime}}} }{w}_{\alpha \beta }\;{a}_{\beta }\left(\mathbf{x}\right)$$
(5)

where $${a}_{\beta }\left(\mathbf{x}\right)$$ are the principal component loadings and $${w}_{\alpha \beta }$$ are entries of the mixing matrix. The FastICA algorithm31 is a reliable method to efficiently determine the mixing matrix, which gives the set of independent components that have maximal statistical independence from one another.
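To make Eq. (5) concrete, the following is a compact NumPy sketch of a symmetric FastICA iteration with a tanh contrast function. It is an illustration under stated assumptions, not the implementation used for the results in this work: the inputs are centered and whitened inside the function, the function names `fast_ica` and `sym_decorrelate` are hypothetical, and the demonstration un-mixes two synthetic signals rather than real PCA loadings.

```python
import numpy as np

def sym_decorrelate(R):
    """Return (R R^T)^(-1/2) R, whose rows are orthonormal."""
    evals, evecs = np.linalg.eigh(R @ R.T)
    return evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T @ R

def fast_ica(loadings, n_out, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA on the columns of `loadings`.

    loadings : (n_pixels, n_in) array, one input component per column
    Returns (components, W) with components = centered_loadings @ W.T,
    i.e. each output is a linear combination of the inputs as in Eq. (5).
    """
    A = loadings - loadings.mean(axis=0)
    # Whiten and keep the n_out leading directions.
    evals, evecs = np.linalg.eigh(A.T @ A / A.shape[0])
    order = np.argsort(evals)[::-1][:n_out]
    K = (evecs[:, order] / np.sqrt(evals[order])).T
    Z = A @ K.T
    rng = np.random.default_rng(seed)
    R = sym_decorrelate(rng.standard_normal((n_out, n_out)))
    for _ in range(n_iter):
        g = np.tanh(Z @ R.T)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w for every row w of R.
        R_new = (g.T @ Z) / Z.shape[0] - g_prime.mean(axis=0)[:, None] * R
        R_new = sym_decorrelate(R_new)
        converged = np.max(np.abs(np.abs(np.sum(R_new * R, axis=1)) - 1.0)) < tol
        R = R_new
        if converged:
            break
    W = R @ K
    return A @ W.T, W

# Two independent synthetic sources, linearly mixed.
rng = np.random.default_rng(0)
sources = np.c_[rng.uniform(-1, 1, 5000), rng.laplace(size=5000)]
mixed = sources @ np.array([[1.0, 0.5], [0.3, 1.0]]).T
components, W = fast_ica(mixed, n_out=2)
```

The recovered components match the sources only up to ordering, sign, and scale, which is an inherent ambiguity of blind source separation.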

After the mixing matrix has been computed using FastICA, the complementary k-space independent components $${c}_{\alpha }\left(\mathbf{k}\right)$$ are determined by:

$${c}_{\alpha }\left(\mathbf{k}\right)= \sum_{\beta =1}^{{\mathrm{N}}^{\mathrm{^{\prime}}} }{w}_{\alpha \beta }\;{P}_{\beta }\left(\mathbf{k}\right)$$
(6)

The independent components of Eqs. (5) and (6) underpin the model that emerges from the original experimental data, and as such provide a method to rapidly examine the major features of a large, high-dimensional data set.

One of the biggest challenges to utilizing MVA is the determination of the number of components to keep for the final reconstruction (N′). Figure 5 outlines several metrics for selection of N′. The PCA Scree plot (Fig. 5a) shows what proportion of the total variance each component adds to the data. Inspection of Fig. 5a reveals that there are four distinct regimes of components, denoted by the vertical colored lines. The first component alone accounts for nearly 95% of the total variance in the data (left of the red line). The curve defined by the components in the second regime (left of the green line) has a distinct shape, and each component accounts for between 0.1–% of the variance in the data. The remaining two regimes (left and right of the blue line) appear as smooth curves with an elbow around 500 components. These values (1, 10 and 500 components) are useful trial values of N′ for inspecting general trends in the PCA decomposition.

The integral of the Scree plot (Fig. 5b) shows how much each component adds to the cumulative variance, and is a useful metric for determining the final number of components in a decomposition. A common method for choosing N′ is to keep all components below an arbitrary threshold in the cumulative variance plot or the Scree plot. Figure 5b shows three different choices of cutoff: the knee of the curve (red), 99% of the total variance (green), and 99.9% of the total variance (blue). In this plot, the vertical lines indicate the final output dimension and the horizontal lines indicate the choice of threshold.

The mean square reconstruction error (MSE) is given by:

$$e_{M}^{2}\left({\mathrm{N}}^{\prime}\right)= \frac{1}{{\mathrm{N}}_{\mathbf{x}}{\mathrm{N}}_{\mathbf{k}}}\sum_{\mathbf{x},\mathbf{k}}{\left|\mathbf{D}\left(\mathbf{x},\mathbf{k}\right)-{\mathbf{M}}_{\mathrm{PCA}}\left(\mathbf{x},\mathbf{k};{\mathrm{N}}^{\prime}\right)\right|}^{2}= \frac{1}{{\mathrm{N}}_{\mathbf{k}}}\sum_{\alpha ={\mathrm{N}}^{\prime}+1}^{\mathrm{N}}{\lambda }_{\alpha }$$
(7)

Here, $${\lambda }_{\alpha }$$ are the eigenvalues of the data covariance matrix (C), $${\mathrm{N}}_{\mathbf{x}}$$ and $${\mathrm{N}}_{\mathbf{k}}$$ are the number of spatial and spectral pixels, D is the data set, and $${\mathbf{M}}_{\mathrm{P}\mathrm{C}\mathrm{A}}$$ is the reconstructed PCA model. The last expression in Eq. (7) can be quickly calculated for all values of N′ and can be used to rapidly determine a value of N′ based on the MSE. PCA decomposition can be used for lossless data compression and storage when N′ is chosen such that the root mean square reconstruction error (RMSE) is equal to the noise floor of the measurement20. The resulting representation of the data has an increased signal-to-noise ratio and decreased data size with the same information content. Figure 5c shows the RMSE as a function of N′, calculated using Eq. (7). Figure SI2 shows the computation time for PCA decomposition as a function of output dimension. The information in Figs. SI2 and 5c can be used to determine a compromise between runtime and information loss.
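The selection metrics discussed above (Scree plot, cumulative variance, and the eigenvalue form of the MSE) can be sketched on synthetic data as follows. The example assumes mean-centered data and a covariance matrix normalized by the number of spatial pixels, under which the two expressions in Eq. (7) agree; the synthetic rank-3 data set and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_k, rank = 200, 50, 3
# Rows are spatial pixels, columns are spectral pixels: low-rank signal plus noise.
signal = rng.standard_normal((n_x, rank)) @ rng.standard_normal((rank, n_k))
X = signal + 0.01 * rng.standard_normal((n_x, n_k))

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s ** 2 / n_x               # assumes a 1/N_x-normalized covariance
scree = eigenvalues / eigenvalues.sum()  # fraction of variance per component
cumulative = np.cumsum(scree)

def mse_from_eigenvalues(n_keep):
    """Right-hand side of Eq. (7): sum of the discarded eigenvalues over N_k."""
    return eigenvalues[n_keep:].sum() / n_k

def mse_direct(n_keep):
    """Left-hand side of Eq. (7): mean squared residual of the truncated model."""
    model = (U[:, :n_keep] * s[:n_keep]) @ Vt[:n_keep]
    return np.mean((Xc - model) ** 2)

# Smallest number of components whose cumulative variance reaches 99%.
n_keep_99 = int(np.searchsorted(cumulative, 0.99) + 1)
```

Because the eigenvalue form needs no reconstruction, the error for every candidate N′ follows from a single cumulative sum over the spectrum.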

We use intuition gained from the PCA Scree plot to choose the number of components for the ICA input (N′) and output (N′′). We choose N′ = 500 based on the second knee in the PCA scree plot (marked with a blue line in Fig. 5a). Figure 6 shows the results of ICA unmixing of spatial components using 498 PCA inputs and an output ICA dimension of 49. Similar to PCA, the choice of ICA output dimension has to include the majority of information in a data set and will vary from experiment to experiment. The ICA output dimension can be chosen by varying N′′ until the final analysis reaches the desired degree of accuracy (see Fig. SI3). The first and eighth PCA components are both azimuthally symmetric and are removed from the analysis to increase contrast in the relevant final ICA outputs, described below.

The ICA components display several major attributes. Most striking is the presence of distinct crystalline grains with three-fold rotationally symmetric diffraction patterns, shown in Fig. 6a (k-space components, top subpanels). The spatial distribution of each unique grain type is shown in the associated real space component (bottom subpanels).

The first noise component in Fig. 6b shows the mean response of the sample under the electron beam. The majority of the noise components appear spatially homogeneous, as illustrated by most of the bottom subpanels of Fig. 6b, with no apparent physical meaning other than instrumentation noise. Finally, we find rare components in Fig. 6b that are not spatially homogeneous and have hexagonal diffraction patterns with approximate two-fold rotational symmetry. We attribute these components to differences in tilt parallel to the beam.

The model that has emerged from ICA unmixing is that the sample is largely composed of distinct crystalline grains, each with three-fold rotational symmetry consistent with the 1H phase. Although our choice of N′ and N′′ can be argued to be insufficient for a general case, doubling the number of PCA inputs and/or ICA outputs does not significantly affect the outcome shown in Fig. 7. Importantly, this three-fold symmetry, a key physical attribute of monolayer WS2, is not readily apparent from the raw data and has only emerged after MVA processing.

To extract the details of the grain size and rotational orientation across the WS2 specimen, we employ image featurization (using Hu image moments32) and a clustering algorithm (affinity propagation33) to automatically sort the components into groups that have similar spatial features and spectral symmetry. This clustering analysis is applied to the featurized IC diffraction patterns, and the same grouping is then applied to the corresponding IC spatial images.

We use standard thresholding and particle analysis methods to generate histograms of the grain sizes and orientations (Fig. 7b,c) and a rotational orientation grain map (Fig. 7a). Spatial pixels with no appreciable intensity from a crystalline component are assigned a special value and colored black. From the grain area distribution we see that the CVD synthesis method used to produce the WS2 films favors small grains (10 nm²), but larger ones are also present up to approximately 100 nm². We also find from the diffraction patterns that there is no preferred orientation for the crystalline grains.

In order to assess the efficacy of our MVA methodology, we compare the results using MVA to similar results obtained using traditional analysis methods34. Figure 8 shows both results side by side. The range of angles for the grain map using MVA has been reduced to 60 degrees for a fair comparison (hence Figs. 7a and 8b are very similar, but not identical). We see that although the agreement is not perfect, the two maps have a high level of similarity, confirming that our MVA methodology is a useful tool for quickly assessing the approximate distribution of sample parameters in a high dimensional data set.

In conclusion, we present an MVA data analysis pipeline applied to 4DSTEM characterization of polycrystalline WS2 monolayer films. We demonstrate that the MVA approach is able to quickly parse complicated, high-dimensional 4DSTEM datasets into easily digestible information, providing immediate feedback during data acquisition and supplementing traditional model driven scientific inquiry. The PCA/ICA approach described here requires no physical assumptions about the sample; however, the resulting components are often self-validating by displaying physical features describing the system. Perhaps the most valuable feature of this analysis framework is that it can guide the refinement of computationally intensive physical models and analysis frameworks used to fit or model the data for deep understanding.

## Methods

### Samples

Polycrystalline WS2 films are obtained via WO3 conversion13. 2 nm of WO3 is deposited using ALD onto 30 nm thick electron transparent Si3N4 TEM windows. The WO3 is then converted to WS2 by flowing H2S gas over the films at elevated temperature.

### 4DSTEM imaging

4DSTEM data are collected using an FEI Titan 80–300 operated at an acceleration voltage of 200 kV. Electron diffraction patterns are acquired at 30 frames per second using a 14 bit Gatan Orius 830 CCD Camera. A standard FEI C2 aperture (10 µm) is used to define the 2.7 nm diameter electron probe with a convergence angle of 0.6 mrad and dose rate of 109 e2 sec. The map is acquired over a square of 100 spatial pixels per side with a step size of 2 nm between pixels.

### Difference of Gaussians (DoG) blob detection

The Difference of Gaussians (DoG) blob detection method is a standard tool for detecting local peaks in images. First, a series of Gaussian blurs with increasing standard deviations is applied to an image. The difference between successive images in this series is taken, and spatial pixels that display maxima (as a function of Gaussian blur standard deviation) are identified as blobs. For the data presented in the manuscript, 6 standard deviations between 2 and 10 pixels were used.
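The blur-and-subtract procedure described above can be sketched as follows. This is a simplified illustration, not the implementation used in the manuscript: it omits the scale normalization that library routines such as scikit-image's `blob_dog` apply, so it locates peaks reliably but does not estimate their size. The function name `dog_blobs` and the threshold value are assumptions.

```python
import numpy as np
from scipy import ndimage

def dog_blobs(image, sigmas, threshold):
    """Return (row, col, sigma) for local maxima of the difference-of-Gaussians stack."""
    image = image.astype(float)
    blurred = [ndimage.gaussian_filter(image, s) for s in sigmas]
    # Finer blur minus coarser blur, so bright blobs give a positive response.
    dog = np.stack([blurred[i] - blurred[i + 1] for i in range(len(sigmas) - 1)])
    # A blob is a maximum over its 3x3x3 neighbourhood in (scale, row, col).
    local_max = ndimage.maximum_filter(dog, size=3) == dog
    scale_idx, rows, cols = np.nonzero(local_max & (dog > threshold))
    return [(int(r), int(c), float(sigmas[i]))
            for i, r, c in zip(scale_idx, rows, cols)]

# Synthetic diffraction spot: one Gaussian peak centered at pixel (20, 40).
rr, cc = np.mgrid[:64, :64]
spot = np.exp(-((rr - 20) ** 2 + (cc - 40) ** 2) / (2.0 * 3.0 ** 2))
sigmas = np.linspace(2.0, 10.0, 6)   # 6 standard deviations between 2 and 10 pixels
blobs = dog_blobs(spot, sigmas, threshold=0.1)
```

The threshold suppresses the many weak local maxima that appear in flat regions of the detector once noise is present.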

### Clustering analysis

For the input of the clustering analysis we first binarize each diffraction pattern using the Otsu method. Then we use the Hu image moments of the binarized diffraction patterns as the feature vector for clustering analysis. Hu moments are invariant under translation, scale and rotation and therefore sort images into groups with similar symmetry32. The feature vectors are given to scikit-learn's affinity propagation algorithm, which finds the optimal number of clusters and assigns each input to a cluster33,35.
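The invariance that makes Hu moments useful as features can be illustrated with the first two of the seven invariants, computed directly from normalized central moments. The sketch below is a worked example only; the function name `hu_first_two` and the test shape are assumptions, and the clustering step itself is not reproduced.

```python
import numpy as np

def hu_first_two(image):
    """First two Hu invariants, built from normalized central moments eta_pq."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.mgrid[: image.shape[0], : image.shape[1]]
    m00 = image.sum()
    r0 = (rows * image).sum() / m00      # centroid row
    c0 = (cols * image).sum() / m00      # centroid column

    def eta(p, q):
        mu = ((rows - r0) ** p * (cols - c0) ** q * image).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])

# An asymmetric binary "L" shape, and a rotated and translated copy of it.
shape = np.zeros((40, 40))
shape[5:25, 8:12] = 1.0
shape[21:25, 8:30] = 1.0
moved = np.roll(np.rot90(shape), (3, -2), axis=(0, 1))

features_original = hu_first_two(shape)
features_moved = hu_first_two(moved)
```

Two binarized diffraction patterns that differ only by an in-plane rotation therefore map to the same feature vector, so grains of the same type but different orientation fall into the same cluster.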

### Particle analysis

We use the IC images shown in Fig. 4a as the input for the analysis. First the images are thresholded using the Otsu method. We then perform a binary closing operation to remove grains smaller than two pixels and close holes within any given grain. Finally, we calculate the region properties of each grain using functions available in scikit-image's measure package36.
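A minimal sketch of this threshold, close, and measure sequence is given below. It substitutes SciPy's `ndimage` labelling for the scikit-image region properties used in the manuscript and implements the Otsu threshold directly in NumPy; the function names and the synthetic component map are assumptions. The 2 nm pixel size follows from the scan step quoted in the 4DSTEM imaging section.

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(image, n_bins=256):
    """Threshold that maximizes the between-class variance of the histogram."""
    counts, edges = np.histogram(image, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(counts)               # pixels at or below each bin
    w1 = w0[-1] - w0                     # pixels above each bin
    cum_sum = np.cumsum(counts * centers)
    valid = (w0 > 0) & (w1 > 0)
    m0 = cum_sum[valid] / w0[valid]
    m1 = (cum_sum[-1] - cum_sum[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return centers[valid][np.argmax(between)]

def grain_areas(component_map, pixel_size_nm=2.0):
    """Label connected grains and return (labels, areas in nm^2)."""
    binary = component_map > otsu_threshold(component_map)
    # Closing fills small holes inside grains.
    binary = ndimage.binary_closing(binary, structure=np.ones((3, 3)))
    labels, n_grains = ndimage.label(binary)
    pixel_counts = np.bincount(labels.ravel(), minlength=n_grains + 1)[1:]
    return labels, pixel_counts * pixel_size_nm ** 2

# Synthetic component map: two grains, the larger one with a one-pixel hole.
component = np.zeros((30, 30))
component[3:8, 3:8] = 1.0
component[15:25, 12:20] = 1.0
component[18, 15] = 0.0
labels, areas = grain_areas(component)
```

A histogram of `areas` collected over all independent components gives a grain size distribution of the kind shown in Fig. 7b.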

## Data availability

The data presented in the manuscript and analysis code that were used in this study are available from the corresponding authors upon reasonable request.

## References

1. Mak, K. F., Lee, C., Hone, J., Shan, J. & Heinz, T. F. Atomically thin MoS2: A new direct-gap semiconductor. Phys. Rev. Lett. 105, 136805 (2010).
2. Lebègue, S. & Eriksson, O. Electronic structure of two-dimensional crystals from ab initio theory. Phys. Rev. B 79, 115409 (2009).
3. Castro Neto, A. H. Charge density wave, superconductivity, and anomalous metallic behavior in 2D transition metal dichalcogenides. Phys. Rev. Lett. 86, 4382–4385 (2001).
4. Barja, S. et al. Charge density wave order in 1D mirror twin boundaries of single-layer MoSe2. Nat. Phys. 12, 751–756 (2016).
5. Podzorov, V., Gershenson, M. E., Kloc, Ch., Zeis, R. & Bucher, E. High-mobility field-effect transistors based on transition metal dichalcogenides. Appl. Phys. Lett. 84, 3301–3303 (2004).
6. Zou, X. et al. Interface engineering for high-performance top-gated MoS2 field-effect transistors. Adv. Mater. 26, 6255–6261 (2014).
7. Schmidt, H. et al. Transport properties of monolayer MoS2 grown by chemical vapor deposition. Nano Lett. 14, 1909–1913 (2014).
8. Mak, K. F., He, K., Shan, J. & Heinz, T. F. Control of valley polarization in monolayer MoS2 by optical helicity. Nat. Nanotechnol. 7, 494–498 (2012).
9. Zeng, H., Dai, J., Yao, W., Xiao, D. & Cui, X. Valley polarization in MoS2 monolayers by optical pumping. Nat. Nanotechnol. 7, 490–493 (2012).
10. Zhu, B., Zeng, H., Dai, J. & Cui, X. The study of spin-valley coupling in atomically thin group VI transition metal dichalcogenides. Adv. Mater. 26, 5504–5507 (2014).
11. Kang, K. et al. High-mobility three-atom-thick semiconducting films with wafer-scale homogeneity. Nature 520, 656–660 (2015).
12. Song, J.-G. et al. Controllable synthesis of molybdenum tungsten disulfide alloy for vertically composition-controlled multilayer. Nat. Commun. 6, 7817 (2015).
13. Kastl, C. et al. The important role of water in growth of monolayer transition metal dichalcogenides. 2D Mater. 4, 021024 (2017).
14. Gammer, C., Burak Ozdol, V., Liebscher, C. H. & Minor, A. M. Diffraction contrast imaging using virtual apertures. Ultramicroscopy 155, 1–10 (2015).
15. Panova, O. et al. Orientation mapping of semicrystalline polymers using scanning electron nanobeam diffraction. Micron 88, 30–36 (2016).
16. Wehmeyer, G., Bustillo, K. C., Minor, A. M. & Dames, C. Measuring temperature-dependent thermal diffuse scattering using scanning transmission electron microscopy. Appl. Phys. Lett. 113, 253101 (2018).
17. Kim, K. et al. Grain boundary mapping in polycrystalline graphene. ACS Nano 5, 2142–2146 (2011).
18. Huang, P. Y. et al. Grains and grain boundaries in single-layer graphene atomic patchwork quilts. Nature 469, 389–392 (2011).
19. Belianinov, A. et al. Identification of phases, symmetries and defects through local crystallography. Nat. Commun. 6, 7801 (2015).
20. Belianinov, A., Kalinin, S. V. & Jesse, S. Complete information acquisition in dynamic force microscopy. Nat. Commun. 6, 6550 (2015).
21. Jesse, S. et al. Big data analytics for scanning transmission electron microscopy ptychography. Sci. Rep. 6, 1–8 (2016).
22. Belianinov, A. et al. Big data and deep data in scanning and electron microscopies: deriving functionality from multidimensional data sets. Adv. Struct. Chem. Imaging 1, 6 (2015).
23. Sarahan, M. C., Chi, M., Masiel, D. J. & Browning, N. D. Point defect characterization in HAADF-STEM images using multivariate statistical analysis. Ultramicroscopy 111, 251–257 (2011).
24. Chen, Z. et al. Practical aspects of diffractive imaging using an atomic-scale coherent electron probe. Ultramicroscopy 169, 107–121 (2016).
25. Han, Y. et al. Strain mapping of two-dimensional heterostructures with subpicometer precision. Nano Lett. 18, 3746–3751 (2018).
26. Pekin, T. C., Gammer, C., Ciston, J., Minor, A. M. & Ophus, C. Optimizing disk registration algorithms for nanobeam electron diffraction strain mapping. Ultramicroscopy 176, 170–176 (2017).
27. Jolliffe, I. T. Principal Component Analysis (Springer, New York, 2002). https://doi.org/10.1007/0-387-22440-8_1
28. Choi, S., Cichocki, A., Park, H. & Lee, S. Blind source separation and independent component analysis: A review. (2004).
29. Stone, J. V. Independent Component Analysis: A Tutorial Introduction (MIT Press, Cambridge, 2004).
30. McKeown, M. J. et al. Analysis of fMRI data by blind separation into independent spatial components. Hum. Brain Mapp. 6, 160–188 (1998).
31. Hyvärinen, A. & Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. 13, 411–430 (2000).
32. Hu, M.-K. Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 2, 179–187 (1962).
33. Frey, B. J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).
34. Ozdol, V. B. et al. Strain mapping at nanometer resolution using advanced nano-beam electron diffraction. Appl. Phys. Lett. 106, 253107 (2015).
35. Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
36. van der Walt, S. et al. scikit-image: Image processing in Python. PeerJ 2, e453 (2014).

## Acknowledgements

This work was supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, of the U.S. Department of Energy under Contract No. DE-AC02-05-CH11231, with support coming primarily from the sp2-Bonded Materials Program (KC2207), which provided for development of the theoretical framework and 4DSTEM experiments, and additionally from the van der Waals Heterostructure Program (KCWF16) which provided for preliminary sample preparation and characterization. This work was additionally supported by the National Science Foundation under Grant # DMR-1807233 which provided for conventional ADF TEM imaging. Work at the Molecular Foundry was supported by the Office of Science, Office of Basic Energy Sciences, of the US Department of Energy under Contract No. DE-AC02-05CH11231. We acknowledge Christoph Gammer, Colin Ophus, and Peter Ercius from the NCEM facility at the Molecular Foundry for developing the code used to acquire the 4DSTEM datasets and for helpful discussion.

## Author information

### Contributions

B.S., A.Z., and S.A. conceived the project. B.S. performed the electron microscopy and analyzed the data. C.T.C., C.K., T.K., A.S., and S.A. developed the methodology for fabrication of electron transparent, polycrystalline TMD films and prepared the WS2 samples. B.S. and A.Z. wrote the manuscript. S.A. and A.Z. supervised the project. All authors proofread the paper.

### Corresponding authors

Correspondence to Brian Shevitski or Shaul Aloni or Alex Zettl.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

### Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Shevitski, B., Chen, C.T., Kastl, C. et al. Characterizing transition-metal dichalcogenide thin-films using hyperspectral imaging and machine learning. Sci Rep 10, 11602 (2020). https://doi.org/10.1038/s41598-020-68321-7