Autonomous Investigations over WS$_2$ and Au{111} with Scanning Probe Microscopy

Individual atomic defects in 2D materials impact their macroscopic functionality. Correlating the interplay is challenging, however, intelligent hyperspectral scanning tunneling spectroscopy (STS) mapping provides a feasible solution to this technically difficult and time consuming problem. Here, dense spectroscopic volume is collected autonomously via Gaussian process regression, where convolutional neural networks are used in tandem for spectral identification. Acquired data enable defect segmentation, and a workflow is provided for machine-driven decision making during experimentation with capability for user customization. We provide a means towards autonomous experimentation for the benefit of both enhanced reproducibility and user-accessibility. Hyperspectral investigations on WS$_2$ sulfur vacancy sites are explored, which is combined with local density of states confirmation on the Au{111} herringbone reconstruction. Chalcogen vacancies, pristine WS$_2$, Au face-centered cubic, and Au hexagonal close packed regions are examined and detected by machine learning methods to demonstrate the potential of artificial intelligence for hyperspectral STS mapping.

are explored, which is combined with local density of states confirmation on the Au{111} herringbone reconstruction. Chalcogen vacancies, pristine WS 2 , Au face-centered cubic, and Au hexagonal close packed regions are examined and detected by machine learning methods to demonstrate the potential of artificial intelligence for hyperspectral STS mapping.

INTRODUCTION
Two-dimensional (2D) material systems are one of the most sought after solid-state and thin-film structures due to the enormous phase space of functionality, which is driven by atomic-and nano-scale defects that can be advantageously manipulated for single-photon emission, strain engineering, Moiré physics, stacking, and transport. [1][2][3][4][5][6][7] Defective perturbations can alter the effective local landscape and immediately impact macroscopic functionality. Scanning tunneling microscopy (STM), one branch of scanning probe microscopy (SPM), remains fundamental for characterizing and understanding material and surface properties at distances within the atomic-to-nano range, providing information at scale into, e.g., spinorbit coupling effects within chalcogen vacancies, electrically-driven photon emission of individual defects, and substitutional fingerprinting by measuring the local density of states (LDOS). [8][9][10][11] Techniques that provide spectroscopic insight, such as STM, are extremely important in correlating defective states with macroscopic phenomena; hyperspectral data collection in, e.g., tip-enhanced Raman imaging and in optical transmission electron microscopy have become standard and enabled spectroscopic capability with enormous information richness that is both spatially and spectrally resolved. [12][13][14][15] However, while hyperspectral STS imaging would provide critical insight into heterogenous electronic properties at the atomic scale, it is not feasible due to the enormous time required. For instance, a hyperspectral optical map collected at 10 min per point in a 150×150 pixel grid would take well over one month. Here, we present a means of performing spatially-dense, point spectroscopic measurements with an STM in combination with artificially intelligent and machine learning (AI/ML) approaches to provide a faster and more reproducible approach to map and identify spectroscopic signatures of heterogeneous surfaces.
Since the inception of scanning tunneling spectroscopy (STS), which measures current-voltage (I/V) characteristics, the vast majority of experiments are performed using single pixel spectroscopy, where routes for data collection tend to be geometrically positioned along a line or grid during point acquisition. The first harmonic, within an atomically resolved area, can be measured over a defined voltage range, with lock-in techniques, that corresponds to a convolution of the tip and surface LDOS (dI/dV). Assuming the tip remains constant then the data collected are from the sample alone, 16 where inelastic contributions (d 2 I/dV 2 ) can also be inferred from the second harmonic. 17 This and other recent spectroscopic capabilities within the STM/STS field have given insight into the electronic structure at the atomic scale relevant for the entire field of surface science, chemical processes, optoelectronic processes, the identification of individual adatoms and molecules on surfaces, local spin-orbit coupling, and electronphonon coupling, all with sufficient spectroscopic energy resolution to also resolve quantum phase transitions, enable the exploration of next generation color centers, the capability to resolve quantum emitters at scale, or map quantum coherence transport. 3,4,[18][19][20] En route towards making STS more widely available, we collect point STS autonomously, via Gaussian process regression, and benchmark our method on tungsten disulfide (WS 2 ) and a Au{111} surface. The methods shown and autonomous experimentation techniques used can be extended to a variety of available spectroscopic techniques within the SPM field, such as force spectroscopy with non-contact atomic force microscopy, however, we first focus on using the LDOS as a tool to identify surface and defective states.
Transition metal dichalcogenides (TMDs), such as WS 2 , have gained substantial interest for pointdefect control, 21,22 serving as host substrates for quantum emitters, 23 spin-valley splitting properties, [24][25][26] and tunable band gap engineering. [27][28][29] Sulfur vacancies (V S ) can be controllably created to serve as target sites for photo-and spin-active functionalization. 30,31 SPM can measure the electronic characteristics at the atomic level of induced defects, 21,24,26 while also providing a path to excite optical transitions. 32 2D TMDs provide a wide phase space to non-destructively modify the quantum environment through a wide variety of defects, 24 however a means to probe the electronic environment to produce statistically significant and reproducible spectra is required to expedite the understanding of emerging phenomena within the field. Furthermore, coinage metal surfaces, such as Au, have been widely explored for a variety of applications in, e.g., molecular self-assembly, 33,34 tip-forming, 32 and device applications, 35,36 to name a few, and provide a means for tip state calibration with application towards high throughput STS. Hence, WS 2 and Au{111} are relevant and representative substrates for both the STM/STS community to employ as model systems and to demonstrate our AI/ML driven approach for hyperspectral STS mapping.
One technical challenge of STM/STS is the difficulty of the technique to acquire reproducible, artifact-free tunneling, especially in heterogeneous samples. Acquisition times are governed by multiple hyperparameters, such as voltage range, step size, and dwell time, where spectral collection that is both highly resolved energetically and spatially is confounded by a variety of factors, which can be very time intensive. In conventional STM/STS, point LDOS exploits the full energetic range of interest with high resolution, and a subsequent dI/dV map can then be collected at a specified energy level for high spatial resolution at the cost of greater experimental broadening. Current imaging tunneling spectroscopy (CITS) consists of scanning the tip in x 1 -and x 2 -directions within a predefined grid and collecting a high-resolution spectrum at each pixel and can thus visualize spectroscopic nuances spatially, such as band-bending across defective states, that may otherwise be missed. 37,38 CITS takes advantage of both modalities to create a full spectral and spatial picture of a region of interest, but this modality is complicated by any accompanied thermal drift, piezo hysteresis, grid optimization, and time constraints, which can either introduce artifacts in the spectra or limit experimental acquisition. The need for technical approaches that make hyperspectral STS mapping more accessible and user-friendly can provide essential utility into materials discovery and design.
We propose to make use of Gaussian process (GP) regression for hyperspectral data collection, [39][40][41][42][43] which is a well-known method for function approximation and uncertainty quantification. This method refers to a set of function values, where any finite subset of elements have a joint Gaussian distribution.
Given some initial input, a Gaussian prior probability density function is learned and then conditioned on data, providing a posterior mean and variance within the model domain, which can be used to make autonomous decisions about optimal point measurements. This and other learning approaches have shown to be useful for hyperspectral image reconstruction, 41,44 autonomous synchrotron experiments, 40 materials discovery, 43,45,46 feature extraction, 47 and in piezoresponse force microscopy, 41,48 to name a few applications. The promise of autonomously driven experiments with STM come at the benefit of the human operator and can provide industrial application, where a qualified scientist or engineer can initialize an experiment and allow an AI/ML algorithm to complete the workflow. [49][50][51][52][53][54][55] Here, we present one technical approach to address this challenge to perform hyperspectral STS mapping at defect sites on two different surfaces and demonstrate a) how to perform measurements with reproducible spectra, and b) create statistically significant electronic characterization of the different intrinsic defects that can be found on samples of interest. While this does not enhance sample throughput directly, it allows for samples to be spectroscopically and automatically interrogated in terms of defect diversity and their electronic fingerprint, such that a non-STM expert would have the ability to produce relevant spectroscopic insight after little training. This is carried out with the combination of two machine learning techniques for autonomous experimentation, where a one-dimensional convolutional neural network (1D-CNN) is used to identify obtained spectra that are collected autonomously by a GP to obtain an accurate CITS representation at a rate that is superior to grid collection. Surface maps are obtained for V S within WS 2 and between known surface reconstructions on a Au{111} surface. We further summarize our method in a user-friendly and tailorable software package, gpSTS, for public usage.

Hyperspectral STS Mapping via Gaussian Process Regression
An autonomous hyperspectral STM/STS experiment can be initialized over any substrate that is either conducting or semi-conducting. Spatial parameters and tip offsets in both the x 1 -and x 2 -direction are defined by an input image (as defined by point locations x 1 and x 2 with y signal) that is further used in cross-correlation feature tracking (see Supplementary Notes 1-4). At each point defined by the GP, the bias is ramped over a certain voltage range while the tip is held at constant height. Each spectra can then be identified by a 1D-CNN, where class probabilities are computed, and the sum of dI/dV signal intensity is input into the GP for mean and objective function calculation. As the experiment progresses, a proposed measurement is given to the instrument to collect a point at the uncertainty maximum. A workflow summary of an autonomous experiment is presented in Figure 1, where an atomically sharp tip is directed to the next point for STS point acquisition by exploration that ultimately provides a 3D volume of data defined by both I(x 1 , x 2 ,V ) and dI(x 1 , x 2 , dV ), where I is the current and V is the voltage.
The true power of this methodology is evident when sufficient orbital information (specific for the V S deep in-gap state within WS 2 ) and surface structure details (over Au f cc and Au hcp ) can be obtained in well under 100 collected data points (close to 1% of the data compared to a CITS grid measurement) that is verified against ground truth data (see Supplementary Figures 1-6) to remove any conceivable bias across experiments, where signal intensity over a given voltage range is monitored. 11,26 Figure 1: Overview of machine-driven hyperspectral STS mapping. The general workflow of the autonomous experiment performed with a scanning tunneling microscope. The data presented focuses on directing an ultra-sharp metal probe across a given area for point STS acquisition, where both filled and empty states of the sample are examined. After a completed experiment, hyperspectral volume is output from the software and a myriad of substrate and defect classes can be identified, with a trained model, and provided without cognitive bias. The method enables an optimized CITS measurement with predictive capability that was previously inaccessible due to time and tool constraints.
A GP model can be defined for a given dataset, D = {x i , y i }, where the regression model assumes , where x are the positions in some input or parameter space, y is the associated noisy function evaluation, and ε(x) represents the noise term. The variance-covariance matrix Σ of the prior Gaussian probability distribution is defined via kernel functions k(x i , x j ; φ ), where φ is a set of hyperparameters that are found by maximizing the marginal log-likelihood of the data (earlier referred to as learning). The Matérn kernel is commonly used to match physical processes and is combined with an anisotropic kernel definition to control the level of differentiability in each direction of the input space. 42 A predictive mean and variance can then be defined given a Gaussian probability distribution with a set of optimized hyperparameters, which can be further used to find the next optimal point measurements in the GP-driven data acquisition loop (see Supplementary Note 1). GP-driven autonomous experimentation (within the context of Bayesian optimization), where a statistical model of the system is generated based on prior data, uses an acquisition function to suggest the next point of input, which is non-trivial in its design. There are a number of acquisition functions available with different balances of exploration, with the goal to improve the statistical model, and exploitation, with the goal of utilizing the improved statistical model to find the global optimum.
Here experimental data is collected by exploration, where points are suggested to improve the Gaussian process via point selection at uncertainty maxima. The full energy range, which is defined by the accessible voltage range over a given sample that can represent a measurable band gap at both the valence band maximum and conduction band minimum or the range where representative surface states lie (as is the case for W S 2 and Au, respectively), can be measured at each point, and indeed we can zoom into any voltage range to visualize orbital information and are able to obtain an optimized hyperspectral map with high resolution both spatially and energetically. In order to evaluate the performance with different user-defined acquisition functions, which either combine posterior mean and variance functions or make use of enabled information-theoretical entities, we perform hyperspectral oversampling on WS 2 . After an extended and feasibly-obtainable experiment over ∼ 30 hours, sufficient data points are collected and we can interpolate over 128×128 pixel grid from acquired data, which is autonomously driven with n = 866 we fail to reject the null hypothesis that the means are equal across acquisition functions, however, we do indeed reject the null hypothesis that means among a random experiment, variance, UCB, and Shannon's information gain are equal (p value « 0.05), which is driven by the elevation and subsequent mismatch of randomly-driven point acquisition.

Convolutional Neural Networks for Spectral Identification
When the sufficient data is collected, we can further successfully identify V S compared to pristine WS 2 and distinguish Au f cc versus Au hcp with a trained 1D-CNN using an 80/20 train/test split ratio on 1482 individually and separately acquired scanning tunneling spectra, consisting of 424 Au f cc , 709 Au hcp , 158 V S , and 191 W S 2 spectra ( Figure 2). Test data is further split (60/40 ratio) into a validation set used during training to give an estimate of the model's skill and a test set used on unbiased data after training.
CNN architectures have shown application towards identifying tip state on H:Si(100), 56 with automated hydrogen lithography, 57,58 in identifying adatom arrays on a Co 3 Sn 2 S 2 cleaved surface, 59 and to aid in automating carbon monoxide functionalization with use in noncontact atomic force microscopy. 55 The  Figure 7), with class accuracy scores presented in Table 1 (Figure 3), where an impurity is used to track drift during a given cycle, dense hyperspectral data is classified with the 1D-CNN, and image segmentation can be performed with individually classed STS point overlays or pixel-by-pixel using an interpolated form. The peak in dI/dV at -0.48 V shows the tendency for low energy surface-state electrons to localize in Au hcp regions. 11 A completed experiment over a V S within WS 2 is also presented showing defect segmentation using the trained 1D-CNN. As most STM/STS data require high-quality tips and surfaces, the data acquired can be used to verify tip quality on both Au{111} and W S 2 , however this is not explicitly explored within the methods presented. Figure 3: Defect identification. a) Au{111} herringbone reconstruction that is identifiable via point bias spectroscopy followed by classification using a trained 1D-CNN, where image tracking can be performed on a larger surface region compared to the autonomous STS experiment region. b) Data is further interpolated over a dense grid, classified, and depicted as an image overlay on acquired topography (I tunnel = 30 pA, V sample = 1 V). Scale bar, 2.5 nm. Accumulated spectra over both c) Au f cc and d) Au hcp are shown with the mean spectrum that is colored by classification. e) V S located within in-situ annealed W S 2 , where the defect itself is used for drift tracking, with overlaid acquired STS (I tunnel = 30 pA, V sample = 1.2 V) and f) the corresponding linear interpolated form highlighting measured in-gap states. Scale bar, 0.5 nm. Spectra used for training, validation, and test are shown for both g) W S 2 and h) V S , where a total of > 1400 spectra were acquired over multiple experimental runs for both Au and W S 2 .

Autonomous Experimentation
Experiments are performed at liquid helium temperatures and in ultrahigh vacuum to minimize any drift during an experimental run. A number of drift correction techniques have been explored, which take advantage of machine vision techniques, feature tracking, atom-tracking, image pairs, or thermal drift correction methodologies, to list some of the approaches within the literature. [63][64][65][66][67] To correct for any residual drift, driven by either thermal fluctuations during piezo motion, sample-to-sample variability, or any tool-to-tool difference, we acquire interval images at a predefined offset window and then compute feature correlation (between spectral acquisition and after n = 10 points) using sliding image patches (Supplementary Figure 11). This block-matching approach is a common technique for image recognition and operates by taking the maximum correlation within a given pixel range. [68][69][70] Any computed offset is registered to the tool by updating the scan window location during the autonomous hyperspectral experiment. Collected images are plane corrected with a line-by-line linear fit to adjust for tilt, since SPM tips are not always perfectly normal to a given sample. Each high resolution spectra is swept from +1.4 V to -1.8 V on WS 2 (or +1 to -1 V on Au) and takes 2 minutes for the complete sweep. After two completed autonomous experiments, drift was measured to be on the order of 0.5 ± 0.2 Å on WS 2 and 1.1 ± 0.8 Å over Au{111} that is shown in Supplementary Figure 12. Hyperparameters can be fine-tuned to best accommodate for any of the defined drift modes during an experiment and for subsequent reproducibility. Where previous reports have used either spatial features, decreased pixel density in spectral space, or some combination for segmentation and CITS measurements with an STM, we unambiguously identify defects and surface-state based on high resolution spectral features that leverage the power of Gaussian processes combined with a CNN architecture for prediction. This hyperspectral STS mapping measurement technique that combines CITS with AI/ML enables full characterization of heterogeneous sample surfaces, ensuring that no local spectral features are missed, even by a non-experienced user.

Scanning probe microscopy (SPM) measurements
All measurements were performed with a Createc GmbH scanning probe microscope operating under ultrahigh vacuum (pressure < 2x10 −10 mbar) at liquid helium temperatures (T < 6 K). Either etched tungsten or platinum iridium tips were used during acquisition. Tip apexes were further shaped by indentations onto a gold substrate. STM images are taken in constant-current mode with a bias applied to the sample. STS measurements were recorded using a lock-in amplifier with a resonance frequency of 683 Hz and a modulation amplitude of 5 mV.

Sample Preparation
Monolayer islands of WS 2 were grown on graphene/SiC substrates with an ambient pressure CVD approach. A graphene/SiC substrate with 10 mg of WO 3 powder on top was placed at the center of a quartz tube, and 400 mg of sulfur powder was placed upstream. The furnace was heated to 900 • C and the sulfur powder was heated to 250 • C using a heating belt during synthesis. A carrier gas for process throughput was used (Ar gas at 100 sccm) and the growth time was 60 min. The CVD grown W S 2 /MLG/SiC was annealed in vacuo at 600 • C for 30 minutes to induce sulfur vacancies.

Neural Network and Gaussian Process Implementation
The acquisition software provided leverages the integration of Python and LabVIEW, and makes use of the Nanonis programming interface. The GP was implemented using gpCAM, which is a library for autonomous experimentation by M. M. Noack. 71 The CNN was constructed with Pytorch, which is a deep-learning library available in Python. An Intel Xeon E5-2623 v3 CPU with 8 cores and 64 GB of memory combined with a Tesla K80 with 4992 CUDA cores was used for training.

Data availability
All data needed to evaluate the conclusions exhibited are present in the paper, supplemental information, and/or available on zenodo (https://doi.org/10.5281/zenodo.5768320).

Code availability
The full suite of home-built software is available at https://github.com/jthomas03/gpSTS.

Supplementary Material for
where r = x i − x j l 2 , B ν is the modified Bessel function, and ν is a parameter defining the differentiability. This is combined with an anisotropic kernel definition to control the level of differentiability in each direction of the input space. The GP model prior and posterior can be used in the definition of the acquisition function. 2 The acquisition function can take a variety of forms that include measuring the total uncertainty or favor morphological features. The implemented software can include a user-defined acquisition function that passes through optimization and gives the next point of measurement for data collection. Using the GP defined joint prior given as with , and x 0 is the point of interest. The acquisition function can now be defined as and directly provide where the next measurement point will be taken during an autonomous experiment.
The simplest example of the acquisition function is the exploration method given as where the posterior variance maximum is the next point of acquisition. The general definition of the acquisition function also allows for Bayesian optimization such as using the upper confidence bound (UCB) method given as and information theory methods such as the Shannon-information gain defined by which can be further fine-tuned to find regions of interest, e.g. using a reference spectrum or image feature. A cost function can also be included, which defines the cost of a given measurement itself and the cost of moving within a defined parameter space.
Here we compare results from using the diagonal of the variance-covariance matrix, UCB, and using the Shannon-information gain (SIG). Additionally, we also compare results from a dense 128×128 grid, an equally spaced 12×12 grid, and purely randomly derived locations.

2| Drift Tracking
Drift within scanning probe microscopy techniques can be solved through a variety of methods either during the experiment or data manipulation post acquisition. Piezocreep and thermal drift are the two largest contributors to drift, where piezocreep is a nonlinear effect that can be minimized through correcting for the desired tip position to the actual tip location. However, as the piezos move or scan, some amount of thermal drift is expected. A number of image analysis methodologies that make use of features within a raster-scan image have shown efficacy in tracking drift between multiple images. 3 Atom-tracking, where the STM tip is locked onto an atom or molecule using two-dimensional lateral feedback can maintain spatial coordinates and be used to track drift velocity. 4 Another method that takes advantage of matching sets of key points across two images enables correction of nonlinearities within piezo positioners. 5 The concept of using image pairs is used across scanning probe microscopy and is also applicable to scanning transmission electron microscopy, where drift distortion can be corrected line-by-line by rectifying scan line origins to minimize differences between measured line intensity and the image recorded along an orthogonal direction. 6 Here, we apply one methodology towards tracking drift during point scanning tunneling spectroscopy acquisition.

3| Convolutional Neural Network Performance
Metrics from our proposed model are shown below, where we look at receiver operator characteristics to determine model performance between positive and negative classes, the confusion matrix to view sensitivity, specificity, the false negative rate, and the false positive rate, and the precision recall curve that depicts the true positive rate and the positive predictive value. In all three metrics, the chosen model performs optimally on the data provided against four classes of spectroscopic data. We compare two models with matching hyperparameters, as detailed in https://github.com/jthomas03/gpSTS, that only differ in the number of epochs that the neural network is trained. At 6 epochs, off-diagonal elements, which represent a type i (false positive) or type ii (false negative) error, within the confusion matrix are minimized.

4| gpSTS Experimental Workflow
A GP experiment can be started over a given region with some initialization parameters, which is provided by the user in the form of a n×n image containing information for pixel size, experimental boundary conditions, and location. Additional information, such as file locations, experiment name, imaging parameters for drift tracking, voltage range and stepsize, and any high-level neural network hyperparameters are input into the configuration file in gpSTS. An example file is shown below. The experiment is then run via command line interface with the Nanonis controller running with the option presented in the article shown below (using the signal summation in acquired spectra).

SUPPLEMENTARY FIGURES
Supplementary Figure 1: Extended Gaussian-process-driven experiment. a) Ground truth data was formulated from a live experimental run with collected points shown in x 1 × x 2 × bias sample (V ) space with spectral intensity (dI/dV a.u.) depicted in a white to red colorscale, and then b) interpolating the data to a dense 128 x 128 datacube. c) Each point can be summed within a given range (0 to 0.7 V to capture in-gap states), which is shown as an image, and can then be used to compare different acquisition function performance while GP takes point STS shown in b and performs an equivalent summation to reach a predicted mean value function.
Supplementary Figure 2: Comparison of point collection methods. In order to test the performance of a GP on low temperature STS data, we input points along two types of grids, compute summation values, interpolate with points acquired, and fill the remaining values with prior STS minima. We show an a) equally spaced 12 x 12 grid, b) a dense 128 x 128 grid, and c) an example GP-driven grid in addition to d) correlations with ground truth data using variance, UCB, Shannon information gain, and each style of grid collection. GP-driven collection shows a measurable boost in correlation and typically reaches greater than 90 percent in under 30 points.
Supplementary Supplementary Figure 12: Experimental drift. Computed offsets during a Au{111} autonomous experiment is shown a) in two dimensions and b) the magnitude of the 2D vector as a function of time. Both the c) 2D and d) 1D picture are also shown for a W S 2 experiment. Drift in each example is multi-directional and highlights the need for a means to correct for any offset during measurements, however, parameters can be optimized for any possible tool-tool or run-run delta within the provided software.