Two-dimensional (2D) materials are among the most sought-after solid-state and thin-film systems because of their enormous phase space of functionality, which is driven by atomic- and nanoscale defects that can be manipulated for single-photon emission, strain engineering, Moiré physics, stacking, and transport1,2,3,4,5,6,7. Defect-induced perturbations can alter the local electronic landscape and directly impact macroscopic functionality. Scanning tunneling microscopy (STM), one branch of scanning probe microscopy (SPM), remains fundamental for characterizing material and surface properties on the atomic-to-nanometer scale. By measuring the local density of states (LDOS), it has provided insight into, e.g., spin-orbit coupling effects at chalcogen vacancies, electrically driven photon emission from individual defects, and the fingerprinting of substitutional dopants8,9,10,11. Spectroscopic techniques such as STM are essential for correlating defect states with macroscopic phenomena. Hyperspectral data collection in, e.g., tip-enhanced Raman imaging and in optical transmission electron microscopy has become standard and provides information-rich data that are both spatially and spectrally resolved12,13,14,15. However, although hyperspectral STS imaging would provide critical insight into heterogeneous electronic properties at the atomic scale, it is impractical because of the time required: a hyperspectral STS map collected at 10 min per point on a 150 × 150 pixel grid (22,500 spectra) would take more than five months. Here, we present a means of performing spatially dense point spectroscopy with an STM, combined with artificial intelligence and machine learning (AI/ML) approaches, to map and identify the spectroscopic signatures of heterogeneous surfaces faster and more reproducibly.

Since the inception of scanning tunneling spectroscopy (STS), which measures current-voltage (I/V) characteristics, the vast majority of experiments have been performed as single-pixel spectroscopy, with acquisition points positioned geometrically along a line or a grid. Using lock-in detection, the first harmonic (dI/dV) can be measured over a defined voltage range within an atomically resolved area; it corresponds to a convolution of the tip and sample LDOS. Assuming the tip density of states remains constant, the measured features originate from the sample alone16, and inelastic contributions (d²I/dV²) can be inferred from the second harmonic17. These and other recent spectroscopic capabilities in the STM/STS field have given insight into electronic structure at the atomic scale relevant to surface science, chemical and optoelectronic processes, the identification of individual adatoms and molecules on surfaces, local spin-orbit coupling, and electron-phonon coupling, with sufficient energy resolution to resolve quantum phase transitions, explore next-generation color centers, resolve quantum emitters at scale, and map quantum coherence transport3,4,18,19,20. To make STS more widely accessible, we collect point STS autonomously via Gaussian process regression and benchmark our method on tungsten disulfide (WS2) and a Au{111} surface. The methods and autonomous experimentation techniques shown here can be extended to other spectroscopic techniques in the SPM field, such as force spectroscopy with non-contact atomic force microscopy; here, however, we focus on the LDOS as a tool to identify surface and defect states.
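The link between lock-in harmonics and derivatives of I(V) follows from a Taylor expansion of I(V₀ + V_m sin θ): the component in phase with sin θ is ≈ V_m·dI/dV, and the cos 2θ component is ≈ −(V_m²/4)·d²I/dV². The sketch below checks this numerically. The tanh current model and the sampling density are illustrative assumptions; only the 5 mV modulation amplitude is taken from the Methods.

```python
import numpy as np

def lockin_harmonics(current, v0, v_mod, n_samples=4096):
    """Demodulate I(v0 + v_mod*sin(theta)) over one modulation period.

    Returns (x1, x2): the sin(theta) and cos(2*theta) Fourier components.
    For small v_mod: x1 ~ v_mod * dI/dV(v0), x2 ~ -(v_mod**2 / 4) * d2I/dV2(v0).
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    i_t = current(v0 + v_mod * np.sin(theta))
    x1 = 2.0 * np.mean(i_t * np.sin(theta))        # first harmonic (in phase)
    x2 = 2.0 * np.mean(i_t * np.cos(2.0 * theta))  # second harmonic
    return x1, x2

# Toy I(V) = tanh(V) (hypothetical), probed at 0.3 V with a 5 mV modulation.
v0, v_mod = 0.3, 0.005
x1, x2 = lockin_harmonics(np.tanh, v0, v_mod)
didv = x1 / v_mod                # recovered dI/dV
d2idv2 = -4.0 * x2 / v_mod**2    # recovered d2I/dV2
```

Larger modulation amplitudes add higher-order terms to both harmonics, which is one reason the modulation amplitude limits the achievable energy resolution.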

Transition metal dichalcogenides (TMDs), such as WS2, have gained substantial interest for point-defect control21,22, as host substrates for quantum emitters23, for their spin-valley splitting24,25,26, and for tunable band-gap engineering27,28,29. Sulfur vacancies (VS) can be created controllably to serve as target sites for photo- and spin-active functionalization30,31. SPM can measure the atomic-scale electronic characteristics of induced defects21,24,26 and also provides a path to excite optical transitions32. 2D TMDs offer a wide phase space in which to non-destructively modify the quantum environment through an extensive variety of defects24; however, a means of probing the electronic environment that produces statistically significant and reproducible spectra is required to accelerate the understanding of emerging phenomena in the field. Furthermore, coinage metal surfaces such as Au have been widely explored for, e.g., molecular self-assembly33,34, tip forming32, and device applications35,36, and they provide a means of tip-state calibration with application to high-throughput STS. Hence, WS2 and Au{111} are relevant and representative model systems for the STM/STS community and for demonstrating our AI/ML-driven approach to hyperspectral STS mapping.

One technical challenge of STM/STS is acquiring reproducible, artifact-free tunneling spectra, especially on heterogeneous samples. Acquisition times are governed by multiple acquisition parameters, such as voltage range, step size, and dwell time, and collecting spectra that are highly resolved both energetically and spatially is confounded by a variety of factors and can be very time intensive. In conventional STM/STS, point spectroscopy covers the full energy range of interest with high resolution, and a subsequent dI/dV map can then be collected at a specified energy for high spatial resolution at the cost of greater experimental broadening. Current imaging tunneling spectroscopy (CITS) scans the tip in the x1- and x2-directions over a predefined grid and collects a high-resolution spectrum at each pixel; it can thus visualize spectroscopic nuances spatially, such as band bending across defect states, that may otherwise be missed37,38. CITS combines both modalities to create a full spectral and spatial picture of a region of interest, but it is complicated by thermal drift, piezo hysteresis, grid optimization, and time constraints, which can either introduce artifacts into the spectra or limit the experiment. Technical approaches that make hyperspectral STS mapping more accessible and user-friendly can therefore provide essential utility for materials discovery and design.
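A CITS dataset is naturally a 3D array dI/dV(x1, x2, V). A constant-energy dI/dV map is one slice along V, and a point spectrum is one column at a pixel. The minimal sketch below uses random placeholder data; the grid size and bias sampling are hypothetical.

```python
import numpy as np

# Hypothetical CITS cube: one dI/dV spectrum per pixel of an (n1, n2) grid.
n1, n2 = 64, 64
bias = np.linspace(-1.0, 1.0, 201)                   # V, 10 mV sampling
rng = np.random.default_rng(0)
cube = rng.normal(size=(n1, n2, bias.size))          # placeholder for dI/dV(x1, x2, V)

def didv_map(cube, bias, energy):
    """Constant-energy slice: dI/dV at the bias sample closest to `energy` (V)."""
    k = int(np.argmin(np.abs(bias - energy)))
    return cube[:, :, k]

def point_spectrum(cube, i, j):
    """Full spectrum recorded at pixel (i, j)."""
    return cube[i, j, :]

m = didv_map(cube, bias, -0.48)      # spatial map at one energy
s = point_spectrum(cube, 10, 20)     # spectrum at one location
```

The trade-off described above follows directly from the array shape: every extra pixel in (x1, x2) costs one full spectrum of acquisition time.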

We propose to use Gaussian process (GP) regression for hyperspectral data collection39,40,41,42,43; GP regression is a well-known method for function approximation and uncertainty quantification. A GP is a collection of random variables (here, function values), any finite subset of which has a joint Gaussian distribution. A Gaussian prior probability density function, whose hyperparameters are learned from the data, is conditioned on the measurements, yielding a posterior mean and variance over the model domain that can be used to make autonomous decisions about optimal point measurements. This and other learning approaches have been shown to be useful for hyperspectral image reconstruction41,44, autonomous synchrotron experiments40, materials discovery43,45,46, feature extraction47, and piezoresponse force microscopy41,48, to name a few applications. Autonomously driven STM experiments promise to benefit the human operator and could find industrial application, where a qualified scientist or engineer initializes an experiment and allows an AI/ML algorithm to complete the workflow49,50,51,52,53,54,55.
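Conditioning a Gaussian prior on data can be written in a few lines. The sketch below is a generic zero-mean GP in one dimension with a squared-exponential kernel and fixed hyperparameters. These are simplifying assumptions for brevity, not the kernel or library used in this work. It returns the posterior mean and variance and picks the next point at the variance maximum.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.5, amplitude=1.0):
    """Squared-exponential covariance between 1D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amplitude * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_kw):
    """Condition a zero-mean GP prior on noisy data; return posterior mean and variance."""
    K = sq_exp_kernel(x_train, x_train, **kernel_kw) + noise_var * np.eye(x_train.size)
    K_s = sq_exp_kernel(x_train, x_test, **kernel_kw)           # (n_train, n_test)
    prior_var = np.diag(sq_exp_kernel(x_test, x_test, **kernel_kw))
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = prior_var - np.sum(v ** 2, axis=0)                    # diag of posterior covariance
    return mean, var

x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 3.0, 31)
mean, var = gp_posterior(x_train, y_train, x_test)
x_next = x_test[np.argmax(var)]   # exploration: measure where the model is least certain
```

Here the largest variance lies at x = 3, the candidate farthest from all measurements. This is the behavior that drives pure-exploration acquisition.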

Here, we present one technical approach to this challenge: hyperspectral STS mapping at defect sites on two different surfaces. We demonstrate a) how to perform measurements with reproducible spectra and b) how to create statistically significant electronic characterizations of the different intrinsic defects found on samples of interest. While this does not directly increase sample throughput, it allows samples to be interrogated spectroscopically and automatically in terms of defect diversity and electronic fingerprint, such that a non-STM expert could produce relevant spectroscopic insight after little training. This is achieved by combining two machine learning techniques for autonomous experimentation: a GP autonomously selects where spectra are collected, yielding an accurate CITS representation faster than grid collection, and a one-dimensional convolutional neural network (1D-CNN) identifies the acquired spectra. Surface maps are obtained for VS within WS2 and across known surface reconstructions on a Au{111} surface. We further provide our method as a user-friendly and tailorable software package, gpSTS, for public use.


Hyperspectral STS mapping via Gaussian process regression

An autonomous hyperspectral STM/STS experiment can be initialized over any conducting or semiconducting substrate. Spatial parameters and tip offsets in both the x1- and x2-directions are defined by an input image (point locations x1 and x2 with signal y), which is also used for cross-correlation feature tracking (see Supplementary Notes 1–4). At each point selected by the GP, the bias is ramped over a given voltage range while the tip is held at constant height. Each spectrum can then be identified by a 1D-CNN, which computes class probabilities, and the summed dI/dV signal intensity is passed to the GP for calculation of the mean and objective functions. As the experiment progresses, the instrument is directed to collect the next point at the uncertainty maximum. A workflow summary of an autonomous experiment is presented in Fig. 1: an atomically sharp tip is directed by exploration to the next point for STS acquisition, ultimately providing a 3D volume of data defined by both I(x1, x2, V) and dI/dV(x1, x2, V), where I is the current and V is the voltage. The strength of this methodology is that sufficient orbital information (specifically, the VS deep in-gap state within WS2) and surface structure details (over Aufcc and Auhcp) can be obtained with well under 100 collected data points (close to 1% of the data in a CITS grid measurement). This is verified against ground-truth data by monitoring the signal intensity over a given voltage range11,26 (see Supplementary Figs. 1–6) to remove any conceivable bias across experiments.
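Two small steps of the loop described above can be sketched directly. First, each spectrum is reduced to one scalar, the summed dI/dV inside a voltage window. Second, the next location is chosen at the variance maximum among unvisited candidates. In the sketch below, the toy spectrum, the 10 mV sampling, and the candidate grid are hypothetical. The sweep range and the 0.0–0.7 V window are taken from the text.

```python
import numpy as np

def window_signal(bias, didv, v_min, v_max):
    """Sum of dI/dV samples with v_min <= V <= v_max: the scalar passed to the GP."""
    bias, didv = np.asarray(bias), np.asarray(didv)
    mask = (bias >= v_min) & (bias <= v_max)
    return float(didv[mask].sum())

def next_point(candidates, posterior_var, measured_idx):
    """Index and (x1, x2) of the unmeasured candidate with the largest posterior variance."""
    var = np.array(posterior_var, dtype=float)   # copy so the caller's array is untouched
    var[list(measured_idx)] = -np.inf            # never re-measure a visited location
    k = int(np.argmax(var))
    return k, candidates[k]

bias = np.linspace(-1.8, 1.4, 321)                   # V; WS2 sweep range, 10 mV steps (assumed)
didv = np.exp(-0.5 * ((bias - 0.35) / 0.05) ** 2)    # hypothetical in-gap peak at 0.35 V
y = window_signal(bias, didv, 0.0, 0.7)              # 0.0 to 0.7 V window used for WS2
candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
k, xy = next_point(candidates, [0.1, 0.9, 0.5, 0.2], measured_idx={1})
```

Reducing each spectrum to a single scalar keeps the GP two-dimensional in its inputs (x1, x2). The full spectra are kept separately for classification and for the hyperspectral volume.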

Fig. 1: Overview of machine-driven hyperspectral STS mapping.

The general workflow of the autonomous experiment performed with a scanning tunneling microscope. An ultra-sharp metal probe is directed across a given area for point STS acquisition, where both filled and empty states of the sample are examined. After an experiment is completed, the software outputs a hyperspectral volume from which a variety of substrate and defect classes can be identified with a trained model, without cognitive bias. The method enables an optimized CITS measurement with predictive capability that was previously inaccessible due to time and tool constraints.

A GP model can be defined for a given dataset \(D=\{\mathbf{x}_i, y_i\}\), where the regression model assumes \(y(\mathbf{x}) = f(\mathbf{x}) + \epsilon(\mathbf{x})\). Here, x are positions in some input or parameter space, y is the associated noisy function evaluation, and ϵ(x) represents the noise term. The variance-covariance matrix Σ of the prior Gaussian probability distribution is defined via kernel functions \(k(\mathbf{x}_i, \mathbf{x}_j; \phi)\), where ϕ is a set of hyperparameters found by maximizing the marginal log-likelihood of the data (the learning step referred to earlier). The Matérn kernel, which controls the differentiability of the modeled function, is commonly used to model physical processes and is combined here with an anisotropic kernel definition so that each direction of the input space is treated separately42. A predictive mean and variance can then be computed from the Gaussian probability distribution with the optimized hyperparameters and used to find the next optimal point measurement in the GP-driven data acquisition loop (see Supplementary Note 1). In GP-driven autonomous experimentation (within the context of Bayesian optimization), a statistical model of the system is generated from prior data, and an acquisition function, whose design is non-trivial, suggests the next input point. A number of acquisition functions are available, each with a different balance between exploration, which aims to improve the statistical model, and exploitation, which uses the improved model to find the global optimum.
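The anisotropic Matérn kernel and the marginal log-likelihood can be written compactly. The following is a minimal sketch, assuming a Matérn order ν = 3/2, a zero prior mean, and a fixed noise variance. The exhaustive grid search stands in for the numerical optimizer that a GP library would normally use. None of these choices is taken from gpCAM.

```python
import itertools
import numpy as np

def matern32_aniso(X1, X2, signal_var, lengths):
    """Anisotropic Matérn (nu = 3/2) kernel: one length scale per input dimension."""
    diff = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengths, dtype=float)
    s3r = np.sqrt(3.0) * np.sqrt(np.sum(diff ** 2, axis=-1))
    return signal_var * (1.0 + s3r) * np.exp(-s3r)

def log_marginal_likelihood(X, y, signal_var, lengths, noise_var=1e-3):
    """log p(y | X, phi) for a zero-mean GP with Gaussian noise."""
    K = matern32_aniso(X, X, signal_var, lengths) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ alpha                      # data-fit term
                 - np.sum(np.log(np.diag(L)))          # -0.5 * log det K
                 - 0.5 * len(X) * np.log(2.0 * np.pi))

def fit_hyperparameters(X, y, signal_vars, length_grid):
    """Exhaustive search over (signal_var, l1, l2); a stand-in for a numerical optimizer."""
    best = max(itertools.product(signal_vars, length_grid, length_grid),
               key=lambda p: log_marginal_likelihood(X, y, p[0], (p[1], p[2])))
    return {"signal_var": best[0], "lengths": (best[1], best[2])}

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(40, 2))   # (x1, x2) positions
y = np.sin(6.0 * X[:, 0])                 # toy signal that varies along x1 only
phi = fit_hyperparameters(X, y, signal_vars=(0.5, 1.0, 2.0),
                          length_grid=(0.1, 0.3, 1.0, 3.0))
```

Because the toy signal does not depend on x2, maximizing the likelihood should select a short length scale along x1 and a long one along x2. This per-direction flexibility is what the anisotropic definition provides.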

Here, experimental data are collected by exploration, with points suggested at uncertainty maxima to improve the Gaussian process. The full energy range can be measured at each point. This range is defined by the accessible voltage window over a given sample and spans either a measurable band gap, including the valence band maximum and conduction band minimum, or the range in which representative surface states lie (as for WS2 and Au, respectively). We can zoom into any voltage range to visualize orbital information and obtain an optimized hyperspectral map with high resolution both spatially and energetically. To evaluate performance with different user-defined acquisition functions, which either combine posterior mean and variance functions or use information-theoretical quantities, we perform hyperspectral oversampling on WS2. After an extended but feasible experiment of ~30 h, sufficient data are collected (n = 866 autonomously acquired points, shown in Supplementary Fig. 1) to interpolate over a 128 × 128 pixel grid and compare acquisition functions based on variance, the Gaussian process upper confidence bound (GP-UCB), and Shannon's information gain (SIG) (Supplementary Figs. 2–6). In a side-by-side comparison with standard grid methodologies, GP point acquisition driven by variance, GP-UCB, or SIG outperforms comparable grid collections. We additionally show the performance of purely randomly selected data points (Supplementary Fig. 3): experiments steered in this fashion show degraded performance and consistently and significantly require more iterations than GP-driven CITS measurements (shown over n = 20 experiments).
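The three acquisition functions compared above can be sketched pointwise from a posterior mean and variance. The κ = 2 weight in UCB and the noise variance in the information-gain term are illustrative assumptions. The pointwise Shannon information gain used here, ½·log(1 + σ²(x)/σ_n²), is the standard result for one noisy Gaussian observation and may differ from the SIG definition in the software used for this work.

```python
import numpy as np

def acq_variance(mean, var):
    """Pure exploration: posterior variance."""
    return np.asarray(var, dtype=float)

def acq_ucb(mean, var, kappa=2.0):
    """GP upper confidence bound: mean + kappa * std (kappa sets the exploration weight)."""
    return np.asarray(mean, dtype=float) + kappa * np.sqrt(var)

def acq_shannon_ig(mean, var, noise_var=1e-2):
    """Information (nats) gained about f(x) from one noisy observation at x."""
    return 0.5 * np.log1p(np.asarray(var, dtype=float) / noise_var)

mean = np.array([0.0, 1.0, 0.5])
var = np.array([0.5, 0.1, 0.4])
choices = {f.__name__: int(np.argmax(f(mean, var)))
           for f in (acq_variance, acq_ucb, acq_shannon_ig)}
print(choices)   # {'acq_variance': 0, 'acq_ucb': 2, 'acq_shannon_ig': 0}
```

Evaluated pointwise, SIG is a monotonic function of the variance and therefore selects the same point as the variance criterion. Any difference between the two in practice would come from how SIG is defined (e.g., over a set of points or the whole domain), which this sketch does not model.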
Across acquisition functions, reaching 95% correlation required between 30 and 79 iterations when using the variance determined from the variance-covariance matrix, between 27 and 74 iterations with UCB, and between 25 and 74 iterations with SIG (variance, 54.2 ± 14.7; UCB, 53.1 ± 12.4; SIG, 50.0 ± 13.2 iterations). A one-way ANOVA fails to reject the null hypothesis that the means are equal across the three acquisition functions. However, the null hypothesis that the means are equal among the random, variance, UCB, and SIG experiments is rejected (p ≪ 0.05), driven by the markedly higher iteration counts of randomly driven point acquisition.
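The benchmark above can be sketched in two parts: counting iterations until a reconstruction reaches a correlation threshold with ground truth, and a one-way ANOVA F statistic across groups of iteration counts. Pearson correlation is assumed as the "correlation" metric, and the inputs in the usage lines are toy values, not data from this study. For p-values, `scipy.stats.f_oneway` returns both the F statistic and the p-value.

```python
import numpy as np

def iterations_to_threshold(reconstructions, ground_truth, threshold=0.95):
    """1-based index of the first reconstruction whose Pearson r with ground truth >= threshold."""
    gt = np.ravel(ground_truth)
    for iteration, rec in enumerate(reconstructions, start=1):
        r = np.corrcoef(np.ravel(rec), gt)[0, 1]
        if r >= threshold:
            return iteration
    return None  # threshold never reached

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - pooled.mean()) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy inputs (hypothetical): a reversed map fails the threshold, an affine copy passes.
gt = np.arange(16.0).reshape(4, 4)
n_iter = iterations_to_threshold([gt[::-1], 0.5 * gt + 3.0], gt)
F = one_way_anova_f([1, 2, 3], [4, 5, 6])
```

Pearson correlation ignores offsets and scale factors, so an affinely rescaled reconstruction already counts as converged. Whether that is appropriate depends on how the reconstructions are normalized.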

Convolutional neural networks for spectral identification

Once sufficient data are collected, we can identify VS versus pristine WS2 and distinguish Aufcc from Auhcp with a 1D-CNN trained using an 80/20 train/test split on 1482 individually and separately acquired scanning tunneling spectra, consisting of 424 Aufcc, 709 Auhcp, 158 VS, and 191 WS2 spectra (Fig. 2). The test data are further split (60/40) into a validation set, used during training to estimate the model's skill, and a test set, used for unbiased evaluation after training. CNN architectures have been applied to identifying the tip state on H:Si(100)56, automated hydrogen lithography57,58, identifying adatom arrays on a cleaved Co3Sn2S2 surface59, and automating carbon monoxide tip functionalization for noncontact atomic force microscopy55. The chosen CNN architecture uses shared weights to reduce the number of trainable parameters and extract spectral features at the pixel level, based on hyperspectral image classification methods used on AVIRIS sensor datasets60,61.
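The split described above (80% train, with the remaining 20% divided 60/40 into validation and test) can be sketched as a per-class split over the stated class counts. Stratifying by class and the random seed are assumptions of this sketch; the text does not say whether the split was stratified.

```python
import numpy as np

def stratified_split(labels, fractions, seed=0):
    """Split indices class by class according to `fractions` (which must sum to 1)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    parts = [[] for _ in fractions]
    bounds = np.cumsum(fractions)[:-1]              # cumulative cut points
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        cuts = np.round(bounds * idx.size).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return [np.array(sorted(p)) for p in parts]

counts = {"Au_fcc": 424, "Au_hcp": 709, "V_S": 158, "WS2": 191}
labels = np.repeat(list(counts), list(counts.values()))
train, val, test = stratified_split(labels, (0.8, 0.2 * 0.6, 0.2 * 0.4))
print(len(labels), len(train), len(val), len(test))   # 1482 1185 178 119
```

Splitting per class keeps the smaller VS class (158 spectra) represented in the validation and test sets. A purely random split would not guarantee this.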

Fig. 2: 1D-Convolutional neural network.

a Schematic of the 1D-CNN model used for training, where each layer uses a rectified linear unit activation function and max pooling. The first convolutional layer has 64 nodes and the second 128 nodes; the output is then passed through a dropout layer, flattened, and fed into a fully connected layer. b Individual spectra can subsequently be passed through a softmax layer using the trained model to yield class probabilities; example spectra are shown for Aufcc, Auhcp, WS2, and VS. Each spectrum depicted has the greatest predicted probability for the expected class and near-zero probabilities for the remaining three classes (depicted in gray scale), using the model trained for 6 epochs.

The 1D-CNN contains two convolutional layers, one dropout layer to mitigate overfitting, and one fully connected linear layer; a softmax can then be applied after training to obtain point STS class probabilities. Each convolutional layer uses a 1 × 3 kernel to compute the sliding dot product and produce spectral feature maps (stride 1, padding 1), followed by batch normalization, a rectified linear unit (ReLU) activation, and a max-pooling layer. The pooling layer down-samples each map while retaining the most important information. ReLU is a nonlinear operation that passes positive inputs unchanged and returns zero for negative inputs; it is used in both the 64- and 128-node layers. During training, the Adam optimizer62 (learning rate 10⁻⁴) minimizes the cross-entropy loss, so that discriminating spectral features are identified automatically. Input and output are shown in Fig. 2, where Aufcc, Auhcp, WS2, and VS spectra unseen by the trained model are passed through the network to produce a class identification. Accuracy and loss are further shown over 20 epochs for both training and test data (Supplementary Fig. 7), with class accuracy scores presented in Table 1 for the first six epochs.
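To make the data flow concrete, the following numpy sketch implements the inference-time forward pass of the architecture described above. It uses two convolutions with a 1 × 3 kernel (stride 1, padding 1, 64 then 128 channels), each followed by batch normalization, ReLU, and max pooling, then a fully connected layer and softmax, with a cross-entropy loss. The pooling size of 2, the spectrum length of 320 samples, and the random weight initialization are assumptions. In PyTorch, the corresponding layers would be `nn.Conv1d(1, 64, kernel_size=3, stride=1, padding=1)`, `nn.BatchNorm1d`, `nn.ReLU`, `nn.MaxPool1d(2)`, `nn.Dropout`, and `nn.Linear`, trained with `torch.optim.Adam(lr=1e-4)` and `nn.CrossEntropyLoss`.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Same' 1D cross-correlation, kernel 3, stride 1, zero padding 1. x: (C_in, L), w: (C_out, C_in, 3)."""
    L = x.shape[1]
    xp = np.pad(x, ((0, 0), (1, 1)))
    windows = np.stack([xp[:, k:k + L] for k in range(w.shape[2])], axis=-1)  # (C_in, L, 3)
    return np.einsum("oik,ilk->ol", w, windows) + b[:, None]

def batchnorm_eval(x, mean, var, gamma, beta, eps=1e-5):
    """Batch normalization at inference time, using running statistics per channel."""
    return gamma[:, None] * (x - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """Max pooling with window 2 and stride 2; a trailing odd sample is dropped."""
    L = x.shape[1] // 2 * 2
    return x[:, :L].reshape(x.shape[0], -1, 2).max(axis=-1)

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def init_params(n_points, n_classes=4):
    """Random (untrained) weights; BN running stats start at mean 0, var 1."""
    def conv(c_out, c_in):
        return rng.normal(0, np.sqrt(2.0 / (3 * c_in)), (c_out, c_in, 3)), np.zeros(c_out)
    def bn(c):
        return np.zeros(c), np.ones(c), np.ones(c), np.zeros(c)
    flat = 128 * (n_points // 4)   # 128 channels after two pooling steps
    return {"conv1": conv(64, 1), "bn1": bn(64),
            "conv2": conv(128, 64), "bn2": bn(128),
            "fc": (rng.normal(0, np.sqrt(1.0 / flat), (n_classes, flat)), np.zeros(n_classes))}

def forward(spectrum, p):
    x = spectrum[None, :]                                                   # (1, L)
    x = maxpool2(relu(batchnorm_eval(conv1d(x, *p["conv1"]), *p["bn1"])))   # (64, L/2)
    x = maxpool2(relu(batchnorm_eval(conv1d(x, *p["conv2"]), *p["bn2"])))   # (128, L/4)
    # dropout is the identity at inference time
    W, b = p["fc"]
    return softmax(W @ x.ravel() + b)                                       # class probabilities

def cross_entropy(probs, label):
    return -np.log(probs[label])

CLASSES = ["Au_fcc", "Au_hcp", "WS2", "V_S"]
params = init_params(n_points=320)
probs = forward(rng.normal(size=320), params)   # untrained, so probabilities are arbitrary
loss = cross_entropy(probs, CLASSES.index("V_S"))
```

Training would adjust the weights to minimize this loss over the labeled spectra. The sketch only shows how a spectrum of length L becomes 128 × L/4 features and then four class probabilities.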

Table 1 Model class performance.

Pristine WS2, VS, Auhcp, and Aufcc training spectra are all optimized within the first 6 epochs: the model reaches > 95% accuracy on Aufcc validation data after 5 epochs and 100% accuracy for the remaining classes. Overall test performance reaches 100% accuracy after 6 epochs, and this model is chosen for classification (see Supplementary Figs. 8–10 for test performance metrics). By providing class probabilities for operators to benchmark against, this paves the way for reproducible STS over surface variations that are distinguishable via bias spectroscopy, and the approach can be extended to any relevant material. Identification over the herringbone reconstruction is shown in Fig. 3. An impurity is used to track drift during a given cycle, the dense hyperspectral data are classified with the 1D-CNN, and image segmentation is performed either with individually classified STS point overlays or pixel by pixel on the interpolated form. The dI/dV peak at −0.48 V reflects the tendency of low-energy surface-state electrons to localize in Auhcp regions11. A completed experiment over a VS within WS2 is also presented, showing defect segmentation with the trained 1D-CNN. As most STM/STS data require high-quality tips and surfaces, the acquired data could also be used to verify tip quality on both Au{111} and WS2, although this is not explicitly explored here.
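Pixel-by-pixel segmentation requires carrying the class assignments of scattered STS points onto a dense grid. The sketch below uses nearest-neighbor assignment for simplicity, whereas the text describes a linear interpolation. With SciPy, `scipy.interpolate.griddata(points, values, xi, method="linear")` could interpolate each class-probability channel before taking the argmax. The grid extent and the toy points are hypothetical.

```python
import numpy as np

def segment_nearest(points, class_probs, n1, n2, extent=(1.0, 1.0)):
    """Label each pixel of an n1 x n2 grid with the class of the nearest classified STS point."""
    G1, G2 = np.meshgrid(np.linspace(0, extent[0], n1),
                         np.linspace(0, extent[1], n2), indexing="ij")
    pix = np.stack([G1.ravel(), G2.ravel()], axis=1)                  # (n1*n2, 2)
    d2 = ((pix[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    nearest = np.argmin(d2, axis=1)
    labels = np.argmax(class_probs, axis=1)[nearest]                  # class of nearest point
    return labels.reshape(n1, n2)

points = np.array([[0.2, 0.2], [0.8, 0.8]])        # (x1, x2) of two classified spectra
class_probs = np.array([[0.9, 0.1], [0.2, 0.8]])   # e.g. softmax output per spectrum
seg = segment_nearest(points, class_probs, 11, 11)
```

Nearest-neighbor assignment produces a Voronoi-like mosaic. Interpolating probabilities first, as in the linear form, gives smoother class boundaries between sparse points.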

Fig. 3: Defect identification.

a Au{111} herringbone reconstruction, identifiable via point bias spectroscopy followed by classification with a trained 1D-CNN; image tracking can be performed over a larger surface region than the autonomous STS experiment region. b Data are further interpolated over a dense grid, classified, and depicted as an image overlay on the acquired topography (Itunnel = 30 pA, Vsample = 1 V). Scale bar, 2.5 nm. Accumulated spectra over both c Aufcc and d Auhcp are shown with the mean spectrum colored by classification. e VS within in situ annealed WS2, where the defect itself is used for drift tracking, with overlaid acquired STS (Itunnel = 30 pA, Vsample = 1.2 V), and f the corresponding linearly interpolated form highlighting the measured in-gap states. Scale bar, 0.5 nm. Spectra used for training, validation, and testing are shown for both g WS2 and h VS; a total of >1400 spectra were acquired over multiple experimental runs for both Au and WS2.

Autonomous experimentation

Experiments are performed at liquid helium temperatures and in ultrahigh vacuum to minimize drift during an experimental run. A number of drift correction techniques have been explored, using machine vision, feature tracking, atom tracking, image pairs, or thermal drift correction, to list some of the approaches in the literature63,64,65,66,67. To correct for residual drift, whether caused by thermal fluctuations during piezo motion, sample-to-sample variability, or tool-to-tool differences, we acquire images within a predefined offset window between spectral acquisitions (after every n = 10 points) and compute the feature correlation using sliding image patches (Supplementary Fig. 11). This block-matching approach is a common image recognition technique that takes the offset of maximum correlation within a given pixel range68,69,70. Any computed offset is registered to the tool by updating the scan window location during the autonomous hyperspectral experiment. Collected images are plane corrected with a line-by-line linear fit to remove tilt, since SPM tips are not always perfectly normal to the sample. Each high-resolution spectrum is swept from +1.4 V to −1.8 V on WS2 (or from +1 V to −1 V on Au), and a complete sweep takes 2 min. Over two completed autonomous experiments, drift was measured to be on the order of 0.5 ± 0.2 Å on WS2 and 1.1 ± 0.8 Å on Au{111} (Supplementary Fig. 12). Hyperparameters can be fine-tuned to best accommodate any of these drift modes during an experiment and to aid subsequent reproducibility.
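Both image operations mentioned above are short to write. The sketch shows line-by-line linear leveling and an exhaustive block match that returns the integer pixel offset with maximum normalized correlation between a reference patch and the current image. The patch geometry, search range, and synthetic "defect" image are hypothetical. Converting pixel offsets to ångströms would require the scan calibration, which is not modeled here.

```python
import numpy as np

def line_level(image):
    """Subtract a least-squares linear fit from each scan line (row)."""
    x = np.arange(image.shape[1])
    out = np.empty_like(image, dtype=float)
    for i, row in enumerate(image):
        slope, intercept = np.polyfit(x, row, 1)
        out[i] = row - (slope * x + intercept)
    return out

def block_match(reference, image, patch, max_shift):
    """Integer (di, dj) maximizing the normalized correlation between a reference patch
    (i0, j0, h, w) and the same-sized patch of `image` displaced by (di, dj)."""
    i0, j0, h, w = patch
    ref = reference[i0:i0 + h, j0:j0 + w]
    ref = (ref - ref.mean()) / ref.std()
    best, best_shift = -np.inf, (0, 0)
    for di in range(-max_shift, max_shift + 1):
        for dj in range(-max_shift, max_shift + 1):
            i, j = i0 + di, j0 + dj
            if i < 0 or j < 0 or i + h > image.shape[0] or j + w > image.shape[1]:
                continue  # displaced patch falls outside the image
            cand = image[i:i + h, j:j + w]
            cand = (cand - cand.mean()) / cand.std()
            score = float(np.mean(ref * cand))
            if score > best:
                best, best_shift = score, (di, dj)
    return best_shift, best

rng = np.random.default_rng(0)
ii, jj = np.mgrid[0:48, 0:48]
reference = np.exp(-((ii - 20) ** 2 + (jj - 25) ** 2) / 8.0) + 0.01 * rng.normal(size=(48, 48))
image = np.roll(reference, shift=(2, -3), axis=(0, 1))   # simulated drift of (+2, -3) pixels
shift, score = block_match(reference, image, patch=(12, 15, 16, 20), max_shift=5)
```

Using a distinct feature, such as a defect or impurity, inside the patch keeps the correlation peak sharp. A featureless patch would give an ambiguous maximum.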

Hyperparameters for both the prior-mean model and the covariance functions can be optimized after each point acquired during an autonomous experiment. Fig. 4 summarizes the progression of such an experiment for a VS within WS2. The user defines the measurement space by providing a topographic image to the software, and the uncertainty is extracted after every acquired point to determine the next point of acquisition (further detailed in Supplementary Fig. 13). Directing measurements, optimizing a grid, and decreasing the time required for sample scrutiny are of key importance in the materials discovery and scanning probe fields. High-resolution point spectroscopy takes on the order of minutes per spectrum, and capturing a dense 128 × 128 grid with zero drift is not feasible with available systems (0.14 days with the GP compared with 22.76 days for a dense CITS measurement at 2 min per spectrum). The enhanced experimental throughput of such a method gives users an additional tool to collect and identify defects without human bias or intervention.
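The timing comparison above is simple arithmetic on the number of spectra and the time per spectrum. In the sketch, the 2 min sweep time and the 128 × 128 grid come from the text. The ~100 GP points is an assumption, chosen because it is the count the quoted 0.14 days implies at 2 min per spectrum. The same helper reproduces the introduction's 150 × 150 example at 10 min per point. Overheads such as tip moves, drift-tracking images, and GP updates are ignored.

```python
MINUTES_PER_SPECTRUM = 2.0   # full sweep time quoted for one spectrum
MINUTES_PER_DAY = 24 * 60

def acquisition_days(n_points, minutes_per_point=MINUTES_PER_SPECTRUM):
    """Spectroscopy time only: excludes tip moves, drift-tracking images and GP updates."""
    return n_points * minutes_per_point / MINUTES_PER_DAY

grid_days = acquisition_days(128 * 128)                            # dense CITS grid
gp_days = acquisition_days(100)                                    # ~100 GP points (assumed)
intro_days = acquisition_days(150 * 150, minutes_per_point=10.0)   # introduction example
print(f"grid: {grid_days:.2f} d, GP: {gp_days:.2f} d, 150x150 @ 10 min: {intro_days:.0f} d")
# grid: 22.76 d, GP: 0.14 d, 150x150 @ 10 min: 156 d
```

At the N = 160 points shown in Fig. 4, the same estimate gives about 0.22 days, still roughly 1% of the dense grid.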

Fig. 4: Gaussian-process-driven experiment.

a Autonomous scanning tunneling spectroscopy experiment, where an input image (Itunnel = 30 pA, Vsample = 1.6 V) showing a VS within WS2 is used for feature tracking and passed to a Gaussian process, which determines the corresponding objective and error functions and suggests the next point of measurement. Scale bar, 1 nm. b Evolution of an autonomous experiment, showing the mean and variance functions at given intervals; the orbital structure is sufficiently reconstructed with only ~1% of the points required for a 128 × 128 pixel grid experiment. c Point defect identification is performed at each acquired pixel, and the summed signal (0.0 V < Vsample < 0.7 V) is used as input. d Mean model output after N = 160 points, depicting a defect map of the measured in-gap W d states.


A method for autonomous experimentation is presented that uses both GPs and a 1D-CNN for spectral identification, enabling point defect fingerprinting across a wide variety of materials and surfaces. Image segmentation can subsequently be performed after spectral classification. Because experiments can be performed without time-intensive operator input and at a lower spatial density, high-resolution STM/STS can be performed autonomously with less redundant information over a given area of interest. Additionally, as neural network algorithms tend to require large amounts of data, the GP can be operated in exploration mode to acquire more observations, contributing toward statistically significant measurements and ML training on, e.g., an uninvestigated system whose spectroscopic signatures are not readily available in the literature. The methods presented exploit spectroscopically distinct features within a material: a user can collect data directed by uncertainty quantification or any preferred acquisition function, and use classification to determine tip quality or train on a defined number of classes. We expect that the open-source software package will find application across the scanning probe field and greatly increase experimental efficiency; the library can be easily extended to any system accessible with STM/STS.

Whereas previous reports have used spatial features, decreased pixel density in spectral space, or some combination of the two for segmentation and CITS measurements with an STM, we unambiguously identify defects and surface states from high-resolution spectral features by combining Gaussian processes with a CNN architecture for prediction. This hyperspectral STS mapping technique, which combines CITS with AI/ML, enables full characterization of heterogeneous sample surfaces, ensuring that no local spectral features are missed, even by an inexperienced user.


Scanning probe microscopy (SPM) measurements

All measurements were performed with a Createc GmbH scanning probe microscope operating under ultrahigh vacuum (pressure < 2 × 10⁻¹⁰ mbar) at liquid helium temperatures (T < 6 K). Either etched tungsten or platinum-iridium tips were used during acquisition. Tip apexes were further shaped by indentation into a gold substrate. STM images were taken in constant-current mode with the bias applied to the sample. STS measurements were recorded using a lock-in amplifier with a modulation frequency of 683 Hz and a modulation amplitude of 5 mV.

Sample preparation

Monolayer islands of WS2 were grown on graphene/SiC substrates by ambient-pressure chemical vapor deposition (CVD). A graphene/SiC substrate with 10 mg of WO3 powder on top was placed at the center of a quartz tube, and 400 mg of sulfur powder was placed upstream. During synthesis, the furnace was heated to 900 °C and the sulfur powder was heated to 250 °C using a heating belt. Ar carrier gas was supplied at 100 sccm, and the growth time was 60 min. The CVD-grown WS2/MLG/SiC was annealed in vacuo at 600 °C for 30 min to induce sulfur vacancies.

Neural network and Gaussian process implementation

The acquisition software integrates Python and LabVIEW and uses the Nanonis programming interface. The GP was implemented using gpCAM, a library for autonomous experimentation by M. M. Noack71. The CNN was constructed with PyTorch, a deep-learning library for Python. An Intel Xeon E5-2623 v3 CPU with 8 cores and 64 GB of memory, combined with a Tesla K80 GPU with 4992 CUDA cores, was used for training.