Abstract
Recent advances in high-resolution scanning transmission electron and scanning probe microscopies have allowed researchers to measure the structural parameters and functional properties of materials in real space with picometre precision. In many technologically relevant atomic and/or molecular systems, however, the information of interest is distributed spatially in a nonuniform manner and may have a complex multidimensional nature. One of the critical issues, therefore, lies in being able to accurately identify (‘read out’) all the individual building blocks in different atomic/molecular architectures, as well as the more complex patterns that these blocks may form, on a scale of hundreds and thousands of individual atomic/molecular units. Here we employ machine vision to read and recognize complex molecular assemblies on surfaces. Specifically, we combine a Markov random field model and convolutional neural networks to classify the structural and rotational states of all individual building blocks in a molecular assembly on a metallic surface visualized by high-resolution scanning tunneling microscopy. We show how the resulting full decoding of the system allows us to directly construct a pair density function, a centerpiece in the analysis of the disorder–property relationship paradigm, as well as to analyze spatial correlations between multiple order parameters at the nanoscale and to elucidate a reaction pathway involving molecular conformation changes. The method represents a significant shift in the way we analyze atomically and/or molecularly resolved microscopic images and can be applied to a variety of other microscopic measurements of structural, electronic, and magnetic orders in different condensed matter systems.
Introduction
The symmetry properties of crystalline and molecular systems associated with the long-range periodicity of their presumably ideal ‘lattices’ serve as a cornerstone for deriving the electronic, magnetic, and optical functionalities of technologically relevant materials. Experimentally, these properties are usually accessed via scattering techniques that provide information about site occupancies spatially averaged over the probing volume. The knowledge of average structural parameters underpins classical physical descriptions based on concepts of order parameters, average compositions, and symmetry-averaged thermal and phonon properties. At the same time, there is a growing realization that effects of local structure, exemplified by disorder, can often lead to novel functionalities absent in averaged models.^{1,2,3,4,5} For example, one interesting scenario of disorder occurs when the local symmetry associated with individual building blocks differs from the global symmetry imposed by the underlying lattice.^{1, 2} The resulting interplay between local and average symmetry opens new pathways to understand and optimize the optical, magnetic, electronic, and thermal properties of certain disordered systems.^{2, 5}
Exploring and controlling different types of disorder in both periodic and nonperiodic structures is therefore crucial for applications and basic science alike. In the last decade, progress in scanning probe and scanning transmission electron microscopies (SPM and STEM) has allowed real-space, unit-cell-scale mapping of electronic and structural orders in materials, making them ideal tools for analyzing such distorted systems.^{6} For example, subfields of SPM such as noncontact atomic force microscopy and scanning tunneling microscopy are known to provide unprecedented, angstrom-resolved visual insight into the nature of chemical bonds^{7} and the spatial behavior of the electronic density of states on a surface,^{8} respectively. Such capabilities produce an ever-growing stream of high-quality (high-resolution) experimental data that requires adequate analytical methods for extracting relevant physical and chemical information.^{9}
Concurrently with advances in real-space experimental microscopic measurements, contemporary ab initio modelling allows a detailed study of atomic/molecular structures and their electronic, magnetic, and optical properties.^{10,11,12} However, many interesting functionalities in disordered molecular and/or atomic systems manifest on length scales at which the number of possible molecular or atomic configurations (and hence the computational cost) grows exponentially. Similarly, effects of local symmetry breaking and disorder often manifest themselves as minute (~pm level) distortions of the bonding geometry or effective molecular shapes.^{13,14,15,16} This calls for a pathway that integrates experiment with certain elements of theory and allows an automated, highly efficient inspection and interpretation of experimental images containing a large number of individual atomic and/or molecular units (~10^{2}–10^{3}) with full information extraction, linking both minute deviations in local structure and large-scale assembly properties in a statistically significant manner.
Here we use an approach based on a synergy of Markov random fields, convolutional neural networks, and ab initio simulations to perform a full decoding of the various orders associated with the symmetries of individual building blocks (molecules) on the underlying lattice (substrate). We apply this method to explore molecular interactions in a 2D film of bowl-shaped sumanene molecules on a gold substrate, where the molecule at each lattice point can reside in multiple (structural and rotational) configurations. The full decoding at the nanoscale allows us to directly construct both the relevant pair density functions, a centerpiece in the analysis of the disorder–property relationship paradigm,^{1} and more complex structural descriptors. This in turn allows us to explore how individual blocks may form certain short-range orders, to analyze potential spatial correlations between multiple order parameters, and to use the obtained information for constructing a reaction path for molecular conformational changes in the self-assembly.
Results
Model experimental system
As a model system, we chose a self-assembly of bowl-shaped π-conjugated sumanene molecules^{17} (hereafter, buckybowls) on a gold (111) surface. Unlike most planar molecules, buckybowls have an additional structural degree of freedom associated with their bowl-up (U) and bowl-down (D) conformations (Fig. 1a, b).^{18, 19} The raw experimental STM image in Fig. 1c shows a nanoscale area of the gold (111) surface covered with a self-assembled adlayer of buckybowls (see Supplementary Note 1). The global two-dimensional fast Fourier transform (FFT) of the data in Fig. 1c reveals two hexagonal patterns (inset in Fig. 1c) that are rotated by 30° with respect to each other and whose lattice constants differ by a factor of ≈\(\sqrt 3 \). Such a reciprocal-space structure indicates a change in the STM tunneling current at every third molecule in the self-assembly. This can be explained by the formation of the so-called 2U1D structure,^{18} in which every third molecule is in the bowl-down state and is associated with an increase in STM tunneling current (see Supplementary Figure 1), whereas the rest of the molecules reside in the bowl-up state. The relatively weak intensity of the FFT peaks associated with the 2U1D structure (inner hexagon in the inset of Fig. 1c) suggests that the 2U1D structure forms only partially across the field of view. In addition to the degrees of freedom associated with the U and D states, each buckybowl can reside in several azimuthal rotational states. A simple visual inspection of several randomly selected areas of the STM image (such as the one illustrated in Fig. 1d), as well as the application of more advanced statistical tools such as principal component analysis^{20} (Fig. 1e), suggests that the likely number of rotational classes to be considered for this dataset is four (see Supplementary Note 2 and Supplementary Note 4, as well as Supplementary Figure 2 and Supplementary Figure 3).
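The √3 ratio and the 30° rotation follow directly from the geometry of a (√3 × √3)R30° supercell on a triangular lattice, which is what a 2U1D arrangement with one distinct molecule in three produces. The short NumPy calculation below is a sketch assuming an ideal triangular lattice with an arbitrary spacing `a` (a placeholder, not a measured value); it compares the first-order reciprocal vectors of the molecular lattice (outer hexagon) with those of the three-molecule supercell (inner hexagon).

```python
import numpy as np

def reciprocal(a1, a2):
    """Return 2D reciprocal vectors b1, b2 with a_i . b_j = 2*pi*delta_ij."""
    A = np.column_stack([a1, a2])        # real-space basis vectors as columns
    B = 2 * np.pi * np.linalg.inv(A).T   # columns are b1, b2
    return B[:, 0], B[:, 1]

a = 1.0  # hypothetical nearest-neighbour spacing (arbitrary units)

# Primitive vectors of the triangular molecular lattice
a1 = a * np.array([1.0, 0.0])
a2 = a * np.array([0.5, np.sqrt(3) / 2])

# (sqrt(3) x sqrt(3))R30 supercell: one molecule in three differs (2U1D-like)
s1 = a1 + a2
s2 = 2 * a2 - a1

b1, _ = reciprocal(a1, a2)   # outer hexagon: molecular lattice
c1, _ = reciprocal(s1, s2)   # inner hexagon: superstructure

ratio = np.linalg.norm(b1) / np.linalg.norm(c1)
cos_t = abs(b1 @ c1) / (np.linalg.norm(b1) * np.linalg.norm(c1))
angle = np.degrees(np.arccos(cos_t))
print(f"|b|/|c| = {ratio:.4f}, rotation = {angle:.1f} deg")  # → |b|/|c| = 1.7321, rotation = 30.0 deg
```

The inner hexagon is the superstructure because a larger real-space period maps to a shorter reciprocal vector.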
While the presence of three rotational states on the (111) surface was expected from earlier studies,^{18} we attribute the additional rotational class to imperfections of the molecular film and/or the underlying substrate.
Because the ground-state energies of the U and D conformations are relatively close,^{18, 19} certain perturbations are likely to induce transitions between the two structural states, yielding deviations from the ideal 2U1D periodic structure in the molecular film. Indeed, visual inspection (e.g. Fig. 1d) reveals disorder (i.e., distortion of the periodicity associated with the 2U1D structure) in both the U and D structural states, as well as in the distribution of the molecules' rotational classes. Recent experimental and theoretical studies suggest that controllable formation of various architectures in a 2D buckybowl adlayer is feasible through manipulation with the SPM tip^{19, 21} or physical adsorption of certain chemical species,^{18} which could potentially lead to molecular information-storage devices or systems for molecular-level mechanical transduction. Furthermore, owing to the presence of multiple rotational states in addition to the bowl-up/down structural conformations, such a system can be viewed as an ideal playground for probing the interplay between multiple order parameters, a molecular analog of multiferroic systems.^{22} However, one of the critical issues in these efforts lies in being able to identify (‘read out’) all the individual building blocks in different molecular formations on a scale of hundreds and thousands of molecules. This requires tools that enable the classification of non-strictly periodic structures in STM images, as well as the extraction of information on the ‘internal’ structure of individual units (molecules), in an automated and reliable fashion.
Unfortunately, while image-averaging methods such as the Fourier transform (FFT) and principal component analysis (PCA) of the STM data described above are useful in establishing physical priors, these methods alone are not sufficient for obtaining accurate information on the spatial distribution of the structural and rotational molecular states in such systems, or on possible spatial correlations between the corresponding order parameters. Below we demonstrate how a Markov random field model and convolutional neural networks, aided by density functional theory (DFT) simulations of STM images, allow us to classify bowl-up/down structural states and different rotational classes in an automated and accurate fashion (Fig. 2).
Molecular self-assembly as a Markov network
A Markov random field (MRF) is a mathematical model that represents the long-range order of a system by defining only local interactions.^{23, 24} We illustrate its application using the example of buckybowls that can reside in different structural conformations (U and D states); later we also apply this scheme to the analysis of multiple azimuthal rotational states of buckybowls. The posterior probability distribution for the possible molecular states can be described using Bayes' formula as:
\[P(X = x\,|\,Z = z) = \frac{P(Z = z\,|\,X = x)\,P(X = x)}{P(Z = z)},\]
where P(X = x|Z = z) describes the probability of a molecule belonging to a state x given the observation z (information from the experimental image). It is proportional to the likelihood P(Z = z|X = x) that the particular configuration leads to the observed outcome, multiplied by the prior probability P(X = x) of that configuration, which is set in the absence of observation purely by our assumptions about the model. P(Z = z) plays the role of a normalization constant. The priors P(x) can be represented via an MRF, which makes use of an undirected graph G = (V, E), where V = {1, …, n} is the vertex set associated with the random variable X, and E is a set of edges joining pairs of vertices. The underlying assumption of the Markov property is that the state of an element depends explicitly only on the states of its neighbors,
\[P(x_i\,|\,x_{V \setminus \{i\}}) = P(x_i\,|\,x_{\mathcal{N}(i)}),\]
where \(\mathcal{N}(i)\) denotes the set of vertices joined to i by an edge in E.
Importantly, although the Markov structure explicitly encodes only nearest-neighbor dependencies, it implicitly carries longer-range dependencies through chains of neighbors, and hence translates directly into the physics of our problem: the priors are linked to the short-range interactions in the molecular assembly, which are now explicitly taken into account during image analysis.
We represent our STM data on buckybowls as a graph in which each molecule is a node (vertex) and edges connect each molecule to its nearest neighbors (Fig. 2b). The k-d tree method with a Euclidean distance metric is used to identify up to 6 nearest neighbors for each molecule. The posterior distribution P(X = x|Z = z) of an MRF can be factorized over individual molecules such that^{25}
\[P(X = x\,|\,Z = z) = \frac{1}{Z}\prod_{i \in V} {{\rm{\Psi }}_i}\left( {{x_i},{z_i}} \right)\prod_{(i,j) \in E} {{\rm{\Psi }}_{ij}}\left( {{x_i},{x_j}} \right),\]
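A minimal sketch of this graph construction with SciPy's `cKDTree`, assuming molecular centres have already been located as an (n, 2) coordinate array. The distance cutoff `r_max`, which keeps molecules at image edges or next to vacancies from being linked to distant "neighbours", is our addition and not a parameter given in the text; `build_neighbor_graph` is a hypothetical helper name.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_neighbor_graph(coords, k=6, r_max=np.inf):
    """Undirected edges (i, j), i < j, linking each molecule to up to k nearest neighbours."""
    tree = cKDTree(coords)
    # Ask for k + 1 hits because the closest hit is the molecule itself (distance 0)
    dist, idx = tree.query(coords, k=k + 1, distance_upper_bound=r_max)
    edges = set()
    for i in range(len(coords)):
        for d, j in zip(dist[i, 1:], idx[i, 1:]):
            if np.isfinite(d):  # neighbours beyond r_max come back with distance inf
                edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)

# Usage: a 5 x 5 patch of a triangular lattice with unit spacing
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel() + 0.5 * (ys.ravel() % 2),
                          ys.ravel() * np.sqrt(3) / 2])
edges = build_neighbor_graph(coords, k=6, r_max=1.1)
print(sum(12 in e for e in edges))  # → 6 (interior molecule 12 has six neighbours)
```

Storing edges as sorted (i, j) pairs makes the k-nearest-neighbour relation symmetric, which the undirected MRF graph requires.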
where \({{\rm{\Psi }}_i}\left( {{x_i},{z_i}} \right)\) represents the unary potential given the observation z, \({{\rm{\Psi }}_{ij}}\left( {{x_i},{x_j}} \right)\) are pairwise potentials over the connected neighbors, and Z is the partition function of the posterior MRF. The potentials are defined based on our knowledge of the physicochemical properties of the molecular system. For the analysis of molecular conformational changes, each node in our model can reside either in the U state or in the D state. We then assign the unary potentials \({{\rm{\Psi }}_i}\left( {{x_i},{z_i}} \right)\) over molecular states based on the proximity of a particular molecule's intensity in the STM image to the threshold value T between the states. The simplest threshold value is the mean of all intensities after normalization and outlier removal (Supplementary Note 3). The node probabilities are therefore calculated as
\[{{\rm{\Psi }}_i}\left( {{x_i} = {\rm{D}},{z_i}} \right) = \frac{1}{1 + e^{-S({I_i} - T)}},\qquad {{\rm{\Psi }}_i}\left( {{x_i} = {\rm{U}},{z_i}} \right) = \frac{1}{1 + e^{S({I_i} - T)}},\]
where \({I_i} \in [0,1]\) is the intensity of a given molecule i, and S is a parameter that controls the growth rate of the logistic function. This results in two logistic functions that assign molecules with intensities far from the threshold to the corresponding class with a probability of nearly 1, but provide more flexibility in the region around the threshold itself. We next assign the pairwise potentials \({{\rm{\Psi }}_{ij}}\left( {{x_i},{x_j}} \right)\) for our molecular system. The optimal 2U1D configuration proposed above is characterized by six U molecules surrounding each D molecule, such that a D molecule never has a nearest neighbor in the same bowl conformation. As we are interested in distortions of this ideal structure, the condition is relaxed by introducing a disorder parameter p. The resulting probabilities used in our MRF model are summarized in Table 1. Finding an exact solution to the MRF model is intractable in our case, as it would require examining all 2^{n} combinations of state assignments, where n, the number of molecules, is about 1000 for the examined images. However, a close approximate solution can be obtained with max-product loopy belief propagation, a message-passing algorithm for performing inference on MRF graphs that takes the unary and pairwise potentials as input^{25, 26} (see also Supplementary Note 5). We note that by tuning the graph structure and/or the form of the potentials, one can readily apply this approach to other molecular order parameters (such as lateral rotations, as we show later in the paper) or even to different molecular architectures.
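For n ≈ 1000 molecules an exhaustive search would cover 2^1000 ≈ 10^301 labelings, which is why an approximate message-passing scheme is needed. The sketch below is a compact NumPy version of max-product loopy belief propagation, run in the log domain, for the two-state U/D problem. It is not the UGM implementation cited above. The logistic unary terms follow the description of S and T, but the pairwise table (a D–D contact weighted by p, every other contact by 1) is our assumption standing in for Table 1, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

U, D = 0, 1  # state indices (bowl-up, bowl-down)

def unary_potentials(I, T, S):
    """Logistic node potentials: molecules brighter than T favour D, dimmer ones favour U."""
    p_d = 1.0 / (1.0 + np.exp(-S * (I - T)))
    return np.column_stack([1.0 - p_d, p_d])          # shape (n, 2): columns U, D

def pairwise_potential(p):
    """Assumed 2U1D compatibility table: a D-D contact is weighted by the disorder parameter p."""
    return np.array([[1.0, 1.0],
                     [1.0, p]])

def loopy_max_product(unary, edges, pair, n_iter=50):
    """Approximate MAP labels by max-product BP, computed in log space (max-sum)."""
    n, k = unary.shape
    log_u, log_p = np.log(unary + 1e-12), np.log(pair + 1e-12)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    msgs = {(i, j): np.zeros(k) for i in range(n) for j in nbrs[i]}  # message i -> j
    for _ in range(n_iter):
        new = {}
        for i, j in msgs:
            # Evidence at i from its observation and all neighbours except the recipient j
            h = log_u[i] + sum(msgs[(m, i)] for m in nbrs[i] if m != j)
            # For each x_j: max over x_i of h(x_i) + log psi(x_i, x_j)
            m_ij = np.max(h[:, None] + log_p, axis=0)
            new[(i, j)] = m_ij - m_ij.max()           # normalise to avoid numerical drift
        msgs = new
    beliefs = np.array([log_u[i] + sum((msgs[(m, i)] for m in nbrs[i]), np.zeros(k))
                        for i in range(n)])
    return beliefs.argmax(axis=1)

# Usage: one bright (D) molecule at the centre of a hexagon of six neighbours
edges = [(0, i) for i in range(1, 7)] + [(i, i % 6 + 1) for i in range(1, 7)]
I = np.array([0.9, 0.52, 0.1, 0.1, 0.1, 0.1, 0.1])   # molecule 1 is just above T
T, S = 0.5, 20.0
print((I > T).astype(int))   # plain thresholding → [1 1 0 0 0 0 0]
print(loopy_max_product(unary_potentials(I, T, S), edges, pairwise_potential(0.05)))
# → [1 0 0 0 0 0 0]: the D-D penalty flips the ambiguous neighbour to U
```

The toy case shows the intended behaviour: plain thresholding labels the borderline molecule D, while the pairwise prior, which disfavours two adjacent D molecules, resolves it as U.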
Classification of azimuthal rotations via convolutional neural network
To determine the azimuthal rotational state of each molecule in the image in an automated fashion, we employ an approach based on a convolutional neural network (cNN).^{27} cNN-based image analysis has been used successfully in recent years in various areas of science and engineering, ranging from cancer detection to satellite imaging,^{28, 29} but has yet to be applied to atomically and molecularly resolved imaging.^{30} The schematic of the cNN adopted for the present study is shown in Fig. 2c. It consists of two convolutional layers separated by a subsampling (pooling) layer, followed by a fully connected layer. A convolutional layer is formed by running learnable kernels (‘filters’) of a selected size over the input image (or over the feature maps of the previous layer), whereas the subsampling layer uses average pooling to reduce the size of the data. The fully connected layer at the end of the network contains as many neurons as there are classes/states to be predicted. The kernels are learned through a convolutional implementation of the backpropagation algorithm.^{31}
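The layer sequence just described (convolution, average pooling, convolution, and a fully connected output with one neuron per class) can be sketched as a plain NumPy/SciPy forward pass. This is an illustration only, not the network used in the study. The 32 × 32 patch size, the 5 × 5 kernels, the numbers of feature maps, and the ReLU and softmax choices are our assumptions, and the weights are random rather than learned by backpropagation.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer(maps, kernels):
    """Valid cross-correlation followed by ReLU.

    maps: (c_in, H, W); kernels: (c_out, c_in, k, k) -> (c_out, H-k+1, W-k+1)
    """
    out = np.stack([sum(correlate2d(maps[c], kern[c], mode="valid")
                        for c in range(maps.shape[0]))
                    for kern in kernels])
    return np.maximum(out, 0.0)  # ReLU nonlinearity (our assumption)

def avg_pool(maps, s=2):
    """Non-overlapping s x s average pooling (the subsampling layer)."""
    c, h, w = maps.shape
    h, w = h // s * s, w // s * s
    return maps[:, :h, :w].reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))

def forward(patch, params):
    """Map one (H, W) molecule patch to class probabilities."""
    x = conv_layer(patch[None], params["k1"])       # first convolutional layer
    x = avg_pool(x)                                 # subsampling
    x = conv_layer(x, params["k2"])                 # second convolutional layer
    logits = params["W"] @ x.ravel() + params["b"]  # fully connected: one neuron per class
    e = np.exp(logits - logits.max())
    return e / e.sum()                              # softmax over classes

# Usage with random (untrained) weights: 32 x 32 patch, 4 rotational classes
rng = np.random.default_rng(0)
params = {"k1": rng.normal(0, 0.1, (6, 1, 5, 5)),   # 32 -> 28, pooled to 14
          "k2": rng.normal(0, 0.1, (12, 6, 5, 5)),  # 14 -> 10
          "W": rng.normal(0, 0.01, (4, 12 * 10 * 10)),
          "b": np.zeros(4)}
probs = forward(rng.random((32, 32)), params)
print(probs.shape, probs.sum())
```

In practice the kernels and fully connected weights would be fitted to the labelled synthetic images; only the data flow through the layers is shown here.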
The cNN is trained on a set of 25,000 synthetic STM images obtained from DFT simulations of the different rotational classes (described in the next section). Note that information on the bowl conformational states (U and D) is inferred from the MRF analysis and is therefore not treated by the cNN (when it is, the adopted cNN scheme classifies U/D states with much poorer overall accuracy than the MRF analysis).
Generation of synthetic STM data
To ascertain the applicability and robustness of our machine learning and pattern recognition methods for general STM data, we start by constructing a synthetic dataset for a model system. We generate synthetic STM images with a Markov chain Monte Carlo sampler using inputs from DFT calculations of the electronic charge density distribution in the molecules. We work under the commonly adopted assumption that a ‘realistic’ experimental STM image can be viewed as a ‘distorted’ DFT simulation of the charge density distribution in the system (Fig. 3a).^{32, 33} For the molecular system under consideration, one possible type of distortion is an admixture of another azimuthal rotational state to the given structural configuration of an individual molecule. Indeed, if two (or more) states are separated by a relatively low activation barrier, as in the case of the buckybowl's distinct rotational states,^{19} the system may switch between these states while the STM tunneling current over the molecule is being acquired. As a result, the STM image will be a dynamical average of two (or more) states.^{34, 35} This effect may be especially pronounced in room-temperature measurements, at small tip–sample separations, or at high setpoint current density. In addition, blurring associated with the presence of the STM tip and signal-dependent Poisson noise were incorporated in our model (Fig. 3a). Here, blurring is modelled as a convolution with the STM probe function, whereas Poisson noise reflects the tunneling statistics.
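The distortion model described above (dynamical averaging, tip blurring, Poisson noise) can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' generator. Toy three-lobed Gaussian blobs stand in for DFT-simulated images, the tip is modelled as a Gaussian point-spread function of hypothetical width `sigma_tip`, `counts` is a hypothetical scale converting normalised intensity into tunnelling events, and the Markov chain Monte Carlo placement of molecules is not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def distort(state_a, state_b, admix, sigma_tip=1.5, counts=200, rng=None):
    """Turn an ideal simulated image into a 'realistic' STM-like one.

    state_a, state_b: simulated images of two rotational states (same shape)
    admix: fraction of the acquisition time spent in state_b (0..1)
    """
    rng = np.random.default_rng() if rng is None else rng
    # 1) Dynamical average: the molecule switches between states during the scan
    img = (1.0 - admix) * state_a + admix * state_b
    # 2) Blurring: convolution with an (assumed Gaussian) tip function
    img = gaussian_filter(img, sigma_tip)
    # 3) Signal-dependent Poisson noise; clip tiny negatives left by spline interpolation
    img = np.clip(img, 0.0, None)
    img = img / img.max()
    return rng.poisson(img * counts) / counts

# Usage: toy three-lobed 'molecule' standing in for a DFT charge-density map
y, x = np.mgrid[-16:16, -16:16]
state_a = sum(np.exp(-((x - 8 * np.cos(t)) ** 2 + (y - 8 * np.sin(t)) ** 2) / 8.0)
              for t in np.deg2rad([90, 210, 330]))
state_b = rotate(state_a, 20, reshape=False)   # hypothetical second rotational state
noisy = distort(state_a, state_b, admix=0.3, rng=np.random.default_rng(1))
```

Sweeping `admix` produces the kind of admixture series on which classification robustness is tested below.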
Testing our methods on synthetic data
The MRF approach identifies molecular D and U states remarkably accurately in scenarios where the distribution of STM intensities in the synthetic data closely resembles the experimental one (note that different rotation angles with respect to the substrate result in variations of the STM signal intensity^{19}). This is illustrated for the synthetic dataset described in Fig. 3b, c, for which only 4 out of a total of 1225 molecules (≈0.3%) are misidentified by our classification scheme (Fig. 3d). The total error rate (fraction of misidentified molecules) as a function of the value of p and of the intensity distributions is shown in Fig. 4. Generally, this approach vastly outperforms simple mean-value and/or average-value thresholding (Fig. 4a) and allows accurate classification of U and D states for a wide range of intensity distributions (Fig. 4b), even when no a priori estimate of p is available. In the case of the data described in Fig. 3b, c, for example, increasing p by a factor of ~3 would increase the total error by less than ~1% (see Fig. 4a).
We proceed to extract information about the azimuthal rotational states of the individual U and D units in the synthetic STM image (Fig. 5a). The dependence of the classification accuracy on hypothetical admixture ratios of a proximate rotational state (which may lie ‘in between’ the rotational classes used in our classification scheme) and the cNN error are shown in Fig. 5b, d, respectively. Figure 5b shows that the molecules' rotational states can be classified reliably even for a relatively large fraction of the admixed state.
A unique aspect of the proposed approach is that certain physical constraints can be incorporated into the cNN-based analysis to obtain more accurate decoding results. Here, we incorporate the effect of steric repulsion between molecules by using the cNN-calculated probabilities of the azimuthal rotational states as prior probability distributions for another MRF model. For example, two nearest-neighbor molecules are highly unlikely to have the same azimuthal rotational state if they reside in the same conformational state (either U or D) in the self-assembly.^{18} We therefore assign a 1% probability for each class to have a neighbor of its own class and equal 33% probabilities to have a neighbor of each of the other 3 rotational classes (see Supplementary Table 1). The total probabilities are then normalized to sum to 1. Then, as described earlier, we perform decoding using loopy belief propagation to obtain a more accurate solution (Fig. 5c).
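The steric-repulsion prior can be written as a 4 × 4 compatibility table. The sketch below uses the 1% (same class) and 33% (each other class) values from the text for neighbours that share a conformation. Falling back to a uniform table when the conformations differ is our assumption, and `rotation_pairwise` is a hypothetical name. The single-edge check at the end shows how the table, combined with cNN class probabilities as unary terms, shifts the most probable joint assignment. A full image would instead pass these tables to a loopy belief propagation step.

```python
import numpy as np

N_ROT = 4  # number of rotational classes suggested by the PCA analysis

def rotation_pairwise(conf_i, conf_j, same=0.01, other=0.33):
    """Pairwise potential over rotational classes for two neighbouring molecules.

    conf_i, conf_j: 'U' or 'D' bowl conformations taken from the MRF step.
    """
    if conf_i != conf_j:
        # Assumption: no steric preference between molecules of opposite conformation
        table = np.ones((N_ROT, N_ROT))
    else:
        # Same conformation: a neighbour of the same rotational class is strongly disfavoured
        table = np.full((N_ROT, N_ROT), other)
        np.fill_diagonal(table, same)
    return table / table.sum(axis=1, keepdims=True)  # each row sums to 1

psi_same = rotation_pairwise("U", "U")
psi_diff = rotation_pairwise("U", "D")

# Toy check on a single edge: hypothetical cNN probabilities for two same-conformation neighbours
p_i = np.array([0.60, 0.30, 0.05, 0.05])
p_j = np.array([0.55, 0.35, 0.05, 0.05])
joint = p_i[:, None] * p_j[None, :] * psi_same   # unnormalised joint over (class_i, class_j)
best = np.unravel_index(np.argmax(joint), joint.shape)
# Independent argmax would give classes (0, 0); the steric prior moves molecule j to class 1
```

Here the cNN alone would assign both molecules to class 0, which the prior treats as a steric clash; the joint maximum instead keeps molecule i in class 0 and moves j to its second-best class.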
Application to experimental data
Having verified that our algorithm works on synthetic data that mimic ‘laboratory conditions’, we apply it to the real experimental STM images of buckybowls on the gold (111) substrate described in Fig. 1. An FFT mask with a Hamming window is first applied to the STM image to remove a large-scale periodic contribution from the substrate. The MRF decoding of U and D states and the cNN-based decoding of azimuthal rotational states are summarized in Fig. 6a. The physical priors for the MRF and the classes for the cNN were taken from the FFT and PCA analyses, respectively (see Fig. 1). A simple visual inspection of the results in Fig. 6c supports the high accuracy of our method on experimental data.
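The Fourier filter is not detailed in the text, so the sketch below is only one plausible reading. It suppresses a disc of low spatial frequencies, where a long-period substrate modulation would appear, and tapers the mask edge with the rising half of a Hamming window to limit ringing. The radii `r_lo` and `r_hi` (in FFT pixels) and the function names are hypothetical; in practice the radii would be chosen by inspecting the image's FFT.

```python
import numpy as np

def hamming_highpass(shape, r_lo, r_hi):
    """Radial Fourier mask: 0 below r_lo, 1 above r_hi, Hamming-shaped ramp in between.

    Radii are in FFT pixel units (cycles per image).
    """
    ky = np.fft.fftfreq(shape[0]) * shape[0]
    kx = np.fft.fftfreq(shape[1]) * shape[1]
    r = np.hypot(*np.meshgrid(ky, kx, indexing="ij"))
    t = np.clip((r - r_lo) / (r_hi - r_lo), 0.0, 1.0)
    # Rising half of a Hamming window (0.08 -> 1), rescaled to run from 0 to 1
    return (0.54 - 0.46 * np.cos(np.pi * t) - 0.08) / 0.92

def remove_large_scale(img, r_lo=2.0, r_hi=6.0):
    """Suppress long-period (substrate-like) components of an image."""
    F = np.fft.fft2(img - img.mean())
    return np.real(np.fft.ifft2(F * hamming_highpass(img.shape, r_lo, r_hi)))

# Usage: short-period 'molecular' pattern on top of a long-period background
n = 128
y, x = np.mgrid[0:n, 0:n]
molecules = np.cos(2 * np.pi * x / 4) * np.cos(2 * np.pi * y / 4)
substrate = 2.0 * np.cos(2 * np.pi * x / 64)
cleaned = remove_large_scale(molecules + substrate)
print(np.abs(cleaned - molecules).max())   # close to zero
```

A smooth taper rather than a hard cut-off avoids introducing spurious oscillations around each molecule when the filtered image is transformed back.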
Once a full decoding is performed, it becomes possible to construct a pair distribution function (PDF) for the molecular states of interest. In turn, PDFs provide further insight into the nature of (dis)order in the molecular film, such as whether a disorder is correlated or random.^{1} The PDFs for all molecular states, for the bowl-down molecular conformations, and for one of the rotational classes are shown in Fig. 6e–g. The molecules clearly display well-defined long-range positional order, as is evident from Fig. 6e. On the other hand, an analysis of the PDF for the different molecular bowl conformations suggests that neither the long-range 2U1D order nor the perfect long-range 3U orientational order suggested previously^{18, 19} is realized in this system. Indeed, the former would result in the disappearance (or very strong suppression) of the peak at ≈11 Å in Fig. 6f, whereas for the latter the PDF of U states would closely resemble the all-molecules PDF in Fig. 6e; neither behavior, however, is observed. Interestingly, our analysis also shows a close resemblance between the PDFs for structural and rotational states (Fig. 6f, g, respectively) within the first several coordination ‘spheres’, implying a certain correlation between the two associated orders; that is, bowl-up/down switching is associated with the formation of a certain rotational (dis)order in the inverted molecules.
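Once every molecule carries a label, a PDF for any subset of molecules can be estimated by histogramming pairwise distances and normalising by the count expected for a uniform distribution at the same density. The sketch below ignores edge corrections, so values at large r are biased low for a finite field of view. The lattice spacing, `bin_width`, and `r_max` are illustrative choices, and `pair_distribution` is a hypothetical name. For the PDF of, say, the bowl-down molecules alone, one would pass only their coordinates together with the same field-of-view area.

```python
import numpy as np
from scipy.spatial.distance import pdist

def pair_distribution(coords, area, r_max=5.0, bin_width=0.3):
    """Radial pair distribution g(r) of 2D positions (no edge correction)."""
    n = len(coords)
    counts, edges = np.histogram(pdist(coords),
                                 bins=np.arange(0.0, r_max + bin_width, bin_width))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)   # annulus areas
    density = n / area
    # pdist counts each unordered pair once, hence the factor 0.5 * n
    g = counts / (0.5 * n * density * shell)
    return r, g

# Usage: ideal triangular lattice with hypothetical unit spacing
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xs.ravel() + 0.5 * (ys.ravel() % 2),
                          ys.ravel() * np.sqrt(3) / 2])
r, g = pair_distribution(coords, area=len(coords) * np.sqrt(3) / 2)
print(r[np.argmax(g)])   # centre of the bin holding the first-neighbour peak (~1.05)
```

Comparing such curves for different labelled subsets is what distinguishes correlated from random disorder: a subset placed at random on the lattice reproduces the all-molecules PDF, while ordered subsets suppress or enhance specific peaks.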
We further explore the nature of disorder in the molecular self-assembly by searching for local correlations between molecule bowl inversion and azimuthal rotation of the neighboring molecules. To obtain such insight, we construct a spatial correlation map describing a possible interplay between these two orders. Specifically, we adopt a method based on the calculation of the so-called Moran's I, which measures the spatial association between the distributions of two variables at nearby locations on the lattice.^{36, 37} The spatial weight matrix in the definition of Moran's I allows us to impose constraints on the number of neighbors to be considered (see Supplementary Note 6). The results for the spatial correlation between bowl-up/down configuration and the different rotational classes for the first ‘coordination sphere’ are shown in Fig. 6d, where the size of each circle reflects the local value of Moran's I across the field of view. Generally, the map in Fig. 6d implies a spatial variation in the coupling between the two associated order parameters, which could also be sensitive to the presence of defects. The average value of Moran's I for all molecules is 0.310, whereas the average values for the correlation of rotational classes with bowl-up and bowl-down molecular conformations are 0.246 and 0.426, respectively. This result indicates that a bowl-up-to-bowl-down flip, associated with the occurrence of an ‘additional’ D molecule, requires a larger change in the rotational states of the neighboring molecules (compared with a flip in the reverse direction) in order to compensate for the formation of an energetically unfavorable (‘extra’) bowl-down state.
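For reference, a common bivariate form of Moran's I can be sketched as follows. It relates a per-molecule variable x (e.g. bowl state, 1 for D and 0 for U) to a variable y at the neighbouring molecules (e.g. an indicator of a given rotational class), using a row-standardised spatial weight matrix W. The exact encoding and weighting used in the paper are given in Supplementary Note 6 and are not reproduced here. The local values I_i are the kind of per-molecule quantity one would plot in a map such as Fig. 6d, and the function names are hypothetical.

```python
import numpy as np

def weights_from_edges(n, edges):
    """Row-standardised spatial weights over the first 'coordination sphere'."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    rs = W.sum(axis=1, keepdims=True)
    return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)

def bivariate_moran(x, y, W):
    """Local values I_i = z_x[i] * sum_j W[i, j] z_y[j] and their mean (global I).

    x and y must not be constant (their standard deviations are used for scaling).
    """
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    local = zx * (W @ zy)
    return local, local.mean()

# Usage: ring of six molecules with alternating bowl states (1 = D, 0 = U)
ring = [(i, (i + 1) % 6) for i in range(6)]
W = weights_from_edges(6, ring)
x = np.array([1, 0, 1, 0, 1, 0], dtype=float)
_, I_same = bivariate_moran(x, x, W)       # neighbours carry the opposite value: I = -1
_, I_anti = bivariate_moran(x, 1 - x, W)   # neighbours' y matches x exactly: I = +1
print(I_same, I_anti)
```

Restricting W to nearest neighbours corresponds to the first coordination sphere; adding further shells to W is how the number of neighbours considered would be changed.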
Based on the findings above, we propose a two-stage ‘reaction path’, schematically depicted in Fig. 7, that explains the different correlation values of rotational states with neighboring bowl-down and bowl-up structural conformations. Specifically, in the first stage, the creation of an ‘extra’ bowl-down state elevates the energy of the system, which is then relaxed in the second stage of the ‘reaction’ via adjustment of the rotational states of nearby molecule(s). The latter stage is reflected in the obtained values of Moran's I. It is crucial to note that, unlike previous studies that considered only a bowl inversion process for an isolated single molecule,^{18} our analysis provides deeper insight into local interaction processes that involve lateral switching of neighboring molecules in the self-assembled layer. Observation of such an interplay between molecular rotations and structural conformations provides important clues for understanding local degrees of freedom in the molecular adlayer, which is crucial for its potential applications in multilevel molecular memory devices.
Discussion
To summarize, we have developed a multistage pattern recognition approach that combines ab initio simulations, Markov random fields, and convolutional neural networks for a detailed characterization of surface molecular architectures in the typical field of view (~10^{2}–10^{3} molecules) of an STM experiment.
We now comment on several potential limitations of the methods and possible ways to overcome them. First, the physical priors used as input for both the MRF and the cNN could in the future be extracted (in addition to, or even instead of, FFT and PCA analysis) from state-of-the-art ab initio analysis and molecular dynamics (MD) simulations, potentially providing more accurate decoding results. In this regard, it should be noted that low class-assignment probabilities for certain molecules, if present, would suggest that some molecular states in the experimental system were not accounted for by theory. In such a case, one must return to the ab initio modelling stage and reconsider the initial assumptions or adjust parameters. We envision that this process of adjusting (putting constraints on) ab initio or MD parameters could be automated in the future, although this would require an infrastructure capable of performing DFT/MD on the fly. Second, it would also be interesting to apply so-called domain-adversarial training of neural networks,^{38} which allows theoretically predicted classes to be adapted based on the observed data. The underlying idea of this approach is that the theoretical and experimental datasets are similar yet different in such a way that traditional neural networks may not capture the correct features from the labeled data alone. Finally, we foresee that the choice of the disorder parameter p in the MRF analysis, both during the inference of bowl-up/down structural states and during the refinement of cNN results, could in principle be optimized using a statistical distance approach.^{39}
Regarding further applications of our method, we note that in addition to the analysis of individual static images of molecular structures, the same analysis can be applied to each individual frame of STM “movies” of molecular motion (e.g. under an external perturbation field), thus providing invaluable input to molecular dynamics simulations. Furthermore, because our pattern recognition analysis is general in nature, it can be extended to microscopic measurements of structural, electronic, and magnetic orders, as well as their possible spatial correlations, in a variety of condensed matter systems such as skyrmion lattices.^{40}
Data availability
All the relevant data is available from the authors upon request.
References
 1.
Keen, D. A. & Goodwin, A. L. The crystallography of correlated disorder. Nature 521, 303–309 (2015).
 2.
Overy, A. R. et al. Design of crystallike aperiodic solids with selective disorder–phonon coupling. Nat. Commun. 7, 10445 (2016).
 3.
Petrović, A. P. et al. A disorder-enhanced quasi-one-dimensional superconductor. Nat. Commun. 7, 12262 (2016).
 4.
Guo, H. et al. Strain doping: reversible single-axis control of a complex oxide lattice via helium implantation. Phys. Rev. Lett. 114, 256801 (2015).
 5.
Bennett, T. D., Cheetham, A. K., Fuchs, A. H. & Coudert, F.-X. Interplay between defects, disorder and flexibility in metal–organic frameworks. Nat. Chem. 9, 11–16 (2017).
 6.
Kalinin, S. V. & Pennycook, S. J. Microscopy: Hasten high resolution. Nature 515, 487–488 (2014).
 7.
de Oteyza, D. G. et al. Direct imaging of covalent bond structure in single-molecule chemical reactions. Science 340, 1434–1437 (2013).
 8.
Wang, Y. et al. Observing atomic collapse resonances in artificial nuclei on graphene. Science 340, 734–737 (2013).
 9.
Kalinin, S. V., Sumpter, B. G. & Archibald, R. K. Big–deep–smart data in imaging for guiding materials design. Nat. Mater. 14, 973–980 (2015).
 10.
Freysoldt, C. et al. First-principles calculations for point defects in solids. Rev. Mod. Phys. 86, 253–305 (2014).
 11.
Rabe, K. M. First-principles calculations of complex metal-oxide materials. Annu. Rev. Condens. Matter Phys. 1, 211–235 (2010).
 12.
Friesner, R. A. Ab initio quantum chemistry: Methodology and applications. Proc. Natl. Acad. Sci. USA 102, 6648–6653 (2005).
 13.
Jia, C. L. et al. Atomic-scale study of electric dipoles near charged and uncharged domain walls in ferroelectric films. Nat. Mater. 7, 57–61 (2008).
 14.
Jia, C. L. et al. Unit-cell scale mapping of ferroelectricity and tetragonality in epitaxial ultrathin ferroelectric films. Nat. Mater. 6, 64–69 (2007).
 15.
Nelson, C. T. et al. Spontaneous vortex nanodomain arrays at ferroelectric heterointerfaces. Nano Lett. 11, 828–834 (2011).
 16.
Borisevich, A. et al. Mapping octahedral tilts and polarization across a domain wall in BiFeO_{3} from Z-contrast scanning transmission electron microscopy image atomic column shape analysis. ACS Nano 4, 6071–6079 (2010).
 17.
Sakurai, H., Daiko, T. & Hirao, T. A synthesis of sumanene, a fullerene fragment. Science 301, 1878 (2003).
 18.
Jaafar, R. et al. Bowl inversion of surface-adsorbed sumanene. J. Am. Chem. Soc. 136, 13666–13671 (2014).
 19.
Fujii, S., Ziatdinov, M., Higashibayashi, S., Sakurai, H. & Kiguchi, M. Bowl inversion and electronic switching of buckybowls on gold. J. Am. Chem. Soc. 138, 12142–12149 (2016).
 20.
Jesse, S. & Kalinin, S. V. Principal component and spatial correlation analysis of spectroscopic-imaging data in scanning probe microscopy. Nanotechnology 20, 085714 (2009).
 21.
Olyanich, D. A., Kotlyar, V. G., Utas, T. V., Zotov, A. V. & Saranin, A. A. The manipulation of C_{60} in molecular arrays with an STM tip in regimes below the decomposition threshold. Nanotechnology 24, 055302 (2013).
 22.
Spaldin, N. A. & Fiebig, M. The renaissance of magnetoelectric multiferroics. Science 309, 391–392 (2005).
 23.
Cross, G. R. & Jain, A. K. Markov random field texture models. IEEE Trans. Pattern Anal. Mach. Intell. 5, 25–39 (1983).
 24.
Blake, A., Kohli, P. & Rother, C. Markov Random Fields for Vision and Image Processing (The MIT Press, 2011).
 25.
Weiss, Y. & Freeman, W. T. On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs. IEEE Trans. Inf. Theory 47, 736–744 (2001).
 26.
Schmidt, M. http://www.cs.ubc.ca/~schmidtm/Software/UGM.html (2007).
 27.
Nielsen, M. A. Neural Networks and Deep Learning (Determination Press, 2015).
 28.
Jean, N. et al. Combining satellite imagery and machine learning to predict poverty. Science 353, 790–794 (2016).
 29.
Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, 26286 (2016).
 30.
Cyphersmith, A., Maksov, A., Hassey-Paradise, R., McCarthy, K. D. & Barnes, M. D. Defocused emission patterns from chiral fluorophores: application to chiral axis orientation determination. J. Phys. Chem. Lett. 2, 661–665 (2011).
 31.
Palm, R. B. Master Thesis (Technical University of Denmark, 2012).
 32.
Wachowiak, A. et al. Visualization of the molecular Jahn-Teller effect in an insulating K_{4}C_{60} monolayer. Science 310, 468–470 (2005).
 33.
Lu, X., Grobis, M., Khoo, K. H., Louie, S. G. & Crommie, M. F. Charge transfer and screening in individual C_{60} molecules on metal substrates: A scanning tunneling spectroscopy and theoretical study. Phys. Rev. B 70, 115418 (2004).
 34.
Amara, H., Latil, S., Meunier, V., Lambin, P. & Charlier, J. C. Scanning tunneling microscopy fingerprints of point defects in graphene: A theoretical prediction. Phys. Rev. B 76, 115423 (2007).
 35.
El-Barbary, A. A., Telling, R. H., Ewels, C. P., Heggie, M. I. & Briddon, P. R. Structure and energetics of the vacancy in graphite. Phys. Rev. B 68, 144107 (2003).
 36.
Getis, A. & Ord, J. K. The analysis of spatial association by use of distance statistics. Geogr. Anal. 24, 189–206 (1992).
 37.
Anselin, L. Local indicators of spatial association. Geogr. Anal. 27, 93–115 (1995).
 38.
Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016).
 39.
Vlcek, L. & Chialvo, A. A. Rigorous force field optimization principles based on statistical distance minimization. J. Chem. Phys. 143, 144110 (2015).
 40.
Matsumoto, T. et al. Direct observation of Σ7 domain boundary core structure in magnetic skyrmion lattice. Sci. Adv. 2, e1501280 (2016).
Acknowledgements
This research was sponsored by the Division of Materials Sciences and Engineering, Office of Science, Basic Energy Sciences, US Department of Energy (M.Z. and S.V.K.). A.M. acknowledges fellowship support from the UT/ORNL Bredesen Center for Interdisciplinary Research and Graduate Education. Research was conducted at the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility. The authors acknowledge Prof. Hidehiro Sakurai from Osaka University for synthesis of the buckybowl molecules, and Dr. Shintaro Fujii and Prof. Manabu Kiguchi from Tokyo Institute of Technology for their assistance in STM measurements.
Author information
Contributions
M.Z. conceived and led the project; he also performed the STM experiments and the DFT-based STM simulations. A.M. implemented the algorithms for machine-vision-based analysis of molecular structures in the STM images, with help and guidance from M.Z. and S.V.K. M.Z. wrote the manuscript with input from A.M. and S.V.K.
Corresponding author
Correspondence to Maxim Ziatdinov.
Ethics declarations
Competing interests
The authors declare that they have no competing financial interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.