Deep reinforcement learning for data-driven adaptive scanning in ptychography

We present a method that lowers the dose required for an electron ptychographic reconstruction by adaptively scanning the specimen, thereby providing the required spatial information redundancy in the regions of highest importance. The proposed method is built upon a deep learning model that is trained by reinforcement learning, using prior knowledge of the specimen structure from training data sets. We show that using adaptive scanning for electron ptychography outperforms alternative low-dose ptychography experiments in terms of reconstruction resolution and quality.


I. INTRODUCTION
Ptychography is a coherent diffractive imaging (CDI) method that has found use in light, x-ray and scanning transmission electron microscopies (STEM). The method combines whole diffraction patterns from spatially overlapping regions to reconstruct the structure of a specimen for arbitrarily large fields of view [1], with many advantages over other imaging methods [2][3][4][5]. The development of new hardware [6,7] and reconstruction algorithms [8,9] has led to ptychography becoming a mature electron microscopy technique [4]. Current research to further improve this technique is driven by the desire to investigate thick samples [10][11][12][13][14] as well as to lower the required electron dose [15][16][17][18].
In order to lower the electron dose used, researchers have tried to vary various experimental parameters while preserving information redundancy through overlapping probes. One approach involves a defocused probe rastered across the specimen with a less dense scan pattern. This therefore uses a lower dose than focused-probe ptychography, but introduces additional complications for the reconstruction algorithm due to an increased need to account for partial spatial coherence in the illuminating probe [18]. Another approach is simply to scan faster: by lowering the probe dwell time per probe position, an overall decrease in dose can be realized. However, this comes with its own limitations, as the physical limits of the electron source, microscope, and camera must all be considered. Finally, a third approach is the optimization of the scan pattern, deviating from a raster grid in favour of a generally more efficient pattern [19]. This approach can, however, only yield a limited improvement in reconstruction quality, as it is not capable of taking the structure of the specimen into account in the scan pattern.
In this paper we present an approach particularly tailored for electron ptychography that enables reduction of the electron dose through adaptive scanning. It is based upon the idea that, at atomic resolution, ptychography requires an increased information redundancy through overlapping illuminating beams only at regions that contain atomic structure of the scanned specimen. We present here an algorithm that scans only the regions with the highest information content in order to strongly improve the ptychographic reconstruction quality while keeping the total number of scan positions, and therefore the total dose, low. The scan positions are predicted sequentially during the experiment, and the only information required for the prediction process is the diffraction data acquired at previous scan positions.
The scan position prediction model of the algorithm is a mixture of deep learning models, and the model training is performed with both supervised and reinforcement learning (RL). The synergy of deep learning and reinforcement learning has already shown strong performance in various dynamic decision making problems, such as playing Atari games [20] or Go [21], as well as tasks in robotics [22,23] and visual recognition [24][25][26]. The success of this approach, despite the complexity of the problems that had to be overcome, can be attributed to the algorithms' ability to learn independently from data.
Similarly, the algorithm proposed here solves a sequential decision making problem by learning from a large amount of simulated or, if available, experimental ptychographic data consisting of hundreds to thousands of diffraction patterns. Here, the learning is specifically designed to maximize the dynamic range in the reconstruction for each individual scan position. The algorithm then transfers the behaviour it learned offline to a realistic experimental environment.
Our approach is conceptually related to the subfield of computer vision that focuses on identifying relevant regions of images or video sequences for the purpose of classification or recognition. However, there are fundamental differences, not only in purpose but also in solution strategy, between our application and such computer vision tasks. These differences include a lack of direct access to images (updated real-space information is only accessible through a highly optimized reconstruction algorithm); non-optimal parameter settings of the reconstruction algorithm and experimental uncertainties, such as imprecise scan positioning of the microscope or contamination of the specimen, which require pre-processing of the reconstructed image; and the necessity of a much larger number of measurements, which requires methods that improve the performance of the sequential decision making process.
Work in adaptive scanning for x-ray fluorescence imaging [27] and for scanning probe microscopy [28] has recently been reported. The work in [27] uses RL to sequentially determine the exposure time on a per-pixel basis for multiple apertures that vary in their respective resolution. It is therefore more closely related to previous work in scanning electron microscopy that divides the measurement into a low-dose raster scan and a subsequent high-dose adaptive scan [29]. The work in [28] uses Gaussian-process-based Bayesian optimization to sequentially explore the image space with the scanning probe. However, it has been reported that this model suffers in performance as it lacks prior knowledge of the domain structure, which can be compensated for by including a deep learning model with domain-specific knowledge. Our proposed algorithm is the first application of adaptive scanning to ptychography, and is further unique in that the scan pattern is predicted using prior knowledge about the sample in the form of a pre-trained deep RL network, thereby improving performance. Our research forms a basis for a new avenue of automated and autonomous microscopy [30].
With ever-increasing data storage capacities and implementations of data infrastructures and data sharing platforms [31][32][33][34], access to ptychographic data will be further facilitated, and data-driven adaptive scanning schemes can be applied to a vast number of ptychographic experiments.
We demonstrate the performance of our algorithm using experimentally acquired data. Our analysis shows that the algorithm can sufficiently learn information about the structure of a material from data in order to optimize the scan behaviour of the microscope in a real experiment. For low-dose experiments, we show that adaptive scanning can improve the ptychographic reconstruction quality by up to 25.75% and the resolution by up to 31.59% compared to a non-adaptive (random) scan method. Adaptive scanning allows for the retrieval of the material's structure in this low-dose regime and even improves the resolution of the reconstruction when compared to the reconstruction obtained using the conventional high-dose raster grid scan approach.

II. METHODS

A. Image formation in ptychography
Single-slice ptychography can be expressed by a multiplicative approximation that describes the interaction of a wavefunction ψ_p^in(r) of an incoming beam with the transmission function t(r) of a specimen. For each measurement p, the beam is shifted by R_p and a diffraction pattern is acquired with the intensity I_p, which is expressed by

I_p(k) = |Ψ_p^ex(k)|² = |F[ψ_p^in(r − R_p) t(r)]|²,   (1)

where F is the Fourier propagator, r the real space coordinate, k the reciprocal space coordinate and Ψ_p^ex(k) the exit wavefunction at the detector. According to the strong phase object approximation, the transmission function can be defined as t(r) = e^{iσV(r)}, with the interaction constant σ and the complex quantity V(r), where the real part V_re(r) corresponds to the local projected electrostatic potential and the imaginary part V_im(r) accounts for absorption or scattering outside the range of scattering angles and energy losses recorded by the detector. Throughout the remainder of this paper, the variable σ is absorbed into V(r). X-ray and optical ptychography are described mathematically in a similar way, with the only difference that the transmission function t(r) is related to the complex refractive index of the specimen. Figure 1 illustrates the experimental configuration of conventional ptychography.
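As a concrete illustration, the forward model above can be sketched numerically. This is a minimal sketch, not the paper's implementation: the array size, the periodic (integer-pixel) probe shift and the plain FFT convention are simplifying assumptions.

```python
import numpy as np

def diffraction_intensity(probe, potential, shift):
    """Single-slice ptychography forward model (multiplicative approximation).

    probe:     complex illumination psi_in(r), shape (N, N)
    potential: complex quantity V(r) with sigma absorbed, shape (N, N)
    shift:     probe shift R_p in whole pixels, (dy, dx) -- a simplification
    """
    t = np.exp(1j * potential)                     # t(r) = exp(i V(r))
    shifted = np.roll(probe, shift, axis=(0, 1))   # psi_in(r - R_p), periodic
    exit_wave = shifted * t                        # multiplicative approximation
    Psi = np.fft.fftshift(np.fft.fft2(exit_wave))  # far-field (Fourier) propagation
    return np.abs(Psi) ** 2                        # recorded intensity I_p(k)
```

For a flat probe and zero potential the pattern collapses to a single central peak, as expected for an unscattered plane wave.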
The potential of the specimen is recovered from the experimentally acquired diffraction patterns J_p using a reconstruction algorithm. Here, we apply a gradient-based algorithm [17] with gradient descent optimization, and the potential is retrieved by iteratively minimizing a loss function, Eq. (2), that measures the discrepancy between the modeled intensities I_p and the measured intensities J_p. Although the approach described in this paper is compatible with multislice ptychography, in light of the application to a 2D material we constrain ourselves to single-slice ptychography.
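The reconstruction loop can be sketched as follows. This is a toy sketch under stated assumptions, not the ROP algorithm [17]: a simple squared-intensity mismatch is assumed as the loss, probe shifts are whole pixels, and autograd replaces the optimized gradient computation of the real implementation.

```python
import torch

def reconstruct(J, probe, shifts, n_iter=5, lr=0.01):
    """Toy gradient-descent ptychographic reconstruction.

    J:      measured intensities, shape (P, N, N)
    probe:  complex illumination, shape (N, N)
    shifts: list of P integer pixel shifts (dy, dx) -- an assumption
    """
    N = probe.shape[0]
    V = torch.zeros(N, N, requires_grad=True)          # real potential to recover
    opt = torch.optim.SGD([V], lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        loss = 0.0
        for Jp, (dy, dx) in zip(J, shifts):
            t = torch.exp(1j * V)                      # strong phase object
            psi = torch.roll(probe, (dy, dx), dims=(0, 1)) * t
            I_model = torch.fft.fft2(psi, norm="ortho").abs() ** 2
            loss = loss + ((I_model - Jp) ** 2).mean() # assumed intensity loss
        loss.backward()
        opt.step()                                     # gradient descent step
    return V.detach()
```

In the paper the potential is taken at iteration 5 of a batched, optimized version of this kind of loop.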

B. Generation of scan sequences
We consider a recurrent neural network (RNN) [35][36][37] for the generation of scan sequences. Its network architecture is designed to model temporal sequences with recurring input information. Memory cells combine the current input information X_t with the hidden state H_t and map it to the next hidden state H_{t+1}. These hidden states represent the memory gathered from all the previous time steps. At every time step t, an output is generated on the basis of the current hidden state. In the implementation shown here, the output corresponds to a sub-sequence of scan positions, given by a vector of 2D coordinates R_{P_t}. In principle, the output could be reduced to a single scan position R_{p_t}, but we do not do so, for practical reasons that will be discussed later. The sub-sequence is predicted via a fully connected layer (FC) that is parameterized by the layer weights θ_H:

R_{P_{t+1}} = FC_{θ_H}(H_{t+1}).   (3)

At the predicted scan positions R_{P_t}, diffraction patterns J_{P_t} are acquired by the microscope, and from these diffraction patterns a potential V_t(r) is reconstructed by minimizing Eq. (2). The intermediate reconstruction V_t(r) combined with its corresponding sub-sequence of scan positions R_{P_t} can then be used for the input information X_t of the RNN. However, the bandwidth of the information given in V_t(r) and R_{P_t} differs strongly, and thus pre-processing is required before the two components can be concatenated and mapped to X_t. For the processed location information L_t based on the sub-sequence R_{P_t}, an FC that is parameterized by the weights θ_R is used:

L_t = FC_{θ_R}(R_{P_t}).   (4)

For the processed structure information C_t based on the reconstructed potential V_t(r), a compressed representation z_t is generated by using the encoder part of a convolutional autoencoder [38]. This processing step is described in more detail in appendix A.
The compressed representation z_t is then fed into an FC that is parameterized by the weights θ_z:

C_t = FC_{θ_z}(z_t).   (5)

The processed location information L_t is subsequently concatenated with the processed structure information C_t and mapped to the input information X_t with an FC that is parameterized by the weights θ_LC. The whole process of predicting sub-sequences of scan positions and acquiring the corresponding diffraction patterns is repeated until a ptychographic dataset of the desired size is reached.
In practice, and even after a reduction through adaptive scanning, several hundred to thousands of diffraction patterns are required for effective ptychography. Covering this range of scan positions with strong prediction performance requires efficient training of a large RNN. Backpropagation through time (BPTT) is typically used to generate the required gradients to update the network weights θ = {θ_H, θ_GRU, θ_LC, θ_R, θ_z} of the RNN. Its foundation on the chain rule, with terms being multiplied by themselves as many times as the length of the network, can result in problems with training efficiency. For even the most basic RNN architectures, BPTT fails for relatively short sequences due to the so-called vanishing or exploding gradient problem [39]. To circumvent this issue, a more complex RNN architecture was proposed by Hochreiter et al. [40]. The Long Short-Term Memory (LSTM) network uses a more complex mapping from the input information and hidden state to the output, which allows more efficient training using the BPTT method for larger networks. The gated recurrent unit (GRU) network, which is a computationally faster, simplified version of the LSTM network, is used in this paper [41]. A very large network would, nevertheless, be difficult to train using BPTT and would also greatly increase acquisition time in adaptive scanning due to, e.g., more frequent data transfer and generation of intermediate reconstructions V_t(r). Therefore, a sub-sequence of scan positions R_{P_t} is preferred over a single scan position R_{p_t} as the RNN output. Figure 2 shows the prediction process modeled by the RNN in full detail.
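The prediction step described in this section can be sketched as follows. This is a minimal sketch with illustrative layer sizes (the actual model uses 2 stacked GRU layers with hidden states of size 2048); the FC names mirror the weight sets θ_R, θ_z, θ_LC and θ_H, but the exact layer shapes and activations are assumptions.

```python
import torch
import torch.nn as nn

class ScanRNN(nn.Module):
    """Sketch of the sub-sequence prediction network."""
    def __init__(self, z_dim=32, sub_len=5, hidden=64):
        super().__init__()
        self.sub_len = sub_len
        self.fc_R = nn.Linear(2 * sub_len, hidden)   # location branch  (theta_R)
        self.fc_z = nn.Linear(z_dim, hidden)         # structure branch (theta_z)
        self.fc_LC = nn.Linear(2 * hidden, hidden)   # concat -> X_t    (theta_LC)
        self.gru = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.fc_H = nn.Linear(hidden, 2 * sub_len)   # hidden -> R_{P_{t+1}} (theta_H)

    def forward(self, R_sub, z, H=None):
        L = torch.relu(self.fc_R(R_sub.flatten(1)))  # processed locations L_t
        C = torch.relu(self.fc_z(z))                 # processed structure C_t
        X = torch.relu(self.fc_LC(torch.cat([L, C], dim=1)))  # hybrid input X_t
        out, H = self.gru(X.unsqueeze(1), H)         # recurrent state update
        R_next = torch.sigmoid(self.fc_H(out.squeeze(1)))
        return R_next.view(-1, self.sub_len, 2), H   # positions in [0, 1]^2
```

Each call consumes one sub-sequence and the compressed reconstruction, and emits the next sub-sequence plus the updated hidden state, which carries the history h_t.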

C. Training through reinforcement learning
An RNN, such as the one described in the previous section, can be combined with RL to provide a formalism for modelling behaviour to solve decision making problems. In RL, a learning agent interacts with an environment while trying to maximize a reward signal. This is generally formalized as a Markov decision process (MDP) described by a 5-tuple ⟨S, A, ρ, r, γ⟩. At each time step t, the agent has complete knowledge of the environment by observing the state s_t ∈ S and makes an optimal decision by selecting an action a_t ∈ A. Based on s_t and a_t, the next state s_{t+1} is generated according to a transition function ρ : S × A × S → [0, 1]. The agent additionally receives feedback through a scalar reward function r : S × A → R. This reward r contributes to the total reward computed at the end of the sequence, G = Σ_{t=0}^{T} γ^t r(a_t, s_t), also known as the return. The discount factor γ ∈ [0, 1] controls the emphasis of long-term rewards versus short-term rewards.
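The return defined above is straightforward to compute; a minimal sketch:

```python
def discounted_return(rewards, gamma):
    """Return G = sum_t gamma^t * r_t for a finite episode of rewards r_0..r_T."""
    G = 0.0
    for t, r in enumerate(rewards):
        G += (gamma ** t) * r
    return G
```

Note that γ = 0 keeps only the immediate reward, which is the myopic setting used later in this paper for fine-tuning.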
In the case of adaptive scanning in ptychography, complete knowledge of the specimen structure is not available, and the previously described formalism, where observations are equivalent to states, is not quite applicable. A partially observable Markov decision process (POMDP) generalizes the MDP to a 7-tuple ⟨S, A, ρ, r, O, ω, γ⟩ by considering the observation o_t ∈ O to contain only partial or incomplete information about the state s_t, generated according to an observation function ω : A × S × O → [0, 1]. Therefore, o_t cannot sufficiently represent the state s_t, and instead the entire history of observations and actions up to the current time, h_t = {o_1, a_1, ..., o_{t−1}, a_{t−1}, o_t}, is used as the basis for optimal or near-optimal decision making. A stochastic policy π_θ(a_t|h_t) maps the history of past interactions h_t to action probabilities. Given a continuous action space, the policy can be represented by a two-dimensional Gaussian probability distribution:

π_θ(a_t|h_t) = N(a_t; μ_θ(h_t), Σ),   (6)

with its mean vector μ_θ(h_t) corresponding to R_{p_t}, where the history h_t is summarized in the hidden state H_t of the RNN, and the covariance matrix Σ with fixed variances σ²_x ∈ [0, 1] and σ²_y ∈ [0, 1]. In this POMDP formalism, however, a single action a_t is drawn from the probability distribution π_θ(a_t|h_t), which corresponds to a single agent interacting with the environment. This is incompatible with scan control in ptychography, where we seek to predict multiple scan positions at each time step. A partially observable stochastic game (POSG) extends the POMDP formalism to an 8-tuple ⟨M, S, {A^m}_{m∈M}, ρ, {r^m}_{m∈M}, {O^m}_{m∈M}, ω, γ⟩, with multiple agents M, each selecting an action a^m_t and making an observation o^m_t given the state s_t. Thus, joint actions a_t = (a^1_t, ..., a^M_t) from the joint action space A = A^1 × ... × A^M are executed, and joint observations o_t = (o^1_t, ..., o^M_t) from the joint observation space O = O^1 × ... × O^M are received from the environment at each time step. In this case, the transition function is given by ρ : S × A × S → [0, 1], the observation function is given by ω : A × S × O → [0, 1], and each agent receives its immediate reward defined by the reward function r^m : S × A → R. Here, we consider the individual agent to have access to the actions and observations of all other agents, which allows the optimization of its individual policy π_{θ^m}(a^m_t|h_t) using the joint history of observations and actions h_t = {o_1, a_1, ..., o_{t−1}, a_{t−1}, o_t}. The joint policy of all agents is then defined as π_θ(a_t|h_t) = ∏_{m∈M} π_{θ^m}(a^m_t|h_t). The goal of RL is now to learn a joint policy that maximizes the expected total reward for each agent m with respect to its parameters θ^m:

J^m(θ) = E_{τ∼π_θ}[ Σ_{t=0}^{T} γ^t r^m(a_t, s_t) ],   (7)

where the expected total reward can be approximated by Monte Carlo sampling with N samples. In this paper, improvement of the policy is achieved by updating the policy parameters θ^m = {θ^m_H, θ_GRU, θ_LC, θ_R, θ_z} with 'REINFORCE' [42], a policy gradient method:

θ^m ← θ^m + α ∇_{θ^m} J^m(θ).   (8)

The derivation of ∇_{θ^m} J^m(θ) is given in appendix B.
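A sketch of the resulting gradient estimator for the fixed-variance Gaussian policy. The tensor layout (time × agents × 2 coordinates) and the per-agent return vector are illustrative assumptions; in practice the log-probabilities come from the RNN forward pass.

```python
import torch

def reinforce_loss(mu, actions, returns, sigma=0.0125):
    """Surrogate loss whose gradient is the REINFORCE estimator
    -E[G^m * grad log pi_theta^m] for a fixed-variance Gaussian policy.

    mu:      predicted means mu_theta(h_t), shape (T, M, 2)
    actions: sampled scan positions a_t^m,  shape (T, M, 2)
    returns: per-agent returns G^m,         shape (M,)
    """
    dist = torch.distributions.Normal(mu, sigma)
    logp = dist.log_prob(actions).sum(dim=(0, 2))  # sum over time and x/y -> (M,)
    return -(logp * returns).mean()                # ascend J by descending this loss
```

Minimizing this loss with gradient descent increases the log-probability of actions that earned high returns, which is exactly the policy gradient update of Eq. (8).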

D. Learning to adaptively scan in ptychography
While policy gradient methods are the preferred choice for solving reinforcement learning problems in which the action spaces are continuous [43], they come with significant problems. Like any gradient-based method, policy gradient solutions mainly converge to local, not global, optima [44]. In this paper, we reduce the effect of this problem during training by splitting the training of the RNN into supervised learning and RL. The first training step initializes the policy parameters such that the scan pattern follows a conventional grid pattern, thereby avoiding relatively poor local optima during subsequent policy gradient steps. This training step is explained in more detail in appendix C. In the second training step, the pre-trained policy is fine-tuned through RL, resulting in a scan pattern that has been adapted to the structure of the material.
A high variance of gradient estimates is another problem that particularly strongly affects the Monte Carlo policy gradient method [43,45,46]. Due to this, the sampling efficiency is relatively low, which causes slow convergence to a solution. This makes deep RL applied to ptychography challenging, as the image reconstruction itself requires iterative processing (see section II A).
The high variance can in part be attributed to the difficulty of assigning credit from the overall performance to an individual agent's action. This credit assignment problem is limited to a temporal problem in the single-agent RL case [47]. In this case, methods for variance reduction that better assign credit to individual actions are, for instance, reward-to-go [48] or the use of a baseline [44,45].
In RL involving multiple cooperating agents with a shared reward function r^1(a_t, s_t) = r^2(a_t, s_t) = ... = r^M(a_t, s_t), the challenge of overcoming the credit assignment problem further increases due to the necessity of now identifying the contribution of each agent's action to the total reward. This challenge can be tackled with difference rewards [49][50][51][52], which replace the shared reward with a shaped reward that is formed by comparing the global reward with the reward an agent would receive when performing a default action.
Following the idea of a difference reward in spirit, we introduce a way to estimate the reward function in order to tackle the credit assignment problem for adaptive scanning in ptychography. The reward function should naturally correspond to the quality of the ptychographic reconstruction. We have found empirically that a high reconstruction quality correlates positively with a high dynamic range in the phase. Therefore, the reward function could intuitively be formalized by r^m(a_t, s_t) = P^{−1} Σ_{r∈FOV} V(r), where P is the total number of scan positions. This formulation, however, does not solve the credit assignment problem and results in insufficient training performance, as shown in Figure 3a). To estimate the reward for the actions of each individual agent, we use a tessellation method that partitions the atomic potential into small segments. A Voronoi diagram [53], where each position corresponds to the seed of one Voronoi cell, enables assignment of only a part of the total phase to each position. More precisely, the Voronoi diagram formed by the predicted scan positions is overlaid with the corresponding ptychographic reconstruction at the end of the prediction process, and the summed phase within each Voronoi cell is the reward for that cell's seed position. The reward function can be expressed by r^m(a_t, s_t) = P^{−1} Σ_{r∈Cell_m} V(r). Figure 3b) shows a Voronoi diagram generated by predicted scan positions.
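The Voronoi-based reward can be sketched with a brute-force nearest-seed labelling (the paper uses the Jump Flooding Algorithm for this step; the pixel-coordinate convention below is an assumption):

```python
import numpy as np

def voronoi_rewards(positions, potential):
    """Per-position reward r^m = P^{-1} * sum_{r in Cell_m} V(r).

    positions: (P, 2) scan positions in pixel coordinates (Voronoi seeds)
    potential: (H, W) reconstructed phase image V(r)
    """
    H, W = potential.shape
    yy, xx = np.mgrid[0:H, 0:W]
    pix = np.stack([yy.ravel(), xx.ravel()], axis=1)          # all pixel coords
    d2 = ((pix[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    cell = d2.argmin(axis=1)                                  # Voronoi label per pixel
    P = len(positions)
    return np.array([potential.ravel()[cell == m].sum() / P for m in range(P)])
```

Because the cells partition the field of view, the individual rewards always sum to the global reward P^{-1} Σ_{r∈FOV} V(r), so the shaping redistributes rather than changes the total signal.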

E. Experiment and model design
For the experimental investigation, we acquired multiple ptychographic datasets from a monolayer molybdenum disulfide (MoS2) specimen with a NION HERMES microscope. The microscope was operated with a 60 kV acceleration voltage and a convergence angle of 33 mrad, and diffraction patterns with a pixel size of 0.84 mrad were acquired using a Dectris ELA direct electron detector mounted at the electron energy loss spectroscopy (EELS) camera port. Distortions induced by the EEL spectrometer were corrected with in-house developed software. For the ptychographic dataset acquisition, a conventional grid scan with a scanning step size of 0.02 nm was used. From the experimentally acquired datasets we created 175 smaller datasets, each with 10,000 diffraction patterns. The diffraction patterns were binned by a factor of 2 to 64 × 64 pixels. The adaptive scanning algorithm was then trained on the smaller datasets with the goal of predicting optimal scan sequences of 250 to 500 probe positions out of the possible 10,000, which corresponds to a dose reduction by a factor of 40 to 20. Each sub-sequence contains 50 to 100 positions, where the first sub-sequence follows a quasi-random Halton sequence.
The ptychographic reconstructions were performed with an optimized version of ROP [17] that allows simultaneous reconstruction from a batch of different datasets, which was required for efficient model training. A gradient descent step size α_ROP of 5.25e2 was chosen and the potential was retrieved at iteration 5. The reconstructed potential was 200 × 200 pixels with a pixel size of 0.0154 nm, for a field of view of 2 × 2 nm. For the generation of the reward function, Voronoi diagrams were generated with the Jump Flooding Algorithm [54], and the network models were implemented in PyTorch [55]. For the compression of structure information, we used a convolutional autoencoder whose encoder and decoder each consist of 6 convolutional layers with kernels of dimension 3, a stride of 1 and channel counts ranging from 16 to 512. The input of the autoencoder had a dimension of 512 with a pixel size of 0.0064 nm, and thus scaling and interpolation were required before the potential generated by ROP could be compressed. In addition, the value of the potential V_i at each pixel i was transformed to zero mean and unit variance. For the prediction of the scan sequences, pre-training and fine-tuning were performed with an RNN model composed of 2 stacked GRU layers with hidden states H_t of size 2048, the Adam optimizer [56] with a learning rate α_RNN of 1e-5 and a batch size of 24. For the fine-tuning, a policy with variances σ²_x = σ²_y = 0.0125² was chosen, and myopic behavior was enforced by setting the discount factor for the return, G, to γ = 0. All settings used for training the adaptive scanning algorithm are summarized in Table II.

III. RESULTS

A. Adaptive scanning on experimental MoS2 data
Figure 4 shows the result of adaptive scanning on experimentally acquired MoS2 data and compares it to the results of a random scanning and the conventional grid scanning procedure. The data used for the comparison was not part of the training data for the adaptive scanning model. While the full data set consisting of 10,000 diffraction patterns was used to obtain a ground truth reconstruction, only 250 diffraction patterns were used for the adaptive scanning as well as the random scanning reconstruction. Figure 4a) shows the ptychographic reconstruction when using a random scanning procedure. The structure of the material is not clearly resolved and large parts of the field of view are not covered by the scanning procedure. Figure 4c) shows the reconstruction when the scan positions are predicted by the adaptive scanning algorithm. The structure of the MoS2 material is now much better resolved and is closer to the ground truth reconstruction of the full-data grid scanning procedure, shown in Figure 4e). Figures 4b), d) and f) show the diffractograms of the corresponding reconstructions, and the circled diffraction spots show that the highest resolution of 1.08 Å is achieved by the adaptive scanning procedure, while the lowest resolution of 3.25 Å is obtained by the random scanning procedure. Further examples of reconstructions and their corresponding scan sequences are shown in Figure 8.
The results suggest that probe delocalization due to scattering plays an important role in why an improved ptychographic reconstruction can be achieved by distributing the scan positions predominantly on the atoms of the specimen. When the beam is positioned on an atom, it scatters to higher angles and thus experiences spatial delocalization. It therefore also probes a larger range in real space, i.e., the scattering includes the local environment of the atom the beam hits. This implies that similar results could be achieved by using RL with a reward function that specifically emphasizes the scattered electrons in the recorded diffraction patterns, which is an interesting area for future research.
The final part of our investigation into adaptive scanning in ptychography evaluates the performance of the method for various prediction settings. We compare the structural similarity index measure (SSIM) [57] between the reconstruction obtained from the reduced data and the ground truth reconstruction obtained from the full data to quantify the improvement when using adaptive scanning. Here, SSIM_a and SSIM_r are the SSIM using a reconstruction of reduced data obtained with the adaptive scanning and the random scanning procedure, respectively. Table I shows the relative reconstruction quality improvement Q_SSIM = (SSIM_a − SSIM_r)/SSIM_r for different experimental settings, averaged over 25 data sets. Additionally, the relative resolution improvement Q_res averaged over the same datasets is given. In the case of 250 scan positions, which corresponds to a dose reduction by a factor of 40 with respect to the original data, tests were performed for multiple sub-sequences, i.e. predictions. The quality improvement ranges from 16.71% to 25.75% and the resolution improvement ranges from 9.74% to 27.57% for 2 to 5 sub-sequences (corresponding to 1 to 4 predictions), respectively. Further tests were performed using a larger number of total scan positions and 5 sub-sequences. However, while the relative resolution slightly improves for an increasing number of scan positions, the difference in quality between the reconstruction generated with the positions of the adaptive scan and that of the random scan decreases with the total number of positions used, as can be expected, since the random sampling covers the sampled area in an increasingly complete manner. It should be noted that for all tests which used adaptive scanning with 5 sub-sequences, a higher resolution was achieved than that of the reconstruction of the full data set. These results indicate that the reconstruction quality and resolution improve with the frequency at which the positions are predicted, and that low-dose experiments benefit the most from the adaptive scanning scheme.
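The relative improvement metric used in this comparison is a simple ratio; a minimal sketch (reported as a percentage, matching how the values are quoted in the text):

```python
def relative_improvement(metric_a, metric_r):
    """Relative improvement, e.g. Q_SSIM = (SSIM_a - SSIM_r) / SSIM_r,
    of an adaptive-scan metric over the random-scan baseline, in percent."""
    return 100.0 * (metric_a - metric_r) / metric_r
```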

IV. CONCLUSION
In this paper we present a method for electron ptychography that reduces the electron dose through adaptive scanning. It is based on the idea that ptychography requires an increased information redundancy through overlapping illuminating beams only at regions of the sample that contain atomic structure. The prediction algorithm is a mixture of deep learning models trained using supervised and reinforcement learning.
We show improved reconstruction quality and resolution when using an adaptive scanning approach on experimentally acquired monolayer MoS2 datasets in comparison to another dose reduction scanning approach. In a low-dose experiment the adaptive scanning procedure improved the reconstruction quality on average by up to 25.75% and the resolution by up to 31.59%. The resolution achieved by adaptive scanning was also higher than that of the reconstruction from the full data set, but without the same homogeneous reconstruction quality throughout the entire field of view.
These improvements show that adaptive scanning for ptychography is a useful technique to lower the dose needed for the analysis of sensitive samples. In addition, the proposed algorithm can be taken as a blueprint for a broad range of scanning-based microscopy methods and thus paves the way for future research in machine learning supported, automated and autonomous microscopy.

Appendix A: Compression of structure information

After estimating the network weights φ_e and φ_d by minimizing the loss function

L(φ_e, φ_d) = Σ_i ( V_i − D_{φ_d}(E_{φ_e}(V))_i )²,   (A1)

we can utilize the encoder network E_{φ_e} for the compression of V_t(r). Figure 6 shows a compression of a partial reconstruction V_t(r) and the decompression of its corresponding compressed representation z_t. This pre-processing helps the algorithm form the hybrid input information X_t by reducing the size of the structure input information, but also helps it handle reconstructions from experimentally acquired data that may suffer from noise, contamination and/or incorrect scan positions.

Appendix B: The "REINFORCE" algorithm

In the case of the multi-agent RL problem, where we use the POSG formalism, the objective of an agent m given by Eq. (7) can be expressed by

J^m(θ) = E_{τ∼π_θ(τ)}[ Σ_{t=0}^{T} γ^t r^m(a_t, s_t) ],

with the trajectory τ = {s_0, o_0, a_0, s_1, ..., s_T, o_T, a_T} and the policy induced trajectory distribution π_θ(τ) = q(s_0) ∏_{t=0}^{T} ρ(s_{t+1}|s_t, a_t) π_θ(a_t|h_t) ω(o_t|s_t), where q(s_0) is the distribution of initial states. Applying the gradient ∇_{θ^m} to the objective and using the identity ∇_θ π_θ(τ) = π_θ(τ) ∇_θ log π_θ(τ), we obtain

∇_{θ^m} J^m(θ) = E_{τ∼π_θ(τ)}[ ( Σ_{t=0}^{T} ∇_{θ^m} log π_{θ^m}(a^m_t|h_t) ) G^m ].

Appendix C: Pre-training of the policy

While training in RL can be performed with a policy whose parameters are arbitrarily initialized, this is not ideal. Having an adequate initial guess of the policy and using RL subsequently only to fine-tune the policy is a much easier problem to solve. A quasi-random Halton sequence [60] with equally spaced probe positions is a reasonable initialization. Pre-training of the parameterized policy for the RL model can then be performed by supervised learning applied to the RNN, such that the discrepancy between the predicted scan positions R_{P_t} = μ_θ(h_t) and the scan positions of the initialization sequence R^init_{P_t} is minimized:

L(θ) = Σ_t || μ_θ(h_t) − R^init_{P_t} ||².   (C1)

Figure 7 illustrates the scan positions during the fine-tuning of the policy through RL for the first 10,000 iterations when either a) a policy that has not been initialized via supervised learning or b) an initialized policy is used. While the scan positions in both cases converge to the atomic structure, the positions predicted by the non-initialized policy are distributed only within a small region of the field of view during the entire training.

FIG. 1. Experimental setup in ptychography. At the scan position R_p of the scan sequence, the beam illuminates a sample, where the incident electron wave ψ_p^in(r − R_p) interacts with the transmission function t(r). The wave exiting the sample is propagated by a Fourier transform to the detector located in the far field, and the intensity I_p = |Ψ_p^ex(k)|² is recorded.

FIG. 2. Schematic of the forward propagation process of the RNN model. The RNN consists of GRU units that use the hidden state H_t from the previous time step and the hybrid input information X_t to create a new hidden state H_{t+1}. The hybrid input is the concatenation of the pre-processed information from the sub-sequence of scan positions R_{P_t} and the corresponding compressed representation of the partial reconstruction z_t. The output of the GRU cell is used to predict the positions of the next sub-sequence R_{P_{t+1}} and is also used as the input for the next GRU cell. The process is repeated until the full length of the scan sequence, consisting of T sub-sequences, is reached.

FIG. 3. a) Learning curves of RL with multiple agents that use a shared reward or a shaped reward, illustrated in orange and blue, respectively. b) A Voronoi diagram is used to assign a unique reward to each scan position of the predicted sequence. The scan positions are shown as red dots, where the first 50 positions are distributed on the right side within the dark blue area. For visualization purposes, the ground truth reconstruction is included in the diagram.

FIG. 4. Ptychographic reconstructions of a MoS2 data set with different scanning procedures. a) Reconstruction from 250 diffraction patterns of the data set that correspond to scan positions which follow a random sequence and c) an adaptively predicted sequence. e) Reconstruction of the full data set with 10,000 diffraction patterns acquired with the conventional grid scan. b), d) and f) Corresponding diffractograms of the reconstructions. The real-space distance of the circled diffraction spots is labeled.

FIG. 5. Schematic of the convolutional autoencoder model. A reconstruction V generated from diffraction patterns is mapped to the compressed representation z by the encoder network E_{φ_e}(V). The compressed representation z is then the basis for a reverse mapping by the decoder network D_{φ_d}(z) to generate a prediction of the potential V̂.

FIG. 6. Convolutional autoencoder applied to partial structure information, given by the reconstruction of data from a sub-sequence of scan positions. a) The reconstruction V_t(r) from a sub-sequence of scan positions that is used as input for the convolutional autoencoder. b)-e) 4 channels of the compressed representation z_t of the structure information. f) Decoded structure information V̂_t from z_t.

FIG. 7. Fine-tuning of a policy with RL that a) has not been initialized and b) has been initialized via supervised learning. Positions A indicate the scan positions of the first sub-sequence R_{P_0} that is provided to the RNN as part of the initial input. Positions B and C are the scan positions of all predicted sub-sequences at iteration 0 and 10,000, respectively. The trajectories they form during the optimization process are indicated by dashed blue lines.

FIG. 8. Ptychographic reconstructions of different MoS2 data sets with different scanning procedures. Reconstruction from 250 diffraction patterns of a data set that correspond to scan positions which follow a)-d) a random sequence and e)-h) an adaptively predicted sequence. The ground truth reconstruction of the full data set with 10,000 diffraction patterns is shown with the scan positions used for the corresponding reconstructions, a)-d) in green and e)-h) in red.

TABLE I. Performance of adaptive scanning for various experimental settings that differ in the number of scan positions and the total number of sub-sequences. For each setting, the oversampling ratio N_k/N_u, which is calculated following [17], and the electron dose are given.

# Pos. | N_k/N_u | Dose (e⁻ Å⁻²) | # Sub-seq.