Differential programming enabled functional imaging with Lorentz transmission electron microscopy

Lorentz transmission electron microscopy is an advanced characterization technique that enables the simultaneous imaging of both the microstructure and functional properties of materials. Information such as magnetization and electric potentials is carried by the phase of the electron wave, and is lost during image acquisition. Various methods have been proposed to retrieve the phase of the electron wavefunction using intensities of the acquired images, most of which work only in the small defocus limit. Imaging at strong defoci not only carries more quantitative phase information, but is essential to the study of weak magnetic and electrostatic fields at the nanoscale. In this work we develop a method based on differentiable programming to solve the inverse problem of phase retrieval. We show that our method maintains a high spatial resolution and robustness against noise even at the upper defocus limit of the microscope. More importantly, our proposed method can go beyond recovering just the phase information. We demonstrate this by retrieving the electron-optical parameters of the contrast transfer function alongside the electron exit wavefunction.


INTRODUCTION
The design and development of new and improved engineered materials requires a fundamental understanding of the structureproperty relationship at the nanoscale. For example, magnetic materials which host novel topological excitations such as skyrmions are of great interest to not only fundamental physics but also technological applications [1][2][3][4] . The quest to understand how these excitations interact locally with material inhomogeneities requires the capability to map the weak magnetization with both high spatial resolution and high phase accuracy. Similarly, grain boundaries in solid electrolytes for fuel cells and solid state batteries have a huge impact on their charge transport behavior [5][6][7][8] . The ability to quantify the often weak local electrostatic potential, along with the microstructure and the composition is crucial to the controlling and engineering of desired properties in these materials. Last but not least, 2D electron gas at the interface of heterostructures plays an important role in the emergence of unique properties such as interfacial magnetism, and high interfacial conductivity 9,10 . Their functional characterization also requires high spatial resolution and high phase sensitivity owing to their confined nature.
Despite being a relatively old technique, Lorentz transmission electron microscopy (LTEM) has seen major developments in recent years thanks to its ability to perform correlative microstructural and functional imaging at the nanoscale [11][12][13][14][15][16][17] . Quantitative information about the functional properties such as magnetization and electrostatic potentials is carried by the phase shift of the electrons, and is described by the Aharanov-Bohm relation 18 . This phase information is however lost during image acquisition as the recorded intensity is merely the squared amplitude of the electron exit wavefunction. Quantitative evaluation of the functional properties thus necessitates solving for the phase of the electron wave. The process of obtaining phase information from measured intensities is the basis for a variety of coherent imaging techniques in x-ray (known as phase retrieval) and electron microscopy (known as wave function reconstruction). In LTEM, methods based on the transport of intensity equation (TIE) 19 and off-axis electron holography are commonly used, each having its own merits and limitations 20,21 . The TIE approach is experimentally easy to implement, but the result often suffers from low spatial resolution due to the extent of defocusing required. Off-axis holography on the other hand has high sensitivity, and high spatial resolution [22][23][24] , but comes with additional requirements such as the need for a reference electron wave, and a special electron-optical setup. Iterative methods such as the Gerchberg-Saxton (GS) algorithm [25][26][27][28][29][30] and the maximumlikelihood optimization 31,32 have also been proposed. These methods are most commonly used in high resolution TEM, and for retrieving electrostatic phase shifts. More recent development on 4D-STEM have shown promising results on achieving extremely high spatial resolution 33 as well as magnetic phase shift retrieval 34 , but the long measurement time makes them impractical for in situ experiments.
Recent years have seen a tremendous increase in the application of machine-learning methods to determine the structure-property relationship in materials characterization [35][36][37][38][39] . For electron microscopy, the primary focus has been on the analysis of high resolution images to determine atomic positions, or on the processing of large datasets for electron diffraction to infer physical properties of the materials [40][41][42][43][44][45][46] . In this work, we develop a method based on reverse-mode automatic differentiation (AD) to solve the inverse problem of phase retrieval in LTEM. AD has become the de facto means of training a neural network 47 thanks to the flexible interfaces developed and supported by large technology corporations 48,49 . Phase retrieval with AD was first proposed 50 and later demonstrated with x-ray coherent diffraction imaging 51,52 . By applying this method to LTEM, we demonstrate functional imaging with simultaneously high spatial resolution and high phase accuracy. We use large defocus conditions to increase the phase sensitivity of our method, which is a common practice for imaging weak magnetic and electrostatic fields 53,54 . We show that under these conditions, our method outperforms the TIE method in terms of spatial resolution and the GS method in terms of robustness to noise. The high noise tolerance and improved phase sensitivity should permit significant reduction on the acquisition time, which may prove useful on beam sensitive samples or in in situ experiments. More importantly, our method can easily be extended to recover any information implicitly contained in the measured intensity. We demonstrate this through the simultaneous retrieval of the convergence semiangle and the correct defocus values alongside the exit wavefunction.

RESULTS
Phase retrieval using differentiable programming The process of applying AD for phase retrieval in LTEM is shown in Fig. 1. We used the modified electron wave function to describe the electron-sample interaction. The exit wavefunction is calculated as ψðr ? Þ ¼ aðr ? Þe iφðr?Þ , where a is the amplitude, φ is the phase and r ⊥ is a radial vector in the direction perpendicular to the electron propagation direction taken to be along z 55 . The phase shift, φ(r ⊥ ), accounts for contributions from both the mean inner potential, and the magnetization of the sample. The mean inner potential can be considered as a result of forward elastically scattered electrons and approximated as the zero-order term of the Fourier expansion of the crystal lattice potential 56 . We initialize the guess amplitude as the square root of the intensity of the infocus image and keep it fixed at the beginning of the phase retrieval process. (For reasons why this is a good estimation of the true amplitude, the reader is referred to the Supplementary Notes.) The starting guess φ 0 for the phase is set to a constant. For each iteration, the wavefunction at the image plane is calculated by the convolution of the exit wavefunction with the microscope contrast transfer function (TF). The latter includes both the phase transfer function and the damping envelope. The linear image formation model employed in this work is generally valid within the scope of LTEM, even in the presence of a strong phase object such as a magnetic sample. Because the scattering angle of Lorentz deflection is small, about 1000 times smaller than the angle of the diffracted electron beam, the spatial frequencies relevant to the magnetic signal are also small. The phase information for such strong phase objects is thus primarily carried by the lowest order terms in the phase transfer function 28 . Finally, the difference between the absolute of the calculated image wavefunction and the square root of the measured intensity of the Fresnel images is used to compute the loss function. The concept of phase retrieval with AD is analogous to the training of a neural network. In both cases, the gradients are calculated by backpropagating the loss through the network in what is known as reverse-mode automatic differentiation. The guess phase and amplitude are then updated iteratively in steps proportional to the negative of the gradients. The phase retrieval process is considered complete if the improvement of the loss function is smaller than a pre-defined value. For more detailed description, the reader is referred to the "Methods" section. Next, we shall demonstrate the advantages of AD over other conventional methods using simulated and experimental Fresnel images at different defocus values.
Higher spatial resolution Figure 2 shows the comparison of phase retrieved with various methods using simulated data at large defocus values. Figure 2a shows the ground truth phase used for creating the simulated LTEM dataset (Supplementary Fig. 1). The knowledge of the ground truth is essential for evaluating the accuracy and spatial resolution of the retrieved phase, which is described in the "Methods" section. State-of-the-art TIE 57,58 formalism uses the differentiation of Fresnel images to numerically approximate the longitudinal intensity derivatives. The phase is then retrieved by solving the partial differential equation using either the Fourier transform method 59 or the conjugate gradient method 60 . TIE solved in this way is strictly valid only in the small defocus limit. At large defocus values, the intensity no longer varies linearly with Δf 61 , and exact solution of the partial differential equation is required using for instance multi-grid numerical integration 25 . The biggest hurdle that limits the application of TIE at large defocus values is its reduced spatial resolution 62 . This is illustrated in Fig. 2c using the most simple scenario consisting of one over-focused and one under-focused image (Δf = ±1.6 mm). Despite using noise-free images, the achievable accuracy was only at 90%. The retrieved phase appeared to be blurry, which was reflective of the estimated spatial resolution of 160 nm (Fig. 2b).
We then performed AD phase retrieval on the same dataset. At large defocus values, AD (Fig. 2d) vastly outperformed TIE by simultaneously showing an extremely high level of accuracy (99%) and high spatial resolution (5 nm). We also compared our result with phase retrieved using the Gerchberg-Saxton iterative algorithm. Details about the GS algorithm implemented for this work can be found in the "Methods" section. Figure 2e shows that the GS retrieved phase on the same dataset did not converge to the ground truth. This is understood as the GS method by design only works with images of weak defocus values (|Δf|~1 μm) 26 , because the back-projection involves division by the spatial coherence envelope which, at large defocus values, vanishes for large spatial frequencies. Although a modification was proposed to improve the numerical stability of the GS method 27 , that too was limited to only moderately defocused images (|Δf| < 100 μm). At large defocus values (|Δf| > 1 mm), information at large spatial frequencies cannot be calculated correctly, which explains the failed phase retrieval as shown in Fig. 2e. For comparison, Supplementary Fig. 2e shows the GS retrieved phase on two images at Δf = ±0.15 mm. As expected, the modified GS method is numerically stable in the moderately defocused regime. The quality of the retrieved phase ( Supplementary Fig. 2e) was in fact identical to that of the AD method ( Supplementary Fig. 2d). It is worth pointing out that, at moderate defoci, none of the methods was able to correctly retrieve the true phase ( Supplementary Fig.  2b). This is explained by the lower phase sensitivity at low spatial frequencies, and shows again the potential benefit of working at large defocus values.
Increased robustness to noise Next, we demonstrate how high noise tolerance can be achieved with AD at large defocus values. We choose to show the gradient of the phase to highlight the effectiveness of our strategy. Compared to the phase itself, the gradient of the phase is a better representation of the functional properties such as the direction of magnetic induction or local electric fields. Figure 3a shows the ground truth for the phase gradient, with its direction and its magnitude respectively indicated by false color and grayscale contour. Optimization based methods such as AD often suffer from numerical instability when working with noisy images at large defocus values. This again can be explained by the damping envelope which reduces the amount of information transferred, and as a result, the numerical constraint at large spatial frequencies. Continuous error reduction would force AD to fit to the noise at these frequencies, thus producing the grainy phase images as shown in Fig. 3b.
There are in principle two approaches to resolve the issue of numerical stability, the first being the implementation of regularizations 63 . In the case of AD, this is more commonly achieved by adding a total variation (TV) regularizer in the loss function. While a TV regularizer is known to be capable of preserving edges (i.e., high spatial frequency information) and improves noise tolerance, its effectiveness is highly dependent on the choice of the regularization parameter. Choosing the right value for the regularization parameter was essential to achieving the high accuracy (96.02%) shown in Fig. 3c, whereas using a larger value resulted in the loss of low spatial frequency information as shown in Supplementary Fig. 3c.
To improve the noise tolerance at large defocus values, we explore a strategy that leverages the flexibility of the AD method to work with any number of through focus images. More specifically, the images at moderate defocus carry sufficient information at large spatial frequencies, and could be used to improve the numerical stability of AD at large defocus values. Figure 3d shows the AD retrieved phase after adding two images at moderate defocus to the two images at large defocus, with a final accuracy of 96.93%. To ensure a fair comparison, the noise level of the images was raised to 15%. This is to account for the factor of ffiffi ffi 2 p shot noise variation when the exposure time of each image is halved and the total exposure time remains unchanged. It became evident that the retrieved phase with the mixed focus series (Fig. 3d) is actually a more faithful reconstruction of the ground truth (Fig. 3a) than the one retrieved with TV regularization (Fig. 3c). To test the limit of the noise tolerance, the level of the Gaussian noise was further increased to 30%. Using the same mixed focus condition, the AD process remained convergent to the truth (Fig. 3e). Despite the lower accuracy, information of both high and low spatial frequencies was correctly retrieved as evidenced by the phase contour near the edge and at the center of the nanostructures. Compared to AD, the GS method is even more unstable in the presence of noise at large defocus values, due to reasons stated in the previous section. While the idea of mixed focus series also improves the noise tolerance of GS phase retrieval, the achievable accuracy is on average about 10% lower than AD under the same conditions ( Supplementary Fig. 4). Figure  3f shows the GS retrieved phase using the mixed focus series with 15% of Gaussian noise. Distortions of the phase were clearly visible in the rectangular shaped nanostructures, and the final accuracy was only 87.57%. Demonstration of partial recovery of the transfer function The most exciting aspect of the AD method is that it can easily be extended to retrieve any information implicitly contained in the measured intensity, regardless of the complexity of the forward model. This is particularly interesting as the accuracy of both the AD and GS method hinges on the precise knowledge of the transfer function. In fact, the phase retrieval process may not even be successful if one or more parameters in the TF is incorrect. The ability to retrieve those parameters alongside the phase is thus crucial to the applicability of our proposed strategy. To illustrate this, we demonstrate simultaneous recovery of the beam convergence semi-angle and defocus values embedded in the normal phase retrieval process. We used the same dataset as described in the previous section, which consisted of four images in a mixed focus series (Δf ± 0.15 and ±1.6 mm) with 15% of Gaussian noise. The starting guess for the convergence angle was 0.1 mrad which is 10 times larger than the true convergence (0.01 mrad) used to produce the simulated data. We picked the convergence angle in this example as it is hard to determine in practice, and has major impact on the image quality and contrast. In addition, we have offset the starting guess of all the Δf by 10 μm, to emulate uncertainties on the knowledge of the exact defocus values. We note that the added uncertainty represents less than 1% of the largest defocus value used (1.6 mm) in this scenario.
For the partial recovery of the transfer function, essentially the same strategy was used as described in Fig. 1. In addition to the phase, the convergence semi-angle and defocus values in the transfer function were also updated iteratively in steps proportional to the negative of their respective gradients. We began TF optimization after 50 iterations of pure phase retrieval. Figure 4a shows the accuracy evolution with (black line) and without (red line) optimizing the TF. The two initially overlap with each other during the first 50 iterations of pure phase retrieval. With TF optimization, the accuracy of the retrieved phase quickly Fig. 4 AD phase retrieval with partial recovery of the transfer function. a Evolution of the accuracy during AD phase retrieval with (black) and without (red) TF optimization. b Evolution of the recovered convergence semi-angle (black) and offset of the defocus values (red) during AD phase retrieval with TF optimization. The dashed lines mark respectively the ground truth for the two parameters. Fig. 3 Comparative performance in the presence of simulated noise. a Ground truth for the phase gradient. Its direction is shown in false color as defined by the color wheel in the inset. Its magnitude is indicated by the gray contour which appears each time the phase wraps over one tenth of 2π. The scale bar is 500 nm. AD retrieved phase gradient using an image pair with 10% of Gaussian noise at Δf = ±1.6 mm, (b) without and (c) with a TV regularizer. The regularization parameter was 4 × 10 −8 . AD retrieved phase gradient using a mixed focus series consisting of four images (Δf = ±0.15 and ±1.6 mm) with (d) 15% and (e) 30% of Gaussian noise. f GS retrieved phase gradient using the same condition as (d). The numbers in the brackets denotes the accuracy of the retrieved phases. The scale bar and the color wheel apply to all the images.
T. Zhou et al.
improved, reaching 95.57% at the end of 1000 iterations. Without TF optimization and with the wrong values for the convergence and defocus, the accuracy evolution staggered at around 60% and the retrieved phase gradually diverged from the true phase with further iterations. Figure 4b shows the evolution of the recovered values for the divergence semi-angle and defocus offset. Both values converged to the vicinity of their respective truth, with a small discrepancy due entirely to the noise in the simulated images. The result thus indicates that even in the presence of 15% of Gaussian noise, the current strategy is capable of correcting errors in the convergence semi-angle with an uncertainty of better than 0.5 μrad and errors in the defocus values with an uncertainty of better than 0.5 μm.
Application to experimental data Finally, we demonstrate the viability of the proposed phase retrieval strategy on experimental LTEM images. The imaging conditions and information about the sample can be found in the "Methods" section. We choose specifically, for the purpose of verification, nanostructures with known magnetic configuration. TIE phase retrieval were performed on two images at Δf = ±1.44 mm with 2 s of exposure per image. AD and GS phase retrieval was performed on a mixed focus series of four images at Δf = ±0.49 mm and ±1 mm with 1 s of exposure per image. The total counting time was thus 4 s in both cases, to ensure a fair comparison under the same electron dose conditions. The convergence semi-angle was also retrieved with the AD method. With a starting guess of 0.1 mrad, its value quickly converged to 0.02 mrad after about 100 iterations (Supplementary Fig. 5a). With a fixed and erroneous value of 0.1 mrad, the error reduction of both AD and GS staggered (Supplementary Fig. 5b) before any meaningful results could be produced. Concurrent phase and amplitude optimization (Supplementary Fig. 5c) was enabled near the end of the process and the retrieved amplitude (Supplementary Fig. 5d) was found to be close to the guess amplitude. Figure  5a, b shows respectively the TIE and AD retrieved phase. A strong phase variation spanning over 13 rad was observed in both cases from the edge to the center of the largest nanostructure. As expected, the AD result appears to be sharper thanks to its inherent high spatial resolution. Figure 5c and d shows respectively the TIE and AD retrieved phase gradient. Line profiles were extracted across the center of the largest nanostructure. Once again, AD (Fig. 5e, red line) outperformed TIE (black line) in retrieving the sharp variations of the phase gradient expected at the edges of the nanostructures. Elsewhere on the extracted line profiles, the two methods agree extremely well with each other, indicating that the accuracy of the AD method on the experimental data is at least equal to, if not better than that of TIE, under the same electron dose conditions. Energy dispersive x-ray spectroscopy and atomic force microscopy measurements were also performed on this sample. Based on the measured thickness and composition, and using the mean inner potential and saturation magnetization for Permalloy, we have calculated the expected phase shift and found good agreement with the experimentally retrieved ones (Supplementary Fig. 6).

DISCUSSION
In this work, we demonstrate phase retrieval in Lorentz TEM using automatic differentiation. The main advantage of our approach is its outstanding performance when working with Fresnel images of large defocus values (|Δf| > 1 mm). There are many reasons for which large defocus values may be desired. Functional properties such as the magnetic or electrostatic field within the sample typically has a slow varying component, and are carried primarily by the very low spatial frequency part of the contrast transfer function. As can be seen from the comparison between Fig. 2b and Supplementary Fig. 2b, these low spatial frequency information cannot be fully extracted at moderate defocus values (|Δf|1 00 μm). Imaging at large defocus values thus allows either more quantitative analysis of weak magnetic or electrostatic field, or similar analysis at a reduced electron dose. The latter is particularly appealing to in situ studies. Among the common methods used for phase retrieval, neither TIE nor GS work well near the upper limit of the microscope defocus. Conventional TIE cannot recover information beyond the point resolution and produces blurry phase images at large defocus values. Both multi-grid TIE and GS can in theory produce sharp phase images, but are numerically unstable at large defocus due to division (back-projection) by the vanishing damping envelope at large spatial frequencies. The maximum defocus value at which GS is stable depends on the convergence semi-angle as both parameters contribute to the angular spread of the source. In our simulated environment, a convergence angle of 10 μrad allows for a maximum defocus on the order of 100 μm. For GS to be stable at |Δf| = 1 mm would require a convergence semi-angle of 1 μrad.
Because the phase is updated through back-propagation of the gradient rather than back-projection of the wavefunction, our AD approach is in principle stable at large defocus values, capable of retrieving phase with both high accuracy and high spatial resolution (Fig. 2d). However, as an optimization method, AD has a tendency to fit to the noise at places with weak numerical constraint. At large defocus values, the numerical constraint is particularly weak at large spatial frequencies due to the vanishing damping envelope function. To improve the noise tolerance, we proposed the use of mixed focus series consisting of two images at large defocus and two images at moderate defocus. We show that under the same conditions, AD out-performs GS in terms of accuracy of the retrieved phase ( Supplementary Fig. 4), and is stable against up to 30% of Gaussian noise (Fig. 3e). The excellent noise tolerance is another reason why our proposed method may prove useful on beam sensitive samples or in in situ studies.
The most exciting feature of the AD formalism is perhaps the possibility to retrieve any information implicitly contained in the measured intensity. We demonstrate that by simultaneously recovering the convergence semi-angle and an intentional defocus offset alongside the phase image. The convergence semi-angle is experimentally hard to determine, and while its knowledge is not required in conventional TIE formalism, it is of critical importance to the AD and GS methods. An incorrect convergence semi-angle not only limits the maximum accuracy of the retrieved phase, but may also disrupt completely the iterative error reduction process (Fig. 4a). In a similar manner, uncertainty on the defocus values may affect the accuracy of the TIE retrieved phase. For those reasons, the possibility to recover any parameters in the contrast transfer function is a feature that may have profound impact on quantitative analysis in LTEM. Finally, we would like to point out that despite being demonstrated primarily at large defocus values, the proposed method works equally well under moderate defocus conditions. In fact, when only moderately defocused images are used, AD behaves very much like GS, sharing the same advantage (high noise tolerance) and disadvantage (slow in retrieving low spatial frequency information) as compared to conventional TIE.

Gradient descent optimization with automatic differentiation
We use the Adam optimizer 64 implemented in Google's Tensorflow package 48 for the gradient descent optimization. The initial learning rate is set to 1. Before computing the amplitude guess, the in-focus image was denoised with a total variation (TV) filter. For simulated data, the weight of the TV filter is set to the level of the Gaussian noise. For experimental data, it is set to the standard deviation measured on the background area. No denoising process was performed on any of the defocus images. The starting guess for the phase is 0.5 everywhere, though any other number or a random 2D array works just as well. Mean Squared Error was used as the loss function, calculated as the difference between the calculated amplitude at the image planes and the measured amplitude of the Fresnel images, squared and averaged for all the pixels. The phase retrieval process is run on a remote Nvidia Tesla T4 GPU hosted on Google's Colaboratory. The amount of time per iteration depends on the number of defocus images used, and is typically 1 min per 10000 iterations.

Gerchberg-Saxton algorithm with weighted back-projection
For the Gerchberg-Saxton algorithm we adopt the modification proposed by Bhattacharyya et al. 27 . Similar to the AD approach, we calculate the exit wavefunction as ψðr ? ; zÞ ¼ aðr ? ; zÞe iφðr?;zÞ . The same initial guess as in AD was used for the phase. For each iteration, the wavefunctions at the image plane were calculated by the convolution of the exit wavefunction with the microscope contrast transfer function. The latter again includes both the phase transfer function and the damping envelope. We then replace the amplitude of the image wavefunctions by the square root of the intensity of their corresponding Fresnel images. A new estimate for the phase was obtained by averaging the back-projected wavefunctions to the nominal exit plane (|Δf| = 0), weighted by their corresponding damping envelope. The multiplication by the damping envelope as the weight conveniently canceled out the division of the damping envelope in the back-projection, thus significantly improving the numerical stability of the method at moderate defocus values. Finally, the difference between the absolute of the calculated image wavefunction and the square root of the measured intensity of the Fresnel images is used to calculate the loss function. The phase retrieval process is considered complete if the improvement of the loss function is smaller than a pre-defined value.

Phase retrieval using transport of intensity equation
The phase was retrieved by solving the transport of intensity equation: where I(r ⊥ ) represents the image intensity observed at a given focus z, φ is the phase of the electron wave and λ is the electron wavelength (2.508 pm for 200 kV electrons). The inverse Laplacian method using Fourier transforms was used to solve the TIE to retrieve phase shift 65 and is given as: where ∇ À2 ? is the inverse Laplacian operator. The image intensity derivative with respect to z on the right hand side of the equation was calculated using central difference method. Image symmetrization was used to avoid any errors due to periodic boundary conditions 66 . The phase recovery was performed using PyLorentz, an open-source software for TIE-based phase reconstruction 67,68 . Accuracy and spatial resolution of the retrieved phase The use of the simulated datasets allowed us to evaluate the accuracy of the retrieved phase, calculated as the cross correlation between the ground truth phase and the retrieved phase.
The spatial resolution of the retrieved phase is estimated by fitting a Gaussian function to the sharp variation of the phase gradient at the edges of the nanostructures. The spatial resolution is then taken as the FWHM of the Gaussian distribution. Because an error function can be considered as twice the integral of a normalized Gaussian function. This is equivalent to fitting an error function to the sharp variation of the phase at the same edges (Fig. 2b)

Simulated dataset
Simulated LTEM images ( Supplementary Fig. 1 where σ is the interaction constant for a TEM and depends on the accelerating voltage (σ = 0.00728 V.nm −1 for 200 kV electrons), ℏ is the reduced Planck's constant and the integration is carried out along the direction of propagation of the electrons. Since the composition of the Permalloy films was taken to be uniform, the first term was approximated as σV 0 t(r ⊥ ), where V 0 = 26 V is the mean inner potential of Permalloy, and t (r ⊥ ) is the thickness of the sample. The amplitude for the simulated images was calculated as aðr ? Þ ¼ expðÀtðr ? Þ=ξ 0 Þ, where ξ 0 is the absorption length of electrons for Permalloy. In the case of thin specimens, ξ 0 mainly accounts for the loss of electrons as they are scattered outside the acceptance angle of the TEM projection lenses. The LTEM images were then computed using the forward model described previously. Each LTEM image is composed of 512 × 512 pixels of 5 × 5 nm. Gaussian noise was added to the images after they are generated. The level of the Gaussian noise refers to the standard deviation of the distribution.

Experimental dataset
Experimental images (512 × 512 pixels) were taken using an aberrationcorrected JEOL 2100F Lorentz TEM operating at 200 kV. The sample consists of 10 nm thick of Permalloy magnetic nanostructures, sputterdeposited on a TEM grid and patterned with e-beam lithography. The pixel size was 6.9 nm.

Microscope transfer function
In order to calculate the image intensity, we propagate the electron exit wavefunction to the image plane by convolving it with the transfer function of the microscope in the back focal plane. The intensity is determined by where * is a convolution operation, and Tðr ? Þ is the microscope transfer function. This operation is written here in real space but is computed in Fourier space. The LTEM transfer function is composed of three parts 55 : TðqÞ ¼ AðqÞe ÀiχðqÞ e ÀgðqÞ ; where A(q) is the objective aperture function, e −iχ(q) the phase transfer function, e −g(q) the damping envelope, and q is the reciprocal space wave vector. The aperture is a binary function (1 inside and 0 outside) centered in reciprocal space for the Fresnel imaging mode. We can define the phase transfer function as χðqÞ ¼ πλ Δz þ C a cos 2ϕ a ð Þ ½ q 2 þ π 2 C s λ 3 q 4 ; where Δz is the defocus, C a and ϕ a are the magnitude and orientation of the two-fold astigmatism, and C s is the spherical aberration coefficient. The damping envelope can be written as where u ¼ 1 þ 2 πθ c Δ ð Þ 2 q 2 , θ c is the beam convergence angle, and Δ is the defocus spread.