Dynamical machine learning volumetric reconstruction of objects’ interiors from limited angular views

Kang, Iksung; Goy, Alexandre; Barbastathis, George

doi:10.1038/s41377-021-00512-x

Download PDF

Article
Open access
Published: 07 April 2021

Dynamical machine learning volumetric reconstruction of objects’ interiors from limited angular views

Light: Science & Applications volume 10, Article number: 74 (2021) Cite this article

4999 Accesses
10 Citations
70 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 03 September 2021

This article has been updated

Abstract

Limited-angle tomography of an interior volume is a challenging, highly ill-posed problem with practical implications in medical and biological imaging, manufacturing, automation, and environmental and food security. Regularizing priors are necessary to reduce artifacts by improving the condition of such problems. Recently, it was shown that one effective way to learn the priors for strongly scattering yet highly structured 3D objects, e.g. layered and Manhattan, is by a static neural network [Goy et al. Proc. Natl. Acad. Sci. 116, 19848–19856 (2019)]. Here, we present a radically different approach where the collection of raw images from multiple angles is viewed analogously to a dynamical system driven by the object-dependent forward scattering operator. The sequence index in the angle of illumination plays the role of discrete time in the dynamical system analogy. Thus, the imaging problem turns into a problem of nonlinear system identification, which also suggests dynamical learning as a better fit to regularize the reconstructions. We devised a Recurrent Neural Network (RNN) architecture with a novel Separable-Convolution Gated Recurrent Unit (SC-GRU) as the fundamental building block. Through a comprehensive comparison of several quantitative metrics, we show that the dynamic method is suitable for a generic interior-volumetric reconstruction under a limited-angle scheme. We show that this approach accurately reconstructs volume interiors under two conditions: weak scattering, when the Radon transform approximation is applicable and the forward operator well defined; and strong scattering, which is nonlinear with respect to the 3D refractive index distribution and includes uncertainty in the forward operator.

Deep Learning-based Inaccuracy Compensation in Reconstruction of High Resolution XCT Data

Article Open access 06 May 2020

Deep denoising for multi-dimensional synchrotron X-ray tomography without high-quality reference data

Article Open access 04 June 2021

Just-in-time deep learning for real-time X-ray computed tomography

Article Open access 16 November 2023

Introduction

Optical tomography reconstructs the three-dimensional (3D) internal refractive index profile by illuminating the sample at several angles and processing the respective raw intensity images. The reconstruction scheme depends on the scattering model that is appropriate for a given situation. If the rays through the sample can be well approximated as straight lines, then the accumulation of absorption and phase delay along the rays is an adequate forward model, i.e. the projection or Radon transform approximation applies. This is often the case with hard x-rays through most materials including biological tissue; for that reason, Radon transform inversion has been widely studied^{1,2,3,4,5,6,7,8,9,10}. The problem becomes even more acute when the range of accessible angles around the object is restricted, a situation that we refer to as “limited-angle tomography,” due to the missing cone problem^11,12,13.

The next level of complexity arises when diffraction and multiple scattering must be taken into account in the forward model; then, the Born or Rytov expansions and the Lippmann-Schwinger integral equation^{14,15,16,17,18} are more appropriate. These follow from the scalar Helmholtz equation using different forms of expansion for the scattered field¹⁹. In all these approaches, weak scattering is obtained from the first order in the series expansion. Holographic approaches to volumetric reconstruction generally rely on this first expansion term^{20,21,22,23,24,25,26,27,28,29,30,31}. Often, solving the Lippmann-Schwinger equation is the most robust approach to account for multiple scattering, but even then, the solution is iterative and requires excessive amount of computation especially for complex 3D geometries. The inversion of these forward models to obtain the refractive index in 3D is referred to as inverse scattering, also a well-studied topic^{32,33,34,35,36,37,38,39}.

An alternative to the integral methods is the Beam Propagation Method (BPM), which sections the sample along the propagation distance z into slices, each slice scattering according to the thin transparency model, and propagates the field from one slice to the next through the object⁴⁰. Despite some compromise in accuracy, BPM offers comparatively light load of computation and has been used as forward model for 3D reconstructions¹⁸. The analogy of the BPM computational structure with a neural network was exploited, in conjunction with gradient descent optimization, to obtain the 3D refractive index as the “weights” of the analogous neural network in the learning tomography approach^41,42,43. BPM has also been used with more traditional sparsity-based inverse methods^33,44. Later, a machine learning approach with a Convolutional Neural Network (CNN) replacing the iterative gradient descent algorithm exhibited even better robustness to strong scattering for layered objects, which match well with the BPM assumptions⁴⁵. Despite great progress reported by these prior works, the problem of reconstruction through multiple scattering remains difficult due to the extreme ill-posedness and uncertainty in the forward operator; residual distortion and artifacts are not uncommon in experimental reconstructions.

Inverse scattering, as inverse problems in general, may be approached in a number of different ways to regularize the ill-posedness and thus provide some immunity to noise^46,47. Recently, thanks to a ground-breaking observation from 2010 that sparsity can be learnt by a deep neural network⁴⁸, the idea of using machine learning to approximate solutions to inverse problems also caught on ref. ⁴⁹. In the context of tomography, in particular, deep neural networks have been used to invert the Radon transform⁵⁰ and recursive Born model³², and were also the basis of some of the papers we cited earlier on holographic 3D reconstruction^28,29,30, learning tomography^41,42,43, and multi-layered strongly scattering objects⁴⁵. In prior work on tomography using machine learning, generally, the intensity projections are all fed simultaneously as inputs to a computational architecture that includes a neural network, and the output is the 3D reconstruction of the refractive index. The role of the neural network is to learn (1) the priors that apply to the particular class of objects being considered and (2) the relationship of these priors to the forward operator (Born, BPM, etc.) so as to produce a reasonable estimate of the inverse.

Here we propose a rather distinct approach to exploit machine learning for a generic 3D refractive index reconstruction independent of the type of scattering. Our motivation is that, as the angle of illumination is changed, the light goes through the same scattering volume, but the scattering events, weak or strong, follow a different sequence. At the same time, the raw image obtained from a new angle of illumination adds information to the tomographic problem, but that information is constrained by (i.e. is not orthogonal to) the previously obtained patterns. We interpret this as similar to a dynamical system, where the output is constrained by the history of earlier inputs as time evolves and new inputs arrive. (The convolution integral is the simplest and best-known expression of this relationship between the output of a system and the history of the system’s input.) An alternative interpretation is as dynamic programming⁵¹, where the system at each step reacts so as to incrementally improve an optimality criterion—in our case, the reconstruction error metric.

The analogy between tomography and a dynamical system suggests the RNN architecture as a strong candidate to process raw images in sequence, as they are obtained one after the other; and process them recurrently so that each raw image from a new angle improves over the reconstructions obtained from the previous angles. Thus, we treat multiple raw images under different illumination angles as a temporal sequence, as shown in Fig. 1. The angle index replaces what is a dynamical system would have been the time t. This idea is intuitively appealing; it also leads to considerable improvement in the reconstructions, removing certain artifacts that were visible in the strong scattering case of ref. ⁴⁵.

**Fig. 1: Definition of the angular axis according to illumination angles.**

The dynamic reconstruction methodology applies, for example, too weak scattering where the raw images are the sinograms; and too strong scattering, where the raw images are better interpreted as intensity diffraction patterns. The purpose of the learning scheme is to augment this relationship with regularization priors applicable to a certain class of objects of interest.

The way we propose to use RNNs in this problem is quite distinct from the recurrent architecture proposed first in ref. ⁴⁸ and subsequently implemented, replacing the recurrence by a cascade of distinct neural networks, in refs. ^50,52,53, among others. In these prior works, the input to the recurrence can be thought of as clamped to the raw measurement, as in the proximal gradient⁵⁴ and related methods; whereas, in our case, the input to the recurrence is itself dynamic, with the raw images from different angles forming the input sequence. Moreover, by utilizing a modified gated recurrent unit (more on this below) rather than a standard neural network, we do not need to break the recurrence up into a cascade.

Typical applications of RNNs^55,56 are in temporal sequence learning and identification. In imaging and computer vision, RNN is applied in 2D and 3D: video frame prediction^57,58,59,60, depth map prediction⁶¹, shape inpainting⁶²; and stereo reconstruction^63,64 or segmentation^65,66 from multi-view images, respectively. Stereo, in particular, bears certain similarities to our tomographic problem here, as sequential multiple views can be treated as a temporal sequence. To establish the surface shape, the RNNs in these prior works learn to enforce consistency in the raw 2D images from each view and resolve the redundancy between adjacent views in recursive fashion through the time sequence (i.e. the sequence of view angles). Non-RNN learning approaches have also been used in stereo, e.g. Gaussian mixture models⁶⁷. In computed tomography, in particular, an alternate dynamical neural network of the Hopfield type has been used successfully⁶⁸.

In this work, we replaced the standard Long Short-Term Memory (LSTM)⁵⁶ implementation of RNNs with a modified version of the newer Gated Recurrent Unit (GRU)⁶⁹. The GRU has the advantage of fewer parameters but generalizes comparably with the LSTM. Our GRU employs a separable-convolution scheme to explicitly account for the asymmetry between the lateral and axial axes of propagation. We also utilize an angular attention mechanism whose purpose is to learn how to reward specific angles in proportion to their contribution to reconstruction quality⁷⁰. We found that for the strongly anisotropic samples or scanning schemes the angular attention mechanism is effective.

The results of our simulation and experimental study on a generic interior volumetric reconstruction are in Results. We first show numerically that the dynamical machine learning approach is suitable to tomographic reconstruction under more restrictive and commonly used Radon transform assumption, i.e. weak scattering. Then, we demonstrate the applicability of the dynamical approach to strong scattering tomography. We show significant improvement over static neural network-based reconstructions of the same experimental data under the strong scattering assumption. The improvement is shown both visually and in terms of several quantitative metrics. Results from an ablation study indicate the relative significance of the new components we introduced to the quality of the reconstructions.

Results

Our first investigation of the recurrent reconstruction scheme is for weak scattering, i.e. when the Radon transform approximation applies and with a limited range of available angles. For simulation, each sample consists of random number, between 1 and 5 with equal probability, of ellipsoids at random locations with arbitrarily chosen sizes, amplitudes, and angles, thus spatially isotropic in average. Rotation is applied along the x-axis, from −10° to +10° with 1° increment, thus 21 projections per sample under a parallel-beam geometry. The Filtered Backprojection (FBP) algorithm³ is used to generate crude estimates from the projections. nth FBP Approximant (n = 1,2,…,21) is the reconstruction by the FBP algorithm using n projections of n angles starting from −10°.

The reconstructions by the RNN are compared in Fig. 2 with FBP and Total Variation (TV)-regularized reconstructions using TwIST⁷¹ for qualitative and visual comparison. Here a TV-regularization parameter is set to be 0.01, and the algorithm is run up to 200 iterations until its objective function saturates. Figure 3 shows the quantitative comparison on test performance using three different quantitative metrics, where FBP and FBP + TV yielded much lower values than the recurrent scheme.

**Fig. 2: Dynamic approach for projection tomography (i.e. subject to the Radon transform approximation) using simulated data.**

**Fig. 3: Test performance under the weak scattering condition.**

Figure 4 shows the evolution of test reconstructions as new projections or FBP Approximants are presented to the dynamical scheme. When the recurrence starts with n = 1, the volumetric reconstruction is quite poor; as more projections are included, the reconstruction improves as expected. It is also interesting to see that not all the angles are needed to achieve reasonable quality of reconstructions as the graphs and reconstructions in Fig. 4a, b saturate around n = 19.

**Fig. 4: Progression of 3D reconstruction performance under the weak scattering condition.**

Details of the recurrent architecture for the weak scattering assumption are presented in Fig. 16b. To quantify the relative contributions to reconstruction quality of each element in the architecture, the elements one by one are ablated (-) or substituted ($\rightleftarrows$) with alternatives. This ablation study is performed on a strategy to giving weights on hidden features, separable convolution, and ReLU (rectified linear unit) activation function⁷² inside the recurrence cell. Specifically, in the study, (1) an angular attention mechanism is ablated, or only last hidden feature h_n is taken into account before the decoder instead of the angular attention mechanism; (2) a separable convolution scheme is ablated, thus the standard 3D convolution; and (3) ReLU activation function is ablated and then substituted with the tanh activation function, which is more usual⁶⁹. The ablated architectures are trained under the same training scheme (for more details, see Training the recurrent neural network in Materials and methods) and tested with the same simulated data.

Visually, in Fig. 5, test performance largely degrades as the ablation happens on the ReLU activation function and separable convolution, which is also found quantitatively in Fig. 6. Therefore, for the Radon case, we find that (1) ReLU activation function is highly desirable instead of the native tanh function; (2) the separable convolution is helpful when designing a recurrent unit and encoder/decoder for tomographic reconstructions under the weak scattering assumption. However, for the ablation of the angular attention mechanism, it makes no large difference from the performance of the proposed model, which is because the objects used for training and testing are isotropic in average and do not show any preferential direction.

**Fig. 5: Ablation study on the recurrent architecture under the weak scattering condition.**

**Fig. 6: Quantitative analysis on test performance in the ablation study under the weak scattering condition.**

Here, we perform an additional study on the role of the angular attention mechanism by granting a preferential direction or spatial anisotropy to the objects of interest. In this study, the mean major axis of the objects is assumed to be parallel to the z-axis, i.e. the objects are elongated along the axis. Instead of the previously examined symmetric angular scanning range (−10°, +10°), we now consider the asymmetric case (−15°, +5°).

In Fig. 7a, the angular attention with the asymmetry now gives attention on n differently, i.e. its peak translated to the left by 5. It is because the projections from higher angles may contain more useful information on the objects due to the directional preference of the objects, thus the distribution of the attention probabilities is now attracted to the lower indices. In Fig. 7b, we quantitatively show that the angular attention improves performance in the case of the asymmetric scanning of objects with directionality regardless of the noise present in projections.

**Fig. 7: Attention probabilities and angular attention mechanism for different ranges of scanning.**

Figures 8 and 9 characterize our proposed method in terms of feature size and feature sparsity, as well as cross-domain generalization, compared to the baseline model (see Training the recurrent neural network in “Materials and methods” for details). Networks trained with examples with small and dense features tend to generalize better and with less artifacts than large and sparse features, in agreement with ref. ⁷³. Lastly, and not surprisingly, overall reconstruction quality is better when feature size is large and features are sparse.

**Fig. 8: Sample feature size and testing performance.**

**Fig. 9: Sample sparsity and testing performance.**

Next, we investigate the case when the Radon transform is not applicable, i.e. tomography under strong scattering conditions and under a similarly limited-angle scheme. The RNN is first trained with the single-pass, gradient descent-based Approximants Eq. (4) of simulated diffraction patterns (see Training and testing procedures in Materials and methods), and then tested with the simulated ones and additionally with the TV-based Approximants Eq. (5) of experimentally obtained diffraction patterns. TV regularization is only applied to the experimental patterns. To reconcile any experimental artifacts, there is an additional step of Dynamic Weighted Moving Average (DWMA) on the Approximants ${\boldsymbol{f}}_n^{[1]}(n = 1,\,2,\, \ldots ,\,42)$, hence DWMA Approximants ${\tilde{\boldsymbol f}}_m^{[1]}(m = 1,\,2,\, \ldots ,\,12)$. See “Materials and methods” for more details in the DWMA process. The evolution of the RNN output as more DWMA Approximants are presented is shown in Fig. 10 and shows a similar improvement with recurrence m as in the Radon case of Fig. 4. Also, like the Radon case, it is interesting to see that not all the Approximants are needed to acquire reasonable quality of reconstructions: the graphs in Fig. 10a saturate around m = 10 and the visual quality of the reconstructions at m = 10–12 in Fig. 10b does not largely differ.

**Fig. 10: Progress of 3D reconstruction performance as diffraction patterns from different angles are presented to the recurrent scheme.**

For comparison, the 3D-DenseNet architecture with skip connections in ref. ⁴⁵ and its modified version with more parameters to match with that of our RNN are set as baseline models (see Training the recurrent neural network in Materials and methods for details). Our RNN has approximately 21 M parameters, and visual comparisons with the baseline 3D-DenseNets with 0.5 M and 21 M parameters are shown in Fig. 11. The RNN results show substantial visual improvement, with fewer artifacts and distortions compared to the static approaches of ref. ⁴⁵, even when the number of parameters in the latter matches ours. PCC, SSIM, Wasserstein distance, and PE are used to quantify test performance using simulated and experimental data in Fig. 12.

**Fig. 11: Qualitative comparison on test performance between the baseline and proposed architectures using experimental data.**

**Fig. 12: Test performance on simulated and experimental data under the strong scattering condition.**

We also conducted an ablation study of the learning architecture of Fig. 16d. Similar to the Radon case, each component in the architecture was ablated or substituted with its alternative, one at a time: (1) ReLU was ablated and then substituted with the native tanh activation function, (2) the separable convolution was ablated, thus the standard 3D convolution, and (3) the angular attention mechanism was ablated, or only the last hidden feature was given attention. The ablated architectures are also trained under the same training scheme (see Training the recurrent neural network in “Materials and methods” for more details) and tested with both the simulated Eq. (4) and experimental Approximants Eq. (5).

Visually in Fig. 13, unlike the Radon case, paying attention only to the last hidden feature affects and degrades the testing performance worst. Also, it is important to note that the ablation of the separable convolution scheme brings degradation in test performance according to Fig. 13. The decrease in test performance by the substitution of ReLU with the more common tanh is comparatively marginal. These findings are supported quantitatively as well in Fig. 14.

**Fig. 13: Visual quality assessment from the ablation study on elements (see “Computational architecture in Materials and methods” for details) using experimental data.**

**Fig. 14: Ablation study on the recurrent architecture under the strong scattering condition.**

Thus, under the strong scattering condition, we find that (1) hidden features from all angular steps need to be taken into consideration with the angular attention mechanism for reconstructions to get a better test performance although the last hidden feature is assumed to be informed of the history of the previous angular steps; (2) replacing the standard 3D convolution with the separable convolution helps when designing a recurrent unit and a convolutional encoder/decoder for tomographic reconstructions; and (3) the substitution of tanh with ReLU is still useful but may be application dependent.

Discussion

We have proposed a new recurrent neural network scheme for a generic interior-volumetric reconstruction by processing raw inputs from different angles of illumination dynamically, i.e. as a sequence, with each new angle improving the 3D reconstruction. We found this approach to work well under two types of scattering assumptions: weak (Radon transform) and strong. In the second case, in particular, we observed significant qualitative and quantitative improvement over the static machine learning scheme of ref. ⁴⁵, where the raw inputs from all angles are processed at once by a neural network.

Through the ablation studies, we found that sandwiching the recurrent structure with some key elements between a convolutional encoder/decoder helps improve the reconstructions. We found that the angular attention mechanism takes an important role especially when the objects of interest are spatially anisotropic and performs better than placing all the attention on only the last hidden feature. Even though the last hidden feature is a nonlinear multivariate function of all the previous hidden features, as it has a propensity to reward the latter representations but the former ones⁷⁴, the last hidden feature may not sufficiently represent all angular views. Hence, the angular attention mechanism adaptively merges information from all angles. This is particularly important for our strong scattering case as each DWMA Approximant involves a diffraction pattern of a certain illumination angle; whereas an FBP Approximant under the weak scattering case is computed from several projections in a cumulative fashion.

In addition, interestingly, the relative contributions of other elements, e.g. the separable convolution scheme and ReLU activation, differ in weak and strong scattering assumptions. The substitution of the ReLU with a more common tanh activation brings forth more severe degradation of performance under the weak scattering assumption. Thus, we suggested different guidelines for each scattering assumption.

Lastly, alternative implementations of the RNN could be considered. Examples are LSTMs, Reservoir Computing^75,76,77, separable convolution or DenseNet variants for the encoder/decoder and dynamical units. We leave these investigations to future work.

Materials and methods

Experiment

For the experimental study under the strong scattering assumption, the experimental data are the same as in ref. ⁴⁵, whose experimental apparatus is summarized in Fig. 15. We repeat the description here for the readers’ convenience. The He-Ne laser (Thorlabs HNL210L, power: 20 mW, λ = 632.8 nm) illuminated the sample after spatial filtering and beam expansion. The illumination beam was then de-magnified by the telescope ($f_{L_3}:f_{L_4} = 2:1$), and the EM-CCD (Rolera EM-C2, pixel pitch: 8 μm, acquisition window dimension: $1002 \times 1004$) captured the experimental intensity diffraction patterns. The integration time for each frame was 2 ms, and the EM gain was set to × 1. The optical power of the laser was strong enough for the captured intensities to be comfortably outside the shot-noise limited regime.

Each layer of the sample was made of fused silica slabs (n = 1.457 at 632.8 nm and at 20 °C). Slab thickness was 0.5 mm, and patterns were carefully etched to the depth of 575 ± 5 nm on the top surface of each of the four slabs. To reduce the difference between refractive indices, gaps between adjacent layers were filled with oil (n = 1.4005 ± 0.0002 at 632.8 nm and at 20 °C), yielding binary-phase depth of −0.323 ± 0.006 rad. The diffraction patterns used for training were prepared with simulation precisely matched to the apparatus of Fig. 15. For testing, we used a set of diffraction patterns that was acquired both through simulation (see Approximant computations in Materials and methods for details) and experiment.

For the strong scattering case, objects used for both simulation and experiment are dense-layered, transparent, i.e. of negligible amplitude modulation, and of binary refractive index. They were drawn from a database of IC layout segments⁴⁵. The feature depth of 575 ± 5 nm and refractive index contrast 0.0565 ± 0.0002 at 632.8 nm and at 20 °C were such that weak scattering assumptions are invalid and strong scattering has to be necessarily taken into account. The Fresnel number ranged from 0.7 to 5.5 for the given defocus amount Δz = 58.2 mm for the range of object feature sizes.

To implement the raw image acquisition scheme, the sample was rotated from −10° to +10° with a 1° increment along both the x and y axes, while the illumination beam and detector remained still. This resulted in N = 42 angles and intensity diffraction patterns in total. Note that ref. ⁴⁵ only utilized 22 patterns out of with a 2-degree increment along both x and y axes. The comparisons we show later are still fair because we retrained all the algorithms of ref. ⁴⁵ for the 42 angles and 1° increment.

Computational architecture

Figure 16 shows the proposed RNN architectures for both scattering assumptions in detail. Details of the forward model and Approximant (pre-processing) algorithm, the separable-convolution GRU, convolutional encoder and decoder, and the angular attention mechanism are described in Materials and methods. The total number of parameters in both computational architectures is ∼ 21 M (more on this topic in Training the recurrent neural network in Materials and methods.).

**Fig. 16: Details on implementing the dynamical scheme.**

Approximant computations

Under the weak scattering condition, amplitude phantoms with the random number between 1 and 5 of ellipsoids with arbitrarily chosen dimensions and amplitude values at random locations are illuminated within a limited-angle range along one axis, thus spatially isotropic in average. The angle is scanned from −10° to +10° with a 1° increment. Intensity patterns on a detector are simple projections of the objects along certain angles according to the Radon transform as a forward model.

Filtered Backprojection (FBP)³ is chosen to perform backward operation. Here a crude estimate of n projections, i.e. ${\boldsymbol{g}}_1, \ldots ,\,{\boldsymbol{g}}_n$, using to the FBP algorithm without any regularization is the nth FBP Approximant or ${\boldsymbol{f}}_n^\prime$. Thus, the quality of the FBP Approximant is improved as n increases. As n spans from 1 to N(=21), a sequence of the FBP Approximants $\left( {{\boldsymbol{f}}_1^\prime ,\,{\boldsymbol{f}}_2^\prime ,\, \ldots ,\,{\boldsymbol{f}}_N^\prime } \right)$ becomes the input to an encoder and a recurrence cell as shown in Fig. 1b. The FBP Approximant sequences for training, validation, and testing are generated with these procedures and based on three-dimensional simulated phantoms.

However, under the strong scattering condition, the dense-layered, binary-phase object is illuminated at a sequence of angles, and the corresponding diffraction intensity patterns are captured by a detector. At the nth step of the sequence, the object is illuminated by a plane wave at angles $\left( {\theta _{nx},\,\theta _{ny}} \right)$ with respect to the propagation axis z on the xz and yz planes, respectively. Beyond the object, the scattered field propagates in free space by distance Δz to the digital camera (the numerical value is Δz = 58.2 mm. Let the forward model under the nth illumination angle be denoted as $H_n,\,n = 1,2, \ldots ,\,N$; that is, the nth measurement at the detector plane produced by the phase object f is g_n.

The forward operators H_n are obtained from the non-paraxial BPM^33,40,45, which is less usual so we describe it in some additional detail here. Let the jth cross-section of the computational window perpendicular to z-axis be $f^{\left[ j \right]} = \exp \left( {i\varphi ^{\left[ j \right]}} \right),\,j = 1, \ldots ,\,J,$ where J is the number of slices the we divide the object into, each of axial extent δz. The optical field at the (j + 1)th slice is expressed as

$$\psi _n^{\left[ {j + 1} \right]} = {\cal{F}}^{ - 1}\left[ {{\cal{F}}\left[ {\psi _n^{[j]} \circ f_n^{\left[ j \right]}} \right]\left( {k_x,k_y} \right) \cdot \exp \left( { - i\left( {k - \sqrt {k^2 - k_x^2 - k_y^2} } \right)\delta z} \right)} \right]$$

(1)

where δz is is equal to the slab thickness, i.e. 0.5 mm; ${\cal{F}}$ and ${\cal{F}}^{ - 1}$ are the Fourier and inverse Fourier transforms, respectively; and $\chi _1 \circ \chi _2$ denotes the Hadamard (element-wise) product of the functions $\chi _1$ and $\chi _2.$The Hadamard product is the numerical implementation of the thin transparency approximation, which is inherent in the BPM. To obtain the intensity at the detector, we define the (J + 1)th slice displaced by Δz from the Jth slice (the latter is the exit surface of the object) to yield

$$H_n\left( {\boldsymbol{f}} \right) = \left| {\psi _n^{\left[ {J + 1} \right]}} \right|^2$$

(2)

The purpose of the Approximant, in general, is to produce a crude estimate of the volumetric reconstruction using the forward operator alone. This has been well established as a helpful form of pre-processing for subsequent treatment by machine learning algorithms^45,78. Previous works constructed the Approximant as a single-pass gradient descent algorithm^33,45. Here, due to the sequential nature of our reconstruction algorithm, as each intensity diffraction pattern from a new angle of illumination n is received, we instead construct a sequence of Approximants, indexed by n, by solving the problem

$$\begin{array}{*{20}{l}}{\hat{\boldsymbol f}} = {\mathrm{argmin}}_{\boldsymbol{f}}{\cal{L}}_{n} \left( {\boldsymbol{f}} \right)\,{\mathrm{where}} \, {\cal{L}}_{n} \left( {\boldsymbol{f}} \right) = \displaystyle\frac{1}{2} \Vert {H_n} \left( {\boldsymbol{f}} \right) - {\boldsymbol{g}}_{n} \Vert_{2}^{2}, \\ n= 1,2, \ldots, \, N \end{array}$$

(3)

The gradient descent update rule for this functional ${\cal{L}}_n\left( {\boldsymbol{f}} \right)$ is

$${\boldsymbol{f}}_n^{\left[ {l + 1} \right]} = {\boldsymbol{f}}_n^{\left[ l \right]} - s\left( {\nabla _{\boldsymbol{f}}{\cal{L}}_n\left( {{\boldsymbol{f}}_n^{\left[ l \right]}} \right)} \right)^\dagger = {\boldsymbol{f}}_n^{\left[ l \right]} - s\left( {H_n^T\left( {{\boldsymbol{f}}_n^{\left[ l \right]}} \right)\nabla _{\boldsymbol{f}}H_n\left( {{\boldsymbol{f}}_n^{\left[ l \right]}} \right) - {\boldsymbol{g}}_n^T\nabla _{\boldsymbol{f}}H_n\left( {{\boldsymbol{f}}_n^{\left[ l \right]}} \right)} \right)^\dagger$$

(4)

where ${\boldsymbol{f}}_n^{\left[ 0 \right]} = 0$ and s is the descent step size and in the numerical calculations was set to 0.05 and the superscript † denotes the transpose. The single-pass, gradient descent-based Approximant was used for training and testing of the RNN with simulated diffraction patterns but with an additional pre-processing step that will be explained in Eq. (7).

We also implemented a denoised TV-based Approximant, to be used only at the testing stage of the RNN with experimental diffraction patterns, where the additional pre-processing step in Eq. (7) also applies. In this case, the functional to be minimized is

$$\begin{array}{*{20}{l}}{\cal{L}}_n^{{\mathrm{TV}}}\left( {\boldsymbol{f}} \right) = \displaystyle\frac{1}{2} \Vert H_n\left( {\boldsymbol{f}} \right) - {\boldsymbol{g}}_{n} \Vert_2^2 + \kappa {\mathrm{TV}}_{l_1}\left( {\boldsymbol{f}} \right), \\\quad\quad\;\;\; n = 1,2, \ldots ,\,N \end{array}$$

(5)

where the TV-regularization parameter was chosen as $\kappa = 10^{ - 3}$, and for ${\boldsymbol{x}} \in {\cal{R}}^{P \times Q}$ the anisotropic l₁-TV operator is

$${\mathrm{TV}}_{l_1}\left( {\boldsymbol{x}} \right) = \mathop {\sum }\limits_{p = 1}^{P - 1} \mathop {\sum }\limits_{q = 1}^{Q - 1} \left( {\left| {x_{p,q} - x_{p + 1,q}} \right| + \left| {x_{p,q} - x_{p,q + 1}} \right|} \right) + \mathop {\sum }\limits_{p = 1}^{P - 1} \left| {x_{p,Q} - x_{p + 1,Q}} \right| + \mathop {\sum }\limits_{q = 1}^{Q - 1} \left| {x_{P,q} - x_{P,q + 1}} \right|$$

(6)

with reflexive boundary conditions^79,80. To produce the Approximants of experimentally obtained diffraction patterns for testing from this functional, we first ran 4 iterations of the gradient descent and ran 8 iterations of the FGP-FISTA (Fast Gradient Projection with Fast Iterative Shrinkage Thresholding Algorithm)^79,81.

Dynamically weighted moving average

The N Approximants of the strong scattering case form a 4D spatiotemporal sequence $\left( {{\boldsymbol{f}}_1^{\left[ 1 \right]},{\boldsymbol{f}}_2^{[1]}, \ldots ,\,{\boldsymbol{f}}_N^{[1]}} \right),$ which we process with a Dynamical Weighted Moving Average (DWMA) operation. For the weak scattering case, we omit this operation. The purpose of the DWMA is to smooth out short-term fluctuations, such as experimental artifacts in raw intensity measurements, and highlight longer-term trends, e.g. the change of information conveyed by different forward operators along the angular axis. The resulting DWMA Approximants ${\tilde{\boldsymbol {f}}}_m^{[1]}$ have a shorter length M than the original Approximants ${\boldsymbol{f}}_n^{[1]}$, i.e. M < N. Also, the weights in the moving average are dynamically determined as follows.

$${\tilde{\boldsymbol {f}}}_m^{[1]} = \left\{ {\begin{array}{*{20}{l}} \sum \limits_{n = m}^{m + N_w} \alpha _{nm}{\boldsymbol{f}}_n^{[1]}, \hfill & {1 \le m \le N_h} \hfill \\ \sum\limits_{n = m}^{m + N_w} \alpha _{nm}{\boldsymbol{f}}_{n + N_w}^{[1]}, \hfill & {N_h + 1 \le m \le M} \hfill \end{array}} \right.$$

(7)

where $e_{nm} = \tanh \left( W_e^mf_n^{\left[1\right]} \right)$

$$\begin{array}{*{20}{l}}\alpha _{nm} = {\mathrm{softmax}}\left( {e_{nm}} \right) = \frac{{\exp \left( {e_{nm}} \right)}}{\sum _{n = 1}^{N_w} \exp \left( {e_{nm}} \right)},\\ n = m,m + 1, \ldots ,m + N_w\end{array}$$

Equation 7 follows the convention of an additive attention mechanism⁷⁴. α_nm indicates relative importance of ${\boldsymbol{f}}_n^{[1]}$ with respect to ${\tilde{\boldsymbol f}}_m^{[1]}$. Here, $W_e^m$ is a hidden layer assigned for each ${\tilde{\boldsymbol f}}_m^{[1]}$, which is subject to be trained for several epochs. The relative importance is determined by computing its associated energy e_nm and the softmax function normalizes it. More details are available in the Angular attention mechanism in Materials and methods. Supplementary Information (SI) Section S3 explains why the DWMA is more favorable than the Simple Moving Average (SMA) with fixed and uniform weights, i.e. 1/M.

To be consistent, the DWMA was applied to the original Approximants for both training and testing. In this study, N_w = 15, N_h = 6, M = 12. These choices follow from the following considerations. We have N = 42 diffraction patterns for each sequence: 21 captured along the x-axis (1 – 21; $\theta _x = - 10^\circ , - 9^\circ , \ldots ,\, + 10^ \circ$) and the remaining ones along the y-axis (22 – 42; $\theta _y = - 10^\circ , - 9^\circ , \ldots ,\, + 10^\circ$). The DWMA is first applied to 21 patterns from x-axis rotation, which thus generates 6 averaged diffraction patterns, and then it is applied to the remaining 21 patterns from y-axis rotation, resulting in the other 6 patterns. Therefore, the input sequence to the next step in the architecture of Fig. 16c, i.e. to the encoder (see Convolutional encoder and decoder in “Materials and methods” for details), consists of a sequence of M = 12 DWMA Approximants ${\tilde{\boldsymbol f}}_m^{[1]}.$ In SI Section S4, we discuss performance change due to different ways of numbering DWMA Approximants ${\tilde{\boldsymbol f}}_m^{[1]}$ entering the neural network. SI Section S5 provides visualization of DWMA Approximants.

Separable-Convolution Gated Recurrent Unit (SC-GRU)

Recurrent neural networks involve a recurrent unit that retains memory and context based on previous inputs in a form of latent tensors or hidden units. It is well known that the LSTM is robust to instabilities in the training process. Moreover, in the LSTM, the weights applied to past inputs are updated according to usefulness, while less useful past inputs are forgotten. This encourages the most salient aspects of the input sequence to influence the output sequence⁵⁶. Recently, the GRU was proposed as an alternative to LSTM. The GRU effectively reduces the number of parameters by merging some operations inside the LSTM, without compromising the quality of reconstructions; thus, it is expected to generalize better in many cases⁶⁹. For this reason, we chose to utilize the GRU in this paper as well.

The governing equations of the standard GRU are as follows:

$$\begin{array}{*{20}{l}} {r_m} \hfill & = \hfill & \sigma({W_r\xi _m + U_rh_{m - 1} + b_r}) \hfill \\ {z_m} \hfill & = \hfill & \sigma({W_z\xi _m + U_zh_{m - 1} + b_z}) \hfill \\ {\tilde h_m} \hfill & = \hfill & {\tanh \left( {W\xi _m + U\left( {r_m \circ h_{m - 1}} \right) + b_h} \right)} \hfill \\ {h_m} \hfill & = \hfill & {\left( {1 - z_m} \right) \circ \tilde h_m + z_m \circ h_{m - 1}} \hfill \end{array}$$

(8)

where $\xi _m,\,h_m,\,r_m,\,z_m$ are the inputs, hidden features, reset states, and update states, respectively. Multiplication operations with weight matrices are performed in a fully connected fashion.

We modified this architecture so as to take into account the asymmetry between the lateral and axial dimensions of optical field propagation, motivated from the concept of separable convolution in deep learning^82,83 as shown in Fig. 17. This is evident even in free-space propagation, where the lateral components of the Fresnel kernel

$$\exp \left( {{\rm{i}}\pi \frac{{x^2 + y^2}}{{\lambda z}}} \right)$$

(9)

are shift invariant and, thus, convolutional, whereas the longitudinal axis z is not. The asymmetry is also evident in nonlinear propagation, as in the BPM forward model Eq. (1) that we used here. This does not mean that space is anisotropic – of course space is isotropic! The asymmetry arises because propagation and the object are 3D, whereas the sensor is 2D. In other words, the orientation of the image plane breaks the symmetry in object space so that the scattered field from a certain voxel within the object apparently influences the scattered intensity from its neighbors at the detector plane differently in the lateral direction than in the axial direction. To account for this asymmetry in a profitable way for our learning task, we first define the operators $W_r,\,U_r,$ etc. as convolutional so as to keep the number of parameters down (even though in free space propagation the axial dimension is not convolutional and under strong scattering neither dimension is nonlinear); and we constrain the convolutional kernels of the operators to be the same in the lateral dimensions x and y, and allow the axial z dimension kernel to be different. This approach justifies the term separable convolution, and we found it to be a good compromise between facilitating generalization and adhering to the physics of the problem.

**Fig. 17: Separable-convolution scheme: different convolution kernels are applied along the lateral x, y axes vs. the longitudinal z-axis.**

We also replaced the tanh activation function of the standard GRU with ReLU activation⁸⁴ as the ReLU is computationally less expensive and helpful to avoid local minima with fewer vanishing gradient problems^72,85. The final form of our SC-GRU dynamics is

$$\begin{array}{*{20}{l}} {r_m} \hfill & = \hfill & \sigma({W_r \ast \xi _m + U_r \ast h_{m - 1} + b_r}) \hfill \\ {z_m} \hfill & = \hfill & \sigma({W_z \ast \xi _m + U_z \ast h_{m - 1} + b_z}) \hfill \\ {\tilde h_m} \hfill & = \hfill & {{\mathrm{ReLU}}\left( {W \ast \xi _m + U \ast \left( {r_m \circ h_{m - 1}} \right) + b_h} \right)} \hfill \\ {h_m} \hfill & = \hfill & {\left( {1 - z_m} \right) \circ \tilde h_m + z_m \circ h_{m - 1}} \hfill \end{array}$$

(10)

where * denotes the separable convolution operation.

Convolutional encoder and decoder

CNNs are placed before and after the SC-GRU as encoder and decoder, respectively. This architectural choice was inspired by refs. ^86,87,88,89. The encoder and decoder also utilize separable convolution, in conjunction with residual learning, which is known to improve generalization in deep networks⁹⁰. As in ref. ⁸⁶, the encoder and decoder utilize Down-Residual Blocks (DRB), Up-Residual Blocks (URB), and Residual Blocks (RB), whose details can be found in Fig. S5 in SI Section S6; however, there are no skip connections in our case, i.e. this is not a U-net⁹¹ architecture. The encoder learns how to map its input (i.e. the FBP Approximant ${\boldsymbol{f}}_n^\prime$ or the DWMA Approximant ${\tilde{\boldsymbol f}}_m^{[1]}$ sequence) onto a low-dimensional nonlinear manifold. For the weak scattering case, the compression factor for both lateral and axial input dimensions is 8, whereas for the strong scattering case, the compression factor is 16 for the lateral input dimensions, but the axial dimension is left intact. This eases the burden on the training process as the number of parameters is reduced; more importantly, encoding abstracts features out of the high-dimensional inputs, passing latent tensors over to the recurrent unit. Letting the encoder for the mth Approximant be symbolized as ${\mathrm{Enc}}_m\left( \cdot \right)$, $\xi _m = {\mathrm{Enc}}_m\left( {{\boldsymbol{f}}_m^\prime } \right)$ or $\xi _m = {\mathrm{Enc}}_m\left( {{\tilde{\boldsymbol f}}_m^{\left[ 1 \right]}} \right)$ in Eq. (10). The decoder restores the output of the RNN to the native dimension of the object we are reconstructing.

Angular attention mechanism

Each raw image (either a projection under a weak scattering assumption or a diffraction pattern under a strong scattering condition) from a new angle of illumination is combined at the SC-GRU input with the hidden feature from the same SC-GRU’s previous output. After N iterations, there are N different hidden features resulting from N illumination angles, as seen in Eq. (10). Since the forward operator under both scattering assumptions is object dependent, the qualitative information that each such new angle conveys will vary with the object. It then becomes interesting to consider whether some angles of illumination convey more information than others.

The analog in temporal dynamical systems, the usual domain of application for RNNs, is the attention mechanism. It decides which elements of the system’s state are the most informative. In our case, of course, time has been replaced by the angle of illumination, so we refer to the same mechanism as angular attention: it evaluates the relative importance of information from each illumination angle in generating the overall reconstruction and thus adaptively assigns different weights to every angle on how much attention should be paid to.

Following the convention of an additive attention mechanism⁷⁴, we compute the weight α_m from its associated energy e_m as output of a neural network with a hidden layer W_e as

$$\begin{array}{*{20}{l}} e_m = \tanh \left( {W_eh_m} \right), \\ \alpha _m = {\mathrm{softmax}}\left( {e_m} \right) = \frac{{\exp \left( {e_m} \right)}}{{\mathop {\sum }\nolimits_{m = 1}^M \exp \left( {e_m} \right)}},\\ m = 1,2, \ldots ,\,M \end{array}$$

(11)

The output of the angular attention as a single representation a is then computed from a linear combination of the hidden features as

$$a = \mathop {\sum }\limits_{m = 1}^M \alpha _mh_m$$

(12)

where a can be also viewed as the expected hidden representation since the weight α_m is essentially a probability that a hidden representation h_m is taken into account to the angular attention output a. There is an alternative, dot-product attention mechanism⁹², but we chose not to implement it here.

Training the recurrent neural network

For the weak scattering case, 2000 and 400 synthetic amplitude phantoms are used for training and validation, respectively. Projections are acquired by the Radon transform along several angles within the limited angular range, as described in Approximant computations in Materials and methods. The FBP Approximants are obtained by the FBP algorithm from the projections.

For the strong scattering case, 5000 and 500 synthetic layered objects are used for training and validation, respectively. For each object, a sequence of intensity diffraction patterns from the N = 42 angles of illumination are produced by BPM, as described earlier. The Approximants are obtained each as a single iteration of the gradient descent, followed by the DWMA process.

For both scattering cases, all of the architectures are trained with a Training Loss Function (TLF) of negative Pearson Correlation Coefficient (NPCC)⁹³:

$${\cal{E}}_{{\mathrm{NPCC}}}\left( {f,\hat f} \right) = - \frac{{\mathop {\sum }\nolimits_{x,y} \left( {f\left( {x,y} \right) - f} \right)\left( {\hat f\left( {x,y} \right) - \hat f} \right)}}{{\sqrt {\mathop {\sum }\nolimits_{x,y} \left( {f\left( {x,y} \right) - f} \right)^2} \sqrt {\mathop {\sum }\nolimits_{x,y} \left( {\hat f\left( {x,y} \right) - \hat f} \right)^2} }}$$

(13)

where f and $\hat f$ are a ground truth image and its corresponding reconstruction. In this article, our NPCC function is defined to perform computation in 3D. We use a stochastic gradient descent scheme with the Adam optimizer⁹⁴. The learning rate is set to be 10⁻³ initially and halved whenever validation loss plateaued for 5 consecutive epochs. The lower bound is set to be 10⁻⁶, and the batch size is set to be 10. The computer used for training has Intel Xeon Gold 6248 CPU at 2.50 GHz with 27.5 MB cache, 384 GB RAM, and dual NVIDIA Volta V100 GPUs with 32 GB VRAM⁹⁵, and it took approximately 5 min per each training epoch.

For comparison, we also re-train the 3D-DenseNet architecture with skip connections in ref. ⁴⁵ with the same training scheme above. This serves as baseline; however, the number of parameters in this network is 0.5 M, whereas in our RNN architecture the number of parameters is 21 M. We also train an enhanced version of the 3D-DenseNet by tuning the number of dense blocks, the number of layers inside each dense block, filter size, and growth rate to match the total number of parameters with that of the RNN, i.e. 21 M. These two versions of the 3D-DenseNet are referred to as Baseline (0.5 M) and Baseline (21 M), respectively, in Figs. 8, 9, 11, and 12.

Testing procedures and metrics

Test performance was demonstrated with only the simulated projections under the weak scattering condition. The projections were processed with the FBP to generate 100 sequences of FBP Approximants ${\boldsymbol{f}}_n^\prime$ for testing the trained network. Under the strong scattering condition, both simulated and experimental diffraction patterns were used for testing, but the patterns were processed differently. The simulated patterns were directly processed with a single-pass gradient descent to generate 50 sequences of original Approximants ${\boldsymbol{f}}_n^{\left[ 1 \right]}$ Eq. (4), whereas a simple affine transform was first applied to the raw experimentally obtained intensity diffraction patterns of an actual layered object to correct slight misalignment. We then applied the gradient descent up to 4 iterations and the FGP-FISTA up to 8 iterations to the corrected experimental patterns, to compute one set of TV-based Approximants Eq. (5). Testing process took approximately 300 ms for generating each volumetric reconstruction.

Even though training used NPCC as in Eq. (13). we investigated two additional metrics for testing: PE and the Wasserstein distance^96,97. We also quantified test performance using the SSIM⁹⁸.

PE is the mean absolute error between two binary objects; in the digital communication community it is instead referred to as Bit Error Rate (BER). It is the first time to our knowledge to use PE as a quantitative metric in tomography. To obtain the PE, we first threshold the reconstructions (see SI Section S7 for details in the thresholding process) and then define

$${\mathrm{PE = }}\frac{{\left( {\# \,{\mathrm{false}}\,{\mathrm{negatives}}} \right) + \left( {\# \,{\mathrm{false}}\,{\mathrm{positives}}} \right)}}{{{\mathrm{total}}\,\# \,{\mathrm{pixels}}}}$$

(14)

We found that it oftentimes helps to accentuate the differences between a binary phase ground truth object and its binarized reconstruction as even small residual artifacts, if they are above the threshold, are thresholded to be one, and thus they are taken into account to the probability of error calculation more than they would have been to other metrics. With these procedures, PE is a particularly suitable error metric for the kind of objects we consider in this paper.

PE is also closely related to the two-dimensional Wasserstein distance as we will now show through an analytical derivation. The latter metric involves an optimization process in terms of a transport plan to minimize the total cost of transport from a source distribution to a target distribution. The two-dimensional Wasserstein distance is defined as

$$\begin{array}{*{20}{l}} {W_{p = 1}} \hfill & = \hfill & {\mathop {{{\mathrm{min}}}}\limits_p \,{\it{P,C}}} \hfill & = \hfill & {\mathop {{{\mathrm{min}}}}\limits_p \,\mathop {\sum }\limits_{ij} \mathop {\sum }\limits_{kl} \gamma _{ij,kl}C_{ij,kl}} \hfill \\ {{\mathrm{s}}.{\mathrm{t}}{\mathrm{.}}\mathop {\sum }\limits_{kl} \gamma _{ij,kl}} \hfill & = \hfill & {f_{ij},\mathop {\sum }\limits_{ij} \gamma _{ij,kl}} \hfill & = \hfill & {g_{kl},\,\gamma _{ij,kl} \ge 0} \hfill \end{array}$$

(15)

where f_ij and g_kl are a ground truth binary object and its binary reconstruction, i.e. $f_{ij},\,g_{kl},\,\gamma _{ij,kl} \in \left\{ {0,1} \right\}$, a coupling tensor $P = \left( {\gamma _{ij,kl}} \right)$, and a cost tensor $C_{ij,kl} = \left| {x_{ij} - x_{kl}} \right|.$ PE can be reduced to have a similar, but not equivalent, form to that of the Wasserstein distance. For i.j,k,l where $\gamma _{ij,kl} \, \ne \, 0$,

$$\begin{array}{*{20}{l}} {{\mathrm{PE}}} \hfill & = \hfill & {\displaystyle \frac{1}{{N^2}}\mathop {\sum }\limits_{ij} \left| {f_{ij} - g_{ij}} \right|} \hfill \\ {\,} \hfill & = \hfill & {\displaystyle \frac{1}{{N^2}}\mathop {\sum }\limits_{ij} \left| {f_{ij} - \mathop {\sum }\limits_{kl} g_{kl}\delta \left[ {i - k,j - l} \right]} \right|} \hfill \\ {\,} \hfill & = \hfill & {\displaystyle \frac{1}{{N^2}}\mathop {\sum }\limits_{ij} \left| {\mathop {\sum }\limits_{kl} \gamma _{ij,kl} \displaystyle \left( {1 - \frac{{g_{kl}\delta \left[ {i - k,j - l} \right]}}{{\gamma _{ij,kl}}}} \right)} \right|} \hfill \\ {\,} \hfill & = \hfill & {\mathop {\sum }\limits_{ij} \left| {\mathop {\sum }\limits_{kl} \gamma _{ij,kl}\,\tilde C_{ij,kl}} \right|} \hfill \\ {\,} \hfill & = \hfill & {\mathop {\sum }\limits_{ij,kl,\gamma _{ij,kl} \ne 0} \gamma _{ij,kl}\,\tilde C_{ij,kl}} \hfill \end{array}$$

(16)

where $N^2\,\tilde C_{ij,kl} = 1 - \frac{{g_{kl}\delta \left[ {i - k,j - l} \right]}}{{\gamma _{ij,kl}}} = \left\{ {\begin{array}{*{20}{l}} {1,} \hfill & {if\,ij \, \ne \, kl} \hfill \\ {1 - g_{kl}} \hfill & {if\,ij = kl} \hfill \end{array}} \right.$

This shows that the PE is a version of the Wasserstein distance with differently defined cost tensor.

Change history

03 September 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41377-021-00615-5

References

Radon, J. On the determination of functions from their integral values along certain manifolds. IEEE Trans. Med. Imaging 5, 170–176 (1986).
Article Google Scholar
Radon, J. On the determination of functions from their integrals along certain manifolds. Ber. Saechsische Akademie Wissenschaften 29, 262–277 (1917).
Google Scholar
Bracewell, R. N. & Riddle, A. C. Inversion of fan-beam scans in radio astronomy. Astrophysical J. 150, 427 (1967).
Article ADS Google Scholar
Feldkamp, L. A., Davis, L. C. & Kress, J. W. Practical cone-beam algorithm. J. Optical Soc. Am. A 1, 612–619 (1984).
Article ADS Google Scholar
Dreike, P. & Boyd, D. P. Convolution reconstruction of fan beam projections. Computer Graph. Image Process. 5, 459–469 (1976).
Article Google Scholar
Wang, G. et al. A general cone-beam reconstruction algorithm. IEEE Trans. Med. Imaging 12, 486–496 (1993).
Article Google Scholar
Kudo, H. & Saito, T. Helical-scan computed tomography using cone-beam projections. In Proc. Conference Record of the 1991 IEEE Nuclear Science Symposium and Medical Imaging Conference 1958–1962 (IEEE, 1991).
Grangeat, P. in Mathematical Methods in Tomography (eds Herman, G. T., Louis, A. K. & Natterer, F.) 66–97 (Springer, 1991).
Katsevich, A. Analysis of an exact inversion algorithm for spiral cone-beam CT. Phys. Med. Biol. 47, 2583–2597 (2002).
Article Google Scholar
Choi, W. et al. Tomographic phase microscopy. Nat. Methods 4, 717–719 (2007).
Article Google Scholar
Delaney, A. H. & Bresler, Y. Globally convergent edge-preserving regularized reconstruction: an application to limited-angle tomography. IEEE Trans. Image Process. 7, 204–221 (1998).
Article ADS Google Scholar
Bartolac, S. et al. A local shift‐variant Fourier model and experimental validation of circular cone‐beam computed tomography artifacts. Med. Phys. 36, 500–512 (2009).
Article Google Scholar
Lim, J. W. et al. Comparative study of iterative reconstruction algorithms for missing cone problems in optical diffraction tomography. Opt. Express 23, 16933–16948 (2015).
Article ADS Google Scholar
Ishimaru, A. Electromagnetic Wave Propagation, Radiation, and Scattering: From Fundamentals to Applications. 1st edn (Prentice-Hall, 1991).
Tatarski, V. I. Wave Propagation in a Turbulent Medium (Dover Publications, 2016).
Wolf, E. Three-dimensional structure determination of semi-transparent objects from holographic data. Opt. Commun. 1, 153–156 (1969).
Article ADS Google Scholar
Devaney, A. J. Inverse-scattering theory within the Rytov approximation. Opt. Lett. 6, 374–376 (1981).
Article ADS Google Scholar
Pham, T. A. et al. Three-dimensional optical diffraction tomography with Lippmann-Schwinger model. IEEE Trans. Comput. Imaging 6, 727–738 (2020).
Article MathSciNet Google Scholar
Marks, D. L. A family of approximations spanning the Born and Rytov scattering series. Opt. Express 14, 8837–8848 (2006).
Article ADS Google Scholar
Milgram, J. H. & Li, W. C. Computational reconstruction of images from holograms. Appl. Opt. 41, 853–864 (2002).
Article ADS Google Scholar
Tian, L. et al. Quantitative measurement of size and three-dimensional position of fast-moving bubbles in air-water mixture flows using digital holography. Appl. Opt. 49, 1549–1554 (2010).
Article ADS Google Scholar
Hahn, J. et al. Wide viewing angle dynamic holographic stereogram with a curved array of spatial light modulators. Opt. Express 16, 12372–12386 (2008).
Article ADS Google Scholar
Park, J. H., Hong, K. & Lee, B. Recent progress in three-dimensional information processing based on integral imaging. Appl. Opt. 48, H77–H94 (2009).
Article ADS Google Scholar
Nehmetallah, G. & Banerjee, P. P. Applications of digital and analog holography in three-dimensional imaging. Adv. Opt. Photonics 4, 472–553 (2012).
Article ADS Google Scholar
Williams, L., Nehmetallah, G. & Banerjee, P. P. Digital tomographic compressive holographic reconstruction of three-dimensional objects in transmissive and reflective geometries. Appl. Opt. 52, 1702–1710 (2013).
Article ADS Google Scholar
Brady, D. J. et al. Compressive holography. Opt. Express 17, 13040–13049 (2009).
Article ADS Google Scholar
Choi, K. et al. Compressive holography of diffuse objects. Appl. Opt. 49, H1–H10 (2010).
Article Google Scholar
Rivenson, Y. et al. Phase recovery and holographic image reconstruction using deep learning in neural networks. Light.: Sci. Appl. 7, 17141 (2018).
Article Google Scholar
Wu, Y. C. et al. Bright-field holography: cross-modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram. Light.: Sci. Appl. 8, 25 (2019).
Article ADS Google Scholar
Rivenson, Y., Wu, Y. C. & Ozcan, A. Deep learning in holography and coherent imaging. Light.: Sci. Appl. 8, 85 (2019).
Article ADS Google Scholar
Zhang, W. H. et al. Twin-image-free holography: a compressive sensing approach. Phys. Rev. Lett. 121, 093902 (2018).
Article ADS Google Scholar
Kamilov, U. S. et al. A recursive born approach to nonlinear inverse scattering. IEEE Signal Process. Lett. 23, 1052–1056 (2016).
Article ADS Google Scholar
Kamilov, U. S. et al. Optical tomographic image reconstruction based on beam propagation and sparse regularization. IEEE Trans. Comput. Imaging 2, 59–70 (2016).
Article MathSciNet Google Scholar
Giorgi, G. et al. Application of the inhomogeneous Lippmann–Schwinger equation to inverse scattering problems. SIAM J. Appl. Math. 73, 212–231 (2013).
Article MathSciNet MATH Google Scholar
Chew, W. C. & Wang, Y. M. Reconstruction of two-dimensional permittivity distribution using the distorted Born iterative method. IEEE Trans. Med. Imaging 9, 218–225 (1990).
Article Google Scholar
Sun, Y., Xia, Z. H. & Kamilov, U. S. Efficient and accurate inversion of multiple scattering with deep learning. Opt. Express 26, 14678–14688 (2018).
Article ADS Google Scholar
Lu, Z. Q. Multidimensional structure diffraction tomography for varying object orientation through generalised scattered waves. Inverse Probl. 1, 339–356 (1985).
Article ADS MathSciNet MATH Google Scholar
Lu, Z. Q. JKM perturbation theory, relaxation perturbation theory, and their applications to inverse scattering: theory and reconstruction algorithms. IEEE Trans. Ultrason. Ferroelectr. Frequency Control 33, 722–730 (1986).
Article ADS Google Scholar
Tsihrintzis, G. A. & Devaney, A. J. Higher order (nonlinear) diffraction tomography: Inversion of the Rytov series. IEEE Trans. Inf. Theory 46, 1748–1761 (2000).
Article MathSciNet MATH Google Scholar
Feit, M. D. & Fleck, J. A. Computation of mode properties in optical fiber waveguides by a propagating beam method. Appl. Opt. 19, 1154–1164 (1980).
Article ADS Google Scholar
Kamilov, U. S. et al. Learning approach to optical tomography. Optica 2, 517–522 (2015).
Article ADS Google Scholar
Shoreh, M. H. et al. Optical tomography based on a nonlinear model that handles multiple scattering. In Proc. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing 6220–6224 (IEEE, 2017).
Lim, J. et al. Learning tomography assessed using Mie theory. Phys. Rev. Appl. 9, 034027 (2018).
Article ADS Google Scholar
Chowdhury, S. et al. High-resolution 3D refractive index microscopy of multiple-scattering samples from intensity images. Optica 6, 1211–1219 (2019).
Article ADS Google Scholar
Goy, A. et al. High-resolution limited-angle phase tomography of dense layered objects using deep neural networks. Proc. Natl Acad. Sci. USA 116, 19848–19856 (2019).
Article ADS Google Scholar
Bertero, M. & Boccacci, P. Introduction to Inverse Problems in Imaging (IOP Publishing Ltd., 1998).
Candès, E. J., Romberg, J. & Tao, T. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006).
Article MathSciNet MATH Google Scholar
Gregor, K. & LeCun, Y. Learning fast approximations of sparse coding. In Proc. 27th International Conference on Machine Learning 399–406 (ACM, 2010).
Barbastathis, G., Ozcan, A. & Situ, G. On the use of deep learning for computational imaging. Optica 6, 921–943 (2019).
Article ADS Google Scholar
Jin, K. H. et al. Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26, 4509–4522 (2017).
Article ADS MathSciNet MATH Google Scholar
Jacobs, O. L. R. Introduction to Control Theory (Oxford University Press, 1993).
Mardani, M. et al. Deep generative adversarial networks for compressed sensing automates MRI. Preprint at https://arxiv.org/abs/1706.00051 (2017).
Mardani, M. et al. Recurrent generative adversarial networks for proximal learning and automated compressive image recovery. Preprint at https://arxiv.org/abs/1711.10046 (2017).
Daubechies, I., Defrise, M. & De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004).
Article MathSciNet MATH Google Scholar
Williams, R. J. & Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1, 270–280 (1989).
Article Google Scholar
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Article Google Scholar
Shi, X. J. et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In Proc. 28th International Conference on Neural Information Processing Systems 802–810 (MIT Press, 2015).
Wang, Y. B. et al. Eidetic 3D LSTM: A model for video prediction and beyond. In Proc. International Conference on Learning Representations (OpenReview.net, 2019).
Wang, Y. B. et al. PredRNN: recurrent neural networks for predictive learning using spatiotemporal LSTMs. In Proc. 31st Conference on Neural Information Processing Systems 879–888 (ACM, 2017).
Wang, Y. B. et al. PredRNN+ +: towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proc. 35th International Conference on Machine Learning. (PMLR, 2018).
Kumar, A. C. S., Bhandarkar, S. M. & Prasad, M. DepthNet: a recurrent neural network architecture for monocular depth prediction. In Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops 283–291 (IEEE, 2018).
Wang, W. Y. et al. Shape inpainting using 3D generative adversarial network and recurrent convolutional networks. Proc. 2017 IEEE International Conference on Computer Vision (IEEE, 2017, 2298-2306.
Liu, J. & Ji, S. P.A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset. In Proc 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition 6050–6059(IEEE, 2020).
Choy, C. B. et al. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In Proc. 14th European Conference on Computer Vision 628–644 (Springer, 2016).
Le, T., Bui, G. & Duan, Y. A multi-view recurrent neural network for 3D mesh segmentation. Comput. Graph. 66, 103–112 (2017).
Article Google Scholar
Stollenga, M. F. et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. In Proc. 28th International Conference on Neural Information Processing Systems (MIT Press, 2015).
Hou, Y. X., Kannala, J. & Solin, A. Multi-view stereo by temporal nonparametric fusion. In Proc. 2019 IEEE/CVF International Conference on Computer Vision 2651–2660 (IEEE, 2019).
Cierniak, R. A new approach to image reconstruction from projections using a recurrent neural network. Int. J. Appl. Math. Comput. Sci. 18, 147–157 (2008).
Article MathSciNet Google Scholar
Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proc. 2014 Conference on Empirical Methods in Natural Language Processing 1724–1734 (Association for Computational Linguistics, 2014).
Kang, I., Goy, A. & Barbastathis, G. Limited-angle tomographic reconstruction of dense layered objects by dynamical machine learning. Preprint at https://arxiv.org/abs/2007.10734 (2020).
Bioucas-Dias, J. M. & Figueiredo, M. A. T. A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16, 2992–3004 (2007).
Article ADS MathSciNet Google Scholar
Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th International Conference on Machine Learning (ACM, 2010).
Deng, M. et al. On the interplay between physical and content priors in deep learning for computational imaging. Opt. Express 28, 24152–24170 (2020).
Article ADS Google Scholar
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).
Lukoševicius, M. & Jaeger, H. Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127–149 (2009).
Article MATH Google Scholar
Lukoševicius, M., Jaeger, H. & Schrauwen, B. Reservoir computing trends. KI-K.ünstliche Intell. 26, 365–371 (2012).
Article Google Scholar
Schrauwen, B., Verstraeten, D. & Van Campenhout, J. An overview of reservoir computing: theory, applications and implementations. In Proc. 15th European Symposium on Artificial Neural Networks 471–482 (Catholic University of Louvain, 2007).
Goy, A. et al. Low photon count phase retrieval using deep learning. Phys. Rev. Lett. 121, 243902 (2018).
Article ADS Google Scholar
Beck, A. & Teboulle, M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419–2434 (2009).
Article ADS MathSciNet MATH Google Scholar
Chambolle, A. An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004).
Article MathSciNet MATH Google Scholar
Beck, A. & Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009).
Article MathSciNet MATH Google Scholar
Chollet, F. Xception: deep learning with depthwise separable convolutions. In Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017).
Gonda, F. et al. Parallel separable 3D convolution for video and volumetric data understanding. Preprint at https://arxiv.org/abs/1809.04096 (2018).
Dey, R. & Salem, F. M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proc. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems 1597–1600 (IEEE, 2017).
Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (Society for Artificial Intelligence and Statistics, 2011).
Sinha, A. et al. Lensless computational imaging through deep learning. Optica 4, 1117–1125 (2017).
Article ADS Google Scholar
Gehring, J. et al. A convolutional encoder model for neural machine translation. In Proc. 55th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2016).
Hori, T. et al. Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM. In Proc. Interspeech 2017 949–953 (International Speech Communication Association, 2017).
Zhao, R. et al. Learning to monitor machine health with convolutional bi-directional LSTM networks. Sensors 17, 273 (2017).
Article ADS Google Scholar
He, K. M. et al. Deep residual learning for image recognition. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proc. 18th International Conference on Medical Image Computing and Computer-Assisted Intervention 234–241(Springer, 2015).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems 5998–6008(NIPS, 2017).
Li, S. et al. Imaging through glass diffusers using densely connected convolutional networks. Optica 5, 803–813 (2018).
Article ADS Google Scholar
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Reuther, A. et al. Interactive supercomputing on 40,000 cores for machine learning and data analysis. In Proc. 2018 IEEE High Performance Extreme Computing Conference 1–6 (IEEE, 2018).
Villani, C. Topics in Optimal Transportation (American Mathematical Society, 2003).
Kolouri, S. et al. Optimal mass transport: signal processing and machine-learning applications. IEEE Signal Process. Mag. 34, 43–59 (2017).
Article ADS Google Scholar
Wang, Z. et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Article ADS Google Scholar

Download references

Acknowledgements

The authors acknowledge funding from Intelligence Advanced Research Projects Activity (FA8650-17-C-9113). I. K. acknowledges partial support from Korea Foundation for Advanced Studies (KFAS) scholarship. We are grateful to Jungmoon Ham for her assistance with drawing Figs. 1 and 17, and to Subeen Pang, Mo Deng, and Peter So for useful discussions and suggestions. The authors acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing (HPC, database, consultation) resources that have contributed to the research results reported within this paper.

Author information

Alexandre Goy
Present address: Omnisens SA, Morges, 1110, Switzerland

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, USA
Iksung Kang
Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Alexandre Goy & George Barbastathis
Singapore-MIT Alliance for Research and Technology (SMART) Centre, 1 Create Way, Singapore, 117543, Singapore
George Barbastathis

Authors

Iksung Kang
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre Goy
View author publications
You can also search for this author in PubMed Google Scholar
George Barbastathis
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

I.K. and G.B. conceived the research and devised the dynamical machine learning concept; I.K. developed the machine learning algorithm and performed the simulations, the neural network training, image processing, and data analysis; A.G. performed the strong-scattering optical experiments, and all authors contributed to the preparation of the paper and the discussion.

Corresponding author

Correspondence to Iksung Kang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Supplementary information

Supplementary movie 1

Supplementary movie 2

Supplementary movie 3

Supplementary movie 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kang, I., Goy, A. & Barbastathis, G. Dynamical machine learning volumetric reconstruction of objects’ interiors from limited angular views. Light Sci Appl 10, 74 (2021). https://doi.org/10.1038/s41377-021-00512-x

Download citation

Received: 17 September 2020
Revised: 04 March 2021
Accepted: 13 March 2021
Published: 07 April 2021
DOI: https://doi.org/10.1038/s41377-021-00512-x

This article is cited by

Artificial intelligence-enabled quantitative phase imaging methods for life sciences
- Juyeon Park
- Bijie Bai
- YongKeun Park
Nature Methods (2023)
Attentional Ptycho-Tomography (APT) for three-dimensional nanoscale X-ray imaging with minimal data acquisition and computation time
- Iksung Kang
- Ziling Wu
- George Barbastathis
Light: Science & Applications (2023)
Adaptive 3D descattering with a dynamic synthesis network
- Waleed Tahir
- Hao Wang
- Lei Tian
Light: Science & Applications (2022)
Computer-free computational imaging: optical computing for seeing through random media
- Yunzhe Li
- Lei Tian
Light: Science & Applications (2022)