Abstract
Volumetric imaging of samples using fluorescence microscopy plays an important role in various fields, including the physical, medical and life sciences. Here we report a deep learning-based volumetric image inference framework that uses 2D images sparsely captured by a standard wide-field fluorescence microscope at arbitrary axial positions within the sample volume. Through a recurrent convolutional neural network, which we term RecurrentMZ, 2D fluorescence information from a few axial planes within the sample is explicitly incorporated to digitally reconstruct the sample volume over an extended depth-of-field. Using experiments on C. elegans and nanobead samples, RecurrentMZ is demonstrated to significantly increase the depth-of-field of a 63×/1.4NA objective lens, also providing a 30-fold reduction in the number of axial scans required to image the same sample volume. We further illustrate the generalization of this recurrent network for 3D imaging by showing its resilience to varying imaging conditions, including e.g., different sequences of input images, covering various axial permutations and unknown axial positioning errors. We also demonstrate wide-field to confocal cross-modality image transformations using the RecurrentMZ framework, performing 3D image reconstruction of a sample from a few wide-field 2D fluorescence images as input, matching confocal microscopy images of the same sample volume. RecurrentMZ demonstrates the first application of recurrent neural networks in microscopic image reconstruction and provides a flexible and rapid volumetric imaging framework, overcoming the limitations of current 3D scanning microscopy tools.
Introduction
High-throughput imaging of 3D samples is of significant importance for numerous fields. Volumetric imaging is usually achieved through optical sectioning of samples using various microscopy techniques. Generally, optical sectioning can be categorized based on its dimension of sectioning: (i) 0-dimensional, point-wise sectioning, including e.g., confocal^{1}, two-photon^{2} and three-photon^{3} laser scanning microscopy, and time-domain optical coherence tomography (TD-OCT)^{4}; (ii) 1-dimensional, line-wise sectioning, including e.g., spectral-domain OCT^{5,6}; and (iii) 2-dimensional, plane-wise sectioning, including e.g., wide-field and light-sheet^{7} fluorescence microscopy. In all of these modalities, serial scanning of the sample volume is required, which limits the imaging speed and throughput, reducing the temporal resolution and introducing potential photobleaching of the sample. Different imaging methods have been proposed to improve the throughput of scanning-based 3D microscopy techniques, such as multifocal imaging^{8,9,10,11,12,13}, light-field microscopy^{14,15}, microscopy with engineered point spread functions (PSFs)^{16,17,18} and compressive sensing^{19,20,21}. Nevertheless, these solutions introduce trade-offs, either by complicating the microscope system design, compromising the image quality and/or resolution, or prolonging the image post-processing time. In addition, iterative algorithms that aim to solve the inverse 3D imaging problem from a lower-dimensional projection of the volumetric image data, such as the fast iterative shrinkage and thresholding algorithm (FISTA)^{22} and the alternating direction method of multipliers (ADMM)^{23}, are relatively time-consuming and unstable, and further require user-defined regularization of the optimization process as well as an accurate forward model of the imaging system. Some of these limitations and performance trade-offs have partially restricted the wide-scale applicability of these computational methods for 3D microscopy.
In recent years, emerging deep learning-based approaches have enabled a new set of powerful tools to solve various inverse problems in microscopy^{24,25}, including e.g., super-resolution imaging^{26,27}, virtual labeling of specimens^{28,29,30,31,32}, holographic imaging^{33,34}, Fourier ptychographic microscopy^{35}, single-shot autofocusing^{36,37} and three-dimensional image propagation^{38}, among many others^{39}. Benefiting from the recent advances in deep learning, these methods require minimal modification to the underlying microscopy hardware, and result in enhanced imaging performance in comparison to conventional image reconstruction and post-processing algorithms.
The majority of these neural networks applied in microscopic imaging were designed to perform inference using a single 2D input image. An alternative method to adapt a deep network’s inference ability to utilize information that is encoded over volumetric inputs (instead of a single 2D input image) is to utilize 3D convolution kernels. However, this approach requires a significant number of additional trainable parameters and is therefore more susceptible to overfitting. Moreover, simply applying 3D convolution kernels and representing the input data as a sequence of 2D images would constrain the input sampling grid and introduce practical challenges. As an alternative to 3D convolution kernels, recurrent neural networks (RNNs) were originally designed for sequential temporal inputs, and have been successfully applied in various tasks in computer vision^{40,41,42}.
Here, we introduce the first RNN-based volumetric microscopy framework, which is also the first application of RNNs in microscopic image reconstruction; we term this framework RecurrentMZ. RecurrentMZ permits the digital reconstruction of a sample volume over an extended depth-of-field (DOF) using a few different 2D images of the sample as inputs to a trained RNN (see Fig. 1a). The input 2D images are sparsely sampled at arbitrary axial positions within the sample volume, and the convolutional recurrent neural network (RecurrentMZ) takes these 2D microscopy images as its input, along with a set of digital propagation matrices (DPMs) that indicate the relative distances (dz) to the desired output plane(s). Information from the input images is separately extracted using sequential convolution blocks at different scales, and then the recurrent block aggregates all these features from the previous scans/images, allowing flexibility in terms of the length of the input image sequence as well as the axial positions of the input images, which do not need to be regularly spaced or sampled; in fact, the input 2D images can even be randomly permuted.
We demonstrate the efficacy of RecurrentMZ using multiple fluorescent specimens. First, we demonstrate RecurrentMZ inference for 3D imaging of C. elegans samples, and then quantify its performance using fluorescent nanobeads. Our results demonstrate that RecurrentMZ significantly increases the depth-of-field of a 63×/1.4NA objective lens, providing a 30-fold reduction in the number of axial scans required to image a sample volume. Furthermore, we demonstrate the robustness of this framework and its inference to axial permutations of the input images, as well as to uncontrolled errors and noise in the axial positioning of the different input image scans. Finally, we report wide-field to confocal cross-modality image transformation using the RecurrentMZ framework, which takes in e.g., three wide-field 2D fluorescence images of a sample as input in order to reconstruct a 3D image stack matching confocal microscopy images of the same sample; we refer to this cross-modality image transformation network as RecurrentMZ+.
Results
We formulate the target sample volume V(x,y,z) as a random field on the set of all discretized axial positions Z, i.e., \(I_z \in {\mathbb {R}}^{m \times n},z \in Z\), where x,y are pixel indices on the lateral plane, m, n are the lateral dimensions of the image, and z is a certain axial position in Z. The distribution of such random fields is defined by the 3D distribution of the sample of interest, the PSF of the microscopy system, the aberrations and random noise terms present in the image acquisition system. RecurrentMZ takes in a set of M 2D axial images, i.e., \(\left\{ {I_{z_1},I_{z_2}, \cdots ,I_{z_M}} \right\},\,1\, < \,M \ll Z\), where Z is the cardinality of Z, defining the number of unique axial planes in the target sample. The output inference of RecurrentMZ estimates (i.e., reconstructs) the volume of the sample and will be denoted as \(V_M(x,y,z;\,I_{z_1},I_{z_2}, \cdots ,I_{z_M})\). Starting with the next subsection we summarize RecurrentMZ inference results using different fluorescent samples.
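To make this input representation concrete, the pairing of each 2D scan with its DPM can be sketched as follows. This is an illustrative NumPy sketch; the function name, shapes and channel ordering are our assumptions rather than the actual RecurrentMZ implementation.

```python
import numpy as np

def make_network_input(images, z_in, z_out):
    """Pair each 2D input scan with its digital propagation matrix (DPM):
    a uniform m-by-n matrix holding the relative axial distance
    dz = z_out - z_i to the desired output plane. Shapes and the
    channel ordering are illustrative assumptions."""
    m, n = images[0].shape
    sequence = []
    for img, z_i in zip(images, z_in):
        dpm = np.full((m, n), z_out - z_i)             # constant matrix of dz values
        sequence.append(np.stack([img, dpm], axis=-1))  # (m, n, 2) per scan
    return np.stack(sequence, axis=0)                   # (M, m, n, 2) input sequence
```

Each of the M sequence elements then carries both the measured intensities and its axial offset to the target plane, which is what lets the recurrent block fuse scans taken at arbitrary depths.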
RecurrentMZ based volumetric imaging of C. elegans samples
A RecurrentMZ network was trained and validated using C. elegans samples, and then blindly tested on new specimens that were not part of the training/validation dataset. This trained RecurrentMZ was used to reconstruct C. elegans samples with high fidelity over an extended axial range of 18 μm based on three 2D input images that were captured with an axial spacing of Δz = 6 μm; these three 2D images were fed into RecurrentMZ in groups of two, i.e., M = 2 (Fig. 2). The comparison images of the same sample volume were obtained by scanning a wide-field fluorescence microscope with a 63×/1.4NA objective lens and capturing Z = 91 images with an axial spacing of Δz = 0.2 μm (see the Materials and Methods section). The inference performance of RecurrentMZ is both qualitatively and quantitatively demonstrated in Fig. 2 and Video S1. Even in the middle of two adjacent input images (see the z = 11.4 μm row of Fig. 2), RecurrentMZ is able to output images with a very good match to the ground truth, achieving a normalized root mean square error (NRMSE) of 6.45 and a peak signal-to-noise ratio (PSNR) of 33.96. As also highlighted in Video S1, RecurrentMZ is able to significantly extend the axial range of the reconstructed images using only three 2D input scans, each captured with a 1.4NA objective lens that has a depth-of-field of 0.4 μm. In addition, Supplementary Note 1 and Fig. S1 compare the output images of RecurrentMZ with the results of various interpolation algorithms, further demonstrating the advantages of the RecurrentMZ framework for volumetric imaging.
It is worth noting that although the RecurrentMZ presented in Fig. 2 was trained with 2 input images (i.e., M = 2), it can still be fed with M ≥ 3 input images thanks to its recurrent scheme. Regardless of the choice of M, all RecurrentMZ networks have the same number of parameters; the only difference is the additional time required during the training and inference phases. For example, the inference time of RecurrentMZ with M = 2 and M = 3 for a single output plane (1024 × 1024 pixels) is 0.18 s and 0.28 s, respectively. In practice, using a larger M yields better reconstruction fidelity (see e.g., Fig. S2a), at the cost of imaging throughput and computation time. A detailed discussion of this trade-off is provided in the Discussion section.
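The parameter count is independent of M because a recurrent block reuses the same weights at every step of the input sequence. A minimal sketch of this idea, with a plain tanh RNN cell standing in for the convolutional recurrent block (all names and shapes are illustrative assumptions, not the actual architecture):

```python
import numpy as np

def recurrent_aggregate(features, W_h, W_x, b):
    """Fold a variable-length sequence of feature vectors into a single
    hidden state. The same weights (W_h, W_x, b) are applied at every
    step, so the parameter count does not depend on the number of input
    scans M -- the property noted in the text. A simple tanh RNN cell
    stands in for the convolutional recurrent unit."""
    h = np.zeros(b.shape)
    for x in features:                      # one recurrence step per input scan
        h = np.tanh(W_h @ h + W_x @ x + b)
    return h
```

The same weight matrices can process a sequence of 2, 3 or more feature maps; only the number of recurrence steps (and hence the runtime) grows with M.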
RecurrentMZ based volumetric imaging of fluorescence nanobeads
Next, we demonstrated the performance of RecurrentMZ using 50 nm fluorescence nanobeads. These nanobead samples were imaged through the TxRed channel using a 63×/1.4NA objective lens (see the Materials and Methods section). The RecurrentMZ model was trained on a dataset with M = 3 input images, where the axial spacing between adjacent planes was Δz = 3 μm. The ground truth images of the sample volume were captured by mechanical scanning over an axial range of 10 μm, i.e., Z = 101 images with Δz = 0.1 μm were obtained. Figure 3 shows both the side views and the cross-sections of the sample volume reconstructed by RecurrentMZ (M = 3), compared against the Z = 101 images captured through mechanical scanning of the same sample. The first column of Fig. 3a presents the M = 3 input images and their corresponding axial positions, which are also indicated by the blue dashed lines. The quantitative histogram comparison shown in Fig. 3b reveals that the volume reconstructed by RecurrentMZ matches the ground truth volume with high fidelity. For example, the full width at half maximum (FWHM) distribution of individual nanobeads inferred by RecurrentMZ (mean FWHM = 0.4401 μm) matches the ground truth results (mean FWHM = 0.4428 μm) very well. We also quantified the similarity of the ground truth histogram to that of the RecurrentMZ output by calculating the Kullback–Leibler (KL) divergence, a distance measure between two distributions; the resulting KL divergence of 1.3373 further validates the high fidelity of the RecurrentMZ reconstruction when compared to the ground truth, acquired through Z = 101 images captured via mechanical scanning of the sample with Δz = 0.1 μm.
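For reference, a discrete KL divergence between two empirical distributions, such as the FWHM histograms above, can be computed as in the following sketch; the bin count and smoothing constant are our assumptions, not values from the paper.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=32, eps=1e-12):
    """Discrete KL divergence D(P||Q) between histograms of two sample
    sets, built on a shared bin grid so the bins are comparable. eps
    avoids log(0) for empty bins; bins=32 is an arbitrary choice."""
    p_samples = np.asarray(p_samples)
    q_samples = np.asarray(q_samples)
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps   # normalize counts to probabilities
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```

A value near zero indicates closely matching distributions; larger values indicate a growing mismatch, as in the Deep-Z comparison discussed next.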
Figure 3 also reports the comparison of RecurrentMZ inference results with respect to another fluorescence image propagation network, termed Deep-Z^{38}. Deep-Z is designed to take a single 2D image as input, and therefore there is an inherent trade-off between the propagation quality and the axial refocusing range (from a given focal plane), which ultimately limits the effective volumetric space-bandwidth product (SBP) that can be achieved using Deep-Z. In this comparison between RecurrentMZ and Deep-Z (Fig. 3), the nearest input image is used for Deep-Z-based propagation; in other words, three non-overlapping volumes are separately inferred using Deep-Z from the input scans at z = 3, 6 and 9 μm, respectively (this provides a fair comparison against RecurrentMZ with M = 3 input images). As illustrated in Fig. 3b, Deep-Z inference resulted in a mean FWHM of 0.4185 μm and a KL divergence of 2.3334, which illustrate the inferiority of single-image-based volumetric propagation when compared to the results of RecurrentMZ. The same conclusion regarding the performance comparison of RecurrentMZ and Deep-Z inference is further supported by the C. elegans imaging data reported in Fig. 2 (RecurrentMZ) and in Fig. S3 (Deep-Z). For example, Deep-Z inference results in an NRMSE of 8.02 and a PSNR of 32.08, while RecurrentMZ (M = 2) improves the inference accuracy, achieving an NRMSE of 6.45 and a PSNR of 33.96.
Generalization of RecurrentMZ to non-uniformly sampled input images
Next, we demonstrated, through a series of experiments, the generalization performance of RecurrentMZ on non-uniformly sampled input images, in contrast to the training regimen, which only included uniformly spaced inputs. These non-uniformly spaced input image planes were randomly selected from the same testing volume as shown in Fig. 2, with the distance between two adjacent input planes made smaller than the uniform axial spacing used in the training dataset (Δz = 6 μm). Although RecurrentMZ was solely trained with equidistant input scans, it generalized to successfully perform volumetric image propagation using non-uniformly sampled input images. For example, as shown in Fig. 4a, the input images of RecurrentMZ were randomly selected at (z_{1}, z_{2}, z_{3}) = (3, 7.8, 13.6) μm, and the output inference at z = 6.8 μm and z = 12.8 μm very well matches the output of RecurrentMZ using uniformly sampled inputs acquired at (z_{1}, z_{2}, z_{3}) = (3, 9, 15) μm. Figure 4b further demonstrates the inference performance of RecurrentMZ using non-uniformly sampled inputs throughout the specimen volume. The blue (uniform inputs) and red (non-uniform inputs) curves in Fig. 4b have very similar trends, illustrating the generalization of RecurrentMZ, despite it being trained only with uniformly sampled input images with a fixed Δz. Figure S4 further presents another successful blind inference of RecurrentMZ on non-uniformly sampled input images. On the other hand, the gray curve in Fig. 4b (3D U-Net with the same non-uniform inputs) clearly illustrates the generalization failure of a non-recurrent convolutional neural network (CNN) on non-uniformly sampled input images.
We further investigated the effect of the hyperparameter Δz on the performance of RecurrentMZ. For this, three different RecurrentMZ networks were trained using Δz = 4, 6, and 8 μm, respectively, and then blindly tested on a new input sequence with Δz = 6 μm. Figure 4c, d show the tradeoff between the peak performance and the performance consistency over the inference axial range: by decreasing Δz, RecurrentMZ demonstrates a better peak inference performance, indicating that more accurate propagation has been learned from smaller Δz, whereas the variance of PSNR, corresponding to the performance consistency over a larger axial range, is degraded for smaller Δz.
Inference stability of RecurrentMZ
During the acquisition of the input scans, inevitable measurement errors are introduced by e.g., PSF distortions and focus drift^{42}, which jeopardize both the precision and accuracy of the axial positioning measurements. Hence, it is necessary to take these effects into consideration and examine the stability of RecurrentMZ inference. For this, RecurrentMZ was tested on the same image test set as in Fig. 2, only this time, independent and identically distributed (i.i.d.) Gaussian noise was injected into the DPM of each input image, mimicking the measurement uncertainty when acquiring the axial scans. The noise was added to the DPM as follows:

\(\hat{Z}_i = Z_i + z_{d,i}J\)

where Z_{i} is the DPM (m × n matrix) of the ith input image, z_{d,i} ~ N(0, σ^{2}), i = 1, 2, ..., M, and J is an all-one m × n matrix.
The results of this noise analysis reveal that, as illustrated in Fig. 5b, the output images of RecurrentMZ (M = 2) at z = 4.6 μm degrade as the variance of the injected noise increases, as expected. However, even at a relatively significant noise level, where the microscope stage or sample drift is represented with a standard deviation of σ = 1 μm (i.e., 2.5-fold the objective lens depth-of-field of 0.4 μm), RecurrentMZ inference successfully matches the ground truth with an NRMSE of 5.94; for comparison, the baseline inference (with σ = 0 μm) has an NRMSE of 5.03. The same conclusion also holds for the output images at z = 6.8 μm, which highlights the resilience of the RecurrentMZ framework against axial scanning errors and/or uncontrolled drifts in the sample/stage.
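The DPM perturbation used in this stability test can be sketched as follows: each DPM is a constant matrix of the axial distance dz, and a single Gaussian offset is drawn per input image and added uniformly to its DPM. Function and argument names are illustrative assumptions.

```python
import numpy as np

def perturb_dpms(dz_list, sigma, m, n, rng=None):
    """Inject i.i.d. Gaussian axial-positioning noise into each DPM,
    mimicking stage/sample drift as in the stability test described in
    the text: Z_i_hat = Z_i + z_d_i * J, with z_d_i ~ N(0, sigma^2) and
    J an all-one m-by-n matrix. Sketch only; names are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    J = np.ones((m, n))
    return [dz * J + rng.normal(0.0, sigma) * J for dz in dz_list]
```

Note that the noise is drawn once per image, not per pixel, so each perturbed DPM remains a constant matrix, consistent with an error in the measured axial position of that scan.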
Permutation invariance of RecurrentMZ
Next, we focused on post hoc interpretation^{43,44} of the RecurrentMZ framework, without any modifications to its design or the training process. For this, we explored whether the RecurrentMZ framework exhibits permutation invariance, i.e.,

\(V_M\left( I_{z_{\pi (1)}},I_{z_{\pi (2)}}, \cdots ,I_{z_{\pi (M)}} \right) = V_M\left( I_{z_1},I_{z_2}, \cdots ,I_{z_M} \right),\quad \forall \pi \in S_M\)

where S_{M} is the permutation group on M elements. To explore the permutation invariance of RecurrentMZ (see Fig. 6), the test set's input images were randomly permuted and fed into RecurrentMZ (M = 3), which was solely trained with input images sorted by z. We then quantified the RecurrentMZ outputs over all 6 permutations of the M = 3 input images, using the average RMSE (μ_{RMSE}) and the standard deviation of the RMSE (σ_{RMSE}), calculated with respect to the ground truth image I:

\(\mu_{\mathrm{RMSE}} = \frac{1}{\left| S_M \right|}\sum\limits_{\pi \in S_M} \mathrm{RMSE}\left( V_M\left( I_{z_{\pi (1)}}, \cdots ,I_{z_{\pi (M)}} \right),I \right)\)

\(\sigma_{\mathrm{RMSE}} = \sqrt{\frac{1}{\left| S_M \right|}\sum\limits_{\pi \in S_M} \left( \mathrm{RMSE}\left( V_M\left( I_{z_{\pi (1)}}, \cdots ,I_{z_{\pi (M)}} \right),I \right) - \mu_{\mathrm{RMSE}} \right)^2}\)

where RMSE(I, J) gives the RMSE between images I and J. In Fig. 6e, the red line indicates the average RMSE over the 6 permutations and the pink shaded region indicates the standard deviation of the RMSE over these 6 permutations. RMSE and RMS values were calculated based on the yellow highlighted regions of interest (ROIs) in Fig. 6. Compared with the blue line in Fig. 6e, which corresponds to the output of RecurrentMZ with the inputs sorted by z, the input image permutation results highlight the success of RecurrentMZ with different input image sequences, despite it being trained solely with depth-sorted inputs. In contrast, non-recurrent CNN architectures, such as the 3D U-Net^{45}, inevitably lead to input permutation instability as they require fixed-length, sorted input sequences; this failure of non-recurrent CNN architectures is illustrated in Fig. S5.
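The permutation test can be sketched as follows: evaluate the network over all M! orderings of the input scans and report the mean and standard deviation of the RMSE against the ground truth. The `model` interface here is a hypothetical stand-in for RecurrentMZ inference at a single output plane.

```python
import itertools
import numpy as np

def rmse(a, b):
    """Root mean square error between two images of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def permutation_stats(model, inputs, ground_truth):
    """Evaluate `model` over all M! orderings of its input images and
    return (mean RMSE, std RMSE) against the ground truth, mirroring the
    mu_RMSE / sigma_RMSE quantities defined in the text. `model` maps a
    tuple of 2D images to one output image (illustrative interface)."""
    errors = [rmse(model(perm), ground_truth)
              for perm in itertools.permutations(inputs)]
    return float(np.mean(errors)), float(np.std(errors))
```

A perfectly permutation-invariant model yields a standard deviation of exactly zero; the size of σ_RMSE therefore directly measures how order-sensitive the reconstruction is.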
We also explored different training schemes to further improve the permutation invariance of RecurrentMZ, including training with input images sorted in descending order by their relative distance (dz) to the output plane, as well as with randomly ordered input images. As shown in Fig. S6, the RecurrentMZ trained with input images sorted by depth, z, achieves the best inference performance, indicated by an NRMSE of 4.03, whereas incorporating randomly ordered inputs in the training phase results in the best generalization to different input image permutations. The analyses reported in Fig. S6 further highlight the impact of different training schemes on the inference quality and the permutation invariance of the resulting trained RecurrentMZ network.
Repetition invariance of RecurrentMZ
Next, we explored whether the RecurrentMZ framework exhibits repetition invariance. Figure 7 demonstrates the repetition invariance of RecurrentMZ when it was repeatedly fed with the input image I_{1}. The output images of RecurrentMZ in Fig. 7b show its consistency for 2, 4 and 6 repetitions of I_{1}, i.e., V_{2}(I_{1}, I_{1}), V_{4}(I_{1}, I_{1}, I_{1}, I_{1}) and V_{6}(I_{1}, I_{1}, I_{1}, I_{1}, I_{1}, I_{1}), which resulted in an RMSE of 12.30, 11.26, and 11.73, respectively. Although RecurrentMZ was never trained with repeated input images, its recurrent scheme still demonstrates the correct propagation under repeated inputs of the same 2D plane. When compared with the output of Deep-Z (i.e., Deep-Z(I_{1})) shown in Fig. 7c, RecurrentMZ, with a single input image or its repetitions, exhibits comparable reconstruction quality. Figure S7 also presents a similar comparison when M = 3, further supporting the same conclusion.
While for a single input image (I_{1} or its repetitions) the blind inference performance of RecurrentMZ is on par with Deep-Z(I_{1}), the incorporation of multiple input planes gives RecurrentMZ a superior performance over Deep-Z. As shown in the last two columns of Fig. 7b, by adding another depth image, I_{2}, the output of RecurrentMZ is significantly improved, with the RMSE decreasing to 8.78; this represents a better inference performance compared to Deep-Z(I_{1}) and Deep-Z(I_{2}), as well as the average of these two Deep-Z outputs (see Fig. 7b, c). The same conclusion is further supported in Fig. S7b, c for M = 3, demonstrating that RecurrentMZ is able to outperform Deep-Z even when all of its M input images are individually processed by Deep-Z and averaged, showing the superiority of the presented recurrent inference framework.
Demonstration of cross-modality volumetric imaging: wide-field to confocal
The presented RecurrentMZ framework can also be applied to perform cross-modality volumetric imaging, e.g., from wide-field to confocal, where the network takes in a few wide-field 2D fluorescence images (input) to infer at its output a volumetric image stack matching the fluorescence images of the same sample obtained by a confocal microscope; we term this cross-modality image transformation framework RecurrentMZ+. To experimentally demonstrate this unique capability, RecurrentMZ+ was trained using wide-field (input) and confocal (ground truth) image pairs corresponding to C. elegans samples (see the Materials and Methods section for details). Figure 8 and Movie S3 report blind-testing results on new images never used in the training phase. In Fig. 8, M = 3 wide-field images captured at z = 2.8, 4.8, and 6.8 μm were fed into RecurrentMZ+ as input images and were virtually propagated onto axial planes from 0 to 9 μm with 0.2 μm spacing; the resulting RecurrentMZ+ output images provided a very good match to the corresponding confocal 3D image stack obtained by mechanical scanning (also see Movie S3). Figure 8b further illustrates the maximum intensity projection (MIP) side views (xz and yz), showing the high fidelity of the image stack reconstructed by RecurrentMZ+ with respect to the mechanical confocal scans. In contrast to the wide-field image stack of the same sample (with 46 image scans), where only a few neurons can be recognized in the MIP views, with deformed shapes, the image stack reconstructed by RecurrentMZ+ shows substantially sharper MIP views using only M = 3 input images, and also mitigates the neuron deformation caused by the elongated wide-field PSF, providing a comparable image quality with respect to the confocal microscopy image stack (Fig. 8b).
Discussion
We demonstrated a new deep learning-based volumetric imaging framework, termed RecurrentMZ, enabled by a convolutional recurrent neural network, which significantly extends the DOF of the microscopy system from sparse 2D scans, providing a 30-fold reduction in the number of required mechanical scans. Another advantage of RecurrentMZ is that it does not require special optical components in the microscopy setup or an optimized scanning strategy. Despite being trained with equidistant input scans, RecurrentMZ successfully generalized to input images acquired with non-uniform axial spacing as well as unknown axial positioning errors, all of which demonstrate its robustness.
In practical applications, users of RecurrentMZ should select an optimum M that balances the inference quality of the reconstructed sample volume against the imaging throughput. For example, it is possible to set a stopping threshold, ϵ, on the improvement in the volumetric reconstruction provided by adding another image/scan to RecurrentMZ, measured as the Euclidean distance from the volume reconstructed from the previous images; stated differently, the scanning can stop when e.g., ‖V_{M}(I_{1}, ..., I_{M}) − V_{M−1}(I_{1}, ..., I_{M−1})‖_{F} ≤ ϵ, where ‖·‖_{F} denotes the Frobenius norm.
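This adaptive stopping rule can be sketched as follows; `acquire_scan` and `reconstruct` are hypothetical callbacks standing in for the microscope stage control and RecurrentMZ inference, respectively.

```python
import numpy as np

def scan_until_converged(acquire_scan, reconstruct, eps, max_scans=10):
    """Keep acquiring scans until the Frobenius distance between
    successive reconstructions drops below eps, as suggested in the
    text. `acquire_scan` returns one 2D image; `reconstruct` maps a
    list of scans to a reconstructed array. Illustrative sketch only;
    max_scans is an assumed safety cap."""
    scans = [acquire_scan()]
    volume = reconstruct(scans)
    while len(scans) < max_scans:
        scans.append(acquire_scan())
        new_volume = reconstruct(scans)
        # np.linalg.norm on an array computes the Frobenius norm by default
        if np.linalg.norm(new_volume - volume) <= eps:
            return new_volume, len(scans)
        volume = new_volume
    return volume, len(scans)
```

In practice, the threshold ϵ trades reconstruction quality for throughput: a smaller ϵ admits more scans before stopping, while a larger ϵ halts the acquisition sooner.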
Importantly, this study presents the first application of convolutional recurrent neural networks in microscopic image reconstruction, and also reveals the potential of RNNs in microscopic imaging whenever sequential image data are acquired. With regard to solving inverse problems in microscopic imaging, most existing deep learning-based methods are optimized for a single shot/image, whereas sequential shots are generally convenient to obtain and substantial sample information is embedded in their 3D distribution. Through the incorporation of sequential 2D scans, the presented RecurrentMZ integrates the information of different input images from different depths to gain a considerable improvement in the volumetric image quality and the output DOF. Furthermore, the success of cross-modality image transformations using RecurrentMZ+ reveals its potential for a wide spectrum of biomedical and biological applications where confocal microscopy is frequently utilized. Using just a few wide-field 2D input images corresponding to a volumetric sample, RecurrentMZ+ is able to rapidly provide a 3D image stack that is comparable to confocal microscopic imaging of the same sample volume (see Fig. 8 and Movie S3), potentially avoiding time-consuming scanning and substantially increasing the 3D imaging throughput.
In contrast to 3D CNNs, which generally require a fixed sampling grid (see e.g., the failure of the 3D U-Net with non-uniform axial sampling in Fig. 4), the presented recurrent scheme is compatible with (1) input sequences of variable lengths, as shown in Fig. 7, and (2) input images at variable, non-uniform axial sampling, as shown in Figs. 4 and S4. In various imaging applications, where the 3D distribution of the fluorescent samples has a large axial variation, exhibiting significant spatial non-uniformity, RecurrentMZ's compatibility with variable sampling grids provides significant flexibility and a performance advantage over conventional 3D CNNs that demand a fixed sampling grid. Another interesting property that we demonstrated is the robustness of RecurrentMZ inference to input image permutations (Fig. 6), which can lead to catastrophic failure modes for standard convolutional networks, as also illustrated in Fig. S5. For 3D microscopy modalities under random, interlaced or other specific scanning modes, where the captured images cannot easily be sorted, this unique permutation invariance could empower the RecurrentMZ framework to correctly utilize the information of the input image sequence regardless of its order. One potential future application that might benefit from the above-discussed unique advantages of the RecurrentMZ framework is 3D microscopic imaging with an isotropic PSF. In general, 3D imaging with isotropic resolution can be achieved through e.g., image fusion and deconvolution from multiple views^{46,47,48,49}, during which several image stacks from different viewing angles are acquired to improve the 3D image resolution. The presented RecurrentMZ framework and its underlying core principles could potentially be applied to incorporate 2D images that are sparsely captured on a variable sampling grid in a spherical coordinate system, i.e., sampling the 3D object at variable depths from variable viewing angles.
Such a learning approach, based on the RecurrentMZ framework, might significantly reduce the number of axial scans and viewing angles needed to achieve an isotropic 3D resolution.
In summary, RecurrentMZ provides a rapid and flexible volumetric imaging framework with a reduced number of axial scans, and opens up new opportunities in machine learning-based 3D microscopic imaging. The presented recurrent neural network structure could also be widely applicable for processing sequential data resulting from various other 3D imaging modalities, such as OCT, Fourier ptychographic microscopy, holography and structured illumination microscopy, among others.
Materials and methods
Sample preparation, image acquisition and dataset preparation
The C. elegans samples were first cultured and labeled with GFP using the strain AML18. AML18 carries the genotype wtfIs3 [rab-3p::NLS::GFP + rab-3p::NLS::tagRFP] and expresses GFP and tagRFP in the nuclei of all neurons. The C. elegans samples were cultured on nematode growth medium seeded with OP50 E. coli bacteria using standard conditions. During the imaging process, the samples were washed off the plates with M9 solution, anesthetized with 3 mM levamisole, and then mounted on slides seeded with 3% agarose.
The wide-field and confocal microscopy images of C. elegans were captured by an inverted scanning microscope (TCS SP8, Leica Microsystems), using a 63×/1.4NA objective lens (HC PL APO 63×/1.4NA oil CS2, Leica Microsystems) and a FITC filter set (excitation/emission wavelengths: 495 nm/519 nm), resulting in a DOF of about 0.4 μm. A monochrome scientific CMOS camera (Leica DFC9000GTC-VSC08298) was used for wide-field imaging, where each image has 1024 × 1024 pixels and a 12-bit dynamic range; a photomultiplier tube (PMT) recorded the confocal image stacks. For each FOV, 91 images with 0.2 μm axial spacing were recorded, where the starting position of the axial scan (z = 0 μm) was set at the boundary of each worm. A total of 100 FOVs were captured and exclusively divided into training, validation and testing datasets at a ratio of 41:8:1, respectively, where the testing dataset was strictly captured on distinct worms that were not used in the training dataset.
The nanobead image dataset consists of wide-field microscopic images of 50 nm fluorescent beads, captured with a Texas Red filter set (excitation/emission wavelengths: 589 nm/615 nm). The wide-field microscopy system consists of an inverted scanning microscope (TCS SP8, Leica Microsystems) and a 63×/1.4NA objective lens (HC PL APO 63×/1.4NA oil CS2, Leica Microsystems). The nanobeads were purchased from MagSphere (PSF-050NM RED), and ultrasonicated before dilution into the heated agar solution. ~1 mL of the diluted bead-agar solution was further mixed to break down the bead clusters, and then a 2.5 µL droplet was pipetted onto a cover slip, spread and dried for imaging. Axial scanning was implemented, and the system started to record images (z = 0 μm) when a sufficient number of nanobeads could be seen in the FOV. Each volume contains 101 images with 0.1 μm axial spacing. A total of 400, 86 and 16 volumes were exclusively divided into training, validation and testing datasets, respectively.
Each captured image volume was first axially aligned using the ImageJ plugin ‘StackReg’^{50} to correct lateral stage shift and rotation. Second, an image with extended depth of field (EDF) was generated for each volume using the ImageJ plugin ‘Extended Depth of Field’^{51}. The EDF image served as a reference for the following image processing steps: (1) apply triangle thresholding to the EDF image to separate the background and foreground contents^{38}, (2) take the mean intensity of the background pixels as the shift factor and the 99th percentile of the foreground pixels as the scale factor, and (3) normalize the volume by the shift and scale factors. For RecurrentMZ+, confocal image stacks were registered to their widefield counterparts using the same feature-based registration method reported earlier^{38}. Third, the training FOVs were cropped into regions of 256 × 256 pixels without any overlap. Finally, the data loader randomly selects M images from each volume with an axial spacing of Δz = 6 μm (C. elegans) or Δz = 3 μm (nanobeads) in both the training and testing phases.
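The shift-and-scale normalization in steps (2)–(3) can be sketched as follows. This is a minimal NumPy illustration that assumes the background/foreground mask has already been obtained from triangle thresholding of the EDF image; the function name `normalize_volume` and the exact form (volume − shift)/scale are our assumptions, not taken from the released code:

```python
import numpy as np

def normalize_volume(volume, background_mask):
    """Normalize a (Z, H, W) volume by a shift factor (background mean) and a
    scale factor (99th percentile of the foreground), as described above.
    `background_mask` is a boolean (H, W) array marking background pixels of
    the reference EDF image (e.g., from triangle thresholding)."""
    shift = volume[:, background_mask].mean()               # background mean intensity
    scale = np.percentile(volume[:, ~background_mask], 99)  # 99th percentile of foreground
    return (volume - shift) / scale

# toy example: a 2-plane, 4 x 4 volume with a bright foreground corner
vol = np.full((2, 4, 4), 10.0)
vol[:, :2, :2] = 200.0
bg = np.ones((4, 4), dtype=bool)
bg[:2, :2] = False
norm = normalize_volume(vol, bg)  # background maps to 0, foreground to (200-10)/200
```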
Network structure
RecurrentMZ is based on a convolutional recurrent network^{52} design, which combines the advantages of convolutional neural networks^{39} and recurrent neural networks in processing sequential inputs^{53,54}. A common design of such a network is an encoder-decoder structure^{55,56}, with the convolutional recurrent units applied in the latent domain^{40,57,58,59}. Furthermore, inspired by the success of exploiting multiscale features in image translation tasks^{60,61,62}, a sequence of cascaded encoder-decoder pairs is utilized to exploit and incorporate image features at different scales from different axial positions.
As shown in Fig. 1b, the output of the (k−1)th encoder block, x_{k−1}, is pooled and then fed into the kth block, which can be expressed as

\(x_k = {\mathrm{Conv}}_{k,2}\left({\mathrm{ReLU}}\left({\mathrm{BN}}\left({\mathrm{Conv}}_{k,1}\left({\mathrm{ReLU}}\left({\mathrm{BN}}\left(P(x_{k-1})\right)\right)\right)\right)\right)\right)\)  (1)
where P(·) is the 2 × 2 max-pooling operation, BN(·) is batch normalization, ReLU(·) is the rectified linear unit activation function and Conv_{k,i}(·) stands for the ith convolution layer in the kth encoder block. The convolution layers in all convolution blocks have a kernel size of 3 × 3 with a stride of 1, and the numbers of channels for Conv_{k,1} and Conv_{k,2} are 20 · 2^{k−2} and 20 · 2^{k−1}, respectively. Then, x_{k} is sent to the recurrent block, where features from the sequential input images are recurrently integrated:

\(s_k = {\mathrm{Conv}}_{k,3}\left({\mathrm{RConv}}_k(x_k)\right)\)  (2)
where RConv_{k}(·) is the convolutional recurrent layer with 3 × 3 kernels and a stride of 1, and Conv_{k,3}(·) is a 1 × 1 convolution layer. Finally, at the decoder, s_{k} is concatenated with the upsampled output of the previous decoder block and fed into the kth decoder block, whose output can be expressed as

\(y_k = {\mathrm{Conv}}_{k,2}\left({\mathrm{ReLU}}\left({\mathrm{BN}}\left({\mathrm{Conv}}_{k,1}\left(s_k \oplus I(y_{k+1})\right)\right)\right)\right)\)  (3)
where ⊕ is the concatenation operation, I(·) is the 2 × 2 upsampling operation using nearest-neighbor interpolation, and Conv_{k,i}(·) are the convolution layers of the kth decoder block.
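As a small illustration of the downsampling step between encoder blocks, the 2 × 2 max-pooling operation P(·) can be implemented in a few lines. This is a NumPy sketch (the function name is ours), not the TensorFlow layer used in the actual network:

```python
import numpy as np

def maxpool2x2(x):
    """P(.): 2 x 2 max-pooling over an (H, W, C) feature map (H, W even),
    keeping the maximum of each non-overlapping 2 x 2 window."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

# a 4 x 4 single-channel map is reduced to 2 x 2
x = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
y = maxpool2x2(x)  # y[0, 0, 0] is max(0, 1, 4, 5) = 5
```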
In this work, the gated recurrent unit (GRU)^{63} is used as the recurrent unit, i.e., the RConv(·) layer in Eq. (2) updates h_{t}, given the input x_{t}, through the following three steps:

\(f_t = \sigma\left(W_f \ast x_t + U_f \ast h_{t-1} + b_f\right)\)  (4)

\(\tilde{h}_t = \tanh\left(W_h \ast x_t + U_h \ast \left(f_t \odot h_{t-1}\right) + b_h\right)\)  (5)

\(h_t = \left(1 - f_t\right) \odot h_{t-1} + f_t \odot \tilde{h}_t\)  (6)
where f_{t}, h_{t} are the forget and output vectors at time step t, respectively, W_{f}, W_{h}, U_{f}, U_{h} are the corresponding convolution kernels, b_{f}, b_{h} are the corresponding biases, σ is the sigmoid activation function, * is the 2D convolution operation, and ⊙ is element-wise multiplication. Compared with the long short-term memory (LSTM) network^{64}, the GRU entails fewer parameters while achieving similar performance.
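The gated update above can be checked numerically with a minimal sketch. For brevity, the 3 × 3 convolution kernels are replaced here by scalar weights, so this only illustrates the gating arithmetic per pixel; the function and variable names are ours, not from the released code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recurrent_step(x_t, h_prev, Wf, Uf, bf, Wh, Uh, bh):
    """One gated recurrent update following the three steps above: a forget
    vector f_t gates how much of the previous state is replaced by the
    candidate state. Scalars stand in for the convolution kernels."""
    f_t = sigmoid(Wf * x_t + Uf * h_prev + bf)             # forget vector
    h_cand = np.tanh(Wh * x_t + Uh * (f_t * h_prev) + bh)  # candidate state
    return (1.0 - f_t) * h_prev + f_t * h_cand             # gated output h_t

# integrate a short sequence of 2 x 2 "feature maps"
h = np.zeros((2, 2))
for x_t in [np.full((2, 2), 0.5), np.full((2, 2), -0.5)]:
    h = recurrent_step(x_t, h, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0)
```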
The discriminator (D) is a CNN consisting of five convolutional blocks and two dense layers. The kth convolutional block has two convolutional layers with 20 · 2^{k} channels each. A global average pooling layer compacts each channel before the dense layers. The first dense layer has 20 hidden units with a ReLU activation function, and the second dense layer uses a sigmoid activation function. The GAN structure and other details of both the generator and discriminator networks are reported in Fig. S8.
RecurrentMZ implementation
RecurrentMZ was written and implemented using TensorFlow 2.0. In both the training and testing phases, a DPM is automatically concatenated with each input image by the data loader, indicating the relative axial position of the input plane with respect to the desired output plane; i.e., the input in the training phase has dimensions of M × 256 × 256 × 2. By varying the DPMs, RecurrentMZ learns to digitally propagate the inputs to any designated plane, thus forming an output volume with dimensions of Z × 256 × 256.
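The input assembly performed by the data loader can be sketched as follows. The uniform-plane encoding of the DPM (a constant map holding the axial offset from each input plane to the target plane) is our assumption for illustration; the paper only states the resulting M × 256 × 256 × 2 shape:

```python
import numpy as np

def attach_dpm(images, z_inputs, z_target):
    """Concatenate a uniform DPM plane to each 2D input image, encoding the
    relative axial position of the input plane to the desired output plane,
    producing an (M, H, W, 2) tensor."""
    m, h, w = images.shape
    out = np.empty((m, h, w, 2), dtype=np.float32)
    out[..., 0] = images
    for i, z in enumerate(z_inputs):
        out[i, :, :, 1] = z_target - z  # relative axial position (microns)
    return out

# three input scans at z = 0, 6 and 12 um, targeting an output plane at z = 4 um
stack = attach_dpm(np.zeros((3, 256, 256), dtype=np.float32),
                   z_inputs=[0.0, 6.0, 12.0], z_target=4.0)
```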
The training loss of RecurrentMZ is composed of three parts: (i) a pixel-wise BerHu loss^{65,66}, (ii) the multiscale structural similarity index (MSSSIM)^{67}, and (iii) an adversarial loss using the generative adversarial network (GAN)^{68} structure. Based on these, the total loss of RecurrentMZ, i.e., L_{V}, is expressed as

\(L_V = \alpha \cdot {\mathrm{BerHu}}\left(\hat{y}, y\right) + \beta \cdot \left(1 - {\mathrm{MSSSIM}}\left(\hat{y}, y\right)\right) + \gamma \cdot \left(D(\hat{y}) - 1\right)^2\)  (7)
where \(\hat y\) is the output image of RecurrentMZ, and y is the ground truth image for a given axial plane. α, β, γ are hyperparameters, which were set as 3, 1 and 0.5, respectively. The MSSSIM and BerHu losses are expressed as:

\({\mathrm{MSSSIM}}(x,y) = \left[\frac{2\mu_{x_M}\mu_{y_M} + C_1}{\mu_{x_M}^2 + \mu_{y_M}^2 + C_1}\right]^{\alpha_M} \mathop{\prod}\limits_{j=1}^{M} \left[\frac{2\sigma_{x_j}\sigma_{y_j} + C_2}{\sigma_{x_j}^2 + \sigma_{y_j}^2 + C_2}\right]^{\beta_j} \left[\frac{\sigma_{x_j y_j}^2 + C_3}{\sigma_{x_j}\sigma_{y_j} + C_3}\right]^{\gamma_j}\)  (8)

\({\mathrm{BerHu}}(x,y) = \mathop{\sum}\limits_{m,n} \left\{ {\begin{array}{*{20}{l}} {\left| {x(m,n) - y(m,n)} \right|,} & {\left| {x(m,n) - y(m,n)} \right| \le c} \\ {\frac{{\left( {x(m,n) - y(m,n)} \right)^2 + c^2}}{{2c}},} & {{\mathrm{otherwise}}} \end{array}} \right.\)  (9)
x_{j}, y_{j} are 2^{j−1}× downsampled images of x and y, respectively, \(\mu _x,\,\sigma _x^2\) denote the mean and variance of x, respectively, and \(\sigma _{xy}^2\) denotes the covariance between x and y. x(m,n) is the intensity value at pixel (m,n) of image x. α_{M}, β_{j}, γ_{j}, C_{i} are empirical constants^{67}, and c is a constant set as 0.1. The BerHu and MSSSIM losses provide a structural loss term, in addition to the adversarial loss, focusing on high-level image features. The combination of a SSIM or MSSSIM term, evaluating regional or global similarity, with a pixel-wise loss term with respect to the ground truth (such as L1, L2, Huber or BerHu) has been shown to improve network performance in image translation and restoration tasks^{69}.
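As a numerical check of the BerHu term, a minimal NumPy sketch is shown below. The mean reduction over pixels is our convenience choice; the reduction used in training is not specified above:

```python
import numpy as np

def berhu(x, y, c=0.1):
    """Reverse-Huber (BerHu) loss: absolute error up to the threshold c,
    and a scaled quadratic penalty (d^2 + c^2) / (2c) beyond it."""
    d = np.abs(x - y)
    return np.where(d <= c, d, (d ** 2 + c ** 2) / (2 * c)).mean()

small = berhu(np.array([0.05]), np.array([0.0]))  # below c: linear regime, 0.05
large = berhu(np.array([0.3]), np.array([0.0]))   # above c: (0.09 + 0.01) / 0.2 = 0.5
```

Below the threshold the loss behaves like L1 (robust to small residuals); above it, the quadratic term penalizes large errors more strongly, which is the intended behavior of the reverse-Huber formulation.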
The loss for the discriminator, L_{D}, is defined as:

\(L_D = D(\hat{y})^2 + \left(1 - D(y)\right)^2\)  (10)
where D is the discriminator of the GAN framework. An Adam optimizer^{70} with an initial learning rate of 10^{−5} was employed for stochastic optimization.
The training time on a PC with an Intel Xeon W-2195 CPU, 256 GB of RAM and a single NVIDIA RTX 2080 Ti graphics card is about 3 days. After optimization for mixed precision and parallel computation, image reconstruction using RecurrentMZ (M = 3) takes ~0.15 s for an output image of 1024 × 1024 pixels and ~3.42 s for a volume of 101 × 1024 × 1024 pixels.
The implementation of DeepZ
The DeepZ network, used for comparison purposes, is identical to that in ref. ^{38} and was trained and tested on the same dataset as RecurrentMZ using the same machine. The loss function, optimizer and hyperparameter settings were also identical to those in ref. ^{38}. Due to the single-scan propagation of DeepZ, its training range is \(\frac{1}{M}\) of that of RecurrentMZ, depending on the value of M used in the comparison. The reconstructed volumes over a large axial range, as presented in the manuscript, were axially stacked from M non-overlapping sub-volumes, each propagated from a different input scan and covering \(\frac{1}{M}\) of the total axial range. The DeepZ reconstruction time for a 1024 × 1024 output image on the same machine as RecurrentMZ is ~0.12 s.
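The stacking of the M non-overlapping sub-volumes can be illustrated as follows; the sub-volume contents here are placeholders, and the plane counts are chosen only to make the 1/M partition explicit:

```python
import numpy as np

# Each Deep-Z pass propagates one input scan over 1/M of the axial range;
# with M = 3 and 30 output planes per pass, stacking the sub-volumes along
# z reconstitutes the full 90-plane volume.
subvolumes = [np.full((30, 64, 64), i, dtype=np.float32) for i in range(3)]
full = np.concatenate(subvolumes, axis=0)
```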
The implementation of 3D UNet
Each input sequence of M × 256 × 256 × 2 (the second channel being the DPM) was reshaped into a tensor of 256 × 256 × 2M and fed into the 3D UNet^{45}. When permuting the M input scans, the DPMs always follow their corresponding images/scans. The number of channels at the last convolutional layer of each downsampling block is 60 · 2^{k}, and the convolutional kernels are 3 × 3 × 3. The network structure is the same as reported in ref. ^{45}. The other training settings, such as the loss function and optimizer, are similar to those of RecurrentMZ. The reconstruction time (M = 3) for an output image of 1024 × 1024 pixels on the same machine (Intel Xeon W-2195 CPU, 256 GB of RAM and a single NVIDIA RTX 2080 Ti graphics card) is ~0.2 s.
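The input reshaping can be sketched as below. Keeping each scan adjacent to its own DPM when merging the sequence axis into the channel axis is our reading of the description above (the paper only states the final 256 × 256 × 2M shape):

```python
import numpy as np

seq = np.zeros((3, 256, 256, 2), dtype=np.float32)  # M x H x W x 2 (image + DPM)
seq[1, :, :, 1] = 4.2                               # e.g., DPM plane of the second scan

# move the sequence axis next to the channel axis, then merge them, so the
# resulting H x W x 2M tensor interleaves (image, DPM) pairs per scan
unet_input = seq.transpose(1, 2, 0, 3).reshape(256, 256, 6)
```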
References
Pawley, J. B. Handbook of Biological Confocal Microscopy. 3rd edn. (SpringerVerlag, New York, 2006).
Denk, W., Strickler, J. H. & Webb, W. W. Twophoton laser scanning fluorescence microscopy. Science 248, 73–76 (1990).
Horton, N. G. et al. In vivo threephoton microscopy of subcortical structures within an intact mouse brain. Nat. Photonics 7, 205–209 (2013).
Huang, D. et al. Optical coherence tomography. Science 254, 1178–1181 (1991).
Haeusler, G. & Lindner, M. W. “Coherence radar” and “spectral radar”—new tools for dermatological diagnosis. J. Biomed. Opt. 3, 21–31 (1998).
Fercher, A. F. et al. Measurement of intraocular distances by backscattering spectral interferometry. Opt. Commun. 117, 43–48 (1995).
Santi, P. A. Light sheet fluorescence microscopy: a review. J. Histochem. Cytochem. 59, 129–138 (2011).
Prabhat, P. et al. Simultaneous imaging of different focal planes in fluorescence microscopy for the study of cellular dynamics in three dimensions. IEEE Trans. NanoBiosci. 3, 237–242 (2004).
Johnson, C. et al. Continuous focal translation enhances rate of pointscan volumetric microscopy. Opt. Express 27, 36241–36258 (2019).
Abrahamsson, S. et al. Fast multicolor 3D imaging using aberrationcorrected multifocus microscopy. Nat. Methods 10, 60–63 (2013).
Bouchard, M. B. et al. Swept confocallyaligned planar excitation (SCAPE) microscopy for highspeed volumetric imaging of behaving organisms. Nat. Photonics 9, 113–119 (2015).
Nakano, A. Spinningdisk confocal microscopy—a cuttingedge tool for imaging of membrane traffic. Cell Struct. Funct. 27, 349–355 (2002).
Badon, A. et al. Videorate largescale imaging with MultiZ confocal microscopy. Optica 6, 389–395 (2019).
Li, H. Y. et al. Fast, volumetric livecell imaging using highresolution lightfield microscopy. Biomed. Opt. Express 10, 29–49 (2019).
MartínezCorral, M. & Javidi, B. Fundamentals of 3D imaging and displays: a tutorial on integral imaging, lightfield, and plenoptic systems. Adv. Opt. Photonics 10, 512–566 (2018).
Song, A. et al. Volumetric twophoton imaging of neurons using stereoscopy (vTwINS). Nat. Methods 14, 420–426 (2017).
Chen, X. L. et al. Volumetric chemical imaging by stimulated Raman projection microscopy and tomography. Nat. Commun. 8, 15117 (2017).
Lu, R. W. et al. Videorate volumetric functional imaging of the brain at synaptic resolution. Nat. Neurosci. 20, 620–628 (2017).
Pascucci, M. et al. Compressive threedimensional superresolution microscopy with specklesaturated fluorescence excitation. Nat. Commun. 10, 1327 (2019).
Fang, L. Y. et al. Fast acquisition and reconstruction of optical coherence tomography images via sparse representation. IEEE Trans. Med. Imaging 32, 2034–2049 (2013).
Wen, C. Y. et al. Compressive sensing for fast 3D and randomaccess twophoton microscopy. Opt. Lett. 44, 4343–4346 (2019).
Beck, A. & Teboulle, M. A fast iterative shrinkagethresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009).
Boyd, S. et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011).
de Haan, K. et al. Deeplearningbased image reconstruction and enhancement in optical microscopy. Proc. IEEE 108, 30–50 (2020).
Rivenson, Y. et al. Deep learning microscopy. Optica 4, 1437–1443 (2017).
Wang, H. D. et al. Deep learning enables crossmodality superresolution in fluorescence microscopy. Nat. Methods 16, 103–110 (2019).
Nehme, E. et al. DeepSTORM: superresolution singlemolecule microscopy by deep learning. Optica 5, 458–464 (2018).
Rivenson, Y. et al. Virtual histological staining of unlabelled tissueautofluorescence images via deep learning. Nat. Biomed. Eng. 3, 466–477 (2019).
Bayramoglu, N. et al. Towards virtual H&E staining of hyperspectral lung histology images using conditional generative adversarial networks. In: Proc. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) 64–71 (IEEE, Venice, Italy, 2017).
Christiansen, E. M. et al. In silico labeling: predicting fluorescent labels in unlabeled images. Cell 173, 792–803.e19 (2018).
Ounkomol, C. et al. Labelfree prediction of threedimensional fluorescence images from transmittedlight microscopy. Nat. Methods 15, 917–920 (2018).
Rivenson, Y. et al. PhaseStain: the digital staining of labelfree quantitative phase microscopy images using deep learning. Light.: Sci. Appl. 8, 23 (2019).
Wu, Y. C. et al. Extended depthoffield in holographic imaging using deeplearningbased autofocusing and phase recovery. Optica 5, 704–710 (2018).
Rivenson, Y. et al. Phase recovery and holographic image reconstruction using deep learning in neural networks. Light.: Sci. Appl. 7, 17141 (2018).
Nguyen, T. et al. Deep learning approach for Fourier ptychography microscopy. Opt. Express 26, 26470–26484 (2018).
Pinkard, H. et al. Deep learning for singleshot autofocus microscopy. Optica 6, 794–797 (2019).
Luo, Y. L. et al. Singleshot autofocusing of microscopy images using deep learning. ACS Photonics 8, 625–638 (2021).
Wu, Y. C. et al. Threedimensional virtual refocusing of fluorescence microscopy images using deep learning. Nat. Methods 16, 1323–1331 (2019).
Barbastathis, G., Ozcan, A. & Situ, G. On the use of deep learning for computational imaging. Optica 6, 921–943 (2019).
Choy, C. B. et al. 3DR2N2: a unified approach for single and multiview 3D object reconstruction. In: Proc. 14th European Conference on Computer Vision (ECCV) 2016. 628–644. (Springer, Amsterdam, The Netherlands, 2016).
Kar, A., Häne, C. & Malik, J. Learning a multiview stereo machine. In: Proc. 31st International Conference on Neural Information Processing Systems (ACM, Long Beach, CA, USA, 2017).
Petrov, P. N. & Moerner, W. E. Addressing systematic errors in axial distance measurements in singleemitter localization microscopy. Opt. Express 28, 18616–18632 (2020).
Montavon, G., Samek, W. & Müller, K. R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73, 1–15 (2018).
Selvaraju, R. R. et al. GradCAM: visual explanations from deep networks via gradientbased localization. In: Proc. 2017 IEEE International Conference on Computer Vision (ICCV). (IEEE, Venice, Italy, 2017).
Çiçek, Ö. et al. 3D UNet: learning dense volumetric segmentation from sparse annotation. In: Proc. 19th International Conference on Medical Image Computing and ComputerAssisted Intervention – MICCAI 2016. 424–432. (Springer, Athens, Greece, 2016).
Chhetri, R. K. et al. Wholeanimal functional and developmental imaging with isotropic spatial resolution. Nat. Methods 12, 1171–1178 (2015).
Kumar, A. et al. Dualview plane illumination microscopy for rapid and spatially isotropic imaging. Nat. Protoc. 9, 2555–2573 (2014).
Wu, Y. C. et al. Spatially isotropic fourdimensional imaging with dualview plane illumination microscopy. Nat. Biotechnol. 31, 1032–1038 (2013).
Swoger, J. et al. Multiview image fusion improves resolution in threedimensional microscopy. Opt. Express 15, 8029–8042 (2007).
Thevenaz, P., Ruttimann, U. E. & Unser, M. A pyramid approach to subpixel registration based on intensity. IEEE Trans. Image Process. 7, 27–41 (1998).
Forster, B. et al. Complex wavelets for extended depthoffield: a new method for the fusion of multichannel microscopy images. Microsc. Res. Tech. 65, 33–42 (2004).
Shi, X. J. et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Proc. 28th International Conference on Neural Information Processing Systems. (ACM, Montreal, Quebec, Canada, 2015).
Graves, A. et al. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31, 855–868 (2009).
Gregor, K. et al. DRAW: a recurrent neural network for image generation. In: Proc. 32nd International Conference on Machine Learning 2015. 1462–1471. (PMLR, Lille, France, 2015).
Sharma, A., Grau, O. & Fritz, M. VConvDAE: deep volumetric shape learning without object labels. In: Proc. 14th European Conference on Computer Vision (ECCV) 2016. 236–250. (Springer, Amsterdam, The Netherlands, 2016).
Kingma, D. P. & Welling, M. Autoencoding variational bayes. Preprint at http://arxiv.org/abs/1312.6114 (2014).
Wang, W. Y. et al. Shape inpainting using 3D generative adversarial network and recurrent convolutional networks. In: Proc. 2017 IEEE International Conference on Computer Vision (ICCV). 2317–2325. (IEEE, Venice, Italy, 2017).
Chen, J. X. et al. Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation. In: Proc. 30th International Conference on Neural Information Processing Systems. (ACM, Barcelona, Spain, 2016).
Tseng, K. L. et al. Joint sequence learning and crossmodality convolution for 3D biomedical segmentation. In: Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3739–3746. (IEEE, Honolulu, HI, 2017).
Ronneberger, O., Fischer, P. & Brox, T. UNet: convolutional networks for biomedical image segmentation. Preprint at http://arxiv.org/abs/1505.04597 (2015).
Zhou, Z. W. et al. UNet++: redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39, 1856–1867 (2020).
Liu, P. J. et al. Multilevel waveletCNN for image restoration. In: Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 886–88609. (IEEE, Salt Lake City, UT, USA, 2018).
Cho, K. et al. Learning phrase representations using RNN encoderdecoder for statistical machine translation. Preprint at http://arxiv.org/abs/1406.1078 (2014).
Hochreiter, S. & Schmidhuber, J. Long shortterm memory. Neural Comput. 9, 1735–1780 (1997).
Owen, A. B. A robust hybrid of lasso and ridge regression. in Prediction and Discovery (eds Verducci, J. S., Shen, X. T. & Lafferty, J.) 59–71 (American Mathematical Society, Providence, Rhode Island, 2007).
Laina, I. et al. Deeper depth prediction with fully convolutional residual networks. Preprint at http://arxiv.org/abs/1606.00373 (2016).
Wang, Z., Simoncelli, E. P. & Bovik, A. C. Multiscale structural similarity for image quality assessment. In: Proc. 37th Asilomar Conference on Signals, Systems & Computers. 1398–1402. (IEEE, Pacific Grove, CA, USA, 2003).
Goodfellow, I. J. et al. Generative adversarial nets. In: Proc. 27th International Conference on Neural Information Processing Systems. (ACM, Montreal, Quebec, Canada, 2014).
Zhao, H. et al. Loss functions for image restoration with neural networks. IEEE Trans. Computational Imaging 3, 47–57 (2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at http://arxiv.org/abs/1412.6980 (2017).
Acknowledgements
The Ozcan Lab at UCLA acknowledges the support of Koc Group, NSF and HHMI.
Author information
Contributions
L.H. devised the network, implemented network training/testing, analyzed the results and prepared the manuscript. H.C. assisted in performing network training and prepared the manuscript. Y.L. assisted in imaging experiments and data preprocessing. Y.R. prepared the manuscript and helped with experiments. A.O. supervised the research and prepared the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, L., Chen, H., Luo, Y. et al. Recurrent neural networkbased volumetric fluorescence microscopy. Light Sci Appl 10, 62 (2021). https://doi.org/10.1038/s41377021005069