Bright-field holography: cross-modality deep learning enables snapshot 3D imaging with bright-field contrast using a single hologram

Digital holographic microscopy enables the 3D reconstruction of volumetric samples from a single-snapshot hologram. However, unlike a conventional bright-field microscopy image, the quality of holographic reconstructions is compromised by interference fringes as a result of twin images and out-of-plane objects. Here, we demonstrate that cross-modality deep learning using a generative adversarial network (GAN) can endow holographic images of a sample volume with bright-field microscopy contrast, combining the volumetric imaging capability of holography with the speckle- and artifact-free image contrast of incoherent bright-field microscopy. We illustrate the performance of this “bright-field holography” method through the snapshot imaging of bioaerosols distributed in 3D, matching the artifact-free image contrast and axial sectioning performance of a high-NA bright-field microscope. This data-driven deep-learning-based imaging method bridges the contrast gap between coherent and incoherent imaging, and enables the snapshot 3D imaging of objects with bright-field contrast from a single hologram, benefiting from the wave-propagation framework of holography.


Sample preparation
Dried pollen samples: Bermuda grass pollen (Cynodon dactylon), oak tree pollen (Quercus agrifolia), and ragweed pollen (Artemisia artemisiifolia) were purchased from Stallergenes Greer (NC, USA) (cat. #: 2, 195, and 56, respectively) and mixed at a weight ratio of 2:3:1. For the 2D pollen sample, the mixture was deposited onto a sticky coverslip using an impaction-based air sampler. For the 3D pollen sample, the mixture was diluted into PDMS and cured on a glass slide.
Polystyrene beads with 1 µm diameter were purchased from Thermo Scientific (cat. #: 5100A) and diluted 1000× in methanol. A 2.5 µL droplet of the diluted bead sample was pipetted onto a cleaned #1 coverslip and allowed to dry.

Training data preparation
The success of the cross-modality transform behind bright-field holography relies on accurate 3D registration of the back-propagated holograms with the scanning bright-field microscope images. This registration can be divided into two parts, also shown in Supplementary Fig. S2.

The first part matches a bright-field image (2048 × 2048 pixels) to the hologram, with the following steps: (1) A stitched bright-field full-FOV image of ~20,000 × 4,000 pixels was generated by stitching together the middle planes of each bright-field microscope stack using the ImageJ plugin Microscopy Image Stitching Tool (MIST) 1. (2) The shade-corrected full-FOV hologram was back-propagated to a global focus distance determined by autofocusing on a 512 × 512-pixel region in the center of the hologram. (3) The bright-field full-FOV image was coarsely registered to the back-propagated full-FOV hologram by fitting a rigid transformation through 3-5 pairs of manually selected matching points. (4) The bright-field full-FOV image was then warped using this transformation, and the regions overlapping with the hologram were cropped to generate matching pairs.

The second part further refines the registration in the x-y and z directions, with the following steps: (1) Small-FOV pairs (300 × 300 pixels) were selected from the cropped FOV. (2) Autofocusing was performed on each hologram patch to find its focus distance, denoted as z_H. (3) The standard deviation (std) of each bright-field image within the stack was calculated, which provides a focus curve for the bright-field stack. A second-order polynomial was fitted to the four heights in the focus curve with the highest std values, and the focus of the bright-field stack was taken as the peak location of the fit, denoted as z_F. (4) For each microscope scan in the stack at height z, a corresponding hologram image was generated by back-propagating the hologram by the distance z − z_F + z_H, where symmetric padding was used on the hologram during the propagation to avoid ringing artifacts. (5) The best-focused plane in each stack, as well as five other randomly selected defocused planes, were chosen. (6) Pyramid elastic registration 2 was performed on the small-FOV image pair closest to the focal plane, and the same registered warping was applied to the other five defocused image pairs, generating six aligned small-FOV pairs in total. (7) The corresponding patches were cropped to 256 × 256 pixels.
Since the pyramidal registration can sometimes fail to converge to the correct transformation, the generated dataset was also manually inspected to remove data with significant artifacts due to registration errors.
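As a concrete illustration, the std-based focus search described in step (3) of the refinement procedure can be sketched as follows. This is a minimal NumPy sketch; the function name and array shapes are illustrative, not taken from the original implementation:

```python
import numpy as np

def focus_from_stack(stack, heights):
    """Estimate the in-focus height z_F of a bright-field z-stack.

    stack:   array of shape (num_slices, H, W)
    heights: 1D array of the axial position of each slice (same order)

    Follows the recipe in the text: the std of each slice forms a focus
    curve; a second-order polynomial is fitted to the four heights with
    the highest std values, and z_F is taken as the peak of the parabola.
    """
    stds = stack.reshape(stack.shape[0], -1).std(axis=1)   # focus curve
    top4 = np.argsort(stds)[-4:]                           # 4 sharpest slices
    a, b, c = np.polyfit(heights[top4], stds[top4], 2)     # quadratic fit
    return -b / (2 * a)                                    # vertex of the parabola
```

The per-patch hologram autofocus (step 2) would produce z_H analogously, using a sharpness criterion evaluated on the back-propagated complex field instead of the raw slice std.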

Details of network and training
The GAN implemented here consisted of a generator network and a discriminator network, as detailed in Supplementary Fig. S1. Following the image registration and cropping, the dataset was divided into 75% for training, 15% for validation, and 10% for blind testing. The training data consisted of ~6,000 image pairs, which were further augmented to ~30,000 by random rotation and flipping of the images. The validation data were not augmented.
During the training phase, the network iteratively minimized the generator loss L_G and the discriminator loss L_D, defined as:

L_G = (1 − α)/N · Σ_{i=1}^{N} [D(G(x^(i))) − 1]^2 + α/N · Σ_{i=1}^{N} MAE(G(x^(i)), z^(i))

L_D = 1/N · Σ_{i=1}^{N} { [D(G(x^(i)))]^2 + [D(z^(i)) − 1]^2 }

where G(x^(i)) is the generator output for the input x^(i), z^(i) is the corresponding target (bright-field) image, D(·) is the discriminator, and MAE(·) stands for the mean absolute error, defined as:

MAE(a, b) = 1/L^2 · Σ_{p=1}^{L} Σ_{q=1}^{L} |a_{p,q} − b_{p,q}|

where the images have L × L pixels. N stands for the image batch size (e.g., N = 20), and α is a balancing parameter between the GAN loss and the MAE loss in L_G; it was chosen as α = 0.01, so that the GAN loss and MAE loss terms occupied 99% and 1% of the total loss L_G, respectively. The adaptive moment estimation (Adam) optimizer was used to minimize L_G and L_D, with learning rates of 10^−4 and 3 × 10^−5, respectively. In each iteration, six updates of the generator and three updates of the discriminator network were performed. The validation set was tested every 50 iterations, and the best network was chosen as the one with the lowest MAE loss on the validation set. The network was implemented using TensorFlow 5.
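Assuming a least-squares GAN formulation consistent with the 99%/1% weighting described above (the exact loss form used by the authors may differ), the two losses can be sketched in NumPy as:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error over an L×L image (or a batch of them)."""
    return np.mean(np.abs(a - b))

def generator_loss(d_of_g, g_out, target, alpha=0.01):
    """L_G = (1 - alpha) * adversarial term + alpha * MAE term.

    d_of_g: discriminator scores D(G(x)) for a batch, shape (N,)
    g_out:  generator outputs G(x), shape (N, L, L)
    target: bright-field ground-truth images z, shape (N, L, L)
    """
    adv = np.mean((d_of_g - 1.0) ** 2)          # push D(G(x)) toward 1
    return (1 - alpha) * adv + alpha * mae(g_out, target)

def discriminator_loss(d_of_g, d_of_real):
    """L_D = batch mean of D(G(x))^2 + (D(z) - 1)^2."""
    return np.mean(d_of_g ** 2) + np.mean((d_of_real - 1.0) ** 2)
```

In an actual TensorFlow training loop, `d_of_g` and `d_of_real` would be the discriminator's outputs on the generated and target images, and two Adam optimizers (learning rates 10^−4 and 3 × 10^−5) would apply these losses to the generator and discriminator variables, respectively.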

Estimation of the lateral and axial FWHM values for PSF analysis
A threshold was applied to the most focused hologram plane to extract individual sub-regions, each containing a single bead. A 2D Gaussian fit 6 was performed on each sub-region to estimate the lateral PSF FWHM. The fitted centroid was used to crop x-z slices, and another 2D Gaussian fit was performed on each slice to estimate the axial PSF FWHM values for (i) the back-propagated hologram stacks, (ii) the network output stacks, and (iii) the scanning bright-field microscope stacks. Histograms of the lateral and axial PSF FWHM values were generated subsequently, as shown in Fig. 4.
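The Gaussian-fit FWHM estimation can be sketched as follows, using SciPy's `curve_fit`; the helper names, initial guesses, and patch geometry are illustrative, not from the original code:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(coords, amp, x0, y0, sx, sy, offset):
    """2D Gaussian on a pixel grid, flattened for curve_fit."""
    x, y = coords
    return (amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2)
                           + (y - y0) ** 2 / (2 * sy ** 2))) + offset).ravel()

def fwhm_from_patch(patch):
    """Fit a 2D Gaussian to a cropped bead patch and return
    (FWHM_x, FWHM_y, centroid), with FWHM = 2*sqrt(2*ln 2)*sigma."""
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    p0 = (patch.max() - patch.min(), w / 2, h / 2, 2.0, 2.0, patch.min())
    popt, _ = curve_fit(gauss2d, (x, y), patch.ravel(), p0=p0)
    amp, x0, y0, sx, sy, off = popt
    k = 2.0 * np.sqrt(2.0 * np.log(2.0))
    return k * abs(sx), k * abs(sy), (x0, y0)
```

The same routine, applied to cropped x-z slices instead of x-y patches, yields the axial FWHM values.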

Quantitative evaluation of image quality
Each network output image I_out was evaluated with reference to the corresponding ground truth (bright-field microscopy) image I_GT using four different criteria: (1) root mean square error (RMSE), (2) correlation coefficient (Corr), (3) structural similarity index (SSIM) 7, and (4) universal image quality index (UIQI) 8. RMSE is defined as:

RMSE = sqrt( 1/(L_x L_y) · Σ_{p=1}^{L_x} Σ_{q=1}^{L_y} [I_out(p, q) − I_GT(p, q)]^2 )

where L_x and L_y represent the number of pixels in the x and y directions, respectively.
The correlation coefficient is defined as:

Corr = σ_{out,GT} / (σ_out σ_GT)

where σ_out and σ_GT are the standard deviations of I_out and I_GT, respectively, and σ_{out,GT} is the cross-covariance between the two images.
SSIM is defined as:

SSIM = [(2 μ_out μ_GT + C_1)(2 σ_{out,GT} + C_2)] / [(μ_out^2 + μ_GT^2 + C_1)(σ_out^2 + σ_GT^2 + C_2)]

where μ_out and μ_GT are the mean values of the images I_out and I_GT, respectively, and C_1 and C_2 are constants used to prevent division by a denominator close to zero.
UIQI was first computed locally, with the local index for each window defined as:

UIQI_local = (4 σ_{out,GT} μ_out μ_GT) / [(σ_out^2 + σ_GT^2)(μ_out^2 + μ_GT^2)]

where all statistics are computed within a B × B window. The global UIQI was then defined as the average of these local UIQIs. We used a window size of B = 8, the same as in Ref. 8.
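For reference, three of the full-reference metrics above can be sketched in NumPy as follows (a minimal illustration; the window handling is simplified to non-overlapping blocks, which may differ from the sliding windows of Ref. 8):

```python
import numpy as np

def rmse(out, gt):
    """Root mean square error between two images of equal size."""
    return np.sqrt(np.mean((out - gt) ** 2))

def corr(out, gt):
    """Correlation coefficient: cross-covariance over product of stds."""
    cov = np.mean((out - out.mean()) * (gt - gt.mean()))
    return cov / (out.std() * gt.std())

def uiqi(out, gt, B=8):
    """Global UIQI: average of the local index over B×B windows."""
    vals = []
    for i in range(0, out.shape[0] - B + 1, B):
        for j in range(0, out.shape[1] - B + 1, B):
            a = out[i:i + B, j:j + B]
            b = gt[i:i + B, j:j + B]
            cov = np.mean((a - a.mean()) * (b - b.mean()))
            num = 4 * cov * a.mean() * b.mean()
            den = (a.var() + b.var()) * (a.mean() ** 2 + b.mean() ** 2)
            if den > 0:
                vals.append(num / den)
    return float(np.mean(vals))
```

For two identical images, RMSE is 0 while Corr and UIQI are 1, which is a quick sanity check for any implementation of these metrics.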
In addition to the above-discussed measures, we also evaluated the image quality using the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE), using the Matlab built-in function "brisque" 9.

Supplementary Fig. S1. Network structure. The numbers represent the size and the channels of each block. ReLU: rectified linear unit. Conv: convolutional layer.

Supplementary Figures and Captions
However, the GAN outputs are sharper and exhibit more information, making them visually more appealing than the CNN outputs. Each sample image is also evaluated using the no-reference Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE), where a lower score represents better visual quality. The BRISQUE scores are shown in the lower corner of the images; the GAN output images have smaller (better) scores than the CNN output images.

The GAN used to report the results in the main text is detailed in Supplementary Fig. S1. The CNN is the same generator network without the discriminator and the adversarial loss. The encoder-decoder structure was constructed by removing the concatenation connections in the U-Net (gray arrows in Supplementary Fig. S1). The GAN with spectral normalization has the same structure as the GAN, with spectral normalization performed on each convolutional layer of the discriminator.

The 3D pollen dataset is composed of images of the pollen mixture spread in 3D inside a polydimethylsiloxane (PDMS) substrate with ~800 µm thickness. The 3D pollen dataset only has testing images and is evaluated using the network trained with 2D pollen images. Both datasets include in-focus and defocused pairs of images for training to capture the 3D light propagation behavior across the holographic and bright-field microscopy modalities. The images of the 3D pollen PDMS testing dataset are 1024 × 1024 pixels; all other images are 256 × 256 pixels.