Neural network-based image reconstruction in swept-source optical coherence tomography using undersampled spectral data

Optical coherence tomography (OCT) is a widely used non-invasive biomedical imaging modality that can rapidly provide volumetric images of samples. Here, we present a deep learning-based image reconstruction framework that can generate swept-source OCT (SS-OCT) images using undersampled spectral data, without any spatial aliasing artifacts. This neural network-based image reconstruction does not require any hardware changes to the optical setup and can be easily integrated with existing swept-source or spectral-domain OCT systems to reduce the amount of raw spectral data to be acquired. To show the efficacy of this framework, we trained and blindly tested a deep neural network using mouse embryo samples imaged by an SS-OCT system. Using 2-fold undersampled spectral data (i.e., 640 spectral points per A-line), the trained neural network can blindly reconstruct 512 A-lines in 0.59 ms using multiple graphics-processing units (GPUs), removing spatial aliasing artifacts due to spectral undersampling while providing a very good match to the images of the same samples reconstructed using the full spectral OCT data (i.e., 1280 spectral points per A-line). We also demonstrate that this framework can be further extended to process 3× undersampled spectral data per A-line, with some performance degradation in the reconstructed image quality compared to 2× spectral undersampling. Furthermore, an A-line-optimized undersampling method is presented by jointly optimizing the spectral sampling locations and the corresponding image reconstruction network, which improved the overall imaging performance using fewer spectral data points per A-line compared to the 2× or 3× spectral undersampling results. This deep learning-enabled image reconstruction approach can be broadly used in various forms of spectral-domain OCT systems, helping to increase their imaging speed without sacrificing image resolution and signal-to-noise ratio.


DL-OCT on undersampled squeezed spectral data
All pre-processing methods for DL-OCT reported in the main text either interpolate or zero-pad the undersampled spectral data to its original size (before the undersampling). To further speed up the preprocessing step needed to feed image data to the reconstruction neural network, here we investigate the use of undersampled spectral data without any interpolation/padding steps.
After the m× spectral undersampling (e.g., m = 2 or 3), we squeeze the spectral data to 1/m of its original size without any interpolation/padding. Then, we apply a fast Fourier transform (FFT) to this squeezed spectral data to obtain the undersampled OCT image. One example field-of-view resulting from this step is shown in Fig. S1(a-c). Next, we apply a combination of simple spatial transforms to convert the undersampled squeezed OCT images into a form that is equivalent to the undersampled OCT images processed with zero interpolation (shown in Fig. S1(d-e)). Using the undersampled squeezed A-line signal $g[n]$ of length $N$, the zero-interpolated signal $g_0[n]$ with length $mN$ can be represented as:
$$g_0[n] = \begin{cases} g[n/m], & n \bmod m = 0 \\ 0, & \text{otherwise} \end{cases} \qquad n = 0, 1, \dots, mN-1.$$
Its discrete Fourier transform (DFT) is
$$\mathrm{DFT}(g_0)[k] = \sum_{n=0}^{mN-1} g_0[n]\, e^{-j 2\pi k n/(mN)} = \sum_{n'=0}^{N-1} g[n']\, e^{-j 2\pi k n'/N} = \mathrm{DFT}(g)[k \bmod N], \qquad k = 0, 1, \dots, mN-1,$$
which shows that the DFT of the m× undersampled signal with zero interpolation is composed of m copies of the DFT of the squeezed undersampled signal, $g[n]$. Furthermore, since $g[n]$ is real (optical intensity at the photodetector), the amplitude of $\mathrm{DFT}(g[n])$ has even symmetry, meaning that half of the amplitude of $\mathrm{DFT}(g[n])$ contains all the information needed to fully characterize the amplitude of $\mathrm{DFT}(g_0[n])$; this feature justifies the copy/flipping and concatenation operation shown in Figs. S1-S2.
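The copy relation and the even symmetry described above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the function names are hypothetical, a random real-valued vector stands in for a squeezed fringe, and the copy/flip layout is only one plausible reading of the operation in Fig. S1 (it assumes an even N and m = 2).

```python
import numpy as np

def zero_interpolate(g, m):
    """Place the N samples of g on every m-th position of a length m*N zero vector."""
    g0 = np.zeros(m * len(g), dtype=g.dtype)
    g0[::m] = g
    return g0

def tile_squeezed_amplitude(g, m):
    """Amplitude of DFT(g0) built from one length-N FFT of g, repeated m times."""
    return np.tile(np.abs(np.fft.fft(g)), m)

def copy_flip_concat(half):
    """Rebuild all N amplitude bins from bins 0..N/2 using even symmetry (N even)."""
    return np.concatenate([half[:-1], half[1:][::-1]])

rng = np.random.default_rng(0)
N, m = 640, 2
g = rng.standard_normal(N)  # synthetic stand-in for a real-valued squeezed fringe

# Reference: FFT of the zero-interpolated signal of length m*N.
direct = np.abs(np.fft.fft(zero_interpolate(g, m)))
# Shortcut: FFT of the squeezed signal, tiled m times.
tiled = tile_squeezed_amplitude(g, m)
print(np.allclose(direct, tiled))

# Only bins 0..N/2 of the squeezed amplitude are needed to rebuild the first N bins.
amp = np.abs(np.fft.fft(g))
rebuilt = copy_flip_concat(amp[: N // 2 + 1])
print(np.allclose(rebuilt, direct[:N]))
```

Both checks compare amplitudes only, which matches the statement above that the symmetry argument applies to the amplitude of the DFT.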
We performed a quantitative comparison between the above detailed spectral squeezing method and the zero interpolation method on 2× and 3× undersampled spectral data. The image reconstruction results are summarized in Fig. S2. Through visual inspection, one can conclude that the undersampled, squeezed spectral data can be used to reconstruct the same high-quality OCT images that are achieved using the undersampled images processed using zero interpolation. To further compare the above methods quantitatively, we calculated the average PSNR and SSIM values for 13,131 test image patches, which revealed very similar results using the two methods (see Table S1).
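For reference, an average PSNR over a set of patches can be computed as in the sketch below. The `peak` value of 1.0 (images normalized to [0, 1]) and the function names are assumptions made for illustration; the normalization used for Table S1 is not stated here. SSIM is omitted because it is usually taken from a library implementation such as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum pixel value."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def mean_psnr(reference_patches, test_patches, peak=1.0):
    """Average PSNR over paired patches (ground truth vs. network output)."""
    values = [psnr(r, t, peak) for r, t in zip(reference_patches, test_patches)]
    return float(np.mean(values))

# Tiny synthetic example: a constant error of 0.1 gives an MSE of 0.01.
ref = np.zeros((8, 8))
out = np.full((8, 8), 0.1)
print(psnr(ref, out))
```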
Although the undersampled squeezed OCT image contains all the needed information, directly using it as the network input (without the copy/flipping and concatenation operations) leads to spatial artifacts at the network output (see Fig. S3) since this represents a more complex and challenging learning and inference task for the U-net structured model that we have employed in our work.
In summary, we demonstrated the reconstruction of high-quality OCT images by using undersampled squeezed spectral data. With m-fold undersampled and squeezed spectral data, the computation time of an FFT operation can be decreased approximately by m-fold, and since the additional computational time needed for the flipping and concatenation operations is negligible compared to an FFT operation, the preprocessing method using squeezed, undersampled spectral data requires approximately (1/m)-th of the conventional OCT reconstruction time.
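The "approximately m-fold" saving can be sanity-checked with the textbook n·log2(n) operation-count model for an FFT. This is only a rough estimate under that assumed model; real FFT libraries behave differently for lengths that are not powers of two, and measured timings depend on hardware. The function names are hypothetical.

```python
import math

def fft_cost(n):
    """Idealized FFT cost model, n * log2(n); constant factors are ignored."""
    return n * math.log2(n)

def estimated_speedup(full_length, m):
    """Ratio of the modeled cost at full length to the cost at the squeezed length."""
    squeezed_length = math.ceil(full_length / m)
    return fft_cost(full_length) / fft_cost(squeezed_length)

for m in (2, 3):
    print(m, round(estimated_speedup(1280, m), 2))
```

Under this model the estimated gain is slightly larger than m, because the log2 factor also shrinks with the shorter transform.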

DL-OCT performance on different types of tissue
To further test the robustness of DL-OCT, we investigated its performance on different types of tissue imaged by an SS-OCT system, which had a central wavelength of ~1300 nm, a sweep range of ~100 nm, and an incident power of ~18 mW. The axial and transverse resolutions of the system have been characterized as ~12 µm and ~10 µm, respectively, in air. For this new dataset, a sample area of 5.5 mm × 5.5 mm × 7.27 mm (X, Y, Z) was imaged. Each raw A-scan consisted of 2560 spectral data points that were sampled linearly in the wavenumber domain by a k-clock on the SS-OCT system. Each B-scan consisted of 1000 A-scans, and each sample volume consisted of 1000 B-scans. After 2× undersampling, zero interpolation and OCT reconstruction, the resulting 2× undersampled images were partitioned into patches of 1280×320 pixels before being fed into the reconstruction neural network.
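The input preparation described above (2× undersampling, zero interpolation, FFT reconstruction, partitioning into patches) can be sketched as follows. Several details are assumptions rather than facts from the text: the 1280-pixel patch dimension is taken to be depth (half of the 2560-point FFT), patches are taken to be non-overlapping with the remaining A-scans dropped, and random numbers stand in for measured fringes. All function names are hypothetical.

```python
import numpy as np

def undersample_zero_interp(fringes, m):
    """Keep every m-th spectral point along axis 0 and zero the rest (length unchanged)."""
    out = np.zeros_like(fringes)
    out[::m] = fringes[::m]
    return out

def reconstruct_bscan(fringes):
    """FFT along the spectral axis; keep the non-mirrored half as depth pixels."""
    n_spec = fringes.shape[0]
    return np.abs(np.fft.fft(fringes, axis=0))[: n_spec // 2]

def to_patches(image, width):
    """Split a B-scan into non-overlapping patches of `width` A-lines; drop the remainder."""
    count = image.shape[1] // width
    return [image[:, i * width:(i + 1) * width] for i in range(count)]

rng = np.random.default_rng(0)
# Synthetic stand-in for one B-scan: 2560 spectral points x 1000 A-scans.
bscan_fringes = rng.standard_normal((2560, 1000))

image = reconstruct_bscan(undersample_zero_interp(bscan_fringes, 2))
patches = to_patches(image, 320)
print(image.shape, len(patches), patches[0].shape)
```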
In our data acquisition, 7 human subjects and 4 mouse subjects were involved. For human subjects, 6 different types of samples, including finger, nail, palm, wrist, limbus of eye, and anterior chamber of eye, were imaged for each subject (a total of 42 human samples). For mouse subjects, the eye for each subject was imaged (resulting in a total of 4 mouse samples). In total, this dataset contains 7 different types of human/mouse tissue with 46 samples. To separate the network training and blind testing, one sample for each type of human/mouse tissue was reserved for blind testing, and the remaining 39 samples were mixed and used for training. Stated differently, for each human tissue type, one subject (and his/her sample) was left out for blind testing; similarly, one mouse subject was left out for blind testing.
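The sample counts in this split can be reproduced with a short bookkeeping sketch. The subject identifiers and the choice of which subject is held out for each tissue type are hypothetical; the text only states that one sample per tissue type was reserved for blind testing.

```python
human_types = ["finger", "nail", "palm", "wrist", "limbus", "anterior_chamber"]
human_subjects = [f"H{i}" for i in range(1, 8)]   # 7 human subjects (IDs are made up)
mouse_subjects = [f"M{i}" for i in range(1, 5)]   # 4 mouse subjects (IDs are made up)

# One sample per (subject, tissue type) pair.
samples = [(s, t) for s in human_subjects for t in human_types]
samples += [(s, "mouse_eye") for s in mouse_subjects]

def split_by_held_out(samples, held_out):
    """held_out maps tissue type -> subject ID whose sample is reserved for blind testing."""
    test = [x for x in samples if held_out.get(x[1]) == x[0]]
    train = [x for x in samples if x not in test]
    return train, test

# Hypothetical choice: the last subject of each group is held out.
held_out = {t: "H7" for t in human_types}
held_out["mouse_eye"] = "M4"

train, test = split_by_held_out(samples, held_out)
print(len(samples), len(train), len(test))
```

Splitting by subject within each tissue type keeps every test sample from a subject whose sample of that tissue type was never seen during training.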
The results of our blind testing for different types of tissue are summarized in Fig. S4. When visually inspected, one can see that DL-OCT successfully removed the spatial aliasing artifacts for all these different types of tissue. Quantitatively, DL-OCT achieved an average PSNR of 28.7683 dB and an average SSIM of 0.7239 on 7,000 full-frame test images.

Figure S1. Schematic of the pre-processing method for DL-OCT using 2× undersampled, squeezed spectral data.
Figure S2. Comparison of blind testing results using undersampled, squeezed spectral data and undersampled spectral data with zero interpolation. (a) Comparison of 2× spectral undersampling results. (b) Comparison of 3× spectral undersampling results. PSNR and SSIM values are displayed for each one of these fields-of-view.
Figure S3. Comparison of blind testing results obtained using networks trained with 2× undersampled, squeezed spectral data without any spatial transformations vs. 2× undersampled spectral data with zero interpolation. These results confirm the significance of the spatial transformations reported in Fig. S2 in terms of the final image reconstruction quality. Although the undersampled, squeezed image contains all the needed information, directly using it as the network input (without the copy/flipping and concatenation operations reported in Fig. S2) leads to spatial artifacts at the network output (shown in the second column).
Figure S4. Blind testing performance of the DL-OCT on seven different types of tissue. PSNR and SSIM values are also displayed for each one of these fields-of-view.
Figure S5. Comparison of DL-OCT network input workflow using 2× spectral undersampling, 3× spectral undersampling and A-line-optimized spectral undersampling methods. Raw OCT fringes were processed with a 2× undersampling grid (N_spec = 640), a 3× undersampling grid (N_spec = 427) and an A-line-optimized undersampling grid (N_spec = 407), respectively. The resulting undersampled spectral fringes were then reconstructed to yield the corresponding undersampled OCT images, which served as the network input images (to be reconstructed). The A-line-optimized undersampling grid shown in the second column is a result of the A-line-optimized spectral undersampling method described in Fig. 8 of the main text.
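The N_spec values quoted for the uniform grids follow from keeping every m-th point of a 1280-point A-line. The sketch below assumes the grid starts at index 0, which is not stated in the caption; the A-line-optimized grid is learned and cannot be derived this way, so only its reported size is used. The function name is hypothetical.

```python
def uniform_grid(n_full, m):
    """Indices kept by uniform m-fold undersampling, assuming the grid starts at index 0."""
    return list(range(0, n_full, m))

n_full = 1280            # full spectral points per A-line
n_optimized = 407        # reported size of the A-line-optimized grid

grid_2x = uniform_grid(n_full, 2)
grid_3x = uniform_grid(n_full, 3)
print(len(grid_2x), len(grid_3x))

# Effective undersampling factor of the optimized grid relative to the full A-line.
print(round(n_full / n_optimized, 2))
```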