Integrated deep learning framework for accelerated optical coherence tomography angiography

Label-free optical coherence tomography angiography (OCTA) has become a premium imaging tool in clinics to obtain structural and functional information of microvasculatures. One primary technical drawback for OCTA, however, is its imaging speed. The current protocols require high sampling density and multiple acquisitions of cross-sectional B-scans to form one image frame, resulting in low acquisition speed. Recently, deep learning (DL)-based methods have gained attention in accelerating the OCTA acquisition process. They achieve faster acquisition using two independent reconstructing approaches: high-quality angiograms from a few repeated B-scans and high-resolution angiograms from undersampled data. While these approaches have shown promising results, they provide limited solutions that only partially account for the OCTA scanning mechanism. Herein, we propose an integrated DL method to simultaneously tackle both factors and further enhance the reconstruction performance in speed and quality. We designed an end-to-end deep neural network (DNN) framework with a two-staged adversarial training scheme to reconstruct fully-sampled, high-quality (8 repeated B-scans) angiograms from their corresponding undersampled, low-quality (2 repeated B-scans) counterparts by successively enhancing the pixel resolution and the image quality. Using an in-vivo mouse brain vasculature dataset, we evaluate our proposed framework through quantitative and qualitative assessments and demonstrate that our method can achieve superior reconstruction performance compared to the conventional means. 
Our DL-based framework can accelerate the OCTA imaging speed by factors of 16× to 256× while preserving the image quality, thus enabling a convenient software-only solution to enhance preclinical and clinical studies.

Data acquisition. Our dataset comprises in-vivo brain microvasculature images of 8-week-old male mice (C57BL/6J). After removing the scalp and skull (craniotomy details can be found in ref. 7), the samples were sealed with glass coverslips and imaged using a custom-designed spectral-domain OCT system from the University of Washington, Seattle 8. The system utilizes a broadband superluminescent diode as the light source with a central wavelength of 1340 nm and a 3 dB spectral bandwidth of 110 nm. A biaxial galvanometer is utilized to raster scan the probing light using a scanning step size of 10 µm, and a fast spectrometer detects the resulting interference signals at an A-line scan rate of 92 kHz. The system's axial and lateral resolutions are 7 µm each, and a 3 mm × 3 mm field of view (FOV) was utilized. For each volumetric profile, 400 slow-axis locations were sampled at a B-scan rate of 180 Hz, yielding cross-sectional structural B-scans with a pixel size of 400 × 1024 (400 A-lines per scan). As illustrated in Fig. 1, a previously reported OCTA algorithm 7 was used to register eight sequential B-scans at each slow-axis location to generate a high-quality cross-sectional angiogram. We chose to utilize eight sequential B-scans since it is known that the angiogram quality (e.g., signal-to-noise ratio, vessel connectivity) saturates near eight repeated B-scans 19. We downsampled the structural volumes in the enface plane, and two sequential downsampled B-scans were randomly selected to create a corresponding cross-sectional angiogram with low resolution and quality. The downsampling procedure was conducted to mimic the OCTA systems' rasterized scanning principle. For example, if the artificial scanning step size is 2× larger than the fully-sampled step size, we downsample each axis in the enface plane by a ratio of two. The fully-sampled angiograms' enface pixel resolution was 400 × 400, and we investigated downsampling ratios of 2×, 4×, and 8× in this study.
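The artificial downsampling described above amounts to strided slicing over the two enface axes. A minimal sketch (the function name and toy array sizes are illustrative, not the authors' actual implementation):

```python
import numpy as np

def downsample_enface(volume, ratio):
    # Subsample both enface axes by `ratio`, mimicking a raster scan
    # whose step size is `ratio` times the fully-sampled 10 µm step.
    return volume[::ratio, ::ratio, :]

# Toy volume with a 400 x 400 enface grid; a shallow depth axis is
# used here for brevity (the real B-scans have 1024 depth pixels).
full = np.zeros((400, 400, 8))
print(downsample_enface(full, 2).shape)  # (200, 200, 8)
print(downsample_enface(full, 8).shape)  # (50, 50, 8)
```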
For pre-processing, the pixel values were normalized to the range of [0, 1] to accelerate the deep neural network (DNN) training. In total, 29 pairs of volumetric angiograms were acquired from 11 different biosamples. The acquired volumetric angiograms were sectioned into 2D slices along the depth direction as enface representations. The enface angiograms from two sequential downsampled B-scans are low in resolution and quality. They are denoted as (LR, LQ) for the remainder of the paper and used as the input data to our DNN framework. Similarly, the high-resolution, high-quality enface angiograms from eight sequential fully-sampled B-scans are denoted (HR, HQ) and used as the target for training and ground truth for evaluation. (LR, LQ) and (HR, HQ) angiograms were paired according to the downsampling ratio.
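The [0, 1] normalization is a per-image min-max rescaling; a minimal sketch (the epsilon guard is an implementation detail assumed here, not stated in the text):

```python
import numpy as np

def normalize01(angiogram):
    # Min-max normalization to [0, 1]; the epsilon avoids division
    # by zero on constant (e.g., entirely void) slices.
    lo, hi = angiogram.min(), angiogram.max()
    return (angiogram - lo) / (hi - lo + 1e-8)

slice2d = np.array([[0.2, 0.8], [1.4, 0.5]])
out = normalize01(slice2d)
print(out.min(), round(float(out.max()), 6))  # 0.0 1.0
```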
The maximum intensity projection (MIP) enface angiograms are conventionally utilized to visualize the vascular profile in angiography applications. However, the number of volumetric angiograms available in this study is limited, leading to an insufficient amount of MIP data (29 total) to establish our DNNs. To address the limited data issue, we used 2000 depth-wise sectioned enface angiograms from seven volumetric pairs as our training and validation data, which were randomly split at a ratio of 9:1. As illustrated in Fig. 2, sectioned enface angiograms differ from MIP enface angiograms in that void spaces are common in the sectioned enface angiograms. Thus, we manually selected images with sufficient vessel profiles to use as our training and validation data. The training dataset was used for optimizing the stochastic gradient descent algorithm, and the validation dataset was used during training to monitor the loss value and finetune the hyper-parameter settings. Random rotation and flipping in the x and y axes were applied to augment the training dataset. As for the test data, 200 sectioned enface angiograms were manually selected from an independent volumetric pair to conduct a comparative study and select the best-performing method. Furthermore, 21 MIP enface angiograms from the remaining volumetric pairs were utilized to evaluate the selected methods. A summary description of the dataset is provided in Table 1.
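The rotation-and-flip augmentation can be sketched as follows. Restricting rotations to 90° multiples is an assumption of this sketch (it keeps the enface grid intact without interpolation); the text only states "random rotation and flipping":

```python
import numpy as np

def augment(img, rng):
    # Random 90-degree rotation plus independent flips along the
    # x and y axes; label-preserving for paired (input, target) data
    # when the same transform is applied to both images.
    img = np.rot90(img, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        img = np.flip(img, axis=0)
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)
    return img

rng = np.random.default_rng(0)
out = augment(np.arange(16.0).reshape(4, 4), rng)
print(out.shape)  # (4, 4)
```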

Deep learning architectures.
We have developed an end-to-end CNN framework to reconstruct (HR, HQ) enface angiograms from their (LR, LQ) counterparts. As shown in Fig. 3, our reconstruction model is structured in two stages and contains two constituent modules that operate in series, i.e., the super-resolution module followed by the quality-enhancing module. The super-resolution module compensates for the pixel undersampling, and the quality-enhancing module compensates for the B-scan repetition number. Our goal is to establish an end-to-end framework consisting of two sub-modules performing each task successively. In our experiments, the two-stage design demonstrated better reconstruction performance than directly mapping (LR, LQ) angiograms to their (HR, HQ) counterparts. Further details on the proposed structure of the networks are summarized in Supplementary Table 1.
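The two-stage pipeline can be expressed as a simple serial composition. The placeholder modules below (nearest-neighbour upsampling and an identity map) are illustrative stand-ins for the trained networks, not the actual architecture:

```python
import numpy as np

def reconstruct(lr_lq, super_resolve, enhance_quality):
    # Stage 1: compensate pixel undersampling, (LR, LQ) -> (HR, LQ).
    hr_lq = super_resolve(lr_lq)
    # Stage 2: compensate the B-scan repetition number, (HR, LQ) -> (HR, HQ).
    return enhance_quality(hr_lq)

# Stand-in modules for illustration only.
upsample2x = lambda a: np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)
identity = lambda a: a
out = reconstruct(np.ones((200, 200)), upsample2x, identity)
print(out.shape)  # (400, 400)
```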
The super-resolution module first upsamples the pixel resolution of the undersampled, low-quality angiograms (LR, LQ) to their fully-sampled, low-quality counterparts (HR, LQ). Note that (LR, LQ) and (HR, LQ) angiograms are constructed from two repeated B-scans and are therefore of poor quality. Thus, the task of mapping (LR, LQ) to (HR, LQ) angiograms is conducting pixel-wise super-resolution to fill in the missing pixels from undersampling. The module first extracts low-level features from the input angiograms using (9 × 9) convolution filters and then extracts high-level residuals using a series of repeated operations within the Deep layer. The Deep layer consists of five convolution blocks with (3 × 3) convolution filters and the parametric rectified linear unit (PReLU) activation function 20. Convolution operations with a smaller receptive field are utilized to capture finer details of the input feature maps, and the PReLU function is employed as our primary activation function since it is known to accelerate deeper networks' training by incorporating a trainable leakage coefficient 20. Two types of inter-block connections within the Deep layer are adopted: the dense connection 21 (Fig. 3a) and the residual connection 22 (Fig. 3b). We have adopted these inter-connection strategies as they have presented promising results for various previously reported bio-imaging applications 17,18,23. The dense connection concatenates all preceding convolution blocks' outputs with the original input and utilizes the result as the input to the next convolution block. This configuration strengthens feature propagation since the convolution blocks only need to learn refinements that augment the previous outputs. The residual connection sums the input with the convolution block's output, which is advantageous as the learning is alleviated to focus on the residuals.
These inter-block connections also mitigate the vanishing gradient problem and reduce the number of parameters required for deeper networks. The extracted low-level and high-level features are merged using an element-wise summation operation. For the upsampling layer, we utilize the pixel shuffle operation 24 to avoid the checkerboard artifacts that are commonly observed when using transposed convolutions. Unlike previous approaches, where the input image is pre-upsampled using filter-based methods 18,23, we let the network learn the upsampling filters directly to increase performance 25.
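The pixel shuffle (sub-pixel) upsampling step rearranges learned channels into spatial positions. A NumPy sketch of the operation (channel-last layout assumed; the trained network would apply this to learned feature maps):

```python
import numpy as np

def pixel_shuffle(features, r):
    # Rearrange an (H, W, C*r*r) feature map into (H*r, W*r, C) by
    # periodic shuffling; unlike transposed convolutions, this
    # upsampling introduces no checkerboard artifacts.
    h, w, crr = features.shape
    c = crr // (r * r)
    x = features.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)  # interleave the r x r sub-pixels
    return x.reshape(h * r, w * r, c)

feat = np.arange(16.0).reshape(2, 2, 4)  # H = W = 2, C*r*r = 4, r = 2
out = pixel_shuffle(feat, 2)
print(out.shape)  # (4, 4, 1)
```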
The quality-enhancing module enhances the fully-sampled, low-quality output from the previous module and reconstructs the fully-sampled, high-quality angiograms (HR, HQ). The module specifically compensates for the B-scan repetition number (from two repeated B-scans to eight) and thus enhances the angiograms' quality. It is worth mentioning that the feature map size is preserved throughout the module, as the input and the ground truth (HR, HQ) angiograms are spatially aligned. For the most part, the input and the ground truth angiograms share similar low-frequency information but differ in the high-frequency details. Thus, we designed our module with a (9 × 9) convolution operation to extract the low-frequency information. The operations within the Deep layer focus on reconstructing the high-frequency residuals. The two components are merged using the element-wise summation operation. Finally, a 400 × 400 reconstructed angiogram is obtained through a (9 × 9) convolution at the end of the module.
As mentioned before, the modules are concatenated in series to form a single reconstruction network. We establish two comparative network models based on the inter-block connection type adopted in the Deep layer: the DenseReconstNet (dense connection) and the ResReconstNet (residual connection).
Model training and evaluation. Several training strategies were employed to train our networks and optimize their trainable parameters. For the loss function L, we diverge from the de facto mean squared error (MSE) loss used in the previous studies 17,18 and instead propose a perceptually improved loss function by combining different loss terms. While the MSE loss achieves high PSNR, minimizing the MSE loss encourages reconstructing pixel-wise averages of plausible outcomes, which are typically over-smooth and thus fail to capture the high-frequency details of small vessel profiles 25. We utilize the mean absolute error (MAE) loss as the primary loss function since it is known to have better local convexity 26:

L_MAE = (1/N) Σ_i |Y_i − Ŷ_i|,  (1)

where N denotes the number of pixels in each angiogram, and Y and Ŷ denote the ground truth and the reconstructed angiogram, respectively. In addition to the primary loss function, we incorporate two other loss terms to aid in reconstructing the high-frequency details. The first term is the multiscale structural similarity (MS-SSIM 27) loss, with the MS-SSIM(·) function calculating the corresponding metric. The MS-SSIM loss is advantageous in that it adeptly preserves the contrast in high-frequency regions when utilized along with the MAE loss 26. The second term is the Fourier MAE loss (FMAE), which calculates the MAE from the magnitude of the angiograms' 2D Fourier transforms. We choose FMAE to provide information on the vessel orientations and perceive the downsampling remnants as frequency corruptions 23. The general strategy in incorporating different loss terms is that the primary loss function reconstructs the overall structure and the angiograms' low-frequency components. In contrast, the two sub-loss terms reconstruct the high-frequency details and remove the angiograms' frequency-space artifacts.
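The MAE and FMAE terms can be sketched directly in NumPy. The MS-SSIM value is passed in precomputed here (a library implementation would be used in practice), and both the 1 − MS-SSIM form and the weights are assumptions of this sketch rather than the paper's tuned values:

```python
import numpy as np

def mae_loss(y, y_hat):
    # Primary pixel-wise loss (Eq. 1).
    return np.mean(np.abs(y - y_hat))

def fmae_loss(y, y_hat):
    # Fourier MAE: MAE between the magnitudes of the 2D Fourier
    # transforms, penalizing undersampling remnants that appear
    # as frequency-space corruptions.
    return np.mean(np.abs(np.abs(np.fft.fft2(y)) - np.abs(np.fft.fft2(y_hat))))

def total_loss(y, y_hat, msssim, alpha=0.1, beta=0.01):
    # Linear combination of the three terms; alpha and beta are
    # illustrative weights, and (1 - msssim) turns the similarity
    # metric into a minimizable loss term.
    return mae_loss(y, y_hat) + alpha * (1.0 - msssim) + beta * fmae_loss(y, y_hat)

y = np.random.default_rng(1).random((8, 8))
print(total_loss(y, y, msssim=1.0))  # 0.0 for a perfect reconstruction
```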
In total, our loss function L is designed by linearly combining the above terms and is defined as follows:

L = L_MAE + α · L_MS-SSIM + β · L_FMAE,  (2)

where α and β are the weighting values for the sub-loss terms. All the terms included in our loss function are fully differentiable, which is necessary for the neural network's backpropagation algorithm. The weighting values for the different loss terms were referenced from previous literature 23,26, finetuned by heuristic searching, and found to be sufficient for all of the tested models. In our experiments, the proposed loss function provided improved reconstruction performance and training stability compared to solely using the MAE loss. Each module is sequentially trained during each training iteration. The super-resolution module is trained first, followed by the quality-enhancing module using the previous module's output. At the end of each iteration, the two modules are concatenated as an end-to-end network and finetuned using a lower learning rate. Trainable parameters were initialized using the He normal initialization method 20 and optimized using the Adam optimizer 28. Early stopping and L2 regularization 29 were also employed to avoid over-fitting the network parameters. After successfully establishing the DenseReconstNet and the ResReconstNet, we further trained each network using an adversarial training scheme to boost the reconstruction performance. As illustrated in Fig. 4, we define a discriminator network D, which we optimized in an alternating manner with our pre-trained reconstruction model G to solve the adversarial min-max problem 30:

min_G max_D  E_Y[log D(Y)] + E_X[log(1 − D(G(X)))],  (3)

where X denotes the (LR, LQ) angiogram taken as the input to our reconstruction model and Y denotes the corresponding (HR, HQ) angiogram. The idea is that we train our reconstruction model to fool the discriminator that distinguishes the reconstructed angiograms from their (HR, HQ) counterparts.
The proposed training strategy allows our reconstruction model to create perceptually superior solutions residing in the manifold of the (HR, HQ) angiograms. The adversarial loss function for the reconstruction models is defined as follows 30:

L_adv = −log D(G(X)),  L_GAN = L + γ · L_adv,  (4)

where γ is the weight value for the adversarial loss term, which was referenced from previous literature 25, finetuned by heuristic searching, and found to be sufficient for all of the tested models. The established generative adversarial networks (GANs) are denoted as DenseReconstGAN and ResReconstGAN. Details on the discriminator network's structure can be found in Supplementary Table 2. All hyper-parameters were searched for using the random search strategy 31, and their details can be found in Supplementary Table 3.
As for the evaluation metrics, we consider the perceptual indices of SSIM 32, MS-SSIM 27, and the PSNR to assess the reconstruction performance. In addition, we assess the angiograms' contrast by calculating the root-mean-square (RMS) contrast 33 and the vessel connectivity by calculating the ratio of connected flow pixels to the total number of skeletonized pixels 18,19.
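The two angiogram-quality measures can be sketched as follows; the exact definitions in refs. 18, 19, and 33 may differ, and the skeleton is taken here as a precomputed binary mask (in practice it would come from a thinning step such as skimage.morphology.skeletonize):

```python
import numpy as np

def rms_contrast(img):
    # RMS contrast: standard deviation of the pixel intensities of
    # an image normalized to [0, 1].
    return float(img.std())

def vessel_connectivity(flow_mask, skeleton_mask):
    # Ratio of connected flow pixels lying on the vessel skeleton to
    # the total number of skeletonized pixels.
    skel = skeleton_mask.astype(bool)
    flow = flow_mask.astype(bool)
    return float((flow & skel).sum()) / max(int(skel.sum()), 1)

flow = np.array([[1, 1, 0], [0, 1, 0], [0, 1, 1]])
skel = np.array([[1, 1, 0], [0, 1, 0], [0, 1, 0]])
print(vessel_connectivity(flow, skel))  # 1.0: every skeleton pixel carries flow
```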

Results and discussion
In this section, we present and discuss the experimental results of our OCTA reconstruction method. We again emphasize that our task differs from image super-resolution in the strict sense because our method compensates for both the pixel undersampling (image resolution) and the B-scan repetition number (image quality). However, no previous work has attempted to address both factors in a single integrated framework. Therefore, we compare our method with existing super-resolution methods, including interpolation-based image upsampling methods (i.e., nearest-neighbor, bicubic, and Lanczos interpolation) and a previously reported DL-based super-resolution network for retinal OCT angiograms called the high-resolution angiogram reconstruction network (HARNet) 18. Implementation details for HARNet were carefully examined from the previous literature 18, and its hyper-parameters were optimized using the random search strategy 31 (Supplementary Table 3). We first quantitatively compare the reconstruction performance of the presented methods using the test dataset consisting of sectioned enface angiograms. Once the methods showing the most promising results are identified, we further assess quantitative and qualitative results on the test dataset consisting of MIP enface angiograms from various in-vivo samples.
Quantitative comparison using sectioned enface angiograms. To emphasize the advantage of our proposed training strategy (i.e., two-staged training with the perceptual loss function followed by adversarial training), we conduct a full ablation study in addition to the comparative study with the baseline methods, and quantitative results are summarized in Table 2. We mention that although HARNet was originally proposed as a super-resolution framework, it was trained to map (LR, LQ) images to (HR, HQ) images in our experiments and thus learns to compensate for both the pixel undersampling (image resolution) and the B-scan repetition number (image quality). Quality enhancement filters (e.g., median, bilateral, vesselness) were not used for the interpolation-based methods since sectioned enface angiograms do not contain full vascular profiles (see Fig. 2), and applying such filters deteriorated the results. For the same reason, angiogram quality measures (i.e., the contrast and vessel connectivity) were not assessed, and only the image reconstruction metrics were considered (i.e., SSIM, MS-SSIM, and PSNR). However, an additional quality enhancement filter was applied for the interpolation-based method when evaluating the MIP enface angiograms in the following section, and all metrics were assessed. The models trained with our proposed training strategy (ResReconstGAN and DenseReconstGAN) outperform all baseline methods, including the interpolation-based methods and the previously reported HARNet. When solely using the MAE loss, the networks trained with the two-staged training scheme show improved performance over those trained in a single-stage manner that directly maps (LR, LQ) angiograms to their (HR, HQ) counterparts, including HARNet. The two-staged training strategy enhances the reconstruction performance since it provides additional intermediate supervision during training to perform super-resolution to (HR, LQ) angiograms.
Furthermore, Table 2 shows that training with our proposed perceptual loss function and the adversarial training scheme further boosts the reconstruction performance, as the metrics improved over the two-staged networks trained solely using the MAE loss. The results follow the intuition that the sub-loss terms and the adversarial loss push the reconstructed results to the (HR, HQ) angiograms' manifold. The dense connection-based models (DenseReconstNet and DenseReconstGAN) outperform the residual connection-based models (HARNet, ResReconstNet, and ResReconstGAN), which is consistent with the consensus that deeper networks benefit from their ability to model mappings of higher complexity 34,35. It is worth mentioning that the previously reported HARNet shows poor reconstruction performance even compared to our single-stage models trained using the MAE loss. One possible reason is that HARNet pre-upsamples the input image using filter-based methods, which is known to deteriorate the results 25, whereas our networks learn the upsampling filters directly. DenseReconstGAN consistently shows the best performance amongst all comparative methods for each of the downsampling ratios. HARNet shows the best performance amongst the baseline methods, and the bicubic interpolation method outperforms all other interpolation-based methods. Therefore, we focus our analysis on comparing these three methods when assessing the MIP test dataset, and the results are presented in the following section.
Quantitative and qualitative comparison using MIP enface angiograms. We compare the performance of the DenseReconstGAN, HARNet, and the bicubic interpolation method for the test dataset consisting of MIP enface images taken from various biosamples. The quantitative results of the three comparison methods are presented in Table 3. A bilateral filter was applied to the bicubic interpolation results to enhance the angiogram quality for fair comparisons with the DL-based methods. The results indicate that our DenseReconstGAN is superior to the baseline methods in terms of image reconstruction measures (i.e., SSIM, MS-SSIM, and PSNR) and angiogram quality measures (i.e., the contrast and vessel connectivity) for all downsampling ratios. Also, the results show that our DNN, trained with sectioned enface images, can generalize to full vascular profiles in MIP enface images. However, we notice that all evaluation metrics using the MIP enface dataset are lower than the sectioned enface dataset (see Table 2), especially for higher downsampling ratios of four and eight. The main reason is that the void regions in the sectioned enface angiograms yield high evaluation metrics, for they do not include contents for reconstruction (see Fig. 2). In addition, assessing the MIP enface angiograms requires higher generalizability, as they are taken from various in-vivo samples. Nevertheless, our DenseReconstGAN still exhibits notable metrics at downsampling ratios of two and four and significantly outperforms the baseline methods, including the previously reported HARNet, which confirms the superiority of our proposed DL-based framework. Figure 5 illustrates the qualitative performance of our DenseReconstGAN and HARNet on an (LR, LQ) MIP enface angiogram with a downsampling ratio of two.
In the enlarged regions of interest (ROI), it can be seen that HARNet suffers from contrast loss and over-smooths the vascular details, resulting in dilated vessel structures. In contrast, our DenseReconstGAN demonstrates improved reconstruction precision with obvious advantages in recovering the high-frequency microvascular details owing to our improved training strategy. Since capillary morphology provides critical information about vascular pathology (e.g., capillary telangiectasias), we specifically compare the reconstruction performance of very small vessels (∼50 µm in diameter) along the dashed lines within each ROI. It can be observed that the input (LR, LQ) angiogram exhibits distorted and out-of-phase vessel profiles due to spatial aliasing. HARNet fails to distinguish vessels of smaller diameters (vessel profiles in the distance ranges of 100-125 µm and 150-175 µm for the red ROI, and 75-100 µm for the blue ROI) and severely loses contrast (0-25 µm for the red ROI and 150-200 µm for the blue ROI). Conversely, DenseReconstGAN reconstructs in-phase microvascular profiles with relatively small contrast loss. In addition, the vessel profiles indicate that our DenseReconstGAN produces no false blood flow signals. These results are better reflected in the angiograms' contrast and vessel connectivity values in Table 3, as DenseReconstGAN shows superior values for all downsampling ratios. Some minor limitations can be observed using our method in certain poorly sampled regions. For example, local smoothing (115-150 µm for the red ROI) is observed in regions where spatial aliasing is most severe in the input (LR, LQ) angiogram. In addition, DenseReconstGAN also suffers from wavy motion artifacts, as can be observed in the red ROI. Our DL model is more prone to these artifacts since high expression capability, unfortunately, enhances noise.
Our framework does not attempt to correct these artifacts, since commercial systems can easily address them through tracking at the scan acquisition level and related software, as described in Refs. 36,37. Despite the above limitations, the qualitative results indicate that our proposed DL framework exhibits significantly superior performance to the bicubic interpolation method with bilateral filtering and the previously reported HARNet in reconstructing the microvascular profiles.
We present a qualitative comparison of the two methods at higher downsampling ratios of four and eight (Fig. 6). A clear distinction can be made regarding the quality of vesselness, as the difference between the two methods becomes more significant with sparse data. At a downsampling ratio of four, over-smoothing and contrast loss worsen for HARNet, whereas microvascular structure and contrast are reasonably preserved for our DenseReconstGAN. Similarly, at a downsampling ratio of eight, HARNet exhibits biologically dubious features (jagged and disjointed vessels), while our DenseReconstGAN reconstructs smoother and rounder vessel profiles. However, at such a high downsampling ratio, the performance of our DL model deteriorates severely. Since a downsampling ratio of eight is equivalent to a scanning step size of 80 µm, which far exceeds the maximum step size permitted by the Nyquist criterion for capturing capillary-level resolution (∼10 µm 38), the degraded results are expected. Thus, the desired application context should always be considered when selecting the appropriate scanning step size to meet the desired performance. For example, a large scanning step size (e.g., 40 µm) is more appropriate for clinical applications requiring fast imaging speed but tolerant to low image quality. Nevertheless, our DL framework still outperforms the previously reported super-resolution network HARNet in reconstructing the vasculature's bulk physiology when using larger scanning step sizes of 40 and 80 µm.

Table 3. Performance metrics of our DenseReconstGAN, HARNet, and the bicubic interpolation method with bilateral filtering, evaluated on the test dataset composed of maximum intensity projection (MIP) enface angiograms from various biosamples. SSIM, structural similarity index measure; MS-SSIM, multiscale structural similarity index measure; PSNR, peak signal-to-noise ratio; FPS, frames per second. Significance values are given in bold.

Thus,
our method expands the range of usable scanning step sizes in OCTA systems to values previously considered unusable. We also compare the computation times; the mean frames per second (FPS) rate of each method is provided in Table 3 (CPU and GPU details are listed in Supplementary Table 3).
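As a sanity check on the numbers in this section, the effective scanning step size and the overall acceleration for each downsampling ratio can be computed. Modelling the acceleration as the repetition reduction (8/2 = 4×) times the squared downsampling ratio is an inference from the text, but it reproduces the reported 16-256× range:

```python
base_step_um = 10     # fully-sampled scanning step size
repeat_gain = 8 // 2  # 2 instead of 8 repeated B-scans

for ratio in (2, 4, 8):
    step = base_step_um * ratio          # effective scanning step size
    speedup = repeat_gain * ratio ** 2   # fewer repeats x fewer enface samples
    print(f"ratio {ratio}: step {step} um, acceleration {speedup}x")
# ratio 2: step 20 um, acceleration 16x
# ratio 4: step 40 um, acceleration 64x
# ratio 8: step 80 um, acceleration 256x
```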

Conclusions
This study proposed a deep learning-based, software-only solution to accelerate the OCTA systems' imaging speed while preserving the image quality. The main contributions of the work are summarized as follows. First, we tackled both inherent factors behind the slow OCTA imaging speed, namely the B-scan repetition number and the sampling density. While previous works only partially addressed these limiting factors, our work considered all aspects in a single, end-to-end framework. As a result, our approach achieved an acceleration of OCTA imaging by factors of 16 to 256×. Our method is advantageous since it can be directly applied to clinically available systems without increasing the system complexity. In addition, larger FOVs can be captured to improve clinical observations without sacrificing spatial resolution. Second, we established a novel DNN framework by leveraging state-of-the-art deep learning principles (i.e., network architectures, loss functions, and training strategies). In particular, we showed that the DenseReconstGAN exhibited the best performance and significantly outperformed the baseline methods, including the previously reported HARNet, in both quantitative and qualitative aspects for all of the experimented cases (Tables 2, 3, Figs. 5, 6). The results indicate that our proposed DNN framework is more suitable for the current application of compensating for both the B-scan repetition number and the sampling density.
In the future, we aim to further improve our DL framework's generalizability by training with more MIP enface images from various in-vivo samples. We also aim to extend our framework to other pathological applications (e.g., retinal angiography). By combining our established framework with transfer learning techniques 39, we can avoid acquiring the large amount of data required for re-training. We hope to utilize our integrated framework to broaden OCTA systems' clinical applicability to rapidly and accurately delineate pathological biomarkers and achieve potential near real-time functional imaging capability.