Cone-beam CT image quality improvement using Cycle-Deblur consistent adversarial networks (Cycle-Deblur GAN) for chest CT imaging in breast cancer patients

Cone-beam computed tomography (CBCT) integrated with a linear accelerator is widely used to increase the accuracy of radiotherapy and plays an important role in image-guided radiotherapy (IGRT). Compared with fan-beam computed tomography (FBCT), CBCT images are degraded by X-ray scatter, noise, and artefacts. We propose a deep learning model, “Cycle-Deblur GAN”, that combines the CycleGAN and Deblur-GAN models to improve the image quality of chest CBCT images. A total of 8706 CBCT and FBCT image pairs were used for training and 1150 image pairs for testing. Compared with the CycleGAN and RED-CNN models, the CBCT images generated by the Cycle-Deblur GAN model had CT values closer to those of FBCT in the lung, breast, mediastinum, and sternum. In the quantitative evaluation, the CBCT generated by the Cycle-Deblur GAN model also achieved better MAE, PSNR, and SSIM than the CycleGAN and RED-CNN models. The Cycle-Deblur GAN model improved image quality and CT-value accuracy while preserving structural details in chest CBCT images.


Methods
Data pre-processing. Fifteen breast cancer patients were enrolled in this study. Before radiotherapy, each patient underwent a planning CT scan, that is, the FBCT, for treatment planning. These scans were acquired on a big-bore CT scanner (Discovery CT590 RT, GE company, Boston, USA) with 16 detector rows, a helical pitch of 0.938:1, a slice thickness of 2.5 mm, and an FOV of 50 cm. FBCT images were reconstructed with the adaptive statistical iterative reconstruction (ASiR) algorithm at a 30% level (SS30). SS30 denotes the slice-statistical ASiR mode in which 30% of the full (100%) ASiR reconstruction is blended with the original image 30. The reconstructed FBCT images were used in this study. During every treatment fraction, CBCT was performed for image registration. In total, 185 CBCT datasets comprising 9856 CBCT images were acquired with the X-ray Volumetric Imager 10,11 (XVI system, version R5.0, Elekta company, Stockholm, Sweden) and reconstructed with an optimized Feldkamp back-projection algorithm; these images were used for training and testing. The XVI VolumeView acquisition parameters for chest CBCT were a tube voltage of 120 kVp, a tube current of 40 mA, an acquisition time of 120 s, a frame rate of 5.5 frames per second, and a voxel size of 1 mm × 1 mm × 1 mm. The M20 protocol was selected for XVI acquisition: "M" denotes an FOV of 42.5 cm × 42.5 cm at the kV detector panel, and "20" denotes a length of 27.67 cm at the isocenter of the field projections. The FBCT, acquired once during CT simulation, served as the ground truth for each CBCT set in this study. Of the fifteen patients, the images of three patients (1150 images) were kept for testing, and those of the remaining 12 patients (8706 images) were used to train the network. The pre-processing used to build each CBCT–FBCT image pair was as follows.
The CBCT images of each patient were three-dimensionally pre-aligned to the corresponding FBCT images by rigid registration in PMOD software (Version 3.7, PMOD Technologies, Zurich, Switzerland). Non-anatomical structures could adversely affect both the CBCT-to-FBCT registration and model training. To prevent this, binary masks were created to separate the body region from non-anatomical regions. The masks were obtained by finding the largest convex hull at a threshold CT value of −1000. CycleGAN does not need paired data, and unpaired images may provide more varied characteristics for model training 28. Nevertheless, we used paired data to shorten model training. We also cropped all images from 512 × 512 to 264 × 336 pixels, restricting them to the anatomical region to reduce computation time. We normalized the CT values of the CBCT and FBCT images from the range −950 to 500 into the range 0–1. Pixels with CT values below −950 were therefore assigned 0, and those above 500 were assigned 1.
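The intensity normalization above can be written as a clip followed by a linear map. The minimal sketch below uses hypothetical function names and shows only the arithmetic implied by the text, not the authors' code. It treats images as nested Python lists and maps the window −950 to 500 onto 0–1.

```python
from typing import List

HU_MIN = -950.0  # lower clipping bound stated in the text
HU_MAX = 500.0   # upper clipping bound stated in the text


def normalize_hu(hu: float) -> float:
    """Clip a CT value to [HU_MIN, HU_MAX] and map it linearly onto [0, 1]."""
    clipped = min(max(hu, HU_MIN), HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def denormalize(value: float) -> float:
    """Map a network output in [0, 1] back to a CT value in [HU_MIN, HU_MAX]."""
    return value * (HU_MAX - HU_MIN) + HU_MIN


def normalize_image(image: List[List[float]]) -> List[List[float]]:
    """Apply normalize_hu to every pixel of a 2-D image."""
    return [[normalize_hu(v) for v in row] for row in image]


if __name__ == "__main__":
    print(normalize_image([[-1000.0, -225.0, 800.0]]))  # [[0.0, 0.5, 1.0]]
```

Clipping discards information outside the window. A denormalized network output can therefore only span −950 to 500, which covers lung, soft tissue and most bone in the chest.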

Image modelling.
An overview of the model architecture is illustrated in Fig. 1. For the generators, we adopted the generative network architecture of Kupyn et al. 27. Their model, Deblur-GAN, showed impressive results in generating sharp synthetic images from blurred ones. The generator architecture is shown in Fig. 2. The inception block 31, adopted from GoogLeNet, extracts different kinds of features through convolutions of different sizes. In our study, combining features from large and small receptive fields helps make boundaries clearer. The generative network contains one convolution block with stride two, two inception blocks with stride two, nine residual blocks, and two transposed convolution blocks. Each residual block consists of a convolution layer, an instance normalization layer, and a Swish activation 32,33. The inception block is a collection of convolution layers of different sizes (1 × 1, 5 × 5, 9 × 9 and 13 × 13). It can capture both fine details and coarse features without changing the image size. To give the branches different priorities when they are concatenated (Fig. 3), we multiply the outputs of the 1 × 1, 5 × 5, 9 × 9 and 13 × 13 branches by different weights. The discriminator D is shown in Fig. 4. For D, we followed the architectural guidelines of Ledig et al. 34 but replaced the final fully connected layer with a convolution layer that operates on the input patch image. The discriminator therefore classifies whether overlapping image patches are real or generated. Such a patch-level discriminator has fewer parameters than a full-image discriminator and can work on arbitrarily sized images in a fully convolutional manner.
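The four inception branches can be concatenated only if they produce feature maps of equal spatial size. The sketch below assumes odd kernels with symmetric zero padding of (k − 1)/2; the text does not state the padding. Under that assumption, the arithmetic shows that the 1 × 1, 5 × 5, 9 × 9 and 13 × 13 branches keep the input size at stride 1 and still agree with one another at stride 2. The output-size rule is the standard convolution formula with dilation 1, as documented for PyTorch `Conv2d`. The helper names are hypothetical.

```python
from typing import List, Sequence


def conv_out_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis of a convolution with dilation 1."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel: int) -> int:
    """Symmetric zero padding that preserves size at stride 1 (odd kernels only)."""
    if kernel % 2 == 0:
        raise ValueError("kernel size must be odd")
    return (kernel - 1) // 2


def branch_sizes(n: int, kernels: Sequence[int] = (1, 5, 9, 13), stride: int = 1) -> List[int]:
    """Spatial size produced by each inception branch for an input of length n."""
    return [conv_out_size(n, k, stride, same_padding(k)) for k in kernels]


if __name__ == "__main__":
    print(branch_sizes(128))            # [128, 128, 128, 128]
    print(branch_sizes(128, stride=2))  # [64, 64, 64, 64]
    print(branch_sizes(264, stride=2))  # [132, 132, 132, 132]
```

With this padding, n + 2p − k equals n − 1 for every kernel size. All branches therefore output ⌈n/2⌉ at stride 2, so the 128 × 128 training crops and the 264 × 336 inputs both concatenate cleanly.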
Loss functions. Our goal in this study is to train a deep neural network that learns a mapping function minimizing the loss functions. Let x ∈ X denote an input CBCT image, and let y ∈ Y denote the corresponding FBCT image, which serves as the ground truth. In general, the mapping function can be written as Eq. (1), $\hat{y} = G_{\theta}(x)$. Here $\hat{y}$ is the synthesized FBCT image and $G_{\theta}$ is the generative model that transforms x into $\hat{y}$. To obtain a good $\hat{y}$, $G_{\theta}$ is trained with the loss function in Eq. (2), where $(x_i, y_i)$ are paired CBCT and FBCT images. Inspired by Kida et al. 29, our model has two generative models, $G_{CF}: X \to Y$ and $G_{FC}: Y \to X$, trained on the image pairs $(x_i, y_i)$. The two generators synthesize different targets. $G_{CF}$ generates a synthesized FBCT from an input CBCT, whereas $G_{FC}$ generates a synthesized CBCT from an input FBCT. In addition, two adversarial discriminators, $D_F$ and $D_C$, aim to distinguish whether a generator output is real or synthesized. For example, given an FBCT input y, $G_{FC}$ generates a synthesized CBCT $\hat{x}$ that should be similar enough to a real CBCT x to fool the discriminator $D_C$. Meanwhile, $D_F$ judges the FBCT images produced by $G_{CF}$ both from the synthesized CBCT $\hat{x}$ (the reconstructed FBCT) and from the real CBCT x. In this cyclic scheme, the discriminators evaluate not only synthesized CBCT (or FBCT) images but also reconstructed ones. The key idea is that the generators and discriminators are trained against each other to improve their accuracy. Our objective is therefore a minimax problem, given in Eq. (3). Our networks combine five types of loss functions: adversarial loss (adv), cycle-consistency loss (cycle), generated loss (generated), identity loss (identity), and Sobel filter loss (Sobel) 35.
For the discriminators, we adopt the WGAN-GP objective in Eqs. (4) and (5), where $\mathbb{E}[\cdot]$ is the expectation operator. The first two terms form the negative Wasserstein distance, which measures how much higher real images score than synthesized ones. The last term is the gradient penalty, in which $\lambda$ is a regularization parameter. The penalty is evaluated at random interpolations between real and synthesized images, e.g., $\tilde{y} = \varepsilon y + (1-\varepsilon)\hat{y}$. The overall loss function for the discriminators is given in Eq. (6), and the adversarial loss for the generators in Eq. (7). Under adversarial training, the discriminators and generators compete with each other. The discriminator loss, such as $\mathcal{L}_{\mathrm{WGAN\text{-}gp}}(G_{FC}, D_C)$, encourages real images to score high and synthesized images to score low. The generator loss, such as $-\mathbb{E}_{y}[D_C(G_{FC}(y))]$, instead pushes the synthesized images to score higher. The two networks therefore have opposing objectives and are trained alternately.
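For reference, the standard WGAN-GP form of these terms is sketched below in this section's notation. We assume, but cannot confirm from the text, that Eqs. (4)–(7) follow this form, with ε drawn uniformly from [0, 1] as in the original WGAN-GP. The extra discriminator terms on reconstructed (cycled) images described above are not included.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{WGAN\text{-}gp}}(G_{FC}, D_C)
  &= \mathbb{E}_{y}\big[D_C(G_{FC}(y))\big] - \mathbb{E}_{x}\big[D_C(x)\big]
     + \lambda\,\mathbb{E}_{\tilde{x}}\big[(\lVert \nabla_{\tilde{x}} D_C(\tilde{x}) \rVert_2 - 1)^2\big],
  && \tilde{x} = \varepsilon x + (1-\varepsilon)\,G_{FC}(y) \\
\mathcal{L}_{\mathrm{WGAN\text{-}gp}}(G_{CF}, D_F)
  &= \mathbb{E}_{x}\big[D_F(G_{CF}(x))\big] - \mathbb{E}_{y}\big[D_F(y)\big]
     + \lambda\,\mathbb{E}_{\tilde{y}}\big[(\lVert \nabla_{\tilde{y}} D_F(\tilde{y}) \rVert_2 - 1)^2\big],
  && \tilde{y} = \varepsilon y + (1-\varepsilon)\,G_{CF}(x) \\
\mathcal{L}_{D}
  &= \mathcal{L}_{\mathrm{WGAN\text{-}gp}}(G_{FC}, D_C) + \mathcal{L}_{\mathrm{WGAN\text{-}gp}}(G_{CF}, D_F) \\
\mathcal{L}_{adv}
  &= -\mathbb{E}_{y}\big[D_C(G_{FC}(y))\big] - \mathbb{E}_{x}\big[D_F(G_{CF}(x))\big]
\end{align*}
```

In each critic loss, the first two terms are the negative Wasserstein estimate and the last is the gradient penalty. $\mathcal{L}_{adv}$ contains exactly the terms the generators try to raise.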
The cycle-consistency loss is given in Eq. (8). It enforces $x \to G_{CF}(x) \to G_{FC}(G_{CF}(x)) \approx x$ and $y \to G_{FC}(y) \to G_{CF}(G_{FC}(y)) \approx y$. These two mappings are referred to as the forward and backward cycle-consistency losses, respectively.
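The forward and backward cycle terms are written below in this section's notation. The sketch assumes the L1 norm of the original CycleGAN formulation; the norm used in Eq. (8) is not stated in the text.

```latex
\mathcal{L}_{\mathrm{cycle}}
  = \underbrace{\mathbb{E}_{x}\big[\lVert G_{FC}(G_{CF}(x)) - x \rVert_1\big]}_{\text{forward cycle}}
  + \underbrace{\mathbb{E}_{y}\big[\lVert G_{CF}(G_{FC}(y)) - y \rVert_1\big]}_{\text{backward cycle}}
```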
To make the generators produce the target images directly, we define the generated loss in Eq. (9). This term enforces $\hat{y} \approx y$ and $\hat{x} \approx x$, which is our final objective. The identity loss is given in Eq. (10). The idea behind the identity loss is that a generator should output images in its target style regardless of the input. For example, when a synthesized FBCT is fed to $G_{CF}$, the model should still output an FBCT-style image, even though the input is not a real CBCT. Likewise, $G_{FC}$ should output a CBCT-style image when given a synthesized CBCT. This term acts as a regularizer that keeps training from overfitting. The Sobel filter loss 35 is given in Eq. (11), where $\delta_1$ and $\delta_2$ are the Sobel gradient operators. These operators extract the horizontal and vertical gradients of the image intensity, so the loss penalizes differences in edges and discourages blurred edges. The hyper-parameters $\lambda_{cycle}$, $\lambda_{generated}$, $\lambda_{identity}$ and $\lambda_{sobel}$ of the total generator objective, Eq. (12), are changed during training, whereas the adversarial (GAN) term always has a weight of 1. In the first 10 epochs, the loss function is similar to that of a vanilla GAN: all hyper-parameters except the weight of $\mathcal{L}_{gan}$ are 0. The next 10 epochs form the cycle-consistent period, in which we set $\lambda_{cycle} = 5$, $\lambda_{identity} = 5$, $\lambda_{generated} = 1$ and $\lambda_{sobel} = 0$. After 20 epochs, we adopt $\lambda_{cycle} = 10$, $\lambda_{identity} = 10$, $\lambda_{generated} = 10$ and $\lambda_{sobel} = 10^{-4}$ as the main loss weights. We keep $\lambda_{sobel}$ as small as $10^{-4}$ because a large Sobel gradient loss can dominate and mislead the total loss. Our goal is for the model to learn the style transfer between CBCT and FBCT through cycle consistency. It should also fit the target style directly through the paired losses weighted by $\lambda_{generated} = 10$ and $\lambda_{sobel} = 10^{-4}$.
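The Sobel term compares edge maps rather than intensities. Eq. (11) is not reproduced here, so the minimal sketch below assumes an L1 penalty on the difference between the Sobel responses (δ1, δ2) of the synthesized and reference images, evaluated on interior pixels only. The function names are hypothetical. A training implementation would normally use a fixed-weight convolution on tensors; the plain loops here only make the arithmetic explicit.

```python
from typing import List

Image = List[List[float]]

# 3 x 3 Sobel kernels: delta_1 (horizontal gradient) and delta_2 (vertical gradient).
SOBEL_X: Image = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
SOBEL_Y: Image = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]


def filter3x3(img: Image, kernel: Image) -> Image:
    """Cross-correlate img with a 3 x 3 kernel on interior pixels (no padding)."""
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * img[i + di - 1][j + dj - 1]
            row.append(acc)
        out.append(row)
    return out


def sobel_loss(pred: Image, target: Image) -> float:
    """Mean over interior pixels of |dx(pred) - dx(target)| + |dy(pred) - dy(target)|."""
    n_interior = (len(pred) - 2) * (len(pred[0]) - 2)
    total = 0.0
    for kernel in (SOBEL_X, SOBEL_Y):
        grad_pred, grad_target = filter3x3(pred, kernel), filter3x3(target, kernel)
        for row_p, row_t in zip(grad_pred, grad_target):
            total += sum(abs(a - b) for a, b in zip(row_p, row_t))
    return total / n_interior


if __name__ == "__main__":
    edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # vertical step edge
    flat = [[0.0] * 4 for _ in range(4)]             # the edge blurred away entirely
    print(sobel_loss(flat, edge))  # 4.0
```

Each Sobel kernel sums to zero, so a uniform intensity offset leaves this loss unchanged. That is one reason a gradient term only makes sense alongside intensity-based terms such as the generated loss, and with a small weight like $\lambda_{sobel} = 10^{-4}$.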
The total generator objective of Eq. (12) is

$$\mathcal{L}_G = \lambda_{adv}\mathcal{L}_{adv} + \lambda_{cycle}\mathcal{L}_{cycle} + \lambda_{generated}\mathcal{L}_{generated} + \lambda_{identity}\mathcal{L}_{identity} + \lambda_{sobel}\mathcal{L}_{sobel} \qquad (12)$$

The networks are trained with the Adam optimizer 36, a learning rate of $10^{-4}$, and a batch size of 8. GAN training has difficulty converging to a good minimum, so we keep the learning rate constant for the first 20 epochs and then decay it with a cosine annealing scheduler. The CT images used for training contain large black background regions around the body, and this black border makes the model less sensitive to edge pixels. To improve model stability and prevent overfitting, data augmentation is applied during training. First, every CBCT–FBCT image pair loaded from the dataset is randomly cropped to 128 × 128 pixels, with the same crop applied to both images. Second, the pair is rotated by the same random angle between −20° and 20° and flipped horizontally and vertically. The augmented pairs are then used for training.
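The epoch-dependent loss weights and the learning-rate schedule described above can be summarized as plain functions. This sketch rests on several assumptions:

- Epochs are counted from 0.
- The adversarial weight $\lambda_{adv}$ is 1 throughout, since the text gives the GAN term a constant weight of 1.
- The cosine decay runs from epoch 20 to the final (200th) epoch.
- The minimum learning rate `eta_min` is 0.

All function names are hypothetical. In PyTorch, a comparable decay could be obtained with `torch.optim.lr_scheduler.CosineAnnealingLR` started after the constant phase.

```python
import math
from typing import Dict


def loss_weights(epoch: int) -> Dict[str, float]:
    """Generator loss weights for a 0-based epoch index, following the schedule in the text."""
    if epoch < 10:  # epochs 1-10: adversarial term only (vanilla GAN)
        return {"adv": 1.0, "cycle": 0.0, "identity": 0.0, "generated": 0.0, "sobel": 0.0}
    if epoch < 20:  # epochs 11-20: cycle-consistent period
        return {"adv": 1.0, "cycle": 5.0, "identity": 5.0, "generated": 1.0, "sobel": 0.0}
    # epoch 21 onwards: main loss weights
    return {"adv": 1.0, "cycle": 10.0, "identity": 10.0, "generated": 10.0, "sobel": 1e-4}


def generator_objective(losses: Dict[str, float], epoch: int) -> float:
    """Weighted sum L_G = sum_k lambda_k * L_k, as in Eq. (12)."""
    weights = loss_weights(epoch)
    return sum(weights[name] * losses[name] for name in weights)


def learning_rate(epoch: int, base_lr: float = 1e-4, total_epochs: int = 200,
                  constant_epochs: int = 20, eta_min: float = 0.0) -> float:
    """Constant rate for the first epochs, then cosine annealing towards eta_min."""
    if epoch < constant_epochs:
        return base_lr
    progress = (epoch - constant_epochs) / (total_epochs - constant_epochs)
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress)) / 2.0


if __name__ == "__main__":
    print(loss_weights(15))
    # {'adv': 1.0, 'cycle': 5.0, 'identity': 5.0, 'generated': 1.0, 'sobel': 0.0}
```

Keeping the schedule in a pure function of the epoch makes it easy to log and to reproduce when resuming training from a checkpoint.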
We used a personal computer with a single GPU (Nvidia Titan XP), a CPU (Intel Xeon E5-2620 v4 @ 2.10 GHz), and 64 GB of memory, running Ubuntu 18.04 LTS. We implemented our method with Python 3.6.7 and PyTorch 1.0.0. Training for 200 epochs took approximately 3 days.
Quantitative evaluation. The CT value is a linear transformation of the measured linear attenuation coefficient. On this scale, the radiodensity of distilled water at standard temperature and pressure (STP) is defined as 0 HU, and that of air at STP as −1000 HU. The CT values of different regions in the original CBCT, RED-CNN, CycleGAN, and Cycle-Deblur GAN images were compared with those of FBCT. We compared CT values in three soft tissues (breast, muscle, and mediastinum), two bony structures (sternum and spine), and lung tissue. To evaluate the performance of the proposed Cycle-Deblur GAN, we used established metrics. The first is the peak signal-to-noise ratio (PSNR), which captures noise reduction. The second is the structural similarity index measure (SSIM) 37, a metric based on the human visual system that jointly evaluates luminance, contrast, and structure. We also used the mean absolute error (MAE), which is part of our objective (loss) function as well. The PSNR is calculated from the mean squared error (MSE), a common measure of distortion. The seven regions of interest (ROIs) shown in Fig. 5 were used to compare the CT value, MAE, PSNR, and SSIM. MSE, PSNR, and SSIM are defined in Eqs. (13)–(21).
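Eqs. (13)–(21) are not reproduced here, so the sketch below uses the standard definitions of MAE, MSE, PSNR and SSIM. It assumes that pixel values are normalized to 0–1, as in the pre-processing, which gives a data range L = 1. For CT values in HU, `data_range` would instead be the width of the intensity window. The SSIM is a deliberate simplification: it is computed over a single global window with the usual constants K1 = 0.01 and K2 = 0.03, whereas the standard SSIM 37 averages over local, typically Gaussian-weighted, windows. Pixels are passed as flat lists, for example the pixels of one ROI.

```python
import math
from typing import Sequence


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, computed from the MSE."""
    err = mse(a, b)
    if err == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / err)


def global_ssim(a: Sequence[float], b: Sequence[float], data_range: float = 1.0) -> float:
    """SSIM over a single window covering all pixels (simplified, not windowed)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


if __name__ == "__main__":
    ref = [0.0, 0.5, 1.0, 0.25]   # e.g. FBCT pixels of one ROI (normalized)
    gen = [0.0, 0.5, 0.5, 0.25]   # e.g. generated CBCT pixels of the same ROI
    print(mae(ref, gen), mse(ref, gen))  # 0.125 0.0625
    print(round(psnr(ref, gen), 2))      # 12.04
```

Because PSNR depends only on the MSE, a smoothed image can raise PSNR while losing contrast. This is why SSIM, which also compares variance and covariance, is reported alongside it.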

Results
Fig. 5 shows the CBCT images generated by RED-CNN, CycleGAN, and our proposed Cycle-Deblur GAN, together with the seven ROIs. The CBCT images generated by Cycle-Deblur GAN provided better visual quality than those generated by CycleGAN and RED-CNN. CT images of the lung, soft tissue, and bone in the seven ROIs are shown in Figs. 5, 6, 7 and 8. In the lung ROIs, Cycle-Deblur GAN preserved more lung detail than the other methods. The ROIs were analysed in terms of CT value, MAE, PSNR, and SSIM. Table 1 lists the CT values of the seven ROIs for Cycle-Deblur GAN, CycleGAN, and RED-CNN as mean ± standard deviation.
In Table 3 In Table 4

In the blinded image observer study, the radiation oncologists had a median of 11 years of experience (range, 6–33 years), and the medical physicists had a median of 8 years (range, 6–22 years). The results are shown in Table 5. The mean scores were 2.8 for the original CBCT images and 4.5, 3.3, and 1.3 for the CBCT images generated by the Cycle-Deblur GAN, CycleGAN, and RED-CNN models, respectively. The CBCT generated by the Cycle-Deblur GAN model therefore scored higher than that of the other models.

Discussion
Our proposed Cycle-Deblur GAN combines CycleGAN and Deblur-GAN and increases the number of shortcut connections and inception blocks to preserve detailed structures. For the activation layers, we adopted Swish because it has been reported to give better test-set accuracy than ReLU across different numbers of layers 33. Satoshi Kida et al. 29 proposed CycleGAN for visual enhancement of pelvic CT images. However, for chest CBCT images, CycleGAN did not improve PSNR, SSIM, and MAE as much as Cycle-Deblur GAN, particularly in bone and soft tissues. In the RED-CNN model 14,19, the input images and ground-truth images were created from the same projections. In our study, by contrast, the CBCT and FBCT of each patient were acquired on different days, so image registration was required before modelling. When RED-CNN was trained on our CBCT and FBCT pairs, the residual misalignment degraded its output, which appeared blurred. In Cycle-Deblur GAN, both the CBCT and FBCT images are used as generator inputs, which yields a more stable model. Hence, registration errors caused by the different acquisition dates of FBCT and CBCT had less influence on Cycle-Deblur GAN, which generated higher-quality images.
CT values in the original CBCT images may fluctuate for the same material depending on its position in the scanned volume 15. As shown in Table 1, the CT values of the CBCT generated by Cycle-Deblur GAN were closer to those of FBCT than the values from RED-CNN and CycleGAN in the breast, lung, muscle, mediastinum, and sternum. For bone tissue, including the spine and sternum, the CBCT generated by RED-CNN achieved better PSNR and SSIM. However, as shown in Fig. 8, the CBCT images generated by RED-CNN were visibly blurred. The ROIs of the spine, breast, and sternum were smaller than the others because each ROI was contoured to contain a single structure only. Overall, our proposed method achieved better PSNR and SSIM than the other methods and preserved more detail, especially in lung tissue, as shown in Fig. 6.
Once Cycle-Deblur GAN was trained, only its CBCT-to-FBCT generator was used for testing. Each CBCT image is passed through the generator to obtain the generated CBCT. The higher image quality of the generated CBCT benefits image verification by oncologists. The generator took approximately 0.17 s on average to produce an improved CBCT image, although this time depends on the hardware used.

Limitations of the study
In this study, the proposed Cycle-Deblur GAN was used to model chest CBCT images and achieved better results than CycleGAN and RED-CNN. Because all input data were chest CT images, the trained generator may be applicable to the chest region only. In addition, smoothed images with lower noise and higher PSNR may have lower contrast. For this reason, SSIM, which reflects perceived structural similarity, was also evaluated in our study.

Conclusions
The CBCT generated by our proposed Cycle-Deblur GAN model showed improved image quality, with higher PSNR and SSIM in soft tissue, lung, and bony structures. The model removed most CBCT artefacts and enhanced structural details in the lung, soft tissue, and bony structures. It also provided better visualization than the original CBCT. Because the generated CBCT images have accurate CT values, they can be used for adaptive dose calculation in radiotherapy. In future work, the high image quality and accurate CT values of the generated CBCT could support the development of radiomics.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.