Generating synthetic contrast enhancement from non-contrast chest computed tomography using a generative adversarial network

This study aimed to evaluate a deep learning model for generating synthetic contrast-enhanced CT (sCECT) from non-contrast chest CT (NCCT). We collected three separate data sets: the development set (n = 25) for model training and tuning, test set 1 (n = 25) for technical evaluation, and test set 2 (n = 12) for clinical utility evaluation. In test set 1, image similarity metrics were calculated. In test set 2, the lesion contrast-to-noise ratio of the mediastinal lymph nodes was measured, and an observer study was conducted to compare lesion conspicuity. Comparisons were performed using the paired t-test or Wilcoxon signed-rank test. In test set 1, sCECT showed a lower mean absolute error (41.72 vs 48.74; P < .001), higher peak signal-to-noise ratio (17.44 vs 15.97; P < .001), higher multiscale structural similarity index measurement (0.84 vs 0.81; P < .001), and lower learned perceptual image patch similarity metric (0.14 vs 0.15; P < .001) than NCCT. In test set 2, the contrast-to-noise ratio of the mediastinal lymph nodes was higher in the sCECT group than in the NCCT group (6.15 ± 5.18 vs 0.74 ± 0.69; P < .001). The observer study showed, for all reviewers, higher lesion conspicuity in NCCT with sCECT than in NCCT alone (P ≤ .001). Synthetic CECT generated from NCCT improves the depiction of mediastinal lymph nodes.

Iodinated contrast media are widely used in computed tomography (CT) to enhance tissue contrast, making it easier to evaluate anatomic structures and pathologies. However, iodinated contrast media have potential adverse effects ranging from minor physiologic reactions to severe life-threatening situations, although their incidence has decreased with the development of low-osmolar and non-ionic contrast agents 1,2 . Many chest CT examinations, which are crucial diagnostic tools for evaluating thoracic disorders, are performed as non-contrast CT (NCCT), especially for screening or initial evaluation. The use of contrast in chest CT is often unnecessary for detecting lung parenchymal lesions. However, contrast-enhanced CT (CECT) plays a critical role in the detailed assessment of the mediastinum, pleura, and vessels.
In recent years, deep learning has been applied to various tasks in medical imaging, including automatic lesion detection, segmentation, and image quality improvement. One of the most interesting current implementations of deep learning in medical imaging is synthetic image generation, and the generative adversarial network (GAN) is considered the state of the art for this task 3,4 . A recent study used a deep learning algorithm to synthesize contrast enhancement from non-contrast cardiac CT 5 . However, the authors only used slices where the heart was present and mainly focused on delineating the left cardiac chamber. We think that generating synthetic contrast enhancement from a full-volume NCCT, without an additional scan or intravenous contrast injection, would prove more useful in clinical practice without any added risks to the patients.
This study aimed to propose and evaluate a deep learning approach using GAN for generating synthetic contrast-enhanced CT (sCECT) images from non-contrast chest CT.

Methods
This retrospective study was approved by the Seoul National University Hospital Institutional Review Board (SNUH, IRB no. 1910-152-1073) and the Institutional Review Board of Gyeongsang National University Changwon Hospital (GNUCH, IRB no. 2020-07-011). Both institutional review boards waived the requirement of informed consent for the study. All methods were performed in accordance with the relevant guidelines and regulations.
Data acquisition. We collected three separate data sets, the development set (for model training and tuning) from Hospital #1 (GNUCH) and test sets 1 (for technical evaluation) and 2 (for clinical utility evaluation) from Hospital #2 (SNUH). Patient inclusion is shown in Fig. 1. The development set included consecutive patients who underwent dual-energy thoracic CT angiography (Somatom Force; Siemens, Erlangen, Germany) in December 2019. There were no exclusion criteria in the development set. The development set was randomly split (9:1 ratio) into training and tuning sets. Test set 1 included consecutive patients who underwent thoracic CT angiography on various CT scanners between February 2020 and April 2020. Patients with CT examinations with motion artifacts or suboptimal contrast opacification were excluded. Test set 1 was divided into test sets 1A and 1B based on the CT vendor. Patients whose CT vendor was the same as that in the development set (Siemens, Erlangen, Germany) were included in test set 1A, and test set 1B consisted of the remaining patients. For evaluation of clinical utility, a separate test set with clinical relevance had to be constructed. Test set 2 comprised consecutive patients with suspected lung cancer who underwent preprocedural CT examinations for electromagnetic navigational bronchoscopy between August 2019 and April 2020. Among them, patients with at least one mediastinal lymph node with a short-axis diameter > 1 cm (significant lymphadenopathy) were included. At Hospital #2, patients underwent pre-bronchoscopic CT examinations consisting of a pre-contrast and a routine contrast-enhanced scan on a designated single CT scanner (IQon; Philips, Andover, Massachusetts). We obtained paired virtual non-contrast (VNC) and CECT data for the development set and paired NCCT and CECT data for the test sets.
Image preprocessing. All axial CT images were downloaded in Digital Imaging and Communications in Medicine (DICOM) format from picture archiving and communication systems (PACS) after anonymization. The size of all CT images was the same (512 × 512 pixels), and we did not resize the images. The original CT images ranged from −1024 to over 3000 HU. We acquired three greyscale images from each axial CT image by applying three different window settings, normalized them to a range of −1 to 1, and combined them into a 3-channel image. We tried various combinations of CT window settings in a preliminary study, and eventually, those we used in this study were as follows: lung/bone window (window width, 2000 HU; level, 0 HU), vascular
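The windowing-and-normalization step above can be sketched as follows. This is a minimal numpy illustration: only the lung/bone setting is stated in the text, so the second and third window settings below are placeholders, not the study's actual values.

```python
import numpy as np

def window_image(hu, width, level):
    """Clip a HU image to a CT window and linearly rescale to [-1, 1]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    clipped = np.clip(hu, lo, hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0

def to_three_channel(hu_slice):
    """Stack three windowed copies of one axial slice into a 3-channel image.
    Only the lung/bone window is given in the text; the other two settings
    below are illustrative placeholders."""
    windows = [(2000, 0),   # lung/bone window (width, level), as stated
               (600, 100),  # placeholder for the "vascular" window
               (350, 50)]   # mediastinal window, used later for evaluation
    return np.stack([window_image(hu_slice, w, l) for w, l in windows], axis=0)
```

For a 512 × 512 slice, `to_three_channel` returns a 3 × 512 × 512 array with all values in [−1, 1], ready to be fed to the network.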

Lesion annotation.
A board-certified radiologist (Y.H.C.) and a radiology resident (J.W.C.) with 16 and 4 years of experience, respectively, reviewed the CECT images of test set 2 and annotated mediastinal lymph nodes with a short-axis diameter > 5 mm. The measurement of lesion contrast-to-noise ratio (CNR) and the observer study were based on these annotations. There were a total of 55 annotated mediastinal lymph nodes (mean short-axis diameter, 8.62 ± 2.47 mm).
Deep learning model development. Source code for training and inference of our deep learning model is available at https://github.com/jwc-rad/pix2pix3D-CT. The architecture of our deep learning model is illustrated in Supplementary Fig. S1. The basic structure of our proposed model is identical to that of the original pix2pix model 6 , except that the 2D convolutional layers are replaced with their 3D counterparts. The proposed model consists of a generator network and a discriminator network, as in a conventional generative adversarial network (GAN) 4 . The generator network is an encoder-decoder convolutional neural network with skip connections (U-Net) 7 , with an input and output size of 512 × 512 × 16. Each encoder block is composed of a convolution with stride, Leaky ReLU 8 , and instance normalization 9 , whereas each decoder block consists of an upsampling layer, convolution with stride, ReLU 8 , instance normalization, and a skip connection. To reduce the checkerboard artifacts of a GAN, the model uses resize-convolution with nearest-neighbor interpolation for the upsampling layer of the decoder block 10 . A skip connection concatenates encoder block i and decoder block n − i, where n is the total number of blocks, and passes the output to a ReLU activation layer. The final output layer of the generator network uses the Tanh function 8 . The discriminator network is a PatchGAN 6 that classifies each 70 × 70 × 4 pixel patch as real or fake and whose convolutional module is identical to the encoder block of the generator network.
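The 70 × 70 in-plane patch size of the PatchGAN discriminator is the receptive field of one output unit. Assuming the standard pix2pix configuration of five 4 × 4 convolutions (three with stride 2, two with stride 1), which the paper does not spell out, the receptive field can be verified with a small helper:

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) convs."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) * current jump
        jump *= s             # stride compounds the spacing between input taps
    return rf

# Assumed standard 70x70 PatchGAN stack: five 4x4 convs, strides 2, 2, 2, 1, 1.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
```

With this stack, `receptive_field(patchgan_layers)` evaluates to 70, consistent with the stated patch size.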
Training the deep learning model. For the model training, we followed a standard adversarial approach of alternating training steps on the discriminator network and generator network 4 . The objective function of the proposed model is a weighted sum of the GAN loss and L1 loss of the generator network. Although the ratio of weights is a hyperparameter that may be altered for optimization, we set it to a fixed ratio of 1:100, as in the original pix2pix paper 6 . To stabilize the training process, we adopted general techniques for training GANs, proposed by Ian Goodfellow 11 and Radford et al. 8 , including one-sided label smoothing and the use of the Adam optimizer 12 . For data augmentation, we performed a random vertical and horizontal shift of up to 50 pixels on the input before feeding the image into the model. The training parameters were as follows: Adam optimizer with a learning rate of 0.0002, beta 1 of 0.5, and a decay rate of 0.1 after the first 10 epochs; a batch size of 1; and a total of 20 epochs. Each epoch covered all possible sets of 16 consecutive axial images of the training set, which was approximately 3000 iterations. The entire training process took about 4 days on a cloud-based workstation with an NVIDIA Tesla V100 GPU (NVIDIA, Santa Clara, CA) and 26 GB RAM. As there is no gold standard objective measure for the performance of a GAN, we relied on the visual inspection of images synthesized by the generator network 6,13 . During the training phase, a radiologist (J.W.C.) monitored random samples of the generated images, and after the training process, he validated the results generated from the tuning set.
Applying the deep learning model. For inference, we used only the generator network of our proposed model. The inference process was performed in the same manner for all data sets, regardless of whether the input was VNC or NCCT. As the size of the input was the same as in the training process, direct synthesis of an entire CT volume of a patient was not possible. Instead, we slid a 16-slice window through the volume one slice at a time from the top, applying the generator network to each set of 16 consecutive slices while keeping track of the slice indices. For a CT volume with a shape of 512 × 512 × N, running the generator network through the entire volume first yields N − 15 partially overlapping arrays with a shape of 512 × 512 × 16. The overlapping slices, that is, the outputs sharing the same slice index, were then averaged. Finally, we reconverted the first channel image (lung/bone window) of the averaged output to a synthetic CT image with a range of −1000 to 1000 HU.
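The sliding-window inference with overlap averaging can be sketched as follows, with the generator treated as a black-box function on 16-slice windows:

```python
import numpy as np

def synthesize_volume(volume, generator, depth=16):
    """Slide a `depth`-slice window through a (H, W, N) volume one slice at a
    time, apply the generator to each window, and average overlapping outputs."""
    h, w, n = volume.shape
    accum = np.zeros((h, w, n), dtype=np.float64)
    counts = np.zeros(n, dtype=np.float64)
    for z in range(n - depth + 1):          # yields N - depth + 1 overlapping windows
        accum[:, :, z:z + depth] += generator(volume[:, :, z:z + depth])
        counts[z:z + depth] += 1.0          # interior slices are covered `depth` times
    return accum / counts
```

With an identity generator the averaged output equals the input, which is a convenient sanity check; edge slices are covered by fewer windows than interior slices, so per-slice counts, not a constant, must be used as the divisor.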
Image analysis: technical evaluation. We employed the mean absolute error (MAE), peak signal-to-noise ratio (PSNR) 14,15 , multiscale structural similarity index measurement (MS-SSIM) 16 , and learned perceptual image patch similarity metric (LPIPS) 17 to perform a quantitative evaluation of the tuning set and test set 1. A lower MAE, higher PSNR, higher MS-SSIM, and lower LPIPS indicate higher similarity to the ground truth. MAE and PSNR reflect the absolute numerical difference between two images, whereas MS-SSIM correlates with similarity in the structural composition of pixels 14,18 . LPIPS is a more recently suggested metric of perceptual distance based on widely used pretrained deep neural networks 17,19 . For comparison, we calculated the metrics for both sCECT and input images (NCCT or VNC) in the mediastinal window (window width, 350 HU; level, 50 HU), each relative to the corresponding CECT images. We only included axial slices between the top of the aortic arch and the diaphragm for image similarity analysis.
Image analysis: clinical utility evaluation. Clinical utility was evaluated on test set 2. As a quantitative analysis, we measured the lesion CNR of the mediastinal lymph nodes. For each lesion, the measurement was performed on the axial slice of the contrast-enhanced CT where the short-axis diameter was measured. We first drew a circular region of interest (ROI) inside the lesion, measuring 90% of the lesion's short-axis diameter. Circular ROIs of the same size were additionally drawn inside the descending thoracic aorta and the subcutaneous fat of the bilateral chest wall. The ROIs were then copied to the same locations on the non-contrast and synthetic contrast-enhanced axial images. The contrast-to-noise ratio (CNR) of all lesions was calculated as follows:

CNR = |HU_lesion − HU_DTA| / SD_fat

where HU is the mean HU value of the ROI, SD is its standard deviation, and DTA is the descending thoracic aorta.
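The pixel-level metrics and the lesion CNR can be sketched in numpy. MS-SSIM and LPIPS require pretrained networks and are omitted; the CNR expression below is one plausible reading of the definition (node-to-aorta attenuation difference normalized by the noise of the fat ROI), stated here as an assumption:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(pred - gt)))

def psnr(pred, gt, data_range=255.0):
    """Peak signal-to-noise ratio in dB for a given dynamic range."""
    mse = float(np.mean((pred - gt) ** 2))
    return 10.0 * float(np.log10(data_range ** 2 / mse))

def lesion_cnr(hu_lesion, hu_dta, sd_fat):
    """Assumed CNR form: lymph-node-to-aorta mean HU difference divided by
    image noise, estimated as the SD of the subcutaneous-fat ROI."""
    return abs(hu_lesion - hu_dta) / sd_fat
```

On NCCT, a lymph node and blood in the aorta have similar attenuation, so this ratio stays near zero, which is consistent with the reported NCCT value of 0.74 ± 0.69.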
For the qualitative analysis, two blinded board-certified radiologists (Y.J.C. and S.B.L., with 8 and 3 years of experience, respectively) participated in a three-session review of CT images at two-week intervals using a DICOM viewer (RadiAnt, version 2020.1; Medixant, Poznan, Poland). The three sessions consisted of NCCT, NCCT with sCECT, and CECT images, respectively, from each patient in test set 2, presented in random order. The reviewers were instructed to label mediastinal lymph nodes with a short-axis diameter > 5 mm and report lesion conspicuity on a 4-point scale (1, barely perceptible with presence debatable; 2, subtle finding but likely a lesion; 3, definite lesion detected; and 4, strikingly evident and easily detected) 20 . The conspicuity of undetected lesions was recorded as 0.
Statistical analysis. For comparison of image similarity metrics and lesion CNR, we applied the paired t-test or the Wilcoxon signed-rank test according to the Shapiro-Wilk normality test. For the observer study, the detection rate of the lymph nodes was compared using the McNemar test, and the differences in lesion conspicuity were evaluated using the Wilcoxon signed-rank test. Also, we evaluated lesion localization using the figures of merit (FOM) from jackknife alternative free-response receiver operating characteristic (JAFROC) analysis 21 . We report the results from the random-reader, fixed-case JAFROC analysis because of the small number of cases in our study. P < 0.05 was considered indicative of a statistically significant difference. All data were analyzed using MedCalc (version 12.7, MedCalc Software, Ostend, Belgium), the scikit-learn library (version 0.20.3, https://scikit-learn.org/), and JAFROC software for Windows (version 4.2.1, WindowsJafroc, https://www.devchakraborty.com).
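The normality-gated choice between the paired t-test and the Wilcoxon signed-rank test can be mirrored with scipy; this is an illustrative sketch of the described workflow, not the MedCalc procedure actually used:

```python
from scipy import stats

def compare_paired(x, y, alpha=0.05):
    """Pick the paired t-test or the Wilcoxon signed-rank test based on a
    Shapiro-Wilk normality test of the paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    _, p_normal = stats.shapiro(diffs)
    if p_normal > alpha:                    # differences look normal
        _, p = stats.ttest_rel(x, y)
        return "paired t-test", p
    _, p = stats.wilcoxon(x, y)             # otherwise, non-parametric
    return "Wilcoxon signed-rank test", p
```

Because the two image sets come from the same patients, both candidate tests are paired; the Shapiro-Wilk test is applied to the per-case differences, not to either sample on its own.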
Informed consent. This retrospective study was approved by the institutional review boards, which waived the need for patient informed consent.

Results
Patient characteristics. Patient characteristics and CT acquisition parameters are summarized in Table 1.
Technical evaluation. Examples of representative cases from the tuning set and test set 1 are shown in Figs. 2 and 3, respectively. The sCECT images showed significantly higher similarity to the ground-truth CECT than NCCT in all quantitative metrics in both the tuning set and test set 1 (Fig. 4, Table 2). In the tuning set, the sCECT images showed a lower median MAE (33.19; P < 0.001).
Clinical utility evaluation. In test set 2, the lesion CNR of the mediastinal lymph nodes was higher in the sCECT group than in the NCCT group (6.15 ± 5.18 vs 0.74 ± 0.69; P < 0.001). We did not statistically compare measurements between sCECT and CECT images because of the difference in degrees of contrast enhancement between the development set and test set 2 due to the CT protocols.
In the observer study on test set 2, both reviewers detected a higher number of lymph nodes on NCCT with sCECT than on NCCT alone (reviewer 1, 76% [42 of 55 nodes] vs 49% [27 of 55 nodes], P = 0.003; reviewer 2, 38% [21 of 55 nodes] vs 29% [16 of 55 nodes], P = 0.06). The reader-averaged JAFROC FOMs calculated from NCCT alone, NCCT with sCECT, and CECT were 0.48, 0.52, and 0.68, respectively. There was no significant difference in JAFROC FOMs between the modalities (P = 0.059). The FROC curves from the three modalities are shown in Supplementary Fig. S2. Both reviewers had a higher lesion conspicuity rating for NCCT with sCECT compared to NCCT alone (P ≤ 0.001 for both), and both also rated CECT images higher in comparison to images of the other two groups (P < 0.001 for both; Fig. 6, Supplementary Table S1).
Discussion
The most important strength of the current study is that we performed technical validation on a heterogeneous test set of CT data, including various CT vendors and scanning parameters. Many studies have shown deep learning applications of image-to-image synthesis in radiology, including cross-modality synthesis and reconstruction, but reports on external data are rare 3 . We believe that the quantitative performance of the proposed model shows the potential for generalizability, which is essential for any deep learning model to be used in clinical practice 22 .
Few previous studies have applied deep learning for synthetic contrast enhancement in CT. Santini et al. 5 demonstrated synthetic enhancement in non-contrast cardiac CT to delineate the left cardiac chambers. Liu et al. 23 proposed a deep learning model to generate synthetic enhancement of major arteries in non-contrast abdominopelvic CT. However, to our knowledge, there are no previous studies that have performed end-to-end conversion of a whole volume of NCCT into sCECT images. We believe that acquiring VNC CT in the development set played a crucial role in the successful training of the proposed model. Misalignment between non-contrast and ground-truth contrast-enhanced images is an obstacle in the development of synthetic contrast enhancement 24,25 . The VNC reconstruction of dual-energy CT enabled perfect spatial registration between the input and ground truth. The observer study performed by two radiologists showed that the mediastinal lymph nodes were more conspicuous on sCECT than on NCCT, which can be attributed to the higher CNR of the lymph nodes. However, only one radiologist showed a statistically significant increase in the detection rate on sCECT images compared to NCCT images. The trained model relatively poorly delineated hilar and segmental lymph nodes adjacent to pulmonary vessels that are often difficult to detect on NCCT. Further training on a more heterogeneous group of patients with mediastinal lymphadenopathy may improve the model's performance. Nonetheless, the proposed model successfully generated sCECT images with higher CNR in terms of technical feasibility. Importantly, we do not claim that our deep learning implementation or methods to generate sCECT can replace CECT. The ultimate goal of our study on sCECT is to yield additional information, including improved lesion conspicuity and detectability, from NCCT, but not to predict the degree or pattern of contrast enhancement of the lesions.
Not only do the vast majority of chest CT examinations not require contrast media, but sCECT also has potential benefits for patients under certain conditions. These include allergy to iodinated contrast media, frequent CT examinations, chronic kidney disease, and poor vascular access. Additionally, we believe that sCECT can be utilized as a type of post-processing technique. A future application of sCECT is its use in automated volumetric segmentation and analysis. A previous study used synthetic non-contrast CT to improve the generalizability of CT segmentation tasks 26 . Likewise, sCECT may enable segmentation tools based on CECT to be generalized to NCCT data.
Our study has several limitations. First, our study included a small number of patients. However, such a number is reasonable as generative models demand high computational loads, unlike classification models. Several previous studies on image synthesis in radiology were also based on small study populations 18,27 . Second, we could not strictly control CT protocols and indications because of the retrospective nature of our study. The ideal training and test sets might have been patients with similar diseases and similar CT protocols. However, dual-energy CT with routine contrast amount or CT angiography for suspected lung cancer is not commonly performed in clinical practice. Lastly, our proposed model may not be an optimal deep learning approach for sCECT. Comparison and combination with different approaches including CNN (e.g., U-Net 7 ) and generative models based on unpaired data (e.g., CycleGAN 28 ) are warranted.
In conclusion, we implemented a deep learning model for generating synthetic contrast enhancement from non-contrast chest CT. Synthetic contrast-enhanced CT demonstrated good quantitative performance in terms of image similarity metrics and improved depiction of mediastinal lymph nodes. Applying the proposed deep learning model in clinical practice requires further studies on a larger population with more heterogeneous diseases.