Two-view topogram-based anatomy-guided CT reconstruction for prospective risk minimization

To enable a prospective estimation of the effective dose of a CT scan before the actual scan, so that sophisticated patient risk minimization methods can be applied, a prospective spatial dose estimate and knowledge of the anatomical structures are required. To this end, a CT reconstruction method is needed that reconstructs CT volumes from as few projections as possible, i.e. from the topograms, with anatomical structures that are as correct as possible. In this work, an optimized CT reconstruction model based on a generative adversarial network (GAN) is proposed. The GAN is trained to reconstruct 3D volumes from an anterior-posterior and a lateral CT projection. To enhance anatomical structures, a pre-trained organ segmentation network and a 3D perceptual loss are applied during the training phase, so that the model can generate both an organ-enhanced CT volume and organ segmentation masks. The proposed method reconstructs CT volumes with a PSNR of 26.49, an RMSE of 196.17, and an SSIM of 0.64, compared to 26.21, 201.55 and 0.63 for the baseline method. In terms of anatomical structure, the proposed method effectively enhances the organ shapes and boundaries and allows for a straightforward identification of the relevant anatomical structures. We note that conventional reconstruction metrics fail to indicate the enhancement of anatomical structures. In addition to such metrics, the evaluation is therefore extended by assessing the organ segmentation performance. The average organ Dice score of the proposed method is 0.71 compared with 0.63 for the baseline model, indicating the enhancement of anatomical structures.


Introduction
Computed tomography (CT) imaging provides non-invasive insights into the human body with high image quality and a short acquisition time compared to other modalities. Therefore, CT imaging has become an integral part of clinical routine and research. However, in order to reconstruct CT volumes with diagnostic image quality, a sufficient number of projections must be acquired, which inevitably exposes the patient to ionizing radiation, i.e., X-rays. Therefore, dose reduction is an important research topic in CT imaging. There are different methods to achieve dose reduction, both hardware- and software-based. These methods include but are not limited to the usage of pre-filters, iterative reconstruction algorithms, and dose-shielding methods. Another method that is routinely used is to adjust the tube current of the X-ray source depending on the angular position α of the X-ray source and the z-position, the so-called tube current modulation (TCM) [1,2]. More precisely, TCM methods aim at minimizing the mAs-product by adapting the tube current as a function of the attenuation for a given view. The attenuation can, for example, be estimated based on the topogram acquired prior to the CT scan.
arXiv:2401.12725v1 [eess.IV] 23 Jan 2024
However, the mAs-product is only a surrogate parameter for the actual patient dose, since some organs are more sensitive to radiation than others. It would be advantageous to also account for these sensitivities in the tube current optimization. Here, the effective dose $D_\mathrm{eff}$ is defined as the sum of the doses absorbed by the organs at risk (OARs) during the exposure, each weighted with the organ-specific tissue weighting factor. The tissue weighting factors correspond to the radiation sensitivity of the individual organs and structures and are provided by the International Commission on Radiological Protection (ICRP) [3,4]. The factors also reflect the risk of radiation-induced cancer. To this end, it is necessary to estimate the organ doses, and from them the effective dose, prior to the CT scan and then to optimize for $D_\mathrm{eff}$. Recently, a risk-minimizing tube current modulation (riskTCM) has been proposed that requires a dose distribution and an organ segmentation as input parameters [5]. In particular, this method assumes an initial coarse CT reconstruction and a voxel-wise segmentation of all relevant organs. Given the known sensitivities of these organs with respect to ionizing radiation, the effective dose is estimated on a per-view basis. Usually, dose estimation is performed using Monte Carlo methods. Such methods, however, are very time-consuming and would prohibit an application of riskTCM in clinical practice. Hence, the spatial dose distribution is estimated in quasi-real-time from a given CT volume using a deep neural network proposed by Maier et al. [6]. The organ dose is then obtained using the known organ segmentation. With the effective dose for each potential view in the desired scan range, a tube current curve is then computed that maintains diagnostic image quality while minimizing the patient risk.
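As a minimal sketch of the weighted sum behind the effective dose, the snippet below combines per-organ doses with tissue weighting factors. The three weights are the ICRP Publication 103 values for lung, liver and bone surface; the organ doses and the names `effective_dose`, `curve_a` and `curve_b` are hypothetical and only serve as an illustration. A full effective dose would sum over all ICRP tissues, not just the three shown here.

```python
# ICRP Publication 103 tissue weighting factors for three example tissues.
# Only a subset of tissues is listed, so the result is a partial sum.
TISSUE_WEIGHTS = {"lung": 0.12, "liver": 0.04, "bone_surface": 0.01}


def effective_dose(organ_doses_msv, weights=TISSUE_WEIGHTS):
    """Weighted sum of organ doses: D_eff = sum_T w_T * D_T."""
    return sum(weights[organ] * dose for organ, dose in organ_doses_msv.items())


# Hypothetical organ doses (mSv) resulting from two candidate tube current curves.
curve_a = {"lung": 10.0, "liver": 8.0, "bone_surface": 6.0}
curve_b = {"lung": 7.0, "liver": 11.0, "bone_surface": 6.0}

d_eff_a = effective_dose(curve_a)
d_eff_b = effective_dose(curve_b)

# The curve that spares the more sensitive organ (lung) has the lower weighted dose,
# even though it deposits more dose in the liver.
print(f"curve A: {d_eff_a:.2f} mSv, curve B: {d_eff_b:.2f} mSv")
```

The example illustrates why the mAs-product alone is only a surrogate: shifting dose from a highly weighted organ to a less sensitive one lowers the weighted sum even if the total deposited dose is similar.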
To achieve this, a method that estimates a coarse CT reconstruction before the scan is needed. As shown in Figure 1, starting from only a few projections yields a pipeline that supports prospective CT risk optimization rather than retrospective CT dose estimation. In order to avoid additional X-ray projections, we reformulate the research problem as the reconstruction of a coarse CT volume from only two orthogonal topograms, referred to as X-ray projections in the remainder of this manuscript.
With the emergence of deep learning (DL)-based medical image processing methods, several generative adversarial network (GAN) methods have been established for CT reconstruction from only a few views [7]. Ying et al. proposed X2CT-GAN [8], which performs a domain transfer task from X-ray projections to CT volumes and introduces a network for effective 2D-to-3D image generation. The authors also demonstrate the superiority of using two X-ray projections, i.e. from the anterior-posterior (a.p.) and lateral (lat.) direction, compared to only a single view. On top of X2CT-GAN, Ling et al. proposed a conditional variational autoencoder (cVAE)-based GAN [9] to enhance the regularization of the generator. Ratul et al. improved the generator with the organ segmentation of the a.p. X-ray projection as an additional input [10]. Montoya et al. proposed ScoutCT-Net, which first backprojects the topograms into an initial CT volume and then refines the initial volume using another network [11]. Most of these methods aim to improve the CT reconstruction in terms of voxel-wise metrics, while anatomical information, such as the shape and location of organs and structures, is usually ignored.
In this work, we propose an anatomy-guided GAN for CT reconstruction from only two X-ray projections, which can facilitate the implementation of risk-specific TCM methods. More specifically, a 3D perceptual loss $L_p$ and a 3D segmentation loss $L_s$ are added to the overall loss function for training the GAN, leading to a loss function that also optimizes for better anatomical information:
$$L = L_G + \lambda_p L_p + \lambda_s L_s, \tag{1}$$
where $L_G$ is the original generator loss, focusing on voxel-wise similarity, and $\lambda_p$ and $\lambda_s$ are constants that control the enhancement. We demonstrate that the combined use of $L_p$ and $L_s$ can lead to the enhancement of anatomical structures in the reconstructed volumes. The implementation of $L_p$ and $L_s$ is explained in detail in the following sections. Our proposed method enhances the organ shapes and boundaries during the training phase and thus does not increase the computational complexity at inference time.

Results
Some exemplary slices of the reconstructed CT volumes are shown in Figure 2 and the results of the organ segmentation are shown in Figure 3. After evaluating the reconstruction performance, we choose $\lambda_s = 2.0$ and $\lambda_p = 0.5$ to present the reconstruction performance of our proposed method. We demonstrate that our proposed method can improve both the overall image quality and the anatomical structures in comparison with the baseline method. More specifically, $L_p$ leads to improved image quality, while $L_s$ improves the anatomical plausibility of organs and structures. The influence of $L_p$ and $L_s$ is further investigated in the ablation experiments, where either $L_p$ or $L_s$ alone is applied for enhancement with varying weights $\lambda$.

[Figure 2 panel titles: Baseline, Only ..., Only ..., Ground truth, Our method]

Reconstruction performance
As the exemplary slices in Figure 2 show, the CT volumes reconstructed with the baseline method look 'visually realistic', but only in terms of the body shape and regions with obvious contrast, such as the boundary of the lungs. However, the abdominal organs, for example the liver, cannot be distinguished from the surrounding regions. Some structural details, like the shape of the vertebrae, are also lost. The proposed method enhances such anatomical structural details and the organ contrast while maintaining the overall image quality. More specifically, the application of $L_s$ decreases the image quality. This is expected, as $L_s$ only enhances the organ segmentation rather than the voxel-wise image quality. Regarding the anatomy, $L_s$ contributes to organ-specific enhancement. As $L_s$ in our experiments only targets the segmentation of lungs, liver and bones, the contrast of these organs in the reconstructed CT volumes is enhanced, while there is no enhancement for other organs and structures. With $L_p$, the anatomical structures in the CT volumes are enhanced, such as the shape of the vertebrae and the contrast between adjacent anatomical structures, such as the boundary of the fat tissues. However, such anatomical improvements are barely indicated by the reconstruction metrics. As shown in Figure 4, a higher $\lambda_p$ can lead to higher PSNR and SSIM, indicating higher overall image quality.
In contrast, a higher $\lambda_s$ does not improve the overall image quality. The proposed method also results in higher PSNR and RMSE when $\lambda_s$ and $\lambda_p$ increase, similar to the results with only $L_p$.

Organ segmentation in reconstruction
In addition to the reconstruction metrics, we also evaluate the organ segmentation of the reconstructed volumes to assess the human anatomy. In our experiments, the segmentation of liver, lungs and bones is evaluated, as defined by $L_s$. Since the reconstruction dataset contains no paired organ segmentation annotation, 20 CT volumes in the test set are manually annotated with these organs, referred to as annotation M. In addition to the manual ground truth, the organ segmentation masks produced by the pre-trained segmentation network of $L_s$ are also used to benchmark the segmentation performance, referred to as annotation S. The Dice similarity coefficient (DSC) of each organ is then computed. Evaluated using annotation M, the proposed method increases the average DSC by 12.6% compared with the baseline method; the increase is 9.5% when only $L_p$ is applied and 11.1% when only $L_s$ is applied. When annotation S is used as ground truth, the proposed method increases the average DSC by 7.0% compared with the baseline method; the increase is 4.2% when only $L_p$ is applied and 7.0% when only $L_s$ is applied. In terms of each single organ, as shown in Table 2, the proposed method improves $\mathrm{DSC}_M$ by 15.1% for bones, 10.9% for liver and 6.1% for lungs. $\mathrm{DSC}_S$ is also increased, by 8.0% for liver and 10.5% for bones. Some exemplary organ segmentation masks are shown in Figure 3. The organ segmentation using the proposed method shows higher anatomical plausibility in terms of the organ and skeleton shape, as shown by the mesh visualization in Figure 3. In comparison, the segmentations of the baseline method and of the model with only $L_p$ contain more outliers, and the segmentation of the skeleton is less accurate. As also shown in Figure 4, a higher $\lambda_s$ in general leads to higher average $\mathrm{DSC}_M$ and $\mathrm{DSC}_S$.
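The organ-wise DSC used for this evaluation can be sketched as follows. This is a minimal NumPy illustration, not the evaluation code of the study; the function name `dice_coefficient`, the label encoding (1 = liver, 2 = lung) and the toy arrays are assumptions made for the example.

```python
import numpy as np


def dice_coefficient(pred, target, label):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for one organ label."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        # Organ absent in both volumes: treated here as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(p, t).sum() / denom


# Toy 1D "label volumes"; the encoding 1 = liver, 2 = lung is hypothetical.
reference = np.array([0, 1, 1, 1, 2, 2, 0, 0])
predicted = np.array([0, 1, 1, 0, 2, 2, 2, 0])

dsc_liver = dice_coefficient(predicted, reference, label=1)
dsc_lung = dice_coefficient(predicted, reference, label=2)
average_dsc = (dsc_liver + dsc_lung) / 2
print(dsc_liver, dsc_lung, average_dsc)
```

The same function applies unchanged to 3D label volumes, since the comparison and the sums operate on arrays of any shape.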

Discussion
Based on the results, the proposed $L_p$ and $L_s$ contribute to the enhancement of both the anatomical structures and the overall image quality. Such enhancements enable the GAN to reconstruct CT volumes that not only appear correct but also contain reliable anatomical structures. Consequently, a more robust reconstruction method for a prospective CT risk minimization pipeline is established. However, an accurate inference of the radiation risk involves more organs than the liver, lungs and bones considered in our research, and the aggregation region should be the whole human body. It is still difficult to acquire a dataset containing whole-body CT scans with all organs segmented for dose estimation, but extending the research towards larger body regions and the segmentation of more diverse organs is of interest for our future work.
Throughout our investigation, we have noted that reconstructed volumes with enhanced anatomical structures can lead to inferior reconstruction metrics, i.e. PSNR, SSIM and RMSE. PSNR and RMSE are commonly used for the evaluation of reconstruction algorithms, and SSIM was originally designed for the assessment of digital image quality. Unlike typical CT reconstruction methods, GAN-based methods rely on a trained generator network to reconstruct the volumes from bi-planar projections, which is an ill-posed problem. During the training of the GANs, the network tends to reconstruct CT volumes with barely any or even no anatomical information while maintaining high reconstruction metrics such as PSNR and SSIM. Some exemplary slices are shown in Figure 5. Therefore, in our research we also evaluate the organ segmentation of the CT volumes, based on the assumption that a network trained for organ segmentation can effectively evaluate anatomical structures.

Methods
The pipeline of the proposed model is shown in Figure 6. A GAN is trained to reconstruct a CT volume from two X-ray projections. On top of the typical generator and discriminator networks of a GAN [7], two pre-trained networks are included in the training procedure, i.e. a pre-trained segmentation network, denoted $\phi_s$, for the enhancement of specific anatomical structures and a pre-trained VGG network for the enhancement of the image quality [12]. The VGG network was first proposed by the Visual Geometry Group (VGG) at the University of Oxford and is a well-known network for image feature extraction in computer vision research.

CT reconstruction GAN
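The training of our proposed model follows the adversarial strategy of GANs. The minimax objective of GAN training in our setting is [7]
$$\min_{\theta_g} \max_{\theta_d} \; \mathbb{E}_{y}\left[\log D(y; \theta_d)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(G(x; \theta_g); \theta_d)\right)\right], \tag{2}$$
where $x$ indicates the input X-ray projections and $y$ the corresponding CT volume, and $G(x; \theta_g)$ and $D(y; \theta_d)$ are the generator and discriminator networks. More specifically, the GAN loss is modified according to the least squares GAN into two loss functions [13], one for the discriminator and one for the generator:
$$L_\mathrm{dis} = \tfrac{1}{2}\,\mathbb{E}_{y}\left[(D(y) - 1)^2\right] + \tfrac{1}{2}\,\mathbb{E}_{x}\left[D(G(x))^2\right], \tag{3}$$
$$L_\mathrm{gen} = \tfrac{1}{2}\,\mathbb{E}_{x}\left[(D(G(x)) - 1)^2\right]. \tag{4}$$
In our model, the discriminator network is implemented as in the work of Phillip et al. [14]. The generator network encodes the input 2D X-ray projections using two independent pathways based on U-Net [15], and the encoded features are fused to output the reconstructed CT volume in 3D. The network architecture of the generator network is shown in Figure 7.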
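The least-squares adversarial losses referred to above can be sketched in a few lines. This is a minimal NumPy illustration of the commonly used least squares GAN form with targets 1 (real) and 0 (fake) and a factor of 1/2; the function names and the toy discriminator scores are assumptions, and the exact constants used in the study are not confirmed by the text.

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake):
    """Push discriminator scores of real volumes to 1 and of generated volumes to 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)


def lsgan_generator_loss(d_fake):
    """Push discriminator scores of generated volumes towards 1."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)


# Hypothetical discriminator outputs for a batch of two real and two generated volumes.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])

loss_d = lsgan_discriminator_loss(d_real, d_fake)
loss_g = lsgan_generator_loss(d_fake)
print(loss_d, loss_g)
```

Compared with the logarithmic objective, the squared error penalizes generated samples that are classified correctly but lie far from the decision boundary, which is the usual motivation for the least-squares variant.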
One key step for the generator network is to convert the extracted features from 2D to 3D. In our network, this conversion is accomplished using backprojection, as the 2D X-ray projections are obtained by the forward projection of the CT volumes. A fan-beam geometry is implemented in our research. The backprojection propagates the 2D feature maps to 3D and is implemented as a matrix multiplication,
$$\hat{Z}_\mathrm{3D} = T \cdot \hat{Z}_\mathrm{2D}, \tag{5}$$
where $\hat{Z}$ denotes the flattened 2D or 3D intermediate feature maps and $T$ is a pre-defined transformation matrix depending on the fan-beam geometry. In this work, $T$ is given by a pixel-driven fan-beam backprojector based on the geometry of a Siemens Somatom Force scanner.
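To illustrate backprojection as a matrix multiplication, the following toy example builds a transformation matrix for parallel rays, which is a deliberate simplification of the fan-beam backprojector described above. Each 2D pixel is smeared evenly along the depth direction. The sizes and variable names are hypothetical.

```python
import numpy as np

# Toy sizes: a 2 x 2 feature map is backprojected into a 3 x 2 x 2 volume.
depth, height, width = 3, 2, 2

# Parallel-ray stand-in for the fan-beam matrix T (an assumption of this sketch):
# stacking `depth` scaled identity blocks gives every voxel along a ray
# 1/depth of the corresponding pixel value.
T = np.kron(np.ones((depth, 1)) / depth, np.eye(height * width))

z2d = np.arange(height * width, dtype=float).reshape(height, width)

# Flatten, multiply, and reshape back to a volume.
z3d = (T @ z2d.ravel()).reshape(depth, height, width)

print(T.shape)          # (12, 4)
print(z3d.sum(axis=0))  # summing along the rays recovers the 2D map
```

For a real fan-beam geometry, the entries of `T` would instead encode which detector pixel each voxel maps to under the diverging rays; since `T` is fixed, it can be precomputed once and applied as a (sparse) matrix product inside the network.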

3D segmentation loss
We first propose $L_s$ for the enhancement of specific anatomical structures. The correct location, shape and size of the OARs in the reconstructed CT volumes are crucial for dose estimation and organ segmentation, but such anatomical content cannot be explicitly leveraged by typical image generation models, such as GANs. In order to also include the organ segmentation in the training of the GAN, a dataset of CT volumes with ground-truth segmentation is required. However, the voxel-wise annotation of the OARs is very expensive and cannot easily be obtained for a dataset large enough to train a GAN, while existing segmentation datasets mostly do not contain enough images for training GANs.
In our model, a pre-trained $\phi_s$ is leveraged to enhance the anatomical content that is missing in the reconstruction dataset. $\phi_s$ is trained on an auxiliary dataset that contains the segmentation of the OARs in CT volumes. This $\phi_s$ is then applied in the training of the GAN, and the enhancement of the anatomical structures is thus explicitly reformulated as the optimization of the OAR segmentation in the reconstructed CT based on the pre-trained $\phi_s$. Such regularization is implemented as a loss term $L_s$ that compares the two segmentations $y_m$ and $y'_m$, where $y_m = \phi_s(y)$ and $y'_m = \phi_s(G(x))$ are the organ segmentation masks of $y$ and $G(x)$ obtained with $\phi_s$. By this definition, $L_s$ depends on the target organs of $\phi_s$, so the enhancement of anatomical structures can be adapted flexibly to specific organs. Since the segmentation ground truth of the reconstruction dataset is missing, $\phi_s$ is not optimized during GAN training. After the training of the GAN, $\phi_s$ can be further used to provide the organ segmentation. During inference, the model outputs $G(x)$ as the reconstructed CT volume and $\phi_s(G(x))$ as the corresponding organ segmentation map, with $x$ being the input X-ray projections.

3D perceptual loss
Perceptual loss was first proposed in the field of computer vision for feed-forward image transformation tasks [16]. Unlike typical loss functions, perceptual loss relies on a pre-trained classification network as a feature extractor and backpropagates the loss computed on the features extracted from the source and the target images. Apart from natural image research, perceptual loss has also been applied in medical image processing, such as the denoising of CT images [17]. It has been shown that a network pre-trained on natural images can also work as a good feature extractor for medical images. In our model, we adopt the VGG-16 network pre-trained on the ImageNet dataset as the feature extractor, as provided by the torchvision toolkit (version 0.15.2) [18,19]. The original VGG-16 network contains five convolutional blocks to extract image features at different scales. The aggregation of the L1 losses of the intermediate features from the ground truth and the reconstructed CT volumes leads to the 3D perceptual loss, as shown in Figure 6. The 3D perceptual loss used in the model training is
$$L_p = \sum_{k} \sum_{i=1}^{4} \left\lVert \phi_p^{(i)}(y_k) - \phi_p^{(i)}(G(x)_k) \right\rVert_1,$$
where $\phi_p^{(i)}(\cdot)$ denotes the intermediate features of the $i$-th VGG level, $k$ indexes the 2D slices, and only the first four VGG levels are used to aggregate the 3D perceptual loss. Note that the pre-trained VGG network only accepts 2D images as input, so the ground truth and the reconstructed CT volumes are sliced along the vertical direction and the losses of all 2D slices are aggregated.
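The slice-wise aggregation described above can be sketched as follows. To keep the example self-contained, a hypothetical multi-scale feature extractor `toy_features` (repeated 2 x 2 average pooling with four levels) stands in for the pre-trained VGG-16 blocks; axis 0 is assumed to be the vertical direction, and averaging over slices is an assumption about how the per-slice losses are aggregated.

```python
import numpy as np


def toy_features(image, levels=4):
    """Hypothetical stand-in for VGG features: repeated 2x2 average pooling."""
    feats = []
    current = image
    for _ in range(levels):
        h, w = current.shape
        current = current[: h - h % 2, : w - w % 2]  # drop odd border rows/columns
        current = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        feats.append(current)
    return feats


def perceptual_loss_3d(volume_true, volume_pred, extractor=toy_features):
    """Average over slices of the summed per-level L1 feature distances."""
    total = 0.0
    # Iterating over a 3D array yields its 2D slices along axis 0.
    for slice_true, slice_pred in zip(volume_true, volume_pred):
        for f_true, f_pred in zip(extractor(slice_true), extractor(slice_pred)):
            total += np.mean(np.abs(f_true - f_pred))
    return total / len(volume_true)


rng = np.random.default_rng(0)
y = rng.random((4, 16, 16))   # toy ground-truth volume
y_hat = y + 0.1               # toy reconstruction with a constant offset

loss = perceptual_loss_3d(y, y_hat)
print(loss)
```

With a real feature extractor, `extractor` would return the activations of the first four VGG blocks for each slice; the surrounding loop stays the same, which is why the 2D network can be reused for 3D volumes without modification.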

Overall loss function
In addition to the previously mentioned loss functions, the voxel-based $L_r$ and the pixel-based $L_\mathrm{proj}$ are applied to enforce consistency between the input X-ray projections and the reconstructed CT volume. $L_r$ compares $y$ with $G(x)$ voxel by voxel, and $L_\mathrm{proj}$ compares their projections pixel by pixel, where $P_\mathrm{a.p.}$ and $P_\mathrm{lat}$ project the CT volume in the a.p. and lat. direction, respectively. $L_r$ drives the CT reconstruction to be correct and $L_\mathrm{proj}$ drives the projections to be correct. Together with the proposed $L_s$ and $L_p$, the overall loss function for training the CT reconstruction GAN is weighted to balance voxel-wise features and anatomical content. The overall loss function is
$$L = \lambda_\mathrm{gen} L_\mathrm{gen} + \lambda_r L_r + \lambda_\mathrm{proj} L_\mathrm{proj} + \lambda_s L_s + \lambda_p L_p,$$
where $L_\mathrm{gen}$ is the adversarial generator loss and the $\lambda$ are configurable hyper-parameters. In the experiments we illustrate how the CT reconstruction is enhanced by the proposed model.
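A minimal sketch of how the individual terms could be combined is given below. The loss weights are the values reported in the text; everything else is an assumption of this example: the projector is replaced by a simple mean along one axis instead of the fan-beam forward projection, a squared error is used for the reconstruction term and an absolute error for the projection term, and the values passed for the adversarial, segmentation and perceptual terms are made up.

```python
import numpy as np

# Loss weights as reported in the text.
LAMBDA_GEN, LAMBDA_R, LAMBDA_PROJ = 0.1, 10.0, 10.0
LAMBDA_S, LAMBDA_P = 2.0, 0.5


def project(volume, axis):
    """Parallel-ray stand-in for the fan-beam projector: mean along one axis."""
    return volume.mean(axis=axis)


def reconstruction_loss(y, y_hat):
    """Voxel-wise consistency; the squared error is an assumption of this sketch."""
    return float(np.mean((y - y_hat) ** 2))


def projection_loss(y, y_hat):
    """Pixel-wise consistency of two orthogonal projections.

    Axis 1 plays the role of the a.p. direction and axis 2 of the lateral
    direction (a hypothetical axis convention).
    """
    l_ap = np.mean(np.abs(project(y, 1) - project(y_hat, 1)))
    l_lat = np.mean(np.abs(project(y, 2) - project(y_hat, 2)))
    return float(l_ap + l_lat)


def total_generator_loss(l_gen, l_r, l_proj, l_s, l_p):
    """Weighted sum of all generator loss terms."""
    return (LAMBDA_GEN * l_gen + LAMBDA_R * l_r + LAMBDA_PROJ * l_proj
            + LAMBDA_S * l_s + LAMBDA_P * l_p)


y = np.zeros((4, 4, 4))   # toy ground-truth volume
y_hat = y + 0.1           # toy reconstruction

l_r = reconstruction_loss(y, y_hat)
l_proj = projection_loss(y, y_hat)
# Hypothetical values for the adversarial, segmentation and perceptual terms.
total = total_generator_loss(l_gen=0.5, l_r=l_r, l_proj=l_proj, l_s=0.3, l_p=0.4)
print(l_r, l_proj, total)
```

Setting `LAMBDA_S` or `LAMBDA_P` to zero in this sketch corresponds to the ablation settings in which only one of the two anatomy-related terms is active.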

Experiments
In all experiments, the GAN is trained for 100 epochs. The Adam optimizer is used with a learning rate of $2 \times 10^{-4}$. The weights of the GAN loss, the reconstruction loss and the projection loss, namely $\lambda_\mathrm{gen}$, $\lambda_r$ and $\lambda_\mathrm{proj}$, are fixed across all experiments, i.e. $\lambda_\mathrm{gen} = 0.1$, $\lambda_r = 10$ and $\lambda_\mathrm{proj} = 10$. In all experiments, $\phi_s$ is implemented as a vanilla 3D U-Net and trained on the CT-ORG dataset for 200 epochs. Dice loss is used as the loss function and Adam as the optimizer with a learning rate of $5 \times 10^{-4}$. All model training is carried out on one Nvidia A100 GPU with 40 GB memory. For the fan-beam operator, we model the real scanner parameters with a source-to-detector distance (SDD) of 1085.6 mm, a source-to-isocenter distance (SID) of 595 mm and 920 rays within the fan.

Figure 1. Illustration of prospective and retrospective CT organ dose estimation pipelines. Many existing methods to estimate CT organ dose are designed as retrospective pipelines, which can only be applied after the scan. For applications like CT risk minimization, a prospective pipeline for organ dose estimation is required.

Figure 2. Exemplary slices of the reconstructed CT volumes using the proposed and baseline method, in comparison with the ground truth.

Figure 3. Comparison of the organ segmentation masks generated in different experiments. In the top three rows, the slices of the organ segmentation masks and the CT volumes are shown, and in the bottom rows the organ segmentation is shown as mesh visualization.

Figure 4. Reconstruction and organ segmentation performance of the proposed method with varying $\lambda$, in comparison with applying $L_p$ and $L_s$ independently.

Figure 5. Example slices in which GANs fail to reconstruct anatomical structures in the volumes. Column (a) shows our proposed method with enhanced anatomical structures. Columns (b) and (c) illustrate CT reconstructions with deteriorated anatomical structures but high reconstruction metrics.

Figure 6. The proposed CT reconstruction pipeline with enhancement of anatomical structures. The CT Gen is the CT generator network that reconstructs the 3D CT volume with two 2D X-ray projections as input. The SegNet is a pre-trained segmentation network that segments the target anatomical structures and is frozen during the GAN training. The 3D perceptual loss aggregates the 2D perceptual losses from slices along the vertical direction.

Figure 7. Network architecture of the CT generator network that inputs 2D X-ray projections but outputs 3D CT volumes.
For the training of the GAN and the pre-training of $\phi_s$, two datasets are used in our experiments as a proof of principle, i.e. a reconstruction dataset and a segmentation dataset. For training the GAN, the CT volumes from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) are used as the reconstruction dataset [20]. The LIDC-IDRI dataset consists of 1016 chest CT volumes with pixel sizes ranging from 0.46 mm to 0.98 mm in the transverse plane and from 0.6 mm to 5.0 mm in the vertical direction. For the pre-training of $\phi_s$, we select a public dataset of CT volumes with voxel-wise annotation of abdominal organs, namely the CT-ORG dataset [21]. The CT-ORG dataset consists of 140 thoraco-abdominal CT scans with annotated lungs, bones, liver, bladder and brain, with voxel sizes ranging from 0.56 mm to 1.0 mm in the vertical direction. Because the reconstruction dataset covers only the chest region, the annotations of lungs, liver and bones in the CT-ORG dataset are used in the following experiments. Some samples from the datasets are shown in Figure 8. All CT volumes in the LIDC-IDRI dataset and the CT-ORG dataset are resampled to a uniform voxel size of 1 mm by 1 mm by 1 mm to ensure consistency during model training. 812 CT images from the LIDC-IDRI dataset are used for model training and 214 images for testing. For $\phi_s$, 112 CT scans are used for training and 28 for testing. The X-ray projections are simulated to mimic the fan-beam CT forward projection, using the aforementioned scanner geometry. The CT volumes are first resampled to a voxel size of 2.5 mm in each direction and then cropped to a uniform volume/image size of 128. The resolution of the input X-ray projections is also 128. For the GAN training, both the X-ray projections and the CT volumes are normalized to the range from 0.0 to 1.0 using the same parameters.
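The volume preprocessing steps (resampling to 2.5 mm, bringing the volume to a size of 128, and normalizing to the range 0.0 to 1.0) can be sketched as below. This is only an illustration under several assumptions: nearest-neighbour resampling is used for simplicity, volumes smaller than the target size are padded with air, and the HU window of -1000 to 2000 used for normalization is hypothetical, as the text does not state the normalization parameters.

```python
import numpy as np


def resample_nearest(volume, spacing, new_spacing):
    """Nearest-neighbour resampling to an isotropic voxel spacing (mm)."""
    new_shape = [max(1, int(round(n * s / new_spacing)))
                 for n, s in zip(volume.shape, spacing)]
    idx = [np.minimum((np.arange(m) * new_spacing / s).astype(int), n - 1)
           for m, s, n in zip(new_shape, spacing, volume.shape)]
    return volume[np.ix_(*idx)]


def center_crop_or_pad(volume, size, fill):
    """Center-crop axes larger than `size` and pad smaller axes with `fill`."""
    out = np.full((size,) * 3, fill, dtype=volume.dtype)
    src, dst = [], []
    for n in volume.shape:
        if n >= size:
            start = (n - size) // 2
            src.append(slice(start, start + size))
            dst.append(slice(0, size))
        else:
            start = (size - n) // 2
            src.append(slice(0, n))
            dst.append(slice(start, start + n))
    out[tuple(dst)] = volume[tuple(src)]
    return out


def normalize(volume, lo, hi):
    """Clip to [lo, hi] and scale linearly to [0, 1]."""
    return (np.clip(volume, lo, hi) - lo) / (hi - lo)


# Toy CT volume in HU: air everywhere, one bright voxel in the middle.
hu = np.full((40, 200, 200), -1000.0, dtype=np.float32)
hu[20, 100, 100] = 500.0

resampled = resample_nearest(hu, spacing=(2.5, 1.0, 1.0), new_spacing=2.5)
prepared = normalize(center_crop_or_pad(resampled, size=128, fill=-1000.0),
                     lo=-1000.0, hi=2000.0)
print(resampled.shape, prepared.shape)
```

In practice an interpolating resampler (for example trilinear) would usually be preferred over nearest-neighbour sampling for the image volumes, while nearest-neighbour remains appropriate for label maps.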

Figure 8. Example simulated X-ray projections from the LIDC-IDRI dataset and fan-beam geometry (left) and one slice from the CT volume with organ segmentation from CT-ORG dataset (right).

Table 1. Reconstruction and organ segmentation results of the experiments. $\mathrm{DSC}_M$ indicates the DSC with the manual annotation as ground truth and $\mathrm{DSC}_S$ indicates the DSC with the segmentation from the pre-trained segmentation network as ground truth. The segmentation metrics are aggregated from 20 CT volumes.

Table 2. Organ-wise segmentation results of the proposed method in comparison with the baseline method.
As reconstruction metrics, the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) and the root mean squared error (RMSE) in Hounsfield units (HU) are selected to evaluate the reconstruction performance. Table 1 shows the results of our proposed method in comparison with the baseline method. The proposed method improves the PSNR by 1.0%, the SSIM by 3.2% and the RMSE by 2.7%. With only $L_s$, the PSNR deteriorates by 2.6%, the SSIM by 3.2% and the RMSE by 8.7%. The best improvement in the metrics is obtained with only $L_p$: the PSNR improves by 1.2%, the SSIM by 3.2% and the RMSE by 2.9%. Judging from the reconstruction metrics, only $L_p$ contributes to the improved image quality. The results from the ablation study are shown in Figure 4.
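For reference, RMSE and PSNR can be computed as in the following minimal NumPy sketch. The `data_range` of 4000 HU and the toy volumes are assumptions for illustration; the text does not state which intensity range was used for the PSNR. SSIM is omitted here because it requires a windowed implementation, such as the one provided by scikit-image.

```python
import numpy as np


def rmse(reference, reconstruction):
    """Root mean squared error, in the units of the input (here HU)."""
    return float(np.sqrt(np.mean((reference - reconstruction) ** 2)))


def psnr(reference, reconstruction, data_range):
    """Peak signal-to-noise ratio in dB: 20 * log10(data_range / RMSE)."""
    err = rmse(reference, reconstruction)
    if err == 0.0:
        return float("inf")
    return float(20.0 * np.log10(data_range / err))


# Toy volumes in HU: the reconstruction is off by a constant 100 HU.
reference = np.zeros((4, 4, 4))
reconstruction = reference + 100.0

rmse_value = rmse(reference, reconstruction)
psnr_value = psnr(reference, reconstruction, data_range=4000.0)  # hypothetical range
print(rmse_value, psnr_value)
```

Both metrics average over all voxels, so a reconstruction can score well while small but anatomically important structures are missing, which is consistent with the observation that these metrics barely reflect the anatomical enhancement.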