QCBCT-NET for direct measurement of bone mineral density from quantitative cone-beam CT: a human skull phantom study

The purpose of this study was to directly and quantitatively measure bone mineral density (BMD) from cone-beam CT (CBCT) images by enhancing the linearity and uniformity of the bone intensities with a hybrid deep-learning model (QCBCT-NET) combining a generative adversarial network (Cycle-GAN) and a U-Net, and to compare the bone images enhanced by the QCBCT-NET with those enhanced by the Cycle-GAN and the U-Net alone. We used two phantoms of human skulls encased in acrylic, one for the training and validation datasets and the other for the test dataset. The proposed QCBCT-NET consisted of a Cycle-GAN with residual blocks and a multi-channel U-Net, trained on paired quantitative CT (QCT) and CBCT images. The BMD images produced by QCBCT-NET significantly outperformed the images produced by the Cycle-GAN or the U-Net in mean absolute difference (MAD), peak signal to noise ratio (PSNR), normalized cross-correlation (NCC), structural similarity (SSIM), and linearity when compared to the original QCT image. The QCBCT-NET improved the contrast of the bone images by reflecting the original BMD distribution of the QCT image locally using the Cycle-GAN, and improved the spatial uniformity of the bone images by globally suppressing image artifacts and noise using the two-channel U-Net. The QCBCT-NET substantially enhanced the linearity, uniformity, and contrast as well as the anatomical and quantitative accuracy of the bone images, and was more accurate than the Cycle-GAN and the U-Net for quantitatively measuring BMD in CBCT.


Materials and methods
Data acquisition and preparation. We used two phantoms of human skulls encased in acrylic, articulated for medical use (Erler Zimmer Co., Lauf, Germany), one with and the other without metal restorations causing streak artifacts. The phantoms have been used in our previous studies [45][46][47][48]. Images of the phantoms were obtained with an MDCT scanner (Somatom Sensation 10, Siemens AG, Erlangen, Germany) and a CBCT scanner (CS 9300, Carestream Health, Inc., Rochester, US). The CT images were acquired with a voxel size of 0.469 × 0.469 × 0.5 mm³, dimensions of 512 × 512 pixels, and 16-bit depth at 120 kVp and 130 mA, while the CBCT images were acquired with a voxel size of 0.3 × 0.3 × 0.3 mm³, dimensions of 559 × 559 pixels, and 16-bit depth under combinations of 80 or 90 kVp and 8 or 10 mA. In addition, CT and CBCT images of a BMD calibration phantom (QRM-BDC Phantom 200 mm length, QRM GmbH, Moehrendorf, Germany) with calcium hydroxyapatite inserts of three densities (0 (water), 100, and 200 mg/cm³) were obtained under the same conditions (Fig. 1). The CT images of the skull phantoms were then converted into quantitative CT (QCT) images based on Hounsfield Units (HU) by linear calibration using the CT images of the BMD calibration phantom. The CBCT images of the skull phantoms were likewise converted into calibrated CBCT (CAL_CBCT) images using the corresponding images of the BMD calibration phantom, for later comparison with the deep learning results. The CT image of the skull phantom was matched to the CBCT image by paired-point registration in 3D Slicer (MIT, Massachusetts, US), where six landmarks were localized manually at the vertex on the lateral incisors, the buccal cusps of the first premolars, and the distobuccal cusps of the first molars [49].
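The linear calibration against the three hydroxyapatite inserts can be sketched as a least-squares line fit. This is a minimal illustration rather than the procedure actually used: the ROI mean intensities, the function names, and the choice to express the output directly in mg/cm³ are all assumptions made for the example.

```python
import numpy as np

# Nominal insert densities of the calibration phantom (mg/cm^3), as given in the text.
INSERT_BMD = np.array([0.0, 100.0, 200.0])

def fit_linear_calibration(mean_intensities, known_bmd=INSERT_BMD):
    """Fit a least-squares line mapping raw voxel intensity to BMD."""
    slope, intercept = np.polyfit(mean_intensities, known_bmd, deg=1)
    return slope, intercept

def apply_calibration(image, slope, intercept):
    """Apply the fitted line to every voxel of an image."""
    return slope * np.asarray(image, dtype=np.float64) + intercept

# Hypothetical mean intensities measured in ROIs over the three inserts.
roi_means = np.array([10.0, 160.0, 310.0])
slope, intercept = fit_linear_calibration(roi_means)

# Calibrate a small hypothetical image patch.
calibrated = apply_calibration(np.array([[10.0, 85.0], [160.0, 310.0]]), slope, intercept)
```

The same fit would be run separately for the CT and CBCT acquisitions, since each modality and scanning condition has its own intensity scale.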
The matched CT and CBCT volumes, each consisting of a matrix of 559 × 559 × 264 voxels, were cropped to 559 × 559 × 200 voxels centered at the maxillomandibular region, and then resized to 256 × 256 × 200 voxels. To avoid adverse impacts from non-anatomical regions during training, binary masks were applied to the CT and CBCT images to separate the maxillomandibular region from the non-anatomical regions [44]. The binary mask images were generated using thresholding and morphological operations. The edges of anatomical regions were extracted by applying a local range filter to the paired CBCT and CT images [50], and the morphological operations of opening and flood fill were applied to the binarized edges obtained by thresholding, to remove small blobs and fill the inner area. The corresponding CBCT and CT images were multiplied by the intersection of the two binary masks from the CBCT and CT images. The voxel values outside the masked region were replaced with a Hounsfield Unit (HU) value of −1000.
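The masking steps described above can be sketched with SciPy's `ndimage` module. This is an illustrative sketch only: the filter size, threshold, structuring element, and the order of operations (holes are filled before the opening so that thin edge contours survive) are assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def local_range(image, size=3):
    """Local range filter: neighborhood maximum minus neighborhood minimum."""
    return ndimage.maximum_filter(image, size=size) - ndimage.minimum_filter(image, size=size)

def anatomical_mask(image, edge_threshold, opening_size=5):
    """Binary mask from thresholded edges, hole filling, and opening."""
    edges = local_range(image) > edge_threshold        # binarized edges
    filled = ndimage.binary_fill_holes(edges)          # fill the inner area
    structure = np.ones((opening_size, opening_size), dtype=bool)
    return ndimage.binary_opening(filled, structure=structure)  # remove small blobs

def apply_joint_mask(ct, cbct, mask_ct, mask_cbct, fill_value=-1000.0):
    """Keep voxels inside the intersection of both masks; set the rest to fill_value."""
    joint = mask_ct & mask_cbct
    return np.where(joint, ct, fill_value), np.where(joint, cbct, fill_value)

# Hypothetical 2D slice: a bright square object plus one isolated bright speck.
slice_img = np.full((32, 32), -1000.0)
slice_img[10:22, 10:22] = 500.0
slice_img[2, 2] = 500.0
mask = anatomical_mask(slice_img, edge_threshold=100.0)
```

In this toy slice the isolated speck is removed by the opening, while the square object is kept as a single filled region.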
For deep learning, we prepared 800 pairs of axial QCT and CBCT slice images from the skull phantom without metal restorations for the training and validation datasets (acquired under the four conditions combining 80 or 90 kVp with 8 or 10 mA) and, independently, another 400 pairs of QCT and CBCT slice images from the skull phantom with metal restorations for the test dataset (acquired under two conditions: 80 kVp and 8 mA, and 90 kVp and 10 mA).
Hybrid deep-learning model (QCBCT-NET) for quantitative CBCT images. We designed a hybrid deep-learning architecture (QCBCT-NET) consisting of a Cycle-GAN and a U-Net to generate QCT-like images from conventional CBCT images (Fig. 2). For performance comparisons, we also built a standalone Cycle-GAN and a standalone U-Net with the same architectures as in QCBCT-NET. We implemented the Cycle-GAN with residual blocks [38] combined with a multi-channel U-Net model using paired training data. The Cycle-GAN architecture contained two generators yielding the CBCT to QCT (G_CBCT→QCT) and QCT to CBCT (G_QCT→CBCT) mappings, and two discriminators (D_QCT and D_CBCT) for distinguishing between real and generated images. We adopted a ResNet architecture with nine residual blocks for the generators, and a PatchGAN with a 70 × 70 patch for the discriminators.
Figure 2. The QCBCT-NET architecture combining Cycle-GAN and the multi-channel U-Net. The Cycle-GAN consisted of two generators, G_CBCT→QCT and G_QCT→CBCT, and two discriminators, D_CBCT and D_QCT. In the generators, the convolution block consisted of 7 × 7 and 3 × 3 convolution layers with batch normalization and ReLU activation, and residual blocks were embedded in the middle of the down-sampling and up-sampling layers. In the discriminators, the convolution block consisted of 4 × 4 convolution layers with batch normalization and leaky ReLU activation followed by down-sampling layers. The multi-channel U-Net had two-channel inputs of CBCT and corresponding CYC_CBCT images, consisted of 3 × 3 convolution layers with batch normalization and ReLU activation, and had skip connections at each layer level. Max-pooling was used for down-sampling and transposed convolution was used for up-sampling. Consequently, the QCBCT-NET generated QCBCT images from CBCT images to quantitatively measure BMD in CBCTs.
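The 70 × 70 patch size of the PatchGAN discriminator corresponds to the receptive field of one output unit. The short calculation below illustrates this, assuming the commonly used PatchGAN layout of five 4 × 4 convolutions with strides 2, 2, 2, 1, 1; the text only states the 4 × 4 kernels and the patch size, so the stride sequence is an assumption.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride) pairs.

    Works backwards from a single output unit: each layer widens the field
    to (rf - 1) * stride + kernel input units.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed PatchGAN layout: three stride-2 and two stride-1 4x4 convolutions.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch = receptive_field(PATCHGAN_LAYERS)
```

Each discriminator output therefore judges whether one 70 × 70 region of the input looks real, rather than scoring the whole image at once.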

The Cycle-GAN model was optimized using a two-part loss function consisting of an adversarial loss and a cycle consistency loss [36]. The adversarial loss relied on the outputs of the discriminators and was defined over the CBCT image, I_CBCT, and the QCT image, I_QCT.
To avoid mode collapse, we added a cycle consistency loss that reduced the space of possible mapping functions. The cycle consistency loss was likewise defined over the CBCT image, I_CBCT, and the QCT image, I_QCT.
Finally, the loss function of the Cycle-GAN was defined as a combination of the adversarial losses and the cycle consistency loss, where λ controlled their relative importance and was set to 10.
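For reference, these losses can be written in the standard Cycle-GAN form. The block below is a sketch under that assumption (log-likelihood adversarial terms, an L1 cycle term, and λ weighting the cycle term); it is not necessarily the exact formulation used in this study.

```latex
\begin{aligned}
\mathcal{L}_{adv}(G_{CBCT\to QCT}, D_{QCT}) &=
  \mathbb{E}_{I_{QCT}}\!\left[\log D_{QCT}(I_{QCT})\right]
  + \mathbb{E}_{I_{CBCT}}\!\left[\log\!\left(1 - D_{QCT}(G_{CBCT\to QCT}(I_{CBCT}))\right)\right] \\
\mathcal{L}_{adv}(G_{QCT\to CBCT}, D_{CBCT}) &=
  \mathbb{E}_{I_{CBCT}}\!\left[\log D_{CBCT}(I_{CBCT})\right]
  + \mathbb{E}_{I_{QCT}}\!\left[\log\!\left(1 - D_{CBCT}(G_{QCT\to CBCT}(I_{QCT}))\right)\right] \\
\mathcal{L}_{cyc} &=
  \mathbb{E}_{I_{CBCT}}\!\left[\left\lVert G_{QCT\to CBCT}(G_{CBCT\to QCT}(I_{CBCT})) - I_{CBCT} \right\rVert_1\right]
  + \mathbb{E}_{I_{QCT}}\!\left[\left\lVert G_{CBCT\to QCT}(G_{QCT\to CBCT}(I_{QCT})) - I_{QCT} \right\rVert_1\right] \\
\mathcal{L}_{CycleGAN} &=
  \mathcal{L}_{adv}(G_{CBCT\to QCT}, D_{QCT})
  + \mathcal{L}_{adv}(G_{QCT\to CBCT}, D_{CBCT})
  + \lambda\, \mathcal{L}_{cyc}, \qquad \lambda = 10
\end{aligned}
```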
To generate QCBCT images, we implemented the multi-channel U-Net with four skip connections between the encoder and the decoder, one at each resolution level, using two-channel inputs consisting of the original CBCT image and the corresponding output of the Cycle-GAN. The multi-channel U-Net was optimized with a loss function consisting of the mean absolute difference (MAD) and a structural difference term based on the structural similarity (SSIM) between the QCBCT and QCT images [43]. These terms were defined using the QCBCT image, I_QCBCT, the QCT image, I_QCT, the mean, µ, the variance, σ², and the constants C1 and C2, which stabilize the division when the denominator is weak.
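In the usual notation, the two terms can be sketched as follows. The MAD and SSIM expressions are the standard definitions; the covariance σ_xy and the use of 1 − SSIM as the structural difference are assumptions, with x and y standing for I_QCBCT and I_QCT and N for the number of voxels.

```latex
\begin{aligned}
\mathcal{L}_{MAD} &= \frac{1}{N}\sum_{i=1}^{N}\left| I_{QCBCT}(i) - I_{QCT}(i) \right| \\
\mathrm{SSIM}(x, y) &= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
                            {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \\
\mathcal{L}_{SSIM} &= 1 - \mathrm{SSIM}(I_{QCBCT}, I_{QCT})
\end{aligned}
```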
Finally, the loss function of the multi-channel U-Net was defined as a weighted combination of the MAD and SSIM terms with weight α, which was set to 0.6.
The deep learning models were trained and tested on a workstation with four Nvidia GeForce GTX 1080 Ti GPUs with 11 GB of VRAM each. The Cycle-GAN model was trained with the Adam optimizer, a mini-batch size of 8, and 200 epochs. The learning rate was kept at 0.0002 for the first 100 epochs and then decreased linearly toward zero over the next 100 epochs. The U-Net model was trained with the Adam optimizer, a mini-batch size of 8, and 200 epochs. The learning rate was set to 0.0001 with a momentum term of 0.9 to stabilize the training.
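The weighted U-Net loss and the Cycle-GAN learning-rate schedule can be sketched in NumPy as below. Several details are assumptions made for illustration: α is taken to weight the MAD term and (1 − α) the SSIM term, SSIM is computed from global image statistics rather than local windows, and the constants `c1` and `c2` and all function names are hypothetical.

```python
import numpy as np

ALPHA = 0.6  # value from the text; which term it multiplies is an assumption

def mad(pred, target):
    """Mean absolute difference between two images."""
    return float(np.mean(np.abs(pred - target)))

def global_ssim(pred, target, c1=1e-4, c2=9e-4):
    """SSIM from global means, variances, and covariance (single window)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(num / den)

def unet_loss(pred, target, alpha=ALPHA):
    """Assumed form: alpha * MAD + (1 - alpha) * (1 - SSIM)."""
    return alpha * mad(pred, target) + (1 - alpha) * (1 - global_ssim(pred, target))

def cyclegan_lr(epoch, base_lr=2e-4, constant_epochs=100, decay_epochs=100):
    """Constant learning rate, then linear decay to zero over decay_epochs."""
    if epoch < constant_epochs:
        return base_lr
    remaining = constant_epochs + decay_epochs - epoch
    return base_lr * max(remaining, 0) / decay_epochs
```

For example, `cyclegan_lr(50)` stays at the base rate, while `cyclegan_lr(150)` is half of it under this schedule.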
To compare the performance of measuring BMD from the QCBCT images produced by the QCBCT-NET with that from the images produced by the Cycle-GAN or the U-Net alone, we trained the Cycle-GAN and the U-Net with the same settings as in QCBCT-NET, each with only the CBCT image as the network input.

Evaluation of quantitative CBCT images for measuring BMD.
To quantitatively evaluate the performance of measuring BMD from CBCT images by the different deep learning models, we computed the mean absolute difference (MAD), peak signal to noise ratio (PSNR), normalized cross correlation (NCC), and structural similarity (SSIM) between the original QCT image (the ground truth) and each of the following: the QCBCT image produced by QCBCT-NET, the CYC_CBCT image produced by Cycle-GAN, the U_CBCT image produced by U-Net, and the CAL_CBCT image produced by calibration only, for the CBCT images of the test dataset obtained under two scanning conditions. The MAD was defined as the mean of the absolute differences between the intensities of the QCT and CBCT images; the PSNR as the logarithm of the ratio of the maximum possible intensity (MAX) to the root mean squared error (RMSE) between the intensities of the QCT and CBCT images (PSNR = 20 × log10(MAX / RMSE)); the NCC as the product of the intensities of the QCT and CBCT images normalized by their standard deviations; and the SSIM as described above. The quantitative measurements in each slice were averaged over the whole maxilla and mandible. Higher values of PSNR, SSIM, and NCC, and a lower MAD, indicated better performance for BMD measurement from CBCT images.
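The three intensity-based metrics can be sketched in NumPy as follows. This is a minimal illustration with hypothetical function names; the NCC here subtracts the mean of each image before normalizing, which is a common convention and an assumption about the definition used.

```python
import numpy as np

def mad(qct, cbct):
    """Mean absolute difference between QCT and CBCT intensities."""
    return float(np.mean(np.abs(qct - cbct)))

def psnr(qct, cbct, max_value):
    """PSNR = 20 * log10(MAX / RMSE), in dB."""
    rmse = np.sqrt(np.mean((qct - cbct) ** 2))
    return float(20.0 * np.log10(max_value / rmse))

def ncc(qct, cbct):
    """Zero-mean normalized cross correlation, in [-1, 1]."""
    q = qct - qct.mean()
    c = cbct - cbct.mean()
    return float(np.mean(q * c) / (qct.std() * cbct.std()))

# Hypothetical slice pair: the CBCT-derived image is the QCT shifted by 10 units.
qct_slice = np.array([[0.0, 100.0], [200.0, 300.0]])
cbct_slice = qct_slice + 10.0
```

A constant offset leaves the NCC at 1 while still raising the MAD and lowering the PSNR, which is one reason the metrics are reported together.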
Spatial nonuniformity (SNU) of the CBCT images was measured as the absolute difference between the maximum and the minimum of the BMD values in rectangular ROIs around the maxilla and mandible. To evaluate the linearity of BMD measurements in the CBCT images, we analyzed the relationship between the voxel intensities of the QCT (the ground truth) and CBCT images through a linear regression of the voxel intensities (Slope, slope of linear regression) at the maxilla and mandible, respectively. A lower SNU and a higher Slope indicated better performance for BMD measurement from CBCT images. We also performed a Bland-Altman analysis to assess the bias and agreement limits of the BMD between QCT (the ground truth) and CBCT images at the maxilla and mandible.
We compared the performances of the QCBCT and other CBCT images at the maxilla and mandible under two conditions, 80 kVp and 8 mA, and 90 kVp and 10 mA, to account for variations in the BMD values of a bone depending on its relative position [51] and for those caused by scanning conditions. Paired two-tailed t-tests were used (SPSS v26, SPSS Inc., Chicago, IL, USA) to compare the quantitative performances between QCBCT and CYC_CBCT images, between QCBCT and U_CBCT images, and between QCBCT and CAL_CBCT images. The statistical significance level was set at 0.01.

Results
Table 1 summarizes the means of the quantitative performance results for measuring BMD from QCBCT images produced by QCBCT-NET, CYC_CBCT images produced by Cycle-GAN, U_CBCT images produced by U-Net, and CAL_CBCT images produced by calibration, for the CBCT images of the test datasets acquired for the skull phantom with metal restorations under conditions of 80 kVp and 8 mA, and 90 kVp and 10 mA. The BMD images of QCBCTs significantly outperformed the CYC_CBCT and U_CBCT images in MAD, PSNR, SSIM, and NCC at both the maxilla and mandible when compared to the original QCT images (Table 1).
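The SNU, regression slope, and Bland-Altman quantities used in this evaluation can be sketched as below. The sketch makes several assumptions: SNU is computed from one mean BMD value per ROI, the regression treats QCT as the independent variable, and the agreement limits are taken as bias ± 1.96 standard deviations. Function names are hypothetical.

```python
import numpy as np

def spatial_nonuniformity(roi_means):
    """Absolute difference between the largest and smallest ROI mean BMD."""
    roi_means = np.asarray(roi_means, dtype=np.float64)
    return float(roi_means.max() - roi_means.min())

def regression_slope(qct, cbct):
    """Slope of the least-squares line cbct = slope * qct + intercept."""
    slope, _intercept = np.polyfit(np.ravel(qct), np.ravel(cbct), deg=1)
    return float(slope)

def bland_altman(qct, cbct):
    """Bias and 95% limits of agreement of the differences (cbct - qct)."""
    diff = np.ravel(cbct) - np.ravel(qct)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

A slope close to 1 together with a small bias and narrow agreement limits would indicate that CBCT-derived values track the QCT ground truth closely.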
All performance measures from the QCBCT images differed significantly from those of the CYC_CBCT or U_CBCT images at the maxilla and mandible (p < 0.01), except for the SNU of the U_CBCT images (p = 0.04) (Table 1). Compared to the BMD measurements from the CYC_CBCT image, the BMD from the QCBCT showed improvements of 38% in MAD, 20% in PSNR, 45% in SSIM, 40% in NCC, 80% in SNU, and 84% in Slope at the maxilla, and 39% in MAD, 20% in PSNR, 50% in SSIM, 40% in NCC, 47% in SNU, and 102% in Slope at the mandible for CBCT images acquired at 80 kVp and 8 mA (Table 2). Compared to the BMD measurements from the U_CBCT image, the BMD from the QCBCT showed improvements of 59% in MAD, 41% in PSNR, 112% in SSIM, 58% in NCC, −17% in SNU, and 167% in Slope at the maxilla, and 49% in MAD, 33% in PSNR, 81% in SSIM, 54% in NCC, −25% in SNU, and 142% in Slope at the mandible for CBCT images acquired at 80 kVp and 8 mA (Table 2). Under the higher dose condition of 90 kVp and 10 mA, the BMD from the QCBCT also showed higher performance at both the maxilla and mandible compared to the CYC_CBCT and U_CBCT (Table 2). Therefore, the BMDs from the QCBCT were more accurate than those from the CYC_CBCT and U_CBCT regardless of the relative positions of the bone or the effects of different scanning conditions.
Table 1. Quantitative performance of CBCT images produced by QCBCT-NET, Cycle-GAN, U-Net, and CAL_CBCT compared to the original QCT images for measuring BMD values at the maxilla (slices 1-81) and mandible (slices 82-200) for test datasets under conditions of 80 kVp and 8 mA, and 90 kVp and 10 mA. MAD, mean absolute difference; PSNR, peak signal to noise ratio; SSIM, structural similarity; NCC, normalized cross correlation; SNU, spatial nonuniformity; Slope, slope of linear regression between the voxel intensities. Values are mean ± SD. *Significant difference (p < 0.01) between QCBCT-NET and U-Net, † (p < 0.01) between QCBCT-NET and Cycle-GAN, and ‡ (p < 0.01) between QCBCT-NET and CAL_CBCT.
Figure 3 shows the axial slices of the BMD images from the original QCT, QCBCT, CYC_CBCT, U_CBCT, and CAL_CBCT at the maxilla and mandible. As shown in the subtraction images in Fig. 3, the BMD image quality of the QCBCTs for the two regions exhibited substantial improvement over those of CYC_CBCT, U_CBCT, and CAL_CBCT in terms of BMD (voxel intensity) differences compared to the original QCT images. The large differences around the teeth and dense bone of higher voxel intensities (BMD) seen in the CAL_CBCT were reduced more in the QCBCT than in the CYC_CBCT or U_CBCT images. Figure 4 shows the BMD (voxel intensity) profiles acquired along the dental arch at the maxilla and mandible in the QCT and CBCT images shown in Fig. 3. The BMD profile from the QCBCT images reflected the original QCT more closely than the CYC_CBCT and U_CBCT images, with higher correlations with the QCT than the other CBCT images, although the dental implant and restorations showed higher voxel intensities compared to other anatomical structures (Fig. 4). Therefore, the QCBCT image exhibited better structural preservation and edge sharpness of the bone than the CYC_CBCT and U_CBCT images at both the maxilla and mandible. The BMD distribution of the QCBCT also restored the original QCT more closely than that of the CYC_CBCT and U_CBCT images in an axial slice at the maxilla and mandible (Fig. 5). The linear relationship between the QCT and QCBCT images showed more contrast and correlation than that between the QCT and other CBCT images, with a larger slope and better goodness of fit (Fig. 6). The Bland-Altman plot between QCT and QCBCT images also showed higher linear relationships and better agreement limits than that between QCT and other CBCT images (Fig. 7). Therefore, the QCBCT images better preserved the original distribution and linear relationship of the BMD values compared to the CYC_CBCT and U_CBCT images.

Discussion
We developed a hybrid deep-learning model (QCBCT-NET) consisting of Cycle-GAN and U-Net to quantitatively and directly measure BMD from CBCT images. The BMD measurements of QCBCT images produced by QCBCT-NET significantly outperformed those of the CYC_CBCT images produced by Cycle-GAN and the U_CBCT images produced by U-Net at both the maxilla and mandible when compared to the original QCT. We used paired training data in the Cycle-GAN implementation with the residual blocks, which forced the network to focus on reducing image artifacts and enhancing bone contrast rather than on bone structural mismatches. Through the residual blocks in the generator architecture of the Cycle-GAN, the network could learn the difference between the source and target based on the residual image and generate corrected bone images more accurately [52]. In a previous study, a Cycle-GAN was used to capture the relationship from CBCT to CT images while simultaneously supervising an inverse CT to CBCT transformation model [36]. The Cycle-GAN doubled the process of a typical GAN by enforcing an inverse transformation, which doubly constrained the model and increased accuracy in the output images [38]. In our study, the Cycle-GAN could learn both intensity and textural mapping from a source distribution of the CBCT bone image to a target distribution of the QCT bone image.
In previous studies, U-Net architectures were used to directly synthesize CT-like CBCT images from their corresponding CT images, especially on paired datasets [43,44]. The U-Net could suppress global scattering artifacts and local artifacts derived from CBCT images by capturing both global and local features in the image spatial domain [43]. In addition, the spatial uniformity of CT-like CBCT images was enhanced close to that of the corresponding CT images while maintaining the anatomical structures on the CBCT images [44]. Consistent with this, in our results the spatial uniformity of CBCT images produced by U-Net was improved, but the contrast of the bone images was reduced when compared to the CYC_CBCT images produced by Cycle-GAN. In our study, the two-channel U-Net, which learned spatial information of CBCTs and corresponding CYC_CBCT images simultaneously, could improve image contrast and uniformity by suppressing beam hardening artifacts and scattering noise [43]. The CYC_CBCT image, as one of the two inputs, helped the U-Net to focus on learning pixel-wise correspondence (or mapping) between QCT and CBCT images while maintaining the original intensity distribution of the bone structures. The combined MAD and SSIM loss in the U-Net facilitated faster convergence and better accuracy by accounting for both pixel-wise errors and structural similarity. As a result, the BMDs (voxel intensities) from the QCBCT were more accurate than those from the CYC_CBCT and U_CBCT regardless of the relative positions of the bone in the image volume [51], or the effects of different radiation doses or scanning conditions used in clinical settings.
We combined the Cycle-GAN with the two-channel U-Net model to further improve the contrast and uniformity of the CBCT bone images. The Cycle-GAN improved the contrast of the bone images by reflecting the original BMD distribution of the QCT images locally, while the two-channel U-Net improved the spatial uniformity of the bone images by globally suppressing the image artifacts and noise. As a result, the Cycle-GAN and two-channel U-Net worked to provide complementary benefits in improving the contrast and uniformity of the bone image locally and globally. Consequently, the QCBCT-NET could substantially enhance the linearity, uniformity, and contrast as well as the anatomical and quantitative accuracy of the bone images in order to quantitatively measure BMD in CBCT. Although the BMD linear relationships and agreement limits of QCBCT images were superior to those of CYC_CBCT and U_CBCT images, the accuracy of our method should be further improved for clinical applications.
Our study had some limitations. First, because paired CBCT and CT images are typically acquired under different imaging conditions, the bone structures of the images were not perfectly aligned even after registration. Therefore, the registration error between CBCT and CT images might have had adverse impacts during network training. Second, our study had a potential limitation in generalization ability due to the relatively small training dataset. Overfitting of the trained CNN model, in which the model learns statistical regularities specific to the training dataset, could negatively impact the model's ability to generalize to a new dataset [53]. Third, the results presented in this study were based on two human skull phantoms with and without metal restorations instead of actual patients. Our method needs to be validated on datasets from actual patients having dental fillings and restorations for its application in clinical research and practice, and compared to the conventional scatter-based method in future studies.

Conclusions
We proposed QCBCT-NET to directly and quantitatively measure BMD from CBCT images based on a hybrid deep-learning model combining the generative adversarial network (GAN) and U-Net. The Cycle-GAN and two-channel U-Net in QCBCT-NET provided complementary benefits, improving the contrast and uniformity of the bone image locally and globally. The BMD images produced by QCBCT-NET significantly outperformed the images produced by Cycle-GAN or U-Net in MAD, PSNR, SSIM, NCC, and linearity when compared to the original QCT. The QCBCT-NET substantially enhanced the linearity, uniformity, and contrast as well as the anatomical and quantitative accuracy of the bone images, and was more accurate than the Cycle-GAN and the U-Net for quantitatively measuring BMD in CBCT. In future studies, we plan to evaluate the proposed method on actual patient datasets to establish its clinical efficacy.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.