Identification of Osteoporosis Using Ensemble Deep Learning Model with Panoramic Radiographs and Clinical Covariates

Osteoporosis is becoming a global health issue due to increased life expectancy. However, it is difficult to detect in its early stages owing to a lack of discernible symptoms. Hence, screening for osteoporosis with widely used dental panoramic radiographs would be very cost-effective and useful. In this study, we investigate the use of deep learning to classify osteoporosis from dental panoramic radiographs. In addition, the effect of adding clinical covariate data to the radiographic images on the identification performance was assessed. For objective labeling, a dataset containing 778 images was collected from patients who underwent both skeletal bone mineral density measurement and dental panoramic radiography at a single general hospital between 2014 and 2020. Osteoporosis was assessed from the dental panoramic radiographs using convolutional neural network (CNN) models, including EfficientNet-b0, -b3, and -b7 and ResNet-18, -50, and -152. An ensemble model was also constructed with clinical covariates added to each CNN. The ensemble model exhibited improved performance on all metrics for all CNNs, especially accuracy and AUC. The results show that deep learning by CNN can accurately classify osteoporosis from dental panoramic radiographs. Furthermore, it was shown that the accuracy can be improved using an ensemble model with patient covariates.


Introduction
Osteoporosis is defined by the loss of bone mass and the deterioration of the microarchitecture of bone tissue 1 . It is a common metabolic bone disease characterized by susceptibility to fracture. Fractures of the spine, hips, and wrists caused by osteoporosis significantly impair the quality of life of patients. In addition, in severe cases, it can lead to disorders that increase the risk of mortality 2 . With the rapid aging of the population caused by the increase in life expectancy in recent years, millions of people are affected annually worldwide, and osteoporosis is becoming a global public health problem. However, osteoporosis initially develops without any symptoms and can go undetected in its early stages 3 .
Dual-energy X-ray absorptiometry (DXA) is an effective means of measuring bone mineral density (BMD) and is the standard test for diagnosing osteoporosis 4 . Despite being the standard inspection method, DXA scans are relatively expensive 5 , which makes them unsuitable for general screening. Dental panoramic radiographs, in contrast, are frequently taken during regular dental examinations or before certain dental procedures.
Therefore, it would be of great medical and economic value if dentists could use dental panoramic radiographs to screen patients for osteoporosis. This approach is also clinically useful in that dentists can refer patients with suspected osteoporosis to specialists. Several researchers have analyzed dental panoramic radiographs to provide initial diagnoses of osteoporosis [6][7][8][9][10][11][12][13][14][15][16] .
The detection of osteoporosis using panoramic radiographs has been investigated in relation to several indices and linear measurements, such as the mandibular cortical width (MCW), mandibular cortex index (MCI), mental index, and panoramic mandibular index [6][7][8][9][10][11][12][13][14][15][16] . In addition, the diagnosis of osteoporosis using a support vector machine has been reported 16 . However, these diagnostic imaging methods have not been commonly used because they require complicated preprocessing, image normalization, and specialized measurements for diagnosis. In contrast, the diagnosis of osteoporosis by deep learning using a convolutional neural network (CNN), which does not require complicated preprocessing, has also been reported. One study that used deep learning focusing on the mandibular cortical bone produced a high diagnostic accuracy of 84.0% and an area under the curve (AUC) of the receiver operating characteristic (ROC) curve of 0.858 17 . These results suggest that deep learning using X-ray images can be useful for diagnosing osteoporosis.
The conventional methods of classifying osteoporosis by extracting each feature from panoramic images are extremely useful. However, osteoporosis is associated with systemic patient factors 18 . We hypothesized that the diagnostic accuracy of deep learning on X-ray images would be improved by constructing a CNN to which patient factors are added.
The purpose of this study was to construct an osteoporosis classifier from dental panoramic radiographs.
In addition, we developed an osteoporosis classifier based on an ensemble model in which the clinical covariates of patients were added to the dental panoramic radiographs, to statistically clarify the effect of adding clinical covariates on classification accuracy.

Comparison between image-only model and ensemble model
Table 2 shows the performance metrics, p-values, and effect sizes for ResNet-18, -50, and -152. All performance metrics were elevated by the ensemble model. For both the image-only model and the ensemble model, performance increased in the order of ResNet-18, -50, and -152. There was a strongly statistically significant difference between the two groups, especially in terms of accuracy and AUC. In the effect-size evaluation, the AUC had the highest effect in all ResNet models, categorized as very large.
Table 3 shows the performance metrics, p-values, and effect sizes for EfficientNet-b0, -b3, and -b7. As with ResNet, all performance metrics were increased by the ensemble model. For both the image-only model and the ensemble model, performance increased in the order of EfficientNet-b0, -b3, and -b7. The two-group comparison also showed strongly statistically significant differences in accuracy and AUC, and the effect sizes were all very large. Among all CNN models, EfficientNet-b7 produced the highest accuracy, AUC, and F1 score. Figure S1 shows the ROC curves corresponding to ResNet and EfficientNet.

Visualization of model identification
Figure 1 shows the focused visualization areas obtained by guided Grad-CAM. We selected the ensemble analyses using EfficientNet-b0, -b3, and -b7 and ResNet-18, -50, and -152. Both EfficientNet and ResNet commonly focused on the cortical bone region of the mandibular lower border as a feature region. EfficientNet determined that this area was a characteristic region in non-osteoporosis images. In contrast, in the osteoporosis images, the area above the cortical bone was judged to be a characteristic region in addition to the cortical bone region of the mandibular lower border. ResNet characterized the cortical bone at the lower edge of the mandible more strongly. In osteoporosis images, ResNet-50 and -152 paid particular attention to the mandibular lower border cortical bone. ResNet did not consider the area above the mandibular cortical bone a characteristic region, whereas EfficientNet did. In the non-osteoporosis images, the cortical bone along the entire mandibular lower border was judged to constitute a characteristic region. For both EfficientNet and ResNet, the larger the number of parameters, the smaller the variation in the area that captured the image features.

Discussion
This study has demonstrated that CNNs can diagnose osteoporosis from dental panoramic radiographs with high levels of accuracy. Moreover, including patient variables collected in routine clinical settings improved all performance metrics compared to the image-only model. In particular, the accuracy and AUC were statistically significantly improved by the ensemble model.
There was no significant difference in diagnostic accuracy on our images compared to previous reports of osteoporosis classification by deep learning using dental panoramic radiographs 17 . The advantage of the method applied in this work is that we created a model with clinical patient covariates added to improve the accuracy of deep learning using images. This article is the first to report the identification of osteoporosis using an ensemble model from dental panoramic radiographs. The addition of patient covariates provided additional information important for osteoporosis classification and improved all performance metrics over the image-only model. In particular, the accuracy and AUC were statistically significantly improved by the ensemble model.
It is presumed that the diagnostic accuracy was improved because deep learning enabled advanced inference that simultaneously considers important information related to clinical covariates, which cannot be extracted from dental panoramic X-ray images alone.
In this study, we used the ResNet and EfficientNet CNNs. In general, CNNs have a deep hierarchical structure to improve accuracy. ResNet-152 and EfficientNet-b7 showed the highest accuracy among the ResNet and EfficientNet approaches, respectively. This finding is consistent with the results of previous studies 19,20 . In addition, performance improvements were obtained in all cases, for the CNNs with few parameters as well as the CNNs with numerous parameters. This result suggests that the ensemble model with added structured data contributed to the performance improvement regardless of the layer depth.
In our study, using patient clinical covariate data structured with images was more efficient for classifying osteoporosis by deep learning than using images alone. Only a few scholars have employed deep learning on images together with ensemble models that include clinical covariates 21,22 . Clinical data that reflect the general condition of the patient are important factors in the diagnosis of osteoporosis 23 . However, unfortunately, it is difficult to collect highly specialized clinical information, such as accurate histories of fractures and time of menopause, from first-time patients at dental clinics. Our study envisaged a more accurate screening method for dentists involving panoramic radiographs. We created an ensemble model with relatively high osteoporosis classification accuracy using age, gender, and BMI, which are easily collectable and clinically important data, as clinical covariates.
In this study, we used guided Grad-CAM technology to visualize feature regions in deep learning. The visualization of the feature area differed between ResNet and EfficientNet, and this result was extremely interesting. ResNet focused on the cortical bone in the mandibular lower border. In contrast, EfficientNet focused on the area above the cortical bone in addition to the cortical bone in the mandibular lower border. In previous studies, the MCW and MCI were used as indicators in osteoporosis screening 9,10,12,15 . The MCI is a screening method that focuses on structural changes in the cortical bone due to bone resorption 24 . It is presumed that ResNet mainly focused on the MCW, whereas EfficientNet regarded both the MCW and MCI as characteristic areas. The MCW may not have sufficient ability to detect osteoporosis 25 , and the MCI is poorly reproducible, which are drawbacks of these measurement methods 15 . The MCW is characterized by higher specificity than sensitivity 26 . It was speculated that ResNet showed higher specificity mainly because of the MCW, derived from its characteristic region. The high classification performance of EfficientNet may be due to its focus on both of the two measurement methods.
The advantage of this study over previous works is the statistical assessment of the additional effects of patient factors on the identification of osteoporosis from panoramic radiographs using deep learning. To the best of our knowledge, this study is the first to adopt this approach. In addition, the effect sizes calculated in this study will facilitate sample size estimation in future works.
This study has two notable limitations. First, although we utilized more cases than previous research 17 , it was difficult to collect sufficient image data from a single general hospital. CNNs trained on a small amount of data can suffer from overfitting and reduced generalization. We organized the data to avoid overfitting and used cross-validation and early stopping. In general, models trained by deep learning on large image datasets are effective for image classification. By increasing the amount of data through multi-center collaborative research, the accuracy and generalization of CNN classification diagnosis can be improved. The second limitation is the type of CNN adopted for validation. In this study, EfficientNet and ResNet were examined at various depths. If a CNN with fewer parameters could achieve higher performance, it would be more widely applicable, as the calculation cost would decrease. The identification of various CNNs suitable for ensembles of image data and patient covariates remains an important task for future research. In this study, the images were manually cropped to include the mandibular inferior margin in the center of the mandibular body as a preprocessing step for classifying osteoporosis. In the future, a network that can screen for osteoporosis by automatically detecting the ROI in untrimmed dental panoramic radiographs is required. Muramatsu et al. reported the automatic detection of the MCI 27 , which could be applied to the setting of ROIs. Furthermore, it would be ideal to ensemble patient covariates automatically by linking them with electronic medical record information.

Conclusions
We have demonstrated that deep learning by CNN models can classify osteoporosis with relatively high accuracy from dental panoramic radiographs. Furthermore, we showed that an ensemble model with patient covariates can classify osteoporosis with even higher accuracy. The ensemble models of EfficientNet-b7 and ResNet-152 classified osteoporosis with the highest accuracy. These results are expected to play an important role in the screening of osteoporosis in clinical dentistry settings.

Study design
The aim of this study was to classify osteoporosis and non-osteoporosis using a dataset segmented from panoramic radiographs and several different CNNs. Supervised learning was employed as the deep learning method. We statistically investigated the effect of adding covariates extracted from clinical records on the accuracy of osteoporosis identification.

Data acquisition
We retrospectively used clinical and radiographic data from March 2014 to September 2020. The study protocol was approved by the institutional review boards of the institutions hosting this work (i.e., the review board of Kagawa Prefectural Central Hospital, approval number 994), following ethical guidelines for clinical research and in accordance with the ethical principles that have their origins in the Declaration of Helsinki and its subsequent amendments. Informed consent from individual patients for this retrospective study was waived at the discretion of the institutional review committee (Kagawa Prefectural Central Hospital Ethics Committee) because protected health information was not used. The study included 902 consecutive images from enrolled patients who underwent panoramic radiography within the first year of receiving DXA at our hospital.
Osteoporosis was diagnosed by the DXA method using the hip or spine. The parameters investigated included the automatically generated BMD (g/cm³) and T-score. Osteoporosis was diagnosed when the T-score of the BMD was less than −2.5 and non-osteoporosis when the T-score was −2.5 or more, according to the diagnostic criteria of the World Health Organization 28 . When DXA was performed at both the hip and spine sites, the result with the lower T-score was used for diagnosis.
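The labeling rule above can be expressed as a short function; this is a minimal illustration of the WHO criterion as applied in the study (the function name and interface are ours):

```python
def classify_by_tscore(tscores):
    """Label a patient using the WHO criterion applied in the study.

    tscores: list of T-scores from the DXA sites measured (hip and/or spine).
    When both sites were scanned, the lower (worse) T-score is used.
    Returns "osteoporosis" if that T-score is below -2.5,
    otherwise "non-osteoporosis" (a T-score of exactly -2.5 is non-osteoporosis).
    """
    t = min(tscores)  # use the lower T-score when both hip and spine were measured
    return "osteoporosis" if t < -2.5 else "non-osteoporosis"
```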
The following panoramic radiographs were excluded from this study: 119 images of patients taking antiresorptive agents such as bisphosphonates or anti-RANKL antibodies, 3 images showing foreign substances such as plates and gastric tubes, 1 image of a mandibular fracture, and 1 image of poor radiographic quality. Further analysis was conducted on the remaining 778 images.

Data preprocessing
Dental panoramic radiographs of each patient were acquired using an AZ3000CMR (ASAHI ROENTGEN IND. Co., Ltd., Kyoto, Japan). All image data were output in .tiff format (2964 × 1464 pixels) from the Kagawa Prefectural Central Hospital PACS system (HOPE DrABLE-GX, FUJITSU Co., Tokyo, Japan). We isolated the cortical bone at the lower edge of the mandible in the images. Two maxillofacial surgeons manually placed and cropped regions of interest (ROIs) on the dental panoramic radiograph images using Photoshop Elements (Adobe Systems, Inc., San Jose, CA, USA). The ROI was set according to a previous deep-learning study that identified the ROI for osteoporosis in panoramic radiography, which used the middle area of the mandibular lower border 17 . To ensure reproducibility, the mental foramen at the mid-point of the mandible was used as the reference point. The ROI was set to 250 × 400 pixels just below the reference point so as to include the lower edge of the mandible. All analyses in this study were performed on the left side, as shown in Fig. 2.
The cropped images were saved in PNG format. By design, the oral and maxillofacial surgeons who cropped the image data were blinded to the osteoporotic status of each patient.
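The ROI placement described above can be sketched as a simple array crop. In the study the reference point (the mental foramen) was located manually by the surgeons, so its coordinates are supplied as arguments here; the function name and the assumption that the image loads as a height × width array are ours:

```python
import numpy as np

def crop_roi(image, ref_row, ref_col, height=250, width=400):
    """Crop a 250 x 400 pixel ROI just below a manually chosen reference point.

    image: 2D grayscale array of the panoramic radiograph (the study's
           2964 x 1464 pixel images are assumed to load as 1464 x 2964 arrays).
    (ref_row, ref_col): pixel position of the mental foramen reference point.
    The ROI extends downward from the reference point, centered horizontally
    on it, so that it includes the lower edge of the mandible.
    """
    top = ref_row
    left = ref_col - width // 2
    return image[top:top + height, left:left + width]
```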

CNN model architecture
In this study, the evaluation was performed using standard CNN models: the residual neural network (ResNet) 19 and EfficientNet 20 . ResNet, invented by He et al. 19 , won the classification task of the ILSVRC2015 Challenge. Generally, deepening the network layers improves the accuracy of image identification, but a network that is too deep reduces the accuracy. To deal with this issue, ResNet introduced a learning method called residual learning, which allows the network to be deepened to 152 layers. Representative ResNet architectures have 18, 50, and 152 layers. EfficientNet is a CNN that was proposed as a state-of-the-art image classification method on ImageNet data in 2019. Although its number of parameters is smaller than that of conventional CNN models, EfficientNet is a fast and relatively accurate CNN family; we used the EfficientNet-b0, -b3, and -b7 models. For efficient model building 29 , it is possible to fine-tune the weights of existing models as initial values for additional learning; therefore, all CNNs were trained by transfer learning, fine-tuning weights pre-trained on the ImageNet database 30 . The deep learning analysis was implemented using the PyTorch deep learning framework and the Python programming language.

Clinical covariates
Patients in the high-risk group for osteoporosis are generally female, older, and have lower body mass indices (BMIs) 31 . There are many other patient factors, but age, gender, and BMI were selected as factors that can be easily identified by dentists. BMI is given by weight in kilograms divided by the square of height in meters. Patients' weight and height were recorded at the time of BMD measurement. Table 1 shows the clinical and demographic characteristics of the patients in this study.
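The excerpt does not reproduce the exact fusion architecture of the ensemble model, so the sketch below illustrates one plausible late-fusion design: CNN image features concatenated with the three covariates (age, gender, BMI) before the classification layer. The hidden-layer size and the female = 1 gender encoding are assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

def bmi(weight_kg, height_m):
    """BMI = weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2

class EnsembleClassifier(nn.Module):
    """Late-fusion sketch: image features from a CNN backbone (its original
    classification head removed) are concatenated with the clinical
    covariates, then passed through a small classification head. Layer
    sizes are illustrative."""

    def __init__(self, backbone, feature_dim, num_covariates=3, num_classes=2):
        super().__init__()
        self.backbone = backbone  # CNN feature extractor without its head
        self.head = nn.Sequential(
            nn.Linear(feature_dim + num_covariates, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, image, covariates):
        feats = self.backbone(image)                   # (batch, feature_dim)
        fused = torch.cat([feats, covariates], dim=1)  # append age, gender, BMI
        return self.head(fused)
```

A covariate vector for one patient could then be built as, e.g., `[age, 1.0 if female else 0.0, bmi(weight_kg, height_m)]`, normalized as appropriate for training.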

Architecture of the ensemble model
Our key performance indicators, namely, the osteoporosis discrimination accuracy, precision, recall, specificity, and F1 score, are defined by equations (1), (2), (3), (4), and (5), respectively, which account for the relations between the positive labels of the data and those given by the classifier. We also calculated the ROC curve and measured the AUC.
Here, TP and TN represent the numbers of true positive and true negative results, respectively, and FP and FN represent the numbers of false positives and false negatives, respectively. M1 and M2 are the means for the ensemble and image-only models, respectively; s1 and s2 are the corresponding standard deviations; and n1 and n2 are the corresponding sample sizes.
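Equations (1)–(5) themselves are not reproduced in this excerpt, but the quantities they name are the standard confusion-matrix metrics; a sketch in terms of the TP, TN, FP, and FN counts defined above:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard definitions behind equations (1)-(5): accuracy, precision,
    recall (sensitivity), specificity, and F1 score from confusion-matrix
    counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity to the osteoporosis class
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}
```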
Statistical analyses were performed for each performance metric using the JMP Statistics Software Package Version 14.2.0 for Macintosh (SAS Institute Inc., Cary, NC, USA). P < 0.05 was considered statistically significant, and 95% confidence intervals were calculated. Parametric tests were performed based on the results of the Shapiro-Wilk test. The effect sizes were calculated as Hedges' g (unbiased Cohen's d). The effect size was interpreted according to the criteria proposed by Cohen and expanded by Sawilowsky 32 : 0.01 is a very small effect, 0.2 a small effect, 0.5 a medium effect, 0.8 a large effect, 1.0 a very large effect, and 2.0 a huge effect.
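Hedges' g (the bias-corrected Cohen's d used above) can be sketched from the means, standard deviations, and sample sizes defined earlier; the sketch uses the common small-sample correction factor 1 − 3/(4(n1 + n2) − 9), which is an assumption about the exact variant the authors applied:

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g between the ensemble model (mean m1, SD s1, n1 runs)
    and the image-only model (mean m2, SD s2, n2 runs)."""
    # Pooled standard deviation across the two groups.
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                    # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)    # small-sample bias correction
    return d * correction
```

The returned value can then be read against the Sawilowsky thresholds above (e.g., |g| ≥ 1.0 is "very large").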

Visualization of the computer-assisted diagnostic system
Gradient-weighted class activation mapping (Grad-CAM) is a technique that visualizes important pixels by weighting the feature maps with the gradient of the predicted value 33 . It highlights information that is significant for identification: regions with high gradients of the prediction with respect to the last convolutional layer. Guided Grad-CAM combines Grad-CAM with guided backpropagation, a visualization technique useful for identifying detailed feature locations. In this study, the feature-area visualization was reconstructed from the last convolutional layer of each CNN.
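Framework hooks aside, the core Grad-CAM computation can be sketched independently: the gradients of the class score with respect to the last convolutional feature maps are global-average-pooled into per-channel weights, the maps are combined with those weights, and a ReLU keeps only positively contributing regions. A minimal sketch (`grad_cam_map` is an illustrative name; extracting the activations and gradients from a real CNN is omitted):

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Compute a Grad-CAM heat map for one image.

    activations: (channels, H, W) feature maps from the last conv layer.
    gradients:   (channels, H, W) gradients of the predicted class score
                 with respect to those feature maps.
    """
    weights = gradients.mean(axis=(1, 2))             # global-average-pool gradients
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for display
    return cam
```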

Figure 1

Table 1 :
Clinical and demographic characteristics of the patients.

Table 2 :
Comparison of performance metrics in ResNet.

Table 3 :
Comparison of performance metrics in EfficientNet.