Abstract
This study aimed to assess the performance of deep learning (DL) algorithms in the diagnosis of nasal bone fractures on radiographs and compare it with that of experienced radiologists. In this retrospective study, 6713 patients whose nasal radiographs were examined for suspected nasal bone fractures between January 2009 and October 2020 were assessed. Our dataset was randomly split into training (n = 4325), validation (n = 481), and internal test (n = 1250) sets; a separate external dataset (n = 102) was used. The area under the receiver operating characteristic curve (AUC), sensitivity, and specificity of the DL algorithm and the two radiologists were compared. The AUCs of the DL algorithm for the internal and external test sets were 0.85 (95% CI, 0.83–0.86) and 0.86 (95% CI, 0.78–0.93), respectively, and those of the two radiologists for the external test set were 0.80 (95% CI, 0.73–0.87) and 0.75 (95% CI, 0.68–0.82). The DL algorithm therefore significantly exceeded radiologist 2 (P = 0.021) but did not significantly differ from radiologist 1 (P = 0.142). The sensitivity and specificity of the DL algorithm were 83.1% (95% CI, 71.2–93.2%) and 83.7% (95% CI, 69.8–93.0%), respectively. Our DL algorithm performs comparably to experienced radiologists in diagnosing nasal bone fractures on radiographs.
Similar content being viewed by others
Introduction
Nasal bone fractures are the most common fractures of the facial bones, accounting for up to 50% of all incidences1. Moreover, they represent the third most common bone fracture overall2. The anatomy of the nasal bone—a thin, protruded structure of the face—makes it especially vulnerable to fractures. Nasal bone fractures have various causes, including sports injuries, physical fights, traffic accidents, and falls3. Complications of nasal bone fractures vary in severity from mild forms, including nasal septal deviation, nasal obstruction, and olfactory disturbances, to more severe complications, such as cerebrospinal fluid rhinorrhea4. Careful physical, clinical, and radiographic examinations are primarily used to diagnose nasal bone fractures5.
Radiological modalities for diagnosing nasal bone fractures include high-resolution computed tomography (CT), plain radiography, and ultrasonography. Among these, CT shows the highest diagnostic accuracy and was found to also be superior in detecting other associated injuries6. However, plain nasal radiography still shows substantial reliability1, with a sensitivity of approximately 80%7. Considering the benefits of low radiation exposure as well as its cost-effectiveness and accessibility, plain radiography remains the initial diagnostic tool for screening simple nasal bone fractures.
Deep learning (DL) is a specialized subset of machine learning, a multilayered information processing technique. The abilities of DL systems include self-extraction of data from raw input and self-learning, and DL-based applications are emerging in various fields of medicine8. In recent studies, a specific subcategory of DL, convolutional neural networks (CNNs), have shown reliable and accurate performance in various radiological subspecialties; they are thus expected to show significant potential in improving patient outcomes in the future9. In medical imaging, DL has been applied in the diagnosis, classification, and prediction of outcomes or survival from a variety of diseases; applications include, for example, fatty liver disease risk stratification from CT images, the diagnosis of prostate cancer from MRI, and the prediction of acute respiratory disease prognosis using CT9.
As conventional radiography is used in various medical subspecialties, a number of studies have reported on DL algorithms that detect abnormalities on radiographs and verified their diagnostic performance compared to that of radiologists. For instance, DL algorithms showed outstanding results in chest radiograph interpretation10, the detection of hip fractures11, and the diagnosis of moyamoya disease12, paranasal sinusitis13,14, proximal humerus fractures15, knee osteoarthritis16, pediatric skull fractures17, and developmental dysplasia of the hip18. Furthermore, a recent study explored the potential of DL in diagnosing nasal fractures in CT scans19. However, the DL-based detection of nasal bone fractures using conventional radiography has not been investigated. Therefore, here, we aimed to design a CNN-based algorithm to detect nasal bone fractures on plain radiographs and compare its diagnostic performance with that of radiologists.
Results
Patients
The baseline characteristics of the study cohort are summarized in Table 1. The training, validation, and internal test cohorts consisted of 4325, 481, and 1250 patients, respectively (mean age, 44–46 years; men, 58.5–68.2%; fractures, 45.2–46.2%). The external test cohort included 102 patients (mean age, 46 years; men, 52%; fractures, 57.8%). There were no significant differences among the cohorts in mean age (P = 0.101) or proportion of fractures (P = 0.06). There were significantly more men in the training and validation cohorts (67.6–68.2%) than in the internal and external test sets (52–58.5%) (P < 0.001).
Diagnostic performance of the deep learning model and radiologists
On the internal test set, the DL model demonstrated excellent diagnostic performance with an AUC of 0.931 (95% CI, 0.915–0.944), sensitivity of 82.2% (95% CI, 78.3–86.5%), specificity of 89.6% (95% CI, 85.2–92.4%), and accuracy of 85.9% (95% CI, 84–87.8%) (Fig. 1a).
The diagnostic performance measures of the DL model and the two radiologists on the external test set are listed in Table 2. The DL model showed an AUC of 0.857 (95% CI, 0.782–0.931), sensitivity of 83.1% (95% CI, 71.2–93.2%), specificity of 83.7% (95% CI, 69.8–93%), and accuracy of 83.3% (95% CI, 75.5–90.2%). The AUCs of radiologists 1 (blinded, with 9 years of experience) and 2 (blinded, with 6 years of experience) were 0.799 (95% CI, 0.729–0.868) and 0.749 (95% CI, 0.676–0.819), respectively. The AUCs of the two radiologists did not significantly differ (P = 0.172). The inter-rater agreement between the two radiologists was substantial (Cohen κ coefficient, 0.67), whereas it was fair to moderate between the radiologists and the DL model (Cohen κ coefficient, 0.28–0.41).
Comparison of diagnostic performance measures
The AUC of the DL model was significantly higher than that of radiologist 2 (0.857 vs. 0.749; P = 0.021) but not significantly different from that of radiologist 1 (0.857 vs. 0.799; P = 0.142). The ROC curves of the DL model and those of the two radiologists are depicted in Fig. 1b. Example radiographs of correctly diagnosed nasal bone fractures with Grad-CAM heatmaps are shown in Fig. 2. The nasal bone area was properly highlighted in the heatmap in both views for most patients (n = 90). For nine of the remaining 12 subjects, the nasal bone was highlighted in only one view while the nasal bone was not highlighted in both views in three patients. Additionally, examples of normal nasal radiographs incorrectly diagnosed as fractures by the DL model are shown in Fig. 3.
Discussion
The current study trained and validated a CNN-based DL model for the diagnosis of nasal bone fractures using plain bilateral radiographs. The DL model demonstrated excellent diagnostic performance on both the internal (AUC: 0.931) and external (AUC: 0.857) test sets. Furthermore, the DL model showed a diagnostic performance comparable to that of experienced radiologists (AUC: 0.749–0.799).
With regard to more recent studies comparing the diagnostic performance of DL and radiologists on conventional radiographs, DL models have been shown to outperform radiologists in diagnosing maxillary13 and paranasal sinusitis14 and have demonstrated comparable diagnostic performance for pediatric supracondylar fractures20 and developmental dysplasia of the hip18 as well as general chest radiograph interpretation10. However, it is important to note that the DL model’s comparable performance to radiologists does not lessen the need for critical appraisal by human practitioners, including a comprehensive review of a patient’s clinical information; rather, it should be seen as a complimentary diagnostic aid.
A previous study found that only 82% of nasal bone fractures can be identified on plain radiographs1. In this study, the DL model had a sensitivity of 83.1%, indicating an almost perfect diagnostic performance for this imaging modality. While CT images with thin slice thickness reconstruction are the imaging modality of choice for diagnosing nasal bone fractures21, conventional radiographs have advantages such as lower radiation exposure, fast image acquisition, and cost-effectiveness. Although conventional radiography is not the most accurate imaging modality for diagnosing nasal bone fractures, DL-assisted radiography would help expedite the diagnosis of nasal bone fractures and address resource scarcity in clinical practice. However, the definite diagnosis of nasal bone fractures depends on both CT and conventional radiographs owing to the inherently limited diagnostic capability of conventional radiographs.
The main strength of the current study is that our DL model was trained on well-balanced real-world clinical datasets, and that the proportion of fracture cases was almost equivalent to the normal incidence (40–50%) across different cohorts. This is particularly important, considering that a DL model trained on an imbalanced training set is more vulnerable to biases and more likely makes decisions in favor of the majority class22. Furthermore, the diagnostic performance of our DL model was compared to that of radiologists using a geographically separate dataset, which increases the generalizability of our results. Additionally, we applied Grad-CAM to overlay heatmaps onto radiographs to improve the transparency and interpretability of our DL model. In fact, the intensities of the heatmaps mostly focused on the nasal areas, suggesting that the model correctly recognized the nasal area when identifying a fracture, even in the presence of artifacts such as skin folds or metallic wires of facial masks.
There are, however, several limitations of the current study that need to be addressed. First, the sample size of the external test set was relatively small, which may somewhat reduce the generalizability of our DL model to real-world practice settings. Second, the number of reviewers was limited to two experienced radiologists. The potential benefit of the DL model could have been further validated if additional reviewers from different backgrounds, such as emergency physicians or plastic surgeons, had been included. Third, there was a significant difference in the proportion of sexes between training/validation and test sets. However, no significant anatomical difference in nasal bones has been found between sexes23, and thus this difference would have not affected the outcome of this study. Finally, the study cohorts comprised an Asian population whose nasal bone anatomy may differ from that of other ethnicities. Thus, the developed DL model may not yield similar results for patients of different ethnicities.
In conclusion, our CNN-based DL model demonstrated excellent performance comparable to that of experienced radiologists in diagnosing nasal bone fractures on conventional radiographs. This promising finding could translate into DL applications that can be used as diagnostic assistance tools for patients with facial trauma.
Methods
The institutional review boards of Seoul St. Mary’s Hospital (KC22RISI0472) and Yeouido St. Mary’s Hospital (SC22RISI0102) approved this retrospective study and waived the requirement for informed consent considering the retrospective nature of the analyses involving only anonymized data. All methods were performed in accordance with the ethical standards of our institutional research committee and with the 1964 Helsinki declaration and its later amendments.
Dataset
A total of 9596 nasal radiographs from 6713 adult patients were examined for suspected nasal bone fractures between January 2009 and October 2020 at our institution. All nasal bone radiographs were exported from the institutional picture archiving and communication system in anonymized DICOM format. The data were exported and initially reviewed by a (blinded) 3rd-year radiology resident in training. The criteria for determining nasal bone fractures were based on surgical findings when available or clinical/radiological consensus obtained from electronic medical records. Nasal radiographs were examined bilaterally, and records of patients with only unilateral (left or right) nasal radiographs were excluded (n = 657). The remaining patients were then randomly divided into the training, validation, and internal test cohorts. The external test cohort consisted of patients who presented at a geographically separated tertiary hospital between July 2019 and December 2020 (Fig. 4).
Image assessment by experienced radiologists
Two fellowship-trained radiologists (both blinded, with 6 and 9 years of experience in head and neck diagnostic radiology, respectively) independently reviewed and classified the nasal radiographs as either fractures or normal bone. Both radiologists routinely interpret nasal bone radiographs. A fracture was diagnosed if any radiologic features of nasal bone fractures, including fracture lines, displacement, depression, deformity, and angulation, were present.
Image preprocessing
Before the DL model training, all images were preprocessed in the following way. First, images were resized by adjusting the image ratio, with the length of the short axis set to 512 pixels. Intensity normalization was then performed to obtain a pixel value between 0 and 1.
Training the deep learning algorithm
WE trained a DL model to classify the nasal radiographs. Two lateral views from each patient were used as the model input, and the presence of a nasal fracture was determined via binary classification. First, to simultaneously utilize the features of both views, the imaging features were independently extracted from each view using the backbone of EfficientNet-B7 model24. The parameters of the backbone were initialized by loading an ImageNet pretrained model. The input image size of each CNN model was 448 × 448 pixels, and 2560 imaging features were extracted. Then, binary classification was performed with a multilayer perceptron model using the concatenated 5120 features extracted from the two CNN model paths as inputs. The multilayer perceptron model consists of three hidden layers to reduce the number of hidden units by half for each hidden layer. To reduce overfitting of the training data, various random transformations, including random flipping, rotation, affine transforms, intensity inverting, addition of random noises, and random cropping were applied during the model training. Cross-entropy was used as a cost function, and the model parameters were updated using the AdamW algorithm25 with a learning rate of 0.0001 and a weight decay coefficient of 0.001. A cut-off value of 0.5 was used in the cross-entropy loss function such that the parameters were updated based on whether the prediction reached the cut-off value. For each epoch, the AUC for the validation set was calculated and the parameters in the epoch with the best AUC result (the 140th epoch out of the total of 256 epochs in our training process) were selected. After the training process was completed, in the inference of the test set, the probability of an image containing a nasal fracture was determined by averaging the two results, with the order of the two views changed. To identify the key regions that contributed to the model’s decision, Gradient-weighted Class Activation Mapping (Grad-CAM) was applied at the last convolutional layer of the CNN model for each view. We reviewed the Grad-CAM results of the external dataset to observe whether Grad-CAM properly emphasized the nasal bone area. The overall DL model architecture used in this study is summarized in Fig. 5, and more details and the codes of the model were uploaded to the Github repository (https://github.com/hufsaim/nasalbone).
Statistical analysis
Continuous and categorical variables were compared among the training, validation, and internal/external test cohorts using one-way ANOVA and Pearson’s Chi-squared test, respectively. Cohen’s κ was calculated to assess the inter-rater agreement between the DL model and the two radiologists. The level of agreement was determined as none to slight if κ was 0.01–0.20; fair at κ = 0.21–0.40; moderate at 0.41–0.60; substantial at 0.61–0.80; and almost perfect at 0.81–1.0026. To evaluate the diagnostic performance of the DL model, the AUC was calculated for both the internal and the external test set. The optimal threshold for the ROC curve was determined using the Youden index. The AUCs of the DL model and the radiologists were compared using the DeLong method27. Confidence intervals for sensitivities, specificities, and accuracies were derived from 2000 bootstrap replicates using the “pROC” R package. All statistical analyses were performed using R statistical software (v. 4.1.2, Vienna, Austria) and Stata (v. 16.1, College Station, TX, USA). A two-sided P-value of less than 0.05 was considered statistically significant.
Data availability
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.
Abbreviations
- CNN:
-
Convolutional neural network
- DL:
-
Deep learning
- Grad-CAM:
-
Gradient-weighted Class Activation Mapping
References
Hwang, K., You, S. H., Kim, S. G. & Lee, S. I. Analysis of nasal bone fractures; A six-year study of 503 patients. J. Craniofac. Surg. 17, 261–264 (2006).
Murray, J. A., Maran, A. G., Mackenzie, I. J. & Raab, G. Open v closed reduction of the fractured nose. Arch. Otolaryngol. 110, 797–802 (1984).
Hwang, K., Ki, S. J. & Ko, S. H. Etiology of nasal bone fractures. J. Craniofac. Surg. 28, 785–788 (2017).
Hwang, K., Yeom, S. H. & Hwang, S. H. Complications of nasal bone fractures. J. Craniofac. Surg. 28, 803–805 (2017).
Hoffmann, J. F. An algorithm for the initial management of nasal trauma. Facial Plast. Surg. 31, 183–193 (2015).
Hwang, K., Jung, J. S. & Kim, H. Diagnostic performance of plain film, ultrasonography, and computed tomography in nasal bone fractures: A systematic review. Plast. Surg. (Oakv) 26, 286–292 (2018).
Baek, H. J., Kim, D. W., Ryu, J. H. & Lee, Y. J. Identification of nasal bone fractures on conventional radiography and facial CT: Comparison of the diagnostic accuracy in different imaging modalities and analysis of interobserver reliability. Iran. J. Radiol. 10, 140 (2013).
Ueda, D., Shimazaki, A. & Miki, Y. Technical and clinical overview of deep learning in radiology. Jpn. J. Radiol. 37, 15–33 (2019).
Saba, L. et al. The present and future of deep learning in radiology. Eur. J. Radiol. 114, 14–24 (2019).
Wu, J. T. et al. Comparison of chest radiograph interpretations by artificial intelligence algorithm vs radiology residents. JAMA Netw. Open 3, e2022779 (2020).
Cheng, C. T. et al. Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. Eur. Radiol. 29, 5469–5477 (2019).
Kim, T. et al. Machine learning for detecting moyamoya disease in plain skull radiography using a convolutional neural network. EBioMedicine 40, 636–642 (2019).
Kim, Y. et al. Deep learning in diagnosis of maxillary sinusitis using conventional radiography. Investig. Radiol. 54, 7–15 (2019).
Jeon, Y. et al. Deep learning for diagnosis of paranasal sinusitis using multi-view radiographs. Diagnostics (Basel) 11, 250 (2021).
Chung, S. W. et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 89, 468–473 (2018).
Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P. & Saarakkala, S. Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Sci. Rep. 8, 1727 (2018).
Choi, J. W. et al. Deep learning-assisted diagnosis of pediatric skull fractures on plain radiographs. Korean J. Radiol. 23, 343–354 (2022).
Park, H. S. et al. Diagnostic performance of a new convolutional neural network algorithm for detecting developmental dysplasia of the hip on anteroposterior radiographs. Korean J. Radiol. 22, 612–623 (2021).
Seol, Y. J., Kim, Y. J., Kim, Y. S., Cheon, Y. W. & Kim, K. G. A study on 3D deep learning-based automatic diagnosis of nasal fractures. Sensors 22, 506 (2022).
Choi, J. W. et al. Using a dual-input convolutional neural network for automated detection of pediatric supracondylar fracture on conventional radiography. Investig. Radiol. 55, 101–110 (2020).
Ziccardi, V. B. & Braidy, H. Management of nasal fractures. Oral Maxillofac. Surg. Clin. N. Am 21, 203–208, vi (2009).
Candemir, S., Nguyen, X. V., Folio, L. R. & Prevedello, L. M. Training strategies for radiology deep learning models in data-limited scenarios. Radiol. Artif. Intell. 3, e210014 (2021).
Hwang, T.-S., Song, J., Yoon, H., Cho, B.-P. & Kang, H.-S. Morphometry of the nasal bones and piriform apertures in Koreans. Ann. Anat. Anat. Anz. 187, 411–414 (2005).
Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In EfficientNet: Rethinking model scaling for convolutional neural networks 6105–6114 (PMLR, 2019).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).
McHugh, M. L. Interrater reliability: The kappa statistic. Biochem. Med. (Zagreb) 22, 276–282 (2012).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44, 837–845 (1988).
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A1A01040285, 2020R1F1A1070517).
Author information
Authors and Affiliations
Contributions
Y.N.: Data curation; Formal analysis; Investigation; Methodology; Software; Validation; Visualization. Y.C.: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Resources; Software; Supervision; Validation; Visualization; Roles/Writing—original draft; Writing—review and editing. J.K.: Formal analysis. S.J.H.: Data curation; Resources. M.K.S.: Data curation; Resources. M.K.L.: Data curation; Resources.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nam, Y., Choi, Y., Kang, J. et al. Diagnosis of nasal bone fractures on plain radiographs via convolutional neural networks. Sci Rep 12, 21510 (2022). https://doi.org/10.1038/s41598-022-26161-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-022-26161-7
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.