Introduction

Hip pain is a common reason for presentation to an emergency room. A traumatic event such as a fall associated with hip pain or a deformity raises a high suspicion of hip fracture, which is often easily diagnosed with radiography. However, some fractures are not obvious: occult, valgus impacted, or nondisplaced fractures can be missed in initial radiographic assessments and often require additional imaging modalities such as CT or MRI. In the United States, hip fractures exceed 250,000 annually, with an incidence of 80 per 100,000 population1,2,3. Hip fracture incidence rates are known to increase exponentially with age in both women and men4, secondary to osteoporosis, the most common metabolic disease worldwide. Osteoporosis predominantly affects the elderly population5 and is characterized by decreased bone mineral density, loss of trabecular architecture6, bone microstructural deterioration, and increased fracture risk7. A total of 8.9 million fractures are caused by osteoporosis worldwide annually, amounting to an osteoporotic fracture every 3 s8. Although not the most common type of fragility fracture, osteoporotic hip fractures are considered the most serious, with a mortality rate reaching 20–40% during the first year after fracture9,10. With rising global life expectancy, the incidence and prevalence of osteoporosis are also expected to increase. Accordingly, the number of men and women above the fracture risk threshold is expected to almost double by 2040, with a prediction of 319 million cases11,12. In fact, by 2050, the worldwide incidence of hip fracture is projected to increase by 240% in women and 310% in men8,13, with approximately 1 in 2 women and 1 in 5 men over the age of 50 projected to suffer a fracture in their remaining lifetime14.

Hip fractures are predominantly detected via abnormalities observed on plain radiography, patient history, and physical examination findings. Nevertheless, radiographic appearance is not always sufficient for final diagnosis due to highly variable patient parameters such as BMI, positioning, and image quality15. Up to 10% of at-risk patients are examined via further imaging, including computed tomography (CT) or magnetic resonance imaging (MRI), to limit misdiagnosis. However, less than a third of these further studied cases subsequently demonstrate hip fractures2,16. Advanced imaging studies also face challenges such as high cost and limited availability at remote and non-urban healthcare facilities. At the same time, delayed diagnosis and unrecognized fractures increase the risk of mortality2 and the time and cost of hospitalization17.

Employing an accurate automated detection model for hip fractures on radiographs can aid experts in saving time and resources. As a result, automated tools using machine learning and deep learning models have been increasingly studied in the literature18,19. Many studies have employed deep learning models trained over thousands of annotated radiographs and demonstrated high accuracy for potential clinical deployment20,21,22,23,24,25,26,27,28,29,30,31,32. Nevertheless, these approaches lacked explicit localization of identified hip fractures. Providing the location of identified fractures allows the clinician to visualize and overread automated detection results to confirm the result or decide on further evaluation. Thus, several studies have focused on detecting and localizing hip fractures from radiographs via deep learning, albeit requiring multiple cascaded models33,34,35,36,37. Developing and evaluating such cascaded approaches is less computationally efficient than end-to-end one-stage detection and localization38,39,40 and potentially requires manual data cleaning between cascaded models to address error propagation36.

Recent works have proposed deep learning models for end-to-end detection and localization of hip fractures from radiographs41,42, particularly focusing on feature pyramid networks (FPNs)43,44,45,46,47. FPNs are convolutional neural networks that combine features extracted at different scales and resolutions toward object detection predictions. They are tailored for medical imaging applications, in which variability in resolution and anatomical structure sizes are long-lasting challenges48. Despite their success, FPNs have typically been evaluated with generic object detection metrics such as average precision45,46,47, limiting validation with clinically relevant metrics and confidence intervals. In recent years, transformer models have also become integral to deep learning approaches in medical image analysis, including detection and classification49. For hip fracture analysis from plain radiography, transformer models have only been employed within multi-stage cascaded approaches36, leaving room for end-to-end detection and localization as in FPNs.

Motivated by these observations, this study aims to assess state-of-the-art deep learning models for object detection, including FPNs and transformers, on end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics. We employed and extended the VarifocalNet FPN50, well established for object detection in various domains. Using a retrospective dataset of 823 hip radiographs from 150 subjects with proximal femur fractures and 355 negative controls, VarifocalNet attained 0.94 specificity and 0.95 sensitivity, with up to 14% sensitivity and 9% accuracy improvement over five benchmark FPN models. Crucially, we took the first steps in evaluating a transformer model for our task, employing the state-of-the-art DINO network51. We established that for commonly observed small-sample settings such as ours, FPNs remain state-of-the-art: VarifocalNet outperformed DINO by 17% sensitivity and 5% accuracy while taking half the time on average to process a radiograph.

Methods

Study design

The Institutional Review Board at the Beth Israel Deaconess Medical Center (BIDMC) at Harvard Medical School approved this retrospective study in compliance with the Health Insurance Portability and Accountability Act. All data was collected at the BIDMC Division of Musculoskeletal Imaging and Intervention. Informed consent was obtained from all individual participants included in the study. All methods were performed in accordance with relevant guidelines and regulations following the Declaration of Helsinki.

We collected retrospective frontal view plain radiographs of the hip from subjects who sustained a proximal femur fracture after 2004 using the PACS system. The proximal femur articulates with the acetabulum of the pelvis to form the hip joint. Hip radiographs from age- and gender-matched subjects without fractures were used as controls. Fractures outside the proximal femur, such as acetabular fractures, were also considered negative for proximal femur fracture analysis. Exclusion criteria were pathological fractures from pre-existing diseases other than osteoporosis (history of bone cancer, infection, or cysts) and lateral view radiographs. Identified scans were exported, de-identified, and assigned a unique identifier before analysis. The resulting dataset included 440 hip radiographs from 122 subjects with proximal femur fractures and 194 hip radiographs from 194 controls without proximal femur fractures. To balance the number of fracture and control scans, we augmented this dataset with publicly available hip radiographs collected from 28 fracture subjects and 161 controls52. The final dataset comprised 468 hip radiographs from 150 subjects with proximal femur fractures and 355 hip radiographs from 355 negative controls without proximal femur fractures.

Table 1 shows the subject-by-subject distribution of gender, BMI, and race categories for fracture and control subjects. Supplementary Table S.1 shows the scan-by-scan distribution of age and gender categories for fracture and control subjects, as scans from the same fracture subject may be collected at different ages. Agreeing with published reports on fracture incidence rates14,53, most fracture subjects were above the age of 50 for both genders, with more female subjects than male. The dataset was also diverse over the BMI categories, particularly for fracture subjects. While the race distribution was more imbalanced, fracture incidence has been reported to be higher in white populations than in African American populations, consistent with our dataset53. Supplementary Table S.2 shows the scan-by-scan distribution of imaging devices, exhibiting diverse representation over four different scanner device manufacturers. Included fracture cases also exhibited diversity over anatomical locations, including the greater trochanter (54%), intertrochanter (24%), femoral neck (20%), and femoral head (2%), as well as over degree of displacement, including non-displaced and mildly displaced cases. Agreeing with the literature, the most common fracture location was the greater trochanter54, and the rarest was the femoral head55.

Table 1 Subject-level demographics distribution.

Data annotation and partitioning

To perform fracture localization, a radiologist with clinical experience in musculoskeletal radiography manually annotated each confirmed fracture radiograph by drawing a bounding box that fully contained each visible fracture region using the PhotoPad Image Editor (NCH Software)56.

We partitioned our dataset into stratified training and test sets, keeping a uniform ratio of positive (with proximal femur fracture) and negative (without proximal femur fracture) subjects in each set. 10% of the subjects were held out for testing, and 90% were used for training. Data partitioning was based on subjects rather than scans, ensuring subjects included in training were not included in testing.
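
As an illustration, a subject-level stratified split can be implemented as in the following sketch; the scan manifest, column names, and split seed are hypothetical, and the test fraction is enlarged here only so the toy example has enough subjects per class (the study held out 10% of subjects).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical scan manifest: one row per radiograph; several scans may share
# a subject_id. fracture = 1 for proximal femur fracture, 0 for control.
scans = pd.DataFrame({
    "subject_id": ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s6"],
    "fracture":   [1,    1,    0,    1,    1,    0,    0,    1],
})

# Stratify over subjects (not scans), using one label per subject.
subjects = scans.groupby("subject_id")["fracture"].max()
train_ids, test_ids = train_test_split(
    subjects.index,
    test_size=0.25,            # 0.10 in the study; larger here for the toy data
    stratify=subjects.values,  # uniform fracture/control ratio in each split
    random_state=0,
)

train_set = scans[scans["subject_id"].isin(train_ids)]
test_set = scans[scans["subject_id"].isin(test_ids)]
# No subject appears in both splits.
assert set(train_set["subject_id"]).isdisjoint(test_set["subject_id"])
```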

Automated proximal femur fracture and localization via VarifocalNet

VarifocalNet architecture

We employed and extended the VarifocalNet feature pyramid network (FPN)50, motivated by the recent influx of FPN models for end-to-end detection and localization of hip fractures from radiographs43,44,45,46,47. VarifocalNet was selected for its state-of-the-art prediction performance in detecting common objects, outperforming twenty-five object detection baselines50. In our application, VarifocalNet received a plain radiograph of the hip and made two types of predictions: (i) rectangular bounding boxes circumscribing candidate fracture regions and (ii) a confidence score in the range 0–1 associated with each detected box. The confidence score represented the likelihood of fracture existence and was thresholded in post-processing to detect a fracture. We explain the details of the VarifocalNet architecture and our approach below.
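
For readers who wish to reproduce this setup, a hypothetical inference sketch with an MMDetection-style VarifocalNet is shown below; the config and checkpoint paths are placeholders rather than the exact files used in this study, and the result format follows MMDetection 2.x.

```python
from mmdet.apis import init_detector, inference_detector

# Placeholder paths: a VarifocalNet config with a ResNeXt-101 backbone and a
# checkpoint fine-tuned on hip radiographs (both hypothetical names).
model = init_detector("vfnet_x101_fpn_hip.py", "vfnet_hip.pth", device="cuda:0")
result = inference_detector(model, "hip_radiograph.png")

# In MMDetection 2.x, `result` is a per-class list of [N, 5] arrays holding
# (x1, y1, x2, y2, confidence). With a single "fracture" class:
boxes = result[0]
if len(boxes) > 0:
    best = boxes[boxes[:, 4].argmax()]  # keep the top-confidence detection
    print("candidate fracture box:", best[:4], "confidence:", best[4])
```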

An FPN receives a 2D image of any size and begins with extracting a hierarchy of features at multiple scales via a base neural network57. The base network comprises a sequence of stages, each containing convolutional and residual layers. The activation output of each stage's last residual layer is part of the feature pyramid. Base network features are complemented by upsampling to extract higher-resolution features, merged with lower-resolution base network features of the same size to form a multi-scale feature pyramid. Feature extraction and merging at different resolutions tailors FPNs for medical imaging applications in which variability in resolution and anatomical structure sizes are long-lasting challenges48. We used the ResNeXt-101 architecture58 for the base network and five feature pyramid levels, following the recent literature on FPNs47,50. Features extracted by the feature pyramid were used for (i) detecting objects of interest via bounding boxes circumscribing object locations on the input image, and (ii) predicting a confidence score for each detected box.
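
The top-down merging scheme can be sketched as follows; this is an illustrative miniature FPN over three backbone stages, not the exact VarifocalNet implementation, and the channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Illustrative FPN top-down pathway over three backbone stage outputs."""
    def __init__(self, in_channels=(512, 1024, 2048), width=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, width, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(width, width, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, stages):  # stages ordered high -> low resolution
        laterals = [conv(s) for conv, s in zip(self.lateral, stages)]
        # Upsample coarser levels and merge them into finer lateral features.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [conv(l) for conv, l in zip(self.smooth, laterals)]

# e.g., ResNeXt-style stage outputs at strides 8/16/32 for a 256x256 input
feats = [torch.randn(1, c, 256 // s, 256 // s)
         for c, s in [(512, 8), (1024, 16), (2048, 32)]]
pyramid = TinyFPN()(feats)  # three merged feature maps, each 256 channels wide
```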

To predict bounding boxes, VarifocalNet maps each pixel location at each feature pyramid level back to the original input scale by multiplying and shifting the pixel coordinates by the total stride before the current pyramid level50. For each pixel location, four scalars representing the distances to the object bounding box's left, top, right, and bottom sides were predicted. Neighboring locations around the current pixel location were selected and mapped back to the feature pyramid level to incorporate nearby contextual information. Formally, for a pixel with coordinates x and y along the width and height of the image, respectively, a bounding box was first predicted via a convolutional block. The distances from (x, y) to the left, top, right, and bottom sides of the bounding box were denoted by l, t, r, and b, respectively. To incorporate nearby contextual information, nine neighboring pixels with coordinates (x, y), (x − l, y), (x, y − t), (x + r, y), (x, y + b), (x − l, y − t), (x + r, y − t), (x − l, y + b), and (x + r, y + b) were selected and mapped back to the feature pyramid level. Bounding box predictions were then refined by learning and incorporating residual improvement factors. In particular, four distance scaling factors (Δl, Δt, Δr, Δb) were predicted via deformable convolution59 based on the features of the neighboring pixels, where the relative offsets of the neighboring pixels to (x, y) served as the offsets of the deformable convolution. The refined bounding box was then represented by (l′, t′, r′, b′) = (Δl × l, Δt × t, Δr × r, Δb × b). Confidence score prediction followed the same steps as bounding box prediction except for the last layer, where the output was a scalar score p for each location (x, y) rather than four distance factors.
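
The star-shaped sampling and residual refinement above reduce to simple coordinate arithmetic, sketched below for a single location; the deformable-convolution machinery that predicts the scaling factors is omitted, and the numbers in the usage example are arbitrary.

```python
def star_points(x, y, l, t, r, b):
    """The nine sampling points around an initial box prediction at (x, y)."""
    return [
        (x, y),
        (x - l, y), (x, y - t), (x + r, y), (x, y + b),                  # side midpoints
        (x - l, y - t), (x + r, y - t), (x - l, y + b), (x + r, y + b),  # corners
    ]

def refine_box(l, t, r, b, dl, dt, dr, db):
    """Scale the initial distances by the learned refinement factors."""
    return (dl * l, dt * t, dr * r, db * b)

points = star_points(100, 120, l=30, t=40, r=35, b=45)            # arbitrary example
l2, t2, r2, b2 = refine_box(30, 40, 35, 45, 1.1, 0.9, 1.05, 0.95)
```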

Figure 1 summarizes the overall VarifocalNet architecture. We also included the detailed architecture breakdown for the base neural network, feature pyramid, bounding box, and confidence score prediction stages in Supplementary Tables S.3–S.5.

Figure 1

VarifocalNet architecture. The ResNeXt-101 base neural network comprises a sequence of stages, each containing convolutional and residual layers. These stages extract hierarchical multi-resolution features, depicted by rectangles with horizontal lines. ResNeXt-101 features are complemented by upsampling to extract higher-resolution features, depicted by rectangles with vertical lines. Higher and lower resolution features are merged and used for predicting fracture bounding boxes (bbox) and associated confidence scores in the range 0–1. The detailed architecture breakdown for the base neural network, feature pyramid, bounding box, and confidence score prediction stages is given in Supplementary Tables S.3–S.5.

Data preprocessing

We enhanced each radiograph via contrast-limited adaptive histogram equalization (CLAHE), a typical technique in radiography-based fracture detection to reduce noise and improve image quality60,61,62. Each image was then normalized to the range 0–1 via min–max normalization, following the standard in the deep learning literature for medical imaging to aid training stability63,64.
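
A minimal sketch of this pipeline with OpenCV is shown below, assuming 8-bit grayscale inputs; the clip limit and tile size are illustrative choices rather than the study's settings.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """CLAHE enhancement followed by min-max normalization to the range 0-1."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed settings
    enhanced = clahe.apply(image).astype(np.float32)
    lo, hi = enhanced.min(), enhanced.max()
    return (enhanced - lo) / (hi - lo + 1e-8)

# Synthetic stand-in for an 8-bit grayscale radiograph.
radiograph = (np.random.rand(512, 512) * 255).astype(np.uint8)
normalized = preprocess(radiograph)  # float32 image in [0, 1]
```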

Training

Transfer learning was employed to accelerate training by initializing VarifocalNet parameters with weights pre-trained on the benchmark object detection dataset COCO65. Following initialization, VarifocalNet was trained over pairs of training scans and corresponding ground-truth fracture bounding boxes for 75 epochs via stochastic gradient descent with a momentum factor of 0.9 and batch size of 166. The learning rate was initialized at 5 × 10–3 and divided by ten after every 25 epochs to aid training convergence67. To aid performance generalization, training scans were augmented by horizontal flipping and resizing, where image height was fixed at 1333 and width was varied between 512 and 800 in increments of 32. Moreover, the initialized parameters of the first base network stage were not fine-tuned, while all trained parameters were regularized via weight decay with regularization level 10–468. Initialization, optimization, data augmentation, and regularization techniques followed the standard in the deep learning literature for object detection50,51,57.
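
This optimization schedule translates to the following PyTorch sketch; the convolutional stand-in model, dummy data loader, and squared-error loss are placeholders for the VarifocalNet detector, the annotated training scans, and the detection objective described next.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for the VarifocalNet detector
train_loader = [(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))]  # dummy data

optimizer = torch.optim.SGD(model.parameters(),
                            lr=5e-3, momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by ten after every 25 epochs, for 75 epochs total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)

for epoch in range(75):
    for images, targets in train_loader:                 # batch size 1
        loss = ((model(images) - targets) ** 2).mean()   # placeholder objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```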

The training objective comprised several components for optimizing bounding boxes and confidence scores. Fracture confidence scores were optimized by minimizing a weighted binary cross entropy loss to combat the imbalance between pixels pertaining to background vs. fractures50:

$$-\frac{1}{\left|F\right|}\sum_{i \in F} q_{i} \left( q_{i} \log\left(p_{i}\right) + \left(1 - q_{i}\right)\log\left(1 - p_{i}\right) \right) - \frac{1}{\left|F\right|}\sum_{i \in B} 0.75\, p_{i}^{2} \log\left(1 - p_{i}\right),$$
(1)

where index i denotes a pixel location, F comprises the indices of foreground pixel locations coinciding with ground-truth fracture boxes, B comprises the indices of background pixel locations, q denotes the target confidence score, p denotes the predicted confidence score and \(\left| F \right|\) denotes the number of foreground pixel locations. To capture the coupling between bounding box and confidence score predictions, the target score q took on the value of Intersection over Union (IOU)69 between ground-truth and predicted bounding boxes for foreground pixel locations and the value 0 otherwise. In doing so, detections outside ground-truth fracture boxes were assigned lower weights, while high-confidence detections overlapping with ground-truth boxes were assigned higher weights.
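
Equation (1) can be implemented directly, as in the following sketch; the tensors p and q hold the predicted and IoU-based target scores per location, foreground marks locations inside ground-truth boxes, and the toy inputs are arbitrary.

```python
import torch

def varifocal_loss(p, q, foreground, alpha=0.75, eps=1e-8):
    """Eq. (1): target-weighted BCE on foreground, down-weighted BCE on background."""
    n_fg = foreground.sum().clamp(min=1)
    fg, bg = foreground, ~foreground
    fg_loss = q[fg] * (q[fg] * torch.log(p[fg] + eps)
                       + (1 - q[fg]) * torch.log(1 - p[fg] + eps))
    bg_loss = alpha * p[bg] ** 2 * torch.log(1 - p[bg] + eps)
    return -(fg_loss.sum() + bg_loss.sum()) / n_fg

p = torch.rand(1000).clamp(1e-4, 1 - 1e-4)  # predicted confidence scores
q = torch.rand(1000)                        # IoU-based target scores
fg = torch.rand(1000) < 0.05                # ~5% foreground locations
q[~fg] = 0.0                                # background targets are zero
print(varifocal_loss(p, q, fg))
```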

Fracture bounding box predictions were optimized by minimizing a generalized IOU (GIOU) objective70, governed by the negative of the proximity between a ground-truth fracture box and the corresponding detected box:

$$\sum_{i \in F} \left( -\frac{1.5}{\left|F\right|}\, q_{i}\, \mathrm{GIOU}\left(\left[l_{i}, t_{i}, r_{i}, b_{i}\right], \left[l_{i}^{*}, t_{i}^{*}, r_{i}^{*}, b_{i}^{*}\right]\right) - \frac{2}{\left|F\right|}\, q_{i}\, \mathrm{GIOU}\left(\left[l_{i}^{\prime}, t_{i}^{\prime}, r_{i}^{\prime}, b_{i}^{\prime}\right], \left[l_{i}^{*}, t_{i}^{*}, r_{i}^{*}, b_{i}^{*}\right]\right) \right),$$
(2)

where * denotes the distance factors for a ground-truth fracture box. VarifocalNet was trained by minimizing the sum of (1) and (2), where the weighting coefficients 0.75, 1.5 and 2 followed Zhang et al.50.
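
For reference, the GIOU of two axis-aligned boxes given as (x1, y1, x2, y2) can be computed as below; C denotes the smallest box enclosing both inputs, and the example coordinates are arbitrary.

```python
def iou_and_giou(a, b):
    """IOU and GIOU of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C; GIOU penalizes the empty area inside C.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou, iou - (c_area - union) / c_area

print(iou_and_giou((10, 10, 50, 60), (20, 15, 55, 70)))
```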

Inference and evaluation metrics

We applied the trained fracture detection model on each scan in the test set to record bounding box detections and their confidence scores. We represented each scan with the detection corresponding to the maximum confidence score in that scan, to be thresholded for fracture detection. We determined the fracture detection threshold as the score that maximized the geometric mean of sensitivity and specificity71. In a clinical care environment aided by this binary prediction, experts are expected to review positive-flagged scans and decide on fracture presence. Thus, by focusing on one high-confidence detection per scan, our model prioritized not missing positive scans.
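
This threshold selection reduces to a sweep over the ROC curve, as in the sketch below; the labels and scores are toy values standing in for per-scan maximum confidence scores.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_true, scores):
    """Score threshold maximizing the geometric mean of sensitivity and specificity."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    gmean = np.sqrt(tpr * (1 - fpr))  # sqrt(sensitivity * specificity)
    return thresholds[np.argmax(gmean)]

y_true = np.array([0, 0, 1, 1, 1, 0])               # per-scan ground truth
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])  # max box score per scan
print(pick_threshold(y_true, scores))
```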

Fracture detection performance was assessed via several clinically relevant evaluation metrics. Using the confidence scores before thresholding, the Area Under the Receiver Operating Characteristic Curve (AUC) was computed. After thresholding for binary classification of each scan as positive or negative for proximal femur fracture, sensitivity, specificity, accuracy, and positive and negative predictive values were computed as follows:

$$\text{Sensitivity}=\frac{\#\, of\, true\, positive\, predictions}{\#\, of\, ground-truth\, positives}=\frac{\#\, of\, true\, positives\, (TP)}{TP + \#\, of\, false\, negatives\, (FN)},$$
(3)
$$\text{Specificity}=\frac{\#\, of\, true\, negative\, predictions}{\#\, of\, ground-truth\, negatives}=\frac{\#\, of\, true\, negatives\, (TN)}{TN + \#\, of\, false\, positives\, (FP)},$$
(4)
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FN+FP},$$
(5)
$$\text{Positive Predictive Value }\left(\text{PPV}\right)=\frac{TP}{TP+FP},$$
(6)
$$\text{Negative Predictive Value }\left(\text{NPV}\right)=\frac{TN}{TN+FN}.$$
(7)

The benchmark IOU metric69 was used to assess fracture localization performance, governed by the overlap percentage between a ground-truth fracture box and the corresponding detected box. IOU was computed over the true positive scans, as these were the only scans with both ground-truth and detected fracture boxes after thresholding for fracture detection.

We reported each metric with its 95% confidence interval72. To assess significance when comparing two metrics, we reported p-values from the two-sided Mann–Whitney nonparametric test73, as performance metrics cannot be assumed to follow a specific parametric distribution.
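
A sketch of this statistical reporting is given below, using a percentile bootstrap interval and scipy's Mann–Whitney U test; the per-scan correctness indicators and bootstrap settings are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bootstrap_ci(values, n_boot=10000, seed=0):
    """Percentile bootstrap 95% confidence interval for the mean of a metric."""
    rng = np.random.default_rng(seed)
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(means, [2.5, 97.5])

correct_a = np.array([1, 1, 1, 0, 1, 1, 0, 1])  # method A: per-scan hit/miss
correct_b = np.array([1, 0, 1, 0, 1, 0, 0, 1])  # method B: per-scan hit/miss
print("95% CI for A's accuracy:", bootstrap_ci(correct_a))
print("p-value:",
      mannwhitneyu(correct_a, correct_b, alternative="two-sided").pvalue)
```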

Competing methods

We evaluated VarifocalNet against five benchmark FPNs that have been tested for end-to-end hip fracture detection and localization from plain radiography: Faster-RCNN44,74, Cascade-RCNN75, RetinaNet76, the fully convolutional one-stage detector (FCOS)77, and the Global Context Network (GCNet)47. Faster-RCNN and Cascade-RCNN involve region proposal networks to predict bounding box locations relative to pre-defined anchor boxes. RetinaNet incorporates a focal loss to combat the imbalance between background and object locations. Similar to VarifocalNet, FCOS does not require anchor boxes and directly predicts bounding boxes and confidence scores for each pixel location on feature pyramids. GCNet combines region proposal networks with global context blocks to capture long-range dependencies over input images. Other FPNs from the literature on end-to-end hip fracture detection and localization from plain radiography include the dilated convolutional feature pyramid network (DCFPN)45 and ParallelNet46; both were outperformed by the GCNet we implemented and compared with47. For fair comparison to VarifocalNet, all FPNs were implemented with ResNeXt-101 as their base neural network.

In addition to FPN benchmarks, we implemented the state-of-the-art DINO transformer network51 for end-to-end proximal femur fracture detection and localization from plain radiography. DINO uses a Swin transformer as the base neural network for feature extraction78 and a transformer encoder-decoder network for object detection and localization using Swin features. Transformer networks involve attention mechanisms that learn weighting coefficients over features to capture long-range dependencies79. For fair comparison to VarifocalNet, all base neural networks for FPNs and DINO were initialized with weights pre-trained on COCO and were implemented with the same preprocessing and inference procedures described in Sections "Data preprocessing" and "Inference and evaluation metrics".

Beyond end-to-end detection and localization approaches, we implemented two other state-of-the-art deep learning models commonly used for hip fracture detection. DenseNet80 employs dense connections, feeding each layer the features extracted by all preceding layers with identical feature shapes, and has been used by a plethora of recent works21,24,30,31,32. We implemented the DenseNet-121 version following recent works31,32. EfficientNet was proposed to improve the efficiency of well-established convolutional neural networks by jointly scaling architecture depth, input resolution, and the number of channels in intermediate layers to extract more fine-grained features81. It has been used by multiple related works20,82; we implemented the EfficientNet-B5 version following recent works82. Both networks were initialized with weights pre-trained over the benchmark image classification dataset ImageNet83 and implemented with the same preprocessing and inference procedures described in Sections "Data preprocessing" and "Inference and evaluation metrics".
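
As a sketch of how such a classification baseline can be set up with torchvision (assuming a recent version with the weights API), DenseNet-121 can be adapted to output a single fracture score; the EfficientNet-B5 baseline follows the same pattern via torchvision.models.efficientnet_b5.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained DenseNet-121 with a single-logit fracture head.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 1)

# Grayscale radiograph replicated to three channels to match ImageNet inputs.
x = torch.randn(1, 3, 224, 224)
score = torch.sigmoid(model(x))  # fracture confidence in the range 0-1
```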

Ethics approval and informed consent

The Institutional Review Board at the Beth Israel Deaconess Medical Center (BIDMC) at Harvard Medical School approved this retrospective study in compliance with the Health Insurance Portability and Accountability Act. All data was collected at the BIDMC Division of Musculoskeletal Imaging and Intervention. Informed consent was obtained from all individual participants included in the study. All methods were performed in accordance with relevant guidelines and regulations following the Declaration of Helsinki.

Results

Our goal in this study was to establish the state-of-the-art in deep learning models for end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics. We present our relevant results below.

Quantitative results

Table 2 reports the fracture detection and localization performance metrics for VarifocalNet against all competing methods. VarifocalNet attained high performance across all clinically relevant metrics, with 0.98 AUC, 0.94 specificity, 0.95 sensitivity, and 0.94 accuracy. In doing so, VarifocalNet outperformed all other FPN models by up to 6% AUC, 14% sensitivity, 9% accuracy, and 12% NPV, with p-values < 10–4. Moreover, VarifocalNet obtained the best balance between sensitivity and specificity.

Table 2 Comparison of VarifocalNet to competing methods.

Crucially, VarifocalNet outperformed the DINO transformer network by 7% AUC, 17% sensitivity, 5% accuracy, and 13% NPV. DINO also attained the lowest AUC and the largest imbalance between specificity and sensitivity among all methods. Our results confirmed that while transformer models have been widely employed for medical image analysis49, their performance on small-scale medical imaging datasets such as ours can vary substantially84. VarifocalNet not only outperformed DINO with clinically relevant metrics but also performed inference more efficiently: when evaluated on a Quadro RTX 6000 graphics processing unit (GPU), VarifocalNet took 1.16 s on average to process each radiograph, while DINO took 2.13 s. These quantitative comparisons showed that for small-sample settings such as ours, FPNs remain state-of-the-art compared to transformer models.

Regarding fracture localization, all methods attained similar IOUs in the range of 0.67 to 0.71. As discussed in more detail below in Section "Qualitative results", VarifocalNet consistently localized fracture regions of interest correctly compared to the corresponding ground-truths, while detected box sizes and aspect ratios varied and lowered the average IOU.

In comparison to DenseNet and EfficientNet, which only performed fracture detection, VarifocalNet attained similarly high detection performance, with significantly better AUC, lower specificity, and equal sensitivity. Crucially, in doing so, VarifocalNet additionally provided the locations of identified fractures, allowing the clinician to visualize and overread automated detection results to confirm the result or decide on further evaluation.

We further analyzed VarifocalNet for gender subgroups: the average AUC was 0.99 for female subjects and 0.84 for male subjects. Agreeing with the literature on hip fractures14,53, our dataset comprised twice as many female as male subjects with proximal femur fractures, as summarized in Table 1. Thus, the trained model generalized well over female subjects while remaining more limited in evaluations of male subjects.

Qualitative results

Figure 2 visualizes examples of ground-truth fracture bounding boxes vs. the corresponding predictions by VarifocalNet. VarifocalNet consistently localized fracture regions of interest correctly compared to the corresponding ground-truths, with particularly high confidence scores for scans with hip implants, such as in Fig. 2a. That said, detected box sizes and aspect ratios varied (cf. Fig. 2b,c) and lowered the average IOU for all methods, as reported in Table 2. Overall, VarifocalNet prioritized highly accurate proximal femur fracture detection for clinical applications with expert review aided by localization, rather than exact delineation of fracture extent.

Figure 2

Example visualizations of ground-truth fracture bounding boxes (left) vs. predicted fracture bounding boxes by VarifocalNet (right). Images are radiographs preprocessed via CLAHE, as described in Section "Data preprocessing".

Figure 3 compares fracture bounding box predictions of VarifocalNet against the two competing methods with the highest average IOUs in Table 2: DINO and Cascade-RCNN. All three methods typically localized fracture regions of interest correctly compared to the corresponding ground-truths, as demonstrated by the similar IOUs in Table 2 and exemplified by Fig. 3a. Figure 3b,c show the only two true positive predictions for which the VarifocalNet fracture box predictions did not overlap with ground-truth boxes. In both cases, DINO or Cascade-RCNN also made the same localization mistake or could not correctly classify the scan as positive for fracture. In particular, Fig. 3c shows the only scan for which VarifocalNet (as well as Cascade-RCNN) predicted the side opposite to the ground-truth fracture location. As this scan belonged to an 80-year-old female subject, we believe the contralateral side of the fractured hip introduced a challenge for both methods, given the systemic nature of fracture risk and the similarity of the two femurs85,86,87,88. Qualitative results confirmed that the fracture localization performance of VarifocalNet was on par with the competing methods, while VarifocalNet significantly improved fracture detection performance, as discussed in Section "Quantitative results".

Figure 3

Qualitative examples of ground-truth fracture bounding boxes (left column) and VarifocalNet predictions (second column) against DINO (third column) and Cascade-RCNN (fourth column) predictions. The associated confidence score is provided on the right of each prediction image. Images are radiographs preprocessed via CLAHE, as described in Section "Data preprocessing".

Figure 4 visualizes the only two ground-truth fracture scans falsely predicted as negative by VarifocalNet. As femoral head fractures are uncommon89 and represented only 2% of the subjects in our dataset, Fig. 4a demonstrates a rare and difficult femoral head fracture for the proposed model. For the scan in Fig. 4b, VarifocalNet predicted a fracture bounding box with a confidence score falling slightly below the detection threshold. We believe the confidence score was lower because this scan was considerably more zoomed out from the hip region and contained most of the femur, compared to the other hip scans in Figs. 2 and 3.

Figure 4

Ground-truth fracture scans falsely predicted as negative controls by VarifocalNet.

External validation

To assess the robustness and generalizability of the proposed method, we conducted further experiments on a publicly available dataset associated with two recent works22,43. The PelvixNet dataset90 comprised 100 frontal view plain radiographs of the hip, with 50 scans collected from subjects with hip fractures and the remaining 50 from subjects without hip fractures. The included scans did not contain annotations of fracture locations. We used the models trained over our dataset to perform fracture detection on PelvixNet, with detection thresholds determined over our dataset as described in Section "Inference and evaluation metrics". The corresponding results are presented in Table 3.

Table 3 Comparison of VarifocalNet to competing methods over the external PelvixNet dataset.

VarifocalNet attained significantly higher sensitivity and NPV than the other methods, by up to 34% sensitivity (p-values < 10–5) and 17% NPV (p-values < 0.02), as well as the second-highest accuracy, which was not significantly different from the highest. Similar to the results over our dataset (cf. Section "Quantitative results"), VarifocalNet further exhibited balance between sensitivity and specificity, while several other methods, including DINO, DenseNet, and EfficientNet, exhibited severe imbalance, with up to a 48% difference between the two metrics. Moreover, end-to-end detection and localization models consistently outperformed DenseNet and EfficientNet, further underlining the benefit of localization for robust detection performance. These results are also promising for potential applications in the clinical care environment, where sensitivity is the most critical metric, as false negatives can lead to delayed diagnosis or unrecognized fractures, while specificity should remain at a similar level to reduce the unnecessary burden of time and cost for both clinicians and patients.

Discussion

We employed and extended the state-of-the-art VarifocalNet50 for end-to-end proximal femur fracture detection and localization from plain radiography. Our retrospective dataset comprised 823 hip radiographs acquired from 150 fracture subjects and 355 non-fracture controls, with diverse patient parameters summarized in Table 1.

A large body of research has used deep learning models to identify or classify hip fractures from radiographs20,21,22,23,24,25,26,27,28,29,30,31,32, albeit lacking explicit localization of identified fractures. These approaches employed a plethora of well-established convolutional neural networks such as AlexNet26, GoogLeNet26, ResNet29, DenseNet21,24,30,31,32, EfficientNet20, and Xception22,23. Extensions included heatmap-based analysis via gradient-weighted class activation mapping (Grad-CAM)20,21,22,23,29,31,32, improved loss functions such as the focal loss30, autoencoder networks for feature extraction28, and curriculum learning25. When trained over thousands of annotated hip radiographs, these detection models attained up to 0.99 AUC29,31. Our approach via VarifocalNet attained 0.98 AUC while using only 823 radiographs collected from 150 fracture subjects and 355 negative controls. Crucially, VarifocalNet performed joint detection and localization of proximal femur fractures, allowing the clinician to visualize and overread automated detection results to confirm or decide on further evaluation.

Several studies have detected and localized hip fractures from radiographs via deep learning, albeit requiring multiple cascaded models33,34,35,36,37. In particular, a neural network was first trained to zoom into the hip region on radiographs, using customized convolutional networks33,34 or well-established architectures such as AlexNet35 and Yolo36. A second network was then trained over the hip radiographs cropped around the hip to detect and classify fractures, with novel architectures including Siamese networks37 and vision transformers36. Developing and evaluating such cascaded approaches is less computationally efficient than end-to-end detection and localization approaches38,39,40. Furthermore, cascaded approaches may require manual data cleaning between cascaded models to address error propagation, as exemplified by Tanzi et al.36. Instead, our approach performed end-to-end detection and localization of proximal femur fractures via one deep-learning model based on VarifocalNet. More importantly, we tested a transformer model for the first time for end-to-end hip fracture detection and localization from plain radiography; Tanzi et al.36 instead used a transformer as the classification stage of a multi-stage cascaded model. VarifocalNet not only outperformed the state-of-the-art DINO transformer regarding clinical metrics but also took half the time on average to process a radiograph. Our results established that for small-sample settings like ours, FPNs remain state-of-the-art compared to transformer models requiring thousands of annotated images for training36,84.

Closer to our work, recent studies have performed end-to-end detection and localization of hip fractures from radiographs41,42,43,44,45,46,47. Jiménez-Sánchez et al.41 and Kazi et al.42 incorporated transformations (such as scaling and translation) into detection models, where all transformations were trained to maximize detection performance. Unlike our work, these approaches did not use bounding box annotations of fractures and, accordingly, did not perform localization accurately41. Instead, most existing works used FPN models43,44,45,46,47 trained over fracture bounding box annotations for end-to-end detection and localization of hip fractures. FPNs tested for this task included Faster-RCNN44, Cascade-RCNN75, RetinaNet76, FCOS77, DCFPN45, ParallelNet46, and GCNet47. As presented in Section "Quantitative results", our study assessed FPNs based on clinically relevant metrics to establish the state-of-the-art. Our proposed model based on VarifocalNet outperformed Faster-RCNN, Cascade-RCNN, RetinaNet, FCOS, and GCNet by up to 6% AUC, 14% sensitivity, 9% accuracy, and 12% NPV with p-values < 10–4. We did not evaluate DCFPN and ParallelNet, as they were outperformed by GCNet when tested over the same dataset47. Cheng et al.43 also proposed an FPN model, albeit requiring point annotations marking the centers of fracture-related hip regions, rather than the bounding box annotations we considered. We focused on bounding box annotations due to the extensive literature with the same data annotation setting33,34,35,36,43,44,45,46, also noting that point annotations are typically used with imaging modalities other than radiography, such as histopathology91,92,93 and MRI94.

Our study has some limitations. While our dataset contained a similar number of radiographs of proximal femur fractures and negative controls (468 with fractures, 355 controls), the samples with proximal femur fractures were collected from 150 subjects. This reduced the number of independent training and testing samples, exacerbating small-sample challenges such as the large confidence intervals in Table 2. Another challenge was the gender imbalance in our dataset, which contained twice as many female as male subjects with proximal femur fractures. This resulted in a higher fracture detection AUC over female subjects than male subjects, as the former were better represented in training. While this imbalance agrees with the literature on hip fractures14,53, collecting more scans from male subjects to augment our dataset would improve performance generalization. Moreover, we believe the performance gap between our dataset and PelvixNet across all models may stem from the fact that our dataset mainly comprised proximal femur fractures due to bone fragility, while PelvixNet mainly included fractures due to trauma. Including other fracture etiologies, such as trauma and pathologies other than osteoporosis, would further improve generalization.

Conclusion

We evaluated deep learning models on end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics, focusing on the state-of-the-art VarifocalNet FPN. Tested over 823 hip radiographs of 150 fracture subjects and 355 controls, VarifocalNet attained 0.94 specificity and 0.95 sensitivity, outperforming five benchmark FPNs. Taking the first steps in evaluating a transformer model for our task, we further showed that VarifocalNet outperformed the DINO transformer network, confirming FPNs as state-of-the-art for small-sample settings such as ours. Employing a highly sensitive and specific automated detection model for proximal femur fractures can aid experts in accurate diagnosis. This can reduce the need for further advanced imaging such as CT and MRI, saving patients and healthcare facilities time and resources. Our study focused on highly accurate detection of proximal femur fractures from radiographs but did not incorporate classification of fracture types36 or grades33. Collecting such annotations and extending VarifocalNet for classification and localization of proximal femur fractures of diverse types is an open direction.