Abstract
Hip fractures exceed 250,000 cases annually in the United States, with the worldwide incidence projected to increase by 240–310% by 2050. Hip fractures are predominantly diagnosed by radiologist review of radiographs. In this study, we developed a deep learning model by extending the VarifocalNet Feature Pyramid Network (FPN) for detection and localization of proximal femur fractures from plain radiography with clinically relevant metrics. We used a dataset of 823 hip radiographs of 150 subjects with proximal femur fractures and 355 controls to develop and evaluate the deep learning model. Our model attained 0.94 specificity and 0.95 sensitivity in fracture detection over the diverse imaging dataset. We compared the performance of our model against five benchmark FPN models, demonstrating 6–14% sensitivity and 1–9% accuracy improvement. In addition, we demonstrated that our model outperforms a state-of-the-art transformer model based on the DINO network by 17% sensitivity and 5% accuracy, while taking half the time on average to process a radiograph. The developed model can aid radiologists and support on-premise integration with hospital cloud services to enable automatic, opportunistic screening for hip fractures.
Introduction
Hip pain is a common reason for presentation to an emergency room. A traumatic event such as a fall associated with hip pain or a deformity points to a high suspicion of hip fracture and is often easily diagnosed with radiography. However, some fractures are not obvious and can constitute an occult, valgus impacted, or nondisplaced fracture that can be missed in initial radiographic assessments, often requiring additional imaging modalities such as CT or MRI. In the United States, hip fractures exceed 250,000 annually, with an incidence of 80 per 100,000 population1,2,3. Hip fracture incidence rates are known to increase exponentially with age in both women and men4, secondary to osteoporosis. Osteoporosis is the most common metabolic bone disease worldwide, predominantly affecting the elderly population5 and characterized by decreased bone mineral density, loss of trabecular architecture6, bone microstructural deterioration, and increased fracture risk7. A total of 8.9 million fractures are caused by osteoporosis worldwide annually, resulting in an osteoporotic fracture every 3 s8. Although the hip is not the most common site of fragility fracture, osteoporotic hip fractures are considered the most serious, with a mortality rate reaching 20–40% during the first year after fracture9,10. With rising global life expectancy, the incidence and prevalence of osteoporosis are also expected to increase. Accordingly, the number of men and women above the threshold for a fracture is expected to almost double by 2040, with a prediction of 319 million cases11,12. In fact, by 2050, the worldwide incidence of hip fracture is projected to increase by 240% in women and 310% in men8,13, with approximately 1 in 2 women and 1 in 5 men over the age of 50 projected to suffer a fracture in their remaining lifetime14.
Hip fractures are prevalently detected via abnormalities observed on plain radiography, patient history, and physical examination findings. Nevertheless, radiographic appearance alone is not always sufficient for final diagnosis due to highly variable patient parameters such as BMI, positioning, and image quality15. Up to 10% of at-risk patients are examined via further imaging, including computed tomography (CT) or magnetic resonance imaging (MRI), to limit misdiagnosis. However, less than a third of the further studied cases subsequently demonstrate hip fractures2,16. Additional advanced imaging studies face challenges such as high cost and limited availability at remote and non-urban healthcare facilities. At the same time, delayed diagnosis and unrecognized fractures increase the risk of mortality2 and the time and cost of hospitalization17.
Employing an accurate automated detection model for hip fractures on radiographs can aid experts in saving time and resources. As a result, automated tools using machine learning and deep learning models have been increasingly studied in the literature18,19. Many studies have employed deep learning models trained over thousands of annotated radiographs and demonstrated high accuracy for potential clinical deployment20,21,22,23,24,25,26,27,28,29,30,31,32. Nevertheless, these approaches lacked explicit localization of identified hip fractures. Providing the location of identified fractures allows the clinician to visualize and overread automated detection results to confirm the result or decide on further evaluation. Thus, several studies have focused on detecting and localizing hip fractures from radiographs via deep learning, albeit requiring multiple cascaded models33,34,35,36,37. Developing and evaluating such cascaded approaches is less computationally efficient than end-to-end one-stage detection and localization38,39,40 and potentially requires manual data cleaning between cascaded models to address error propagation36.
Recent works have proposed deep learning models for end-to-end detection and localization of hip fractures from radiographs41,42, particularly focusing on feature pyramid networks (FPNs)43,44,45,46,47. FPNs are convolutional neural networks combining features extracted at different scales and resolutions toward object detection predictions. They are tailored for medical imaging applications in which variability in resolution and anatomical structure sizes are long-lasting challenges48. Despite their success, FPNs have been typically evaluated for generic object detection metrics such as average precision45,46,47, thereby limiting validation with clinically relevant metrics and confidence intervals. In recent years, transformer models have also become integral to deep learning approaches in medical image analysis, including detection and classification49. For hip fracture analysis from plain radiography, transformer models have only been employed within multi-stage cascaded approaches36, leaving room for end-to-end detection and localization as in FPNs.
Motivated by these observations, this study aims to assess state-of-the-art deep learning models for object detection, including FPNs and transformers, on end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics. We employed and extended the VarifocalNet FPN50, well-established for object detection in various domains. Using a retrospective dataset of 823 hip radiographs from 150 subjects with proximal femur fractures and 355 negative controls, VarifocalNet attained 0.94 specificity and 0.95 sensitivity, with up to 14% sensitivity and 9% accuracy improvement against five benchmark FPN models. Crucially, we took the first steps in evaluating a transformer model for our task, employing the state-of-the-art DINO network51. We established that for commonly-observed small-sample settings such as ours, FPNs remain state-of-the-art: VarifocalNet outperformed DINO by 17% sensitivity and 5% accuracy while taking half the time on average to process a radiograph.
Methods
Study design
The Institutional Review Board in the Beth Israel Deaconess Medical Center (BIDMC) at Harvard Medical School approved this retrospective study in compliance with the Health Information Portability and Accountability Act. All data was collected at the BIDMC Division of Musculoskeletal Imaging & Intervention. Informed consent was obtained from all individual participants included in the study. All methods were performed in accordance with relevant guidelines and regulations following the Declaration of Helsinki.
We collected retrospective frontal view plain radiographs of the hip from subjects who sustained a proximal femur fracture after 2004 using the PACS system. The proximal femur joins with the acetabulum of the pelvis to form the hip joint. Hip radiographs from non-fracture age- and gender-matched subjects were used as controls. Fractures outside the proximal femur, such as the acetabulum, were also considered negative for proximal femur fracture analysis. Exclusion criteria included subjects with pathological fractures from pre-existing diseases other than osteoporosis (history of bone cancer, infection, or cysts), as well as lateral view radiographs. Identified scans were exported, de-identified, and assigned a unique identifier before analysis. The resulting dataset included 440 hip radiographs from 122 subjects with proximal femur fractures and 194 hip radiographs from 194 controls without proximal femur fractures. To balance the number of fracture and control scans, we augmented this dataset with publicly available hip radiographs collected from 28 fracture subjects and 161 controls52. The final dataset comprised 468 hip radiographs from 150 subjects with proximal femur fractures and 355 hip radiographs from 355 negative controls without proximal femur fractures.
Table 1 shows the subject-by-subject distribution of gender, BMI, and race categories for fracture and control subjects. Supplementary Table S.1 shows the scan-by-scan distribution of age and gender categories for fracture and control subjects, as scans from the same fracture subject may be collected at different ages. Agreeing with the published reports on fracture incidence rates14,53, most fracture subjects were above the age of 50 for both genders, with more female subjects than male. The dataset was also diverse over the BMI categories, particularly for fracture subjects. While race distribution was more imbalanced, fracture incidences have been reported to exhibit higher rates over white populations compared to African American populations as in our dataset53. Supplementary Table S.2 shows the scan-by-scan distribution of imaging devices, exhibiting diverse representation over four different scanner device manufacturers. Included fracture cases also exhibited diversity over anatomical locations, including the greater trochanter (54%), intertrochanter (24%), femoral neck (20%), and femoral head (2%), as well as degree of displacement, including non-displaced and mild displacement cases. Agreeing with the literature, the most common fracture location was the greater trochanter54, and the rarest location was the femoral head55.
Data annotation and partitioning
To perform fracture localization, a radiologist with clinical experience in musculoskeletal radiography manually annotated each confirmed fracture radiograph by drawing a bounding box that fully contained each visible fracture region using the PhotoPad Image Editor (NCH Software)56.
We partitioned our dataset into stratified training and test sets, keeping a uniform ratio of positive (with proximal femur fracture) and negative (without proximal femur fracture) subjects in each set. 10% of the subjects were held out for testing, and 90% were used for training. Data partitioning was based on subjects rather than scans, ensuring subjects included in training were not included in testing.
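The subject-level stratified split above can be sketched as follows; a minimal illustration in plain Python, where the function name and data layout are hypothetical:

```python
import random

def subject_level_split(subjects, test_frac=0.10, seed=0):
    """Partition subjects (not scans) into train/test sets, stratified by label.

    `subjects` maps subject_id -> True if the subject has a fracture.
    Splitting at the subject level ensures that no subject contributes
    scans to both sets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in (True, False):  # stratify: split positives and negatives separately
        group = sorted(sid for sid, y in subjects.items() if y == label)
        rng.shuffle(group)
        n_test = max(1, round(test_frac * len(group)))
        test += group[:n_test]
        train += group[n_test:]
    return train, test
```

Splitting positives and negatives separately preserves the class ratio in both sets, while shuffling whole subjects guarantees that scans from a training subject never appear in testing.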
Automated proximal femur fracture detection and localization via VarifocalNet
VarifocalNet architecture
We employed and extended the VarifocalNet feature pyramid network (FPN)50, motivated by the recent influx of FPN models for end-to-end detection and localization of hip fractures from radiographs43,44,45,46,47. VarifocalNet FPN was selected for its state-of-the-art prediction performance in detecting common objects, outperforming twenty-five object detection baselines50. In our application, VarifocalNet received a plain radiograph of the hip and made the following predictions: (i) rectangular bounding boxes circumscribing candidate fracture regions and (ii) a confidence score in the range 0–1 associated with each detected box. The confidence score governed the likelihood of fracture existence and was thresholded in post-processing stages to detect a fracture. We explain the details of VarifocalNet architecture and our approach below.
An FPN receives a 2D image of any size and begins with extracting a hierarchy of features at multiple scales via a base neural network57. The base network comprises a sequence of stages, each containing convolutional and residual layers. The activation output of each stage's last residual layer is part of the feature pyramid. Base network features are complemented by upsampling to extract higher-resolution features, merged with lower-resolution base network features of the same size to form a multi-scale feature pyramid. Feature extraction and merging at different resolutions tailors FPNs for medical imaging applications in which variability in resolution and anatomical structure sizes are long-lasting challenges48. We used the ResNeXt-101 architecture58 for the base network and five feature pyramid levels, following the recent literature on FPNs47,50. Features extracted by the feature pyramid were used for (i) detecting objects of interest via bounding boxes circumscribing object locations on the input image, and (ii) predicting a confidence score for each detected box.
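The top-down merging of base-network features described above can be illustrated with a minimal NumPy sketch; the lateral 1 × 1 convolutions and learned weights of a real FPN are omitted, and all names are hypothetical:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(levels):
    """Merge base-network features into a feature pyramid.

    `levels` lists (C, H, W) maps from fine to coarse resolution, each level
    half the spatial size of the previous one. A real FPN first applies a
    1x1 lateral convolution to unify channel counts; that step is omitted.
    """
    merged = [levels[-1]]  # start from the coarsest level
    for feat in reversed(levels[:-1]):
        merged.append(feat + upsample2x(merged[-1]))  # fuse coarse context into finer map
    return merged[::-1]  # return in fine-to-coarse order, matching the input
```

Each pyramid level thus carries both its own resolution-specific features and context propagated down from coarser levels.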
To predict bounding boxes, VarifocalNet maps each pixel location at each feature pyramid level back to the original input scale by multiplying and shifting the pixel coordinates by the total stride before the current pyramid level50. For each pixel location, four scalars representing the distances to the left, top, right, and bottom sides of the object bounding box were predicted. Neighboring locations around the current pixel location were selected and mapped back to the feature pyramid level to incorporate nearby contextual information. Formally, for a pixel with coordinates x and y along the width and height of the image, respectively, a bounding box was first predicted via a convolutional block. The distances from (x, y) to the left, top, right, and bottom sides of the bounding box were denoted by l, t, r, and b, respectively. To incorporate nearby contextual information, nine neighboring pixels with coordinates (x, y), (x − l, y), (x, y − t), (x + r, y), (x, y + b), (x − l, y − t), (x + r, y − t), (x − l, y + b), and (x + r, y + b) were selected and mapped back to the feature pyramid level. Bounding box predictions were then refined by learning and incorporating residual improvement factors. In particular, four distance scaling factors (∆l, ∆t, ∆r, ∆b) were predicted via deformable convolution59 based on the features of the neighboring pixels, where the relative offsets of the neighboring pixels to (x, y) served as the offsets to the deformable convolution. The refined bounding box was then represented by (l′, t′, r′, b′) = (∆l × l, ∆t × t, ∆r × r, ∆b × b). Confidence score prediction followed the same steps as bounding box prediction except for the last layer, where the output was a scalar score p for each location (x, y) rather than the four distance factors.
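The star-shaped sampling and residual refinement above can be sketched as follows; a simplified illustration of the scheme in Zhang et al.50, with hypothetical function names:

```python
def star_points(x, y, box):
    """Nine sampling locations around (x, y) from an initial box (l, t, r, b),
    where l, t, r, b are distances to the box's left, top, right, bottom sides."""
    l, t, r, b = box
    return [(x, y),
            (x - l, y), (x, y - t), (x + r, y), (x, y + b),               # side midpoints
            (x - l, y - t), (x + r, y - t), (x - l, y + b), (x + r, y + b)]  # corners

def refine_box(box, deltas):
    """Scale each predicted distance by its learned residual factor (dl, dt, dr, db)."""
    return tuple(d * s for d, s in zip(box, deltas))
```

In the full model, the nine points are mapped back to feature-pyramid coordinates and their relative offsets drive a deformable convolution that predicts the four scaling factors.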
Figure 1 summarizes the overall VarifocalNet architecture. We also included the detailed architecture breakdown for base neural network, feature pyramid, bounding box, and confidence score prediction stages in Supplementary Tables S.3–S.5.
VarifocalNet architecture. The ResNeXt-101 base neural network comprises a sequence of stages, each containing convolutional and residual layers. These stages extract hierarchical multi-resolution features, depicted by rectangles with horizontal lines. ResNeXt-101 features are complemented by upsampling to extract higher-resolution features, depicted by rectangles with vertical lines. Higher and lower resolution features are merged and used for predicting fracture bounding boxes (bbox) and associated confidence scores in the range 0–1. We also included the detailed architecture breakdown for base neural network, feature pyramid, bounding box, and confidence score prediction stages in Supplementary Tables S.3–S.5.
Data preprocessing
We enhanced each input radiograph via contrast-limited adaptive histogram equalization (CLAHE), a typical technique in radiography-based fracture detection to reduce noise and improve image quality60,61,62. Each image was then normalized to the range 0–1 via min–max normalization, following the standard in deep learning literature for medical imaging to aid training stability63,64.
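A minimal sketch of this preprocessing is shown below; global histogram equalization stands in for CLAHE, which additionally clips histograms and operates over local tiles (e.g. via OpenCV's cv2.createCLAHE), and all names are hypothetical:

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization of an 8-bit grayscale image (a simplified
    stand-in for CLAHE, which equalizes clipped histograms over local tiles)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)  # map CDF to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

def min_max_normalize(img):
    """Scale pixel intensities to the range 0-1."""
    img = img.astype(np.float32)
    rng = img.max() - img.min()
    return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)
```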
Training
Transfer learning was employed to accelerate training by initializing VarifocalNet parameters with weights pre-trained on the benchmark object detection dataset COCO65. Following initialization, VarifocalNet was trained over the pairs of training scans and corresponding ground-truth fracture bounding boxes for 75 epochs via stochastic gradient descent with a momentum factor of 0.9 and batch size of 166. The learning rate was initialized at 5 × 10⁻³ and divided by ten after every 25 epochs to aid training convergence67. To aid performance generalization, training scans were augmented by horizontal flipping and resizing, where image height was fixed at 1333 and width was varied between 512 and 800 in increments of 32. Moreover, initialized parameters of the first base network stage were not fine-tuned, while all trained parameters were regularized via weight decay with regularization level 10⁻⁴68. Initialization, optimization, data augmentation, and regularization techniques followed the standard in deep learning literature for object detection50,51,57.
The training objective comprised several components for optimizing bounding boxes and confidence scores. Fracture confidence scores were optimized by minimizing a weighted binary cross entropy loss to combat the imbalance between pixels pertaining to background vs. fractures50:

$$\mathcal{L}_{cls} = \frac{1}{\left|F\right|}\left(\sum_{i \in F} -q_i\left(q_i \log p_i + \left(1 - q_i\right)\log\left(1 - p_i\right)\right) + \sum_{i \in B} -\alpha\, p_i^{\gamma}\log\left(1 - p_i\right)\right) \quad (1)$$

where index i denotes a pixel location, F comprises the indices of foreground pixel locations coinciding with ground-truth fracture boxes, B comprises the indices of background pixel locations, q denotes the target confidence score, p denotes the predicted confidence score, \(\left| F \right|\) denotes the number of foreground pixel locations, and α and γ down-weight the contribution of background locations. To capture the coupling between bounding box and confidence score predictions, the target score q took on the value of Intersection over Union (IOU)69 between ground-truth and predicted bounding boxes for foreground pixel locations and the value 0 otherwise. In doing so, detections outside ground-truth fracture boxes were assigned lower weights, while high-confidence detections overlapping with ground-truth boxes were assigned higher weights.
Fracture bounding box predictions were optimized by minimizing a generalized IOU (GIOU) objective70, governed by the negative of the proximity between a ground-truth fracture box and the corresponding detected box:

$$\mathcal{L}_{bbox} = \frac{1}{\sum_{i \in F} q_i^{*}} \sum_{i \in F} q_i^{*}\left(\lambda_0\left(1 - GIoU\left(\left(l, t, r, b\right)_i, \left(l^{*}, t^{*}, r^{*}, b^{*}\right)_i\right)\right) + \lambda_1\left(1 - GIoU\left(\left(l^{\prime}, t^{\prime}, r^{\prime}, b^{\prime}\right)_i, \left(l^{*}, t^{*}, r^{*}, b^{*}\right)_i\right)\right)\right) \quad (2)$$

where * denotes the distance factors for a ground-truth fracture box, q* denotes the target confidence score at the corresponding foreground location, and λ₀ and λ₁ weight the initial and refined bounding box terms. VarifocalNet was trained by minimizing the sum of (1) and (2), where the weighting coefficients 0.75, 1.5 and 2 followed Zhang et al.50.
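The weighted confidence-score objective described above can be sketched in NumPy as follows; this is an illustrative re-implementation of the varifocal loss of Zhang et al.50 with hypothetical names, not the exact training code:

```python
import numpy as np

def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Sketch of the varifocal loss: foreground locations (q > 0) use
    IoU-weighted binary cross entropy with target q; background locations
    (q == 0) are down-weighted by alpha * p**gamma.

    p, q: arrays of predicted and target confidence scores per location.
    """
    p = np.clip(p, eps, 1 - eps)
    bce = -(q * np.log(p) + (1 - q) * np.log(1 - p))
    weight = np.where(q > 0, q, alpha * p ** gamma)  # IoU weight vs. focal down-weight
    n_fg = max((q > 0).sum(), 1)  # normalize by the number of foreground locations
    return (weight * bce).sum() / n_fg
```

Confident false positives on background are penalized (via p**gamma) while easy background locations contribute little, addressing the foreground/background imbalance.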
Inference and evaluation metrics
We applied the trained fracture detection model to each scan in the test set to record bounding box detections and their confidence scores. We represented each scan with the detection corresponding to the maximum confidence score in the scan, to be thresholded for fracture detection. We determined the fracture detection threshold as the score that maximized the geometric mean of sensitivity and specificity71. In the clinical care environment aided by this binary prediction, experts are expected to review positive-flagged scans and decide on fracture presence. Thus, the focus of our model was to avoid missing positive scans by concentrating on one high-confidence detection for each scan.
Fracture detection performance was assessed via several clinically relevant evaluation metrics. Using the confidence scores before thresholding, the Area Under the Receiver Operating Characteristic Curve (AUC) was computed. After thresholding for binary classification of each scan as positive or negative for proximal femur fracture, sensitivity, specificity, accuracy, and positive and negative predictive values were computed as follows:

$$Sensitivity = \frac{TP}{TP + FN}, \quad Specificity = \frac{TN}{TN + FP}, \quad Accuracy = \frac{TP + TN}{TP + TN + FP + FN},$$

$$PPV = \frac{TP}{TP + FP}, \quad NPV = \frac{TN}{TN + FN},$$

where TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative scans, respectively.
The benchmark IOU metric69 was used to assess fracture localization performance, governed by the overlap percentage between a ground-truth fracture box and the corresponding detected box. IOU was computed over the true positive scans, as these were the only scans with both ground-truth and detected fracture boxes after thresholding for fracture detection.
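For reference, the IOU of two axis-aligned boxes can be computed as in this minimal sketch (boxes given as corner coordinates; the function name is hypothetical):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IOU of 1 indicates perfect overlap; 0 indicates disjoint boxes.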
We reported each metric and its 95% confidence interval72. To assess the significance when comparing two metrics, we reported p-values for the two-sided Mann–Whitney nonparametric test73, as performance metrics do not follow a specific parametric distribution.
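One common way to obtain such confidence intervals is a percentile bootstrap over per-scan metric values; the sketch below is illustrative and not necessarily the exact procedure of the cited method72:

```python
import numpy as np

def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """95% percentile bootstrap confidence interval for the mean of a
    per-scan metric (e.g. per-scan correctness indicators for accuracy)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    # Resample scans with replacement and record the metric for each replicate
    means = np.array([rng.choice(values, size=len(values), replace=True).mean()
                      for _ in range(n_boot)])
    return (np.quantile(means, alpha / 2), np.quantile(means, 1 - alpha / 2))
```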
Competing methods
We evaluated VarifocalNet against five benchmark FPNs that have been tested for end-to-end hip fracture detection and localization from plain radiography: Faster R-CNN44,74, Cascade R-CNN75, RetinaNet76, fully convolutional one-stage detection (FCOS)77 and Global Context networks (GCNet)47. Faster R-CNN and Cascade R-CNN involve region proposal networks to predict bounding box locations relative to pre-defined anchor boxes. RetinaNet incorporates focal loss to combat the imbalance between background vs. object locations. Similar to VarifocalNet, FCOS does not require anchor boxes and directly predicts bounding boxes and confidence scores for each pixel location on feature pyramids. GCNet combines region proposal networks with global context blocks to capture long-range dependencies over input images. Other FPNs from the literature on end-to-end hip fracture detection and localization from plain radiography included the dilated convolutional feature pyramid network (DCFPN)45 and ParallelNet46, which were outperformed by the GCNet we implemented and compared with47. For fair comparison to VarifocalNet, all FPNs were implemented with ResNeXt-101 as their base neural network.
In addition to FPN benchmarks, we implemented the state-of-the-art DINO transformer network51 for end-to-end proximal femur fracture detection and localization from plain radiography. DINO uses a Swin transformer as the base neural network for feature extraction78 and a transformer encoder-decoder network for object detection and localization using Swin features. Transformer networks involve attention mechanisms that learn weighting coefficients over features to capture long-range dependencies79. For fair comparison to VarifocalNet, all base neural networks for FPNs and DINO were initialized with weights pre-trained on COCO and were implemented with the same preprocessing and inference procedures described in Sections "Data preprocessing" and "Inference and evaluation metrics".
Beyond end-to-end detection and localization approaches, we implemented two other state-of-the-art deep learning models commonly used for hip fracture detection. DenseNet80 employs dense connections by receiving features extracted by all preceding layers with identical feature shapes as inputs to each layer and has been used by a plethora of recent works21,24,30,31,32. We implemented the DenseNet-121 version following recent works31,32. EfficientNet was proposed to improve the efficiency of well-established convolutional neural networks by jointly scaling architecture depth, input resolution, and the number of channels in intermediate layers to extract more fine-grained features81. It has been used by multiple related works20,82; we implemented the EfficientNet-B5 version following recent works82. Both networks were initialized with weights pre-trained over the benchmark image classification dataset ImageNet83 and implemented with the same preprocessing and inference procedures described in Sections "Data preprocessing" and "Inference and evaluation metrics".
Ethics approval and informed consent
The Institutional Review Board in the Beth Israel Deaconess Medical Center (BIDMC) at Harvard Medical School approved this retrospective study in compliance with the Health Information Portability and Accountability Act. All data was collected at the BIDMC Division of Musculoskeletal Imaging and Intervention. Informed consent was obtained from all individual participants included in the study. All methods were performed in accordance with relevant guidelines and regulations following the Declaration of Helsinki.
Results
Our goal in this study was to establish the state-of-the-art in deep learning models for end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics. We present our relevant results below.
Quantitative results
Table 2 presents the fracture detection and localization performance metrics for VarifocalNet against all competing methods. VarifocalNet attained high performance across all clinically relevant metrics, with 0.98 AUC, 0.94 specificity, 0.95 sensitivity, and 0.94 accuracy. In doing so, VarifocalNet outperformed all other FPN models by up to 6% AUC, 14% sensitivity, 9% accuracy, and 12% NPV, with p-values < 10⁻⁴. Moreover, VarifocalNet obtained the best balance between sensitivity and specificity.
Crucially, VarifocalNet outperformed the DINO transformer network by 7% AUC, 17% sensitivity, 5% accuracy, and 13% NPV. DINO also attained the lowest AUC and largest imbalance between specificity and sensitivity among all methods. Our results confirmed that while transformer models have been widely employed for medical image analysis49, their performances on small-scale medical imaging datasets such as ours can vary substantially84. VarifocalNet not only outperformed DINO with clinically relevant metrics but also performed inference more efficiently: when evaluated on a Quadro RTX 6000 Graphics Processing Unit (GPU), VarifocalNet took 1.16 s on average to process each radiograph, while DINO took 2.13 s. Quantitative comparisons showed that for small-sample settings such as ours, FPNs remain state-of-the-art compared to transformer models.
Regarding fracture localization, all methods attained similar IOUs in the range of 0.67 to 0.71. As discussed in more detail below in Section "Qualitative results", VarifocalNet consistently localized fracture regions of interest correctly compared to the corresponding ground-truths, while detected box sizes and aspect ratios varied and lowered the average IOU.
In comparison to DenseNet and EfficientNet that only performed fracture detection, VarifocalNet attained similarly high detection performance with significantly better AUC, lower specificity, and equal sensitivity. Crucially, in doing so, VarifocalNet additionally provided the locations of identified fractures. Providing the location of identified fractures allows the clinician to visualize and overread automated detection results to confirm the result or decide on further evaluation.
We further analyzed VarifocalNet for gender subgroups: the average AUC was 0.99 for female subjects and 0.84 for male subjects. Agreeing with the literature on hip fractures14,53, our dataset comprised twice the number of female subjects than male subjects with proximal femur fractures, as summarized in Table 1. Thus, the trained model generalized well over female subjects, while evaluations over male subjects remained more limited.
Qualitative results
Figure 2 visualizes examples of ground-truth fracture bounding boxes vs. the corresponding predictions by VarifocalNet. VarifocalNet consistently localized fracture regions of interest correctly compared to the corresponding ground-truths, with particularly high confidence scores for scans with hip implants such as in Fig. 2a. That said, detected box sizes and aspect ratios varied (cf. Fig. 2b,c) and lowered the average IOU for all methods, as reported in Table 2. Overall, VarifocalNet prioritized highly accurate proximal femur fracture detection for clinical applications with expert review aided by localization, rather than the exact sizes of fractures.
Example visualizations of ground-truth fracture bounding boxes (left) vs. predicted fracture bounding boxes by VarifocalNet (right). Images are radiographs preprocessed via CLAHE, as described in Section "Data preprocessing".
Figure 3 compares fracture bounding box predictions of VarifocalNet against two competing methods with the highest average IOUs in Table 2: DINO and Cascade-RCNN. All three methods typically localized fracture regions of interest correctly compared to the corresponding ground-truths, demonstrated by the similar IOUs in Table 2 and exemplified by Fig. 3a. Figure 3b,c show the only two true positive predictions for which the VarifocalNet fracture box predictions did not overlap with ground-truth boxes. In both cases, DINO or Cascade-RCNN also made the same localization mistake or could not correctly classify the scan as positive for fracture. In particular, the scan in Fig. 3c shows the only scan for which VarifocalNet (as well as Cascade-RCNN) predicted the opposite side of the ground-truth as the fracture location. As this scan belonged to an 80-year-old female subject, we believe the contralateral side of the fractured hip introduced a challenge for both methods, given the systemic nature of fracture risk and the similarity of the two femurs85,86,87,88. Qualitative results confirmed that the fracture localization performance of VarifocalNet was on par with other competing methods, while also significantly improving fracture detection performance, as discussed in Section "Quantitative results".
Qualitative examples of ground-truth fracture bounding boxes (left column) and VarifocalNet predictions (second column) against DINO (third column) and Cascade-RCNN (fourth column) predictions. The associated confidence score is provided on the right of each prediction image. Images are radiographs preprocessed via CLAHE, as described in Section "Data preprocessing".
Figure 4 visualizes the only two ground-truth fracture scans falsely predicted as negative by VarifocalNet. As femoral head fractures are uncommon89 and represented by only 2% of the subjects in our dataset, Fig. 4a demonstrates a rare and difficult femoral head fracture scan for the proposed model. For the scan in Fig. 4b, VarifocalNet predicted a fracture bounding box with a confidence score falling slightly below the detection threshold. We believe the confidence score was lower since this scan was considerably more zoomed out of the hip region and contained most of the femur bone, compared to the other hip scans in Figs. 2 and 3.
External validation
To assess the robustness and generalizability of the proposed method, we conducted further experiments on a publicly available dataset associated with two recent works22,43. The PelvixNet dataset90 comprised 100 frontal view plain radiographs of the hip with 50 scans collected from subjects with hip fractures and the remaining 50 from subjects without hip fractures. Included scans did not contain annotations of fracture locations. We used the models trained over our dataset to perform fracture detection on PelvixNet, with detection thresholds developed over our dataset as described in Section "Inference and evaluation metrics". Corresponding results are presented in Table 3.
VarifocalNet attained significantly higher sensitivity and NPV than other methods by up to 34% sensitivity (p-values < 10⁻⁵) and 17% NPV (p-values < 0.02), as well as the second highest accuracy, which did not differ significantly from the highest accuracy. Similar to the results over our dataset (cf. Section "Quantitative results"), VarifocalNet further exhibited balance between sensitivity and specificity, while several other methods including DINO, DenseNet and EfficientNet resulted in severe imbalance by up to 48% difference between the two metrics. Moreover, end-to-end detection and localization models consistently outperformed DenseNet and EfficientNet, further underlining the benefit of localization in terms of robustness in detection performance. These results were also promising for potential applications in the clinical-care environment, where sensitivity is the most critical metric, as false negatives can lead to delayed diagnosis or unrecognized fractures, while specificity should remain at a similar level to reduce the unnecessary burden of time and cost for both clinicians and patients.
Discussion
We employed and extended the state-of-the-art VarifocalNet50 for end-to-end proximal femur fracture detection and localization from plain radiography. Our retrospective dataset comprised 823 hip radiographs acquired from 150 fracture subjects and 362 non-fracture controls, with diverse patient parameters summarized in Table 1.
A large body of research has used deep learning models to identify or classify hip fractures from radiographs20,21,22,23,24,25,26,27,28,29,30,31,32, albeit lacking explicit localization of identified fractures. These approaches employed a plethora of well-established convolutional neural networks such as AlexNet26, GoogLeNet26, ResNet29, DenseNet21,24,30,31,32, EfficientNet20 and Xception22,23. Extensions included heatmap-based analysis via gradient-weighted class activation mapping (Grad-CAM)20,21,22,23,29,31,32, improved loss functions such as focal loss30, autoencoder networks for feature extraction28, and curriculum learning25. When trained over thousands of annotated hip radiographs, these detection models attained up to 0.99 AUC29,31. Our approach via VarifocalNet attained 0.98 AUC while using only 823 radiographs collected from 150 fracture subjects and 362 negative controls. Crucially, VarifocalNet performed joint detection and localization of proximal femur fractures, allowing the clinician to visualize and overread automated detection results to confirm them or decide on further evaluation.
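The AUC values compared above have a rank-based interpretation (Hanley & McNeil): the probability that a randomly chosen fracture scan receives a higher confidence score than a randomly chosen control. A minimal sketch of this equivalence, with purely illustrative scores not taken from the study:

```python
# Rank-based AUC (equivalent to the normalized Mann-Whitney U statistic):
# count score pairs where a fracture scan outranks a control, with ties
# counted as half a win. Scores are illustrative only.

def auc(pos_scores, neg_scores):
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.4], [0.3, 0.2, 0.7]))  # 8/9 ≈ 0.89
```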
Several studies have detected and localized hip fractures from radiographs via deep learning, albeit requiring multiple cascaded models33,34,35,36,37. In particular, a neural network was first trained to zoom into the hip region on radiographs, using customized convolutional networks33,34 or well-established architectures such as AlexNet35 and Yolo36. A second network was then trained over the hip radiographs cropped around the hip to detect and classify fractures, with novel architectures including Siamese networks37 and vision transformers36. Developing and evaluating such cascaded approaches is less computationally efficient than end-to-end detection and localization approaches38,39,40. Furthermore, cascaded approaches may require manual data cleaning between cascaded models to address error propagation, as exemplified by Tanzi et al.36. Instead, our approach performed end-to-end detection and localization of proximal femur fractures via one deep-learning model based on VarifocalNet. More importantly, we tested a transformer model for the first time for end-to-end hip fracture detection and localization from plain radiography; Tanzi et al.36 instead used a transformer as the classification stage of a multi-stage cascaded model. VarifocalNet not only outperformed the state-of-the-art DINO transformer regarding clinical metrics but also took half the time on average to process a radiograph. Our results established that for small-sample settings like ours, FPNs remain state-of-the-art compared to transformer models requiring thousands of annotated images for training36,84.
Closer to our work, recent studies have performed end-to-end detection and localization of hip fractures from radiographs41,42,43,44,45,46,47. Jiménez-Sánchez et al.41 and Kazi et al.42 incorporated transformations (such as scaling and translation) into detection models, where all transformations were trained to maximize detection performance. Unlike our work, these approaches did not use bounding box annotations of fractures and, accordingly, did not perform localization accurately41. Instead, most existing works used FPN models43,44,45,46,47 trained over fracture bounding box annotations for end-to-end detection and localization of hip fractures. FPNs tested for this task included Faster-RCNN44, Cascade-RCNN75, RetinaNet76, FCOS77, DCFPN45, ParallelNet46 and GCNet47. As presented in Section "Quantitative results", our study assessed FPNs based on clinically relevant metrics to establish the state-of-the-art. Our proposed model based on VarifocalNet outperformed Faster-RCNN, Cascade-RCNN, RetinaNet, FCOS and GCNet by up to 6% AUC, 14% sensitivity, 9% accuracy, and 12% NPV with p-values < 10⁻⁴. We did not evaluate DCFPN and ParallelNet, as they were outperformed by GCNet when tested over the same dataset47. Cheng et al.43 also proposed an FPN model, albeit one requiring point annotations marking the centers of fracture-related hip regions rather than the bounding box annotations that we considered. We focused on bounding box annotations due to the extensive literature with the same data annotation setting33,34,35,36,43,44,45,46, also noting that point annotations are typically used with imaging modalities other than radiography, such as histopathology91,92,93 and MRI94.
Our study has some limitations. While our dataset contained a similar number of radiographs with proximal femur fractures and negative controls (468 with fractures, 355 controls), the fracture samples were collected from only 150 subjects. This reduced the number of independent training and testing samples, further exacerbating small-sample challenges such as the large confidence intervals in Table 2. Another challenge was the gender imbalance in our dataset, which contained twice as many female as male subjects with proximal femur fractures. This resulted in a higher AUC of fracture detection over female subjects than males, as females were better represented in training. While this imbalance agrees with the literature on hip fractures14,53, collecting more scans from male subjects to augment our dataset would improve performance generalization. Moreover, we believe that the performance gap that all models exhibited between our dataset and PelvixNet may stem from the fact that our dataset mainly comprised proximal femur fractures due to bone fragility, while PelvixNet mainly included fractures due to trauma. Including fractures of other etiologies, such as trauma, and pathologies other than osteoporosis would further improve generalization.
Conclusion
We evaluated deep learning models on end-to-end proximal femur fracture detection and localization from plain radiography with clinically relevant metrics, focusing on the state-of-the-art VarifocalNet FPN. Tested over 823 hip radiographs of 150 fracture subjects and 362 controls, VarifocalNet attained 0.94 specificity and 0.95 sensitivity, outperforming five benchmark FPNs. In a first implementation of a transformer model for this task, we further showed that VarifocalNet outperformed the transformer network DINO, confirming FPNs as state-of-the-art for small-sample settings such as ours. Employing a highly sensitive and specific automated model for proximal femur fracture detection can aid experts in accurate diagnosis. This can reduce the need for advanced imaging such as CT and MRI, saving patients and healthcare facilities time and resources. Our study focused on highly accurate detection of proximal femur fractures from radiographs but did not incorporate classification of fracture types36 or grades33. Collecting such annotations and extending VarifocalNet for classification and localization of proximal femur fractures of diverse types is an open direction.
Data availability
The datasets generated and analyzed during the current study are not publicly available because the work was supported by an NIH SBIR grant award. As outlined in the 2023 NIH Data Management and Sharing Policy, "SBIR and Small Business Technology Transfer (STTR) recipients may retain the rights to data generated during the performance of an SBIR or STTR award for up to 20 years after the award date, per the SBIR and STTR Program Policy Directive." The datasets are, however, available from the corresponding author on reasonable request.
References
Melton, L. Hip fracture: A worldwide problem today and tomorrow. Bone 14, 51–58 (1993).
Dominguez, S., Liu, P., Roberts, C., Mandell, M. & Richman, P. B. Prevalence of traumatic hip and pelvic fractures in patients with suspected hip fracture and negative initial standard radiographs—a study of emergency department patients. Acad. Emerg. Med. 12(4), 366–369 (2005).
Melton, L. J. III., Therneau, T. M. & Larson, D. R. Long-term trends in hip fracture prevalence: The influence of hip fracture incidence and survival. Osteoporos. Int. 8, 68–74 (1998).
Kannus, P., Natri, A., Paakkala, T. & Jarvinen, M. An outcome study of chronic patellofemoral pain syndrome: Seven-year follow-up of patients in a randomized, controlled trial. J. Bone Joint Surg. Am. 81(3), 355–363 (1999).
Salari, N. et al. Global prevalence of osteoporosis among the world older adults: A comprehensive systematic review and meta-analysis. J. Orthop. Surg. Res. 16(1), 669. https://doi.org/10.1186/s13018-021-02821-8 (2021).
Turner, C. H. Biomechanics of bone: Determinants of skeletal fragility and bone quality. Osteoporosis Int. 13(2), 97–104. https://doi.org/10.1007/s001980200000 (2002).
Chen, H., Zhou, F., Onozuka, M. & Kubo, K. Y. Age-related changes in trabecular and cortical bone microstructure. Int. J. Endocrinol. 2013, 213234. https://doi.org/10.1155/2013/213234 (2013).
International Osteoporosis Foundation. Facts and Statistics. https://www.iofbonehealth.org/facts-statistics (accessed January 2019).
van Oostwaard, M. Osteoporosis and the nature of fragility fracture: An overview. In Hertz, K., & Santy-Tomlinson, J. (Eds.), Fragility Fracture Nursing: Holistic Care and Management of the Orthogeriatric Patient, pp. 1–13 (Springer, Cham, 2018).
Guzon-Illescas, O. et al. Mortality after osteoporotic hip fracture: Incidence, trends, and associated factors. J. Orthop. Surg. Res. 14(1), 203. https://doi.org/10.1186/s13018-019-1226-6 (2019).
Oden, A., McCloskey, E. V., Kanis, J. A., Harvey, N. C. & Johansson, H. Burden of high fracture probability worldwide: Secular increases 2010–2040. Osteoporosis Int. J. Estab. Res. Cooper. Between Eur. Found. Osteopor. Natl. Osteopor. Found. USA 26(9), 2243–2248. https://doi.org/10.1007/s00198-015-3154-6 (2015).
Tu, K. N. et al. Osteoporosis: A review of treatment options. P&T Peer-Rev. J. Formul. Manag. 43(2), 92–104 (2018).
Ji, M.-X. & Yu, Q. Primary osteoporosis in postmenopausal women. Chronic Dis. Transl. Med. 1(1), 9–13. https://doi.org/10.1016/j.cdtm.2015.02.006 (2015).
Harvey, N., Dennison, E. & Cooper, C. Osteoporosis: Impact on health and economics. Nat. Rev. Rheumatol. 6(2), 99–105. https://doi.org/10.1038/nrrheum.2009.260 (2010).
Kirby, M. W. & Spritzer, C. Radiographic detection of hip and pelvic fractures in the emergency department. Am. J. Roentgenol. 194(4), 1054–1060 (2010).
Cannon, J., Silvestri, S. & Munro, M. Imaging choices in occult hip fracture. J. Emerg. Med. 37, 144–152 (2009).
Shabat, S. et al. Economic consequences of operative delay for hip fractures in a non-profit institution. Orthopedics 26, 1197–1199 (2003).
Cha, Y. et al. Artificial intelligence and machine learning on diagnosis and classification of hip fracture: Systematic review. J. Orthop. Surg. Res. 17(1), 1–13 (2022).
Yang, S. et al. Diagnostic accuracy of deep learning in orthopedic fractures: A systematic review and meta-analysis. Clin. Radiol. 75(9), 713-e17 (2020).
Sato, Y., Takegami, Y., Asamoto, T., Ono, Y., Hidetoshi, T., Goto, R., Kitamura, A., & Honda, S. A computer-aided diagnosis system using artificial intelligence for hip fractures-multi-institutional joint development research (2020). arXiv preprint arXiv:2003.12443.
Kitamura, G. Deep learning evaluation of pelvic radiographs for position, hardware presence, and fracture detection. Eur. J. Radiol. 130, 109139 (2020).
Choi, J. et al. Practical computer vision application to detect hip fractures on pelvic X-rays: A bi-institutional study. Trauma Surg. Acute Care Open 6(1), e000705 (2021).
Ouyang, C. H. et al. The application of design thinking in developing a deep learning algorithm for hip fracture detection. Bioengineering 10(6), 735 (2023).
Gale, W., Oakden-Rayner, L., Carneiro, G., Bradley, A. P., & Palmer, L. J. Detecting hip fractures with radiologist-level performance using deep neural networks (2017). arXiv preprint arXiv:1711.06504.
Jiménez-Sánchez, A., Mateus, D., Kirchhoff, S., Kirchhoff, C., Biberthaler, P., Navab, N., González Ballester, M. A., & Piella, G. Medical-based deep curriculum learning for improved fracture classification. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part VI 22, pp. 694–702 (Springer International Publishing, 2019).
Adams, M. et al. Computer vs. human: Deep learning versus perceptual training for the detection of neck of femur fractures. J. Med. Imaging Rad. Oncol. 63(1), 27–32 (2019).
Beyaz, S., Açıcı, K. & Sümer, E. Femoral neck fracture detection in X-ray images using deep learning and genetic algorithm approaches. Joint Dis. Relat. Surg. 31(2), 175 (2020).
Lee, C. et al. Classification of femur fracture in pelvic X-ray images using meta-learned deep neural network. Sci. Rep. 10(1), 13694 (2020).
Bae, J. et al. External validation of deep learning algorithm for detecting and visualizing femoral neck fracture including displaced and non-displaced fracture on plain X-ray. J. Digit. Imaging 34(5), 1099–1109 (2021).
Lotfy, M., Shubair, R.M., Navab, N., & Albarqouni, S. Investigation of focal loss in deep learning models for femur fractures classification. In 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–4 (IEEE, 2019).
Oakden-Rayner, L. et al. Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: A diagnostic accuracy study. Lancet Digit. Health 4(5), e351–e358 (2022).
Gao, Y. et al. Application of a deep learning algorithm in the detection of hip fractures. Iscience 26(8), 1 (2023).
Mutasa, S., Varada, S., Goel, A., Wong, T. T. & Rasiej, M. J. Advanced deep learning techniques applied to automated femoral neck fracture detection and classification. J. Digit. Imaging 33, 1209–1217 (2020).
Murphy, E. A. et al. Machine learning outperforms clinical experts in classification of hip fractures. Sci. Rep. 12(1), 2058 (2022).
Jiménez-Sánchez, A. et al. Precise proximal femur fracture classification for interactive training and surgical planning. Int. J. Comput. Assist. Radiol. Surg. 15, 847–857 (2020).
Tanzi, L., Audisio, A., Cirrincione, G., Aprato, A. & Vezzetti, E. Vision transformer for femur fracture classification. Injury 53(7), 2625–2634 (2022).
Chen, H., Wang, Y., Zheng, K., Li, W., Chang, C. T., Harrison, A. P., Xiao, J., Hager, G. D., Lu, L., Liao, C. H., & Miao, S. Anatomy-aware siamese network: Exploiting semantic asymmetry for accurate pelvic fracture detection in x-ray images. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16, pp. 239–255 (Springer International Publishing, 2020).
Yang, Y., Asthana, A., & Zheng, L. Does keypoint estimation benefit object detection? An empirical study of one-stage and two-stage detectors. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pp. 1–7 (IEEE, 2021).
Zhang, Y., Li, X., Wang, F., Wei, B., & Li, L. A comprehensive review of one-stage networks for object detection. In 2021 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), pp. 1–6 (IEEE, 2021).
Soviany, P., & Ionescu, R.T. Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. In 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 209–214 (IEEE, 2018).
Jiménez-Sánchez, A., Kazi, A., Albarqouni, S., Kirchhoff, S., Sträter, A., Biberthaler, P., Mateus, D., & Navab, N. Weakly-supervised localization and classification of proximal femur fractures (2018). arXiv preprint arXiv:1809.10692.
Kazi, A., Albarqouni, S., Sanchez, A.J., Kirchhoff, S., Biberthaler, P., Navab, N., & Mateus, D. Automatic classification of proximal femur fractures based on attention models. In Machine Learning in Medical Imaging: 8th International Workshop, MLMI 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 10, 2017, Proceedings 8, pp. 70–78 (Springer International Publishing, 2017).
Cheng, C. T. et al. A scalable physician-level deep learning algorithm detects universal trauma on pelvic radiographs. Nat. Commun. 12(1), 1066 (2021).
Liu, P. et al. Artificial intelligence to detect the femoral intertrochanteric fracture: The arrival of the intelligent-medicine era. Front. Bioeng. Biotechnol. 10, 927926 (2022).
Guan, B., Yao, J., Zhang, G. & Wang, X. Thigh fracture detection using deep learning method based on new dilated convolutional feature pyramid network. Pattern Recogn. Lett. 125, 521–526 (2019).
Wang, M. et al. ParallelNet: Multiple backbone network for detection tasks on thigh bone fracture. Multimed. Syst. 1, 1–10 (2021).
Guan, B. et al. Automatic detection and localization of thighbone fractures in X-ray based on improved deep learning method. Comput. Vis. Image Understand. 216, 103345 (2022).
Zhou, S. K. et al. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proc. IEEE 109(5), 820–838 (2021).
Shamshad, F., Khan, S., Zamir, S. W., Khan, M. H., Hayat, M., Khan, F. S., & Fu, H. Transformers in medical imaging: A survey. Med. Image Anal. 102802 (2023).
Zhang, H., Wang, Y., Dayoub, F., & Sunderhauf, N. VarifocalNet: An IoU-aware dense object detector. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8514–8523 (2021).
Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L. M., & Shum, H. Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection (2022). arXiv preprint arXiv:2203.03605.
Abedeen, I. et al. FracAtlas: A dataset for fracture classification, localization and segmentation of musculoskeletal radiographs. Sci. Data 10(1), 521 (2023).
Farmer, M. E., White, L. R., Brody, J. A. & Bailey, K. R. Race and sex differences in hip fracture incidence. Am. J. Public Health 74(12), 1374–1380 (1984).
Parkkari, J. et al. Majority of hip fractures occur as a result of a fall and impact on the greater trochanter of the femur: A prospective controlled hip fracture study with 206 consecutive patients. Calcified Tissue Int. 65, 183–187 (1999).
Matsuda, D. K. A rare fracture, an even rarer treatment: The arthroscopic reduction and internal fixation of an isolated femoral head fracture. Arthrosc. J. Arthrosc. Relat. Surg. 25(4), 408–412 (2009).
NCH Software, Inc. PhotoPad Image Editor [Computer software] (2019).
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition pp. 2117–2125 (2017).
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492–1500 (2017).
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE international conference on computer vision, pp. 764–773 (2017).
Karanam, S. R., Srinivas, Y. & Chakravarty, S. A systematic approach to diagnosis and categorization of bone fractures in X-Ray imagery. Int. J. Healthc. Manag. 1, 1–12 (2022).
Lu, S., Wang, S. & Wang, G. Automated universal fractures detection in X-ray images based on deep learning approach. Multimed. Tools Appl. 1, 1–17 (2022).
Mall, P. K., Singh, P. K., & Yadav, D. GLCM based feature extraction and medical x-ray image classification using machine learning techniques. In 2019 IEEE Conference on Information and Communication Technology, pp. 1–6 (2019).
Kibriya, H. et al. A novel and effective brain tumor classification model using deep feature fusion and famous machine learning classifiers. Comput. Intell. Neurosci. 1, 1 (2022).
Reddy, G. T., Bhattacharya, S., Ramakrishnan, S. S., Chowdhary, C. L., Hakak, S., Kaluri, R., & Reddy, M. P. K. An ensemble based machine learning model for diabetic retinopathy classification. In 2020 international conference on emerging trends in information technology and engineering (ic-ETITE) pp. 1–6 (IEEE, 2020).
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13 (pp. 740–755) (Springer International Publishing, 2014).
Ruder, S. An overview of gradient descent optimization algorithms (2016). arXiv preprint arXiv:1609.04747.
You, K., Long, M., Wang, J., & Jordan, M. I. How does learning rate decay help modern neural networks? (2019). arXiv preprint arXiv:1908.01878.
Krogh A., & Hertz J. A simple weight decay can improve generalization. Advances in neural information processing systems. In Proceedings of the 4th International Conference on Neural Information Processing Systems, pp. 950–957 (1991).
Redmon, J., & Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017).
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., & Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 658–666) (2019).
Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006).
Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982).
McKnight, P. E., & Najab, J. Mann–Whitney U test. The Corsini encyclopedia of psychology, pp.1–1 (2010).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 1 (2015).
Cai, Z. & Vasconcelos, N. Cascade R-CNN: High quality object detection and instance segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1483–1498 (2019).
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988) (2017).
Tian, Z., Shen, C., Chen, H., & He, T. FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9627–9636) (2019).
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012–10022) (2021).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 1 (2017).
Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., & Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids (2014). arXiv preprint arXiv:1404.1869.
Tan, M., & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114) (PMLR, 2019).
Kim, T., Moon, N. H., Goh, T. S. & Jung, I. D. Detection of incomplete atypical femoral fracture on anteroposterior radiographs via explainable artificial intelligence. Sci. Rep. 13(1), 10415 (2023).
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255) (2009).
Liu, Y. et al. Efficient training of visual transformers with small datasets. Adv. Neural Inf. Process. Syst. 34, 23818–23830 (2021).
Pierre, M. A., Zurakowski, D., Nazarian, A., Hauser-Kara, D. A. & Snyder, B. D. Assessment of the bilateral asymmetry of human femurs based on physical, densitometric, and structural rigidity characteristics. J. Biomech. 43(11), 2228–36. https://doi.org/10.1016/j.jbiomech.2010.02.032 (2010).
Rao, A. D., Reddy, S. & Rao, D. S. Is there a difference between right and left femoral bone density?. J. Clin. Densitom. 3(1), 57–61. https://doi.org/10.1385/JCD:3:1:057 (2000).
Yang, R. S., Chieng, P. U., Tsai, K. S. & Liu, T. K. Symmetry of bone mineral density in the hips is not affected by age. Nucl. Med. Commun. 17(8), 711–6. https://doi.org/10.1097/00006231-199608000-00012 (1996).
Faulkner, K. G., Genant, H. K. & McClung, M. Bilateral comparison of femoral bone density and hip axis length from single and fan beam DXA scans. Calcif. Tissue Int. 56(1), 26–31. https://doi.org/10.1007/bf00298740 (1995).
Droll, K. P., Broekhuyse, H. & O’Brien, P. Fracture of the femoral head. JAAOS-J. Am. Acad. Orthoped. Surg. 15(12), 716–727 (2007).
Cheng, C. T. et al. A scalable physician-level deep learning algorithm of universal trauma finding detection of pelvic radiographs, PelvixNet dataset. Gigantum. https://doi.org/10.34747/f06m-m978 (2021).
Han, J., Wang, X. & Liu, W. Contextual prior constrained deep networks for mitosis detection with point annotations. IEEE Access 9, 71954–71967 (2021).
Gao, Z. et al. A semi-supervised multi-task learning framework for cancer classification with weak annotation in whole-slide images. Med. Image Anal. 83, 102652 (2023).
Gao, Z., Puttapirat, P., Shi, J., & Li, C. Renal cell carcinoma detection and subtyping with minimal point-based annotation in whole-slide images. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part V 23 (pp. 439–448) (Springer International Publishing, 2020).
Han, X., Zhai, Y., Yu, Z., Peng, T., & Zhang, X. Y. Detecting extremely small lesions in mouse brain MRI with point annotations via multi-task learning. In Machine Learning in Medical Imaging: 12th International Workshop, MLMI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, September 27, 2021, Proceedings 12, pp. 498–506 (Springer International Publishing, 2021).
Acknowledgements
All figures have been prepared by İ.Y.P., using Microsoft PowerPoint version 16.76.1 and Matplotlib software library version 3.5.2.
Funding
The research reported in this publication was supported by the National Institute On Aging of the National Institutes of Health (NIH) Small Business Innovation Research (SBIR) under Award Number R44AG081031. The content is solely the authors' responsibility and does not necessarily represent the official views of the NIH.
Author information
Authors and Affiliations
Contributions
All of the listed authors have participated actively in the entire study project. İ.Y.P., J.W., A.N., and As.V. developed the design and conduct of the study. A.N. and J.W. led the data collection. M.P., E.R., D.Y., S.M., and N.K. aided in data collection. Ai.V. annotated imaging data. İ.Y.P. performed data analysis. İ.Y.P. wrote the first draft of the manuscript, and all authors commented on previous versions. All authors participated in and approved the final submission.
Corresponding author
Ethics declarations
Competing interests
İ.Y.P., D.Y., S.M., N.K., M.P., E.R., J.W., and Ai.V. declare they have no financial interests. As.V. and A.N. received the research grant as principal investigators. A.N. is also a consultant with BioSensics, LLC, on an unrelated project.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Yıldız Potter, İ., Yeritsyan, D., Mahar, S. et al. Proximal femur fracture detection on plain radiography via feature pyramid networks. Sci Rep 14, 12046 (2024). https://doi.org/10.1038/s41598-024-63001-2