Musculoskeletal disorders are a major healthcare challenge around the world. We investigate the utility of convolutional neural networks (CNNs) in performing generalized abnormality detection on lower extremity radiographs. We also explore the effect of pretraining, dataset size and model architecture on model performance to provide recommendations for future deep learning analyses on extremity radiographs, especially when access to large datasets is challenging. We collected a large dataset of 93,455 lower extremity radiographs of multiple body parts, with each exam labelled as normal or abnormal. A 161-layer densely connected, pretrained CNN achieved an AUC-ROC of 0.880 (sensitivity = 0.714, specificity = 0.961) on this abnormality classification task. Our findings show that a single CNN model can be effectively utilized for the identification of diverse abnormalities in highly variable radiographs of multiple body parts, a result that holds potential for improving patient triage and assisting with diagnostics in resource-limited settings.
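The headline metrics above (AUC-ROC, sensitivity and specificity) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the seven toy labels and scores are assumptions made purely for illustration. Labels use 1 for abnormal and 0 for normal, and the AUC is computed as the fraction of abnormal/normal pairs that the model ranks correctly, with ties counted as half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given decision threshold.

    labels: 1 = abnormal, 0 = normal (hypothetical encoding).
    scores: model-predicted probability of abnormality.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_abnormal = s >= threshold
        if y == 1:
            if predicted_abnormal:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_abnormal:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random abnormal exam scores
    higher than a random normal exam (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = auc_roc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc:.3f}")
```

The pairwise AUC loop is quadratic in the number of exams, which is fine for a sketch; on a test set of realistic size a rank-based implementation from a statistics library would be the more practical choice. Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds.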
We are releasing our de-identified test set as part of this manuscript. This dataset includes radiographs from 182 patients and is balanced across normal and abnormal labels as well as across the four lower extremity body parts (foot, hip, knee and ankle). In addition, two board-certified radiologists manually refined all labels to help ensure a high level of accuracy. The dataset is available at https://aimi.stanford.edu/lera-lower-extremity-radiographs-2.
Our deep learning training framework is available at: https://github.com/maya124/MSK-LE.
This study was supported by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award no. R01LM012966 and by the Stanford Child Health Research Institute (Stanford NIH-NCATS-CTSA grant #UL1 TR001085). This research used data or services provided by STARR (STAnford medicine Research data Repository), a clinical data warehouse made possible by the Stanford School of Medicine Research Office.
There was no industry support or other funding for this work. There are no conflicts of interest that pertain specifically to this work. However, some of the authors are consultants for the medical industry. M.P.L. is supported by the National Library of Medicine of the NIH (R01LM012966). B.N.P. has grant support from GE. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or GE. M.P.L.’s activities not related to this Article include positions as shareholder and advisory board member for Segmed Inc., Nines.ai and Bunker Hill. M.V., R.G., M.L., N.K., P.R., J.L. and K.S. are not employees of or consultants for industry and had control of the data and the analysis.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Varma, M., Lu, M., Gardner, R. et al. Automated abnormality detection in lower extremity radiographs using deep learning. Nat Mach Intell 1, 578–583 (2019). https://doi.org/10.1038/s42256-019-0126-0