Automated abnormality detection in lower extremity radiographs using deep learning


Musculoskeletal disorders are a major healthcare challenge around the world. We investigate the utility of convolutional neural networks (CNNs) in performing generalized abnormality detection on lower extremity radiographs. We also explore the effect of pretraining, dataset size and model architecture on model performance to provide recommendations for future deep learning analyses on extremity radiographs, especially when access to large datasets is challenging. We collected a large dataset of 93,455 lower extremity radiographs of multiple body parts, with each exam labelled as normal or abnormal. A 161-layer densely connected, pretrained CNN achieved an AUC-ROC of 0.880 (sensitivity = 0.714, specificity = 0.961) on this abnormality classification task. Our findings show that a single CNN model can be effectively utilized for the identification of diverse abnormalities in highly variable radiographs of multiple body parts, a result that holds potential for improving patient triage and assisting with diagnostics in resource-limited settings.
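The AUC-ROC, sensitivity and specificity reported above can be made concrete with a minimal sketch of how such metrics are typically computed from per-exam model scores. The function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions; the operating point used in the study is not stated in this section.

```python
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen abnormal exam (label 1)
    receives a higher score than a randomly chosen normal exam (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal exam")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_abnormal = s >= threshold
        if y == 1:
            if predicted_abnormal:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_abnormal:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one abnormal and one normal exam")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data for illustration only: 1 = abnormal exam, 0 = normal exam.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]

print("AUC-ROC:", auc_roc(toy_labels, toy_scores))
print("sensitivity, specificity:", sensitivity_specificity(toy_labels, toy_scores))
```

AUC-ROC summarizes ranking quality across all thresholds, whereas sensitivity and specificity describe a single operating point. This is why a model can pair a high specificity with a more modest sensitivity, as in the figures quoted above.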


Fig. 1: Categorization of patients in training, validation and test sets.
Fig. 2: Model architecture.
Fig. 3: Grad-CAM visualizations for abnormal lower extremities.

Data availability

We are releasing our de-identified test set as part of this manuscript. This dataset includes radiographs from 182 patients and demonstrates class balance across normal and abnormal labels as well as the four types of lower extremity (foot, hip, knee and ankle). In addition, two board-certified radiologists manually refined all labels, which guarantees a high level of accuracy. The dataset is available at

Code availability

Our deep learning training framework is available at:




Acknowledgements

This study was supported by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award no. R01LM012966 and by the Stanford Child Health Research Institute (Stanford NIH-NCATS-CTSA grant #UL1 TR001085). This research used data or services provided by STARR (STAnford medicine Research data Repository), a clinical data warehouse made possible by the Stanford School of Medicine Research Office.

Author information




All authors contributed extensively to this work. M.V., M.L. and R.G. designed the methodology and algorithms, implemented models, analysed results and wrote the manuscript. B.N.P. and M.P.L. oversaw the entire project and helped with study design, methodology development and manuscript writing. N.K. and P.R. provided technical advice and manuscript feedback. J.D. and J.L. contributed to statistical analyses and writing the manuscript. C.B. and K.S. assisted with data collection and labelling. L.F.-F. provided resources and advice.

Corresponding author

Correspondence to Bhavik N. Patel.

Ethics declarations

Competing interests

There was no industry support or other funding for this work. There are no conflicts of interest that pertain specifically to this work. However, some of the authors are consultants for the medical industry. M.P.L. is supported by the National Library of Medicine of the NIH (R01LM012966). B.N.P. has grant support from GE. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or GE. M.P.L.’s activities not related to this Article include positions as shareholder and advisory board member for Segmed Inc. and Bunker Hill. M.V., R.G., M.L., N.K., P.R., J.L. and K.S. are not employees of or consultants for industry and had control of the data and the analysis.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Varma, M., Lu, M., Gardner, R. et al. Automated abnormality detection in lower extremity radiographs using deep learning. Nat Mach Intell 1, 578–583 (2019).
