Machine-learning-based patient-specific prediction models for knee osteoarthritis

Jamshidi, Afshin; Pelletier, Jean-Pierre; Martel-Pelletier, Johanne

doi:10.1038/s41584-018-0130-5

Perspective
Published: 06 December 2018

OPINION

Nature Reviews Rheumatology volume 15, pages 49–60 (2019)

Abstract

Osteoarthritis (OA) is an extremely common musculoskeletal disease. However, current guidelines are not well suited for diagnosing patients in the early stages of disease and do not discriminate patients for whom the disease might progress rapidly. The most important hurdle in OA management is identifying and classifying patients who will benefit most from treatment. Further efforts are needed in patient subgrouping and developing prediction models. Conventional statistical modelling approaches exist; however, these models are limited in the amount of information they can adequately process. Comprehensive patient-specific prediction models need to be developed. Approaches such as data mining and machine learning should aid in the development of such models. Although a challenging task, technology is now available that should enable subgrouping of patients with OA and lead to improved clinical decision-making and precision medicine.

**Fig. 1: A generic scheme for clinical prediction modelling.**

Acknowledgements

The authors thank V. Wallis for her assistance with manuscript preparation.

Author information

Authors and Affiliations

Osteoarthritis Research Unit, University of Montreal Hospital Research Centre (CRCHUM), Montreal, Quebec, Canada
Afshin Jamshidi, Jean-Pierre Pelletier & Johanne Martel-Pelletier

Contributions

J.M.-P. and A.J. researched data for the article. All authors wrote the article, made substantial contribution to discussions of the content and reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Johanne Martel-Pelletier.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table

Glossary

Artificial intelligence: The process of creating systems that can learn from experience and adjust to new inputs in order to perform human-like tasks. Machine-learning is a fundamental concept of artificial intelligence.
Calibration: Calibration measurements represent the level of accuracy of a model in estimating the absolute risk (that is, the agreement between the observed and predicted risk). Poorly calibrated models will underestimate or overestimate the outcome of interest.
Classification models: In statistics and machine-learning, classification is the process of identifying the category of a new observation on the basis of a training set of data containing observations for which the category (outcome value) is known. In the field of osteoarthritis, an example could be classification of patients into slow progressors and fast progressors on the basis of several input variables.
Deep-learning: A subfield of machine-learning that is based on advanced artificial neural networks; this field has enabled doctors in different fields of medicine to obtain a precise 3D understanding of 2D images.
Discrimination: Discrimination measurements identify to what extent a model discriminates items of different classes (for example, individuals with disease and without disease). For binary outcomes, the receiver operating characteristic curve or C-statistic could be applied for discrimination measurement.
Feature selection: Feature selection refers to the process of obtaining a subset of variables from an original set of variables according to certain feature selection criteria. The feature selection step precedes the learning step of a prediction model and good feature selection results can improve the learning accuracy, reduce learning time and simplify learning results.
Generalizability: Refers to the accuracy with which a prediction model developed from one study population can be used for the population at large.
Imputation: In machine-learning and statistics, imputation is the process of replacing missing data with substituted values to avoid bias or inaccuracies in the results.
Interpretability: Model interpretability describes the ability of the user to understand the model, which includes understanding the relationships between the input and outcome variables (for example, knowing how the selected input variables contribute to the outcome variable).
Regression models: Regression is the process of predicting the numerical value of the outcome for a new observation on the basis of a training set of data containing observations for which the outcome value is known. In the field of osteoarthritis, an example could be predicting the probability of disease.
Semi-supervised learning: Semi-supervised learning is typically when only a small amount of data are labelled (that is, have both input and output variables) and a large amount are unlabelled (that is, have only input data); this method falls between unsupervised learning and supervised learning.
Supervised learning: In supervised learning, input variables (x) and an output variable (y) are available, and an algorithm is used to learn the mapping function from the input to the output, y = f(x).
Training: The training for machine learning involves providing a machine-learning algorithm with training data (input and outcome variables) to learn from. The learning algorithm finds patterns in the training data such that the input parameters correspond to the target. Machine-learning models are applied to do predictions on new data for which the outcome value is not known (for example, to determine to which class the new observation belongs).
Unsupervised learning: In unsupervised learning, only input data (x) exist and there are no corresponding output variables. The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
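
The Discrimination and Calibration entries above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the article: the function names, the use of the Brier score and the toy data are assumptions chosen for the example. It computes the C-statistic for a binary outcome by comparing every case with every non-case, together with two simple checks of agreement between predicted and observed risk.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: fraction of (case, non-case) pairs in which the case
    received the higher predicted risk; tied predictions count as 0.5."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration: mean predicted risk minus observed event rate.
    Negative values mean the model underestimates risk on average."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome
    (lower is better); an assumed choice of summary, not from the article."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = fast progressor, 0 = slow progressor.
outcomes = [0, 0, 1, 1, 0, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

print("C-statistic:", c_statistic(outcomes, predicted_risk))
print("Calibration-in-the-large:", calibration_in_the_large(outcomes, predicted_risk))
print("Brier score:", brier_score(outcomes, predicted_risk))
```

A model can rank patients well (a high C-statistic) and still be poorly calibrated, which is why the two properties are reported separately. In practice, an established library implementation would normally replace these hand-written functions.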

About this article

Cite this article

Jamshidi, A., Pelletier, JP. & Martel-Pelletier, J. Machine-learning-based patient-specific prediction models for knee osteoarthritis. Nat Rev Rheumatol 15, 49–60 (2019). https://doi.org/10.1038/s41584-018-0130-5

Issue Date: January 2019
DOI: https://doi.org/10.1038/s41584-018-0130-5
