An introduction to machine learning and analysis of its use in rheumatic diseases

Kingsmore, Kathryn M.; Puglisi, Christopher E.; Grammer, Amrie C.; Lipsky, Peter E.

doi:10.1038/s41584-021-00708-w

Review Article
Published: 02 November 2021

Nature Reviews Rheumatology volume 17, pages 710–730 (2021)

Abstract

Machine learning (ML) is a computerized analytical technique that is being increasingly employed in biomedicine. ML often provides an advantage over explicitly programmed strategies in the analysis of multidimensional information by recognizing relationships in the data that were not previously appreciated. As such, the use of ML in rheumatology is increasing, and numerous studies have employed ML to classify patients with rheumatic autoimmune inflammatory diseases (RAIDs) from medical records and imaging, biometric or gene expression data. However, these studies are limited by sample size, the accuracy of sample labelling, and absence of datasets for external validation. In addition, there is potential for ML models to overfit or underfit the data and, thereby, these models might produce results that cannot be replicated in an unrelated dataset. In this Review, we introduce the basic principles of ML and discuss its current strengths and weaknesses in the classification of patients with RAIDs. Moreover, we highlight the successful analysis of the same type of input data (for example, medical records) with different algorithms, illustrating the potential plasticity of this analytical approach. Altogether, a better understanding of ML and the future application of advanced analytical techniques based on this approach, coupled with the increasing availability of biomedical data, may facilitate the development of meaningful precision medicine for patients with RAIDs.

Key points

Appropriate application of machine learning (ML) algorithms and model construction, including that using data from patients with rheumatic autoimmune inflammatory diseases (RAIDs), involves preprocessing, feature selection, comparisons of multiple models to determine which is most appropriate for the data, and proper validation.
ML has been applied to various types of data from patients with RAIDs, including medical records and imaging data to classify patients, sequencing data to predict genetic risk loci, biometric data to identify disease activity, transcriptomic data to classify or cluster patient subtypes, and demographic, genetic and genomic data to predict treatment response.
Most published studies that describe the employment of ML in RAIDs, however, only serve as proof-of-principle studies as they lack adequate sample sizes or external test datasets; consequently, clinical translation of ML in rheumatology is in a nascent stage.
Current ML studies provide hypotheses that can be validated in large retrospective datasets or used to design prospective trials characterized by correct data collection and sample sizes that are suitable for the application of ML.
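
The workflow named in the first key point (preprocessing, feature selection, comparison of several models, and validation) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the data are synthetic, the two classifiers (nearest centroid and 3-nearest neighbours) and the one-feature filter are stand-ins chosen for brevity, and every function name is hypothetical rather than taken from the article. The point it tries to show is that scaling and feature selection are learned inside each training fold, and that a held-out set is used once, after the model has been chosen.

```python
import random
import statistics


def make_data(n_per_class=40, seed=0):
    """Hypothetical two-class data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            rows.append(([6.0 * label + rng.gauss(0, 1), rng.gauss(0, 1)], label))
    rng.shuffle(rows)
    return [x for x, _ in rows], [y for _, y in rows]


def fit_preprocessing(X, y):
    """Learn scaling and a one-feature filter from the training data only."""
    n_features = len(X[0])
    means = [statistics.mean([r[j] for r in X]) for j in range(n_features)]
    sds = [statistics.pstdev([r[j] for r in X]) or 1.0 for j in range(n_features)]

    def scale(row):
        return [(row[j] - means[j]) / sds[j] for j in range(n_features)]

    def class_gap(j):
        # Filter criterion: distance between the two class means of feature j.
        by_class = [[scale(r)[j] for r, t in zip(X, y) if t == c] for c in (0, 1)]
        return abs(statistics.mean(by_class[1]) - statistics.mean(by_class[0]))

    best = max(range(n_features), key=class_gap)
    return (lambda row: [scale(row)[best]]), best


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid(X, y):
    centroids = {}
    for c in set(y):
        members = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return lambda row: min(centroids, key=lambda c: squared_distance(row, centroids[c]))


def knn3(X, y):
    def predict(row):
        nearest = sorted(zip(X, y), key=lambda p: squared_distance(row, p[0]))[:3]
        return round(sum(t for _, t in nearest) / 3)  # majority vote of 3 labels
    return predict


def accuracy(train_X, train_y, test_X, test_y, make_model):
    # Preprocessing is fitted on the training part only, then applied to both.
    transform, _ = fit_preprocessing(train_X, train_y)
    predict = make_model([transform(r) for r in train_X], train_y)
    hits = sum(predict(transform(r)) == t for r, t in zip(test_X, test_y))
    return hits / len(test_y)


def cross_validate(X, y, make_model, k=5):
    scores = []
    for fold in range(k):
        test_idx = list(range(fold, len(X), k))
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scores.append(accuracy([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], [y[i] for i in test_idx],
                               make_model))
    return statistics.mean(scores)


X, y = make_data()
# The last 20 samples play the role of an external test set (an assumption).
X_dev, y_dev, X_ext, y_ext = X[:60], y[:60], X[60:], y[60:]
models = {"nearest centroid": nearest_centroid, "3-nearest neighbours": knn3}
cv_scores = {name: cross_validate(X_dev, y_dev, m) for name, m in models.items()}
best_name = max(cv_scores, key=cv_scores.get)
external_accuracy = accuracy(X_dev, y_dev, X_ext, y_ext, models[best_name])
print(cv_scores, best_name, external_accuracy)
```

With real clinical data the held-out set would ideally come from an unrelated cohort; splitting one synthetic dataset, as here, only mimics that step.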

**Fig. 1: The output of a machine learning model is classification, regression or clustering.**

**Fig. 2: Machine learning model workflow.**

**Fig. 3: Guidelines for selecting the most appropriate machine learning algorithm.**

**Fig. 4: Receiver operating characteristic curves are used to assess binary classification performance.**
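
As a rough illustration of what the curve in Fig. 4 summarises, the sketch below sweeps a decision threshold over predicted scores, records the false-positive and true-positive rates at each step, and integrates the curve with the trapezoidal rule to obtain the area under the curve (AUC). The labels and scores are made up for illustration, and `roc_points` and `auc` are hypothetical helper names rather than anything taken from the article.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) pairs for a binary task.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Samples sharing a score cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) pairs."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 1 = patient, 0 = control; higher score = more likely patient.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(labels, scores))
print(round(example_auc, 3))  # 8 of the 9 patient-control pairs are ranked correctly
```

An AUC of 1.0 corresponds to perfect ranking of patients above controls, and 0.5 to ranking no better than chance.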

Article CAS PubMed Google Scholar
Li, Y. et al. A genome-wide association study in Han Chinese identifies a susceptibility locus for primary Sjögren’s syndrome at 7q11.23. Nat. Genet. 45, 1361–1365 (2013).
Article CAS PubMed Google Scholar
Almlöf, J. C. et al. Novel risk genes for systemic lupus erythematosus predicted by random forest classification. Sci. Rep. 7, 6236 (2017).
Article PubMed PubMed Central Google Scholar
Briggs, F. B. S. et al. Supervised machine learning and logistic regression identifies novel epistatic risk factors with PTPN22 for rheumatoid arthritis. Genes. Immun. 11, 199–208 (2010).
Article CAS PubMed PubMed Central Google Scholar
Glaser, B. et al. Analyses of single marker and pairwise effects of candidate loci for rheumatoid arthritis using logistic regression and random forests. BMC Proc. 1, S54 (2007).
Article PubMed PubMed Central Google Scholar
Croiseau, P. & Cordell, H. J. Analysis of North American Rheumatoid Arthritis Consortium data using a penalized logistic regression approach. BMC Proc. 3, S61 (2009).
Article PubMed PubMed Central Google Scholar
Vignal, C. M., Bansal, A. T. & Balding, D. J. Using penalised logistic regression to fine map HLA variants for rheumatoid arthritis. Ann. Hum. Genet. 75, 655–664 (2011).
Article PubMed Google Scholar
Bartoloni, E. et al. Application of artificial neural network analysis in the evaluation of cardiovascular risk in primary Sjögren’s syndrome: a novel pathogenetic scenario? Clin. Exp. Rheumatol. 37, S133–S139 (2019).
Google Scholar
Navarini, L. et al. A machine-learning approach to cardiovascular risk prediction in psoriatic arthritis. Rheumatology 59, 1767–1769 (2020).
Article PubMed Google Scholar
Navarini, L. et al. Cardiovascular risk prediction in ankylosing spondylitis: from traditional scores to machine learning assessment. Rheumatol. Ther. 7, 867–882 (2020).
Article PubMed PubMed Central Google Scholar
Ravenell, R. L. et al. Premature atherosclerosis is associated with hypovitaminosis D and angiotensin-converting enzyme inhibitor non-use in lupus patients. Am. J. Med. Sci. 344, 268–273 (2012).
Article PubMed PubMed Central Google Scholar
Reddy, B. K. & Delen, D. Predicting hospital readmission for lupus patients: an RNN-LSTM-based deep-learning methodology. Comput. Biol. Med. 101, 199–209 (2018).
Article PubMed Google Scholar
Hong, S. et al. Longitudinal profiling of human blood transcriptome in healthy and lupus pregnancy. J. Exp. Med. 216, 1154–1169 (2019).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. et al. Machine learning for prediction and risk stratification of lupus nephritis renal flare. Am. J. Nephrol. 52, 152–160 (2021).
Article CAS PubMed Google Scholar
Babajide Mustapha, I. & Saeed, F. Bioactive molecule prediction using extreme gradient boosting. Molecules 21, 983 (2016).
Article PubMed Central Google Scholar
Nair, N. & Wilson, A. G. Can machine learning predict responses to TNF inhibitors? Nat. Rev. Rheumatol. 15, 702–704 (2019).
Article PubMed Google Scholar
Plenge, R. M. et al. Crowdsourcing genetic prediction of clinical utility in the rheumatoid arthritis responder challenge. Nat. Genet. 45, 468–469 (2013).
Article CAS PubMed PubMed Central Google Scholar
Tao, W. et al. Multiomics and machine learning accurately predict clinical response to adalimumab and etanercept therapy in patients with rheumatoid arthritis. Arthritis Rheumatol. 73, 212–222 (2021).
Article CAS PubMed Google Scholar
Plant, D. & Barton, A. Machine learning in precision medicine: lessons to learn. Nat. Rev. Rheumatol. 17, 5–6 (2021).
Article PubMed Google Scholar
Van Looy, D. et al. Comparing statistics with machine learning models to predict dose increase of infliximab for rheumatoid arthritis patients. in Proc. 9th IASTED Int. Conf. Artif. Intell. Soft Computing, ASC 195–200 (ACTA Press, 2005).
Lee, S. et al. Machine learning to predict early TNF inhibitor users in patients with ankylosing spondylitis. Sci. Rep. 10, 20299 (2020).
Article CAS PubMed PubMed Central Google Scholar
Seridi, L. et al. OP0161 association of baseline cytotoxic gene expression with ustekinumab response in systemic lupus erythematosus. Ann. Rheum. Dis. 79, 101–102 (2020).
Article Google Scholar
Gottlieb, A. B. et al. Secukinumab efficacy in psoriatic arthritis. JCR 27, 239–247 (2021).
PubMed Google Scholar
Wolf, B. J. et al. Development of biomarker models to predict outcomes in lupus nephritis. Arthritis Rheumatol. 68, 1955–1963 (2016).
Article CAS PubMed PubMed Central Google Scholar
Vodencarevic, A. et al. Advanced machine learning for predicting individual risk of flares in rheumatoid arthritis patients tapering biologic drugs. Arthritis Res. Ther. 23, 67 (2021).
Article CAS PubMed PubMed Central Google Scholar
Patrick, M. T. et al. Drug repurposing prediction for immune-mediated cutaneous diseases using a word-embedding-based machine learning approach. J. Invest. Dermatol. 139, 683–691 (2019).
Article CAS PubMed Google Scholar
Ekins, S. et al. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 18, 435–441 (2019).
Article CAS PubMed PubMed Central Google Scholar
Lavecchia, A. Deep learning in drug discovery: opportunities, challenges and future prospects. Drug. Discov. Today 24, 2017–2032 (2019).
Article PubMed Google Scholar
Kuang, Z. et al. A machine-learning-based drug repurposing approach using baseline regularization. Methods Mol. Biol. 1903, 255–267 (2019).
Article CAS PubMed PubMed Central Google Scholar
Zeng, X. et al. DeepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics 35, 5191–5198 (2019).
Article CAS PubMed PubMed Central Google Scholar
Xu, R. & Wang, Q. Q. Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature. J. Biomed. Inform. 51, 191–199 (2014).
Article PubMed PubMed Central Google Scholar
Bresso, E. et al. Integrative relational machine-learning for understanding drug side-effect profiles. BMC Bioinforma. 14, 207 (2013).
Article Google Scholar
Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm. 13, 2524–2530 (2016).
Article CAS PubMed PubMed Central Google Scholar
Grammer, A. C. & Lipsky, P. E. Drug repositioning strategies for the identification of novel therapies for rheumatic autoimmune inflammatory diseases. Rheum. Dis. Clin. North. Am. 43, 467–480 (2017).
Article PubMed Google Scholar
Figgett, W. A. et al. Machine learning applied to whole-blood RNA-sequencing data uncovers distinct subsets of patients with systemic lupus erythematosus. Clin. Transl. Immunol. 8, e01093 (2019).
Article Google Scholar
Catalina, M. D., Owen, K. A., Labonte, A. C., Grammer, A. C. & Lipsky, P. E. The pathogenesis of systemic lupus erythematosus: harnessing big data to understand the molecular basis of lupus. J. Autoimmun. 110, 102359 (2020).
Article CAS PubMed Google Scholar
Guthridge, J. M. et al. Adults with systemic lupus exhibit distinct molecular phenotypes in a cross-sectional study. EClinicalMedicine 20, 100291 (2020).
Article PubMed PubMed Central Google Scholar
Lu, Z., Li, W., Tang, Y., Da, Z. & Li, X. Lymphocyte subset clustering analysis in treatment-naive patients with systemic lupus erythematosus. Clin. Rheumatol. 40, 1835–1842 (2021).
Article PubMed Google Scholar
Spielmann, L. et al. Anti-Ku syndrome with elevated CK and anti-Ku syndrome with anti-dsDNA are two distinct entities with different outcomes. Ann. Rheum. Dis. 78, 1101–1106 (2019).
Article CAS PubMed Google Scholar
Pinal-Fernandez, I. & Mammen, A. L. On using machine learning algorithms to define clinically meaningful patient subgroups. Ann. Rheum. Dis. 79, e128 (2020).
Article PubMed Google Scholar
Baldini, C., Ferro, F., Luciano, N., Bombardieri, S. & Grossi, E. Artificial neural networks help to identify disease subsets and to predict lymphoma in primary Sjögren’s syndrome. Clin. Exp. Rheumatol. 36, S137–S144 (2018).
Google Scholar
Delgadillo, J. Machine learning: a primer for psychotherapy researchers. Psychother. Res. 31, 1–4 (2021).
Article PubMed Google Scholar
Breck, E., Polyzotis, N., Roy, S., Whang, S. E. & Zinkevich, M. Data Validation for Machine Learning. in Proceedings of the 2nd SysML Conference (Palo Alto Networks, 2019).
Kubat, M. A Simple Machine-Learning Task. in An Introduction to Machine Learning (Springer International Publishing, 2017).
Van Der Aalst, W. M. P. et al. Process mining: a two-step approach to balance between underfitting and overfitting. Softw. Syst. Model. 9, 87–111 (2010).
Article Google Scholar
Schaffer, C. Overfitting avoidance as bias. Mach. Learn. 10, 153–178 (1993).
Article Google Scholar
Adadi, A. & Berrada, M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 6, 52138–52160 (2018).
Article Google Scholar
Tjoa, E. & Guan, C. A Survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Trans. Neural Netw. Learn Syst. https://doi.org/10.1109/TNNLS.2020.3027314 (2020).
Kingsford, C. & Salzberg, S. L. What are decision trees? Nat. Biotechnol. 26, 1011–1012 (2008).
Article CAS PubMed PubMed Central Google Scholar
Doran, D., Schulz, S. & Besold, T. R. What does explainable AI really mean? A new conceptualization of perspectives. in CEUR Workshop Proceedings Vol. 2071 (CEUR-WS, 2018).
Cruz Rivera, S. et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Dig. Health 2, e549–e560 (2020).
Article Google Scholar
Burmester, G. R. Rheumatology 4.0: big data, wearables and diagnosis by computer. Ann. Rheum. Dis. 77, 963–965 (2018).
Article PubMed Google Scholar
Pandit, A. & Radstake, T. R. D. J. Machine learning in rheumatology approaches the clinic. Nat. Rev. Rheumatol. 16, 69–70 (2020).
Article PubMed Google Scholar
Yang, S. & Berdine, G. The receiver operating characteristic (ROC) curve. Southwest. Respir. Crit. Care Chron. 5, 34 (2017).
Article Google Scholar

Download references

Acknowledgements

The authors thank P. Bachali, S. Shrotri, K. Bell, and J. Kain for helpful discussion about machine learning concepts. The authors thank Dr. C. Nantasenamat for allowing us to modify his figure about the workflow of ML. This work was supported by funding from the RILITE Foundation.

Author information

Authors and Affiliations

AMPEL BioSolutions and RILITE Research Institute, Charlottesville, VA, USA
Kathryn M. Kingsmore, Christopher E. Puglisi, Amrie C. Grammer & Peter E. Lipsky

Contributions

K. M. K. and C. E. P. researched data for the article. K. M. K., C. E. P., A. C. G. and P. E. L. contributed substantially to discussion of the content. K. M. K., C. E. P. and P. E. L. wrote the article. K. M. K., C. E. P. and P. E. L. reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Kathryn M. Kingsmore.

Ethics declarations

Competing interests

K.M.K. and C.E.P. were employed by AMPEL BioSolutions, LLC, during the preparation of this work. K.M.K. was additionally employed by the RILITE Research Institute during the preparation of this work. A.C.G. and P.E.L. are the founders of AMPEL BioSolutions, LLC. The authors declare that the content of this manuscript is not related to AMPEL BioSolutions, LLC’s commercial activities. AMPEL uses machine learning as one technique in its analysis pipelines, but does not have a proprietary interest in machine learning as a technology or a commercial interest in a specific classifier, regressor or clustering approach. All of the material described in the manuscript is freely available in the public domain.

Additional information

Peer review information

Nature Reviews Rheumatology thanks M. Krusche and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Machine learning (ML): A subset of artificial intelligence that uses software to predict outcomes and recognize relationships in data without explicit programming of each step.
Algorithms: Mathematical or computational methods that can be applied to data to form a model.
Model: A framework built upon input data that can classify, regress or cluster.
Statistical modelling: A model that relies on explicitly programmed mathematical functions to explain relationships in data.
Classification: Prediction of a categorical outcome.
Regression: Prediction of a quantitative outcome.
Clustering: Grouping of data points with similar characteristics.
Labelled: Data for which the class or outcome value is known.
Class: A group with a label that is produced from classification.
Clusters: Groups without a label that are produced from clustering.
Supervised models: Models trained on labelled data that are used to predict classes or quantitative values.
Unsupervised models: Models trained on unlabelled data that are used to find associations and patterns that result in groups of similar samples.
Imputation: A method of replacing missing values with data points.
Data scaling: The process of transforming data into a format that a computer algorithm can use, which can also involve normalization.
Feature selection: The process of selecting the best set of variables to be used as input for the model.
Natural language processing (NLP): A data scaling process, and also a branch of ML, that allows computers to interpret human language.
Dimensionality reduction: The process of reducing the number of input variables (features).
Variance: Error as a result of the fluctuations in the observations, or how much the observations differ from the average value.
Biased: A biased model is one that fails to capture underlying patterns in data and thus there is a difference between the true values and the values predicted by the model.
Decision trees: Supervised method that asks a series of ‘yes or no’ questions with labelled data to classify or regress.
Clustering algorithms: Unsupervised methods that assign observations to subsets using mathematically calculated distances.
Neural networks: Supervised or unsupervised methods that build a series of networks to predict or classify. They are so named because the structure of the model is intended to mimic the way in which a human brain operates.
Ensemble algorithms: Supervised methods that aggregate several predictors from multiple machine learning models (for example, random forest).
Bagging: Algorithm that generates training sets by sampling of the training data with replacement to generate individual models that are characteristic of the sample, which are then aggregated to build a final model.
Boosting: Algorithm that adds an additional simpler model to minimize the existing error during each iteration of a supervised model.
Bayesian algorithms: Supervised methods that solve classification problems by predicting the most probable hypothesis, given the input data (for example, naive Bayes).
Instance-based: Supervised methods that memorize instances seen in training to make predictions (for example, support vector machines and k-nearest neighbours).
Regression algorithms: Supervised methods that use linear or polynomial functions for or as a fundamental part of prediction (for example, linear regression and logistic regression).
Regularization algorithms: A type of supervised regression method that shrinks coefficient estimates towards zero to avoid overfitting (for example, least absolute shrinkage and selection operator and ridge regression).
Hyperparameters: Variables that must be set prior to model construction by the user or by software default and can then be tuned during model construction to maximize accuracy.
Parameters: Variables that are ‘learned’ during model construction. Parameters differ between algorithms based on algorithm architecture.
Training dataset: The dataset used by supervised models to ‘learn’ to predict an outcome by viewing both the input and output variables in the data.
Validation dataset: A portion of the training dataset that is withdrawn to give an estimate of fit while tuning model parameters, or a separate dataset used to estimate model fit and tune parameters.
Holdout: The process of reserving some samples for training and some for validation from a single dataset.
k-fold cross-validation: An extension of model validation that partitions the data into complementary subsets when training, to perform parallel analyses on each subset.
Sensitivity: The proportion of the actual positives that are correctly identified. Also known as the true positive rate.
Specificity: The proportion of the actual negatives that are correctly identified. Also known as the true negative rate.
Receiver operating characteristic (ROC) curve: A plot of sensitivity against 1 − specificity that is used to assess the performance of a binary classifier.
Area under the curve (AUC): Generally refers to the area under the ROC curve, so it can also be referred to as the area under the ROC curve (AUROC).
Testing dataset: An independent dataset that is used to provide an unbiased evaluation of the final model fit.
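
As an illustration of the sensitivity, specificity and AUC entries in this glossary, the following minimal Python sketch computes all three for a small invented set of labels and scores. The function names, the example data and the 0.5 decision threshold are assumptions made for illustration only and are not taken from any study discussed in this Review. The AUC is calculated here as the proportion of pairs of one positive and one negative sample in which the positive sample receives the higher score (ties counted as one half), which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    """True positive rate: correctly identified positives / all actual positives."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negative rate: correctly identified negatives / all actual negatives."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented example: 1 = disease, 0 = control; scores are model outputs in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens = sensitivity(y_true, y_pred)
spec = specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)
```

With these invented values, three of the four positives and three of the four negatives are classified correctly, so sensitivity and specificity are both 0.75, and 15 of the 16 positive and negative pairs are ranked correctly, giving an AUC of 0.9375. Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold.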
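
The holdout and k-fold cross-validation entries in this glossary can be sketched in a similar way. The Python example below works on sample indices only and uses hypothetical helper names (`holdout_split`, `k_fold_splits`); the 25% validation fraction, the choice of five folds and the fixed random seed are assumptions for illustration. In the k-fold version, each fold serves once as the validation subset while the remaining folds together form the training subset.

```python
import random


def holdout_split(n_samples, validation_fraction=0.25, seed=0):
    """Shuffle sample indices and reserve a fraction of them for validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * validation_fraction))
    return indices[n_val:], indices[:n_val]  # (training, validation)


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k complementary folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Holdout: 20 samples, 25% reserved for validation.
train_idx, val_idx = holdout_split(20)

# 5-fold cross-validation on 10 samples: every sample is validated exactly once.
splits = list(k_fold_splits(10, k=5))
for training, validation in splits:
    print(sorted(training), sorted(validation))
```

In practice, a model would be fitted on each training subset and scored on the matching validation subset, and the k scores would then be averaged; an independent testing dataset would still be needed for the final, unbiased evaluation.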

About this article

Kingsmore, K.M., Puglisi, C.E., Grammer, A.C. et al. An introduction to machine learning and analysis of its use in rheumatic diseases. Nat Rev Rheumatol 17, 710–730 (2021). https://doi.org/10.1038/s41584-021-00708-w

Accepted: 04 October 2021
Published: 02 November 2021
Issue Date: December 2021
DOI: https://doi.org/10.1038/s41584-021-00708-w
