Improved diagnosis of rheumatoid arthritis using an artificial neural network

Rheumatoid arthritis (RA) is a chronic systemic disease that can cause joint damage, disability and destructive polyarthritis. Current diagnosis of RA is based on a combination of clinical and laboratory features. However, RA can be difficult to diagnose at onset because its symptoms overlap with those of other forms of arthritis, even though early recognition and diagnosis allow better management of patients. To improve the medical diagnosis of RA and to evaluate the effects of different clinical features on RA diagnosis, we applied an artificial neural network (ANN) as the training algorithm and used fivefold cross-validation to evaluate its performance. From each sample, we obtained data on 6 features: age, sex, rheumatoid factor, anti-cyclic citrullinated peptide (anti-CCP) antibody, 14-3-3η, and anti-carbamylated protein (anti-CarP) antibodies. After training, the ANN model assigned each sample a probability of being an RA patient or a non-RA patient. On the validation dataset, the ANN model achieved an F1 score of 0.916, higher than the 0.906 we previously reported using an optimal threshold algorithm. Therefore, this ANN algorithm not only improved the accuracy of RA diagnosis but also revealed that anti-CCP had the greatest effect on RA diagnosis, while age and anti-CarP had a weaker effect.

Rheumatoid arthritis (RA), a chronic multisystem autoimmune disease, is characterized by persistent inflammatory synovitis and subsequent erosion of joint structures. The etiology of this complex disease involves both genetic and environmental risk factors 1 . RA diagnosis generally relies on two laboratory indicators: rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) antibody. However, a patient may have RA even if both indicators are negative; conversely, a positive result for one of the indicators does not necessarily mean that the patient has RA.
In a previous study, we showed that in the Han population of Northern China, anti-CarP and 14-3-3η protein are valuable indicators of RA, and that combining them with RF and anti-CCP maximizes detection accuracy 2 . However, diagnosis usually relies on the two indicators above, and other factors such as age and sex are ignored. Moreover, rheumatologists routinely use the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for diagnosis, but some RA cases do not meet these criteria 3 . Therefore, we sought more effective methods that integrate various clinical indicators to further improve the accuracy of RA diagnosis.
In recent years, artificial intelligence (AI) has made great breakthroughs in a variety of scientific areas. In some clinical settings, computer programs have outperformed humans in interpreting medical images 4 . Deep learning is a sub-discipline of AI, and its application to medical image interpretation has gradually expanded. In some fields computer analysis is more efficient than human researchers; for example, AI is widely used to analyze magnetic resonance imaging data and predict early RA 5 . Deep learning has a wide range of applications in computer vision and plays an important role in analyzing imaging data for many diseases (e.g., melanoma, retinopathy, and metastatic breast cancer). Recurrent neural networks, a subcategory of deep learning, represent the state of the art for longitudinal prediction from electronic health record data 6 . Developing AI-based models that integrate multiple items of patient data has shown great potential to improve diagnostic accuracy and thereby yield clinical benefits 7 . Fukae and colleagues transformed various kinds of clinical information from patients into two-dimensional images and then fine-tuned convolutional neural networks (CNNs) to determine whether the patients had RA. This work laid the foundation for applying deep learning to the diagnosis of RA 3

Methods
Variables used in the model.
Briefly, we considered 6 features (age, sex, rheumatoid factor (RF), anti-CCP, 14-3-3η, and anti-CarP) for each patient sample. RF was measured by rate-turbidimetric immunoassay using the IMMAGE 800 Immunochemistry System (Beckman Coulter, USA). Anti-CCP was measured by electrochemiluminescence assay (ECLA) using the ROCHE COBAS E601 (Roche Diagnostics GmbH, Germany).
Serum levels of anti-CarP antibodies and 14-3-3η protein were determined by light-initiated chemiluminescent assay (LiCA) using the LiCA 500 Immunoassay System (ChIVD Chemclin Diagnostics Corp., China). All data were interpreted in accordance with the manufacturers' guidelines.

Mathematical models.
We used the open-source Python toolkit scikit-learn for feature engineering, model construction, and model validation 10 . We evaluated the following models: (1) artificial neural networks (with 1 or 2 hidden layers); (2) logistic regression; (3) random forest; (4) k-nearest neighbors; (5) support vector machine; (6) Gaussian naïve Bayes; and (7) gradient boosting classifier. Hyperparameters were tuned one at a time: for each hyperparameter, we fixed the others, tested a graded series of values, and selected the value that gave the best performance.
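The model comparison and one-at-a-time hyperparameter tuning described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's code: the data are a synthetic placeholder from `make_classification`, mean cross-validated AUC is assumed as the selection criterion, and the grids for `hidden_layer_sizes` and `alpha` are hypothetical values chosen only for the example.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic placeholder for the six-feature data (NOT the study cohort).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)
CV = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def cv_auc(estimator, X, y):
    """Mean fivefold cross-validated AUC, with z-scoring fitted inside each training fold."""
    pipe = make_pipeline(StandardScaler(), estimator)
    return cross_val_score(pipe, X, y, cv=CV, scoring="roc_auc").mean()


def mlp(**params):
    """ANN (multilayer perceptron) with fixed training settings (assumed values)."""
    return MLPClassifier(max_iter=1000, random_state=0, **params)


candidates = {
    "ANN, 1 hidden layer": mlp(hidden_layer_sizes=(9,)),
    "ANN, 2 hidden layers": mlp(hidden_layer_sizes=(9, 4)),
    "Logistic regression": LogisticRegression(),
    "Random forest": RandomForestClassifier(random_state=0),
    "K nearest neighbors": KNeighborsClassifier(),
    "Support vector machine": SVC(),
    "Gaussian naive Bayes": GaussianNB(),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {name: cv_auc(est, X, y) for name, est in candidates.items()}


def coordinate_search(grid, base, X, y):
    """Tune one hyperparameter at a time, holding the others at their current best values."""
    params = dict(base)
    for name, values in grid.items():
        trial_scores = {v: cv_auc(mlp(**{**params, name: v}), X, y) for v in values}
        params[name] = max(trial_scores, key=trial_scores.get)
    return params


# Hypothetical grids, for illustration only.
best_params = coordinate_search(
    grid={"hidden_layer_sizes": [(9,), (9, 4), (16, 8)], "alpha": [1e-4, 1e-2]},
    base={"hidden_layer_sizes": (9,), "alpha": 1e-4},
    X=X, y=y,
)
for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} CV AUC = {auc:.3f}")
print("Tuned ANN hyperparameters:", best_params)
```

Tuning one hyperparameter at a time is cheaper than an exhaustive grid, but it can miss interactions between hyperparameters; scikit-learn's `GridSearchCV` would search the joint grid instead.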

Feasibility verification.
For both feature selection and model selection, performance was evaluated using stratified fivefold cross-validation: the original data were divided into 5 equal parts, each with the same ratio of positive to negative examples as the full data set. In each round, the model was trained on 4 parts and tested on the remaining part.

Feature engineering.
Feature engineering comprised feature normalization, feature selection, and feature importance evaluation. For normalization, we used a z-score standard scaler. For feature selection, we used best subset selection; that is, all possible combinations of features were tested and the best was selected. Feature selection was also performed with Boruta 11 , which compares each real feature against randomly shuffled copies ("shadow" features). Inspired by Boruta, we evaluated feature importance in our perceptron-based ANN model by replacing each real feature in turn with its shuffled shadow feature and re-training the model; the importance score was the sum of the resulting reductions in accuracy and in area under the curve (AUC).

www.nature.com/scientificreports/

Statistical analysis.
Statistical analysis was performed using GraphPad Prism 8 (GraphPad Software Inc., San Diego, CA, USA). Quantitative variables were expressed as the mean ± standard deviation or the 95% confidence interval, and categorical variables as frequency and percentage. Accuracy, F1, precision, and recall were calculated from the 2 × 2 confusion matrix, and AUC was calculated from the receiver operating characteristic (ROC) curve. p < 0.05 was considered statistically significant.
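The stratified fivefold cross-validation, z-score normalization, and exhaustive best subset selection described above might be sketched as follows. The data are a synthetic stand-in, and logistic regression is used as a fast scorer for the 63 feature subsets; the text does not say which model scored the subsets, so that choice is an assumption.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "sex", "RF", "anti-CCP", "14-3-3eta", "anti-CarP"]

# Synthetic placeholder data (~57% negatives, close to 379 non-RA : 291 RA).
X, y = make_classification(n_samples=670, n_features=6, n_informative=4,
                           weights=[0.57], random_state=1)

# Stratified folds: every fold keeps the positive/negative ratio of the full data set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)


def subset_auc(cols):
    """Mean fivefold AUC using only the feature columns listed in `cols`."""
    # The z-score scaler is refit on each training fold inside the pipeline.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    return cross_val_score(model, X[:, list(cols)], y, cv=cv, scoring="roc_auc").mean()


# Best subset selection: score all 2**6 - 1 = 63 non-empty feature combinations.
results = {
    cols: subset_auc(cols)
    for k in range(1, len(FEATURES) + 1)
    for cols in combinations(range(len(FEATURES)), k)
}

# Keep the highest-scoring subset for each subset size.
best_by_size = {}
for cols, auc in results.items():
    k = len(cols)
    if k not in best_by_size or auc > results[best_by_size[k]]:
        best_by_size[k] = cols

for k, cols in sorted(best_by_size.items()):
    print(k, [FEATURES[i] for i in cols], f"CV AUC = {results[cols]:.3f}")
```

Fitting the scaler inside the pipeline standardizes each fold using its training portion only, so test-fold statistics do not leak into training. Exhaustive search is feasible here because 6 features give only 63 subsets; the count doubles with every added feature.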

Results
All six features play important roles in RA diagnosis. To determine which features to use in our model, we applied best subset selection and tried all combinations of the 6 features (Fig. 1A). Each grey dot indicates one combination, and the best subset for each number of features is colored red; model AUC increased as the number of features increased. We also used Boruta to compare the importance of each feature with that of the shadow features: every feature registered a hit, that is, outperformed the best shadow feature, in every iteration (Fig. 1B), indicating that all features are important. Finally, we evaluated feature importance in our perceptron-based ANN model (Fig. 1C): anti-CCP was the most important feature, and anti-CarP and age also received appreciable scores, indicating a weaker but evident influence.
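The shuffled-feature importance used for Fig. 1C (replace one feature with a permuted shadow copy, retrain, and add the drops in accuracy and AUC) can be sketched as below. Assumptions, none taken from the study's code: synthetic data whose columns are only labeled with the study's feature names, the 9-4 two-hidden-layer ANN reported for the final model, a single stratified 2/3 : 1/3 split, and a 0.5 probability cut-off for accuracy.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Names attached to arbitrary synthetic columns, for readability only.
FEATURES = ["age", "sex", "RF", "anti-CCP", "14-3-3eta", "anti-CarP"]
rng = np.random.default_rng(0)

# Synthetic placeholder data with an assumed stratified 2/3 : 1/3 train/test split.
X, y = make_classification(n_samples=670, n_features=6, n_informative=4, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3, stratify=y, random_state=2)


def fit_and_score(X_train, X_test):
    """Train a 9-4 two-hidden-layer ANN and return (accuracy, AUC) on the held-out split."""
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(9, 4), max_iter=1000, random_state=0),
    )
    model.fit(X_train, y_tr)
    prob = model.predict_proba(X_test)[:, 1]  # probability of the positive (RA) class
    return accuracy_score(y_te, (prob >= 0.5).astype(int)), roc_auc_score(y_te, prob)


base_acc, base_auc = fit_and_score(X_tr, X_te)
importance = {}
for j, name in enumerate(FEATURES):
    # Replace feature j with a shuffled "shadow" copy in both splits, then retrain from scratch.
    X_tr_s, X_te_s = X_tr.copy(), X_te.copy()
    X_tr_s[:, j] = rng.permutation(X_tr_s[:, j])
    X_te_s[:, j] = rng.permutation(X_te_s[:, j])
    acc, auc = fit_and_score(X_tr_s, X_te_s)
    # Importance = drop in accuracy + drop in AUC (values near zero mean little signal).
    importance[name] = (base_acc - acc) + (base_auc - auc)

for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} importance = {score:+.3f}")
```

Unlike scikit-learn's `permutation_importance`, which shuffles features of an already-fitted model, this sketch retrains the ANN on the shuffled data, matching the retraining step described in the Methods.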
ANN with 2 hidden layers performs best among machine learning methods. We then tested the different machine learning models with different structures. The cross-validation results for all models are shown in Table 2 and confirm that the ANN with 2 hidden layers performed best among the machine learning methods. The best result was obtained with 9 neurons in the first hidden layer and 4 neurons in the second (Fig. 2).
Table 2. Machine learning methods performance evaluation. Columns: cross-validation accuracy (± SD) and cross-validation AUC (± SD) for each model.
The ANN predicts RA diagnosis more accurately than the threshold algorithm. We then asked how the ANN model performs compared with the threshold algorithm. The dataset was randomly divided into a training set (2/3; n = 447: 194 RA and 253 non-RA) and a validation set (1/3; n = 223: 97 RA and 126 non-RA), and all evaluations were performed on the validation set. The receiver operating characteristic (ROC) curve of the ANN output (Fig. 3B) gave an AUC of 0.951 (95% CI [0.921, 0.981]), whereas the ROC curve of the previous threshold algorithm (Fig. 3A) gave an AUC of 0.878 (95% CI [0.826, 0.930]). The confusion matrices are shown in Table 3, and the precision, recall, F1, and accuracy calculated from them are given in Table 4. Although the recall of the ANN method was slightly lower than that of the threshold method, its precision, F1, and accuracy were higher, and its AUC also indicated a satisfactory classifier. To understand how misclassifications arose, we compared the baseline characteristics of the four groups defined by our ANN classifier, true negative (TN), true positive (TP), false positive (FP), and false negative (FN), in Table 5. The FN cases showed few signs in the traditional indicators, RF and anti-CCP, and only limited signs in the new indicators, 14-3-3η and anti-CarP. In the FP cases, each indicator was about twice as high as in the TN cases. These baseline characteristics indicate that such errors are hard to avoid and that our model correctly predicted most of the cases.
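Table 4 derives precision, recall, F1, and accuracy from the confusion matrices in Table 3. The standard formulas are shown below in plain Python; the counts in the usage line are hypothetical and are not the study's values. AUC, by contrast, needs the continuous ANN outputs and cannot be recovered from a 2 × 2 table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # RA predicted as RA
    fp: int  # non-RA predicted as RA
    fn: int  # RA predicted as non-RA
    tn: int  # non-RA predicted as non-RA

    def precision(self) -> float:
        # Fraction of predicted RA cases that truly have RA.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Fraction of true RA cases that were detected (sensitivity).
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, equal to 2TP / (2TP + FP + FN).
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


cm = ConfusionMatrix(tp=8, fp=2, fn=4, tn=6)  # hypothetical counts
print(f"precision={cm.precision():.3f} recall={cm.recall():.3f} "
      f"F1={cm.f1():.3f} accuracy={cm.accuracy():.3f}")
# precision=0.800 recall=0.667 F1=0.727 accuracy=0.700
```

Because F1 balances precision against recall, a classifier can have slightly lower recall yet higher precision and F1, which is the pattern reported for the ANN relative to the threshold algorithm.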

Discussion
Technological advances in image processing and analysis have laid a solid foundation for the automatic detection and diagnosis of RA. Machine learning and deep learning methods can automatically apply a threshold to their confidence levels to make predictions, and they have been used to generate objective, disease-specific RA markers of patient mobility between clinical visits 12 . In this study, we introduced an ANN into the diagnosis of RA, which integrates all features to increase diagnostic accuracy and reduces the loss of indicator information caused by dichotomizing each indicator at a threshold. This ANN algorithm achieved a higher prediction accuracy (90.6%) than the threshold algorithm (88.8%) 2 . Among these features, anti-CCP had the greatest influence, while age and anti-CarP had a weaker but evident influence on RA diagnosis, highlighting age as a factor in RA diagnosis that was not previously recognized.
AI-based paradigms are useful for accurate tissue characterization and risk stratification in RA patients. For Doppler ultrasound images, neural network techniques can be used to score disease activity 13 . Machine learning- and deep learning-based techniques not only automate the risk characterization process but also provide accurate cardiovascular risk stratification for better management of RA patients 14 . A deep learning algorithm has also been used to define and analyze specific grades of synovitis for determining the nature of arthritis 15 . In addition, others have used pixel information from hand radiographs to design a multi-layer CNN architecture with online data augmentation and evaluated its accuracy, sensitivity, specificity, and precision for the diagnosis of RA 16 . The application of CNNs may reduce diagnostic effort by saving analysis time and allowing automated data screening 17 . Admittedly, the ANN is a relatively basic form of machine learning that works well when the number of features is small; however, so few features often do not fully reflect a patient's condition. If more clinical information, such as images, symptoms, or even self-assessments, is integrated into the features, combining the ANN with other machine learning algorithms may further improve the accuracy and efficiency of diagnosis for RA and other diseases.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.