Rheumatoid arthritis (RA) is a severe chronic inflammatory reaction that occurs in the synovium of joints. Mortality hazards are 60%–70% higher in patients with RA than in those without the disease1. Although the exact etiology of RA is still under investigation, several genetic studies have suggested a role of genetic factors2, 3. The most well-known genetic risk factors for RA are variations in human leukocyte antigen (HLA) genes, especially the HLA-DRB1 gene4. However, many other genes with potential links to RA remain to be investigated in order to discover further genetic risk factors and therapeutic variations for RA.

Tumor necrosis factor alpha (TNF-α) inhibitors play important roles in inflammatory states, including RA5. There are five TNF-α inhibitors available for RA treatment (adalimumab, certolizumab, etanercept, golimumab, and infliximab), and clinical efficacies in RA are known to be similar among these agents6. Patients with advanced RA are treated with TNF-α inhibitors; however, the efficacy of these treatments is still questionable as several studies have reported that only one-third of the patients benefit from the treatment7, 8.

Toll-like receptors (TLRs) play vital roles in both innate and acquired immune systems9, and several studies have shown their association with the development of RA10,11,12. Notably, TLRs are known as inducers of TNF-α transcription13. Triad3A is an E3 ubiquitin–protein ligase that induces degradation of TLR4 and TLR914. Hence, reduction in endogenous Triad3A results in TLR activation. Since Triad3A acts specifically on TLR4 and TLR9 among the 13 members of the TLR family, the genes encoding TLR4 and TLR9 are important for understanding RA pathogenesis and potential therapeutic intervention15,16,17,18. One study showed that TLR4 is specifically required for production of osteoclastogenic cytokines and is thus involved in the pathophysiology of RA19. Moreover, an in vitro study reported that TLR4 is required for TNF-α expression20. Another study revealed that TLR9 levels were elevated on circulating and synovial monocyte subsets of RA patients21.

Nuclear factor-kappaB (NFkB) is associated with the response to TNF-α inhibitors in autoimmune diseases22. Due to this association, several proteins activating NFkB have been discovered and investigated, including TLRs. As TLRs activate pro-inflammatory cytokines including TNF-α and transcription factors such as NFkB, their polymorphisms may potentially affect treatment outcomes23.

Recently, machine learning methods have been utilized as tools for decision making and clinical prediction. Compared with traditional predictive models that use selected variables for calculation, machine learning approaches are favorable when developing novel prediction models. Moreover, remission is an important outcome in RA because clinical remission is considered a treat-to-target goal. Therefore, this study was designed to investigate the effects of polymorphisms in TLR4 and TLR9 on response to TNF-α inhibitors and to train predictive models for remission using various machine learning approaches.


Study patients

This prospective observational two-center study enrolled 105 patients who were prescribed TNF-α inhibitors (adalimumab, etanercept, golimumab, or infliximab) at Ajou University Hospital and Chungbuk National University Hospital between July 2017 and December 2019. Data on sex, age, weight, height, duration of RA, autoantibodies (rheumatoid factor and anti-cyclic citrullinated peptide), concomitant medications, and comorbidities were collected from electronic medical records. Additionally, baseline data on disease activity score (DAS)-28 and its subcomponents, which included tender joint count (TJC)-28, swollen joint count (SJC)-28, global health (GH), and erythrocyte sedimentation rate (ESR) or C-reactive protein levels, were collected.

A good clinical response to anti-TNF therapy was defined on the basis of DAS-28 scores. Patients with a DAS-28 score of less than 2.6 after 6 months of TNF-α inhibitor therapy were considered to be in remission24. DAS-28 was calculated as follows24: DAS-28 = 0.56 × √(TJC28) + 0.28 × √(SJC28) + 0.70 × ln(ESR) + 0.014 × GH.
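The DAS-28 formula and the remission cutoff described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the units (ESR in mm/h, GH on a 0–100 scale) are the usual DAS-28 conventions rather than something stated in this section, and the example patient values are invented.

```python
import math

REMISSION_CUTOFF = 2.6  # DAS-28 below this value is treated as remission (from the text)


def das28_esr(tjc28, sjc28, esr, gh):
    """DAS-28 from tender/swollen joint counts, ESR and global health.

    Assumed units (not stated in the text): ESR in mm/h, GH on a 0-100 scale.
    """
    if esr <= 0:
        raise ValueError("ESR must be positive so that ln(ESR) is defined")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr)
            + 0.014 * gh)


def in_remission(score):
    """Apply the remission definition used in this study (DAS-28 < 2.6)."""
    return score < REMISSION_CUTOFF


# Hypothetical patient values, for illustration only.
example_score = das28_esr(tjc28=1, sjc28=0, esr=10, gh=20)
example_remission = in_remission(example_score)
```

Because ESR enters through a logarithm and the joint counts through square roots, changes at low values move the score more than the same changes at high values.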

This study was approved by the Institutional Review Boards of the Ajou University Hospital (approval number: AJIRB-BMR-OBS-17-153) and Chungbuk National University Hospital (approval number: 2017-06-011-004). All patients submitted written informed consents for participation. This study was conducted according to the principles of the Declaration of Helsinki (2013).

Genotyping methods

To select single nucleotide polymorphisms (SNPs) of TLR4 and TLR9 that might be associated with RA remission, genetic information on TLR4 and TLR9 was obtained from the PharmGKB database, Haploreg 4.1, the NCBI Database of SNPs (dbSNP), and previous studies22, 25,26,27,28,29. A total of six SNPs, including four SNPs of TLR4 (rs11536889, rs1927907, rs1927911, and rs2149356) and two SNPs of TLR9 (rs352139 and rs352140), were selected. Tag SNPs were chosen with a minor allele frequency (MAF) of ≥ 25% in Japanese and Han Chinese populations using Haploview 4.2. Among the selected SNPs, TLR4 SNPs rs1927907 and rs1927911 and TLR9 SNP rs352139 were previously studied in autoimmune-related conditions19, 25, 26.
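As a rough illustration of the MAF criterion, the sketch below computes a minor allele frequency from genotype counts and keeps SNPs at or above the 25% threshold. It does not reproduce Haploview's tag-SNP procedure; the SNP labels and genotype counts are hypothetical placeholders, not data from this study.

```python
MAF_THRESHOLD = 0.25  # selection threshold stated in the text


def minor_allele_frequency(n_hom_a, n_het, n_hom_b):
    """MAF from counts of the AA, AB and BB genotypes."""
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if total_alleles == 0:
        raise ValueError("at least one genotype is required")
    freq_b = (n_het + 2 * n_hom_b) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq_b, 1.0 - freq_b)


# Hypothetical genotype counts (AA, AB, BB), for illustration only.
genotype_counts = {
    "snp_A": (50, 40, 10),
    "snp_B": (85, 14, 1),
}

selected = [name for name, counts in genotype_counts.items()
            if minor_allele_frequency(*counts) >= MAF_THRESHOLD]
```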

Genomic DNA of the patients was isolated from ethylenediaminetetraacetic acid (EDTA)–blood samples using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer’s protocol. Genotyping was performed using a single-base primer extension assay with TaqMan genotyping assay in a real-time PCR system (ABI 7300, ABI), according to the manufacturer’s recommendations (Supplementary section).

Statistical analysis and machine learning methods

Student’s t-test was used to compare continuous variables between patients who showed good clinical response (remission) and those who did not. Chi-square test or Fisher’s exact test was used to compare categorical variables between the two groups. Multivariable logistic regression analysis was used to examine independent factors affecting remission; factors with a p-value less than 0.05 in univariate analysis along with clinically relevant confounders were included in multivariable analysis. The Hosmer–Lemeshow test was performed to confirm the model’s goodness of fit.
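For readers unfamiliar with Fisher's exact test, a minimal standard-library sketch for a 2 × 2 table is given below. It is not the SPSS procedure used in the study; it sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is the common two-sided definition. The tables passed to it in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical tables, for illustration only.
p_balanced = fisher_exact_two_sided(3, 1, 1, 3)
p_extreme = fisher_exact_two_sided(5, 0, 0, 5)
```

The exact test is generally preferred over the chi-square test when expected cell counts are small, as can happen when genotype groups are split by remission status in a cohort of this size.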

This study employed a random forest–based classification approach to analyze the importance of different variables affecting remission. To prevent over-fitting, we selected the seven most important features. Various machine learning methods, including multivariate logistic regression, elastic net, random forest, and support vector machine (SVM), were used to predict remission. All methods were implemented with the caret R package (version 6.0-88). The area under the receiver operating characteristic curve (AUROC) and its 95% confidence interval (CI) were reported for each machine learning model to assess its ability to predict remission. A p-value of less than 0.05 was considered statistically significant. Univariate statistical analysis was conducted using IBM SPSS statistics, version 20 software (International Business Machines Corp., New York, USA). All other analyses were performed using R software version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).
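The AUROC has a simple interpretation that can be sketched directly: it is the probability that a randomly chosen patient in remission receives a higher predicted score than a randomly chosen patient not in remission, with ties counted as one half. The function below is an illustrative standard-library implementation of that definition, not the caret routine used in the study; the labels and scores are hypothetical, and no confidence interval is computed because the CI method is not described in this section.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = remission) and predicted scores, for illustration only.
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.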

To measure the performance of each machine learning model, internal validation was performed. The dataset was randomly divided for model development and evaluation: after partitioning the data into five subsets, one subset was held out for model validation while the remaining four subsets were used to fit the machine learning models. The five-fold cross-validation was repeated 100 times to evaluate the predictive power of the models.
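The resampling scheme described above can be sketched as a generator of train/validation index splits. This is an assumption-laden illustration, not the caret implementation: the function name, the seed, and the plain (unstratified) shuffling are choices made here, and the sample size of 98 corresponds to the number of patients analyzed in this study.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (train_indices, validation_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            validation = folds[held_out]
            train = [j for f, fold in enumerate(folds) if f != held_out
                     for j in fold]
            yield train, validation


# 5 folds x 100 repetitions over the 98 analyzed patients.
splits = list(repeated_kfold(n_samples=98))
```

Averaging a performance metric over all splits reduces the dependence of the estimate on any single random partition, which matters for a dataset of this size.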


Among the 105 patients enrolled in this study, 7 patients were excluded due to incomplete medical data. The data from 98 patients receiving TNF-α inhibitors were analyzed. The mean age of the included patients was 53 years (range: 20–82 years), and there were 79 (80.6%) females. The mean duration of RA was 9 years, and 29 patients reached remission. To determine the possible effect of disease status on response to TNF-α inhibitors, baseline DAS-28 and its subcomponents were examined. Baseline DAS-28 and its subcomponents did not differ significantly between the remission and non-remission groups (Table 1). Marginal significance was found for sex (p = 0.059) and hypertension (p = 0.060).

Table 1 Patient characteristics according to the response at 6 months treatment of TNF inhibitors.

As shown in Table 2, statistically significant associations between genotypes and RA remission were found for both TLR9 SNPs: T-allele carriers of rs352139 and rs352140 experienced approximately 3.3 and 4.5 times more frequent remission than patients with the CC genotype, respectively. A table of the SNPs by all three genotypes is provided in the Supplementary section (Supplementary Table S2).

Table 2 Genotype association with the remission at 6 months treatment of TNF inhibitors.

Multivariable analysis (Table 3) included sex, age, and factors with p < 0.05 from the univariate analysis. Because significant linkage disequilibrium was observed between rs352139 and rs352140 (r² = 0.95), only rs352139 was included in the multivariable analysis. Among the included factors, rs352139 was significantly associated with RA remission (95% CI 1.325–19.231, p = 0.018). After adjusting for related covariates, the remission rate in T-allele carriers of rs352139 was about 5.1 times that in patients with the CC genotype. The Hosmer–Lemeshow test showed that the fit of the multivariable analysis model was satisfactory (χ² = 0.907, 2 degrees of freedom, p = 0.636).
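The reported interval and p-value for rs352139 can be cross-checked against each other. The sketch below assumes a Wald-type interval, which is symmetric on the log scale; the interval type is an assumption made here, since the text does not state how the CI was computed. Under that assumption, the point estimate is the geometric mean of the CI limits, and the standard error follows from the CI width.

```python
import math
from statistics import NormalDist

# Reported 95% CI for rs352139 in the multivariable model.
ci_low, ci_high = 1.325, 19.231

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(0.975)  # about 1.96 for a 95% interval

# Assumption: Wald interval, i.e. exp(beta +/- z_crit * SE).
log_estimate = (math.log(ci_low) + math.log(ci_high)) / 2
std_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)

point_estimate = math.exp(log_estimate)
z_stat = log_estimate / std_error
p_value = 2 * (1 - std_normal.cdf(z_stat))
```

Under this assumption the back-calculated estimate is about 5.05 and the two-sided p-value about 0.018, which is in line with the reported values.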

Table 3 Multivariate analysis to identify predictors for the remission rate at 6 months treatment of TNF inhibitors.

As shown in Fig. 1, after feature selection using a five-fold cross-validated random forest approach, four important variables (rs352139, body mass index (BMI), sulfasalazine, and anti-citrullinated protein/peptide antibody (AC-PA)) were included in the machine learning models. The average AUROC values across 100 random iterations of five-fold cross-validated multivariate logistic regression, elastic net, random forest, and SVM models are shown in Table 4. The AUROC values for multivariate logistic regression, elastic net, and random forest indicated good model performance: 0.71, 0.71, and 0.70, respectively (95% CI 0.594–0.827 for the multivariate logistic regression and elastic net models and 0.584–0.821 for the random forest model). The linear kernel SVM and radial kernel SVM showed sub-optimal performance, with AUROC values of 0.60 and 0.67, respectively (95% CI 0.416–0.782 and 0.53–0.813, respectively). Figure 2 shows the AUROC curves of the three models that exhibited good interpretability and predictive performance. Details of the packages and parameters used for training the models are provided in the Supplementary section (Supplementary Table S1).

Figure 1

Variable importance using random forest to predict remission in patients with RA receiving TNF-α inhibitors. Figure was drawn using caret R package version 6.0-88.

Table 4 Comparisons of AUC for logistic regression, elastic net, random forest, and SVM models.
Figure 2

The receiver operating characteristic curves for predictive performance of elastic net (ENET), logistic regression (LR), and random forest (RF) models. Figure was drawn using caret R package version 6.0-88.


The main finding of this study is that rs352139 of TLR9 was associated with treatment response to TNF-α inhibitors in RA patients. The remission rate in T-allele carriers of rs352139 was about 5 times that in patients with the CC genotype. Multivariate logistic regression and elastic net were the best-performing methods for predicting remission in patients with RA, with AUROC values of 0.71 (95% CI 0.594–0.827 for both models).

TNF-α is a pro-inflammatory cytokine involved in the innate immune response30. It is involved in the pathogenesis of several inflammatory conditions, especially RA. As the TNF-α level is elevated in patients with RA, TNF-α inhibitors have been frequently used to treat RA. Unlike other agents for RA therapy, TNF-α inhibitors target cytokines and are used to treat patients with advanced RA.

Damage-associated molecular patterns (DAMPs) are endogenous danger molecules that activate the innate immune system by interacting with TLRs. This evokes innate immune responses, including induction of inflammatory cytokines31. DAMPs play an important role in the initiation of inflammation during tissue injury without infection and may also be involved in chronic inflammation, including autoimmune diseases12. Once DAMPs are released during tissue injury, TLRs are activated, and the inflammatory cycle is initiated. The binding of TLRs to DAMPs activates the receptors, up-regulating pro-inflammatory mediators including cytokines and resulting in various inflammatory conditions and chronic inflammation12.

Ligand-bound TLRs interact with elements on the surface of pathogens and activate the MyD88-related pathways31, resulting in NFkB activation and cytokine gene expression10. This ultimately leads to the induction of molecules associated with inflammation and release of pro-inflammatory components such as TNF-α32. TLRs are known as inducers of TNF-α transcription13. Several studies have shown an increased expression of TLR4 on RA synovial fluid macrophages and RA synovial fibroblasts33, 34 and of TLR9 in RA synovial tissue fibroblasts and RA peripheral blood monocytes18, 35.

Our results revealed that TLR9 polymorphism was associated with the remission rate of RA patients taking TNF-α inhibitors. The T-allele carriers of rs352139 had a significantly higher remission rate than patients with the CC genotype. TLR9 is expressed by B cells and functions with the B cell receptor complex, resulting in the release of rheumatoid factor36. Previously, Bank et al.22 have reported an association of the GG genotype of rs352139 with nonresponse to TNF-α inhibitors in inflammatory bowel disease patients, which is in line with our findings. This association is possibly attributable to alteration in TLR function; however, further research is required to validate our results, as there are no published mechanistic studies on the association between this polymorphism and TNF-α inhibition or treatment response in advanced RA patients.

The use of machine learning approaches to predict remission in patients with RA receiving TNF-α inhibitors is novel in clinical research. In clinical settings, these models can be helpful in the decision-making process. To overcome over-fitting, this study used random forest, an ensemble method of bootstrap-aggregated binary classification trees37, for feature selection. We also demonstrated that multivariate logistic regression and elastic net, a penalized linear regression model that combines the penalties of the lasso and ridge methods38, outperformed the other models. Hence, these models may be useful in predicting remission in patients on TNF-α inhibitor treatment.
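For reference, the way elastic net combines the lasso and ridge penalties can be written out explicitly. The form below is one common parameterization for a binary outcome with logistic loss (the one used by the glmnet software, for example); whether the models in this study used exactly this parameterization is an assumption, and the symbols λ (overall penalty strength) and α (mixing weight) are introduced here for illustration.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
    \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \Big[\, \alpha\,\lVert \beta \rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert \beta \rVert_2^{2} \Big]
```

Setting α = 1 recovers the lasso penalty and α = 0 the ridge penalty; intermediate values shrink coefficients while still allowing some to be set exactly to zero.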

The limitations of our study are its small sample size and the lack of a detailed mechanism. Nevertheless, to our knowledge, this is the first study to investigate the effects of genetic variations in the TLR4 and TLR9 genes on favorable response rates to RA treatment in patients taking TNF-α inhibitors. Moreover, this study provides important features and prediction models based on machine learning algorithms, including logistic regression, elastic net, random forest, and SVM, for remission in patients with RA receiving TNF-α inhibitors. Since our study developed prediction models using TLR4 and TLR9 gene polymorphisms for remission of RA in patients taking TNF-α inhibitors, the results of this study could be used to develop and design individually tailored TNF-α inhibitor treatments for RA patients.