Association of TLR9 gene polymorphisms with remission in patients with rheumatoid arthritis receiving TNF-α inhibitors and development of machine learning models

Toll-like receptor (TLR)-4 and TLR9 are known to play important roles in the immune system, and several studies have shown their association with the development of rheumatoid arthritis (RA) and the regulation of tumor necrosis factor alpha (TNF-α). However, no study has yet investigated the association between TLR4 or TLR9 gene polymorphisms and remission in RA patients taking TNF-α inhibitors. This study was therefore designed to investigate the effects of polymorphisms in TLR4 and TLR9 on response to TNF-α inhibitors and to train various machine learning models to predict remission. A total of six single nucleotide polymorphisms (SNPs) were investigated. Logistic regression analysis was used to investigate the association between genetic polymorphisms and response to treatment, and various machine learning methods were used to predict remission. After adjusting for covariates, the odds of remission in T-allele carriers of TLR9 rs352139 were about 5 times those in patients with the CC genotype (95% confidence interval (CI) 1.325–19.231, p = 0.018). Among the machine learning algorithms, multivariate logistic regression and elastic net showed the best predictive performance, with an area under the receiver operating characteristic curve (AUROC) of 0.71 (95% CI 0.597–0.823 for both models). This study showed an association between a TLR9 polymorphism (rs352139) and treatment response in RA patients receiving TNF-α inhibitors. Moreover, this study applied various machine learning methods to predict remission, among which elastic net, together with multivariate logistic regression, provided the best model.

www.nature.com/scientificreports/

RA are treated with TNF-α inhibitors; however, the efficacy of these treatments remains uncertain, as several studies have reported that only one-third of patients benefit from the treatment 7,8 . Toll-like receptors (TLRs) play vital roles in both the innate and acquired immune systems 9 , and several studies have shown their association with the development of RA [10][11][12] . Notably, TLRs are known inducers of TNF-α transcription 13 . Triad3A is an E3 ubiquitin-protein ligase that induces degradation of TLR4 and TLR9 14 ; hence, a reduction in endogenous Triad3A results in TLR activation. Because Triad3A acts specifically on TLR4 and TLR9 among the 13 members of the TLR family, the genes encoding TLR4 and TLR9 are important for understanding RA pathogenesis and potential therapeutic interventions [15][16][17][18] . One study showed that TLR4 is specifically required for the production of osteoclastogenic cytokines and is thus involved in the pathophysiology of RA 19 . Moreover, an in vitro study reported that TLR4 is required for TNF-α expression 20 . Another study revealed that TLR9 expression was elevated on circulating and synovial monocyte subsets of RA patients 21 .
Nuclear factor-kappaB (NFkB) is associated with the response to TNF-α inhibitors in autoimmune diseases 22 . Because of this association, several proteins that activate NFkB, including TLRs, have been investigated. As TLRs induce pro-inflammatory cytokines, including TNF-α, and activate transcription factors such as NFkB, their polymorphisms may affect treatment outcomes 23 .
Recently, machine learning methods have been used as tools for clinical decision-making and prediction. Compared with traditional predictive models that rely on a preselected set of variables, machine learning approaches are well suited to developing novel prediction models. Moreover, remission is an important outcome in RA because clinical remission is the treat-to-target goal. Therefore, this study was designed to investigate the effects of polymorphisms in TLR4 and TLR9 on response to TNF-α inhibitors and to train predictive models for remission using various machine learning approaches.

Methods
Study patients. This prospective observational two-center study enrolled 105 patients who were prescribed TNF-α inhibitors (adalimumab, etanercept, golimumab, or infliximab) at Ajou University Hospital and Chungbuk National University Hospital between July 2017 and December 2019. Data on sex, age, weight, height, duration of RA, autoantibody status (rheumatoid factor and anti-cyclic citrullinated peptide antibody), concomitant medications, and comorbidities were collected from electronic medical records. Additionally, baseline data on the disease activity score in 28 joints (DAS-28) and its subcomponents were collected: tender joint count (TJC)-28, swollen joint count (SJC)-28, patient global health (GH), and erythrocyte sedimentation rate (ESR) or C-reactive protein (CRP) level.
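DAS-28 combines the four subcomponents listed above into one score. The sketch below uses the widely used DAS28-ESR and DAS28-CRP formulas. It assumes the conventional cut-off of DAS28 < 2.6 for remission, because this section does not state which remission definition was used. The function names are hypothetical.

```python
import math

def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, gh_0_100: float) -> float:
    """DAS28 with ESR in mm/h; GH is the patient's global health on a 0-100 scale."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * gh_0_100)

def das28_crp(tjc28: int, sjc28: int, crp_mg_l: float, gh_0_100: float) -> float:
    """DAS28 with CRP in mg/L."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.36 * math.log(crp_mg_l + 1) + 0.014 * gh_0_100 + 0.96)

# Assumption: remission is defined by the conventional DAS28 < 2.6 cut-off.
REMISSION_CUTOFF = 2.6

def in_remission(das28: float, cutoff: float = REMISSION_CUTOFF) -> bool:
    return das28 < cutoff

# Usage with hypothetical values (not study data).
score = das28_esr(tjc28=4, sjc28=4, esr_mm_h=math.e, gh_0_100=0)
print(round(score, 2), in_remission(score))  # prints: 2.38 True
```

Each patient would be scored with one of the two formulas, depending on which inflammatory marker was recorded. The two versions are not numerically interchangeable, so pooling them is a design choice that should be made explicit.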
This study was approved by the Institutional Review Boards of the Ajou University Hospital (approval number: AJIRB-BMR-OBS-17-153) and Chungbuk National University Hospital (approval number: 2017-06-011-004). All patients provided written informed consent for participation. This study was conducted according to the principles of the Declaration of Helsinki (2013).
Genomic DNA was isolated from ethylenediaminetetraacetic acid (EDTA)-treated blood samples using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's protocol. Genotyping was performed using a single-base primer extension assay with TaqMan genotyping assays on a real-time PCR system (ABI 7300, Applied Biosystems) according to the manufacturer's recommendations (Supplementary section).

Statistical analysis and machine learning methods. Student's t-test was used to compare continuous variables between patients who achieved a good clinical response (remission) and those who did not. The chi-square test or Fisher's exact test was used to compare categorical variables between the two groups. Multivariable logistic regression analysis was used to identify independent factors affecting remission; factors with a p-value less than 0.05 in univariate analysis, along with clinically relevant confounders, were included in the multivariable analysis. The Hosmer-Lemeshow test was performed to assess the model's goodness of fit.
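The univariate screening step described above can be sketched with SciPy. This is an illustration, not the authors' SPSS procedure. Three points are assumptions:

- The rule for switching from the chi-square test to Fisher's exact test (any expected count below 5) is a common convention, not something the source states.
- The variable names are hypothetical.
- The forced confounders (sex, age) are taken from the Results.

```python
import numpy as np
from scipy import stats

def compare_continuous(x_remission, x_non_remission):
    """Student's t-test (pooled variance) between remission and non-remission groups."""
    t_stat, p_value = stats.ttest_ind(x_remission, x_non_remission, equal_var=True)
    return float(t_stat), float(p_value)

def compare_categorical(table):
    """Pearson chi-square test, or Fisher's exact test for a sparse 2x2 table.

    Assumption: Fisher's test is used when any expected cell count is below 5.
    """
    table = np.asarray(table)
    chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p_fisher = stats.fisher_exact(table)
        return "fisher", float(p_fisher)
    return "chi-square", float(p_chi2)

def select_for_multivariable(univariate_p, forced=("sex", "age"), alpha=0.05):
    """Forced confounders plus every variable with univariate p < alpha."""
    selected = list(forced)
    for name, p in univariate_p.items():
        if p < alpha and name not in selected:
            selected.append(name)
    return selected

# Usage with hypothetical p-values (not study results).
print(select_for_multivariable({"rs352139": 0.01, "bmi": 0.2, "sex": 0.059}))
```

The selected names would then be entered together into the multivariable logistic regression model.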
This study employed a random forest-based classification approach to rank variables by their importance for predicting remission. To limit over-fitting, we selected the seven most important features. Various machine learning methods were then used to predict remission: multivariate logistic regression, elastic net, random forest, and support vector machine (SVM). All methods were implemented with the caret R package (version 6.0-88, https://github.com/topepo/caret/). For each prediction model, we report the area under the receiver operating characteristic curve (AUROC) and its 95% confidence interval (CI). The AUROC measures how well the model discriminates between patients who do and do not achieve remission. A p-value of less than 0.05 was considered statistically significant. Univariate statistical analysis was conducted using IBM SPSS Statistics version 20 (International Business Machines Corp., New York, USA). All other analyses were performed using R software version 3.6.0 (R Foundation for Statistical Computing, Vienna, Austria).

The performance of each machine learning model was measured by internal validation using five-fold cross-validation. The dataset was randomly partitioned into five subsets. In each fold, one subset was held out for model validation while the remaining four were used to train the models. This five-fold cross-validation was repeated 100 times with different random partitions to evaluate the performance of the machine learning models.
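The repeated five-fold cross-validation described above can be sketched in Python with scikit-learn as an analogue of the caret workflow. This is not the authors' code. Several elements are illustrative assumptions:

- The hyperparameters are fixed (C, l1_ratio, number of trees), whereas caret normally tunes them inside the resampling loop.
- A very large C is used to approximate an unpenalized logistic regression.
- The 2.5–97.5 percentile range of fold AUROCs is only one possible way to summarize variability. The source does not say how its 95% CI was derived.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_models(seed=0):
    """Candidate classifiers with fixed, illustrative hyperparameters (no tuning)."""
    return {
        # A very large C approximates an unpenalized multivariable logistic regression.
        "logistic": make_pipeline(StandardScaler(), LogisticRegression(C=1e6, max_iter=1000)),
        # Elastic net mixes L1 and L2 penalties; l1_ratio=0.5 is an arbitrary choice.
        "elastic_net": make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                               C=1.0, max_iter=5000),
        ),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "svm_radial": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    }

def repeated_cv_auroc(X, y, n_repeats=100, seed=0):
    """Mean AUROC over repeated stratified 5-fold CV, with a 2.5-97.5 percentile range."""
    # An integer random_state makes every model see the same partitions.
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    summary = {}
    for name, model in build_models(seed).items():
        # One AUROC per held-out fold: 5 * n_repeats values per model.
        aucs = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
        low, high = np.percentile(aucs, [2.5, 97.5])
        summary[name] = (float(aucs.mean()), float(low), float(high))
    return summary
```

With the default `n_repeats=100`, each model is fitted 500 times. Stratified folds keep the remission proportion similar in every held-out subset, which matters in a cohort where fewer than a third of patients reached remission.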

Results
Among the 105 patients enrolled in this study, 7 were excluded because of incomplete medical data, leaving 98 patients receiving TNF-α inhibitors for analysis. The mean age of the included patients was 53 years (range: 20–82 years), and 79 (80.6%) were female. The mean duration of RA was 9 years, and 29 patients achieved remission. To determine the possible effect of disease status on response to TNF-α inhibitors, baseline DAS-28 and its subcomponents were examined. None differed significantly between the remission and non-remission groups (Table 1). Sex (p = 0.059) and hypertension (p = 0.060) showed marginally significant differences between the groups.
As shown in Table 2, both TLR9 SNPs were significantly associated with RA remission. T-allele carriers of rs352139 and rs352140 were approximately 3.3 and 4.5 times, respectively, more likely to achieve remission than patients with the CC genotype. Results for each SNP by all three genotypes are provided in the Supplementary section (Supplementary Table S2).
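The carrier comparison above uses dominant coding: CT and TT are pooled as T-allele carriers and compared with CC. The sketch below shows that coding and a crude odds ratio with a Woolf (log-scale) 95% CI from a 2×2 table. The function names and all counts are hypothetical. A crude estimate like this is not the adjusted estimate in Table 3, and the source does not state which statistic Table 2 reports.

```python
import math

def t_carrier(genotype: str) -> int:
    """Dominant coding for a C/T SNP: 1 for CT or TT (T-allele carrier), 0 for CC."""
    g = genotype.strip().upper()
    if len(g) != 2 or not set(g) <= {"C", "T"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return int("T" in g)

def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude OR with a Woolf (log-scale) CI for a 2x2 table.

    a: carriers in remission      b: carriers not in remission
    c: CC patients in remission   d: CC patients not in remission
    All four counts must be positive.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)

genotypes = ["CC", "CT", "TT", "CT", "CC"]  # hypothetical genotype calls
print([t_carrier(g) for g in genotypes])    # prints [0, 1, 1, 1, 0]
print(crude_odds_ratio(10, 10, 5, 20))      # hypothetical counts; OR = 4.0
```

Pooling CT with TT gains power when the T allele is uncommon, at the cost of assuming that one copy of the allele has the same effect as two.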
Multivariable analysis (Table 3) included sex, age, and factors with p < 0.05 in the univariate analysis. Because strong linkage disequilibrium was observed between rs352139 and rs352140 (r² = 0.95), only rs352139 was included in the multivariable analysis. Among the included factors, rs352139 was significantly associated with RA remission (95% CI 1.325–19.231, p = 0.018). After adjusting for related covariates, the odds of remission in T-allele carriers of rs352139 were about 5.1 times those in patients with the CC genotype. The Hosmer-Lemeshow test indicated adequate goodness of fit of the multivariable model (χ² = 0.907, 2 degrees of freedom, p = 0.636).
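The reported figures can be cross-checked with simple arithmetic:

- A Wald CI for an odds ratio is symmetric on the log scale, so the point estimate is approximately the geometric mean of the bounds: √(1.325 × 19.231) ≈ 5.05, consistent with "about 5".
- For 2 degrees of freedom, the chi-square tail probability is exp(−χ²/2), which gives about 0.635 for χ² = 0.907. This agrees with the reported p = 0.636 up to rounding of the statistic.

The sketch also includes a generic Hosmer-Lemeshow function with g risk groups and g − 2 degrees of freedom. The reported 2 degrees of freedom therefore corresponds to four groups. The grouping scheme (near-equal groups of sorted predicted risk) and the function names are assumptions, not necessarily SPSS's exact implementation.

```python
import math
import numpy as np
from scipy import stats

def or_from_wald_ci(lower, upper):
    """Point OR implied by a Wald CI, which is symmetric on the log-odds scale."""
    return math.sqrt(lower * upper)

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow statistic over `groups` near-equal groups of sorted predicted risk."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    order = np.argsort(p_hat)
    stat = 0.0
    for idx in np.array_split(order, groups):
        n_g = len(idx)
        observed = y[idx].sum()       # observed remissions in the group
        expected = p_hat[idx].sum()   # expected remissions in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    dof = groups - 2
    return stat, dof, float(stats.chi2.sf(stat, dof))

# Consistency checks against the values reported in the text.
print(round(or_from_wald_ci(1.325, 19.231), 2))  # prints 5.05
print(round(stats.chi2.sf(0.907, 2), 3))         # prints 0.635
```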
As shown in Fig. 1, after feature selection using a five-fold cross-validated random forest approach, four important variables were included in the machine learning models: rs352139, body mass index (BMI), sulfasalazine use, and anti-citrullinated protein/peptide antibody (ACPA). Multivariate logistic regression, elastic net, random forest, and SVM models were trained with five-fold cross-validation, and the average AUROC values across 100 random iterations are shown in Table 4. Multivariate logistic regression, elastic net, and random forest showed acceptable performance, with AUROC values of 0.71, 0.71, and 0.70, respectively (95% CI 0.594–0.827 for both multivariate logistic regression and elastic net, and 0.584–0.821 for random forest). Linear kernel SVM and radial kernel SVM performed less well, with AUROC values of 0.60 and 0.67, respectively (95% CI 0.416–0.782 and 0.53–0.813, respectively). Figure 2 shows the ROC curves of the three best-performing models. Details of the packages and parameters used for training the models are provided in the Supplementary section (Supplementary Table S1).
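The random forest feature ranking described above can be sketched as follows. A forest is fitted on the training part of each of five stratified folds, the importances are averaged, and the top k features are kept. Two points are assumptions:

- The source does not specify the importance metric. The sketch uses scikit-learn's impurity-based `feature_importances_`; permutation importance would be an alternative.
- The feature names and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def rank_features_rf(X, y, feature_names, n_splits=5, seed=0):
    """Average impurity-based importances over the training folds of a stratified CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    importances = np.zeros(X.shape[1])
    for train_idx, _ in cv.split(X, y):
        rf = RandomForestClassifier(n_estimators=300, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        importances += rf.feature_importances_   # each forest's importances sum to 1
    importances /= n_splits
    order = np.argsort(importances)[::-1]        # most important first
    return [(feature_names[i], float(importances[i])) for i in order]

def top_k(ranking, k=4):
    """Names of the k highest-ranked features."""
    return [name for name, _ in ranking[:k]]
```

Restricting the downstream models to a handful of top-ranked features keeps the number of predictors small relative to the 29 remission events, which is the over-fitting concern the Methods raise.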

Discussion
The main finding of this study is that rs352139 of TLR9 was associated with treatment response to TNF-α inhibitors in RA patients. The odds of remission in T-allele carriers of rs352139 were about 5 times those in patients with the CC genotype. Among the machine learning methods tested, multivariate logistic regression and elastic net performed best in predicting remission in patients with RA, with AUROC values of 0.71 (95% CI 0.594–0.827 for both models).
TNF-α is a pro-inflammatory cytokine involved in the innate immune response 30 . It contributes to the pathogenesis of several inflammatory conditions, especially RA. Because TNF-α levels are elevated in patients with RA, TNF-α inhibitors are frequently used to treat RA. Unlike conventional agents for RA therapy, TNF-α inhibitors directly target a cytokine and are used to treat patients with advanced RA.
Damage-associated molecular patterns (DAMPs) are endogenous danger molecules that activate the innate immune system by interacting with TLRs, thereby evoking innate immune responses such as the induction of inflammatory cytokines 31 . DAMPs play an important role in initiating inflammation during tissue injury without infection and may also be involved in chronic inflammation, including autoimmune diseases 12 . Once DAMPs are released during tissue injury, TLRs are activated and the inflammatory cycle is initiated. The binding of DAMPs to TLRs activates the receptors, up-regulating pro-inflammatory mediators, including cytokines, and resulting in various inflammatory conditions and chronic inflammation 12 .
Upon binding their ligands, such as pathogen components, TLRs activate MyD88-dependent pathways 31 , resulting in NFkB activation and cytokine gene expression 10 . This ultimately induces inflammation-associated molecules and the release of pro-inflammatory mediators such as TNF-α 32 . TLRs are known inducers of TNF-α transcription 13 . Several studies have shown increased expression of TLR4 on RA synovial fluid macrophages and RA synovial fibroblasts 33,34 and of TLR9 in RA synovial tissue fibroblasts and RA peripheral blood monocytes 18,35 .
Our results revealed that a TLR9 polymorphism was associated with remission in RA patients taking TNF-α inhibitors. T-allele carriers of rs352139 had a significantly higher remission rate than patients with the CC genotype. TLR9 is expressed by B cells and functions together with the B cell receptor complex, leading to the release of rheumatoid factor 36 . Bank et al. 22 previously reported an association of the GG genotype of rs352139 with nonresponse to TNF-α inhibitors in patients with inflammatory bowel disease, which is in line with our findings. This association may be attributable to altered TLR9 function. However, further research is required to validate our results, as there are no published mechanistic studies on the association between this polymorphism and TNF-α inhibition or treatment response in patients with advanced RA.

To our knowledge, machine learning approaches have not previously been used to predict remission in patients with RA receiving TNF-α inhibitors. In clinical settings, such models could support decision-making. To reduce over-fitting, this study used random forest for feature selection; random forest is an ensemble of bootstrap-aggregated binary classification trees 37 . We also found that multivariate logistic regression and elastic net outperformed the other models. Elastic net is a penalized regression model that combines the lasso and ridge penalties 38 . Hence, these models may be useful for predicting remission in patients on TNF-α inhibitor treatment.
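The elastic net penalty mentioned above can be written out explicitly. The form below follows the glmnet parameterization for a binary outcome, which is the package behind caret's `glmnet` method. Whether the authors used this exact parameterization is an assumption; their settings are in Supplementary Table S1.

```latex
(\hat\beta_0, \hat{\boldsymbol\beta})
  = \arg\min_{\beta_0, \boldsymbol\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol\beta)
    - \log\!\big(1 + e^{\beta_0 + \mathbf{x}_i^{\top}\boldsymbol\beta}\big) \Big]
  + \lambda \Big[ \tfrac{1-\alpha}{2}\,\lVert\boldsymbol\beta\rVert_2^2
    + \alpha\,\lVert\boldsymbol\beta\rVert_1 \Big]
```

The symbols are as follows:

- $y_i \in \{0, 1\}$ indicates remission for patient $i$.
- $\mathbf{x}_i$ holds that patient's selected features.
- $\lambda \ge 0$ sets the overall penalty strength.
- $\alpha \in [0, 1]$ mixes the lasso ($\alpha = 1$) and ridge ($\alpha = 0$) penalties.

Setting $\lambda = 0$ recovers ordinary multivariable logistic regression.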
The limitations of our study are its small sample size and the lack of mechanistic investigation. Nevertheless, to our knowledge, this is the first study to investigate the effects of genetic variations in the TLR4 and TLR9 genes on favorable response rates to RA treatment in patients taking TNF-α inhibitors. Moreover, this study identifies important features and provides machine learning prediction models for remission in patients with RA receiving TNF-α inhibitors, including logistic regression, elastic net, random forest, and SVM. Because these prediction models incorporate a TLR9 gene polymorphism, the results of this study could help develop individually tailored TNF-α inhibitor treatment for patients with RA.