Genome-Wide Data reveals Novel Genes for Methotrexate Response in a Large Cohort of Juvenile Idiopathic Arthritis Cases

Clinical response to methotrexate (MTX) treatment for children with juvenile idiopathic arthritis (JIA) displays considerable heterogeneity. Currently, there are no reliable predictors to identify non-responders: earlier identification could lead to targeted treatment. We genotyped 759 JIA cases from the UK, Netherlands and Czech Republic. Clinical variables were measured at baseline and 6 months after start of treatment. In Phase I analysis samples were analysed for association with MTX response using ordinal regression of ACR-pedi categories and linear regression of change in clinical variables, and identified 31 genetic regions (P<0.001). Phase II analysis increased SNP density in the most strongly associated regions, identifying 14 regions (P<1×10−5): three contain genes of particular biological interest (ZMIZ1, TGIF1 and CFTR). These data suggest a role for novel pathways in MTX response and further investigations within associated regions will help reach our goal of predicting response to MTX in JIA.


INTRODUCTION
Juvenile Idiopathic Arthritis (JIA) is a heterogeneous condition with variable outcome and considerable ongoing disease burden [1]. Studies indicate that functional disability and complications due to JIA are still common in many teenagers and young people with JIA, and that the effects of early, uncontrolled inflammation may cause irreversible damage to joints and other tissues [2][3][4]. Thus improving long-term outcomes of children with JIA remains a critical challenge. Recent studies in JIA have indicated that early control of joint inflammation correlates with much improved outcomes, suggesting an early 'window of opportunity' when disease control can translate to profound long-term benefit [5]. It is known that not all children respond equally well to any given therapy. Despite increasing availability of new therapeutic options for treating inflammation in JIA, clinicians have no validated tools to help predict likelihood of good response to a particular drug. Therefore the current treatment strategy is to offer disease-modifying drugs in a sequential approach, with choices typically driven more by cost or safety profile than by scientific evidence.
The first line disease-modifying agent for JIA is methotrexate (MTX). Although MTX has proven efficacy in randomised trials and a good long-term safety record, response to MTX displays considerable heterogeneity in JIA with a significant 'non-response' rate of 35% or more of cases [6]. Additionally, consensus concerning level of 'response' that is considered Europe PMC Funders Author Manuscripts acceptable has shifted, with a target of complete control of inflammation now being advocated [7]. These developments combined with ever increasing availability of newer biologic treatments, provide further imperative for the discovery of biomarkers to aid the identification of children who require early aggressive therapy, compared to those who can achieve clinically inactive disease on MTX alone.
Response to treatment is thought to be a complex trait involving multiple genetic variants and environmental factors [8]. However, to date genetic studies of MTX response have been limited for both JIA and rheumatoid arthritis (RA) and have only utilised a candidate gene approach, focussing largely on genes affecting MTX transport and metabolism, enzymes influenced by MTX and adenosine pathways [9][10][11][12][13][14][15][16], recently reviewed in [17].
Given that the mechanisms of action of MTX in JIA are poorly understood, candidate gene studies may miss key pathways of mechanistic importance. Pharmacogenetic studies in other diseases have shown that genes other than those directly involved in known drug pathways often play key roles in variation of drug response [18]. In order to capture the genetic component more comprehensively, we brought together an International Consortium of investigators (the CHARMS-JIA GWAS International Consortium), and employed a genome wide approach to study response to MTX in a large cohort of children with JIA.

Study Population
A cohort of children was recruited for the SPARKS-CHARM (CHildhood Arthritis Response to Medication) Study which has the overall aim to improve understanding of the variability in response to treatment observed in children with JIA and ultimately define a multifactorial model of response outcomes [14], and through the CHARMS-JIA GWAS International Consortium. All cases fulfilled the International League of Associations for Rheumatology (ILAR) criteria for JIA and were about to start new MTX treatment for active arthritis [14]. The study had full ethical approval and was fully compliant with the Declaration of Helsinki; parents provided fully informed consent, and patients provided ageappropriate assent. Samples and data were collected using the same inclusion and exclusion criteria at Great Ormond Street Hospital London, Birmingham Children's Hospital, Department of Paediatrics and Adolescent Medicine Charles University Prague, Wilhelmina Children's Hospital and University Medical Centre Utrecht, and also as part of the Childhood Arthritis Prospective Study (CAPS), a prospective inception cohort study of JIA cases from five centres across the UK [19].
A total of 759 individuals were included from all subtypes of JIA, classified according to ILAR criteria [20]. Demographic and clinical data were collected at baseline (up to 4 weeks before beginning MTX treatment) and again after 6 months (median 6.2 months, range 4-8 months) of MTX treatment. MTX was given orally or subcutaneously at 10-15 mg/m 2 per week (median 11.3 mg/m 2 per week). No other DMARDS were taken concurrently. Steroid treatment was recorded if administered at any point between baseline and follow-up. 370 children received at least one form of steroid medication: this comprised oral prednisolone in 204 children, pulsed iv methylprednisolone was taken by 99 children, while intra-articular joint injections were given to 204 children. Despite patients coming from different regions in Europe, the indications and protocols for use of MTX in JIA were the same across centres.
Clinical data included the six core set variables; erythrocyte sedimentation rate (ESR), childhood health assessment questionnaire (CHAQ) 0-3 [21], active joint count (AJC), limited joint count (LJC), physician's global assessment on a visual analogue scale (PhysVAS) 0-10cm, and the parent/patient global assessment (ParVAS) 0-10cm. As this is an observational study, missing data differed for each core set variable. These variables were used to categorise patients according to the American College of Rheumatology paediatric (ACR-pedi) 30, 50, and 70 improvement criteria, or as non-responders [22]. Note that all children who reach ACR-pedi70 automatically also reach ACR-pedi30 and 50, while those who achieve ACR-pedi50 also achieve ACR-pedi30. In order not to count any child more than once, we defined the level of response for each child by the highest level of response achieved (ACR-pedi 30, 50 or 70).

Genotyping
Samples were genotyped using the Illumina HumanOmniExpress infinium array, according to Illumina's protocols in Manchester, UK. The default Illumina clustering algorithm (GenTrain2.0) was used to cluster SNPs in the software package GenomeStudio. SNPs were excluded if they had a call rate <98% and a cluster separation score of <0.4. Samples were then excluded for call rate <98%, incompatible recorded and genotype inferred gender, duplicates and evidence of identity by descent, or those with outlying heterozygosity. Combining the samples with data from HapMap Phase 3 individuals, principal component (PC) analysis was performed using Eigensoft v4.2 to identify extreme ethnic outliers [23;24]. PC analysis was performed on a subset of SNPs with minor allele frequency (MAF) ≥0.05, selected by removing SNPs in known regions of high linkage disequilibrium (LD) [25] and further pruned for LD between markers. Samples failing to cluster with European HapMap individuals were visually identified and removed. SNPs were excluded from the analysis if they had a MAF<0.05 and failed the Hardy-Weinberg equilibrium test (P≤ 0.001). To assess our dataset for potential systematic over-inflation due to stratification, the genomic control inflation factor (λ GC ) was calculated using the same SNP subset as used in the PC analysis. Cluster plots were visually inspected for the most associated SNPs to confirm genotyping quality.

Statistical Analysis
Data were available on a number of potential confounding variables: gender, sample collection centre, presence or absence of concurrent steroid treatment, age at treatment baseline, time to treatment, duration of treatment and ILAR subtype (grouped into three categories: (1) oligoarthritis: persistent and extended, (2) polyarthritis: RF negative and positive, (3) psoriatic, enthesitis related, systemic and unclassifiable arthritis) (9). Each of these potential confounders were assessed for association with each core set variable. Moreover, presence of population stratification was checked using the first five PCs of the genotype dataset.
Statistical analysis was performed using Plink v1.07 [26] and R v2.15 (http://www.rproject.org), and plots were generated using R and LocusZoom [27]. SNPs were coded by minor allele count as 0, 1, 2. MTX response was defined using the ACR-pedi criteria with four categories: non-responders (reference category), ACR-pedi30, ACR-pedi50, ACR-pedi70, and association between genotype and MTX response was analysed using ordinal regression. Similar to other genetic studies of drug response measured by composite disease scores [18], we hypothesise that attempting to identify the underlying genetic basis of MTX response may be usefully performed by analysing each of the core set variables individually, since it is likely that the genetic basis of each of these is different, with varying contributions to MTX response. As the core set variables are not entirely independent from one another or the ACR-pedi status, the multiple testing burden is not as great as if we performed multiple tests on independent outcomes. Core set variables (ESR, CHAQ, AJC, LJC, PhysVAS, ParVAS) were recoded as change between baseline and follow-up, and linear regression was used to assess the strength of association for each.
We conducted our analysis in two phases (I and II) with the same individuals in each analysis phase. We utilised a low stringency of significance in Phase 1 of the analysis in order to maximise discovery of loci for more detailed investigation in Phase II, where a more stringent significance threshold was set. In Phase I of the analysis, results from the ordinal regression of ACR-pedi categories and six linear regressions of the core set variables (ESR, CHAQ, AJC, LJC, PhysVAS, ParVAS), totalling seven analyses, were used to identify genomic regions of interest for further investigation. The significance threshold selected (P<0.001) allows for greater emphasis on power than reducing type I error to enable hypothesis generation, an approach taken previously [18].
Regions were then selected for further analysis (Phase 2) by searching for clusters of associated SNPs (P<0.001 in at least two of the seven analyses) and extending out to include all SNPs within the annotated gene (based on the Illumina HumanOmniExpress gene annotation file, hg19). This resulted in regions of interest of varying sizes (range 0.02kb-12.8mb, average 1359kb). The aim of Phase II was to refine these regions. This was performed using SNP imputation to increase the density of SNP coverage in those regions. SHAPEIT v1 was used to pre-phase genotypes and SNPs were imputed against the 1000 Genomes Project reference panel (approximately 37 million SNPs) using IMPUTE2 [28;29]. Imputed SNP genotypes reaching the probability threshold 0.9 were included in the followup re-analysis, which focused on only these imputed regions using the same samples as Phase I and performed ordinal and linear regressions as described in Phase I. Regions containing at least one SNP in Phase II with association P<1×10 −5 are the focus of the results presented here.

Power Calculation
Study power was estimated at the two significance thresholds used in Phase I (P<0.001) and Phase II (P<1×10 −5 ) of the analyses, over the range of sample sizes available, and assuming the variance explained by the additive effect of the SNP tested ranged from 0.01-0.1 under an additive genetic model.

Functional Annotation
In order to gain a better understanding of the potential biological impact of our results, the most highly associated SNPs identified in Phase II (as well as SNPs in high linkage disequilibrium (r 2 >0.8)) were queried using the web tool Assimilator (http:// assimilator.mhs.manchester.ac.uk/cgi-bin/assimilator.pl) [30]. This facilitates collation of functional annotations from the publically available ENCODE and UCSC Genome Browser databases. Using the advanced search options available, output was focussed on whether any of the Phase II associated SNPs have shown evidence of transcription factor binding sites, evidence for open chromatin suggesting regions of active gene expression, and epigenetic marks which may be affected by drug treatment [31].

RESULTS
Following stringent quality control, 694 JIA cases were available for analysis (Supplementary Table 1A and 1B) comprising individuals from all ILAR subtypes (Table  1). 31% of children were non-responders (less than ACR-pedi30 response). Among responders, categorised by their highest level of response achieved, 8.6% of children reached ACR-pedi30, 14.6% ACR-pedi50 and 45.8% ACR-pedi70 response. Samples clustered together in the PC analysis (Supplementary Figure 1) and therefore were analysed together. Following the SNP quality control steps and removal of low frequency variants (MAF<0.05), 586,062 SNPs were included in the Phase I analyses (Supplementary Table 1). Quantile-quantile plots and inflation factors showed no systematic inflation of P values (Supplementary Figure 2), and power was estimated to range from 10-100% across the analysed sample sizes for various effect sizes (Supplementary Figure 3).
None of the potential confounding variables tested (gender, ILAR JIA subtype, centre, age at treatment baseline, duration of treatment, time to treatment, steroid treatment or PCs generated to identify ethnic outliers) were associated with all six individual core set variables (data not shown); therefore to reduce loss of analysis power no adjustments were made to the linear or ordinal regressions.
In the hypothesis generating Phase I of the analysis, using both ACR-pedi and the individual core set variables, 31 genetic regions encompassing 75 nearby genes achieved our defined level of significance (P<0.001 in at least 2 of the 7 analyses), Table 2 and Supplementary  Table 2. This included several notable associations such as genes related to TGFbeta signaling (ZMIZ1: zinc finger MIZ-type containing 1, TGIF1: TGFB-induced factor homeobox 1) and a member of the multi-drug resistance subfamily of the ATP-binding cassette transporter proteins (CFTR: cystic fibrosis transmembrane conductance regulator). Overall in Phase I, the most significant was a variant within an intron of the calcium channel CACNA1I (voltage-dependent calcium channel T type alpha 1I subunit) in the analysis of active joint count (AJC) (rs136855, region 31, ß coefficient= 2.71, P= 9.18×10 −8 , see Supplementary Table 2). Two regions showed strong evidence with 13 SNPs in each associated at P<1×10 −4 across several analyses (Region 12, CFTR-CTTNBP2: ParVAS, LJC, CHAQ; Region 20, ZMIZ1: ACR-pedi, ParVAS, ESR, CHAQ LJC, AJC; Figure 1). Using the 31 significant genetic regions found in the discovery phase of the analysis, the next analysis performed was to narrow down the genetic region of interest.
Phase II of analysis involved imputation of SNPs within the 31 regions identified in Phase l to refine the association signals by increasing SNP density. After imputation, using the increased Phase II significance threshold of P<1×10 −5 this analysis identified 14 of the initial 31 genetic regions as the most strongly associated with response to MTX (Table 3, Supplementary Figure 4). Overlapping associations of SNPs were revealed for AJC and LJC in several genetic regions (regions 16, 17, 23 and 28). In one of these regions (region 23, chromosome 11 intergenic between ANGPTL5-KIAA1377) the top associated SNP was the same (rs11225055), and in the other three regions the most significant associated SNPs for the AJC and LJC analyses were in very high LD (r 2 ≥0.97). In all four regions showing association with ACR-pedi (regions 2, 12, 20 and 24), the ParVAS and/or PhysVAS scores were also associated.
Functional annotations for the most highly associated Phase II SNPs from Table 3 (plus SNPs in high LD (r 2 >0.8) with these lead SNPs) were assigned using Assimilator software [30]. The results presented in Supplementary Table 3 suggest the regions identified in Phase II of the analysis contain evidence of many markers of regulation and highlight many possible functional mechanisms. Certain regions were less fully covered by current databases, for example region 14 containing the gene CSMD1. Others including region 30 containing the gene CYTH4 (cytohezin 4), have more evidence for regulatory activity, including multiple SNPs showing evidence of acting as an expression quantitative trait loci (eQTL) [32].

DISCUSSION
Recent developments in treatments and management of childhood arthritis have lead to increased expectations from clinicians, parents and patients for complete control of disease and consequent reduction of long-term adverse health outcomes [4;5;33]. The first step in JIA treatment, in parallel with joint injections, is typically administration of MTX: however it is clear that a proportion of patients treated with MTX will fail to respond adequately. Given recent recommendations for early aggressive treatment, it is important that MTX treatment is targeted to those children most likely to respond well [34]. Increasing our understanding of the influence of genetic variants in MTX response could assist clinicians to choose the best treatment options for their patients and identify patients who need more aggressive treatments. Performing large-scale genetic studies searching for variants contributing to MTX response has great appeal, but has proved challenging due to the relative rarity of JIA and lack of well co-ordinated international efforts. With this in mind, the CHARMS-JIA GWAS International Consortium facilitated the collection of carefully phenotyped response to medication data, and DNA samples from children with JIA treated with MTX for their arthritis, enabling the largest genetic analysis of MTX response in JIA to date. In Phase I of the analysis a total of 31 regions were identified as associated with response to MTX at P<0.001. To narrow down these genetic regions, additional SNPs were imputed in Phase II of the analysis, with the results in 14 regions satisfying a more stringent cut-off of P<1×10 −5 . The most strongly associated locus was CACNA1I which encodes the alpha chain of a low voltage-activated calcium channel that has been implicated in calcium signalling in neurons and may have other roles that have yet to be characterised. Other notable associated genes include CFTR, ZMIZ1 and TGIF1. Although the genes identified from this analysis need replication in independent cohorts, they provide some plausible novel candidates for further investigations into MTX response.
One association of considerable interest is the cystic fibrosis transmembrane conductance regulator gene, CFTR. The peak association signal for this region is within the 3′ end of the downstream CTTNBP2 gene (Figure 1b); this is interesting since it is known that CTFR expression is regulated by complex structural looping involving this region of CTTNBP2 [35]. CFTR, also known as ABCC7, is a member of the ATP-binding cassette transporter superfamily, specifically the multi-drug resistance subfamily. These proteins are known to be important to drug transport and elimination [36]. Interestingly, a gene within the same subfamily, ABCC3, which is known to be involved in MTX efflux has recently been shown to contain a SNP (rs4793665) associated with MTX response in a cohort of 287 Dutch JIA patients [10]. This finding led us to specifically review this gene within our results despite it not fulfilling the selection criteria for Phase I or II. We found that rs4793665 was not directly genotyped in this study nor were there SNPs within r 2 ≥0.8 on the chip; however there were 38 SNPs within the introns/exons of ABCC3 genotyped, with 15 showing association in our cohort at P<0.05 with the MTX response outcomes analysed (except ESR). The most associated of these, rs4148411 within an intron of ABCC3, was found in the PhysVAS analysis (P= 7.55×10 −5 ) and is in low LD with the SNP identified by de Rotte and colleagues (r 2 =0.02), suggesting that comprehensive further investigation of this gene is warranted.
Interestingly our study identifies several genes related to TGFbeta signalling as being associated with response to MTX. ZMLZ1 has been identified in several GWAS of autoimmune diseases [37][38][39][40]. It is a member of the protein inhibitor of activated STAT family, is known to regulate several transcription factors (androgen receptor, Smad3/4, p53) and TGFbeta/SMAD signalling, and is induced by retinoic acid [41]. It is well established that TGFbeta/SMAD and retinoic acid have important roles in the balance between Th17 and Treg cells [42], which are known to impact directly upon JIA severity [43]. Therefore a possible role for this gene in response to treatment in JIA is of considerable interest. Corroborating this finding, another associated region contains TGFbeta-induced factor homeobox 1 (TGIF1), known to be an active transcriptional corepressor of SMAD2 and to modulate the down-regulation of aryl hydrocarbon receptor (AhR) [44;45]. Together these results suggest TGFbeta signalling is a strong biological candidate for a role in reducing disease activity with MTX treatment. These data are of particular interest, since they directly parallel our gene expression profiling studies, which identified TGFbeta signalling, TGFB-2 and the zinc finger protein ZEB1, which interacts with SMAD signalling proteins, as being involved in response to MTX in children with JIA [14].
Four genetic regions associated with ACR-pedi status also showed associations with either ParVAS or PhysVAS. In some research studies, using the ACR-pedi status can present difficulties due to missing data observed in long-term observational cohorts. This finding suggests the ACR-pedi, ParVAS and PhysVAS scores measure MTX response similarly, leading to the possibility that both ParVAS and PhysVAS could be used to measure response to treatment, when full clinical data are unavailable.
To our knowledge, this is the first report of a large-scale genetic association study of MTX response in inflammatory arthritis: despite several international efforts in GWAS studies of JIA and RA as a whole, no previous large-scale analysis of MTX response are available to date, perhaps in part due to the considerable challenges of collecting adequate numbers of cases with detailed response data as well as DNA. A study investigating interferon-beta treatment in MS (using 53 responders and 53 non-responders), found that of the best associations most were in glutamate and interferon receptors, a cell-cycle dependent protein, and guanosine triphosphatase-activating and zinc-finger proteins, all genes not known to be directly involved in the drug metabolism pathway [46]. A recent GWAS in 706 RA patients treated with tocilizumab, a biologic therapy targeting the interleukin (IL)-6 receptor, found eight putative loci associated with tocilizumab efficacy; however none were in known RA risk or IL-6 pathways [18]. Similarly, our results suggest that multiple genes determine response to MTX treatment in JIA, and not just those in known MTX pathways. In fact, none of the MTX pathway genes previously investigated in candidate gene studies in both JIA and RA met our selection criteria for Phase II. This is possibly due to the small sample size and lack of power in previous studies resulting in false positive associations, and may be additionally confounded by the power limitations of our study including the availability of ACR-pedi scores on only a subset of our cohort. Despite this, the lack of strong association in the MTX pathway genes is interesting, and suggests that novel pathways and mechanisms, hitherto not known, may be important to pursue in order to understand and fully elucidate the actions of MTX and the genes involved in success or failure of MTX treatment [47]. It also suggests that the previously developed MTX efficacy prediction models for both RA [16] and JIA [9] could be further enhanced or further optimised by incorporating additional genetic variants outside the MTX pathway genes.
Previous investigations of the genetics of MTX response in JIA have been small, often underpowered, studies taking a candidate gene approach focussing on genes in MTX drug pathways. In contrast, this study is large and comprehensively covers the genome, the first of its kind for JIA. We have identified several regions of interest, three of which show a remarkable degree of functional overlap with genes and pathways implicated by gene expression profiling and previous candidate gene studies. By analysing each clinical outcome variable individually we show their genetic contributions to MTX response may differ, although with interesting overlap in novel candidates including TGIF1, ZMIZ1 and CFTR. Future targeted replication of the exciting novel regions identified is now required to confirm these findings. This study provides an excellent basis for the future development of genetic risk models for MTX response prediction.

Reference List
(1). Moorthy   Demographics of CHARMS-JIA GWAS International Consortium individuals included in this analysis by sample centre/country.  Table 2 Summary of the most highly associated regions identified in phase l of analysis, indicating the genes nearby each region and the total number of SNPs associated at P<0.001 and for which analysis. This formed the basis of the regions selected for SNP imputation in Phase II (some nearby regions were merged and others were expanded to comply with the minimum size requirements in IMPUTE2). Region coordinates based on the NCBI37 assembly. See Supplementary Table 2 for full results.  Table 3 Summary of most highly associated Phase II results (including imputed genotypes #