In a recent genome-wide association study (GWAS) based on 12 374 non-synonymous single nucleotide polymorphisms we identified a number of candidate multiple sclerosis susceptibility genes. Here, we describe the extended analysis of 17 of these loci undertaken using an additional 4234 patients, 2983 controls and 2053 trio families. In the final analysis combining all available data, we found that evidence for association was substantially increased for one of the 17 loci, rs34536443 from the tyrosine kinase 2 (TYK2) gene (P=2.7 × 10⁻⁶, odds ratio=1.32 (1.17–1.47)). This single nucleotide polymorphism results in an amino acid substitution (proline to alanine) in the kinase domain of TYK2, which is predicted to influence the levels of phosphorylation and therefore activity of the protein and so is likely to have a functional role in multiple sclerosis.
Multiple sclerosis is a disease of the central nervous system that is frequently disabling.1 Although evidence indicates that the condition results from a complex interplay between environmental and genetic factors,2, 3, 4, 5 relatively little is known about precisely which genes are involved. Association with the DR15 haplotype from the major histocompatibility complex (MHC) on chromosome 6p21 has been established for more than 30 years6 and confirmed in nearly every population studied.7 Identification of other susceptibility genes has proven to be decidedly more difficult. Systematic efforts to screen for linkage8 together with segregation analysis9, 10 indicate that susceptibility to multiple sclerosis is likely determined by a series of common variants each exerting only a modest effect on risk. Fortunately, the resources necessary to find such modest genetic effects are now available and the last 2 years have seen remarkable progress in the identification of susceptibility genes of relevance in multiple sclerosis. Through a candidate gene approach prompted by a careful systematic analysis of all available data, association with the interleukin-7 receptor (IL7R) gene has been firmly established.11, 12 In a genome-wide association study (GWAS) involving 931 cases, The International Multiple Sclerosis Genetics Consortium (IMSGC) identified the interleukin-2 receptor (IL2R) as a susceptibility gene,13 which was confirmed in a very large replication study.14 Recent studies have also shown a role for CD226 and CLEC16A in multiple sclerosis susceptibility.15, 16, 17
In parallel with these efforts, we completed a GWAS based on 12 374 non-synonymous single nucleotide polymorphisms (nsSNPs) typed in 975 patients and 1466 controls. This GWAS was performed as part of The Wellcome Trust Case–Control Consortium (WTCCC)18 (a full list of WTCCC contributors is provided in the Supplementary information online). Although this study failed to identify any unequivocally associated loci, other than those in linkage disequilibrium (LD) with DR15, the most highly ranked markers implicated a number of promising candidate genes. By extending the analysis of the most promising 17 of these loci, we have now identified another non-MHC susceptibility locus for multiple sclerosis: the tyrosine kinase 2 (TYK2) gene.
Materials and methods
Samples independent of those considered in our WTCCC nsSNP GWAS were collected from Australia, Belgium, Norway, the United Kingdom (UK) and the United States of America (USA). A population-specific breakdown together with patient demographics is shown in Table 1. All patients were diagnosed according to the recognised clinical criteria.19, 20 DNA was extracted using standard methods and all samples were collected with informed written consent and appropriate ethical approval.
In this analysis, we included the eight SNPs identified in our WTCCC GWAS with a P-value<0.001 and a further 10 SNPs selected from amongst those with a P-value<0.01 in the GWAS that were from genes that we considered to be logical candidates for multiple sclerosis. To maximise efficiency and limit unnecessary expenditure, genotyping was performed in two stages. In the first screening stage, we typed all 18 SNPs in 1481 patients, 1879 controls and 806 trio families (from the Belgian, Norwegian and UK1 populations). SNP genotyping was performed using Applied Biosystems TaqMan methodology according to the manufacturer's recommended conditions. A combined analysis of the screening phase data and the original WTCCC data was then completed. In stage 2 of the study, the top five markers identified from this analysis were then followed up in a larger replication set of 2753 patients, 1104 controls and 1247 families from the US, Australian and UK2 (North West UK and South Wales) populations. Power calculations indicate that in both the screening and replication phases, we have greater than 80% power to find association with a nominal (uncorrected) P-value of 0.05 assuming a common risk allele (10% frequency) with an odds ratio of 1.3 (multiplicative model).21 By combining the original WTCCC data with the screening and replication phase data, in total 5213 patients, 4453 unrelated controls and 2053 trio families were considered in the analysis of the top five markers.
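The power statement above can be roughly checked with a normal-approximation calculation for a two-sided allelic test. The sketch below is not the calculator cited by the authors; it assumes a simple comparison of allele frequencies between unrelated cases and controls, ignores the trio families (so it should be conservative), and uses function and variable names that are purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumes a multiplicative model, so the allelic odds ratio converts the
    control allele frequency directly into the expected case frequency.
    """
    p0 = control_freq
    case_odds = odds_ratio * p0 / (1 - p0)
    p1 = case_odds / (1 + case_odds)

    # Each individual contributes two alleles.
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se

    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Sample sizes taken from the text; trio families are deliberately left out.
screening_power = allelic_test_power(1481, 1879, control_freq=0.10, odds_ratio=1.3)
replication_power = allelic_test_power(2753, 1104, control_freq=0.10, odds_ratio=1.3)
print(f"screening: {screening_power:.2f}, replication: {replication_power:.2f}")
```

Under these assumptions both phases exceed 80% power from the case–control samples alone, which is consistent with the statement in the text.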
The genotyping success rate, Hardy–Weinberg equilibrium and heterozygosity for each marker were established using PEDSTATS.22 Mendelian errors in the trio families were identified through PEDCHECK.23 In total, five Mendelian errors were identified, one for rs1133400 and four for rs7522061. The case–control data sets were tested for heterogeneity using the Breslow–Day test, and a Cochran–Mantel–Haenszel test was completed (treating each population-specific set of samples as a separate stratum) using StatsDirect statistical software. As no significant heterogeneity was identified between the populations, the raw genotypes were combined and an association analysis was completed using UNPHASED v3.08.24
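To illustrate what a Mendelian error means for a single SNP in a trio, the following minimal sketch checks whether a child's genotype can be formed from one allele of each parent. It is not the PEDCHECK algorithm, it assumes fully genotyped trios with genotypes written as two-character strings, and the function name is hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can be inherited from the parents.

    Genotypes are two-character strings such as "AG"; allele order is ignored.
    """
    possible_children = {
        tuple(sorted(pair)) for pair in product(father, mother)
    }
    return tuple(sorted(child)) in possible_children


# Hypothetical trios: the second one would be flagged as a Mendelian error.
print(is_mendelian_consistent("AG", "GG", "GG"))
print(is_mendelian_consistent("AA", "AA", "AG"))
```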
Results
The 18 SNPs were first tested in the screening phase set of samples. A full list of the SNPs and the results in the screening phase set are shown in Supplementary Table S1. Initial testing showed that the marker rs11080149 (from the OMG gene) deviated significantly from Hardy–Weinberg equilibrium. As this variant lies in a region of potential copy number variation, the assay was considered to be unreliable and the marker was therefore excluded from further analysis. All other markers were in Hardy–Weinberg equilibrium.
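A deviation from Hardy–Weinberg equilibrium of the kind described for rs11080149 can be assessed with a one-degree-of-freedom chi-square test on genotype counts. The sketch below is one common way to do this and is not necessarily the test implemented in PEDSTATS; the genotype counts are invented for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at a biallelic marker and
    returns the statistic and its p-value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts with a marked deficit of heterozygotes.
chi2, p_value = hwe_chi_square(400, 100, 100)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}")
```

A marked heterozygote deficit or excess is a typical signature of an unreliable assay, for example at a site affected by copy number variation.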
An analysis of the original WTCCC data with the screening phase data was completed and the top five SNPs were chosen for further investigation (see Supplementary Table S1). These markers were followed up in the replication data set and a combined analysis of all the available data (screening and replication set samples) was then completed for these five SNPs (Table 2). The population-specific data for the top five markers are available in Supplementary Table S2. In this larger analysis, no marker showed deviation from Hardy–Weinberg equilibrium and the average genotyping success rate across these five markers was 99.1% (range 98.7–99.4%). There was no statistically significant difference in the genotyping failure rate between cases and controls for any of these markers.
Using the Breslow–Day test, we found no statistically significant evidence for heterogeneity between the different population cohorts or the WTCCC study for any of the top five SNPs. It is therefore unsurprising that the results of a Cochran–Mantel–Haenszel analysis showed no meaningful difference from those obtained when treating all the data as if they had been collected from a single population.
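The stratified analysis can be sketched as a Cochran–Mantel–Haenszel test over one 2×2 allele-count table per population. This is a textbook formulation without continuity correction, not the StatsDirect implementation, and the tables below are invented for illustration.

```python
import math


def cochran_mantel_haenszel(tables):
    """CMH test across strata of 2x2 tables.

    Each table is (a, b, c, d): risk and other alleles in cases (a, b) and
    in controls (c, d). Returns the Mantel-Haenszel pooled odds ratio, the
    chi-square statistic (1 d.f., no continuity correction) and its p-value.
    """
    or_num = or_den = 0.0
    diff_sum = var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        expected_a = (a + b) * (a + c) / n
        diff_sum += a - expected_a
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    chi2 = diff_sum ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return or_num / or_den, chi2, p_value


# Two hypothetical population strata with a similar effect in each.
strata = [(960, 40, 940, 60), (1930, 70, 1890, 110)]
pooled_or, chi2, p_value = cochran_mantel_haenszel(strata)
print(f"pooled OR = {pooled_or:.2f}, chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

When the stratum-specific odds ratios are homogeneous, as the Breslow–Day test suggested here, the pooled estimate is expected to be close to the one obtained from the combined raw counts.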
The most significant association that we observed was with rs6897932 from the IL7R gene (P=1.3 × 10⁻⁷). As this marker is already confirmed as a susceptibility locus11, 12 in multiple sclerosis, it provides a useful positive control for our study and illustrates that screens based on nsSNPs are capable of identifying genuine associations.
The next most strongly associated marker was rs34536443 from exon 21 of the TYK2 gene (P=2.7 × 10⁻⁶). As with IL2R and IL7R, the common allele increases risk at rs34536443. This risk allele has a frequency of 95.3% in the background population and increases risk with an odds ratio of 1.32 (1.17–1.47). In a stratified analysis of the individual replication cohorts, the trend for association was in the same direction for all cohorts, with the only exception being the Belgian population (the smallest cohort analysed).
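To make these figures concrete, the sketch below shows how an allelic odds ratio and an approximate confidence interval can be obtained from a 2×2 table of allele counts, and what the reported values imply for the allele frequency in patients. The logit (Woolf) interval and the 95% level are assumptions, since the text does not state how the interval was derived, and the counts are invented.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_with_ci(case_risk, case_other, control_risk, control_other, level=0.95):
    """Allelic odds ratio with a Woolf (logit) confidence interval."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se = sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency expected in cases under an allelic odds ratio."""
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    return case_odds / (1 + case_odds)


# Values from the text: risk allele at 95.3% in controls, odds ratio 1.32.
expected_case_freq = case_allele_frequency(0.953, 1.32)
print(f"expected risk-allele frequency in cases: {expected_case_freq:.3f}")

# Hypothetical allele counts, for illustration only.
print(odds_ratio_with_ci(9640, 360, 9530, 470))
```

Under these assumptions, the minor allele would be expected at roughly 3.6% in patients compared with 4.7% in controls.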
In the final analysis of all available data, evidence for association was either substantially reduced or largely uninfluenced for each of the other markers studied. One marker retained modest evidence for association (rs11554159 from IFI30, P=2.2 × 10⁻⁴) and may represent a genuine association of smaller effect, but very large sample sizes will be needed to confirm such an association.
Discussion
This study reports the first attempt to explore the results that emerged from our WTCCC nsSNP GWAS. Through the typing of selected markers in a large independent cohort of cases, controls and trio families, we have shown evidence of a novel association with rs34536443 in the TYK2 gene.
Tyrosine kinase 2, part of the Janus family of tyrosine kinases, has a crucial role in signal transduction for a wide range of cytokines, including type 1 interferons, IL10 and IL12.25, 26 TYK2 deficiency in humans results in hyper-IgE syndrome (HIES),27 with reduced Th1 (and possibly Th17) immune responses and increased production of Th2 cytokines.25 TYK2 variants have been associated with susceptibility to the autoimmune disease systemic lupus erythematosus.28, 29 The contribution of TYK2 to detrimental autoimmune responses is further illustrated by Tyk2-deficient mice, which are resistant to autoimmune arthritis.30 The SNP rs34536443 is located in exon 21 and encodes either a conserved proline (major allele) or an alanine at position 1104, in the tyrosine kinase domain. The 1104A variant is predicted to be ‘probably damaging’ based on the program Polymorphism Phenotyping (PolyPhen) and ‘intolerant’ based on Sorting Intolerant from Tolerant (SIFT). The shift to a Th1 pro-inflammatory immune response in multiple sclerosis is believed to lead to cell destruction and consequent pathology. The reduced TYK2 function of the 1104A variant could lead to a shift to a protective Th2 response, and/or reduced Th1 or Th17 activation. The higher minor allele frequency in the control population further supports the possibility that the rs34536443 minor allele is protective.
IFI30, or IFNγ-inducible lysosomal thiol reductase (GILT), is a prime candidate gene in multiple sclerosis as it has a vital role in antigen processing through MHC class II and shows up-regulated expression on macrophages and microglia in active demyelinating lesions in multiple sclerosis.31 GILT is expressed constitutively in antigen presenting cells and is induced by inflammatory cytokines, such as IFNγ, TNFα and IL1β, in other cell types.32 Studies in mice have shown that GILT-deficient T cells show stronger T-cell activation and increased proliferation, suggesting a role for GILT in modifying the immune response by controlling T-cell activation.33 Although the strength of association for rs11554159 was reduced compared with the initial nsSNP screen (despite the trend for association being in the same direction for all cohorts analysed), modest evidence for association remained. As initial findings tend to overestimate the effect size, our replication cohort may be too limited in power to confirm this association. However, given these findings combined with the involvement of IFI30 in the immune response, this gene warrants further analysis in a much larger cohort.
In summary, we have identified a novel association with the TYK2 gene. The power to detect the small genetic effect attributed to this SNP has been crucially dependent on the large sample sizes analysed in this study. The consequence of the rs34536443 SNP on TYK2 function and expression in patients with multiple sclerosis needs to be explored, together with the role this gene may have in autoimmunity in general.
We thank members of the Association of British Neurologists for notifying families. We acknowledge contributions from the International Multiple Sclerosis Genetics Consortium. This work was supported by the Medical Research Council (G0700061) and the National Institutes of Health (NS 049477-01A1). We acknowledge use of DNA from the British 1958 Birth Cohort collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. The Norwegian Bone Marrow Donor Registry is acknowledged for collaboration in the establishment of the Norwegian control material. AG is a Postdoctoral Fellow, EB a Research Assistant, and BD a Clinical Investigator of the Research Foundation Flanders (FWO-Vlaanderen). ARL and HFH are supported by The Research Council of Norway (166005/V5), the Ullevål University Hospital Scientific Advisory Council and by the Odd Fellow MS society. This study makes use of data generated by The Wellcome Trust Case–Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by The Wellcome Trust under award 076113 and reported in Nature 2007; 447; 661–78.
StatsDirect statistical software – http://www.statsdirect.com.
PolyPhen – http://genetics.bwh.harvard.edu/pph/.