Introduction

Multiple sclerosis (MS) is a complex condition characterized by demyelination in the central nervous system and progressive neurological dysfunction.1 The importance of genetic factors in MS was recognized early in the study of this disease, in which a significantly elevated recurrence risk in biological relatives of affected individuals was observed.2, 3 The major histocompatibility complex (MHC) region makes the single strongest genetic contribution to MS susceptibility.4 In addition, recent genome-wide association studies (GWAS) have identified 110 non-MHC single-nucleotide polymorphisms (SNPs) that influence the risk of MS.5 However, it is unclear how and in which cell types these risk variants exert their functional effects in the causal cascade of MS.

Chromatin is defined as the combination of DNA and nuclear proteins that regulate the expression of our genetic material. Chromatin profiles are highly cell specific and account for the large number of different cell types present in the human body. The Encyclopedia of DNA Elements (ENCODE) project has recently profiled a variety of chromatin states, including regulatory regions (enhancers and promoters), repressed regions, heterochromatin (densely packed chromatin), insulator sites, transcribed regions and repetitive/copy number variation in a number of human cell types.6 Similarly, DNase I hypersensitivity sites (DHSs) are highly cell specific and indicate regions of open chromatin that regulate gene expression through binding of transcription factors.7, 8 ENCODE researchers have also recently mapped DHSs across a variety of immune cell types, providing further insights into gene regulation in these cells.7

We have shown that GWAS data explain a considerable proportion (approximately 30%) of the phenotypic variance between MS cases and controls.9 The aim of this study was to investigate which chromatin states and cell types explain most of this variance by integrating GWAS and chromatin profiling data. This analysis will advance the understanding of how genes influence disease risk and which cell types play a part in the causal pathways to MS.

Materials and methods

Data acquisition

We used genotypic data on 475 806 SNPs from 1854 cases and 5164 controls from the United Kingdom produced by the International Multiple Sclerosis Genetics Consortium and the Wellcome Trust Case Control Consortium.5 The chromatin profiles of immortalized B lymphocytes (lymphoblastoid cell lines (LCLs)), hepatocellular carcinoma cells (HepG2) and normal epidermal keratinocytes (NHEKs) were obtained from the ENCODE project.6 Briefly, chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) and expression data were used to identify different classes of chromatin states: active promoter, weak promoter, poised promoter, strong enhancer (SE), weak enhancer, polycomb repressed, heterochromatic (HC), insulator, strongly transcribed, weakly transcribed and repetitive/copy number variation.6 DHS maps for CD4+ T helper type 1 (Th1) cells, CD4+ Th17 cells, CD8+ T cells, CD19+ B cells and control HepG2 cells were also obtained from the ENCODE project.7

Data analysis

All genotyped SNPs were grouped based on their location within chromatin states and DHSs for each cell type. Estimates of the proportion of phenotypic variance between MS cases and controls explained by chromatin state and DHS-specific SNPs were calculated using the Genome-wide Complex Trait Analysis tool (http://gump.qimr.edu.au/gcta).10 Heritability on the observed scale (proportion of phenotypic variance owing to additive genetic effects) was first estimated via residual maximum-likelihood analysis. The estimate was then transformed to the liability scale as previously described, assuming a disease prevalence of 0.001.10, 11 Association analysis of GWAS SNPs was conducted using PLINK (http://pngu.mgh.harvard.edu/purcell/plink/).12 The potential functional effects of associated SNPs on gene expression were assessed using RegulomeDB, a large database of expression quantitative trait loci and predicted regulatory elements from a variety of cell types, including LCLs.13
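As an illustration of the observed-to-liability rescaling, the sketch below applies a transformation commonly used for ascertained case-control samples, in which the observed-scale estimate is multiplied by [K(1−K)]² / [z² P(1−P)], where K is the population prevalence, P the proportion of cases in the sample and z the height of the standard normal density at the liability threshold. That this is exactly the transformation applied in the cited method is an assumption, and the function name is hypothetical. The sample sizes and prevalence are those given in the text.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Rescale an observed-scale heritability estimate to the liability scale.

    Assumed formula: h2_l = h2_o * [K(1-K)]^2 / (z^2 * P(1-P)).
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Sample composition and prevalence as stated in the text.
N_CASES, N_CONTROLS = 1854, 5164
CASE_FRACTION = N_CASES / (N_CASES + N_CONTROLS)

# Multiplicative factor that converts any observed-scale estimate.
SCALE_FACTOR = observed_to_liability(1.0, prevalence=0.001,
                                     case_fraction=CASE_FRACTION)
print(f"liability-scale h2 = {SCALE_FACTOR:.3f} x observed-scale h2")
```

Because the rescaling is linear in the observed-scale estimate, the same factor applies to the variance attributed to each chromatin state or DHS group.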

Results

Calculating variance by each chromatin state for each cell type

Given the well-established immunological nature of MS, we hypothesized that a large proportion of variance would arise from regions of active chromatin in LCLs, and thus calculated the variance explained by each chromatin state in this cell type (Table 1).

Table 1 Percentage of variance explained by each chromatin state in LCLs, together with the size of each chromatin state within the genome

We found that SNPs lying in HC explained most of the phenotypic variance between MS cases and controls (8.71%, s.e.=2.12%). This is expected as HC represents a very large proportion of the genome (72.5%), and the vast majority of the genotyped SNPs are located within HC (352 578 out of 475 806). However, apart from HC, SE elements in LCLs accounted for the largest proportion of variance (5.39%, s.e.=0.83%), despite their small representation in the genome (1.7%). Given that most of the genetic risk of MS was driven by the MHC region (9.08%, s.e.=3.4%), we repeated the analysis after removing all MHC SNPs. In this scenario, the percentage variance explained by SNPs in SE regions was still the highest (after HC) at 4.40%, s.e.=0.82% (Supplementary Table 1).

To show that regulatory regions specifically active in LCLs but not in other cell types were responsible for genetic risk in MS, we calculated the variance explained by each chromatin state for two additional control cell types that were unrelated to MS etiology (hepatocytes (HepG2) and keratinocytes (NHEKs)) (Figure 1). Once again, for the control cell lines, SNPs lying in HC explained most of the phenotypic variance between MS cases and controls. However, in contrast to LCLs, for both the HepG2 and NHEK cell lines, SE accounted for a minimal amount of the variance (0% and 2.18%, respectively), and SNPs within polycomb-repressed regions accounted for a large share of the variance (8.80% and 5.87%, respectively).

Figure 1 Bar graph showing the variance explained by each chromatin state for LCLs, HepG2 and NHEK cell lines.

The relevance of regulatory elements specifically active in LCLs appeared even clearer when we calculated the ratio of variance explained by each chromatin state between LCLs and the average of both control cell lines (Figure 2). SE SNPs in LCLs explained a proportion of variance that was 4.94 times higher than that explained by SE SNPs in HepG2 and NHEK cell lines, followed by weak promoter (4.19 times) and active promoter SNPs (4.09 times). This supports our hypothesis that active chromatin states that regulate gene expression in LCLs account for a considerable proportion of phenotypic variance between MS cases and controls.
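The ratio reported for SE SNPs can be reproduced from the percentages given above, assuming it is the LCL percentage divided by the arithmetic mean of the two control percentages. The function name below is hypothetical; the input values are those reported in the text (LCL 5.39%, HepG2 0%, NHEK 2.18%).

```python
from statistics import mean


def variance_ratio(target_pct, control_pcts):
    """Ratio of variance explained in the target cell type to the control mean."""
    control_mean = mean(control_pcts)
    if control_mean == 0:
        raise ValueError("control mean is zero; the ratio is undefined")
    return target_pct / control_mean


# Strong-enhancer percentages reported in the text.
SE_RATIO = variance_ratio(5.39, [0.0, 2.18])
print(f"SE ratio, LCL vs mean of controls: {SE_RATIO:.2f}")
```

Under this assumption the result agrees with the reported 4.94-fold difference to two decimal places.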

Figure 2 Bar graph showing the ratio of the variance explained between LCLs and the average of NHEK and HepG2 for each chromatin state.

Finding significant genetic associations within SE elements

We then reasoned that by testing only SNPs located within SE elements for genetic association, we could restrict the analysis to only those variants that are more likely to influence MS risk. This would reduce the significance threshold required for multiple testing correction and therefore increase our statistical power to detect associations. We performed an association test for all SNPs located within SE regions in LCLs and found four SNPs with suggestive association with MS (Table 2).
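The gain from restricting the test set can be illustrated with the Bonferroni threshold, which divides the family-wise error rate by the number of tests. In the sketch below the total number of genotyped SNPs is taken from the text, whereas the number of SNPs within SE regions is not stated there, so the value used is purely hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


N_ALL_SNPS = 475_806  # all genotyped SNPs, from the text
N_SE_SNPS = 10_000    # hypothetical count of SNPs within SE regions

GENOME_WIDE = bonferroni_threshold(0.05, N_ALL_SNPS)
SE_ONLY = bonferroni_threshold(0.05, N_SE_SNPS)
print(f"all SNPs: P < {GENOME_WIDE:.2e}; SE SNPs only: P < {SE_ONLY:.2e}")
```

The fewer SNPs are tested, the less stringent the per-SNP threshold becomes, which is the source of the increased power described above.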

Table 2 SNPs located within SE elements in LCLs with suggestive evidence of association with MS

The only SNP that survived Bonferroni correction (corrected P<0.05) was rs727263, located on chromosome 13 within the gene UBAC2. Among the other SNPs, there were two genetic variants that have been previously associated with MS (rs12927773) and celiac disease (rs12928822). These two SNPs are in strong linkage disequilibrium (r²=0.997); therefore, they likely represent one single association signal. Interestingly, another SNP located within the same genomic region (rs7200786, within CLEC16A) is known to be associated with MS.5 However, rs12927773 and rs12928822 are not in linkage disequilibrium with rs7200786 (r²=0.003 for both SNPs) and the association of both rs12927773 and rs12928822 remained significant when the analysis was conditioned on the genotype at rs7200786 (rs12927773: odds ratio (OR)=0.81, 95% confidence interval (CI)=0.73–0.90, P=3.8E−05; rs12928822: OR=0.81, 95% CI=0.73–0.89, P=3.6E−05). We investigated whether any of these SNPs or SNPs in strong linkage disequilibrium (r²>0.9) with them had been previously associated with gene expression using RegulomeDB. A total of 115 SNPs were tested but no expression quantitative trait locus for any gene in any cell type was identified.
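For readers less familiar with the linkage disequilibrium measure used here, the following minimal sketch computes the squared correlation between two biallelic loci from phased haplotype counts, using the standard definition D = p(AB) − p(A)p(B) and r² = D² / [p(A)(1−p(A))p(B)(1−p(B))]. The haplotype counts are hypothetical, and this is not necessarily how the values reported above were obtained.

```python
def ld_r_squared(n_ab_major, n_a_only, n_b_only, n_neither):
    """r^2 between two biallelic loci from counts of the four haplotypes.

    Arguments are counts of haplotypes carrying allele A and B, A only,
    B only, and neither allele.
    """
    total = n_ab_major + n_a_only + n_b_only + n_neither
    p_ab = n_ab_major / total
    p_a = (n_ab_major + n_a_only) / total
    p_b = (n_ab_major + n_b_only) / total
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Hypothetical haplotype counts.
PERFECT_LD = ld_r_squared(50, 0, 0, 50)   # alleles always travel together
NO_LD = ld_r_squared(25, 25, 25, 25)      # alleles are independent
print(PERFECT_LD, NO_LD)
```

Values close to 1, such as that between rs12927773 and rs12928822, indicate that two SNPs carry almost the same information, whereas values close to 0 indicate near independence.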

Estimating phenotypic variance contributed by cell-specific active SNPs

The chromatin profile is highly cell specific and can vary considerably between different components of the immune system. Therefore, we used DHS data for a variety of immune cell types and grouped all genotyped SNPs based on their location relative to these cell-specific DHSs. We found that the proportions of variance explained by SNPs located within DHSs of CD19+ B cells, Th1 cells, CD8+ T cells, Th17 cells and HepG2 cells were 11.98% (s.e.=0.82%), 10.11% (s.e.=0.81%), 9.79% (s.e.=0.79%), 7.53% (s.e.=0.73%) and 4.01% (s.e.=0.72%), respectively.
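Grouping SNPs by their location relative to DHSs amounts to an interval lookup. The sketch below shows one way this could be done with a binary search over sorted peak coordinates; it assumes half-open, non-overlapping intervals on a single chromosome, and all coordinates, SNP names and function names are hypothetical rather than taken from the study.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort (start, end) intervals and return parallel start and end lists."""
    ordered = sorted(intervals)
    return [start for start, _ in ordered], [end for _, end in ordered]


def in_any_interval(position, starts, ends):
    """True if position lies within one of the half-open [start, end) intervals."""
    # Index of the last interval whose start is <= position.
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < ends[i]


# Hypothetical DHS peaks per cell type and hypothetical SNP positions.
DHS = {
    "CD19_B": [(100, 200), (500, 650)],
    "HepG2": [(150, 180), (900, 950)],
}
SNPS = {"snp_a": 120, "snp_b": 160, "snp_c": 600, "snp_d": 920, "snp_e": 300}

INDEX = {cell: build_index(peaks) for cell, peaks in DHS.items()}
GROUPS = {
    cell: sorted(name for name, pos in SNPS.items()
                 if in_any_interval(pos, *INDEX[cell]))
    for cell in DHS
}
print(GROUPS)
```

Each resulting SNP list would then be passed to the variance estimation step for the corresponding cell type.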

We reasoned that many of the SNPs located within immune DHSs were shared across more than one cell type. Therefore, after pooling DHSs from Th1 and Th17 into a single Th cell group, we grouped SNPs into those that were located within: (1) only CD19+ B DHSs; (2) only Th DHSs; (3) only CD8+ T DHSs; (4) Th and CD19+ B but not CD8+ T DHSs; (5) CD8+ T and CD19+ B but not Th DHSs; (6) Th and CD8+ T but not CD19+ B DHSs; (7) Th, CD8+ T and CD19+ B DHSs. The proportion of phenotypic variance explained by SNPs located in only CD19+ B DHSs (3.75%, s.e.=0.58%) was higher than that explained by SNPs in only CD8+ T DHSs (0.92%, s.e.=0.39%) and Th DHSs (2.53%, s.e.=0.59%). However, the highest proportion of variance was explained by SNPs located in DHSs that were shared across all cell types (3.83%, s.e.=0.61%) (Figure 3).
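The seven mutually exclusive categories correspond to every non-empty combination of the three cell groups. A minimal sketch of this partitioning is given below, assuming that the SNPs falling within the DHSs of each cell type are available as sets; the SNP identifiers and the function name are hypothetical.

```python
def partition_by_membership(membership):
    """Assign each SNP to the exact combination of groups whose DHSs contain it."""
    all_snps = set().union(*membership.values())
    categories = {}
    for snp in all_snps:
        key = frozenset(group for group, snps in membership.items()
                        if snp in snps)
        categories.setdefault(key, set()).add(snp)
    return categories


# Hypothetical SNP sets per cell type.
TH1 = {"s1", "s4", "s7"}
TH17 = {"s2", "s6", "s7"}
CD8_T = {"s3", "s5", "s6", "s7"}
CD19_B = {"s4", "s5", "s7", "s8"}

# Th1 and Th17 DHS SNPs are pooled into a single Th group, as in the text.
MEMBERSHIP = {"Th": TH1 | TH17, "CD8_T": CD8_T, "CD19_B": CD19_B}
CATEGORIES = partition_by_membership(MEMBERSHIP)

for groups, snps in sorted(CATEGORIES.items(),
                           key=lambda item: (len(item[0]), sorted(item[0]))):
    print(sorted(groups), sorted(snps))
```

Because every SNP receives exactly one key, the categories do not overlap, which allows the variance attributed to each to be compared directly.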

Figure 3 Proportion of phenotypic variance explained by SNPs within (1) only CD19+ B DHSs; (2) only Th DHSs; (3) only CD8+ T DHSs; (4) Th and CD19+ B but not CD8+ T DHSs; (5) CD8+ T and CD19+ B but not Th DHSs; (6) Th and CD8+ T but not CD19+ B DHSs; (7) Th, CD8+ T and CD19+ B DHSs.

Discussion

MS is a complex disorder of unknown etiology. We have shown that SNPs accounting for most of the genetic risk associated with MS are located within regulatory regions that are specifically active in immune cells. In particular, the proportion of variance explained by active enhancer and promoter elements in LCLs was almost five times higher than was observed in the two control cell types. In contrast, a large proportion of variance was explained by SNPs located within repressed genomic regions of the two control cell types, suggesting that MS risk variants are likely to be functionally inactive in non-immune cells. Even after removing variants falling within the MHC region, SE in LCLs still accounted for a significant percentage of the genetic risk of MS. This indicates an important role for genetic variation within SEs and thus gene expression in influencing the risk of MS. It is interesting to find that this effect is not confined to the MHC but is homogeneously distributed across the genome.

By testing for association of only SNPs located within SE elements in LCLs, we were able to reduce the number of tests performed and identified four SNPs that were suggestive of association with MS. One of them (rs727263) was located within UBAC2, a gene that has been previously associated with the risk of Behçet’s disease.14, 15 Two additional SNPs located on chromosome 16 showed suggestive association with MS (rs12928822 and rs12927773). These two SNPs are in strong linkage disequilibrium with each other and their association is independent of another confirmed MS susceptibility locus in the same genomic region (rs7200786 within CLEC16A).5 Notably, rs12928822 and rs12927773 have been previously associated with the risk of MS and celiac disease.16, 17 Several genes are located near rs12928822 and rs12927773 including PRM1, PRM2, SOCS1 and TNP2. Interestingly, SOCS1 is involved in the suppression of cytokine signaling required for downregulation of immune cell function and therefore represents a plausible candidate.18 However, these associations would need to be replicated in an independent cohort of individuals before they can be considered established MS-associated loci.

SNPs located within immune-specific DHSs explained a larger proportion of variance than SNPs located in DHSs of a non-immune cell type such as HepG2. Interestingly, we found that among different immune cell types, the proportion of variance explained by CD19+ B-cell-specific SNPs was greater than that explained by Th1, Th17 and CD8+ cytotoxic T-cell-specific SNPs. These results emphasize the role played by B cells in the pathogenesis of MS. The most common immunological finding in MS patients is the presence of IgG oligoclonal bands in their cerebrospinal fluid, and this finding supports the presence of abnormal B-cell activation within their central nervous system.19 Furthermore, B-cell abnormalities influence conversion to clinically definite MS, MRI activity, onset of relapses and disease progression.20, 21, 22, 23, 24, 25 Finally, clinical trials have shown that antibody-mediated depletion of B cells is highly effective in diminishing MRI activity and onset of clinical relapses.26, 27 However, the largest proportion of variance was explained by those SNPs located within DHSs shared across all these cell types. This finding further highlights the complexity of this disease and suggests that the etiology of MS is unlikely to be driven by a single cell type.

To conclude, we have used a novel approach to integrate functional genomics and GWAS data, and have shown that SNPs located within regulatory elements active in immune cells (particularly in B and T cells) explain a large proportion of the phenotypic variance between MS cases and healthy controls. Genetic variants that influence the risk of MS are therefore likely to act by changing the chromatin landscape and influencing the expression of neighboring genes. Similar analyses in other immunological cell types relevant to MS and functional studies are required to further elucidate how MS-associated genetic variants exert their effects in the causal cascade. This approach is likely to yield more specific and effective treatments in the future than those currently available.