Genome-wide association studies (GWASs) perform per-SNP association tests to identify variants involved in disease or trait susceptibility. However, this approach is not powerful enough to identify genes that do not contribute individually to the disease or trait but may act as a group through interactions with other genes. Pathway analysis is an alternative way to highlight such groups of genes. Using SNP association P-values from eight multiple sclerosis (MS) GWAS data sets, we performed a candidate pathway analysis for MS susceptibility, considering genes interacting in the cell adhesion molecule (CAM) biological pathway and using Cytoscape software. This network is a strong candidate, as it is involved in the crossing of the blood–brain barrier by T cells, an early event in MS pathophysiology, and it is already used as an efficient therapeutic target. We drew up a list of 76 genes belonging to the CAM network and highlighted 64 networks enriched in CAM genes with low P-values. Retaining networks with at least 50% CAM genes and rejecting enrichment signals mainly driven by transcription factors, we highlighted five networks associated with MS susceptibility. One of them, composed of the ITGAL, ICAM1 and ICAM3 genes, could be of interest for developing novel therapeutic targets.
Multiple sclerosis (MS) is a common inflammatory and demyelinating disease of the central nervous system.1 Epidemiological studies have shown that its causes are multifactorial, resulting from the interaction between genetic factors and as yet unknown environmental factors.2
To date, MHC genetic variants, as well as 110 non-MHC variants, have been associated with MS susceptibility by genome-wide association studies (GWASs).3, 4, 5, 6, 7 However, it has been estimated that several other genes contribute to disease susceptibility and are yet to be identified.8
Several hypotheses have been proposed to explain the missing heritability.9, 10 One of them points to interactions between genes involved in the same biological pathway.11, 12 According to this hypothesis, some genes implicated in MS susceptibility cannot reach GWAS significance on their own but, when grouped in a pathway, may collectively contribute to the genetic component of MS.
The initial event leading to the development of an MS lesion is disruption of the blood–brain barrier (BBB) and its crossing by peripherally activated T cells.13 This step, a key mechanism in our current understanding of MS physiopathology, requires the interaction of integrins, cell surface molecules expressed by T cells, with the adhesion molecules expressed by the BBB endothelium, ultimately allowing lymphocyte transmigration into the brain.14, 15, 16, 17
The importance of BBB crossing in MS physiopathology is underscored by the use of treatments that act on adhesion molecules. For example, Natalizumab, a monoclonal antibody blocking the interaction between the VLA-4 (very late antigen-4) molecule (composed of the integrin subunits integrin alpha 4 (ITGA4) and integrin beta 1 (ITGB1)) and its receptor VCAM1 (vascular cell adhesion molecule-1), is among the most efficient therapies available for MS patients thus far.18, 19 The role of adhesion molecules in MS has also been reinforced by the demonstration that the expression of ALCAM (activated leukocyte cell adhesion molecule) at the surface of the BBB endothelium is increased during the inflammatory process in MS patients. The same study showed that in vitro expression of ALCAM, ICAM1 (intercellular cell adhesion molecule-1) and VCAM1 depended on activation of human BBB endothelial cells by the pro-inflammatory cytokines tumor necrosis factor (TNF) and interferon-gamma (IFN-γ).20
Moreover, genetic studies have pointed out the importance of adhesion molecules in susceptibility to the disease.21, 22 In 2003, we reported that a rare haplotype of the ICAM1 gene (intercellular adhesion molecule-1, expressed at the surface of endothelial cells) was under-transmitted to patients, suggesting a protective effect of this haplotype.21 In 2009, a pathway meta-analysis combining two GWASs uncovered genetic variants interacting in the cell adhesion molecule (CAM) pathway that potentially contribute to MS development.22 This strategy led to the identification of new genes that do not contribute significantly to MS susceptibility individually but have an important role through their interactions with other genes of the biological pathway. This protein interaction network (PIN)-based pathway analysis considered all SNPs with a P-value <0.05, without any prior selection based on gene function. It showed significant interactions among adhesion molecule-coding genes including ICAM1, ITGB2 (integrin beta 2), ITGAM (integrin alpha M), ITGA6 (integrin alpha 6), CD58 (CD58 molecule), CD2 (CD2 molecule) and CD4 (CD4 molecule). This observation, coming from a meta-analysis of two independent GWASs, suggests that the biological network of adhesion molecules is involved in MS susceptibility.
To decipher the role of the CAM pathway in MS susceptibility, we conducted a candidate pathway analysis focusing on adhesion molecules. We found five networks of the CAM pathway enriched in low P-values, comprising genes that may interact synergistically to confer MS susceptibility.
CAM pathway-selected genes
Using the KEGG database and the SabioSciences website, we identified 76 genes of interest involved in the adhesion molecule pathway. More than 85% of them (66 of 76) were involved in three subpathways (adhesion, adherens junctions and tight junctions), and the remaining 10 encode transcription factors strongly involved in the regulation of adhesion molecule genes (Supplementary Table 1).
Using data from eight GWASs, we selected the computed P-values of the identified CAM genes (±1 kb from the 5′ and 3′ UTRs). Seven GWASs (IMSGC UK, IMSGC US, ANZGene, GeneMSA DU, GeneMSA SW, GeneMSA US and BWH/MIGEN) constituted the D1 data set.4 The eighth GWAS, from the Wellcome Trust Case Control Consortium 2 (WTCCC2), constituted the D2 data set3 (Figure 1).
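The SNP-to-gene mapping step can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Gene` and `Snp` records, their coordinates and the `assign_snps` and `covered_genes` helpers are hypothetical, and the flank is a symmetric parameter (1 kb here, matching the window stated above; VEGAS itself uses ±50 kb, as described in Materials and methods). Genes that capture no SNP would be excluded from further analysis.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # lowest base-pair coordinate of the gene
    end: int    # highest base-pair coordinate of the gene


class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int


def assign_snps(genes: List[Gene], snps: List[Snp],
                flank: int = 1_000) -> Dict[str, List[str]]:
    """Map each gene to the SNPs lying within [start - flank, end + flank].

    The flank is applied on both sides, so strand orientation does not matter.
    """
    assigned: Dict[str, List[str]] = {g.name: [] for g in genes}
    for g in genes:
        lo, hi = g.start - flank, g.end + flank
        for s in snps:
            if s.chrom == g.chrom and lo <= s.pos <= hi:
                assigned[g.name].append(s.rsid)
    return assigned


def covered_genes(assigned: Dict[str, List[str]]) -> List[str]:
    """Genes represented by at least one SNP; the others would be excluded."""
    return [name for name, rsids in assigned.items() if rsids]


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    genes = [Gene("GENE_A", "1", 10_000, 20_000), Gene("GENE_B", "1", 500_000, 510_000)]
    snps = [Snp("rs1", "1", 9_500), Snp("rs2", "1", 8_000),
            Snp("rs3", "1", 21_000), Snp("rs4", "2", 15_000)]
    result = assign_snps(genes, snps, flank=1_000)
    print(result)                 # {'GENE_A': ['rs1', 'rs3'], 'GENE_B': []}
    print(covered_genes(result))  # ['GENE_A']
```

Widening `flank` to 50_000 in this toy example also captures `rs2`, which shows how strongly the chosen window affects which SNPs, and hence which genes, enter the analysis.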
A total of 70 of the 76 genes were represented by at least one SNP in at least one of the seven data sets included in D1, whereas 68 of the 76 genes were represented by at least one SNP in the WTCCC2 data set (D2). Six of the 76 genes of interest were thus excluded from further analysis. The number of available SNPs per gene and per data set is given in Supplementary Table 2. The D1 and D2 gene-wise P-values, computed with VEGAS software and combined across the D1 data sets with Fisher's method (Materials and methods section and Figure 1), are given in Table 1. As expected, the CD58 (PD1=1.89 × 10−6; PD2=0, meaning that P<10−6, see Materials and methods section), NFKB1 (PD1=0.0125; PD2=0.0011) and STAT3 (PD1=2.94 × 10−5; PD2=1.5 × 10−5) genes, previously identified as associated with MS in GWASs, show low gene-wise P-values.
Interestingly, VCAM1, previously identified as associated with the disease (most significant SNP rs11581062), did not reach a significant P-value in either the D1 or the D2 data set (PD1=0.7224; PD2=0.993). The published associated SNP (rs11581062) lies 202 kb from the VCAM1 gene. We conclude that, even if VCAM1 is a good candidate gene for MS physiopathology, there is not enough evidence to support its involvement in the association signal of the rs11581062 region. Furthermore, no cis-eQTL regulating VCAM1 expression has been described within the 1-Mb region around the gene (Pritchard lab resources).
Subnetworks associated with MS susceptibility
As described by the IMSGC,23 we used a curated human PIN data set consisting of more than 400 000 interactions among ∼25 000 proteins. To retain only well-supported interactions, we kept those reported in at least two publications, which yielded a reduced human PIN data set of 8920 proteins and 27 724 interactions. Using Cytoscape software, we assigned gene-wise P-values to 70 of the 76 CAM genes (listed in Supplementary Table 1). As in the IMSGC study,23 the Cytoscape plugin jActiveModules was used to calculate a global score (Z-score) for all possible networks that could be generated from the PIN data set, using D1 and D2 P-values. Networks with Z-scores >3.0 are generally considered significant, that is, these subnetworks are enriched in CAM genes showing significant D1 and/or D2 gene-wise P-values. Cytoscape was applied to the reduced human PIN data set, which contains both CAM genes (Supplementary Table 1 and Figure 2) and non-CAM genes. Sixty-four networks were highlighted as enriched in CAM genes with significant P-values. Focusing on the process of BBB transmigration by T cells, we considered only the six networks containing at least 50% CAM genes as relevant for our study. Finally, we excluded networks enriched only in CAM transcription factors, to avoid non-specific association signals driven solely by transcription factors (Table 2 and Figure 2). We thus identified five networks composed of genes known to interact within the CAM pathway and meeting all the above criteria (Figure 3).
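Two computational ingredients of this step, the publication-count filter on interactions and a subnetwork score, can be sketched in Python. This is a minimal illustration under stated assumptions, not the Cytoscape implementation: the edge list of `(protein_a, protein_b, n_publications)` tuples is hypothetical, and the score is the basic, uncalibrated aggregate on which jActiveModules builds, where each gene-wise P-value p becomes z = Φ⁻¹(1 − p) and a subnetwork of k genes scores Σz/√k. jActiveModules additionally calibrates this score against random gene sets of the same size, which is omitted here; the floor `P_FLOOR` applied to VEGAS P-values of 0 is also an assumption.

```python
import math
from statistics import NormalDist
from typing import Iterable, List, Tuple

# Hypothetical floor for P-values reported as 0 by VEGAS (interpreted as P < 1e-6).
P_FLOOR = 1e-6


def filter_interactions(edges: Iterable[Tuple[str, str, int]],
                        min_publications: int = 2) -> List[Tuple[str, str]]:
    """Keep protein-protein interactions reported in at least `min_publications` papers."""
    return [(a, b) for a, b, n_pubs in edges if n_pubs >= min_publications]


def gene_z(p: float) -> float:
    """Convert a gene-wise P-value to a z-score: inverse standard normal CDF of (1 - p)."""
    p = min(max(p, P_FLOOR), 1.0 - P_FLOOR)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(1.0 - p)


def aggregate_z(pvalues: List[float]) -> float:
    """Uncalibrated subnetwork score: sum of gene z-scores divided by sqrt(k)."""
    zs = [gene_z(p) for p in pvalues]
    return sum(zs) / math.sqrt(len(zs))


if __name__ == "__main__":
    edges = [("P1", "P2", 3), ("P2", "P3", 1), ("P1", "P3", 2)]  # hypothetical
    print(filter_interactions(edges))           # [('P1', 'P2'), ('P1', 'P3')]
    print(round(aggregate_z([0.001] * 4), 2))   # 6.18
```

Dividing by √k keeps the score on a standard normal scale under the null, which is why a fixed cut-off such as 3.0 can be compared across subnetworks of different sizes.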
GWAS and replication studies have successfully identified approximately 110 non-MHC MS susceptibility genes.3, 4, 5, 6, 7 However, a substantial part of the heritability remains to be discovered.3 One hypothesis postulates that genes without individual effects could influence susceptibility to the disease through genetic interactions.11 Here, we report the first candidate pathway analysis of the CAM network in MS. Our results highlight five networks enriched in CAM genes with significant P-values, reflecting a potential genetic contribution of these molecules to MS susceptibility.
In 2009, Baranzini et al.22 identified seven genes of the CAM pathway as associated with MS susceptibility when their interactions were considered. By focusing on the CAM pathway and using two powerful data sets (5545+9772 patients and 12 153+17 376 controls), our study refines the role of these genes as MS genetic factors.
Two of the five identified networks (networks 2 and 27) appear to be the most relevant to MS physiopathology. Network 2 is composed of the ICAM1, ICAM3 and ITGAL genes, which are involved in the adhesion of T cells to the endothelial cells of the BBB. This adhesion between endothelial cells and T cells is one of the earliest events in MS development, leading to T-cell transmigration across the BBB into the central nervous system. Of note, ITGAL, ICAM1 and ICAM3 form ligand–receptor complexes, with ITGAL interacting with both ICAM1 and ICAM3. We hypothesize that some intra- and inter-gene combinations of polymorphisms in these genes could influence the adhesion of T cells to the BBB, positively or negatively modulating the flow of inflammatory cells into the central nervous system.
Network 27 is composed of the CD82 (KAI1), ITGA4, ITGB1, ITGB2 and HLA-DMA genes. The ITGB1 and ITGA4 genes code for the two subunits of the very late antigen-4 molecule. CD82 is notably known for its inhibitory effect on ITGB1 activation.24 The role of CD82 in the inflammatory response has not yet been investigated. In the MS context, it would be relevant to analyze the development and severity of experimental autoimmune encephalomyelitis (an animal model of MS) in CD82-deficient mice. As CD82 initiates the differentiation of oligodendrocyte precursors into mature myelinating cells,25 quantifying remyelination in this model would also be of interest.
Interestingly, we identified two networks containing ITGA4, the target of Natalizumab. As the interaction between VLA-4 and its receptor VCAM1 is known to be crucial for the transmigration of T cells across the BBB, our results may point to a genetic basis for the effectiveness of this treatment.
In conclusion, our results highlighting the role of CAM gene interactions in MS susceptibility could help identify new targets for effective MS treatments. A monoclonal antibody directed against ITGA4 that prevents its interaction with VCAM1 already exists.18 We propose that targeting ITGAL, ICAM1 or ICAM3 to block their interactions could also be an efficient way to prevent T cells from crossing the BBB.
Materials and methods
Selection of genes coding for CAM pathway molecules
Selection of the genes belonging to the network was performed in three steps. First, using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http://www.genome.jp/kegg/),26, 27 we searched for adhesion molecules involved in tight and adherens junctions between BBB endothelial cells, and for those involved in T-cell transmigration. Second, using the KEGG database, we identified molecules interacting directly with these adhesion molecules. Finally, we selected the 10 transcription factors most involved in the expression and regulation of adhesion molecule-coding genes, using the SabioSciences website (http://www.sabiosciences.com) through the GeneCards database (http://www.genecards.org/).28 Using this three-step strategy, we compiled a list of 76 genes of interest.
Eight data sets were used in this study. Seven data sets (IMSGC UK, IMSGC US, ANZGene, GeneMSA DU, GeneMSA SW, GeneMSA US and BWH/MIGEN) are described in Patsopoulos et al.4 and constitute the D1 data set. The eighth data set consists of the MS GWAS published by the IMSGC3 and WTCCC2 in 2011 and constitutes the D2 data set. As the IMSGC–WTCCC2 data set is to date the most powerful GWAS published in MS, we analyzed it separately to avoid signal loss.
Gene-wise P-value computation
With data from eight GWASs, we computed an individual gene-wise P-value for the association of each gene with MS in each GWAS using VEGAS software29 (Figure 1). VEGAS assigns SNPs to each of 17 787 autosomal genes according to positions on the UCSC Genome Browser (hg18 assembly). To capture regulatory regions and SNPs in LD, gene boundaries are defined as ±50 kb of each gene. VEGAS accounts for LD between markers within a gene by using Monte Carlo simulations from the multivariate normal distribution based on the LD structure of a set of reference individuals (the HapMap2 CEU population: Utah residents with ancestry from northern and western Europe from the CEPH collection). In VEGAS, the number of simulations per gene is determined adaptively. In the first stage, 10^3 simulations are performed. If the resulting empirical P-value is <0.1, 10^4 simulations are then performed. If the empirical P-value from 10^4 simulations is <0.001, the program performs 10^6 simulations. At each stage, a new, non-overlapping set of simulations is drawn. For computational reasons, if the empirical P-value is 0, no further simulations are performed. An empirical P-value of 0 from 10^6 simulations can therefore be interpreted as P<10^−6, which is below the Bonferroni-corrected threshold of P<2.8 × 10^−6 (0.05/17 787 genes).
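The simulation logic described above can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the VEGAS source code: the gene statistic is taken as the sum of 1-df chi-square SNP statistics derived from two-sided SNP P-values, null statistics are drawn from a multivariate normal with a user-supplied LD correlation matrix (assumed positive definite, via a hand-written Cholesky factor), and the helper names `make_counter` and `adaptive_p` are hypothetical. The staged schedule mirrors the 10^3 / 10^4 / 10^6 escalation rule.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, List, Optional, Sequence, Tuple

# (number of simulations, escalate to next stage if empirical P < threshold)
VEGAS_SCHEDULE: Sequence[Tuple[int, Optional[float]]] = (
    (1_000, 0.1), (10_000, 0.001), (1_000_000, None))


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def make_counter(snp_pvalues: List[float], ld: List[List[float]],
                 seed: int = 0) -> Callable[[int], int]:
    """Return f(n): how many of n fresh null draws give a statistic >= the observed one.

    Observed statistic: sum of 1-df chi-square values from two-sided SNP P-values in (0, 1].
    Null draws: z ~ MVN(0, ld), statistic = sum(z_i ** 2).
    """
    nd = NormalDist()
    observed = sum(nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in snp_pvalues)
    low = cholesky(ld)
    rng = random.Random(seed)
    m = len(snp_pvalues)

    def count_exceed(n_sims: int) -> int:
        hits = 0
        for _ in range(n_sims):
            e = [rng.gauss(0.0, 1.0) for _ in range(m)]
            z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
            if sum(zi * zi for zi in z) >= observed:
                hits += 1
        return hits

    return count_exceed


def adaptive_p(count_exceed: Callable[[int], int],
               schedule: Sequence[Tuple[int, Optional[float]]] = VEGAS_SCHEDULE
               ) -> Tuple[float, int]:
    """Escalate the number of simulations stage by stage; return (empirical P, n_sims)."""
    p, n_sims = 1.0, 0
    for n_sims, escalate_below in schedule:
        p = count_exceed(n_sims) / n_sims  # fresh, non-overlapping draws at each stage
        if escalate_below is None or p >= escalate_below:
            break
    return p, n_sims


if __name__ == "__main__":
    # Hypothetical two-SNP gene with r = 0.5 between SNPs; shortened schedule for speed.
    counter = make_counter([0.01, 0.02], [[1.0, 0.5], [0.5, 1.0]], seed=1)
    print(adaptive_p(counter, schedule=((1_000, 0.1), (10_000, None))))
    print(0.05 / 17_787)  # Bonferroni threshold, about 2.8e-6
```

A returned pair such as `(0.0, 1000000)` should be read as P<10^−6 rather than P=0, because the resolution of an empirical P-value is bounded by the number of simulations performed.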
To combine the gene-wise P-values across the seven data sets described above (named D1), we applied Fisher’s method for each of the 70 selected genes. The gene-wise P-values from the WTCCC2 data set were used separately as a second independent data set (named D2) (Figure 1).
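Fisher's method combines k independent P-values through X = −2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. For even degrees of freedom the survival function has a closed form, so a minimal sketch needs only the Python standard library. The handling of missing data sets (`None`, skipped) and of VEGAS P-values of 0 (replaced by a hypothetical `floor` of 10^−6, which makes the combined P-value conservative) are assumptions of this sketch, not details reported in the study.

```python
import math
from typing import Optional, Sequence


def fisher_combine(pvalues: Sequence[Optional[float]], floor: float = 1e-6) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    None entries (gene not covered in a data set) are skipped; P-values of 0
    (VEGAS "P < 1e-6") are replaced by `floor`, an assumed conservative bound.
    """
    ps = [max(p, floor) for p in pvalues if p is not None]
    if not ps:
        raise ValueError("no P-values to combine")
    k = len(ps)
    half_x = -sum(math.log(p) for p in ps)  # X / 2
    # Survival function of chi-square with even df 2k:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Hypothetical gene-wise P-values for one gene across the seven D1 data sets;
    # None marks a data set in which the gene is not covered by any SNP.
    print(fisher_combine([0.04, 0.20, None, 0.01, 0.33, 0.08, 0.15]))
    print(fisher_combine([0.5, 0.5]))  # 0.25 * (1 - ln 0.25), about 0.597
```

With a single P-value the function returns that P-value unchanged, which is a quick sanity check that the closed-form survival function is wired correctly.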
PIN-based pathway analysis
As described by the IMSGC23 in 2013, we integrated data from a curated human PIN data set into Cytoscape software.30 The Cytoscape plugin jActiveModules was used to calculate a global score for each network enriched in CAM genes showing significant D1 and/or D2 gene-wise P-values. Using successive filters, we identified networks enriched in low P-values containing at least 50% CAM genes and less than 50% transcription factors (Figure 2).
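The successive filters can be expressed as a small predicate over candidate networks. This Python sketch is illustrative only: the `Module` record, the gene names and the scores in the example are hypothetical, the Z-score cut-off of 3.0 is taken from the Results, and computing the transcription-factor fraction among a module's CAM genes (rather than among all of its genes) is an assumption, since the text does not specify the denominator.

```python
from typing import FrozenSet, List, NamedTuple


class Module(NamedTuple):
    name: str
    genes: FrozenSet[str]
    score: float  # jActiveModules-style Z-score


def passes_filters(module: Module, cam_genes: FrozenSet[str], tf_genes: FrozenSet[str],
                   min_score: float = 3.0, min_cam_frac: float = 0.5,
                   max_tf_frac: float = 0.5) -> bool:
    """Score filter, then CAM-gene fraction, then transcription-factor fraction.

    Assumption: the TF fraction is computed among the module's CAM genes.
    """
    if module.score <= min_score or not module.genes:
        return False
    cams = module.genes & cam_genes
    if not cams or len(cams) / len(module.genes) < min_cam_frac:
        return False
    return len(cams & tf_genes) / len(cams) < max_tf_frac


def select_modules(modules: List[Module], cam_genes: FrozenSet[str],
                   tf_genes: FrozenSet[str]) -> List[str]:
    """Names of the modules that pass every filter, in input order."""
    return [m.name for m in modules if passes_filters(m, cam_genes, tf_genes)]


if __name__ == "__main__":
    # Hypothetical gene sets and scores, for illustration only.
    cam = frozenset({"CAM_A", "CAM_B", "CAM_C", "TF_A", "TF_B"})
    tf = frozenset({"TF_A", "TF_B"})
    modules = [
        Module("m1", frozenset({"CAM_A", "CAM_B", "CAM_C"}), 4.2),       # kept
        Module("m2", frozenset({"TF_A", "TF_B", "OTHER_1"}), 5.0),       # TF-driven
        Module("m3", frozenset({"CAM_A", "OTHER_1", "OTHER_2"}), 3.5),   # <50% CAM genes
        Module("m4", frozenset({"CAM_B", "CAM_C"}), 2.1),                # low score
    ]
    print(select_modules(modules, cam, tf))  # ['m1']
```

Applying the cheap score test first and the set intersections afterwards keeps the predicate simple; with only a few dozen candidate networks, as here, the order has no practical cost implications.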
This study was supported by the Institut National de la Santé et de la Recherche Médicale, the Fondation pour la Recherche sur la Sclérose En Plaques (ARSEP), the Association Française contre les Myopathies, GIS-IBISA and ICM Carnot Institute. The research leading to these results has received funding from the program ‘Investissements d’avenir’ ANR-10-IAIHU-06. We thank ICM, CIC Pitié-Salpêtrière, Généthon and the members of BRC-REFGENSEP and the IMSGC for their help and support, as well as Jorge Oksenberg and Pierre-Antoine Gourraud. VD received a travel grant from the Fondation ARSEP and ICM Carnot Institute. Philip L De Jager is a Harry Weaver Neuroscience Scholar of the National MS Society. SEB is a Harry Weaver Neuroscience fellow from the US National MS Society. This investigation was supported (in part) by a Postdoctoral Fellowship from the National Multiple Sclerosis Society to Nikolaos A Patsopoulos (FG 1938-A-1).
Supplementary Information accompanies this paper on Genes and Immunity website (http://www.nature.com/gene)
Journal of Neuroimmunology (2017)