Gene–gene interactions are proposed as an important component of the genetic architecture of complex diseases, and are just beginning to be evaluated in the context of genome-wide association studies (GWAS). In addition to detecting epistasis, a benefit to interaction analysis is that it also increases power to detect weak main effects. We conducted a knowledge-driven interaction analysis of a GWAS of 931 multiple sclerosis (MS) trios to discover gene–gene interactions within established biological contexts. We identify heterogeneous signals, including a gene–gene interaction between CHRM3 (muscarinic cholinergic receptor 3) and MYLK (myosin light-chain kinase) (joint P=0.0002), an interaction between two phospholipase C-β isoforms, PLCβ1 and PLCβ4 (joint P=0.0098), and a modest interaction between ACTN1 (actinin alpha 1) and MYH9 (myosin heavy chain 9) (joint P=0.0326), all localized to calcium-signaled cytoskeletal regulation. Furthermore, we discover a main effect (joint P=5.2E−5) previously unidentified by single-locus analysis within another related gene, SCIN (scinderin), a calcium-binding cytoskeleton regulatory protein. This work illustrates that knowledge-driven interaction analysis of GWAS data is a feasible approach to identify new genetic effects. The results of this study are among the first gene–gene interactions and non-immune susceptibility loci for MS. Further, the implicated genes cluster within inter-related biological mechanisms that suggest a neurodegenerative component to MS.
Multiple sclerosis (MS) is a complex autoimmune disorder characterized by demyelination and scar tissue formation within the central nervous system. Axonal loss and progressive neurodegeneration lead to impaired neurological function and diminished quality of life. The major histocompatibility complex region of chromosome 6 is consistently associated with MS risk,1, 2 and several additional immunological and inflammatory loci have recently been implicated in MS susceptibility. However, all known susceptibility loci combined account for far less than 50% of the estimated heritability.
Epistasis or gene–gene interaction has been promoted as an important part of complex disease etiology because of the monumental complexity of biological systems. In addition, when explicitly modeled, it can increase power to detect the independent effects of susceptibility loci.3, 4 However, key challenges of interaction analysis are the biological interpretation of statistical results and the task of resolving functional relationships between variants from multiple loci across the genome. With these challenges in mind, we assessed gene–gene interactions within established biological contexts based on multiple knowledge sources, such as pathway and ontology databases. This analysis implicates several new functionally related MS risk loci, including three interaction effects centered on calcium signaling, representing a neurodegenerative genetic component to MS etiology.
Results and discussion
For our analysis, we examined gene–gene interactions in a screening data set followed by three validation sets. The screening phase was conducted in two stages to maximize information gain from the analysis. Using the trio study design of our screen data set, we examined the transmission of alleles to the affected offspring. This stage revealed 5463 models (consisting of 5965 single-nucleotide polymorphisms (SNPs)) with model fit statistic (MF) P<0.001 and likelihood ratio test statistic (LR) P<0.001, indicating an effect from simultaneous overtransmission of alleles at two separate loci. The majority of these models contain distinct SNPs; however, some SNPs recur across multiple models. The top recurring SNP was involved in 35 models, and most occurred in fewer than five models. Notably, the majority of models do not contain the known main effect of the major histocompatibility complex region: only 474 of 5463 contain markers from chromosome 6, where the major histocompatibility complex resides. The joint transmission of alleles to offspring could also be due to chromosomal linkage if the two alleles are in close enough proximity that recombination events between the two loci are infrequent. To minimize false positives due to this possibility and to provide additional evidence for interaction, we compared the probands from the 931 families with 2950 controls from the Wellcome Trust Case Control Consortium.5 Using standard logistic regression on the proband/control set, we reduced the 5463 models to 326 models (MF P<0.001 and LR P<0.001) containing 469 SNPs. For all validation sets, we established α=0.05 for both MF and LR tests as our replication criteria.
In validation set I, 306 of the 326 models contain all SNPs available after quality control procedures and were evaluated. Twenty multi-locus models had significant model fit (MF P<0.05) and direction of coefficients consistent with the screening data set, representing nine gene–gene combinations (Supplementary Table 1).
Although this approach incorporated previous biological knowledge, it was somewhat unexpected that, of hundreds of thousands of gene–gene relationships considered, four of the nine models with statistically consistent results point to two interconnected biological mechanisms that together describe a calcium-signaled change in cytoskeleton dynamics. Of these four models, two had a significant interaction component (LR P<0.05). To examine and validate the role of this biological mechanism in MS susceptibility, we explored the four functionally related models (Table 1) in two additional data sets, validation sets II and III. Validation set III was genotyped with a different platform from the other studies, so surrogate SNPs were selected with close physical proximity and high r-squared based on the HapMap CEU data.
The results from validation sets II and III indicate heterogeneous weak effects from variants within this biological mechanism (Table 2). In validation set II, model 1 had a nonsignificant model fit (MF P=0.0595), and there was no evidence of interaction (LR P=0.7664). In validation set III, model 1 also had a nonsignificant model fit (MF P=0.0748) but with a significant interaction term (LR P=0.0366). Models 2, 3 and 4 had nonsignificant P-values (MF and LR) in both validation sets. Although model 1 was not significant, it did show consistently low model fit P-values (MF P<0.1) across all evaluated data sets.
As models 1 and 2 did not show evidence of interaction in validation set I, and as interaction analysis can enhance the detection of main effects,4 we hypothesized that significant models identified in the screening phase could consist of weak effects of independent SNPs. Within validation set I, the significant model fit of models 1 and 2 is driven by main effects of rs1009150 within MYH9 (myosin heavy chain 9) (P=0.0022) and rs2240571 within SCIN (scinderin) (P=0.0004). In validation set II, rs2240571 shows a main effect (P=0.0073). In validation set III, rs6118378 of PLCβ1 (phospholipase C-β1) (surrogate for rs6516415) has a significant main effect (P=0.0391).
Combining all data sets with directly typed SNPs (excluding validation set III) for a joint analysis, all models have significant MF P-values (α=0.05), and all but model 1 have significant LR P-values (α=0.05). In this joint analysis, SCIN SNP rs2240571 appears as a significant main effect (P=5.2e−5), and its interaction with CYFIP1 SNP rs8025779 is nonsignificant (LR P=0.0677).
rs2240571 is located approximately 400 base pairs upstream from the SCIN gene on chromosome 7. In some data sets, SCIN statistically interacts with cytoplasmic fragile X mental retardation 1-interacting protein 1, or CYFIP1.
SCIN is a calcium-dependent actin-severing protein.5 In addition to binding calcium, SCIN has phosphatidylinositol 4,5-bisphosphate binding sites that likely have a role in its activity. SCIN-mediated disassembly of the cortical actin cytoskeleton may regulate translocation of secretory vesicles to facilitate neurotransmitter release.6 SNPs in the SCIN gene, curiously, did not show a significant main effect from the transmission disequilibrium test (P=0.1026), Cochran–Mantel–Haenszel test (P=0.3762) or additive logistic regression analysis (P=0.2014) of the screen data. In the screen data set, there was approximately 48% power (α=0.001) to detect an additive effect of 1.213 (the point estimate for the SNP from the overall data set), and an effect from rs2240571 was only seen when modeled with rs8025779, a SNP in the CYFIP1 gene. Assessing the joint effect of these SNPs allowed the detection of the weak main effect from the SCIN SNP, either because of a true interaction effect that was oversampled in the screen data or perhaps because of subtle changes in allele frequency that prevented detection of the interaction in the validation sets.7 The effect from SCIN was seen in all data sets, appearing as a main effect in two data sets and as an interaction with CYFIP1 in two others. Though the nature of the interaction effect was inconsistent, the direction of the effect from SCIN was consistent in all models, and in the overall set of samples SCIN appeared as a main effect.
rs17106421 is a SNP located 6 kb upstream of actinin alpha 1 (ACTN1) on chromosome 14, and rs1009150 is an intronic SNP in MYH9 located on chromosome 22. ACTN1 and MYH9 function in the formation of actin stress fibers and cytoskeletal contraction. ACTN1 has a key role in phosphoinositide-3-kinase-induced cytoskeletal reorganization,8 and has brain-specific splicing isoforms.9 Variations in the MYH9 gene have been implicated in a wide variety of disorders, including HIV-associated nephropathy, hypertension and other kidney diseases,10 and hearing loss.11
rs528011 is an intronic SNP located on chromosome 1 in the muscarinic cholinergic receptor 3 (CHRM3) gene between exons 4 and 5. This SNP statistically interacts with rs4677905, an intronic SNP in the myosin light-chain kinase (MYLK) gene located on chromosome 3. These genes are related by the calcium signaling pathway, and also have a role in regulation of the actin cytoskeleton (KEGG pathways: ko04810 and ko04020).
CHRM3 is a G-protein-coupled receptor that binds acetylcholine, freeing the G-protein complex to activate phospholipase C (PLC) isoforms, generating inositol 1,4,5-triphosphate (IP3). IP3 binds to the IP3 receptor to release Ca2+ ions from intracellular stores. MYLK is activated by this downstream intracellular calcium release.12 CHRM3 is lost in astrogliotic MS lesions,13 and acetylcholinesterase inhibitors reduce the clinical severity of experimental autoimmune encephalomyelitis, the mouse model of MS.14 MYLK mediates myosin II motor activity responsible for actin cytoskeleton contraction, which is upregulated during axon regeneration,15 and has a role in axon retraction and regeneration mechanisms.16 Differential expression of MYLK is seen in astrocytes from glaucoma patients, and appears to be part of a collection of genes involved in neurodegenerative processes.17 As the IL-2 receptor (IL2R) has been implicated in MS by multiple studies,18, 19, 20 it is noteworthy that intracellular calcium release also triggers multiple downstream events, including IL-2 production.21
rs4816129 in the intronic region of the phospholipase C-β4 (PLCβ4) gene is located on chromosome 20, and statistically interacts with intronic SNP rs6516415 in the PLCβ1 gene, located approximately 200 kb upstream. Despite being in close physical proximity, these two SNPs are not in linkage disequilibrium in any of the data sets. PLCβ4 and PLCβ1 function together in multiple KEGG pathways, including the calcium signaling pathway, Wnt signaling and inositol phosphate metabolism (KEGG: ko04020, ko04310 and ko00562).
PLCβ1 and PLCβ4 are two isozymes in the larger phospholipase C family,22 which hydrolyze phosphatidylinositol 4,5-bisphosphate to produce IP3 and diacylglycerol. Notably, diacylglycerol activates various protein kinase C (PKC) isoforms. Further, PLCβ1 and PLCβ4 are expressed in the central nervous system,23, 24 and model systems illustrate a role for both isoforms in proper conduction of nerve signals25 (Figure 1).
The genes identified fall among the first susceptibility loci for MS ostensibly involved with the central nervous system and neuron function. Previous studies have identified numerous genes implicated primarily in the autoimmune inflammatory process.26, 27, 28, 29 The recent analysis of Baranzini et al.30 identified general patterns of significance in axon guidance and neurogenesis pathways using the entire GWAS data from validation set III, but did not specifically identify any of the genes found in this study. From these results, we identify calcium-signaled cytoskeleton regulation as a potential neurodegenerative mechanism for MS risk.
Using basic logistic regression procedures, we identified three models that have a non-additive interaction component while significantly contributing to MS risk, and further identify a main effect that was undetected by single-locus analysis of the GWAS data, confirming the principle that interaction analysis can improve detection of weak main effects.4 Methodologically, this study is among the first to apply knowledge-based interaction analysis to genome-wide association data,30, 31, 32, 33, 34, 35 and illustrates that restricting evaluation of two-locus interaction models to those with established biological contexts is a viable strategy.
Materials and methods
The International Multiple Sclerosis Genetics Consortium (IMSGC) genotyped 931 parent-affected child trios for ∼500,000 SNPs using the Affymetrix Mapping 500K SNP chip (Affymetrix, Santa Clara, CA, USA). In all, 334,923 SNPs passed quality control procedures.29 Validation set I consisted of 808 MS cases and 1720 controls ascertained through the Partners MS Center in Boston, Massachusetts and genotyped using the Affymetrix 6.0 platform. From this panel, 453 of the 469 SNPs passed quality control procedures, and these were used to assess the significant models from the genome-wide screen. Validation set II consists of an independent set of 2330 MS cases and 2110 controls ascertained from Brigham and Women's Hospital in Boston, MA, University of California San Francisco, Washington University, the Accelerated Cure Project out of Massachusetts, Rush University of Chicago and Cambridge University in the United Kingdom. Eight SNPs were genotyped for this study using the Sequenom MassARRAY iPLEX (Sequenom Inc., San Diego, CA, USA) genotyping platform. Validation set III consists of an independent set of 875 MS cases and 903 controls ascertained as part of the multicenter collaborative GeneMSA study, involving the University of California San Francisco, Vrije Universiteit Medical Center in Amsterdam, University Hospital Basel and GlaxoSmithKline. This study typed samples using the Illumina Sentrix HumanHap550 BeadChip (Illumina, San Diego, CA, USA) genotyping platform.6 As the initial studies used Affymetrix SNP panels, surrogate SNPs were selected from the Illumina platform with close physical proximity and high r-squared value based on the HapMap CEU data.
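The surrogate selection described for validation set III can be illustrated with a minimal sketch. The text gives no numeric cut-offs, so the `min_r2` and `max_dist` defaults below are hypothetical placeholders, as are the function name and the marker labels; ranking by r-squared first and distance second is likewise an assumption about how "close physical proximity and high r-squared" might be combined.

```python
def pick_surrogate(target_pos, candidates, min_r2=0.8, max_dist=50_000):
    """Pick a surrogate marker for a SNP that is absent from a platform.

    target_pos: base-pair position of the untyped SNP.
    candidates: iterable of (snp_id, position_bp, r2_with_target) for markers
        that ARE on the platform, with r2 taken from a reference panel.
    min_r2, max_dist: hypothetical thresholds, not taken from the study.
    Returns the chosen snp_id, or None if no candidate qualifies.
    """
    eligible = [
        c for c in candidates
        if c[2] >= min_r2 and abs(c[1] - target_pos) <= max_dist
    ]
    if not eligible:
        return None
    # Prefer the highest r2; break ties by the smallest physical distance.
    best = max(eligible, key=lambda c: (c[2], -abs(c[1] - target_pos)))
    return best[0]
```

A call such as `pick_surrogate(1100, [("snpA", 1000, 0.95), ("snpB", 1150, 0.95)])` would return the nearer of two equally correlated markers.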
Pairwise linkage disequilibrium (LD) statistics computed for over two million SNPs by the International HapMap Project (posted June 2006) were used to establish the Caucasian-specific haplotype block boundary for each of the 334,923 SNPs in the IMSGC data set. We defined the boundaries of the haplotype block represented by each IMSGC SNP using an iterative procedure (LD-Spline) that extends the block boundary sequentially (by SNP) if the D’ measure between the HapMap SNP and the IMSGC SNP is equal to 1. As the IMSGC SNPs are a subset of all known genomic variants, using HapMap LD statistics in this way provides the larger genomic region (which may harbor susceptibility variants) represented by each IMSGC SNP. In all, 5137 markers in the IMSGC data set were not represented in the HapMap LD data, and the nearest HapMap marker was used as a surrogate to assess haplotype block boundaries. Marker-gene mappings were generated if a haplotype block overlaps with any portion of a gene as described by the Ensembl database. IMSGC markers capture 14,236 genes using LD, compared with 13,425 using the markers without accounting for LD.
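The block-extension and gene-overlap steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function names and input layout are hypothetical, the reference SNPs are assumed to be sorted by position on a single chromosome, and the stopping rule is read literally as "stop at the first neighboring SNP whose D’ with the index SNP is not 1".

```python
def block_bounds(positions, dprime_to_index, index):
    """Return (start_bp, end_bp) of the block around positions[index].

    positions: sorted base-pair positions of reference-panel SNPs.
    dprime_to_index: D' between each reference SNP and the index SNP,
        in the same order as positions (the index SNP's own entry is unused).
    index: location of the genotyped (index) SNP within positions.
    """
    left = index
    # Extend left one SNP at a time while D' with the index SNP equals 1.
    while left > 0 and dprime_to_index[left - 1] == 1.0:
        left -= 1
    right = index
    # Extend right under the same rule.
    while right < len(positions) - 1 and dprime_to_index[right + 1] == 1.0:
        right += 1
    return positions[left], positions[right]


def overlaps(block, gene):
    """True if two closed (start_bp, end_bp) intervals share any base."""
    return block[0] <= gene[1] and gene[0] <= block[1]
```

With real D’ estimates an exact comparison to 1.0 may be too strict, so a tolerance could be substituted; that choice is not specified in the text.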
Using a collection of public data sources that suggest putative gene–gene interactions, we generated a set of roughly 20 million two-SNP models.36 These sources include the Kyoto Encyclopedia of Genes and Genomes, the Protein Families database, the Gene Ontology, Reactome, the Database of Interacting Proteins, NetPath, the Genetic Association Database, previous regions of suspected linkage for MS, hand-selected candidate genes and genes showing differential expression in MS. These resources define gene categories, such as a Gene Ontology term or a KEGG pathway. As the Gene Ontology is a hierarchical resource with some broad categories, Gene Ontology terms used for model generation were restricted to those containing 30 or fewer genes. For the collection of genes within each category, two-SNP models were exhaustively generated by selecting LD-mapped SNPs (using the LD-Spline procedure above) from two different genes of the category. Models containing two SNPs within the same gene were avoided to prevent the assessment of haplotype effects.
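As an illustration of the model-generation step, the sketch below enumerates two-SNP models from gene categories. The function name, the dictionary-based inputs and the optional `max_genes` argument are assumptions made for this example. In the study the size limit applied only to Gene Ontology terms, whereas here it is a per-call option. Excluding any pair of SNPs that map to a shared gene is one interpretation of "two SNPs within the same gene".

```python
from itertools import combinations, product


def two_snp_models(categories, gene_to_snps, max_genes=None):
    """Enumerate unordered SNP pairs drawn from two different genes of a category.

    categories: dict mapping a category name (e.g. a pathway) to a list of genes.
    gene_to_snps: dict mapping a gene to the SNPs assigned to it (e.g. by LD).
    max_genes: if given, skip categories with more genes than this.
    Returns a set of (snp_a, snp_b) tuples with snp_a < snp_b.
    """
    # Reverse map, used to reject pairs in which both SNPs fall in one gene.
    snp_to_genes = {}
    for gene, snps in gene_to_snps.items():
        for snp in snps:
            snp_to_genes.setdefault(snp, set()).add(gene)

    models = set()
    for genes in categories.values():
        if max_genes is not None and len(genes) > max_genes:
            continue
        for g1, g2 in combinations(sorted(set(genes)), 2):
            pairs = product(gene_to_snps.get(g1, ()), gene_to_snps.get(g2, ()))
            for s1, s2 in pairs:
                if s1 == s2 or snp_to_genes[s1] & snp_to_genes[s2]:
                    continue  # same SNP, or both SNPs map to a shared gene
                models.add(tuple(sorted((s1, s2))))
    return models
```

Storing pairs as sorted tuples in a set means a pair suggested by several categories is evaluated only once, which is a design choice of this sketch.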
Models were evaluated using conditional logistic regression37 in the trio analysis, and logistic regression for case–control analysis. Each regression model contained three terms: the additive main effect of each of the two SNPs and a multiplicative interaction term. Two test statistics are generated for each model: MF, describing the likelihood of the specified model given the data, and LR, comparing the full model to a reduced model containing only main effect terms. A significant likelihood ratio test indicates that including an interaction term significantly improves the fit of the model. We required both statistics to be significant, further constraining our results set to models with evidence of non-additivity, consistent with Fisher's description of epistasis.38
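The two tests can be sketched for the case–control setting with a small, dependency-free logistic regression. Several points here are assumptions rather than details from the text: MF is interpreted as a 3-degree-of-freedom likelihood ratio test of the full model against an intercept-only model, genotypes are coded additively as 0, 1 or 2, and all function names are hypothetical. The conditional logistic regression used for the trio stage is not covered. The Newton–Raphson fit has no safeguards, so it will fail on separable data or constant predictors.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def max_loglik(rows, y, iters=25):
    """Maximized log-likelihood of a logistic model; an intercept is added."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):  # plain Newton-Raphson updates
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and 3)."""
    x = max(x, 0.0)
    tail = math.erfc(math.sqrt(x / 2.0))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("only df=1 and df=3 are implemented")


def two_locus_tests(g1, g2, y):
    """Return (MF statistic, MF P, LR statistic, LR P) for one two-SNP model."""
    null = max_loglik([[] for _ in y], y)                       # intercept only
    reduced = max_loglik([[a, b] for a, b in zip(g1, g2)], y)   # main effects
    full = max_loglik([[a, b, a * b] for a, b in zip(g1, g2)], y)
    mf = 2.0 * (full - null)      # whole model vs. intercept only, 3 df
    lr = 2.0 * (full - reduced)   # interaction term only, 1 df
    return mf, chi2_sf(mf, 3), lr, chi2_sf(lr, 1)
```

Under the criteria stated above, a model would be retained only when both returned P-values fall below the chosen threshold (0.001 in the screen, 0.05 in validation).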
This research was funded by the NIH grants 1R01 LM010040-01 and 5R01 NS049477-05. This study makes use of data generated by the WTCCC.
Supplementary Information accompanies the paper on Genes and Immunity website (http://www.nature.com/gene)