Contribution of IKBKE and IFIH1 gene variants to SLE susceptibility

The type I interferon system genes IKBKE and IFIH1 are associated with the risk of systemic lupus erythematosus (SLE). To identify the sequence variants that are able to account for the disease association, we resequenced the genes IKBKE and IFIH1. Eighty-six single-nucleotide variants (SNVs) with potentially functional effect or differences in allele frequencies between patients and controls determined by sequencing were further genotyped in 1140 SLE patients and 2060 controls. In addition, 108 imputed sequence variants in IKBKE and IFIH1 were included in the association analysis. Ten IKBKE SNVs and three IFIH1 SNVs were associated with SLE. The SNVs rs1539241 and rs12142086 tagged two independent association signals in IKBKE, and the haplotype carrying their risk alleles showed an odds ratio of 1.68 (P-value=1.0 × 10−5). The risk allele of rs12142086 affects the binding of splicing factor 1 in vitro and could thus influence its transcriptional regulatory function. Two independent association signals were also detected in IFIH1, which were tagged by a low-frequency SNV rs78456138 and a missense SNV rs3747517. Their joint effect is protective against SLE (odds ratio=0.56; P-value=6.6 × 10−3). In conclusion, we have identified new SLE-associated sequence variants in IKBKE and IFIH1, and proposed functional hypotheses for the association signals.


INTRODUCTION
Systemic lupus erythematosus (SLE) is a clinically heterogeneous disease, which affects multiple organ systems and causes significant morbidity and mortality. 1 About 8-15% of the genetic component of SLE is accounted for by DNA sequence variants identified by association studies in European populations. 2 These findings have underscored the crucial role of the innate immune system, and particularly of the type I interferon (IFN) system in SLE. 3,4 Genes involved in the type I IFN system such as STAT4, IRF5, TNFAIP3, TNFSF4 and TNIP1, [5][6][7][8] have been replicated as SLE susceptibility genes in multiple ethnical groups. Previously, we have provided evidence of association for two additional type I IFN system genes, namely inhibitor of kappa light polypeptide gene enhancer in B cells kinase epsilon (IKBKE) and IFN induced with helicase C domain 1 (IFIH1). 2,9 IKBKE encodes IKKe, which belongs to the IkB kinase family. Its expression is restricted to lymphoid tissues and peripheral blood leukocytes. 10,11 IKKe is activated downstream of Toll-like receptors or cytosolic RNA/DNA sensors, and mediates phosphorylation and activation of the transcription factors, for example, IFN regulatory factors 3 and 7, or activation of the nuclear factor of kappa B (NF-kB) transcription factor system. [12][13][14] Both signalling cascades can trigger the expression of type I IFN and other pro-inflammatory cytokines. Recently, IKBKE was also found to be associated with rheumatoid arthritis 15 and antibodypositive primary Sjö gren's syndrome. 16 IFIH1, also known as melanoma differentiation-associated protein 5, is a cytosolic RNA sensor belonging to the retinoic acid-inducible gene I-like receptor family. IFIH1 is expressed in most cell types at low levels, and its expression can be largely enhanced by exposure to IFNs or viruses. 17,18 When recognizing viral nucleic acids, IFIH1 triggers activation of IFN regulatory factor 3, 7 and NF-kB via mitochondrial antiviral signalling protein, leading to antiviral and pro-apoptotic responses. 19 Genetic association studies have demonstrated an important role for IFIH1 in autoimmune diseases such as type I diabetes, 20 rheumatoid arthritis, 21 multiple sclerosis, 22 Graves' disease 23 and psoriasis. 24 With the aim of identifying functional sequence variants of relevance for SLE in IKBKE and IFIH1, we resequenced the genes including exonic and intronic regions in pools of SLE patients and controls from Sweden. The identified sequence variants together with imputed variants were subsequently tested for association with SLE in a large Swedish case-control cohort. With this strategy, our study explored the variation spectrum of IKBKE and IFIH1, and elucidated their contribution to the association to SLE.

Resequencing and variant identification
The IKBKE and IFIH1 genes, including exonic, intronic and untranslated regions were amplified by long-range PCRs in a pool of 100 SLE patients or of 100 controls, and then sequenced for comprehensive identification of potentially functional sequence variants for SLE. In total, 91 high-quality single-nucleotide variants (SNVs) in IKBKE and 138 high-quality SNVs in IFIH1 were identified. About 70% of these SNVs were already archived in the Short Genetic Variations database dbSNP v137, whereas 70 novel sequence variants were discovered. Of the identified SNVs, 10 were located in exonic regions, 4 in untranslated regions, one at a splice site and the remaining ones were intronic variants.

SLE association analysis
To limit the number of SNVs for genotyping but still maintain our discovery potential, we selected SNVs found by resequencing based on their potentially functional effect and allele frequencies in patients and controls. Three SNVs (rs1539243 and rs17433930 in IKBKE, 9 and rs1990760 in IFIH1 2,25 ) associated with SLE and one SNV (rs35732034 in IFIH1 20 ) associated with type I diabetes in previous studies were also added to the genotyping panel. In total, 86 SNVs were genotyped in an extended cohort of 1140 SLE patients and 2060 controls from Sweden. In addition, 108 highquality sequence variants that were imputed based on our genotyped SNVs and data from the 1000 Genomes Phase I (build 37, release March 2012) were included in the association analysis. In all, 10 SNVs in IKBKE and 3 SNVs in IFIH1 showed significant association signals with SLE, of which the associations of 12 SNVs have not been reported before (Table 1 and Supplementary  Table 1). The minor alleles of nine of the associated SNVs in IKBKE conferred a moderate risk for SLE (odds ratio ¼ 1.12-1.33). The most profound P-values were from the SNVs rs12142086 (Pvalue ¼ 2.1 Â 10 À 4 ) and rs2297545 (P-value ¼ 5.5 Â 10 À 4 ), which are located in intron 13 and exon 8 of IKBKE, respectively. Their association signals were retained after permutations. For IFIH1, the minor alleles of the associated SNVs and particularly the lowfrequency SNV rs78456138 appeared to confer protection against SLE. This SNV is located in intron 11 of IFIH1, and its T allele (minor allele frequency (MAF) ¼ 0.01-0.02) showed an odds ratio as low as 0.56 (P-value ¼ 0.0059). The association of the previously reported common SLE SNV rs1990760 was also replicated. Although we could not detect a significant association signal from the SNV rs17433930 in IKBKE, the effect of the minor allele was in the same direction as reported in our previous study. 9 None of the novel SNVs in IKBKE or IFIH1 identified by resequencing in this study or the diabetes-associated rare SNV rs35732034 in IFIH1 (MAFs ¼ 0.0005-0.01 after genotyping) gave any significant association signal, which might be due to the limited statistical power (Supplementary Figure 1).
Independence test and analysis for the joint effect of risk alleles To dissect the relationship between the associated SNVs and seek for independent association signals, we calculated pairwise linkage disequilibrium (LD) values between the SLE-associated SNVs for both genes (Supplementary Table 2). For IKBKE, the two SNVs at the 5 0 -gene region, rs11117858 and rs1539241, had an r 2 value of 0.60 and a D 0 value of 1.00. The SNV in exon 8 rs2297545, and the four SNVs spanning intron 13 and intron 15, rs11118087, rs12409804, rs12142086 and rs11118132 also showed a high correlation (r 2 ¼ 0.45-1.00; D 0 ¼ 0.93-1.00). For IFIH1, strong LD was observed between the SNVs rs3747517 and rs1990760 (r 2 ¼ 0.65; D 0 ¼ 1.00), but not for the SNV rs78456138 (r 2 ¼ 0.04-0.06; D 0 ¼ 1.00).
The results from pairwise conditional logistic regression analysis were in accordance with the correlation patterns among SNVs within each gene (Supplementary Table 3). For IKBKE, when conditioning on the most significantly associated SNV rs12142086, the association signals from the other SNVs disappeared, except for the two SNVs at the gene 5 0 -region; whereas at least marginal significant association P-values remained for rs12142086 when conditioning on the other SNVs, except for the SNV rs12409804, which is in high LD with rs12142086. In addition, when conditioning on the two SNVs at the gene 5 0 -region, rs11117858 Contribution of IKBKE and IFIH1 gene variants to SLE C Wang et al and rs1539241, most of the association signals from the SNVs located in the other gene regions persisted. Conditioning on the other SNVs, significant associations from these two SNVs at the gene 5 0 -region were retained in most cases; whereas they lost their significant P-values when conditioned on each other. These observations indicate that there are two independent association signals for the IKBKE SNVs in our study. One originates from the 5 0 -region of the gene, which is more preferably tagged by the SNV rs1539241, and the other from the remaining gene regions, which is most likely to be accounted for by the SNV rs12142086. Similarly, two independent association signals were also revealed for IFIH1. The first one was from the low-frequency SNV rs78456138, and the second one from the two common SNVs rs3747517 and rs1990760 located in the exonic regions of IFIH1.
Further analysis was conducted to investigate the possible joint effect of the independent risk or protective alleles in either gene ( Table 2). For IKBKE, the haplotype composed of the risk alleles of rs1539241 and rs12142086 had a frequency of about 5% in our sample collection, and conferred a disease risk as high as 1.68 (P-value ¼ 1.0 Â 10 À 5 ). In contrast, the haplotype composed of their respective protective alleles, which was the most abundant haplotype in the population, showed a protective effect against SLE (odds ratio ¼ 0.82, P-value ¼ 9.3 Â 10 À 4 ). The association signals from these two haplotypes were also retained after permutations. For IFIH1, the association of the haplotype composed of the minor alleles of the SNVs rs78456138 and rs3747517 was comparable to that of rs78456138 alone (odds ratio ¼ 0.56), which indicated that the T allele of the low-frequency SNV rs78456138 appeared to dominate the protective effect; whereas the most abundant haplotype of these two SNVs render a weak risk for SLE (odds ratio ¼ 1.14, P-value ¼ 0.026). Robust association was only observed for the minor allele haplotype after permutations.
To investigate the SLE association for the rare SNV alleles (MAFso0.01), we also examined the combination of all rare alleles in IKBKE or IFIH1 using the sequence kernel association test, which largely increases the statistical power for detecting potential effect of rare variants. 26 No significant association was observed for the combinations of all rare SNV alleles with MAFs o0.01 (data not shown).

Functional implications for associated SNVs
Considering that the associated SNVs may affect interactions between DNA and protein factors, we compared the 30-bp flanking sequences of these SNVs with canonical protein binding motifs in databases. The major allele T of rs12142086, which was  the strongest associated SNV in IKBKE, was found to be located at a highly stringent nucleotide in the binding motif of splicing factor 1 (SF1; P-value ¼ 4.8 Â 10 À 6 , expected number of false positives (E-value) ¼ 8.0 Â 10 À 3 ; Figure 1a). Evidence was provided experimentally by electrophoretic mobility shift assays (EMSAs), which showed that the major allele (T) of rs12142086 had a stronger binding potential with certain protein factors in nuclear extract compared with the minor allele (C; Figure 1b). Subsequently, we demonstrated that SF1 was involved in the allelic differential binding by super-shift assays using antibodies against SF1 (Figure 1c). SF1 has been reported to have regulatory roles in transcription. 27,28 As SF1 is also known to facilitate the recognition of splice sites and the assembly of the early spliceosomal complex with pre-mRNA, 29 we also compared the sense-and antisense-RNA sequences bearing the major (U/A) or minor (C/G) allele of rs12142086 using gel shift assays. The sense-RNA sequence bound more prominently to protein factors in the nuclear extract than the antisense-RNA sequence (Supplementary Figure 2a). However, we did not observe any differential binding of protein factors to the U and C allele of rs12142086, or any super-shift signal from a possible interaction with SF1 antibodies (Supplementary Figure 2b).

DISCUSSION
Prompted by the rapid development of sequencing technologies, deep resequencing has become an increasingly popular approach for studying disease candidate genes, especially when integrated with association analysis for identifying causal mutations, which can account for previously known association signals. In this study, we explored the spectrum of sequence variants in IKBKE and IFIH1, and identified 12 SNVs as novel risk factors for SLE. In addition, we confirmed the association of one SNV in IFIH1, which has been reported previously. The association signals of all 13 SLEassociated SNVs we identified are supported by statistical power of over 60% (Supplementary Figure 1). Most of them are common in the population (MAFs ¼ 0.15-0.39 after genotyping) and have modest effects. Two independent association signals were detected from IKBKE, which can be tagged by the SNVs rs1539241 and rs12142086, respectively. Association analysis of the haplotype composed of the risk alleles of these two SNVs reveals a potential synergic effect. We further examined an extended genomic region adjacent to IKBKE based on the data from the HapMap project (Supplementary Figure 3a) and found that IKBKE is located in a relatively isolated LD block. The LD between either of the two tagging SNVs in IKBKE and SNVs located in the neighbouring genes is relatively low (r 2 o0.2), supporting the potential functional importance of IKBKE in SLE. The SNV rs12142086 is of particular interest, as our in vitro assays demonstrated that its minor allele may influence the transcriptional regulatory function of SF1 by impairing its binding motif. In addition, we showed that rs12142086 was unlikely to affect the binding of SF1 at the RNA level. Indeed, the pre-mRNA sequence bearing rs12142086 lacks the canonical binding motif of SF1, as well as the polypyrimidine tract recognized by the U2 snRNP auxiliary factor 65 kDa subunit (U2AF65), which functions as a cofactor with SF1 in the spliceosomal complex. 30 The case with IFIH1 is more complicated. The minor alleles of the two missense SNVs, rs3747517 and rs1990760, are both predicted to have benign effects on the protein. The impact of rs1990760 has been extensively investigated. Its risk allele is associated with a lower IFN-a level, but enhanced IFN-induced gene expression in the SLE patients who carry anti-dsDNA antibodies. 31 In addition, this allele is associated with a higher expression level of the IFIH1 gene itself. 25 IFIH1 is located in a high LD region that is shared by the genes fibroblast activation protein alpha (FAP) and grancalcin EF-hand calcium binding protein (GCA) (Supplementary Figure 3b). FAP is an important factor in tumour development and the wound-healing processes. 32 It is inducible by tumour necrosis factor-a, 33 and overexpression of FAP has been reported in rheumatoid arthritis 34 and Crohn's disease. 35 The expression of GCA is restricted to neutrophils and monocytes, 36 in which GCA participates in leukocyte-specific host defence. 37 Nevertheless, none of the other variants in this LD block provide a more plausible functional hypothesis than the two missense SNVs in IFIH1, rs3747517 and rs1990760.
No clear functional hypotheses could be proposed for the two intronic SNVs (rs11117858 and rs1539241) at the 5 0 -region of IKBKE or the intronic SNV (rs78456138) in IFIH1, which tag independent association signals for their respective genes in this study. Particularly, the low-frequency SNV rs78456138, which gives the strongest association signal in IFIH1 in our study, may tag other independent functional variants in the high LD region.
Resequencing of targeted genomic regions in pools of individuals is a promising strategy for detecting novel rare alleles. 38 However, a proportion of the SNVs with differences in allele frequencies between patients and controls according to resequencing were not replicated in our follow-up association analysis of a larger number of samples. Despite that no clear difference was observed in the distribution of the local read coverage depth for the SNVs between the SLE patients and the controls (Supplementary Figure 4), the different pooling strategies for patients and controls might be one source of bias for accurate estimation of allele frequencies. The stochastic experimental factors during PCR amplification and second generation sequencing may also contribute to the inconsistent findings between the two stages. In addition, the coverage depth along the entire surveyed regions was not perfectly consistent, and the low local coverage depth at certain amplicons could have affected the identification of sequence variants (Supplementary Figure 4). The imbalance of local coverage depth may be due to the imperfect pooling of equal amounts of amplicons, or due to the properties of the DNA sequences, which may have caused bioinformatic mis-alignment. Improved methods for target selection that avoid PCR, as well as increased sequencing accuracy and capacity, together with large-scale barcoding of individual samples are likely to facilitate direct association analysis through resequencing in the future. The accuracy in estimating allele frequencies based on the counts of sequence reads may be improved by increasing the number of subjects and the depth during the sequencing stage. 39 However, considering the cost for sequencing, association analysis based on genotyping in large populations is still indispensable for genetic studies. In particular thanks to the recent large-scale projects on human genetic variation, imputation has become a powerful tool for comprehensive investigation of sequence variants based on a limited number of genotyped loci. Indeed, in our study many single-nucleotide polymorphisms located in regions of low coverage depth were retrieved using imputation.
In conclusion, we have identified new sequence variants in IKBKE and IFIH1 that are associated with SLE, dissected the contribution of the associated variants to the risk of SLE, and provided functional hypothesis to one independent disease association signal in IKBKE. More exhaustive investigation of sequence variants, especially low-frequency and rare variants in these two genes within a larger cohort, or with subjects of different origins may help to pinpoint any unidentified functional variants. Systematic functional studies of the disease-associated variants are required for a better understanding of the mechanism underlying their association with the risk of SLE.

SUBJECTS AND METHODS Subjects
A total of 100 Swedish SLE patients and 100 controls for which there was sufficient amount of DNA available were selected for targeted sequencing. Genotyping was performed in an extended cohort of 1140 SLE patients

Sequence data analysis and variant calling
Sequence reads were aligned to their target regions in the human reference genome (GRCh37/hg19), and potential SNVs were called using the software Maq v.0.7.1, 42 with manual inspection of the sequence reads at called positions. For IKBKE and IFIH1, 65% and 60% of the reads aligned to the reference sequences, respectively. For both genes, 480% of the amplified regions were covered by at least 10 sequence reads. The median coverage depth for the reads was 42000 from SLE patients and 41000 from the controls, which corresponded to 420and 410-fold coverage per individual.
For further analysis, we retained SNVs that were located at high-quality base positions (quality score X28) and that were supported by at least 10 reads for the minor allele. In addition, SNVs for which the number of reads for the minor alleles was less than twice the reads for the background were removed. The MAFs of the called SNVs were estimated by counting the number of reads, which carried either allele. Any SNV with a MAF o0.005, corresponding to 1 out of 200 chromosomes, was also filtered out. Putative SNVs were classified as novel or known by referencing to the Short Genetic Variations database dbSNP v137 (http://www.ncbi.nlm. nih.gov/projects/SNP/). Annotations for all identified SNVs were retrieved from the SeattleSeq Annotation server (http://gvs.gs.washington.edu/ SeattleSeqAnnotation).

Genotyping
SNVs were selected based on the following criteria: non-intronic SNVs (n ¼ 15); SNVs called exclusively in SLE patients or in controls (n ¼ 144); and SNVs occurring in both sample groups with an allele frequency difference X0.03 (n ¼ 19). A total of 86 SNVs in IKBKE and IFIH1 were included for genotyping with custom GoldenGate Genotyping assays using VeraCode microbeads on the Illumina BeadXpress system (Illumina). A subset of 72 SNVs passed the following quality filters applied to exclude poor genotyping assays from the downstream analysis: SNVs or samples with 410% missing data (n ¼ 13) and SNVs with genotypes, which deviated from Hardy-Weinberg equilibrium (P-value o0.001, n ¼ 1) in controls.

Imputation
Imputation was performed based on the genotyped 72 SNVs, which passed quality filters using the software IMPUTE2, 43 with sequence variants from the 1000 Genomes Phase I (build 37, release March 2012) as reference. We kept imputed sequence variants with information scores 40.8, and imputed genotypes with confidence probability 40.9. The imputed sequence variants with 410% missing genotypes in all individuals and the ones deviating from Hardy-Weinberg equilibrium (P-value o0.001) in controls were excluded from downstream analysis.

Statistical analysis
Using the software PLINK, 44 allelic association analysis for individual SNVs was performed by comparing the allele counts in cases and controls with Fisher's exact tests, with 10 6 permutations to control for the rare SNVs and multiple testing involved in this study. D 0 -and r 2 -based pairwise LD values between associated SNVs were calculated using the genotype data from the controls. Independence of association signals between SNVs was tested with conditional logistic regression analysis under an additive model. Haplotype association analysis was performed using logistic regression analysis, also with 10 6 permutations, all using PLINK. For the rare SNVs (MAFs o0.01), we examined their joint effects in IKBKE or IFIH1 using the R package sequence kernel association test (SKAT) v0.63 26 with default parameters. Statistical power was estimated using the software Quanto v1.2.4 (http://hydra.usc.edu/gxe/) based on the total number of genotyped SLE patients and controls, with a log-additive model, a twosided type I error rate of 0.05 and a Swedish SLE prevalence of 0.068%. 45 Functional prediction Potential binding sites for protein factors were predicted by comparing the sequences around each SNV, including 10-bp upstream and downstream of the variation site, with known motifs in the JASPAR CORE 2009 database 46 and the TRANSFAC database 7.0 47 using the software Motif Comparison Tool (TOMTOM) v4.7.0. 48 A match with a P-value o1 Â 10 À 3 and an expected number of false positives (E-value) o0.05 was called as significant. Effects of the non-synonymous SNVs were predicted using the software Structure SNP (StSNP). 49 Electrophoretic mobility shift assay DNA and RNA EMSA were performed according to the instructions from the LightShift Chemiluminescent DNA/RNA EMSA Kit (Thermo Scientific Pierce Protein Research, Rockford, IL, USA). Four pairs of 31-nt long singlestranded DNA oligonucleotides (Integrated DNA Technologies, Leuven, Belgium) with the T/A or C/G allele of the IKBKE SNV rs12142086 positioned in the centre, and either 5 0 -biotinylated or unlabelled, were annealed to generate double-stranded DNA probes. The 5 0 -biotinylated single-stranded RNA probes (Integrated DNA Technologies) shared base compositions with the single-stranded DNA oligonucleotides, corresponding to the forward and reverse strand bearing either the U/C or the A/G allele. Nuclear extract was prepared from human peripheral blood mononuclear cells from a healthy blood donor using the NE-PER Nuclear and Cytoplasmic Extraction Reagents Kit (Thermo Scientific Pierce Protein Research). For super-shift assays, nuclear extract was incubated with SF1 antibody (sc-48792 X) (Santa Cruz Biotechnology, Dallas, TX, USA) before the binding reactions.