Abstract
Systemic lupus erythematosus (SLE) has a strong but incompletely understood genetic architecture. We conducted an association study with replication in 4,478 SLE cases and 12,656 controls from six East Asian cohorts to identify new SLE susceptibility loci and better localize known loci. We identified ten new loci and confirmed 20 known loci with genome-wide significance. Among the new loci, the most significant locus was GTF2IRD1-GTF2I at 7q11.23 (rs73366469, Pmeta = 3.75 × 10−117, odds ratio (OR) = 2.38), followed by DEF6, IL12B, TCF7, TERT, CD226, PCNXL3, RASGRP1, SYNGR1 and SIGLEC6. We identified the most likely functional variants at each locus by analyzing epigenetic marks and gene expression data. Ten candidate variants are known to alter gene expression in cis or in trans. Enrichment analysis highlights the importance of these loci in B cell and T cell biology. The new loci, together with previously known loci, increase the explained heritability of SLE to 24%. The new loci share functional and ontological characteristics with previously reported loci and are possible drug targets for SLE therapeutics.
Main
SLE is a debilitating autoimmune disease characterized by pathogenic autoantibody production that can affect virtually any organ. Compared with European-ancestry populations, Asians have higher SLE incidence, more severe disease manifestations and greater risk of organ damage (for example, lupus nephritis)1,2. SLE has a strong genetic component, with a sibling risk ratio3 (λs) of ∼30, and ∼40 susceptibility loci have been reported through candidate gene studies and genome-wide association studies (GWAS)4,5,6. However, these loci account for only 8–15% of disease heritability7,8, leaving many contributing loci unidentified. Because many susceptibility loci are shared across autoimmune diseases, and because studying high-risk populations can facilitate the identification of new risk loci, we performed high-density association analysis in East Asians.
Our study was conducted in three stages (Fig. 1 and Online Methods). First, after quality control, we performed association analysis based on the Immunochip9 in 2,485 cases and 3,947 controls from Korean (KR), Han Chinese (HC) and Malaysian Chinese (MC) populations and identified 578 associated regions (P < 5 × 10−3) (Supplementary Fig. 1 and Supplementary Tables 1–3). To increase statistical power, we included 3,669 out-of-study Korean controls (Supplementary Fig. 2 and Supplementary Table 1) and conducted imputation-based association analysis (Online Methods). Second, we followed up 16 newly associated loci with Pdiscovery-meta < 5 × 10−5 in three replication cohorts: one Japanese cohort (JAP) and two independent cohorts of Han Chinese ancestry from Beijing (BHC) and Shanghai (SHC). We identified ten new loci associated at genome-wide significance (Pmeta < 5 × 10−8; Figs. 2 and 3, Table 1, Box 1 and Supplementary Fig. 3) and six new suggestive loci (Supplementary Table 4). Third, we used a series of bioinformatic analyses, including two recently developed Bayesian-based tests10,11 (Online Methods), to identify the most likely functional variants at each locus. Because the lead SNPs might not be functional, we examined SNPs in high linkage disequilibrium (LD; r2 > 0.8) with them. Variants were annotated using Encyclopedia of DNA Elements (ENCODE)12 and blood expression quantitative trait locus (eQTL) data13. We estimated the proportion of the heritability and sibling risk (λs) explained by new and known SLE-associated loci.
Figure 1: This study followed three stages. In stage 1, we genotyped three Asian cohorts of patients with SLE and controls and identified 578 regions with P < 5 × 10−3. Next, we performed imputation-based fine-mapping, association tests and conditional analysis on the quality-controlled data. We identified 16 statistically independent loci (P < 5 × 10−5) for replication. In stage 2, we performed an in silico replication of these 16 loci in an independent Japanese (JAP) cohort and two independent Han Chinese cohorts from Shanghai (SHC) and Beijing (BHC). We identified new regions represented by ten replicated SNPs that passed the genome-wide significance threshold (P < 5 × 10−8). In stage 3, we performed integrated functional and interaction analyses of SLE-associated loci.
Figure 2: New significant loci are highlighted in red, suggestive loci in blue and previously known SLE-associated loci in black. The red line represents the threshold for genome-wide significance (P = 5 × 10−8), and the blue line represents the threshold for suggestive evidence of association (P = 5 × 10−5).
Figure 3: We identified ten new loci in the KR, HC and MC cohorts that were replicated in at least two independent cohorts. A partial discovery meta-analysis is presented in the middle of the plot, and the overall meta-analysis is presented below the replication cohorts. Diamonds represent the meta-analysis odds ratios, and tick marks show the 95% confidence intervals for each cohort. KR, Korean; HC, Han Chinese; MC, Malaysian Chinese; JAP, Japanese; BHC, Beijing Han Chinese; SHC, Shanghai Han Chinese.
The strongest new signal (Pmeta = 3.75 × 10−117, ORmeta (95% confidence interval (CI)) = 2.38 (2.22–2.56)) was at rs73366469, between two 'general transcription factor' genes14, GTF2I and GTF2IRD1 (Supplementary Table 5). Surprisingly, this signal was much stronger than that of any variant in the human leukocyte antigen (HLA) region. Notably, rs117026326 within GTF2I (92 kb from rs73366469) was recently identified as a major risk locus for primary Sjögren's syndrome, another autoimmune disease, in Han Chinese15 and southern Chinese16. Two recent Sjögren's syndrome GWAS15,17 showed substantial overlap with SLE18, supporting the validity and immune relevance of this region. To confirm this association signal, we genotyped 2–6 SNPs (including rs73366469) in ∼40% of our discovery samples and in two replication cohorts (Supplementary Table 6). Associations were consistently replicated; rs117026326 showed the strongest association but is in LD with rs73366469 (r2 = 0.76, 0.65 and 0.64 in controls), making it difficult to separate the effects of these SNPs (Supplementary Table 6). Interestingly, conditional analysis on four SNPs showed that rs80346167 (GTF2IRD1) was independent in the KR cohort, supporting the involvement of both genes in the locus. However, because of the strong correlation structure among variants, genotyping and fine-mapping on a larger scale are required to further delineate this signal. ENCODE data indicate that the high-LD SNP rs7800325 (r2 = 0.99) and the indel rs587608058 (r2 = 0.81), ∼1,000 bp from rs73366469, lie within conserved enhancers, active chromatin and transcription factor binding sites in CD4+ T cells and GM12878 lymphoblastoid cells (Supplementary Fig. 4a). Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and chromosome conformation capture (Hi-C) showed that this region overlaps transcription start sites for GTF2I and VGF (Supplementary Fig. 5 and Supplementary Tables 7 and 8).
The second strongest signal was at the intronic SNP rs10807150 (DEF6, Pmeta = 6.06 × 10−16), which is correlated with rs8205 (ZNF76 promoter, r2 = 1), a cis-eQTL altering expression of ZNF76 and DEF6 (Supplementary Tables 5 and 9). The nearby SNP rs4711414 (r2 = 0.91) alters a highly conserved promoter and transcription factor binding site cluster (Supplementary Fig. 4b). The third strongest signal was near IL12B (encoding interleukin (IL)-12β; rs2421184, Pmeta = 4.67 × 10−12), in a highly conserved enhancer (Supplementary Fig. 4c and Supplementary Tables 5 and 7).
Among the other new signals, rs7726414 (Pmeta = 1.13 × 10−11) in the distal promoter of TCF7 is in high LD with rs6874758 (r2 = 0.99), which lies in a conserved enhancer (Supplementary Fig. 4d and Supplementary Tables 5 and 7). The nearby SNP rs201806887 (r2 = 0.79) alters a strong enhancer and transcription factor binding site cluster. The 5p15.33 signal was in an oncogene19 (TERT, intronic; rs7726159, Pmeta = 2.11 × 10−11); this locus is tightly bound by the RNA-binding proteins PABPC1 and SLBP (Supplementary Fig. 4e). The high-LD SNP rs7705526 (r2 = 0.94) has been linked to chronic lymphocytic leukemia20. The CD226 signal is explained by the intronic SNP rs1610555 (Pmeta = 4.50 × 10−11), which is linked (r2 = 0.74) to the nonsynonymous SNP rs763361 (Supplementary Fig. 4f); this signal has been associated with multiple autoimmune diseases. rs763361 is a cis-eQTL for CD226 and a trans-eQTL for ACRBP and MAP3K7CL (Supplementary Table 9). The signal at PCNXL3 (rs2009453, Pmeta = 9.61 × 10−11) is in strong LD (r2 = 0.95) with rs931127 (Supplementary Fig. 4g), a cis-eQTL for PCNXL3, SIPA1 and RELA (Supplementary Table 9). The signal at RASGRP1 (rs12900339, Pmeta = 4.73 × 10−10) is involved in multiple chromatin interactions (Supplementary Table 8) and is correlated (r2 = 0.77) with rs12324579, a cis-eQTL for C15orf53 (Supplementary Table 9). Intronic rs61616683 (SYNGR1, Pmeta = 5.73 × 10−10) lies in active chromatin (Supplementary Fig. 4i) and is a cis-eQTL for SYNGR1 (Supplementary Table 9); the correlated SNP rs909685 (r2 = 0.86) is associated with rheumatoid arthritis in Koreans21. Intronic rs2305772 (SIGLEC6, Pmeta = 1.34 × 10−9) is a cis-eQTL for SIGLEC6-SIGLEC12 (Supplementary Table 9) and disrupts a conserved SIGLEC6 splice junction (Supplementary Fig. 4j).
We also confirmed association (P < 0.005) with 36 previously reported SLE susceptibility loci (Supplementary Fig. 6 and Supplementary Table 10). Conditional analysis (Online Methods) at each locus identified secondary associations in three new and ten reported loci (Supplementary Table 11).
As expected, HLA association was replicated in all cohorts (Supplementary Fig. 7a and Supplementary Table 10). The strongest signal was at the HLA class II locus (rs113164910, Pdiscovery-meta = 2.48 × 10−37, OR = 1.65), 14 kb 3′ of HLA-DRA. To further delineate the HLA signal, we imputed SNPs, classical HLA alleles and HLA amino acid residues in all three cohorts (KR, HC and MC; Online Methods). The most significant associations were at HLA-DRβ1 amino acid position 13 (P = 9.5 × 10−45) and the linked position 11 (P = 7.37 × 10−39), consistent with a recent HLA fine-mapping study that used a subset (∼60%) of our Korean samples22 (Supplementary Table 12). Our results also confirmed the reported associations of two linked classical alleles, HLA-DRB1*15:01 (P = 4.19 × 10−29) and HLA-DQB1*06:02 (P = 6.46 × 10−26) (Supplementary Fig. 7b and Supplementary Table 12). To investigate secondary effects within and outside of HLA-DRB1, we performed conditional analysis. Consistent with the recent study22, the associations at HLA-DRB1 were almost entirely explained by the primary effect of amino acid positions 11 and 13 and the secondary effect of amino acid position 26 (P = 4.09 × 10−17). After accounting for the effect of the HLA-DRB1 locus (Online Methods), no additional association signals were detected; thus, the HLA-DRB1 locus explained most of the major histocompatibility complex (MHC) associations (Supplementary Fig. 7c and Supplementary Table 12). Comparing SNP and classical HLA allele associations, we found that both analyses localized the strongest effects to the HLA-DRB1 region (Supplementary Fig. 7b), as evidenced by the signals for HLA-DRB1*15:01 and nearby rs113164910.
Additionally, we identified six new suggestive loci (1.9 × 10−9 < Pmeta < 1.12 × 10−5) harboring three missense variants (Supplementary Tables 4 and 13). Although three of these loci (ATG16L2-FCHSD2, MYNN-LRRC34 and CCL22) passed genome-wide significance, further replication is needed to confirm their association.
We replicated most of the previously reported genes with the same published SNPs or with highly correlated SNPs (Supplementary Table 14). We also found four genes with new uncorrelated SNPs shifting the association peaks in Asians (Supplementary Table 15). Of these, ARHGAP31-TMEM39A-CD80 was of special interest: previously reported association signals for TMEM39A (rs1132200)23 and CD80 (rs6804441)6 were now explained by a synonymous SNP in ARHGAP31 (rs2305249, Pmeta = 1.64 × 10−9), a cis-eQTL of B4GALT4 and POGLUT1 (a Notch signaling regulator24).
To identify the most likely functional variants within a locus, we used Bayesian-based analyses10,11, eQTL searches and epigenetic analyses (Online Methods and Supplementary Tables 7–9 and 16). We found that the lead SNPs in the GTF2I, IL12B, PCNXL3, SYNGR1, RASGRP1 and SIGLEC6 loci had a high probability of being functional (Supplementary Table 17).
To explore biological functions and pathways related to the SLE susceptibility loci (new and previously published), we performed gene set enrichment analysis (GSEA; Online Methods). The new and published loci shared pathways and gene ontology categories, including immunity, inflammation and cytotoxicity (Supplementary Fig. 8). Moreover, GSEA with a drug target database25 identified 56 significantly enriched drugs (adjusted P < 0.05; Supplementary Table 18) that affected expression of the target loci, including SLE therapeutics26 (cyclosporine, zinc acetate, hydrocortisone and methotrexate). Of note, GTF2I was significantly enriched in the signatures of drugs used to treat leukemia (imatinib; adjusted P = 1.82 × 10−10) and lymphoma (cisplatin; adjusted P = 2.68 × 10−4). Immune system involvement was confirmed by enrichment analysis of SLE-associated loci in mouse immune phenotypes, with significant enrichment in abnormal lymphocyte, leukocyte and/or immune cell physiology and abnormal cell-mediated and/or adaptive immunity (Supplementary Table 19).
To understand the relationship between our newly identified loci and known SLE loci and to identify possible molecular mechanisms involved in SLE pathogenesis, we performed network interaction analysis27,28 (Online Methods). We found that the new and replicated SLE loci are connected directly and indirectly to each other through gene regulation as well as protein and biochemical interactions (Supplementary Figs. 9 and 10). Text mining methods29 confirmed that many of these loci have strong associations with one another in the literature and show how the new loci are related to the replicated loci (Supplementary Fig. 11). Within these relationships, we further identified subnetworks of molecules interacting with our new loci in the context of known SLE-relevant genes (TERT, IL12B, GTF2I, RELA, SRC and NFKB2; Supplementary Fig. 12).
We identified only one nonsynonymous variant (rs2305772, p.Pro246Ser/splice junction, SIGLEC6) in LD (r2 ≥ 0.8) with the new SNPs (Supplementary Table 20), suggesting that other variants likely contribute to SLE pathogenesis through epigenetic regulation rather than alterations to protein structure or function. Joint analysis of lead and correlated (r2 > 0.8) SNPs indicated a 13-fold enrichment in strong enhancers in K562 erythroleukemia cells and an up to 22-fold enrichment in DNase I hypersensitivity sites in MCF-7 breast cancer cells (Supplementary Table 21).
In six of the ten new loci (GTF2I, DEF6, CD226, PCNXL3, RASGRP1 and SIGLEC6), the highly conserved ancestral alleles were the risk alleles. Except at the SIGLEC6 locus, all derived protective alleles were the major alleles in both Asians and European-ancestry populations (Utah residents of Northern and Western European ancestry (CEU)); at SIGLEC6, the derived protective allele was the major allele only in Europeans. Notably, the derived risk allele at SYNGR1 occurs at a frequency of >80% in Asians (Han Chinese in Beijing, China (CHB) and Japanese in Tokyo, Japan (JPT)), compared with ∼20% in the CEU population; FST, iHS and XP-EHH analyses suggest that SYNGR1 is undergoing selection in Asian populations (Supplementary Table 22).
We assessed whether regions associated with these SNPs (new and replicated) harbored genes expressed in distinct immune cell types30 (Online Methods). We identified significant (1 × 10−9 < P < 4 × 10−4) cell type–specific expression of genes within the new loci in human B cells, T cells, natural killer cells and dendritic cells (Fig. 4 and Supplementary Fig. 13a). This result was further strengthened by replication with homologous mouse genes in mouse cell types, with significant enrichment in CD19+ B cells (P = 1.0 × 10−5) and transitional B cells (P = 1.0 × 10−5) (Supplementary Fig. 13b). Thus, our results point to a strong (and conserved) effect of gene expression in B and T cells during SLE pathogenesis.
Figure 4: We estimated enrichment of our gene set in a set of human (FANTOM5) cell lines. Dark red indicates cell types in which SLE loci are overexpressed, that is, cell types with a high correlation (Pearson's correlation coefficient) of SLE loci expression. P values (blue bars) that passed the multiple-testing threshold (black line) show significant enrichment in SLE-associated loci (indicated by an asterisk).
Six of the ten newly identified loci are also associated with other autoimmune diseases, including celiac disease, rheumatoid arthritis, type 1 diabetes and multiple sclerosis (Supplementary Table 23), suggesting pleiotropic effects. This pattern extended to suggestive signals at ATG16L2, PTPRC, UBAC2 and RGS1, which are reportedly associated with other autoimmune diseases.
Collectively, these new and previously reported SLE susceptibility variants (47 SNPs) explain 24% of the total heritability of SLE in Asians (Supplementary Table 24); the HLA region explains 2% and the ten new loci explain 6%. All loci together also explain 24% of λs (Supplementary Table 25), of which the new loci explain 7% and the HLA loci 2%. To quantify the predictive ability of these variants, we estimated genetic risk using a weighted genetic risk score (wGRS). Adding the newly identified risk alleles significantly increased the area under the curve (AUC) of the wGRS from 0.82 to 0.85 (95% CI 0.85–0.86; P = 6.58 × 10−39) (Supplementary Fig. 14a,b).
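The wGRS formula is not given in this section. A common definition, assumed here only for illustration, sums each person's risk-allele dosages weighted by the natural log of the per-allele odds ratio; the AUC can then be computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control. The function names `wgrs` and `auc` are hypothetical.

```python
import math
from typing import Sequence


def wgrs(dosages: Sequence[float], odds_ratios: Sequence[float]) -> float:
    """Weighted genetic risk score: risk-allele dosage (0-2) times ln(OR), summed over SNPs."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise loop is quadratic and only meant to make the definition explicit; for cohorts of thousands of samples a rank-based formula would be the practical choice.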
In summary, our results further define the genetic architecture and heritability of SLE risk (especially in Asians) and provide insights into disease pathogenesis. Through comprehensive analysis of multiple Asian populations, we identified ten new SLE-predisposing loci and validated association in 36 reported loci (often refining the associated intervals). We pinpointed and annotated independently associated variants at each locus. Further analysis in additional populations and experimental validation in cultured and patient-derived cells (as previously performed31,32,33) will be needed to determine which SNPs are causal and to elucidate the biochemical pathways through which genetic changes contribute to SLE. This study highlights the success of targeting high-risk populations for genetic analysis, followed by systematic bioinformatic analysis to guide future experimental validation.
Methods
Study overview.
This study was conducted in three stages (Fig. 1). In the first stage, we genotyped three Asian cohorts: Koreans (KR), Han Chinese (HC) and Malaysian Chinese (MC). This step was followed by quality control and preliminary association analysis to identify 578 regions with association P < 5 × 10−3. We then increased the Korean sample size with out-of-study controls and performed imputation-based meta-analysis to discover 16 new regions with Pdiscovery-meta < 5 × 10−5. In the second stage, we followed up these 16 new regions, performing in silico replication on a Japanese (JAP) GWAS58 data set and two independent replications on separate Beijing Han Chinese (BHC) and Shanghai Han Chinese (SHC) data sets to identify ten new loci with Pmeta < 5 × 10−8. In the third stage, we used bioinformatic databases to annotate the identified variants and carried out comprehensive analyses to uncover potential disease-predisposing variants involved in SLE pathogenesis (Supplementary Table 1 and Supplementary Note). Information on ethical approval is provided in the Supplementary Note.
Imputation-based association analysis, meta-analysis and conditional analysis.
For the first stage of our study (Fig. 1), we performed single-SNP case-control association analysis on quality-controlled Immunochip genotype data from each population. We calculated association P values, standard errors, odds ratios and 95% confidence intervals using PLINK59. This step identified 578 regions with P < 5 × 10−3 in at least one Asian cohort, which were selected for imputation (Supplementary Fig. 1 and Supplementary Table 3). To focus imputation on these regions and improve its accuracy, we wrote a script based on a recursive algorithm to define imputation regions. A region was defined for imputation if it contained a peak SNP with P < 5 × 10−3. Region size was defined by the length of the LD region (r2 > 0.2) around the peak SNP, extended by a further 100 kb on each side to avoid edge effects. The recursive algorithm used the following steps:
1. Find the peak SNP (minimal P, with P ≤ 5 × 10−3) in region a(x,y), where x is the start position and y is the end position; initially, a(x,y) is the whole chromosome. If such a peak SNP exists, continue; otherwise, stop.
2. Define the imputation region as d(u,v) = LD region (r2 > 0.2 with the peak SNP) ± 100 kb.
3. If (x,u) exists, apply step 1 recursively to a(x,u); if (v,y) exists, apply step 1 recursively to a(v,y). Otherwise, stop.
4. Collect all regions d(u,v) for final imputation.
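The recursive procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: SNP positions and P values for one chromosome are plain lists, LD comes from a caller-supplied function `r2(i, j)` (a hypothetical stand-in for an LD lookup from reference genotypes), the LD region is taken as the span of all SNPs on the chromosome with r2 > 0.2 to the peak, and intervals are closed base-pair ranges.

```python
from typing import Callable, List, Sequence, Tuple


def define_imputation_regions(
    positions: Sequence[int],          # base-pair positions of genotyped SNPs on one chromosome
    pvalues: Sequence[float],          # single-SNP association P value for each SNP
    r2: Callable[[int, int], float],   # hypothetical LD lookup: r2 between SNPs i and j
    p_threshold: float = 5e-3,
    r2_threshold: float = 0.2,
    flank_bp: int = 100_000,
) -> List[Tuple[int, int]]:
    """Recursively define imputation regions d(u, v) following steps 1-4."""
    regions: List[Tuple[int, int]] = []

    def search(x: int, y: int) -> None:
        # Step 1: peak SNP (minimal P) inside a(x, y); stop if none passes the threshold.
        inside = [i for i, pos in enumerate(positions) if x <= pos <= y]
        if not inside:
            return
        peak = min(inside, key=lambda i: pvalues[i])
        if pvalues[peak] > p_threshold:
            return
        # Step 2: LD region (r2 > threshold with the peak), extended by the flank on each side.
        in_ld = [j for j in range(len(positions)) if j == peak or r2(peak, j) > r2_threshold]
        u = max(0, min(positions[j] for j in in_ld) - flank_bp)
        v = max(positions[j] for j in in_ld) + flank_bp
        regions.append((u, v))
        # Step 3: recurse into whatever remains of a(x, y) on either side of d(u, v).
        if x <= u - 1:
            search(x, u - 1)
        if v + 1 <= y:
            search(v + 1, y)

    if positions:
        search(min(positions), max(positions))  # a(x, y) starts as the whole chromosome
    # Step 4: collect all regions for final imputation.
    return sorted(regions)
```

Because the peak SNP always lies inside d(u, v), each recursive call searches strictly fewer SNPs, so the recursion terminates. Regions found on different sides may overlap; merging them before imputation would be an optional extra step.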
For the second stage of our study (Fig. 1), we integrated additional GWAS data from Korean out-of-study controls to increase both SNP density and statistical power. Because the Korean Immunochip and GWAS data sets were genotyped on two different platforms and their overlapping SNPs were fewer than the SNPs in either data set alone, we imputed each data set separately from its own directly genotyped SNPs using MACH-Admix60. The Han Chinese and Malaysian Chinese Immunochip data sets were also imputed separately, following the Korean Immunochip protocol. We used 504 Asians (104 from the JPT population, 200 from the CHB population and 200 from the Southern Han Chinese (CHS) population) from the 1000 Genomes Project (Phase 3 Integrated Release Version 5 Haplotypes) as the reference panel for imputation. All SNP names and strands for the three Immunochip data sets and the out-of-study control data set were aligned with the Asian reference panel (n = 504) before each of these four data sets was imputed separately. This imputation strategy has been used by many earlier studies61,62 and has also been recommended as best practice by the eMERGE Network63.
After imputation, we performed strict quality control on post-imputed SNPs. In addition to the quality control steps described above (Hardy-Weinberg equilibrium P > 0.0001 in controls, MAF > 0.5%), post-imputed SNPs were also required to have high imputation quality (Rsq > 0.7 for MAF ≥ 3% and Rsq > 0.9 for MAF < 3%) to be included for further analysis. To take into account imputation uncertainty, we used mach2dat64,65 for single-SNP post-imputation-based association tests and for conditional logistic regression analysis, with adjustment for population stratification. We used the first three principal components as covariates to correct for population stratification and potential batch effects (Supplementary Figs. 15 and 16). Additionally, as a complementary analysis, we used a newly developed genotype-conditional association test (GCAT)66 to confirm our principal-component analysis (PCA)-corrected associations, the results of which were very consistent (data not shown). We used Metal67 to perform the meta-analysis based on post-imputation associations for three Immunochip cohorts (KR, HC and MC), as well as for the combined KR data set (the merged dosage data set of KR Immunochip and KR GWAS controls), HC Immunochip and MC Immunochip. To include the highest quality SNPs in the follow-up association analysis, we used imputed SNPs with high imputation quality (Rsq > 0.7) in each of the separately imputed data sets (Immunochip and GWAS). We then merged the two imputation sets according to the stringent quality control criteria described above.
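The post-imputation filters above can be expressed as a single predicate. This minimal sketch assumes MAF is given as a fraction, `rsq` is the imputation-quality estimate (Rsq) and `hwe_p_controls` is the Hardy-Weinberg P value in controls; the record type and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class ImputedSNP(NamedTuple):
    name: str
    maf: float              # minor allele frequency (fraction, not percent)
    rsq: float              # imputation quality (Rsq)
    hwe_p_controls: float   # Hardy-Weinberg equilibrium P value in controls


def passes_post_imputation_qc(snp: ImputedSNP) -> bool:
    """Apply the HWE, MAF and MAF-dependent Rsq thresholds stated in the text."""
    if snp.hwe_p_controls <= 1e-4:     # keep only HWE P > 0.0001 in controls
        return False
    if snp.maf <= 0.005:               # keep only MAF > 0.5%
        return False
    if snp.maf >= 0.03:                # MAF >= 3%: require Rsq > 0.7
        return snp.rsq > 0.7
    return snp.rsq > 0.9               # MAF < 3%: require the stricter Rsq > 0.9


def filter_snps(snps: Iterable[ImputedSNP]) -> List[str]:
    """Names of SNPs that pass all post-imputation filters."""
    return [s.name for s in snps if passes_post_imputation_qc(s)]
```

The stricter Rsq cutoff for low-frequency variants reflects that imputation of rarer alleles is less reliable.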
Finally, we analyzed SLE association in 152,918 post-imputation SNPs subjected to quality control and identified 20,213 associated SNPs (Pdiscovery-meta < 0.005), from which we successfully replicated 36 SLE loci with Pdiscovery-meta < 0.005 (Supplementary Table 10) and identified 16 new suggestive regions with Pdiscovery-meta < 5 × 10−5 for follow-up replication (Supplementary Tables 26 and 27).
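The discovery-meta P values come from METAL, which can combine studies by sample size or by inverse-variance (standard-error) weighting; the text does not state which scheme was used. Purely as an illustration, a fixed-effect inverse-variance meta-analysis of per-cohort log odds ratios is sketched below, with standard errors recoverable from reported 95% CIs on the OR scale. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of ln(OR) from a symmetric 95% CI reported on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def ivw_meta(log_ors: Sequence[float], ses: Sequence[float]) -> Tuple[float, Tuple[float, float], float]:
    """Fixed-effect inverse-variance meta-analysis; returns (OR, 95% CI, two-sided P)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    ci = (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se))
    return math.exp(beta), ci, p
```

Combining two cohorts with identical ORs leaves the pooled OR unchanged but narrows the CI by a factor of √2, which is why adding cohorts increases power.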
To test whether the imputation procedure introduced systematic bias, we also tested the lead SNPs for association between the two control sets (Immunochip versus GWAS). We found no evidence of systematic bias introduced by imputation and thus consider the imputation results sound (Supplementary Table 28).
We performed conditional analysis in the largest cohort (KR) for 20 known SLE-associated loci with genome-wide significance and ten new regions with genome-wide significance after replication. Conditional analysis was iterative, starting by conditioning on the SNP with the lowest P value; all subsequent SNPs that were significant after conditioning were added to the regression model as covariates until no SNP with P < 5 × 10−5 remained. To ensure that SNPs were truly independent, SNPs in high LD (r2 > 0.3) with the SNP being conditioned on were filtered out before the next iteration, and only associated SNPs with P < 5 × 10−5 entered the conditional analysis.
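The iterative conditioning loop can be sketched as follows. The authors used mach2dat for conditional logistic regression; this sketch instead uses a plain Newton-Raphson logistic regression with Wald tests in NumPy, assumes dosage genotypes in a (samples × SNPs) array, and computes r2 as the squared Pearson correlation of dosages within the sample. All function names are hypothetical.

```python
import math
from typing import List, Optional

import numpy as np


def logistic_fit(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-8):
    """Logistic regression by Newton-Raphson; returns coefficients and Wald standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))


def last_term_p(X: np.ndarray, y: np.ndarray) -> float:
    """Two-sided Wald P value for the last column of X."""
    beta, se = logistic_fit(X, y)
    return math.erfc(abs(beta[-1] / se[-1]) / math.sqrt(2.0))


def stepwise_conditional(G: np.ndarray, y: np.ndarray, covariates: Optional[np.ndarray] = None,
                         p_thresh: float = 5e-5, r2_thresh: float = 0.3) -> List[int]:
    """Condition iteratively on the top SNP until no remaining SNP reaches p_thresh."""
    n = G.shape[0]
    base = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    # Only SNPs with marginal P < p_thresh enter the conditional analysis.
    candidates = [j for j in range(G.shape[1])
                  if last_term_p(np.column_stack([base, G[:, j]]), y) < p_thresh]
    selected: List[int] = []
    while candidates:
        model = np.column_stack([base] + [G[:, [k]] for k in selected])
        pvals = {j: last_term_p(np.column_stack([model, G[:, j]]), y) for j in candidates}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_thresh:
            break
        selected.append(best)  # the top SNP becomes a covariate in the next round
        # Drop the selected SNP and any SNP in LD (r2 > r2_thresh) with it.
        candidates = [j for j in candidates if j != best
                      and np.corrcoef(G[:, j], G[:, best])[0, 1] ** 2 <= r2_thresh]
    return selected
```

Principal components (as used for stratification correction above) can be passed through `covariates` so that every conditional model is adjusted for them.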
Functional annotation of new loci.
To localize candidate causal variants, we annotated each lead SNP along with its correlated SNPs (r2 > 0.7 in Asian samples from the 1000 Genomes Project) using Haploreg68, with data from Phase 1 of the 1000 Genomes Project and Ensembl69. We surveyed allele-dependent regulation of gene expression (eQTLs) by querying the blood eQTL13 database for cis- and trans-eQTLs (Supplementary Table 9); this database contains a meta-analysis of gene expression experiments on non-transformed peripheral blood samples from 5,311 individuals of European descent, replicated in 2,775 individuals. The functional significance of independent SNPs from new regions is shown in Supplementary Table 7, and eQTL results are reported in Supplementary Table 9.
We annotated epigenetic regulatory features for all independent lead SNPs (and their correlated variants; r2 > 0.8) in our new regions using the Haploreg68, GWAS3D70 and rSNPBase71 online tools. Haploreg68 provides functional annotations for binding motifs and epigenetic marks. GWAS3D70 aggregates epigenetic data from 16 cell types from multiple databases, including the ENCODE Project, and identifies multiple regulatory SNPs in high LD with the queried SNPs. Among the regulatory elements queried were enhancer marks (p300, H3K4me1 and H3K27ac), promoter regions, CTCF insulator marks and DNase I–hypersensitive sites (DHSs). ChromHMM was used to predict histone states and chromatin interactions. To understand distal regulatory relationships among the new loci, chromatin interactions between candidate loci were gathered from ChIA-PET and Hi-C data on eight cell lines (K562, NB4, GM12878, CD4+ T cells, H1-hESC, IMR90, RWPE1 and MCF-7), available through ENCODE. We reported data for lead SNPs with at least three ChIA-PET or Hi-C hits (Supplementary Fig. 5 and Supplementary Table 8). Additionally, rSNPBase71 provided putative functional SNPs with experimentally validated regulatory elements controlling transcriptional and post-transcriptional events.
Functional fine-mapping.
To identify the set of variants most likely to contain a functional variant, we used two Bayesian methods. The first was based on Bayesian regression to estimate each SNP's Bayes factor and, from it, the SNP's posterior probability of association in the region10. Second, we used the Probabilistic Identification of Causal SNPs (PICS) algorithm11, which incorporates the underlying epigenetic information for those variants, to further narrow down the SNPs within the Bayesian credible set.
Bayesian logistic regression for each SNP in the new imputation regions was implemented using the Bayes Factor (BF)72 library in R. We then estimated the posterior probability for each SNP, as well as the proportion of the total Bayes factor explained by each variant. We formed 95–99% credible sets from the cumulative proportion of the Bayes factor10. To assess how much of the effects could be explained by the credible sets, we annotated each candidate SNP with dbSNP functions (intron, missense, UTR, synonymous or intergenic), as well as epigenetic annotations (promoter, enhancer, DNase I hypersensitivity, bound protein motif driver disrupted, rSNP, LD proxy of rSNP (r2 > 0.8), proximal regulation, distal regulation, microRNA regulation, RNA-binding protein–mediated regulation or eQTL).
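Turning per-SNP Bayes factors into posterior probabilities and a credible set takes only a few lines. The sketch assumes a single causal variant per region and an equal prior for every SNP, so each SNP's posterior is its share of the total Bayes factor, as described above. It takes natural-log Bayes factors as input; this interface is hypothetical and is not that of the R package.

```python
import math
from typing import List, Sequence, Tuple


def credible_set(log_bfs: Sequence[float], coverage: float = 0.95) -> Tuple[List[int], List[float]]:
    """Posterior probabilities from ln Bayes factors and the smallest set reaching `coverage`."""
    top = max(log_bfs)
    weights = [math.exp(lb - top) for lb in log_bfs]  # subtract the max for numerical stability
    total = sum(weights)
    post = [w / total for w in weights]               # share of the total Bayes factor
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    chosen: List[int] = []
    cumulative = 0.0
    for i in order:
        chosen.append(i)
        cumulative += post[i]
        if cumulative >= coverage:
            break
    return chosen, post
```

For example, Bayes factors of 100, 10, 1 and 1 give posteriors of about 0.89, 0.09, 0.009 and 0.009, so the 95% credible set holds the first two SNPs and the 99% set adds a third.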
We implemented the PICS method11 to identify the set of variants with probable functional effects. This method uses the epigenetic information at each locus and estimates the posterior probability of a SNP being causal given the strength of association and its linkage neighborhood, as well as regulatory element annotations.
Gene-gene interaction.
To identify gene-gene interaction, we performed logistic regression with an interaction term between all pairs of lead SNPs (Table 1) using PLINK. Both BOOST73 and joint effects74 methods were used to screen for SNP-SNP interactions. We used a significance threshold of 1 × 10−4.
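A pairwise screen of the kind described (logistic regression with a product term, tested at 1 × 10−4) can be sketched as below. It does not reproduce PLINK, BOOST or the joint-effects test; it is a minimal NumPy stand-in that assumes allele dosages coded 0/1/2 and applies a Wald test to the interaction coefficient. Function names are hypothetical.

```python
import itertools
import math
from typing import List, Tuple

import numpy as np


def logistic_fit(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-8):
    """Logistic regression by Newton-Raphson; returns coefficients and Wald standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(hessian)))


def pairwise_interactions(G: np.ndarray, y: np.ndarray,
                          alpha: float = 1e-4) -> List[Tuple[int, int, float, bool]]:
    """For each SNP pair, fit logit(P) = b0 + b1*gi + b2*gj + b3*gi*gj and test b3."""
    n = G.shape[0]
    results = []
    for i, j in itertools.combinations(range(G.shape[1]), 2):
        X = np.column_stack([np.ones(n), G[:, i], G[:, j], G[:, i] * G[:, j]])
        beta, se = logistic_fit(X, y)
        p = math.erfc(abs(beta[3] / se[3]) / math.sqrt(2.0))  # two-sided Wald P for b3
        results.append((i, j, p, p < alpha))
    return results
```

Each tuple reports the SNP pair, the interaction P value and whether it passes the threshold.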
Network interactions.
To investigate how our new loci interact with other genes, we used curated network interactions from the Disease Association Protein-Protein Link Evaluator database (DAPPLE v2.0)27. We seeded the analysis with all our new loci (for intergenic signals, the flanking genes on both sides were also used) and ran 20,000 within-degree-node permutations. Given the large number of potential interactions, we simplified the networks (Supplementary Fig. 12). The resulting network shows all significant interactions among the connected proteins.
Additionally, we confirmed network interactions using the aggregated database ConsensusPathDB28, which scores the confidence of protein interactions on a scale from 0 to 1 and aggregates 11 pathway databases for GSEA. We retained interactions with a high confidence score (Intscore > 0.9) and plotted all possible high-confidence interactions for all new loci (Supplementary Figs. 9 and 10).
To investigate how the new SLE loci were related to each other and to previously established loci, we used a literature mining–based approach implemented in IRIDESCENT29 (Supplementary Fig. 11). This approach identifies genes mentioned together in the same MEDLINE titles and/or abstracts (currently over 24 million) and weights their relevance on the basis of the relative frequencies of gene mentions and gene-gene co-mentions.
Gene set enrichment analysis.
To determine whether there were significant enrichments of our SLE (new and replicated) loci as compared to reported SLE loci in human and mouse ontologies, we performed GSEA using GREAT75 (Supplementary Table 19). To compare interacting pathways and the ontological properties of new versus published SLE genes, we used ConsensusPathDB28. Additionally, to identify and compare drug perturbation signatures for new and reported loci, we used the gene enrichment analysis software Enrichr25 (Supplementary Table 18).
To test whether there was bias in enrichment due to the choice of the Immunochip as a genotyping platform, we conducted 100 over-representation analysis tests using sets of 58 genes taken at random from the Immunochip gene set in ConsensusPathDB28. We computed the number of times any pathway or ontology category was observed in the 100 random sets (Supplementary Table 29).
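The random-set bias check can be illustrated with a hypergeometric over-representation test repeated on random gene sets. This sketch rests on assumptions not stated in the text: ConsensusPathDB's own test and multiple-testing correction are not reproduced, a pathway counts as "observed" when its uncorrected hypergeometric P falls below `alpha`, and the gene universe and pathway memberships are supplied by the caller. Function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, Sequence, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n genes drawn)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def random_set_counts(universe: Sequence[str], pathways: Dict[str, Set[str]],
                      set_size: int = 58, n_sets: int = 100,
                      alpha: float = 0.05, seed: int = 0) -> Counter:
    """Count how often each pathway is over-represented in random gene sets."""
    rng = random.Random(seed)
    universe_set = set(universe)
    counts: Counter = Counter()
    for _ in range(n_sets):
        drawn = set(rng.sample(list(universe), set_size))  # one random set of set_size genes
        for name, members in pathways.items():
            members = members & universe_set
            k = len(drawn & members)
            if k and hypergeom_sf(k, len(universe_set), len(members), set_size) < alpha:
                counts[name] += 1
    return counts
```

A pathway that appears in many of the 100 random sets is likely favored by the platform's gene content rather than by SLE biology.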
Cell type–specific enrichment analysis.
To identify enrichment in cell type–specific expression of the new and replicated SLE loci (57 SNPs), we used a previously reported approach30,76 as follows. We used normalized expression data for 79 human cell types from GeneAtlas77 (curated by the Genomics Institute of the Novartis Research Foundation) and for 249 mouse cell types, sorted by FACS and assayed at least three times, from the Immunological Genome Project (ImmGen)78. We also used cell type–specific expression data for the collection of 573 human cell samples from the FANTOM5 Project79.
In this analysis, for each locus we extracted the genes in the region spanned by SNPs correlated with the lead SNP (r2 > 0.5; Table 1), extended to the flanking recombination hotspots. We used the normalized cell type–specific expression profiles of the extracted genes to identify the cell types that significantly express SLE candidate genes. Specificity P values were estimated with SNPsea76 by permuting ranked expression levels for each locus (10^10 permutations) (Fig. 4 and Supplementary Fig. 13). P values (blue bars) that pass the multiple-testing threshold (black line) indicate significant enrichment of SLE-associated loci. The threshold depends on the number of categories in each database; for example, for the 1,751 possible GO categories, the significance threshold is 2 × 10−5.
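To make the permutation logic concrete, the Python sketch below implements a simplified cell type specificity test in the spirit of SNPsea: genes are ranked by unit-normalized expression in one cell type, each locus is scored by its best-ranked gene (corrected for the number of genes in the locus), and the summed score is compared with random gene sets matched on gene count. These scoring and normalization choices are assumptions for illustration; SNPsea's actual implementation differs, for example in how null SNP sets are drawn from genome-wide LD intervals.

```python
import math
import random


def specificity_percentiles(expr, cell_type):
    """Rank genes by normalized expression in one cell type; 1/n = most specific.

    expr: dict gene -> list of expression values, one per cell type.
    """
    scores = {}
    for gene, values in expr.items():
        # Scale each gene's profile to unit length so broadly high expression does not dominate.
        length = math.sqrt(sum(v * v for v in values)) or 1.0
        scores[gene] = values[cell_type] / length
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {gene: (i + 1) / len(ranked) for i, gene in enumerate(ranked)}


def locus_score(genes, percentiles):
    # Best percentile in the locus, corrected for the number of genes it contains.
    best = min(percentiles[g] for g in genes)
    return -math.log(1 - (1 - best) ** len(genes))


def cell_type_enrichment_p(loci, expr, cell_type, n_perm=1000, seed=0):
    """Empirical P that the loci are specific to `cell_type`; loci is a list of gene lists."""
    pct = specificity_percentiles(expr, cell_type)
    observed = sum(locus_score(genes, pct) for genes in loci)
    all_genes = list(expr)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Null loci: random gene sets matched on the number of genes per locus.
        null = sum(locus_score(rng.sample(all_genes, len(genes)), pct) for genes in loci)
        if null >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Running this once per cell type yields one P value per category, which is then compared with a multiple-testing threshold as described above.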
Explained heritability.
We estimated the variance in liability (Vg) explained by each genome-wide significant SNP using the liability threshold method7, separately for new, reported and HLA loci. For each variant, we used the weighted risk allele frequency and the meta-analysis odds ratio to calculate the liability threshold for each genotype (Supplementary Table 24). Values presented here use a prevalence estimate (K) of 0.0030653, following So et al.7. To check the consistency of this heritability estimate, we also used the allele frequencies from each cohort, as well as those of the HapMap and 1000 Genomes Project CHB and JPT populations.
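One common formulation of the liability-threshold calculation for a single SNP is sketched below in Python. It assumes Hardy-Weinberg genotype frequencies, a multiplicative model in which the per-allele odds ratio approximates the relative risk (reasonable for a low-prevalence disease such as SLE), and unit residual variance within each genotype class, so that the proportion explained is Vg/(1 + Vg). It is an illustration and may differ in detail from the procedure of So et al.7.

```python
from statistics import NormalDist


def liability_variance_explained(p, odds_ratio, prevalence):
    """Proportion of liability variance explained by one biallelic SNP.

    p: risk-allele frequency; odds_ratio: per-allele OR (used as a relative risk);
    prevalence: population prevalence K.
    """
    q = 1.0 - p
    freqs = [q * q, 2 * p * q, p * p]  # genotypes with 0, 1, 2 risk alleles (HWE)
    rel_risks = [1.0, odds_ratio, odds_ratio**2]  # multiplicative model
    # Baseline risk chosen so that the genotype-weighted risk equals the prevalence.
    baseline = prevalence / sum(f * r for f, r in zip(freqs, rel_risks))
    risks = [baseline * r for r in rel_risks]
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)
    # Mean liability of each genotype class, assuming unit residual variance.
    means = [threshold - std_normal.inv_cdf(1 - k) for k in risks]
    grand_mean = sum(f * m for f, m in zip(freqs, means))
    vg = sum(f * (m - grand_mean) ** 2 for f, m in zip(freqs, means))
    return vg / (1 + vg)
```

Summing per-locus values to obtain a total assumes that the loci act independently on the liability scale.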
Sibling relative risk.
We estimated the contribution of SLE susceptibility loci to the familial relative risk (Supplementary Table 25), specifically the sibling relative risk (λs), under the multiplicative model80. The proportion of λs explained by k loci is
$$\frac{\sum_{i=1}^{k} \log \lambda_i}{\log \lambda_0}$$
where λ0 is the overall sibling relative risk, assumed here to be ∼30 (ref. 81), and the sibling relative risk attributable to locus i (λi) is given by
$$\lambda_i = \frac{p r^2 + q}{(p r + q)^2}$$
where p is the frequency of the risk allele at that locus (q = 1 − p) and r is the per-allele risk ratio82.
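The per-locus risk and the fraction of log λ0 explained can be computed as in the Python sketch below. It assumes λ = (pr² + q)/(pr + q)², the standard per-locus familial relative risk under a multiplicative per-allele model, and λ0 = 30 as in the text; the function names are illustrative.

```python
import math


def locus_lambda(p, r):
    """Familial relative risk attributable to one locus under a multiplicative model."""
    q = 1.0 - p
    return (p * r**2 + q) / (p * r + q) ** 2


def fraction_of_lambda0_explained(loci, lambda0=30.0):
    """loci: iterable of (risk-allele frequency, per-allele risk ratio) pairs."""
    return sum(math.log(locus_lambda(p, r)) for p, r in loci) / math.log(lambda0)
```

Working on the log scale is what makes the contributions additive: under the multiplicative model, the overall λ is the product of the per-locus values.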
Weighted cumulative genomic risk score.
To compare the cumulative burden of risk variants between cases and controls, we computed the weighted genomic risk score (wGRS) for all individuals with high imputation quality (Rsq > 0.7). The number of risk alleles at each variant was weighted by the natural logarithm of its meta-analysis odds ratio83; the score included all ten new loci, two HLA loci and 35 replicated loci and was computed for a total of 2,476 cases and 8,426 controls. Differences in wGRS were tested using a logistic regression model with sex and the top three principal components as covariates (Supplementary Fig. 14), and the difference in mean wGRS between cases and controls was estimated with a linear model.
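A minimal Python sketch of the score follows: each risk-allele dosage is multiplied by the natural log of the variant's meta-analysis odds ratio and summed. The mean-difference helper relies on the fact that regressing the score on a single 0/1 case indicator gives a slope equal to the difference in group means; the covariate-adjusted logistic regression is not reproduced. Function names are illustrative.

```python
import math


def weighted_grs(dosages, odds_ratios):
    """Sum of risk-allele dosages (0-2), each weighted by ln(meta-analysis OR)."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one odds ratio per variant")
    return sum(d * math.log(odds) for d, odds in zip(dosages, odds_ratios))


def mean_difference(case_scores, control_scores):
    """Unadjusted case-control difference in mean score (equals the OLS slope on case status)."""
    return sum(case_scores) / len(case_scores) - sum(control_scores) / len(control_scores)
```

Imputed dosages can be passed directly, since the weighting is linear in the number of risk alleles.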
Area under the curve.
We estimated the predictive power of the wGRS, and the marginal contribution of the new variants, by comparing the area under the receiver operating characteristic curve (AUC) of a baseline model (reported loci) with that of an expanded model (reported and new loci) (Supplementary Fig. 14). Sex-adjusted AUCs were estimated in R using the pROC package84, and their confidence intervals were computed with the nonparametric DeLong method85.
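For readers without R, the Python sketch below computes an AUC as the Mann-Whitney probability that a case scores higher than a control (ties count one half), with a 95% confidence interval from DeLong's structural-component variance. It covers a single, unadjusted AUC only; comparing correlated baseline and expanded models (as pROC can do) and sex adjustment are not shown, and clipping the interval to [0, 1] is a choice made here.

```python
import math


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the case scores higher, 0.5 for ties, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def auc_delong(case_scores, control_scores, z=1.959964):
    """AUC and DeLong confidence interval; needs at least two cases and two controls."""
    m, n = len(case_scores), len(control_scores)
    # Structural components: per-case and per-control average kernel values.
    v10 = [sum(_psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(_psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))
```

The O(mn) double loop is fine for illustration; large cohorts would call for a rank-based implementation.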
Evidence for natural selection.
To assess evidence for natural selection, we used HapMap 2 and Human Genome Diversity Project (HGDP) population data through Haplotter and the HGDP Selection Browser. For each of the ten new loci, we looked for evidence of positive natural selection in the 1-Mb region around each gene. Haplotter reports iHS (the integrated haplotype score), FST (the fixation index of population differentiation) and empirical P values for the distributions of Tajima's D and Fay's H (ref. 86), whereas the HGDP Selection Browser uses XP-EHH87 (cross-population extended haplotype homozygosity) in addition to iHS to identify positive natural selection. Evidence of natural selection was considered positive if the empirical P value was <0.05 for both Tajima's D and Fay's H, and −log(q) was >3 for FST, D, iHS or XP-EHH, where q is the empirical P value obtained by rank ordering the summary statistic (the rank of a given region divided by the total number of regions) (Supplementary Table 22).
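The empirical rank P value q can be sketched in Python as the fraction of genome-wide regions whose statistic is at least as extreme as that of the region of interest. The sketch assumes that larger values indicate stronger selection (statistics such as Tajima's D, where negative values are informative, would need to be negated first) and that the −log(q) cutoff uses base 10; neither detail is specified above.

```python
import math


def empirical_rank_p(region_value, genome_values):
    """Rank of a region's statistic among all regions, divided by the number of regions.

    Assumes larger values indicate stronger evidence of selection.
    """
    values = list(genome_values)
    rank = sum(1 for v in values if v >= region_value)
    return rank / len(values)


def exceeds_cutoff(q, cutoff=3.0):
    """True when -log10(q) is above the cutoff (base 10 is an assumption here)."""
    return -math.log10(q) > cutoff
```

Because the region itself is part of the genome-wide distribution, q is never zero and the logarithm is always defined.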
Graphical display of the epigenetic landscape of the loci.
For Supplementary Figure 4, plots were assembled as in ref. 32. Most data were downloaded from the UCSC Genome Browser and displayed using custom MATLAB code. ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) tracks for CD4+ cells and GM12878 cells were downloaded from the Gene Expression Omnibus (GEO) under accession GSE47753 (ref. 88). DNase I hypersensitivity, ENCODE sequence classification, histone marks and binding data for transcription factors (to DNA) and RNA-binding proteins (to RNA) were all downloaded from the UCSC Genome Browser. ENCODE regulatory elements are color-coded according to the ENCODE convention; other signals are shown in grayscale, with darker shades indicating stronger signal. Association results for all tested SNPs are shown at the top as bars with height −log10(P). In the zoomed images, SNPs of interest are labeled.
URLs.
GCTA v1.24, http://www.complextraitgenomics.com/software/gcta/; EIGENSTRAT, http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm; PLINK version 1.07, http://genetics.med.harvard.edu/reich/Reich_Lab/Software.html; MACH-Admix, http://www.unc.edu/~yunmli/MaCH-Admix/; 1000 Genomes Project, reference data, http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/; Mach2dat, http://www.unc.edu/~yunmli/software.html; Metal, http://www.sph.umich.edu/csg/abecasis/metal; HaploReg, http://www.broadinstitute.org/mammals/haploreg/haploreg.php; Ensembl, http://www.ensembl.org/; blood eQTL browser, http://genenetwork.nl/bloodeqtlbrowser/; GWAS3D, http://jjwanglab.org/gwas3d; rSNPBase, http://rsnp.psych.ac.cn/; ENCODE Project, http://www.genome.gov/10005107; ChromHMM, http://compbio.mit.edu/ChromHMM/; DAPPLE v2.0, http://www.broadinstitute.org/mpg/dapple/dappleTMP.php; ConsensusPathDB, http://cpdb.molgen.mpg.de/; GREAT v2.0.2, http://bejerano.stanford.edu/great/public/html/; Enrichr, http://amp.pharm.mssm.edu/Enrichr/; GeneAtlas, http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS596; ImmGen, http://www.immgen.org/; FANTOM5, http://fantom.gsc.riken.jp/5/; SNPsea, http://www.broadinstitute.org/mpg/snpsea/; R version 3.0.0, http://www.r-project.org/; pROC, http://web.expasy.org/pROC/; Haplotter, http://haplotter.uchicago.edu/; HGDP Selection Browser, http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/; UCSC Genome Browser, http://genome.ucsc.edu/.
Accession codes.
Summary-level association data for the discovery sets are provided as a Supplementary Data Set.
References
- 1
Jakes, R.W. et al. Systematic review of the epidemiology of systemic lupus erythematosus in the Asia-Pacific region: prevalence, incidence, clinical features, and mortality. Arthritis Care Res. (Hoboken) 64, 159–168 (2012).
- 2
Danchenko, N., Satia, J.A. & Anthony, M.S. Epidemiology of systemic lupus erythematosus: a comparison of worldwide disease burden. Lupus 15, 308–318 (2006).
- 3
Wandstrat, A. & Wakeland, E. The genetics of complex autoimmune diseases: non-MHC susceptibility genes. Nat. Immunol. 2, 802–809 (2001).
- 4
Harley, I.T., Kaufman, K.M., Langefeld, C.D., Harley, J.B. & Kelly, J.A. Genetic susceptibility to SLE: new insights from fine mapping and genome-wide association studies. Nat. Rev. Genet. 10, 285–290 (2009).
- 5
Boackle, S.A. Advances in lupus genetics. Curr. Opin. Rheumatol. 25, 561–568 (2013).
- 6
Yang, W. et al. Meta-analysis followed by replication identifies loci in or near CDKN1B, TET3, CD80, DRAM1, and ARID5B as associated with systemic lupus erythematosus in Asians. Am. J. Hum. Genet. 92, 41–51 (2013).
- 7
So, H.C., Gui, A.H.S., Cherny, S.S. & Sham, P.C. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol. 35, 310–317 (2011).
- 8
Gateva, V. et al. A large-scale replication study identifies TNIP1, PRDM1, JAZF1, UHRF1BP1 and IL10 as risk loci for systemic lupus erythematosus. Nat. Genet. 41, 1228–1233 (2009).
- 9
Cortes, A. & Brown, M.A. Promise and pitfalls of the Immunochip. Arthritis Res. Ther. 13, 101 (2011).
- 10
Wellcome Trust Case Control Consortium. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
- 11
Farh, K.K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
- 12
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
- 13
Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
- 14
Tantin, D., Tussie-Luna, M.I., Roy, A.L. & Sharp, P.A. Regulation of immunoglobulin promoter activity by TFII-I class transcription factors. J. Biol. Chem. 279, 5460–5469 (2004).
- 15
Li, Y. et al. A genome-wide association study in Han Chinese identifies a susceptibility locus for primary Sjögren's syndrome at 7q11.23. Nat. Genet. 45, 1361–1365 (2013).
- 16
Zheng, J. et al. The GTF2I rs117026326 polymorphism is associated with anti-SSA-positive primary Sjögren's syndrome. Rheumatology (Oxford) 54, 562–564 (2015).
- 17
Lessard, C.J. et al. Variants at multiple loci implicated in both innate and adaptive immune responses are associated with Sjögren's syndrome. Nat. Genet. 45, 1284–1292 (2013).
- 18
Perl, A. Emerging new pathways of pathogenesis and targets for treatment in systemic lupus erythematosus and Sjogren's syndrome. Curr. Opin. Rheumatol. 21, 443–447 (2009).
- 19
Johnatty, S.E. et al. Evaluation of candidate stromal epithelial cross-talk genes identifies association between risk of serous ovarian cancer and TERT, a cancer susceptibility “hot-spot”. PLoS Genet. 6, e1001016 (2010).
- 20
Berndt, S.I. et al. Genome-wide association study identifies multiple risk loci for chronic lymphocytic leukemia. Nat. Genet. 45, 868–876 (2013).
- 21
Kim, K. et al. High-density genotyping of immune loci in Koreans and Europeans identifies eight new rheumatoid arthritis risk loci. Ann. Rheum. Dis. 74, e13 (2015).
- 22
Kim, K. et al. The HLA-DRβ1 amino acid positions 11-13-26 explain the majority of SLE-MHC associations. Nat. Commun. 5, 5902 (2014).
- 23
Lessard, C.J. et al. Identification of IRF8, TMEM39A, and IKZF3-ZPBP2 as susceptibility loci for systemic lupus erythematosus in a large-scale multiracial replication study. Am. J. Hum. Genet. 90, 648–660 (2012).
- 24
Chu, Q., Liu, L. & Wang, W. Overexpression of hCLP46 enhances Notch activation and regulates cell proliferation in a cell type–dependent manner. Cell Prolif. 46, 254–262 (2013).
- 25
Chen, E.Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
- 26
Xiong, W. & Lahita, R.G. Pragmatic approaches to therapy for systemic lupus erythematosus. Nat. Rev. Rheumatol. 10, 97–107 (2014).
- 27
Rossin, E.J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 7, e1001273 (2011).
- 28
Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).
- 29
Wren, J.D., Bekeredjian, R., Stewart, J.A., Shohet, R.V. & Garner, H.R. Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics 20, 389–398 (2004).
- 30
Hu, X. et al. Integrating autoimmune risk loci with gene-expression data identifies specific pathogenic immune cell subsets. Am. J. Hum. Genet. 89, 496–506 (2011).
- 31
Molineros, J.E. et al. Admixture mapping in lupus identifies multiple functional variants within IFIH1 associated with apoptosis, inflammation, and autoantibody production. PLoS Genet. 9, e1003222 (2013).
- 32
Maiti, A.K. et al. Combined protein- and nucleic acid–level effects of rs1143679 (R77H), a lupus-predisposing variant within ITGAM. Hum. Mol. Genet. 23, 4161–4176 (2014).
- 33
Guthridge, J.M. et al. Two functional lupus-associated BLK promoter variants control cell-type- and developmental-stage-specific transcription. Am. J. Hum. Genet. 94, 586–598 (2014).
- 34
Vandeweyer, G., Van der Aa, N., Reyniers, E. & Kooy, R.F. The contribution of CLIP2 haploinsufficiency to the clinical manifestations of the Williams-Beuren syndrome. Am. J. Hum. Genet. 90, 1071–1078 (2012).
- 35
Howard, M.L. et al. Mutation of Gtf2ird1 from the Williams-Beuren syndrome critical region results in facial dysplasia, motor dysfunction, and altered vocalisations. Neurobiol. Dis. 45, 913–922 (2012).
- 36
Antonell, A. et al. Partial 7q11.23 deletions further implicate GTF2I and GTF2IRD1 as the main genes responsible for the Williams-Beuren syndrome neurocognitive profile. J. Med. Genet. 47, 312–320 (2010).
- 37
Roy, A.L. Biochemistry and biology of the inducible multifunctional transcription factor TFII-I: 10 years later. Gene 492, 32–41 (2012).
- 38
Malcolm, T., Kam, J., Pour, P.S. & Sadowski, I. Specific interaction of TFII-I with an upstream element on the HIV-1 LTR regulates induction of latent provirus. FEBS Lett. 582, 3903–3908 (2008).
- 39
Gupta, S. et al. T cell receptor engagement leads to the recruitment of IBP, a novel guanine nucleotide exchange factor, to the immunological synapse. J. Biol. Chem. 278, 43541–43549 (2003).
- 40
Biswas, P.S. et al. Dual regulation of IRF4 function in T and B cells is required for the coordination of T-B cell interactions and the prevention of autoimmunity. J. Exp. Med. 209, 581–596 (2012).
- 41
Noble, J.A. et al. A polymorphism in the TCF7 gene, C883A, is associated with type 1 diabetes. Diabetes 52, 1579–1582 (2003).
- 42
Klapper, W. et al. Telomerase activity in B and T lymphocytes of patients with systemic lupus erythematosus. Ann. Rheum. Dis. 63, 1681–1683 (2004).
- 43
Iguchi-Manaka, A. et al. Accelerated tumor growth in mice deficient in DNAM-1 receptor. J. Exp. Med. 205, 2959–2964 (2008).
- 44
Alcina, A. et al. The autoimmune disease–associated KIF5A, CD226 and SH2B3 gene variants confer susceptibility for multiple sclerosis. Genes Immun. 11, 439–445 (2010).
- 45
Deshmukh, H.A. et al. Evaluation of 19 autoimmune disease–associated loci with rheumatoid arthritis in a Colombian population: evidence for replication and gene-gene interaction. J. Rheumatol. 38, 1866–1870 (2011).
- 46
Hafler, J.P. et al. CD226 Gly307Ser association with multiple autoimmune diseases. Genes Immun. 10, 5–10 (2009).
- 47
Maiti, A.K. et al. Non-synonymous variant (Gly307Ser) in CD226 is associated with susceptibility to multiple autoimmune diseases. Rheumatology (Oxford) 49, 1239–1244 (2010).
- 48
Qiu, Z.X., Zhang, K., Qiu, X.S., Zhou, M. & Li, W.M. CD226 Gly307Ser association with multiple autoimmune diseases: a meta-analysis. Hum. Immunol. 74, 249–255 (2013).
- 49
Wieczorek, S. et al. Novel association of the CD226 (DNAM-1) Gly307Ser polymorphism in Wegener's granulomatosis and confirmation for multiple sclerosis in German patients. Genes Immun. 10, 591–595 (2009).
- 50
Du, Y. et al. Association of the CD226 single nucleotide polymorphism with systemic lupus erythematosus in the Chinese Han population. Tissue Antigens 77, 65–67 (2011).
- 51
Stoeckman, A.K. et al. A distinct inflammatory gene expression profile in patients with psoriatic arthritis. Genes Immun. 7, 583–591 (2006).
- 52
Yasuda, S. et al. Defective expression of Ras guanyl nucleotide–releasing protein 1 in a subset of patients with systemic lupus erythematosus. J. Immunol. 179, 4890–4900 (2007).
- 53
He, C.F. et al. TNIP1, SLC15A4, ETS1, RasGRP3 and IKZF1 are associated with clinical features of systemic lupus erythematosus in a Chinese Han population. Lupus 19, 1181–1186 (2010).
- 54
Iatropoulos, P. et al. Association study and mutational screening of SYNGR1 as a candidate susceptibility gene for schizophrenia. Psychiatr. Genet. 19, 237–243 (2009).
- 55
Liu, J.Z. et al. Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 44, 1137–1141 (2012).
- 56
Gorski, K.S. et al. A set of genes selectively expressed in murine dendritic cells: utility of related cis-acting sequences for lentiviral gene transfer. Mol. Immunol. 40, 35–47 (2003).
- 57
Patel, N. et al. OB-BP1/Siglec-6. A leptin- and sialic acid–binding protein of the immunoglobulin superfamily. J. Biol. Chem. 274, 22729–22738 (1999).
- 58
Okada, Y. et al. A genome-wide association study identified AFF1 as a susceptibility locus for systemic lupus eyrthematosus in Japanese. PLoS Genet. 8, e1002455 (2012).
- 59
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
- 60
Liu, E.Y., Li, M., Wang, W. & Li, Y. MaCH-admix: genotype imputation for admixed populations. Genet. Epidemiol. 37, 25–37 (2013).
- 61
Jostins, L. et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
- 62
Liu, H. et al. Discovery of six new susceptibility loci and analysis of pleiotropic effects in leprosy. Nat. Genet. 47, 267–271 (2015).
- 63
Verma, S.S. et al. Imputation and quality control steps for combining multiple genome-wide datasets. Front. Genet. 5, 370 (2014).
- 64
Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu. Rev. Genomics Hum. Genet. 10, 387–406 (2009).
- 65
Li, Y. & Abecasis, G.R. Mach 1.0: rapid haplotype reconstruction and missing genotype inference. Am. J. Hum. Genet. S79, 2290 (2006).
- 66
Song, M., Hao, W. & Storey, J.D. Testing for genetic associations in arbitrarily structured populations. Nat. Genet. 47, 550–554 (2015).
- 67
Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
- 68
Ward, L.D. & Kellis, M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934 (2012).
- 69
Flicek, P. et al. Ensembl 2014. Nucleic Acids Res. 42, D749–D755 (2014).
- 70
Li, M.J., Wang, L.Y., Xia, Z., Sham, P.C. & Wang, J. GWAS3D: detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications. Nucleic Acids Res. 41, W150–W158 (2013).
- 71
Guo, L., Du, Y., Chang, S., Zhang, K. & Wang, J. rSNPBase: a database for curated regulatory SNPs. Nucleic Acids Res. 42, D1033–D1039 (2014).
- 72
Rouder, J.N. & Morey, R.D. Default Bayes factors for model selection in regression. Multivariate Behav. Res. 47, 877–903 (2012).
- 73
Wan, X. et al. BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am. J. Hum. Genet. 87, 325–340 (2010).
- 74
Ueki, M. & Cordell, H.J. Improved statistics for genome-wide interaction analysis. PLoS Genet. 8, e1002625 (2012).
- 75
McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
- 76
Slowikowski, K., Hu, X. & Raychaudhuri, S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 30, 2496–2497 (2014).
- 77
Su, A.I. et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067 (2004).
- 78
Hyatt, G. et al. Gene expression microarrays: glimpses of the immunological genome. Nat. Immunol. 7, 686–691 (2006).
- 79
FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
- 80
Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
- 81
International Consortium for Systemic Lupus Erythematosus Genetics (SLEGEN) et al. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat. Genet. 40, 204–210 (2008).
- 82
Zheng, W. et al. Common genetic determinants of breast-cancer risk in East Asian women: a collaborative study of 23 637 breast cancer cases and 25 579 controls. Hum. Mol. Genet. 22, 2539–2550 (2013).
- 83
Hughes, T. et al. Analysis of autosomal genes reveals gene-sex interactions and higher total genetic risk in men with systemic lupus erythematosus. Ann. Rheum. Dis. 71, 694–699 (2012).
- 84
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
- 85
DeLong, E.R., DeLong, D.M. & Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
- 86
Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
- 87
Pickrell, J.K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837 (2009).
- 88
Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Acknowledgements
We are grateful to the affected and unaffected individuals who participated in this study. We thank the research assistants, coordinators and physicians who helped in the recruitment of subjects, including the individuals in the coordinating projects. Part of the Korean control data was provided by the Korean Biobank Project, supported by the Korea Center for Disease Control and Prevention at the Korea National Institute of Health. Genomic DNA from ∼100 Korean patients with SLE was obtained from the Korean National Biobank at Wonkwang University Hospital, which is supported by the Ministry of Health and Welfare, Republic of Korea.
This work was supported by grants from the US National Institutes of Health (AR060366, MD007909, AI103399, AI024717, AI083194, AI107176, TR001425, HG008666 and HG006828), the US Department of Defense (PR094002), the US Department of Veterans Affairs, the National Basic Research Program of China (973 program) (2014CB541902), the Research Fund of Beijing Municipal Science and Technology for the Outstanding PhD Program (20121000110), the National Natural Science Foundation of China (81200524, 81230072) and High-Impact Research Ministry of Education Grant UM.C/625/1/HIR/MoE/E000044-20001, Malaysia. This study was also supported by a grant from the Korea Healthcare Technology R&D Project (HI13C2124), Ministry for Health and Welfare, Republic of Korea.
Author information
Contributions
S.K.N., J.B.H. and S.-C.B. conceived and initiated the study. S.K.N. designed, coordinated and supervised the overall study. C.S., X.Z., P.M., K.B., A.A. and X.K.-H. prepared samples, performed genotyping, cleaned the data, combined various data sets and maintained the database. C.S., J.E.M., K.K. and Y.O. performed data imputation, association analysis and various statistical analyses on the data. L.L.L., J.E.M., M.D. and J.D.W. performed the bioinformatic analysis. S.-C.B., H.Z., K.H.C., X.Z., K.K., S.-Y.B., H.-S.L., T.-H.K., Y.M.K., C.-H.S., W.T.C., Y.-B.P., J.-Y.C., S.C.S., S.-S.L., Y.J.K., B.-G.H., Y.K., A.S., M.K., T.S., K.Y., J.M., Y.Q., K.M.K. and N.S. recruited and characterized patients with SLE and controls and supplied the demographic and clinical data. C.S., J.E.M., X.K.-H., K.K., S.-C.B., L.L.L. and S.K.N. drafted the manuscript. All authors approved the study, reviewed the manuscript, commented and helped in revising the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–16 and Supplementary Note. (PDF 11732 kb)
Supplementary Tables 1–29
Supplementary Tables 1–29. (XLSX 2075 kb)
Supplementary Data Set
Summary-level association data for the discovery sets. (XLSX 18647 kb)
About this article
Cite this article
Sun, C., Molineros, J., Looger, L. et al. High-density genotyping of immune-related loci identifies new SLE risk variants in individuals with Asian ancestry. Nat Genet 48, 323–330 (2016). https://doi.org/10.1038/ng.3496