Evaluation of 10 SLE susceptibility loci in Asian populations, which were initially identified in European populations

A recent genome-wide association study conducted in Europeans identified ten novel loci associated with systemic lupus erythematosus (SLE) susceptibility. To test their disease associations and the genetic similarities and differences between Asians and Europeans, we genotyped the 10 novel single nucleotide polymorphisms (SNPs) and performed an association study. A Chinese cohort from Northern China was recruited as the discovery population, and three East Asian cohorts were included for independent replication. The 10 SNPs were genotyped using TaqMan allele discrimination assays. To prioritize the associated SNPs, different layers of public functional data were integrated. Among the 10 SNPs, rs564799 in IL12A was associated with SLE in both ethnicities (Padjust = 5.91 × 10⁻⁴; odds ratio = 1.22, 1.10–1.35). We also confirmed the previously reported polymorphism rs7726414 in TCF7 (Padjust = 4.12 × 10⁻⁸; odds ratio = 1.46, 1.28–1.66). The directions and magnitudes of the allelic effects for most of the 10 SNPs were comparable between Europeans and Asians. However, risk allele frequencies and population-attributable risk percentages were higher in Asians than in Europeans. We also identified the most likely functional SNPs at each locus. In conclusion, both genetic similarities and differences across ethnicities were observed, providing further evidence for a genetic basis of the high incidence of SLE in populations of Asian ancestry.


Results
Allelic association analyses. After quality control, 493 cases and 628 controls were included in the analysis. All 10 SNPs were in Hardy-Weinberg equilibrium in patients and controls (P > 0.05). As shown in Table 1, three SNPs, rs6740462 in SPRED2, rs564799 in IL12A, and rs2286672 in PLD2, were significantly associated with SLE (P values ranging from 3.51 × 10⁻² to 9.36 × 10⁻⁵). The variant rs7726414 in TCF7 showed marginal significance in the discovery population (P = 5.37 × 10⁻²), while no solid evidence of association was observed for the others (P values of 0.16–0.77). For replication, genotype data for these 4 SNPs were then extracted from our previous study on East Asians, including Korean, Han Chinese and Malaysian Chinese cohorts 8. Consistent associations of rs564799 in IL12A and rs7726414 in TCF7 were observed, and the significance was enhanced by meta-analysis. Notably, the associations between these two SNPs and SLE remained significant after correction for multiple testing (P values were 5.91 × 10⁻⁴ and 4.12 × 10⁻⁸, respectively, using the Bonferroni method on 4 SNPs) (Table 2). The effects of the two associated alleles were in the same direction (either risk or protective factors for SLE) as reported in Europeans.
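To make the allelic test and the Bonferroni adjustment described above concrete, the following minimal Python sketch computes an odds ratio, a Woolf-type 95% confidence interval and a 1-df chi-square P value from a 2 × 2 table of allele counts. It is an illustration only: the analyses in this study were run in PLINK, the function names are hypothetical, and the allele counts in the usage lines are invented placeholders rather than data from Table 1.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic association from a 2x2 table of allele counts.

    Returns (odds_ratio, ci_low, ci_high, p_value), using a 1-df Pearson
    chi-square test and a Woolf (log odds ratio) 95% confidence interval.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, ci_low, ci_high, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Placeholder allele counts (2 alleles per person), NOT taken from the study.
or_, lo, hi, p = allelic_test(400, 586, 440, 816)
print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f}), P = {p:.3g}, "
      f"P_adjust (4 SNPs) = {bonferroni(p, 4):.3g}")
```

Multiplying each P value by the number of SNPs carried forward (4 here) corresponds to the Bonferroni adjustment reported for Table 2.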
Detection powers for the 10 SNPs in the Chinese Han population, assuming the odds ratios (ORs) reported in the published GWAS, were 55.1%, 61.6%, 71.7%, 75.8%, 80.9%, 92.6%, 95.1%, 95.7%, 96.8%, and 98.3% for rs9652601, rs887369, rs4902562, rs10774625, rs3768792, rs2286672, rs6740462, rs7726414, rs564799, and rs3794060, respectively. For the 4 SNPs evaluated in the replication stage (rs7726414, rs564799, rs2286672, and rs6740462), the statistical powers in the combined set of 2978 SLE cases and 4575 controls were 95.7%, 96.8%, 97.7%, and 99.5%, respectively. Thus, the lack of replication for the remaining SNPs may be due to sample heterogeneity or limited detection power (lower MAFs compared to Europeans).

Comparisons of risk allele frequencies, effect sizes and risk across cohorts. As shown in Table 3, the risk allele frequencies (RAFs) of all 10 SNPs in the controls were significantly higher in Asians than in Europeans, with P values ranging from 1.93 × 10⁻²⁶⁶ to 4.54 × 10⁻². In particular, for rs4902562 in RAD51B and rs2286672 in PLD2, the minor alleles in Europeans were the major alleles in Asians. Consistent with the clear differences in RAFs between Asians and Europeans, the population-attributable risk percentages (PARPs) were higher for most of the 10 SNPs in Asians than in Europeans, suggesting that these loci may play more pivotal roles in Asian patients 2,5. Notably, the PARP of the significant SNP rs564799 in IL12A was almost three times as high in Asians as in Europeans. In contrast, as noted above, the effect sizes of all 10 SNPs, in terms of both OR value and direction, were comparable in Asians and Europeans.
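The dependence of detection power on sample size, allele frequency and OR discussed in this section can be illustrated with a simple normal approximation for comparing allele frequencies between cases and controls. This sketch is not the algorithm used by the Power and Sample Size Calculations software cited in the Methods, so its numbers should not be expected to match the powers reported here. The function name and the RAF and OR values in the usage lines are assumptions chosen for illustration; only the sample sizes are taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(raf_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation).

    The case allele frequency is derived from the control frequency and the
    assumed odds ratio; each subject contributes two alleles.
    """
    p0 = raf_controls
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    se = sqrt(p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls)
    z_effect = abs(p1 - p0) / se
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    norm = NormalDist()
    # Probability that the test statistic falls in either rejection tail.
    return norm.cdf(z_effect - z_alpha) + norm.cdf(-z_effect - z_alpha)


# Illustrative inputs: RAF = 0.30 and OR = 1.20 are placeholders.
discovery = allelic_power(0.30, 1.20, n_cases=493, n_controls=628)
combined = allelic_power(0.30, 1.20, n_cases=2978, n_controls=4575)
print(f"discovery-sized cohort: {discovery:.2f}; combined cohort: {combined:.2f}")
```

Under this approximation, power rises with the number of alleles sampled and falls as the risk allele becomes rare, which is the qualitative pattern invoked above to explain the SNPs that did not replicate.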

Systematic annotation and prioritization of the functional SNPs.
For the two replicated SNPs, 11 proxy SNPs (r² > 0.8) were extracted, resulting in 13 candidate SNPs for functional annotation. Overall, we found that all of the variants were located in non-coding regions of the genome and overlapped with at least one layer of ENCODE data, indicating that these SNPs are likely to influence SLE through mechanisms regulating gene expression. As shown in Table 4, among the lead SNP rs564799 and its proxies, rs485789 showed the most layers of functional information (i.e., the highest RegulomeDB score + promoter/enhancer histone marks and DNase sites + protein-binding site + matched motifs). This concordance of regulatory evidence at rs485789, which links disease susceptibility with IL12A expression, makes this SNP a strong candidate functional SNP, with IL12A as the potentially causal gene. Among the lead SNP rs7726414 and its proxies, rs7726414 itself intersected with the most layers of functional data (i.e., the highest RegulomeDB score + promoter/enhancer histone marks and DNase sites + protein-binding site + matched motifs) and was thus prioritized as the most likely functional SNP. However, no cis-eQTL effects of rs7726414 or its proxies were identified in the databases used in the current study.
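The prioritization logic described above, in which the candidate that intersects the most layers of functional evidence is ranked first, can be sketched as follows. This is a simplified assumption about how the layers might be combined (an unweighted count), not the exact scoring used with RegulomeDB or rSNPBase. The class name, field names and candidate identifiers are hypothetical, and the True/False values are invented placeholders rather than the annotations in Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpEvidence:
    """Presence or absence of each annotation layer for one candidate SNP."""
    snp_id: str
    histone_marks: bool
    dnase_site: bool
    protein_binding: bool
    motif_match: bool
    cis_eqtl: bool

    def n_layers(self) -> int:
        # Unweighted count of supporting layers (an illustrative assumption).
        return sum([self.histone_marks, self.dnase_site,
                    self.protein_binding, self.motif_match, self.cis_eqtl])


def prioritize(candidates):
    """Order candidates from most to fewest supporting layers."""
    return sorted(candidates, key=lambda c: c.n_layers(), reverse=True)


# Placeholder candidates; identifiers and flags are not real annotations.
candidates = [
    SnpEvidence("candidate_1", True, False, False, True, False),
    SnpEvidence("candidate_2", True, True, True, True, True),
    SnpEvidence("candidate_3", False, True, False, False, False),
]
for c in prioritize(candidates):
    print(c.snp_id, c.n_layers())
```

A weighted scheme, for example one giving more weight to eQTL support, would be an equally plausible design; the unweighted count is used here only because it is the simplest reading of "most layers".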

Discussion
By investigating the 10 SLE-related SNPs in three independent East Asian SLE populations, we detected one significant novel locus (IL12A) and confirmed one previously reported locus (TCF7) 8. With the current replication population, the statistical powers for the two significant association signals were 96.8% and 95.7%, respectively. Notably, IL12B was identified as a novel SLE-related locus in East Asian populations by high-density genotyping 8, emphasizing the validity and immune relevance of these regions. Moreover, markedly higher RAFs and PARPs for these SNPs were observed in Asian populations compared with Europeans, providing further evidence for a genetic background for the difference in prevalence. The risk alleles and their effects (both effect size and direction) were shared by Asians and Europeans. Consistent with previous studies 2,5, both similarities and differences with respect to RAFs, PARPs and ORs were observed across ethnicities. However, genetic heterogeneity across ancestries can lead to different association results. For the 8/10 SNPs for which we did not detect consistent association signals in the current study, different distributions of RAFs and PARPs were also observed. Even with a similar number of cases and controls for both ethnicities, differences in the power to detect significant associations for individual SNPs across ethnicities appear to depend largely on their allele frequencies. As mentioned above, the detection powers for the 8/10 un-replicated SNPs were about 50-80%. In future work, independent replication in larger populations, especially for the SNPs with lower allele frequencies, will be needed.
More importantly, using publicly available databases, we were able to narrow down the likely functional SNPs at the significant loci tagged by rs564799 and rs7726414, which were proposed to affect SLE pathology. We found that most of the SLE-related SNPs were located in non-coding regions of the genome and may play a role in disease pathogenesis by altering target gene expression. On the one hand, rs485789, in high LD with rs564799 (r² = 1), showed the strongest regulatory evidence among the lead SNP rs564799 and its proxies. It also had a cis-eQTL effect on IL12A, indicating rs485789 as the functional SNP and IL12A as the potential causal gene. IL12A encodes IL-12α, which is a component of IL-12 (made in B cells, macrophages, dendritic cells and neutrophils). IL-12 is a critical secreted signal in T cell activation. On the other hand, the lead SNP rs7726414 itself was annotated as the strongest regulatory variant. Although no cis-eQTL effects of rs7726414 were detected in the current study, the annotated gene TCF7 seemed more likely to be the causal gene. TCF7 is a T cell-specific transcription factor that regulates the expression of CD3. A mouse Tcf7 knockout showed reduced immune-competence of T cells in the periphery. Further fine-mapping analysis and functional studies are therefore still needed to clarify the role of TCF7 in the pathogenesis of SLE.

In summary, two novel loci reported by an SLE GWAS in Europeans have been significantly replicated in three independent East Asian populations. The comparison of RAFs and PARPs in Europeans and Asians provides further evidence for a genetic basis of the high incidence of SLE in Asia compared to Europe. By integrating multiple layers of regulatory information and eQTL mapping, the most likely functional SNPs and genes were identified.

Materials and Methods
Subjects. The current association analysis was conducted in two stages. In the discovery stage, a discovery cohort of Chinese Han ancestry from Northern China was recruited, including 493 SLE cases (age 32.52 ± 12.31 years, female 86.07%) and 628 unrelated healthy controls (age 41.40 ± 11.01 years). In the replication stage, three independent East Asian cohorts, including Koreans, Han Chinese and Malaysian Chinese 8, were included to validate the associated SNPs (P < 0.1). A flowchart of the current study is presented in Fig. 1.
All patients met the revised American College of Rheumatology criteria for SLE 11. This investigation was conducted according to the Declaration of Helsinki. The medical ethics committee of Peking University approved the study. All participants gave informed consent.

SNP selection and genotyping. Although the variant rs7726414 in TCF7 was also discovered and replicated in our previous study 8, for consistency and to assess the replication, we evaluated all 10 novel SNPs reported in a recent SLE GWAS conducted in Europeans 7 without selection. Genotyping was conducted using TaqMan allele discrimination assays as previously reported 12–14.
To comprehensively evaluate the genetic heterogeneity between Asians and Europeans, we retrieved the summary data of the 10 SNPs from the published SLE GWAS in Europeans 7. The Han Chinese replication population 8 was also included in the analysis because both this population and the discovery population were recruited from Northern China. Genotype data for 5 SNPs, including rs6740462, rs564799, rs7726414, rs10774625, and rs9652601, were extracted from the Immunochip, while the remaining 5 SNPs for this cohort were genotyped using TaqMan allele discrimination assays.

Systematic annotation.
To prioritize potential functional SNPs and causal genes at the replicated susceptibility loci, we integrated multiple layers of functional data. The detailed procedures of the prioritization process are presented in Fig. 2. Considering that there are often SNPs showing high linkage disequilibrium (LD) with the associated SNPs, we first extracted the proxies (r² > 0.8 […] haploreg.php), forming the candidate SNPs. The potential functional consequences of the candidate SNPs were predicted using rSNPBase (http://rsnp.psych.ac.cn/) and RegulomeDB databases (http://www.regulomedb.org/). The rSNPBase database provides the regulatory information on SNPs with experimentally validated regulatory elements controlling transcriptional and post-transcriptional events. RegulomeDB ranks SNPs based on the amount of regulatory information with which an SNP intersects. Then, the eQTL mapping data were used to prioritize the replicated SNPs. As a discovery set, the comprehensive and versatile eQTL database seeQTL (http://www.bios.unc.edu/research/genomicsoftware/seeQTL/), which includes various eQTL studies and a meta-analysis of HapMap eQTL information, was investigated, and the results were replicated in lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project 15.

Figure 1. The study was designed in three stages. First, we genotyped the 10 novel loci identified by GWAS in European SLE patients in a Han Chinese cohort in Beijing and compared the genetic similarities and differences between the two ancestries. Four out of 10 loci were identified as significant (P < 0.1). Second, we performed independent replications of these 4 loci in three cohorts of Korean, Han Chinese and Malaysian Chinese ancestry. Consistent associations were identified in our discovery and replication cohorts for two loci (i.e., IL12A and TCF7). Third, by integrating different layers of functional data, we identified the most likely functional SNPs for these two loci. Abbreviations: GWAS: genome-wide association study; SLE: systemic lupus erythematosus; SNP: single nucleotide polymorphism.
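The r² > 0.8 threshold used to define proxy SNPs refers to the squared correlation between alleles at two loci. As a minimal sketch of that statistic, the Python function below computes r² from a haplotype frequency and the two allele frequencies, then filters candidates by the threshold. In this study the proxies were retrieved from a public database rather than computed directly; the function names and all frequencies in the usage lines are hypothetical placeholders.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared allelic correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def select_proxies(pairs, threshold=0.8):
    """Keep SNP ids whose r^2 with the lead SNP exceeds the threshold.

    pairs maps a SNP id to a (p_ab, p_a, p_b) tuple.
    """
    return [snp for snp, freqs in pairs.items() if ld_r2(*freqs) > threshold]


# Placeholder haplotype and allele frequencies, not real data.
example = {
    "proxy_1": (0.30, 0.30, 0.30),   # alleles always co-occur
    "proxy_2": (0.28, 0.30, 0.31),
    "proxy_3": (0.12, 0.30, 0.40),   # p_ab equals p_a * p_b
}
print(select_proxies(example))
```

When the two alleles always occur together, r² reaches 1, as reported for rs485789 and rs564799; when the haplotype frequency equals the product of the allele frequencies, r² is 0.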
Statistical analysis. Quality control of genotyping, Hardy-Weinberg equilibrium tests, and allelic association analyses were performed using PLINK 16. Because this was a replication study, no correction for multiple testing was applied, and P < 0.05 was considered significant. ORs and allele frequencies were presented according to the risk alleles identified in Europeans. The contributions of SNPs to the risk of SLE were estimated with PARP, which considers both OR and RAF in the general population, using the formula PARP = RAF(OR − 1)/[RAF(OR − 1) + 1] × 100% 17. Statistical power was estimated using Power and Sample Size Calculations Version 3.0 (http://biostat.mc.vanderbilt.edu/PowerSampleSize) with a two-sided type I error rate of 0.05.
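The PARP formula given above can be written directly as a short Python function. The sketch below shows how, for a fixed OR, a higher RAF yields a higher PARP, which is the pattern reported for Asians relative to Europeans. The function name is hypothetical, and the RAF values in the usage lines are invented for illustration; only the OR of 1.22 is taken from the text.

```python
def parp(raf, odds_ratio):
    """Population-attributable risk percentage.

    Implements RAF(OR - 1) / [RAF(OR - 1) + 1] x 100%, where raf is the
    risk allele frequency in the general population.
    """
    excess = raf * (odds_ratio - 1.0)
    return excess / (excess + 1.0) * 100.0


# Same OR, two placeholder RAFs (not values from Table 3).
for raf in (0.20, 0.60):
    print(f"RAF = {raf:.2f}, OR = 1.22 -> PARP = {parp(raf, 1.22):.2f}%")
```

Because the OR enters only through the product RAF(OR − 1), two populations with comparable ORs can still differ substantially in PARP when their RAFs differ.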