Abstract
Systemic lupus erythematosus (SLE), a worldwide autoimmune disease with high heritability, shows differences in prevalence, severity and age of onset among different ancestral groups. Previous genetic studies have focused more on European populations, which appear to be the least affected. Consequently, the genetic variations that underlie the commonalities, differences and treatment options in SLE among ancestral groups have not been well elucidated. To address this, we undertake a genome-wide association study, increasing the sample size of Chinese populations to the level of existing European studies. Thirty-eight novel SLE-associated loci and incomplete sharing of genetic architecture are identified. In addition to the human leukocyte antigen (HLA) region, nine disease loci show clear ancestral differences and implicate antibody production as a potential mechanism for differences in disease manifestation. Polygenic risk scores perform significantly better when trained on ancestry-matched data sets. These analyses help to reveal the genetic basis for disparities in SLE among ancestral groups.
Introduction
Systemic lupus erythematosus (SLE; OMIM 152700) is an autoimmune disease characterized by production of autoantibodies and multiple organ damage. Genetic factors play a key role in the disease, with estimates of its heritability ranging from 43% to 66% across populations1,2,3. Differences in the expression of the disease across ancestral groups have been reported with non-European populations showing an earlier age of onset, 2–4 fold higher prevalence and 2–8 fold higher risk of developing end-stage renal disease than European populations4,5,6,7. Responses to treatment of SLE with the novel monoclonal antibody against B-cell activating factor (BAFF), Belimumab, also show variation across ancestral groups8,9. These findings highlight the heterogeneous nature of the disease, so closer examination of ancestral group differences is likely to improve disease risk prediction and lead to more precise treatment options.
More than 90 loci have been shown to be associated with SLE through genome-wide association studies (GWAS)10,11,12. Trans-ancestral group studies conducted previously were primarily designed to increase power and to identify SLE susceptibility loci shared across ancestries13,14. However, due to inadequate power in studies involving non-Europeans, current findings are biased towards loci associated with SLE in European populations. Some risk alleles reported from studies on European populations, such as those in or near PTPN22, NCF2, SH2B3, and TNFSF13B, are absent in East Asian populations15 while a missense variant (rs2304256) in TYK2 points to a European-specific disease association16,17,18.
The basis for ancestral group differences in the manifestation of SLE at the genome level remains poorly understood. Further studies on non-European populations will help define the genetic architecture underlying SLE and the consequences of patients’ ancestral backgrounds. To this end, we genotyped 8252 participants of Han Chinese descent recruited from Hong Kong (HK), Guangzhou (GZ) and Central China (CC), and combined these data with previous datasets to give a total of ten SLE genetic cohorts consisting of 11,283 cases and 24,086 controls. The increased sample size, particularly for those of Chinese ancestry, allowed identification of novel disease loci and comparative analyses of the genetic architectures of SLE between major ancestral groups. In this work, we identify 38 novel loci associated with SLE and demonstrate both shared and specific genetic components between East Asians and Europeans.
Results
Data set preparation
Han Chinese data: After removing individuals with a low genotyping rate or hidden relatedness, the 7596 subjects of Han Chinese descent from HK, GZ, and CC genotyped in this study and the 5057 subjects from the existing Chinese GWAS13 gave a Chinese ancestry data set of 4222 SLE cases and 8431 controls (Supplementary Tables 1–2 and Supplementary Fig. 1). European data: Existing GWAS data from European populations19 were reanalyzed, based on principal components (PC) matching those for subjects from the 1000 Genomes Project to minimize the potential influence of population substructures20 (see “Methods” section) and grouped into three cohorts, EUR GWAS 1–3 (Supplementary Fig. 2). The recent GWAS21 data from Spain (SP) were also included. After quality control, the European data included 4576 cases and 8039 controls. A further 2485 SLE cases and 7616 controls were included as summary statistics from an Immunochip study of East Asians22 (Supplementary Table 2).
Ancestral correlation of SLE
Genotype imputation and association analysis were performed independently for each GWAS cohort (Supplementary Figs. 3–4) and as meta-analyses of each ancestral group (Fig. 1; see “Methods” section). The trans-ancestral genetic-effect correlation, rge, between the Chinese and European GWAS was estimated to be 0.64 with a 95% confidence interval (CI) of 0.46 to 0.81 by Popcorn23 (see “Methods” section), indicating a significant, but incomplete, correlation of the genetic factors for SLE between the two ancestries. When this analysis was repeated after removing variants in the human leukocyte antigen (HLA) region (chr6: 25–35 Mb), the rge increased to 0.78, suggesting greater ancestral differences for the HLA region.
Novel SLE susceptibility loci
Meta-analyses involving a total of 35,369 participants (see “Methods” section; Supplementary Table 2) were conducted. Of the 94 previously reported SLE-associated variants (Supplementary Data 1), 59 (62.8%) surpassed a genome-wide significance P-value threshold (5.0E−08) and 84 (89.4%) exceeded the threshold of 5E−05 in our study. Thirty-four novel variants reached genome-wide significance and four variants had P-values approaching this threshold based on either ancestry-dependent or trans-ancestral meta-analyses. The newly identified loci included the immune checkpoint receptor CTLA4, the TNF receptor-associated factor TRAF3 and the type I interferon gene cluster on 9p21 (Table 1 and Supplementary Data 2). The new loci bring the total of SLE-associated loci to 132 and produce a 23.5% and 16.5% increase in the proportion of heritability explained for East Asians and Europeans, respectively (see “Methods” section).
Annotation of SLE susceptibility loci
Functional annotations that might be enriched with SLE susceptibility loci were evaluated by the stratified LD score regression method24 (see “Methods” section). For non-cell type-specific annotations, heritability was significantly enriched in transcription start sites (P = 2.47E−05), regions that are conserved in mammals (P = 8.22E−03) and ubiquitous enhancers that are marked with H3K27ac or H3K4me1 modifications (P = 4.43E−02, P = 5.03E−02, respectively; Supplementary Fig. 5). Based on H3K4me1 modifications (associated with active enhancers) across 127 cell types, enrichment of specific cell types was investigated (see “Methods” section). Cell types that passed the false discovery rate threshold (FDR < 0.05) were mostly hematological cells, with B and T lymphocytes the most prominent cell types associated with SLE (Supplementary Fig. 6a). Similar results were observed based on H3K4me3 modifications (associated with promoters of active genes) (Supplementary Fig. 6b).
We used the Regulatory Element Locus Intersection (RELI) method25 to identify transcription factors (TFs) whose binding sites are enriched in SLE-associated loci. Out of 1544 ChIP-seq datasets with a total of 344 TFs in 221 cell types, 249 datasets showed significant enrichment with the associated loci (corrected P < 1.00E−05; see “Methods” section; Supplementary Data 3). Consistent with results from previous studies25,26, the associated SNPs were strongly intersected with binding sites of immune-related TFs, including NFATC1, NF-κB, STAT5A, IRF4, and viral protein EBNA2.
Identification of putative disease genes and pathways
Excluding the HLA region, 179 putative disease genes were identified across the disease-associated loci reported before and those newly identified in this study (Supplementary Data 4; see “Methods” section). A significant level of protein-protein connectivity corresponding to genes found at the novel loci and known SLE-associated loci was observed (P < 1E−16; Supplementary Fig. 7; see “Methods” section). Forty-five pathways were significantly enriched with these putative SLE susceptibility genes (ToppGene27, FDR < 0.05; Supplementary Table 3). The pathways of cytokine signaling, IFN-α/β signaling, Toll-like receptor (TLR) signaling, and B and T cell receptor signaling showed greatest enrichment. The RIG-I-like receptor signaling (P = 5.83E−10) and TRAF6-mediated IRF7 activation (P = 6.41E−10) pathways were designated as SLE associated pathways primarily based on genes newly identified in this study.
Trans-ancestral fine-mapping of disease-associated loci
One hundred and eight SLE-associated loci tagged by SNPs having a minor allele frequency (MAF) greater than 1% in both Chinese and European populations were examined by PAINTOR28, making use of the differences in LD between ancestries. The median number of putative causal variants in the 95% credible sets reduced from 57 per locus when using only the European GWAS to 16 per locus when using data from both ancestries (one-sided paired t-test P = 9.79E−07, Fig. 2). The number of disease-associated loci with five or fewer putative causal variants increased from four when using the European GWAS alone to 15 (Supplementary Data 5). A single putative causal variant was identified for the WDFY4 and TNFSF4 loci, the latter of which was functionally validated in a previous study29 (Supplementary Fig. 8).
Ancestral group differences
Based on Cochran’s Q (CQ)-test, which assesses heterogeneity of effect-size estimates from different ancestral groups, SLE-associated variants or loci outside of the HLA region were divided into four categories: (1) ancestry-shared disease loci tagged by variants with CQ-test P ≥ 0.05; (2) putative ancestry-heterogeneous disease loci with CQ-tests of P < 0.05 but FDR adjusted CQ-test P ≥ 0.05; (3) ancestry-heterogeneous disease loci with FDR adjusted CQ-test P < 0.05; and (4) disease loci tagged by associated variants with the risk allele absent in one of the two ancestries15 (Supplementary Table 4). Nine disease variants, other than those absent or rare (MAF < 0.01) in one of the two ancestries15, showed significant differences in effect-size estimates between the two ancestral groups and were considered ancestry-heterogeneous (FDR adjusted CQ-test P < 0.05, category 3; Fig. 3a and Supplementary Table 5). Within this category, variants in the HIP1, TNFRSF13B, PRKCB, PRRX1, DSE, and PLD4 loci were associated with SLE only in East Asians and variants in TYK2 and NEURL4-ACAP1 only in Europeans (P < 5.0E−08 in one ancestry but P > 0.01 in the other, with non-overlapping 95% CIs of the ORs). These eight loci were thus considered ancestry-specific. SNP rs4917014, a variant near IKZF1, showed a significantly stronger effect in East Asians (OR = 1.33, P = 5.18E−29) than in Europeans (OR = 1.16, P = 1.34E−06; CQ-test P = 4.02E−04). These findings were supported by analyses in each cohort (Supplementary Figs. 9–10).
On reanalyzing data from association studies on rheumatoid arthritis (RA)30, the variants in the PRKCB and PLD4 loci were found to be associated with RA in East Asians but not in Europeans (CQ-test P = 0.004 and 0.010 for PRKCB and PLD4, respectively), while the variant in TYK2 was found associated in Europeans only (CQ-test P = 0.002; Fig. 3b and c). This consistency with the differences found in the SLE study suggests that shared mechanisms could be responsible for ancestral group differences among autoimmune diseases.
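The CQ-test comparison between two ancestral groups can be illustrated with a minimal sketch. It assumes that each group contributes a log odds ratio and its standard error; the function name `cochran_q_two_groups` and the input values are hypothetical and are not taken from the study data.

```python
import math

def cochran_q_two_groups(beta_a, se_a, beta_b, se_b):
    """Cochran's Q and its P-value for two log-odds-ratio estimates."""
    betas = (beta_a, beta_b)
    weights = (1.0 / se_a ** 2, 1.0 / se_b ** 2)  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    # With two groups, Q has 1 degree of freedom under homogeneity, and the
    # chi-square upper tail for 1 d.f. equals erfc(sqrt(Q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# Illustrative inputs only (not values from the study)
q, p = cochran_q_two_groups(0.30, 0.06, 0.10, 0.08)
print(f"Q = {q:.2f}, P = {p:.4f}")
```

A small P-value indicates that the two effect sizes differ by more than sampling error would predict. Correction for multiple testing (for example, FDR adjustment across loci) would be applied separately.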
Colocalization of loci across ancestral groups
Colocalization methods consider many SNPs, rather than only the leading variant in a locus, to compare association signals between ancestral groups. The ancestry-shared disease loci showed much higher posterior probabilities (PP) of colocalization (mean of PP = 0.42) than the ancestry-specific loci (mean of PP = 0.03; Supplementary Fig. 11a). For example, the MTF1, IKBKE and TNIP1 loci in category 1 showed strong posterior probabilities (≥90%) of colocalization under a Bayesian test31 (see “Methods” section), suggesting shared causal effects between the two ancestries. The eight ancestry-specific loci in category 3 showed low posterior probabilities of colocalization (1–10%), consistent with the CQ-test of the top variant of each locus (above and Supplementary Fig. 10).
Since LD differences between ancestries may affect the colocalization results, we compared the SLE association signals from the Chinese populations with those on 27 non-immune-related phenotypes studied in Europeans (Supplementary Table 6) to serve as colocalization baseline values. While posterior probabilities for colocalization at the ancestry-shared MTF1, IKBKE and TNIP1 loci were much greater than the baseline values, there were no differences for the six Asian-specific disease loci (Supplementary Fig. 11b), thereby excluding the potential influence of LD. The European-specific loci (TYK2 and NEURL4-ACAP1) were not evaluated this way due to lack of public data.
Functional annotation of the ancestry-heterogeneous loci
The ancestry-heterogeneous loci appear to be enriched for functions related to antibody production. Two of the nine putative disease genes at ancestry-heterogeneous loci (category 3), TNFRSF13B and IKZF1, are causal genes for human primary immunodeficiency disorders (PID)32 presenting with primary antibody deficiencies (PADs), whereas none of the disease genes at the putative ancestry-heterogeneous loci (category 2, 0/22; Fisher exact test P = 0.07) or the ancestry-shared loci (category 1, 0/120; Fisher exact test P = 0.004) are known to cause PADs in humans. Two of the East Asian-specific disease variants, those in the TNFRSF13B and PRKCB loci, were associated with serum immunoglobulin levels in East Asian populations33,34,35, whereas none of the variants in loci belonging to categories 1 and 2 were found to be associated.
TNFRSF13B, which encodes a BAFF receptor, TACI, plays a major role in immunoglobulin production36,37,38. In this study, a missense variant in TNFRSF13B, rs34562254, was specifically associated with SLE in Chinese populations (OR = 1.18, P = 2.88E−08 in Chinese; OR = 1.01, P = 0.75 in Europeans). In European populations, an SLE-associated variant in the 3’-UTR of TNFSF13B (encoding BAFF), which is absent in Asian populations (category 4), was associated with serum levels of total IgG, IgG1, IgA, and IgM39.
Mice deficient in the orthologs of four of the nine putative disease genes (44.4%) at the ancestry-heterogeneous loci for SLE (Tnfrsf13b, Ikzf1, Prkcb, and Tyk2) demonstrated abnormal IgG levels40 (MP:0020174), whereas proportionately fewer genes caused aberrant IgG levels in mice at putative ancestry-heterogeneous loci (4/22 or 18.2%; OR = 3.43, Fisher exact test P = 0.185) or ancestry-shared loci (14/120 or 11.6%; OR = 5.92, Fisher exact test P = 0.027). Orthologs of four of the twelve (33.3%) putative disease genes from disease loci where the risk allele is monomorphic in one of the ancestries (PTPN22, TNFSF13B, IKZF3, and IGHG1; category 4) also demonstrated abnormal immunoglobulin production in gene knockout mouse models36,39,41,42,43.
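The Fisher exact tests in this section compare gene counts between locus categories in 2 × 2 tables. The following is a minimal sketch using only the Python standard library; the function name `fisher_exact_two_sided` is hypothetical, and the two-sided P-value is assumed to sum the probabilities of all tables that are no more likely than the observed one, which may differ from the exact procedure used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables at least as extreme (no more probable) as observed
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# PAD genes: 2 of 9 genes in category 3 versus 0 of 120 in category 1
p = fisher_exact_two_sided(2, 7, 0, 120)
print(f"P = {p:.3f}")
```

For the category 3 versus category 1 comparison of PAD genes (2/9 versus 0/120), this sketch returns a value that rounds to the reported P = 0.004.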
Evolutionary signatures for the disease loci
Disease loci with heterogeneity between East Asians and Europeans might have undergone differential selection pressures in recent human history, as has been shown for the SLE risk variant in TNFSF13B39. Allele frequency differentiation, measured by the fixation index (Fst), was calculated for the variants of the first three categories using 3324 controls from the HK cohort and 5379 controls from EUR GWAS 2 cohort (see “Methods” section). Higher Fst would indicate a larger frequency difference between the two ancestries. Mean Fst values for the ancestry-shared, putative ancestry-heterogeneous and ancestry-heterogeneous variants were 0.054, 0.061, and 0.084, respectively. Although the sample is small, three (DSE, HIP1, TNFRSF13B) of the nine ancestry-heterogeneous variants (33.3%) showed Fst ≥ 0.15 (empirical P < 0.03), whereas only 10% of the putative ancestry-heterogeneous variants and 8.8% of the ancestry-shared disease variants had Fst ≥ 0.15 (Supplementary Fig. 12a).
In addition, recent positive selection, measured by standardized integrated Haplotype Scores (iHS)44, was investigated at the associated loci. A significant correlation of the iHS scores, estimated using control subjects from HK and EUR GWAS 2 cohorts (see “Methods” section), supports recent positive selection for the shared associated variants (category 1; r = 0.28, P = 0.03; Supplementary Fig. 12b). This is consistent with results using data from Southern Han Chinese (CHS) and Utah residents of European ancestry45 (CEU; r = 0.32, P = 0.008). However, there was no evidence of such a correlation for disease variants that showed ancestry heterogeneity (categories 2 and 3; Supplementary Fig. 12b). For example, in the BAFF system, the derived risk allele rs34562254-A in TNFRSF13B is much more prevalent in East Asians than in other populations (Fig. 4a) and has a significantly longer haplotype for the derived risk allele than the ancestral allele (more negative standardized iHS score) in East Asian populations than in Africans (P = 3.2E−04) or Europeans (P = 4.4E−04) (Fig. 4b), suggesting recent positive selection for the risk allele in East Asians.
Polygenic risk scores for SLE and their accuracies across ancestries
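As an illustration of how Fst summarizes a frequency difference between two populations, the sketch below computes Wright's Fst for one biallelic variant from the two allele frequencies. This is an assumption made for illustration: the estimator actually used in the study is not stated in this section, the two populations are weighted equally, and the function name `wright_fst` and the example frequencies are hypothetical.

```python
def wright_fst(freq_pop1, freq_pop2):
    """Wright's Fst for a biallelic variant from two allele frequencies.

    Fst = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S the mean within-population heterozygosity.
    """
    mean_freq = (freq_pop1 + freq_pop2) / 2.0
    h_total = 2.0 * mean_freq * (1.0 - mean_freq)
    h_within = (2.0 * freq_pop1 * (1.0 - freq_pop1)
                + 2.0 * freq_pop2 * (1.0 - freq_pop2)) / 2.0
    if h_total == 0.0:
        return 0.0  # variant is monomorphic in both populations
    return (h_total - h_within) / h_total

# Hypothetical frequencies: 0.10 in one population and 0.50 in the other
print(round(wright_fst(0.10, 0.50), 3))
```

Identical frequencies give Fst = 0, and the value rises towards 1 as the two populations approach fixation for different alleles.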
Polygenic risk scores (PRS) have been used to estimate individual risk of complex diseases, such as coronary artery disease46 and schizophrenia47. However, as the majority of GWAS findings used to calculate these scores are based on European populations, their accuracy in other populations may be limited. PRS for SLE, trained on data from European populations, were tested on individuals of three Chinese cohorts using the lassosum algorithm48 (see “Methods” section). The area under the receiver operating characteristic curve (AUC) ranged from 0.62 to 0.64 for the three Chinese cohorts. Similar results were observed in the reverse case (Supplementary Fig. 13a). The LDpred49 algorithm produced similar results (Supplementary Fig. 13b). These analyses suggest a partial transferability of PRS between the two ancestries.
Using samples from the GZ cohort as the validation dataset, the performance of predictors trained using GWAS summary statistics from the HK and CC cohorts (2618 cases and 7446 controls) or from the European cohorts (4576 cases and 8039 controls) was evaluated. Ancestry-matched predictors (AUC = 0.76, 95% CI: 0.74–0.78) significantly outperformed ancestry-mismatched predictors (AUC = 0.62, 95% CI: 0.60–0.64) (Fig. 5a). When the analysis was repeated by randomly choosing the same number of samples (1500 cases and 1500 controls) from each of the Chinese and European GWAS as training data, a similar difference was observed (Supplementary Fig. 14). Ancestry-matched PRS for samples in the GZ cohort differed by a mean of 0.89 standard deviations between the SLE case and control groups (t-test P = 9.01E−116) and disease classification using the optimal threshold achieved 73.4% sensitivity and 65.4% specificity (Fig. 5b; see “Methods” section). Disease risk increased with higher PRS, with individuals in the highest PRS decile having a much higher disease risk than those in the lowest decile (OR = 30.3, Chi-square test P = 6.23E−54; Fig. 5c).
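The AUC, sensitivity and specificity reported for the PRS can be computed from per-individual scores as in the minimal sketch below. It assumes that higher scores indicate higher risk and uses the rank-based (Mann-Whitney) definition of the AUC; the function names and the toy scores are hypothetical and do not come from the study.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))

def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify a score >= threshold as a predicted case."""
    true_pos = sum(score >= threshold for score in case_scores)
    true_neg = sum(score < threshold for score in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)

# Toy scores for illustration only
cases = [0.2, 0.8, 0.9]
controls = [0.1, 0.3, 0.7]
print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, threshold=0.5))
```

The pairwise loop is quadratic in sample size and is written for clarity; a rank-based implementation would be preferable for cohorts of the size analysed here.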
Discussion
As non-European groups appear to be more severely affected by SLE, more genetic data on these groups are likely to be highly informative. By increasing the number of subjects of East Asian ancestry to levels roughly equivalent to those of European subjects, and including previously published data, we have made cross-ancestral group studies possible.
Through ancestry-dependent and trans-ancestral meta-analyses, we identified 38 novel loci associated with SLE, bringing the total number of SLE-associated loci to 132. High level functional annotation of these SLE associated loci implicated hematological cells, particularly B and T lymphocytes, cytokine signaling and other immune system pathways. Consistent with previous findings50, we demonstrated the value of trans-ancestral data in significantly reducing the number of putative causal variants at each disease-associated locus, which may facilitate future functional and mechanistic studies.
There was strong evidence of heterogeneity in SLE associations between the two ancestries, which was unlikely to be an artefact of study power. Eight variants (that are common in both ancestral groups) were associated with disease in only one of the ancestral groups. For three of these, a similar ancestral difference was found by re-analyzing association data on RA30. This might suggest that common mechanisms account for ancestral differences in autoimmune diseases.
Genes at the ancestry-heterogeneous disease loci seemed more likely to be involved in regulation of immunoglobulins than genes at the ancestry-shared disease loci. Immunoglobulin levels are highly heritable51 and have been found to differ between ancestries, with African Americans and Asians having higher serum immunoglobulin levels than people of European ancestry52,53,54,55. Higher antibody levels in non-European populations might have contributed to their higher prevalence of SLE and further study of intrinsic differences in immune function among ancestries may be informative.
That differential mechanisms may exist for antibody regulation between East Asian and European populations is supported by the association of SLE with the BAFF signaling system. This system, which is part of the initial reaction to host-pathogen interactions37, might be under positive selection due to different environmental exposures37,39. BAFF (TNFSF13B) and its receptors, one of which is TACI (TNFRSF13B), play essential roles in B cell survival and differentiation36,37,38. The SLE risk allele in the gene encoding BAFF is completely absent in Chinese populations39 and a missense variant in the gene encoding TACI (TNFRSF13B) was found to be specifically associated with SLE in East Asians in this study. Both these genes, TNFSF13B39 in Europeans and TNFRSF13B in East Asians, were found to have undergone positive selection in recent human history. Adaptation of the host to resist pathogens may underlie some of the ancestral group-heterogeneity.
TACI is expressed at very low levels in human newborns and mice before exposure to pathogens56,57 and previous studies have shown that certain pathogens can ablate B cell responses by modulating the expression of TACI58,59,60. The BAFF risk allele was shown to significantly upregulate humoral immunity39, and whether this is the case for the risk allele in TACI should be investigated. TACI blockers, such as Atacicept61 and Telitacicept62, might elicit better responses in SLE patients of Asian ancestry, and recent results of a Phase 2b study showed that Telitacicept was efficacious for SLE patients in China62. In addition, the variant found in TNFRSF13B may be a useful genetic marker for the prescription of Belimumab and TACI blockers, a hypothesis that may warrant further study.
In addition to gene-environment interactions, gene-gene interactions could be another reason for the population difference. However, detecting such effects requires much greater statistical power. A recent study showed that the penetrance of monogenic risk mutations could depend on polygenic background63, indicating a complex form of gene-gene interaction. In addition, ancestry-dependent tagging of untyped causal variants due to different LD structures may produce artefactual population differences in association. This area warrants further investigation, which demands both greater study power and innovative methodologies.
Our analyses have identified a substantial number of novel SLE-associated genetic loci and deepened our understanding of the genetic factors that may underlie the differences in the manifestation of SLE between peoples of European and non-European ancestry. Like a recent PRS study in SLE64, but to a greater extent, we have shown that PRS achieved a far better performance when based on ancestry-matched populations. Our findings contribute new insights into precise treatments, and to risk prediction and prevention of SLE.
Methods
Overview of samples
8252 subjects of Han Chinese descent from Hong Kong (HK), Guangzhou (GZ) and Central China (CC) were genotyped in this study. The institutional review boards of the institutes collecting the samples (The University of Hong Kong, Hospital Authority Hong Kong West Cluster and Guangzhou Women and Children’s Medical Center) approved the study and all subjects gave informed consent. These subjects were genotyped by the Infinium OmniZhongHua-8, the Infinium Global Screening Array-24 v2.0 (GSA) and the Infinium Asian Screening Array-24 v1.0 (ASA) platforms. Illumina GenomeStudio 2.0 was used to call genotypes for individuals and to convert the results into PLINK format. Fourteen samples were randomly selected and genotyped by different platforms. High concordance rates (>99.9%) were observed for genotypes derived from the different platforms (Supplementary Table 1). Principal component (PC) analysis was performed to examine potential batch effects and no significant differences were observed from the PCs for data genotyped by different platforms and from different batches (Supplementary Fig. 1). Compared to our previous SLE GWAS13, 2042 more samples (992 cases and 1050 controls) were added to the HK cohort, and 2917 additional controls were combined with the CC cohort (named AH in the previous study13) after quality control procedures. The samples in the GZ cohort were newly recruited and genotyped in this study.
For the European data13, we split the samples into three cohorts to better control for population substructures (see the section below). Summary statistics for the GWAS from Spain21 (SP) and three ImmunoChip data sets from Korea (KR), Han Chinese in Beijing (BJ) and Malaysian Chinese (MC)22 were included in our analyses. In total, 11,283 SLE cases and 24,086 controls were involved in this study, and the sample size for each cohort is summarized in Supplementary Table 2.
Quality control and association study
Genotype Harmonizer65 was used to align the strands of variants of the Chinese GWAS to the reference of the 1000 Genomes Project Phase 3 panel. Variants with a low call rate (<90%), low minor allele frequency (<0.5%) or violation of Hardy–Weinberg equilibrium (P-value < 1E−04) were removed. Samples were retained if they met the following criteria: (i) missing genotypes below 5%, (ii) hidden relatedness (identity-by-descent) with other samples ≤12.5%, (iii) an inbreeding coefficient between −0.05 and 0.05, and (iv) no extreme PC values, as computed using EIGENSTRAT embedded in PLINK66,67. After quality control, pre-phasing used SHAPEIT68 and individual-level genotype data were imputed to the density of the 1000 Genomes Project Phase 3 reference using IMPUTE269. We compared allele frequencies of all the variants after imputation for the same control groups genotyped by different platforms at different time points, and 190,618 variants were removed from further analyses due to significant differences (P-value < 5E−05). For association analysis SNPTEST70 was used to fit an additive model. Top PCs and the BeadChip types were included as covariates. The number of PCs to be adjusted for in each analysis was determined using a scree plot, with the cutoff placed where the plot levels off. Variants with imputed INFO scores <0.7 were excluded. The genomic inflation factors (λGC) for the HK, GZ, and CC GWAS were 1.04, 1.03, and 1.04, respectively, and the LD score regression (LDSC)71 intercepts were 1.03, 1.02, and 1.03, respectively. Manhattan plots for each cohort are shown in Supplementary Fig. 3.
For the European SLE GWAS data, the λGC and LDSC intercept listed in LD hub seemed inflated (λGC = 1.17 and LDSC intercept = 1.10)20. Thus, the data were reanalyzed to minimize the potential influence of sub-population stratification (see below). PCA showed that subjects from the existing European data were more diverse than the Chinese subjects used in this study (Supplementary Fig. 2a). The European individuals were grouped into three cohorts by their PCs relative to the subjects of the 1000 Genomes Project. Subjects in the EUR GWAS 1, EUR GWAS 2, EUR GWAS 3 cohorts shared similar PCs with individuals of Spanish (IBS), northern and western European (CEU and GBR) and Italian (TSI) origins, respectively (Supplementary Fig. 2b and Supplementary Table 2). Quality control, imputation and association analyses were conducted, as for the Chinese datasets, in each cohort. λGC for the three European GWAS datasets were 1.05, 1.08, and 1.03, respectively, and the LDSC intercepts were 1.03, 1.04, and 1.00, respectively. Manhattan plots for each cohort are shown in Supplementary Fig. 4.
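The genomic inflation factor reported for each cohort is conventionally the ratio of the median observed association chi-square statistic to the median expected under the null. A minimal sketch is given below, assuming that association z-scores are available for each variant; the function name `genomic_inflation` is hypothetical, and this is not the code used in the study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 d.f. (about 0.455), obtained as
# the square of the 75th percentile of the standard normal distribution.
EXPECTED_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(z_scores):
    """Return lambda_GC: median observed chi-square over its null median."""
    chi_squares = [z * z for z in z_scores]
    return median(chi_squares) / EXPECTED_MEDIAN
```

Values close to 1 suggest little inflation of test statistics, whereas larger values can reflect either population stratification or true polygenic signal, which is why the LDSC intercept is reported alongside λGC.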
Meta-analyses of SLE association studies
Meta-analyses for the Chinese and European SLE GWAS were conducted independently. The summary association statistics from HK, GZ and CC GWAS (4222 SLE cases and 8431 controls) were combined in a meta-analysis using a fixed-effect model, weighted by the inverse-variance72. The λGC for the Chinese meta-analysis was 1.09 and the LDSC intercept was 1.04. For the European data, the EUR GWAS 1–3 datasets were combined with the SP GWAS21 in the meta-analysis (4576 cases and 8039 controls). λGC and the LDSC intercept for the European SLE meta-analysis were reduced to 1.11 and 1.03, respectively.
Trans-ancestral meta-analysis across the Chinese and European GWAS cohorts used the fixed-effect model. The summary association statistics for the Immunochip data from KR, BJ, and MC22 were included as an in silico replication. The λGC for the trans-ancestry meta-analysis was 1.15, and the LDSC intercepts computed by using the LD score from either East Asian or European panels were 1.06 and 1.08, respectively.
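The inverse-variance-weighted fixed-effect model combines per-cohort effect estimates as in the following minimal sketch. It assumes that each cohort supplies a log odds ratio and its standard error for a given variant; the function name `fixed_effect_meta` and the example inputs are hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, p_value

# Illustrative inputs only: three cohorts with similar effects
beta, se, p = fixed_effect_meta([0.20, 0.25, 0.18], [0.05, 0.08, 0.06])
print(f"beta = {beta:.3f}, SE = {se:.3f}, P = {p:.2e}")
```

Cohorts with smaller standard errors receive more weight, so the pooled estimate is dominated by the larger studies.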
Genetic correlation between the two ancestries
Trans-ancestral genetic correlation between the meta-analysis results for Chinese (HK, CC, and GZ GWAS) and Europeans (EUR GWAS 1–3 and SP GWAS) was estimated using the Popcorn algorithm23 based on common SNPs in the autosomes. The disease prevalences in Chinese and European populations were set to 1‰ and 0.3‰, respectively4. SNPs were removed from this analysis if they met any of the following criteria: (1) strand ambiguity (A/T or C/G alleles); (2) MAF <5%; or (3) imputed INFO score <0.9. The cross-ancestry LD scores were estimated using control subjects from the HK cohort (n = 3324) and EUR GWAS 2 cohort (n = 5379).
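The three SNP exclusion criteria above can be expressed as a simple filter. The sketch below is illustrative only: the function name `keep_for_correlation` and the argument layout are hypothetical and do not correspond to the Popcorn input format.

```python
# Allele pairs whose strand cannot be resolved from the alleles alone
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}

def keep_for_correlation(allele1, allele2, maf, info):
    """Return True if a SNP passes the three filters described in the text."""
    if frozenset((allele1.upper(), allele2.upper())) in AMBIGUOUS_PAIRS:
        return False  # (1) strand-ambiguous A/T or C/G SNP
    if maf < 0.05:
        return False  # (2) minor allele frequency below 5%
    if info < 0.9:
        return False  # (3) imputation INFO score below 0.9
    return True

print(keep_for_correlation("A", "G", maf=0.12, info=0.97))
print(keep_for_correlation("A", "T", maf=0.30, info=0.99))
```

In practice the MAF and INFO filters would presumably be applied in each ancestry separately, so that a SNP is kept only if it passes in both data sets; that detail is an assumption here.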
Heritability explained by the SLE-associated variants
The variance in liability explained by the SLE-associated variants was measured using the VarExplained program73. Variants in the HLA region were excluded from the analysis. The disease prevalence was set to 1‰ for East Asians and 0.3‰ for Europeans4. The novel loci increased the heritability explained from 0.10 to 0.13 for East Asians, and from 0.08 to 0.09 for Europeans.
Functional annotations of SLE associated SNPs
The stratified LD score regression method24 was applied to the trans-ancestral meta-analysis result to partition SNP-heritability across functional annotations. Twenty-eight categories of annotations that are not cell type specific (Supplementary Fig. 5) provided by this source were studied. For cell type-specific analyses, H3K4me1 and H3K4me3 modifications across 127 cell types (Supplementary Fig. 6) were downloaded from the Roadmap Epigenomics Project74. The cell type-specific enrichment was performed under the “full baseline” model24, which aimed to control for overlaps with annotations that are not cell type-specific. The RELI25 analyses were performed to identify TFs whose binding sites are enriched in the disease-associated loci. All SLE-associated SNPs and variants that are in high LD with them (r2 > 0.8) were taken as input. All the 1544 ChIP-seq datasets curated in this tool were tested in this study. The significance level and relative risk for each dataset were computed by comparing the observed intersections with expected intersections obtained from 2000 simulations.
Identification of putative SLE genes and gene-set enrichment analysis
Putative causal gene(s) across all the SLE-associated loci outside of the HLA region were identified using DEPICT75. The default setting (r2 > 0.3) was used to set boundaries for each SLE-associated locus. Genes within (or overlapping) the boundaries were examined and those with a P-value <0.05 were defined as putative causal genes. If no genes were selected at a locus, gene(s) identified from eQTL data from human whole blood76,77 were considered to be putatively causal. The protein–protein interaction network was constructed, and its enrichment P-value computed, using STRING78 (version 11). Gene-set enrichment analysis was performed using ToppGene27, with the August 2019 versions of the KEGG79, Reactome80 and mouse knockout phenotype40 databases. The 2017 IUIS Phenotypic Classification for Primary Immunodeficiencies32 (PID) was used to obtain 320 human PID genes categorized into nine phenotypic classifications.
Trans-ancestral fine-mapping of the associated loci
The HLA region was excluded from this analysis, as extensive LD and limited genotyping of SNPs in both ancestries make defining the best model of association difficult for this region. Disease loci whose risk alleles were rare (MAF < 0.01) or absent in one ancestry were also excluded, leaving 108 SLE-associated loci in the autosomes for this study. For each disease locus, all variants within the region were extracted for both ancestries. The genetic interval was bounded by the closest recombination hotspots (defined as a recombination rate >10 cM/Mb) on either side of a given disease-associated variant. A fine-mapping algorithm, PAINTOR (version 3.0)28, was used to estimate the posterior probability of causality for each variant at a given locus based on the trans-ancestral model. For comparison, we also applied the fine-mapping algorithm to the Chinese and European SLE GWAS separately. All analyses were run under the assumption of a single causal variant per locus, and conditional analysis was performed if multiple signals were present within a locus. The LD matrices were calculated using control samples from HK (n = 3324) and EUR GWAS 2 (n = 5379) for Chinese and European populations, respectively. At each locus, variants were ranked by posterior probability, and the smallest set with a cumulative posterior probability greater than 95% was defined as the set of putative causal variants (95% credible set).
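Constructing a 95% credible set from per-variant posterior probabilities can be sketched as below. This is an illustration only: the variant identifiers and posterior values are hypothetical, and the posteriors are assumed to come from a single-causal-variant model so that they sum to approximately 1 within a locus.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest set of top-ranked variants whose posteriors sum to >= coverage.

    posteriors: dict mapping variant ID -> posterior probability of causality.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, pp in ranked:
        selected.append(variant)
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected, cumulative

# Hypothetical locus with five variants.
locus = {"rsA": 0.62, "rsB": 0.25, "rsC": 0.09, "rsD": 0.03, "rsE": 0.01}
variants, total = credible_set(locus)
print(variants, round(total, 2))
```

A smaller credible set indicates better resolution, which is why the trans-ancestral model is compared with the single-ancestry analyses.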
Identification of loci with differential effects between the two ancestries
Cochran’s Q (CQ)-test81 was used to examine effect-size differences between the two ancestries for all the disease-associated variants in the autosomes. If the variants were also interrogated by the Immunochip22 system, the association results derived from the KR, BJ, and MC cohorts were also included. CQ-test P-values were adjusted using the Benjamini–Hochberg method82, and an adjusted P-value <0.05 was considered significant. For comparison, summary association statistics on RA were downloaded from a previous study30 of 4873 RA cases and 17,642 controls of Asian ancestry and 14,361 RA cases and 43,923 controls of European ancestry.
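For two ancestries, Cochran's Q reduces to a one-degree-of-freedom test on the inverse-variance-weighted effect estimates. The sketch below illustrates the test and the Benjamini–Hochberg adjustment using only the Python standard library. It is not the code used in the study, the effect sizes are hypothetical, and it handles only the two-group case (including additional cohorts would require a chi-square distribution with more degrees of freedom).

```python
import math

def cochran_q_two_groups(beta1, se1, beta2, se2):
    """Cochran's Q for two effect estimates (log odds ratios); 1 degree of freedom."""
    w1, w2 = 1.0 / se1**2, 1.0 / se2**2            # inverse-variance weights
    pooled = (w1 * beta1 + w2 * beta2) / (w1 + w2)  # fixed-effect pooled estimate
    q = w1 * (beta1 - pooled) ** 2 + w2 * (beta2 - pooled) ** 2
    p = math.erfc(math.sqrt(q / 2.0))               # chi-square(1) upper tail
    return q, p

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                    # from largest P-value to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical variant: log OR 0.30 in one ancestry and 0.05 in the other.
q, p = cochran_q_two_groups(0.30, 0.05, 0.05, 0.05)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q, p, adj)
```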
Colocalization analysis
Colocalization of association signals from the two ancestries was determined using the R package coloc31 on all variants with a MAF >1% and an IMPUTE2 (ref. 69) imputation INFO score >0.9 within a given disease locus. Posterior probabilities (PP) for five different configurations were evaluated at the associated loci: PP0, no association in either group; PP1, association with SLE in East Asians but not in Europeans; PP2, association with SLE in Europeans but not in East Asians; PP3, association with SLE in both ancestries but by two independent signals; PP4, association with SLE in both East Asians and Europeans by the same signal. The average PPs for the five configurations (PP0 to PP4) were 0.24, 0.14, 0.12, 0.08, and 0.42 for the ancestry-shared loci, and 0.04, 0.67, 0.24, 0.02, and 0.03 for the ancestry-specific loci.
To control for LD differences between ancestries, SLE association signals from Chinese populations were compared with those from 27 immune-unrelated phenotypes from European populations (LD hub20; Supplementary Table 6) to generate baseline posterior probabilities of colocalization in the absence of a phenotypic relationship. Ancestry-shared causal effects for SLE were expected to be significantly greater than the baseline values.
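The way the five posterior probabilities arise from per-variant evidence can be illustrated with a small sketch. It follows the general approach of combining approximate Bayes factors (ABFs) across variants, but it is not the coloc package itself. The prior probabilities (p1, p2, p12), the prior standard deviation of the effect size, the function names and the toy effect sizes are all assumptions made for illustration.

```python
import math

def log_sum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))

def wakefield_labf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor for one variant; prior_sd is an assumed value."""
    z = beta / se
    r = prior_sd**2 / (prior_sd**2 + se**2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0, ..., PP4] from per-variant log ABFs of two studies."""
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(both)
    log_h = [
        0.0,                                                    # PP0: no association
        math.log(p1) + s1,                                      # PP1: study 1 only
        math.log(p2) + s2,                                      # PP2: study 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # PP3: two signals
        math.log(p12) + s12,                                    # PP4: shared signal
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]

se = 0.05
weak, strong = 0.02, 0.25                       # hypothetical log odds ratios
east_asian = [wakefield_labf(b, se) for b in (weak, weak, strong, weak, weak)]
european_same = [wakefield_labf(b, se) for b in (weak, weak, strong, weak, weak)]
european_other = [wakefield_labf(b, se) for b in (weak, weak, weak, weak, strong)]

pp_shared = coloc_posteriors(east_asian, european_same)     # same lead variant
pp_distinct = coloc_posteriors(east_asian, european_other)  # different lead variants
print([round(x, 3) for x in pp_shared])
print([round(x, 3) for x in pp_distinct])
```

In this toy setting the shared lead variant yields a high PP4, whereas different lead variants shift the posterior mass away from PP4. Differences in LD between ancestries can blur this distinction, which is the motivation for the baseline comparison against immune-unrelated phenotypes.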
Analysis of selection signatures for the associated variants
The fixation index (Fst) was used to test allele-frequency differences between the two ancestries. Fst was calculated based on the following formula83:
\[F_{st} = \frac{H_t - H_s}{H_t}\]
where Ht is the expected proportion of heterozygosity in the pooled samples from all ethnicities based on Hardy–Weinberg equilibrium, \(H_t = 2\bar p\left( {1 - \bar p} \right)\), and \(\bar p\) is the allele frequency in the overall pool. Hs, the expected proportion of heterozygosity within the subpopulations (Chinese and Europeans), is estimated as
\[H_s = \frac{\sum_i H_{pi} N_{pi}}{\sum_i N_{pi}}\]
where Hpi is the expected heterozygosity in the ith subpopulation estimated by the allele frequency in that subpopulation under Hardy–Weinberg equilibrium, and Npi is the sample size of the ith subpopulation.
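As a worked illustration of these definitions, the sketch below computes Fst for one biallelic variant from subpopulation allele frequencies and sample sizes. It assumes that the pooled allele frequency is the sample-size-weighted mean of the subpopulation frequencies and that Hs is the sample-size-weighted mean of the subpopulation heterozygosities. The function names and example frequencies are hypothetical.

```python
def expected_het(p):
    """Expected heterozygosity under Hardy-Weinberg equilibrium."""
    return 2.0 * p * (1.0 - p)

def fst(freqs, sizes):
    """Fst = (Ht - Hs) / Ht for one biallelic variant.

    freqs: allele frequency in each subpopulation.
    sizes: sample size of each subpopulation (used as weights).
    Assumes the variant is polymorphic in the pooled sample (Ht > 0).
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total      # pooled frequency
    h_t = expected_het(p_bar)
    h_s = sum(expected_het(p) * n for p, n in zip(freqs, sizes)) / total
    return (h_t - h_s) / h_t

# Equal sample sizes: p_bar = 0.4, Ht = 0.48, Hs = 0.40, so Fst = 0.08 / 0.48.
example = fst([0.2, 0.6], [1000, 1000])
# Same frequencies weighted by the control sample sizes reported in the text.
weighted = fst([0.2, 0.6], [3324, 5379])
print(example, weighted)
```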
Potential selective sweeps in respective ancestries were examined using the Integrated Haplotype Score (iHS) method, which measures the extended haplotype homozygosity for the ancestral allele relative to the derived allele84. Raw iHS scores were computed using the R package rehh85 and normalized by different frequency bins (50 bins over the range 0 to 1). Large negative standardized iHS values indicate long haplotypes carrying the derived allele, while large positive values suggest long haplotypes with the ancestral allele. The Fst and iHS values analyzed in this study were estimated using control subjects from HK (n = 3324) and EUR GWAS 2 (n = 5379). Standardized iHS scores based on the 1000 Genomes Project were downloaded from a previous study45 for comparison.
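Standardization of raw iHS within allele-frequency bins can be sketched as follows. This is not the rehh implementation: it assumes equal-width bins of derived-allele frequency and leaves a score undefined when its bin contains fewer than two variants or has zero variance. The function name and toy values are hypothetical.

```python
import statistics
from collections import defaultdict

def standardize_ihs(raw_scores, derived_freqs, n_bins=50):
    """Standardize raw iHS to mean 0 and SD 1 within derived-allele-frequency bins."""
    bins = defaultdict(list)
    for idx, freq in enumerate(derived_freqs):
        b = min(int(freq * n_bins), n_bins - 1)   # a frequency of 1.0 falls in the last bin
        bins[b].append(idx)
    standardized = [None] * len(raw_scores)
    for members in bins.values():
        if len(members) < 2:
            continue                              # SD undefined; score stays None
        values = [raw_scores[i] for i in members]
        mu = statistics.fmean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        for i in members:
            standardized[i] = (raw_scores[i] - mu) / sd
    return standardized

# Toy data: three variants share one frequency bin and two share another.
freqs = [0.110, 0.115, 0.119, 0.510, 0.515]
raw = [1.0, 2.0, 3.0, -1.0, 1.0]
std = standardize_ihs(raw, freqs)
print(std)
```

Binning by frequency matters because the expected haplotype length depends on allele age, and therefore on allele frequency, so scores are only comparable among variants of similar frequency.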
Calculation of polygenic risk scores
Polygenic risk scores (PRS) for individuals were computed using lassosum48, a penalized regression framework. The meta-analysis results on Europeans were used to calculate PRS for individuals of Chinese ancestry, and vice versa. LD information among SNPs was calculated from the testing dataset. These analyses were repeated using LDpred49.
The GZ SLE GWAS cohort was used as a test dataset to evaluate the influence of training data from different ancestries. Two predictors were constructed using lassosum based on meta-analysis results from: (1) HK and CC GWAS, 2618 cases and 7446 controls; (2) European GWAS, 4576 cases and 8039 controls. To control for the influence of different sample sizes, 1500 cases and 1500 controls were randomly chosen from each of the Chinese and European populations to train two same-size predictors (repeated three times). PRS values generated from each test were scaled to a mean of 0 and a standard deviation of 1, and then evaluated based on the area under the ROC curve (AUC). The AUC values and their 95% confidence intervals were calculated using the R package pROC86. The optimal cut-off for case-control classification, defined as the point that maximizes the sum of sensitivity and specificity, was estimated using the coords function.
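The evaluation steps (scaling, AUC and the optimal cut-off) can be illustrated with the standard library alone. This sketch is not pROC: it computes the AUC as the proportion of case-control pairs in which the case has the higher score, and it searches candidate cut-offs among the observed scores, whereas pROC may place thresholds differently. Confidence intervals are omitted. The scores and labels are hypothetical.

```python
import statistics

def scale(scores):
    """Scale scores to mean 0 and standard deviation 1."""
    mu = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def best_cutoff(scores, labels):
    """Cut-off maximizing sensitivity + specificity; score >= cut-off is called a case."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    best = None
    for t in sorted(set(scores)):
        sens = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_case
        spec = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t) / n_ctrl
        if best is None or sens + spec > best[1]:
            best = (t, sens + spec)
    return best  # (cut-off, sensitivity + specificity)

# Hypothetical raw PRS values; label 1 = case, 0 = control.
raw_prs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
labels = [0, 0, 1, 1, 1, 0]
prs = scale(raw_prs)
area = auc(prs, labels)
cutoff, youden_sum = best_cutoff(prs, labels)
print(area, cutoff, youden_sum)
```

Because the AUC depends only on the ranking of scores, scaling does not change it; scaling serves to make score distributions comparable between predictors.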
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Genome-wide association summary statistics for the East Asian populations can be accessed through the GWAS Catalog (GCST90011866). The data for the European populations are available at http://insidegen.com/ and http://urr.cat/data/GWAS_SLE_summaryStats.zip. The ImmunoChip data are publicly available for download at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4767573/bin/NIHMS747721-supplement-3.xlsx. Summary association statistics for other phenotypes were downloaded from LD hub (http://ldsc.broadinstitute.org/). Summary statistics for eQTL results were retrieved from the Blood eQTL browser (https://genenetwork.nl/bloodeqtlbrowser/). Protein–protein interaction information was downloaded from the STRING database (https://string-db.org/). Histone modifications across cell types were downloaded from the Roadmap Epigenomics Project (http://www.roadmapepigenomics.org/).
Code availability
The code that supports the findings of this study is available from the corresponding author upon request.
References
Lawrence, J. S., Martins, C. L. & Drake, G. L. A family survey of lupus-erythematosus .1. heritability. J. Rheumatol. 14, 913–921 (1987).
Wang, J. et al. Systemic lupus erythematosus: a genetic epidemiology study of 695 patients from China. Arch. Dermatol. Res. 298, 485–491 (2007).
Kuo, C. F. et al. Familial aggregation of systemic lupus erythematosus and coaggregation of autoimmune diseases in affected families. JAMA Intern. Med. 175, 1518–1526 (2015).
Johnson, A. E., Gordon, C., Palmer, R. G. & Bacon, P. A. The prevalence and incidence of systemic lupus erythematosus in Birmingham, England. Relationship to ethnicity and country of birth. Arthritis Rheumat. 38, 551–558 (1995).
Danchenko, N., Satia, J. A. & Anthony, M. S. Epidemiology of systemic lupus erythematosus: a comparison of worldwide disease burden. Lupus 15, 308–318 (2006).
Costenbader, K. H. et al. Trends in the incidence, demographics, and outcomes of end-stage renal disease due to lupus nephritis in the US from 1995 to 2006. Arthritis Rheumat. 63, 1681–1688 (2011).
Ballou, S. P., Khan, M. A. & Kushner, I. Clinical features of systemic lupus erythematosus: differences related to race and age of onset. Arthritis Rheumat. 25, 55–60 (1982).
Stohl, W. et al. Efficacy and safety of subcutaneous belimumab in systemic lupus erythematosus: a fifty-two-week randomized, double-blind, placebo-controlled study. Arthritis Rheumatol. 69, 1016–1027 (2017).
Stohl, W. & Hilbert, D. M. The discovery and development of belimumab: the anti-BLyS-lupus connection. Nat. Biotechnol. 30, 69–77 (2012).
Chen, L., Morris, D. L. & Vyse, T. J. Genetic advances in systemic lupus erythematosus: an update. Curr. Opin. Rheumatol. https://doi.org/10.1097/BOR.0000000000000411 (2017).
Wen, L. et al. Exome-wide association study identifies four novel loci for systemic lupus erythematosus in Han Chinese population. Ann. Rheumat. Dis. https://doi.org/10.1136/annrheumdis-2017-211823 (2017).
Wang, Y. F. et al. Identification of ST3AGL4, MFHAS1, CSNK2A2 and CD226 as loci associated with systemic lupus erythematosus (SLE) and evaluation of SLE genetics in drug repositioning. Ann. Rheumat. Dis. https://doi.org/10.1136/annrheumdis-2018-213093 (2018).
Morris, D. L. et al. Genome-wide association meta-analysis in Chinese and European individuals identifies ten new loci associated with systemic lupus erythematosus. Nat. Genet. https://doi.org/10.1038/ng.3603 (2016).
Langefeld, C. D. et al. Transancestral mapping and genetic load in systemic lupus erythematosus. Nat. Commun. 8, 16021 (2017).
Wang, Y. F., Lau, Y. L. & Yang, W. Genetic studies on systemic lupus erythematosus in East Asia point to population differences in disease susceptibility. Am. J. Med. Genet. C Semin. Med. Genet. https://doi.org/10.1002/ajmg.c.31696 (2019).
Cunninghame Graham, D. S. et al. Association of NCF2, IKZF1, IRF8, IFIH1, and TYK2 with systemic lupus erythematosus. PLoS Genet. 7, e1002341 (2011).
Li, P., Chang, Y. K., Shek, K. W. & Lau, Y. L. Lack of association of TYK2 gene polymorphisms in Chinese patients with systemic lupus erythematosus. J. Rheumatol. 38, 177–178 (2011).
Kyogoku, C. et al. Lack of association between tyrosine kinase 2 (TYK2) gene polymorphisms and susceptibility to SLE in a Japanese population. Mod. Rheumatol. 19, 401–406 (2009).
Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015).
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics https://doi.org/10.1093/bioinformatics/btw613 (2016).
Julià, A. et al. Genome-wide association study meta-analysis identifies five new loci for systemic lupus erythematosus. Arthritis Res. Ther. 20, 100 (2018).
Sun, C. et al. High-density genotyping of immune-related loci identifies new SLE risk variants in individuals with Asian ancestry. Nat. Genet. 48, 323–330 (2016).
Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. & Asian Genetic Epidemiology Network Type 2 Diabetes, C. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Harley, J. B. et al. Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity. Nat. Genet. https://doi.org/10.1038/s41588-018-0102-3 (2018).
Wang, T. Y. et al. Identification of regulatory modules that stratify lupus disease mechanism through integrating multi-omics data. Mol. Ther. Nucleic Acids 19, 318–329 (2020).
Chen, J., Bardes, E. E., Aronow, B. J. & Jegga, A. G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305–W311 (2009).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Manku, H. et al. Trans-ancestral studies fine map the SLE-susceptibility locus TNFSF4. PLoS Genet. 9, e1003554 (2013).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Bousfiha, A. et al. The 2017 IUIS phenotypic classification for primary immunodeficiencies. J. Clin. Immunol. 38, 129–143 (2018).
Liao, M. et al. Genome-wide association study identifies common variants at TNFRSF13B associated with IgG level in a healthy Chinese male population. Genes Immun. 13, 509–513 (2012).
Osman, W. et al. Association of common variants in TNFRSF13B, TNFSF13, and ANXA3 with serum levels of non-albumin protein and immunoglobulin isotypes in Japanese. PLoS ONE 7, e32683 (2012).
Yang, M. et al. Genome-wide scan identifies variant in TNFSF13 associated with serum IgM in a healthy Chinese male population. PLoS ONE 7, e47990 (2012).
Castigli, E. et al. TACI and BAFF-R mediate isotype switching in B cells. J. Exp. Med. 201, 35–39 (2005).
Sakai, J. & Akkoyunlu, M. The role of BAFF system molecules in host response to pathogens. Clin. Microbiol. Rev. 30, 991–1014 (2017).
Zhang, Y., Li, J., Zhang, Y. M., Zhang, X. M. & Tao, J. Effect of TACI signaling on humoral immunity and autoimmune diseases. J. Immunol. Res. 2015, 247426 (2015).
Steri, M. et al. Overexpression of the cytokine BAFF and autoimmunity risk. N. Engl. J. Med. 376, 1615–1626 (2017).
Eppig, J. T. et al. The Mouse Genome Database (MGD): comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res. 40, D881–D886 (2012).
Dai, X. et al. A disease-associated PTPN22 variant promotes systemic autoimmunity in murine models. J. Clin. Investig. 123, 2024–2036 (2013).
Wang, J. H. et al. Aiolos regulates B cell activation and maturation to effector state. Immunity 9, 543–553 (1998).
Chen, X. et al. An autoimmune disease variant of IgG1 modulates B cell activation and differentiation. Science 362, 700–705 (2018).
Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
Johnson, K. E. & Voight, B. F. Patterns of shared signatures of recent positive selection across human populations. Nat. Ecol. Evol. 2, 713–720 (2018).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. https://doi.org/10.1038/s41588-018-0183-z (2018).
Schizophrenia Working Group of the Psychiatric Genomics, C. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41, 469–480 (2017).
Vilhjalmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Grundbacher, F. J. Heritability estimates and genetic and environmental correlations for the human immunoglobulins G, M, and A. Am. J. Hum. Genet. 26, 1–12 (1974).
Tollerud, D. J. et al. Racial differences in serum immunoglobulin levels: relationship to cigarette smoking, T-cell subsets, and soluble interleukin-2 receptors. J. Clin. Lab. Anal. 9, 37–41 (1995).
Mehta, A., Ramirez, G., Ye, G., McGeady, S. & Chang, C. Correlation Between IgG, IgA, IgM and BMI or Race in a Large Pediatric Population. J. Allergy Clin. Immunol. 129, AB85 (2012).
Roode, H. Serum immunoglobulin values in white and black South African pre-school children. Part I: Healthy children. J. Trop. Pediatr. 26, 104–107 (1980).
Lau, Y. L., Jones, B. M., Ng, K. W. & Yeung, C. Y. Percentile ranges for serum IgG subclass concentrations in healthy Chinese children. Clin. Exp. Immunol. 91, 337–341 (1993).
Akkoyunlu, M. TACI expression is low both in human and mouse newborns. Scand. J. Immunol. 75, 368 (2012).
Kanswal, S., Katsenelson, N., Selvapandiyan, A., Bram, R. J. & Akkoyunlu, M. Deficient TACI expression on B lymphocytes of newborn mice leads to defective Ig secretion in response to BAFF or APRIL. J. Immunol. 181, 976–990 (2008).
Treml, L. S. et al. TLR stimulation modifies BLyS receptor expression in follicular and marginal zone B cells. J. Immunol. 178, 7531–7539 (2007).
Kanswal, S. et al. Suppressive effect of bacterial polysaccharides on BAFF system is responsible for their poor immunogenicity. J. Immunol. 186, 2430–2443 (2011).
Moir, S. et al. Decreased survival of B cells of HIV-viremic patients mediated by altered expression of receptors of the TNF superfamily. J. Exp. Med. 200, 587–599 (2004).
Isenberg, D. et al. Efficacy and safety of atacicept for prevention of flares in patients with moderate-to-severe systemic lupus erythematosus (SLE): 52-week data (APRIL-SLE randomised trial). Ann. Rheum. Dis. 74, 2006–2015 (2015).
Wu, D. et al. A Human Recombinant Fusion Protein Targeting B Lymphocyte Stimulator (BlyS) and a Proliferation-Inducing Ligand (APRIL), Telitacicept (RC18). In Systemic Lupus Erythematosus (SLE): Results of a Phase 2b Study [abstract]. Arthritis Rheumatology. Vo. 71 (Suppl 10) https://acrabstracts.org/abstract/a-human-recombinant-fusion-protein-targeting-b-lymphocyte-stimulator-blys-and-a-proliferation-inducing-ligand-april-telitacicept-rc18-in-systemic-lupus-erythematosus-sle-results-of-a-phase/ (2019).
Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).
Chen, L. et al. Genome-wide assessment of genetic risk for systemic lupus erythematosus and disease severity. Hum. Mol. Genet. https://doi.org/10.1093/hmg/ddaa030 (2020).
Deelen, P. et al. Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration. BMC Res. Notes 7, 901 (2014).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Delaneau, O. & Marchini, J. & Genomes Project, C. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat. Commun. 5, 3934 (2014).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
So, H. C., Gui, A. H., Cherny, S. S. & Sham, P. C. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol. 35, 310–317 (2011).
Roadmap Epigenomics, C. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Westra, H. J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. https://doi.org/10.1038/ng.3737 (2016).
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Cochran, W. G. The combination of estimates from different experiments. Biometrics 10, 101–129 (1954).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
Nei, M. Analysis of gene diversity in subdivided populations. Proc. Natl Acad. Sci. USA 70, 3321–3323 (1973).
Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
Gautier, M. & Vitalis, R. rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics 28, 1176–1177 (2012).
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12, 77 (2011).
Acknowledgements
We thank grant support from National Key Research and Development Program of China (2017YFC0909001), Hong Kong PhD fellowship scheme (HKPF), HKU Postgraduate Scholarships and the Edward and Yolanda Wong Fund for supporting postgraduate students who participated in this work. We also thank the Hong Kong Area of Excellence (AoE) NPC case-control study partially funded by the World Cancer Research Fund (WCRF) for sharing their GWAS data. Y.-F.W. thanks grant support from National Natural Science Foundation of China (Grant No. 81801636). Y.Z. thanks grant support from National Natural Science Foundation of China (Grant No. 81970450) and the Science and Technology Project of Guangzhou (Grant No. 201903010074). W.Y. and Y.L.L. thank Research Grant Council of Hong Kong for support on genetic studies of SLE (GRF 17146616,17106320).
Author information
Contributions
W.Y. and Y.-F.W. conceived the study. Y.L.L. and Y.Z. took the lead in data collection and management. Z.L., H.Z., Y.S., X.Y., S.-L.Z., X.G., J.H., Q.W., T.-H.L., J.-H.L., Z.-M.M., Y.T., Y.C., Q.S., B.B., C.C.M., C.Y., L.L., N.S., P.C.S., C.C.L., J.Y. and X.Z. undertook subject recruitment and collected phenotype data. D.M. and T.J.V. shared SLE GWAS data on European populations. Y.-F.W., T.-Y.W., Y.C., M.G., and J.J.S. carried out data analyses including quality control, genotype imputation, association, and meta-analyses. Y.-F.W., T.-Y.W., and Y.L. carried out fine-mapping, selection signature analyses and PRS comparison between the two ancestral groups. Y.-F.W., W.Y., Y.Z., Y.L.L., P.C.S., and D.K.S. wrote the manuscript. All authors read and contributed to the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks Leah Kottyan, Chris Wallace and Guillermo Reales for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, YF., Zhang, Y., Lin, Z. et al. Identification of 38 novel loci for systemic lupus erythematosus and genetic heterogeneity between ancestral groups. Nat Commun 12, 772 (2021). https://doi.org/10.1038/s41467-021-21049-y