Multi-ancestry and multi-trait genome-wide association meta-analyses inform clinical risk prediction for systemic lupus erythematosus

Systemic lupus erythematosus is a heritable autoimmune disease that predominantly affects young women. To improve our understanding of its genetic etiology, we conduct multi-ancestry and multi-trait meta-analyses of genome-wide association studies, encompassing 12 systemic lupus erythematosus cohorts from 3 different ancestries and 10 genetically correlated autoimmune diseases, and identify 16 novel loci. We also perform transcriptome-wide association studies, computational drug repurposing analysis, and cell type enrichment analysis. We discover putative drug classes, including a histone deacetylase inhibitor, that could be repurposed to treat lupus. We also identify multiple cell types enriched with putative target genes, such as non-classical monocytes and B cells, which may be targeted by future therapeutics. Using these newly assembled results, we further construct polygenic risk score models and demonstrate, in the Vanderbilt BioVU and Michigan Genomics Initiative biobanks, that integrating polygenic risk scores with clinical lab biomarkers improves the diagnostic accuracy for systemic lupus erythematosus.


File name: Supplementary Data 3 Description: Posterior probability of replicability (PPR) for sentinel variants identified from multi-ancestry and multi-trait meta-analysis.
In total, we identify 79 known loci and 27 novel loci, of which 74 known loci and 16 novel loci are deemed replicable with PPR > 0.90 via RATES. We arrange the variants by chromosome and genomic location. The two-sided P value associated with each variant is calculated from the chi-squared test statistic with 1 degree of freedom.
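As an illustration of the P value calculation described above, the following minimal sketch converts an effect estimate and its standard error into a 1-degree-of-freedom chi-squared statistic and the corresponding two-sided P value. The function name, the example effect size, and the 5 × 10⁻⁸ genome-wide threshold are assumptions for illustration and are not taken from the study.

```python
import math


def chi2_1df_pvalue(beta: float, se: float) -> float:
    """Two-sided P value for the 1-df chi-squared statistic (beta / se)^2."""
    chi2 = (beta / se) ** 2
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)),
    # which equals the two-sided normal tail probability of |beta / se|.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Conventional genome-wide significance threshold (assumed, not from the source).
GENOME_WIDE_ALPHA = 5e-8

# Hypothetical variant with effect 0.12 and standard error 0.02 (Z = 6).
p_value = chi2_1df_pvalue(beta=0.12, se=0.02)
print(p_value, p_value < GENOME_WIDE_ALPHA)
```

Using the complementary error function avoids the loss of precision that would come from computing 1 minus a cumulative probability for very small P values.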
File name: Supplementary Data 4 Description: Cochran's Q test results for sentinel variants identified from multi-ancestry and multi-trait meta-analysis. In total, 103 of the 106 loci (97%) have two-sided P values ≥ 0.05/106 and therefore do not reject the null hypothesis of no genetic effect heterogeneity. We arrange the variants by chromosome and genomic location.
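The heterogeneity test above can be sketched as follows, assuming the standard inverse-variance-weighted form of Cochran's Q with k − 1 degrees of freedom for k cohorts. The function names and the per-cohort effect sizes are hypothetical; only the 0.05/106 threshold comes from the description.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma:
    # Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def cochran_q(betas, ses):
    """Return (Q, P value) for per-cohort effects and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, chi2_sf(q, len(betas) - 1)


# Bonferroni-corrected threshold across the 106 loci, as in the description.
HETEROGENEITY_ALPHA = 0.05 / 106

# Hypothetical effects for one variant in three cohorts.
q_stat, p_het = cochran_q([0.10, 0.12, 0.11], [0.02, 0.03, 0.025])
print(q_stat, p_het, p_het >= HETEROGENEITY_ALPHA)
```

The tail probability is computed with a recurrence over integer degrees of freedom so that the sketch needs only the standard library; a statistics package would normally be used instead.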
File name: Supplementary Data 5 Description: Previously known SLE loci that do not reach genome-wide significance in our multi-ancestry and multi-trait meta-analysis. In total, 94 known loci do not reach genome-wide significance in the meta-analysis. Known loci are retrieved from the GWAS Catalog and from the SLE studies included in the meta-analysis. The mapped target gene is based on the Open Targets Genetics database. We report the P values of the sentinel variants in each locus in our analyses. We arrange the loci by chromosome and genomic location. The two-sided P value associated with each variant is calculated from the chi-squared test statistic with 1 degree of freedom.
File name: Supplementary Data 6 Description: Top TWAS associations at 106 SLE GWAS loci using DGN gene expression prediction models. We define a GWAS locus as a 1 Mb window surrounding each GWAS sentinel variant. Top TWAS associations at 48 SLE GWAS loci reach the transcriptome-wide significance threshold (P value < 2.5 × 10⁻⁶; Bonferroni threshold for testing 20,000 genes). We label loci without genes in them as NA. The two-sided P value associated with each variant is calculated from the chi-squared test statistic with 1 degree of freedom. The two-sided TWAS P value associated with each gene is calculated from the TWAS Z score for the gene-based association test.
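The transcriptome-wide threshold quoted above follows from a Bonferroni correction, and the gene-level P value follows from the TWAS Z score under a standard normal null. A minimal sketch is given below; the family-wise error rate of 0.05 is implied by the stated threshold (0.05 / 20,000 = 2.5 × 10⁻⁶), while the function names and example Z scores are hypothetical.

```python
import math

N_GENES_TESTED = 20_000          # number of genes stated in the description
FAMILY_WISE_ALPHA = 0.05         # implied by the 2.5e-6 threshold
TWAS_THRESHOLD = FAMILY_WISE_ALPHA / N_GENES_TESTED


def twas_pvalue(z: float) -> float:
    """Two-sided P value for a TWAS Z score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_transcriptome_wide_significant(z: float) -> bool:
    """True when the gene-level P value falls below the Bonferroni threshold."""
    return twas_pvalue(z) < TWAS_THRESHOLD


# Hypothetical Z scores for two genes at one locus.
for gene_z in (5.0, 4.0):
    print(gene_z, twas_pvalue(gene_z), is_transcriptome_wide_significant(gene_z))
```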
File name: Supplementary Data 7 Description: P values from TWAS associations at 106 SLE GWAS loci using GEUVADIS gene expression prediction models. We define a GWAS locus as a 1 Mb window surrounding each GWAS sentinel variant. Top TWAS associations at 42 SLE GWAS loci reach the transcriptome-wide significance threshold (P value < 2.5 × 10⁻⁶; Bonferroni threshold for testing 20,000 genes). We label loci without genes in them as NA. The two-sided P value associated with each variant is calculated from the chi-squared test statistic with 1 degree of freedom. The two-sided P value associated with each gene is calculated from the TWAS Z score for the gene-based association test.
Table 4). AUC1 represents the area under the receiver operating characteristic curve (AUC) for "Method1", while AUC2 refers to the AUC for "Method2". The corresponding 95% confidence interval (95% CI) is estimated from a bootstrap with 1,000 replicates. The P value is calculated via a two-sided DeLong's test.
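The bootstrap confidence interval for the AUC mentioned above can be sketched as follows. This is a minimal illustration that assumes a percentile bootstrap over individuals; the source does not state which bootstrap variant was used, the toy data are simulated, and DeLong's test is not implemented here.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(labels, scores, n_boot=1000, seed=0):
    """Percentile 95% CI for the AUC from n_boot resamples of individuals."""
    rng = random.Random(seed)
    n = len(labels)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both cases and controls
            replicates.append(auc(ys, [scores[i] for i in idx]))
    replicates.sort()
    return replicates[int(0.025 * n_boot)], replicates[int(0.975 * n_boot) - 1]


# Simulated toy data: 50 controls and 50 cases with shifted risk scores.
toy_rng = random.Random(1)
toy_labels = [0] * 50 + [1] * 50
toy_scores = [toy_rng.gauss(0.0, 1.0) for _ in range(50)] + \
             [toy_rng.gauss(1.0, 1.0) for _ in range(50)]

point_estimate = auc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=1000)
print(point_estimate, ci_low, ci_high)
```

Comparing two methods, as AUC1 and AUC2 are compared in the description, would apply the same resampled individuals to both score vectors so that the two AUC estimates remain paired.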