Genome-wide association study identi ﬁ es Sjögren ’ s risk loci with functional implications in immune and glandular cells

Sjögren ’ s disease is a complex autoimmune disease with twelve established susceptibility loci. This genome-wide association study (GWAS) identi ﬁ es ten novel genome-wide signi ﬁ cant (GWS) regions in Sjögren ’ s cases of European ancestry: CD247 , NAB1 , PTTG1-MIR146A , PRDM1-ATG5 , TNFAIP3 , XKR6 , MAPT- CRHR1 , RPTOR-CHMP6-BAIAP6 , TYK2 , SYNGR1 . Polygenic risk scores yield predictability (AUROC = 0.71) and relative risk of 12.08. Interrogation of bioinformatics databases re ﬁ ne the associations, de ﬁ ne local regulatory networks of GWS SNPs from the 95% credible set

Studies of Sjögren's genetics have been largely limited to familial aggregation and candidate gene studies 12 . In 2013, the Sjögren's Genetics Network (SGENE) published the first large-scale genomic study of Sjögren's of European ancestry using the ImmunoChip 1.0 array 16 . It identified six novel genome-wide significant (GWS; P GWAS < 5 × 10 −08 ) regions of association, identified several additional suggestive regions of association, and replicated previously established regions 16 . In the same issue of Nature Genetics, Li et al. published the first genome-wide association study (GWAS) identifying GTF2IRD1-GTF2I as a GWS region of association in Sjögren's of Han Chinese ancestry 17 . Later, Zhao et al. leveraged ImmunoChip data from European and East Asian populations to further characterize the GTF2IRD1-GTF2I locus in systemic lupus erythematosus (SLE) and Sjögren's, discovering a missense mutation in Neutrophil cytosolic factor 1 (NCF1) (pArg90His) that decreases reactive oxygen species production and predisposes individuals to Sjögren's and other autoimmune diseases 18 . Also in 2017, the Sjögren's International Collaborative Clinical Alliance (SICCA) reported a GWAS of Sjögren's of European and Asian ancestry that uncovered ancestry-specific heterogeneity between genetic associations, replicated previously established associations, and identified several suggestive regions of association but, due to the small European case-control cohort, did not identify new GWS associations 19 . To date, only 12 loci (9 in European populations) are established as GWS associations with Sjögren's (Supplementary Data 1) 12,[16][17][18][19][20][21][22][23][24] ; at least ten times fewer loci than related autoimmune diseases, such as SLE and rheumatoid arthritis (RA) 25,26 .
Defining the genetic risk of Sjögren's will provide important insights into the dysregulated molecular mechanisms that influence disease pathogenesis and promote the development of new therapeutic approaches to improve early diagnosis and treatment. To address the current gap in Sjögren's genetics, we performed the largest GWAS to date in Sjögren's of European ancestry, resulting in the identification of ten novel GWS associations (Fig. 1a). Using genotyped SNPs, we also assessed the genetic correlation with related autoimmune diseases and the ability of these SNPs to predict disease using polygenic risk score (PRS) analyses. Last, we performed a deep analysis of bioinformatic data to predict the functionality of the most likely functional SNPs in each locus (Fig. 1b). Our approach, which searched for a coalescence of available cell type-specific expression quantitative trait loci (eQTLs) and mapped topologically associated domain (TAD) interactions with enrichment of epigenetic marks indicative of gene regulation, identified several likely functional SNPs from each locus for future mechanistic investigation [27][28][29][30] . Further, mapping TADs that interacted with genomic regions carrying risk SNPs revealed several extended regulatory networks that likely modulate the expression of >40 additional genes up or downstream of the index gene.

Results
Genome-wide association study of Sjögren's of European ancestry. A GWAS was performed on 3232 Sjögren's cases and 17,481 population controls of European ancestry remaining after quality control (Supplementary Fig. 1A-G; Supplementary Data 2). Inflation in the test statistic for the 101,574 SNPs in common between the arrays was λ = 1. 15 and was reduced to λ = 1.10 when the previously established regions were removed ( Supplementary  Fig. 1E, F). Whole-genome imputation increased the number of variants tested for single-marker SNP-Sjögren's association from 101,574 to 6,257,359 polymorphisms tested. In total, seven novel regions exceeded GWS (P GWAS < 5 × 10 −8 ), while 74 additional regions reached suggestive association (P Suggestive < 5 × 10 −5 ) (Fig. 2a- Four of the nine previously established regions in Sjögren's of European ancestry were replicated at GWS threshold: major histocompatibility complex (MHC) region, including the previously associated MICA*008, STAT1-STAT4, TNIP1, and IRF5-TNPO3 ( Supplementary Fig. 1G, Supplementary Data 1). Previously associated IL12A, BLK, and CXCR5 loci reached suggestive association threshold.
Since some of the identified regions overlapped with loci covered by the ImmunoChip 1.0, a meta-analysis was performed using ImmunoChip 1.0 data from an additional 619 Sjögren's cases of European ancestry and 6171 population controls independent from the GWAS, focusing on regions with common genotyped SNPs ( Supplementary Fig. 1H, I; Supplementary Data 2) 16 . The meta-analysis replicated four regions from the GWAS and identified three additional novel GWS associations (Fig. 2a, i-k; Table 1). It also increased the significance of the previously associated IL12A, BLK, and CXCR5 loci from suggestive to GWS ( Fig. 2a; Supplementary Data 1). MAPT-CRHR1, RPTOR-CHMP6-BAIAP6, and SYNGR1 regions were not available for testing in the ImmunoChip 1.0 data. Collectively, this GWAS and meta-analysis increased the number of established Sjögren-associated loci surpassing GWS from 12 to 22.
Bayesian analysis: Posterior probability estimates were calculated for each SNP adjusting for genotype covariates in each risk loci using Trinculo. iii.
Haplotype analysis: LD and probable haplotype blocks were determined using solid spine of LD algorithms in HAPLOVIEW.
Step 2: Annotation of candidate functional variants in risk loci i. RegulomeDB score: RegulomeDB database was used to identify scores for SNPs in 95% credible sets for each loci. ii.
HaploReg: HaploReg database was used to query the enrichment of regulatory chromatin states from DNAse and histone ChIP-Seq experiments in primary immune cells and EBV transformed B lymphocytes, proteins bound in ChiP-Seq experiments, reported eQTLs and regulatory motifs altered for each SNP in 95% credible sets for each loci. iii.
IMPACT score: IMPACT was used to identify the SNPs with the highest transcription factor binding activity in the 95% credible set.
Step 3: Cis-eQTL mapping and Promoter capture Hi-C analyses i. eQTL mapping: FUMA GWAS was used to map SNPs to genes up to 1Mb apart that show a significant (p<0.05) cis-eQTL association in different databases: eQTL catalogue, single-cell RNA sequencing eQTLs, Database of Immune Cell eQTLs and Epigenomics (DICE), and minor salivary gland (GTEx v08). Another online tool, QTLbase was also used to map SNPs in 95% credible sets to cis-eQTLs. ii.
Promoter capture Hi-C analysis: Pre-processed capture Hi-C data from 3D Genome Browser for each candidate functional variants from 14 immune cells and EBV transformed B lymphocytes were used to find promoter and enhancer sites located 5kb either side of each SNP. The start and end coordinates of each consensus loop were then queried in USCS genome browser to identify overlapping and/or nearby genes. iii.
EcholocatoR: R package for automated fine-mapping, annotation and plotting of SNPs to identify most probable causal SNPs.

RegDB scores
Enrichment in immune cell type  Fig. 1 Schematic of the Sjögren's GWAS and Bioinformatic Workflow. a Workflow of the six genotyped datasets (DS1-6) and one ImmunoChip dataset (DS7) used in this study, including the number of cases, controls, and SNPs included in each dataset pre-and post-quality control, and after whole-genome imputation. The post-imputation merged dataset (PI1) containing DS1-6 was used to perform the SNP-Sjögren's single marker trait analysis (orange), the polygenic risk score (PRS) analysis (yellow), the genetic correlation analyses (blue), and the epigenetic enrichment analyses (blue). Meta-analysis was performed using the genotyped PI1 merged dataset and DS7 ImmunoChip dataset (green) merged using the DS7 genotyping platform. See Supplementary Data 1 for detailed information for each dataset. b Statistical and bioinformatic analysis workflow applied to each novel risk locus to identify and predict functionality of likely functional SNPs.
transcription factor (BATF), and MYC proto-oncogene, BHLH transcription factor (MYC), exhibited the highest number of intersections with Sjögren's risk loci (Fig. 4c). Further, seven of the ten transcription factors yielding the highest number of intersection counts with Sjögren's risk loci were previously reported to be associated with EBV 38 . Moreover, the EBVencoded protein, Epstein-Barr Virus nuclear protein 2 (EBNA2), occupied 8 of the 17 GWS Sjögren's risk loci.
Refinement of the novel Sjögren's genetic associations. Publicly available Hi-C data from the EBV B cell line, GM12878, and eQTL data from the DICE database in FUMA [http://fuma.ctglab.nl/] 39,40 were used to map TADs and eQTLs reported for genes positioned in each Sjögren's risk locus. A majority of the Sjögren's risk loci exhibited strong linkage disequilibrium with many SNPs and enrichment of reported eQTLs and TADs in GM12878 (Fig. 5a), indicating that the risk alleles of these Sjögren-SNPs may modulate extended local regulatory networks to alter the expression of genes beyond the index gene, which is identified based on closest proximity to the index SNP. For example, one of the two associated regions on chromosome 17, MAPT-CRHR1, exhibited extended linkage disequilibrium with many SNPs spread broadly across several genes within the~1.4 Mb region flanking the index gene (Supplementary Fig. 3; Supplementary Data 5, 6). Exploring TADs reported in various cell types revealed that many of the SNPs may have potential functional implications spanning up to 1.9 Mb from MAPT-CRHR1 ( Supplementary Fig. 3F). These data were further supported by the large number of strong cis-eQTLs and blood cell count traits reported for many of the SNPs in the Sjögren-associated MAPT-CRHR1 locus (Supplementary Data 7). Logistic regression and Bayesian analyses were used to refine the association signals from each of the 10 novel GWS regions, then posterior probability analyses were used to identify the 95% credible set of likely functional SNPs present in each region ( Fig. 2b- 43 , and annotated accordingly. SNPs with the strong evidence of known and predicted regulatory functions, including transcription factor binding sites, reported promotor and enhancer activities, DNAse hypersensitivity, and eQTLs were selected for further functional dissection (Fig. 1b).
Coalescence of epigenetic marks with eQTLs mapped to TADs identified potential functional SNPs from each locus that most likely         . Further, rs2949661 may alter the expression of cellular repressor of E1A stimulated genes 1 (CREG1), as both eQTLs and TAD coalesced in B and CD8 + T cells. Active transcription factor binding sites were also enriched at rs2949661 compared with other 95% credible SNPs in CD247 (Fig. 5c)  rs10800313 as additional SNPs of interest ( Supplementary  Fig. 4K). The association region peaking at rs4841466 positioned in the intron of XK-related 6 (XKR6) is an independent association signal from the previously reported FAM167A-BLK region ( Fig. 2d; Supplementary Fig. 5; Supplementary Data 11,12) 45,46 ; as observed in SLE 47,48 . Bioinformatic analyses revealed that the index variant, rs4841466, likely has no biological function (Supplementary Data 13); however, four additional potential functional variants positioned in a predicted intronic regulatory element of XKR6 were identified in the 95% credible set: rs11250099, rs4841465, rs11250098, and rs4314618 (Supplementary Fig. 5G-J; Supplementary Data 11,12). Coalescence between epigenetic marks, eQTLs, and TADs indicated that rs11250098 and rs11250099 are most likely to impact enhancer activity that may target XKR6 and myotubularin-related protein 9 (MTMR9) ( Fig. 5b; Supplementary Fig. 5G, I; Supplementary Data 13). Moreover, eQTL data in minor salivary glands implicated XKR6, RP1 like 1 (RP1L1), and solute carrier family 35 member G5 (SLC35G5). Interestingly, IMPACT analyses showed limited transcription factor binding activity (Fig. 5c), suggesting that specific cellular contexts may be required. EcholocatoR analyses using rs11250099, rs11250098, rs4841465, or rs4314618 as the index SNP all identified rs60724652 as an additional SNP of interest ( Supplementary Fig. 5K-N).
The index SNP, rs2069235, positioned in the intronic region of synaptogyrin 1 (SYNGR1), has epimarks indicative of a promoter element for an alternate isoform in immune cells (Fig. 5b,  Supplementary Fig. 6G; Supplementary Data 16). While the conditional analysis indicated a single effect in the region, several SNPs were present in the 95% credible set and exhibited bioinformatic indicators of function, including rs2069235, rs909685, rs2267407, and rs3747177 ( Fig. 2h; Supplementary  Fig. 6; Supplementary Data 14-16). The IMPACT annotation showed transcription factor binding activity for rs2069235 and rs3747177 (Fig. 5c). rs2069235 not only modulated SYNGR1 expression in several immune cell types and minor salivary gland, but also showed coalescence of both TADs and eQTLs for activating transcription factor 4 (ATF4) and chromobox 7 (CBX7) in macrophages and neutrophils ( Fig. 5b; Supplementary Fig. 6G; Supplementary Data 16). Closer evaluation of the SNPs in the 95% credible set revealed a second set of SNPs with much lower posterior probabilities that were located~84.5 kb upstream of the peak association signal in the proximal promoter of PDGFB: rs5757585, rs11703434, rs137594 ( Fig. 5b; Supplementary  Fig. 6K-M; Supplementary Data 14-16). Several eQTLs, including SYNGR1 and platelet-derived growth factor subunit B (PDGFB) in several immune cell types and the minor salivary gland, have been reported for rs5757585. Independent Echolo-catoR analyses using rs2069235, rs909685, rs2267407, rs3747177, rs5757585, rs11703434, or rs137594 as the index SNP all identified rs470049 as an additional SNP enriched for bioinformatic indicators of function ( Supplementary Fig. 6N-T). Echo-locatoR analyses using rs5757585, rs11703434, or rs137594 also identified rs5757599 and rs11089938 as potential functional SNPs ( Supplementary Fig. 6R-T).  Supplementary Fig. 7B). Epigenetic regulatory marks, eQTLs, and TAD data suggest that rs2293765 may alter promoter activity and NAB1 expression, as well as alter enhancer activity to modify major facilitator superfamily domain containing 6 (MFSD6), nuclear envelope integral membrane protein 2 (TMEM194B/NEMP2), and AC093388.3 expression in multiple cell types ( Fig. 5b; Supplementary Fig. 7G; Supplementary Data 19). TMEM194B/NEMP2 was also a cis-eQTL in the minor salivary gland for three correlated variants: rs11900804 (r 2 = 0.96), rs2192008 (r 2 = 0.94), and rs744600 (r 2 = 0.72) (Supplementary Fig. 7H-J; Supplementary Data 20). Our annotation analysis showed that rs2192008 has high transcription factor binding activity (Fig. 5c). EcholocatoR analyses of the NAB1 region using rs2192008 and rs744600 as index SNPs identified rs1468685 as an additional SNP of interest ( Supplementary Fig. 7L,  N). Analyses using rs2293765 or rs11900804 as index SNPs did not identify any additional SNPs ( Supplementary Fig. 7K, M).
The index SNP, rs8071514, positioned in the promoter region of RPTOR-CHMP6-BAIAP6, had the highest posterior probability of the 95% credible SNP set, but bioinformatic analyses suggested Chromatin Interactions and eQTLs of the ten novel Sjögren-associated genetic risk loci. a Circos plot shows the zoom regional Manhattan plots for each genetic risk locus (outer most layer); SNPs with P-value <0.05 (black); r 2 > 0.08 (red); r 2 > 0.06 (orange). Index SNP rsIDs are indicated in red. Black rsIDs are prioritized SNPs from the 95% credible set that are also eQTLs that exhibit chromatin-chromatin interactions and are shown in (b). Outer circle displays the chromosome coordinate with the genomic risk loci highlighted in blue. Genes that are eQTLs (green) or exhibit chromatin interaction by Hi-C in Epstein-Barr virus (EBV)-transformed B lymphocytes (orange) are reported on the inner circles as text or interaction links. Each index gene is colored blue. Genes that are eQTLs and engage in chromatin interactions are reported in purple. b Cell type-specific functional annotations (horizontal rectangles), select eQTLs (top triangles), and chromatin-chromatin interactions (bottom triangle) are shown for the indicated prioritized SNPs from each 95% credible set. MIR146A was omitted because mined eQTL databases did not test MIR146A. Complex linkage disequilibrium of the CRHR1 association impaired refinement and fine-mapping of the region. c IMPACT annotation of the most likely functional Sjögren-SNPs to quantify SNP position in 700 celltype-specific active transcription factor binding sites. Top panel depicts SNP position (blue lines) relative to genomic coordinates (Mb) of each indicated locus. Bottom panel shows the total number of active transcription factor binding sites detected at each indicated SNP. NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-30773-y ARTICLE that it was not likely functional ( Supplementary Fig. 8B; Supplementary . Four additional SNPs in the 95% credible set exhibited bioinformatic indicators of function, including epigenetic marks of both promoter and enhancer activity, eQTLs, and TADs in several immune cell types: rs4969328, rs6565516, and rs4969331 in the promoter region of charged multivesicular body protein 6 (CHMP6) and rs6565518 in a predicted intronic enhancer of CHMP6 ( Fig. 5b; Supplementary Fig. 8G-J; Supplementary Data 22). The fact that rs4969328, rs6565516, and rs4969331 were all eQTLs for CHMP6 in immune cell types, despite TAD deficiency in the region, indicated that the three SNPs likely modulate CHMP6 expression by altering promoter activity. In addition, rs4969328 exhibited moderate transcription factor binding activity, while there was no notable transcription factor binding activity in the other two SNPs (Fig. 5c). All three SNPs were also minor salivary gland eQTLs for TMEM105 long non-coding RNA (TMEM105) and small integral membrane protein 11 (SMIM11, a.k.a. FAM165B) 314 kb and~816 kb downstream, respectively ( Fig. 5b; Supplementary Fig. 8G-J; Supplementary Data 22). Further, TAD and eQTL data suggested that rs4969328 may also modulate the activity of an enhancer that, in turn, modulates the activity of the RPTOR) promoter through long-range interactions in several immune cell types. EcholocatoR analyses using rs4969328, rs6565516, rs4969331 or rs6565518 as the index SNP all identified rs4969322, rs34050444, and rs8071514 as additional SNPs of interest ( Supplementary Fig. 8K-N).

Functional analysis of Sjögren-SNPs in intergenic enhancers.
In the region of PRDM1-ATG5, GWS Sjögren-SNP association was observed after meta-analysis, peaking at rs526531 positioned 10 kb downstream of PR/SET domain 1 (PRDM1; encodes B lymphocyte-induced maturation protein 1 (BLIMP1)) ( Fig. 2j; Supplementary Data 23,24). Seven variants in the 95% credible set clustered in an intergenic region enriched with enhancer regulatory marks (Supplementary Fig. 9; Supplementary Data 25). Of these, four of the SNPs were eQTLs for PRDM1 and autophagy-related 5 (ATG5) in immune cell types ( Fig. 5b; Supplementary Fig. 9; Supplementary Data 25). Interestingly, ATG5 was also an eQTL in the minor salivary gland for rs526531 and rs533733 ( Fig. 5b; Supplementary Fig. 9G, I; Supplementary Data 25). Further, IMPACT analyses revealed elevated transcription factor binding activity at rs533733 (Fig. 5C). While we did not observe colocalization with TADs and eQTLs in this region, lymphoblastoid cell lines do show potential for these variants to interact with the promoters of both PRDM1 and ATG5. EcholocatoR analyses also identified rs1008944 as an additional SNP of interest ( Supplementary Fig. 9N-T).
The TNFAIP3 region of association peaked at rs61117627 ( Fig. 2k; Supplementary Data 29, 30) near the 3' end of TNF alpha-induced protein 3 (TNFAIP3) gene. Bioinformatic analyses revealed that rs61117627 is not likely functional but identified several other correlated variants within the 95% credible set that are likely functional (Supplementary Data 31). Two variants, rs10499197 and rs58915141, are positioned in a likely enhancer upstream of TNFAIP3 that engages in several TADs, including the TNFAIP3 promoter, and are eQTLs for TNFAIP3 in multiple immune cell types ( Fig. 5b; Supplementary Fig. 11H, I; Supplementary Data 31). In addition, rs10499197 and rs58915141 have elevated transcription factor binding activity (Fig. 5c). rs5029924 had the highest transcription factor binding activity among the 95% credible SNPs (Fig. 5c), but no coalescence of eQTLs and TADs was observed for this SNP (Supplementary Fig. 11J; Supplementary Data 31). We did discover that rs7749323, the tagging SNP of the previously characterized TT > A variant contributing to hypomorphic TNFAIP3 expression with the SLE risk haplotype 46,50,51 , is also an eQTL for IFNGR1 in the minor salivary gland (Fig. 5b;  Supplementary Fig. 11G; Supplementary Data 31). TAD data is currently unavailable for the salivary gland, but a TAD between the TT > A enhancer region and upstream interferon-gamma receptor 1 (IFNGR1) promoter was found in EBV B cells. Interestingly, EcholocatoR analyses only identified additional variants of interest when rs10499197 or rs58905141 were designated as index SNPs, identifying rs675640, rs142373084, and rs113237273 ( Supplementary Fig. 11M-S).

Discussion
This study identified ten Sjögren's genetic susceptibility loci in the largest GWAS to date of Sjögren's of European ancestry, nearly doubling the total number of identified genetic risk loci from 12 to 22. Although this list lags far behind the >120 risk loci identified for SLE or RA, the Sjögren-SNPs yielded similar PRS calculations (Sjögren-All AUROC = 0.71; Sjögren-Ro + AUROC = 0.78; SLE AUROC = 0.72; RA AUROC = 0.76), suggesting that this GWAS accounts for a substantial portion of Sjögren's heritability 25,26,59,60 . To understand how these ten novel risk loci contribute to Sjögren's pathogenesis and identify the GWS Sjögren-SNPs responsible, we also fine mapped the loci and performed a deep bioinformatic analyses of the Sjögren-SNPs from each credible set of SNPs. We leveraged publicly available promoter-capture Hi-C data and eQTL data to map Sjögren-SNPs in the context of their (a) local regulatory network, (b) position relative to cell type-specific chromatin-chromatin interactions, and (c) potential function in specific immune cells and disease target tissues. Our deep bioinformatic analyses revealed new potential functional implications for the novel risk loci and further expanded the list of implicated genes to >40 (Supplementary Fig. 13).
PRSs assess the cumulative trait-associated genetic risk of an individual and, though limited in trans-ancestral applications, have accurately predicted an individual's genetic risk for specific phenotypes, disease severity, and/or early onset of autoimmune disease in tightly controlled case-control studies 61 . Limitations of PRS applicability are due, in part, to the limited genetic load accounted for by algorithms that calculate the PRS based on the prevalence and/or odds ratio of selected GWS SNPs. By leveraging the PRSice-2 algorithm, which calculates PRSs using all genotyped SNPs (after LD-pruning), we obtained PRS calculations with a similar genetic load as previously reported for SLE and RA studies, despite having 5-fold fewer regions reaching genome-wide association 25,26,59,60 . Interestingly, removal of SNPs from the HLA region significantly reduced the predictability of the Sjögren's PRS calculations in both the Sjögren-All and Sjögren-Ro + datasets, thus indicating that the SNPs carried on the HLA are strong genetic risk factors for Sjögren's and the anti-Ro autoantibody positive sub-phenotype. Due to the limited number of Sjögren-Ro − individuals available in this study, we could not assess the impact of HLA on the genetic risk of the anti-Ro autoantibody negative phenotype. However, the residual effect observed in the Sjögren-All PRS predictions after HLA removal suggests that the HLA association may have a lower impact on the genetic risk of the Sjögren-Ro − subphenotype 62 . Future studies focusing on the genetic risk of Sjögren-Ro − patients are needed to assess these subphenotypic genetic differences.
Mapping the local regulatory networks and identifying additional genes of interest based on coalescence of reported TADs and cell type-specific eQTLs provided additional insights into the potential functional implications of susceptibility regions where the index gene function is unknown. For example, the functional implications of the XKR6 susceptibility locus, identified herein and implicated in other autoimmune diseases [65][66][67] , remains largely unknown. Our bioinformatic analyses revealed that rs11250099 is positioned in an intronic enhancer of XKR6 that forms a TAD with the promoter of MTMR9 and is an eQTL for MTMR9 in CD4 + and CD8 + T cells and B cells (Fig. 5b; Supplementary Fig. 5G; Supplementary Data 13). MTMR9 encodes myotubulin-related protein 9, a protein that interacts and modulates the enzymatic activity of the autophagic inhibitor, MTMR8, in HeLa cells and Drosophila 68,69 . Given that autophagy is implicated in the regulation of B and T cell proliferation, survival, and ability to distinguish self from non-self 70,71 , it is interesting to hypothesize that the risk allele of rs11250099 may influence Sjögren's susceptibility, in part, by modulating MTMR8/MTMR9-mediated autophagy [27][28][29] . However, to address this hypothesis, detailed functional studies, such as testing for allele-specific TAD formation between the XKR6 intronic enhancer and the MTMR9 promoter and the role of MTMR9 in autophagy, in Sjögren's are needed.
In addition to dysregulated immunity, analyzing genetic susceptibility in the context of salivary gland function may provide additional insights into why the salivary gland becomes a target tissue in Sjögren's. Transcriptomic studies of the salivary gland have identified several differentially expressed genes associated with Sjögren's pathology 3,4,6,64 ; many of the salivary gland eQTLs reported herein are from similar studies. Unfortunately, most of these studies lack the cellular granularity to discern differential expression in salivary acinar and ductal cells from invading immune cells. Further, very little is known about the cell-typespecific chromatin architectures in the salivary gland. By exploring the local regulatory networks from EBV B cells, which are known to exhibit more promiscuous chromatin looping 72,73 , in the context of reported salivary gland eQTLs, we have garnered several new insights into how Sjögren's genetic susceptibility may influence initial salivary gland dysfunctions.
TNFAIP3 is a common autoimmune disease risk locus with several functionally characterized SNPs that regulate proinflammatory nuclear factor kB (NFκB) signaling implicated in the chronic inflammation and immune cell dysfunctions of autoimmunity 74 . The coding polymorphism, rs2230926, which propagates NFκB signaling by hindering TNFAIP3 expression, is statistically associated with primary Sjögren-associated lymphoma 75,76 . Additionally, the TNFAIP3 locus has a complex genomic architecture carrying several enhancers and regulatory elements, including the previously characterized TT > A enhancer 50,51 . The tag SNP for the TT > A enhancer, rs7749323, was a GWS Sjögren-SNP in this GWAS and is an eQTL for TNFAIP3 in neutrophils (Figs. 2k, 5b; Supplementary Data 31). Interestingly, rs7749323 is also a salivary gland eQTL for IFNGR1 and engages the promoter of IFNGR1 through chromatin looping in EBV B cells ( Fig. 5b; Supplementary Data 31). IFNγ signaling is a potent driver of inflammation and immune activation in the salivary gland, and upregulation of interferon signaling is strongly correlated with autoimmunity 63,77,78 . In Sjögren's, focal inflammation and the accumulation of IFNγ and other inflammatory cytokines contribute to exocrine gland dysfunction independent of lymphoid infiltration 78 . Therefore, our bioinformatic analyses implicate rs7749323 and allele-specific expression of IFNGR1 as one of likely several potential mechanisms involved in the previously observed interferon signature of the salivary gland. Given that the mechanisms driving dysregulation of the salivary gland are largely understudied, these observations are meant to provide insights that help formulate hypotheses to test predicted mechanisms.
Inflammation and cell stress are known activators of autophagy, a process that has been implicated in the production of selfantigens in autoimmune disease target tissue when dysregulated [79][80][81] . Like our IFNGR1 finding, we observed that rs533733 in the PRDM1-ATG5 locus is a salivary gland eQTL for the autophagy regulator, ATG5, and is positioned in a TAD with the ATG5 promoter in EBV B cells ( Fig. 5b; Supplementary Data 25), leading us to hypothesize that the risk allele of rs533733 may also contribute to Sjögren's pathogenesis by modulating autophagy in the salivary gland.
Interpretations of our bioinformatic analyses must consider the following limitations: (1) leveraged eQTL databases do not always report directionality, which limits the ability to interpret whether an implicated Sjögren-SNP increases or decreases gene expression; (2) the promoter-capture Hi-C data used for TAD mapping were obtained from quiescent cell types. Cellular microenvironments, including the presence/absence of inflammatory cytokines, can modulate gene expression in several ways, including altered transcription factor binding, regulatory element activity, and TADs between regulatory elements and target gene promoters [28][29][30] . Therefore, genes in the expanded gene list that have reported eQTLs, but lack cell-specific chromatin interactions, should be further examined in the context of specific stimuli or environments.
In conclusion, we performed a GWAS of Sjögren's of European ancestry, then leveraged fine mapping and bioinformatic databases to determine the functional potential of GWS Sjögren-SNPs. Evaluating the local regulatory networks of each region in the context of immune cell type-specific eQTLs revealed that most of the regions have broad regulatory networks collectively involving >40 genes up-and downstream of the index gene. Further, pathway analyses of genes within these expanded local regulatory networks have diverse functional potential and may influence the pathogenesis of disease by altering cellular functions that might not have been otherwise considered ( Supplementary  Fig. 13). While our study demonstrates how deep bioinformatic analyses can expand the utility of GWAS data, we recognize that future functional studies in specific immune cell and tissue types at single-cell resolution in the minor salivary gland and in different disease subphenotypes will be needed to understand whether the genes we have identified within the broad regulatory networks of the ten novel Sjögren's risk loci influence disease pathology.
All cases fulfilled the American-European Consensus Group (AECG) criteria for primary Sjögren's disease according to clinical evaluations performed within their respective cohort 83,84 . Anti-Ro autoantibody positivity was determined according to the approved study protocol of each respective cohort and reported as positive or negative by each respective cohort for this study. Written informed consent was obtained in accordance with the Institutional Review Boards of each respective cohort. All study protocols and informed consent documents from outside institutions were reviewed and approved by the OMRF Institutional Review Board. Then, this study was conducted in accordance with the OMRF Institutional Review Board approval.
Genotyping and QC. Ilumina Infinium Omni1 or Omni2.5 Genome-Wide Genotyping Array kits were used to genotype DS1 16 and DS4 19 . Illumina OmniExpress kit was used to genotype DS2, DS3, and DS5. DS6 was genotyped using the Illumina Global Screening Array (GSA) kit following Infinium chemistry. DS7 was genotyped using the ImmunoChip 1.0 array 16 . Strict quality control procedures were applied to each dataset: (i) subjects within well-defined cluster scatter plots; (ii) SNP having MAF > 1%; (iii) SNPs and samples each with call rate >95%; (iv) controls with Hardy-Weinberg proportion test with P > 0.001; (v) cases and controls with differential missingness (P > 0.001) were selected for downstream analysis. PLINK (v1.9) was used to merge the quality controlled DS1, DS2, DS3, DS4, DS5, and DS6 into a single merged dataset, then quality control procedures were applied to the merged genotyped dataset 85 .
Individual genotyped data were excluded from the merged genotyped dataset and DS7 if it had <95% call rate and excessive heterozygosity (>5 s.d. from the mean). PLINK (v1.9) was used to determine relatedness within the remaining samples using identity-by-descent (IBD) estimates 85 . One individual from each pair was removed if the proportion of the alleles that shared IBD was >0.4. Base pair positions were assigned according to the GRCh37/hg19 version of the human genome reference sequence. After quality control, 3,232 cases and 17,481 population controls in the merged genotyped dataset, and 619 cases and 6171 controls in DS7 were available for subsequent analyses (Fig. 1a; Supplementary Data 2).
Assessment of population stratification. Population substructure within the PI1 and DS7 datasets was determined using EIGENSTRAT [https://www.hsph.harvard. edu/alkes-price/software/] with 53,108 (merged genotyped dataset) or 16,596 (DS7) independent markers (r 2 < 0.20 between variants), respectively 86 89,90 . To be included in analyses, imputed variants had to meet or exceed the imputation quality score (INFO) > 0.5 and quality control criteria described above. Imputed variants from datasets DS1-DS6 were merged to create dataset PI1 (Fig. 1a) and subjected to quality control criteria described above. Whole-genome imputation increased the number of SNPs tested for single-marker trait association from 101,574 to 6,257,359.
Statistical analysis. Logistical regression models were computed using PLINK to test for single marker SNP-Sjögren's association in the post-imputation PI1 dataset 85 . The additive genetic model was calculated for the 22 autosomal chromosomes, while adjusting for the first four principal components. PC1-PC4 accounted for >80% of the variation in the dataset after quality control.
To analyze imputed SNPs on the X chromosome, two separate logistical regression analyses were performed, one for each sex, using the same qualitycontrolled samples from the analysis of the autosomal genome and adjusting for the first four principal components. A dosage compensation model was used to account for inactivation of chromosome X, thereby assigning the SNPs as 0 or 2 (i.e., males were treated as homozygotes for the present allele). Logistic results of both sexes were then meta-analyzed to determine the common effect of chromosome X variants for the disease using a weighted Z-score in METAL [http:// csg.sph.umich.edu/abecasis/metal/] 91 .
Results from the logistic regression analyses of the novel GWS regions (P < 5 × 10 −8 ), 74 suggestive GWS regions (P < 5 × 10 −5 ), and previously established regions were meta-analyzed with DS7 data using a fixed-effect model in METAL by weighing the SNP effect by sample size 91 . Cochran's Q test statistic and I 2 index were both used to test for meta-analysis heterogeneity 92,93 . LD and probable haplotype blocks were determined using the solid spine of LD algorithms in the HAPLOVIEW software v4.2 [https://www.broadinstitute.org/haploview/ haploview], and a threshold of r 2 > 0.8 94 .
Polygenic risk score calculation. Genotyped individuals were randomly separated into a training dataset (2/3rd of the individuals) and testing dataset (1/3rd of the individuals) to calculate individual polygenic risk scores (PRS). PRSs were calculated for both the Sjögren-All and Sjögren-Ro + dataset (Fig. 1a). Anti-Ro autoantibody positivity was determined and reported by each respective cohort as described above. To correct for population stratification, PCA analyses were performed for each of the datasets separately, then the first three PCs were used as covariates in the PRS analysis. Data was pruned to remove highly correlated SNPs using independent pairwise analysis with a window size of 50 kb, step size or variant count of 5, and r 2 > 0.2 in PLINK (v1.9) 85,94 , then PRSice-2 v2.3.3 [https:// www.prsice.info/] 95 was used to calculate PRSs. After pruning, the Sjögren-All training dataset included 2166 cases and 11,638 population controls and testing dataset included 1076 cases and 5826 population controls. The Sjögren-Ro + testing dataset included 1100 cases and 11,544 population controls and training dataset included 618 cases and 5,894 population controls. In subsequent analyses, the HLA region (24-37 Mb) was dropped and PRSs were recalculated using PRSice-2. PRSs were generated at multiple P value thresholds (P T ) ranging from P = 0.001 to P = 1. The best-fit thresholds were used to predict Sjögren's status under logistic regression, while adjusting for the three PCs using the general linear model (glm) function in R 3.6.0. Distribution of PRSs among cases and controls were plotted using R. Area under the receiver operating characteristic curves (AUROC) were used to evaluate the accuracy of the two PRS models to distinguish the case from control status using the pROC package  97 . Analysis was limited to the Sjögrenassociated GWS SNPs imputed using HapMap3 as implemented in the LDSC software. Traits with a genetic correlation (rg) of p < 0.05 were considered as genetically correlated with Sjögren's.
Cells and tissue-specific epigenetic enrichment and Sjögren's heritability. Partitioned LD Score Regression (LDSC) 96 was performed to estimate enrichment of Sjögren's risk loci in cell-type-specific enhancer peaks (Fig. 1B). Cell type-and tissue-specific annotations were downloaded from EnhancerAtlas2 database [http://www.enhanceratlas.org/] 35 . LD scores were calculated based on Enhancer-Atlas2 custom annotation, linkage pattern derived from European samples from 1000 Genomes Project and the baseline model suggested by Finucane et al. 97 . Epigenomic enrichments of genetic variants were tested using GREGOR [http:// csg.sph.umich.edu/GREGOR/] 98 . Sjögren's variants with P < 5 × 10 −8 were tested for the enrichment in 4035 genomic features. The saddle-point approximation was used to estimate the enrichment P value by comparing it to the distribution of permuted statistics 97 . The enrichment was considered significant if the enrichment P value was less than the Bonferroni-corrected threshold of 1.23 × 10 −5 (nominal P = 0.05 of 4035 tested genomic features). Imputed narrow peak data from the Roadmap Epigenomics Project was used for this purpose 33,34 .
Transcription factor enrichment in Sjögren's risk loci. Regulatory Element Locus Intersection (RELI) algorithm 37 was used to identify transcription factors intersecting with Sjögren's risk loci (Fig. 1b). Sjögren's variants with P < 5 × 10 −8 were used as input. RELI picks the most significantly associated SNP in each risk loci as an anchor SNP for calculating the LD blocks using sequencing data from the 1000 Genomes Project [http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/]. Variants with r 2 > 0.8 with the top significant SNPs were prepared as input for RELI. RELI was run with the default parameters to calculate the z-score and P-value for each transcription factor, and Bonferroni corrected to obtain the final P-value.
Fine-mapping. To identify the independent associations at each of the seven novel GWS loci (P < 5 × 10 −8 ), logistic regression was performed while adjusting for the most significant variants (i.e. conditional analysis; P < 0.0001). LocusZoom [http:// locuszoom.org/genform.php?type=yourdata] was used to plot logistic regression results for each independent Sjögren-associated region 99 . Bayesian analyses were used to further refine the association signal in each GWS region. Posterior probability (PP) estimates were calculated for each SNP in the GWS region as if the SNP was a true putative casual variant. Then, a credible set of SNPs (PP > 95%) was defined using the Trinculo software package [https://sourceforge.net/projects/ trinculo/files/] 100,101 .
Fine-mapping was also performed separately using EcholocatoR R package [https://github.com/RajLabMSSM/echolocatoR] 44 that uses SuSiE and PolyFun +SuSiE and summary statistics to analyse SNPs from a 1 Mb window centered on each designated index SNP (i.e., ±0.5 Mb flanking the SNP). Both approaches used the Bayesian model and provided posterior probability (PP) to each SNP on a scale from 0 to 1 and defined the credible sets based on PP ≥ 0.95. In both models, the maximum number of casual SNPs was set to 5. Enrichment of GWAS SNPs was performed using the following annotations: (1) cell and tissue-type specific regulatory chromatin states from Roadmap Epigenomics Consortium, (2) ENCODE transcription factor binding sites in different cell types, and (3) regulatory chromatin states from DNAse hypersensitivity (Roadmap Epigenomics Consortium).
Functional annotation. SNPs in the 95% credible set were annotated for potential function using FUMA version v1.3.5e [http://fuma.ctglab.nl/] 102 . FUMA is an online platform for the functional mapping and annotation of genetic variants to define genomic risk loci and obtain functional information of the relevant variants in that locus. FUMA identifies SNPs that have a GWAS P value (P < 5 × 10 −8 ) and are not in LD (r 2 < 0.6). Then, it provides a list of all known SNPs, regardless of presence in the GWAS input, that are in LD (r 2 ≥ 0.6) with the independent SNPs for further annotation. Index SNPs were defined as independent SNPs with an r 2 < 0.1. Independent Sjögren-associated genomic risk loci were defined as regions that are both in LD with an index SNP and separated by >250 kb from another region of association. Hi-C data of GM12878 and eQTL from DICE database in FUMA was utilized for chromatin interaction mapping with a false discovery rate (FDR) cutoff of <1 × 10 −5 to define the significant interactions. FUMA performs chromatin interaction mapping, overlapping independent significant SNPs and SNPs in LD with one end of significantly interacting regions in user-defined tissues/cell types. Predicted enhancer regions were selected in any of the 111 tissues/ cell types from the Roadmap Epigenomics Project 34 and the promotor region (250 bp up and 500 bp downstream of transcription start site) as predicted by Roadmap Epigenomics Project 34 . Circos software in FUMA was used to visualize associated regional plots, genomic risk loci, chromatin interactions, and eQTLs mappings.
Pre-processed capture Hi-C data for the candidate functional variants from 14 immune cell types and the GM12878 cell line were obtained from the 3D Genome Browser [http://3dgenome.fsm.northwestern.edu/chic.php] 107 . Promotor and enhancer sites located 5 kb on either side of each SNP were queried and the start and end coordinates for each loop were obtained. Consensus loops for each SNP were defined, then the start and end coordinates of each consensus loop were visualized in the UCSC Genome Browser [http://genome.ucsc.edu/] to find nearby or overlapping genes.
Risk loci and IMPACT annotation. To have a better understanding of the link between the Sjögren's risk loci and transcription factor binding sites, the risk loci were annotated against~700 cell-type-specific active transcription factor binding sites from the IMPACT model (Fig. 1b) 108 . First, the total number of active transcription factor binding sites was calculated for each genomic position. A genomic position with an IMPACT score >0.5 was reported as an active transcription factor binding site. The candidate SNPs in Sjögren's risk loci against the total active binding sites were then annotated.
Functional pathway analysis. Candidate functional genes that coalesced with eQTLs and TAD interactions were selected for canonical pathway analysis and disease and function analysis using the Ingenuity Pathway Analysis Program (version: 60467501, Ingenuity System).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The genome-wide association summary statistics generated in this study have been deposited in the Databases of Genotypes and Phenotypes (dbGaP) under accession number: phs002723. v1.p1. Individual-level genotype data for 1199 subjects included in this study are available via dbGAP controlled access under accession number: phs002723.v1.p1. Individual-level genotype data for 10,850 subjects are available via dbGAP controlled access under accession numbers: phs000672.v1.p1 (n = 735), phs000428.v2.p2 (n = 8519), phs000196.v3.p1 (n = 995), phs000187.v1.p1 (n = 602). Remaining pre-existing individual-level genotype data were generated by coauthors and provided for specific use in this study and cannot be publicly shared per data use agreements. Contact the corresponding author (chris-lessard@omrf.org) with inquiries about accessing this pre-existing genotyping data. All other data presented in this study was previously published and can be accessed by: