POGLUT1, the putative effector gene driven by rs2293370 in primary biliary cholangitis susceptibility locus chromosome 3q13.33

Primary biliary cholangitis (PBC) is a chronic and cholestatic autoimmune liver disease caused by the destruction of intrahepatic small bile ducts. Our previous genome-wide association study (GWAS) identified six susceptibility loci for PBC. Here, in order to further elucidate the genetic architecture of PBC, a GWAS was performed on an additional independent sample set, then a genome-wide meta-analysis with our previous GWAS was performed based on a whole-genome single nucleotide polymorphism (SNP) imputation analysis of a total of 4,045 Japanese individuals (2,060 cases and 1,985 healthy controls). A susceptibility locus on chromosome 3q13.33 (including ARHGAP31, TMEM39A, POGLUT1, TIMMDC1, and CD80) was previously identified both in the European and Chinese populations and was replicated in the Japanese population (OR = 0.7241, P = 3.5 × 10−9). Subsequent in silico and in vitro functional analyses identified rs2293370, previously reported as the top-hit SNP in this locus in the European population, as the primary functional SNP. Moreover, e-QTL analysis indicated that the effector gene of rs2293370 was Protein O-Glucosyltransferase 1 (POGLUT1) (P = 3.4 × 10−8). This is the first study to demonstrate that POGLUT1 and not CD80 is the effector gene regulated by the primary functional SNP rs2293370, and that increased expression of POGLUT1 might be involved in the pathogenesis of PBC.

Thousands of genetic variations associated with susceptibility to human complex diseases have been identified by GWASs 18 . Among genes located near the "top-hit SNP" in each susceptibility locus, candidate genes with well-known functions are often selected as the "disease susceptibility genes". The majority of SNPs that regulate gene expression are actually found in the vicinity of genes within 100 kb of the transcription start site (TSS) 19 . However, trans-acting expression quantitative trait loci (e-QTL) variations, whose target transcripts are separated by arbitrary distances, are believed to explain a substantial proportion of the heritable variation in gene expression 20 . For example, although FTO was reported as a susceptibility gene for obesity, the effector genes whose expression levels were influenced by the significantly associated SNPs were not FTO but Iroquois Homeobox 3 (IRX3) and IRX5 21 . In PBC, a locus on chromosome 17q12-21 (ORMDL3-GSDMB-ZPBP2-IKZF3) has been reported as a shared susceptibility locus in different populations. Although the top-hit SNP was located in the IKAROS family zinc finger 3 (IKZF3), in which the function of the protein product is related to the proliferation and differentiation of B cells, the effector gene in this locus was identified as ORMDL sphingolipid biosynthesis regulator 3 (ORMDL3), whose protein product regulates endoplasmic reticulum (ER)-mediated Ca 2+ homeostasis and facilitates the unfolded-protein response (UPR) 22,23 . Therefore, understanding the contribution of susceptibility loci to the onset of diseases requires identification of the effector genes that are regulated by the primary functional variation located in the disease susceptibility loci.
The present study aimed to further elucidate the genetic architecture of PBC in the Japanese population. To this end, we performed a GWAS and subsequent genome-wide meta-analysis based on a whole-genome SNP imputation analysis with previous GWAS 16

Results
GWAS and genome-wide meta-analysis. We genotyped an independent set of 1,148 samples (668 PBC cases and 480 healthy controls) using the Affymetrix Japonica V1 Array 24 . Thirty-four samples were excluded by Dish QC (<0.82) or overall call rate for a total of 20,000 SNPs (<0.97) and 13 samples were excluded because of cryptic relatives. A further 13 samples were located far from the JPT cluster drawn using the first and second components after PCA and were removed from further analysis ( Supplementary Fig. 1A). We re-genotype called about 2,897 samples (1,392 PBC cases and 1,505 healthy controls) collected in the previous study 16 . Eighteen samples were excluded by Dish QC (<0.82) or overall call rate for a total of 20,000 SNPs (<0.97) and 15 samples were excluded because of cryptic relatives. Seventeen samples were located far from the HapMap JPT cluster drawn using the first and second components after PCA and were removed from further analysis ( Supplementary  Fig. 1B). A quantile-quantile plot of the distribution of test statistics for the comparison of allele frequencies in the PBC cases and healthy controls provided an inflation factor lambda value of 1.097 for all tested SNPs for the 1,148 entries in the current dataset and a value 1.061 for the 2,897 entries in the previous dataset ( Supplementary  Fig. 2). Genotype imputation and the association study were separately performed for the two datasets. The process of data cleaning and meta-analysis is summarized in Supplementary Fig. 3. Figure 1 shows a genome-wide view of the single-point association data based on allele frequencies after meta-analysis. The loci HLA, TNFSF15, IL7R, NFKB1/MANBA, and chromosome 17q12-21 showed significant associations with PBC, as reported in the previous GWAS performed on a Japanese population 16 . In addition to these regions, meta-analysis to combine the two datasets showed a significant association in chromosome 3q13.33 (Top hit SNP: rs57271503, OR = 0.7241, P = 3.5 × 10 −9 , Fig. 2), although this locus appears as evidence of no association with PBC from studies using each platform (Japonica and ASI, Supplementary Fig. 4).
Identification of rs2293370 as the primary functional SNP in chromosome 3q13.33. Among the 29 SNPs whose P values were less than 1.0 × 10 −6 upon genome-wide meta-analysis, SNPs located in the 3′-untranslated region (UTR) and synonymous substitutions were selected as potential candidates for primary functional variation in the chromosome 3q13.33 region (Table 2 and Fig. 2). Five of the 29 SNPs [rs57271503 and rs3830649 in the 3′UTR of CD80, rs2305249 in exon 11 of ARHGAP31 (P567P), rs1131265 in exon 3 of TIMMDC1 (V146V), and rs3732421 in the 3′UTR of TMEM39A] are located in the 3′UTR or synonymous substitutions but are unrelated to their own gene expression as determined by e-QTL analysis 25 ( Supplementary  Fig. 5) and thus were excluded from further analysis.
Two of the remaining 24 SNPs had RegulomeDB scores higher than 3 and these scores were supported by their location in DNase hyper-sensitivity clusters and the binding of transcription factor. Consequently, these two SNPs were selected as potential candidates 26 (Table 2; rs2293370 in intron 2 of TIMMDC1 and rs56008261 in intron 8 of ARHGAP31). Both SNPs were located in DNase I hyper-sensitivity clusters and in H3K27Ac markers in at least one cell type identified by the UCSC genome browser 27 .
We performed electrophoretic mobility shift assays (EMSAs) to evaluate the effect of candidate SNPs that potentially regulate the binding affinity of transcription factors. A difference in mobility shift between the major allele and the minor allele was detected for rs2293370 in HepG2 (Fig. 3A) and Jurkat ( Supplementary Fig. 6A) cells. The shifted band was abrogated by incubation with a 200× concentration of a non-labeled probe (competitor probe) ( Fig. 3B and Supplementary Fig. 6B). In contrast, there was no difference in mobility shift for rs56008261 between the major allele and the minor allele ( Fig. 3A and Supplementary Fig. 6A). Additionally, in order to deny the possibility of the existence of other variations with independent genetic contributions in chromosome 3q13.33, conditional logistic regression analysis was performed. When the rs2293370 was conditioned on, significant associations of other SNPs in chromosome 3q13.33 totally disappeared ( Supplementary Fig. 7).
These results indicated that rs2293370 is the primary functional SNP in chromosome 3q13.33.

Molecular features of disease susceptibility influenced by rs2293370. We performed luciferase
reporter assays in HepG2 and Jurkat cells to determine the differences in transcription efficiency between the C (major allele, PBC-susceptibility) and T (minor allele, PBC-protective) alleles of rs2293370. Concordant with the result of EMSA, the luciferase activity of cells 24 h after transfection with a reporter construct containing the C allele of rs2293370 was significantly higher than that of cells containing the T allele for both cell lines (Fig. 3C,D and Supplementary Fig. 6C). Next, we performed in silico prediction of transcription factor binding using the TRANSFAC database 28 to identify the transcription factor responsible for the mobility shift associated with the C allele of rs2293370. The C allele of rs2293370, but not the T allele, was predicted to be involved in a binding motif of Runt-related transcription factors (Supplementary Fig. 8). Although the DMRT and Myb families also showed similar patterns, they are not expressed in HepG2 or Jurkat cells 29 . Of the Runt-related transcription factors, Runt-related transcription factor (RUNX1) -1, but not RUNX-2 and RUNX-3, was confirmed to be expressed in both HepG2 and Jurkat cells 29 (Supplementary Fig. 9). Consistent with the in silico prediction of transcription factor binding, the mobility shift associated with the C allele of rs2293370 was reduced by pre-incubation with an anti-RUNX1 antibody prior to electrophoresis (Fig. 3E).
These results indicated that the PBC susceptibility allele of rs2293370 enhances transcription via binding with RUNX1.
The mRNA expression level of POGLUT1 is influenced by rs2293370. We used the GTEx portal database 25 to assess the influence of rs2293370 on endogenous gene expression by comparing the expression levels of all genes in the human genome for the different genotypes of rs2293370 in every organ whose expression level of POGLUT1 was above the threshold for detection. Individuals carrying the C allele (i.e., the PBC-susceptible allele) of rs2293370 showed a significantly higher level of expression of POGLUT1 in several organs ( Fig. 4; statistical significance level: P < 0.05/47 organs = 0.00106). Other genes located in chromosome 3q13.33 (ARHGAP31, TMEM39A, POGLUT1, TIMMDC1, and CD80) showed no significant association between rs2293370 genotypes and gene expression (Supplementary Fig. 10).

Discussion
In the present study we identified chromosome 3q13.33, which includes the genes ARHGAP31, TMEM39A, POGLUT1, TIMMDC1, and CD80, as a PBC susceptibility locus in the Japanese population by genome-wide meta-analysis based on whole-genome SNP imputation analysis of two distinct data sets of Japanese PBC-GWAS. The role of chromosome 3q13.33 had previously been identified in European and Chinese populations. Consequently, rs2293370, which is located in intron 2 of TIMMDC1, was identified as the primary functional SNP for disease susceptibility to PBC in chromosome 3q13.33 by in silico and in vitro functional analyses. In addition, the disease protective allele of rs2293370 was shown to disrupt a RUNX1 binding site and was associated with significantly decreased POGLUT1 mRNA expression levels in tissues compared with individuals without this allele. The contribution of POGLUT1 to the pathogenesis of PBC has not been reported to date. Endoplasmic reticulum (ER)-localized protein O-glucosyltransferase 1, which is encoded by POGLUT1, adds glucose moieties to serine residues of the epidermal growth factor (EGF)-like domains of Notch family proteins 30,31 . Notch signaling is an evolutionally conserved cascade that includes four receptors (Notch 1-4) and five ligands [Jagged 1, Jagged 2, Delta-like ligand 1 (DLL1), DLL3 and DLL4]. Therefore, it might be possible that genetic polymorphisms affecting the expression levels of POGLUT1 influence the Notch signaling pathway by altering Notch glycosylation. The generation and development of diverse blood cell lineages and peripheral immune responses are regulated by this Notch signaling cascade, especially in hematopoiesis during T cell lineage commitment and maturation in the thymus, and during marginal zone B (MZB) cell development in the spleen 32 . Recently, dendritic cell (DC) homeostasis and the development of several lymphocyte subsets belonging to the innate immune system have been reported to be regulated by Notch 32 . Therefore, inappropriate immune responses against self-antigens could occur due to impaired regulation of Notch signaling. In experimental autoimmune encephalomyelitis (EAE) and non-obese diabetic (NOD) mice, which are mouse models for multiple sclerosis and type 1 diabetes (T1D), respectively, disease progression was impeded by the administration of blocking antibodies against Notch receptors or DLL4 32 . Therefore, higher endogenous levels of POGLUT1 caused by the PBC-susceptible allele of rs2293370 may induce excessive Notch signaling and inappropriate immune responses against self-antigens. Very importantly, Notch signaling is also involved in the development or formation of intrahepatic bile ducts. Mutations in JAG1 or Notch2 are known causes of Alagille syndrome, an autosomal dominant disease characterized by congenital cholangiopathy with jaundice and bile duct paucity [33][34][35] . POGLUT1 was shown to regulate the number of bile ducts around portal veins in a JAG1-dependent manner using JAG1 +/− and POGLUT1 +/− mice 36 . These results indicate that POGLUT1 might be involved in the mechanism of bile duct loss in PBC. However, a limitation of this study was that endogenous POGLUT1 expression levels in PBC patients were not examined in this study. Additional studies are warranted to improve our understanding of the relationship between PBC pathogenesis and POGLUT1.
A more than 100-kb stretch of the genome is located in chromosome 3q13.33 that includes ARHGAP31, TMEM39A, POGLUT1, TIMMDC1, and CD80. In addition to the present study on the Japanese population, this  locus, as represented by top-hit SNP rs2293370, has been reported as a susceptibility region to PBC in European and Chinese populations 10,14,17 . The present study identified rs2293370 as the primary functional SNP by in silico and in vitro functional analysis. Therefore, POGLUT1 is likely the effector gene for susceptibility to PBC in not only the Japanese population but also other populations. This locus has also been reported as a susceptibility region for celiac disease, multiple sclerosis, systemic lupus erythematosus, and vitiligo, represented by rs11712165 in ARHGAP31, rs2293370 in TIMMDC1, and rs6804441 and rs148136154 in CD80, respectively [37][38][39][40] . The LDs were not strong between these represented tag-SNPs and rs2293370, which was identified as the primary functional SNP in this study (r 2 < 0.4 in 1000 genomes Asian, Supplementary Fig. 11). Regardless, rs2293370, whose effector gene is POGLUT1, may operate as a functional SNP in both PBC and multiple sclerosis. The present study, compared with the GWAS in the European descent, identified four out of 30 non-HLA loci (IL7R, NFKB1/MANBA, chromosome 17q12-21, and chromosome 3q13.33) as susceptibility loci in the Japanese population. Additionally, even though the p-values were below the genome-wide significance level (P < 5.0 × 10 −8 ), the direction of OR was the same between a European descent population and the Japanese population in MMEL1, STAT4/STAT1, IL12A, CXCR5/DDX6, and SPIB (Table 1). These loci are likely candidates as shared susceptibility loci for the pathogenesis of PBC between European and Japanese populations. We are currently working to clarify the shared gene profiles between different populations by conducting SNP imputation and subsequent meta-analysis using GWAS data from an international collaboration involving the UK, Italy, the USA, Canada, China and Japan.
In conclusion, genome-wide meta-analysis together with in silico/in vitro functional analyses identified the primary functional SNP rs2293370 and the effector gene POGLUT1. Chromosome 3q13.33 contains CD80, which encodes a well-known co-stimulatory signaling molecule necessary for antigen presentation from HLA class II to T cell receptor (TCR). CD80 had been assumed as a candidate effector gene in previous GWASs, whereas our approach identified POGLUT1 as the target transcript for disease susceptibility in this locus comprehensively without stereotypes. Of the PBC susceptibility genes identified in the Japanese population, we previously identified the primary functional SNPs of TNFSF15, PRKCB, and chromosome 17q12-21, and their effector genes 16,22,41 , as well as chromosome 3q13.33 in the present study. Similar post-GWAS approaches for susceptibility genes are needed to further clarify the molecular mechanisms of disease development.

Materials and Methods
Subjects. DNA  Genotyping, quality control, and genotype imputation. Genotyping was performed using the Japonica V1 array (1,148 Japanese individuals in the present study; Affymetrix Japan). Genotype calling was performed with the apt-probeset-genotype program in Affymetrix Power Tools ver. 1.18.2 (Thermo Fisher Scientific Inc., Waltham, MA). Sample quality control was conducted by following the manufacturer's recommendation (dish QC > 0.82 and sample call rate > 97%). Clustering of each SNP was evaluated by the Ps classification function in the SNPolisher package (version 1.5.2, Thermo Fisher Scientific Inc.). SNPs that were assigned "recommended" by the Ps classification function were used for downstream analyses. SNPs that satisfied the following criteria were used for genotype imputation: a call rate > 99.0%, a Hardy-Weinberg Equilibrium (HWE) p-value > 0.0001, and a minor allele frequency (MAF) > 0.5%. Pre-phasing was conducted with EAGLE v2.3.2 42 with default settings. Genotype imputation was conducted with IMPUTE4 v1.0 43 using a phased reference panel of 2,049 Japanese individuals from a prospective, general population cohort study performed by the Tohoku Medical Megabank Organization (ToMMo) 44,45 . These procedures were conducted using default settings. Cryptic relatives were excluded using PRIMUS 46 with default settings. In addition, principal component analysis (PCA) was performed using East Asian samples from the International 1000 Genome Project (104 JPT, 103 CHB, 93 CHS, 91 CDX, and 99 KHV samples) in addition to the case and control samples. PCA identified outliers to be excluded using the Smirnov-Grubbs test with a Bonferroni corrected p-value < 0.05. We had previously analyzed the data (2,897 Japanese individuals) using the Axiom Genome-Wide ASI 1 Array 16 and re-analyzed the data using the above-mentioned procedures.
Association analysis and meta-analysis. Association analysis was performed with PLINK (version 1.9) in each dataset (i.e., 2,897 ASI array data and 1,148 Japonica array data). The following options were used for PLINK: a call rate > 97.0%, a HWE p-value > 0.000001, a minor allele frequency (MAF) > 0.1%, and a logistic regression model. Meta-analysis was performed using PLINK with the meta-analysis option after excluding duplicates between the two datasets. The fixed-effects meta-analysis p-value was used.
Databases. The probability that a candidate functional variation might influence transcription regulation was evaluated using the RegulomeDB database (http://www.regulomedb.org/index) 26 and the UCSC genome browser (http://genome.ucsc.edu/index.html) 27 . Transcription factor binding was predicted using TRANSFAC Professional (QIAGEN, Valencia CA; http://www.gene-regulation.com/pub/databases.html) 28 . Gene expression levels in each cell line and the correlation between the genotypes of candidate SNPs and gene expression were examined using GeneCards (http://www.genecards.org/) and the GTEx portal database (http://gtexportal.org/ home/), respectively 25,29 . P values < 0.05, after adjustment for multiple testing (Bonferroni correction), were regarded as statistically significant.
Electrophoretic mobility shift assay (EMSA). EMSA was performed using a LightShift Chemiluminescent EMSA Kit (Thermo-Fisher Scientific) and biotin-labeled double-stranded oligonucleotide probes corresponding to each major and minor allele (Supplementary Table 2), according to the manufacturer's instructions. These oligonucleotide probes (10 fmol/μL) and a nuclear extract (2.5 μg/mL) of HepG2 or Jurkat cells (Nuclear Extract Kit; Active Motif, Carlsbad, CA) were incubated together for 30 min at 25 °C.
The super-shift assay was performed by incubating the biotin-labeled probe with the nuclear extracts for 30 min at 25 °C, before subsequently incubating these complexes with Anti-RUNX1/AML1 antibody -ChIP Grade (ab23980) (Abcam, Cambridge, UK) for 30 min at 25 °C.
Each assay was independently performed three times.
Luciferase reporter assay. Amplicons of part of the 2nd intron of TIMMDC1, which contain each allele of rs2293370, were obtained from human genomic DNA using specific PCR primers (Supplementary Table 3) and were then subcloned into the luciferase reporter pGL4.23 (luc2/minP) vector (Promega, Madison, WI). pGL4.23 constructs (500 ng) of each allele and 50 ng of the pGL4.74 (hRluc/TK) vector as an internal control were transfected into Jurkat and HepG2 cells using Lipofectamine 3000 (Thermo-Fisher Scientific). The Dual-Luciferase Reporter Assay system (Promega) was used to measure luciferase activity. Differences in relative luciferase activity were compared between the major and minor alleles of each SNP using Student's t-test. P values < 0.05 were regarded as statistically significant. Each figure shows representative data from experiments performed independently three times. The data in the figures represent the mean ± standard deviation of triplicate assays in a single experiment.