Identification of the functional variant driving ORMDL3 and GSDMB expression in human chromosome 17q12-21 in primary biliary cholangitis

Numerous genome-wide association studies (GWAS) have been performed to identify susceptibility genes to various human complex diseases. However, in many cases, neither a functional variant nor a disease susceptibility gene have been clarified. Here, we show an efficient approach for identification of a functional variant in a primary biliary cholangitis (PBC)-susceptible region, chromosome 17q12-21 (ORMDL3-GSDMB-ZPBP2-IKZF3). High-density association mapping was carried out based on SNP imputation analysis by using the whole-genome sequence data from a reference panel of 1,070 Japanese individuals (1KJPN), together with genotype data from our previous GWAS (PBC patients: n = 1,389; healthy controls: n = 1,508). Among 23 single nucleotide polymorphisms (SNPs) with P < 1.0 × 10−8, rs12946510 was identified as the functional variant that influences gene expression via alteration of Forkhead box protein O1 (FOXO1) binding affinity in vitro. Moreover, expression-quantitative trait locus (e-QTL) analyses showed that the PBC susceptibility allele of rs12946510 was significantly associated with lower endogenous expression of ORMDL3 and GSDMB in whole blood and spleen. This study not only identified the functional variant in chr.17q12-21 and its molecular mechanism through which it conferred susceptibility to PBC, but it also illustrated an efficient systematic approach for post-GWAS analysis that is applicable to other complex diseases.

This 200-kb stretch of the genome is located in a strong linkage disequilibrium (LD) block structure in chr.17q12-21 (D' > 0.9 and r 2 > 0.6, See Supplementary Fig. 1), that includes orosomucoid-like 3 (ORMDL3), gasdermin B (GSDMB), zona pellucida binding protein 2 (ZPBP2), and IKAROS family zinc finger 3 (IKZF3), and has been reported as one of the susceptibility gene regions that are associated with susceptibility to asthma, rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and inflammatory bowel disease (IBD) represented by rs2305480, rs2872507, rs1453560, and rs12946510, respectively, as well as PBC [17][18][19][20][21][22] . Among these genes, the function of IKZF3 and ORMDL3 gene products has been reported. IKZF3 encodes a member of the Ikaros family of zinc-finger proteins, and functions as a transcription factor that is important for the regulation of B lymphocyte proliferation and differentiation 23 . The protein product of ORMDL3 negatively regulates sphingolipid synthesis, regulates endoplasmic reticulum (ER)-mediated Ca 2+ signaling, and is related to T-lymphocyte activation [24][25][26] . However, mainly because of the existence of an extended LD in this gene region (See Supplementary Fig. 1), it remains unclear what functional variants in the chr.17q12-21 locus underlie disease susceptibility to PBC and what the molecular mechanisms of such functional variants might be.
In the present study, in order to identify a functional variant in an extended LD block of human chromosome 17q12-21 in PBC, we performed a high-density association mapping based on SNP imputation analysis using a whole-genome sequence reference panel of 1,070 Japanese individuals and our previous GWAS data regarding PBC in the Japanese population. We next carried out a combination of in silico and in vitro functional analyses in this extended LD gene region to identify a functional variant. Finally, we performed expression-quantitative trait locus (e-QTL) analyses to clarify the molecular mechanisms underlying disease susceptibility to PBC via the functional variant.
In silico functional analysis to select candidates for functional variants. From the total of 23 SNPs, SNPs that were predicted to be located in transcription regulatory elements, which were characterized by DNase hyper-sensitivity cluster analyses, SNPs that were predicted to be binding sites of transcription factors, and SNPs that showed significant associations in e-QTL analysis, were selected using the Regulome DB database (http:// www.regulomedb.org/index) 28 . Next, seven SNPs with Regulome DB scores higher than 2a (supported by the data of e-QTL and transcription factor binding/DNase peak) were selected as potential candidates (Table 1; rs9303277 and rs9909593 in intron 3 of IKZF3, rs10445308 in intron 6 of IKZF3, rs2313430 in intron 7 of IKZF3, rs7219923 in intron 1 of GSDMB, rs12150079 in intron 2 of ZPBP2, and rs12946510 in the intergenic region between GRB7 and IKZF3). Among these seven SNPs, two SNPs (Table 1, rs2313430 and rs12946510) that were located in transcription regulatory elements defined by (1) DNase I hyper-sensitivity cluster analysis of any one of 125 cell types and (2) by the existence of H3K27Ac markers in any one of the cell lines (GM12878, H1-hESC, HSMM, HUVEC, K562, NHEK, or NHLF) were selected as the final candidates using the UCSC genome browser (http://genome. ucsc.edu/index.html) 29 . In addition to these two SNPs, the two most significant SNPs in the high-density association mapping ( Table 1, rs9303277 and rs113897057), were also selected as final-candidate primary variants in the chr.17q12-21 locus for the conferral of susceptibility to PBC ( Fig. 1B and C).
In vitro functional analysis to identify the functional variant. To evaluate the effect of the final-candidate primary variants on binding affinities of transcription factors, biotin-labeled probes corresponding to the different alleles of each SNP were used in an electrophoretic mobility shift assay (EMSA) with nuclear extracts of the Jurkat (human T lymphocyte) and HepG2 (human liver carcinoma) cell lines. This assay detected Scientific RepoRts | 7: 2904 | DOI:10.1038/s41598-017-03067-3 a difference in mobility shift between the major and the minor (i.e., PBC-susceptible) alleles of rs12946510. No difference in mobility shift was detected for the other alleles tested (rs9303277, rs113897057, and rs2313430) ( Fig. 2A and B). The band shifted by the major allele (i.e., C-allele) of rs12946510 was specifically diminished by incubation with a 200× amount of a non-labeled competitor probe (Fig. 2C). Luciferase reporter assays were used to further determine differences in transcription efficiency between the major and the PBC susceptibility alleles of rs12946510. These assays were performed in the Jurkat, HepG2, and the human bile duct carcinoma, HuCCT1, cell lines ( Fig. 3A-C, and See Supplementary Fig. 3). In all cell lines, the luciferase activity of cells transfected with the reporter construct containing the PBC susceptibility allele of rs12946510 was significantly reduced to 70% of that of cells transfected with the reporter construct containing the major allele at 24 hours after transfection of the pGL4.23 vector. Similar results were obtained in the luciferase assay when a construct was used in which the inserted sequence containing rs12946510 was inserted in an opposite orientation (see Supplementary Fig. 4). These results indicated that rs12946510, which is located in the intergenic region between GRB7 and IKZF3, was the functional variant in the enhancer region in chr.17q12-21 which conferred susceptibility to PBC. Molecular mechanisms underlying the PBC susceptibility allele of rs12946510. Based on analysis using the TRANSFAC professional database it was predicted that the major allele of rs12946510 but not the PBC susceptibility allele constituted a Forkhead box protein O1 (FOXO1) binding motif ( Fig. 4A-C, http:// www.gene-regulation.com/pub/databases.html) 30 . In addition, the same alteration of FOXO1 binding between rs12946510 major and minor alleles was predicted by RegulomeDB. Therefore, the binding of FOXO1 with the major allele of rs12946510 was investigated in a super-shift assay using the nuclear extract of Jurkat. In accordance with the results of the predictions of transcription factor binding as described above, the shifted band caused by the major allele of rs12946510 was super-shifted by pre-incubation with an anti-FOXO1 specific antibody before electrophoresis (Fig. 4D). This result suggested that, in the PBC susceptibility allele of rs12946510, a primary FOXO1 binding site was disrupted. Thus, rs12946510 could function as the functional variant that regulates the efficiency of the FOXO1-related gene expression enhancer.
To assess the influences of rs12946510 on gene expression efficiency, the endogenous expression levels of all of the genes located in chromosome 17 were compared in whole blood and spleen among the genotypes of rs12946510 using the GTEx portal database (http://gtexportal.org/home/) 31 Supplementary Fig. 4A) or spleen (See Supplementary Fig. 5A).
Although the data of ZPBP2 in whole blood are not available, ZPBP2 expression levels did not show a significance difference among the genotypes of rs12946510 in the spleen (See Supplementary Fig. 5B). The expression levels of GRB7 and gasdermin A (GSDMA), which are located outside but close to the strong LD block structure in chr.17q12-21 did not show significance differences among the genotypes of rs12946510 in whole blood (See Supplementary Fig. 4B and C) or spleen (See Supplementary Fig. 5C and D). Additionally, chromatin interaction between the 5 kb window that contains rs12946510 and the upstream sequence of ORMDL3 and GSDMB could be detected in Hi-C database analysis (http://promoter.bx.psu.edu/hi-c/ index.html) (see Supplementary Fig. 7) 32 .
These results suggested that in individuals with the PBC-susceptible genotype of rs12946510, organs related with the immune response had reduced endogenous ORMDL3 and GSDMB expression levels.

Discussion
The current study, which included high-density association mapping based on SNP imputation analysis and in silico/in vitro functional analyses, identified rs12946510 as the functional variant in chr.17q12-21, a well-known PBC-susceptibility region in multiple ethnicities 8,[12][13][14]16 . Super-shift assay analysis demonstrated that the disease susceptibility allele of rs12946510 disrupted the primary FOXO1 binding motif. Additionally, the whole blood and spleen from individuals with the PBC susceptibility allele of rs12946510 showed significantly lower ORMDL3 and GSDMB expression levels than individuals without this allele. These results showed that the PBC susceptibility allele of rs12946510 disrupted the enhancer region for ORMDL3 and GSDMB gene expression (Fig. 6). There were 567 and 23 SNPs whose P values were less than 1.0 × 10 −3 and 1.0 × 10 −8 , respectively, that were located in chromosome 17. The latter 23 SNPs in chromosome 17 are all located in chr.17q12-21. Of these 23 SNPs, 2 SNPs (rs9303277 and rs113897057), which showed the most significant associations with susceptibility to PBC, and the candidate primary functional variants (rs2313430 and rs12946510) were chosen. These SNPs are located in a transcription regulatory element as assessed by the presence of a DNase I hyper-sensitivity cluster or H3K27Ac marks, and/or their location in a region that might modulate transcription factor binding. Chromatin interaction between the sequence that contains rs12946510 and the upstream sequence of ORMDL3 and GSDMB also supported this enhancer model (see Supplementary Fig. 7) 32 .
Recently, emerging methods of SNP imputation using a large scale reference panel have been used as an efficient approach for supplementation of GWAS data to identify disease-associated genetic factors 33 . In order to improve the accuracy of the results of SNP imputation analysis, large sample sizes are needed for both the GWAS and the whole-genome sequence reference panels 34 . Since the whole-genome sequencing of 104 JPT (Japanese living in Tokyo) by the 1000 Genome project (http://www.1000genomes.org/) showed that the proportion of the variants that were characteristic to the Japanese population was higher than that in other populations, even in East Asia 35 , it became clear that ethnicity-matched reference data of whole-genome sequences obtained from a large number of individuals was needed for appropriate SNP imputation analysis. We recently conducted an analysis of the whole-genome sequences of 1,070 Japanese individuals using high-coverage next-generation sequencing (32.4x on average) and constructed the largest Japanese population reference panel (1KJPN) 27 . By using this 1KJPN we were able to identify functional variants by high-density association mapping 16 .
Unlike gene expression promoter regions that are located upstream of the transcription start site, expression enhancer regions are located not only within or near the genes whose expression is regulated, but also at a distance of more than 100 kb from the regulated genes, as exemplified by the enhancer regions of mouse sox2 36 . In the present study, we clearly showed that, although rs12946510 was located between IKZF3 and GRB7, it affects the expression levels of other genes such as ORMDL3 and GSDMB that are located 120 kb downstream from the SNP. Thus, our approach consisting of high-density association mapping and in silico/in vitro analyses could overcome the difficulties of genetic analysis that are caused by the strong LD block structure.
ORMDL3 has been reported as one of the susceptibility genes to asthma (the most significantly associated SNP of ORMDL3 was rs7216389) 17,18 . Individuals with the asthma-susceptibility allele of rs7216389 showed a higher ORMDL3 expression level than those with an asthma-protective allele 17 . In contrast, the asthma-susceptibility allele of rs2872507 (located in chr.17q12-21, r 2 = 0.96 with rs7216389 in 1000 genomes panels in Asian subjects) was protective against autoimmune diseases such as rheumatoid arthritis, Crohn's disease, and ulcerative colitis 37 . In addition, other SNPs such as rs8076131 and rs4065275, which are located in ORMDL3 and are associated with the endogenous expression levels of ORMDL3, were recently reported to affect the levels of type-2 helper T cell (Th2) cytokines such as interleukin-4 (IL-4) and interleukin-13 (IL-13) in the supernatants of stimulated PBMCs 38 . This evidence indicated that the difference in ORMDL3 expression levels is directly related to the balance of Th subsets and Th cytokine levels and eventually confers disease susceptibility to asthma and other autoimmune diseases. In the present study, in accordance with these previous reports, the PBC-susceptibility allele of  (A and B) EMSA of each of the four candidate primary variants using biotin-labeled probes corresponding to the major alleles and the minor alleles, and nuclear extracts of Jurkat (human T lymphocyte) (A) and HepG2 (human liver carcinoma) (B) cells. Rs12946510 was the only variant to show a difference in mobility shift between the major allele (C-allele) and the PBC susceptibility allele (T-allele). (C) Competitor assay using a 200× amount of unlabeled probe corresponding to the major or the minor alleles. Three independent experiments were performed in each assay. rs12946510 showed a lower expression level of ORMDL3, which is possibly followed by lower expression levels of Th2 subset/cytokines and the predominance of other subsets/cytokines of Th in vivo, which are closely associated with the pathogenesis of PBC 39 . These observations indicated that ORMDL3 potentially has important protective effects against the development of PBC. Further evaluation of the expression levels of ORMDL3 in immunological organs derived from PBC patients will confirm our present hypothesis.
In conclusion, the in silico/in vitro functional analyses in the present study showed the molecular mechanisms through which rs12946510 caused reductions in the transcriptional efficiency of ORMDL3 and GSDMB, i.e. the transcription factor FOXO1 showed a possible protective effect against the development of PBC by binding to the expression enhancer region which includes rs12946510. These results indicated that ORMDL3, GSDMB, and FOXO1 have potentially important roles for protection against the development of PBC. This evidence suggested that the functional changes resulting from primary variants could be clarified by focusing on the functional effect of primary functional variants and that such an approach could be decisive for understanding the molecular mechanisms by which diseases develop. Other than the gene locus that was the focus of the present study, HLA, TNFSF15, POU2AF1, NFKB1-MANBA, PRKCB, and IL7R also displayed significant associations with susceptibility to PBC in the Japanese population 16 . Among these genes, we already reported the functional variants of TNFSF15 and PRKCB and their underlying molecular mechanisms 16,40 . A similar systematic approach using the methods exemplified in the present study and focusing on the other and/or the as yet un-identified susceptibility genes to PBC is needed to clarify the molecular mechanisms of disease development.

Materials and Methods
Subjects, guidelines, and regulations. Information regarding the study participants, (PBC patients and healthy controls), was provided in a previous study 16 . Written informed consent for participation in this study was obtained from all participants. This study was approved by the Research Ethics Committee, and the committee on genetically modified organisms, of the Graduate School of Medicine, The University of Tokyo. All methods were performed in accordance with the ethical guidelines and regulations. Genotype imputation. SNP genotypes of PBC cases and healthy controls that passed SNP filtering were phased with SHAPEIT2 (v2.r644) 41 . The following options were used for SHAPEIT2: -burn 10, -prune 10, and -main 25. Genotype imputation was performed on the phased genotypes with IMPUTE2 (ver. 2.3.1) 42 using a phased reference panel of 1,070 Japanese individuals from a prospective, general population 27 . For IMPUTE2, the following options were used: -Ne 2000, -k_hap 1000, -k 120, -burnin 15, and -iter 50.

Electrophoretic mobility shift assay (EMSA). The LightShift Chemiluminescent EMSA Kit
(Thermo-Fisher Scientific, Waltham, MA) was used to perform EMSA as per the manufacturer's instructions. Biotin-labeled double-stranded oligonucleotide probes corresponding to each major and minor allele, whose sequences are provided in Supplementary Table 1, were used for this assay. These biotin-labeled probes (10 fmol/ μl) were incubated with a nuclear extract (2.5 μg/ml) of Jurkat or HepG2 cells (Nuclear Extract Kit; Active Region, Carlsbad, CA) for 30 min at 25 °C.
The super-shift experiments were performed by incubating goat anti-human FOXO1 (FKHR) antibody (2 μg/ μl; C-20 X; Santa Cruz Biotechnologies, Santa Cruz, CA) for 30 min at 25 °C with the mixture of the nuclear extract incubated with the biotin-labeled probe. Each assay was performed independently, three times.
Luciferase reporter assay. For the luciferase assays, specific PCR primers (see Supplementary Table 2) were used to amplify part of the intergenic region between IKZF3 and GRB7 from human genomic DNA, which was then subcloned into the luciferase reporter pGL4.23 (luc2/minP) vector (Promega, Madison, Wis). pGL4.23   (500 ng) and the pGL4.74 (hRluc/TK) vector (50 ng) that was used as an internal control to account for cell numbers were transfected into Jurkat, HepG2, or HuCCT1 cells using Lipofectamine 3000 reagents (Thermo-Fisher Scientific). The Dual-Luciferase Reporter Assay system (Promega, Madison, WI) was used to measure luciferase activities. Each figure shows representative data of experiments that were done independently three times. The data in the figures represent averages ± standard deviation of triplicate assays in one experiment. eQTL. The correlation between the rs12946510 genotype and expression of all of the genes in chromosome 17 was examined using data available from the database of the GTEx portal at the BROAD Institute (http://gtexportal.org/home/) 31 .
Hi-C. Chromatin interaction between the 5 kb window that contains rs12946510 and the upstream sequence of ORMDL3 and GSDMB was examined using data available from the Hi-C database (http://promoter.bx.psu.edu/ hi-c/index.html) at the ENCODE Consortium 32 .
Statistical analysis. Statistical significance of the difference in luciferase activity between major and PBC susceptibility alleles of rs12946510 was determined using Student's t-test. Multiple comparisons in e-QTL analysis were adjusted by the Bonferroni correction.