A genome-wide association study identiﬁes six novel risk loci for primary biliary cholangitis

1

P rimary biliary cholangitis (PBC) is an autoimmune disease characterized by chronic inflammation and destruction of intrahepatic bile ducts. It is often considered a model autoimmune disease because of its specific autoantibodies (the anti-mitochondrial antibodies) and distinctive bile duct pathology. PBC shows a strong genetic predisposition, with the highest concordance rate of all autoimmune diseases among identical twins 1

. Several genome-wide association studies (GWAS) and
Immunochip studies have revealed a nearly uniform level of genetic susceptibility among people (or populations) of different European ancestries [2][3][4][5][6][7] , showing significant association of more than 20 loci with PBC. However, a PBC GWAS in Japanese (including 487 PBC cases and 476 controls and replication in 787 cases and 615 controls) showed very different results identifying only TNFSF15 and POU2AF1 as the main susceptibility loci, neither of which was described in the European studies 8 . Recently, we performed a replication study of the 14 most significant single nucleotide polymorphisms (SNPs) identified in either or both of the European and Japanese cohorts, using 1,070 Han Chinese PBC cases and 1,198 healthy controls, and confirmed strong association of the TNFSF15, CD80 and 17q12 loci with the disease in Chinese patients, with P-value reaching GWA significance (Po5 Â 10 À 8 ) at TNFSF15 and CD80 loci 9 . These early results also ensured required samples for our subsequent GWAS design.
To identify additional genetic risk factors for PBC and further define differences in susceptibility to the disease between European and East Asian populations, we conducted a large case-control GWAS of 1,127 PBC subjects and 4,074 controls of self-declared Han Chinese origin with population-specific HumanOmniZhongHua-8 beadchip and verified the results by genotyping top SNPs in a separate cohort of 907 PBC cases and 2,127 controls. We confirmed eight previously identified PBC susceptibility loci and also identified variants in IL21, IL21R, CD28/CTLA4/ICOS, CD58, ARID3A and IL16 as novel PBC risk loci. Subsequent histochemical studies show enhanced expression of IL21 and IL21R in PBC livers. These findings significantly expand our understanding of the disease susceptibility and suggest IL21 signalling pathway, in addition to CD4 T-cell activation and T-cell co-stimulation, are critical components in the development of PBC.

Results
Genome-wide discovery analysis. The HumanOmniZhongHua-8 beadchip (v1.1) contains 894,956 markers covering common, intermediate and rare variations found within Chinese populations. Following stringent quality control measures, 776,516 autosomal SNPs were available across 1,122 cases and 4,036 geographically matched controls (Supplementary Tables 1-3; Supplementary Fig. 1). Principal-component analysis (PCA) did not demonstrate substantial stratification in our study population (l gc ¼ 1.029, and see Supplementary Fig. 2). The quantile-quantile plot of the case-control test showed a substantial excess of significant associations in the tail of the distribution even after removal of the major histocompatibility complex (MHC) region that includes human leukocyte antigen (HLA) genes ( Supplementary Fig. 3).
We identified 179 SNPs in the MHC region and 39 SNPs in 8 non-MHC loci exceeding genome-wide significance in the discovery stage ( Fig. 1; Supplementary Tables 4-7). The most significant association was found in the HLA-DRA locus. A SNP within HLA-DRPB1 was the second most significant locus in the MHC region. Two SNPs from the MHC region and 32 SNPs from 24 non-MHC loci with at least suggestive association (Po5 Â 10 À 6 ) were selected for replication studies (Supplementary Table 6). There were 907 PBC cases and 2,127 controls in the replication panels that confirmed association of the MHC region and 15 non-MHC loci. Each of these loci in the combined analysis (2,029 PBC cases and 6,163 controls) also met genome-wide significance. These included 8 non-MHC risk loci previously identified in Caucasian and Japanese cohorts ( Supplementary Fig. 4) [2][3][4][5][6][7][8] . In addition, a previously identified risk locus, DDX6-CXCR5, did not replicate but still achieved genome-wide significance (P ¼ 2.56 Â 10 À 13 ) in the combined analysis (Table 1; Supplementary Fig. 4).
Among the non-MHC risk loci, the strongest association was observed in rs4979467, a SNP in the TNFSF15-TNFSF8 locus, the same as reported in the Japanese cohort 8 . This locus has been implicated in several autoimmune diseases 10 . Two previous studies have reported that two functional SNPs, rs6478108 and rs4979462, in the TNFSF15 promoter region are associated with TNFSF15 expression 11,12 . The rs4979467 SNP is in strong linkage disequilibrium (LD) with rs6478108 (r 2 ¼ 1) and rs4979462 (r 2 ¼ 0.68) in the Han Chinese population, implying that the functional variation in the TNFSF15 promoter can affect the genetic susceptibility to PBC. IL12A was found to be strongly associated with Han Chinese PBC cohorts, in contrast to both the Japanese GWAS and our previous candidate gene study 8,9 . On the basis of the dense finemapping study with the Immunochip, four independent associations were observed at the IL12A-SCHIP1 locus in European ancestry PBC cases 6 . The significant association (rs582537) in our GWAS corresponded to the third signal rs668998 in the European studies (Supplementary Table 8). This association also explained the discrepancy with the previous candidate gene study, in which selected SNPs represented only the first signal at the IL12A locus.
Discovery and validation of six novel PBC risk loci. Six novel risk loci associated with PBC were identified and validated in the current study (Table 2; Fig. 2). Both IL21R and IL21 were strongly associated in the Han Chinese PBC cohorts. Notably, several recent studies have found increased production of IL21 from T follicular helper (Tfh) cells in PBC patients, which might mediate B-cell maturation and autoantibody secretion 13,14 . Multiple SNPs in the IL21 and IL21R loci were significantly associated with PBC (Supplementary Table 5). We identified two independent signals represented by variants rs925550 and rs17005934 in the IL21 locus using stepwise conditional regression analysis with the two SNPs (Supplementary Table 9). These two signals were separated in the Han Chinese population as a result of a major recombination event (Fig. 2c). A much weaker association (rs108507092) observed in the nearby IL2 region was presumably due to linkage disequilibrium (r 2 ¼ 0.64) with rs17005934.
The most significant SNP, rs2189521, at the IL21R locus is located in the 5 0 UTR of IL21R, indicating its potential involvement in regulating differential IL21R expression. IL4R is located 27 kb upstream from IL21R. To analyse if the significance came from the contribution of IL4R, we performed LD analysis with the Han Chinese 1000 genome sequencing data. The results indicate that IL21R and IL4R are in two separate LD blocks in the Han population (Fig. 2a). None of the significant SNPs observed at the IL21R locus are linked to IL4R, excluding the association of IL4R with Han PBC cohorts. The variant rs2189521 is an expression quantitative trait locus (eQTL), in strong LD (r 2 ¼ 0.98-1) with multiple SNPs in the IL21R region, suggesting   Tables 10 and 11). We also note that in a recent study of European PBC the genomic region containing IL21R and IL4R while not achieving significant association did show suggestive association (P ¼ 1.6 Â 10 À 7 ) (ref. 5).
Multiple SNPs across the CD28-CTLA4-ICOS locus reaching genome-wide significance were found at the discovery stage (Supplementary Table 5) and further confirmed by the replication study. On the basis of the functional role of all the three genes in T-cell costimulatory pathways, the CD28-CTLA4-ICOS locus has long been suggested by many candidate gene studies as a PBC risk locus, but large-scale GWAS and dense fine-mapping studies in European PBC cohorts have not revealed this association 2-7 . An earlier study in a small Chinese PBC cohort also suggested association of the CTLA4 variant rs231775 with PBC 15 . Our GWAS study is the first to report the genome-wide significance of the CD28-CTLA4-ICOS locus with PBC. The most significant signal was obtained for the intergenic variant rs4675369. Haplotype analysis using the Han Chinese genome sequencing data identified a major recombination breakpoint between CD28 and CTLA4. In addition, rs4675369, together with five other highly linked SNPs (r 2 40.9), was located in the LD block    containing CD28, indicating that the observed significance may be associated with CD28, rather than CTLA4 or ICOS (Fig. 2b). Association of the CD28-CTLA4-ICOS locus with primary sclerosing cholangitis (PSC) was also reported previously in a Immunochip analysis 16 . However, the variant (rs7426056) associated with PSC is not in LD with any significant variants identified in our study We identified significant association of the CD58 locus with PBC. The SNP showing the strongest association, rs2300747 has been previously associated with multiple sclerosis (MS) and rheumatoid arthritis (RA) in large-scale GWAS and replication studies 17,18 . The 1000 genome sequencing data from the Han Chinese population indicated two major haplotype blocks in the CD58 gene. The first haplotype block includes part of CD58 intron-1, exon-1 and the 5 0 regulatory region and the nearby IGSF3 gene. The second haplotype block includes most of intron-1 and the remaining CD58 region (Fig. 2d). The variant rs2300747 and multiple SNPs in high LD (r 2 40.9) in the block are eQTLs in lymphoblastoid cell lines (LCLs) ( Supplementary  Tables 8 and 9). A previous study found that the A allele of rs2300747 was associated with lower expression of CD58 (ref. 19). In our PBC cohort, the A allele of rs2300747 is the risk allele, suggesting that decreased expression of CD58 levels may be related to PBC susceptibility.
Another significant association found in our study, rs10415976, is located in the intronic region of ARID3A, a B-cell-restricted transcription factor that activates immunoglobulin heavy chain gene transcription. During the discovery stage of GWAS, five SNPs at the ARID3A locus showed significant association, two of them reaching genome-wide significance. These two SNPs were further confirmed in the replication study. All of the significant SNPs, including SNPs in LD (r 2 40.8), were located in the intronic region and identified as cis eQTL loci ( Supplementary  Tables 10 and 11). ARID3A is the first B-cell-restricted transcription factor demonstrated to induce autoimmunity 20 . Transgenic mice constitutively expressing ARID3A in B lineage cells produce anti-nuclear antibodies. Systemic lupus erythematosus (SLE) patients show dramatically increased numbers of ARID3A þ B cells compared to healthy controls 21 .
Our study is the first to report that the ARID3A locus is associated with human autoimmune of inflammatory disease. Further studies should be aimed at investigating its functional role in the pathogenesis of PBC. In our GWAS discovery stage, seven SNPs in the IL16 region showed strong association (P ¼ 9.42-2.93 Â 10 À 6 ) with PBC. One of the variants, rs4778636, was selected in the replication study and reached genome-wide significance (P ¼ 8.99 Â 10 À 9 ). IL16 is a T-cell chemo-attractant factor and has been linked to diseases that are characterized by accumulation of CD4 þ cells at the site of inflammation, for example, asthma, atopic dermatitis, RA, MS and Crohn's disease 22 . However, the pathophysiological role of IL16 in PBC remains unclear. IL16 has been found to be constitutively expressed at the mRNA and protein level in monocytes, T lymphocytes, mast cells, eosinophils and dendritic cells. Several of the associated IL16 variants are cis eQTL in peripheral monocytes (Supplementary Tables 10 and 11). This is the first report that IL16 is a disease susceptibility locus with genome-wide significance.
There was also a novel locus CSNK2A2/CCDC113 that did not meet significance in the replication study after correction for multiple comparison, but was significant by genome-wide criteria (P ¼ 1.51 Â 10 À 8 ) in the combined analysis (Table 2).
Aberrant expression of IL21/IL21R in PBC liver lesion. To further explore whether deregulated IL21 and IL21R contribute to the pathogenesis of PBC, we performed histochemical analysis with liver biopsy samples. Significantly elevated levels of expression of IL21 and IL21R was observed in the livers of PBC patients, compared with those of chronic hepatitis B (CHB) patients, autoimmune liver hepatitis (AIH) patients or healthy controls (HC). IL21 was extensively expressed by inflamed hepatocytes and infiltrating inflammatory cells around the portal tracts in PBC patients (Fig. 3a-d), while IL21R þ cells were generally aggregated in the inflamed portal tracts, especially around the damaged interlobular bile ducts in PBC (Fig. 4a-d). Notably, the numbers of IL21 þ cells and IL21R þ cells were positively correlated with inflammation severity (r ¼ 0.45, Po0.05; r ¼ 0.43, Po0.05; respectively) and hepatic fibrosis stages (r ¼ 0.66, Po0.01; r ¼ 0.60, Po0.01; respectively) in PBC (Figs 3e-g and 4e-g). Therefore, there is marked activation of IL21/IL21R signalling in PBC patients, suggesting that IL21/IL21R interaction is an important component in the pathogenesis of PBC.

Discussion
Our PBC GWAS included 2,029 cases and 6,163 controls in a Han Chinese population and allowed us to perform comparisons with GWAS findings from European populations (Supplementary  Table 12). When three PBC GWAS in European ancestry cohorts from North America and Europe were published, they were hailed by researchers as not only a successful demonstration of the power of GWAS in deciphering complex diseases, but also a landmark achievement in our understanding of the pathogenesis of PBC, given the striking consistency in results among the three studies 23 . This notion was further reinforced later by two separate fine-mapping studies providing virtually identical results with PBC cohorts from Europe and North America 6,7 . These findings prompted some experts to proclaim that PBC probably does constitute a single disease entity across different populations 23 . However, a subsequent GWAS from a Japanese PBC cohort with divergent results, failed to identify any non-MHC loci with genome-wide significance during the discovery stage. The most significant non-MHC locus reaching genome-wide significance after combined analysis is TNFSF15, which is not associated in European PBC cohorts. Only three loci identified in the Caucasian cohorts were replicated in Japanese cohort after combined analysis 8 . Lack of a large sample size, as well as ethnic specific chips, may have contributed to these results 24 . Compared with the European and Japanese cohorts, the Han Chinese PBC cohort offers an advantage in genetic studies, since the overall genetic variation in the Han Chinese population is not as great as observed in the European and Japanese populations 25,26 . The low w 2 inflation prior to PCA in our study (l gc ¼ 1.029) further supports these previous observations. The low genetic variation may enhance the power of association studies in our Han Chinese PBC cohort because of reduced genetic heterogeneity.
GWAS analysis has achieved great success in our understanding of PBC pathogenesis. Multiple PBC risk loci identified in this study were also shared risk loci in other autoimmune diseases (Supplementary Table 13). But several unique features were observed in our studies. Our IL21 results in particular have important implications and this gene might be worth exploring as a therapeutic target. IL21 is predominantly produced by Tfh cells although high levels are also noted in Th 17 and natural killer (NK) cells. Tfh are characterized by expression of the cell surface marker CXCR5, encoded by the gene located at the PBC loci previously identified and strongly suggested by the current study 4 . Binding of IL21 with its ligand IL21R leads to activation of multiple downstream signalling molecules, including STAT1 and STAT3, which is important for proliferation and differentiation of T cells, B cells and NK cells. We also note that IL12 (both IL12A and IL12RB2 are associated with PBC) has been shown to enhances IL21 production and the frequency of IL21 þ CD4 T cells 27 .
Two genes (CD80 and TNFSF15) encoding T-cell costimulatory molecules had been previously identified as PBC risk loci and were confirmed in our study. Identification of two more genetic loci encoding CD28/CTLA4/ICOS and CD58 provided further evidence that deregulated costimulatory signalling confers predisposition to the disease.
Association of IL16 locus with PBC may have significant application in PBC treatment. A previous animal study found that neutralizing anti-IL16 mAb can prevent insulitis and type 1 diabetes in mice 28 . Neutralizing IL16 in PBC patients may represent a potential therapeutic approach.
In summary, this is the first report of a large scale GWAS of PBC in a Han Chinese population and has identified six new PBC susceptibility loci. Our study supports the hypothesis that the IL21 signalling pathway and Tfh cells are involved in PBC pathogenesis. Our results also draw attention to the importance of T-cell costimulatory signalling in PBC and advance our understanding of PBC as a disease of immune deregulation.

Methods
Patient and control selection and diagnosis. PBC patients were recruited with the approval of the research ethics boards of Southeast University and Jiaotong University and in accordance with the guidelines of the Declaration of Helsinki (2008). Informed consent was obtained from all subjects recruited. PBC diagnosis in this study was based on the criteria recommended by AASLD and the European Association for the Study of the Liver (EASL) 29,30 . PBC patients were recruited from two main sources, the Jiangsu Province PBC Collaboration Group (JSPPCG) and Renji Hospital of Jiaotong University. Patient blood samples from Renji Hospital were tested for PBC-specific autoantibodies with EUROLINE Profile Autoimmune Liver Diseases (DL1300-1601-5G). JSPPCG samples were collected from participating hospitals and PBC-specific autoantibodies were tested by different clinical laboratories. Samples from the JSPPCG collection were verified for autoantibody status with an in-house ELISA assay for anti-PDC-E2 and a 3E assay with a commercial ELISA kit (Shanghai Kexin Biotech, Shanghai, China). We confirm our study is compliant with the 'Guidance of the Ministry of Science and Technology (MOST) for the Review and Approval of Human Genetic Resources', which requires formal approval for the export of human genetic material or data from China.
The control samples used in the discovery stage were recruited from previous studies 31,32 . Healthy control samples used in replication stage were recruited in Nanjing, from the annual health check of staff members of Southeast University. All the control samples used in the replication study have been tested for AMA-M2 autoantibody using 3E-based ELISA assay. Only AMA-M2 negative samples were retained for study. To avoid potential population stratification, the vast majority of PBC patients and controls for GWA and replication studies originally came from Central China (Supplementary Fig. 1). All samples from both the discovery and replication stages were unrelated individuals of self-claimed Chinese Han descent. The demographic information of patients and controls were summarized in Supplementary Tables 1 and 2.
Quality control in the GWA discovery stage. The Illumina Huma-nOmniZhongHua-8 Beadchip (v1.1), a population-specific beadchip targeting common, intermediate, and rare variations found within Chinese populations, was used for both PBC and control samples. The Illumina beadchip scan of 1,127 PBC samples was performed by Genenenergy Bio-Technology, an Illumina CSProcertified service provider. Scanning of 4,074 control samples was conducted at the Key Laboratory of Dermatology at Anhui Medical University (Ministry of Education). PBC and control samples with overall call rates of o98% were excluded from further analysis. Unexpected duplicates or probable relatives were excluded based on pairwise identity-by-state comparisons using the 'PI_HAT' value in PLINK (all PI_HAT40.25) 33 . The remaining samples were subsequently assessed for population outlier and stratification using a PCA-based approach. PCA was performed using the smartpca software 33 . After filtering, 1,122 cases and 4,036 controls were retained for analysis. In the GWA stage, single-marker association analyses were performed using logistic regression with gender as a covariate. The Manhattan plot of À log10 P was generated using Haploview v4.2 (ref. 34). The Quantile-quantile (Q-Q) plot was used to assess the number and magnitude of observed associations between genotyped SNPs and PBC, compared with the association statistics expected under the null hypothesis of no association. The Q-Q plot was created using the R qq.plot function.
We performed systematic quality control on the raw genotyping data to filter out both unqualified samples and SNPs. After excluding mitochondrial SNPs and SNPs on sex chromosomes, the following criteria were applied to disqualify additional SNPs: SNPs with a call rate o98%, or with minor allele frequency (MAF)o0.01 in all samples or SNPs with genotype distributions that deviated from those expected by the Hardy-Weinberg equilibrium (Po1 Â 10 À 4 ) in the controls. For all the SNPs with Po1 Â 10 À 5 in the discovery stage, manual checking of SNP genotyping clustering in each dataset was performed and 54 SNPs were further excluded for the analysis. After quality control filtering, there were 776,516 SNPs remained in the discovery stage (Supplementary  Table 3).
SNP selection and genotyping in the replication stage. SNPs for the replication stage were selected using the following criteria: each locus with one SNP reaching Po1 Â 10 À 6 or with minimum three SNPs reaching Po5.0 Â 10 À 6 for discovery samples. A total of 22 non-MHC loci matched these criteria, except TNFRSF1A locus, which has three SNPs reaching Po6 Â 10 À 5 . The most significant SNPs in HLA-DRA locus (rs9268644) and DPB1 locus (rs9501251) were selected for validation. For all the known PBC loci published previously, one of the most significant SNPs was selected for validation. We also intentionally selected one SNP (rs1944918) in POU2AF1 locus, since this locus was reported in Japanese GWAS as the second most significant locus for PBC. For each loci newly discovered, two of most significant SNPs were selected for validation. Genotyping analyses of replication were conducted by the Sequenom MassARRAY system using the iPlex method 35 . All SNPs achieved a call rate 498% and had no deviation from HWE (Po0.05) in the controls.
Statistical analysis in the replication and combined stage. For the replication studies, 34 SNPs that passed quality control were analysed using logistic regression with gender as a covariate, assuming an additive allelic effect. The P-values adjusted by gender were reported without correction for multiple testing. Joint analysis of all combined samples of Han Chinese origin was conducted by both the random effects model (I 2 425%) and by the fixed-effect model (I 2 o25%). The P-value reported in Tables 1 and 2; Supplementary Table 6 was based the results by the fixed-effect model. The regional association plot was created using LocusZoom 36 . Imputation of selected loci was conducted using IMPUTE2 software version 2.2.2 and version 3 of 1000 Genomes Project data as the reference set 37 .
We used seeQTL (http://www.bios.unc.edu/research/genomic_software/ seeQTL/) and the University of Chicago eQTL browser (http://eqtl.uchicago.edu/ cgi-bin/gbrowse/eqtl/) to identify eQTLs amongst significant variants [38][39][40] . eQTLs that are in high linkage disequilibrium (r 2 40.8) with the most strongly associated SNP at the locus were extracted. Gene regulatory elements from the Encyclopedia of DNA Elements (ENCODE) database were annotated using HaploReg 41 . For each locus, SNPs in high linkage disequilibrium (r 2 40.8) with the most significantly associated SNP was assessed as to whether they lie within regions with promoter and enhancer histone marks, DNase-I hypersensitivity, protein binding or regulatory motifs in one or more of 147 cell types.
Immunohistochemistry. Paraffin-embedded liver tissues, derived from ultrasound-guided needle liver biopsies of 30 PBC patients, 30 AIH patients, 25 CHB patients and 6 controls, were studied. Sections were stained with either hematoxylin and eosin (H&E) or Masson and were independently reviewed 'blindly'. Inflammatory degrees and fibrotic stages were graded according to the Scheuer scoring system. The six healthy liver tissues were collected from donors whose livers were subsequently used for liver transplantation. Immuohistochemistry imaging of IL21 and IL21R staining of human liver tissues was performed on a Leica Bond system (Leica, Germany) using the standard protocol. The liver sections were pre-treated using heat-mediated antigen retrieval with sodium citrate buffer (pH 6, epitope retrieval solution 1) for 20 min followed by incubation with an anti-IL21 antibody (1:200 dilution, ab5978, Abcam, Cambridge, MA, USA) or an anti-IL21R antibody (1:150 dilution, ab13268, Abcam) for 20 min at room temperature and then detected using a horse radish peroxidase-conjugated compact polymer system. 3,3 0 -diaminobenzidinewas used as the chromogen 42 . The liver sections were then counterstained with hematoxylin. All the sections were visualized using light microscopy (Olympus, Japan), and five fields were randomly selected for each section. The numbers of IL21-or IL21R-positive cells were quantified at 40 Â 10 magnification. All the samples were analysed by a hepatic pathologist. All data are reported as mean ± standard error (s.e.). The Mann-Whitney U-test was used to evaluate differences in continuous variables. A P-value of o0.05 was considered statistically significant. Correlations were determined using the Spearman's correlation coefficient. All analyses were two-tailed and performed using Prism software Version 6.0 (Graphpad Software, La Jolla, CA, USA).
Data availability. The GWA SNP results are available upon request by contacting X.L. at xiangdongliu@seu.edu.cn. Any additional data (beyond those included in the main text and Supplementary Information) that support the findings of this study are also available from the corresponding author upon request.