Understanding of HLA-conferred susceptibility to chronic hepatitis B infection requires HLA genotyping-based association analysis

Associations of variants located in the HLA class II region with chronic hepatitis B (CHB) infection have been identified in Asian populations. Here, an HLA imputation method was applied to determine HLA alleles from genome-wide SNP typing data of 1,975 Japanese individuals (1,033 HBV patients and 942 healthy controls). Together with data from an additional 1,481 Japanese healthy controls, association tests were performed for six HLA loci (HLA-A, C, B, DRB1, DQB1, and DPB1). Although a SNP-based GWAS using data from the 1,975 Japanese individuals detected the strongest association at a SNP located in the HLA-DP locus, HLA genotyping-based analysis identified DQB1*06:01 as the allele with the strongest association, showing a greater association with CHB susceptibility (OR = 1.76, P = 6.57 × 10⁻¹⁸) than any of the five HLA-DPB1 alleles previously reported as CHB susceptibility alleles. Moreover, HLA haplotype analysis showed that, among the five previously reported HLA-DPB1 susceptibility and protective alleles, the associations of two DPB1 alleles (DPB1*09:01 and *04:01) arose from linkage disequilibrium with the HLA-DR-DQ haplotypes DRB1*15:02-DQB1*06:01 and DRB1*13:02-DQB1*06:04, respectively. The present study provides an example showing that SNP-based GWAS does not necessarily detect the primary susceptibility locus in the HLA region.

Hepatitis B virus (HBV) infection has spread worldwide, with an estimated 350 million chronically infected people. Some countries in Asia and Africa are known to be areas of high endemicity, where the prevalence of chronic hepatitis B (CHB) infection exceeds 8%. In Japan, an estimated 1.5 million people have chronic infection acquired through mother-to-child transmission, the reuse of syringes and needles, and sexual transmission. Previous genome-wide association studies (GWASs) have reported CHB susceptibility loci in Asian populations, including HLA-DP, HLA-DQ, EHMT2, TCF19, HLA-C, UBE2L3, CFB, NOTCH4, HLA-DOA, and CD40 [1][2][3][4][5]. Among these loci, associations between polymorphisms within the HLA-DP locus and CHB infection have been replicated in Asian and Arabian populations, including Japanese, Han Chinese, Korean, Thai and Saudi Arabian populations [6][7].
Previous reports revealed that polymorphisms within the HLA-DP and HLA-DQ loci were independently associated with CHB infection in the Japanese population [2][3]. HLA class II genes are highly polymorphic, meaning that many different variants (i.e. HLA alleles) are carried by different individuals within a population. Therefore, HLA genotyping-based association analysis is necessary to comprehensively understand the associations between HLA genes and CHB infection. No previous report has clearly analyzed the association of HLA genes with CHB infection. This is the first report to clearly show the associations of HLA class II genes with CHB infection, using the emerging method of HLA imputation. The findings in this paper will be essential for future analyses aimed at clarifying the mechanisms by which HLA class II molecules recognize HBV antigens.
To investigate the relationships of the HLA-DP variants (rs2395309 for HLA-DPA1 and rs9277496 for HLA-DPB1) and the HLA-DQB1 variant (rs9368737) with CHB susceptibility, we performed logistic regression analysis using the three associated SNPs as covariates. As previously reported, variants within the HLA-DP and HLA-DQ loci were independently and significantly associated with CHB susceptibility (Supplementary Table 1). When the three representative SNPs in the HLA-DP and HLA-DQ regions were used as covariates, many SNPs located around them showed weakened associations (Supplementary Fig. 2). These results indicate that SNPs in the HLA-DP and HLA-DQ regions are in strong linkage disequilibrium (LD) with each other.
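The conditional analysis described above adds the allele dosages of the covariate SNPs to a logistic model and asks whether the tested SNP remains associated. A minimal pure-Python sketch of that idea follows, assuming additive 0/1/2 dosage coding, a Newton-Raphson fit and a Wald test; the function names are hypothetical and this is not the software used in the study.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(X, y, beta):
    # Gradient of the log-likelihood and the Fisher information matrix.
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        p = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = p * (1.0 - p)
        for a in range(k):
            score[a] += (yi - p) * xi[a]
            for c in range(k):
                info[a][c] += w * xi[a] * xi[c]
    return score, info


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1*x1 + ...; returns (beta, se)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = _score_and_information(X, y, beta)
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: diagonal of the inverse information at the final estimate.
    _, info = _score_and_information(X, y, beta)
    k = len(beta)
    se = [math.sqrt(_solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
          for j in range(k)]
    return beta, se


def conditional_wald_test(test_dosage, covariate_dosages, y):
    """Log-OR and two-sided Wald P for the tested SNP, adjusted for covariate SNPs."""
    rows = [[t] + list(c) for t, c in zip(test_dosage, covariate_dosages)]
    beta, se = fit_logistic(rows, y)
    z = beta[1] / se[1]
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))
```

For example, passing the rs9277496 dosages as `test_dosage` and per-individual `(rs2395309, rs9368737)` dosage tuples as `covariate_dosages` would test rs9277496 conditional on the other two SNPs. Perfectly collinear covariates make the information matrix singular, which this sketch does not guard against.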
To clearly understand the associations of HLA genes with CHB infection, the natural next step is HLA genotyping, which determines the HLA alleles that behave as functionally distinct HLA allotypes. Here, instead of HLA genotyping, we performed statistical imputation of classical HLA alleles at six HLA loci (HLA-A, C, B, DRB1, DQB1, and DPB1) using the genome-wide SNP typing data of the 1,975 individuals, as in our previous report [8]. The call rates and imputation accuracies for the six HLA loci were evaluated in 417 Japanese healthy controls [9] whose HLA genotypes had been determined by a PCR sequence-specific oligonucleotide (PCR-SSO) method. When only samples with a posterior probability of 0.5 or more were considered, the call rates and imputation accuracies ranged from 98.1% to 100% and from 97.3% to 100%, respectively, across the six HLA loci (Supplementary Table 2 and Supplementary Table 3). These accuracies are higher than those in previous reports in Asian populations [10][11]. Although the HLA alleles were imputed with high accuracy, four HLA class I alleles showed a discordance rate of over 0.5% (five or more discordant alleles among the 417 HLA genotypes): HLA-A*24:20 (8 discordances), HLA-A*26:02 (5 discordances), HLA-C*03:04 (6 discordances), and HLA-C*08:03 (10 discordances). These four alleles were therefore excluded from the subsequent association analyses to avoid false positives due to imputation errors.
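The call rate, accuracy and per-allele discordance counts above can be computed for one locus as in the following sketch. It assumes that each imputed genotype comes with a single posterior probability, treats genotypes as unordered pairs, and attributes each discordance to the reference (PCR-SSO) allele that the imputation failed to reproduce; the study may have counted discordances differently, and the function names are hypothetical.

```python
from collections import Counter


def imputation_qc(imputed, reference, call_threshold=0.5):
    """Compare imputed HLA genotypes with PCR-SSO reference genotypes at one locus.

    imputed:   {sample: ((allele1, allele2), posterior_probability)}
    reference: {sample: (allele1, allele2)}
    Returns (call_rate, accuracy, discordances): accuracy is the fraction of
    correctly imputed alleles among called samples; discordances counts, per
    reference allele, how often that allele was not reproduced.
    """
    shared = [s for s in reference if s in imputed]
    called = [s for s in shared if imputed[s][1] >= call_threshold]
    matches = 0
    discordances = Counter()
    for s in called:
        truth = Counter(reference[s])       # unordered genotype as a multiset
        guess = Counter(imputed[s][0])
        matches += sum((truth & guess).values())
        discordances.update(truth - guess)  # reference alleles the imputation missed
    call_rate = len(called) / len(shared)
    accuracy = matches / (2 * len(called)) if called else float("nan")
    return call_rate, accuracy, discordances


def alleles_to_exclude(discordances, min_discordances=5):
    """Alleles with at least `min_discordances` discordant calls (hypothetical rule)."""
    return sorted(a for a, n in discordances.items() if n >= min_discordances)
```

Running `imputation_qc` on each locus and pooling the discordance counters would give the kind of per-allele list used to exclude the four class I alleles.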
Associations of HLA alleles at the six HLA loci with CHB susceptibility were tested using data from a total of 3,456 Japanese individuals: 1,975 individuals whose HLA genotypes were estimated by HLA imputation and 1,481 Japanese healthy individuals whose HLA genotypes were determined by the PCR-SSO method. After removing samples with missing data so that the OR of each HLA allele could be compared, HLA allele frequencies at the six HLA loci were compared between 805 HBV patients and 2,278 healthy controls (Supplementary Tables 4-9). A total of twenty alleles showed significant associations after the significance level was corrected by the total number of observed alleles (P < 0.05/144). Interestingly, the strongest association was observed for HLA-DQB1*06:01 (OR = 1.76; 95% CI = 1.55-2.01, P = 6.57 × 10⁻¹⁸), which showed a greater association with CHB susceptibility than any of the five HLA-DPB1 alleles previously reported as CHB susceptibility alleles.
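A sketch of how the per-allele allele-count tables and the corrected significance level could be assembled for one locus. It assumes that a genotype with any missing call is dropped (one possible reading of removing incomplete data) and that each individual contributes two alleles; the helper names are hypothetical.

```python
from collections import Counter


def allele_count_tables(case_genotypes, control_genotypes):
    """Build per-allele 2x2 allele-count tables for one HLA locus.

    Genotypes are (allele1, allele2) tuples; any genotype containing None
    (a missing call) is dropped first.  Returns
    {allele: (cases_with, cases_other, controls_with, controls_other)}.
    """
    cases = [g for g in case_genotypes if None not in g]
    controls = [g for g in control_genotypes if None not in g]
    case_counts = Counter(a for g in cases for a in g)
    control_counts = Counter(a for g in controls for a in g)
    n_case, n_control = 2 * len(cases), 2 * len(controls)  # two alleles per person
    tables = {}
    for allele in sorted(set(case_counts) | set(control_counts)):
        a, c = case_counts[allele], control_counts[allele]
        tables[allele] = (a, n_case - a, c, n_control - c)
    return tables


def bonferroni_threshold(n_alleles, alpha=0.05):
    """Per-test significance level after correcting for the number of observed alleles."""
    return alpha / n_alleles
```

Summing the number of table entries over all six loci gives the divisor; with 144 alleles the threshold is 0.05/144.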
Strong LD between DRB1 and DQB1 alleles, and weaker LD between DPB1 and DRB1-DQB1 alleles or haplotypes, have been reported in many populations [12][13][14]. Strong LD (r² and D′) between HLA class II alleles was also observed in the Japanese individuals studied here (Supplementary Table 10 and Supplementary Table 11). Haplotype frequencies for the six HLA loci, for the three HLA class I loci, and for the three HLA class II loci were estimated using the PHASE software and compared between HBV patients and healthy controls (Supplementary Table 12, Supplementary Table 13 and Table 1). Among the twenty-five HLA-A-C-B-DRB1-DQB1-DPB1 haplotypes with frequencies over 0.5% in either of the two groups (i.e. HBV patients and healthy controls), the most frequent haplotype showed the strongest association with CHB susceptibility (OR = 1.81; 95% CI = 1.47-2.22, P = 1.03 × 10⁻⁸ for HLA-A*24:02-C*12:02-B*52:01-DRB1*15:02-DQB1*06:01-DPB1*09:01). Because the estimated six-locus haplotypes were highly diverse, their subdivision into many low-frequency haplotypes may make a true association difficult to detect. Haplotype analysis of the HLA class I genes and the HLA class II genes yielded twenty-three and twenty-five haplotypes, respectively, with frequencies over 1.0% in either of the two groups. Among these, the haplotype harboring DQB1*06:01 was the most frequent in the studied individuals and was significantly associated with CHB susceptibility (OR = 1.91; 95% CI = 1.61-2.28, P = 1.13 × 10⁻¹³ for HLA-DRB1*15:02-DQB1*06:01-DPB1*09:01).
In the current study, SNP-based association tests replicated, in Japanese individuals, the significant association of variants located in the HLA class II region with CHB susceptibility. Although regression analysis with the associated variants as covariates showed that HLA-DQ and DP are independently associated with CHB susceptibility, further analysis at the level of HLA molecules is necessary to clarify the pathogenesis of HBV infection. To clearly understand the associations of HLA genes with CHB infection, HLA alleles were determined by the HLA imputation method using the genome-wide SNP typing data set. HLA class II alleles showed stronger associations with CHB susceptibility than HLA class I alleles. Interestingly, HLA-DQB1*06:01 showed the strongest association among all twenty associated alleles, stronger than any of the previously reported HLA-DPB1 alleles (i.e. DPB1*05:01 and *09:01 for susceptibility to CHB infection; DPB1*02:01, *04:01, and *04:02 for protection against CHB infection). Haplotype analysis of HLA class II genes identified seven haplotypes significantly associated with susceptibility to or protection against CHB infection (Table 1). Figure 1A,B summarize the associations of each allele and of the estimated HLA class II haplotypes with CHB susceptibility. A variety of haplotypes harboring DPB1*05:01 were observed. Of these, two haplotypes, DRB1*09:01-DQB1*03:03-DPB1*05:01 and DRB1*08:03-DQB1*06:01-DPB1*05:01, showed significant associations in the same direction (i.e. susceptibility to CHB infection). These results suggest that DPB1*05:01 may have a primary effect on CHB susceptibility regardless of the DRB1 and DQB1 alleles on the haplotype. The same can be said for haplotypes harboring DPB1*02:01 or *04:02, although no haplotype harboring DPB1*02:01 showed a significant association with CHB infection.
Associations of variants located in the HLA class II region with CHB susceptibility have been identified in several GWASs, including the present study. Although HLA-DR and DQ, which are known to be in strong LD with each other, and HLA-DP were independently associated with CHB susceptibility, SNP-based GWASs alone make it difficult to clearly understand the association of HLA genes with CHB susceptibility. In particular, the association of a specific SNP in the HLA region with CHB susceptibility may reflect the combined effects of several HLA alleles. Therefore, the emerging method of HLA imputation, which uses a genome-wide SNP typing data set, is considered an effective strategy for comprehensively understanding HLA-disease associations. Indeed, the present study showed that, among the five previously reported HLA-DPB1 susceptibility and protective alleles, three (DPB1*05:01, *02:01, and *04:02) had primary effects on CHB susceptibility, whereas the associations of the remaining two (DPB1*09:01 and *04:01) arose from LD with HLA-DR-DQ haplotypes (i.e. DRB1*15:02-DQB1*06:01 and DRB1*13:02-DQB1*06:04, respectively). These observations provide an example showing that SNP-based GWAS does not necessarily detect the primary susceptibility locus in this particular genomic region.
The disease-associated HLA alleles identified in this study may help to select patients who need continuous follow-up (i.e. patients carrying HLA alleles that confer susceptibility to CHB infection). In our results, the observed odds ratios were 1.91 for the susceptible DRB1-DQB1-DPB1 haplotype and 0.44 for the protective DRB1-DQB1-DPB1 haplotype. Although the impact of these disease-associated HLA alleles or haplotypes on clinical diagnosis is admittedly small, selecting individuals who carry the disease-associated HLA class II alleles may allow further analyses of new host factors other than HLA genes, viral factors, and clinical features to proceed more effectively.

Methods
Ethics approval. This study was approved by the Ethics Committee of The University of Tokyo and of all of […]

SNP genotyping and data cleaning. For the GWAS, we genotyped 1,975 samples (1,033 Japanese HBV patients and 942 Japanese healthy controls) using the Affymetrix Axiom Genome-Wide ASI 1 Array according to the manufacturer's instructions. All samples had an overall call rate of more than 96%; the average overall call rates for HBV patients and healthy controls were 99.45% (range 97.48-99.84%) and 99.31% (range 96.18-99.89%), respectively. We then applied the following SNP quality control thresholds during data cleaning: SNP call rate ≥ 95%, minor allele frequency ≥ 5% in both HBV patients and healthy controls, and Hardy-Weinberg equilibrium P-value ≥ 0.001 in healthy controls [17]. Of the autosomal SNPs, 424,157 passed the quality control filters and were used for the association analysis. Cluster plots for all SNPs with P < 0.0001 in a chi-square test under the allele frequency model were visually inspected, and SNPs with ambiguous genotype calls were excluded. Supplementary Fig. 1 shows the regional Manhattan plot of the HLA region (Chr6: 32,256,456-33,258,648, GRCh37/hg19).
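The SNP filters listed above can be sketched as follows. The Hardy-Weinberg test here is a 1-df chi-square goodness-of-fit test, and the SNP call rate is computed over patients and controls combined; both are assumptions, since the text does not state which HWE test or call-rate denominator was used. Function names are hypothetical.

```python
import math


def hwe_chi2_p(n_hom_ref, n_het, n_hom_alt):
    """Hardy-Weinberg P-value from a 1-df chi-square goodness-of-fit test."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def minor_allele_frequency(calls):
    """calls: alternate-allele dosages 0/1/2, None for a missing call."""
    g = [x for x in calls if x is not None]
    f = sum(g) / (2 * len(g))
    return min(f, 1.0 - f)


def passes_snp_qc(case_calls, control_calls,
                  min_call_rate=0.95, min_maf=0.05, min_hwe_p=0.001):
    all_calls = case_calls + control_calls
    call_rate = sum(x is not None for x in all_calls) / len(all_calls)
    if call_rate < min_call_rate:
        return False
    # MAF must reach the threshold in patients and in controls separately.
    if min(minor_allele_frequency(case_calls),
           minor_allele_frequency(control_calls)) < min_maf:
        return False
    ctrl = [x for x in control_calls if x is not None]
    return hwe_chi2_p(ctrl.count(0), ctrl.count(1), ctrl.count(2)) >= min_hwe_p
```

An exact HWE test would be preferable for rare variants, but with MAF ≥ 5% and large samples the chi-square approximation is usually adequate.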
HLA imputation. SNP data from the 1,975 samples were extracted from the extended MHC (xMHC) region, spanning positions 25,759,242 to 33,534,827 bp (hg19). We conducted 2-field HLA genotype imputation for six HLA class I and class II genes using the HIBAG R package [8][18]. For HLA-A, B, DRB1, DQB1 and DPB1, our in-house Japanese imputation reference [8] was used; for HLA-C, the HIBAG Asian reference [18] was used. We applied post-imputation quality control using a call threshold (CT > 0.5); the call rate of successfully imputed samples ranged from 98.1% to 100% across the six imputed HLA loci. Imputation quality was further assessed using the data of 417 healthy controls whose HLA genotypes were determined by the PCR-SSO method. In total, we imputed 148 HLA alleles of HLA class I and class II genes.
Scientific Reports | 6:24767 | DOI: 10.1038/srep24767

Haplotype estimation. Phased haplotypes of the six HLA loci were estimated using the PHASE program version 2.1 [19][20]. The estimated six-locus haplotypes were then used to estimate haplotypes of the three HLA class II loci (i.e., the phased six-locus data were collapsed onto the three class II loci).
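Collapsing the phased six-locus haplotypes onto the three class II loci amounts to counting sub-haplotypes, as in this sketch. It assumes one best-guess haplotype pair per individual from PHASE rather than posterior-weighted pairs; the locus order and function name are illustrative.

```python
from collections import Counter

LOCI = ("A", "C", "B", "DRB1", "DQB1", "DPB1")  # assumed column order


def collapse_haplotypes(haplotypes, keep=("DRB1", "DQB1", "DPB1")):
    """Collapse phased six-locus haplotypes onto a subset of loci.

    haplotypes: list of 6-tuples ordered as LOCI, two per individual.
    Returns {sub-haplotype: frequency among all haplotypes}.
    """
    idx = [LOCI.index(locus) for locus in keep]
    counts = Counter(tuple(h[i] for i in idx) for h in haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}
```

Passing `keep=("A", "C", "B")` would give the class I haplotype frequencies from the same phased data.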
Pairwise LD between HLA class II alleles. The pairwise LD parameters r² and D′ [21] between alleles at different HLA class II loci were calculated from haplotype frequencies estimated with the expectation-maximization (EM) algorithm [22]. Here, each HLA allele was treated as one allele of a bi-allelic locus, and all other HLA alleles at the same locus were pooled as the other allele. For example, DRB1*01:01 and all other DRB1 alleles were designated the "A allele" and the "B allele", respectively. Accordingly, the EM algorithm for estimating haplotype frequencies at two bi-allelic loci could be applied to any pair of HLA alleles at different loci.
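The bi-allelic recoding and EM step above can be sketched as follows: each individual is summarized by the number of copies of the focal allele at each locus, only double heterozygotes have ambiguous phase, and r² and D′ are computed from the resulting haplotype frequencies. Here "1" marks the focal allele and "0" the pooled other alleles; the function names are hypothetical, and D′ is returned signed (D/Dmax), whereas some tables report |D′|.

```python
def two_locus_em(genotypes, max_iter=1000, tol=1e-12):
    """EM estimate of haplotype frequencies for two bi-allelic loci.

    genotypes: list of (g1, g2), copies (0, 1 or 2) of the focal allele at
    locus 1 and locus 2 per individual.  Returns (p11, p10, p01, p00).
    """
    known = {"11": 0.0, "10": 0.0, "01": 0.0, "00": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: 11/00 or 10/01
            continue
        # If either locus is homozygous, pairing alleles in order gives the unique phase.
        locus1 = ["1"] * g1 + ["0"] * (2 - g1)
        locus2 = ["1"] * g2 + ["0"] * (2 - g2)
        for a1, a2 in zip(locus1, locus2):
            known[a1 + a2] += 1
    n_hap = 2 * len(genotypes)
    p = {h: 0.25 for h in known}
    for _ in range(max_iter):
        cis, trans = p["11"] * p["00"], p["10"] * p["01"]
        f = cis / (cis + trans) if cis + trans > 0 else 0.5  # E-step: P(phase 11/00)
        new = {  # M-step: expected haplotype counts divided by 2N
            "11": (known["11"] + n_double_het * f) / n_hap,
            "00": (known["00"] + n_double_het * f) / n_hap,
            "10": (known["10"] + n_double_het * (1 - f)) / n_hap,
            "01": (known["01"] + n_double_het * (1 - f)) / n_hap,
        }
        converged = max(abs(new[h] - p[h]) for h in p) < tol
        p = new
        if converged:
            break
    return p["11"], p["10"], p["01"], p["00"]


def ld_measures(p11, p10, p01, p00):
    """Return (r2, D') for the two focal alleles."""
    pa = p11 + p10  # focal allele frequency at locus 1
    pb = p11 + p01  # focal allele frequency at locus 2
    d = p11 - pa * pb
    r2 = d * d / (pa * (1 - pa) * pb * (1 - pb))
    d_max = min(pa * (1 - pb), (1 - pa) * pb) if d > 0 else min(pa * pb, (1 - pa) * (1 - pb))
    return r2, d / d_max
```

Repeating this for every pair of class II alleles at different loci would fill an r² or D′ matrix of the kind summarized in the supplementary tables.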

Association test. To assess the association of an HLA allele or haplotype with CHB infection, Pearson's chi-square test was applied to a two-by-two contingency table of allele or haplotype counts. Susceptibility to or resistance against CHB infection was evaluated based on the OR (OR > 1 and OR < 1 indicate susceptible and resistant alleles, respectively). To avoid false positives due to multiple testing of 144 HLA alleles, the significance level was set at 0.00035 (= 0.05/144).
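The test described above can be sketched as follows for a single allele. It assumes Pearson's chi-square without continuity correction and a Woolf (log-based) 95% confidence interval for the OR, neither of which is specified in the text; the function name is hypothetical. Tables with a zero cell would need a correction that this sketch does not apply.

```python
import math


def allele_association(a, b, c, d, n_tests=144, alpha=0.05):
    """2x2 allele-count table: a/b = focal/other alleles in patients,
    c/d = focal/other alleles in controls.  Returns (OR, 95% CI, P, significant)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # Pearson chi-square, 1 df, no continuity correction
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's method (assumed)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return odds_ratio, ci, p, p < alpha / n_tests  # Bonferroni: 0.05/144 by default
```

An allele is flagged as significant only when its P falls below 0.05/144 (about 0.00035), matching the threshold stated above.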