Genome-wide association study identified new susceptible genetic variants in HLA class I region for hepatitis B virus-related hepatocellular carcinoma

We have performed a genome-wide association study (GWAS) including 473 Japanese HBV (hepatitis B virus)-positive HCC (hepatocellular carcinoma) patients and 516 HBV carriers including chronic hepatitis and asymptomatic carrier individuals to identify new host genetic factors associated with HBV-derived HCC in Japanese and other East Asian populations. We identified 65 SNPs with P values < 10−4 located within the HLA class I region and three SNPs were genotyped in three independent population-based replication sets. Meta-analysis confirmed the association of the three SNPs (rs2523961: OR = 1.73, P = 7.50 × 10−12; rs1110446: OR = 1.79, P = 1.66 × 10−13; and rs3094137: OR = 1.73, P = 7.09 × 10−9). We then performed two-field HLA genotype imputation for six HLA loci using genotyping data to investigate the association between HLA alleles and HCC. HLA allele association testing revealed that HLA-A*33:03 (OR = 1.97, P = 4.58 × 10−4) was significantly associated with disease progression to HCC. Conditioning analysis of each of the three SNPs on the HLA class I region abolished the association of HLA-A*33:03 with disease progression to HCC. However, conditioning the HLA allele could not eliminate the association of the three SNPs, suggesting that additional genetic factors may exist in the HLA class I region.

investigate the association between HLA alleles and HCC. HLA allele association testing revealed that HLA-A * 33:03 (OR = 1.97, P = 4.58 × 10 −4 ) was significantly associated with disease progression to HCC. Conditioning analysis of each of the three SNPs on the HLA class I region abolished the association of HLA-A * 33:03 with disease progression to HCC. However, conditioning the HLA allele could not eliminate the association of the three SNPs, suggesting that additional genetic factors may exist in the HLA class I region.
Hepatitis B (HB) is a potentially life-threatening liver infection caused by hepatitis B virus (HBV), and approximately 248 million people worldwide are estimated to be chronically infected with HBV 1 . The clinical course of HBV infection is variable, including acute self-limiting infection, fulminant hepatic failure, inactive carrier state, and chronic hepatitis with progression to liver cirrhosis and hepatocellular carcinoma (HCC). Although some HBV carriers spontaneously eliminate the virus, every year 2-10% of individuals with chronic HB (CHB) develop liver cirrhosis, and a subset of these individuals suffer from liver failure or HCC 2 . Around 600,000 new HCC cases are diagnosed annually worldwide, and it is relatively common in Asia-Pacific countries and sub-Saharan Africa. More than 70% of HCC patients are diagnosed in Asia 3 . In contrast, HCC is relatively uncommon in the USA, Australia, and European countries 3,4 . The majority of HCC cases develop in patients with cirrhosis, which is most often attributable to chronic HBV infection followed by chronic hepatitis C virus infection in the Asia-Pacific region 5 .
Human leucocyte antigen (HLA) proteins present self and non-self peptides to T cell receptors (TCRs) to maintain self-tolerance and adapted immunity. The HLA region resides on the short arm of chromosome 6, designated as 6p21. 3. It is about 3.6 Mb in length and more than 200 functional and nonfunctional genes 6,7 are located in the region. The whole HLA region is divided into three subgroups, which are designated as class I, II, and III. The HLA class I region contains 19 HLA class I genes including 3 classical (HLA-A, -B, and -C), 3 non-classical (HLA-E, -F, and -G), and 12 non-coding genes or pseudogenes. The HLA class II region contains classical class II alpha-and beta-chain genes of HLA-DR, -DQ, and -DP. All HLA class I and class II molecules can present peptides to T cells, but each protein binds a different range of peptides. The presence of several different genes of each HLA class means that any one individual is equipped to present a much broader range of peptides than if only one HLA molecule of each class were expressed at the cell surface. A total of 17,695 HLA alleles (12,893 in class I and 4,802 in class II) were released by The IPD-IMGT/HLA database release 3.31.0 in January 2018 (https://www.ebi. ac.uk/ipd/imgt/hla/). Of the 12,893 class I alleles, 4,181, 4,950, and 3,685 alleles were registered in HLA-A, -B, and -C genes, respectively. Of 4,802 class II alleles, 2,146, 1,178, and 965 alleles were registered in HLA-DRB1, -DQB1, and -DPB1 genes, respectively.
Recent genome-wide association studies (GWAS) of chronic HBV carriers with or without HCC in Chinese populations reported that one SNP (rs17401966) in KIF1B, two SNPs (rs9272105 and rs455804) in HLA-DQA1/ DRB1 and GRIK1, and two SNPs (rs7574865 and rs9275319) in STAT4 and HLA-DQ were associated with disease progression to HCC [8][9][10] . A number of candidate genes have been investigated by genetic association studies to evaluate their roles in susceptibility to HCC. The findings from these studies, however, are inconclusive due to insufficient evidence and a lack of independent validation. All three papers referred to in this manuscript performed GWAS and replication studies using only Chinese population samples. For example, the study by Zhang et al. 10 used 2,310 cases and 1,789 controls of Chinese ancestry and identified one intronic SNP in KIF1B associated with HBV-related HCC. This result, however, was not replicated in several other populations 11,12 ). These findings suggest that GWAS and subsequent replication studies should be conducted in populations other than Chinese.
In this study, we performed GWAS using Japanese CHB patients with and without HCC and a replication study using East Asian populations including Japanese, Hong Kong Chinese, and Thai.

GWAS and replication study of HBV-related HCC.
We conducted a GWAS using samples from 473 Japanese HBV-positive HCC patients and 516 HBV carriers including CHB and asymptomatic carrier (ASC) individuals by analyzing 447,830 autosomal SNPs. Figure 1 shows a genome-wide view of the SNP association data based on allele frequencies. There were 110 SNPs with P values < 10 −4 in the GWAS (Supplementary Materials, Table S1). Of the 110 SNPs, 65 and 4 SNPs were located on the HLA class I and II regions, respectively. These results suggested that HBV-related HCC could be associated with SNPs located in the HLA region, although associations did not reach the genome-wide significance level. Outside the HLA region, there were 41 SNPs with P values < 10 −4 and 4 SNPs showed P values < 10 −5 .
In order to validate these suggestive associations, we selected seven SNPs based on the following criteria: P values < 10 −4 in the HLA region and <10 −5 outside the HLA region and only SNPs with the lowest P value or highest OR were selected when multiple SNPs showed strong LD. Three independent sets of HBV-related HCC cases, CHB and ASC controls (replication-1: Japanese 153 cases and 614 controls; replication-2: Hong Kong Chinese 94 cases and 187 controls; and replication-3: Thai 185 cases and 198 controls), and the original GWAS set of 989 Japanese samples (473 cases and 516 controls) were genotyped and used in a subsequent replication analysis. Of the seven SNPs, four (rs2523961, rs1110446, and rs3094137 located on HLA class I region, and rs2295119 located on HLA class II region) were validated, and consistent associations were observed between the original GWAS set and replication sets (Table 1). For these four SNPs, no heterogeneity of association was observed between the original GWAS samples and the replication samples. Two SNPs in the HLA region (rs2523961 and rs1110446) showed a genome-wide significant association (rs2523961: OR = 1.91, P = 6.42 × 10 −10 ; and rs1110446: OR = 1.93, P = 2.52 × 10 −10 ) using the combined Japanese samples (GWAS and replication-1) (Table 1). Moreover, the meta-analysis with the combined Japanese samples and two independent sample sets (Hong Kong Chinese and Association test for imputed HLA alleles. The two SNPs showing genome-wide significant associations were located on HLA class I region, and the marginally associated SNP was located on HLA class I and II region. To investigate the association of HLA alleles, we performed two-field HLA genotype imputation for six HLA loci (HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1) using 989 genome-wide genotyping data used for the GWAS. Imputed HLA alleles were filtered (Call Threshold < 0.5) before performing association analysis for each HLA locus. The results of association tests in HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1 alleles are shown in Table 2 and Supplementary Materials, Table S2. To avoid false-positive results due to multiple testing for 77 HLA alleles, significance levels were set at 0.000649 (=0.05/77). A protective effect of HLA-DPB1 * 02:01 (OR = 0.59, P = 5.23 × 10 −6 ) was observed as previously reported 13 . We also detected that HLA-A * 33:03 was significantly associated with disease progression to HCC (OR = 1.97, P = 4.58 × 10 −4 ) ( Table 2).
Conditioning each of the three SNPs on the HLA class I region (Supplementary Material, Fig. S1a-c) abolished the association of HLA-A * 33:03 (P > 0.05), but conditioning of A * 33:03 could not eliminate the association of the three SNPs (rs2523961: OR = 1.69, P = 7.06 × 10 −4 ; rs1110446: OR = 1.65, P = 9.33 × 10 −4 ; and rs3094137: OR = 1.54, P = 5.68 × 10 −3 ) (Fig. 2). These conditional analyses suggest that additional genetic factors other than HLA-A allele exist in the HLA class I region. In contrast to the class I region, conditional analysis controlling for the SNP rs2295119 using DPB1 * 02:01 allele suggests that DPB1 allele could abolish the association of rs2295119 on the HLA class II region (P > 0.05) (Supplementary Material, Fig. S1e).

Discussion
In the current GWAS, we found a marginal association between an SNP (rs2295119) located in the HLA-DPB region and HBV-related HCC. Moreover, the association analysis of HLA-DPB1 alleles and the conditional analysis with HLA-DPB1 * 02:01 suggested that DPB1 * 02:01 was the major protective allele in the HLA class II region. Recent GWAS also showed that SNPs located in the HLA class II region (HLA-DQA1/DRB1 9 and HLA-DQ 8  We also identified significant associations in the HLA class I region, especially around the HLA-A locus. The association test of imputed HLA alleles and conditional analyses with HLA-A * 33:03 suggested that HLA-A * 33:03 is the susceptibility allele for HCC. We performed additional conditional analyses controlling for the SNP on chromosome 6 using A * 33:03 and DPB1 * 02:01 alleles. This indicated that HLA-A and DPB1 alleles could abolish the association in the HLA class II region but were not sufficient to abolish the association in the HLA class I region ( Fig. 2 and Supplementary Material, Fig. S1f). Therefore, not only the HLA-A allele but also additional genetic factor(s) likely exist in the HLA class I region. There are several genes in this region including HLA-A, HCG9, HLA-J, HCG8, ZNRD1-AS1, ZNRD1, PPP1R11, RNF39, TRIM31, and TRIM40 (shown in Fig. 2). Although these genes include pseudogenes and poorly characterized genes, some are associated with various diseases. The zinc ribbon domain-containing 1 (ZNRD1) protein is associated with cell growth of gastric cancer cells 15 , angiogenesis of leukemia cells 16 , and HIV-1/AIDS disease progression 17,18 . In addition, ZNRD1 knockdown inhibits the expression of HBV mRNA and promotes the proliferation of HepG2.2.15 cells 19 , suggesting that ZNRD1 is one of the possible additional genetic factors at the HLA class I region. The tripartite motif-containing 31 (TRIM31) protein is essential for promoting lipopolysaccharide-induced Atg5/Atg7-independent autophagy 20 . Moreover, TRIM40 is downregulated in gastrointestinal carcinomas and chronic inflammatory lesions of the gastrointestinal tract 21 .
Non-self antigens, such as virus-infected cells and cancer cells, and HLA class I molecules are generally recognized by the TCRs on CD8+ T lymphocytes, resulting in T cell activation 22 . The activated T cells divide and some of their progeny differentiate into lymphocytes capable of killing cells (cytotoxic T lymphocytes: CTLs) displaying the same peptides (such as tumor-specific peptides) on their HLA class I molecules. These CTLs target tumor-specific antigenic peptides and eliminate them. In other words, CTLs cannot eliminate cancer cells without HLA class I molecules even if the person has tumor-specific peptides. Cancer cells therefore need to escape from the immune system for patients to be identified as having cancer.
In this study, we identified a significant association between HLA-A * 33:03 and HBV-related HCC. In addition to HLA-A * 33:03, previous studies and this study suggested that HLA-DR, -DQ, and -DP were associated with disease progression 8,9,13 . Functional analysis of HLA class I and II proteins could be an important step in determining the pathology of HBV-related HCC.  Table 2. Association analyses of HLA-A alleles.   Fig. S3), respectively. We finally used 473 cases and 516 controls for GWAS. A quantile-quantile plot of the distribution of test statistics for the comparison of genotype frequencies in the cases and controls showed that the inflation factor λ was 1.016 for all tested SNPs and was 1.009 when SNPs in the HLA region were excluded (Supplementary Material, Fig. S4). All cluster plots for SNPs with P values of <10 −4 were checked visually and SNPs with ambiguous genotype calls were excluded.
In the replication stage, we selected seven SNPs with P values of <10 −5 from the results of the chi-square test in the GWAS. A TaqMan SNP genotyping assay (Applied Biosystems, Foster City, CA, USA) was used to confirm the genotypes at each SNP. We genotyped 989 and 767 Japanese samples for the validation of the GWAS and for the replication study, respectively. We further genotyped 281 Hong Kong Chinese and 383 Thai samples for the replication study (Supplementary Materials, Table S3). ,   Table S3. For the GWAS and replication study, the chi-square test was applied to a two-by-two contingency table in the allele frequency model. Meta-analysis was performed using the DerSimonian-Laird method (random-effects model) in order to calculate the pooled OR and its 95% confidence interval. Fisher's exact test in a two-by-two contingency table was used to examine the association between HLA alleles and disease progression of HBV patients. To avoid false-positive results due to multiple testing, the resulting P-values were adjusted based on the number of observed alleles with frequencies ≥0.5% in cases and controls. Conditional logistic regression analysis was performed for SNPs and HLA alleles. This analysis was performed as implemented in Plink v1.07 software 24 , conditioning on HLA-A * 33:03 and DPB1 * 02:01 to each of the other SNPs. Other statistical analyses were performed using the SNP & Variation Suite 7 software (Golden Helix, Bozeman, MT, USA) and statistical software R v2.6. Manhattan plot of conditioning of each SNP or HLA allele was generated by LocusZoom 25 .

Statistical analysis. The characteristics of analyzed samples are shown in Supplementary Materials
HLA imputation. SNP data from 989 samples were extracted from extended MHC (xMHC) regions ranging from 25759242 bp to 33534827 bp based on hg19 position. Two-field HLA genotype imputation was performed for a total of six HLA class I and class II genes using the HIBAG R package 26,27 . For HLA-A,-B, -DRB1,  and -DPB1, a Japanese imputation reference 26 was used for HLA genotype imputation. For HLA-C, the HIBAG Asian reference 27 was used for HLA genotype imputation. We applied post-imputation quality control using call-threshold (CT > 0.5); the call rate of successfully imputed samples ranged from 88.7 to 98.5% for the six HLA classes. In total, we imputed 5,650 HLA genotypes in HLA class I and class II genes.