Introduction

Liver cancer is the second most common cause of cancer-related deaths worldwide and was estimated to cause nearly 746,000 deaths in 2012 (9.1% of all cancer-related deaths that year)1. Hepatocellular carcinoma (HCC) accounts for more than 90% of cases of primary liver cancer2 and chronic hepatitis B virus (HBV) infection is the leading cause of liver diseases evolving into liver cirrhosis and HCC3,4. Moreover, 60% of HCC is associated with HBV, whereas 20% is related to hepatitis C virus5 in Africa and Asia. The majority of all cases of HCC worldwide are found in the Asian Pacific region, and approximately 75% of liver cancer cases occur in Asia5.

HBV genotypes B and C are dominant in Asia, and genotype C plus core mutations in the HBV genome are associated with a higher risk of HCC than genotypes A, B, and D6,7,8. In addition, the double mutation in the basal core promoter (A1762T/G1764A) of HBV genotype C has commonly been found to be an independent risk factor for the development of HCC9,10,11. In 1998, Takahashi12 reported HCC-associated mutations in many parts of the HBV genome, including pre-S deletion, multi-site mutations in the core promoter, and a mutation at core protein aa 130. Mutations at C1653T and/or T1753V and A1762T/G1764A in Enhancer II/basal core promoter were also reported in 1999 to be associated with HCC when compared with other liver disease statuses13. Subsequently, many reports confirmed these mutations14,15. The combination mutation involving the double mutation at A1762T/G1764A and mutation at C1653T and/or T1753V14 has now been shown to be a risk factor for HCC occurrence16,17,18. Prompt anti-viral treatment is proposed for such patients to prevent HCC18.

The Kingdom of Cambodia is one of 37 countries located in the Western Pacific Region and has been reported to be highly endemic for HBV infection. In Cambodia, liver cancer is the second leading cause of cancer-related deaths and is responsible for 21.5 of every 100,000 deaths annually19. From 2010 to 2014, we conducted a pilot sero-epidemiological survey on hepatitis virus infection among the general population and elementary school students in Siem Reap province, Cambodia in cooperation with the Ministry of Health in Cambodia20,21,22. In our previous survey, we found that the prevalence of hepatitis B surface antigen (HBsAg) was 4.6% and that genotype C was dominant among adults in Cambodia21. One report showed that genotype C1 accounted for 66.7% of cases and that genotype B1 was identified in 12 isolates from Cambodia23.

In this study, we performed a genetic analysis of HBV carriers among Cambodian residents by full-length genomic sequencing to clarify the characteristics of HBV genomes and to predict the occurrence of HCC. We used 340 full genomes of genotype C1 registered in GenBank for comparison.

Results

Participants in Cambodia

In total, 626 participants (254 men and 372 women; age range: 7–90 years as of 2014; mean age: 38.3 ± 16.3 years) were recruited in our survey. The participants were from the general population of Chrey village (n = 333), Sasar Sdam commune (n = 55), Krabei Riel commune (n = 189), and Rohal village (n = 49)21.

Prevalence of HBV infection in 626 participants

Table 1 shows the age- and sex-specific seroprevalence of HBsAg, anti-HBs, and anti-HBc among the 626 participants. The prevalences of HBsAg, anti-HBs, and anti-HBc were 5.6% (95% CI: 3.8–7.4), 28.0% (95% CI: 24.4–31.5), and 35.3% (95% CI: 31.6–39.0), respectively. The prevalence of HBsAg was higher in men (7.9%; 95% CI: 4.6–11.2) than in women (4.0%; 95% CI: 2.0–6.0).
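The confidence intervals above can be illustrated with a short calculation. The sketch below assumes a normal-approximation (Wald) interval with z = 1.96; the text does not state which interval method was used, so this is an assumption, and the function name `wald_ci` is hypothetical. The count of 35 HBsAg-positive samples is taken from the Methods.

```python
import math


def wald_ci(events: int, n: int, z: float = 1.96):
    """Return (proportion, lower, upper) for a normal-approximation (Wald) interval.

    Assumption: z = 1.96 for a two-sided 95% interval; bounds are clipped to [0, 1].
    """
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# 35 HBsAg-positive samples among 626 participants
p, lower, upper = wald_ci(35, 626)
print(f"{100 * p:.1f}% ({100 * lower:.1f}-{100 * upper:.1f})")  # 5.6% (3.8-7.4)
```

Under this assumption the result agrees with the reported HBsAg prevalence to one decimal place. For small samples, intervals such as Wilson or exact binomial behave better than the Wald interval, which is worth keeping in mind for the subgroup estimates.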

Table 1 Age- and sex-specific prevalence of hepatitis B virus infection among 626 members of the general population in Siem Reap province, Cambodia.

Phylogenetic analysis and genotyping of 26 HBV infected residents in Cambodia

Phylogenetic tree analysis of the full genomes with the unweighted pair group method with arithmetic mean (UPGMA) showed that 24 of the 26 isolates belonged to genotype C1, and one each belonged to genotypes B2 and B4 (Fig. 1). In Fig. 1, a map of Asia is shown at the centre of the phylogenetic circle, and each isolate is shown in the colour of its country of origin.

Figure 1

Phylogenetic tree generated using the UPGMA method with 26 Cambodian isolates. The phylogenetic tree was constructed by the UPGMA method using the 26 HBV full genomes obtained in this study and HBV genotype B1–B9 and C1–C16 strains registered in GenBank. The analysis involved 416 complete nucleotide sequences. Evolutionary analyses were conducted using MEGA7. Our Cambodian samples are marked with stars. Other strains are coloured according to their location on the map of Southeast Asia at the centre of the phylogenetic circle.

Next, phylogenetic tree analysis with the neighbour-joining method was used to compare the 24 genotype C1 isolates with 340 HBV genotype C1 strains from many countries registered in GenBank. The sequences separated into more than ten clusters (Fig. 2), of which four were the main clusters (clusters a–d). The 24 Cambodian isolates fell into clusters a, c, and d. Cluster a was composed of strains from India and Southeast Asia (including Myanmar, Laos, Thailand, Malaysia, and Cambodia) and six isolates from this study. Cluster c was composed mainly of sixteen Cambodian isolates from this study and some strains from Laos, Thailand, and Malaysia, spread over a narrow area of Southeast Asia (see map in Fig. 2). Cluster d was primarily composed of strains from China and Hong Kong, with some strains from Thailand, Malaysia, Laos, and Vietnam, and two Cambodian isolates obtained in this study.

Figure 2

Phylogenetic tree generated using the neighbour-joining method with 24 genotype C1 Cambodian isolates from this study and 340 HBV C1 strains registered in GenBank. The analysis involved 364 HBV genotype C1 complete nucleotide sequences, which were separated into over 10 different clusters, but primarily into clusters a–d. The 24 isolates obtained here were categorized into three clusters (a, c, and d). Cluster a contained six Cambodian isolates that were close to strains from Myanmar, Laos, Thailand, and Malaysia, and cluster d contained two Cambodian isolates that were close to strains from China or Hong Kong. Cluster c was composed mainly of 16 Cambodian isolates and other strains from Laos, Thailand, and Malaysia, spread over a narrow area of Southeast Asia.

Profiles and mutations of HBV full-length genome sequencing of 26 HBV-infected residents in Cambodia

Of the 26 HBV carriers, 13 were men and 13 were women (13–70 years old) (Table 2). The mean age of the Cambodian cohort was 42 ± 14 years. Eighteen of the HBV carriers (69.2%) were residents of Chrey village, 7 (26.9%) were from Sasar Sdam commune, and 1 was from Krabei Riel commune. They were all positive for HBsAg and anti-HBc, but negative for anti-HBs.

Table 2 Profiles and mutations of 26 HBV DNA full-genome sequences.

Among the 26 isolates, the nucleotide lengths ranged from 3179 to 3216 bp (Table 2).

Many mutations were found in the 24 genotype C1 isolates. The double mutation at A1762T/G1764A was found in 18 of the genotype C1 isolates (18/24, 75.0%; Table 2). Combination mutation was observed in 14 isolates (14/24, 58.3%; 95% CI: 38.6–78.1). Of these, five had C1653T with A1762T/G1764A and nine had T1753V with A1762T/G1764A, so the combination of T1753V and A1762T/G1764A was the most frequent. No isolate had a solitary mutation at C1653T or T1753V.

Based on the phylogenetic analysis of genotype C1 (Fig. 2), among the 16 isolates from our study located in cluster c, 15 had the A1762T/G1764A mutation and 12 had combination mutation (Table 3). Mutation at G1613A was found in five isolates. Regarding the other mutations, four isolates had a pre-S deletion (4/24) and 11 isolates had a mutation at core P130 (11/24).

Table 3 Double mutation and combination mutation in 340 HBV genotype C1 strains retrieved from GenBank and 24 Cambodian isolates in this study.

For the isolates of genotype B2 and B4, only the genotype B4 isolate had a mutation at G1613A.

Seven of the 18 isolates that were negative for HBeAg had a stop codon at the precore region nt1896 (7/18, 38.9%). Two other isolates had a one base pair insertion at the precore region, which caused a frameshift in the precore protein.

Mutations and liver disease status among 340 genotype C1 genomes in GenBank

After filtering by repeatedly drawing phylogenetic trees with strains obtained by BLAST from NCBI, we finally extracted 340 genotype C1 strains from GenBank.

Based on the presence or absence of pre-S deletion, G1613A, C1653T, T1753V, A1762T/G1764A, the Pre-C W28 stop codon, and the core P130 mutation, these 340 genotype C1 strains were classified into 48 patterns. They were then evaluated based on liver status obtained from the registered information or published papers (Fig. 3). In patterns 1 to 34 in Fig. 3, double mutation at A1762T/G1764A was confirmed in 160 strains (160/340, 47.1%), almost half of the 340 strains (Table 3). Mutation patterns 1–23 in Fig. 3 represented combination mutation. In detail, patterns 1 to 9 represented combination mutation at C1653T and A1762T/G1764A (32/340, 9.4%), patterns 10 and 11 at C1653T or T1753V and A1762T/G1764A (4/340, 1.2%), and patterns 12 to 23 at T1753V and A1762T/G1764A (77/340, 22.6%). In patterns 24 to 32, 47 strains had double mutation at A1762T/G1764A without combination mutation (47/340, 13.8%). In patterns 33 to 48, single mutation at G1613A, C1653T, T1753V, or Core P130 was observed in 18, 2, 4 and 18 strains, respectively (Supplementary Table 1).

Figure 3

Mutations focused on the core promoter were classified into 47 patterns that were confirmed in 364 HBV genotype C1 full genomes, including the 24 isolates in this study. *Threonine (T)/Leucine (L), **Isoleucine (I)/Leucine (L)/Threonine (T), ***Histidine (H)/Isoleucine (I)/Glutamine (Q)/Threonine (T). ASC#: asymptomatic carriers, including blood donors, the general population, and occult hepatitis B infection; HIV + HBV: human immunodeficiency virus and HBV co-infected patients; CH: patients with chronic hepatitis; LC/HCC: patients with liver cirrhosis or hepatocellular carcinoma. The 340 registered HBV genotype C1 strains and the 24 isolates were analysed and classified by their mutations. The map of the whole region and genome is shown in the upper part of the figure. C1653T, T1753V, and A1762T/G1764A lie in Enhancer II and the basal core promoter. The classification is deliberately detailed to show the distribution. Patterns show the assortment of each mutation; pattern 47 means that the HBV genome has no mutations at the positions focused on in our study, and the pattern “others” means that the HBV genome has some mutations, but not at the focused positions.

In the 340 C1 strains retrieved from GenBank, we investigated the associations between liver disease status and the specific mutations (Table 3). The rates of double mutation at A1762T/G1764A in ASC, patients with chronic hepatitis, and patients with LC/HCC were 28.9% (28/97), 49.0% (73/149), and 100% (21/21), respectively. The rate of double mutation increased significantly with liver disease progression (among 3 groups, P < 0.001; post hoc pairwise ASC vs CH, P < 0.001; ASC vs LC/HCC, P < 0.001; CH vs LC/HCC, P < 0.001; Fig. 4). The rates of combination mutation in ASC, patients with chronic hepatitis, and patients with LC/HCC were 16.5% (16/97), 34.2% (51/149), and 95.2% (20/21), respectively. The rate of combination mutation at C1653T or T1753V and A1762T/G1764A also increased significantly with liver disease progression (among 3 groups, P < 0.001; post hoc pairwise ASC vs CH, P < 0.001; ASC vs LC/HCC, P < 0.001; CH vs LC/HCC, P < 0.001; Fig. 4).

Figure 4

The rates of double mutation at A1762T/G1764A and combination mutation at C1653T and/or T1753V and A1762T/G1764A by liver disease status in 340 genotype C1 strains and Cambodian isolates. *P < 0.001. The rate of double mutation at A1762T/G1764A increased significantly with liver disease progression (among 3 groups, P < 0.001; post hoc pairwise ASC vs CH, P < 0.001; ASC vs LC/HCC, P < 0.001; CH vs LC/HCC, P < 0.001). The rate of combination mutation at C1653T or T1753V and A1762T/G1764A also increased significantly with liver disease progression (among 3 groups, P < 0.001; post hoc pairwise ASC vs CH, P < 0.001; ASC vs LC/HCC, P < 0.001; CH vs LC/HCC, P < 0.001). The rates of both double mutation and combination mutation in the 24 Cambodian isolates were higher than those of ASC and CH in the 340 genotype C1 strains.

In patients with LC/HCC, the rates of Pre-S deletion, G1613A, and Core P130 mutation were 4.8% (1/21), 52.4% (11/21) and 38.1% (8/21), respectively. Strains with a single mutation at G1613A, C1653T, T1753V, or Core P130 were not found in patients with LC/HCC (Fig. 3).

Discussion

Despite the high prevalence of HBV infection and high mortality from HCC in Cambodia, only a few reports have been published on the molecular characterisation of HBV genomes in this country, and only three complete genomic sequences have been reported23,24,25. We performed complete genomic sequencing for 26 HBV carriers in Cambodia and found that genotype C1 was dominant, consistent with a previous report in Cambodia24.

Through phylogenetic tree analysis of the 24 Cambodian genotype C1 isolates in this study and 340 genotype C1 strains registered in GenBank, we revealed the geographical distribution of HBV genotype C1 (Fig. 2). By the neighbour-joining method, these C1 strains were separated mainly into four clusters, with cluster a and cluster d being the largest. Cluster a was composed of strains from India and Southeast Asia, and cluster d was composed of strains from China and Hong Kong. Sixteen of the 24 Cambodian C1 isolates were located in cluster c with strains from Laos, Thailand, and Malaysia, which are neighbouring countries of Cambodia.

Among 24 Cambodian C1 isolates, the double mutation at A1762T/G1764A was most frequent and found in 75.0% of them, and combination mutation at C1653T or T1753V and A1762T/G1764A was found in 58.3% of them. To verify these mutations in genotype C1 genomes, we analysed full genomic sequences of 340 genotype C1 strains registered in GenBank and compared them with 24 Cambodian full genome sequences. We evaluated the relationship between liver disease status and these mutations in the 340 genotype C1 strains using GenBank data. The result showed that of the 340 genotype C1 strains, 47.1% had double mutation and 33.2% had combination mutation.

When the mutations were analysed by liver disease status among the 340 genotype C1 strains, 16.5% of ASC and 34.2% of CH already had combination mutation, but the combination mutation rate in LC/HCC was markedly higher (95.2%). Moreover, the rate of combination mutation in the 24 Cambodian C1 isolates (58.3%) was significantly higher than that in the overall 340 genotype C1 strains (33.2%, p = 0.0128) and in CH alone among the GenBank sequences (34.2%, p = 0.241) (Table 3). Possible underlying factors for the higher mutation rate include race, human genomic influence, and the host immunity of the individuals, but the exact reasons are unknown. Nevertheless, the higher rate of HBV mutation is related to a high possibility of HCC occurrence.

The double mutation at A1762T/G1764A in genotype C2 has been shown to be present at a high rate in HCC in previous reports and the combination mutation at C1653T and/or T1753V and A1762T/G1764A increases the risk of occurrence of HCC among HBV genotype C2 carriers17,18,26,27,28,29.

As for the HBV genotype C1 genome, some reports have been published; a hospital-based study from South China27 reported combination mutation rates at T1753V and A1762T/G1764A of 4.8% in CH patients and 14.7% in HCC patients. It was reported that mutations in the basal core promoter might play a synergistic role in enhancing HBV carcinogenesis. In our analysis, the rate of combination mutation in the LC/HCC patients among the 340 genotype C1 strains in GenBank was also high.

Many studies have reported that mutations in the core promoter, including Enhancer II and the basal core promoter, are related to carcinogenesis. The X protein is a transcriptional activator and interacts with p53, a tumour suppressor30. Mutation in the X gene may induce uncontrolled cell proliferation31, but the mechanisms have not yet been well clarified17. Some cohort studies have reported that carriers with these mutations developed HCC18,32, but no cohort study has followed these mutations through the progression from ASC or CH to HCC. Further prospective follow-up studies are needed to elucidate these mechanisms in HCC and to investigate the prognosis of HBV carriers who have combination mutation in the core promoter region.

As for pre-S deletion in 340 genotype C1 strains, only one case from this study was found in 21 cases of LC/HCC (4.8%), and 21 strains were found in 340 C1 strains (3.4%) from GenBank. The rate of pre-S deletion associated with HCC was lower than that reported in genotype C2 with HCC29,33,34,35. Therefore, this deletion might not be strongly associated with HCC in genotype C1.

There were some limitations to this study. The first limitation was that we were not able to obtain the liver status of all the 340 genotype C1 strains registered in GenBank. But according to our findings in the analysis of 340 genotype C1 strains, 95% of patients with HCC had the combination mutation. These findings suggest that genotype C1 with combination mutation might be associated with a high risk of HCC.

The second limitation was that we conducted a serological pilot survey for only a small portion of the residents of four regions in Siem Reap province, Cambodia. We could not reach all residents of these regions, but among the 24 Cambodian isolates of genotype C1 in this study, double mutation at A1762T/G1764A was observed in 75.0% and combination mutation was observed in 58.3% of the isolates. The study participants were recruited from previously conducted cross-sectional pilot studies, so their liver status could not be known. Therefore, the actual correlation between mutation and liver status could not be determined in this study.

Therefore, because Cambodian residents infected with HBV may be at risk of progression to HCC, they should be followed up to track changes in their HBV genomes over time and to take measures to prevent HCC.

Methods

Study design

From 2010 to 2014, we conducted a pilot survey among the general population in four regions in Siem Reap province, Cambodia21,22,23. In this study, we rearranged the data for all participants (n = 626), including children; the age of the participants ranged from 7 to 90 years. We then selected the participants who were positive for HBsAg and tested their sera for HBV genome sequences.

Serological Testing of the participants

We performed the serological tests for HBV infection as follows. HBsAg was detected using reverse passive hemagglutination assays (R-PHA, Mycell II HBsAg; Institute of Immunology, Tokyo, Japan) and chemiluminescent immunoassays (CLIA, Architect HBsAg QT; Abbott, Tokyo, Japan). Hepatitis B core antibody (anti-HBc) was detected using passive hemagglutination (PHA) with Mycell anti-rHBc (Institute of Immunology, Tokyo, Japan) and CLIA with Architect HBc II (Abbott, Tokyo, Japan). Hepatitis B surface antibody (anti-HBs) was detected using PHA with Mycell II anti-HBs (Institute of Immunology, Tokyo, Japan) and CLIA with Architect AUSAB (Abbott, Tokyo, Japan).

Subjects for HBV full-length genome sequencing

Among 35 HBsAg-positive samples, eight had insufficient amounts of virus for further analysis. Thus, 27 samples were subjected to PCR for full sequencing. Finally, we obtained 26 full genome sequences from the 27 PCR-positive samples; the remaining one yielded only a partial sequence. These 26 participants were positive for HBsAg by the R-PHA method (Table 2).

HBV full-length genome sequencing in 26 Cambodian participants

For full-length sequencing of the HBV genome, we used the same PCR primers as in the previously described method21,36. Briefly, we extracted nucleic acids from 100 µL of serum using SMITEST EX-R&D (Genome Science Laboratories, Fukushima, Japan) and performed nested PCR using PrimeSTAR® GXL polymerase (Takara Bio Inc., Shiga, Japan) with the primer set WA-L and WA-R and the inner primers WA-L2 and WA-R2. For the missing portion of the circular HBV DNA, we performed nested PCR again on the extracted DNA using PrimeSTAR® GXL polymerase and the primer sets S1, S2, AS1, and AS2. Final products were sequenced using an Applied Biosystems 3730xl DNA sequencer (Thermo Fisher Scientific K.K., Kanagawa, Japan) and a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA).

Extraction of 340 full genomes of genotype C1 registered in GenBank

To extract HBV C1 strains from GenBank, we first searched for sequences closely resembling our full sequence data by Basic Local Alignment Search Tool (BLAST) screening of the National Center for Biotechnology Information (NCBI) database. We then constructed phylogenetic trees several times, as described below, to determine the genotype C1 strains. This yielded 340 registered HBV genotype C1 strains in GenBank. We included all of them in this study after gathering the background information and full-length genomic sequence for each one.

Genetic analysis and phylogenetic analysis

After genomic sequencing, nucleic acid analysis was performed using GENETYX-MAC version 18 software, and a phylogenetic tree was constructed by the UPGMA method34 with the 26 full genomes and HBV genotype B1–B9 and C1–C16 strains registered in GenBank. For genotype C1, analysis was performed by the neighbour-joining method35 with the 24 full genomes of genotype C1 and the 340 HBV C1 strains selected above. Evolutionary analyses were conducted in the MEGA7 software37.
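To make the UPGMA step concrete, the following is a minimal sketch of the clustering idea on toy data. It is not the GENETYX-MAC or MEGA7 pipeline used in this study: the p-distance (proportion of differing sites) is an assumed distance measure, the four toy sequences are invented for illustration, and the function names `p_distance` and `upgma` are hypothetical. Branch lengths are omitted; only the tree topology is returned.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]):
    """Cluster sequences by UPGMA and return the topology as nested tuples."""
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    # Each cluster label maps to (subtree, number of leaves in the subtree).
    clusters = {name: (name, 1) for name in seqs}
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merged = f"({i},{j})"
        for k in clusters:
            # Size-weighted update keeps distances equal to the mean over all leaf pairs.
            dist[frozenset((merged, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[merged] = ((tree_i, tree_j), n_i + n_j)
    (tree, _size), = clusters.values()
    return tree


# Toy aligned sequences (invented): A and B are close, C and D are close.
toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTTAAAAA", "D": "TTTTTAAATT"}
print(upgma(toy))  # (('A', 'B'), ('C', 'D'))
```

UPGMA assumes a constant evolutionary rate across lineages, whereas neighbour-joining does not, which is one reason the two methods can give different topologies on the same alignment.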

Classification of HBV genomic mutation patterns and liver statuses of participants for 340 genomes in GenBank and 24 Cambodian genomes with genotype C1

First, we classified the 340 genotype C1 strains and 24 Cambodian isolates into 48 patterns by combinations of the following mutations: pre-S deletion, G1613A, C1653T, T1753V, A1762T/G1764A, Pre-C W28 stop codon, and P130 in the core region.

Second, we confirmed the backgrounds and liver statuses of the participants for the 340 genotype C1 strains registered in GenBank by investigating the original papers in which the strains were reported and the information in GenBank.

Third, we classified the isolate holders as asymptomatic carriers (ASC), in which we included blood donors (BDs), occult hepatitis B infection, and the general population; human immunodeficiency virus (HIV) and HBV co-infected patients; patients with chronic hepatitis B; patients with liver cirrhosis (LC); or patients with hepatocellular carcinoma (HCC). If there was no description of the clinical background in the GenBank registry or paper, or if there was no publication with gene registration, the disease state was classified as unknown.
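The definitions of double and combination mutation used in the first classification step can be sketched as a small function. This is an illustration only, not the procedure actually used: it assumes the bases have already been mapped to the reference nucleotide positions by alignment, and the function name `classify_core_promoter` and the returned labels are hypothetical. The IUPAC code V denotes A, C, or G, so T1753V means that T at position 1753 is replaced by any other base.

```python
from collections import Counter

# IUPAC ambiguity code V: A, C, or G (any base except T).
V_BASES = {"A", "C", "G"}


def classify_core_promoter(bases: dict[int, str]) -> str:
    """Classify one genome from the bases observed at positions 1653, 1753, 1762, 1764.

    Labels (hypothetical): 'combination', 'double', 'single', 'none'.
    """
    b = {pos: base.upper() for pos, base in bases.items()}
    double = b[1762] == "T" and b[1764] == "A"   # A1762T/G1764A
    c1653t = b[1653] == "T"                      # C1653T
    t1753v = b[1753] in V_BASES                  # T1753V
    if double and (c1653t or t1753v):
        return "combination"
    if double:
        return "double"
    if c1653t or t1753v:
        return "single"
    return "none"


# Invented example genomes, given as position -> observed base.
genomes = [
    {1653: "C", 1753: "T", 1762: "A", 1764: "G"},  # no focused mutation
    {1653: "C", 1753: "T", 1762: "T", 1764: "A"},  # double mutation only
    {1653: "C", 1753: "C", 1762: "T", 1764: "A"},  # T1753V plus double mutation
    {1653: "T", 1753: "T", 1762: "T", 1764: "A"},  # C1653T plus double mutation
]
print(Counter(classify_core_promoter(g) for g in genomes))
```

A genome with only one of A1762T or G1764A is not counted as carrying the double mutation in this sketch; how such cases were handled in the 48-pattern classification is not specified in the text.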

Statistical analysis

The proportions of double mutation and of combination mutation among ASC, CH, and LC/HCC were each compared using the χ2 test for homogeneity among the 3 groups, followed by post hoc pairwise χ2 tests with Bonferroni correction. p < 0.05 was considered statistically significant. All data were analysed using JMP version 11 (SAS Institute Inc., Cary, NC, USA).
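The testing scheme described above can be sketched in plain Python. This is a minimal illustration, not the JMP analysis: it assumes an uncorrected Pearson χ2 statistic, uses closed-form χ2 tail probabilities that hold only for 1 and 2 degrees of freedom, and the function names are hypothetical. The counts are the double-mutation counts reported in the Results (28/97, 73/149, 21/21). Exact p-values depend on the test variant and software options, so the output is not expected to reproduce the reported values digit for digit.

```python
import math
from itertools import combinations


def chi2_homogeneity(groups: dict[str, tuple[int, int]]):
    """Pearson chi-square for a 2 x k table; groups maps name -> (positives, total).

    Returns (statistic, df, p). Assumes no all-zero row in the pooled table.
    """
    positives = sum(pos for pos, _ in groups.values())
    total = sum(n for _, n in groups.values())
    rate = positives / total
    stat = 0.0
    for pos, n in groups.values():
        for observed, expected in ((pos, n * rate), (n - pos, n * (1 - rate))):
            stat += (observed - expected) ** 2 / expected
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # chi-square tail, 1 degree of freedom
    elif df == 2:
        p = math.exp(-stat / 2)             # chi-square tail, 2 degrees of freedom
    else:
        raise NotImplementedError("closed-form tail only for df = 1 or 2")
    return stat, df, p


def pairwise_bonferroni(groups: dict[str, tuple[int, int]]):
    """Pairwise 2 x 2 chi-square tests with Bonferroni-adjusted p-values."""
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        _, _, p = chi2_homogeneity({a: groups[a], b: groups[b]})
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


double_mutation = {"ASC": (28, 97), "CH": (73, 149), "LC/HCC": (21, 21)}
print(chi2_homogeneity(double_mutation))
print(pairwise_bonferroni(double_mutation))
```

Bonferroni correction multiplies each pairwise p-value by the number of comparisons (three here), which keeps the family-wise error rate at or below the nominal level at the cost of some power.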

Ethical consideration

Informed consent was obtained from every participant after a thorough explanation of the main content and purpose of the study. For participants under 18 years, informed consent was obtained from their mother, father, or legal guardian. Informed assent was obtained from all children in the study. All specimens were de-identified, with reference only to a unique identifier. The detailed study procedure, including obtaining informed consent from human subjects, was clearly described in the study protocol. The study protocol was confirmed and approved by the Ethics Committee for Epidemiological Research of Hiroshima University, Japan (ethical no. 370-1) and the National Ethics Committee for Health Research at the Ministry of Health of Cambodia (ethical no. 0085 NECHR). Additionally, we obtained permission from the National Institute of Public Health of the Ministry of Health of Cambodia (no. 1494 NIPH) for the transportation of blood samples from Cambodia to our laboratory in Hiroshima, Japan. All methods were performed in accordance with the relevant guidelines and regulations.