Serological testing, especially of voluntary blood donors, is critical for controlling the transmission of hepatitis B virus (HBV)1. Since the introduction of nucleic acid testing (NAT) in blood screening, the risk of transfusion-transmitted HBV has decreased significantly, because HBsAg-negative/HBV NAT-positive (HBsAg−/HBV NAT+) blood components, which are missed by serological testing, can be identified by NAT2,3,4,5. The majority of HBsAg−/HBV NAT+ cases have been reported to be occult HBV infection (OBI)6,7. OBI is defined as the presence of replication-competent HBV genome in the blood and/or liver of individuals who test negative for HBsAg by currently available testing methods8. Defining the epidemiology of OBI can be difficult because it depends on the sensitivity of the HBsAg and HBV DNA assays used. OBI carries a potential risk of HBV transmission through blood transfusion and organ transplantation, as well as from occult-infected mothers to newborns9,10,11. In low- and middle-income countries where anti-HBc and/or NAT testing has not been implemented, HBV transmission from OBI blood donors remains a major health issue8. Most OBIs cause no serious biochemical or clinical manifestations, but serious liver disease can also occur12.

The virological features and mechanisms of OBI remain unclear. OBI has been linked to HBV S protein mutations in vitro13,14,15,16. Other proposed causes include mutations in genomic regulatory regions that may negatively affect viral replication17 and incomplete control of HBV by the host immune system18,19,20. Although there have been many studies on Pre-S/S mutations in OBI14,20,21, few have reported full-length OBI genomes, and those that did had limited sample sizes22. Here, we aimed to characterize OBI in blood donors in Guangdong, China. We obtained 50 full-length OBI genome sequences, a comparatively large sample for this type of analysis. The specific aim of this study was to provide a more comprehensive and detailed description of the serological and molecular characteristics of OBI in China and to assess whether specific viral mutations characterize OBI.


Results

Sample classification and characterization

From March 2015 to May 2017, a total of 554,154 blood donors were screened, of whom 1793 were HBsAg negative and multiplex NAT positive. Of these, 1407 were further tested with the HBV Discriminatory Assay, and 428 were found to be HBV NAT positive. From the 428 HBsAg−/HBV NAT+ samples, 212 were randomly selected as the study population. Among them, 77 (36.32%) were reactive for both anti-HBc and anti-HBs, 105 (49.53%) carried anti-HBc only, 13 (6.13%) were reactive for anti-HBs only, and 17 (8.02%) were non-reactive for both anti-HBc and anti-HBs (Table 1).
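As a quick arithmetic check, the serological proportions above can be recomputed from the reported counts. The Python sketch below is illustrative only; the dictionary keys are labels chosen here for readability, not assay terminology from the study.

```python
# Counts reported in the text for the 212 HBsAg-/HBV NAT+ study samples.
# The key names are labels invented for this sketch.
profiles = {
    "anti-HBc+ / anti-HBs+": 77,
    "anti-HBc only": 105,
    "anti-HBs only": 13,
    "anti-HBc- / anti-HBs-": 17,
}

# The four serological groups should add up to the study population.
total = sum(profiles.values())

# Percentage of the study population in each group, to two decimals.
percentages = {name: round(100 * n / total, 2) for name, n in profiles.items()}

for name, pct in percentages.items():
    print(f"{name}: {profiles[name]}/{total} = {pct}%")
```

The four groups sum to the 212 selected samples, and the recomputed percentages agree with those quoted in the paragraph.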

Table 1 Molecular and serological confirmation of hepatitis B virus (HBV) infection in NAT-positive blood donor samples.

To identify OBI, the presence of HBV DNA in the 212 HBsAg−/HBV NAT+ donors was confirmed by Q-PCR and nested PCR on both the primary donation and follow-up samples. A total of 200 samples were considered HBV DNA positive. These comprised 172 OBI; 6 classified as "other" on the basis of specific features (possibly recent HBV infection with acute resolution); and 22 unclassified anti-HBc-negative samples, which could not be differentiated between seronegative primary or transient OBI and the window period because follow-up samples were lacking. The remaining 12 samples, negative by both Q-PCR and nested PCR, were considered HBV DNA not confirmed (Table 1 and Fig. 1). The distribution of viral loads among the 200 HBV DNA-positive donors is shown in Supplementary Table S1.

Figure 1

Detection and classification of HBV infection in blood donors (including follow-up samples) with HBsAg−/HBV NAT+ results. HBV serological testing was performed by electrochemiluminescence immunoassay (ECLIA) for HBsAg, anti-HBs, anti-HBc, HBeAg, and anti-HBe. In addition, HBV DNA testing was performed by real-time quantitative polymerase chain reaction (Q-PCR) and by PCR for the BCP/PC gene and the long fragment. Six samples were positive for HBV DNA (Q-PCR or nested PCR positive) and negative for anti-HBc; however, in follow-up analyses, these samples were negative for HBV DNA by NAT, Q-PCR, and nested PCR assays. “Failure” indicates that no follow-up sample was obtained, and “successful” indicates that a follow-up sample was obtained.

Among the 172 OBI donors, there were 137 males and 35 females. The median age was 44 years. ALT values of 50 U/L or less were defined as normal for both sexes, according to the manufacturer's instructions (Alanine aminotransferase Reagent Kit, Shanghai Huashi Asia–Pacific Biopharmaceutical Co., Ltd.). All donors had normal ALT levels. Their viral loads ranged from unquantifiable to 4,667 IU/ml (median, 71.15 IU/ml). The OBI samples were further classified by serological markers: 70 (40.70%) samples were reactive for both anti-HBc and anti-HBs, and 102 (59.30%) carried anti-HBc only (Table 1).

Among the 172 OBI samples, 129 BCP/PC and 50 long-fragment sequences were amplified and sequenced. In total, 50 full-length genomes (including 3 HBV whole genomes minus 53 bp) were obtained; phylogenetic analysis classified 33 strains as genotype (gt) B (OBIB) and 17 strains as genotype C (OBIC). All OBIB strains were of subgenotype B2. Thirteen OBIC strains were of subgenotype C1, and the remaining 4 belonged to other subgenotypes of genotype C (Fig. 2). The mean viral load was 2.099 log10 IU/ml among OBIB donors compared with 1.915 log10 IU/ml among OBIC donors (P = 0.743, non-parametric Mann–Whitney test; data not shown).
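The Mann–Whitney comparison of viral loads between genotypes rests on a simple pair-counting statistic. The sketch below computes only the U statistics, in plain Python, for two small groups of hypothetical log10 IU/ml values. The values and the function name `mann_whitney_u` are assumptions made for illustration and are not study data; the P value itself would normally be obtained from a statistics package.

```python
from itertools import product


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two samples.

    U_a counts the cross-group pairs in which the value from group_a is
    larger than the value from group_b; tied pairs contribute 0.5.
    U_a + U_b always equals len(group_a) * len(group_b).
    """
    u_a = 0.0
    for a, b in product(group_a, group_b):
        if a > b:
            u_a += 1.0
        elif a == b:
            u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical log10 IU/ml viral loads, for illustration only.
obib_loads = [2.4, 1.9, 2.1, 1.6]
obic_loads = [2.0, 1.7, 2.2]

u_obib, u_obic = mann_whitney_u(obib_loads, obic_loads)
print(u_obib, u_obic)
```

When the two U values are close to each other, as in this toy example, neither group tends to have higher values, which is the pattern a non-significant result such as P = 0.743 reflects.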

Figure 2

Estimated maximum-likelihood phylogeny for full-length genome sequences of OBI strains. Red circles indicate sequences from OBI strains in this study; the remaining sequences are reference sequences of proposed genotypes and subgenotypes. Bootstrap values (> 70%) are displayed on the branches. The bar at the top center of the figure shows the scale in nucleotide substitutions per site. AY226578, from woolly monkey, was used as the outgroup.

Variability analysis of regulatory regions and core, X, envelope (Pre-S/S), and polymerase (Pol) proteins of occult HBV sequences

The nucleotide sequences of regulatory regions and the amino acid sequences of ORFs from OBIB and OBIC strains were compared with those of their respective control (HBsAg+) sequences. The HBV regulatory regions analyzed were the enhancer (ENH); the promoters for Pre-S1 (SP1), Pre-S2/S (SP2), Core (BCP), and X (XP) proteins; the core upstream regulatory sequence (CURS); and the direct repeat sequences (DR). DR1 and DR2 were conserved. Significantly higher variability (P < 0.05) was observed in the SP2 and CURS regulatory regions of OBIB strains than in their controls. In contrast, SP1 was less variable in OBIB strains than in controls (Table 2).

Table 2 Intergroup variability analyses between OBIB and HBsAg+ strainsδ.

The variability of the amino acid sequences of ORFs was analyzed using the same method. Amino acid variability was significantly higher in the Pol and Pre-S/S proteins of both OBIB and OBIC strains than in their corresponding controls (Table 2).

Mutation analysis on Pre-S/S, Pol, Pre-core/core, and X region between OBI strains and controls

Deletions were found in four OBIC strains and an insertion in one (Fig. 3). Samples 408 and 716 had deletions of amino acids 6–10 and 18–22 in Pre-S2, respectively. Samples 498 and 1420 had deletions of amino acids 1–6 and 1–5 in Pre-S1, respectively, resulting in loss of the Pre-S1 start codon. Sample 170 had a four-amino-acid insertion (after aa112 in the S gene), located in the major hydrophilic region (MHR). Five clones of each strain were sequenced, and all supported the deletion or insertion (Supplementary Fig. S2). The deletions and insertion identified in this study have not been reported in other studies23,24,25,26,27.

Figure 3

Amino acid locations of the deletions and insertion in the Pre-S/S region of gtC strains. Deleted or inserted amino acid sequences are marked with long red boxes. GQ205441 is a reference sequence of genotype C.

To identify OBI-related point mutations, the amino acid sequences of OBIB and OBIC strains (excluding strains with deletions and insertions) were compared with those of their corresponding controls. Forty-five point mutations in the Pre-S1, Pre-S2, S, Core, and Pol genes were designated OBI-related because their frequency in OBI strains was significantly higher than in the controls (P < 0.05; Table 3 and Supplementary Table S2). Of these, 26 have been documented in previous studies (Supplementary Table S2)14,16,20,28,29, while the other 19 were novel findings, including E39K/D and S101T in the Pre-S1 gene, Q10R in the Pre-S2 gene, P178Q, Q181R, and I226N in the S gene, and T147A in the Core gene in OBIB strains, and T118K in the S gene in OBIC strains (Table 3).

Table 3 Novel OBI-related mutations of the HBV genome in genotypes B and C.

OBI-related mutations in the Pol gene have rarely been reported. Here we identified 10 novel mutations within the Pol gene (Table 3). It should be noted that, because the reverse transcriptase (RT) region of the Pol gene completely spans the S region, a single nonsynonymous nucleotide change may alter an amino acid in both proteins and is therefore recorded as two mutations. For example, pR499Q (rtR153Q) and pH580Q (rtH234Q), corresponding to sG145R and sI226N/S in OBIB strains, and pH468N (rtR122Q) and pH472Q (rtH126Q), corresponding to sS114T and sT118K in OBIC strains, were found in this study (Table 3 and Supplementary Table S2).
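The paired mutation names quoted above imply fixed numbering offsets between the Pol, RT, and S coordinate systems. The sketch below is only a bookkeeping aid: the two offsets are inferred from the mutation pairs reported in this paragraph and are an assumption of the example, not a general numbering rule for every HBV genotype. The function names are hypothetical.

```python
# Offsets inferred from the paired names in the text, for example
# pR499Q = rtR153Q reported alongside sG145R. They are assumptions of
# this sketch rather than a universal HBV numbering convention.
POL_TO_RT_OFFSET = 346  # 499 - 153
RT_TO_S_OFFSET = 8      # 153 - 145


def pol_to_rt(pol_position: int) -> int:
    """Convert a Pol (p) residue number to RT (rt) numbering."""
    return pol_position - POL_TO_RT_OFFSET


def rt_to_s(rt_position: int) -> int:
    """Return the S-protein residue paired with an RT residue in the text."""
    return rt_position - RT_TO_S_OFFSET


# Pol position -> paired S position, as reported in the paragraph.
reported_pairs = {499: 145, 580: 226, 468: 114, 472: 118}

for pol_pos, s_pos in reported_pairs.items():
    rt_pos = pol_to_rt(pol_pos)
    print(f"p{pol_pos} -> rt{rt_pos} -> s{rt_to_s(rt_pos)} (reported s{s_pos})")
```

All four reported pairs are consistent with the same two offsets, as expected when two reading frames overlap along the same stretch of nucleotides.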

The MHR is the most important antigenic determinant of HBV strains. Mutation sT118K in the MHR changed the predicted secondary structure of the S protein (Fig. 4): the mutant protein had 15 rather than 17 amino acids in beta-turns and 97 rather than 95 amino acids in random coils. HBV DNA levels were compared between strains with and without OBI-related mutations. In OBIB strains, HBV DNA levels were significantly lower in strains with mutation sG145R (pR499Q) than in strains without it, and the same was true for mutation sI226S (Supplementary Table S3).

Figure 4

Predicted secondary structure of the S protein with mutation sT118K in OBIC strains. h (blue), alpha-helix; e (red), extended strand; t (green), beta-turn; c (yellow), random coil. The mutated amino acid and the altered secondary structure are marked with red lines.

Two mutations resulting in stop codons were found: at aa9 in Pre-S1 in sample 498 and at aa201 in the S gene in sample 716 (data not shown).


Discussion

The majority (94.34%, 200/212) of the HBsAg−/HBV NAT+ blood samples were confirmed as HBV-infected, and 81.13% (172/212) of all samples were identified as OBI (Table 1). Failure to confirm HBV DNA in the remaining samples may be related to (a) a viral load below the detection limit of the confirmatory assays, (b) false-positive NAT screening results, or (c) genetic variability. Donors with OBI had normal ALT levels, detectable anti-HBc, and low viral loads (median, 71.15 IU/ml; 62.21% of cases below 100 IU/ml), which is consistent with previous studies8,22. Incomplete control of HBV replication by host immunity may lead to low viremia. Selection of HBsAg escape mutations by anti-HBs, whether formed spontaneously or induced by vaccination of the general population against HBV28,30, may contribute to this incomplete immune control. OBI is related to the antiviral immune response, which is believed to be important for maintaining HBV control8. In most OBI cases, the immune system effectively controls HBV even if the virus is not cleared31. Nearly half of the OBI donors in this study (40.70%, 70/172) carried detectable anti-HBs, similar to the proportion reported for OBI in South Africa and Europe (45%)17,18. Anti-HBs is regarded as a protective antibody and as evidence of acquired immunity, so OBI in the presence of anti-HBs may indicate incomplete immunity.

In previous studies, OBI-related mutations have often been identified without matched controls21,32. Consequently, it was impossible to rule out natural polymorphisms and/or differences related to the tissue source of the virus, the clinical status of the HBV-infected persons, or the geographic origin of the individuals32. To overcome these limitations, we selected as controls matching sequences obtained from HBsAg+ asymptomatic and apparently healthy blood donors identified during the same blood screening process at Guangzhou Blood Center.

Naturally occurring mutations in the HBV genome are thought to play an important role in the persistence of HBV infection. In this study, significantly higher mean nucleotide variability than in controls was observed in the SP2 and CURS regulatory regions, and only in OBIB strains. In OBIC strains, the regulatory regions were as conserved as in HBsAg+ strains, whereas previous studies reported significantly higher nucleotide variability in the SP1, SP2, ENH1, and ENH2 regulatory regions of OBIC strains than of HBsAg+ strains22,33. This discrepancy may be related to (a) differences in study populations or (b) our use of a more reliable control group drawn from the same HBV-infected population.

The mean number of amino acid substitutions in the Pre-S/S regions of OBIB strain sequences was higher than in their wild-type counterparts, and the same was true for OBIC strain sequences (except for Pre-S1; Table 2). The Pre-S/S ORF encodes the three envelope glycoproteins, which are produced by differential translation initiation at each of three in-frame start codons. These glycoproteins are called the large (L), middle (M), and small (S) HBsAgs. Expression of the envelope proteins is essential for virion assembly and secretion. Therefore, deletions, insertions, and point mutations in Pre-S/S may interfere with HBsAg detection.

Mutations in the Pre-S1 and Pre-S2 genes may reduce the expression of L-HBsAg and M protein, respectively. In this study, deletions and several point mutations occurred in the Pre-S1 and Pre-S2 regions. Moreover, a ps1C25T mutation resulted in a stop codon in the Pre-S1 gene in sample 498. A specific ratio of L-HBsAg to S-HBsAg is essential for assembly of the envelope particles, because an excessively low or high L/S ratio can alter the assembly and secretion of HBsAg and reduce virion secretion34. Compared with wild-type HBV, Pre-S variant HBV shows significantly reduced HBsAg secretion, retention of envelope proteins in the endoplasmic reticulum, reduced virion secretion efficiency, and greater nuclear accumulation of covalently closed circular DNA35.

The factors underlying occult hepatitis B infection are complex and have not yet been fully elucidated. Mutations in the S gene may contribute to occult infection. The S protein corresponds to HBsAg36. Mutations in the S gene may affect the immunogenicity, antigenicity, expression, and/or secretion of HBsAg, causing HBsAg tests to fail37,38; reduce the replication and/or secretion of virions, exerting a negative effect on HBsAg14,39; or allow the virus to avoid final clearance by the immune system, ultimately leading to OBI. In this study, the insertion in sample 170 and the new OBI-related point mutation sT118K in gtC strains were both located in the MHR. The MHR is the most important antigenic determinant of all HBV strains and is crucial for HBsAg detection and HBV vaccine development14,36. A stop codon mutation at aa201 of the S gene was found in OBIC sample 716. Some of the new OBI-related point mutations in the S gene were uncommon mutations. Among them, sT118K in the MHR caused a decrease in thermostability (0.31 kcal/mol), which might lead to structural instability under thermal stress. Secondary structure prediction showed that the proportions of beta-turns and random coils changed after the sT118K mutation. This change may influence the 3D structure of the S protein and thereby affect its biological function, as supported by protein 3D structures. In particular, beta-turns are generally located on the protein surface and are involved in molecular recognition, so the reduction in beta-turns caused by sT118K may affect detection by HBV diagnostic ELISA kits. In OBIB strains, those with mutation sG145R (pR499Q) or sI226S had significantly lower HBV DNA levels than strains without these mutations, which may imply that mutations in the S gene reduce the replication and/or secretion of virions. However, further functional studies are needed.

Mutations in the Pol (RT) gene may be one of the reasons for the low titers in most OBI cases. RT activity is important for the replication of HBV DNA40. The mean number of amino acid substitutions in the Pol region (P = 0.009 and P = 0.042), and especially in the RT region (P < 0.001 and P = 0.005), of OBIB and OBIC sequences was higher than in their wild-type counterparts (Table 2), similar to a previous study41. In the present study, seven new OBI-related mutations in the polymerase gene were identified in gtB strains and three in gtC strains. Most of these mutations were located in the RT region. Mutations pR499Q (rtR153Q) and pH580Q (rtH234Q), corresponding to sG145R and sI226N/S in OBIB strains, respectively, and pH468N (rtR122Q) and pH472Q (rtH126Q), corresponding to sS114T and sT118K in OBIC strains, respectively, were observed. Because the two coding regions overlap, some studies have focused on concomitant mutations in the RT and S regions that are related to drug resistance and vaccine escape42,43.

Mutations of the core protein in HBV infection have been reported to be related not only to low secretion of HBV virions but also to immune escape epitopes at the CTL and Th levels and to severe liver disease (such as liver cirrhosis and hepatocellular carcinoma)44,45,46. One new OBI-related mutation, T147A in the core gene, was identified in OBIB strains. Mutations of the OBI core protein within the CTL epitope cluster 141–151 may create an escape epitope. Although such mutations may reduce the adaptability of the virus, they still contribute to the persistence of HBV infection46. Recent studies have shown that the HBc linker peptide (9 residues, 141–149) between the N-terminal and C-terminal domains plays a key role in multiple stages of virus replication rather than acting merely as a spacer with no specific function, and have strongly implicated the linker in recruiting protein phosphatase 2A and other host factors to regulate multiple stages of HBV replication47,48. Mutations in this region may therefore result in low viremia.

OBI carries a potential risk of HBV transmission through blood transfusion (the minimal infectious dose is 3 IU/ml), organ transplantation, and from occult-infected mothers to newborns9,10,11. Episodes of reactivation can occur after the development of immunodeficiency, and acute hepatitis and occasionally fulminant hepatitis may follow reactivation49. Reactivation can drive the progression of liver damage, resulting in fibrotic conditions that promote the development of cirrhosis50. Characterizing a large set of full-length OBI genomes may improve our understanding of OBI in blood donors and draw attention to the possibility that OBI strains reactivate.

In conclusion, OBI, which is maintained by host, viral, immunological, and/or epigenetic factors, is one of the most challenging clinical entities in the study of viral hepatitis50. We conducted a comprehensive survey of the characteristics of a large set of full-length OBI genomes. Variability and mutations occurred mainly in the Pol and Pre-S/S regions in both OBIB and OBIC strains, which may account for the undetectable HBsAg and low HBV DNA viral loads observed in the present study. This relationship remains to be confirmed by functional studies, which are being planned.

Materials and methods

Sample identification

HBsAg and antibodies to human immunodeficiency virus (HIV), hepatitis C virus (HCV), and Treponema pallidum (TP) were tested by individual-donation enzyme immunoassays (EIAs). Qualified blood donors were further screened for HBV, HCV, and HIV genomes by NAT with the Procleix Ultrio Plus multiplex Assay, and then with the HBV Discriminatory Assay (Grifols Diagnostic Solutions, Inc.) on the Tigris platform; the lower detection limits of the two NAT assays were 3.4 IU/ml and 4.1 IU/ml, respectively. Two hundred and twelve HBsAg−/HBV NAT+ blood donors were enrolled at Guangzhou Blood Center from March 2015 to May 2017. The diagnostic criteria for OBI are described in detail in previous studies17,22. Briefly, samples were identified as OBI by combining real-time quantitative polymerase chain reaction (Q-PCR), nested amplification, anti-HBc, and anti-HBs results, excluding window-period samples, false positives, and samples from the convalescent period of acute infection. All participants were duly informed about this study, and written informed consent was obtained from each participant. All procedures performed in this study involving human participants were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This study was approved by the Medical Ethics Committee of Guangzhou Blood Center.

HBV serological testing

HBV serologic markers (HBsAg, anti-HBs, HBeAg, anti-HBe, and anti-HBc) were analyzed by a highly sensitive electrochemiluminescence immunoassay (ECLIA; cobas e602, Roche Diagnostics, Mannheim, Germany) according to the manufacturer's instructions. The detection limit of the ECLIA for HBsAg is 0.05 IU/ml.

HBV DNA quantification, amplification, sequencing, and phylogenetic analysis

HBV DNA was extracted from 2.5 mL of plasma from each HBsAg−/HBV NAT+ sample using a large-volume, high-purity viral nucleic acid extraction kit (Roche Diagnostic, Germany)51. Q-PCR was used to quantify viral load (sensitivity 5 IU/ml). Viral DNA was also amplified by nested PCR for an HBV long fragment [the HBV full-length genome minus 53 bp, nucleotide (nt) 1804 to 1856] and for the HBV basic core promoter/precore gene (BCP/PC, nt 1679 to 1973), as described previously18,22,51. Samples positive by Q-PCR or nested PCR were considered HBV DNA positive. Samples negative by both Q-PCR and nested PCR were considered HBV DNA not confirmed. Blood donors who tested negative for anti-HBc or HBV DNA were included in follow-up analyses.
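The confirmation rule stated above is a simple logical OR over the two PCR results. A minimal sketch follows; the function names and the boolean encoding are hypothetical, and the follow-up rule reflects one reading of the last sentence of the paragraph rather than a documented protocol.

```python
def hbv_dna_status(qpcr_positive: bool, nested_pcr_positive: bool) -> str:
    """Classify a sample from its two confirmatory PCR results.

    A sample counts as HBV DNA positive if either assay is positive and
    as 'not confirmed' only when both assays are negative.
    """
    if qpcr_positive or nested_pcr_positive:
        return "HBV DNA positive"
    return "HBV DNA not confirmed"


def needs_follow_up(anti_hbc_reactive: bool, hbv_dna_positive: bool) -> bool:
    """Assumed reading of the text: follow up any donor who is
    anti-HBc negative or HBV DNA negative."""
    return (not anti_hbc_reactive) or (not hbv_dna_positive)


# Example: nested PCR positive, Q-PCR negative, anti-HBc non-reactive.
status = hbv_dna_status(qpcr_positive=False, nested_pcr_positive=True)
follow_up = needs_follow_up(anti_hbc_reactive=False, hbv_dna_positive=True)
print(status, follow_up)
```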

The HBV long-fragment PCR products were ligated into a cloning vector, which was used to transform E. coli; after overnight cultivation, five clones were picked for sequencing. We obtained the full-length genome sequence by combining the two overlapping fragments. MAFFT version 7 was used to generate the multiple sequence alignment. The phylogenetic tree was constructed with MEGA X software using the maximum-likelihood method. The reliability of the tree was estimated using 1000 bootstrap replications. HBV subgenotype reference sequences52 were downloaded from the National Center for Biotechnology Information (NCBI) database. HBV genotypes and subgenotypes were confirmed by the phylogenetic tree.

Occult HBV sequence analyses

A consensus sequence was derived from each OBI plasma sample. Consensus sequences were used for alignment analyses18. BioEdit software was used to calculate the intragroup variability based on the “Sequence difference count Matrix”. HBV wild-type sequences were used as controls; these were obtained from HBsAg-positive (HBsAg+) strains selected from blood donors at Guangzhou Blood Center. The whole genomes of 81 control strains (58 gtB and 23 gtC) were amplified successfully, as described previously53. A phylogenetic tree of control sequences was constructed in the same way as for the OBI strain sequences (Supplementary Fig. S1). Average intragroup variability was calculated as the number of nucleotide (regulatory region sequences) or amino acid (protein sequences) substitution differences between OBI strain sequences and control HBV strain sequences of the corresponding genotype. Point mutations were compared between the OBI and control groups across the four open reading frames (ORFs: Pre-S/S, Pol, Pre-Core/Core, and X). OBI-related point mutations in OBI sample sequences that were not present in any of the reference isolates were designated uncommon mutations33. The secondary structures of the S protein were predicted by SOPMA. The GenBank accession numbers of the full-length HBV genomic sequences from the 81 HBsAg+ and 50 OBI blood donors in this study are OM669567 through OM669697.
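The consensus and difference-count steps described above can be illustrated with a short sketch. It assumes the clone sequences are already aligned to equal length, uses simple majority rule (ties resolved by whichever residue is seen first, an arbitrary choice of this example), and counts substitutions position by position. The function names and toy sequences are hypothetical; this is not BioEdit's implementation.

```python
from collections import Counter


def consensus(aligned_clones):
    """Majority-rule consensus of aligned, equal-length sequences."""
    if len({len(seq) for seq in aligned_clones}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    residues = []
    for column in zip(*aligned_clones):
        # most_common(1) returns the most frequent residue in the column.
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


def difference_count(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def difference_matrix(sequences):
    """Pairwise difference counts for a list of aligned sequences."""
    return [[difference_count(a, b) for b in sequences] for a in sequences]


# Toy example: five aligned clones from one hypothetical sample.
clones = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGT"]
sample_consensus = consensus(clones)

# Toy difference-count matrix for three hypothetical strain sequences.
matrix = difference_matrix(["AAAA", "AAAT", "AATT"])
print(sample_consensus, matrix)
```

Averaging the appropriate entries of such a matrix gives a per-group variability value of the kind that is then compared between OBI and control strains.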

Statistical analysis

Intergroup variability analyses were performed using the non-parametric Mann–Whitney test. HBV DNA levels in strains with and without OBI-related mutations were compared by t-test, or by the non-parametric Mann–Whitney test when the assumptions of the t-test were not met. The significance of differences in point mutation frequencies between the OBI and control groups was determined using Fisher's exact test. All tests were two-tailed. All statistical analyses were performed with SPSS 22.0 software (SPSS, Chicago, IL, USA). A P-value of < 0.05 was considered statistically significant.
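For readers who wish to see how a mutation frequency comparison of this kind works, the sketch below implements a two-tailed Fisher's exact test for a 2 × 2 table in plain Python. It is an independent illustration, not the SPSS procedure used in the study, and the example counts are hypothetical. The two-tailed P value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is a common convention.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Rows: group 1 and group 2. Columns: mutation present and absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def probability(x):
        # Hypergeometric probability of x mutation carriers in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that equally likely tables are included.
    threshold = observed * (1 + 1e-7)
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= threshold
    )
    return min(1.0, p_value)


# Hypothetical counts: mutation present in 8 of 33 OBI strains and in
# 2 of 58 control strains (illustration only, not study data).
p = fisher_exact_two_tailed(8, 25, 2, 56)
print(f"P = {p:.4f}")
```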