Introduction

Hepatitis B virus (HBV) infection is one of the most common chronic viral infections in the world, affecting 240 million people globally1 and accounting for 1.3 million deaths each year, mostly from cirrhosis and hepatocellular carcinoma (HCC)1. China accounts for one-third of all global chronic HBV cases1, despite widespread, free, universal vaccination of newborns since 20052. The prevalence of chronic HBV infection among middle-aged and older adults in China remains relatively high3, and without treatment up to one-quarter of those infected ultimately develop cirrhosis or liver cancer4,5. Chronic HBV continues to carry high mortality: an estimated 10 million people living with chronic HBV in China are predicted to die by 2030 from liver cancer and chronic liver diseases6.

Improving the identification of chronically infected individuals is a key component of addressing the chronic HBV burden in China6. Chronic HBV has been described as a ‘silent epidemic’1, reflecting an asymptomatic disease course that contributes to late diagnosis and poor prognosis: in China the diagnosis rate is estimated at less than 20%, well below the World Health Organization (WHO) target of 90% by 20307,8. However, apart from risk factors for mother-to-child transmission, comparatively little is known about risk factors for HBV chronicity in adults, and the most recent nationwide hepatitis B serosurvey in China, in 2014, included only participants aged 1–29 years9. Furthermore, in addition to conventional risk factors, genome-wide association studies (GWAS) have identified several genetic variants associated with chronic HBV. Most of these variants lie in human leukocyte antigen (HLA) loci, which play a critical role in the host immune response to viral infection through antigen presentation10; polymorphisms at these loci can alter the efficacy of antigen binding and the T-cell response, affecting viral clearance11. Other genes (including some in non-HLA regions of the genome) also influence the likelihood of viral persistence or clearance by altering the magnitude of adaptive or innate immune responses11. Existing GWAS were relatively small case–control studies in geographically constrained areas, recruiting cases from hospitals or liver cancer screening units12,13,14,15,16, so it is of interest to examine how these genetic variants relate to chronic HBV risk in a large, geographically diverse population sample.

Further knowledge about risk factors associated with HBV chronicity in adults may support ongoing efforts to reduce the chronic HBV burden in China by informing targeted testing of higher-risk individuals, so that infected people can be brought into the chronic HBV care continuum and receive appropriate treatment and care17,18. We used a large community-based cohort study of middle-aged and older adults from ten geographically diverse areas in China to assess both genetic and non-genetic risk factors associated with chronic HBV.

Methods

Study population

The China Kadoorie Biobank (CKB) study design has been described in detail elsewhere19. A baseline survey was conducted in 2004–2008 among 512,726 men and women aged 30–79 years, recruited from five urban and five rural geographically diverse areas in China. Potentially eligible participants were identified through official residential records in each of 100–150 administrative units (rural villages and urban residential committees) within each region. Trained health workers administered laptop-based questionnaires at local study clinics, collecting information on sociodemographic and lifestyle factors (e.g. smoking, alcohol consumption, diet, physical activity) and medical history (e.g. history of blood transfusion, self-reported health, and medical conditions diagnosed by a doctor, including any history of chronic hepatitis or cirrhosis). Blood pressure, lung function and anthropometric measurements were obtained using standard protocols, and a non-fasting venous blood sample was collected at baseline for on-site tests and long-term storage. Resurveys following similar procedures were conducted in 2008 and 2013–2014 among a subset (4–5%) of surviving participants. Vital status of participants was determined periodically through national death registries, and episodes of hospitalization were collected via linkage to disease registries and the national health insurance claims database, which has almost universal coverage in the study areas. Disease events were coded using the International Classification of Diseases, 10th Revision (ICD-10). Prior international, national and regional ethics approvals were obtained, and all participants provided written informed consent.

Measurement of chronic HBV infection

Hepatitis B surface antigen (HBsAg) was measured in all participants at the baseline visit using a point-of-care lateral flow rapid diagnostic test (RDT), in which the participant’s venous whole blood was applied to an on-site rapid test strip (ACON dipstick). Results were recorded as positive, negative or unclear. Antibodies to hepatitis B core antigen (anti-HBc) and to hepatitis B e antigen (anti-HBe) were additionally measured, using a Luminex-based multiplex serology panel20, in stored plasma samples from a randomly selected subcohort of 2000 participants who were alive and cancer-free after two years of follow-up.

Genotyping and genetic variant selection

The present study used a subset of 75,982 samples genotyped using a custom-designed 800K-SNP array (Axiom; Affymetrix). This subset was approximately representative of the overall CKB cohort: samples were selected by box of DNA samples, prioritising individuals from second-resurvey study clinics (which were representative of the cohort) or at random from other recruitment sites. After exclusion of 4945 individuals who were regional population outliers based on genomic principal components analysis within regions, 71,037 people remained for the genetic analyses of chronic liver disease (CLD), and after further exclusion of 1139 participants with missing HBsAg data, 69,898 participants were included in the main genetic analyses of HBsAg positivity (Supplementary Fig. 1). Replication of genetic associations was performed for 18 SNPs previously found to be associated with chronic HBV infection in prior GWAS. These were identified by searching the US National Human Genome Research Institute Catalog of Published GWAS21 for “hepatitis B virus infection trait” (Trait: EFO_0004197, searched August 2020), limiting findings to GWAS reporting SNPs associated with persistent HBV infection or with susceptibility to HBV infection, where the finding had been replicated (either in the same study or in a subsequent GWAS). The studies10,12,13,14,16,22,23,24 reporting these SNPs are summarized in Supplementary Table 12.

Disease outcomes in genetic analyses

CLD was defined as either prevalent or incident liver disease. Prevalent liver disease comprised participants who reported doctor-diagnosed cirrhosis/chronic hepatitis or liver cancer in the baseline survey. Incident liver disease was captured through the electronic health record linkage described above and included cirrhosis [ICD-10: K70, K74], hepatic failure [K72] and liver cancer [C22]. Of the 71,037 participants in the randomly selected GWAS sample, 1600 (2.3%) had CLD at baseline or during follow-up (Supplementary Fig. 1).
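The CLD case definition combines baseline self-report with ICD-10 codes from follow-up linkage. The following is a minimal Python sketch of that logic, not the study's actual code: the `Participant` record and its field names are hypothetical (they are not the CKB data dictionary), and matching each code on its three-character category (K70, K72, K74, C22) is an assumption about how the code list would be applied.

```python
from dataclasses import dataclass, field

# ICD-10 categories in the CLD definition: K70 and K74 (cirrhosis),
# K72 (hepatic failure) and C22 (liver cancer).
CLD_ICD10_PREFIXES = ("K70", "K72", "K74", "C22")


@dataclass
class Participant:
    # Hypothetical fields, not the CKB data dictionary.
    baseline_cirrhosis_or_chronic_hepatitis: bool = False  # doctor-diagnosed, self-reported
    baseline_liver_cancer: bool = False  # doctor-diagnosed, self-reported
    follow_up_icd10_codes: list[str] = field(default_factory=list)  # from record linkage


def has_prevalent_liver_disease(p: Participant) -> bool:
    return p.baseline_cirrhosis_or_chronic_hepatitis or p.baseline_liver_cancer


def has_incident_liver_disease(p: Participant) -> bool:
    # Normalise e.g. "K74.6" to "K746" and match on the three-character category.
    return any(
        code.replace(".", "").upper().startswith(CLD_ICD10_PREFIXES)
        for code in p.follow_up_icd10_codes
    )


def has_cld(p: Participant) -> bool:
    # CLD = prevalent OR incident liver disease.
    return has_prevalent_liver_disease(p) or has_incident_liver_disease(p)


cohort = [
    Participant(baseline_liver_cancer=True),
    Participant(follow_up_icd10_codes=["I10", "K74.6"]),
    Participant(follow_up_icd10_codes=["K76.0"]),  # not in the CLD code list
    Participant(),
]
print(sum(has_cld(p) for p in cohort))  # 2
```

Keeping prevalent and incident disease as separate functions makes it straightforward to report each component alongside the combined CLD count.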

Statistical analysis

Individuals with missing body mass index (BMI) (n = 2) or missing/unclear HBsAg (n = 11,733) data were excluded, leaving 500,991 participants for the main analysis. Prevalence estimates for HBsAg and HBV antibodies by baseline characteristics were standardized by age (10-year categories) and study site, among men and women separately. Logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (CI) for HBsAg positivity associated with a range of baseline characteristics, including demographic, socioeconomic, behavioural and medical risk factors. The basic model was adjusted for age (5-year categories), sex and study site (10 areas). A forward selection method was then used to determine which factors were included in the multivariable model: the likelihood ratio test (LRT) was used to compare the basic model with models sequentially adding each risk factor, and factors that significantly improved model fit were retained (see Supplementary Tables 3 and 4 and Supplementary Methods S1 for details). Because more than 90% of participants consumed vegetables regularly and were currently married, these factors were not included in the multivariable analysis. Tests of trend were performed across ordered categorical variables.
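Direct standardization, as described above, weights each stratum-specific prevalence by that stratum's share of a standard population. Below is a minimal Python sketch, assuming strata defined by 10-year age group and study site. The text does not state which standard population was used (the whole cohort is one plausible choice), so the sketch leaves that to the caller. The function name and the example counts are hypothetical, and the function would be run separately for men and women.

```python
Stratum = tuple[str, str]  # (10-year age group, study site)


def directly_standardized_prevalence(
    cases: dict[Stratum, int],
    totals: dict[Stratum, int],
    standard_pop: dict[Stratum, int],
) -> float:
    """Weighted sum of stratum-specific prevalences, weights from a standard population."""
    std_total = sum(standard_pop.values())
    result = 0.0
    for stratum, std_n in standard_pop.items():
        n = totals.get(stratum, 0)
        if n == 0:
            raise ValueError(f"no participants in stratum {stratum}")
        stratum_prevalence = cases.get(stratum, 0) / n
        result += (std_n / std_total) * stratum_prevalence
    return result


# Hypothetical counts for one sex.
cases = {("30-39", "Haikou"): 10, ("60-69", "Gansu"): 2}
totals = {("30-39", "Haikou"): 100, ("60-69", "Gansu"): 200}
standard = {("30-39", "Haikou"): 300, ("60-69", "Gansu"): 700}
std_prev = directly_standardized_prevalence(cases, totals, standard)
# std_prev ≈ 0.037, whereas the crude prevalence is 12 / 300 = 0.04
```

If the sample's own stratum sizes are used as the standard, the standardized prevalence equals the crude prevalence. This is a quick sanity check on the weighting.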

Genetic associations were analysed using an additive model for individual SNPs. SNPs were oriented so that the risk allele was the allele associated with HBsAg positivity in existing GWAS. Logistic regression models for HBsAg status (positive vs. negative) and CLD (yes vs. no) were fitted within each of the ten study sites, adjusted for age (10-year categories), sex and up to ten region-specific principal components. Inverse-variance-weighted fixed-effect meta-analyses were used to combine the site-specific estimates into overall ORs and 95% CIs. Additional analyses investigated the risk of progression to CLD among HBsAg-positive participants and, in the subset of 2000 people with HBV antibody data, the risk of HBsAg positivity among those exposed to HBV, as indicated by anti-HBc positivity.
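The site-stratified analysis produces one log odds ratio per study site, and these are then pooled. Below is a minimal sketch of inverse-variance-weighted fixed-effect pooling. It assumes the per-site log ORs and their standard errors have already been estimated, for example from the adjusted logistic regressions described above. The function name is hypothetical and is not part of R or PLINK.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(log_ors: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effect pooling of log odds ratios.

    Returns (pooled OR, lower 95% CI, upper 95% CI) on the OR scale.
    """
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("need one standard error per log odds ratio")
    # Each site is weighted by the inverse of its sampling variance.
    weights = [1.0 / se**2 for se in ses]
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for a two-sided 95% interval
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * pooled_se),
        math.exp(pooled_log_or + z * pooled_se),
    )


# Hypothetical per-site estimates for one SNP.
site_log_ors = [math.log(1.5), math.log(1.3), math.log(1.6)]
site_ses = [0.10, 0.20, 0.15]
pooled_or, ci_low, ci_high = fixed_effect_meta(site_log_ors, site_ses)
```

Sites with smaller standard errors, usually the larger sites, pull the pooled estimate toward their own ORs. A fixed-effect model assumes a single common per-allele effect across sites.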

All analyses used R v4.0.2 and PLINK (v2.0)25. We considered a two-tailed p < 0.05 as evidence of an association. To account for multiple testing in the genetic analyses, we applied a Bonferroni correction to the significance level, dividing 0.05 by the 18 SNPs tested (0.05/18 ≈ 0.0028, i.e. approximately 0.003).

Role of the funding source

The funders of the study had no role in study design, collection, analysis or interpretation of data, or in the writing of the report.

Ethical approval and informed consent

The China Kadoorie Biobank (CKB) complies with all the required ethical standards for medical research on human subjects. Ethical approvals were granted and have been maintained by the relevant institutional ethical research committees in the UK and China. UK: Kadoorie Study of Chronic disease in China (KSCDC), Baseline, Oxford Tropical Research Ethics Committee (OxTREC) (2005), OxTREC Ref: 25-04. China: Kadoorie Study of Chronic disease in China (KSCDC), Baseline, Chinese Centre for Disease Control and Prevention Ethical Review Committee (2004), Approval Notice 005/2004. Informed consent was obtained from all participants included in the study.

Results

Baseline characteristics and prevalence of chronic HBV infection

Among the 500,991 participants included, the mean (SD) age was 52.1 (10.7) years, 41.0% were men, 55.4% lived in a rural area and 18.2% had attained primary or middle school education. The overall HBsAg prevalence was 3.0% (n = 15,552) and was higher in men (3.4%) than in women (2.8%). HBsAg prevalence decreased with age, particularly in men (Fig. 1A,B), and varied across study areas (Fig. 1C,D), with the highest prevalence in Haikou in southern China (women: 4.8%; men: 6.4%) and the lowest in Gansu in western China (women: 1.8%; men: 1.9%). Overall, urban sites had a higher prevalence than rural sites among both women (3.3% vs. 2.4%) and men (4.0% vs. 3.1%). HBsAg prevalence was higher among less educated participants, agricultural workers and those with lower household income, and it decreased with increasing number of years of household fridge ownership (Supplementary Fig. 2). Of HBsAg-positive participants, 11.3% reported chronic hepatitis or cirrhosis at baseline, of whom 18.9% were currently receiving treatment (Table 1). In the subcohort with HBV antibodies measured, the overall seroprevalence was 45.0% for anti-HBc and 44.8% for anti-HBe; seropositivity for both antibodies was higher in men than in women and increased with age (Supplementary Table 5).

Figure 1

Prevalence of hepatitis B surface antigen by age, sex and study area. Hepatitis B surface antigen (HBsAg) prevalence (with 95% CI) is standardized by age (10-year categories) and study site (ten sites) among men and women separately, and stratified by urban or rural location. (a) HBsAg prevalence in men by age category; (b) HBsAg prevalence in women by age category; (c) HBsAg prevalence in men by study site; (d) HBsAg prevalence in women by study site. HBsAg, hepatitis B surface antigen.

Table 1 Baseline characteristics of overall cohort and by Hepatitis B surface antigen status.

Conventional risk factors associated with HBsAg positivity

Table 2 shows the associations of a range of conventional risk factors measured at baseline with HBsAg positivity. Participants who were younger, male, resident in urban sites, underweight, without formal education, with a history of blood transfusion or with poor self-reported health at baseline had higher odds of HBsAg positivity than their counterparts, whereas the converse was true for people with higher household income, occasional alcohol intake, longer use of a household fridge and overweight (all p < 0.001). The strongest associations were seen for age (< 40 vs. ≥ 60 years: OR 1.48 [95% CI 1.32–1.66]), sex (male vs. female: 1.40 [1.34–1.46]), study area (urban vs. rural: 1.55 [1.47–1.62]) and self-rated health (poor vs. good: 1.29 [1.22–1.37]). Compared with non-drinkers, occasional and ever-regular alcohol drinkers had lower odds of HBsAg positivity. For BMI, compared with the normal BMI category, underweight participants had higher odds of HBsAg positivity (1.11, 1.03–1.20), whereas overweight participants had lower odds (0.92, 0.89–0.95). Occupation, household size, physical activity, smoking and fruit intake were not associated with HBsAg positivity after multivariable adjustment (Table 2).

Table 2 Odds ratios and 95% CI for baseline factors by Hepatitis B surface antigen status.

Genetic risk factors associated with HBsAg positivity

Of the 68,899 participants included in the genetic analysis, 2069 (3.0%) were HBsAg positive (Supplementary Table 6). Among the 18 SNPs studied, the risk allele frequency (RAF) varied by study site (Supplementary Table 7), with up to a twofold difference for certain SNPs (e.g. rs652888 G allele RAF 0.16 in Henan and 0.31 in Liuzhou). Overall, 17 SNPs were associated with higher odds of HBsAg positivity, 13 of which passed the significance threshold after multiple-testing correction (Table 3). The strongest associations were observed for SNPs at the HLA-DP locus, including rs9277535 (1.48, 1.39–1.58), rs7770370 (1.38, 1.29–1.47) and rs3077 (1.34, 1.25–1.44). Variants rs3130542 and rs2853953 near the HLA-C gene were also associated with HBsAg positivity. Non-classical HLA variants associated with HBsAg positivity included rs652888 in EHMT2 (1.18, 1.10–1.27), rs422951 in NOTCH4 (1.19, 1.10–1.29) and rs12614 in CFB (1.48, 1.26–1.73). Two SNPs at non-HLA loci were associated with HBsAg positivity: rs1883832 near CD40 (1.12, 1.05–1.19) and rs4821116 near UBE2L3 (1.10, 1.03–1.17). SNP rs7000921 did not replicate, although its association was directionally consistent with the previous GWAS.
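Reported RAFs and ORs depend on which allele is counted. In the hedged Python sketch below, the function names and genotype counts are hypothetical. It computes an allele's frequency from biallelic genotype counts. Under the additive model, it then re-expresses that frequency and the per-allele log OR for the GWAS risk allele by taking the complement and negating the log OR. The sketch assumes both alleles are reported on the same strand; strand-ambiguous SNPs would need further checks that are not shown.

```python
import math


def coded_allele_frequency(n_coded_hom: int, n_het: int, n_other_hom: int) -> float:
    """Frequency of the coded allele from counts of the three biallelic genotypes."""
    n = n_coded_hom + n_het + n_other_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    # Each coded-allele homozygote carries two copies, each heterozygote one.
    return (2 * n_coded_hom + n_het) / (2 * n)


def orient_to_risk_allele(
    coded_allele: str, risk_allele: str, coded_freq: float, coded_log_or: float
) -> tuple[float, float]:
    """Return (RAF, per-allele log OR) expressed for the risk allele.

    For a biallelic SNP under an additive model, counting the other allele
    turns the frequency into 1 - f and the per-allele log OR into its negative.
    """
    if coded_allele == risk_allele:
        return coded_freq, coded_log_or
    return 1.0 - coded_freq, -coded_log_or


# Hypothetical site where allele A is counted but the GWAS risk allele is G.
freq_a = coded_allele_frequency(n_coded_hom=490, n_het=420, n_other_hom=90)  # 0.7
raf_g, log_or_g = orient_to_risk_allele("A", "G", freq_a, math.log(1 / 1.18))
# raf_g ≈ 0.3 and math.exp(log_or_g) ≈ 1.18
```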

Table 3 Genetic variant associations of select SNPs with HBsAg positivity and chronic liver diseasea.

Overall, 14 of the 18 SNPs tested were associated with CLD (Table 3), 11 of which had p values < 0.003. The SNPs with the strongest associations included HLA variants rs9277535 (1.28, 1.19–1.38), rs7770370 (1.23, 1.14–1.32) and rs3077 (1.25, 1.16–1.36). SNPs rs9266816 near HLA-DPA3 (1.20, 1.12–1.29), rs7453920 near HLA-DQB2 (1.26, 1.11–1.42) and rs9276370 (1.23, 1.11–1.37) were also associated with CLD, as were the non-classical HLA variants rs1419881 near TCF19 (1.16, 1.08–1.24) and rs421446 near MIR219A1 (1.15, 1.07–1.24).

Among the 2069 HBsAg positive participants, rs652888 was the only SNP showing an association with CLD (n cases = 406; 1.73, 1.12–2.64) (Supplementary Table 8). In analyses of anti-HBc positive participants (n = 769 from the subset of 2000), only rs7453920 showed an association with HBsAg positivity (n = 49; 3.22, 1.35–9.62) (Supplementary Table 9).

Discussion

This large study of Chinese adults from ten geographically diverse areas presents findings on both non-genetic and genetic risk factors associated with chronic HBV infection. While HBsAg prevalence was 3% in the overall cohort, it varied greatly by study site, and younger age, male sex, socioeconomic factors, alcohol intake and BMI were associated with HBsAg positivity. We also replicated findings of existing GWAS for genetic variants previously associated with chronic HBV, and showed that several of these were additionally associated with risk of CLD. To our knowledge, this is the largest population-based study in China to assess both non-genetic and genetic risk factors for chronic hepatitis B infection, and it highlights the range of factors that may contribute to chronic infection.

Several large nationwide surveys in China have previously reported regional variation in HBsAg prevalence. Historically, rates of chronic HBV have been higher in rural, western regions of China, but with widespread urbanization and mass migration of rural workers to large coastal cities and eastern provinces, these patterns have been shifting26. A nationally representative serosurvey27, conducted in 2006 among ≈ 41,000 people aged 1–59 years from 31 provinces and measuring HBsAg in serum samples by ELISA, reported higher HBsAg prevalence in western (8.3%) and rural (7.3%) areas than in eastern (6.5%) and urban (6.8%) areas. However, there was large variation within these broad geographical regions; for example, in Western China, HBsAg prevalence was 11.6% in Sichuan province28 and 3.9% in Gansu province29. A more recent study of 2 million men aged 21–49 years in rural China, enrolled in the National Free Preconception Health Examination Project (NFPHEP) between 2010 and 2012, reported HBsAg prevalence of 7.7%, 5.5% and 6.5% in Eastern, Central and Western China, respectively30, while other large population-based cross-sectional studies, mostly conducted in Eastern China, have reported higher HBsAg prevalence in areas of lower socioeconomic status31, coastal areas32 and areas with a higher proportion of immigrants33. In CKB, we also found large geographic variation in HBsAg prevalence: study sites in southern and eastern China had higher HBsAg prevalence than western and north-eastern sites, and urban sites tended to have higher prevalence than rural sites. This regional variation highlights both the need to draw on populations from diverse areas of China, where the relative importance of correlates of chronic HBV may vary, and the risk that pooled estimates across large regions obscure important intra-regional differences in HBsAg prevalence.
The lower prevalence of HBsAg positivity in CKB than in the 2006 national serosurvey27, which reported an overall HBsAg prevalence of 7.2% (30–60 years: 8.6%), may reflect this regional variation, together with the inclusion in CKB of adults aged over 59 years (among whom HBsAg prevalence is lower) and the lower sensitivity of the HBsAg test used in CKB compared with ELISA34.

The relationship between HBsAg positivity and age has shifted over recent decades: as the proportion of vaccinated younger adults increases, HBsAg prevalence peaks at older ages. For example, the 2006 national serosurvey27 reported peak HBsAg prevalence in 20–29-year-olds (10.5%), another large cross-sectional study of ≈ 87,000 adults recruited in 2009–2010 in Eastern China reported peak HBsAg prevalence (11.6%) in participants aged 35–40 years31, and a third population-based study in 2013 in Western China reported peak prevalence in 53–57-year-olds (10.5%)35. Beyond this peak, prevalence tends to decrease with age, as more of the population undergoes HBsAg seroclearance and a proportion of infected individuals are diagnosed and treated, or die from liver-related disease. The inverse association observed between age and HBsAg prevalence in CKB is consistent with this pattern, as CKB participants were born in the pre-vaccine era and are thus largely unvaccinated.

Higher HBsAg positivity among men than women has been described in past studies, including an absolute difference of about 3 percentage points in the 2006 national serosurvey27 (men 8.6%; women 5.7%) and up to a twofold relative difference in odds of HBsAg positivity in other large population-based studies36,37,38. This is consistent with our finding of 1.4-fold higher odds of HBsAg positivity in men than in women. This sex disparity is hypothesized to be related to a differential HBV-related immune response, in which immune clearance of serum HBsAg is achieved in a higher proportion of women than men, and to women gaining better protection from HBV vaccination39.

Past studies in China have also reported on the association between education level and HBsAg positivity27,36,40,41,42, with most showing an inverse association, consistent with our findings. Findings for occupation have been mixed, although agricultural work has been associated with HBsAg positivity in several past studies27,40,43, consistent with the higher prevalence of HBsAg positivity among agricultural workers in our study, which may reflect geographic variation and socioeconomic status. Two past studies, in Henan37 and Jilin44, also reported no association between smoking and HBsAg positivity, while few studies have examined the association of self-rated health, alcohol intake or BMI with HBsAg positivity. Self-rated health is likely a marker of socioeconomic status, consistent with the higher HBsAg prevalence among participants with lower levels of education described in past studies. Two existing studies reporting on the association between HBsAg and BMI had conflicting findings: one population-based study of ≈ 400,000 adults in Sichuan province found that participants with BMI ≥ 25 kg/m² were significantly more likely to be HBsAg positive than normal-weight (BMI 18.5–25 kg/m²) counterparts (OR 1.08, 95% CI 1.05–1.11)44, whereas the other45, in ≈ 3500 adults in Shanghai, reported an inverse association between BMI and odds of HBsAg positivity, with participants with BMI ≥ 28 kg/m² being 49% (95% CI 6–72%) less likely to be HBsAg positive than participants of normal weight. The association we observed between HBsAg positivity and BMI is consistent with this latter study, and may reflect socioeconomic status or reverse causation, whereby participants with chronic HBV may have lost weight in the course of their illness. Furthermore, a U-shaped association between BMI and cirrhosis has previously been described in CKB46.
Two past studies of Chinese adults, conducted in Sichuan and Guangdong provinces, reported lower risk of HBsAg positivity among occasional or low-to-moderate alcohol drinkers than among never drinkers35,47, whereas another study32, conducted in Zhejiang province, found that any drinking was associated with a 30% (27–34%) higher risk of HBsAg positivity than no drinking. The apparent protective association between alcohol intake and HBsAg positivity in our study may reflect changes in drinking behaviour among people who know they are HBV positive or have CLD, for whom abstaining from alcohol may be recommended.

Since the first GWAS on chronic HBV was conducted in 2009, the set of SNPs significantly associated with chronic HBV has expanded from SNPs at HLA class II loci to include SNPs at HLA class I loci, non-classical HLA SNPs and non-HLA SNPs. Most previous GWAS were based on diagnosed clinical conditions such as CLD or liver cancer10,12,13,14,16,22, so participants with other HBV phenotypes, such as less severe disease or chronic HBV without progression to liver disease, may be under-represented. Furthermore, although most GWAS have been performed in participants of East Asian ancestry, several used populations from particular geographic areas, and they tended to be modest in size, ranging from ≈ 4000 people16 to ≈ 15,000 people23. Our study included > 65,000 participants and replicated the associations of 17 SNPs with HBsAg positivity. We did not replicate rs7000921 (INTS10), previously reported in a case–control study of ≈ 9500 people of Chinese ancestry24. However, the phenotype examined in that study was HBsAg positivity among anti-HBc-positive individuals, which we had limited power to explore because of the small size of the subcohort with anti-HBc data.

Existing evidence suggests there is little overlap between SNPs associated with HBsAg positivity and those associated with progression to HBV-related liver disease: a systematic review of published SNPs associated with different HBV phenotypes found that overlap occurred between SNPs associated with HBV positivity and those associated with HBV vaccine response, rather than with disease progression11. Most past GWAS on disease progression have examined progression to HCC among HBsAg-positive participants. These differences in population and phenotype may explain our finding that 14 of 18 SNPs were associated with CLD: we examined CLD broadly among all participants regardless of HBsAg status, and had limited power to investigate progression to CLD among HBsAg-positive participants.

The strengths of this study include its large size, its coverage of diverse geographic areas, its inclusion of both middle-aged and older adults, and the breadth of information available, which enabled investigation of a wide range of both conventional and genetic factors associated with HBsAg positivity. To date, research on risk factors for chronic HBV has focused on factors related to mother-to-child transmission and age at infection, while evidence on associations of socioeconomic, behavioural and medical factors with chronic HBV among adults is lacking. Given that most of the burden of chronic HBV-related disease occurs in middle-aged and older adults, and the low diagnosis rate of chronic HBV in China, our findings help fill this evidence gap. However, our study also has several limitations. First, the RDT HBsAg test has lower sensitivity than laboratory-based tests such as the ELISA used in most existing smaller studies34, likely leading to underestimation of HBsAg prevalence, which may be more pronounced in those with lower viral load, such as older participants. Second, because other hepatitis markers (e.g. anti-HBc, e-antigen) were not available for the whole cohort, we could only compare HBsAg-positive with HBsAg-negative individuals. We were therefore unable to detect individuals with occult infection or to investigate other phenotypes of interest, such as HBsAg seroclearance. Although this approach is consistent with that of past GWAS12,14,22, several others were able to investigate HBsAg clearance among a cohort of exposed participants13,23,24. Third, we investigated SNPs identified in previous GWAS in different populations, mainly from the HLA region, but did not further explore the likely multiple independent effects of SNPs in this part of the genome, which may vary among populations. Nonetheless, our work adds to the evidence on the likely association between HLA variation and HBsAg positivity in Chinese populations.
Fourth, we did not have information on other relevant risk factors, including drug use, number of sexual partners and vaccination status, or on viral subtype, which is an important source of disease heterogeneity. Finally, this is a cross-sectional study of associations between a range of non-genetic and genetic factors and prevalent chronic HBV, and it does not capture risk of incident infection.

In summary, this study adds to current knowledge of factors associated with HBV chronicity in adults, which may help to inform targeted HBsAg screening, improving diagnosis and bringing more infected individuals into the HBV care continuum. Future research combining conventional and genetic risk factors, including viral genotypes, could further improve knowledge about the risk of HBV chronicity and disease progression.