Introduction

More than half of the hepatocellular carcinoma (HCC) cases worldwide occur in Southeast Asia and sub-Saharan Africa, where hepatitis B virus (HBV) infection is hyperendemic. The HCC risk is extremely high for chronic HBV carriers, with a relative risk of 20 compared with non-carriers.1 However, fewer than one-quarter of chronic HBV carriers develop HCC in their lifetime.2 The heterogeneity among chronic HBV carriers in rates of progression to HCC results from many factors, including viral factors (such as viral variants and replication activity), environmental conditions, and host factors.3, 4, 5, 6, 7, 8, 9, 10, 11

In Taiwan, where hepatitis B is hyperendemic, 15% of patients with HCC have first-degree relatives with a history of HCC. A familial tendency is one of the risk factors for HCC.2, 12 Several candidate gene-association studies on HBV-related HCC have been carried out, but have not found any genetic difference that can account for the underlying mechanisms of heritable causes for the familial clustering of HCC.7, 8, 11, 12, 13, 14, 15, 16 Using linkage analysis with a dense map of microsatellite markers, we recently identified an HCC-susceptibility locus in the vicinity of the marker D4S3240 on chromosome 4q25.17 Although this linkage region has been identified as a region of frequent allelic loss in HCC,18 only a polymorphism in the epidermal growth factor gene on 4q25 has been shown to predispose individuals to HCC among cirrhotic patients who were mostly affected by factors other than HBV.19

In this study, we used family-based data with single-nucleotide polymorphisms (SNPs) to search for evidence of association within the 4q25 linkage region. We provide evidence for an association of SNPs and haplotypes in the 3′-phosphoadenosine 5′-phosphosulfate synthetase-1 (PAPSS1) gene with HCC and show that this association explains our earlier reported linkage peak.17 Using death registry data and another set of unrelated cases, we further show that PAPSS1 SNPs are associated with serum α-fetoprotein (AFP) levels and HCC survival.

Materials and methods

The research ethics committee at the College of Public Health, National Taiwan University approved this study, and all subjects provided written informed consent.

Patients

Since 1997, patients with HCC were consecutively recruited from Chang Gung Memorial Hospital, Taipei Veterans General Hospital, and National Taiwan University Hospital, three major teaching hospitals that provide primary to tertiary medical care in Taiwan.2 The diagnosis of HCC was based on either histological or cytological findings, or on elevated serum AFP levels (≥100 ng/ml) combined with at least one positive liver image on arteriography, sonography, and/or computed tomography. By 2004, 3204 patients, who represented 90% of those originally contacted, had been recruited; each completed a questionnaire interview and provided DNA samples extracted from peripheral white blood cells and/or buccal swabs. Dates of death were obtained through annual data linkage with computer files of the national death registry system.

Family sample

Relatives aged 25–75 years were recruited through the index cases. All of the participating relatives were interviewed in person, using a structured questionnaire, and underwent a clinical assessment at the time of enrollment. Relatives with HCC were ascertained through an annual follow-up examination and data linkage with computer files of the national cancer and death registry system. The clinical evaluation that was performed at study entry and during follow-up comprised abdominal ultrasonography, conventional liver biochemical tests, serological tests for HBV surface antigen (HBsAg) and antibodies against hepatitis C virus (anti-HCV) (only at the time of entry), anthropometry, and six tests of cardiovascular disease risk factors.

Families that met the following criteria were included in this study: (1) at least one blood relative with HCC available for genotyping (multiplex families); (2) both parents available for genotyping (trios); or (3) one parent and at least two unaffected siblings available for genotyping. The data set comprised 1003 genotyped individuals (295 affected and 708 unaffected individuals) from 240 families (74 multiplex and 166 singleton families) (Table 1).

Table 1 Characteristics of the 240 study families with HCC

Independent case series

To determine whether genetic variants in the critical region were also associated with patient subgroups and outcome, an additional set of unrelated HBsAg-positive cases under the age of 75 years was selected from the database of 3204 HCC patients for the purpose of survival analysis. This series comprised the first 912 patients in the database list who met the inclusion criteria. Our analysis was restricted to HBsAg carriers, who represented 87.5% of the affected individuals in the family sample; hence, we could compare the disease-specific survival between patients unselected for family history and those from families with HCC.

Selection of SNPs and genotyping

SNPs were identified from public databases (http://www.hapmap.org/ and http://snp.ims.u-tokyo.ac.jp/). First, we genotyped 43 SNPs, with an average inter-marker distance of 146.63 kb, across a 6.16-Mb region that lay within a 1-heterogeneity LOD (logarithm of odds) score drop from the linkage peak (at marker D4S3240) on 4q25 in the family sample.17 When associations with HCC were identified for two SNPs located between 108.69 and 108.79 Mb, an additional 24 SNPs in the surrounding region were tested for high-resolution association mapping. For the independent case series, we selected a subset of SNPs that captured all other untyped SNPs in the critical region.

SNPs were genotyped using the TaqMan technology implemented on an ABI PRISM 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's protocol. Genotypes of missing relatives were inferred from the available genotypes of more than three offspring and a spouse. All genotypes were checked for inconsistent Mendelian inheritance using the PedCheck software,20 and all inconsistencies were eliminated by either retyping or removal from the analysis.

Sequencing

To identify any missense SNPs that might be of functional significance in the implicated region, we re-sequenced exons 6 to 12 of the PAPSS1 gene in DNA from 48 unrelated individuals (including 40 HBsAg-positive cases and 8 unaffected HBsAg carriers randomly selected from our study subjects). This sample size (96 chromosomes) was deliberately chosen to give a probability of at least 95% of detecting common variants with a frequency of >5%. The coding regions and at least 60 bp of each flanking intronic sequence were amplified using polymerase chain reaction and sequenced using conventional dye-primer sequencing on the ABI3730 DNA Analyzer System (Applied Biosystems).
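The detection power quoted for the re-sequencing sample can be checked with a short calculation. The sketch below assumes that each of the 96 chromosomes independently carries the variant with probability equal to its population frequency, which ignores genotyping failure and any relatedness; the function name is illustrative and is not part of the original analysis.

```python
def detection_probability(allele_freq: float, n_chromosomes: int) -> float:
    """Probability of seeing a variant at least once among the sampled chromosomes."""
    # P(no copy observed) = (1 - f)^n under the independence assumption.
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    # 48 individuals give 96 chromosomes; the variant frequency is set to 5%.
    print(detection_probability(0.05, 96))
```

Under this assumption, a variant at 5% frequency would be missed in all 96 chromosomes less than 1% of the time, so the 95% target is met with some margin.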

Statistical methods

A Hardy–Weinberg disequilibrium test was performed for each SNP in 270 unrelated founders, using the SAS software with 10 000 Monte Carlo simulations to obtain the exact P-value estimates. The Haploview version 4.1 software was used to estimate pairwise linkage disequilibrium (measured as Lewontin's standardized disequilibrium coefficient, D′). Haplotype blocks were defined using the default confidence interval (CI) algorithm.21 We performed the pedigree disequilibrium test (PDT) as implemented in the PDTPHASE program within the UNPHASED (version 2.404) suite of programs to examine associations between SNPs or haplotypes and HCC.22 The reported P-values were based on the PDT-sum statistic. In addition to PDT, we also applied the family-based association test (FBAT) and the haplotype-based association test using the FBAT software version 1.7.2.23 The complementary use of a second method reduces the chance of false positive findings. We calculated both nominal P-values and empirical P-values derived from 10 000 simulated replicates. The Benjamini-corrected false discovery rate was used to correct for multiple testing.24 After adjustment for multiple testing and with a false discovery rate of ≤0.05, the cutoff for a significant association was P=0.0075 for single-locus analyses. The GIST (Genotype identity-by-descent sharing test) was performed using affected-sibship data to test whether our earlier described evidence for linkage on 4q25 could be explained by any of the tested SNPs.25 We used conditional logistic regression on sibship data with family as the stratification variable to estimate the odds ratio and 95% CI for the association between implicated haplotypes and HCC.
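The false discovery rate cutoff can be illustrated with a minimal sketch. It assumes the standard Benjamini and Hochberg step-up rule, which the text does not spell out, and it uses hypothetical P-values rather than the study data. The final line shows that, if all 67 genotyped SNPs are counted as tests and 10 are declared significant at a rate of 0.05, the rule gives a cutoff close to the reported P=0.0075; that reading of the cutoff is an inference, not something stated in the methods.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (number of rejections k, P-value cutoff k/m*q) for the BH step-up rule."""
    m = len(p_values)
    k = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * q:
            k = rank  # keep the largest rank that satisfies the rule
    return k, k / m * q


# Hypothetical P-values, for illustration only (not the study data).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
k, cutoff = benjamini_hochberg(example)

# Assumed setting: 10 rejections among 67 single-locus tests at q = 0.05.
cutoff_67 = 10 / 67 * 0.05
```

In the hypothetical list, only the two smallest P-values fall under their rank-specific thresholds, so two tests are rejected.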

Survival was calculated from the date of hospital admission to the last reported search for death entries (31 December 2005) in the national death certification system. Patients whose causes of death were not due to HCC or cirrhosis were censored in the survival analysis. The Kaplan–Meier method was used to generate survival curves, and the Wilcoxon test was used to compare the survival curves between groups. Hazard ratios were calculated using the Cox regression model with adjustment for putative risk factors and clinical features. The analyses were performed using the SAS version 9.1 software (SAS Institute, Cary, NC, USA). All statistical tests were two-sided.
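As a reminder of how the Kaplan–Meier estimate handles censored observations, the following is a minimal pure-Python sketch; the study itself used SAS, and the follow-up times and event indicators here are hypothetical. An event flag of 1 stands for death from HCC or cirrhosis, and 0 stands for censoring (death from another cause or alive at the end of follow-up).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and event flags, for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects never lower the curve directly; they only shrink the risk set used at later event times.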

Results

Characteristics of study family and SNPs

Of the 240 study families, 212 (88.3%) were ascertained through HBsAg-positive index cases. HCC occurred more frequently in males than in females. The mean age (±SD) at diagnosis of HCC was 43.7 (±11.3) years. A majority of the subjects were HBsAg carriers (87.5% affected and 39.6% unaffected individuals). The positive rate of the antibodies against hepatitis C virus was 10.9% in the affected subjects and 3.5% in the unaffected (Table 1). None of the 67 genotyped SNPs showed evidence of deviation from the Hardy–Weinberg equilibrium. The minor allele frequency for these SNPs was between 0.067 and 0.50 (Supplementary Table 1).

Family-based single-locus association analysis

The initial assessment of the 43 SNPs by PDT revealed evidence of association for 5 SNPs (rs1514728, P=0.0025; rs2271590, P=0.0032; rs12512339, P=0.0350; rs2704108, P=0.0371; and rs2739206, P=0.0421), of which the 2 most significant SNPs resided in the region flanking or within the PAPSS1 gene. This prompted us to analyze an additional 24 SNPs in a 300 kb interval (108.63–108.93 Mb) with the PAPSS1 gene as the principal target. We then identified 8 other SNPs with nominal P-values from 0.0004 to 0.0063 (permuted P=0.0002–0.0045). The 10 significant SNPs in the region surrounding PAPSS1 remained significant after correction for multiple testing. Although the level of significance was lower in the FBAT analysis than in the PDT results, the same pattern of association was observed (Table 2).

Table 2 Allele tests for associations between SNPs in a 127.45-kb interval at 4q25 and HCC

With the exception of rs9569, which is in the 3′-untranslated region of PAPSS1, 9 of the 10 HCC-associated SNPs were in the intronic or intergenic region without any known function. We applied specialized algorithms (including miRBase,26 microRNA.org,27 PupaSuite,28 and miRNAMap29) with the default parameters included in the software to investigate whether rs9569 resides in a microRNA target site. The results indicated that microRNAs, hsa-mir-520d-5p and hsa-mir-421, can bind to a site containing rs9569.

SNPs potentially explaining the observed linkage

According to the GIST analysis under the additive weighting scheme, the genotypes observed for 6 of these 10 implicated SNPs located in or around PAPSS1 could be responsible for the linkage of HCC to marker D4S3240 on 4q25 in our sample (Table 2). Investigating the same issue using a dominant or recessive model yielded similar results (data not shown).

Linkage disequilibrium structure

The region associated with HCC spanned 127.45 kb, extending from intron 5 to the 3′-flanking region of PAPSS1. Two highly correlated haplotype blocks (multiallelic D′=0.72 between blocks 2 and 3) were identified across this region (Figure 1).

Figure 1

Schematic representation of the PAPSS1 gene showing locations of the ATP-sulfurylase and adenosine 5′-phosphosulfate (APS) kinase domains and the haplotype blocks. Values for D′ (×100) are shown. The five-color scheme (white to black) represents the increasing strength of linkage disequilibrium. SNPs significantly associated with HCC after multiple testing adjustments are indicated with an asterisk (*).

The fractions of all common (minor allele frequency ≥0.05) SNPs (Han Chinese HapMap release #23a) that were captured by the tested SNPs were 0.95 and 0.86 for blocks 2 and 3, respectively. The haplotype block structure surrounding PAPSS1 in our sample is comparable with the structure defined using the HapMap SNPs. To confirm the critical region for the HCC-susceptibility locus, we then searched the HapMap data to identify additional SNPs that were in linkage disequilibrium (r²≥0.8) with any of the SNPs in blocks 2 and 3. Although an exhaustive search with a 1.25-Mb window surrounding PAPSS1 was carried out, none of the SNPs outside the critical region met this criterion (data are available from the author on request).
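The two linkage disequilibrium measures used in this section, D′ and r², can be computed for a pair of biallelic SNPs from one haplotype frequency and the two allele frequencies. The sketch below uses the textbook definitions and hypothetical frequencies; it is not the Haploview implementation, and it assumes both SNPs are polymorphic.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (|D'|, r^2) from the A-B haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 requires 0 < p_a < 1 and 0 < p_b < 1.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele B occurs only on haplotypes carrying A.
d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.6, p_b=0.2)
```

In this hypothetical pair, D′ is 1 while r² is only about 0.17, which shows why a tagging threshold such as r²≥0.8 is stricter than high D′ when allele frequencies differ.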

Family-based haplotype association analysis

To prioritize regions of interest, we also performed sliding window haplotype analysis of varying sizes from 2 to 4 SNPs per window on SNPs 15–32, which covered the associated region. The same pattern was observed using sliding windows of 2–4 SNPs. Multiple haplotypes showed evidence for association with HCC. The haplotype at SNPs 29–30–31 revealed the minimum P-value across all windows (global permutation P=0.0012, with the corrected P-value approaching significance (cutoff P-value=0.0010) after multiple testing adjustment for 48 tests) and high significance for both PDT and FBAT (Figure 2).
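The count of 48 tests and the cutoff of 0.0010 can be reproduced by enumerating the windows. The sketch below assumes that every contiguous window of 2, 3, or 4 SNPs across SNPs 15–32 was tested once and that the cutoff is a Bonferroni-style 0.05 divided by the number of windows; the text reports the numbers but not this derivation.

```python
def sliding_windows(snps, sizes):
    """List every contiguous window of the given sizes over an ordered SNP list."""
    return [
        tuple(snps[start:start + size])
        for size in sizes
        for start in range(len(snps) - size + 1)
    ]


snps = list(range(15, 33))            # SNPs 15 to 32 inclusive (18 markers)
windows = sliding_windows(snps, sizes=(2, 3, 4))
n_tests = len(windows)                # 17 + 16 + 15 windows
cutoff = 0.05 / n_tests               # assumed Bonferroni-style cutoff
```

With 18 markers there are 17, 16, and 15 windows of sizes 2, 3, and 4, which sum to 48, and 0.05/48 rounds to 0.0010.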

Figure 2

Haplotype analyses for all three-SNP sliding windows over blocks 2 and 3. Global nom P, global nominal P-value; Global per P, permutation-based global haplotype P-value. Significant P-values (0.05 or less) are shown in boldface. PDT, pedigree disequilibrium test; FBAT, family-based association test.

Effect of PAPSS1 SNPs

To estimate the magnitude of all potential effects on the risk of HCC associated with PAPSS1 genetic variants, we performed a conditional logistic regression on 145 sibships discordant for HCC (183 affected and 433 unaffected individuals) stratified by family. The matched odds ratios of HCC were calculated for haplotypes of SNPs 29–30–31, the marker combination leading to the lowest global P-value in the PDT/FBAT analyses. The odds ratios for heterozygotes and homozygotes of the C–T–T haplotype were 1.55 (95% CI=0.81–2.98) and 3.41 (95% CI=1.36–8.53), respectively, as compared with that for individuals without this haplotype.
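The reported intervals can be sanity-checked on the log scale. The sketch below assumes Wald-type 95% intervals that are symmetric around the log odds ratio, which is the usual output of conditional logistic regression but is not stated in the text; under that assumption the geometric mean of the bounds should reproduce the point estimate, and the interval width gives the standard error of the log odds ratio.

```python
import math


def log_scale_summary(ci_low: float, ci_high: float, z_crit: float = 1.96):
    """Return (standard error of log OR, geometric midpoint) from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    midpoint = math.sqrt(ci_low * ci_high)
    return se, midpoint


# Reported intervals for the C-T-T haplotype at SNPs 29-30-31.
se_hom, mid_hom = log_scale_summary(1.36, 8.53)   # homozygotes, OR = 3.41
se_het, mid_het = log_scale_summary(0.81, 2.98)   # heterozygotes, OR = 1.55
```

Both midpoints fall within rounding error of the reported odds ratios, so the intervals are internally consistent with the point estimates.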

We selected 10 SNPs (including 6 significant SNPs identified in the family sample), which captured 100% of the genetic variation of the 16 SNPs in the critical region (blocks 2 and 3), for survival analysis in the independent series of 912 unrelated cases. Several SNPs showed associations with increased AFP levels (SNP17, P=0.0459; SNP19, P=0.0152; SNP27, P=0.0261; and SNP30, P=0.0145) or with worse survival (SNP17, P=0.0173; SNP19, P=0.0183; SNP22, P=0.0325; and SNP30, P=0.0076) in cases with a small tumor (≤2 cm) present at hospital admission. Consistent with the family study, SNP30 showed the strongest association. The C–T–T haplotype at SNPs 29–30–31 was also associated with an elevation of the serum AFP levels (Table 3). None of the haplotypes were associated with tumor size or number of lesions (data not shown).

Table 3 Odds ratios (ORs) of haplotypes of SNPs 29–30–31 in the PAPSS1 gene with the risk of HCC and elevated serum AFP levels (ng/mL)a

Characteristics and clinical features of the cases included in the survival analysis are shown in Table 4. The median age at diagnosis of HCC was lower in non-familial cases from singleton families than in familial cases and in the independent set of unrelated cases unselected for family history. Among unrelated cases with a small tumor (≤2 cm) present at hospital admission, those with haplotype-1 (C–T–T at SNPs 29–30–31) had an increased hazard ratio of 1.74 (95% CI=1.12–2.71) for death when compared with cases without haplotype-1. For familial cases with a small tumor, the hazard ratio of haplotype-1 for death was 4.25 (95% CI=1.01–17.86) (Figure 3). We did not find an association between survival and haplotype-1 in non-familial cases or in cases with tumor >2 cm.

Table 4 Characteristics of HCC cases in the survival analysisa
Figure 3

Haplotype 1 (Ht1, C–T–T at SNPs 29–30–31) and HCC survival among cases with small tumor (≤2 cm) present at the time of hospital admission. Unrelated cases (a) are an independent series of cases. Familial (in multiplex families) (b) and non-familial (in singleton families) (c) cases are from the family sample. The 3-year survival (95% confidence interval (CI)) for each group of cases, Wilcoxon P-value and hazard ratios (HRs) of death for cases with Ht1 versus those without Ht1 are presented. HRs were adjusted for age (continuous variable), sex, cigarette smoking (yes versus no), alcohol abuse (≥80 versus <80 g/day), anti-HCV status, serum α-fetoprotein levels (>400 versus ≤400 ng/ml), and number of lesions (single versus multiple).

Sequence analysis

No missense variants were detected by re-sequencing all coding regions of exons 6–12 of PAPSS1 among 48 individuals, 40 of whom were cases. Five non-coding/silent variants were found, including three earlier reported SNPs and two insertion events (Supplementary Table 2).

Discussion

In this study, we detected an association of the PAPSS1 gene with HCC in a family sample. This gene is located 1.15 Mb away from the linkage peak at D4S3240 in the 4q25 region. Several points suggest that our results may reflect a genuine association between PAPSS1 and HCC. First, the association was identified using family-based association analysis, which is robust against false positive associations arising from cryptic population stratification.30 Second, multiple SNPs, which belonged to two correlated haplotype blocks, were associated with HCC, and the P-values for 10 SNPs remained significant after correction for multiple testing. Third, we showed that the observed associations with HCC for multiple PAPSS1 SNPs at least partially accounted for the linkage signal on 4q25.

In addition, the observed association between PAPSS1 SNPs and HCC is unlikely to be due to long-range linkage disequilibrium with neighboring genes, for two reasons. The structure of the haplotype blocks in the critical region, where multiple SNPs were found to be associated with HCC, is comparable with that of the blocks defined by HapMap in the Han Chinese population; and we did not detect any r²≥0.8 between alleles of SNPs in the critical region and those in other genes within a 1.25-Mb window surrounding PAPSS1.

We also found an association for a PAPSS1 haplotype with high serum AFP levels and with poor HCC survival in an independent set of HBsAg-positive cases with small tumor present at hospital admission. Homozygosity for this haplotype increased the risk of developing HCC threefold, estimated by using the case-sibling matched design. Interestingly, familial cases with small tumor also had worse survival associated with harboring the same haplotype. It is unclear why no such association was observed in non-familial cases from singleton families. However, these cases were substantially younger than the other two groups of cases. Their survival also seemed to be longest among cases.

The PAPSS1 gene contains 12 exons, including the adenosine 5′-triphosphate (ATP) sulfurylase and adenosine 5′-phosphosulfate kinase domains. PAPSS1 catalyzes the synthesis of PAPS through reactions of ATP sulfurylase and adenosine 5′-phosphosulfate kinase. There are two PAPSS isoforms, PAPSS1 and PAPSS2. PAPSS1 is expressed in virtually all human tissues.31, 32 PAPS is a co-substrate for all sulfation reactions, which is a key step in the metabolism of a broad range of compounds, of which some have been implicated in cancer; these include bile acids, phenolic xenobiotics, endogenous and exogenous estrogens, and certain classes of environmental carcinogens, such as hydroxylated aromatic amines.33, 34, 35 In addition, sulfation of extracellular matrix is of crucial importance for a multitude of processes, such as extracellular signaling and adhesion of leukocytes to endothelial cells at the site of inflammation.34, 36 It was recently shown that PAPSS1 influenced retroviral replication by affecting a step during provirus establishment.37

Notably, the multiple SNPs associated with HCC clustered within the PAPSS1 ATP sulfurylase domain, which includes exons 6–12 and the 3′-untranslated region. Two rare missense variants in this domain have been reported in other populations, one of which can change in vitro substrate kinetic properties.32 We did not find any missense variant in the ATP sulfurylase domain. The lack of coding SNPs suggests that regulatory SNPs that influence gene expression might be involved, although we may not have re-sequenced enough cases to identify the functional, causal variants. In this study, we identified a significant association with HCC for the 3′-untranslated region SNP rs9569, which resides in a putative microRNA binding site. Given its potentially functional nature, rs9569 may be the most likely causal variant for HCC in our implicated region. However, the haplotypes that are significantly associated with HCC almost completely span the critical region, and we have not been able to implicate a single causative variant because of the tight linkage disequilibrium in the region. By performing a sliding window haplotype analysis, we found that the most significant association was at SNPs 29–30–31, of which only SNP30 showed a significant single-locus association with HCC. Therefore, the association with the C–T–T haplotype might be derived from SNP30. Taken together, the haplotype-based analysis supports the single-SNP associations with HCC.

In conclusion, we have shown that polymorphisms within the ATP sulfurylase domain of the PAPSS1 gene are associated with HCC in a family sample enriched with familial or early-onset cases. We have also shown that an allelic combination in a haplotype significantly increases HCC susceptibility and is associated with a poor prognosis of HCC. These associations were seen mostly in HBV carriers. Considering the recent discovery that PAPSS1 plays a role in retroviral replication,37 PAPSS1 polymorphisms may modify the response to viral infection, and it is plausible that a similar mechanism involving PAPSS1 operates in HBV infection. Our study is the first to suggest the importance of PAPSS1 in HCC. Future genetic and functional studies are necessary to further validate our findings.