Introduction

Hepatocellular carcinoma (HCC) is, worldwide, the sixth most common cancer and the third leading cause of cancer-related mortality, resulting in 600 000 deaths annually.1 HCC is a heterogeneous disease with a multistep pathogenesis, and several risk factors are mechanistically implicated in its development, including infection by hepatitis B (HBV) and hepatitis C (HCV) viruses. HCV-related HCC has emerged as a growing public health concern in both the developed and developing worlds, with cost being a major impediment to the widespread use of new, effective antiviral therapies.2 HCC is commonly associated with decades of chronic inflammation and cirrhosis, but can occur in as many as 20% of individuals without cirrhosis.3 Inflammation of the liver by itself does not appear to be sufficient to induce HCC, as the tumour rarely co-occurs with chronic autoimmune hepatitis.4 In animal models, the expression of individual HCV proteins has led to inflammation-independent tumour induction, indicating that HCV infection alone may be enough to promote oncogenesis.5, 6 As ~75% of HCCs are multifocal,7 the model that emerges is one in which hepatocytes throughout the liver have acquired a similar increased potential for tumorigenesis, a ‘field defect’ previously suggested by altered transcription patterns in the non-malignant areas of livers in individuals with HCC.8

The mechanism by which an oncogenic virus induces a field defect and cancer susceptibility is not known. While DNA sequence mutations characterize the malignant HCC,9, 10, 11, 12 there is sparse evidence for extensive mutational events in infected, non-neoplastic hepatocytes.13 The widespread transcriptional changes of these infected hepatocytes8, 14 indicate disturbances with broad effects in the genomes of these cells. Epigenetic regulatory mechanisms represent an attractive candidate group of mediators of such a field defect, given their potential for widespread transcriptional effects (reviewed previously15, 16), and their observed dysregulation in malignancies (reviewed previously17) and pre-malignant neoplasias.18, 19, 20 Locus-specific epigenetic events are believed to occur early during hepatocarcinogenesis,21 and several studies have found DNA methylation changes at a limited number of loci tested in pre-neoplastic lesions associated with HCV infection.22, 23 More recent studies have investigated genome-wide epigenetic changes during the progression from HCV-associated cirrhosis to HCV-related HCC, finding potential prognostic value in identifying aberrant patterns of DNA methylation at specific gene promoters, CpG islands and CpG island shores.24, 25

The most commonly used technique for studying the epigenome in cancer is the measurement of DNA methylation at as many sites as possible throughout the genome. Changes in DNA methylation can be global, with a general decrease in levels exemplified by colorectal carcinomas (reviewed previously26), but can also be locus-specific, as found with silencing of tumour-suppressor genes involving the abnormal acquisition of DNA methylation at their promoters.27, 28 Local changes in DNA methylation are associated with changes of nucleosome positioning at cis-regulatory elements in cancer.29 A further observation in non-neoplastic cells is that changes of DNA methylation occur in association with the presence or absence of transcription factor (TF) binding at that locus.30, 31 The causality of the relationship between DNA methylation and transcription factor binding is complex,15, 32, 33 but the model linking these events is relatively straightforward: DNA methylation changes at cis-regulatory elements (which are defined by transcription factor binding) represent a footprint of changes of transcription factor binding. Distal (non-promoter) cis-regulatory elements are also being recognized as the more informative loci for DNA methylation changes associated with transcriptional regulatory functions,34, 35, 36 cancers37, 38 and other diseases.39

A problem in interpreting DNA methylation changes from genome-wide assays is that the locations of distal cis-regulatory elements are very variable between cell types40, 41 and not readily interpretable from fixed genomic annotations like the positions of genes, CpG islands or their shores.42 CpG islands, regions with unusually high density of CG dinucleotides,43, 44 have been found to acquire increased DNA methylation in some tumours45, 46 (reviewed previously47, 48). They also correspond to bivalent chromatin domains in pluripotent cells,49 loci that are targets of the polycomb repressive complex in early development that are believed to be unusually prone to the acquisition of DNA methylation later in life as part of oncogenesis.50, 51, 52 As genome-wide DNA methylation assays have begun to test more sites in the genome, it is now becoming apparent that DNA methylation changes appear to target cis-regulatory sites more frequently outside than within CpG islands.42, 53, 54 We have previously shown that CpG island shores, where DNA methylation is unusually informative for changes associated with cancer or local transcription levels,40, 42, 55, 56 are enriched for chromatin features indicating enhancer functions.57 However, these would represent only a proportion of the cis-regulatory elements within a cell, with many regulatory sequences located far from genes or CpG islands. To overcome this problem, publicly accessible data have been generated in which chromatin features have been mapped in primary cells, tissues or cell lines, allowing some insights into where cis-regulatory loci are likely to be located when studying a tissue like liver, and improving the ability to interpret changes of functional consequence.

The extent and significance of transcriptome dysregulation in HCV-associated HCC is relatively unknown, but gene expression analyses of HCCs from multiple etiologies show non-random effects, selecting certain genes with associated protein functions and pathway relationships. These include associations between poor survival and signatures related to aberrant cell signalling,58, 59, 60 cell proliferation61, 62, 63, 64 and differentiation,59, 65, 66, 67 inflammation,64 and angiogenesis.58, 63, 68 Transcriptional studies of the pre-neoplastic liver tissue have found altered expression of genes associated with inflammation,69, 70 hepatic progenitor cells71 and pathways involving Wnt/β-catenin72 and hedgehog14 signalling. These results suggest that a study of transcriptional regulatory processes in HCV-associated pre-neoplastic and malignant tissues should show functional coherence in terms of the loci targeted for dysregulation.

In this study, we use patient biopsies from uninfected individuals and from individuals with HCV infection and HCC, allowing us to compare HCV-infected non-neoplastic and HCV-infected malignant samples with control liver samples, creating a representation of the natural course of the disease. We integrated high-resolution DNA methylation studies with transcriptional profiling and cell-type specific reference epigenomic profiles, generating evidence for DNA methylation changes targeted to the binding sites of a group of specific transcription factors. These loci with altered DNA methylation are enriched near genes encoding proteins already associated with HCC but also include a group with stem cell functions. We find the DNA methylation changes to occur predominantly at the pre-neoplastic stage, with continued functional consequences in HCV-infected tumour tissue, and an association with independent polycomb-mediated repression at a subset of loci. Data from The Cancer Genome Atlas (TCGA)73 are used as a replication cohort to validate our findings. The study reveals a relationship between the dysregulation of transcription factor activity, DNA methylation and transcription occurring before malignant transformation, potentially contributing to the field defect that characterizes the chronically HCV-infected liver.

Results

DNA methylation changes in HCV-infected livers precede neoplastic transformation

We tested DNA methylation at ~2 million CpGs throughout the human genome in control (uninfected, non-neoplastic), infected (HCV+, non-neoplastic) and malignant (HCV+, HCC) samples (Supplementary Table S1). The infected and malignant samples were from different sites within the same individual’s liver. To get an overview of the DNA methylation patterns characterizing control, infected and malignant samples, an unsupervised hierarchical clustering was performed of the 5000 loci with the most variable DNA methylation. The DNA methylation patterns revealed seven of the nine malignant samples to have a subset of loci with distinctively decreased DNA methylation (group A, Figure 1). The infected samples also tended to cluster together but were less distinctive from the control samples, with a different subset of loci showing a relatively moderate increase in DNA methylation compared with controls (group B, Figure 1). These results suggested distinct steps in a progression of DNA methylation changes in HCV-infected liver and associated HCC, prompting more detailed analyses.
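The first step described above, restricting the clustering to the loci with the most variable DNA methylation, can be sketched as follows. This is a minimal illustration only: the function name `most_variable_loci`, the data layout and the toy values are assumptions, not the study's pipeline, and the hierarchical clustering itself is not reproduced here.

```python
from statistics import pvariance


def most_variable_loci(methylation, n_top):
    """Return the IDs of the n_top loci with the highest variance across samples.

    methylation: dict mapping a locus ID to a list of methylation values,
    one value per sample (hypothetical layout chosen for this sketch).
    """
    ranked = sorted(
        methylation,
        key=lambda locus: pvariance(methylation[locus]),
        reverse=True,
    )
    return ranked[:n_top]


# Toy values, invented for illustration (not data from the study).
toy = {
    "locus_1": [0.90, 0.88, 0.91, 0.89],  # stable across samples
    "locus_2": [0.10, 0.80, 0.15, 0.85],  # highly variable
    "locus_3": [0.50, 0.55, 0.45, 0.60],  # moderately variable
}
top = most_variable_loci(toy, 2)
print(top)
```

In the study the equivalent filter would keep 5000 loci out of the ~2 million tested; the retained matrix would then be passed to a clustering routine.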

Figure 1

Unsupervised hierarchical clustering of DNA methylation. The 5000 loci with the most variable DNA methylation were used to explore DNA methylation changes in the 30 samples studied. Of the three clusters (a–c), the malignant (black) samples form a major cluster (a), largely due to the Group A loci that appear strikingly less methylated in the malignant samples, with a less striking distinction of infected from control samples.

A K-means clustering approach was used to identify the subsets of loci with distinctive patterns of DNA methylation change associated with a progression from control to the infected and malignant states. The optimum number of clusters was calculated to be approximately 8 (see Supplementary Methods, Supplementary Figure S1), allowing the grouping of loci into those with no changes of DNA methylation and those with early or later (infection or malignancy-associated) changes (increases and decreases). Genome-wide, the majority of loci tested (69.6%) show no changes in DNA methylation (groups I and V, Supplementary Figure S2), but there is a significant subset of loci with both early and late changes in DNA methylation. The transition from infected to malignant states is associated with subsets of loci making late changes in DNA methylation (7.8% of loci, groups II and VI, Supplementary Figure S2). The analysis also reveals a larger number of loci that make early and sustained changes in DNA methylation (17.1% of loci, groups III, IV and VII, Supplementary Figure S2), and a subset that appears to revert in the malignant samples to the control state (5.5% of loci, group VIII, Supplementary Figure S2).
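To make the grouping of loci by trajectory concrete, the sketch below runs a small K-means on per-locus mean methylation in the control, infected and malignant states. Everything here is assumed for illustration: the three-value representation of each locus, the toy values, and the deterministic farthest-point initialization (chosen so the example is reproducible). The procedure used to select the number of clusters is described in the Supplementary Methods and is not reproduced.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Each tuple is (control, infected, malignant) mean methylation for one locus.
# Toy values invented for illustration.
loci = [
    (0.80, 0.80, 0.80), (0.82, 0.79, 0.81),  # no change
    (0.20, 0.60, 0.62), (0.22, 0.58, 0.60),  # early, sustained gain
    (0.70, 0.70, 0.30), (0.72, 0.69, 0.28),  # late loss
]
labels, centroids = kmeans(loci, 3)
print(labels)
```

With real data the number of clusters would be 8 rather than 3, and a library implementation with multiple random restarts would normally be preferred over this hand-written loop.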

The changes of DNA methylation were explored further using polytomous regression modelling at the individual CG dinucleotide level. This technique is effective for modelling relative DNA methylation proportions, is robust for detection of linear and non-linear DNA methylation events, and allows for adjustment for potential confounding covariates. A principal components analysis showed that subject age and race were associated with DNA methylation variability (Supplementary Figure S3), prompting the inclusion of these variables in regression models to estimate the adjusted odds of DNA methylation for both infected and malignant samples relative to the control samples for each of the ~2 million assayed CG dinucleotides.

This analysis confirmed the impression of the heat map of Figure 1, that the malignant samples have a greater number of loci with significant loss (16 921) than gain (6894) of DNA methylation relative to control samples (Figure 2a). Linking the identified K-means clustering groups with this restricted set of loci shows the loss of DNA methylation to be a late event, mostly occurring in the transition from the infected to malignant states (groups V–VI, Figure 2b), whereas the smaller subset of loci gaining DNA methylation does so at the earlier, infected stage (groups III–IV, Figure 2b).

Figure 2

Identification of subsets of loci with distinctive progressive changes in DNA methylation. Panel (a) shows the comparison between malignant and control samples, with the majority of loci losing DNA methylation (green) with malignant transformation. In panel (b), K-means clustering shows the majority of loci gaining DNA methylation to do so in the infected stage (groups III and IV), but the loci losing DNA methylation only have this occur in the transition from infected to cancer (groups V and VI).

Targeting of DNA methylation changes to specific genomic contexts

Having identified loci with distinctive patterns of change of DNA methylation in the progression to HCV-associated HCC, it was possible to ask whether these changes were targeting certain types of loci in the genome. Genomic annotations were explored that are not cell type-specific (RefSeq gene promoters and bodies and intergenic regions, CpG islands and shores42) as well as liver-specific annotations based on publicly available chromatin immunoprecipitation sequencing (ChIP-seq) data from the NIH Roadmap Epigenomic Mapping Consortium for three normal adult livers (histone H3 lysine 4 trimethylation (H3K4me3), H3K4me1, H3K27ac, H3K36me3 and H3K27me3). The significance of the enrichment or depletion of loci with altered DNA methylation for each annotation was determined using randomized permutation tests.
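A randomized permutation test of this kind can be sketched as below. The exact null model used in the study is not specified in this section, so the scheme here is an assumption: the altered loci are compared with equally sized random draws from all tested loci, and the empirical P value is the fraction of draws overlapping the annotation at least as often as observed. The function name and toy loci are hypothetical.

```python
import random


def permutation_enrichment(tested_loci, altered_loci, annotated_loci,
                           n_perm=1000, seed=0):
    """Empirical enrichment of altered loci within an annotation.

    Returns (observed overlap, mean overlap under the null, empirical P value).
    """
    rng = random.Random(seed)
    annotated = set(annotated_loci)
    observed = sum(1 for locus in altered_loci if locus in annotated)
    n_altered = len(altered_loci)

    null_counts = []
    for _ in range(n_perm):
        # Draw the same number of loci at random from everything that was tested.
        draw = rng.sample(tested_loci, n_altered)
        null_counts.append(sum(1 for locus in draw if locus in annotated))

    expected = sum(null_counts) / n_perm
    n_as_extreme = sum(1 for count in null_counts if count >= observed)
    # Add-one correction so the empirical P value is never exactly zero.
    p_value = (n_as_extreme + 1) / (n_perm + 1)
    return observed, expected, p_value


# Toy example (invented): 1000 tested loci, 10% inside the annotation,
# and 100 altered loci of which half fall inside the annotation.
tested = list(range(1000))
annotated = list(range(100))
altered = list(range(50)) + list(range(500, 550))
observed, expected, p_value = permutation_enrichment(tested, altered, annotated)
print(observed, round(observed / expected, 2), p_value)
```

The observed/expected ratio printed here corresponds to the kind of quantity plotted for each annotation, and the empirical P value to the significance call. Testing for depletion would use the opposite tail.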

We studied the loci defined in Figure 2a as having significantly increased or decreased DNA methylation. The loci with increases in DNA methylation are significantly enriched at enhancers, characterized by local enrichment of H3K4me1 and H3K27ac (but not the promoter-associated mark H3K4me3), as well as loci likely to be undergoing active transcription (gene bodies and H3K36me3-enriched loci, Figure 3). In contrast, decreased DNA methylation is only significantly enriched at intergenic regions and not at any candidate regulatory elements (Figure 3). As increasing DNA methylation at enhancer sequences is potentially reflective of local transcription factor binding,30, 32 the permutation analysis was repeated using ChIP-seq data for transcription factor binding sites mapped in the HepG2 HCC cell line as part of the ENCODE project,74 the best available hepatocyte surrogate with extensive transcription factor mapping. No significant associations were observed for loci with loss of DNA methylation, but permutation testing revealed significant enrichment for several transcription factors at loci that gain DNA methylation: FOXA1, FOXA2, HNF4A, MAFK, MAFF, CEBPB and RXRA (Figure 4). No significant changes in levels of expression of these genes were found at alpha=0.05 (Supplementary Figure S4), so this acquisition of DNA methylation is not attributable to a simple model of down-regulation of production of these transcription factors. All seven transcription factors are involved in normal liver development and physiology,75, 76, 77, 78, 79, 80 with CEBPB, RXRA and MAF transcription factors also involved with inflammation and cellular stress.81, 82, 83, 84, 85, 86, 87 The most strongly perturbed FOXA1/2 and HNF4A transcription factors have also been implicated in WNT signalling pathways that promote carcinogenesis and epithelial-to-mesenchymal transition,88, 89, 90, 91 which links the DNA methylation changes to the risk of neoplastic progression.

Figure 3

The genomic contexts of loci gaining or losing DNA methylation. In panel (a) we show the observed/expected ratio for overlap of the loci with distinctive DNA methylation with different genomic features. The loci with increasing DNA methylation are enriched at candidate enhancers (with histone H3 lysine 4 monomethylation (H3K4me1) and H3K27ac but not co-incident H3K4me3 that would indicate promoter function). These loci are also enriched at regions likely to be transcribed (with H3K36me3 and at RefSeq gene bodies). These enrichments were tested by permutation analyses (b), revealing them to be non-random (black). Loci where DNA methylation is lost during disease progression are enriched at several genomic features (a), but only the enrichment at intergenic sequences survives the permutation analysis of significance in panel (b).

Figure 4

Overlap of loci with distinctive DNA methylation with sites of transcription factor binding in HepG2 cells. Whereas no loci with decreasing DNA methylation are enriched at transcription factor binding sites (b), those gaining DNA methylation (a) non-randomly (black) overlap the sites of binding of several transcription factors, FOXA2, FOXA1, HNF4A, MAFK, MAFF, CEBPB and RXRA.

Peripheral blood B lymphocytes do not show DNA methylation changes in HCV infection

A question that arises is whether the changes of DNA methylation in the infected, non-malignant samples represent cellular responses to the chronic inflammatory state rather than effects of the HCV infection or neoplastic process. To test this, peripheral blood CD19+ B lymphocytes were chosen as a reporter cell type that is not only readily accessible but also appears to be unusually influenced by chronic HCV infection, judging by the known associations of chronic HCV with mixed cryoglobulinemia, rheumatoid factor production and lymphoproliferative disorders including monoclonal gammopathies and lymphomas.92 DNA was isolated from CD19+ B lymphocytes from an independent cohort of 10 chronically HCV-infected patients and 10 HCV-negative controls. The global DNA methylation profiles of the HCV-infected patients were not obviously different to those of uninfected patients (Supplementary Figure S5). Furthermore, after adjusting for multiple comparisons, there were no significant DNA methylation events associated with the inflammatory environment of chronic HCV infection at a false discovery rate (FDR) <0.05. The results do not support the chronic inflammatory process itself inducing DNA methylation changes as a generalized effect in the body.
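The FDR threshold mentioned above can be illustrated with a short sketch of multiple-testing adjustment. The section does not name the correction procedure, so the Benjamini-Hochberg method used here is an assumption, as are the function name and the toy P values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Assumed procedure: the text reports an FDR threshold but does not
    state which FDR method was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values for four loci (invented for illustration).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

In this toy case three loci have nominal P < 0.05 but only one survives adjustment, which is the sense in which a genome-wide comparison can yield no significant loci at FDR < 0.05.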

Enhancer-targeted DNA methylation is associated with decreased gene expression

As the acquisition of DNA methylation at enhancers is potentially associated with altered gene expression,38, 42 RNA-seq was performed on the same samples on which DNA methylation studies had been performed. K-means clustering was used in the analysis of these data, with the optimal number of groups found to be approximately 6, allowing patterns of gene expression changes in infected and malignant samples to be compared with the controls (Supplementary Figure S1). In Figure 5a it is apparent that most genes (groups I and II) do not change levels of expression in infected or malignant cells (n=13 507), but that there are also groups of genes with early (group IV, n=1720) and progressive (group III, n=4621) patterns of increased gene expression, and further genes that progressively decrease expression levels by substantial (pattern VI, n=890) or lesser extents (pattern V, n=4375). Overall, more genes were represented by patterns with increased rather than decreased gene expression. A principal components analysis found potential covariates modifying gene expression, allowing subsequent polytomous regression to control for sequencing batch (Supplementary Figure S6). In all, 309 genes were identified to have significantly decreasing expression and only 193 genes with increased expression levels (Figure 5b, Supplementary Tables S2–S3). The 309 genes with significantly decreased expression were tested for proximity to the enhancers found to have increased DNA methylation. A conservative approach was used, linking a candidate enhancer (H3K4me1 positive, H3K4me3 negative) to a gene if the chromatin state was within 5 kb of the gene’s transcription start site, finding a total of 644 genes fitting these criteria. A significant association was found linking enhancers with increased DNA methylation and genes with decreased expression levels (hypergeometric test P=0.002), with no significant association found for the genes with increased expression levels (hypergeometric test P=0.763) (Figure 5c). While influencing only a small proportion of genes, the increased DNA methylation at candidate enhancers is significantly associated with local transcriptional repression.
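The hypergeometric test for overlap between two gene lists can be sketched as follows. The size of the gene universe and the observed overlaps are not reported in this section, so the numbers below are toy values chosen only so the result can be checked by hand; the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, n_marked, n_drawn, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe: total number of genes considered
    n_marked: genes in the first list (e.g. genes near hypermethylated enhancers)
    n_drawn:  genes in the second list (e.g. genes with decreased expression)
    overlap:  number of genes shared by the two lists
    """
    total = comb(universe, n_drawn)
    upper = min(n_marked, n_drawn)
    tail = sum(
        comb(n_marked, k) * comb(universe - n_marked, n_drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (invented): 10 genes, 4 marked, 3 drawn, at least 2 shared.
# By hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40 / 120 = 1/3.
p = hypergeom_enrichment_p(10, 4, 3, 2)
print(p)
```

Applied to the study, `n_marked` would be the 644 genes linked to hypermethylated enhancers and `n_drawn` the 309 (or 193) differentially expressed genes, with the universe and overlap taken from the underlying data.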

Figure 5

Analysis of gene expression data from the samples tested for DNA methylation changes. In (a) we again use K-means clustering to identify subsets of genes with distinctive progression of expression patterns during disease progression. Groups III and IV represent genes increasing their expression while groups V and VI show a decrease in expression levels. The significantly differentially expressed genes in a comparison of malignant and control samples (b) are mostly accounted for by genes in groups IV and VI. We associated a candidate enhancer (H3K4me1 positive, H3K4me3 negative) with a gene if the chromatin state was within 5 kb of the gene's transcription start site and showed increased DNA methylation in control versus malignant samples, finding a total of 644 genes fitting these criteria. When we tested to see whether these 644 include genes with significantly altered levels of expression, we showed that these genes with increased DNA methylation were significantly enriched for overlap with genes with decreased expression (P=0.002) but not increased expression (c). In (d) we show the results of analysis of matched HCC and infected liver samples from HCV+ individuals, using data from The Cancer Genome Atlas (TCGA). We tested DNA methylation data from TCGA at candidate enhancers where we had identified increased DNA methylation and found the studies from TCGA to reveal a significant increase of DNA methylation in their samples also. We also compared the genes where we had found significant changes in levels of expression, showing the genes with decreased expression also to have significantly lower levels in TCGA samples, but no significant changes in the levels of expression of genes that we had found to be upregulated with disease progression.

Validation of findings using TCGA data

The Cancer Genome Atlas (TCGA) includes HCC as one of its studied tumour types, and includes both DNA methylation and expression studies,73 testing the HCC tumour itself plus adjacent liver, thereby mimicking the malignant and infected samples of the current study in its HCV-related HCC tumour cohort (n=5). We compared our results with those from these TCGA individuals, comparing their DNA methylation profiles between infected and malignant samples at the loci identified in the current study to have increased DNA methylation at enhancers. In Figure 5d we show that these loci are also significantly increased in DNA methylation in the TCGA subjects. When we focused on the genes identified to be significantly altered in expression in our study (Figure 5a), the 193 genes with increased expression do not show significant differences in expression in TCGA data, but the 309 genes with decreased expression are also significantly decreased in expression in TCGA subjects (Figure 5d). We conclude that the changes in DNA methylation at enhancers and the downregulated genes that we find in the current study are robust and high confidence findings.

Epigenetic and transcriptional dysregulation target genes associated with liver cancer and stem cell functions

To understand the potential cellular consequences of altering transcription levels and dysregulating genes by DNA methylation changes, Gene Set Enrichment Analysis93 was performed focusing on a comparison between control and malignant samples. Three lists of genes were generated: the 309 downregulated and the 193 upregulated genes, and the 644 genes where increased DNA methylation was found at H3K4me1-defined enhancers within 5 kb of the transcription start site (Supplementary Table S4). All three gene lists showed concordance for enrichment in gene sets related to altered expression in liver cancer, fetal liver expression and polycomb repressive complex 2 (PRC2) targeting in ES cells (Supplementary Figure S7). Interestingly, two of the gene sets were published as targets of the HNF4A and FOXA2 transcription factors in liver,94, 95 supporting the possibility illustrated in Figure 4 that the regulation exerted by these transcription factors is specifically targeted in HCV infection and HCC. Because we found pathway enrichment for PRC2 gene targets associated with H3K27me3 in embryonic stem cells, the pathway analysis for the 644 genes with H3K4me1-defined enhancers in normal adult liver was stratified by the presence of H3K4me1 and H3K27me3 in HepG2, using publicly available ChIP-seq data. Of the 644 genes defined to have increasing DNA methylation at promoter-proximal enhancers, 326 retained those enhancers in HepG2 cells, but another 137 added the repressive H3K27me3 mark at those enhancers, while 63 genes gained H3K27me3 and lost the H3K4me1 mark altogether. This sub-categorization allowed further insights: the genes that do not gain the H3K27me3 mark are more likely to represent those previously associated with liver and other cancers, while those that do gain the H3K27me3 modification represent known targets of polycomb and other transcriptional regulators, especially in stem cells and early development (Figure 6a). When the CG dinucleotide content of the enhancers in each sub-category was tested, it became strikingly apparent that the acquisition of H3K27me3 and the loss of H3K4me1 were positively associated with increased CG content (Supplementary Figure S8). The cancer gene signature is therefore associated with acquisition of DNA methylation at CG-depleted loci, and the stem cell signature with acquisition of DNA methylation and H3K27me3 at CG-dense loci.

Figure 6

Functional properties of genes targeted for DNA methylation and transcriptional changes. We focus in (a) on the 644 genes with gain of DNA methylation during disease progression at promoter-proximal enhancers, but breaking down the associated pathways depending on whether the hypermethylated enhancer loci with H3K4me1 in normal liver continued to have this mark in the HepG2 HCC cells, and whether the repressive H3K27me3 polycomb mark is added at those sites in HepG2 cells. We show that the associated functional pathways are segregated into two groups (red boxes) when these patterns of change of histone modifications are studied. Genes acquiring DNA methylation at nearby H3K4me1-defined enhancers that do not lose this mark or gain polycomb repression are those associated with cancers in general and liver cancers in particular. The genes that acquire H3K27me3 are those encoding transcription factors and with stem cell properties. In (b) we show a result from our SMITE analysis based on integration of transcriptional information with genomic context-dependent DNA methylation changes. The module shown is highly enriched for proteins involved in WNT signalling, consistent with the targeting of genomic dysregulation to genes with stem cell functions.

We then tested how integration of the transcriptional, DNA methylation and histone modification information could increase our power to detect pathway-level events in an unbiased manner. We applied our recently developed SMITE integrative network approach (manuscript in review) to derive gene scores using gene expression differences combined with DNA methylation measurements at gene promoters, gene bodies and gene-associated H3K4me1 peaks. By testing for high-scoring sub-networks within a gene network, we identified 12 genomic modules, each with multiple interacting genes enriched for transcriptomic and epigenetic dysregulation. Using Kyoto Encyclopedia of Genes and Genomes pathway information, genes within each module were annotated to determine the likely functions of the modules. Out of the 12 genomic modules, 10 strongly associate with Kyoto Encyclopedia of Genes and Genomes annotations: seven related to liver metabolism, and one each to the cell cycle, immune system processes and to WNT signalling (Supplementary Table S5). Specifically, a module built around telomerase reverse transcriptase (TERT), transducin-like enhancer of split (TLE2/4) and pygopus family PHD fingers (PYGO1/2) included 10 WNT signalling-enriched genes, consistent with the acquisition of a stem cell-like phenotype in the HCV-related HCC genome (Figure 6b).

Polycomb inhibition suppresses cell growth but does not influence DNA methylation

We were interested in whether the increased DNA methylation at CG-dense enhancers that acquire H3K27me3 was associated with decreased gene expression independently of the presence of H3K27me3. To test the dependence of gene expression on the presence of H3K27me3, we treated HepG2 cells with a PRC2-EZH2 inhibitor, GSK343,96 and found that there was a decrease in cellular proliferation (Supplementary Figure S9), a finding consistent with a prior study involving an shRNA-mediated knock-down of EZH2.97 We used ENCODE data to identify loci where H3K27me3 was enriched in HepG2 cells and compared enrichment at these loci with negative control loci using quantitative PCR of chromatin immunoprecipitated DNA (qChIP). We showed that the GSK343 treatment of the HepG2 cells depleted H3K27me3, as expected. When we then tested DNA methylation at these loci using bisulphite MassArray98 and local gene expression using qRT-PCR, we found both to be unaltered in the presence of local depletion of H3K27me3 (Supplementary Figure S9). The results indicate that, at least at the loci tested, the increased DNA methylation and transcriptional repression are not dependent upon the presence of polycomb-mediated H3K27me3 repressive marks, but that polycomb inhibition exerts repressive effects upon the proliferation of the HepG2 HCC cell line.

Discussion

Our results show an intriguing pattern of acquisition of DNA methylation in HCV-infected livers prior to malignant transformation, at which time a second wave of global demethylation occurs. Whereas the global demethylation associated with malignancy is enriched at un-annotated intergenic sequences, the pre-neoplastic, infection-associated acquisition of DNA methylation is found at candidate cis-regulators of transcription. This pattern of global hypomethylation and local hypermethylation is considered typical of oncogenesis.99 Prior studies have also found acquisition of DNA methylation at promoters and other cis-regulatory elements in liver prior to the development of HCC,22, 23, 100, 101, 102 but were focused on individual loci and did not perform the genome-wide survey of our study. We defined the loci regulating transcription using fixed annotations (RefSeq gene promoters, CpG islands and shores) as well as tissue-specific annotations from ChIP-seq data of histone modifications and sites of transcription factor binding in primary liver samples or the HepG2 cell line. Our results support a model of interaction between transcription factor binding and local DNA methylation, with a possible model involving a primary effect on transcription factor binding resulting in DNA methylation changes as the downstream effect, similar to the processes observed during normal cell differentiation.30, 32 Transcriptional data from the same samples link the alterations in DNA methylation at cis-regulatory sites with local gene expression changes, and show that the genes affected are those previously found to be associated with liver cancers as well as genes involved in hepatic and other types of tissue development. Furthermore, we show that the acquisition of the H3K27me3 repression-associated histone modification is enriched at loci where candidate enhancers (H3K4me1 positive) are lost in the HepG2 HCC cell line compared with normal liver.

The major goal of this study was to gain insights into events that create an organ-wide susceptibility to neoplastic transformation, reflected by tumour multifocality in the majority of individuals with HCC.7 We have previously found that these kinds of tumour precursor states are associated with epigenetic dysregulation, in monoclonal gammopathy of undetermined significance as a precursor to multiple myeloma,103 and in Barrett’s esophagus as a field defect preceding esophageal adenocarcinoma.104 A mouse model of colitis that predisposes to neoplasia also showed such early epigenetic events preceding neoplasia.105 The current study used the HCV-infected state to represent the pre-neoplastic hepatocytes in which a field defect allows multifocal HCC to develop. Unlike previous studies in which DNA methylation was compared between the HCC and adjacent liver,25, 106, 107, 108, 109 we added uninfected liver as a control, and restricted our study to HCV-infected individuals, with the assumption that tissue adjacent to the HCC does not, in fact, represent normal liver. By creating three stages representing disease progression, we could identify the early (infection associated) and late (malignancy associated) epigenetic and transcriptional events. The epigenetic changes included gains and losses of DNA methylation, with the losses associated with malignant transformation occurring at intergenic regions of uncertain functional significance, but the gains in DNA methylation taking place earlier, in the infected, pre-neoplastic state, and at cis-regulatory elements. The genes near the candidate enhancers identified to increase DNA methylation show a decrease in expression levels, with both epigenetic and transcriptional findings replicated in an independent TCGA cohort.

In pursuing the hypothesis that gains in DNA methylation are due to the lack of local binding of transcription factors,110, 111 we identified a group of transcription factors previously implicated in WNT signalling and cancer. Why these transcription factors would fail to target their binding sites in infected cells is not known, but as the levels of expression of the genes encoding these specific transcription factors do not change, the amount of transcription factor is likely to be less influential than some qualitative property of these transcription factors or of the chromatin at these binding sites. As might be expected for genes regulated by specific transcription factors, the effect of increased DNA methylation is quite specific in terms of the genes selected for dysregulation, enriching for genes already implicated in liver cancer, but also a set of genes associated with polycomb targeting in stem cells or early development. The enhancers at these two groups of genes have different patterns of polycomb-mediated influences and DNA methylation in HepG2 cells, with the cancer genes acquiring just DNA methylation, but the developmentally important genes also acquiring the repressive H3K27me3 mark.
Sequential ChIP-bisulphite studies have revealed a complex relationship between DNA methylation and H3K27me3, with mutual exclusivity of the two marks in CG dinucleotide-dense regions, but their co-localization in the remainder of the genome112, 113 and perturbations of this relationship in cancer.112 Biochemically, the polycomb complex protein EZH2 has been found to associate with DNA methyltransferases,114 but recruitment by EZH2 of DNA methyltransferase3a to a specific site has been found to be insufficient to cause local DNA methylation.115 Our use of pharmacological inhibition of the polycomb protein EZH2 did not affect DNA methylation at sites of depletion of H3K27me3, indicating that the acquisition of DNA methylation is not dependent upon the presence of polycomb-mediated repressive mechanisms locally.

For an epigenome-wide approach to show changes in one group of samples and not another, the changes need to be recurring at the same loci in multiple individuals, and not randomly within the genome. This consistency of location of changes is of interest from a mechanistic standpoint – in cancer it could certainly reflect a primary event of changes at random locations followed by selection of cells with patterns favouring uncontrolled cell division. However, the alternative hypothesis is that the changes occur at specific loci because sequence-specific transcriptional regulators are altering their functions. In the human genome, sequence specificity of transcriptional regulation is best recognized to be mediated by transcription factors, which bind to definable DNA sequences. It has been proposed that many events currently referred to as ‘epigenetic’ are, in fact, due to primary effects mediated by transcription factors,116, 117 allowing the frequently studied epigenetic regulator DNA methylation to be viewed as a way of ‘footprinting’ where transcription factor binding is occurring in the genome.15, 30, 32 It has been recognized for some time that HCV interacts with host transcription factors, the HCV NS5A protein binding to the host transcription factors SRCAP,118 TBP and p53,119 and the virus affecting the subcellular localization and function of forkhead transcription factors through alteration of post-translational modifications.120 In the current study, the results are consistent with the HCV infection influencing the DNA binding of forkhead transcription factors (FOXA1, FOXA2) as well as HNF4A, MAFK, MAFF, CEBPB and RXRA, causing the loci normally bound by these transcription factors to acquire DNA methylation in infected cells. We find that the changes in DNA methylation are enriched at genes with pre-neoplastic transcriptional changes, and that independent samples from TCGA reveal similar patterns of DNA methylation and gene expression changes. 
Our model is therefore one of transcription factors having a primary role in mediating the transcriptional regulatory reprogramming, including epigenetic regulation, that occurs in neoplastic transformation.

Our parallel study of B lymphocytes indicates that the transcriptional dysregulatory events are not constitutive throughout the body and reflective of a generally inflammatory state, but instead occur specifically in the cell type infected by HCV. B lymphocytes should have represented a cell type with a higher likelihood of being informative for HCV-associated cellular perturbations, given the unusually high frequency of a spectrum of B cell lymphoproliferative events associated with HCV infection.92 The easy accessibility of these cells from peripheral blood also raised the possibility that they could serve a role as biomarkers of events occurring in the liver. Instead, we see no significant DNA methylation or transcriptional changes characterizing the B lymphocytes from HCV-infected individuals, indicating that the transcriptional regulatory effects observed in liver are more likely to be due to the HCV infection in their host cells and not to the chronic inflammatory process.

A limitation of our study is that the extensive transcription factor mapping by ChIP-seq is only available from HepG2 cells, which are derived from an HCC, and as such do not fully represent normal hepatocytes. However, the problem that arises when using these tumour-derived cells (as opposed to normal hepatocytes) is likely to be one of sensitivity – if tumour-associated changes have also taken place in HepG2, a comparison with uninfected, normal hepatocytes would probably have greater potential to find transcription factors enriched at sites found to be differentially methylated in our study. The ChIP-seq mapping data for H3K4me1 (candidate enhancers) and H3K27me3 (polycomb-mediated repressive chromatin) allowed us to exploit data from normal liver (Roadmap Epigenomics data), with HepG2 (ENCODE data) this time serving as an HCC model, allowing us to group the genes we found to be affected by DNA methylation and transcriptional changes. We find that the H3K4me1 enhancers in normal liver that gain DNA methylation in pre-neoplastic liver can continue to have this chromatin mark in HCC, or can acquire H3K27me3 at the same loci (to create ‘poised’ enhancers121, 122) or can lose the H3K4me1 mark altogether, with only H3K27me3 remaining. The first group, in which the enhancers acquire DNA methylation but remain free of polycomb-mediated inactivation, is enriched for genes involved in liver and other cancers, while those that acquire the polycomb-mediated suppressive H3K27me3 mark are known targets of polycomb in stem cells and genes expressed in tissue progenitor cells. These links between polycomb and cell lineage commitment123 and with targeting of DNA methylation in cancer50, 51, 52 have been appreciated for some time. Our results suggest that the polycomb stem cell mark is targeted to a subset of genes undergoing DNA methylation changes in neoplastic transformation, and is not responsible for all targeting of DNA methylation observed.

The role of mutations is not addressed in the current study. However, there is little prior evidence for mutations occurring in infected, non-neoplastic hepatocytes,13, 124, 125 suggesting that the driver mutations for HCC occur during and are responsible for the transition from the pre-neoplastic to the malignant stage. When HCC arises simultaneously in different parts of the infected liver, two potential mechanisms are possible, one in which a primary single tumour gives rise to intrahepatic metastases, the other in which there are multiple independent malignant transformation events. The latter model suggests a field defect throughout the liver, with current evidence suggesting that a substantial proportion of cases of multicentric HCC is due to such independent events (reviewed previously126). The intriguing model that arises is that some mutations may only be capable of promoting neoplastic transformation when the transcriptional regulatory and epigenetic properties of the cell have already been altered as we find in the infected hepatocytes. This finding opens up the possibility that the use of pharmacological agents targeting mediators of epigenetic repression may be worth exploring as chemoprevention in individuals during the long latency period of chronic HCV infection.

Materials and methods

Patient tissue samples for DNA methylation and gene expression studies

This study was approved by the institutional review board of the Montefiore Medical Center and the Committee on Clinical Investigation at the Albert Einstein College of Medicine and is in accordance with Health Insurance Portability and Accountability Act regulations. Written informed consent was obtained from all subjects prior to participation. We obtained liver biopsies from the Montefiore-Einstein Liver Center Hepatic Tissue and Serum Repository taken from control patients with non-virally (HCV, HBV and HIV) infected non-cancerous liver tissue and cases with HCV-infected HCC, from which we obtained paired tumour and non-tumour tissues. All cases were sex, race and approximately age (±15 years) matched to controls. After retrospective chart reviews, we reclassified two samples resulting in 11 HCV-uninfected non-neoplastic (controls), 10 HCV+ non-neoplastic (infected) and 9 HCV+ HCC (malignant) samples.

Patient blood samples for DNA methylation studies

We obtained 8–12 ml of whole blood from 10 HCV positive patients and 10 age, race and gender matched HCV negative control patients attending Gastroenterology and Liver Disease clinics at Montefiore Medical Center. We isolated lymphocytes through negative selection using immunomagnetic beads and magnetic columns (Miltenyi Biotec, Bergisch Gladbach, Germany).

Nucleic acid preparation

For each of the 30 biopsies, we divided the tissue into two portions for separate DNA and RNA extraction. For all peripheral lymphocytes only genomic DNA was extracted. Genomic DNA was extracted using proteinase K digestion, phenol–chloroform extraction, dialysis against 0.2X SSC and concentration by surrounding the dialysis bag with PEG 20 000 to reduce water content by osmosis. Total RNA was extracted using TRIzol. The quality and integrity of the RNA were measured using NanoDrop spectrophotometry and Bioanalyzer (Agilent, Santa Clara, CA, USA).

HELP-tagging assay and quantification of genome-wide DNA methylation

As previously described,127 we used 1 μg of liver tissue DNA and 200 ng of lymphocyte DNA to generate HELP-tagging libraries. Genomic DNA was digested with either MspI or HpaII at 37 °C overnight, purified and ligated to indexed Illumina adapters containing the T7 promoter sequence, as well as the EcoP15I recognition site (AE adapters). After ligation, the DNA samples were digested with EcoP15I at 37 °C overnight, end-filled, 3′ terminal A extended and ligated to the second Illumina adapter (AS adapter). Samples were then in vitro transcribed using the MEGAshort kit (Ambion, Thermo Fisher Scientific, Waltham, MA, USA), followed by retrotranscription (SuperScriptIII kit, Invitrogen, Carlsbad, CA, USA) before amplification. Libraries were multiplexed for 50 bp single-end sequencing on the Illumina HiSeq 2000 platform. For each sample, sequencing read counts were obtained at HpaII sites throughout the genome and compared to the previously generated MspI human reference. The read counts were log transformed and normalized by calculating Z scores before taking the ratio of the normalized HpaII count to normalized MspI count at each assayed HpaII site. The ratios are approximately Cauchy distributed, so the cumulative distribution function of the Cauchy distribution was used to obtain DNA methylation probabilities. Finally, mixture modelling of the distribution assuming three composite distributions was performed to scale the probabilities between 0 and 1. Low confidence CG sites were identified as sites with no reads in the HpaII channel and fewer than five reads in the MspI channel, and they were subsequently removed from the analysis.
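The quantification steps above (log transformation, Z scores, HpaII/MspI ratio, Cauchy cumulative distribution function) can be illustrated with a minimal Python sketch. This is not the published pipeline: the pseudocount, the use of the standard Cauchy distribution (location 0, scale 1) and the function names are assumptions made for illustration, and the final mixture-modelling step and the low-confidence site filter are omitted.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def cauchy_cdf(x):
    """CDF of the standard Cauchy distribution (assumed location 0, scale 1)."""
    return 0.5 + math.atan(x) / math.pi


def help_tagging_scores(hpaii_counts, mspi_counts, pseudocount=1.0):
    """Map per-site HpaII and MspI read counts to Cauchy CDF values.

    The pseudocount is an assumption that keeps log() defined at
    zero-count sites; the source does not state how zeros are handled.
    """
    z_h = zscores([math.log(c + pseudocount) for c in hpaii_counts])
    z_m = zscores([math.log(c + pseudocount) for c in mspi_counts])
    scores = []
    for h, m in zip(z_h, z_m):
        # The ratio is undefined when the MspI Z score is exactly zero.
        ratio = h / m if m != 0 else float("nan")
        scores.append(cauchy_cdf(ratio))
    return scores


# Hypothetical read counts at five HpaII sites.
example = help_tagging_scores([0, 5, 20, 50, 3], [30, 25, 40, 35, 10])
```

The ratio of two independent standard normal variables follows a standard Cauchy distribution, which is why the Cauchy cumulative distribution function is a natural way to place such ratios on a 0 to 1 scale.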

Directional whole transcriptome RNA-sequencing assay and gene expression quantification

DNase-treated, rRNA-depleted (Ribozero, Epicentre, Madison, WI, USA) RNA was used as a template for SuperScript III first-strand cDNA synthesis (Invitrogen), using oligo-dT as well as random hexamers. Actinomycin D was added to the reaction to prevent any possible amplification from contaminating genomic DNA. During second-strand synthesis, a dU/VTP mix was used to create directional libraries. cDNA samples were fragmented with Covaris to 300-bp fragments. The samples were then end-filled, 3′ terminal A extended and ligated to pre-annealed TruSeq-indexed Illumina adapters. Uracil-DNA glycosylase treatment preceded the PCR reaction to amplify exclusively the originally oriented transcripts. Libraries were amplified using P5 and P7 Illumina primers and gel-extracted for size selection and primer-dimer removal. Before sequencing, libraries were tested using the BioAnalyzer to assure library quality, in terms of size and primer-dimer depletion. Indexed libraries were multiplexed for 100 bp single-end sequencing on the Illumina HiSeq 2000 platform. For each sample, we obtained FASTQ files of pass-filter reads and aligned them to a composite reference exome of human (hg19), miRBase and HCV using GSNAP. Next, alignments were associated with genes using HTSeq. We scaled gene counts using DESeq implemented in R to estimate effective library size.
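As an illustration of what estimating effective library size involves, the following Python sketch implements the median-of-ratios idea underlying DESeq size factors. It is a simplified stand-in, not the DESeq package used in the study; the function name and the example counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of per-gene rows, each row holding raw counts per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: the second sample was sequenced twice as deeply.
raw = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(raw)
scaled = [[c / f for c, f in zip(row, factors)] for row in raw]
```

Dividing each raw count by its sample's size factor places the samples on a common scale, so that differences in sequencing depth are not mistaken for differences in expression.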

Unsupervised K-means clustering to determine summary patterning

For each HpaII locus and each gene, the average DNA methylation and gene expression, respectively, for each tissue type was calculated and used to compute the natural log-ratio of infected and malignant samples relative to the control samples. To identify patterns in DNA methylation and expression relative to tissue type, we used K-means clustering, an unsupervised clustering approach, available through the kmeans function in R. We determined an optimal number of clusters by plotting the total within cluster sums of squares against the number of clusters and selecting a cluster value that occurred at an inflection point.
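The clustering procedure described above can be sketched in Python as follows. The study used the kmeans function in R; this sketch substitutes a plain Lloyd's algorithm with a fixed random seed, and the function names and example points are assumptions for illustration only.

```python
import math
import random


def log_ratio_point(control, infected, malignant):
    """Natural log-ratios of infected and malignant averages relative to control."""
    return (math.log(infected / control), math.log(malignant / control))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (centers, total within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(sq_dist(p, centers[i]) for i, cl in enumerate(clusters) for p in cl)
    return centers, wss


def wss_by_k(points, ks):
    """Total within-cluster sum of squares for each candidate cluster number."""
    return {k: kmeans(points, k)[1] for k in ks}


# Hypothetical log-ratio points forming three loose groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
       (-10, 5), (-10, 6), (-9, 5)]
elbow = wss_by_k(pts, range(1, 6))
```

Plotting the values in `elbow` against the number of clusters and looking for the point where the curve flattens corresponds to the inflection-point selection described in the text.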

Statistical method for DNA methylation and gene expression analysis

We used principal components analysis to identify biological and technical covariates that were associated with the variability in DNA methylation and gene expression, separately. Covariates found to be significantly associated with DNA methylation and gene expression were controlled for in downstream analyses by inclusion in linear models to obtain covariate-adjusted estimates. For robust data analysis and visualization, we employed batch correction implemented through the ComBat package in R to obtain covariate-adjusted DNA methylation values. Infected and malignant samples show more intrasample variability than intersample variability (Figure 1), so they were considered independent cohorts in all analyses. For liver tissue, we performed polytomous regression, a modelling technique that is suited for analysing DNA methylation proportions and log gene counts and allows simultaneous assessment of the odds of DNA methylation or expression for infected and malignant tissues relative to control, in R using the multinom function available through the nnet package. Odds of DNA methylation were adjusted for age and race, while odds of gene expression were adjusted for sequencing batch. Overall model significance was assessed by adjustment for multiple comparisons using a false discovery rate controlled at alpha=0.05. Because the overall model significance does not indicate which comparison between control, infected and malignant samples is driving the significance, we additionally used two criteria for further determination of models: (a) control compared with malignant effect significance <0.05 and (b) an average effect difference between control and malignant samples of 10%. For lymphocyte DNA methylation, we used logistic regression predicting the odds of methylation during chronic HCV infection relative to uninfected samples.
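The text does not name the false discovery rate procedure that was applied. As a hedged illustration, the sketch below implements the Benjamini-Hochberg adjustment, a common choice, and thresholds the adjusted values at 0.05. The choice of procedure, the function name and the example P-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical overall model P-values for four loci.
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]
```

Loci passing this overall threshold would then be subjected to the two additional criteria described in the text.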

Genomic context analysis

Using the UCSC Genome Browser, we obtained the hg19 RefSeq gene annotation track and the predicted CpG island track. Gene promoters were defined as the 2 kb region flanking the gene transcription start site, while gene bodies were defined as the end of the gene promoter to the transcription end site. All other genome sequence was defined as intergenic. The 2 kb regions flanking CpG Islands were defined as CpG island shores.42 We obtained publicly available ChIP-seq datasets from the ROADMAP Epigenomics project for normal adult liver corresponding to four histone marks: H3K4me3, H3K4me1, H3K36me3 and H3K27me3. For each histone mark, we obtained hg19 aligned reads from three individuals and called peaks using MACS2.128 We only considered true peaks to be those occurring in at least two of the samples. For transcription factor binding sites, H3K4me3 and H3K27me3 mapping data from the HepG2 HCC cell line, we obtained publicly available ChIP-seq datasets from the ENCODE project. The peaks for the HepG2 histone methylation datasets were already called relative to input in a pooled replicate approach, mirroring the approach we used for the liver studies. For relative enrichment studies, the observed overlap between loci of interest and a particular annotation was compared with the expected overlap given the total coverage of the annotation. For permutation tests, we compared the observed overlap of significant HpaII sites and a particular genomic interval (for example gene promoters, CpG islands) with the distribution of the same overlap under the null hypothesis. To obtain the null distribution, we employed a random sampling approach where we randomly sampled HpaII sites 100 times from the population of all assayed HpaII sites, effectively permuting the HpaII sites.
Thus, with loci of interest, n, and an observed overlap with a particular genomic interval, X, we sampled n loci from all HpaII sites and found the overlap of the random sample with the genomic interval, Y_k, for k in 1, 2, …, 100 (that is 100 random samples). Next, we compared X with the distribution of simulated annotation overlaps, Y_1, Y_2, …, Y_100. If the resulting null distribution, Y_1, Y_2, …, Y_100, contains the observed overlap, X, then we can conclude that there is no significant enrichment. Conversely, when the null distribution, Y_1, Y_2, …, Y_100, excludes the observed overlap, X, then we can conclude that there is significant enrichment beyond that of random chance.
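The random sampling test described above can be sketched in Python as follows. Sites are represented as (chromosome, position) pairs and intervals as half-open (chromosome, start, end) triples; these representations, the function names and the toy data are assumptions for illustration and do not reproduce the study's code.

```python
import random


def overlap_count(sites, intervals):
    """Count sites (chrom, pos) that fall inside any interval (chrom, start, end)."""
    return sum(
        any(chrom == c and start <= pos < end for c, start, end in intervals)
        for chrom, pos in sites
    )


def sampling_test(sites_of_interest, all_sites, intervals, n_samples=100, seed=0):
    """Compare the observed overlap X with overlaps Y_1..Y_n of random site samples."""
    rng = random.Random(seed)
    n = len(sites_of_interest)
    observed = overlap_count(sites_of_interest, intervals)
    # Each null value is the overlap of n sites drawn from all assayed sites.
    null = [
        overlap_count(rng.sample(all_sites, n), intervals)
        for _ in range(n_samples)
    ]
    # The observed overlap is "excluded" when it lies outside the null range;
    # enrichment corresponds to the case where it exceeds the null maximum.
    excluded = observed < min(null) or observed > max(null)
    return observed, null, excluded


# Toy data: 100 assayed sites, 10 of which lie in a single annotated interval.
assayed = [("chr1", p) for p in range(0, 1000, 10)]
annotation = [("chr1", 0, 100)]
interesting = [s for s in assayed if s[1] < 100]
x, ys, is_excluded = sampling_test(interesting, assayed, annotation)
```

With only 100 random samples, this approach yields a yes-or-no call on whether the observed overlap falls outside the sampled range rather than a finely resolved P-value.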

Gene Set Enrichment Analysis

Using the Gene Set Enrichment Analysis tool from the Broad Institute, we were able to cross-reference our identified genes of interest to a database of common pathways, the Molecular Signatures Database v 4.0 (MSigDB). The MSigDB contains curated gene sets assembled from online pathway databases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and REACTOME, publications from PubMed and domain experts. Additionally, the Gene Set Enrichment Analysis tool has catalogued all published genome-wide datasets pertaining to HCC. The overlap between our genes of interest and known gene pathways was analysed using a hypergeometric distribution with a false discovery rate used to correct for multiple hypothesis testing.
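The hypergeometric overlap calculation can be illustrated with a short Python sketch. The upper-tail formulation (probability of an overlap at least as large as that observed) is an assumption about how the test was applied, and the function name, parameter names and numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: genes in the pathway gene set,
    n: genes of interest, k: observed overlap between the two.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical case: all 5 genes of interest fall in a 5-gene set, out of 10 genes.
p_all = hypergeom_upper_tail(5, 5, 5, 10)  # 1/252
```

In practice one such P-value is computed per gene set and the resulting collection is then corrected for multiple testing, as described in the text.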

Verification

We verified methylation results using Sequenom’s MassARRAY (Sequenom, San Diego, CA, USA) in six controls, five infected and four malignant samples at 24 different loci (Supplementary Figure S10). Using the UCSF MethPrimer tool, we designed primers to amplify bisulphite converted target DNA sequences. BiSearch was used to avoid primer sets that generate off-target amplicons. Following Sequenom’s MALDI-TOF Mass Spectrometry, matched peak data was exported using EpiTYPER software and analysed for quality and DNA methylation. We verified gene expression using qRT-PCR of 10 genes in seven controls, five infected and five malignant samples (Supplementary Figure S10). We also show representative loci demonstrating the sites and the distributions of DNA methylation changes in Supplementary Figure S11.

Validation using TCGA data

Using TCGA datasets for HCC, we identified five samples with HCV infection listed as the primary reason for HCC. All five samples had Infinium HumanMethylation450 Methylation microarray and RNA-seq gene expression data available. For DNA methylation and RNA-seq we plotted the density of signal for all genes and our genes of interest.

Polycomb inhibition using the EZH2 inhibitor GSK343

We obtained GSK343, an EZH2 methyltransferase inhibitor, through Santa Cruz Biotechnology, Inc. (Santa Cruz, CA, USA; sc-397025). GSK343 was dissolved in DMSO at a stock concentration of 1.5 mM. HepG2 cells were maintained in Eagle’s Minimum Essential Medium supplemented with 10% fetal bovine serum, streptomycin (100 μg/mL), and penicillin (100 units/mL). A WST-1 assay (Clontech, Takara Bio USA Inc., Mountain View, CA, USA) was performed following the manufacturer’s recommendations. HepG2 cells for control and GSK343 treatment groups were seeded in triplicate in 96 well plates. After 24, 48, 72 and 96 h of treatment with either 5 μM GSK343 or a 0.1% DMSO vehicle control, absorbance was measured after 2 h of incubation with WST-1 to quantify the net metabolic activity of the cells in culture.

For control and treatment group HepG2 cells, chromatin and nucleic acids were isolated at 72 h. Using the primers in Supplementary Table S3, candidate genes were analysed for gene expression, and gene-associated candidate regions were analysed for the presence of DNA methylation and H3K27me3. Gene expression studies were performed using qRT-PCR with intron-spanning primers designed for PCSK6, SLC27A2 and CYP2A7. We used SYBR Green (Roche 4707516001) with the recommended settings for the LightCycler 480. For DNA methylation studies, 1 μg of DNA was bisulphite treated (Zymo Research, Irvine, CA, USA; D5005) and for each gene, regions overlapping informative HpaII sites were chosen for amplification and analysis with MassArray (Sequenom). For histone studies, native ChIP was performed with immunoprecipitation by antibodies against H3K27me3 (Fisher, Waltham, MA, USA; 07449MI) and rabbit IgG. The relative amount of H3K27me3 was quantified using qRT-PCR for the same regions assayed for DNA methylation in PCSK6, SLC27A2 and CYP2A7. All assays were performed in triplicate.

Data and code sharing

All custom code used is available at https://github.com/GreallyLab/HCV_Wijetunga_et_al_2016.

All data have been uploaded to the Gene Expression Omnibus under Accession Number GSE82178 at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE82178.
