The construction of intrahepatic cholangiocarcinoma model in zebrafish

Intrahepatic cholangiocarcinoma (ICC) is a highly malignant tumor, difficult to diagnose even at an early stage. In this study, we successfully constructed an nras 61K-induced ICC model in zebrafish. Transcriptome analysis and gene set enrichment analysis (GSEA) of liver samples of the ICC and WT (wild-type) zebrafish revealed that the genes differentially expressed between the two groups were mainly involved in focal adhesion, chemokine signaling and metabolic pathways. Analysis of DNA methylomes revealed that compared with WT samples, methylated genes in ICC samples were enriched in functions associated with cellular, single-organism and metabolic processes. In particular, our result discovered eleven potential biomarker genes of ICC which were conserved between zebrafish and humans. Moreover, three potential biomarker genes were hypomethylated in the tumorigenesis of ICC: ehf, epha4 and itgb6. In summary, our study provides a comprehensive analysis of molecular mechanisms accompanying the progressive nras 61K-induced ICC. This work indicates that our transgenic zebrafish could be a valuable model, not only for studying liver cancer, but also for exploring new therapeutic targets.

example, hypermethylation in promoter regions of tumor suppressor genes often results in transcriptional silencing of those genes, which then drives the cancer initiation 20 . In addition, DNA methylation changes in other gene body regions in oncogenes may also contribute to the pathogenesis of cancer 21 . Abnormal DNA methylation frequently appears in various cancers. With regards to the liver cancer research, studies revealed massive epigenetic alternations in HCC, indicating that deregulation of DNA methylation plays an important role in tumorigenesis and metastasis 22,23 .
In this study, we successfully constructed an ICC model in zebrafish using the liver-specific fabp10 (fatty acid binding protein 10) promoter to overexpress oncogenic nras 61K specifically in the transgenic zebrafish liver, evidenced by histological diagnosis. Moreover, we utilized this zebrafish ICC model to investigate genome-wide DNA methylomes in reference to normal zebrafish. Our study provided a comprehensive analysis of molecular mechanisms during the progressive nras 61K -induced ICC. This work indicates that our transgenic zebrafish could be a valuable model, not only for studying liver cancer but also for exploring new therapeutic targets.

Results
Generation of Tg(fabp10:nras 61K ) transgenic zebrafish. We constructed Tg(fabp10:nras 61K ) transgenic zebrafish, in which the gene expression was driven under the control of the liver-specific fabp10 promoter ( Supplementary Fig. 1A). To verify if Tg(fabp10:nras 61K ) transgenic zebrafish was successful, we examined the fluorescence of two-day-old larvae by inspecting the green fluorescence of the GFP gene in the livers of F 0 fish. Western blot analyses confirmed that the fusion protein Nras 61K -GFP could be detected in the ICC samples, but not WT ( Supplementary Fig. 1B,C). Two founders transmitted the transgene to their progenies (F 1 ) and carried the transgene insertions in germ cells. In addition, to identify the integration locus, we sequenced the 5′ flanking fragment of expression cassettes in two lines. The integration site of line 1 was idntified at the loci 27630228 to 27631132 of the chromosome 15, while the integration site of line 2 was idntified at the loci 55341075 to 55341374 of the chromosome 20. Both of the two integration loci of cassettes are not tagged genes or introns of genes.
Moreover, to validate ICC phenotype, ICC tumor marker called CK19, was monitored in Tg(fabp10:nras 61K ) zebrafish livers. Strong positive signaling was observed using IHC (Fig. 3C,D). In addition, since Ras signals through MEK and ERK proteins via phosphorylation, we have identified the protein expression levels of phospho-MEK1/2 and phospho-ERK in WT and in Tg(fabp10:nras 61K ) transgenic zebrafish livers. The IHC of liver sections showed apparently enhanced cytoplasmic and nuclear staining of phospho-MEK1/2 and phospho-ERK in ICC ( Fig. 3E-H). Furthermore, in vitro study revealed that the green fluorescent protein (GFP) could not activate the phosphorylation of MEK and ERK ( Supplementary Fig. 2).
Transcriptomics of Tg(fabp10:nras 61K ) liver tumorigenesis. For the purpose of investigating the genes involved in nras 61K -induced ICC, we analysed the global gene expression in Tg(fabp10:nras 61K ) transgenic zebrafish livers. By applying the data filtering criteria (the adaptor sequences; unknown bases more than 10%; the percentage of no more than Q 5 bases is over 50% in a read) have been filtered, we obtained 48.6 million 90-bp sequence reads per sample. Nevertheless, approximately 34.6 in WT and 34.9 million reads in Tg(fabp10:nras 61K ) transgenic zebrafish belonged to the zebrafish genome by analysis as described by Bowtie 24 along with enrichment analysis with Noiseq. 25 . 661 down-regulated-and 1214 up-regulated-differentially expressed genes in Tg(fabp10:nras 61K ) transgenic zebrafish were observed ( Fig. 4A and Supplementary Table 1). Functions of different gene expressions in the ICC were involved in metabolic, focal adhesion, and chemokine signaling pathways (Fig. 4B).
Moreover, Gene Set Enrichment Analysis (GSEA) was performed to analyze the similarities among the gene-expression profiles from the Tg(fabp10:nras 61K ) transgenic zebrafish and two previous studies reported by the Obama 26 and Jinawath 27 groups. Both studies determined gene-expression profiles in tumor samples from patients with intrahepatic cholangiocarcinoma (n = 25 for Obama group; n = 40 for Jinawath group). Our results revealed that three upregulated genes and 24 downregulated genes derived from the Tg(fabp10:nras 61K ) transgenic zebrafish model showed similarity with the significantly enriched gene sets in these two studies (Supplementary Table 2). In addition, previous studies showed that 25 significantly mutated genes 28 and top 42 up-regulated genes 29 in ICC patients are likely to be good candidates for ICC markers. In this study, eleven similar up-regulated markers were identified (log 2 fold change ≥1 and P ≤ 0.05) and validated using the RT-qPCR (Table 2 and Supplementary Fig. 3), suggesting that the liver morphological changes are in connection with the alterations of gene expression.  760.5 M raw reads were produced. After removing low-quality and clonal reads, we obtained 608.3 M effective reads, and the sequence yield for final analysis was 54.3 gigabase pairs (Gb), covering 90.75% of all cytosines in the genome with an average depth of 22.77 per strand. Initially we observed overall genome-wide methylation levels of 72.49% at CG, 0.54% at CHG and 0.51% at CHH sites (H = A, C or T), indicating higher CG methylation than non-CG methylation. The overall CG methylation level of transgenic zebrafish was lower than that in WT zebrafish (75.82%), but it was still considered to be relatively high. Methylation distribution displayed the similar bimodal (6.37% hypo (<20%) methylated, 62.2% hyper (>80%) methylated) methylation pattern to the WT zebrafish liver cells (5.47% and 62.5%, respectively). This indicates that there was no bias introduced by the whole-genome approach used here. The heat map below presents the distribution of methylation levels in each genomic feature of the liver samples of Tg(fabp10:nras 61K ) transgenic zebrafish (Fig. 5). As shown in Fig. 5., the thin black lines within each heat map denote the median methylation level of CGs at the given local density. Our result show different median methylation level of CGs in the non-coding RNA sequences, exons and repeat elements of Tg(fabp10:nras 61K ) transgenic zebrafish compared with WT, suggesting it may have a role in the tumorigenesis. Furthermore, all transcriptional units were divided into six distinct functional elements to demonstrate the changes of methylation level. Our result indicate that two discrete switch over zones, upstream of the TSS and the exon 1, demarcate the transition from hypo-to hypermethylation in the inverse relationship between promoter and gene-body methylation and expression both in liver samples of Tg(fabp10:nras 61K ) transgenic and WT zebrafish (data not shown).
To functionally categorize the methylated genes of Tg(fabp10:nras 61K ) transgenic and normal samples, we performed the BGI WEGO (Web Gene Ontology Annotation Plotting) analysis, which revealed significant differences (Fig. 6). Compared with normal samples, methylated genes in Tg(fabp10:nras 61K ) transgenic zebrafish were enriched in cellular component and binding activities. As for biological processes and cellular component, they tend to be enriched in functions associated with cellular process, single-organism process and metabolic process (Fig. 6).   down-regulated at transcription level, while 924 genes remained unchanged. On the contrary, when the CG methylation was down-regulated, the number of up-regulated, down-regulated and unchanged genes at transcription level were 406, 182 and 4011, respectively ( Supplementary Fig. 4). Particularly, the methylation level of 11 obviously up-regulated potential ICC marker genes mentioned above was investigated and the result revealed that three genes were changed, including ehf, epha4 and itgb6 (Table 3). Both epha4 and itgb6 have one specific CpG site, while ehf has two specific CpG sites. Moreover, we used bisulfite sequencing PCR to assess the methylation status of the CpG islands in these genes, and found that these CpG islands were hypomethylated (Supplementary Table 3).

Discussion
Intrahepatic cholangiocarcinoma is a highly malignant liver cancer that affects 5-10% of all primary liver cancers. ICC arises in the epithelial bile ducts in the biliary tree and shows quite different characteristics from HCC. Although it is a relatively rare malignancy, it deserves immediate attention because of its steady and substantial   Table 2. Eleven potential ICC marker genes were obviously up-regulated in nras 61K transgenic zebrafish. * Those genes are highly mutated in human ICC. # Those genes are highly expressed in human ICC.
increase ranging from 0.32 to 0.85 per 100,000 people worldwide over the last 30 years 30,31 . It is particularly prevalent in Asian countries, for instance, the prevalence of ICC in Thailand is 96 per 100,000 people, which is 100-fold higher than the worldwide average 32 . Unfortunately, ICC is very difficult to diagnose and is usually associated with high mortality due to its late clinical presentation and the lack of effective non-surgical therapeutics 33 .
Here we report an in vivo ICC model using the transgenic overexpression of oncogene nras 61K in zebrafish. This work is reminiscent of a previous study that reported a HCC transgenic model which constitutively expresses the kras V12 oncogene in the zebrafish liver 6 . Intriguingly, our study uncovered that even using the same liver-specific fabp10 promoter, a different kind of ras oncogene would lead to distinct type of liver cancer. One explanation for the phenotypic differences between kras V12 and nras 61K is that they are expressed at different levels within specific cell types 34 . Another previous study reported that hepatocytes may be capable of changing their   35 . It is suggested that nras 61K may contribute to this process 35 . However, further studies are necessary to determine whether nras 61K is responsible for the conversion of hepatocytes to the biliary lineage cells, and next, to induce the malignant transformation to the intrahepatic cholangiocarcinoma.
Transcriptomic analyses show that genes related to metabolic pathways are most abundant in the ICC samples, identifying the importance of metabolic modulation during the tumorigenesis in ICC. It was reported that tmprss4 (Transmembrane protease serine 4) is up-regulated in a broad spectrum of cancers 36 . Overexpression of tmprss4 significantly promotes the invasion, migration, adhesion and metastasis in HCC 37 . In our study, it was demonstrated that tmprss4 might serve as an evolutionarily conserved marker in liver cancer. Furthermore, several gene signatures which acted as potential markers for ICC 29,30 were analysed in this study. Moreover, our data uncovered that 11 of these potential ICC marker genes (Table 3) were obviously up-regulated in the Tg(fabp10:nras 61K ) transgenic zebrafish, indicating that these genes may play critical roles in the nras 61K liver tumorigenesis and that our model provides a good platform for ICC studies.
Deregulation of developmental genes by hypomethylation of CG islands appears to be one of the major factors driving tumorigenesis 23 . Recently, several studies reported DNA methylation in nras-mutated cancers. For instance, Furlan et al. 38 found that DNA demethylation together with specific nras mutations drives the early steps of oxidative damage colorectal tumourigenesis. Another study reported that the hypomethylated genes may play important roles in the pathogenesis of nras 61K -mutated melanoma 21 . Therefore, we explored the genome-wide methylation of Tg(fabp10:nras 61K ) transgenic zebrafish. As shown in Fig. 6, methylated genes in Tg(fabp10:nras 61K ) transgenic fish were observed, associated with cellular components, binding activities and biological processes such as cellular process, metabolic process and single-organism process. It is of note that genes involved in metabolic process are the largest group in differentially expressed genes, which suggests that genes in metabolic pathways are of considerable significance to the formation of ICC through methylation. Furthermore, our result uncovered a rich landscape of distinct epigenomic features such as repeat elements, coding, and non-coding sequences, in the livers of Tg(fabp10:nras 61K ) transgenic zebrafish. For example, exons were clearly distinguishable from introns by elevated methylation levels, delimited by sharp intron-exon boundaries. This finding was consistent with some recent reports that exons can be determined by epigenetic marks 39,40 . Our data provided biological information to analyse features that have previously been difficult to assess such as repeat elements 41 .
In addition, it is very crucial in terms of cancer diagnosis that cancer-/cancer-stage-specific biomarkers are identified 42 . In this study, one of our goals is to identify ICC-related biomarkers by virtue of methylome and transcriptomic analysis. Our results showed that three potential ICC marker genes were hypomethylated in liver samples of the 12 months post fertilization (mpf) Tg(fabp10:nras 61K ) transgenic fish. One notable example is epha4 (EPH tyrosine kinase receptor A4), a member of the receptor tyrosine kinases gene family, modulating the epithelial-mesenchymal transition process, which was hypomethylated in our results. Previous study reported that epha4 promotes cell proliferation and migration in glioblastoma 43 . However, epha4 mRNA is significantly down-regulated in HCC tissues with the ability to inhibit the cancer cell migration and invasion, and promoting cell adhesion 44 . Interestingly, epha4 was up-regulated (5.07-log 2 fold change, p value < 0.0001) in our ICC samples, suggesting the exsistence of different mechanisms in ICC, prompting further investigations.
Moreover, hypomethylated loci featured prominently in gene bodies (mainly in introns) but not promoter regions, which is similar to the previous report of the nras 61K -mutated melanoma 21 . One explanation is that the CpG-rich sequences in these intron of the oncogenes were potential mediators linking the oncogene and its transcription factor. For instance, Lee et al. 45 reported that the transcription factor MZF1, which binds to CpGs in the first intron of PRAME (preferentially expressed antigen in melanoma) gene, could induce the up-regulation of PRAME by increasing DNA hypomethylation, and then promote the proliferation of melanoma cells. However, the mechanisms of the DNA hypomethylation in these ICC marker genes remain to be elucidated in the future.
In summary, we successfully constructed a Tg(fabp10:nras 61K ) transgenic model and created an ICC animal model. Our model provides a good tool to investigate the molecular events involved in the progression of bile duct neoplasms. The methodologies utilized to study the whole-gene methylation and transcriptomes during the development of ICC supplied the tools to study the meachnisms of ICC. Consequently, the final purpose of  Table 3. Three potential ICC marker genes were hypomethylated in nras 61K transgenic zebrafish.
research in this field is to find and screen therapeutic drugs, and we believe that the ICC model can be applied to high-throughput screen the anti-ICC drugs.

Materials and Methods
Vector construction and production of Tg(fabp10:nras 61K ) transgenic zebrafish lines. The liver-driver construction was made by the insertion of zebrafish liver-specific fatty acid-binding protein (fabp10) promotor at the Apal I and Nhe I sites into the vector pAcGFP1-N1 (Clontech, Mountain View, CA, 632469). Human nras 61K gene was inserted into the vector pAcGFP1-N1 at the Sal I and BamH I sites. The cDNA encoding human nras 61K gene was inserted at the C-terminal of the liver-specific fabp10 promotor and the N-terminal of the green fluorescent protein (GFP) provided by the pAcGFP1-N1 vector. Linearized plasmid of pAcGFP1-N1-fabp10-nras 61K was injected into one-cell embryos of the AB strain zebrafish. Experiments involving zebrafish in this study were approved by the Animal Research and Ethics Committees of Institute of Hydrobiology, Chinese Academy of Sciences, and all experiments were conducted in accordance with the guidelines of the committees.
Tumor screening and histopathological analysis. The GFP-positive embryos were raised to adults and out-crossed to wild-type fish for testing germline transmission. Transgenic fish were dissected to expose the abdominal area and liver morphology observed under the SMZ1600 stereomicroscope (Nikon, Tokyo, Japan). Liver tumor in transgenic fish was defined as GFP-marked liver which was enlarged to at least twice the size of a WT normal liver. Liver samples were then fixed in 4% paraformaldehyde (PFA) for at least 24 hours by dehydration through a series of graded ethanol solutions and embedding in paraffin. Identification and mapping of integration sites. Genomic DNA was isolated from the F 3 transgenic line and digested with Dra I, EcoR V, Stu I, and Pvu II. The digested DNA was purified and ligated with GenomeWalker TM adaptor (Clontech, Mountain View, CA, 638904). Primary PCR was performed using primer AP1 and gene-specific primer (GSP1: 5′-AAA TAC TCA GAG CAG CCC ATC TGG-3′). The PCR product was 50-fold diluted as a template and secondary PCR was performed using AP2 and gene-specific primer (GSP2: 5′-GTC GAC TCC CTT TAG TGA GGG TTA-3′). PCR products larger than 1 kb were cloned and sequenced. These sequence reads were mapped to the zebrafish genome using BLASTN (GRCz10, November 2015 freeze in the Ensembl).
Transcriptome sequencing and analysis. Liver tissues from both WT and Tg(fabp10:nras 61K ) transgenic zebrafish were collected at 12 mpf. Livers of four fish were pooled to generate one sample. Total RNA was purified by beads containing oligo (dT). Purified mRNA was then fragmented in fragmentation buffer. Using these short fragments as templates, random hexamer-primers were applied to synthesize the first-strand cDNA. BS-seq analysis. BS-seq reads were mapped to the reference genome using SOAP2 48 as described 49 , allowing up to 6 mismatches for 90 bp paired-end reads. Multiple reads mapping to the same position were regarded as PCR duplicates, and only one of them was kept. However, bases with a quality score less than 20 were not considered for subsequent analysis.
The error rate of each library (sum of the non conversion rate and T/C sequencing errors) was calculated as the total number of sequenced Cs divided by the total sequencing depth for sites corresponding to Cs in the Lambda genome. The error rate for each library was approximately 0.5%. To distinguish true mCs from false positives, we applied a model based on the binomial distribution B (n, p) following 49 , and only the mCs with FDR 50 adjusting to P-values less than 0.01 were considered true positives. All experiments were performed in triplicate.

Estimation of methylation level and bisulfite DNA sequencing PCR analysis.
To estimate the methylation level of a single base accurately, the CpG cytosines with a per-strand depth of less than 4 have been excluded from the analysis. Methylation level of an individual CpG was determined by the number of reads containing a C at the site of interest divided by the total number of reads covering the site. Meanwhile, methylation level of a specific region was determined by the sum of methylation levels of individual CpGs in the region divided by the total number of overed CpGs in this region.
Genomic DNA was isolated from liver tissues of both WT and Tg(fabp10:nras 61K ) transgenic zebrafish (12 mpf). Livers of four fish were pooled to generate one sample. The DNA samples were treated with sodium bisulfite to convert cytosine to uracil using the BisulFlash ™ DNA Modification kit (Epigentek, USA, P-1026-050) according to the manufacturer's instruction. Bisulfite sequencing was performed as described previously 51 , and the amplified bisulfite sequencing PCR products were sequenced to determine the methylation status of each CpG site. Primer sequences are shown in Supplementary Table 5.

Statistical analysis.
Differentially expressed genes (DEGs) were selected by applying the statistical significance threshold of log2-fold >1.0 or <−1.0, with the false discovery rate (FDR) threshold set at <0.01 and p < 0.01. Gene Ontology (GO) analysis was applied to analyze the primary functions of the DEGs using MetaCore software (GeneGo). The data are reported as average results of three independent experiments.
RT-PCR data are reported as mean + S.E.M. of three independent experiments. Statistical analysis (unpaired t-test) was performed using GraphPad Prism 5 (GraphPad Software Inc.).