Human pluripotent stem cell derived HLC transcriptome data enables molecular dissection of hepatogenesis

Induced pluripotent stem cells (iPSCs) and human embryonic stem cells (hESCs) differentiated into hepatocyte-like cells (HLCs) provide a defined and renewable source of cells for drug screening, toxicology and regenerative medicine. We previously reprogrammed human fetal foreskin fibroblast cells (HFF1) into iPSCs employing an episomal plasmid-based integration-free approach; this iPSC line and the hESC lines H1 and H9 were used to model hepatogenesis in vitro. Biochemical characterisation confirmed glycogen storage, ICG uptake and release, urea and bile acid production, as well as CYP3A4 activity. Microarray-based transcriptome analyses were carried out using RNA isolated from the undifferentiated pluripotent stem cells and the subsequent differentiation stages: definitive endoderm (DE), hepatic endoderm (HE) and HLCs. K-means clustering identified 100 distinct clusters with, for example, POU5F1/OCT4 marking the undifferentiated stage, SOX17 the DE stage, HNF4α the HE stage, and ALB specific to HLCs, fetal liver and primary human hepatocytes (PHH). This data descriptor describes these datasets, which should be useful for gaining new insights into the molecular basis of hepatogenesis and associated gene regulatory networks.


Wasco Wruck 1 & James Adjaye 1

Background & Summary
A well-characterized, renewable source of hepatocytes differentiated from iPSCs and hESCs provides a powerful in vitro model system for analysing the molecular mechanisms associated with hepatogenesis. Several essential initiators of hepatogenesis, such as fibroblast growth factors 2 and 4 (FGF2 and FGF4) 1,2, bone morphogenetic protein 2 (BMP2) 3, hepatocyte growth factor (HGF), oncostatin M and dexamethasone 4, have already been described. These factors are supplemented sequentially into the media during the course of the differentiation process. Besides detoxification, the liver is responsible for a number of essential functions, e.g. the uptake and storage of glycogen, various metabolic functions, synthesis of bile acids and production of plasma proteins. Available liver cellular models have disadvantages: (i) liver biopsy-derived primary human hepatocytes (PHH) cannot be expanded for long periods in vitro, are often obtained from diseased individuals and are difficult to obtain in sufficient quantities 5,6; (ii) transformed, permanent cell lines, such as HepG2 and HepaRG, have cancer phenotypes which diverge significantly from normal primary hepatocytes [7][8][9]. A potential alternative is the differentiation of pluripotent stem cells into hepatocyte-like cells (HLCs). Although HLCs derived from iPSCs are not fully mature compared to liver biopsy-derived adult hepatocytes, they have many advantages: they are easily generated from iPSCs and have a known genetic background and disease state, which makes them well suited for disease modelling in vitro, toxicology studies and drug screening. iPSC-based cellular models have already been employed in several studies for drug screening, toxicology studies and disease modelling [10][11][12][13][14].
The liver develops in a stepwise process in vivo: first, competence is established in the foregut endoderm in response to signals emanating from cardiac mesoderm; thereafter, liver-specific gene expression is initiated 15. The differentiation of hiPSCs and hESCs likewise proceeds via the intermediate step of definitive endoderm and the bipotential hepatic endoderm, followed by maturation into HLCs 16. Distinct stage-specific changes in the associated transcriptional regulatory networks control the different phases of hepatogenesis 17. Wang et al. describe a developmental progression from unmarked chromatin to poised chromatin and then to histone H3K27 acetylation, which is accompanied by specific transcription factor classes 18. They suggest that FOXA transcription factors, known as pioneer factors that facilitate the unwinding of chromatin, play a role at poised enhancers, while lineage-specific factors such as PDX1 for the pancreatic and HNF4α for the hepatic lineage drive the transition from the poised to an active enhancer state 18.
Attaining maturation comparable to primary hepatocytes is still one of the most challenging issues associated with HLC differentiation. The role of HNF4α as a major transcription factor regulating hepatic differentiation and maturation has already been described 15. Additionally, Li et al. reported that HNF4α lies upstream of the transcription factors HNF1α and PXR, suggesting that it might initiate a cascade of gene regulatory networks driving hepatocyte differentiation 15. In our publications related to the hESC and hiPSC datasets pertinent to this data descriptor, we confirmed expression of maturation markers such as ALB, HNF4α, HNF1α and TTR 16,17.
The data described consist of microarray gene expression data from hESCs and hiPSCs differentiated into HLCs via the DE and HE stages, together with fetal liver and primary human hepatocyte samples as references. Although transcription factors central to hepatogenesis have been described, the datasets described here will enable more detailed analyses of the gene regulatory networks associated with modelling hepatogenesis using pluripotent stem cells.

Human ES and iPS cells culture
Human ES cell lines H1 and H9 (WiCell Research Institute, Madison, Wisconsin) from passage 39 to 66 were maintained under sterile conditions in a humidified incubator in a 5% CO2-95% air atmosphere at 37°C (INNOVA CO-170 Incubator, New Brunswick Scientific). In routine culture, cells were maintained on Matrigel® in conditioned media (CM) 19. Under these culture conditions, hESCs were confirmed to stain positive for OCT4, SSEA-4, TRA-1-60, and TRA-1-81 (ES Cell Characterization Kit, Chemicon). Before differentiation was initiated, cells were washed with PBS without Ca2+ and Mg2+ (Gibco, Invitrogen).
Culture of the iPS cells, which were derived from human neonatal foreskin fibroblasts (HFF1), is described in Matz et al. 17.

Differentiation into hepatocyte-like cells (HLCs)
The derivation of HLCs from the hESC lines H1 and H9 16 followed protocols described by Hay et al. 20 and Agarwal et al. 21 . RNA samples were extracted after each step of the differentiation protocol.
Differentiation of iPS cells into HLCs 17 largely followed the protocol described by Jozefczuk et al. 16.
The overall experimental design of this study is illustrated in Figure 1a. Two hESC lines (H1 and H9) and fetal foreskin-derived iPSCs were used. Both the hESCs and the iPSCs were differentiated via the intermediate DE and HE stages into HLCs and then compared to commercially purchased RNA from fetal liver (Stratagene, MVP Total RNA: tissue from single male donor, 18th week of gestation; positive control for the iPSC-based differentiations: Clontech, #636540) and from adult liver biopsy-derived primary human hepatocytes, PHH (Ready Heps Fresh Hepatocytes; Lonza, 65-year old male of Asian origin; positive control for the iPSC-based differentiations: Clontech, #636531).

Illumina BeadChip hybridisation
Biotin-labelled cRNA was produced by means of a linear amplification kit (Ambion, Austin, TX, USA) using 500 ng of quality-checked total RNA as input. Chip hybridisations, washing, Cy3-streptavidin staining, and scanning were performed on an Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) using reagents and following protocols supplied by the manufacturer. cRNA samples were hybridised in biological triplicates on HumanRef-8 Expression BeadChips. The following samples were hybridised: undifferentiated cells (H1 cell line), definitive endoderm (DE)-differentiated cells, hepatic endoderm (HE)-differentiated cells and hepatocyte-like cells (HLCs) derived with two independent protocols 20,21.
Differentiation experiments of iPSCs were hybridised on Illumina HumanHT-12 BeadChips. For details see the Methods description in Matz et al. 17 .
Bead-level data were summarised to bead-summary data using the manufacturer's software BeadStudio 3.0 (Illumina) for both the hESC and the iPSC differentiation experiments. Table 1 provides an overview of all samples used for this study.

Data analysis and statistical methods
For further analysis, the bead-summary data saved in BeadStudio were imported into the Bioconductor environment 22 and quantile normalized using the Bioconductor package lumi 23. To assess global gene expression similarities within biological replicates and between differentiation stages, pairwise Pearson correlation coefficients were calculated for all samples. Cluster analyses were performed using the R/Bioconductor environment 22 and the package pvclust 24 using n = 1000 for bootstrap sampling. k-means clustering was employed to identify clusters of genes with similar gene expression changes over the stages of the differentiation protocol, using k = 100 as the number of clusters. The software is available in Supplementary Data File 1.
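The normalization and correlation steps described above can be illustrated with a minimal Python sketch. It does not reproduce the lumi implementation used for the datasets: the function names, the NumPy-based quantile normalization (which ignores ties) and the synthetic four-sample matrix are assumptions made purely for illustration.

```python
import numpy as np

def quantile_normalize(expr):
    """Give every sample (column) the same empirical distribution.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are not averaged in this simple version.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)                 # rank order within each sample
    reference = np.sort(expr, axis=0).mean(axis=1)   # mean value at each rank
    normalized = np.empty_like(expr)
    for j in range(expr.shape[1]):
        normalized[order[:, j], j] = reference
    return normalized

def sample_correlations(expr):
    """Pairwise Pearson correlation between samples (columns)."""
    return np.corrcoef(expr, rowvar=False)

# Synthetic toy data (NOT the published measurements): two hypothetical
# stages, two replicates each, with different overall intensity scales.
rng = np.random.default_rng(0)
n_probes = 500
stage_a = rng.lognormal(mean=7.0, sigma=1.0, size=n_probes)
stage_b = rng.lognormal(mean=7.0, sigma=1.0, size=n_probes)
samples = ["A_rep1", "A_rep2", "B_rep1", "B_rep2"]
raw = np.column_stack([
    1.0 * stage_a * rng.normal(1.0, 0.05, n_probes),
    1.6 * stage_a * rng.normal(1.0, 0.05, n_probes),
    0.8 * stage_b * rng.normal(1.0, 0.05, n_probes),
    1.3 * stage_b * rng.normal(1.0, 0.05, n_probes),
])

normalized = quantile_normalize(raw)
corr = sample_correlations(normalized)
for name, row in zip(samples, corr):
    print(name, np.round(row, 3))
```

In such a correlation table, replicates of the same stage are expected to correlate more strongly with each other than with samples from other stages, which is the pattern used to check for outliers.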

Technical Validation
Transcriptome data
Microarray data were quality controlled via the proprietary Illumina quality control mechanisms. Tables of Pearson correlation coefficients of all samples against each other were generated to validate the absence of outliers (Tables 2 and 3). Several samples were investigated in triplicates, all others in duplicates. Figure 1b and Figure 1c show that the replicates cluster together in both the hESC and the iPSC differentiation experiments, as expected. Both bootstrap sampling methods implemented in the pvclust clustering software confirmed that all clusters within the dendrogram are supported by the data at 100%, with one exception (98%). This demonstrates the validity of the experiments at the level of whole-genome gene expression.

k-means clustering to assess differentiation stages and similarity to primary hepatocytes
Normalized gene expression microarray data of the iPSC differentiation experiments were further investigated via a k-means clustering algorithm. The algorithm split the data into 100 clusters of genes with similar expression over all differentiation stages. Associations of genes with clusters are included in the publication by Matz et al. 17. Several clusters were representative of distinct differentiation stages. Genes from cluster#9 were employed to make a tissue type prediction via the tool KeyGenes 25 (Figure 2a). Based on the normalized gene expression data of these genes, KeyGenes predicted the tissue type "liver" for the HLC, fetal liver and PHH samples. Figure 2b shows that the expression of genes from cluster#9 predominantly peaks at the HLC stage. Furthermore, k-means clustering provided several stage-specific clusters, six of which are shown in Figure 3. These represent the stages iPSCs, definitive endoderm, hepatic endoderm, HLCs, fetal liver and PHHs and display a gene expression peak at the respective stage.
They include stage-specific markers, some of which are already known: POU5F1/OCT4 in the iPSC cluster, SOX17 in the DE cluster, AFP in the fetal liver cluster and ALB in the PHH cluster. In Supplementary Fig. S4F [...] differentiation and furthermore reveal the activity of CTCF, ZFX, FOXA2, FOXA1 and CEBPA. Additionally, these datasets may provide new insights into the differences and similarities of the hepatocyte differentiation processes between hESCs and iPSCs. Figure 4, which uses marker genes from the representative k-means clusters, shows that the DE stage and HLCs are very similar between hESC- and iPSC-derived differentiations, while the HE stage shows a pronounced peak in iPSC-derived cells and a small peak in hESC-derived cells. As a cautionary note, the iPSC and hESC differentiations into HLCs, and also the microarray-based transcriptome analyses, were not conducted simultaneously, which may account for the observed minor variations.
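The idea of stage-specific k-means clusters described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code from Supplementary Data File 1: the data are synthetic, k is set to 6 rather than the 100 used for the datasets, and the per-gene z-scoring and the deterministic farthest-point initialization are choices made here for reproducibility.

```python
import numpy as np

STAGES = ["iPSC", "DE", "HE", "HLC", "fetal liver", "PHH"]

def zscore_rows(x):
    """Standardize each gene (row) across stages so clusters reflect shape."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def farthest_point_init(x, k):
    """Deterministic seeding (an assumption): start from the first gene,
    then repeatedly add the gene farthest from all chosen centroids."""
    centroids = [x[0]]
    for _ in range(k - 1):
        chosen = np.array(centroids)
        dist = np.linalg.norm(x[:, None, :] - chosen[None, :, :], axis=2)
        centroids.append(x[np.argmax(dist.min(axis=1))])
    return np.array(centroids)

def kmeans(x, k, n_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then update."""
    centroids = farthest_point_init(x, k)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids

# Synthetic toy profiles (NOT the published data): 30 hypothetical genes
# peaking at each stage, on a noisy baseline.
rng = np.random.default_rng(1)
true_peak = np.repeat(np.arange(len(STAGES)), 30)
profiles = 8.0 + rng.normal(0.0, 0.2, size=(len(true_peak), len(STAGES)))
profiles[np.arange(len(true_peak)), true_peak] += 3.0

labels, centroids = kmeans(zscore_rows(profiles), k=len(STAGES))

# Call a cluster "representative" of the stage where its centroid peaks.
cluster_stage = {c: STAGES[int(centroids[c].argmax())] for c in range(len(STAGES))}
for c, stage in sorted(cluster_stage.items()):
    print(f"cluster#{c}: {int(np.sum(labels == c))} genes, peak at {stage}")
```

With real array data, many of the 100 clusters will not show a single clean peak, so only a subset can be assigned to a stage in this way, as with the six clusters shown in Figure 3.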

Usage Notes
The microarray experiments related to this publication were performed on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) but on different BeadChips. The iPSC-derived differentiations were hybridized on the Illumina HumanHT-12 BeadChip, while the hESC-derived differentiations were hybridized on the HumanRef-8 Expression BeadChip. The differing chip types should be taken into account when comparing transcriptomics data between hESC-derived and iPSC-derived experiments. Further points which should be considered are: (1) the fetal liver RNA was derived from liver homogenates containing all cells, but the adult liver samples were derived from isolated hepatocytes; (2) the liver disease for which the biopsy was performed could have an influence on the dataset; (3) the two different differentiation protocols used may affect the data.

[...] 17 ). Based on NCBI GEO datasets for human liver, brain, intestine, kidney, lung and heart, a training set for these Illumina microarray platform data was generated via the KeyGenes tool. Genes from the HLC cluster#9 resulting from k-means clustering and the HLC, fetal liver (FL) and primary human hepatocyte samples were used as the test set. (c) Plot of 263 genes from the k-means cluster#2 over all differentiation stages derived from the hESCs. The plot demonstrates that this cluster is representative of HLCs. (d) KeyGenes tissue-based classification for the hESC k-means hepatocyte-like cell (HLC) cluster#2.

Figure 3. Clusters representative for all stages of differentiation. Via k-means clustering, genes were assigned to 100 clusters showing similar behaviour during the hepatic differentiation process. The plots in (a) show expression of genes in the hESC-differentiation experiments associated with cluster#94 representative for hESCs, cluster#23 containing SOX17 representative for DE, cluster#64 representative for HE, cluster#77 representative for HLCs, cluster#19 containing AFP representative for fetal liver, and cluster#59 containing the liver marker ALB representative for PHHs. The plots in (b) show expression of genes in the iPSC-differentiation experiments associated with cluster#68 representative for iPSCs, cluster#81 containing SOX17 representative for DE, cluster#37 representative for HE, cluster#51 representative for HLCs, cluster#72 containing AFP representative for fetal liver, and cluster#91 containing the liver marker ALB representative for PHHs.
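As the Usage Notes point out, the hESC and iPSC series were hybridized on different BeadChips. One possible way to compare them is to restrict the analysis to genes present on both chips and to compare the shape of each gene's stage profile rather than raw intensities. The Python sketch below illustrates this; it is not a procedure taken from this data descriptor. The function names and the platform-only placeholder genes are hypothetical, and all expression values are invented for illustration, not measured.

```python
import numpy as np

STAGES = ["undifferentiated", "DE", "HE", "HLC"]

def standardize_within_series(expr):
    """z-score each gene (row) across the stages of a single series."""
    expr = np.asarray(expr, dtype=float)
    return (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)

def compare_series(genes_a, expr_a, genes_b, expr_b):
    """Match genes by symbol and correlate their stage profiles.

    Pearson correlation is unaffected by the z-scoring; the z-scores are
    returned so that both series can be plotted on a common scale.
    """
    shared = sorted(set(genes_a) & set(genes_b))
    za = standardize_within_series(np.asarray(expr_a, float)[[genes_a.index(g) for g in shared]])
    zb = standardize_within_series(np.asarray(expr_b, float)[[genes_b.index(g) for g in shared]])
    per_gene_r = {g: float(np.corrcoef(za[i], zb[i])[0, 1]) for i, g in enumerate(shared)}
    return shared, za, zb, per_gene_r

# Invented log2-scale values, one column per stage in STAGES order.
hesc_genes = ["POU5F1", "SOX17", "HNF4A", "ALB", "ONLY_ON_CHIP_A"]
hesc_expr = [[12, 9, 7, 6], [6, 11, 8, 7], [6, 7, 10, 9], [6, 6, 8, 13], [7, 7, 7.5, 8]]
ipsc_genes = ["ALB", "HNF4A", "ONLY_ON_CHIP_B", "POU5F1", "SOX17"]
ipsc_expr = [[7, 7, 9, 14], [7, 8, 12.5, 9.5], [9, 9, 8, 8.5], [14, 10, 8, 7.5], [7, 12.5, 9, 8]]

shared, z_hesc, z_ipsc, per_gene_r = compare_series(hesc_genes, hesc_expr, ipsc_genes, ipsc_expr)
for gene in shared:
    print(f"{gene}: profile correlation r = {per_gene_r[gene]:.3f}")
```

Matching by gene symbol is itself a simplification: probes on the two chip types may target different transcripts of the same gene, so probe annotations should be checked before drawing conclusions from such a comparison.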