Background & Summary

Establishing a well-characterized, renewable source of hepatocytes differentiated from induced pluripotent stem cells (iPSCs) and human embryonic stem cells (hESCs) provides a powerful in vitro model system for analysing the molecular mechanisms associated with hepatogenesis. Several essential initiators of hepatogenesis, such as fibroblast growth factors 2 and 4 (FGF2 and FGF4)1,2, bone morphogenetic protein 2 (BMP2)3, hepatocyte growth factor (HGF), oncostatin M and dexamethasone4, have already been described. These factors are sequentially supplemented into the media during the course of the differentiation process.

Besides detoxification, the liver is responsible for a number of essential functions, e.g. the uptake and storage of glycogen, various metabolic functions, synthesis of bile acids and production of plasma proteins. Available liver cellular models have disadvantages: (i) liver biopsy-derived primary human hepatocytes (PHH) cannot be expanded for long periods in vitro, are often obtained from diseased individuals and are difficult to obtain in sufficient quantities5,6; (ii) transformed, permanent cell lines, such as HepG2 and HepaRG, have cancer phenotypes which diverge significantly from normal primary hepatocytes7–9. A potential alternative is the differentiation of pluripotent stem cells into hepatocyte-like cells (HLCs). Although HLCs derived from iPSCs are not fully mature compared to liver biopsy-derived adult hepatocytes, they have several advantages: they are easily generated from iPSCs and have a known genetic background and disease state, which makes them well suited for disease modelling in vitro, toxicology studies and drug screening. iPSC-based cellular models have already been employed in several such studies10–14.

The liver develops in a stepwise process in vivo: first, competence is established in the foregut endoderm in response to signals emanating from cardiac mesoderm; thereafter, liver-specific gene expression is initiated15. The differentiation of hiPSCs and hESCs also proceeds via the intermediate stages of definitive endoderm (DE) and bipotential hepatic endoderm (HE), followed by maturation into HLCs16. Distinct stage-specific changes in the associated transcriptional regulatory networks control the different phases of hepatogenesis17. Wang et al. describe a developmental progression from unmarked chromatin to poised chromatin and then to histone H3K27 acetylation, which is accompanied by specific transcription factor classes18. They suggest that FOXA transcription factors, known as pioneer factors facilitating the unwinding of chromatin, play a role at poised enhancers, while lineage-specific factors such as PDX1 for the pancreatic and HNF4α for the hepatic lineage drive the transition from the poised to the active enhancer state18.

Attaining maturation comparable to primary hepatocytes is still one of the most challenging issues associated with HLC differentiation. HNF4α has already been described as a major transcription factor regulating hepatic differentiation and maturation15. Additionally, Li et al. reported that HNF4α lies upstream of the transcription factors HNF1α and PXR, suggesting that it might initiate a cascade of gene regulatory networks driving hepatocyte differentiation15. In our publications related to the hESC and hiPSC datasets pertinent to this data descriptor, we confirmed expression of maturation markers such as ALB, HNF4α, HNF1α and TTR16,17.

The data described here consist of microarray gene expression data from hESCs and hiPSCs differentiated into HLCs via the DE and HE stages, together with fetal liver and primary human hepatocyte samples as references. Although transcription factors central to hepatogenesis have been described, these datasets will enable more detailed analyses of the gene regulatory networks associated with modelling hepatogenesis using pluripotent stem cells.

Methods

Human ES and iPS cell culture

Human ES cell lines H1 and H9 (WiCell Research Institute, Madison, Wisconsin) from passage 39 to 66 were maintained under sterile conditions in a humidified incubator in a 5% CO2/95% air atmosphere at 37 °C (INNOVA CO-170 Incubator, New Brunswick Scientific). In routine culture, cells were maintained on Matrigel® in conditioned media (CM)19. Under these culture conditions, hESCs were confirmed to stain positive for OCT4, SSEA-4, TRA-1-60, and TRA-1-81 (ES Cell Characterization Kit, Chemicon). Before initiating the differentiation, cells were washed with PBS without Ca2+ and Mg2+ (Gibco, Invitrogen).

Culture of the iPS cells, which were derived from human neonatal foreskin fibroblasts (HFF1), is described in Matz et al.17.

Differentiation into hepatocyte-like cells (HLCs)

The hESC lines H1 and H9 were differentiated into HLCs16 following protocols described by Hay et al.20 and Agarwal et al.21. RNA samples were extracted after each step of the differentiation protocol.

Differentiation of iPS cells into HLCs17 largely followed the protocol described by Jozefczuk et al.16.

The overall experimental design of this study is illustrated in Figure 1a. Two hESC lines (H1 and H9) and iPSCs derived from neonatal foreskin fibroblasts were used. Both cell types proceeded via the intermediate DE and HE stages to HLCs and were then compared to commercially bought RNA from fetal liver (Stratagene, MVP Total RNA: tissue from single male donor, 18th week of gestation; positive control for the iPSC-based differentiations: Clontech, #636540) and adult liver biopsy-derived primary human hepatocytes (PHH; Ready Heps Fresh Hepatocytes; Lonza, 65-year-old male of Asian origin; positive control for the iPSC-based differentiations: Clontech, #636531).

Figure 1: Comparison of hepatic differentiation of iPSCs and hESCs.

(a) Scheme of comparative hepatic differentiation of iPSCs and hESCs. iPSCs and hESCs are differentiated into HLCs, which can be compared with each other and with fetal liver (FL) and PHHs. The intermediate stages DE and HE are also captured and can thus be subjected to comparative analysis. Hierarchical clustering of hESC (b) and iPSC (c) differentiation into HLCs was performed via pvclust using 1000 bootstrap samples. "au" (approximately unbiased) values are computed by multiscale bootstrap resampling and "bp" (bootstrap probability) values by normal bootstrapping. Red rectangles mark clusters with AU larger than 95%. With one exception in hESC (98%), all clusters in the dendrograms are supported at 100%. All replicates cluster together. HLCs cluster apart from hESCs/iPSCs, DE and HE. Fetal liver and PHH cluster together and are separated from the hESC/iPSC-derived hepatic differentiation stages.

Illumina BeadChip hybridisation

Biotin-labelled cRNA was produced by means of a linear amplification kit (Ambion, Austin, TX, USA) using 500 ng of quality-checked total RNA as input. Chip hybridisations, washing, Cy3-streptavidin staining, and scanning were performed on an Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) using reagents and following protocols supplied by the manufacturer. cRNA samples were hybridised in biological triplicates on HumanRef-8 Expression BeadChips. The following samples were hybridised: undifferentiated cells (H1 cell line), DE (definitive endoderm)-differentiated cells, HE (hepatic endoderm)-differentiated cells and hepatocyte-like cells (HLCs) derived with two independent protocols20,21.

Differentiation experiments of iPSCs were hybridised on Illumina HumanHT-12 BeadChips. For details see the Methods description in Matz et al.17.

Bead-level data were summarized to bead-summary data using the manufacturer's software BeadStudio 3.0 (Illumina) for both the hESC and the iPSC differentiation experiments. Table 1 provides an overview of all samples used for this study.

Table 1 Samples related to data sets in repositories.

Data analysis and statistical methods

For further analysis, the bead-summary data saved in BeadStudio were imported into the Bioconductor environment22 and quantile normalized using the Bioconductor package lumi23. To assess global gene expression similarities within biological replicates and between differentiation stages, pairwise Pearson correlation coefficients were calculated for all samples. Cluster analyses were performed in the R/Bioconductor environment22 with the package pvclust24, using n=1000 bootstrap samples. k-means clustering with k=100 clusters was employed to identify clusters of genes with similar expression changes over the stages of the differentiation protocol. The software is available in Supplementary Data File 1.
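
The normalization and correlation steps can be illustrated with a short sketch. The analysis itself was done in R with the Bioconductor package lumi; the following pure-Python code is only a minimal illustration of the two underlying ideas, not the authors' implementation. The function names and the toy expression values are hypothetical, and ties are broken by position rather than averaged.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all samples.
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: three samples, each measured on four probes.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)

# Pairwise Pearson correlation matrix of all samples vs. each other.
corr = [[pearson(a, b) for b in normalized] for a in normalized]
```

After normalization every sample has the same value distribution, so remaining differences between samples reflect the ordering of probes rather than overall intensity.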

Data Records

Data Record 1

The iPSC-related microarray experiments related to this publication were performed on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) using the Illumina HumanHT-12 BeadChip. The data were uploaded to NCBI GEO and are accessible under Data Citation 1. The dataset (Data Citation 1) was first released to the public with the publication of Matz et al.17.

Data Record 2

The hESC-related microarray experiments related to this publication were performed on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) using the HumanRef-8 Expression BeadChip. The data were uploaded to NCBI GEO and are accessible under Data Citation 2. The dataset (Data Citation 2) is being released for the first time with the publication of this Data Descriptor.

Technical Validation

Transcriptome data

Microarray data were quality controlled via the proprietary Illumina quality control mechanisms. Tables of Pearson correlation coefficients of all samples vs. each other were generated, validating the absence of outliers (Tables 2 and 3). Several samples were investigated in triplicate, all others in duplicate. Figure 1b and Figure 1c show that the replicates cluster together in both the hESC and the iPSC differentiation experiments, as expected. Both bootstrap sampling methods implemented in the pvclust clustering software confirmed that, with one exception (98%), all clusters within the dendrograms are supported at 100%. This demonstrates the validity of the experiments on the level of whole-genome gene expression.
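
To give an intuition for what a bootstrap support value measures, the sketch below resamples genes with replacement and counts how often two replicates remain each other's most highly correlated sample. This is a deliberately simplified stand-in for the "bp" value, assuming correlation as the similarity measure; it does not reproduce pvclust's hierarchical clustering or its multiscale "au" computation. All names and the toy data are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def pair_bootstrap_probability(samples, a, b, n_boot=1000, seed=0):
    """Fraction of gene-resampled datasets in which samples a and b
    are mutual nearest neighbours by Pearson correlation."""
    rng = random.Random(seed)
    names = list(samples)
    n_genes = len(samples[a])
    hits = 0
    for _ in range(n_boot):
        # Draw genes with replacement (ordinary bootstrap).
        genes = [rng.randrange(n_genes) for _ in range(n_genes)]
        resampled = {s: [samples[s][g] for g in genes] for s in names}

        def nearest(s):
            others = [t for t in names if t != s]
            return max(others, key=lambda t: pearson(resampled[s], resampled[t]))

        if nearest(a) == b and nearest(b) == a:
            hits += 1
    return hits / n_boot


# Hypothetical toy data: two replicates each of two clearly distinct stages.
hlc1 = [float(i) for i in range(1, 21)]
hlc2 = [v + (0.1 if i % 2 else -0.1) for i, v in enumerate(hlc1)]
psc1 = [21.0 - v for v in hlc1]
psc2 = [21.0 - v for v in hlc2]
samples = {"HLC_rep1": hlc1, "HLC_rep2": hlc2, "PSC_rep1": psc1, "PSC_rep2": psc2}

bp = pair_bootstrap_probability(samples, "HLC_rep1", "HLC_rep2", n_boot=200)
bp_mixed = pair_bootstrap_probability(samples, "HLC_rep1", "PSC_rep1", n_boot=200)
```

A value close to 1 for a replicate pair indicates that the grouping does not depend on which genes happen to be included, which is the kind of robustness the support values in the dendrograms express.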

Table 2 Pearson correlation coefficients of hESC-derived transcriptome data of all samples vs. each other.
Table 3 Pearson correlation coefficients of iPSC-derived transcriptome data of all samples vs. each other.

k-means clustering to assess differentiation stages and similarity to primary hepatocytes

Normalized gene expression microarray data of the iPSC differentiation experiments were further investigated via a k-means clustering algorithm. The algorithm split the data into 100 clusters of genes with similar expression over all differentiation stages. Associations of genes with clusters are included in the publication by Matz et al.17. Several clusters were representative of distinct differentiation stages. Figure 2a demonstrates that genes from cluster#9 predominantly peak at the HLC stage. These genes were used to make a tissue type prediction via the tool KeyGenes25 (Figure 2b). Based on the normalized gene expression data of these genes, KeyGenes predicted the tissue type "liver" for HLC, fetal liver and PHH samples.
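
The clustering step can be sketched as follows. This is a minimal pure-Python version of Lloyd's k-means algorithm on a hypothetical toy dataset with k=2, whereas the study used k=100 on the full normalized data in R; the function name, the random initialization and the expression values are assumptions made for illustration only.

```python
import random


def kmeans(profiles, k, n_iter=50, seed=0):
    """Lloyd's k-means on gene expression profiles (one list per gene)."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen profiles.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene joins the closest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles over four stages (pluripotent, DE, HE, HLC).
profiles = [
    [10.0, 5.0, 2.0, 1.0],   # pluripotency-like: high at the start
    [9.0, 5.0, 2.0, 1.0],
    [10.0, 4.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 10.0],   # HLC-like: peaks at the last stage
    [1.0, 2.0, 4.0, 9.0],
    [1.0, 3.0, 5.0, 10.0],
]
labels, centroids = kmeans(profiles, k=2)
```

Genes that share a label follow a similar course over the differentiation stages; a cluster whose centroid peaks at one stage is a candidate stage-specific cluster.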

Figure 2: Characterization of hepatocyte-like cells.

(a) Plot of 407 genes from the k-means cluster#9 over all differentiation stages derived from the iPSCs. The plot demonstrates that this cluster is representative of HLCs. (b) KeyGenes tissue classification for the iPSC k-means hepatocyte-like cell (HLC) cluster#9 (source: Matz et al.17). A training set for these Illumina microarray platform data was generated via the KeyGenes tool from NCBI GEO datasets for human liver, brain, intestine, kidney, lung and heart. Genes from the HLC cluster#9 resulting from k-means clustering and HLC, fetal liver (FL) and primary human hepatocyte samples were used as test set. (c) Plot of 263 genes from the k-means cluster#2 over all differentiation stages derived from the hESCs. The plot demonstrates that this cluster is representative of HLCs. (d) KeyGenes tissue-based classification for the hESC k-means hepatocyte-like cell (HLC) cluster#2.

Furthermore, k-means clustering provided several stage-specific clusters, six of which are shown in Figure 3. These represent the stages iPSCs, definitive endoderm, hepatic endoderm, HLCs, fetal liver and PHHs and display a gene expression peak at the respective stage. They include stage-specific markers which in some cases are already known: POU5F1/OCT4 in the iPSC cluster, SOX17 in the DE cluster, AFP in the fetal liver cluster and ALB in the PHH cluster. In Supplementary Fig. S4F of our previous publication related to the iPSC dataset of this data descriptor17, we confirmed PHH-cluster activity of the transcription factors HNF4α and HNF1α, reported by Li et al.15 as factors inducing hepatocyte differentiation, and furthermore revealed the activity of CTCF, ZFX, FOXA2, FOXA1 and CEBPA. Additionally, these datasets may provide new insights into the differences and similarities of the hepatocyte differentiation processes between hESCs and iPSCs. Figure 4, which uses marker genes from the representative k-means clusters, shows that the DE stage and HLCs are very similar between hESC- and iPSC-derived differentiations, while the HE cluster shows a pronounced peak in iPSC-derived cells and a small peak in hESC-derived cells. As a cautionary note, the iPSC and hESC differentiations into HLCs and the microarray-based transcriptome analyses were not conducted simultaneously, which may explain the observed minor variations.

Figure 3: Clusters representative for all stages of differentiation.

Via k-means clustering, genes were assigned to 100 clusters with similar behaviour during the hepatic differentiation process. The plots in (a) show expression of genes in the hESC differentiation experiments associated with cluster#94 representative of hESCs, cluster#23 containing SOX17 representative of DE, cluster#64 representative of HE, cluster#77 representative of HLCs, cluster#19 containing AFP representative of fetal liver, and cluster#59 containing the liver marker ALB representative of PHHs. The plots in (b) show expression of genes in the iPSC differentiation experiments associated with cluster#68 representative of iPSCs, cluster#81 containing SOX17 representative of DE, cluster#37 representative of HE, cluster#51 representative of HLCs, cluster#72 containing AFP representative of fetal liver, and cluster#91 containing the liver marker ALB representative of PHHs.

Figure 4: Comparison of clusters representative for DE, HE and HLCs between hESC- and iPSC differentiation.

Genes from representative clusters for (a) DE (cluster#81), (b) HE (cluster#37, subcluster), (c) HLCs (cluster#51) and (d) cluster#72 containing AFP were compared between hESC- and iPSC-derived lines. Mean and standard error of all genes in the respective clusters are shown. The DE, AFP and HLC clusters show maxima at the associated stages in both differentiations. The HE cluster shows a pronounced peak in iPSC-derived cells and a small peak in hESC-derived cells.
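
The per-stage mean and standard error of a cluster can be computed as in the following minimal sketch. It assumes the standard error of the mean is the sample standard deviation divided by the square root of the number of genes; the stage labels, function name and expression values are hypothetical and do not come from the datasets.

```python
from math import sqrt
from statistics import mean, stdev


def cluster_stage_summary(cluster_profiles, stages):
    """Return {stage: (mean, standard error)} over all genes of a cluster."""
    summary = {}
    for j, stage in enumerate(stages):
        values = [profile[j] for profile in cluster_profiles]
        # Standard error of the mean: sample SD / sqrt(number of genes).
        sem = stdev(values) / sqrt(len(values))
        summary[stage] = (mean(values), sem)
    return summary


stages = ["PSC", "DE", "HE", "HLC"]
# Hypothetical expression profiles of three genes from one cluster.
hlc_cluster = [
    [1.0, 2.0, 5.0, 10.0],
    [1.0, 2.0, 4.0, 9.0],
    [1.0, 3.0, 5.0, 10.0],
]
summary = cluster_stage_summary(hlc_cluster, stages)

# Stage at which the cluster mean is maximal.
peak_stage = max(stages, key=lambda s: summary[s][0])
```

Computing the same summary for the hESC- and iPSC-derived series and comparing the peak stages is one simple way to read a comparison like the one in Figure 4.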

Usage Notes

The microarray experiments related to this publication were performed on the Illumina BeadStation 500 platform (Illumina, San Diego, CA, USA) but on different BeadChips. The iPSC-derived differentiations were hybridized using the Illumina HumanHT-12 BeadChip, while the hESC-derived differentiations were hybridized using the HumanRef-8 Expression BeadChip. The differing chip types should be taken into account when comparing transcriptomics data between hESC-derived and iPSC-derived experiments. Further points which should be considered are: (1) the fetal liver RNA was derived from liver homogenates containing all cells, but the adult liver samples were derived from isolated hepatocytes; (2) the liver disease for which the biopsy was performed could have an influence on the dataset; (3) the two different differentiation protocols used may affect the data.
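
One possible way to account for the differing chip types, shown here only as a hedged sketch and not as a procedure used in this study, is to collapse probes to gene symbols on each platform and to restrict comparisons to genes present on both. The function names, the probe-to-symbol pairs and all values below are hypothetical.

```python
from statistics import mean


def collapse_to_genes(probe_table):
    """Average all probes that map to the same gene symbol.

    probe_table: list of (gene_symbol, [value per sample]) pairs.
    Returns {gene_symbol: [mean value per sample]}.
    """
    by_gene = {}
    for symbol, values in probe_table:
        by_gene.setdefault(symbol, []).append(values)
    return {
        symbol: [mean(col) for col in zip(*rows)]
        for symbol, rows in by_gene.items()
    }


def shared_genes(genes_a, genes_b):
    """Gene symbols measured on both platforms, sorted alphabetically."""
    return sorted(set(genes_a) & set(genes_b))


# Hypothetical probe-level data for two samples per platform.
ht12_probes = [
    ("ALB", [10.0, 12.0]),
    ("ALB", [12.0, 14.0]),   # second probe for the same gene
    ("AFP", [8.0, 9.0]),
    ("SOX17", [3.0, 4.0]),
]
ref8_probes = [
    ("ALB", [11.0, 11.5]),
    ("AFP", [7.0, 7.5]),
    ("POU5F1", [2.0, 2.5]),
]

ht12_genes = collapse_to_genes(ht12_probes)
ref8_genes = collapse_to_genes(ref8_probes)
common = shared_genes(ht12_genes, ref8_genes)
```

Even on the shared genes, probe sequences differ between the two BeadChips, so absolute intensities are not directly comparable; comparing stage-wise patterns within each platform is the more cautious reading.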

Additional information

How to cite this article: Wruck, W. & Adjaye, J. Human pluripotent stem cell derived HLC transcriptome data enables molecular dissection of hepatogenesis. Sci. Data 5:180035 doi:10.1038/sdata.2018.35 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.