Extensive reprogramming of the nascent transcriptome during iPSC to hepatocyte differentiation

Hepatocyte-like cells (HLCs) derived from induced pluripotent stem cells (iPSCs) provide a renewable source of cells for drug discovery, disease modelling and cell-based therapies. Here, by using GRO-Seq we provide the first genome-wide analysis of the nascent RNAs in iPSCs, HLCs and primary hepatocytes to extend our understanding of the transcriptional changes occurring during hepatic differentiation process. We demonstrate that a large fraction of hepatocyte-specific genes are regulated at transcriptional level and identify hundreds of differentially expressed non-coding RNAs (ncRNAs), including primary miRNAs (pri-miRNAs) and long non-coding RNAs (lncRNAs). Differentiation induced alternative transcription start site (TSS) usage between the cell types as evidenced for miR-221/222 and miR-3613/15a/16-1 clusters. We demonstrate that lncRNAs and coding genes are tightly co-expressed and could thus be co-regulated. Finally, we identified sets of transcriptional regulators that might drive transcriptional changes during hepatocyte differentiation. These included RARG, E2F1, SP1 and FOXH1, which were associated with the down-regulated transcripts, and hepatocyte-specific TFs such as FOXA1, FOXA2, HNF1B, HNF4A and CEBPA, as well as RXR, PPAR, AP-1, JUNB, JUND and BATF, which were associated with up-regulated transcripts. In summary, this study clarifies the role of regulatory ncRNAs and TFs in differentiation of HLCs from iPSCs.

Primary human hepatocytes (PHH) are extensively needed for both basic and translational science but their use is restricted by the lack of donors, and the limited number of cells since functional PHHs cannot be expanded in vitro and they rapidly lose their functionality during culture 1 . Stem cells offer an alternative, as they possess the ability to self-replicate and differentiate into all cell types in the body. Several protocols have been published for the production of hepatocyte-like cells (HLCs) from human embryonic stem cells and human induced pluripotent stem cells (iPSCs) through a definitive endoderm stage 2 . Producing hepatocytes from iPSCs offers a way to study different aspects of hepatocyte function in a patient-specific cell model, thus in the context of the genetic background of the patient.
Transcription constitutes the first step of gene expression and hence a fundamental cellular function. Microarrays have widely been replaced by massively parallel sequencing methods like RNA-sequencing (RNA-seq) as a method of studying whole transcriptomes (known and novel transcripts) 3 . RNA-seq enriches for stable mRNAs, which accumulate to steady-state levels in cells whereas global run-on sequencing (GRO-seq) measures production rates of primary transcripts by directly measuring nascent RNA production 4 . Hence, GRO-seq identifies the genes that are being transcribed at a given time point in a cell. It provides extensive information on the location and quantity of coding and non-coding transcripts, including promoter-and enhancer associated long non-coding RNAs (lncRNAs) and primary microRNAs (pri-miRNAs 5,6 ). Furthermore, GRO-seq signal being independent of the stability of the transcripts produced, captures the correlation between gene Gene expression profiling by GRO-seq. Protein-coding genes comprised most of the transcripts detected in all sample groups, as expected and ncRNAs such as pri-miRNAs and lncRNAs were the second prevalent transcript type ( Supplementary Fig. S2G).
Protein-coding genes. Heatmap representation of the protein-coding genes illustrates the differentially expressed genes between the sample groups (Fig. 1A). In total 1942 up-and 2100 down-regulated protein-coding genes were identified when comparing the PHHs to iPSCs (Fig. 1B). To gain insight on which key biological pathways are active in PHHs, a gene ontology (GO) enrichment analysis was conducted for the differentially expressed genes (DEGs), separately for the sets of up- (Supplementary Table S1) and down-regulated protein-coding genes (Supplementary Table S2) in PHHs vs. iPSCs. For up-regulated genes, the most enriched terms were drug metabolic process, bile acid related processes and regulation of plasma lipoprotein levels (Fig. 1C). Furthermore, other liver-related biological processes such as organic acid metabolism, several lipid-related processes, regulation of hormone levels and blood coagulation were enriched. The down-regulated genes in PHHs (i.e. up-regulated in iPSCs) were enriched for GO terms related to e.g. stem cell proliferation and differentiation. When comparing the expression profiles of M1-and M2-HLCs to the iPSCs, in total 1422/2074 up-and 466/2030 down-regulated protein-coding genes were observed in M1/M2-HLCs, respectively (Fig. 1B). The up-regulated genes in M1/ M2-HLCs (vs. iPSCs) shared enrichment of several GO terms with PHHs. Specifically, the drug metabolic process, regulation of hormone levels, lipid/lipoprotein and bile acid related pathways were enriched in M2-HLCs and PHHs, and immune response, regulation of plasma lipoprotein particle levels, blood coagulation and response to drug in M1-HLCs and PHHs (Fig. 1C).
Analysis on restricted set of pluripotency and liver-specific genes. Several known pluripotency TFs such as SALL4, OCT4, SOX2, LIN28, NANOG and TEAD2 had highest expression levels in the iPSCs (Fig. 1D) whereas liver-specific genes like APOH, CPS1, ASGR1, ALB and ITIH1 were statistically significantly up-regulated in PHHs compared with iPSCs. In the M1-and M2-HLCs, the gene expression levels of the plasma proteins (e.g. FGA, FGB, APOA2, APOB), liver-related genes (ALB, AFP, TTR) and several key hepatic TFs such as HNF4A, HNF1B, HNF1A, FOXA1 and FOXA2 were well in concordance with other published studies with similar differentiation protocols and total liver expression profiles (Supplementary Figs S3 and S4 and Supplementary  Table S3).
Next, we used mirNET tool 11 to construct liver-specific miRNA target networks connecting miRNA to genes that had been validated by CLIP-studies (CLIP = Crosslinking and immunoprecipitation). This analysis identified 29 hub miRNAs that could regulate ~3000 target mRNAs (Fig. 2B). The hub miRNAs included pluripotency miR-NAs miR-302b and -302c, and liver-related miRNAs like miR-21, miR-192 and miR-34a. Similar to protein coding genes, the expression of the hub miRNAs clustered the M2-HLCs and PHHs closer together than the M1-HLCs and PHHs (Fig. 2C), and the GO analysis indicated that the hub miRNA target genes were mostly related to regulation of translation, initiation of transcription, regulation of protein metabolism as well as cell cycle (Fig. 2D). MiR-34a, -29a, -29c and -192 exhibited an increasing expression in the order M1-< M2-HLCs < PHHs (Fig. 2C). To validate the GRO-seq results, expression of eight pri-miRNAs was measured at mature miRNA level by RT-qPCR in M1-and M2-HLCs, as well as in PHHs and human liver (hTLR). In line with the GRO-seq results, miR-302b, miR-9*, miR-221-3p and miR-222-3p expression was highest in the iPSCs. On the other hand, miR-21-5p and miR-122 expression was higher in the HLCs, PHHs and hTLR. Expression of miR-29a-3p and miR-15a-5p was highest in PHHs and hTLR, and higher in M2-than M1-HLCs. (Supplementary Fig. S2H).
To further categorize the lncRNAs, we divided them into promoter-(36%) and enhancer-associated (64%) (Supplementary Table S7) based on the prominent presence of H3K4me3 at the promoters and H3K4me1 at the enhancers 12 . Plotting the expression of differentially regulated lncRNAs (identified in PHHs vs. iPSCs) for M1-and M2-HLCs, demonstrated again higher correlation between M2-HLCs and PHH compared to M1-HLCs and PHH, for both promoter-and enhancer associated lncRNAs (Fig. 3C). Interestingly, 534 of the lncRNAs overlapped with hepatocyte-specific super-enhancers [SEs; 13 ], majority of which were induced in differentiated HLCs. In line with previous findings, these SEs were found in close proximity to genes important for hepatocyte function and transcriptional control including FOXA2 (Fig. 3D) 13,14 .
Finally, we applied the guilt-by-association principle to infer functions of the identified lncRNAs. This approach is based on a correlation analysis between lncRNA and protein-coding gene expression in combination with the GO enrichment analysis of the protein-coding mRNA 15 . Altogether, a good correlation was detected between lncRNA and adjacent coding gene expression, irrespective of the lncRNA category (Fig. 3E). The pattern of GO term enrichment in the lncRNA target genes was similar to the pattern detected in protein-coding genes. The highest enrichment was detected for e.g. the lipid homeostasis, regulation of plasma lipoprotein particle levels, bile acid and salt transport, blood coagulation and response to drug (Fig. 3F). On the contrary, the 'drug metabolic process' was not detected in the HLC lncRNAs target genes and the enrichment was low even in the PHH lncRNAs target genes (Fig. 3F). This is opposite to protein-coding genes where 'drug metabolic process' was the most enriched process in the PHHs and highly enriched in the M2-HLCs (Fig. 1C).

Changes in regulatory motifs reflect the altered repertoire of transcription factors (tFs).
Majority of binding sites for lineage-specific TFs locate at inter-and intragenic enhancer regions, outside the www.nature.com/scientificreports www.nature.com/scientificreports/ promoters of coding genes 16,17 . Recent studies have further demonstrated that highly active enhancers display H3K4me3-promoter mark 18 but also that promoters can serve as enhancers 19,20 . This prompted us to focus the identification of regulator TF binding sites to the TSSs of all lncRNAs identified above. The top TF motifs found enriched at up-regulated TSSs were SP1, E2F2 and ETS whereas the down-regulated (in PHHs vs. iPSCs) TSSs were enriched for POU6F1, RARG and SIX motifs (Fig. 4A). This suggests that a subset of TFs that exhibit significant changes in response to differentiation could be responsible for establishing the hepatocyte-specific transcriptional profiles. Altogether, we identified 137 TFs statistically significantly up-and 174 down-regulated in PHHs vs. iPSCs (Supplementary Table S8). Among the most up-regulated TFs were liver-related HNF4A, HNF1B, FOXA1 and FOXA2 21 , which were also up-regulated in the HLCs but remained statistically significantly lower in the M1-than M2-HLCs. Among the most down-regulated TFs were pluripotency-related FOXD3 22 , ZIC3, SOX2 and NANOG 23 , as well as SOX11 21 and OTX2 24 . These were all down-regulated during the HLC differentiation but remained statistically significantly higher in the M1-than M2-HLCs, in which the expression reached the levels detected in the PHHs (Supplementary Table S8).
To study the contribution of the TFs to transcriptional programs, we correlated the TF expression levels (of TFs for which motifs have been well established) with the relative enrichment of the motif at regulated lncRNA TSSs (Fig. 4B,C). Our analysis identified several subgroups: (i) the down-regulated transcripts being enriched for the binding of pluripotency and cell cycle associated TFs such as RARG, E2F1, SP1 and FOXH1; (ii) the up-regulated transcripts being enriched for potential transcriptional repressors, whose expression is dampened in hepatocytes exemplified by TEAD2, TEAD4, TCF3, TCF4 and FOXM1; (iii) the up-regulated lncRNA transcripts  www.nature.com/scientificreports www.nature.com/scientificreports/ being enriched for the binding of hepatocyte-specific TFs such as FOXA1, FOXA2, HNF1B, HNF4A and CEBPA; as well as AP-1 along with its subunits/family members JUNB, JUND and BATF (Fig. 4C). Also, the motifs of RXR and PPAR were enriched in the up-regulated lncRNAs transcripts.

Discussion
In the present study, we produced iPSCs from patient-derived fibroblasts, and differentiated them into hepatocyte-like cells (HLCs) by two different methods modified from the protocols originally described by Si-Tayeb 9 ; [referred to as M1] and Hay 10 ; [referred to as M2]. Moreover, we measured nascent RNA transcription in iPSCs, HLCs and primary human hepatocytes (PHHs) by GRO-sequencing 4 , which provides a "map" of the position and orientation of all engaged RNA polymerases across the genome at extremely high resolution. We demonstrate, for the first time, extensive changes in the nascent transcriptional profiles during the iPSC to HLC differentiation, concurrently delineating the expression profile of modulatory ncRNAs like miRNAs and lncRNAs.
Maturation of HLCs remains a challenge as they continue to express AFP, which is a marker of a fetal, immature hepatocyte phenotype. Despite the persisting AFP positivity, the iPSC-derived HLCs possessed hepatocyte functionality, which was confirmed by albumin, urea and triglyceride secretion. Extensive analysis of nascent coding and non-coding RNA expression provided further evidence on successful hepatic differentiation by M1 and M2 methods. Through pathway analyses, we further demonstrated an enrichment of liver-related GO terms in both M1-and M2-HLCs but the enrichment remained lower than in PHHs. Recently these differentiation protocols have been updated and new methods developed [25][26][27][28] , and future studies are needed to identify the optimal protocol. However, our study provides several candidate genes among the coding and non-coding RNAs that could be used as markers to evaluate the efficiency of differentiation.
To be usable as an in vitro model for studying human hepatic drug metabolism, transport, clearance and toxicity, the HLCs have to express a set of enzymes and transporters involved in hepatic drug clearance. The most important drug-metabolizing enzymes belong to the cytochrome P450 (CYP) superfamily. Overall, the CYP gene expression levels were lower in the HLCs than the PHHs. This was in line with the GO enrichment analyses showing that drug metabolic process was highly enriched in the PHHs but less in the HLCs. The expression of CYP3A family is of interest as it, especially CYP3A4, is responsible for the metabolism of majority of drugs and highly expressed in the liver 29,30 . CYP3A4 was highly expressed in the PHHs and statistically significantly more in M2-than M1-HLCs. Another major isoenzyme in human hepatocytes is CYP1A2 31 , which was expressed only at very low levels in the HLCs. The CYP1A1 level, on the other hand, was highest in M2-HLCs similarly as in a previous study detecting higher CYP1A1 than CYP1A2 levels in ESC-and iPSC-derived hepatocytes 32 . High CYP1A1 expression in M2-HLCs speaks against the cells being of fetal phenotype as fetal hepatocytes express very low levels of this CYP enzyme 33 . In conclusion, regardless of the large variety of CYP enzymes expressed in the HLCs, the PHHs still appear superior. However, we must acknowledge the limitation that our analysis is based on transcriptional level and further studies are needed to compare the differences in the drug metabolizing capacities of M1-and M2-HLCs compared to PHHs.
The abundance of pri-miRNAs is low at steady state and hence, they are poorly represented in standard RNA-seq libraries. GRO-seq measures the nascent transcriptome, thus enabling the analysis of active transcription at miRNA loci. This allowed us to detect the expression of 564 pri-miRNAs of which 45% were differentially regulated. The expression of pri-miRNA clusters, like the miR-302-367 and miR-17-92, encoding for pluripotency and proliferation-promoting miRNAs 34 was highest in the iPSCs, with a gradient of declining expression in M1-> M2-HLCs > PHHs. Pluripotency TFs OCT4, SOX2 and Nanog are known upstream regulators of the miR-302 cluster 35 , which subsequently regulates cell cycle through cyclin D1 and Cdk4 36 . The cell cycle dynamics has a crucial impact on the differentiation potential of stem cells with longer G1 phase inducing differentiation in ESCs 37 . The important role of differentially expressed miRNAs in the regulation of cell cycle was further supported by the established role of many hub miRNAs, such as miR-21 38 , miR-15a 39 , miR-34a 40 , -29a 41 , -29c 42 and -192 43 , and the miRNA target gene functions in cell cycle. Consequently, through regulation of the cell cycle miRNAs may participate in regulating proliferation, which reduces during iPSC to hepatocyte differentiation 44 .
Multiple TSSs are an important regulatory feature at miRNA loci and alternative TSS usage plays an important role in transcriptional control 45 . Alternative TSS usage can affect gene expression and result in tissue-specific and temporally regulated expression 46 . We have earlier shown usage of multiple TSSs to be common within intergenic pri-miRNA genes 6 . Here we show that active TSS locations change in response to hepatocyte differentiation from progenitors, as exemplified by alternative TSS usage at the miR-221/222 and miR-3613/15a/16-1 loci. In conclusion, alternative pri-miRNA TSS usage can take part in orchestrating the intricately controlled expression of specific genes during the iPSC to HLC differentiation. However, future studies are needed to elucidate how changes in pri-miRNAs levels or TSS usage translate to changes at mature miRNA level, and further on to changes at miRNA target gene levels.
The lncRNAs have no coding potential but exhibit tissue-specific expression 47 . A recent study identified >58000 lncRNA genes 48 but the function is validated for only a small fraction of them 49 . LncRNA are known to take part in epigenetic, transcriptional (as activators or repressors), and post-transcriptional regulation of gene expression, as well as regulate stem cell pluripotency and differentiation 50 . The high levels of lncRNA-ROR (regulator of reprogramming) in the iPSCs in our study coincides with previous studies showing lncRNA-ROR to promote pluripotency and stem cell survival 51 . Also other pluripotency-associated lncRNAs like TUNAR 52 and LINC00458 53 decreased during the iPSC to HLC differentiation. LINC00458 is highly expressed in the early stages of hepatic differentiation 54 , and it modulates pluripotency by physically interacting with SOX2 53 . TUNAR mediates the recruitment of RNA-binding proteins to SOX2, NANOG and FGF4 promoters, thus activating their transcription and promoting pluripotency 52 . Concurrently, the expression of these pluripotency markers decreased during differentiation. Based on the histone methylation marks, we deduced that >60% of www.nature.com/scientificreports www.nature.com/scientificreports/ the lncRNAs were enhancer-associated. We detected good correlation between the expression of lncRNAs and adjacent coding genes, irrespective of the lncRNA category, suggesting that lncRNA and coding genes could be tightly co-expressed and thus co-regulated. Interestingly, several up-regulated lncRNAs (in HLCs vs. iPSCs) were situated close to KLF6, which is vital for definitive endoderm formation during hepatocyte differentiation 55 . In addition, in PHH and M2-HLCs lnc-MUC4-1 was upregulated, the nearest gene of which is MUC20. The protein encoded by this gene, Mucin-20, interacts with Met and eventually attenuates ERK1/2 activation, which is important for cell cycle regulation as well as hepatocyte survival 56 . Interestingly, inhibition of the ERK pathway has been shown to extend the life span of hepatocytes in culture as well as increase hepatocyte differentiation 57 . Thus, the lncRNAs could play an important part in the hepatocyte differentiation through regulating e.g. endoderm formation and cell cycle as well as liver-related functions in the differentiated HLCs. The molecular mechanisms behind this regulation need to be explored in further studies.
Correlation of the TF expression with the relative enrichment of the motifs at regulated lncRNA TSSs allowed us to identify a distinct set of transcriptional regulators that could participate in orchestrating the transcriptional changes in response to differentiation. Motifs of TFs like RARG, E2F1, SP1 and FOXH1, whose expression was repressed in PHHs and M2-HLCs, were enriched in the down-regulated transcripts in all the hepatocytes. For example, RARG activates endogenous OCT4 expression and is thus important for somatic cell reprogramming to iPSCs 58 . On the other hand, E2F1 transcription factor is involved in regulating the cell cycle 59 again suggesting that modulating the cell cycle progression is vital for successful HLC differentiation. FOXH1 mainly acts as a transcriptional activator and could be pivotal for the formation of mesendoderm 60 , the major precursor of definitive endoderm 61 . Its higher levels in M1-compared to M2-HLCs suggests that endodermal features persist in M1-HLCs. Motifs of transcriptional regulators like TEAD2, TEAD4, TCF3, TCF4 and FOXM1, whose expression was repressed in PHHs and M2-HLCs, were enriched in the up-regulated transcripts suggesting that the main function of these factors was to repress gene expression in progenitor cells. The TEAD1-4 proteins interact with YAP to form functional heterodimeric TFs that act in the Hippo signaling pathway regulating cell proliferation, differentiation and cell death 62 . TEAD2 has earlier been shown to be highly expressed in embryonic liver and dramatically downregulated in adult liver 63 . Thus, higher levels of TEAD2 in M1-HLCs suggests they could possess a more proliferative and embryonic phenotype than the M2-HLCs. Similarly, FOXM1, a pro-proliferative TF acting as a stimulator of cell cycle progression and expression of pluripotency genes 64 , had high expression in the iPSCs and higher in M1-compared to M2-HLCs. Finally, the upregulated lncRNA transcripts in PHHs and HLCs were enriched for the binding of hepatocyte-specific TFs such as FOXA1, FOXA2, HNF1B, HNF4A and CEBPA. Most of these TFs and their binding sites have previously been correlated with a set of genes that are enriched with mature liver functions 21 . HNF1B, HNF4A, FOXA1 and FOXA2 are vital for hepatocyte differentiation 65 , and they were all more induced in HLCs. Moreover, HNF4A is known to maintain hepatic expression of certain miRNAs like miR-192 and miR-194 66 , and thus play a key role in the indirect regulation of hepatic transcriptome. Interestingly, PPARD and RXR that are known to act as heterodimers 67 and regulate genes important for cell differentiation as well as lipid and cholesterol metabolism 68,69 were both up-regulated in PHH and M2-HLCs (vs. iPSC). Another up-regulated TF in PHH and M2-HLCs was BATF, which increases chromatin accessibility to allow subsequent binding by other TFs 70 . BATF plays an essential role in the differentiation and function of various effector T cell subsets by regulating expression of key TFs, cytokines as well as cytokine and chemokine receptors 71 . BATF acts as a pioneering factor by binding cooperatively with the TF IRF4 and its dimerization partners like JunB and JunD 70 . Our data suggests that it might also play a previously unrecognized role in the hepatocyte differentiation as the up-regulated lncRNA transcripts were highly enriched for the BATF motif as well as the motifs of its binding partners JUNB and JUND. This information could provide future means for the generation of more mature hepatocytes through modulation of TF expression.

Conclusions
We present here the first study characterizing the nascent transcriptome of the iPSC-derived HLCs that expands the analysis from protein coding genes to lncRNAs and pri-miRNAs. We provide evidence that abundance of transcriptional regulators and miRNAs central to mature hepatocyte function are indeed regulated at transcriptional level. Moreover, we identified potential novel master regulators that could orchestrate the transcriptional changes during hepatocyte-specific differentiation. Altogether, this information could be used to improve the current hepatic differentiation protocols aiming at mature hepatocyte phenotypes.

Material and Methods
ipsC reprogramming and cell culture. The study was approved by the Ethical Committee of Pirkanmaa Hospital District (Ethical approval no R12123) and written informed consent was obtained from all fibroblast donors. The iPSC lines were produced from skin fibroblasts with Sendai virus vectors and maintained as published before 72 . All experiments were performed in accordance with relevant guidelines and regulations. Patients donating skin biopsies signed an informed consent after receiving both an oral and written description of the study. Altogether three iPSC lines (UTA.10100.EURCAs, UTA.11104.EURCAs, and UTA.11304.EURCCs) were used in this study, and the lines were characterized as previously described 73 .

Differentiation of iPSCs into hepatocyte-like cells (HLCs).
To differentiate iPSCs to HLCs we used two methods derived from the ones described by Si-Tayeb et al. 9 (Method 1, M1) and Hay et al. 10 (Method 2, M2), both consisting of three main stages (Supplementary Fig. S1A). Details of the differentiation protocols and their modifications are described in the supplement.