Introduction

Embryonic stem cells (ESCs), derived from the inner cell mass of the blastocyst, are defined by two properties: pluripotency and self-renewal. They can divide indefinitely in vitro while maintaining the capacity to generate all the cell types of an adult organism. The unique identity of ESCs is governed by a network of transcription factors along with epigenetic factors1,2. The epigenetic status of ESCs features an open chromatin structure with characteristic histone and DNA modification profiles.

Somatic cells can acquire ESC properties through nuclear reprogramming. Three major approaches, namely somatic cell nuclear transfer (SCNT), cell fusion and introduction of defined transcription factors, have been established to reprogram somatic cells to pluripotency3,4. The last approach was first reported by Yamanaka and colleague5, who demonstrated that combined expression of the transcription factors Oct4, Sox2, Klf4 and c-Myc is capable of reprogramming somatic cells into ESC-like cells, termed induced pluripotent stem cells (iPSCs). Since its initial report, this technology has attracted great attention and motivated many investigations because of its technical simplicity and tremendous potential for application in regenerative medicine.

iPSCs have been shown to be highly similar to ESCs in terms of transcription program, chromatin modification profiles5,6,7,8,9,10,11,12 and global chromatin configuration6,13,14. Functionally, at least some iPSCs have a developmental potential equivalent to that of ESCs, as entirely iPSC-derived animals (“all-iPSC” mice) can be generated through tetraploid complementation15,16,17. Although the approach for iPSC generation is well established3,18, questions remain as to how reprogramming factors drive somatic cells into iPSCs and why the reprogramming process is so inefficient, both in the time it takes and in the rate at which starting cells are converted to iPSCs.

Since any cell fate change is largely an epigenetic process, it is conceivable that epigenetic barriers restrict the transition from somatic cells to iPSCs. Overcoming these epigenetic barriers might be a prerequisite for successful generation of iPSCs. Consistent with this notion, many epigenetic factors (Table 1) and chemical modulators of epigenetic modifications (Table 2) are capable of affecting reprogramming efficiency. In this review, we first discuss the roles of epigenetic regulation in ESC maintenance and iPSC generation. We then discuss our current understanding of the mechanisms underlying iPSC generation with a focus on epigenetic reprogramming.

Table 1 Epigenetic factors involved in iPSC generation
Table 2 Epigenetic-modulating small molecules that affect iPSC generation

Epigenetic regulations in ESCs and reprogramming to iPSCs

The pluripotent state of ESCs is enforced by epigenetic factors closely linked to the pluripotency transcription factor network1,2. Resetting the epigenetic state of somatic cells to that of ESCs is one of the ultimate tasks for the reprogramming factors in iPSC generation. The epigenetic factors involved in maintaining the pluripotency of ESCs must be activated through the reprogramming process. Furthermore, epigenetic modulating strategies must be used to overcome the inherent somatic epigenetic state. Therefore, some epigenetic factors may function specifically to erase somatic epigenetic statuses. In this section, we discuss the detailed roles of epigenetic modulations in iPSC generation by juxtaposing the functions of these modulations in maintaining ESC identity and in establishing iPSC pluripotency.

Global chromatin reorganization in iPSC generation

Compared with differentiated cells, ESCs display distinctive chromatin features related to their unique properties. The chromatin in ESCs is in an “open” state, with more accessible chromatin domains and fewer heterochromatin foci. In contrast, highly condensed heterochromatin foci are prevalent in lineage-committed somatic cells19,20,21,22. Consistent with this, repressive histone modifications are less prevalent genome-wide in ESCs than in differentiated cells23,24, and active histone modifications are more abundant in ESCs19,20,25,26. Additionally, the hyperdynamics of nuclear proteins19 and hyperactivity of global transcription in ESCs20 also indicate that the ESC chromatin is in a permissive state.

During iPSC generation, the somatic cell chromatin needs to be reorganized to an ESC-like state with loosely organized heterochromatin and abundant euchromatin modifications13,14. It appears that the chromatin reorganization events take place in a coordinated and sequential manner. Rearrangement of the heterochromatin, characterized by the presence of histone H3 lysine 9 trimethylation (H3K9me3) and HP1, precedes the activation of Nanog, while enrichment of euchromatin marks occurs concurrently with Nanog activation14. Consistently, heterochromatin is rearranged and becomes dispersed when dual inhibition of MEK and GSK3 converts partially reprogrammed cells to iPSCs13. Thus, chromatin reorganization from the somatic state to an ESC-like one seems to be required for the activation of the pluripotency circuitry. However, such a drastic chromatin rearrangement appears to occur with substantial latency in the reprogramming process. Detailed characterization of these changes at the molecular level is difficult because only a low percentage of somatic cells are successfully converted to iPSCs.

To overcome the problem associated with cell population heterogeneity during the reprogramming process, one study focused on examining the histone modification changes in the first several cell cycles after the induction of the reprogramming factors. In this initial stage of reprogramming, a global change in H3K4me2 distribution is observed27. De novo acquisition or further enrichment of this modification occurs at a large number of gene loci, including those encoding pluripotency factors and their targets. Although H3K4me2 is classified as an active chromatin mark, a change in the local H3K4me2 level does not result in a change in gene expression27. Altered expression patterns are mainly detected at H3K4me3-marked gene loci27, indicating that somatic chromatin status may restrict the activity of reprogramming factors. Consistent with this notion, the distribution of the H3K27me3 repressive mark is largely unchanged27, implying that a drastic transition of the epigenetic landscape has yet to occur after these initial changes.

ATP-dependent chromatin remodeling

Global or local chromatin structure is regulated in part by ATP-dependent chromatin-remodeling factors. These factors are capable of regulating DNA accessibility by depositing, replacing or evicting nucleosomes28. Multiple chromatin-remodeling factors belonging to different classes have been shown to regulate the ESC identity. The SWI/SNF class ESC-specific BAF (esBAF) complex is essential for ESC maintenance. ESCs lacking the esBAF components are deficient in self-renewal and display an abnormal differentiation program29,30,31,32. The esBAF catalytic subunit Brg1 shares a substantial portion of targets with core pluripotency factors32,33, and contributes to pluripotency with dual functions in transcription regulation34: Brg1 facilitates the activation of its targets involved in the LIF-STAT3 pathway, an essential pathway for ESC self-renewal, by antagonizing Polycomb repressive complex (PRC) 2-mediated repression; meanwhile, it reinforces the repression on the differentiation-related Hox gene loci34. By tracing nuclear fractions that can transiently activate the Oct4 locus, esBAF components were found to be capable of increasing reprogramming efficiency35, consistent with their role in shaping the chromatin state in ESCs. In the presence of esBAF components, euchromatin features at certain pluripotency gene loci are more prominent at the intermediate stage of reprogramming, and the accessibility of these loci to the reprogramming factors is enhanced35.

The CHD class remodeling factor, Chd1, preferentially binds to euchromatin and colocalizes with RNA polymerase II (Pol II)36. Chd1 helps maintain the open chromatin in ESCs, as depletion of Chd1 leads to accumulation of heterochromatin and interferes with proper differentiation. Consistently, Chd1 deficiency reduces reprogramming efficiency36, indicating that establishing the ESC chromatin state is crucial for acquiring pluripotency.

Two other CHD class remodeling factors, Chd3 and Chd4 (also known as Mi2-α and Mi2-β), reside in the NuRD complex, which also harbors the histone deacetylases HDAC1 and HDAC2. In ESCs, NuRD functionally converges with other repressive machinery, including PRC2 and the H3K4me2-specific demethylase Kdm1a (also called Lsd1 or Aof2), while eliciting effects opposite to Brg1-mediated effects37,38,39. ESCs lacking Mbd3, an essential component of NuRD, exhibit elevated expression of certain pluripotency genes. This expression change is associated with LIF-independent self-renewal capacity and deficiency in lineage commitment upon differentiation40,41. Moreover, NuRD has been shown to contribute to the autorepression of a set of pluripotency genes whose expression is subject to negative autoregulatory feedback control in serum-cultured ESCs41,42. In particular, NuRD is recruited by the transcription factor Zfp281 and mediates Nanog autorepression42. The repressive effect of Zfp281 impedes Nanog activation during iPSC generation, and depletion of Zfp281 facilitates the conversion of partially reprogrammed cells, or “pre-iPSCs”, to iPSCs42. It will be interesting to determine whether depletion of NuRD results in similar effects on iPSC generation.

The INO80 family Tip60-p400 complex, which possesses both chromatin remodeling and histone acetyltransferase (HAT) activities, is also essential for ESC maintenance. ESCs lacking Tip60-p400 subunits fail to self-renew or differentiate efficiently43. Tip60-p400 potentially regulates genes bound by Nanog and marked by H3K4me3 by depositing histone H4 acetylation, and thereby contributes to the ESC identity43. Lastly, a study on the ISWI family remodeling complex Nurf revealed that depletion of its essential component, Bptf, in ESCs impairs differentiation into all three germ layers while having minimal effect on ESC self-renewal44. It is currently unknown whether Tip60-p400 or Nurf plays a role in iPSC generation.

Histone acetylation

Histone-modifying enzymes play important roles in regulating ESC identity and the iPSC generation process. Histone modifications are thought to function by either directly affecting higher-order chromatin configurations or mediating chromatin-related processes through recruiting specific binding proteins45. Histone acetylation can potentially open up chromatin by neutralizing the positive charge of histone lysine residues. Consistent with this function, histone acetylation is highly enriched in ESCs compared with differentiated cells19,20,25,26, indicating that it contributes to the open chromatin state in ESCs. Likewise, treatment with HDAC inhibitors has been shown to enhance nuclear dynamics, reduce differentiation propensity46 and support the self-renewal program in ESCs47. In addition, in cell fusion-mediated reprogramming, low levels of histone H3 K9 acetylation (H3K9ac) in ESCs correlate with reduced efficiency in reprogramming the nuclei from fibroblasts, and HDAC inhibitors can improve the reprogramming efficiency48.

In support of the role of histone acetylation in reprogramming, HDAC inhibitors, including valproic acid (VPA) and butyrate, significantly improve the efficiency of iPSC generation49,50,51,52. Butyrate also promotes reprogramming fidelity by reducing the frequency of partially reprogrammed cells52. Treatment of reprogramming cells with HDAC inhibitors leads to enhanced expression of ESC-enriched genes before the establishment of pluripotency49,51,52. Furthermore, the role of VPA in reprogramming can be at least partly attributed to its ability to induce HDAC2 degradation53. It has been reported that VPA treatment or HDAC2 depletion allows reprogramming by the microRNA cluster miR302/367 without introducing any other factors53. Finally, HDAC inhibitors have also been applied to rectify aberrantly silenced loci and eliminate the somatic cell memory in established iPSC lines54,55.

Despite the abundant evidence for an association between histone acetylation and pluripotency, surprisingly few reports have identified the function of individual HATs or HDACs in regulating pluripotency. A recent study showed that Kat8 (also called Mof or Myst1), a HAT catalyzing H4K16ac, is important for ESC identity25. Kat8 deletion abolishes the self-renewal capacity and pluripotency of ESCs. Mechanistically, Kat8 has been found to regulate the ESC transcription network by functioning upstream to activate Nanog25. Interestingly, Wdr5, a shared component of the Kat8-containing complex and the MLL complex catalyzing H3K4 methylation56, displays a phenotype similar to that of Kat8 when depleted in ESCs57. Given that Dpy30, another MLL complex component regulating H3K4 methylation, is dispensable for ESC self-renewal58, it seems likely that the Kat8-containing complex mediates the Wdr5 function. In support of this notion, the recruitment of Wdr5 to its target loci, including key pluripotency genes, is Kat8-dependent25. The fact that the binding of Wdr5 to these loci results in an enrichment of H3K4me3 indicates a hierarchical relationship between these histone modifications25. Wdr5 has been shown to be required for efficient generation of iPSCs57. It will be interesting to investigate the potential effect of Kat8 on iPSC generation. Kat8 may enhance the reprogramming efficiency by directly activating Nanog and thereby promoting the conversion of pre-iPSCs to iPSCs.

Histone methylation

Histone methylation is closely linked to transcription. Methylation of different residues, and sometimes to different degrees (i.e., tri-, di- or mono-), represents different transcriptional states. Generally, transcription activation is associated with H3K4me3/2 at promoters, and H3K36me3/2 and H3K79me3/2 across the transcribed region, while transcription silencing correlates with H3K27me3/2 at promoters and H3K9 and H4K20 methylation in heterochromatic regions59,60,61. However, accumulating evidence has challenged this generalized view, arguing for a more context-dependent role of histone methylation in transcription regulation62.

H3K4 methylation

In lower eukaryotes, H3K4 methylation enriched at promoters correlates strongly with transcription activation59. However, in mammalian cells, promoter H3K4me3 does not seem to be strictly associated with gene expression. Instead, the majority of promoters are marked by H3K4me3, regardless of the gene expression status63,64. Although the presence of both H3K4me3 and Pol II at a promoter indicates that the gene is competent for transcription63,64, productive transcription relies on the rate-limiting step of Pol II release from the promoter63,65,66. The inactive state of some H3K4me3-marked genes can also be attributed to the coexistence of repressive modifications, such as H3K27me3 (see below). These repressive modifications appear to play a dominant role in determining the transcription status.

In ESCs, the overall prevalence of H3K4me3 at promoters is similar to that in differentiated cells. However, cell type-specific H3K4me3 patterns can be detected and frequently correlate with cell type-specific gene expression63,67. Enzymes depositing or removing H3K4 methylation have been shown to play important roles in ESCs. Dpy30, a component of the MLL histone methyltransferase (HMT) complex, is required for ESCs to commit to the neural lineage58, indicating that H3K4me3 deposited by MLL is crucial for maintaining differentiation potential in ESCs. Deficiency in Kdm1a, an H3K4me2/1-specific demethylase, results in spontaneous differentiation of human ESCs68. Kdm1a targets developmental genes co-occupied by H3K4me3/2 and H3K27me3 and contributes to their repression by balancing the H3K4 methylation level68. Another demethylase, Kdm5b (also called Plu1), which is specific for H3K4me3/2, is essential for ESC self-renewal69. Kdm5b contributes to the activation of self-renewal-related genes, which preferentially function in nucleotide metabolism, cell division and chromatin regulation. Kdm5b localizes to the transcribed region of these genes, where it represses intragenic cryptic transcription and sustains efficient transcription elongation69. The potential roles of these enzymes in iPSC generation remain to be tested.

H3K9 methylation

H3K9 methylation is associated with transcription silencing and heterochromatin formation. Genome-wide localization studies have shown that the genomic domains marked with H3K9me3 are substantially expanded in differentiated cells compared with ESCs23, and long-range silenced genomic regions marked by H3K9me2 are also increased upon differentiation24. It has been shown that, upon exiting the pluripotent state, H3K9-specific HMT Kmt1c (also called G9a) contributes to the silencing of the Oct4 locus by forming heterochromatin structure and recruiting the de novo DNA methylation machinery70. While Kmt1c plays a role in differentiation-induced silencing, H3K9me3/2-specific histone demethylases (HDMs), Kdm3a and Kdm4c (also called Jhdm2a/Jmjd1a and Jhdm3c/Jmjd2c, respectively) are essential for maintaining the ESC identity71. Knocking down either Kdm3a or Kdm4c in ESCs blocks ESC self-renewal and leads to differentiation71. In ESCs, Kdm3a regulates a distinct set of pluripotency genes, including Tcl1, Tcfcp2l1 and Zfp57, while Kdm4c contributes to the activation of Nanog71.

Consistent with the roles of these enzymes in regulating pluripotency, inhibition of Kmt1c and/or overexpression of Kdm3a increases the efficiency of SCNT and cell fusion-based reprogramming70,72. In transcription factor-induced reprogramming, inconsistent results have been reported for the role of Kmt1c. Treatment with a Kmt1c inhibitor has been shown to increase reprogramming efficiency73; however, another study showed that depletion of Kmt1c by small hairpin RNA (shRNA) does not increase reprogramming efficiency74. More studies are needed to clarify the role of Kmt1c and to investigate the potential role of H3K9me3/2-specific demethylases in iPSC generation.

In ESCs, H3K9 methylation also functions in repressing trophectoderm-specific genes. ESCs lacking Kmt1e (also called Setdb1 or Eset), an H3K9-specific HMT, fail to self-renew while acquiring trophectoderm properties75,76,77,78. Consistent with its role in ESCs, depletion of Kmt1e reduces the reprogramming efficiency in iPSC generation74. In addition to Kmt1e, other H3K9-specific HMTs have also been shown to affect iPSC generation. Depletion of Kmt1d (also called Ehmt1 or Glp1) reduces iPSC generation efficiency, while depletion of Kmt1a (also called Suv39h1) enhances iPSC generation, illustrating the complexity of the roles of H3K9 methylation in shaping pluripotency. How these HMTs function in iPSC generation and/or ESC maintenance requires further study.

H3K27 methylation

H3K27me3 is a repressive modification deposited by PRC2. Its role in ESCs has drawn much attention because of its involvement in the “bivalent” domain, which is defined by the coexistence of the repressive mark H3K27me3 and the active mark H3K4me3 at the same locus79. Genes that harbor a bivalent domain are transcriptionally silenced in ESCs, suggesting a potentially dominant role of H3K27me3. In ESCs, genes with bivalent domains include a substantial number of differentiation-related genes targeted by the core pluripotency factors67,79,80,81,82. Upon differentiation, these bivalent domains are often resolved, leaving either H3K27me3 or H3K4me3, indicative of the expression status67,79,80. It is believed that the bivalent domain keeps the differentiation-activated promoters in a “poised” state, allowing rapid response to differentiation cues. Consistently, bivalent differentiation-related genes are bound by PRC2 components in ESCs83,84,85. Recruitment of PRC2 to these targets is directed by the PRC2-associated proteins Jarid2 or Mtf2 (refs 86,87,88,89,90,91). In addition, a subset of the targets repressed by PRC2 is further occupied by PRC1, which establishes another repressive mark, histone H2A K119 ubiquitylation (H2AK119ub)84,88,92,93. The recruitment of PRC1 to these loci is thought to fortify the gene repression during differentiation93.

Despite its intriguing genomic distribution, the functional significance of H3K27me3 in ESCs is controversial. Several lines of evidence indicate that H3K27me3 may play a more important role in differentiation than in ESC maintenance, and the deposition of H3K27me3 may simply mark the genes that need to be activated upon differentiation. First, ESCs lacking PRC2 activity retain the self-renewal capacity, although the expression of PRC2 target genes is slightly derepressed in these ESCs83,94,95,96,97. The fact that PRC2-deficient ESCs only display deficiency upon ESC differentiation94,95,96,97 is consistent with the postimplantation lethality phenotype of PRC2-deficient animals (Table 1)98,99,100,101. Second, the bivalent domain is not limited to ESCs or progenitor cells, as it is also found in terminally differentiated cells67,81,82,102,103, suggesting that the bivalent structure itself may not exclusively stand for differentiation potential. Third, a recent study of ground state ESCs, a primitive pluripotent state enforced by MEK and GSK3 inhibitors (2i), showed that H3K27me3 is dramatically reduced in 2i-cultured ESCs compared with serum-cultured ESCs. This results in the loss of bivalency for two-thirds of the bivalent genes104. However, the genes losing bivalency in 2i culture are still effectively silenced in this condition104, suggesting that other mechanisms must contribute to the repression of these loci. Along this line, a recent study showed that PRC2 is preferentially recruited to chromatin with high-density nucleosomes105, indicating that a non-permissive chromatin state has already been established, at least in part, before H3K27me3 is deposited. Therefore, it appears that the repressed state of H3K27me3-marked genes in ESCs is enforced not only by PRC2 but also by other functionally important repressive mechanisms. Consistent with this notion, depletion of both PRC2 and PRC1 in ESCs leads to a much more severe differentiation phenotype than that of cells lacking either one alone95. Collectively, H3K27me3 contributes to the repression of developmental genes in ESCs, but its detailed role in regulating ESC pluripotency requires further investigation.

Although the function of H3K27me3 and PRC2 in ESCs remains controversial, PRC2 components have been shown to be critical for somatic cell reprogramming. Cell fusion-based reprogramming studies demonstrated that ESCs lacking the PRC2 component Suz12 or Eed have reduced capacity to reprogram somatic cell nuclei106. Consistently, in iPSC generation, depleting any one of the PRC2 core components, Kmt6 (also called Ezh2), Suz12 or Eed, dramatically reduces reprogramming efficiency74,107,108, and overexpression of Kmt6 increases reprogramming efficiency107. Other proteins specifically associated with PRC2 in ESCs, including Jarid2, Mtf2 and Esprc2p48, also promote iPSC generation synergistically108. Furthermore, Kmt6 has been shown to be activated by a hierarchy of pluripotency factors in the late phase of iPSC generation, and its activation contributes in part to pluripotency establishment107. In addition, components of PRC1, which is functionally related to PRC2, are also important for iPSC generation. Ring1 (also called Ring1a) and Bmi1 are required for efficient reprogramming74,109, and ectopic expression of Bmi1 combined with Oct4 is sufficient to mediate iPSC generation from fibroblasts109.

Interestingly, removal of H3K27 methylation is also involved in reprogramming. Utx, an H3K27me3/2-specific demethylase, is required for efficient reprogramming in both cell fusion-mediated and transcription factor-induced reprogramming110. iPSCs derived in the absence of Utx bear aberrant H3K27me3 and H3K4me3 profiles. Mechanistically, Utx specifically regulates a set of pluripotency genes, including Sall1, Sall4 and Utf1, whose activation is important for establishing pluripotency110. Although Utx is dispensable for ESC derivation and maintenance, Utx-deficient ESCs are impaired in differentiation110,111. Overall, it seems that the dynamic regulation of H3K27 methylation plays a more important role in the transition of cell identities, including differentiation and reprogramming, than in the maintenance of ESC status.

H3K36 methylation

H3K36 methylation, especially H3K36me3/2, is involved in transcription elongation, marking the gene body of actively transcribed loci. A recent study has suggested that depletion of H3K36me2, along with the deposition of H3K4me3, at CpG-rich promoters helps establish a platform for the assembly of gene regulatory machinery112. Removal of H3K36me2 can be directed by the JmjC domain-containing demethylases Kdm2a and Kdm2b (also called Jhdm1a/Fbxl11 and Jhdm1b/Fbxl10, respectively). ESCs lacking Kdm2a appear to be normal in self-renewal71,112, while Kdm2b-deficient ESCs maintain the expression of pluripotency genes, but display a skewed lineage commitment upon differentiation (He and Zhang, unpublished data).

We and others have found that Kdm2b is capable of enhancing iPSC generation113,114. Kdm2b functions early in reprogramming by promoting the activation of epithelial genes114. It has been shown that activation of epithelial genes through mesenchymal-to-epithelial transition (MET) is an early event in reprogramming115,116. Consistent with the role of H3K36 demethylation in reprogramming, the bulk H3K36me3/2 level decreases during the iPSC generation process113. Dynamic regulation of H3K36 methylation is not limited to the MET process during iPSC generation. The global H3K36me3 level has been shown to increase when an epithelial cell line is induced to a mesenchymal state117, indicating a general link between H3K36 methylation and the mesenchymal or epithelial state. The role of Kdm2b in epithelial gene activation appears to be direct, as Kdm2b binds to and modulates the H3K36me2 level of the activated epithelial gene loci during reprogramming114. Interestingly, Kdm2b-mediated enhancement of epithelial gene activation is followed by the elevated activation of pluripotency genes, such as Nanog. However, Nanog activation takes place outside the time window in which Kdm2b functions114, suggesting a potential link between the sequential gene activation events in iPSC generation. In addition, inhibition of the epithelial gene activation largely abrogates the enhancing effect of Kdm2b in iPSC generation114, indicating that activation of epithelial genes is a prerequisite for pluripotency establishment. How activation of epithelial genes facilitates the establishment of pluripotency remains to be determined.

In addition, vitamin C has been shown to act synergistically with Kdm2b in promoting iPSC generation113. This synergy can be explained by the capacity of vitamin C to reduce the oxidized Fe(III) to Fe(II)118. Fe(II) is a cofactor required for dioxygenase-catalyzed oxidation reactions, including histone demethylation by JmjC domain-containing proteins such as Kdm2b119,120. Since vitamin C also enhances iPSC generation by itself113,121, it would be interesting to test whether vitamin C contributes to reprogramming by potentiating other reprogramming-related enzymatic activities.

H3K79 methylation

H3K79 methylation, catalyzed by Kmt4 (or Dot1L), is associated with active transcription and usually enriched in the gene body61. Kmt4-depleted ESCs appear to self-renew with pluripotency markers expressed but have a lower proliferation rate122. Kmt4 deletion in ESCs also leads to the loss of H3K9me2 and H4K20me3 at centromeres and telomeres, resulting in a less condensed chromatin state122. Kmt4 deficiency in mice causes embryonic lethality with multiple developmental abnormalities122. How this phenotype is reflected in differentiation potentials in vitro is not clear.

In a knockdown screen for epigenetic factors affecting iPSC generation, Kmt4 was found to impede iPSC generation. Depletion of Kmt4 by shRNA or inhibition of Kmt4 enzymatic activity by a small molecule significantly enhances the efficiency of iPSC generation74. Inhibition of Kmt4 in the early stage is sufficient for its enhancing effect on iPSC generation. Consistent with this time window, Kmt4 inhibition attenuates the expression of mesenchymal transcription factors, facilitating the acquisition of epithelial features through MET74. Downregulation of mesenchymal genes is accompanied by a reduced H3K79me2 level at these loci, consistent with the association of H3K79me2 with active transcription. Interestingly, upregulation of pluripotency genes, such as Nanog and Lin28, is also detected upon Kmt4 inhibition, and this upregulation depends on the enhanced MET caused by Kmt4 inhibition74. These observations again indicate that gaining epithelial properties is a prerequisite for the activation of key pluripotency genes during iPSC generation.

Studies of H3K36 and H3K79 methylation suggest that these two modifications constitute a barrier to the acquisition of epithelial properties during iPSC generation. In the reprogramming process, Kdm2b and factors antagonizing the function of Kmt4 presumably facilitate the elimination of specific inherent somatic barriers rather than establishing or maintaining the pluripotency circuitry.

DNA methylation and its derivatives

Roles of DNA modifications in ESC maintenance and reprogramming to iPSCs

One of the best-characterized DNA modifications is the methylation of cytosine (mC) at the 5 position. The mC mark is established by the de novo DNA methyltransferases Dnmt3a and Dnmt3b and maintained by Dnmt1. Most DNA methylation occurs in the context of CpG dinucleotides123. Genome-wide mapping of DNA methylation suggests that, in general, CpG-rich promoters tend to be hypomethylated and CpG-poor ones hypermethylated124,125,126. DNA methylation at promoters is indicative of a repressive chromatin environment and inversely correlated with H3K4me3 levels67,125,126. In ESCs, hypomethylated CpG-rich promoters are enriched in ubiquitously expressed house-keeping genes and genes highly regulated during development. These gene loci are marked by either H3K4me3 alone or bivalent modifications. In contrast, hypermethylated CpG-poor promoters are preferentially associated with tissue-specific genes that lack the H3K4me3 mark124,126. In particular, promoters of pluripotency genes, such as Nanog and Oct4, are hypomethylated in ESCs. These genes become hypermethylated through de novo DNA methylation upon differentiation7,125,127. DNA methylation is not required for ESC maintenance, as deletion of any or all of the three DNMTs does not affect the self-renewal capacity of ESCs128,129,130. However, DNMT-deficient ESCs fail to execute lineage commitment upon differentiation signaling in vitro131. Studies in mouse models also showed that deletion of Dnmt1 or Dnmt3b leads to postimplantation lethality and mice lacking Dnmt3a exhibit early postnatal lethality128,129, supporting a critical role of DNA methylation in development. Nevertheless, somatic cells that lack Dnmt3a and Dnmt3b can be converted into iPSCs with comparable, if not better, reprogramming efficiency132, suggesting that de novo DNA methylation is not required for transcription factor-directed establishment of pluripotency.

DNA methylation can be passively diluted through cell cycle progression when DNMTs are inhibited, or actively removed by various potential mechanisms133. Among these mechanisms, Tet protein-mediated oxidation has been implicated in ESC function. Tet proteins are dioxygenases capable of oxidizing mC to hydroxymethylcytosine (hmC), formylcytosine and carboxylcytosine134,135,136,137. Two of the Tet proteins, Tet1 and Tet2, are enriched in ESCs135. Genome-wide localization analysis revealed that Tet1 and hmC are preferentially enriched at CpG-rich promoters, including those in bivalent gene loci138,139,140,141,142. It has been suggested that Tet1 and hmC play dual functions in regulating gene expression: Tet1 and hmC potentially support gene activation at pluripotency genes, while contributing to the silenced state of bivalent genes. Deficiency in Tet1, Tet2 or Tet1/2 in ESCs does not seem to affect ESC maintenance or pluripotency140,143,144, but skewed differentiation can be detected when these ESCs are induced to differentiate135,142,143,144. It is likely that Tet1, as well as hmC, functions in fine-tuning pluripotency in ESCs.

During reprogramming, pluripotency genes that are hypermethylated in somatic cells must be demethylated and activated. Compared with the fast activation of pluripotency genes in SCNT and cell fusion, activation of these genes has substantial latency in the iPSC generation process3,4. It has been suggested that this difference may be due to the abundance of putative demethylase(s) in oocytes and ESCs. Consistent with this notion, Aid, a deaminase that triggers base excision repair (BER)-mediated demethylation by converting mC to thymine, has been shown to be required for efficient activation of Oct4 and Nanog in cell fusion-mediated reprogramming145. However, it is unclear whether Aid functions similarly in iPSC generation. Nonetheless, insufficient DNA demethylation does pose a hurdle for iPSC generation.

5-Azacytidine (5-Aza), a DNMT inhibitor, facilitates conversion of partially reprogrammed cells to iPSCs, and enhances overall reprogramming efficiency when applied at the late stage of reprogramming7. 5-Aza contributes to the reprogramming process presumably by facilitating the demethylation and activation of pluripotency genes. Moreover, a recent study demonstrated that Tet2 and Parp1, a poly(ADP-ribose) polymerase involved in BER-mediated demethylation146, are required for iPSC generation, and Parp1 overexpression facilitates reprogramming147. Mechanistically, Parp1 prevents further methylation of Nanog and Esrrb promoters during reprogramming, while Tet2 deposits hmC on these loci147. Parp1- and Tet2-regulated DNA modifications appear to affect the establishment of the active chromatin and the binding of reprogramming factors to Nanog and Esrrb, which eventually contributes to the activation of these pluripotency genes147. Although these studies suggest that loss of DNA methylation is an integral part of the reprogramming process, how active DNA demethylation is targeted to the relevant genes remains to be uncovered.

DNA methylation patterns after the establishment of iPSCs

Once reprogramming cells gain the self-renewal capacity and become independent of the introduced transcription factors, iPSCs are established. Insufficient DNA demethylation in the reprogramming process also affects the properties of established iPSCs. Remnant DNA methylation patterns specific for the starting cells have been observed in the established iPSCs. This epigenetic memory of the starting cells is associated with biased differentiation potential toward the originating cell lineages54,148,149,150. The remnant methylation can be eliminated and the biased potential rectified by serial passages, cross-lineage differentiation, or combined treatment with a DNMT inhibitor and an HDAC inhibitor54,148. Moreover, epigenetic memory in iPSCs is also reflected in insufficient DNA methylation at somatic-specific gene loci, and such a hypomethylated state is usually associated with aberrant transcription151. Thus, it seems that DNA methylation status needs to be adjusted for nascent iPSCs to reach bona fide pluripotency.

Apart from the remnant somatic cell DNA methylation patterns, multiple reports have shown that certain genomic loci in iPSCs bear aberrant DNA methylation12,55,152. Such an aberrant methylation pattern is neither a feature of ESCs nor that of the originating somatic cells. Instead, it is gained through the reprogramming process. Genome-wide single-base mapping of DNA methylation has uncovered certain iPSC-specific DNA methylation patterns in human iPSCs12,152. One study identified nine aberrantly methylated genes that distinguish human iPSCs from human ESCs152. A second study showed that the aberrant methylation hotspots in human iPSCs, although mostly occurring at CpG sites, can be at non-CpG sites around centromeres and telomeres, which may potentially affect chromatin structure12. Both studies suggest that the aberrant methylation patterns can be transmitted through differentiation and may therefore interfere with developmental programs12,152. However, since these analyses were carried out with ESCs and iPSCs of different genetic backgrounds, it is unknown whether and how differences in genetic background contribute to the observed differential DNA methylation patterns between iPSCs and ESCs.

To avoid the potential genetic background effect, a study comparing genetically identical mouse ESCs and iPSCs showed that an imprinted locus, Dlk1-Dio3, is frequently silenced and hypermethylated during reprogramming55. Silencing of this locus in iPSCs is associated with a poor success rate in generating all-iPSC mice by tetraploid complementation55,153. Hypermethylation of this locus depends on the de novo DNA methyltransferase Dnmt3a, and silencing of Dlk1-Dio3 in iPSCs can be reversed by VPA treatment, although with a low efficiency55. Interestingly, vitamin C is capable of preventing the aberrant silencing of Dlk1-Dio3154. It is tempting to speculate that silencing of this locus might be counteracted during reprogramming by vitamin C-promoted activities of the Tet enzymes, which mediate active DNA demethylation.

In summary, a variety of epigenetic mechanisms have different roles in ESC maintenance and during iPSC generation. Epigenetic factors important for sustaining the ESC fate are crucial for iPSC generation, whereas those that barely affect ESC pluripotency and/or self-renewal may also contribute to iPSC generation by affecting the transition of epigenetic landscapes during the reprogramming process.

Mechanisms of iPSC generation

Reprogramming from somatic cells to an ESC-like state requires elimination of epigenetic marks inherent to somatic cells and establishment of new epigenetic marks characteristic of pluripotency. Each epigenetic event that must take place in reprogramming can be regarded as an epigenetic barrier. Accumulating evidence indicates that these barriers converge onto two sequential events: (1) MET; and (2) activation of pluripotency circuitry (Figure 1). In addition, nascent iPSCs also need to overcome epigenetic barriers to reach the bona fide pluripotency found in ESCs (Figure 1). In this section, we discuss how cell fate conversion is achieved and how reprogramming cells overcome the epigenetic barriers during the iPSC generation process.

Figure 1

The path from a somatic cell to a refined iPSC and the putative epigenetic barriers during the process. When reprogramming factors are introduced into a fibroblast, the reprogramming factors immediately drive the cell to overcome barrier 1, resulting in the acquisition of epithelial properties through MET. A fibroblast that fails to overcome this barrier retains its cellular identity with either an accelerated or arrested proliferation status. After the cell gains epithelial properties, a subsequent barrier (2) to acquiring pluripotency is encountered. An intermediate epithelial cell that successfully overcomes the second barrier becomes a nascent iPSC, which can self-renew independently of the introduced transcription factors. Otherwise, it is trapped in the intermediate stage and becomes a partially reprogrammed cell. For the nascent iPSC, additional barrier(s) (3) need to be overcome actively or dismantled passively to achieve bona fide pluripotency equivalent to that in ESCs. Processes that depend on the reprogramming factors are shown as solid arrows, whose thickness reflects the approximate propensity for the cell to undergo a specific transition. Dotted arrows represent the processes that require additional manipulations other than the induction of reprogramming factors. Putative epigenetic barriers in iPSC generation are numbered and shown as solid arcs. Other potential barriers are shown as dotted arcs.

Gaining epithelial properties and accelerating cell cycle

Under the influence of reprogramming factors, somatic cells follow a set of steps to achieve pluripotency7,107,116,155,156. Acquisition of epithelial cell properties through MET is one of the earliest events in reprogramming of somatic cells to iPSCs115,116. Upregulation of epithelial genes, such as Cdh1 and Epcam, and downregulation of mesenchymal genes, such as Snai1/2 and Zeb1/2, take place early in reprogramming7,115,116. Consistent with this, factors promoting the epithelial state, such as TGF-β inhibitors, BMPs, microRNA miR200s and miR302/367, and Cdh1, enhance iPSC generation, and in some cases, are able to substitute for reprogramming factors116,157,158,159,160,161,162. In contrast, factors that suppress the epithelial state (e.g. TGF-β) or depletion of key epithelial adhesion molecules (e.g. Cdh1, Epcam) are able to inhibit iPSC generation115,116,160,161.

Morphologically, reprogramming cells that gain epithelial properties show reduced cell size and compact cell-cell interaction115,116,163. By retrospective image tracing, acquisition of epithelial cell-specific morphology is observed in the iPSC-destined cells early in reprogramming. In fact, all iPSCs, indicated by the co-expression of Nanog and Cdh1, originate from the cells that acquired epithelial features163. This observation indicates that acquisition of epithelial status is a necessary step for the establishment of pluripotency in iPSCs. However, gaining epithelial properties is not sufficient for reaching the iPSC fate, since continuous induction of reprogramming factors is required even after iPSC-destined reprogramming cells have gained the epithelial features155,156,163. Therefore, additional event(s) that occur later are also crucial for the establishment of pluripotency. Given that acquisition of epithelial properties is completed early during reprogramming115,116,163 and is able to predict the iPSC-destined cells163, an elite model for iPSC generation seems reasonable163. In this model, the elite status of a certain population of reprogramming cells is imposed at the beginning of reprogramming either stochastically or deterministically163. However, such a model may not fully depict the whole reprogramming process, since it neglects the existence of partially reprogrammed cells, for example, a potential population of Cdh1+ Nanog− cells. These partially reprogrammed cells probably gain the epithelial features and form colonies, but fail to activate pluripotency genes7,52,107,164,165,166. A revised model (Figure 2) may present the reprogramming process in a more integrated way.

Figure 2

Schematic presentation of a model explaining how iPSC generation can be facilitated by additional factors that act on different steps of the reprogramming process. Reprogramming is initiated at t0 by introducing transcription factors Oct4, Sox2 and Klf4 (OSK), OSK plus Kdm2b, or OSK plus Nanog. Soon after reprogramming is initiated (t1), some cells rapidly overcome the putative epigenetic barrier (1) to gain epithelial features. In the presence of Kdm2b, which facilitates the acquisition of epithelial status114, more cells overcome this barrier, turning into intermediate epithelial cells. Overcoming this first barrier is depicted in a stochastic manner, although a deterministic mode is also possible. Intermediate cells subsequently encounter the barrier (2) to activating pluripotency circuitry. Overcoming the second barrier may take a longer time (t1 to t2 to t3) and the task remains incomplete for some cells, which become partially reprogrammed cells. Nanog, which is capable of driving intermediate cells to iPSCs165, facilitates reprogramming at this step. At certain time points in reprogramming (e.g. from t2 to t3), the effect of Kdm2b or Nanog is manifested by the increased efficiency in iPSC generation. Different reprogramming factor combinations may also lead to different ratios of partially reprogrammed cells. Note that cell proliferation and its potential effect on reprogramming are not considered in the figure, since neither Kdm2b nor Nanog enhances reprogramming by affecting the cell cycle114,168. Cell fate transitions in the reprogramming process are illustrated in the box on the right with putative epigenetic barriers numbered. Bars at the bottom of the figure indicate the progression of iPSC-destined cells in overcoming individual barriers.
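The logic of this two-barrier model can be sketched as a simple three-state calculation. The code below is a toy illustration, not an analysis from the cited studies: the function name, the per-step crossing probabilities and the number of time steps are all hypothetical values chosen only to show the qualitative behavior. As in the figure, cell proliferation is ignored, and the assumption is that Kdm2b raises the probability of crossing barrier 1 while Nanog raises the probability of crossing barrier 2.

```python
from typing import List, Tuple


def simulate(p1: float, p2: float, steps: int) -> List[Tuple[float, float, float]]:
    """Expected fractions of (fibroblast, intermediate epithelial, iPSC) cells.

    p1: per-step probability that a fibroblast overcomes barrier 1 (MET).
    p2: per-step probability that an intermediate cell overcomes barrier 2
        (activation of the pluripotency circuitry).
    Both values are hypothetical; proliferation is not modeled.
    """
    fib, epi, ipsc = 1.0, 0.0, 0.0
    history = [(fib, epi, ipsc)]
    for _ in range(steps):
        crossed1 = fib * p1  # cells gaining epithelial features this step
        crossed2 = epi * p2  # intermediate cells becoming nascent iPSCs
        fib -= crossed1
        epi += crossed1 - crossed2
        ipsc += crossed2
        history.append((fib, epi, ipsc))
    return history


# Hypothetical parameter sets for the three conditions in the figure.
conditions = {
    "OSK": (0.05, 0.02),
    "OSK + Kdm2b": (0.15, 0.02),  # assumed to act on barrier 1
    "OSK + Nanog": (0.05, 0.06),  # assumed to act on barrier 2
}

for name, (p1, p2) in conditions.items():
    fib, epi, ipsc = simulate(p1, p2, steps=30)[-1]
    print(f"{name}: fibroblast={fib:.3f} intermediate={epi:.3f} iPSC={ipsc:.3f}")
```

Cells still in the intermediate state at the end of the run correspond to the partially reprogrammed population, so changing either probability also changes the ratio of partially reprogrammed cells to iPSCs.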

The molecular determinants that endow the reprogramming cells with an advantage in gaining epithelial status, if they exist, need to be identified. At the chromatin level, perturbation of H3K36 and H3K79 methylation status may partly account for the advantage of some cells in gaining epithelial status74,114. Determining the binding profiles of the reprogramming factors within the somatic epigenetic landscape will be informative for identifying the rate-limiting epigenetic barriers encountered by these factors in the early step of the iPSC generation process.

Cells that acquire epithelial properties have been shown to concurrently escape cell cycle arrest and secure a faster division rate163. Several reports showed that promoting cell proliferation, for example, by repressing the p53-p21 pathway or inhibiting Ink4a/Arf, facilitates iPSC generation167,168,169,170,171,172,173,174. Along this line, mitosis has been suggested to facilitate reprogramming by promoting the resetting of chromatin states during DNA replication175. However, it has also been suggested that faster cell division simply expands the pool of cells that later become iPSCs, as well as cells with other deviated cell fates163. Moreover, cell cycle acceleration is not sufficient for driving reprogramming cells to an epithelial state, since a minor percentage of fibroblasts also proliferate faster without transitioning to an epithelial state163.

Alteration of cell proliferation apparently agrees well with the early reprogramming-induced metabolic changes, which potentially provide the energy needed for the accelerated cell cycle. The metabolic profile of differentiated cells, which favors oxidative phosphorylation, is reset to a glycolysis-dependent ESC-like state after reprogramming176,177,178,179,180. This metabolic resetting takes place before pluripotency is established179. Manipulating the activities of glycolysis and oxidative phosphorylation affects reprogramming accordingly176,179, indicating that metabolic change is also a prerequisite step for pluripotency establishment. c-Myc is likely involved in the immediate cell cycle acceleration and metabolic changes. It has been shown that c-Myc functions early during reprogramming and regulates metabolic genes, in particular, glycolysis-related genes, which are not targeted by other reprogramming factors164. However, c-Myc is dispensable for iPSC generation181,182, and iPSCs derived in the absence of c-Myc bear a similar bioenergetic profile to those reprogrammed with c-Myc179, suggesting that the introduced pluripotency factors are sufficient for mediating the metabolic changes during reprogramming. Currently, how the metabolic change is achieved, and whether the metabolic change is linked to the concurrent MET process remain to be determined.

Activating pluripotency circuitry

Compared with the initial step in reprogramming, later events mediating the activation of the pluripotency circuitry in iPSC generation are less well characterized. This is in part due to the low efficiency of reprogramming and the cell heterogeneity generated during the reprogramming process. Determining the molecular events leading to pluripotency establishment may require cell purification with available predictive markers or the use of single-cell-based assays. The possibly stochastic nature of iPSC generation168,183 further casts doubt on whether it is possible to define specific molecular events leading to iPSC generation. Nevertheless, several studies have shed some light on the late events in iPSC generation.

Pioneering studies revealed that iPSCs come from the cell population expressing the ESC-specific surface marker SSEA-1155,156. Unfortunately, the predictive value of this marker is low, as the majority of SSEA-1+ cells do not achieve pluripotency156. Later work showed that some pluripotency genes, including Nanog, are upregulated soon after epithelial gene activation74,114,116,158. The proteins encoded by these early upregulated pluripotency genes may serve as pioneer factors in setting up the pluripotency circuitry. Consistent with this possibility, Nanog has been shown to be essential for the entry to pluripotency165,184. Specifically, in iPSC generation, Nanog is initially dispensable but takes a pivotal role in driving partially reprogrammed cells into pluripotency165, and overexpression of Nanog enhances overall reprogramming efficiency168.

A recent study using single-cell gene expression analysis and clonal retrospective tracing showed that cells destined to become iPSCs express a specific set of genes before the pluripotency circuitry is activated107. Expression of a set of pluripotency genes, including Esrrb, Utf1, Lin28 and Dppa2, in clonal cells is strictly correlated with successful derivation of stable iPSC lines107. Indeed, iPSC generation can be facilitated by introducing Esrrb185, Utf1174, Lin28168,186 or Dppa251, suggesting that these factors may control some rate-limiting step(s) in the reprogramming process. Interestingly, Esrrb, Utf1 and Lin28 are upregulated earlier than other pluripotency genes in the bulk of reprogramming cells116, again suggesting that pioneer expression of some pluripotency genes helps to activate the whole pluripotency circuitry. In contrast, other ESC-enriched genes, including Fbxo15, Fgf4, and surprisingly Oct4, the commonly used pluripotency marker, can be activated in partially reprogrammed cells that form colonies but fail to become stable iPSCs. Consequently, the expression of these genes does not predict the iPSC fate107.

In the same study, an activation hierarchy for pluripotency factors was developed by probabilistic modeling and confirmed by experiments. In this hierarchy, endogenous activation of the Sox2 locus is positioned most upstream and drives the hierarchical activation of a set of pluripotency factors107. Such a hierarchy suggests that generation of iPSCs can be achieved by using alternative combinations of reprogramming factors. Indeed, iPSC derivation can be achieved by combinations without any of the original Yamanaka factors, for example, the combination of Lin28, Sall4, Esrrb and Dppa2107, suggesting that the pluripotency circuitry can be activated from multiple entry points. It is somewhat puzzling, however, that activation of the endogenous Sox2 locus sits at the top of the activation hierarchy, considering that Sox2 is one of the introduced reprogramming factors and is expressed at a non-physiological level in reprogramming cells. We speculate that activation of the endogenous Sox2 locus likely reflects a permissive chromatin environment, at least at some loci. The permissive chromatin state may lead to the activation of the pluripotency network with the help of the introduced reprogramming factors. In fact, expression of another pluripotency gene, Nanog, has been shown to take place upon radical chromatin reorganization, after which pluripotency is established13,14.

Overall, at the late stage of iPSC generation, activation of the pluripotency circuitry takes place through pioneer gene expression and subsequent hierarchical activation of pluripotency genes. The epigenetic barrier restricting the establishment of pluripotency in this step is likely reflected in the global non-permissive chromatin configuration and/or the inherent non-permissive chromatin context at the pioneer pluripotency genes. The molecular nature of the epigenetic barriers probably includes H3K9me3/213,14,71, H3K27me3110, insufficient histone acetylation25,49,51,52, hypermethylation and hypo-hydroxymethylation of DNA7,147, and the lack of proper chromatin remodeling activities34,42. At this point, many questions remain unresolved. For example, what molecular events are required to overcome the potential barriers? Is the success in overcoming the barriers due to the direct effect of the reprogramming factors or to the molecular features that are gained from the preceding events (e.g. MET)? Are these barriers overcome in a coordinated way or independently? Is there a rate-limiting barrier? Answering these questions requires detailed study of the chromatin status using pure cell populations or single-cell technologies.

Progression toward bona fide pluripotency

Upon activation of the pluripotency circuitry, reprogramming cells are able to self-renew independently of the introduced factors. However, these nascent iPSCs may have identities distinct from those of ESCs. First, isolated iPSC lines, at least at low passages, retain some somatic cell memory, which is reflected in the somatic cell gene expression pattern9,148,151,187,188 and/or chromatin modification pattern54,148,149,150,151. Vestigial somatic cell traits need to be removed passively or actively before nascent iPSCs reach “matured” pluripotency similar to that of ESCs. Second, the established iPSCs may harbor aberrant traits resulting from the reprogramming process. These traits are frequently reflected in aberrant DNA methylation, which can be translated into altered transcription output and/or abnormal functional phenotypes12,55,152. These aberrant traits may be caused by technical issues, such as the stoichiometry of the reprogramming factors used189 and culturing conditions190. Alternatively, they may be intrinsic to the approach of transcription factor-mediated reprogramming. Regardless, this observation argues for the need to optimize the reprogramming approach.

Despite the documented differences between iPSCs and ESCs, we note that several analyses suggested that the variable properties of iPSCs may simply reflect the polymorphism of pluripotency that can also be observed among different ESC lines10,11,190. Nevertheless, further characterization of the polymorphism of pluripotency will show us how to better harness the therapeutic potential of pluripotent stem cells.

Concluding remarks

Epigenetic regulatory mechanisms play important roles in shaping cellular identity. ESCs provide an invaluable cellular model for understanding the biology of cell-fate control by epigenetic mechanisms, while reprogramming from somatic cells to iPSCs serves as a model system for understanding epigenetic regulation during cell fate transition. Characterization of the epigenetic changes and their roles in iPSC generation has provided valuable mechanistic insights into how cell fate change might be achieved. However, our understanding of iPSC generation at the molecular level is hindered by the cell heterogeneity arising from each step of reprogramming and the low percentage of the cell population that achieves the iPSC fate. Based on the current understanding of the reprogramming mechanisms, investigation of the cell population enriched by epithelial markers may help elucidate the intermediate events leading to pluripotency. Identification and genetic engineering of stage-specific marker genes that can predict the reprogramming potential will facilitate the study of molecular events leading to successful iPSC generation. Single-cell-based assays, although in their infancy, have already provided much mechanistic insight. Analysis of chromatin status at the single-cell level, although technically challenging, will ultimately reveal the epigenetic mechanisms of reprogramming.

Several recent studies revealed that ESCs have a specialized metabolic profile176,177,178,179,180. Interestingly, chromatin modification has also been associated with metabolism in the case of cancer cells191. In particular, intermediate metabolites, such as acetyl-CoA, S-adenosylmethionine (SAM) and α-ketoglutarate, are cofactors required for acetyltransferases, methyltransferases and dioxygenases, respectively. These enzymes represent a large number of chromatin-modifying enzymes, whose functions in ESCs and iPSC generation are discussed in this review. Most recently, the cellular SAM level, which is controlled by threonine metabolism in ESCs, has been shown to be essential for ESC self-renewal192. Threonine turnover sustains the H3K4me3 level in ESCs and supports the robust proliferation and self-renewal of ESCs192, illustrating the regulatory connections among metabolism, epigenetic modification and pluripotency. It will be intriguing to uncover how these connections are involved in the iPSC generation process.